LexiLearn Study Companion

Intelligent study tool that extracts unfamiliar vocabulary from reading material and generates text-to-speech audio for study

Planned EdTech

Problem Solved

Students reading in a second language often encounter unfamiliar words and lose comprehension. Most digital reading tools do not provide context-aware vocabulary support. LexiLearn automatically extracts unfamiliar vocabulary from a student's reading material and helps them master the new words.

Core Features

✓ Upload learning materials (PDF, EPUB, TXT)
✓ Automatic vocabulary extraction and term frequency analysis
✓ Context-aware definitions with usage examples
✓ Text-to-speech generation with adjustable speed
✓ Spaced repetition review system for vocabulary retention
✓ Personal vocabulary flashcards with progress tracking
✓ Reading statistics and comprehension insights
✓ Export study materials to Anki format
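The Anki export could be as simple as writing one tab-separated line per card, since Anki's text importer accepts plain-text files with one note per line and a field separator such as a tab. The sketch below assumes a hypothetical Flashcard record with word, definition, and example fields; the project's actual card model and export format are not described here.

```java
import java.util.List;

/** Hypothetical flashcard model used only for this sketch. */
record Flashcard(String word, String definition, String example) {}

final class AnkiExporter {
    private AnkiExporter() {}

    /** Makes a field safe for a tab-separated line: no tabs, no raw line breaks. */
    static String sanitize(String field) {
        if (field == null) {
            return "";
        }
        return field.strip()
                .replace("\t", " ")        // a tab would start a new field
                .replace("\r\n", "<br>")   // a line break would start a new note,
                .replace("\n", "<br>")     // so keep it as an HTML break instead
                .replace("\r", "<br>");
    }

    /** Builds one line per card with three tab-separated fields: word, definition, example. */
    static String toTsv(List<Flashcard> cards) {
        StringBuilder out = new StringBuilder();
        for (Flashcard card : cards) {
            out.append(sanitize(card.word())).append('\t')
               .append(sanitize(card.definition())).append('\t')
               .append(sanitize(card.example())).append('\n');
        }
        return out.toString();
    }
}
```

The resulting string can be saved as a .txt file and imported through Anki's File > Import, mapping the three columns to the note type's fields. Converting line breaks to <br> assumes HTML is allowed in fields during import.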

Technical Architecture

┌────────────────────────────────────┐
│     Angular Frontend (SPA)         │
│   - Document Upload                │
│   - Reading Interface              │
│   - Vocabulary Dashboard           │
│   - Flashcard Reviewer             │
└──────────────┬─────────────────────┘
               │
┌──────────────▼─────────────────────┐
│   Spring Boot REST API             │
│   - Auth Service                   │
│   - Document Parser                │
│   - NLP Engine (vocabulary)        │
│   - Review Service (SRS)           │
│   - TTS Service                    │
└──────────────┬─────────────────────┘
               │
      ┌────────┴───────┬───────────────┐
      │                │               │
┌─────▼─────┐   ┌─────▼─────┐   ┌─────▼─────┐
│PostgreSQL │   │ MongoDB   │   │Google TTS │
│(Progress) │   │(Document  │   │  API      │
│           │   │Content)   │   │           │
└───────────┘   └───────────┘   └───────────┘

Tech Stack

Frontend

• Angular 21
• TypeScript
• RxJS
• TailwindCSS
• Web Audio API

Backend

• Java 17
• Spring Boot 3
• OpenNLP
• Apache PDFBox
• Google Cloud TTS

Data & DevOps

• PostgreSQL
• MongoDB
• Redis (for caching)
• AWS Lambda (document processing)
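Redis caching pairs naturally with the TTS Service: audio generated once for a given text, language, and speed can be reused instead of calling Google Cloud TTS again. A minimal sketch of building such a cache key follows. The names (TtsCacheKeys, the "tts:" key prefix) are hypothetical, and the 0.5 to 1.5 speed limits are illustrative app-level choices for a learner-facing slider, not the API's documented range.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

final class TtsCacheKeys {
    // Hypothetical product limits for the speed slider; these are not Google Cloud TTS limits.
    static final double MIN_RATE = 0.5;
    static final double MAX_RATE = 1.5;

    private TtsCacheKeys() {}

    /** Clamps the requested speed to the slider range and rounds it to one decimal place. */
    static double normalizeRate(double requested) {
        double clamped = Math.max(MIN_RATE, Math.min(MAX_RATE, requested));
        return Math.round(clamped * 10) / 10.0;
    }

    /** Builds a stable key of the form "tts:<SHA-256 hex>" for one synthesis request. */
    static String cacheKey(String text, String languageCode, double requestedRate) {
        String normalized = text.strip() + "|" + languageCode + "|" + normalizeRate(requestedRate);
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(normalized.getBytes(StandardCharsets.UTF_8));
            return "tts:" + HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

Rounding the rate means nearby slider values such as 0.87 and 0.9 share one cached clip, trading a little precision for a higher cache hit rate. Hashing keeps keys short even for long passages.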

Technical Challenges

Natural Language Processing at Scale

Integrated OpenNLP for tokenization and part-of-speech (POS) tagging to extract meaningful vocabulary from uploaded documents. Built a Spring Batch pipeline that processes large PDF uploads asynchronously, so users are not blocked while a document is being parsed.
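The term-frequency step of vocabulary extraction can be sketched with the standard library alone. This is a simplified stand-in, not the OpenNLP pipeline: a regular-expression tokenizer replaces OpenNLP's tokenizer, and a minimum word length plus a hypothetical knownWords set (the student's existing vocabulary, in lowercase) replace POS-based filtering.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VocabularyExtractor {
    // A word is a run of letters (any script), optionally joined by apostrophes or hyphens,
    // so "don't" and "well-known" stay single tokens.
    private static final Pattern WORD = Pattern.compile("\\p{L}+(?:['-]\\p{L}+)*");

    private VocabularyExtractor() {}

    /** Splits text into lowercased word tokens (a simplified stand-in for OpenNLP's tokenizer). */
    static List<String> tokenize(String text) {
        Matcher matcher = WORD.matcher(text);
        List<String> tokens = new ArrayList<>();
        while (matcher.find()) {
            tokens.add(matcher.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    /**
     * Counts tokens that are not in knownWords and are at least minLength characters long,
     * then returns up to limit (word, count) pairs, most frequent first, ties broken alphabetically.
     */
    static List<Map.Entry<String, Integer>> topUnknownWords(
            String text, Set<String> knownWords, int minLength, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokenize(text)) {
            if (token.length() >= minLength && !knownWords.contains(token)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(limit)
                .map(e -> Map.entry(e.getKey(), e.getValue()))
                .toList();
    }
}
```

In the pipeline described above, OpenNLP's POS tags would be a more precise filter than word length, since they can separate content words from function words regardless of their length.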

Spaced Repetition System (SRS)

Implemented the SM-2 algorithm, which calculates each word's next review interval from how well the student recalled it in previous reviews. Used PostgreSQL scheduling to generate review tasks automatically, with no manual intervention.
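The SM-2 update rule can be sketched as follows. It follows the commonly published form of SM-2 (the variant that adjusts the easiness factor after every review, starting from 2.5 with a floor of 1.3). CardState, Sm2.review, and the due-date field are hypothetical names for illustration; how the project actually stores this state in PostgreSQL is not shown.

```java
import java.time.LocalDate;

/** Per-card scheduling state (hypothetical field names). */
record CardState(int repetitions, double easiness, int intervalDays, LocalDate dueDate) {
    static CardState newCard(LocalDate today) {
        return new CardState(0, 2.5, 0, today);
    }
}

final class Sm2 {
    private Sm2() {}

    /** Applies one SM-2 review; quality is the self-graded recall score from 0 (blackout) to 5 (perfect). */
    static CardState review(CardState card, int quality, LocalDate today) {
        if (quality < 0 || quality > 5) {
            throw new IllegalArgumentException("quality must be between 0 and 5");
        }
        int repetitions;
        int interval;
        if (quality >= 3) {
            // Correct answer: intervals go 1 day, 6 days, then previous interval times EF.
            if (card.repetitions() == 0) {
                interval = 1;
            } else if (card.repetitions() == 1) {
                interval = 6;
            } else {
                interval = (int) Math.round(card.intervalDays() * card.easiness());
            }
            repetitions = card.repetitions() + 1;
        } else {
            // Incorrect answer: restart the repetition sequence.
            repetitions = 0;
            interval = 1;
        }
        int gap = 5 - quality; // distance from a perfect answer
        double easiness = card.easiness() + (0.1 - gap * (0.08 + gap * 0.02));
        easiness = Math.max(1.3, easiness); // SM-2 never lets EF drop below 1.3
        return new CardState(repetitions, easiness, interval, today.plusDays(interval));
    }
}
```

With this state in place, a scheduled job only needs to select cards whose due date is on or before today to produce the day's review tasks. Three perfect answers in a row yield intervals of 1, 6, and then 16 days.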

Future Improvements

→ AI-powered reading difficulty recommendations
→ Collaborative learning groups with shared vocabulary lists
→ Integration with popular book and article APIs
→ Mobile app for offline studying
→ Integration with language learning platforms (Duolingo API)
→ Grammar checker with explanations
