Project Overview (WIP)

1. WebSocket for Real-Time Communication

Purpose: Stream audio from the client to the backend service.
WebSocket Protocol: Allows for bidirectional communication between the client and server.
Audio Streaming: Client sends raw audio data every 250 milliseconds.
Transcription Frequency: Transcriptions are performed every second (4 chunks) to achieve near real-time speech-to-text conversion.
Return transcribed text back to client in real time

2. Whisper Model

Purpose: Perform speech-to-text recognition.
Model: Whisper model for efficient and accurate transcription.
Model Size: Start with base/small first
Deployment: Run locally on the host machine to avoid using OpenAI's API (quite expensive?)
Other option: Hugging face's model

3. LLama3 Model

Purpose: Perform error analysis on the transcriptions after the user has finished speaking.
Output: Generate an error report and provide recommendations for improvement.
Model Size: Start with 8B now, consider fine tune or upgrade to 70B later.
Other option: Hugging face's model
How to handle the poorly transcribed inputs gracefully

4. LangChain

Purpose: Orchestrate the NLP tasks and manage the workflow.
Functionality: Chain Whisper and LLama3: Create a pipeline that first processes the audio with Whisper and then analyzes the text with LLama3.
Prompt Engineering: Craft prompts for LLama3 to ensure accurate and relevant error analysis.
RAG (Retrieval-Augmented Generation): Future integration with a customized curriculum database for enhanced feedback and learning resources.

After proof of concept:

5. Verify JWT for Controlled Access

Connection: Accept any WebSocket connection.
JWT Token: Require the client to send their JWT token as the first message.
Verification: Authenticate the token with the backend.
Access Control: Allow further communication only if the JWT token is valid.

6.SSL Connection

Encryption: Use SSL/TLS to secure the WebSocket connection.
Certificates: Implement proper SSL certificates.
Configuration: Configure the WebSocket server to use secure WebSockets (wss://).

Phase1:

1. Randomly generate 10 questions with Llama3

FE should display those 10 easy questions

2. Real time speech recognition

Should capture user's real time answers with Whisper.
Should only save the final transcriptions for error analysis.
Should count number of words spoken and record it.

3. Error analysis

When users done answering questions
10 Questions will be used as part of prompt to give more context to LLama3
llama3 generated report should include:
1. Grammar error (you should have used "is" instead of "am")
2. Check if answers to the questions make any sense
Suggestion for improvements
1. Focus on Past tense, subject-verb...

4. Record users' word count for competition

Should send the word count to backend for record
Attach date too?

Some Windows set up notes for myself:

Set up the python environment
Download CUDA12.1 (As Pytorch doesn't support 12.5 the latest version yet): https://developer.nvidia.com/cuda-12-1-0-download-archive?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exe_local
Download CUDA version of Pytorch: pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121
Login to huggingface_hub using the CLI tool
Getting UserWarning: 1Torch was not compiled with flash attention. Maybe it is because FlashAttentionV2 is not on Windows yet
Linux is faster (Linux subsystem?)
Download ffmpeg: https://phoenixnap.com/kb/ffmpeg-windows
pip install ffmpeg-python

Some Mac set up for myself:

Quantization doesn't work on Mac it is not compatible with CUDA
BitsAndBytes needs CUDA
Instead, we could use GGUF model (which apparently is Mac friendly)
We could also use mlx_lm or just ollama :)

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
app		app
tests		tests
.gitignore		.gitignore
README.md		README.md
requirement.txt		requirement.txt
test_main.http		test_main.http

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Project Overview (WIP)

1. WebSocket for Real-Time Communication

2. Whisper Model

3. LLama3 Model

4. LangChain

After proof of concept:

5. Verify JWT for Controlled Access

6.SSL Connection

Phase1:

1. Randomly generate 10 questions with Llama3

2. Real time speech recognition

3. Error analysis

4. Record users' word count for competition

Some Windows set up notes for myself:

Some Mac set up for myself:

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Project Overview (WIP)

1. WebSocket for Real-Time Communication

2. Whisper Model

3. LLama3 Model

4. LangChain

After proof of concept:

5. Verify JWT for Controlled Access

6.SSL Connection

Phase1:

1. Randomly generate 10 questions with Llama3

2. Real time speech recognition

3. Error analysis

4. Record users' word count for competition

Some Windows set up notes for myself:

Some Mac set up for myself:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages