Is your feature request related to a problem?
We need to assess how several AI models perform on multilingual student inputs, so that we can understand how accurate and how consistent each model is.
Describe the solution you'd like
- Evaluate the following models on multilingual student inputs for accuracy and consistency:
  - GPT-4o-mini
  - GPT-4.1-mini
  - Gemini 2.5 Flash
  - Gemini 3.1 Flash Lite
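The request does not define how accuracy and consistency should be measured. The Python sketch that follows shows one possible reading, offered as an assumption rather than an agreed method. Accuracy is the share of responses that match a reference label. Consistency is how often repeated runs on the same student input agree with that input's most common answer. The `Trial` record, the `summarize` helper, the language tags and the `"correct"`/`"incorrect"` labels are all hypothetical. The sketch makes no model API calls; it only scores responses that have already been collected.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Trial:
    """One model response to one student input (hypothetical record format)."""
    model: str      # e.g. "GPT-4o-mini"
    language: str   # hypothetical language tag of the student input, e.g. "es"
    item_id: str    # identifies the student input; repeated runs share an id
    expected: str   # reference label for the input
    predicted: str  # label the model returned on this run


def summarize(trials):
    """Group trials by (model, language) and score each group.

    accuracy: share of all responses whose prediction equals the expected label.
    consistency: mean over items of the share of repeated runs that agree with
    that item's most common prediction (1.0 means identical answers every run).
    """
    groups = defaultdict(lambda: defaultdict(list))
    for t in trials:
        groups[(t.model, t.language)][t.item_id].append(t)

    report = {}
    for key, items in groups.items():
        correct = total = 0
        agreement = []
        for runs in items.values():
            preds = [r.predicted for r in runs]
            correct += sum(r.predicted == r.expected for r in runs)
            total += len(runs)
            # Count of the most frequent prediction for this item.
            modal_count = Counter(preds).most_common(1)[0][1]
            agreement.append(modal_count / len(preds))
        report[key] = {
            "accuracy": correct / total,
            "consistency": sum(agreement) / len(agreement),
        }
    return report


# Toy data: two inputs, each sent twice to the same model.
trials = [
    Trial("GPT-4o-mini", "es", "q1", "correct", "correct"),
    Trial("GPT-4o-mini", "es", "q1", "correct", "incorrect"),
    Trial("GPT-4o-mini", "es", "q2", "incorrect", "correct"),
    Trial("GPT-4o-mini", "es", "q2", "incorrect", "correct"),
]
print(summarize(trials))
# {('GPT-4o-mini', 'es'): {'accuracy': 0.25, 'consistency': 0.75}}
```

Item `q2` shows why the two metrics are kept separate. The model gives the same answer on both runs, which is fully consistent, but that answer is wrong both times.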
Original issue
Evaluate GPT-4o-mini, GPT-4.1-mini, Gemini 2.5 Flash, and Gemini 3.1 Flash Lite on multilingual student inputs for accuracy and consistency