Open
Description
Is your feature request related to a problem?
We need to evaluate how different models perform on combined inputs (problem, solution, and image). Understanding how each model handles these inputs can inform future improvements and model selection.
Describe the solution you'd like
- Assess models: GPT-4o-mini, GPT-4.1-mini, Gemini 2.5 Flash, Gemini 3.1 Flash Lite
- Compare performance with and without image inputs
- Analyze combined input results
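
One way the comparison above could be structured is sketched below. This is only a minimal harness under stated assumptions, not an agreed design: `Sample`, `ModelFn`, `accuracy`, and `compare` are hypothetical names, the model strings are the display names from this issue rather than real API identifiers, and exact-match scoring stands in for whatever metric the project actually adopts. The `call_model` callable is a placeholder for the real provider clients.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Sample:
    """One evaluation item (hypothetical schema)."""
    problem: str
    solution: str
    image: Optional[bytes]  # raw image bytes, or None if the item has no image
    expected: str           # reference answer used for scoring


# Hypothetical signature: (model, problem, solution, image_or_None) -> model answer.
ModelFn = Callable[[str, str, str, Optional[bytes]], str]

# Display names taken from the issue; real API identifiers may differ.
MODELS = ["GPT-4o-mini", "GPT-4.1-mini", "Gemini 2.5 Flash", "Gemini 3.1 Flash Lite"]


def accuracy(model: str, samples: List[Sample], call_model: ModelFn,
             use_image: bool) -> float:
    """Fraction of samples answered correctly (assumed metric: exact match)."""
    if not samples:
        return 0.0
    correct = 0
    for s in samples:
        # Drop the image for the "without image" condition.
        image = s.image if use_image else None
        answer = call_model(model, s.problem, s.solution, image)
        if answer.strip().lower() == s.expected.strip().lower():
            correct += 1
    return correct / len(samples)


def compare(samples: List[Sample], call_model: ModelFn,
            models: List[str] = MODELS) -> Dict[str, Dict[str, float]]:
    """Score every model with and without image inputs and report the gap."""
    results = {}
    for model in models:
        with_image = accuracy(model, samples, call_model, use_image=True)
        without_image = accuracy(model, samples, call_model, use_image=False)
        results[model] = {
            "with_image": with_image,
            "without_image": without_image,
            "delta": with_image - without_image,
        }
    return results
```

Running both conditions over the same samples keeps the comparison paired, so the per-model `delta` reflects the effect of the image rather than differences between datasets.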
Original issue
Evaluate how different models (GPT-4o-mini, GPT-4.1-mini, Gemini 2.5 Flash, Gemini 3.1 Flash Lite) perform on combined inputs (problem, solution, image) and compare results with and without images.