Thank you for contributing such an interesting work!
However, I have a concern about the results in Table 4.
For the ActivityNet dataset, Table 4 appears to quote the zero-shot results of Time-R1. According to Table 10, however, ActivityNet is part of your training data. For a fair comparison, the fine-tuned results of Time-R1 should be quoted instead, and those significantly surpass the performance of your Qwen2.5-VL model:
| Model | R1@0.3 | R1@0.5 | R1@0.7 |
|---|---|---|---|
| Time-R1 (Zero-shot, what's quoted) | 58.6 | 39.0 | 21.4 |
| Time-R1 (Fine-tuned, what should be quoted) | 73.3 | 55.6 | 34.0 |
| VideoAuto-R1 (Qwen2.5-VL, yours) | 69.2 | 48.5 | 27.3 |
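
To make the gap concrete, here is a minimal Python sketch that computes the per-metric differences from the numbers in the table above. The dictionary keys and the `gap` helper are hypothetical names chosen for illustration; they do not come from either codebase.

```python
# Numbers copied from the table above; all names here are illustrative only.
METRICS = ("R1@0.3", "R1@0.5", "R1@0.7")

RESULTS = {
    "time_r1_zero_shot": (58.6, 39.0, 21.4),
    "time_r1_fine_tuned": (73.3, 55.6, 34.0),
    "videoauto_r1_qwen25vl": (69.2, 48.5, 27.3),
}


def gap(baseline, ours="videoauto_r1_qwen25vl"):
    """Return (baseline - ours) for each metric, rounded to one decimal place."""
    return {
        metric: round(b - o, 1)
        for metric, b, o in zip(METRICS, RESULTS[baseline], RESULTS[ours])
    }


if __name__ == "__main__":
    for name in ("time_r1_zero_shot", "time_r1_fine_tuned"):
        print(name, gap(name))
```

A positive value means Time-R1 is ahead. With these numbers, Time-R1 trails in the zero-shot setting but leads on all three metrics once fine-tuned.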
For Charades-STA, on the other hand, the fine-tuned performance is already quoted.
Best.