Hi,
While experimenting with SWE-bench Multimodal, I found that the test-00000-of-00001.parquet file available on Hugging Face contains only 510 instances. However, when I submitted my predictions to sb-cli for evaluation, the validation report showed a total of 517 instances. Could you please help clarify this discrepancy? Thank you.
Here is the link to the Hugging Face dataset: https://huggingface.co/datasets/SWE-bench/SWE-bench_Multimodal/tree/main/data
The following shows the sb-cli submission command I used, along with screenshots:
sb-cli submit swe-bench-m test --predictions_path ~/my_result.json --run_id my_result_01
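
In case it helps with reproducing the count, here is a minimal sketch of how the instance IDs on both sides could be compared. It rests on assumptions: that the dataset exposes an `instance_id` column, and that the predictions file is either a JSON list of objects with an `instance_id` key or a JSON object keyed by instance ID. The helper names `load_prediction_ids` and `compare_ids` are hypothetical and not part of sb-cli or SWE-bench.

```python
import json
from collections import Counter


def load_prediction_ids(path):
    """Return the instance IDs found in a predictions JSON file.

    Assumed layouts (not confirmed by the sb-cli docs): either a list of
    objects with an "instance_id" key, or an object keyed by instance ID.
    """
    with open(path) as f:
        data = json.load(f)
    if isinstance(data, dict):
        return list(data.keys())
    return [item["instance_id"] for item in data]


def compare_ids(dataset_ids, prediction_ids):
    """Summarize how two collections of instance IDs differ."""
    dataset_ids = list(dataset_ids)
    prediction_ids = list(prediction_ids)
    counts = Counter(dataset_ids)
    dataset_set = set(dataset_ids)
    prediction_set = set(prediction_ids)
    return {
        "dataset_total": len(dataset_ids),
        "dataset_unique": len(dataset_set),
        # IDs that appear more than once in the dataset
        "duplicates": sorted(i for i, n in counts.items() if n > 1),
        "missing_from_predictions": sorted(dataset_set - prediction_set),
        "extra_in_predictions": sorted(prediction_set - dataset_set),
    }


# Example usage (needs the `datasets` package and network access;
# the "instance_id" column name is an assumption):
#
#   from datasets import load_dataset
#   ds = load_dataset("SWE-bench/SWE-bench_Multimodal", split="test")
#   report = compare_ids(ds["instance_id"], load_prediction_ids("my_result.json"))
#   print(report["dataset_total"], report["dataset_unique"])
```

This only checks the local parquet data against the predictions file. It cannot show which 517 instances the sb-cli backend counts, so it would at most rule out duplicates or a mismatch on my side.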
