Help with SWE-bench Multimodal test set: the number of test instances doesn't match sb-cli #23

Hi,
While experimenting with SWE-bench Multimodal, I found that the test-00000-of-00001.parquet file available on Hugging Face contains only 510 instances. However, when I submitted my predictions to sb-cli for evaluation, the report indicated a total of 517 instances. Could you please help clarify this discrepancy? Thank you.

Here is the link to the Hugging Face dataset: https://huggingface.co/datasets/SWE-bench/SWE-bench_Multimodal/tree/main/data
Below is the sb-cli submission command I used, along with a screenshot:

sb-cli submit swe-bench-m test --predictions_path ~/my_result.json --run_id my_result_01 

<img width="712" height="260" alt="Image" src="https://github.com/user-attachments/assets/593aaa1c-5ccc-48f9-bcd1-18c7323728dd" />
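To narrow this down, it might help to compare the instance IDs in the Hugging Face test split against the IDs in the predictions file. The snippet below is only a rough sketch: `load_prediction_ids` and `compare_ids` are hypothetical helper names, and it assumes the predictions JSON is either a dict keyed by `instance_id` or a list of objects that each have an `instance_id` field. It cannot show which 7 instances sb-cli counts beyond the 510, since that list lives on the server side.

```python
import json
from pathlib import Path


def load_prediction_ids(path):
    """Return the set of instance IDs found in a predictions JSON file.

    Assumption: the file is either a dict keyed by instance_id or a list of
    dicts that each contain an "instance_id" key.
    """
    data = json.loads(Path(path).read_text())
    if isinstance(data, dict):
        return set(data)
    return {entry["instance_id"] for entry in data}


def compare_ids(dataset_ids, prediction_ids):
    """Summarize how two collections of instance IDs differ."""
    dataset_ids = set(dataset_ids)
    prediction_ids = set(prediction_ids)
    return {
        "dataset_count": len(dataset_ids),
        "prediction_count": len(prediction_ids),
        "missing_from_predictions": sorted(dataset_ids - prediction_ids),
        "not_in_dataset": sorted(prediction_ids - dataset_ids),
    }


# Example usage (requires the `datasets` package and network access):
#
#   from datasets import load_dataset
#   ds = load_dataset("SWE-bench/SWE-bench_Multimodal", split="test")
#   report = compare_ids(ds["instance_id"],
#                        load_prediction_ids("my_result.json"))
#   print(report["dataset_count"], report["prediction_count"])
```

If both counts come back as 510 with no differences, the extra instances are presumably only known to the evaluation backend, which is the part I would like clarified.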
