Help with SWE-bench Multimodal test set: the number of test instances doesn't match sb-cli #23

@Leahxx

Hi,
While experimenting with SWE-bench Multimodal, I found that the test-00000-of-00001.parquet file available on Hugging Face contains only 510 instances. However, when I submitted my predictions to sb-cli for evaluation, the validation report listed a total of 517 instances. Could you please help clarify this discrepancy? Thank you.

Here is the link to the Hugging Face dataset: https://huggingface.co/datasets/SWE-bench/SWE-bench_Multimodal/tree/main/data
The sb-cli submission command I used is shown below, followed by a screenshot:

sb-cli submit swe-bench-m test --predictions_path ~/my_result.json --run_id my_result_01

Image
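In case it helps narrow this down, here is a minimal sketch of how the instance IDs on both sides could be compared. It is only an illustration: the helper names `prediction_ids` and `compare_ids` are made up for this post, and it assumes the predictions file is either a list of records with an `instance_id` field or a dict keyed by instance ID. The toy IDs at the bottom are placeholders, not real SWE-bench instances.

```python
from collections import Counter


def prediction_ids(predictions):
    """Return instance IDs from a predictions object.

    Assumption: predictions are either a dict keyed by instance ID or a
    list of records that each carry an "instance_id" field.
    """
    if isinstance(predictions, dict):
        return list(predictions.keys())
    return [record["instance_id"] for record in predictions]


def compare_ids(dataset_ids, submitted_ids):
    """Summarize how the dataset IDs and the submitted IDs differ."""
    dataset_ids = list(dataset_ids)
    dataset_counts = Counter(dataset_ids)
    dataset_set = set(dataset_counts)
    submitted_set = set(submitted_ids)
    return {
        "dataset_total": len(dataset_ids),
        "dataset_unique": len(dataset_set),
        # IDs that appear more than once in the dataset file
        "duplicated_in_dataset": sorted(
            i for i, n in dataset_counts.items() if n > 1
        ),
        # IDs in the dataset that have no prediction
        "missing_from_predictions": sorted(dataset_set - submitted_set),
        # IDs that were submitted but are not in the dataset file
        "not_in_dataset": sorted(submitted_set - dataset_set),
    }


if __name__ == "__main__":
    # Placeholder data only; replace with the real ID lists.
    toy_dataset = ["repo__a-1", "repo__a-2", "repo__b-7"]
    toy_predictions = [{"instance_id": "repo__a-1"}, {"instance_id": "repo__b-7"}]
    for key, value in compare_ids(toy_dataset, prediction_ids(toy_predictions)).items():
        print(f"{key}: {value}")
```

The real dataset IDs could be read from the parquet file, for example with `pandas.read_parquet` and the `instance_id` column, and the predictions loaded with `json.load`. This only checks the local files, so it cannot by itself explain where the 517 reported by sb-cli comes from.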
