Open
Description
I retrieved the gold patches from the SWE-bench Verified dataset and uploaded them with sb-cli for testing. Only 487 of the 500 instances passed all the test cases; 5 were marked as 'incompleted' and 8 as 'unresolved'.
"unresolved_ids": [
"astropy__astropy-7606",
"astropy__astropy-8707",
"astropy__astropy-8872",
"django__django-10097",
"psf__requests-1724",
"psf__requests-2317",
"pylint-dev__pylint-6528",
"pylint-dev__pylint-7277"
],
I then ran mini-swe-agent with Claude and uploaded the resulting preds.json to sb-cli. Of the unresolved_ids above, two are marked as resolved in that run, which suggests that the gold patch for these instances is not really 'gold'...
"django__django-10097",
"psf__requests-1724"
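For anyone who wants to repeat the comparison, here is a minimal sketch of how the two reports can be cross-checked. It assumes each report is available as a dict; the `unresolved_ids` key is taken from the output above, while the `resolved_ids` key and the function name are my assumptions and may not match what sb-cli actually writes. The agent report below only contains the two IDs mentioned above, a real report would list many more.

```python
def gold_failures_resolved_by_agent(gold_report, agent_report):
    """Return instance IDs that the gold run left unresolved but the agent run resolved."""
    gold_unresolved = set(gold_report.get("unresolved_ids", []))
    # "resolved_ids" is an assumed key name, adjust it to the real report layout.
    agent_resolved = set(agent_report.get("resolved_ids", []))
    return sorted(gold_unresolved & agent_resolved)


# Unresolved IDs from the gold-patch run, copied from the report above.
gold_report = {
    "unresolved_ids": [
        "astropy__astropy-7606",
        "astropy__astropy-8707",
        "astropy__astropy-8872",
        "django__django-10097",
        "psf__requests-1724",
        "psf__requests-2317",
        "pylint-dev__pylint-6528",
        "pylint-dev__pylint-7277",
    ]
}

# Illustrative agent report holding only the two IDs named in this issue.
agent_report = {
    "resolved_ids": [
        "django__django-10097",
        "psf__requests-1724",
    ]
}

suspect = gold_failures_resolved_by_agent(gold_report, agent_report)
print(suspect)
```

Any ID printed here is one where an agent patch passed while the gold patch did not, so it is worth checking whether the failure comes from the gold patch itself or from the evaluation environment.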