Compositional Reasoning on Compositional Reasoning Suite Aggregated

93.1 Sugarcrepe Score

LLaVA-7B†

Oct 14, 2025
Updated 9d ago

Evaluation Results

Method  Links

2025.10   93.1   -       82.9   68.3   68.2   78.1
2025.10   90     -       70.6   65.3   55.3   70.3
2025.10   88.5   -       77.4   68.4   75.3   77.4
2025.10   87.3   -       78.5   65.7   58.4   72.5
2025.10   86.2   -       78.8   66.2   61.7   73.2
2025.10   83     -       75.5   64.8   57.8   70.3
2025.10   82.9   -       69.4   63.3   51.2   66.7
2025.10   81.8   -       63.2   64.5   57.2   66.7
2025.10   80.8   -       62.5   62     50.6   63.9
2025.10   80.5   -       70.4   64.4   54.1   67.3
2025.10   78.1   -       61.1   59.5   38.3   59.3
2025.10   76.9   -       61.5   54.1   48.4   60.2
2025.10   76.9   -       63.8   60.3   49.4   62.6
2025.10   75.8   -       67.2   59.7   53.1   64
2025.10   75.3   -       63.6   57.9   49.3   61.5
2025.10   73.3   -       63.9   58.5   47.2   60.7
2025.10   71     -       59.6   58.5   51.1   60.1
2025.10   69.8   -       64     53.8   43.4   57.7
2025.10   68.6   -       66.5   57.8   48.5   60.3
2025.10   67.9   -       72.1   57.2   43.9   60.3
2025.10   67     -       62     58     50.5   59.4
2025.10   57.1   -       65.9   67     63.5   63.4
2025.10   36.6   -       38.6   39.9   60.8   44
2024.11   -      23.65   -      -      -      -
2024.11   -      20.15   -      -      -      -
2024.11   -      23.09   -      -      -      -
2024.11   -      23.2    -      -      -      -
2024.11   -      25.11   -      -      -      -
2024.11   -      26.64   -      -      -      -
2024.11   -      25.96   -      -      -      -
2024.11   -      25.51   -      -      -      -
2024.11   -      27.48   -      -      -      -
2024.11   -      28.06   -      -      -      -