Causal Reasoning on XCOPA (test)

96.4 Accuracy (th)

PaLM 2

[Chart: accuracy over time, Apr 30, 2020 to May 13, 2026]
Updated 2d ago

Evaluation Results

Method | Links
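2023.05 | 96.4 | 94.4 | - | 97.6 | 91.4 | 97.2 | 98.4 | 76.8 | 92.8 | 96.2 | 97.8 | 96.8 | 97.4 | - | - | -
2023.05 | 90.2 | 89.9 | - | 91 | 89.6 | 94 | 97.4 | 66.8 | 85.4 | 90.8 | 94.6 | 94.6 | 94.8 | - | - | -
2023.05 | 87.2 | 83.7 | - | 77.4 | 78 | 92.6 | 96 | 61 | 69.4 | 85.4 | 92.8 | 89.8 | 91.6 | - | - | -
2023.05 | 86.8 | 83 | - | 75.6 | 77.2 | 92.2 | 95.8 | 60.6 | 68.8 | 84 | 92.4 | 89.4 | 90.6 | - | - | -
2020.04 | 62.2 | 61.2 | 66.8 | 59.4 | 50 | 71 | 61.6 | 46 | 58.8 | 60 | 63.2 | 67.6 | 67.4 | - | - | -
2020.04 | 60.3 | 61.5 | 68.3 | 61.3 | 53.7 | 65.8 | 63 | 52.5 | 56.3 | 61.9 | 61.8 | 66.1 | 67.6 | - | - | -
2024.05 | 57.4 | - | - | - | - | 60.2 | - | - | 53.2 | - | 53.4 | 59.4 | - | - | - | -
2024.05 | 57.4 | - | - | - | - | 61.4 | - | - | 56 | - | 55.6 | 60.4 | - | - | - | -
2024.05 | 57.2 | - | - | - | - | 58.2 | - | - | 51.8 | - | 53 | 57 | - | - | - | -
2024.05 | 57 | - | - | - | - | 62 | - | - | 58.8 | - | 56.6 | 62.8 | - | - | - | -
2024.05 | 56.8 | - | - | - | - | 62.8 | - | - | 53.8 | - | 55.8 | 62.2 | - | - | - | -
2020.04 | 56.6 | 59.7 | 66.8 | 58 | 51.4 | 65 | 60.2 | 51.2 | 52 | 58.4 | 62 | 65.6 | 68.8 | - | - | -
2024.05 | 56.2 | - | - | - | - | 61.6 | - | - | 52.6 | - | 55 | 60.6 | - | - | - | -
2026.05 | 56.2 | - | - | - | - | - | - | - | - | 54.4 | - | 54.2 | - | - | - | -
2026.05 | 56.2 | - | - | - | - | - | - | - | - | 52.2 | - | 54.4 | - | - | - | -
2026.05 | 55.6 | - | - | - | - | - | - | - | - | 52.8 | - | 52.8 | - | - | - | -
2026.05 | 55.2 | - | - | - | - | - | - | - | - | 54.8 | - | 57.4 | - | - | - | -
2026.05 | 55 | - | - | - | - | - | - | - | - | 53.8 | - | 54 | - | - | - | -
2026.05 | 54.8 | - | - | - | - | - | - | - | - | 52.4 | - | 53.6 | - | - | - | -
2026.05 | 54.4 | - | - | - | - | - | - | - | - | 54 | - | 56.2 | - | - | - | -
2026.05 | 54 | - | - | - | - | - | - | - | - | 55.4 | - | 53.2 | - | - | - | -
2026.05 | 53.8 | - | - | - | - | - | - | - | - | 54.2 | - | 52.2 | - | - | - | -
2026.05 | 53.8 | - | - | - | - | - | - | - | - | 53.8 | - | 51 | - | - | - | -
2026.05 | 53.8 | - | - | - | - | - | - | - | - | 57.2 | - | 51.8 | - | - | - | -
2026.05 | 53.8 | - | - | - | - | - | - | - | - | 52.6 | - | 54.4 | - | - | - | -
2026.05 | 53.4 | - | - | - | - | - | - | - | - | 55.4 | - | 52.2 | - | - | - | -
2026.05 | 53.4 | - | - | - | - | - | - | - | - | 54.6 | - | 53.4 | - | - | - | -
2026.05 | 53.2 | - | - | - | - | - | - | - | - | 54.4 | - | 50.2 | - | - | - | -
2026.05 | 52.4 | - | - | - | - | - | - | - | - | 54.4 | - | 48.6 | - | - | - | -
2026.05 | 52.4 | - | - | - | - | - | - | - | - | 54.8 | - | 50.4 | - | - | - | -
2026.05 | 51.6 | - | - | - | - | - | - | - | - | 54 | - | 49.8 | - | - | - | -
2025.02 | - | 84.4 | - | - | - | - | - | - | - | - | - | - | - | - | - | -
2025.02 | - | 88.6 | - | - | - | - | - | - | - | - | - | - | - | - | - | -
2025.02 | - | 89.2 | - | - | - | - | - | - | - | - | - | - | - | - | - | -
2025.03 | - | - | - | - | - | - | - | - | - | - | - | - | - | 79.3 | 94.2 | 61.4
2025.03 | - | - | - | - | - | - | - | - | - | - | - | - | - | 55.3 | 57.3 | 52.8
2025.03 | - | - | - | - | - | - | - | - | - | - | - | - | - | 66 | 65.2 | 67.3
2025.03 | - | - | - | - | - | - | - | - | - | - | - | - | - | 91.1 | 95.8 | 85.4
2025.03 | - | - | - | - | - | - | - | - | - | - | - | - | - | 90 | 96.3 | 82.4
2025.03 | - | - | - | - | - | - | - | - | - | - | - | - | - | 90.5 | 96 | 83.8
2025.03 | - | - | - | - | - | - | - | - | - | - | - | - | - | 83.6 | 92.3 | 73.2
2025.03 | - | - | - | - | - | - | - | - | - | - | - | - | - | 87.9 | 94 | 80.6
2025.03 | - | - | - | - | - | - | - | - | - | - | - | - | - | 90.5 | 96 | 83.8
2025.03 | - | - | - | - | - | - | - | - | - | - | - | - | - | 92 | 96.8 | 86.2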