Time-Event Temporal Reasoning (L2) on TEMPREASON 1.0 (test)
[Chart: EM by date. Top entry: TempT5, EM 8,480 (Jun 15, 2023). Metrics shown: EM, F1 Score, Delta F1.]
Evaluation Results
| Method | Setting | Date | EM | F1 Score | Delta F1 |
| TempT5 | ReasonQA | 2023.06 | 8,480 | 88.9 | 1.8 |
| T5-SFT | ReasonQA | 2023.06 | 8,260 | 87.1 | - |
| FLAN-T5-L | ReasonQA | 2023.06 | 5,730 | 66.3 | - |
| ChatGPT | ReasonQA | 2023.06 | 4,750 | 51 | - |
| TempT5 | OBQA | 2023.06 | 1,540 | 36.3 | 1.1 |
| T5-SFT | OBQA | 2023.06 | 1,480 | 35.2 | - |
| FLAN-T5-L | OBQA | 2023.06 | 940 | 22.5 | - |
| ChatGPT | OBQA | 2023.06 | 850 | 16.1 | - |
| ChatGPT | CBQA | 2023.06 | 650 | 11.5 | - |
| TempT5 | CBQA | 2023.06 | 150 | 23.4 | 0.2 |
| T5-SFT | CBQA | 2023.06 | 140 | 23.2 | - |
| FLAN-T5-L | CBQA | 2023.06 | 50 | 9.2 | - |