Temporal Question Answering on ReasonQA Multi-hop
[Chart: Set Accuracy by method. Best result: T5-large PIT-SFT, 85 Set Accuracy (Nov 16, 2023). Metrics shown: Set Accuracy, Answer F1.]
Evaluation Results
Method            Details                    Date     Set Accuracy  Answer F1
T5-large PIT-SFT  Backbone=T5-large, Tra...  2023.11  85            89.5
T5-base PIT-SFT   Backbone=T5-base, Trai...  2023.11  78            82.4
T5-large SFT      Backbone=T5-large, Tra...  2023.11  71            76.4
T5-base SFT       Backbone=T5-base, Trai...  2023.11  59.1          65.1
GPT-4             Model version=gpt4-061...  2023.11  51.6          65.4
FLAN-T5-XL        Parameters=3B, Few-sho...  2023.11  35.5          49.7
GPT-3.5           Model version=gpt3.5-t...  2023.11  31.2          51.8