IndiRef

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Question Answering | IndiRef Spot the difference | Temporal Accuracy (FEM): 42 | 5 |
| Question Answering | IndiRef Meetup | Temporal Score (FEM): 50 | 5 |
| Dialogue Reasoning | IndiRef | Temporal Accuracy: 50 | 4 |