LTI-Bench

Benchmarks

Task Name                  Dataset Name                    SOTA Result                   Trend
Memory Retention Analysis  LTI-Bench                       Critical Facts Recall: 0.821  5
Conflict Resolution        LTI-Bench Overlap (test)        Consistency: 76.8             4
Conflict Resolution        LTI-Bench Update (test)         Consistency: 86.5             4
Conflict Resolution        LTI-Bench Contradiction (test)  Consistency: 78               4