PerLTQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Membership Inference Attack | PerLTQA | ROC-AUC 100 | 24 |
| Agent Memory Question Answering | PerLTQA (test) | BLEU 42.68 | 18 |
| Memory Retrieval | PerLTQA CN | ERC 93.12 | 14 |
| Memory Retrieval | PerLTQA EN | ERC 90.47 | 14 |
| Long-term dialogue memory | PerLTQA (test) | Accuracy 93.14 | 11 |
| Long-horizon conversation utility evaluation | PerltQA | Accuracy 80.62 | 6 |
| Retrieval | PerLTQA | Ra@5 74.5 | 1 |
| Proactive Assistant Evaluation | PerLTQA Category (test) | Response Frequency 15 | 1 |