Short-answer QA on RQA
[Chart: leaderboard results over time (May 24, 2024) for Accuracy, Certifiable Accuracy, and Benign Accuracy (BAcc); best Accuracy: Keyword, 71]
Evaluation Results
| Method | Configuration (truncated in source) | Date | Accuracy | Certifiable Accuracy | Benign Accuracy (BAcc) |
|---|---|---|---|---|---|
| Keyword | LLM=Mistral-I7B, Defen... | 2024.05 | 71 | 38 | - |
| Vanilla | LLM=Mistral-I7B, Defen... | 2024.05 | 69 | - | - |
| Keyword | LLM=Llama2-C7B, Defens... | 2024.05 | 64 | 34 | - |
| Decoding_r | LLM=Mistral-I7B, Defen... | 2024.05 | 62 | 37 | - |
| Vanilla | LLM=Llama2-C7B, Defens... | 2024.05 | 61 | - | - |
| Decoding_r | LLM=Llama2-C7B, Defens... | 2024.05 | 61 | 31 | - |
| No RAG | LLM=Mistral-I7B, Defen... | 2024.05 | 8 | - | - |
| No RAG | LLM=Llama2-C7B, Defens... | 2024.05 | 2 | - | - |
| No RAG | LLM=GPT-3.5, Retrieved... | 2024.05 | - | - | 2 |
| Vanilla | LLM=GPT-3.5, Retrieved... | 2024.05 | - | 0 | 65.4 |
| RobustRAG (Keyword) | LLM=GPT-3.5, Retrieved... | 2024.05 | - | 37.8 | 56.4 |

A dash (-) means the metric was not reported for that entry.
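As a rough illustration of how the Accuracy and Certifiable Accuracy columns relate, the sketch below copies the leaderboard rows and computes the difference between the two numbers for every entry that reports both. The helper name `robustness_gap` and the notion of a "gap" are illustrative assumptions, not metrics defined by this benchmark; the truncated configuration strings are kept exactly as they appear in the source.

```python
from typing import Optional

# Rows copied from the leaderboard: (method, configuration, accuracy,
# certifiable accuracy, benign accuracy). None stands for "-" (not reported).
# Configuration strings are truncated in the source and kept truncated here.
Row = tuple[str, str, Optional[float], Optional[float], Optional[float]]

ROWS: list[Row] = [
    ("Keyword", "LLM=Mistral-I7B, Defen...", 71, 38, None),
    ("Vanilla", "LLM=Mistral-I7B, Defen...", 69, None, None),
    ("Keyword", "LLM=Llama2-C7B, Defens...", 64, 34, None),
    ("Decoding_r", "LLM=Mistral-I7B, Defen...", 62, 37, None),
    ("Vanilla", "LLM=Llama2-C7B, Defens...", 61, None, None),
    ("Decoding_r", "LLM=Llama2-C7B, Defens...", 61, 31, None),
    ("No RAG", "LLM=Mistral-I7B, Defen...", 8, None, None),
    ("No RAG", "LLM=Llama2-C7B, Defens...", 2, None, None),
    ("No RAG", "LLM=GPT-3.5, Retrieved...", None, None, 2),
    ("Vanilla", "LLM=GPT-3.5, Retrieved...", None, 0, 65.4),
    ("RobustRAG (Keyword)", "LLM=GPT-3.5, Retrieved...", None, 37.8, 56.4),
]


def robustness_gap(rows: list[Row]) -> list[tuple[str, str, float]]:
    """Hypothetical helper: return (method, configuration, accuracy - certifiable
    accuracy) for rows reporting both numbers, largest difference first."""
    gaps = [
        (method, config, acc - cert)
        for method, config, acc, cert, _bacc in rows
        if acc is not None and cert is not None
    ]
    # sorted() is stable, so rows with equal gaps keep their leaderboard order.
    return sorted(gaps, key=lambda g: g[2], reverse=True)


if __name__ == "__main__":
    for method, config, gap in robustness_gap(ROWS):
        print(f"{method:<12} {config:<28} gap = {gap:g}")
```

Only the four Keyword and Decoding_r rows for Mistral-I7B and Llama2-C7B report both numbers, so the GPT-3.5 entries, which use a different set of metrics, are left out rather than compared across columns.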