WildChat

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Safety Evaluation | WildChat | Safe@1: 97.5 | 34 |
| Next Token Prediction | WildChat | Next Token Accuracy: 51 | 32 |
| Safety classification disagreement | WildChat Content | Disagreement Rate (per 1k Conversations): 0.4 | 30 |
| Safety classification disagreement | WildChat Intent | Disagreement Rate (per 1k conv): 0.4 | 30 |
| Writing | WildChat 5,000 conversations | KLfwd: 0.392 | 24 |
| Coding | WildChat 5,000 conversations | KL Divergence (Forward): 0.26 | 24 |
| Safety | WildChat | Refusal Rate: 42.92 | 20 |
| Indirect prompt injection defense | Wildchat | Attack Success Rate (ASR): 0.33 | 18 |
| Quantization Detection | WildChat | Statistical Power AUC: 64.2 | 18 |
| Jail-breaking detection | WildChat | AUC (Statistical Power): 0.895 | 18 |
| Fingerprint Detection | WildChat Fr | FSR: 1 | 18 |
| Proactive next utterance prediction | WildChat (test) | LLM-Judge: 52.16 | 17 |
| Safety Evaluation | WildChat (test) | WildChat Score: 69.85 | 13 |
| LLM Inference | WildChat | RPS (Requests/s): 75.17 | 11 |
| Model Routing | NB-WildChat | Uniqueness Score: 42.6 | 11 |
| Synthetic Text Generation | WildChat | Mean Embedding Similarity: 0.31 | 10 |
| Safety Evaluation | WildChat unsafe prompts | Not-Unsafe Rate: 99.82 | 9 |
| User Simulation | WildChat (In-Distribution) | Turn Count: 8.62 | 8 |
| Safety Alignment | WildChat | DSR: 81.2 | 6 |
| LLM alignment | WildChat (train) | Loss: 1.1951 | 6 |
| Next Token Prediction | WildChat | BERT-Small Next Token Accuracy (eps=inf): 28.78 | 5 |
| Dialogue | WildChat | Lexical Coverage: 47.3 | 4 |
| KV cache reuse efficiency | WildChat | Match Rate: 50.2 | 4 |
| Secret loyalty evaluation | WildChat final training-evaluation step (held-out) | Activation Rate: 77 | 4 |
| Disallowed Content Evaluation | WildChat | Not Unsafe Rate: 98 | 4 |
Showing 25 of 29 rows