
WildChat

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Next Token Prediction | WildChat | Next Token Accuracy: 51 | 32 |
| Fingerprint Detection | WildChat Fr | FSR: 1 | 18 |
| Proactive next utterance prediction | WildChat (test) | LLM-Judge: 52.16 | 17 |
| Safety Evaluation | WildChat (test) | WildChat Score: 69.85 | 13 |
| Synthetic Text Generation | WildChat | Mean Embedding Similarity: 0.31 | 10 |
| Safety Evaluation | WildChat unsafe prompts | Not-Unsafe Rate: 99.82 | 9 |
| Next Token Prediction | WildChat | BERT-Small Next Token Accuracy (eps=inf): 28.78 | 5 |
| Safety | WildChat | Safe Response Rate: 94.22 | 2 |