Phi/Qwen judged grid

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Refusal steering | Phi/Qwen judged grid | Target Success Rate: 33.3 | 7