
Formal-language benchmark

Benchmarks

| Task Name | Dataset Name | SOTA Result (Accuracy, %) | Trend |
|---|---|---|---|
| Bucket Sort | Formal-language benchmark lengths 41-500 (test) | 99.4 | 6 |
| Compute Sqrt | Formal-language benchmark lengths 41-500 (test) | 57.8 | 6 |
| Binary Multiplication | Formal-language benchmark lengths 41-500 (test) | 58.5 | 6 |
| Binary Addition | Formal-language benchmark lengths 41-500 (test) | 100 | 6 |
| Odds First | Formal-language benchmark lengths 41-500 (test) | 100 | 6 |
| Missing Duplicate | Formal-language benchmark lengths 41-500 (test) | 100 | 6 |
| Duplicate String | Formal-language benchmark lengths 41-500 (test) | 100 | 6 |
| Solve Equation | Formal-language benchmark lengths 41-500 (test) | 67.8 | 6 |
| Modular Arithmetic | Formal-language benchmark lengths 41-500 (test) | 96.1 | 6 |
| Reverse String | Formal-language benchmark lengths 41-500 (test) | 100 | 6 |
| Stack Manipulation | Formal-language benchmark lengths 41-500 (test) | 100 | 6 |
| Cycle Navigation | Formal-language benchmark lengths 41-500 (test) | 100 | 6 |
| Parity Check | Formal-language benchmark lengths 41-500 (test) | 100 | 6 |
| Modular Arithmetic (Simple) | Formal-language benchmark lengths 41-500 (test) | 100 | 6 |
| Even Pairs | Formal-language benchmark lengths 41-500 (test) | 100 | 6 |