Task-solving Performance on BIG-Bench Hard (test)

Boolean Expressions: 84

EvoPrompt(DE)-OPTS(US)

52.860.96977.1 Mar 3, 2025

Evaluation Results

Method  Links
2025.03
8440.1518.6747.336.1742.53359.524.6713.1771.834850.3381.678541.6745.6764.8351.1765.176.9460.3368.1754.3379.179546.1753.87
2025.03
82.545.520.1742.57.543.535.8360.537.171378.8354.3352.584.6785.8340.9749.567.6751.1764.0678.456669.3349.581.679545.3355.67
2025.03
79.8342.3419.1738.336.544.8333702910.1770.1749.552.1780.1785.6728.475068.6746.558.5961.6265.8366.1752.577.59545.8352.87
2025.03
74.540.3917.17306.6740.673654.6714.675.8345.8349.553.583.1785.8322.575327.1746.1755.4761.4565.8366.55278.679545.548.43
2025.03
67.50322.50024.53.53.5320.505238.527.58.336.565.56.50061819.580.5424120.73
2025.03
542.191419.56.529.516.553125.54440.553.584.587.526.042118.512.524.2251.525767.547.580.593.544.539.52