Multi-turn conversation on ConvBench 1.0 (test)

Top result: 39.51 R1 (Pairwise), GPT-4V

[Chart: R1 (Pairwise) by date; x-axis label: Apr 25, 2024]
Evaluation Results

Method    Links

2024.04  39.51  38.47  38.47  39.34  37.61  40.55   7.09    7.3    7.3   7.48   7.12   6.88
2024.04   36.6  37.49  38.99  39.17  34.32   35.7   6.54   6.75   6.53   7.04   6.68   6.32
2024.04   25.6  24.67  25.13  27.56  21.32  26.52   6.78   6.86   6.93   7.25   6.41    6.7
2024.04  21.17  22.41  24.96  21.31  20.97  19.93   5.49   5.69    5.8   5.88   5.39   5.29
2024.04  17.65  20.22     26  17.33  17.33  15.08    5.6   5.76   6.11   5.93   5.25   5.43
2024.04  17.56  17.45  17.85  18.72  15.77  17.68   4.85   5.03   5.16   5.06   4.86   4.67
2024.04  16.93  18.08  20.45  18.02  15.77  15.77   4.94   5.14   5.03   5.41   4.99   4.74
2024.04  15.83  16.41  17.16  19.06     13  15.25   5.82   5.98   5.98   6.17   5.78   5.66
2024.04  14.93  15.83   17.5  17.16  12.82  14.04   5.04   5.17   4.98   5.38   5.14   4.91
2024.04  14.33  14.62  16.29  18.37   9.19  14.04   5.54   5.65   5.96   5.22      -   5.43
2024.04  10.95   10.8  11.61  11.27   9.53  11.09   3.85   4.04   3.99    4.4   3.73   3.66
2024.04   9.04   9.59   8.84  10.92   9.01   8.49   4.77   4.91   4.77   5.47   4.48   4.64
2024.04   8.44   8.55   9.01   9.36   7.28   8.32   4.42    4.6   5.18   4.95   3.66   4.24