Mental State Inference on MMToM-QA human 1.0 (test)

100  Sub-score 1.1

GPT-4

0.1626.085277.92  Jan 16, 2024
Updated 3mo ago

Evaluation Results

Method  Links
2024.01    100   50.9   65.3   36.7   31.5     64     40     40   13.3   53.3
2024.01    100   56.7     70   43.4   31.6     80   46.7     40     20   66.7
2024.01    100   53.3     70   36.7   78.9     44     40   73.3      0   33.3
2024.01   93.8   76.6   78.1     75   52.6     88   73.3   86.7   73.3   66.7
2024.01   93.8   45.7   58.1   33.3   52.6     28   53.3      0   33.3   46.7
2024.01   93.4   48.9   52.7     45   36.8     28   33.3   73.3   13.3     60
2024.01   87.5     77   85.7   68.3   73.7     96     60   86.7     60   66.7
2024.01     75   53.6   57.2     50   52.6     44   46.7     40     60   53.3
2024.01     75   62.6   71.9   53.3   52.6     88     40   73.3   33.3   66.7
2024.01     75   67.9   75.9     60   52.6    100   66.7     80   53.3     40
2024.01     75   63.1   69.6   56.7   57.9     76     60   53.3     60   53.3
2024.01   62.5   45.6   49.5   41.7   42.1     44   33.3   53.3   26.7   53.3
2024.01   62.5   59.2   60.1   58.3   57.9     60   46.7   53.3   73.3     60
2024.01   56.3   44.8   41.2   48.3   47.4     20   33.3     40     60     60
2024.01     50   36.9   40.5   33.3   47.4     24   33.3   33.3     20   46.7
2024.01   43.8   49.1   49.9   48.3   57.9     48   53.3   46.7     40   53.3
2024.01   43.8   49.1   48.1     50   52.6     48   46.7   46.7   46.7     60
2024.01   31.3   45.2     42   48.3   26.8     68   53.3     60     40     40
2024.01     24   50.9   60.1   41.7   56.3    100     40   33.3     40   53.3
2024.01      4   39.6   40.9   38.3   18.8    100   46.7   26.7     40     40