Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost

About

Today's large language models (LLMs) can solve challenging question-answering tasks, and prompt engineering techniques such as chain-of-thought (CoT) have gained attention for improving the explanation and correctness of outputs. However, many models and techniques tend to produce excessively verbose answers, which hurts both conciseness and generation time. To address this, this paper analyzes the impact of output length on LLM inference pipelines by proposing novel metrics to evaluate the correct conciseness of a model and of related prompting techniques. We then examine the impact of controlling output length through a refined prompt engineering strategy, Constrained-CoT (CCoT), which encourages the model to produce more concise outputs. To better understand the effects of such a prompt, we also introduce two additional scores for analyzing conciseness, measured in terms of redundancy and information flow in the generated answers. Experiments on pretrained LLMs and multiple datasets demonstrate the benefits of the proposed metrics and the effectiveness of CCoT across different models.
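To make the idea of a length-constrained prompt and a conciseness-aware accuracy more concrete, here is a minimal Python sketch. The abstract does not give the prompt wording or the metric definitions, so everything below is an assumption for illustration: `build_ccot_prompt`, `Sample`, and `concise_accuracy` are hypothetical names, the prompt text is one plausible phrasing, and the metric simply counts an answer as a hit only when it is both correct and within a word budget.

```python
from dataclasses import dataclass


def build_ccot_prompt(question: str, max_words: int) -> str:
    """Build a step-by-step prompt with an explicit length limit.

    The wording is a hypothetical phrasing, not quoted from the paper.
    """
    return (
        f"Q: {question}\n"
        f"A: Let's think step by step and limit the answer length to {max_words} words."
    )


@dataclass
class Sample:
    output: str       # model-generated answer text
    is_correct: bool  # whether the final answer matched the reference


def word_count(text: str) -> int:
    """Count whitespace-separated words in a generated answer."""
    return len(text.split())


def concise_accuracy(samples: list[Sample], max_words: int) -> float:
    """Fraction of samples that are correct AND within the word budget.

    This hard-threshold definition is an assumption made for illustration.
    """
    if not samples:
        return 0.0
    hits = sum(
        1 for s in samples
        if s.is_correct and word_count(s.output) <= max_words
    )
    return hits / len(samples)


# Toy data: one short correct answer, one long correct answer, one wrong answer.
samples = [
    Sample("The answer is 4.", True),
    Sample("long " * 50 + "4", True),
    Sample("5", False),
]
score = concise_accuracy(samples, max_words=10)
```

With the toy data above, only the first sample is both correct and within the 10-word budget, so `score` is one third, whereas plain accuracy would be two thirds. A softer variant could penalize over-length answers gradually instead of discarding them.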

Sania Nayab, Giulio Rossolini, Marco Simoni, Andrea Saracino, Giorgio Buttazzo, Nicolamaria Manes, Fabrizio Giacomelli • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multi-discipline Multimodal Understanding | MMMU | Accuracy 58.6 | 363 |
| Visual Mathematical Reasoning | MathVision | Accuracy 22.1 | 254 |
| Mathematical Reasoning | MATH 500 | Accuracy 65.2 | 221 |
| Mathematical Reasoning | MathVision | Accuracy 26.2 | 168 |
| Mathematical Reasoning | GSM8K (test) | Accuracy (ACC) 92.49 | 62 |
| Mathematical Reasoning | Math Benchmarks Average | Accuracy (ACC) 68.59 | 47 |
| Mathematical Reasoning | OlymBench | Accuracy 56.9 | 39 |
| Reasoning | Overall Combined Benchmarks | Accuracy 54.3 | 31 |
| Scientific Reasoning | GPQA D | Accuracy 72.7 | 27 |
| Mathematical Reasoning | MATH 500 | Accuracy 89.2 | 27 |
Showing 10 of 19 rows
