The Necessity of Imperfection: Reversing Model Collapse via Simulating Cognitive Boundedness
About
Although synthetic data is widely promoted as a remedy, its prevailing production paradigm -- one optimizing for statistical smoothness -- systematically removes the long-tail, cognitively grounded irregularities that characterize human text. Prolonged training on such statistically optimal but cognitively impoverished data accelerates model collapse. This paper proposes a paradigm shift: instead of imitating the surface properties of data, we simulate the cognitive processes that generate human text. We introduce the Prompt-driven Cognitive Computing Framework (PMCSF), whose core consists of a Cognitive State Decoder (CSD) that reverse-engineers unstructured text into structured cognitive vectors, and a Cognitive Text Encoder (CTE) that re-materializes these states into text enriched with human-typical imperfections via mathematically defined Cognitive Perturbation Operators. The framework is validated through a two-stage objective evaluation pipeline. First, in cognitive codec verification, CTE text yields a Jensen-Shannon divergence of 0.0614 from human text (vs. 0.4431 for standard LLM output), passes double-blind professional media review, and achieves an intraclass correlation coefficient ICC > 0.9 for cognitive profile alignment across heterogeneous models. Second, in functional gain evaluation, isomorphic stress tests in the A-share market show that strategies incorporating CTE-generated data reduce maximum drawdown by 47.4% during the 2015 crash and deliver 8.6% Defensive Alpha, exceeding transaction costs by a factor of 33. Our findings demonstrate that modelling human cognitive limitations -- not copying surface data -- enables synthetic data with genuine functional gain, offering a viable technical pathway toward resolving the AI data-collapse crisis.
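To make the Jensen-Shannon divergence figure more concrete, the following is a minimal sketch of how such a divergence between two text corpora could be computed. The abstract does not specify the feature, binning, or logarithm base used, so everything here is an assumption for illustration: sentence lengths in words as the feature, fixed-width bins, and base-2 logarithms (which bound the divergence to the range 0 to 1). The function names `length_histogram`, `kl_divergence`, and `js_divergence` are hypothetical and not taken from the paper.

```python
import math


def length_histogram(lengths, bin_width=5, n_bins=12):
    """Turn a non-empty list of sentence lengths into a normalized histogram.

    Assumption: fixed-width bins, with every length past the last bin
    edge folded into the final bin.
    """
    counts = [0] * n_bins
    for length in lengths:
        index = min(int(length // bin_width), n_bins - 1)
        counts[index] += 1
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: the mean KL divergence of p and q from their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    # Hypothetical sentence lengths (in words), not data from the paper.
    human_lengths = [4, 9, 31, 12, 7, 22, 3, 15, 48, 11]
    synthetic_lengths = [14, 16, 15, 17, 13, 15, 16, 14, 15, 16]
    p = length_histogram(human_lengths)
    q = length_histogram(synthetic_lengths)
    print(f"JSD = {js_divergence(p, q):.4f}")
```

Under these assumptions, a value near 0 means the two length distributions are nearly indistinguishable, while a value near 1 means they barely overlap. Because the mixture `m` is nonzero wherever `p` or `q` is nonzero, the divergence stays finite even when one corpus has empty bins.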
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Financial Performance Evaluation | Bull Market Rally 2024 | Net Return | 17.92 | 4 |
| Movie Review Generation | Movie Reviews Generalization Verification | Avg Sentence Length | 0.1444 | 2 |
| Statistical Fingerprint Verification | Biber's multidimensional analysis framework Human reference data | Sentence Length SD (JSD) | 0.0614 | 2 |
| Strategy Performance Evaluation | 2015 Market Crash Bear Market 1.0 (OOS) | Max Drawdown | 12.2 | 2 |
| Trading Performance Evaluation | Stock Market Crash 2015 (test) | Max Drawdown | 0.122 | 2 |
| Financial Signal and Performance Analysis | 2015 Market Crash N=23 1.0 (test) | Signal Clarity (H) | 0.765 | 2 |
| Market Signal Quantification | Bull Market 2024 (N=10) | Signal Clarity (H) | 0.802 | 2 |
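
For readers unfamiliar with the Max Drawdown metric reported above, here is a minimal sketch of the standard definition: the largest peak-to-trough decline of an equity curve, expressed as a fraction of the running peak. This is the conventional formula, not necessarily the exact computation used in the paper, and the function name `max_drawdown` and the sample values are hypothetical.

```python
def max_drawdown(equity):
    """Return the largest fractional decline from a running peak.

    `equity` is a non-empty sequence of positive portfolio values
    ordered in time.
    """
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)  # highest value seen so far
        worst = max(worst, (peak - value) / peak)  # decline from that peak
    return worst


if __name__ == "__main__":
    # Hypothetical equity curve, not data from the paper.
    curve = [100.0, 120.0, 90.0, 110.0, 80.0]
    print(f"max drawdown = {max_drawdown(curve):.2%}")
```

The table lists this metric as both 12.2 and 0.122, which would be consistent with the same quantity written once as a percentage and once as a fraction, although the source does not say so.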