The Necessity of Imperfection: Reversing Model Collapse via Simulating Cognitive Boundedness

About

Although synthetic data is widely promoted as a remedy, its prevailing production paradigm, which optimizes for statistical smoothness, systematically removes the long-tail, cognitively grounded irregularities that characterize human text. Prolonged training on such statistically optimal but cognitively impoverished data accelerates model collapse. This paper proposes a paradigm shift: instead of imitating the surface properties of data, we simulate the cognitive processes that generate human text. We introduce the Prompt-driven Cognitive Computing Framework (PMCSF). Its core consists of a Cognitive State Decoder (CSD), which reverse-engineers unstructured text into structured cognitive vectors, and a Cognitive Text Encoder (CTE), which re-materializes these states as text enriched with human-typical imperfections via mathematically defined Cognitive Perturbation Operators. The framework is validated through a two-stage objective evaluation pipeline. First, in cognitive codec verification, CTE text yields a Jensen-Shannon divergence (JSD) of 0.0614 from human text (vs. 0.4431 for standard LLM output), passes double-blind professional media review, and achieves an intraclass correlation coefficient (ICC) above 0.9 for cognitive profile alignment across heterogeneous models. Second, in functional gain evaluation, isomorphic stress tests on the Chinese A-share market show that strategies incorporating CTE-generated data reduce maximum drawdown by 47.4% during the 2015 crash and deliver 8.6% Defensive Alpha, exceeding transaction costs by a factor of 33. Our findings demonstrate that modelling human cognitive limitations, rather than copying surface data, enables synthetic data with genuine functional gain, offering a viable technical pathway toward resolving the AI data-collapse crisis.
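The abstract compares CTE text and standard LLM output to human text by Jensen-Shannon divergence, and the benchmark table suggests the comparison is made on sentence-length statistics. The sketch below is a minimal illustration under stated assumptions, not the paper's pipeline: sentences are split naively on `.`, `!` and `?`, lengths are counted in whitespace tokens, the histogram uses hypothetical 5-word bins, and the log is base 2 (so JSD lies in [0, 1]). All function names are hypothetical.

```python
import math
import re


def sentence_lengths(text: str) -> list[int]:
    """Word counts per sentence, using a naive split on ., ! and ? (an assumption)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def length_histogram(lengths: list[int], bin_width: int = 5, max_len: int = 60) -> list[float]:
    """Normalized histogram of sentence lengths; the last bin collects lengths >= max_len.

    bin_width and max_len are hypothetical choices, not values from the paper.
    """
    n_bins = max_len // bin_width + 1
    counts = [0] * n_bins
    for n in lengths:
        counts[min(n // bin_width, n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no sentences found")
    return [c / total for c in counts]


def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p: list[float], q: list[float]) -> float:
    """JSD in bits: symmetric, and bounded by [0, 1] with a base-2 log."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture; m_i > 0 wherever p_i or q_i > 0
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    # Hypothetical samples, for illustration only.
    human = "I think so. Maybe not, though, if the market turns. Hard to say!"
    model = "The market is expected to remain stable. Investors should remain cautious."
    p = length_histogram(sentence_lengths(human))
    q = length_histogram(sentence_lengths(model))
    print(f"JSD(human, model) = {jensen_shannon(p, q):.4f}")
```

A lower value means the two length distributions are closer; the reported 0.0614 versus 0.4431 is a comparison of this kind, though the paper's exact features, binning, and log base may differ from these assumptions.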

Zhongjie Jiang · 2025

Related benchmarks

Task	Dataset	Metric	Value
Financial Performance Evaluation	Bull Market Rally 2024	Net Return	17.92	4
Movie Review Generation	Movie Reviews Generalization Verification	Avg Sentence Length	0.1444	2
Statistical Fingerprint Verification	Biber's multidimensional analysis framework Human reference data	Sentence Length SD (JSD)	0.0614	2
Strategy Performance Evaluation	2015 Market Crash Bear Market 1.0 (OOS)	Max Drawdown	12.2	2
Trading Performance Evaluation	Stock Market Crash 2015 (test)	Max Drawdown	0.122	2
Financial Signal and Performance Analysis	2015 Market Crash N=23 1.0 (test)	Signal Clarity (H)	0.765	2
Market Signal Quantification	Bull Market 2024 (N=10)	Signal Clarity (H)	0.802	2
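Two of the rows report maximum drawdown for the 2015 crash, once as 12.2 and once as 0.122, which plausibly reads as the same quantity in percent and as a fraction; the abstract also claims a 47.4% drawdown reduction. The sketch below shows the standard peak-to-trough definition and a relative-reduction helper, assuming an equity curve of strictly positive portfolio values. The paper's baseline strategy, sampling frequency, and cost model are not given here, and the curves in the usage block are hypothetical.

```python
def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak.

    Assumes a non-empty series of strictly positive portfolio values.
    """
    if not equity:
        raise ValueError("equity curve must be non-empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # highest value seen so far
        worst = max(worst, (peak - value) / peak)  # decline from that peak, kept if largest
    return worst


def relative_reduction(baseline: float, candidate: float) -> float:
    """Fractional reduction of `candidate` versus `baseline` (e.g. 0.474 for 47.4%)."""
    return (baseline - candidate) / baseline


if __name__ == "__main__":
    baseline_curve = [100, 120, 90, 130, 117]   # hypothetical values
    candidate_curve = [100, 110, 99, 115, 112]  # hypothetical values
    b = max_drawdown(baseline_curve)
    c = max_drawdown(candidate_curve)
    print(f"baseline MDD = {b:.1%}, candidate MDD = {c:.1%}, "
          f"reduction = {relative_reduction(b, c):.1%}")
```

Under this definition, a 47.4% reduction would mean the candidate's drawdown is 0.526 times the baseline's; whether the paper measures the reduction relatively in this way is an assumption.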
