The Necessity of Imperfection: Reversing Model Collapse via Simulating Cognitive Boundedness

About

Although synthetic data is widely promoted as a remedy, its prevailing production paradigm -- one optimizing for statistical smoothness -- systematically removes the long-tail, cognitively grounded irregularities that characterize human text. Prolonged training on such statistically optimal but cognitively impoverished data accelerates model collapse. This paper proposes a paradigm shift: instead of imitating the surface properties of data, we simulate the cognitive processes that generate human text. We introduce the Prompt-driven Cognitive Computing Framework (PMCSF), whose core consists of a Cognitive State Decoder (CSD) that reverse-engineers unstructured text into structured cognitive vectors, and a Cognitive Text Encoder (CTE) that re-materializes these states into text enriched with human-typical imperfections via mathematically defined Cognitive Perturbation Operators. The framework is validated through a two-stage objective evaluation pipeline. First, in cognitive codec verification, CTE text yields a Jensen-Shannon divergence of 0.0614 from human text (vs. 0.4431 for standard LLM output), passes double-blind professional media review, and achieves an intraclass correlation coefficient ICC > 0.9 for cognitive profile alignment across heterogeneous models. Second, in functional gain evaluation, isomorphic stress tests in the A-share market show that strategies incorporating CTE-generated data reduce maximum drawdown by 47.4% during the 2015 crash and deliver 8.6% Defensive Alpha, exceeding transaction costs by a factor of 33. Our findings demonstrate that modelling human cognitive limitations -- not copying surface data -- enables synthetic data with genuine functional gain, offering a viable technical pathway toward resolving the AI data-collapse crisis.
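The Jensen-Shannon divergence figures quoted above can be made concrete with a minimal sketch. It assumes the comparison is made over sentence-length distributions (suggested by the "Sentence Length SD (JSD)" benchmark entry) and uses base-2 logarithms so the value lies in [0, 1]. The helper names `sentence_lengths`, `distribution`, and `jsd` are hypothetical, and the naive sentence splitter is an illustration only; this is not the paper's actual measurement pipeline.

```python
import math
import re
from collections import Counter


def sentence_lengths(text):
    """Naive splitter (assumption): break on ., !, ? and count words per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def distribution(values, support):
    """Empirical probability of each value in `support`."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in support]


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def text_jsd(text_a, text_b):
    """JSD between the sentence-length distributions of two texts."""
    a, b = sentence_lengths(text_a), sentence_lengths(text_b)
    support = sorted(set(a) | set(b))
    return jsd(distribution(a, support), distribution(b, support))
```

Identical distributions give 0 and distributions with disjoint support give 1, so a lower value means the generated text's sentence-length profile is closer to the human reference.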

Zhongjie Jiang • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Financial Performance Evaluation | Bull Market Rally 2024 | Net Return | 17.92 | 4
Movie Review Generation | Movie Reviews Generalization Verification | Avg Sentence Length | 0.1444 | 2
Statistical Fingerprint Verification | Biber's multidimensional analysis framework Human reference data | Sentence Length SD (JSD) | 0.0614 | 2
Strategy Performance Evaluation | 2015 Market Crash Bear Market 1.0 (OOS) | Max Drawdown | 12.2 | 2
Trading Performance Evaluation | Stock Market Crash 2015 (test) | Max Drawdown | 0.122 | 2
Financial Signal and Performance Analysis | 2015 Market Crash N=23 1.0 (test) | Signal Clarity (H) | 0.765 | 2
Market Signal Quantification | Bull Market 2024 (N=10) | Signal Clarity (H) | 0.802 | 2
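For readers unfamiliar with the Max Drawdown metric listed above, the following is a minimal sketch of the standard definition: the largest peak-to-trough decline of an equity curve, expressed as a fraction of the peak. The function names and the toy equity values are hypothetical and are not taken from the paper's backtests. The `relative_reduction` helper shows how a statement such as "reduces maximum drawdown by X%" is usually computed against a baseline strategy.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline as a fraction of the running peak.

    `equity` is a sequence of positive portfolio values in time order.
    """
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                      # highest value seen so far
        worst = max(worst, (peak - value) / peak)    # decline from that peak
    return worst


def relative_reduction(baseline_dd, strategy_dd):
    """Fractional reduction of drawdown relative to a baseline."""
    return (baseline_dd - strategy_dd) / baseline_dd


# Toy example (made-up numbers): peak of 120 followed by a trough of 90.
toy_curve = [100, 120, 90, 110]
toy_dd = max_drawdown(toy_curve)  # (120 - 90) / 120
```

The function returns a fraction; multiply by 100 to express it as a percentage.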