CienaLLM: Generative Climate-Impact Extraction from News Articles with Autoregressive LLMs

About

Understanding and monitoring the socio-economic impacts of climate hazards requires extracting structured information from heterogeneous news articles at large scale. To that end, we have developed CienaLLM, a modular framework based on schema-guided Generative Information Extraction. CienaLLM uses open-weight Large Language Models for zero-shot information extraction from news articles, and supports configurable prompts and output schemas, multi-step pipelines, and cloud or on-premise inference. To systematically assess how the choice of LLM family, size, precision regime, and prompting strategy affects performance, we run a large factorial study across models, precisions, and prompt engineering techniques. An additional response parsing step nearly eliminates format errors while preserving accuracy; larger models deliver the strongest and most stable performance, while quantization offers substantial efficiency gains with modest accuracy trade-offs; and prompt strategies show heterogeneous, model-specific effects. CienaLLM matches or outperforms the supervised baseline in accuracy for extracting drought impacts from Spanish news, although at a higher inference cost. Although evaluated on droughts, the schema-driven and model-agnostic design can be adapted to related information extraction tasks (e.g., other hazards, sectors, or languages) by editing prompts and schemas rather than retraining. We release code, configurations, and schemas to support reproducible use.
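The schema-guided prompting and response parsing described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the CienaLLM code base: the field names in `SCHEMA`, the helpers `build_prompt` and `parse_response`, and the restriction to a flat (non-nested) JSON object.

```python
import json
import re

# Hypothetical output schema: field name -> expected Python type.
# The field names are illustrative, not CienaLLM's actual schema.
SCHEMA = {"relevant": bool, "agriculture": bool, "livestock": bool}


def build_prompt(article: str, schema: dict) -> str:
    """Compose a zero-shot prompt that asks for JSON matching the schema."""
    fields = ", ".join(f'"{name}": {typ.__name__}' for name, typ in schema.items())
    return (
        "Read the news article and answer with a single JSON object "
        f"with exactly these fields: {{{fields}}}.\n\nArticle:\n{article}"
    )


def parse_response(text: str, schema: dict):
    """Return the first JSON object in `text` that fits the schema, else None."""
    # Models often wrap JSON in prose or code fences, so scan every
    # brace-delimited span. The non-greedy match assumes a flat object.
    for match in re.finditer(r"\{.*?\}", text, flags=re.DOTALL):
        try:
            data = json.loads(match.group(0))
        except json.JSONDecodeError:
            continue  # not valid JSON, try the next candidate span
        if not isinstance(data, dict) or set(data) != set(schema):
            continue  # wrong shape or wrong set of fields
        if all(isinstance(data[name], typ) for name, typ in schema.items()):
            return data
    return None


if __name__ == "__main__":
    raw = 'Sure, here is the result: {"relevant": true, "agriculture": true, "livestock": false}'
    print(parse_response(raw, SCHEMA))
```

Under these assumptions, adapting the extractor to another hazard or sector means editing `SCHEMA` and the prompt text rather than retraining a model, and a response that fails validation returns `None` so the caller can retry or log a format error.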

Javier Vela-Tambo, Jorge Gracia, Fernando Dominguez-Castro • 2025

Related benchmarks

Task	Dataset	Result
Drought Relevance Classification	DRD (test)	Accuracy 96.5	4
Drought Impact Extraction	E2E dataset	Accuracy 74.2	4
Drought Impact Extraction	DID (test)	Accuracy 68.4	3
