EVE: A Domain-Specific LLM Framework for Earth Intelligence
About
We introduce Earth Virtual Expert (EVE), the first open-source, end-to-end initiative for developing and deploying domain-specialized LLMs for Earth Intelligence. At its core is EVE-Instruct, a domain-adapted 24B-parameter model built on Mistral Small 3.2 and optimized for reasoning and question answering. On newly constructed Earth Observation (EO) and Earth Sciences benchmarks, it outperforms comparable models while preserving general capabilities. We release curated training corpora and the first systematic domain-specific evaluation benchmarks, covering multiple-choice QA (MCQA), open-ended QA, and factuality. EVE further integrates retrieval-augmented generation (RAG) and a hallucination-detection pipeline into a production system, deployed via API and GUI and serving 350 pilot users to date. All models, datasets, and code will be released under open licenses as contributions to the field at huggingface.co/eve-esa and github.com/eve-esa.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multiple-Choice Question Answering (Single) | Earth Observation | Accuracy | 96.35 | 7 |
| Hallucination Detection | Earth Observation | F1 Score | 84.7 | 7 |
| Multiple-Choice Question Answering (Multiple) | Earth Observation | IoU | 86.12 | 7 |
| Open-Ended Question Answering | Earth Observation | Judge Score | 96.4 | 7 |
| Open-Ended Question Answering (with Context) | Earth Observation | Judge Score | 78.28 | 7 |
| Hallucination Detection | EO and Earth Sciences Hallucination | F1 Score | 84.7 | 5 |
| Multiple-Choice Question Answering | EO and Earth Sciences MCQA Multiple | IoU | 86.12 | 5 |
| Multiple-Choice Question Answering | EO and Earth Sciences MCQA Single | Accuracy | 96.35 | 5 |
| Open-Ended Question Answering | EO and Earth Sciences Open-Ended QA | Judge Score | 96.4 | 5 |
| Overall Performance Ranking | EO and Earth Sciences Combined Benchmarks | Rank | 1.33 | 5 |
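The multiple-answer MCQA rows above report IoU, i.e. the overlap between the predicted and gold option sets. The benchmark's exact scoring code is not shown here, so the following is a minimal sketch of the usual set-IoU (Jaccard) formulation averaged over questions; the function names and the example predictions are illustrative assumptions, not the released evaluation code.

```python
def option_set_iou(predicted: set[str], gold: set[str]) -> float:
    """IoU (Jaccard index) between predicted and gold option sets."""
    if not predicted and not gold:
        return 1.0  # both empty: treat as perfect agreement
    return len(predicted & gold) / len(predicted | gold)

def mean_iou(examples: list[tuple[set[str], set[str]]]) -> float:
    """Average per-question IoU over (predicted, gold) pairs."""
    return sum(option_set_iou(p, g) for p, g in examples) / len(examples)

# Hypothetical predictions for three multi-answer questions
examples = [
    ({"A", "C"}, {"A", "C"}),  # exact match -> 1.0
    ({"A"}, {"A", "B"}),       # partial overlap -> 0.5
    ({"B", "D"}, {"A", "C"}),  # disjoint -> 0.0
]
print(round(mean_iou(examples), 2))  # -> 0.5
```

Under this formulation, a reported IoU of 86.12 means the model's selected option sets overlap heavily with the gold sets on average, with partial credit for near-misses rather than the all-or-nothing scoring of single-answer accuracy.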