AgentGC: Evolutionary Learning-based Lossless Compression for Genomics Data with LLM-driven Multiple Agent
About
Lossless compression has significantly advanced the storage, sharing, and management of Genomics Data (GD). However, current learning-based methods are non-evolvable and suffer from low-level compression modeling, limited adaptability, and user-unfriendly interfaces. To this end, we propose AgentGC, the first evolutionary agent-based GD compressor, consisting of 3 layers with multiple agents named Leader and Worker. Specifically, 1) the User layer provides a user-friendly interface via the Leader combined with an LLM; 2) the Cognitive layer, driven by the Leader, integrates the LLM to jointly optimize algorithm, dataset, and system, addressing the issues of low-level modeling and limited adaptability; and 3) the Compression layer, headed by the Worker, performs compression and decompression via an automated multi-knowledge learning-based compression framework. On top of AgentGC, we design 3 modes to support diverse scenarios: CP for compression-ratio priority, TP for throughput priority, and BM for balanced mode. Compared with 14 baselines on 9 datasets, the average compression-ratio gains are 16.66%, 16.11%, and 16.33%, and the throughput gains are 4.73x, 9.23x, and 9.15x, respectively.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Lossless Genomics Data Compression | PlFa | Compression Ratio (bits/base) | 1.817 | 18 |
| Lossless Genomics Data Compression | DrMe | Compression Ratio (bits/base) | 1.904 | 18 |
| Lossless Genomics Data Compression | SnSt | Compression Ratio (bits/base) | 1.869 | 18 |
| Lossless Genomics Data Compression | AcSc | Compression Ratio (bits/base) | 1.866 | 18 |
| Lossless Genomics Data Compression | Genomics Dataset Suite Aggregate | Avg Compression Ratio (bits/base) | 1.85 | 18 |
| Lossless Genomics Data Compression | WaMe | Compression Ratio (bits/base) | 1.95 | 18 |
| Lossless Genomics Data Compression | GaGa | Compression Ratio (bits/base) | 1.859 | 18 |
| Lossless Genomics Data Compression | MoGu | Compression Ratio (bits/base) | 1.65 | 18 |
| Lossless Genomics Data Compression | ArTh | Compression Ratio (bits/base) | 1.895 | 18 |
| Lossless Genomics Data Compression | TaGu | Compression Ratio (bits/base) | 1.844 | 18 |
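
The bits/base figures in the table can be read as compressed size in bits divided by the number of bases in the input, so lower is better. The following is a minimal sketch under that assumption, not code from AgentGC: the helper `bits_per_base` and the variable names are hypothetical, and the per-dataset values are copied from the table. It also checks that the aggregate row matches the plain mean of the nine per-dataset results.

```python
from statistics import mean


def bits_per_base(compressed_bytes: int, num_bases: int) -> float:
    """Compressed size in bits divided by the number of bases (assumed definition)."""
    if num_bases <= 0:
        raise ValueError("num_bases must be positive")
    return 8 * compressed_bytes / num_bases


# Per-dataset results (bits/base) as listed in the table above.
per_dataset = {
    "PlFa": 1.817,
    "DrMe": 1.904,
    "SnSt": 1.869,
    "AcSc": 1.866,
    "WaMe": 1.95,
    "GaGa": 1.859,
    "MoGu": 1.65,
    "ArTh": 1.895,
    "TaGu": 1.844,
}

# Unweighted mean across the nine datasets.
suite_average = mean(per_dataset.values())
print(round(suite_average, 2))  # 1.85, matching the aggregate row
```

For reference, an uncompressed four-letter alphabet needs 2 bits/base, so `bits_per_base(250, 1000)` returns `2.0`.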