Responsible Federated LLMs via Safety Filtering and Constitutional AI
About
Recent research has increasingly focused on training large language models (LLMs) using federated learning, known as FedLLM. However, responsible AI (RAI), which aims to ensure safe and trustworthy responses, remains underexplored in this context. In FedLLM, client-side training data may contain harmful content, resulting in unsafe LLMs that can generate inappropriate responses. Aggregating such models into a global model and redistributing it to clients risks the widespread deployment of unsafe LLMs. To address this, we incorporate two well-established RAI techniques into FedLLM: safety filtering and constitutional AI. Our experiments show that these methods significantly improve LLM safety, achieving over 20% improvement on AdvBench.
Eunchung Noh, Jeonghun Baek • 2025
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Safety Evaluation | AdvBench | -- | 117 |
| Helpfulness Evaluation | MTBench | Helpfulness: 6.1 | 18 |
| Safety Evaluation | HHH | HHH Score: 63.9 | 10 |