
Sabiá-4 Technical Report

About

This technical report presents Sabiá-4 and Sabiazinho-4, a new generation of Portuguese language models focused on Brazilian Portuguese. The models were developed through a four-stage training pipeline: continued pre-training on Portuguese and Brazilian legal corpora; long-context extension to 128K tokens; supervised fine-tuning on instruction data spanning chat, code, legal tasks, and function calling; and preference alignment. We evaluate the models on six benchmark categories: conversational capabilities in Brazilian Portuguese, knowledge of Brazilian legislation, long-context understanding, instruction following, standardized exams, and agentic capabilities, including tool use and web navigation. Results show that Sabiá-4 and Sabiazinho-4 achieve a favorable cost-performance trade-off compared to other models, positioning them in the upper-left region of the pricing-accuracy chart. The models also improve over previous generations in legal document drafting, multi-turn dialogue quality, and agentic task completion.

Thiago Laitz, Thales Sales Almeida, Hugo Abonizio, Roseval Malaquias Junior, Giovana Kerche Bonás, Marcos Piau, Celio Larcher, Ramon Pires, Rodrigo Nogueira • 2026

Related benchmarks

Task                                 Dataset              Result              Rank
Multiple-choice Question Answering   EXAMS                Accuracy 86.6       29
Legal Knowledge Assessment           Laws                 Accuracy 97.4       15
Conversation Evaluation              Braceval             Accuracy 66         15
Lawyer Evaluation                    OAB-Bench            Score 7.49          15
Judge Evaluation                     Magis-Bench          Score 5.08          15
Agentic Task Performance             Agent Capabilities   Success Rate 72.2   15
Instruction Following                Multi-IF PT          Accuracy 82         15