AI4Bharat Sangraha

Benchmarks

| Task Name | Dataset Name | SOTA Result: Token Count (M) | Trend |
| --- | --- | --- | --- |
| Tokenization | AI4Bharat Sangraha Total Indic Corpus | 6,623 | 3 |
| Tokenization | AI4Bharat Sangraha Telugu | 599 | 3 |
| Tokenization | AI4Bharat Sangraha Tamil | 684 | 3 |
| Tokenization | AI4Bharat Sangraha Punjabi | 210 | 3 |
| Tokenization | AI4Bharat Sangraha Odia | 228 | 3 |
| Tokenization | AI4Bharat Sangraha Marathi | 529 | 3 |
| Tokenization | AI4Bharat Sangraha Malayalam | 518 | 3 |
| Tokenization | AI4Bharat Sangraha Kannada | 313 | 3 |
| Tokenization | AI4Bharat Sangraha Hindi | 1,231 | 3 |
| Tokenization | AI4Bharat Sangraha Gujarati | 605 | 3 |
| Tokenization | AI4Bharat Sangraha Bengali | 1,638 | 3 |
| Tokenization | AI4Bharat Sangraha Assamese | 100 | 3 |