Approaches to Semantic Textual Similarity in Slovak Language: From Algorithms to Transformers

About

Semantic textual similarity (STS) plays a crucial role in many natural language processing tasks. While extensively studied in high-resource languages, STS remains challenging for under-resourced languages such as Slovak. This paper presents a comparative evaluation of sentence-level STS methods applied to Slovak, including traditional algorithms, supervised machine learning models, and third-party deep learning tools. We trained several machine learning models using the outputs of traditional algorithms as features, with feature selection and hyperparameter tuning jointly guided by artificial bee colony optimization. Finally, we evaluated several third-party tools, including a fine-tuned model by CloudNLP, OpenAI's embedding models, the GPT-4 model, and the pretrained SlovakBERT model. Our findings highlight the trade-offs between the different approaches.
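As a rough illustration of "outputs from traditional algorithms as features", the sketch below turns a sentence pair into a small feature vector. The abstract does not name the specific algorithms used, so token-level Jaccard overlap and a character-level ratio from Python's `difflib` are stand-ins chosen for this example; the function names and the sample sentences are hypothetical as well.

```python
from difflib import SequenceMatcher


def tokenize(sentence):
    # Naive whitespace tokenizer; a real pipeline would likely normalize
    # punctuation and lemmatize Slovak word forms (assumption).
    return sentence.lower().split()


def jaccard(a, b):
    # Token-set overlap: |A intersect B| / |A union B|.
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def char_ratio(a, b):
    # Character-level similarity in [0, 1] from difflib's matching blocks.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_features(a, b):
    # One feature per traditional algorithm; a supervised regressor could
    # then be trained on such vectors against human similarity scores.
    return [jaccard(a, b), char_ratio(a, b)]


# Illustrative sentence pair, not taken from the paper's datasets.
pair = ("Pes beží po lúke.", "Pes behá po lúke.")
features = similarity_features(*pair)
```

Each additional traditional measure would simply add another entry to the vector, which is where a feature-selection step such as the one described above would come in.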

Lukas Radosky, Miroslav Blstak, Matej Krajcovic, Ivan Polasek • 2026

Related benchmarks

Task	Dataset	Metric	Result	Rank
Semantic Textual Similarity	STS Benchmark Slovak (val)	Pearson Correlation	0.685	33
Semantic Textual Similarity	SICK Slovak (val)	Pearson Correlation	0.702	33
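The results above are reported as Pearson correlation between predicted and reference similarity scores. A minimal sketch of how that metric is computed follows; the `gold` and `pred` values are made up for illustration and are not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalized because the factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical human scores and model predictions for five sentence pairs.
gold = [0.0, 1.5, 3.0, 4.2, 5.0]
pred = [0.4, 1.0, 2.5, 4.8, 4.6]
r = pearson(gold, pred)
```

A value of 1.0 means the predictions rise and fall in perfect linear agreement with the reference scores; 0 means no linear relationship.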
