Enhancing Interpretable Clauses Semantically using Pretrained Word Representation

About

The Tsetlin Machine (TM) is an interpretable pattern recognition algorithm based on propositional logic that has demonstrated competitive performance in many Natural Language Processing (NLP) tasks, including sentiment analysis, text classification, and word sense disambiguation. To obtain human-level interpretability, the legacy TM employs Boolean input features such as bag-of-words (BOW). However, the BOW representation makes it difficult to use pre-trained information, such as word2vec and GloVe word representations. This restriction has limited the performance of the TM relative to deep neural networks (DNNs) in NLP. To narrow this performance gap, in this paper we propose a novel way of using pre-trained word representations in the TM. We extract semantically related words from pre-trained word representations and use them as additional input features to the TM, which significantly enhances both its performance and its interpretability. Our experiments show that the accuracy of the proposed approach is significantly higher than that of the previous BOW-based TM, reaching the level of DNN-based models.
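The feature-enrichment idea above can be illustrated with a small sketch. It assumes that "semantically related words" means the top-k nearest neighbours of each input word by cosine similarity, filtered by a similarity threshold; the toy 3-dimensional vectors, the parameters `k` and `min_sim`, and the helper names (`similar_words`, `boolean_features`, `clause`) are all hypothetical and are not the authors' code. The `clause` function is a simplified, hand-written TM-style conjunction of literals, not a trained Tsetlin Machine.

```python
import math

# Hypothetical 3-d toy vectors standing in for pre-trained embeddings such as
# GloVe or word2vec; real vectors have hundreds of dimensions.
EMBEDDINGS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.85, 0.15, 0.05],
    "excellent": [0.8, 0.2, 0.1],
    "bad": [-0.9, 0.1, 0.0],
    "awful": [-0.85, 0.2, 0.05],
    "movie": [0.0, 0.1, 0.95],
}
VOCAB = sorted(EMBEDDINGS)  # fixed feature order for the Boolean vector
INDEX = {w: i for i, w in enumerate(VOCAB)}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similar_words(word, k=2, min_sim=0.5):
    """Up to k vocabulary words closest to `word`, keeping only those whose
    cosine similarity is at least min_sim (hypothetical selection rule)."""
    if word not in EMBEDDINGS:
        return []
    scored = sorted(
        ((cosine(EMBEDDINGS[word], EMBEDDINGS[w]), w) for w in VOCAB if w != word),
        reverse=True,
    )
    return [w for sim, w in scored[:k] if sim >= min_sim]


def boolean_features(tokens, k=2, min_sim=0.5):
    """Boolean bag-of-words over VOCAB; each in-vocabulary token also switches
    on its semantically similar neighbours (k=0 gives plain BOW)."""
    present = {t for t in tokens if t in INDEX}
    for t in list(present):  # iterate over a copy while extending the set
        present.update(similar_words(t, k, min_sim))
    return [w in present for w in VOCAB]


def clause(features, include=(), exclude=()):
    """TM-style conjunctive clause: true only if every included word is present
    and every excluded (negated) word is absent."""
    return all(features[INDEX[w]] for w in include) and not any(
        features[INDEX[w]] for w in exclude
    )


review = ["great", "movie"]
plain = boolean_features(review, k=0)
expanded = boolean_features(review, k=2)
print(plain)     # [False, False, False, False, True, True]
print(expanded)  # [False, False, True, True, True, True]
# A clause built around "good" fires on the enriched input only.
print(clause(plain, include=["good"], exclude=["bad"]))     # False
print(clause(expanded, include=["good"], exclude=["bad"]))  # True
```

In this toy setting, a review containing "great" also activates the related features "good" and "excellent", so a clause that only mentions "good" still matches, while the clause itself remains a readable conjunction of words. The threshold keeps weakly related words (here, "movie" as a neighbour of "bad") from being switched on.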

Rohan Kumar Yadav, Lei Jiao, Ole-Christoffer Granmo, Morten Goodwin · 2021

Related benchmarks

Task	Dataset	Result
Question Classification	TREC	Accuracy 90.04	262
Topic Classification	AG-News	Accuracy 90.12	225
Text Classification	MR	Accuracy 77.51	174
Text Classification	R8	Accuracy 97.5	91
Text Classification	R52	Accuracy 89.14	76
Sentiment Analysis	IMDB	Accuracy 90.88	73
Sentiment Analysis	SST2	Accuracy 76.38	47
Biomedical Text Classification	HOC	micro-F1 78.78	8
