Linear Classifier: An Often-Forgotten Baseline for Text Classification

Yu-Chen Lin, Si-An Chen, Jie-Jyun Liu, Chih-Jen Lin

arXiv (Cornell University) (2023)

Abstract

Large-scale pre-trained language models such as BERT are popular solutions for text classification. Due to the superior performance of these advanced methods, nowadays, people often directly train them for a few epochs and deploy the obtained model. In this opinion paper, we point out that this way may only sometimes get satisfactory results. We argue the importance of running a simple baseline like linear classifiers on bag-of-words features along with advanced methods. First, for many text data, linear methods show competitive performance, high efficiency, and robustness. Second, advanced models such as BERT may only achieve the best results if properly applied. Simple baselines help to confirm whether the results of advanced models are acceptable. Our experimental results fully support these points.
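The kind of baseline the abstract recommends can be sketched in a few lines. The example below is a minimal illustration using only the Python standard library: it builds bag-of-words count features and trains a perceptron, which is one simple linear classifier chosen here for brevity and not necessarily the linear method evaluated in the paper. The function names (`bag_of_words`, `train_perceptron`, `predict`) and the toy sentiment data are hypothetical.

```python
from collections import Counter


def bag_of_words(text):
    """Map a document to token counts (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def score(weights, bias, features):
    """Linear decision value: bias + sum of weight * count over tokens."""
    return bias + sum(weights[tok] * cnt for tok, cnt in features.items())


def train_perceptron(docs, labels, epochs=10):
    """Train a binary linear classifier; labels must be +1 or -1."""
    weights = Counter()  # missing tokens read as weight 0
    bias = 0.0
    for _ in range(epochs):
        for text, y in zip(docs, labels):
            features = bag_of_words(text)
            # Update only when the example is misclassified (or on the boundary).
            if y * score(weights, bias, features) <= 0:
                for tok, cnt in features.items():
                    weights[tok] += y * cnt
                bias += y
    return weights, bias


def predict(weights, bias, text):
    """Return +1 or -1 for a new document."""
    return 1 if score(weights, bias, bag_of_words(text)) > 0 else -1


# Toy data (an assumption for illustration, not from the paper).
docs = [
    "great movie loved it",
    "terrible plot awful acting",
    "loved the acting great fun",
    "awful boring terrible movie",
]
labels = [1, -1, 1, -1]

weights, bias = train_perceptron(docs, labels)
predictions = [predict(weights, bias, d) for d in docs]
```

In practice, the accuracy of such a baseline on a held-out set would be compared against the fine-tuned advanced model; if the advanced model does not clearly beat it, that is a signal to re-check how the advanced model was trained.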

Keywords

classification, text, baseline, often-forgotten
