On the "Rough Use" of Machine Learning Techniques

SIGIR (2023)

Abstract
Machine learning is everywhere, but unfortunately, we are not experts in every method. Sometimes we use machine learning techniques "inappropriately." Examples include reporting training instead of test performance and comparing two methods without a suitable hyper-parameter search for each. In reality, however, there are more sophisticated and more subtle cases, which we broadly call the "rough use" of machine learning techniques: the setting may be roughly fine but, strictly speaking, is inappropriate. We briefly discuss two intriguing examples. First, in graph representation learning, the quality of the obtained representations is often evaluated on multi-label node classification. Almost the entire area has used an unrealistic setting that assumes the number of labels of each test instance is known at prediction time. In practice, such ground-truth information is rarely available. Details of this interesting story are in Lin et al. [1]. Second, in training deep neural networks, the optimization process often relies on validation performance to decide when to terminate or which epoch to select. Thus many public repositories explicitly provide training, validation, and test sets, and many think this split is standard when applying any machine learning technique. However, as long as the test set is kept completely independent, users are free to choose whatever setting works best on all the available labeled data (i.e., the training and validation sets combined). Through real stories, we show that many did not clearly see the relation between training, validation, and test sets. The rough use of machine learning methods is common and sometimes unavoidable, because there is no such thing as a perfect use of a machine learning method. Further, it is not easy to assess how serious a given case is. We argue that high-quality, easy-to-use software is an important way to improve the practical use of machine learning techniques.
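To make the first example concrete, the following is a minimal sketch of the two prediction rules for multi-label node classification. It is not the code of Lin et al. [1]. The function names, the toy scores, and the 0.5 threshold are all assumptions chosen for illustration. The first rule predicts the top-k labels for each test node, where k is that node's true label count. The second rule uses only the decision scores.

```python
import numpy as np

def predict_with_known_label_count(scores, y_true):
    """Unrealistic rule: pick the top-k labels, where k is the TRUE label count."""
    preds = np.zeros_like(y_true)
    for i, row in enumerate(scores):
        k = int(y_true[i].sum())  # ground-truth information used at prediction time
        if k > 0:
            preds[i, np.argsort(-row)[:k]] = 1
    return preds

def predict_with_threshold(scores, threshold=0.5):
    """Realistic rule: uses only the scores (the threshold is a hypothetical choice)."""
    return (scores >= threshold).astype(int)

def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over all (instance, label) pairs."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy data (made up for illustration): 4 test nodes, 3 labels.
y_true = np.array([[1, 0, 0],
                   [1, 1, 0],
                   [0, 0, 1],
                   [1, 1, 1]])
scores = np.array([[0.60, 0.55, 0.10],
                   [0.70, 0.40, 0.30],
                   [0.20, 0.60, 0.70],
                   [0.90, 0.45, 0.30]])

f1_oracle = micro_f1(y_true, predict_with_known_label_count(scores, y_true))
f1_threshold = micro_f1(y_true, predict_with_threshold(scores))
print(f"known label count: {f1_oracle:.3f}, threshold only: {f1_threshold:.3f}")
```

On this toy data the rule that knows each node's label count scores higher than the threshold rule, even though both use the same scores. The gap comes only from information that would not be available in deployment.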
Keywords
machine learning, validation and prediction