The Lean Data Scientist: Recent Advances Toward Overcoming the Data Bottleneck.

Commun. ACM (2023)

Abstract
Machine learning (ML) is revolutionizing the world, affecting almost every field of science and industry. Recent algorithms (in particular, deep networks) are increasingly data-hungry, requiring large datasets for training. Thus, the dominant paradigm in ML today involves constructing large, task-specific datasets. However, obtaining quality datasets of such magnitude proves to be a difficult challenge. A variety of methods have been proposed to address this data bottleneck problem, but they are scattered across different areas, and it is hard for a practitioner to keep up with the latest developments. In this work, we propose a taxonomy of these methods. Our goal is twofold: (1) We wish to raise the community's awareness of the methods that already exist and encourage more efficient use of resources, and (2) we hope that such a taxonomy will contribute to our understanding of the problem, inspiring novel ideas and strategies to replace current annotation-heavy approaches.
Keywords
data