The Lean Data Scientist: Recent Advances Toward Overcoming the Data Bottleneck.

Chen Shani,Jonathan Zarecki,Dafna Shahaf

Commun. ACM（2023）

引用 2|浏览27

暂无评分

摘要

Machine learning (ML) is revolutionizing the world, affecting almost every field of science and industry. Recent algorithms (in particular, deep networks) are increasingly data-hungry, requiring large datasets for training. Thus, the dominant paradigm in ML today involves constructing large, task-specific datasets. However, obtaining quality datasets of such magnitude proves to be a difficult challenge. A variety of methods have been proposed to address this data bottleneck problem, but they are scattered across different areas, and it is hard for a practitioner to keep up with the latest developments. In this work, we propose a taxonomy of these methods. Our goal is twofold: (1) We wish to raise the community's awareness of the methods that already exist and encourage more efficient use of resources, and (2) we hope that such a taxonomy will contribute to our understanding of the problem, inspiring novel ideas and strategies to replace current annotation-heavy approaches.

查看译文

关键词

data

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要