Addressing Challenges in Data Science: Scale, Skill Sets and Complexity

KDD '19: The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Anchorage, AK, USA, August 2019

Data science in modern applications is pushing the limits of tools and organizations. The scale of data, the breadth of required skill sets, and the complexity of workflows all cause organizations to stumble when developing data-powered applications and moving them to production. This talk will discuss these challenges and Databricks' efforts to overcome them within open source software projects like Apache Spark and MLflow. Apache Spark has simplified large-scale ETL and analytics, and its Project Hydrogen helps to bridge the gap between Spark and ML tools such as TensorFlow and Horovod. MLflow, an open source platform for managing ML lifecycles, facilitates experimentation, reproducibility and deployment. We will present insights from our collaborations on these projects, as well as our perspective at Databricks in facilitating data science for a wide variety of organizations and applications.