Towards Automatic Translation of Machine Learning Visual Insights to Analytical Assertions
CoRR (2024)
Abstract
We present our vision for developing an automated tool capable of translating
visual properties observed in Machine Learning (ML) visualisations into Python
assertions. The tool aims to streamline the process of manually verifying these
visualisations in the ML development cycle, which is critical as real-world
data and assumptions often change post-deployment. In a prior study, we mined
54,070 Jupyter notebooks from GitHub and created a catalogue of 269
semantically related visualisation-assertion (VA) pairs. Building on this
catalogue, we propose to build a taxonomy that organises the VA pairs based on
ML verification tasks. The input feature space comprises a rich source of
information mined from the Jupyter notebooks: visualisations, Python source
code, and associated markdown text. The effectiveness of various AI models,
including traditional NLP4Code models and modern Large Language Models, will be
compared using established machine translation metrics and evaluated through a
qualitative study with human participants. We also plan to address the
challenge of extending the existing VA pair dataset with additional pairs from
Kaggle and to compare the tool's effectiveness with commercial generative AI
models like ChatGPT. This research not only contributes to the field of ML
system validation but also explores novel ways to leverage AI for automating
and enhancing software engineering practices in ML.
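
As an illustration of what a visualisation-assertion (VA) pair might look like, the sketch below pairs a bar chart of class counts with an assertion that encodes the property a reader would check by eye (no class is badly under-represented). It is a minimal sketch and is not taken from the catalogue described above: the function names, the example labels, and the 0.2 threshold are all hypothetical, and the plotting function assumes matplotlib is installed.

```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def plot_class_counts(labels):
    """Visualisation half of the pair: a bar chart of class counts.

    matplotlib is imported lazily so the assertion half runs without it.
    """
    import matplotlib.pyplot as plt

    counts = Counter(labels)
    plt.bar([str(cls) for cls in counts], list(counts.values()))
    plt.xlabel("class")
    plt.ylabel("count")
    plt.show()


def assert_classes_balanced(labels, min_share=0.2):
    """Assertion half of the pair: the visual insight as a runnable check.

    min_share is an illustrative threshold (an assumption), not a value
    reported in the paper.
    """
    for cls, share in class_shares(labels).items():
        assert share >= min_share, (
            f"class {cls!r} has share {share:.2f}, below {min_share}"
        )


# Hypothetical labels: 6 "cat" and 4 "dog" samples.
labels = ["cat", "dog", "cat", "dog", "cat", "dog", "cat", "dog", "cat", "cat"]
assert_classes_balanced(labels, min_share=0.2)
```

Unlike the chart, which has to be re-inspected by hand, the assertion can be re-run automatically whenever the data changes after deployment.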