Open-Source AI-based SE Tools: Opportunities and Challenges of Collaborative Software Learning
arXiv (2024)
Abstract
Large Language Models (LLMs) have become instrumental in advancing software
engineering (SE) tasks, showing their efficacy in code understanding and
beyond. As with traditional SE tools, open-source collaboration is key to
realising excellent products. With AI models, however, the essential
ingredient is data: collaboration on AI-based SE models hinges on maximising
access to sources of high-quality data. Yet data, especially high-quality
data, often holds commercial or sensitive value, which makes it less
accessible to open-source AI-based SE projects. This presents a significant
barrier to the development and enhancement of AI-based SE tools within the
software engineering community. Researchers therefore need solutions that let
open-source AI-based SE models draw on resources held by different
organisations. To address this challenge, this position paper investigates
one solution that gives open-source AI models access to diverse organisational
resources while respecting privacy and commercial sensitivities. We introduce
a governance framework centred on federated learning (FL), designed to foster
the joint development and maintenance of open-source AI code models while
safeguarding data privacy and security. Additionally, we present guidelines
for developers on AI-based SE tool collaboration, covering data requirements,
model architecture, updating strategies, and version control. Given the
significant influence of data characteristics on FL, we also examine the
effect of code data heterogeneity on FL performance.
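The abstract does not say which aggregation algorithm the FL framework uses. Purely as an illustration, the sketch below assumes federated averaging (FedAvg): each organisation trains on its own private data, and only model parameters are sent to a server that averages them, weighted by local sample count. A toy linear model stands in for an AI code model, and the names `local_update` and `fed_avg` are hypothetical. The two clients are given disjoint input ranges to mimic heterogeneous (non-IID) data. This is a minimal sketch under those assumptions, not the paper's implementation or experimental setup.

```python
from typing import List, Tuple

# Hypothetical stand-in for one organisation's private data: (feature, target) pairs.
Dataset = List[Tuple[float, float]]
Params = Tuple[float, float]  # (weight, bias) of a toy linear model


def local_update(params: Params, data: Dataset, lr: float = 0.1, epochs: int = 5) -> Params:
    """Run a few epochs of full-batch gradient descent on one client's data.

    Only the updated parameters are returned; the raw data stays with the client.
    """
    w, b = params
    n = len(data)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def fed_avg(clients: List[Dataset], rounds: int = 500, lr: float = 0.1, epochs: int = 5) -> Params:
    """Federated averaging: clients train locally, the server averages their parameters."""
    params: Params = (0.0, 0.0)
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        updates = [local_update(params, d, lr, epochs) for d in clients]
        # Weight each client's parameters by its share of the total sample count.
        w = sum(len(d) * u[0] for d, u in zip(clients, updates)) / total
        b = sum(len(d) * u[1] for d, u in zip(clients, updates)) / total
        params = (w, b)
    return params


if __name__ == "__main__":
    # Heterogeneous split: each client only sees one range of inputs,
    # but both follow the same underlying relation y = 2x + 1.
    client_a = [(i / 10, 2 * (i / 10) + 1) for i in range(0, 5)]
    client_b = [(i / 10, 2 * (i / 10) + 1) for i in range(6, 11)]
    w, b = fed_avg([client_a, client_b])
    print(f"global model: w={w:.3f}, b={b:.3f}")
```

Here both clients' data come from the same noiseless relation, so averaging still recovers it even though neither client sees the full input range. With real code data from different organisations, the clients' local optima can disagree, and the averaged model may then converge more slowly or settle on a worse solution; this is why data heterogeneity matters for FL performance.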