Emerging Platforms Meet Emerging LLMs: A Year-Long Journey of Top-Down Development
CoRR(2024)
Abstract
Deploying machine learning (ML) on diverse computing platforms is crucial to
accelerate and broaden its applications. However, it presents significant
software engineering challenges due to the fast evolution of models, especially
the recent large language models (LLMs), and the emergence of new computing platforms.
Current ML frameworks are primarily engineered for CPU and CUDA platforms,
leaving a big gap in enabling emerging ones like Metal, Vulkan, and WebGPU.
While a traditional bottom-up development pipeline fails to close the gap in a
timely manner, we introduce TapML, a top-down approach and tooling designed to
streamline the deployment of ML systems on diverse platforms, optimized for
developer productivity. Unlike traditional bottom-up methods, which involve
extensive manual testing and debugging, TapML automates unit testing through
test carving and adopts a migration-based strategy for gradually offloading
model computations from mature source platforms to emerging target platforms.
By leveraging realistic inputs and remote connections for gradual target
offloading, TapML accelerates validation and narrows the debugging scope,
substantially reducing development effort.
TapML was developed and applied through a year-long, real-world effort that
successfully deployed significant emerging models and platforms. Through
serious deployments of 82 emerging models in 17 distinct architectures across 5
emerging platforms, we showcase the effectiveness of TapML in enhancing
developer productivity while ensuring model reliability and efficiency.
Furthermore, we summarize comprehensive case studies from our real-world
development, offering best practices for developing emerging ML systems.
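
The idea of test carving combined with gradual offloading can be illustrated with a minimal sketch. This is not TapML's actual API: the function names (`carve_inputs`, `first_faulty_op`), the toy operators, and the tolerance are all hypothetical, and plain Python functions stand in for the source and target platforms. The sketch records each operator's real input and output during a run on the mature source platform, then replays those inputs against the target implementation one operator at a time, so a mismatch is pinned to a single operator.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]
Op = Callable[[Vector], Vector]


def carve_inputs(ops: List[Tuple[str, Op]], x: Vector) -> List[Tuple[str, Vector, Vector]]:
    """Run the whole pipeline on the (trusted) source platform and record,
    for every operator, the realistic input it saw and the output it produced."""
    carved = []
    for name, op in ops:
        y = op(x)
        carved.append((name, x, y))
        x = y
    return carved


def first_faulty_op(
    carved: List[Tuple[str, Vector, Vector]],
    target_ops: Dict[str, Op],
    tol: float = 1e-6,
) -> Optional[str]:
    """Replay each carved input on the target implementation of the same
    operator and return the name of the first operator whose output differs."""
    for name, x, expected in carved:
        got = target_ops[name](x)
        same = len(got) == len(expected) and all(
            math.isclose(g, e, rel_tol=tol, abs_tol=tol) for g, e in zip(got, expected)
        )
        if not same:
            return name
    return None


# Toy "source platform" operators (hypothetical stand-ins for real kernels).
def scale(v: Vector) -> Vector:
    return [2.0 * a for a in v]


def relu(v: Vector) -> Vector:
    return [max(a, 0.0) for a in v]


def normalize(v: Vector) -> Vector:
    s = sum(v)
    return [a / s for a in v] if s != 0 else list(v)


source_ops = [("scale", scale), ("relu", relu), ("normalize", normalize)]
carved = carve_inputs(source_ops, [1.0, -2.0, 3.0])

# A "target platform" whose relu kernel is deliberately wrong (uses abs).
buggy_target = dict(source_ops)
buggy_target["relu"] = lambda v: [abs(a) for a in v]

print(first_faulty_op(carved, buggy_target))      # relu
print(first_faulty_op(carved, dict(source_ops)))  # None
```

Because every operator is checked against inputs captured from the source run rather than against the target's own upstream outputs, an error in one operator does not cascade into the checks for the operators that follow it.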