SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers
arXiv (2024)
Abstract
We introduce Self-Monitored Inference-Time INtervention (SMITIN), an approach
for controlling an autoregressive generative music transformer using classifier
probes. These simple logistic regression probes are trained on the output of
each attention head in the transformer using a small dataset of audio examples
that either exhibit or lack a specific musical trait (e.g., the
presence/absence of drums, or real/synthetic music). We then steer the
attention heads in the probe direction, ensuring the generative model output
captures the desired musical trait. Additionally, we monitor the probe output
to avoid applying excessive intervention during the autoregressive
generation, which could lead to temporally incoherent music. We validate our
results objectively and subjectively for both audio continuation and
text-to-music applications, demonstrating the ability to add controls to large
generative models for which retraining or even fine-tuning is impractical for
most musicians.
Audio samples of the proposed intervention approach are available on our demo
page: http://tinyurl.com/smitin
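
The probe-and-steer idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single attention head whose output is a plain vector, uses synthetic data in place of real head activations, and treats "monitoring" as a simple probability threshold that skips the intervention once the probe already reports the trait. All names (`train_probe`, `steer`, `alpha`, `threshold`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_probe(X, y, lr=0.1, steps=500):
    """Fit a logistic regression probe with plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        grad = p - y                      # gradient of the log loss w.r.t. the logit
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def probe_prob(h, w, b):
    """Probe's estimated probability that activation h exhibits the trait."""
    return float(sigmoid(h @ w + b))

def steer(h, w, b, alpha=1.0, threshold=0.9):
    """Shift h along the unit probe direction, unless the probe already
    reports the trait (assumed form of self-monitoring)."""
    if probe_prob(h, w, b) >= threshold:
        return h                          # no intervention needed
    direction = w / (np.linalg.norm(w) + 1e-12)
    return h + alpha * direction

# Synthetic stand-in for one head's activations (assumption, not real data):
# "trait present" examples cluster around +1, "trait absent" around -1.
rng = np.random.default_rng(0)
dim = 16
pos = rng.normal(loc=1.0, size=(100, dim))
neg = rng.normal(loc=-1.0, size=(100, dim))
X = np.vstack([pos, neg])
y = np.concatenate([np.ones(100), np.zeros(100)])

w, b = train_probe(X, y)
h = neg[0].copy()                         # an activation lacking the trait
h_steered = steer(h, w, b, alpha=2.0)
```

In this sketch the intervention strength `alpha` is fixed; the threshold check is what keeps the shift from being applied when it is no longer needed, which is one plausible reading of monitoring the probe output to avoid excessive intervention.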