Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues
arXiv (2023)
Abstract
Deep neural networks based on linear complex-valued RNNs interleaved with
position-wise MLPs are gaining traction as competitive approaches to sequence
modeling. Examples of such architectures include state-space models (SSMs) like
S4, LRU, and Mamba: recently proposed models that achieve promising performance
on text, genetics, and other data that require long-range reasoning. Despite
experimental evidence highlighting these architectures' effectiveness and
computational efficiency, their expressive power remains relatively unexplored,
especially in connection to specific choices crucial in practice - e.g.,
carefully designed initialization distribution and use of complex numbers. In
this paper, we show that combining MLPs with either real or complex linear
diagonal recurrences leads to arbitrarily precise approximation of regular
causal sequence-to-sequence maps. At the heart of our proof, we rely on a
separation of concerns: the linear RNN provides a lossless encoding of the
input sequence, and the MLP performs non-linear processing on this encoding.
While we show that using real diagonal linear recurrences is enough to achieve
universality in this architecture, we prove that employing complex eigenvalues
near the unit disk - i.e., empirically the most successful strategy in SSMs -
greatly helps the RNN in storing information. We connect this finding with the
vanishing gradient issue and provide experimental evidence supporting our
claims.
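
The separation of concerns described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the sizes `L`, `N`, `H`, the placement of the eigenvalues, the input vector `b`, and the random MLP weights are all assumptions chosen for illustration. The sketch runs a diagonal linear recurrence over a scalar sequence, checks that the final hidden state determines the whole input (the final state is a Vandermonde-type linear map of the input), compares the conditioning of that map for complex and real eigenvalues, and then applies a position-wise MLP to the hidden states.

```python
import numpy as np

def diagonal_linear_rnn(u, lam, b):
    """Run x_k = lam * x_{k-1} + b * u_k over a scalar input sequence u."""
    x = np.zeros(lam.shape, dtype=np.complex128)
    states = []
    for u_k in u:
        x = lam * x + b * u_k
        states.append(x)
    return np.stack(states)  # shape (L, N)

def mlp(h, w1, b1, w2, b2):
    """Position-wise one-hidden-layer ReLU MLP applied to real features h."""
    return np.maximum(h @ w1 + b1, 0.0) @ w2 + b2

def encoding_matrix(lam, b, length):
    """Matrix V with x_L = V @ u, i.e. V[n, j] = lam[n]**(L-1-j) * b[n]."""
    powers = length - 1 - np.arange(length)
    return (lam[:, None] ** powers[None, :]) * b[:, None]

rng = np.random.default_rng(0)
L, N, H = 16, 16, 32          # sequence length, state size, MLP width (assumed)
u = rng.standard_normal(L)

# Assumed eigenvalue choice: complex, close to the unit circle, evenly spread phases.
lam_complex = 0.99 * np.exp(2j * np.pi * np.arange(N) / N)
b = np.ones(N, dtype=np.complex128)

states = diagonal_linear_rnn(u, lam_complex, b)

# Lossless encoding: recover the full input from the last hidden state alone.
V_complex = encoding_matrix(lam_complex, b, L)
u_rec = np.linalg.lstsq(V_complex, states[-1], rcond=None)[0].real
max_err = np.max(np.abs(u_rec - u))

# Real eigenvalues in (0, 1) also give an invertible map in exact arithmetic,
# but it is far worse conditioned, so decoding is numerically fragile.
lam_real = np.linspace(0.5, 0.99, N)
V_real = encoding_matrix(lam_real, np.ones(N), L)
cond_complex = np.linalg.cond(V_complex)
cond_real = np.linalg.cond(V_real)

# Non-linear processing: the MLP sees real and imaginary parts of each state.
features = np.concatenate([states.real, states.imag], axis=-1)  # shape (L, 2N)
w1 = rng.standard_normal((2 * N, H)) / np.sqrt(2 * N)
b1 = np.zeros(H)
w2 = rng.standard_normal((H, 1)) / np.sqrt(H)
b2 = np.zeros(1)
y = mlp(features, w1, b1, w2, b2)  # shape (L, 1), untrained output

print(f"reconstruction error: {max_err:.2e}")
print(f"cond(complex) = {cond_complex:.2e}, cond(real) = {cond_real:.2e}")
```

In this toy setting the gap between the two condition numbers gives a rough numerical picture of why complex eigenvalues near the unit disk help the recurrence store information; the MLP here is untrained and only shows where the non-linear processing would sit.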