Sample-Efficient Linear Regression with Self-Selection Bias
CoRR (2024)
Abstract
We consider the problem of linear regression with self-selection bias in the
unknown-index setting, as introduced in recent work by Cherapanamjeri,
Daskalakis, Ilyas, and Zampetakis [STOC 2023]. In this model, one observes $m$ i.i.d. samples $(\mathbf{x}_\ell, z_\ell)_{\ell=1}^{m}$ where
$$z_\ell = \max_{i \in [k]} \left\{ \mathbf{x}_\ell^{\top} \mathbf{w}_i + \eta_{i,\ell} \right\},$$
but the maximizing index $i_\ell$ is unobserved. Here, the $\mathbf{x}_\ell$ are assumed to be $\mathcal{N}(0, I_n)$, and the noise vector $\boldsymbol{\eta}_\ell = (\eta_{1,\ell}, \dots, \eta_{k,\ell}) \sim \mathcal{D}$ is centered and independent of $\mathbf{x}_\ell$. We provide a novel and near-optimally sample-efficient
(in terms of $k$) algorithm to recover $\mathbf{w}_1, \dots, \mathbf{w}_k \in \mathbb{R}^n$ up to additive $\ell_2$-error $\varepsilon$, with polynomial sample complexity $\tilde{O}(n) \cdot \mathrm{poly}(k, 1/\varepsilon)$ and significantly improved time complexity $\mathrm{poly}(n, k, 1/\varepsilon) + O(\log(k)/\varepsilon)^{O(k)}$. When $k = O(1)$, our algorithm runs in $\mathrm{poly}(n, 1/\varepsilon)$ time, generalizing the polynomial guarantee of an explicit moment-matching algorithm of Cherapanamjeri et al. for $k = 2$ and when it is known that $\mathcal{D} = \mathcal{N}(0, I_k)$. Our algorithm succeeds under significantly
relaxed noise assumptions, and therefore also succeeds in the related setting of max-linear regression, where the noise is added outside the maximum.
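To make the two observation models concrete, the following Python (NumPy) sketch simulates both data-generating processes. The function name, the Gaussian choice for the centered noise distribution $\mathcal{D}$, and all parameter values are illustrative assumptions for this sketch, not specifics from the paper.

```python
import numpy as np

def sample_self_selection(W, m, noise_outside_max=False, rng=None):
    """Simulate the two observation models described above.

    W : (k, n) array of the unknown regressors w_1, ..., w_k.
    Returns (X, z): covariates X with rows x_l ~ N(0, I_n) and
    responses z_l. Gaussian noise is an illustrative choice for the
    centered distribution D; the paper allows more general noise.
    """
    rng = np.random.default_rng(rng)
    k, n = W.shape
    X = rng.standard_normal((m, n))   # x_l ~ N(0, I_n)
    scores = X @ W.T                  # (m, k) matrix of x_l^T w_i
    if noise_outside_max:
        # Max-linear regression: z_l = max_i x_l^T w_i + eta_l
        z = scores.max(axis=1) + rng.standard_normal(m)
    else:
        # Self-selection: z_l = max_i (x_l^T w_i + eta_{i,l})
        z = (scores + rng.standard_normal((m, k))).max(axis=1)
    return X, z  # the maximizing index i_l is deliberately not returned

# Example: k = 3 unknown regressors in R^10, m = 5000 samples.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 10))
X, z = sample_self_selection(W, m=5000, rng=1)
```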
For this problem, our algorithm is efficient for a much larger range of $k$ than the state-of-the-art due to Ghosh, Pananjady, Guntuboyina, and Ramchandran [IEEE Trans. Inf. Theory 2022] when $\varepsilon$ is not too small, and it leads to improved algorithms for any $\varepsilon$ by providing a warm start for existing local convergence methods.
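As a hedged illustration of the kind of local convergence method such a warm start could feed, here is a generic alternating-minimization loop for max-linear regression (a sketch in the spirit of that line of work, not the paper's own algorithm): assign each sample to the index currently attaining the maximum, then re-fit each $\mathbf{w}_i$ by least squares on its assigned samples.

```python
import numpy as np

def am_refine(X, z, W0, iters=20):
    """Generic local refinement for max-linear regression.

    X : (m, n) covariates; z : (m,) responses; W0 : (k, n) warm start.
    Alternates (1) hard assignment of each sample to the index that
    attains the current maximum and (2) per-index least-squares re-fit.
    This is a sketch of a local method, not the paper's algorithm.
    """
    W = W0.copy()
    for _ in range(iters):
        labels = np.argmax(X @ W.T, axis=1)   # step 1: hard assignment
        for i in range(W.shape[0]):
            mask = labels == i
            if mask.sum() >= X.shape[1]:      # enough samples to re-fit w_i
                W[i], *_ = np.linalg.lstsq(X[mask], z[mask], rcond=None)
    return W
```

Hard-assignment updates of this kind are only locally convergent, which is why an accurate warm start matters.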