Sample-Efficient Linear Regression with Self-Selection Bias
CoRR (2024)
Abstract
We consider the problem of linear regression with self-selection bias in the
unknown-index setting, as introduced in recent work by Cherapanamjeri,
Daskalakis, Ilyas, and Zampetakis [STOC 2023]. In this model, one observes $m$ i.i.d. samples $(\mathbf{x}_\ell, z_\ell)_{\ell=1}^{m}$ where
$$z_\ell = \max_{i \in [k]} \left\{ \mathbf{x}_\ell^{\top} \mathbf{w}_i + \eta_{i,\ell} \right\},$$
but the maximizing index $i_\ell$ is unobserved. Here, the $\mathbf{x}_\ell$ are assumed to be $\mathcal{N}(0, I_n)$, and the noise vector $\boldsymbol{\eta}_\ell = (\eta_{1,\ell}, \dots, \eta_{k,\ell}) \sim \mathcal{D}$ is centered and independent of $\mathbf{x}_\ell$. We provide a novel and near-optimally sample-efficient
(in terms of $k$) algorithm to recover $\mathbf{w}_1, \dots, \mathbf{w}_k \in \mathbb{R}^n$ up to additive $\ell_2$-error $\varepsilon$, with polynomial sample complexity $\tilde{O}(n) \cdot \mathrm{poly}(k, 1/\varepsilon)$ and significantly improved time complexity $\mathrm{poly}(n, k, 1/\varepsilon) + O(\log(k)/\varepsilon)^{O(k)}$. When $k = O(1)$, our algorithm runs in $\mathrm{poly}(n, 1/\varepsilon)$ time, generalizing the polynomial guarantee of an explicit moment-matching algorithm of Cherapanamjeri et al. for $k = 2$ and when it is known that $\mathcal{D} = \mathcal{N}(0, I_k)$. Our algorithm succeeds under significantly
relaxed noise assumptions, and therefore also succeeds in the related setting of max-linear regression, where the noise is added outside the maximum.
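To make the two observation models concrete, the following Python (NumPy) sketch simulates both data-generating processes. The function name, the Gaussian choice for the centered noise distribution $\mathcal{D}$, and all parameter values are illustrative assumptions for this sketch, not specifics from the paper.

```python
import numpy as np

def sample_self_selection(W, m, noise_outside_max=False, rng=None):
    """Simulate the two observation models described above.

    W : (k, n) array of the unknown regressors w_1, ..., w_k.
    Returns (X, z): covariates X with rows x_l ~ N(0, I_n) and
    responses z_l. Gaussian noise is an illustrative choice for the
    centered distribution D; the paper allows more general noise.
    """
    rng = np.random.default_rng(rng)
    k, n = W.shape
    X = rng.standard_normal((m, n))   # x_l ~ N(0, I_n)
    scores = X @ W.T                  # (m, k) matrix of x_l^T w_i
    if noise_outside_max:
        # Max-linear regression: z_l = max_i x_l^T w_i + eta_l
        z = scores.max(axis=1) + rng.standard_normal(m)
    else:
        # Self-selection: z_l = max_i (x_l^T w_i + eta_{i,l})
        z = (scores + rng.standard_normal((m, k))).max(axis=1)
    return X, z  # the maximizing index i_l is deliberately not returned

# Example: k = 3 unknown regressors in R^10, m = 5000 samples.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 10))
X, z = sample_self_selection(W, m=5000, rng=1)
```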
For this problem, our algorithm is efficient for a much larger range of $k$ than the state-of-the-art due to Ghosh, Pananjady, Guntuboyina, and Ramchandran [IEEE Trans. Inf. Theory 2022] when $\varepsilon$ is not too small, and it leads to improved algorithms for any $\varepsilon$ by providing a warm start for existing local convergence methods.
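As a hedged illustration of the kind of local convergence method such a warm start could feed, here is a generic alternating-minimization loop for max-linear regression (a sketch in the spirit of that line of work, not the paper's own algorithm): assign each sample to the index currently attaining the maximum, then re-fit each $\mathbf{w}_i$ by least squares on its assigned samples.

```python
import numpy as np

def am_refine(X, z, W0, iters=20):
    """Generic local refinement for max-linear regression.

    X : (m, n) covariates; z : (m,) responses; W0 : (k, n) warm start.
    Alternates (1) hard assignment of each sample to the index that
    attains the current maximum and (2) per-index least-squares re-fit.
    This is a sketch of a local method, not the paper's algorithm.
    """
    W = W0.copy()
    for _ in range(iters):
        labels = np.argmax(X @ W.T, axis=1)   # step 1: hard assignment
        for i in range(W.shape[0]):
            mask = labels == i
            if mask.sum() >= X.shape[1]:      # enough samples to re-fit w_i
                W[i], *_ = np.linalg.lstsq(X[mask], z[mask], rcond=None)
    return W
```

Hard-assignment updates of this kind are only locally convergent, which is why an accurate warm start matters.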