DP-TabICL: In-Context Learning with Differentially Private Tabular Data
CoRR (2024)
Abstract
In-context learning (ICL) enables large language models (LLMs) to adapt to
new tasks by conditioning on demonstrations of question-answer pairs, and it
has been shown to achieve performance comparable to costly model retraining
and fine-tuning. Recently, ICL has been extended to allow tabular data to be
used as demonstration examples by serializing individual records into
natural-language formats. However, LLMs have been shown to leak information
contained in prompts, and since tabular data often contain sensitive
information, understanding how to protect the underlying tabular data used in
ICL is a critical area of research. This work serves as an initial
investigation into how differential privacy (DP), the long-established gold
standard for data privacy and anonymization, can be used to protect tabular
data in ICL. Specifically, we investigate the application of DP mechanisms
for private tabular ICL via data privatization prior to serialization and
prompting. We formulate two private ICL frameworks with provable privacy
guarantees in the local (LDP-TabICL) and global (GDP-TabICL) DP scenarios,
which inject noise into individual records or group statistics, respectively.
We evaluate our DP-based frameworks on eight real-world tabular datasets and
across multiple ICL and DP settings. Our evaluations show that DP-based ICL
can protect the privacy of the underlying tabular data while achieving
performance comparable to non-LLM baselines, especially under high-privacy
regimes.
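The abstract describes privatizing records before serialization. As a hedged illustration of the local-DP (LDP-TabICL) idea only, the sketch below adds Laplace noise to each numeric feature of a record and then serializes it into a natural-language demonstration line. The function names, the per-feature budget split, and the serialization template are all assumptions for illustration, not the paper's actual mechanism (which the abstract does not specify in detail).

```python
import numpy as np

def laplace_privatize(value, epsilon, sensitivity=1.0, rng=None):
    """Add Laplace noise with scale sensitivity/epsilon to one numeric
    feature (a generic LDP-style perturbation; hypothetical sketch)."""
    rng = rng if rng is not None else np.random.default_rng()
    return value + rng.laplace(0.0, sensitivity / epsilon)

def serialize_record(record):
    """Turn a (privatized) record into a natural-language demonstration
    line, using an assumed 'feature is value' template."""
    return ", ".join(f"{k} is {v:.2f}" for k, v in record.items())

rng = np.random.default_rng(42)
record = {"age": 37.0, "income": 52.0}  # toy record, income in $1k
epsilon = 1.0  # assumed per-feature privacy budget

private = {k: laplace_privatize(v, epsilon, sensitivity=1.0, rng=rng)
           for k, v in record.items()}
print(serialize_record(private))
```

Only the noised values ever reach the prompt, so the LLM never sees the raw record; the global (GDP-TabICL) variant would instead noise aggregate statistics computed over groups of records.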