Abstract 5375: Learned phenotypic embeddings enable scalable imputation of high-content molecular data elucidating prognostic chromatin signatures

Christopher S. Probert, Zachary R. McCaw, Navami Jain, Daphne Koller

Cancer Research (2023)

Abstract
Emerging high-content data modalities like functional genomics and spatial proteomics have enormous potential to reveal determinants of phenotypic plasticity that underlie variability in clinical outcomes, but to date these modalities are only collected in modestly sized research cohorts (<200-400 patients), where we lack the power to detect subtype-specific or prognostic signatures. To study intertumor heterogeneity on a much larger scale (>10,000 patients), we developed a machine learning framework based on self-supervised embeddings that allows scalable imputation of high-content data on large standard-of-care datasets. Our framework starts by learning a phenotypic embedding of tumor state based solely on H&E histology images, allowing the embedding to be trained on large patient cohorts regardless of the availability of molecular covariates. It then learns to predict genomic or proteomic labels from the lower-dimensional phenotypic embeddings. This model can then be used for imputation in much larger cohorts where only clinical outcome and histology are available. To demonstrate our method, we use the TCGA ATAC-seq data, which is available for 400 patients across 23 cancer types. By learning self-supervised embeddings of histology, our framework was able to impute ATAC-seq for 5,000 peaks in 11,000 patients across 31 cancer types, with high accuracy in held-out samples (R² = 0.61). To our knowledge, this represents the broadest available pan-cancer chromatin landscape. The imputed ATAC-seq reveals a subset of peaks that are significantly associated with overall survival (OS) in multiple cancer types (e.g., Breast (BC) ATAC-only HR: 1.75, p: 8.6E-3). Genes proximal to these peaks are strongly enriched for well-characterized oncogenes, and also include several novel genes with functions in cellular metabolism and chromatin remodeling whose expression is not known to be prognostic in our disease settings.
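
The second stage described above (predicting a molecular label from precomputed embeddings, scoring it on held-out samples, then imputing it for a histology-only cohort) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not name the predictor, so a per-peak ridge regression is swapped in; the 4-dimensional embeddings, the synthetic data, and every function name (`solve`, `fit_ridge`, `predict`, `r_squared`) are hypothetical. It is written in pure Python to stay dependency-free.

```python
import random

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ridge(X, y, lam=1.0):
    """Ridge regression y ~ X.w + b; the intercept b is left unpenalized via centering."""
    n, d = len(X), len(X[0])
    mx = [sum(row[j] for row in X) / n for j in range(d)]
    my = sum(y) / n
    Xc = [[row[j] - mx[j] for j in range(d)] for row in X]
    yc = [v - my for v in y]
    # Normal equations: (Xc^T Xc + lam * I) w = Xc^T yc
    xtx = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(d)]
           for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(d)]
    w = solve(xtx, xty)
    b = my - sum(wj * mj for wj, mj in zip(w, mx))
    return w, b

def predict(X, w, b):
    return [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]

def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical synthetic data: 4-dim "histology embeddings" and one ATAC-seq
# peak whose accessibility depends linearly on them, plus Gaussian noise.
rng = random.Random(0)
TRUE_W = [0.8, -0.5, 0.3, 0.0]

def embed():
    return [rng.gauss(0.0, 1.0) for _ in TRUE_W]

def accessibility(e):
    return sum(w * x for w, x in zip(TRUE_W, e)) + rng.gauss(0.0, 0.3)

# Small cohort with both histology and ATAC-seq: train / held-out split.
labeled = [embed() for _ in range(400)]
peak = [accessibility(e) for e in labeled]
w, b = fit_ridge(labeled[:300], peak[:300], lam=1.0)
held_out_r2 = r_squared(peak[300:], predict(labeled[300:], w, b))

# Large cohort with histology only: impute the peak from embeddings alone.
unlabeled = [embed() for _ in range(1000)]
imputed = predict(unlabeled, w, b)
```

Scaling this to many peaks would mean repeating the fit per peak (or solving one multi-output system, since `xtx` is shared across peaks); the held-out R² here reflects only the synthetic noise level and says nothing about the reported 0.61.
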
Finally, we developed models to predict OS from H&E slide embeddings and from imputed ATAC-seq, both pan-cancer and in specific tumor types. Both models significantly outperform baseline stage and molecular subtype clinical risk predictors (e.g., BC baseline HR: 2.44, p: 3E-6 vs. embedding/imputed ATAC HR: 3.78, p: 2E-9; p for improvement: 2E-9). Interestingly, we find that adding an ATAC-seq-based risk score to an embedding-based risk score significantly improves disease-specific survival prediction (HCC embedding-only HR: 2.13, p: 8E-5 vs. HCC embedding/imputed ATAC HR: 2.65, p: 1E-6). This suggests that histopathology images are a rich source of prognostic information beyond that captured by traditional pathologist grading. Overall, our work highlights the ability to use self-supervised embeddings of histopathology to impute biological covariates on large, standard-of-care cohorts, empowering novel insights into disease mechanisms and patient outcome.

Citation Format: Christopher S. Probert, Zachary R. McCaw, Navami Jain, Daphne Koller. Learned phenotypic embeddings enable scalable imputation of high-content molecular data elucidating prognostic chromatin signatures. [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 5375.
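
The hazard ratios (HR) and p-values quoted in the abstract come from survival models whose exact specification is not given. As a hedged illustration of how such a number can be obtained, the sketch below assumes a standard univariate Cox proportional-hazards model for a single binary covariate (e.g., a hypothetical high- vs. low-risk group derived from an imputed peak or risk score), fitted by Newton-Raphson with Breslow handling of tied times, plus a Wald p-value. The data are synthetic with an assumed true HR of 2.0; `cox_fit_1d` and all other names are hypothetical.

```python
import math
import random

def cox_fit_1d(times, events, x, iters=25):
    """Univariate Cox PH fit by Newton-Raphson (Breslow ties).
    Returns (beta, hazard ratio exp(beta), standard error of beta)."""
    beta, info = 0.0, 0.0
    n = len(times)
    for _ in range(iters):
        score, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue  # censored patients only contribute via risk sets
            # Risk set: everyone still under observation at time t_i.
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if times[j] >= times[i]:
                    r = math.exp(beta * x[j])
                    s0 += r
                    s1 += r * x[j]
                    s2 += r * x[j] * x[j]
            mean = s1 / s0
            score += x[i] - mean            # gradient of the log partial likelihood
            info += s2 / s0 - mean * mean   # observed information (negative Hessian)
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    return beta, math.exp(beta), 1.0 / math.sqrt(info)

# Hypothetical synthetic cohort: exponential event times whose hazard is
# TRUE_HR times higher in the high-risk group, with independent censoring.
rng = random.Random(1)
TRUE_HR = 2.0
times, events, group = [], [], []
for _ in range(300):
    g = 1 if rng.random() < 0.5 else 0
    t_event = rng.expovariate(0.1 * TRUE_HR ** g)
    t_cens = rng.expovariate(0.05)
    times.append(min(t_event, t_cens))
    events.append(t_event <= t_cens)
    group.append(float(g))

beta, hr, se = cox_fit_1d(times, events, group)
p = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided Wald p-value
```

Comparing a combined embedding-plus-ATAC risk score against a single score (the "p for improvement" above) would additionally need a multivariable fit or a likelihood-ratio test, which this single-covariate sketch does not cover.
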

Keywords
prognostic chromatin signatures, phenotypic embeddings, scalable imputation, molecular data, high-content