A k-mer query tool for assessing population diversity in pangenomes

Hang Su, Ziwei Chen, Maya L. Najarian, Martin T. Ferris, Fernando Pardo-Manuel de Villena, Leonard McMillan

BCB (2021)

Abstract

Inexpensive and fast genome sequencing has yielded multiple genome assemblies that, taken together, can be considered as a single pangenome model. However, applying conventional alignment-based sequence analysis to the assemblies of a pangenome is computationally expensive and largely redundant. Here, we present an alignment-free method that analyzes the relationship of any new sample relative to a given pangenome model using selected k-mer queries. We select a representative set of k-mers from the pangenome as probes and determine their frequencies in the raw short-read sequence data. The selection of probes is designed to cover every base of the pangenome, maximize sharing, and identify informative probes that discriminate between haplotypes. The k-mer frequencies are determined using an FM-index built over the raw sequence data of the new sample. Prior to the k-mer search, the probes are reordered to maximize the shared suffixes between successive k-mers, thus reducing the overall run time compared to executing each search independently. We aggregate the forward and reverse k-mer probe counts, save them in the appropriate rows of a count matrix, and remap them back to their locations in the pangenome. The resulting probe database serves as a valuable resource for representing population-scale sequence variations based on the pangenome model.
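
The counting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy, uncompressed FM-index built by naive suffix sorting, reads joined with a `$` separator, and probes ordered by sorting on their reversed strings so that neighbours share suffixes. All function names (`build_fm_index`, `extend`, `count_probes`, `probe_counts`) are hypothetical and chosen only for illustration.

```python
# Toy illustration only: real tools use compressed, sampled FM-indexes.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def build_fm_index(text):
    """Build BWT-derived tables for text + '$' using naive suffix sorting."""
    text += "$"
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)
    alphabet = sorted(set(text))
    # first[c]: number of characters in text that sort before c
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return first, occ, len(bwt)

def extend(fm, interval, c):
    """One backward-search step: prepend c to the pattern matched so far."""
    first, occ, _ = fm
    lo, hi = interval  # half-open range of suffix-array rows
    if c not in first:
        return (0, 0)
    return (first[c] + occ[c][lo], first[c] + occ[c][hi])

def count_probes(fm, probes):
    """Count each probe, reusing search state for suffixes shared with the
    previously searched probe (assumed ordering: sort by reversed string)."""
    counts = {}
    prev = ""
    stack = [(0, fm[2])]  # stack[j]: interval for the last j characters of prev
    for p in sorted(set(probes), key=lambda s: s[::-1]):
        shared = 0
        while (shared < min(len(p), len(prev))
               and p[-1 - shared] == prev[-1 - shared]):
            shared += 1
        del stack[shared + 1:]  # keep only intervals for the shared suffix
        for c in reversed(p[:len(p) - shared]):
            stack.append(extend(fm, stack[-1], c))
        lo, hi = stack[-1]
        counts[p] = hi - lo
        prev = p
    return counts

def probe_counts(reads, probes):
    """Aggregate forward and reverse-complement counts for each probe."""
    fm = build_fm_index("$".join(reads))
    both = count_probes(fm, list(probes) + [revcomp(p) for p in probes])
    result = {}
    for p in probes:
        rc = revcomp(p)
        # A probe equal to its own reverse complement is counted once.
        result[p] = both[p] + (both[rc] if rc != p else 0)
    return result

if __name__ == "__main__":
    print(probe_counts(["ACGTAC", "GTACGT"], ["ACG", "TAC", "GGG"]))
```

Sorting on the reversed string is one simple way to place probes with common suffixes next to each other; because FM-index backward search consumes a pattern from its last character to its first, the intervals already computed for a shared suffix can be reused rather than recomputed. The paper's actual reordering strategy may differ.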

Keywords

Pangenome, K-mer Query, Sequence Analysis, BWT
