A k-mer query tool for assessing population diversity in pangenomes

BCB (2021)

Abstract
Inexpensive and fast genome sequencing has yielded multiple genome assemblies that, taken together, can be considered a single pangenome model. However, applying conventional alignment-based sequence analysis to the assemblies of a pangenome is computationally expensive and largely redundant. Here, we present an alignment-free method that analyzes the relationship of any new sample to a given pangenome model using selected k-mer queries. We select a representative set of k-mers from the pangenome as probes and determine their frequencies in the raw short-read sequence data. The selection of probes is designed to cover every base of the pangenome, maximize sharing, and identify informative probes that discriminate between haplotypes. The k-mer frequencies are determined using an FM-index built over the raw sequence data of the new sample. Prior to the k-mer search, the probes are reordered to maximize the shared suffixes between successive k-mers, which reduces the overall run time compared to executing each search independently. We aggregate the forward and reverse k-mer probe counts, save them in the appropriate rows of a count matrix, and remap them back to their locations in the pangenome. The resulting probe database serves as a valuable resource for representing population-scale sequence variations based on the pangenome model.
Keywords
Pangenome, K-mer Query, Sequence Analysis, BWT
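
The query step described in the abstract (FM-index counting, probe reordering by shared suffix, and aggregation of forward and reverse counts) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the naive suffix-array construction, the "$" read separator, and the choice to count a palindromic k-mer only once are all assumptions made for illustration, and a real tool would use a compressed index built with a dedicated library.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA k-mer."""
    return kmer.translate(_COMPLEMENT)[::-1]


def build_fm_index(reads):
    """Build a toy FM-index over reads joined by '$' (hypothetical layout)."""
    text = "$".join(reads) + "$"
    # Naive suffix array: fine for a sketch, far too slow for real read sets.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    # c_table[c]: number of characters in the text strictly smaller than c.
    freq = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(freq):
        c_table[ch] = total
        total += freq[ch]
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in freq}
    for i, ch in enumerate(bwt):
        for c, column in occ.items():
            column[i + 1] = column[i] + (1 if c == ch else 0)
    return c_table, occ, len(bwt)


def count_occurrences(index, pattern):
    """Count occurrences of pattern with a plain backward search."""
    c_table, occ, n = index
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


def count_probes(index, probes):
    """Count all probes, reusing search state across shared suffixes."""
    c_table, occ, n = index
    # Sorting by the reversed string places k-mers with long common
    # suffixes next to each other.
    ordered = sorted(set(probes), key=lambda p: p[::-1])
    counts = {}
    prev = ""
    states = [(0, n)]  # states[j]: interval after matching the last j chars
    for probe in ordered:
        shared = 0
        limit = min(len(prev), len(probe))
        while shared < limit and prev[-1 - shared] == probe[-1 - shared]:
            shared += 1
        states = states[: shared + 1]  # keep only the reusable prefix of work
        lo, hi = states[-1]
        for ch in reversed(probe[: len(probe) - shared]):
            if lo < hi and ch in c_table:
                lo, hi = c_table[ch] + occ[ch][lo], c_table[ch] + occ[ch][hi]
            else:
                lo, hi = 0, 0  # empty interval: the probe does not occur
            states.append((lo, hi))
        counts[probe] = hi - lo
        prev = probe
    return counts


def probe_count_row(index, probes):
    """Aggregate forward and reverse-complement counts for each probe."""
    queries = list(probes) + [reverse_complement(p) for p in probes]
    raw = count_probes(index, queries)
    row = {}
    for p in probes:
        rc = reverse_complement(p)
        # Assumption: a k-mer equal to its own reverse complement is counted once.
        row[p] = raw[p] if rc == p else raw[p] + raw[rc]
    return row


if __name__ == "__main__":
    index = build_fm_index(["ACGTAC", "GTACGT"])
    print(probe_count_row(index, ["GTAC", "CGTA", "TTTT"]))
```

In this sketch, each returned dictionary plays the role of one sample's entries in the count matrix. The saving from reordering comes from `states`, which lets a probe resume the backward search at the point where its suffix diverges from the previous probe instead of starting from the full interval.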