On The Complexity Of Sequence-To-Graph Alignment

JOURNAL OF COMPUTATIONAL BIOLOGY(2020)

引用 18|浏览0
暂无评分
摘要
Availability of extensive genetic data across multiple individuals and populations is driving the growing importance of graph-based reference representations. Aligning sequences to graphs is a fundamental operation on several types of sequence graphs (variation graphs, assembly graphs, pan-genomes, etc.) and their biological applications. Although research on sequence-to-graph alignments is nascent, it can draw from related work on pattern matching in hypertext. In this article, we study sequence-to-graph alignment problems under Hamming and edit distance models, and linear and affine gap penalty functions, for multiple variants of the problem that allow changes in query alone, graph alone, or in both. We prove that when changes are permitted in graphs either standalone or in conjunction with changes in the query, the sequence-to-graph alignment problem is NP-complete under both Hamming and edit distance models for alphabets of size >= 2. For the case where only changes to the sequence are permitted, we present an O(|V|+m|E|) time algorithm, where m denotes the query size, and V and E denote the vertex and edge sets of the graph, respectively. Our result is generalizable to both linear and affine gap penalty functions, and improves upon the runtime complexity of existing algorithms.
更多
查看译文
关键词
approximate pattern matching,computational complexity,genomics,graph search,sequence alignment
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要