Pairwise Whole Genome Alignment (WGA) is a crucial first step to understanding evolution at the DNA-sequence level. Pairwise WGA of the thousands of species genomes currently available could enable new biological discoveries. However, computing alignments for even a fraction of the millions of possible pairs is prohibitive: WGA of a single pair of vertebrate genomes (human-mouse) takes 11 hours on a 96-core Amazon Web Services (AWS) instance (c5.24xlarge).
This paper presents SegAlign, a scalable, GPU-accelerated system for computing pairwise WGA. SegAlign is based on the standard seed-filter-extend heuristic, in which the filtering stage dominates the runtime (e.g., 98% for human-mouse WGA); SegAlign accelerates this stage on one or more GPUs. Using three vertebrate genome pairs, we show that SegAlign provides a speedup of up to 14x for WGA on an 8-GPU, 64-core AWS instance (p3.16xlarge) and a nearly 2.3x reduction in dollar cost. SegAlign also supports parallelization across multiple GPU nodes and scales efficiently.
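To make the seed-filter-extend heuristic concrete, the following is a minimal CPU-only sketch in Python. It is a toy illustration under stated assumptions, not SegAlign's implementation: the function names (`seed_hits`, `ungapped_score`, `smith_waterman`, `align`), the exact k-mer seeding, the scoring parameters, and the fixed-window gapped step are all hypothetical choices made for readability. The filter stage is the per-seed ungapped extension, which is the kind of independent, repeated work that lends itself to GPU parallelism.

```python
from collections import defaultdict


def seed_hits(target, query, k):
    """Seed stage: return (target_pos, query_pos) for every exact k-mer match."""
    index = defaultdict(list)
    for i in range(len(target) - k + 1):
        index[target[i:i + k]].append(i)
    hits = []
    for j in range(len(query) - k + 1):
        for i in index.get(query[j:j + k], ()):
            hits.append((i, j))
    return hits


def ungapped_score(target, query, i, j, k, match=1, mismatch=-1, xdrop=3):
    """Filter stage: extend a seed along its diagonal without gaps.

    Extension in each direction stops once the running score falls more
    than `xdrop` below the best score seen so far (assumed toy parameters).
    """
    def extend(ti, qj, step):
        best = score = 0
        while 0 <= ti < len(target) and 0 <= qj < len(query):
            score += match if target[ti] == query[qj] else mismatch
            if score > best:
                best = score
            elif best - score > xdrop:
                break
            ti += step
            qj += step
        return best

    return k * match + extend(i + k, j + k, 1) + extend(i - 1, j - 1, -1)


def smith_waterman(a, b, match=1, mismatch=-1, gap=-2):
    """Extend stage: best local gapped alignment score (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for c, y in enumerate(b, start=1):
            diag = prev[c - 1] + (match if x == y else mismatch)
            cur.append(max(0, diag, prev[c] + gap, cur[c - 1] + gap))
            best = max(best, cur[c])
        prev = cur
    return best


def align(target, query, k=4, threshold=6, window=8):
    """Run seed -> filter -> extend; return (target_pos, query_pos, score).

    Seeds on the same diagonal are not de-duplicated in this toy version.
    """
    results = []
    for i, j in seed_hits(target, query, k):
        # Discard seeds whose ungapped extension scores below the threshold.
        if ungapped_score(target, query, i, j, k) < threshold:
            continue
        # Gapped alignment restricted to a window around the surviving seed.
        t0, q0 = max(0, i - window), max(0, j - window)
        score = smith_waterman(target[t0:i + k + window],
                               query[q0:j + k + window])
        results.append((i, j, score))
    return results
```

For example, `align("TTACGTACGGTT", "GGACGTACGGCC")` finds six seed hits, drops the off-diagonal hit at `(1, 5)` in the filter stage, and passes the remaining five to the gapped step. Because only seeds that survive the cheap ungapped test reach the more expensive gapped alignment, the filter handles the bulk of the candidate hits, which is consistent with it dominating the runtime.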