
Novel algorithms and hardware designs for ultra-fast next-gen sequence analysis

United States of America National Institutes of Health (R01 HG006004), 2011-2015
PI: Onur Mutlu
Co-PI: Can Alkan
Subaward amount (4 years): $462,847
The goal of this project is to develop specialized hardware architectures that accelerate the mapping of reads generated by high-throughput sequencing platforms.

People

Principal Investigators: Assistant Prof. Can Alkan (Bilkent U.) and Assistant Prof. Onur Mutlu (Carnegie Mellon U.)
Students

CMU: Hongyi Xin, Donghyuk Lee, Samihan Yedkar, Damla Şenol Çalı
UCLA: Farhad Hormozdiari
Bilkent: Mustafa Korkmaz, Azita Nouri, Mohammed Alser, Tuğba Doğan

Abstract

Our proposed research aims to accelerate next-generation sequence analysis 1000-fold or more by combining our expertise in genomic sequence analysis, algorithm development, and computer architecture/engineering. Our plan to address the problem of processing unprecedented amounts of sequence data has three major components. First, we will develop and improve sophisticated software algorithms and tools that handle the large volumes of reads generated by all major next-generation sequencing (NGS) platforms without sacrificing sensitivity, while correcting for the sequencing biases associated with each platform. Our algorithms will also be able to map reads in the duplicated regions of the genome and report the underlying sequence variation. This capability is especially important for characterizing segmental duplications and structural variation, and no other read mapping tool currently offers it. Second, we will boost the performance and efficiency of our algorithms 100- to 1000-fold by running the inherently parallel computations of the sequence search problem on the massively parallel hardware engines available today: graphics processing units (GPUs). Finally, we will design specialized hardware architectures that speed up sequence analysis by further orders of magnitude while reducing its energy consumption by 100-fold or more.

Dissemination

SCALCE: boosting Sequence Compression Algorithms using Locally Consistent Encoding. Faraz Hach, Ibrahim Numanagić, Can Alkan, S. Cenk Sahinalp. Bioinformatics, Dec 1;28(23):3051-57, 2012.
Accelerating read mapping with FastHASH. Hongyi Xin, Donghyuk Lee, Farhad Hormozdiari, Samihan Yedkar, Onur Mutlu, Can Alkan. BMC Genomics, 14(Suppl 1):S13, 2013.
mrsFAST-Ultra: a compact, SNP-aware mapper for high performance sequencing applications. Faraz Hach*, Iman Sarrafi*, Farhad Hormozdiari, Can Alkan, Evan E. Eichler, S. Cenk Sahinalp. Nucl Acids Res, Jul;42(Web Server issue):W494-500, 2014.
Fast and accurate mapping of Complete Genomics reads. Donghyuk Lee*, Farhad Hormozdiari*, Hongyi Xin, Faraz Hach, Onur Mutlu, Can Alkan. Methods, [epub Oct 22; doi: 10.1016/j.ymeth.2014.10.012], 2014.
Shifted Hamming Distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping. Hongyi Xin, John Greth, John Emmons, Gennady Pekhimenko, Carl Kingsford, Can Alkan*, Onur Mutlu*. Bioinformatics, [epub Jan 10], 2015.
Optimal Seed Solver: Optimizing Seed Selection in Read Mapping. Hongyi Xin, Sunny Nahar, Richard Zhu, John Emmons, Gennady Pekhimenko, Carl Kingsford, Can Alkan*, Onur Mutlu*. Bioinformatics, Jun 1;32(11):1632-42, 2016.
MAGNET: understanding and improving the accuracy of genome pre-alignment filtering. Mohammed Alser, Onur Mutlu*, Can Alkan*. IPSI Transactions on Internet Research, 13(2), 2017.
GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping. Mohammed Alser, Hasan Hassan, Hongyi Xin, Oguz Ergin, Onur Mutlu*, Can Alkan*. Bioinformatics, Nov 1;33(21):3355-63, 2017.
GRIM-Filter: fast seed location filtering in DNA read mapping using processing-in-memory technologies. Jeremie S. Kim, Damla Senol Cali, Hongyi Xin, Donghyuk Lee, Saugata Ghose, Mohammed Alser, Hasan Hassan, Oguz Ergin, Can Alkan*, Onur Mutlu*. BMC Genomics, 19(Suppl 2):89, 2018.
Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions. Damla Senol Cali, Jeremie S. Kim, Saugata Ghose, Can Alkan*, Onur Mutlu*. Briefings in Bioinformatics, [epub Apr 2; doi: 10.1093/bib/bby017], 2018.