alkan lab / support

Integrated approaches for genomic variation discovery using high throughput sequencing

European Union Marie Curie Actions Career Integration Grant (PCIG10-GA-2011-303772), 2012-2016
PI: Can Alkan
Students: Marzieh Eslami Rasekh, Can Fırtına
Total funding: €100,000 over four years.
The goal of this project is to develop computational methods to understand genomic variation using high throughput sequencing (HTS), with a special focus on structural variation (SV), including copy-number variation (CNV) and balanced rearrangements (inversions, translocations), in the complex regions of the human genome that are rich in repeats and duplications.

Abstract

New sequencing technologies are revolutionizing genomics: they promise low-cost, high-throughput sequencing of both new species and different individuals, enabling a fuller analysis of the patterns of genetic variation. These “next-generation” platforms have started to contribute to our understanding of human genome diversity through the 1000 Genomes Project, which employs high throughput sequencing (HTS) methods to produce the most detailed map of human variation. Other large-scale sequencing projects have been initiated to characterize human genome diversity, to find genetic causes of disease, and to infer the evolutionary history of species. Although we can now generate data at a rate previously unimaginable, the analysis of such data lags behind, as currently available algorithms for analyzing HTS data show different strengths and biases for different classes of variation. There is a need to forge an alliance between computer science and genomics to devise better methods for using the massive amount of sequence data. Here we propose to develop novel algorithms to comprehensively and quickly discover all forms of genomic variants, including point mutations, indel polymorphisms, and structural variation, while resolving inconsistencies among different variants to accurately identify normal and disease-causing variation.

Dissemination

Accelerating read mapping with FastHASH. Hongyi Xin, Donghyuk Lee, Farhad Hormozdiari, Samihan Yedkar, Onur Mutlu, Can Alkan. BMC Genomics, 14(Suppl 1):S13, 2013.
Genome Sequencing Highlights the Dynamic Early History of Dogs. Adam H. Freedman, Ilan Gronau, Rena M. Schweizer, Diego Ortega-Del Vecchyo, Eunjung Han, Pedro M. Silva, Marco Galaverni, Zhenxin Fan, Peter Marx, Belen Lorente-Galdos, Holly Beale, Oscar Ramirez, Farhad Hormozdiari, Can Alkan, Carles Vilà, Kevin Squire, Eli Geffen, Josip Kusak, Adam R. Boyko, Heidi G. Parker, Clarence Lee, Vasisht Tadigotla, Adam Siepel, Carlos D. Bustamante, Timothy T. Harkins, Stanley F. Nelson, Elaine A. Ostrander, Tomas Marques-Bonet, Robert K. Wayne, John Novembre. PLoS Genetics, 10(1): e1004016, 2014.
Early postzygotic mutations contribute to de novo variation in a healthy monozygotic twin pair. Gülşah M Dal, Bekir Ergüner, Mahmut S Sağıroğlu, Bayram Yüksel, Onur Emre Onat, Can Alkan, Tayfun Özçelik. J Med Genet, 51(7):455-459, 2014.
mrsFAST-Ultra: a compact, SNP-aware mapper for high performance sequencing applications. Faraz Hach*, Iman Sarrafi*, Farhad Hormozdiari, Can Alkan, Evan E. Eichler, S. Cenk Sahinalp. Nucl Acids Res, Jul;42(Web Server issue):W494-500, 2014.
Fast and accurate mapping of Complete Genomics reads. Donghyuk Lee*, Farhad Hormozdiari*, Hongyi Xin, Faraz Hach, Onur Mutlu, Can Alkan. Methods, Jun;79-80:3-10, 2015.
Shifted Hamming Distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping. Hongyi Xin, John Greth, John Emmons, Gennady Pekhimenko, Carl Kingsford, Can Alkan* and Onur Mutlu*. Bioinformatics, [published online, Jan 10], 2015.
Optimal Seed Solver: Optimizing Seed Selection in Read Mapping. Hongyi Xin, Sunny Nahar, Richard Zhu, John Emmons, Gennady Pekhimenko, Carl Kingsford, Can Alkan*, Onur Mutlu*. Bioinformatics, Jun 1;32(11):1632-42, 2016.
Robustness of massively parallel sequencing platforms. Pınar Kavak, Bayram Yüksel, Soner Aksu, M. Oğuzhan Külekçi, Tunga Güngör, Faraz Hach, S. Cenk Sahinalp, Turkish Human Genome Project, Can Alkan*, M. Şamil Sağıroğlu*. PLoS ONE, Sep 18;10(9):e0138259, 2015.
A global reference for human genetic variation. 1000 Genomes Project Consortium. Nature, Oct 1; 526 (7571):68–74, 2015.
On genomic repeats and reproducibility. Can Firtina and Can Alkan. Bioinformatics, Aug 1;32(15):2243-7, 2016.
Discovery of large genomic inversions using long range information. Marzieh Eslami Rasekh, Giorgia Chiatante, Mattia Miroballo, Joyce Tang, Mario Ventura, Chris T. Amemiya, Evan E. Eichler, Francesca Antonacci*, Can Alkan*. BMC Genomics, Jan 10;18(1):65, 2017.
MAGNET: understanding and improving the accuracy of genome pre-alignment filtering. Mohammed Alser, Onur Mutlu*, Can Alkan*. IPSI Transactions on Internet Research, 13(2), 2017.
GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping. Mohammed Alser, Hasan Hassan, Hongyi Xin, Oguz Ergin, Onur Mutlu*, Can Alkan*. Bioinformatics, Nov 1; 33(21):3355-63, 2017.
Hercules: a profile HMM-based hybrid error correction algorithm for long reads. Can Firtina, Ziv Bar-Joseph, Can Alkan*, A. Ercument Cicek*. Nucleic Acids Research, 46(21): e125, 2018.