Integrated approaches for genomic variation discovery using high-throughput sequencing


New sequencing technologies are revolutionizing genomics, as they promise low-cost, high-throughput sequencing of both new species and many different individuals, enabling a fuller analysis of the patterns of genetic variation. These “next-generation” platforms have begun to contribute to our understanding of human genome diversity through the 1000 Genomes Project, which employs high-throughput sequencing (HTS) methods to produce the most detailed map of human genetic variation to date. Other large-scale sequencing projects have been initiated to characterize human genome diversity, to find the genetic causes of disease, and to infer the evolutionary history of species. Although we can now generate data at a previously unimaginable rate, the analysis of such data lags behind, because currently available algorithms for HTS data show different strengths and biases for different classes of variation. There is a need to forge an alliance between computer science and genomics to devise better methods for using this massive amount of sequence data. Here we propose to develop novel algorithms that comprehensively and rapidly discover all forms of genomic variation, including point mutations, indel polymorphisms, and structural variation, while resolving inconsistencies among different variants, in order to accurately identify both normal and disease-causing variation.