The multidisciplinary group will comprise companies that address biology, computers, and big data, initially addressing four challenges in genomic biology:

Genotypic Determinants of Human Disease: Researchers compare genomes of diseased patients and healthy individuals. Single-nucleotide polymorphisms (SNPs) are a common type of variation between individual genomes believed to have disease associations. Pairs or sets of SNPs in biologically interacting genes are likely to have much greater predictive power; identifying them involves sifting through combinations of millions of SNPs and is currently prohibitively costly.
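To make the scale of that search concrete, the short sketch below counts how many distinct SNP pairs and triples exist for a given number of SNPs. The figure of one million SNPs is an illustrative assumption, not a number taken from any particular study, and `combination_count` is a hypothetical helper name.

```python
import math


def combination_count(num_snps: int, set_size: int) -> int:
    """Number of distinct SNP sets of the given size (order ignored)."""
    return math.comb(num_snps, set_size)


# Illustrative assumption: a panel of one million SNPs.
num_snps = 1_000_000
pairs = combination_count(num_snps, 2)
triples = combination_count(num_snps, 3)

print(f"pairs:   {pairs:,}")
print(f"triples: {triples:,}")
```

Under this assumption there are roughly half a trillion pairs, and each one needs its own statistical test, which is why an exhaustive scan quickly becomes expensive.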

Understanding How Microbiomes Affect Health, Agriculture, and the Environment: This involves studying a large collection of microbial genomes present in a biological sample. By understanding the sample’s genomic composition, one can form hypotheses about its properties (e.g., disease correlates). The research holds tremendous promise, but problems of scalable data analysis arise from the complexity and heterogeneity of the samples and from the technological challenges associated with “next-generation sequencing” (NGS).

Detection of Genomic Variation: A major bottleneck in the vision of personalized medicine is the reliable detection of genomic variation in high-throughput sequence data produced by NGS. There is an urgent need to explore the relative contributions of the many causes of error in variant calling, as well as to build algorithmic solutions that correct those errors.

Gene Network Analysis: It is widely appreciated that genes act in complex networks; no gene acts alone. A key to using genomic information to understand and predict features of biological systems is the ability to model these networks and determine how they change under different environmental conditions, or in health and disease. New computing approaches will be necessary to integrate datasets at unprecedented scales to achieve necessary predictive power and insights.


Current Research Projects include:

From Analytics to Cognition: Taking Genomic Science to the Bedside | The goal of this project is to generate actionable intelligence by using smart analytics to integrate big data in the form of omics (genomics, transcriptomics, metabolomics, etc.), clinical data, and longitudinal data from electronic health records (EHR). Actionable intelligence is a descriptive piece of information with high confidence and accuracy that can be used to tailor and individualize diagnosis and therapeutics for a given patient or to identify potential candidates for biomarker discovery. The analytics and tools will be developed using engineering expertise at the University of Illinois in close collaboration and partnership with leading clinicians, biologists, and bioinformatics specialists at Mayo Clinic. This project is exploring societally relevant, prevalent, and yet less-understood diseases such as triple-negative breast cancer, major depressive disorder, and diabetes.

Information-Compression Algorithms for Genomic Data Storage and Transfer | Data compression is crucial for enabling timely exchange and long-term storage of heterogeneous biological and clinical data. To facilitate efficient organization and maintenance of genomic databases and to allow for fast random access, query, and search, specialized software solutions for compression and for computing in the compressive domain are being developed.
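As a rough illustration of compression that still permits fast random access, the sketch below packs a DNA string into two bits per base and reads back a single base without unpacking the rest. This is a generic textbook encoding, not the method developed in this project. The names `pack` and `base_at` are hypothetical, and the sketch assumes the input contains only A, C, G, and T (no ambiguity codes such as N and no quality scores).

```python
# Assumption: sequences contain only the four unambiguous bases.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a DNA string into 2 bits per base, 4 bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODES[base] << (2 * (i % 4))
    return bytes(out)


def base_at(packed: bytes, i: int) -> str:
    """Return the base at position i without decoding the whole sequence."""
    return BASES[(packed[i // 4] >> (2 * (i % 4))) & 0b11]


seq = "GATTACA"
packed = pack(seq)
print(len(seq), "bases stored in", len(packed), "bytes")
print("base at position 4:", base_at(packed, 4))
```

The caller must keep the original sequence length separately, because the final byte may contain padding bits that would otherwise decode as extra A bases.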

Improving the Accuracy of Genomic Variant Calling Through Deep Learning | This project will develop new deep learning approaches to tackle unsolved problems in variant calling (e.g., SNPs and small indels in low-complexity regions with ambiguity). Unlike traditional methods, our algorithms are intended not only to provide the best variant-calling quality but also to translate well across different application domains (germline/somatic), sequencing methods (WGS/exome/amplicon), and platforms (Illumina/IonTorrent). Our new machine-learning-based implementation will use industry-standard libraries, such as TensorFlow and STL, and target both GPUs and FPGAs for computation acceleration.

Scaling the Computation of Epistatic Interactions in GWAS Data | Calculating epistatic interactions between genomic variants in studies incorporating complex endophenotypes is a computationally challenging problem that requires emphasis on accelerating and parallelizing the code and achieving workload distribution efficiency. Development of fast production-grade software in this area will enable the detection of epistasis in many existing GWAS datasets, in both the biomedical and agricultural fields.
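One way to think about workload distribution for a pairwise scan is to number all variant pairs from 0 to n(n-1)/2 - 1 and hand each worker a contiguous, equally sized slice of that range. The sketch below shows only this partitioning step, under the assumption that every pair costs about the same to test. It is not the project's software, and `row_offset`, `unrank_pair`, and `chunk_bounds` are hypothetical helper names.

```python
def row_offset(i: int, n: int) -> int:
    """Number of pairs (a, b) with a < b whose first index a is below i."""
    return i * (2 * n - i - 1) // 2


def unrank_pair(k: int, n: int) -> tuple[int, int]:
    """Map a linear pair index k to the k-th pair (i, j), i < j, in lexicographic order."""
    lo, hi = 0, n - 2
    while lo < hi:  # binary search for the largest i with row_offset(i) <= k
        mid = (lo + hi + 1) // 2
        if row_offset(mid, n) <= k:
            lo = mid
        else:
            hi = mid - 1
    i = lo
    j = i + 1 + (k - row_offset(i, n))
    return i, j


def chunk_bounds(total: int, workers: int) -> list[tuple[int, int]]:
    """Split range(total) into contiguous slices whose sizes differ by at most one."""
    base, extra = divmod(total, workers)
    bounds, start = [], 0
    for w in range(workers):
        stop = start + base + (1 if w < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


num_variants = 5  # illustrative size only
total_pairs = num_variants * (num_variants - 1) // 2
for worker, (start, stop) in enumerate(chunk_bounds(total_pairs, 3)):
    pairs = [unrank_pair(k, num_variants) for k in range(start, stop)]
    print(f"worker {worker}: {pairs}")
```

Splitting by linear pair index rather than by first variant avoids the imbalance of a row-wise split, in which the worker holding the earliest variants would receive far more pairs than the others.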

Potential research projects:

