The multidisciplinary group will comprise companies that address biology, computers, and big data, initially addressing four challenges in genomic biology:

Genotypic Determinants of Human Disease: Researchers compare the genomes of diseased patients with those of healthy individuals. Single-nucleotide polymorphisms (SNPs) are a common type of variation between individual genomes and are believed to have disease associations. Pairs or sets of SNPs in biologically interacting genes are likely to have much greater predictive power than individual SNPs, but identifying them involves sifting through combinations drawn from millions of SNPs, which is currently prohibitively costly.
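To give a rough sense of why the combinatorial search is so costly, the sketch below counts the candidate SNP pairs and triples for a hypothetical panel. The panel size of one million SNPs and the function name `count_snp_combinations` are assumptions chosen for illustration, not figures or methods from the project itself.

```python
from math import comb


def count_snp_combinations(n_snps: int, set_size: int) -> int:
    """Return the number of distinct unordered SNP sets of the given size."""
    return comb(n_snps, set_size)


# Illustrative assumption: a panel of one million SNPs.
n_snps = 1_000_000

pairs = count_snp_combinations(n_snps, 2)
triples = count_snp_combinations(n_snps, 3)

print(f"candidate pairs:   {pairs:.3e}")
print(f"candidate triples: {triples:.3e}")
```

Even under this modest assumption, the number of pairs is on the order of hundreds of billions, and each added SNP in a set multiplies the search space by a further factor of roughly the panel size.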

Understanding How Microbiomes Affect Health, Agriculture, and the Environment: This involves studying a large collection of microbial genomes present in a biological sample. By understanding the sample’s genomic composition, one can form hypotheses about its properties (e.g., disease correlates). The research holds tremendous promise, but problems of scalable data analysis arise from the complexity and heterogeneity of the samples and from the technological challenges associated with “next-generation sequencing” (NGS).

Detection of Genomic Variation: A major bottleneck in the vision of personalized medicine is the reliable detection of genomic variation from high-throughput NGS data. There is an urgent need to explore the relative contributions of the many causes of errors in variant calling, as well as to build algorithmic solutions to correct those errors.

Gene Network Analysis: It is widely appreciated that genes act in complex networks; no gene acts alone. A key to using genomic information to understand and predict features of biological systems is the ability to model these networks and determine how they change under different environmental conditions, or in health and disease. New computing approaches will be necessary to integrate datasets at unprecedented scales and achieve the required predictive power and insight.

Below is a list of envisioned research projects: