Our lab is involved in multiple biomedical research projects, particularly in the areas of genetics and genomics. Below is a summary of the ongoing research in our lab.
1. Statistical methods for next-generation sequencing data
Emerging sequencing technologies have made whole-genome sequencing available to researchers studying various phenotypes and diseases of interest, with a particular focus on rare variant sites. Although the first sequencing projects mainly analyzed unrelated individuals, numerous sequencing studies that include related individuals have recently been carried out or launched as sequencing costs have fallen rapidly. However, methods for analyzing family-based sequence data have lagged behind, partly because of the complexity of family structures and computational barriers. In this study, our primary goals are to efficiently and accurately infer individual genotypes and haplotypes, the key component of any sequencing project, by combining information at both the family and population levels, and to study how differential sequencing errors affect downstream association analysis. This study is currently supported by NIH grant R01HG007358.
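As a rough illustration of how sequencing error and population-level information enter genotype inference, the sketch below computes genotype posteriors at a single biallelic site from reference and alternate read counts. It assumes independent reads, a single per-read error rate, and a Hardy-Weinberg prior built from a known population allele frequency. The function names are hypothetical, and the sketch leaves out family-level (Mendelian transmission) information; it is not the method developed in this project.

```python
from math import comb

def genotype_likelihoods(n_ref, n_alt, error):
    """P(read counts | genotype) for genotypes carrying 0, 1 or 2 alternate alleles."""
    n = n_ref + n_alt
    likelihoods = []
    for g in (0, 1, 2):
        # Chance that one read shows the alternate allele: true alternate reads
        # called correctly plus true reference reads miscalled as alternate.
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        likelihoods.append(comb(n, n_alt) * p_alt ** n_alt * (1 - p_alt) ** n_ref)
    return likelihoods

def genotype_posterior(n_ref, n_alt, error, alt_freq):
    """Combine read-level likelihoods with a Hardy-Weinberg prior (assumed population model)."""
    prior = [(1 - alt_freq) ** 2, 2 * alt_freq * (1 - alt_freq), alt_freq ** 2]
    liks = genotype_likelihoods(n_ref, n_alt, error)
    joint = [lik * p for lik, p in zip(liks, prior)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical site: 9 reference reads, 1 alternate read, 1% error, rare variant
print(genotype_posterior(9, 1, error=0.01, alt_freq=0.01))
```

Letting `error` differ between samples or sequencing batches is one simple way to explore how differential sequencing errors could shift genotype calls, and hence downstream association results.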
2. Genetic studies of complex diseases
AMD: Genome-wide association studies (GWAS) have led to notable successes in identifying multiple loci associated with the risk of age-related macular degeneration (AMD), either through individual studies or through meta-analyses of multiple studies within consortia. At the same time, there is growing interest in disease progression (e.g., time to advanced AMD). Combining genome-wide genetic data with progression outcomes will greatly advance our knowledge of disease biology and improve disease prediction. However, genome-wide analyses of survival or longitudinal data have not been well formulated or widely performed for AMD, and the genes that affect AMD progression remain largely unknown. Emerging genetic and phenotypic data from our collaborators provide a unique opportunity for us to develop statistical methods using existing data sets and to facilitate ongoing consortium studies in which we are involved. This study is currently supported by NIH grant R01EY024226.
Asthma: Puerto Ricans bear a disproportionate burden of childhood asthma in the United States, yet no disease-susceptibility genes have been confidently identified in this ethnic group. Over the last six years, our collaborator Dr. Celedon has used funding from NIH (grant HL079966) and internal sources to collect genetic, epigenetic and phenotypic data on up to 1,127 Puerto Rican school-aged children (with and without asthma), including genome-wide (GW) genotypes, expression profiles and DNA methylation. Building on this work, we obtained new NIH funding (grant R01 HL117191) to conduct GW studies of DNA methylation and gene expression using DNA/RNA from two tissues (white blood cells [WBCs] and nasal epithelium) in Puerto Rican children, thus expanding our sample size. On the basis of our preliminary studies, we hypothesize that single nucleotide polymorphisms (SNPs) that are more common in West Africans than in members of other ethnic groups (“African”) and SNPs that are common across the ancestral racial groups of Puerto Ricans (“cosmopolitan”) influence asthma and lung function in Puerto Rican children. To test this hypothesis, we will use an approach that integrates our “omics” data in a well-characterized cohort of Puerto Rican children.
3. Integrative analysis of omics data
Motivated by the rapidly growing datasets from our collaborator, we are interested in analyzing multi-omics data in a unified framework. In particular, we aim to understand the following relationships among DNA (genotype), RNA (gene expression), and methylation data measured on the same cohort of samples across multiple tissues: A) expression quantitative trait loci (eQTLs), genetic variants associated with gene expression; B) methylation quantitative trait loci (mQTLs), genetic variants associated with DNA methylation levels; C) expression quantitative trait methylation (eQTMs), methylation sites associated with gene expression.
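To make the three relationships concrete, the hedged sketch below tests a single pair of features with ordinary least-squares regression and reports the slope and its t statistic. It is a simplification for illustration only: a real eQTL, mQTL or eQTM scan would adjust for covariates (for example ancestry, sex and batch), handle each tissue separately or jointly, and correct for multiple testing across millions of pairs. The function name and toy values are hypothetical.

```python
import math

def association_test(x, y):
    """Fit y = a + b*x by ordinary least squares; return (b, t statistic for b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, with n - 2 degrees of freedom for the error variance
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se_slope

# Hypothetical toy data for one SNP, one CpG site and one gene in six samples
genotype = [0, 0, 1, 1, 2, 2]                        # alternate-allele dosage
methylation = [0.80, 0.75, 0.60, 0.62, 0.41, 0.45]   # methylation beta values
expression = [1.0, 1.2, 2.0, 2.1, 3.1, 2.9]          # normalized expression

print("eQTL (expression ~ genotype):", association_test(genotype, expression))
print("mQTL (methylation ~ genotype):", association_test(genotype, methylation))
print("eQTM (expression ~ methylation):", association_test(methylation, expression))
```

The same regression serves all three relationships; only the choice of predictor and outcome changes, which is one reason a unified framework across data types is attractive.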
4. Statistical methods for admixed populations
Genetic admixture is the interbreeding of individuals from different ancestral populations, such as Europeans, Native Americans, and Africans. Inferring ancestry proportions from whole-genome genetic data of admixed individuals, such as Hispanics and African Americans, is useful in a broad array of applications, most notably mapping disease loci through admixture linkage and studying the origins and histories of human populations. We are building a computational pipeline that allows researchers to systematically and flexibly evaluate existing programs. On top of it, we are developing new methods for local ancestry inference in the context of sequencing data.
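As a minimal sketch of the simpler global-ancestry problem, the code below estimates one individual's ancestry proportion q under two-way admixture by maximizing a binomial likelihood in which the alternate-allele frequency at each SNP is q*p_A + (1 - q)*p_B, given reference-panel frequencies p_A and p_B. It assumes known reference frequencies, unlinked SNPs and a simple grid search in place of an EM or Newton update; the function names are hypothetical, and this is not the lab's method or the implementation of any particular program. Three-way admixture (e.g., European, Native American and African) and local ancestry along the chromosome, which is commonly modeled with a hidden Markov model, are not covered here.

```python
import math

def log_likelihood(q, genotypes, freq_a, freq_b, eps=1e-6):
    """Binomial log-likelihood of alternate-allele counts (0, 1, 2) given proportion q.

    The binomial coefficient is dropped because it does not depend on q.
    """
    ll = 0.0
    for g, pa, pb in zip(genotypes, freq_a, freq_b):
        f = q * pa + (1 - q) * pb
        f = min(max(f, eps), 1 - eps)  # guard against log(0) for fixed alleles
        ll += g * math.log(f) + (2 - g) * math.log(1 - f)
    return ll

def estimate_ancestry_proportion(genotypes, freq_a, freq_b, grid_size=1000):
    """Return the grid value of q in [0, 1] with the highest log-likelihood."""
    best_q, best_ll = 0.0, -math.inf
    for i in range(grid_size + 1):
        q = i / grid_size
        ll = log_likelihood(q, genotypes, freq_a, freq_b)
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q
```

A grid search is slow but transparent; with many SNPs or individuals, an iterative optimizer over the same likelihood would be the natural replacement.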