Below is a summary of ongoing research projects in our lab.

1. Statistical methods for next-generation sequencing data

Emerging sequencing technologies have made whole-genome sequencing available to researchers studying various phenotypes and diseases of interest, with a particular focus on rare variant sites. Although the first batch of sequencing projects focused mainly on the analysis of unrelated individuals, numerous sequencing studies that include related individuals have recently been carried out or launched as sequencing costs fall rapidly. However, methodologies for analyzing family-based sequence data lag behind, partly because of the complexity of family structures and computational barriers. In this study, our primary goals are to efficiently and accurately infer individual genotypes and haplotypes, the key component of any sequencing project, by combining information at both the family and population levels, and to study how differential sequencing errors affect downstream association analysis. This study is currently supported by NIH grant R01HG007358. In addition to DNA-sequencing data, our lab has extensive experience in analyzing other types of omics data, including RNA-seq, methylation, ATAC-seq, and single-cell sequencing data.

2. Genetic studies of complex diseases

AMD: Genome-wide association studies (GWAS) have led to notable successes in identifying multiple loci associated with the risk of age-related macular degeneration (AMD). Disease risk loci have been identified through GWAS, either by individual studies or through consortium meta-analyses of multiple studies. At the same time, there is growing interest in disease progression (e.g., time to advanced AMD). Combining genome-wide genetic data with progression outcomes will greatly advance our knowledge of disease biology and its prediction. Genome-wide analysis of survival or longitudinal data has not yet been well formulated or performed in the area of AMD, and genes that affect AMD progression are largely unknown. Emerging genetic and phenotypic data from our collaborators provide a unique opportunity for us to develop statistical methods on existing data sets and to facilitate ongoing consortium studies in which we are involved. This study is currently supported by NIH grant R01EY024226.

Asthma: Puerto Ricans share a disproportionate burden of childhood asthma in the United States, yet no disease-susceptibility genes have been confidently identified in this ethnic group. Over the last six years, our collaborator Dr. Celedon has used funding from NIH (grant HL079966) and internal sources to collect genetic, epigenetic, and phenotypic data in up to 1,127 Puerto Rican school-aged children (with and without asthma), including genome-wide (GW) genotypes, expression profiling, and DNA methylation. Building on this work, we obtained new NIH funding (grant R01 HL117191) to conduct a GW study of DNA methylation and gene expression using DNA/RNA from two tissues (white blood cells [WBCs] and nasal epithelium) in Puerto Rican children, thus expanding our sample size. On the basis of our preliminary studies, we hypothesize that single nucleotide polymorphisms (SNPs) that are more common in West Africans than in members of other ethnic groups (“African”) and SNPs that are common across the racial ancestral groups of Puerto Ricans (“cosmopolitan”) influence asthma and lung function in Puerto Rican children. To test this hypothesis, we will use an approach that integrates our “omics” data in a well-characterized cohort of Puerto Rican children.

3. Statistical methods for single cell transcriptomic data

The study of the single-cell transcriptome can shed light on cellular and molecular processes at single-cell resolution, which is essential for understanding the heterogeneity of cell populations across different conditions. Recently developed droplet-based single-cell transcriptome sequencing (scRNA-seq) technology, such as the 10X Genomics Chromium system, can measure the gene expression of tens of thousands of single cells from multiple individuals simultaneously, in a short time and at relatively low cost, making population-scale single-cell transcriptome studies feasible. Despite progress in methods for analyzing scRNA-seq data from early-generation platforms, there is a severe lack of tailored statistical methods and efficient computational tools for analyzing the order-of-magnitude larger scRNA-seq data sets produced by this new-generation platform. Our group is the first at the University of Pittsburgh to study single-cell transcriptomic profiling using the new droplet-based platform (10X Genomics). Collaborating with world-class experts at UPMC and Pitt, we have generated high-quality scRNA-seq data (thousands of cells per sample) to study human diseases. We are also actively developing novel statistical methods to facilitate our ongoing analysis and address new challenges. We have developed DIMM-SC for clustering droplet-based single-cell data.

4. Integrative analysis of omics data

Motivated by the massive high-throughput omics datasets from our collaborator, we are interested in analyzing multi-omics data in a unified framework. In particular, we are trying to understand the following relations among DNA, RNA, methylation, and ATAC-seq data on the same cohort of samples from multiple tissues: A) expression quantitative trait loci (eQTLs); B) methylation quantitative trait loci (mQTLs); C) expression quantitative trait methylations (eQTMs); D) chromatin accessibility QTLs (caQTLs).
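
For readers unfamiliar with these terms, the basic idea behind a single eQTL test can be sketched as a regression of a gene's expression level on the genotype dosage (0, 1, or 2 copies of an allele) at one variant. The example below is only an illustration under simplifying assumptions (no covariates, no relatedness, no multiple-testing correction); the function name `simple_eqtl_fit` and the toy data are hypothetical and do not represent the lab's actual pipeline. An mQTL or caQTL test has the same shape, with methylation level or chromatin accessibility in place of expression.

```python
from statistics import mean


def simple_eqtl_fit(dosages, expression):
    """Ordinary least-squares fit of expression on genotype dosage.

    Hypothetical helper for illustration only.
    Returns (slope, intercept, r_squared) for one variant-gene pair.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")

    g_bar = mean(dosages)
    e_bar = mean(expression)

    # Sums of squares and cross-products around the means.
    sxx = sum((g - g_bar) ** 2 for g in dosages)
    syy = sum((e - e_bar) ** 2 for e in expression)
    sxy = sum((g - g_bar) * (e - e_bar) for g, e in zip(dosages, expression))

    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")

    slope = sxy / sxx                  # expression change per allele copy
    intercept = e_bar - slope * g_bar  # expected expression at dosage 0
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Toy data: six samples, dosage 0/1/2, made-up expression values.
    toy_dosages = [0, 1, 2, 0, 1, 2]
    toy_expression = [1.1, 1.9, 3.2, 0.8, 2.1, 2.9]
    print(simple_eqtl_fit(toy_dosages, toy_expression))
```

In practice, a genome-wide scan repeats a test of this kind across many variant-gene pairs and adjusts for covariates and multiple testing, which this sketch leaves out.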

An Incomplete List of Our Collaborators

Dr. Juan Celedon (Pitt)
Dr. Jay Kolls (Pitt)
Dr. Daniel Weeks (Pitt)
Dr. Timothy Billiar (Pitt)
Dr. Ying Ding (Pitt)
Dr. George Tseng (Pitt)
Dr. Rick Duerr (Pitt)
Dr. Robert Lafyatis (Pitt)
Dr. Zhao Ren (Pitt)
Dr. Bingshan Li (Vanderbilt)
Dr. Yun Li (North Carolina)
Dr. Ming Hu (Cleveland Clinic)
Dr. Goncalo Abecasis (Michigan)
Dr. Anand Swaroop (NEI)
Dr. Emily Chew (NEI)
Dr. Ruzong Fan (Georgetown)