Research

Multivariate Survival Analysis with High-dimensional Genetic Variants

I develop copula-based parametric and semiparametric regression models for bivariate censored data (right- and interval-censored) and perform GWAS to identify genetic variants significantly associated with the progression of a bilateral eye disease, Age-related Macular Degeneration (AMD). I also developed CopulaCenR, the first R package for copula-based modeling and testing of both bivariate right-censored and bivariate interval-censored data.
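As a minimal illustration of the copula idea (not the CopulaCenR implementation), the joint survival function of the two eyes can be written as a copula applied to the two marginal survival functions. The sketch below assumes a Clayton copula and exponential margins purely for simplicity; the function names and rate parameters are hypothetical.

```python
import math


def clayton_copula(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    if u <= 0.0 or v <= 0.0:
        return 0.0  # boundary case: C(0, v) = C(u, 0) = 0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def exp_survival(t, rate):
    """Marginal survival S(t) = exp(-rate * t); an illustrative assumption only."""
    return math.exp(-rate * t)


def joint_survival(t1, t2, rate1, rate2, theta):
    """Joint survival P(T1 > t1, T2 > t2) = C(S1(t1), S2(t2))."""
    return clayton_copula(exp_survival(t1, rate1), exp_survival(t2, rate2), theta)


def kendall_tau(theta):
    """Kendall's tau implied by a Clayton copula: theta / (theta + 2)."""
    return theta / (theta + 2.0)
```

Larger values of theta give stronger positive dependence between the two event times, and the joint survival probability is never below the product of the two marginal survival probabilities.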

Progression of AMD
Genetic variants that are significantly associated with AMD progression
  • Tao Sun, Ying Ding. Copula-based Semiparametric Regression Method for Bivariate Data under General Interval Censoring. (An earlier version won the 2019 ENAR Distinguished Student Paper Award). Biostatistics. In Press. doi:10.1093/biostatistics/kxz032.
  • Tao Sun, Yi Liu, Richard J. Cook, Wei Chen, Ying Ding. (2019). Copula-based Score Test for Bivariate Time-to-event Data, with Application to a Genetic Study of AMD Progression. Lifetime Data Analysis. 25(3), 546–568. PMID: 30560439.
  • Tao Sun, Ying Ding. CopulaCenR: Copula-based Regression Models for Bivariate Censored Data in R. R Journal. Accepted. Software available in [CRAN].

Deep-learning Survival Model for Prognostic Prediction and Subgroup Identification

Recent advances in deep learning have produced flexible and powerful prediction models. However, applications of deep learning in biomedical research remain limited. I developed a novel deep learning survival prediction framework for time-to-event outcomes. One advantage of the deep learning model is that it can effectively extract complex interactions among features, enabling the identification of patient subgroups at different risk levels. I then built a deep learning model for predicting AMD progression based on the GWAS results from two large-scale clinical trials (AREDS1 and AREDS2).
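For readers unfamiliar with survival losses, the sketch below shows one commonly used training objective for neural survival models, the negative log Cox partial likelihood. It is an assumption that a loss of this type applies here; the text above does not specify the loss, and the function name and arguments are hypothetical.

```python
import math


def neg_log_partial_likelihood(risk_scores, times, events):
    """Average negative log Cox partial likelihood (no correction for tied times).

    risk_scores: model outputs, higher means higher predicted risk
    times:       observed follow-up times
    events:      1 if the event was observed, 0 if right-censored
    """
    n = len(times)
    loss = 0.0
    n_events = 0
    for i in range(n):
        if not events[i]:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at the i-th event time.
        denominator = sum(
            math.exp(risk_scores[j]) for j in range(n) if times[j] >= times[i]
        )
        loss -= risk_scores[i] - math.log(denominator)
        n_events += 1
    return loss / max(n_events, 1)
```

The loss is smaller when subjects who fail earlier receive higher risk scores, which is the property a network trained on it is encouraged to learn.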

A neural network with two hidden layers (from cs231n.github.io)
GWAS-based DL model identifies subgroups with distinct progression profiles.
  • Tao Sun, Yue Wei, Wei Chen, Ying Ding. GWAS-based Deep Learning for Survival Prediction. Statistics in Medicine. Accepted.

Goodness-of-fit Tests on Copula Specification under Censoring

There is limited work on testing the goodness of fit of a copula model in the presence of censoring, especially interval censoring. I develop a computationally efficient testing procedure under both right- and interval-censoring. The proposed method is applicable to any Archimedean copula family with an explicit form.
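For reference, the standard textbook definition of an Archimedean copula with an explicit form is given below, together with two well-known members (Clayton and Gumbel). These are general definitions, not the test statistic itself.

```latex
\begin{align*}
% Generator \varphi_\theta: continuous, strictly decreasing, convex, \varphi_\theta(1) = 0
C_\theta(u, v) &= \varphi_\theta^{-1}\bigl(\varphi_\theta(u) + \varphi_\theta(v)\bigr),
  \qquad u, v \in [0, 1] \\[4pt]
\text{Clayton:}\quad \varphi_\theta(t) &= \frac{t^{-\theta} - 1}{\theta},
  \qquad C_\theta(u, v) = \bigl(u^{-\theta} + v^{-\theta} - 1\bigr)^{-1/\theta},
  \qquad \theta > 0 \\[4pt]
\text{Gumbel:}\quad \varphi_\theta(t) &= (-\log t)^{\theta},
  \qquad C_\theta(u, v) = \exp\Bigl\{-\bigl[(-\log u)^{\theta} + (-\log v)^{\theta}\bigr]^{1/\theta}\Bigr\},
  \qquad \theta \ge 1
\end{align*}
```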

Copula families with various dependence structures
  • Tao Sun, Yu Cheng, Ying Ding. Goodness-of-fit Test for Specification of Copula Models for Bivariate Time-to-event Data. In preparation.

Utilizing Large-scale National Health Survey Data and Statistical Methods for Public Health Discoveries and Applications

I have extensive experience working with the full cycles of the CDC National Health and Nutrition Examination Survey (NHANES 1999-2016). Specifically, I integrated over 1,000 raw data files across 18 years into a single unified dataset with ~100,000 subjects and ~10,000 features. Combining the rich NHANES data with statistical methods enables the discovery of novel disease risk factors and the accurate diagnosis and prediction of diseases. In addition, I have linked the full cycles of NHANES data with mortality data from the National Center for Health Statistics (NCHS).
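The integration step can be pictured as joining component files on the NHANES respondent sequence number (SEQN) within each cycle and then stacking the cycles. The sketch below is a toy illustration with in-memory tables, not the actual pipeline; the function names and the feature names in the usage example are hypothetical.

```python
def merge_components(components):
    """Outer-join several {SEQN: {feature: value}} tables from one survey cycle."""
    merged = {}
    for table in components:
        for seqn, features in table.items():
            # Create the subject's row on first sight, then add this file's features.
            merged.setdefault(seqn, {}).update(features)
    return merged


def stack_cycles(cycles):
    """Stack per-cycle tables into one list of rows, tagging each row with its cycle."""
    rows = []
    for cycle_label, table in cycles.items():
        for seqn, features in sorted(table.items()):
            rows.append({"SEQN": seqn, "cycle": cycle_label, **features})
    return rows


# Usage with made-up values: two component files in one cycle, plus a second cycle.
exam = {1: {"height_cm": 170.0}, 2: {"height_cm": 160.0}}
lab = {1: {"blood_lead": 1.2}, 3: {"blood_lead": 0.8}}
unified = stack_cycles({
    "1999-2000": merge_components([exam, lab]),
    "2001-2002": {10: {"height_cm": 180.0}},
})
```

Subjects missing from a component file simply lack those features in their row, which mirrors the missingness pattern of an outer join.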

NHANES is a large dataset with ~100,000 subjects and ~10,000 features (Picture from CDC)
  • Ge Yang*, Tao Sun*, Yueh-Ying Han, Franziska Rosser, Erick Forno, Wei Chen, Juan C. Celedón. Serum cadmium and lead, wheezing and lung function in a nationwide study of adults in the United States. Journal of Allergy and Clinical Immunology: In Practice. In Press. PMID:31146018. (* co-first author)
  • Ge Yang, Yueh-Ying Han, Tao Sun, Ling Li, Franziska Rosser, Erick Forno, Sanjay R. Patel, Wei Chen, Juan C. Celedón. Sleep duration, current asthma, and lung function in a nationwide study of U.S. adults. American Journal of Respiratory and Critical Care Medicine. In Press. PMID: 31225970.
  • Tao Sun, Minyue Liu, Wei Chen, Juan C. Celedón. Computer-aided Asthma Diagnosis in School-age Children. In preparation. [shiny]

Omics (e.g., bulk RNA and single-cell RNA sequencing) data analysis for novel biomarker and cell type discoveries

DE analyses discovered biomarkers that are differentially expressed between two patient groups: volcano plot (left) with significant genes marked in red and heatmap (right) of gene expression in the two groups.
  • Tao Sun, Zhe Sun, Yale Jiang, Annabel Ferguson, Joseph M. Pilewski, Jay K. Kolls, Wei Chen, Kong Chen. (2019). Transcriptomic responses to Ivacaftor and prediction of Ivacaftor clinical responsiveness. American Journal of Respiratory Cell and Molecular Biology. 61(5):643-652. PMID:30995102. (Editorial Highlight)
scRNA analyses revealed distinct subtypes within cells that were previously thought to be homogeneous: k-means clustering (left) and differential expression heatmap (right).
scRNA sequencing is a rapidly emerging source of big data (Picture from Human Cell Atlas)
  • Hiroshi Yano, Deepali Sawant, Maria Chikina, Qianxia Zhang, Zhe Sun, Tao Sun, Wei Chen, Creg Workman, Dario Vignali. (2019). Adaptive plasticity of IL10+ and IL35+ regulatory T cells and their cooperative regulation of anti-tumor immunity. Nature Immunology. In Press. PMID: 30936494.

Causal inference in observational data

Propensity score analysis
  • Gabrielle Snyder, Claudia Holzman, Tao Sun, Marnie Bertolet, Bertha Bullen, Janet M. Catov. (2018). Breastfeeding greater than six months is associated with smaller maternal waist circumference up to one decade after delivery. Journal of Women’s Health. 28(4):462-472. PMID: 30481097.

Other collaborative works

  • Yale Jiang, Olena Grozieva, Ting Wang, Erick Forno, Nadia Boutaoui, Tao Sun, Edna Acosta-Perez, Glorisa Canino, Erik Melen, Wei Chen, Juan C. Celedón. (2019). Transcriptomics of atopy and atopic asthma in white blood cells from children and adolescents. European Respiratory Journal. In Press. PMID: 30923181.
  • Kristy Boggs, Ting Wang, Abrahim Orabi, Amitava Mukherjee, John Eisses, Tao Sun, Li Wen, Tanveer Javed, Farzad Esni, Wei Chen, Sohail Husain. (2018). Pancreatic gene expression during recovery after pancreatitis reveals unique transcriptome profiles. Scientific Reports. PMID: 29362419.
  • Qi Yan, Ying Ding, Yi Liu, Tao Sun, Lars G Fritsche, Traci Clemons, Rinki Ratnapriya, Michael Klein, Richard Cook, Yu Liu, Ruzong Fan, Lai Wei, Gonçalo Abecasis, Anand Swaroop, Emily Chew, AREDS2 Research Group, Daniel Weeks, Wei Chen. (2018). Genome-wide Analysis of Disease Progression in Age-related Macular Degeneration. Human Molecular Genetics. 27(5):929-940. PMID: 29346644.
  • Adam Christopher, Abraham Apfel, Tao Sun, Jackie Kreutzer, David Ezon. (2018). Diastolic velocity half time is associated with aortic coarctation gradient at catheterization independent of echocardiographic and clinical blood pressure gradients. Congenital Heart Disease. PMID: 30395387.
  • Sergiu Abramovici, Arun Antony, Maria Elizabeth Baldwin, Alexandra Urban, Gena Ghearing, Julie Pan, Tao Sun, Robert Todd Krafty, R. Mark Richardson, Anto Bagic. (2017). Features of Simultaneous Scalp and Intracranial EEG That Predict Localization of Ictal Onset Zone. Clinical EEG and Neuroscience. PMID: 29067832.
  • Abhinav P. Acharya, Kathryn M. Theisen, Andres Correa, Thiagarajan Meyyappan, Abraham Apfel, Tao Sun, Tatum V. Tarin, and Steven R. Little. (2017). An inexpensive, point-of-care urine test for bladder cancer in patients undergoing hematuria evaluation. Advanced Healthcare Materials. PMID: 28885787.
  • Joshua Mattila, Pauline Maiello, Tao Sun, Laura Via, JoAnne Flynn. (2015). Granzyme B-expressing neutrophils correlate with bacteria load in granulomas from Mycobacterium tuberculosis-infected cynomolgus macaques. Cellular Microbiology. PMID: 25653138.