Monday, October 14, 2019
Davis Science Building / 301

Uncovering cluster structures, identifying relevant features, and finding key relationships between variables can yield important insights when analyzing high-dimensional data. In this talk, I will present methods we have proposed to address these problems in a unified manner. I will start by discussing variable selection in the context of unsupervised clustering, where the goal is to uncover the latent classes while identifying the variables that discriminate between the groups. One example is using genomic data to simultaneously discover disease subtypes and locate markers that distinguish between them. In the second part of the talk, I will focus on the problem of relating two high-dimensional data sets, as in integrative genomic studies, where the interest lies in finding relationships between genomic data from different sources. I will discuss methods we have proposed that combine ideas from mixtures of regression models and variable selection to uncover correlated response profiles and to identify cluster-specific subsets of covariates. Finally, I will present mixtures of regression trees, which allow us to perform variable selection while accommodating non-linear relationships and accounting for interaction effects. I will illustrate the methods with various applications.


Public event