Dear all,
The next seminar in our series of statistics talks for people interested in quantitative methods in biology and medicine will take place on Thursday 30 October 2014 at 11:00 in the Salle Delachaux (Biopôle 2, first floor, Route de la Corniche 10, Lausanne, M2: Vennes).
The speaker will be Dr. Aaron McDaid (IUMSP, postdoc with Zoltan Kutalik), who will speak about "MCMC for mixtures of Gaussians and model selection".
Abstract: In cluster analysis, it is usual to consider mixtures of Gaussian distributions. In this presentation, we consider six multivariate models of such mixtures which differ in the restrictions placed on the covariance matrices. The covariance may be different in each cluster or constrained to be equal. The covariance may be restricted to be diagonal-only or, further, such that each element of the diagonal is equal (spherical clusters). The most general covariance model places no restriction on the covariance matrices. Estimating the number of clusters and selecting one from the six covariance models in a fully Bayesian framework is the main goal of this research. Through the use of integration and conjugate priors, such as the Wishart distribution for covariances, we are able to compute the joint probability of the number of clusters, the assignment vector and the observed data, given a particular choice from the six covariance models, where all other quantities (means, covariances, mixing proportions) have been integrated out. We have used that joint probability in the 'allocation sampler', an existing MCMC algorithm to which we have added some novel moves to speed it up, to sample estimates for the number of clusters and for the assignment vector. This allowed a direct estimation of the number of clusters, freeing us from the need to run the algorithm repeatedly, once for each possible number of clusters. The challenging problem was then to select one model from the six covariance models. We show how we can run the six MCMC chains in parallel and combine samples from the six chains, in proportion to the Bayes factors, using an augmented data model based on a method by Carlin and Chib. This algorithm is illustrated and compared to other classical approaches to cluster analysis using six real data sets.
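For readers who would like a concrete picture before the talk, here is a small illustrative sketch in Python. It is not taken from the speaker's work. It assumes that the six covariance models are the combinations of three shapes (spherical, diagonal, unrestricted) with two sharing options (one covariance for all clusters, or one per cluster), and it simply counts the free covariance parameters under each. The function names and labels are hypothetical.

```python
from itertools import product

# Assumed reading of the abstract: 3 shapes x 2 sharing options = 6 models.
SHAPES = ("spherical", "diagonal", "full")


def covariance_parameter_count(shape, shared, n_clusters, dim):
    """Count the free parameters in the covariance part of a Gaussian mixture."""
    per_matrix = {
        "spherical": 1,                  # one common variance on the diagonal
        "diagonal": dim,                 # one variance per dimension
        "full": dim * (dim + 1) // 2,    # symmetric dim x dim matrix
    }[shape]
    # A shared covariance is counted once, otherwise once per cluster.
    return per_matrix if shared else per_matrix * n_clusters


def six_models(n_clusters, dim):
    """Map each (shape, sharing) label to its covariance parameter count."""
    return {
        (shape, "shared" if shared else "per-cluster"):
            covariance_parameter_count(shape, shared, n_clusters, dim)
        for shape, shared in product(SHAPES, (True, False))
    }


if __name__ == "__main__":
    for label, count in six_models(n_clusters=3, dim=2).items():
        print(label, count)
```

The count shows how quickly the unrestricted per-cluster model grows with the dimension compared to the spherical ones, which is one reason choosing among the six models matters.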
Hope to see you there!
Valentin
(for the organizers: Jérôme Goudet, Valentin Rousson, Frédéric Schütz)