Probability and Statistics Seminar

03/04/2018, 11:00 — 12:00 — Room P3.10, Mathematics Building
Escola Superior de Comunicação Social, Instituto Politécnico de Lisboa

Challenges of Clustering

Grouping similar objects to produce a classification is one of the basic abilities of human beings. It is one of the primary milestones of a child's concrete operational stage and continues to be used throughout adult life, playing a very important role in how we analyse our world. Beyond being a practical skill, clustering is also commonly used in several application areas, such as the social sciences, medicine, biology, engineering and computer science. Despite this wide application, two questions remain open research issues: (i) how many clusters should be selected? and (ii) which variables are relevant for clustering? Both are crucial to obtaining the best solution. We will answer them using a model-based approach built on finite mixture distributions and information criteria: the Bayesian Information Criterion (BIC), Akaike's Information Criterion (AIC), the Integrated Completed Likelihood (ICL) and Minimum Message Length (MML).
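
As a rough illustration of question (i), the sketch below fits univariate Gaussian mixtures with 1 to 4 components by EM and compares them with AIC and BIC, where lower values are better. It is not the speaker's method: the one-dimensional Gaussian components, the synthetic two-group data, the quantile-based starting values, the variance floor and the function names (`mixture_loglik`, `information_criteria`) are all assumptions made for the example, and ICL, MML and variable selection are left out.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def mixture_loglik(data, k, n_iter=100, var_floor=1e-2):
    """Fit a k-component 1-D Gaussian mixture by EM and return its log-likelihood."""
    n = len(data)
    ordered = sorted(data)
    # Deterministic start (an assumption): means at evenly spaced quantiles,
    # a shared variance equal to the overall variance, and equal weights.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(data) / n
    spread = max(sum((x - centre) ** 2 for x in data) / n, var_floor)
    variances = [spread] * k
    weights = [1.0 / k] * k

    def weighted_densities(x):
        return [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = weighted_densities(x)
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: update weights, means and variances from the responsibilities.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)  # floor avoids degenerate components

    return sum(math.log(max(sum(weighted_densities(x)), 1e-300)) for x in data)


def information_criteria(data, max_k):
    """Return {k: {"aic": ..., "bic": ...}} for k = 1..max_k."""
    n = len(data)
    table = {}
    for k in range(1, max_k + 1):
        loglik = mixture_loglik(data, k)
        n_params = 3 * k - 1  # (k - 1) weights + k means + k variances
        table[k] = {
            "aic": -2 * loglik + 2 * n_params,
            "bic": -2 * loglik + n_params * math.log(n),
        }
    return table


# Synthetic data (an assumption): two well-separated groups of 150 points each.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(150)] + [rng.gauss(8.0, 1.0) for _ in range(150)]

table = information_criteria(data, max_k=4)
best_k_bic = min(table, key=lambda k: table[k]["bic"])
for k, row in table.items():
    print(k, round(row["aic"], 1), round(row["bic"], 1))
print("k chosen by BIC:", best_k_bic)
```

Both criteria penalise the maximised log-likelihood by the number of free parameters. The BIC penalty grows with the sample size, so it tends to favour fewer clusters than AIC.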