Room P3.10, Mathematics Building

José G. Dias, Instituto Universitário de Lisboa (ISCTE-IUL), BRU-IUL, Lisboa, Portugal
Multiple-valued symbolic data clustering: heuristic and model-based approaches

Symbolic data analysis (SDA) has been developed as an extension of classical data analysis to handle more complex data structures. In this general framework, each observation/variable pair is characterized by more than one value: from two (e.g., interval-valued data defined by minimum and maximum values) to many (e.g., multiple-valued variables given as frequencies or proportions).

This research discusses the clustering of multiple-valued symbolic data. First, we discuss an extension of heuristic clustering that combines the symmetric Kullback-Leibler distance with a complete-linkage rule within the hierarchical clustering framework. Then, we propose a new model-based clustering framework. This new family of models, based on the Dirichlet distribution, includes mixtures of regression/expert models. Results are illustrated with synthetic and demographic (population pyramid) data.
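
As a rough illustration of the heuristic approach only, the sketch below clusters vectors of proportions using the symmetric Kullback-Leibler distance and a complete-linkage rule. It is a minimal sketch under stated assumptions, not the speaker's implementation: the function names `symmetric_kl` and `complete_linkage`, the flooring of zero proportions at `eps`, the unscaled form KL(p||q) + KL(q||p), and the toy `pyramids` data are all hypothetical choices made for the example.

```python
import math


def symmetric_kl(p, q, eps=1e-12):
    """Symmetric Kullback-Leibler distance, taken here as KL(p||q) + KL(q||p)."""
    # Assumption: zero proportions are floored at eps, then renormalised to sum to 1.
    p = [max(x, eps) for x in p]
    q = [max(x, eps) for x in q]
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    # KL(p||q) + KL(q||p) simplifies to sum((p_k - q_k) * log(p_k / q_k)).
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))


def complete_linkage(rows, n_clusters):
    """Naive agglomerative clustering; returns clusters as lists of row indices."""
    n = len(rows)
    d = [[symmetric_kl(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                dist = max(d[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        # Merge the closest pair of clusters and drop the absorbed one.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Made-up toy data: each row is a vector of age-group proportions.
pyramids = [
    [0.50, 0.30, 0.20],
    [0.48, 0.32, 0.20],
    [0.20, 0.30, 0.50],
    [0.22, 0.28, 0.50],
]
clusters = complete_linkage(pyramids, 2)
print(clusters)
```

The naive merge loop is written for readability rather than speed; for larger data sets, a library routine for hierarchical clustering on a precomputed distance matrix would be the more practical choice.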