# Probability and Statistics Seminar

### Taking Variability in Data into Account: Symbolic Data Analysis

Symbolic Data Analysis, introduced by E. Diday in the late 1980s, is concerned with analysing data that present intrinsic variability, which is to be explicitly taken into account. In classical Statistics and Multivariate Data Analysis, the elements under analysis are generally individual entities for which a single value is recorded for each variable - e.g., individuals, described by their age, salary, education level, marital status, etc.; cars, each described by its weight, length, power, engine displacement, etc.; students, for each of whom the marks in different subjects were recorded. But when the elements of interest are classes or groups of some kind - the citizens living in given towns; teams, consisting of individual players; car models, rather than specific vehicles; classes and not individual students - then there is variability inherent to the data. Reducing this variability to a central tendency measure - a mean, median or mode - leads to a substantial loss of information.

Symbolic Data Analysis provides a framework for representing data with variability, using new variable types. Methods have also been developed which suitably take data variability into account. Symbolic data may be represented using the usual matrix-form data arrays, where each entity is represented in a row and each column corresponds to a different variable - but now the elements of each cell are generally not single real values or categories, as in the classical case, but rather finite sets of values, intervals or, more generally, distributions.
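As a rough illustration of such an array, the sketch below aggregates individual-level records into one row per group, with an interval-valued cell and a distribution-valued cell. The data, the variable names and the helper `to_symbolic` are hypothetical and chosen only for this example; taking the interval as the observed minimum and maximum is one simple convention among several possible ones.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical microdata: one record per player (team, age, position).
players = [
    ("A", 21, "forward"),
    ("A", 29, "defender"),
    ("A", 25, "forward"),
    ("B", 33, "goalkeeper"),
    ("B", 19, "defender"),
]


def to_symbolic(rows):
    """Aggregate player records into one symbolic description per team."""
    groups = defaultdict(list)
    for team, age, position in rows:
        groups[team].append((age, position))

    table = {}
    for team, members in groups.items():
        ages = [age for age, _ in members]
        counts = Counter(position for _, position in members)
        n = len(members)
        table[team] = {
            # Classical summary, kept only for comparison.
            "mean_age": mean(ages),
            # Interval-valued variable: observed range of ages.
            "age": (min(ages), max(ages)),
            # Distribution-valued variable: relative frequencies.
            "position": {p: c / n for p, c in counts.items()},
        }
    return table


symbolic = to_symbolic(players)
for team, row in symbolic.items():
    print(team, row)
```

In this toy data the two teams have almost the same mean age (25 and 26), while their age intervals, (21, 29) and (19, 33), differ markedly; this is the kind of information that a single central value discards.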

In this talk we shall introduce and motivate the field of Symbolic Data Analysis and present in some detail the new variable types that have been introduced to represent variability, illustrating them with examples. We shall furthermore discuss some issues that arise when analysing data that do not follow the usual classical model, and present data representation models for some variable types.