### 05/04/2013, 14:30 — 15:30 — Room P3.10, Mathematics Building

Paula Brito, *Faculdade de Economia / LIAAD - INESC TEC, Universidade do Porto*

### Taking Variability in Data into Account: Symbolic Data Analysis

Symbolic Data Analysis, introduced by E. Diday in the late 1980s,
is concerned with analysing data presenting
intrinsic variability, which is to be explicitly taken into
account. In classical Statistics and Multivariate Data Analysis,
the elements under analysis are generally individual entities for
which a single value is recorded for each variable - e.g.,
individuals described by their age, salary, education level,
marital status, etc.; cars, each described by its weight, length,
power, engine displacement, etc.; students, for each of whom the
marks in different subjects were recorded. But when the elements of
interest are classes or groups of some kind - the citizens living
in given towns; teams, consisting of individual players; car
models, rather than specific vehicles; classes and not individual
students - then there is variability inherent in the data.
Reducing this variability to central tendency measures - mean
values, medians or modes - leads to too great a loss
of information.

Symbolic Data Analysis provides a framework for
representing data with variability, using new variable types.
Methods have also been developed that suitably take data variability
into account. Symbolic data may be represented using the usual
matrix-form data arrays, where each entity is represented in a row
and each column corresponds to a different variable - but now the
elements of each cell are generally not single real values or
categories, as in the classical case, but rather finite sets of
values, intervals or, more generally, distributions.
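
As a rough illustration of such a data array, the sketch below aggregates individual student records into one symbolic row per class. The records, the column names and the `symbolic_description` helper are all hypothetical and chosen only for this example; they are not taken from the talk or from any particular Symbolic Data Analysis software.

```python
from collections import Counter
from statistics import mean

# Hypothetical microdata: (class, mark, preferred subject), one record per student.
students = [
    ("A", 12.0, "maths"),
    ("A", 15.5, "physics"),
    ("A", 9.0, "maths"),
    ("B", 11.0, "history"),
    ("B", 13.0, "maths"),
    ("B", 12.5, "history"),
]


def symbolic_description(records):
    """Aggregate individual records into one symbolic row per class."""
    groups = {}
    for cls, mark, subject in records:
        groups.setdefault(cls, []).append((mark, subject))

    table = {}
    for cls, rows in groups.items():
        marks = [m for m, _ in rows]
        subjects = [s for _, s in rows]
        counts = Counter(subjects)
        table[cls] = {
            # Classical summary, kept only for comparison.
            "mark_mean": mean(marks),
            # Interval-valued variable: range of observed marks.
            "mark_interval": (min(marks), max(marks)),
            # Set-valued variable: finite set of observed categories.
            "subjects_set": set(subjects),
            # Distribution-valued variable: relative frequency of each category.
            "subject_distribution": {s: c / len(subjects) for s, c in counts.items()},
        }
    return table


table = symbolic_description(students)
for cls, row in table.items():
    print(cls, row)
```

In this toy data set both classes have the same mean mark, so a table of means cannot tell them apart, whereas the interval, set and distribution cells keep the within-class variability.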

In this talk we shall introduce and motivate the field of
Symbolic Data Analysis and present in some detail the new variable
types that have been introduced to represent variability,
illustrating them with some examples. We shall furthermore discuss some
issues that arise when analysing data that do not follow the
usual classical model, and present data representation models for
some variable types.