###
02/10/2013, 11:00 — 12:00 — Room P3.10, Mathematics Building

Manuel Cabral Morais, *CEMAT & Department of Mathematics, IST*

###
Strategies to reduce the probability of a misleading signal

Standard practice in statistical process control is to run two
individual charts, one for the process mean and another one for the
process variance. The resulting scheme is known as a simultaneous
scheme and it provides a way to satisfy Shewhart's dictum that
proper process control implies monitoring both location and
dispersion.

When we use a simultaneous scheme, the quality characteristic is
deemed to be out-of-control whenever a signal is triggered by
either individual chart. As a consequence, the misidentification of
the parameter that has changed can occur, meaning that a shift in
the process mean can be misinterpreted as a shift in the process
variance and vice-versa. These two events are known as misleading
signals (MS) and can occur quite frequently.

We discuss (necessary and) sufficient conditions to achieve
values of the probability of a misleading signal (PMS) smaller than
or equal to \(0.5\). We also explore, for instance, alternative
simultaneous Shewhart-type schemes and check whether they lead to
PMS values smaller than those of the popular
\((\bar{X}, S^2)\) simultaneous scheme.
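
As a rough illustration of the quantity discussed above, the sketch below computes a PMS for a mean shift under assumptions that are not stated in the abstract: normal data, independent samples, Shewhart charts whose run lengths are independent and geometric, and a misleading signal defined as the variance chart signalling strictly before the mean chart, i.e. \(P(RL_\sigma < RL_\mu)\). The limit multiplier `gamma` and the variance-chart false alarm probability `p_sigma` are hypothetical inputs.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def xbar_signal_probability(delta, gamma=3.0):
    """P(|Z| > gamma) for Z ~ N(delta, 1): per-sample signal probability of the mean chart.

    delta is the mean shift in standard-error units; gamma is a hypothetical
    control-limit multiplier (3 gives the usual 3-sigma limits).
    """
    return 1.0 - (std_normal_cdf(gamma - delta) - std_normal_cdf(-gamma - delta))


def pms_mean_shift(p_mu, p_sigma):
    """P(RL_sigma < RL_mu) for independent geometric run lengths.

    Summing P(RL_sigma = k) * P(RL_mu > k) over k >= 1 gives the closed form below.
    """
    return p_sigma * (1.0 - p_mu) / (p_mu + p_sigma - p_mu * p_sigma)


if __name__ == "__main__":
    p_sigma = 0.0027  # assumed false alarm probability of the variance chart
    for delta in (0.5, 1.0, 2.0):
        p_mu = xbar_signal_probability(delta)
        print(delta, round(pms_mean_shift(p_mu, p_sigma), 4))
```

Under these assumptions the PMS shrinks as the mean shift grows, because the mean chart then tends to signal first.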

###
25/09/2013, 11:00 — 12:00 — Room P3.10, Mathematics Building

Daniel Schwarz, *IST and CMU*

###
Price Modelling in Carbon Emission and Electricity Markets

We present a model to explain the joint dynamics of the prices of
electricity and carbon emission allowance certificates as a
function of exogenously given fuel prices and power demand. The
model for the electricity price consists of an explicit
construction of the electricity supply curve; the model for the
allowance price takes the form of a coupled forward-backward
stochastic differential equation (FBSDE) with random coefficients.
Reflecting typical properties of emissions trading schemes, the
terminal condition of this FBSDE exhibits a gradient singularity.
Appealing to compactness arguments we prove the existence of a
unique solution to this equation. We illustrate the relevance of
the model with the example of pricing clean spread options, contracts
that are frequently used to value power plants in the spirit of
real option theory.

###
04/07/2013, 11:00 — 12:00 — Room P3.10, Mathematics Building

Sven Knoth, *Institute of Mathematics and Statistics, Helmut Schmidt University, Hamburg, Germany*

###
Incorporating parameter uncertainty into the setup of EWMA control charts monitoring normal variance

Most of the literature concerned with the design of control charts relies on perfect knowledge of the distribution for at least the good (so-called in-control) process. Some papers have treated EWMA charts monitoring the normal mean in the case of unknown parameters; refer to Jones, Champ and Rigdon (2001) for a good introduction. Jensen, Jones-Farmer, Champ, and Woodall (2006), “Effects of Parameter Estimation on Control Chart Properties: A Literature Review”, give a nice overview. Additionally, they mention that it would be interesting and useful to evaluate and take into account these effects for variance control charts as well.

Here, we consider EWMA charts for monitoring the normal variance. Given a sequence of batches of size $n$, $\{X_{i j}\}$, $i=1,2,\ldots$ and $j=1,2,\ldots,n$, utilize the following EWMA control chart: \begin{align*} Z_0 & = z_0 = \sigma_0^2 = 1 \,, \\ Z_i & = (1-\lambda) Z_{i-1} + \lambda S_i^2 \,,\; i = 1,2,\ldots \,,\\ & \qquad\qquad S_i^2 = \frac{1}{n-1} \sum_{j=1}^n (X_{ij} - \bar X_i)^2 \,,\; \bar X_i = \frac{1}{n} \sum_{j=1}^n X_{ij} \,, \\ L & = \inf \left\{ i \in \mathbb{N}: Z_i > c_u \sigma_0^2 \right\} \,. \end{align*} The parameters $\lambda \in (0,1]$ and $c_u \gt 0$ are chosen to enable a certain useful detection performance (not too many false alarms and quick detection of changes). The most popular performance measure is the so-called Average Run Length (ARL), that is $E_{\sigma}(L)$ for the true standard deviation $\sigma$.

If $\sigma_0$ has to be estimated by sampling data during a pre-run phase, then this uncertain parameter affects, of course, the behavior of the applied control chart. Typically the ARL is increased. Most of the papers about characterizing the uncertainty impact deal with the changed ARL patterns and possible adjustments. Here, a different way of designing the chart is treated: set up the chart by specifying a certain false alarm probability such as $P_{\sigma_0}(L\le 1000) \le \alpha$. This results in a specific $c_u$. Here we describe a feasible way to determine this value $c_u$ also in the case of unknown parameters for a pre-run series of given size (and structure). A two-sided version of the introduced EWMA scheme is analyzed as well.
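
The chart recursion above can be tried out with a small Monte Carlo sketch. This is only an illustration and not the speaker's numerical method: it assumes normal data and a known $\sigma_0$, and the function names, the `horizon` argument and the replication count are hypothetical. It simulates the run length $L$ and estimates the false alarm probability $P_{\sigma_0}(L \le \text{horizon})$ for a candidate $c_u$.

```python
import random
import statistics


def ewma_s2_run_length(lam, c_u, n, sigma=1.0, sigma0=1.0, horizon=1000, rng=None):
    """Return L = inf{i : Z_i > c_u * sigma0^2}, or None if no signal by `horizon`."""
    rng = rng or random.Random()
    z = sigma0 ** 2  # Z_0 = z_0 = sigma_0^2
    for i in range(1, horizon + 1):
        batch = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2 = statistics.variance(batch)  # S_i^2 with divisor n - 1
        z = (1.0 - lam) * z + lam * s2   # EWMA update
        if z > c_u * sigma0 ** 2:
            return i
    return None


def false_alarm_probability(lam, c_u, n, horizon=1000, reps=200, seed=1):
    """Monte Carlo estimate of P_{sigma0}(L <= horizon) with sigma0 assumed known."""
    rng = random.Random(seed)
    signals = sum(
        ewma_s2_run_length(lam, c_u, n, horizon=horizon, rng=rng) is not None
        for _ in range(reps)
    )
    return signals / reps


if __name__ == "__main__":
    # Scan a few candidate limits; a design would pick the smallest c_u meeting alpha.
    for c_u in (1.3, 1.5, 1.7):
        print(c_u, false_alarm_probability(lam=0.1, c_u=c_u, n=5))
```

An estimated $\sigma_0$ could be mimicked by passing the pre-run estimate as `sigma0` while keeping the true value in `sigma`, which is one simple way to see how estimation error changes the false alarm probability.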

###
27/06/2013, 11:00 — 12:00 — Room P3.10, Mathematics Building

Emanuele Dolera, *Università di Modena e Reggio Emilia, Italy*

###
Reaching the best possible rate of convergence to equilibrium of
Boltzmann-equation solutions

This talk concerns a definitive answer to the problem of
quantifying the relaxation to equilibrium of the solutions to the
spatially homogeneous Boltzmann equation for Maxwellian molecules.
Under very mild conditions on the initial datum - close to
necessity - and a weak, physically consistent, angular cutoff
hypothesis, the main result states that the total variation
distance (i.e. the $L^{1}$-norm in the absolutely continuous case)
between the solution and the limiting Maxwellian distribution
admits an upper bound of the form $C \exp(-\Lambda_{b}^{*} t)$,
$\Lambda_{b}^{*}$ being the spectral gap of the linearized collision
operator and $C$ a constant depending only on the initial datum.
Hilbert hinted at the validity of this quantification in 1912,
which was explicitly formulated as a conjecture by McKean in 1966.
The main line of the new proof is based on an analogy between the
problem of convergence to equilibrium and the central limit theorem
of probability theory, as suggested by McKean.

###
12/06/2013, 11:00 — 12:00 — Room P4.35, Mathematics Building

Ana M. Bianco, *Universidad de Buenos Aires and CONICET*

###
Robust Procedures for Nonlinear Models for Full and Incomplete Data

Linear models are among the most popular models in Statistics.
However, in many situations the nature of the phenomenon is
intrinsically nonlinear, so linear approximations are not valid
and the data must be fitted using a nonlinear model. Moreover, on
some occasions the responses are incomplete, with some of them
missing at random.

It is well known that, in this setting, the classical estimator
of the regression parameter based on least squares is very
sensitive to outliers. A family of general M-estimators is proposed
to estimate the regression parameter in a nonlinear model. We give
a unified approach to treat full data or data with missing
responses. Under mild conditions, the proposed estimators are
Fisher-consistent, consistent and asymptotically normal. To study
local robustness, their influence function is also derived.

A family of robust tests based on a Wald-type statistic is
introduced in order to check hypotheses that involve the regression
parameter. Monte Carlo simulations illustrate the finite sample
behaviour of the proposed procedures in different settings in
contaminated and uncontaminated samples.

###
31/05/2013, 15:00 — 16:00 — Room P3.10, Mathematics Building

Isabel Silva and Maria Eduarda Silva, *Faculdade de Engenharia, Universidade do Porto, and Faculdade de Economia, Universidade do Porto*

###
An INteger AutoRegressive afternoon - Statistical analysis of
discrete valued time series

Part I: Univariate and multivariate models based on thinning

Part II: Modelling and forecasting time series of counts

Time series of counts arise when the interest lies in the number
of certain events occurring during a specified time interval. Many
of these data sets are characterized by low counts, asymmetric
distributions, excess zeros, overdispersion, etc., ruling out
normal approximations. Thus, during the last decades there has been
considerable interest in models for integer-valued time series and
a large volume of work is now available in specialized monographs.
Among the most successful models for integer-valued time series are
the INteger-valued AutoRegressive Moving Average (INARMA) models
based on the thinning operation. These models are attractive since
they are linear-like models for discrete time series which exhibit
recognizable correlation structures. Furthermore, in many
situations the collected time series are multivariate in the sense
that there are counts of several events observed over time and the
counts at each time point are correlated. The first talk introduces
univariate and multivariate models for time series of counts based
on the thinning operator and discusses their statistical and
probabilistic properties. The second talk addresses estimation and
diagnostic issues and illustrates the inference procedures with
simulated and observed data.
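
To make the thinning operation concrete, here is a minimal sketch of the simplest member of this family, an INAR(1) process $X_t = \alpha \circ X_{t-1} + \varepsilon_t$ with binomial thinning. The Poisson innovations, the parameter values and all function names are assumptions chosen for illustration, not details taken from the talks.

```python
import math
import random


def binomial_thinning(alpha, x, rng):
    """alpha o x: each of the x counted units survives independently with probability alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def poisson(mean, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small means)."""
    threshold = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_inar1(alpha, innovation_mean, length, seed=0):
    """Simulate X_t = alpha o X_{t-1} + eps_t with eps_t ~ Poisson(innovation_mean)."""
    rng = random.Random(seed)
    # With Poisson innovations the stationary marginal is Poisson(innovation_mean / (1 - alpha)).
    x = poisson(innovation_mean / (1.0 - alpha), rng)
    path = []
    for _ in range(length):
        x = binomial_thinning(alpha, x, rng) + poisson(innovation_mean, rng)
        path.append(x)
    return path


if __name__ == "__main__":
    series = simulate_inar1(alpha=0.5, innovation_mean=1.0, length=20)
    print(series)
```

The recursion mirrors a linear AR(1) model, with thinning replacing scalar multiplication so that the series stays integer-valued; the lag-one autocorrelation of this process equals `alpha`.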

###
17/05/2013, 14:30 — 15:30 — Room P3.10, Mathematics Building

David Taylor, *Research and Actuarial Science Division, School of Management Studies, University of Cape Town, South Africa*

###
Mathematical Finance in South Africa

I have been involved in Math Finance university education in South
Africa since 1996. During this time I have produced numerous
graduates and grown an extensive network of industry and
academic partners. I'll talk about these experiences and take
questions.

###
16/05/2013, 11:00 — 12:00 — Room P3.10, Mathematics Building

David Taylor, *Research and Actuarial Science Division, School of Management Studies, University of Cape Town, South Africa*

###
Aggregational Gaussianity Using Sobol Sequencing In the South
African Equity Markets: Implications for the Pricing of Risk

Stylized facts of asset returns in the South African market have
received extensive attention, with multiple studies published on
non-normality of returns, heavy-tailed distributions, gain-loss
asymmetry and, particularly, volatility clustering. The one such
fact that has received only cursory attention worldwide is
Aggregational Gaussianity: the widely accepted stylized fact that
empirical asset returns tend to normality as the period over
which the return is computed increases. The aggregational aspect
arises from the \(n\)-day log-return being the simple sum of \(n\)
one-day log-returns. This fact is usually established using
Q-Q plots over longer and longer intervals, and can be
qualitatively confirmed. However, this methodology inevitably uses
overlapping data series, especially for longer period returns. When
an alternative resampling methodology for dealing with common
time-overlapping returns data is used, a different picture emerges.
Here we describe evidence from the South African market for a
discernible absence of Aggregational Gaussianity and briefly
discuss the implications of these findings for the quantification
of risk and for the pricing and hedging of derivative securities.
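
The aggregation step and the overlap issue can be illustrated with a short sketch. It only shows how \(n\)-day log-returns are built from one-day log-returns with overlapping or non-overlapping windows; it does not implement the Sobol-based resampling of the talk, and the price series and function names are made up for the example.

```python
import math


def log_returns(prices):
    """One-day log-returns log(P_t / P_{t-1})."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def aggregate(returns, n, overlapping):
    """n-day log-returns as sums of n consecutive one-day log-returns.

    Overlapping windows slide by one day and share n - 1 days with their
    neighbours; non-overlapping windows slide by n days and share none.
    """
    step = 1 if overlapping else n
    return [sum(returns[i:i + n]) for i in range(0, len(returns) - n + 1, step)]


if __name__ == "__main__":
    prices = [100.0, 101.5, 99.8, 102.3, 103.0, 101.1, 104.2]  # hypothetical prices
    r = log_returns(prices)
    print(len(aggregate(r, 3, overlapping=True)), "overlapping 3-day returns")
    print(len(aggregate(r, 3, overlapping=False)), "non-overlapping 3-day returns")
```

Overlapping windows yield many more observations, but neighbouring values are strongly dependent by construction, which is the concern raised above about Q-Q plots for long horizons.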

###
06/05/2013, 16:30 — 17:30 — Room P3.10, Mathematics Building

Cristina Barros, *Departamento de Engenharia do Ambiente, Escola Superior de Tecnologia e Gestão - Instituto Politécnico de Leiria*

###
Real Time Statistical Process Control of the Quantity of Product in
Prepackages

In this presentation we describe how we developed a
methodology for the statistical control of product quantity in
prepackagers' processes, and present a number of case studies that
differ in type of product, packaging, production, filling line and
data acquisition system. With the aim of establishing a global
strategy to control the quantity of product in prepackages, an
integrated planning model based on statistical tools was developed.
This model is able to manage the production functions concerning
the legal metrological requirements. These requirements are similar
all around the world because they are based on the recommendation
R-87: 2004 (E) from the International Organization of Legal
Metrology (OIML). Based on the principles of Statistical Process
Control, a methodology to analyze in real time the quantity of
product in prepackages was proposed; routine inspections, condition
monitoring of the main components and easy comprehension of the
outputs were taken into account. Subsequently, software for data
acquisition, recording (to guarantee traceability) and processing
for decision-making, configurable for any kind of filling
process, was introduced. The impact of this system, named ACCEPT
(Computer Based Help for the Statistic Control of the Filling
Processes), on industry is demonstrated by the large number of
companies that are using it to control their processes. In
Portugal, more than 50 companies and thousands of operators with
very low qualifications are working every day with SPC tools and
capability analysis in order to minimize variability and waste (for
example, overfilling), to ensure compliance and to guarantee
consumers' rights.

###
23/04/2013, 11:00 — 12:00 — Room P3.10, Mathematics Building

Stéphane Villeneuve, *Toulouse School of Economics, University of Toulouse*

###
Corporate cash policy with liquidity and profitability risks

We develop a dynamic model of a firm facing both liquidity and
profitability concerns. This leads us to study and to solve
explicitly a bi-dimensional control problem where the two state
variables are the controlled cash reserves process and the belief
process about the firm’s profitability. Our model encompasses
previous studies and provides new predictions for corporate cash
policy. The model predicts a positive relationship between cash
holdings and beliefs about the firm’s profitability, a
non-monotonic relationship between cash holdings and the volatility
of the cash flows as well as a non-monotonic relationship between
cash holdings and the risk of profitability. This yields novel
insights on the firm’s default policy and on the relationship
between volatility of stock prices and the level of stock.

###
18/04/2013, 11:00 — 12:00 — Room P3.10, Mathematics Building

Ana Subtil, *CEMAT, Instituto Superior Técnico, Universidade Técnica de Lisboa; Faculdade de Ciências, Universidade de Lisboa*

###
Using Latent Class Models to Evaluate the Performance of Diagnostic
Tests in the Absence of a Gold Standard

Diagnostic tests are helpful tools for decision-making in a
biomedical context. In order to determine the clinical relevance
and practical utility of each test, it is critical to assess its
ability to correctly distinguish diseased from non-diseased
individuals. Statistical analysis has an essential role in the
evaluation of diagnostic tests, since it is used to estimate
performance measures of the tests, such as sensitivity and
specificity. Ideally, these measures are determined by comparison
with a gold standard, i.e., a reference test with perfect
sensitivity and specificity.

When no gold standard is available, treating the supposedly
best available test as a reference may cause misclassifications
leading to biased estimates. Alternatively, Latent Class Models
(LCM) may be used to estimate the diagnostic tests' performance
measures as well as the disease prevalence, in the absence of a gold
standard. The most common LCM estimation approaches are maximum
likelihood estimation using the Expectation-Maximization algorithm
and Bayesian inference using Markov Chain Monte Carlo methods,
via Gibbs sampling.

This talk illustrates the use of Bayesian Latent Class Models
(BLCM) in the context of malaria and canine dirofilariosis. In each
case, multiple diagnostic tests were applied to distinct
subpopulations. To analyze the subpopulations simultaneously, a
product multinomial distribution was considered, since the
subpopulations were independent. By introducing constraints, it was
possible to explore differences and similarities between
subpopulations in terms of prevalence, sensitivities and
specificities.

We also discuss statistical issues such as the assumption of
conditional independence, model identifiability, sampling
strategies and prior distribution elicitation.

###
05/04/2013, 14:30 — 15:30 — Room P3.10, Mathematics Building

Paula Brito, *Faculdade de Economia / LIAAD - INESC TEC, Universidade do Porto*

###
Taking Variability in Data into Account: Symbolic Data Analysis

Symbolic Data Analysis, introduced by E. Diday in the late 1980s,
is concerned with analysing data presenting
intrinsic variability, which is to be explicitly taken into
account. In classical Statistics and Multivariate Data Analysis,
the elements under analysis are generally individual entities for
which a single value is recorded for each variable - e.g.,
individuals, described by their age, salary, education level,
marital status, etc.; cars, each described by its weight, length,
power, engine displacement, etc.; students, for each of whom the
marks in different subjects were recorded. But when the elements of
interest are classes or groups of some kind - the citizens living
in given towns; teams, consisting of individual players; car
models, rather than specific vehicles; classes and not individual
students - then there is variability inherent to the data.
Reducing this variability by taking central tendency measures - mean
values, medians or modes - leads to too great a loss
of information.

Symbolic Data Analysis provides a framework for
representing data with variability, using new variable types. Also,
methods have been developed which suitably take data variability
into account. Symbolic data may be represented using the usual
matrix-form data arrays, where each entity is represented in a row
and each column corresponds to a different variable - but now the
elements of each cell are generally not single real values or
categories, as in the classical case, but rather finite sets of
values, intervals or, more generally, distributions.

In this talk we shall introduce and motivate the field of
Symbolic Data Analysis and present in some detail the new variable
types that have been introduced to represent variability,
illustrating them with some examples. We shall furthermore discuss
some issues that arise when analysing data that does not follow the
usual classical model, and present data representation models for
some variable types.

###
14/03/2013, 11:00 — 12:00 — Room P3.10, Mathematics Building

Iryna Okhrin & Ostap Okhrin, *Faculty of Business Administration and Economics at the Europa-Universität Viadrina Frankfurt (Oder) & Wirtschaftswissenschaftliche Fakultät at the Humboldt-Universität zu Berlin*

###
Forecasting The Temperature Data & Localising Temperature Risk

Forecasting The Temperature Data:

This paper aims at describing the intraday temperature
variations, which is a challenging task in modern econometrics and
environmetrics. Having high-frequency data, we separate the
dynamics within a day and over days. Three main models have been
considered in our study. As the benchmark we employ a simple
truncated Fourier series with autocorrelated residuals. The second
model uses functional data analysis and is called the shape
invariant model (SIM). The third one is the dynamic semiparametric
factor model (DSFM). In this work we discuss the merits and pitfalls
of all the methods and compare their in- and out-of-sample
performances.

&

Localising Temperature Risk:

On the temperature derivative market, modelling temperature
volatility is an important issue for pricing and hedging. In order
to apply the pricing tools of financial mathematics, one needs to
isolate a Gaussian risk factor. A conventional model for
temperature dynamics is a stochastic model with seasonality and
intertemporal autocorrelation. Empirical work based on seasonality
and autocorrelation correction reveals that the obtained residuals
are heteroscedastic with a periodic pattern. The object of this
research is to estimate this heteroscedastic function so that,
after scale normalisation, a pure standardised Gaussian variable
appears. Earlier works investigated temperature risk in different
locations and showed that neither parametric component functions
nor a local linear smoother with constant smoothing parameter are
flexible enough to generally describe the variance process well.
Therefore, we consider a local adaptive modelling approach to find,
at each time point, an optimal smoothing parameter to locally
estimate the seasonality and volatility. Our approach provides a
more flexible and accurate fitting procedure for localised
temperature risk by achieving nearly normal risk factors. We also
employ our model to forecast the temperature in different cities
and compare it to a model developed in Campbell and Diebold
(2005).

###
28/02/2013, 11:00 — 12:00 — Room P3.10, Mathematics Building

Graciela Boente, *Universidad de Buenos Aires and CONICET, Argentina*

###
S-estimators for functional principal component analysis

A well-known property of functional principal components is that
they provide the best q-dimensional approximation to random
elements over separable Hilbert spaces. Our approach to robust
estimates of principal components for functional data is based on
this property since we consider the problem of robustly estimating
these finite-dimensional approximating linear spaces. We propose a
new class of estimators for principal components based on robust
scale functionals by finding the lower dimensional linear space
that provides the best prediction for the data. In analogy to the
linear regression case, we call this proposal S-estimators. This
method can also be applied to sparse data sets when the underlying
process satisfies a smoothness condition with respect to the
functional associated with the scale defining the S-estimators. The
motivation is a problem of outlier detection in atmospheric data
collected by weather balloons launched into the atmosphere and
stratosphere.

###
14/02/2013, 11:00 — 12:00 — Room P3.10, Mathematics Building

Verena Hagspiel, *Department of Operations, Faculty of Business and Economics, University of Lausanne*

###
Technological Change: A Burden or a Chance

The photography industry underwent a disruptive change in
technology during the 1990s when traditional film was replaced
by digital photography (see e.g. The Economist, January 14th 2012).
Kodak in particular was heavily affected: by 1976 Kodak accounted
for 90% of film and 85% of camera sales in America. Hence it was a
near-monopoly in America. Kodak's revenues were nearly 16 billion
in 1996 but the prediction is that they will decrease to 6.2 billion
in 2011. Kodak tried to squeeze as much money out of the film
business as possible while it prepared for the switch to digital.
The result was that Kodak did eventually build a profitable
business out of digital cameras, but it lasted only a few years
before camera phones overtook it.

According to Mr Komori, the former CEO of Fujifilm (2000-2003),
Kodak aimed to be a digital company, but that is a small business
and not enough to support a big company. For Kodak it was like
seeing a tsunami coming and there's nothing you can do about it,
according to Mr Christensen in The Economist (January 14th
2012).

In this paper we study the problem of a firm that produces with
a current technology for which it faces a declining sales volume.
It has two options: it can either exit this industry or invest in a
new technology with which it can produce an innovative product. We
distinguish between two scenarios in the sense that the resulting
new market can be booming or can end up smaller than the old
market used to be.

We derive the optimal strategy of a firm for each scenario and
specify the probabilities with which a firm would decide to
innovate or to exit. Furthermore, we assume that the firm can
additionally choose to suspend production for some time in case
demand is too low, instead of immediately taking the irreversible
decision to exit the market. We derive conditions under which such
a suspension area exists and show how long a firm is expected to
remain in this suspension area before resuming production,
investing in new technology or exiting the market.

###
04/06/2012, 14:30 — 15:30 — Room P3.10, Mathematics Building

Bruno de Sousa, *Instituto de Higiene e Medicina Tropical, UNL, CMDT*

###
Understanding the state of men's health in Europe through a life
expectancy analysis

A common feature of the health of men across Europe is their higher
rates of premature mortality and shorter life expectancy than
women. Following the publication of the first State of Men's Health
in Europe we sought to explore possible reasons.

We described trends in life expectancy in the European Union
member States (EU27) between 1999 and 2008 using mortality data
obtained from Eurostat. We then used Pollard's decomposition method
to identify the contribution of deaths from different causes and at
different age groups to differences in life expectancy. We first
examined the change in life expectancy for men and for women
between the beginning and end of this period. Second, we examined
the gap in life expectancy between men and women at the beginning
and end of this period.

Between 1999 and 2008 life expectancy in the EU27 increased by
2.77 years for men and by 2.12 years for women. Most of these
improvements were due to reductions in mortality at ages over 60,
with cardiovascular disease accounting for 1.40 years of the
reduction in men. In 2008 life expectancy of men in the EU27 was
6.04 years lower than that of women. Deaths from all major groups
of causes, and at all ages, contribute to this gap, with external
causes contributing 1.00 year, cardiovascular disease 1.75 years
and neoplasms 1.71 years.

Improvements in the life expectancy of men and women have
mostly occurred at older ages. There has been little improvement in
the high rate of premature death in younger men. This would suggest
a need for interventions to tackle the high death rate in younger
men. The variations in premature death and life
expectancy seen in men within the new European Commission report
highlight the impact of poor socio-economic conditions. The more
pronounced adverse effect on the health of men suggests that men
suffer from 'heavy impact diseases' that are more quickly
life-limiting, with women more likely to survive, but with poorer
health.

###
16/05/2012, 14:30 — 15:30 — Room P3.10, Mathematics Building

Verena Hagspiel, *CentER, Department of Econometrics and Operations Research, Tilburg University, The Netherlands*

###
Optimal Technology Adoption when the Arrival Rate of New
Technologies Changes

Our paper contributes to the literature of technology adoption. In
most of these models it is assumed that after the arrival of a new
technology the probability of the next arrival is constant. We
extend this approach by assuming that after the last technology
jump the probability of a new arrival can change. Right after the
arrival of a new technology the intensity equals a specific value
that switches if no new technology arrival has taken place within a
certain period after the last technology arrival. We look at
different scenarios, dependent on whether the firm is threatened by
a drop in the arrival rate after a certain time period or expects
the rate of new arrivals to rise. We analyze the effect of variance
of time between two consecutive arrivals on the optimal investment
timing and show that larger variance accelerates investment in a
new technology. We find that firms often adopt a new technology
with a time lag after its introduction, a phenomenon frequently
observed in practice. Regarding a firm's technology release
strategy, we explain why clear signals sent by the regular and
steady release of new product generations stimulate customers'
buying behavior. Depending on whether the arrival rate is assumed to
change or be constant over time, the optimal technology adoption
timing changes significantly. In a further step we add an
additional source of uncertainty to the problem and assume that the
length of the time period after which the arrival intensity changes
is not known to the firm in advance. Here, we find that increasing
uncertainty accelerates investment, a result that is contrary to
standard real options theory.

###
02/05/2012, 14:30 — 15:30 — Room P3.10, Mathematics Building

Manuel Cabral Morais, *Departamento de Matemática - CEMAT - IST*

###
On the Aging Properties of the Run Length of Markov-Type Control
Charts

A change in a production process must be detected quickly so
that a corrective action can be taken. Thus, it comes as no
surprise that the run length (RL) is usually used to describe the
performance of a quality control chart.

This popular performance measure has a phase-type distribution
when dealing with Markov-type charts, namely, cumulative sum
(CUSUM) and exponentially weighted moving average (EWMA) charts, as
opposed to a geometric distribution, when standard Shewhart charts
are in use.

In this talk, we briefly discuss sufficient conditions on the
associated probability transition matrix to deal with run lengths
with aging properties such as new better than used in expectation,
new better than used, and increasing hazard rate.

We also explore the implications of these aging properties of
the run lengths, namely when we compare the in-control
and out-of-control variances of the run lengths of matched
in-control Shewhart and Markov-type control charts.
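
As a small numerical companion to the comparison above, the sketch below computes the mean and variance of a discrete phase-type run length from a sub-stochastic transition matrix `Q` over the transient (non-signalling) states, and the variance of the geometric run length of a Shewhart chart matched to the same ARL. The two-state matrix is a made-up example, not one from the paper, and the function names are hypothetical.

```python
def run_length_moments(initial, Q, tol=1e-15, max_iter=1_000_000):
    """Mean and variance of a discrete phase-type run length RL.

    Uses P(RL > k) = initial * Q^k * 1, E[RL] = sum_k P(RL > k) and
    E[RL^2] = sum_k (2k + 1) P(RL > k), truncating once the survival
    probability falls below `tol`.
    """
    v = list(initial)
    mean = second = 0.0
    for k in range(max_iter):
        survival = sum(v)  # P(RL > k)
        if survival < tol:
            break
        mean += survival
        second += (2 * k + 1) * survival
        # One step of the chain restricted to the transient states: v <- v Q.
        v = [sum(v[i] * Q[i][j] for i in range(len(v))) for j in range(len(v))]
    return mean, second - mean ** 2


def matched_shewhart_variance(arl):
    """Variance (1 - p) / p^2 of a geometric run length with p = 1 / ARL."""
    return arl * (arl - 1.0)


if __name__ == "__main__":
    Q = [[0.5, 0.4], [0.0, 0.8]]  # hypothetical transient-state transition matrix
    arl, var = run_length_moments([1.0, 0.0], Q)
    print("Markov-type chart: ARL", arl, "variance", var)
    print("matched Shewhart variance", matched_shewhart_variance(arl))
```

A single transient state reduces the phase-type law to the geometric one, which gives a quick sanity check of the routine.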

#### Keywords

Phase-type distributions; Run length; Statistical process
control; Stochastic ordering.

#### Bibliography

Morais, M.C. and Pacheco, A. (2012). A note on the aging
properties of the run length of Markov-type control charts.
Sequential Analysis 31, 88-98.

###
16/04/2012, 15:00 — 16:00 — Room P3.10, Mathematics Building

Jorge Cadima, *Matemática/DCEB, ISA/UTL and CEAUL/UL*

###
The space of variables: where statistics and geometry marry. The case of Mahalanobis distances.

The usual way to conceptualize the graphical representation of an individuals $\times$ variables data matrix $X_{n\times p}$ is to associate an axis with each variable and, in that Cartesian frame, to represent each individual by a point whose coordinates are given by the row of $X$ corresponding to that individual. The popularity of this representation in the space of individuals ($\mathbb{R}^p$) stems largely from the fact that it can be visualized for bivariate or trivariate data. For a larger number of variables ($p \gt 3$), however, that advantage disappears.

An alternative representation is important in data analysis and modelling. In the space of variables, each axis corresponds to an individual and each variable is represented by a vector from the origin, defined by the $n$ coordinates of the respective matrix column. This representation of the variables in $\mathbb{R}^n$ has the enormous advantage of marrying statistical and geometric concepts, allowing a better understanding of the former. It has solid roots in the French school of data analysis, but its potential is not always exploited.

This talk begins by recalling the geometric concepts corresponding to fundamental indicators of univariate and bivariate statistics (mean, standard deviation, coefficient of variation or correlation coefficient) or multivariate statistics (illustrated with principal component analysis). The discussion is then deepened in the context of multiple linear regression, whose fundamental concepts (coefficient of determination, the three sums of squares and their fundamental relation) have a geometric interpretation in the space of variables.

Next, we discuss the usefulness of this geometric representation in the study of Mahalanobis distances, which play a leading role in multivariate statistics. We show how (squared) Mahalanobis distances measure the inclination of the subspace of $\mathbb{R}^n$ spanned by the columns of the centred data matrix, the subspace $\mathcal{C}(X_c)$, relative to the system of axes. In particular, we show that the Mahalanobis distances to the centre, \[D^2_{x_i,\overline{x}}=(x_i-\overline{x})^t S^{-1} (x_i-\overline{x}),\] are a function only of $n$ and of the angle $\theta_i$ between the axis corresponding to individual $i$ and $\mathcal{C}(X_c)$, while the (squared) Mahalanobis distance between two individuals, \[D^2_{x_i,x_j}=(x_i-x_j)^t S^{-1} (x_i-x_j),\] is likewise a function only of $n$ and of the angle between $\mathcal{C}(X_c)$ and the bisector generated by $e_i-e_j$, where $e_i$ and $e_j$ are the canonical vectors of $\mathbb{R}^n$ associated with the two individuals. Some recent upper bounds and other important properties of these distances (Gath & Hayes, 2006 and Branco & Pires, 2011) are a direct expression of these geometric relations. Although Mahalanobis distances concern the individuals, the geometric concepts associated with them in the space of variables can be exploited to deepen and extend these results.
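
The relation between the distance to the centre and the angle $\theta_i$ can be checked numerically. The sketch below assumes that $S$ is the sample covariance matrix with divisor $n-1$, in which case $D^2_{x_i,\overline{x}} = (n-1)\cos^2\theta_i$, because $(n-1)^{-1} D^2_{x_i,\overline{x}}$ is the $i$-th diagonal element of the orthogonal projector onto $\mathcal{C}(X_c)$. This explicit formula, the restriction to $p=2$ variables, the data and the function names are choices made for the illustration and are not taken from the abstract.

```python
import math


def center(col):
    """Subtract the column mean."""
    m = sum(col) / len(col)
    return [v - m for v in col]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mahalanobis_sq_to_center(x, y):
    """Squared Mahalanobis distances to the centre for bivariate data (p = 2)."""
    n = len(x)
    xc, yc = center(x), center(y)
    sxx, syy, sxy = dot(xc, xc) / (n - 1), dot(yc, yc) / (n - 1), dot(xc, yc) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Quadratic form with S^{-1} = [[syy, -sxy], [-sxy, sxx]] / det.
    return [(syy * a * a - 2 * sxy * a * b + sxx * b * b) / det for a, b in zip(xc, yc)]


def cos_sq_angle_to_column_space(x, y):
    """cos^2 of the angle between each canonical axis e_i and C(X_c).

    Gram-Schmidt gives an orthonormal basis q1, q2 of C(X_c); the squared
    length of the projection of e_i is q1[i]^2 + q2[i]^2.
    """
    xc, yc = center(x), center(y)
    q1 = [a / math.sqrt(dot(xc, xc)) for a in xc]
    proj = dot(q1, yc)
    resid = [b - proj * a for a, b in zip(q1, yc)]
    q2 = [v / math.sqrt(dot(resid, resid)) for v in resid]
    return [a * a + b * b for a, b in zip(q1, q2)]


if __name__ == "__main__":
    x = [1.0, 2.0, 4.0, 7.0, 11.0]  # hypothetical data, n = 5
    y = [3.0, 1.0, 4.0, 1.0, 5.0]
    n = len(x)
    for d2, c2 in zip(mahalanobis_sq_to_center(x, y), cos_sq_angle_to_column_space(x, y)):
        print(round(d2, 6), round((n - 1) * c2, 6))
```

The two printed columns should agree up to rounding error, and the squared distances sum to $(n-1)p$ because the trace of the projector equals $p$.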

###
26/03/2012, 14:30 — 15:30 — Room P3.10, Mathematics Building

Russell Alpizar-Jara, *Research Center in Mathematics and Applications (CIMA-U.E.), Department of Mathematics, University of Évora*

###
An overview of capture-recapture models

Capture-recapture methods have been widely used in Biological
Sciences to estimate population abundance and related demographic
parameters (births, deaths, immigration, or emigration). More
recently, these models have been used to estimate community
dynamics parameters such as species richness, rates of extinction,
colonization and turnover, and other metrics that require
presence/absence data of species counts. In this presentation, we
will use the latter application to illustrate some of the concepts
and the underlying theory of capture-recapture models. In
particular, we will review basic closed-population and
open-population models, and combinations of the two. We will
briefly mention other applications of these
models to Medical, Social and Computer Sciences.

Keywords: Capture-recapture experiments; multinomial and mixture
distributions; non-parametric and maximum likelihood estimation;
population size estimation.