### 18/04/2013, 11:00 — 12:00 — Room P3.10, Mathematics Building

Ana Subtil, *CEMAT, Instituto Superior Técnico, Universidade Técnica de Lisboa; Faculdade de Ciências, Universidade de Lisboa*

### Using Latent Class Models to Evaluate the Performance of Diagnostic Tests in the Absence of a Gold Standard

Diagnostic tests are helpful tools for decision-making in a
biomedical context. In order to determine the clinical relevance
and practical utility of each test, it is critical to assess its
ability to correctly distinguish diseased from non-diseased
individuals. Statistical analysis has an essential role in the
evaluation of diagnostic tests, since it is used to estimate
performance measures of the tests, such as sensitivity and
specificity. Ideally, these measures are determined by comparison
with a gold standard, i.e., a reference test with perfect
sensitivity and specificity.

When no gold standard is available, admitting the supposedly
best available test as a reference may cause misclassifications
leading to biased estimates. Alternatively, Latent Class Models
(LCM) may be used to estimate the performance measures of
diagnostic tests, as well as the disease prevalence, in the absence
of a gold standard. The most common LCM estimation approaches are
maximum likelihood estimation using the Expectation-Maximization
algorithm and Bayesian inference using Markov chain Monte Carlo
methods, via Gibbs sampling.
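The EM recipe just mentioned can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: a two-class latent class model with three conditionally independent binary tests, fitted to simulated data (all parameter values and sample sizes are invented for the example).

```python
import numpy as np

def lcm_em(y, n_iter=200):
    """EM for a two-class latent class model with K conditionally
    independent binary tests. y is an (n, K) 0/1 array. Returns the
    estimated prevalence, sensitivities and specificities."""
    _, K = y.shape
    pi, se, sp = 0.5, np.full(K, 0.8), np.full(K, 0.8)
    for _ in range(n_iter):
        # E-step: posterior probability that each subject is diseased
        p1 = pi * np.prod(se ** y * (1 - se) ** (1 - y), axis=1)
        p0 = (1 - pi) * np.prod((1 - sp) ** y * sp ** (1 - y), axis=1)
        w = p1 / (p1 + p0)
        # M-step: weighted updates of prevalence, Se and Sp
        pi = w.mean()
        se = (w[:, None] * y).sum(0) / w.sum()
        sp = ((1 - w)[:, None] * (1 - y)).sum(0) / (1 - w).sum()
    return pi, se, sp

# simulate three tests applied to 2000 subjects, prevalence 0.3
rng = np.random.default_rng(1)
diseased = rng.random(2000) < 0.3
true_se = np.array([0.90, 0.85, 0.80])
true_sp = np.array([0.95, 0.90, 0.85])
y = np.where(diseased[:, None],
             rng.random((2000, 3)) < true_se,    # positives among diseased
             rng.random((2000, 3)) >= true_sp    # false positives
             ).astype(int)
pi_hat, se_hat, sp_hat = lcm_em(y)
```

With three tests this conditional-independence model is just identified; with fewer tests, constraints or prior information (as in the Bayesian version used in the talk) become necessary.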

This talk illustrates the use of Bayesian Latent Class Models
(BLCM) in the context of malaria and canine dirofilariosis. In each
case, multiple diagnostic tests were applied to distinct
subpopulations. To analyze the subpopulations simultaneously, a
product multinomial distribution was considered, since the
subpopulations were independent. By introducing constraints, it was
possible to explore differences and similarities between
subpopulations in terms of prevalence, sensitivities and
specificities.

We also discuss statistical issues such as the assumption of
conditional independence, model identifiability, sampling
strategies and prior distribution elicitation.

### 05/04/2013, 14:30 — 15:30 — Room P3.10, Mathematics Building

Paula Brito, *Faculdade de Economia / LIAAD - INESC TEC, Universidade do Porto*

### Taking Variability in Data into Account: Symbolic Data Analysis

Symbolic Data Analysis, introduced by E. Diday in the late eighties
of the last century, is concerned with analysing data presenting
intrinsic variability, which is to be explicitly taken into
account. In classical Statistics and Multivariate Data Analysis,
the elements under analysis are generally individual entities for
which a single value is recorded for each variable - e.g.,
individuals, described by their age, salary, education level,
marital status, etc.; cars, each described by its weight, length,
power, engine displacement, etc.; students, for each of whom the
marks in different subjects were recorded. But when the elements of
interest are classes or groups of some kind - the citizens living
in given towns; teams, consisting of individual players; car
models, rather than specific vehicles; classes rather than
individual students - then there is variability inherent to the
data. Reducing this variability to central tendency measures - mean
values, medians or modes - obviously entails an important loss of
information.

Symbolic Data Analysis provides a framework for representing
data with variability, using new variable types, together with
methods that suitably take this variability into
account. Symbolic data may be represented using the usual
matrix-form data arrays, where each entity is represented in a row
and each column corresponds to a different variable - but now the
elements of each cell are generally not single real values or
categories, as in the classical case, but rather finite sets of
values, intervals or, more generally, distributions.
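As a toy illustration of the matrix-form arrays just described, interval-valued cells can be stored as pairs of bounds; the entities, variable names and numbers below are invented for the example.

```python
import numpy as np

# Interval-valued symbolic data: rows are entities (e.g. car models),
# columns are variables, each cell a [lower, upper] interval.
data = np.array([
    [[1000.0, 1400.0], [3.5, 4.2]],   # model A: weight (kg), length (m)
    [[1200.0, 1800.0], [4.0, 4.8]],   # model B
    [[ 900.0, 1100.0], [3.2, 3.9]],   # model C
])

lower, upper = data[..., 0], data[..., 1]
centres = (lower + upper) / 2          # interval midpoints
ranges = upper - lower                 # widths: the within-class variability
# one simple interval "sample mean": average the bounds component-wise
interval_mean = np.stack([lower.mean(0), upper.mean(0)], axis=-1)
```

Methods of Symbolic Data Analysis then work directly on such interval (or distributional) cells instead of collapsing them to their centres.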

In this talk we shall introduce and motivate the field of
Symbolic Data Analysis, present in some detail the new variable
types that have been introduced to represent variability, and
illustrate with some examples. We shall furthermore discuss some
issues that arise when analysing data that do not follow the
usual classical model, and present data representation models for
some variable types.

### 14/03/2013, 11:00 — 12:00 — Room P3.10, Mathematics Building

Iryna Okhrin & Ostap Okhrin, *Faculty of Business Administration and Economics, Europa-Universität Viadrina Frankfurt (Oder) & Wirtschaftswissenschaftliche Fakultät, Humboldt-Universität zu Berlin*

### Forecasting The Temperature Data & Localising Temperature Risk

Forecasting The Temperature Data:

This paper aims at describing intraday temperature variations,
a challenging task in modern econometrics and environmetrics.
Working with high-frequency data, we separate the dynamics within
a day from the dynamics over days. Three main models have been
considered in our study. As the benchmark we employ a simple
truncated Fourier series with autocorrelated residuals. The second
model uses functional data analysis, and is called the shape
invariant model (SIM). The third one is the dynamic semiparametric
factor model (DSFM). In this work we discuss the merits and
pitfalls of all the methods and compare their in- and out-of-sample
performances.
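The benchmark model can be sketched as follows: a truncated Fourier series fitted by least squares to a simulated daily series. The data, the period and the number of harmonics are illustrative, not those of the paper.

```python
import numpy as np

def fourier_design(t, period, K):
    """Design matrix: intercept plus K harmonic (cos, sin) pairs."""
    cols = [np.ones_like(t)]
    for k in range(1, K + 1):
        w = 2 * np.pi * k * t / period
        cols += [np.cos(w), np.sin(w)]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
t = np.arange(3 * 365, dtype=float)            # three years, daily
season = 10 + 8 * np.sin(2 * np.pi * t / 365)  # true seasonal mean
temp = season + rng.normal(0, 2, t.size)       # noisy "temperatures"

X = fourier_design(t, 365, K=3)
beta, *_ = np.linalg.lstsq(X, temp, rcond=None)
fitted = X @ beta
resid = temp - fitted   # in the benchmark these residuals would then
                        # be modelled as an autocorrelated process
```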

&

Localising Temperature Risk:

On the temperature derivative market, modelling temperature
volatility is an important issue for pricing and hedging. In order
to apply the pricing tools of financial mathematics, one needs to
isolate a Gaussian risk factor. A conventional model for
temperature dynamics is a stochastic model with seasonality and
intertemporal autocorrelation. Empirical work based on seasonality
and autocorrelation correction reveals that the obtained residuals
are heteroscedastic with a periodic pattern. The object of this
research is to estimate this heteroscedastic function so that,
after scale normalisation, a pure standardised Gaussian variable
appears. Earlier works investigated temperature risk in different
locations and showed that neither parametric component functions
nor a local linear smoother with constant smoothing parameter are
flexible enough to generally describe the variance process well.
Therefore, we consider a local adaptive modelling approach to find,
at each time point, an optimal smoothing parameter to locally
estimate the seasonality and volatility. Our approach provides a
more flexible and accurate fitting procedure for localised
temperature risk by achieving nearly normal risk factors. We also
employ our model to forecast the temperature in different cities
and compare it to the model developed by Campbell and Diebold
(2005).
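A minimal version of the variance-estimation step reads as follows: residuals with a periodic variance pattern are standardised by a kernel estimate of the day-of-year variance function. The constant bandwidth below is exactly what the talk's local adaptive approach would replace with a locally chosen one; all numbers are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(10 * 365)
doy = t % 365
# residuals after seasonality/autocorrelation correction, with a
# periodic variance pattern (synthetic, for illustration only)
sigma = 1 + 0.5 * np.cos(2 * np.pi * doy / 365)
eps = sigma * rng.standard_normal(t.size)

# kernel estimate of the day-of-year variance function
grid = np.arange(365)
h = 15.0                                  # fixed smoothing bandwidth (days)
d = np.abs(grid[:, None] - doy[None, :])
d = np.minimum(d, 365 - d)                # circular distance in days
w = np.exp(-0.5 * (d / h) ** 2)
var_hat = (w * eps ** 2).sum(1) / w.sum(1)

# after scale normalisation the residuals should be close to N(0, 1)
std_resid = eps / np.sqrt(var_hat[doy])
```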

### 28/02/2013, 11:00 — 12:00 — Room P3.10, Mathematics Building

Graciela Boente, *Universidad de Buenos Aires and CONICET, Argentina*

### S-estimators for functional principal component analysis

A well-known property of functional principal components is that
they provide the best q-dimensional approximation to random
elements over separable Hilbert spaces. Our approach to robust
estimates of principal components for functional data is based on
this property since we consider the problem of robustly estimating
these finite-dimensional approximating linear spaces. We propose a
new class of estimators for principal components based on robust
scale functionals by finding the lower dimensional linear space
that provides the best prediction for the data. In analogy to the
linear regression case, we call this proposal S-estimators. This
method can also be applied to sparse data sets when the underlying
process satisfies a smoothness condition with respect to the
functional associated with the scale defining the S-estimators. The
motivation is a problem of outlier detection in atmospheric data
collected by weather balloons launched into the atmosphere and
stratosphere.
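The best-approximation property underlying the proposal can be checked numerically in the classical (non-robust) case: the leading principal components give the smallest squared residual among q-dimensional subspaces. An S-estimator would replace the L2 scale below with a robust M-scale of the residual norms; the curves are simulated.

```python
import numpy as np

rng = np.random.default_rng(3)
# discretised functional data: 100 curves on a grid of 50 points
grid = np.linspace(0, 1, 50)
scores = rng.standard_normal((100, 2)) * np.array([3.0, 1.0])
basis = np.vstack([np.sin(np.pi * grid), np.cos(np.pi * grid)])
X = scores @ basis + 0.1 * rng.standard_normal((100, 50))

Xc = X - X.mean(0)
# best q-dimensional approximation via the leading principal components
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
q = 2
approx = U[:, :q] * s[:q] @ Vt[:q]
resid_pca = np.linalg.norm(Xc - approx, axis=1)   # residual norms

# any other q-dimensional subspace does worse in L2 scale;
# the S-estimator instead minimises a robust scale of these norms
Q, _ = np.linalg.qr(rng.standard_normal((50, q)))
resid_rand = np.linalg.norm(Xc - Xc @ Q @ Q.T, axis=1)
```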

### 14/02/2013, 11:00 — 12:00 — Room P3.10, Mathematics Building

Verena Hagspiel, *Department of Operations, Faculty of Business and Economics, University of Lausanne*

### Technological Change: A Burden or a Chance

The photography industry underwent a disruptive change in
technology during the 1990s, when traditional film was replaced
by digital photography (see, e.g., The Economist, January 14th 2012).
Kodak was particularly affected: by 1976 it accounted
for 90% of film and 85% of camera sales in America, making it a
near-monopoly there. Kodak's revenues were nearly 16 billion
dollars in 1996 but were predicted to decrease to 6.2 billion
in 2011. Kodak tried to squeeze as much money out of the film
business as possible while it prepared for the switch to digital
photography. Kodak did eventually build a profitable business
out of digital cameras, but it lasted only a few years before
camera phones overtook it.

According to Mr Komori, CEO of Fujifilm from 2000 to 2003,
Kodak aimed to be a digital company, but that is a small business,
not enough to support a big company. For Kodak it was like
seeing a tsunami coming and there's nothing you can do about it,
according to Mr Christensen in The Economist (January 14th
2012).

In this paper we study the problem of a firm that produces with
a current technology facing a declining sales volume. It has two
options: it can either exit the industry or invest in a new
technology with which it can produce an innovative product. We
distinguish two scenarios: the resulting new market may be
booming, or it may end up smaller than the old market used to
be.

We derive the optimal strategy of the firm for each scenario and
specify the probabilities with which a firm would decide to
innovate or to exit. Furthermore, we assume that the firm can
additionally choose to suspend production for some time when
demand is too low, instead of immediately taking the irreversible
decision to exit the market. We derive conditions under which such
a suspension area exists and show how long a firm is expected to
remain in this suspension area before resuming production,
investing in the new technology or exiting the market.
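A rough simulation of the suspension mechanism, with made-up demand dynamics and thresholds. The paper derives the optimal thresholds; the geometric Brownian motion parameters and the three levels below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(10)
mu, sigma, dt = -0.02, 0.20, 1 / 52      # declining demand, weekly steps
x0 = 1.0
# hypothetical thresholds, not the paper's optimal solution
suspend_at, resume_at, exit_at = 0.6, 0.7, 0.3

def simulate_path(max_years=50):
    """Follow demand until exit (or a time cap); track time spent in
    the suspension area (below suspend_at until recovery above resume_at)."""
    x, t, suspended, t_susp = x0, 0.0, False, 0.0
    while t < max_years and x > exit_at:
        x *= np.exp((mu - 0.5 * sigma ** 2) * dt
                    + sigma * np.sqrt(dt) * rng.standard_normal())
        t += dt
        if suspended:
            t_susp += dt
            if x >= resume_at:
                suspended = False
        elif x <= suspend_at:
            suspended = True
    return t, t_susp

paths = np.array([simulate_path() for _ in range(200)])
mean_exit_time, mean_time_suspended = paths.mean(axis=0)
```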

### 04/06/2012, 14:30 — 15:30 — Room P3.10, Mathematics Building

Bruno de Sousa, *Instituto de Higiene e Medicina Tropical, UNL, CMDT*

### Understanding the state of men's health in Europe through a life expectancy analysis

A common feature of the health of men across Europe is their higher
rates of premature mortality and shorter life expectancy than
women. Following the publication of the first State of Men's Health
in Europe we sought to explore possible reasons.

We described trends in life expectancy in the European Union
member States (EU27) between 1999 and 2008 using mortality data
obtained from Eurostat. We then used Pollard's decomposition method
to identify the contribution of deaths from different causes and at
different age groups to differences in life expectancy. We first
examined the change in life expectancy for men and for women
between the beginning and end of this period. Second, we examined
the gap in life expectancy between men and women at the beginning
and end of this period.
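The life-table computation underlying such comparisons can be sketched as follows. The mortality schedule is a toy Gompertz-type curve, not Eurostat data, and the closure assumptions are the simplest possible; Pollard's decomposition would additionally split mortality by cause.

```python
import numpy as np

def life_expectancy(mx):
    """Period life expectancy at birth from age-specific death rates mx
    (one value per single year of age; the last age closes the table).
    Uses q_x = m_x / (1 + 0.5 m_x), deaths spread over the interval."""
    mx = np.asarray(mx, dtype=float)
    qx = mx / (1 + 0.5 * mx)          # probability of dying within the year
    qx[-1] = 1.0                      # everyone dies in the open interval
    px = 1 - qx
    lx = np.concatenate([[1.0], np.cumprod(px)[:-1]])  # survivors at age x
    dx = lx * qx                      # deaths at age x
    Lx = lx - 0.5 * dx                # person-years lived in each year
    Lx[-1] = lx[-1] / mx[-1]          # open-ended last interval
    return Lx.sum()

# toy Gompertz-like mortality schedule for ages 0..109
ages = np.arange(110)
mx = 0.0001 * np.exp(0.09 * ages)
e0 = life_expectancy(mx)
```

Decomposition methods then attribute the change in `e0` between two such schedules to specific ages (and causes).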

Between 1999 and 2008 life expectancy in the EU27 increased by
2.77 years for men and by 2.12 years for women. Most of these
improvements were due to reductions in mortality at ages over 60,
with cardiovascular disease accounting for 1.40 years of the
reduction in men. In 2008 life expectancy of men in the EU27 was
6.04 years lower than that of women. Deaths from all major groups
of causes, and at all ages, contributed to this gap, with external
causes contributing 1.00 year, cardiovascular disease 1.75 years
and neoplasms 1.71 years.

Improvements in the life expectancy of men and women have
mostly occurred at older ages. There has been little improvement in
the high rate of premature death in younger men. This would suggest
a need for interventions to tackle the high death rate in younger
men. The demonstration of variations in premature death and life
expectancy seen in men within the new European Commission report
highlights the impact of poor socio-economic conditions. The more
pronounced adverse effect on the health of men suggests that men
suffer from 'heavy impact diseases', which are more quickly
life-limiting, while women are more likely to survive, but with
poorer health.

### 16/05/2012, 14:30 — 15:30 — Room P3.10, Mathematics Building

Verena Hagspiel, *CentER, Department of Econometrics and Operations Research, Tilburg University, The Netherlands*

### Optimal Technology Adoption when the Arrival Rate of New Technologies Changes

Our paper contributes to the literature on technology adoption. In
most of these models it is assumed that after the arrival of a new
technology the probability of the next arrival is constant. We
extend this approach by assuming that, after the last technology
jump, the probability of a new arrival can change: right after the
arrival of a new technology the intensity equals a specific value,
which switches if no new technology has arrived within a certain
period after the last arrival. We look at different scenarios,
depending on whether the firm is threatened by a drop in the
arrival rate after a certain time period or expects the rate of
new arrivals to rise. We analyze the effect of the variance of the
time between two consecutive arrivals on the optimal investment
timing and show that larger variance accelerates investment in a
new technology. We find that firms often adopt a new technology
some time after its introduction, a phenomenon frequently
observed in practice. Regarding a firm's technology releasing
strategy, we explain why clear signals, set by regular and steady
release of new product generations, stimulate customers' buying
behavior. Depending on whether the arrival rate is assumed to
change or to be constant over time, the optimal technology
adoption timing changes significantly.

In a further step we add an additional source of uncertainty to
the problem and assume that the length of the time period after
which the arrival intensity changes is not known to the firm in
advance. Here we find that increasing uncertainty accelerates
investment, a result opposite to standard real options theory.
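The two-regime arrival mechanism can be illustrated by simulation. Because the hazard is piecewise constant and the exponential distribution is memoryless, the time to the next arrival can be sampled exactly; the rates and switching time below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(11)

def next_arrival_time(lam1, lam2, tau, size=100_000):
    """Time to the next technology arrival when the hazard equals lam1
    for the first tau years after the last arrival and lam2 thereafter
    (illustrative two-regime arrival model)."""
    u1 = rng.exponential(1 / lam1, size)   # candidate arrival, regime 1
    u2 = rng.exponential(1 / lam2, size)   # residual wait, regime 2
    # if the regime-1 candidate lands before tau, it is the arrival;
    # otherwise, by memorylessness, the arrival is tau + Exp(lam2)
    return np.where(u1 <= tau, u1, tau + u2)

t_drop = next_arrival_time(1.0, 0.2, 2.0)  # rate drops after 2 years
t_rise = next_arrival_time(0.2, 1.0, 2.0)  # rate rises after 2 years
```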

### 02/05/2012, 14:30 — 15:30 — Room P3.10, Mathematics Building

Manuel Cabral Morais, *Departamento de Matemática - CEMAT - IST*

### On the Aging Properties of the Run Length of Markov-Type Control Charts

A change in a production process must be detected quickly so
that a corrective action can be taken. Thus, it comes as no
surprise that the run length (RL) is usually used to describe the
performance of a quality control chart.

This popular performance measure has a phase-type distribution
when dealing with Markov-type charts, namely, cumulative sum
(CUSUM) and exponentially weighted moving average (EWMA) charts, as
opposed to a geometric distribution, when standard Shewhart charts
are in use.
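The phase-type structure can be made concrete: with $Q$ the sub-stochastic matrix of transitions among the chart's transient (non-signalling) states and $\alpha$ the initial distribution, $P(RL \gt n) = \alpha Q^n \mathbf{1}$ and $ARL = \alpha (I-Q)^{-1}\mathbf{1}$. The 3-state matrix below is invented, not a calibrated chart.

```python
import numpy as np

# sub-stochastic transition matrix among the transient states of a
# toy 3-state Markov approximation; the missing mass in each row is
# the per-step probability of signalling
Q = np.array([[0.80, 0.15, 0.02],
              [0.10, 0.75, 0.10],
              [0.02, 0.20, 0.70]])
alpha = np.array([1.0, 0.0, 0.0])   # the chart starts in state 0
ones = np.ones(3)

# mean of the phase-type distribution: ARL = alpha (I - Q)^{-1} 1
arl = alpha @ np.linalg.solve(np.eye(3) - Q, ones)

def rl_survival(n):
    """P(RL > n) = alpha Q^n 1."""
    return alpha @ np.linalg.matrix_power(Q, n) @ ones
```

Aging properties such as NBU or IHR translate into monotonicity conditions on the hazard sequence $1 - \frac{P(RL \gt n)}{P(RL \gt n-1)}$ computable from `rl_survival`.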

In this talk, we briefly discuss sufficient conditions on the
associated probability transition matrix to deal with run lengths
with aging properties such as new better than used in expectation,
new better than used, and increasing hazard rate.

We also explore the implications of these aging properties of
the run lengths, namely when we compare the in-control and
out-of-control variances of the run lengths of matched in-control
Shewhart and Markov-type control charts.

#### Keywords

Phase-type distributions; Run length; Statistical process
control; Stochastic ordering.

#### Bibliography

Morais, M.C. and Pacheco, A. (2012). A note on the aging
properties of the run length of Markov-type control charts.
Sequential Analysis 31, 88-98.

### 16/04/2012, 15:00 — 16:00 — Room P3.10, Mathematics Building

Jorge Cadima, *Matemática/DCEB, ISA/UTL e CEAUL/UL*

### The variable space: where statistics and geometry marry. The case of Mahalanobis distances

The usual way to conceptualise the graphical representation of an individuals $\times$ variables data matrix $X_{n\times p}$ is to associate an axis with each variable and, in that Cartesian frame, to represent each individual by a point whose coordinates are given by the row of $X$ corresponding to that individual. The popularity of this representation in the space of individuals ($\mathbb{R}^p$) stems largely from the fact that it can be visualised for bivariate or trivariate data. For a larger number of variables ($p \gt 3$), however, this advantage no longer exists.

An alternative representation is important in data analysis and modelling. In the space of variables, each axis corresponds to an individual, and each variable is represented by a vector from the origin, defined by the $n$ coordinates of the corresponding column of the matrix. This representation of the variables in $\mathbb{R}^n$ has the enormous advantage of marrying statistical and geometric concepts, allowing a better understanding of the former. It has solid roots in the French school of data analysis, but its potential is not always explored.

This talk begins by recalling the geometric concepts corresponding to fundamental indicators of univariate and bivariate statistics (mean, standard deviation, coefficient of variation, correlation coefficient) and of multivariate statistics (exemplified by principal component analysis). The discussion is then deepened in the context of multiple linear regression, whose fundamental concepts (the coefficient of determination, the three sums of squares and their fundamental relation) have a geometric interpretation in the space of variables.

Next, we discuss the usefulness of this geometric representation in the study of Mahalanobis distances, which play a leading role in multivariate statistics. We show how squared Mahalanobis distances measure the inclination, with respect to the coordinate axes, of the subspace of $\mathbb{R}^n$ spanned by the columns of the centred data matrix, $\mathcal{C}(X_c)$. In particular, we show that the Mahalanobis distances to the centre, \[D^2_{x_i,\overline{x}}=(x_i-\overline{x})^t S^{-1} (x_i-\overline{x}),\] are a function only of $n$ and of the angle $\theta_i$ between the axis corresponding to individual $i$ and $\mathcal{C}(X_c)$, while the squared Mahalanobis distance between two individuals, \[D^2_{x_i,x_j}=(x_i-x_j)^t S^{-1} (x_i-x_j),\] is likewise a function only of $n$ and of the angle between $\mathcal{C}(X_c)$ and the bisector generated by $e_i-e_j$, where $e_i$ and $e_j$ are the canonical vectors of $\mathbb{R}^n$ associated with the two individuals. Some recent upper bounds and other important properties of these distances (Gath & Hayes, 2006; Branco & Pires, 2011) are a direct expression of these geometric relations. Although Mahalanobis distances concern the individuals, the geometric concepts associated with them in the space of variables can be explored to deepen and extend those results.
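The relation between Mahalanobis distances to the centre and angles can be checked numerically. With $S$ computed with divisor $n-1$, one has $D^2_{x_i,\overline{x}} = (n-1)\cos^2\theta_i$, where $\cos^2\theta_i$ is the $i$-th diagonal entry of the orthogonal projection onto $\mathcal{C}(X_c)$. The data are simulated; the check is a sketch, not part of the talk.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 30, 4
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p)) + 5
Xc = X - X.mean(0)                  # centred data matrix
S = Xc.T @ Xc / (n - 1)             # sample covariance (divisor n - 1)

# squared Mahalanobis distances to the centre
D2 = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(S), Xc)

# orthogonal projection onto C(Xc); its i-th diagonal entry is the
# squared cosine of the angle between axis e_i and C(Xc)
P = Xc @ np.linalg.solve(Xc.T @ Xc, Xc.T)
cos2 = np.diag(P)
```

Here `D2` and `(n - 1) * cos2` agree to machine precision, and summing gives $\sum_i D^2_i = (n-1)p$ since the trace of a projection equals its rank.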

### 26/03/2012, 14:30 — 15:30 — Room P3.10, Mathematics Building

Russell Alpizar-Jara, *Research Center in Mathematics and Applications (CIMA-U.E.), Department of Mathematics, University of Évora*

### An overview of capture-recapture models

Capture-recapture methods have been widely used in Biological
Sciences to estimate population abundance and related demographic
parameters (births, deaths, immigration, or emigration). More
recently, these models have been used to estimate community
dynamics parameters such as species richness, rates of extinction,
colonization and turnover, and other metrics that require
presence/absence data from species counts. In this presentation, we
will use this latest application to illustrate some of the concepts
and the underlying theory of capture-recapture models. In
particular, we will review basic closed-population models,
open-population models, and combinations of the two. We will also
briefly mention other applications of these models in the Medical,
Social and Computer Sciences.
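The simplest capture-recapture setting can be sketched in a few lines: a two-occasion closed-population experiment and Chapman's bias-corrected version of the Lincoln-Petersen estimator. The population size and capture probability below are invented.

```python
import numpy as np

def chapman_estimate(n1, n2, m2):
    """Chapman's (bias-corrected Lincoln-Petersen) estimator of closed
    population size: n1 animals marked in sample 1, n2 caught in
    sample 2, of which m2 are marked recaptures."""
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1

# simulate a two-occasion experiment on a population of N = 500
rng = np.random.default_rng(5)
N, p_capture = 500, 0.3
caught1 = rng.random(N) < p_capture
caught2 = rng.random(N) < p_capture   # captures assumed independent
n1, n2 = caught1.sum(), caught2.sum()
m2 = (caught1 & caught2).sum()
N_hat = chapman_estimate(n1, n2, m2)
```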

Keywords: Capture-recapture experiments; multinomial and mixture
distributions; non-parametric and maximum likelihood estimation;
population size estimation.

### 07/03/2012, 14:30 — 15:30 — Room P3.10, Mathematics Building

K F Turkman, *CEAUL - DEIO - FCUL - University of Lisbon*

### Why we need non-linear time series models and why we are not using them so often

The Wold decomposition theorem says that, under fairly general
conditions, a stationary time series ${X}_{t}$ has a unique linear
causal representation in terms of uncorrelated random variables.
However, the theorem gives us a representation, not a model, for
${X}_{t}$: from this representation we can recover the moments of
${X}_{t}$ only up to second order, unless the input series is a
Gaussian sequence. If we look for models for ${X}_{t}$, then we
should look within the class of convergent Volterra series
expansions. If we have to go beyond second-order properties (and
many real data sets from the financial and environmental sciences
indicate that we should), then linear models with iid Gaussian
input are a tiny, insignificant fraction of the possible models
for a stationary time series, corresponding to the first term of
the infinite-order Volterra expansion. On the other hand, Volterra
series expansions are not particularly useful as a class of
models, since their stationarity and invertibility conditions are
hard, if not impossible, to check; they therefore have very
limited use as models for time series, unless the input series is
observable.

From a prediction point of view, the Projection Theorem for
Hilbert spaces tells us how to obtain the best linear predictor of
${X}_{t+k}$ within the linear span of $\{X_t, X_{t-1}, \dots\}$,
but when linear predictors are not sufficiently good, it is not
straightforward to find, if possible at all, the best predictor
within richer subspaces constructed over $\{X_t, X_{t-1},
\dots\}$. It is therefore important to look for classes of
nonlinear models which improve upon the linear predictor and
which are sufficiently general, yet flexible enough to work with.
There are many ways a time series can be nonlinear and,
consequently, many classes of nonlinear models to capture such
nonlinearities, but their probabilistic characteristics are
difficult to study, not to mention the difficulties associated
with modeling issues. Likelihood-based inference is a particularly
difficult issue since, for most nonlinear processes, we cannot
even write down the likelihood. However, there have recently been
very exciting advances in simulation-based inferential methods,
such as sequential Markov chain Monte Carlo, particle filters and
Approximate Bayesian Computation methods for generalized state
space models, which we will mention briefly.
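A minimal example of why second-order tools are not enough: a Volterra-type series whose lag-one autocorrelation vanishes, so it looks "white" to linear methods, while its squares are clearly correlated. The coefficients are arbitrary.

```python
import numpy as np

def acf(x, lag):
    """Sample autocorrelation at a given lag."""
    x = x - x.mean()
    return (x[:-lag] * x[lag:]).mean() / x.var()

rng = np.random.default_rng(6)
e = rng.standard_normal(200_000)
# a second-order Volterra-type series: linear term plus a quadratic
# interaction of past innovations
x = e[2:] + 0.8 * e[1:-1] * e[:-2]

lag1_x = acf(x, 1)        # near 0: no linear structure to exploit
lag1_x2 = acf(x ** 2, 1)  # clearly nonzero: dependence beyond 2nd order
```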

### 22/02/2012, 14:30 — 15:30 — Room P3.10, Mathematics Building

Maria Isabel Fraga Alves, *CEAUL-DEIO- FC - Universidade de Lisboa*

### How far can M(m)an go?

This seminar addresses the question: what is the longest long jump within M(m)an's reach, given the current state of the art? To answer it we use the *crème de la crème*: the data are collected from the best Olympic-level athletes in the event, from the World Athletics Competitions database (Long Jump Men Outdoors). The approach to the problem is based on Extreme Value Theory and its statistical techniques, using only the best performances from the world top lists. The final estimate of the potential record, i.e., the upper limit of the long jump event, allows inference about the best possible individual mark under current conditions, both in terms of knowledge of the phenomenon and of the recording conditions and rules of the sport. The current record of 8.95 m is held by Mike Powell (USA), set in Tokyo on 30/08/1991. In Extreme Value terms, the problem amounts to estimating the upper endpoint of the support of a distribution in the Gumbel max-domain of attraction.

Keywords: Extreme values in sport; Extreme Value Theory; estimation of the upper endpoint of the support in the Gumbel domain; semi-parametric approach to statistics of extremes.

### 10/02/2012, 11:00 — 12:00 — Room P3.10, Mathematics Building

Patrícia Ferreira, *CEMAT - Departamento de Matemática - IST*

### Misleading signals in joint schemes for the process mean and variance

When one wishes to control the mean and the variance of a process simultaneously, it is common to use a joint scheme. A scheme of this type consists of two control charts operating simultaneously, one monitoring the process mean and the other the process variance. Using such schemes can lead to the occurrence of misleading signals, associated, for example, with the following situations:

- the process mean is out of control, yet the chart for the variance signals before the chart used to monitor the mean;
- the process variance is out of control, but the chart for the mean is the first to signal.

Misleading signals are valid signals that may lead the quality control operator to trigger inappropriate actions to correct a nonexistent cause. It is therefore important to consider the frequency with which these signals occur as a performance measure for joint schemes. In this work we analyse the performance of joint schemes from the viewpoint of the probability of occurrence of a misleading signal, with special emphasis on joint schemes for univariate i.i.d. and autocorrelated processes.
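The probability of one kind of misleading signal can be estimated by simulation. The sketch below uses a joint scheme of two Shewhart-type charts with simulation-calibrated limits (not the schemes studied in the talk) and counts how often the variance chart signals first when only the mean has shifted.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5                                   # subgroup size

# calibrate both charts to the same in-control false-alarm rate;
# quantiles are estimated by simulation, so no table constants needed
calib = rng.standard_normal((200_000, n))
ucl_mean = np.quantile(np.abs(calib.mean(1)), 0.9973)
ucl_var = np.quantile(calib.var(1, ddof=1), 0.9973)

def misleading_fraction(delta, n_runs=2000, max_samples=200):
    """Shift the mean by delta (in sigma units) and return the fraction
    of signalling runs in which the variance chart signals first,
    i.e. a misleading signal, since only the mean is out of control."""
    mis, signals = 0, 0
    for _ in range(n_runs):
        for _t in range(max_samples):
            x = rng.standard_normal(n) + delta
            if np.abs(x.mean()) > ucl_mean:   # mean chart signals
                signals += 1
                break
            if x.var(ddof=1) > ucl_var:       # variance chart signals first
                mis += 1
                signals += 1
                break
    return mis / signals

frac_small_shift = misleading_fraction(0.5)
frac_big_shift = misleading_fraction(2.0)
```

Small mean shifts leave both charts slow, so the variance chart "wins the race" relatively often; large shifts make the misleading signal rare.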

### 19/01/2012, 11:00 — 12:00 — Room P3.10, Mathematics Building

Peter Kort, *Tilburg University*

### Strategic Capacity Investment Under Uncertainty

In this talk we consider investment decisions within an uncertain, dynamic and competitive framework. Each investment decision involves determining both the timing and the capacity level; in this way we extend the main bulk of real options theory, where the capacity level is given. We consider a monopoly setting as well as a duopoly setting. Our main results are the following. In the duopoly setting we provide a fully dynamic analysis of entry deterrence/accommodation strategies. Contrary to the seminal industrial organization analyses, which are based on static models, we find that entry can only be deterred temporarily. To keep its monopoly position as long as possible, the first investor overinvests in capacity. In very uncertain economic environments the first investor eventually ends up being the largest firm in the market. If uncertainty is only moderately present, a reduced value of waiting implies that the preemption mechanism forces the first investor to invest so soon that a large capacity cannot be afforded; it then ends up with a capacity level lower than the second investor's.

### 04/05/2011, 14:00 — 15:00 — Room P4.35, Mathematics Building

Verena Hagspiel, *Tilburg University, Netherlands*

### Production Flexibility and Capacity Investment under Demand Uncertainty

The paper takes a real options approach to consider optimal capacity investment decisions under uncertainty. Besides the timing of the investment, the firm also has to decide on the capacity level. Concerning the production decision, we study a flexible and an inflexible scenario. The flexible firm can costlessly adjust production over time with the capacity level as the upper bound, while the inflexible firm fixes production at the capacity level from the moment of investment onwards. We find that the flexible firm invests in higher capacity than the inflexible firm, and that the capacity difference increases with uncertainty. For the flexible firm the initial occupation rate can be quite low, especially when investment costs are concave and the economic environment is uncertain. As to the timing of the investment there are two contrary effects. First, the flexible firm has an incentive to invest earlier, because flexibility raises the project value. Second, the flexible firm has an incentive to invest later, because costs are larger due to the higher capacity level. The latter effect dominates in highly uncertain economic environments.

### 01/03/2011, 11:00 — 12:00 — Room P3.10, Mathematics Building

Christine Fricker, *INRIA, France*

### Performance of passive optical networks

We introduce PONs (Passive Optical Networks), which are designed to provide high-speed access to users via fiber links. The problem for the OLT (Optical Line Terminal) is to share the wavelength bandwidth dynamically among the ONUs (Optical Network Units). With an optimal algorithm, the system can then be modeled as a relatively standard polling system in which, due to technological constraints, the number of servers that may visit one queue at the same time is limited. The performance of the system is directly related to the stability condition of the polling model, which is unknown in general. A mean-field approach provides a limiting stability condition as the system gets large.

### 07/10/2010, 16:30 — 17:30 — Room P3.10, Mathematics Building

Magnus Fontes, *Lund University*

### Mathematics - A Catalyst for Innovation - Giving European Industry an Edge

We will discuss the role of Mathematics in Industry and in innovation processes. The focus will be European, and we will look at good examples provided, e.g., by the experiences of the European Consortium for Mathematics in Industry (ECMI) network. I will also present the ongoing ESF Forward Look "Mathematics and Industry" (see http://www.ceremade.dauphine.fr/FLMI/FLMI-frames-index.html) and discuss possible future developments on a European scale.

### 21/07/2010, 15:00 — 16:00 — Room P3.10, Mathematics Building

Graciela Boente, *Universidad de Buenos Aires and CONICET, Argentina*

### Robust inference in generalized linear models with missing responses

The generalized linear model (GLM; McCullagh and Nelder, 1989) is a popular technique for modelling a wide variety of data. It assumes that the observations are independent and that the conditional distribution of $y|x$ belongs to the canonical exponential family, in which case the mean $E(y|x)$ is modelled linearly through a known link function. Robust procedures for generalized linear models have been considered, among others, by Stefanski et al. (1986), Künsch et al. (1989), Bianco and Yohai (1996), Cantoni and Ronchetti (2001), Croux and Haesbroeck (2002) and Bianco et al. (2005). Recently, robust tests for the regression parameter under a logistic model were considered by Bianco and Martínez (2009).

In practice, some responses may be missing, by design (as in two-stage studies) or by happenstance. As is well known, the methods described above are designed for complete data sets, and problems arise when responses are missing while the covariates are completely observed. Although in many situations both the response and the explanatory variables may be missing, we focus on the case where missing data occur only in the responses. Missingness of responses is very common in opinion polls, market research surveys, mail enquiries, socio-economic investigations, medical studies and other scientific experiments where the explanatory variables can be controlled; this pattern arises, for example, in the double sampling scheme proposed by Neyman (1938). Hence, we will be interested in robust inference when the response variable may have missing observations but the covariate x is totally observed.

In the regression setting with missing data, a common method is to impute the incomplete observations and then estimate the conditional or unconditional mean of the response variable from the completed sample. The methods considered include linear regression (Yates, 1933), kernel smoothing (Cheng, 1994; Chu and Cheng, 1995), nearest neighbour imputation (Chen and Shao, 2000), semiparametric estimation (Wang et al., 2004; Wang and Sun, 2007), nonparametric multiple imputation (Aerts et al., 2002; González-Manteiga and Pérez-González, 2004) and empirical likelihood over the imputed values (Wang and Rao, 2002), among others. All these proposals are very sensitive to anomalous observations, since they are based on least squares approaches.

In this talk, we introduce a robust procedure to estimate the regression parameter under a GLM which includes, when there are no missing data, the family of estimators previously studied. The robust estimators are shown to be root-$n$ consistent and asymptotically normally distributed. A robust procedure to test simple hypotheses on the regression parameter is also considered. The finite-sample properties of the proposal are investigated through a Monte Carlo study, in which the robust test is also compared with non-robust alternatives.

###
01/06/2010, 16:00 — 17:00 — Room P4.35, Mathematics Building

Ana Pires, *Universidade Técnica de Lisboa - Instituto Superior Técnico and CEMAT*

```
```###
CSI: are Mendel's data "Too Good to be True?"

Gregor Mendel (1822-1884) is almost unanimously recognized as the founder of modern genetics. However, long ago, a shadow of doubt was cast on his integrity by another eminent scientist, the statistician and geneticist, Sir Ronald Fisher (1890-1962), who questioned the honesty of the data that form the core of Mendel's work. This issue, nowadays called "the Mendel-Fisher controversy", can be traced back to 1911, when Fisher first presented his doubts about Mendel's results, though he only published a paper with his analysis of Mendel's data in 1936.

A large number of papers have been published about this controversy, culminating in the 2008 publication of a book (Franklin et al., "Ending the Mendel-Fisher controversy") aimed at closing the issue and definitively rehabilitating Mendel's image. However, quoting Franklin et al., "the issue of the `too good to be true' aspect of Mendel's data found by Fisher still stands".

We have subjected Mendel's data and Fisher's statistical analysis to extensive computations and simulations, attempting to discover a hidden explanation or hint that could help answer the questions: is Fisher right or wrong, and, if Fisher is right, is there any reasonable explanation for the "too good to be true" other than deliberate fraud? In this talk some results of this investigation and the conclusions obtained will be presented.
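The flavour of Fisher's "too good to be true" argument can be sketched on a single experiment: for Mendel's yellow-versus-green seed counts (6022:2001 out of 8023, against an expected 3:1 ratio), the Pearson chi-square statistic is strikingly small, and a Monte Carlo check shows how often an honest binomial experiment would fit at least that well. This is only a hedged illustration; one experiment alone proves nothing, and Fisher's case rested on aggregating the chi-squares of all of Mendel's experiments:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8023      # total seeds in Mendel's yellow/green experiment
p0 = 0.75     # expected 3:1 segregation ratio

def chisq(k, n, p0):
    """Pearson chi-square for a binomial count against the 3:1 ratio."""
    e0, e1 = n * p0, n * (1.0 - p0)
    return (k - e0) ** 2 / e0 + ((n - k) - e1) ** 2 / e1

stat = chisq(6022, n, p0)   # Mendel's observed yellow count

# Monte Carlo: fraction of honest binomial experiments fitting this well or better
sims = rng.binomial(n, p0, size=100_000)
frac = np.mean(chisq(sims, n, p0) <= stat)
print(stat, frac)   # chi-square about 0.015; roughly 10% of experiments fit this well
```

A left-tail probability near 10% is unremarkable on its own; it is the product of such probabilities over dozens of experiments that produced Fisher's famously small combined p-value.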

###
18/05/2010, 16:00 — 17:00 — Room P4.35, Mathematics Building

Alex Trindade, *Texas Tech University*

```
```###
Fast and Accurate Inference for the Smoothing Parameter in Semiparametric Models

We adapt the method developed in Paige, Trindade, and Fernando (2009) in order to make approximate inference on optimal smoothing parameters for penalized spline and partially linear models. The method is akin to a parametric bootstrap in which Monte Carlo simulation is replaced by saddlepoint approximation, and it is applicable whenever the underlying estimator can be expressed as the root of an estimating equation that is a quadratic form in normal random variables. This is the case under a variety of common optimality criteria, such as ML, REML, GCV, and AIC. We apply the method to some well-known datasets from the literature and find that, under the ML and REML criteria, it delivers nearly exact performance, with computational speeds at least an order of magnitude faster than exact methods. Perhaps most importantly, the proposed method also offers a computationally feasible alternative where no exact methods are known, e.g., GCV and AIC.
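The core ingredient, a saddlepoint approximation to the distribution of a quadratic form in normal variables, can be sketched as follows: the Lugannani-Rice tail formula for $Q = \sum_i \lambda_i Z_i^2$, checked against Monte Carlo. The weights and threshold below are hypothetical, and this is only the basic building block, not the authors' full inference procedure:

```python
import numpy as np
from math import erf, exp, pi, sqrt

lam = np.array([1.0, 0.5, 0.2])   # hypothetical weights of the quadratic form
q = 3.0                           # hypothetical threshold of interest

# cumulant generating function of Q and its derivatives, valid for s < 1/(2*max lam)
def K(s):  return -0.5 * np.sum(np.log(1.0 - 2.0 * s * lam))
def K1(s): return np.sum(lam / (1.0 - 2.0 * s * lam))
def K2(s): return np.sum(2.0 * lam**2 / (1.0 - 2.0 * s * lam) ** 2)

# solve the saddlepoint equation K'(s) = q by bisection (K' is increasing)
lo, hi = 0.0, 1.0 / (2.0 * lam.max()) - 1e-9
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if K1(mid) < q else (lo, mid)
s_hat = 0.5 * (lo + hi)

# Lugannani-Rice approximation to the tail probability P(Q >= q)
Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))
phi = lambda z: exp(-0.5 * z * z) / sqrt(2.0 * pi)
w = np.sign(s_hat) * sqrt(2.0 * (s_hat * q - K(s_hat)))
u = s_hat * sqrt(K2(s_hat))
sp_tail = 1.0 - Phi(w) + phi(w) * (1.0 / u - 1.0 / w)

# Monte Carlo check of the same tail probability
rng = np.random.default_rng(3)
Z = rng.normal(size=(200_000, lam.size))
mc_tail = np.mean((Z**2 @ lam) >= q)
print(sp_tail, mc_tail)   # the two estimates agree closely
```

The appeal in the abstract's setting is that evaluating the three CGF functions at one saddlepoint replaces a full simulation loop, which is where the order-of-magnitude speedup over exact or Monte Carlo methods comes from.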