# Probability and Statistics Seminar

### Robust estimators in Generalized Partially Linear Models

Semiparametric models contain both a parametric and a nonparametric component. Sometimes the nonparametric component plays the role of a nuisance parameter. The aim of this talk is to consider semiparametric versions of the generalized linear model in which the response $y$ is to be predicted by covariates $({\bf x},t)$, where ${\bf x}\in\mathbb{R}^{p}$ and $t\in\mathbb{R}$. It will be assumed that the conditional distribution of $y|({\bf x},t)$ belongs to the canonical exponential family $\exp\left[y\theta({\bf x},t)-B\left(\theta({\bf x},t)\right)+C(y)\right]$, for known functions $B$ and $C$. The generalized linear model (McCullagh and Nelder, 1989), which is a popular technique for modelling a wide variety of data, assumes that the mean is modelled linearly through a known link function $g$, i.e., $g(\mu\left({\bf x},t\right))=\theta({\bf x},t)=\beta_{0}+{\bf x}^T{\bf\beta}+\alpha t$.

In many situations, the linear model is insufficient to explain the relationship between the response variable and its associated covariates. A natural generalization, which suffers from the curse of dimensionality, is to model the mean nonparametrically in the covariates. An alternative strategy is to allow most predictors to be modeled linearly while one or a small number of predictors enter the model nonparametrically. This is the approach we will follow, so that the relationship will be given by the semiparametric generalized partially linear model

$$\mu\left({\bf x},t\right)=E\left(y|({\bf x},t)\right)=H\left(\eta(t)+{\bf x}^T{\bf\beta}\right)\qquad(\text{GPLM})$$

where $H=g^{-1}$ is the inverse of the known link function $g$, ${\bf\beta}\in\mathbb{R}^{p}$ is an unknown parameter and $\eta$ is an unknown continuous function.

Severini and Wong (1992) introduced the concept of generalized profile likelihood, which was later applied to this model by Severini and Staniswalis (1994).
In this method, the nonparametric component is viewed as a function of the parametric component, and root-$n$ consistent estimates for the parametric component can be obtained when the usual optimal rate for the smoothing parameter is used. Such estimates, however, are not robust to outlying observations. In a semiparametric setting, outliers can have a devastating effect, since extreme points can easily affect the scale and the shape of the estimate of $\eta$, possibly leading to wrong conclusions about ${\bf\beta}$.

Robust procedures for generalized linear models have been considered, among others, by Stefanski, Carroll and Ruppert (1986), Künsch, Stefanski and Carroll (1989), Bianco and Yohai (1995), Cantoni and Ronchetti (2001), Croux and Haesbroeck (2002) and Bianco, García Ben and Yohai (2005). The basic ideas from robust smoothing and from robust regression estimation have been adapted to deal with the case of independent observations following a partly linear regression model with the identity link, $g(t)=t$; we refer to Gao and Shi (1997), He, Zhu and Fung (2002) and Bianco and Boente (2004).

In this talk, we will first review the classical approach to generalized partly linear models. The sensitivity to outliers of the classical estimates for these models is good evidence that robust methods are needed. The problem of obtaining a family of robust estimates was first considered by Boente, He and Zhou (2006). However, their procedure is computationally expensive. We will introduce a general three-step robust procedure to estimate the parameter ${\bf\beta}$ and the function $\eta$ under a generalized partly linear model (GPLM) that is easier to compute than the one introduced by Boente, He and Zhou (2006). It is shown that the estimates of ${\bf\beta}$ are root-$n$ consistent and asymptotically normal. Through a Monte Carlo study, we compare the performance of these estimators with that of the classical ones.
In addition, we study the sensitivity of the estimators through their empirical influence functions. A robust procedure to choose the smoothing parameter is also discussed.

We will briefly discuss the generalized partially linear single index model, which generalizes the previous one: the independent observations are such that $y_{i}|\left({{\bf x}_{i},{\bf t}_{i}}\right)\sim F\left(\cdot,\mu_{i}\right)$ with $\mu_{i}=H\left(\eta({\bf\alpha}^T{\bf t}_{i})+{\bf x}_{i}^T{\bf\beta}\right)$, where now ${\bf t}_{i}\in\mathbb{R}^{q}$ and ${\bf x}_{i}\in\mathbb{R}^{p}$, and where $\eta:\mathbb{R}\to\mathbb{R}$, ${\bf\beta}\in\mathbb{R}^{p}$ and ${\bf\alpha}\in\mathbb{R}^{q}$ ($\|{\bf\alpha}\|=1$) are the unknown parameters to be estimated. Two families of robust estimators are introduced, which turn out to be consistent and asymptotically normally distributed. Their empirical influence functions are also computed. The robust proposals improve on the behavior of the classical ones when outliers are present.
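
To make the (GPLM) setting concrete, the following is a minimal simulation sketch and not the procedure presented in the talk. It assumes a Bernoulli response with the logit link, so that $H$ is the logistic function. The values `beta_true` and `eta_true`, the contamination scheme and all function names are hypothetical choices made only for illustration.

```python
import numpy as np


def inverse_logit(u):
    """H = g^{-1} for the logit link g(mu) = log(mu / (1 - mu))."""
    return 1.0 / (1.0 + np.exp(-u))


def simulate_gplm(n, beta, eta, rng):
    """Draw (y, X, t) from a Bernoulli GPLM with mean H(eta(t) + x^T beta)."""
    beta = np.asarray(beta, dtype=float)
    X = rng.normal(size=(n, beta.size))  # covariates entering linearly, x_i in R^p
    t = rng.uniform(0.0, 1.0, size=n)    # covariate entering nonparametrically
    mu = inverse_logit(eta(t) + X @ beta)
    y = rng.binomial(1, mu)              # y_i | (x_i, t_i) ~ Bernoulli(mu_i)
    return y, X, t, mu


def contaminate(y, X, frac, rng):
    """Return copies of (y, X) where a fraction of rows become outliers:
    the first covariate is shifted far out and the response is flipped.
    This contamination scheme is an assumption, not taken from the talk."""
    y_out, X_out = y.copy(), X.copy()
    idx = rng.choice(y.size, size=int(frac * y.size), replace=False)
    X_out[idx, 0] += 10.0
    y_out[idx] = 1 - y_out[idx]
    return y_out, X_out, idx


rng = np.random.default_rng(0)
beta_true = np.array([1.0, -0.5])                  # hypothetical parameter
eta_true = lambda t: np.sin(2.0 * np.pi * t)       # hypothetical smooth function
y, X, t, mu = simulate_gplm(500, beta_true, eta_true, rng)
y_bad, X_bad, idx = contaminate(y, X, 0.05, rng)
```

Fitting a classical estimator and a robust one to both `(y, X, t)` and `(y_bad, X_bad, t)` would be one way to set up the kind of Monte Carlo comparison mentioned above.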

Joint work with Daniela Rodriguez.