Room P3.10, Mathematics Building

Graciela Boente, Universidad de Buenos Aires and CONICET, Argentina
Robust inference in generalized linear models with missing responses

The generalized linear model (GLM; McCullagh and Nelder, 1989) is a popular technique for modelling a wide variety of data. It assumes that the observations are independent and that the conditional distribution of $y|x$ belongs to the canonical exponential family. In this situation, the mean $E(y|x)$ is modelled linearly through a known link function. Robust procedures for generalized linear models have been considered, among others, by Stefanski et al. (1986), Künsch et al. (1989), Bianco and Yohai (1996), Cantoni and Ronchetti (2001), Croux and Haesbroeck (2002) and Bianco et al. (2005). Recently, robust tests for the regression parameter under a logistic model were considered by Bianco and Martínez (2009).
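
As a notational sketch of this setup (the symbols $\theta$, $B$, $c$, $H$ and $\beta$ are illustrative assumptions, not notation taken from the talk), the model can be written as

$$ f(y \mid x) = \exp\{\, y\,\theta(x) - B(\theta(x)) + c(y) \,\}, \qquad E(y \mid x) = B'(\theta(x)) = H(x^{\top}\beta), $$

where $H$ denotes the inverse of the known link function and $\beta$ is the regression parameter. Logistic regression, for instance, corresponds to a binary response with $H(t) = 1/(1+e^{-t})$.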

In practice, some response variables may be missing, by design (as in two-stage studies) or by happenstance. As is well known, the methods described above are designed for complete data sets, and problems arise when responses may be missing while covariates are completely observed. Although there are many situations in which both the response and the explanatory variables are missing, we focus on the case where missing data occur only in the responses. Missing responses are very common in opinion polls, market research surveys, mail enquiries, socio-economic investigations, medical studies and other scientific experiments where the explanatory variables can be controlled. This pattern arises, for example, in the double sampling scheme proposed by Neyman (1938). Hence, we are interested in robust inference when the response variable may have missing observations but the covariate $x$ is completely observed.

In the regression setting with missing data, a common method is to impute the incomplete observations and then estimate the conditional or unconditional mean of the response variable from the completed sample. The methods considered include linear regression (Yates, 1933), kernel smoothing (Cheng, 1994; Chu and Cheng, 1995), nearest neighbor imputation (Chen and Shao, 2000), semiparametric estimation (Wang et al., 2004; Wang and Sun, 2007), nonparametric multiple imputation (Aerts et al., 2002; González-Manteiga and Pérez-González, 2004) and empirical likelihood over the imputed values (Wang and Rao, 2002), among others. All these proposals are very sensitive to anomalous observations since they are based on least squares approaches.

In this talk, we introduce a robust procedure to estimate the regression parameter under a GLM. When there are no missing data, the procedure includes the family of estimators previously studied. It is shown that the robust estimates of the regression parameter are root-$n$ consistent and asymptotically normally distributed. A robust procedure to test simple hypotheses on the regression parameter is also considered. The finite sample properties of the proposed procedure are investigated through a Monte Carlo study, in which the robust test is also compared with nonrobust alternatives.