Probability and Statistics Seminar

Planned sessions

25/02/2019, 11:00 — 12:00 — Room P3.10, Mathematics Building, Instituto Superior Técnico
Anna Couto, INESC-ID and CEMAT

A Comprehensive Methodology to Analyse Topic Difficulties in Educational Programmes

We propose a comprehensive Learning Analytics methodology to investigate the level of understanding students achieve in the learning process. The goals of this methodology are:

  1. To identify topics in which students experience difficulties;
  2. To assess whether these difficulties recur across semesters;
  3. To decide whether there are conceptual associations between the topics in which students experience difficulties; and, more generally,
  4. To discover statistically significant groups of topics in which students show similar performance.

The proposed methodology uses statistics and data visualization techniques to address the first and second goals, frequent itemset mining to tackle the third goal, and biclustering to find relationships within educational data, revealing meaningful and statistically significant patterns in students’ performance.

We illustrate the application of the methodology to a Computer Science course.
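
As a rough illustration of the frequent itemset mining step named in this abstract, the following is a minimal sketch, not the speaker's implementation. It assumes each student is represented by the set of topics they struggled with; the topic names, the `frequent_itemsets` helper, and the 60% support threshold are all hypothetical. It enumerates itemsets by brute force, which is only practical for small topic sets; a dedicated algorithm such as Apriori would prune the search.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: support} for itemsets with support >= min_support.

    Support is the fraction of transactions containing every item of the set.
    Brute-force enumeration up to max_size items (hypothetical helper).
    """
    n = len(transactions)
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Hypothetical records: topics each student had difficulties with.
students = [
    {"recursion", "pointers"},
    {"recursion", "pointers", "sorting"},
    {"recursion"},
    {"pointers", "sorting"},
    {"recursion", "pointers"},
]

result = frequent_itemsets(students, min_support=0.6)
for itemset, support in sorted(result.items()):
    print(itemset, support)
```

With these toy records, only the two single topics and the pair of them clear the threshold, which is the kind of co-occurrence the third goal looks for.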

06/03/2019, 13:00 — 14:00 — Room P3.10, Mathematics Building, Instituto Superior Técnico
Ana Bianco and Graciela Boente, University of Buenos Aires

Robust logistic regression with sparse predictor variables

Nowadays, dealing with high-dimensional data is a recurrent problem that cuts across modern statistics. One main feature of high-dimensional data is that the dimension p, that is, the number of covariates, is high, while the sample size n is relatively small. In this circumstance, the bet-on-sparsity principle suggests proceeding under the assumption that most of the effects are not significant. Sparse covariates are frequent in classification problems, and in this situation variable selection may also be of interest. We focus on the logistic regression model, and our aim is to obtain robust and sparse estimators of the regression parameter in order to perform estimation and variable selection at the same time.

For this purpose, we introduce a family of penalized M-type estimators for the logistic regression parameter that are stable against atypical data. We explore different penalization functions and introduce the so-called sign penalization. This new penalty has the advantage that it does not shrink the estimated coefficients to 0 and that it depends on only one parameter.

We will discuss the variable selection capability of the proposal as well as its asymptotic behaviour. Through a numerical study, we compare the finite-sample performance of the proposal with that of other penalized estimators, both robust and classical, under different scenarios.
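
To make the idea of simultaneous estimation and variable selection concrete, here is a minimal sketch of a classical L1-penalized (lasso-type) logistic regression fitted by proximal gradient descent. It is not the robust M-type estimator or the sign penalization described in this abstract; it only shows how a penalty can set some coefficients exactly to zero. The data, the `l1_logistic` helper, the absence of an intercept, and the step size and penalty values are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip to 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def l1_logistic(X, y, lam, step=0.5, n_iter=500):
    """Minimise mean logistic loss + lam * ||beta||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        # Gradient of the mean logistic loss at the current beta.
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            resid = sigmoid(sum(b * v for b, v in zip(beta, xi))) - yi
            for j in range(p):
                grad[j] += resid * xi[j] / n
        # Gradient step followed by soft-thresholding.
        beta = [soft_threshold(b - step * g, step * lam)
                for b, g in zip(beta, grad)]
    return beta


# Hypothetical data: the response depends on the first covariate only;
# the second covariate is balanced noise.
X = [(x0, x1) for x0 in (-2.0, -1.0, 1.0, 2.0) for x1 in (-1.0, 1.0)]
y = [1 if x0 > 0 else 0 for x0, _ in X]

beta_sparse = l1_logistic(X, y, lam=0.1)   # moderate penalty
beta_null = l1_logistic(X, y, lam=10.0)    # very large penalty
print(beta_sparse, beta_null)
```

In this toy setting the moderate penalty keeps the informative covariate and drops the noise covariate, while the very large penalty removes both. Unlike the sign penalization described above, the L1 penalty also shrinks the coefficients it retains.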