
Probability and Statistics Seminar

Past sessions

22/10/2020, 13:00 — 14:00 — Online
Escola Superior de Tecnologia da Saúde de Lisboa and CEAUL

Impact of OVL Variation on AUC Bias Estimated by Non-parametric Methods

The area under the ROC curve (AUC) is the most commonly used index in ROC methodology to evaluate the performance of a classifier that discriminates between two mutually exclusive conditions. The AUC takes values between 0.5 and 1, where values close to 1 indicate that the classification model has high discriminative power. The overlap coefficient (OVL) between two density functions is defined as the common area between both functions. This coefficient is used as a measure of agreement between two distributions and takes values between 0 and 1, where values close to 1 reveal totally overlapping densities. These two measures were used to construct the arrow plot for selecting differentially expressed genes. A simulation study using the bootstrap method is presented in order to estimate the AUC bias and standard error using empirical and kernel methods. To assess the impact of OVL variation on the AUC bias, samples from various distributions were simulated, considering different values of their parameters and fixed OVL values between 0 and 1. Sample sizes of 15, 30, 50 and 100 and 1000 bootstrap replicates were considered for each scenario.
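
As an illustration of the kind of experiment described above, here is a minimal sketch (in Python, with toy distributions and sample sizes of our own choosing) of bootstrap estimation of the bias and standard error of the empirical (Mann-Whitney) AUC:

```python
import numpy as np

rng = np.random.default_rng(1)

def auc_empirical(x, y):
    """Empirical (Mann-Whitney) AUC comparing scores of non-diseased (x)
    and diseased (y) subjects; ties count as 1/2."""
    gt = (y[:, None] > x[None, :]).mean()
    eq = (y[:, None] == x[None, :]).mean()
    return gt + 0.5 * eq

def bootstrap_bias_se(x, y, B=1000):
    """Bootstrap estimates of the bias and standard error of the AUC."""
    auc_hat = auc_empirical(x, y)
    reps = np.empty(B)
    for b in range(B):
        xb = rng.choice(x, size=x.size, replace=True)
        yb = rng.choice(y, size=y.size, replace=True)
        reps[b] = auc_empirical(xb, yb)
    return reps.mean() - auc_hat, reps.std(ddof=1)

# One scenario: sample size 30 per group, normal populations with partial overlap.
x = rng.normal(0.0, 1.0, 30)
y = rng.normal(1.5, 1.0, 30)
bias, se = bootstrap_bias_se(x, y)
print(f"AUC = {auc_empirical(x, y):.3f}, bias = {bias:+.4f}, SE = {se:.4f}")
```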

See also

Slides

Joint seminar CEMAT and CEAUL

15/10/2020, 11:00 — 12:00 — Online
School of Mathematics and Statistics, University of New South Wales, Sydney

High-dimensional inference for max-stable processes

Droughts, high temperatures and strong winds are key causes of the recent bushfires that have affected a major part of the Australian territory. Such extreme events seem to appear with increasing frequency, creating an urgent need to better understand the behaviour of extreme environmental phenomena. Max-stable processes are a popular tool to model spatial extreme events, with several flexible models available in the literature. For inference on max-stable models, exact likelihood estimation quickly becomes computationally intractable as the number of spatial locations grows, limiting their applicability to large study regions or fine grids. In this talk, we introduce two methodologies based on composite likelihoods to circumvent this issue. First, we assume the occurrence times of maxima are available, in order to incorporate the Stephenson-Tawn concept into the composite likelihood framework. Second, we propose to aggregate the information between locations into histograms and to derive a composite likelihood variation for these summaries. The significant improvement in performance of each estimation procedure is established through simulation studies and illustrated on two temperature datasets from Australia.

Joint seminar CEMAT and CEAUL

Conceição Amado 01/10/2020, 13:00 — 14:00 — Online
Instituto Superior Técnico and CEMAT

From high dimensional space to a random low dimensional space

What might happen if we have points in a high-dimensional space and decide to project them onto a random low-dimensional space?

In this seminar, we will discuss this subject and see some simple applications.
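
A classical instance of this idea is a Johnson-Lindenstrauss-style Gaussian random projection; the sketch below (Python, with illustrative dimensions) checks how well pairwise distances survive the projection:

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

n, d, k = 100, 10_000, 300                # points, original and projected dimensions
X = rng.normal(size=(n, d))               # points in the high-dimensional space

# Gaussian random projection, scaled so squared distances are preserved on average.
R = rng.normal(size=(d, k)) / np.sqrt(k)
Y = X @ R                                 # images in the random low-dimensional space

ratio = pdist(Y) / pdist(X)               # pairwise distance distortion
print(f"distance ratios: min {ratio.min():.3f}, max {ratio.max():.3f}")
```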

Joint seminar CEMAT and CEAUL

Joaquim Ferreira 23/07/2020, 11:00 — 12:00 — Online
Laboratório de Farmacologia Clínica e Terapêutica, Faculdade de Medicina, Universidade de Lisboa

COVID, uncertainty and clinical trials

The current COVID-19 pandemic is putting enormous pressure not just on society but also on the whole scientific community.

If we want to follow a scientific approach to respond to the doubts and challenges that have been generated, we need to find a balance between the most robust data, the best experimental methodologies to address the new problems, and all the associated uncertainty.

In this presentation we will try to address this balance between the best available data, clinical research methodology and uncertainty, applied to what we know about pandemics, vaccine development and clinical trials. There will be a particular focus on COVID-19 pandemic data and on current research efforts for the development of vaccines and efficacious treatments.

See also

Slides of the talk

Miguel de Carvalho 16/07/2020, 11:00 — 12:00 — Online
University of Edinburgh

Elements of Bayesian geometry

In this talk, I will discuss a geometric interpretation of Bayesian inference that yields a natural measure of the level of agreement between priors, likelihoods, and posteriors. The starting point for the construction of the proposed geometry is the observation that the marginal likelihood can be regarded as an inner product between the prior and the likelihood. A key concept in our geometry is that of compatibility, a measure based on the same construction principles as the Pearson correlation, which can be used to assess how much the prior agrees with the likelihood, to gauge the sensitivity of the posterior to the prior, and to quantify the coherency of the opinions of two experts. Estimators for all the quantities involved in our geometric setup are discussed, and they can be computed directly from the posterior simulation output. Some examples are used to illustrate our methods, including data related to on-the-job drug usage, midge wing length, and prostate cancer.
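
To make the geometry concrete: with inner product $\langle f, g \rangle = \int f(\theta) g(\theta)\, d\theta$, the compatibility of prior and likelihood is their normalized inner product, in the spirit of a Pearson correlation. A small numerical sketch (Python; the normal prior/likelihood pair is our own toy example):

```python
import numpy as np
from scipy.stats import norm

theta = np.linspace(-10, 10, 4001)   # grid over the parameter space

def compatibility(f, g):
    """Normalized inner product <f, g> / (||f|| ||g||), with
    <f, g> = integral of f(theta) * g(theta) d(theta)."""
    inner = np.trapz(f * g, theta)
    return inner / np.sqrt(np.trapz(f * f, theta) * np.trapz(g * g, theta))

prior = norm.pdf(theta, loc=0.0, scale=1.0)
like1 = norm.pdf(theta, loc=2.0, scale=1.0)   # likelihood centred near the prior
like2 = norm.pdf(theta, loc=6.0, scale=1.0)   # likelihood far from the prior
print(f"compatibility (data near prior): {compatibility(prior, like1):.3f}")
print(f"compatibility (data far away):  {compatibility(prior, like2):.3f}")
```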

Joint work with G. L. Page and with B. J. Barney.

See also

Slides of the talk

Ismael Lemhadri 09/07/2020, 16:00 — 17:00 — Online
Stanford University

LassoNet: A Neural Network with Feature Sparsity

Much work has been done recently to make neural networks more interpretable, and one obvious approach is to arrange for the network to use only a subset of the available features. In linear models, Lasso (or $\ell_1$-regularized) regression assigns zero weights to the most irrelevant or redundant features, and is widely used in data science. However, the Lasso applies only to linear models. Here we introduce LassoNet, a neural network framework with global feature selection. Our approach enforces a hierarchy: specifically, a feature can participate in a hidden unit only if its linear representative is active. Unlike other approaches to feature selection for neural nets, our method uses a modified objective function with constraints, and so directly integrates feature selection with parameter learning. As a result, it delivers an entire regularization path of solutions with a range of feature sparsity. In systematic experiments, LassoNet significantly outperforms state-of-the-art methods for feature selection and regression. The LassoNet method uses projected proximal gradient descent, and generalizes directly to deep networks. It can be implemented by adding just a few lines of code to a standard neural network.
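
A sketch of the constrained objective behind this hierarchy, in our notation (the talk's exact formulation may differ): with $\theta$ the weights of a linear skip connection and $W^{(1)}_j$ the first-layer weights attached to feature $j$,

$$\min_{\theta, W} \; L(\theta, W) + \lambda \|\theta\|_1 \quad \text{subject to} \quad \|W^{(1)}_j\|_\infty \le M\, |\theta_j|, \quad j = 1, \dots, d,$$

so that whenever the $\ell_1$ penalty sets $\theta_j = 0$, the constraint forces all first-layer weights of feature $j$ to zero and the feature is removed from the network globally.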

See also

Lemhadri_slides.pdf

Maria do Rosário Oliveira 25/06/2020, 11:00 — 12:00 — Online
CEMAT-IST

Theoretical foundations of forward feature selection methods based on mutual information

Feature selection problems arise in a variety of applications, such as microarray analysis, clinical prediction, text categorization, image classification and face recognition, multi-label learning, and classification of internet traffic. Among the various classes of methods, forward feature selection methods based on mutual information have become very popular and are widely used in practice. However, comparative evaluations of these methods have been limited, as they are based on specific datasets and classifiers. In this talk, we discuss a theoretical framework that allows evaluating the methods based on their theoretical properties. The difficulties in estimating the methods' objective functions will also be addressed.
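
For orientation, here is a minimal sketch of a generic forward selection loop with a mutual-information criterion of mRMR type (relevance minus redundancy); this is one common variant, not necessarily one of the criteria analyzed in the talk, and the plug-in MI estimators are those of scikit-learn:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def forward_mi_select(X, y, k):
    """Greedy forward selection: at each step pick the feature with the
    highest relevance I(f; y) minus average redundancy with those selected."""
    relevance = mutual_info_classif(X, y, random_state=0)
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            red = mutual_info_regression(X[:, selected], X[:, j],
                                         random_state=0).mean()
            return relevance[j] - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```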

This is a joint work with Francisco Macedo, António Pacheco, and Rui Valadas.

See also

Slides of the talk

Igor Kravchenko 18/06/2020, 11:00 — 12:00 — Online
CEMAT-IST

Investment problem with switching modes

In this talk we will look at the optimal control problem of a firm that may operate in two different modes, one being more risky than the other, in the sense that if demand decreases, the return of the risky mode is lower than that of the more conservative mode, whereas if demand increases the opposite holds. The switches between these two alternative modes have associated costs. In both modes, there is the option to exit the market.

We will focus on two different parameter scenarios that describe particular (and somewhat extreme) economic situations. In the first scenario, we assume that the market is expected to grow in such a way that once the firm is producing in the riskier mode, it is never optimal to switch to the more conservative one. In the second scenario, there is a hysteresis region, where the firm waits in the riskier mode, in production, until some drop or increase in demand leads to an exit or to a switch to the more conservative mode. This hysteresis region cannot be attained under continuous production.

We then address the problem of the optimal time to invest in each situation. Depending on the relation between the switching costs (equal or different from one mode to the other), it may happen that the firm invests in the hysteresis region.

Joint work with Cláudia Nunes and Carlos Oliveira.

See also

Kravchenko.pdf

Cláudia Nunes 28/05/2020, 11:00 — 12:00 — Online
CEMAT-IST

Quasi-analytical solution of an investment problem with decreasing investment cost due to technological innovations

In this talk we address, in the context of real options, an investment problem with two sources of uncertainty: the price (reflected in the revenue of the firm) and the level of technology. The level of technology impacts the investment cost, which decreases when there is a technological innovation. The price follows a geometric Brownian motion, whereas the technology innovations are driven by a Poisson process. As a consequence, the investment region may be attained in a continuous way (due to an increase of the price) or in a discontinuous way (due to a sudden decrease of the investment cost).
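
The two driving processes are easy to simulate; the sketch below (Python, with illustrative parameter values of our own, and with each innovation taken to lower the cost by a multiplicative factor) shows the two ways the investment region can be reached:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative parameters (not taken from the talk).
mu, sigma = 0.02, 0.20   # drift and volatility of the price (GBM)
rate, phi = 0.5, 0.8     # innovation intensity; each innovation multiplies the cost by phi
T, dt = 10.0, 1 / 252
n = int(T / dt)

P, K = np.empty(n), np.empty(n)
P[0], K[0] = 1.0, 5.0
for t in range(1, n):
    dW = rng.normal(0.0, np.sqrt(dt))
    P[t] = P[t - 1] * np.exp((mu - 0.5 * sigma**2) * dt + sigma * dW)
    jumps = rng.poisson(rate * dt)        # technology innovations in (t-dt, t]
    K[t] = K[t - 1] * phi**jumps          # sudden decreases of the investment cost

# The investment region can thus be reached continuously (P drifting up)
# or discontinuously (K dropping at an innovation epoch).
```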

For this optimal stopping problem no analytical solution is known, and we therefore propose a quasi-analytical method to find an approximate solution that preserves the qualitative features of the exact solution. This method is based on a truncation procedure, and we prove that the truncated solution converges to the solution of the original problem.

We provide comparative statics for the investment thresholds. These results show interesting behaviours; in particular, the investment may be postponed or anticipated depending on the intensity of the technological innovations and on their impact on the investment cost.

(joint work with Carlos Oliveira and Rita Pimentel)

See also

Kravchenko.pdf

Manuel Cabral Morais 14/05/2020, 11:00 — 12:00 — Online
CEMAT-IST

On ARL-unbiased charts to monitor the traffic intensity of a single server queue

We know too well that the effective operation of a queueing system requires maintaining the traffic intensity at a target value. This important measure of congestion can be monitored by using control charts, such as the one found in the seminal work by Bhat and Rao (1972) or, more recently, in Chen and Zhou (2015). For all intents and purposes, this paper focuses on three control statistics chosen by Morais and Pacheco (2016) for their simplicity and for their recursive and Markovian character:

  • the number of customers left behind in the M/G/1 system by the n-th departing customer;
  • the number of customers seen in the GI/M/1 system by the n-th arriving customer;
  • the waiting time of the n-th arriving customer to the GI/G/1 system.

Since an upward and a downward shift in the traffic intensity are associated with a deterioration and an improvement (respectively) of the quality of service, the timely detection of these changes is an imperative requirement, begging for the use of ARL-unbiased charts (Pignatiello et al., 1995), in the sense that they detect any shift in the traffic intensity sooner than they trigger a false alarm. In this paper, we focus on the design of this type of chart for the traffic intensity of the three single-server queues mentioned above.
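
The third statistic, for example, obeys the classical Lindley recursion $W_{n+1} = \max(0, W_n + S_n - A_{n+1})$. A minimal monitoring sketch (Python, M/M/1 special case; the control limit below is a placeholder, not the ARL-unbiased limit derived in the paper):

```python
import numpy as np

rng = np.random.default_rng(7)

def waiting_times(n, rho):
    """Lindley recursion W_{k+1} = max(0, W_k + S_k - A_{k+1}),
    here for an M/M/1 queue with arrival rate 1 and traffic intensity rho."""
    S = rng.exponential(rho, n)        # service times
    A = rng.exponential(1.0, n)        # interarrival times
    W = np.zeros(n)
    for k in range(1, n):
        W[k] = max(0.0, W[k-1] + S[k-1] - A[k])
    return W

# Check W_n against an upper limit; the actual charts use two limits chosen
# to make the chart ARL-unbiased, which we do not reproduce here.
W = waiting_times(10_000, rho=0.9)     # traffic intensity shifted upwards
UCL = 25.0                             # placeholder limit
alarms = np.flatnonzero(W > UCL)
print("first alarm at customer", alarms[0] if alarms.size else None)
```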

Joint work with Sven Knoth

See also

Slides of the talk

19/03/2020, 11:00 — 12:00 — Room P3.10, Mathematics Building
CEMAT-IST

On ARL-unbiased charts to monitor the traffic intensity of a single server queue

(Same abstract as the 14/05/2020 session above.)

Cancelled due to Covid-19 containment measures.

09/01/2020, 16:00 — 17:00 — Room P3.10, Mathematics Building
Stevo Rackovic, Mathematics Department, Instituto Superior Técnico

Gaussian Process Regression for Animation Rig Towards the Face Model

In professional 3D animation, artists model movements and scenes using rig functions, constrained sets of sliders or controllers that propagate deformations and drive the mechanics of an object or character in 3D tool systems. These controllers are manually built for each character and cannot be reused if the underlying structure is not exactly the same. There are often hundreds of adjustable parameters, and artists have to learn the structure of each new character. This is usually a bottleneck in production that might be avoided by automating the process. D. Holden et al. proposed possible solutions using Gaussian process regression, which proved useful in the case of skeletal (quadruped) characters. We want to further apply this to a face model, which has a completely different structure than the skeletal model. In this work we explain the model for 3D face animation, the theory of Gaussian process regression, and a method to apply it to the problem of interest. Finally, results and examples are presented with a simple animation model we have at our disposal.
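
A toy sketch of the regression step (Python with scikit-learn; the synthetic "rig" data and all dimensions are stand-ins of our own): learn a map from controller settings to mesh vertex positions and predict an unseen pose with uncertainty.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(3)

# Toy stand-in for rig data: n poses, c controller values -> 3D positions of 50 vertices.
n, c, v = 200, 10, 3 * 50
C = rng.uniform(0.0, 1.0, (n, c))              # controller settings per pose
V = np.tanh(C @ rng.normal(size=(c, v)))       # synthetic mesh responses

gpr = GaussianProcessRegressor(
    kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-3),
    normalize_y=True,
)
gpr.fit(C, V)

# Predict the mesh for an unseen controller configuration, with uncertainty.
c_new = rng.uniform(0.0, 1.0, (1, c))
mesh, sd = gpr.predict(c_new, return_std=True)
```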

19/11/2019, 14:00 — 15:00 — Room P3.10, Mathematics Building
Departamento de Matemática, Instituto Superior Técnico

Extreme Value Theory applied to Longevity of Humans

There has been a long discussion on whether the distribution of human longevity has finite or infinite right support. We shall discuss some recent results on extreme value theory applied to the longevity of humans. Some basic methods of EVT will be reviewed, with the discussion oriented towards applications to human life-span data. It turns out that the quality of the actual data is a crucial issue. The results are based on data sets from the International Database on Longevity.

Joint work with Fei Huang (RSFAS, College of Business and Economics, Australian National University).

29/10/2019, 14:00 — 15:00 — Room P3.10, Mathematics Building
European University Viadrina, Department of Statistics, Frankfurt, Germany

Monitoring Image Processes

In recent years we have observed dramatic changes in the way in which quality features of manufactured products are designed and inspected. The modeling and monitoring problems posed by new inspection methods and fast multi-stream high-speed sensors are quite complex. These measurement tools are used in emerging technologies such as additive manufacturing. It has been shown that in these fields other types of quality characteristics have to be monitored: it is mainly not the mean, the variance, the covariance matrix or a simple profile which reflects the behavior of the quality characteristics, but shapes, surfaces, images, etc. This is a new area for SPC. Note that more complicated characteristics arise in other fields of application as well, e.g., the monitoring of optimal portfolio weights in finance. Since in recent years many new approaches have been developed in the fields of image analysis, spatial statistics and spatio-temporal modeling, a huge amount of tools is available to model the underlying processes. Thus the main problem lies in the development of monitoring schemes for such structures.

In this talk new procedures for monitoring image processes are introduced. They are based on multivariate exponential smoothing and cumulative sums taking into account the local correlation structure. A comparison is given with existing methods. Within an extensive simulation study the performance of the analyzed methods is discussed.
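
As a caricature of such a scheme, the sketch below (Python; sizes, limits and the injected defect are all illustrative, and the local correlation structure is ignored for brevity) runs a pixel-wise EWMA over a stream of images and signals when any standardized pixel leaves its limit:

```python
import numpy as np

rng = np.random.default_rng(11)

h, w, lam = 32, 32, 0.1       # image size and EWMA smoothing constant
mu0, sd0 = 0.0, 1.0           # in-control pixel mean and standard deviation
limit = 4.5                   # alarm limit (placeholder, not calibrated)

Z = np.zeros((h, w))          # EWMA of pixel-wise deviations from target
for t in range(1, 501):
    img = rng.normal(mu0, sd0, (h, w))
    if t > 250:
        img[10:14, 10:14] += 1.0          # a local defect appears
    Z = lam * (img - mu0) + (1 - lam) * Z
    # exact EWMA standard deviation at time t
    sd_t = sd0 * np.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
    if np.abs(Z / sd_t).max() > limit:
        print("alarm at frame", t)
        break
```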

The presented results are based on a joint work with Yarema Okhrin and Ivan Semeniuk.

15/10/2019, 14:00 — 15:00 — Room P3.10, Mathematics Building
University of Lisbon, Portugal

A LASSO-type model for the bulk and tail of a heavy-tailed response

As is widely known, in an extreme value framework interest focuses on modelling the most extreme observations, disregarding the central part of the distribution; commonly, the effort centers on modelling the tail of the distribution by the generalized Pareto distribution, in a peaks-over-threshold framework. Yet, in most practical situations it would be desirable to model the bulk of the data along with the extreme values. In this talk, I will introduce a novel regression model for the bulk and the tail of a heavy-tailed response. Our regression model builds on the extended generalized Pareto distribution recently proposed by Naveau et al. (2016). The proposed model allows us to learn the effect of covariates on a heavy-tailed response via a LASSO-type specification conducted via a Lagrangian restriction. The performance of the proposed approach will be assessed through a simulation study, and the method will be applied to a real data set.
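
For reference, the simplest member of the extended generalized Pareto family of Naveau et al. (2016) can be written (in our notation) as

$$F(x) = \left\{ H_\xi(x/\sigma) \right\}^{\kappa}, \qquad H_\xi(z) = 1 - (1 + \xi z)^{-1/\xi}, \quad z > 0,$$

where $H_\xi$ is the standard generalized Pareto distribution function, $\kappa > 0$ gives flexibility in the bulk and lower tail, and the upper tail remains generalized Pareto with shape $\xi$; taking $\kappa = 1$ recovers the generalized Pareto distribution itself.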

26/09/2019, 14:00 — 15:00 — Room P3.10, Mathematics Building
Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

First Come, First Served Queues with Two Classes of Impatient Customers

We study systems with two classes of impatient customers who differ across the classes in their distribution of service times and patience times. The customers are served on a first-come, first-served basis (FCFS), regardless of their class. Such systems are common in customer call centers, which often segment their arrivals into classes of callers whose requests may differ greatly in their complexity and criticality. We first consider an $M/G/1 + M$ queue and then analyze the $M/M/k + M$ case. Analyzing these systems using a queue length process proves intractable as it would require us to keep track of the class of each customer at each position in queue. Consequently, we introduce a virtual waiting time process where the service times of customers who will eventually abandon the system are not considered. We analyze this process to obtain performance characteristics such as the percentage of customers who receive service in each class, the expected waiting times of customers in each class, and the average number of customers waiting in queue. We use our characterization of the system to perform a numerical analysis of the $M/M/k + M$ system, and find several managerial implications of administering a FCFS system with multiple classes of impatient customers. Finally, we compare the performance of a system based on data from a call center with the steady-state performance measures of a comparable $M/M/k + M$ system. We find that the performance measures of the $M/M/k + M$ system serve as good approximations of the system based on real data.
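
The virtual waiting time process described above is straightforward to simulate in the single-server case; below is a minimal sketch (Python, illustrative rates of our own) in which the work of customers who abandon never enters the process:

```python
import numpy as np

rng = np.random.default_rng(5)

# Two classes with different service and patience distributions (illustrative rates).
p = np.array([0.6, 0.4])       # class mix of arrivals
mu = np.array([1.0, 0.5])      # service rates by class
theta = np.array([0.5, 2.0])   # abandonment (patience) rates by class
lam = 1.2                      # total Poisson arrival rate

V = 0.0                                             # virtual waiting time
arrived, served = np.zeros(2), np.zeros(2)
waits = ([], [])
for _ in range(200_000):
    V = max(0.0, V - rng.exponential(1 / lam))      # offered work drains between arrivals
    k = rng.choice(2, p=p)
    arrived[k] += 1
    if rng.exponential(1 / theta[k]) >= V:          # patient enough to reach the server
        served[k] += 1
        waits[k].append(V)
        V += rng.exponential(1 / mu[k])             # only served customers add work
print("fraction served by class:", served / arrived)
print("mean wait of served customers:", [float(np.mean(w)) for w in waits])
```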

Joint work with Ivo Adan (Eindhoven University of Technology, the Netherlands) and Brett Hathaway (Kenan-Flagler School of Business, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA).

20/05/2019, 11:00 — 12:00 — Room P3.10, Mathematics Building
Alexandra Moura, ISEG and CEMAPRE

Optimal reinsurance of dependent risks

The talk will focus on the optimal reinsurance problem for two dependent risks, from the point of view of the ceding insurance company. We aim at maximizing the expected utility or the adjustment coefficient of the insurer's wealth. The insurer buys reinsurance on each risk separately. By risk we mean a line of business, a portfolio of policies or a policy. We assume a generic known dependence structure, so that the optimal solution depends on the joint distribution. Due to the dependencies, the optimal level of reinsurance for each risk involves a trade-off between the reinsurance premia of both risks. We study the shape of this trade-off and characterize the optimal treaties. We show that an optimal solution exists and provide an optimality condition. Unfortunately, explicit optimal treaties are not easy to compute from this condition. We discuss some strategies to obtain numerical approximations for the optimal treaties and discuss some aspects of the structure of the optimal strategy. Numerical results are presented assuming that the two risks are dependent by means of a copula structure and that the reinsurance treaty consists of a combination of quota-share and stop-loss. The sensitivity of the optimal reinsurance strategy to several factors is analyzed numerically, including the dependence structure, through the copula chosen, and the dependence strength, by means of the dependence parameter, corresponding to different values of Kendall's tau. A variety of reinsurance premium calculation principles is also considered.
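
As a rough numerical illustration of the setting (not of the optimization itself), the sketch below (Python) samples two dependent risks from a Gaussian copula, one of several possible copula choices, applies a quota-share plus stop-loss treaty with arbitrary parameters, and estimates the insurer's expected exponential utility:

```python
import numpy as np
from scipy.stats import norm, lognorm

rng = np.random.default_rng(9)

# Dependent risks via a Gaussian copula (Kendall's tau = (2/pi) * arcsin(rho)).
n, rho = 100_000, 0.6
Zc = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
U = norm.cdf(Zc)
X = np.column_stack([lognorm.ppf(U[:, 0], s=0.8),
                     lognorm.ppf(U[:, 1], s=1.0)])   # two dependent losses

def retained(X, a, d):
    """Quota-share a_i followed by stop-loss with retention d_i on each risk:
    the insurer keeps min(a_i * X_i, d_i)."""
    return np.minimum(a * X, d).sum(axis=1)

w0, beta, premium = 10.0, 0.5, 1.5       # initial wealth, risk aversion, net premium cost
R = retained(X, a=np.array([0.7, 0.5]), d=np.array([3.0, 4.0]))
eu = -np.exp(-beta * (w0 - premium - R)).mean()
print(f"estimated expected exponential utility: {eu:.4f}")
```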

06/05/2019, 11:00 — 12:00 — Room P8, Mathematics Building, IST
Manuel Cabral Morais, Department of Mathematics & CEMAT, Instituto Superior Técnico - Universidade de Lisboa

Improving the ARL profile of the Poisson EWMA chart

The Poisson exponentially weighted moving average (PEWMA) chart was proposed by Borror et al. (1998) to monitor the mean of counts of nonconformities. This chart regrettably fails to have an in-control average run length (ARL) larger than any out-of-control ARL, i.e., the PEWMA chart is ARL-biased. Moreover, due to the discrete character of its control statistic, it is difficult to set the control limits of the PEWMA chart in such a way that the in-control ARL takes a desired value, say ARL0. In this paper, we propose an ARL-unbiased counterpart of the PEWMA chart and use the R statistical software to provide gripping illustrations of this chart, with a decidedly improved ARL profile and an in-control ARL equal to ARL0. We also compare the ARL performance of the proposed chart with that of a few competing control charts for the mean of i.i.d. Poisson counts.
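
For context, the classical PEWMA recursion with symmetric asymptotic limits, which is exactly the design that turns out to be ARL-biased, looks as follows (Python sketch with illustrative parameters; the ARL-unbiased chart of the talk replaces the symmetric limits):

```python
import numpy as np

rng = np.random.default_rng(2)

mu0, lam, L = 4.0, 0.2, 3.0                # in-control mean, smoothing, limit width
sd_inf = np.sqrt(lam / (2 - lam) * mu0)    # asymptotic sd (Poisson variance = mu0)
LCL, UCL = mu0 - L * sd_inf, mu0 + L * sd_inf

Z = mu0
for t in range(1, 1001):
    x = rng.poisson(mu0 if t <= 500 else 1.25 * mu0)   # upward shift at t = 501
    Z = lam * x + (1 - lam) * Z                        # PEWMA recursion
    if not (LCL < Z < UCL):
        print("alarm at observation", t)
        break
```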

Joint work with Sven Knoth (Department of Mathematics and Statistics — Faculty of Economics and Social Sciences — Helmut Schmidt University, Hamburg, Germany)

24/04/2019, 13:00 — 14:00 — Room P4.35, Mathematics Building
Clément Dombry, Université de Franche-Comté, Besançon, France

The coupling method in extreme value theory

One of the main goals of extreme value theory is to infer probabilities of extreme events for which only limited observations are available, and which require extrapolation of the tail of the distribution of the observations. One major result is the Balkema-de Haan-Pickands theorem, which provides an approximation of the distribution of exceedances above a high threshold by a generalized Pareto distribution. We revisit these results with coupling arguments and provide quantitative estimates for the Wasserstein distance between the empirical distribution of exceedances and the limit Pareto model. In the second part of the talk, we extend the results to the analysis of a proportional tail model for quantile regression, closely related to the heteroscedastic extremes framework developed by Einmahl et al. (JRSSB 2016). We introduce coupling arguments relying on total variation and Wasserstein distances for the analysis of the asymptotic behavior of estimators of the extreme value index and the integrated skedasis function.
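
The approximation in the Balkema-de Haan-Pickands theorem is simple to exercise numerically; here is a small sketch (Python with SciPy, toy heavy-tailed data) fitting a generalized Pareto distribution to exceedances over a high threshold and extrapolating a tail probability:

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(4)

x = rng.standard_t(df=4, size=20_000)    # heavy-tailed sample
u = np.quantile(x, 0.95)                 # high threshold
exc = x[x > u] - u                       # exceedances above u

# Balkema-de Haan-Pickands: exceedances are approximately generalized Pareto.
xi, _, scale = genpareto.fit(exc, floc=0)
tail = 0.05 * genpareto.sf(3.0, xi, scale=scale)   # estimate of P(X > u + 3)
print(f"xi = {xi:.3f}, scale = {scale:.3f}, P(X > u + 3) ~ {tail:.2e}")
```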

Joint work with B. Bobbia and D. Varron (Université de Franche-Comté).

22/04/2019, 11:00 — 12:00 — Room P8, Mathematics Building, IST
Soraia Pereira, Faculdade de Ciências da Universidade de Lisboa and CEAUL

Geostatistical analysis of sardine eggs data — a Bayesian approach

Understanding the distribution of animals over space, as well as how that distribution is influenced by environmental covariates, is a fundamental requirement for the effective management of animal populations. This is especially the case for populations which are harvested. The sardine is one of the most important fisheries species, for its economic, sociological, anthropological and cultural value.

Here we intend to understand the spatial distribution of the average number of sardine eggs per $m^3$. Our main objectives are to identify the environmental variables that best explain the spatial variation in sardine egg density and to make predictions at spatial locations that were not observed.

The data structure presents an excess of zeros and extreme values. To deal with this, we propose a point-referenced zero-inflated model for the probability of presence together with the positive sardine egg density, and a point-referenced generalized Pareto model for the extremes. Finally, we combine the results of these two models to obtain spatial predictions of the variable of interest. We follow a Bayesian approach, and inference is carried out using the R-INLA package in the R software.
