# Probability and Statistics Seminar

## Past sessions

### Mathematics in consultancy

Major telecommunications companies currently hold complex systems, with varying interdependencies and still remarkable reliability. The constant development and changes in their core, along with the different types of users involved in the use of these systems, make them extremely interesting. The maintenance, the service assurance and the positive evolution of the network depend on the deep understanding and control of these complex systems. This knowledge, which refers to different interactions between internal and exogenous systems, makes it possible, for instance, to control and increase the speed of problem solving, to correctly approach trouble tickets and to detect the root of the problems.

To address these problems, the deconstruction of an error safeguarding and fuzzy logic has been implemented, emphasizing the control, the precision and the planning of the systems' behaviour. Statistics and Quality Control brought another perspective from which to approach these issues. The first results, obtained from field operations and with a direct influence on the user's experience, are the focus of this presentation.

### Estimating Adaptive Market Efficiency Using the Kalman Filter

This paper addresses the adaptive market hypothesis (AMH), which suggests that market efficiency is not a stable property, but rather that it evolves with time. The test of evolving efficiency (TEE) investigates the efficiency of a particular market by using a multi-factor model with time-varying coefficients and GARCH errors. The model is a variant of the stochastic GARCH in Mean (GARCH-M) proposed in 1990, which tests for market efficiency in an absolute sense, i.e. by assuming that market efficiency is unchanged over time. To resolve this problem, the TEE extends all previous tests and provides a mechanism for observing the market learning process by estimating the changes in market efficiency over time. Both stochastic GARCH-M and TEE models are estimated using Kalman filtering techniques. The contribution of this paper is two-fold:

1. we explain in detail the quasi-maximum likelihood estimation (QMLE) procedure based on the standard Kalman filter applied to the stochastic GARCH-M and TEE models;
2. we estimate the changes in the level of market efficiency in three markets over a period that includes the financial markets crisis of 2007/2008.

The three markets are specifically chosen to reflect a developed (London LSE), mature emerging (Johannesburg JSE) and immature emerging market (Nairobi NSE) perspective. Our empirical study suggests that, in spite of the financial crisis, all three markets maintained their pre-crisis level of weak-form efficiency.
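The Kalman-filter likelihood evaluation that underlies QMLE can be illustrated on a much simpler model than the ones in the paper. The sketch below assumes a single regressor, a random-walk coefficient and constant noise variances (no GARCH effects); the function name and all parameters are hypothetical choices made for illustration only.

```python
import math

def kalman_tv_coefficient(y, x, obs_var, state_var, beta0=0.0, p0=1.0):
    """Filter y_t = beta_t * x_t + e_t with beta_t = beta_{t-1} + v_t.

    Simplifying assumption: Var(e_t) = obs_var and Var(v_t) = state_var
    are constant, unlike the GARCH errors of the models in the abstract.
    Returns the filtered coefficient path and the Gaussian log-likelihood.
    """
    beta, p = beta0, p0
    loglik = 0.0
    path = []
    for yt, xt in zip(y, x):
        p_pred = p + state_var             # prediction step (random walk)
        innov = yt - beta * xt             # one-step-ahead prediction error
        f = xt * xt * p_pred + obs_var     # variance of the prediction error
        gain = p_pred * xt / f             # Kalman gain
        beta = beta + gain * innov         # update step
        p = (1.0 - gain * xt) * p_pred
        # prediction error decomposition of the log-likelihood
        loglik += -0.5 * (math.log(2.0 * math.pi * f) + innov * innov / f)
        path.append(beta)
    return path, loglik
```

In a quasi-maximum likelihood setting, the returned log-likelihood would be maximized numerically over the unknown variances; that optimization step is not shown here.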

### Stochastic modeling in Communications Networks

Research in stochastic modeling is strongly influenced by applications in diverse domains. Communication networks constitute a lively field that motivates the study of new stochastic models and sometimes the development of new methods of analysis to understand the overall functioning of these complex systems. This talk presents some stochastic models found in communications networks from the perspective of the speaker's work over the last decade.

### Traffic Estimation of a M/G/1 Queue Using Probes

The huge growth of the Internet, together with the appearance of new multimedia applications with high traffic demands, gives an important role to the monitoring of Internet traffic for quality of service assessment.

In this context, Internet probing has been a subject of great interest for researchers, since it makes it possible to measure Internet performance by sending controlled probe packets into the network; the observed performance of the probes can then be used to estimate the characteristics of the original traffic.

In this work we consider the estimation of the arrival rate and the service time moments of an Internet router modelled as an M/G/1 queue with probing. The probe inter-arrival times are i.i.d. and probe service times follow a general positive distribution. The only observations used are the arrival times, service times and departure times of probes. We derive the main equations from which the quantities of interest can be estimated. Two particular probe arrival processes, deterministic and Poisson, are investigated.
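To make the observation scheme concrete, the following sketch simulates a single-server FIFO queue fed by Poisson traffic plus deterministic probes, and records only what is observed for each probe (arrival time, service time, departure time). It does not implement the estimating equations of the talk; the function name, the parameters and the samplers passed in are assumptions made for illustration.

```python
import random

def simulate_probed_queue(lam, probe_gap, n_probes, service, probe_service, seed=0):
    """Simulate a FIFO single-server queue with Poisson traffic and probes.

    lam           -- arrival rate of the original (unobserved) traffic
    probe_gap     -- deterministic spacing between probe arrivals
    service       -- function rng -> service time of an original packet
    probe_service -- function rng -> service time of a probe
    Returns one (arrival, service, departure) tuple per probe.
    """
    rng = random.Random(seed)
    horizon = probe_gap * n_probes
    events = []                       # (arrival time, is_probe)
    t = rng.expovariate(lam)
    while t < horizon:                # Poisson arrivals of original traffic
        events.append((t, False))
        t += rng.expovariate(lam)
    events += [(probe_gap * (k + 1), True) for k in range(n_probes)]
    events.sort()

    last_departure = 0.0
    observations = []
    for arrival, is_probe in events:
        s = probe_service(rng) if is_probe else service(rng)
        # FIFO: service starts when the packet arrives or the server frees up
        departure = max(arrival, last_departure) + s
        last_departure = departure
        if is_probe:
            observations.append((arrival, s, departure))
    return observations
```

For instance, `simulate_probed_queue(1.0, 5.0, 100, lambda r: r.expovariate(2.0), lambda r: 0.1)` would generate 100 probe observations under exponential service times for the original traffic, a choice made here only for convenience.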

Joint work with Nélson Antunes (FCT/DMAT of Algarve University and CEMAT) and António Pacheco (Instituto Superior Técnico and CEMAT).

### The Prior Uncertainty and Correlation of Statistical Economic Data

Empirical estimates of source statistical economic data such as transaction flows, greenhouse gas emissions or employment are always subject to measurement errors but empirical estimates of source data errors are often missing. This paper uses concepts from Bayesian inference and the Maximum Entropy Principle to estimate the prior probability distribution, uncertainty and correlations of source data when such information is not explicitly provided. In the absence of additional information, an isolated datum is described by a truncated Gaussian distribution, and if an uncertainty estimate is missing, its prior equals the best guess. When the sum of a set of disaggregate data is constrained to match an aggregate datum, it is possible to determine the prior correlations among disaggregate data. If aggregate uncertainty is missing, all prior correlations are positive. If aggregate uncertainty is available, prior correlations can be either all positive, all negative, or a mix of both. An empirical example is presented, which reports uncertainty and correlation priors for the County Business Patterns database.

### High quantile estimation and spatial aggregation applied to precipitation extremes

We shall address the problem of high quantile estimation in univariate and spatial Extreme Value Theory. Univariate methods are well known under the Maximum Domain of Attraction Condition and the Pareto tail approximation is the basis for many estimators. It turns out that the Pareto tail approximation is also valid under spatial aggregation but a spatial effect comes out. We shall address the problem both theoretically and in practice, by presenting a case study on 100-year return value estimation for precipitation data collected at rain-gauge stations.
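As a reminder of how the Pareto tail approximation yields a high quantile estimate in the univariate case, here is a minimal sketch using the Hill estimator of the tail index and the Weissman-type extrapolation. It assumes a positive extreme value index (a heavy tail) and i.i.d. data; it is not the spatial method of the talk, and the helper `pareto_sample` is a hypothetical data generator used only to try the estimators.

```python
import math
import random

def hill_estimator(sample, k):
    """Hill estimate of the extreme value index from the k largest values."""
    xs = sorted(sample)
    threshold = xs[-k - 1]                       # (k+1)-th largest observation
    return sum(math.log(x / threshold) for x in xs[-k:]) / k

def weissman_quantile(sample, k, p):
    """Estimate the level exceeded with (small) probability p."""
    xs = sorted(sample)
    n = len(xs)
    gamma = hill_estimator(sample, k)
    # extrapolate beyond the threshold along the fitted Pareto tail
    return xs[-k - 1] * (k / (n * p)) ** gamma

def pareto_sample(alpha, size, seed=0):
    """Hypothetical test data: exact Pareto(alpha) values on [1, inf)."""
    rng = random.Random(seed)
    return [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(size)]
```

The choice of `k` (how many upper order statistics to use) drives the bias-variance trade-off and is left to the user in this sketch.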

### Level Crossing Ordering of Stochastic Processes

Stochastic Ordering is an important area of Applied Probability tailored for qualitative comparisons of random variables, random vectors, and stochastic processes. In particular, it may be used to investigate the impact of parameter changes in important performance measures of stochastic systems, avoiding exact computation of those performance measures. In this respect, the great diversity of performance measures used in applied sciences to characterize stochastic systems has inspired the proposal of many types of stochastic orderings.

In this talk we address the level crossing ordering, proposed by A. Irle and J. Gani in 2001, that compares stochastic processes in terms of the times they take to reach high levels (states). After introducing some motivation for the use of the level crossing ordering, we present tailored sufficient conditions for the level crossing ordering of (univariate and multivariate) Markov and semi-Markov processes. These conditions are applied to the comparison of birth-and-death processes with catastrophes, queueing networks, and particle systems.

Our analysis highlights the benefits of properly using the sample path approach, which directly compares trajectories of the processes being compared, defined on a common probability space. This approach provides, as a by-product, the basis for the construction of algorithms for the simulation of stochastic processes ordered in the level crossing ordering sense. In the case of continuous-time Markov chains, we resort additionally to the powerful uniformization technique, which uniformizes the rates at which transitions take place in the processes being compared.

Joint work with Fátima Ferreira (CM-UTAD and Universidade de Trás-os-Montes e Alto Douro).

### A low-rank tensor method for structured large-scale Markov Chains

A number of practical applications lead to Markov Chains with extremely large state spaces. Such an instance arises from the class of Queuing Networks, which lead to a number of applications of interest including, for instance, the analysis of the well-known tandem networks. The state space of a Markov process describing these interactions typically grows exponentially with the number of queues. More generally, Stochastic Automata Networks (SANs) are networks of interacting stochastic automata. The dimension of the resulting state space grows exponentially with the number of involved automata. Several techniques have been established to arrive at a formulation such that the transition matrix has Kronecker product structure. This allows, for example, for efficient matrix-vector multiplications. However, the number of possible automata is still severely limited by the need to represent a single vector (e.g., the stationary vector) explicitly. We propose the use of low-rank tensor techniques to avoid this barrier. More specifically, an algorithm will be presented that makes it possible to approximate the solution of certain SANs very efficiently in a low-rank tensor format.
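The remark on efficient matrix-vector multiplication can be illustrated for two factors: with row-major vectorization, $(A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(A X B^{T})$, so the large Kronecker matrix never has to be formed. The sketch below uses plain Python lists and hypothetical helper names; it shows only this identity, not the low-rank tensor algorithm of the talk.

```python
def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def kron(A, B):
    """Explicit Kronecker product (only used here to check the shortcut)."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def kron_matvec(A, B, x):
    """Compute (A kron B) x without forming the Kronecker product.

    x is read as a matrix X with one row per column of A and one
    column per column of B (row-major); the result is vec(A X B^T).
    """
    q = len(B[0])
    X = [x[j * q:(j + 1) * q] for j in range(len(A[0]))]
    Bt = [list(col) for col in zip(*B)]
    Y = matmul(matmul(A, X), Bt)
    return [v for row in Y for v in row]
```

The vector `x` is still stored in full here, which is exactly the limitation that the abstract says low-rank tensor formats are meant to remove.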

### Strategies to reduce the probability of a misleading signal

Standard practice in statistical process control is to run two individual charts, one for the process mean and another one for the process variance. The resulting scheme is known as a simultaneous scheme and it provides a way to satisfy Shewhart's dictum that proper process control implies monitoring both location and dispersion.

When we use a simultaneous scheme, the quality characteristic is deemed to be out-of-control whenever a signal is triggered by either individual chart. As a consequence, the misidentification of the parameter that has changed can occur, meaning that a shift in the process mean can be misinterpreted as a shift in the process variance and vice-versa. These two events are known as misleading signals (MS) and can occur quite frequently.

We discuss (necessary and) sufficient conditions to achieve values of the probability of a misleading signal (PMS) smaller than or equal to $0.5$, explore, for instance, alternative simultaneous Shewhart-type schemes, and check whether they lead to PMS values smaller than those of the popular $(\bar{X}, S^2)$ simultaneous scheme.

### Price Modelling in Carbon Emission and Electricity Markets

We present a model to explain the joint dynamics of the prices of electricity and carbon emission allowance certificates as a function of exogenously given fuel prices and power demand. The model for the electricity price consists of an explicit construction of the electricity supply curve; the model for the allowance price takes the form of a coupled forward-backward stochastic differential equation (FBSDE) with random coefficients. Reflecting typical properties of emissions trading schemes, the terminal condition of this FBSDE exhibits a gradient singularity. Appealing to compactness arguments, we prove the existence of a unique solution to this equation. We illustrate the relevance of the model with the example of pricing clean spread options, contracts that are frequently used to value power plants in the spirit of real option theory.

### Incorporating parameter uncertainty into the setup of EWMA control charts monitoring normal variance

Most of the literature concerned with the design of control charts relies on perfect knowledge of the distribution for at least the good (so-called in-control) process. Some papers have treated EWMA charts monitoring the normal mean in the case of unknown parameters; refer to Jones, Champ and Rigdon (2001) for a good introduction. Jensen, Jones-Farmer, Champ, and Woodall (2006), “Effects of Parameter Estimation on Control Chart Properties: A Literature Review”, give a nice overview. They also mention that it would be interesting and useful to evaluate and take into account these effects for variance control charts as well. Here, we consider EWMA charts for monitoring the normal variance. Given a sequence of batches of size $n$, $\{X_{ij}\}$, $i=1,2,\ldots$ and $j=1,2,\ldots,n$, we utilize the following EWMA control chart:

\begin{align*}
Z_0 & = z_0 = \sigma_0^2 = 1 \,, \\
Z_i & = (1-\lambda) Z_{i-1} + \lambda S_i^2 \,,\; i = 1,2,\ldots \,,\\
& \qquad\qquad S_i^2 = \frac{1}{n-1} \sum_{j=1}^n (X_{ij} - \bar X_i)^2 \,,\; \bar X_i = \frac{1}{n} \sum_{j=1}^n X_{ij} \,, \\
L & = \inf \left\{ i \in \mathbb{N}: Z_i > c_u \sigma_0^2 \right\} \,.
\end{align*}

The parameters $\lambda \in (0,1]$ and $c_u > 0$ are chosen to enable a certain useful detection performance (not too many false alarms and quick detection of changes). The most popular performance measure is the so-called Average Run Length (ARL), that is $E_{\sigma}(L)$ for the true standard deviation $\sigma$. If $\sigma_0$ has to be estimated by sampling data during a pre-run phase, then this uncertain parameter affects, of course, the behavior of the applied control chart. Typically the ARL is increased. Most of the papers about characterizing the uncertainty impact deal with the changed ARL patterns and possible adjustments. Here, a different way of designing the chart is treated: set up the chart by specifying a certain false alarm probability such as $P_{\sigma_0}(L\le 1000) \le \alpha$. This results in a specific $c_u$. Here we describe a feasible way to determine this value $c_u$ also in the case of unknown parameters for a pre-run series of given size (and structure). A two-sided version of the introduced EWMA scheme is analyzed as well.
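The one-sided chart defined above can be simulated directly, which gives a rough Monte Carlo check of a false alarm probability such as $P_{\sigma_0}(L \le 1000)$. The sketch below assumes normal data with known $\sigma_0$; the function names, the seed and any parameter values passed to them are illustrative assumptions, not the numerical method of the talk.

```python
import random
import statistics

def ewma_s2_run_length(lam, c_u, n, sigma=1.0, sigma0=1.0, max_steps=1000, rng=random):
    """Return the first i with Z_i > c_u * sigma0^2, or None if no alarm
    occurs within max_steps batches."""
    z = sigma0 ** 2                                   # Z_0 = sigma_0^2
    for i in range(1, max_steps + 1):
        batch = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2 = statistics.variance(batch)               # divisor n - 1
        z = (1.0 - lam) * z + lam * s2                # EWMA recursion
        if z > c_u * sigma0 ** 2:
            return i
    return None

def false_alarm_probability(lam, c_u, n, horizon=1000, reps=200, seed=1):
    """Monte Carlo estimate of P(L <= horizon) in the in-control case."""
    rng = random.Random(seed)
    alarms = sum(
        ewma_s2_run_length(lam, c_u, n, max_steps=horizon, rng=rng) is not None
        for _ in range(reps)
    )
    return alarms / reps
```

Searching over `c_u` until the estimate falls below a target $\alpha$ would mimic the design criterion for known parameters; the estimated-parameter case addressed in the talk would additionally require simulating the pre-run phase, which this sketch does not do.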

### Reaching the best possible rate of convergence to equilibrium of Boltzmann-equation solutions

This talk concerns a definitive answer to the problem of quantifying the relaxation to equilibrium of the solutions to the spatially homogeneous Boltzmann equation for Maxwellian molecules. Under really mild conditions on the initial datum, close to necessity, and a weak, physically consistent, angular cutoff hypothesis, the main result states that the total variation distance (i.e. the $L^{1}$-norm in the absolutely continuous case) between the solution and the limiting Maxwellian distribution admits an upper bound of the form $C\exp\left(-\Lambda_{b}^{*}t\right)$, $\Lambda_{b}^{*}$ being the spectral gap of the linearized collision operator and $C$ a constant depending only on the initial datum. Hilbert hinted at the validity of this quantification in 1912, and McKean explicitly formulated it as a conjecture in 1966. The main line of the new proof is based on an analogy between the problem of convergence to equilibrium and the central limit theorem of probability theory, as suggested by McKean.

### Robust Procedures for Nonlinear Models for Full and Incomplete Data

Linear models are among the most popular models in Statistics. However, in many situations the nature of the phenomenon is intrinsically nonlinear, so linear approximations are not valid and the data must be fitted using a nonlinear model. Moreover, on some occasions the responses are incomplete, with some of them missing at random.

It is well known that, in this setting, the classical estimator of the regression parameter based on least squares is very sensitive to outliers. A family of general M-estimators is proposed to estimate the regression parameter in a nonlinear model. We give a unified approach to treat full data or data with missing responses. Under mild conditions, the proposed estimators are Fisher-consistent, consistent and asymptotically normal. To study local robustness, their influence function is also derived.

A family of robust tests based on a Wald-type statistic is introduced in order to check hypotheses that involve the regression parameter. Monte Carlo simulations illustrate the finite sample behaviour of the proposed procedures in different settings in contaminated and uncontaminated samples.

### An INteger AutoRegressive afternoon - Statistical analysis of discrete valued time series

Part I: Univariate and multivariate models based on thinning

Part II: Modelling and forecasting time series of counts

Time series of counts arise when the interest lies in the number of certain events occurring during a specified time interval. Many of these data sets are characterized by low counts, asymmetric distributions, excess zeros, overdispersion, etc., ruling out normal approximations. Thus, during the last decades there has been considerable interest in models for integer-valued time series and a large volume of work is now available in specialized monographs. Among the most successful models for integer-valued time series are the INteger-valued AutoRegressive Moving Average (INARMA) models based on the thinning operation. These models are attractive since they are linear-like models for discrete time series which exhibit recognizable correlation structures. Furthermore, in many situations the collected time series are multivariate in the sense that there are counts of several events observed over time and the counts at each time point are correlated. The first talk introduces univariate and multivariate models for time series of counts based on the thinning operator and discusses their statistical and probabilistic properties. The second talk addresses estimation and diagnostic issues and illustrates the inference procedures with simulated and observed data.
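The simplest member of this family is the first-order model $X_t = \alpha \circ X_{t-1} + \varepsilon_t$, where $\alpha \circ X$ denotes binomial thinning (each of the $X$ counts survives independently with probability $\alpha$). The sketch below assumes Poisson innovations with mean $\lambda$, in which case the stationary mean is $\lambda/(1-\alpha)$; the function names and the seed are hypothetical, and the talks cover far more general models.

```python
import math
import random

def thin(x, alpha, rng):
    """Binomial thinning: keep each of the x counts with probability alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)

def poisson(lam, rng):
    """Poisson variate by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_inar1(alpha, lam, length, x0=0, seed=0):
    """Simulate X_t = alpha o X_{t-1} + eps_t with Poisson(lam) innovations."""
    rng = random.Random(seed)
    path, x = [], x0
    for _ in range(length):
        x = thin(x, alpha, rng) + poisson(lam, rng)   # survivors + new arrivals
        path.append(x)
    return path
```

A call such as `simulate_inar1(0.5, 2.0, 200)` produces a low-count series whose sample autocorrelation at lag one should be close to $\alpha$, mirroring the AR(1)-like correlation structure mentioned above.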

### Mathematical Finance in South Africa

I have been involved in Math Finance university education in South Africa since 1996. During this time I have produced numerous graduates & grown an extensive network of industry & academic partners. I'll talk about these experiences & take questions.

### Aggregational Gaussianity Using Sobol Sequencing In the South African Equity Markets: Implications for the Pricing of Risk

Stylized facts of asset returns in the South African market have received extensive attention, with multiple studies published on non-normality of returns, heavy-tailed distributions, gain-loss asymmetry and, particularly, volatility clustering. The one stylized fact that has received only cursory attention worldwide is Aggregational Gaussianity: the widely accepted observation that empirical asset returns tend to normality as the period over which the return is computed increases. The aggregational aspect arises from the $n$-day log-return being the simple sum of $n$ one-day log-returns. This fact is usually established using Q-Q plots over longer and longer intervals, and can be qualitatively confirmed. However, this methodology inevitably uses overlapping data series, especially for longer period returns. When an alternative resampling methodology for dealing with common time-overlapping returns data is used, a different picture emerges. Here we describe evidence from the South African market for a discernible absence of Aggregational Gaussianity and briefly discuss the implications of these findings for the quantification of risk and for the pricing and hedging of derivative securities.
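The difference between overlapping and non-overlapping aggregation can be made explicit with a few lines of code. The sketch below builds $n$-day log-returns as sums of one-day log-returns in both ways and computes the sample excess kurtosis as one simple measure of departure from normality. It does not reproduce the resampling methodology of the talk; the function names are hypothetical.

```python
def aggregate_returns(daily, n, overlapping):
    """n-day log-returns as sums of n consecutive one-day log-returns.

    overlapping=True slides the window by one day, so consecutive values
    share n - 1 observations; overlapping=False uses disjoint windows.
    """
    step = 1 if overlapping else n
    return [sum(daily[i:i + n]) for i in range(0, len(daily) - n + 1, step)]

def excess_kurtosis(xs):
    """Sample excess kurtosis (zero for a normal distribution)."""
    m = sum(xs) / len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m4 = sum((x - m) ** 4 for x in xs) / len(xs)
    return m4 / m2 ** 2 - 3.0
```

Disjoint windows leave far fewer observations at long horizons, which is the practical reason overlapping series are commonly used despite the dependence they introduce.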

### Real Time Statistical Process Control of the Quantity of Product in Prepackages

In this presentation we will describe how we developed a methodology for the statistical quantity control processes of prepackagers and present a number of different case studies based on the type of product, packaging, production, filling line and system of data acquisition. With the aim of establishing a global strategy to control the quantity of product in prepackages, an integrated planning model based on statistical tools was developed. This model is able to manage the production functions concerning the legal metrological requirements. These requirements are similar all around the world because they are based on the recommendation R-87: 2004 (E) from the International Organization of Legal Metrology (OIML). Based on the principles of Statistical Process Control, a methodology to analyze in real time the quantity of product in prepackages was proposed; routine inspections, condition monitoring of the main components and user-friendly presentation of the outputs were taken into account. Subsequently, software was introduced for data acquisition, for registration to guarantee traceability, and for data treatment to support decisions; it can be configured for any kind of filling process. The impact on industry of this system, named ACCEPT (Computer Based Help for the Statistic Control of the Filling Processes), is demonstrated by the large number of companies that are using it to control their processes. In Portugal, more than 50 companies and thousands of operators with very low qualifications are working every day with SPC tools and capability analysis in order to minimize variability and waste (for example, overfilling), to ensure compliance and to guarantee consumers' rights.
