Room P3.10, Mathematics Building

Miguel Almeida, Priberam Labs
Statistical Learning for Natural Language Processing

The field of Natural Language Processing (NLP) deals with the automatic processing of large text corpora, such as newswire articles (from online newspaper websites), social media posts (e.g. from Facebook or Twitter) and user-created content (such as Wikipedia). It has grown considerably in both academia and industry, with large corporations such as Microsoft, Google, Facebook, Apple, Twitter and Amazon investing heavily in these technologies.

One of the most successful approaches to NLP is statistical learning (also known as machine learning), which uses the statistical properties of text corpora to infer new knowledge.

In this talk I will present several NLP problems and give a brief overview of how they can be solved with statistical learning. I will also discuss one of these problems, language detection, in more detail, to illustrate how basic properties of probability theory lie at the core of these techniques.
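As a rough illustration of how probability theory can drive language detection, here is a minimal sketch of one common approach: a character-bigram naive Bayes classifier. It is not necessarily the method presented in the talk. It applies Bayes' rule with a uniform prior over languages, treats each bigram as independent given the language, and uses add-one (Laplace) smoothing so that unseen bigrams do not get zero probability. The two training sentences in TRAINING and all function names (char_ngrams, train, log_prob, detect) are hypothetical and chosen only for illustration. A real detector would be trained on large corpora for each language.

```python
import math
from collections import Counter

# Hypothetical toy training text; a real detector would use large corpora per language.
TRAINING = {
    "en": "the cat sat on the mat and the dog ate the bone",
    "pt": "o gato sentou no tapete e o cão comeu o osso",
}

def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def train(corpora, n=2):
    """Count n-grams per language and collect the shared vocabulary."""
    counts = {lang: Counter(char_ngrams(text, n)) for lang, text in corpora.items()}
    vocab = set().union(*counts.values())
    return counts, vocab

def log_prob(text, lang, counts, vocab, n=2):
    """log P(text | lang), assuming n-grams are independent given the language
    (the naive Bayes assumption), with add-one (Laplace) smoothing."""
    c = counts[lang]
    total = sum(c.values())
    v = len(vocab) + 1  # +1 reserves probability mass for unseen n-grams
    return sum(math.log((c[g] + 1) / (total + v)) for g in char_ngrams(text, n))

def detect(text, counts, vocab, n=2):
    """Return argmax over languages of P(lang | text). By Bayes' rule, with a
    uniform prior P(lang) this equals argmax over languages of P(text | lang)."""
    return max(counts, key=lambda lang: log_prob(text, lang, counts, vocab, n))
```

Usage: after `counts, vocab = train(TRAINING)`, calling `detect("the dog", counts, vocab)` compares the smoothed log-likelihood of the input under each language model and picks the larger one. Working in log space turns the product of many small probabilities into a sum, which avoids numerical underflow on longer inputs.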