Feb 25, 2021 | 2:30 - 4:00pm

Language technology research seminar

Judit Ács (BME, SZTAKI) will give a lecture in the research seminar series of the MILAB Language Technology sub-project on Thursday, February 25, from 14:30 to 16:00. All interested parties are welcome at this link.

Evaluating multilingual language models

Contextualized language models such as BERT have changed the NLP landscape in the last few years. BERT and some of its contemporaries have massively multilingual versions with support for over 100 languages. Probing is a simple but popular method for evaluating the linguistic content of such models. We perform a systematic evaluation in dozens of languages across multiple probing tasks.
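
As a rough illustration of what probing means in practice, the sketch below fits a small classifier on frozen feature vectors and reports its accuracy on a linguistic label. It is not the setup used in the talk: the nearest-centroid classifier stands in for the small trained classifier that probes usually use, the two-dimensional vectors are invented stand-ins for model representations, and all function names are hypothetical.

```python
from collections import defaultdict


def train_centroid_probe(features, labels):
    """Fit a nearest-centroid probe: one mean vector per label.

    `features` are frozen representations; the language model itself
    is never updated, only this small classifier is fitted.
    """
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest to `vec`."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sqdist(centroids[label], vec))


def probe_accuracy(centroids, features, labels):
    """Fraction of examples for which the probe recovers the label."""
    correct = sum(predict(centroids, v) == y for v, y in zip(features, labels))
    return correct / len(labels)


# Toy data (invented): pretend these vectors encode grammatical number.
train_x = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_y = ["Sing", "Sing", "Plur", "Plur"]
test_x = [[0.7, 0.3], [0.3, 0.6]]
test_y = ["Sing", "Plur"]

probe = train_centroid_probe(train_x, train_y)
print(probe_accuracy(probe, test_x, test_y))
```

High probe accuracy is read as evidence that the representation encodes the property; the probe is kept deliberately simple so that the result reflects the representation rather than the classifier.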

The first part of this talk describes a large-scale morphological evaluation of multilingual BERT in 40 languages. Beyond the raw evaluation, we perturb the input in ways that remove parts of the information and analyze how BERT's linguistic behavior changes. We show that linguistic typology can be recovered to some degree through these methods.

The second part deals with the tokenization of these contextualized language models. All models use some kind of subword tokenization with a fixed subword vocabulary. Token-level uses of such models, such as named entity recognition, require a way of pooling the multiple subwords that correspond to a single token. We show that the choice of subword pooling method often makes a large difference and that there is no one-size-fits-all solution when it comes to subword pooling.
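
To make the pooling problem concrete, here is a minimal sketch of four common pooling choices (first subword, last subword, mean, element-wise max). It does not reproduce the methods compared in the talk; the function name `pool_subwords`, the `word_ids` alignment list, and the toy vectors are assumptions made for illustration.

```python
def pool_subwords(vectors, word_ids, method="first"):
    """Collapse subword vectors into one vector per token.

    vectors:  one vector (list of floats) per subword
    word_ids: for each subword, the index of the token it belongs to
    method:   "first", "last", "mean" or "max"
    """
    # Group subword vectors by the token they belong to.
    groups = {}
    for vec, wid in zip(vectors, word_ids):
        groups.setdefault(wid, []).append(vec)

    pooled = []
    for wid in sorted(groups):
        pieces = groups[wid]
        if method == "first":
            pooled.append(list(pieces[0]))
        elif method == "last":
            pooled.append(list(pieces[-1]))
        elif method == "mean":
            pooled.append([sum(col) / len(pieces) for col in zip(*pieces)])
        elif method == "max":
            pooled.append([max(col) for col in zip(*pieces)])
        else:
            raise ValueError(f"unknown pooling method: {method}")
    return pooled


# Toy example: a three-subword token followed by a one-subword token.
vectors = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
word_ids = [0, 0, 0, 1]

for method in ("first", "last", "mean", "max"):
    print(method, pool_subwords(vectors, word_ids, method))
```

Even on this toy input the four methods give different vectors for the multi-subword token, which is why the choice can matter for a downstream tagger.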

The third part of this talk focuses on the use of contextualized models for Hungarian. We compare four multilingual models against two Hungarian models, HuBERT and HILBERT, on three Hungarian tasks: morphological probing, POS tagging, and NER.

Research fields

Human Language Processing

Contact us

Hungary, H-1111 Budapest,
Kende u. 13-17.
+36 1 279 6000
milab@sztaki.hu

© 2020-2021 Artificial Intelligence National Laboratory, Budapest