Publications

(2026). Omnilingual SONAR: Cross-Lingual and Cross-Modal Sentence Embeddings Bridging Massively Multilingual Text and Speech.
(2026). Omnilingual MT: Machine Translation for 1,600 Languages.
(2025). RAED: Retrieval-Augmented Entity Description Generation for Emerging Entity Linking and Disambiguation. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing.
(2025). BOOKCOREF: Coreference Resolution at Book Scale. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
(2025). Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation. Findings of the Association for Computational Linguistics: NAACL 2025.
(2024). Minerva LLMs: The First Family of Large Language Models Trained from Scratch on Italian Data. Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024).
(2024). Beyond Correlation: Interpretable Evaluation of Machine Translation Metrics. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing.
(2024). Retrieve, Read and LinK: Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget. Findings of the Association for Computational Linguistics: ACL 2024.
(2024). Mitigating Data Scarcity in Semantic Parsing across Languages with the Multilingual Semantic Layer and its Dataset. Findings of the Association for Computational Linguistics: ACL 2024.
(2024). Dissecting Biases in Relation Extraction: A Cross-Dataset Analysis on People's Gender and Origin. Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP).