
Presentation abstract (round table: Lexicometry and multilingual corpora)

Abstract : Beyond automatic parallel text alignment, which is now well known in our scientific community, this panel session focuses on how statistical techniques can be extended to explore multilingual textual data. For parallel corpora, new tools and methodologies have emerged. Processing comparable corpora (i.e. corpora made up of similar texts that are not translations of one another) is also a significant challenge, and textual statistics developed for monolingual corpora can be adapted to this new type of data. Furthermore, some corpora are written in languages that raise new issues for textual statistics software: for example, handling character encoding, tokenising the corpus into meaningful word-like units, or defining clear and coherent linguistic annotation schemes. International standards have recently been published and others are in preparation. They provide effective guidelines for encoding corpora and linguistic resources. Because they address the genuine diversity of languages throughout the world, these standards support the comparability and reusability of textual data.
Document type :
Other publications
Contributor : Maria Zimina-Poirot
Submitted on : Wednesday, November 4, 2015 - 11:02:16 PM
Last modification on : Friday, March 4, 2022 - 3:31:33 AM


  • HAL Id : hal-01224679, version 1


Maria Zimina. Résumé de communication (table-ronde Lexicométrie et corpus multilingues). 2004, pp.1203-1206. ⟨hal-01224679⟩


