
LangAge and the FAIR principles

Data Management
Photo: Pixabay

 

Since day one, data sustainability has been a priority of the LangAge initiative, which made it natural to adopt the FAIR principles of Findability, Accessibility, Interoperability and Reusability. Accordingly, LangAge is listed in the CLARIN infrastructure's register of digital resources and will soon be registered with a DOI (“findable”).

LangAge has been made available to all via registration on LaBB-CAT, a browser-based tool for storing and searching annotations (“accessible”). Data is stored in a robust XML format and can additionally be exported in the equally widely used TextGrid format (“interoperable”). Due to the sensitive nature of LangAge data, access rights must be allocated in line with the applicable legal and ethical requirements. Thanks to LaBB-CAT's flexible user management, public access is restricted to interviews for which the participants gave written consent to online publication.
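The consent-based restriction described above can be pictured with a small sketch. This is not LaBB-CAT's actual implementation or API; the `Interview` class, its field names and the interview IDs are hypothetical and serve only to illustrate the rule that public access covers consented interviews alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interview:
    # Hypothetical record; field names are assumptions, not LaBB-CAT's schema.
    interview_id: str
    consent_online_publication: bool


def publicly_visible(interviews):
    """Return only the interviews whose participants consented to online publication."""
    return [i for i in interviews if i.consent_online_publication]


# Invented example data for illustration only.
corpus = [
    Interview("int_001", True),
    Interview("int_002", False),
    Interview("int_003", True),
]

public_ids = [i.interview_id for i in publicly_visible(corpus)]
print(public_ids)
```

In this sketch, interviews without written consent stay in the collection but are never returned to a public user.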

On LaBB-CAT, interviews and annotations are searchable, and results can be exported as *.csv files for analysis in, e.g., R or Excel. The resources are published under a Creative Commons License (“reusable”). To maintain full transparency and data reusability, an in-depth transcription guide explaining the methodological decisions has been published online.
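As a rough illustration of working with such an export, the sketch below counts search results per speaker using only the Python standard library. The column names ("Speaker", "Transcript", "MatchText") and the sample rows are assumptions made for this example; a real export may use different headers, so the column name is passed as a parameter.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of an exported results file. The column names and
# rows are invented for illustration and may not match a real export.
SAMPLE_EXPORT = """Speaker,Transcript,MatchText
A01,interview_a01,alors
A01,interview_a01,alors
B07,interview_b07,alors
"""


def matches_per_speaker(csv_file, speaker_column="Speaker"):
    """Count result rows per speaker in a CSV export read from a file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row[speaker_column] for row in reader)


# io.StringIO stands in for an opened export file.
counts = matches_per_speaker(io.StringIO(SAMPLE_EXPORT))
for speaker, n in counts.most_common():
    print(speaker, n)
```

With a real file, the same function could be called on a file opened with `open(path, newline="", encoding="utf-8")`, assuming the export is UTF-8 encoded.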

Thanks to the LaBB-CAT database and other exchange formats, LangAge is used intensively for teaching and research, including several Master's and PhD theses. Furthermore, the granularity of the metadata available to registered LaBB-CAT users makes the initiative easy to reuse beyond the field of language and aging. As the interdisciplinary body of work written around LangAge shows, the project is also openly accessible to non-linguists.

Thus, in addition to LangAge’s local LaBB-CAT database, hosted on a virtual machine at the University of Potsdam, a sample of the 2005 interviews will, for example, be published as part of the 2023 Oral-History.Digital project (OH.D) at the Freie Universität Berlin. This step will make the LangAge initiative readily available to researchers beyond linguistics, thanks to OH.D’s unified system of data storage, metadata and content-related keywords.

The next step in furthering the reusability of LangAge is the planned inclusion of the database in the French digital humanities platform Collections de Corpus Oraux Numériques (CoCoon, huma-num.fr), continuing LangAge's long-standing close cooperation with the ESLO team in Orléans.