PRG 1978

Expanding the scope of a multi-purpose lexicographic resource to grammar and L2 competence

Uue aja sõnastik: grammatika ja keelepädevuse kirjeldamine integreeritud multifunktsionaalses leksikograafilises ressursis

The aim of the project is to develop theoretical and methodological foundations for the representation of grammar and linguistic competence in an integrated multi-purpose lexicographic resource created at the Institute of the Estonian Language, and, more generally, to advance lexicographic theory and methodology. We will employ the idea of the lexicon as a linking device that connects fragments of different linguistic representations, such as syntax and semantics. Our long-term aim is to have a single data source that provides a comprehensive and empirically reliable description of the Estonian language for different user groups (native speakers, learners, researchers, developers of NLP applications, etc.) via an API and a customised search interface. In addition to lexicographic theory and methodology, the project’s results will contribute to the description of Estonian and to usage-based and construction-based linguistic theory.

The project develops the theoretical and methodological prerequisites for describing, in a single multifunctional lexicographic resource, the links between vocabulary and grammar as well as linguistic competence in a second or foreign language. It advances the latest trends in corpus and computational lexicography and fills a gap in the resources for learning and teaching Estonian, drawing on a usage-based and construction-based approach to linguistic theory, language acquisition theory and lexicography. The project builds on the long dictionary-making tradition of the Institute of the Estonian Language, and its results are applied in the Institute's lexicographic resource Ekilex, which offers different user groups a comprehensive description of Estonian through a search interface and an API. In addition to lexicographic theory and methodology, the project's results contribute to the description of Estonian and to usage-based and construction-based linguistic theory.

Team

Jelena Kallas, PhD Jelena.Kallas@eki.ee
Ene Vainik, PhD Ene.Vainik@eki.ee
Geda Paulsen, PhD Geda.Paulsen@eki.ee
Heete Sahkai, PhD Heete.Sahkai@eki.ee
Kristina Koppel, PhD Kristina.Koppel@eki.ee
Raili Pool, PhD Raili.Pool@eki.ee
Arvi Tavast, PhD Arvi.Tavast@eki.ee
Katrin Tsepelina Katrin.Tsepelina@eki.ee
Ahto Kiil ahto.kiil@gmail.com (since 2026)
Annely-Maria Liivas annely.liivas@gmail.com (since 2026)

PhD students
Kertu Saul Kertu.Saul@eki.ee
Kelly Lilles Kelly.Lilles@eki.ee
Natalia Vaiss natalia.vaiss@gmail.com
Mai Raet Mai.Raet@tlu.ee
Liina Lutsepp llutsepp@tlu.ee (since 2026)
Pilvi Alp (until 2024)

Partners
Maria Tuulik, PhD
Ahti Lohk, PhD (until 2024)
Evelin Arust (data manager since 2026)
Tõnis Nurk (data manager until 2025)

Linguistic knowledge is not divided into a lexicon and a grammar; it forms a single network of symbolic units of varying degrees of generality. For this reason, the dictionary and the traditional grammar that is indexed in it by part-of-speech labels must inevitably be complemented by a so-called “constructicon”, in which the form, meaning and combinatorial properties of productive constructions can also be described.
Heete Sahkai. 2008
Eesti rakenduslingvistika ühingu aastaraamat 4, 171–186

Work packages (WP)

WP 1 Methods and tools for the identification of grammatical constructions and their lexicographic description
WP 2 Methods and tools for the identification and description of proficiency level information
WP 3 Revision of the Ekilex data model and the design of the Sõnaveeb interface
WP 4 Scientific events
WP 5 Dissemination

Related projects

The project integrates the results and ideas from an earlier project (PSG227) “Redefining Estonian parts of speech: a corpus-driven approach” carried out at the Institute of the Estonian Language (2019-2022)
CA22126 – European Network On Lexical Innovation (ENEOLI) 2023-2027
CA21167 – Universality, diversity and idiosyncrasy in language technology (UniDive) 2022-2026
CA22115 – A Multilingual Repository of Phraseme Constructions in Central and Eastern European Languages (PhraConRep) 2023-2027
European network for Web-centred linguistic data science (CA18209) 2019-2023
European Network for Combining Language Learning with Crowdsourcing Techniques (CA16105) 2017-2021
L2 hääldusõpe: tehisarupõhise ja konstruktsioonipõhise õppe võimalused (L2 pronunciation teaching: possibilities of AI-based and construction-based learning) (Institute of the Estonian Language)

(Co-)organized workshops

NLP4CALL 2025 Workshop at NoDaLiDa/Baltic-HLT 2025 conference (March 2025)
From a dictionary to a constructicon – how to represent the syntax-lexicon continuum in a digital resource? Workshop at EAAL2024 conference (April 2024) Video
Linking Lexicographic and Language Learning Resources (4LR). Workshop at LDK 2023 (September 2023)
Lexicography and CEFR: Linking lexicographic resources and language proficiency levels. Workshop at eLex2023 (June 2023)

Conferences and workshops

Constructionist Approaches to Language Pedagogy (CALP-5) (March 2026)
Seminar of the Institute of the Estonian Language (Tallinn) (January 2025)
eLex 2025: Electronic Lexicography in the 21st century (Bled, Slovenia) (November 2025)
28th International Conference on Text, Speech and Dialogue (TSD 2025) (Erlangen-Nürnberg, Germany) (August 2025)
10th Workshop on Speech and Language Technology in Education (SLaTE) (Nijmegen, Netherlands) (August 2025)
14th International Congress for Finno-Ugric Studies (Tartu, Estonia) (August 2025)
Corpus Linguistics 2025 (Birmingham, UK) (June 2025)
XXI Språkets funktion -symposium (Åbo Akademi University, Turku) (May 2025)
Workshop on Pedagogical Application of Constructicography (CAP 2025) (Gothenburg, Sweden) (May 2025)
UniDive COST Action CA21167 3rd general meeting (January 2025)
20th Day of Changing Language (20. muutuva keele päev) (December 2024)
21st EURALEX International Congress Lexicography and Semantics (November 2024)
Learner Corpus Research Conference (LCR) (September 2024)
The 13th International Conference on Construction Grammar (ICCG13) (August 2024)
Constructionist Approaches to Language Pedagogy (CALP4) (March 2024)
eLex 2023: Electronic Lexicography in the 21st Century (June 2023)
Workshop on Profiling second language vocabulary and grammar (2023) (April 2024)

Presentations

Publications

International Conference “Constructionist Approaches to Language Pedagogy” (CALP-5) March 4-6, 2026

Applications

Noun D-index calculator / Käändsõna D-indeksi kalkulaator
The calculator shows whether, and by how much, the relative frequency of a declinable word in the Estonian National Corpus 2019 (Ühendkorpus 2019) differs from the relative frequency generally characteristic of declinable words. The frequency norms were calculated from previously published statistics on the occurrence of case categories in Estonian texts. The relative frequency threshold DI = 0.130 was set empirically by comparing the relative frequency values of word forms with a normal distribution against those of word forms that tend to drift away from their paradigms. Forms whose relative frequency exceeds the threshold are labelled “critical” (kriitiline). The calculator's reading is a heuristic that points to a statistical tendency. Deciding on the lexicographic status of a word form remains the task of the lexicographer, who weighs the form's morphosyntactic, semantic, paradigmatic and other properties. The procedures for calculating the norms and building the calculator are described in the following articles:
Vainik, Ene; Paulsen, Geda; Lohk, Ahti (2021a). Käändevormist sõnaks: mida näitab sagedus? Eesti Rakenduslingvistika Ühingu aastaraamat = Estonian papers in applied linguistics, 17, 285–307. DOI: 10.5128/ERYa17.16.
Vainik, Ene; Lohk, Ahti; Paulsen, Geda (2021b). The Distribution Index Calculator for Estonian. Electronic lexicography in the 21st century. Proceedings of the eLex 2021 conference: Post-editing lexicography, 5–7 July 2021, virtual. Ed. Kosem, I., Cukr, M., Jakubíček, M., Kallas, J., Krek, S. & Tiberius, C. Brno: Lexical Computing CZ, s.r.o., 121–138.
Paulsen, Geda; Vainik, Ene; Lohk, Ahti; Tuulik, Maria (2021). Catching lexemes. The case of Estonian noun-based ambiforms. Electronic lexicography in the 21st century. Proceedings of the eLex 2021 conference: Post-editing lexicography, 5–7 July 2021, virtual. Ed. Kosem, I., Cukr, M., Jakubíček, M., Kallas, J., Krek, S. & Tiberius, C. Brno: Lexical Computing CZ, s.r.o., 288–311.
Adjective similarity calculator
Eesti freimileksikon (Estonian frame lexicon; University of Tartu)
Ekilex dictionary and terminology database
Sõnaveeb “Teachers' Tools” (Õpetajate tööriistad)
ruMab
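
The threshold check described for the Noun D-index calculator can be illustrated with a minimal sketch. Only the threshold value 0.130 and the label "critical" come from the description above. Everything else is an assumption made for illustration: the index is taken here to be the observed share of a case form within its paradigm minus a norm share for that case, and the function name, the norm values and the corpus counts are all hypothetical. The published procedure (Vainik, Lohk & Paulsen 2021b) may define the index differently.

```python
from dataclasses import dataclass

# Threshold quoted in the calculator description (DI = 0.130).
DI_THRESHOLD = 0.130


@dataclass
class FormReport:
    case: str
    relative_frequency: float  # share of this form within the paradigm
    norm: float                # norm share assumed for this case
    d_index: float             # observed share minus norm (ASSUMED formula)
    critical: bool             # True when d_index exceeds the threshold


def d_index_report(form_counts, case_norms, threshold=DI_THRESHOLD):
    """Compare each case form's share of its paradigm with a norm share.

    ASSUMPTION: the index is computed as observed share minus norm share.
    This is an illustrative stand-in, not the published formula.
    """
    total = sum(form_counts.values())
    if total == 0:
        raise ValueError("the paradigm has no corpus occurrences")
    reports = []
    for case, count in form_counts.items():
        share = count / total
        norm = case_norms[case]
        di = share - norm
        reports.append(FormReport(case, share, norm, di, di > threshold))
    return reports


# Hypothetical norm shares and corpus counts, invented for this example.
example_norms = {
    "nominative": 0.30, "genitive": 0.30, "partitive": 0.15,
    "adessive": 0.05, "other": 0.20,
}
example_counts = {
    "nominative": 100, "genitive": 80, "partitive": 50,
    "adessive": 700, "other": 70,
}

if __name__ == "__main__":
    for report in d_index_report(example_counts, example_norms):
        label = "critical" if report.critical else ""
        print(f"{report.case:<11} {report.d_index:+.3f} {label}")
```

In this invented paradigm only the adessive form is far more frequent than its norm and would be flagged. As the description stresses, such a flag is a heuristic, and the decision about the form's lexicographic status stays with the lexicographer.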

How to cite

This work was supported by the Estonian Research Council grant (PRG 1978). / Uurimistööd on finantseerinud Eesti Teadusagentuur (PRG 1978).

Funded by the Estonian Research Council.
