Discovering Automated Lexicography: The Case of the Slovene Lexical Database
Abstract
In this paper, we describe the compilation of the Slovene Lexical Database, focusing on the development of a methodology that improves the tools used for lexicographic analysis and introduces automatic data extraction into the lexicographic process. The semi-automated approach, which was devised in the last stages of database compilation, involved extracting corpus data (grammatical relations, collocations, examples, and grammatical labels) and conducting lexicographic analysis in the dictionary-writing system rather than in the corpus tool. An evaluation comparing the manual and semi-automated approaches showed that the semi-automated approach is much quicker and presents lexicographers with almost all the information they identified as relevant during manual analysis, as well as additional potentially relevant information for the dictionary entry. The final section of the paper proposes a few avenues for improving the semi-automated approach, including the implementation of crowdsourcing and additional post-processing of automatically extracted data.