Building the essential resources for Finnish: the Turku Dependency Treebank
Abstract
In this paper, we present the final version of a publicly available treebank of Finnish, the Turku Dependency Treebank. The treebank contains 204,399 tokens (15,126 sentences) from 10 different text sources and has been manually annotated in a Finnish-specific version of the well-known Stanford Dependency scheme. The morphological analyses in the treebank were assigned using a novel machine learning method that disambiguates the readings given by an existing tool. As the second main contribution, we present the first open-source Finnish dependency parser, trained on the newly introduced treebank. The parser achieves a labeled attachment score of 81%. The treebank data as well as the parsing pipeline are available under an open license at http://bionlp.utu.fi/.
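For readers unfamiliar with the metric, the labeled attachment score (LAS) is the percentage of tokens whose predicted head and dependency label both match the gold annotation. The following is a minimal sketch, assuming each token is represented as a hypothetical `(head_index, label)` pair. It is not the authors' evaluation code, the example data is invented, and conventions such as punctuation handling vary between evaluations.

```python
from typing import List, Tuple

# Hypothetical token representation: (index of the head token, dependency label).
Token = Tuple[int, str]


def labeled_attachment_score(gold: List[Token], predicted: List[Token]) -> float:
    """Return the percentage of tokens whose head and label are both correct."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    # A token counts as correct only if head AND label match the gold analysis.
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return 100.0 * correct / len(gold)


# Toy three-token sentence (invented data, not taken from the treebank).
gold = [(2, "nsubj"), (0, "root"), (2, "dobj")]
pred = [(2, "nsubj"), (0, "root"), (2, "nommod")]  # correct head, wrong label on token 3
score = labeled_attachment_score(gold, pred)
```

In the toy example the third token is attached to the correct head but carries the wrong label, so it does not count toward LAS, although it would count toward the unlabeled attachment score.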