Wide-scope biomedical named entity recognition and normalization with CRFs, fuzzy matching and character level modeling
Abstract
We present a system for automatically identifying a wide range of biomedical entities in the literature. This work builds on our earlier participation in the BioCreative VI: Interactive Bio-ID Assignment shared task, in which our system demonstrated state-of-the-art performance and achieved the highest results in named entity recognition. In this paper we describe the original conditional random field-based system used in the shared task as well as the experiments conducted since, including better hyperparameter tuning and character-level modeling, which led to further performance improvements. To normalize the mentions into unique identifiers we use fuzzy character n-gram matching. The normalization approach has also been improved with a better abbreviation resolution method and stricter guideline compliance, resulting in vastly improved results for various entity types. All tools and models used for both named entity recognition and normalization are publicly available under an open license.
Database URL: https://github.com/TurkuNLP/BioCreativeVI_BioID_assignment.
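To make the idea of fuzzy character n-gram matching concrete, the following is a minimal sketch and not the authors' implementation. It assumes character trigrams, plain count vectors compared by cosine similarity, and a fixed rejection threshold; the function names, the toy lexicon and its placeholder identifiers are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Return a bag of character n-grams for a lowercased, space-padded string."""
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def normalize(mention, lexicon, threshold=0.5):
    """Return (identifier, score) for the best lexicon match, or (None, score)
    when the best score falls below the threshold (an assumed cut-off)."""
    mention_grams = char_ngrams(mention)
    best_id, best_score = None, 0.0
    for name, identifier in lexicon.items():
        score = cosine(mention_grams, char_ngrams(name))
        if score > best_score:
            best_id, best_score = identifier, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


# Hypothetical toy lexicon; the identifiers are placeholders, not real database IDs.
LEXICON = {
    "tumor necrosis factor": "ID:0001",
    "interleukin 6": "ID:0002",
    "epidermal growth factor receptor": "ID:0003",
}

if __name__ == "__main__":
    for mention in ["tumour necrosis factor", "interleukin-6", "xyz"]:
        print(mention, "->", normalize(mention, LEXICON))
```

Because the comparison works on overlapping character fragments rather than whole tokens, spelling variants such as "tumour" or a hyphen in place of a space still share most of their n-grams with the lexicon entry, while an unrelated string shares none and is rejected.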