Can we replace curation with information extraction software?
Abstract
Can programs for automated or semi-automated information extraction from scientific texts serve as practical alternatives to professional curation? I show that the error rates of current information extraction programs are too high to replace professional curation today. Furthermore, current information extraction programs extract single narrow slivers of information, such as individual protein interactions; they cannot extract the large breadth of information that professional curators extract for databases such as EcoCyc. They also cannot arbitrate among conflicting statements in the literature as curators can. Therefore, funding agencies should not hobble the curation efforts of existing databases on the assumption that a problem that has stymied Artificial Intelligence researchers for more than 60 years will be solved tomorrow. Semi-automated extraction techniques appear to have significantly more potential, based on a review of recent tools that enhance curator productivity, but a full cost-benefit analysis for these tools is lacking. Without such an analysis, it is possible to expend significant effort developing information-extraction tools that automate small parts of the overall curation workflow without achieving a significant decrease in curation costs.