On the use of words and n-grams for Chinese information retrieval
2000, pp. 141–148
Abstract
When processing Chinese documents and queries for information retrieval (IR), one must first identify the units to be used as index terms. Words and n-grams have both been used as index terms in several previous studies, which showed that the two kinds of indexes lead to comparable IR performance. In this study, we carry out further experiments on different ways to segment documents and queries and to combine words with n-grams. Our experiments show that a combination of the longest-matching algorithm with single characters is the best choice.
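To make the indexing units named in the abstract concrete, the following is a minimal sketch of greedy forward longest matching combined with single characters, alongside character bigrams for comparison. The lexicon, the `max_len` cutoff, the function names, and the way words and characters are merged are all illustrative assumptions; the abstract does not specify the dictionary or the exact combination scheme used in the experiments.

```python
def longest_match(text, lexicon, max_len=4):
    """Greedy forward longest-matching segmentation (illustrative sketch).

    At each position, take the longest substring (up to max_len characters)
    found in the lexicon; fall back to a single character if none matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted, so the loop terminates.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


def bigrams(text):
    """Overlapping character bigrams, one common n-gram indexing unit."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def index_units(text, lexicon, max_len=4):
    """Combine longest-match words with single characters (assumed scheme).

    Duplicates are dropped while preserving first-seen order.
    """
    units = []
    for unit in longest_match(text, lexicon, max_len) + list(text):
        if unit not in units:
            units.append(unit)
    return units


# Hypothetical toy lexicon, for illustration only.
lexicon = {"中文", "信息", "检索", "信息检索"}
text = "中文信息检索"
words = longest_match(text, lexicon)
grams = bigrams(text)
units = index_units(text, lexicon)
```

In this sketch the single characters act as a fallback for matching query terms that the lexicon segments differently or does not contain at all, which is one plausible reading of why the combination might help.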