|
eCommons@Cornell >
College of Engineering >
Computer Science >
Computer Science Technical Reports >
Please use this identifier to cite or link to this item:
http://hdl.handle.net/1813/7410
| Title: | Unsupervised Statistical Segmentation of Japanese Kanji Strings |
| Authors: | Ando, Rie Lee, Lillian |
| Keywords: | computer science technical report |
| Issue Date: | Jul-1999 |
| Publisher: | Cornell University |
| Citation: | http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR99-1756 |
| Abstract: | Word segmentation is an important issue in Japanese language processing because Japanese is written without space delimiters between words. We propose a simple dictionary-less method to segment Japanese kanji sequences into words based solely on character $n$-gram counts from an unannotated corpus. The performance was often better than that of rule-based morphological analyzers over a variety of both standard and novel error metrics. |
| URI: | http://hdl.handle.net/1813/7410 |
| Appears in Collections: | Computer Science Technical Reports
|
Items in eCommons are protected by copyright, with all rights reserved, unless otherwise indicated.
|