Title: Unsupervised Statistical Segmentation of Japanese Kanji Strings
Authors: Ando, Rie
Lee, Lillian
Keywords: computer science
technical report
Issue Date: Jul-1999
Publisher: Cornell University
Abstract: Word segmentation is an important issue in Japanese language processing because Japanese is written without space delimiters between words. We propose a simple dictionary-less method to segment Japanese kanji sequences into words based solely on character $n$-gram counts from an unannotated corpus. The performance was often better than that of rule-based morphological analyzers over a variety of both standard and novel error metrics.
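
As a rough illustration of what segmenting from character n-gram counts alone can look like, the Python sketch below places a word boundary wherever the n-grams on either side of a gap are more frequent than the n-grams straddling it. It is not taken from the report and should not be read as the authors' procedure: the function names (ngram_counts, boundary_votes, segment), the five-line toy corpus, the use of bigrams only, and the 0.5 threshold are all assumptions made for this example.

```python
from collections import Counter


def ngram_counts(corpus, max_n):
    """Count every character n-gram (1 <= n <= max_n) in an unannotated corpus."""
    counts = Counter()
    for line in corpus:
        for n in range(1, max_n + 1):
            for i in range(len(line) - n + 1):
                counts[line[i:i + n]] += 1
    return counts


def boundary_votes(text, counts, orders=(2,)):
    """Score each gap k (between text[k-1] and text[k]) in [0, 1].

    For each order n, compare the n-grams ending or starting at the gap
    with the n-grams that straddle it. The score is the fraction of
    comparisons won by the non-straddling n-gram, averaged over the orders
    for which a comparison was possible.
    """
    votes = {}
    for k in range(1, len(text)):
        total, used = 0.0, 0
        for n in orders:
            sides = []
            if k - n >= 0:
                sides.append(text[k - n:k])        # n-gram ending at the gap
            if k + n <= len(text):
                sides.append(text[k:k + n])        # n-gram starting at the gap
            straddling = [
                text[j:j + n]
                for j in range(k - n + 1, k)
                if j >= 0 and j + n <= len(text)
            ]
            if not sides or not straddling:
                continue
            wins = sum(counts[s] > counts[t] for s in sides for t in straddling)
            total += wins / (len(sides) * len(straddling))
            used += 1
        votes[k] = total / used if used else 0.0
    return votes


def segment(text, counts, orders=(2,), threshold=0.5):
    """Split text at every gap whose score exceeds the (assumed) threshold."""
    votes = boundary_votes(text, counts, orders)
    words, start = [], 0
    for k in range(1, len(text)):
        if votes[k] > threshold:
            words.append(text[start:k])
            start = k
    words.append(text[start:])
    return words


# Invented toy corpus, for illustration only.
corpus = ["東京", "大学", "東京大学", "東京駅", "大学院"]
counts = ngram_counts(corpus, max_n=2)
print(segment("東京大学", counts))  # ['東京', '大学']
```

In this toy corpus the bigrams 東京 and 大学 each occur three times while the straddling bigram 京大 occurs once, so only the middle gap scores above the threshold. No dictionary or annotated data is consulted at any point.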
Appears in Collections: Computer Science Technical Reports

Files in This Item:

File         Description  Size       Format
99-1756.pdf               139.98 kB  Adobe PDF
99-1756.ps                329.79 kB  Postscript

Items in eCommons are protected by copyright, with all rights reserved, unless otherwise indicated.


© 2014 Cornell University Library