The aim of the Text Break project is to build a universal, compact, and extensible word segmentation engine for any language.
The main idea rests on the assumption that all word segmentation software uses a similar approach:
searching for the shortest path on a directed acyclic graph in which each node represents a character. What varies is how
weighted edges are added to the graph. In terms of extensibility, Text Break should be adaptable to new languages and also
to new techniques for already supported languages. Finally, I am starting to code it now :-) http://svn.gna.org/viewcvs/textbreak/trunk/textbreak/
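
To make the shortest-path idea concrete, here is a minimal sketch in Python. It is not the Text Break code; every name in it (`dictionary_edges`, `segment`, `unknown_cost`) is hypothetical and only for illustration. I assume here that nodes are the positions between characters, that a dictionary word costs 1, and that an unknown single character costs 10. The edge builder is passed in as a function, which is one possible way to let a new language or technique plug in its own weighted edges.

```python
from typing import Callable, List, Set, Tuple

# An edge (start, end, weight) means: text[start:end] is a candidate word.
Edge = Tuple[int, int, float]
EdgeBuilder = Callable[[str], List[Edge]]


def dictionary_edges(words: Set[str], unknown_cost: float = 10.0) -> EdgeBuilder:
    """Return an edge builder backed by a plain word set (an assumed strategy)."""

    def build(text: str) -> List[Edge]:
        edges: List[Edge] = []
        for i in range(len(text)):
            # Fallback edge so that every position stays reachable.
            edges.append((i, i + 1, unknown_cost))
            for j in range(i + 1, len(text) + 1):
                if text[i:j] in words:
                    edges.append((i, j, 1.0))
        return edges

    return build


def segment(text: str, build_edges: EdgeBuilder) -> List[str]:
    """Split text along the cheapest path from position 0 to len(text)."""
    n = len(text)
    cost = [float("inf")] * (n + 1)
    back = [-1] * (n + 1)
    cost[0] = 0.0
    # Every edge points forward, so sorting by start position gives a
    # topological order and one relaxation pass is enough.
    for start, end, weight in sorted(build_edges(text)):
        if cost[start] + weight < cost[end]:
            cost[end] = cost[start] + weight
            back[end] = start
    # Walk the back pointers from the end to recover the words.
    pieces: List[str] = []
    pos = n
    while pos > 0:
        pieces.append(text[back[pos]:pos])
        pos = back[pos]
    return pieces[::-1]


if __name__ == "__main__":
    builder = dictionary_edges({"the", "them", "me", "men", "end"})
    print(segment("themend", builder))  # ['them', 'end']
```

With these assumed weights, "them" + "end" costs 2, while "the" + "men" + "d" costs 12 because of the unknown character, so the first split wins. Swapping in a different edge builder changes the weighting without touching the search.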