The Link Grammar Parser from CMU is freely available and very robust, but it can only be used through its C API or its command-line interface. To integrate tools from heterogeneous platforms, most of my tools communicate with each other using Plain Old XML (POX) over the Hypertext Transfer Protocol (HTTP). Hence I wrote a POX and HTTP wrapper for the Link Grammar Parser using the parser itself, its Ruby binding, and WEBrick. I used the version of the Link Grammar Parser from the AbiWord CVS because it ships with scripts for the GNU build tools and with pkg-config support, both of which are convenient.
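As a rough illustration of the idea, and not the actual wrapper code, a POX-over-HTTP service in WEBrick might look something like the sketch below. Everything specific here is an assumption: `stub_parse` is a hypothetical stand-in for the real call into the Ruby binding (whose API is not shown in this post), and the `/parse` path, port 8000, and the `parse`/`sentence`/`words`/`word` element names are made up for the example.

```ruby
require 'rexml/document'

# Hypothetical stand-in for the real parser call. The Ruby binding's API is
# not shown in this post, so this stub only splits the sentence on whitespace.
def stub_parse(sentence)
  sentence.split
end

# Wrap a result in a small XML document. The element names are assumptions.
def to_pox(sentence, words)
  doc = REXML::Document.new
  root = doc.add_element('parse')
  root.add_element('sentence').text = sentence
  words_el = root.add_element('words')
  words.each { |w| words_el.add_element('word').text = w }
  out = +''
  doc.write(out) # REXML escapes special characters in text nodes
  out
end

# Serve the (stub) parser over HTTP. WEBrick is required lazily because it
# is a separate gem on Ruby 3.0 and later.
def start_server(port = 8000)
  require 'webrick'
  server = WEBrick::HTTPServer.new(Port: port)
  server.mount_proc('/parse') do |req, res|
    sentence = req.body.to_s # plain text in the request body
    res['Content-Type'] = 'text/xml'
    res.body = to_pox(sentence, stub_parse(sentence))
  end
  trap('INT') { server.shutdown }
  server.start # blocks until interrupted
end
```

Calling `start_server` would block and answer each request to `/parse` with one XML document. Keeping `to_pox` separate from the HTTP handling means the XML output can be checked without starting a server.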
To show how this service works, I wrote the HTML script below.
The script above generates the HTML page shown in Figure 1 and sends plain text to the parsing service. The parsing service then returns the parse result as XML, as illustrated in Figure 2. In real use, the web browser is replaced by a program with an XML parser and an HTTP client library.
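To make the "program instead of a browser" case concrete, here is a minimal client sketch using Ruby's Net::HTTP and REXML. It is only a guess at how such a client could look: the URL `http://localhost:8000/parse`, the plain-text POST body, and the `<word>` elements in the response are assumptions for illustration, since the real service's XML format is only shown in Figure 2.

```ruby
require 'net/http'
require 'uri'
require 'rexml/document'

# Build a POST request that carries the sentence as plain text.
def build_request(uri, sentence)
  req = Net::HTTP::Post.new(uri)
  req['Content-Type'] = 'text/plain; charset=utf-8'
  req.body = sentence
  req
end

# Pull the text of every <word> element out of the response.
# The element name is an assumption, not the service's documented format.
def extract_words(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//word').map(&:text)
end

# Send a sentence to the (assumed) service URL and return the parsed words.
def parse_remote(sentence, url = 'http://localhost:8000/parse')
  uri = URI(url)
  res = Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(build_request(uri, sentence))
  end
  raise "parser service returned #{res.code}" unless res.is_a?(Net::HTTPSuccess)
  extract_words(res.body)
end
```

With a service running at the assumed URL, `parse_remote('the cat sat')` would send the sentence and return whatever `<word>` elements come back. A client in any other language only needs the same two pieces: an HTTP client and an XML parser.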
Figure 1
Figure 2
Now I have a parser service that is easy (?) to use, so next I will look at pdf2text. :-)