Monday, November 14, 2005

The client of the English parser service

The post about the English parsing service over POX and HTTP in this blog is missing an important example: how to write a client. So it is shown here.




require 'net/http'
require 'uri'

res = Net::HTTP.post_form(URI.parse('http://127.0.0.1:8030/parse/'),
{'txt'=>'I love you.'})
print res.body




We can just use the standard HTTP library of any language. The parsing result is then returned as XML, as shown below.




<result>
<constituent label="S" >
....
....
....
</constituent>
</result>




Easy, isn't it?

English parsing service over POX and HTTP

The Link grammar parser from CMU is freely available and very robust. It has to be used through its C API or the command line interface. In order to integrate tools from heterogeneous platforms, most of my tools communicate with each other by Plain Old XML ( POX ) over the Hypertext Transfer Protocol ( HTTP ). Hence I wrote a POX and HTTP wrapper for the Link grammar parser by using the parser itself, its Ruby binding and WEBrick. Moreover, the Link grammar parser from AbiWord CVS has been adopted because it provides scripts for the GNU build tools and pkg-config, which are convenient.
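The wrapper idea can be sketched roughly as follows. This is not the original service code, only a minimal illustration under assumptions: `Node`, `to_pox` and `start_service` are hypothetical names, and the `children` accessor is invented for the sketch (the real binding is only known here to expose `label`). The `/parse/` path, the port 8030 and the `txt` field are taken from the client example. On Ruby 3.0 and later, WEBrick is a separate gem, so it is only loaded when the server is actually started.

```ruby
require 'rexml/document'

# Hypothetical stand-in for a constituent tree node.
Node = Struct.new(:label, :children)

# Append one <constituent> element per tree node, recursively.
def add_constituent(parent, node)
  element = parent.add_element('constituent', 'label' => node.label)
  node.children.each { |child| add_constituent(element, child) }
end

# Serialize a constituent tree as Plain Old XML under a <result> root.
def to_pox(root)
  doc = REXML::Document.new
  add_constituent(doc.add_element('result'), root)
  doc.to_s
end

# Serve the parser over HTTP. `parser` is any callable that maps a
# sentence string to a Node tree (an assumption of this sketch).
def start_service(parser, port = 8030)
  require 'webrick' # separate gem on Ruby 3.0+
  server = WEBrick::HTTPServer.new(Port: port)
  server.mount_proc('/parse/') do |req, res|
    res['Content-Type'] = 'text/xml'
    res.body = to_pox(parser.call(req.query['txt']))
  end
  trap('INT') { server.shutdown }
  server.start
end

# Demo without starting a server: serialize a hand-built tree.
puts to_pox(Node.new('S', [Node.new('NP', []), Node.new('VP', [])]))
```

Keeping the XML serialization separate from the HTTP handler means the same `to_pox` function can be tried without any server running.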



In order to show the way this service works, the HTML script below was written.



form1



The script above generates the HTML page shown in figure 1, which sends plain text to the parsing service. The parsing service then returns the parsing result in XML, as illustrated in figure 2. For real usage, the web browser is replaced by a program with an XML parser and an HTTP client framework.
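Such a program might look like the sketch below, which uses Ruby's Net::HTTP as the HTTP client and REXML as the XML parser. The function names `constituent_labels` and `parse_remote` are hypothetical, and the nested `NP` element in the sample is only an assumed shape for the parts of the result that are not shown in this blog.

```ruby
require 'net/http'
require 'uri'
require 'rexml/document'

# Collect the label of every <constituent> element, in document order.
def constituent_labels(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//constituent').map { |node| node.attributes['label'] }
end

# Post the text to the parsing service and return the labels found.
# The URL and the 'txt' field follow the client example in this blog.
def parse_remote(text, url = 'http://127.0.0.1:8030/parse/')
  res = Net::HTTP.post_form(URI.parse(url), { 'txt' => text })
  constituent_labels(res.body)
end

# Offline demo on a hand-written sample (assumed shape, see above).
sample = '<result><constituent label="S"><constituent label="NP"/></constituent></result>'
p constituent_labels(sample)
```

With the service running, `parse_remote('I love you.')` would do the HTTP round trip and the XML parsing in one call.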




web1

Figure 1




web2

Figure 2




Now I have a parser service which is easy (?) to use. So I will move on to look at pdf2text. :-)

Saturday, November 12, 2005

Ruby/Link Grammar Binding

This blog entry is out of date. Please visit http://rubyforge.org/projects/linkgrammar4r/ instead.

I want to use the Link Grammar parser for some particular tasks and I'm using the Ruby language, but there was no existing Ruby binding for the Link Grammar Parser. Hence I wrote one. It is incomplete, but it may be useful for someone who wants to make this binding but doesn't want to start from scratch.

http://www.geocities.com/veetai/ruby-linkgrammar-20051111.tar.gz

An example of using Link Grammar parsing in Ruby is as follows:

require 'linkgrammar'

dict = LinkGrammar::Dictionary.new("4.0.dict", nil, nil, nil)
sent = LinkGrammar::Sentence.new('I love you.', dict)
opts = LinkGrammar::ParseOptions.new
sent.parse(opts)
linkage = LinkGrammar::Linkage.new(0, sent, opts)
words = linkage.get_words
words.each{|w|
print "w = #{w}\n"
}
cnode = linkage.constituent_tree
print "Root node label = #{cnode.label}\n"



The result:

w = LEFT-WALL
w = I.p
w = love.v
w = you.[?].n
w = RIGHT-WALL
Root node label = S
Freeing dictionary 4.0.dict


P.S. I will probably post an update later if I find that this binding is not enough for my tasks :-)

Sunday, August 21, 2005

Phrase Structure Tree Drawer for paper writing




















The tree is defined in XML.













And it was exported to PDF :-)

source code

Converting NSTextStorage to Ruby string for non ASCII

text_storage.string.UTF8String.to_s

Tuesday, August 09, 2005

Text Break: Coding

The aim of the Text Break project is to make a universal, compact and extensible word segmentation engine for any language.
The main idea is the assumption that every word segmentation program is based on a similar approach, namely
searching for the shortest path on a directed acyclic graph in which each node represents a character. What varies is how
weighted edges are added to the graph. In terms of extensibility, Text Break should be adaptable to new languages and also to
new techniques for supported languages. Finally, I am starting to code it now :-) http://svn.gna.org/viewcvs/textbreak/trunk/textbreak/
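To illustrate the shortest-path idea, here is a minimal sketch in Ruby. It is not Text Break code. It assumes one particular way of adding weighted edges: a dictionary word costs 1 and a single unknown character costs 10. The function name `segment`, the tiny dictionary and both costs are made up for the illustration. Nodes are character positions, and since every edge goes forward, one left-to-right pass finds the cheapest path.

```ruby
require 'set'

# Segment text by finding the cheapest path from position 0 to text.length.
# An edge i -> j exists when text[i...j] is a dictionary word, or when it is
# a single character (fallback, so that a path always exists).
def segment(text, dict, word_cost: 1, unknown_cost: 10)
  n = text.length
  best = Array.new(n + 1, Float::INFINITY) # cheapest cost to reach each node
  back = Array.new(n + 1)                  # predecessor on the cheapest path
  best[0] = 0
  (0...n).each do |i|
    ((i + 1)..n).each do |j|
      cost = if dict.include?(text[i...j])
               word_cost
             elsif j == i + 1
               unknown_cost
             end
      next unless cost
      if best[i] + cost < best[j]
        best[j] = best[i] + cost
        back[j] = i
      end
    end
  end
  # Walk the predecessor links backwards to recover the words.
  words = []
  j = n
  while j > 0
    i = back[j]
    words.unshift(text[i...j])
    j = i
  end
  words
end

dict = Set.new(%w[the them me menu])
p segment('themenu', dict)
```

A different technique or language would only change which edges are added and how they are weighted; the path search stays the same.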

Sunday, July 31, 2005

Free Jabber-RPC

Since I could not find a Jabber-RPC implementation in Java
( If one exists, please inform me. ), I wrote one and named it
Free Jabber-RPC. It is still in a very early experimental state
and only the client side is implemented. I took some code from the Apache XML-RPC framework and
use Smack heavily, but Smack and Apache XML-RPC have different licenses, so I try to keep
everything that was imported from Apache XML-RPC under the Apache License and
license new source code under the GPL. ( Should I do this? )
I have only tested it by sending a simple method request to JabberXMLRPC in Python.


Obtain Free Jabber-RPC 20050731

Thursday, July 07, 2005

Shopping Music Online: Mixiclub.com

Mixiclub.com is a website of RS Promotion which is a portal for selling ( Thai ) songs online. I appreciate the concept of this service since I can choose to buy a single song instead of an album. Moreover, the price is quite reasonable ( 15 Baht for 1 song ). However, the music is distributed in WMA format, for which I need Windows Media Player to
listen to it. It is absolutely inconvenient. Is there any GNU/Linux user friendly service available?

Monday, June 06, 2005

NO, I'm not SCV.

Starcraft can teach people about management. A manager should maximize resource utilization. Anyway, I'm not a marine, not even an SCV. Sometimes I think people should play a football manager game instead, because a football manager game is not as real as Starcraft is. :-P

P.S. This is just for practising my extremely *POOR* English. Could you do me a favor? Please give me suggestions about my English and tell me where my writing is poor. I want to improve myself.

Monday, April 18, 2005

The Attempt to Port RubyCocoa to GNUstep

RubyCocoa seems to be more functional than RIGS. IMHO, unifying the Ruby -> GNUstep/Cocoa interface would allow us to write portable GNUstep/Cocoa based programs. After trying to build RubyCocoa on GNUstep, I found that there is no NSNetService
or NSDistantObjectRequest on GNUstep. I don't know why they do not exist. Given this result, I think I should learn more about NSProxy. ;-)

Friday, April 08, 2005

Why should we use Theora instead of *MPEG4* ?

It is painful when we can't read or produce a document, audio or video in the format that some people want. MS .doc is one example: we don't know exactly how to encode and decode .doc . Thus, some people try to use open standards for file formats. Unfortunately, some open standards suck because we can't use them freely. We may have to buy the specification document. However, having to buy a document is not the worst part. Some open standards are patented, which means that we cannot use them freely without permission from the patent owners. Because of patents, free/open source software developers in some parts of the world can't develop their encoders/decoders and distribute them freely.

MPEG4 is a patented technology, isn't it? Do any of its owners allow us to use MPEG4? IMHO, we should recognise other formats that we are allowed to use - Theora, for instance.