Chinese Computing Lab

PolyU Treebank

VII. Applications of the PolyU Treebank

Because the PolyU Treebank records both syntactic and semantic information about phrases, it can support a variety of NLP applications. The most obvious is training and testing an automatic shallow parser [Lu et al. 2003]. Other uses include Chinese collocation extraction and research into the acquisition of temporal expressions.

In 2003, our team developed an effective statistical, window-based algorithm for extracting Chinese collocations, which extracted bigram collocations with a precision of 61% [Xu 2003]. The results included some pseudo-collocations, that is, word combinations that co-occur frequently but are not true collocations, such as the well-known ‘doctor-nurse’ pair in English [Church 1989]. Because these pseudo-collocations are statistically significant, they are difficult to remove with any purely statistical extraction method. However, a Chinese collocation normally occurs only within a phrase or between the headwords of relevant phrases [Zhang and Lin 1992]. We were therefore able to use the syntactic information recorded in the PolyU Treebank, namely phrase boundaries and headwords, to refine the search context window, eliminate some pseudo-collocations, and retrieve some low-frequency collocations.
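The effect of phrase constraints on candidate pairs can be illustrated with a minimal sketch. It assumes a simplified in-memory representation of an annotated sentence. The `Phrase` record, its fields, and both function names are hypothetical; they are not the Treebank's actual format or the algorithm of [Xu 2003]. The sketch contrasts a plain fixed-window co-occurrence count with a constrained count that admits only two kinds of pairs: pairs inside a single phrase, and pairs between headwords of adjacent phrases. Treating adjacent phrases as the "relevant" ones is our own assumption.

```python
from collections import Counter
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Phrase(NamedTuple):
    """Hypothetical simplified phrase record: label, words, headword index."""
    label: str
    words: Tuple[str, ...]
    head: int  # index of the headword within `words`


Sentence = List[Phrase]


def window_pairs(sentence: Sentence, window: int = 3) -> Counter:
    """Baseline: count ordered word pairs at most `window` tokens apart,
    ignoring phrase structure entirely."""
    tokens = [w for p in sentence for w in p.words]
    counts: Counter = Counter()
    for i, w1 in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            counts[(w1, tokens[j])] += 1
    return counts


def phrase_constrained_pairs(sentence: Sentence) -> Counter:
    """Count only pairs inside one phrase, or between the headwords of
    adjacent phrases (an assumed reading of 'relevant phrases')."""
    counts: Counter = Counter()
    for p in sentence:
        for w1, w2 in combinations(p.words, 2):
            counts[(w1, w2)] += 1
    heads = [p.words[p.head] for p in sentence]
    for h1, h2 in zip(heads, heads[1:]):
        counts[(h1, h2)] += 1
    return counts


# Toy sentence: doctor / examine patient / nurse / record result
sentence = [
    Phrase("NP", ("医生",), 0),
    Phrase("VP", ("检查", "病人"), 0),
    Phrase("NP", ("护士",), 0),
    Phrase("VP", ("记录", "结果"), 0),
]
baseline = window_pairs(sentence, window=3)
constrained = phrase_constrained_pairs(sentence)
print(("医生", "护士") in baseline)     # True: within 3 tokens
print(("医生", "护士") in constrained)  # False: different phrases, heads not adjacent
```

In a real system, the constrained counts would feed the same association statistics as the baseline. Note that the headword rule is not bounded by a fixed token distance, so it can reach pairs that lie outside a short window.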

The PolyU Treebank is also currently used in the acquisition of temporal expressions, because it annotates time phrases (TP) and further subdivides them into finer-grained point-of-time (TP-PO) and period-of-time (TP-DU) phrases. This information is very helpful for constructing temporal expressions.
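Collecting the annotated time phrases can be shown with a minimal sketch that assumes a hypothetical bracketed notation, `[LABEL word/pos ...]`. In this notation, brackets may nest and closing brackets are glued to the last word. The Treebank's actual file format may differ. The function name `extract_phrases` and the example sentence are illustrative only.

```python
from typing import List, Tuple


def extract_phrases(annotated: str, prefix: str = "TP") -> List[Tuple[str, str]]:
    """Return (label, surface text) for each bracketed phrase whose label is
    `prefix` or starts with `prefix-` (e.g. TP, TP-PO, TP-DU).

    Assumes a hypothetical format: whitespace-separated tokens, "[LABEL"
    opens a phrase, and one or more "]" at the end of a word/pos token
    close phrases.
    """
    results: List[Tuple[str, str]] = []
    stack: List[Tuple[str, List[str]]] = []  # open phrases: (label, words)
    for token in annotated.split():
        if token.startswith("["):
            stack.append((token[1:], []))
            continue
        closes = len(token) - len(token.rstrip("]"))
        word = token.rstrip("]").rsplit("/", 1)[0]  # drop the POS tag
        for _, words in stack:  # the word belongs to every open phrase
            words.append(word)
        for _ in range(closes):  # innermost phrase closes first
            label, words = stack.pop()
            if label == prefix or label.startswith(prefix + "-"):
                results.append((label, "".join(words)))
    return results


example = "[S [TP-PO 2003年/t 5月/t] [NP 会议/n] [VP 持续/v [TP-DU 三/m 天/q]]]"
print(extract_phrases(example))
# [('TP-PO', '2003年5月'), ('TP-DU', '三天')]
```

Keeping the TP-PO and TP-DU labels alongside the surface text lets a downstream step treat points and periods of time differently when it builds temporal expressions.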

Last modified on Thu, 11 May 2006 11:54:26 +0800
THE HONG KONG POLYTECHNIC UNIVERSITY