Chinese Computing Lab

VII. Applications of The PolyU Treebank

Because the PolyU Treebank provides not only syntactic information but also semantic information about phrases, it can be applied to a variety of NLP tasks. The most obvious application is training and testing an automatic shallow parser [Lu et al. 2003]. Other uses include Chinese collocation extraction and research into the acquisition of temporal expressions.

In 2003, our team developed an effective statistical, window-based algorithm for extracting Chinese collocations, which extracted bigram collocations with a precision of 61% [Xu 2003]. The extraction results included some pseudo-collocations, that is, word combinations that frequently co-occurred but did not in fact form collocations, like the typical problem of ‘doctor-nurse’ in English [Church 1989]. Because these pseudo-collocations were statistically significant, it was difficult to remove them individually using any statistics-based extraction method. However, given that a Chinese collocation normally occurs only within a phrase, or between the headwords of relevant phrases [Zhang and Lin 1992], we were able to use the syntactic information recorded in the PolyU Treebank, i.e. the boundaries and headwords of phrases, to refine the search context window, eliminate some pseudo-collocations and also retrieve some low-frequency collocations.
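The phrase-based refinement of the context window described above can be illustrated with a minimal sketch. It is not the algorithm of [Xu 2003]: the sentence representation (a list of `(words, head_index)` tuples), the function names and the default window size are all assumptions made for illustration, and "relevant phrases" is approximated here as any two phrases of the same sentence whose headwords fall within the window.

```python
from collections import Counter
from itertools import combinations


def candidate_pairs(phrases, window=5):
    """Return the word pairs allowed by the phrase-based constraint.

    `phrases` is a hypothetical representation of one sentence: a list of
    (words, head_index) tuples in sentence order, where `head_index` is the
    position of the headword inside `words`.
    """
    pairs = []
    heads = []
    offset = 0  # position of the current phrase's first word in the sentence
    for words, head_index in phrases:
        # Pairs inside a single phrase, limited by the window size.
        for i, j in combinations(range(len(words)), 2):
            if j - i <= window:
                pairs.append((words[i], words[j]))
        heads.append((offset + head_index, words[head_index]))
        offset += len(words)
    # Pairs between the headwords of different phrases, limited by the
    # distance between the headwords in the sentence.
    for (pos_a, word_a), (pos_b, word_b) in combinations(heads, 2):
        if pos_b - pos_a <= window:
            pairs.append((word_a, word_b))
    return pairs


def count_pairs(sentences, window=5):
    """Count the allowed pairs over a list of sentences."""
    counts = Counter()
    for phrases in sentences:
        counts.update(candidate_pairs(phrases, window))
    return counts
```

Under these assumptions, a pair that merely falls inside the same plain window but crosses a phrase boundary without linking two headwords is never counted, which is how some pseudo-collocations would be filtered out before any statistical test is applied.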

The PolyU Treebank is currently being used in the acquisition of temporal expressions. This is possible because the Treebank annotates time phrases (TP) and further distinguishes them with the finer-grained labels point-of-time (TP-PO) and period-of-time (TP-DU). Such information is very helpful in constructing temporal expressions.
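As a rough illustration of how the TP, TP-PO and TP-DU labels could be used, the sketch below collects time phrases from an annotated sentence. The bracketed `[LABEL word/pos ...]` format, the part-of-speech tags and the function name are assumptions made for this example and may not match the actual PolyU Treebank file format; only the three labels come from the description above.

```python
import re

# Hypothetical bracketed format: "[LABEL word/pos word/pos ...]".
# Only non-nested phrases labelled TP, TP-PO or TP-DU are matched.
TP_PATTERN = re.compile(r"\[(TP(?:-PO|-DU)?)\s+([^\[\]]+)\]")


def extract_time_phrases(annotated):
    """Return (label, text) tuples for every time phrase in the string."""
    results = []
    for label, body in TP_PATTERN.findall(annotated):
        # Drop the assumed "/pos" suffix from each token and rejoin the words.
        words = [token.rsplit("/", 1)[0] for token in body.split()]
        results.append((label, "".join(words)))
    return results
```

A list of such labelled phrases could then serve as seed data for learning patterns of point-of-time and period-of-time expressions.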

Last modified on Thu, 11 May 2006 11:54:26 +0800