V. Working team, schedule and quality control:
Our research team consists of four people at the Hong Kong Polytechnic University, two linguists from Beijing Language and Culture University, and some research collaborators from Peking University. The annotation work itself has been carried out by four postgraduate students in language studies and computational linguistics from Beijing Language and Culture University.
The annotation work is conducted in five separate stages to ensure the quality of the output. In the first stage, the annotation specification was prepared and the corpus was selected. Researchers in Hong Kong invited two linguists from China to come to Hong Kong to prepare the corpus collection and selection work. A thorough study of the reported work in this area was conducted. After the project scope was defined, the SS labels and the FF labels were defined, and a treebank specification was documented. The treebank was named PolyU Treebank to indicate that it is produced at the Hong Kong Polytechnic University. To validate the drafted specification, all six members first manually annotated 10k-word material separately. The outputs were then compared, the problems and ambiguities that arose were discussed, and the result was consolidated into Version 1.0 of the specification. Stage 1 took about 5 months to complete.
In Stage 2, the annotators in Beijing became involved. They first had to study the specification and understand the requirements of the annotation. Then, under the supervision of a team member from Stage 1, the annotators annotated 20k-word material together and discussed the problems that arose. This work took two months and served as training in the specification. The emphasis at this stage was on giving the annotators a good understanding of the specification, and on achieving consistency both within each annotator's work and between different annotators. Further problems that arose in actual annotation practice were resolved, and the specification was refined or modified accordingly.
In Stage 3, which took about 2 months, each annotator was assigned 40k-word material, of which 5k-word material was assigned in duplicate to all the annotators. Meanwhile, the team members in Hong Kong developed a post-annotation checking tool that verifies the annotation format, phrase bracketing, annotation tags, and phrase marks in order to remove ambiguities and mistakes. In addition, an evaluation tool was built to check the consistency of the annotation output. The detected annotation errors were sent back to the annotators for discussion and correction. Any further problems that arose were submitted for group discussion, and minor modifications were made to the specification.
In Stage 4, each annotator was given one set of 50k-word material at a time. In each distribution, 15k words of each set were assigned in duplicate to more than two annotators, so that any three annotators shared 5k words of duplicated material. When the annotators finished the first-pass annotation, we used the post-annotation checking tool to check the format and remove obvious annotation errors such as wrong tags and crossing brackets. However, differences in annotation caused by different interpretations of a sentence were quite difficult to check. We therefore compared the annotations of the duplicated material for consistency. When ambiguities or differences were identified, discussions were held and the result used by the majority was chosen as the accepted result. The re-annotated results were regarded as the Golden Standard for evaluating the accuracy of annotation and the consistency between annotators. The annotators were required to study this Golden Standard and go back to remove similar mistakes from their own data. Only then was the annotated 50k-word data accepted. A new set of 50k-word material was then distributed and processed in the same way. During this stage, ambiguous and out-of-tag-set phrase structures were marked as OT for further processing. The annotation specification was not modified, in order to avoid frequent revisits to already annotated data. About 4 months were spent on this stage.
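The majority decision on duplicated material described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each annotator's output for a sentence is available as a set of (start, end, label) spans over token positions; the function names are hypothetical and this is not the project's actual tool. In the project the final decision followed discussion, so a mechanical vote like this would only be a starting point.

```python
from collections import Counter


def majority_spans(annotations):
    """Return the labeled spans chosen by a strict majority of annotators.

    annotations: list of sets of (start, end, label) tuples,
    one set per annotator (an assumed representation).
    """
    votes = Counter(span for spans in annotations for span in set(spans))
    needed = len(annotations) // 2 + 1  # strict majority
    return {span for span, count in votes.items() if count >= needed}


def deviations(annotations):
    """For each annotator, list the spans that differ from the majority result."""
    accepted = majority_spans(annotations)
    # Symmetric difference: spans the annotator added plus spans they missed.
    return [set(spans) ^ accepted for spans in annotations]


if __name__ == "__main__":
    a = {(0, 2, "NP"), (2, 5, "VP")}
    b = {(0, 2, "NP"), (2, 5, "VP")}
    c = {(0, 2, "NP"), (3, 5, "VP")}
    print(sorted(majority_spans([a, b, c])))
    print(deviations([a, b, c]))
```

The per-annotator deviations are the kind of item that could be sent back for discussion and correction.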
In Stage 5, all the members and annotators come together to discuss the OT cases. Some typical new phrase structures and function types are appended to the specification, which establishes the final formal annotation specification. Using this final specification, the annotators go back to check their output, correct mistakes, and replace the OT tags with the agreed tags. The project is currently in Stage 5, with 2 months of work finished. A further 2 months are expected to be needed to complete this work.
Since it is impossible to do all the checking and analysis manually, a series of checking and evaluation tools has been built. One tool checks the consistency between the text corpus files and the annotated XML files, including the XML format, the filled-in XML header, and whether the original txt material has been altered by accident. This program ensures that the XML header information is correctly filled in and that no additional mistakes are introduced by typing errors during annotation.
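A check of this kind might look like the following sketch. The XML layout is not given in this section, so the element names (`header`, `body`) and the required header fields (`title`, `source`) are assumptions made only for illustration, as is the rule that the text should match the original once whitespace is ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical header fields; the real header layout is not described here.
REQUIRED_HEADER_FIELDS = ("title", "source")


def strip_whitespace(text):
    """Remove all whitespace so layout changes do not count as alterations."""
    return "".join(text.split())


def check_annotated_xml(xml_text, original_text):
    """Return a list of problems found in one annotated file (empty if none)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"malformed XML: {err}"]

    problems = []
    header = root.find("header")
    if header is None:
        problems.append("missing header")
    else:
        for field in REQUIRED_HEADER_FIELDS:
            node = header.find(field)
            if node is None or not (node.text or "").strip():
                problems.append(f"empty header field: {field}")

    # Concatenate all text under the (assumed) body element and compare it
    # with the original raw text to detect accidental edits.
    body = root.find("body")
    body_text = "".join(body.itertext()) if body is not None else ""
    if strip_whitespace(body_text) != strip_whitespace(original_text):
        problems.append("text differs from original")
    return problems
```

Calling `check_annotated_xml` on each XML file together with its source text gives a list that is empty when the file passes all three checks.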
Furthermore, we have developed and trained a shallow parser using the Golden Standard data. This shallow parser is run on the original text data, and its output is compared with the manually annotated result for verification, to further remove errors.
We developed several analysis tools to evaluate the accuracy and consistency of the whole annotated corpus. First, we check whether all the bracketing formats are correct. For exactly matched bracketed phrases, we then check whether the same phrase labels are given. Abnormal cases are manually checked and confirmed. Our final goal is for the bracketing to reach 99% accuracy and consistency. More consistency-checking tools are under development.
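The two-step comparison (bracket format first, then labels on exactly matched brackets) can be sketched as follows. The bracket notation `[LABEL token ...]` and the function names are assumptions for illustration; the actual treebank format and tools may differ.

```python
def parse_brackets(text):
    """Parse '[NP [AP big] dog] [VP runs]' into {(start, end): label}.

    Spans are token offsets. Raises ValueError if brackets are unbalanced.
    Simplification: two phrases over the identical span keep only one label.
    """
    spans = {}
    stack = []  # (label, start token index) of currently open phrases
    pos = 0     # number of word tokens seen so far
    for tok in text.replace("[", " [").replace("]", " ] ").split():
        if tok.startswith("["):
            stack.append((tok[1:], pos))
        elif tok == "]":
            if not stack:
                raise ValueError("unmatched closing bracket")
            label, start = stack.pop()
            spans[(start, pos)] = label
        else:
            pos += 1
    if stack:
        raise ValueError("unclosed bracket")
    return spans


def compare(reference, other):
    """Compare two bracketings of the same sentence."""
    ref, oth = parse_brackets(reference), parse_brackets(other)
    matched = ref.keys() & oth.keys()  # exactly matched brackets
    return {
        "bracket_precision": len(matched) / len(oth) if oth else 1.0,
        "bracket_recall": len(matched) / len(ref) if ref else 1.0,
        # Matched brackets whose phrase labels disagree: candidates for manual checking.
        "label_mismatches": sorted(s for s in matched if ref[s] != oth[s]),
    }
```

Here `reference` could be either the Golden Standard or another annotator's output, depending on whether accuracy or inter-annotator consistency is being measured.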