Status
Fri Jun 11 13:05:25 2004
joerg@stp.ling.uu.se
 

The Uplug home page


Uplug is a collection of tools for linguistic corpus processing, word alignment and term extraction from parallel corpora. It includes two main components:

  • Corpus Manager - Monolingual and bilingual corpora can be added to your personal repository. The corpus manager includes tools for updating the repository and inspecting corpus data in your collection.
  • Task Manager - The task manager lets you run applications on registered corpora. Several integrated tools can be used to process monolingual and bilingual corpora. Jobs are queued on the local system; results are sent by mail and added to your personal data collection.
Several tools have been integrated into Uplug. Pre-processing tools include a sentence splitter, a tokenizer, and external part-of-speech taggers and shallow parsers. The following external tools are used: the TreeTagger for English, French, Italian and German; the TnT tagger for English, German and Swedish; the Grok system for English (tagging and chunking); and the morphological analyzer ChaSen for Japanese. Translated documents can be sentence-aligned using the length-based approach of Gale & Church. Words and phrases can be aligned using the clue alignment approach and GIZA++, a toolbox for statistical machine translation.
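
To illustrate what "length-based" sentence alignment means, here is a minimal Python sketch in the spirit of Gale & Church. It is not Uplug's own code: the function names (`length_cost`, `align`) are hypothetical, and the parameter values (`C`, `S2`, `PRIORS`) are the figures commonly quoted from the 1993 paper and should be treated as assumptions. Sentences are compared only by their character lengths, and dynamic programming picks the cheapest sequence of 1-1, 1-0, 0-1, 2-1, 1-2 and 2-2 alignment beads.

```python
import math

# Assumed parameters (commonly quoted from Gale & Church 1993):
C = 1.0    # expected target characters per source character
S2 = 6.8   # variance of the length difference per character

# Assumed prior probability of each bead type (source count, target count).
PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099, (0, 1): 0.0099,
    (2, 1): 0.089, (1, 2): 0.089,
    (2, 2): 0.011,
}


def length_cost(len_src, len_tgt):
    """Negative log probability of the observed length difference."""
    if len_src == 0 and len_tgt == 0:
        return 0.0
    mean = (len_src + len_tgt / C) / 2
    delta = (len_tgt - len_src * C) / math.sqrt(mean * S2)
    # Two-tailed probability of |delta| under a standard normal distribution.
    p = math.erfc(abs(delta) / math.sqrt(2))
    return -math.log(max(p, 1e-12))


def align(src_lens, tgt_lens):
    """Return a list of beads (source indices, target indices)."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for (di, dj), prior in PRIORS.items():
                if i < di or j < dj or cost[i - di][j - dj] == inf:
                    continue
                c = (cost[i - di][j - dj]
                     + length_cost(sum(src_lens[i - di:i]),
                                   sum(tgt_lens[j - dj:j]))
                     - math.log(prior))
                if c < cost[i][j]:
                    cost[i][j] = c
                    back[i][j] = (di, dj)
    # Walk the back-pointers from the end to recover the bead sequence.
    beads, i, j = [], n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads


# Usage: character lengths of three source and three target sentences.
print(align([20, 35, 50], [22, 33, 52]))
```

The sketch takes sentence lengths rather than text, so it can be fed the output of any sentence splitter. A real aligner would also handle paragraph boundaries and language-specific length ratios.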

Publications

Tiedemann, J. 2003,
Recycling Translations - Extraction of Lexical Data from Parallel Corpora and their Application in Natural Language Processing,
Doctoral Thesis, Studia Linguistica Upsaliensia 1, ISSN 1652-1366, ISBN 91-554-5815-7
[pdf, 2.1MB] [html] [errata, pdf]
Tiedemann, J. 2003,
Combining Clues for Word Alignment. In Proceedings of the 10th Conference of the European Chapter of the ACL (EACL03) Budapest, Hungary, April 12-17, 2003
[pdf, 90 kB] [ps, 93 kB]
Ahrenberg, L., Merkel, M., Sågvall Hein, A., Tiedemann, J. 2000,
Evaluation of Word Alignment Systems. In Proceedings of LREC 2000, Athens/Greece.
[pdf, 406kB] [ps, 757kB] [gzipped ps, 236kB]