Contents
List of Figures
Introduction
Scope of the Bilingual Lexicon Extraction Project
Why Corpus Linguistics in Bilingual Lexicon Creation?
An Overview of this Thesis
Methods of Bilingual Lexicon Extraction
Introduction to Automatic Lexicon Extraction
Translating Collocations - Champollion
Using Word Alignments - Termight
The Termight System
Char_align
Acquisition of Bilingual Terminology
The K-vec Method
Automatic Lexicon Evaluation
Other Approaches and Techniques
BICORD
DK-vec
Bitext Mapping - SIMR
Other Methods
Lexicon Architecture
Norms, Guidelines and Recommendations
MULTILEX
GENELEX
COMLEX
MULTRA Lexicon Architecture
The Scania Project
Corpus Characteristics
Converting to SGML
Sentence Alignment
Language Restriction
Project Outlook
Lexicon Extraction from the Scania Corpus
Tokenization
Cleaning the Database
Creating a Basic Dictionary File
Searching the Dictionary
Looking for Similarities
Four Simple Similarity Measures
Combining Similarity Scores
Length and Word Order
Quality of Similarity Scores
Evaluation of Dictionary Search Results
Cascading the Process
Counting Frequencies
Word Frequencies
Word Pair Frequencies
Finding Correlations
Low-Frequency Words
Looking for Compounds
Using Part-of-Speech Tags
Evaluating the Dictionaries
Lemmatizing
Multiple Translations
Why Consider Multiple Translations?
Filtering with the Dice Coefficient
Length Filter
Similarity Filter
Combined and Other Techniques
Function Words
Experimental Results
Cleaning the Database
Dictionary Search Results
Using Similarity Measures
Evaluation of Low-Frequency Tokens
Evaluation of the Dice Coefficient
Other Results
Summary and Conclusion
Summary of the Work
Concluding Remarks
Future Work
Symbols and Terminology
Sample of a Cleaning Process
Scania Sentence Alignments
Cleaned Alignments
Modifications and Removed Data
Theses
Bibliography
Jörg Tiedemann
2000-09-07