PANACEA defines its typical use and user

WP8 has defined the typical user of the PANACEA web service platform and the typical use cases that user will perform.

Typical use cases and operations that PANACEA web services will cover include the following:

Corpus tasks

• Build a corpus by web crawling
• Process a corpus with different services: segment it into sentences, then tokenize, lemmatize, or tag it
• Align two parallel texts at document, paragraph, or sentence level
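As an illustration of the processing chain above, here is a minimal sketch that segments a text into sentences and then tokenizes each sentence. The functions `segment_sentences`, `tokenize`, and `process_corpus` are hypothetical local stand-ins written for this example; they are not the PANACEA web service interfaces, and the regular-expression rules are a deliberate simplification.

```python
import re
from typing import List


# Hypothetical local stand-ins for remote services; not the PANACEA API.
def segment_sentences(text: str) -> List[str]:
    """Split text at whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> List[str]:
    """Split a sentence into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def process_corpus(text: str) -> List[List[str]]:
    """Chain the two steps: sentence segmentation, then tokenization."""
    return [tokenize(s) for s in segment_sentences(text)]


if __name__ == "__main__":
    for tokens in process_corpus("Hello world. How are you?"):
        print(tokens)
```

In a real deployment each step would presumably be a call to a separate service, with the output of one step passed as the input of the next.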

Dictionary tasks

• Input a corpus for dictionary extraction (general purpose or domain specific)
• Submit a corpus for dictionary gap identification
• Acquire corpora for new / unknown words
• Enlarge a dictionary by merging in corpus-extracted information at the entry level, the transfer level, and the annotation level (additional translations)
• Trace word occurrences over time (‘word of the day’)
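Dictionary gap identification can be pictured as follows: collect the words of a corpus and report those that have no dictionary entry. This is only a sketch under simple assumptions (case-insensitive matching on surface forms, no lemmatization); the function name `find_dictionary_gaps` and its `min_count` parameter are invented for this example and do not describe an actual PANACEA service.

```python
import re
from collections import Counter
from typing import Iterable, List, Set, Tuple


def find_dictionary_gaps(
    corpus: Iterable[str],
    dictionary: Set[str],
    min_count: int = 1,
) -> List[Tuple[str, int]]:
    """Return (word, frequency) pairs for corpus words missing from the dictionary.

    Assumption: matching is done on lowercased surface forms, not lemmas.
    """
    known = {entry.lower() for entry in dictionary}
    counts: Counter = Counter()
    for text in corpus:
        # Alphabetic tokens only; digits and underscores are skipped.
        for token in re.findall(r"[^\W\d_]+", text.lower()):
            if token not in known:
                counts[token] += 1
    gaps = [(word, n) for word, n in counts.items() if n >= min_count]
    # Most frequent gaps first, ties broken alphabetically.
    return sorted(gaps, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    corpus = ["The crawler fetched pages.", "The crawler stopped."]
    print(find_dictionary_gaps(corpus, {"the", "pages", "stopped"}))
```

Raising `min_count` filters out rare words, which may help separate genuine vocabulary gaps from typos.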

Extraction tasks

• Send a corpus to extract information items (named entities, or just key terms)
• Build an “Alerting System” (do texts match the alerting profile?) by inserting a dictionary gap detection service into the workflow
• Construct a workflow for “Topic Assignment” using services for keyword extraction and for training a classifier on pre-annotated data
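The question behind the alerting use case, whether a text matches the alerting profile, can be sketched in a few lines. The sketch assumes that a profile is simply a set of terms and that a text matches when enough of those terms occur in it; both the representation and the `min_hits` threshold are assumptions made for illustration, since the source does not say how profiles are defined.

```python
import re
from typing import Set


def matches_profile(text: str, profile_terms: Set[str], min_hits: int = 1) -> bool:
    """Return True if at least `min_hits` distinct profile terms occur in the text.

    Assumption: a profile is a plain set of single-word terms, matched
    case-insensitively against the words of the text.
    """
    tokens = set(re.findall(r"\w+", text.lower()))
    hits = tokens & {term.lower() for term in profile_terms}
    return len(hits) >= min_hits


if __name__ == "__main__":
    profile = {"flood", "storm"}
    print(matches_profile("Flood warning issued for the coast", profile))
    print(matches_profile("Sunny day", profile))
```

A production workflow would more likely combine several services (for example, term extraction followed by matching) rather than rely on raw word overlap.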

Translation tasks

• Use a crawling system to collect / add corpus data for SMT creation
• Send a corpus to create a language model for a specific language and/or a specific domain
• Send a parallel or aligned corpus to create a translation model (for a new language direction or a new specific domain)
• Create / Adapt an (R)MT dictionary [with translations, with linguistic annotations (monolingual, transfer)]
