WP8 defines a typical user who will perform typical use cases on the PANACEA web service platform.
Typical use cases and operations that PANACEA web services will cover include the following:
Corpus tasks
• Build a corpus by web crawling
• Process a corpus with different services: sentence-segment, tokenize / lemmatize / tag it
• Align two parallel texts: on document level, on paragraph level, on sentence level
Dictionary tasks
• Input a corpus for dictionary extraction (general purpose or domain specific)
• Submit a corpus for dictionary gap identification
• Acquire corpora for new / unknown words
• Enlarge a dictionary by merging in corpus-extracted information on entry level, on transfer level and on annotation level (additional translations)
• Trace word occurrences over time (‘word of the day’)
Extraction tasks
• Send a corpus to extract information items (named entities, or just key terms)
• Build an “Alerting System” (do texts match the alerting profile?) by inserting a service that detects dictionary gaps
• Construct a workflow for “Topic Assignment” by using services for keyword extraction and training a classifier with pre-annotated data
Translation tasks
• Use a crawling system to collect / add corpus data for SMT creation
• Send a corpus to create a Language Model for a specific language and / or a specific domain
• Send a parallel or aligned corpus to create a Translation Model (new language direction, new specific domain)
• Create / Adapt an (R)MT dictionary [with translations, with linguistic annotations (monolingual, transfer)]
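
As an illustration of the corpus processing use case listed above (sentence segmentation followed by tokenization), the chaining of services into a workflow could be sketched roughly as follows. This is only a minimal local sketch: the functions `sentence_segment`, `tokenize` and `process_corpus` are hypothetical stand-ins invented for this example, not PANACEA services, and real platform services would be remote web services with their own interfaces.

```python
import re
from typing import List


# Hypothetical stand-in for a sentence segmentation service (assumption,
# not a PANACEA API): split after '.', '!' or '?' followed by whitespace.
def sentence_segment(text: str) -> List[str]:
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


# Hypothetical stand-in for a tokenization service (assumption):
# words and single punctuation marks become separate tokens.
def tokenize(sentence: str) -> List[str]:
    return re.findall(r"\w+|[^\w\s]", sentence)


# Chain the two steps: documents -> sentences -> tokens.
def process_corpus(documents: List[str]) -> List[List[List[str]]]:
    return [[tokenize(s) for s in sentence_segment(doc)] for doc in documents]


corpus = ["PANACEA builds corpora. It also aligns texts!"]
processed = process_corpus(corpus)
```

Here each step consumes the output of the previous one, which is the property a workflow of chained services would need; lemmatization or tagging steps could be appended in the same way.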
