<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>In the LR business &#187; Natural Language Processing</title>
	<atom:link href="http://blog.panacea-lr.eu/category/natural-language-processing/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.panacea-lr.eu</link>
	<description>a PANACEA blog</description>
	<lastBuildDate>Mon, 15 Nov 2010 09:47:50 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>PANACEA will participate in META-Forum 2010</title>
		<link>http://blog.panacea-lr.eu/2010/10/28/panacea-will-participate-in-meta-forum-2010/</link>
		<comments>http://blog.panacea-lr.eu/2010/10/28/panacea-will-participate-in-meta-forum-2010/#comments</comments>
		<pubDate>Thu, 28 Oct 2010 16:21:12 +0000</pubDate>
		<dc:creator>Núria Bel</dc:creator>
				<category><![CDATA[Conferences]]></category>
		<category><![CDATA[Language Resources]]></category>
		<category><![CDATA[Machine Translation]]></category>
		<category><![CDATA[Natural Language Processing]]></category>

		<guid isPermaLink="false">http://blog.panacea-lr.eu/?p=35</guid>
		<description><![CDATA[PANACEA will soon be participating in META-Forum 2010. This event, which will take place in Brussels on November 17-18, 2010, is the first edition of the annual conference series organized by META-NET (http://www.meta-net.eu).
]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.panacea-lr.eu">PANACEA</a> will soon be participating in META-Forum 2010. This event, which will take place in Brussels on November 17-18, 2010, is the first edition of the annual conference series organized by META-NET (<a href="http://www.meta-net.eu/">http://www.meta-net.eu</a>).</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.panacea-lr.eu/2010/10/28/panacea-will-participate-in-meta-forum-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WP8 defines requirements for PANACEA platform</title>
		<link>http://blog.panacea-lr.eu/2010/10/28/wp8-defines-requirements-for-panacea-platform/</link>
		<comments>http://blog.panacea-lr.eu/2010/10/28/wp8-defines-requirements-for-panacea-platform/#comments</comments>
		<pubDate>Thu, 28 Oct 2010 16:06:43 +0000</pubDate>
		<dc:creator>Núria Bel</dc:creator>
				<category><![CDATA[Evaluation]]></category>
		<category><![CDATA[Language Resources]]></category>
		<category><![CDATA[Machine Translation]]></category>
		<category><![CDATA[Natural Language Processing]]></category>

		<guid isPermaLink="false">http://blog.panacea-lr.eu/?p=28</guid>
		<description><![CDATA[Detailed list of the requirements necessary to successfully run the PANACEA platform.  The functional requirements, registry requirements and operational requirements are all listed and elaborated upon. ]]></description>
			<content:encoded><![CDATA[<p>Some requirements for the PANACEA platform are listed below; for a more complete list, consult the <a href="http://panacea-lr.eu/deliverables/PANACEA_D8_1.pdf"> <span style="color: #2051fc;">first deliverable</span></a> from WP8, <strong>Evaluation in industrial environment</strong>.</p>
<p>If you think any requirements are missing, please let us know by e-mailing the details to: info@panacea-lr.eu</p>
<p><strong>1.1.	Functional Requirements</strong><br />
Req-FCT-001: Inspect available services<br />
Req-FCT-002: Run a service<br />
Req-FCT-004: Inspect input/output data<br />
Req-FCT-008: Configure services into workflows<br />
Req-FCT-009: Run workflows<br />
<strong>1.2.	Registry Requirements</strong><br />
Req-FCT-123: Announce a web service<br />
Req-FCT-124: List web services<br />
Req-FCT-125: Search web services<br />
Req-FCT-126: Documentation and annotation of web services<br />
<strong>1.3.	Operational Requirements</strong><br />
Req-FCT-302: Speed / Waiting times<br />
Req-FCT-303: Scalability<br />
Req-FCT-305: Error Handling<br />
Req-FCT-306: Validity Checks</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.panacea-lr.eu/2010/10/28/wp8-defines-requirements-for-panacea-platform/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PANACEA defines its typical use and user</title>
		<link>http://blog.panacea-lr.eu/2010/10/28/panacea-defines-its-typical-use-and-user/</link>
		<comments>http://blog.panacea-lr.eu/2010/10/28/panacea-defines-its-typical-use-and-user/#comments</comments>
		<pubDate>Thu, 28 Oct 2010 15:54:56 +0000</pubDate>
		<dc:creator>Núria Bel</dc:creator>
				<category><![CDATA[Language Resources]]></category>
		<category><![CDATA[Machine Translation]]></category>
		<category><![CDATA[Natural Language Processing]]></category>
		<category><![CDATA[Use Cases]]></category>
		<category><![CDATA[Users]]></category>

		<guid isPermaLink="false">http://blog.panacea-lr.eu/?p=17</guid>
		<description><![CDATA[The use cases of the factory that PANACEA will offer are mainly directed at developers of NLP-based applications. WP8 defined and explained the specific tasks, and the activities they comprise, in order to fix criteria for the final evaluation.]]></description>
			<content:encoded><![CDATA[<p><strong>WP8 defines the typical user who will perform typical use cases on the PANACEA web service platform.</strong></p>
<p>Typical use cases and operations that PANACEA web services will cover include the following:</p>
<p><strong>Corpus Tasks</strong></p>
<p>•        Build a corpus by web crawling<br />
•        Process a corpus by different services: sentence-segment it, tokenize / lemmatize / tag it<br />
•        Align two parallel texts: on document level, on paragraph level, on sentence level</p>
<p><strong>Dictionary tasks</strong></p>
<p>•        Input a corpus for dictionary extraction (general purpose or domain specific)<br />
•        Submit a corpus for dictionary gap identification<br />
•        Acquire corpora for new / unknown words<br />
•        Enlarge a dictionary by merging corpus-extracted information (on entry level), on transfer level and annotation level (additional translations)<br />
•        Trace word occurrences over time (‘word of the day’)</p>
<p><strong>Extraction tasks</strong></p>
<p>•        Send a corpus to extract information items (named entities, or just key terms)<br />
•        Build an “Alerting System” (do texts match the alerting profile?) by inserting a dictionary gap detection service<br />
•        Construct a workflow for “Topic Assignment” by using services for keyword extraction and training a classifier with pre-annotated data.</p>
<p><strong>Translation Tasks</strong></p>
<p>•        Use a crawling system to collect / add corpus data for SMT creation<br />
•        Send a corpus to create a Language Model for a specific language and / or a specific domain<br />
•        Send a parallel or aligned corpus to create your Translation Model (new language direction, new specific domain)<br />
•        Create / Adapt an (R)MT dictionary [with translations, with linguistic annotations (monolingual, transfer)]</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.panacea-lr.eu/2010/10/28/panacea-defines-its-typical-use-and-user/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Warming up</title>
		<link>http://blog.panacea-lr.eu/2010/05/14/warming-up/</link>
		<comments>http://blog.panacea-lr.eu/2010/05/14/warming-up/#comments</comments>
		<pubDate>Fri, 14 May 2010 16:10:08 +0000</pubDate>
		<dc:creator>Núria Bel</dc:creator>
				<category><![CDATA[Language Resources]]></category>
		<category><![CDATA[Machine Translation]]></category>
		<category><![CDATA[Natural Language Processing]]></category>

		<guid isPermaLink="false">http://www.panacea-lr.eu/panaceahltblog/?p=4</guid>
		<description><![CDATA[There is a shortage of data for a full deployment of Language  Resources and Technologies (LR’s). On the one hand, rule-based methods  on which, for instance, most of the commercial MT systems are based,  have not been able to cover all languages and all domains. On the other  hand, statistically or [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.panacea-lr.eu/wp-content/uploads/2010/05/210820090833.jpg"><img class="size-thumbnail wp-image-11 alignleft" title="21082009083" src="http://blog.panacea-lr.eu/wp-content/uploads/2010/05/210820090833-150x150.jpg" alt="" /></a>There is a shortage of data for a full deployment of Language Resources and Technologies (LRs). On the one hand, rule-based methods, on which, for instance, most commercial MT systems are based, have not been able to cover all languages and all domains. On the other hand, statistical or ML-based methods, which need Language Resources data for inducing information, also face a shortage of ready-to-use material for all languages and domains.<br />
In addition to the problems in achieving full coverage, the use of existing data is hindered by several factors:</p>
<p>1. Little understanding of the need for standards for the representation of data, which makes it difficult to use several sources and, crucially, to evaluate the quality and particular value of a resource for a given application. Although several standards have been proposed for the representation of LR data, they are considered too research-oriented, undocumented, too abstract and too cumbersome to implement, as the return on investment is not obvious. Industry players have preferred to implement standards that are driven too much by specific needs and lack a long-term vision.</p>
<p>2. Insufficient documentation of existing resources, which, in addition, are not regularly maintained (updates, bug reporting and coverage enlargement) because they are normally the results of finite projects. A common, again standard, way of representing and documenting linguistic information should be devised.</p>
<p>3. The LR market for most written resources is rather small and the legal framework is too complex.</p>
<p>These facts seem to hint at the need for changes in the behavioral patterns and culture of LR-data consumers and providers. Given the breadth of the current landscape of LRs, are the changes needed along the following lines? We should start discussing this.</p>
<p>- The LR-data market has to be rethought and remodelled by introducing collaborative strategies that overcome the current model, which is based mostly on purely competitive terms (leading to non-adherence to standards, to duplicated work and effort, etc.). The supply of language resources is conditioned by traditional business behavioural patterns and a culture that still overprotect products from competition.</p>
<p>- Not only academia but also the LR industry has to undergo a cultural change and recognize the high added value of participating in the creation of common pools of LR-data. Such a change requires all the stakeholders to move in unison, the creation of rules and guidelines for new forms of cooperation, and the strengthening of a culture of mutual respect and fairness. Actions towards fostering such a change are a first priority for the field.</p>
<p>- The coverage problem is of such a nature and magnitude that strategies approaching or envisaging the full automation of LR-data production have to be promoted, alongside campaigns for fostering evaluation in real-life applications. Thus, research can progressively approach the characteristics of the materials needed by industry, in terms of the size and granularity of the information contained.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.panacea-lr.eu/2010/05/14/warming-up/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

