<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments for In the LR business</title>
	<atom:link href="http://blog.panacea-lr.eu/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.panacea-lr.eu</link>
	<description>a PANACEA blog</description>
	<lastBuildDate>Mon, 07 Jun 2010 15:33:24 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>Comment on Warming up by Luca Dini</title>
		<link>http://blog.panacea-lr.eu/2010/05/14/warming-up/comment-page-1/#comment-5</link>
		<dc:creator>Luca Dini</dc:creator>
		<pubDate>Mon, 07 Jun 2010 15:33:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.panacea-lr.eu/panaceahltblog/?p=4#comment-5</guid>
		<description>Just a short comment relating to standards and the &quot;warm up&quot; sentence:
&quot;The market of LR-data has to be rethought/remodelled by introducing collaborative strategies that overcome the current model based mostly on purely competitive terms (leading to the non-adherence to standards, to the repetition of work and efforts, etc).&quot;

I have the impression that competition is still a crucial issue, and that, in a sense, a lack of real competition is obstructing the rise of standards. There is a tendency to treat standards a little bit like ISO-9000 certification: if a big customer or several customers are asking for it, it might become worthwhile (but only if my direct competitors are already qualified); otherwise, why should I? In my view competition (real and healthy competition) would enforce standards, provided the following guidelines are kept:
1) Availability of tools/utilities based on standards. This is what Nuria hinted at, and UTF-8 is a good example: I would be surprised if anyone in the language industry were now considering a different format. But: can we proceed a little bit further than character encoding? :-)
2) Availability of industry-oriented benchmarks or evaluation suites: these would allow the quality of an industrial resource to be assessed objectively, and would at the same time foster competitiveness and push towards the adoption of standards. Unfortunately, these kinds of benchmarks would need to be more industry-oriented than current ones. Just to give an example: you cannot assume that, in order to evaluate dependency parsing, you start from fully disambiguated input (as happens in most cases), as this is not a real-world situation, although it is a much more interesting one for a scientific evaluation. But of course this is again a chicken-and-egg problem: they should be produced mostly by academia in order to stay vendor-neutral, but from an academic point of view they are less interesting than more in-lab evaluations.

3) I think that customer maturity is crucial in the enforcement of standards. The fact is that in many cases NLP applications are more or less standalone, or obscurely integrated into solutions. So on average customers do not ask the vendor to comply with specific standards, simply because they do not see an advantage in this. In this respect it would be interesting to work on compliance from the point of view of consumer applications. Just to give a short example: Lucene is one of the more widely used search engines, and SOLR is becoming a standard for faceted search (aka &quot;semantic search&quot; :-) ). They (i.e. the Apache group) would be in a position to enforce standards, but they need to be convinced.</description>
		<content:encoded><![CDATA[<p>Just a short comment relating to standards and the &#8220;warm up&#8221; sentence:<br />
&#8220;The market of LR-data has to be rethought/remodelled by introducing collaborative strategies that overcome the current model based mostly on purely competitive terms (leading to the non-adherence to standards, to the repetition of work and efforts, etc).&#8221;</p>
<p>I have the impression that competition is still a crucial issue, and that, in a sense, a lack of real competition is obstructing the rise of standards. There is a tendency to treat standards a little bit like ISO-9000 certification: if a big customer or several customers are asking for it, it might become worthwhile (but only if my direct competitors are already qualified); otherwise, why should I? In my view competition (real and healthy competition) would enforce standards, provided the following guidelines are kept:<br />
1) Availability of tools/utilities based on standards. This is what Nuria hinted at, and UTF-8 is a good example: I would be surprised if anyone in the language industry were now considering a different format. But: can we proceed a little bit further than character encoding? <img src='http://blog.panacea-lr.eu/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /><br />
2) Availability of industry-oriented benchmarks or evaluation suites: these would allow the quality of an industrial resource to be assessed objectively, and would at the same time foster competitiveness and push towards the adoption of standards. Unfortunately, these kinds of benchmarks would need to be more industry-oriented than current ones. Just to give an example: you cannot assume that, in order to evaluate dependency parsing, you start from fully disambiguated input (as happens in most cases), as this is not a real-world situation, although it is a much more interesting one for a scientific evaluation. But of course this is again a chicken-and-egg problem: they should be produced mostly by academia in order to stay vendor-neutral, but from an academic point of view they are less interesting than more in-lab evaluations.</p>
<p>3) I think that customer maturity is crucial in the enforcement of standards. The fact is that in many cases NLP applications are more or less standalone, or obscurely integrated into solutions. So on average customers do not ask the vendor to comply with specific standards, simply because they do not see an advantage in this. In this respect it would be interesting to work on compliance from the point of view of consumer applications. Just to give a short example: Lucene is one of the more widely used search engines, and SOLR is becoming a standard for faceted search (aka &#8220;semantic search&#8221; <img src='http://blog.panacea-lr.eu/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> ). They (i.e. the Apache group) would be in a position to enforce standards, but they need to be convinced.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Warming up by Nuria</title>
		<link>http://blog.panacea-lr.eu/2010/05/14/warming-up/comment-page-1/#comment-4</link>
		<dc:creator>Nuria</dc:creator>
		<pubDate>Tue, 01 Jun 2010 16:37:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.panacea-lr.eu/panaceahltblog/?p=4#comment-4</guid>
		<description>Yes, this is a circle unless a clear benefit motivates the change. So, which of these two possible benefits would be more convincing to you as a reason for devoting time/resources to mapping your resources to one standard?

A	the possibility of using services that save time/resources, e.g. cleaning a corpus of duplicates or deriving a list of out-of-vocabulary or missing words, which would motivate the conversion of a corpus into UTF-8 or into an XCES format.
B	the possibility of joining a pool of resources (TAUS-style) where you contribute your own resources and in return have the right to use the resources of others, covering other languages and/or a variety of domains.</description>
		<content:encoded><![CDATA[<p>Yes, this is a circle unless a clear benefit motivates the change. So, which of these two possible benefits would be more convincing to you as a reason for devoting time/resources to mapping your resources to one standard?</p>
<p>A	the possibility of using services that save time/resources, e.g. cleaning a corpus of duplicates or deriving a list of out-of-vocabulary or missing words, which would motivate the conversion of a corpus into UTF-8 or into an XCES format.<br />
B	the possibility of joining a pool of resources (TAUS-style) where you contribute your own resources and in return have the right to use the resources of others, covering other languages and/or a variety of domains.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Warming up by Juan Alberto Alonso</title>
		<link>http://blog.panacea-lr.eu/2010/05/14/warming-up/comment-page-1/#comment-3</link>
		<dc:creator>Juan Alberto Alonso</dc:creator>
		<pubDate>Tue, 01 Jun 2010 14:52:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.panacea-lr.eu/panaceahltblog/?p=4#comment-3</guid>
		<description>I just wanted to comment briefly on the adoption and use of one specific LR standard by the industry. In my opinion this is a kind of vicious circle: on the one hand, no commercial company would ever invest the huge effort involved in exporting all its LRs (purely commercial factors aside) unless it is dead sure that this standard is going to be THE standard used by everybody. On the other hand, experience and evidence show that potential standards (and even &quot;unexpected&quot; standards) only reach the category of actual &quot;universal&quot; standards by the sheer fact that companies/organisations/universities/etc. DO USE them. So, unless it is THE standard we won&#039;t use it, and unless we DO USE it, it won&#039;t be THE standard.
The question (well, one of them...) is how we can break out of this circle.</description>
		<content:encoded><![CDATA[<p>I just wanted to comment briefly on the adoption and use of one specific LR standard by the industry. In my opinion this is a kind of vicious circle: on the one hand, no commercial company would ever invest the huge effort involved in exporting all its LRs (purely commercial factors aside) unless it is dead sure that this standard is going to be THE standard used by everybody. On the other hand, experience and evidence show that potential standards (and even &#8220;unexpected&#8221; standards) only reach the category of actual &#8220;universal&#8221; standards by the sheer fact that companies/organisations/universities/etc. DO USE them. So, unless it is THE standard we won&#8217;t use it, and unless we DO USE it, it won&#8217;t be THE standard.<br />
The question (well, one of them&#8230;) is how we can break out of this circle.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Warming up by Nuria</title>
		<link>http://blog.panacea-lr.eu/2010/05/14/warming-up/comment-page-1/#comment-2</link>
		<dc:creator>Nuria</dc:creator>
		<pubDate>Tue, 01 Jun 2010 09:50:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.panacea-lr.eu/panaceahltblog/?p=4#comment-2</guid>
		<description>You can add comments by clicking the Comment field.</description>
		<content:encoded><![CDATA[<p>You can add comments by clicking the Comment field.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

