<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-9058677219636510204</id><updated>2011-12-22T14:01:42.364-08:00</updated><category term='Solr 1.4'/><category term='Social network'/><category term='Inverted index'/><category term='Apache Solr'/><category term='Lucene'/><category term='Prague'/><category term='Facebook'/><category term='Apache Lucene'/><category term='Mac OS X Snow Leopard'/><category term='Google'/><category term='Apache Lucene EuroCon 2010'/><category term='Information retrieval'/><title type='text'>The Unreasonable Effectiveness of Data</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://unreasonableeffectivenessofdata.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9058677219636510204/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://unreasonableeffectivenessofdata.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Dušan Omerčević</name><uri>http://www.blogger.com/profile/17161430613388211010</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_AkG5HI5UE_I/STRD9N615JI/AAAAAAAAAHI/pu1lN_AesDY/S220/dusan.jpg'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>10</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-9058677219636510204.post-8260395513190376306</id><published>2011-11-06T13:32:00.000-08:00</published><updated>2011-11-06T13:32:21.693-08:00</updated><title type='text'>Comments are not code</title><content type='html'>&lt;div style="text-align: right;"&gt;&lt;span class="zemanta-img separator zemanta-action-dragged" style="clear: right;"&gt;&lt;a href="http://www.flickr.com/photos/99247795@N00/4299152140" style="clear: right; display: block; float: right; margin-left: 1em; margin-right: 1em;"&gt;&lt;img alt="Code review" height="200" src="http://farm5.static.flickr.com/4038/4299152140_9a860a1ac1_m.jpg" style="border: medium none; font-size: 0.8em;" width="188" /&gt;&lt;/a&gt;&lt;span class="zemanta-img-attribution" style="clear: both; float: right; margin-left: 1em; margin-right: 1em; width: 227px;"&gt;Image by &lt;a href="http://www.flickr.com/photos/99247795@N00/4299152140"&gt;Richard Masoner / Cyclelicious&lt;/a&gt; via Flickr&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;I'm a firm believer that the &lt;a href="http://wtfcode.net/post/186795315/good-code-is-its-own-best-documentation"&gt;best software documentation is the running code&lt;/a&gt;. If the code is well structured and written, it speaks for itself and it does not need any additional documentation. Comments are not code and therefore should not be used where better code organization would suffice.&lt;br /&gt;&lt;br /&gt;A misplaced use of comments that I often see while doing code reviews is to use comments to divide a method into logical subunits. 
For example:&lt;br /&gt;&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;def &lt;span class="nf"&gt;check_specific_candidate&lt;/span&gt;(candidate):&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp; &amp;nbsp; # &lt;span class="c"&gt;first check whether the candidate already has X&lt;/span&gt;&lt;/div&gt;&lt;span class="c"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;lt; 10 lines of code, return if true&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;span class="c" style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp; &amp;nbsp; # check whether the candidate is Y&lt;/span&gt;&lt;br /&gt;&lt;span class="c" style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;lt; 30 lines of code, return if true&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="c" style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt; &lt;/span&gt;&lt;span class="c"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span class="c"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span class="c"&gt;# candidate is not Y, check whether it is Z&lt;/span&gt;&lt;/div&gt;&lt;span class="c" style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;lt; another 30 lines of code, return if true&amp;gt;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="c" style="font-family: &amp;quot;Courier 
New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp; &amp;nbsp; # construct a list of elements in the candidate&lt;/span&gt;&lt;br /&gt;&lt;span class="c" style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt; another 30 lines of code&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="c" style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if len(list_of_elements) &amp;gt; 0:&lt;/span&gt;&lt;br /&gt;&lt;span class="c" style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # process list of elements for the candidate&lt;/span&gt;&lt;br /&gt;&lt;span class="c" style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt; another 10 lines of code&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="font-family: inherit;"&gt;&lt;span class="c"&gt;This example is based on an actual routine in the Zemanta &lt;a class="zem_slink" href="http://en.wikipedia.org/wiki/Codebase" rel="wikipedia" title="Codebase"&gt;code base&lt;/a&gt; that is altogether 140 lines long. Maintaining such code is not a pleasant experience. While the comments in this routine do help, they are actually a symptom of a larger problem: poor code organization.&lt;/span&gt;&lt;span class="c"&gt;&lt;/span&gt;&lt;span class="c"&gt; The comments would immediately become redundant if this routine were split into logical steps, with each step being a separate routine. 
Let's &lt;a class="zem_slink" href="http://en.wikipedia.org/wiki/Code_refactoring" rel="wikipedia" title="Code refactoring"&gt;refactor&lt;/a&gt; the above routine as such:&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;span class="c"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="c"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;def &lt;span class="nf"&gt;check_specific_candidate&lt;/span&gt;(candidate):&lt;/div&gt;&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp; &amp;nbsp; if _candidate_has_X(candidate):&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; return&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt; &lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if _candidate_is_Y(candidate):&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if _candidate_is_Z(candidate):&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier 
New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; list_of_elements = _get_list_of_elements(candidate)&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if len(list_of_elements) &amp;gt; 0:&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _process_list_of_elements(list_of_elements)&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: inherit;"&gt;So instead of comments, the routine is now documented by its method names. When you approach such code for the first time, seeing a nice 15-line routine is much less stressful than seeing a 140-line monster.&amp;nbsp;&lt;/div&gt;&lt;div class="zemanta-related"&gt;&lt;h6 class="zemanta-related-title" style="font-size: 1em; margin: 1em 0pt 0pt;"&gt;Related articles&lt;/h6&gt;&lt;ul class="zemanta-article-ul"&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://blog.summify.com/2011/09/21/code-reviews-a-framework-for-startups/"&gt;Code Reviews: A Framework for Startups&lt;/a&gt; (summify.com)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://nigelb.me/ubuntu/mozilla/2011/10/30/please-nitpick.html"&gt;Nigel Babu: Please nitpick&lt;/a&gt; (nigelb.me)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://www.bureau14.fr/blogea/index.php/2011/09/on-quality-and-code-size/"&gt;On quality and code size&lt;/a&gt; (bureau14.fr)&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div class="zemanta-pixie" style="height: 15px; margin-top: 10px;"&gt;&lt;a class="zemanta-pixie-a" href="http://www.zemanta.com/" title="Enhanced by Zemanta"&gt;&lt;img alt="Enhanced by Zemanta" class="zemanta-pixie-img" src="http://img.zemanta.com/zemified_e.png?x-id=1ca657a6-4168-4543-b101-fbfa7327ce46" 
style="border: medium none; float: right;" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9058677219636510204-8260395513190376306?l=unreasonableeffectivenessofdata.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://unreasonableeffectivenessofdata.blogspot.com/feeds/8260395513190376306/comments/default' title='Objavi komentarje'/><link rel='replies' type='text/html' href='http://unreasonableeffectivenessofdata.blogspot.com/2011/11/comments-are-not-code.html#comment-form' title='Št. komentarjev: 1'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9058677219636510204/posts/default/8260395513190376306'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9058677219636510204/posts/default/8260395513190376306'/><link rel='alternate' type='text/html' href='http://unreasonableeffectivenessofdata.blogspot.com/2011/11/comments-are-not-code.html' title='Comments are not code'/><author><name>Dušan Omerčević</name><uri>http://www.blogger.com/profile/17161430613388211010</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_AkG5HI5UE_I/STRD9N615JI/AAAAAAAAAHI/pu1lN_AesDY/S220/dusan.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://farm5.static.flickr.com/4038/4299152140_9a860a1ac1_t.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9058677219636510204.post-8348253677944831143</id><published>2011-08-22T14:41:00.000-07:00</published><updated>2011-08-22T14:41:23.657-07:00</updated><title type='text'>#sigir2011</title><content type='html'>&lt;div style="font-family: inherit;"&gt;&lt;span style="font-size: small;"&gt;It's more than a bit ironic that a premiere &lt;a 
href="http://www.sigir2011.org/"&gt;conference on information retrieval&lt;/a&gt; took place behind the &lt;a class="zem_slink" href="http://en.wikipedia.org/wiki/Golden_Shield_Project" rel="wikipedia" title="Golden Shield Project"&gt;Great Firewall&lt;/a&gt; and consequently without discussions on &lt;a class="zem_slink" href="http://twitter.com/" rel="homepage" title="Twitter"&gt;Twitter&lt;/a&gt;. But the Chinese have also become great scientists and are not just cheap labor anymore, so I guess they have well deserved to host this event.&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: inherit;"&gt;&lt;span style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="font-family: inherit;"&gt;     &lt;/div&gt;
&lt;div class="MsoNormal" style="font-family: inherit;"&gt;&lt;span style="font-size: small;"&gt;&lt;span lang="SL"&gt;The 34th SIGIR conference in &lt;a class="zem_slink" href="http://en.wikipedia.org/wiki/Beijing" rel="wikipedia" title="Beijing"&gt;Beijing&lt;/a&gt; was attended by more than 800 people from throughout the world (China 400, USA 250, Europe 100, ...). The acceptance rate for papers was only 20%, which makes this conference one of the more competitive ones. 
What came as a nice surprise this year is that the level of the presentations was substantially better than last year, with almost all speakers giving their talks in comprehensible English and with good rhetorical skills.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="font-family: inherit; margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;span style="font-size: small;"&gt;&lt;a href="http://1.bp.blogspot.com/-2lIWTaJEcGI/TlLBEvjNMUI/AAAAAAAAAgQ/Vzi1_cMr0DI/s1600/sigir_basic_facts.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="265" src="http://1.bp.blogspot.com/-2lIWTaJEcGI/TlLBEvjNMUI/AAAAAAAAAgQ/Vzi1_cMr0DI/s400/sigir_basic_facts.jpg" width="400" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;&lt;span style="font-size: small;"&gt;   &lt;span lang="SL"&gt;&lt;a href="http://en.wikipedia.org/wiki/W._Bruce_Croft"&gt;Bruce Croft&lt;/a&gt; (program chair) presenting basic facts about the conference&lt;/span&gt;&lt;/span&gt; &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;div class="MsoNormal" style="font-family: inherit;"&gt;&lt;span style="font-size: small;"&gt;&lt;span lang="SL"&gt;&lt;span style="font-family: inherit;"&gt;What makes the field of &lt;a class="zem_slink" href="http://en.wikipedia.org/wiki/Information_retrieval" rel="wikipedia" title="Information retrieval"&gt;IR&lt;/a&gt; different from other scientific fields is the influence of industry and its research labs. 
Almost 50% of the papers had at least one author from &lt;a href="http://www.microsoft.com/"&gt;Microsoft&lt;/a&gt;, &lt;a class="zem_slink" href="http://google.com/" rel="homepage" title="Google"&gt;Google&lt;/a&gt;, &lt;a href="http://www.yahoo.com/"&gt;Yahoo&lt;/a&gt;, &lt;a href="http://www.facebook.com/"&gt;Facebook&lt;/a&gt;, &lt;a class="zem_slink" href="http://www.yandex.ru/" rel="homepage" title="Yandex"&gt;Yandex&lt;/a&gt;, &lt;a class="zem_slink" href="http://www.baidu.com/" rel="homepage" title="Baidu"&gt;Baidu&lt;/a&gt; or some other company. Therefore, while SIGIR is a scientific conference, I got the feeling that it is very much oriented towards the real problems of the industry. If this assumption is correct, then we could perhaps deduce the problems of the industry by examining the share of papers in different areas.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="font-family: inherit; margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;span style="font-size: small;"&gt;&lt;a href="http://2.bp.blogspot.com/-fxyWpvirkkI/TlLBFuF61tI/AAAAAAAAAgU/5cdIq4ZKvxA/s1600/sigir_top_areas.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="266" src="http://2.bp.blogspot.com/-fxyWpvirkkI/TlLBFuF61tI/AAAAAAAAAgU/5cdIq4ZKvxA/s400/sigir_top_areas.jpg" width="400" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;&lt;style&gt;&lt;!-- /* Font Definitions */@font-face	{font-family:"ＭＳ 明朝";	panose-1:0 0 0 0 0 0 0 0 0 0;	mso-font-charset:128;	mso-generic-font-family:roman;	mso-font-format:other;	mso-font-pitch:fixed;	mso-font-signature:1 134676480 16 0 131072 0;}@font-face	{font-family:"Cambria Math";	panose-1:2 4 5 3 5 4 6 3 2 4;	mso-font-charset:1;	
mso-generic-font-family:roman;	mso-font-format:other;	mso-font-pitch:variable;	mso-font-signature:0 0 0 0 0 0;}@font-face	{font-family:Cambria;	panose-1:2 4 5 3 5 4 6 3 2 4;	mso-font-charset:0;	mso-generic-font-family:auto;	mso-font-pitch:variable;	mso-font-signature:3 0 0 0 1 0;} /* Style Definitions */p.MsoNormal, li.MsoNormal, div.MsoNormal	{mso-style-unhide:no;	mso-style-qformat:yes;	mso-style-parent:"";	margin-top:0cm;	margin-right:0cm;	margin-bottom:10.0pt;	margin-left:0cm;	mso-pagination:widow-orphan;	font-size:12.0pt;	font-family:Cambria;	mso-ascii-font-family:Cambria;	mso-ascii-theme-font:minor-latin;	mso-fareast-font-family:"ＭＳ 明朝";	mso-fareast-theme-font:minor-fareast;	mso-hansi-font-family:Cambria;	mso-hansi-theme-font:minor-latin;	mso-bidi-font-family:"Times New Roman";	mso-bidi-theme-font:minor-bidi;	mso-ansi-language:SL;	mso-fareast-language:JA;}.MsoChpDefault	{mso-style-type:export-only;	mso-default-props:yes;	font-family:Cambria;	mso-ascii-font-family:Cambria;	mso-ascii-theme-font:minor-latin;	mso-fareast-font-family:"ＭＳ 明朝";	mso-fareast-theme-font:minor-fareast;	mso-hansi-font-family:Cambria;	mso-hansi-theme-font:minor-latin;	mso-bidi-font-family:"Times New Roman";	mso-bidi-theme-font:minor-bidi;	mso-fareast-language:JA;}.MsoPapDefault	{mso-style-type:export-only;	margin-bottom:10.0pt;}@page WordSection1	{size:612.0pt 792.0pt;	margin:72.0pt 90.0pt 72.0pt 90.0pt;	mso-header-margin:36.0pt;	mso-footer-margin:36.0pt;	mso-paper-source:0;}div.WordSection1	{page:WordSection1;}--&gt;&lt;/style&gt;     &lt;br /&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;span style="font-size: small;"&gt;&lt;span lang="SL"&gt;Top 5 areas for accepted papers&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="MsoNormal" style="font-family: inherit;"&gt;&lt;span style="font-size: small;"&gt;&lt;span lang="SL"&gt;&lt;span 
style="font-family: inherit;"&gt;The main theme of SIGIR2011 could be summed up as "find data that solve the problem". Here are a couple of examples of this approach in action:&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;ul style="font-family: inherit;"&gt;&lt;li&gt;&lt;span style="font-size: small;"&gt;&lt;span lang="SL"&gt;&lt;a href="http://ir-ub.mathcs.emory.edu/uFindIt/"&gt;The best paper award&lt;/a&gt; was given to Mikhail Ageev from Russia, who devised a simple game that enabled the collection of data for measuring search success. &lt;/span&gt;&lt;span lang="SL"&gt;The authors collected search trails from approximately 150 users using Mechanical Turk, and that was sufficient to learn a model that predicts whether the user found the information they were searching for. This technique enables Google et al. to automatically evaluate the quality of their search.&lt;/span&gt;&amp;nbsp;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-size: small;"&gt;&lt;a href="http://picasso.mmci.uni-saarland.de/demo/"&gt;PICASSO&lt;/a&gt; is a system by &lt;a href="http://twitter.com/stuparCoa"&gt;Aleksandar Stupar&lt;/a&gt; that, given an image, recommends related music. The main idea behind this system is to use movies and their soundtracks to learn the relation between images and music.&lt;span lang="SL"&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-size: small;"&gt;&lt;span lang="SL"&gt;&lt;a href="http://research.microsoft.com/apps/pubs/default.aspx?id=150729"&gt;Guys from Microsoft&lt;/a&gt; presented a clever way to identify the geographical relevance of a web site: just track where the readers come from.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-size: small;"&gt;&lt;span lang="SL"&gt;Overall (and excluding censorship), I liked SIGIR2011 in Beijing more than &lt;a href="http://www.sigir2010.org/"&gt;last year's conference in Geneva&lt;/a&gt;. 
Last year too much emphasis was put on rigorous evaluation, while this year the program committee allowed for bolder thinking. I got many good ideas while attending SIGIR2011, and you may expect many of them to be implemented in &lt;a href="http://www.zemanta.com/"&gt;Zemanta&lt;/a&gt; soon.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: small;"&gt;&lt;span lang="SL"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: small;"&gt;&lt;span lang="SL"&gt;Looking forward to &lt;a href="http://www.sigir.org/sigir2012/"&gt;SIGIR2012&lt;/a&gt;!&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;ul style="font-family: inherit;"&gt;&lt;/ul&gt;&lt;div style="font-family: inherit;"&gt;&lt;/div&gt;&lt;div class="zemanta-related"&gt;&lt;h6 class="zemanta-related-title" style="font-size: 1em; margin: 1em 0pt 0pt;"&gt;Related articles&lt;/h6&gt;&lt;ul class="zemanta-article-ul"&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://jochenleidner.posterous.com/report-from-sigir-2011-beijing-china"&gt;Report from SIGIR 2011 (Beijing, China)&lt;/a&gt; (jochenleidner.posterous.com)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://palblog.fxpal.com/?p=5287"&gt;Censoring conferences&lt;/a&gt; (palblog.fxpal.com)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://scienceblogs.com/aardvarchaeology/2011/05/i_hate_the_great_firewall.php"&gt;I Hate the Great Firewall&lt;/a&gt; (scienceblogs.com)&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div class="zemanta-pixie" style="height: 15px; margin-top: 10px;"&gt;&lt;a class="zemanta-pixie-a" href="http://www.zemanta.com/" title="Enhanced by Zemanta"&gt;&lt;img alt="Enhanced by Zemanta" class="zemanta-pixie-img" src="http://img.zemanta.com/zemified_e.png?x-id=fdb70c66-3f1d-46d2-9af3-3a4723ade7d6" style="border: none; float: right;" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/9058677219636510204-8348253677944831143?l=unreasonableeffectivenessofdata.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://unreasonableeffectivenessofdata.blogspot.com/feeds/8348253677944831143/comments/default' title='Objavi komentarje'/><link rel='replies' type='text/html' href='http://unreasonableeffectivenessofdata.blogspot.com/2011/08/sigir2011.html#comment-form' title='Št. komentarjev: 0'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9058677219636510204/posts/default/8348253677944831143'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9058677219636510204/posts/default/8348253677944831143'/><link rel='alternate' type='text/html' href='http://unreasonableeffectivenessofdata.blogspot.com/2011/08/sigir2011.html' title='#sigir2011'/><author><name>Dušan Omerčević</name><uri>http://www.blogger.com/profile/17161430613388211010</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_AkG5HI5UE_I/STRD9N615JI/AAAAAAAAAHI/pu1lN_AesDY/S220/dusan.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-2lIWTaJEcGI/TlLBEvjNMUI/AAAAAAAAAgQ/Vzi1_cMr0DI/s72-c/sigir_basic_facts.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9058677219636510204.post-4768716659341539510</id><published>2011-05-11T14:07:00.000-07:00</published><updated>2011-05-13T13:38:39.900-07:00</updated><title type='text'>Startup Slovenia</title><content type='html'>&lt;div class="zemanta-img separator" style="clear: right;"&gt;&lt;a href="http://www.flickr.com/photos/22719916@N03/3593055702" style="clear: right; display: block; float: right; margin-left: 1em; margin-right: 1em;"&gt;&lt;img alt="Dragon Bridge in Ljubljana, Slovenia" 
src="http://farm3.static.flickr.com/2032/3593055702_46003367df_m.jpg" style="border: none; font-size: 0.8em;" /&gt;&lt;/a&gt;&lt;span class="zemanta-img-attribution" style="clear: both; float: right; margin-left: 1em; margin-right: 1em;"&gt;Image by &lt;a href="http://www.flickr.com/photos/22719916@N03/3593055702"&gt;FromTheNorth&lt;/a&gt; via Flickr&lt;/span&gt;&lt;/div&gt;Today I attended a &lt;a href="http://www.facebook.com/event.php?eid=199727340063828"&gt;talk&lt;/a&gt; by &lt;a href="https://twitter.com/rf45"&gt;Robert Farazin&lt;/a&gt; about &lt;a href="http://doublerecall.com/"&gt;DoubleRecall&lt;/a&gt;'s successful application to &lt;a class="zem_slink" href="http://www.ycombinator.com/" rel="homepage" title="Y Combinator"&gt;Y Combinator&lt;/a&gt;. While the talk was great, what really impressed me was the attendance of some 200 people. Startups are mushrooming in &lt;a class="zem_slink" href="http://www.lonelyplanet.com/slovenia" rel="lonelyplanet" title="Slovenia"&gt;Slovenia&lt;/a&gt; at the moment, and hopefully many more will join the ranks of &lt;a class="zem_slink" href="http://www.zemanta.com/" rel="homepage" title="Zemanta"&gt;Zemanta&lt;/a&gt;, &lt;a href="http://www.celtra.com/"&gt;Celtra&lt;/a&gt;, &lt;a href="http://outfit7.com/"&gt;Outfit7&lt;/a&gt;, &lt;a href="http://vox.io/"&gt;Vox.io&lt;/a&gt; and &lt;a href="http://doublerecall.com/"&gt;DoubleRecall&lt;/a&gt;. Everything seems to be in place for &lt;a class="zem_slink" href="http://www.lonelyplanet.com/slovenia/ljubljana" rel="lonelyplanet" title="Ljubljana"&gt;Ljubljana&lt;/a&gt; to become the &lt;a href="http://en.wikipedia.org/wiki/Boulder,_Colorado"&gt;Boulder&lt;/a&gt; of Europe. 
I am keeping my fingers crossed for some great exits that would enable the creation of a &lt;a href="http://www.avc.com/a_vc/2011/04/reinvesting-capital.html"&gt;proper startup ecosystem&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Ten years ago I was trying to start a company that set out to achieve something similar to what &lt;a href="http://www.netsuite.com/"&gt;NetSuite&lt;/a&gt; later managed to achieve. As I recollect those times now, the first thing that comes to mind is how doomed to fail we really were. At that time the only VC fund at least remotely interested in funding Eastern European ventures was a murky fund from &lt;a class="zem_slink" href="http://www.lonelyplanet.com/austria/vienna" rel="lonelyplanet" title="Vienna"&gt;Vienna&lt;/a&gt; called &lt;a href="https://docs.google.com/leaf?id=0B6d9L_S2s6yaZDRmZjY0MmEtNmVkMS00MTcwLTllYjItNjE2NGI1M2Y3ODhl&amp;amp;hl=en"&gt;Red-stars.com&lt;/a&gt;, whose motto was "&lt;i&gt;from communism to .com&lt;/i&gt;". At that time there was nobody around here to tell us that a startup does not need a &lt;a href="https://docs.google.com/viewer?a=v&amp;amp;pid=explorer&amp;amp;chrome=true&amp;amp;srcid=0B6d9L_S2s6yaYWJhYTEwODQtZmE4Zi00OGYyLWI1NTctZTQ3MGI0ODY0MDcz&amp;amp;hl=en"&gt;fifty-page business plan&lt;/a&gt;. At that time the only other "start-ups" that we could share experiences with were two dubious endeavors, the first being Telemach and the second EON of &lt;a href="http://www.mojvideo.com/video-cash-for-laws-scandal-mep-zoran-thaler-slovenia/8c7d464e8d68fefa66e0"&gt;Zoran Thaler&lt;/a&gt;. 
At that time the nearest event for start-ups was First Tuesday in &lt;a class="zem_slink" href="http://www.lonelyplanet.com/croatia/zagreb" rel="lonelyplanet" title="Zagreb"&gt;Zagreb&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I am so glad to see how much the environment for start-ups has changed for the better in these ten years, and I am really grateful that I have the opportunity to contribute to Startup Slovenia myself.&lt;br /&gt;&lt;div class="zemanta-related"&gt;&lt;h6 class="zemanta-related-title" style="font-size: 1em; margin: 1em 0pt 0pt;"&gt;Related articles&lt;/h6&gt;&lt;ul class="zemanta-article-ul"&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://www.zemanta.com/blog/can-ljubljana-become-web-capital-of-europe/"&gt;Can Ljubljana become web-capital of Europe?&lt;/a&gt; (zemanta.com)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://holykaw.alltop.com/startups-exposed-the-anatomy-of-a-newborn-tec"&gt;Startups exposed: The anatomy of a newborn tech company [infographic]&lt;/a&gt; (holykaw.alltop.com)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://www.zdnet.com/blog/howlett/can-netsuite-successfully-move-up-market/3100"&gt;Can NetSuite successfully move up market?&lt;/a&gt; (zdnet.com)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://www.sfgate.com/cgi-bin/article.cgi?f=/g/a/2011/05/03/rickstevesrickstevestmsarticlecfmid267.DTL"&gt;Slovenia Offers Up Charm Without the Crowds&lt;/a&gt; (sfgate.com)&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div class="zemanta-pixie" style="height: 15px; margin-top: 10px;"&gt;&lt;a class="zemanta-pixie-a" href="http://www.zemanta.com/" title="Enhanced by Zemanta"&gt;&lt;img alt="Enhanced by Zemanta" class="zemanta-pixie-img" src="http://img.zemanta.com/zemified_e.png?x-id=246b4578-57fb-49ee-bc47-8b79ec366058" style="border: none; float: right;" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/9058677219636510204-4768716659341539510?l=unreasonableeffectivenessofdata.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://unreasonableeffectivenessofdata.blogspot.com/feeds/4768716659341539510/comments/default' title='Objavi komentarje'/><link rel='replies' type='text/html' href='http://unreasonableeffectivenessofdata.blogspot.com/2011/05/startup-slovenia.html#comment-form' title='Št. komentarjev: 0'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9058677219636510204/posts/default/4768716659341539510'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9058677219636510204/posts/default/4768716659341539510'/><link rel='alternate' type='text/html' href='http://unreasonableeffectivenessofdata.blogspot.com/2011/05/startup-slovenia.html' title='Startup Slovenia'/><author><name>Dušan Omerčević</name><uri>http://www.blogger.com/profile/17161430613388211010</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_AkG5HI5UE_I/STRD9N615JI/AAAAAAAAAHI/pu1lN_AesDY/S220/dusan.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://farm3.static.flickr.com/2032/3593055702_46003367df_t.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9058677219636510204.post-1113053565596179170</id><published>2010-11-14T13:57:00.000-08:00</published><updated>2010-11-14T13:57:38.382-08:00</updated><title type='text'>Man-Computer Symbiosis</title><content type='html'>&lt;div class="zemanta-img separator" style="clear: right;"&gt;&lt;a href="http://commons.wikipedia.org/wiki/File:Deep_Blue.jpg" style="clear: right; display: block; float: right; margin-left: 1em; margin-right: 1em;"&gt;&lt;img alt="Deep Blue, the computer who defeated chess wor..." 
height="400" src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/be/Deep_Blue.jpg/300px-Deep_Blue.jpg" style="border: medium none; font-size: 0.8em;" width="266" /&gt;&lt;/a&gt;&lt;span class="zemanta-img-attribution" style="clear: both; float: right; margin-left: 1em; margin-right: 1em; width: 300px;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Image via &lt;a href="http://commons.wikipedia.org/wiki/File:Deep_Blue.jpg"&gt;Wikipedia&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;The October issue of &lt;a class="zem_slink" href="http://cacm.acm.org/" rel="homepage" title="Communications of the ACM"&gt;Communications of the ACM&lt;/a&gt; reports on &lt;a href="http://www.nature.com/nature/journal/v466/n7307/full/nature09304.html"&gt;a scientific paper&lt;/a&gt; that shows how the complex scientific problem of predicting protein structure can be solved by harnessing the brainpower of 57,000+ people. The integration of human problem-solving capabilities with computational algorithms has enormous potential that might fundamentally change the world. While machines excel at computation, humans shine at pattern recognition. By combining the two, many intractable problems can be solved.&lt;br /&gt;&lt;br /&gt;There are 1.4B people &lt;a href="http://data.worldbank.org/topic/poverty"&gt;living&lt;/a&gt; below the $1.25-a-day poverty line and, at the current pace of &lt;a href="http://www.bbc.co.uk/news/10569081"&gt;mobile phone penetration&lt;/a&gt;, even the poorest people will soon own a mobile phone. Even a basic mobile phone enables somebody to receive a problem, to post a solution and to get a payment. 
If we were able to map important problems to a large number of people, reduce them to small cogs in a humongous analytic machine, and harvest a solution, everyone would benefit.&lt;br /&gt;&lt;br /&gt;Just as Zynga took the world by storm with social gaming, some future start-up might fundamentally change the lives of hundreds of millions of the poorest people in the world for the better through social problem solving, earning billions along the way. &lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;It has been almost 15 years since Kasparov lost to Deep Blue. I think it is time for the human race to take back supremacy in chess by intelligently harnessing our brain power. I know I would be most thrilled to take part in such a match.&lt;br /&gt;&lt;br /&gt;&lt;div class="zemanta-related"&gt;&lt;h6 class="zemanta-related-title" style="font-size: 1em; margin: 1em 0pt 0pt;"&gt;Related articles&lt;/h6&gt;&lt;ul class="zemanta-article-ul"&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://blog.ogilvypr.com/2010/11/humans-and-computers-unite/"&gt;Humans and Computers, Unite&lt;/a&gt; (ogilvypr.com)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://kotaku.com/5605576/humans-triumph-over-machines-in-protein-folding-game-foldit"&gt;Humans Triumph Over Machines In Protein Folding Game FoldIt [Science]&lt;/a&gt; (kotaku.com)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://www.reuters.com/article/technologyNews/idUSTRE69D4XA20101014"&gt;Mobile phones help lift poor out of poverty: U.N. 
study&lt;/a&gt; (reuters.com)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://techcrunch.com/2010/11/09/why-is-quora-mass-creating-twitter-accounts-on-mechanical-turk/"&gt;Why Is Quora Mass Creating Twitter Accounts On Mechanical Turk?&lt;/a&gt; (techcrunch.com)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://www.chron.com/disp/story.mpl/health/4282040.html"&gt;Researcher calls the human brain freakishly efficient as a computer&lt;/a&gt; (chron.com)&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div class="zemanta-pixie" style="height: 15px; margin-top: 10px;"&gt;&lt;a class="zemanta-pixie-a" href="http://www.zemanta.com/" title="Enhanced by Zemanta"&gt;&lt;img alt="Enhanced by Zemanta" class="zemanta-pixie-img" src="http://img.zemanta.com/zemified_e.png?x-id=bd8eefdd-42e9-49ac-a55c-3e23c118d6d4" style="border: none; float: right;" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9058677219636510204-1113053565596179170?l=unreasonableeffectivenessofdata.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://unreasonableeffectivenessofdata.blogspot.com/feeds/1113053565596179170/comments/default' title='Objavi komentarje'/><link rel='replies' type='text/html' href='http://unreasonableeffectivenessofdata.blogspot.com/2010/11/man-computer-symbiosis.html#comment-form' title='Št. 
komentarjev: 0'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9058677219636510204/posts/default/1113053565596179170'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9058677219636510204/posts/default/1113053565596179170'/><link rel='alternate' type='text/html' href='http://unreasonableeffectivenessofdata.blogspot.com/2010/11/man-computer-symbiosis.html' title='Man-Computer Symbiosis'/><author><name>Dušan Omerčević</name><uri>http://www.blogger.com/profile/17161430613388211010</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_AkG5HI5UE_I/STRD9N615JI/AAAAAAAAAHI/pu1lN_AesDY/S220/dusan.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9058677219636510204.post-2610519749910066120</id><published>2010-09-10T01:13:00.000-07:00</published><updated>2010-09-10T01:13:21.858-07:00</updated><title type='text'>Term pruning</title><content type='html'>&lt;div class="zemanta-img separator" style="clear: right;"&gt;&lt;a href="http://en.wikipedia.org/wiki/File:Pruning_Tools_M_D_Vaden.jpg" style="clear: right; display: block; float: right; margin-left: 1em; margin-right: 1em;"&gt;&lt;img alt="Pruning tools utilized by a pruning and tree-s..." height="251" src="http://upload.wikimedia.org/wikipedia/en/thumb/c/c2/Pruning_Tools_M_D_Vaden.jpg/300px-Pruning_Tools_M_D_Vaden.jpg" style="border: none; font-size: 0.8em;" width="300" /&gt;&lt;/a&gt;&lt;span class="zemanta-img-attribution" style="clear: both; float: right; margin-left: 1em; margin-right: 1em; width: 300px;"&gt;Image via &lt;a href="http://en.wikipedia.org/wiki/File:Pruning_Tools_M_D_Vaden.jpg"&gt;Wikipedia&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;At &lt;a class="zem_slink" href="http://www.zemanta.com/" rel="homepage" title="Zemanta"&gt;Zemanta&lt;/a&gt; we are constantly experimenting with new ideas how to improve our service. 
Most of our experiments yield no gain, but quite often one learns more from failure than from success.&lt;br /&gt;&lt;br /&gt;One of the gainless but illuminating experiments we did lately is term pruning. We have observed experimentally that 52% of terms occur in only one document and that excluding these terms has no influence on the precision of our recommendations. Our recommendation engine is computationally very demanding, and making it more efficient is a never-ending process. A chance to prune 52% of terms seemed quite promising for increasing the performance of our engine and reducing index size.&lt;br /&gt;&lt;br /&gt;Our recommendation engine is based on &lt;a href="http://lucene.apache.org/"&gt;Apache Lucene/Solr&lt;/a&gt;. At a recent &lt;a href="http://lucene-eurocon.org/index.html"&gt;Lucene EuroCon&lt;/a&gt; conference, Andrzej Bialecki &lt;a href="http://lucene-eurocon.org/sessions-track1-day1.html#3"&gt;presented&lt;/a&gt; a Lucene &lt;a href="https://issues.apache.org/jira/browse/LUCENE-1812"&gt;patch&lt;/a&gt; that provides an easy tool for index manipulation. Using this tool we removed all terms occurring in only one document, along with all postings and payloads belonging to such terms. It turned out that the efficiency of our engine did not change and that the index size decreased only slightly (by 1.5%). 
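&lt;br /&gt;&lt;br /&gt;The pruning step can be illustrated with a toy sketch in Python. It is only an illustration under stated assumptions: the functions build_index and prune_singletons and the three-document corpus are hypothetical, and the sketch works on an in-memory dictionary rather than on a Lucene index or the tool from the LUCENE-1812 patch.&lt;br /&gt;
&lt;pre&gt;
```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the sorted ids of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def prune_singletons(index):
    """Drop every term whose postings list names exactly one document."""
    return {term: ids for term, ids in index.items() if len(ids) != 1}


# Hypothetical toy corpus, not Zemanta data.
docs = {
    1: "lucene stores postings compactly",
    2: "lucene index size",
    3: "index pruning experiment",
}

index = build_index(docs)
pruned = prune_singletons(index)
removed = len(index) - len(pruned)

print(sorted(pruned))
print(removed, "of", len(index), "terms removed")
```
&lt;/pre&gt;
In this toy corpus most terms are dropped, yet each dropped term carried only a single posting, which may hint at why pruning such terms frees so little space.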
&lt;br /&gt;&lt;br /&gt;In our opinion, this experiment has shown that Lucene is very efficient at storing terms and associated term data (postings &amp;amp; payloads), and that the presence of rarely used terms in the index is not a concern.&lt;br /&gt;&lt;div class="zemanta-related"&gt;&lt;h6 class="zemanta-related-title" style="font-size: 1em; margin: 1em 0pt 0pt;"&gt;     Related articles by Zemanta&lt;/h6&gt;&lt;ul class="zemanta-article-ul"&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://arnoldit.com/wordpress/2010/08/07/1-august-lucenesolr-information/"&gt;Lucene/Solr Information&lt;/a&gt; (arnoldit.com)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://blog.sematext.com/2010/09/01/solr-digest-august-2010/"&gt;Solr Digest, August 2010&lt;/a&gt; (sematext.com)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://itexpertvoice.com/home/11-apache-technologies-for-the-enterprise/"&gt;11 Apache Technologies for the Enterprise&lt;/a&gt; (itexpertvoice.com)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://www.lucidimagination.com/blog/2010/08/11/stumped-with-solr-chris-hostetter-of-lucene-pmc-at-lucene-revolution/"&gt;Stumped with Solr? 
Chris Hostetter of Lucene PMC at Lucene Revolution&lt;/a&gt; (lucidimagination.com)&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div class="zemanta-pixie" style="height: 15px; margin-top: 10px;"&gt;&lt;a class="zemanta-pixie-a" href="http://www.zemanta.com/" title="Enhanced by Zemanta"&gt;&lt;img alt="Enhanced by Zemanta" class="zemanta-pixie-img" src="http://img.zemanta.com/zemified_e.png?x-id=dc11920c-d37d-41dd-a6cc-6c2dd49515cf" style="border: none; float: right;" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9058677219636510204-2610519749910066120?l=unreasonableeffectivenessofdata.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://unreasonableeffectivenessofdata.blogspot.com/feeds/2610519749910066120/comments/default' title='Objavi komentarje'/><link rel='replies' type='text/html' href='http://unreasonableeffectivenessofdata.blogspot.com/2010/09/term-pruning.html#comment-form' title='Št. 
komentarjev: 0'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9058677219636510204/posts/default/2610519749910066120'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9058677219636510204/posts/default/2610519749910066120'/><link rel='alternate' type='text/html' href='http://unreasonableeffectivenessofdata.blogspot.com/2010/09/term-pruning.html' title='Term pruning'/><author><name>Dušan Omerčević</name><uri>http://www.blogger.com/profile/17161430613388211010</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_AkG5HI5UE_I/STRD9N615JI/AAAAAAAAAHI/pu1lN_AesDY/S220/dusan.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9058677219636510204.post-320530943836783275</id><published>2010-05-19T17:00:00.000-07:00</published><updated>2010-05-19T17:00:02.032-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Apache Lucene EuroCon 2010'/><category scheme='http://www.blogger.com/atom/ns#' term='Prague'/><title type='text'>Apache Lucene EuroCon 2010</title><content type='html'>&lt;p class="zemanta-img" style="margin: 1em; float: right; display: block; width: 250px;"&gt;&lt;a href="http://www.flickr.com/photos/10155443@N00/50304563"&gt;&lt;img src="http://farm1.static.flickr.com/29/50304563_64071a8554_m.jpg" alt="Prague Castle &amp;amp; Charles Bridge by Nite" style="border: medium none; display: block;" width="240" height="180" /&gt;&lt;/a&gt;&lt;span class="zemanta-img-attribution"&gt;Image by &lt;a href="http://www.flickr.com/photos/10155443@N00/50304563"&gt;StrudelMonkey&lt;/a&gt; via Flickr&lt;/span&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;I'm in &lt;a class="zem_slink" href="http://maps.google.com/maps?ll=50.0833333333,14.4166666667&amp;amp;spn=0.1,0.1&amp;amp;q=50.0833333333,14.4166666667%20%28Prague%29&amp;amp;t=h" title="Prague" 
rel="geolocation"&gt;Prague&lt;/a&gt; this week to attend Apache Lucene EuroCon 2010. It is always great to be in Prague. It is such a great city to just stroll along the banks of &lt;a class="zem_slink" href="http://en.wikipedia.org/wiki/Vltava" title="Vltava" rel="wikipedia"&gt;Vltava&lt;/a&gt;, across the &lt;a class="zem_slink" href="http://en.wikipedia.org/wiki/Charles_Bridge" title="Charles Bridge" rel="wikipedia"&gt;Charles bridge&lt;/a&gt; to Zlata ulička and all the way up to Hradčani.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;fieldset class="zemanta-related"&gt;&lt;legend class="zemanta-related-title"&gt;Related articles by Zemanta&lt;/legend&gt;&lt;ul class="zemanta-article-ul"&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://arnoldit.com/wordpress/2010/05/04/lucene-solr-developer-event-in-prague-arrives/"&gt;Lucene Solr Developer Event in Prague Arrives&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://arnoldit.com/wordpress/2010/05/17/first-u-s-open-source-search-conference/"&gt;First U.S. 
Open Source Search Conference&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/fieldset&gt;      &lt;div style="margin-top: 10px; height: 15px;" class="zemanta-pixie"&gt;&lt;a class="zemanta-pixie-a" href="http://www.zemanta.com/" title="Enhanced by Zemanta"&gt;&lt;img style="border: medium none; float: right;" class="zemanta-pixie-img" src="http://img.zemanta.com/zemified_e.png?x-id=a26615d1-65aa-4ef1-b5f0-7fff6eaa3baf" alt="Enhanced by Zemanta" /&gt;&lt;/a&gt;&lt;span class="zem-script more-related pretty-attribution"&gt;&lt;script type="text/javascript" src="http://static.zemanta.com/readside/loader.js" defer="defer"&gt;&lt;/script&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9058677219636510204-320530943836783275?l=unreasonableeffectivenessofdata.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://unreasonableeffectivenessofdata.blogspot.com/feeds/320530943836783275/comments/default' title='Objavi komentarje'/><link rel='replies' type='text/html' href='http://unreasonableeffectivenessofdata.blogspot.com/2010/05/apache-lucene-eurocon-2010_19.html#comment-form' title='Št. 
komentarjev: 0'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9058677219636510204/posts/default/320530943836783275'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9058677219636510204/posts/default/320530943836783275'/><link rel='alternate' type='text/html' href='http://unreasonableeffectivenessofdata.blogspot.com/2010/05/apache-lucene-eurocon-2010_19.html' title='Apache Lucene EuroCon 2010'/><author><name>Dušan Omerčević</name><uri>http://www.blogger.com/profile/17161430613388211010</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_AkG5HI5UE_I/STRD9N615JI/AAAAAAAAAHI/pu1lN_AesDY/S220/dusan.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://farm1.static.flickr.com/29/50304563_64071a8554_t.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9058677219636510204.post-2358242428814006976</id><published>2010-05-10T05:15:00.000-07:00</published><updated>2010-05-10T05:46:39.034-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Apache Lucene'/><category scheme='http://www.blogger.com/atom/ns#' term='Information retrieval'/><category scheme='http://www.blogger.com/atom/ns#' term='Inverted index'/><title type='text'>BooleanScorer vs. BooleanScorer2</title><content type='html'>&lt;p class="zemanta-img" style="margin: 1em; float: right; display: block; width: 250px;"&gt;&lt;a href="http://www.flickr.com/photos/72825507@N00/3428218606"&gt;&lt;img src="http://farm4.static.flickr.com/3298/3428218606_a31c368e84_m.jpg" alt="Inverted (Negative) of Dandelion puff ball (cl..." 
style="border: medium none; display: block;" width="240" height="180" /&gt;&lt;/a&gt;&lt;span class="zemanta-img-attribution"&gt;Image by &lt;a href="http://www.flickr.com/photos/72825507@N00/3428218606"&gt;mikebaird&lt;/a&gt; via Flickr&lt;/span&gt;&lt;/p&gt;This post is about scorers in &lt;a class="zem_slink" href="http://lucene.apache.org/" title="Lucene" rel="homepage"&gt;Apache Lucene&lt;/a&gt;. Let me first explain what scorers are, for those not intimately familiar with Lucene. In an &lt;a class="zem_slink" href="http://en.wikipedia.org/wiki/Inverted_index" title="Inverted index" rel="wikipedia"&gt;inverted index&lt;/a&gt;, a result list is compiled by merging the postings lists of the terms present in the query. A scorer is a function that combines the scores of individual query terms into a single score for each document in the index. Scoring is usually the most time-consuming step of searching an inverted index. Therefore, an efficient scorer function is a prerequisite for efficient search in an inverted index.&lt;br /&gt;&lt;br /&gt;There are two scorers for a BooleanQuery in Apache Lucene: BooleanScorer and BooleanScorer2. The BooleanScorer uses a ~16k array to score windows of docs, while the BooleanScorer2 merges priority queues of postings (see the description in BooleanScorer.java for more details). In principle, the BooleanScorer should be much faster for boolean queries with lots of frequent terms, since it does not need to update a priority queue for each posting.&lt;br /&gt;&lt;br /&gt;We were curious how much faster the BooleanScorer really is, so we compared the two scorers using &lt;a class="zem_slink" href="http://www.zemanta.com/" title="Zemanta" rel="homepage"&gt;Zemanta&lt;/a&gt;'s related articles application. 
In order to identify related articles, Zemanta's engine matches approximately a dozen entities extracted from a user's blog post against an index of several million documents.&lt;br /&gt;&lt;br /&gt;The results have confirmed that the BooleanScorer is much faster than the BooleanScorer2:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;the average response time decreased by one third,&lt;/li&gt;&lt;li&gt;the maximum response time for 99% of requests decreased by 20%.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Please note that BooleanScorer scores documents out of order and therefore cannot be used with filters or in complex queries.&lt;br /&gt;&lt;br /&gt;&lt;fieldset class="zemanta-related"&gt;&lt;legend class="zemanta-related-title"&gt;Related articles by Zemanta&lt;/legend&gt;&lt;ul class="zemanta-article-ul"&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://arnoldit.com/wordpress/2010/05/04/lucene-solr-developer-event-in-prague-arrives/"&gt;Lucene Solr Developer Event in Prague Arrives&lt;/a&gt; (arnoldit.com)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://www.theopenforce.com/2010/04/apache-lucene-eurocon.html"&gt;Apache Lucene EuroCon - May 18-21&lt;/a&gt; (theopenforce.com)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://www.lucidimagination.com/blog/2010/04/22/apache-lucene-eurocon-agenda-the-revolution-is-on/"&gt;Apache Lucene EuroCon Agenda - The Revolution is On!&lt;/a&gt; (lucidimagination.com)&lt;/li&gt;&lt;/ul&gt;&lt;/fieldset&gt;    &lt;div style="margin-top: 10px; height: 15px;" class="zemanta-pixie"&gt;&lt;a class="zemanta-pixie-a" href="http://www.zemanta.com/" title="Enhanced by Zemanta"&gt;&lt;img style="border: medium none; float: right;" class="zemanta-pixie-img" src="http://img.zemanta.com/zemified_e.png?x-id=b8bdd960-f5ae-4d64-b98d-c369db42c76c" alt="Enhanced by Zemanta" /&gt;&lt;/a&gt;&lt;span class="zem-script more-related pretty-attribution"&gt;&lt;script type="text/javascript" 
src="http://static.zemanta.com/readside/loader.js" defer="defer"&gt;&lt;/script&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9058677219636510204-2358242428814006976?l=unreasonableeffectivenessofdata.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://unreasonableeffectivenessofdata.blogspot.com/feeds/2358242428814006976/comments/default' title='Objavi komentarje'/><link rel='replies' type='text/html' href='http://unreasonableeffectivenessofdata.blogspot.com/2010/05/booleanscorer-vs-booleanscorer2.html#comment-form' title='Št. komentarjev: 1'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9058677219636510204/posts/default/2358242428814006976'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9058677219636510204/posts/default/2358242428814006976'/><link rel='alternate' type='text/html' href='http://unreasonableeffectivenessofdata.blogspot.com/2010/05/booleanscorer-vs-booleanscorer2.html' title='BooleanScorer vs. 
BooleanScorer2'/><author><name>Dušan Omerčević</name><uri>http://www.blogger.com/profile/17161430613388211010</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_AkG5HI5UE_I/STRD9N615JI/AAAAAAAAAHI/pu1lN_AesDY/S220/dusan.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://farm4.static.flickr.com/3298/3428218606_a31c368e84_t.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9058677219636510204.post-707006476170363775</id><published>2010-04-08T11:54:00.000-07:00</published><updated>2010-04-09T06:25:37.856-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Facebook'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><category scheme='http://www.blogger.com/atom/ns#' term='Social network'/><title type='text'>Authentication on the Internet - a solved problem?</title><content type='html'>&lt;p class="zemanta-img" style="margin: 1em; float: right; display: block;"&gt;&lt;a href="http://www.flickr.com/photos/7338462@N06/3618869855"&gt;&lt;img src="http://farm4.static.flickr.com/3332/3618869855_2836044b3e_m.jpg" alt="Facebook Social Graph" style="border: medium none ; display: block;"&gt;&lt;/a&gt;&lt;span class="zemanta-img-attribution"&gt;Image by &lt;a href="http://www.flickr.com/photos/7338462@N06/3618869855"&gt;Rafiq Phillips&lt;/a&gt; via Flickr&lt;/span&gt;&lt;/p&gt;In my opinion, the greatest potential of Facebook is to become a dominant authentication method on the Internet. If Facebook succeeds, it will supplement Google as a gatekeeper of the Internet and become the mightiest company in the universe with unprecedented power. But I think Facebook will fail to seize this opportunity. Here is why.&lt;br /&gt;&lt;br /&gt;The solution to the problem of authentication on the Internet has so far eluded us. 
None of the more secure methods (e.g., &lt;a class="zem_slink" href="http://en.wikipedia.org/wiki/Public_key_certificate" title="Public key certificate" rel="wikipedia"&gt;digital certificates&lt;/a&gt;, &lt;a class="zem_slink" href="http://openid.net" title="OpenID Foundation" rel="homepage"&gt;OpenID&lt;/a&gt;, &lt;a class="zem_slink" href="http://en.wikipedia.org/wiki/Security_token" title="Security token" rel="wikipedia"&gt;security tokens&lt;/a&gt;) has gained wide traction, and the dominant authentication system we have in place today (username/password combination) is so &lt;a href="http://portal.acm.org/citation.cfm?id=1143120.1143127"&gt;broken&lt;/a&gt; that it does not stand a chance against Nigerian phishermen.&lt;br /&gt;&lt;br /&gt;Mostly by coincidence, Facebook has found out that people who know you are also the most authoritative source for confirming your identity. By building your social graph you are building your on-line identity. Facebook, as the keeper of your social graph, can, at your request, pass your social graph to another web site so that you can confirm your identity. Building a social graph takes time and requires constant maintenance, resulting in the substantial user lock-in that Facebook enjoys.&lt;br /&gt;&lt;br /&gt;We have put up with the effort required for building the social graph because Facebook gave us some very sweet incentives. &lt;a href="http://blog.thoughtpick.com/2010/04/facebook-facts-and-figures-infograph-image.html"&gt;400M+ users&lt;/a&gt; are living proof of how strong an incentive Friday party photos and juicy comments can be. But our social graphs are now mostly built. In order to keep us interested, Facebook is providing us with ever more incentives (videos, games, news, chat, e-mail, ...). But with every new incentive that seduces us, more and more about our lives can be deduced from the river of news we generate. 
As long as our social graph included only our like-minded friends, total exposure of our personality on Facebook did not matter. But with moms and bosses joining Facebook, teenagers and tweens will leave (with others to follow soon) for more private quarters or &lt;a href="http://gigaom.com/2010/03/30/are-your-facebook-friends-really-who-they-say-they-are/"&gt;hide themselves under a false name&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;I think Facebook is making a capital mistake by making us &lt;a href="http://news.cnet.com/8301-1023_3-10457480-93.html"&gt;spend more time&lt;/a&gt; using it. With every piece of information about ourselves exposed on Facebook, the chances are increasing that our social graphs will collapse due to unwanted exposure of our personal details. The situation very much reminds me of the year 2000, when portals such as Yahoo and AOL were competing for users' eyeball time. As history testifies, the whole portal edifice collapsed with the arrival of Google, which made a fortune by making users leave Google's web site as fast as possible.&lt;br /&gt;&lt;br /&gt;I think Facebook should follow Google's recipe of getting out of the user's way and transform itself into a simple Facebook button on every web site, thus becoming the dominant authentication method on the Internet and collecting &lt;a href="http://mashable.com/2010/03/31/facebook-vs-google-default-profile/"&gt;hundreds of billions of dollars in transaction fees for (virtual) goods&lt;/a&gt; along the way.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://en.wikipedia.org/wiki/Livin%27_la_Vida_Loca"&gt;I feel a premonition&lt;/a&gt; that Facebook is already too big to be capable of &lt;a href="http://www.shirky.com/weblog/2010/04/the-collapse-of-complex-business-models/"&gt;simplification of its business model&lt;/a&gt;. Even though Facebook feels like the king of the world at the moment, it might very well end up the same way Yahoo and AOL did. 
I think now is the perfect time for the emergence of a company that would do social authentication that does not suck, just as Google provided us with a web search that doesn't suck a decade ago. Just to make fun of myself ten years from now, I'll make a prediction that &lt;a href="http://valleywag.gawker.com/5511623/the-arrogance-of-turning-down-100-million"&gt;Foursquare&lt;/a&gt; will be the David who will trounce the Facebook Goliath.&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;As a side note, Twitter is no alternative to Facebook as an authentication method, since Twitter's "follower" model does not provide a chain of trust. Well, unless Twitter solves the scalability problem of the &lt;a href="http://twitter.com/help/verified"&gt;verified account&lt;/a&gt; approach.&lt;br /&gt;&lt;br /&gt;&lt;fieldset class="zemanta-related"&gt;&lt;legend class="zemanta-related-title"&gt;Related articles by Zemanta&lt;/legend&gt;&lt;ul class="zemanta-article-ul"&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://mashable.com/2010/04/07/google-facebook-sign-in/"&gt;Facebook and Google Dominating Online Identity War [STATS]&lt;/a&gt; (mashable.com)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://www.businessinsider.com/chart-of-the-day-unique-visitors-social-networking-sites-2010-4"&gt;CHART OF THE DAY: Facebook Is Absolutely Crushing The Competition&lt;/a&gt; (businessinsider.com)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://techcrunch.com/2010/04/05/itunes-facebook-connect/"&gt;iTunes To Integrate Facebook Connect&lt;/a&gt; (techcrunch.com)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://www.paulspoerry.com/2010/03/28/facebooks-end-run-around-google/"&gt;Facebook's end run around Google&lt;/a&gt; (paulspoerry.com)&lt;/li&gt;&lt;/ul&gt;&lt;/fieldset&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="margin-top: 10px; height: 15px;" class="zemanta-pixie"&gt;&lt;a class="zemanta-pixie-a" 
href="http://www.zemanta.com/" title="Enhanced by Zemanta"&gt;&lt;img style="border: medium none ; float: right;" class="zemanta-pixie-img" src="http://img.zemanta.com/zemified_e.png?x-id=cdfae269-8fc3-4345-a895-895c9940c335" alt="Enhanced by Zemanta"&gt;&lt;/a&gt;&lt;span class="zem-script more-related pretty-attribution"&gt;&lt;script type="text/javascript" src="http://static.zemanta.com/readside/loader.js" defer="defer"&gt;&lt;/script&gt;&lt;/span&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://unreasonableeffectivenessofdata.blogspot.com/feeds/707006476170363775/comments/default' title='Post comments'/><link rel='replies' type='text/html' href='http://unreasonableeffectivenessofdata.blogspot.com/2010/04/authentication-on-internet-solved.html#comment-form' title='Number of 
comments: 1'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9058677219636510204/posts/default/707006476170363775'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9058677219636510204/posts/default/707006476170363775'/><link rel='alternate' type='text/html' href='http://unreasonableeffectivenessofdata.blogspot.com/2010/04/authentication-on-internet-solved.html' title='Authentication on the Internet - a solved problem?'/><author><name>Dušan Omerčević</name><uri>http://www.blogger.com/profile/17161430613388211010</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_AkG5HI5UE_I/STRD9N615JI/AAAAAAAAAHI/pu1lN_AesDY/S220/dusan.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://farm4.static.flickr.com/3332/3618869855_2836044b3e_t.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9058677219636510204.post-6641942800913093512</id><published>2010-03-10T03:22:00.000-08:00</published><updated>2010-03-10T06:14:00.831-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Solr 1.4'/><category scheme='http://www.blogger.com/atom/ns#' term='Apache Solr'/><category scheme='http://www.blogger.com/atom/ns#' term='Lucene'/><title type='text'>Boosting more recent content in a custom Apache Solr request handler</title><content type='html'>&lt;p class="zemanta-img" style="margin: 1em; float: right; display: block; width: 188px;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Image:Solr.png"&gt;&lt;img src="http://upload.wikimedia.org/wikipedia/en/3/3e/Solr.png" alt="Solr" style="border: medium none ; display: block; width: 178px; height: 98px;" /&gt;&lt;/a&gt;&lt;span class="zemanta-img-attribution"&gt;Image via &lt;a href="http://en.wikipedia.org/wiki/Image:Solr.png"&gt;Wikipedia&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;Users prefer 
recent information. &lt;a class="zem_slink" href="http://lucene.apache.org/solr/" title="Apache Solr" rel="homepage"&gt;Apache Solr&lt;/a&gt; 1.4 has excellent support for &lt;a href="http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents"&gt;boosting more recent content&lt;/a&gt;. It turns out that adding the same functionality to a custom request handler requires some digging into Solr internals. Though digging into Solr is really cool, you might appreciate a ready-made solution. So here's the code that boosts more recent content for a given query.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;/* Imports for Solr 1.4 / Lucene 2.9; the statements below&lt;br /&gt;* belong inside a method of the request handler. */&lt;br /&gt;import org.apache.lucene.search.FieldCache;&lt;br /&gt;import org.apache.lucene.search.Query;&lt;br /&gt;import org.apache.solr.search.function.BoostedQuery;&lt;br /&gt;import org.apache.solr.search.function.DocValues;&lt;br /&gt;import org.apache.solr.search.function.DualFloatFunction;&lt;br /&gt;import org.apache.solr.search.function.ReciprocalFloatFunction;&lt;br /&gt;import org.apache.solr.search.function.ValueSource;&lt;br /&gt;&lt;br /&gt;/* Assumption: now is the current time in milliseconds since&lt;br /&gt;* the epoch; it must be final to be used in the anonymous class. */&lt;br /&gt;final long now = System.currentTimeMillis();&lt;br /&gt;&lt;br /&gt;ValueSource document_date =&lt;br /&gt; new TrieDateFieldSource("document_date_field",&lt;br /&gt;     FieldCache.NUMERIC_UTILS_LONG_PARSER);&lt;br /&gt;&lt;br /&gt;/* ValueSource that calculates the number of milliseconds&lt;br /&gt;* between the &lt;span style="font-style: italic;"&gt;document_date&lt;/span&gt; (e.g. blog publication date)&lt;br /&gt;* and the present time, i.e. &lt;span style="font-style: italic;"&gt;now&lt;/span&gt;. */&lt;br /&gt;ValueSource vs = new DualFloatFunction(&lt;br /&gt; new LongConstValueSource(now), document_date) {&lt;br /&gt;&lt;br /&gt;     private static final long serialVersionUID = 1L;&lt;br /&gt;&lt;br /&gt;     protected String name() { return "ms"; }&lt;br /&gt;&lt;br /&gt;     protected float func(int doc, DocValues aVals,&lt;br /&gt;             DocValues bVals) {&lt;br /&gt;&lt;br /&gt;         return now - bVals.longVal(doc);&lt;br /&gt;     }&lt;br /&gt; };&lt;br /&gt;&lt;br /&gt;/* ReciprocalFloatFunction implements a reciprocal&lt;br /&gt;* function f(x) = a/(mx+b), based on the float value&lt;br /&gt;* of a field or function as exported by ValueSource vs.&lt;br /&gt;* Values m, a, and b are float constants. 
*/&lt;br /&gt;ValueSource recip = new ReciprocalFloatFunction(vs, m, a, b);&lt;br /&gt;&lt;br /&gt;/* Boosting a given &lt;span style="font-style: italic;"&gt;query&lt;/span&gt; with the &lt;a class="zem_slink" href="http://en.wikipedia.org/wiki/Multiplicative_inverse" title="Multiplicative inverse" rel="wikipedia"&gt;reciprocal function&lt;/a&gt; &lt;span style="font-style: italic;"&gt;recip&lt;/span&gt; */&lt;br /&gt;Query boostedQuery = new BoostedQuery(query, recip);&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Notes:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The "date" field type must be of the class "solr.TrieDateField" (introduced in Solr 1.4) for the above recipe to work.&lt;/li&gt;&lt;li&gt;Unfortunately, the org.apache.solr.search.LongConstValueSource and org.apache.solr.schema.TrieDateFieldSource classes are not public. I've asked the Solr team to make these two classes public. Until they do, you should copy these two classes from the Solr source code into your project.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;&lt;fieldset class="zemanta-related"&gt;&lt;legend class="zemanta-related-title"&gt;Related articles by Zemanta&lt;/legend&gt;&lt;ul class="zemanta-article-ul"&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://www.lucidimagination.com/blog/2009/11/10/apache-solr-1-4-is-officially-released/"&gt;Apache Solr 1.4 is officially released&lt;/a&gt; (lucidimagination.com)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://www.lucidimagination.com/blog/2009/12/12/apache-solr-1-5-on-the-move-with-more-functionality/"&gt;Apache Solr 1.5 on the move with more "functionality"&lt;/a&gt; (lucidimagination.com)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://www.cmswatch.com/Trends/1727-Solr-1.4?source=RSS"&gt;Solr heads for an even sunnier future&lt;/a&gt; (cmswatch.com)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a 
href="http://www.lucidimagination.com/blog/2010/01/11/book-review-solr-packt-book/"&gt;Book Review: Solr 1.4 Enterprise Search Server (Packt) by David Smiley and Eric Pugh&lt;/a&gt; (lucidimagination.com)&lt;/li&gt;&lt;li class="zemanta-article-ul-li"&gt;&lt;a href="http://blog.jteam.nl/2009/12/08/being-at-the-fore-of-apache-solr-and-lucene-development/"&gt;Being at the fore of Apache Solr and Lucene Development&lt;/a&gt; (jteam.nl)&lt;/li&gt;&lt;/ul&gt;&lt;/fieldset&gt;  &lt;div style="margin-top: 10px; height: 15px;" class="zemanta-pixie"&gt;&lt;a class="zemanta-pixie-a" href="http://www.zemanta.com/" title="Enhanced by Zemanta"&gt;&lt;img style="border: medium none ; float: right;" class="zemanta-pixie-img" src="http://img.zemanta.com/zemified_e.png?x-id=8253349e-f975-4eea-90d1-c75ebae6406a" alt="Enhanced by Zemanta" /&gt;&lt;/a&gt;&lt;span class="zem-script more-related pretty-attribution"&gt;&lt;script type="text/javascript" src="http://static.zemanta.com/readside/loader.js" defer="defer"&gt;&lt;/script&gt;&lt;/span&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://unreasonableeffectivenessofdata.blogspot.com/feeds/6641942800913093512/comments/default' title='Post comments'/><link rel='replies' type='text/html' href='http://unreasonableeffectivenessofdata.blogspot.com/2010/03/boosting-more-recent-content-in-custom.html#comment-form' title='Number of 
comments: 0'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9058677219636510204/posts/default/6641942800913093512'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9058677219636510204/posts/default/6641942800913093512'/><link rel='alternate' type='text/html' href='http://unreasonableeffectivenessofdata.blogspot.com/2010/03/boosting-more-recent-content-in-custom.html' title='Boosting more recent content in a custom Apache Solr request handler'/><author><name>Dušan Omerčević</name><uri>http://www.blogger.com/profile/17161430613388211010</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_AkG5HI5UE_I/STRD9N615JI/AAAAAAAAAHI/pu1lN_AesDY/S220/dusan.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9058677219636510204.post-6188134602200265861</id><published>2009-11-04T00:16:00.000-08:00</published><updated>2009-11-04T01:58:57.510-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Mac OS X Snow Leopard'/><title type='text'>Building lxml on Mac OS X 10.6 (Snow Leopard)</title><content type='html'>The lxml build on Mac OS X 10.6 (Snow Leopard) is broken at the moment (2009-11-03). Here are the steps that worked for me to build it.&lt;br /&gt;&lt;br /&gt;1. Fix the Makefile in /Library/Frameworks/Python.framework/Versions/Current/lib/python2.5/config by:&lt;br /&gt;- replacing all occurrences of "MacOSX10.4u.sdk" with "MacOSX10.5.sdk" and&lt;br /&gt;- setting "MACOSX_DEPLOYMENT_TARGET" to "10.5"&lt;br /&gt;&lt;br /&gt;2. Install &lt;a class="zem_slink" href="http://www.cython.org/" title="Cython" rel="homepage"&gt;Cython&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;3. Check out the source: svn co http://codespeak.net/svn/lxml/tag/lxml-2.2.3&lt;br /&gt;(remark: the latest trunk did not work for me)&lt;br /&gt;&lt;br /&gt;4. 
In the lxml-2.2.3 directory, fix the file buildlibxml.py by:&lt;br /&gt;- replacing all occurrences of "MacOSX10.4u.sdk" with "MacOSX10.5.sdk" and&lt;br /&gt;- setting "MACOSX_DEPLOYMENT_TARGET" to "10.5"&lt;br /&gt;&lt;br /&gt;5. Build and install lxml by issuing:&lt;br /&gt;- python setup.py install --static-deps&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I hope it works for you too!&lt;br /&gt;&lt;br /&gt;Dušan&lt;br /&gt; &lt;div style="margin-top: 10px; height: 15px;" class="zemanta-pixie"&gt;&lt;img style="border: medium none ; float: right;" class="zemanta-pixie-img" alt="" src="http://img.zemanta.com/pixy.gif?x-id=715afe41-511c-4f4e-83cf-cd72975da957" /&gt;&lt;span class="zem-script more-related pretty-attribution"&gt;&lt;script type="text/javascript" src="http://static.zemanta.com/readside/loader.js" defer="defer"&gt;&lt;/script&gt;&lt;/span&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://unreasonableeffectivenessofdata.blogspot.com/feeds/6188134602200265861/comments/default' title='Post comments'/><link rel='replies' type='text/html' href='http://unreasonableeffectivenessofdata.blogspot.com/2009/11/building-lxml-on-mac-os-x-106-snow.html#comment-form' title='Number of 
comments: 0'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9058677219636510204/posts/default/6188134602200265861'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9058677219636510204/posts/default/6188134602200265861'/><link rel='alternate' type='text/html' href='http://unreasonableeffectivenessofdata.blogspot.com/2009/11/building-lxml-on-mac-os-x-106-snow.html' title='Building lxml on Mac OS X 10.6 (Snow Leopard)'/><author><name>Dušan Omerčević</name><uri>http://www.blogger.com/profile/17161430613388211010</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://4.bp.blogspot.com/_AkG5HI5UE_I/STRD9N615JI/AAAAAAAAAHI/pu1lN_AesDY/S220/dusan.jpg'/></author><thr:total>0</thr:total></entry></feed>
