Sunday, 6 November 2011

Comments are not code

Code review. Image by Richard Masoner / Cyclelicious via Flickr
I'm a firm believer that the best software documentation is the running code. If the code is well structured and written, it speaks for itself and it does not need any additional documentation. Comments are not code and therefore should not be used where better code organization would suffice.

A misuse of comments that I often see while doing code reviews is using them to divide a method into logical subunits. For example:

def check_specific_candidate(candidate):
    
    # first check if we already have X by any chance
    < 10 lines of code, return if true>

    # Try out if candidate is Y
    < 30 lines of code, return if true>

    # candidate is not Y, try out if it is Z
    < another 30 lines of code, return if true> 

    # construct a list of elements in the candidate
    < another 30 lines of code>

    if len(list_of_elements) > 0:
        # process list of elements for the candidate
        < another 10 lines of code>

This example is based on an actual routine in the Zemanta code base that is altogether 140 lines long. Maintaining such code is not a nice experience. While the comments in this routine do help, they are actually a symptom of a larger problem: poor code organization. The comments would immediately become redundant if the routine were split into logical steps, with each step being a separate routine. Let's refactor the above routine accordingly:

def check_specific_candidate(candidate):

    if _candidate_has_X(candidate):
        return

    if _candidate_is_Y(candidate):
        return

    if _candidate_is_Z(candidate):
        return

    list_of_elements = _get_list_of_elements(candidate)
    if len(list_of_elements) > 0:
        _process_list_of_elements(list_of_elements)

So instead of comments, this routine is now documented by its function names. When you approach such code for the first time, seeing a nice 15-line routine is much less stressful than seeing a 140-line monster.

Monday, 22 August 2011

#sigir2011

It's more than a bit ironic that a premier conference on information retrieval took place behind the Great Firewall and consequently without discussions on Twitter. But the Chinese have also become great scientists and are no longer just cheap labor, so I guess they have well deserved to host this event.

The 34th SIGIR conference, held in Beijing, was attended by more than 800 people from around the world (China 400, USA 250, Europe 100, ...). The acceptance rate for papers was only 20%, which makes this one of the more competitive conferences. A nice surprise this year was that the level of the presentations was substantially better than last year, with almost all speakers giving their talks in comprehensible English and with good rhetorical skills.
Bruce Croft (program chair) presenting basic facts about the conference
What makes the field of IR different from other scientific fields is the influence of industry and its research labs. Almost 50% of the papers had at least one author from Microsoft, Google, Yahoo, Facebook, Yandex, Baidu or some other company. Therefore, while SIGIR is a scientific conference, I got the feeling that it is very much oriented towards the real problems of the industry. If this assumption is correct, then we could perhaps deduce the problems of the industry by examining the share of papers in different areas.

Top 5 areas for accepted papers
The main stress of SIGIR2011 could be summed up as "find data that solve the problem". Here are a couple of examples of this approach in action:
  • The best paper award was given to Mikhail Ageev from Russia, who devised a simple game that enabled the collection of data for measuring the success of search. They collected search trails from approximately 150 users using Mechanical Turk, and that was sufficient to learn a model that predicts whether or not the user found the information he was searching for. This technique enables Google et al. to automatically evaluate the quality of their search.
  • PICASSO is a system by Aleksandar Stupar that, given an image, recommends related music. The main idea behind this system is to use movies and their soundtracks to learn the relation between images and music.
  • Guys from Microsoft presented a clever way to identify the geographical relevance of a web site: just track where its readers come from.
Overall (and excluding censorship) I liked SIGIR2011 in Beijing more than last year's conference in Geneva. Last year too much stress was put on rigorous evaluation, while this year the program committee allowed for bolder thinking. I got many good ideas while attending SIGIR2011 and you may expect many of them to be implemented in Zemanta soon.


Looking forward to SIGIR2012!

Wednesday, 11 May 2011

Startup Slovenia

Dragon Bridge in Ljubljana, Slovenia. Image by FromTheNorth via Flickr
Today I attended a talk by Robert Farazin about DoubleRecall's successful application to Y Combinator. While the talk was great, what really impressed me was the attendance of some 200 people. Startups are mushrooming in Slovenia at the moment and hopefully many more will join the ranks of Zemanta, Celtra, Outfit7, Vox.io and DoubleRecall. Everything seems to be in place for Ljubljana to become the Boulder of Europe. I am keeping my fingers crossed for some great exits that would enable the creation of a proper startup ecosystem.

Ten years ago I was trying to start a company that set out to achieve something similar to what NetSuite later managed to achieve. As I recollect those times now, the first thing that comes to my mind is how doomed to fail we really were. At that time the only VC fund at least remotely interested in funding Eastern European ventures was a murky fund from Vienna called Red-stars.com whose motto was "from communism to .com". At that time there were no people around here to tell us that a startup does not need a fifty-page business plan. At that time the only two other "start-ups" that we could share experiences with were two dubious endeavors, the first being Telemach and the second EON of Zoran Thaler. At that time the nearest event for start-ups was First Tuesday in Zagreb.

I am so glad to see how much the environment for start-ups has changed for the better in these ten years, and I am really grateful that I have an opportunity to contribute to Startup Slovenia myself.