Wednesday, 10 March 2010

Boosting more recent content in a custom Apache Solr request handler


Users prefer recent information. Apache Solr 1.4 has excellent built-in support for boosting more recent content, but adding the same functionality to a custom request handler requires some digging into Solr internals. Digging into Solr is really cool, but you might appreciate a ready-made solution. So here's the code that boosts more recent content for a given query.

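import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.Query;
import org.apache.solr.search.function.BoostedQuery;
import org.apache.solr.search.function.DocValues;
import org.apache.solr.search.function.DualFloatFunction;
import org.apache.solr.search.function.ReciprocalFloatFunction;
import org.apache.solr.search.function.ValueSource;

/* The present time in milliseconds since the epoch. It must be
 * final so that the anonymous class below can reference it. */
final long now = System.currentTimeMillis();

/* ValueSource that reads the document date from a TrieDateField */
ValueSource document_date =
    new TrieDateFieldSource("document_date_field",
        FieldCache.NUMERIC_UTILS_LONG_PARSER);

/* ValueSource that calculates the number of milliseconds
 * between the document_date (e.g. blog publication date)
 * and the present time, i.e. now. */
ValueSource vs = new DualFloatFunction(
        new LongConstValueSource(now), document_date) {

    private static final long serialVersionUID = 1L;

    protected String name() { return "ms"; }

    protected float func(int doc, DocValues aVals,
            DocValues bVals) {
        /* bVals holds the document date as a long */
        return now - bVals.longVal(doc);
    }
};

/* ReciprocalFloatFunction implements a reciprocal
 * function f(x) = a/(mx+b), based on the float value
 * of a field or function as exported by ValueSource vs.
 * Values m, a, and b are float constants. */
ValueSource recip = new ReciprocalFloatFunction(vs, m, a, b);

/* Boosting a given query with the reciprocal function recip */
Query boostedQuery = new BoostedQuery(query, recip);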
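
To get a feel for how the constants m, a, and b shape the boost, here is a small standalone sketch, independent of Solr, that evaluates the same reciprocal formula f(x) = a/(mx+b) for a few document ages. The values m = 3.16e-11 (roughly one over the number of milliseconds in a year), a = 1, and b = 1 are assumptions chosen for illustration, not values taken from the handler above, and the class and method names are hypothetical.

```java
public class RecencyBoostSketch {

    // Assumed constants for illustration only:
    // M is roughly 1 / (milliseconds in one year).
    public static final float M = 3.16e-11f;
    public static final float A = 1f;
    public static final float B = 1f;

    /** Plain reciprocal function f(x) = a / (m * x + b). */
    public static float recip(float x, float m, float a, float b) {
        return a / (m * x + b);
    }

    public static void main(String[] args) {
        long msPerDay = 24L * 60 * 60 * 1000;
        long[] agesInDays = {0, 30, 365, 3650};

        for (long days : agesInDays) {
            // Document age in milliseconds, i.e. now - document_date
            float ageMs = days * msPerDay;
            System.out.printf("age = %d days, boost = %.3f%n",
                    days, recip(ageMs, M, A, B));
        }
    }
}
```

With these assumed constants, a brand-new document gets a boost of 1 and a one-year-old document gets roughly half of that, so the boost decays smoothly with age instead of cutting off old content.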


Notes:
  • The date field must be of the type "solr.TrieDateField" (introduced in Solr 1.4) for the above recipe to work.
  • Unfortunately, the org.apache.solr.search.LongConstValueSource and org.apache.solr.schema.TrieDateFieldSource classes are not public. I've asked the Solr team to make these two classes public. Until they do, you should copy them from the Solr source code into your project.

