So in our Blacklight-based catalog, we have a facet/limit for “Location”, that is based on the collection/location codes from holdings, and is meant to limit to just items held at a particular sub-library of our Hopkins-wide system.
We’ve gotten a new requirement, which is that when you’ve limited to any of these location limits (for instance, only items in the “Milton S. Eisenhower Library”), the result set should also include all ‘Online’ items. No matter what Location limits you’ve applied, the result set should always include all Online items too. (Mine is not to reason why…).
One thing that’s trickier than you might think is spec’ing exactly what counts as an ‘online’ item and how you identify it from the MARC records — but our Catalog already has an ‘Online’ limit, and we’ll just re-use that existing classification. How it classifies MARC records as ‘Online’ or not is a different discussion.
I can think of a couple approaches for making the feature work this way.
Option 1. Change how Blacklight app makes Solr requests
Ordinarily in a Blacklight app, if you choose a limit from a facet (say “Milton S. Eisenhower Library” from the “Location” facet), it will add an `fq` param to the Solr query: say, `&fq=location_facet:Milton S. Eisenhower Library` (except URL-encoded in the actual Solr URL of course).
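To make the “URL-encoded” part concrete, here is a minimal sketch using only Ruby's standard library. It is not how Blacklight itself builds the request, just an illustration of what the encoded parameter ends up looking like; the variable names are mine.

```ruby
require "uri"

# The fq value as written in the prose above (illustrative only).
fq = "location_facet:Milton S. Eisenhower Library"

# Form-encode it the way it would travel in a query string:
# the colon is percent-encoded and spaces become "+".
query_string = URI.encode_www_form("fq" => fq)

puts query_string
```

Decoding the string with `URI.decode_www_form` gives back the original `fq` value, which is a handy sanity check when reading Solr logs.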
So Blacklight actually gives us an easy way to customize this. We could remove add_facet_fq_to_solr from the solr_search_params_logic array in our local CatalogController, replacing it with our own custom local_add_facet_fq_to_solr method.
Our custom local method would, for facet limits from the Location facet only, add a different `fq` to the Solr query, one that looks more like `&fq=location_facet:"Milton S. Eisenhower Library" OR format_facet:Online`. For other facet limits, our custom local method would just call the original add_facet_fq_to_solr.
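The string-building half of that idea can be sketched in plain Ruby, independent of Blacklight. The helper names here (`solr_phrase`, `facet_fq`, `ONLINE_CLAUSE`) are hypothetical and are not Blacklight methods; the real `local_add_facet_fq_to_solr` would have to match whatever signature the installed Blacklight version expects, which I have not reproduced.

```ruby
# Hypothetical helpers, not part of Blacklight. They only show how the
# expanded fq string for the Location facet might be assembled.

ONLINE_CLAUSE = 'format_facet:"Online"'

# Wrap a facet value in double quotes so Solr treats it as one phrase,
# backslash-escaping any embedded quote or backslash.
def solr_phrase(value)
  '"' + value.gsub(/["\\]/) { |c| "\\" + c } + '"'
end

# Build the fq for a single facet limit. Only the Location facet gets
# the extra "OR Online" clause; every other facet is left alone.
def facet_fq(field, value)
  base = "#{field}:#{solr_phrase(value)}"
  field == "location_facet" ? "#{base} OR #{ONLINE_CLAUSE}" : base
end

puts facet_fq("location_facet", "Milton S. Eisenhower Library")
puts facet_fq("language_facet", "English")
```

Keeping the special case in one small method like this would at least make it easy to find if a Blacklight upgrade breaks the wiring around it.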
This wouldn’t change our Solr index at all, and would still make it possible to implement some other (possibly hidden back-end) feature that really limited to the original location without throwing “Online” in too, in case eventually people realize they need that after all.
I am not sure if it would affect the performance of applying those limits. I think it probably would not: that expanded `fq` with the ‘OR’ in it can be cached in the Solr filter cache same as anything else.
I worry it might be a fragile solution though, one that could break in future versions of Blacklight (say, if Blacklight refactors or renames its request builder methods, so our code is no longer successfully replacing the original logic in the `add_facet_fq_to_solr` method), and then be confusing for future developers who aren’t me to figure out why it’s broken and how to fix it. It’s potentially a bit too clever a solution.
Option 2. Change how location facet is indexed
The other option is changing how the location_facet Solr field is indexed, so every bib that is marked “Online” is also assigned to every location facet value.
Then, without any other changes at all to app code, limiting to a particular location facet value will always include every ‘Online’ record too, because all those records are simply included in every location facet value in the index.
We do our indexing with traject, and it’s fairly straightforward to implement something like this in traject.
In our indexing file, after the rule for possibly assigning ‘Online’ to the `format_facet`, we’d create a rule that looked something like this:
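    # Runs after the rule that may have added "Online" to the format field.
    each_record do |record, context|
      if (context.output_hash["format"] || []).include? "Online"
        # Make sure the field exists, then add every known location to it.
        context.output_hash["location_facet"] ||= []
        context.output_hash["location_facet"].concat all_the_locations
      end
    end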
Pretty easy-peasy, eh? I think I would have had a lot more trouble doing this concisely and maintainably in SolrMarc, but maybe that’s just because I’m more comfortable in ruby and with traject (having written traject with Bill Dueber). But I think it actually might be because traject is awesome.
The only other trick is where I get that `all_the_locations` from. My existing code uses not one but TWO different translation maps to go from MARC data to Location facet values. The only place ‘all possible locations’ exists in code is in the values in these two hashes. If I just hard code it into a variable, it’ll be fragile and easily get out of sync with those. I guess I’d have to write ruby code to look at both those location maps, get all the values, and stick em in a variable, at boot-time.
No problem, just in the traject configuration file anywhere before the indexing rule we define above:
    # Collect every location value from both translation maps, de-duplicated.
    all_the_locations = []
    all_the_locations.concat Traject::TranslationMap.new("jh_locations").to_hash.values
    all_the_locations.concat Traject::TranslationMap.new("jh_collections").to_hash.values
    all_the_locations.uniq!
The benefit of traject being just ruby is that you can just write ruby, and I’ve tried to make the traject classes and APIs flexible so you can do what you need with them (I hadn’t considered this use case specifically when I wrote the TranslationMap API, but I gave it a to_hash figuring all sorts of things could be done with that, as ruby Hash has a flexible API).
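To see what the two snippets do together without running traject at all, here is a plain-Ruby simulation. Everything in it is a stand-in: the two hashes play the role of the `jh_locations` and `jh_collections` translation maps (their codes and “Example Library B/C” values are made up), and `add_all_locations_if_online` is a hypothetical name for the body of the `each_record` rule. One deliberate difference: it uses array union (`|`) rather than `concat`, on the assumption that you would rather not index the same location value twice for a record that already had it.

```ruby
# Made-up stand-ins for the two translation maps (code => facet value).
jh_locations   = { "loc1" => "Milton S. Eisenhower Library",
                   "loc2" => "Example Library B" }
jh_collections = { "col1" => "Milton S. Eisenhower Library",
                   "col2" => "Example Library C" }

# Every distinct facet value that either map can produce.
all_the_locations = (jh_locations.values + jh_collections.values).uniq

# Stand-in for the each_record rule, applied to a bare output hash.
def add_all_locations_if_online(output_hash, all_locations)
  return output_hash unless (output_hash["format"] || []).include?("Online")

  # Union keeps existing values and adds only the missing ones.
  output_hash["location_facet"] =
    (output_hash["location_facet"] || []) | all_locations
  output_hash
end

online_record = add_all_locations_if_online(
  { "format" => ["Online"],
    "location_facet" => ["Milton S. Eisenhower Library"] },
  all_the_locations
)

print_record = add_all_locations_if_online(
  { "format" => ["Book"], "location_facet" => ["Example Library B"] },
  all_the_locations
)

p online_record["location_facet"]
p print_record["location_facet"]
```

The Online record ends up under every location, while the print record keeps only its own, which is the behavior the new requirement asks for.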
Anyhow. The benefit of this approach is that no fancy, potentially fragile “create a custom Solr query” code is needed, and the Solr `fq`s for facet queries are still ordinary “field:value” with Solr performance characteristics we are well familiar with.
Disadvantages might be that we’re adding something to our index size with all these additional postings (probably not too much though, Solr is pretty efficient with this stuff), and possibly changing the performance characteristics of our facet queries by changing the number and distribution of postings in location_facet.
Another disadvantage is that we’ve made it impossible to query the “real” location facet, without the inclusion of “Online”, but that does meet the specs we’ve been currently given.
So which approach to take?
I’m actually not entirely sure. I lean to option 2, despite its downsides, because my intuition still says it’s less fragile and easier for future developers to understand (a huge priority for me these days), but I’m not entirely sure I’m right about that.