Using the MongoDB Aggregation Framework in Grails

Introduction

The long-awaited MongoDB Aggregation Framework shipped with MongoDB 2.2, providing a vastly improved capacity for aggregation, metrics, and report-style queries that previously required MapReduce. Below, I rip off, verbatim, some examples from the MongoDB documentation to demonstrate their usage in a Grails application. To use the aggregation framework, you’ll need to download, install, and run MongoDB 2.2+.

In addition, the latest version of the MongoDB plugin for Grails (as of this writing) uses older jars, which we’ll override in our BuildConfig.groovy.

Props: Tremendous gratitude to Graeme and the rest of the plugin contributors. It’s been an absolute pleasure to use with Mongo.

More props: Additional gratitude to Paulo Gabriel Poiati for GMongo. Again: a pleasure to use. Kudos for its sheer simplicity.

Using the latest Java Mongo Driver and GMongo

To use the Aggregation Framework, we need the latest Java driver (2.9+) and the latest GMongo. The former exposes the aggregate() method, and the latter enables us to pass Groovy maps to aggregate() rather than DBObjects. In BuildConfig.groovy, add the following lines to your dependencies{} and plugins{} blocks. Note that I’ve also removed hibernate and database-migration as they’re not necessary if working with Mongo and no additional datasource.
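A minimal sketch of what those BuildConfig.groovy additions could look like. The version numbers, the plugin version, and the excludes are assumptions for illustration, not the exact lines from the sample project; adjust them to whatever is current for your setup.

```groovy
// grails-app/conf/BuildConfig.groovy (fragment; all versions below are assumed)
grails.project.dependency.resolution = {
    inherits("global")
    repositories {
        grailsCentral()
        mavenCentral()
    }
    dependencies {
        // Java driver 2.9+ exposes DBCollection.aggregate()
        compile "org.mongodb:mongo-java-driver:2.9.1"
        // Newer GMongo lets aggregate() accept Groovy maps instead of DBObjects
        compile "com.gmongo:gmongo:1.0"
    }
    plugins {
        // hibernate and database-migration are intentionally left out
        compile(":mongodb:1.0.0.GA") {
            // Assumption: keep the plugin's older transitive jars off the classpath
            excludes "mongo-java-driver", "gmongo"
        }
    }
}
```

Whether the excludes are needed depends on the plugin version you have; the point is simply that the newer driver and GMongo jars win.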

Run grails clean and grails refresh-dependencies, and you should see the new jars show up and the old ones disappear.

An Aggregation Example

Overview

In the sample project linked above, I’m using a single Service which exposes several methods that use the aggregation examples provided by the MongoDB docs. The point here is not original thought; rather, the point is to translate those examples into Groovy. These examples are based on a Collection of Zipcodes with population data. The project linked above contains a JSON file which you can import into your MongoDB. See scripts/README for instructions.

Inject mongo

First, in ZipCodeService.groovy, we have Spring inject an instance of GMongo, by name mongo. Then, we get a handle on the DB and zipcodes collection:
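As a rough sketch, the injection and the collection handle might look like the following. The database name "zips" and the getZipcodes() helper are hypothetical names chosen for illustration; use whatever database you imported the data into.

```groovy
class ZipCodeService {

    // Spring injects the GMongo instance by bean name
    def mongo

    // Hypothetical helper: "zips" is an assumed database name
    def getZipcodes() {
        mongo.getDB("zips").getCollection("zipcodes")
    }
}
```

Because the property is untyped, the service stays easy to exercise with a stand-in for mongo.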

Example aggregation method

Taken from the docs and translated to Groovy, we have a method for returning population by state. This will look at every zipcode in the collection, sum its population, and group by State, sorting by population in ascending order. If you value open space and prefer a likelihood of seeing nothing but gas fields, consider North Dakota, FYI.

Note the multi-grouping idiom, which is common in the aggregation framework. Unlike SQL, where we could perform such grouping in one swoop, in this example we’ll first group by state and city, then pipeline that down to grouping by state before passing the results to the sort.
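The two-step grouping described above can be sketched as a three-stage pipeline of plain Groovy maps. The field names state, city, and pop follow the zip code data set in the MongoDB docs; the output field name totalPop and the two function names are assumptions for illustration, and in the service these would be methods rather than script-level functions.

```groovy
// Builds the pipeline stages as plain Groovy maps.
// Single quotes keep '$state' and friends from being read as GString interpolation.
List<Map> populationByStatePipeline() {
    [
        // Stage 1: one document per (state, city), summing the zipcode populations
        [$group: [_id: [state: '$state', city: '$city'], pop: [$sum: '$pop']]],
        // Stage 2: roll the city totals up into one document per state
        [$group: [_id: '$_id.state', totalPop: [$sum: '$pop']]],
        // Stage 3: sort ascending by total population
        [$sort: [totalPop: 1]]
    ]
}

// zipcodes is assumed to be the GMongo-wrapped collection, which accepts maps
def populationByState(zipcodes) {
    def (first, second, third) = populationByStatePipeline()
    zipcodes.aggregate(first, second, third).results()
}
```

Keeping the stages in their own function makes it easy to inspect or tweak the pipeline without touching the database call.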

This iterative approach to aggregation — starting with a large resultset and whittling it down further — quite appeals to me as it mimics how I use CTEs (Common Table Expressions) to do the same when working with SQL Server.

aggregate() returns an AggregationOutput object containing a number of useful methods for getting at the results in addition to very useful metadata regarding the aggregation.

In this example, I’m returning the result of AggregationOutput.results(), which is an iterable of DBObjects; in Groovy you can treat it much like a list of Maps.
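A small sketch of pulling both the rows and some of the metadata out of an AggregationOutput. The describeAggregation name and the totalPop field are assumptions for illustration; results(), getCommand(), and getServerUsed() are the accessors on the 2.x Java driver's AggregationOutput.

```groovy
// output is assumed to be the AggregationOutput returned by aggregate()
def describeAggregation(output) {
    [
        // Each result behaves like a Map, so property access works on its keys
        rows   : output.results().collect { [state: it._id, totalPop: it.totalPop] },
        // The aggregate command that was sent to the server
        command: output.command,
        // The address of the server that handled it
        server : output.serverUsed
    ]
}
```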

Aggregation Output

The result of the aggregation looks like this, truncated for blog-friendliness:

Use in a View

You’re certainly free to further transform that result into an array of proper objects. For my purposes here, that’s not necessary. Here’s a simple display for that result:
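On the controller side, the view only needs the list placed in its model. A minimal sketch, assuming a hypothetical ZipCodeController, a service method named populationByState(), and a model key named results:

```groovy
class ZipCodeController {

    // Injected by Grails by name
    def zipCodeService

    // Returning a map makes its entries available to the GSP of the same name
    def populationByState() {
        [results: zipCodeService.populationByState()]
    }
}
```

The GSP can then loop over results and print each row's _id and total population.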

All the Code

Again, see the link at the top of this post for all the code.

What’s Next?

As I was porting these examples, I settled into a very comfortable and enjoyable workflow for testing queries in the context of my Grails application without re-initializing the framework, running tests, or refreshing my browser. In a future post, I’ll describe how I use the Grails console to mimic the approach I typically take to developing non-trivial queries in a more traditional relational database application.

For My ColdFusion Friends

My ColdFusion friends may be wondering, how do I use the Aggregation Framework in ColdFusion, specifically CFMongoDB? Good news! Sean Daniels has contributed this addition and it’ll end up in CFMongoDB once we finalize the API, which as of this writing should be very soon. Thanks Sean!

Marc's been coding apps on the JVM for over 10 years. He's worked in the print, financial services, and government contracting industries; he currently spends his days working for a U.S. Government Agency, using technology to help better the lives of American consumers. In the past, Marc contributed to MXUnit, CFMongoDB, and CFConcurrent. He lives and works in York, PA. Cats make him sneeze; bikes and banjos make him happy. Current programming language interests: Go and Clojure.

Posted in Groovy/Grails, MongoDB
5 comments on “Using the MongoDB Aggregation Framework in Grails”
  1. Will populationByState cache after the first call? I’d imagine this would be a ‘slow’ (comparatively) call and something you wouldn’t want to run every time you render the view.

  2. Marc Esher says:

    Ray, no it will not cache. You could take the same approach here as you would in a ColdFusion app though: use ehcache, etc.

    • Hmm – do you think you ever wouldn’t cache? This seems like a _lot_ of work if you have even a somewhat small dataset? Or am I wrong? It’s been a while since I tried Mongo. I remember it being fast.

  3. francesca k says:

    Also would love to chat with you about MongoDB — want to email me francesca at 10gen dot com

  4. Marc Esher says:

    Ray, I think the answer to your question is actually database-agnostic. Replace the above with a relational database and you have the same questions: How frequently does the data change? What is the cost of this query? Can I run it on a schedule and cache the results? Do I need real-time data? If I need real-time data should I consider a materialized view?

    In this example, where the data are quite static, then yeah, absolutely it’s cache-worthy. However, if I’m storing real-time traffic or shopping cart patterns or performance logs, then probably I want it real time. However, in that case, you would probably look into just fetching the latest data and aggregating it, keeping track of where you left off (which is common practice).
