Now that I live on the West Coast, I've been able to attend many of the wonderful tech conferences that are hosted in the Bay Area. Yesterday, I attended MongoSF in beautiful San Francisco. I've been using MongoDB for a while now, mostly for personal projects. I've written some projects on GitHub that use MongoDB as the primary data store, and I have also migrated some existing MySQL tables in other projects to use MongoDB instead. Having read MongoDB: The Definitive Guide from front to back, and having spent quite some time on the MongoDB docs, I feel like I have a good grasp on the tool. So I was pretty excited to go to this conference and discover some more things about one of my favorite pieces of tech.
There were multiple tracks in this conference, and unfortunately I still can't clone myself in this day and age, so I'll just briefly touch on those sessions that I was able to attend.
Monitoring & Queuing MongoDB: This talk, given by David Mytton from Server Density, touched on some of the integrated monitoring tools and commands that MongoDB has baked in. He also showed a bit of Server Density's MongoDB monitoring system, which looks to be incredibly useful. Overall, the talk was decent, but anyone who has MongoDB running in production should know most of this stuff already.
Evolving from relational to document store: Graham Tackley, lead for the web development team over at the U.K.'s Guardian news site, gave an interesting talk on how the site has evolved since the mid-90s (lots of Perl / CGI goodness). Currently they are in the process of moving certain parts of the site to use MongoDB. Besides the history lesson on the site, he also mentioned how they are dealing with possible future changes in architecture, notably by building APIs around the site functionality. This talk got my gears turning about some of my own projects, and how I might build any new projects I have in mind.
MongoDB Profiling and Tuning: This talk was given by Kenny Gorman, who works as a data architect at Shutterfly. Kenny went through some of the steps used to profile MongoDB, like using explain() on your queries, and how to make things faster, not only on the software side, but also on the hardware side. He brought up Facebook's Flashcache, and how it can speed up MongoDB. I particularly enjoyed hearing the hardware side of things, as I feel like hardware is mostly overlooked by developers.
MongoDB's New Aggregation Features - A Sneak Peek: Chris Westin is a core MongoDB contributor, and he gave us a sneak peek at a new framework for aggregating records in MongoDB. This framework is really not meant to replace Map/Reduce, which will still serve very well for massive data. But for people like me, who need to aggregate smaller amounts of data (thousands of documents instead of millions), this will be much easier and faster to deal with. Very cool stuff, and I can't wait to use it.
Lessons Learned from Migrating 2+ Billion Documents at Craigslist: Former Yahoo! and current Craigslist employee Jeremy Zawodny spoke about how Craigslist is using MongoDB for their posting archive, and the lessons learned along the way, like the usage of replica sets even in development, document encoding, and how important data types are when migrating collections. He had a similar talk at MongoSV a few months ago, so I didn't feel like there was much new information here. Still, a good talk about migrating a large amount of data from MySQL to MongoDB - it can be done.
Practical Scaling and Sharding: Eliot Horowitz is one of the main MongoDB contributors, and the CTO of the company that backs MongoDB, 10gen. He went through the features and usage of Replica Sets and Sharding, with a few use cases and live examples. This seemed like an introductory talk more than anything else, so there was nothing groundbreaking here.
MongoDB at Foursquare: This talk was given by Jorge Ortiz, an engineer at Foursquare, who briefly covered how MongoDB is being used at Foursquare, some of the lessons they've learned throughout the years with MongoDB, and Rogue, their Scala library for querying MongoDB. Frankly, I was disappointed with this talk, as Jorge didn't give much insight outside of a few numbers and oft-repeated tips. I was expecting a more informative talk here.
Indexing & Query Optimization: Alvin Richards, a West Coast 10gen employee, gave an in-depth talk about indexes in MongoDB. He went through everything, from basic indexes, to indexing order, to new indexing options in MongoDB 1.8 (sparse indexes and covered indexes), to even showing representations of the internal B-Tree implementation. Very informative.
Lightning Talks: This was divided into three shorter talks. First up was Michael Goff, who spoke about how his company, Cocoafish, uses MongoDB to serve up data to mobile apps. Next was Chris Carrier (Is he on Twitter? I couldn't find an account to link here) from Zuberance, speaking about how to create a reporting backend using MongoDB. Last, but not least, was Chad Arimura, who went through SimpleWorker, a cloud-based job scheduling service that uses MongoDB. These talks were short - which you might have figured out with 'Lightning' in the title - and seemed to be mostly about the speakers' particular sites more than anything else.
At the end, Eliot Horowitz gave a quick rundown on the upcoming features for MongoDB 2.0, like TTL collections, online data compaction, faster Map/Reduce, etc. Surprisingly, this 2.0 release is scheduled for June 2011, as in next month. Seems like they're going to be doing quicker iterations and getting new stuff out there as soon as possible, which can only lead to good stuff for users. There was an after-party, but when I swung by the place there was a massive line outside to get in, so I decided to head home instead. Unfortunately, this is the third time in as many conference after-parties that I've had to do this, so it seems like these after-parties don't scale well.
Overall, while the conference was pretty good, and everything was organized very well, most of the time I felt like I was out of place or at the wrong talk. Judging from the few people I spoke to, and those that I overheard, it seems like most people at the conference hadn't used MongoDB much (or at all), and many talks I went to touched on what I consider some of the basics of MongoDB. Even when the topics weren't basic, like Replica Sets or Sharding, if you have read a recent MongoDB book, you knew what most of the speakers were talking about. Also, there was plenty of repeating of the same tips over and over again in a lot of the talks. For example, of the nine talks I went to, the speakers mentioned the "always keep indexes in memory" rule of thumb in at least five. Truth be told, MongoDB isn't really a super-deep technology (even the aforementioned O'Reilly book clocks in at a bit north of 200 pages), so this may be the reason why. But I wanted to know more about the upcoming features in MongoDB (I only really saw one talk in the schedule like this, which was the new aggregation framework), and to see some more in-depth views of how companies are leveraging MongoDB in their technology stack instead of just pointing out how awesome MongoDB is - which we already know.
In any case, I had tons of fun, learned some new tips and tricks, and got some fresh inspiration to use in my own work soon. The best thing about these conferences is knowing that what you're learning and using is valuable not just to you, but to many out there, and it just gives you motivation to keep on using those tools in familiar and new ways. Much props go out to 10gen for making this conference go smoothly, and hopefully there's another one of these sometime next year.