Performance notes for MongoDB
Some performance tips for improving your usage of MongoDB
I spent the past year working with MongoDB. Since our service was used for Halo and Call of Duty we had very aggressive scale and performance targets. So as part of our stress runs we identified some scalability issues that had their root cause in our data layer. Luckily MongoDB is a database that has great tools for identifying performance problems. Here are some of the approaches I did for debugging/fixing performance problems in Mongo.
Normal operations take too long
Slow Queries: This is the most common and the easiest to fix. You need to find out if you have an index that covers the query that is slow. As I mentioned before having indexes will speed up significantly your queries and as a general rule you want to have them for most of your queries. Optimize your queries with indexes!
Slow updates or inserts: This one is trickier! :) If you follow the previous recommendation you might find that your DB writes are too slow. Here are some operations that can slow down your updates
- You are grabbing and or modifying large arrays inside your documents.
- You are modifying fields that are part of an array index.
- You increase the size of the document you are modifying. This could cause it to move which will cause some IO cost while Mongo is moving stuff around
The solution to these might sound like workarounds but they help:
- Limit the size of the arrays inside your documents
- Add padding to the documents when you create them if you know they will grow.
- Change your schema to simplify your queries.
In our case we found that having indexes would help updates as well (because it would speed up the time it took to find the document to modify). But whenever the rate of updates/finds increased we found our finds would take longer (see below for the reason).
Normal operations are slow but they should not be slow
We saw this in our system a lot. Some of the queries would show as optimized in the .explain (See the previous blog for an explanation on what that is) but they would take too long to execute. Or they would show up in our profile as taking longer than 100 ms.
In our case we found that they were blocking waiting on slow operations such as updates.
You can confirm this by using the top command. This tells you how much time was spent locking for reads or writes. You can also look at the profile for the queries and it will tell you how much time it spent executing vs yielding to other operations.
{ "totals" : { "records.users" : { ... "readLock" : { "time" : 305264, "count" : 2824 }, "writeLock" : { "time" : 13, "count" : 1 }, ... } } |
Basically in MongoDB the writer is greedy. What this means is that if there is a write and a read operation the reads will yield to the write. So even if you have a fast query (that takes milliseconds to run normally) it might take very long because it’s yielding to slow updates (It's a little more complicated than that. Concurrency behavior changes depending on the engine you use and also if you use sharding or replicasets. See the FAQ for the latest update that applies to your configuration).
You have a couple of options:
- Follow the previous recommendations on speeding up updates
- Consider sharding your data so that your reads don’t wait on that many writes.
Slow Aggregates
Aggregates in Mongo are super powerful! However unlike other queries they will be touching most data in the Database (because you probably are trying to generate aggregate data or some kind of report). So they can easily become the bottleneck of your service. Also as I mentioned in a previous blog post the .explain for it is not great because it will tell you information about the indexes but not about memory.
- The first and very important tip is to take a backup of your production Database and use that to play around with the aggregates without fear of affecting your production system. Since it’s a backup with production data you will be able to measure your improvements. Super useful!
- Focus on Memory usage. MongoDB has a Hard limit for the data that is passed in the aggregation pipeline of 100MB. You can turn on Disk usage to go around this but it will really slow down everything. So make sure you are under this limit. (And yes you can save the output of the aggregates to a collection, this can can be great but there is a high perf cost to IO so you still need to reduce the memory usage as much as possible). I found that doing the aggregate in memory is the way to go, so making sure that we were passing the least amount of memory through each stage improved performance quite a bit. Use $Project as early as possible to remove fields you don’t use!
- Try to avoid tablescans or visiting every document in your collection! Again by filtering out documents that you don't care early in the pipeline you will improve performance. Use $match as soon as possible to make the aggregation pipeline happy!
Project and Match before sorting/unwinding:
- If you are using an operator like $sort, it has to go through all the documents. This can get very expensive if you haven’t done project + match before.
- Unwind creates new documents which duplicates data
- Index will help both match and sorts. So use indexes as much as possible. Note that this are only useful before you transform stuff (project, group, unwind) so make sure you measure that it helps (otherwise why add an index that is not helping?)
Make sure you MEASURE your fixes through all the pipeline. That is the great thing of using a production backup. You can get an idea of the performance impact of your changes.
Combining Sort and limit will keep around only the amount of documents that you passed to limit. So even though it will need to go through the entire collection the aggregation pipeline is smart enough to discard the documents that it does not need. Good for memory!
Project is supposed to be optimized (as in, the pipeline will try to infer if you don't need a field and move the project to the earliest possible moment). However in my local tests I found that making the change myself was much better.
So try to reduce work for the next stage in the pipeline in each step. And after every optimization make sure you measure!
Some other options (that we did not need to do)
1) Upgrade your hardware! Make sure your working set fits in RAM. We use pretty beefy machines for our Database so we did not need to worry about this.
2) Use fast disks (SSDs) to speed up writes and reads (try to avoid page faults if you can)
Good luck!