Yesterday, we made some tweaks to a couple of sections on the site with speed in mind. You may not have noticed them but they did happen. We've increased the speed at which gliphs appear on a writers page and reduced the loading time of the insights pages. The profile page changes are harder to spot as the gliphs always load in the background, but now they get there just that little bit faster. No, where the big gains have been made are on the insights pages.
If you've ever looked at your historical insight data you will have notice a very obvious lag in having them display and in some cases everything times out and you get an error page. We've managed to fix this problem and now historical insights data will load almost instantly.
And now for the science...
Ok I lied, it's not really science, more a technical discussion. Sorry! I think it's worth providing a little background to the problem before I jump into how we fixed the problem.
Here at Glipho we proud to say we use MongoDB to as our primary data store. Practically everything is stored in a Mongo database, except images. MongoDB has many great features, not least the ability to store information of any shape without requiring some form of formal schema. This does however, cause some issues.
In a tradition SQL database you tend to sit down and work out how the data will be shaped and stored, normally through a process of normalisation. After you've worked all that out you then work out what queries you might run and put indexes on queries that you know are going to be expensive. You might even go as far as using stored procedures to lock down the the types of queries run so that that you hit the correct indexes.
With MongoDB it is easy to store data, almost too easy, and instead of working out what indexes might be needed you can end up just storing and retrieving documents without really thinking of what is going on in the database. This is especially the case where, like us, you use the C# Linq Drivers. You run a Linq query, some magic happens and you get a result, normally a fair bit later. Combine this with the lack of indexes and all of a sudden you have the potential for a massive performance problem. Thankfully, all this can be easily solved.
The black box problem
Now knowing what queries are being run is just as important as having indexes on data. Putting random indexes on without knowing what's happening is as helpful as pissing into the wind. Unfortunately, the C# Linq driver doesn't help with this, at least at first glance, that is until you realise there is a way to get the actual query JSON that will be sent to the database.
If you have a Linq query similar to that below, it's not immediately obvious as to how to get the actual MongoDB query.
MongoDatabase db = MongoDatabase.Create(connectionString); MongoCollection<BsonDocument> entitiesCollection = db.GetCollection(collectionName); IQueryable<SomeEntity> entities = entitiesCollection.AsQueryable<SomeEntity>(); entities = entities.Where(e => e.Name == "A Query");
However if you recast the entities object, like below, you will find the useful GetMongoQuery() method.
(entities as MongoQueryable<SomeEntity>).GetMongoQuery();
By calling this method you get the JSON for the query. Armed with this information you can now see what is going on. If, like us, you had written a Linq query without really caring how it would be queried then you'll probably need to start by rewriting the query, especially when do range queries as happens for historical insights data. The next bit is relatively simple; write indexes for the queries.
The moral of the story?
Know what you queries are doing. Have indexes for those queries. Just because MongoDB doesn't need them, doesn't mean you shouldn't have them. The moment your data set get over 1000 entries you really need indexes. No scratch that, you need indexes full stop!