Web Analytics Using MongoDB


❖  Logging with MongoDB:

The most basic requirement of web analytics is to log visits to the different pages of a web application. This section describes how to implement a logger module that records user requests to a web app in a MongoDB collection. For each request, the logger should record:

1.  The page being visited

2.  The IP address of the user

3.  The time of visit

4.  The user agent string of the browser

5.  The query parameters (if required)

6.  The time taken to generate a response, in milliseconds

We can implement request logging by creating a collection in MongoDB and inserting the data from each HTTP request into it. A capped collection, for which we specify a maximum size that MongoDB then maintains automatically, suits this kind of log.
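As a minimal sketch of the document such a logger might store, the function below assembles the six items listed above from a request. The shape of `req` (`path`, `ip`, `headers`, `query`), the field names, and the collection name `access_log` are all assumptions made for illustration; real web frameworks expose these values under their own names.

```javascript
// Builds the document to store in the log collection for one request.
// `req` is a hypothetical plain object, not the API of any specific framework.
function buildLogEntry(req, responseTimeMs, now = new Date()) {
  const entry = {
    page: req.path,                         // 1. the page being visited
    ip: req.ip,                             // 2. the IP address of the user
    visited_at: now,                        // 3. the time of visit
    user_agent: req.headers["user-agent"],  // 4. the user agent string
    response_time_ms: responseTimeMs,       // 6. time taken to respond, in ms
  };
  // 5. query parameters are optional, so store them only when present
  if (req.query && Object.keys(req.query).length > 0) {
    entry.query_params = req.query;
  }
  return entry;
}

// Example with a made-up request object
const entry = buildLogEntry(
  {
    path: "/products",
    ip: "203.0.113.7",
    headers: { "user-agent": "ExampleBrowser/1.0" },
    query: { page: "2" },
  },
  42
);
console.log(entry);
```

In the mongo shell, the resulting document could then be stored with `db.access_log.insertOne(entry)`.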

❖  Capped Collections:

A capped collection is just like any other collection in MongoDB, except that we specify its size in bytes and MongoDB maintains that size by itself. When the collection grows larger than the specified size, the oldest documents are automatically replaced with new ones. Unlike regular collections, which are created implicitly, a capped collection must be created explicitly by calling createCollection(). The second parameter to this method is an options document that marks the collection as capped and gives its size in bytes.

For example, in a database named "gfg" we can use createCollection() to create a new capped collection named Student that holds at most 4 documents.
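The Student collection described above could be created roughly as follows. This is a sketch: the wrapper function `createStudentCollection` is hypothetical, and the byte size of 10000 is an assumed value, since the text only gives the document limit of 4. `size` is mandatory for a capped collection, while `max` is the optional cap on the number of documents.

```javascript
// `db` is a handle to the "gfg" database
// (in the mongo shell: run `use gfg`, then pass the global `db`).
function createStudentCollection(db) {
  return db.createCollection("Student", {
    capped: true, // mark the collection as capped
    size: 10000,  // maximum size in bytes (assumed value)
    max: 4,       // maximum number of documents
  });
}
```

After calling `createStudentCollection(db)` in the shell, `db.Student.isCapped()` should report `true`.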

  • Features:

1.   Sorting in natural order:

Another notable feature of a capped collection is that it implements natural ordering. Natural ordering is the database’s native approach of ordering documents in a collection. When we query a collection, without specifying to sort on a certain field, we will get the documents in the order they were inserted. In a regular collection, this is not guaranteed to happen because as we update the documents, their sizes change and they are moved around to fit into the collection. A capped collection on the other hand guarantees that the documents are returned in the order of their insertion.

 2.   Update and delete documents in a capped collection:

We can update documents in a capped collection the same way we update documents in a regular collection, with one catch: the document being updated is not allowed to grow in size (otherwise the capped collection could not guarantee natural ordering). Also, we cannot delete individual documents from a capped collection. We can, however, use drop() to delete the collection entirely.
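The replacement of old documents and the natural ordering described above can be pictured with a small in-memory model. This is a toy illustration only, not MongoDB code: the class `ToyCappedCollection` is hypothetical, and it limits by document count, whereas a real capped collection is bounded by its size in bytes.

```javascript
// Toy stand-in for a capped collection, limited by number of documents.
class ToyCappedCollection {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    // Once the limit is exceeded, the oldest document makes room for the new one
    if (this.docs.length > this.maxDocs) {
      this.docs.shift();
    }
  }

  // Without an explicit sort, documents come back in insertion (natural) order
  find() {
    return [...this.docs];
  }
}

const coll = new ToyCappedCollection(4);
for (let i = 1; i <= 6; i++) {
  coll.insert({ n: i });
}
console.log(coll.find().map((d) => d.n)); // [ 3, 4, 5, 6 ]
```

Against a real capped collection, the newest documents can be read first by reversing the natural order, for example `db.Student.find().sort({ $natural: -1 })`.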

  • Convert a regular collection to a capped one

We can also turn a regular collection into a capped collection by using the following command, where "r_coll" is the name of the existing collection and size is the maximum size in bytes:

> db.runCommand({"convertToCapped": "r_coll", size: 1000000})
{ "ok" : 1 }

❖  Extracting analytics data with MapReduce:

The log contains raw data about page visits, but we need to extract meaningful information from it. For example, it might be useful to know how many times a page has been viewed over a certain time period, or what the average response time for a page is. We can compute such figures by applying MapReduce to the log.
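A possible shape for such a job is sketched below: a map function that emits one record per visit, a reduce function that sums them per page, and a finalize function that derives the average response time. The field names `page` and `response_time_ms` are assumptions about the log documents, and `runLocally` is a hypothetical helper that imitates the server in memory so the functions can be tried without a database.

```javascript
// map: MongoDB calls this once per log document, with `this` bound to it.
// `emit` is supplied by MongoDB when the job runs on the server.
function mapPageStats() {
  emit(this.page, { views: 1, total_ms: this.response_time_ms });
}

// reduce: combines all values emitted for one page.
// It returns the same shape that map emits, so it can safely be re-applied.
function reducePageStats(page, values) {
  const result = { views: 0, total_ms: 0 };
  for (const v of values) {
    result.views += v.views;
    result.total_ms += v.total_ms;
  }
  return result;
}

// finalize: runs once per page after reducing, to derive the average.
function finalizePageStats(page, value) {
  value.avg_ms = value.total_ms / value.views;
  return value;
}

// Local imitation of the server, for experimentation only.
function runLocally(docs) {
  const groups = new Map();
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) mapPageStats.call(doc);
  delete globalThis.emit;

  const out = {};
  for (const [page, values] of groups) {
    out[page] = finalizePageStats(page, reducePageStats(page, values));
  }
  return out;
}

// "/home" ends up with views = 2 and avg_ms = 20
console.log(runLocally([
  { page: "/home", response_time_ms: 10 },
  { page: "/home", response_time_ms: 30 },
  { page: "/about", response_time_ms: 8 },
]));
```

In the mongo shell the same functions could be passed to `db.access_log.mapReduce(mapPageStats, reducePageStats, { out: "page_stats", finalize: finalizePageStats })`, where `access_log` and `page_stats` are assumed collection names. Note that MongoDB 5.0 and later deprecate mapReduce in favor of the aggregation pipeline.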

It is generally not a good idea to calculate analytics with MapReduce in real time, especially if we are running a website that enjoys heavy traffic. The log will be very large and constantly growing, so running MapReduce on it takes time, and how much time depends on several factors. If we ran the page view calculation job whenever the analytics page was requested, the page would take a long time to load. Instead, we should run background processes that execute the MapReduce jobs and store the results in a collection, and have the analytics page simply read from that collection.
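The split between a background job and a cheap read could look roughly like this. It is a sketch under assumptions: the function names and the collection names `access_log` and `page_stats` are hypothetical, `db` is a mongo shell database handle, and the map and reduce functions are supplied by the caller.

```javascript
// Run from a background process (for example on a schedule),
// never from the code path that serves the analytics page.
function refreshPageStats(db, mapFn, reduceFn) {
  // Replace the previous results with freshly computed ones
  return db.access_log.mapReduce(mapFn, reduceFn, {
    out: { replace: "page_stats" },
  });
}

// Called by the analytics page: a simple lookup in the precomputed results.
// MapReduce output documents use the emitted key as `_id`.
function readPageStats(db, page) {
  return db.page_stats.findOne({ _id: page });
}
```

With this arrangement the analytics page shows results as of the last background run, which is the trade-off for keeping page loads fast.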
