3

Ive used MySQL since I was a young boy and now I have to switch to MongoDB for various reasons.

I write a log which saves every php error in a MongoDB collection. Reading out the errors is not the problem, using a simple find() is pretty easy and it's great that you can use php arrays to get to the data.

Now I want to have some statistics about the errors. My Collection looks like so:

 {
   "_id": ObjectId("51af10ca0e5e723c0a000000"),
   "Errors": {
     "2048": {
       "0": {
         "Message": "Declaration of ADODB_mysqli::MetaIndexes() should be compatible with ADOConnection::MetaIndexes($table, $primary = false, $owner = false)",
         "File": "File.php",
         "Line": NumberInt(29),
         "Time": NumberInt(1370427591)
      }
    },
     "2": {
       "0": {
         "Message": "Error",
         "File": "File.php",
         "Line": NumberInt(29),
         "Time": NumberInt(1370427591)
      },
      "1": {
         "Message": "Error",
         "File": "File.php",
         "Line": NumberInt(29),
         "Time": NumberInt(1370427591)
      }
    },
    "8": {
       "0": {
         "Message": "Undefined index: PluginLastAdded",
         "File": "File.php",
         "Line": NumberInt(36),
         "Time": NumberInt(1370427594)   
      },
       "1": {
         "Message": "Undefined index: PluginLastAdded",
         "File": "File.php",
         "Line": NumberInt(36),
         "Time": NumberInt(1370427594)   
      }
    }
  }
}

Now I want to know how often each error in this entry occours. It would be great to have a seperated list into 2048, 2, 8 and then the count of each error.

Is that possible without much php code but using MongoDB's aggregation?

Any help would be great, MongoDB is 180° different than MySQL in my opinion, and the switch is pretty hard.

2
  • One problem I can is your schema, you cannot easily unwind and aggregate that schema unless with a map reduce. I would personally recommend you change it so that the error code is not the key but rather a field in the document
    – Sammaye
    Commented Jun 6, 2013 at 12:52
  • Thank you, I'll try it this way. Maybe I can come a cross a solution now. :)
    – Moe
    Commented Jun 6, 2013 at 13:26

1 Answer 1

1

To build upon what Sammaye mentioned above, the following schema with real arrays would be more suitable:

{
    "_id": ObjectId("51af10ca0e5e723c0a000000"),
    "errors": [
        {
            "code": 2048,
            "message": "Declaration of ADODB_mysqli::MetaIndexes() should be compatible with ADOConnection::MetaIndexes($table, $primary = false, $owner = false)",
            "file": "File.php",
            "line": NumberInt(29),
            "time": NumberInt(1370427591)
        },
        {
            "code": 2,
            "message": "Error",
            "file": "File.php",
            "line": NumberInt(29),
            "time": NumberInt(1370427591)
        },
        {
            "code": 2,
            "message": "Error",
            "file": "File.php",
            "line": NumberInt(29),
            "time": NumberInt(1370427591)
        },
        {
            "code": 8,
            "message": "Undefined index: PluginLastAdded",
            "file": "File.php",
            "line": NumberInt(36),
            "time": NumberInt(1370427594)
        },
        {
            "code": 8,
            "message": "Undefined index: PluginLastAdded",
            "file": "File.php",
            "line": NumberInt(36),
            "time": NumberInt(1370427594)
        }
    ]
}

The array structure also makes indexing and querying much more straightforward. Indexes are capable indexing array values, and MongoDB also makes it easy to query on arrays. For instance, you have the flexibility to query for particular errors (perhaps a combination of code and file) with $elemMatch. Additionally, since errors is a real array, you can use various update operators, such as $push or $pull.

One thing to consider is that nested objects restrict the way you index and write queries. In your previous example, the only way to query for the first error message is by Errors.2048.0.Message, but using the schema above would allow for a query on errors.message.

Arrays also makes the Aggregation Framework a viable option for you, particularly since it will allow you to iterate through arrays with its $unwind operator and then $group on values within array elements. In addition to the MongoDB documentation on the aggregation framework, you may find this presentation helpful, goes through the different operators visually.

To your earlier question on obtaining a count of the number of errors per code, the following aggregation framework pipeline would calculate that across all documents in the entire collection:

db.foo.aggregate([
    { $unwind: "$errors" },
    { $group: {
        _id: "$errors.code",
        num: { $sum: 1 }
    }}
]);

Lastly, I would suggest storing the time field as a BSON date (MongoDate in PHP) instead of an integer. That opens up the option of using date operators in the aggregation framework.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.