
Given billions of the following variable length URLs, where the number of parameters depends on the parameter "type":

  • test.com/req?type=x&a=1&b=test
  • test.com/req?type=x&a=2&b=test2
  • test.com/req?type=y&a=4&b=cat&c=dog&....z=0

I would like to extract its parameters and store them in a database, in order to execute queries such as "get the number of occurrences of each possible value of parameter "a" when "type" is x" as fast as possible, taking into account that:

  • There are 100 possible values for "type".
  • There will NOT be concurrent writes/reads in the DB. First I fill the DB, then I execute queries.
  • There will be ~10 clients querying the DB.
  • There is only one machine for storing the DB (no clusters/distributed computing).

Which of the following options for the DB would be the fastest option?

1) MySQL using an EAV pattern

table 1

columns: id, type

rows:

0   |   x
1   |   x
2   |   y

table 2

columns: table1_id, param, value

rows:

0     |   a   |   1
0     |   b   |  test
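With this EAV layout, the example query ("occurrences of each value of parameter "a" when "type" is x") can be sketched as follows. This is a minimal illustration using SQLite in place of MySQL; the table, column, and index names are my own assumptions based on the schema above:

```python
import sqlite3

# In-memory database standing in for MySQL; the EAV schema mirrors the
# "table 1" / "table 2" layout from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE requests (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE params (request_id INTEGER, param TEXT, value TEXT);
-- A covering index on (param, value) is what makes the GROUP BY cheap.
CREATE INDEX idx_params ON params (param, value, request_id);
""")

conn.executemany("INSERT INTO requests VALUES (?, ?)",
                 [(0, "x"), (1, "x"), (2, "y")])
conn.executemany("INSERT INTO params VALUES (?, ?, ?)",
                 [(0, "a", "1"), (0, "b", "test"),
                  (1, "a", "2"), (1, "b", "test2"),
                  (2, "a", "4"), (2, "b", "cat")])

# "Number of occurrences of each value of parameter a where type = x"
rows = conn.execute("""
    SELECT p.value, COUNT(*) AS occurrences
    FROM requests r
    JOIN params p ON p.request_id = r.id
    WHERE r.type = 'x' AND p.param = 'a'
    GROUP BY p.value
    ORDER BY p.value
""").fetchall()
print(rows)  # → [('1', 1), ('2', 1)]
```

The join cost per query is the main weakness of EAV at billions of rows, which is why the index must cover (param, value).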

2) NoSQL (MongoDB)
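In MongoDB the same count would be an aggregation pipeline. The pipeline below is shown as plain Python dicts so it runs without a server; the collection and field names are assumptions based on the URLs in the question:

```python
# Aggregation pipeline: filter on type, then count documents per value of "a".
# With pymongo this would be executed as: db.requests.aggregate(pipeline)
pipeline = [
    {"$match": {"type": "x"}},                       # WHERE type = 'x'
    {"$group": {"_id": "$a", "count": {"$sum": 1}}}, # GROUP BY a, COUNT(*)
    {"$sort": {"count": -1}},                        # most frequent first
]
```

Because each URL is one document, there is no join; an index on {type: 1, a: 1} would let $match and $group use the index alone.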

Please feel free to suggest any other option.

Thanks in advance.

  • Why not set up a test to see which is the fastest in your case?
    – halfer
    Commented Nov 24, 2013 at 6:52

1 Answer


I think you can try ElasticSearch. It's a very fast search engine that can also be used as a document-oriented (JSON) NoSQL database. If insertion speed does not play a decisive role, it will be a good solution for your problem.

The structure of the JSON document would be: {url: "your url", type: "type from url", params: {a: "val", b: "val", ...}} or, more simply, {url: "your url", type: "type from url", a: "val", b: "val", ...}

The size of params is not fixed, because the store is schema-free.
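The document shape above, together with a terms aggregation (the standard ElasticSearch way to count occurrences per value), can be sketched as plain request bodies. The index and field names here are my own assumptions, and the query is shown as Python dicts rather than a live client call:

```python
import json

# One document per URL, following the nested variant from the answer.
doc = {
    "url": "test.com/req?type=x&a=1&b=test",
    "type": "x",
    "params": {"a": "1", "b": "test"},
}

# Terms aggregation: count occurrences of each value of params.a
# among documents whose type is x. size: 0 skips returning the hits
# themselves, since only the bucket counts are needed.
query = {
    "size": 0,
    "query": {"term": {"type": "x"}},
    "aggs": {"a_values": {"terms": {"field": "params.a"}}},
}
print(json.dumps(query))
```

The response would contain one bucket per distinct value of params.a with its doc count, which is exactly the "occurrences of each possible value" query from the question.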

  • Thanks for your answer. ES really sounds like a good option; however, I have seen ES users saying that there are still some bugs causing data loss here. Wouldn't MongoDB + ES be a safer option? Thanks.
    – D T
    Commented Nov 23, 2013 at 2:34
  • 1
    I have no experience with the MongoDB + ES, but I worked with Couchbase + ES. Couchbase replicated documents by XDCR to ES and on ES-side we execute query. So you can use Couchbase only like store, and ES like search engine. Note, that ES needs all of the documents to make index. So you have to have twice as much memory. But on your link last comment was 7 month ago. ES and Lucene have a lot of updates, maybe problems with loss data was solved. Commented Nov 23, 2013 at 12:54
