mongodb - NoSQL database design for complex querying


I have a NoSQL question.

Let's take the following scenario:

A user has a status that changes (let's say every second) and has different characteristics such as country (up to 10k characteristics per user). A user can also post messages, which have different types.

The issue:

In my opinion this scenario is RDBMS-oriented, since joins would be used a lot for querying. However, that is not an option (for the sake of the exercise). Therefore, I am not looking for a pseudo-RDBMS solution such as Hive, or any other solution that has a pseudo join. I am looking at something like MongoDB, where you can use MapReduce or aggregation.

My solution (using MongoDB):

Let's have 3 collections:

  1. user => user characteristics (a large number of different characteristics such as age, sex...)
  2. message => message-specific fields
  3. status => status-specific fields
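
To make the three collections concrete, here is a minimal sketch of what one document in each might look like, written as plain Python dicts. Every field name (`characteristics`, `user_id`, `type`, `body`, `created_at`) is an assumption made for illustration; the scenario itself does not fix them. The `user_id` link is included only so the sample documents relate to each other.

```python
from datetime import datetime, timezone

# Hypothetical document shapes; every field name here is an assumption.
user = {
    "_id": 1,
    # In the scenario this sub-document could hold up to ~10k entries.
    "characteristics": {"country": "FR", "age": 31, "sex": "F"},
}

message = {
    "_id": 100,
    "user_id": 1,      # who posted the message
    "type": "x",       # message type
    "body": "hello",
}

status = {
    "_id": 1000,
    "user_id": 1,      # whose status this is
    "type": "z",       # status value
    # A timestamp is needed to tell which status is the *last* one.
    "created_at": datetime(2015, 1, 1, tzinfo=timezone.utc),
}
```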

The possible ways to tackle the problem (as far as I know) are:

  1. Denormalize the data by duplicating the user fields and embedding them in message and status (or by putting everything in 1 collection) => this does not seem optimized, as you have a lot of characteristics per user and can reach the 16 MB limit on document size (you could use GridFS, but I am worried about its performance, and you would duplicate tons of useless data in storage).
  2. Use the SQL-style solution of adding a user_id reference to message and status => this seems a reasonable solution. However, you are trapped (in terms of query performance) if you want to make specific queries, such as: count the number of messages of type X for users whose last status equals Z and whose characteristic Y equals E, and group them by characteristic W. In SQL: SELECT COUNT(*), user.characteristicW FROM message INNER JOIN status ON status.user_id = message.user_id INNER JOIN user ON user.id = message.user_id WHERE message.type = 'X' AND status.type = 'Z' AND user.characteristicY = 'E' GROUP BY user.characteristicW (this query is not totally exact, since I want to know whether the last status equals Z, not whether the user ever had a status equal to Z; that would require a select within a select, which is not the point of the exercise). In MongoDB this becomes demanding, as you have to make several queries (in the example: one to get the ids of users whose last status equals Z, one to filter that id list down to users whose characteristic Y equals E, then get all messages of these users and group them by characteristic W with a map-reduce job).
  3. Go with a double reference: user references message and message references user; status references user and user references status. => This might seem fine, but the user document becoming big could lead to the same issue as solution 1, assuming there are tons of messages and statuses.
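
The several round trips described in option 2 can be sketched in plain Python over in-memory lists standing in for the three collections. This is only an illustration of the logic, not MongoDB code: the sample data, the `ts` field used to order statuses, and the function name `count_messages_by_w` are all hypothetical.

```python
from collections import Counter

# Tiny in-memory stand-ins for the three collections (hypothetical data).
users = [
    {"_id": 1, "characteristics": {"y": "e", "w": "red"}},
    {"_id": 2, "characteristics": {"y": "e", "w": "blue"}},
    {"_id": 3, "characteristics": {"y": "other", "w": "red"}},
]
statuses = [
    {"user_id": 1, "type": "a", "ts": 1},
    {"user_id": 1, "type": "z", "ts": 2},  # last status of user 1 is z
    {"user_id": 2, "type": "z", "ts": 1},
    {"user_id": 2, "type": "b", "ts": 2},  # user 2 had z once, but not last
    {"user_id": 3, "type": "z", "ts": 1},  # last status is z, but y != e
]
messages = [
    {"user_id": 1, "type": "x"},
    {"user_id": 1, "type": "x"},
    {"user_id": 1, "type": "other"},
    {"user_id": 2, "type": "x"},
    {"user_id": 3, "type": "x"},
]


def count_messages_by_w(users, statuses, messages, msg_type, last_status, y_value):
    # Step 1: ids of users whose most recent status equals last_status.
    latest = {}
    for s in statuses:
        uid = s["user_id"]
        if uid not in latest or s["ts"] > latest[uid]["ts"]:
            latest[uid] = s
    ids = {uid for uid, s in latest.items() if s["type"] == last_status}

    # Step 2: keep the users whose characteristic y matches; remember their w.
    w_by_user = {
        u["_id"]: u["characteristics"]["w"]
        for u in users
        if u["_id"] in ids and u["characteristics"].get("y") == y_value
    }

    # Step 3: count the matching messages, grouped by characteristic w.
    counts = Counter()
    for m in messages:
        if m["type"] == msg_type and m["user_id"] in w_by_user:
            counts[w_by_user[m["user_id"]]] += 1
    return dict(counts)


result = count_messages_by_w(users, statuses, messages, "x", "z", "e")
print(result)  # {'red': 2}
```

Each step depends on the id list produced by the previous one, which is why the work cannot be expressed as a single query once the data lives in three separate collections.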

I went with option 2, but I am unhappy with it, as the processing time of the query does not seem optimized.
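
For reference, the round trips of option 2 might look roughly like the following in MongoDB query syntax, written here as plain Python dicts and lists. This is a sketch under assumptions: the field names (`user_id`, `type`, `created_at`, `characteristics.y`) and the helper function names are hypothetical, and nothing is executed against a database.

```python
# Plain data structures in MongoDB query syntax; field names are assumptions.

def last_status_pipeline(status_type):
    """Aggregation on `status`: users whose latest status has the given type."""
    return [
        {"$sort": {"created_at": -1}},  # newest status first
        {"$group": {"_id": "$user_id", "last_type": {"$first": "$type"}}},
        {"$match": {"last_type": status_type}},
    ]


def user_filter(user_ids, y_value):
    """Filter on `user`: restrict an id list to users with characteristic y."""
    return {"_id": {"$in": list(user_ids)}, "characteristics.y": y_value}


def message_count_pipeline(user_ids, msg_type):
    """Aggregation on `message`: per-user count of messages of one type."""
    return [
        {"$match": {"type": msg_type, "user_id": {"$in": list(user_ids)}}},
        {"$group": {"_id": "$user_id", "count": {"$sum": 1}}},
    ]
```

With a driver such as PyMongo, the two pipelines would be passed to `collection.aggregate(...)` and the filter to `collection.find(...)`. Because message documents do not carry characteristic W in this layout, the final regrouping by W still has to happen in the application or in a map-reduce job, and the `$in` lists grow with the number of matching users.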

Question:

In the scenario above, what are the best practices for implementing a scalable solution that allows complex querying, such as the example I gave above?

