hadoop - JCascalog to query Thrift data on HDFS


I read Nathan Marz's book on the Lambda Architecture, and I'm now building a proof of concept of that solution.

I am having difficulty building my JCascalog query.

This is the piece of my Thrift schema that is of interest:

    union articlepropertyvalue {
      1: decimal quantity,
      2: string name;
    }

    union articleid {
      1: int id;
    }

    struct articleproperty {
      1: required articleid id;
      2: required articlepropertyvalue property;
    }

    union dataunit {
      1: ticketproperty ticket_property;
      2: articleproperty article_property;
    }

I stored my data in a Pail folder: /home/tickets

Now I want to run a query on this data: I want the sum of quantity, grouped by article name. So first I need to get the names, and after that the quantities. For each of them I can get the id of the article.

For example, I have this result for the name request (id_article, name): (1, pasta) - (2, pasta2) - (3, pasta)

And for the quantity request (id_article, quantity): (1, 2) - (2, 1) - (3, 1)

    Tap source = splitDataTap("/home/florian/workspace/tickets");
    Api.execute(
        new StdoutTap(),
        new Subquery("?name", "?sum")
            .predicate(source, "_", "?data")
            .predicate(new ExtractArticleName(), "?data")
                .out("?id", "?name")
            .predicate(new ExtractArticleQuantity(), "?data")
                .out("?id", "?quantity")
            .predicate(new Sum(), "?quantity")
                .out("?sum")
    );

The problem is that I don't know how to merge the two results. How can I perform a join with Cascalog when the data is in HDFS?

I guess you want to store the result of the query in HDFS. For that you need the following:

Say the data should be saved in the "/data" folder, in simple text format; then you need this:

    Subquery subquery = new Subquery("?name", "?sum")
        .predicate(source, "_", "?data")
        .predicate(new ExtractArticleName(), "?data")
            .out("?id", "?name")
        .predicate(new ExtractArticleQuantity(), "?data")
            .out("?id", "?quantity")
        .predicate(new Sum(), "?quantity")
            .out("?sum");

    Api.execute(Api.hfsTextline("/data"), subquery);
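To make the expected result of the join concrete, here is a minimal plain-Java sketch that works on the example tuples from the question. It does not use Cascalog or HDFS at all. The class name `ArticleQuantityJoin` and the method `sumQuantityByName` are hypothetical, and it assumes integer quantities with at most one quantity record per article id.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class, not part of JCascalog or the question's code.
class ArticleQuantityJoin {

    // Inner-joins (id -> name) with (id -> quantity) on the article id,
    // then sums the quantities per article name.
    public static Map<String, Integer> sumQuantityByName(
            Map<Integer, String> names, Map<Integer, Integer> quantities) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<Integer, Integer> entry : quantities.entrySet()) {
            String name = names.get(entry.getKey());
            if (name == null) {
                continue; // inner join: ignore ids that have no name
            }
            sums.merge(name, entry.getValue(), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // (id_article, name) tuples from the question
        Map<Integer, String> names = new LinkedHashMap<>();
        names.put(1, "pasta");
        names.put(2, "pasta2");
        names.put(3, "pasta");

        // (id_article, quantity) tuples from the question
        Map<Integer, Integer> quantities = new LinkedHashMap<>();
        quantities.put(1, 2);
        quantities.put(2, 1);
        quantities.put(3, 1);

        System.out.println(sumQuantityByName(names, quantities));
        // prints {pasta=3, pasta2=1}
    }
}
```

In JCascalog itself a join is normally not written by hand like this: using the same variable name (for example ?id) in two generator predicates of one Subquery makes Cascalog join on that variable. That suggests building one subquery for the names and one for the quantities, then combining them in a third. This is an untested suggestion, not a verified fix for the query above.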
