hadoop - JCascalog to query Thrift data on HDFS
I read Nathan Marz's book on the Lambda Architecture, and I'm building a proof of concept of the solution.
I'm having difficulty building a JCascalog query.
This is the piece of the Thrift schema of interest:

union ArticlePropertyValue {
  1: decimal quantity,
  2: string name;
}

union ArticleID {
  1: int id;
}

struct ArticleProperty {
  1: required ArticleID id;
  2: required ArticlePropertyValue property;
}

union DataUnit {
  1: TicketProperty ticket_property;
  2: ArticleProperty article_property;
}
I stored the data in a Pail folder: /home/tickets.
Now I want to query the data: I want the sum of quantity, grouped by article name. First I need the names, and then the quantities. For each I can get the article's id.
For example, the name request would return (id_article, name) pairs: (1, pasta), (2, pasta2), (3, pasta).
The quantity request would return (id_article, quantity) pairs: (1, 2), (2, 1), (3, 1).
Tap source = splitDataTap("/home/florian/workspace/tickets");
Api.execute(
    new StdoutTap(),
    new Subquery("?name", "?sum")
        .predicate(source, "_", "?data")
        .predicate(new ExtractArticleName(), "?data")
            .out("?id", "?name")
        .predicate(new ExtractArticleQuantity(), "?data")
            .out("?id", "?quantity")
        .predicate(new Sum(), "?quantity")
            .out("?sum"));
The problem is that I don't know how to merge the two results. How can I perform a join with Cascalog on data in HDFS?
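As a side note on the merging: in (J)Cascalog a join is expressed implicitly. Using the same variable ?id in both predicates joins the two streams, and the aggregator then groups by the remaining non-aggregated output variable (?name), so a single subquery can merge both extractions. A plain-Java sketch of that logic, using the sample pairs from the question (the class and method names here are my own, not part of JCascalog):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JoinSum {
    // Join the (id -> name) and (id -> quantity) streams on id (the "?id"
    // variable), then group by name ("?name") and sum the quantities ("Sum").
    public static Map<String, Integer> sumByName(Map<Integer, String> names,
                                                 Map<Integer, Integer> quantities) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> e : names.entrySet()) {
            Integer qty = quantities.get(e.getKey()); // implicit join on id
            if (qty != null) {
                result.merge(e.getValue(), qty, Integer::sum); // group + sum
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> names = new LinkedHashMap<>();
        names.put(1, "pasta"); names.put(2, "pasta2"); names.put(3, "pasta");
        Map<Integer, Integer> quantities = new LinkedHashMap<>();
        quantities.put(1, 2); quantities.put(2, 1); quantities.put(3, 1);
        System.out.println(sumByName(names, quantities)); // {pasta=3, pasta2=1}
    }
}
```

With the sample data this yields pasta = 3 and pasta2 = 1, which is the grouped sum the query is after.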
I guess you want to store the result of the query in HDFS; in that case you need the following.
Say the data is to be saved in the "/data" folder, in simple text format; then you need this:
Subquery subquery = new Subquery("?name", "?sum")
    .predicate(source, "_", "?data")
    .predicate(new ExtractArticleName(), "?data")
        .out("?id", "?name")
    .predicate(new ExtractArticleQuantity(), "?data")
        .out("?id", "?quantity")
    .predicate(new Sum(), "?quantity")
        .out("?sum");
Api.execute(Api.hfsTextline("/data"), subquery);
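Assuming the text-line sink ends up writing one name/sum pair per line with a tab separator (the separator and the single-line-per-tuple layout are assumptions about the sink's output format, not something the question confirms), reading the result back could look like this sketch (ResultReader is a name of my own):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

public class ResultReader {
    // Parse "name<TAB>sum" lines, as a tab-separated text sink might emit them.
    public static Map<String, Long> parse(BufferedReader in) throws IOException {
        Map<String, Long> sums = new LinkedHashMap<>();
        String line;
        while ((line = in.readLine()) != null) {
            String[] parts = line.split("\t");
            sums.put(parts[0], Long.parseLong(parts[1]));
        }
        return sums;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical contents of an output file under /data
        String sample = "pasta\t3\npasta2\t1\n";
        System.out.println(parse(new BufferedReader(new StringReader(sample))));
    }
}
```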