hadoop - Hive JOIN of query with subquery takes forever -

- June 15, 2014

lately have been playing bit hive. things have been progressing well, however, when try convert like

2015-04-01   device1   traffic    other       start    2015-04-01   device1   traffic    violation   deny  2015-04-01   device1   traffic    violation   deny  2015-04-02   device1   traffic    other       start    2015-04-03   device1   traffic    other       start    2015-04-03   device1   traffic    other       start

into

2015-04-01   1       2 2015-04-02   1        2015-04-03   2

i tried using following query reason reduce stage of query gets stuck @ 96% no matter how long wait.

select pass.date, count(pass.type), count(deny.deny_type) firewall_logs pass join ( select date, type deny_type firewall_logs device = 'device1'  , date '2015-04-%' , type = 'traffic' , subtype = 'violation' , status = 'deny'  ) deny on ( pass.date = deny.date  ) pass.device = 'device1'  , pass.date '2015-04-%' , pass.type = 'traffic' , pass.subtype = 'other' , pass.status = 'start' group pass.date order pass.date ;

all mr2 logs show is:

2015-06-11 01:54:04,206 info [main] org.apache.hadoop.hive.ql.exec.commonjoinoperator: table 0 has 9028000 rows join key [2015-04-26] 2015-06-11 01:54:04,423 info [main] org.apache.hadoop.hive.ql.exec.commonjoinoperator: table 0 has 9128000 rows join key [2015-04-26] 2015-06-11 01:54:04,638 info [main] org.apache.hadoop.hive.ql.exec.commonjoinoperator: table 0 has 9228000 rows join key [2015-04-26] 2015-06-11 01:54:04,838 info [main] org.apache.hadoop.mapred.fileinputformat: total input paths process : 1

would have idea of why?

i try avoid self-joins in hive plague. can collecting , creating map

add jar ./brickhouse-0.7.1.jar; create temporary function collect 'brickhouse.udf.collect.collectudaf';  select date   , c_map['start'] starts   , c_map['deny'] denies (   select date     , collect(status, c) c_map   (     select date, status       , count( subtype ) c     table     device='device1' , type='traffic'     group date, status ) x   group date ) y

Search This Blog

Th

hadoop - Hive JOIN of query with subquery takes forever -

Comments

Post a Comment

Popular posts from this blog

xslt - Substring before throwing error -

javascript - gulp-nodemon - nodejs restart after file change - Error: listen EADDRINUSE events.js:85 -

oracle - Changing start date for system jobs related to automatic statistics collections in 11g -