hadoop - Hive JOIN of query with subquery takes forever -
lately have been playing bit hive. things have been progressing well, however, when try convert like
2015-04-01 device1 traffic other start 2015-04-01 device1 traffic violation deny 2015-04-01 device1 traffic violation deny 2015-04-02 device1 traffic other start 2015-04-03 device1 traffic other start 2015-04-03 device1 traffic other start
into
2015-04-01 1 2 2015-04-02 1 2015-04-03 2
i tried using following query reason reduce stage of query gets stuck @ 96% no matter how long wait.
select pass.date, count(pass.type), count(deny.deny_type) firewall_logs pass join ( select date, type deny_type firewall_logs device = 'device1' , date '2015-04-%' , type = 'traffic' , subtype = 'violation' , status = 'deny' ) deny on ( pass.date = deny.date ) pass.device = 'device1' , pass.date '2015-04-%' , pass.type = 'traffic' , pass.subtype = 'other' , pass.status = 'start' group pass.date order pass.date ;
all mr2 logs show is:
2015-06-11 01:54:04,206 info [main] org.apache.hadoop.hive.ql.exec.commonjoinoperator: table 0 has 9028000 rows join key [2015-04-26] 2015-06-11 01:54:04,423 info [main] org.apache.hadoop.hive.ql.exec.commonjoinoperator: table 0 has 9128000 rows join key [2015-04-26] 2015-06-11 01:54:04,638 info [main] org.apache.hadoop.hive.ql.exec.commonjoinoperator: table 0 has 9228000 rows join key [2015-04-26] 2015-06-11 01:54:04,838 info [main] org.apache.hadoop.mapred.fileinputformat: total input paths process : 1
would have idea of why?
i try avoid self-joins in hive plague. can collecting , creating map
add jar ./brickhouse-0.7.1.jar; create temporary function collect 'brickhouse.udf.collect.collectudaf'; select date , c_map['start'] starts , c_map['deny'] denies ( select date , collect(status, c) c_map ( select date, status , count( subtype ) c table device='device1' , type='traffic' group date, status ) x group date ) y
Comments
Post a Comment