Cassandra read timeout after mass delete
We created a table "bidresponses" with the following schema:

    create table yp_rtb_new.bidresponses (
        time_id bigint,
        campaignid int,
        bidid text,
        adid int,
        adsize text,
        appname text,
        …
        primary key (time_id, campaignid, bidid)
    )

and set the table's TTL to 3 days. We insert about 20M records per day, and we noticed a weird thing: for the first 3 days, "select * from bidresponses limit 10" ran fine. After the 3rd day, a mass delete happened because of the TTL, and running "select * from bidresponses limit 10" got a timeout error, while "select * from bidresponses where time_id = ?" had no problem. We tried forcing a compaction, but it didn't help. After restarting the cluster, "select * from bidresponses limit 10" ran fine again. Any ideas?
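(For readers wondering how a table-level TTL like this is set: in CQL it is the default_time_to_live table property, given in seconds. A minimal sketch against the table above:)

    alter table yp_rtb_new.bidresponses
    with default_time_to_live = 259200;  -- 3 days = 3 * 24 * 3600 seconds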
I'm guessing Cassandra had to read through a lot of tombstones (data marked for deletion) to find your data. On top of that, "select * from table;" is a full-table / multi-partition scan and can cause timeouts, depending on many factors (tombstones, number of nodes, number of partitions, etc.).

When you specified "time_id = ?", you told Cassandra exactly which partition you wanted, which means fewer (or no) network hops and seeks to find the data.
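To make the contrast concrete, a sketch of the two access patterns (the time_id value is only illustrative):

    -- full scan: the coordinator walks partitions across the whole
    -- cluster and must skip every tombstone it hits along the way
    select * from yp_rtb_new.bidresponses limit 10;

    -- partition-restricted read: routed straight to the replicas that
    -- own this one partition, so far fewer tombstones get scanned
    select * from yp_rtb_new.bidresponses where time_id = 1467331200 limit 10;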
I found these articles particularly helpful and relevant:
http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling
https://lostechies.com/ryansvihla/2014/10/20/domain-modeling-around-deletes-or-using-cassandra-as-a-queue-even-when-you-know-better/
Cassandra also has a date-based compaction strategy (DateTieredCompactionStrategy); you can combine it with smart modeling around deletes as well: http://www.datastax.com/dev/blog/datetieredcompactionstrategy
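As a sketch of what switching the table over looks like (the max_sstable_age_days value is an assumption chosen to match the 3-day TTL; check the option names for your Cassandra version):

    alter table yp_rtb_new.bidresponses
    with compaction = {
        'class': 'DateTieredCompactionStrategy',
        'max_sstable_age_days': '3'  -- assumption: align with the 3-day TTL
    };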