Cassandra read timeout after mass delete
We created a table "bidresponses" with the following schema:

    create table yp_rtb_new.bidresponses (
        time_id bigint,
        campaignid int,
        bidid text,
        adid int,
        adsize text,
        appname text,
        …
        primary key (time_id, campaignid, bidid)
    )

and set the table's TTL to 3 days. We insert about 20M records per day, and we noticed a weird thing: for the first 3 days, "select * from bidresponses limit 10" ran fine. After the 3rd day, a mass delete happened because of the TTL, and running "select * from bidresponses limit 10" got a timeout error, while "select * from bidresponses where time_id = ?" had no problem. We tried forcing a compaction, but it didn't help. After restarting the cluster, "select * from bidresponses limit 10" ran fine again. Any ideas?
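(For readers wondering how a table-level TTL like this is set: in CQL it is the default_time_to_live table property, given in seconds. A minimal sketch against the table above:)

    alter table yp_rtb_new.bidresponses
    with default_time_to_live = 259200;  -- 3 days = 3 * 24 * 3600 seconds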
I'm guessing Cassandra had to read through a lot of tombstones (data marked for deletion) to find your data. On top of that, "select * from table;" is a full-table / multi-partition scan and can cause timeouts, depending on many factors (tombstones, number of nodes, number of partitions, etc.).

When you specified "time_id = ?", you told Cassandra exactly which partition you wanted, which means fewer (or no) network hops and seeks to find the data.
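To make the contrast concrete, a sketch of the two access patterns (the time_id value is only illustrative):

    -- full scan: the coordinator walks partitions across the whole
    -- cluster and must skip every tombstone it hits along the way
    select * from yp_rtb_new.bidresponses limit 10;

    -- partition-restricted read: routed straight to the replicas that
    -- own this one partition, so far fewer tombstones get scanned
    select * from yp_rtb_new.bidresponses where time_id = 1467331200 limit 10;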
I found these articles particularly helpful and relevant:
http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling
https://lostechies.com/ryansvihla/2014/10/20/domain-modeling-around-deletes-or-using-cassandra-as-a-queue-even-when-you-know-better/
Cassandra also has a date-based compaction strategy (DateTieredCompactionStrategy); you can combine it with smart modeling around deletes as well: http://www.datastax.com/dev/blog/datetieredcompactionstrategy
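As a sketch of what switching the table over looks like (the max_sstable_age_days value is an assumption chosen to match the 3-day TTL; check the option names for your Cassandra version):

    alter table yp_rtb_new.bidresponses
    with compaction = {
        'class': 'DateTieredCompactionStrategy',
        'max_sstable_age_days': '3'  -- assumption: align with the 3-day TTL
    };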