cqlsh - Cassandra performance on DISTINCT query
In Cassandra, reads require designing the table schema so that a minimum number of partitions is hit. I have designed my schema to meet that requirement. In one scenario I need the partition keys alone, so I am planning to use
select distinct <partitionkeys> from table
I ran the DISTINCT query using cqlsh on around 15k rows, and it was quite fast.
Questions:
- Will there be performance issues if I use DISTINCT?
- How does Cassandra fetch the partition keys alone?
- I need to know the limitations of the DISTINCT query.
Will there be performance issues if I use DISTINCT? How does Cassandra fetch the partition keys alone?
Basically, Cassandra has to rip through all of your nodes and pull the partition (row) keys for that table. Querying by these keys is how Cassandra was designed to work, so I'm not surprised that it performed well for you. The drawback is that it has to hit all (or most) of your nodes to complete the operation, so performance will be slow if you have a large number of nodes.
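To make that scan concrete, here is a toy Python model of a ring of nodes; it is not Cassandra's implementation. Assumptions: an MD5-based `token` function stands in for the real partitioner, each node owns the token range ending at its own token, and `ToyRing`, `token`, and `distinct_partition_keys` are hypothetical names. It shows that listing distinct partition keys visits every node once and emits one key per partition, however many rows each partition holds.

```python
import bisect
import hashlib


def token(partition_key: str) -> int:
    """Toy partitioner (assumption): map a key onto a 0..2**32 ring via MD5."""
    return int(hashlib.md5(partition_key.encode()).hexdigest(), 16) % 2**32


class ToyRing:
    """Hypothetical ring where each node owns the range ending at its token."""

    def __init__(self, node_tokens: dict):
        # Sort nodes by token so range ownership can be found with bisect.
        self.nodes = sorted(node_tokens.items(), key=lambda kv: kv[1])
        self.tokens = [t for _, t in self.nodes]
        # node name -> {partition key: [rows in that partition]}
        self.data = {name: {} for name, _ in self.nodes}

    def owner(self, pk: str) -> str:
        # First node whose token is >= the key's token; wrap to node 0 past the end.
        i = bisect.bisect_left(self.tokens, token(pk)) % len(self.nodes)
        return self.nodes[i][0]

    def insert(self, pk: str, row: dict) -> None:
        self.data[self.owner(pk)].setdefault(pk, []).append(row)

    def distinct_partition_keys(self):
        """Visit every node in ring order and emit each partition key once.

        Returns (keys, nodes_contacted). Only partition keys are collected;
        the rows inside each partition are never read.
        """
        keys, contacted = [], 0
        for name, _ in self.nodes:
            contacted += 1
            keys.extend(sorted(self.data[name], key=token))
        return keys, contacted


# Usage: 100 partitions with 5 CQL rows each, spread over a 3-node toy ring.
ring = ToyRing({"node1": 2**30, "node2": 2**31, "node3": 3 * 2**30})
for ship in range(100):
    for crew in range(5):
        ring.insert(f"ship{ship}", {"crew": crew})
keys, contacted = ring.distinct_partition_keys()
print(len(keys), "distinct keys from", contacted, "nodes")
```

The cost of the scan tracks the number of partitions and the number of nodes, not the number of CQL rows: 500 rows here still yield only 100 keys, but every node in the ring must be contacted.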
This is where the difference between CQL rows and rows in the underlying storage comes into play. If you look at your data with the cassandra-cli tool, you can see how partition keys are treated differently. Here is an example with the crew members of a ship, stored in a table by ship.
aploetz@cqlsh:presentation> select * from shipcrewregistry ;

 shipname | lastname  | firstname | citizenid                            | aliases
----------+-----------+-----------+--------------------------------------+--------------------------------------
 serenity | book      | derial    | 48bc975a-c9f2-474d-8a29-247503445877 | {'classified'}
 serenity | cobb      | jayne     | 2d643fb1-54fb-4c98-8d2d-a5bb9c6c8354 | {'hero of canton'}
 serenity | frye      | kaylee    | d556cf44-348b-4ea3-8c19-ba9d4877818c | null
 serenity | inara     | serra     | a25b7e02-8099-401a-8c41-d9d2ea894b72 | null
 serenity | reynolds  | malcolm   | 169382b7-21b0-47bf-b1c8-19bc008a9060 | {'mal', 'sgt. reynolds'}
 serenity | tam       | river     | af68201f-4135-413e-959c-dd81ea651e52 | null
 serenity | tam       | simon     | aa090e1a-7792-4d7b-bba9-bac66f8c1f15 | {'dr. tam'}
 serenity | washburne | hoban     | 73f591df-c0dc-44c4-b3f3-9c37453c9537 | {'wash'}
 serenity | washburne | zoey      | 46bc77ad-53ad-4402-b252-a0543005c583 | {'corporal alleyne', 'zoey alleyne'}

(9 rows)
But when I query it from within cassandra-cli:
[default@presentation] list shipcrewregistry;
using default limit of 100
using default cell limit of 100
-------------------
rowkey: serenity
=> (name=book:derial:48bc975a-c9f2-474d-8a29-247503445877:, value=, timestamp=1424904853420170)
=> (name=book:derial:48bc975a-c9f2-474d-8a29-247503445877:aliases:434c4153534946494544, value=, timestamp=1424904853420170)
=> (name=cobb:jayne:2d643fb1-54fb-4c98-8d2d-a5bb9c6c8354:, value=, timestamp=1424904853492976)
=> (name=cobb:jayne:2d643fb1-54fb-4c98-8d2d-a5bb9c6c8354:aliases:4865726f206f662043616e746f6e, value=, timestamp=1424904853492976)
=> (name=frye:kaylee:d556cf44-348b-4ea3-8c19-ba9d4877818c:, value=, timestamp=1428442425610395)
=> (name=inara:serra:a25b7e02-8099-401a-8c41-d9d2ea894b72:, value=, timestamp=1428442425621555)
=> (name=reynolds:malcolm:169382b7-21b0-47bf-b1c8-19bc008a9060:, value=, timestamp=1424904853505461)
=> (name=reynolds:malcolm:169382b7-21b0-47bf-b1c8-19bc008a9060:aliases:4d616c, value=, timestamp=1424904853505461)
=> (name=reynolds:malcolm:169382b7-21b0-47bf-b1c8-19bc008a9060:aliases:5367742e205265796e6f6c6473, value=, timestamp=1424904853505461)
=> (name=tam:river:af68201f-4135-413e-959c-dd81ea651e52:, value=, timestamp=1428442425575881)
=> (name=tam:simon:aa090e1a-7792-4d7b-bba9-bac66f8c1f15:, value=, timestamp=1424904853518092)
=> (name=tam:simon:aa090e1a-7792-4d7b-bba9-bac66f8c1f15:aliases:44722e2054616d, value=, timestamp=1424904853518092)
=> (name=washburne:hoban:73f591df-c0dc-44c4-b3f3-9c37453c9537:, value=, timestamp=1428442425587484)
=> (name=washburne:hoban:73f591df-c0dc-44c4-b3f3-9c37453c9537:aliases:57617368, value=, timestamp=1428442425587484)
=> (name=washburne:zoey:46bc77ad-53ad-4402-b252-a0543005c583:, value=, timestamp=1428442425596863)
=> (name=washburne:zoey:46bc77ad-53ad-4402-b252-a0543005c583:aliases:436f72706f72616c20416c6c65796e65, value=, timestamp=1428442425596863)
=> (name=washburne:zoey:46bc77ad-53ad-4402-b252-a0543005c583:aliases:5a6f657920416c6c65796e65, value=, timestamp=1428442425596863)

1 row returned.
elapsed time: 86 msec(s).
This is intended to show how 9 CQL rows are really 1 row "under the hood."
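The same regrouping can be sketched in a few lines of Python. Assumption, inferred from the cassandra-cli cell names rather than from a schema shown here: `shipname` is the partition key and `(lastname, firstname, citizenid)` are clustering columns. `CQL_ROWS` and `to_storage_rows` are hypothetical names, and the aliases collection is left out for brevity.

```python
from collections import defaultdict

# (shipname, lastname, firstname, citizenid) taken from the cqlsh output.
CQL_ROWS = [
    ("serenity", "book", "derial", "48bc975a-c9f2-474d-8a29-247503445877"),
    ("serenity", "cobb", "jayne", "2d643fb1-54fb-4c98-8d2d-a5bb9c6c8354"),
    ("serenity", "frye", "kaylee", "d556cf44-348b-4ea3-8c19-ba9d4877818c"),
    ("serenity", "inara", "serra", "a25b7e02-8099-401a-8c41-d9d2ea894b72"),
    ("serenity", "reynolds", "malcolm", "169382b7-21b0-47bf-b1c8-19bc008a9060"),
    ("serenity", "tam", "river", "af68201f-4135-413e-959c-dd81ea651e52"),
    ("serenity", "tam", "simon", "aa090e1a-7792-4d7b-bba9-bac66f8c1f15"),
    ("serenity", "washburne", "hoban", "73f591df-c0dc-44c4-b3f3-9c37453c9537"),
    ("serenity", "washburne", "zoey", "46bc77ad-53ad-4402-b252-a0543005c583"),
]


def to_storage_rows(cql_rows):
    """Group CQL rows into storage rows keyed by the partition key.

    Assumption: shipname is the partition key and the remaining columns are
    clustering columns, so each CQL row becomes a composite cell name such
    as "book:derial:<citizenid>:" inside a single partition.
    """
    partitions = defaultdict(list)
    for shipname, lastname, firstname, citizenid in cql_rows:
        partitions[shipname].append(f"{lastname}:{firstname}:{citizenid}:")
    return dict(partitions)


storage = to_storage_rows(CQL_ROWS)
print(f"{len(CQL_ROWS)} CQL rows -> {len(storage)} storage row(s)")
for row_key, cells in storage.items():
    print(f"rowkey: {row_key}")
    for cell in cells:
        print(f"=> (name={cell})")

# A DISTINCT on the partition key only needs the storage row keys.
distinct_shipnames = sorted(storage)
print(distinct_shipnames)
```

It prints `9 CQL rows -> 1 storage row(s)`, and the DISTINCT list holds the single key `'serenity'`, which is why a DISTINCT over the partition key returns one row here rather than nine.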
I need to know the limitations of the DISTINCT query.
In CQL, DISTINCT will only work on your partition keys. I'm not sure how many rows it would take to negate its usefulness; 15000 CQL rows should be fine. But if you have millions of distinct partition keys (high cardinality), I would expect performance to drop off, especially with several nodes in your cluster.
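Two hypothetical helpers sketch these limitations; neither is part of Cassandra or any driver. `check_distinct_columns` models the rule that DISTINCT accepts only partition key columns (a simplified check that enforces nothing else), and `paged` assumes the client reads results in fixed-size pages, so the number of round trips grows linearly with the number of distinct partition keys.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence


def check_distinct_columns(selected: Sequence[str], partition_key: Sequence[str]) -> None:
    """Hypothetical, simplified rule: DISTINCT may select only partition key columns."""
    extra = [c for c in selected if c not in partition_key]
    if extra:
        raise ValueError(f"DISTINCT only works on partition key columns, not {extra}")


def paged(keys: Iterable[str], page_size: int) -> Iterator[List[str]]:
    """Yield partition keys one page at a time (assumed client-side paging).

    Each page stands for one client round trip, so the page count grows
    linearly with the number of distinct partition keys.
    """
    it = iter(keys)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page


# Usage: selecting only the partition key passes the check.
check_distinct_columns(["shipname"], ["shipname"])

low = sum(1 for _ in paged((f"k{i}" for i in range(15_000)), 5000))
high = sum(1 for _ in paged((f"k{i}" for i in range(1_000_000)), 5000))
print(low, "pages for 15k keys;", high, "pages for 1M keys")
```

With a page size of 5000 (an arbitrary choice for illustration), 15000 keys come back in 3 pages while 1,000,000 keys need 200, and in a real cluster each of those pages may have to be assembled from several nodes.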