Mike Carter
2014-06-24 08:09:08 UTC
Hello!
I'm a beginner in C* and I'm struggling with it.
I'd like to measure the performance of some Cassandra range queries. The
idea is to execute multidimensional range queries on Cassandra. E.g. there
is a given table of 1 million rows with 10 columns, and I'd like to execute
queries like "select count(*) from testable where d=1 and v1<10 and v2>20
and v3<45 and v4>70 … allow filtering". This kind of query is very
slow in C*, and as soon as the tables get bigger I get a read timeout,
probably caused by long scan operations.
In further tests I'd like to extend the dimensions to more than 200
and the rows to 100 million, but at the moment I can't even handle this
small table.
Should I reorganize the data, or is it impossible to perform such
high-dimensional queries on Cassandra?
The setup:
Cassandra is installed on a single node with 2 TB disk space and 180GB Ram.
Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.7 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Keyspace:
CREATE KEYSPACE test WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': '1'
};
Table:
CREATE TABLE testc21 (
  key int,
  d int,
  v1 int,
  v10 int,
  v2 int,
  v3 int,
  v4 int,
  v5 int,
  v6 int,
  v7 int,
  v8 int,
  v9 int,
  PRIMARY KEY (key)
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='ROWS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.100000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};
CREATE INDEX testc21_d_idx ON testc21 (d);
select * from testc21 limit 10;
 key    | d | v1 | v10 | v2 | v3 | v4  | v5 | v6 | v7 | v8 | v9
--------+---+----+-----+----+----+-----+----+----+----+----+-----
 302602 | 1 | 56 |  55 | 26 | 45 |  67 | 75 | 25 | 50 | 26 |  54
 531141 | 1 | 90 |  77 | 86 | 42 |  76 | 91 | 47 | 31 | 77 |  27
 693077 | 1 | 67 |  71 | 14 | 59 | 100 | 90 | 11 | 15 |  6 |  19
   4317 | 1 | 70 |  77 | 44 | 77 |  41 | 68 | 33 |  0 | 99 |  14
 927961 | 1 | 15 |  97 | 95 | 80 |  35 | 36 | 45 |  8 | 11 | 100
 313395 | 1 | 68 |  62 | 56 | 85 |  14 | 96 | 43 |  6 | 32 |   7
 368168 | 1 |  3 |  63 | 55 | 32 |  18 | 95 | 67 | 78 | 83 |  52
 671830 | 1 | 14 |  29 | 28 | 17 |  42 | 42 |  4 |  6 | 61 |  93
  62693 | 1 | 26 |  48 | 15 | 22 |  73 | 94 | 86 |  4 | 66 |  63
 488360 | 1 |  8 |  57 | 86 | 31 |  51 |  9 | 40 | 52 | 91 |  45
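To make clear what the example query is meant to compute, here is a rough Python sketch over the ten sample rows above. It is only an illustration of the query's semantics, not of Cassandra internals. It assumes that the index on d is the only predicate that narrows the candidate set and that the remaining conditions are checked row by row, and it assumes the garbled condition on v2 reads "v2 > 20". The name count_matching is made up for this sketch.

```python
# Sample rows copied from the cqlsh output above: (key, d, v1, v2, v3, v4).
# Columns v5..v10 are left out because the example query does not use them.
ROWS = [
    (302602, 1, 56, 26, 45, 67),
    (531141, 1, 90, 86, 42, 76),
    (693077, 1, 67, 14, 59, 100),
    (4317, 1, 70, 44, 77, 41),
    (927961, 1, 15, 95, 80, 35),
    (313395, 1, 68, 56, 85, 14),
    (368168, 1, 3, 55, 32, 18),
    (671830, 1, 14, 28, 17, 42),
    (62693, 1, 26, 15, 22, 73),
    (488360, 1, 8, 86, 31, 51),
]


def count_matching(rows):
    """Return (matches, rows_examined) for the example predicates."""
    matches = 0
    examined = 0
    for key, d, v1, v2, v3, v4 in rows:
        if d != 1:
            # Assumption: only the indexed column d narrows the candidates.
            continue
        examined += 1  # every d=1 row still has to be read and checked
        # Assumption: the v2 condition is "v2 > 20" (operator lost in the post).
        if v1 < 10 and v2 > 20 and v3 < 45 and v4 > 70:
            matches += 1
    return matches, examined


matches, examined = count_matching(ROWS)
print(matches, examined)  # prints "0 10" for the ten sample rows
```

Since every sample row has d=1, all ten rows are examined even though none of them matches, which is the scan behaviour I suspect is behind the timeouts on the larger tables.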
Mike