Discussion:
Wide row column slicing - row size shard limit
Data Craftsman
2012-02-15 18:40:24 UTC
Hello experts,

Based on this blog post on basic time series data modeling with Cassandra,
http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/

"This (wide row column slicing) works well enough for a while, but over
time, this row will get very large. If you are storing sensor data that
updates hundreds of times per second, that row will quickly become gigantic
and unusable. The answer to that is to shard the data up in some way"

The post suggests there is a limit on how big a row can get before update and
query performance degrades, namely 10MB or less.

Is this still true in the latest version of Cassandra? Or in what release
will Cassandra remove this limit?

Manually sharding a wide row increases application complexity; it would be
better if Cassandra could handle this transparently.

Thanks,
Charlie | DBA & Developer

p.s. Quora link,
http://www.quora.com/Cassandra-database/What-are-good-ways-to-design-data-model-in-Cassandra-for-historical-data
R. Verlangen
2012-02-16 08:36:02 UTC
Things you should know:

- Thrift has a limit on the amount of data it will accept / send; you can
configure this in Cassandra, and 64MB should still work fine
- Rows should not become huge: this will make "perfect" load balancing
impossible in your cluster
- A single row must fit on a single node's disk
- The limit of columns per row is 2 billion

You should pick a bucket size for your time range (e.g. second, minute, ...)
that suits your needs.
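
To make the bucketing idea concrete, here is a minimal sketch (plain Python,
no Cassandra client; the "sensor42" id, the key format, and the hourly
granularity are just assumptions for illustration) of how an application
might derive a bucketed row key from a sensor id and a timestamp:

from datetime import datetime, timezone

def bucketed_row_key(sensor_id, ts, bucket="hour"):
    # Build a row key like "sensor42:2012021518" so each row only ever
    # holds one bucket's worth of columns. The key format is illustrative.
    fmt = {"day": "%Y%m%d", "hour": "%Y%m%d%H", "minute": "%Y%m%d%H%M"}[bucket]
    return "%s:%s" % (sensor_id, ts.strftime(fmt))

# All readings from the same hour land in the same row:
print(bucketed_row_key("sensor42", datetime(2012, 2, 15, 18, 40, tzinfo=timezone.utc)))
# -> sensor42:2012021518

Every write uses the key derived from the reading's own timestamp, so a row
stops growing once its bucket has passed.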

As far as I'm aware, there is no 10MB limit in Cassandra at which a single
row starts to hurt performance. That sounds more like a memory / IO problem.
aaron morton
2012-02-16 08:38:19 UTC
Post by Data Craftsman
Based on this blog post on basic time series data modeling with Cassandra,
http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/
I've not read that one, but it sounds right. Matt Dennis knows his stuff: http://www.slideshare.net/mattdennis/cassandra-nyc-2011-data-modeling
Post by Data Craftsman
The post suggests there is a limit on how big a row can get before update and query performance degrades, namely 10MB or less.
There is no hard limit. Wide rows won't upset writes too much. Some read queries can avoid problems, but most will not.

Wide rows are a pain when it comes to maintenance. They take longer to compact and repair.
Post by Data Craftsman
Is this still true in the latest version of Cassandra? Or in what release will Cassandra remove this limit?
There is a limit of 2 billion columns per row. There is not a 10MB limit per row. I've seen rows in the hundreds of MB, and they are always a pain.
Post by Data Craftsman
Manually sharding a wide row increases application complexity; it would be better if Cassandra could handle this transparently.
it's not that hard :)
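
To back that up with a rough sketch (plain Python, the same illustrative
"sensor:YYYYMMDDHH" key format as above, not tied to any particular client
API): the read side of manual sharding is mostly just enumerating the bucket
keys that cover the query window, then slicing each of those rows as usual.

from datetime import datetime, timedelta, timezone

def bucket_keys_for_range(sensor_id, start, end):
    # Yield the hourly bucket row keys covering [start, end]; each key
    # would then be sliced with an ordinary column range query.
    t = start.replace(minute=0, second=0, microsecond=0)
    while t <= end:
        yield "%s:%s" % (sensor_id, t.strftime("%Y%m%d%H"))
        t += timedelta(hours=1)

start = datetime(2012, 2, 15, 18, 0, tzinfo=timezone.utc)
end = datetime(2012, 2, 15, 21, 30, tzinfo=timezone.utc)
print(list(bucket_keys_for_range("sensor42", start, end)))
# -> ['sensor42:2012021518', 'sensor42:2012021519',
#     'sensor42:2012021520', 'sensor42:2012021521']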

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
Data Craftsman
2012-02-16 23:41:18 UTC
Hi Aaron Morton and R. Verlangen,

Thanks for the quick answers. It's good to know about Thrift's limit on the
amount of data it will accept / send.

I know the hard limit is 2 billion columns per row. My question is at what
size a row starts to slow down read/write performance and maintenance. The
blog I referenced says the row size should be kept under 10MB.

It would be better if Cassandra could transparently shard/split a wide row
and distribute the pieces across many nodes, to help with load balancing.

Are there any other ways to model historical data
(or time-series data) besides wide row column slicing in Cassandra?

Thanks,
Charlie | Data Solution Architect Developer
http://mujiang.blogspot.com
aaron morton
2012-02-20 00:32:43 UTC
Post by Data Craftsman
I know the hard limit is 2 billion columns per row. My question is at what size a row starts to slow down read/write performance and maintenance. The blog I referenced says the row size should be kept under 10MB.
A look at read performance with different row sizes:
http://thelastpickle.com/2011/10/03/Reverse-Comparators/
http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
Post by Data Craftsman
Are there any other ways to model historical data (or time-series data) besides wide row column slicing in Cassandra?
Not that I am aware of. You will need to partition the rows.
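
To illustrate what partitioning the rows ends up looking like from the
application's side, here is a small self-contained toy (plain Python, with a
dictionary standing in for a column family; "sensor42" and the key/column
formats are only assumptions) that writes readings into hourly bucket rows
and reads a time range back in order by walking the buckets:

from datetime import datetime, timedelta, timezone

rows = {}  # toy stand-in for a column family: row key -> {column name: value}

def insert(sensor_id, ts, value):
    key = "%s:%s" % (sensor_id, ts.strftime("%Y%m%d%H"))  # hourly bucket
    rows.setdefault(key, {})[ts.isoformat()] = value       # one column per reading

def read_range(sensor_id, start, end):
    # Walk the hourly buckets covering [start, end] and merge readings
    # back into a single time-ordered list.
    out, t = [], start.replace(minute=0, second=0, microsecond=0)
    while t <= end:
        key = "%s:%s" % (sensor_id, t.strftime("%Y%m%d%H"))
        for col, val in sorted(rows.get(key, {}).items()):
            if start.isoformat() <= col <= end.isoformat():
                out.append((col, val))
        t += timedelta(hours=1)
    return out

base = datetime(2012, 2, 15, 18, 0, tzinfo=timezone.utc)
for i in range(180):  # three hours of one-minute readings
    insert("sensor42", base + timedelta(minutes=i), i)
print(len(read_range("sensor42", base, base + timedelta(hours=1))))  # -> 61

No row ever holds more than one hour of data, so per-row size stays bounded
no matter how long the history grows.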

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com