Cassandra counter read timeout error

Discussion:

Javier Pareja

2018-02-17 11:40:42 UTC

Hello everyone,

I get a timeout error when reading a particular row from a large counter
table.

I have a Storm topology that inserts data into a Cassandra counter table.
This table has 6 partition keys, 4 primary keys and 5 counters.

When data starts to be inserted, I can query the counters correctly from
that particular row, but after a few minutes of updating the table with
thousands of events, I get a read timeout every time I try to read that
row (the most frequently updated one). I can read other rows quickly and
without problems. Also, if I run "select *", the first few hundred rows are
returned quickly, as expected. The Storm topology is stopped, but the error
is still there.

I am using Cassandra 3.6.

More information here:
https://stackoverflow.com/q/48833146

Are counters in this version broken? I run the query from CQLSH and get the
same error every time. I tried running it with tracing enabled and get
nothing but the error:

ReadTimeout: Error from server: code=1200 [Coordinator node timed out
waiting for replica nodes' responses] message="Operation timed out -
received only 0 responses." info={'received_responses': 0,
'required_responses': 1, 'consistency': 'ONE'}

Any ideas?
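The `info` field in the error above shows how many replica responses the coordinator was waiting for. As a rough sketch (assuming a single datacenter and ignoring the `LOCAL_*` and `EACH_*` levels), the number of responses a request needs for a given consistency level and replication factor can be computed as follows; `required_responses` is a hypothetical helper, not a driver API:

```python
def required_responses(consistency: str, replication_factor: int) -> int:
    """Replica responses a request waits for (single-datacenter sketch)."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    level = consistency.upper()
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        needed = fixed[level]
    elif level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        needed = replication_factor // 2 + 1
    elif level == "ALL":
        needed = replication_factor
    else:
        raise ValueError(f"level not covered by this sketch: {consistency}")
    if needed > replication_factor:
        # Not enough replicas exist, so such a request could never succeed.
        raise ValueError(f"{level} needs {needed} replicas, RF is {replication_factor}")
    return needed


for level in ("ONE", "QUORUM", "ALL"):
    print(level, required_responses(level, 3))  # ONE 1, QUORUM 2, ALL 3
```

With `'consistency': 'ONE'` a single replica answer would have been enough, and the coordinator received none, so the consistency level itself is not what made this read fail.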

Alain RODRIGUEZ

2018-02-19 08:08:57 UTC

Hello,

This table has 6 partition keys, 4 primary keys and 5 counters.

I think the root issue is this ^. There might be some inefficiency or
issues with counters, but this design makes Cassandra relatively
inefficient in most cases, whether you use standard columns or counters.

Cassandra data is supposed to be well distributed for maximal efficiency.
With only 6 partitions, if you have 6 or more nodes, the load is bound to
be imbalanced. If you have fewer nodes, it is still probably poorly
balanced. Also, ideally a read hits a small number of SSTables and the work
is split across many nodes in parallel to keep queries efficient, but in
this case Cassandra is most probably reading huge partitions from one node.
When the request is too big, it can time out. I am not sure how pagination
works with counters, but I believe that even if pagination is working, at
some point you are just reading too much (or too inefficiently) and the
timeout is reached.
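To illustrate the balancing argument, here is a toy simulation. It assumes a simplified ring where each node owns one evenly sized token range and partitions are placed with an MD5-based stand-in hash; real clusters use the Murmur3 partitioner, vnodes and replication, so the numbers are only indicative. `partitions_per_node` is a hypothetical helper:

```python
import hashlib
from collections import Counter

RING_SIZE = 2**64


def token(partition_key: str) -> int:
    # Stand-in hash: first 8 bytes of MD5, read as an unsigned 64-bit integer.
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest()[:8], "big")


def partitions_per_node(num_partitions: int, num_nodes: int) -> list:
    """Count how many partitions land on each node of an evenly split ring."""
    range_size = RING_SIZE // num_nodes
    counts = Counter(
        # min() folds the few leftover tokens at the top of the ring into the last node.
        min(token(f"pk-{i}") // range_size, num_nodes - 1)
        for i in range(num_partitions)
    )
    return [counts.get(node, 0) for node in range(num_nodes)]


print(partitions_per_node(6, 6))
print(partitions_per_node(10_000, 6))
```

With only a handful of partitions the per-node counts tend to be lopsided (some nodes own several partitions, others none), while with thousands of partitions each node ends up close to its fair share.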

I imagine it worked well for a while because counters are very small
columns / tables compared to any event data, but at some point you might
have reached a 'physical' limit, because you are pulling *all* the
information you need from one partition (and probably many SSTables).

Is there really no other way to design this use case?

Post by Javier Pareja
When data starts to be inserted, I can query the counters correctly from
that particular row but after a few minutes updating the table with
thousands of events, I get a read timeout every time

Troubleshooting:
- Use tracing to understand what takes so long with your queries.
- Check for warnings / errors in the logs. Cassandra usually complains when
it is unhappy with the configuration. There is a lot of interesting
information there, and it has been a while since I last had a failure with
nothing relevant in the logs.
- Check SSTables per read and other read performance metrics for this
counter table. Some monitoring could make the reason for this timeout
obvious. If you use Datadog, for example, I guess a quick look at the "Read
Path" dashboard would help. If you are using any other tool, look for
SSTables per read, tombstones scanned (if any), key cache hit rate and
resources (as maybe compactions driven by the fast insert rate and the
implicit 'read-before-writes' of counters are making the machine less
responsive).
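When tracing does return events, the interesting part is usually the largest gap between two consecutive steps on the same node. A minimal sketch, assuming events shaped like the DataStax Python driver's trace events (`source`, `source_elapsed` as a `timedelta`, `description`); with the real driver they would come from `session.execute(query, trace=True).get_query_trace().events`. The `TraceEvent` stand-in, the sample events and `slowest_steps` are hypothetical:

```python
from collections import namedtuple
from datetime import timedelta
from itertools import groupby

# Stand-in using the attribute names of the driver's trace events.
TraceEvent = namedtuple("TraceEvent", "source source_elapsed description")


def slowest_steps(events, top=3):
    """Largest gaps between consecutive trace events on the same node."""
    ordered = sorted(events, key=lambda e: (e.source, e.source_elapsed))
    gaps = []
    for source, node_events in groupby(ordered, key=lambda e: e.source):
        node_events = list(node_events)
        for prev, curr in zip(node_events, node_events[1:]):
            # The gap is the time spent before `curr` was logged.
            gaps.append((curr.source_elapsed - prev.source_elapsed, source, curr.description))
    gaps.sort(key=lambda gap: gap[0], reverse=True)
    return gaps[:top]


# Made-up events for illustration only.
events = [
    TraceEvent("10.0.0.1", timedelta(microseconds=120), "Executing single-partition query"),
    TraceEvent("10.0.0.1", timedelta(microseconds=450), "Acquiring sstable references"),
    TraceEvent("10.0.0.1", timedelta(microseconds=98_000), "Merged data from memtables and 42 sstables"),
]
for gap, source, description in slowest_steps(events):
    print(f"{gap.total_seconds() * 1000:.2f} ms on {source} before: {description}")
```

Grouping by node matters because each replica measures `source_elapsed` from its own start, so gaps are only meaningful between events from the same source.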

Fix:
- Improve the design based on the findings above ^
- Improve the compaction strategy or read operations depending on the
findings above ^

I am not saying there is no bug in counters in your version, but I would
say it is too early to state this: given the data model, other reasons
could explain this slowness.

If you don't have any monitoring in place, tracing and logs are a nice
place to start digging. If you want to share those here, we can help
interpret the outputs if needed :).

C*heers,

Alain

Javier Pareja

2018-02-19 09:31:08 UTC

Hi,

Thank you for your reply.

As this problem was bothering me, last night I upgraded the cluster to
version 3.11.1 and everything is working now. As far as I can tell, the
counter table can be read now. I will be doing more testing with this
version today, but it is looking good.

To answer your questions:
- I might not have explained the table definition very well: the table
does not have 6 partitions, but 6 partition key columns. There are
thousands of partitions in that table, one for each combination of those
partition key values. I also made sure that the partitions stayed small
when designing the table.
- I also enabled tracing in CQLSH, but it showed nothing when querying
this row. It did, however, when querying other tables...
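Because the partition key is composite, every distinct combination of the 6 key values is its own partition. Counting updates per combination gives both the number of partitions and how "hot" the busiest one is. A sketch over a made-up event stream; the column names, values and `partition_stats` are hypothetical:

```python
from collections import Counter

# Hypothetical names for the 6 partition key columns.
KEY_COLUMNS = ("source", "region", "day", "hour", "category", "kind")


def partition_stats(events, key_columns):
    """Return (number of partitions, hottest partition key, its update count)."""
    # Each distinct tuple of partition key values is a separate partition.
    updates = Counter(tuple(event[col] for col in key_columns) for event in events)
    if not updates:
        return 0, None, 0
    hottest_key, hottest_count = updates.most_common(1)[0]
    return len(updates), hottest_key, hottest_count


# Made-up stream: one very hot partition plus 24 quiet ones.
hot = ("s1", "eu", "2018-02-17", 11, "web", "click")
events = [dict(zip(KEY_COLUMNS, hot)) for _ in range(1_000)]
events += [
    dict(zip(KEY_COLUMNS, ("s2", "us", "2018-02-17", hour, "web", "view")))
    for hour in range(24)
]
count, key, updates = partition_stats(events, KEY_COLUMNS)
print(count, updates)  # 25 1000
```

A table can have many small partitions and still concentrate most of its writes on one of them, which is the situation described for the most frequently updated row.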

Thanks again for your reply!! I am very excited to be part of the Cassandra
user base.

Javier

F Javier Pareja

Alain RODRIGUEZ

2018-02-19 16:43:20 UTC

Hi Javier,

Glad to hear it is solved now. Cassandra 3.11.1 should be a more stable
version and 3.11 a better series.

Excuse my misunderstanding, your table seems to be better designed than I
thought.

Welcome to the Apache Cassandra community!

C*heers ;-)
-----------------------
Alain Rodriguez - @arodream - ***@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

Carl Mueller

2018-02-20 18:07:39 UTC

How "hot" are your partition keys in these counters?

I would think, theoretically, that if specific partition keys are getting
thousands of counter increments (mutations), then compaction won't
"compact" those together into the final value, and you'll start
experiencing the problems people get with rows holding thousands of
tombstones.

So say you had an event 'birthdaypartyattendance' and 1110 separate
updates doing +1s/+2s/+3s to the attendance count for that event (what a
bday party!). When you went to select that final attendance value, with
many of those increments possibly still on other nodes and not fully
replicated, Cassandra would have to read 1110 cells and accumulate them
into the final value. When replication has completed and compaction runs,
it should amalgamate those. QUORUM writes will help ensure the counter
mutations are written to the proper number of nodes, with the usual
overhead of waiting for a quorum of replicas (2 out of 3 with a replication
factor of 3).
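A toy model of the reasoning above, assuming (as in the hypothesis) that each increment is kept as a separate cell until compaction merges them. It illustrates that hypothesis only, not how Cassandra actually stores counter cells internally; the `CounterCell` class and its methods are hypothetical:

```python
class CounterCell:
    """Toy counter: unmerged increments spread over several SSTables."""

    def __init__(self):
        self.sstables = []  # each SSTable holds a list of unmerged deltas

    def flush(self, deltas):
        # Each flush writes a new SSTable containing the pending increments.
        self.sstables.append(list(deltas))

    def read(self):
        """Return (value, cells_read): every delta in every SSTable is scanned."""
        cells_read = sum(len(table) for table in self.sstables)
        value = sum(sum(table) for table in self.sstables)
        return value, cells_read

    def compact(self):
        # Compaction merges all deltas into a single cell holding the total.
        total, _ = self.read()
        self.sstables = [[total]] if self.sstables else []


party = CounterCell()
for _ in range(10):
    party.flush([1, 2, 3] * 37)  # 111 increments per flush, 1110 in total
print(party.read())  # (2220, 1110)
party.compact()
print(party.read())  # (2220, 1)
```

The value is the same before and after compaction; what changes is the number of cells a read has to touch, which is what would drive read latency up in this model.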

DISCLAIMER: I don't have working knowledge of the code in distributed
counters. I just know they are a really hard problem and don't work great
in 2.x. As said, 3.x seems to be a lot better.
