Discussion:
node keeps dying
Prem Yadav
2014-09-24 16:32:11 UTC
Hi,
This issue has happened a few times. We are using DSE 4.0.
OpsCenter reports one of the Cassandra nodes as dead even though
I can see that the process is up.

The logs show a heap space error:

INFO [RMI TCP Connection(18270)-172.31.49.189] 2014-09-24 08:31:05,340 StorageService.java (line 2538) Starting repair command #30766, repairing 1 ranges for keyspace <keyspace>
ERROR [BatchlogTasks:1] 2014-09-24 08:48:54,780 CassandraDaemon.java (line 196) Exception in thread Thread[BatchlogTasks:1,5,main]
java.lang.OutOfMemoryError: Java heap space
    at java.util.ArrayList.<init>(Unknown Source)
    at org.antlr.runtime.CommonTokenStream.<init>(CommonTokenStream.java:68)
    at org.antlr.runtime.CommonTokenStream.<init>(CommonTokenStream.java:72)
    at org.apache.cassandra.cql3.QueryProcessor.parseStatement(QueryProcessor.java:413)
    at org.apache.cassandra.cql3.QueryProcessor.getStatement(QueryProcessor.java:396)
    at org.apache.cassandra.cql3.QueryProcessor.processInternal(QueryProcessor.java:253)
    at org.apache.cassandra.db.BatchlogManager.process(BatchlogManager.java:355)
    at org.apache.cassandra.db.BatchlogManager.replayAllFailedBatches(BatchlogManager.java:179)
    at org.apache.cassandra.db.BatchlogManager$1.runMayThrow(BatchlogManager.java:97)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:75)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source)
    at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)


Any advice would be helpful.

Thanks
Michael Shuler
2014-09-24 20:53:09 UTC
Post by Prem Yadav
this is an issue that has happened a few times. We are using DSE 4.0
I believe DSE 4.0 is based on Apache Cassandra 2.0.5, which is the more
useful version info for this list.
Post by Prem Yadav
One of the Cassandra nodes is detected as dead by the opscenter even
though I can see the process is up.
INFO [RMI TCP Connection(18270)-172.31.49.189] 2014-09-24 08:31:05,340
StorageService.java (line 2538) Starting repair command #30766,
repairing 1 ranges for keyspace <keyspace>
ERROR [BatchlogTasks:1] 2014-09-24 08:48:54,780 CassandraDaemon.java
(line 196) Exception in thread Thread[BatchlogTasks:1,5,main]
java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.<init>(Unknown Source)
OOM.

Details of your system environment and any configuration changes would
help others give you advice. Searching for "cassandra oom" gave me a
few good links to read, and knowing some details about your nodes would
be really helpful. Additionally, CASSANDRA-7507 [0] suggests that an OOM
that leaves the process running in an unclean state is not desired, and
that the process should be killed.

Several of the search links provide details on how to capture and dig
around a heap dump to aid in troubleshooting.

[0] https://issues.apache.org/jira/browse/CASSANDRA-7507
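
As a rough illustration of spotting a node that is "up" but has already hit an OOM, the sketch below scans log lines in the format quoted above for java.lang.OutOfMemoryError and reports the timestamp and thread of the preceding log header. The function name, the regular expression, and the idea of scanning the log this way are assumptions made for illustration; this is not part of Cassandra, DSE, or OpsCenter.

```python
import re

# Marker taken from the stack trace quoted in this thread.
OOM_MARKER = "java.lang.OutOfMemoryError"

# Assumed header layout, based only on the two log lines quoted above:
#   LEVEL [thread name] YYYY-MM-DD HH:MM:SS,mmm ...
HEADER = re.compile(
    r"^\s*(?:INFO|WARN|ERROR|DEBUG)\s+\[(?P<thread>.+?)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
)


def find_oom_events(lines):
    """Return one dict per OutOfMemoryError line, tagged with the most
    recent log header (timestamp and thread) seen before it."""
    events = []
    last_header = (None, None)
    for line in lines:
        match = HEADER.match(line)
        if match:
            last_header = (match.group("ts"), match.group("thread"))
        if OOM_MARKER in line:
            events.append({
                "timestamp": last_header[0],
                "thread": last_header[1],
                "message": line.strip(),
            })
    return events


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python find_oom.py /path/to/system.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for event in find_oom_events(handle):
            print(event["timestamp"], event["thread"], event["message"])
```

For the heap dump itself, HotSpot's -XX:+HeapDumpOnOutOfMemoryError option writes one at the moment of failure; whether it is enabled on these nodes is not stated in the thread.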
--
Kind regards,
Michael
Prem Yadav
2014-09-24 21:27:54 UTC
Well, it's not the Linux OOM killer. The system is running with all default
settings.

Total memory is 7 GB; Cassandra is assigned 2 GB.
2-core processors.
Two rings with 3 nodes in each ring.
Prem Yadav
2014-09-24 21:32:17 UTC
BTW, thanks Michael.
I'm surprised I didn't search for "cassandra oom" before.
I found some good links that discuss it. I will try to optimize and see how
it goes.
Vivek Mishra
2014-09-25 11:12:19 UTC
Increase Cassandra's heap size and try again.
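
To put the heap figures in this thread in context, here is a minimal sketch of the default sizing rule described in the comments of the stock conf/cassandra-env.sh in Apache Cassandra 2.0: when MAX_HEAP_SIZE is unset, the max heap is max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)), and the young generation is the smaller of 100 MB per core and 1/4 of the heap. That DSE 4.0 uses the same rule unchanged is an assumption, the function names are made up for illustration, and 7168 MB stands in for the reported 7 GB.

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Default max heap per the rule in cassandra-env.sh (Cassandra 2.0):
    max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))."""
    half_capped = min(system_memory_mb // 2, 1024)
    quarter_capped = min(system_memory_mb // 4, 8192)
    return max(half_capped, quarter_capped)


def default_new_size_mb(max_heap_mb: int, cpu_cores: int) -> int:
    """Default young generation size: the smaller of 100 MB per core
    and one quarter of the max heap."""
    return min(100 * cpu_cores, max_heap_mb // 4)


# Figures from this thread: about 7 GB of RAM and 2 cores (7168 MB assumed).
heap = default_max_heap_mb(7168)
print(heap)                           # 1792
print(default_new_size_mb(heap, 2))   # 200
```

Under these assumptions the default heap comes out at about 1.75 GB, roughly in line with the 2 GB reported above. To raise it, cassandra-env.sh expects MAX_HEAP_SIZE and HEAP_NEWSIZE to be set together as a pair.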