High CPU on nodes

Anubhav Kale

2016-12-16 23:10:17 UTC

Hello,

I am trying to fight a high CPU problem on some of our nodes. Thread dumps show that it's not the GC threads (we have a 30GB heap), and iostat %iowait confirms it's not disk (it ranges between 0.3% and 0.9%). One way the problem manifests is that the nodes can't compact SSTables, and it happens randomly. We run Cassandra 2.1.13 on Azure Premium Storage (network-attached SSDs).

One of the sample threads that was using a lot of CPU shows:

"pool-13-thread-1" #3352 prio=5 os_prio=0 tid=0x00007f2275340bb0 nid=0x1b0b runnable [0x00007f33ffaae000]
java.lang.Thread.State: RUNNABLE
at java.util.TimSort.gallopRight(TimSort.java:632)
at java.util.TimSort.mergeLo(TimSort.java:739)
at java.util.TimSort.mergeAt(TimSort.java:514)
at java.util.TimSort.mergeCollapse(TimSort.java:441)
at java.util.TimSort.sort(TimSort.java:245)
at java.util.Arrays.sort(Arrays.java:1512)
at java.util.ArrayList.sort(ArrayList.java:1454)
at java.util.Collections.sort(Collections.java:175)
at org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximityWithScore(DynamicEndpointSnitch.java:163)
at org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximityWithBadness(DynamicEndpointSnitch.java:200)
at org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximity(DynamicEndpointSnitch.java:152)
at org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1581)
at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:1739)
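
If the hot thread was picked out with `top -H -p <pid>`, note that on Linux `top` shows native thread IDs in decimal, while a HotSpot thread dump shows the same ID in hex as `nid`; the `#3352` field is the JVM's own thread number and is not what `top` shows. A small helper (the class name `NidToTid` is hypothetical) that converts between the two, using the header line from the dump above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NidToTid {
    // Matches the native thread id field of a HotSpot thread dump header, e.g. "nid=0x1b0b".
    private static final Pattern NID = Pattern.compile("nid=0x([0-9a-fA-F]+)");

    /** Returns the native thread id in decimal (as top -H prints it), or -1 if there is no nid field. */
    static long nativeThreadId(String threadDumpHeader) {
        Matcher m = NID.matcher(threadDumpHeader);
        return m.find() ? Long.parseLong(m.group(1), 16) : -1L;
    }

    /** Formats a decimal thread id from top -H the way the thread dump prints it. */
    static String toNid(long tid) {
        return "0x" + Long.toHexString(tid);
    }

    public static void main(String[] args) {
        String header = "\"pool-13-thread-1\" #3352 prio=5 os_prio=0 "
                + "tid=0x00007f2275340bb0 nid=0x1b0b runnable [0x00007f33ffaae000]";
        long tid = nativeThreadId(header);
        System.out.println(tid);        // 6923
        System.out.println(toNid(tid)); // 0x1b0b
    }
}
```

So `nid=0x1b0b` corresponds to thread 6923 in the `top -H` output.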

Looking at the code, I can't figure out why something like this would need so much CPU, and I can't find any related JIRAs either. What can I do next to troubleshoot this?
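
Each individual sort in the trace is cheap; what matters is how often it runs. One plausible explanation (not confirmed in this thread) is that `getRangeSlice` splits a range scan into many token sub-ranges, which can be numerous with vnodes, and `getLiveSortedEndpoints` re-sorts the replicas for each sub-range by snitch score. The sketch below is a hypothetical, simplified model of that pattern, not Cassandra's code: the class, the scores, and the sub-range counts are all illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ScoreSortModel {
    // Hypothetical per-replica latency scores (lower is better), standing in for a snitch's score map.
    static final Map<String, Double> SCORES = new HashMap<>();
    static {
        SCORES.put("10.0.0.1", 0.9);
        SCORES.put("10.0.0.2", 0.1);
        SCORES.put("10.0.0.3", 0.5);
    }

    // Counts comparator calls so the cost of repeated sorting is visible.
    static long comparisons = 0;

    /** Sorts replicas in place so the lowest-scored endpoint comes first. */
    static void sortByScore(List<String> replicas) {
        Collections.sort(replicas, (a, b) -> {
            comparisons++;
            return Double.compare(SCORES.get(a), SCORES.get(b));
        });
    }

    /** Models one range scan: each token sub-range sorts its own copy of the replica list. */
    static void rangeScan(int subRanges, List<String> replicas) {
        for (int i = 0; i < subRanges; i++) {
            sortByScore(new ArrayList<>(replicas));
        }
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");
        // Illustrative sub-range counts; 4096 is roughly 16 nodes x 256 vnodes.
        for (int subRanges : new int[] {1, 256, 4096}) {
            comparisons = 0;
            rangeScan(subRanges, replicas);
            System.out.println(subRanges + " sub-ranges -> " + comparisons + " comparisons");
        }
    }
}
```

In this model the comparison count grows linearly with the number of sub-ranges per scan, so many concurrent range scans on a vnode cluster multiply an otherwise trivial sort.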

Thanks!

Alain RODRIGUEZ

2016-12-17 13:17:53 UTC

Hi,

What does 'nodetool netstats' look like on those nodes?

> we have 30GB heap

How is the JVM / GC doing? Are you using G1GC or CMS? This setting would be bad for CMS.

You can use this tool to understand where the CPU is being used: https://github.com/aragozin/jvm-tools/blob/master/sjk-core/COMMANDS.md#ttop-command
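
ttop works from the per-thread CPU counters that the JVM exposes. The same counters are available through the standard `java.lang.management.ThreadMXBean` API. The sketch below (the class name `ThreadCpuSampler` is hypothetical) takes two samples one second apart inside its own JVM and prints threads that used more than 1 ms of CPU in between. To inspect a Cassandra node you would read the same MXBean over a remote JMX connection instead (Cassandra listens for JMX on port 7199 by default), which this sketch does not do.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

public class ThreadCpuSampler {
    /** Returns accumulated CPU time in nanoseconds for each live thread id. */
    static Map<Long, Long> sampleCpu(ThreadMXBean mx) {
        Map<Long, Long> cpu = new HashMap<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return cpu;
        }
        for (long id : mx.getAllThreadIds()) {
            long t = mx.getThreadCpuTime(id); // -1 if the thread died or CPU timing is disabled
            if (t >= 0) {
                cpu.put(id, t);
            }
        }
        return cpu;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            System.out.println("Per-thread CPU time is not supported on this JVM");
            return;
        }
        mx.setThreadCpuTimeEnabled(true);

        // A deliberately busy thread so the sample has something to show.
        Thread busy = new Thread(() -> {
            long x = 0;
            while (!Thread.currentThread().isInterrupted()) {
                x++;
            }
        }, "busy-worker");
        busy.setDaemon(true);
        busy.start();

        Map<Long, Long> before = sampleCpu(mx);
        Thread.sleep(1000);
        Map<Long, Long> after = sampleCpu(mx);

        // Report the CPU each thread consumed between the two samples.
        for (Map.Entry<Long, Long> e : after.entrySet()) {
            Long prev = before.get(e.getKey());
            if (prev == null) {
                continue; // thread started after the first sample
            }
            double ms = (e.getValue() - prev) / 1_000_000.0;
            ThreadInfo info = mx.getThreadInfo(e.getKey());
            if (info != null && ms > 1.0) {
                System.out.printf("%-20s %8.1f ms CPU in the last second%n", info.getThreadName(), ms);
            }
        }
        busy.interrupt();
    }
}
```

Sampling a delta rather than reading totals once is what makes a tool like this useful: a thread with a large lifetime total may be idle now, while the delta shows what is burning CPU at the moment.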

I hope that helps,

C*heers,
-----------------------
Alain Rodriguez - @arodream - ***@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

Anubhav Kale

2016-12-21 16:40:31 UTC

Comments inline.

From: Alain RODRIGUEZ [mailto:***@gmail.com]
Sent: Saturday, December 17, 2016 5:18 AM
To: ***@cassandra.apache.org
Subject: Re: High CPU on nodes

> Hi,
>
> What does 'nodetool netstats' look like on those nodes?

It's not doing any streaming.

> we have 30GB heap
>
> How is the JVM / GC doing? Are you using G1GC or CMS? This setting would be bad for CMS.

G1. GC is doing fine. I don't see any long pauses beyond 200 ms.
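
For context, 200 ms is also G1's default pause-time goal (`-XX:MaxGCPauseMillis`). A quick way to read the JVM's own GC counters is `GarbageCollectorMXBean`, which reports a cumulative collection count and total collection time per collector. The sketch below (hypothetical class `GcSummary`) runs in-process; for a Cassandra node the same beans would be read over JMX. These counters only give totals and averages, so individual long pauses still have to come from the GC log.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcSummary {
    /** One line per collector: name, collection count, total time, and average time per collection. */
    static List<String> summarize(List<GarbageCollectorMXBean> gcs) {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : gcs) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            long timeMs = gc.getCollectionTime(); // accumulated elapsed time in ms, -1 if undefined
            String avg = (count > 0 && timeMs >= 0)
                    ? String.format("%.1f ms avg", (double) timeMs / count)
                    : "n/a avg";
            lines.add(String.format("%s: %d collections, %d ms total, %s",
                    gc.getName(), count, timeMs, avg));
        }
        return lines;
    }

    public static void main(String[] args) {
        // With G1 the collectors are typically named "G1 Young Generation" and "G1 Old Generation".
        summarize(ManagementFactory.getGarbageCollectorMXBeans()).forEach(System.out::println);
    }
}
```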

> You can use this tool to understand where the CPU is being used: https://github.com/aragozin/jvm-tools/blob/master/sjk-core/COMMANDS.md#ttop-command

Nate McCall

2016-12-21 17:04:03 UTC

https://issues.apache.org/jira/browse/CASSANDRA-6908

Disable DynamicSnitch by adding the following to cassandra.yaml (it is not in the file by default):

dynamic_snitch: false
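
Why this should help: the dynamic snitch is a wrapper around the configured `endpoint_snitch`, and with `dynamic_snitch: false` the underlying snitch is used unwrapped, so the per-request score sort seen in the stack trace no longer runs. The trade-off is that reads are no longer steered away from replicas that are temporarily slow. Below is a hypothetical, simplified model of that wiring; none of these types are Cassandra's, and the real `DynamicEndpointSnitch` also applies a badness check before falling back to a score sort, as the `sortByProximityWithBadness` frame in the trace suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SnitchWiring {
    /** Hypothetical minimal snitch: reorders replicas so the preferred one comes first. */
    interface Snitch {
        void sortByProximity(List<String> replicas);
    }

    /** Stand-in for a static, topology-based snitch; this sketch keeps the given order. */
    static class StaticSnitch implements Snitch {
        @Override
        public void sortByProximity(List<String> replicas) {
            // A real static snitch would order by DC/rack here; no per-request scoring happens.
        }
    }

    /** Decorator that re-sorts by latency score on every call, loosely like a dynamic snitch. */
    static class ScoringSnitch implements Snitch {
        private final Snitch delegate;
        private final Map<String, Double> scores;

        ScoringSnitch(Snitch delegate, Map<String, Double> scores) {
            this.delegate = delegate;
            this.scores = scores;
        }

        @Override
        public void sortByProximity(List<String> replicas) {
            delegate.sortByProximity(replicas);
            replicas.sort(Comparator.comparingDouble(scores::get)); // lower score first
        }
    }

    /** Models the effect of dynamic_snitch: the scoring wrapper is applied only when it is true. */
    static Snitch configure(Snitch base, boolean dynamicSnitch, Map<String, Double> scores) {
        return dynamicSnitch ? new ScoringSnitch(base, scores) : base;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new HashMap<>();
        scores.put("10.0.0.1", 0.9);
        scores.put("10.0.0.2", 0.1);
        scores.put("10.0.0.3", 0.5);

        for (boolean dynamic : new boolean[] {true, false}) {
            List<String> replicas = new ArrayList<>(Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3"));
            configure(new StaticSnitch(), dynamic, scores).sortByProximity(replicas);
            System.out.println("dynamic_snitch: " + dynamic + " -> " + replicas);
        }
    }
}
```

With `true` the replicas come out in score order (10.0.0.2, 10.0.0.3, 10.0.0.1); with `false` the static order is left untouched and no sort is performed at all.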

--
-----------------
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com
