Discussion:
multiple node bootstrapping
Osman YOZGATLIOĞLU
2018-11-28 10:03:14 UTC
Hello,

I have a two-DC Cassandra 3.0.14 setup. I need to add 2 new nodes to each DC.

I started one node in dc1 and it is already joining. 3 TB of 50 TB has finished in 2 weeks. It is time-series data with a one-year TTL, using TWCS.
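
As a rough sanity check on those figures, the arithmetic below extrapolates the observed rate. It is only a sketch: it assumes decimal terabytes, a constant streaming rate, and that "2 weeks" means exactly 14 days, none of which the post states precisely. The function name is made up for illustration.

```python
# Back-of-the-envelope extrapolation from the figures in the post.
# Assumptions (not from the post): 1 TB = 10**12 bytes, the rate stays
# constant, and "2 weeks" means exactly 14 days.

SECONDS_PER_DAY = 86_400
TB = 10**12


def bootstrap_estimate(done_tb: float, total_tb: float, elapsed_days: float) -> dict:
    """Return the observed rate and projected durations for a bootstrap."""
    rate_tb_per_day = done_tb / elapsed_days
    bytes_per_second = done_tb * TB / (elapsed_days * SECONDS_PER_DAY)
    return {
        "rate_tb_per_day": rate_tb_per_day,
        "rate_mbit_per_s": bytes_per_second * 8 / 10**6,
        "remaining_days": (total_tb - done_tb) / rate_tb_per_day,
        "total_days": total_tb / rate_tb_per_day,
    }


if __name__ == "__main__":
    est = bootstrap_estimate(done_tb=3, total_tb=50, elapsed_days=14)
    for key, value in est.items():
        print(f"{key}: {value:.1f}")
```

With the numbers from the post this works out to roughly 0.2 TB per day (about 20 Mbit/s), so the remaining 47 TB would need on the order of 219 more days at the same pace.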

I know it's not best practice.

I want to start one node in dc2, but Cassandra refused to start, reporting that another node is already in the joining state.

I found a workaround using JMX directives, but I'm not sure whether I broke something along the way.

Is it wise to bootstrap in both DCs at the same time?


Regards,

Osman
Vitali Dyachuk
2018-11-28 10:40:30 UTC
You can set auto_bootstrap to false to add a new node to the ring. It
will calculate the token range for the new node, but will not start
streaming the data.
This way you can add several nodes to the ring quickly. After that
you can run nodetool rebuild -dc <> to start streaming data.
In your case, 50 TB of data per node is quite a large amount. Based on my
own experience, I would recommend keeping about 1 TB per node, since
streaming can be interrupted for some reason and cannot be resumed, so
you'll have to restart it. There will also be compaction problems.

Vitali.
Jeff Jirsa
2018-11-28 11:59:40 UTC
This violates any consistency guarantees you have and isn’t the right approach unless you know what you’re giving up (correctness, typically).
--
Jeff Jirsa
Jonathan Haddad
2018-11-28 13:59:04 UTC
I agree with Jeff here; using auto_bootstrap: false is probably not what you
want.

Have you increased your streaming throughput?
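
For context, outbound streaming is throttled on each sending node, and the cap can be read or changed at runtime with nodetool getstreamthroughput / setstreamthroughput (and getinterdcstreamthroughput / setinterdcstreamthroughput for cross-DC traffic), with values in megabits per second. Below is a minimal sketch of scripting that across several nodes. The host addresses, the 400 Mbit/s value, and the helper names are made-up examples, and it only prints the commands unless dry_run is switched off.

```python
# Sketch: build the nodetool commands that raise the streaming throttle on
# the nodes that send data. Host addresses and the 400 Mbit/s value are
# hypothetical; nothing is executed unless dry_run=False.
import subprocess
from typing import List


def throughput_commands(host: str, megabits: int, inter_dc: bool = False) -> List[List[str]]:
    """Return nodetool argv lists that set, then read back, the throttle."""
    if megabits < 0:
        raise ValueError("throughput must be >= 0 (0 disables throttling)")
    suffix = "interdcstreamthroughput" if inter_dc else "streamthroughput"
    return [
        ["nodetool", "-h", host, f"set{suffix}", str(megabits)],
        ["nodetool", "-h", host, f"get{suffix}"],
    ]


def apply_throttle(hosts: List[str], megabits: int,
                   inter_dc: bool = False, dry_run: bool = True) -> None:
    """Print (or, if dry_run is False, execute) the commands for each host."""
    for host in hosts:
        for argv in throughput_commands(host, megabits, inter_dc):
            if dry_run:
                print(" ".join(argv))
            else:
                subprocess.run(argv, check=True)


if __name__ == "__main__":
    apply_throttle(["10.0.0.1", "10.0.0.2"], megabits=400)
```

A value set through nodetool lasts only until the node restarts; the persistent settings are stream_throughput_outbound_megabits_per_sec and inter_dc_stream_throughput_outbound_megabits_per_sec in cassandra.yaml.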

Upgrading to 3.11 might reduce the time by quite a bit:
https://issues.apache.org/jira/browse/CASSANDRA-9766

You'd be doing the committers a huge favor if you grabbed some histograms and
flame graphs on both the sending and receiving nodes
(http://thelastpickle.com/blog/2018/01/16/cassandra-flame-graphs.html) and
sent them to the dev mailing list.
--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade