multiple node bootstrapping

Discussion:

Osman YOZGATLIOĞLU

2018-11-28 10:03:14 UTC

Hello,

I have a two-DC Cassandra 3.0.14 setup. I need to add 2 new nodes to each DC.

I started one node in dc1 and it is already joining. 3 TB of 50 TB finished in 2 weeks. The data is time series with a one-year TTL, using TWCS.

I know, it's not best practice.

I wanted to start one node in dc2, but Cassandra refused to start, reporting that a node is already in the joining state.

I found a workaround with JMX directives, but I'm not sure whether I broke something along the way.

Is it wise to bootstrap in both DCs at the same time?

Regards,

Osman

Vitali Dyachuk

2018-11-28 10:40:30 UTC


You can set auto_bootstrap to false to add a new node to the ring. It
will calculate the token range for the new node but will not start
streaming data. This way you can add several nodes to the ring quickly.
After that, you can run nodetool rebuild -dc <> to start streaming data.

In your case, 50 TB of data per node is quite a large amount. Based on my
own experience, I would recommend keeping 1 TB per node, because streaming
can be interrupted for some reason and cannot be resumed, so you would
have to restart it. There will also be compaction problems.

Vitali.

Jeff Jirsa

2018-11-28 11:59:40 UTC


This violates any consistency guarantees you have and isn't the right approach unless you know what you're giving up (correctness, typically).
--
Jeff Jirsa

Jonathan Haddad

2018-11-28 13:59:04 UTC


Agree with Jeff here, using auto_bootstrap:false is probably not what you
want.

Have you increased your streaming throughput?
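
To put that question in numbers, here is a rough sketch that turns the figures from the original post (3 TB of 50 TB in 2 weeks) into an average rate and a remaining-time estimate. It assumes decimal units and a constant rate, neither of which is stated in the thread, and the function names are only illustrative.

```python
# Figures from the original post: 3 TB of 50 TB streamed in 2 weeks.
# Assumptions (not from the thread): decimal units (1 TB = 10**12 bytes)
# and a constant streaming rate for the rest of the bootstrap.

SECONDS_PER_DAY = 86_400
BYTES_PER_TB = 10**12


def observed_rate_mbit_per_s(streamed_tb: float, elapsed_days: float) -> float:
    """Average streaming rate so far, in megabits per second."""
    bits = streamed_tb * BYTES_PER_TB * 8
    return bits / (elapsed_days * SECONDS_PER_DAY) / 10**6


def remaining_days(total_tb: float, streamed_tb: float, elapsed_days: float) -> float:
    """Days left if the average rate so far holds."""
    rate_tb_per_day = streamed_tb / elapsed_days
    return (total_tb - streamed_tb) / rate_tb_per_day


rate = observed_rate_mbit_per_s(3, 14)
left = remaining_days(50, 3, 14)
print(f"observed rate: {rate:.1f} Mbit/s")        # observed rate: 19.8 Mbit/s
print(f"remaining at this rate: {left:.0f} days")  # remaining at this rate: 219 days
```

The observed rate can then be compared with the limit reported by nodetool getstreamthroughput on the sending nodes to see whether the throttle is plausibly the bottleneck.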

Upgrading to 3.11 might reduce the time by quite a bit:
https://issues.apache.org/jira/browse/CASSANDRA-9766

You'd be doing committers a huge favor if you grabbed some histograms and
flame graphs on both the sending and receiving nodes:
http://thelastpickle.com/blog/2018/01/16/cassandra-flame-graphs.html and
sent them to the dev mailing list.

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade
