Discussion:
Migrating from DSE5.1.2 to Opensource cassandra
Nandakishore Tokala
2018-12-04 22:46:18 UTC
Hi all,

We are migrating from DSE to open source Cassandra. If anyone has migrated
recently, could you please share your experience, the steps you followed,
and the challenges you faced?

We want to migrate to the matching (compatible) open source version. Can
you give us the exact version number, including the minor version, that
corresponds to DSE 5.1.2?

5.1 | DSE production-certified 3.10 + enhancements | 3.4 + enhancements | big | m
--
Thanks & Regards,
Nanda Kishore
Jonathan Koppenhofer
2018-12-05 02:32:27 UTC
Unfortunately, we found this to be a little tricky. We migrated from DSE
4.8 and 5.0 to OSS 3.0.x, so coming from 5.1 you may run into additional
issues. I would also say your best option may be to install a fresh cluster
and stream the data. That wasn't feasible for us given our data size and
scale, our time frames, and our infrastructure restrictions. I will have to
review my notes for more detail, but off the top of my head, for an
in-place migration:

Pre-upgrade
* Be sure you are not using any Enterprise features like Search or Graph.
Not only are there no equivalent features in open source, but these
features require proprietary classes on the classpath, or Cassandra will
not even start up.
* By default, I think DSE uses its own custom authenticators, authorizers,
and such. Make sure whatever you are using has an open source equivalent.
* The DSE system keyspaces use custom replication strategies. Convert these
to NetworkTopologyStrategy (NTS) before the upgrade.
* Otherwise, follow the same processes you would before any upgrade
(repair, snapshot, etc.).
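The two configuration checks above can be scripted. The Python sketch below is only an illustration, assuming you have already read the `authenticator`/`authorizer`/`role_manager` values from your config into a dict and collected each keyspace's replication class (for example from `system_schema.keyspaces`). The DSE class and keyspace names in the example values are illustrative, and the datacenter name and replication factor are hypothetical placeholders.

```python
"""Pre-upgrade sketch: spot non-open-source auth classes and build
ALTER KEYSPACE statements that move custom-strategy keyspaces to NTS.

Assumptions (hypothetical, check against your own cluster):
- `settings` maps cassandra.yaml keys (authenticator, authorizer,
  role_manager) to the class names read from your config.
- `keyspaces` maps keyspace name -> replication class, as collected from
  `SELECT keyspace_name, replication FROM system_schema.keyspaces`.
"""

# Classes in this package ship with open-source Cassandra.
OSS_AUTH_PREFIX = "org.apache.cassandra.auth."
# Short names that the open-source cassandra.yaml also accepts.
OSS_AUTH_SHORT_NAMES = {
    "AllowAllAuthenticator", "PasswordAuthenticator",
    "AllowAllAuthorizer", "CassandraAuthorizer", "CassandraRoleManager",
}
# Replication strategies available in open-source Cassandra.
OSS_STRATEGIES = {"SimpleStrategy", "NetworkTopologyStrategy", "LocalStrategy"}


def non_oss_auth(settings: dict[str, str]) -> dict[str, str]:
    """Return only the settings whose class has no open-source home."""
    return {
        key: cls
        for key, cls in settings.items()
        if not (cls.startswith(OSS_AUTH_PREFIX) or cls in OSS_AUTH_SHORT_NAMES)
    }


def nts_statements(keyspaces: dict[str, str], dc_rf: dict[str, int]) -> list[str]:
    """Build one ALTER KEYSPACE per keyspace that uses a non-OSS strategy."""
    replication = ", ".join(f"'{dc}': {rf}" for dc, rf in sorted(dc_rf.items()))
    statements = []
    for name, strategy in sorted(keyspaces.items()):
        short_name = strategy.rsplit(".", 1)[-1]  # drop any package prefix
        if short_name in OSS_STRATEGIES:
            continue
        statements.append(
            f'ALTER KEYSPACE "{name}" WITH replication = '
            f"{{'class': 'NetworkTopologyStrategy', {replication}}};"
        )
    return statements


# Illustrative values only; DC1 and the replication factor are placeholders.
settings = {
    "authenticator": "com.datastax.bdp.cassandra.auth.DseAuthenticator",
    "authorizer": "CassandraAuthorizer",
}
keyspaces = {
    "dse_system": "EverywhereStrategy",  # DSE-specific strategy (illustrative)
    "app": "org.apache.cassandra.locator.NetworkTopologyStrategy",
    "system": "org.apache.cassandra.locator.LocalStrategy",
}
print(non_oss_auth(settings))
for stmt in nts_statements(keyspaces, {"DC1": 3}):
    print(stmt)
```

Pass every datacenter with its desired replication factor in `dc_rf`; after changing replication, a repair of the affected keyspaces is normally still needed.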

Upgrade
* The easy part is replacing the binaries, as you would in a normal
upgrade. Drain and stop the existing node first. You can also do this in a
rolling fashion to maintain availability. In our case, we were doing an
in-place upgrade and reusing the same IPs.
* Unfortunately, DSE adds a custom column to a system table, which means
you have to remove one (or more) system tables (peers?) before the node
will start. You delete these system tables by removing their SSTables on
disk while the node is down. This is a bit of a headache with vnodes.
Since we use vnodes, we had to manually specify num_tokens and the specific
tokens the node was responsible for (initial_token) in cassandra.yaml
before starting the node. Without vnodes this is simpler. Again, I'll
double-check my notes. Once the node is up, you can revert to your normal
vnode/num_tokens settings.
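For the vnode case, a small helper can turn the tokens a node owns into the temporary cassandra.yaml lines described above. This is a Python sketch assuming you captured the tokens before stopping the node, for example with `SELECT tokens FROM system.local;` in cqlsh; the helper name is hypothetical. It keeps `num_tokens` equal to the number of tokens listed in `initial_token`, since Cassandra expects the two to agree when both are set.

```python
def pinned_token_settings(tokens: set[str]) -> str:
    """Render cassandra.yaml lines that pin a node to its existing tokens.

    `tokens` is the node's token set as text (Murmur3 tokens are signed
    64-bit integers), e.g. copied from system.local before shutdown.
    """
    if not tokens:
        raise ValueError("no tokens captured; do not start the node")
    ordered = sorted(tokens, key=int)  # int() also rejects non-numeric input
    return "\n".join([
        f"num_tokens: {len(ordered)}",
        f"initial_token: {','.join(ordered)}",
    ])


# Hypothetical tokens; a real vnode node usually owns many more.
print(pinned_token_settings({"-9000000000000000000", "42", "3074457345618258602"}))
```

Sorting is not required by Cassandra; it just makes the rendered line stable and easy to diff against what the node reported.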

Post-upgrade
* Drop the DSE system tables.
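A cautious way to do this cleanup is to generate the statements, review them, and run them by hand. The Python sketch below assumes the DSE-specific tables live in keyspaces whose names start with `dse_`; that prefix is a hypothetical filter, so compare the output against what DSE actually created in your cluster, and take a snapshot first.

```python
def drop_dse_keyspaces(keyspace_names, prefixes=("dse_",)):
    """Return DROP statements for keyspaces matching the (hypothetical) DSE prefixes.

    `keyspace_names` would come from
    `SELECT keyspace_name FROM system_schema.keyspaces;`.
    Nothing is executed here: review the list before running it in cqlsh.
    """
    return [
        f'DROP KEYSPACE IF EXISTS "{name}";'
        for name in sorted(keyspace_names)
        if name.startswith(tuple(prefixes))
    ]


for stmt in drop_dse_keyspaces(["system", "dse_system", "app", "dse_perf"]):
    print(stmt)
```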

I'll follow up with more detail if needed.

d***@yahoo.com.INVALID
2018-12-05 05:03:54 UTC
Thanks, nice summary of the overall process.
Dinesh

Dor Laor
2018-12-06 06:18:37 UTC
An alternative approach is to form a new cluster and leave the original
cluster alive (often that's a must, since it needs to be online 24x7).
Write to both clusters, and then migrate the existing data to the new one,
either by taking a snapshot and passing those files to the new cluster or
with sstableloader. When passing snapshot files over directly, the new
cluster needs the same token range ownership as the old one.
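The dual-write step can be sketched in Python as below. This is not Dor's code; it assumes each cluster is reached through an object with an `execute(query, params)` method (the DataStax Python driver's `Session.execute` has that shape, but the sketch uses a hypothetical in-memory stand-in so it runs without a cluster). Writes to the old cluster must succeed, while failures on the new cluster are only recorded, so the old cluster stays the source of truth until the migrated data is verified.

```python
from typing import Any, Protocol, Sequence


class Session(Protocol):
    """Anything with an execute(query, params) method, e.g. a driver session."""

    def execute(self, query: str, params: Sequence[Any]) -> Any: ...


class DualWriter:
    """Send each write to the old and the new cluster."""

    def __init__(self, old: Session, new: Session) -> None:
        self.old = old
        self.new = new
        # Writes the new cluster missed; replay them or rely on the backfill.
        self.missed: list[tuple[str, tuple]] = []

    def write(self, query: str, params: Sequence[Any]) -> Any:
        result = self.old.execute(query, params)  # errors here reach the caller
        try:
            self.new.execute(query, params)
        except Exception:
            # The new cluster is not authoritative yet, so don't fail the request.
            self.missed.append((query, tuple(params)))
        return result


class RecordingSession:
    """Hypothetical in-memory stand-in for a cluster session."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.writes: list[tuple[str, tuple]] = []

    def execute(self, query: str, params: Sequence[Any]) -> str:
        if self.fail:
            raise ConnectionError("new cluster unavailable")
        self.writes.append((query, tuple(params)))
        return "ok"


old, new = RecordingSession(), RecordingSession(fail=True)
writer = DualWriter(old, new)
writer.write("INSERT INTO app.users (id, name) VALUES (%s, %s)", (1, "a"))
print(len(old.writes), len(writer.missed))
```

Keeping the old cluster's result as the return value means the application's behavior does not change while the new cluster is being filled.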

Another solution is to migrate using Spark, which does a full table scan.
We have generic code that does this and we can open source it. This way the
new cluster can be any size, and speed is good even with large amounts of
data (100s of TB). The process is also restartable, which matters because
transferring that much data takes days.
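The restartable full-table scan can be illustrated without Spark. The Python sketch below is neither Dor's generic code nor Spark; it only shows the underlying idea, assuming the default Murmur3Partitioner, whose tokens span -2^63 to 2^63-1: split the ring into slices, copy each slice with a `token(pk) > ? AND token(pk) <= ?` query, and record finished slices so a restart skips them. `copy_range`, the table name, and the checkpoint file are hypothetical hooks you would wire to real reads, writes, and storage.

```python
import json
from pathlib import Path
from typing import Callable, Iterable

MIN_TOKEN = -(2**63)  # Murmur3Partitioner token bounds
MAX_TOKEN = 2**63 - 1

# Example per-slice query; "app.users" and "id" are placeholder names.
SLICE_QUERY = "SELECT * FROM app.users WHERE token(id) > ? AND token(id) <= ?"

Range = tuple[int, int]


def split_token_range(slices: int) -> list[Range]:
    """Split (MIN_TOKEN, MAX_TOKEN] into contiguous (start, end] slices."""
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // slices for i in range(slices + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def migrate(
    ranges: Iterable[Range],
    copy_range: Callable[[int, int], None],
    done: set,
    save: Callable[[set], None],
) -> set:
    """Copy every slice not already in `done`, checkpointing after each one."""
    for rng in ranges:
        if rng in done:
            continue  # finished in an earlier run
        copy_range(*rng)
        done.add(rng)
        save(done)  # persist progress so a crash loses at most one slice
    return done


def file_checkpoint(path: Path) -> Callable[[set], None]:
    """Return a save callback that writes finished slices to a JSON file."""
    def save(done: set) -> None:
        path.write_text(json.dumps(sorted(done)))
    return save


def load_checkpoint(path: Path) -> set:
    """Read finished slices back; JSON lists become tuples again."""
    if not path.exists():
        return set()
    return {tuple(r) for r in json.loads(path.read_text())}
```

With `load_checkpoint(path)` as the starting `done` set and `file_checkpoint(path)` as `save`, rerunning the same call resumes where the previous run stopped. The slice count is a tuning choice: more slices give finer-grained restarts at the cost of more queries.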

Good luck
Brooke Thorley
2018-12-06 06:23:19 UTC
Jonathan's high-level process for in-place conversion looks right.

To answer your original question about versioning, the DSE release notes
list the equivalent Cassandra version as 3.11.0.

DataStax Enterprise 5.1.2 -

DataStax Enterprise 5.1.10

Apache Cassandra™ 3.11.0 (updated)


Kind Regards,
*Brooke Thorley*
*VP Technical Operations & Customer Services*
***@instaclustr.com | support.instaclustr.com


<https://www.facebook.com/instaclustr> <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information. If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.

Instaclustr values your privacy. Our privacy policy can be found at
https://www.instaclustr.com/company/policies/privacy-policy
Jonathan Koppenhofer
2018-12-07 01:17:58 UTC
Just to add a few notes on the in-place replacement:
* We had to remove system.local and system.peers.
* Since we removed those system tables, you also have to set
replace_address_first_boot in cassandra-env to the node's own IP address.
* We also temporarily added the node as a seed to keep it from
bootstrapping.
* Don't forget to switch your config back to "normal" after the node is
back up and running.
* Probably unrelated to this process, but even after a drain when we
originally stopped the node, we noticed DSE did not clean up the
commitlogs, even though the logs said those files were drained. So we had
to forcibly remove the commitlogs before bringing the node back up.
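The temporary first-boot settings from these notes, and the revert afterwards, can be kept together in one place. The Python sketch below only renders text for cassandra-env.sh and the seed list in cassandra.yaml; the function names and IP addresses are hypothetical. `cassandra.replace_address_first_boot` is the standard Cassandra system property, passed through `JVM_OPTS` in cassandra-env.sh. The sketch just encodes the notes above and does not check that this combination works on your version.

```python
def first_boot_settings(node_ip: str, seeds: list[str]) -> dict[str, str]:
    """Temporary settings for the first start after swapping DSE for OSS.

    - replace_address_first_boot set to the node's own IP, because
      system.local and system.peers were removed;
    - the node added to the seed list so it does not bootstrap.
    """
    temp_seeds = seeds if node_ip in seeds else [*seeds, node_ip]
    return {
        "cassandra-env.sh": (
            'JVM_OPTS="$JVM_OPTS '
            f'-Dcassandra.replace_address_first_boot={node_ip}"'
        ),
        "seeds": ",".join(temp_seeds),
    }


def normal_settings(seeds: list[str]) -> dict[str, str]:
    """Settings to restore once the node is back up and running."""
    return {
        "cassandra-env.sh": "",  # remove the replace_address_first_boot line
        "seeds": ",".join(seeds),  # the original seed list
    }


print(first_boot_settings("10.0.0.5", ["10.0.0.1", "10.0.0.2"]))
```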

Finally... be sure you test this thoroughly. We did this on many clusters,
but your mileage may vary depending on your DSE version and the features
you use.