Discussion:
Problem with restoring a snapshot using sstableloader
Oliver Herrmann
2018-11-30 16:05:43 UTC
Hi,

I'm having trouble restoring a snapshot with sstableloader. I'm
using Cassandra 3.11.1 and followed the instructions for creating and
restoring a snapshot from this page:
https://docs.datastax.com/en/dse/6.0/dse-admin/datastax_enterprise/tools/toolsSStables/toolsBulkloader.html


1. Called nodetool cleanup on each node
$ nodetool cleanup cass_testapp

2. Called nodetool snapshot on each node
$ nodetool snapshot -t snap1 -kt cass_testapp.table3

3. Checked the data and snapshot folders:
$ ll /var/lib/cassandra/data/cass_testapp/table3-7227e480f3b411e8941285913bce94cb
drwxr-xr-x 2 cassandra cassandra 6 Nov 29 03:54 backups
-rw-r--r-- 2 cassandra cassandra 43 Nov 30 10:21 mc-11-big-CompressionInfo.db
-rw-r--r-- 2 cassandra cassandra 241 Nov 30 10:21 mc-11-big-Data.db
-rw-r--r-- 2 cassandra cassandra 9 Nov 30 10:21 mc-11-big-Digest.crc32
-rw-r--r-- 2 cassandra cassandra 16 Nov 30 10:21 mc-11-big-Filter.db
-rw-r--r-- 2 cassandra cassandra 21 Nov 30 10:21 mc-11-big-Index.db
-rw-r--r-- 2 cassandra cassandra 4938 Nov 30 10:21 mc-11-big-Statistics.db
-rw-r--r-- 2 cassandra cassandra 95 Nov 30 10:21 mc-11-big-Summary.db
-rw-r--r-- 2 cassandra cassandra 92 Nov 30 10:21 mc-11-big-TOC.txt
drwxr-xr-x 3 cassandra cassandra 18 Nov 30 10:30 snapshots

and

$ ll /var/lib/cassandra/data/cass_testapp/table3-7227e480f3b411e8941285913bce94cb/snapshots/snap1/
total 44
-rw-r--r-- 1 cassandra cassandra 32 Nov 30 10:30 manifest.json
-rw-r--r-- 2 cassandra cassandra 43 Nov 30 10:21 mc-11-big-CompressionInfo.db
-rw-r--r-- 2 cassandra cassandra 241 Nov 30 10:21 mc-11-big-Data.db
-rw-r--r-- 2 cassandra cassandra 9 Nov 30 10:21 mc-11-big-Digest.crc32
-rw-r--r-- 2 cassandra cassandra 16 Nov 30 10:21 mc-11-big-Filter.db
-rw-r--r-- 2 cassandra cassandra 21 Nov 30 10:21 mc-11-big-Index.db
-rw-r--r-- 2 cassandra cassandra 4938 Nov 30 10:21 mc-11-big-Statistics.db
-rw-r--r-- 2 cassandra cassandra 95 Nov 30 10:21 mc-11-big-Summary.db
-rw-r--r-- 2 cassandra cassandra 92 Nov 30 10:21 mc-11-big-TOC.txt
-rw-r--r-- 1 cassandra cassandra 1043 Nov 30 10:30 schema.cql

4. Truncated the table
cqlsh:cass_testapp> TRUNCATE table3 ;

5. Tried to restore table3 on one cassandra node
$ sstableloader -d localhost /var/lib/cassandra/data/cass_testapp/table3-7227e480f3b411e8941285913bce94cb/snapshots/snap1/
Established connection to initial hosts
Opening sstables and calculating sections to stream
Skipping file mc-11-big-Data.db: table snapshots.table3 doesn't exist

Summary statistics:
Connections per host : 1
Total files transferred : 0
Total bytes transferred : 0.000KiB
Total duration : 2652 ms
Average transfer rate : 0.000KiB/s
Peak transfer rate : 0.000KiB/s

I always get the message "Skipping file mc-11-big-Data.db: table
snapshots.table3 doesn't exist". I also tried renaming the snapshots
folder to the keyspace name (cass_testapp), but then I get the message
"Skipping file mc-11-big-Data.db: table snap1.snap1. doesn't exist".

What am I doing wrong?

Thanks
Oliver
Oleksandr Shulgin
2018-11-30 16:39:23 UTC
Post by Oliver Herrmann
I'm always getting the message "Skipping file mc-11-big-Data.db: table
snapshots.table3 doesn't exist". I also tried to rename the snapshots
folder into the keyspace name (cass_testapp) but then I get the message
"Skipping file mc-11-big-Data.db: table snap1.snap1. doesn't exist".
Hi,

I imagine that moving the files from the snapshot directory to the data
directory and then running `nodetool refresh` is the supported way. Why use
sstableloader for that?
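
A minimal sketch of that approach, assuming it runs on the node that owns the snapshot and as a user who can write to the data directory. `copy_snapshot_back` is a hypothetical helper, not a Cassandra tool, and the `*-big-*` pattern assumes SSTable file names like the `mc-11-big-*` files listed in the original post.

```shell
#!/bin/sh
# copy_snapshot_back is a hypothetical helper, not part of Cassandra.
# It copies the SSTable component files of one snapshot back into the
# live table directory; manifest.json and schema.cql are left behind.
copy_snapshot_back() {
    table_dir=$1      # e.g. /var/lib/cassandra/data/cass_testapp/table3-<id>
    snapshot_name=$2  # e.g. snap1
    cp -p "$table_dir/snapshots/$snapshot_name"/*-big-* "$table_dir"/
}

# Usage, with the names from the original post:
#   copy_snapshot_back \
#     /var/lib/cassandra/data/cass_testapp/table3-7227e480f3b411e8941285913bce94cb \
#     snap1
#   nodetool refresh cass_testapp table3
```

This has to be repeated on every node whose snapshot should be restored.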

--
Alex
Oliver Herrmann
2018-11-30 16:54:09 UTC
When using nodetool refresh I must have write access to the data folder and
I have to do it on every node. In our production environment the user that
would do the restore does not have write access to the data folder.

Dmitry Saprykin
2018-11-30 19:18:07 UTC
You need to move your files into a directory named 'cass_testapp/table3/'.
sstableloader uses the last two path components as the keyspace and table names.
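
A minimal sketch of that layout, assuming the restoring user can write to some scratch directory outside the Cassandra data folder. `stage_snapshot` is a hypothetical helper and `/tmp/restore` is an assumed scratch location; only the keyspace, table, and snapshot paths come from the original post.

```shell
#!/bin/sh
# stage_snapshot is a hypothetical helper, not part of Cassandra.
# It copies a snapshot's SSTable files into <staging_root>/<keyspace>/<table>
# so that the last two path components are the keyspace and table names.
stage_snapshot() {
    snapshot_dir=$1   # e.g. .../table3-<id>/snapshots/snap1
    keyspace=$2       # e.g. cass_testapp
    table=$3          # e.g. table3
    staging_root=$4   # any directory the restoring user can write to

    target="$staging_root/$keyspace/$table"
    mkdir -p "$target" || return 1
    # Copy only SSTable component files (mc-11-big-Data.db and so on).
    cp "$snapshot_dir"/*-big-* "$target"/ || return 1
    # Print the directory to pass to sstableloader.
    printf '%s\n' "$target"
}

# Usage, with the names from the original post:
#   target=$(stage_snapshot \
#     /var/lib/cassandra/data/cass_testapp/table3-7227e480f3b411e8941285913bce94cb/snapshots/snap1 \
#     cass_testapp table3 /tmp/restore)
#   sstableloader -d localhost "$target"
```

Copying rather than moving leaves the snapshot itself untouched, so the restore can be retried.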
Oliver Herrmann
2018-11-30 19:56:39 UTC
Oleksandr Shulgin
2018-12-02 05:24:29 UTC
Post by Oliver Herrmann
When using nodetool refresh I must have write access to the data folder
and I have to do it on every node. In our production environment the user
that would do the restore does not have write access to the data folder.
OK, I'm not entirely sure that's a reasonable setup, but are you implying
that with sstableloader you don't need to process every snapshot taken, that
is, visit every node? That would only be true if your replication factor
equals the number of nodes, IMO.

--
Alex
Oliver Herrmann
2018-12-03 15:23:57 UTC
On Sun, 2 Dec 2018 at 06:24, Oleksandr Shulgin wrote:
Post by Oleksandr Shulgin
Post by Oliver Herrmann
When using nodetool refresh I must have write access to the data folder
and I have to do it on every node. In our production environment the user
that would do the restore does not have write access to the data folder.
OK, not entirely sure that's a reasonable setup, but do you imply that
with sstableloader you don't need to process every snapshot taken -- that
is, also visiting every node? That would only be true if your replication
factor equals to the number of nodes, IMO.
You are right. The number of nodes in our cluster is equal to the
replication factor. For that reason I think it should be sufficient to call
sstableloader only from one node.
Oleksandr Shulgin
2018-12-04 07:54:03 UTC
Post by Oliver Herrmann
You are right. The number of nodes in our cluster is equal to the
replication factor. For that reason I think it should be sufficient to call
sstableloader only from one node.
The next question is then: do you care about consistency of data restored
from one snapshot? Is the snapshot taken after repair? Do you still write
to those tables?

In other words, your data will be consistent after restoring from one
node's snapshot only if you were writing with consistency level ALL (or
equal to your replication factor and, transitively, to the number of nodes).
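
The arithmetic behind that condition can be sketched as follows. This is an illustration only: it models just the ONE, QUORUM, and ALL consistency levels, assumes the replication factor equals the number of nodes, and ignores repair and hinted handoff, which can also bring replicas in sync. Both function names are hypothetical.

```shell
#!/bin/sh
# Number of replicas that must acknowledge a write at a consistency level.
# Only ONE, QUORUM and ALL are modelled; anything else is an error.
acked_replicas() {
    rf=$1; level=$2
    case "$level" in
        ONE)    echo 1 ;;
        QUORUM) echo $(( rf / 2 + 1 )) ;;
        ALL)    echo "$rf" ;;
        *)      return 1 ;;
    esac
}

# Succeeds only if every acknowledged write is guaranteed to be on every
# replica, and therefore on whichever single node the snapshot came from.
one_snapshot_suffices() {
    acked=$(acked_replicas "$1" "$2") || return 1
    [ "$acked" -eq "$1" ]
}

# Usage:
#   one_snapshot_suffices 3 ALL    && echo "one snapshot is enough"
#   one_snapshot_suffices 3 QUORUM || echo "writes may be missing on this node"
```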
Alex Ott
2018-12-02 13:49:24 UTC
It's a bug in sstableloader that was introduced many years ago. Before that,
it worked as described in the documentation.

--
With best wishes, Alex Ott
Solutions Architect EMEA, DataStax
http://datastax.com/
