2d or multi dimension range query in cassandra CQL

Discussion:

Mehak Mehta

2015-03-17 17:49:54 UTC

Hi,

I want to perform range queries (as in x and y ranges) on a large data
billions of rows.
CQL allows me to put Non EQ restrictions on only one of the clustering
columns.
Its not allowing me to filter the data using any other column even with use
of Allow Filtering option.

cqlsh:images> *select * from results1 where image_caseid='mehak' and x >
100 and y <100 order by x allow filtering;*
*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*

My table definition is :

*CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar,
analysis_execution_uuid uuid, x double, y double, loc varchar, w double, h
double, normalized varchar, type varchar, filehost varchar, filename
varchar, image_uuid uuid, image_uri varchar, image_caseid varchar,
image_mpp_x double, image_mpp_y double, image_width double, image_height
double, objective double, cancer_type varchar, Area float, submit_date
timestamp, points list<double>, PRIMARY KEY ((image_caseid),x));*

Kindly suggest me how to model is data and perform 2d or in general multi
dimension range queries in Cassandra efficiently.

Thanks,
Mehak

Jack Krupansky

2015-03-17 18:11:47 UTC

Permalink

Post by Mehak Mehta
Hi,
I want to perform range queries (as in x and y ranges) on a large data
billions of rows.
CQL allows me to put Non EQ restrictions on only one of the clustering
columns.
Its not allowing me to filter the data using any other column even with
use of Allow Filtering option.
cqlsh:images> *select * from results1 where image_caseid='mehak' and x >
100 and y <100 order by x allow filtering;*
*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*
*CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar,
analysis_execution_uuid uuid, x double, y double, loc varchar, w double, h
double, normalized varchar, type varchar, filehost varchar, filename
varchar, image_uuid uuid, image_uri varchar, image_caseid varchar,
image_mpp_x double, image_mpp_y double, image_width double, image_height
double, objective double, cancer_type varchar, Area float, submit_date
timestamp, points list<double>, PRIMARY KEY ((image_caseid),x));*
Kindly suggest me how to model is data and perform 2d or in general multi
dimension range queries in Cassandra efficiently.
Thanks,
Mehak

健峰高

2015-03-17 18:26:52 UTC

Permalink

Lucene or Solr is better. I¡¯m using lucene

1. Create multiple secondary indexes, one for each non-key column you need to index on. Not recommended. Considered an anti-pattern for Cassandra.
2. Use DSE Search/Solr.
3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.
-- Jack Krupansky
Hi,
I want to perform range queries (as in x and y ranges) on a large data billions of rows.
CQL allows me to put Non EQ restrictions on only one of the clustering columns.
Its not allowing me to filter the data using any other column even with use of Allow Filtering option.
cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y <100 order by x allow filtering;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"
CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar, Area float, submit_date timestamp, points list<double>, PRIMARY KEY ((image_caseid),x));
Kindly suggest me how to model is data and perform 2d or in general multi dimension range queries in Cassandra efficiently.
Thanks,
Mehak

Mehak Mehta

2015-03-17 20:51:29 UTC

Permalink

Hi,

In my case I am just trying to do range queries on 2 dimensions i.e. x and
y.

*cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
double, submit_date timestamp, points list<double>, PRIMARY KEY
(image_caseid));*

With secondary indexes on x and y:
*cqlsh:images> create index results1_y on results1(y);*
*cqlsh:images> create index results1_x on results1(x);*

But when I try the following the query it is not allowed with an error.

*cqlsh:images> select * from results1 where image_caseid='mehak' and x >
100 and y <100 allow filtering;*
*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*

Can you tell me why this is not working. I can't understand why Cassandra
is not allowing me to use Non EQ range query on secondary indexes even when
I have restricted my primary key by EQ as specified in error message.

Also please elaborate on how to use Lucene/Solr in this case with Cassandra.

Thanks,
Mehak

Lucene or Solr is better. Iâm using lucene
1. Create multiple secondary indexes, one for each non-key column you need
to index on. Not recommended. Considered an anti-pattern for Cassandra.
2. Use DSE Search/Solr.
3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.
-- Jack Krupansky

Hi,
I want to perform range queries (as in x and y ranges) on a large data billions of rows.
CQL allows me to put Non EQ restrictions on only one of the clustering columns.
Its not allowing me to filter the data using any other column even with
use of Allow Filtering option.
cqlsh:images> *select * from results1 where image_caseid='mehak' and x >
100 and y <100 order by x allow filtering;*
*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*
*CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar,
analysis_execution_uuid uuid, x double, y double, loc varchar, w double, h
double, normalized varchar, type varchar, filehost varchar, filename
varchar, image_uuid uuid, image_uri varchar, image_caseid varchar,
image_mpp_x double, image_mpp_y double, image_width double, image_height
double, objective double, cancer_type varchar, Area float, submit_date
timestamp, points list<double>, PRIMARY KEY ((image_caseid),x));*
Kindly suggest me how to model is data and perform 2d or in general multi
dimension range queries in Cassandra efficiently.
Thanks,
Mehak

Jack Krupansky

2015-03-17 21:19:01 UTC

Permalink

Yeah, you may have to add a dummy column populated with a constant, or just
pick some other additional column to index that you can apply = to. It's
just a requirement to assure that a filtered select isn't too-too slow.

Uh.... if you are applying = to your primary key then there is no need for
any filtering. What did you really mean to do? It makes no sense the way
you have it!

Either go with DSE Search/Solr, or google "Tuplejump Stargate" or "Stratio".

-- Jack Krupansky

Post by Mehak Mehta
Hi,
In my case I am just trying to do range queries on 2 dimensions i.e. x and
y.
*cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
double, submit_date timestamp, points list<double>, PRIMARY KEY
(image_caseid));*
*cqlsh:images> create index results1_y on results1(y);*
*cqlsh:images> create index results1_x on results1(x);*
But when I try the following the query it is not allowed with an error.
*cqlsh:images> select * from results1 where image_caseid='mehak' and x >
100 and y <100 allow filtering;*
*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*
Can you tell me why this is not working. I can't understand why Cassandra
is not allowing me to use Non EQ range query on secondary indexes even when
I have restricted my primary key by EQ as specified in error message.
Also please elaborate on how to use Lucene/Solr in this case with Cassandra.
Thanks,
Mehak

Lucene or Solr is better. Iâm using lucene
1. Create multiple secondary indexes, one for each non-key column you
need to index on. Not recommended. Considered an anti-pattern for Cassandra.
2. Use DSE Search/Solr.
3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.
-- Jack Krupansky

Post by Mehak Mehta
100 and y <100 order by x allow filtering;*

*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*
*CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar,
analysis_execution_uuid uuid, x double, y double, loc varchar, w double, h
double, normalized varchar, type varchar, filehost varchar, filename
varchar, image_uuid uuid, image_uri varchar, image_caseid varchar,
image_mpp_x double, image_mpp_y double, image_width double, image_height
double, objective double, cancer_type varchar, Area float, submit_date
timestamp, points list<double>, PRIMARY KEY ((image_caseid),x));*
Kindly suggest me how to model is data and perform 2d or in general
multi dimension range queries in Cassandra efficiently.
Thanks,
Mehak

Mehak Mehta

2015-03-17 22:01:11 UTC

Permalink

Sorry I gave you wrong table definition for query. Here a composite key
of image_caseid, x and uuid which is unique. I have used x in clustering
columns to query it. And used secondary index on y column.

1. Example
*cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
double, submit_date timestamp, points list<double>, PRIMARY KEY
((image_caseid),x,uuid));*
*cqlsh:images> create index results1_y on results1(y);*

In the below query you can see I have image_caseid as primary key which is
filtered. Even then it is giving error that "*No indexed columns present*"

*cqlsh:images> select * from results1 where image_caseid='mehak' and x >
100 and y<100 order by image_caseid asc;*
*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*

2. Example
I also tried including both x and y columns as composite key even then
query gives following error:

*cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
double, submit_date timestamp, points list<double>, PRIMARY KEY
((image_caseid),x,y,uuid));*

*cqlsh:images> select * from results1 where image_caseid='mehak' and x >
100 and y<100 order by image_caseid asc;*
*code=2200 [Invalid query] message="PRIMARY KEY column "y" cannot be
restricted (preceding column "ColumnDefinition{name=x,
type=org.apache.cassandra.db.marshal.DoubleType, kind=CLUSTERING_COLUMN,
componentIndex=0, indexName=null, indexType=null}" is either not restricted
or by a non-EQ relation)"*

Thanks,
Mehak

Post by Jack Krupansky
Yeah, you may have to add a dummy column populated with a constant, or
just pick some other additional column to index that you can apply = to.
It's just a requirement to assure that a filtered select isn't too-too slow.
Uh.... if you are applying = to your primary key then there is no need for
any filtering. What did you really mean to do? It makes no sense the way
you have it!
Either go with DSE Search/Solr, or google "Tuplejump Stargate" or "Stratio".
-- Jack Krupansky

Post by Mehak Mehta
Hi,
In my case I am just trying to do range queries on 2 dimensions i.e. x
and y.
*cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
double, submit_date timestamp, points list<double>, PRIMARY KEY
(image_caseid));*
*cqlsh:images> create index results1_y on results1(y);*
*cqlsh:images> create index results1_x on results1(x);*
But when I try the following the query it is not allowed with an error.
*cqlsh:images> select * from results1 where image_caseid='mehak' and x >
100 and y <100 allow filtering;*
*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*
Can you tell me why this is not working. I can't understand why Cassandra
is not allowing me to use Non EQ range query on secondary indexes even when
I have restricted my primary key by EQ as specified in error message.
Also please elaborate on how to use Lucene/Solr in this case with Cassandra.
Thanks,
Mehak

Lucene or Solr is better. Iâm using lucene
1. Create multiple secondary indexes, one for each non-key column you
need to index on. Not recommended. Considered an anti-pattern for Cassandra.
2. Use DSE Search/Solr.
3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.
-- Jack Krupansky

Post by Mehak Mehta
100 and y <100 order by x allow filtering;*

*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*
*CREATE TABLE images.results1 (uuid uuid, analysis_execution_id
varchar, analysis_execution_uuid uuid, x double, y double, loc varchar, w
double, h double, normalized varchar, type varchar, filehost varchar,
filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar,
image_mpp_x double, image_mpp_y double, image_width double, image_height
double, objective double, cancer_type varchar, Area float, submit_date
timestamp, points list<double>, PRIMARY KEY ((image_caseid),x));*
Kindly suggest me how to model is data and perform 2d or in general
multi dimension range queries in Cassandra efficiently.
Thanks,
Mehak

Andres de la Peña

2015-03-18 09:45:49 UTC

Permalink

Hi,

With Stratio Cassandra you can create Lucene based indexes for
multidimensional queries this way:

ALTER TABLE images.results1 ADD lucene text ;

CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene)
USING 'com.stratio.cassandra.index.RowIndex'
WITH OPTIONS = {
'refresh_seconds':'1',
'schema':'{
fields:{
image_caseid:{type:"string"},
x:{type:"double"},
y:{type:"double"} } } '};

Then you can perform the query using the dummy column:

SELECT * FROM images.results1 WHERE lucene='{ filter:{type:"boolean", must:[
{field:"image_caseid", type:"match", value:"mehak"},
{field:"x", type:"range", lower:100},
{field:"y", type:"range", lower:100}
]}}';

However, you can take advantage of partition key to route the query only to
the nodes owning the data:

SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene='{
filter:{type:"boolean", must:[
{field:"x", type:"range", lower:100},
{field:"y", type:"range", lower:100}
]}}';

Or, even better:

SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x>100 AND
lucene='{ filter:{field:"y", type:"range", lower:100}}';

Additionally, if your data are geospatial (latitude and longitude), soon
you will can use the incoming spatial features.

Post by Mehak Mehta
Sorry I gave you wrong table definition for query. Here a composite key
of image_caseid, x and uuid which is unique. I have used x in clustering
columns to query it. And used secondary index on y column.
1. Example
*cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
double, submit_date timestamp, points list<double>, PRIMARY KEY
((image_caseid),x,uuid));*
*cqlsh:images> create index results1_y on results1(y);*
In the below query you can see I have image_caseid as primary key which is
filtered. Even then it is giving error that "*No indexed columns present*
"
*cqlsh:images> select * from results1 where image_caseid='mehak' and x >
100 and y<100 order by image_caseid asc;*
*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*
2. Example
I also tried including both x and y columns as composite key even then
*cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
double, submit_date timestamp, points list<double>, PRIMARY KEY
((image_caseid),x,y,uuid));*
*cqlsh:images> select * from results1 where image_caseid='mehak' and x >
100 and y<100 order by image_caseid asc;*
*code=2200 [Invalid query] message="PRIMARY KEY column "y" cannot be
restricted (preceding column "ColumnDefinition{name=x,
type=org.apache.cassandra.db.marshal.DoubleType, kind=CLUSTERING_COLUMN,
componentIndex=0, indexName=null, indexType=null}" is either not restricted
or by a non-EQ relation)"*
Thanks,
Mehak

Post by Jack Krupansky
Yeah, you may have to add a dummy column populated with a constant, or
just pick some other additional column to index that you can apply = to.
It's just a requirement to assure that a filtered select isn't too-too slow.
Uh.... if you are applying = to your primary key then there is no need
for any filtering. What did you really mean to do? It makes no sense the
way you have it!
Either go with DSE Search/Solr, or google "Tuplejump Stargate" or "Stratio".
-- Jack Krupansky

Post by Mehak Mehta
Hi,
In my case I am just trying to do range queries on 2 dimensions i.e. x
and y.
*cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
double, submit_date timestamp, points list<double>, PRIMARY KEY
(image_caseid));*
*cqlsh:images> create index results1_y on results1(y);*
*cqlsh:images> create index results1_x on results1(x);*
But when I try the following the query it is not allowed with an error.
*cqlsh:images> select * from results1 where image_caseid='mehak' and x >
100 and y <100 allow filtering;*
*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*
Can you tell me why this is not working. I can't understand why
Cassandra is not allowing me to use Non EQ range query on secondary indexes
even when I have restricted my primary key by EQ as specified in error
message.
Also please elaborate on how to use Lucene/Solr in this case with Cassandra.
Thanks,
Mehak

Lucene or Solr is better. Iâm using lucene
1. Create multiple secondary indexes, one for each non-key column you
need to index on. Not recommended. Considered an anti-pattern for Cassandra.
2. Use DSE Search/Solr.
3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.
-- Jack Krupansky

Post by Mehak Mehta
Hi,
I want to perform range queries (as in x and y ranges) on a large data
billions of rows.
CQL allows me to put Non EQ restrictions on only one of the clustering columns.
Its not allowing me to filter the data using any other column even
with use of Allow Filtering option.
cqlsh:images> *select * from results1 where image_caseid='mehak' and
x > 100 and y <100 order by x allow filtering;*
*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*
*CREATE TABLE images.results1 (uuid uuid, analysis_execution_id
varchar, analysis_execution_uuid uuid, x double, y double, loc varchar, w
double, h double, normalized varchar, type varchar, filehost varchar,
filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar,
image_mpp_x double, image_mpp_y double, image_width double, image_height
double, objective double, cancer_type varchar, Area float, submit_date
timestamp, points list<double>, PRIMARY KEY ((image_caseid),x));*
Kindly suggest me how to model is data and perform 2d or in general
multi dimension range queries in Cassandra efficiently.
Thanks,
Mehak

--
AndrÃ©s de la PeÃ±a

<http://www.stratio.com/>
Avenida de Europa, 26. Ãtica 5. 3Âª Planta
28224 Pozuelo de AlarcÃ³n, Madrid
Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*

Mehak Mehta

2015-03-22 06:44:09 UTC

Permalink

Hi,

On the basis of some suggestions, I tried using tuplejump for multidimensional
queries. Since other mostly needed root permissions (for building ) which I
don't have on my cluster account.

I found a major problem in tuplejump (stargate-core). When I am using it
with a list type field in my table. It stops working.
For e.g.

create table person (
id int primary key,
isActive boolean,
age int,
eyeColor varchar,
name text,
gender varchar,
company varchar,
email varchar,
phone varchar,
address text,
points list<double>,
stargate text
);

with indexing as:
CREATE CUSTOM INDEX person_idx ON PERSON(stargate) USING some
'com.tuplejump.stargate.RowIndex' WITH options =
{
'sg_options':'{
"fields":{
"eyeColor":{},
"age":{},
"phone":{}
}
}'
};

If I insert data in the table along with points list. The following query
won't give any results (0 rows):

SELECT * FROM RESULTS1 WHERE stargate ='{
filter: {
type: "range",
field: "x",
lower: 0
}
}';

I tried removing points list<double> from the table and it works i.e. same
query will return results.
Can somebody help me with this problem as I couldn't find much support from
Stargate.

Please note that I am using Cassandra 2.0.9 compatible with Stargate-core
as given in link (
http://stargate-core.readthedocs.org/en/latest/quickstart.html).

Thanks,
Mehak

Post by Andres de la PeÃ±a
Hi,
With Stratio Cassandra you can create Lucene based indexes for
ALTER TABLE images.results1 ADD lucene text ;
CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene)
USING 'com.stratio.cassandra.index.RowIndex'
WITH OPTIONS = {
'refresh_seconds':'1',
'schema':'{
fields:{
image_caseid:{type:"string"},
x:{type:"double"},
y:{type:"double"} } } '};
SELECT * FROM images.results1 WHERE lucene='{ filter:{type:"boolean", must:[
{field:"image_caseid", type:"match", value:"mehak"},
{field:"x", type:"range", lower:100},
{field:"y", type:"range", lower:100}
]}}';
However, you can take advantage of partition key to route the query only
SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene='{
filter:{type:"boolean", must:[
{field:"x", type:"range", lower:100},
{field:"y", type:"range", lower:100}
]}}';
SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x>100 AND
lucene='{ filter:{field:"y", type:"range", lower:100}}';
Additionally, if your data are geospatial (latitude and longitude), soon
you will can use the incoming spatial features.

Post by Mehak Mehta
Sorry I gave you wrong table definition for query. Here a composite key
of image_caseid, x and uuid which is unique. I have used x in clustering
columns to query it. And used secondary index on y column.
1. Example
*cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
double, submit_date timestamp, points list<double>, PRIMARY KEY
((image_caseid),x,uuid));*
*cqlsh:images> create index results1_y on results1(y);*
In the below query you can see I have image_caseid as primary key which
is filtered. Even then it is giving error that "*No indexed columns
present*"
*cqlsh:images> select * from results1 where image_caseid='mehak' and x >
100 and y<100 order by image_caseid asc;*
*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*
2. Example
I also tried including both x and y columns as composite key even then
*cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
double, submit_date timestamp, points list<double>, PRIMARY KEY
((image_caseid),x,y,uuid));*
*cqlsh:images> select * from results1 where image_caseid='mehak' and x >
100 and y<100 order by image_caseid asc;*
*code=2200 [Invalid query] message="PRIMARY KEY column "y" cannot be
restricted (preceding column "ColumnDefinition{name=x,
type=org.apache.cassandra.db.marshal.DoubleType, kind=CLUSTERING_COLUMN,
componentIndex=0, indexName=null, indexType=null}" is either not restricted
or by a non-EQ relation)"*
Thanks,
Mehak

Post by Jack Krupansky
Yeah, you may have to add a dummy column populated with a constant, or
just pick some other additional column to index that you can apply = to.
It's just a requirement to assure that a filtered select isn't too-too slow.
Uh.... if you are applying = to your primary key then there is no need
for any filtering. What did you really mean to do? It makes no sense the
way you have it!
Either go with DSE Search/Solr, or google "Tuplejump Stargate" or "Stratio".
-- Jack Krupansky

Post by Mehak Mehta
100 and y <100 allow filtering;*

*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*
Can you tell me why this is not working. I can't understand why
Cassandra is not allowing me to use Non EQ range query on secondary indexes
even when I have restricted my primary key by EQ as specified in error
message.
Also please elaborate on how to use Lucene/Solr in this case with Cassandra.
Thanks,
Mehak

Post by Mehak Mehta
Lucene or Solr is better. Iâm using lucene
1. Create multiple secondary indexes, one for each non-key column you
need to index on. Not recommended. Considered an anti-pattern for Cassandra.
2. Use DSE Search/Solr.
3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.
-- Jack Krupansky
On Tue, Mar 17, 2015 at 1:49 PM, Mehak Mehta <

Post by Mehak Mehta
Hi,
I want to perform range queries (as in x and y ranges) on a large
data billions of rows.
CQL allows me to put Non EQ restrictions on only one of the clustering columns.
Its not allowing me to filter the data using any other column even
with use of Allow Filtering option.
cqlsh:images> *select * from results1 where image_caseid='mehak' and
x > 100 and y <100 order by x allow filtering;*
*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*
*CREATE TABLE images.results1 (uuid uuid, analysis_execution_id
varchar, analysis_execution_uuid uuid, x double, y double, loc varchar, w
double, h double, normalized varchar, type varchar, filehost varchar,
filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar,
image_mpp_x double, image_mpp_y double, image_width double, image_height
double, objective double, cancer_type varchar, Area float, submit_date
timestamp, points list<double>, PRIMARY KEY ((image_caseid),x));*
Kindly suggest me how to model is data and perform 2d or in general
multi dimension range queries in Cassandra efficiently.
Thanks,
Mehak

--
AndrÃ©s de la PeÃ±a
<http://www.stratio.com/>
Avenida de Europa, 26. Ãtica 5. 3Âª Planta
28224 Pozuelo de AlarcÃ³n, Madrid

Asit KAUSHIK

2015-03-23 12:17:18 UTC

Permalink

i am using Startio Cassandra it way better than stargate as it works on the
latest release of Cassandra and is better on my performance.

we are using it for fulltext search use case

Regards
Asit

Post by Mehak Mehta
Hi,
On the basis of some suggestions, I tried using tuplejump for multidimensional
queries. Since other mostly needed root permissions (for building ) which I
don't have on my cluster account.
I found a major problem in tuplejump (stargate-core). When I am using it
with a list type field in my table. It stops working.
For e.g.
create table person (
id int primary key,
isActive boolean,
age int,
eyeColor varchar,
name text,
gender varchar,
company varchar,
email varchar,
phone varchar,
address text,
points list<double>,
stargate text
);
CREATE CUSTOM INDEX person_idx ON PERSON(stargate) USING some
'com.tuplejump.stargate.RowIndex' WITH options =
{
'sg_options':'{
"fields":{
"eyeColor":{},
"age":{},
"phone":{}
}
}'
};
If I insert data in the table along with points list. The following query
SELECT * FROM RESULTS1 WHERE stargate ='{
filter: {
type: "range",
field: "x",
lower: 0
}
}';
I tried removing points list<double> from the table and it works i.e.
same query will return results.
Can somebody help me with this problem as I couldn't find much support
from Stargate.
Please note that I am using Cassandra 2.0.9 compatible with Stargate-core
as given in link (
http://stargate-core.readthedocs.org/en/latest/quickstart.html).
Thanks,
Mehak

Post by Mehak Mehta
Sorry I gave you wrong table definition for query. Here a composite key
of image_caseid, x and uuid which is unique. I have used x in clustering
columns to query it. And used secondary index on y column.
1. Example
*cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
double, submit_date timestamp, points list<double>, PRIMARY KEY
((image_caseid),x,uuid));*
*cqlsh:images> create index results1_y on results1(y);*
In the below query you can see I have image_caseid as primary key which
is filtered. Even then it is giving error that "*No indexed columns
present*"
*cqlsh:images> select * from results1 where image_caseid='mehak' and x >
100 and y<100 order by image_caseid asc;*
*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*
2. Example
I also tried including both x and y columns as composite key even then
*cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
double, submit_date timestamp, points list<double>, PRIMARY KEY
((image_caseid),x,y,uuid));*
*cqlsh:images> select * from results1 where image_caseid='mehak' and x >
100 and y<100 order by image_caseid asc;*
*code=2200 [Invalid query] message="PRIMARY KEY column "y" cannot be
restricted (preceding column "ColumnDefinition{name=x,
type=org.apache.cassandra.db.marshal.DoubleType, kind=CLUSTERING_COLUMN,
componentIndex=0, indexName=null, indexType=null}" is either not restricted
or by a non-EQ relation)"*
Thanks,
Mehak
On Tue, Mar 17, 2015 at 5:19 PM, Jack Krupansky <

Post by Jack Krupansky
Yeah, you may have to add a dummy column populated with a constant, or
just pick some other additional column to index that you can apply = to.
It's just a requirement to assure that a filtered select isn't too-too slow.
Uh.... if you are applying = to your primary key then there is no need
for any filtering. What did you really mean to do? It makes no sense the
way you have it!
Either go with DSE Search/Solr, or google "Tuplejump Stargate" or "Stratio".
-- Jack Krupansky

Post by Mehak Mehta
100 and y <100 allow filtering;*

Post by Mehak Mehta
Hi,
I want to perform range queries (as in x and y ranges) on a large
data billions of rows.
CQL allows me to put Non EQ restrictions on only one of the clustering columns.
Its not allowing me to filter the data using any other column even
with use of Allow Filtering option.
cqlsh:images> *select * from results1 where image_caseid='mehak'
and x > 100 and y <100 order by x allow filtering;*
*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*
*CREATE TABLE images.results1 (uuid uuid, analysis_execution_id
varchar, analysis_execution_uuid uuid, x double, y double, loc varchar, w
double, h double, normalized varchar, type varchar, filehost varchar,
filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar,
image_mpp_x double, image_mpp_y double, image_width double, image_height
double, objective double, cancer_type varchar, Area float, submit_date
timestamp, points list<double>, PRIMARY KEY ((image_caseid),x));*
Kindly suggest me how to model is data and perform 2d or in general
multi dimension range queries in Cassandra efficiently.
Thanks,
Mehak

--
AndrÃ©s de la PeÃ±a
<http://www.stratio.com/>
Avenida de Europa, 26. Ãtica 5. 3Âª Planta
28224 Pozuelo de AlarcÃ³n, Madrid

Mehak Mehta

2015-03-23 17:07:21 UTC

Permalink

Hi,

I checked Startio Cassandra but couldn't get any good documentation for the
same.
Can you give me some pointers on how to use it.
Do I have to build it from the source or I can use it directly with jar
files as in case of Stargate.
Since I was looking for solution which I don't need a full build and can be
used with existing tar of cassandra because I have some restrictions on
installing stuff on my server.

Thanks,
Mehak

Post by Asit KAUSHIK
i am using Startio Cassandra it way better than stargate as it works on
the latest release of Cassandra and is better on my performance.
we are using it for fulltext search use case
Regards
Asit

Post by Mehak Mehta
100 and y<100 order by image_caseid asc;*

*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*
2. Example
I also tried including both x and y columns as composite key even then
*cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
double, submit_date timestamp, points list<double>, PRIMARY KEY
((image_caseid),x,y,uuid));*
*cqlsh:images> select * from results1 where image_caseid='mehak' and x

Post by Mehak Mehta
100 and y<100 order by image_caseid asc;*

*code=2200 [Invalid query] message="PRIMARY KEY column "y" cannot be
restricted (preceding column "ColumnDefinition{name=x,
type=org.apache.cassandra.db.marshal.DoubleType, kind=CLUSTERING_COLUMN,
componentIndex=0, indexName=null, indexType=null}" is either not restricted
or by a non-EQ relation)"*
Thanks,
Mehak
On Tue, Mar 17, 2015 at 5:19 PM, Jack Krupansky <

Post by Mehak Mehta
Yeah, you may have to add a dummy column populated with a constant, or
just pick some other additional column to index that you can apply = to.
It's just a requirement to assure that a filtered select isn't too-too slow.
Uh.... if you are applying = to your primary key then there is no need
for any filtering. What did you really mean to do? It makes no sense the
way you have it!
Either go with DSE Search/Solr, or google "Tuplejump Stargate" or "Stratio".
-- Jack Krupansky
On Tue, Mar 17, 2015 at 4:51 PM, Mehak Mehta <

Post by Mehak Mehta
Hi,
In my case I am just trying to do range queries on 2 dimensions i.e.
x and y.
*cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
double, submit_date timestamp, points list<double>, PRIMARY KEY
(image_caseid));*
*cqlsh:images> create index results1_y on results1(y);*
*cqlsh:images> create index results1_x on results1(x);*
But when I try the following the query it is not allowed with an error.
*cqlsh:images> select * from results1 where image_caseid='mehak' and
x > 100 and y <100 allow filtering;*
*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*
Can you tell me why this is not working. I can't understand why
Cassandra is not allowing me to use Non EQ range query on secondary indexes
even when I have restricted my primary key by EQ as specified in error
message.
Also please elaborate on how to use Lucene/Solr in this case with Cassandra.
Thanks,
Mehak

Lucene or Solr is better. Iâm using lucene
1. Create multiple secondary indexes, one for each non-key column
you need to index on. Not recommended. Considered an anti-pattern for
Cassandra.
2. Use DSE Search/Solr.
3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.
-- Jack Krupansky
On Tue, Mar 17, 2015 at 1:49 PM, Mehak Mehta <

Post by Mehak Mehta
Hi,
I want to perform range queries (as in x and y ranges) on a large
data billions of rows.
CQL allows me to put Non EQ restrictions on only one of the
clustering columns.
Its not allowing me to filter the data using any other column even
with use of Allow Filtering option.
cqlsh:images> *select * from results1 where image_caseid='mehak'
and x > 100 and y <100 order by x allow filtering;*
*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*
*CREATE TABLE images.results1 (uuid uuid, analysis_execution_id
varchar, analysis_execution_uuid uuid, x double, y double, loc varchar, w
double, h double, normalized varchar, type varchar, filehost varchar,
filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar,
image_mpp_x double, image_mpp_y double, image_width double, image_height
double, objective double, cancer_type varchar, Area float, submit_date
timestamp, points list<double>, PRIMARY KEY ((image_caseid),x));*
Kindly suggest me how to model is data and perform 2d or in general
multi dimension range queries in Cassandra efficiently.
Thanks,
Mehak

--
AndrÃ©s de la PeÃ±a
<http://www.stratio.com/>
Avenida de Europa, 26. Ãtica 5. 3Âª Planta
28224 Pozuelo de AlarcÃ³n, Madrid

Andres de la Peña

2015-03-23 17:52:45 UTC

Permalink

Hi,

You can download Stratio Cassandra binaries from
https://s3.amazonaws.com/stratioorg/cassandra/stratio-cassandra-2.1.3.1-bin.tar.gz

You can get info about how to build and getting started at its README file
<https://github.com/Stratio/stratio-cassandra/blob/master/README.md>. More
detailed info can be found at
https://github.com/Stratio/stratio-cassandra/blob/master/doc/stratio/extended-search-in-cassandra.md
.

Regards,

Post by Mehak Mehta
Hi,
I checked Startio Cassandra but couldn't get any good documentation for
the same.
Can you give me some pointers on how to use it.
Do I have to build it from the source or I can use it directly with jar
files as in case of Stargate.
Since I was looking for solution which I don't need a full build and can
be used with existing tar of cassandra because I have some restrictions on
installing stuff on my server.
Thanks,
Mehak

Post by Mehak Mehta
Hi,
On the basis of some suggestions, I tried using tuplejump for multidimensional
queries. Since other mostly needed root permissions (for building ) which I
don't have on my cluster account.
I found a major problem in tuplejump (stargate-core). When I am using it
with a list type field in my table. It stops working.
For e.g.
create table person (
id int primary key,
isActive boolean,
age int,
eyeColor varchar,
name text,
gender varchar,
company varchar,
email varchar,
phone varchar,
address text,
points list<double>,
stargate text
);
CREATE CUSTOM INDEX person_idx ON PERSON(stargate) USING some
'com.tuplejump.stargate.RowIndex' WITH options =
{
'sg_options':'{
"fields":{
"eyeColor":{},
"age":{},
"phone":{}
}
}'
};
If I insert data in the table along with points list. The following
SELECT * FROM RESULTS1 WHERE stargate ='{
filter: {
type: "range",
field: "x",
lower: 0
}
}';
I tried removing points list<double> from the table and it works i.e.
same query will return results.
Can somebody help me with this problem as I couldn't find much support
from Stargate.
Please note that I am using Cassandra 2.0.9 compatible with
Stargate-core as given in link (
http://stargate-core.readthedocs.org/en/latest/quickstart.html).
Thanks,
Mehak
On Wed, Mar 18, 2015 at 5:45 AM, Andres de la PeÃ±a <

Post by Andres de la PeÃ±a
Hi,
With Stratio Cassandra you can create Lucene based indexes for
ALTER TABLE images.results1 ADD lucene text ;
CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene)
USING 'com.stratio.cassandra.index.RowIndex'
WITH OPTIONS = {
'refresh_seconds':'1',
'schema':'{
fields:{
image_caseid:{type:"string"},
x:{type:"double"},
y:{type:"double"} } } '};
SELECT * FROM images.results1 WHERE lucene='{ filter:{type:"boolean", must:[
{field:"image_caseid", type:"match", value:"mehak"},
{field:"x", type:"range", lower:100},
{field:"y", type:"range", lower:100}
]}}';
However, you can take advantage of partition key to route the query
SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene='{
filter:{type:"boolean", must:[
{field:"x", type:"range", lower:100},
{field:"y", type:"range", lower:100}
]}}';
SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x>100 AND
lucene='{ filter:{field:"y", type:"range", lower:100}}';
Additionally, if your data are geospatial (latitude and longitude),
soon you will can use the incoming spatial features.

Post by Mehak Mehta
100 and y<100 order by image_caseid asc;*

Post by Mehak Mehta
Yeah, you may have to add a dummy column populated with a constant,
or just pick some other additional column to index that you can apply = to.
It's just a requirement to assure that a filtered select isn't too-too slow.
Uh.... if you are applying = to your primary key then there is no
need for any filtering. What did you really mean to do? It makes no sense
the way you have it!
Either go with DSE Search/Solr, or google "Tuplejump Stargate" or "Stratio".
-- Jack Krupansky
On Tue, Mar 17, 2015 at 4:51 PM, Mehak Mehta <

Post by Mehak Mehta
Hi,
In my case I am just trying to do range queries on 2 dimensions i.e.
x and y.
*cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
double, submit_date timestamp, points list<double>, PRIMARY KEY
(image_caseid));*
*cqlsh:images> create index results1_y on results1(y);*
*cqlsh:images> create index results1_x on results1(x);*
But when I try the following the query it is not allowed with an error.
*cqlsh:images> select * from results1 where image_caseid='mehak' and
x > 100 and y <100 allow filtering;*
*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*
Can you tell me why this is not working. I can't understand why
Cassandra is not allowing me to use Non EQ range query on secondary indexes
even when I have restricted my primary key by EQ as specified in error
message.
Also please elaborate on how to use Lucene/Solr in this case with Cassandra.
Thanks,
Mehak

Lucene or Solr is better. Iâm using lucene
1. Create multiple secondary indexes, one for each non-key column
you need to index on. Not recommended. Considered an anti-pattern for
Cassandra.
2. Use DSE Search/Solr.
3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.
-- Jack Krupansky
On Tue, Mar 17, 2015 at 1:49 PM, Mehak Mehta <

Post by Mehak Mehta
Hi,
I want to perform range queries (as in x and y ranges) on a large
data billions of rows.
CQL allows me to put Non EQ restrictions on only one of the
clustering columns.
Its not allowing me to filter the data using any other column even
with use of Allow Filtering option.
cqlsh:images> *select * from results1 where image_caseid='mehak'
and x > 100 and y <100 order by x allow filtering;*
*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*
*CREATE TABLE images.results1 (uuid uuid, analysis_execution_id
varchar, analysis_execution_uuid uuid, x double, y double, loc varchar, w
double, h double, normalized varchar, type varchar, filehost varchar,
filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar,
image_mpp_x double, image_mpp_y double, image_width double, image_height
double, objective double, cancer_type varchar, Area float, submit_date
timestamp, points list<double>, PRIMARY KEY ((image_caseid),x));*
Kindly suggest me how to model is data and perform 2d or in
general multi dimension range queries in Cassandra efficiently.
Thanks,
Mehak

--
AndrÃ©s de la PeÃ±a
<http://www.stratio.com/>
Avenida de Europa, 26. Ãtica 5. 3Âª Planta
28224 Pozuelo de AlarcÃ³n, Madrid