Hi,
With Stratio Cassandra you can create Lucene based indexes for
multidimensional queries this way:
ALTER TABLE images.results1 ADD lucene text ;
CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene)
USING 'com.stratio.cassandra.index.RowIndex'
WITH OPTIONS = {
'refresh_seconds':'1',
'schema':'{
fields:{
image_caseid:{type:"string"},
x:{type:"double"},
y:{type:"double"} } } '};
Then you can perform the query using the dummy column:
SELECT * FROM images.results1 WHERE lucene='{ filter:{type:"boolean", must:[
{field:"image_caseid", type:"match", value:"mehak"},
{field:"x", type:"range", lower:100},
{field:"y", type:"range", lower:100}
]}}';
However, you can take advantage of partition key to route the query only to
the nodes owning the data:
SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene='{
filter:{type:"boolean", must:[
{field:"x", type:"range", lower:100},
{field:"y", type:"range", lower:100}
]}}';
Or, even better:
SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x>100 AND
lucene='{ filter:{field:"y", type:"range", lower:100}}';
Additionally, if your data are geospatial (latitude and longitude), soon
you will can use the incoming spatial features.
Post by Mehak MehtaSorry I gave you wrong table definition for query. Here a composite key
of image_caseid, x and uuid which is unique. I have used x in clustering
columns to query it. And used secondary index on y column.
1. Example
*cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
double, submit_date timestamp, points list<double>, PRIMARY KEY
((image_caseid),x,uuid));*
*cqlsh:images> create index results1_y on results1(y);*
In the below query you can see I have image_caseid as primary key which is
filtered. Even then it is giving error that "*No indexed columns present*
"
*cqlsh:images> select * from results1 where image_caseid='mehak' and x >
100 and y<100 order by image_caseid asc;*
*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*
2. Example
I also tried including both x and y columns as composite key even then
*cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
double, submit_date timestamp, points list<double>, PRIMARY KEY
((image_caseid),x,y,uuid));*
*cqlsh:images> select * from results1 where image_caseid='mehak' and x >
100 and y<100 order by image_caseid asc;*
*code=2200 [Invalid query] message="PRIMARY KEY column "y" cannot be
restricted (preceding column "ColumnDefinition{name=x,
type=org.apache.cassandra.db.marshal.DoubleType, kind=CLUSTERING_COLUMN,
componentIndex=0, indexName=null, indexType=null}" is either not restricted
or by a non-EQ relation)"*
Thanks,
Mehak
Post by Jack KrupanskyYeah, you may have to add a dummy column populated with a constant, or
just pick some other additional column to index that you can apply = to.
It's just a requirement to assure that a filtered select isn't too-too slow.
Uh.... if you are applying = to your primary key then there is no need
for any filtering. What did you really mean to do? It makes no sense the
way you have it!
Either go with DSE Search/Solr, or google "Tuplejump Stargate" or "Stratio".
-- Jack Krupansky
Post by Mehak MehtaHi,
In my case I am just trying to do range queries on 2 dimensions i.e. x
and y.
*cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
double, submit_date timestamp, points list<double>, PRIMARY KEY
(image_caseid));*
*cqlsh:images> create index results1_y on results1(y);*
*cqlsh:images> create index results1_x on results1(x);*
But when I try the following the query it is not allowed with an error.
*cqlsh:images> select * from results1 where image_caseid='mehak' and x >
100 and y <100 allow filtering;*
*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*
Can you tell me why this is not working. I can't understand why
Cassandra is not allowing me to use Non EQ range query on secondary indexes
even when I have restricted my primary key by EQ as specified in error
message.
Also please elaborate on how to use Lucene/Solr in this case with Cassandra.
Thanks,
Mehak
Lucene or Solr is better. Iâm using lucene
1. Create multiple secondary indexes, one for each non-key column you
need to index on. Not recommended. Considered an anti-pattern for Cassandra.
2. Use DSE Search/Solr.
3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.
-- Jack Krupansky
Post by Mehak MehtaHi,
I want to perform range queries (as in x and y ranges) on a large data
billions of rows.
CQL allows me to put Non EQ restrictions on only one of the clustering columns.
Its not allowing me to filter the data using any other column even
with use of Allow Filtering option.
cqlsh:images> *select * from results1 where image_caseid='mehak' and
x > 100 and y <100 order by x allow filtering;*
*code=2200 [Invalid query] message="No indexed columns present in
by-columns clause with Equal operator"*
*CREATE TABLE images.results1 (uuid uuid, analysis_execution_id
varchar, analysis_execution_uuid uuid, x double, y double, loc varchar, w
double, h double, normalized varchar, type varchar, filehost varchar,
filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar,
image_mpp_x double, image_mpp_y double, image_width double, image_height
double, objective double, cancer_type varchar, Area float, submit_date
timestamp, points list<double>, PRIMARY KEY ((image_caseid),x));*
Kindly suggest me how to model is data and perform 2d or in general
multi dimension range queries in Cassandra efficiently.
Thanks,
Mehak
--
Andrés de la Peña
<http://www.stratio.com/>
Avenida de Europa, 26. Ãtica 5. 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*