Discussion:
Best approach in Cassandra (+ Spark?) for Continuous Queries?
Hugo José Pinto
2015-01-03 10:46:58 UTC
Hello.

We're currently using Hazelcast (http://hazelcast.org/) as a distributed
in-memory data grid. That's been working sort-of-well for us, but going
solely in-memory has exhausted its path in our use case, and we're
considering porting our application to a NoSQL persistent store. After the
usual comparisons and evaluations, we're borderline close to picking
Cassandra, plus eventually Spark for analytics.

Nonetheless, there is a gap in our architectural needs that we're still not
grasping how to solve in Cassandra (with or without Spark): Hazelcast
allows us to create a Continuous Query such that, whenever a row is
added/removed/modified in the clause's resultset, Hazelcast calls us back
with the corresponding notification. We use this to continuously update the
clients via AJAX streaming with the new/changed rows.
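To make the pattern concrete, here's a minimal in-memory sketch of what we mean by a continuous query (illustrative names only; this is not the actual Hazelcast API):

```python
# A continuous query is essentially a predicate plus a listener that fires
# whenever a matching entry is added, changed, or removed.
class ContinuousQueryMap:
    def __init__(self):
        self._data = {}
        self._listeners = []  # (predicate, callback) pairs

    def add_listener(self, predicate, callback):
        self._listeners.append((predicate, callback))

    def put(self, key, value):
        old = self._data.get(key)
        self._data[key] = value
        self._notify("updated" if old is not None else "added", key, value)

    def remove(self, key):
        value = self._data.pop(key, None)
        if value is not None:
            self._notify("removed", key, value)

    def _notify(self, event, key, value):
        for predicate, callback in self._listeners:
            if predicate(value):
                callback(event, key, value)


events = []
ships = ContinuousQueryMap()
# "All changes to ship positions on California's coastline" becomes a predicate.
ships.add_listener(lambda v: v["region"] == "CA", lambda *e: events.append(e))
ships.put("ship-1", {"region": "CA", "lat": 36.6})
ships.put("ship-1", {"region": "CA", "lat": 36.7})
ships.remove("ship-1")
```

Each mutation that matches the predicate produces one callback, which is exactly what we fan out to the AJAX clients today.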

This is probably a conceptual mismatch we're making, so - how to best
address this use case in Cassandra (with or without Spark's help)? Is there
something in the API that allows for Continuous Queries on key/clause
changes (haven't found it)? Is there some other way to get a stream of
key/clause updates? Events of some sort?

I'm aware that we could, eventually, periodically poll Cassandra, but in
our use case, the client is potentially interested in a large number of
table clause notifications (think "all changes to Ship positions on
California's coastline"), and iterating out of the store would kill the
streamer's scalability.

Hence, the magic question: what are we missing? Is Cassandra the wrong tool
for the job? Are we not aware of a particular part of the API or external
library in/outside the apache realm that would allow for this?

Many thanks for any assistance!

Hugo
DuyHai Doan
2015-01-03 11:09:59 UTC
Hello Hugo

I was facing the same kind of requirement from some users. Long story
short, below are the possible strategies, with the advantages and drawbacks
of each:

1) Put Spark in front of the back-end, every incoming
modification/update/insert goes into Spark first, then Spark will forward
it to Cassandra for persistence. With Spark, you can perform pre or
post-processing and notify external clients of mutation.

The drawback of this solution is that all the incoming mutations must go
through Spark. You may set up a Kafka queue as temporary storage to
distribute the load and consume mutations with Spark, but it adds to the
architectural complexity with additional components & technologies.
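A rough sketch of the shape of option 1, with plain in-memory stand-ins for the Kafka, Spark and Cassandra pieces (all names hypothetical):

```python
from collections import deque

# In-memory stand-ins: a queue buffers incoming mutations (Kafka's role),
# a consumer persists each one (Cassandra's role) and then notifies
# subscribers (the pre/post-processing job Spark would do in front).
queue = deque()   # Kafka stand-in
store = {}        # Cassandra stand-in
subscribers = []

def submit(mutation):
    queue.append(mutation)   # producers never touch the store directly

def consume():
    while queue:
        mutation = queue.popleft()
        store[mutation["key"]] = mutation["value"]   # forward to persistence
        for notify in subscribers:                   # then fan out to clients
            notify(mutation)

seen = []
subscribers.append(seen.append)
submit({"key": "ship-1", "value": {"lat": 36.6}})
submit({"key": "ship-2", "value": {"lat": 34.0}})
consume()
```

The point of the queue is that producers only ever append; the consumer absorbs load spikes, but every mutation still has to pass through this extra tier.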

2) For high availability and resilience, you probably want to have all
mutations saved first into Cassandra, then process notifications with Spark.
In this case the only way to get notifications from Cassandra, as of
version 2.1, is to rely on manually coded triggers (which are still an
experimental feature).

With the triggers you can notify whatever clients you want, not only Spark.

The big drawback of this solution is that playing with triggers is
dangerous if you are not familiar with Cassandra internals. Indeed, the
trigger is on the write path and may hurt performance if you are doing
complex and blocking tasks.

Those are the two solutions I can see; maybe the ML members will propose
other innovative choices.

Regards
Hugo José Pinto
2015-01-03 23:07:07 UTC
Thank you all for your answers.

It seems I'll have to go with some event-driven processing before/during the Cassandra write path.

My concern is that I'd love to first guarantee the disk write of the Cassandra persistence and only then do the event processing (which is mostly CRUD intercepts at this point), even if slightly delayed; doing the processing itself via triggers would probably bog down the whole pipeline.

What I'd probably do is write, in the trigger, to a separate key table holding all the CRUDed elements, and have the ESP process that table.
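Roughly, the trigger would do nothing but one extra cheap append, and the ESP would drain that table off the write path. A toy in-memory sketch of the idea (hypothetical names; a real version would use Cassandra's trigger API and a CQL table):

```python
import time

data_table = {}    # the real data
event_table = []   # the extra "key table" the trigger appends to

def write_with_trigger(key, value):
    data_table[key] = value
    # The trigger only records that a CRUD happened: a cheap, non-blocking
    # append, so the write path is barely touched.
    event_table.append({"key": key, "op": "upsert", "ts": time.time()})

def esp_drain():
    # The ESP consumes the accumulated events later, off the write path.
    processed = list(event_table)
    event_table.clear()
    return processed

write_with_trigger("ship-1", {"lat": 36.6})
write_with_trigger("ship-1", {"lat": 36.7})
batch = esp_drain()
```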

Thank you for your contribution. Should anyone else have any experience in these scenarios, I'm obviously all ears as well.

Best,

Hugo

Sent from my iPhone
Colin
2015-01-03 23:10:59 UTC
Use a message bus with a transactional get: get the message, send it to Cassandra, and upon write success submit it to the ESP and commit the get on the bus. Messaging systems like RabbitMQ support this semantic.

Using cassandra as a queuing mechanism is an anti-pattern.
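The ordering is: get, write, submit, commit. A toy sketch with an in-memory stand-in for the bus (in RabbitMQ terms, the commit step corresponds to a manual acknowledgement):

```python
from collections import deque

broker = deque()   # stand-in for the message bus
store = {}         # stand-in for Cassandra
esp_inbox = []     # stand-in for the ESP

def publish(msg):
    broker.append(msg)

def consume_one():
    if not broker:
        return False
    msg = broker[0]                       # transactional get: peek, don't remove
    try:
        store[msg["key"]] = msg["value"]  # 1. write to Cassandra
    except Exception:
        return False                      # write failed: message stays on the bus
    esp_inbox.append(msg)                 # 2. on write success, submit to the ESP
    broker.popleft()                      # 3. commit the get (ack)
    return True

publish({"key": "ship-1", "value": {"lat": 36.6}})
while consume_one():
    pass
```

Because the message is only removed from the bus after the write succeeds, a crash mid-way replays the message rather than losing it; Cassandra never plays the role of the queue.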

--
Colin Clark
+1-320-221-9531
Peter Lin
2015-01-03 23:43:41 UTC
Listen to Colin's advice; avoid the temptation of anti-patterns.
Hugo José Pinto
2015-01-03 23:48:39 UTC
Thanks :)

Duly noted - this is all uncharted territory for us, hence the value of seasoned advice.


Best

--
Hugo José Pinto
Peter Lin
2015-01-03 23:53:47 UTC
If you like a SQL dialect, try out products that use StreamSQL to do
continuous queries. Esper comes to mind. Google to see what other products
support StreamSQL.
Hugo José Pinto
2015-01-04 13:41:55 UTC
Many thanks once again.

I rethought the target data structure, and things started coming together to allow for really elegant, compact ESP preprocessing and storage.

Best.

Sent from my iPhone
Peter Lin
2015-01-03 11:58:28 UTC
It looks like you're using the wrong tool and architecture.

If the use case really needs continuous-query-style event processing, use an ESP product to do that. You can still store data in Cassandra for persistence.

The design you want has two paths: event stream and persistence. At the entry point, the system makes parallel calls: one goes to a messaging system that feeds the ESP, and a second calls Cassandra.
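A minimal sketch of that fan-out at the entry point (in-memory stand-ins; the real calls would go to a broker client and a Cassandra driver):

```python
store = {}         # persistence path (Cassandra stand-in)
event_stream = []  # event path (messaging system feeding the ESP, stand-in)

def ingest(key, value):
    # The entry point makes two parallel calls: one toward the ESP's
    # messaging system, one toward Cassandra. Neither path depends on
    # running a continuous query against the store afterwards.
    event_stream.append((key, value))   # path 1: event stream
    store[key] = value                  # path 2: persistence

ingest("ship-1", {"lat": 36.6})
```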


Jabbar Azam
2015-01-03 17:16:00 UTC
Permalink
Hello,

Or you can have a look at Akka (http://akka.io) for event processing and use Cassandra for persistence (Peter's suggestion).
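Akka is a JVM library; as a rough Python analogue of the idea (illustrative only, not Akka's API), an "actor" is a worker that processes messages from its mailbox one at a time, which is how position updates could be serialized before being pushed out to AJAX clients.

```python
# A minimal actor: a thread draining a mailbox queue, one message at a time.

import threading
import queue

class Actor:
    def __init__(self, handler):
        self.mailbox = queue.Queue()
        self.handler = handler
        threading.Thread(target=self._run, daemon=True).start()

    def tell(self, message):
        """Fire-and-forget message send (Akka's `tell`)."""
        self.mailbox.put(message)

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:          # poison pill: stop the actor
                break
            self.handler(msg)

received = []
done = threading.Event()

def on_update(msg):
    received.append(msg)
    if len(received) == 2:
        done.set()

positions = Actor(on_update)
positions.tell({"ship": "ship-1", "lat": 36.6})
positions.tell({"ship": "ship-1", "lat": 36.7})
done.wait(timeout=2)
positions.tell(None)
print(received)
```

Because each actor processes its mailbox sequentially, per-ship update ordering comes for free, while many actors can run concurrently across ships.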