[python] [flask] [CQLAlchemy] NoHostAvailable on create

Discussion:

Alan Hamlett

2017-12-31 09:07:52 UTC

I'm seeing tracebacks in my Python Flask app when creating rows:

Traceback (most recent call last):
  File "/opt/app/current/app/api.py", line 1174, in consume_heartbeat
    Heartbeat.create(**form_data)
  File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/models.py", line 672, in create
    return cls.objects.create(**kwargs)
  File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 977, in create
    .using(connection=self._connection) \
  File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/models.py", line 738, in save
    if_exists=self._if_exists).save()
  File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1476, in save
    self._execute(insert)
  File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1351, in _execute
    results = _execute_statement(self.model, statement, self._consistency, self._timeout, connection=connection)
  File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1505, in _execute_statement
    return conn.execute(s, params, timeout=timeout, connection=connection)
  File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/connection.py", line 341, in execute
    result = conn.session.execute(query, params, timeout=timeout)
  File "cassandra/cluster.py", line 2122, in cassandra.cluster.Session.execute
  File "cassandra/cluster.py", line 3982, in cassandra.cluster.ResponseFuture.result
cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {})

I'm using the cassandra-driver client library 3.12.0 via Flask-CQLAlchemy
1.2.0 (https://github.com/thegeorgeous/flask-cqlalchemy) with uWSGI (
https://github.com/unbit/uwsgi).

cassandra.cqlengine.connection.setup is being passed lazy_connect=True and
retry_connect=True because lazy_connect=False causes requests to the Flask
app to time out for some reason.

Also seeing these errors in my uWSGI log file:

[control connection] Error connecting to 10.1.2.3: Traceback (most
recent call last): File "cassandra/cluster.py", line 2781, in
cassandra.cluster.ControlConnection._reconnect_internal File
"cassandra/cluster.py", line 2803, in
cassandra.cluster.ControlConnection._try_connect File
"cassandra/cluster.py", line 1195, in
cassandra.cluster.Cluster.connection_factory File
"cassandra/connection.py", line 341, in
cassandra.connection.Connection.factory cassandra.OperationTimedOut:
errors=Timed out creating connection (5 seconds), last_host=None

What's causing these connection and timeout errors? Something related to
Flask-CQLAlchemy?

Alan Hamlett

2017-12-31 16:52:17 UTC

More info: The NoHostAvailable error is happening at random times on each
client host, so it's probably a client error. If the Cassandra cluster were
really offline, all client hosts would report the error at the same time
instead of at different random times. The NoHostAvailable error occurs
about once every 30 minutes, so most requests call Model.create() without
the error.


Jeff Jirsa

2017-12-31 17:04:20 UTC

uWSGI forks, and the driver / cqlalchemy may need to reconnect or otherwise fix its state after each fork. You could try to prove this is the cause by checking the uWSGI logs or ps for signs that a worker process has exited or been recycled. If you think it may be related to this, check out the @postfork decorator.
--
Jeff Jirsa
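
For readers following along, the postfork suggestion above might look roughly like the sketch below. It is only an illustration under stated assumptions: the keyspace name and the connect_cassandra function are hypothetical, the host is the one from the logs earlier in the thread, and the no-op fallback exists only so the module can be imported outside uWSGI.

```python
try:
    # uwsgidecorators is only importable inside a running uWSGI process.
    from uwsgidecorators import postfork
except ImportError:
    def postfork(func):
        """No-op stand-in so the module still imports outside uWSGI."""
        return func

# Assumed values for illustration; 10.1.2.3 is the host from the logs above.
CASSANDRA_HOSTS = ["10.1.2.3"]
KEYSPACE = "my_keyspace"  # hypothetical keyspace name


@postfork
def connect_cassandra():
    """Under uWSGI, runs in each worker process right after it is forked."""
    # Imported lazily so this module loads even without the driver installed.
    # The connection is opened here, in the worker, rather than in the
    # master process before the fork.
    from cassandra.cqlengine import connection

    connection.setup(CASSANDRA_HOSTS, KEYSPACE, retry_connect=True)
```

The point of the design is that each worker opens its own sockets and driver threads after the fork instead of inheriting them from the master.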


Alan Hamlett

2017-12-31 22:34:28 UTC

Thanks for the reply, I think it's related. However, after using a fork of
Flask-CQLAlchemy with postfork I'm still getting the NoHostAvailable error
once per 4k requests. One strange thing is the error rate doesn't increase
with the number of requests, since some uWSGI clients with ~20k requests
over the same time period have an error rate of once per 20k requests. Both
uWSGI hosts have the same number of worker processes.

*Flask-CQLAlchemy Fork with Patch:*

https://github.com/alanhamlett/flask-cqlalchemy/tree/a7e5c7c7cf0c51a19be98791dd4c47b72b97d9be

*Error Traceback seen after patch applied:*

Failed to create connection pool for new host 10.1.2.3:
Traceback (most recent call last):
  File "cassandra/cluster.py", line 2452, in cassandra.cluster.Session.add_or_renew_pool.run_add_or_renew_pool
  File "cassandra/pool.py", line 332, in cassandra.pool.HostConnection.__init__
  File "cassandra/cluster.py", line 1195, in cassandra.cluster.Cluster.connection_factory
  File "cassandra/connection.py", line 341, in cassandra.connection.Connection.factory
cassandra.OperationTimedOut: errors=Timed out creating connection (5 seconds), last_host=None
Traceback (most recent call last):
  File "./venv/lib/python3.4/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "./venv/lib/python3.4/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "./venv/lib/python3.4/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "./venv/lib/python3.4/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "./venv/lib/python3.4/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "./venv/lib/python3.4/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "./app/api_utils.py", line 876, in get_durations
    use_cassandra=use_cassandra,
  File "./venv/lib/python3.4/site-packages/datadog/dogstatsd/context.py", line 53, in wrapped
    return func(*args, **kwargs)
  File "./app/api_utils.py", line 1339, in heartbeats_to_durations
    for heartbeat in heartbeats:
  File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 512, in __iter__
    self._execute_query()
  File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 469, in _execute_query
    self._result_generator = (i for i in self._execute(self._select_query()))
  File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 401, in _execute
    result = _execute_statement(self.model, statement, self._consistency, self._timeout, connection=connection)
  File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1505, in _execute_statement
    return conn.execute(s, params, timeout=timeout, connection=connection)
  File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/connection.py", line 341, in execute
    result = conn.session.execute(query, params, timeout=timeout)
  File "cassandra/cluster.py", line 2122, in cassandra.cluster.Session.execute
  File "cassandra/cluster.py", line 3982, in cassandra.cluster.ResponseFuture.result
cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {})


Alan Hamlett

2018-01-01 20:14:46 UTC

Still getting the cassandra.cluster.NoHostAvailable error periodically from
uWSGI hosts. Setting up the connection with postfork:
https://github.com/alanhamlett/flask-cqlalchemy/blob/653ed3298af7dd617a972e9f87437f6e53f741b9/flask_cqlalchemy/__init__.py#L56

Lazy connection is False, Retry connection is True. Could this be a bug in
cassandra-driver's connection pooling?

P.S. Blocking a web app when a connection isn't available (the default,
non-lazy connect) is really bad. With a web app you want requests that
don't depend on Cassandra to complete, but cassandra-driver blocks all
requests when there's no Cassandra connection, even if the current request
doesn't need one. This design decision gives me very low confidence in the
Python cassandra-driver.


--
Alan Hamlett
ahamlett.com
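
As a stopgap for the periodic NoHostAvailable described in the post above, one option is to retry the failing call a few times before giving up. The helper below is a minimal sketch, not something from the thread or the driver: call_with_retry, attempts, and delay are hypothetical names, and it assumes the wrapped operation is safe to repeat.

```python
import time


def call_with_retry(func, retry_on, attempts=3, delay=0.5):
    """Call func(); if it raises retry_on, wait and try again.

    The last exception is re-raised once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            # Linear backoff: wait a little longer after each failure.
            time.sleep(delay * attempt)
```

With the driver installed, this could be used as call_with_retry(lambda: Heartbeat.create(**form_data), NoHostAvailable), where NoHostAvailable is imported from cassandra.cluster as shown in the tracebacks. Retrying only hides the symptom, so the underlying connection problem still needs to be found.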

Jeff Jirsa

2018-01-01 20:21:14 UTC

Well, the Python driver you reference is a third-party driver, because the project doesn't ship official drivers. You may have better luck looking for a DataStax driver support forum, or waiting until after the holiday, when more people will be checking email.
--
Jeff Jirsa


Alan Hamlett

2018-01-02 04:43:55 UTC

Adding more nodes to the cluster fixed the error. Looks like a bug in the
python-driver connection pool:

1. The connection pool only has one host
2. A query times out, causing that connection to be removed from the pool
3. Another query executes, but there are no hosts in the pool
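
The three steps above can be sketched as a toy model. This is only an illustration of the hypothesis, not the driver's actual pool logic: ToyPool, second_query_outcome, and the local NoHostAvailable class are all hypothetical stand-ins, and 10.1.2.4 is a made-up second host.

```python
class NoHostAvailable(Exception):
    """Stand-in for cassandra.cluster.NoHostAvailable in this toy model."""


class ToyPool:
    """Toy model of the hypothesis; not how the driver is implemented."""

    def __init__(self, hosts):
        self.live_hosts = list(hosts)

    def execute(self, query, times_out=False):
        if not self.live_hosts:
            # Step 3: nothing left in the pool to send the query to.
            raise NoHostAvailable(
                "Unable to complete the operation against any hosts")
        host = self.live_hosts[0]
        if times_out:
            # Step 2: a timeout removes that host from the pool.
            self.live_hosts.remove(host)
            raise TimeoutError("query timed out on " + host)
        return host


def second_query_outcome(hosts):
    """Time out the first query, then report what happens to the second."""
    pool = ToyPool(hosts)
    try:
        pool.execute("INSERT ...", times_out=True)
    except TimeoutError:
        pass
    try:
        return "served by " + pool.execute("INSERT ...")
    except NoHostAvailable:
        return "NoHostAvailable"
```

In this model a single-host pool ends in NoHostAvailable after one timeout, while a pool with a second host still serves the next query, which would be consistent with the error disappearing once more nodes were added.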

Post by Jeff Jirsa
Well the python driver you reference is a third party driver, because the
project doesnât ship official drivers. You may have better luck looking for
a datastax driver support forum, or wait until after the holiday for more
people to be checking email.
--
Jeff Jirsa
Still getting the cassandra.cluster.NoHostAvailable error periodically
https://github.com/alanhamlett/flask-cqlalchemy/blob/
653ed3298af7dd617a972e9f87437f6e53f741b9/flask_cqlalchemy/__init__.py#L56
Lazy connection is False, Retry connection is True. Could this be a bug in
cassandra-driver's connection pooling?
P.S. Blocking a web app when connection isn't available (default non-lazy
connect) is really bad. With a web app you want requests that don't depend
on Cassandra to complete, but cassandra-driver blocks all requests when
there's no Cassandra connection even if it's not needed for the current web
app's request. This design decision gives me very low confidence in the
Python cassandra-driver.

Post by Alan Hamlett
Thanks for the reply, I think it's related. However, after using a fork
of Flask-CQLAlchemy with postfork I'm still getting the NoHostAvailable
error once per 4k requests. One strange thing is the error rate doesn't
increase with the number of requests, since some uWSGI clients with ~20k
requests over the same time period have an error rate of once per 20k
requests. Both uWSGI hosts have the same number of worker processes.
*Flask-CQLAlchemy Fork with Patch:*
https://github.com/alanhamlett/flask-cqlalchemy/tree/a7e5c7c
7cf0c51a19be98791dd4c47b72b97d9be
*Error Traceback seen after patch applied:*
File "cassandra/cluster.py", line 2452, in
cassandra.cluster.Session.add_or_renew_pool.run_add_or_renew_pool
File "cassandra/pool.py", line 332, in cassandra.pool.HostConnection.
__init__
File "cassandra/cluster.py", line 1195, in
cassandra.cluster.Cluster.connection_factory
File "cassandra/connection.py", line 341, in
cassandra.connection.Connection.factory
cassandra.OperationTimedOut: errors=Timed out creating connection (5
seconds), last_host=None
File "./venv/lib/python3.4/site-packages/flask/app.py", line 1982, in wsgi_app
response = self.full_dispatch_request()
File "./venv/lib/python3.4/site-packages/flask/app.py", line 1614, in
full_dispatch_request
rv = self.handle_user_exception(e)
File "./venv/lib/python3.4/site-packages/flask/app.py", line 1517, in
handle_user_exception
reraise(exc_type, exc_value, tb)
File "./venv/lib/python3.4/site-packages/flask/_compat.py", line 33, in reraise
raise value
File "./venv/lib/python3.4/site-packages/flask/app.py", line 1612, in
full_dispatch_request
rv = self.dispatch_request()
File "./venv/lib/python3.4/site-packages/flask/app.py", line 1598, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "./app/api_utils.py", line 876, in get_durations
use_cassandra=use_cassandra,
File "./venv/lib/python3.4/site-packages/datadog/dogstatsd/context.py",
line 53, in wrapped
return func(*args, **kwargs)
File "./app/api_utils.py", line 1339, in heartbeats_to_durations
File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py",
line 512, in __iter__
self._execute_query()
File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py",
line 469, in _execute_query
self._result_generator = (i for i in self._execute(self._select_que
ry()))
File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py",
line 401, in _execute
result = _execute_statement(self.model, statement, self._consistency,
self._timeout, connection=connection)
File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py",
line 1505, in _execute_statement
return conn.execute(s, params, timeout=timeout, connection=connection)
File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/connection.py",
line 341, in execute
result = conn.session.execute(query, params, timeout=timeout)
File "cassandra/cluster.py", line 2122, in
cassandra.cluster.Session.execute
File "cassandra/cluster.py", line 3982, in
cassandra.cluster.ResponseFuture.result
cassandra.cluster.NoHostAvailable: ('Unable to complete the operation
against any hosts', {})

Post by Jeff Jirsa
uWSGI forks and the driver / cqlalchemy may need to reconnect or
otherwise fix the state after each fork - you could try to prove this is
the cause by checking uWSGI logs or ps for indication that a worker process
has exited/been recycled. If you think it may be related to this, check out
@postfork decorator
--
Jeff Jirsa
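
For readers following along, the @postfork suggestion could look roughly like the sketch below. It assumes the app runs under uWSGI (the only place uwsgidecorators is importable) and that cassandra-driver is installed. The host address is the one from the logs in this thread; example_keyspace, connect_cassandra and reconnect_after_fork are hypothetical names, and none of this is taken from Flask-CQLAlchemy itself.

```python
try:
    # Only importable when the code runs inside a uWSGI process.
    from uwsgidecorators import postfork
except ImportError:
    def postfork(func):
        """Fallback outside uWSGI: return the function unchanged."""
        return func

HOSTS = ["10.1.2.3"]            # node address from the logs in this thread
KEYSPACE = "example_keyspace"   # hypothetical keyspace name


def connect_cassandra(setup=None):
    """Open a fresh cqlengine connection for the current process."""
    if setup is None:
        # Imported lazily so this module loads even without cassandra-driver.
        from cassandra.cqlengine import connection
        setup = connection.setup
    setup(HOSTS, KEYSPACE, lazy_connect=False, retry_connect=True)


@postfork
def reconnect_after_fork():
    """Under uWSGI this runs once in every worker right after the fork."""
    connect_cassandra()
```

The setup argument exists only so the function can be exercised without a cluster; in a real app connect_cassandra() would be called with no arguments. The point of the hook is that each worker builds its own connection instead of inheriting sockets opened in the master process.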
More info: The NoHostAvailable error is happening at random times on
each client host, so it's probably a client error. If the Cassandra cluster
was really offline then all client hosts would report the error at the same
time instead of different random times. The NoHostAvailable error occurs
about once every 30 minutes, so most requests call Model.create() without
the error.

Alan Hamlett

2018-01-02 16:13:15 UTC

Still getting the NoHostAvailable with more hosts, just occurring less
frequently. Created a JIRA issue on the Python cassandra-driver tracker:
https://datastax-oss.atlassian.net/browse/PYTHON-891

Post by Alan Hamlett
Adding more nodes to the cluster fixed the error. Looks like a bug in
1. The connection pool only has one host
2. A query times out, causing that connection to be removed from the pool
3. Another query executes, but there are no hosts in the pool
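
The three steps above can be pictured with a toy model. This is only an illustration of the suspected failure mode, not the driver's actual pool code: ToyPool, its execute method, the times_out flag and the second host address 10.1.2.4 are made up for the example, and NoHostAvailable here is a local stand-in for cassandra.cluster.NoHostAvailable.

```python
class NoHostAvailable(Exception):
    """Local stand-in for cassandra.cluster.NoHostAvailable."""


class ToyPool:
    """Hypothetical pool that drops a host as soon as a query on it times out."""

    def __init__(self, hosts):
        self.hosts = list(hosts)

    def execute(self, query, times_out=False):
        if not self.hosts:
            # Step 3: nothing left to send the query to.
            raise NoHostAvailable(
                "Unable to complete the operation against any hosts", {})
        host = self.hosts[0]
        if times_out:
            # Step 2: the timeout removes the host from the pool.
            self.hosts.remove(host)
            raise TimeoutError("query timed out on " + host)
        return host


def run_scenario(hosts):
    """Time out one query, then report what the next query does."""
    pool = ToyPool(hosts)
    try:
        pool.execute("SELECT ...", times_out=True)
    except TimeoutError:
        pass
    try:
        return "ok via " + pool.execute("INSERT ...")
    except NoHostAvailable:
        return "NoHostAvailable"


print(run_scenario(["10.1.2.3"]))              # NoHostAvailable
print(run_scenario(["10.1.2.3", "10.1.2.4"]))  # ok via 10.1.2.4
```

In this model a second host absorbs a single timeout, which would be consistent with the error going away after nodes were added.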

Post by Jeff Jirsa
Well the python driver you reference is a third party driver, because the
project doesn't ship official drivers. You may have better luck looking for
a datastax driver support forum, or wait until after the holiday for more
people to be checking email.
--
Jeff Jirsa
Still getting the cassandra.cluster.NoHostAvailable error periodically
https://github.com/alanhamlett/flask-cqlalchemy/blob/653ed3298af7dd617a972e9f87437f6e53f741b9/flask_cqlalchemy/__init__.py#L56
Lazy connection is False, Retry connection is True. Could this be a bug
in cassandra-driver's connection pooling?
P.S. Blocking a web app when connection isn't available (default non-lazy
connect) is really bad. With a web app you want requests that don't depend
on Cassandra to complete, but cassandra-driver blocks all requests when
there's no Cassandra connection even if it's not needed for the current web
app's request. This design decision gives me very low confidence in the
Python cassandra-driver.

Post by Alan Hamlett
Thanks for the reply, I think it's related. However, after using a fork
of Flask-CQLAlchemy with postfork I'm still getting the NoHostAvailable
error once per 4k requests. One strange thing is the error rate doesn't
increase with the number of requests, since some uWSGI clients with ~20k
requests over the same time period have an error rate of once per 20k
requests. Both uWSGI hosts have the same number of worker processes.
*Flask-CQLAlchemy Fork with Patch:*
https://github.com/alanhamlett/flask-cqlalchemy/tree/a7e5c7c7cf0c51a19be98791dd4c47b72b97d9be
*Error Traceback seen after patch applied:*
File "cassandra/cluster.py", line 2452, in
cassandra.cluster.Session.add_or_renew_pool.run_add_or_renew_pool
File "cassandra/pool.py", line 332, in cassandra.pool.HostConnection.__init__
File "cassandra/cluster.py", line 1195, in
cassandra.cluster.Cluster.connection_factory
File "cassandra/connection.py", line 341, in
cassandra.connection.Connection.factory
cassandra.OperationTimedOut: errors=Timed out creating connection (5
seconds), last_host=None
File "./venv/lib/python3.4/site-packages/flask/app.py", line 1982, in wsgi_app
response = self.full_dispatch_request()
File "./venv/lib/python3.4/site-packages/flask/app.py", line 1614, in
full_dispatch_request
rv = self.handle_user_exception(e)
File "./venv/lib/python3.4/site-packages/flask/app.py", line 1517, in
handle_user_exception
reraise(exc_type, exc_value, tb)
File "./venv/lib/python3.4/site-packages/flask/_compat.py", line 33, in reraise
raise value
File "./venv/lib/python3.4/site-packages/flask/app.py", line 1612, in
full_dispatch_request
rv = self.dispatch_request()
File "./venv/lib/python3.4/site-packages/flask/app.py", line 1598, in
dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "./app/api_utils.py", line 876, in get_durations
use_cassandra=use_cassandra,
File "./venv/lib/python3.4/site-packages/datadog/dogstatsd/context.py",
line 53, in wrapped
return func(*args, **kwargs)
File "./app/api_utils.py", line 1339, in heartbeats_to_durations
File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py",
line 512, in __iter__
self._execute_query()
File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py",
line 469, in _execute_query
self._result_generator = (i for i in self._execute(self._select_query()))
File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py",
line 401, in _execute
result = _execute_statement(self.model, statement,
self._consistency, self._timeout, connection=connection)
File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py",
line 1505, in _execute_statement
return conn.execute(s, params, timeout=timeout,
connection=connection)
File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/connection.py",
line 341, in execute
result = conn.session.execute(query, params, timeout=timeout)
File "cassandra/cluster.py", line 2122, in
cassandra.cluster.Session.execute
File "cassandra/cluster.py", line 3982, in
cassandra.cluster.ResponseFuture.result
cassandra.cluster.NoHostAvailable: ('Unable to complete the operation
against any hosts', {})

Alan Hamlett

2018-01-06 01:50:32 UTC

Update: Still getting the NoHostAvailable periodically in client logs.

Also seeing these INFO and WARN messages in /var/log/cassandra/system.log:

INFO [epollEventLoopGroup-2-5] 2018-01-06 01:39:02,412 Message.java:623 - Unexpected exception during request; channel = [id: 0xae99b597, L:/10.1.2.3:9042 - R:/10.1.2.12:54720]
io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer
at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
WARN [ReadStage-1] 2018-01-06 01:39:24,350 ReadCommand.java:533 - Read 344 live rows and 2074 tombstone cells for query SELECT * FROM keyspace.heartbeat WHERE user_id = 66b6796d-eb84-4bb9-b9d2-8dc882f4c6ac AND time >= 1515225599 AND time <= 1515139200 ORDER BY (time ASC) LIMIT 5000 (see tombstone_warn_threshold)

Jeff Jirsa

2018-01-06 01:56:21 UTC

The WARN is a hint that you've got tombstones. Maybe not a big deal, but it's a hint about your data model. It's not causing this.

The INFO message is Cassandra's connection to your app getting severed. Cassandra is saying the reset came from the other side (the app side, or maybe a firewall or something else in the middle).
--
Jeff Jirsa
