Our recently created worker, which runs once a day, stoped work. The exception was quite straight forward: “OperationalError: (2006, ‘MySQL server has gone away’)”.
So, let’s tackle this issue. Is the mysql server up? Do the servers have access between each other?
$ telnet mysqlserver.mynetwork.com 3306 Trying 10.123.123.3... Connected to mysqlserver.mynetwork.com. Escape character is '^]'. m 5.6.40-enterprise-commercial-advanced-logޔ#=|~#/c(9!�%L3X@vV*mf9+mysql_native_password
Oh right, server is up and I have access to it. What is happening?
Me: – Dev, how do you connect to the server?
Dev: – I’m using django ORM. Usually it works quite well.
Ok. Let’s see if the server has a open connection with the server:
$ netstat -na|grep 3306 tcp 0 0 171.0.3.100:50566 10.123.123.3:3306 ESTABLISHED
So, the server has a connection but the app CAN’T use the connection? (The server runs only this worker, so I know this connections belongs to it).
And this is the moment that it’s useful to understand a little bit about the OSI model.
While the servers had a valid TCP connection (layer 4) between each other, since the application only runs once a day for a few minutes and then becomes idle, MYSQL dropped the session (level 6) but kept the TCP connection established. As the application didn’t get aware that its session is invalid, it kept trying to use that session.
How to solve that? We close your sessions after running our worker tasks and make sure it opens a new one when necessary. In django it can be achieved by using this code:
from django.db import connections
for conn in connections.all(): conn.close()