The OSI model helps in the real world

Our recently created worker, which runs once a day, stopped working. The exception was quite straightforward: “OperationalError: (2006, ‘MySQL server has gone away’)”.

So, let’s tackle this issue. Is the MySQL server up? Can the two servers reach each other?

$ telnet mysqlserver.mynetwork.com 3306
Connected to mysqlserver.mynetwork.com.
Escape character is '^]'.
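
The same reachability check can be scripted, which is handy when telnet is not installed on the box. This is only a sketch: the function name `port_is_reachable` and the 3-second timeout are my own choices, not something from the original troubleshooting session.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout."""
    try:
        # create_connection performs the full TCP handshake, like telnet does.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Usage, with the host from the telnet command above:
# port_is_reachable("mysqlserver.mynetwork.com", 3306)
```

Like telnet, this only proves that a TCP connection can be established. It says nothing about whether the application can still talk to the database over a connection it already holds.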

All right, the server is up and I can reach it. So what is happening?

Me: – Dev, how do you connect to the server?
Dev: – I’m using the Django ORM. Usually it works quite well.

OK. Let’s see if the worker server has an open connection to the MySQL server:

$ netstat -na|grep 3306

So, the server has a connection, but the app CAN’T use it? (The server runs only this worker, so I know this connection belongs to it.)

And this is the moment when it’s useful to understand a little about the OSI model.

The servers had a valid TCP connection (layer 4) between them. But the application only runs once a day for a few minutes and then goes idle, so MySQL dropped the session (layer 5) while the TCP connection stayed established. Since the application was never made aware that its session was invalid, it kept trying to use it.
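
To put rough numbers on the idle-session drop described above, here is a small sketch. It assumes the drop comes from MySQL's `wait_timeout` setting left at its documented default of 28800 seconds (8 hours). The post does not say which setting or value was in play on this server, so treat both the cause and the number as assumptions; the function name is made up for illustration.

```python
# Assumption: MySQL's documented default for wait_timeout, in seconds (8 hours).
# The actual value on the server in this story is not known.
DEFAULT_WAIT_TIMEOUT = 28800


def session_survives_idle_gap(idle_seconds: float,
                              wait_timeout: float = DEFAULT_WAIT_TIMEOUT) -> bool:
    """Return True if a session idle for idle_seconds is still within wait_timeout."""
    return idle_seconds < wait_timeout


# A worker that runs once a day leaves its session idle for roughly 24 hours.
daily_gap = 24 * 60 * 60  # 86400 seconds
print(session_survives_idle_gap(daily_gap))  # False: 86400 is not below 28800

# An application that issues a query every five minutes stays well inside the limit.
print(session_survives_idle_gap(5 * 60))     # True: 300 is below 28800
```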

How do we solve that? We close our sessions after running the worker tasks and make sure a new one is opened when necessary. In Django this can be achieved with this code:

from django.db import connections
# Close every database connection; Django opens a new one on the next query.
for conn in connections.all():
    conn.close()
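
One way to make sure the cleanup always happens, even when a task raises, is to wrap the task in a context manager. This is a minimal sketch, not the code we actually deployed: `fresh_db_connections` and `run_daily_task` are hypothetical names, and the handler is passed in as a parameter so the helper works with anything that behaves like `django.db.connections`.

```python
from contextlib import contextmanager


@contextmanager
def fresh_db_connections(handler):
    """Close all connections in `handler` before and after the wrapped block.

    `handler` is expected to behave like `django.db.connections`: it has an
    `all()` method returning objects that each have a `close()` method.
    """
    def close_all():
        for conn in handler.all():
            conn.close()

    close_all()      # drop anything left over from a previous run
    try:
        yield
    finally:
        close_all()  # do not leave a session behind to go stale


# Usage inside a Django worker (run_daily_task is a hypothetical entry point):
#
# from django.db import connections
#
# with fresh_db_connections(connections):
#     run_daily_task()
```

Closing before the task as well as after it is a defensive choice: it covers the case where an earlier run died before reaching its own cleanup.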

Good! Problem solved.
Last question: why have we never seen this error in other Django applications?
Answer: probably because those applications do not stay idle for long periods. As a result, their sessions never get closed by MySQL. And that is a good thing to know when you’re coding a worker instead of a daemon.

