A TCP connection is (usually) not a good test

Yesterday we were setting up a master/follower cluster.

The catch: to decide whether the master is alive, the follower only opens a TCP connection to it.

Why is that a problem?

Due to an unsuccessful execution of a “poweroff” command, the master was still accepting TCP connections but couldn’t answer real requests. Since the follower only checks whether it can establish a TCP connection with the master, and it could, it never promoted itself to master as expected, leaving our cluster unavailable.
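This failure is easy to reproduce locally. The sketch below is a simulation, not our actual setup: it assumes the half-dead master behaves like a socket that is still listening but whose process never reads or replies. The kernel completes the TCP handshake on the application’s behalf, so the connect succeeds, while a request sent over that connection never gets an answer. The `probe_hung_server` name and the `PING` payload are arbitrary placeholders.

```python
import socket


def probe_hung_server(timeout: float = 0.5) -> tuple[bool, bool]:
    """Probe a simulated hung server two ways.

    Returns (tcp_connected, got_answer).
    """
    # Listening socket whose "process" never calls accept() or recv():
    # the port is open, but nothing will ever answer.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen()
    port = server.getsockname()[1]
    try:
        # Check 1: plain TCP connect. The kernel completes the handshake itself.
        with socket.create_connection(("127.0.0.1", port), timeout=timeout) as client:
            tcp_connected = True
            # Check 2: a "real" request. It gets buffered, but nobody replies.
            client.sendall(b"PING\r\n")
            try:
                got_answer = client.recv(1024) != b""
            except socket.timeout:
                got_answer = False
    finally:
        server.close()
    return tcp_connected, got_answer


print(probe_hung_server())  # (True, False)
```

The TCP check passes, the request check does not: exactly the gap our follower fell into.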

What did we learn from that?

If you want to test a service, make a real request. Don’t rely on “ping”, “telnet” or anything like that. Make a REAL request.
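A minimal sketch of a check that makes a real request, assuming the master speaks HTTP. The `http_check` name and the URL you pass to it are hypothetical; ideally the URL hits an endpoint that goes through the same code path as real traffic. For a database, the equivalent would be issuing an actual query instead of just connecting.

```python
import http.client
import urllib.request


def http_check(url: str, timeout: float = 2.0) -> bool:
    """Return True only if the service answers a real HTTP request with 2xx.

    A bare TCP connect succeeds as long as the port is open; this check
    also requires the application to read the request and send a reply.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # OSError covers refused connections, timeouts and URLError/HTTPError
        # (non-2xx answers); HTTPException covers malformed responses.
        return False
```

If the port is open but the application is stuck, `urlopen` times out and the function returns `False`, which is what should trigger the failover.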

IMPORTANT: make sure your requests are real but also light. You don’t want your monitors/tests to overload your service.
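One way to keep checks light is to bound how often they run, on top of giving each one a short timeout. A minimal sketch, assuming the check is any zero-argument function returning `True`/`False` (for example, an HTTP request with a short timeout); `run_monitor`, `interval` and `rounds` are hypothetical names.

```python
import time
from typing import Callable


def run_monitor(check: Callable[[], bool], interval: float, rounds: int) -> list[bool]:
    """Run `check` `rounds` times, starting at most one probe per `interval` seconds."""
    results = []
    for i in range(rounds):
        started = time.monotonic()
        results.append(check())
        if i < rounds - 1:
            # Sleep for the rest of the interval, so the probe rate never
            # exceeds one per `interval` seconds, however fast `check` returns.
            elapsed = time.monotonic() - started
            time.sleep(max(0.0, interval - elapsed))
    return results
```

With `interval=5.0`, a single monitor sends at most one request every five seconds, so the load it adds stays predictable.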

Thanks for reading 🙂