Yesterday we were setting up a master/follower cluster.
The point: to know if the master is alive, the follower performs only a TCP connection.
Why is that a problem?
Due to an unsuccessful execution of an “poweroff” command, the master was able to receive a TCP connection but wasn’t able to answer real requests. Since the follower only checks if it can establish a TCP connection with the master, and it could, it never became the master as expected, making our cluster unavailable.
What did we learn from that?
If you want to test a service, make a real request. Don’t use “ping”, “telnet” or something like that. Do a REAL request.
IMPORTANT: make sure your requests are real but also light. You don’t want your monitors/tests to overload your service.
Thanks for reading 🙂