DNS Failover with ICMP Ping

ICMP Ping is one of the DNS Failover checks our customers use most. It usually works smoothly, but sometimes customers see down events and ask why, since an ordinary ping shows the IP as online. The cause is usually a network problem that only appears at certain packet sizes. Because such problems vary with packet size, we run 3 different ping checks with 12 packets in total to verify that your network is fully operational.

Ping command

For the DNS Failover checks we use the Linux ping command with the following parameters:

-c - to define the number of packets we will use
-s - to define the packet payload size
and the IP we are checking
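
The value passed to -s is the payload only; ping adds the 8-byte ICMP header itself. The following is a minimal sketch of how the three payload values used in this article can be derived from the packet sizes. The helper name icmp_payload is hypothetical and the IP is the example address from this article, not part of the actual service.

```shell
# icmp_payload SIZE: payload to pass to "ping -s" for a given ICMP packet size.
# The ICMP echo header takes 8 bytes, so the payload is SIZE - 8.
# (Hypothetical helper, for illustration only.)
icmp_payload() {
  echo $(( $1 - 8 ))
}

# Print the ping command for each of the three packet sizes.
for size in 64 512 1024; do
  echo "ping -c 4 -s $(icmp_payload "$size") 192.168.0.1"
done
```

The loop only prints the commands, so they can be reviewed before running them against a real IP.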

64 bytes check

The first check uses a 64-byte packet: 8 bytes are the ICMP header, so we pass 56 bytes as the payload. Here is an example:

# ping -c 4 -s 56 192.168.0.1
PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data.
64 bytes from 192.168.0.1: icmp_seq=1 ttl=48 time=1 ms
64 bytes from 192.168.0.1: icmp_seq=2 ttl=48 time=1 ms
64 bytes from 192.168.0.1: icmp_seq=3 ttl=48 time=1 ms
64 bytes from 192.168.0.1: icmp_seq=4 ttl=48 time=1 ms

--- 192.168.0.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 1.010/1.021/1.031/0.004 ms

The result shows no packet loss with small packets, so everything is okay. If you do see packet loss here, the network issues you are experiencing are usually significant.

512 bytes check

The second check uses a 512-byte packet. When clients have problems with larger packets, the packet loss usually starts here, with 1 or 2 packets missed. Here is an example with 25% packet loss (the payload is 504 bytes, because 8 bytes are for the header):

# ping -c 4 -s 504 192.168.0.1
PING 192.168.0.1 (192.168.0.1) 504(532) bytes of data.
512 bytes from 192.168.0.1: icmp_seq=1 ttl=48 time=1 ms
512 bytes from 192.168.0.1: icmp_seq=2 ttl=48 time=1 ms
512 bytes from 192.168.0.1: icmp_seq=4 ttl=48 time=1 ms

--- 192.168.0.1 ping statistics ---
4 packets transmitted, 3 received, 25% packet loss, time 3000ms
rtt min/avg/max/mdev = 1.010/1.021/1.031/0.004 ms

1024 bytes check

The last check uses a 1024-byte packet. When you are experiencing problems with larger packets, the packet loss here is usually 50% or more. Here is an example with 50% packet loss (the payload is 1016 bytes, because 8 bytes are for the header):

# ping -c 4 -s 1016 192.168.0.1
PING 192.168.0.1 (192.168.0.1) 1016(1044) bytes of data.
1024 bytes from 192.168.0.1: icmp_seq=1 ttl=48 time=1 ms
1024 bytes from 192.168.0.1: icmp_seq=2 ttl=48 time=1 ms

--- 192.168.0.1 ping statistics ---
4 packets transmitted, 2 received, 50% packet loss, time 3000ms
rtt min/avg/max/mdev = 1.010/1.021/1.031/0.004 ms

Down event

At the end of the process we have the results of 3 checks with 4 packets each, 12 packets in total. Depending on your configuration, the down event is triggered at one of the following thresholds:

15% packet loss - this is the most sensitive check; a down event is triggered if 2 or more packets are lost (2/12)
25% packet loss - a down event is triggered if 3 or more packets are lost (3/12)
50% packet loss - a down event is triggered if 6 or more packets are lost (6/12)
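
The threshold arithmetic can be sketched as follows. This is an illustration only, not the service's actual implementation: the function names loss_percent and is_down are hypothetical, and rounding the percentage down is an assumption. The sample input is the three example checks from this article, which received 4, 3 and 2 packets.

```shell
# loss_percent SENT RECEIVED: prints the packet loss as a whole-number percent.
# Integer division rounds down; how the real service rounds is an assumption.
loss_percent() {
  echo $(( ($1 - $2) * 100 / $1 ))
}

# is_down SENT RECEIVED THRESHOLD: succeeds when the loss reaches the threshold.
is_down() {
  [ "$(loss_percent "$1" "$2")" -ge "$3" ]
}

# The three example checks in this article received 4, 3 and 2 packets.
received=$(( 4 + 3 + 2 ))
for threshold in 15 25 50; do
  if is_down 12 "$received" "$threshold"; then
    echo "threshold ${threshold}%: down"
  else
    echo "threshold ${threshold}%: up"
  fi
done
```

With 3 of 12 packets lost (25%), this reports a down event for the 15% and 25% thresholds and no down event for the 50% threshold.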

Debugging

If a down event occurs for your IP, first check it with the regular ping command. If you see any packet loss, contact your network provider and give them the details.
If you don't see any issues with the regular ping command, check the IP address with the different packet sizes from the examples above. If you see any packet loss, contact your network provider and give them the details.
If you are unable to reproduce the network issue yourself, please contact our technical support and we will provide you with traceroutes and details from the problematic nodes. The problem may be limited to specific routes.
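
To repeat the second debugging step quickly, the three packet sizes can be checked in one go. The following is a minimal sketch, assuming the Linux ping summary line has the format shown in the examples above; parse_loss and check_sizes are hypothetical helper names.

```shell
# parse_loss: reads ping output on stdin and prints the packet loss percentage
# taken from the summary line ("... 25% packet loss ...").
parse_loss() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# check_sizes IP: runs the three checks (payloads 56, 504 and 1016 bytes,
# 4 packets each) and prints the packet loss reported for each one.
check_sizes() {
  for payload in 56 504 1016; do
    loss=$(ping -c 4 -s "$payload" "$1" | parse_loss)
    echo "payload ${payload} bytes: ${loss:-unknown}% packet loss"
  done
}
```

Run it as check_sizes 192.168.0.1, replacing the address with the IP you are monitoring. If no summary line is found, for example when ping cannot run at all, the loss is reported as unknown.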

Last modified: 2021-07-12
