I came across an issue with a Red Hat High Availability install that had previously worked in a development environment. It occurred when starting the cluster with “service cman start”, which gave the following error:
Starting cluster:
   Checking if cluster has been disabled at boot... [ OK ]
   Checking Network Manager... [ OK ]
   Global setup... [ OK ]
   Loading kernel modules... [ OK ]
   Mounting configfs... [ OK ]
   Starting cman... [ OK ]
   Starting qdiskd... [ OK ]
   Waiting for quorum... Timed-out waiting for cluster [FAILED]
As we had been having issues creating a suitable shared disk in VMware ESXi 5.5 to act as a quorum disk, our initial thought was that the nodes were having trouble writing to the qdisk, especially as the first node came up fine. The issue was tracked down to the VMware NSX firewall in which the VMs were running. Even though an explicit “any” rule had been added for the hosts (to allow unrestricted IP traffic between them), it turned out that this rule did not cover multicast packets, which were being blocked. This prevented the cluster from starting on the other nodes.
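A quick way to confirm this kind of problem is to send multicast datagrams from one node and listen for them on another. The script below is a minimal sketch, not something used in the original troubleshooting: the script name mcast_probe.py and the group 239.192.0.1 are hypothetical placeholders, so substitute the multicast address cman actually reports on your nodes (for example in the output of cman_tool status). Port 5405 is one of the two UDP ports from the firewall rule described below.

```python
import socket
import struct
import sys

# Hypothetical placeholders: replace with the multicast address cman reports
# on your nodes and the UDP port you want to test through the firewall.
DEFAULT_GROUP = "239.192.0.1"
DEFAULT_PORT = 5405


def build_mreq(group: str, iface_ip: str = "0.0.0.0") -> bytes:
    """Pack a struct ip_mreq: multicast group, then local interface (0.0.0.0 = any)."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_ip))


def receive(group: str, port: int, timeout: float = 30.0) -> None:
    """Join the group and print every probe that arrives until the timeout expires."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group))
    sock.settimeout(timeout)
    try:
        while True:
            data, (src, _) = sock.recvfrom(1024)
            print(f"received {data!r} from {src}")
    except socket.timeout:
        print("no more probes before timeout")
    finally:
        sock.close()


def send(group: str, port: int, count: int = 5) -> None:
    """Send a few probes to the group; TTL 1 keeps them on the local segment."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    try:
        for i in range(count):
            sock.sendto(f"probe {i}".encode(), (group, port))
    finally:
        sock.close()


def main(argv: list) -> int:
    """Dispatch to send or recv; print usage and return 2 on bad arguments."""
    if not argv or argv[0] not in ("send", "recv"):
        print("usage: mcast_probe.py send|recv [group] [port]")
        return 2
    group = argv[1] if len(argv) > 1 else DEFAULT_GROUP
    port = int(argv[2]) if len(argv) > 2 else DEFAULT_PORT
    if argv[0] == "recv":
        receive(group, port)
    else:
        send(group, port)
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

Start "python3 mcast_probe.py recv" on one node, then run "python3 mcast_probe.py send" on another. If the receiver prints nothing while ordinary unicast between the nodes works, multicast is probably being dropped on the path. Stopping cman first is advisable, since corosync may already hold the port. A dedicated tool such as omping performs the same kind of check more thoroughly.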
A rule in NSX allowing 126.96.36.199/14 on UDP ports 5404 and 5405 resolved the communication issue, and the cluster then started successfully.
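When writing a rule like this, it can help to double-check that it really covers the multicast group the cluster uses and both UDP ports. The helper below is a hypothetical sketch using Python's standard ipaddress module. The ports 5404 and 5405 come from the rule above, while 224.0.0.0/4 (the whole IPv4 multicast block) and the group 239.192.0.1 are illustrative assumptions only.

```python
import ipaddress
from typing import Iterable

# The two UDP ports the NSX rule had to open for the cluster.
REQUIRED_PORTS = (5404, 5405)


def rule_allows(rule_cidr: str, rule_ports: Iterable[int], group: str,
                required_ports: Iterable[int] = REQUIRED_PORTS) -> bool:
    """Return True if a rule (CIDR plus UDP ports) would pass the cluster's multicast traffic."""
    network = ipaddress.ip_network(rule_cidr, strict=False)
    address = ipaddress.ip_address(group)
    if not address.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    # The group must fall inside the rule's range AND every required port must be open.
    return address in network and set(required_ports) <= set(rule_ports)
```

For example, rule_allows("224.0.0.0/4", [5404, 5405], "239.192.0.1") returns True, while dropping 5404 from the port list makes it return False, mirroring a rule that opens only one of the two ports.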