AZURE HEROES

Windows Failover Cluster 2012 R2 or 2016 Roles are not reachable after failover!

2/16/2019

This is an old issue that I had run into before, so I thought it was worth writing about.
Issue
Unable to reach/ping the cluster role VIP
Troubleshooting
The real case: one of our customers called me this morning and asked me to take a remote session ASAP to help him fix an issue with one of his SQL Server failover cluster instances (FCI). After failing the role over to the second node, he could no longer ping the FCI VIP, although the SQL cluster VIP was still reachable from both cluster nodes. The environment:
  • Windows cluster with two nodes, VM01 and VM02
  • Two SQL Server 2016 FCIs are installed
  • Each node has two NICs, one for the LAN and management network, and one for the heartbeat network
  • The cluster has three network resources: a cluster IP address and two SQL instance addresses, which float between the two nodes depending on which one is active
I then took a remote session and started working on the issue, following this action plan:
  • Check the Windows logs: nothing clear or related to the issue
  • Check the SQL logs: nothing related to the issue
  • Patch Windows and SQL to the latest update: still can't ping
  • Disable the Symantec EP firewall: still can't ping
  • Run Windows failover cluster validation: all tests passed
I started wondering what would happen if I failed the File Server role over to a different node. Was the issue affecting the SQL FCIs only?

So I asked the customer to fail the File Server role over to the second node, and the file server IP immediately became unreachable. That told me the issue was affecting every Windows failover cluster role at the customer site.


My colleague, a senior network engineer, started checking the network switches and firewalls. He noticed that the MAC address associated with the cluster IP addresses was not changing to the MAC address of node VM02 when we failed the role over from VM01 to VM02, which is what we would expect as a result of the failover operation.

Commands he used during his troubleshooting:
  • show ip arp 10.10.2.x - "SQL Cluster IP"
  • clear ip arp 10.10.2.x - "SQL Cluster IP"
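The before/after comparison described above can be sketched in a few lines of Python. This is only an illustration: the sample text imitates the usual column layout of Cisco IOS "show ip arp" output, and the IP address, MAC addresses, VLAN name and function names are all made up for the example.

```python
# Hypothetical "show ip arp" captures taken before and after a failover.
# The addresses below are illustrative only, not from the customer site.
BEFORE_FAILOVER = """\
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  10.10.2.50              12   0050.56a1.0001  ARPA   Vlan10
"""
AFTER_FAILOVER = """\
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  10.10.2.50              14   0050.56a1.0001  ARPA   Vlan10
"""


def mac_for_ip(show_ip_arp_output, ip):
    """Return the MAC the switch has cached for `ip`, or None if absent."""
    for line in show_ip_arp_output.splitlines():
        fields = line.split()
        # Data rows look like: Internet <ip> <age> <mac> <type> <interface>
        if len(fields) >= 4 and fields[0] == "Internet" and fields[1] == ip:
            return fields[3].lower()
    return None


def arp_entry_is_stale(before, after, ip):
    """True if the cached MAC did not change across the failover."""
    old_mac = mac_for_ip(before, ip)
    new_mac = mac_for_ip(after, ip)
    return old_mac is not None and old_mac == new_mac


if __name__ == "__main__":
    if arp_entry_is_stale(BEFORE_FAILOVER, AFTER_FAILOVER, "10.10.2.50"):
        print("Switch still maps the cluster IP to the old node's MAC")
```

If the entry is stale, clearing it on the switch only hides the symptom until the next failover. The real fix is on the node that takes over the IP.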
Resolution
It appears there is a registry entry in Windows that controls whether Gratuitous Address Resolution Protocol (GARP) requests are sent out when a failover occurs. By default this entry does not exist in Server 2012 R2 or 2016. I looked at the registry of node VM02: the entry was there, but it was set to 0, which means "don't send GARP". So I set the value to 3, then rebooted the node.

Once the node was accessible again, I carried out another failover test, and voilà! This time I saw only a single ping drop before all 3 cluster IP addresses were accessible again.

So to get this working, the Windows Server registry value "ArpRetryCount" needs to be added, or updated if it already exists, as follows:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters > ArpRetryCount (REG_DWORD)
 
Values:
0: don't send GARP
1: send GARP once only
2: send GARP twice
3: send GARP three times (the default value)
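To check a node quickly, the value can be read with Python's standard winreg module. The sketch below is a minimal example, not an official tool: the helper names are my own, and it only reads the value, it does not change anything. A missing entry is reported as the default, as described above.

```python
import sys

# Registry location and value name as given in the article.
KEY_PATH = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
VALUE_NAME = "ArpRetryCount"

MEANINGS = {
    0: "don't send GARP",
    1: "send GARP once only",
    2: "send GARP twice",
    3: "send GARP three times (default)",
}


def describe_arp_retry_count(value):
    """Translate an ArpRetryCount value (or None if absent) into text."""
    if value is None:
        return "not set: the default (3) applies"
    return MEANINGS.get(value, "unexpected value %r" % (value,))


def read_arp_retry_count():
    """Return the configured value, or None if it does not exist.

    Windows only: winreg is imported here so the rest of the module
    can still be loaded on other platforms.
    """
    import winreg

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return value
    except FileNotFoundError:
        return None


if __name__ == "__main__" and sys.platform == "win32":
    print(describe_arp_retry_count(read_arp_retry_count()))
```

Run it on each cluster node. A result of "don't send GARP" on any node matches the symptom described in this post.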


On the network side, make sure gratuitous ARP replies are enabled.
To enable this on Juniper EX and SRX platforms, use the following command:

set interfaces interface_name/number gratuitous-arp-reply
The interface can be a physical interface, logical interface, interface group, SVI or IRB.
To enable GARP on Cisco IOS, use the command ip gratuitous-arps.
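For readers who have not looked at one on the wire, here is a rough Python sketch of a gratuitous ARP frame, the packet the new owner of a cluster IP broadcasts so that switches and hosts refresh their ARP entries. The function name and the MAC/IP values are illustrative assumptions. The frame is built as an ARP request; some stacks send the reply form instead. The code only builds the bytes, it does not send anything.

```python
import socket
import struct


def build_gratuitous_arp(mac, ip):
    """Build an Ethernet frame carrying a gratuitous ARP request.

    mac: sender MAC such as "00:50:56:a1:00:02" (illustrative value)
    ip:  the IPv4 address being announced, e.g. the cluster VIP
    """
    sender_mac = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(sender_mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    sender_ip = socket.inet_aton(ip)

    # Ethernet header: broadcast destination, our source, EtherType ARP.
    eth_header = b"\xff" * 6 + sender_mac + struct.pack("!H", 0x0806)

    # ARP header: Ethernet (1), IPv4 (0x0800), lengths 6 and 4, opcode 1.
    arp_header = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)

    # What makes it "gratuitous": sender IP and target IP are the same.
    arp_body = sender_mac + sender_ip + b"\x00" * 6 + sender_ip

    return eth_header + arp_header + arp_body


if __name__ == "__main__":
    frame = build_gratuitous_arp("00:50:56:a1:00:02", "10.10.2.50")
    print(len(frame), frame.hex())
```

Because the sender MAC in the frame is that of the node now holding the IP, any device that accepts it replaces the stale mapping right away instead of waiting for the old entry to age out.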


Note: This is for troubleshooting purposes only. We usually disable GARP on the server side. In a VMware environment (virtual machines hosted on ESXi), disabling it is mandatory if you have Active-Active or Active-Passive sites, so that L2 packets are sent to the core switches.


Originally Posted @ Microsoft Wiki

    Author

    Mohammad Al Rousan is a Microsoft MVP (Azure), a Microsoft Certified Solutions Expert (MCSE) in Cloud Platform, Azure DevOps and Infrastructure, and an active community blogger and speaker. Al Rousan has over 8 years of professional experience in IT infrastructure and is very passionate about Microsoft technologies and products.
