Understanding SQL Server Clustering IP Addresses?


#1

Been reading some stuff and I think I know the basics, but I still need to confirm a couple of things about the ‘data flow’ of communication between clients and a cluster.

Here is an example I made up, followed by my questions. I’m referring to a SQL Server cluster here.

Physical Node IPs

Node A - 10.23.23.25

Node B - 10.23.23.26

Cluster Name

WinCLN102 - 10.23.23.27

SQL Server Instance running on cluster

PDB101\OLTP - 10.23.23.28

Private Network for clustered nodes (A and B heartbeat) to communicate.

Node A - 10.23.24.30

Node B - 10.23.24.31

1.) What is the purpose of each IP address? (Especially, why does the cluster name have its own IP?)

2.) Do clients connect to the SQL Server instance IP or the cluster IP? Does traffic go through the physical node IP to reach them? Essentially, what route do requests take to reach cluster resources?

Thank you


#2

I’m not 100% sure on SQL clustering, but here’s generally how things work, in my experience.

Your servers need 3 addresses and (typically) two interfaces, because they play tricks with TCP/IP and Ethernet to make clustering work.

In a server cluster, you have 3 types of addresses:

  • Individual Server addresses (your x.x.23.25, x.x.23.26 addresses)
  • Shared address (x.x.23.27)
  • Private addresses (the x.x.24.x addresses)
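To make the three groups concrete, here is a rough Python sketch that sorts the addresses from the question by network. It assumes both networks use a /24 mask, which the question never states, and the names `PUBLIC_NET`, `PRIVATE_NET`, and `classify` are just made up for illustration.

```python
import ipaddress

# Assumption: both networks use a /24 mask; the thread never gives the mask.
PUBLIC_NET = ipaddress.ip_network("10.23.23.0/24")
PRIVATE_NET = ipaddress.ip_network("10.23.24.0/24")

# Addresses taken from the question.
ADDRESSES = {
    "Node A": "10.23.23.25",
    "Node B": "10.23.23.26",
    "WinCLN102 (cluster name)": "10.23.23.27",
    "PDB101\\OLTP (SQL instance)": "10.23.23.28",
    "Node A heartbeat": "10.23.24.30",
    "Node B heartbeat": "10.23.24.31",
}


def classify(addr: str) -> str:
    """Say whether an address sits on the client-facing or the private network."""
    ip = ipaddress.ip_address(addr)
    if ip in PUBLIC_NET:
        return "public"
    if ip in PRIVATE_NET:
        return "private"
    return "unknown"


for name, addr in ADDRESSES.items():
    print(f"{name:30} {addr:14} {classify(addr)}")
```

Everything on 10.23.23.x is reachable by clients; only the two heartbeat addresses land on the private side.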

The first group is the individual server addresses. You use these when connecting to your servers from other computers. File sharing, Remote Desktop, even directly accessing the SQL instance on that server would be done through the individual 10.23.23.x addresses.

The shared address is the one both servers have in common: 10.23.23.27. A shared IP address is actually a violation of the TCP/IP standard, but Microsoft uses some tricks to allow the address to be shared. I believe SQL server does this by sending out ARP messages to cause remote hosts to switch between hosts.

An ARP message is the mechanism that IP devices on an Ethernet network use to connect an IP address to a physical (MAC) address. Every networked computer has at least one physical address, which we call a MAC address.

The MAC address is a number burned into your network card’s ROM which uniquely identifies that network interface. (This has nothing to do with Apple computers. Please don’t ever call an Apple Mac a “MAC”, or a network engineer loses his wings.)

Whenever a computer requests a connection to an IP address on its local network, say your shared IP of 10.23.23.27, it sends an ARP request. The computer that owns 10.23.23.27 answers with an ARP response. This response includes the IP address and the MAC address of the network port.
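If it helps, the request/response exchange can be modeled with a toy Python sketch. This is only a simulation of the idea, not real packet handling: the `Host` class, the `arp_resolve` helper, and the MAC addresses are all invented for the example, and I'm assuming Node A currently holds the shared address.

```python
# Toy model of ARP on one Ethernet segment. The MAC addresses are made up.
class Host:
    def __init__(self, name, mac, ips):
        self.name = name
        self.mac = mac
        self.ips = set(ips)

    def answer_arp(self, target_ip):
        """Return this host's MAC if it owns target_ip, otherwise stay silent."""
        return self.mac if target_ip in self.ips else None


def arp_resolve(segment, target_ip):
    """Ask every host on the segment; the host that owns the IP answers."""
    for host in segment:
        mac = host.answer_arp(target_ip)
        if mac is not None:
            return mac
    return None  # nobody on the segment owns this address


# Assumption: Node A currently holds the shared address 10.23.23.27.
node_a = Host("Node A", "00:00:5e:00:53:0a", ["10.23.23.25", "10.23.23.27"])
node_b = Host("Node B", "00:00:5e:00:53:0b", ["10.23.23.26"])
segment = [node_a, node_b]

# The client remembers the answer in its ARP cache and sends frames to that MAC.
client_arp_cache = {"10.23.23.27": arp_resolve(segment, "10.23.23.27")}
print(client_arp_cache)
```

The client only ever learns a MAC address, so whichever node answers for 10.23.23.27 is the one that receives the traffic.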

The trick here is to bounce the network connection back and forth between cluster nodes… one way to do this is to send out different ARP responses when a new client sends a request. So client F gets server A’s MAC address and client G gets server B’s MAC address.
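Here is a small Python sketch of that idea, where each new client is handed a different node's MAC for the same shared IP. The round-robin order, the `SharedIpResponder` class, and the MAC values are my own assumptions for illustration; I don't know the exact policy any particular clustering product uses.

```python
from itertools import cycle

# Made-up MAC addresses for the two nodes.
NODE_MACS = {"A": "00:00:5e:00:53:0a", "B": "00:00:5e:00:53:0b"}


class SharedIpResponder:
    """Answers ARP for one shared IP, giving each new client the next node's MAC."""

    def __init__(self, shared_ip, node_macs):
        self.shared_ip = shared_ip
        self._node_macs = node_macs
        self._next_node = cycle(sorted(node_macs))  # assumption: simple round robin
        self._assigned = {}  # client -> node, so a client keeps the same node

    def answer(self, client, target_ip):
        if target_ip != self.shared_ip:
            return None  # not the shared address, so no answer from here
        if client not in self._assigned:
            self._assigned[client] = next(self._next_node)
        return self._node_macs[self._assigned[client]]


responder = SharedIpResponder("10.23.23.27", NODE_MACS)
print("client F ->", responder.answer("client F", "10.23.23.27"))
print("client G ->", responder.answer("client G", "10.23.23.27"))
```

Remembering the assignment per client matters: if the same client got a different MAC on every request, its open connections would land on a node that knows nothing about them.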

Or the servers can set one of their ports to the same MAC address. In that case, the servers just ignore messages meant for the other server. This requires that the network switch be explicitly configured to allow this… which means you need more expensive, managed switches, but that’s another conversation.

And this leads us to the private network ports - the ones on the 10.23.24 subnet.

These ports are used as a sort of back channel, to let the servers talk to each other and decide which server responds to incoming requests. This is how the servers know to respond to specific requests - they’re constantly talking in the background and coordinating which server responds to which clients.
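As a very simplified picture of that background chatter, the Python sketch below tracks when each node was last heard from over the private network and picks which node should answer. The five-second timeout, the "first alive node by name wins" rule, and the function names are all hypothetical; real clustering software uses its own rules.

```python
HEARTBEAT_TIMEOUT = 5.0  # seconds; hypothetical value


def alive_nodes(last_seen, now, timeout=HEARTBEAT_TIMEOUT):
    """Nodes whose most recent heartbeat is no older than the timeout."""
    return sorted(name for name, t in last_seen.items() if now - t <= timeout)


def pick_owner(last_seen, now):
    """Assumed rule: the first alive node (by name) answers for the shared IP."""
    alive = alive_nodes(last_seen, now)
    return alive[0] if alive else None


# Time of the last heartbeat received from each node over the 10.23.24.x network.
last_seen = {"Node A": 100.0, "Node B": 101.0}

print(pick_owner(last_seen, now=102.0))  # both nodes were heard from recently
print(pick_owner(last_seen, now=105.5))  # Node A has been silent for too long
```

Once a node stops sending heartbeats, the other node sees that it is gone and can start answering for the shared address itself.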

And this is all transparent to the clients, because the servers manipulate the TCP/IP address resolution systems (ARP, MAC, and DNS) to make things work. It’s pretty smart, although it’s also frustrating when things break or when servers don’t respond properly due to network switches or routers not passing packets.

That’s just kind of a general overview of how the concept works; the specifics may be different with different clustering systems. And it’s going to be very different if you have an appliance involved or if you use things like round robin DNS or other outside tools to handle load balancing.