Apple OS X IP Failover Daemon

OS X Lion (10.7) IP Failover Workaround

Apple has removed IP Failover from OS X Lion Server (10.7.2).

The port described here doesn't work as expected: heartbeatd runs fine, but failoverd crashes the kernel once a takeover is initiated. After that the system can't boot anymore, and only a reinstall fixes the problem. Strange; maybe someone knows how to fix this issue? If you need the Failover Daemon, e.g. for AFP, you can port it from an OS X 10.6.x Server installation (a short HowTo follows). Copy the following files from an OS X Snow Leopard Server (10.6) to a Lion (10.7) installation; a Lion Server isn't necessary.

scp /usr/sbin/heartbeatd root@lion.server:/usr/sbin/.
scp /usr/sbin/failoverd root@lion.server:/usr/sbin/.
scp -r /System/Library/PrivateFrameworks/CoreServer.framework root@lion.server:/System/Library/PrivateFrameworks/.
scp /usr/libexec/NotifyFailover root@lion.server:/usr/libexec/.
scp /usr/libexec/ProcessFailover root@lion.server:/usr/libexec/.
scp -r /System/Library/StartupItems/IPFailover root@lion.server:/System/Library/StartupItems/.
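
The same copy step can be scripted so that every item lands in the directory it came from on the 10.6 machine. This is a minimal sketch, assuming it runs on the Snow Leopard server with root SSH access to the Lion host; the function names and the `DRY_RUN` switch are hypothetical. It swaps `scp -r` for `rsync -a`, because `scp -r` follows symlinks and would copy the links inside CoreServer.framework as real files, while `rsync -a` keeps them as links.

```shell
#!/bin/bash
# Hypothetical helper: copy the IP Failover pieces from a 10.6 server
# to the same locations on a 10.7 machine. Run it on the 10.6 server.

FAILOVER_FILES=(
	/usr/sbin/heartbeatd
	/usr/sbin/failoverd
	/System/Library/PrivateFrameworks/CoreServer.framework
	/usr/libexec/NotifyFailover
	/usr/libexec/ProcessFailover
	/System/Library/StartupItems/IPFailover
)

# Remote destination directory: the parent directory of the source path.
remote_dir_for() {
	local dir
	dir=$(dirname "$1")
	echo "$dir/"
}

copy_to_lion() {
	local host=$1 src
	for src in "${FAILOVER_FILES[@]}"; do
		# rsync -a keeps the symlinks inside the .framework bundle intact
		if [ -n "$DRY_RUN" ]; then
			echo rsync -a "$src" "root@$host:$(remote_dir_for "$src")"
		else
			rsync -a "$src" "root@$host:$(remote_dir_for "$src")" || return 1
		fi
	done
}

# Only copy when a host name is given, e.g.: ./copy_failover.sh lion.server
if [ $# -gt 0 ]; then
	copy_to_lion "$1"
fi
```

Running it once with `DRY_RUN=1` prints the six rsync commands without copying anything, which is a cheap way to check the destinations first.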

Hopefully Apple will come up with a new failover solution some day. OD has one built in, but I haven't found out yet how the new AFP daemon works (the AppleFileServer manpage is the worst I've ever seen!). The SAN features integrated in Lion would be another way to build a high availability solution for AFP.

OS X IP Failover Advanced Test

The IP Failover feature is a daemon process integrated in OS X Server. It's mostly used for high availability server setups, e.g. if you use OS X Server as an Open Directory (OD) server for client home folders stored on the server, or for NetBoot clients.

OD itself doesn't need the IP Failover daemon: if you have at least two OD servers running and one goes offline, the clients automatically find the other OD servers, as this is a built-in OD feature. AFP and other services, e.g. a type server or SWUPD (the software update daemon), are different: the clients won't find the new server once the master's IP address or URL is gone.
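
So on takeover the replica has to answer on the master's IP address. On OS X one way to do that is to add the address as an alias on one interface. A minimal sketch, assuming BSD `ifconfig` as shipped with OS X; the interface `en0`, the example address and the `DRY_RUN` switch are hypothetical, and failoverd's own implementation may differ. Other hosts may also keep the master's MAC address in their ARP caches for a while after the move.

```shell
#!/bin/bash
# Hypothetical sketch: add or remove a virtual (failover) IP as an alias
# on one interface, using BSD ifconfig as shipped with OS X.
# Interface and addresses are examples, not taken from a real setup.

IFACE=${IFACE:-en0}

run() {
	# With DRY_RUN set, only print the command instead of executing it.
	if [ -n "$DRY_RUN" ]; then
		echo "$*"
	else
		"$@"
	fi
}

# Take over the virtual IP (host netmask, so it doesn't clash with the
# primary address in the same subnet).
acquire_vip() {
	run ifconfig "$IFACE" alias "$1" netmask 255.255.255.255
}

# Give the virtual IP back.
release_vip() {
	run ifconfig "$IFACE" -alias "$1"
}
```

As root, `acquire_vip 10.0.0.10` would be run on the replica during the acquire, and `release_vip 10.0.0.10` when the master takes the address back.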

Over the years I have done a lot of IP Failover setups for large companies. The main setups and how they work are well documented by Apple and by other resources on the web.
Here you can find a nearly perfect “IP Failover Test Script” that checks a little more before the takeover starts. The main problems are restarting the master server (without shutting down the replica) and releasing the IP after an acquire without leaving the home directory data inconsistent. The so-called STOMITH (Shoot The Other Machine In The Head) procedure is no longer necessary (who invented this crude solution?).

If you'd like to use them commercially, please contact me.

Test

This IP Failover setup additionally uses an rsync script to update the home directory files. You can find this script in the Linux section.

#!/bin/bash
# (c) 2011 www.schellworth.de v.0.2.1
#
# This script checks whether the FileSync is clean and, additionally, whether the master is really down after <WAIT> seconds.
# 
# WARNING: If the FileSync is not clean (e.g. after a previous takeover, when the files haven't been synced back), a takeover won't proceed!
# 
 
 
STATE=$1					#acquire or release mode
IP=$2						#on hook IP
WAIT=240					#sleep X seconds (240 = 4 minutes), to check if the Server is really down (or just reboots)
HOST_IP=192.168.0.1 				#2nd interface to check system health
SYNC_STATE="/Library/Scripts/rsyncbackup/rsyncbackup.com"	#must be the same link the PreAcq script switches
 
 
 
 
logger "IPFailover (Test): Testscript starts to $1 IP: $2"
 
target=$( readlink "$SYNC_STATE" )		#Get the linked target
target_file=${target##*/}
 
if [ "$target_file" = "activesync.sh" ]		#Check if the FileSync is active
then
	/sbin/ping -q -c1 "$HOST_IP" &> /dev/null		#Check the 2nd IP to see if the master is really down
	if [ $? -gt 0 ] && [ "$STATE" = "acquire" ]
	then						#ACQUIRE
		logger "IPFailover (Test): $HOST_IP isn't reachable! Testing again in $WAIT seconds ..."
		sleep $WAIT				#wait a few minutes
		logger "IPFailover (Test): ... ping $HOST_IP"
		/sbin/ping -q -c1 "$HOST_IP" &> /dev/null	#Check the 2nd IP again
		if [ $? -gt 0 ]
		then
			logger "IPFailover (Test): Master is DOWN! Acquiring $IP will proceed. (0)"
			exit 0
		else
			logger "IPFailover (Test): $HOST_IP is UP. Canceling acquisition of $IP (5)."
			exit 5
		fi
	else
		if [ "$STATE" = "release" ]		#RELEASE
		then
			logger "IPFailover (Test): Master is UP again. Releasing $IP will proceed."
			exit 0
		else
			logger "IPFailover (Test): $HOST_IP is UP! Canceling acquisition of $IP (10)."
			echo "started from command line?"
			echo "usage: Test ['acquire' | 'release'] ['IP']"
			exit 10
		fi
	fi
else
	logger "IPFailover (Test): FileSync is disabled. Please reverse the FileSync process!"
	logger "IPFailover (Test): $STATE $IP canceled! (50)"
 
	SUBJECT="WARNING!!! The IP Failover $IP failed!" 
	TO="support@it.org" 
	BODY="Please check the server logs."
	echo "$BODY" | mail -s "$SUBJECT" "$TO"
 
	exit 50
fi
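
According to its log messages, the Test script lets the acquire or release proceed with exit status 0 and cancels it with any other status. When running it by hand, a small helper can translate the codes used above. This is a hypothetical addition, not part of the original setup; keep in mind that a manual `acquire` test may sleep for `WAIT` (240) seconds.

```shell
#!/bin/bash
# Hypothetical helper: explain the exit status of the Test script.
# The meanings follow the log messages in the script above.
explain_test_status() {
	case "$1" in
		0)  echo "proceed" ;;
		5)  echo "master answered on second check, acquire canceled" ;;
		10) echo "master reachable or bad arguments, acquire canceled" ;;
		50) echo "FileSync disabled, takeover canceled" ;;
		*)  echo "unknown status $1" ;;
	esac
}

# Example (as root; script location is an assumption):
#   ./Test acquire 10.0.0.10; explain_test_status $?
```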

PreAcq

#! /bin/bash
# (c) www.schellworth.de
 
logger "IPFailover: Starting PreAcq script"
 
# Disable rsync Process
ln -sf /Library/Scripts/rsyncbackup/inactivsync.sh /Library/Scripts/rsyncbackup/rsyncbackup.com
logger "IPFailover: RSYNC disabled"
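
The Test script refuses a takeover while this link doesn't point to activesync.sh, and its warning asks you to reverse the FileSync process. A minimal sketch of that reversal, assuming activesync.sh sits in the same directory as inactivsync.sh (only the latter's path appears here); the function names and the `SYNC_DIR` override are hypothetical. Switch FileSync back on only after the home directories are consistent again.

```shell
#!/bin/bash
# Hypothetical helper: switch the rsyncbackup.com link between the active
# and the inactive sync script (on = the reverse of what PreAcq does).
# Assumes activesync.sh sits in the same directory as inactivsync.sh.

SYNC_DIR=${SYNC_DIR:-/Library/Scripts/rsyncbackup}

set_filesync() {
	local mode=$1 target
	case "$mode" in
		on)  target=activesync.sh ;;
		off) target=inactivsync.sh ;;
		*)   echo "usage: set_filesync on|off" >&2; return 2 ;;
	esac
	# -n: replace the link itself instead of following it (BSD and GNU ln)
	ln -sfn "$SYNC_DIR/$target" "$SYNC_DIR/rsyncbackup.com" || return 1
	logger "IPFailover: FileSync $mode" 2>/dev/null || true
}

# Print the name of the script the link currently points to.
filesync_state() {
	local target
	target=$(readlink "$SYNC_DIR/rsyncbackup.com")
	echo "${target##*/}"
}
```

`filesync_state` prints the same name the Test script compares against, so it shows at a glance whether a takeover would be allowed.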

PostAcq

#! /bin/bash
# (c) www.schellworth.de
 
logger "IPFailover: Starting PostAcq script"
 
# Starting AFP-Daemon
serveradmin start afp
logger "IPFailover: AFP started"
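
`serveradmin start afp` may return before the service is fully up. If PostAcq should only report success once AFP is really running, it could poll `serveradmin status afp`. A minimal sketch, assuming the status output contains a line like `afp:state = "RUNNING"`; the helper names and the 2-second poll interval are hypothetical.

```shell
#!/bin/bash
# Hypothetical addition for PostAcq: wait until AFP reports RUNNING.
# Assumes `serveradmin status afp` prints a line like: afp:state = "RUNNING"

# Extract the quoted state value from serveradmin output on stdin.
parse_afp_state() {
	sed -n 's/^afp:state = "\(.*\)"$/\1/p'
}

# Poll up to $1 times (default 30), 2 seconds apart.
wait_for_afp() {
	local tries=${1:-30} state i
	for ((i = 0; i < tries; i++)); do
		state=$(serveradmin status afp | parse_afp_state)
		if [ "$state" = "RUNNING" ]; then
			logger "IPFailover: AFP is running"
			return 0
		fi
		sleep 2
	done
	logger "IPFailover: AFP not running after $((tries * 2)) seconds"
	return 1
}
```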

PreRel

#! /bin/bash
# (c) www.schellworth.de
 
logger "IPFailover: Starting PreRel script"
 
# Stop afp-Daemon 
serveradmin stop afp
logger "IPFailover: AFP stopped"

PostRel

#! /bin/bash
# (c) www.schellworth.de
 
logger "IPFailover: Starting PostRel script"

Resources

Discussion

JT, 2011/12/23 09:34

I wish they would do something. Know of any third-party alternatives? Much like you, I have created my own scripts, but using pings instead of the heartbeatd protocol on both sides. I then wrote a custom failover script.
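
A ping-based monitor like the one described here can be sketched in a few lines of bash. This is a hypothetical illustration, not JT's script: it only triggers a takeover after several consecutive missed pings, so a single lost packet doesn't cause a failover. `PEER`, `FAIL_LIMIT`, `INTERVAL` and `take_over` are placeholders; checks like those in the Test script above are still needed before really acquiring the IP.

```shell
#!/bin/bash
# Hypothetical ping-based peer monitor (not JT's actual script).
# take_over only runs after FAIL_LIMIT consecutive missed pings,
# so a single lost packet or a short network hiccup does not trigger it.

PEER=${PEER:-192.168.0.1}	# example address
FAIL_LIMIT=${FAIL_LIMIT:-5}
INTERVAL=${INTERVAL:-10}	# seconds between checks

check_peer() {
	/sbin/ping -q -c1 "$PEER" > /dev/null 2>&1
}

take_over() {
	logger "monitor: $PEER missed $FAIL_LIMIT pings, taking over"
	# placeholder: call your own acquire procedure here
}

# Watch the peer; returns after take_over has run once.
monitor() {
	local failures=0
	while true; do
		if check_peer; then
			failures=0
		else
			failures=$((failures + 1))
			if [ "$failures" -ge "$FAIL_LIMIT" ]; then
				take_over
				return 0
			fi
		fi
		sleep "$INTERVAL"
	done
}
```

Resetting the counter on every successful ping is the design choice that separates a dead peer from a flaky link.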

Philipp Schellworth, 2011/12/23 11:53

There is a Linux alternative, but it's very kernel-based and wouldn't work on Darwin: http://www.keepalived.org/. I'm thinking of coding an alternative in bash. One important feature is taking over a virtual IP on one interface (that won't be a big issue). The other is starting this script from launchd and restarting it when it crashes. The best way would be to code it in Objective-C or another language, but that will be a bigger task (for me).

Philipp Schellworth, 2011/12/23 12:17

Here are some more Linux alternatives: http://www.toniwestbrook.com/archives/184 http://www.linux-ha.org/wiki/Main_Page http://ostatic.com/mpathd/home/1# http://www.ultramonkey.org/

I'll have a deeper look into this over the next few days.

Philipp Schellworth, 2012/01/19 11:33

MPATHD is very outdated, and so is UltraMonkey. I'm currently looking a little deeper into the linux-ha heartbeat solution.

jt, 2011/12/23 09:47

The conclusion I have come to is that it has been completely taken out. The general consensus is that Apple is moving away from the enterprise market (isn't it obvious?).

My best guess is that heartbeatd and failoverd didn't simply port over from their previous source and may have had compatibility issues with the removal of classic support in Lion, similar to the Final Cut fiasco. I also think that is why they re-engineered the presentation and GUI so radically, and also chose to leave other services out completely.

The only other reason would be that they know of another product that does it differently and better, similar to their stance on the Xserve RAIDs, which were discontinued and replaced by Promise.

The problem is I do not know of any alternative 3rd party solution that does what it did. I ended up writing my own scripts that monitor my two servers and then fails over in the case of failure.

If anyone with AppleScript and server experience wants to collaborate on what I have, I would be totally open to it. I have been running it in production for about 3 months without issues, and before that I tested all functionality, which worked great.

Philipp Schellworth, 2011/12/23 12:32

That is probably the sad truth. Still, I think there is room for a 3rd-party alternative. Maybe I could port one of the existing Linux projects to Darwin; it would be worth a try.

jt, 2012/04/12 23:35

Hello, any luck with a 3rd-party alternative? I ended up writing a new script in AppleScript for a client, and I am going to port it to AppleScript Cocoa in Xcode with a GUI and build it out a little. I would love to get your opinion on some of the procedures.

Philipp Schellworth, 2012/04/17 10:10

After some struggle I've successfully compiled heartbeat/pacemaker on OS X. That should be a good base to go ahead and configure it. I'll post my results soon.

os_x/ipfailover.txt · Last modified: 2011/12/14 22:02 by pschellworth