© 2011 www.schellworth.de
Apple has removed IP Failover from OS X Lion Server (10.7.2).
The port described below doesn't work as expected. heartbeatd works fine, but failoverd crashes the kernel after a takeover is initiated. The system can't boot after that has happened; only a reinstall fixes the problem. It's a weird thing. Maybe someone knows how to fix this issue?
If you need the failover daemon, e.g. for AFP, you can port it from an OS X 10.6.x Server installation. A short how-to follows.
Copy the following files from an OS X Snow Leopard Server (10.6) to a Lion (10.7) installation. Lion Server isn't necessary.
scp /usr/sbin/heartbeatd root@lion.server:/usr/sbin/.
scp /usr/sbin/failoverd root@lion.server:/usr/sbin/.
scp -r /System/Library/PrivateFrameworks/CoreServer.framework root@lion.server:/System/Library/PrivateFrameworks/.
scp /usr/libexec/NotifyFailover root@lion.server:/usr/libexec/.
scp /usr/libexec/ProcessFailover root@lion.server:/usr/libexec/.
scp -r /System/Library/StartupItems/IPFailover root@lion.server:/System/Library/StartupItems/.
Hopefully Apple will come up with a new failover solution some day. OD already has one built in, and I haven't yet found out how the new AFP daemon works (the AppleFileServer manpage is the worst I've ever seen!). The SAN features integrated in Lion would be another way to build a high-availability AFP solution.
The IP Failover feature is a daemon process integrated in OS X Server. It's mostly used for high-availability server setups, e.g. if you use OS X Server as an Open Directory (OD) server for client home folders stored on the server or for NetBoot clients.
OD itself doesn't need the IP Failover daemon. If you have at least two OD servers running and one of them goes offline, the clients automatically find the other OD servers; this is a built-in OD feature. AFP and other services, e.g. a type server or SWUPD (the software update daemon), are different: the clients won't find the new server if the IP or URL of the master is gone.
Over the years I have done a lot of IP Failover setups for large companies. The main setups and how they work are well documented by Apple and some other resources on the web.
Here you can find a nearly perfect “IP Failover Test Script” that checks a little more before the takeover starts. The main problems are restarting the master server (without shutting down the replica) and releasing it after an acquire without getting inconsistencies in the home directory data. The so-called STOMITH (Shoot The Other Machine In The Head) procedure is no longer necessary (who invented this rough solution?).
If you would like to use the scripts commercially, please contact me.
This IP Failover setup additionally uses an rsync script to update the home directory files. You can find this script in the Linux section.
#!/bin/bash
# (c) 2011 www.schellworth.de v.0.2.1
#
# This script checks if the FileSync is clean and additionally if the IP is really down after <WAIT> seconds.
#
# WARNING: If the FileSync is not clean (maybe after a past takeover where the files haven't been updated), a takeover won't proceed!
#
STATE=$1             # acquire or release mode
IP=$2                # on hook IP
WAIT=240             # sleep X seconds (240 = 4 minutes) to check if the server is really down (or just reboots)
HOST_IP=192.168.0.1  # 2nd interface to check system health
# Note: the PreAcq script switches a link under /Library/Scripts/rsyncbackup/; check that both paths match your setup.
SYNC_STATE="/Library/Scripts/rsync/rsyncbackup.com"

logger "IPFailover (Test): Testscript starts to $STATE IP: $IP"

check_link=$( ls -l "$SYNC_STATE" )  # Get the linked target
target=${check_link#* -> }
target_file=${target##*/}

if [ "$target_file" = "activesync.sh" ]  # Check if the FileSync is active
then
    /sbin/ping -q -c1 "$HOST_IP" &> /dev/null  # Check the 2nd IP if the master is really down
    if [ "$?" -gt 0 ] && [ "$STATE" = "acquire" ]
    then  # ACQUIRE
        logger "IPFailover (Test): $HOST_IP isn't reachable! Test again in $WAIT seconds. ..."
        sleep $WAIT  # wait a few seconds
        logger "IPFailover (Test): ... ping $HOST_IP"
        /sbin/ping -q -c1 "$HOST_IP" &> /dev/null  # Check the 2nd IP again
        if [ "$?" -gt 0 ]
        then
            logger "IPFailover (Test): Master is DOWN! Acquiring $IP will proceed. (0)"
            exit 0
        else
            logger "IPFailover (Test): $HOST_IP is UP. Canceling to acquire $IP (5)."
            exit 5
        fi
    else
        if [ "$STATE" = "release" ]  # RELEASE
        then
            logger "IPFailover (Test): Master is UP again. Releasing $IP will proceed"
            exit 0
        else
            logger "IPFailover (Test): $HOST_IP is UP! Canceling to acquire $IP (10)."
            echo "started from command line?"
            echo "usage: Test ['acquire' | 'release'] ['IP']"
            exit 10
        fi
    fi
else
    logger "IPFailover (Test): FileSync is disabled. Please reverse the FileSync process!"
    logger "IPFailover (Test): $STATE $IP canceled! (50)"
    SUBJECT="WARNING!!! The IP Failover $IP failed!"
    TO="support@it.org"
    BODY="Please check the server logs."
    echo "$BODY" | mail -s "$SUBJECT" "$TO"
    exit 50
fi
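The decision logic of the test script above can be condensed into a single function, which makes the exit codes easier to see at a glance. This is only a sketch: the function name decide and its four arguments are made up for illustration and are not part of the original script. The exit codes and the file name activesync.sh are the ones used above.

```shell
#!/bin/bash
# Sketch only: decide() and its arguments are hypothetical helpers.
# The exit codes (0, 5, 10, 50) are those of the test script above.
#
# decide <sync_target> <state> <ping1_rc> <ping2_rc>
#   sync_target  file name the rsyncbackup.com link points to
#   state        "acquire" or "release"
#   ping1_rc     exit status of the first ping to the master's 2nd IP
#   ping2_rc     exit status of the second ping, after the wait
decide() {
    local sync_target="$1"
    local state="$2"
    local ping1_rc="$3"
    local ping2_rc="$4"

    # FileSync not active: never proceed, the data may be inconsistent.
    if [ "$sync_target" != "activesync.sh" ]; then
        echo 50
        return
    fi

    if [ "$ping1_rc" -gt 0 ] && [ "$state" = "acquire" ]; then
        # Master did not answer the first ping; the second ping decides.
        if [ "$ping2_rc" -gt 0 ]; then
            echo 0    # still down: acquire the IP
        else
            echo 5    # it was only a reboot: cancel
        fi
    elif [ "$state" = "release" ]; then
        echo 0        # release the IP
    else
        echo 10       # master is up, or the state argument is unknown
    fi
}
```

Calling decide activesync.sh acquire 1 0, for example, describes a master that missed the first ping but answered the second one, so the takeover is canceled with code 5.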
#!/bin/bash
# (c) www.schellworth.de
logger "IPFailover: Starting PreAcq script"
# Disable rsync process
ln -sf /Library/Scripts/rsyncbackup/inactivsync.sh /Library/Scripts/rsyncbackup/rsyncbackup.com
logger "IPFailover: RSYNC disabled"

#!/bin/bash
# (c) www.schellworth.de
logger "IPFailover: Starting PostAcq script"
# Start AFP daemon
serveradmin start afp
logger "IPFailover: AFP started"

#!/bin/bash
# (c) www.schellworth.de
logger "IPFailover: Starting PreRel script"
# Stop AFP daemon
serveradmin stop afp
logger "IPFailover: AFP stopped"

#!/bin/bash
# (c) www.schellworth.de
logger "IPFailover: Starting PostRel script"
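The PreAcq script switches the rsyncbackup.com link, and the test script reads that link back to decide whether the FileSync is clean. A minimal sketch of both sides follows. The function names are hypothetical, and the directory is passed as an argument instead of being hard-coded. readlink is used here in place of parsing the output of ls -l; that is a swapped-in technique, not what the original script does.

```shell
#!/bin/bash
# Sketch only: disable_sync() and sync_is_active() are hypothetical names.
# The file names activesync.sh, inactivsync.sh and rsyncbackup.com are
# taken from the scripts above.

# Point the link at the "inactive" script, as the PreAcq script does.
# $1 = directory that holds the sync scripts
disable_sync() {
    ln -sf "$1/inactivsync.sh" "$1/rsyncbackup.com"
}

# Succeed only if the link currently points at activesync.sh.
# $1 = directory that holds the sync scripts
sync_is_active() {
    local target
    target=$(readlink "$1/rsyncbackup.com") || return 1  # not a symlink
    [ "${target##*/}" = "activesync.sh" ]
}
```

A hook script could then write: if sync_is_active /Library/Scripts/rsyncbackup; then ... fi.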
http://docs.info.apple.com/article.html?path=ServerAdmin/10.5/en/c3fs29.html
http://www.mactech.com/articles/mactech/Vol.23/23.03/OSXFailover-Part1/index.html
http://www.afp548.com/article.php?story=20050218175501583
http://www.afp548.com/article.php?story=20051018203349525
http://www.mac-o-net.de/article/8
http://osxnetzwerk.de/2010/08/12/ip-failover-fuer-afp-unter-mac-os-x-server-snow-leopard/
Discussion
I wish they would do something. Do you know of any third-party alternatives? Much like you, I have created my own scripts, but using pings instead of the heartbeatd protocol on both sides. I then wrote a custom failover script.
There is a Linux alternative, but it's very kernel based and wouldn't work on Darwin - http://www.keepalived.org/. I'm thinking of coding an alternative in bash. One important feature is taking over a virtual IP on one interface (that won't be a big issue). The other issue is starting this script from launchd and restarting it when it crashes. The best way would be to code it in Objective-C or another language, but this would be a bigger issue (for me).
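For illustration, the bash approach described in this comment could look roughly like the sketch below. Everything in it is an assumption: the function names, the interface en0, the addresses, the failure threshold and the probe interval are placeholders, and it is not the script either commenter actually wrote. The virtual IP is added with the alias parameter of the BSD/Darwin ifconfig.

```shell
#!/bin/bash
# Sketch only: all names, addresses and thresholds are placeholders.
MASTER_IP=${MASTER_IP:-192.168.0.1}     # address probed for health
VIRTUAL_IP=${VIRTUAL_IP:-192.168.0.10}  # service address to take over
IFACE=${IFACE:-en0}                     # interface that gets the alias
MAX_FAILS=${MAX_FAILS:-3}               # consecutive misses before takeover
INTERVAL=${INTERVAL:-10}                # seconds between probes

# One probe of the master; returns the exit status of ping.
probe() {
    ping -q -c1 "$MASTER_IP" > /dev/null 2>&1
}

# Add the virtual IP as an alias on the interface (needs root).
take_over() {
    ifconfig "$IFACE" alias "$VIRTUAL_IP" netmask 255.255.255.255
}

# Probe until MAX_FAILS consecutive probes have failed, then take over.
watch_master() {
    local fails=0
    while [ "$fails" -lt "$MAX_FAILS" ]; do
        if probe; then
            fails=0                     # master answered: start over
        else
            fails=$((fails + 1))
        fi
        if [ "$fails" -lt "$MAX_FAILS" ]; then
            sleep "$INTERVAL"
        fi
    done
    take_over
}
```

The script would end with a call to watch_master. For the restart-on-crash part, a launchd job whose property list sets the KeepAlive key to true gets relaunched by launchd when the process exits.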
Here are some more Linux alternatives: http://www.toniwestbrook.com/archives/184 http://www.linux-ha.org/wiki/Main_Page
http://ostatic.com/mpathd/home/1 http://www.ultramonkey.org/
I'll have a deeper look into this in the next few days.
mpathd is very outdated. Ultra Monkey is also outdated. I'm currently looking a little deeper into the Linux-HA heartbeat solution.
The conclusion that I have come to is that it has been completely taken out. The general consensus is that Apple is moving away from the enterprise market (isn't it obvious?).
My best guess is that heartbeatd and failoverd didn't simply port over from their previous source and maybe had compatibility issues with the removal of Classic support in Lion, similar to the Final Cut fiasco. I also think that is why they re-engineered the presentation and GUI so radically, and also chose to leave other services out completely.
The only other reason would be that they know of another product that does it differently and better, similar to their stance on Xserve RAIDs, which were discontinued and replaced by Promise.
The problem is that I do not know of any third-party alternative that does what it did. I ended up writing my own scripts that monitor my two servers and then fail over in case of a failure.
If anyone with AppleScript experience and servers wants to collaborate on what I have, I would be totally open. I have been running it in production for about 3 months without issues, and before that I tested all functionality, which worked great.
Sadly, that seems to be the truth. Hence I think there is room for a third-party alternative. Maybe I could port one of the existing Linux projects to Darwin; it would be worth a try.
Hello, any luck with a third-party alternative? I ended up writing a new script in AppleScript for a client. I am going to begin porting it to an Xcode AppleScript Cocoa application with a GUI and build it out a little bit. I would love to get your opinion on some of the procedures.
After some struggle I've successfully compiled heartbeat/pacemaker on OS X. That should be a good base to go ahead and configure it. I'll post my results soon.