I’ve just completed a project for my home network / lab to automate the graceful shutdown of my VMware ESXi hosts. The objective of configuring APC PowerChute Network Shutdown (PCNS) was to account for a loss of input power to my UPS. We rarely experience utility power outages, but we had an incident recently where the heater element in the tumble dryer burnt out, causing the main RCD to trip. I get about 45 minutes of run time on the UPS, and as we were not home or close enough to reset the RCD, the hypervisors came to a hard stop! There were a couple of casualties: VMs that experienced disk corruption and needed to be rebuilt / restored from backups. Less than ideal.
If you’re reading this blog post because you’re trying to solve a similar problem, then read on, as I’ve documented the steps I took and the various tweaks needed to get it all working.
I have an APC UPS 1500 with the optional Ethernet management card. My Synology NAS devices use SNMP to query the UPS and will gracefully shut themselves down when the UPS is low on battery. I never quite got round to configuring PCNS to shut down the ESXi hosts, so after the last power incident it became a high priority. The plan was to install PCNS on VMware vMA and register it with the management card on the UPS as a PowerChute client. When the UPS is running on battery and the remaining run time drops below the configured threshold, PCNS executes a script that gracefully shuts down both my VMware ESXi hypervisors. I thought this was going to be fairly simple?
With vMA and PCNS installed I killed the power to the UPS and waited in anticipation for PCNS to execute the script. Yeah – like that worked…
Using my Google-Fu I fairly quickly learnt that PCNS can no longer shut down the free (unlicensed) version of ESXi. VMware removed the ability to use the rCli API for unlicensed ESXi hosts. Great. I’ve considered purchasing the vSphere Essentials licence many times but never pulled the trigger. The £700+ price tag for the licence needs consideration, especially for a home lab.
Anyway, I found a resolution to the issue by using an awesome script that you can find here. The script was written by William Lam in Perl and uses SOAP commands to emulate the vSphere client logging into the host and executing the graceful shutdown. Cool, and no vSphere licence required!
It wasn’t quite as simple as pointing PCNS at the Perl script though. I had to tweak it slightly, as PCNS will not execute scripts that take arguments. PCNS will also only execute bash / shell scripts, so I wrote a simple bash script to execute the modified Perl script. These are the steps I took, get ready:
1) Install the VMware vMA appliance. You don’t have to use vMA, but whatever you use has to have the VMware SDK installed. I use vMA version 6 to back up the configurations of my ESXi hosts, so I already had it installed as a VM on each of my hypervisors.
2) Install APC PCNS on the vMA(s) – I used PCNS version 4.1
Make a note of the first two lines below; they’ll save you some head scratching when trying to install PCNS!
sudo chmod 777 /etc/rc.d
sudo mkdir /etc/rc.d/init.d
cd /tmp
sudo gunzip pcns4xxxESXi.tar.gz
sudo tar -xvf pcns4xxxESXi.tar
cd ESXi
sudo ./install_en.sh
Accept the default options. When asked for the Java directory, enter the path as below (assuming you want to use the already installed Java version)
3) Create the bash script to be executed by PCNS. Pay very careful attention to “source /etc/profile”: this was the secret-sauce line required for PCNS to execute the script as root with the correct environment variables set. I could successfully run the Perl script manually whilst logged into vMA as vi-admin, but when executed by PCNS it would fail. This took many hours of Googling to solve. I hacked around adding PATH= statements to the scripts, copied manually from the environment variables, but had no luck; source /etc/profile did the trick for me, to my great relief. I created a directory called APC to store the script:
#!/bin/sh
# When UPS running on battery and remaining run time threshold has been reached PCNS will execute this script
# For PCNS to execute the script (as root) the below line is required to load environment variables and paths
source /etc/profile
#
# Leave a record of the shutdown to /APC/PCNS-Shutdownesxi.log
echo "UPS run time below threshold - shutting down Esxi host $(date +%Y%m%d-%H%M%S)" >> /APC/PCNS-Shutdownesxi.log
#
#Execute the perl script
/APC/./nameofperlscript.pl
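If you want to reproduce the "works by hand, fails under PCNS" behaviour without pulling the plug, one option is to run a command with an empty environment and compare the result with and without sourcing /etc/profile. This is only a rough sketch: `run_bare` is a hypothetical helper name of my own, and treating PCNS's environment as completely empty is an assumption, not something I have confirmed.

```shell
#!/bin/sh
# Hypothetical helper: run a command string with an empty environment,
# roughly approximating how a daemon such as PCNS might launch a script.
# (Assumption: the real PCNS environment may differ from a fully empty one.)
run_bare() {
    env -i /bin/sh -c "$1"
}

# Example comparisons (uncomment to try):
# run_bare 'echo "PATH=$PATH"'
# run_bare '. /etc/profile; echo "PATH=$PATH"'
```

Pointing `run_bare` at the wrapper script itself should show whether it still works once the login environment is stripped away.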
As mentioned previously, I had issues getting PCNS to execute the Perl script with arguments. The original Perl script written by William Lam can shut down a list of hosts using an argument that references a file containing the list of hosts to be shut down. Below is the modification I made so that I could specify my host directly in the Perl script. Locate the section in the script as below:
#### DO NOT EDIT PAST HERE ####
my @hostlist = ("sv-na1-phy-03.corp.netassured.co.uk");
my ($file,$request,$message,$response,$retval,$cookie);
#&verifyUserInput();
#&processFile($file);
4) Set PCNS to execute the bash script from step 3 when the configured thresholds are met. For ESXi host 1 I chose to have PCNS run the script when the run time remaining on battery is 30 minutes; host 2 shuts down when 15 minutes remain. Staggering the shutdown enables me to squeeze a bit more life out of the batteries on the UPS. I have vMA installed on each of my ESXi hosts, with PCNS installed separately on each.
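To make the staggering concrete, here is a small illustration of which host would be due for shutdown at a given remaining run time. It is only a model of the outcome: in the real setup each PCNS instance applies its own threshold independently, and the `THRESHOLDS` list and `hosts_due` function are names I have made up for this sketch.

```shell
#!/bin/sh
# Illustration only: models the staggered thresholds described above.
# Thresholds are minutes of run time remaining (host1=30, host2=15).
THRESHOLDS="host1:30 host2:15"

# Print each host whose threshold has been reached
# for the given remaining run time in minutes.
hosts_due() {
    remaining=$1
    for entry in $THRESHOLDS; do
        host=${entry%%:*}    # text before the colon
        limit=${entry##*:}   # text after the colon
        if [ "$remaining" -le "$limit" ]; then
            echo "$host"
        fi
    done
}
```

Calling `hosts_due 20` lists only host1, while `hosts_due 10` lists both hosts, which matches the order in which the two hypervisors go down.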
Enter the full path to the script and the time remaining while on battery:
5) Configure the VM power on / shutdown options via the vSphere client to set the desired startup / shutdown order of the VMs:
And that’s it! If the UPS loses input power for some reason, PCNS will execute the script at the specified threshold, connect to the ESXi host, gracefully shut down the VMs and power down the host. This prevents the disk corruption experienced during a hard stop of the host.
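After a test run (or a real outage) it is worth confirming that PCNS actually fired the script. The wrapper from step 3 appends a line to /APC/PCNS-Shutdownesxi.log, so a quick check might look like the sketch below. `last_shutdown` is a hypothetical helper name, and it assumes the log message still contains the words "shutting down" as in the script above.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent shutdown record, if any.
# Defaults to the log file written by the wrapper script in step 3.
last_shutdown() {
    log=${1:-/APC/PCNS-Shutdownesxi.log}
    if [ ! -f "$log" ]; then
        echo "no log at $log" >&2
        return 1
    fi
    # Keep only shutdown records and show the newest one.
    grep 'shutting down' "$log" | tail -n 1
}
```

Run `last_shutdown` with no arguments on the vMA, or pass a different path if you store the log elsewhere. The timestamp in the line tells you when the threshold was hit.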
The next problem…
Once the utility power is restored and the UPS is powered back up, how do you switch the ESXi hosts back on remotely? Fortunately I have the Intel Xeon edition of the Lenovo ThinkServer TS140. Intel AMT is baked right into the silicon, and once configured via the BIOS you can connect to the server via a web GUI even when the server is powered off.
The caveat is that there must still be input power to the power supply. AMT enables you to send power on / off commands to the server. Got to love out-of-band management 🙂