TOPIC:

Host Failure in Two Host Cluster 10 years 9 months ago #14

Hi,

I've created a pool for my two hosts. I have one vApp containing my four VMs, which are part of the pool. The VMs are stored on a SAN SR. I also have a snapshot policy, but that's not too important here.

So basically, I'm trying to configure the VMs running on host 1 to move to host 2 automatically when host 1 fails, e.g. it blue-screens or is powered off unexpectedly. Here is my HA-Lizard configuration. I've tried this a couple of times without really getting anywhere. I also read the section "Auto-Start VMs – With Host High Availability" in the documentation and configured something very similar.

HA-Lizard Settings

DISABLED_VAPPS=()
ENABLE_LOGGING=0
FENCE_ACTION=stop
FENCE_ENABLE=1
FENCE_ENABLED=1
FENCE_FILE_LOC=/etc/ha-lizard/fence
FENCE_HA_ONFAIL=1
FENCE_HEURISTICS_IPS=10.10.0.1
FENCE_HOST_FORGET=1
FENCE_IPADDRESS=
FENCE_METHOD=POOL
FENCE_MIN_HOST=1
FENCE_MIN_HOSTS=3
FENCE_PASSWD=
FENCE_QUORUM_REQUIRED=1
FENCE_REBOOT_LONE_HOST=1
FENCE_USE_IP_HEURISTICS=0
GLOBAL_VM_HA=1
MAIL_FROM=IT-ITIT-Pool01@ITIT.com
MAIL_ON=1
MAIL_SUBJECT="SYSTEM_ALERT-FROM_HOST:$HOSTNAME"
MAIL_TO=itsupport@ITIT.com
MONITOR_DELAY=45
MONITOR_KILLALL=1
MONITOR_MAX_STARTS=50
MONITOR_SCANRATE=10
OP_MODE=2
PROMOTE_SLAVE=1
SLAVE_HA=1
SLAVE_VM_STAT=1
XAPI_COUNT=5
XAPI_DELAY=15
XC_FIELD_NAME=XCPool01
XE_TIMEOUT=10

Thanks
Simon

Host Failure in Two Host Cluster 10 years 9 months ago #15

Simon,
A 2-node pool is a special case that requires an additional quorum vote in order to take over from a failed host. To do this, please set FENCE_USE_IP_HEURISTICS to 1 (you have it set to 0 in your configuration). This triggers logic that checks the addresses in FENCE_HEURISTICS_IPS and, if they are reachable, adds an extra quorum vote so that a "majority" is achieved. FENCE_MIN_HOSTS also needs to be set to 2 (you currently have that listed twice in your configuration; one entry has a typo).

Also, did you make the configuration changes via the CLI tool or manually in ha-lizard.conf? Usually it is not necessary to modify the text-based configuration at all, since all settings are stored globally in one place, the XAPI DB, regardless of the number of hosts in the pool.

A reference design for a 2-node pool is currently under development and is expected to be released in a week or so. Below is an excerpt detailing the required settings for a 2-node pool. These are verified to work reliably. Please adjust where necessary to fit your environment.
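The vote arithmetic described above can be sketched roughly as follows. This is an illustration only, not HA-Lizard's actual code: the function name has_quorum and its arguments are made up here, and it assumes that "majority" means strictly more than half of the hosts in the pool.

```shell
#!/bin/sh
# Illustrative only: NOT HA-Lizard's real implementation.
# Assumption: quorum means strictly more than half of the pool's hosts.
# Usage: has_quorum TOTAL_HOSTS LIVE_HOSTS HEURISTICS_REACHABLE(0|1)
has_quorum() {
    total=$1
    votes=$2
    # A reachable heuristics IP contributes one extra vote.
    if [ "$3" -eq 1 ]; then
        votes=$((votes + 1))
    fi
    # Integer-safe form of "votes > total / 2".
    [ $((votes * 2)) -gt "$total" ]
}

# Two-host pool with one surviving host:
has_quorum 2 1 0 && echo "quorum" || echo "no quorum"   # prints "no quorum"
has_quorum 2 1 1 && echo "quorum" || echo "no quorum"   # prints "quorum"
```

Under that assumption, a lone survivor in a 2-node pool only ever holds one of two votes, which is why the heuristics IP is needed to break the tie.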

ha-cfg set FENCE_ENABLED 1
ha-cfg set FENCE_HEURISTICS_IPS <IP Address of Management Switch>
ha-cfg set FENCE_MIN_HOSTS 2
ha-cfg set FENCE_QUORUM_REQUIRED 1
ha-cfg set FENCE_USE_IP_HEURISTICS 1
ha-cfg set MAIL_TO <your alert email address>
ha-cfg set MONITOR_DELAY 15
ha-cfg set MONITOR_MAX_STARTS 20
ha-cfg set XAPI_COUNT 2
ha-cfg set XAPI_DELAY 10

Host Failure in Two Host Cluster 10 years 9 months ago #16

Hi,

I just did it from the CLI. What I pasted into the post was the output of the ha-cfg get command.

I've added the changes you recommended and it works great when the host is gracefully restarted or put into maintenance mode, but not when I simulate a blue screen (push and hold the power button). Would it be because we're powering off the Slave node and not the Master? Shouldn't the Slave assume the role of Master if the Master fails, and remain the Master until it is reverted? Is this mechanism handled by HA-Lizard at this point, or by XCP?

Have I misinterpreted something at some point?

Thanks again

Simon

DISABLED_VAPPS=()
ENABLE_LOGGING=0
FENCE_ACTION=stop
FENCE_ENABLE=1
FENCE_ENABLED=1
FENCE_FILE_LOC=/etc/ha-lizard/fence
FENCE_HA_ONFAIL=1
FENCE_HEURISTICS_IPS=10.10.0.1
FENCE_HOST_FORGET=1
FENCE_IPADDRESS=
FENCE_METHOD=POOL
FENCE_MIN_HOST=2
FENCE_MIN_HOSTS=3
FENCE_PASSWD=
FENCE_QUORUM_REQUIRED=1
FENCE_REBOOT_LONE_HOST=1
FENCE_USE_IP_HEURISTICS=1
GLOBAL_VM_HA=1
MAIL_FROM=
MAIL_ON=1
MAIL_SUBJECT="SYSTEM_ALERT-FROM_HOST:$HOSTNAME"
MAIL_TO=
MONITOR_DELAY=15
MONITOR_KILLALL=1
MONITOR_MAX_STARTS=20
MONITOR_SCANRATE=10
OP_MODE=2
PROMOTE_SLAVE=1
SLAVE_HA=1
SLAVE_VM_STAT=1
XAPI_COUNT=2
XAPI_DELAY=10
XC_FIELD_NAME=XCPool01
XE_TIMEOUT=10

Host Failure in Two Host Cluster 10 years 9 months ago #17

It seems you still have FENCE_MIN_HOSTS set to 3. This should be set to 2.
It is odd that you have this configuration entry twice, spelled differently.

Try "ha-cfg set FENCE_MIN_HOSTS 2".
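To spot entries that appear twice under slightly different spellings (such as FENCE_MIN_HOST and FENCE_MIN_HOSTS), something like the sketch below may help. The helper name find_near_duplicates is hypothetical, and it assumes the input is KEY=VALUE lines in the form pasted earlier in this thread; it only catches pairs where one key is the other plus a single trailing character.

```shell
#!/bin/sh
# Hypothetical helper, not part of HA-Lizard.
# Reads KEY=VALUE lines on stdin and prints "SHORT LONG" for every pair of
# keys where LONG is SHORT plus exactly one trailing character.
find_near_duplicates() {
    awk -F= 'NF >= 2 { keys[$1] = 1 }
    END {
        for (k in keys) {
            shorter = substr(k, 1, length(k) - 1)
            if (shorter in keys) print shorter " " k
        }
    }' | sort
}

# Sample input taken from the settings posted in this thread.
printf '%s\n' \
    'FENCE_ENABLE=1' \
    'FENCE_ENABLED=1' \
    'FENCE_MIN_HOST=2' \
    'FENCE_MIN_HOSTS=3' \
    'OP_MODE=2' | find_near_duplicates
```

On a live host the same function could presumably be fed from "ha-cfg get | find_near_duplicates", assuming that command prints the settings in the format shown in the posts above.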

Also, if you are running version 1.6.41.x, you can set "FENCE_HOST_FORGET" to 0. This will save you the trouble of having to re-introduce the host into the pool after fencing.

Can you post the log output if you are still experiencing issues after the above changes? Use "ha-cfg log" to start a log session. Scrub any sensitive details before posting.

Lastly, you should have no trouble failing over on a host failure as you describe. Once the failover happens though, there is no failing back. Meaning, if a Master fails and a Slave becomes the Master, the new Master will remain the Master. When the former Master joins the pool, it will be a Slave.

Post failover, if you wish to return the former Master to that role, you will need to temporarily disable HA with "ha-cfg status" and then issue the following commands on the Slave that is to become the Master:

xe pool-emergency-transition-to-master
xe pool-recover-slaves

When done, re-enable HA with "ha-cfg status".

Host Failure in Two Host Cluster 10 years 9 months ago #18

Thanks again for the reply. I've made the change you recommended and am seeing some different behavior. Good, though. For a start, once the pool realises there is an issue, XenCenter does recover to show the failed host, and you can manage the other VMs on the other host. The only issue is the VDI. The one VM I leave on the host I fail won't start on the other server. The moment the failed host recovers, it completes the migration.

Here are the logs. It looks like an access issue with the VDI.


Jul 10 10:42:32 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: --- 10.10.0.12 ping statistics ---
Jul 10 10:42:32 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: 1 packets transmitted, 0 received, 100% packet loss, time 0ms
Jul 10 10:42:32 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh:
Jul 10 10:42:32 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: PING 10.10.0.1 (10.10.0.1) 56(84) bytes of data.
Jul 10 10:42:32 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: 64 bytes from 10.10.0.1: icmp_seq=1 ttl=255 time=0.788 ms
Jul 10 10:42:32 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh:
Jul 10 10:42:32 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: --- 10.10.0.1 ping statistics ---
Jul 10 10:42:32 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: 1 packets transmitted, 1 received, 0% packet loss, time 0ms
Jul 10 10:42:32 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: rtt min/avg/max/mdev = 0.788/0.788/0.788/0.000 ms
Jul 10 10:42:35 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: /etc/ha-lizard/ha-lizard.func: line 912: Stopping: command not found
Jul 10 10:42:35 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error: Connection refused (calling connect )
Jul 10 10:42:47 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:42:47 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:43:02 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:43:02 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:43:17 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:43:17 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:43:32 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:43:32 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:43:47 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:43:47 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:44:02 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:44:02 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:44:17 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:44:17 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:44:32 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:44:32 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:44:47 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:44:47 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:45:02 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:45:02 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:45:17 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:45:17 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:45:33 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:45:33 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:45:48 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:45:48 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:46:02 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:46:02 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:46:18 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:46:18 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:46:33 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:46:33 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:46:48 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:46:48 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:47:03 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:47:03 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:47:18 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:47:18 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:47:33 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:47:33 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:47:48 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:47:48 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:48:03 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:48:03 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:48:18 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:48:18 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:48:33 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:48:33 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:48:48 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:48:48 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:49:03 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:49:03 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:49:18 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:49:18 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],
Jul 10 10:49:33 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error code: SR_BACKEND_FAILURE_46
Jul 10 10:49:33 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],

Host Failure in Two Host Cluster 10 years 9 months ago #19

It appears that the VDI is still attached to the failed host. In your test, did you fail the Master or the Slave? Part of the fencing logic should clear any hung VDIs from the failed host so that the VMs can start properly on the surviving node. It may be that fencing is failing.
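To confirm which VDIs are involved and how often the error repeats, the log lines can be summarised with a small filter like the one below. This is only a sketch: the function name stuck_vdis is made up for illustration, and it assumes the log lines have the same "opterr=VDI <uuid> already attached RW" wording as the ones posted above.

```shell
#!/bin/sh
# Hypothetical helper, not part of HA-Lizard.
# Reads log lines on stdin and prints "COUNT UUID" for each VDI that was
# reported as "already attached RW".
stuck_vdis() {
    sed -n 's/.*opterr=VDI \([0-9a-f-]\{36\}\) already attached RW.*/\1/p' |
        sort | uniq -c | awk '{ print $1, $2 }'
}

# Example with two lines copied from the log in this thread:
printf '%s\n' \
    'Jul 10 10:42:47 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],' \
    'Jul 10 10:43:02 XCP-04 ha-lizard-NOTICE-/etc/ha-lizard/ha-lizard.sh: Error parameters: , The VDI is not available [opterr=VDI 1c9334f0-0345-4005-83f6-a6ba670bb190 already attached RW],' |
    stuck_vdis
```

Each UUID it prints can then be looked up on the pool, for example with "xe vbd-list vdi-uuid=<uuid>", to see which VM and host still hold the disk.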
