Master Up, slave lost network & VM stuck on slave 7 years 7 months ago #976

  • Salvatore Costantino
Regarding the second part of your question: yes, there are some settings that can be configured to significantly speed up the takeover by the slave. When the slave detects a missing master, it sits in a loop retrying the connection to the master for a configurable number of tries. If all attempts fail, the slave takes over as master and starts any affected VMs.

Default settings
XAPI_COUNT=2 (2 loop iterations trying to reconnect to master)
XAPI_DELAY=10 (sleep 10 seconds between each iteration)

This adds up to 20 seconds of delay (2 iterations x 10 seconds), plus any connection timeouts.
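
To make the loop concrete, here is a rough sketch of the retry logic described above. It is not HA-Lizard's actual code: `master_reachable` and `wait_for_master` are hypothetical names, and the placeholder check always fails so that the takeover path is the one exercised. It also assumes the slave sleeps after every failed attempt, which is what the 20 second figure suggests.

```shell
#!/bin/sh
# Hypothetical sketch of the slave's retry loop; not HA-Lizard's real code.
XAPI_COUNT=${XAPI_COUNT:-2}   # number of reconnect attempts
XAPI_DELAY=${XAPI_DELAY:-10}  # seconds to sleep after each failed attempt

# Placeholder check (assumption): always reports the master as unreachable.
# Replace the body with a real test of the master's management interface.
master_reachable() {
    return 1
}

# Returns 0 as soon as the master answers, 1 once all attempts are used up.
wait_for_master() {
    attempt=1
    while [ "$attempt" -le "$XAPI_COUNT" ]; do
        if master_reachable; then
            echo "master reachable on attempt $attempt"
            return 0
        fi
        sleep "$XAPI_DELAY"
        attempt=$((attempt + 1))
    done
    echo "master unreachable after $XAPI_COUNT attempts"
    return 1
}
```

With the defaults, `wait_for_master || echo "slave would take over"` blocks for about 20 seconds before reporting failure.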

For a quick switchover, try XAPI_COUNT=1 and XAPI_DELAY=1. There is some danger in this scenario, though: an interruption of roughly 2 seconds on the master's management interface could trigger an HA event.
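
The arithmetic behind both cases, as a small sketch. `takeover_delay` is a hypothetical helper, and it only counts the sleep time; connection timeouts come on top of that and are not modeled here.

```shell
#!/bin/sh
# takeover_delay COUNT DELAY
# Sleep time in seconds before the slave gives up on the master:
# number of attempts multiplied by the pause after each attempt.
takeover_delay() {
    echo $(( $1 * $2 ))
}

echo "default: $(takeover_delay 2 10)s"   # prints "default: 20s"
echo "quick: $(takeover_delay 1 1)s"      # prints "quick: 1s"
```

The quick setting leaves only about a second of sleep time, which is why a very short outage of the master's management interface is enough to start a takeover.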

I don't think these settings have anything to do with your test failure though.

Master Up, slave lost network & VM stuck on slave 7 years 7 months ago #977

  • Salvatore Costantino
The poweroff is really just to clean up stale state held in the slave's copy of the xapi DB. The former master's VMs will still show a status of "Powered on" there, which would prevent them from starting on the slave.

Master Up, slave lost network & VM stuck on slave 7 years 7 months ago #978

  • Tobias Kreidl (Topic Author)
I agree, I think the circumstances are different. The failed master should be fenced and all VMs that are designated for restart should be restarted, regardless of what's happening on the failed master. Of course, when it comes back up again, that will have to be reconciled as well, but that's not the immediate concern. :-)

I am still waiting for one server to reboot, and then I will reproduce that set of circumstances. Many thanks in advance!

Master Up, slave lost network & VM stuck on slave 7 years 7 months ago #979

  • Tobias Kreidl (Topic Author)
The log says PROMOTE_SLAVE is disabled. I take it this is probably the issue?

Master Up, slave lost network & VM stuck on slave 7 years 7 months ago #980

  • Tobias Kreidl (Topic Author)
Here is the log, sanitized for various things.

I am now realizing that both of these probably need to be enabled:

####################################################################
# If the Pool Master cannot be reached and all attempts to reach
# it have been exhausted, set whether autoselected slave will try to
# start appliances and/or VMs.
# (PROMOTE_SLAVE must also be set to 1 for this to work)
####################################################################
#SLAVE_HA=1

####################################################################
# If master cannot be reached - set whether slave should be promoted
# to pool master (this only affects a single slave: the
# "autoselect" winner chosen by the former master to recover the pool)
####################################################################
#PROMOTE_SLAVE=1

Master Up, slave lost network & VM stuck on slave 7 years 7 months ago #981

  • Salvatore Costantino
Very likely the issue. Here are settings from a 2-node dev pool that are working:

DISABLED_VAPPS=()
ENABLE_LOGGING=1
FENCE_ACTION=stop
FENCE_ENABLED=1
FENCE_FILE_LOC=/etc/ha-lizard/fence
FENCE_HA_ONFAIL=0
FENCE_HEURISTICS_IPS=192.168.1.1
FENCE_HOST_FORGET=0
FENCE_IPADDRESS=
FENCE_METHOD=POOL
FENCE_MIN_HOSTS=2
FENCE_PASSWD=
FENCE_QUORUM_REQUIRED=1
FENCE_REBOOT_LONE_HOST=0
FENCE_USE_IP_HEURISTICS=1
GLOBAL_VM_HA=1
MAIL_FROM="root@localhost"
MAIL_ON=1
MAIL_SUBJECT="SYSTEM_ALERT-FROM_HOST:$HOSTNAME"
MAIL_TO="root@localhost"
MONITOR_DELAY=15
MONITOR_KILLALL=1
MONITOR_MAX_STARTS=20
MONITOR_SCANRATE=10
OP_MODE=2
PROMOTE_SLAVE=1
SLAVE_HA=1
SLAVE_VM_STAT=0
SMTP_PASS=""
SMTP_PORT="25"
SMTP_SERVER="127.0.0.1"
SMTP_USER=""
XAPI_COUNT=2
XAPI_DELAY=10
XC_FIELD_NAME='ha-lizard-enabled'
XE_TIMEOUT=10
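
If you want to check a pool against this quickly, something like the following could help. It is only a sketch: it assumes the settings have been saved to a text file as KEY=VALUE lines in the same form as the listing above, and `check_failover_flags` is a hypothetical helper, not part of HA-Lizard. It only looks at the two values discussed in this thread.

```shell
#!/bin/sh
# check_failover_flags FILE
# Reports whether SLAVE_HA and PROMOTE_SLAVE are both set to 1 in FILE.
# FILE is assumed to hold KEY=VALUE lines; commented-out lines such as
# "#SLAVE_HA=1" do not count as enabled.
check_failover_flags() {
    file=$1
    status=0
    for key in SLAVE_HA PROMOTE_SLAVE; do
        # Last uncommented assignment of the key, with any quotes removed.
        value=$(sed -n "s/^[[:space:]]*${key}=//p" "$file" | tail -n 1 | tr -d "\"'")
        if [ "$value" = "1" ]; then
            echo "$key is enabled"
        else
            echo "$key is NOT enabled (value: '$value')"
            status=1
        fi
    done
    return "$status"
}
```

Run it as `check_failover_flags settings.txt`, where `settings.txt` is whatever file you saved the settings to. A non-zero exit status means at least one of the two flags is missing, commented out, or not set to 1.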