TOPIC:

VM's failed to switch when one host failed with HA enabled 5 years 3 months ago #1761

  • Sherbin George
  • Topic Author
  • Posts: 11
Hi,

We have been using HA-Lizard configured on a pool. Last night one host had an outage, and the VMs residing on that host failed to switch over to the other host in the pool.

Can you help me fix this?

VM's failed to switch when one host failed with HA enabled 5 years 3 months ago #1762

  • Salvatore Costantino
  • Posts: 722
The user.log file would tell us exactly what happened. Can you post the user.log for the time when the outage occurred? You should have many of them in /var/log. Just the one that covers the time of the incident would suffice.
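
If the rotated logs are large, a small script can narrow them down to the incident window. This is only a sketch: it assumes the usual syslog timestamp (for example `Dec 31 18:44:03`) in the first 15 characters of each line, and the names `parse_stamp` and `lines_in_window` are made up for illustration, not part of HA-Lizard.

```python
from datetime import datetime

# Syslog lines carry no year, so a fixed placeholder year is prepended
# before parsing. It only matters for comparing lines with each other.
_YEAR = "2000"


def parse_stamp(line):
    """Return the timestamp at the start of a syslog line, or None."""
    try:
        return datetime.strptime(f"{_YEAR} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def lines_in_window(lines, start, end):
    """Yield the lines whose timestamp lies within [start, end], inclusive.

    start and end use the same format as the log, e.g. "Dec 31 18:30:00".
    Lines without a parsable timestamp are skipped.
    """
    lo = parse_stamp(start)
    hi = parse_stamp(end)
    if lo is None or hi is None:
        raise ValueError("start and end must look like 'Dec 31 18:44:03'")
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is not None and lo <= stamp <= hi:
            yield line
```

Passing an open file handle for one of the user.log files as `lines` works, since a file iterates line by line. Note that the placeholder year means a window that crosses New Year would need extra handling.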

VM's failed to switch when one host failed with HA enabled 5 years 3 months ago #1763

  • Sherbin George
  • Topic Author
  • Posts: 11
Hi,

Here are some of the logs from the time of the outage:

*********************
Dec 31 18:44:03 XEN08 ha-lizard: ha-lizard Watchdog: ha-lizard running - OK
Dec 31 18:44:04 XEN08 ha-lizard: 3211 ha-lizard already running: Attempt 16 on PIDS: 3004 418
Dec 31 18:44:04 XEN08 iscsi-ha: iscsi-ha Watchdog: iscsi-ha running - OK
Dec 31 18:44:06 XEN08 iscsi-ha: 5417 Spawning new instance of iscsi-ha
Dec 31 18:44:06 XEN08 iscsi-ha: 5417 check_logger_processes Checking logger processes
Dec 31 18:44:06 XEN08 iscsi-ha: 5417 check_logger_processes No processes to clear
Dec 31 18:44:06 XEN08 iscsi-ha: Normalized ISCSI_TARGET_SERVICE [ tgtd ]
Dec 31 18:44:06 XEN08 iscsi-ha: XenServer Major Release = [ 7 ]
Dec 31 18:44:06 XEN08 iscsi-ha: Mail Spool Directory Found /dev/shm/iscsi-ha-mail
Dec 31 18:44:06 XEN08 iscsi-ha: This iteration is count 943
Dec 31 18:44:06 XEN08 iscsi-ha: Checking if this host is a Pool Master or Slave
Dec 31 18:44:06 XEN08 iscsi-ha: This host's pool status = slave:10.200.2.120
Dec 31 18:44:06 XEN08 iscsi-ha: service_execute: Execute [ status ] on [ iscsi-ha ]
Dec 31 18:44:06 XEN08 iscsi-ha: service_execute: System V mode detected
Dec 31 18:44:06 XEN08 iscsi-ha: service_execute: [ OK ]#015iscsi-ha running: 7234
Dec 31 18:44:06 XEN08 iscsi-ha: service_execute: Returning exit status [ 0 ]
Dec 31 18:44:06 XEN08 iscsi-ha: 5627 local_ip_list: Local IP list returned 127.0.0.1#01210.200.2.121#01210.10.10.2
Dec 31 18:44:06 XEN08 iscsi-ha: 5627 service_execute: Execute [ status ] on [ tgtd ]
Dec 31 18:44:06 XEN08 iscsi-ha: 5627 service_execute: systemctl mode being used
Dec 31 18:44:06 XEN08 iscsi-ha: 5627 service_execute: ● tgtd.service - tgtd iSCSI target daemon#012 Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled; vendor preset: disabled)#012 Drop-In: /etc/systemd/system/tgtd.service.d#012 └─local.conf#012 Active: inactive (dead)
Dec 31 18:44:06 XEN08 iscsi-ha: 5627 service_execute: Returning exit status [ 3 ]
Dec 31 18:44:06 XEN08 iscsi-ha: 5627 iSCSI target: tgtd status stopped. Expected Stopped . [inactive (dead)]
Dec 31 18:44:06 XEN08 iscsi-ha: 5627 DRBD Running on this host: version: 8.4.5 (api:1/proto:86-101)#012srcversion: 2A6B2FA4F0703B49CA9C727 #012#012 1: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r
#012 ns:0 nr:1622317788 dw:1622317788 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
Dec 31 18:44:06 XEN08 iscsi-ha: 5627 validate_drbd_resources_loaded: Checking DRBD has loaded with resources. Checking [ 5 ] > [ 2 ]
Dec 31 18:44:06 XEN08 iscsi-ha: 5627 validate_drbd_resources_loaded: Resources loaded
Dec 31 18:44:06 XEN08 iscsi-ha: 5627 check_drbd_resource_state: DRBD Resource: iscsi1 in Secondary mode
Dec 31 18:44:06 XEN08 iscsi-ha: 5627 DRBD Resource: iscsi1 in WFConnection state - expected Connected state
Dec 31 18:44:06 XEN08 iscsi-ha: 5627 email: Mail Spool Directory Found /dev/shm/iscsi-ha-mail
Dec 31 18:44:06 XEN08 iscsi-ha: 5627 email: Duplicate message - not sending. Content = DRBD Resource: iscsi1 in WFConnection state - expected Connected state
Dec 31 18:44:06 XEN08 iscsi-ha: 5627 email: Message barred for 30 minutes
Dec 31 18:44:07 XEN08 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: Pool Master 10.200.2.120 not responding - replug_pbd exiting
Dec 31 18:44:09 XEN08 iscsi-ha: iscsi-ha Watchdog: iscsi-ha running - OK
Dec 31 18:44:13 XEN08 ha-lizard: ha-lizard Watchdog: ha-lizard running - OK
Dec 31 18:44:14 XEN08 ha-lizard: 3211 ha-lizard already running: Attempt 17 on PIDS: 3004 418
Dec 31 18:44:14 XEN08 iscsi-ha: iscsi-ha Watchdog: iscsi-ha running - OK
Dec 31 18:44:16 XEN08 iscsi-ha: 5578 Spawning new instance of iscsi-ha
Dec 31 18:44:16 XEN08 iscsi-ha: 5578 check_logger_processes Checking logger processes
Dec 31 18:44:16 XEN08 iscsi-ha: 5578 check_logger_processes No processes to clear
Dec 31 18:44:16 XEN08 iscsi-ha: Normalized ISCSI_TARGET_SERVICE [ tgtd ]
Dec 31 18:44:16 XEN08 iscsi-ha: XenServer Major Release = [ 7 ]
Dec 31 18:44:16 XEN08 iscsi-ha: Mail Spool Directory Found /dev/shm/iscsi-ha-mail
Dec 31 18:44:16 XEN08 iscsi-ha: This iteration is count 944
Dec 31 18:44:16 XEN08 iscsi-ha: Checking if this host is a Pool Master or Slave
Dec 31 18:44:16 XEN08 iscsi-ha: This host's pool status = slave:10.200.2.120
Dec 31 18:44:16 XEN08 iscsi-ha: service_execute: Execute [ status ] on [ iscsi-ha ]
Dec 31 18:44:16 XEN08 iscsi-ha: service_execute: System V mode detected
Dec 31 18:44:16 XEN08 iscsi-ha: service_execute: [ OK ]#015iscsi-ha running: 7234
Dec 31 18:44:16 XEN08 iscsi-ha: service_execute: Returning exit status [ 0 ]
Dec 31 18:44:16 XEN08 iscsi-ha: 5788 local_ip_list: Local IP list returned 127.0.0.1#01210.200.2.121#01210.10.10.2
Dec 31 18:44:16 XEN08 iscsi-ha: 5788 service_execute: Execute [ status ] on [ tgtd ]
Dec 31 18:44:16 XEN08 iscsi-ha: 5788 service_execute: systemctl mode being used
Dec 31 18:44:16 XEN08 iscsi-ha: 5788 service_execute: ● tgtd.service - tgtd iSCSI target daemon#012 Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled; vendor preset: disabled)#012 Drop-In: /etc/systemd/system/tgtd.service.d#012 └─local.conf#012 Active: inactive (dead)
Dec 31 18:44:16 XEN08 iscsi-ha: 5788 service_execute: Returning exit status [ 3 ]
Dec 31 18:44:16 XEN08 iscsi-ha: 5788 iSCSI target: tgtd status stopped. Expected Stopped . [inactive (dead)]
Dec 31 18:44:16 XEN08 iscsi-ha: 5788 DRBD Running on this host: version: 8.4.5 (api:1/proto:86-101)#012srcversion: 2A6B2FA4F0703B49CA9C727 #012#012 1: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r
#012 ns:0 nr:1622317788 dw:1622317788 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
Dec 31 18:44:16 XEN08 iscsi-ha: 5788 validate_drbd_resources_loaded: Checking DRBD has loaded with resources. Checking [ 5 ] > [ 2 ]
Dec 31 18:44:16 XEN08 iscsi-ha: 5788 validate_drbd_resources_loaded: Resources loaded
Dec 31 18:44:16 XEN08 iscsi-ha: 5788 check_drbd_resource_state: DRBD Resource: iscsi1 in Secondary mode
Dec 31 18:44:16 XEN08 iscsi-ha: 5788 DRBD Resource: iscsi1 in WFConnection state - expected Connected state
Dec 31 18:44:16 XEN08 iscsi-ha: 5788 email: Mail Spool Directory Found /dev/shm/iscsi-ha-mail
Dec 31 18:44:16 XEN08 iscsi-ha: 5788 email: Duplicate message - not sending. Content = DRBD Resource: iscsi1 in WFConnection state - expected Connected state
Dec 31 18:44:16 XEN08 iscsi-ha: 5788 email: Message barred for 30 minutes
Dec 31 18:44:17 XEN08 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: Pool Master 10.200.2.120 not responding - replug_pbd exiting
Dec 31 18:44:19 XEN08 iscsi-ha: iscsi-ha Watchdog: iscsi-ha running - OK
Dec 31 18:44:23 XEN08 ha-lizard: ha-lizard Watchdog: ha-lizard running - OK
Dec 31 18:44:24 XEN08 ha-lizard: 3211 ha-lizard already running: Attempt 18 on PIDS: 3004 418
Dec 31 18:44:24 XEN08 iscsi-ha: iscsi-ha Watchdog: iscsi-ha running - OK
Dec 31 18:44:26 XEN08 iscsi-ha: 5751 Spawning new instance of iscsi-ha
Dec 31 18:44:26 XEN08 iscsi-ha: 5751 check_logger_processes Checking logger processes
Dec 31 18:44:26 XEN08 iscsi-ha: 5751 check_logger_processes No processes to clear
Dec 31 18:44:26 XEN08 iscsi-ha: Normalized ISCSI_TARGET_SERVICE [ tgtd ]
Dec 31 18:44:26 XEN08 iscsi-ha: XenServer Major Release = [ 7 ]
Dec 31 18:44:26 XEN08 iscsi-ha: Mail Spool Directory Found /dev/shm/iscsi-ha-mail
Dec 31 18:44:26 XEN08 iscsi-ha: This iteration is count 945
Dec 31 18:44:26 XEN08 iscsi-ha: Checking if this host is a Pool Master or Slave
Dec 31 18:44:27 XEN08 iscsi-ha: This host's pool status = slave:10.200.2.120
Dec 31 18:44:27 XEN08 iscsi-ha: service_execute: Execute [ status ] on [ iscsi-ha ]
Dec 31 18:44:27 XEN08 iscsi-ha: service_execute: System V mode detected
Dec 31 18:44:27 XEN08 iscsi-ha: service_execute: [ OK ]#015iscsi-ha running: 7234
Dec 31 18:44:27 XEN08 iscsi-ha: service_execute: Returning exit status [ 0 ]
Dec 31 18:44:27 XEN08 iscsi-ha: 5946 local_ip_list: Local IP list returned 127.0.0.1#01210.200.2.121#01210.10.10.2
Dec 31 18:44:27 XEN08 iscsi-ha: 5946 service_execute: Execute [ status ] on [ tgtd ]
Dec 31 18:44:27 XEN08 iscsi-ha: 5946 service_execute: systemctl mode being used
Dec 31 18:44:27 XEN08 iscsi-ha: 5946 service_execute: ● tgtd.service - tgtd iSCSI target daemon#012 Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled; vendor preset: disabled)#012 Drop-In: /etc/systemd/system/tgtd.service.d#012 └─local.conf#012 Active: inactive (dead)
Dec 31 18:44:27 XEN08 iscsi-ha: 5946 service_execute: Returning exit status [ 3 ]
Dec 31 18:44:27 XEN08 iscsi-ha: 5946 iSCSI target: tgtd status stopped. Expected Stopped . [inactive (dead)]
Dec 31 18:44:27 XEN08 iscsi-ha: 5946 DRBD Running on this host: version: 8.4.5 (api:1/proto:86-101)#012srcversion: 2A6B2FA4F0703B49CA9C727 #012#012 1: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r
#012 ns:0 nr:1622317788 dw:1622317788 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
Dec 31 18:44:27 XEN08 iscsi-ha: 5946 validate_drbd_resources_loaded: Checking DRBD has loaded with resources. Checking [ 5 ] > [ 2 ]
Dec 31 18:44:27 XEN08 iscsi-ha: 5946 validate_drbd_resources_loaded: Resources loaded
Dec 31 18:44:27 XEN08 iscsi-ha: 5946 check_drbd_resource_state: DRBD Resource: iscsi1 in Secondary mode
Dec 31 18:44:27 XEN08 iscsi-ha: 5946 DRBD Resource: iscsi1 in WFConnection state - expected Connected state
Dec 31 18:44:27 XEN08 iscsi-ha: 5946 email: Mail Spool Directory Found /dev/shm/iscsi-ha-mail
Dec 31 18:44:27 XEN08 iscsi-ha: 5946 email: Duplicate message - not sending. Content = DRBD Resource: iscsi1 in WFConnection state - expected Connected state
Dec 31 18:44:27 XEN08 iscsi-ha: 5946 email: Message barred for 30 minutes
Dec 31 18:44:28 XEN08 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: Pool Master 10.200.2.120 not responding - replug_pbd exiting
Dec 31 18:44:29 XEN08 iscsi-ha: iscsi-ha Watchdog: iscsi-ha running - OK
Dec 31 18:44:33 XEN08 ha-lizard: ha-lizard Watchdog: ha-lizard running - OK
Dec 31 18:44:34 XEN08 iscsi-ha: iscsi-ha Watchdog: iscsi-ha running - OK
Dec 31 18:44:34 XEN08 ha-lizard: 3211 ha-lizard already running: Attempt 19 on PIDS: 3004 418
Dec 31 18:44:37 XEN08 iscsi-ha: 5909 Spawning new instance of iscsi-ha
Dec 31 18:44:37 XEN08 iscsi-ha: 5909 check_logger_processes Checking logger processes
Dec 31 18:44:37 XEN08 iscsi-ha: 5909 check_logger_processes No processes to clear
Dec 31 18:44:37 XEN08 iscsi-ha: Normalized ISCSI_TARGET_SERVICE [ tgtd ]
Dec 31 18:44:37 XEN08 iscsi-ha: XenServer Major Release = [ 7 ]
Dec 31 18:44:37 XEN08 iscsi-ha: Mail Spool Directory Found /dev/shm/iscsi-ha-mail
Dec 31 18:44:37 XEN08 iscsi-ha: This iteration is count 946
Dec 31 18:44:37 XEN08 iscsi-ha: Checking if this host is a Pool Master or Slave
Dec 31 18:44:37 XEN08 iscsi-ha: This host's pool status = slave:10.200.2.120
Dec 31 18:44:37 XEN08 iscsi-ha: service_execute: Execute [ status ] on [ iscsi-ha ]
Dec 31 18:44:37 XEN08 iscsi-ha: service_execute: System V mode detected
Dec 31 18:44:37 XEN08 iscsi-ha: service_execute: [ OK ]#015iscsi-ha running: 7234
Dec 31 18:44:37 XEN08 iscsi-ha: service_execute: Returning exit status [ 0 ]
Dec 31 18:44:37 XEN08 iscsi-ha: 6108 local_ip_list: Local IP list returned 127.0.0.1#01210.200.2.121#01210.10.10.2
Dec 31 18:44:37 XEN08 iscsi-ha: 6108 service_execute: Execute [ status ] on [ tgtd ]
Dec 31 18:44:37 XEN08 iscsi-ha: 6108 service_execute: systemctl mode being used
Dec 31 18:44:37 XEN08 iscsi-ha: 6108 service_execute: ● tgtd.service - tgtd iSCSI target daemon#012 Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled; vendor preset: disabled)#012 Drop-In: /etc/systemd/system/tgtd.service.d#012 └─local.conf#012 Active: inactive (dead)
Dec 31 18:44:37 XEN08 iscsi-ha: 6108 service_execute: Returning exit status [ 3 ]
Dec 31 18:44:37 XEN08 iscsi-ha: 6108 iSCSI target: tgtd status stopped. Expected Stopped . [inactive (dead)]
Dec 31 18:44:37 XEN08 iscsi-ha: 6108 DRBD Running on this host: version: 8.4.5 (api:1/proto:86-101)#012srcversion: 2A6B2FA4F0703B49CA9C727 #012#012 1: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r
#012 ns:0 nr:1622317788 dw:1622317788 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
Dec 31 18:44:37 XEN08 iscsi-ha: 6108 validate_drbd_resources_loaded: Checking DRBD has loaded with resources. Checking [ 5 ] > [ 2 ]
Dec 31 18:44:37 XEN08 iscsi-ha: 6108 validate_drbd_resources_loaded: Resources loaded
Dec 31 18:44:37 XEN08 iscsi-ha: 6108 check_drbd_resource_state: DRBD Resource: iscsi1 in Secondary mode
Dec 31 18:44:37 XEN08 iscsi-ha: 6108 DRBD Resource: iscsi1 in WFConnection state - expected Connected state
Dec 31 18:44:37 XEN08 iscsi-ha: 6108 email: Mail Spool Directory Found /dev/shm/iscsi-ha-mail
Dec 31 18:44:37 XEN08 iscsi-ha: 6108 email: Duplicate message - not sending. Content = DRBD Resource: iscsi1 in WFConnection state - expected Connected state
Dec 31 18:44:37 XEN08 iscsi-ha: 6108 email: Message barred for 30 minutes
Dec 31 18:44:38 XEN08 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: Pool Master 10.200.2.120 not responding - replug_pbd exiting
Dec 31 18:44:39 XEN08 iscsi-ha: iscsi-ha Watchdog: iscsi-ha running - OK
Dec 31 18:44:43 XEN08 ha-lizard: ha-lizard Watchdog: ha-lizard running - OK
Dec 31 18:44:44 XEN08 iscsi-ha: iscsi-ha Watchdog: iscsi-ha running - OK
*********************
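
Each iscsi-ha iteration above embeds the DRBD status line, where `cs` is the connection state, `ro` the local/peer roles and `ds` the local/peer disk states. To track how that state changes over a longer log, the fields can be pulled out with something like the sketch below. It assumes the DRBD 8.4 `cs:... ro:.../... ds:.../...` layout seen in this excerpt; the function name `drbd_state` is only illustrative.

```python
import re

# Matches the status fields as they appear in the excerpt, e.g.
# "cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown".
DRBD_RE = re.compile(
    r"cs:(?P<cs>\S+) "
    r"ro:(?P<ro_local>[^/\s]+)/(?P<ro_peer>[^/\s]+) "
    r"ds:(?P<ds_local>[^/\s]+)/(?P<ds_peer>[^/\s]+)"
)


def drbd_state(line):
    """Return the DRBD connection, role and disk fields found in a log line.

    Returns a dict with the keys cs, ro_local, ro_peer, ds_local and
    ds_peer, or None when the line holds no DRBD status.
    """
    match = DRBD_RE.search(line)
    return match.groupdict() if match else None
```

On the lines in this excerpt, the result shows the local side as Secondary and UpToDate while waiting for a connection to a peer whose role and disk state are unknown.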

VM's failed to switch when one host failed with HA enabled 5 years 3 months ago #1764

  • Salvatore Costantino
  • Posts: 722
Several minutes of debug output before this point would be helpful. All I can tell from this is that the main thread was active for a very long time, but I cannot tell what it was doing.

Also, can you post your config? The output of "ha-cfg get" would do.
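
For reference, the long-running main thread shows up in the posted excerpt as repeated "ha-lizard already running: Attempt N on PIDS: ..." lines (attempts 16 through 19). A rough way to list those attempts from a longer log is sketched below. It assumes the message format exactly as it appears in the excerpt, and the function name is hypothetical.

```python
import re

# Matches lines such as:
# "Dec 31 18:44:04 XEN08 ha-lizard: 3211 ha-lizard already running: Attempt 16 on PIDS: 3004 418"
ATTEMPT_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ ha-lizard: \d+ "
    r"ha-lizard already running: Attempt (\d+) on PIDS: ([\d ]+)"
)


def already_running_attempts(lines):
    """Return (timestamp, attempt number, list of PIDs) for each matching line."""
    found = []
    for line in lines:
        match = ATTEMPT_RE.match(line)
        if match:
            stamp, attempt, pids = match.groups()
            found.append((stamp, int(attempt), pids.split()))
    return found
```

The first entry returned marks where the earlier debug output becomes interesting, since that is when a new instance first found the previous one still running.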

VM's failed to switch when one host failed with HA enabled 5 years 3 months ago #1765

  • Sherbin George
  • Topic Author
  • Posts: 11
~]# ha-cfg get
DISABLED_VAPPS=()
ENABLE_LOGGING=1
FENCE_ACTION=stop
FENCE_ENABLED=1
FENCE_FILE_LOC=/etc/ha-lizard/fence
FENCE_HA_ONFAIL=0
FENCE_HEURISTICS_IPS=10.200.2.1
FENCE_HOST_FORGET=0
FENCE_IPADDRESS=
FENCE_METHOD=POOL
FENCE_MIN_HOSTS=2
FENCE_PASSWD=
FENCE_QUORUM_REQUIRED=1
FENCE_REBOOT_LONE_HOST=0
FENCE_USE_IP_HEURISTICS=1
GLOBAL_VM_HA=1
HOST_SELECT_METHOD=0
MAIL_FROM="root@localhost"
MAIL_ON=1
MAIL_SUBJECT="SYSTEM_ALERT-FROM_HOST:$HOSTNAME"
MAIL_TO="root@localhost"
MGT_LINK_LOSS_TOLERANCE=5
MONITOR_DELAY=15
MONITOR_KILLALL=1
MONITOR_MAX_STARTS=20
MONITOR_SCANRATE=10
OP_MODE=2
PROMOTE_SLAVE=1
SLAVE_HA=1
SLAVE_VM_STAT=0
SMTP_PASS=""
SMTP_PORT="25"
SMTP_SERVER="127.0.0.1"
SMTP_USER=""
XAPI_COUNT=2
XAPI_DELAY=10
XC_FIELD_NAME='ha-lizard-enabled'
XE_TIMEOUT=10
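
To check individual settings from a dump like this in a script (the fencing ones, for instance), the KEY=VALUE lines can be read into a dictionary. This is a minimal sketch that assumes the plain KEY=VALUE layout shown above; `parse_ha_cfg` is a made-up helper, not an HA-Lizard command, and all values are kept as strings.

```python
def parse_ha_cfg(text):
    """Parse the KEY=VALUE lines of an "ha-cfg get" dump into a dict.

    Lines without "=" (such as the shell prompt) and comment lines are
    skipped. One pair of matching surrounding quotes is removed.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if "=" not in line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        settings[key.strip()] = value
    return settings
```

With the dump above, `parse_ha_cfg(dump)["FENCE_MIN_HOSTS"]` gives the string "2", so numeric settings need an explicit `int()` before comparing.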

VM's failed to switch when one host failed with HA enabled 5 years 3 months ago #1766

I can provide some additional details here, outside of the debug logs.

We're running XCP-NG 7.5 on both machines.

When the master (XEN07) went down, XEN08 (the slave) was still running with its VMs. I then promoted the slave to master, which worked. The problem was that the XEN07 VMs did not appear in the list, so we could not start them.

At this point we ran the commands from the following article and were able to recover the lost VMs and start them on XEN08: support.citrix.com/article/CTX132387

The error on XEN07 was basically "operating system not found". I then used the XCP-NG net installer, which did find the previous installation but unintentionally upgraded it to 7.6 (I did not realize a 7.5 net install would still install the latest version). Ultimately the server did come up as a slave to XEN08, but with all configurations wiped. So now we have a partially upgraded pool.

The 7.5 installation is backed up, but I am not sure whether it will boot this time if I restore it (I do not know if doing the upgrade and then reverting would fix the boot issue, or if I would be stuck with "OS not found" again).

My main question is: if I restore 7.5 on XEN07 and it starts fine, will it still think it is the master, and still think it has the VMs that we reset and moved? I absolutely cannot have any sort of corruption on the remaining host.

Or would it be safer to do a clean reinstall of 7.5 on XEN07? If so, what steps would be required to get synced with XEN08 so that VMs can be migrated back over?

Thanks so much for your help!
