
TOPIC:

Temporary Network Loss, Slave Keeps Rebooting 7 years 3 months ago #1135

  • tyh-chris (Topic Author)
Hi,

We temporarily lost network connectivity when our switch rebooted. The master is fine, but the slave now reboots continually. It boots into XenServer 7.0, then reboots shortly afterwards without bringing the Management Network online. I can't get into the shell from the console either, because it reboots before I get the chance.

Both hosts are running XenServer 7.0 and have HA-Lizard 2.1.0 installed.

iscsi-cfg status shows iSCSI running on the master, and VMs are fine. ha-cfg status shows ENABLED. I can't run either command on the slave due to the reboots.


Temporary Network Loss, Slave Keeps Rebooting 7 years 3 months ago #1136

  • tyh-chris (Topic Author)
I just managed to get the command ha-cfg status executed on the slave before it rebooted. I saw the word "TIMEOUT" in red letters, then it rebooted again.

I've now powered down the slave.


Temporary Network Loss, Slave Keeps Rebooting 7 years 3 months ago #1138

  • Salvatore Costantino
The slave will reboot only if it self-fences, which it would do if switch connectivity was lost for more than ~30 seconds. It should reboot only once.

It's hard to tell if ha-lizard is causing the reboots at this point. If you could manage to get back to the shell before a reboot, enter this:

> /etc/ha-lizard/state/fenced_slave && sync
chkconfig ha-lizard-watchdog off
chkconfig ha-lizard off

It may reboot once more and then stop if ha-lizard is the problem.
If it doesn't stop the reboot loop after this, something else is causing the reboots.
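
The steps above can be sketched as a single function with each step commented. This is only an illustration: the function name `stop_fence_loop` and the `STATE_DIR` override are hypothetical and not part of HA-Lizard; the marker path and service names are the ones given in this post.

```shell
#!/bin/sh
# Sketch only: wraps the three commands from the post in one function.
# stop_fence_loop and STATE_DIR are hypothetical names, not part of HA-Lizard.
stop_fence_loop() {
    state_dir="${STATE_DIR:-/etc/ha-lizard/state}"

    # Create (or truncate) the empty marker file, then flush pending writes.
    : > "$state_dir/fenced_slave" && sync || return 1

    # Keep both services from starting at the next boot, if chkconfig exists.
    if command -v chkconfig >/dev/null 2>&1; then
        chkconfig ha-lizard-watchdog off
        chkconfig ha-lizard off
    fi
}

# On the slave, from the dom0 shell:
#   stop_fence_loop
```

Wrapping the commands this way means one short line is enough at the prompt, which may help when there is little time before the next reboot.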


Temporary Network Loss, Slave Keeps Rebooting 7 years 3 months ago #1140

  • tyh-chris (Topic Author)
Thanks for the help.

Typing those commands did stop the reboot cycle, and I would say that switch connectivity was indeed lost for more than 30 seconds.

I have a couple of things I need to resolve now:

1) I need to get the slave working with the master again with ha-lizard. Typing "ha-cfg status" tells me that the daemon and Watchdog are not running as expected, but it also has a yellow notice telling me I can recover the fenced host. I don't want to enable or run anything until I have your input.

2) The local console on the problematic slave doesn't appear to have xapi; I can't see the management interface, I can't view VMs in the pool, etc. Bizarrely, though, the slave is connected to the pool and can be managed in XenCenter. I can also enter the command shell from the broken interface, run "xsconsole" and this instance has xapi.


Temporary Network Loss, Slave Keeps Rebooting 7 years 3 months ago #1144

  • Salvatore Costantino
If you are getting the yellow warning, then it should be OK to start the services again. The warning is driven by a file that HA-Lizard places on the disk to prevent the host from constantly rebooting. The file is placed on the disk only when the host self-fences (as in your case).
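
Assuming that re-enabling simply mirrors the commands used earlier to disable the services, starting them again might look like the sketch below. The `chkconfig ... on` and `service ... start` invocations are an assumption based on standard init-script usage, not something stated in this thread. `run`, `resume_ha_lizard` and `DRY_RUN` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: re-enable and start both services, then show status.
resume_ha_lizard() {
    for svc in ha-lizard ha-lizard-watchdog; do
        run chkconfig "$svc" on || return 1    # start at boot again (assumed)
        run service "$svc" start || return 1   # start now (assumed)
    done
    run ha-cfg status                          # confirm daemon and watchdog state
}

# Preview first, then run for real:
#   DRY_RUN=1 resume_ha_lizard
#   resume_ha_lizard
```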

Before self-fencing, any pending disk writes are flushed and synced, meaning anything that should be written to the disk is flushed from cache and physically written. It seems that, in your case, the sync did not really happen, which caused the host to reboot constantly. There is likely some form of disk cache (outside of dom0) which is preventing disk writes from being synced before the reboot. Addressing that should solve the rebooting problem in the future.
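
One way to test this explanation after a self-fence reboot is to check whether the marker file actually reached the disk. The sketch below assumes the file in question is the `/etc/ha-lizard/state/fenced_slave` path used earlier in the thread; `fence_marker_present` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: report whether the self-fence marker file exists.
# The default path is assumed from the earlier post in this thread.
fence_marker_present() {
    marker="${1:-/etc/ha-lizard/state/fenced_slave}"
    if [ -e "$marker" ]; then
        echo "marker present: $marker"
    else
        echo "marker missing: $marker"
        return 1
    fi
}

# On the slave, right after it comes back up:
#   fence_marker_present
```

If the marker is missing after a self-fence, that would be consistent with the write never having left a cache below dom0.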

Regarding the other issue with xapi not responding: that could be a network issue on the slave or something else outside of HA-Lizard. At a minimum, try restarting the toolstack on the slave.
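
For reference, a toolstack restart on XenServer is normally done with `xe-toolstack-restart` from the dom0 shell. The guard function around it below is only a sketch, and `restart_toolstack` is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: restart the toolstack if the command is available on this host.
restart_toolstack() {
    if ! command -v xe-toolstack-restart >/dev/null 2>&1; then
        echo "xe-toolstack-restart not found; run this in dom0 on the slave" >&2
        return 127
    fi
    xe-toolstack-restart
}

# On the slave:
#   restart_toolstack
```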


Temporary Network Loss, Slave Keeps Rebooting 7 years 3 months ago #1148

  • tyh-chris
  • (Topic Author)
Thanks for the reply.

Both hosts have an HP P410i Smart Array controller with battery-backed write cache. As far as I know, this cannot be disabled. Do you think this is the cause of the problem?

Regarding the XAPI problem, I did try a toolstack restart, but sadly this didn't change anything.

I'm concerned that a power cut in future could create a similar situation. Since the issue appears to have been caused by the loss of the Management network (each host's management interface is connected to the switch that went down), perhaps I could create a bond for Management: two ports on each host, with one pair connected directly to each other (like the DRBD network) and the other pair connected to a switch for external access. If the switch goes down for any reason, the hosts could hopefully still see each other. Would that be feasible?
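
For illustration only, this is roughly what creating such a bond with the `xe` CLI would involve. Whether a bond that mixes a direct link and a switched link behaves as hoped is not answered in this thread. The `active-backup` mode is an assumption, `print_bond_cmd` is a hypothetical helper, and the UUID arguments are placeholders. The helper only prints the command and does not run it.

```shell
#!/bin/sh
# Sketch only: print (do not run) an xe command for an active-backup bond.
# Arguments: network UUID, then the UUIDs of the two PIFs on one host.
print_bond_cmd() {
    if [ "$#" -ne 3 ]; then
        echo "usage: print_bond_cmd NETWORK_UUID PIF_UUID_1 PIF_UUID_2" >&2
        return 2
    fi
    echo "xe bond-create network-uuid=$1 pif-uuids=$2,$3 mode=active-backup"
}

# Example with placeholder values:
#   print_bond_cmd NET_UUID PIF_A PIF_B
```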

