
TOPIC:

iscsi-ha, recovering from a failed host 8 years 8 months ago #494

Hello:

Recently one of the two hosts in our 2-node pool failed. The pool runs HA-Lizard 1.7.8 on XenServer 6.2. I had to replace the motherboard and reconfigure the NICs before the host would boot and talk to its network interfaces. Once the NICs were set correctly, everything seemed to come back up, except that XenCenter showed the storage volume connected to the HA-Lizard iSCSI target as offline and "unplugged". When I clicked "Repair" in XenCenter, it failed with an "unknown error".

I verified that the DRBD volume was good and syncing, and that the iscsi-ha service was running. HA-Lizard was also running and enabled. The iscsi-ha virtual IP address was pingable, and as a test I could connect to it on TCP port 3260 from both XenServer hosts. Nothing I did would get XenCenter to reconnect the storage. Eventually I shut down all VMs and both XenServer hosts, booted them back up together, and the volume then came back online.
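
For anyone repeating the port 3260 test described above, it can be scripted. This is only a minimal sketch in Python 3. The function name `iscsi_port_open` and the address in the usage comment are placeholders and not part of HA-Lizard or iscsi-ha. The dom0 of older XenServer releases may ship a Python too old for this, so it may need to run from another machine on the storage network.

```python
import socket


def iscsi_port_open(host, port=3260, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example use (10.10.10.3 is a hypothetical virtual IP, replace with your own):
#   print(iscsi_port_open("10.10.10.3"))
```

A successful connection only shows that something is listening on the virtual IP. As the post above describes, the SR can still refuse to plug while the port is reachable.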

My question is, what is best practice for bringing up the storage volume(s), when a host fails like this? Ideally I would think we should be able to bring up the failed host, get DRBD talking, and have XenCenter somehow re-connect without having to reboot the good XenServer host.

Thanks much for the help.


iscsi-ha, recovering from a failed host 8 years 8 months ago #495

  • Salvatore Costantino
It is hard to tell exactly what caused your issue. We have occasionally experienced the same thing with XenServer when using an external iSCSI SR (in a non-HA-Lizard environment).

Regarding best practice for re-introducing a host: there is really nothing out of the ordinary required.


iscsi-ha, recovering from a failed host 8 years 8 months ago #496

OK, thanks for the reply. Is there a way that you know of to recover iscsi-ha when it gets into that weird state? I did try restarting that service, but that did not help. Or does it seem like more of a XenServer issue than an iscsi-ha one? I've tried everything I have come across, and as I mentioned, none of it worked until I did a complete reboot of the entire pool.

Thanks for your help.


iscsi-ha, recovering from a failed host 8 years 8 months ago #497

I had the same issue happen yet again. This time I was not able to get the storage to come back online until several reboots later. I saw another post on this forum about this very same issue and went through the steps. I found that the file /etc/tgt/targets.conf was not exactly the same on both hosts.

On one host, I had the IQN as:
<target iqn.2014-11.com.ourdomain.com:server3>

And on the other host as:
<target iqn.2014-11.com.ourdomain.com:server4>

I noticed in XenCenter, it has this as the IQN: iqn.2014-11.com.ourdomain.com:server3

So I changed targets.conf on the second host so that it's exactly the same, and shows what XenCenter shows.

Would this make any difference? After matching it up, I rebooted again and the storage still would not come back online until I restarted the tgtd service on the iSCSI master host (this was one of the steps in the other post). After that, it connected when I clicked the "Repair" button in XenCenter.
If you can confirm what the targets.conf file should look like, that would be great, as would any further ideas on what I can check. I have gone through our setup against your documentation and I think everything else is set correctly. We've been running this pool for about 8 months, and the only time there's an issue is when it has to be rebooted or shut down for any reason (extended power outage, etc.). Thanks for your help.


Last edit: by cn480.

iscsi-ha, recovering from a failed host 8 years 8 months ago #499

  • mb
Yes. targets.conf must be identical for the storage to work correctly on both hosts.
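
A quick way to spot the kind of mismatch described earlier in this thread is to compare the `<target ...>` lines from each host's /etc/tgt/targets.conf. The following is a rough sketch in Python, not an HA-Lizard tool. The helper names are made up for illustration, and it assumes the contents of each file have already been copied into strings. The `backing-store` device in the sample is a placeholder.

```python
import re

# Matches an opening "<target IQN>" line; lines commented out with '#' do not match.
TARGET_RE = re.compile(r"^\s*<target\s+([^>\s]+)\s*>", re.MULTILINE)


def target_iqns(conf_text):
    """Return the set of IQNs declared in a targets.conf string."""
    return set(TARGET_RE.findall(conf_text))


def compare_targets(conf_a, conf_b):
    """Return (IQNs only in conf_a, IQNs only in conf_b), each sorted."""
    a, b = target_iqns(conf_a), target_iqns(conf_b)
    return sorted(a - b), sorted(b - a)


# Sample contents modelled on the two IQNs quoted in this thread.
host_a = """<target iqn.2014-11.com.ourdomain.com:server3>
    backing-store /dev/drbd1   # placeholder device, an assumption
</target>
"""
host_b = host_a.replace("server3", "server4")

only_a, only_b = compare_targets(host_a, host_b)
print("only on host A:", only_a)
print("only on host B:", only_b)
```

This checks only the IQNs. Since the whole file is expected to be identical, a plain line-by-line diff of the two files is the stricter test.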


iscsi-ha, recovering from a failed host 8 years 8 months ago #500

mb wrote: Yes. targets.conf must be identical for the storage to work correctly on both hosts.


OK great, thank you. I will monitor this going forward. I've seen a couple of other references in this forum to similar issues where storage goes offline, and restarting tgtd may be the solution, which is good enough for me for now. Thanks.
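
For reference, the workaround that came out of this thread (restart tgtd on the iSCSI master host, then repair the SR) could be sketched roughly as below. This is an assumption-laden outline in Python, not a tested procedure. It assumes XenCenter's "Repair" roughly corresponds to plugging the SR's unplugged PBDs with `xe pbd-plug`, and that the PBD UUIDs were looked up beforehand (for example with `xe pbd-list sr-uuid=<SR UUID>`). The function names and the UUID in the example are placeholders.

```python
import subprocess


def recovery_commands(pbd_uuids):
    """Build the command list: restart tgtd first, then plug each PBD."""
    cmds = [["service", "tgtd", "restart"]]  # intended for the iSCSI master host
    cmds += [["xe", "pbd-plug", "uuid=" + uuid] for uuid in pbd_uuids]
    return cmds


def run(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)


# Dry run with a placeholder PBD UUID; nothing is executed.
run(recovery_commands(["<PBD UUID>"]))
```

The dry-run default is deliberate, so the commands can be reviewed before anything is restarted on a live pool.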

