
TOPIC:

Upgrade 8.1 XCP-NG to 8.2 with halizard installed 11 months 1 week ago #2864

  • Christoph Christ
  • Topic Author
  • Offline
  • Posts: 9
Hello guys,

This weekend I upgraded my XCP-NG cluster from the already unsupported 8.1 to 8.2, with a reinstallation of iscsi-ha and ha-lizard.

I had a minor issue with a missing dependency for drbd84 and the missing sysvinit package, but after that the reinstallation worked flawlessly. All fine, all servers up. Wonderful documentation, very good, straightforward steps and how-to - thumbs up!

But the problems came afterwards :-) After I had Xen Orchestra connected again, it asked me to install 45 patches. Very nice, that's what we want that software for - so I did it. Afterwards the pool asked me to reboot the machines to apply the patches.

I did it with XOA - everything was migrated to the secondary host, which was still listed as slave. Then the reboot took place. Now the second machine had been made pool master - and it remained pool master, whatever I did. But even worse: the iscsi-ha DRBD mirror had lost synchronization, which resulted in a split-brain situation, with half the VMs running on the rebooted server (no longer pool master) and half on the new master with the split brain. Two of them were unresponsive with the message "VDI not available". Oh bugger...

Because ha-lizard kept restarting the bad VMs all the time, I could not do anything about it. First I had to deactivate the HA feature with ha-cfg ha-disable. Then the two bad VMs stayed shut down. Then I started "recovery reboot" on the Advanced tab of Xen Orchestra - which, by the way, is not even available in the XCP-NG Center Windows software (there I could not revive the dead VMs). This recovery restart resulted in the UEFI shell, showing me the available partitions. Then I did a normal reboot, and the bad VMs came up normally and are working again.

Then I decided to have all working machines on one server and do the split-brain recovery. I started all VMs on the host that had been rebooted and ran the following on my "primary" - i.e. the server that had been designated pool master:

drbdadm connect --discard-my-data iscsi1

and on the survivor:

drbdadm connect iscsi1

which resulted in a resync of both XCP-NG servers.
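For reference, a minimal dry-run sketch of that recovery sequence. The resource name iscsi1 is from this thread; the script only prints the commands instead of executing them, and note that --discard-my-data throws away the victim node's changes since the split, so choose the victim carefully:

```shell
#!/bin/sh
# Dry-run sketch of the DRBD split-brain recovery described above.
# Which node is the "victim" (its data gets discarded) depends on your
# situation -- everything written on it since the split is lost.

RESOURCE="iscsi1"

# On the victim node (here: the old primary), reconnect and discard local changes:
VICTIM_CMD="drbdadm connect --discard-my-data $RESOURCE"

# On the survivor node, simply reconnect:
SURVIVOR_CMD="drbdadm connect $RESOURCE"

echo "victim:   $VICTIM_CMD"
echo "survivor: $SURVIVOR_CMD"
```

After both nodes are connected again, the resync can be watched in /proc/drbd.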

Then I had to set the primary (which was actually acting as the slave) back to slave, and the master to primary. Then I could deactivate manual mode, and iscsi-ha started working again.

But I still have the situation that the second server is still the pool master, although it has been rebooted several times and thus should have lost that role.

So I connected to the "slave" and used xsconsole -> Resource Pool Configuration to designate a new pool master.

That also worked, and now I have a working cluster again.

But one question remains: do you know of a proper way of migrating or evacuating VMs to the pool slave without "Migrate"? With the iscsi-ha solution the data is mirrored, so failing over should be a matter of seconds - not a VM migration that copies data which is already there, possibly leading to a split-brain situation if the connection is not fast enough (never ever set your node to maintenance mode). Should I set the "HA" checkbox on the respective VM to tell XCP that this is an HA VM?

Kind regards
Christoph


Upgrade 8.1 XCP-NG to 8.2 with halizard installed 11 months 1 week ago #2865

Dear Christoph,

Uhhh, hopefully it was a testing cluster and not production. I can imagine how you must have felt during this long journey.

My thoughts and advice for you:

1. Read the last page of the reference design document:
www.halizard.com/images/pdf/iscsi-ha_2-n...Server_7.x_final.pdf

2. Do not turn on HA using XenCenter (XCP-NG Center) or XOA! Never do that!

3. Follow strictly the advice mentioned under 1.

XOA is a wonderful tool for managing and monitoring VMs, but it does not know anything about HA-Lizard.

For applying patches I always follow the aforementioned "Appendix A - Example Maintenance Operations".

I do that in the CLI of each host, and for VMs with huge storage I shut them down instead of live-migrating them and boot them from the second host - it is much faster and perfectly safe.


Last edit: by ajmind.

Upgrade 8.1 XCP-NG to 8.2 with halizard installed 11 months 1 week ago #2866

  • Christoph Christ
  • Topic Author
  • Offline
  • Posts: 9
Of course it was the production cluster. :-)

The update followed exactly this guide, as I wrote below. Therefore it worked flawlessly. As I wrote, the problems started when using the "auto restart" functionality of XOA after applying all patches... and that was the beginning of the frenzy, which has not yet stopped, because some VMs experienced storage defects and need to be reinstalled.

I definitely think the best way to do reboots is:

1 - shut down and restart the VMs on the slave
2 - follow the steps described in your link to keep the downtime as low as possible and to avoid any split-brain scenario, which happened to me every time I rebooted any of my servers... possibly also migrate to the other server (but the data is already there, so "migration" is nonsense IMHO)
3 - never ever use the auto-reboot feature of Xen Orchestra, which does not know how to handle an ha-lizard reboot scenario and crashes the servers and, even worse, the VMs.
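Step 1 can be sketched with the standard xe CLI; the host and VM names below are made-up examples, and the script only prints the commands rather than running them:

```shell
#!/bin/sh
# Dry-run sketch of step 1 above: shut a VM down cleanly and boot it on
# the other host instead of live-migrating it (the mirrored storage is
# already there). xe vm-shutdown / vm-start with on= are standard
# XCP-NG/XenServer CLI calls.

SLAVE_HOST="xcp-host2"   # assumption: name label of the pool slave
VM="my-vm"               # assumption: name label of the VM

SHUTDOWN_CMD="xe vm-shutdown vm=$VM"
START_CMD="xe vm-start vm=$VM on=$SLAVE_HOST"

echo "$SHUTDOWN_CMD"
echo "$START_CMD"
```

Repeat per VM; a clean shutdown plus a fresh boot on the other host avoids the long memory copy of a live migration.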

KR
Christoph


Upgrade 8.1 XCP-NG to 8.2 with halizard installed 11 months 1 week ago #2867

  • Christoph Christ
  • Topic Author
  • Offline
  • Posts: 9
Yet another question, to clarify another problem for me:

The steps are:
1) deactivate HA (ha-cfg ha-disable)
2) enter "manual mode" on both hosts (iscsi-cfg manual-mode-enable)
3) migrate all VMs to the pool slave
4) demote the pool master's storage to secondary and promote the slave's to primary

My question: isn't the drbd iscsi1 link already deactivated in step 2, so that I cannot do the trick of shutting down and restarting on the slave host?
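The four steps might be sketched like this; ha-cfg and iscsi-cfg are the HA-Lizard tools named in this thread, while using drbdadm on the iscsi1 resource for the demote/promote in step 4 is my assumption. The run helper only prints the commands (dry run):

```shell
#!/bin/sh
# Dry-run sketch of steps 1-4 above.

run() { echo "$@"; }   # swap the echo for "$@" (real execution) when ready

run ha-cfg ha-disable               # 1) disable HA pool-wide
run iscsi-cfg manual-mode-enable    # 2) on BOTH hosts
# 3) migrate/evacuate the VMs to the pool slave (via XOA or xe vm-migrate)
run drbdadm secondary iscsi1        # 4) on the master: demote its storage
run drbdadm primary iscsi1          # 4) on the slave: promote its storage
```

Steps 2 and 4 run on specific hosts, so in practice this is two sessions, not one script.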


Last edit: by Christoph Christ.

Upgrade 8.1 XCP-NG to 8.2 with halizard installed 11 months 1 week ago #2868

  • Salvatore Costantino
  • Offline
  • Posts: 722
Hi Christoph,

When the pool is operating normally, it is assumed that there is always a pool master (HA provides this: should the pool master fail, it will promote a new pool master). The shared pool storage is therefore only exposed on the pool master.

In response to your question regarding step 2 of your post: the DRBD device is not deactivated at all when entering manual mode. Entering manual mode on both hosts simply stops the above logic, allowing you to manually choose where the storage is exposed. For example, to reboot the master, enter manual mode on both hosts, then demote the master to secondary and promote the slave to primary. This ensures that the VMs migrated to the slave continue to operate while the master is busy rebooting.
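Putting that description together, a master reboot might look like the following dry-run sketch; the iscsi1 resource name, and the iscsi-cfg manual-mode-disable / ha-cfg ha-enable spellings for the return path, are assumptions to check against the reference design document:

```shell
#!/bin/sh
# Dry-run sketch of rebooting the pool master under manual mode
# (commands are printed, not executed).

run() { echo "$@"; }   # dry run: print instead of execute

run iscsi-cfg manual-mode-enable    # on both hosts
run drbdadm secondary iscsi1        # on the master: demote its storage
run drbdadm primary iscsi1          # on the slave: expose the storage there
run reboot                          # reboot the master
# ...after the master is back and DRBD is in sync again:
run iscsi-cfg manual-mode-disable   # on both hosts
run ha-cfg ha-enable                # re-enable HA
```

The key point is that the storage is moved deliberately before the reboot, so nothing races the reboot the way an automated restart would.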

