
Testing HA lizard 6 years 9 months ago #1360

  • Mauritz (Topic Author)
  • Offline
  • Posts: 43
I've got HA-Lizard installed according to the YouTube video. I have made sure ha-lizard is enabled as well as HA. Unfortunately there is no video showing a production case of server failure for HA-Lizard, so I figured we need to test it in production to see the results.

My expectations were (and may as well be wrong, so please correct me):

- Master goes down (cold boot). Within XenCenter I should still have access to the pool, the second server should automatically become master, and all VMs that were running on the old master should be booted up on the slave (now the new master).

- Once the failed server comes back up, it should automatically become a slave and should not hinder the overall operation of the pool.

- There should be no hindrance to the iSCSI storage, and in general the failure should be mitigated without human intervention.

---
Now, it's important to note that I am very new to most of the technologies presented within the HA suite and that my installation was done by trial and error, so there very well may be something wrong with my config which is leading to the results below:

I moved a test VM to the pool, started it, and then cold-booted the master server via iDRAC to test the HA feature. After about 30 seconds the pool disappeared entirely from XenCenter (note that I restarted the master), and I was unable to connect to the pool until the master was back online (and booted into XenServer).

Once the master came back online I was able to reconnect XenCenter. It had made my slave the master (as expected); however, the shared storage is broken, indicating that the slave is unplugged, and the test VM is nowhere to be found (probably as a result of the broken iSCSI adapter). It's important to note that the master (which in my opinion should now be running the VM) is connected to the iSCSI adapter, so it should be able to run the VM.

I then restarted the new master and the same process as above happened again: I was unable to connect to the pool via XenCenter, and only once the new master came back online could I connect again. The storage is still missing and indicated as unplugged by the slave, which suggests it's not a single arbitrary server being unable to reach the shared storage but specifically the slave.

As of this point the iSCSI storage is still missing and the VM isn't running (which would be somewhat of a disaster in production).
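For anyone reproducing this: the pool and SR state after such a failover can be checked from dom0 with the standard xe CLI rather than waiting on XenCenter. A minimal sketch (the SR type filter assumes an lvmoiscsi SR; `<SR-UUID>` is a placeholder for whatever `xe sr-list` reports):

```shell
# On the surviving host, confirm which host the pool now considers master
xe pool-list params=master --minimal
xe host-list params=uuid,name-label,enabled

# Find the shared SR, then check whether its PBDs are actually plugged;
# "currently-attached: false" matches the "unplugged" state XenCenter shows
xe sr-list type=lvmoiscsi params=uuid,name-label
xe pbd-list sr-uuid=<SR-UUID> params=host-uuid,currently-attached
```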

---

This leads me to believe that:

1. My installation may be broken
2. There are additional steps that need to be completed post-installation (not covered in the YouTube video)
3. I'm missing something :)

I have not made any post-installation changes (so native HA is still disabled in XenCenter), and I have not nominated the VM as HA-enabled. Are these perhaps steps I'm missing?
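For what it's worth, the pool-level toggle can also be inspected from the console. A hedged sketch using the ha-lizard CLI (`ha-cfg status` appears later in this thread; the `get` subcommand is from the HA-Lizard documentation and I have not verified it here):

```shell
# Show ha-lizard's view of the pool; if pool HA is disabled, the
# prompt at the end of "ha-cfg status" offers to enable it
ha-cfg status

# List the current ha-lizard parameters (operating mode, managed VMs, etc.)
ha-cfg get
```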


Testing HA lizard 6 years 9 months ago #1361

  • Mauritz
I've gone through the documentation and noted that I had to enable HA within XenCenter. I redid the installation of HA-Lizard, enabled HA, and made sure everything was in order.

I did a restart of the master server, and now I'm unable to connect to the pool at all. When I try to connect, XenCenter seems to think the old host is still the master, so the connection fails.

On the console I can also see that the iSCSI storage has disappeared from the host server, and the VM is nowhere to be found (it is currently off).


Testing HA lizard 6 years 9 months ago #1362

  • Mauritz
Decided to make a trip out to the DC today and redid both hosts from start to finish. I followed the installation guide and everything worked the first time around.

I also enabled HA in XenCenter for the two hosts, as that appears to be a step I missed with the previous installation.

The drives are syncing now, so I will let that finish before I do any further testing.


Testing HA lizard 6 years 9 months ago #1363

  • Mauritz
I've gone back to the DC and reinstalled XenServer (7.2) on both hosts along with HA-Lizard. The installation went perfectly, no issues. iSCSI also worked the first time around, and I waited 25 hours for the sync to complete.

I have also enabled HA in XenCenter, as that is how I understood the documentation. I am honestly not sure if this is a prerequisite, but since I did not do this in the previous installation and the entire pool would go down if a server was rebooted, it makes logical sense.

Some tests I figured would be worth completing before considering for production are:

1. Testing maintenance mode (as per documentation)
2. Restarting the slave server (cold boot)
3. Restarting the master server (cold boot)
4. Shutdown of both servers, starting master first (to test the scenario)
5. Shutdown of both servers, starting slave first (to test the scenario)
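To make the results comparable, it may help to capture the same snapshot of cluster state before and after each of these scenarios. A sketch using the tools that appear elsewhere in this thread plus the stock xe CLI:

```shell
# Run on whichever host is reachable, before and after each test
xe pool-list params=master --minimal                        # current pool master
xe vm-list power-state=running params=name-label --minimal  # VMs actually up
drbd-overview        # replication state on this host
cat /proc/drbd       # detailed DRBD connection/disk state
```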


Testing HA lizard 6 years 9 months ago #1364

  • Mauritz
1. Testing maintenance mode (as per documentation):

Server names (for simplicity)
Default Master = server1
Default Slave = server2

I have noted a couple of other posts regarding this particular issue and could not figure out whether it has been addressed within HA-Lizard yet. I am going to assume it's related to those posts.

I've tested putting the master (server1) into maintenance mode so that I can enable multipathing. I followed the steps in the documentation exactly (changing HA to manual mode, etc.). The only additional step I had to take was to also disable HA in XenCenter, as I could not put server1 into maintenance mode while it was the pool master.

Upon restart of server1, I was able to take it out of maintenance mode, but the iSCSI repository could not connect back to server1. I've left it for over 6 hours hoping it would fix itself, but to no avail. Server2 (the new master) is connected to the SR, but that's where it ends.

When I try to repair the broken SR (from within XenCenter) the process does not complete (it hangs). I can, however, ping the floating IP from both servers, so there should be no obvious network connectivity issue.
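For the record, the master hand-off that XenCenter refuses can also be done from the CLI before entering maintenance mode. A hedged sketch with placeholder UUIDs (this is the stock XenServer procedure, not an HA-Lizard-specific one; HA should still be set to manual first, as the documentation says):

```shell
# On the current master (server1): hand the master role to server2
xe pool-designate-new-master host-uuid=<server2-UUID>

# Then disable and evacuate server1 so it can be worked on safely
xe host-disable uuid=<server1-UUID>
xe host-evacuate uuid=<server1-UUID>
```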

Some logs (new master / old slave first / server2):

ha-cfg status
| ha-lizard Version: 2.1.3 |
| Operating Mode: Mode [ 2 ] Managing All VMs in Pool |
| Host Role: master |
| Pool UUID: 39e2093b-845b-5273-5ecc-0e2499acad05 |
| Host UUID: 90053162-6b09-4f32-8cdf-305125d49f21 |
| Master UUID: 90053162-6b09-4f32-8cdf-305125d49f21 |
| Daemon Status: ha-lizard is running [ OK ] |
| Watchdog Status: ha-lizard-watchdog is running [ OK ] |
| HA Enabled: false |
Pool HA Status: DISABLED
ha-lizard is disabled. Enable? <yes or Enter to quit>

iscsi-cfg status

| iSCSI-HA Version IHA_2.1.4_29881 |
| Tue Jul 18 08:40:55 SAST 2017 |
[status detail not displayed]
Control + C to exit


| DRBD Status |

| version: 8.4.5 (api:1/proto:86-101) |
| srcversion: 2A6B2FA4F0703B49CA9C727 |
| 1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----- |
| ns:0 nr:0 dw:0 dr:152 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0 |

drbd-overview
1:iscsi1/0 StandAlone Primary/Unknown UpToDate/DUnknown


OLD MASTER / NEW SLAVE / server1:


ha-cfg status
| ha-lizard Version: 2.1.3 |
| Operating Mode: Mode [ 2 ] Managing All VMs in Pool |
| Host Role: slave |
| Pool UUID: 39e2093b-845b-5273-5ecc-0e2499acad05 |
| Host UUID: b5cdce68-dce6-4882-a781-dd43b6fda13d |
| Master UUID: 90053162-6b09-4f32-8cdf-305125d49f21 |
| Daemon Status: ha-lizard is running [ OK ] |
| Watchdog Status: ha-lizard-watchdog is running [ OK ] |
| HA Enabled: false |
Pool HA Status: DISABLED
ha-lizard is disabled. Enable? <yes or Enter to quit>

iscsi-cfg status
| iSCSI-HA Version IHA_2.1.4_29881 |
| Tue Jul 18 08:46:13 SAST 2017 |

| iSCSI-HA Status: Running 2489 |
| Last Updated: Tue Jul 18 08:46:06 SAST 2017 |
| HOST ROLE: SLAVE |
| VIRTUAL IP: 10.4.0.3 is not local |
| ISCSI TARGET: Stopped [expected stopped] |
| DRBD CONNECTION: iscsi1 in state |
Control + C to exit


| DRBD Status |

| version: 8.4.5 (api:1/proto:86-101) |
| srcversion: 2A6B2FA4F0703B49CA9C727 |

drbd-overview
1:iscsi1/0 Unconfigured . .

THIS DOES NOT LOOK RIGHT?
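cs:StandAlone with Primary/Unknown on one node and Unconfigured on the other is the classic disconnected / split-brain picture for DRBD. A hedged recovery sketch for DRBD 8.4, assuming the surviving master (server2) holds the good data and the resource is named iscsi1 as in the logs above; note that --discard-my-data throws away local changes on the node it is run on, so be certain server2 really is the good copy:

```shell
# On server1 (the node showing "Unconfigured"): bring the resource up
# and rejoin as the side that discards its own changes
drbdadm up iscsi1
drbdadm secondary iscsi1
drbdadm connect --discard-my-data iscsi1

# On server2 (cs:StandAlone, Primary/UpToDate): re-initiate the connection
drbdadm connect iscsi1

# Verify on both nodes: state should move through SyncSource/SyncTarget
# and settle at Connected
cat /proc/drbd
drbd-overview
```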

----

Any ideas?


Last edit: by Mauritz.

Testing HA lizard 6 years 9 months ago #1365

  • Mauritz
I have taken the following steps (as outlined in the post at www.halizard.com/forum/software-support/...-unplugged-on-reboot):

On the server which is currently NOT connected to the iSCSI storage:

chkconfig iscsi-ha-watchdog off && chkconfig iscsi-ha off
(rebooted and waited about 10 minutes after the server came back online)
systemctl start iscsi-ha

No difference. The SR remains disconnected, and attempting a repair from within XenCenter appears to stall. I am still able to ping the floating IP and the other DRBD IPs.
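One more thing that may be worth trying before another trip to the DC: once DRBD is back in a Connected state and the iSCSI target is running on the master, the stalled XenCenter repair can be bypassed by re-plugging the PBDs directly from the CLI. A sketch with placeholder UUIDs:

```shell
# Identify the broken SR and its per-host PBDs
xe sr-list type=lvmoiscsi params=uuid,name-label
xe pbd-list sr-uuid=<SR-UUID> params=uuid,host-uuid,currently-attached

# Re-plug any PBD reported as currently-attached: false
xe pbd-plug uuid=<PBD-UUID>
```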

