
TOPIC:

High load and system operations getting frozen - 4 years 7 months ago #1873

Sherbin George (Topic Author)
Hi,

We have run into a situation where system operations are freezing on our HA-Lizard pool.

The situation started when we moved a VM onto the HA-Lizard pool; it got stuck while attempting to start it. At one point I thought an xe-toolstack-restart would fix this, but there was no luck.

Afterwards, I attempted a snapshot of a running VM within the same pool, and that made the VM freeze and become unresponsive. Even a forced reboot appeared to run forever.

At last, we rebooted the slave host to see if that would fix the issue, but it didn't. We haven't touched the master yet, but its load average is staying around 10.00.

I am passing along the information I've collected from both the master and the slave. Can a master reboot bring things back to normal here?

**********************
Master

[root@XEN05 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.4.5 (api:1/proto:86-101)
srcversion: D496E56BBEBA8B1339BB34A
m:res cs ro ds p mounted fstype
1:iscsi1 StandAlone Primary/Unknown UpToDate/DUnknown r


[root@XEN05 ~]# ha-cfg status
| ha-lizard Version: 2.1.4 |
| Operating Mode: Mode [ 2 ] Managing Individual VMs in Pool |
| Host Role: master |
| Pool UUID: 33b3ff3e-c123-2f54-9c1e-bbe748d7db51 |
| Host UUID: f2b699a9-de94-4b8c-8443-d53672b8a49d |
| Master UUID: f2b699a9-de94-4b8c-8443-d53672b8a49d |
| Daemon Status: ha-lizard is running [ OK ] |
| Watchdog Status: ha-lizard-watchdog is running [ OK ] |
| HA Enabled: false |


[root@XEN05 ~]# iscsi-cfg status
| iSCSI-HA Version IHA_2.1.5 |
| Sat Aug 24 14:38:52 EDT 2019 |

| iSCSI-HA Status: Running 8644 |
| Last Updated: Sat Aug 24 14:38:50 EDT 2019 |
| HOST ROLE: MASTER |
| DRBD ROLE: iscsi1=Primary |
| DRBD CONNECTION: iscsi1 in StandAlone state |
| ISCSI TARGET: Running [expected running] |
| VIRTUAL IP: 10.10.11.3 is local |
Control + C to exit


| DRBD Status |

| version: 8.4.5 (api:1/proto:86-101) |
| srcversion: D496E56BBEBA8B1339BB34A |
| 1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r
|
| ns:0 nr:0 dw:334633404 dr:1893486048 al:33628704 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:3117818488 |


Slave

[root@XEN06 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.4.5 (api:1/proto:86-101)
srcversion: D496E56BBEBA8B1339BB34A
m:res cs ro ds p mounted fstype
1:iscsi1 WFConnection Secondary/Unknown UpToDate/DUnknown C


[root@XEN06 ~]# ha-cfg status
| ha-lizard Version: 2.1.4 |
| Operating Mode: Mode [ 2 ] Managing Individual VMs in Pool |
| Host Role: slave |
| Pool UUID: 33b3ff3e-c123-2f54-9c1e-bbe748d7db51 |
| Host UUID: d481ed26-3d0c-406a-bd92-d61d63d5ca3b |
| Master UUID: f2b699a9-de94-4b8c-8443-d53672b8a49d |
| Daemon Status: ha-lizard is running [ OK ] |
| Watchdog Status: ha-lizard-watchdog is running [ OK ] |
| HA Enabled: false |


[root@XEN06 ~]# iscsi-cfg status

| iSCSI-HA Version IHA_2.1.5 |
| Sat Aug 24 14:40:13 EDT 2019 |

| iSCSI-HA Status: Running 5403 |
| Last Updated: Sat Aug 24 14:40:07 EDT 2019 |
| HOST ROLE: SLAVE |
| VIRTUAL IP: 10.10.11.3 is not local |
| ISCSI TARGET: Stopped [expected stopped] |
| DRBD ROLE: iscsi1=Secondary |
| DRBD CONNECTION: iscsi1 in WFConnection state |
Control + C to exit


| DRBD Status |

| version: 8.4.5 (api:1/proto:86-101) |
| srcversion: D496E56BBEBA8B1339BB34A |
| 1: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r
|
| ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:34692 |


High load and system operations getting frozen - 4 years 7 months ago #1874

Sherbin, I ran into a similar problem and later discovered the root cause was the xcp-emu-manager package version on my XCP-ng 7.6 hosts. There's a problem with version 0.0.3 or 0.0.5 (I can't remember exactly), but I know you must update it to avoid getting VMs stuck during a live migration.

The latest version of xcp-emu-manager is 1.1.2. I think you should give it a try.

Another situation I've seen was a VM getting stuck in a wait state after live-migrating it to the slave host (I opened the VM console in XOA and watched the CPU usage in nmon). Moving the VM back to the master host brought it back to normal operation.

This very same issue was solved after updating all hosts entirely with a "yum update".
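If it helps, here is a rough sketch of what I mean, assuming an XCP-ng host with its standard yum repositories configured:

~~~
# check which version of the package is currently installed (if any)
rpm -q xcp-emu-manager

# update just that package
yum update xcp-emu-manager

# or, as mentioned above, bring the whole host up to date
yum update
~~~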

Regards


Last edit: by Fabio Brizzolla.

High load and system operations getting frozen - 4 years 7 months ago #1875

Sherbin George (Topic Author)
Thank you for your reply, Fabio.

We are using 7.1 on both of the hosts where the problem exists, and I don't think the xcp-emu-manager package is installed there. We have another pool with 7.5 installed, and that one does have xcp-emu-manager.

~~~
[root@XEN05 ~]# rpm -qa | grep -i emu
emulex-be2net-11.1.196.0-1.x86_64
qemu-xen-2.2.1-4.36786.x86_64
emulex-lpfc-11.1.210.1-1.x86_64
~~~
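Querying the package by name should confirm the same thing, since rpm reports a package as not installed when it is absent:

~~~
# explicit check by package name on the 7.1 hosts
rpm -q xcp-emu-manager
~~~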


High load and system operations getting frozen - 4 years 7 months ago #1876

Salvatore Costantino
It looks like your replication link is not working (as shown in the status output of both hosts, XEN05 and XEN06). This could prevent any VMs from running on the slave if the link issue is IP related.

You can check whether the issue is IP related by pinging the floating replication IP and the peer's replication IP from each host. If there is a connectivity issue AND you are using a bonded active/active link for replication, try momentarily unplugging one of the replication Ethernet ports. This is known to clear a Linux ARP issue that sometimes appears when a Linux bridge is stacked on top of a Linux bonded link.
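For example (10.10.11.3 is the floating replication IP from your status output; the peer addresses below are placeholders, so substitute whatever replication IPs are actually configured on XEN05 and XEN06):

~~~
# run on XEN05
ping -c 3 10.10.11.3     # floating replication IP
ping -c 3 10.10.11.2     # XEN06's replication IP (placeholder address)

# run on XEN06
ping -c 3 10.10.11.3     # floating replication IP
ping -c 3 10.10.11.1     # XEN05's replication IP (placeholder address)
~~~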

If the above is not the issue, then you could have a split-brain situation. You can recover from this with a tool at /etc/iscsi-ha/scripts/drbd-sb-tool. Keep in mind that this tool will perform a complete sync from the surviving node to the peer, which could take several hours depending on the link speed and the size of the storage. The system will continue to operate while this is happening in the background.
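One way to tell whether DRBD actually declared a split brain is to check the kernel log on each host; DRBD logs a "Split-Brain detected" message when it refuses to reconnect (the log path below is the usual location in a CentOS-based dom0):

~~~
# look for the DRBD split-brain message on each host
dmesg | grep -i "split-brain"
grep -i "split-brain" /var/log/kern.log
~~~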


High load and system operations getting frozen - 4 years 7 months ago #1877

Sherbin George (Topic Author)
Hi Salvatore,

The floating IP is pingable from both hosts:

[root@XEN05 ~]# ping 10.10.11.3
PING 10.10.11.3 (10.10.11.3) 56(84) bytes of data.
64 bytes from 10.10.11.3: icmp_seq=1 ttl=64 time=0.028 ms
64 bytes from 10.10.11.3: icmp_seq=2 ttl=64 time=0.043 ms

[root@XEN06 ~]# ping 10.10.11.3
PING 10.10.11.3 (10.10.11.3) 56(84) bytes of data.
64 bytes from 10.10.11.3: icmp_seq=1 ttl=64 time=0.162 ms
64 bytes from 10.10.11.3: icmp_seq=2 ttl=64 time=0.182 ms

Also, in the current situation 4 VMs are running on the slave and 1 VM on the master (the load average is still staying around 10.00). But we haven't performed any system operations, fearing it might break things again.

Would a reboot of the master bring things back into sync, rather than using the drbd-sb-tool, while keeping all VMs on the slave? Or do you think rebooting the master will affect the VMs running on the slave too?


High load and system operations getting frozen - 4 years 7 months ago #1878

Salvatore Costantino
drbd-sb-tool is only to recover from a DRBD split brain (which is not common). It is not necessary when executing maintenance operations such as a reboot. In your case, after the reboot, the hosts should resync within a few seconds.

To reboot the master, you must put the storage into manual mode so that it can be exposed on the slave while the master reboots. Here are the steps. Before performing the steps below, migrate all VMs to the slave.
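The migration itself can be done with the xe CLI, something along these lines (the UUIDs and host name are placeholders for your actual VMs and slave host):

~~~
# list the VMs currently resident on the master
xe vm-list resident-on=<master-host-uuid> is-control-domain=false

# live-migrate each one to the slave
xe vm-migrate vm=<vm-uuid-or-name> host=<slave-host-name> live=true
~~~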

1) Disable HA - this can be done from either host and only needs to be done on a single host
ha-cfg ha-disable

2) Put each host into manual mode. This needs to be done on both hosts
iscsi-cfg manual-mode-enable

The next 2 steps must be performed with minimal delay in between.

3) Put the master into secondary mode
iscsi-cfg become-secondary

4) Put the slave into primary mode
iscsi-cfg become-primary

It is now safe to reboot the master. Once the master has rebooted, perform the following to put the pool back into normal operational mode.

The next 2 steps must be performed with minimal delay in between.

5) Put the slave into secondary mode
iscsi-cfg become-secondary

6) Put the master into primary mode
iscsi-cfg become-primary

7) On both hosts - exit manual mode
iscsi-cfg manual-mode-disable

8) Re-enable HA
ha-cfg ha-enable
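
Once everything is back in normal mode, a quick sanity check (not part of the procedure above, just what I would look at) is the DRBD and iSCSI-HA status on either host; the resource should come back to a Connected, UpToDate/UpToDate state within a short time:

~~~
# confirm DRBD has reconnected and resynced
service drbd status
cat /proc/drbd      # expect cs:Connected and ds:UpToDate/UpToDate

# confirm iSCSI-HA reports the expected roles and a connected DRBD link
iscsi-cfg status
~~~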

