
TOPIC:

HA-Lizard supported PV not available 5 months 4 weeks ago #3025

Dear Salvatore,

Here is the output from the master with iscsi-cfg manual-mode-enable
last login: Sun Oct 29 16:55:05 2023 from 192.168.20.54
[21:38 IT2XCP-NG-MASTER1 ~]# systemctl status tgtd 
● tgtd.service - tgtd iSCSI target daemon
   Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/tgtd.service.d
           └─local.conf
   Active: active (running) since Sun 2023-10-29 14:35:50 CET; 7h ago
  Process: 7368 ExecStartPost=/usr/sbin/tgtadm --op update --mode sys --name State -v ready (code=exited, status=0/SUCCESS)
  Process: 7301 ExecStartPost=/usr/sbin/tgt-admin -e -c $TGTD_CONFIG (code=exited, status=0/SUCCESS)
  Process: 7298 ExecStartPost=/usr/sbin/tgtadm --op update --mode sys --name State -v offline (code=exited, status=0/SUCCESS)
  Process: 6953 ExecStartPost=/bin/sleep 5 (code=exited, status=0/SUCCESS)
 Main PID: 6952 (tgtd)
   CGroup: /system.slice/tgtd.service
           └─6952 /usr/sbin/tgtd -f
[21:39 IT2XCP-NG-MASTER1 ~]# systemctl restart tgtd 
[21:39 IT2XCP-NG-MASTER1 ~]# blkid 
/dev/sda1: LABEL="root-hlgnmj" UUID="f52447ce-cea3-44e8-86a4-1d2ab98da098" TYPE="ext3" PARTUUID="b3b6b29b-814a-4564-8dc5-88d6e556f290" 
/dev/sda3: UUID="h0N0HV-wfbn-WpSf-Cc03-Dp2f-qRPz-j29723" TYPE="LVM2_member" PARTUUID="6118b5ae-d778-4973-9927-61777d596806" 
/dev/sda5: LABEL="logs-hlgnmj" UUID="824debcc-f29d-4081-99d0-ed92933b01fc" TYPE="ext3" PARTUUID="989c2b87-c8ea-4633-92cb-81307eec547d" 
/dev/sda6: LABEL="swap-hlgnmj" UUID="b042269f-0a4e-491d-8eec-93b53ce55868" TYPE="swap" PARTUUID="01b71e1c-2020-4ec4-a4c2-2ba99427c29f" 
/dev/sdb: UUID="a8457c76650ccb45" TYPE="drbd" 
/dev/drbd1: UUID="3cxgfb-3W4w-Dx2Q-l1p2-4KpH-kjn7-mfAyjS" TYPE="LVM2_member" 
/dev/sda2: PARTUUID="41bcab07-fa08-49ec-b738-4c1c9c7559ca" 
/dev/sda4: PARTUUID="d4f4703d-d47a-494c-bafd-e263214f0c6b" 
[21:39 IT2XCP-NG-MASTER1 ~]# exit

Yes, it seems that tgtd is running, but no /dev/sdc is visible.
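In case it helps, this is roughly how I would check whether the local tgt target actually exports the LUN and whether the initiator side simply needs a rescan (a sketch with standard tgtadm/iscsiadm calls; I did not capture this output while the problem was present):

# On the primary node: show what the local tgt target actually exports (targets, LUNs, backing stores)
tgtadm --lld iscsi --mode target --op show

# List the local iSCSI sessions and force a rescan, in case the LUN is exported
# but the block device was never (re)discovered as /dev/sdc
iscsiadm -m session
iscsiadm -m session --rescan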


Update: after I restarted tgtd, /dev/sdc with the VG suddenly became visible, even though I had already restarted both pool members twice.

Now I have to check if all VMs and services are running as normal.

I will get back to you tomorrow after a long sleep ;-)


After a quick check, everything runs as it should, so I am happy so far.

Next weekend I will check why this happened and whether there is a connection with the XCP-NG security updates.
The tgtd daemon was starting and running after various reboots, but it had not presented the configured drive /dev/sdc to the OS.

Next weekend only, because our production runs 24/5.

BR Andreas



HA-Lizard supported PV not available 5 months 2 weeks ago #3027

Dear Salvatore,

I have repeated the procedure of bringing both hosts into manual mode in order to apply patches and to reboot a host after such updates.
As it is a production system, I am limited in how much testing I can perform.

Steps to reproduce the problem:
1. Disable the HA function with ha-cfg status > yes
2. Run iscsi-cfg manual-mode-enable on the primary and the secondary node
3. Shut down all VMs needed for production and the other VMs, just as a safety measure
4. Move one remaining Linux VM to the secondary node
5. Switch the roles of the primary and secondary node (first iscsi-cfg become-secondary on the primary host, then iscsi-cfg become-primary on the secondary node 15 seconds later)
6. Reboot the primary node, which is still the pool master (at this stage, last weekend, I had applied the October patches from XCP-NG with yum update before the reboot)
7. After the reboot, switch the roles back (first iscsi-cfg become-secondary on the secondary host, then iscsi-cfg become-primary on the primary node 15 seconds later; the exact command sequence is sketched after this list). At this stage the running VM loses its backing device, with reported IO errors and read-only access.
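For clarity, the role switch in steps 5 and 7 is essentially this sequence (only a sketch; the 15-second pause is my own habit, and /proc/drbd assumes a DRBD 8.x version where it is available):

# On the node that is currently primary:
iscsi-cfg become-secondary

# Roughly 15 seconds later, on the other node:
iscsi-cfg become-primary

# Verify on both nodes:
iscsi-cfg status
cat /proc/drbd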

iscsi-cfg status reports the intended state.
drbdadm shows the intended state.
tgtd is running on the primary node (preset disabled, as intended).
tgtd is not running on the secondary node (preset disabled, as intended).

The backing device /dev/sdc is not visible on the primary node, but it is visible on the secondary node.
Migration of the running VM to the primary node is not possible.

After systemctl restart tgtd on the primary node and a shutdown of the running VM, everything goes back to normal behavior.
(Live migration then works as intended!)
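For reference, the recovery on the primary node was essentially the following (a sketch; the rescan and the LVM checks are what I would add next time rather than something I captured):

# On the primary node, after shutting down the affected VM:
systemctl restart tgtd

# Assumption on my side: re-discover the exported LUN and confirm the PV/VG on /dev/sdc are back
iscsiadm -m session --rescan
pvs
vgs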

So what can be done to eliminate the problem and return to the previous, well-working state?

I assume that the changes implemented in the October patches from XCP-NG are responsible for the problem:

Statement from the patch notes:
"To proactively prevent future vulnerabilities that may leverage the privileges of these components, we have now implemented privilege reduction for them."

I fear that the main function of HA-Lizard with iSCSI-HA will fail in a real disaster case (hardware failure).
I would like to offer our help in finding the root cause of the problem, but I am limited to weekends, as our two pools run in production 24/5 and I have no testing lab available.

Thank you for your support. You can rest assured that we will donate for your work, either against a commercial invoice or in any other way you prefer, depending on your needs.

Best regards Andreas



HA-Lizard supported PV not available 5 months 2 weeks ago #3028

Salvatore Costantino
Hi Andreas,
Thank you for the detailed test procedure. I will try to reproduce it in our dev environment after upgrading to the latest XCP-ng patches.
It may be a couple of weeks before I have any update on the issue.


HA-Lizard supported PV not available 5 months 2 weeks ago #3029

If needed, I can send log files from when this issue happens. Please let me know which ones you need.
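For example, I could collect something like this on both nodes around the time of the role switch (just my guess at what is useful, assuming DRBD 8.x with /proc/drbd; please correct me):

# Gathered on both nodes when the issue occurs
journalctl -u tgtd --since "1 hour ago" > tgtd.log
grep -i -e iscsi -e drbd -e ha-lizard /var/log/messages > messages-extract.log
cat /proc/drbd > drbd-status.txt
iscsi-cfg status > iscsi-cfg-status.txt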
BR Andreas


HA-Lizard supported PV not available 4 months 4 weeks ago #3032

@Salvatore,

Last weekend I performed the maintenance procedure again: I disabled the HA function and put both nodes into manual mode, keeping a few Linux VMs running on the secondary node. Moving a VM from the slave to the master and back to the slave worked as expected.

Switching the storage roles of the primary and secondary node immediately leads to IO errors within the VMs.
I tried to quickly restart tgtd, but it had no effect for the already running VMs, so I had to stop them and put everything back to the previous state.

As a second test run, I performed the above-mentioned procedure on another pool without the "October patches" from XCP-NG; this pool was at the September patch level. There I had no problems switching the primary/secondary storage role in manual mode.
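For completeness, this is roughly how I compared the patch levels of the two pools (standard rpm/yum queries, nothing pool specific):

# Show the most recently installed/updated packages on each host
rpm -qa --last | head -n 30
yum history list all | head -n 20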

Based on the announcement and the patch notes, it seems clear that the changes limiting the privileges of a number of components have affected the smooth functioning of HA-Lizard/iSCSI-HA.

Unfortunately, I cannot test the impact of a real HA incident, in which one host stops working and VMs may have to be restarted on the surviving node, as we have no test equipment available.

Maybe my tests are helpful for narrowing down the root cause. Again, if you need additional information or testing, I am happy to help.

Best regards Andreas


HA-Lizard supported PV not available 2 months 3 weeks ago #3075

@Salvatore,

Are you able to send us an offer estimating the time needed and the costs involved to fix the problem introduced with the recent XCP-NG patches? We do not want to get stuck at this point, but we need you to solve it.
BR Andreas
