Forum


TOPIC:

Missing SCSI ID after update and reboot 5 months 4 weeks ago #5236

This sounds similar to the problem I had reported in this thread:

www.halizard.com/forum/software-support/...ailable?start=6#3027

The problem started with the October 2023 patches, so I have not applied any further updates since.


Missing SCSI ID after update and reboot 3 months 2 weeks ago #5258

Hi Salvatore,

I've spent some time reviewing the updates to 8.2.1 that cause this iSCSI tgtd race condition problem and I've narrowed it down to multipath. It appears that the upgrade to multipath version 0.4.9-136.xcpng8.2 triggers the problem.

Avoiding Problematic Updates:
In my lab I can install XCP-ng 8.2.0 or XCP-ng 8.2.1 (first release, 2022), provision HA-Lizard as normal, then run yum update while excluding multipath and its dependencies. Everything else comes up to date without a problem:
yum update -x kpartx -x device-mapper-multipath -x device-mapper-multipath-libs

Note that the second revision of 8.2.1 (released 11-12-2023) ships with the new multipath version and is therefore already affected. Multipath cannot be rolled back because of its complex chain of dependencies, so until HA-Lizard supports the second revision, we need to install the first revision and exclude kpartx, device-mapper-multipath and device-mapper-multipath-libs from any updates.
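
To keep the exclusion in place for every future yum run instead of passing -x each time, the packages can be listed on an exclude= line in the [main] section of the yum configuration. A minimal sketch, assuming the host reads /etc/yum.conf and that file has a [main] header; pin_multipath is a hypothetical helper name, not part of HA-Lizard:

```shell
#!/bin/sh
# Hypothetical helper: add an "exclude=" line for the three multipath
# packages right after the [main] header of a yum configuration file.
# If the file has no [main] header, nothing is added.
pin_multipath() {
    conf=$1
    line='exclude=kpartx device-mapper-multipath device-mapper-multipath-libs'

    # An exclude= line already exists: leave the file alone and merge by hand.
    if grep -q '^exclude=' "$conf"; then
        return 0
    fi

    tmp=$(mktemp) || return 1
    # Copy every line, emitting the exclude line directly after [main].
    # Writing back with cat keeps the original file's owner and mode.
    awk -v line="$line" '{ print } /^\[main\]/ { print line }' "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}

# On each host (not executed here):
#   pin_multipath /etc/yum.conf
```

The helper is safe to run twice. Remove the exclude= line again once a release that handles the newer multipath is installed.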

OR... Use the workaround below (which might be considered a fix now if you approve)

iSCSI Target Daemon Problem
One thing I found interesting is that the problem also exists with a fresh install of HA-Lizard on the XCP-ng 8.2.1 2023 release. Interesting because there is no boot involved. The final HA-Lizard install stage of adding an iSCSI SR cannot be completed, because the target is not online until you stop the tgtd service manually and wait a bit for iscsi-ha to start it up again (or start it manually).

What I got from this is that the iSCSI target only becomes available on a subsequent start of tgtd and not the first start, regardless of whether it's after a system boot or a fresh install. What does tgtd do the first time it starts that is different from any subsequent start, and how does this relate to multipath? Is drbd involved? Is the race condition related to multipath and the drbd backing-stores?

Multipath and DRBD
Upon closer inspection of multipath I found that the drbd backing-store was not being blacklisted, even though a default rule exists for this in the main rule set. The filter expression does not appear to trigger. Manually adding a modified rule to the multipath custom configuration applies the blacklist immediately and stops multipath from interfering with the drbd backing-store, which allows tgtd to create a functional target.

devnode "^rbd[0-9]*" - Not Working
devnode "drbd[0-9]*" - Working

As far as I can tell, the drbd backing-stores have never been blacklisted by multipath. It's like this on 8.2.0 but only now with the latest version does it actually matter. I still don't know why but at least we have a workaround.
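
The difference between the two expressions comes down to the ^ anchor: ^rbd can only match a name that begins with rbd, so a name such as drbd0 never matches. A minimal sketch of that behaviour, assuming multipath evaluates devnode rules as POSIX extended regular expressions against the bare device name; grep -E stands in for multipath here, and devnode_matches is a hypothetical helper:

```shell
#!/bin/sh
# Returns 0 when the extended regex $1 matches somewhere in device name $2.
devnode_matches() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# Compare the default rule with the modified rule on a few device names.
for pattern in '^rbd[0-9]*' 'drbd[0-9]*'; do
    for dev in drbd0 drbd1 rbd0; do
        if devnode_matches "$pattern" "$dev"; then
            echo "$pattern matches $dev"
        else
            echo "$pattern does not match $dev"
        fi
    done
done
```

Under this assumption the anchored rule only ever matches rbd devices, while the unanchored drbd rule matches the drbd backing-stores.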


Final Workaround:
I now work around the problem simply by adding a custom rule to multipath that forces the drbd backing-stores to be blacklisted. There is no need to change the iscsi-ha code, and fully updated 8.2.1 hosts now start normally after a reboot.

Edit /etc/multipath/conf.d/custom.conf
Insert blacklist rule for drbd nodes on both hosts.
blacklist {
        devnode "drbd[0-9]*"
}
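
The edit above can also be scripted so that it is applied the same way on both hosts. A minimal sketch; add_drbd_blacklist is a hypothetical helper, and it assumes the drop-in file does not already contain a blacklist section that you would rather extend by hand:

```shell
#!/bin/sh
# Hypothetical helper: append the drbd blacklist stanza to a multipath
# drop-in file, unless the exact devnode rule is already present.
add_drbd_blacklist() {
    conf=$1
    if [ -f "$conf" ] && grep -Fq 'devnode "drbd[0-9]*"' "$conf"; then
        return 0    # rule already there, nothing to do
    fi
    cat >> "$conf" <<'EOF'
blacklist {
        devnode "drbd[0-9]*"
}
EOF
}

# On each host (not executed here):
#   add_drbd_blacklist /etc/multipath/conf.d/custom.conf
#   systemctl restart multipathd   # assumption: a restart picks up the rule
```

Running the helper a second time leaves the file unchanged, so it is safe to include in a provisioning script.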



Initial Workaround:
This is how I worked around the problem initially, before realising that multipathd was at the root of it.

Edit /etc/iscsi-ha/iscsi-ha.sh
Insert a one-line tgtd primer before the service_execute tgtd start line (line 255) on both hosts.
# /usr/bin/systemctl start tgtd && sleep 5 && /usr/bin/systemctl stop tgtd
log "Attempting to start iSCSI target $ISCSI_TARGET_SERVICE"
/usr/bin/systemctl start tgtd && sleep 2 && /usr/bin/systemctl stop tgtd
service_execute tgtd start && SERVICE_EXECUTE_RESULT=$(service_execute $ISCSI_TARGET_SERVICE status)
RETVAL=$?

Intermediate Workaround:
I then began working around the problem by forcing a restart of multipathd before tgtd starts.

Edit /etc/iscsi-ha/iscsi-ha.sh
Insert a one-line multipathd service restart before the service_execute tgtd start line (line 255) on both hosts.
# /usr/bin/systemctl restart multipathd
log "Attempting to start iSCSI target $ISCSI_TARGET_SERVICE"
/usr/bin/systemctl restart multipathd
service_execute tgtd start && SERVICE_EXECUTE_RESULT=$(service_execute $ISCSI_TARGET_SERVICE status)
RETVAL=$?



Hope all of this helps.

Cheers
Nathan


Last edit: by Nathan Scannell.

Missing SCSI ID after update and reboot 3 months 1 week ago #5261

  • Salvatore Costantino
Nathan,
Many thanks for getting to the bottom of the problem.

I have a small release that I started working on which fixed a few minor issues. I will work on adding something to address this issue too.


Missing SCSI ID after update and reboot 3 months 6 days ago #5262

No probs. Happy to contribute.


Missing SCSI ID after update and reboot 3 months 2 days ago #5263

Hey Salvatore,

Any idea when the new version with the fixes will be released?

Thanks, I'll be watching for it.


Missing SCSI ID after update and reboot 2 months 4 weeks ago #5264

  • Salvatore Costantino
Hi Victor,
I will start work on the release about 2 weeks from now.
