TOPIC:

Advice on Split-Brain recovery 1 month 1 week ago #2885

  • Carlos Marchi (Topic Author)
Hi there!

I was running some tests at a client site today, and in one of the final tests a VM got completely stuck while being transferred. The only thing that got it running again was rebooting the host (the VM then started on the other host).

I'm not sure whether DRBD went into StandAlone mode at that moment or a little before, but the DRBD sync did not recover.

I have the following scenario:

xcpsrv01 (the one that was rebooted when a VM was stuck transferring)
xcpsrv02 (the one that assumed the master role and is running most of the VMs)

Since the VMs are now running on xcpsrv02, is it safe to assume that this is the survivor and xcpsrv01 is the victim?

Advice on Split-Brain recovery 1 month 1 week ago #2886

  • Salvatore Costantino
I can't really tell why your migration failed, but it does look like DRBD could be in split brain. If this was recent, you should be able to find DRBD log entries in /var/log/user.log that would confirm that.
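
As a rough sketch of that check: this assumes a DRBD 8.x node, where the kernel module logs a message containing "Split-Brain" and the connection state is visible in /proc/drbd. The function names are made up for illustration and are not part of ha-lizard or iscsi-ha.

```shell
#!/bin/sh
# Hypothetical helpers, not part of ha-lizard or iscsi-ha.

# Print any split-brain messages from the syslog file mentioned above.
# Pass another path as $1 if your logs live elsewhere.
find_split_brain() {
    grep -i "split-brain" "${1:-/var/log/user.log}"
}

# Print the connection state field(s) from /proc/drbd (assumes DRBD 8.x).
# "cs:StandAlone" means the node is no longer trying to reach its peer.
show_connection_state() {
    grep -o "cs:[A-Za-z]*" "${1:-/proc/drbd}"
}
```

Running both functions on each host should show whether a split-brain message was logged and whether the node is still sitting in StandAlone.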

You can follow the DRBD documentation for recovering from split brain, or use a script that's included as part of ha-lizard. The script is run on both hosts and walks you through the steps:
/etc/iscsi-ha/scripts/drbd-sb-tool
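
For reference, the manual procedure in the DRBD 8.4 user guide boils down to the commands sketched here. This may differ from what drbd-sb-tool does internally. The resource name is a placeholder you must supply, the function names are made up, and by default the script only prints the commands; set DRBDADM=drbdadm to actually run them. Discarding data on the victim cannot be undone, so double check which node is which first.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRBDADM=drbdadm in the environment to run them for real.
DRBDADM="${DRBDADM:-echo drbdadm}"

# Run on the victim (the node whose changes will be thrown away).
# $1 is the DRBD resource name (placeholder, check your DRBD config).
recover_victim() {
    $DRBDADM disconnect "$1" &&
    $DRBDADM secondary "$1" &&
    $DRBDADM connect --discard-my-data "$1"
}

# Run on the survivor, only if it is also in StandAlone state.
reconnect_survivor() {
    $DRBDADM connect "$1"
}
```

Calling `recover_victim <resource>` on the victim and, if needed, `reconnect_survivor <resource>` on the survivor should start a resync from the survivor to the victim. Older DRBD 8.3 releases use a different option syntax for discarding data, so check the guide for your version.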

Even though DRBD is not connected, your VMs should continue to operate fully on either host. This is because we operate in single-primary mode and only expose the storage on the pool master at any time.

Advice on Split-Brain recovery 1 month 1 week ago #2887

  • Carlos Marchi (Topic Author)
Hi Salvatore,

Thanks for the quick reply.

The survivor had some virtual disks corrupted. Would it make sense to reverse the logic and recover from the victim, or would that make things worse? Losing changes from the last few hours wouldn't be an issue.
