TOPIC:

URGENT: VMs corrupted after transferred to HAL 6 years 8 months ago #1392

  • Mauritz (Topic Author)
Salvatore, a curious thing I've noticed is that the VMs are divided across both master and slave. I was under the impression there can only be 1 master and 1 slave, so I was not sure if this also played a role?


URGENT: VMs corrupted after transferred to HAL 6 years 8 months ago #1393

  • Salvatore Costantino
VMs are free to run on either host. There are no restrictions.
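
If you want to confirm where each VM is resident, the standard xe CLI can show it from any pool host. A quick sketch (the filter just hides dom0):

# list each VM and the host it currently resides on
xe vm-list params=name-label,resident-on is-control-domain=false

# show which host is the pool master
xe pool-list params=master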


URGENT: VMs corrupted after transferred to HAL 6 years 8 months ago #1394

  • Mauritz (Topic Author)
I've got more conclusive results to share today. I spent the better part of yesterday and today conducting tests, specifically migrating from standalone to a HAL pool.

In today's test I wanted to cover the following. In total there are 12 VMs:

HAL Disabled (ha-cfg status > disabled)
6 individual VMs: 3 were left on (live migration) and 3 were shut down and then migrated. I also tested migrating to the master, to the slave, and with no host specified.

I tested with HAL disabled first and was unable to get any of the VMs to break. They ran for a straight 6 hours and showed no symptoms of issues. I was then also able to migrate them further around the other pool hosts without interruption.

HAL Enabled (ha-cfg status > enabled)
Same VM set as above.
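
For reference, each migration was performed roughly along these lines (VM and host names are placeholders; the exact cross-pool flags depend on the setup):

# live migration of a running VM to a chosen host
xe vm-migrate vm=test-vm-1 host=pool-master live=true

# offline variant: shut the VM down first, then start it on the target host
xe vm-shutdown vm=test-vm-4
xe vm-start vm=test-vm-4 on=pool-slave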

The ones with HAL enabled showed much different results. 2 of the 3 migrated whilst still on had their filesystems corrupted almost immediately. I was unable to recover any of these filesystems even using xfs_repair (see screenshots below). The ones which were shut down prior to the migration showed absolutely no issues, even migrating between hosts.

Migrating to the other host (in this case from master > slave) makes no difference; however, this time around xfs_repair allowed me to repair the partition on the slave, which resolved the startup issue. Even migrating back to the master kept the VM running at first, but it broke a couple of minutes later with a hung kernel. See the last screenshot below.
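
For anyone wanting to reproduce the repair attempts, they were along these lines (the device name is a placeholder; only run this against an unmounted filesystem):

# standard repair pass on the VM's affected partition
xfs_repair /dev/xvda1

# if the log is corrupt, xfs_repair refuses until the log is zeroed (destructive)
xfs_repair -L /dev/xvda1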

Conclusion:
I'm going to run one last test to make sure my findings are correct, but it would be worth your time to set up a test system with 7.2 to check the results. There is obviously a strong possibility I stuffed up something somewhere, but this is quite destructive and only occurs once HA is enabled, so it seems worth investigating.

Note that no other running VMs are affected at all, the hardware shows no issues in iDRAC/iLO, and most importantly, if the VMs are shut down before migrating to an HA-enabled pool they are not affected.

If you're interested in logs, let me know and I'll gladly provide some info. I'm invested in getting this resolved, as at this stage I'm not sure how widespread this issue is and would not want to see problems down the line. All other aspects are working.
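
If it helps in the meantime, these are the spots I'd assume matter (xensource.log is standard on XenServer 7.x; the syslog destination for ha-lizard is my assumption based on ENABLE_LOGGING=1):

# xapi's view of the migration
grep -i migrate /var/log/xensource.log

# ha-lizard's own messages, assuming it logs via syslog
grep -i ha-lizard /var/log/messages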


URGENT: VMs corrupted after transferred to HAL 6 years 8 months ago #1395

  • Salvatore Costantino
Thanks for the detailed test results. Were the VMs running during the migration or were they powered off?

Also, please post your config ('ha-cfg get').

Looks like HAL is attempting to start the VM, perhaps before xapi has finished the migration, or xapi is prematurely allowing a start on a VM that has not completed its migration process.
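
If that's the theory, the window should be visible from xapi itself. A purely illustrative guard, not ha-lizard's actual code (the uuid is a placeholder):

# a VM that is mid-migration advertises it in current-operations, while
# power-state alone may momentarily read as halted on the destination
state=$(xe vm-param-get uuid=$VM_UUID param-name=power-state)
ops=$(xe vm-param-get uuid=$VM_UUID param-name=current-operations)

# only treat the VM as a candidate for an HA start if nothing is in flight
if [ "$state" = "halted" ] && [ -z "$ops" ]; then
    xe vm-start uuid=$VM_UUID
fi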


URGENT: VMs corrupted after transferred to HAL 6 years 8 months ago #1396

  • Mauritz (Topic Author)
It was a mix with each set of tests: 3 were running and 3 were shut down. In no case did the VMs that were shut down ever get affected, even whilst HAL was enabled.

Those which were running while HAL was off showed no issues in 6 hours of running. 2 of the 3 which were transferred (while running) to the HAL-enabled pool had their filesystems crash.

Config:
DISABLED_VAPPS=()
ENABLE_LOGGING=1
FENCE_ACTION=stop
FENCE_ENABLED=1
FENCE_FILE_LOC=/etc/ha-lizard/fence
FENCE_HA_ONFAIL=0
FENCE_HEURISTICS_IPS=129.232.159.14
FENCE_HOST_FORGET=0
FENCE_IPADDRESS=
FENCE_METHOD=POOL
FENCE_MIN_HOSTS=2
FENCE_PASSWD=
FENCE_QUORUM_REQUIRED=1
FENCE_REBOOT_LONE_HOST=0
FENCE_USE_IP_HEURISTICS=1
GLOBAL_VM_HA=1
HOST_SELECT_METHOD=0
MAIL_FROM=halizard****
MAIL_ON=1
MAIL_SUBJECT="SYSTEM_ALERT-FROM_HOST:$HOSTNAME"
MAIL_TO=alerts****
MGT_LINK_LOSS_TOLERANCE=5
MONITOR_DELAY=15
MONITOR_KILLALL=1
MONITOR_MAX_STARTS=20
MONITOR_SCANRATE=10
OP_MODE=2
PROMOTE_SLAVE=1
SLAVE_HA=1
SLAVE_VM_STAT=0
SMTP_PASS=*****
SMTP_PORT=587
SMTP_SERVER=mail****
SMTP_USER=halizard****
XAPI_COUNT=2
XAPI_DELAY=10
XC_FIELD_NAME='ha-lizard-enabled'
XE_TIMEOUT=10
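
Side note on that config: GLOBAL_VM_HA=1 means every VM in the pool is HA-managed by default. If it would help to keep the test VMs out of HAL's hands while debugging, I believe the per-VM override hangs off the XC_FIELD_NAME custom field above, along these lines (the uuid is a placeholder, and the exact other-config key is my assumption about how XenCenter stores custom fields):

# exclude one VM from HA (key path assumed; matches XC_FIELD_NAME above)
xe vm-param-set uuid=$VM_UUID other-config:XenCenter.CustomFields.ha-lizard-enabled=false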


URGENT: VMs corrupted after transferred to HAL 6 years 8 months ago #1397

  • Mauritz (Topic Author)
Not wanting to leave anything to chance, I decided to do one last test to make sure the data I'm giving you is accurate. These are our production servers, and I want to contribute to the great work you've offered here.

I created 4 VMs: 2 were shut down and 2 were kept running. I migrated all 4 to the pool; the shut-down ones were fine, while of the 2 kept running, 1 crashed almost instantly with no real option to fix the filesystem.

I then disabled HAL and migrated 2 running VMs over, and they worked fine (as expected).

Lastly, I switched HAL back on, kept 2 VMs running, and this time migrated them to the pool master. As expected, both failed, meaning that a live migration over to a HAL-enabled master host will break the filesystem.

The workaround is to shut down the VMs first, or to disable HAL before you migrate. Let me know what else I can provide you with.
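
Until there's a fix, the safe sequence looks roughly like this (names are placeholders; within a pool with shared storage, starting a halted VM on the target host is effectively the move):

# shut the VM down cleanly before it touches a HAL-enabled pool
xe vm-shutdown vm=prod-vm-1

# bring it up on the destination host, where it can then run normally
xe vm-start vm=prod-vm-1 on=pool-master

# alternatively, toggle pool HA off first (per 'ha-cfg status' above),
# live-migrate, then re-enable HA once the migration has settled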
