
TOPIC:

Any idea why OP_MODE has changed? 1 month 2 weeks ago #2871

  • Victor Hugo
  • Topic Author
  • Posts: 16
Hello everyone,

Today at 03:00 AM we had a blackout, and the VMs and servers shut down correctly via the UPS system. Two hours later, an operator called me to report that electrical power had been back for a long time, but no system was working.

So I logged on to the servers and everything was OK (UPS, DRBD, ha-cfg enabled, iscsi-cfg status without split-brain, storage/disks connected, etc.), but no VM was running, just as the operator had reported.

Then I started a VM manually and it ran normally. The next step was to check the ha-lizard configuration, and I saw that OP_MODE was set to 2 (manage virtual machines), but we use manage appliances (OP_MODE=1). I set OP_MODE back to 1 using the command:

```
ha-cfg set OP_MODE 1
```

and all the VMs started to boot again as before. I say "as before" because in the past (more specifically on 03/October) we had another blackout and all the infrastructure went down for 3 hours; afterwards, when electrical power came back, no manual intervention was necessary to start the VMs.

I'm very sure that no one has set OP_MODE to 2 since 03/October (we record all logins and commands externally), but I can't check the logs because we only store 3 months of user.log (shame on me).

Anyway, it is a very strange situation and I'm not sure what happened. My only "possible" explanation is that, for some strange reason, the HA reset the value of OP_MODE to the default one. Can that be? Any other ideas?


Version: ha/iscsi = 2.2.0-1 on XenServer 7.2.0 (yes, it is an old version)

Thanks and regards.


Any idea why OP_MODE has changed? 1 month 2 weeks ago #2883

  • Salvatore Costantino
  • Posts: 731
Hi Hugo,
op_mode, like all the settings, is stored in the xapi DB. It's really not possible to tell why or how it reverted at this point. The only possibility I see is that the local config cache (which comes from the xapi DB) was corrupted when power was lost. If this happens, we load some default settings just to bootstrap the initial startup and then recache with settings from the xapi DB. Maybe something in this logic went wrong, but again, it's impossible to know for sure without logs, etc.
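To make the fallback idea concrete, here is a rough sketch of how a "use the cache, else a bootstrap default" lookup could behave. It is not ha-lizard's actual code: the function name `read_cached_setting`, the cache format (one KEY=VALUE per line), and the default passed in by the caller are all assumptions for illustration only.

```shell
# Hypothetical sketch, NOT ha-lizard's code.
# Reads one setting from a local cache file of KEY=VALUE lines and falls
# back to a caller-supplied default when the file is missing, empty, or
# has no usable value for the key.
read_cached_setting() {
    cache_file="$1"
    key="$2"
    default="$3"
    value=""
    if [ -s "$cache_file" ]; then
        # Take the last KEY=VALUE line for this key, if there is one.
        value=$(grep -E "^${key}=" "$cache_file" | tail -n 1 | cut -d= -f2- || true)
    fi
    if [ -z "$value" ]; then
        # Cache unusable: bootstrap with the default until it is refreshed.
        value="$default"
    fi
    printf '%s\n' "$value"
}
```

Under these assumptions, a truncated cache file would make `read_cached_setting /path/to/cache OP_MODE 2` print the bootstrap value 2 until the cache is rebuilt from the xapi DB, which is the kind of temporary revert being speculated about here.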


Any idea why OP_MODE has changed? 3 weeks 21 hours ago #2893

  • Victor Hugo
  • Topic Author
  • Posts: 16
Hi Salvatore,

Today we had the same situation, but I finally found the reason and detected the "problem".
We use the apcupsd daemon to monitor the UPS and shut down the VMs and physical servers in case of a prolonged power failure, like this:

  • 1 - If the UPS has less than 15 minutes of charge left
  • 2 - Shut down all VMs
  • 3 - Shut down all physical servers
  • 4 - If the UPS has more than 40% of charge
  • 5 - Start all the physical servers
  • 6 - ha-lizard should start all VMs in the appliances (OP_MODE=1)

In theory this should work OK, but we found in the past that step #2 has a problem when OP_MODE is 1: if we manually shut down the VMs, ha-lizard HA starts them again, so the shutdown procedure never finishes and the servers go down uncleanly when the UPS reaches 0% charge. :-(

So, to avoid this problem, we added a line to the UPS shutdown script that changes OP_MODE from 1 to 2. The problem now is that the VMs don't start automatically after the physical servers power on (because we manually changed OP_MODE)!
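A minimal sketch of that shutdown-side change, for anyone following along. It is an illustration, not our production script: the function name `ups_shutdown_vms` and the `HA_CFG`/`XE` variables are assumptions added so the commands can be swapped out for testing. `ha-cfg set OP_MODE` is the same command used earlier in this thread, and `xe vm-list` / `xe vm-shutdown` are the standard XenServer CLI calls.

```shell
# Illustrative sketch only; names and structure are assumptions.
HA_CFG="${HA_CFG:-ha-cfg}"
XE="${XE:-xe}"

ups_shutdown_vms() {
    # Change OP_MODE first, so HA does not restart the VMs being stopped.
    "$HA_CFG" set OP_MODE 2 || return 1

    # --minimal prints the running guest UUIDs as one comma-separated line.
    uuids=$("$XE" vm-list is-control-domain=false power-state=running \
        params=uuid --minimal)

    # UUIDs contain no spaces, so turning commas into spaces is safe here.
    for uuid in $(printf '%s' "$uuids" | tr ',' ' '); do
        "$XE" vm-shutdown uuid="$uuid" ||
            echo "clean shutdown failed for VM $uuid" >&2
    done
}
```

The UPS event script would call `ups_shutdown_vms` before halting the hosts. Nothing here puts OP_MODE back to 1, which is exactly why the VMs stay down after power returns.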

I'm thinking about a workaround, like adding a line to /etc/rc.local that changes OP_MODE back to 1 after the reboot, but it is ugly!
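Roughly, the rc.local workaround could be something like the sketch below. It assumes ha-cfg may fail early in boot (before the toolstack is ready) and that it returns a non-zero exit status when it fails; neither is confirmed. The function name `restore_op_mode`, the retry count, and the delay are illustrative choices, not ha-lizard features.

```shell
# Illustrative sketch only; retry policy and names are assumptions.
HA_CFG="${HA_CFG:-ha-cfg}"

restore_op_mode() {
    tries="${1:-10}"   # how many attempts to make
    delay="${2:-30}"   # seconds to wait between attempts
    i=1
    while [ "$i" -le "$tries" ]; do
        # Same command as used manually earlier in this thread.
        if "$HA_CFG" set OP_MODE 1; then
            return 0
        fi
        # Wait before retrying, but not after the final attempt.
        if [ "$i" -lt "$tries" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "could not restore OP_MODE after $tries attempts" >&2
    return 1
}
```

From rc.local it would be started in the background, for example `restore_op_mode 10 30 &`, so a slow toolstack start does not block the rest of the boot.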

Have you (or someone else) handled this kind of situation in the past (ha-lizard HA working with a UPS, allowing the UPS to shut down the VMs and then starting them again automatically afterwards)?

Maybe a more elegant solution is to add a parameter to ha-lizard that prevents the restart of the VMs (function vm_mon) while the system is shutting down, for example:
OP_MODE = 1 or 2
SHUTDOWN_IN_PROGRESS = true or false (always false when ha-lizard starts)

and in the code, this logic:
==============
If (OP_MODE = 1 or 2) AND SHUTDOWN_IN_PROGRESS = false
    VAPP_START=`$XE appliance-start uuid=$UUID`
==============

In this case, any tool/script can set SHUTDOWN_IN_PROGRESS via:
=====
ha-cfg set SHUTDOWN_IN_PROGRESS true
=====
before turning off the system, and when the system starts again, the HA will work as before (because SHUTDOWN_IN_PROGRESS is false again).
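As a slightly more concrete version of the proposed check, the guard could be sketched as a small shell function. SHUTDOWN_IN_PROGRESS is only the parameter proposed in this post and does not exist in ha-lizard today, and the helper name `may_start_vms` is made up for the example.

```shell
# Sketch of the PROPOSED guard; SHUTDOWN_IN_PROGRESS is not an existing
# ha-lizard setting and may_start_vms is a hypothetical helper name.
# Returns 0 (start allowed) only when OP_MODE is 1 or 2 and the shutdown
# flag is exactly "false".
may_start_vms() {
    op_mode="$1"
    shutdown_in_progress="$2"
    case "$op_mode" in
        1|2) ;;
        *) return 1 ;;
    esac
    [ "$shutdown_in_progress" = "false" ]
}

# Intended use inside the monitoring logic (not executed here):
#   if may_start_vms "$OP_MODE" "$SHUTDOWN_IN_PROGRESS"; then
#       VAPP_START=`$XE appliance-start uuid=$UUID`
#   fi
```

Note that this follows the condition literally, so an unset or misspelled flag also blocks the start. Testing for `!= "true"` instead would fail the other way and keep HA active when the flag is missing; which behavior is safer is open for discussion.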

What do you think?
If there are no existing settings for working with a UPS that we can use with ha-lizard, I believe this is a good idea and I can work on a patch for it today/tomorrow.

Let me know.
