Failed to spawn 7 years 2 weeks ago #1255

I looked in there but haven't found anything very interesting yet. I also tried increasing the timeout, but I'm still getting a bunch of messages whenever any significant disk activity happens, which is making me somewhat concerned. I don't want to risk damaging the iSCSI drive because something is timing out that shouldn't be. (I'm not exactly sure what is timing out or how risky this is.)

A similar setup in my office with nearly the same hardware is working much better. Well, performance seems to be the same from the VM perspective, but no errors here.

I may remove HA-Lizard from the client for now, as I need their systems to be rock solid. I'll keep working with it here, though, and get more experience with it, which is what I normally like to do before deploying a new solution to a client anyway.

Failed to spawn 7 years 2 weeks ago #1256

Actually, I guess I am seeing something odd in the system that is generating the alerts. Does this part of the log shed any light on anything? Anything I should look at? I don't see any entries like that on the system that is behaving itself.
Mar 12 01:00:43 grgcxen1 iscsi-ha:  This iteration is count 629
Mar 12 01:00:43 grgcxen1 iscsi-ha:  Checking if this host is a Pool Master or Slave
Mar 12 01:00:43 grgcxen1 iscsi-ha:  This host's pool status = master
Mar 12 01:00:43 grgcxen1 iscsi-ha:  auto_plug_pbd: Found LVMoISCSI SR List: aac579cd-4bd2-ed73-c972-fbcbaa011f9d
Mar 12 01:00:45 grgcxen1 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: Scanning for Volume Group -> iscsi-sr: aac579cd-4bd2-ed73-c972-fbcbaa011f9d
Mar 12 01:00:45 grgcxen1 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: Volume Group for iSCSI-SR found OK: aac579cd-4bd2-ed73-c972-fbcbaa011f9d
Mar 12 01:00:46 grgcxen1 iscsi-ha:  iscsi-ha Watchdog: iscsi-ha running - OK
Mar 12 01:00:51 grgcxen1 iscsi-ha:  iscsi-ha Watchdog: iscsi-ha running - OK
Mar 12 01:00:51 grgcxen1 ha-lizard:  ha-lizard Watchdog: ha-lizard running - OK
Mar 12 01:00:51 grgcxen1 ha-lizard: 14843 ha-lizard already running: Attempt 1 on PIDS: 14843
Mar 12 01:00:53 grgcxen1 iscsi-ha: 14950 iscsi-ha already running: Attempt 1 on PIDS: 14966 14950
Mar 12 01:00:56 grgcxen1 iscsi-ha:  iscsi-ha Watchdog: iscsi-ha running - OK
Mar 12 01:00:58 grgcxen1 iscsi-ha: 14950 iscsi-ha already running: Attempt 2 on PIDS: 14966 14950
Mar 12 01:01:01 grgcxen1 iscsi-ha:  iscsi-ha Watchdog: iscsi-ha running - OK
Mar 12 01:01:01 grgcxen1 ha-lizard:  ha-lizard Watchdog: ha-lizard running - OK
Mar 12 01:01:02 grgcxen1 ha-lizard: 14843 ha-lizard already running: Attempt 2 on PIDS: 14843
Mar 12 01:01:04 grgcxen1 iscsi-ha: 14950 iscsi-ha already running: Attempt 3 on PIDS: 14966 14950
Mar 12 01:01:06 grgcxen1 iscsi-ha:  iscsi-ha Watchdog: iscsi-ha running - OK
Mar 12 01:01:11 grgcxen1 iscsi-ha: 14950 iscsi-ha already running: Attempt 4 on PIDS: 14966 14950
Mar 12 01:01:11 grgcxen1 iscsi-ha:  iscsi-ha Watchdog: iscsi-ha running - OK
Mar 12 01:01:11 grgcxen1 ha-lizard:  ha-lizard Watchdog: ha-lizard running - OK
Mar 12 01:01:12 grgcxen1 ha-lizard: 14843 ha-lizard already running: Attempt 3 on PIDS: 14843
Mar 12 01:01:16 grgcxen1 iscsi-ha: 14950 iscsi-ha already running: Attempt 5 on PIDS: 14966 14950
Mar 12 01:01:16 grgcxen1 iscsi-ha:  iscsi-ha Watchdog: iscsi-ha running - OK
Mar 12 01:01:21 grgcxen1 iscsi-ha:  iscsi-ha Watchdog: iscsi-ha running - OK
Mar 12 01:01:21 grgcxen1 iscsi-ha: 14950 iscsi-ha already running: Attempt 6 on PIDS: 14966 14950
Mar 12 01:01:21 grgcxen1 ha-lizard:  ha-lizard Watchdog: ha-lizard running - OK
Mar 12 01:01:22 grgcxen1 iscsi-ha: 14950 email: Mail Spool Directory Found /dev/shm/iscsi-ha-mail
Mar 12 01:01:22 grgcxen1 ha-lizard: 14843 ha-lizard already running: Attempt 4 on PIDS: 14843
Mar 12 01:01:22 grgcxen1 iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon: cat: /dev/shm/iscsi-ha-mail/*.msg: No such file or directory
Mar 12 01:01:22 grgcxen1 iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon: /etc/iscsi-ha/iscsi-ha.func: line 282: [: -eq: unary operator expected
Mar 12 01:01:22 grgcxen1 iscsi-ha: 14950 email Sending ALERT email to bill@myemailaddresswashere.com: iscsi-ha failed to spawn new instance after 6 attmepts. MAX_STARTS is set to 5. Check Host: grgcxen1 for possible hung process
Mar 12 01:01:26 grgcxen1 iscsi-ha:  iscsi-ha Watchdog: iscsi-ha running - OK
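
To make the pattern in an excerpt like this easier to see, here is a minimal Python sketch that summarizes the "already running: Attempt N" lines per service. The regular expression is an assumption based only on the line format shown above, and `summarize_attempts` and `SAMPLE` are hypothetical names, not part of HA-Lizard or iscsi-ha.

```python
import re

# Matches lines such as:
#   Mar 12 01:00:53 grgcxen1 iscsi-ha: 14950 iscsi-ha already running: Attempt 1 on PIDS: 14966 14950
# The pattern is inferred from the excerpt above, not from the HA-Lizard source.
ATTEMPT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+(?P<service>[\w-]+):\s+\d+\s+"
    r"[\w-]+ already running: Attempt (?P<attempt>\d+) on PIDS: (?P<pids>[\d ]+)$"
)


def summarize_attempts(lines):
    """Per service: highest attempt number, first/last timestamp, and PIDs seen."""
    summary = {}
    for line in lines:
        match = ATTEMPT_RE.match(line.strip())
        if not match:
            continue  # not an "already running" line
        entry = summary.setdefault(
            match.group("service"),
            {"max_attempt": 0, "first": match.group("stamp"), "last": None, "pids": set()},
        )
        entry["max_attempt"] = max(entry["max_attempt"], int(match.group("attempt")))
        entry["last"] = match.group("stamp")
        entry["pids"].update(match.group("pids").split())
    return summary


# A few lines copied from the excerpt above.
SAMPLE = [
    "Mar 12 01:00:53 grgcxen1 iscsi-ha: 14950 iscsi-ha already running: Attempt 1 on PIDS: 14966 14950",
    "Mar 12 01:01:02 grgcxen1 ha-lizard: 14843 ha-lizard already running: Attempt 2 on PIDS: 14843",
    "Mar 12 01:01:21 grgcxen1 iscsi-ha: 14950 iscsi-ha already running: Attempt 6 on PIDS: 14966 14950",
    "Mar 12 01:01:22 grgcxen1 ha-lizard: 14843 ha-lizard already running: Attempt 4 on PIDS: 14843",
    "Mar 12 01:01:26 grgcxen1 iscsi-ha:  iscsi-ha Watchdog: iscsi-ha running - OK",
]

if __name__ == "__main__":
    for service, info in sorted(summarize_attempts(SAMPLE).items()):
        print(service, info["max_attempt"], info["first"], info["last"], sorted(info["pids"]))
```

Feeding it the full log for the window around an alert shows how long each service kept retrying and which PIDs it kept finding, which may help tell a slow but live process from a genuinely hung one.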

Failed to spawn 7 years 2 weeks ago #1257

It looks like iscsi-ha is already running but it's trying to restart anyway?

One more thing I noticed. The misbehaving system seems to be generating a much greater log volume than the system that's working better. But I don't really see what is going on. Am I missing something obvious?
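
One way to compare log volume between the two hosts is to count lines per minute and per syslog tag on each, then look at where the counts diverge. The following is a rough sketch only: it assumes the timestamp, host and tag layout seen in the excerpt earlier in this thread, and `volume_by_minute` and `busiest` are hypothetical helper names.

```python
import re
from collections import Counter

# Captures the timestamp down to the minute and the syslog tag
# (the text before the first ": "). The layout is assumed from the log excerpt.
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d):\d\d\s+\S+\s+(\S+?):\s")


def volume_by_minute(lines):
    """Count log lines per (minute, tag) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def busiest(counts, top=5):
    """Return the `top` (minute, tag) pairs with the most lines."""
    return counts.most_common(top)


# Lines copied from the excerpt earlier in the thread.
SAMPLE = [
    "Mar 12 01:00:43 grgcxen1 iscsi-ha:  This iteration is count 629",
    "Mar 12 01:00:45 grgcxen1 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: Scanning for Volume Group -> iscsi-sr: aac579cd-4bd2-ed73-c972-fbcbaa011f9d",
    "Mar 12 01:00:51 grgcxen1 ha-lizard:  ha-lizard Watchdog: ha-lizard running - OK",
    "Mar 12 01:01:01 grgcxen1 iscsi-ha:  iscsi-ha Watchdog: iscsi-ha running - OK",
    "Mar 12 01:01:06 grgcxen1 iscsi-ha:  iscsi-ha Watchdog: iscsi-ha running - OK",
]

if __name__ == "__main__":
    for (minute, tag), count in busiest(volume_by_minute(SAMPLE)):
        print(minute, tag, count)
```

Running this over the same time window from each host's log (whichever file syslog writes these lines to on your systems) and comparing the busiest minutes and tags should show which component accounts for the extra volume.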

Last edit: by Bill.

Failed to spawn 7 years 2 weeks ago #1262

  • Salvatore Costantino
Looks like ha-lizard and iscsi-ha are both reporting slow system responses. The slowdown is likely from xapi. Next time your hosts are performing this maintenance, take a look at overall CPU usage on dom0 and at xapi's CPU usage in particular.
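
If watching `top` at 1 AM is not practical, a small sampler can record the xapi CPU share instead. The sketch below reads the standard Linux `/proc/stat` and `/proc/<pid>/stat` files; it assumes Python is available on dom0 and that you supply the xapi PID yourself (for example from `pidof xapi`). The function names are hypothetical, and the figure covers only the process itself, not any helper processes it starts.

```python
import time


def total_jiffies(stat_text):
    """Sum user..steal on the aggregate 'cpu' line of /proc/stat."""
    first = stat_text.splitlines()[0].split()
    if first[0] != "cpu":
        raise ValueError("unexpected /proc/stat format")
    # Use the first eight counters only; guest time is already included in user time.
    return sum(int(value) for value in first[1:9])


def process_jiffies(pid_stat_text):
    """Return utime + stime (fields 14 and 15) from /proc/<pid>/stat."""
    # The command name sits in parentheses and may contain spaces,
    # so split on the last ')' before indexing the remaining fields.
    rest = pid_stat_text.rsplit(")", 1)[1].split()
    return int(rest[11]) + int(rest[12])


def cpu_share(total0, total1, proc0, proc1):
    """Percentage of all CPU time (across every core) used by the process between two samples."""
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (proc1 - proc0) / elapsed


def _read(path):
    with open(path) as handle:
        return handle.read()


def sample_share(pid, interval=5.0):
    """Sample /proc twice, `interval` seconds apart, and return the CPU share of `pid`."""
    total0 = total_jiffies(_read("/proc/stat"))
    proc0 = process_jiffies(_read("/proc/{0}/stat".format(pid)))
    time.sleep(interval)
    total1 = total_jiffies(_read("/proc/stat"))
    proc1 = process_jiffies(_read("/proc/{0}/stat".format(pid)))
    return cpu_share(total0, total1, proc0, proc1)
```

Calling `sample_share(pid)` in a loop during the maintenance window and logging the result with a timestamp gives numbers that can be lined up against the "already running" alerts.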

Failed to spawn 7 years 1 week ago #1271

I'm still not seeing anything useful in the logs. Last night at 1:20 AM I again got a bunch of errors from HA-Lizard:

A bunch of these:
ha-lizard failed to spawn new instance after 21 attmepts. MAX_STARTS is set to 20. Check Host: grgcxen1 for possible hung process

And some of these:
xe_wrapper: COMMAND: xe pool-param-get

update_global_conf_params: Failed to update global pool configuration settings in /etc/ha-lizard/ha-lizard.pool.conf - Check Configuration!

xe_wrapper COMMAND: xe sr-list: type=lvmoiscsi

I could start from scratch with a XenServer 7 install, but that's pretty much what this is.

One other thought - I'm going to try swapping primary/secondary storage and see what happens tonight. If I get the same error messages from xen2 then there's a software problem on both servers.

<confused>

Last edit: by Bill.

Failed to spawn 7 years 1 week ago #1273

  • Salvatore Costantino
Based on your last post, it is clear that the Xen API (xapi) is slow to respond, hence the timeout errors. Reinstalling won't make a difference. You can try suppressing some of the errors or increasing some of the timeout settings to quiet things down.
