iSCSI shared storage 'unplugged' on reboot 6 years 11 months ago #1322

  • HoppySpadge (Topic Author)
  • Posts: 4
G'day... I am trying to set up HA-Lizard on a 2-node pool (XenServer 7.1, halizard_nosan_installer_2.1.3). Every time (I've nuked and paved 4 times now...) the same problem happens: the iSCSI virtual disk storage fails to come back on the first reboot, and XenCenter says it is 'unplugged'.

Some details:
DRBD is not even running on either of the 2 servers:
[root@rx100 ~]# drbd-overview
 1:iscsi1/0  Unconfigured . . 
[root@rx100 ~]# ls /dev/d*
by-id  by-label  by-partuuid  by-path  by-scsibus  by-scsid  by-uuid

The pool is not running any VMs, ha-cfg status is disabled:
[root@rx100 ~]# ha-cfg status
-------------------------------------------------------------
| ha-lizard Version:   2.1.3                                |
| Operating Mode:      Mode [ 2 ] Managing All VMs in Pool  |
| Host Role:           master                               |
| Pool UUID:           e264c9cf-1f50-ef0e-5fb4-185f497b3788 |
| Host UUID:           b139fd7e-9010-4969-a95f-c7717f31aa83 |
| Master UUID:         b139fd7e-9010-4969-a95f-c7717f31aa83 |
| Daemon Status:       ha-lizard is running [ OK ]          |
| Watchdog Status:     ha-lizard-watchdog is running [ OK ] |
| HA Enabled:          false                                |
-------------------------------------------------------------
Pool HA Status: DISABLED

[root@quad ~]# systemctl is-enabled tgtd
disabled
[root@quad ~]# systemctl is-enabled drbd
disabled
[root@quad ~]# systemctl is-enabled ha-lizard
ha-lizard.service is not a native service, redirecting to /sbin/chkconfig.
Executing /sbin/chkconfig ha-lizard --level=5
enabled
Is that how it is supposed to be?

The iSCSI link is a direct 1Gb connection. Pinging the floating IP always works (in fact all pings always work):
[root@rx100 ~]# ping 10.10.10.3
PING 10.10.10.3 (10.10.10.3) 56(84) bytes of data.
64 bytes from 10.10.10.3: icmp_seq=1 ttl=64 time=0.017 ms

Each time I reinstalled I made the setup more basic. This time I installed no VMs, did not enable HA, followed the instructions in the video exactly, and waited a full 10 hours for the iSCSI DRBD array to get up to date before rebooting. The result was the same. The master server (rx100) took about 15 minutes to shut down, but it did shut down gracefully. When I started the master again with the slave still off, the master could not see the storage on its own. Bringing up the slave made no difference: no iSCSI storage!!

There doesn't seem to be anything obvious in /var/log/user.log, except that the log has already grown to 40MB in less than 24 hours since installing, so something isn't right.
Here's an excerpt:
May 17 06:45:43 rx100 iscsi-ha: 21003 Aborting promote to primary
May 17 06:45:44 rx100 iscsi-ha:  iscsi-ha Watchdog: iscsi-ha running - OK
May 17 06:45:48 rx100 ha-lizard:  ha-lizard Watchdog: ha-lizard running - OK
May 17 06:45:49 rx100 iscsi-ha:  iscsi-ha Watchdog: iscsi-ha running - OK
May 17 06:45:51 rx100 ha-lizard: 20631 Spawning new instance of ha-lizard
May 17 06:45:51 rx100 ha-lizard:  Mail Spool Directory Found /dev/shm/ha-lizard-mail
May 17 06:45:51 rx100 ha-lizard:  This iteration is count 73
May 17 06:45:51 rx100 ha-lizard:  Checking if this host is a Pool Master or Slave
May 17 06:45:51 rx100 ha-lizard:  This host's pool status = master
May 17 06:45:51 rx100 ha-lizard:  Checking if ha-lizard is enabled for this pool
May 17 06:45:51 rx100 ha-lizard:  check_ha_enabled: Checking if ha-lizard is enabled for pool: e264c9cf-1f50-ef0e-5fb4-185f497b3788
May 17 06:45:51 rx100 ha-lizard:  check_ha_enabled: ha-lizard is disabled
May 17 06:45:51 rx100 ha-lizard:  ha-lizard is disabled
May 17 06:45:51 rx100 ha-lizard:  Updating state information
May 17 06:45:51 rx100 ha-lizard:  get_vms_on_host: No VMs found on host: b139fd7e-9010-4969-a95f-c7717f31aa83
May 17 06:45:51 rx100 ha-lizard:  get_vms_on_host: No VMs found on host: 34008e0e-4ad4-44b9-b6a2-4f60d1bf7c53
May 17 06:45:51 rx100 ha-lizard:  Mail Spool Directory Found /dev/shm/ha-lizard-mail
May 17 06:45:51 rx100 ha-lizard:  check_email_enabled: Email enabled for write_pool_state
May 17 06:45:51 rx100 ha-lizard:  email: Duplicate message - not sending. Content = write_pool_state:  Error retrieving autopromote_uuid from pool configuration
May 17 06:45:51 rx100 ha-lizard:  email: Message barred for 60 minutes
May 17 06:45:51 rx100 ha-lizard:  check_ha_enabled: Checking if ha-lizard is enabled for pool: e264c9cf-1f50-ef0e-5fb4-185f497b3788
May 17 06:45:51 rx100 ha-lizard:  check_ha_enabled: ha-lizard is disabled
May 17 06:45:52 rx100 ha-lizard:  get_pool_host_list: enabled flag set - returning only hosts with enabled=true
May 17 06:45:52 rx100 ha-lizard:  get_pool_host_list: returned b139fd7e-9010-4969-a95f-c7717f31aa83#01234008e0e-4ad4-44b9-b6a2-4f60d1bf7c53
May 17 06:45:52 rx100 ha-lizard:  get_pool_ip_list: returned 172.16.130.10
May 17 06:45:52 rx100 ha-lizard:  get_pool_ip_list: returned 172.16.130.10 172.16.130.11
May 17 06:45:52 rx100 ha-lizard:  write_status_report: Writing status report
May 17 06:45:52 rx100 ha-lizard:  update_global_conf_params: Successfully updated global pool configuration settings in /etc/ha-lizard/ha-lizard.pool.conf.
May 17 06:45:52 rx100 ha-lizard:  update_global_conf_params: DISABLED_VAPPS=()#012ENABLE_LOGGING=1#012FENCE_ACTION=stop#012FENCE_ENABLED=1#012FENCE_FILE_LOC=/etc/ha-lizard/fence#012FENCE_HA_ONFAIL=0#012FENCE_HEURISTICS_IPS=10.0.10.1#012FENCE_HOST_FORGET=0#012FENCE_IPADDRESS=#012FENCE_METHOD=POOL#012FENCE_MIN_HOSTS=2#012FENCE_PASSWD=#012FENCE_QUORUM_REQUIRED=1#012FENCE_REBOOT_LONE_HOST=0#012FENCE_USE_IP_HEURISTICS=1#012GLOBAL_VM_HA=1#012HOST_SELECT_METHOD=0#012MAIL_FROM="root@localhost"#012MAIL_ON=1#012MAIL_SUBJECT="SYSTEM_ALERT-FROM_HOST:$HOSTNAME"#012MAIL_TO="root@localhost"#012MGT_LINK_LOSS_TOLERANCE=5#012MONITOR_DELAY=15#012MONITOR_KILLALL=1#012MONITOR_MAX_STARTS=20#012MONITOR_SCANRATE=10#012OP_MODE=2#012PROMOTE_SLAVE=1#012SLAVE_HA=1#012SLAVE_VM_STAT=0#012SMTP_PASS=""#012SMTP_PORT="25"#012SMTP_SERVER="127.0.0.1"#012SMTP_USER=""#012XAPI_COUNT=2#012XAPI_DELAY=10#012XC_FIELD_NAME='ha-lizard-enabled'#012XE_TIMEOUT=10
May 17 06:45:53 rx100 iscsi-ha: 20987 Spawning new instance of iscsi-ha
May 17 06:45:53 rx100 iscsi-ha: 20987 check_logger_processes Checking logger processes
May 17 06:45:53 rx100 iscsi-ha: 20987 check_logger_processes No processes to clear
May 17 06:45:53 rx100 iscsi-ha:  Normalized ISCSI_TARGET_SERVICE [ tgtd ]
May 17 06:45:53 rx100 iscsi-ha:  XenServer Major Release = [ 7 ]
May 17 06:45:53 rx100 iscsi-ha:  Mail Spool Directory Found /dev/shm/iscsi-ha-mail
May 17 06:45:53 rx100 iscsi-ha:  This iteration is count 14
May 17 06:45:53 rx100 iscsi-ha:  Checking if this host is a Pool Master or Slave
May 17 06:45:53 rx100 iscsi-ha:  This host's pool status = master
May 17 06:45:53 rx100 iscsi-ha: 21566 service_execute: Execute [ status ] on [ iscsi-ha ]
May 17 06:45:53 rx100 iscsi-ha: 21566 service_execute: System V mode detected
May 17 06:45:53 rx100 iscsi-ha:  auto_plug_pbd: Found LVMoISCSI SR List: 13df09b8-0957-2469-e683-d80be34157e9
May 17 06:45:53 rx100 iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon: /etc/iscsi-ha/iscsi-ha.func: line 198: [: ff7f0e05-b240-51ed-6c34-8411e3378fe9: binary operator expected
May 17 06:45:53 rx100 iscsi-ha: 21566 service_execute: [  OK  ]#015iscsi-ha running: 15713
May 17 06:45:53 rx100 iscsi-ha: 21566 service_execute: Returning exit status [ 0 ]
May 17 06:45:53 rx100 iscsi-ha: 21566 DRBD Running on this host: version: 8.4.5 (api:1/proto:86-101) srcversion: D496E56BBEBA8B1339BB34A
May 17 06:45:53 rx100 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: Scanning for Volume Group -> iscsi-sr: 13df09b8-0957-2469-e683-d80be34157e9
May 17 06:45:53 rx100 iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon: 1: Failure: (127) Device minor not allocated
May 17 06:45:53 rx100 iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon: additional info from kernel:
May 17 06:45:53 rx100 iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon: unknown minor
May 17 06:45:53 rx100 iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon: Command 'drbdsetup-84 role 1' terminated with exit code 10
May 17 06:45:53 rx100 iscsi-ha: 21566 check_drbd_resource_state: Error retrieving DRBD state for resouce: iscsi1 - check configuration
May 17 06:45:53 rx100 iscsi-ha: 21566 email: Mail Spool Directory Found /dev/shm/iscsi-ha-mail
May 17 06:45:53 rx100 iscsi-ha: 21566 email: Duplicate message - not sending. Content = check_drbd_resource_state: Error retrieving DRBD state for resouce: iscsi1 - check configuration
May 17 06:45:53 rx100 iscsi-ha: 21566 email: Message barred for 30 minutes
May 17 06:45:53 rx100 iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon: 1: Failure: (127) Device minor not allocated
May 17 06:45:53 rx100 iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon: additional info from kernel:
May 17 06:45:53 rx100 iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon: unknown minor
May 17 06:45:53 rx100 iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon: Command 'drbdsetup-84 primary 1' terminated with exit code 10
May 17 06:45:53 rx100 iscsi-ha: 21566 DRBD Resource: iscsi1 failed transition to Primary
May 17 06:45:53 rx100 iscsi-ha: 21566 email: Mail Spool Directory Found /dev/shm/iscsi-ha-mail
May 17 06:45:53 rx100 iscsi-ha: 21566 email: Duplicate message - not sending. Content = DRBD Resource: iscsi1 failed transition to Primary
May 17 06:45:53 rx100 iscsi-ha: 21566 email: Message barred for 30 minutes
May 17 06:45:53 rx100 iscsi-ha: 21566 Aborting promote to primary
May 17 06:45:54 rx100 iscsi-ha:  iscsi-ha Watchdog: iscsi-ha running - OK
However, one thing I see in the log on every install attempt is this:
iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon: 1: Failure: (127) Device minor not allocated
rx100 iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon: additional info from kernel:
iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon: unknown minor

Which sounds like bad news...
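That error means the kernel has no DRBD device for minor 1 at all, which matches the "Unconfigured" state drbd-overview reports above. A minimal sketch of checking for this from a script, assuming DRBD 8.4-style /proc/drbd output; the helper name drbd_minor_configured is hypothetical and not part of ha-lizard or iscsi-ha:

```shell
#!/usr/bin/env bash
# Sketch: report whether a DRBD minor is configured in the kernel, judged only
# from /proc/drbd. ASSUMES DRBD 8.4-style lines such as:
#   " 1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r---n-"
# HYPOTHETICAL helper; not part of ha-lizard or iscsi-ha.

drbd_minor_configured() {
    local proc_file=$1 minor=$2
    # Exit 0 if a line for "<minor>:" exists and is not cs:Unconfigured.
    # A missing line corresponds to "(127) Device minor not allocated".
    awk -v m="$minor:" '
        $1 == m { found = 1; if ($2 == "cs:Unconfigured") found = 0 }
        END { exit (found ? 0 : 1) }
    ' "$proc_file"
}

# Example on a live host (minor 1 / resource iscsi1 as in drbd-overview above):
# if ! drbd_minor_configured /proc/drbd 1; then
#     echo "minor 1 not configured; try: drbdadm up iscsi1" >&2
# fi
```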

/etc/tgt/targets.conf (identical on master/slave)
############### BEGIN HALIZARD INSERTION ###############
	<target iqn.2015.com.halizard:noSAN>
	backing-store /dev/drbd1
	lun 10
</target>
############### END HALIZARD INSERTION ###############

On one occasion I set up and installed a VM on the shared storage immediately after installation and experimented with HA. It all seemed to work fine, but after a reboot there was no storage...
Each time I always: 1. shut down the VMs (if any), 2. shut down the slave from XenCenter, 3. wait, 4. shut down the master.

Any ideas? It's been 3 days now and I can't even get on to properly testing HA-Lizard, VMs and HA at all :(

Full /var/log/user.log for both servers since last clean install:
cloud.hoppyspadge.com/index.php/s/WLCwmGrYREc7Zrl
cloud.hoppyspadge.com/index.php/s/VyssxeuCR3WubiI

Tks in advance
H


iSCSI shared storage 'unplugged' on reboot 6 years 11 months ago #1324

  • HoppySpadge (Topic Author)
  • Posts: 4
I nuked and paved again. This time I installed another NIC and bonded exactly as per the video guide, so there are 3 NICs: 1 for management and 2 bonded for iSCSI. Everything installed perfectly, and the storage worked and started syncing.
I shut down the slave, waited, then shut down the master. I waited, restarted the master, and again it could not see the storage on its own. I restarted the slave: same thing again. XenCenter says 'unplugged'.

Again there was no /dev/drbd, so I brought DRBD up manually on both machines:
drbdadm up iscsi1
and up it comes, and the 2 disks start syncing, so all seems well in DRBD land:
[root@rx100 ~]# drbd-overview
 1:iscsi1/0  SyncSource Primary/Secondary UpToDate/Inconsistent 
	[====>...............] sync'ed: 26.5% (584252/794700)M 

[root@rx100 ~]# ls /dev/d*
/dev/drbd1

/dev/disk:
by-id  by-label  by-partuuid  by-path  by-scsibus  by-scsid  by-uuid

/dev/drbd:
by-disk  by-res
[root@rx100 ~]# 
I then tried to restart the iscsi-ha service:
systemctl start iscsi-ha
[root@rx100 ~]# systemctl status iscsi-ha
● iscsi-ha.service - SYSV: iscsi-ha init script version 2.1 December 2016
   Loaded: loaded (/etc/rc.d/init.d/iscsi-ha)
   Active: active (running) since Wed 2017-05-17 14:53:22 BST; 2h 27min ago
     Docs: man:systemd-sysv-generator(8)
  Process: 1503 ExecStart=/etc/rc.d/init.d/iscsi-ha start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/iscsi-ha.service
           ├─ 1565 /bin/bash /etc/iscsi-ha/init/iscsi-ha.mon /var/run/iscsi-ha.mon.pid
           ├─ 1617 /bin/bash /etc/iscsi-ha/init/iscsi-ha.mon /var/run/iscsi-ha.mon.pid
           ├─ 1621 logger -t iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon
           ├─ 1623 /bin/bash /etc/iscsi-ha/init/iscsi-ha.mon /var/run/iscsi-ha.mon.pid
           ├─ 1627 logger -t iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon
           └─16873 sleep 10
[root@rx100 ~]#

Here's /var/log/user.log from when I restarted the iscsi-ha service:
May 17 17:21:24 rx100 iscsi-ha: 18197 Spawning new instance of iscsi-ha
May 17 17:21:24 rx100 iscsi-ha: 18197 check_logger_processes Checking logger processes
May 17 17:21:24 rx100 iscsi-ha: 18197 check_logger_processes No processes to clear
May 17 17:21:24 rx100 iscsi-ha:  Normalized ISCSI_TARGET_SERVICE [ tgtd ]
May 17 17:21:24 rx100 iscsi-ha:  XenServer Major Release = [ 7 ]
May 17 17:21:24 rx100 iscsi-ha:  Mail Spool Directory Found /dev/shm/iscsi-ha-mail
May 17 17:21:24 rx100 iscsi-ha:  This iteration is count 885
May 17 17:21:24 rx100 iscsi-ha:  Checking if this host is a Pool Master or Slave
May 17 17:21:24 rx100 iscsi-ha:  This host's pool status = master
May 17 17:21:24 rx100 iscsi-ha: 18716 service_execute: Execute [ status ] on [ iscsi-ha ]
May 17 17:21:24 rx100 iscsi-ha: 18716 service_execute: System V mode detected
May 17 17:21:24 rx100 iscsi-ha:  auto_plug_pbd: Found LVMoISCSI SR List: b78d4501-105a-988b-8c25-2d2587b7af93
May 17 17:21:24 rx100 iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon: /etc/iscsi-ha/iscsi-ha.func: line 198: [: 6295cd1f-5571-3298-92fc-075cada3e427: binary operator expected
May 17 17:21:24 rx100 iscsi-ha: 18716 service_execute: [  OK  ]#015iscsi-ha running: 1565
May 17 17:21:24 rx100 iscsi-ha: 18716 service_execute: Returning exit status [ 0 ]
May 17 17:21:24 rx100 iscsi-ha: 18716 DRBD Running on this host: version: 8.4.5 (api:1/proto:86-101) srcversion: D496E56BBEBA8B1339BB34A 1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r---n- ns:230797860 nr:0 dw:0 dr:230798488 al:0 bm:0 lo:0 pe:3 ua:1 ap:0 ep:1 wo:f oos:582978396 [====>...............] sync'ed: 28.4% (569312/794700)M finish: 5:48:45 speed: 27,852 (29,920) K/sec
May 17 17:21:24 rx100 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: Scanning for Volume Group -> iscsi-sr: b78d4501-105a-988b-8c25-2d2587b7af93
May 17 17:21:24 rx100 iscsi-ha: 18716 check_drbd_resource_state: DRBD Resource: iscsi1 in Primary mode
May 17 17:21:24 rx100 iscsi-ha: 18716 DRBD Resource: iscsi1 in SyncSource state - expected Connected state
May 17 17:21:24 rx100 iscsi-ha: 18716 email: Mail Spool Directory Found /dev/shm/iscsi-ha-mail
May 17 17:21:24 rx100 iscsi-ha: 18716 email: Duplicate message - not sending. Content = DRBD Resource: iscsi1 in SyncSource state - expected Connected state
May 17 17:21:24 rx100 iscsi-ha: 18716 email: Message barred for 30 minutes
May 17 17:21:24 rx100 iscsi-ha: 18716 service_execute: Execute [ status ] on [ tgtd ]
May 17 17:21:24 rx100 iscsi-ha: 18716 service_execute: systemctl mode being used
May 17 17:21:24 rx100 iscsi-ha: 18716 service_execute: ● tgtd.service - tgtd iSCSI target daemon#012   Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled; vendor preset: disabled)#012  Drop-In: /etc/systemd/system/tgtd.service.d#012           └─local.conf#012   Active: active (running) since Wed 2017-05-17 14:53:39 BST; 2h 27min ago#012  Process: 4056 ExecStartPost=/usr/sbin/tgtadm --op update --mode sys --name State -v ready (code=exited, status=0/SUCCESS)#012  Process: 4023 ExecStartPost=/usr/sbin/tgt-admin -e -c $TGTD_CONFIG (code=exited, status=0/SUCCESS)#012  Process: 3998 ExecStartPost=/usr/sbin/tgtadm --op update --mode sys --name State -v offline (code=exited, status=0/SUCCESS)#012  Process: 3145 ExecStartPost=/bin/sleep 5 (code=exited, status=0/SUCCESS)#012 Main PID: 3144 (tgtd)#012   CGroup: /system.slice/tgtd.service#012           └─3144 /usr/sbin/tgtd -f
May 17 17:21:24 rx100 iscsi-ha: 18716 service_execute: Returning exit status [ 0 ]
May 17 17:21:24 rx100 iscsi-ha: 18716 iSCSI target: tgtd status = OK. [ active (running) since Wed 2017-05-17 14:53:39 BST; 2h 27min ago ]
May 17 17:21:24 rx100 iscsi-ha: 18716 local_ip_list: Local IP list returned 127.0.0.1#01210.10.10.1#01210.10.10.3#012172.16.130.10
May 17 17:21:24 rx100 iscsi-ha: 18716 CHECKING IP 127.0.0.1
May 17 17:21:24 rx100 iscsi-ha: 18716 CHECKING IP 10.10.10.1
May 17 17:21:24 rx100 iscsi-ha: 18716 CHECKING IP 10.10.10.3
May 17 17:21:24 rx100 iscsi-ha: 18716 Virtual IP: 10.10.10.3 discovered on host rx100
May 17 17:21:24 rx100 iscsi-ha: 18716 send_replication_network_arp: Sending ARP update to peer
May 17 17:21:24 rx100 iscsi-ha: 18716 send_replication_network_arp: IP address list for [ xapi1 ] = [ 10.10.10.1#01210.10.10.3 ]
May 17 17:21:24 rx100 iscsi-ha: 18716 send_replication_network_arp: Updating ARP for device [ xapi1 ] IP [ 10.10.10.1 ]
May 17 17:21:25 rx100 ha-lizard:  ha-lizard Watchdog: ha-lizard running - OK
May 17 17:21:25 rx100 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: ARPING 10.10.10.1 from 10.10.10.1 xapi1
May 17 17:21:25 rx100 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: Sent 2 probes (2 broadcast(s))
May 17 17:21:25 rx100 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: Received 0 response(s)
May 17 17:21:25 rx100 iscsi-ha: 18716 send_replication_network_arp: Updating ARP for device [ xapi1 ] IP [ 10.10.10.3 ]
May 17 17:21:26 rx100 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: ARPING 10.10.10.3 from 10.10.10.3 xapi1
May 17 17:21:26 rx100 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: Sent 2 probes (2 broadcast(s))
May 17 17:21:26 rx100 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: Received 0 response(s)

xe pbd-list includes:
[root@rx100 ~]# xe pbd-list

uuid ( RO)                  : 6295cd1f-5571-3298-92fc-075cada3e427
             host-uuid ( RO): c92f0d79-d394-4574-9c18-a1ce6ef0a66b
               sr-uuid ( RO): b78d4501-105a-988b-8c25-2d2587b7af93
         device-config (MRO): target: 10.10.10.3; port: 3260; targetIQN: iqn.2015.com.halizard:noSAN; SCSIid: 360000000000000000e0000000001000a
    currently-attached ( RO): false

and also
[root@rx100 ~]# xe sr-list
uuid ( RO)                : b78d4501-105a-988b-8c25-2d2587b7af93
          name-label ( RW): iSCSI virtual disk storage
    name-description ( RW): iSCSI SR [10.10.10.3 (iqn.2015.com.halizard:noSAN; LUN 10: beaf110: 890 GB (IET))]
                host ( RO): <shared>
                type ( RO): lvmoiscsi
        content-type ( RO):

but manually plugging it fails:
[root@rx100 ~]# xe pbd-plug uuid=6295cd1f-5571-3298-92fc-075cada3e427
Error code: SR_BACKEND_FAILURE_47
Error parameters: , The SR is not available [opterr=[Errno 2] No such file or directory: '/dev/disk/by-scsid/360000000000000000e0000000001000a'],

systemctl restart iscsi makes no difference. I'm new to XenServer and have just been 'trying things that seem logical', so if anyone has a better idea of how to get the SR and PBD back up, please let me know!

tks in advance
H


iSCSI shared storage 'unplugged' on reboot 6 years 11 months ago #1325

  • Salvatore Costantino
  • Posts: 722
It looks like there is an issue with DRBD starting too early in the boot process. Our last release added a fix that re-attempts starting DRBD if it fails to start. The logic, however, could be improved: we rely on the proc file system to check whether DRBD started, but DRBD can start and remain running in an errored state. We will add an improvement to the logic and have something for you to test in a few days.
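A rough sketch of the kind of check described here: read the connection state (cs:) and local disk state (ds:) of the minor instead of only testing that DRBD shows up in /proc. It assumes DRBD 8.4-style /proc/drbd lines like the one in the log above; which states count as "errored", the retry count and the sleep times are assumptions, and the helper names are hypothetical, not the actual iscsi-ha fix:

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMES DRBD 8.4-style /proc/drbd output.
# HYPOTHETICAL helpers; not the iscsi-ha implementation.

# Print the value of one field (cs, ro or ds) for a minor, e.g. "SyncSource".
drbd_field() {
    local proc_file=$1 minor=$2 field=$3
    awk -v m="$minor:" -v f="$field:" '
        $1 == m {
            for (i = 2; i <= NF; i++)
                if (index($i, f) == 1) { print substr($i, length(f) + 1); exit }
        }
    ' "$proc_file"
}

# Return 0 if the minor looks usable: it is present, its connection state is
# not StandAlone/Unconfigured, and the LOCAL disk (left of "/" in ds:) is not
# Diskless or Failed. These thresholds are ASSUMPTIONS.
drbd_minor_healthy() {
    local proc_file=$1 minor=$2 cs ds
    cs=$(drbd_field "$proc_file" "$minor" cs)
    ds=$(drbd_field "$proc_file" "$minor" ds)
    [ -n "$cs" ] || return 1
    case $cs in StandAlone|Unconfigured) return 1 ;; esac
    case ${ds%%/*} in Diskless|Failed|"") return 1 ;; esac
    return 0
}

# Retry loop: bring the resource up, re-check, and tear it down again between
# attempts if it came up in a bad state.
ensure_drbd_up() {
    local res=$1 minor=$2 attempts=${3:-5} i
    for ((i = 1; i <= attempts; i++)); do
        drbdadm up "$res" 2>/dev/null
        sleep 5
        drbd_minor_healthy /proc/drbd "$minor" && return 0
        drbdadm down "$res" 2>/dev/null
        sleep 10
    done
    return 1
}
```

Tearing the resource down between attempts is only reasonable at boot, before tgtd or any VM is using /dev/drbd1.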

In the meantime, you can verify these findings by delaying the start of iscsi-ha.

Can you try disabling iscsi-ha, then rebooting the server as in your previous tests? Once the server has rebooted, wait a couple of minutes and then start iscsi-ha manually.


Disable:
chkconfig iscsi-ha-watchdog off
chkconfig iscsi-ha off

reboot as usual.

Start manually after the host has rebooted and a few minutes have elapsed:
service iscsi-ha start
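The manual workaround can be approximated by a script that waits until the host has been up for some minimum time and then starts iscsi-ha once. A sketch only: the 180-second default is an assumption standing in for "a few minutes", and remaining_delay / delayed_iscsi_ha_start are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: delay "service iscsi-ha start" until the host has been up for at
# least MIN_UPTIME seconds. The default and the helper names are ASSUMPTIONS.

# Print how many more seconds to wait, given an uptime file and a minimum.
remaining_delay() {
    local uptime_file=$1 min_uptime=$2 up
    read -r up _ < "$uptime_file"   # first field: seconds since boot, e.g. 45.12
    up=${up%%.*}                    # drop the fractional part
    if (( up >= min_uptime )); then
        echo 0
    else
        echo $(( min_uptime - up ))
    fi
}

# Wait (if needed), then start iscsi-ha as in the manual step above.
delayed_iscsi_ha_start() {
    local min_uptime=${1:-180}
    sleep "$(remaining_delay /proc/uptime "$min_uptime")"
    service iscsi-ha start
}
```

Run delayed_iscsi_ha_start 180 from a root shell after boot; it only automates the timing, not the underlying DRBD start-up problem.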


iSCSI shared storage 'unplugged' on reboot 6 years 11 months ago #1327

  • HoppySpadge (Topic Author)
  • Posts: 4
Many thanks :) Fixed, that works fine. Here are the truncated logs, and here's a timeline of what I did and when, so you can tally it with the log entries:

cloud.hoppyspadge.com/index.php/s/AKjsc3mJbczv52X
cloud.hoppyspadge.com/index.php/s/KjLDLvAwVrxY1HK

On "quad" (slave/secondary):
08:45 chkconfig iscsi-ha-watchdog off && chkconfig iscsi-ha off
08:47 shut down quad with the XenCenter shutdown button

On "rx100" (master/primary):
08:54 chkconfig iscsi-ha-watchdog off && chkconfig iscsi-ha off

Just to make sure:
[root@rx100 dev]# chkconfig iscsi-ha-watchdog off
[root@rx100 dev]# chkconfig iscsi-ha off
[root@rx100 dev]# systemctl is-enabled iscsi-ha
iscsi-ha.service is not a native service, redirecting to /sbin/chkconfig.
Executing /sbin/chkconfig iscsi-ha --level=5
disabled
[root@rx100 dev]# systemctl is-enabled iscsi-ha-watchdog
iscsi-ha-watchdog.service is not a native service, redirecting to /sbin/chkconfig.
Executing /sbin/chkconfig iscsi-ha-watchdog --level=5
disabled
[root@rx100 dev]#

08:55 shut down rx100 with the XenCenter shutdown button
09:14 rx100 finally shuts down!

09:30 Power up rx100 (master)
09:31 ssh in to rx100
09:40 service iscsi-ha start
About 3 minutes later the iSCSI disk appears:
[root@rx100 home]# find /dev -name "360000000000000000e0000000001000a"
/dev/disk/by-scsid/360000000000000000e0000000001000a
[root@rx100 home]#
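Since the by-scsid link can take a few minutes to appear, a sketch of polling for it before plugging the PBD. Waiting only helps once DRBD and tgtd are actually serving the LUN; it would not have helped in the earlier attempts where DRBD never came up. The helper name wait_for_path, the timeout and the interval are assumptions:

```shell
#!/usr/bin/env bash
# Sketch: poll for a device path, then plug the PBD. HYPOTHETICAL helper.

# Return 0 as soon as PATH exists, 1 if TIMEOUT seconds pass first.
wait_for_path() {
    local path=$1 timeout=$2 interval=${3:-5} waited=0
    while [ ! -e "$path" ]; do
        if (( waited >= timeout )); then
            return 1
        fi
        sleep "$interval"
        waited=$(( waited + interval ))
    done
    return 0
}

# Example: SCSI id and PBD uuid copied from outputs earlier in this thread;
# substitute the values from your own xe pbd-list.
# if wait_for_path /dev/disk/by-scsid/360000000000000000e0000000001000a 300; then
#     xe pbd-plug uuid=6295cd1f-5571-3298-92fc-075cada3e427
# fi
```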

and the storage repo is up and running. However, it has started waiting forever for the slave/secondary, which is still off.

09:46 power on "quad"


All is good. Storage repo is up and syncing and I have full control :)

Some questions if I may:

- Next time I shut down, should I run chkconfig iscsi-ha-watchdog off && chkconfig iscsi-ha off first? They are now disabled at boot, so I'll have to start them manually anyway.

- Any clues as to why this is happening? Is it some hardware issue with XenServer? The master machine is a Fujitsu Primergy RX100 S8 1U rack server with an Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz, 8GB RAM and 2 x 1TB spinning drives. It should be able to handle this easily. I note, however, that it is not on XenServer's Hardware Compatibility List (hcl.xenserver.org/servers/?serversupport...dor=13&form_factor=2). Could this be a factor? The other machine I was given for this project is another Primergy, a TX1330 M2, and I've been unable to install any recent version of XenServer on it at all: the install just hangs and the logs say nothing useful (damn useless!!). To complete this study I've had to dust off my 'runs absolutely everything' Core2 Quad box with an Intel G41M chipset. It's a bit slow but super reliable on all Linux OSs, and this box has no problems. If HA-Lizard works we might use the software in several installations. Do you see odd XenServer-related hardware issues? Or is it generally a case of "if it installs, it works"?

- I see iscsi-ha still runs as a SysV init script. systemd and cgroups apparently make starting, ordering and controlling processes much easier and more reliable. Could you not ship iscsi-ha as a native systemd unit?

Tks very much for your excellent support - I'll continue testing.
H


iSCSI shared storage 'unplugged' on reboot 6 years 11 months ago #1328

  • Salvatore Costantino
  • Posts: 722
Thanks for verifying.
We have seen a few cases like this since XenServer 7.x.
I am fairly certain it is related to systemd, which significantly reduces boot time compared to older System V based systems.

Regarding the DRBD wait forever: you can tweak the DRBD config and set a timeout if you wish.

I'll update this post once we have a fix ready for testing.
Thanks


iSCSI shared storage 'unplugged' on reboot 6 years 10 months ago #1330

Same problem here with two HA pools:
- XenServer 7.1, updated to the latest patch
- 4 x ProLiant DL380 Gen9 with 4 NICs each, in two pools of two servers

After a restart I don't have iSCSI, but if I restart the services manually all is OK. The commands I use are:
On master:
tgt-admin --update ALL
To see if it is running:
tgt-admin --dump | grep -o "backing-store \(.*\)" | sed -e "s/^backing-store \(.*\)$/\1/"
service tgtd restart
service drbd restart


On slave:
service drbd restart

To see if all is OK, on both master and slave:
iscsi-cfg status

It happens on both HA pools. I have reinstalled all the servers about 5-6 times and the result is the same.
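The backing-store check above can be wrapped in a small function so it can also be run against saved output. A sketch assuming tgt-admin --dump prints targets.conf-style "backing-store" lines, which is what the pipeline above relies on; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: same grep/sed pipeline as above, reading tgt-admin --dump style text
# on stdin and printing one backing-store path per line. HYPOTHETICAL names.
list_backing_stores() {
    grep -o 'backing-store .*' | sed -e 's/^backing-store //'
}

# Succeed if the given device (default /dev/drbd1, as in the targets.conf
# posted earlier in this thread) is among the exported backing stores.
exports_drbd_device() {
    local dev=${1:-/dev/drbd1}
    list_backing_stores | grep -qxF "$dev"
}

# On a live target:
# tgt-admin --dump | exports_drbd_device /dev/drbd1 && echo "drbd1 exported"
```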

