Forum

TOPIC:

Slave won't connect to ISCSI 4 years 8 months ago #1996

  • Jim McNamara (Topic Author)
  • Posts: 3
Hello! I've had a 2-node setup that has been fine for about 9 months. Around the time we hit the US daylight saving time change, the slave somehow lost its connection to the master. The master (10.10.10.1) holds the virtual IP 10.10.10.3, but until I rebooted it, the slave (10.10.10.2) could not even ping, let alone log in to, the iSCSI target at 10.10.10.3.

After the reboot both machines can ping all 3 addresses on the 10.10.10 subnet, but the slave still will not reconnect to the iSCSI target. When I try to "repair" in XCP-ng Center I am given the message:
The storage repository is not available

On the slave, this is what the iscsi-cfg log shows:
Mar  9 01:14:30 office2 iscsi-ha:  iscsi-ha Watchdog: iscsi-ha running - OK
Mar  9 01:14:33 office2 iscsi-ha: 3829 Spawning new instance of iscsi-ha
Mar  9 01:14:33 office2 iscsi-ha: 3829 check_logger_processes Checking logger processes
Mar  9 01:14:33 office2 iscsi-ha: 3829 check_logger_processes No processes to clear
Mar  9 01:14:33 office2 iscsi-ha:  Normalized ISCSI_TARGET_SERVICE [ tgtd ]
Mar  9 01:14:33 office2 iscsi-ha:  XenServer Major Release = [ 7 ]
Mar  9 01:14:33 office2 iscsi-ha:  Mail Spool Directory Found /dev/shm/iscsi-ha-mail
Mar  9 01:14:33 office2 iscsi-ha:  This iteration is count 53
Mar  9 01:14:33 office2 iscsi-ha:  Checking if this host is a Pool Master or Slave
Mar  9 01:14:33 office2 iscsi-ha:  This host's pool status = slave:192.168.37.101
Mar  9 01:14:33 office2 iscsi-ha:  service_execute: Execute [ status ] on [ iscsi-ha ]
Mar  9 01:14:33 office2 iscsi-ha:  service_execute: System V mode detected
Mar  9 01:14:33 office2 iscsi-ha:  service_execute: [  OK  ]#015iscsi-ha running: 7360
Mar  9 01:14:33 office2 iscsi-ha:  service_execute: Returning exit status [ 0 ]
Mar  9 01:14:33 office2 iscsi-ha: 4292 local_ip_list: Local IP list returned 127.0.0.1#01210.10.10.2#012192.168.37.102
Mar  9 01:14:33 office2 iscsi-ha: 4292 service_execute: Execute [ status ] on [ tgtd ]
Mar  9 01:14:33 office2 iscsi-ha: 4292 service_execute: systemctl mode being used
Mar  9 01:14:33 office2 iscsi-ha: 4292 service_execute: ● tgtd.service - tgtd iSCSI target daemon#012   Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled; vendor preset: disabled)#012  Drop-In: /etc/systemd/system/tgtd.service.d#012           └─local.conf#012   Active: inactive (dead)
Mar  9 01:14:33 office2 iscsi-ha: 4292 service_execute: Returning exit status [ 3 ]
Mar  9 01:14:33 office2 iscsi-ha: 4292 iSCSI target: tgtd status stopped. Expected Stopped . [inactive (dead)]
Mar  9 01:14:33 office2 iscsi-ha: 4292 DRBD Running on this host: version: 8.4.5 (api:1/proto:86-101)#012srcversion: 2A6B2FA4F0703B49CA9C727 #012#012 1: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----#012    ns:0 nr:12663452 dw:12663452 dr:0 al:0 bm:0 lo:0 pe:24 ua:0 ap:0 ep:1 wo:f oos:3207867780#012#011[>....................] sync'ed:  0.3% (3132680/3140508)M#012#011finish: 146:57:55 speed: 6,044 (4,700) want: 54,360 K/sec
Mar  9 01:14:33 office2 iscsi-ha: 4292 validate_drbd_resources_loaded: Checking DRBD has loaded with resources. Checking [ 7 ] > [ 2 ]
Mar  9 01:14:33 office2 iscsi-ha: 4292 validate_drbd_resources_loaded: Resources loaded
Mar  9 01:14:33 office2 iscsi-ha: 4292 check_drbd_resource_state: DRBD Resource: iscsi1 in Secondary mode
Mar  9 01:14:33 office2 iscsi-ha: 4292 DRBD Resource: [iscsi1] in [SyncTarget] state - expected Connected state
Mar  9 01:14:33 office2 iscsi-ha: 4292 email: Mail Spool Directory Found /dev/shm/iscsi-ha-mail
Mar  9 01:14:33 office2 iscsi-ha: 4292 email: Duplicate message - not sending. Content = DRBD Resource:       [iscsi1] in [SyncTarget] state - expected Connected state
Mar  9 01:14:33 office2 iscsi-ha: 4292 email: Message barred for 30 minutes
Mar  9 01:14:33 office2 iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon:   /run/lvm/lvmetad.socket: connect failed: No such file or directory
Mar  9 01:14:33 office2 iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon:   WARNING: Failed to connect to lvmetad. Falling back to internal scanning.
Mar  9 01:14:33 office2 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: Scanning for Volume Group -> iscsi-sr: b965bb7f-0cc8-6ae4-24c4-37d13cc04891
Mar  9 01:14:35 office2 iscsi-ha:  iscsi-ha Watchdog: iscsi-ha running - OK
Mar  9 01:14:40 office2 iscsi-ha:  iscsi-ha Watchdog: iscsi-ha running - OK

Here is what the master has to say on its iscsi-cfg log:
Mar  9 01:15:47 office1 iscsi-ha: 7309 Spawning new instance of iscsi-ha
Mar  9 01:15:47 office1 iscsi-ha: 7309 check_logger_processes Checking logger processes
Mar  9 01:15:47 office1 iscsi-ha: 7309 check_logger_processes No processes to clear
Mar  9 01:15:47 office1 iscsi-ha:  Normalized ISCSI_TARGET_SERVICE [ tgtd ]
Mar  9 01:15:47 office1 iscsi-ha:  XenServer Major Release = [ 7 ]
Mar  9 01:15:47 office1 iscsi-ha:  Mail Spool Directory Found /dev/shm/iscsi-ha-mail
Mar  9 01:15:47 office1 iscsi-ha:  This iteration is count 2150
Mar  9 01:15:47 office1 iscsi-ha:  Checking if this host is a Pool Master or Slave
Mar  9 01:15:47 office1 iscsi-ha:  This host's pool status = master
Mar  9 01:15:47 office1 iscsi-ha: 8057 service_execute: Execute [ status ] on [ iscsi-ha ]
Mar  9 01:15:47 office1 iscsi-ha: 8057 service_execute: System V mode detected
Mar  9 01:15:47 office1 iscsi-ha:  auto_plug_pbd: Found LVMoISCSI SR List: b965bb7f-0cc8-6ae4-24c4-37d13cc04891
Mar  9 01:15:47 office1 iscsi-ha: 8057 service_execute: [  OK  ]#015iscsi-ha running: 25574
Mar  9 01:15:47 office1 iscsi-ha: 8057 service_execute: Returning exit status [ 0 ]
Mar  9 01:15:47 office1 iscsi-ha: 8057 DRBD Running on this host: version: 8.4.5 (api:1/proto:86-101) srcversion: 2A6B2FA4F0703B49CA9C727 1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----- ns:634910084 nr:0 dw:141473328 dr:1522968804 al:19041574 bm:0 lo:3 pe:118 ua:0 ap:120 ep:1 wo:f oos:3207584004 [>....................] sync'ed: 0.3% (3132404/3140508)M finish: 131:22:48 speed: 6,764 (4,664) K/sec
Mar  9 01:15:47 office1 iscsi-ha: 8057 validate_drbd_resources_loaded: Checking DRBD has loaded with resources. Checking [ 7 ] > [ 2 ]
Mar  9 01:15:47 office1 iscsi-ha: 8057 validate_drbd_resources_loaded: Resources loaded
Mar  9 01:15:47 office1 iscsi-ha: 8057 check_drbd_resource_state: DRBD Resource: iscsi1 in Primary mode
Mar  9 01:15:47 office1 iscsi-ha: 8057 DRBD Resource: iscsi1 in [SyncSource] state - expected Connected state
Mar  9 01:15:47 office1 iscsi-ha: 8057 email: Mail Spool Directory Found /dev/shm/iscsi-ha-mail
Mar  9 01:15:47 office1 iscsi-ha: 8057 email: Duplicate message - not sending. Content = DRBD Resource: iscsi1 in [SyncSource] state - expected Connected state
Mar  9 01:15:47 office1 iscsi-ha: 8057 email: Message barred for 30 minutes
Mar  9 01:15:47 office1 iscsi-ha: 8057 service_execute: Execute [ status ] on [ tgtd ]
Mar  9 01:15:47 office1 iscsi-ha: 8057 service_execute: systemctl mode being used
Mar  9 01:15:47 office1 iscsi-ha: 8057 service_execute: ● tgtd.service - tgtd iSCSI target daemon#012   Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled; vendor preset: disabled)#012  Drop-In: /etc/systemd/system/tgtd.service.d#012           └─local.conf#012   Active: active (running) since Mon 2019-07-08 16:47:11 EDT; 8 months 0 days ago#012 Main PID: 25786 (tgtd)#012   CGroup: /system.slice/tgtd.service#012           └─25786 /usr/sbin/tgtd -f
Mar  9 01:15:47 office1 iscsi-ha: 8057 service_execute: Returning exit status [ 0 ]
Mar  9 01:15:47 office1 iscsi-ha: 8057 iSCSI target: tgtd status = OK. [ active (running) since Mon 2019-07-08 16:47:11 EDT; 8 months 0 days ago ]
Mar  9 01:15:47 office1 iscsi-ha: 8057 local_ip_list: Local IP list returned 127.0.0.1#01210.10.10.1#01210.10.10.3#012192.168.37.101
Mar  9 01:15:47 office1 iscsi-ha: 8057 CHECKING IP 127.0.0.1
Mar  9 01:15:47 office1 iscsi-ha: 8057 CHECKING IP 10.10.10.1
Mar  9 01:15:47 office1 iscsi-ha: 8057 CHECKING IP 10.10.10.3
Mar  9 01:15:47 office1 iscsi-ha: 8057 Virtual IP: 10.10.10.3 discovered on host office1
Mar  9 01:15:47 office1 iscsi-ha: 8057 send_replication_network_arp: Sending ARP update to peer
Mar  9 01:15:47 office1 iscsi-ha: 8057 send_replication_network_arp: IP address list for [ xapi2 ] = [ 10.10.10.1#01210.10.10.3 ]
Mar  9 01:15:47 office1 iscsi-ha: 8057 send_replication_network_arp: Updating ARP for device [ xapi2 ] IP [ 10.10.10.1 ]
Mar  9 01:15:48 office1 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: ARPING 10.10.10.1 from 10.10.10.1 xapi2
Mar  9 01:15:48 office1 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: Sent 2 probes (2 broadcast(s))
Mar  9 01:15:48 office1 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: Received 0 response(s)
Mar  9 01:15:48 office1 iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon: Error code: SR_BACKEND_FAILURE_47
Mar  9 01:15:48 office1 iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon: Error parameters: , The SR is not available [opterr=The SR is not available [opterr=no such volume group: VG_XenStorage-b965bb7f-0cc8-6ae4-24c4-37d13cc04891]],
Mar  9 01:15:48 office1 iscsi-ha:  auto_plug_pbd: Successfully plugged PBD: [85af2a74-189a-56ca-5f54-4b09b7a20947] SR: [b965bb7f-0cc8-6ae4-24c4-37d13cc04891]
Mar  9 01:15:48 office1 iscsi-ha: 8057 send_replication_network_arp: Updating ARP for device [ xapi2 ] IP [ 10.10.10.3 ]
Mar  9 01:15:48 office1 iscsi-ha:  email: Mail Spool Directory Found /dev/shm/iscsi-ha-mail
Mar  9 01:15:48 office1 iscsi-ha:  email: Duplicate message - not sending. Content = auto_plug_pbd: Successfully plugged PBD: [85af2a74-189a-56ca-5f54-4b09b7a20947] SR: [b965bb7f-0cc8-6ae4-24c4-37d13cc04891]
Mar  9 01:15:48 office1 iscsi-ha:  email: Message barred for 30 minutes
Mar  9 01:15:48 office1 iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon:   /run/lvm/lvmetad.socket: connect failed: No such file or directory
Mar  9 01:15:48 office1 iscsi-ha-ERROR-/etc/iscsi-ha/init/iscsi-ha.mon:   WARNING: Failed to connect to lvmetad. Falling back to internal scanning.
Mar  9 01:15:49 office1 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: ARPING 10.10.10.3 from 10.10.10.3 xapi2
Mar  9 01:15:49 office1 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: Sent 2 probes (2 broadcast(s))
Mar  9 01:15:49 office1 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: Received 0 response(s)
Mar  9 01:15:49 office1 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: Scanning for Volume Group -> iscsi-sr: b965bb7f-0cc8-6ae4-24c4-37d13cc04891
Mar  9 01:15:49 office1 iscsi-ha-NOTICE-/etc/iscsi-ha/init/iscsi-ha.mon: Volume Group for iSCSI-SR found OK: b965bb7f-0cc8-6ae4-24c4-37d13cc04891
Mar  9 01:15:51 office1 iscsi-ha:  iscsi-ha Watchdog: iscsi-ha running - OK

Any idea what is happening and what I can do to get the slave seeing the iSCSI share again?

Slave won't connect to ISCSI 4 years 8 months ago #1998

  • Jim McNamara (Topic Author)
  • Posts: 3
Some iSCSI functionality seems to be failing. From the slave I see this:
[root@office2 ~]# ping -c 4 10.10.10.3
PING 10.10.10.3 (10.10.10.3) 56(84) bytes of data.
64 bytes from 10.10.10.3: icmp_seq=1 ttl=64 time=2.48 ms
64 bytes from 10.10.10.3: icmp_seq=2 ttl=64 time=0.145 ms
64 bytes from 10.10.10.3: icmp_seq=3 ttl=64 time=0.122 ms
64 bytes from 10.10.10.3: icmp_seq=4 ttl=64 time=0.152 ms

--- 10.10.10.3 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.122/0.726/2.485/1.015 ms
[root@office2 ~]# grep 10.10.10.3 /proc/net/arp 
10.10.10.3       0x1         0x2         24:6e:96:27:37:da     *        xapi2
[root@office2 ~]# iscsiadm -m discovery -t st -p 10.10.10.3
iscsiadm: cannot make connection to 10.10.10.3: No route to host
iscsiadm: cannot make connection to 10.10.10.3: No route to host
iscsiadm: cannot make connection to 10.10.10.3: No route to host
iscsiadm: cannot make connection to 10.10.10.3: No route to host
iscsiadm: cannot make connection to 10.10.10.3: No route to host
iscsiadm: cannot make connection to 10.10.10.3: No route to host
iscsiadm: connection login retries (reopen_max) 5 exceeded
iscsiadm: Could not perform SendTargets discovery: encountered connection failure

From the master, there seems to be no issue:
[root@office1 ~]# ping -c4 10.10.10.3
PING 10.10.10.3 (10.10.10.3) 56(84) bytes of data.
64 bytes from 10.10.10.3: icmp_seq=1 ttl=64 time=0.023 ms
64 bytes from 10.10.10.3: icmp_seq=2 ttl=64 time=0.026 ms
64 bytes from 10.10.10.3: icmp_seq=3 ttl=64 time=0.022 ms
64 bytes from 10.10.10.3: icmp_seq=4 ttl=64 time=0.023 ms

--- 10.10.10.3 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2997ms
rtt min/avg/max/mdev = 0.022/0.023/0.026/0.005 ms
[root@office1 ~]# /sbin/ifconfig xapi2
xapi2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.10.10.1  netmask 255.255.255.0  broadcast 10.10.10.255
        ether 24:6e:96:27:37:da  txqueuelen 1  (Ethernet)
        RX packets 32654089251  bytes 48547202244056 (44.1 TiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 23043446272  bytes 95475767408621 (86.8 TiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@office1 ~]# iscsiadm -m discovery -t st -p 10.10.10.3
10.10.10.3:3260,1 iqn.2015.com.halizard:noSAN

As the directions state, we have both interfaces on the 10.10.10.0/24 network directly connected, without a switch in between.
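
A successful ping only shows that ICMP gets through; iSCSI discovery needs TCP port 3260 (the port shown in the master's discovery output above). A firewall that rejects TCP with an ICMP host-prohibited reply is one common way to get "No route to host" while ping still works. Below is a minimal sketch for testing the TCP path on its own, assuming bash and the coreutils `timeout` command are available. `check_tcp` is a hypothetical helper name, not part of HA-Lizard or iscsi-ha.

```shell
#!/usr/bin/env bash
# Sketch only: check_tcp is a hypothetical helper, not an HA-Lizard tool.
# It tests whether a TCP connection can be opened, independently of ICMP.
check_tcp() {
    local host=$1 port=$2
    # bash's /dev/tcp pseudo-device opens a TCP connection;
    # timeout caps the wait at 3 seconds if packets are silently dropped.
    if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "open"
    else
        echo "blocked-or-closed"
        return 1
    fi
}

# Example, run on the slave:
#   check_tcp 10.10.10.3 3260
```

If this reports "blocked-or-closed" while ping succeeds, the problem is more likely filtering or a stopped target service than cabling or addressing.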

Slave won't connect to ISCSI 4 years 8 months ago #1999

  • Salvatore Costantino
  • Posts: 727
Can you confirm that your hosts are using a time server and are in sync? Given that this coincided with the daylight saving time change, the root cause could be related to that. XenServer is known to be sensitive to clock drift between pool members.

Regarding the second update, it looks like the firewall rule that allows the storage to be exposed may have been affected. There should be a rule in iptables allowing traffic to 10.10.10.0/24. You can check whether it is there with:
iptables -L

Or simply try stopping the firewall to see whether that resolves the issue:
service iptables stop
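
The rule check can also be scripted rather than read by eye. The sketch below assumes the rules are listed in `iptables -S` format and that the exemption is an ACCEPT rule naming the subnet literally; the exact form of the rule on a given install is an assumption, and `has_subnet_rule` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: has_subnet_rule is a hypothetical helper.
# Reads rules in `iptables -S` format on stdin and succeeds if at least one
# line both names the given subnet and jumps to ACCEPT.
has_subnet_rule() {
    local subnet=$1
    grep -F -- "$subnet" | grep -q -- '-j ACCEPT'
}

# Example, run as root:
#   iptables -S | has_subnet_rule 10.10.10.0/24 && echo present || echo missing
```

This is less disruptive than stopping the firewall, since it only reads the loaded rule set.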

Slave won't connect to ISCSI 4 years 8 months ago #2000

  • Jim McNamara (Topic Author)
  • Posts: 3
The firewall was the true source of the issue. The time jump was a factor as well, but a smaller one. Both machines had ntp running and were closely in sync with the ntp servers. The problem was that the 10.10.10.0/24 exemption had been removed from the iptables config file some time ago, but the service had never been restarted, so the old rule set with the exemption was still loaded. When the time jump happened I put 10.10.10.2 into maintenance mode and rebooted it, and on restart the exemption was missing. Thank you for the solution, Salvatore!
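
For anyone wanting to catch this kind of drift before a reboot exposes it, here is a rough sketch that compares the loaded rules with the saved file. It assumes the saved rules are in `iptables-save` format at `/etc/sysconfig/iptables` (the usual location on CentOS-based hosts, but an assumption here). `normalize_rules` and `rules_match` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical, not part of HA-Lizard.

# Strip comment lines and the [packets:bytes] counters on chain lines,
# so that only the rules themselves are compared.
normalize_rules() {
    grep -v '^#' "$1" | sed -E 's/ \[[0-9]+:[0-9]+\]$//'
}

# Succeeds (exit 0) when two rule dumps are equivalent; otherwise prints
# the diff and returns non-zero.
rules_match() {
    diff <(normalize_rules "$1") <(normalize_rules "$2")
}

# Example, run as root (the saved-file path is an assumption):
#   iptables-save > /tmp/running.rules
#   rules_match /tmp/running.rules /etc/sysconfig/iptables \
#       || echo "running rules differ from saved config"
```

A non-empty diff means the next service restart or reboot will load a different rule set from the one currently in effect.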
