Problem Definition
A customer had installed NetScaler appliances in a high availability setup. The customer reported that the high availability pair was not forming a Link Aggregation Control Protocol (LACP) channel, even though the interfaces were marked as UP.
Environment
The customer had the following components set up in the network:
- Two NetScaler MPX 7500 appliances installed with NetScaler software release 9.2
- A Layer 2 switch running Cisco Internetwork Operating System (IOS) software release 12.2
Troubleshooting Methodology
To troubleshoot the issue, the Citrix Technical Support Engineers asked the customer to provide access to the network. The engineers observed that the Cisco switch was attempting to create an LACP channel but was not successful.
To further troubleshoot the issue, the engineers completed the following tasks:
- The engineers analyzed the /var/log/messages log file and noticed several entries reporting LACP receive (RX) state changes on interfaces 1/4 and 1/8, with both interfaces ending in the DEFAULTED state. The following is an excerpt from the messages log file:
Jan 25 10:31:58 <local0.notice> 172.28.23.28 01/25/2011:10:31:58 GMT end-eplb02-cl01 PPE-0 : EVENT NICLACPSC 1022 : Device "interface(1/4)" - RX state PORT_DISABLED
Jan 25 10:31:58 <local0.notice> 172.28.23.28 01/25/2011:10:31:58 GMT end-eplb02-cl01 PPE-0 : EVENT NICLACPSC 1023 : Device "interface(1/4)" - RX state LACP_DISABLED
Jan 25 10:32:42 <local0.notice> 172.28.23.28 01/25/2011:10:32:42 GMT end-eplb02-cl01 PPE-0 : EVENT NICLACPSC 1037 : Device "interface(1/8)" - RX state INIT
Jan 25 10:32:42 <local0.notice> 172.28.23.28 01/25/2011:10:32:42 GMT end-eplb02-cl01 PPE-0 : EVENT NICLACPSC 1038 : Device "interface(1/8)" - RX state PORT_DISABLED
Jan 25 10:32:42 <local0.notice> 172.28.23.28 01/25/2011:10:32:42 GMT end-eplb02-cl01 PPE-0 : EVENT NICLACPSC 1039 : Device "interface(1/8)" - RX state EXPIRED
Jan 25 10:32:42 <local0.notice> 172.28.23.28 01/25/2011:10:32:42 GMT end-eplb02-cl01 PPE-0 : EVENT NICLACPSC 1041 : Device "interface(1/4)" - RX state INIT
Jan 25 10:32:42 <local0.notice> 172.28.23.28 01/25/2011:10:32:42 GMT end-eplb02-cl01 PPE-0 : EVENT NICLACPSC 1042 : Device "interface(1/4)" - RX state PORT_DISABLED
Jan 25 10:32:42 <local0.notice> 172.28.23.28 01/25/2011:10:32:42 GMT end-eplb02-cl01 PPE-0 : EVENT NICLACPSC 1043 : Device "interface(1/4)" - RX state EXPIRED
Jan 25 10:32:45 <local0.notice> 172.28.23.28 01/25/2011:10:32:45 GMT end-eplb02-cl01 PPE-0 : EVENT NICLACPSC 1047 : Device "interface(1/8)" - RX state DEFAULTED
Jan 25 10:32:45 <local0.notice> 172.28.23.28 01/25/2011:10:32:45 GMT end-eplb02-cl01 PPE-0 : EVENT NICLACPSC 1048 : Device "interface(1/4)" - RX state DEFAULTED
- The engineers analyzed the /var/log/newnslog file. They used the LACP counters in newnslog to check the number of Link Aggregation Control Protocol Data Units (LACPDUs) received and transmitted by the appliance, and asked the customer to run the following commands:
- $ nsconmsg -K newnslog -g rx_lacp -d current | more
NetScaler V20 Performance Data
NetScaler NS9.2: Build 49.8.nc, Date: Nov 15 2010, 11:42:29
The output contained no data rows: the appliance was not receiving any LACPDUs from the switch.
- $ nsconmsg -K newnslog -g tx_lacp -d current | more
NetScaler V20 Performance Data
NetScaler NS9.2: Build 49.8.nc, Date: Nov 15 2010, 11:42:29
  Index   rtime  totalcount-val  delta  rate/sec  symbol-name&device-no
      0   21000               4      1         0  nic_tot_tx_lacpdus interface(1/4)
      1    7000               4      1         0  nic_tot_tx_lacpdus interface(1/8)
      2   21000               5      1         0  nic_tot_tx_lacpdus interface(1/4)
      3    7000               5      1         0  nic_tot_tx_lacpdus interface(1/8)
      4   28000               6      1         0  nic_tot_tx_lacpdus interface(1/8)
      5       0               6      1         0  nic_tot_tx_lacpdus interface(1/4)
      6   28000               7      1         0  nic_tot_tx_lacpdus interface(1/8)
      7       0               7      1         0  nic_tot_tx_lacpdus interface(1/4)
      8   28000               8      1         0  nic_tot_tx_lacpdus interface(1/4)
      9    7000               8      1         0  nic_tot_tx_lacpdus interface(1/8)
     10   28000               9      1         0  nic_tot_tx_lacpdus interface(1/8)
     11       0               9      1         0  nic_tot_tx_lacpdus interface(1/4)
     12   28000              10      1         0  nic_tot_tx_lacpdus interface(1/8)
     13       0              10      1         0  nic_tot_tx_lacpdus interface(1/4)
     14   28000              11      1         0  nic_tot_tx_lacpdus interface(1/4)
The nic_tot_tx_lacpdus counters for interfaces 1/4 and 1/8 kept increasing, so the appliance was sending LACPDUs to the switch while the interfaces were marked as UP.
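The RX state entries in the messages log excerpt correspond to the LACP receive machine defined in IEEE 802.1AX (formerly 802.3ad): an interface that receives no LACPDU from its partner before its timer expires moves through EXPIRED to DEFAULTED, whereas the CURRENT state means partner LACPDUs are arriving. The following Python sketch is an illustration only, not a Citrix tool. It assumes the NICLACPSC message format shown in the excerpt, and the function names `last_rx_states` and `defaulted_interfaces` are hypothetical. It reports the last RX state logged for each interface and flags those that ended in DEFAULTED.

```python
import re

# Matches NICLACPSC event lines such as:
#   ... EVENT NICLACPSC 1048 : Device "interface(1/4)" - RX state DEFAULTED
# Assumption: the format is inferred from the log excerpt, not from Citrix documentation.
RX_STATE_RE = re.compile(
    r'NICLACPSC \d+ :.*?Device "interface\((?P<ifnum>[0-9/]+)\)" - RX state (?P<state>[A-Z_]+)'
)


def last_rx_states(lines):
    """Return {interface: last logged RX state}, keyed in order of first appearance."""
    states = {}
    for line in lines:
        match = RX_STATE_RE.search(line)
        if match:
            # Later entries overwrite earlier ones, so the final value is the latest state.
            states[match.group("ifnum")] = match.group("state")
    return states


def defaulted_interfaces(lines):
    """Interfaces whose most recent RX state is DEFAULTED (no partner LACPDUs received)."""
    return sorted(i for i, s in last_rx_states(lines).items() if s == "DEFAULTED")


# Abbreviated lines from the log excerpt, in their original order.
SAMPLE_LOG = [
    'EVENT NICLACPSC 1039 : Device "interface(1/8)" - RX state EXPIRED',
    'EVENT NICLACPSC 1041 : Device "interface(1/4)" - RX state INIT',
    'EVENT NICLACPSC 1043 : Device "interface(1/4)" - RX state EXPIRED',
    'EVENT NICLACPSC 1047 : Device "interface(1/8)" - RX state DEFAULTED',
    'EVENT NICLACPSC 1048 : Device "interface(1/4)" - RX state DEFAULTED',
]

if __name__ == "__main__":
    print(last_rx_states(SAMPLE_LOG))
    print(defaulted_interfaces(SAMPLE_LOG))
```

Run against the sample, both interfaces are flagged. An interface whose last logged state is CURRENT would not be listed.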
Resolution
From the analysis of the newnslog file, the engineers identified that the switch was not sending LACPDUs to the appliance. They asked the customer to connect the NetScaler appliance to a different switch, after which the LACP channel was established.
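The conclusion above (LACPDUs transmitted, none received) can also be checked mechanically from saved nsconmsg output. The following Python sketch is illustrative only: it assumes the column layout shown in the tx_lacp output (Index, rtime, totalcount-val, delta, rate/sec, symbol name, device) and matches receive and transmit rows by device name, because the name of the receive counter does not appear in this article. The function names `latest_totals` and `one_way_interfaces` are hypothetical.

```python
import re

# One data row of "nsconmsg ... -d current" output, for example:
#   14   28000   11   1   0  nic_tot_tx_lacpdus interface(1/4)
# Assumed columns: Index, rtime, totalcount-val, delta, rate/sec, symbol name, device.
ROW_RE = re.compile(
    r"^\s*\d+\s+\d+\s+(?P<total>\d+)\s+-?\d+\s+\S+\s+\S+\s+(?P<device>\S+)\s*$"
)


def latest_totals(output):
    """Map each device to the highest totalcount-val reported for it."""
    totals = {}
    for line in output.splitlines():
        match = ROW_RE.match(line)  # header lines do not start with digits, so they are skipped
        if match:
            device = match.group("device")
            totals[device] = max(totals.get(device, 0), int(match.group("total")))
    return totals


def one_way_interfaces(rx_output, tx_output):
    """Devices that sent LACPDUs but show no received-LACPDU count."""
    rx = latest_totals(rx_output)
    tx = latest_totals(tx_output)
    return sorted(dev for dev, sent in tx.items() if sent > 0 and rx.get(dev, 0) == 0)


RX_SAMPLE = """NetScaler V20 Performance Data
NetScaler NS9.2: Build 49.8.nc, Date: Nov 15 2010, 11:42:29
"""

TX_SAMPLE = """NetScaler V20 Performance Data
NetScaler NS9.2: Build 49.8.nc, Date: Nov 15 2010, 11:42:29
  Index   rtime  totalcount-val  delta  rate/sec  symbol-name&device-no
     12   28000              10      1         0  nic_tot_tx_lacpdus interface(1/8)
     13       0              10      1         0  nic_tot_tx_lacpdus interface(1/4)
     14   28000              11      1         0  nic_tot_tx_lacpdus interface(1/4)
"""

if __name__ == "__main__":
    print(one_way_interfaces(RX_SAMPLE, TX_SAMPLE))
```

With the outputs from this case, both interfaces are reported as one-way. If the switch were sending LACPDUs, the receive totals would be nonzero and the list would be empty.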
More Information
Refer to the Knowledge Center article CTX125102 – Netscaler NIC Counters for a list of the newnslog Network Interface Card (NIC) variables and their descriptions.