Drew Weaver | 23 Mar 14:14 2011

[c-nsp] C6509 funky secondary reloading/errors.

Howdy,

We have a C6509 with 2x WS-SUP720.

We have been having an issue where the secondary sup will just reload seemingly at random:

2011-03-23 08:49:06	Local7.Info	xxxxx	212267: Mar 23 07:49:36.903 EST: %PFREDUN-SP-6-ACTIVE: Standby processor removed or reloaded, changing to Simplex mode

There is no other log message about it, apart from this one:

2011-03-23 08:49:06	Local7.Error	xxxxx	212268: Mar 23 07:49:36.903 EST: %OIR-SP-3-PWRCYCLE: Card in module 6, is being power-cycled 'Module reset'

After the card loaded I noticed these:

Mar 10 15:15:11 EST: %CONST_DIAG-SP-3-HM_TEST_FAIL: Module 6 TestSPRPInbandPing consecutive failure count:10
Mar 10 15:15:11 EST: %CONST_DIAG-SP-6-HM_TEST_INFO: CPU util(5sec): SP=14% RP=6% Traffic=6%
netint_thr_active[0], Tx_Rate[1233], Rx_Rate[881], dev=4[IPv6, fail=10]
Mar 10 15:15:11 EST: %CONST_DIAG-SP-4-HM_TEST_WARNING: Sup switchover will occur after 10 consecutive failures
Mar 10 15:16:44 EST: %CONST_DIAG-SP-3-HM_TEST_FAIL: Module 6 TestSPRPInbandPing consecutive failure count:15
Mar 10 15:16:44 EST: %CONST_DIAG-SP-6-HM_TEST_INFO: CPU util(5sec): SP=13% RP=22 % Traffic=5%
netint_thr_active[0], Tx_Rate[872], Rx_Rate[640], dev=4[IPv6, fail=15]
Mar 10 15:16:44 EST: %CONST_DIAG-SP-4-HM_TEST_WARNING: Sup switchover will occur after 10 consecutive failures
Mar 10 15:18:16 EST: %CONST_DIAG-SP-3-HM_TEST_FAIL: Module 6 TestSPRPInbandPing consecutive failure count:20
Mar 10 15:18:16 EST: %CONST_DIAG-SP-6-HM_TEST_INFO: CPU util(5sec): SP=18% RP=1% Traffic=5%
netint_thr_active[0], Tx_Rate[978], Rx_Rate[665], dev=4[IPv6, fail=20]
Mar 10 15:18:16 EST: %CONST_DIAG-SP-4-HM_TEST_WARNING: Sup switchover will occur after 10 consecutive failures
Mar 10 15:19:51 EST: %CONST_DIAG-SP-3-HM_TEST_FAIL: Module 6 TestSPRPInbandPing consecutive failure count:25
Mar 10 15:19:51 EST: %CONST_DIAG-SP-6-HM_TEST_INFO: CPU util(5sec): SP=18% RP=31% Traffic=5%
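
For anyone who wants to tabulate messages like these, here is a rough Python sketch that pulls the timestamp, module, test name and consecutive failure count out of the HM_TEST_FAIL lines. The regex and the names `FAIL_RE` and `parse_failures` are my own assumptions based purely on the lines quoted above, not anything from a Cisco tool, and the year has to be supplied because the log lines do not carry one.

```python
import re
from datetime import datetime

# Assumed pattern, derived only from the HM_TEST_FAIL lines quoted above.
FAIL_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \w+: "
    r"%CONST_DIAG-SP-3-HM_TEST_FAIL: Module (?P<module>\d+) "
    r"(?P<test>\S+) consecutive failure count:(?P<count>\d+)"
)


def parse_failures(lines, year=2011):
    """Return (timestamp, module, test, count) for each HM_TEST_FAIL line.

    Lines that do not match (HM_TEST_INFO, HM_TEST_WARNING, wrapped
    continuation lines) are skipped.
    """
    results = []
    for line in lines:
        m = FAIL_RE.match(line.strip())
        if m is None:
            continue
        # The log has no year, so the caller-supplied one is prepended.
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        results.append((ts, int(m["module"]), m["test"], int(m["count"])))
    return results


if __name__ == "__main__":
    sample = [
        "Mar 10 15:15:11 EST: %CONST_DIAG-SP-3-HM_TEST_FAIL: Module 6 "
        "TestSPRPInbandPing consecutive failure count:10",
        "Mar 10 15:16:44 EST: %CONST_DIAG-SP-3-HM_TEST_FAIL: Module 6 "
        "TestSPRPInbandPing consecutive failure count:15",
    ]
    failures = parse_failures(sample)
    # Print the gap in seconds between successive failure reports.
    for prev, cur in zip(failures, failures[1:]):
        print(cur[3], (cur[0] - prev[0]).total_seconds())
```

Comparing the gaps between successive reports against the step in the failure count gives a rough idea of how often the health-monitoring test is failing.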