chitambira | 9 May 2012 21:08

zenstatus and zenperfsnmp confusion

Zenoss Community

created by chitambira in zenoss-users - View the full discussion

I have a rather weird problem with a Zenoss install.

 

I changed the network config of my Zenoss server.

The new config did not work, so it left the server unreachable and unable to reach its monitored devices.

I corrected the config to a working one and manually cleared the "network unreachable" events that had been generated.

At this stage I realised that only a handful of devices were being monitored.

Most devices continue to be marked as "down" on their status page.

I restarted Zenoss, but these devices are still not being monitored: no SNMP graphs, and the RRDs are not updated.

I rebooted the Zenoss machine and the problem still persists.

 

Running zenhub run -v10 ends with this error:

 

2012-05-09 19:41:30,548 DEBUG zen.Plugins: Loading collector plugins from: /opt/zenoss/ZenPacks/ZenPacks.zenoss.ZenossVirtualHostMonitor-2.3.0-py2.4.egg/ZenPacks/zenoss/ZenossVirtualHostMonitor/modeler/plugins

Traceback (most recent call last):

  File "/opt/zenoss/Products/ZenHub/zenhub.py", line 613, in ?

    z = ZenHub()

  File "/opt/zenoss/Products/ZenHub/zenhub.py", line 269, in __init__

    reactor.listenTCP(self.options.pbport, pb.PBServerFactory(pt))

  File "/opt/zenoss/lib/python/twisted/internet/posixbase.py", line 328, in listenTCP

    p.startListening()

  File "/opt/zenoss/lib/python/twisted/internet/tcp.py", line 739, in startListening

    raise CannotListenError, (self.interface, self.port, le)

twisted.internet.error.CannotListenError: Couldn't listen on any:8789: (98, 'Address already in use').

 

 

In zenhub.log everything seems OK, except that I see:

 

INFO zen.ZenHub: Worker reports 2012-03-09 10:45:34,277 WARNING zen.ZenStatus: device 'device_name' network '192.168.1.0/24' not in topology

 

 

 

If I run zenperfsnmp run -v10 -d device_name, I get the following:

 

.....

.....

2012-03-09 10:02:53,874 DEBUG zen.zenperfsnmp: Finished fetching configs for 1 devices

2012-03-09 10:02:53,874 DEBUG zen.zenperfsnmp: Gathering performance data for device1.mydomain.com

2012-03-09 10:02:53,874 INFO zen.zenperfsnmp: Configured 1 of 1 devices

2012-03-09 10:02:53,874 DEBUG zen.zenperfsnmp: Getting device ping issues

2012-03-09 10:02:55,003 DEBUG zen.thresholds: Checking value 0 on Daemons/localhost/zenperfsnmp_eventQueueLength

2012-03-09 10:02:55,004 DEBUG zen.MinMaxCheck: Checking zenperfsnmp_eventQueueLength 0 against min None and max 1000

2012-03-09 10:02:55,004 DEBUG zen.zenperfsnmp: Queueing event {'manager': 'zenoss.domain.com', 'eventKey': 'high event queue', 'device': 'localhost', 'eventClass': '/Perf', 'summary': 'threshold of high event queue restored: current value 0.00', 'component': '', 'monitor': 'localhost', 'agent': 'zenperfsnmp', 'severity': 0}

2012-03-09 10:02:55,004 DEBUG zen.zenperfsnmp: Total of 1 queued events

2012-03-09 10:02:56,086 DEBUG zen.zenperfsnmp: unresponsive devices: [['server0053', 2, 64946], ['server0061', 1, 1], ......

...

...

 

...and so on, listing all the servers that are not working properly.
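To compare that list against the event console, the device names can be pulled out of the debug line with a short script. This is only a sketch in current Python, run outside Zenoss: the function name `unresponsive_devices` is made up for illustration, and since the thread does not say what the two numbers in each entry mean, they are kept as plain integers.

```python
import re

# Matches one entry such as ['server0053', 2, 64946]. The meaning of the two
# numbers is not stated in the thread, so they are treated as opaque integers.
ENTRY = re.compile(r"\['([^']+)',\s*(\d+),\s*(\d+)\]")


def unresponsive_devices(log_line):
    """Return (name, n1, n2) tuples from an 'unresponsive devices' debug line."""
    _, found, rest = log_line.partition("unresponsive devices:")
    if not found:
        return []  # not the kind of line we are looking for
    return [(name, int(a), int(b)) for name, a, b in ENTRY.findall(rest)]


sample = (
    "2012-03-09 10:02:56,086 DEBUG zen.zenperfsnmp: unresponsive devices: "
    "[['server0053', 2, 64946], ['server0061', 1, 1]]"
)
names = [name for name, _, _ in unresponsive_devices(sample)]
print(names)
```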

 

 

Any ideas?

Reply to this message by replying to this email -or- go to the message on Zenoss Community

Start a new discussion in zenoss-users by email or at Zenoss Community

jcurry | 9 May 2012 21:21

So can your Zenoss server actually ping the devices?  Both from command line and with a few sample tests from the Command menu?

 

Did you put your Zenoss server back exactly as it was? Has anything else changed: DNS, firewalls, network topology?

 

Have you remodeled your Zenoss server, and is it in a consistent state, especially with respect to its network cards?

 

Do you have any heartbeat events?

 

Are all the daemons running (zenoss status)?

 

Cheers,

Jane

chitambira | 9 May 2012 21:49

Yes, I can ping all the devices, and I can also snmpwalk them.

The server is as it was: no DNS or firewall issues, and the network topology hasn't changed.

I tried remodelling the affected servers, with no luck.

All daemons are running, and the only errors I can see are the ones I posted above.

It's weird because some devices are monitored OK, but that's only about 5% of the total.

 

zenperfsnmp output from when the server was OK (showing 391 devices):

 

2012-03-07 05:13:59,575 INFO zen.zenperfsnmp: ******** Cycle completed ********

2012-03-07 05:13:59,575 INFO zen.zenperfsnmp: Sent 38309 OID requests

2012-03-07 05:13:59,576 INFO zen.zenperfsnmp: Queried 391 devices

2012-03-07 05:13:59,576 INFO zen.zenperfsnmp:   0 in queue still unqueried

2012-03-07 05:13:59,576 INFO zen.zenperfsnmp:   Successes: 384  Failures: 7  Not reporting: 0

2012-03-07 05:13:59,576 INFO zen.zenperfsnmp: Waited on 0 queries from previous cycles.

2012-03-07 05:13:59,576 INFO zen.zenperfsnmp:   Successes: 0  Failures: 0  Not reporting: 0

2012-03-07 05:13:59,576 INFO zen.zenperfsnmp: Cycle lasted 166.34 seconds

2012-03-07 05:13:59,576 INFO zen.zenperfsnmp: *********************************

 

 

 

It now shows only 57 devices:

 

2012-03-09 19:03:07,285 INFO zen.zenperfsnmp: ******** Cycle completed ********

2012-03-09 19:03:07,285 INFO zen.zenperfsnmp: Sent 5920 OID requests

2012-03-09 19:03:07,286 INFO zen.zenperfsnmp: Queried 57 devices

2012-03-09 19:03:07,286 INFO zen.zenperfsnmp:   0 in queue still unqueried

2012-03-09 19:03:07,286 INFO zen.zenperfsnmp:   Successes: 54  Failures: 3  Not reporting: 0

2012-03-09 19:03:07,286 INFO zen.zenperfsnmp: Waited on 0 queries from previous cycles.

2012-03-09 19:03:07,286 INFO zen.zenperfsnmp:   Successes: 0  Failures: 0  Not reporting: 0

2012-03-09 19:03:07,286 INFO zen.zenperfsnmp: Cycle lasted 12.01 seconds

2012-03-09 19:03:07,286 INFO zen.zenperfsnmp: *********************************
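For comparing cycle summaries like the two above without reading them by eye, a small parser along these lines may help. It is a sketch in current Python, not part of Zenoss: `cycle_summary` is a hypothetical helper, and it relies only on the line formats visible in the pasted logs.

```python
import re

# One pattern per headline number; the wording is copied from the log lines above.
PATTERNS = {
    "oid_requests": re.compile(r"Sent (\d+) OID requests"),
    "queried": re.compile(r"Queried (\d+) devices"),
    "seconds": re.compile(r"Cycle lasted ([\d.]+) seconds"),
}


def cycle_summary(lines):
    """Collect the headline numbers from one zenperfsnmp cycle summary."""
    summary = {}
    for line in lines:
        for key, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                summary[key] = float(match.group(1))
    return summary


before = cycle_summary([
    "2012-03-07 05:13:59,575 INFO zen.zenperfsnmp: Sent 38309 OID requests",
    "2012-03-07 05:13:59,576 INFO zen.zenperfsnmp: Queried 391 devices",
    "2012-03-07 05:13:59,576 INFO zen.zenperfsnmp: Cycle lasted 166.34 seconds",
])
after = cycle_summary([
    "2012-03-09 19:03:07,285 INFO zen.zenperfsnmp: Sent 5920 OID requests",
    "2012-03-09 19:03:07,286 INFO zen.zenperfsnmp: Queried 57 devices",
    "2012-03-09 19:03:07,286 INFO zen.zenperfsnmp: Cycle lasted 12.01 seconds",
])

# Share of the original device count that zenperfsnmp still queries.
share = after["queried"] / before["queried"]
print("still queried: %.1f%% of devices" % (share * 100))
```

The figure it prints is the share of devices that zenperfsnmp still queries at all, which is a different number from how many devices show as monitored OK on their status pages.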

chitambira | 9 May 2012 22:03

Also, zenping run -v10 shows this error:

 

Unhandled error in Deferred:

Traceback (most recent call last):

  File "/opt/zenoss/Products/ZenUtils/ZenDaemon.py", line 232, in sigTerm

    if callable(stop): stop()

  File "/opt/zenoss/Products/ZenHub/PBDaemon.py", line 298, in stop

    drive(self.pushEvents).addBoth(stopNow)

  File "/opt/zenoss/lib/python/twisted/internet/defer.py", line 214, in addBoth

    callbackKeywords=kw, errbackKeywords=kw)

  File "/opt/zenoss/lib/python/twisted/internet/defer.py", line 186, in addCallbacks

    self._runCallbacks()

--- <exception caught here> ---

  File "/opt/zenoss/lib/python/twisted/internet/defer.py", line 328, in _runCallbacks

    self.result = callback(self.result, *args, **kw)

  File "/opt/zenoss/Products/ZenHub/PBDaemon.py", line 288, in stopNow

    reactor.stop()

  File "/opt/zenoss/lib/python/twisted/internet/base.py", line 494, in stop

    raise error.ReactorNotRunning(

twisted.internet.error.ReactorNotRunning: Can't stop reactor that isn't running.

2012-03-09 21:00:13,889 DEBUG zen.ZenPing: Sent a 'stop' event

2012-03-09 21:00:13,889 INFO zen.ZenPing: Daemon ZenPing shutting down

2012-03-09 21:00:13,889 DEBUG zen.ZenPing: Removing service EventService

2012-03-09 21:00:13,889 DEBUG zen.ZenPing: Removing service PingConfig

Chet Luther | 9 May 2012 23:20

The "Couldn't listen on any:8789" error you get when running zenhub in the foreground is normal. The zenhub daemon binds two ports, 8789 and 8081, so you can't run two copies at the same time. If you want to run zenhub in the foreground, you must first stop the daemon. This is a long way of saying that I don't think the error is related to your problem.
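The same "Address already in use" failure can be reproduced outside Zenoss with two plain sockets. The sketch below uses current Python and an OS-assigned port rather than 8789, so it will not collide with a running zenhub; `try_listen` is an illustrative helper, not a Zenoss or Twisted function.

```python
import errno
import socket


def try_listen(port, host="127.0.0.1"):
    """Bind and listen on host:port; return the socket, or None if the port is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            return None  # same condition as the (98, 'Address already in use') above
        raise


first = try_listen(0)            # port 0 asks the OS for any free port
port = first.getsockname()[1]    # the port the first listener actually got
second = try_listen(port)        # a second listener on that port, like a second zenhub

print("second listener started:", second is not None)
first.close()
```

The second call fails only because the first listener still holds the port, which is why stopping the daemon before running zenhub in the foreground makes the error go away.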

 

Don't worry about the "not in topology" warnings. They're completely benign and not related to your problem.

 

The reason zenperfsnmp is showing so many "unresponsive devices" is almost certainly because those devices have active critical /Status/Ping events. The zenperfsnmp daemon won't attempt to collect from devices that Zenoss thinks are ping unreachable. Can you confirm or deny this by looking at your event console?
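The behaviour described above can be pictured roughly as follows. This is not the actual zenperfsnmp code, only a sketch in current Python under stated assumptions: events are plain dictionaries with the `device`, `eventClass` and `severity` keys seen in the queued event earlier in the thread, 5 is the Critical level on the Zenoss severity scale, and `split_by_ping_status` is a made-up name.

```python
CRITICAL = 5  # Zenoss severity scale runs from 0 (Clear) to 5 (Critical)


def split_by_ping_status(devices, events):
    """Split devices into (to_collect, skipped) based on active ping-down events."""
    ping_down = {
        event["device"]
        for event in events
        if event["eventClass"].startswith("/Status/Ping")
        and event["severity"] >= CRITICAL
    }
    to_collect = [d for d in devices if d not in ping_down]
    skipped = [d for d in devices if d in ping_down]
    return to_collect, skipped


# Hypothetical data: one stale ping-down event left over from the outage.
devices = ["server0053", "server0061", "device1.mydomain.com"]
events = [
    {"device": "server0053", "eventClass": "/Status/Ping", "severity": 5},
    {"device": "device1.mydomain.com", "eventClass": "/Perf", "severity": 0},
]

to_collect, skipped = split_by_ping_status(devices, events)
print("collect from:", to_collect)
print("skipped:", skipped)
```

Under this model a stale critical /Status/Ping event is enough to keep a reachable device out of collection, so clearing those events should bring the devices back.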

 

Depending on what version of Zenoss you're running, it may be normal to see that "Can't stop reactor that isn't running." error when running zenping in the foreground without the --cycle parameter. It'll do one pass then terminate in that ugly way. I don't think it's related to your problem.

 

All of that being said, the "unexpected pkt" could potentially be related to the problem since the root of the problem seems to be that Zenoss thinks devices are ping unreachable when they're not.

 

Would you try clearing all /Status/Ping events from your event console and restarting zenping?
