Thursday, December 29, 2011

DB21018E DB2 CLP cannot start

The DB2 CLP refused to start after I killed some db2 processes. I eventually found the error below in db2diag.log. db2 and db2bp communicate through message queues, and somehow all 16 queues were in use. Use the command "ipcs -qp" to show the current message queues.
2011-12-29-05.01.05.656543-480 E46743E405          LEVEL: Severe (OS)
PID     : 29100                TID  : 47033606104272  PROC : db2
INSTANCE: db2c97               NODE : 000
FUNCTION: DB2 UDB, oper system services, sqloexec, probe:20
MESSAGE : ZRC=0x870F00F2=-2029059854=SQLO_NORES
          "no resources to create process or thread"
CALLED  : OS, -, msgget                           OSERR: ENOSPC (28)
You can remove them with "ipcrm -q msqid", where msqid is the queue ID reported by ipcs. After I removed all of those message queues, the DB2 CLP started again.
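When there are many leftover queues, picking out the IDs by hand gets tedious. Below is a minimal sketch that extracts the queue IDs for one owner from the text printed by plain "ipcs -q". It assumes the usual Linux column layout (key, msqid, owner, perms, used-bytes, messages); the function name stale_queue_ids and the sample listing are made up for illustration and are not output from my machine.

```python
def stale_queue_ids(ipcs_output, owner):
    """Return the msqids owned by `owner` found in the text of `ipcs -q`."""
    ids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows start with a hex key: key msqid owner perms used-bytes messages
        if len(fields) >= 3 and fields[0].startswith("0x") and fields[2] == owner:
            ids.append(int(fields[1]))
    return ids


# Made-up sample in the assumed `ipcs -q` layout, for illustration only.
SAMPLE = """\
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x00000000 0          db2c97     600        0            0
0x00000000 32769      db2c97     600        0            0
0x4d2a1f01 65538      root       644        0            0
"""

# Print the commands instead of running them, so they can be reviewed first.
for msqid in stale_queue_ids(SAMPLE, "db2c97"):
    print("ipcrm -q %d" % msqid)
```

The sketch only prints the ipcrm commands. Queues that belong to db2 processes that are still running should be left alone, so review the list before running anything.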

Wednesday, December 7, 2011

Input DB2 password using Python openpty()

Here is how to use Python's os.openpty() to enter a DB2 password.
  • You need openpty() because db2 insists on reading the password from a terminal. You cannot pass the password through a PIPE on stdin.
  • You have to read from stdout first. If you write the password to the pty before reading, db2 may not see it because it needs time to start.
  • In Python, read() with no argument reads until EOF, so it would block forever. readline() does not work either, because db2 prints "Enter " and waits for input without printing a newline. Read a fixed number of bytes instead.
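import os
import subprocess
import sys

# db2 reads the password only from a terminal, so hand it a pty as stdin.
m, s = os.openpty()
print("pty fds: %d %d" % (m, s))
p = subprocess.Popen("db2.sh", stdin=s, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
f = os.fdopen(m, "w")

# Wait for the prompt before answering; db2 needs time to start.
out = p.stdout.read(5)
print("OUT: %s" % out.decode())
if out == b"Enter":
    f.write("mypassword\n")
    f.flush()  # make sure the password actually reaches the pty
else:
    p.kill()
    sys.exit(1)

print("\nSTDOUT:\n")
for line in p.stdout:
    print(line.decode())
print("\nSTDERR:\n")
for line in p.stderr:
    print(line.decode())
p.wait()  # returncode stays None until the child has been waited for
print(p.returncode)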

db2.sh, the script started by the Python code above:

#!/bin/bash
set -e
source /home/db2c97/sqllib/db2profile
db2 connect to DEVEDW user myname
echo $?
echo "loading ..."
db2 "select count(*) from players"
echo $?
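
To try the technique without a DB2 installation, here is a self-contained sketch in which a small Python child process stands in for db2. The child is hypothetical: it refuses to run unless stdin is a terminal, prints a prompt without a newline, and then reads one line. The helper name answer_prompt and the prompt text are assumptions for illustration, not anything db2 defines.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for db2: insists on a terminal for stdin and prints
# its prompt without a trailing newline before reading the password.
CHILD = r"""
import sys
if not sys.stdin.isatty():
    sys.exit("stdin is not a terminal")
sys.stdout.write("Enter password: ")
sys.stdout.flush()
password = sys.stdin.readline().rstrip("\n")
print("got " + password)
"""


def answer_prompt(password):
    """Start the child with a pty as stdin, wait for the prompt, then answer it."""
    master, slave = os.openpty()
    p = subprocess.Popen([sys.executable, "-c", CHILD],
                         stdin=slave, stdout=subprocess.PIPE)
    os.close(slave)  # the child holds its own copy of the slave end
    try:
        # Read a fixed number of bytes: the prompt has no newline yet.
        prompt = p.stdout.read(5)
        if prompt != b"Enter":
            p.kill()
            raise RuntimeError("unexpected prompt: %r" % prompt)
        os.write(master, password.encode() + b"\n")
        rest = p.stdout.read()  # the child exits after answering, so EOF arrives
        p.wait()
    finally:
        os.close(master)
    return (prompt + rest).decode(), p.returncode


output, code = answer_prompt("secret")
print(output)
print(code)
```

Writing to the master with os.write() avoids any buffering on the Python side, so no flush is needed. Replacing stdin=slave with stdin=subprocess.PIPE makes the stand-in child exit with "stdin is not a terminal", which is the same reason a plain PIPE does not work for db2.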

Friday, December 2, 2011

Fix "The property zDeviceTemplates does not exist"

I wanted to bind my monitoring templates to my Hadoop devices programmatically, so I wrote a zendmd script like this:

...
templates = {
    "clientnode": [],
    "secondarynamenode": [ 'HadoopJVM', 'HadoopNameNode', 'HadoopDFS' ],
    "namenode":   [ 'HadoopJVM', 'HadoopNameNode', 'HadoopDFS' ],
    "jobtracker": [ 'HadoopJVM', 'HadoopJobTracker', 'HadoopFairScheduler' ],
    "datanode":   [ 'HadoopJVM', 'HadoopDataNode', 'HadoopTaskTracker' ],
    "utility":    []
    }
...

for item in dmd.Devices.Server.SSH.Linux.Ganglia.devices.objectItems():
  (name, device) = item
  bindings = set([ 'Device' ])
  rule = findRule(name, rules)
  if rule:
    device.zGangliaHost = gmond[rule["cluster"]]
    for t in rule["kinds"]:
      bindings = bindings.union(templates[t])
  device.zDeviceTemplates = list(bindings)
  print name, device.zDeviceTemplates
commit()

The basic idea is to define a list of templates for each kind of node, combine the lists for each device, and assign the result to zDeviceTemplates.

The script appeared to work when I ran it in zendmd, and all the monitoring templates showed up on the device. But you can no longer bind templates in the WebUI. And if you try to load objects that include templates using ImportRM.loadObjectFromXML(xmlfile=f), it throws the error "The property zDeviceTemplates does not exist".

Another problem: after running the script, zGangliaHost does not show up under "Configuration properties", although the Ganglia ZenPack still works well.

I found exactly the same problem described at http://community.zenoss.org/thread/5812, where the suggested fix was to "delete the device and create it again".

You should never assign zGangliaHost and zDeviceTemplates directly. Use device.setZenProperty('zGangliaHost', gmond[rule["cluster"]]) and device.setZenProperty('zDeviceTemplates', list(bindings)) instead. setZenProperty maintains an internal property dict. If you assign zGangliaHost or zDeviceTemplates directly as attributes, the property dict will not contain those properties, and you will get the error.

But once the error has been thrown, calling setZenProperty no longer helps. You keep getting the "property does not exist" error. How can this be fixed without deleting the device?

It is actually pretty simple: delete the zGangliaHost and zDeviceTemplates attributes from the device, then set them again with setZenProperty. Zenoss checks that a property name is valid before it sets the property. If the object already has an attribute with that name, the name is considered invalid, because Zenoss itself adds an attribute to the object for each property. Unfortunately, the error message is misleading. This is my fix script:

for (id, dev) in dmd.Devices.Server.SSH.Linux.Ganglia.devices.objectItems():
  print '----- %s' % id
  
  try:
    gangliaHost = dev.zGangliaHost
    delattr(dev, 'zGangliaHost')
    dev.setZenProperty('zGangliaHost', gangliaHost)
  except:
    print 'Missing zGangliaHost'

  devTemplates = dev.zDeviceTemplates
  delattr(dev, 'zDeviceTemplates')
  dev.setZenProperty('zDeviceTemplates', devTemplates)

  print 'zGangliaHost = %s' % dev.zGangliaHost
  print 'zDeviceTemplates = %s' % dev.zDeviceTemplates
commit()
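
The failure and the fix can be reproduced outside Zenoss with a toy model. The sketch below is not Zenoss code: ToyDevice and set_property are made-up names that only mimic the behaviour described above, namely a registry of known properties plus a validity check that rejects a new property name when the object already has an attribute with that name.

```python
class ToyDevice(object):
    """Toy stand-in for a device. NOT Zenoss code; it only mimics the described behaviour."""

    def __init__(self):
        # Plays the role of the internal property dict kept by setZenProperty.
        self._registered = set()

    def set_property(self, name, value):
        if name not in self._registered:
            # A new property name is rejected if a plain attribute already uses it.
            if hasattr(self, name):
                raise ValueError("The property %s does not exist" % name)
            self._registered.add(name)
        setattr(self, name, value)


dev = ToyDevice()
dev.zDeviceTemplates = ["Device"]  # direct assignment bypasses the registry

try:
    dev.set_property("zDeviceTemplates", ["Device", "HadoopJVM"])
    failed = False
except ValueError as e:
    failed = True
    print(e)

# The fix: save the value, drop the stray attribute, then set it properly.
saved = dev.zDeviceTemplates
delattr(dev, "zDeviceTemplates")
dev.set_property("zDeviceTemplates", saved)
print(dev.zDeviceTemplates)
```

Once the name is in the registry, later set_property calls update the value normally, which matches what happens on the real device after running the fix script.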