Tuesday, October 30, 2012

Hue 2.0 fails to run syncdb with a MySQL database

If you use MySQL as the Hue 2.0 database and the database's default charset is UTF8, syncdb will fail. The jobsub migration ${HUE_DIR}/apps/jobsub/src/jobsub/migrations/0002_auto__add_ooziestreamingaction__add_oozieaction__add_oozieworkflow__ad.py defines some columns as varchar(32678), which MySQL does not accept with UTF8. A UTF8 character can take up to 3 bytes and a MySQL row (and therefore a single VARCHAR column) is limited to 65,535 bytes, so the maximum length is 21844. Here is how you can fix it.
  • Don't use utf8 as the default. Change /etc/my.cnf to use latin1, where each character takes one byte, so a varchar can hold close to 64K characters. After syncdb, change the tables' charset to utf8 if you still want utf8.
  • Edit the migration to replace every 32678 with 12678 before running syncdb. The exact value doesn't matter, because the developers already fixed it in migration 0003_xxx in the same directory, so the table will be corrected anyway. Don't forget to delete 0002_.pyc in that directory.
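Both numbers above follow from MySQL's 65,535-byte row limit. Below is a minimal Python sketch of the second workaround, assuming you point it at the 0002 migration file named above and that Hue 2.0 loads it with Python 2 (so the compiled bytecode sits next to it as a .pyc). The helper names patch_migration and max_varchar_chars are hypothetical, and max_varchar_chars only covers a single column; it ignores nullable-column and other-column overhead.

```python
import os

# Values from this post: the length in migration 0002 and the suggested replacement.
OLD_LENGTH = "32678"
NEW_LENGTH = "12678"


def max_varchar_chars(bytes_per_char, row_limit=65535, length_prefix=2):
    """Largest VARCHAR(n) one column can declare under MySQL's row-size limit.

    Simplified: assumes this is the only column and ignores NULL-flag overhead.
    A VARCHAR longer than 255 bytes stores a 2-byte length prefix.
    """
    return (row_limit - length_prefix) // bytes_per_char


def patch_migration(py_path):
    """Replace OLD_LENGTH with NEW_LENGTH in a migration file and delete its .pyc.

    Returns the number of occurrences replaced.
    """
    with open(py_path) as f:
        source = f.read()
    count = source.count(OLD_LENGTH)
    with open(py_path, "w") as f:
        f.write(source.replace(OLD_LENGTH, NEW_LENGTH))
    # Python 2 writes <name>.pyc next to <name>.py; the post recommends removing it.
    pyc_path = py_path + "c"
    if os.path.exists(pyc_path):
        os.remove(pyc_path)
    return count
```

For example, from ${HUE_DIR} you could call patch_migration("apps/jobsub/src/jobsub/migrations/0002_auto__add_ooziestreamingaction__add_oozieaction__add_oozieworkflow__ad.py"). max_varchar_chars(3) evaluates to 21844, the utf8 limit quoted above, so 12678 fits comfortably.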

Monday, October 29, 2012

Import users to Cloudera Hue 2 from Hue 1.2.0

I thought importing users from Hue 1.2.0 into Hue 2 (CDH 4.1) would not be hard, because this page doesn't mention any special steps: https://ccp.cloudera.com/display/CDH4DOC/Hue+Installation#HueInstallation-UpgradingHuefromCDH3toCDH4.

But I was wrong. After importing auth_user, auth_group, and auth_user_groups, I could log on with my old username and password, but then I got "Server Error (500)", and I couldn't find any error message in the log files.

It turns out that in Hue 2 (CDH 4.1) you also have to create records in the useradmin_grouppermission table (one per group and Hue permission) and in the useradmin_userprofile table (one per user). Here are the queries:

mysql> insert into useradmin_grouppermission(hue_permission_id,group_id) select hp.id, g.id from useradmin_huepermission hp inner join auth_group g;
mysql> insert into useradmin_userprofile (user_id, creation_method, home_directory) select u.id, 'HUE', concat('/user/', u.username) from auth_user u;
You may need to add an id<>1 condition if you have already created the superuser. It is better not to create the superuser when you run syncdb for the first time. You can do it like this:
  • drop database hue
  • create database hue
  • build/env/bin/hue syncdb
  • answer no when you are asked whether to create a superuser:
    You just installed Django's auth system, which means you don't have any superusers defined.
    Would you like to create one now? (yes/no): no
    
  • mysqldump -uxxx -pxxxx -h old_db --compact --no-create-info --disable-keys hue auth_group auth_user auth_user_groups auth_user_user_permissions > hue_user.sql
  • mysql -uyyy -pyyyy -h new_db -D hue < hue_user.sql
  • run the above insert queries
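The two insert queries rely on a join without an ON clause, which MySQL treats as a cross join, so every group gets every Hue permission. A small sketch that reproduces them in an in-memory SQLite database can show what they produce before you touch the real Hue database. The table definitions are heavily simplified assumptions (only the columns the queries use), SQLite's || stands in for MySQL's concat(), and backfill is a hypothetical helper name.

```python
import sqlite3

# Simplified stand-ins for the Django/Hue tables (assumption: only the
# columns touched by the two INSERT ... SELECT queries).
SCHEMA = """
CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE auth_group (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE useradmin_huepermission (id INTEGER PRIMARY KEY, action TEXT);
CREATE TABLE useradmin_grouppermission (hue_permission_id INTEGER, group_id INTEGER);
CREATE TABLE useradmin_userprofile (user_id INTEGER, creation_method TEXT, home_directory TEXT);
"""


def backfill(conn, skip_user_id=None):
    """Grant every Hue permission to every group and create a profile per user."""
    # One row per (permission, group) pair, like the MySQL join without ON.
    conn.execute(
        "INSERT INTO useradmin_grouppermission (hue_permission_id, group_id) "
        "SELECT hp.id, g.id FROM useradmin_huepermission hp CROSS JOIN auth_group g"
    )
    sql = ("INSERT INTO useradmin_userprofile (user_id, creation_method, home_directory) "
           "SELECT u.id, 'HUE', '/user/' || u.username FROM auth_user u")
    params = ()
    if skip_user_id is not None:
        # The id<>1 filter for a superuser that already has a profile.
        sql += " WHERE u.id <> ?"
        params = (skip_user_id,)
    conn.execute(sql, params)
```

With three permissions and two groups, the first query inserts six rows; with skip_user_id=1, the superuser gets no new profile row.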

Friday, October 26, 2012

DB2 Client Install on Linux


  1. db2ls shows the current installation:
    [bewang@logs]$ db2ls
    Install Path                       Level   Fix Pack   Special Install Number   Install Date                  Installer UID 
    ---------------------------------------------------------------------------------------------------------------------
    /opt/ibm/db2/V9.7                 9.7.0.2        2                            Tue Oct  4 17:08:48 2011 PDT             0 
    
  2. db2ls -q -b shows the installed components:
    [bewang@logs]$ db2ls -q -b /opt/ibm/db2/V9.7
    
    Install Path : /opt/ibm/db2/V9.7
    
    Feature Response File ID             Level   Fix Pack   Feature Description  
    ---------------------------------------------------------------------------------------------------------------------
    BASE_CLIENT                         9.7.0.2          2   Base client support 
    JAVA_SUPPORT                        9.7.0.2          2   Java support 
    LDAP_EXPLOITATION                   9.7.0.2          2   DB2 LDAP support 
    
  3. To uninstall, run db2idrop to drop the instance, then db2_deinstall.
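If you need to check the installed components from a script, here is a rough Python sketch that parses db2ls -q -b output. It assumes the column layout shown above (a dashed separator line, then whitespace-separated feature ID, level, and fix pack); parse_db2ls_features is a hypothetical helper, not part of DB2.

```python
def parse_db2ls_features(output):
    """Return {feature_id: (level, fix_pack)} from `db2ls -q -b <path>` output.

    Rows are only read after the dashed separator line, so the
    "Install Path" line and the column header are skipped.
    """
    features = {}
    in_rows = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("---"):
            in_rows = True
            continue
        if not in_rows or not stripped:
            continue
        # Feature ID, level, fix pack, then the free-text description.
        parts = stripped.split(None, 3)
        if len(parts) < 3:
            continue
        features[parts[0]] = (parts[1], parts[2])
    return features
```

For the installation above, the result would map BASE_CLIENT, JAVA_SUPPORT, and LDAP_EXPLOITATION to ("9.7.0.2", "2").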

Friday, October 12, 2012

Setup Tomcat on CentOS for Windows Authentication using SPNEGO

I set up a Tomcat server running on a Linux box with SPNEGO, so that users can single-sign-on to the server without typing their password. You can follow the instructions at http://spnego.sourceforge.net/spnego_tomcat.html. Although this tutorial uses Windows as the example, the steps are the same on Linux.

The big problem I faced was my company's network settings:
  • There are two networks: corp.mycompany.com and lab.mycompany.com.
  • lab trusts corp, but corp doesn't trust lab
The goal is that users on a Windows machine in corp can access the Tomcat server in lab without typing a username and password.

So the question is: where should you create the pre-auth account, in lab or in corp?
I first created a service account in lab's AD and registered the SPNs in lab. It didn't work. When I accessed the hello_spnego.jsp page from a Windows machine in corp, I always got a dialog asking for a username and password. This was because I had enabled downgrading to Basic authentication for NTLM; with Basic authentication disabled, I got a 500 error instead.
I used Wireshark to capture the packets and found the following traffic:
  1. Browser sends GET /hello_spnego.jsp
  2. Server returns 401 Unauthorized with Negotiate:
    HTTP/1.1 401 Unauthorized
    Server: Apache-Coyote/1.1\r\n
    WWW-Authenticate: Negotiate\r\n
    WWW-Authenticate: Basic realm="LAB.MYCOMPANY.COM"\r\n
    
  3. Client sends KRB5 TGS-REQ
  4. Client receives KRB5 KRB Error: KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN:
    Kerberos KRB-ERROR
        Pvno: 5
        MSG Type: KRB-ERROR(30)
        stime: 2012-10-10 23:04:48 (UTC)
        susec: 394362
        error_code: KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN
        Realm: CORP.MYCOMPANY.COM
        Server Name (Service and Instance): HTTP/tomcat.lab.mycompany.com
    
  5. Browser sends GET /hello_spnego.jsp HTTP/1.1, NTLMSSP_NEGOTIATE
Obviously, the machine in corp queries its own realm, CORP.MYCOMPANY.COM, for the server SPN HTTP/tomcat.lab.mycompany.com. That means the SPNs must be registered in corp. After creating a new service account and registering the SPNs in corp, I changed the pre-auth account in web.xml to serviceaccount@CORP.MYCOMPANY.COM, and everything worked.
Then I tried the keytab method, because I don't like putting a username and password in plaintext in web.xml. There are still a lot of pitfalls in this step. Here is the working version of my login.conf:
spnego-client {
  com.sun.security.auth.module.Krb5LoginModule required;
};

spnego-server {
  com.sun.security.auth.module.Krb5LoginModule
    required
    useKeyTab=true
    keyTab="conf/appserver.keytab"
    principal="serviceaccount@CORP.MYCOMPANY.COM"
    storeKey=true
    isInitiator=false;
};
and my krb5.conf:
[libdefaults]
  default_realm = LAB.MYCOMPANY.COM
  default_tgs_enctypes = arcfour-hmac-md5 des-cbc-crc des-cbc-md5 des3-hmac-sha1
  default_tkt_enctypes = arcfour-hmac-md5 des-cbc-crc des-cbc-md5 des3-hmac-sha1
  clockskew = 300

[realms]
  LAB.MYCOMPANY.COM = {
    kdc = kdc1.lab.mycompany.com
    kdc = kdc2.lab.mycompany.com
    default_domain = lab.mycompany.com
  }                         

[domain_realm]
  lab.mycompany.com = LAB.MYCOMPANY.COM
  .lab.mycompany.com = LAB.MYCOMPANY.COM
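The last section of this krb5.conf maps host names such as tomcat.lab.mycompany.com to realms (Kerberos calls this section [domain_realm]). A minimal Python sketch of how such a lookup typically resolves, assuming the common MIT-style rule: an exact host entry wins, otherwise the most specific ".domain" suffix entry, otherwise the default realm. realm_for_host is a hypothetical helper for reasoning about the file, not Java's actual implementation.

```python
def realm_for_host(host, domain_realm, default_realm):
    """Resolve a host name to a Kerberos realm from a [domain_realm]-style dict."""
    host = host.lower()
    if host in domain_realm:
        return domain_realm[host]
    labels = host.split(".")
    # Try the longest suffix first: ".lab.mycompany.com", ".mycompany.com", ".com".
    for i in range(1, len(labels)):
        suffix = "." + ".".join(labels[i:])
        if suffix in domain_realm:
            return domain_realm[suffix]
    return default_realm
```

With the two entries above, tomcat.lab.mycompany.com resolves to LAB.MYCOMPANY.COM, while a host in corp.mycompany.com matches neither entry and falls back to default_realm.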
You may run into various issues if something is wrong. Here is what I ran into:
  1. If I don't quote the principal, as in principal=serviceaccount@CORP.MYCOMPANY.COM, I get a configuration error. The message is misleading, because line 9 is keyTab:
    Caused by: java.io.IOException: Configuration Error:
            Line 9: expected [option key], found [null]
    
  2. When you use ktab, the first thing you need to know is that only the Windows version has this tool; the Linux RPM from Oracle doesn't include it.
  3. Use the service account in the corp network, not lab, to generate the keytab file, like this:
    ktab -a serviceaccount@CORP.MYCOMPANY.COM -k appserver.keytab
    
  4. Make sure your
  5. I also encountered this error: KrbException: Specified version of key is not available (44). It turns out that the keytab file I generated had kvno=1, while the expected kvno was 2. You can use Wireshark to capture the KRB5 TGS-REP packet, which tells you which kvno is expected:
    Ticket
        Tkt-vno: 5
        Realm: LAB.MYCOMPANY.COM
        Server Name ....
        enc-part rc4-hmac
          Encryption type: ...
          Kvno: 2 *** Here it is
          enc-part: ...
    
  6. You have to run the ktab command multiple times to reach the correct kvno, as described on this page: http://dmdaa.wordpress.com/2010/05/08/how-to-get-needed-kvno-for-keytab-file-created-by-java-ktab-utility/. You can use ktab -l to check the kvno:
    ktab -l -k appserver.keytab
    
  7. The JDK version doesn't seem to matter. A keytab file generated by JDK 7 worked in JDK 1.6.0_32.
  8. I also got this checksum error when I used my lab service account (serviceaccount@lab.mycompany.com) in the pre-auth fields or the keytab:
    SEVERE: Servlet.service() for servlet [jsp] in context with path [] threw exception [GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)] with root cause
    java.security.GeneralSecurityException: Checksum failed
            at sun.security.krb5.internal.crypto.dk.ArcFourCrypto.decrypt(ArcFourCrypto.java:388)
            at sun.security.krb5.internal.crypto.ArcFourHmac.decrypt(ArcFourHmac.java:74)
    
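The unquoted-principal pitfall in the first item can be caught before Tomcat starts. Below is a rough, hypothetical check (not part of any JAAS tooling), based on the conservative assumption that double-quoting every option value other than true/false is always safe. It flags unquoted values such as the principal above; unquoted_options is an illustrative name.

```python
import re

# key=value pairs inside a JAAS login.conf entry: the value is either a
# double-quoted string or a run of characters up to whitespace or ';'.
OPTION_RE = re.compile(r'(\w+)\s*=\s*("[^"]*"|[^\s;]+)')


def unquoted_options(conf_text):
    """Return (option, value) pairs whose value is neither quoted nor true/false."""
    problems = []
    for key, value in OPTION_RE.findall(conf_text):
        if value.startswith('"'):
            continue  # quoted values are always accepted
        if value.lower() in ("true", "false"):
            continue  # booleans like useKeyTab=true are fine unquoted
        problems.append((key, value))
    return problems
```

Run against the working login.conf above, it returns an empty list; with principal=serviceaccount@CORP.MYCOMPANY.COM unquoted, it reports that option.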

Thursday, October 11, 2012

Hive Server 2 in CDH4.1

I just gave Hive Server 2 in CDH 4.1 a try and ran into a couple of issues:
  • Hive Server 2 supports LDAP and Kerberos, but for LDAP it only supports simple bind. Unfortunately, our LDAP server only supports SASL. If you get an error saying "Error validating the login", you may have the same issue I had. Try ldapsearch on the command line to verify that you can access the LDAP server, and take a look at /etc/ldap.conf if you use CentOS:
    ldapsearch -Z -x "uid=bewang"
    
  • Setting up the JDBC driver seems straightforward, but it is not. I didn't find any document saying which jars should be copied. I added hive-jdbc, hive-service, and libthrift, but got an error saying "java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory" in Eclipse. If I dismissed the error dialog and clicked "Test Connection" again, I got this confusing error: "java.lang.NoClassDefFoundError: Could not initialize class org.apache.thrift.transport.TSocket", even though that class is in libthrift.jar. Actually, you need to add slf4j-api to the jar list in Eclipse. You also need to add commons-logging if you want to run queries.
  • You can use a new tool called beeline: /usr/lib/hive/bin/beeline
  • To start Hive Server 2, either:
    1. install hive-server2 from the Cloudera CDH4 yum repository and run sudo /sbin/service hive-server2 start, or
    2. run hive --service hiveserver2
  • The Hive Server 2 JDBC driver doesn't work well with Eclipse Data Tools. You cannot see databases because of the error below. Also, when I ran a select statement, I got a lot of "Method not supported" messages in the SQL status window, and the query seemed never to complete. You can, however, cancel the query and see the result.
    java.sql.SQLException: Method not supported
            at org.apache.hive.jdbc.HiveDatabaseMetaData.supportsMixedCaseIdentifiers(HiveDatabaseMetaData.java:922)
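Since no document lists the jars the JDBC driver needs, a small check before configuring Eclipse can save a round of NoClassDefFoundError dialogs. This is a minimal Python sketch that only knows the jar name prefixes mentioned in this post; missing_jars is a hypothetical helper, and exact file names and versions vary by CDH release. It may not be a complete list for every use case.

```python
import os

# Jar name prefixes from this post (assumption: this list may be incomplete).
REQUIRED_JAR_PREFIXES = ["hive-jdbc", "hive-service", "libthrift", "slf4j-api"]
QUERY_JAR_PREFIXES = ["commons-logging"]  # needed to actually run queries


def missing_jars(jar_dir, run_queries=True):
    """Return the required jar prefixes with no matching *.jar file in jar_dir."""
    jars = [name for name in os.listdir(jar_dir) if name.endswith(".jar")]
    wanted = REQUIRED_JAR_PREFIXES + (QUERY_JAR_PREFIXES if run_queries else [])
    return [prefix for prefix in wanted
            if not any(jar.startswith(prefix) for jar in jars)]
```

For example, missing_jars("/usr/lib/hive/lib") would list what to copy from elsewhere, assuming the CDH package layout where beeline lives in /usr/lib/hive/bin.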