Thursday, November 17, 2011

Invalid artifact issue of Eclipselink Nexus Proxy Repository

I use Nexus as my internal maven repository. I set up a proxy repository "Eclipselink Maven Mirror" for Eclipselink. Eclipselink uses this URL http://www.eclipse.org/downloads/download.php?r=1&nf=1&file=/rt/eclipselink/maven.repo which actually redirects you to a mirror site. Just put this link into "Remote Storage Location" and create a proxy repository. I made Eclipselink repository as the last one in "Ordered Group Repository" in "Public Repository" Configuration tab. My maven settings.xml defines a mirror to use the public repository group like this
    <mirrors>
        <mirror>
            <id>nexus-public</id>
            <mirrorOf>*</mirrorOf>
            <url>http://nexus:8080/nexus/content/groups/public</url>
        </mirror>
    </mirrors>
Unfortunately it doesn't work well. When maven gets some of artifacts, especially my own artifacts, Nexus returns invalid pom or jar files in this "Eclipselink Maven Mirror", which are actually HTML pages. And the artifacts will be saved into maven local repository, and make my maven builds fail again and again. I don't know how Nexus search the public repositories to locate an artifact. But looks like Nexus tries "EclipseLink Maven Mirror", then access the Eclipselink Repo using the URL listed above. Unfortunately this URL will return a HTML page once the artifact is not found. But Nexus didn't check if it is invalid because I didn't configure it. Here is my solution:
  • Setup two mirrors: one for eclipselink and one for the others except eclipselink.
  •         
                nexus-public
                *,!eclipselink-repo
                http://nexus:8080/nexus/content/groups/public
            
            
                nexus-eclipselink
                eclipselink-repo
                http://nexus:8080/nexus/content/repositories/eclipselink-maven-mirror
            
    
  • Set "File Content Validation" to true For "Eclipselink Maven Mirror"

Tuesday, November 8, 2011

Hive Metastore Trick: "get_privilege_set failed"

Three kinds of Hive metastores are supported in Hive 0.7.1:
  • Embeded
  • Local
  • Remote

It seems straight forward. Right? If MySQL metastore database is installed on serverA, and the Hive client is running on serverB, which one you should use? Definitely not Embedded. Should use "Remote"? Does this configuration hive-site.xml work?


<property>
  <name>hive.metastore.local</name>
  <value>false</value>
  <description>controls whether to connect to remove metastore server or open a new metastore server in Hive Client JVM</description>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://serverA:8003</value>
  <description>host and port for the thrift metastore server</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://chelhadedw002/metastore_dev</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>username</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>

Unfortunately this doesn't work correctly. We could run "show tables", but Hive threw the exception when we run a select statement like this
FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.metadata.HiveException(org.apache.thrift.TApplicationException: get_privilege_set failed: unknown result)
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: get_privilege_set failed: unknown result
        at org.apache.hadoop.hive.ql.metadata.Hive.get_privilege_set(Hive.java:1617)
        at org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider.authorizeUserPriv(DefaultHiveAuthorizationProvider.java:201)
        at org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider.authorizeUserAndDBPriv(DefaultHiveAuthorizationProvider.java:226)
        at org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider.authorizeUserDBAndTable(DefaultHiveAuthorizationProvider.java:259)
        at org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider.authorize(DefaultHiveAuthorizationProvider.java:159)
        at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:531)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:513)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: org.apache.thrift.TApplicationException: get_privilege_set failed: unknown result
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_privilege_set(ThriftHiveMetastore.java:2414)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_privilege_set(ThriftHiveMetastore.java:2379)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.get_privilege_set(HiveMetaStoreClient.java:1042)
        at org.apache.hadoop.hive.ql.metadata.Hive.get_privilege_set(Hive.java:1615)
        ... 15 more

The problem of our configuration is "We should use LOCAL metastore even the MySQL metastore locates on a different server", in other word, hive.metastore.local=true. If you want to use "Remote" metastore, you need to start the metastore service hive --service metastore. It seems working when we use "serverA:8003", because we install Hue on serverA, and beeswax starts a metastore thrift service on port 8003. That is why you can run show tables successfully, but get the above error when you run select.

Another issue is: when you use remote metastore, you will still have this issue. see https://issues.apache.org/jira/browse/HIVE-2554 and https://issues.apache.org/jira/browse/HIVE-2405.