Friday, August 19, 2011

How to debug Pig UDFs in Eclipse?

It is actually not as hard as you might think.
  1. Create a Maven project using m2eclipse.
  2. Add org.apache.pig:pig as a dependency.
  3. Click "Debug configurations ...".
  4. Create a "Java Application".
  5. Main class "org.apache.pig.Main".
  6. In the Arguments tab, put "-x local" and any other arguments in "Program arguments".
  7. If your OS is Windows, create a variable "PATH" with the value "${env_var:path};c:\cygwin\bin" in the Environment tab.
  8. Debug your script, and Eclipse will stop at the breakpoint you set in your UDF.

How to debug Hadoop MapReduce jobs in Eclipse?

It is actually very easy to debug Hadoop MapReduce jobs in Eclipse, especially when you use Maven.
  1. Create a Maven project using m2eclipse.
  2. Add org.apache.hadoop:hadoop-core as a dependency.
  3. Set a breakpoint at any line in your code.
  4. Right-click your driver class, then choose Debug As -> Java Application.
  5. In the Arguments tab of the launch configuration, put "-fs file:/// -jt local -Dmapred.local.dir=c:/temp/hadoop your_input_file c:/temp/hadoop/output" in "Program arguments".
  6. If you run on Windows, you have to use Cygwin because Hadoop calls the external shell command "chmod". In the Environment tab, add an environment variable PATH with the value ${env_var:path};c:\cygwin\bin. Then Hadoop can find chmod.
  7. Click Debug. You can now debug your MapReduce code in Eclipse, with Hadoop running in local mode.
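
Step 6 only helps if chmod can really be resolved through the PATH you configured. Here is a minimal sketch for checking that from a Cygwin shell. It assumes you run it with the same PATH value as the launch configuration, and has_on_path is a hypothetical helper name, not part of Hadoop or Cygwin.

```shell
# Report whether a command can be found on a given search path.
# has_on_path is a hypothetical helper, not a Hadoop or Cygwin tool.
has_on_path() {
    # Run in a subshell so the caller's PATH is left untouched.
    ( PATH=$2; command -v "$1" >/dev/null 2>&1 )
}

if has_on_path chmod "$PATH"; then
    chmod_status="found"
else
    chmod_status="missing"
fi
echo "chmod: $chmod_status"
```

If this prints "missing", the Cygwin bin directory is probably not on the PATH being tested.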




Tuesday, August 16, 2011

If Java is installed in c:\Program Files\Java\, it is a headache to make Hadoop/Hive work. Hive reports an error like this:

/workplace/apps/hadoop-0.20.2-cdh3u1/bin/hadoop: line 300: /cygdrive/c/Program: No such file or directory

Apparently, a JAVA_HOME that contains a space does not work: the path is cut off at the space after "Program".

I tried several solutions from the Internet, but none of them worked. For example, neither of these helped:

export JAVA_HOME=/cygdrive/c/Program\ Files/Java/jdk1.6.0_25
export JAVA_HOME=/cygdrive/c/"Program Files"/Java/jdk1.6.0_25

Here is how I solved the problem: create a symbolic link (the /usr/java directory must already exist) like this

ln -s /cygdrive/c/Program\ Files/Java/jdk1.6.0_25 /usr/java/default

Then set JAVA_HOME in ${HADOOP_HOME}/conf/hadoop-env.sh to the link path like this

export JAVA_HOME=/usr/java/default
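
The error message suggests that the path is being split at the space. Here is a minimal sketch that reproduces this in a sandbox, on the assumption (not confirmed from the Hadoop source) that the launcher expands JAVA_HOME without quotes. The fake "java" script, the temporary directory, and the run_unquoted helper are all made up for illustration, and the sketch assumes the temporary directory path itself contains no spaces.

```shell
#!/bin/sh
# Hypothetical sandbox: a fake "JDK" under a directory whose name contains a space.
work=$(mktemp -d)
mkdir -p "$work/Program Files/jdk/bin"
printf '#!/bin/sh\necho fake-java\n' > "$work/Program Files/jdk/bin/java"
chmod +x "$work/Program Files/jdk/bin/java"

# Invoke java with an UNQUOTED expansion, as a careless launcher script might.
# The shell splits the expanded value at the space, so the command name
# becomes ".../Program" and the call fails.
run_unquoted() {
    JAVA_HOME=$1
    ( $JAVA_HOME/bin/java ) 2>/dev/null || echo "failed"
}

# Case 1: JAVA_HOME points directly at the path with the space.
direct_result=$(run_unquoted "$work/Program Files/jdk")

# Case 2: JAVA_HOME points at a symbolic link whose path has no space.
ln -s "$work/Program Files/jdk" "$work/java-default"
link_result=$(run_unquoted "$work/java-default")

echo "direct: $direct_result"
echo "link:   $link_result"
rm -rf "$work"
```

If the assumption holds, this would also explain why escaping or quoting the space in the export line does not help: the escape characters are consumed when JAVA_HOME is assigned, and the stored value still contains a plain space when it is expanded later.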