Monday, September 12, 2016

Gradle: show the dependencies of a specified configuration

I am recording the method here in case I forget again how to show the dependencies of only one configuration, such as runtime.
  • Show dependencies for one configuration ONLY: ./gradlew dependencies --configuration runtime
  • Show a task’s arguments: ./gradlew help --task dependencies
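
In a multi-project build, the same task can be scoped to one subproject by prefixing the project path; “:app” below is a hypothetical module name, not from my build:
  • Show dependencies for one subproject only: ./gradlew :app:dependencies --configuration runtime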

Record Request and Response of HTTP Samplers in JMeter

  • Add a “Simple Data Writer” listener.
  • Give it a file name like “result.xml”.
  • Click the Configure button.
  • Check the boxes for
    • Save As XML
    • Save URL
    • Save Response Data (XML)
  • Start the JMeter test.
  • Check the XML file
  • If you stop JMeter and then start another run, the requests and responses of the new run are appended to the same XML file. The recorded result looks like this:
<testResults>
  <httpSample t="17" lt="17" ts="1473705874608" s="true" lb="Profile Request" rc="200" rm="OK" tn="Profile (year) 1-1" dt="text" by="253" ng="1" na="2">
    <responseData class="java.lang.String">...</responseData>
    <java.net.URL>...</java.net.URL>
  </httpSample>
</testResults>
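
The recording also works in non-GUI mode. Assuming the test plan was saved as “test.jmx” (a hypothetical file name), the Simple Data Writer fills “result.xml” during the run:
jmeter -n -t test.jmx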

Friday, September 9, 2016

Make Spark read Teradata directly.

Spark SQL supports reading from JDBC sources, but this paragraph in the latest 2.0 documentation is not really helpful:
The JDBC driver class must be visible to the primordial class loader on the client session and on all executors. This is because Java’s DriverManager class does a security check that results in it ignoring all drivers not visible to the primordial class loader when one goes to open a connection. One convenient way to do this is to modify compute_classpath.sh on all worker nodes to include your driver JARs.
I wrote this post to explain how to read from a Teradata database:
  • with spark-shell in YARN (client) mode
  • with spark-submit in YARN-cluster mode

spark-shell and YARN (client) mode

...
--conf spark.executor.extraClassPath=./tdgssconfig-15.10.00.22.jar:./terajdbc4-15.10.00.22.jar \
--driver-class-path $LIB_DIR/tdgssconfig-15.10.00.22.jar:$LIB_DIR/terajdbc4-15.10.00.22.jar \
...
  • For spark.executor.extraClassPath, the path is the current directory ./, which is where YARN starts the executor.
  • For --driver-class-path, “$LIB_DIR” is the directory where the JDBC driver jars live; it is on the host where you run spark-shell. The actual read call is sketched after this list.
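
With the classpath wired up, the read itself is plain Spark SQL. Here is a minimal sketch against the 2.0 API; the host, database, table, and credentials are placeholders, not values from this post:

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:teradata://td-host/DATABASE=mydb") // hypothetical host and database
  .option("driver", "com.teradata.jdbc.TeraDriver")       // Teradata JDBC driver class
  .option("dbtable", "mytable")                           // hypothetical table name
  .option("user", "dbuser")
  .option("password", "dbpass")
  .load()
df.show()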

spark-submit and YARN-cluster mode

...
--conf spark.executor.extraClassPath=./tdgssconfig-15.10.00.22.jar:./terajdbc4-15.10.00.22.jar \
--driver-class-path ./tdgssconfig-15.10.00.22.jar:./terajdbc4-15.10.00.22.jar \
--jars $LIB_DIR/tdgssconfig-15.10.00.22.jar,$LIB_DIR/terajdbc4-15.10.00.22.jar,<other jars>
...

Explanation

Two differences:
  • In spark-shell, you don’t have to put the Teradata JDBC driver jars in --jars, because --driver-class-path already covers the driver; putting them in --jars anyway doesn’t hurt. But in spark-submit, you have to add them to --jars.
  • In spark-submit, --driver-class-path uses the current directory ./, not $LIB_DIR as in spark-shell.
Lost? Here is why:
  • When you run spark-submit in YARN-cluster mode, the Spark app driver actually runs in a YARN container, not on the host where you type “spark-submit”.
    • --jars gets the Teradata JDBC driver jars copied to the container directory where the Spark app driver (the YARN application master) runs.
    • --driver-class-path just declares that you need the Teradata JDBC driver jars. Because the Spark app driver runs in a YARN container, this argument tells the driver JVM where to find the JDBC jars. Since the jars are copied into the YARN container’s directory, that directory is the current directory ./ from the driver JVM’s point of view.
    • spark-submit itself cannot find the JDBC jars at the ./ paths given in --driver-class-path, so --driver-class-path alone won’t get the jars copied to the YARN container. That is why you have to use --jars to tell Spark where those jars live and have them shipped to the container.
  • In spark-shell, the Spark app driver is started on the host where you type “spark-shell”, so you need to give the full path in --driver-class-path. A quick check of the setup is sketched below.
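
A quick way to verify the wiring on both sides is to load the driver class from spark-shell; this minimal sketch only checks class visibility:

// On the driver: throws ClassNotFoundException if the jars are missing
Class.forName("com.teradata.jdbc.TeraDriver")

// On the executors: a tiny job that loads the class inside each task
sc.parallelize(1 to 4, 4)
  .map(_ => Class.forName("com.teradata.jdbc.TeraDriver").getName)
  .collect()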