Friday, December 30, 2016

Spark Action VS Spark Submit

  • spark action
    • <workflow-app xmlns="uri:oozie:workflow:0.3" name="oozie-spark-action-test">
          <start to="spark-node" />
          <action name="spark-node">
              <spark xmlns="uri:oozie:spark-action:0.1">
                  <job-tracker>${jobTracker}</job-tracker>
                  <name-node>${nameNode}</name-node>
                  <master>yarn-cluster</master>
                  <name>person</name>
                  <class>com.spark.batch.Generator</class>
                  <jar>${nameNode}/user/${wf:user()}/${appDir}/lib/spark-0.0.3-jar-with-dependencies.jar</jar>
                  <spark-opts>--executor-memory 19g --num-executors 31 --executor-cores 3 --driver-memory 9g --driver-cores 2 --conf spark.yarn.historyServer.address=http://lqahadoopdata03.net:18080 --conf spark.eventLog.dir=${nameNode}/spark-history --conf spark.eventLog.enabled=true</spark-opts>
                  <arg>-P=job.conf</arg>
                  <arg>-C</arg>
              </spark>
              <ok to="end" />
              <error to="fail" />
          </action>
          <kill name="fail">
              <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
          </kill>
          <end name="end" />
      </workflow-app>
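The workflow above references EL parameters (${jobTracker}, ${nameNode}, ${appDir}) that are typically supplied through a job.properties file passed to the Oozie CLI. A minimal sketch is shown below; the host names, ports, and appDir value are placeholders, not taken from the original post:

```
# job.properties - hypothetical values; adjust hosts and ports to your cluster
nameNode=hdfs://namenode-host:8020
jobTracker=resourcemanager-host:8050
appDir=oozie-spark-action-test
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${appDir}
```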
  • spark submit
    • spark-submit --master yarn-cluster --name person \
          --executor-memory 19g --num-executors 31 --executor-cores 3 \
          --driver-memory 9g --driver-cores 2 \
          --files /usr/hdp/current/spark-client/conf/hive-site.xml,/home/hdfs/person/conf/job.conf \
          --driver-class-path /home/hdfs/lib/sqljdbc4.jar \
          --jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/home/hdfs/lib/sqljdbc4.jar \
          --class com.spark.batch.Generator \
          spark-0.0.3-jar-with-dependencies.jar -P job.conf -C
  • spark action (oozie) VS spark submit (spark)
    • Item           | Oozie (spark action)                | Spark (spark-submit)
      master         | <master>                            | --master
      name           | <name>                              | --name
      class          | <class>                             | --class
      app (JAR)      | <jar>                               | after --class
      spark options  | <spark-opts>                        | --num-executors, --executor-cores, --executor-memory, etc.
      app arguments  | <arg>                               | after the app (JAR)
      file(s)        | ${oozie-workflow-dir}/lib (HDFS)    | --files
      JAR(s)         | ${oozie-workflow-dir}/lib (HDFS)    | --jars
      other          | an '=' sign must be used between an app option's key and value (e.g., -P=job.conf) | no '=' sign is needed between key and value (e.g., -P job.conf)
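As the file(s) and JAR(s) rows note, the spark action has no --files/--jars flags; instead, everything those flags would ship is staged into the workflow's lib/ directory on HDFS before submission. A sketch of that staging, reusing the paths from the examples above (the HDFS directory and Oozie server host are assumptions):

```shell
# Stage the application JAR, extra JARs, and config into the workflow lib/ dir on HDFS
hdfs dfs -mkdir -p /user/hdfs/oozie-spark-action-test/lib
hdfs dfs -put -f spark-0.0.3-jar-with-dependencies.jar /user/hdfs/oozie-spark-action-test/lib/
hdfs dfs -put -f /home/hdfs/lib/sqljdbc4.jar /user/hdfs/oozie-spark-action-test/lib/
hdfs dfs -put -f /home/hdfs/person/conf/job.conf /user/hdfs/oozie-spark-action-test/lib/

# Submit the workflow (oozie-host is a placeholder)
oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run
```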
