Thursday, June 15, 2017

Help 4 HDP

  • Define a schema (spark)
    • Options
      // 1
      import org.apache.spark.sql.types._
      val schema = new StructType()
        .add("i_logid", IntegerType, false)
        .add("i_logdetailid", IntegerType, false)
        .add("i_logdes", new StructType().add("gamecode", StringType, true), false)
       
      // 2
      val schema = StructType(
        StructField("i_logid", IntegerType, false) ::
          StructField("i_logdetailid", IntegerType, false) ::
          StructField("i_logdes", new StructType().add("gamecode", StringType, true), false) ::
          Nil
      )
       
      // 3
      case class Des(gamecode: String)
      case class Log(i_logid: Int, i_logdetailid: Int, i_logdes: Des)
      import org.apache.spark.sql.Encoders
      val schema = Encoders.product[Log].schema
       
      // 4
      spark.sql("select get_json_object(lower(cast(value as string)), '$.i_regdatetime') as i_regdatetime from rawData")
       
      // 5
      val schema = spark.read.table("netmarbles.log_20170813").schema
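    • A hedged usage sketch applying one of the schemas above when parsing the Kafka value column as JSON (rawDF is an assumed DataFrame holding the raw records; the column names follow the examples above)
      import org.apache.spark.sql.functions.{from_json, lower}
      // parse the lower-cased JSON string into a struct column named "log"
      val parsed = rawDF.select(from_json(lower($"value".cast("string")), schema).as("log"))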
  • Estimate the sizes of Java objects (spark)
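    • org.apache.spark.util.SizeEstimator can help; a minimal sketch (the sample collection is only an illustration)
      import org.apache.spark.util.SizeEstimator
      // returns the estimated in-memory size of the object graph in bytes
      val estimatedBytes = SizeEstimator.estimate(Seq.fill(1000)("sample"))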
  • Using the desc option in the orderBy API (spark)
    • orderBy($"count".desc)
    • orderBy('count.desc)
    • orderBy(-'count)
  • java.lang.NoClassDefFoundError: com/google/protobuf/ProtocolStringList (spark)
    • It is caused by using an old version of protobuf, such as protobuf-java-2.5.0.jar, which does not contain the ProtocolStringList class.
    • Spark in HDP 2.6.0 uses protobuf-java-2.5.0.jar.
    • Using a newer version, such as protobuf-java-2.6.1.jar, will fix it.
    • E.g.
  • AbstractLifeCycle: FAILED ServerConnector@X{HTTP/1.1}{0.0.0.0:4040}: java.net.BindException: Address already in use (spark)
    • Possible reasons
      • Someone is using spark
      • There is a service using 4040
      • There was a spark-shell process which did not exit properly
    • netstat -lpn | grep 4040
      • tcp        0      0 0.0.0.0:4040            0.0.0.0:*               LISTEN      26464/java
    • ps -ef | grep 26464
    • kill -9 26464
  • java.io.FileNotFoundException: /data/hadoop/hdfs/data/current/VERSION (Permission denied) (hdfs)
    • Check the permission
      • ls -al /data/hadoop/hdfs/data/current
    • If it is not "hdfs:hadoop"
      • chown -R hdfs:hadoop /data/hadoop/hdfs/data/current
  • RDB 2 local using sqoop (sqoop)
    • Use -jt option
      • E.g. sqoop import -jt local --target-dir file:///home/hdfs/temp
    • Use -fs and -jt options
      • E.g. sqoop import -fs local -jt local
    • File file:/hdp/apps/2.6.0.3-8/mapreduce/mapreduce.tar.gz does not exist
      • mkdir -p /hdp/apps/2.6.0.3-8/mapreduce
      • chown -R hdfs:hadoop /hdp
      • cd /hdp/apps/2.6.0.3-8/mapreduce
      • hdfs dfs -get /hdp/apps/2.6.0.3-8/mapreduce/mapreduce.tar.gz .
  • Read files in s3a from spark (spark)
    • spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key","XXX") 
    • spark.sparkContext.hadoopConfiguration.set("fs.s3a.connection.ssl.enabled","false") 
    • spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint","host:port") 
    • spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key","XXX")
    • spark.read.text("s3a://path/to/the/file")
  • java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArrayDeserializer (spark)
    • Add kafka client JAR file
    • E.g. spark-shell --jars spark-sql-kafka-0-10_2.11-2.1.0.jar,kafka-clients-0.10.1.2.6.0.3-8.jar
  • org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms (kafka)
    • Check the broker's port number in the ambari web UI and use it instead of 9092, and use the broker's IP address instead of localhost
      • bin/kafka-console-producer.sh --broker-list 10.100.99.152:6667 --topic rep-test
  • The status of clients is unknown (ambari)
    • Try to restart or reinstall the clients
    • If it gives an "Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?" error
      • ps -ef | grep apt
      • If there is an apt-get or aptitude process, kill it with the kill command
      • dpkg --configure -a
  • Setting the logging level of the ambari-agent.log (ambari)
    • cd /etc/ambari-agent/conf
    • cp logging.conf.sample logging.conf
    • vim logging.conf
      • [logger_root]
        level=WARNING
  • Ambari does not create/change the hive-log4j.properties file in /etc/hive/2.6.0.3-8/0/conf.server or /usr/hdp/current/hive-server2/conf/conf.server (ambari)
    • Add a new hive metastore service to the same node which has no hive-log4j.properties file in /etc/hive/2.6.0.3-8/0/conf.server folder
  • Hive servers in same cluster use different hive-log4j.properties files (hive)
    • Option 1
      • Create a hive-log4j.properties file in the /etc/hive/2.6.0.3-8/0/conf.server folder if it does not exist
    • Option 2
      • Delete the hive server
      • Create a new hive server in another master node and use it
    • Option 3
      • Add a new hive metastore service to the same node which has no hive-log4j.properties file in /etc/hive/2.6.0.3-8/0/conf.server folder
  • Setting the logging level of the hiveserver2.log (hive)
    • Ambari web UI -> hive -> config -> advanced hive-log4j -> hive.root.logger=INFO,DRFA
  • Couldn't find leader offsets (spark)
    • Choose the right kafka connector (a sketch follows below)
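    • For example, if the brokers run Kafka 0.10+, the spark-streaming-kafka-0-10 integration may be the right one; a minimal sketch (the broker address, topic, group id, and ssc are assumptions)
      import org.apache.kafka.common.serialization.StringDeserializer
      import org.apache.spark.streaming.kafka010.KafkaUtils
      import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
      import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
      val kafkaParams = Map[String, Object](
        "bootstrap.servers" -> "broker-host:6667",
        "key.deserializer" -> classOf[StringDeserializer],
        "value.deserializer" -> classOf[StringDeserializer],
        "group.id" -> "example-group")
      // use the 0-10 createDirectStream instead of the 0-8 one
      val stream = KafkaUtils.createDirectStream[String, String](
        ssc, PreferConsistent, Subscribe[String, String](Seq("some-topic"), kafkaParams))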
  • toDF not member of RDD (spark)
    • val _spark = org.apache.spark.sql.SparkSession.builder().getOrCreate()
    • val _sqlContext = _spark.sqlContext
    • import _sqlContext.implicits._
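    • A hedged usage sketch after the imports above (the sample data is only an illustration)
      val rdd = _spark.sparkContext.parallelize(Seq((1, "a"), (2, "b")))
      val df = rdd.toDF("id", "value")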
  • Task Not Serializable (spark)
    • val _spark = org.apache.spark.sql.SparkSession.builder().getOrCreate()
    • val _sc = _spark.sparkContext
    • val _sqlContext = _spark.sqlContext
    • import _sqlContext.implicits._
    • SparkSession.setActiveSession(_spark)
  • Push JSON Records (spark)
    • val df = temp.toDF("createdAt", "users", "tweet")
    • val json_rdd = df.toJSON.rdd
    • json_rdd.foreachPartition ( partition => { /* Send records to Kinesis / Kafka */ } )
  • Distcp creates same folder again (distcp)
    • E.g.
      • "hadoop distcp /a/b/target /c/d/target" gives /c/d/target.
      • If the command is run again, it gives /c/d/target/target.
    • Option
      • use the -update option
      • E.g. hadoop distcp -update /a/b/target /c/d/target
  • TExecuteStatementResp(status=TStatus(errorCode=1, errorMessage='Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask' (pyHive)
    • Options
      • Specify the user name as hdfs in the Connection method.
  • ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate($int,WrappedArray()) (spark)
    • ERROR Utils: Uncaught exception in thread pool-1-thread-1
      • java.lang.InterruptedException
    • WARN ShutdownHookManager: ShutdownHook '$anon$2' timeout, 
      • java.util.concurrent.TimeoutException
    • Option
      • Set a bigger value for "spark.ui.retainedTasks" (a sketch follows below)
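      • A hedged sketch of setting it when building the session (the value is only an example)
        val spark = org.apache.spark.sql.SparkSession.builder()
          .config("spark.ui.retainedTasks", "200000")
          .getOrCreate()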
  • resource_management.core.exceptions.ExecutionFailed: Execution of 'ambari-sudo.sh /usr/bin/hdp-select set all `ambari-python-wrap /usr/bin/hdp-select versions | grep ^2.6.0.3-8 | tail -1`' returned 1. ERROR: set command takes 2 parameters, instead of 1 (ambari)
    • ambari-python-wrap /usr/bin/hdp-select versions
    • ERROR: Unexpected file/directory found in /usr/hdp: test
    • cd /usr/hdp
    • rm -r test
  • Spark's log level becomes WARN, even though no related config is set (spark)
    • It happens because two --files options are specified while submitting the spark job.
      • E.g. --files a --files b
    • Using one --files option will solve the problem.
      • E.g. --files a,b
  • Failed to add $JARName to Spark environment / java.io.FileNotFoundException: Jar $JARName not found / File file:/path/to/file does not exist (spark)
    • Reason
      • This happens when checkpointing is enabled and the spark job was first submitted in client mode.
      • These errors appear when the job is later submitted in cluster mode.
    • Option
      • Delete the checkpoint directory.
      • Submit the job in cluster mode.
  • WARN SparkSession$Builder: Using an existing SparkSession; some configuration may not take effect. (spark)
    • Using a lazily instantiated singleton instance of SparkSession may avoid the warning.
    • Use "val spark = SparkSessionSingleton.getInstance(rdd.sparkContext.getConf)" instead of "val spark = SparkSession.builder().config(rdd.sparkContext.getConf).getOrCreate()".
      import org.apache.spark.SparkConf
      import org.apache.spark.sql.SparkSession
      object SparkSessionSingleton {
       
        @transient private var instance: SparkSession = _
       
        def getInstance(sparkConf: SparkConf): SparkSession = {
          if (instance == null) {
            instance = SparkSession
              .builder
              .config(sparkConf)
              .getOrCreate()
          }
          instance
        }
      }
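    • A hedged usage sketch inside a streaming job ("stream" is an assumed DStream from this application)
      stream.foreachRDD { rdd =>
        // reuse the singleton session instead of building a new one per batch
        val spark = SparkSessionSingleton.getInstance(rdd.sparkContext.getConf)
        import spark.implicits._
        // work with DataFrames/Datasets here
      }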
  • Exception in thread "main" java.lang.NullPointerException at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:72) (spark)
    • remove "microsoft" from the URL.
      • E.g. use val url="jdbc:sqlserver://19.16.6.5:51051"
  • java.sql.SQLException: No suitable driver (spark)
    • Specify the driver
      • prop.put("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
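    • A hedged sketch of reading over JDBC with the driver specified (the table name and credentials are placeholders; the URL follows the example above)
      import java.util.Properties
      val prop = new Properties()
      prop.put("user", "username")
      prop.put("password", "password")
      prop.put("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      val df = spark.read.jdbc("jdbc:sqlserver://19.16.6.5:51051", "dbo.some_table", prop)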
  • java.sql.SQLException: No suitable driver found for jdbc:... (spark)
    • The JDBC driver class must be visible to the primordial class loader on the client session and on all executors.
    • Include the dependency inside the JAR.
    • Or, use --driver-class-path and --jars.
  • org.apache.spark.scheduler.ResultTask cannot be cast to org.apache.spark.scheduler.Task (spark)
    • Check spark version.
    • Make spark dependencies as provided in pom.xml file.
    • Remove spark dependencies that conflict with each other.
  • java.lang.NoClassDefFoundError: com/yammer/metrics/Metrics (spark)
    • --jars /home/hdfs/lib/metrics-core-2.2.0.jar
    • KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)
  • error: not found: value kafka (spark)
    • --jars /home/hdfs/lib/kafka_2.11-0.8.2.1.jar
    • import kafka.serializer.StringDecoder
  • error: object kafka is not a member of package org.apache.spark.streaming (spark)
    • --jars /home/hdfs/lib/spark-streaming-kafka-0-8_2.11-2.1.0.jar
    • import org.apache.spark.streaming.kafka.KafkaUtils
  • org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (spark)
    • sc.stop
    • val ssc = new StreamingContext(conf, Seconds(1))
  • org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.IllegalArgumentException: Missing Hive MetaStore connection URI (sqoop)
  • org.kitesdk.data.ValidationException: Dataset name common.metagoods is not alphanumeric (plus '_') (sqoop)
    • use --hive-database test --hive-table test
    • instead of --hive-table test.test
  • 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel (HDFS)
    • It may be caused by a GC problem in the application running on the hadoop cluster.
    • E.g. for a spark app, remove unnecessary variables and tune the GC.
    • Related errors
      • java.io.EOFException: Premature EOF: no length prefix available
      • DataXceiver error processing WRITE_BLOCK operation src: /IP:port dst: /IP:port java.net.SocketTimeoutException
      • java.io.InterruptedIOException: Interrupted while waiting for IO on channel java.nio.channels.SocketChannel[connected local=/IP:port remote=/IP:port]. 60000 millis timeout left
  • DataNode Heap Usage (ambari)
    • export HADOOP_DATANODE_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=800m -XX:MaxNewSize=800m -XX:+UseParNewGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms{{dtnode_heapsize}} -Xmx{{dtnode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_DATANODE_OPTS}"
  • java.lang.AssertionError: assertion failed: Conflicting directory structures detected. Suspicious paths (spark)
    • If provided paths are partition directories, please set "basePath" in the options of the data source to specify the root directory of the table. If there are multiple root directories, please load them separately and then union them.
    • E.g. spark.read.option("basePath", "/test/").textFile("/test/code=test1", "/test/code=test2")
  • java.lang.AssertionError: assertion failed: Conflicting partition column names detected (spark)
    • Check the HDFS path.
      • /test/code=test1
      • /test/code=test2/code=test2
    • Remove the incorrect partition column
      • E.g. hdfs dfs -rm -r /test/code=test2/code=test2
  • Very slow "select * from table limit 1" (hive)
    • The query is very slow, or even hangs or gives errors, if the table is an ORC table with large files.
    • Option 1
      • Save the table as text instead of ORC.
    • Option 2
      • Recreate the table with smaller ORC files.
  • java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.zlib (hive)
    • Use 'stored as orc tblproperties ("orc.compress"="ZLIB")' instead of 'stored as orc tblproperties ("orc.compress"="zlib")'
    • The same applies to snappy (use "SNAPPY").
  • SemanticException org.apache.thrift.transport.TTransportException (hive)
    • mysql -u hive -p
    • use hive;
    • select location from SDS;
    • Check the service name of the hadoop cluster in the returned locations
    • If it is wrong, update it.
  • Zeppelin shows stopped in ambari UI while restarting it through command line (ambari)
    • use the command placed below to solve the problem.
    • su -l zeppelin -c "/usr/hdp/current/zeppelin-server/bin/zeppelin-daemon.sh restart"
  • The IP address of newly added host is marked as 127.0.1.1 from ambari (ambari)
    • The reason is wrong information in the /etc/hosts file. The solution depends on the use case.
    • Case 1
      • Change "10.10.10.10 test.com test" to "127.0.1.1 test.com test".
      • ambari-agent restart
      • Ambari UI → hosts shows 127.0.1.1.
      • Change "127.0.1.1 test.com test" to "10.10.10.10 test.com test".
      • ambari-agent restart
      • Ambari UI → hosts shows 10.10.10.10.
    • Case 2
      • Comment "127.0.1.1 test".
        • #127.0.1.1 test
      • Change "10.10.10.10 test test.com" to "10.10.10.10 test"
      • ambari-agent restart
      • Ambari UI → hosts shows 10.10.10.10.
  • The status of newly added host is marked as unknown from ambari (ambari)
    • Wrong information in the hosts and hoststate tables causes the problem.
    • hosts and hoststate are tables in the RDBMS that ambari uses.
    • The wrong information is that one IP address has two hostname entries.
      • E.g. There are "test, 10.10.10.10" and "test.com, 10.10.10.10".
    • The "test.com, 10.10.10.10" entry exists because it was in the /etc/hosts file when ambari-agent was restarted.
    • Solution
      • Remove "test.com, 10.10.10.10" from the two tables.
        • E.g. delete from hoststate where host_id=102; delete from hosts where host_id=102;
      • Remove "10.10.10.10 test.com" from the /etc/hosts file.
      • ambari-agent restart
  • Failed to connect to server: rm1.host.name/rm1.ip:8032: retries get failed due to exceeded maximum allowed retries number: 0 (YARN)
    • The client shows this warning while it tries to connect to RM1, which is in standby status.
  • DataNode heap usage warning (ambari)
  • If no severity is selected in an ambari alert notification, all severities will be selected (ambari)
    • Clear all groups instead.
  • Unable to edit ambari alert notification (ambari)
    • Copy the notification, modify it, and save it as new notification. Then delete the old notification.
    • Or, restart the ambari server.
  • Spark gives error if using spark.driver.userClassPathFirst=true and hive enabled in YARN cluster mode (spark)
    • Use spark.executor.userClassPathFirst=true with hive enabled.
  • no viable alternative at input '<EOF>'(line 1, pos 4000) (spark)
    • The string length of a column schema cannot exceed 4000.
    • Split a column that contains many nested fields into multiple columns.
    • Or, explode the column into separate columns (one hedged way is sketched below).
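    • One hedged way to flatten a wide struct column into top-level columns (df, i_logid, and i_logdes are illustrative names)
      // selectExpr("col.*") expands every nested field of the struct into its own column
      val flattened = df.selectExpr("i_logid", "i_logdes.*")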
  • Zeppelin in the ambari UI shows stopped, if using a command to restart zeppelin (ambari)
    • E.g. /usr/hdp/current/zeppelin-server/bin/zeppelin-daemon.sh restart.
    • Use the following command instead.
    • su -l zeppelin -c "/usr/hdp/current/zeppelin-server/bin/zeppelin-daemon.sh restart"
  • Hosts show high load average (hadoop)
    • Check the status of khugepaged.
    • If its status is D, check the status of transparent huge pages (THP).
    • If THP is not set to never, set it to never permanently.
    • Restart the hosts.
  • Ambari shows strange or invalid host name (ambari)
    • Delete strange or invalid host name information from the hosts and hoststate tables in the database that ambari uses.
      • E.g.
        • delete from hosts where host_id > 12;
        • delete from hoststate where host_id > 12;
    • Restart ambari server.
  • Spark driver shows high CPU usage (spark)
    • Increase driver memory.
    • Tune spark job.
      • E.g.
        • Unpersist variables that are no longer needed.
        • Avoid unnecessary variable creation.
  • Two spark history servers are running with high CPU usage (spark)
    • Stop one from the ambari UI first, and kill the other.
    • Start spark history server from the ambari UI.
  • Spark history server shows high CPU usage after restarting it (spark)
    • Check /spark-history.
    • If there are many files, delete old (or all) files before restarting the spark history server.
  • Newly added column shows NULL in the partitioned hive table with existing data (hive)
    • This issue happens when overwriting an existing partition, even if there is a non-null value for the new column.
    • Newly added partitions will show the right value.
    • Options
      • Delete the partitions first, then insert the data
      • Or, add new column with cascade option
        • E.g. alter table table_name add columns (new_column string) cascade;
  • How to specify the hive tez job name shown in the resource manager UI (tez)
  • net.minidev.json.parser.ParseException: Unexpected duplicate key (spark)
    • option 1
      • spark.driver.userClassPathFirst
        • (Experimental) Whether to give user-added jars precedence over Spark's own jars when loading classes in the driver. This feature can be used to mitigate conflicts between Spark's dependencies and user dependencies. It is currently an experimental feature. This is used in cluster mode only
      • spark.executor.userClassPathFirst
        • (Experimental) Same functionality as spark.driver.userClassPathFirst, but applied to executor instances
    • option 2
      • cd /usr/hdp/current/spark2-client/jars
      • remove the old minidev library such as json-smart-1.1.1.jar
      • add new minidev libraries such as json-smart-2.3.jar and accessors-smart-1.2.jar
    • option 3
      • Add the following configuration to the maven-shade-plugin
        <configuration>
          <relocations>
            <relocation>
              <pattern>net.minidev</pattern>
              <shadedPattern>shaded.net.minidev</shadedPattern>
            </relocation>
          </relocations>
        </configuration>
  • Invalid dfs.datanode.data.dir /data5/hadoop/hdfs/data : java.io.FileNotFoundException: File file:/data5/hadoop/hdfs/data does not exist (HDFS)
    • cd /data5/h
    • mkdir hdfs
    • cd hdfs
    • mkdir data
    • chown hdfs:hadoop data/
    • start data node
  • Duplicate key name 'CONSTRAINTS_PARENT_TABLE_ID_INDEX' (hive)
  • java.lang.OutOfMemoryError: Direct buffer memory (hbase)
  • Failure in saving service configuration (ambari)
    • Click one of the old configs and click 'Make V{number} Current'
    • Restart the service
    • Try to modify and save the configuration again
  • Add a JAR file for solving "class not found" kinds of problems in the zeppelin notebook (zeppelin)
    • Go to the interpreter configuration page
    • Click the edit button on the spark interpreter section
    • Add the path of the JAR file to the artifact column of the dependencies
      • E.g. /home/zeppelin/elasticsearch-hadoop-hive-5.0.0.jar
    • Click save button
    • Click restart button on the spark interpreter section
  • "class not found" errors while using zeppelin hive notebook (zeppelin)
    • Add these jar files listed below to the "/interpreter/jdbc" folder. E.g. /usr/hdp/2.5.3.0-37/zeppelin/interpreter/jdbc
    • curator-client-2.7.1.jar
    • curator-framework-2.7.1.jar
    • hadoop-common-2.7.3.2.5.3.0-37.jar
    • hive-common-1.2.1000.2.5.3.0-37.jar
    • hive-jdbc-1.2.1000.2.5.3.0-37.jar
    • hive-metastore-1.2.1000.2.5.3.0-37.jar
    • hive-service-1.2.1000.2.5.3.0-37.jar
    • zookeeper-3.4.6.2.5.3.0-37.jar
  • tool.ImportTool: Encountered IOException running import job: java.io.IOException: Caught Exception checking database column EXCHANGE in  hcatalog table (sqoop)
    • do not use "EXCHANGE" as a column name; rename it
  • An error occurred while calling z:java.sql.DriverManager.getConnection. : java.sql.SQLException: java.lang.RuntimeException: java.lang.NullPointerException at (phoenix)
    • add a ":" after the port number and before the zookeeper node
    • e.g. jdbc:phoenix:zookeeper_host:2181:/hbase-unsecure
  • "'ascii' codec can't encode characters in position XXX: ordinal not in range(128)" (HUE)
  • ambari web UI -> hosts shows inappropriate IP address (ambari)
    • check the /etc/hosts file of the host that shows the inappropriate IP address
    • modify the inappropriate contents in the /etc/hosts file
    • ambari-agent restart
    • (optional) depending on your use case, you may also need to update the hosts table in ambari's RDBMS to fix the inappropriate IP address, and restart the ambari server
  • Local OS is not compatible with cluster primary OS family. Please perform manual bootstrap on this host (ambari)
    • use the commands listed below to confirm your OS version
      • uname -a: for all information regarding the kernel version
      • uname -r: for the exact kernel version 
      • lsb_release -a: for all information related to the Ubuntu version
      • lsb_release -r: for the exact version 
      • sudo fdisk -l: for partition information with all details
    • if the OS versions are different, then reinstall your OS
  • UNIQUE constraint failed: auth_user.username / 1062, "Duplicate entry 'hue' for key 'username'" (HUE)
    • Do not create an account called "hue" at the very beginning, since HUE will create an account called "hue" when installing the examples
  • X is not allowed to impersonate Y (ambari)
    • add the properties, placed below, to core-site
      • hadoop.proxyuser.X.groups = *
      • hadoop.proxyuser.X.hosts = *
  • Missing Required Header for CSRF protection (HUE)
    • disable CSRF for livy via ambari
  • ConfigObjError: Parsing failed with several errors (HUE)
    • invalid syntax exists in the hue.ini
  • Unauthorized connection for super-user: hcat from IP X.X.X.X (hive)
    • add the properties, placed below, to core-site
      • hadoop.proxyuser.hcat.hosts=*
      • hadoop.proxyuser.hcat.groups=*
  • Queue's AM resource limit exceeded (YARN)
    • increase yarn.scheduler.capacity.maximum-am-resource-percent
  • check and fix under replicated blocks in HDFS (HDFS)
    • hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}'
    • hadoop fs -setrep 3 file-name
  • java.lang.NoClassDefFoundError: org/apache/spark/deploy/SparkSubmit (oozie)
    • add spark-assembly-*-hadoop*.jar to the lib directory of the oozie app
    • e.g. spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar 
  • Class org.apache.oozie.action.hadoop.SparkMain not found (oozie)
    • add oozie-sharelib-spark-*.jar to the lib directory of the oozie app
    • e.g. /user/hdfs/oozie-test/lib/oozie-sharelib-spark-4.2.0.2.5.0.0-1245.jar
  • unknown hosts (ambari)
    • delete unknown hosts' data from hosts and hoststate tables of ambari databases
  • ApplicationMaster: User class threw exception: java.lang.NoClassDefFoundError: org/apache/tez/dag/api/SessionNotRunning (spark)
    • use "--files /usr/hdp/current/spark-client/conf/hive-site.xml"
    • instead of "--files /usr/hdp/current/hive-client/conf/hive-site.xml"
  • Exception in thread “dag-scheduler-event-loop” java.lang.OutOfMemoryError: Java heap space (spark)
    • increase the driver's memory using "--driver-memory"
  • org.apache.spark.sql.AnalysisException: resolved attribute(s) ... HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRowNumber() windowspecdefinition ... (spark)
    • do not use an alias on a column which already exists
      • e.g. PlatformADID as PlatformADID
    • however, it is OK to use alias on a new column
      • e.g. min(test) as min_test
  • The information of vCores in the resource manager UI is different from the value of "--executor-cores" (spark)
    • set yarn.scheduler.capacity.resource-calculator to DominantResourceCalculator, if using HDP
    • or, add the property placed below in the capacity-scheduler.xml file
      <property>
        <name>yarn.scheduler.capacity.resource-calculator</name>
        <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
      </property>
  • Failed to find datanode, suggest to check cluster health. (HDFS)
    • check both forward and reverse DNS settings
    • or, add the information of datanodes to the /etc/hosts file
  • stddev_samp returns NaN (spark)
    • option 1: cast(STDDEV_SAMP(column) as decimal(16, 10)) (spark SQL)
    • option 2: STDDEV_SAMP = SQRT[N/(N-1)] * STDDEV_POP (a sketch follows below)
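    • A hedged sketch of option 2 with DataFrame functions ("df" and column "x" are illustrative names)
      import org.apache.spark.sql.functions._
      // sample stddev recovered from population stddev: sqrt(n / (n - 1)) * stddev_pop
      val n = count($"x")
      val result = df.agg((sqrt(n / (n - lit(1))) * stddev_pop($"x")).as("stddev_samp_fixed"))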
  • spark job is getting slow, almost frozen, OOM-GC (spark)
    • try to run an action at an intermediate stage of the job; it may help (a sketch follows below)
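    • A hedged sketch ("intermediate" is an assumed DataFrame produced partway through the job)
      // count() is an action; it forces the intermediate result to be computed and cached
      intermediate.persist()
      intermediate.count()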
  • Hive_CLIENT in invalid state. Invalid transition. Invalid event: HOST_SVCCOMP_OP_IN_PROGRESS at INSTALL_FAILED (ambari)
  • Class org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found (spark)
    • --jars /local/path/to/datanucleus-api-jdo-3.2.6.jar (spark-submit --master yarn-cluster)
  • Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient (spark)
    • --jars /local/path/to/datanucleus-core-3.2.10.jar (spark-submit --master yarn-cluster)
  • There is no available StoreManager of type "rdbms". Make sure that you have put the relevant DataNucleus store plugin in your CLASSPATH and if defining a connection via JNDI or DataSource you also need to provide persistence property "datanucleus.storeManagerType" (spark)
    • --jars /home/hdfs/libs/datanucleus-rdbms-3.2.9.jar (spark-submit --master yarn-cluster)
  • Database does not exist: user_pattern (spark)
    • --files /home/hdfs/libs/hive-site.xml (spark-submit --master yarn-cluster)
  • failed to start metrics monitor caused by psutil (ambari metrics)
    • python /usr/lib/python2.6/site-packages/resource_monitoring/psutil/build.py
  • No FileSystem for scheme: hdfs (hbase)
    • chmod 755 /bin/which
  • org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block (HDFS)
    • open ports
      • 50076, 50010, 50020