Wednesday, July 29, 2015

Hive 4 Test

  1. 2015.03.23
    1. row_sequence()
      1. add jar /opt/cloudera/parcels/CDH/jars/hive-contrib-0.13.1-cdh5.3.0.jar;
      2. CREATE TEMPORARY FUNCTION row_sequence as 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';
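        A minimal usage sketch, assuming a hypothetical events table; note that UDFRowSequence keeps its counter per task, so the sequence is only guaranteed unique within a single mapper or reducer:
          add jar /opt/cloudera/parcels/CDH/jars/hive-contrib-0.13.1-cdh5.3.0.jar;
          CREATE TEMPORARY FUNCTION row_sequence as 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';
          -- "events" and "event_id" are hypothetical names
          SELECT row_sequence() AS seq, event_id
          FROM events;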
  2. 2015.02.11
    1. Software
      1. Hive 0.13.0 Setup
        HDP 2.1 General Availability:
        • Hadoop 2.4.0
        • Tez 0.4.0
        • Hive 0.13.0
        HDP was deployed using Ambari 1.5.1 and, except where noted below, used the Ambari defaults. Hive 0.13.0 runs used Java 7 (the default JVM).
        Tez and MapReduce were tuned to process all queries in 4 GB containers at a target container-to-disk ratio of 2.0: with 48 GB of YARN memory per node, each node runs 12 concurrent 4 GB containers, i.e. 2.0 containers per each of the node's 6 data disks. The ratio is important because it minimizes disk thrash and maximizes throughput.
        Other settings (collected into the session sketch after this list):
        • yarn.nodemanager.resource.memory-mb was set to 49152 (48 GB per node)
        • Default virtual memory for a job's map tasks and reduce tasks was set to 4096 MB
        • hive.tez.container.size was set to 4096
        • hive.tez.java.opts was set to -Xmx3800m
        • Tez application masters were given 8 GB
        • mapreduce.map.java.opts and mapreduce.reduce.java.opts were set to -Xmx3800m; this is smaller than the 4096 MB container size to leave headroom for garbage collection
        • hive.auto.convert.join.noconditionaltask.size was set to 1252698795, roughly one third of the Xmx value (about 1.2 GB)
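        The per-job values above can be collected into a single Hive CLI session as a sketch; mapreduce.map.memory.mb, mapreduce.reduce.memory.mb, and tez.am.resource.memory.mb are assumed here as the properties behind the 4096 MB task memory and the 8 GB Tez application masters (yarn.nodemanager.resource.memory-mb is a cluster-wide yarn-site.xml property and appears only as a comment):
          -- session-level sketch of the tuning above (Hive 0.13 / Tez 0.4 property names)
          -- cluster-wide, in yarn-site.xml: yarn.nodemanager.resource.memory-mb=49152
          set hive.tez.container.size=4096;
          set hive.tez.java.opts=-Xmx3800m;
          set mapreduce.map.memory.mb=4096;      -- assumed property for 4 GB map tasks
          set mapreduce.reduce.memory.mb=4096;   -- assumed property for 4 GB reduce tasks
          set mapreduce.map.java.opts=-Xmx3800m;
          set mapreduce.reduce.java.opts=-Xmx3800m;
          set tez.am.resource.memory.mb=8192;    -- assumed property for 8 GB Tez app masters
          set hive.auto.convert.join.noconditionaltask.size=1252698795;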
        The following additional optimizations were used for Hive 0.13.0 (a session sketch follows the list):
        • Vectorized Query enabled
        • ORCFile formatted data
        • Map-join auto conversion enabled
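        A sketch of enabling the three optimizations above in a Hive 0.13 session; the set properties are the standard ones, while the CTAS table names are hypothetical:
          set hive.vectorized.execution.enabled=true;  -- vectorized query execution
          set hive.auto.convert.join=true;             -- map-join auto conversion
          -- ORCFile-formatted data: materialize a table as ORC (names hypothetical)
          CREATE TABLE lineitem_orc STORED AS ORC AS SELECT * FROM lineitem;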
    2. Hardware
      1. 20 physical nodes, each with:
        • 2x Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz, for a total of 16 CPU cores/machine
        • Hyper-threading enabled
        • 256GB RAM per node
        • 6x 4TB WDC WD4000FYYZ-0 drives per node
        • 10 Gigabit interconnect between the nodes
        Note: based on the YARN Node Manager memory setting above (yarn.nodemanager.resource.memory-mb = 49152), only 48 GB of RAM per node was dedicated to query processing; the remaining 208 GB was available for system caches and HDFS.
        Linux configurations (applied as in the shell sketch after this list):
        • /proc/sys/net/core/somaxconn = 512
        • /proc/sys/vm/dirty_writeback_centisecs = 6000
        • /proc/sys/vm/swappiness = 0
        • /proc/sys/vm/zone_reclaim_mode = 0
        • /sys/kernel/mm/redhat_transparent_hugepage/defrag = never
        • /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag = no
        • /sys/kernel/mm/transparent_hugepage/khugepaged/defrag = 0
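        A sketch of applying the settings above from a root shell; RHEL 6 kernels expose the redhat_transparent_hugepage paths, and these writes do not persist across reboots:
          # apply the kernel settings above (run as root; not reboot-persistent)
          echo 512  > /proc/sys/net/core/somaxconn
          echo 6000 > /proc/sys/vm/dirty_writeback_centisecs
          echo 0    > /proc/sys/vm/swappiness
          echo 0    > /proc/sys/vm/zone_reclaim_mode
          echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
          echo no    > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag
          echo 0     > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag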
