Monday, August 31, 2015

Help 4 Error - Old

  1. bin/flume-ng: line X: syntax error in conditional expression (flume)
    1. add double quotes (") around the regex pattern. for instance,
      1. if [[ $line =~ "^java\.library\.path=(.*)$" ]]; then
    2. or, update bash.
      1. check bash version using 'bash -version' command
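    3. for reference, a minimal sketch of the quoted test (illustrative only; on bash 3.2+ a quoted pattern is matched literally, which is why updating bash is the cleaner fix):
      1. # workaround for old bash: the double-quoted pattern avoids the 'syntax error in conditional expression'
        while read -r line; do
          if [[ $line =~ "^java\.library\.path=(.*)$" ]]; then
            echo "native library path: ${BASH_REMATCH[1]}"   # captured group from the regex
          fi
        done < java_props.txt   # hypothetical input file holding 'java.library.path=...' lines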
  2. warning: unprotected private key file! / permissions 0xxx for 'path/to/id_rsa' are too open (OpenSSH)
    1. chmod 600 path/to/id_rsa
  3. asks for a password even though the authorized_keys file exists (SSH)
    1. chmod 700 .ssh
    2. chmod 644 .ssh/authorized_keys
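    3. a minimal sketch of the expected layout and permissions (the key file name is a placeholder):
      1. chmod 700 ~/.ssh                          # directory: owner-only access
        chmod 600 ~/.ssh/id_rsa                    # private key: readable by the owner only
        chmod 644 ~/.ssh/authorized_keys           # public keys may stay world-readable
        ls -ld ~/.ssh ~/.ssh/id_rsa ~/.ssh/authorized_keys   # expect drwx------, -rw-------, -rw-r--r--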
  4. failed to recv file (R - scp)
    1. check the file path
  5. no lines available in input / '입력에 가능한 라인들이 없습니다' in a Korean locale (R - read.table with pipe)
    1. check the file path
  6. java.net.MalformedURLException: unknown protocol: hdfs (java)
    1. add 'URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());'
  7. argument list too long (curl)
    1. the error occurs because 'cat base64.txt' expands into a single, very long argument that exceeds the shell's argument-length limit. for instance,
      1. curl -XPOST "http://localhost:9200/test/person" -d '
        {
        "file" : {
        "_content" : "'`cat base64.txt`'"
        }
        }'
    2. use '@-' so that curl reads the request body from stdin instead of passing it on the command line
      1. curl -XPOST "http://localhost:9200/test/person" -d @- <<CURL_DATA
        {
        "file" : {
        "_content" : "`cat base64.txt`"
        }
        }
        CURL_DATA
      2. note that any delimiter string can be used instead of CURL_DATA, and that the value of _content no longer needs the surrounding single quotes
  8. invalid target release: 1.7 (maven)
    1. export JAVA_HOME=<java home path>
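    2. a minimal sketch, assuming a JDK 1.7 installed under /usr/java/jdk1.7.0_79 (the path is a placeholder):
      1. export JAVA_HOME=/usr/java/jdk1.7.0_79
        export PATH=$JAVA_HOME/bin:$PATH
        mvn -version        # should now report a 1.7.x Java version
        mvn clean package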

Behemoth

  1. 2015.08.18
    1. prerequisites
      1. java 1.6
      2. apache maven 2.2.1
      3. internet connection
    2. compiling
      1. git clone https://github.com/DigitalPebble/behemoth.git
      2. cd behemoth
      3. mvn install
      4. mvn test
      5. mvn package
    3. generate a corpus
      1. hadoop jar tika/target/behemoth-tika-*-job.jar com.digitalpebble.behemoth.util.CorpusGenerator -i <file or dir> -o output1
      2. ./behemoth importer
    4. extract text
      1. hadoop jar tika/target/behemoth-tika-*-job.jar com.digitalpebble.behemoth.tika.TikaDriver -i output1 -o output2
      2. ./behemoth tika
    5. inspect the corpus
      1. hadoop jar tika/target/behemoth-tika-*-job.jar com.digitalpebble.behemoth.util.CorpusReader -i output2 -a -c -m -t
      2. hadoop fs -libjars tika/target/behemoth-tika-*-job.jar -text output2/part-00000
      3. hadoop fs -libjars tika/target/behemoth-tika-*-job.jar -text output2/part-00001
      4. ./behemoth reader
    6. extract content from seq files
      1. hadoop jar tika/target/behemoth-tika-*-job.jar com.digitalpebble.behemoth.util.ContentExtractor -i output2 -o output3
      2. hadoop jar tika/target/behemoth-tika-*-job.jar com.digitalpebble.behemoth.util.ContentExtractor -i output2/part-00000 -o output4
      3. hadoop jar tika/target/behemoth-tika-*-job.jar com.digitalpebble.behemoth.util.ContentExtractor -i output2/part-00001 -o output5
      4. ./behemoth exporter
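    7. a quick check that each stage produced readable sequence files (paths follow the examples above):
      1. hadoop fs -ls output1 output2 output3     # each should contain part-* files
        hadoop fs -libjars tika/target/behemoth-tika-*-job.jar -text output2/part-00000 | head   # peek at the extracted text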

ElasticSearch

  1. 2015.08.14
    1. mapper attachments type for elasticsearch
      1. each node
        1. bin/plugin install elasticsearch/elasticsearch-mapper-attachments/2.4.3
          1. note that 2.4.3 is for ES 1.4
        2. restart
      2. DELETE /test
      3. PUT /test
      4. PUT /test/person/_mapping
        {
        "person" : {
        "properties" : {
        "file" : {
        "type" : "attachment",
        "fields" : {
        "file" : {"term_vector" : "with_positions_offsets", "store": true},
        "title" : {"store" : "yes"},
        "date" : {"store" : "yes"},
        "author" : {"store" : "yes"},
        "keywords" : {"store" : "yes"},
        "content_type" : {"store" : "yes"},
        "content_length" : {"store" : "yes"},
        "language" : {"store" : "yes"}
        }
        }
        }
        }
        }
      5. curl -XPOST "http://localhost:9200/test/person" -d '
        {
        "file" : {
        "_content" : "... base64 encoded attachment ..."
        }
        }'
      6. for long base64
        1. curl -XPOST "http://localhost:9200/test/person" -d @- <<CURL_DATA
          {
          "file" : {
          "_content" : "`base64 my.pdf | perl -pe 's/\n/\\n/g'`"
          }
          }
          CURL_DATA
      7. GET /test/person/_search
        {
        "fields": [ "file.date", "file.title", "file.name", "file.author", "file.keywords", "file.language", "file.content_length", "file.content_type", "file" ],
        "query": {
        "match": {
        "file.content_type": "pdf"
        }
        }
        }
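      8. a quick check that the attachment was actually indexed (index and type names follow the examples above):
        1. curl -XPOST "http://localhost:9200/test/_refresh"
          curl -XGET "http://localhost:9200/test/person/_count?pretty"     # should report at least 1 document
          curl -XGET "http://localhost:9200/test/person/_search?pretty&fields=file.content_type,file.content_length"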
  2. 2015.03.03
    1. bashrc
      1. export INNERIP=`hostname -i`
        export ES_HEAP_SIZE=8g
        export ES_CLASSPATH=/etc/hadoop/conf:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop/lib/*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop/.//*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-hdfs/./:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-hdfs/.//*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-yarn/.//*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/lib/*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/.//*
    2. configuration
      1. cluster.name: test
      2. node.name: ${HOSTNAME}
      3. transport.host: ${INNERIP}
      4. discovery.zen.ping.multicast.enabled: false
      5. discovery.zen.ping.unicast.hosts: ["10.0.2.a", "10.0.2.b", "10.0.2.c"]
      6. indices.fielddata.cache.size: 40%
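    3. after sourcing the bashrc and restarting each node, a quick sanity check (a sketch):
      1. source ~/.bashrc
        bin/elasticsearch -d
        curl -s 'http://localhost:9200/_cluster/health?pretty'     # status should eventually be green
        curl -s 'http://localhost:9200/_nodes/_local?pretty' | grep -E '"name"|"transport_address"'   # confirm node.name and transport.host took effect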
  3. 2015.03.02
    1. snapshot and restore
      1. repository register
        1. PUT _snapshot/hdfs
          {
          "type": "hdfs",
          "settings": {
          "path": "/backup/elasticsearch"
          }
          }
      2. repository verification
        1. POST _snapshot/hdfs/_verify
      3. snapshot
        1. PUT _snapshot/hdfs/20150302
      4. monitoring snapshot/restore progress
        1. GET _snapshot/hdfs/20150302/_status
        2. GET _snapshot/hdfs/20150302
      5. snapshot information and status
        1. GET _snapshot/hdfs/20150302
        2. GET _snapshot/hdfs/_all 
        3. GET _snapshot/_status 
        4. GET _snapshot/hdfs/_status 
        5. GET _snapshot/hdfs/20150302/_status
      6. restore
        1. POST _snapshot/hdfs/20150302/_restore
      7. snapshot deletion / stopping currently running snapshot and restore operations
        1. DELETE _snapshot/hdfs/20150302
      8. repository deletion
        1. DELETE _snapshot/hdfs
      9. reference
        1. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html
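      10. the whole cycle as plain curl commands (repository 'hdfs' and snapshot '20150302' follow the examples above):
        1. curl -XPUT  'localhost:9200/_snapshot/hdfs' -d '{ "type": "hdfs", "settings": { "path": "/backup/elasticsearch" } }'
          curl -XPOST 'localhost:9200/_snapshot/hdfs/_verify'
          curl -XPUT  'localhost:9200/_snapshot/hdfs/20150302?wait_for_completion=false'
          curl -XGET  'localhost:9200/_snapshot/hdfs/20150302/_status'
          curl -XPOST 'localhost:9200/_snapshot/hdfs/20150302/_restore'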
    2. rolling update
      1. Disable shard reallocation
        1. curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.allocation.enable" : "none" } }'
      2. Shut down a single node within the cluster
        1. curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'
      3. Confirm that all shards are correctly reallocated to the remaining running nodes
      4. Download newest version
      5. Extract the zip or tarball to a new directory
      6. Copy the configuration files from the old Elasticsearch installation’s config directory to the new Elasticsearch installation’s config directory
      7. Move the data files from the old Elasticsearch installation’s data directory to the new installation’s data directory
      8. Install plugins
      9. Start the now upgraded node
      10. Confirm that it joins the cluster
      11. Re-enable shard reallocation
        1. curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.allocation.enable" : "all" } }'
      12. Observe that all shards are properly allocated on all nodes
      13. Repeat this process for all remaining nodes
      14. Reference
        1. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-upgrade.html#rolling-upgrades
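      15. the per-node part of the procedure as a hedged script (run on each node in turn; the install path is a placeholder):
        1. curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.allocation.enable" : "none" } }'
          curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'
          # ... install the new version, copy config/ and data/, reinstall plugins ...
          /opt/elasticsearch-new/bin/elasticsearch -d      # hypothetical new install path
          curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.allocation.enable" : "all" } }'
          curl -s localhost:9200/_cat/nodes                # confirm the node rejoined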
  4. 2015.02.13
    1. MySQL Slow Query Log Mapping
      1. PUT msql-2015
        {
          "mappings": {
            "log": {
              "properties": {
                "@timestamp": {
                  "type": "date",
                  "format": "dateOptionalTime"
                },
                "@version": {
                  "type": "string"
                },
                "host": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "ip": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "lock_time": {
                  "type": "double"
                },
                "message": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "query": {
                  "type": "string"
                },
                "query_time": {
                  "type": "double"
                },
                "rows_examined": {
                  "type": "double"
                },
                "rows_sent": {
                  "type": "double"
                },
                "type": {
                  "type": "string"
                },
                "user": {
                  "type": "string"
                }
              }
            }
          }
        }
    2. MySQL Slow Query Dump Mapping
      1. PUT msqld-2015
        {
          "mappings": {
            "dump": {
              "properties": {
                "@timestamp": {
                  "type": "date",
                  "format": "dateOptionalTime"
                },
                "@version": {
                  "type": "string"
                },
                "count": {
                  "type": "double"
                },
                "host": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "ip": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "lock": {
                  "type": "double"
                },
                "message": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "query": {
                  "type": "string"
                },
                "rows": {
                  "type": "double"
                },
                "time": {
                  "type": "double"
                },
                "type": {
                  "type": "string"
                },
                "user": {
                  "type": "string"
                }
              }
            }
          }
        }
  5. 2015.02.12
    1. MySQL Slow Query Log & Dump Mappings
      1. PUT msqld-2015
        {
          "mappings": {
            "log": {
              "properties": {
                "@timestamp": {
                  "type": "date",
                  "format": "dateOptionalTime"
                },
                "@version": {
                  "type": "string"
                },
                "host": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "ip": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "lock_time": {
                  "type": "double"
                },
                "message": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "query": {
                  "type": "string"
                },
                "query_time": {
                  "type": "double"
                },
                "rows_examined": {
                  "type": "double"
                },
                "rows_sent": {
                  "type": "double"
                },
                "type": {
                  "type": "string"
                },
                "user": {
                  "type": "string"
                }
              }
            },
            "dump": {
              "properties": {
                "@timestamp": {
                  "type": "date",
                  "format": "dateOptionalTime"
                },
                "@version": {
                  "type": "string"
                },
                "count": {
                  "type": "double"
                },
                "host": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "ip": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "lock": {
                  "type": "double"
                },
                "message": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "query": {
                  "type": "string"
                },
                "rows": {
                  "type": "double"
                },
                "time": {
                  "type": "double"
                },
                "type": {
                  "type": "string"
                },
                "user": {
                  "type": "string"
                }
              }
            }
          }
        }
  6. 2015.01.19
    1. restart script
      1. curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
        sleep 1s
        curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'
        sleep 1s
        bin/elasticsearch -d
        sleep 10s
        curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":false}}'
  7. ~ 2015.01.01
    1. Command
      1. curl 'http://localhost:9200/?pretty'
      2. curl -XPOST 'http://localhost:9200/_shutdown'
      3. curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'
      4. curl -XPOST 'http://localhost:9200/_cluster/nodes/nodeId1,nodeId2/_shutdown'
      5. curl -XPOST 'http://localhost:9200/_cluster/nodes/_master/_shutdown'
    2. Configuration
      1. config/elasticsearch.yml
        1. cluster.name
        2. node.name
        3. node.master
        4. node.data
        5. path.*
          1. path.conf: -Des.path.conf
          2. path.data
          3. path.work
          4. path.logs
        6. discovery.zen.ping.multicast.enabled: false
        7. discovery.zen.ping.unicast.hosts
        8. gateway.recover_after_nodes: n
        9. discovery.zen.minimum_master_nodes: (n/2) + 1
        10. action.disable_delete_all_indices: true
        11. action.auto_create_index: false
        12. action.destructive_requires_name: true
        13. index.mapper.dynamic: false
        14. script.disable_dynamic: true
        15. indices.fielddata.cache.size: 40%
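        16. the settings above collected into one elasticsearch.yml sketch (cluster name, node name, host list and node counts are placeholders for a 3-node cluster):
          cat >> config/elasticsearch.yml <<'EOF'
          cluster.name: my_cluster
          node.name: node-01
          discovery.zen.ping.multicast.enabled: false
          discovery.zen.ping.unicast.hosts: ["10.0.2.a", "10.0.2.b", "10.0.2.c"]
          gateway.recover_after_nodes: 2
          discovery.zen.minimum_master_nodes: 2
          action.disable_delete_all_indices: true
          action.auto_create_index: false
          action.destructive_requires_name: true
          index.mapper.dynamic: false
          script.disable_dynamic: true
          indices.fielddata.cache.size: 40%
          EOF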
      2. dynamic
        1. discovery.zen.minimum_master_nodes
          curl -XPUT localhost:9200/_cluster/settings -d '{
            "persistent" : {
              "discovery.zen.minimum_master_nodes" : (n/2) + 1
            }
          }'
        2. disable _all
          PUT /my_index/_mapping/my_type
          {
              "my_type": {
                  "_all": { "enabled": false }
              }
          }
        3. include_in_all
          PUT /my_index/my_type/_mapping
          {
              "my_type": {
                  "include_in_all": false,
                  "properties": {
                      "title": {
                          "type":           "string",
                          "include_in_all": true
                      },
                      ...
                  }
              }
          }
        4. _alias, _aliases
          PUT /my_index_v1 
          PUT /my_index_v1/_alias/my_index

          POST /_aliases
          {
              "actions": [
                  { "remove": { "index": "my_index_v1", "alias": "my_index" }},
                  { "add":    { "index": "my_index_v2", "alias": "my_index" }}
              ]
          }
        5. refresh_interval (bulk indexing)
          PUT /my_logs
          {
            "settings": {
              "refresh_interval": "30s" 
            }
          }
          POST /my_logs/_settings
          { "refresh_interval": -1 } 
          
          POST /my_logs/_settings
          { "refresh_interval": "1s" } 
        6. flush
          POST /blogs/_flush 
          
          POST /_flush?wait_for_ongoing
        7. optimize
          POST /logstash-old-index/_optimize?max_num_segments=1
        8. field length norm (for logging)
          PUT /my_index
          {
            "mappings": {
              "doc": {
                "properties": {
                  "text": {
                    "type": "string",
                    "norms": { "enabled": false } 
                  }
                }
              }
            }
          }
        9. tune cluster and index recovery settings (test the value)
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.node_initial_primary_recoveries":25}}'
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.node_concurrent_recoveries":5}}'
          ?
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.recovery.max_bytes_per_sec":"100mb"}}'
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.recovery.concurrent_streams":20}}'
      3. logging.yml
        1. use node.name instead of cluster.name
          file: ${path.logs}/${node.name}.log
      4. elasticsearch.in.sh
        1. disable HeapDumpOnOutOfMemoryError
          #JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError"
      5. ES_HEAP_SIZE: 50% (< 32g)
        1. export ES_HEAP_SIZE=31g
      6. no swap
        1. bootstrap.mlockall = true
        2. ulimit -l unlimited
      7. thread pools
        1. thread pool size
          1. search - 3 * # of processors (3 * 64 = 192)
          2. index - 2 * # of processors (2 * 64 = 128)
          3. bulk - 3 * # of processors (3 * 64 = 192)
        2. queues - set the size to -1 to prevent rejections from ES
      8. buffers
        1. increased indexing buffer size to 40%
      9. dynamic node.name
        1. ES script
          export ES_NODENAME=`hostname -s`
        2. elasticsearch.yml
          node.name: "${ES_NODENAME}"
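      10. how the dynamic node.name pieces fit together (a sketch; where the export lives depends on your startup script):
        export ES_NODENAME=`hostname -s`
        bin/elasticsearch -d                                            # elasticsearch.yml contains node.name: "${ES_NODENAME}"
        curl -s 'localhost:9200/_nodes/_local?pretty' | grep '"name"'   # should print the short hostname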
    3. Hardware
      1. CPU
        1. prefer more cores over a faster clock
      2. disk
        1. SSD
          1. noop / deadline scheduler
          2. better IOPS
          3. cheaper in terms of IOPS
          4. manufacturing tolerance can vary
        2. RAID
          1. not strictly necessary
          2. ES handles redundancy via replicas
    4. Monitoring
      1. curl 'localhost:9200/_cluster/health'
      2. curl 'localhost:9200/_nodes/process'
        1. max_file_descriptors: 30000?
      3. curl 'localhost:9200/_nodes/jvm'
        1. version
        2. mem.heap_max
      4. curl 'localhost:9200/_nodes/jvm/stats'
        1. heap_used
      5. curl 'localhost:9200/_nodes/indices/stats'
        1. fielddata
      6. curl 'localhost:9200/_nodes/indices/stats?fields=created_on'
        1. fields
      7. curl 'localhost:9200/_nodes/http/stats'
        1. http
      8. GET /_stats/fielddata?fields=*
      9. GET /_nodes/stats/indices/fielddata?fields=*
      10. GET /_nodes/stats/indices/fielddata?level=indices&fields=*
    5. Scenario
      1. adding nodes
        1. disable allocation to stop shard shuffling until ready
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
        2. increase speed of transfers
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.concurrent_streams":6,"indices.recovery.max_bytes_per_sec":"50mb"}}'
        3. start new nodes
        4. enable allocation
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":false}}'
      2. removing nodes
        1. exclude the nodes from the cluster; this tells ES to move shards off them
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.exclude._name":"node-05*,node-06*"}}'
        2. increase speed of transfers
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.concurrent_streams":6,"indices.recovery.max_bytes_per_sec":"50mb"}}'
        3. shutdown old nodes after all shards move off
          curl -XPOST 'localhost:9200/_cluster/nodes/node-05*,node-06*/_shutdown'
      3. upgrades / node restarts
        1. disable auto balancing if doing rolling restarts
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
        2. restart
        3. re-enable auto balancing
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":false}}'
      4. re / bulk indexing
        1. set replicas to 0
        2. increase after completion
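        3. for instance (my_index is a placeholder)
          curl -XPUT localhost:9200/my_index/_settings -d '{"index":{"number_of_replicas":0}}'
          # ... bulk indexing runs here ...
          curl -XPUT localhost:9200/my_index/_settings -d '{"index":{"number_of_replicas":1}}'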
      5. configure heap size
        1. heap size setting
        2. export ES_HEAP_SIZE=9g
        3. curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
        4. curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'
        5. bin/elasticsearch -d
        6. curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":false}}'

Zeppelin installation (HDP)

  1. echo $JAVA_HOME
  2. git clone https://github.com/apache/incubator-zeppelin.git
  3. cd incubator-zeppelin
  4. mvn clean install -DskipTests -Pspark-1.3 -Dspark.version=1.3.1 -Phadoop-2.6 -Pyarn
  5. hdp-select status hadoop-client | sed 's/hadoop-client - \(.*\)/\1/'
    1. 2.3.0.0-2557
  6. vim conf/zeppelin-env.sh
    1. export HADOOP_CONF_DIR=/etc/hadoop/conf 
    2. export ZEPPELIN_PORT=10008 
    3. export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.0.0-2557"
  7. cp /etc/hive/conf/hive-site.xml conf/
  8. su hdfs -l -c 'hdfs dfs -mkdir /user/zeppelin;hdfs dfs -chown zeppelin:hdfs /user/zeppelin'
  9. bin/zeppelin-daemon.sh start
  10. http://$host:10008

Maven

  1. 2015.08.12
    1. install maven using yum
      1. wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
      2. yum -y install apache-maven
    2. installation
      1. echo $JAVA_HOME
      2. cd /opt
      3. wget http://apache.tt.co.kr/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.zip
      4. wget http://www.apache.org/dist/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.zip.asc
      5. wget http://www.apache.org/dist/maven/KEYS
      6. gpg --import KEYS
      7. gpg --verify apache-maven-3.3.3-bin.zip.asc apache-maven-3.3.3-bin.zip
      8. unzip apache-maven-3.3.3-bin.zip
      9. export PATH=/opt/apache-maven-3.3.3/bin:$PATH
      10. mvn -v

Help 4 HDP - Old

  1. caused by: unrecognized locktype: native (solr)
    1. vim /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml
    2. search lockType
    3. set it to hdfs
    4. /opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd upconfig -confname myCollConfigs -confdir /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/conf
  2. caused by: direct buffer memory (solr)
    1. vim /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml
    2. search solr.hdfs.blockcache.direct.memory.allocation
    3. set it to false
    4. /opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd upconfig -confname myCollConfigs -confdir /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/conf
    5. restart solr
    6. or try 'caused by: java heap space (solr)' directly
  3. caused by: java heap space (solr)
    1. vim /opt/lucidworks-hdpsearch/solr/bin/solr.in.sh
    2. search solr_heap
    3. increase it
    4. restart solr
  4. error: not found: value StructType  (spark)
    1. import org.apache.spark.sql.types._
    2. note that you need to import 'import org.apache.spark.sql.types._' even if 'import org.apache.spark.sql._' is already imported
  5. no response from the namenode UI / port 50070 is bound to a private IP (hadoop)
    1. ambari web -> HDFS -> configs -> custom core-site -> add property
      1. key: dfs.namenode.http-bind-host
      2. value: 0.0.0.0
    2. save it and restart related services
    3. note that there are also 'dfs.namenode.rpc-bind-host', 'dfs.namenode.servicerpc-bind-host' and 'dfs.namenode.https-bind-host' properties, which can solve similar issues
  6. root is not allowed to impersonate <username> (hadoop)
    1. ambari web -> HDFS -> configs -> custom core-site -> add property
      1. key: hadoop.proxyuser.root.groups
      2. value: *
      3. key: hadoop.proxyuser.root.hosts
      4. value: *
    2. save it and restart related services
    3. note that you should replace 'root' with the name of the user who runs/submits the service/job
  7. option sql_select_limit=default (ambari)
    1. use latest jdbc driver
      1. cd /usr/share
      2. mkdir java
      3. cd java
      4. wget http://cdn.mysql.com/Downloads/Connector-J/mysql-connector-java-5.1.36.zip
      5. unzip mysql-connector-java-5.1.36.zip
      6. cp mysql-connector-java-5.1.36/mysql-connector-java-5.1.36-bin.jar .
      7. ln -s mysql-connector-java-5.1.36-bin.jar mysql-connector-java.jar
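    2. if ambari still cannot find the driver, registering it explicitly may help (a hedged extra step, not part of the original note)
      1. ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
      2. ambari-server restart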

HDP Search installation on HDP 2.3

  1. prerequisites
    1. CentOS v6.x / Red Hat Enterprise Linux (RHEL) v6.x / Oracle Linux v6.x
    2. JDK 1.7 or higher
    3. Hortonworks Data Platform (HDP) v2.3
  2. installation
    1. note that solr should be installed on each node that runs HDFS
    2. each node
      1. export JAVA_HOME=/usr/jdk64/jdk1.8.0_40/
      2. ls /etc/yum.repos.d/HDP-UTILS.repo
      3. yum -y install lucidworks-hdpsearch