Wednesday, July 29, 2015

elasticsearch-repository-hdfs

  1. Introduction
    1. the elasticsearch-repository-hdfs plugin allows Elasticsearch 1.4 to use the HDFS file system as a repository for snapshot/restore.
  2. Installation
    1. version information
      1. CDH: 5.3.0
      2. elasticsearch: 1.4.2
      3. elasticsearch-repository-hdfs: 2.1.0.Beta3-light
        1. note that the stable version, 2.0.2, did not work at the time of writing.
        2. check https://groups.google.com/forum/#!msg/elasticsearch/CZy1oJpKHyc/1uvoMbI5r5sJ
    2. hadoop installed at the same node
      1. append the output of the "hadoop classpath" command to ES_CLASSPATH
      2. example
        1. hadoop classpath
        2. export ES_CLASSPATH=/etc/hadoop/conf:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop/lib/*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop/.//*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-hdfs/./:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-hdfs/.//*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-yarn/.//*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/lib/*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/.//*
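        3. the two steps can be combined in one line; a minimal shell sketch, assuming Elasticsearch is launched from the same shell session:
          1. export ES_CLASSPATH=$ES_CLASSPATH:$(hadoop classpath) # appends the full hadoop classpath to any existing value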
      3. install plugin at each node and restart it
        1. bin/plugin -i elasticsearch/elasticsearch-repository-hdfs/2.1.0.Beta3-light
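        2. to confirm the plugin loaded after the restart, query the nodes info API; a minimal sketch, assuming Elasticsearch listens on localhost:9200:
          1. curl 'localhost:9200/_nodes/plugins?pretty'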
    3. no hadoop installed at the same node
      1. install plugin at each node and restart it
        1. bin/plugin -i elasticsearch/elasticsearch-repository-hdfs/2.1.0.Beta3-hadoop2
    4. repository register
      1. example
        1. PUT _snapshot/hdfs
           {
             "type": "hdfs",
             "settings": {
               "path": "/backup/elasticsearch"
             }
           }
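        2. the same request with curl; a sketch assuming Elasticsearch listens on localhost:9200:
          1. curl -XPUT 'localhost:9200/_snapshot/hdfs' -d '{"type": "hdfs", "settings": {"path": "/backup/elasticsearch"}}'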
    5. verification
      1. POST _snapshot/hdfs/_verify
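      2. curl equivalent; again assuming localhost:9200:
        1. curl -XPOST 'localhost:9200/_snapshot/hdfs/_verify'
      3. a successful verification returns the nodes that were able to access the repository; after that, a snapshot can be taken, e.g. PUT _snapshot/hdfs/snapshot_1?wait_for_completion=true (the snapshot name snapshot_1 is arbitrary)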
  3. Configuration
    1. uri: "hdfs://<host>:<port>/" # optional - Hadoop file-system URI
    2. path: "some/path" # required - path within the file-system where data is stored/loaded
    3. load_defaults: "true" # optional - whether to load the default Hadoop configuration (default) or not
    4. conf_location: "extra-cfg.xml" # optional - Hadoop configuration XML to be loaded (use commas for multi values)
    5. conf.<key> : "<value>" # optional - 'inlined' key=value added to the Hadoop configuration 
    6. concurrent_streams: 5 # optional - the number of concurrent streams (defaults to 5) 
    7. compress: "false" # optional - whether to compress the data or not (default) 
    8. chunk_size: "10mb" # optional - chunk size (disabled by default)
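    9. example combining several of the settings above; a sketch in which the uri host and port are placeholders for your own namenode, not values from this cluster:
      1. PUT _snapshot/hdfs
         {
           "type": "hdfs",
           "settings": {
             "uri": "hdfs://namenode:8020/",
             "path": "backup/elasticsearch",
             "compress": "true",
             "chunk_size": "10mb"
           }
         }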
  4. Reference
    1. https://github.com/elasticsearch/elasticsearch-hadoop/tree/master/repository-hdfs
