Monday, August 31, 2015

ElasticSearch

  1. 2015.08.14
    1. mapper attachments type for elasticsearch
      1. each node
        1. bin/plugin install elasticsearch/elasticsearch-mapper-attachments/2.4.3
          1. note that 2.4.3 is for ES 1.4
        2. restart
      2. DELETE /test
      3. PUT /test
      4. PUT /test/person/_mapping
        {
          "person": {
            "properties": {
              "file": {
                "type": "attachment",
                "fields": {
                  "file": { "term_vector": "with_positions_offsets", "store": true },
                  "title": { "store": "yes" },
                  "date": { "store": "yes" },
                  "author": { "store": "yes" },
                  "keywords": { "store": "yes" },
                  "content_type": { "store": "yes" },
                  "content_length": { "store": "yes" },
                  "language": { "store": "yes" }
                }
              }
            }
          }
        }
      5. curl -XPOST "http://localhost:9200/test/person" -d '
        {
          "file": {
            "_content": "... base64 encoded attachment ..."
          }
        }'
      6. for long base64
        1. curl -XPOST "http://localhost:9200/test/person" -d @- <<CURL_DATA
          {
          "file" : {
          "_content" : "`base64 my.pdf | perl -pe 's/\n/\\n/g'`"
          }
          }
          CURL_DATA
      7. GET /test/person/_search
        {
          "fields": [ "file.date", "file.title", "file.name", "file.author", "file.keywords", "file.language", "file.content_length", "file.content_type", "file" ],
          "query": {
            "match": {
              "file.content_type": "pdf"
            }
          }
        }
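      8. highlighting on the extracted text (a minimal sketch that relies on the stored term vectors in the mapping above; "king queen" is a placeholder query)
        GET /test/person/_search
        {
          "fields": [ "file.title", "file.author" ],
          "query": {
            "match": {
              "file": "king queen"
            }
          },
          "highlight": {
            "fields": {
              "file": {}
            }
          }
        }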
  2. 2015.03.03
    1. bashrc
      1. export INNERIP=`hostname -i`
        export ES_HEAP_SIZE=8g
        export ES_CLASSPATH=/etc/hadoop/conf:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop/lib/*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop/.//*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-hdfs/./:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-hdfs/.//*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-yarn/.//*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/lib/*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/.//*
    2. configuration
      1. cluster.name: test
      2. node.name: ${HOSTNAME}
      3. transport.host: ${INNERIP}
      4. discovery.zen.ping.multicast.enabled: false
      5. discovery.zen.ping.unicast.hosts: ["10.0.2.a", "10.0.2.b", "10.0.2.c"]
      6. indices.fielddata.cache.size: 40%
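    3. verification (a quick check; the node info API returns the settings actually in effect)
      1. curl 'http://localhost:9200/_nodes/_local/settings?pretty'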
  3. 2015.03.02
    1. snapshot and restore
      1. repository register
        1. PUT _snapshot/hdfs
          {
            "type": "hdfs",
            "settings": {
              "path": "/backup/elasticsearch"
            }
          }
      2. repository verification
        1. POST _snapshot/hdfs/_verify
      3. snapshot
        1. PUT _snapshot/hdfs/20150302
      4. monitoring snapshot/restore progress
        1. GET _snapshot/hdfs/20150302/_status
        2. GET _snapshot/hdfs/20150302
      5. snapshot information and status
        1. GET _snapshot/hdfs/20150302
        2. GET _snapshot/hdfs/_all 
        3. GET _snapshot/_status 
        4. GET _snapshot/hdfs/_status 
        5. GET _snapshot/hdfs/20150302/_status
      6. restore
        1. POST _snapshot/hdfs/20150302/_restore
      7. snapshot deletion / stopping currently running snapshot and restore operations
        1. DELETE _snapshot/hdfs/20150302
      8. repository deletion
        1. DELETE _snapshot/hdfs
      9. reference
        1. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html
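      10. restore options (a sketch based on the reference above; index names are placeholders, "indices" limits the restore and the rename settings avoid overwriting live indices)
        1. POST _snapshot/hdfs/20150302/_restore
          {
            "indices": "index_1,index_2",
            "ignore_unavailable": true,
            "include_global_state": false,
            "rename_pattern": "index_(.+)",
            "rename_replacement": "restored_index_$1"
          }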
    2. rolling update
      1. Disable shard reallocation
        1. curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.allocation.enable" : "none" } }'
      2. Shut down a single node within the cluster
        1. curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'
      3. Confirm that all shards are correctly reallocated to the remaining running nodes
      4. Download newest version
      5. Extract the zip or tarball to a new directory
      6. Copy the configuration files from the old Elasticsearch installation’s config directory to the new Elasticsearch installation’s config directory
      7. Move data files from the old Elasticsearch installation’s data directory
      8. Install plugins
      9. Start the now upgraded node
      10. Confirm that it joins the cluster
      11. Re-enable shard reallocation
        1. curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.allocation.enable" : "all" } }'
      12. Observe that all shards are properly allocated on all nodes
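        1. a quick check (sketch): the health API can block until the cluster reports green
          curl 'localhost:9200/_cluster/health?wait_for_status=green&timeout=10m'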
      13. Repeat this process for all remaining nodes
      14. Reference
        1. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-upgrade.html#rolling-upgrades
  4. 2015.02.13
    1. MySQL Slow Query Log Mapping
      1. PUT msql-2015
        {
          "mappings": {
            "log": {
              "properties": {
                "@timestamp": {
                  "type": "date",
                  "format": "dateOptionalTime"
                },
                "@version": {
                  "type": "string"
                },
                "host": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "ip": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "lock_time": {
                  "type": "double"
                },
                "message": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "query": {
                  "type": "string"
                },
                "query_time": {
                  "type": "double"
                },
                "rows_examined": {
                  "type": "double"
                },
                "rows_sent": {
                  "type": "double"
                },
                "type": {
                  "type": "string"
                },
                "user": {
                  "type": "string"
                }
              }
            }
          }
        }
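      2. example search against this mapping (a sketch; the 1-second threshold is arbitrary)
        GET msql-2015/log/_search
        {
          "query": {
            "range": { "query_time": { "gte": 1 } }
          },
          "sort": [
            { "query_time": { "order": "desc" } }
          ]
        }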
    2. MySQL Slow Query Dump Mapping
      1. PUT msqld-2015
        {
          "mappings": {
            "dump": {
              "properties": {
                "@timestamp": {
                  "type": "date",
                  "format": "dateOptionalTime"
                },
                "@version": {
                  "type": "string"
                },
                "count": {
                  "type": "double"
                },
                "host": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "ip": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "lock": {
                  "type": "double"
                },
                "message": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "query": {
                  "type": "string"
                },
                "rows": {
                  "type": "double"
                },
                "time": {
                  "type": "double"
                },
                "type": {
                  "type": "string"
                },
                "user": {
                  "type": "string"
                }
              }
            }
          }
        }
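      2. example aggregation against this mapping (a sketch; sums dump time per host for the top 10 hosts)
        GET msqld-2015/dump/_search
        {
          "size": 0,
          "aggs": {
            "by_host": {
              "terms": { "field": "host", "size": 10 },
              "aggs": {
                "total_time": { "sum": { "field": "time" } }
              }
            }
          }
        }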
  5. 2015.02.12
    1. MySQL Slow Query Log & Dump Mappings
      1. PUT msqld-2015
        {
          "mappings": {
            "log": {
              "properties": {
                "@timestamp": {
                  "type": "date",
                  "format": "dateOptionalTime"
                },
                "@version": {
                  "type": "string"
                },
                "host": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "ip": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "lock_time": {
                  "type": "double"
                },
                "message": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "query": {
                  "type": "string"
                },
                "query_time": {
                  "type": "double"
                },
                "rows_examined": {
                  "type": "double"
                },
                "rows_sent": {
                  "type": "double"
                },
                "type": {
                  "type": "string"
                },
                "user": {
                  "type": "string"
                }
              }
            },
            "dump": {
              "properties": {
                "@timestamp": {
                  "type": "date",
                  "format": "dateOptionalTime"
                },
                "@version": {
                  "type": "string"
                },
                "count": {
                  "type": "double"
                },
                "host": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "ip": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "lock": {
                  "type": "double"
                },
                "message": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "query": {
                  "type": "string"
                },
                "rows": {
                  "type": "double"
                },
                "time": {
                  "type": "double"
                },
                "type": {
                  "type": "string"
                },
                "user": {
                  "type": "string"
                }
              }
            }
          }
        }
  6. 2015.01.19
    1. restart script
      1. curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
        sleep 1s
        curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'
        sleep 1s
        bin/elasticsearch -d
        sleep 10s
        curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":false}}'
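      2. instead of the fixed sleep before re-enabling allocation, a wait loop can be used until the local node responds (a sketch)
        until curl -s -o /dev/null 'http://localhost:9200/'; do sleep 2; done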
  7. ~ 2015.01.01
    1. Command
      1. curl 'http://localhost:9200/?pretty'
      2. curl -XPOST 'http://localhost:9200/_shutdown'
      3. curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'
      4. curl -XPOST 'http://localhost:9200/_cluster/nodes/nodeId1,nodeId2/_shutdown'
      5. curl -XPOST 'http://localhost:9200/_cluster/nodes/_master/_shutdown'
    2. Configuration
      1. config/elasticsearch.yml
        1. cluster.name
        2. node.name
        3. node.master
        4. node.data
        5. path.*
          1. path.conf: -Des.path.conf
          2. path.data
          3. path.work
          4. path.logs
        6. discovery.zen.ping.multicast.enabled: false
        7. discovery.zen.ping.unicast.hosts
        8. gateway.recover_after_nodes: n
        9. discovery.zen.minimum_master_nodes: (n/2) + 1
        10. action.disable_delete_all_indices: true
        11. action.auto_create_index: false
        12. action.destructive_requires_name: true
        13. index.mapper.dynamic: false
        14. script.disable_dynamic: true
        15. indices.fielddata.cache.size: 40%
      2. dynamic
        1. discovery.zen.minimum_master_nodes
          curl -XPUT localhost:9200/_cluster/settings -d '{
            "persistent" : {
              "discovery.zen.minimum_master_nodes" : (n/2) + 1
            }
          }'
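          e.g. with 3 master-eligible nodes, (3/2) + 1 = 2:
          curl -XPUT localhost:9200/_cluster/settings -d '{
            "persistent" : {
              "discovery.zen.minimum_master_nodes" : 2
            }
          }'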
        2. disable _all
          PUT /my_index/_mapping/my_type
          {
              "my_type": {
                  "_all": { "enabled": false }
              }
          }
        3. include_in_all
          PUT /my_index/my_type/_mapping
          {
              "my_type": {
                  "include_in_all": false,
                  "properties": {
                      "title": {
                          "type":           "string",
                          "include_in_all": true
                      },
                      ...
                  }
              }
          }
        4. _alias, _aliases
          PUT /my_index_v1 
          PUT /my_index_v1/_alias/my_index

          POST /_aliases
          {
              "actions": [
                  { "remove": { "index": "my_index_v1", "alias": "my_index" }},
                  { "add":    { "index": "my_index_v2", "alias": "my_index" }}
              ]
          }
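          to confirm which index the alias currently points at (a quick check):
          GET /_alias/my_index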
        5. refresh_interval (bulk indexing)
          PUT /my_logs
          {
            "settings": {
              "refresh_interval": "30s" 
            }
          }
          POST /my_logs/_settings
          { "refresh_interval": -1 } 
          
          POST /my_logs/_settings
          { "refresh_interval": "1s" } 
        6. flush
          POST /blogs/_flush 
          
          POST /_flush?wait_for_ongoing
        7. optimize
          POST /logstash-old-index/_optimize?max_num_segments=1
        8. filed length norm (for logging)
          PUT /my_index
          {
            "mappings": {
              "doc": {
                "properties": {
                  "text": {
                    "type": "string",
                    "norms": { "enabled": false } 
                  }
                }
              }
            }
          }
        9. tune cluster and index recovery settings (test the value)
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.node_initial_primary_recoveries":25}}'
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.node_concurrent_recoveries":5}}'
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.max_bytes_per_sec":"100mb"}}'
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.concurrent_streams":20}}'
      3. logging.yml
        1. use node.name instead of cluster.name
          file: ${path.logs}/${node.name}.log
      4. elasticsearch.in.sh
        1. disable HeapDumpOnOutOfMemoryError
          #JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError"
      5. ES_HEAP_SIZE: 50% (< 32g)
        1. export ES_HEAP_SIZE=31g
      6. no swap
        1. bootstrap.mlockall: true
        2. ulimit -l unlimited
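        3. verification (a quick check; the node process info reports whether mlockall took effect)
          curl 'localhost:9200/_nodes/process?pretty'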
      7. thread pools
        1. thread pool size
          1. search - 3 * # of processors (3 * 64 = 192)
          2. index - 2 * # of processors (2 * 64 = 128)
          3. bulk - 3 * # of processors (3 * 64 = 192)
        2. queues - set the size to -1 to prevent rejections from ES
      8. buffers
        1. increased indexing buffer size to 40%
      9. dynamic node.name
        1. ES script
          export ES_NODENAME=`hostname -s`
        2. elasticsearch.yml
          node.name: "${ES_NODENAME}"
    3. Hardware
      1. CPU
        1. core
      2. disk
        1. SSD
          1. noop / deadline scheduler
          2. better IOPS
          3. cheaper in terms of cost per IOPS
          4. manufacturing tolerance can vary
        2. RAID
          1. do not necessarily need
          2. ES handles redundancy
    4. Monitoring
      1. curl 'localhost:9200/_cluster/health'
      2. curl 'localhost:9200/_nodes/process'
        1. max_file_descriptors: 30000?
      3. curl 'localhost:9200/_nodes/jvm'
        1. version
        2. mem.heap_max
      4. curl 'localhost:9200/_nodes/jvm/stats'
        1. heap_used
      5. curl 'localhost:9200/_nodes/indices/stats'
        1. fielddata
      6. curl 'localhost:9200/_nodes/indices/stats?fields=created_on'
        1. fields
      7. curl 'localhost:9200/_nodes/http/stats'
        1. http
      8. GET /_stats/fielddata?fields=*
      9. GET /_nodes/stats/indices/fielddata?fields=*
      10. GET /_nodes/stats/indices/fielddata?level=indices&fields=*
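      11. the _cat APIs give a terser, human-readable view of much of the above (a sketch)
        1. curl 'localhost:9200/_cat/health?v'
        2. curl 'localhost:9200/_cat/nodes?v'
        3. curl 'localhost:9200/_cat/indices?v'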
    5. Scenario
      1. adding nodes
        1. disable allocation to stop shard shuffling until ready
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
        2. increase speed of transfers
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.concurrent_streams":6,"indices.recovery.max_bytes_per_sec":"50mb"}}'
        3. start new nodes
        4. enable allocation
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":false}}'
      2. removing nodes
        1. exclude the nodes from the cluster, this will tell ES to move things off
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.exclude._name":"node-05*,node-06*"}}'
        2. increase speed of transfers
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.concurrent_streams":6,"indices.recovery.max_bytes_per_sec":"50mb"}}'
        3. shutdown old nodes after all shards move off
          curl -XPOST 'localhost:9200/_cluster/nodes/node-05*,node-06*/_shutdown'
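        4. progress of the move can be watched with cluster health before shutting down (a quick check; relocating_shards drops to 0 once the excluded nodes are empty)
          curl 'localhost:9200/_cluster/health?pretty'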
      3. upgrades / node restarts
        1. disable auto balancing if doing rolling restarts
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
        2. restart
        3. enable auto balancing
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":false}}'
      4. re / bulk indexing
        1. set replicas to 0
        2. increase after completion
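        3. e.g. (a sketch; "my_index" is a placeholder)
          curl -XPUT localhost:9200/my_index/_settings -d '{"index":{"number_of_replicas":0}}'
          curl -XPUT localhost:9200/my_index/_settings -d '{"index":{"number_of_replicas":1}}'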
      5. configure heap size
        1. heap size setting
        2. export ES_HEAP_SIZE=9g
        3. curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
        4. curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'
        5. bin/elasticsearch -d
        6. curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":false}}'
