Monday, August 31, 2015

ElasticSearch

  1. 2015.08.14
    1. mapper attachments type for elasticsearch
      1. each node
        1. bin/plugin install elasticsearch/elasticsearch-mapper-attachments/2.4.3
          1. note that 2.4.3 is for ES 1.4
        2. restart
      2. DELETE /test
      3. PUT /test
      4. PUT /test/person/_mapping
        {
          "person": {
            "properties": {
              "file": {
                "type": "attachment",
                "fields": {
                  "file": { "term_vector": "with_positions_offsets", "store": true },
                  "title": { "store": "yes" },
                  "date": { "store": "yes" },
                  "author": { "store": "yes" },
                  "keywords": { "store": "yes" },
                  "content_type": { "store": "yes" },
                  "content_length": { "store": "yes" },
                  "language": { "store": "yes" }
                }
              }
            }
          }
        }
      5. curl -XPOST "http://localhost:9200/test/person" -d '
        {
          "file": {
            "_content": "... base64 encoded attachment ..."
          }
        }'
      6. for long base64
        1. curl -XPOST "http://localhost:9200/test/person" -d @- <<CURL_DATA
          {
          "file" : {
          "_content" : "`base64 my.pdf | perl -pe 's/\n/\\n/g'`"
          }
          }
          CURL_DATA
      7. GET /test/person/_search
        {
          "fields": [ "file.date", "file.title", "file.name", "file.author", "file.keywords", "file.language", "file.content_length", "file.content_type", "file" ],
          "query": {
            "match": {
              "file.content_type": "pdf"
            }
          }
        }
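      8. highlighting on the extracted text (a minimal sketch that relies on the stored term vectors in the mapping above; "king queen" is a placeholder query)
        GET /test/person/_search
        {
          "fields": [ "file.title", "file.author" ],
          "query": {
            "match": {
              "file": "king queen"
            }
          },
          "highlight": {
            "fields": {
              "file": {}
            }
          }
        }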
  2. 2015.03.03
    1. bashrc
      1. export INNERIP=`hostname -i`
        export ES_HEAP_SIZE=8g
        export ES_CLASSPATH=/etc/hadoop/conf:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop/lib/*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop/.//*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-hdfs/./:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-hdfs/.//*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/libexec/../../hadoop-yarn/.//*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/lib/*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/.//*
    2. configuration
      1. cluster.name: test
      2. node.name: ${HOSTNAME}
      3. transport.host: ${INNERIP}
      4. discovery.zen.ping.multicast.enabled: false
      5. discovery.zen.ping.unicast.hosts: ["10.0.2.a", "10.0.2.b", "10.0.2.c"]
      6. indices.fielddata.cache.size: 40%
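    3. verification (a quick check; the node info API returns the settings actually in effect)
      1. curl 'http://localhost:9200/_nodes/_local/settings?pretty'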
  3. 2015.03.02
    1. snapshot and restore
      1. repository register
        1. PUT _snapshot/hdfs
          {
            "type": "hdfs",
            "settings": {
              "path": "/backup/elasticsearch"
            }
          }
      2. repository verification
        1. POST _snapshot/hdfs/_verify
      3. snapshot
        1. PUT _snapshot/hdfs/20150302
      4. monitoring snapshot/restore progress
        1. GET _snapshot/hdfs/20150302/_status
        2. GET _snapshot/hdfs/20150302
      5. snapshot information and status
        1. GET _snapshot/hdfs/20150302
        2. GET _snapshot/hdfs/_all 
        3. GET _snapshot/_status 
        4. GET _snapshot/hdfs/_status 
        5. GET _snapshot/hdfs/20150302/_status
      6. restore
        1. POST _snapshot/hdfs/20150302/_restore
      7. snapshot deletion / stopping currently running snapshot and restore operations
        1. DELETE _snapshot/hdfs/20150302
      8. repository deletion
        1. DELETE _snapshot/hdfs
      9. reference
        1. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html
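      10. restore options (a sketch based on the reference above; index names are placeholders, "indices" limits the restore and the rename settings avoid overwriting live indices)
        1. POST _snapshot/hdfs/20150302/_restore
          {
            "indices": "index_1,index_2",
            "ignore_unavailable": true,
            "include_global_state": false,
            "rename_pattern": "index_(.+)",
            "rename_replacement": "restored_index_$1"
          }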
    2. rolling update
      1. Disable shard reallocation
        1. curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.allocation.enable" : "none" } }'
      2. Shut down a single node within the cluster
        1. curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'
      3. Confirm that all shards are correctly reallocated to the remaining running nodes
      4. Download newest version
      5. Extract the zip or tarball to a new directory
      6. Copy the configuration files from the old Elasticsearch installation’s config directory to the new Elasticsearch installation’s config directory
      7. Move data files from the old Elasticsearch installation’s data directory
      8. Install plugins
      9. Start the now upgraded node
      10. Confirm that it joins the cluster
      11. Re-enable shard reallocation
        1. curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.allocation.enable" : "all" } }'
      12. Observe that all shards are properly allocated on all nodes
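        1. a quick check (sketch): the health API can block until the cluster reports green
          curl 'localhost:9200/_cluster/health?wait_for_status=green&timeout=10m'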
      13. Repeat this process for all remaining nodes
      14. Reference
        1. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-upgrade.html#rolling-upgrades
  4. 2015.02.13
    1. MySQL Slow Query Log Mapping
      1. PUT msql-2015
        {
          "mappings": {
            "log": {
              "properties": {
                "@timestamp": {
                  "type": "date",
                  "format": "dateOptionalTime"
                },
                "@version": {
                  "type": "string"
                },
                "host": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "ip": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "lock_time": {
                  "type": "double"
                },
                "message": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "query": {
                  "type": "string"
                },
                "query_time": {
                  "type": "double"
                },
                "rows_examined": {
                  "type": "double"
                },
                "rows_sent": {
                  "type": "double"
                },
                "type": {
                  "type": "string"
                },
                "user": {
                  "type": "string"
                }
              }
            }
          }
        }
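      2. example search against this mapping (a sketch; the 1-second threshold is arbitrary)
        GET msql-2015/log/_search
        {
          "query": {
            "range": { "query_time": { "gte": 1 } }
          },
          "sort": [
            { "query_time": { "order": "desc" } }
          ]
        }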
    2. MySQL Slow Query Dump Mapping
      1. PUT msqld-2015
        {
          "mappings": {
            "dump": {
              "properties": {
                "@timestamp": {
                  "type": "date",
                  "format": "dateOptionalTime"
                },
                "@version": {
                  "type": "string"
                },
                "count": {
                  "type": "double"
                },
                "host": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "ip": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "lock": {
                  "type": "double"
                },
                "message": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "query": {
                  "type": "string"
                },
                "rows": {
                  "type": "double"
                },
                "time": {
                  "type": "double"
                },
                "type": {
                  "type": "string"
                },
                "user": {
                  "type": "string"
                }
              }
            }
          }
        }
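      2. example aggregation against this mapping (a sketch; sums dump time per host for the top 10 hosts)
        GET msqld-2015/dump/_search
        {
          "size": 0,
          "aggs": {
            "by_host": {
              "terms": { "field": "host", "size": 10 },
              "aggs": {
                "total_time": { "sum": { "field": "time" } }
              }
            }
          }
        }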
  5. 2015.02.12
    1. MySQL Slow Query Log & Dump Mappings
      1. PUT msqld-2015
        {
          "mappings": {
            "log": {
              "properties": {
                "@timestamp": {
                  "type": "date",
                  "format": "dateOptionalTime"
                },
                "@version": {
                  "type": "string"
                },
                "host": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "ip": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "lock_time": {
                  "type": "double"
                },
                "message": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "query": {
                  "type": "string"
                },
                "query_time": {
                  "type": "double"
                },
                "rows_examined": {
                  "type": "double"
                },
                "rows_sent": {
                  "type": "double"
                },
                "type": {
                  "type": "string"
                },
                "user": {
                  "type": "string"
                }
              }
            },
            "dump": {
              "properties": {
                "@timestamp": {
                  "type": "date",
                  "format": "dateOptionalTime"
                },
                "@version": {
                  "type": "string"
                },
                "count": {
                  "type": "double"
                },
                "host": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "ip": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "lock": {
                  "type": "double"
                },
                "message": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "query": {
                  "type": "string"
                },
                "rows": {
                  "type": "double"
                },
                "time": {
                  "type": "double"
                },
                "type": {
                  "type": "string"
                },
                "user": {
                  "type": "string"
                }
              }
            }
          }
        }
  6. 2015.01.19
    1. restart script
      1. curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
        sleep 1s
        curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'
        sleep 1s
        bin/elasticsearch -d
        sleep 10s
        curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":false}}'
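      2. instead of the fixed sleep before re-enabling allocation, a wait loop can be used until the local node responds (a sketch)
        until curl -s -o /dev/null 'http://localhost:9200/'; do sleep 2; done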
  7. ~ 2015.01.01
    1. Command
      1. curl 'http://localhost:9200/?pretty'
      2. curl -XPOST 'http://localhost:9200/_shutdown'
      3. curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'
      4. curl -XPOST 'http://localhost:9200/_cluster/nodes/nodeId1,nodeId2/_shutdown'
      5. curl -XPOST 'http://localhost:9200/_cluster/nodes/_master/_shutdown'
    2. Configuration
      1. config/elasticsearch.yml
        1. cluster.name
        2. node.name
        3. node.master
        4. node.data
        5. path.*
          1. path.conf: -Des.path.conf
          2. path.data
          3. path.work
          4. path.logs
        6. discovery.zen.ping.multicast.enabled: false
        7. discovery.zen.ping.unicast.hosts
        8. gateway.recover_after_nodes: n
        9. discovery.zen.minimum_master_nodes: (n/2) + 1
        10. action.disable_delete_all_indices: true
        11. action.auto_create_index: false
        12. action.destructive_requires_name: true
        13. index.mapper.dynamic: false
        14. script.disable_dynamic: true
        15. indices.fielddata.cache.size: 40%
      2. dynamic
        1. discovery.zen.minimum_master_nodes
          curl -XPUT localhost:9200/_cluster/settings -d '{
            "persistent" : {
              "discovery.zen.minimum_master_nodes" : (n/2) + 1
            }
          }'
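          e.g. with 3 master-eligible nodes, (3/2) + 1 = 2:
          curl -XPUT localhost:9200/_cluster/settings -d '{
            "persistent" : {
              "discovery.zen.minimum_master_nodes" : 2
            }
          }'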
        2. disable _all
          PUT /my_index/_mapping/my_type
          {
              "my_type": {
                  "_all": { "enabled": false }
              }
          }
        3. include_in_all
          PUT /my_index/my_type/_mapping
          {
              "my_type": {
                  "include_in_all": false,
                  "properties": {
                      "title": {
                          "type":           "string",
                          "include_in_all": true
                      },
                      ...
                  }
              }
          }
        4. _alias, _aliases
          PUT /my_index_v1 
          PUT /my_index_v1/_alias/my_index

          POST /_aliases
          {
              "actions": [
                  { "remove": { "index": "my_index_v1", "alias": "my_index" }},
                  { "add":    { "index": "my_index_v2", "alias": "my_index" }}
              ]
          }
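          to confirm which index the alias currently points at (a quick check):
          GET /_alias/my_index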
        5. refresh_interval (bulk indexing)
          PUT /my_logs
          {
            "settings": {
              "refresh_interval": "30s" 
            }
          }
          POST /my_logs/_settings
          { "refresh_interval": -1 } 
          
          POST /my_logs/_settings
          { "refresh_interval": "1s" } 
        6. flush
          POST /blogs/_flush 
          
          POST /_flush?wait_for_ongoing
        7. optimize
          POST /logstash-old-index/_optimize?max_num_segments=1
        8. filed length norm (for logging)
          PUT /my_index
          {
            "mappings": {
              "doc": {
                "properties": {
                  "text": {
                    "type": "string",
                    "norms": { "enabled": false } 
                  }
                }
              }
            }
          }
        9. tune cluster and index recovery settings (test the value)
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.node_initial_primary_recoveries":25}}'
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.node_concurrent_recoveries":5}}'
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.max_bytes_per_sec":"100mb"}}'
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.concurrent_streams":20}}'
      3. logging.yml
        1. use node.name instead of cluster.name
          file: ${path.logs}/${node.name}.log
      4. elasticsearch.in.sh
        1. disable HeapDumpOnOutOfMemoryError
          #JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError"
      5. ES_HEAP_SIZE: 50% (< 32g)
        1. export ES_HEAP_SIZE=31g
      6. no swap
        1. bootstrap.mlockall: true
        2. ulimit -l unlimited
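        3. verification (a quick check; the node process info reports whether mlockall took effect)
          curl 'localhost:9200/_nodes/process?pretty'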
      7. thread pools
        1. thread pool size
          1. search - 3 * # of processors (3 * 64 = 192)
          2. index - 2 * # of processors (2 * 64 = 128)
          3. bulk - 3 * # of processors (3 * 64 = 192)
        2. queues - set the size to -1 to prevent rejections from ES
      8. buffers
        1. increased indexing buffer size to 40%
      9. dynamic node.name
        1. ES script
          export ES_NODENAME=`hostname -s`
        2. elasticsearch.yml
          node.name: "${ES_NODENAME}"
    3. Hardware
      1. CPU
        1. core
      2. disk
        1. SSD
          1. noop / deadline scheduler
          2. better IOPS
          3. cheaper in terms of cost per IOPS
          4. manufacturing tolerance can vary
        2. RAID
          1. do not necessarily need
          2. ES handles redundancy
    4. Monitoring
      1. curl 'localhost:9200/_cluster/health'
      2. curl 'localhost:9200/_nodes/process'
        1. max_file_descriptors: 30000?
      3. curl 'localhost:9200/_nodes/jvm'
        1. version
        2. mem.heap_max
      4. curl 'localhost:9200/_nodes/jvm/stats'
        1. heap_used
      5. curl 'localhost:9200/_nodes/indices/stats'
        1. fielddata
      6. curl 'localhost:9200/_nodes/indices/stats?fields=created_on'
        1. fields
      7. curl 'localhost:9200/_nodes/http/stats'
        1. http
      8. GET /_stats/fielddata?fields=*
      9. GET /_nodes/stats/indices/fielddata?fields=*
      10. GET /_nodes/stats/indices/fielddata?level=indices&fields=*
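      11. the _cat APIs give a terser, human-readable view of much of the above (a sketch)
        1. curl 'localhost:9200/_cat/health?v'
        2. curl 'localhost:9200/_cat/nodes?v'
        3. curl 'localhost:9200/_cat/indices?v'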
    5. Scenario
      1. adding nodes
        1. disable allocation to stop shard shuffling until ready
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
        2. increase speed of transfers
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.concurrent_streams":6,"indices.recovery.max_bytes_per_sec":"50mb"}}'
        3. start new nodes
        4. enable allocation
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":false}}'
      2. removing nodes
        1. exclude the nodes from the cluster, this will tell ES to move things off
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.exclude._name":"node-05*,node-06*"}}'
        2. increase speed of transfers
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.concurrent_streams":6,"indices.recovery.max_bytes_per_sec":"50mb"}}'
        3. shutdown old nodes after all shards move off
          curl -XPOST 'localhost:9200/_cluster/nodes/node-05*,node-06*/_shutdown'
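        4. progress of the move can be watched with cluster health before shutting down (a quick check; relocating_shards drops to 0 once the excluded nodes are empty)
          curl 'localhost:9200/_cluster/health?pretty'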
      3. upgrades / node restarts
        1. disable auto balancing if doing rolling restarts
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
        2. restart
        3. enable auto balancing
          curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":false}}'
      4. re / bulk indexing
        1. set replicas to 0
        2. increase after completion
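        3. e.g. (a sketch; "my_index" is a placeholder)
          curl -XPUT localhost:9200/my_index/_settings -d '{"index":{"number_of_replicas":0}}'
          curl -XPUT localhost:9200/my_index/_settings -d '{"index":{"number_of_replicas":1}}'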
      5. configure heap size
        1. heap size setting
        2. export ES_HEAP_SIZE=9g
        3. curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
        4. curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'
        5. bin/elasticsearch -d
        6. curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":false}}'
