- Introduction
- Elasticsearch is an open-source search engine built on top of Apache Lucene™, a full-text search engine library
- Elasticsearch is a real-time distributed search and analytics engine
- It allows you to explore your data at a speed and at a scale never before possible
- It is used for full-text search, structured search, analytics, and all three in combination
- Why ES
- reason 1
- Unfortunately, most databases are astonishingly inept at extracting actionable knowledge from your data
- Can they perform full-text search, handle synonyms and score documents by relevance?
- Can they generate analytics and aggregations from the same data?
- Most importantly, can they do this in real-time without big batch processing jobs?
- reason 2
- A distributed real-time document store where every field is indexed and searchable
- A distributed search engine with real-time analytics
- Capable of scaling to hundreds of servers and petabytes of structured and unstructured data
- Installation
- curl -L -O https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.3.2.zip
- unzip elasticsearch-1.3.2.zip
- cd elasticsearch-1.3.2
- ./bin/elasticsearch -d
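The install steps can be wrapped in a small script. Because `-d` daemonizes the process, the node may not answer for a few seconds, so this sketch polls the HTTP port before anything else is run. The function names and the 30-try / 2-second retry budget are illustrative assumptions, not part of Elasticsearch.

```shell
# Hypothetical helpers; names and retry budget are assumptions.

# Builds the 1.x download URL using the same pattern as the curl command above.
es_zip_url() {
  printf 'https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-%s.zip\n' "$1"
}

# Downloads, unpacks and starts one node as a daemon (-d).
install_and_start() {
  local version=${1:-1.3.2}
  curl -L -O "$(es_zip_url "$version")" &&
    unzip -q "elasticsearch-$version.zip" &&
    "./elasticsearch-$version/bin/elasticsearch" -d
}

# Polls the root endpoint until it answers with HTTP 2xx, or gives up (returns 1).
wait_for_es() {
  local url=${1:-http://localhost:9200} tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    curl -sf "$url/" >/dev/null && return 0
    tries=$(( tries - 1 ))
    sleep 2
  done
  return 1
}
```

Usage: `install_and_start 1.3.2 && wait_for_es && curl 'http://localhost:9200/?pretty'`.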
- Cluster
- 3 lower-resource master-eligible nodes in large clusters
- lightweight client nodes
- bare metal is more configurable
- bare metal can utilize SSDs
- Commands
- curl 'http://localhost:9200/?pretty'
- curl -XPOST 'http://localhost:9200/_shutdown'
- curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'
- curl -XPOST 'http://localhost:9200/_cluster/nodes/nodeId1,nodeId2/_shutdown'
- curl -XPOST 'http://localhost:9200/_cluster/nodes/_master/_shutdown'
- Configuration
- config/elasticsearch.yml
- cluster.name
- node.name
- node.master
- node.data
- path.*
- path.conf (or -Des.path.conf on the command line)
- path.data
- path.work
- path.logs
- discovery.zen.ping.multicast.enabled: false
- discovery.zen.ping.unicast.hosts
- gateway.recover_after_nodes: n
- discovery.zen.minimum_master_nodes: (n/2) + 1, where n is the number of master-eligible nodes
- action.disable_delete_all_indices: true
- action.auto_create_index: false
- action.destructive_requires_name: true
- index.mapper.dynamic: false
- script.disable_dynamic: true
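The discovery and recovery settings above depend on the node count, so they can be generated instead of typed by hand. A minimal sketch, assuming a small cluster where every listed host is both master-eligible and a data node (so the same n feeds `minimum_master_nodes` and `recover_after_nodes`); the helper names and hostnames are hypothetical.

```shell
# Hypothetical helpers for a cluster where every host is master-eligible and holds data.

# Quorum of master-eligible nodes: (n/2) + 1, with integer division rounding down.
min_master_nodes() {
  echo $(( $1 / 2 + 1 ))
}

# Usage: gen_discovery_config CLUSTER_NAME HOST...
# Prints an elasticsearch.yml fragment for unicast discovery.
gen_discovery_config() {
  local cluster list h n
  cluster=$1
  shift
  n=$#
  list=""
  for h in "$@"; do
    list="${list:+$list, }\"$h\""   # comma-separated, quoted host list
  done
  printf 'cluster.name: %s\n' "$cluster"
  printf 'discovery.zen.ping.multicast.enabled: false\n'
  printf 'discovery.zen.ping.unicast.hosts: [%s]\n' "$list"
  printf 'discovery.zen.minimum_master_nodes: %d\n' "$(min_master_nodes "$n")"
  printf 'gateway.recover_after_nodes: %d\n' "$n"
}
```

For three hosts, `gen_discovery_config my_cluster es1 es2 es3` yields `minimum_master_nodes: 2`, so a single isolated node can never elect itself master.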
- dynamic
- discovery.zen.minimum_master_nodes (the value must be a number: (n/2) + 1, e.g. 2 for 3 master-eligible nodes)
curl -XPUT localhost:9200/_cluster/settings -d '{ "persistent" : { "discovery.zen.minimum_master_nodes" : 2 } }'
- disable _all
PUT /my_index/_mapping/my_type { "my_type": { "_all": { "enabled": false } } }
- include_in_all
PUT /my_index/my_type/_mapping { "my_type": { "include_in_all": false, "properties": { "title": { "type": "string", "include_in_all": true }, ... } } }
- _alias, _aliases
PUT /my_index_v1
PUT /my_index_v1/_alias/my_index
POST /_aliases { "actions": [ { "remove": { "index": "my_index_v1", "alias": "my_index" }}, { "add": { "index": "my_index_v2", "alias": "my_index" }} ] }
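Because the actions in one `_aliases` request are applied atomically, the swap above is the usual way to re-index without downtime: build `my_index_v2`, then move the alias. A small sketch that generates the same request body for arbitrary names; `alias_swap_body`, `swap_alias`, and the `ES_URL` variable are hypothetical helpers, not Elasticsearch tooling.

```shell
# Hypothetical helpers; ES_URL defaults to a local node (assumption).
ES_URL=${ES_URL:-http://localhost:9200}

# Usage: alias_swap_body ALIAS OLD_INDEX NEW_INDEX
alias_swap_body() {
  local alias=$1 old=$2 new=$3
  printf '{ "actions": [ { "remove": { "index": "%s", "alias": "%s" }}, { "add": { "index": "%s", "alias": "%s" }} ] }\n' \
    "$old" "$alias" "$new" "$alias"
}

# Sends the swap as a single request so readers never see the alias missing.
swap_alias() {
  curl -s -XPOST "$ES_URL/_aliases" -d "$(alias_swap_body "$@")"
}
```

Usage: `swap_alias my_index my_index_v1 my_index_v2`.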
- refresh_interval (bulk indexing)
PUT /my_logs { "settings": { "refresh_interval": "30s" } }
POST /my_logs/_settings { "refresh_interval": -1 }
POST /my_logs/_settings { "refresh_interval": "1s" }
- flush
POST /blogs/_flush
POST /_flush?wait_for_ongoing
- optimize
POST /logstash-old-index/_optimize?max_num_segments=1
- field-length norms (can be disabled for logging)
PUT /my_index { "mappings": { "doc": { "properties": { "text": { "type": "string", "norms": { "enabled": false } } } } } }
- tune cluster and index recovery settings (test the values)
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.node_initial_primary_recoveries":25}}'
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.node_concurrent_recoveries":5}}'
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.max_bytes_per_sec":"100mb"}}'
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.concurrent_streams":20}}'
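The recovery calls differ only in key and value, so a helper that assembles the `transient` body from key/value pairs keeps tuning experiments short. A sketch; the helper names are hypothetical, and the rule that plain digits and true/false stay unquoted while everything else becomes a JSON string is an assumption that fits these particular settings.

```shell
# Hypothetical helpers; ES_URL defaults to a local node (assumption).
ES_URL=${ES_URL:-http://localhost:9200}

# Usage: transient_body KEY VALUE [KEY VALUE ...]
transient_body() {
  local body="" key val
  while [ "$#" -ge 2 ]; do
    key=$1
    val=$2
    shift 2
    # Booleans and plain digits stay bare; anything else is quoted as a string.
    case $val in
      true|false) ;;
      ''|*[!0-9]*) val="\"$val\"" ;;
    esac
    body="${body:+$body,}\"$key\":$val"
  done
  printf '{"transient":{%s}}\n' "$body"
}

put_transient() {
  curl -s -XPUT "$ES_URL/_cluster/settings" -d "$(transient_body "$@")"
}
```

Usage: `put_transient indices.recovery.concurrent_streams 20 indices.recovery.max_bytes_per_sec 100mb`. Transient settings are lost on a full cluster restart, which makes them a reasonable place to try values before making them persistent.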
- logging.yml
- use node.name instead of cluster.name
file: ${path.logs}/${node.name}.log
- elasticsearch.in.sh
- disable HeapDumpOnOutOfMemoryError
#JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError"
- ES_HEAP_SIZE: 50% of available RAM
- keep heaps < 32GB so the JVM can use compressed object pointers
- no swap
- bootstrap.mlockall: true
- ulimit -l unlimited
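The heap rules (half of RAM, under 32 GB) and the memlock requirement for `bootstrap.mlockall` can be checked before the node starts. A Linux-only sketch; the 31 GB cap is an assumed safety margin below the 32 GB compressed-pointer threshold, and the helper names are hypothetical.

```shell
# Hypothetical helpers (Linux only).

# Half of physical RAM in MB, capped at an assumed 31 GB margin below 32 GB.
heap_size_mb() {
  local total_mb=$1 cap_mb=31744 half
  half=$(( total_mb / 2 ))
  if [ "$half" -gt "$cap_mb" ]; then
    half=$cap_mb
  fi
  echo "${half}m"
}

# Reads MemTotal (reported in kB) from /proc/meminfo and converts to MB.
total_mem_mb() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

# Returns non-zero if the memlock limit would prevent mlockall from locking the heap.
check_memlock() {
  local limit
  limit=$(ulimit -l)
  if [ "$limit" != unlimited ]; then
    echo "warning: ulimit -l is $limit; bootstrap.mlockall may fail to lock the heap" >&2
    return 1
  fi
}
```

Usage: `check_memlock && export ES_HEAP_SIZE=$(heap_size_mb "$(total_mem_mb)")` before `./bin/elasticsearch -d`.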
- thread pools
- thread pool size
- search - 3 * # of processors (3 * 64 = 192)
- index - 2 * # of processors (2 * 64 = 128)
- bulk - 3 * # of processors (3 * 64 = 192)
- queues - set the queue size to -1 (unbounded) to prevent rejections from ES
- buffers
- increase the indexing buffer size to 40% (indices.memory.index_buffer_size)
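The thread pool multipliers and the indexing buffer translate into a few `elasticsearch.yml` lines. A sketch that prints them for a given processor count; it assumes the 1.x `threadpool.*` setting names, mirrors the multipliers listed above, and applying the -1 queue size to all three pools is an assumption. The helper name is hypothetical.

```shell
# Hypothetical helper; pass the processor count or let nproc detect it.
gen_threadpool_config() {
  local cpus=${1:-$(nproc)}
  printf 'threadpool.search.size: %d\n' $(( 3 * cpus ))
  printf 'threadpool.index.size: %d\n' $(( 2 * cpus ))
  printf 'threadpool.bulk.size: %d\n' $(( 3 * cpus ))
  # -1 means an unbounded queue: no rejections, but queued requests consume heap.
  printf 'threadpool.search.queue_size: -1\n'
  printf 'threadpool.index.queue_size: -1\n'
  printf 'threadpool.bulk.queue_size: -1\n'
  printf 'indices.memory.index_buffer_size: 40%%\n'
}
```

The trade-off with unbounded queues is that a cluster falling behind holds the backlog in heap instead of rejecting it, so heap usage is worth watching after this change.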
- dynamic node.name
- ES script
export ES_NODENAME=`hostname -s`
- elasticsearch.yml
node.name: "${ES_NODENAME}"
- Hardware
- CPU
- core
- disk
- SSD
- noop / deadline scheduler
- better IOPS
- cheaper per IOPS
- manufacturing tolerance can vary
- RAID
- not necessarily needed
- ES handles redundancy through replica shards
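Whether a disk follows the noop / deadline advice, and whether it is an SSD, can be read from sysfs on Linux. A sketch; the device name in the usage line is hypothetical, and the check also accepts `none` and `mq-deadline`, the equivalent choices on multi-queue kernels.

```shell
# Hypothetical helpers (Linux sysfs).

# Extracts the active scheduler, i.e. the bracketed entry in e.g. "noop [deadline] cfq".
active_scheduler() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Reports the scheduler of a block device; returns 1 if it is not a recommended one.
check_scheduler() {
  local dev=$1 line sched
  line=$(cat "/sys/block/$dev/queue/scheduler") || return 2
  sched=$(active_scheduler "$line")
  case $sched in
    noop|deadline|none|mq-deadline) echo "$dev: $sched (ok)" ;;
    *) echo "$dev: $sched (consider noop or deadline)"; return 1 ;;
  esac
}

# True when the kernel reports the device as non-rotational (typically an SSD).
is_ssd() {
  [ "$(cat "/sys/block/$1/queue/rotational" 2>/dev/null)" = 0 ]
}
```

Usage: `check_scheduler sda; is_ssd sda && echo "sda is an SSD"`.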
- Monitoring
- curl 'localhost:9200/_cluster/health'
- curl 'localhost:9200/_nodes/process'
- max_file_descriptors: 30000?
- curl 'localhost:9200/_nodes/jvm'
- version
- mem.heap_max
- curl 'localhost:9200/_nodes/jvm/stats'
- heap_used
- curl 'localhost:9200/_nodes/indices/stats'
- fielddata
- curl 'localhost:9200/_nodes/indices/stats?fields=created_on'
- fields
- curl 'localhost:9200/_nodes/http/stats'
- http
- GET /_stats/fielddata?fields=*
- GET /_nodes/stats/indices/fielddata?fields=*
- GET /_nodes/stats/indices/fielddata?level=indices&fields=*
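The health check above is easy to script for cron or alerting. A sketch that reads fields from the compact (non-`?pretty`) `_cluster/health` response with grep, assuming no `jq` is installed and that the fields of interest are top-level; the helper names are hypothetical. `wait_for_status` and `timeout` are standard parameters of the health API.

```shell
# Hypothetical helpers; ES_URL defaults to a local node (assumption).
ES_URL=${ES_URL:-http://localhost:9200}

# Pulls one top-level field out of compact, flat JSON such as _cluster/health.
json_field() {
  printf '%s\n' "$1" | grep -o "\"$2\":[^,}]*" | head -n 1 | cut -d: -f2- | tr -d '"'
}

# Prints status and unassigned shards; returns non-zero unless the cluster is green.
check_health() {
  local health status
  health=$(curl -s "$ES_URL/_cluster/health") || return 2
  status=$(json_field "$health" status)
  echo "status=$status unassigned_shards=$(json_field "$health" unassigned_shards)"
  [ "$status" = green ]
}

# Blocks until the cluster is green or the timeout (default 60s) expires.
wait_for_green() {
  curl -s "$ES_URL/_cluster/health?wait_for_status=green&timeout=${1:-60s}"
}
```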
- Scenario
- adding nodes
- disable allocation to stop shard shuffling until ready
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
- increase speed of transfers
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.concurrent_streams":6,"indices.recovery.max_bytes_per_sec":"50mb"}}'
- start new nodes
- enable allocation
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":false}}'
- removing nodes
- exclude the nodes from shard allocation; this tells ES to move their shards off
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.exclude._name":"node-05*,node-06*"}}'
- increase speed of transfers
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.concurrent_streams":6,"indices.recovery.max_bytes_per_sec":"50mb"}}'
- shut down the old nodes after all shards have moved off
curl -XPOST 'localhost:9200/_cluster/nodes/node-05*,node-06*/_shutdown'
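Shutting the excluded nodes down safely requires knowing when they are empty. One option, sketched here, is to poll the `_cat/shards` API and count rows that mention the excluded node names (relocating rows included, since they name the source node). The regex `node-0[56]` is the awk counterpart of the `node-05*,node-06*` pattern above; the helper names and 30-second poll interval are assumptions, and a node name that also appears in an index name would be over-counted.

```shell
# Hypothetical helpers; ES_URL defaults to a local node (assumption).
ES_URL=${ES_URL:-http://localhost:9200}

# Counts lines of _cat/shards output matching an extended regex.
count_shards_on() {
  printf '%s\n' "$1" | awk -v re="$2" '$0 ~ re { n++ } END { print n + 0 }'
}

# Polls until no shard copy is listed on a matching node.
wait_until_drained() {
  local pattern=$1 out left
  while :; do
    # An unreachable cluster must not be mistaken for "zero shards left".
    if ! out=$(curl -sf "$ES_URL/_cat/shards"); then
      echo "could not reach $ES_URL; retrying" >&2
      sleep 30
      continue
    fi
    left=$(count_shards_on "$out" "$pattern")
    [ "$left" -eq 0 ] && return 0
    echo "$left shard copies still on nodes matching $pattern" >&2
    sleep 30
  done
}
```

Usage: `wait_until_drained 'node-0[56]' && curl -XPOST 'localhost:9200/_cluster/nodes/node-05*,node-06*/_shutdown'`.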
- upgrades / node restarts
- disable auto balancing if doing rolling restarts
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
- restart
- re-enable auto balancing
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":false}}'
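The rolling-restart steps can be looped over the nodes one at a time. A sketch under stated assumptions: `ES_URL` must point at a node that is not being restarted (for example a client node), the `ssh ... service elasticsearch restart` line is a hypothetical placeholder for however the nodes are managed, and `expected` is the total node count once every node has rejoined. The health API's `wait_for_nodes` and `wait_for_status` parameters block between steps.

```shell
# Hypothetical helpers; ES_URL must reach a node that stays up during the restart.
ES_URL=${ES_URL:-http://localhost:9200}

# $1 = true to stop shard allocation, false to resume it.
allocation_body() {
  printf '{"transient":{"cluster.routing.allocation.disable_allocation":%s}}\n' "$1"
}

set_allocation_disabled() {
  curl -s -XPUT "$ES_URL/_cluster/settings" -d "$(allocation_body "$1")" >/dev/null
}

# Usage: rolling_restart EXPECTED_NODE_COUNT HOST...
rolling_restart() {
  local expected=$1 host
  shift
  for host in "$@"; do
    set_allocation_disabled true
    # Hypothetical restart command; adapt to your init system or deployment tool.
    ssh "$host" 'sudo service elasticsearch restart'
    if curl -s "$ES_URL/_cluster/health?wait_for_nodes=$expected&timeout=10m" | grep -q '"timed_out":true'; then
      echo "$host did not rejoin in time; allocation left disabled" >&2
      return 1
    fi
    set_allocation_disabled false
    if curl -s "$ES_URL/_cluster/health?wait_for_status=green&timeout=30m" | grep -q '"timed_out":true'; then
      echo "cluster not green after restarting $host; stopping" >&2
      return 1
    fi
  done
}
```

Stopping with allocation still disabled on a failed rejoin is deliberate: it avoids a full reshuffle while the problem node is investigated.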
- re-indexing / bulk indexing
- set replicas to 0
- increase replicas again after completion
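The bulk-indexing advice (replicas to 0, refresh disabled) can be wrapped around a load job so the settings are restored even when the job fails. A sketch; `with_bulk_settings`, the `ES_URL` variable, and the restore values (the given replica count and a 1s refresh interval) are assumptions to adjust per index.

```shell
# Hypothetical helpers; ES_URL defaults to a local node (assumption).
ES_URL=${ES_URL:-http://localhost:9200}

bulk_mode_body() {
  printf '{"index":{"number_of_replicas":0,"refresh_interval":"-1"}}\n'
}

# $1 = replica count to restore (default 1).
normal_mode_body() {
  printf '{"index":{"number_of_replicas":%d,"refresh_interval":"1s"}}\n' "${1:-1}"
}

put_index_settings() {
  curl -s -XPUT "$ES_URL/$1/_settings" -d "$2" >/dev/null
}

# Usage: with_bulk_settings INDEX REPLICAS COMMAND [ARGS...]
# Restores normal settings whatever the command's exit status, then returns that status.
with_bulk_settings() {
  local index=$1 replicas=$2 rc
  shift 2
  put_index_settings "$index" "$(bulk_mode_body)"
  "$@"
  rc=$?
  put_index_settings "$index" "$(normal_mode_body "$replicas")"
  return "$rc"
}
```

Usage: `with_bulk_settings my_logs 1 ./load_logs.sh`, where `load_logs.sh` stands in for your own loader.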
- Restoration
- snapshot
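A snapshot / restore round trip with the 1.x snapshot API might look like the sketch below. The repository name, the `/mount/backups/my_backup` location (which must be reachable from every node, e.g. a shared filesystem), and the helper names are assumptions. An open index cannot be restored over, so the sketch closes it first and restricts the restore to that one index.

```shell
# Hypothetical helpers; ES_URL and REPO defaults are assumptions.
ES_URL=${ES_URL:-http://localhost:9200}
REPO=${REPO:-my_backup}

# Body for a shared-filesystem ("fs") repository.
repo_body() {
  printf '{"type":"fs","settings":{"location":"%s"}}\n' "$1"
}

register_repo() {
  curl -s -XPUT "$ES_URL/_snapshot/$REPO" -d "$(repo_body "$1")"
}

# Defaults to a date-based, lowercase snapshot name.
take_snapshot() {
  local name=${1:-snapshot-$(date +%Y.%m.%d)}
  curl -s -XPUT "$ES_URL/_snapshot/$REPO/$name?wait_for_completion=true"
}

# Usage: restore_snapshot SNAPSHOT INDEX
restore_snapshot() {
  local name=$1 index=$2
  curl -s -XPOST "$ES_URL/$index/_close"
  curl -s -XPOST "$ES_URL/_snapshot/$REPO/$name/_restore" -d "{\"indices\":\"$index\"}"
}
```

Usage: `register_repo /mount/backups/my_backup && take_snapshot`.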
- Reference
- Elasticsearch - The Definitive Guide
- http://www.elasticsearch.org/webinars/elasticsearch-pre-flight-checklist/
- http://www.elasticsearch.org/webinars/elk-stack-devops-environment/
- http://www.elasticsearch.org/videos/moloch-elasticsearch-powering-network-forensics-aol/
- http://www.elasticsearch.org/videos/elastic-searching-big-data/
Wednesday, December 31, 2014
ElasticSearch