- Introduction
  - Elasticsearch is an open-source search engine built on top of Apache Lucene™, a full-text search engine library
  - Elasticsearch is a real-time, distributed search and analytics engine
  - It lets you explore your data at a speed and at a scale never before possible
  - It is used for full-text search, structured search, analytics, and all three in combination
 
- Why ES
  - Reason 1: unfortunately, most databases are astonishingly inept at extracting actionable knowledge from your data
    - Can they perform full-text search, handle synonyms, and score documents by relevance?
    - Can they generate analytics and aggregations from the same data?
    - Most importantly, can they do this in real time, without big batch-processing jobs?

  - Reason 2: Elasticsearch is
    - a distributed, real-time document store where every field is indexed and searchable
    - a distributed search engine with real-time analytics
    - capable of scaling to hundreds of servers and petabytes of structured and unstructured data
 
 
- Installation
  - curl -L -O https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.3.2.zip
  - unzip elasticsearch-1.3.2.zip
  - cd elasticsearch-1.3.2
  - ./bin/elasticsearch -d   (-d starts Elasticsearch as a background daemon)
 
- Cluster
  - In large clusters, use 3 master-eligible nodes; these can be lower-resource machines
  - Use lightweight client nodes
  - Bare metal (rather than virtual machines) is more configurable
  - Bare metal can use SSDs
 
- Commands
  - curl 'http://localhost:9200/?pretty'   (node and version information)
  - curl -XPOST 'http://localhost:9200/_shutdown'   (shut down all nodes in the cluster)
  - curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'   (shut down the local node)
  - curl -XPOST 'http://localhost:9200/_cluster/nodes/nodeId1,nodeId2/_shutdown'   (shut down specific nodes)
  - curl -XPOST 'http://localhost:9200/_cluster/nodes/_master/_shutdown'   (shut down the master node)
 
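- Configuration
  - config/elasticsearch.yml
    - cluster.name
    - node.name
    - node.master
    - node.data
    - path.*
      - path.conf (can also be set on the command line with -Des.path.conf)
      - path.data
      - path.work
      - path.logs

    - discovery.zen.ping.multicast.enabled: false (use unicast discovery instead)
    - discovery.zen.ping.unicast.hosts (the hosts to contact for discovery)
    - gateway.recover_after_nodes: n (wait until n nodes have joined before starting recovery)
    - discovery.zen.minimum_master_nodes: (n/2) + 1, where n is the number of master-eligible nodes (integer division)
    - action.disable_delete_all_indices: true (refuse to delete every index in one request)
    - action.auto_create_index: false (do not create a missing index automatically on the first write)
    - action.destructive_requires_name: true (destructive actions must name indices explicitly)
    - index.mapper.dynamic: false (do not create mappings for unknown types automatically)
    - script.disable_dynamic: true (disable dynamic scripting)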
 
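- Dynamic settings
  - discovery.zen.minimum_master_nodes can also be changed at runtime; the body must be valid JSON, so substitute the computed value for (n/2) + 1 (e.g. 2 for three master-eligible nodes)
    curl -XPUT localhost:9200/_cluster/settings -d '{ "persistent" : { "discovery.zen.minimum_master_nodes" : 2 } }'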
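The request body must be valid JSON, so `(n/2) + 1` has to be computed before sending. Here is a minimal Python sketch. It assumes n counts master-eligible nodes only. `minimum_master_nodes` and `quorum_settings_body` are hypothetical helper names, not part of any Elasticsearch client:

```python
import json


def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: (n / 2) + 1 with integer division."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def quorum_settings_body(master_eligible: int) -> str:
    """JSON body for PUT /_cluster/settings that persists the quorum."""
    quorum = minimum_master_nodes(master_eligible)
    return json.dumps({"persistent": {"discovery.zen.minimum_master_nodes": quorum}})


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, "master-eligible nodes -> quorum", minimum_master_nodes(n))
    # Send with: curl -XPUT localhost:9200/_cluster/settings -d 'BODY'
    print(quorum_settings_body(3))
```

With two master-eligible nodes the quorum is 2, so losing either node blocks master election. This is one reason the Cluster notes suggest three master-eligible nodes.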
- Disable the _all field
  PUT /my_index/_mapping/my_type { "my_type": { "_all": { "enabled": false } } }
- include_in_all (leave fields out of _all by default, then opt individual fields back in)
  PUT /my_index/my_type/_mapping { "my_type": { "include_in_all": false, "properties": { "title": { "type": "string", "include_in_all": true }, ... } } }
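- _alias, _aliases (switch indices without clients noticing)
  - create the index and point an alias at it
    PUT /my_index_v1
    PUT /my_index_v1/_alias/my_index
  - move the alias to the new index in a single request
    POST /_aliases { "actions": [ { "remove": { "index": "my_index_v1", "alias": "my_index" }}, { "add": { "index": "my_index_v2", "alias": "my_index" }} ] }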
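The remove and add actions go in one `_aliases` request. Elasticsearch applies them atomically, so clients using `my_index` never see a moment without an index behind it. Here is a minimal Python sketch that builds that body for any alias. `alias_swap_actions` is a hypothetical helper name:

```python
import json


def alias_swap_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases that moves `alias` from old_index to new_index.

    Both actions travel in one request so they are applied together.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    print(json.dumps(alias_swap_actions("my_index", "my_index_v1", "my_index_v2")))
```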
 
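- refresh_interval (bulk indexing)
  - use a longer interval when creating the index
    PUT /my_logs { "settings": { "refresh_interval": "30s" } }
  - disable refresh during a bulk load, then restore it afterwards
    POST /my_logs/_settings { "refresh_interval": -1 }
    POST /my_logs/_settings { "refresh_interval": "1s" }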
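The disable-then-restore pattern is easy to get wrong: if a bulk load fails halfway, refresh can stay off. Here is a minimal Python sketch that uses a context manager so the restore always runs. `send(method, path, body)` is a hypothetical stand-in for your HTTP client, not an Elasticsearch API. `"1s"` is the restore value from the notes above:

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator

# Hypothetical transport: send(method, path, body) performs one HTTP request.
Send = Callable[[str, str, Any], None]


@contextmanager
def refresh_disabled(send: Send, index: str, restore_to: str = "1s") -> Iterator[None]:
    """Turn refresh off for `index` while the block runs, then turn it back on."""
    send("POST", f"/{index}/_settings", {"refresh_interval": -1})
    try:
        yield
    finally:
        # Runs even if the bulk load raised, so refresh is never left disabled.
        send("POST", f"/{index}/_settings", {"refresh_interval": restore_to})


if __name__ == "__main__":
    calls = []

    def record(method, path, body):
        calls.append((method, path, body))

    with refresh_disabled(record, "my_logs"):
        record("POST", "/my_logs/_bulk", None)  # placeholder for the real bulk requests
    for call in calls:
        print(call)
```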
 
- flush
  POST /blogs/_flush   (flush one index)
  POST /_flush?wait_for_ongoing   (flush all indices and wait for flushes in progress to finish)
 
- optimize (merge an index that is no longer written to, such as an old log index, down to one segment)
  POST /logstash-old-index/_optimize?max_num_segments=1
 
- field-length norms (can be disabled for logging, where length-based relevance scoring is rarely needed)
  PUT /my_index { "mappings": { "doc": { "properties": { "text": { "type": "string", "norms": { "enabled": false } } } } } }
 
- Tune cluster and index recovery settings (test the values on your own cluster)
  curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.node_initial_primary_recoveries":25}}'
  curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.node_concurrent_recoveries":5}}'
  curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.max_bytes_per_sec":"100mb"}}'
  curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.concurrent_streams":20}}'
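Each of these is an ordinary transient cluster setting, so they can also be sent together in one request. Here is a minimal Python sketch that bundles them into a single compact body. It assumes the `indices.recovery.*` names that the node scenarios later in these notes use. The values are the ones above and still need testing on your own cluster:

```python
import json

# Values from the notes above; starting points to test, not recommended defaults.
RECOVERY_TUNING = {
    "cluster.routing.allocation.node_initial_primary_recoveries": 25,
    "cluster.routing.allocation.node_concurrent_recoveries": 5,
    "indices.recovery.max_bytes_per_sec": "100mb",
    "indices.recovery.concurrent_streams": 20,
}


def transient_settings_body(settings: dict) -> str:
    """Wrap several cluster settings in one compact _cluster/settings body."""
    return json.dumps({"transient": settings}, separators=(",", ":"))


if __name__ == "__main__":
    # Send with: curl -XPUT localhost:9200/_cluster/settings -d 'BODY'
    print(transient_settings_body(RECOVERY_TUNING))
```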
 
- logging.yml: name the log file after node.name instead of cluster.name
  file: ${path.logs}/${node.name}.log
 
- elasticsearch.in.sh: disable HeapDumpOnOutOfMemoryError by commenting out this line
  #JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError"
 
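- ES_HEAP_SIZE: 50% of available RAM
- Keep the heap below 32GB (above that, the JVM can no longer use compressed object pointers)
- No swapping
  - bootstrap.mlockall: true (in elasticsearch.yml)
  - ulimit -l unlimited (so the process is allowed to lock memory)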
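The two heap rules combine into one calculation: take half the RAM, then cap it below 32GB. Here is a minimal Python sketch. The 31GB cap is an assumed safety margin, since the notes only say "< 32GB". The helper names are hypothetical:

```python
def recommended_heap_gb(ram_gb: float, cap_gb: float = 31.0) -> float:
    """Half of physical RAM, capped below the 32GB compressed-oops threshold."""
    # cap_gb = 31 is an assumed margin; the only rule in the notes is "< 32GB".
    return min(ram_gb / 2, cap_gb)


def es_heap_size(ram_gb: float) -> str:
    """Value for the ES_HEAP_SIZE environment variable, rounded down to whole GB."""
    return f"{int(recommended_heap_gb(ram_gb))}g"


if __name__ == "__main__":
    for ram in (8, 16, 64, 128):
        print(f"{ram}GB RAM -> ES_HEAP_SIZE={es_heap_size(ram)}")
```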
 
- Thread pools
  - thread pool size (examples for 64 processors)
    - search: 3 * # of processors (3 * 64 = 192)
    - index: 2 * # of processors (2 * 64 = 128)
    - bulk: 3 * # of processors (3 * 64 = 192)
  - queues: set queue_size to -1 (unbounded) so ES does not reject requests
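The pool sizes scale with the processor count, so they can be generated rather than hard-coded. Here is a minimal Python sketch that reproduces the multipliers above and renders `elasticsearch.yml` lines. The `threadpool.POOL.size` and `threadpool.POOL.queue_size` keys are assumed from the 1.x configuration format; check them against your version:

```python
# Multipliers from the notes above (threads = multiplier * processors).
MULTIPLIERS = {"search": 3, "index": 2, "bulk": 3}


def thread_pool_sizes(processors: int) -> dict:
    """Pool name -> thread count for the given number of processors."""
    return {pool: factor * processors for pool, factor in MULTIPLIERS.items()}


def threadpool_yaml(processors: int, queue_size: int = -1) -> str:
    """Render elasticsearch.yml lines; queue_size -1 means an unbounded queue."""
    lines = []
    for pool, size in thread_pool_sizes(processors).items():
        lines.append(f"threadpool.{pool}.size: {size}")
        lines.append(f"threadpool.{pool}.queue_size: {queue_size}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(threadpool_yaml(64))
```

An unbounded queue trades rejections for memory. Requests that cannot be served yet accumulate on the heap instead of being refused.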
 
- Buffers: increase the indexing buffer (indices.memory.index_buffer_size) to 40%
 
- Dynamic node.name
  - in the ES startup script, export the short host name
    export ES_NODENAME=`hostname -s`
  - in elasticsearch.yml, reference the environment variable
    node.name: "${ES_NODENAME}"
 
 
- Hardware
  - CPU: core count
  - Disk
    - SSD
      - use the noop or deadline I/O scheduler
      - better IOPS
      - cheaper per IOPS
      - manufacturing tolerances can vary between drives
    - RAID: not necessarily needed, since ES handles redundancy with replica shards
 
 
 
- Monitoring
  - curl 'localhost:9200/_cluster/health'
  - curl 'localhost:9200/_nodes/process'
    - max_file_descriptors: 30000?
  - curl 'localhost:9200/_nodes/jvm'
    - version
    - mem.heap_max
  - curl 'localhost:9200/_nodes/jvm/stats'
    - heap_used
  - curl 'localhost:9200/_nodes/indices/stats'
    - fielddata
  - curl 'localhost:9200/_nodes/indices/stats?fields=created_on'
    - fielddata for specific fields
  - curl 'localhost:9200/_nodes/http/stats'
    - http
  - fielddata usage per index, per node, and per index on each node
    - GET /_stats/fielddata?fields=*
    - GET /_nodes/stats/indices/fielddata?fields=*
    - GET /_nodes/stats/indices/fielddata?level=indices&fields=*
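The raw JSON from these endpoints is easier to act on once it is reduced to a few numbers. Here is a minimal Python sketch that works on already-parsed responses. The field names (`status`, `name`, `process.max_file_descriptors`, `jvm.mem.heap_used_in_bytes`, `jvm.mem.heap_max_in_bytes`) are assumed from 1.x-era responses. The 30000 threshold is the tentative figure from the notes above, and the sample data is made up:

```python
def cluster_is_green(health: dict) -> bool:
    """True when a _cluster/health response reports status green."""
    return health.get("status") == "green"


def low_fd_nodes(process_info: dict, minimum: int = 30000) -> list:
    """Names of nodes whose max_file_descriptors is below `minimum`."""
    return sorted(
        node["name"]
        for node in process_info["nodes"].values()
        if node["process"]["max_file_descriptors"] < minimum
    )


def heap_used_percent(jvm_stats: dict) -> dict:
    """Node name -> percentage of the maximum heap currently in use."""
    result = {}
    for node in jvm_stats["nodes"].values():
        mem = node["jvm"]["mem"]
        result[node["name"]] = 100.0 * mem["heap_used_in_bytes"] / mem["heap_max_in_bytes"]
    return result


if __name__ == "__main__":
    # Made-up sample responses, trimmed to the fields used above.
    health = {"status": "yellow"}
    info = {"nodes": {"abc": {"name": "node-01", "process": {"max_file_descriptors": 4096}}}}
    stats = {"nodes": {"abc": {"name": "node-01",
                               "jvm": {"mem": {"heap_used_in_bytes": 6, "heap_max_in_bytes": 8}}}}}
    print(cluster_is_green(health), low_fd_nodes(info), heap_used_percent(stats))
```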
 
- Scenarios
  - Adding nodes
    - disable allocation so shards do not shuffle around until the new nodes are ready
      curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
    - increase the speed of shard transfers
      curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.concurrent_streams":6,"indices.recovery.max_bytes_per_sec":"50mb"}}'
    - start the new nodes
    - re-enable allocation
      curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":false}}'
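The order of these steps matters, so it can help to generate them rather than retype them. Here is a minimal Python sketch that emits the curl commands above in order, with the manual node start-up as shell comments. `add_nodes_script` and `settings_curl` are hypothetical helpers, `localhost:9200` is assumed as in the notes, and the node names in the usage line are made up:

```python
import json


def settings_curl(transient: dict, host: str = "localhost:9200") -> str:
    """Render one transient cluster-settings change as a curl command line."""
    body = json.dumps({"transient": transient}, separators=(",", ":"))
    return f"curl -XPUT {host}/_cluster/settings -d '{body}'"


def add_nodes_script(new_nodes, streams: int = 6, max_bytes: str = "50mb") -> list:
    """Ordered shell lines: freeze allocation, speed up recovery, add nodes, unfreeze."""
    lines = [
        settings_curl({"cluster.routing.allocation.disable_allocation": True}),
        settings_curl({
            "indices.recovery.concurrent_streams": streams,
            "indices.recovery.max_bytes_per_sec": max_bytes,
        }),
    ]
    # Starting a node is a manual step, so it is emitted as a shell comment.
    lines += [f"# start {node} and wait until it has joined the cluster" for node in new_nodes]
    lines.append(settings_curl({"cluster.routing.allocation.disable_allocation": False}))
    return lines


if __name__ == "__main__":
    print("\n".join(add_nodes_script(["node-07", "node-08"])))
```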
 
  - Removing nodes
    - exclude the nodes from allocation; this tells ES to move their shards to other nodes
      curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.exclude._name":"node-05*,node-06*"}}'
    - increase the speed of shard transfers
      curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"indices.recovery.concurrent_streams":6,"indices.recovery.max_bytes_per_sec":"50mb"}}'
    - shut down the old nodes once all of their shards have moved off
      curl -XPOST 'localhost:9200/_cluster/nodes/node-05*,node-06*/_shutdown'
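The last step says to wait until all shards have moved off. That condition is easy to check mechanically. Here is a minimal Python sketch. It assumes you have already collected (index, shard, node) tuples, for example from the index, shard and node columns of `GET /_cat/shards`. Python's `fnmatchcase` stands in for Elasticsearch's `*` wildcards, but it is not identical to them, since it also interprets `?` and `[...]`:

```python
from fnmatch import fnmatchcase


def exclude_body(patterns) -> dict:
    """Transient setting that tells ES to move shards off matching nodes."""
    return {"transient": {"cluster.routing.allocation.exclude._name": ",".join(patterns)}}


def shards_still_on(shards, patterns) -> list:
    """Shards whose node matches an exclusion pattern (node None means unassigned)."""
    return [
        shard for shard in shards
        if shard[2] is not None and any(fnmatchcase(shard[2], p) for p in patterns)
    ]


def safe_to_shut_down(shards, patterns) -> bool:
    """True once no shard remains on any excluded node."""
    return not shards_still_on(shards, patterns)


if __name__ == "__main__":
    excluded = ["node-05*", "node-06*"]
    # Hypothetical snapshot of shard placement.
    shards = [("logs", 0, "node-05a"), ("logs", 1, "node-02"), ("logs", 2, None)]
    print(exclude_body(excluded))
    print(shards_still_on(shards, excluded), safe_to_shut_down(shards, excluded))
```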
 
  - Upgrades / node restarts
    - disable allocation (auto-balancing) when doing rolling restarts
      curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
    - restart the node
    - re-enable allocation
      curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":false}}'
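In a rolling restart, the disable, restart, re-enable cycle repeats once per node. Here is a minimal Python sketch that expands it into an ordered plan. The wait-for-green step between nodes uses the cluster health API's `wait_for_status=green` parameter; it is an addition that the notes do not spell out. `RESTART` and `WAIT` are placeholder step labels for actions you perform yourself:

```python
ALLOCATION = "cluster.routing.allocation.disable_allocation"


def rolling_restart_plan(nodes) -> list:
    """Ordered (action, target, body) steps that restart one node at a time."""
    steps = []
    for node in nodes:
        steps.append(("PUT", "/_cluster/settings", {"transient": {ALLOCATION: True}}))
        steps.append(("RESTART", node, None))  # manual step, e.g. via your service manager
        steps.append(("PUT", "/_cluster/settings", {"transient": {ALLOCATION: False}}))
        # Assumed extra step: let shards recover before touching the next node.
        steps.append(("WAIT", "/_cluster/health?wait_for_status=green", None))
    return steps


if __name__ == "__main__":
    for step in rolling_restart_plan(["node-01", "node-02"]):
        print(step)
```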
 
  - Reindexing / bulk indexing
    - set the number of replicas to 0
    - restore the number of replicas after indexing completes
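With replicas at 0, each document is indexed only on the primaries. Raising the count afterwards copies the finished shards instead of indexing every document twice. Here is a minimal Python sketch that reads the original replica count from a `GET /my_index/_settings` response and builds both bodies. The response shape (settings nested under the index name, values returned as strings) is an assumption based on 1.x-era output, and `replica_toggle` is a hypothetical helper:

```python
def replica_toggle(index: str, settings_response: dict):
    """Return (before, after) bodies for PUT /{index}/_settings around a bulk load."""
    index_settings = settings_response[index]["settings"]["index"]
    # Settings values may come back as strings such as "1", so convert explicitly.
    original = int(index_settings["number_of_replicas"])
    before = {"index": {"number_of_replicas": 0}}
    after = {"index": {"number_of_replicas": original}}
    return before, after


if __name__ == "__main__":
    # Made-up response trimmed to the field used above.
    response = {"my_index": {"settings": {"index": {"number_of_replicas": "1"}}}}
    before, after = replica_toggle("my_index", response)
    print("before bulk load:", before)
    print("after bulk load:", after)
```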
 
 
- Restoration
  - snapshot
 
- References
  - Elasticsearch: The Definitive Guide
  - http://www.elasticsearch.org/webinars/elasticsearch-pre-flight-checklist/
  - http://www.elasticsearch.org/webinars/elk-stack-devops-environment/
  - http://www.elasticsearch.org/videos/moloch-elasticsearch-powering-network-forensics-aol/
  - http://www.elasticsearch.org/videos/elastic-searching-big-data/
 
Wednesday, December 31, 2014
ElasticSearch