Friday, January 2, 2015

Logstash

  1. Introduction
    1. Logstash is a tool for managing events and logs
    2. You can use it to collect logs, parse them, and store them for later use (for example, for searching)
    3. Speaking of searching, Logstash comes with a web interface for searching and drilling into all of your logs
    4. It is fully free and fully open source
    5. The license is Apache 2.0, meaning you are free to use it however you want
    6. Logstash is now part of the Elasticsearch family, which allows the project to build better software faster and to offer production support
  2. Installation
    1. curl -O https://download.elasticsearch.org/logstash/logstash/logstash-1.4.2.tar.gz
    2. tar zxvf logstash-1.4.2.tar.gz
    3. cd logstash-1.4.2
  3. Sample
    1. Sample 1
      1. bin/logstash -e 'input { stdin { } } output { stdout {} }'
      2. hello world
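      3. After you type "hello world" and press Enter, Logstash should echo the event back with a timestamp and host prepended, roughly like "2015-01-02T12:00:00.000+0000 my-laptop hello world" (the exact timestamp, hostname, and formatting may vary by version)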
    2. Sample 2
      1. bin/logstash -e 'input { stdin { } } output { stdout { codec => rubydebug } }'
        goodnight moon
        {
               "message" => "goodnight moon",
            "@timestamp" => "2013-11-20T23:48:05.335Z",
              "@version" => "1",
                  "host" => "my-laptop"
        }
  4. Configuration
    1. Inputs
      1. file: reads from a file on the filesystem, much like the UNIX command "tail -0f"
      2. redis: reads from a redis server, using both redis channels and redis lists. Redis is often used as a "broker" in a centralized Logstash installation, where it queues Logstash events from remote Logstash "shippers"
      3. lumberjack: processes events sent in the lumberjack protocol, used by the project now called logstash-forwarder
    2. Filters
      1. grok: parses arbitrary text and structures it. Grok is currently the best way in Logstash to parse unstructured log data into something structured and queryable. With 120 patterns built into Logstash, it's more than likely you'll find one that meets your needs (see the combined pipeline example after this list)
      2. mutate: The mutate filter allows you to do general mutations to fields. You can rename, remove, replace, and modify fields in your events
      3. drop: drops an event completely, for example debug events
      4. clone: makes a copy of an event, possibly adding or removing fields
      5. geoip: adds information about the geographical location of IP addresses (and displays amazing charts in Kibana)
    3. Outputs
      1. elasticsearch: the best choice if you're planning to save your data in an efficient, convenient, and easily queryable format
      2. file: writes event data to a file on disk
    4. Codecs
      1. json: encodes/decodes data in JSON format
      2. multiline: merges multiple-line text events into a single event, e.g. Java exception and stack-trace messages
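    5. Putting the plugin types together, a minimal end-to-end pipeline might look like this (a sketch; the access-log path and the COMBINEDAPACHELOG pattern are illustrative assumptions)
      # Tail an Apache access log, parse each line, and index into Elasticsearch
      input {
        file {
          path => "/var/log/httpd/access_log"
        }
      }
      filter {
        grok {
          match => [ "message", "%{COMBINEDAPACHELOG}" ]
        }
      }
      output {
        elasticsearch {
          host => "localhost"
          protocol => "http"
        }
      }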
  5. Inputs
    1. file
      input {
        file {
          add_field => ... # hash (optional), default: {}
          codec => ... # codec (optional), default: "plain"
          discover_interval => ... # number (optional), default: 15
          exclude => ... # array (optional)
          path => ... # array (required)
          sincedb_path => ... # string (optional)
          sincedb_write_interval => ... # number (optional), default: 15
          start_position => ... # string, one of ["beginning", "end"] (optional), default: "end"
          stat_interval => ... # number (optional), default: 1
          tags => ... # array (optional)
          type => ... # string (optional)
        }
      }
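      A minimal working sketch (the log paths and the "syslog" type are assumptions for illustration):
      input {
        file {
          # read these files from the start instead of only new lines
          path => [ "/var/log/messages", "/var/log/*.log" ]
          start_position => "beginning"
          type => "syslog"
        }
      }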
  6. Filters
    1. grok
      filter {
        grok {
          add_field => ... # hash (optional), default: {}
          add_tag => ... # array (optional), default: []
          break_on_match => ... # boolean (optional), default: true
          drop_if_match => ... # boolean (optional), default: false
          keep_empty_captures => ... # boolean (optional), default: false
          match => ... # hash (optional), default: {}
          named_captures_only => ... # boolean (optional), default: true
          overwrite => ... # array (optional), default: []
          patterns_dir => ... # array (optional), default: []
          remove_field => ... # array (optional), default: []
          remove_tag => ... # array (optional), default: []
          tag_on_failure => ... # array (optional), default: ["_grokparsefailure"]
        }
      }
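      For example, given a log line like "55.3.244.1 GET /index.html 15824 0.043", a sketch using built-in patterns extracts client, method, request, bytes, and duration fields:
      filter {
        grok {
          match => [ "message", "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" ]
        }
      }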
    2. geoip
      filter {
        geoip {
          add_field => ... # hash (optional), default: {}
          add_tag => ... # array (optional), default: []
          database => ... # a valid filesystem path (optional)
          fields => ... # array (optional)
          remove_field => ... # array (optional), default: []
          remove_tag => ... # array (optional), default: []
          source => ... # string (required)
          target => ... # string (optional), default: "geoip"
        }
      }
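      A minimal sketch; the "clientip" source field assumes an earlier grok filter (e.g. %{COMBINEDAPACHELOG}) has already extracted the client address:
      filter {
        geoip {
          # look up the IP in this field and write results to the "geoip" field
          source => "clientip"
        }
      }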
    3. multiline
      filter {
        multiline {
          add_field => ... # hash (optional), default: {}
          add_tag => ... # array (optional), default: []
          enable_flush => ... # boolean (optional), default: false
          negate => ... # boolean (optional), default: false
          pattern => ... # string (required)
          patterns_dir => ... # array (optional), default: []
          remove_field => ... # array (optional), default: []
          remove_tag => ... # array (optional), default: []
          stream_identity => ... # string (optional), default: "%{host}.%{path}.%{type}"
          what => ... # string, one of ["previous", "next"] (required)
        }
      }
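      A common sketch for Java stack traces: any line beginning with whitespace is merged into the previous line's event:
      filter {
        multiline {
          pattern => "^\s"
          what => "previous"
        }
      }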
    4. drop
      filter {
        drop {
          add_field => ... # hash (optional), default: {}
          add_tag => ... # array (optional), default: []
          remove_field => ... # array (optional), default: []
          remove_tag => ... # array (optional), default: []
        }
      }
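      drop is typically wrapped in a conditional; in this sketch the "loglevel" field is an assumption about how your events were parsed:
      filter {
        # discard debug-level events entirely
        if [loglevel] == "debug" {
          drop { }
        }
      }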
    5. date
      filter {
        date {
          add_field => ... # hash (optional), default: {}
          add_tag => ... # array (optional), default: []
          locale => ... # string (optional)
          match => ... # array (optional), default: []
          remove_field => ... # array (optional), default: []
          remove_tag => ... # array (optional), default: []
          target => ... # string (optional), default: "@timestamp"
          timezone => ... # string (optional)
        }
      }
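      A minimal sketch that parses an Apache-style timestamp (assuming a prior grok put it in a "timestamp" field) and writes the result to @timestamp:
      filter {
        date {
          match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
        }
      }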
    6. mutate
      filter {
        mutate {
          add_field => ... # hash (optional), default: {}
          add_tag => ... # array (optional), default: []
          convert => ... # hash (optional)
          gsub => ... # array (optional)
          join => ... # hash (optional)
          lowercase => ... # array (optional)
          merge => ... # hash (optional)
          remove_field => ... # array (optional), default: []
          remove_tag => ... # array (optional), default: []
          rename => ... # hash (optional)
          replace => ... # hash (optional)
          split => ... # hash (optional)
          strip => ... # array (optional)
          update => ... # hash (optional)
          uppercase => ... # array (optional)
        }
      }
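      A sketch combining a few mutations (all field names are assumptions):
      filter {
        mutate {
          rename => [ "clientip", "client_ip" ]      # rename a field
          convert => [ "bytes", "integer" ]          # cast string to integer
          lowercase => [ "method" ]                  # normalize case
        }
      }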
  7. Outputs
    1. elasticsearch
      output {
        elasticsearch {
          action => ... # string (optional), default: "index"
          bind_host => ... # string (optional)
          bind_port => ... # number (optional)
          cluster => ... # string (optional)
          codec => ... # codec (optional), default: "plain"
          document_id => ... # string (optional), default: nil
          embedded => ... # boolean (optional), default: false
          embedded_http_port => ... # string (optional), default: "9200-9300"
          flush_size => ... # number (optional), default: 5000
          host => ... # string (optional)
          idle_flush_time => ... # number (optional), default: 1
          index => ... # string (optional), default: "logstash-%{+YYYY.MM.dd}"
          index_type => ... # string (optional)
          manage_template => ... # boolean (optional), default: true
          node_name => ... # string (optional)
          port => ... # string (optional)
          protocol => ... # string, one of ["node", "transport", "http"] (optional)
          template => ... # a valid filesystem path (optional)
          template_name => ... # string (optional), default: "logstash"
          template_overwrite => ... # boolean (optional), default: false
          workers => ... # number (optional), default: 1
        }
      }
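      A minimal sketch that ships events to a local Elasticsearch over HTTP (the host and protocol values are assumptions; index shows the default):
      output {
        elasticsearch {
          host => "localhost"
          protocol => "http"
          index => "logstash-%{+YYYY.MM.dd}"
        }
      }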
    2. elasticsearch_river
      output {
        elasticsearch_river {
          codec => ... # codec (optional), default: "plain"
          document_id => ... # string (optional), default: nil
          durable => ... # boolean (optional), default: true
          es_bulk_size => ... # number (optional), default: 1000
          es_bulk_timeout_ms => ... # number (optional), default: 100
          es_host => ... # string (required)
          es_ordered => ... # boolean (optional), default: false
          es_port => ... # number (optional), default: 9200
          exchange => ... # string (optional), default: "elasticsearch"
          exchange_type => ... # string, one of ["fanout", "direct", "topic"] (optional), default: "direct"
          index => ... # string (optional), default: "logstash-%{+YYYY.MM.dd}"
          index_type => ... # string (optional), default: "%{type}"
          key => ... # string (optional), default: "elasticsearch"
          password => ... # string (optional), default: "guest"
          persistent => ... # boolean (optional), default: true
          queue => ... # string (optional), default: "elasticsearch"
          rabbitmq_host => ... # string (required)
          rabbitmq_port => ... # number (optional), default: 5672
          user => ... # string (optional), default: "guest"
          vhost => ... # string (optional), default: "/"
          workers => ... # number (optional), default: 1
        }
      }
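      A minimal sketch with only the required options (both hosts assumed local); this output indexes events via RabbitMQ and Elasticsearch's river feature:
      output {
        elasticsearch_river {
          es_host => "localhost"
          rabbitmq_host => "localhost"
        }
      }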
    3. stdout
      output {
        stdout {
          codec => ... # codec (optional), default: "plain"
          workers => ... # number (optional), default: 1
        }
      }
  8. Flags
    1. -w, --filterworkers COUNT: run COUNT filter worker threads to utilize multiple cores (the default is 1)
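    2. For example, a sketch that runs a config file with four filter workers (the file name logstash.conf is an assumption): bin/logstash -f logstash.conf -w 4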
  9. Reference
    1. http://logstash.net/docs/1.4.2
    2. http://grokdebug.herokuapp.com/
