Wednesday, December 31, 2014

Apache Kafka

  1. Introduction
    1. Kafka is a distributed, partitioned, replicated commit log service
    2. It provides the functionality of a messaging system, but with a unique design
  2. Features
    1. Fast
      1. A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients
    2. Scalable
      1. Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization
      2. It can be elastically and transparently expanded without downtime
      3. Data streams are partitioned and spread over a cluster of machines, allowing streams larger than any single machine can handle and enabling clusters of coordinated consumers
    3. Durable
      1. Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact
    4. Distributed by Design
      1. Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees

  3. Installation
    1. wget http://mirror.apache-kr.org/kafka/0.8.1.1/kafka_2.9.2-0.8.1.1.tgz
    2. tar -xzf kafka_2.9.2-0.8.1.1.tgz
    3. cd kafka_2.9.2-0.8.1.1
    4. bin/zookeeper-server-start.sh config/zookeeper.properties (Kafka requires a running ZooKeeper; the bundled single-node instance is sufficient for testing)
    5. bin/kafka-server-start.sh config/server.properties
  4. Configuration
    1. broker
      1. broker.id
      2. log.dirs
      3. zookeeper.connect
      4. host.name
      5. topic-level
        1. bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic my-topic --partitions 1 --replication-factor 1 --config max.message.bytes=64000 --config flush.messages=1
        2. bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic --config max.message.bytes=128000
        3. bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic --deleteConfig max.message.bytes
      6. controlled.shutdown.enable=true
      7. auto.leader.rebalance.enable=true
    2. consumer
      1. group.id
      2. zookeeper.connect
    3. producer
      1. metadata.broker.list
    4. production server configuration (example server.properties)
      # Replication configurations
      num.replica.fetchers=4
      replica.fetch.max.bytes=1048576
      replica.fetch.wait.max.ms=500
      replica.high.watermark.checkpoint.interval.ms=5000
      replica.socket.timeout.ms=30000
      replica.socket.receive.buffer.bytes=65536
      replica.lag.time.max.ms=10000
      replica.lag.max.messages=4000
      
      controller.socket.timeout.ms=30000
      controller.message.queue.size=10
      
      # Log configuration
      num.partitions=8
      message.max.bytes=1000000
      auto.create.topics.enable=true
      log.index.interval.bytes=4096
      log.index.size.max.bytes=10485760
      log.retention.hours=168
      log.flush.interval.ms=10000
      log.flush.interval.messages=20000
      log.flush.scheduler.interval.ms=2000
      log.roll.hours=168
      log.cleanup.interval.mins=30
      log.segment.bytes=1073741824
      
      # ZK configuration
      zookeeper.connection.timeout.ms=6000
      zookeeper.sync.time.ms=2000
      
      # Socket server configuration
      num.io.threads=8
      num.network.threads=8
      socket.request.max.bytes=104857600
      socket.receive.buffer.bytes=1048576
      socket.send.buffer.bytes=1048576
      queued.max.requests=16
      fetch.purgatory.purge.interval.requests=100
      producer.purgatory.purge.interval.requests=100
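
      The consumer and producer settings listed above can be collected into minimal client-side property files. A sketch, assuming a single local broker and ZooKeeper on the default ports; the group name is an arbitrary example:

      ```shell
      # Minimal consumer config: group.id and zookeeper.connect are the
      # two required settings listed above. Values are example assumptions
      # for a local single-node setup.
      cat > consumer.properties <<'EOF'
      group.id=test-group
      zookeeper.connect=localhost:2181
      EOF

      # Minimal producer config: metadata.broker.list points at one or
      # more brokers used to bootstrap cluster metadata.
      cat > producer.properties <<'EOF'
      metadata.broker.list=localhost:9092
      EOF
      ```

      Such files can then be passed to client tools (e.g. via --consumer.config on the console consumer).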
  5. Operations
    1. bin/kafka-topics.sh --zookeeper zk_host:port/chroot --create --topic my_topic_name --partitions 20 --replication-factor 3 --config x=y
    2. bin/kafka-topics.sh --zookeeper zk_host:port/chroot --alter --topic my_topic_name --partitions 40
    3. bin/kafka-topics.sh --zookeeper zk_host:port/chroot --alter --topic my_topic_name --config x=y
    4. bin/kafka-topics.sh --zookeeper zk_host:port/chroot --alter --topic my_topic_name --deleteConfig x
    5. bin/kafka-topics.sh --zookeeper zk_host:port/chroot --delete --topic my_topic_name
    6. bin/kafka-preferred-replica-election.sh --zookeeper zk_host:port/chroot
    7. bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zkconnect localhost:2181 --group test
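
      The topic commands above all share the same --zookeeper argument. A small dry-run wrapper sketch shows how they compose; KAFKA_HOME and ZK are assumed example values, and the commands are echoed rather than executed:

      ```shell
      #!/bin/sh
      # Dry-run sketch: build the kafka-topics.sh invocations used above
      # from a single ZooKeeper connect string. KAFKA_HOME and ZK are
      # assumptions; override them for a real installation.
      KAFKA_HOME=${KAFKA_HOME:-/opt/kafka_2.9.2-0.8.1.1}
      ZK=${ZK:-zk_host:2181/chroot}

      # topic_create <name> <partitions> <replication-factor>
      topic_create() {
        echo "$KAFKA_HOME/bin/kafka-topics.sh --zookeeper $ZK --create --topic $1 --partitions $2 --replication-factor $3"
      }

      # topic_delete <name>
      topic_delete() {
        echo "$KAFKA_HOME/bin/kafka-topics.sh --zookeeper $ZK --delete --topic $1"
      }

      topic_create my_topic_name 20 3
      topic_delete my_topic_name
      ```

      Removing the echo (or piping the output to sh) would run the commands against a live cluster.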
  6. References
    1. http://kafka.apache.org/
    2. http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
