Thursday, June 15, 2017

Help 4 Apache Project

  • Data is not balanced across Kafka partitions when using the Kafka sink (Flume)
    • Reason
      • The Kafka sink uses the topic and key properties from the Flume event headers to send events to Kafka. If a topic header exists, the event is sent to that topic, overriding the topic configured for the sink. If a key header exists, Kafka uses the key to partition the data between the topic's partitions, and events with the same key always go to the same partition. If the key is null, events are sent to random partitions, although depending on the Kafka producer version they may stick to one partition for a while rather than spread evenly.
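    • Option: give every event a unique, random key header with the UUID interceptor, so Kafka spreads events across partitions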
      • a1.sources.r1.interceptors = i1
        a1.sources.r1.interceptors.i1.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
        a1.sources.r1.interceptors.i1.headerName = key
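  • Messages are not written to the configured topic when using both a Kafka source and a Kafka sink (Flume)
    • Reason
      • The Kafka source stores the topic each event was read from in the event's topic header. The Kafka sink sends an event to the topic named in its topic header, overriding the sink's configured topic, so events are written back to the source topic.
    • Option: overwrite the topic header with a static interceptor on the source (replace sink-topic with the sink's target topic)
      • agent01.sources.source01.interceptors = interceptor01
        agent01.sources.source01.interceptors.interceptor01.type = static
        agent01.sources.source01.interceptors.interceptor01.preserveExisting = false
        agent01.sources.source01.interceptors.interceptor01.key = topic
        agent01.sources.source01.interceptors.interceptor01.value = sink-topic
      • preserveExisting = false is required: by default the static interceptor keeps an existing header, so the source's topic header would not be replaced.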
  • org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block (HDFS)
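    • Reason
      • the client gets the block locations from the NameNode but cannot connect to the DataNodes that store the blocks
    • Option
      • open the DataNode ports to the client (Hadoop 2.x defaults): 50010 (data transfer, dfs.datanode.address), 50020 (IPC, dfs.datanode.ipc.address), 50075 (HTTP, dfs.datanode.http.address). Hadoop 3 moved these to 9866, 9867 and 9864.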
  • 'Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask', sqlState='08S01' (PyHive)
    • run the Python code as the hdfs user, or as another account that has access to the Hive tables
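For the unbalanced Kafka partitions issue, a small simulation shows why a unique key per event helps. It is a sketch, not Kafka's code. It assumes a hypothetical topic with 4 partitions and uses CRC32 as a stand-in hash. Kafka's own partitioner uses a different hash function, but the property that matters is the same: equal keys always map to the same partition, while distinct random keys spread out.

```python
import uuid
import zlib
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical partition count for the topic


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a message key to a partition: hash(key) mod partition count.

    CRC32 is a stand-in hash (assumption); Kafka's partitioner uses a
    different hash, but equal keys still always land in the same partition.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def distribution(keys, num_partitions=NUM_PARTITIONS):
    """Count how many events land in each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Every event carries the same key header: all events go to one partition.
same_key = distribution(["host-01"] * 10_000)

# A UUID interceptor gives every event a unique key: events spread out.
uuid_keys = distribution(str(uuid.uuid4()) for _ in range(10_000))

print("same key :", same_key)
print("uuid keys:", uuid_keys)
```

The "host-01" key is hypothetical. It stands for any header value that is constant across events.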
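For the Kafka source plus Kafka sink issue, the header logic can be mimicked in a few lines. This is a hypothetical model of the behaviour, not Flume's implementation. The function names and the "source-topic" value are illustrative. It shows why preserveExisting = false matters: with the default (true), the source's topic header survives and the event goes back to the source topic.

```python
def static_interceptor(headers, key, value, preserve_existing=True):
    """Mimic Flume's static interceptor: set headers[key] = value.

    With preserve_existing=True (the interceptor's default), an existing
    header is left untouched.
    """
    result = dict(headers)
    if preserve_existing and key in result:
        return result
    result[key] = value
    return result


def sink_topic(headers, configured_topic):
    """Mimic the Kafka sink's choice: a 'topic' header overrides the config."""
    return headers.get("topic", configured_topic)


# The Kafka source records the topic each event was read from.
event_headers = {"topic": "source-topic"}

print(sink_topic(event_headers, "sink-topic"))  # source-topic
print(sink_topic(static_interceptor(event_headers, "topic", "sink-topic"), "sink-topic"))  # source-topic
print(sink_topic(static_interceptor(event_headers, "topic", "sink-topic",
                                    preserve_existing=False), "sink-topic"))  # sink-topic
```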
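For the HDFS BlockMissingException, a quick way to confirm that the DataNode ports are reachable from the client machine is a plain TCP connect test. This is a minimal sketch using only the standard library. The check_ports helper is hypothetical, and the port list assumes the Hadoop 2.x defaults named above.

```python
import socket

# Hadoop 2.x default DataNode ports (assumed defaults; check hdfs-site.xml).
DATANODE_PORTS = {
    50010: "data transfer (dfs.datanode.address)",
    50020: "IPC (dfs.datanode.ipc.address)",
    50075: "HTTP (dfs.datanode.http.address)",
}


def check_ports(host, ports, timeout=3.0):
    """Return {port: True/False} depending on whether a TCP connect succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout, or DNS failure.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results
```

Run it from the client against each DataNode, e.g. check_ports("datanode01.example.com", DATANODE_PORTS) (hostname hypothetical). Any False entry points to a port that still needs to be opened.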
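For the PyHive TezTask error, the account is chosen when the connection is opened. The sketch below assumes HiveServer2 runs queries as the connecting user (impersonation via hive.server2.enable.doAs). It passes "hdfs" as PyHive's username argument. The helper names are hypothetical, and PyHive is imported lazily so the module loads even where PyHive is not installed.

```python
def hive_connect_kwargs(host, username="hdfs", port=10000, database="default"):
    """Keyword arguments for pyhive.hive.connect.

    'username' should be 'hdfs' or another account with access to the
    Hive tables (assumes HiveServer2 impersonates the connecting user).
    """
    return {"host": host, "port": port, "username": username, "database": database}


def run_query(sql, **connect_kwargs):
    """Run one statement through HiveServer2 and return all rows."""
    from pyhive import hive  # lazy import; requires the PyHive package

    conn = hive.connect(**connect_kwargs)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()
```

Usage (hostname and table are hypothetical): run_query("SELECT * FROM my_table LIMIT 10", **hive_connect_kwargs("hiveserver.example.com")).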
