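- Data is not balanced across Kafka partitions when using the Kafka Sink (Flume)
  - Reason
    - The Kafka Sink uses the topic and key properties from the FlumeEvent headers to send events to Kafka. If topic exists in the headers, the event is sent to that specific topic, overriding the topic configured for the sink. If key exists in the headers, Kafka uses the key to partition the data among the topic partitions, so events with the same key are sent to the same partition. If the key is null, events are sent to random partitions. This is likely the source of the imbalance: with the older Kafka producer (used by the Flume 1.6 Kafka Sink), a null key makes the producer stick to one randomly chosen partition until its next metadata refresh (every 10 minutes by default), so over short periods most events land in a single partition.
  - Option
    - Add a UUID interceptor to the source so that every event carries a unique key header, which spreads the events across the partitions:
      a1.sources.r1.interceptors = i1
      a1.sources.r1.interceptors.i1.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
      a1.sources.r1.interceptors.i1.headerName = key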
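The effect of giving every event its own key can be illustrated with a small simulation. This is a sketch, not Kafka's partitioner: it uses SHA-256 as a stand-in for the producer's real key hash, and the partition count of 6 (NUM_PARTITIONS) is an arbitrary assumption. Random UUID keys, like the ones the UUID interceptor writes into the key header, spread events almost evenly, while a single fixed key sends every event to the same partition.

```python
import hashlib
import uuid
from collections import Counter
from typing import Iterable

# Assumed partition count of the target topic (hypothetical).
NUM_PARTITIONS = 6


def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition by hashing it.

    SHA-256 is only a stand-in for the producer's real key hash; any
    reasonably uniform hash shows the same effect.
    """
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def spread(keys: Iterable[bytes]) -> Counter:
    """Count how many events land in each partition."""
    return Counter(partition_for(k) for k in keys)


# One random UUID string per event, like the UUID interceptor writing the key header.
uuid_keys = [str(uuid.uuid4()).encode("ascii") for _ in range(60_000)]
# Every event carrying the same key.
fixed_keys = [b"same-key"] * 60_000

if __name__ == "__main__":
    print("UUID keys:", sorted(spread(uuid_keys).items()))
    print("fixed key:", sorted(spread(fixed_keys).items()))
```

Each partition should receive roughly 10,000 of the 60,000 UUID-keyed events, while the fixed key puts all 60,000 into one partition. The sketch only models keyed records; how events with a null key are distributed depends on the Kafka producer version.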
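- Cannot produce messages to the specified topic when using both the Kafka Source and the Kafka Sink (Flume)
  - Reason
    - The Kafka Source writes the topic each event was consumed from into the event's topic header. Because the Kafka Sink lets the topic header override its configured topic, the events are sent back to the source topic instead of the sink topic.
  - Option
    - Add a static interceptor to the source that overwrites the topic header with the sink topic (preserveExisting = false is required because the header already exists):
      agent01.sources.source01.interceptors = interceptor01
      agent01.sources.source01.interceptors.interceptor01.type = static
      agent01.sources.source01.interceptors.interceptor01.preserveExisting = false
      agent01.sources.source01.interceptors.interceptor01.key = topic
      agent01.sources.source01.interceptors.interceptor01.value = sink-topic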
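The header routing behind this fix can be modelled in a few lines. The functions below are simplified, hypothetical models of the Kafka Source, the static interceptor and the Kafka Sink, not Flume's API, and "source-topic" is a placeholder name for the topic the source consumes from. They show why preserveExisting must be false: the Kafka Source has already set the topic header, and keeping it sends the event back where it came from.

```python
from typing import Dict

Headers = Dict[str, str]


def kafka_source_headers(consumed_topic: str) -> Headers:
    """Model of the Kafka Source: record the topic the event was read from."""
    return {"topic": consumed_topic}


def static_interceptor(headers: Headers, key: str, value: str,
                       preserve_existing: bool) -> Headers:
    """Model of the static interceptor: set one header to a fixed value.

    With preserve_existing=True an existing header is left untouched;
    with preserve_existing=False it is overwritten. The input is not mutated.
    """
    result = dict(headers)
    if preserve_existing and key in result:
        return result
    result[key] = value
    return result


def sink_topic(headers: Headers, configured_topic: str) -> str:
    """Model of the Kafka Sink: a topic header overrides the configured topic."""
    return headers.get("topic", configured_topic)


if __name__ == "__main__":
    headers = kafka_source_headers("source-topic")
    # No interceptor: the event goes back to the source topic.
    print(sink_topic(headers, "sink-topic"))
    # preserveExisting = false: the header is replaced, so the sink topic is used.
    print(sink_topic(static_interceptor(headers, "topic", "sink-topic", False), "sink-topic"))
    # preserveExisting = true: the source topic survives and the problem remains.
    print(sink_topic(static_interceptor(headers, "topic", "sink-topic", True), "sink-topic"))
```

Run as a script, this prints source-topic, sink-topic and source-topic, one per line.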
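- org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block (HDFS)
  - Option
    - Open the DataNode ports 50075, 50010 and 50020 so that they are reachable from the client (Hadoop 2.x defaults: 50075 for HTTP, 50010 for data transfer, 50020 for IPC)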
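A quick way to confirm the ports are open is to try a TCP connection to each one from the client machine. The sketch below assumes the Hadoop 2.x default DataNode ports 50075, 50010 and 50020; check_ports is a hypothetical helper, and the host is read from the command line (falling back to localhost), so pass each DataNode's hostname when you run it.

```python
import socket
import sys
from typing import Dict, Iterable

# Hadoop 2.x default DataNode ports: HTTP, data transfer, IPC.
DATANODE_PORTS = (50075, 50010, 50020)


def check_ports(host: str, ports: Iterable[int] = DATANODE_PORTS,
                timeout: float = 3.0) -> Dict[int, bool]:
    """Return {port: reachable} by attempting a TCP connection to each port."""
    results = {}
    for port in ports:
        try:
            # A successful connect means the port is open and not firewalled.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Refused, timed out, or the hostname did not resolve.
            results[port] = False
    return results


if __name__ == "__main__":
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    for port, ok in check_ports(host).items():
        print(port, "open" if ok else "unreachable")
```

Run it once per DataNode; any port reported as unreachable is a candidate cause of the missing-block error.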
- 'Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask', sqlState='08S01' (PyHive)
  - Option
    - Run the Python code as the hdfs user, or as another account that has access to the Hive tables.
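With PyHive, the account a query runs as is set through the username argument of hive.Connection; as far as I can tell, when it is omitted PyHive falls back to the local login name, which may not have access to the tables. The sketch below passes the account explicitly. The helper names are hypothetical, the host and table names are placeholders, 10000 is the default HiveServer2 port, and running a query requires the PyHive package (pip install 'pyhive[hive]') and a reachable HiveServer2.

```python
from typing import Any, Dict, List, Tuple

# Default HiveServer2 Thrift port.
HIVESERVER2_PORT = 10000


def hive_connection_kwargs(host: str, username: str = "hdfs",
                           database: str = "default",
                           port: int = HIVESERVER2_PORT) -> Dict[str, Any]:
    """Build keyword arguments for pyhive.hive.Connection.

    username is the account the query runs as; it must be able to read
    the Hive tables (the hdfs account, or another account with access).
    """
    return {"host": host, "port": port, "username": username, "database": database}


def run_query(sql: str, **conn_kwargs: Any) -> List[Tuple]:
    """Run one statement through PyHive and return all rows."""
    from pyhive import hive  # imported lazily; needs the PyHive package

    conn = hive.Connection(**conn_kwargs)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()
```

For example, run_query("SELECT COUNT(*) FROM my_table", **hive_connection_kwargs("hiveserver2.example.com")) runs the count as the hdfs account.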
Thursday, June 15, 2017
Help 4 Apache Project