https://github.com/wurstmeister/storm-kafka-0.8-plus

http://blog.csdn.net/xeseo/article/details/18615761

 

准备,一些相关类

GlobalPartitionInformation (storm.kafka.trident)

记录partitionid和broker的关系

GlobalPartitionInformation info = new GlobalPartitionInformation();

info.addPartition(0, new Broker("10.1.110.24",9092));

info.addPartition(0, new Broker("10.1.110.21",9092));

可以静态的生成GlobalPartitionInformation,向上面代码一样

也可以动态的从zk获取,推荐这种方式

从zk获取就会用到DynamicBrokersReader

 

DynamicBrokersReader

核心就是从zk上读出partition和broker的对应关系

操作zk都是使用curator框架

核心函数,

    /**
* Get all partitions with their current leaders
*/
public GlobalPartitionInformation getBrokerInfo() {
GlobalPartitionInformation globalPartitionInformation = new GlobalPartitionInformation();
try {
int numPartitionsForTopic = getNumPartitions(); //从zk取得partition的数目
String brokerInfoPath = brokerPath();
for (int partition = 0; partition < numPartitionsForTopic; partition++) {
int leader = getLeaderFor(partition); //从zk获取partition的leader broker
String path = brokerInfoPath + "/" + leader;
try {
byte[] brokerData = _curator.getData().forPath(path);
Broker hp = getBrokerHost(brokerData); //从zk获取broker的host:port
globalPartitionInformation.addPartition(partition, hp);//生成GlobalPartitionInformation
} catch (org.apache.zookeeper.KeeperException.NoNodeException e) {
LOG.error("Node {} does not exist ", path);
}
}
} catch (Exception e) {
throw new RuntimeException(e);
}
LOG.info("Read partition info from zookeeper: " + globalPartitionInformation);
return globalPartitionInformation;
}

 

DynamicPartitionConnections

维护到每个broker的connection,并记录下每个broker上对应的partitions

核心数据结构,为每个broker维持一个ConnectionInfo

Map<Broker, ConnectionInfo> _connections = new HashMap();

ConnectionInfo的定义,包含连接该broker的SimpleConsumer和记录partitions的set

    static class ConnectionInfo {
SimpleConsumer consumer;
Set<Integer> partitions = new HashSet(); public ConnectionInfo(SimpleConsumer consumer) {
this.consumer = consumer;
}
}

核心函数,就是register

    public SimpleConsumer register(Broker host, int partition) {
if (!_connections.containsKey(host)) {
_connections.put(host, new ConnectionInfo(new SimpleConsumer(host.host, host.port, _config.socketTimeoutMs, _config.bufferSizeBytes, _config.clientId)));
}
ConnectionInfo info = _connections.get(host);
info.partitions.add(partition);
return info.consumer;
}

 

 

PartitionManager

关键核心逻辑,用于管理一个partiiton的读取状态

先理解下面几个变量,

Long _emittedToOffset;
Long _committedTo;
SortedSet<Long> _pending = new TreeSet<Long>();
LinkedList<MessageAndRealOffset> _waitingToEmit = new LinkedList<MessageAndRealOffset>();

kafka对于一个partition,一定是从offset从小到大按顺序读的,并且这里为了保证不读丢数据,会定期的将当前状态即offset写入zk

几个中间状态,

从kafka读到的offset,_emittedToOffset

从kafka读到的messages会放入_waitingToEmit,放入这个list,我们就认为一定会被emit,所以emittedToOffset可以认为是从kafka读到的offset

已经成功处理的offset,lastCompletedOffset

由于message是要在storm里面处理的,其中是可能fail的,所以正在处理的offset是缓存在_pending中的

如果_pending为空,那么lastCompletedOffset=_emittedToOffset

如果_pending不为空,那么lastCompletedOffset为pending list里面第一个offset,因为后面都还在等待ack

    public long lastCompletedOffset() {
if (_pending.isEmpty()) {
return _emittedToOffset;
} else {
return _pending.first();
}
}

 

已经写入zk的offset,_committedTo

我们需要定期将lastCompletedOffset,写入zk,否则crash后,我们不知道上次读到哪儿了

所以_committedTo <= lastCompletedOffset

完整过程,

1. 初始化,

关键就是注册partition,然后初始化offset,以知道从哪里开始读

    public PartitionManager(DynamicPartitionConnections connections, String topologyInstanceId, ZkState state, Map stormConf, SpoutConfig spoutConfig, Partition id) {
_partition = id;
_connections = connections;
_spoutConfig = spoutConfig;
_topologyInstanceId = topologyInstanceId;
_consumer = connections.register(id.host, id.partition); //注册partition到connections,并生成simpleconsumer
_state = state;
_stormConf = stormConf; String jsonTopologyId = null;
Long jsonOffset = null;
String path = committedPath();
try {
Map<Object, Object> json = _state.readJSON(path);
LOG.info("Read partition information from: " + path + " --> " + json );
if (json != null) {
jsonTopologyId = (String) ((Map<Object, Object>) json.get("topology")).get("id");
jsonOffset = (Long) json.get("offset"); // 从zk中读出commited offset
}
} catch (Throwable e) {
LOG.warn("Error reading and/or parsing at ZkNode: " + path, e);
} if (jsonTopologyId == null || jsonOffset == null) { // zk中没有记录,那么根据spoutConfig.startOffsetTime设置offset,Earliest或Latest
_committedTo = KafkaUtils.getOffset(_consumer, spoutConfig.topic, id.partition, spoutConfig);
LOG.info("No partition information found, using configuration to determine offset");
} else if (!topologyInstanceId.equals(jsonTopologyId) && spoutConfig.forceFromStart) {
_committedTo = KafkaUtils.getOffset(_consumer, spoutConfig.topic, id.partition, spoutConfig.startOffsetTime);
LOG.info("Topology change detected and reset from start forced, using configuration to determine offset");
} else {
_committedTo = jsonOffset;
} _emittedToOffset = _committedTo; // 初始化时,中间状态都是一致的
}

 

2. 从kafka读取messages,放到_waitingToEmit

从kafka中读到数据ByteBufferMessageSet,

把需要emit的msg,MessageAndRealOffset,放到_waitingToEmit

把没完成的offset放到pending

更新emittedToOffset

    private void fill() {
ByteBufferMessageSet msgs = KafkaUtils.fetchMessages(_spoutConfig, _consumer, _partition, _emittedToOffset);
for (MessageAndOffset msg : msgs) {
_pending.add(_emittedToOffset);
_waitingToEmit.add(new MessageAndRealOffset(msg.message(), _emittedToOffset));
_emittedToOffset = msg.nextOffset();
}
}

其中fetch message的逻辑如下,

    public static ByteBufferMessageSet fetchMessages(KafkaConfig config, SimpleConsumer consumer, Partition partition, long offset) {
ByteBufferMessageSet msgs = null;
String topic = config.topic;
int partitionId = partition.partition;
for (int errors = 0; errors < 2 && msgs == null; errors++) { // 容忍两次错误
FetchRequestBuilder builder = new FetchRequestBuilder();
FetchRequest fetchRequest = builder.addFetch(topic, partitionId, offset, config.fetchSizeBytes).
clientId(config.clientId).build();
FetchResponse fetchResponse;
try {
fetchResponse = consumer.fetch(fetchRequest);
} catch (Exception e) {
if (e instanceof ConnectException) {
throw new FailedFetchException(e);
} else {
throw new RuntimeException(e);
}
}
if (fetchResponse.hasError()) { // 主要处理offset outofrange的case,通过getOffset从earliest或latest读
KafkaError error = KafkaError.getError(fetchResponse.errorCode(topic, partitionId));
if (error.equals(KafkaError.OFFSET_OUT_OF_RANGE) && config.useStartOffsetTimeIfOffsetOutOfRange && errors == 0) {
long startOffset = getOffset(consumer, topic, partitionId, config.startOffsetTime);
LOG.warn("Got fetch request with offset out of range: [" + offset + "]; " +
"retrying with default start offset time from configuration. " +
"configured start offset time: [" + config.startOffsetTime + "] offset: [" + startOffset + "]");
offset = startOffset;
} else {
String message = "Error fetching data from [" + partition + "] for topic [" + topic + "]: [" + error + "]";
LOG.error(message);
throw new FailedFetchException(message);
}
} else {
msgs = fetchResponse.messageSet(topic, partitionId);
}
}
return msgs;
}

 

3. emit msg

从_waitingToEmit中取到msg,转换成tuple,然后通过collector.emit发出去

    public EmitState next(SpoutOutputCollector collector) {
if (_waitingToEmit.isEmpty()) {
fill();
}
while (true) {
MessageAndRealOffset toEmit = _waitingToEmit.pollFirst();
if (toEmit == null) {
return EmitState.NO_EMITTED;
}
Iterable<List<Object>> tups = KafkaUtils.generateTuples(_spoutConfig, toEmit.msg);
if (tups != null) {
for (List<Object> tup : tups) {
collector.emit(tup, new KafkaMessageId(_partition, toEmit.offset));
}
break;
} else {
ack(toEmit.offset);
}
}
if (!_waitingToEmit.isEmpty()) {
return EmitState.EMITTED_MORE_LEFT;
} else {
return EmitState.EMITTED_END;
}
}

可以看看转换tuple的过程,

可以看到是通过kafkaConfig.scheme.deserialize来做转换

    public static Iterable<List<Object>> generateTuples(KafkaConfig kafkaConfig, Message msg) {
Iterable<List<Object>> tups;
ByteBuffer payload = msg.payload();
ByteBuffer key = msg.key();
if (key != null && kafkaConfig.scheme instanceof KeyValueSchemeAsMultiScheme) {
tups = ((KeyValueSchemeAsMultiScheme) kafkaConfig.scheme).deserializeKeyAndValue(Utils.toByteArray(key), Utils.toByteArray(payload));
} else {
tups = kafkaConfig.scheme.deserialize(Utils.toByteArray(payload));
}
return tups;
}

所以你使用时,需要定义scheme逻辑,

spoutConfig.scheme = new SchemeAsMultiScheme(new TestMessageScheme());

public class TestMessageScheme implements Scheme {
private static final Logger LOGGER = LoggerFactory.getLogger(TestMessageScheme.class); @Override
public List<Object> deserialize(byte[] bytes) {
try {
String msg = new String(bytes, "UTF-8");
return new Values(msg);
} catch (InvalidProtocolBufferException e) {
LOGGER.error("Cannot parse the provided message!");
}
return null;
} @Override
public Fields getOutputFields() {
return new Fields("msg");
}
}

 

4. 定期的commit offset

    public void commit() {
long lastCompletedOffset = lastCompletedOffset();
if (lastCompletedOffset != lastCommittedOffset()) {
Map<Object, Object> data = ImmutableMap.builder()
.put("topology", ImmutableMap.of("id", _topologyInstanceId,
"name", _stormConf.get(Config.TOPOLOGY_NAME)))
.put("offset", lastCompletedOffset)
.put("partition", _partition.partition)
.put("broker", ImmutableMap.of("host", _partition.host.host,
"port", _partition.host.port))
.put("topic", _spoutConfig.topic).build();
_state.writeJSON(committedPath(), data);
_committedTo = lastCompletedOffset;
} else {
LOG.info("No new offset for " + _partition + " for topology: " + _topologyInstanceId);
}
}

 

5. 最后关注一下,fail时的处理

首先作者没有cache message,而只是cache offset

所以fail的时候,他是无法直接replay的,在他的注释里面写了,不这样做的原因是怕内存爆掉

所以他的做法是,当一个offset fail的时候, 直接将_emittedToOffset回滚到当前fail的这个offset

下次从Kafka fetch的时候会从_emittedToOffset开始读,这样做的好处就是依赖kafka做replay,问题就是会有重复问题

所以使用时,一定要考虑,是否可以接受重复问题

    public void fail(Long offset) {
//TODO: should it use in-memory ack set to skip anything that's been acked but not committed???
// things might get crazy with lots of timeouts
if (_emittedToOffset > offset) {
_emittedToOffset = offset;
_pending.tailSet(offset).clear();
}
}

 

KafkaSpout

最后来看看KafkaSpout

1. 初始化

关键就是初始化DynamicPartitionConnections和_coordinator

    public void open(Map conf, final TopologyContext context, final SpoutOutputCollector collector) {
_collector = collector; Map stateConf = new HashMap(conf);
List<String> zkServers = _spoutConfig.zkServers;
if (zkServers == null) {
zkServers = (List<String>) conf.get(Config.STORM_ZOOKEEPER_SERVERS);
}
Integer zkPort = _spoutConfig.zkPort;
if (zkPort == null) {
zkPort = ((Number) conf.get(Config.STORM_ZOOKEEPER_PORT)).intValue();
}
stateConf.put(Config.TRANSACTIONAL_ZOOKEEPER_SERVERS, zkServers);
stateConf.put(Config.TRANSACTIONAL_ZOOKEEPER_PORT, zkPort);
stateConf.put(Config.TRANSACTIONAL_ZOOKEEPER_ROOT, _spoutConfig.zkRoot);
_state = new ZkState(stateConf); _connections = new DynamicPartitionConnections(_spoutConfig, KafkaUtils.makeBrokerReader(conf, _spoutConfig)); // using TransactionalState like this is a hack
int totalTasks = context.getComponentTasks(context.getThisComponentId()).size();
if (_spoutConfig.hosts instanceof StaticHosts) {
_coordinator = new StaticCoordinator(_connections, conf, _spoutConfig, _state, context.getThisTaskIndex(), totalTasks, _uuid);
} else {
_coordinator = new ZkCoordinator(_connections, conf, _spoutConfig, _state, context.getThisTaskIndex(), totalTasks, _uuid);
}
}

看看_coordinator 是干嘛的?

这很关键,因为我们一般都会开多个并发的kafkaspout,类似于high-level中的consumer group,如何保证这些并发的线程不冲突?

使用和highlevel一样的思路,一个partition只会有一个spout消费,这样就避免处理麻烦的访问互斥问题(kafka做访问互斥很麻烦,试着想想)

是根据当前spout的task数和partition数来分配,task和partitioin的对应关系的,并且为每个partition建立PartitionManager

这里首先看到totalTasks就是当前这个spout component的task size

StaticCoordinator和ZkCoordinator的差别就是, 从StaticHost还是从Zk读到partition的信息,简单起见,看看StaticCoordinator实现

public class StaticCoordinator implements PartitionCoordinator {
Map<Partition, PartitionManager> _managers = new HashMap<Partition, PartitionManager>();
List<PartitionManager> _allManagers = new ArrayList(); public StaticCoordinator(DynamicPartitionConnections connections, Map stormConf, SpoutConfig config, ZkState state, int taskIndex, int totalTasks, String topologyInstanceId) {
StaticHosts hosts = (StaticHosts) config.hosts;
List<Partition> myPartitions = KafkaUtils.calculatePartitionsForTask(hosts.getPartitionInformation(), totalTasks, taskIndex);
for (Partition myPartition : myPartitions) {// 建立PartitionManager
_managers.put(myPartition, new PartitionManager(connections, topologyInstanceId, state, stormConf, config, myPartition));
}
_allManagers = new ArrayList(_managers.values());
} @Override
public List<PartitionManager> getMyManagedPartitions() {
return _allManagers;
} public PartitionManager getManager(Partition partition) {
return _managers.get(partition);
} }

其中分配的逻辑在calculatePartitionsForTask

    public static List<Partition> calculatePartitionsForTask(GlobalPartitionInformation partitionInformation, int totalTasks, int taskIndex) {
Preconditions.checkArgument(taskIndex < totalTasks, "task index must be less that total tasks");
List<Partition> partitions = partitionInformation.getOrderedPartitions();
int numPartitions = partitions.size();
List<Partition> taskPartitions = new ArrayList<Partition>();
for (int i = taskIndex; i < numPartitions; i += totalTasks) {// 平均分配,
Partition taskPartition = partitions.get(i);
taskPartitions.add(taskPartition);
}
logPartitionMapping(totalTasks, taskIndex, taskPartitions);
return taskPartitions;
}

 

2. nextTuple

逻辑写的比较tricky,其实只要从一个partition读成功一次

只所以要for,是当EmitState.NO_EMITTED时,需要遍历后面的partition以保证读成功一次

    @Override
public void nextTuple() {
List<PartitionManager> managers = _coordinator.getMyManagedPartitions();
for (int i = 0; i < managers.size(); i++) { // in case the number of managers decreased
_currPartitionIndex = _currPartitionIndex % managers.size(); //_currPartitionIndex初始为0,每次依次读一个partition
EmitState state = managers.get(_currPartitionIndex).next(_collector); //调用PartitonManager.next去emit数据
if (state != EmitState.EMITTED_MORE_LEFT) { //当EMITTED_MORE_LEFT时,还有数据,可以继续读,不需要+1
_currPartitionIndex = (_currPartitionIndex + 1) % managers.size();
}
if (state != EmitState.NO_EMITTED) { //当EmitState.NO_EMITTED时,表明partition的数据已经读完,也就是没有读到数据,所以不能break
break;
}
} long now = System.currentTimeMillis();
if ((now - _lastUpdateMs) > _spoutConfig.stateUpdateIntervalMs) {
commit(); //定期commit
}
}

定期commit的逻辑,遍历去commit每个PartitionManager

    private void commit() {
_lastUpdateMs = System.currentTimeMillis();
for (PartitionManager manager : _coordinator.getMyManagedPartitions()) {
manager.commit();
}
}

 

3. Ack和Fail

直接调用PartitionManager

    @Override
public void ack(Object msgId) {
KafkaMessageId id = (KafkaMessageId) msgId;
PartitionManager m = _coordinator.getManager(id.partition);
if (m != null) {
m.ack(id.offset);
}
} @Override
public void fail(Object msgId) {
KafkaMessageId id = (KafkaMessageId) msgId;
PartitionManager m = _coordinator.getManager(id.partition);
if (m != null) {
m.fail(id.offset);
}
}

 

4. declareOutputFields

所以在scheme里面需要定义,deserialize和getOutputFields

    @Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(_spoutConfig.scheme.getOutputFields());
}

 

Metrics

再来看下Metrics,关键学习一下如何在storm里面加metrics

在spout.open里面初始化了下面两个metrics

kafkaOffset

反映出每个partition的earliestTimeOffset,latestTimeOffset,和latestEmittedOffset,其中latestTimeOffset - latestEmittedOffset就是spout lag

除了反映出每个partition的,还会算出所有的partitions的总数据

        context.registerMetric("kafkaOffset", new IMetric() {
KafkaUtils.KafkaOffsetMetric _kafkaOffsetMetric = new KafkaUtils.KafkaOffsetMetric(_spoutConfig.topic, _connections); @Override
public Object getValueAndReset() {
List<PartitionManager> pms = _coordinator.getMyManagedPartitions(); //从coordinator获取pms的信息
Set<Partition> latestPartitions = new HashSet();
for (PartitionManager pm : pms) {
latestPartitions.add(pm.getPartition());
}
_kafkaOffsetMetric.refreshPartitions(latestPartitions); //根据最新的partition信息删除metric中已经不存在的partition的统计信息
for (PartitionManager pm : pms) {
_kafkaOffsetMetric.setLatestEmittedOffset(pm.getPartition(), pm.lastCompletedOffset()); //更新metric中每个partition的已经完成的offset
}
return _kafkaOffsetMetric.getValueAndReset();
}
}, _spoutConfig.metricsTimeBucketSizeInSecs);

_kafkaOffsetMetric.getValueAndReset,其实只是get,不需要reset

@Override
public Object getValueAndReset() {
try {
long totalSpoutLag = 0;
long totalEarliestTimeOffset = 0;
long totalLatestTimeOffset = 0;
long totalLatestEmittedOffset = 0;
HashMap ret = new HashMap();
if (_partitions != null && _partitions.size() == _partitionToOffset.size()) {
for (Map.Entry<Partition, Long> e : _partitionToOffset.entrySet()) {
Partition partition = e.getKey();
SimpleConsumer consumer = _connections.getConnection(partition);
long earliestTimeOffset = getOffset(consumer, _topic, partition.partition, kafka.api.OffsetRequest.EarliestTime());
long latestTimeOffset = getOffset(consumer, _topic, partition.partition, kafka.api.OffsetRequest.LatestTime());
long latestEmittedOffset = e.getValue();
long spoutLag = latestTimeOffset - latestEmittedOffset;
ret.put(partition.getId() + "/" + "spoutLag", spoutLag);
ret.put(partition.getId() + "/" + "earliestTimeOffset", earliestTimeOffset);
ret.put(partition.getId() + "/" + "latestTimeOffset", latestTimeOffset);
ret.put(partition.getId() + "/" + "latestEmittedOffset", latestEmittedOffset);
totalSpoutLag += spoutLag;
totalEarliestTimeOffset += earliestTimeOffset;
totalLatestTimeOffset += latestTimeOffset;
totalLatestEmittedOffset += latestEmittedOffset;
}
ret.put("totalSpoutLag", totalSpoutLag);
ret.put("totalEarliestTimeOffset", totalEarliestTimeOffset);
ret.put("totalLatestTimeOffset", totalLatestTimeOffset);
ret.put("totalLatestEmittedOffset", totalLatestEmittedOffset);
return ret;
} else {
LOG.info("Metrics Tick: Not enough data to calculate spout lag.");
}
} catch (Throwable t) {
LOG.warn("Metrics Tick: Exception when computing kafkaOffset metric.", t);
}
return null;
}

 

kafkaPartition

反映出从Kafka fetch数据的情况,fetchAPILatencyMax,fetchAPILatencyMean,fetchAPICallCount 和 fetchAPIMessageCount

        context.registerMetric("kafkaPartition", new IMetric() {
@Override
public Object getValueAndReset() {
List<PartitionManager> pms = _coordinator.getMyManagedPartitions();
Map concatMetricsDataMaps = new HashMap();
for (PartitionManager pm : pms) {
concatMetricsDataMaps.putAll(pm.getMetricsDataMap());
}
return concatMetricsDataMaps;
}
}, _spoutConfig.metricsTimeBucketSizeInSecs);

pm.getMetricsDataMap(),

public Map getMetricsDataMap() {
Map ret = new HashMap();
ret.put(_partition + "/fetchAPILatencyMax", _fetchAPILatencyMax.getValueAndReset());
ret.put(_partition + "/fetchAPILatencyMean", _fetchAPILatencyMean.getValueAndReset());
ret.put(_partition + "/fetchAPICallCount", _fetchAPICallCount.getValueAndReset());
ret.put(_partition + "/fetchAPIMessageCount", _fetchAPIMessageCount.getValueAndReset());
return ret;
}

更新的逻辑如下,

    private void fill() {
long start = System.nanoTime();
ByteBufferMessageSet msgs = KafkaUtils.fetchMessages(_spoutConfig, _consumer, _partition, _emittedToOffset);
long end = System.nanoTime();
long millis = (end - start) / 1000000;
_fetchAPILatencyMax.update(millis);
_fetchAPILatencyMean.update(millis);
_fetchAPICallCount.incr();
int numMessages = countMessages(msgs);
_fetchAPIMessageCount.incrBy(numMessages);
}

 

我们在读取kafka时,

首先是关心,每个partition的读取状况,这个通过取得KafkaOffset Metrics就可以知道

再者,我们需要replay数据,使用high-level接口的时候可以通过系统提供的工具,这里如何搞?

看下下面的代码,

第一个if,是从配置文件里面没有读到配置的情况

第二个else if,当topologyInstanceId发生变化时,并且forceFromStart为true时,就会取startOffsetTime指定的offset(Latest或Earliest)

这个topologyInstanceId, 每次KafkaSpout对象生成的时候随机产生,

String _uuid = UUID.randomUUID().toString();

Spout对象是在topology提交时,在client端生成一次的,所以如果topology停止,再重新启动,这个id一定会发生变化

所以应该是只需要把forceFromStart设为true,再重启topology,就可以实现replay

        if (jsonTopologyId == null || jsonOffset == null) { // failed to parse JSON?
_committedTo = KafkaUtils.getOffset(_consumer, spoutConfig.topic, id.partition, spoutConfig);
LOG.info("No partition information found, using configuration to determine offset");
} else if (!topologyInstanceId.equals(jsonTopologyId) && spoutConfig.forceFromStart) {
_committedTo = KafkaUtils.getOffset(_consumer, spoutConfig.topic, id.partition, spoutConfig.startOffsetTime);
LOG.info("Topology change detected and reset from start forced, using configuration to determine offset");
} else {
_committedTo = jsonOffset;
LOG.info("Read last commit offset from zookeeper: " + _committedTo + "; old topology_id: " + jsonTopologyId + " - new topology_id: " + topologyInstanceId );
}

 

代码例子

storm-kafka的文档很差,最后附上使用的例子

import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.BrokerHosts;
import storm.kafka.ZkHosts;
import storm.kafka.KeyValueSchemeAsMultiScheme;
import storm.kafka.KeyValueScheme; public static class SimplekVScheme implements KeyValueScheme { //定义scheme
@Override
public List<Object> deserializeKeyAndValue(byte[] key, byte[] value){
ArrayList tuple = new ArrayList();
tuple.add(key);
tuple.add(value);
return tuple;
} @Override
public List<Object> deserialize(byte[] bytes) {
ArrayList tuple = new ArrayList();
tuple.add(bytes);
return tuple;
} @Override
public Fields getOutputFields() {
return new Fields("key","value");
} } String topic = “test”; //
String zkRoot = “/kafkastorm”; //
String spoutId = “id”; //读取的status会被存在,/kafkastorm/id下面,所以id类似consumer group BrokerHosts brokerHosts = new ZkHosts("10.1.110.24:2181,10.1.110.22:2181"); SpoutConfig spoutConfig = new SpoutConfig(brokerHosts, topic, zkRoot, spoutId);
spoutConfig.scheme = new KeyValueSchemeAsMultiScheme(new SimplekVScheme()); /*spoutConfig.zkServers = new ArrayList<String>(){{ //只有在local模式下需要记录读取状态时,才需要设置
add("10.118.136.107");
}};
spoutConfig.zkPort = 2181;*/ spoutConfig.forceFromStart = false;
spoutConfig.startOffsetTime = kafka.api.OffsetRequest.EarliestTime();
spoutConfig.metricsTimeBucketSizeInSecs = 6; builder.setSpout(SqlCollectorTopologyDef.KAFKA_SPOUT_NAME, new KafkaSpout(spoutConfig), 1);

storm-kafka-0.8-plus 源码解析的更多相关文章

  1. SpringBoot 2.0.3 源码解析

    前言 用SpringBoot也有很长一段时间了,一直是底层使用者,没有研究过其到底是怎么运行的,借此机会今天试着将源码读一下,在此记录...我这里使用的SpringBoot 版本是  2.0.3.RE ...

  2. Redis系列(十):数据结构Set源码解析和SADD、SINTER、SDIFF、SUNION、SPOP命令

    1.介绍 Hash是以K->V形式存储,而Set则是K存储,空间节省了很多 Redis中Set是String类型的无序集合:集合成员是唯一的. 这就意味着集合中不能出现重复的数据.可根据应用场景 ...

  3. EventBus3.0源码解析

    本文主要介绍EventBus3.0的源码 EventBus是一个Android事件发布/订阅框架,通过解耦发布者和订阅者简化 Android 事件传递. EventBus使用简单,并将事件发布和订阅充 ...

  4. solr&lucene3.6.0源码解析(四)

    本文要描述的是solr的查询插件,该查询插件目的用于生成Lucene的查询Query,类似于查询条件表达式,与solr查询插件相关UML类图如下: 如果我们强行将上面的类图纳入某种设计模式语言的话,本 ...

  5. solr&lucene3.6.0源码解析(三)

    solr索引操作(包括新增 更新 删除 提交 合并等)相关UML图如下 从上面的类图我们可以发现,其中体现了工厂方法模式及责任链模式的运用 UpdateRequestProcessor相当于责任链模式 ...

  6. Heritrix 3.1.0 源码解析(三十七)

    今天有兴趣重新看了一下heritrix3.1.0系统里面的线程池源码,heritrix系统没有采用java的cocurrency包里面的并发框架,而是采用了线程组ThreadGroup类来实现线程池的 ...

  7. Masonry1.0.2 源码解析

    在了解Masonry框架之前,有必要先了解一下自动布局的概念.在iOS6之前,UI布局的方式是通过frame属性和Autoresizing来完成的,而在iOS6之后,苹果公司推出了AutoLayout ...

  8. Retrofit2.0源码解析

    欢迎访问我的个人博客 ,原文链接:http://wensibo.net/2017/09/05/retrofit/ ,未经允许不得转载! 今天是九月的第四天了,学校也正式开学,趁着大学最后一年的这大好时 ...

  9. 04、NetCore2.0下Web应用之Startup源码解析

    04.NetCore2.0Web应用之Startup源码解析   通过分析Asp.Net Core 2.0的Startup部分源码,来理解插件框架的运行机制,以及掌握Startup注册的最优姿势. - ...

  10. Android事件总线(二)EventBus3.0源码解析

    1.构造函数 当我们要调用EventBus的功能时,比如注册或者发送事件,总会调用EventBus.getDefault()来获取EventBus实例: public static EventBus ...

随机推荐

  1. poj 1276 多重背包

    735 3 4 125 6 5 3 350 //735的最大额,3种,4个125,6个5,3个350 633 4 500 30 6 100 1 5 0 1 735 0 0 3 10 100 10 50 ...

  2. TabLayout

    效果图: 标题和fragment联动效果已经封装好了,非常方便 <android.support.design.widget.TabLayout android:id="@+id/ta ...

  3. Hark的数据结构与算法练习之若领图排序ProxymapSort

    算法说明 若领图排序是分布排序的一种. 个人理解,若领图排序算是桶排序+计数排序的变异版,桶排序计数排序理解了,那么若领图排序理解起来就会比较容易.区别其实就是存储中间值的方式做了调整…… 话说,这个 ...

  4. js:工具库-highcharts.js

    主要用于方便绘制图标,注意是需要付费的: 相关连接: 官网 中文网 祛除自带标识的代码: credits: { enabled: false //remove the logo and button ...

  5. logback 配置详解【讲解较全的博客网站】

    http://blog.csdn.net/haidage/article/category/812478 详解(一)http://blog.csdn.net/haidage/article/detai ...

  6. 关于配置文件权衡,.config VS .xml

    众所周知,程序的灵活性有一部分就是“配”出来了. 当然,config文件从来就没有让.NET的同学轻松过,至少,我觉得很麻烦. 1.config .NET的配置文件方便,其实最方便的是appSetti ...

  7. Revit二次开发示例:EventsMonitor

    在该示例中,插件在Revit启动时弹出事件监控选择界面,供用户设置,也可在添加的Ribbon界面完成设置.当Revit进行相应操作时,弹出窗体会记录事件时间和名称. #region Namespace ...

  8. 2015ACM/ICPC亚洲区长春站 A hdu 5527 Too Rich

    Too Rich Time Limit: 6000/3000 MS (Java/Others)    Memory Limit: 262144/262144 K (Java/Others)Total ...

  9. BZOJ3829 : [Poi2014]FarmCraft

    d[x]表示走完x的子树并回到x所需的时间 f[x]表示从走到x开始计时,x子树中最晚的点安装完的最早时间 d[x]=sum(d[i]+2),i是x的孩子 f[x]的计算比较复杂: 考虑将x的各棵子树 ...

  10. BZOJ3329 : Xorequ

    第一问: 打表可得规律:当且仅当x&(x<<1)=0时才会是解,于是数位DP f[i][j][k]表示二进制中前i位,上一位是j,前i位是否等于n的方案数 第二问: 打表可得规律: ...