There are two kinds of nodes on a Storm cluster: the master node and the worker nodes.



The master node runs a daemon called Nimbus, which is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures.


Each worker node runs a daemon called the Supervisor. The supervisor listens for work assigned to its machine and starts and stops worker processes as necessary, based on what Nimbus has assigned to it. Each worker process executes a subset of a topology; a running topology consists of many worker processes spread across many machines.
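
The supervisor's start/stop behaviour can be pictured as reconciling what is assigned against what is running. The sketch below is only a toy model of that idea, not Storm's actual code: the function name `reconcile` and the worker IDs are hypothetical.

```python
def reconcile(assigned, running):
    """Return (to_start, to_stop) so that the running set matches the assigned set.

    Both arguments are sets of hypothetical worker IDs.
    """
    to_start = assigned - running  # assigned by the master but not yet running
    to_stop = running - assigned   # still running but no longer assigned
    return sorted(to_start), sorted(to_stop)


# Hypothetical state on one machine.
to_start, to_stop = reconcile({"w1", "w2", "w3"}, {"w2", "w4"})
print("start:", to_start, "stop:", to_stop)
```

Because the decision depends only on the two sets, the same function gives the right answer after a restart, which fits a daemon that keeps no state of its own.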

A topology is a graph of computation. Each node in a topology contains processing logic, and links between nodes indicate how data should be passed around between nodes.


All coordination between Nimbus and the Supervisors is done through a Zookeeper cluster.

Additionally, the Nimbus daemon and Supervisor daemons are fail-fast and stateless; all state is kept in Zookeeper or on local disk.


Thrift:

Since topology definitions are just Thrift structs, and Nimbus is a Thrift service, you can create and submit topologies using any programming language.

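(Thrift is a software framework for developing scalable, cross-language services. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly across C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, and OCaml.)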


The core abstraction in Storm is the “stream”. A stream is an unbounded sequence of tuples.

Storm provides the primitives for transforming a stream into a new stream in a distributed and reliable way.

Spouts & bolts:

The basic primitives Storm provides for doing stream transformations are “spouts” and “bolts”. Spouts and bolts have interfaces that you implement to run your application-specific logic.

A spout is a source of streams.

A bolt consumes any number of input streams, does some processing, and possibly emits new streams.

Bolts can do anything: run functions, filter tuples, do streaming aggregations, do streaming joins, talk to databases, and more.
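
To make the spout and bolt roles concrete, here is a minimal sketch that models streams as Python generators of tuples. It is not Storm's API: all function names are hypothetical, and the stream is finite only so the example terminates (a real stream is unbounded).

```python
from collections import Counter


def sentence_spout():
    """Toy spout: the source of a stream of 1-tuples."""
    for sentence in ["the cow jumped over the moon"]:
        yield (sentence,)


def split_bolt(stream):
    """Bolt that runs a function: splits each sentence into word tuples."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def filter_bolt(stream, dropped):
    """Bolt that filters tuples: discards words in the `dropped` set."""
    for (word,) in stream:
        if word not in dropped:
            yield (word,)


def count_bolt(stream):
    """Bolt that does a streaming aggregation: emits a running count per word."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


# Chain spout -> split -> filter -> count.
result = list(count_bolt(filter_bolt(split_bolt(sentence_spout()), {"over"})))
print(result)
```

Each bolt consumes one stream and emits a new one, so the stages compose freely; the counting bolt emits an updated count on every input tuple instead of waiting for the stream to end.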


A network of spouts and bolts forms a topology, which is the top-level abstraction that you submit to Storm for execution.

A topology is a graph of stream transformations where each node is a spout or bolt.

When a spout or bolt emits a tuple to a stream, it sends the tuple to every bolt that subscribed to that stream.
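
The subscription rule can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not Storm's implementation: the `ToyTopology` class, the stream names, and the bolt functions are all hypothetical, and everything runs in a single process.

```python
from collections import defaultdict


class ToyTopology:
    """Toy graph: named streams connect emitters to subscribed bolts."""

    def __init__(self):
        # stream name -> list of bolt callables subscribed to that stream
        self.subscribers = defaultdict(list)

    def subscribe(self, stream, bolt):
        self.subscribers[stream].append(bolt)

    def emit(self, stream, tup):
        # Deliver the tuple to every bolt subscribed to this stream.
        for bolt in self.subscribers[stream]:
            bolt(self, tup)


seen_a, seen_b, seen_c = [], [], []


def bolt_a(topo, tup):
    seen_a.append(tup)
    topo.emit("upper", (tup[0].upper(),))  # emits a new stream


def bolt_b(topo, tup):
    seen_b.append(tup)


def bolt_c(topo, tup):
    seen_c.append(tup)


topo = ToyTopology()
topo.subscribe("words", bolt_a)
topo.subscribe("words", bolt_b)
topo.subscribe("upper", bolt_c)

# Stand-in for a spout emitting onto the "words" stream.
for word in ["storm", "bolt"]:
    topo.emit("words", (word,))

print(seen_a, seen_b, seen_c)
```

Both `bolt_a` and `bolt_b` see every tuple on "words", while `bolt_c` sees only what `bolt_a` emits on "upper".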


