Understanding the Storm Architecture


I have been trying to understand the Storm architecture, but I am not sure if I got it right. I'll explain what I believe to be the case as well as I can. Please tell me what, if anything, I got wrong and what I got right.

Preliminary thoughts: workers

http://storm.apache.org/releases/2.0.0-snapshot/understanding-the-parallelism-of-a-storm-topology.html suggests that a worker is a process, and http://storm.apache.org/releases/2.0.0-snapshot/concepts.html says "worker processes. Each worker process is a physical JVM", while http://storm.apache.org/releases/1.0.1/setting-up-a-storm-cluster.html states that a worker is a node ("Nimbus and worker machines"). The website http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/ mentions a "master node" and "worker nodes". So what is a worker: a process or a physical node (or is a node a process)? I think there are two different things: worker nodes and worker processes.

What I believe to be true

Entities in play

  • master node = the management server
  • worker node = slave node
  • Nimbus = a JVM process, running on the master node
  • ZooKeeper = JVM processes, running on the ZooKeeper nodes
  • Supervisor = a JVM process, one running on each worker node
  • worker process = a JVM process, running on a worker node
  • executor = a thread, run by a worker process
  • task = an instance of a bolt or spout, executed by an executor

How things work

The Nimbus JVM process, running on the physical master node, receives a program (a Storm topology), takes its bolts and spouts, and generates tasks for them. If a bolt is supposed to be parallelized three times, Nimbus generates three tasks for it. Nimbus asks the ZooKeeper JVM processes for the configuration of the cluster, e.g. where to run the tasks, and the ZooKeeper JVM processes tell Nimbus. For this, ZooKeeper communicates with the supervisors (which come later). Nimbus distributes the tasks to the worker nodes, which are physical nodes. The worker nodes are managed by supervisors, which are JVM processes: one supervisor per worker node. The supervisors manage (start, stop, etc.) the worker processes, which are JVM processes running on the worker nodes. Each worker node can have multiple worker processes running. The worker processes are JVM processes; they run one or multiple threads called executors. Each executor thread runs one or multiple tasks, meaning one or multiple instances of a bolt or spout, but they all have to be instances of the same bolt or spout.
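To make this concrete, here is a minimal sketch of my own (the spout, bolt, and component names are hypothetical, not from the question) of a topology whose bolt is "parallelized 3 times", so that Nimbus would create three executors/tasks for it when the topology is submitted:

    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.testing.TestWordSpout;
    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    public class ExampleTopology {

        // A trivial bolt that appends "!" to every incoming word.
        public static class ExclaimBolt extends BaseBasicBolt {
            @Override
            public void execute(Tuple tuple, BasicOutputCollector collector) {
                collector.emit(new Values(tuple.getString(0) + "!"));
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("word"));
            }
        }

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();

            // The last argument is the parallelism hint, i.e. the number of
            // executor threads that will run the component.
            builder.setSpout("word-spout", new TestWordSpout(), 1);
            builder.setBolt("exclaim-bolt", new ExclaimBolt(), 3)   // "parallelized 3 times"
                   .shuffleGrouping("word-spout");

            Config conf = new Config();
            conf.setNumWorkers(2);  // two worker processes (JVMs) for this topology

            // Nimbus receives the topology and schedules its tasks across the
            // worker processes managed by the supervisors.
            StormSubmitter.submitTopology("example-topology", conf, builder.createTopology());
        }
    }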

If this is true, it begs the following questions:

  • What is the point of having multiple worker processes run on one worker node? After all, a single process can already use multiple processor cores, right? (See the storm.yaml sketch after this list.)
  • What is the point of having multiple tasks run in one executor thread if they all have to be of the same bolt/spout? The thread runs on one processor core, so the multiple bolt/spout instances have to run after each other and cannot be parallelized. And since they run indefinitely in a Storm topology, what is the point of having two instances of the same bolt/spout in one executor thread?
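As a side note to the first question: the number of worker-process slots available on a worker node is set in that node's storm.yaml, one port per slot. A minimal sketch, using the default ports:

    # storm.yaml on a worker node: each listed port is one worker-process slot,
    # so this supervisor can start up to four worker JVMs on this machine.
    supervisor.slots.ports:
        - 6700
        - 6701
        - 6702
        - 6703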

Edit: an additional resource: http://www.tutorialspoint.com/apache_storm/apache_storm_cluster_architecture.htm

First, a few clarifications (you basically got it right):

  • The terms "node" and "process" are unfortunately not used consistently. You got the "entities in play" correct.
  • About parallelization: the number of tasks is determined via .setNumTasks() -- if it is not specified, it is the same as the parallelism_hint (which sets the number of executors). See the sketch after this list.
  • About deployment: Nimbus gets the cluster configuration from ZooKeeper, but ZooKeeper does not decide where to run the tasks -- Nimbus has a scheduler component that makes this decision (based on the given topology, the topology configuration, and the cluster configuration).
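For illustration, a sketch of the difference between the parallelism hint (executors) and .setNumTasks() (tasks), reusing the hypothetical builder and ExclaimBolt from the sketch above:

    // 2 executors (parallelism hint) but 4 tasks, so each executor thread
    // runs 2 task instances of the same bolt.
    builder.setBolt("green-bolt", new ExclaimBolt(), 2)
           .setNumTasks(4)
           .shuffleGrouping("exclaim-bolt");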

To answer your questions:

  • A single worker process executes tasks of a single topology only. Thus, if you want to run multiple topologies, you need multiple worker JVMs. The reason for this design is fault tolerance, i.e., isolation of topologies: if one topology fails (maybe due to bad user code), the crashing JVM does not affect the other running topologies.
  • Tasks allow dynamic rebalancing, i.e., changing the parallelism at runtime. Without tasks, you would need to stop and redeploy the topology if you wanted to change the parallelism of a spout/bolt. Thus, the number of tasks defines the maximum parallelism of a spout/bolt. See https://storm.apache.org/releases/1.0.1/understanding-the-parallelism-of-a-storm-topology.html and the rebalance sketch below.
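As a sketch of that runtime rebalancing (the topology and component names come from the hypothetical sketches above):

    # Redistribute "example-topology" across 3 worker processes and raise
    # "green-bolt" from 2 to 4 executors (its 4 tasks are the upper bound).
    storm rebalance example-topology -n 3 -e green-bolt=4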
