Understanding the Storm Architecture


I have been trying to understand the Storm architecture, but I am not sure if I got it right. I'll explain what I believe to be the case as well as I can. Please tell me what, if anything, I got wrong and what I got right.

Preliminary thoughts: workers

http://storm.apache.org/releases/2.0.0-snapshot/understanding-the-parallelism-of-a-storm-topology.html suggests that a worker is a process, and http://storm.apache.org/releases/2.0.0-snapshot/concepts.html says "worker processes. Each worker process is a physical JVM", while http://storm.apache.org/releases/1.0.1/setting-up-a-storm-cluster.html states that a worker is a node ("Nimbus and worker machines"). The website http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/ mentions a "master node" and "worker nodes". So what is a worker: a process or a physical node (or is a node a process)? I think there are two different things: worker nodes and worker processes.

What I believe to be true

Entities in play

  • master node = the management server
  • worker node = slave node
  • Nimbus = a JVM process, running on the master node
  • ZooKeeper = JVM processes, running on the ZooKeeper nodes
  • Supervisor = a JVM process, one running on each worker node
  • worker process = a JVM process, running on a worker node
  • executor = a thread, run by a worker process
  • task = an instance of a bolt or spout, executed by an executor

How things work

The Nimbus JVM process, running on the physical master node, receives a program (a Storm topology), takes its bolts and spouts, and generates tasks for them. If a bolt is supposed to be parallelized three times, Nimbus generates three tasks for it. Nimbus asks the ZooKeeper JVM processes for the configuration of the cluster, e.g. where to run the tasks, and the ZooKeeper JVM processes tell Nimbus. For this, ZooKeeper communicates with the supervisors (which come later). Nimbus distributes the tasks to the worker nodes, which are physical nodes. The worker nodes are managed by supervisors, which are JVM processes: one supervisor per worker node. The supervisors manage (start, stop, etc.) the worker processes, which are JVM processes running on the worker nodes. Each worker node can have multiple worker processes running. The worker processes are JVM processes; they run one or multiple threads called executors. Each executor thread runs one or multiple tasks, meaning one or multiple instances of a bolt or spout, but they all have to be instances of the same bolt or spout.
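To make this concrete, here is a minimal sketch of my own (the spout, bolt, and component names are hypothetical, not from the question) of a topology whose bolt is "parallelized 3 times", so that Nimbus would create three executors/tasks for it when the topology is submitted:

    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.testing.TestWordSpout;
    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    public class ExampleTopology {

        // A trivial bolt that appends "!" to every incoming word.
        public static class ExclaimBolt extends BaseBasicBolt {
            @Override
            public void execute(Tuple tuple, BasicOutputCollector collector) {
                collector.emit(new Values(tuple.getString(0) + "!"));
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("word"));
            }
        }

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();

            // The last argument is the parallelism hint, i.e. the number of
            // executor threads that will run the component.
            builder.setSpout("word-spout", new TestWordSpout(), 1);
            builder.setBolt("exclaim-bolt", new ExclaimBolt(), 3)   // "parallelized 3 times"
                   .shuffleGrouping("word-spout");

            Config conf = new Config();
            conf.setNumWorkers(2);  // two worker processes (JVMs) for this topology

            // Nimbus receives the topology and schedules its tasks across the
            // worker processes managed by the supervisors.
            StormSubmitter.submitTopology("example-topology", conf, builder.createTopology());
        }
    }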

If this is true, it begs the following questions:

  • What is the point of having multiple worker processes run on one worker node? After all, a single process can already use multiple processor cores, right? (See the storm.yaml sketch after this list.)
  • What is the point of having multiple tasks run in one executor thread if they all have to be of the same bolt/spout? The thread runs on one processor core, so the multiple bolt/spout instances have to run after each other and cannot be parallelized. And since they run indefinitely in a Storm topology, what is the point of having two instances of the same bolt/spout in one executor thread?
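As a side note to the first question: the number of worker-process slots available on a worker node is set in that node's storm.yaml, one port per slot. A minimal sketch, using the default ports:

    # storm.yaml on a worker node: each listed port is one worker-process slot,
    # so this supervisor can start up to four worker JVMs on this machine.
    supervisor.slots.ports:
        - 6700
        - 6701
        - 6702
        - 6703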

Edit: an additional resource: http://www.tutorialspoint.com/apache_storm/apache_storm_cluster_architecture.htm

First, a few clarifications (you basically got it right):

  • The terms "node" and "process" are unfortunately not used consistently. You got the "entities in play" correct.
  • About parallelization: the number of tasks is determined via .setNumTasks() -- if it is not specified, it is the same as the parallelism_hint (which sets the number of executors). See the sketch after this list.
  • About deployment: Nimbus gets the cluster configuration from ZooKeeper, but ZooKeeper does not decide where to run the tasks -- Nimbus has a scheduler component that makes this decision (based on the given topology, the topology configuration, and the cluster configuration).
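For illustration, a sketch of the difference between the parallelism hint (executors) and .setNumTasks() (tasks), reusing the hypothetical builder and ExclaimBolt from the sketch above:

    // 2 executors (parallelism hint) but 4 tasks, so each executor thread
    // runs 2 task instances of the same bolt.
    builder.setBolt("green-bolt", new ExclaimBolt(), 2)
           .setNumTasks(4)
           .shuffleGrouping("exclaim-bolt");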

To answer your questions:

  • A single worker process executes tasks of a single topology only. Thus, if you want to run multiple topologies, you need multiple worker JVMs. The reason for this design is fault tolerance, i.e., isolation of topologies: if one topology fails (maybe due to bad user code), the crashing JVM does not affect the other running topologies.
  • Tasks allow dynamic rebalancing, i.e., changing the parallelism at runtime. Without tasks, you would need to stop and redeploy the topology if you wanted to change the parallelism of a spout/bolt. Thus, the number of tasks defines the maximum parallelism of a spout/bolt. See https://storm.apache.org/releases/1.0.1/understanding-the-parallelism-of-a-storm-topology.html and the rebalance sketch below.
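As a sketch of that runtime rebalancing (the topology and component names come from the hypothetical sketches above):

    # Redistribute "example-topology" across 3 worker processes and raise
    # "green-bolt" from 2 to 4 executors (its 4 tasks are the upper bound).
    storm rebalance example-topology -n 3 -e green-bolt=4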
