
Top Storm Interview Questions and Answers [ TO GET HIRED ]
Last updated on 9th Nov 2021
These Storm interview questions have been designed to acquaint you with the nature of questions you may encounter during an interview on the subject of Storm. In my experience, good interviewers hardly plan to ask any particular question; questions normally start with some basic concept of the subject and continue based on further discussion and your answers. We are going to cover top Storm interview questions along with their detailed answers, including Storm scenario-based interview questions, Storm interview questions for freshers, and Storm interview questions and answers for experienced candidates.
1. Which components are used for stream flow of data?
Ans:
For the streaming of data flow, three components are used:
- Bolt
- Spout
- Tuple
2. How is Bolt used for stream flow of data?
Ans:
Bolts represent the processing logic unit in Storm. One can utilize bolts to do any kind of processing, such as filtering, aggregating, joining, interacting with data stores, talking to external systems, etc. Bolts can also emit tuples (data messages) for the subsequent bolts to process. Additionally, bolts are responsible for acknowledging tuples once they have finished processing them.
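For illustration, here is a minimal sketch of a bolt (the class name is hypothetical) that splits sentences into words. It extends BaseBasicBolt, which acks each input tuple automatically after execute returns:
- import org.apache.storm.topology.BasicOutputCollector;
- import org.apache.storm.topology.OutputFieldsDeclarer;
- import org.apache.storm.topology.base.BaseBasicBolt;
- import org.apache.storm.tuple.Fields;
- import org.apache.storm.tuple.Tuple;
- import org.apache.storm.tuple.Values;
- // Hypothetical bolt: splits each incoming sentence into words.
- public class SplitSentenceBolt extends BaseBasicBolt {
-     @Override
-     public void execute(Tuple tuple, BasicOutputCollector collector) {
-         for (String word : tuple.getString(0).split(" ")) {
-             collector.emit(new Values(word)); // one output tuple per word
-         }
-     }
-     @Override
-     public void declareOutputFields(OutputFieldsDeclarer declarer) {
-         declarer.declare(new Fields("word"));
-     }
- }
With the lower-level BaseRichBolt, you would call collector.ack(tuple) yourself instead.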
3. How is Spouts used for stream flow of data?
Ans:
Spouts represent the source of data in Storm. You can write spouts to read data from data sources such as databases, distributed file systems, messaging frameworks, etc. Spouts can broadly be classified into the following:
Reliable – These spouts have the capability to replay the tuples (a unit of data in a data stream). This helps applications achieve ‘at least once message processing’ semantic as in case of failures, tuples can be replayed and processed again. Spouts for fetching the data from messaging frameworks are generally reliable as these frameworks provide the mechanism to replay the messages.
Unreliable – These spouts don't have the capability to replay the tuples. Once a tuple is emitted, it cannot be replayed, irrespective of whether it was processed successfully or not. This type of spout provides 'at most once' message processing semantics.
4. How is Tuple used for stream flow of data?
Ans:
The tuple is the main data structure in Storm. A tuple is a named list of values, where each value can be of any type. Tuples are dynamically typed — the types of the fields do not need to be declared. Tuples have helper methods like getInteger and getString to get field values without having to cast the result. Storm needs to know how to serialize all the values in a tuple. By default, Storm knows how to serialize the primitive types, strings, and byte arrays. If you want to use another type, you'll need to implement and register a serializer for that type.
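A small hedged sketch (the field names and MyCustomType are assumptions, not from the original answer) of reading tuple values inside a bolt and registering a custom serializer:
- // Assuming the upstream component declared new Fields("word", "count"):
- String word = tuple.getString(0);                 // access by position
- Integer count = tuple.getIntegerByField("count"); // access by field name, no casting needed
- // Registering serialization for a custom type carried in tuples:
- Config conf = new Config();
- conf.registerSerialization(MyCustomType.class);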
5. Compare Spark & Storm?
Ans:
Apache Spark | Apache Storm
Data at rest | Data in motion
Data parallel | Task parallel
Latency of a few seconds | Sub-second latency
6. What are the key benefits of using Storm for Real Time Processing?
Ans:
Real fast : It can process one million 100-byte messages per second per node.
Fault tolerant : It detects faults automatically and restarts failed workers.
Reliable : It guarantees that each unit of data will be processed at least once or exactly once.
Scalable : It runs across a cluster of machines.
Easy to operate : Operating Storm is quite easy.
7. Does Apache act as a proxy server?
Ans:
Yes, it can also act as a proxy by using the mod_proxy module. This module implements a proxy, gateway, or cache for Apache. It implements proxying capability for AJP13 (Apache JServ Protocol version 1.3), FTP, CONNECT (for SSL), HTTP/0.9, HTTP/1.0, and (since Apache 1.3.23) HTTP/1.1. The module can be configured to connect to other proxy modules for these and other protocols.
8. What is the use of Zookeeper in Storm?
Ans:
Storm uses Zookeeper for coordinating the cluster. Zookeeper is not used for message passing, so the load that Storm places on Zookeeper is quite low. Single-node Zookeeper clusters should be sufficient for most cases, but if you want failover or are deploying large Storm clusters, you may want larger Zookeeper clusters. A few notes about Zookeeper deployment:
It's critical that you run Zookeeper under supervision, since Zookeeper is fail-fast and will exit the process if it encounters any error case.
It's critical that you set up a cron to compact Zookeeper's data and transaction logs. The Zookeeper daemon does not do this on its own, and if you don't set up a cron, Zookeeper will quickly run out of disk space.
9. What is ZeroMQ?
Ans:
ZeroMQ is “a library which extends the standard socket interfaces with features traditionally provided by specialized messaging middleware products”. Storm relies on ZeroMQ primarily for task-to-task communication in running Storm topologies.
10. What is the 3-Tier Apache Storm architecture?
Ans:
[Diagram: the 3-tier Apache Storm architecture]
11. How many distinct layers are of Storm’s Codebase?
Ans:
There are three distinct layers to Storm’s codebase:
First: Storm was designed from the very beginning to be compatible with multiple languages. Nimbus is a Thrift service and topologies are defined as Thrift structures. The usage of Thrift allows Storm to be used from any language.
Second: All of Storm's interfaces are specified as Java interfaces. So even though there's a lot of Clojure in Storm's implementation, all usage must go through the Java API. This means that every feature of Storm is always available via Java.
Third: Storm's implementation is largely in Clojure. Line-wise, Storm is about half Java code, half Clojure code. But Clojure is much more expressive, so in reality the great majority of the implementation logic is in Clojure.
12. What does it mean that a tuple coming off a spout can trigger thousands of tuples to be created?
Ans:
A tuple coming off a spout can trigger thousands of tuples to be created based on it. Consider, for example, the streaming word count topology:
- TopologyBuilder builder = new TopologyBuilder();
- builder.setSpout("sentences", new KestrelSpout("kestrel.backtype.com",
-     22133,
-     "sentence_queue",
-     new StringScheme()));
- builder.setBolt("split", new SplitSentence(), 10)
-     .shuffleGrouping("sentences");
- builder.setBolt("count", new WordCount(), 20)
-     .fieldsGrouping("split", new Fields("word"));
This topology reads sentences off a Kestrel queue, splits the sentences into their constituent words, and then emits for each word the number of times it has seen that word before. A tuple coming off the spout triggers many tuples being created based on it: a tuple for each word in the sentence and a tuple for the updated count for each word.
Storm considers a tuple coming off a spout “fully processed” when the tuple tree has been exhausted and every message in the tree has been processed. A tuple is considered failed when its tree of messages fails to be fully processed within a specified timeout. This timeout can be configured on a topology-specific basis using the Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS configuration and defaults to 30 seconds.
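To build that tuple tree, a bolt "anchors" each output tuple to its input and then acks the input. A hedged sketch of what an execute() method like SplitSentence's might do (assuming the OutputCollector was saved in prepare()):
- public void execute(Tuple input) {
-     for (String word : input.getString(0).split(" ")) {
-         // Anchoring: passing the input tuple makes each new tuple a child in the tree.
-         collector.emit(input, new Values(word));
-     }
-     collector.ack(input); // this node of the tuple tree is now complete
- }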
13. When do you call the cleanup method?
Ans:
The cleanup method is called when a bolt is being shut down and should clean up any resources that were opened. There's no guarantee that this method will be called on the cluster: for instance, if the machine the task is running on blows up, there's no way to invoke the method. The cleanup method is intended for when you run topologies in local mode (where a Storm cluster is simulated in process) and you want to be able to run and kill many topologies without suffering any resource leaks.
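A minimal sketch, assuming a bolt that opened some resource in prepare() (the writer field is hypothetical):
- @Override
- public void cleanup() {
-     try {
-         writer.close(); // release the resource opened in prepare()
-     } catch (IOException e) {
-         // best effort only: the bolt is shutting down anyway
-     }
- }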
14. How can we kill a topology?
Ans:
To kill a topology, simply run:
- storm kill {stormname}
Give the same name to storm kill as you used when submitting the topology.
Storm won’t kill the topology immediately. Instead, it deactivates all the spouts so that they don’t emit any more tuples, and then Storm waits Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS seconds before destroying all the workers. This gives the topology enough time to complete any tuples it was processing when it got killed.
15. Mention the difference between Apache HBase and Storm?
Ans:
Apache Storm | Apache HBase
It provides data processing in real time. | It offers low-latency reads of processed data for querying later.
It processes the data but does not store it. | It stores the data but does not process it.
16. What is a CombinerAggregator?
Ans:
A CombinerAggregator is used to combine a set of tuples into a single field. It has the following signature:
- public interface CombinerAggregator<T> {
-     T init(TridentTuple tuple);
-     T combine(T val1, T val2);
-     T zero();
- }
Storm calls the init() method with each tuple, and then repeatedly calls the combine() method until the partition is processed. The values passed into the combine() method are partial aggregations, the result of combining the values returned by calls to init(). The zero() method supplies the identity value, which is emitted when a partition contains no tuples.
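For example, Trident's built-in Count aggregator follows this pattern; a sketch:
- public class Count implements CombinerAggregator<Long> {
-     public Long init(TridentTuple tuple) { return 1L; }               // each tuple contributes 1
-     public Long combine(Long val1, Long val2) { return val1 + val2; } // merge partial counts
-     public Long zero() { return 0L; }                                 // identity for empty partitions
- }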
17. What are the common configurations in Apache Storm?
Ans:
There are a variety of configurations you can set per topology. A list of all the configurations you can set can be found in the Javadoc for the Config class. The ones prefixed with "TOPOLOGY" can be overridden on a topology-specific basis (the other ones are cluster configurations and cannot be overridden). Here are some common ones that are set for a topology:
Config.TOPOLOGY_WORKERS : This sets the number of worker processes to use to execute the topology. For example, if you set this to 25, there will be 25 Java processes across the cluster executing all the tasks. If you had a combined parallelism of 150 across all components in the topology, each worker process would have 6 tasks running within it as threads.
Config.TOPOLOGY_ACKER_EXECUTORS : This sets the number of executors that will track tuple trees and detect when a spout tuple has been fully processed. By not setting this variable or setting it as null, Storm will set the number of acker executors to be equal to the number of workers configured for this topology. If this variable is set to 0, then Storm will immediately ack tuples as soon as they come off the spout, effectively disabling reliability.
Config.TOPOLOGY_MAX_SPOUT_PENDING : This sets the maximum number of spout tuples that can be pending on a single spout task at once (pending means the tuple has not been acked or failed yet). It is highly recommended you set this config to prevent queue explosion.
Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS : This is the maximum amount of time a spout tuple has to be fully completed before it is considered failed. This value defaults to 30 seconds, which is sufficient for most topologies.
Config.TOPOLOGY_SERIALIZATIONS : You can register more serializers to Storm using this config so that you can use custom types within tuples.
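As a hedged sketch, these settings are usually applied through the Config helper methods before submission (the numbers and MyCustomType are illustrative assumptions, not recommendations):
- Config conf = new Config();
- conf.setNumWorkers(25);                         // Config.TOPOLOGY_WORKERS
- conf.setNumAckers(4);                           // Config.TOPOLOGY_ACKER_EXECUTORS
- conf.setMaxSpoutPending(5000);                  // Config.TOPOLOGY_MAX_SPOUT_PENDING
- conf.setMessageTimeoutSecs(60);                 // Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS
- conf.registerSerialization(MyCustomType.class); // custom tuple types
- StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());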
18. Is it necessary to kill the topology while updating the running topology?
Ans:
Yes, to update a running topology, the only option currently is to kill the current topology and resubmit a new one. A planned feature is to implement a Storm swap command that swaps a running topology with a new one, ensuring minimal downtime and no chance of both topologies processing tuples at the same time.
19. How can Storm UI be used in topology?
Ans:
Storm UI is used in monitoring the topology. The Storm UI provides information about errors happening in tasks and fine-grained stats on the throughput and latency performance of each component of each running topology.
20. Define the Apache Storm architecture?
Ans:
[Diagram: the Apache Storm architecture]
21. Why does Apache not include SSL?
Ans:
SSL (Secure Socket Layer) data transport requires encryption, and many governments have restrictions upon the import, export, and use of encryption technology. If Apache included SSL in the base package, its distribution would involve all sorts of legal and bureaucratic issues, and it would no longer be freely available. Also, some of the technology required to talk to current clients using SSL is patented by RSA Data Security, who restricts its use without a license.
22. Does Apache include any sort of database integration?
Ans:
Apache is a Web (HTTP) server, not an application server. The base package does not include any such functionality. The PHP project and the mod_perl project allow you to work with databases from within the Apache environment.
23. While installing, why does Apache have three config files – srm.conf, access.conf and httpd.conf?
Ans:
The first two are remnants from the NCSA times, and generally you should be fine if you delete the first two, and stick with httpd.conf.
srm.conf :- This is the default file for the ResourceConfig directive in httpd.conf. It is processed after httpd.conf but before access.conf.
access.conf :- This is the default file for the AccessConfig directive in httpd.conf. It is processed after httpd.conf and srm.conf.
httpd.conf :-The httpd.conf file is well-commented and mostly self-explanatory.
24. How to check for the httpd.conf consistency and any errors in it?
Ans:
We can check the syntax of the httpd configuration file by using the following command:
- httpd -S
This command will dump out a description of how Apache parsed the configuration file. Careful examination of the IP addresses and server names may help uncover configuration mistakes.
25. Mention the difference between Apache Kafka and Apache Storm?
Ans:
Apache Kafka | Apache Storm
It is a distributed and robust messaging system that can handle huge amounts of data and allows passage of messages from one end-point to another. | It is a real-time message-processing system, and you can edit or manipulate data in real time. Apache Storm pulls the data from Kafka and applies the required manipulation.
26. Explain when to use field grouping in Storm? Is there any time-out or limit to known field values?
Ans:
Field grouping in Storm uses a mod hash function to decide which task to send a tuple to, ensuring that tuples with the same field values always go to the same task and are therefore processed in order. For that, you don't require any cache, so there is no time-out or limit to known field values.
The stream is partitioned by the fields specified in the grouping. For example, if the stream is grouped by the “user-id” field, tuples with the same “user-id” will always go to the same task, but tuples with different “user-id”‘s may go to different tasks.
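For example (the component and field names are hypothetical), such a grouping is wired up like this:
- // All tuples with the same "user-id" value go to the same counter task.
- builder.setBolt("user-counter", new UserCountBolt(), 12)
-     .fieldsGrouping("user-events", new Fields("user-id"));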
27. What is mod_vhost_alias?
Ans:
This module creates dynamically configured virtual hosts, by allowing the IP address and/or the Host: header of the HTTP request to be used as part of the path name to determine what files to serve. This allows for easy use of a huge number of virtual hosts with similar configurations.
28. Tell me Is running apache as a root is a security risk?
Ans:
No. The root process opens port 80, but never listens to it, so no user will actually enter the site with root rights. If you kill the root process, you will see its child processes disappear as well.
29. What is Multiviews?
Ans:
MultiViews search is enabled by the MultiViews option. It is the general name given to the Apache server's ability to provide language-specific document variants in response to a request, and it is documented quite thoroughly in the content negotiation description page. When MultiViews is enabled and a requested document is not found, the server searches for similarly named files; it then chooses the best match to the client's requirements and returns that document.
30. What is the cluster architecture of Storm?
Ans:
[Diagram: the Storm cluster architecture]
31. Does Apache include a search engine?
Ans:
Yes, Apache contains a Search engine. You can search a report name in Apache by using the “Search title”.
32. Explain how you can streamline log files using Apache Storm?
Ans:
To read from the log files, you can configure your spout to emit one tuple per line as it reads the log. The output can then be assigned to a bolt for analysis.
33. Mention how Storm application can be beneficial in financial services?
Ans:
In financial services, Storm can be helpful in preventing and detecting problems in areas such as:
Securities fraud :
Perform real-time anomaly detection on known patterns of activities and use learned patterns from prior modeling and simulations.
Order routing :
Order routing is the process by which an order goes from the end user to an exchange. An order may go directly to the exchange from the customer, or it may go first to a broker who then routes the order to the exchange.
Pricing :
Pricing is the process whereby a business sets the price at which it will sell its products and services, and may be part of the business's marketing plan.
Compliance violations :
Compliance means conforming to a rule, such as a specification, policy, standard or law. Regulatory compliance describes the goal that organizations aspire to achieve in their efforts to ensure that they are aware of and take steps to comply with relevant laws and regulations. Any breach of such rules is a compliance violation.
More broadly, Storm lets financial firms:
- Correlate transaction data with other streams (chat, email, etc.) in a cost-effective parallel processing environment.
- Reduce query time from hours to minutes on large volumes of data.
- Build a single platform for operational applications and analytics that reduces total cost of ownership (TCO).
34. Can we use Active Server Pages (ASP) with Apache?
Ans:
The Apache Web Server package does not include ASP support.
However, a number of projects provide ASP or ASP-like functionality for Apache. Some of these are:
Apache::ASP :- Apache::ASP provides an Active Server Pages port to the Apache Web Server with Perl scripting only, and enables development of dynamic web applications with session management and embedded Perl code. There are also many powerful extensions, including XML taglibs, XSLT rendering, and new events not originally part of the ASP API.
mod_mono :- It is an Apache 2.0/2.2/2.4.3 module that provides ASP.NET support for the web's favorite server, Apache. It is hosted inside Apache. Depending on your configuration, the Apache box could run one or a dozen separate processes, and all of these processes will send their ASP.NET requests to the mod-mono-server process. The mod-mono-server process in turn can host multiple independent applications. It does this by using Application Domains to isolate the applications from each other, while using a single Mono virtual machine.
35. Does Nimbus communicate with supervisors directly in Apache Storm?
Ans:
Apache Storm uses an internal distributed messaging system for the communication between Nimbus and the supervisors. A supervisor has multiple worker processes, and it governs those worker processes to complete the tasks assigned by Nimbus. A worker process will execute tasks related to a specific topology.
36. What is Topology Message Timeout secs in Apache Storm?
Ans:
It is the maximum amount of time allotted to the topology to fully process a message released by a spout. If the message is not acknowledged in the given time frame, Apache Storm will fail the message on the spout.
37. What is the ServerType directive in Apache Server?
Ans:
It defines whether Apache should spawn itself as a child process (standalone) or keep everything in a single process (inetd). Keeping it inetd conserves resources.
The ServerType directive is included in Apache 1.3 for backward compatibility with older UNIX-based versions of Apache. By default, Apache is set to standalone, which means Apache will run as a separate application on the server. The ServerType directive isn't available in Apache 2.0.
38. In which folder are Java Applications stored in Apache?
Ans:
Java applications are not stored in Apache; Apache can only be connected to another Java webapp hosting server using the mod_jk connector. mod_jk is a replacement for the elderly mod_jserv. It is a completely new Tomcat-Apache plug-in that handles the communication between Tomcat and Apache. It replaced mod_jserv for several reasons:
mod_jserv was too complex. Because it was ported from Apache/JServ, it brought with it lots of JServ specific bits that aren’t needed by Apache.
mod_jserv supported only Apache. Tomcat supports many web servers through a compatibility layer named the jk library. Supporting two different modes of work became problematic in terms of support, documentation and bug fixes. mod_jk should fix that.
The layered approach provided by the jk library makes it easier to support both Apache 1.3.x and Apache 2.x.
Better support for SSL. mod_jserv couldn’t reliably identify whether a request was made via HTTP or HTTPS. mod_jk can, using the newer Ajp13 protocol.
39. Explain what Apache Storm is? What are the components of Storm?
Ans:
Apache Storm is an open-source distributed real-time computation system used for real-time big data analytics. Unlike Hadoop's batch processing, Apache Storm performs real-time processing and can be used with any programming language. Its components, covered below, are Nimbus, Zookeeper, and the Supervisor, with topologies built from spouts and bolts.
40. What is the data model and components of Apache Storm?
Ans:
[Diagram: the Apache Storm data model and components]
41. Explain what streams is and stream grouping in Apache Storm?
Ans:
In Apache Storm, a stream is referred to as a group or unbounded sequence of tuples, while stream grouping determines how the stream should be partitioned among the bolt's tasks.
42. Explain Components of Apache Storm ?
Ans:
Components of Apache Storm include:
Nimbus: It works like Hadoop's JobTracker. It distributes code across the cluster, uploads computations for execution, allocates workers across the cluster, and monitors computation and reallocates workers as needed.
Zookeeper: It is used as a mediator for communication within the Storm cluster.
Supervisor: It interacts with Nimbus through Zookeeper and, depending on the signals received from Nimbus, executes the process.
43. List out different stream groupings in Apache Storm?
Ans:
- Shuffle grouping
- Fields grouping
- Global grouping
- All grouping
- None grouping
- Direct grouping
- Local or shuffle grouping
44. What is Topology Message Timeout secs in Apache Storm?
Ans:
The maximum amount of time allotted to the topology to fully process a message released by a spout. If the message is not acknowledged in a given time frame, Apache Storm will fail the message on the spout.
45. What is Hadoop Storm?
Ans:
Apache Storm is a free and open source distributed realtime computation system. Apache Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Apache Storm is simple, can be used with any programming language, and is a lot of fun to use!
46. How to write the Output into a file using Storm?
Ans:
In a spout that reads from a file, create the FileReader object in the open() method, since at that time the reader is initialized on the worker node, and then use that object in the nextTuple() method.
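A hedged sketch of such a spout (the file path and class name are hypothetical, and error handling is kept minimal):
- import java.io.BufferedReader;
- import java.io.FileReader;
- import java.io.IOException;
- import java.util.Map;
- import org.apache.storm.spout.SpoutOutputCollector;
- import org.apache.storm.task.TopologyContext;
- import org.apache.storm.topology.OutputFieldsDeclarer;
- import org.apache.storm.topology.base.BaseRichSpout;
- import org.apache.storm.tuple.Fields;
- import org.apache.storm.tuple.Values;
- public class FileLineSpout extends BaseRichSpout {
-     private SpoutOutputCollector collector;
-     private BufferedReader reader;
-     @Override
-     public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
-         this.collector = collector;
-         try {
-             // Created here so the reader is initialized on the worker node.
-             reader = new BufferedReader(new FileReader("/var/log/app.log"));
-         } catch (IOException e) {
-             throw new RuntimeException(e);
-         }
-     }
-     @Override
-     public void nextTuple() {
-         try {
-             String line = reader.readLine();
-             if (line != null) {
-                 collector.emit(new Values(line)); // one tuple per line
-             }
-         } catch (IOException e) {
-             throw new RuntimeException(e);
-         }
-     }
-     @Override
-     public void declareOutputFields(OutputFieldsDeclarer declarer) {
-         declarer.declare(new Fields("line"));
-     }
- }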
47. How message is fully processed in Apache Storm?
Ans:
By calling the nextTuple method on the Spout, Storm requests a tuple from the Spout. The Spout uses the SpoutOutputCollector provided in the open method to emit a tuple to one of its output streams. While emitting a tuple, the Spout assigns a "message id" that will be used to identify the tuple later. After that, the tuple gets sent to consuming bolts, and Storm takes charge of tracking the tree of messages that is produced. Once Storm is confident that a tuple is processed thoroughly, it calls the ack method on the originating Spout task with the message id that the Spout gave to Storm.
48. What do you mean by “spouts” and “bolts”?
Ans:
Apache Storm utilizes custom-created “spouts” and “bolts” to describe information origins and manipulations to provide batch, distributed processing of streaming data.
49. Where would you use Apache Storm?
Ans:
Storm is used for:
Stream processing : Apache Storm is adapted to the processing of a stream of data in real time and updates numerous databases. The processing rate must balance that of the input data.
Distributed RPC : Apache Storm can parallelize a complicated query, enabling its computation in real time.
Continuous computation : Data streams are continuously processed, and Storm presents the results to customers in real time. This might require processing every message as it arrives or building tiny batches over a brief period. Streaming trending themes from Twitter into web browsers is an illustration of continuous computation.
Real-time analytics : Apache Storm will interpret and respond to data as it arrives from multiple data origins in real time.
50. What is DZone big data in Apache Storm?
Ans:
[Diagram: a big data architecture built around Apache Storm]
51. What are the characteristics of Apache Storm?
Ans:
- It is a speedy and secure processing system.
- It can manage huge volumes of data at tremendous speeds.
- It is open-source and a component of Apache projects.
- It aids in processing big data.
- Apache Storm is horizontally expandable and fault-tolerant.
52. How would one split a stream in Apache Storm?
Ans:
One can use multiple streams if one's case requires that. This is not really splitting, but it gives a lot of flexibility and can be used, for example, for content-based routing from a bolt.
Declaring the streams in the bolt:
- @Override
- public void declareOutputFields(final OutputFieldsDeclarer outputFieldsDeclarer) {
-     outputFieldsDeclarer.declareStream("stream1", new Fields("field1"));
-     outputFieldsDeclarer.declareStream("stream2", new Fields("field1"));
- }
Emitting from the bolt on a stream:
- collector.emit("stream1", new Values("field1 Value"));
Listening to the correct stream through the topology:
- builder.setBolt("myBolt1", new MyBolt1()).shuffleGrouping("boltWithStreams", "stream1");
- builder.setBolt("myBolt2", new MyBolt2()).shuffleGrouping("boltWithStreams", "stream2");
53. Is there an effortless approach to deploy Apache Storm on a local machine (say, Ubuntu) for evaluation?
Ans:
If you use the code below, the topology is submitted to the cluster through the active nimbus node:
- StormSubmitter.submitTopology("Topology_Name", conf, Topology_Object);
But if you use the code below, the topology is submitted locally on the same machine. In this case, a new local cluster is created with nimbus, zookeeper, and supervisors running in the same process:
- LocalCluster cluster = new LocalCluster();
- cluster.submitTopology("Topology_Name", conf, Topology_Object);
54. What is a directed acyclic graph in Storm?
Ans:
A Storm application is designed as a "topology" in the form of a directed acyclic graph (DAG), with spouts and bolts serving as the graph vertices. Edges on the graph are called streams and forward data from one node to the next. Collectively, the topology operates as a data transformation pipeline.
55. What do you mean by Nodes?
Ans:
The two classes of nodes are the Master Node and the Worker Node. The Master Node runs a daemon called Nimbus, which allocates jobs to machines and monitors their performance. A Worker Node runs a daemon known as the Supervisor, which distributes the tasks it receives to its worker processes and manages them as per requirement.
56. What are the Elements of Storm?
Ans:
Storm has three crucial elements, viz., Topology, Stream, and Spout. A topology is a network composed of streams and spouts. A stream is a boundless pipeline of tuples, and a spout is the origin of the data streams; it converts the data into tuples of streams and forwards them to the bolts to be processed.
57. What are Storm Topologies?
Ans:
The philosophy for a real-time application is inside a Storm topology. A Storm topology is comparable to MapReduce. One fundamental distinction is that a MapReduce job ultimately concludes, whereas a topology continues endlessly (or until you kill it, of course). A topology is a graph of spouts and bolts combined with stream groupings.
58. What is the TopologyBuilder class?
Ans:
TopologyBuilder exposes the Java API for defining a topology for Storm to execute:
- java.lang.Object -> org.apache.storm.topology.TopologyBuilder
- public class TopologyBuilder extends Object
Topologies are ultimately Thrift structures, but since the Thrift API is so verbose, TopologyBuilder eases the generation of topologies. A template for creating and submitting a topology:
- TopologyBuilder builder = new TopologyBuilder();
- builder.setSpout("1", new TestWordSpout(true), 5);
- builder.setSpout("2", new TestWordSpout(true), 3);
- builder.setBolt("3", new TestWordCounter(), 3)
-     .fieldsGrouping("1", new Fields("word"))
-     .fieldsGrouping("2", new Fields("word"));
- builder.setBolt("4", new TestGlobalCount())
-     .globalGrouping("1");
- Map conf = new HashMap();
- conf.put(Config.TOPOLOGY_WORKERS, 4);
- StormSubmitter.submitTopology("topology", conf, builder.createTopology());
59. How do you Kill a topology in Storm?
Ans:
storm kill topology-name [-w wait-time-secs]
Kills the topology with the name topology-name. Storm will initially deactivate the topology's spouts for the duration of the topology's message timeout to allow all messages currently being processed to finish. Storm will then shut down the workers and clean up their state. You can override the amount of time Storm waits between deactivation and shutdown with the -w flag.
60. What is the architecture view of Apache Storm?
Ans:
[Diagram: an architecture view of Apache Storm]
61. What transpires when Storm kills a topology?
Ans:
Storm does not kill the topology instantly. Instead, it deactivates all the spouts so they don't emit any more tuples, and then Storm waits for Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS seconds before destroying all the workers. This gives the topology sufficient time to finish the tuples it was processing when it was killed.
62. What is the suggested approach for writing integration tests for an Apache Storm topology in Java?
Ans:
You can utilize LocalCluster for integration testing, and you can look at some of Storm's own integration tests for inspiration. Tools you will want to use are the FeederSpout and FixedTupleSpout. A topology in which all spouts implement the CompletableSpout interface can be run until completion using the tools in the Testing class. Storm tests can also choose to "simulate time", which means the Storm topology will idle until you call LocalCluster.advanceClusterTime. This allows you to do asserts in between bolt emits, for example.
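A hedged sketch using the Testing utilities (the component ids, spout, bolt, and test data are assumptions for illustration):
- Testing.withSimulatedTimeLocalCluster(new TestJob() {
-     @Override
-     public void run(ILocalCluster cluster) throws Exception {
-         // Topology under test; SentenceSpout and SplitSentenceBolt are hypothetical.
-         TopologyBuilder builder = new TopologyBuilder();
-         builder.setSpout("sentences", new SentenceSpout());
-         builder.setBolt("split", new SplitSentenceBolt()).shuffleGrouping("sentences");
-         // Replace the real spout output with fixed test data.
-         MockedSources mocked = new MockedSources();
-         mocked.addMockData("sentences", new Values("the cow jumped"));
-         CompleteTopologyParam param = new CompleteTopologyParam();
-         param.setMockedSources(mocked);
-         // Run to completion and capture everything each component emitted.
-         Map result = Testing.completeTopology(cluster, builder.createTopology(), param);
-         List splitTuples = Testing.readTuples(result, "split");
-         // ... assert on splitTuples here ...
-     }
- });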
63. What does the swap command do?
Ans:
A proposed feature is a Storm swap command that would exchange a running topology with a brand-new one, ensuring minimal downtime and no chance of both topologies processing tuples at the same time.
64. How do you monitor topologies?
Ans:
The most suitable place to monitor a topology is utilizing the Storm UI. The Storm UI gives data about errors occurring in tasks, fine-grained statistics on the throughput, and latency performance of every element of each operating topology.
65. How do you rebalance the number of executors for a bolt in a running Apache Storm topology?
Ans:
You always need to have more (or an equal number of) tasks than executors. Since the number of tasks is fixed for the lifetime of a topology, you need to define a larger initial number of tasks than initial executors to be able to scale up the number of executors at runtime. Think of the number of tasks as the maximum number of executors: #executors <= #tasks.
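For example, the rebalance command (the topology and component names here are hypothetical) can change worker and executor counts at runtime:
- storm rebalance mytopology -w 10 -n 5 -e blue-spout=3 -e yellow-bolt=10
This waits 10 seconds, then redistributes the topology across 5 worker processes, with 3 executors for blue-spout and 10 for yellow-bolt, each bounded above by that component's task count.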
66. What are Streams?
Ans:
A stream is the core abstraction in Storm. A stream is a boundless series of tuples that is processed and produced in parallel in a distributed manner. We define streams with a schema that names the fields in the stream's tuples.
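For instance, a spout or bolt declares its stream's schema in declareOutputFields (the field names are illustrative):
- @Override
- public void declareOutputFields(OutputFieldsDeclarer declarer) {
-     declarer.declare(new Fields("word", "count")); // every emitted tuple carries these two fields
- }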
67. What can tuples hold in Storm?
Ans:
By default, tuples can include integers, longs, shorts, bytes, strings, doubles, floats, booleans, and byte arrays. You can also register your own serializers so that custom types can be used natively.
68. How do we check for the httpd.conf consistency and the errors in it?
Ans:
We check the syntax of the configuration file by using:
- httpd -S
The command gives a description of how Apache parsed the configuration file. A careful examination of the IP addresses and server names might help in uncovering configuration errors.
69. What is Kryo?
Ans:
Storm uses Kryo for serialization. Kryo is a flexible and fast serialization library that produces small serializations.
70. What is the main architecture of Apache Storm?
Ans:
[Diagram: the main Apache Storm architecture]
71. What are Spouts?
Ans:
A spout is the origin of streams in a topology. Generally, spouts read tuples from an external source and emit them into the topology.
Spouts can be reliable or unreliable. A reliable spout is able to replay a tuple if it was not processed by Storm, while an unreliable spout forgets the tuple as soon as it is emitted.
Spouts can emit more than one stream. To do so, declare multiple streams using the declareStream method of OutputFieldsDeclarer, and specify the stream to emit to when using the emit method on SpoutOutputCollector.
The chief method on spouts is nextTuple. nextTuple either emits a new tuple into the topology or simply returns if there are no new tuples to emit. It is important that nextTuple does not block in any spout implementation, as Storm calls all the spout methods on the same thread.
The other chief methods on spouts are ack and fail. These are called when Storm detects that a tuple emitted from the spout either successfully made it through the topology or failed to complete. ack and fail are only called for reliable spouts.
72. What are Bolts?
Ans:
All processing in topologies is done in bolts. Bolts can do everything from filtering, aggregations, functions, and joins to talking to databases, and more. Bolts can perform simple stream transformations; doing complicated stream transformations usually demands multiple steps and hence additional bolts.
73. What are the programming languages supported to work with Apache Storm?
Ans:
There is no specific language required, as Apache Storm is flexible enough to work with any programming language.
74. What are the components of Apache Storm?
Ans:
Nimbus, Zookeeper, and Supervisor are the components of Apache Storm.
75. What is Nimbus used for?
Ans:
Nimbus is also known as the Master Node. Nimbus is used to track the jobs of workers. All the code is distributed among the workers, and workers are allocated across the available cluster. If any of the workers needs more resources, Nimbus has to provide extra resources to that worker. Note: Nimbus is similar to the JobTracker in Hadoop.
76. What is the Storm process?
Ans:
A system for processing streaming data in real time. Apache™ Storm adds reliable real-time data processing capabilities to Enterprise Hadoop. Storm on YARN is powerful for scenarios requiring real-time analytics, machine learning and continuous monitoring of operations.
77. What is Storm used for?
Ans:
Apache Storm is a distributed, fault-tolerant, open-source computation system. You can use Storm to process streams of data in real time with Apache Hadoop. Storm solutions can also provide guaranteed processing of data, with the ability to replay data that wasn’t successfully processed the first time.
78. What is a Storm tool?
Ans:
Storm, or the software tool for the organization of requirements modeling, is a tool designed to streamline the process of specifying a software system by automating processes that help reduce errors.
79. What is spout in Storm?
Ans:
Spouts represent the source of data in Storm. You can write spouts to read data from data sources such as databases, distributed file systems, messaging frameworks, etc. Spouts can broadly be classified as reliable or unreliable; reliable spouts have the capability to replay the tuples (a unit of data in a data stream).
80. Define a real-time big data analysis engine for Storm?
Ans:
[Diagram: a real-time big data analysis engine built on Storm]
81. What are some of the scenarios in which you would want to use Apache Storm?
Ans:
Storm can be used for the following use cases:
Stream processing
Apache Storm is used to process a stream of data in real time and update several databases. This processing takes place in real time, and the processing speed must match the input data speed.
Continuous computation
Apache Storm can process data streams continuously and deliver the results to clients in real time. This could require processing each message as it arrives or creating small batches over a short period of time. Streaming trending topics from Twitter into browsers is an example of continuous computation.
Distributed RPC
Apache Storm can parallelize a complex query, allowing it to be computed in real time.
Real-time analytics
Apache Storm will analyse and react to data as it comes in from various data sources in real time.
82. What is a Storm topology?
Ans:
A topology is a graph of stream transformations where each node is a spout or bolt. Each node in a Storm topology executes in parallel. In your topology, you can specify how much parallelism you want for each node, and then Storm will spawn that number of threads across the cluster to do the execution.
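As a sketch (the numbers are illustrative), the parallelism hint is the third argument to setSpout/setBolt, and setNumTasks bounds how far it can later be scaled:
- builder.setBolt("split", new SplitSentence(), 8) // 8 executors (threads) to start with
-     .setNumTasks(16)                             // at most 16 tasks, so at most 16 executors after a rebalance
-     .shuffleGrouping("sentences");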
83. What is Zookeeper?
Ans:
Zookeeper helps in communication among the Storm cluster nodes. Since Zookeeper is concerned only with coordination and not with messages, it does not carry much workload.
84. What is the use of a Supervisor?
Ans:
The supervisor takes signals from Nimbus through Zookeeper to execute the process. Supervisors are also known as Worker nodes.
85. What are the features of Apache Storm?
Ans:
Reliable – All the data is guaranteed to be processed.
Scalable – Running across a cluster of machines provides scalability through parallel computation.
Robust – Storm restarts workers when there is an error or fault, allowing the uninterrupted execution of the other workers in the node.
Easy to operate – Standard configurations make it easy to deploy and use.
Quick – Each node can process one million 100-byte messages per second.
86. How can log files be streamlined?
Ans:
First, configure a spout to read the log files and emit one tuple per line, and then analyze the lines by assigning them to a bolt.
87. What are the types of stream groups in Apache Storm?
Ans:
Shuffle, fields, all, global, none, direct, and local or shuffle groupings are available in Apache Storm.
88. What is Topology_Message_Timeout_secs used for?
Ans:
It is the time allowed to process a message released from the spout; if the message is not processed within that time, it is considered failed.
89. How do you use Apache as a proxy server?
Ans:
Using the mod_proxy module, Apache can be used as a proxy server as well.
90. What is the Storm architecture with an example topology?
Ans:
[Diagram: the Storm architecture with an example topology]
91. Command to kill Storm topology?
Ans:
- storm kill {ACTE_topology}
- where ACTE_topology is the name of the topology
92. Why is Apache not provided with SSL?
Ans:
In order to avoid the legal and bureaucratic issues that surround encryption technology, Apache does not include SSL in its base package.
93. What are the Java elements supported by Apache?
Ans:
There is no built-in Java support in Apache itself; Java web applications are hosted in a separate container such as Tomcat, to which Apache connects through the mod_jk connector.