Spark Interview Questions

Q31. How does the provenance repository provide search capability?

Ans: The provenance repository indexes its events with an embedded Lucene search engine, which is what makes provenance data searchable.

Q32. Can we re-process a FlowFile that has already been processed, and how?

Ans: Yes, a FlowFile that has already been processed can be re-processed from the Data Provenance repository. A provenance event offers a Replay button that lets the user re-insert the FlowFile into the flow and re-process it from exactly the point at which the event happened. This is a very powerful mechanism: we can modify the flow in real time, replay a FlowFile, and then view the results. If the results are not as expected, we can modify the flow again and replay the FlowFile again. This iterative development can continue until the flow processes the data exactly as intended.

Q33. What is the use of the Flow Controller?

Ans: The flow controller is the brains of the operation. It provides threads for extensions to run on, and manages the schedule of when extensions receive resources to execute. The Flow Controller acts as the engine dictating when a particular processor is given a thread to execute.
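The idea that the controller, not the processors, owns the threads can be illustrated with a small Python sketch. This is a toy model only: `ToyFlowController`, `register`, and `run_once` are hypothetical names invented for this example and are not part of NiFi, which is a Java application with a far richer scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyFlowController:
    """Toy stand-in for a flow controller: it owns the threads and decides who runs."""

    def __init__(self, max_threads):
        # The controller, not the processors, owns the thread pool.
        self._pool = ThreadPoolExecutor(max_workers=max_threads)
        self._processors = {}

    def register(self, name, on_trigger):
        # on_trigger is a hypothetical callback standing in for a processor's work.
        self._processors[name] = on_trigger

    def run_once(self):
        # Give every registered processor one turn on a pooled thread,
        # then wait for all of them to finish.
        futures = {name: self._pool.submit(fn) for name, fn in self._processors.items()}
        return {name: future.result() for name, future in futures.items()}

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

With two registered callbacks and `max_threads=2`, one call to `run_once()` runs both on pooled threads and returns their results keyed by processor name.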

Q34. What is a process group in NiFi?

Ans: A process group lets you build a sub dataflow that you can then add to your main dataflow. Data enters a process group through an input port and leaves it through an output port. In other words, a process group is a composition of NiFi components that together form a sub dataflow.

Q35. What is the difference between the FlowFile repository and the content repository in NiFi?

Ans: The FlowFile Repository is where NiFi keeps track of the state of what it knows about a given FlowFile that is presently active in the flow.

The Content Repository is where the actual content bytes of a given FlowFile live.

Q36. How do you define NiFi content repository?

Ans: As we mentioned previously, contents are not stored in the FlowFile. They are stored in the content repository and referenced by the FlowFile. This allows the contents of FlowFiles to be stored independently and efficiently based on the underlying storage mechanism.
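The split between the two repositories can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: `ToyContentRepository` and `ToyFlowFile` are hypothetical names, and the single string claim ID stands in for NiFi's real content claim bookkeeping, which is more involved.

```python
import uuid
from dataclasses import dataclass


class ToyContentRepository:
    """Holds the content bytes, keyed by a claim ID."""

    def __init__(self):
        self._claims = {}

    def write(self, data: bytes) -> str:
        claim_id = str(uuid.uuid4())
        self._claims[claim_id] = data
        return claim_id

    def read(self, claim_id: str) -> bytes:
        return self._claims[claim_id]


@dataclass
class ToyFlowFile:
    """Carries attributes plus a reference to the content, never the bytes."""

    attributes: dict
    content_claim: str


repo = ToyContentRepository()
claim = repo.write(b"order-1001,42.50")

original = ToyFlowFile({"filename": "orders.csv"}, claim)
# A clone gets its own attributes but points at the same stored content.
clone = ToyFlowFile({"filename": "orders-copy.csv"}, original.content_claim)
```

Because the FlowFile only holds a reference, cloning it or changing its attributes does not require copying the content bytes.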

Q37. Does NiFi work as a master-slave architecture?

Ans: No. Since NiFi 1.0, NiFi follows a zero-master clustering philosophy, and every node in the cluster is the same. The cluster is managed through Apache ZooKeeper, which elects a single node as the Cluster Coordinator and handles failover automatically. All cluster nodes report heartbeat and status information to the Cluster Coordinator, which is responsible for disconnecting and connecting nodes. Additionally, every cluster has one Primary Node, also elected by ZooKeeper.
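The heartbeat bookkeeping described above can be pictured with a toy Python model. Everything here is an assumption made for illustration: `ToyClusterCoordinator`, the fixed timeout rule, and the explicit `now` timestamps are hypothetical and do not reflect how NiFi actually decides to disconnect a node.

```python
class ToyClusterCoordinator:
    """Tracks the last heartbeat per node and drops nodes that go quiet."""

    def __init__(self, heartbeat_timeout):
        self.heartbeat_timeout = heartbeat_timeout
        self.last_heartbeat = {}
        self.connected = set()

    def heartbeat(self, node_id, now):
        # A heartbeat both refreshes the timestamp and (re)connects the node.
        self.last_heartbeat[node_id] = now
        self.connected.add(node_id)

    def disconnect_stale(self, now):
        # Disconnect every node whose last heartbeat is older than the timeout.
        stale = {
            node
            for node in self.connected
            if now - self.last_heartbeat[node] > self.heartbeat_timeout
        }
        self.connected -= stale
        return stale
```

In this model a node that stops reporting is disconnected on the next check, and it rejoins as soon as it sends another heartbeat.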

Q38. If you are working as a DataFlow Manager on a clustered NiFi setup, then which node will you use to create a dataflow?

Ans: As a DataFlow manager, you can interact with the NiFi cluster through the user interface (UI) of any node. Any change you make is replicated to all nodes in the cluster, allowing for multiple entry points.

Q39. How does NiFi guarantee the delivery of messages?

Ans: This is achieved through effective use of a purpose-built persistent write-ahead log and the content repository.
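The general write-ahead log technique can be sketched in Python as follows. This is a generic illustration, not NiFi's implementation: `ToyWriteAheadLog`, the JSON record layout, and the queue names are hypothetical. The point it shows is the ordering: a change is made durable on disk before the in-memory state is updated, so the state can be rebuilt after a restart.

```python
import json
import os
import tempfile


class ToyWriteAheadLog:
    """Persist each state change to disk before applying it in memory."""

    def __init__(self, path):
        self.path = path
        self.state = {}  # FlowFile ID -> queue it currently sits in
        self._recover()

    def _recover(self):
        # On startup, replay the log to rebuild the in-memory state.
        if not os.path.exists(self.path):
            return
        with open(self.path, "r", encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.state[record["id"]] = record["queue"]

    def update(self, flowfile_id, queue):
        record = {"id": flowfile_id, "queue": queue}
        # 1. Append the change to the log and force it to disk.
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then apply it to the in-memory state.
        self.state[flowfile_id] = queue


log_path = os.path.join(tempfile.mkdtemp(), "flowfile.wal")
wal = ToyWriteAheadLog(log_path)
wal.update("ff-1", "to-parse")
wal.update("ff-1", "to-publish")

# Simulate a restart: a fresh instance rebuilds its state from the log.
restarted = ToyWriteAheadLog(log_path)
```

After the simulated restart, `restarted.state` again records that `ff-1` sits in the `to-publish` queue, because every update reached the log before it was applied.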

Q40. If you need to do a site-to-site deployment, then which files would you configure?

Ans: To set up a site-to-site deployment, you have to configure the following files.

1. state-management.xml : to reflect your ZooKeeper instances

2. nifi.properties : for site-to-site properties and cluster properties

3. zookeeper.properties and authorizers.xml : to reflect the hostnames of all nodes
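As a rough aid for the nifi.properties step, the Python sketch below parses a properties fragment and reports which site-to-site and cluster keys are missing. The property names are standard nifi.properties keys, but every hostname and port is a placeholder, the list of required keys is an assumption chosen for this example, and `parse_properties` and `missing_keys` are hypothetical helpers.

```python
# Placeholder values only; substitute the hosts and ports of your own nodes.
SAMPLE_NIFI_PROPERTIES = """
# Site-to-site properties
nifi.remote.input.host=nifi-node-1.example.com
nifi.remote.input.secure=false
nifi.remote.input.socket.port=10000
# Cluster properties
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-node-1.example.com
nifi.cluster.node.protocol.port=11443
nifi.zookeeper.connect.string=zk-1.example.com:2181,zk-2.example.com:2181
"""

# Assumed checklist for this example, not an official list.
REQUIRED_KEYS = [
    "nifi.remote.input.host",
    "nifi.remote.input.socket.port",
    "nifi.cluster.is.node",
    "nifi.cluster.node.address",
    "nifi.cluster.node.protocol.port",
    "nifi.zookeeper.connect.string",
]


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_keys(props, required):
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not props.get(key)]


props = parse_properties(SAMPLE_NIFI_PROPERTIES)
```

Calling `missing_keys(props, REQUIRED_KEYS)` on each node's file gives a quick list of settings that still need a value before the nodes are started.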

Disclaimer :

1. Hortonworks® is a registered trademark of Hortonworks.

2. Cloudera® is a registered trademark of Cloudera Inc

3. Azure® is a registered trademark of Microsoft Inc.

4. Oracle®, Java® are registered trademarks of Oracle Inc

5. SAS® is a registered trademark of SAS Inc

6. IBM® is a registered trademark of IBM Inc

7. DataStax® is a registered trademark of DataStax

8. MapR® is a registered trademark of MapR Inc.

2014-2017 © HadoopExam.com | Dont Copy , it's bad Karma |