NiFi Interview Questions

Q11. What does it mean that the FlowFile Repository is a Write-Ahead Log?

Ans: It means that every change to the FlowFile Repository is first written to a log before it is applied. Persisting the change in the log before the data is processed avoids data loss, and the repository is periodically checkpointed to support rollback and recovery.
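The log-first, checkpoint-periodically idea can be sketched in a few lines of Python. This is only an illustrative toy, not NiFi's actual repository implementation: the class name TinyWriteAheadLog, the file names, and the JSON record layout are all assumptions made for the example.

```python
import json
import os
import tempfile


class TinyWriteAheadLog:
    """Toy write-ahead log: each update is journaled before it is applied."""

    def __init__(self, directory):
        # Hypothetical file names, chosen only for this sketch.
        self.journal_path = os.path.join(directory, "journal.log")
        self.checkpoint_path = os.path.join(directory, "checkpoint.json")
        self.state = {}
        self._recover()

    def _recover(self):
        # Start from the last checkpoint, then replay journal entries
        # that were written after it.
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path) as f:
                self.state = json.load(f)
        if os.path.exists(self.journal_path):
            with open(self.journal_path) as f:
                for line in f:
                    record = json.loads(line)
                    self.state[record["id"]] = record["attrs"]

    def update(self, flowfile_id, attrs):
        # 1. Write the change to the journal and force it to disk first.
        with open(self.journal_path, "a") as f:
            f.write(json.dumps({"id": flowfile_id, "attrs": attrs}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2. Only then apply it to the in-memory state.
        self.state[flowfile_id] = attrs

    def checkpoint(self):
        # Snapshot the full state atomically, then truncate the journal.
        tmp_path = self.checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.checkpoint_path)
        open(self.journal_path, "w").close()


def demo():
    with tempfile.TemporaryDirectory() as directory:
        wal = TinyWriteAheadLog(directory)
        wal.update("ff-1", {"filename": "a.txt"})
        wal.checkpoint()
        wal.update("ff-2", {"filename": "b.txt"})
        # Simulate a restart: a fresh instance rebuilds its state from
        # the checkpoint plus the journal.
        return TinyWriteAheadLog(directory).state
```

In this sketch, a restart recovers both the checkpointed entry and the one that was only journaled, which is the property the answer above describes. A real implementation would also have to cope with partially written journal records, which this toy ignores.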

Q12. Does Reporting Task have access to the FlowFile contents?

Ans: No, a Reporting Task does not have access to the contents of individual FlowFiles. Rather, a Reporting Task has access to all Provenance Events, bulletins, and the metrics shown for components on the graph, such as FlowFiles In, Bytes Read, and Bytes Written.

Q13. What is the use of FlowFile Expiration?

Ans: FlowFile Expiration is a setting configured on a connection in the dataflow. It determines how long a FlowFile may live before it is expired and deleted. Suppose you have configured FlowFile Expiration as 1 hr. The timer starts as soon as the FlowFile arrives in the NiFi system. When the FlowFile reaches the connection, the connection checks how old it is; if it is already 1 hr old, the FlowFile is not processed further and is deleted.
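The age check described above can be modeled roughly as follows. This is a simplified Python sketch, not NiFi code: the function names and the dictionary layout of a FlowFile are assumptions for illustration. An expiration of 0 is treated as "never expire", which matches NiFi's default of 0 sec.

```python
def is_expired(entry_time_s, now_s, expiration_s):
    """Return True when a FlowFile has reached the expiration age.

    entry_time_s: when the FlowFile entered the system (seconds).
    expiration_s: the connection's expiration period; 0 means never expire.
    """
    if expiration_s <= 0:
        return False
    return (now_s - entry_time_s) >= expiration_s


def drop_expired(queue, now_s, expiration_s):
    """Split a connection's queue into FlowFiles to keep and FlowFiles to delete."""
    kept, expired = [], []
    for flowfile in queue:
        if is_expired(flowfile["entry_time_s"], now_s, expiration_s):
            expired.append(flowfile)
        else:
            kept.append(flowfile)
    return kept, expired
```

With a 1 hr (3600 s) expiration, a FlowFile that entered at time 0 is dropped when checked at time 3600, while one that entered at time 3000 is kept.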

Q14. What is back pressure in the NiFi system?

Ans: Sometimes the producer system is faster than the consumer system, so messages are consumed more slowly than they are produced. The messages (FlowFiles) that have not yet been processed remain in the connection buffer. You can limit this buffer by configuring a back pressure threshold on the connection, based either on the number of FlowFiles or on the total data size. When the queue reaches the configured limit, the connection applies back pressure and the producer processor is not scheduled to run, so no more FlowFiles are generated until the back pressure is relieved. This is one of NiFi's important features.
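A small simulation may make the mechanism easier to picture. The Python below is a toy model under stated assumptions, not NiFi's scheduler: the Connection class, the tick-based loop, and the rates (producer runs every tick, consumer every third tick) are invented for the example.

```python
from collections import deque


class Connection:
    """Toy connection with an object-count and a data-size threshold."""

    def __init__(self, object_threshold, size_threshold_bytes):
        self.object_threshold = object_threshold
        self.size_threshold_bytes = size_threshold_bytes
        self.queue = deque()
        self.queued_bytes = 0

    def is_full(self):
        # Back pressure applies when either threshold is reached.
        return (len(self.queue) >= self.object_threshold
                or self.queued_bytes >= self.size_threshold_bytes)

    def offer(self, size_bytes):
        self.queue.append(size_bytes)
        self.queued_bytes += size_bytes

    def poll(self):
        if self.queue:
            self.queued_bytes -= self.queue.popleft()


def simulate(ticks, object_threshold=3):
    conn = Connection(object_threshold, size_threshold_bytes=10**9)
    produced = skipped = max_depth = 0
    for tick in range(ticks):
        # The producer is only "scheduled" when the connection is not full.
        if conn.is_full():
            skipped += 1
        else:
            conn.offer(100)
            produced += 1
        max_depth = max(max_depth, len(conn.queue))
        # The slower consumer removes one FlowFile every third tick.
        if tick % 3 == 2:
            conn.poll()
    return produced, skipped, max_depth
```

Calling simulate(12) returns (6, 6, 3): the producer ran 6 times, was held back 6 times, and the queue never grew beyond the threshold of 3.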

Q15. If, while processing FlowFiles, a prioritizer finds that two FlowFiles have the same priority, which FlowFile will be processed first?

Ans: If multiple FlowFiles have the same priority according to the first prioritizer, the next prioritizer in the list of selected prioritizers is used to break the tie (that is, another criterion is used to decide the order).
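The tie-breaking behaviour resembles sorting by a composite key. The following Python sketch illustrates the idea only; the two key functions are simplified stand-ins (assuming an integer "priority" attribute where a lower number means higher priority, and an entry timestamp), not NiFi's prioritizer classes.

```python
def by_priority_attribute(flowfile):
    # Assumption for this sketch: "priority" holds an integer string,
    # and a lower number is processed first.
    return int(flowfile["attributes"]["priority"])


def by_oldest_first(flowfile):
    # An earlier entry time sorts first.
    return flowfile["entry_time_s"]


def order_queue(flowfiles, prioritizers):
    """Order FlowFiles by the first prioritizer, using later ones only to break ties."""
    return sorted(flowfiles, key=lambda ff: tuple(p(ff) for p in prioritizers))


queue = [
    {"id": "a", "entry_time_s": 30, "attributes": {"priority": "1"}},
    {"id": "b", "entry_time_s": 10, "attributes": {"priority": "1"}},
    {"id": "c", "entry_time_s": 50, "attributes": {"priority": "0"}},
]
ordered_ids = [ff["id"] for ff in order_queue(queue, [by_priority_attribute, by_oldest_first])]
```

Here "c" comes first because of its priority, while "a" and "b" tie on priority and are ordered by age, giving c, b, a.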

Q16. What does the Auto Terminate configuration do in a processor?

Ans: Once a processor has finished processing a FlowFile, something must happen to that FlowFile next. Either a downstream processor is configured to process it further, or, if this is the last processor in the flow, it should be dropped. Auto termination covers the second case: when a relationship is auto-terminated, a FlowFile routed to it is considered finished and is removed from the flow.

Q17. Can a processor's configuration be changed while it is running?

Ans: No, a processor's configuration cannot be changed or updated while it is running. You must first stop it and wait until all in-progress FlowFile processing has finished. Only then can you change the processor's configuration.

Q18. What is the use of RouteOnAttribute?

Ans: RouteOnAttribute allows the user to make routing decisions in the flow so that FlowFiles that meet some criteria can be handled differently than other FlowFiles.
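As a rough mental model, the routing decision works like a set of named rules evaluated against a FlowFile's attributes. The Python below is only an analogy: in NiFi the rules are written in Expression Language (for example `${filename:endsWith('.csv')}`), and the rule names, the RULES table, and the "unmatched" fallback shown here are simplified assumptions.

```python
def route_on_attribute(attributes, rules):
    """Return the name of every rule whose predicate matches, or ["unmatched"]."""
    matched = [name for name, predicate in rules.items() if predicate(attributes)]
    return matched or ["unmatched"]


# Hypothetical rules for illustration; each maps a relationship name to a test
# on the FlowFile's attributes.
RULES = {
    "csv": lambda attrs: attrs.get("filename", "").endswith(".csv"),
    "large": lambda attrs: int(attrs.get("fileSize", "0")) > 1_000_000,
}
```

For example, attributes of {"filename": "data.csv", "fileSize": "10"} route to "csv" only, while a small file named "a.bin" falls through to "unmatched".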

Q19. Once a processor has processed a FlowFile, can the resulting content be written under a different file name and in a different directory?

Ans: Yes, you can take advantage of attributes to do this. For example, the PutFile processor takes a directory and a filename, both of which can be driven by FlowFile attributes and so can be different for each file.

Q20. What is the meaning of Provenance Data in NiFi?

Ans: NiFi stores every small detail about each FlowFile in the Data Provenance Repository. As the data is processed through the system and is transformed, routed, split, aggregated, and distributed to other endpoints, this information is all stored within NiFi's Provenance Repository. You can search for an individual FlowFile and see how it was processed.

2014-2017 © | Don't copy, it's bad karma |