Spark Interview Questions

Q41. While installing NiFi, you get “java.lang.UnsupportedClassVersionError”. What could be the reason?

Ans: You generally get java.lang.UnsupportedClassVersionError when multiple versions of Java are installed and the script is run with the wrong one, typically a JVM older than the version the classes were compiled for.

Set JAVA_HOME as below before starting NiFi:

export JAVA_HOME=<jdk path you want to use>
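
To find out which Java release a class actually needs, one can read the class-file header. The following is a minimal sketch and not part of NiFi; the function name `required_java_version` is hypothetical. It relies on the standard class-file layout (magic number 0xCAFEBABE followed by the minor and major version) and on the mapping where major version 52 corresponds to Java 8.

```python
import struct

CLASS_MAGIC = 0xCAFEBABE  # first four bytes of every Java class file


def required_java_version(header: bytes) -> int:
    """Return the minimum Java release needed to load a class.

    `header` is the first 8 bytes of a .class file:
    4 bytes magic, 2 bytes minor version, 2 bytes major version (big-endian).
    """
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of the class file")
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != CLASS_MAGIC:
        raise ValueError("not a Java class file")
    # Class-file major versions are offset by 44 (52 -> Java 8, 55 -> Java 11).
    return major - 44
```

If the value returned for a NiFi class is higher than the release reported by `java -version`, the JVM in use is too old, and JAVA_HOME should point to a newer JDK.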

Q42. Can a single installation of Ranger on HDP also be used with HDF?

Ans: Yes, a single Ranger instance installed on HDP can also manage HDF (a separate installation). However, the Ranger that is included with HDP does not include the service definition for NiFi, so it needs to be installed manually.

Q43. How can I execute a shell script in a NiFi dataflow?

Ans: To execute a shell script in a NiFi dataflow, you can use the ExecuteProcess processor.

Q44. What is the solution when your dataflow is interacting with an external system and a network outage causes the FlowFile to go to the failure relationship?

Ans: Attempt a retry whenever you are dealing with an external system, where failures can result from things that are outside NiFi's control, for example:

· Network outages

· Destination systems have no disk space

· Destination already has a file of the same name, etc.
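
The retry idea can be illustrated outside NiFi with a short sketch. In NiFi itself the retry is typically wired into the flow rather than coded; the function below is plain Python, and the names `send_with_retry`, `send`, `max_attempts` and `base_delay` are hypothetical.

```python
import time


def send_with_retry(send, payload, max_attempts=3, base_delay=0.0):
    """Call send(payload); on an I/O failure wait and retry, up to max_attempts."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload)
        except OSError as err:  # connection and disk errors are OSError subclasses
            last_error = err
            if attempt < max_attempts:
                # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                time.sleep(base_delay * 2 ** (attempt - 1))
    # All attempts failed: surface the last error to the caller.
    raise last_error
```

Bounding the number of attempts matters: an unbounded retry loop is what lets failed items pile up when the external system stays unavailable.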

Q45. What do you mean by “Back pressure deadlock”?

Ans: Suppose you have a processor, e.g. PublishJMS, that publishes messages to a target queue. If the target queue is full, the FlowFile is routed to the failure relationship. For the retry, the same failed FlowFile is fed back to the processor, and the input connection/queue, which has back pressure configured, fills up as well. This can cause a back pressure deadlock. In this situation, PublishJMS is no longer called at all. Even when the JMS problem is resolved, the whole flow stays deadlocked.

Q46. What is the solution to avoid a “Back pressure deadlock”?

Ans: There are some options, such as:

- An admin can temporarily increase the back pressure threshold of the failure connection.

- Another useful approach in such a case is to have Reporting Tasks that monitor the flow for large queues.
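
The deadlock described in Q45 and the threshold workaround can be modelled with a toy simulation. This is a simplified sketch, not NiFi code: the function `step`, the two queues and the limits are hypothetical, and the only rule taken from NiFi is that a component is not scheduled while its outgoing connection has reached its back pressure threshold.

```python
from collections import deque


def step(input_q, failure_q, target_up, input_limit=3, failure_limit=3):
    """Run one scheduling round of the toy flow; return True if any FlowFile moved.

    A component runs only while its outgoing connection is below its limit,
    mirroring how back pressure stops the source of a full connection.
    """
    moved = False
    # Publisher: reads input_q; its failure relationship feeds failure_q.
    if input_q and len(failure_q) < failure_limit:
        flowfile = input_q.popleft()
        if not target_up:
            failure_q.append(flowfile)  # publish failed, route to failure
        moved = True
    # Retry path: reads failure_q and feeds FlowFiles back into input_q.
    if failure_q and len(input_q) < input_limit:
        input_q.append(failure_q.popleft())
        moved = True
    return moved


# Both connections are full: nothing moves even though the target is back up.
input_q, failure_q = deque("abc"), deque("def")
print(step(input_q, failure_q, target_up=True))
# Raising the failure limit lets the publisher run again and the flow drains.
print(step(input_q, failure_q, target_up=True, failure_limit=4))
```

In this model, once both queues sit at their limits neither component can run, which is why the flow stays stuck after the target recovers and why raising the failure threshold breaks the cycle.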

Q47. I want to consume a SOAP-based web service in an HDF dataflow, and the WSDL is provided. Which processor will help to consume this web service?

Ans: You can use the InvokeHTTP processor. With InvokeHTTP, you can add dynamic properties, which are sent in the request as headers. Use dynamic properties to set values for the Content-Type and SOAPAction headers, with the header names as the names of the dynamic properties. InvokeHTTP also lets you control the HTTP method, so you can set that to POST. The remaining step is to get the content of request.xml to InvokeHTTP as a FlowFile. One way to do this is to use a GetFile processor to fetch request.xml from some location on the filesystem and pass the success relationship of GetFile to InvokeHTTP.
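
For comparison, the same HTTP request can be sketched in plain Python. This only builds the request and does not send it; the function name, URL and SOAPAction value are hypothetical, and the `text/xml` content type assumes a SOAP 1.1 service.

```python
import urllib.request


def build_soap_request(url, soap_action, body_xml):
    """Build (but do not send) a POST request carrying a SOAP envelope."""
    return urllib.request.Request(
        url,
        data=body_xml.encode("utf-8"),  # the content of request.xml
        headers={
            # These two headers correspond to the InvokeHTTP dynamic properties.
            "Content-Type": "text/xml; charset=utf-8",  # assumed SOAP 1.1
            "SOAPAction": soap_action,
        },
        method="POST",
    )


request = build_soap_request(
    "http://example.com/service",  # hypothetical endpoint
    "urn:example:GetQuote",        # hypothetical SOAPAction
    "<Envelope/>",                 # placeholder for the real envelope
)
print(request.get_method(), request.full_url)
```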

Q48. How would you check that the FlowFile content size and content length are greater than 1 byte?

Ans: We can use expression language syntax for that, as below.

${fileSize:gt(1)} // checks that the FlowFile size is greater than 1 byte

${content:length():gt(1)} // checks that the length of the "content" attribute is greater than 1; the attribute must first be populated from the FlowFile content, e.g. with the ExtractText processor
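
The two checks are not equivalent: the first compares a size in bytes, the second the length of an attribute value. A rough Python analogue, with hypothetical function names and with a missing attribute treated as empty (an assumption of this sketch), shows the difference:

```python
def file_size_gt(content: bytes, limit: int = 1) -> bool:
    """Analogue of ${fileSize:gt(1)}: compare the content size in bytes."""
    return len(content) > limit


def attribute_length_gt(attributes: dict, name: str = "content", limit: int = 1) -> bool:
    """Analogue of ${content:length():gt(1)}: compare an attribute's character length."""
    return len(attributes.get(name, "")) > limit


text = "é"  # one character, two bytes in UTF-8
print(file_size_gt(text.encode("utf-8")))
print(attribute_length_gt({"content": text}))
```

For multi-byte text the byte-based check can pass while the character-based check does not, so the choice between them depends on what "greater than 1" is meant to measure.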

Q49. How would you distribute lookup data to be used in a dataflow processor?

Ans: You can use the PutDistributedMapCache processor to share common static configuration across various parts of a NiFi flow.

Q50. What is a NiFi custom properties registry?

Ans: To load custom key/value pairs, you can use the custom properties registry, which is configured in a file.

You put the key/value pairs in that file and can then use those properties in your NiFi processors through expression language, e.g. ${OS}, if you have configured that property in the registry file.
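
The lookup mechanism can be pictured with a small sketch: read key/value pairs, then substitute `${name}` references. This is an illustration only and not NiFi's own parser; the function names, the sample key `OS` with value `linux`, and the rule that unknown names resolve to an empty string are all assumptions.

```python
import re


def load_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def resolve(template, props):
    """Replace ${name} references with registry values; unknown names become empty."""
    return re.sub(r"\$\{(\w+)\}", lambda m: props.get(m.group(1), ""), template)


registry = load_properties("# custom properties\nOS=linux\n")
print(resolve("/data/${OS}/incoming", registry))
```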

Q51. How does NiFi support a huge volume of payload in a dataflow?

Ans: A huge volume of data can pass through a dataflow. As data moves through NiFi, only a pointer to the data, referred to as a FlowFile, is passed around. The content of the FlowFile is accessed only as needed.

NiFi reads the data in a streaming fashion to work on it, so heavy memory consumption in the JVM can be avoided.
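
The benefit of streaming can be shown with a generic sketch that processes content chunk by chunk instead of loading it whole. It is plain Python rather than the NiFi API; the function name and chunk size are hypothetical.

```python
import hashlib
import io


def sha256_streaming(stream, chunk_size=8192):
    """Hash a stream chunk by chunk so only about chunk_size bytes are held at a time."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        digest.update(chunk)
    return digest.hexdigest()


payload = io.BytesIO(b"x" * 100_000)  # stands in for large FlowFile content
print(sha256_streaming(payload, chunk_size=1024))
```

Memory use here depends on the chunk size rather than on the size of the payload, which is the property the answer above attributes to NiFi's streaming reads.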

2014-2017 © | Dont Copy , it's bad Karma |