Hadoop Questions-3

Hadoop Interview Questions

Q21. Which object can be used to get the progress of a particular Job?

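Ans: Context. Inside a task, the Context object is used to report and query progress (for example, context.progress() and context.setStatus(String)). On the client side, Job.mapProgress() and Job.reduceProgress() return the job's overall map and reduce progress.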

Q22. What is the next step after the Mapper (MapTask)?

Ans: The Mapper's output is partitioned and sorted by key. One partition is created for each reducer, so the number of partitions equals the number of reduce tasks.
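To make this step concrete, here is a small plain-Java simulation of how one map task's output could be divided into one partition per reducer and then sorted by key within each partition. It has no Hadoop dependency, and its class and method names are hypothetical. The partition formula mirrors the one used by Hadoop's default HashPartitioner. The real framework does this work in an in-memory buffer and in spill files, and the sketch does not model either.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free simulation of map-side partitioning and sorting.
public class MapOutputPartitioning {

    // Same formula as Hadoop's default HashPartitioner: non-negative hash modulo reducers.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Returns one list of (key, value) records per reducer, each sorted by key.
    static List<List<Map.Entry<String, Integer>>> partitionAndSort(
            List<Map.Entry<String, Integer>> mapOutput, int numReducers) {
        List<List<Map.Entry<String, Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new ArrayList<>());
        }
        // Route every record to the partition (reducer) chosen by its key.
        for (Map.Entry<String, Integer> kv : mapOutput) {
            partitions.get(partitionFor(kv.getKey(), numReducers)).add(kv);
        }
        // Sort each partition by key, as the map side does before the shuffle.
        for (List<Map.Entry<String, Integer>> p : partitions) {
            p.sort(Map.Entry.comparingByKey());
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("spark", 1), Map.entry("hadoop", 1),
                Map.entry("nifi", 1), Map.entry("hadoop", 1));
        List<List<Map.Entry<String, Integer>>> parts = partitionAndSort(mapOutput, 2);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("partition " + i + ": " + parts.get(i));
        }
    }
}
```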

Q23. How can we control which Reducer a particular key goes to?

Ans: Users can control which keys (and hence records) go to which Reducer by implementing a custom Partitioner.
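The following is a minimal sketch of a custom partitioning rule. It is written in plain Java so it runs without Hadoop. The SimplePartitioner interface and the AlphabetPartitioner class are hypothetical stand-ins. In a real job, the class would extend org.apache.hadoop.mapreduce.Partitioner, override getPartition(key, value, numPartitions), and be registered with Job.setPartitionerClass(...). The routing rule itself (keys starting with a-m go to reducer 0) is an arbitrary example.

```java
// Hypothetical stand-in for Hadoop's Partitioner<KEY, VALUE>, so the sketch runs without Hadoop.
interface SimplePartitioner<K, V> {
    int getPartition(K key, V value, int numPartitions);
}

// Example rule (an assumption, for illustration): keys starting with a-m go to reducer 0;
// all other keys are spread by hash over reducers 1..numPartitions-1.
class AlphabetPartitioner implements SimplePartitioner<String, Integer> {
    @Override
    public int getPartition(String key, Integer value, int numPartitions) {
        if (numPartitions == 1) {
            return 0; // only one reducer: everything goes there
        }
        char first = key.isEmpty() ? 'z' : Character.toLowerCase(key.charAt(0));
        if (first >= 'a' && first <= 'm') {
            return 0;
        }
        // Non-negative hash over the remaining partitions.
        return 1 + (key.hashCode() & Integer.MAX_VALUE) % (numPartitions - 1);
    }
}

public class CustomPartitionerDemo {
    public static void main(String[] args) {
        SimplePartitioner<String, Integer> p = new AlphabetPartitioner();
        for (String key : new String[] {"apple", "hadoop", "spark", "zebra"}) {
            System.out.println(key + " -> reducer " + p.getPartition(key, 1, 3));
        }
    }
}
```

Whatever rule is used, the returned value must lie in the range 0 to numPartitions - 1.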

Q24. What is the use of Combiner?

Ans: The Combiner is an optional component (class), specified via Job.setCombinerClass(ClassName), that performs local aggregation of the intermediate outputs. This helps cut down the amount of data transferred from the Mapper to the Reducer.
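The effect of a combiner can be sketched in plain Java using word count as the example. Each map emits (word, 1), and a summing combiner collapses repeated words before the shuffle. The sketch has no Hadoop dependency, and its names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free simulation of a word-count combiner.
public class CombinerDemo {

    // Local aggregation over one map task's output: each word's (word, 1) pairs are summed.
    static Map<String, Integer> combine(List<String> mapOutputKeys) {
        Map<String, Integer> combined = new HashMap<>();
        for (String word : mapOutputKeys) {
            combined.merge(word, 1, Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<String> mapOutput = List.of("hadoop", "spark", "hadoop", "hadoop", "nifi");
        Map<String, Integer> combined = combine(mapOutput);
        // Fewer records leave the map task once repeated keys are merged locally.
        System.out.println("records without combiner: " + mapOutput.size());
        System.out.println("records with combiner:    " + combined.size());
        System.out.println(combined);
    }
}
```

The framework may run the combiner zero, one, or several times. A combiner should therefore only perform operations, such as summing, that give the same final result however often they are applied.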

Q25. How many maps are there in a particular Job?

Ans: The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files. A reasonable level of parallelism is generally around 10-100 maps per node. Task setup takes a while, so it is best if each map takes at least a minute to execute. For example, if you expect 10TB of input data and have a block size of 128MB, you will end up with about 82,000 maps. You can use the mapreduce.job.maps parameter to influence the number of maps, but it only provides a hint to the framework. Ultimately, the number of map tasks is controlled by the number of splits returned by the InputFormat.getSplits() method (which you can override).
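The 82,000 figure follows from simple arithmetic: 10TB / 128MB = (10 × 2^40) / 2^27 = 81,920 splits. The plain-Java sketch below reproduces that calculation, and its names are hypothetical. It makes two simplifying assumptions: there is one split per block, and the input is treated as one large file. The split size rule max(minSize, min(maxSize, blockSize)) mirrors FileInputFormat. The sketch ignores per-file handling and the small slack that the real getSplits() allows for the last split.

```java
// Hypothetical, Hadoop-free estimate of the number of map tasks (one per input split).
public class MapCountEstimate {

    // Split size rule as used by FileInputFormat: max(minSize, min(maxSize, blockSize)).
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Ceiling division: a partial last split still needs its own map task.
    static long estimateMaps(long totalInputBytes, long splitSize) {
        return (totalInputBytes + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long tb = mb * 1024 * 1024;
        // Default-like settings: minSize 1, maxSize unbounded, so split size == block size.
        long split = splitSize(128 * mb, 1, Long.MAX_VALUE);
        System.out.println(estimateMaps(10 * tb, split)); // 81920, i.e. about 82,000
    }
}
```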

Q26. What is the Reducer used for?

Ans: The Reducer reduces a set of intermediate values that share a key to a (usually smaller) set of values. The number of reduce tasks for the job is set by the user via Job.setNumReduceTasks(int).

Q27. Explain the core methods of the Reducer.

Ans: The API of Reducer is very similar to that of Mapper. There is a run() method that receives a Context containing the job's configuration, as well as interfacing methods that return data from the reducer back to the framework. The run() method calls setup() once, reduce() once for each key associated with the reduce task, and cleanup() once at the end. Each of these methods can access the job's configuration data by using Context.getConfiguration().

As in Mapper, any or all of these methods can be overridden with custom implementations. If none of these methods are overridden, the default reducer operation is the identity function: values are passed through without further processing.

The heart of Reducer is its reduce() method. It is called once per key; the second argument is an Iterable that returns all the values associated with that key.
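The run() loop described above can be mimicked in plain Java. SimpleReducer and SumReducer are hypothetical classes that mirror the Reducer lifecycle without Hadoop's Context parameter. In the base class, reduce() is the identity. The subclass overrides only reduce() to sum the values for each key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free mimic of the Reducer lifecycle.
class SimpleReducer {
    protected final List<String> output = new ArrayList<>();

    protected void setup() { }   // called once before any reduce() call

    // Default behaviour is the identity: every value is emitted unchanged with its key.
    protected void reduce(String key, Iterable<Integer> values) {
        for (int v : values) {
            output.add(key + "\t" + v);
        }
    }

    protected void cleanup() { } // called once after the last reduce() call

    // Input is grouped and sorted by key, as it would be after the shuffle and sort phases.
    public List<String> run(TreeMap<String, List<Integer>> groupedInput) {
        setup();
        for (Map.Entry<String, List<Integer>> e : groupedInput.entrySet()) {
            reduce(e.getKey(), e.getValue()); // once per key
        }
        cleanup();
        return output;
    }
}

// Overrides only reduce(): sums the values that share a key (word-count style).
class SumReducer extends SimpleReducer {
    @Override
    protected void reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        output.add(key + "\t" + sum);
    }
}

public class ReducerLifecycleDemo {
    static TreeMap<String, List<Integer>> sampleInput() {
        TreeMap<String, List<Integer>> input = new TreeMap<>();
        input.put("spark", List.of(1));
        input.put("hadoop", List.of(1, 1, 1));
        return input;
    }

    public static void main(String[] args) {
        System.out.println(new SimpleReducer().run(sampleInput())); // identity: values passed through
        System.out.println(new SumReducer().run(sampleInput()));    // one summed record per key
    }
}
```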

Q28. What are the primary phases of the Reducer?

Ans: Shuffle, Sort and Reduce

Q29. Explain the Shuffle phase.

Ans: Input to the Reducer is the sorted output of the mappers. In this phase, the framework fetches the relevant partition of the output of all the mappers via HTTP.

Q30. Explain the Reducer’s Sort phase.

Ans: In this stage, the framework groups Reducer inputs by key (since different mappers may have output the same key). The shuffle and sort phases occur simultaneously: while map outputs are being fetched, they are merged (similar to a merge sort).
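The following plain-Java sketch shows the merge described above; it has no Hadoop dependency, and its names are hypothetical. Each fetched map output segment is already sorted by key, so a k-way merge with a priority queue produces one sorted stream. Consecutive records with the same key are then grouped into one list, ready for a single reduce() call. The real framework also spills to disk and merges in multiple passes, which is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical, Hadoop-free simulation of the reduce-side merge and grouping.
public class ShuffleSortMerge {

    // Merges k key-sorted segments and groups values by key, preserving key order.
    static List<Map.Entry<String, List<Integer>>> mergeAndGroup(
            List<List<Map.Entry<String, Integer>>> segments) {
        // Each queue element is {segmentIndex, position}, ordered by the key at that position.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
                (a, b) -> segments.get(a[0]).get(a[1]).getKey()
                                  .compareTo(segments.get(b[0]).get(b[1]).getKey()));
        for (int s = 0; s < segments.size(); s++) {
            if (!segments.get(s).isEmpty()) {
                heads.add(new int[] {s, 0});
            }
        }
        List<Map.Entry<String, List<Integer>>> grouped = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] h = heads.poll(); // smallest key among all segment heads
            Map.Entry<String, Integer> kv = segments.get(h[0]).get(h[1]);
            int last = grouped.size() - 1;
            if (last >= 0 && grouped.get(last).getKey().equals(kv.getKey())) {
                grouped.get(last).getValue().add(kv.getValue()); // same key: extend the group
            } else {
                List<Integer> values = new ArrayList<>();
                values.add(kv.getValue());
                grouped.add(Map.entry(kv.getKey(), values));     // new key: start a group
            }
            if (h[1] + 1 < segments.get(h[0]).size()) {
                heads.add(new int[] {h[0], h[1] + 1});           // advance within that segment
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> fetched = List.of(
                List.of(Map.entry("hadoop", 1), Map.entry("spark", 1)), // from map 0
                List.of(Map.entry("hadoop", 1), Map.entry("nifi", 1)));  // from map 1
        System.out.println(mergeAndGroup(fetched));
    }
}
```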


