Let’s Explore the Special and Updated Features in Hadoop 3.0


Apache Hadoop is back with its latest update, version 3.0, which has created a lot of buzz in the technology world. Hadoop releases have followed one another in quick succession. Version 3.0 of the open source software framework for scalable, reliable, distributed computing brings many new features. The Apache community has incorporated many changes in this version and is still working to make future Hadoop versions more efficient. In this blog post, KVCH introduces the new features included in the latest version of Hadoop.

Here are the special and updated features in Hadoop 3.0:
*      Minimum Runtime Version for Hadoop 3.0 is JDK 8.

In Hadoop 3.0, all Hadoop JARs are compiled to target Java 8, so users who are still on Java 7 or below must upgrade to Java 8 before they can work with Hadoop 3.0.

*      HDFS Erasure Coding

Erasure Coding works much like an advanced RAID technique: it lets HDFS rebuild data automatically when a disk fails. It can be used in place of replication and provides the same or better fault tolerance with much less storage overhead. With the default 3x replication, every block is stored three times, which is a 200% storage overhead and tolerates the loss of two copies. A Reed-Solomon (6,3) erasure coding policy stores six data blocks plus three parity blocks, which is a 50% overhead and tolerates the loss of any three blocks. In other words, physical disk usage is cut roughly in half (from 3x to 1.5x the data size) while the number of tolerated failures rises by 50% (from two to three). This new Hadoop 3.0 feature can save Hadoop customers a lot of money on hardware, because the same amount of data fits on about half the raw storage.
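To make the storage arithmetic concrete, here is a minimal sketch that compares 3x replication with a Reed-Solomon (6,3) layout. The function names and the 100 TB figure are made up for illustration, and RS(6,3) is only one of the possible policies, so treat this as back-of-the-envelope math rather than anything Hadoop itself runs.

```python
def replication_stats(copies):
    """Return (storage overhead, tolerated losses) for N-way replication."""
    overhead = copies - 1      # extra copies stored per unit of data
    tolerated = copies - 1     # copies that can be lost before data is gone
    return overhead, tolerated


def erasure_coding_stats(data_units, parity_units):
    """Return (storage overhead, tolerated losses) for an RS(data, parity) layout."""
    overhead = parity_units / data_units   # parity stored per unit of data
    tolerated = parity_units               # any `parity_units` blocks may be lost
    return overhead, tolerated


def raw_storage_needed(logical_tb, overhead):
    """Raw disk space (TB) needed to hold `logical_tb` of user data."""
    return logical_tb * (1 + overhead)


if __name__ == "__main__":
    rep_overhead, rep_tolerated = replication_stats(3)
    ec_overhead, ec_tolerated = erasure_coding_stats(6, 3)
    # Hypothetical data set of 100 TB, used only for illustration.
    print("3x replication:", raw_storage_needed(100, rep_overhead), "TB raw,",
          rep_tolerated, "losses tolerated")
    print("RS(6,3):       ", raw_storage_needed(100, ec_overhead), "TB raw,",
          ec_tolerated, "losses tolerated")
```

Under these assumptions the same 100 TB needs 300 TB of raw disk with replication and 150 TB with RS(6,3), which is where the "half the storage" claim comes from.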

*      YARN Timeline Service v.2

Hadoop 3.0 introduces a major revision of the YARN Timeline Service, v.2. It was developed to address two major challenges:
Ø  Improving the scalability and reliability of the Timeline Service.
Ø  Enhancing usability by introducing flows and aggregation.

*      Shell script rewrite

Much of Apache Hadoop’s functionality is controlled via the shell. The Hadoop shell scripts have been rewritten to fix many long-standing bugs and include some new features.

Ø  To enable external log rotation, .out files are now appended to rather than overwritten, as they were in previous Hadoop releases.
Ø  All Hadoop shell script subsystems now execute hadoop-env.sh, which allows all of the environment variables to be kept in one location.
Ø  The ‘hadoop distch’ command, which changes ownership and permissions on many files at once, is now executed as a Hadoop MapReduce job.
Ø  Scripts now test for various states of the log and pid directories on daemon startup and report better error messages. Before, unprotected shell errors would be displayed to the user.

*      Classpath Client-Side Isolation

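It solves the problem of an application’s code dependencies conflicting with Hadoop’s own dependencies. The feature separates the server-side JARs from the client-side JARs and shades the client’s dependencies, similar to the shaded HBase client, so that Hadoop’s dependencies no longer leak onto the application’s classpath.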

*      Support for Opportunistic Containers and Distributed Scheduling

A notion of ExecutionType has been introduced, whereby applications can now request containers with an execution type of Opportunistic. Containers of this type can be dispatched to a NodeManager even if no resources are free at the moment of scheduling. They are of lower priority than the default Guaranteed containers and are therefore preempted, if needed, to make room for Guaranteed containers. This should improve cluster utilization.
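The preemption behaviour is easier to see in a toy model. The sketch below is not the YARN API: `ToyNode`, `Container`, and the memory figures are hypothetical names and numbers invented for this post, and the "most recently started first" victim order is just one simple policy chosen for the example.

```python
from dataclasses import dataclass, field

GUARANTEED = "GUARANTEED"
OPPORTUNISTIC = "OPPORTUNISTIC"


@dataclass
class Container:
    name: str
    execution_type: str
    memory_mb: int


@dataclass
class ToyNode:
    """Hypothetical single node that tracks memory only."""
    capacity_mb: int
    running: list = field(default_factory=list)
    queued: list = field(default_factory=list)

    def free_mb(self):
        return self.capacity_mb - sum(c.memory_mb for c in self.running)

    def launch(self, container):
        """Start a container and return the containers preempted to fit it."""
        if container.memory_mb <= self.free_mb():
            self.running.append(container)
            return []
        if container.execution_type == OPPORTUNISTIC:
            # Opportunistic work never evicts anything; it waits in a queue.
            self.queued.append(container)
            return []
        # Guaranteed work may reclaim memory held by opportunistic containers.
        victims = [c for c in self.running if c.execution_type == OPPORTUNISTIC]
        reclaimable = sum(c.memory_mb for c in victims)
        if container.memory_mb > self.free_mb() + reclaimable:
            raise ValueError("node too small for this guaranteed container")
        preempted = []
        while container.memory_mb > self.free_mb():
            victim = victims.pop()  # most recently started goes first
            self.running.remove(victim)
            preempted.append(victim)
        self.running.append(container)
        return preempted


if __name__ == "__main__":
    node = ToyNode(capacity_mb=4096)
    node.launch(Container("g1", GUARANTEED, 2048))
    node.launch(Container("o1", OPPORTUNISTIC, 1024))
    node.launch(Container("o2", OPPORTUNISTIC, 1024))
    node.launch(Container("o3", OPPORTUNISTIC, 512))   # node is full, so it queues
    evicted = node.launch(Container("g2", GUARANTEED, 1536))
    print([c.name for c in evicted])  # prints ['o2', 'o1']
```

In this model the opportunistic containers soak up otherwise idle memory, and the guaranteed container still gets its resources the moment it asks for them.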

*      MapReduce Task-Level Native Optimization

Hadoop 3.0 adds a native implementation of the map output collector to MapReduce. For shuffle-intensive jobs, this may provide speed-ups of 30% or more.

*      Support for more than two NameNodes

This feature allows users to run multiple standby NameNodes, where earlier releases supported only one active and one standby NameNode. For instance, by configuring three NameNodes and five JournalNodes, the cluster is able to tolerate the failure of two nodes rather than just one.
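The numbers in that example follow from simple counting, sketched below. The function names are invented for this post. The JournalNode rule assumes edits must be written to a majority of JournalNodes, and taking the minimum of the two limits is a simplification for illustration, not a statement about every failure combination.

```python
def journalnode_failures_tolerated(journal_nodes):
    """Edits must reach a majority, so (N - 1) // 2 JournalNodes may fail."""
    return (journal_nodes - 1) // 2


def namenode_failures_tolerated(name_nodes):
    """At least one NameNode must survive to serve the namespace."""
    return name_nodes - 1


def cluster_failures_tolerated(name_nodes, journal_nodes):
    """Simplified view: the weaker of the two limits bounds the cluster."""
    return min(namenode_failures_tolerated(name_nodes),
               journalnode_failures_tolerated(journal_nodes))


if __name__ == "__main__":
    print(cluster_failures_tolerated(2, 3))  # classic HA pair: prints 1
    print(cluster_failures_tolerated(3, 5))  # example from the text: prints 2
```

Note that adding a third NameNode alone is not enough: with three NameNodes but only three JournalNodes, the JournalNode quorum would still limit the cluster to one failure.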

*      Default ports of multiple services have been changed

Prior to Hadoop 3.0, the default ports for multiple Hadoop services were in the Linux ephemeral port range (32768-61000) and could conflict with other applications running on the same node. The default ports for the NameNode, Secondary NameNode, DataNode, and KMS have therefore been moved out of the Linux ephemeral port range to avoid bind errors on startup. This change improves the reliability of rolling restarts on large Hadoop clusters.
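A quick way to see the problem is to check a few old and new defaults against the ephemeral range. This sketch assumes a range of 32768-61000 (the real range is configurable per host), and the port numbers in the table are our recollection of the HDFS web UI defaults, so verify them against hdfs-default.xml for your release before relying on them.

```python
# Assumed Linux ephemeral port range, inclusive; hosts may be configured differently.
EPHEMERAL_RANGE = range(32768, 61001)

# Service -> (default before 3.0, default in 3.0). Values to be verified
# against the hdfs-default.xml that ships with your Hadoop release.
PORT_CHANGES = {
    "NameNode HTTP": (50070, 9870),
    "Secondary NameNode HTTP": (50090, 9868),
    "DataNode HTTP": (50075, 9864),
}


def in_ephemeral_range(port):
    """True if the OS might hand this port to an outgoing connection."""
    return port in EPHEMERAL_RANGE


def risky_ports(ports):
    """Return the ports that could already be taken when a daemon starts."""
    return sorted(p for p in ports if in_ephemeral_range(p))


if __name__ == "__main__":
    for service, (old, new) in PORT_CHANGES.items():
        print(f"{service}: {old} risky={in_ephemeral_range(old)}, "
              f"{new} risky={in_ephemeral_range(new)}")
```

Any port inside the range can be handed out by the kernel to a short-lived client connection, which is why a daemon restarting on such a port could occasionally fail to bind.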
*      Support for Filesystem Connector

Hadoop now supports integration with Microsoft Azure Data Lake and Aliyun Object Storage System as alternative Hadoop-compatible filesystems.

*      Intra-DataNode Balancer

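A single DataNode manages multiple disks. During normal write operation, the disks fill up evenly, but adding or replacing disks can lead to significant skew within a DataNode. The existing HDFS balancer does not handle this situation because it balances data between DataNodes, not within one. The new intra-DataNode balancing functionality addresses this intra-node skew.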
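To get a feel for what "skew" means here, the sketch below scores how far each disk is from the node-wide average utilization and how much data would have to move to even things out. The function names and disk sizes are hypothetical, and this is a simplified metric written for this post, not the code the Hadoop disk balancer runs.

```python
def ideal_utilization(disks):
    """disks is a list of (used_gb, capacity_gb); return the node-wide used ratio."""
    total_used = sum(used for used, _ in disks)
    total_capacity = sum(capacity for _, capacity in disks)
    return total_used / total_capacity


def volume_deviation(disks):
    """Per-disk (ideal - actual) utilization; negative means over-full."""
    ideal = ideal_utilization(disks)
    return [ideal - used / capacity for used, capacity in disks]


def node_skew(disks):
    """Sum of absolute deviations; 0 means perfectly balanced disks."""
    return sum(abs(d) for d in volume_deviation(disks))


def gb_to_move(disks):
    """GB each disk should gain (+) or shed (-) to reach the ideal ratio."""
    ideal = ideal_utilization(disks)
    return [ideal * capacity - used for used, capacity in disks]


if __name__ == "__main__":
    # Two nearly full disks plus one freshly added, empty disk (made-up sizes).
    disks = [(900, 1000), (900, 1000), (0, 1000)]
    print("skew score:", round(node_skew(disks), 3))
    print("GB to move per disk:", [round(gb) for gb in gb_to_move(disks)])
```

A freshly added empty disk shows up as a large positive deviation, and the amounts to move always sum to zero because data is only shuffled between disks on the same node.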

*      Reworked Daemon and Task Heap Management

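Hadoop 3.0 introduces new methods for configuring heap sizes, with a series of changes to heap management for Hadoop daemons as well as MapReduce tasks. For daemons, auto-tuning based on the memory size of the host is now possible, and the HADOOP_HEAPSIZE variable has been deprecated. For map and reduce tasks, the configuration has been simplified so that the desired heap size no longer needs to be specified both in the task configuration and as a Java option.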

For more details about the features of the updated version of Hadoop, visit our official webpage: Best Hadoop training.
