Let’s Explore the Special and Updated Features in Hadoop 3.0
Apache Hadoop is
back with its latest major release, 3.0, which has created a lot of buzz in the
tech world. New Hadoop versions have been released in quick succession over a
very short span of time. The latest version (3.0) of
the open source software framework for scalable, reliable, distributed computing
brings a lot of new features. The Apache community has incorporated many changes in
this release and is still working to make future Hadoop
versions more efficient. In this blog, KVCH introduces
the new features included in the latest version of Hadoop.
Let’s explore these special and updated features in Hadoop 3.0.

In Hadoop 3.0, all
Hadoop JARs are compiled to run on JDK 8, so if you are still using
Java 7 or below, you should upgrade to Java 8 before working with Hadoop
3.0.

Erasure Coding works much like the parity schemes used in RAID: data is split
into cells, and extra parity cells are computed so that the data on a failed
hard disk can be reconstructed automatically. It can be used in place of
replication, providing the same or better fault tolerance with much less
storage overhead. Compared with the default three-way replication, HDFS Erasure
Coding with the Reed-Solomon (6,3) scheme roughly halves physical disk usage
(1.5 bytes of raw storage per byte of data instead of 3) while tolerating the
loss of three units instead of two, a 50% increase in fault tolerance. This new
Hadoop 3.0 feature can save Hadoop customers a lot of money on hardware, since
they can store the same amount of data on about half the raw disk capacity
with HDFS Erasure Coding.
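To make the storage arithmetic concrete, here is a small Python sketch. It compares raw storage per byte of data under three-way replication and under a Reed-Solomon (6,3) layout, and it demonstrates the idea behind parity-based recovery using simple XOR parity. XOR parity is a RAID-5-style single-parity scheme, not the Reed-Solomon codec HDFS actually uses, and the function names are illustrative, not Hadoop APIs.

```python
from functools import reduce
from typing import Sequence


def replication_overhead(replicas: int) -> float:
    """Raw bytes stored per byte of user data with n-way replication."""
    return float(replicas)


def ec_overhead(data_units: int, parity_units: int) -> float:
    """Raw bytes stored per byte of user data with k data + m parity units."""
    return (data_units + parity_units) / data_units


def xor_cells(cells: Sequence[bytes]) -> bytes:
    """XOR equal-length cells together byte by byte (single-parity code)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*cells))


# Storage: three-way replication vs. RS(6,3)
print(replication_overhead(3))  # 3.0 -> survives 2 lost replicas
print(ec_overhead(6, 3))        # 1.5 -> survives any 3 lost units

# Recovery: lose one data cell and rebuild it from the survivors plus parity
data_cells = [b"HADO", b"OP-3", b".0EC"]
parity = xor_cells(data_cells)
lost = 1
survivors = [c for i, c in enumerate(data_cells) if i != lost] + [parity]
recovered = xor_cells(survivors)
print(recovered == data_cells[lost])  # True
```

The 3.0 versus 1.5 ratio is where the "half the disk usage" figure comes from. A real Reed-Solomon code generalizes the XOR trick so that several parity units can rebuild several lost units at once.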

Hadoop 3.0
introduces a major revision of the YARN Timeline Service, v.2. It was developed
to address two major challenges:
- Improving the scalability and reliability of the Timeline Service.
- Enhancing usability by introducing flows and aggregation.
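Conceptually, a flow groups the YARN applications that make up one logical job (for example, several MapReduce jobs launched by one script), and a flow run is one execution of that job; metrics can then be aggregated from individual applications up to the flow run. The Python sketch below models only that grouping idea. The `AppMetrics` record and its fields are hypothetical and do not reflect the Timeline Service v.2 storage schema or REST API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AppMetrics:
    # Hypothetical record: one YARN application tagged with its flow identity
    app_id: str
    user: str
    flow_name: str
    flow_run_id: int
    cpu_vcore_ms: int
    memory_mb_ms: int


def aggregate_flow_runs(apps):
    """Sum per-application metrics up to (user, flow name, flow run)."""
    totals = defaultdict(lambda: {"apps": 0, "cpu_vcore_ms": 0, "memory_mb_ms": 0})
    for app in apps:
        run = totals[(app.user, app.flow_name, app.flow_run_id)]
        run["apps"] += 1
        run["cpu_vcore_ms"] += app.cpu_vcore_ms
        run["memory_mb_ms"] += app.memory_mb_ms
    return dict(totals)


apps = [
    AppMetrics("app_1", "alice", "daily_etl", 1001, 5_000, 20_000),
    AppMetrics("app_2", "alice", "daily_etl", 1001, 3_000, 12_000),
    AppMetrics("app_3", "alice", "daily_etl", 1002, 4_000, 16_000),
]
for key, metrics in aggregate_flow_runs(apps).items():
    print(key, metrics)
```

Keying the totals by flow run rather than by application is what lets a user ask "how much did last night's ETL cost?" instead of summing individual jobs by hand.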

Much of Apache
Hadoop’s functionality is controlled via the shell. The Hadoop shell scripts
have been rewritten to fix many long-standing bugs and include some new
features.
- .out files are now appended to rather than overwritten, as they were in
previous Hadoop releases, which allows for external log rotation.
- All Hadoop shell script subsystems now execute hadoop-env.sh, which allows all
of the environment variables to be kept in one location.
- ‘hadoop distch’, the command for changing ownership and permissions on many
files, now runs as a Hadoop MapReduce job.
- Scripts now test for, and report better error messages on, various states of
the log and pid directories during daemon startup. Previously, unprotected shell
errors would be displayed to the user.

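Shaded client jars solve the
problem of an application’s code dependencies conflicting with Hadoop’s own
dependencies. Hadoop 3.0 separates the server-side jars from the client-side
jars: the new hadoop-client-api and hadoop-client-runtime artifacts shade
Hadoop’s dependencies into a single jar, similar to HBase’s shaded client, so
they no longer leak onto the application’s classpath.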

A notion of Execution Type has been
introduced, whereby applications can now request containers with an
execution type of Opportunistic. Such containers can be dispatched to a
NodeManager even when no resources are free at scheduling time, in which case
they are queued there until resources become available. Opportunistic containers
are of lower priority than the default Guaranteed containers and are therefore
preempted, if needed, to make room for Guaranteed containers. This should
improve cluster utilization.
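The priority rule between the two execution types can be illustrated with a toy single-node model in Python. Everything here is hypothetical: `Node`, `Container`, queuing opportunistic work that does not fit, and the newest-first preemption order are simplifications for illustration, not YARN's NodeManager code or its exact policy. A real NodeManager would also start queued containers once resources free up, which this sketch omits.

```python
from dataclasses import dataclass, field

GUARANTEED = "GUARANTEED"
OPPORTUNISTIC = "OPPORTUNISTIC"


@dataclass
class Container:
    cid: str
    exec_type: str
    mem_mb: int


@dataclass
class Node:
    capacity_mb: int
    running: list = field(default_factory=list)
    queued: list = field(default_factory=list)

    def _used(self, exec_type=None) -> int:
        return sum(c.mem_mb for c in self.running
                   if exec_type is None or c.exec_type == exec_type)

    def start(self, c: Container) -> list:
        """Start or queue a container; return ids of preempted containers."""
        preempted = []
        if c.exec_type == GUARANTEED:
            # Guaranteed work must always fit within the node's capacity
            if self._used(GUARANTEED) + c.mem_mb > self.capacity_mb:
                raise RuntimeError("scheduler over-allocated guaranteed resources")
            # Make room by preempting opportunistic containers, newest first
            while self._used() + c.mem_mb > self.capacity_mb:
                victim = next(x for x in reversed(self.running)
                              if x.exec_type == OPPORTUNISTIC)
                self.running.remove(victim)
                preempted.append(victim.cid)
            self.running.append(c)
        elif self._used() + c.mem_mb <= self.capacity_mb:
            self.running.append(c)  # spare capacity: run opportunistically
        else:
            self.queued.append(c)   # no room: wait instead of being rejected
        return preempted


node = Node(capacity_mb=4096)
node.start(Container("g1", GUARANTEED, 2048))
node.start(Container("o1", OPPORTUNISTIC, 1024))
node.start(Container("o2", OPPORTUNISTIC, 1024))
node.start(Container("o3", OPPORTUNISTIC, 1024))      # node is full, so o3 queues
print(node.start(Container("g2", GUARANTEED, 2048)))  # ['o2', 'o1']
print([c.cid for c in node.running])                  # ['g1', 'g2']
```

The utilization gain comes from the middle branch: idle capacity is filled with opportunistic work, which is cheap to evict when guaranteed demand returns.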

In Hadoop 3.0, a native implementation of the map output collector has been
added. For shuffle-intensive jobs, this may provide speed-ups of 30% or more.

Hadoop 3.0 allows users to run multiple standby NameNodes; previously, HDFS
High Availability supported only one active and one standby NameNode. For
instance, by configuring three NameNodes and five JournalNodes, the cluster is
able to tolerate the failure of two nodes rather than just one.
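The arithmetic behind the three-NameNode, five-JournalNode example can be sketched in a few lines of Python. It assumes the quorum-journal setup, where an edit is committed only once a majority of JournalNodes accept it, and that any single surviving NameNode can take over as active; the function names are illustrative.

```python
def journal_failures_tolerated(journal_nodes: int) -> int:
    """Edits must reach a majority of JournalNodes, so the rest may be down."""
    majority = journal_nodes // 2 + 1
    return journal_nodes - majority


def namenode_failures_tolerated(namenodes: int) -> int:
    """One surviving NameNode is enough to become the active one."""
    return namenodes - 1


for nns, jns in [(2, 3), (3, 5)]:
    print(f"{nns} NameNodes, {jns} JournalNodes: "
          f"NN failures tolerated={namenode_failures_tolerated(nns)}, "
          f"JN failures tolerated={journal_failures_tolerated(jns)}")
```

With the classic two-NameNode, three-JournalNode layout each tier survives one failure; moving to three and five raises both tiers to two.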

Prior to Hadoop
3.0, the default ports for multiple Hadoop services were in the Linux ephemeral
port range and could conflict with other applications running on the same
node. The default ports for the NameNode, DataNode, Secondary NameNode and KMS
have therefore been moved out of the Linux ephemeral port range to avoid bind
errors on startup. This change is intended to improve the reliability of
rolling restarts on large Hadoop clusters.
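Whether a port collides with the ephemeral range is easy to check. The Python sketch below compares several HDFS defaults before and after the change against the host's configured range. The port table lists the commonly documented Hadoop 2.x and 3.0 defaults for these settings, and 32768 to 60999 is only the typical Linux default, so verify both against your own hdfs-default.xml and kernel settings.

```python
# Typical default of net.ipv4.ip_local_port_range on recent Linux kernels;
# the real range on a given host may differ.
DEFAULT_EPHEMERAL_RANGE = (32768, 60999)

# (Hadoop 2.x default, Hadoop 3.0 default) for some HDFS daemon ports.
PORT_CHANGES = {
    "NameNode HTTP": (50070, 9870),
    "NameNode HTTPS": (50470, 9871),
    "Secondary NameNode HTTP": (50090, 9868),
    "Secondary NameNode HTTPS": (50091, 9869),
    "DataNode data transfer": (50010, 9866),
    "DataNode IPC": (50020, 9867),
    "DataNode HTTP": (50075, 9864),
    "DataNode HTTPS": (50475, 9865),
}


def read_ephemeral_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Read the host's ephemeral range, falling back to the typical default."""
    try:
        with open(path) as f:
            low, high = (int(x) for x in f.read().split())
        return low, high
    except (OSError, ValueError):
        return DEFAULT_EPHEMERAL_RANGE


def is_ephemeral(port, port_range):
    low, high = port_range
    return low <= port <= high


rng = read_ephemeral_range()
for name, (old, new) in PORT_CHANGES.items():
    print(f"{name}: {old} ephemeral={is_ephemeral(old, rng)} -> "
          f"{new} ephemeral={is_ephemeral(new, rng)}")
```

If a client or another service grabs, say, port 50070 as its ephemeral source port, a restarting NameNode cannot bind its web UI there; ports below the range avoid that race.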

Hadoop now supports
integration with Microsoft Azure Data Lake and Aliyun Object Storage Service
(OSS) as alternative Hadoop-compatible filesystems.

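A single DataNode
manages multiple disks. During normal writes the disks fill up evenly, but
adding or replacing disks can leave significant skew within a DataNode. The
existing HDFS balancer does not handle this, because it balances data between
DataNodes rather than within one. The new intra-DataNode balancing
functionality, invoked via the hdfs diskbalancer CLI, addresses this intra-node
skew.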
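To illustrate what intra-node skew means, the Python sketch below computes a node-wide target utilization and a greedy list of moves that would even the disks out. It is a hypothetical simplification for explanation only: `Disk` and `plan_moves` are not part of Hadoop, and the planner in Hadoop's own hdfs diskbalancer tool moves HDFS blocks and follows its own planning rules.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    capacity_gb: float
    used_gb: float


def plan_moves(disks):
    """Greedy plan: move data from over-full to under-full disks so that
    every disk ends up at the node-wide utilization ratio."""
    ideal = sum(d.used_gb for d in disks) / sum(d.capacity_gb for d in disks)
    # Positive surplus: the disk holds more than its share at the ideal ratio
    surplus = [(d.name, d.used_gb - ideal * d.capacity_gb) for d in disks]
    sources = sorted(([n, s] for n, s in surplus if s > 0), key=lambda x: -x[1])
    sinks = sorted(([n, -s] for n, s in surplus if s < 0), key=lambda x: -x[1])
    moves, i, j = [], 0, 0
    while i < len(sources) and j < len(sinks):
        amount = min(sources[i][1], sinks[j][1])
        moves.append((sources[i][0], sinks[j][0], round(amount, 2)))
        sources[i][1] -= amount
        sinks[j][1] -= amount
        if sources[i][1] <= 1e-9:
            i += 1
        if sinks[j][1] <= 1e-9:
            j += 1
    return moves


# Two older disks and one freshly replaced (empty) disk
disks = [Disk("disk1", 1000, 800), Disk("disk2", 1000, 700), Disk("disk3", 1000, 0)]
for src, dst, gb in plan_moves(disks):
    print(f"move {gb} GB from {src} to {dst}")
```

Here the node is 50% full overall, so the plan moves 300 GB and 200 GB onto the new disk, leaving all three at 500 GB.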

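Hadoop 3.0
introduces new ways of configuring heap sizes, with a series of changes to heap
management for Hadoop daemons as well as MapReduce tasks. For daemons, the heap
can now be auto-tuned based on the host’s memory size, and the HADOOP_HEAPSIZE
variable has been deprecated in favor of settings such as HADOOP_HEAPSIZE_MAX
and HADOOP_HEAPSIZE_MIN. For map and reduce tasks, the desired heap size no
longer needs to be specified both in the task configuration and as a Java option.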
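For MapReduce tasks, the simplification means a task's heap size can be derived from its container memory request, or the reverse, using a ratio. The Python sketch below mirrors that idea under stated assumptions: a ratio of 0.8 (assumed here to be the default of `mapreduce.job.heap.memory-mb.ratio`), a 1024 MB fallback when neither value is set, and the rounding choices are all assumptions to check against mapred-default.xml for your release, and `resolve_task_memory` itself is hypothetical.

```python
import math
import re
from fractions import Fraction

# Assumptions for this sketch; check mapred-default.xml for your release.
DEFAULT_HEAP_RATIO = 0.8       # assumed mapreduce.job.heap.memory-mb.ratio
FALLBACK_CONTAINER_MB = 1024   # assumed fallback when nothing is configured

_XMX = re.compile(r"-Xmx(\d+)([kKmMgG]?)")


def parse_xmx_mb(java_opts: str):
    """Return the last -Xmx value in java_opts in MB, or None if absent."""
    matches = _XMX.findall(java_opts or "")
    if not matches:
        return None
    value, unit = int(matches[-1][0]), matches[-1][1].lower()
    if unit == "g":
        return value * 1024
    if unit == "m":
        return value
    if unit == "k":
        return value // 1024
    return value // (1024 * 1024)  # no suffix: the JVM reads bytes


def resolve_task_memory(memory_mb: int, java_opts: str, ratio=DEFAULT_HEAP_RATIO):
    """Return (container_mb, heap_mb), filling in whichever one is missing."""
    r = Fraction(str(ratio))  # exact arithmetic avoids float rounding surprises
    heap_mb = parse_xmx_mb(java_opts)
    if memory_mb > 0 and heap_mb is not None:
        return memory_mb, heap_mb                    # both set: nothing to derive
    if memory_mb > 0:
        return memory_mb, math.floor(memory_mb * r)  # heap from container size
    if heap_mb is not None:
        return math.ceil(heap_mb / r), heap_mb       # container from heap size
    return FALLBACK_CONTAINER_MB, math.floor(FALLBACK_CONTAINER_MB * r)


print(resolve_task_memory(4096, ""))           # (4096, 3276)
print(resolve_task_memory(-1, "-Xmx2g"))       # (2560, 2048)
print(resolve_task_memory(2048, "-Xmx1536m"))  # (2048, 1536)
```

The gap between container size and heap leaves room for non-heap JVM memory (thread stacks, metaspace, native buffers), which is why the ratio is below 1.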
For more details about the features of the updated version of Hadoop, visit our
official webpage: Best
Hadoop training