
Hadoop community working together to make Docker faster


May 3, 2014

The Hadoop community said late yesterday that it is working on a series of new patches to bring Docker into the data management system, and independent benchmarks already show the technology running considerably faster than traditional server virtualization methods.

Docker is an open source Linux containerization technology that uses kernel features such as namespaces and cgroups (along with tooling like LXC) to let a system administrator run multiple applications, with all their dependencies, in secure sandboxes on the same underlying Linux operating system. That makes it an attractive alternative to server virtualization, which bundles a copy of the OS with each app.
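
As a rough sketch of what that looks like in practice (this assumes a host with Docker installed, and the image and container names are illustrative, not taken from the article), two unrelated applications can be sandboxed side by side with a couple of commands:

```shell
# Each container gets its own filesystem, process table and network
# namespace, but shares the host's kernel -- no guest OS is booted.
# Image names below are hypothetical examples.
docker run -d --name web  -m 512m example/web-app:1.0
docker run -d --name jobs -m 256m example/batch-worker:2.1

# Both sandboxes show up to the admin as ordinary containers:
docker ps
```

Because no per-app operating system copy is involved, starting or stopping a sandbox is closer in cost to launching a process than to booting a virtual machine.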

In a set of benchmarks that an IBM engineer released on Thursday, Big Blue demonstrated that Docker containerization holds some large performance advantages over the KVM hypervisor.

Alongside this, we also discovered some pretty impressive work by the Hadoop community to bring the technology into the eponymous data analysis and management engine.

This will add more punch to the idea that Docker could become an eventual replacement for traditional server virtualization approaches, granting businesses huge benefits from an open source technology.

To start with, benchmarks conducted by IBM show that Docker has a number of performance advantages over the KVM hypervisor when running on the open source cloud infrastructure tool OpenStack.

An informative post published by IBM's Boden Russell goes into further details about the results. "From an OpenStack Cloudy operational time perspective (boot, reboot, delete, snapshot, etc.) docker LXC outperformed KVM ranging from 1.09x (delete) to 49x (reboot)," Russell wrote.

"Based on the compute node resource usage metrics during the serial VM packing test, Docker LXC CPU growth is approximately 26 times lower than KVM. On the surface, this indicates a 26x density potential increase from a CPU point of view using docker LXC vs a traditional hypervisor. Docker LXC memory growth is approximately 3 times lower than KVM. On the surface, this indicates a 3x density potential increase from a memory point of view using docker LXC vs a traditional hypervisor," he added.

Not only does Docker have desirable resource-usage characteristics, but the way it allows developers to package applications has attracted attention from the open source Hadoop community.

Recently, we learned that some developers are diligently working to add Docker support to a crucial component of Apache Hadoop 2.0 named YARN, with the goal of increasing the usefulness of both technologies.

YARN was introduced in version two of Apache Hadoop, and it lets the software run multiple kinds of applications within Hadoop rather than only MapReduce jobs.

Thanks to this, YARN is helping to transform Hadoop from a batch processing and storage system into a more general tool for manipulating and storing data.

By combining YARN with Docker, the community hopes to make it trivial for developers to package an application in a Docker container, then deploy it through YARN as part of a larger Hadoop installation.

Altiscale, the company behind the code contributions that make this possible, was kind enough to answer some of our questions about why this could be useful.

"As a company building a Hadoop-as-a-Service (HaaS) platform, we are particularly interested in YARN as it allows Hadoop to move beyond map-reduce to a much more diverse variety of applications," explained the company's chief executive Raymie Stata.

"One of the key components of YARN that make this possible are containers. The existing YARN container implementation does not adequately provide all the types of isolation required to address a scenario we are noticing with our larger customers: multiple, independent groups in the same organization with different software requirements."

By adding Docker support, Altiscale hopes that it can flatten some of the barriers that lie between enterprise developers and a greater utilization of Hadoop.

"For example, a common issue for users is software dependency management," Stata explained. "Docker provides an intriguing approach to solving that problem by allowing users to upload prepackaged environments or images into repositories which can then easily be downloaded and run in isolation".

"For instance, there are public repositories in the Docker community called Docker Registries which provide a variety of language environments such as Java and Ruby. There is also support for private repositories where containers with more specialized environments can be placed," he added.
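
To illustrate the workflow Stata describes (the file contents, image name, and registry target below are illustrative assumptions, not details from Altiscale), packaging an environment and publishing it to a registry looks roughly like this:

```shell
# A minimal, hypothetical Dockerfile that bundles an application
# together with its own Java runtime, so the prepackaged environment
# travels with the app.
cat > Dockerfile <<'EOF'
FROM java:7
COPY myapp.jar /opt/myapp.jar
CMD ["java", "-jar", "/opt/myapp.jar"]
EOF

# Build the image locally, then upload it to a registry so colleagues
# can download it and run it in isolation.
docker build -t example/myapp:1.0 .
docker push example/myapp:1.0
```

Anyone with access to the registry can then pull and run the image without reproducing the dependency setup by hand, which is exactly the dependency-management pain Stata points to.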

Other members of the Hadoop community are keen on the addition of Docker as well. "Where Docker makes perfect sense for YARN is that we can use Docker Images to fully describe the entire Unix filesystem image for any YARN container," explained Arun Murthy, a founder and system architect at Hortonworks.

"In this manner, instead of forcing a user to deal with individual files or binaries as things stand today, we can allow the application to package the entire Unix filesystem image it needs as a Docker image and then get perfect predictability from an environment perspective at runtime."

"This is where Docker has the most amount of interest to the YARN/Hadoop community, particularly for users packaging complex applications which need their own version of Perl, Python, Java, Libc etc, that is hard to manage on YARN currently," he said.
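
The patches Murthy alludes to were still in flight at the time of writing, so the final interface is not settled; the general shape of the proposed integration, though, is to swap YARN's NodeManager container executor for a Docker-backed one in yarn-site.xml. The class and property names below follow the proposed Docker executor work and should be read as an illustrative sketch, not a confirmed final API:

```xml
<!-- Illustrative yarn-site.xml fragment; names follow the proposed
     Docker executor patches and may differ in released versions. -->
<configuration>
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.DockerContainerExecutor</value>
  </property>
  <property>
    <!-- Default Docker image supplying the full Unix filesystem that
         YARN containers see at runtime -->
    <name>yarn.nodemanager.docker-container-executor.image-name</name>
    <value>example/hadoop-runtime:2.4</value>
  </property>
</configuration>
```

With an executor along these lines, the image named per application (rather than individual files or binaries) becomes the unit of environmental predictability Murthy describes.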

The addition of Docker to YARN looks like a potentially useful tool and is another example of the enthusiasm with which Silicon Valley has adopted the young open source technology.

This follows Red Hat announcing broad support for Docker in its eponymous Linux distribution and launching a project named "Atomic" built around the technology. Amazon also recently added Docker support to its "Elastic Beanstalk" platform-as-a-service cloud.

In other IT and open source news

System admins in Microsoft's cloud data centre for Western Europe have spent the morning battling severe issues with the server equipment that supports the company's main cloud service.

Problems with the core Compute and Storage components were first reported at 9:39 am, according to the Windows Azure Status Dashboard, when Microsoft said it had received an alert for SQL Databases, Compute and Storage in West Europe.

Microsoft later admitted that the issues meant that customers could "experience issues accessing services". It described the Compute problems as a "Partial Service Interruption Limited Impact", which is a Microsoft euphemism for a fraction of its technology not working correctly.

Storage was termed a "Partial Service Interruption" and not given a qualifier, so it's likely that more customers were hit by the issues.

As Compute, Storage, and SQL Databases are fundamental building blocks for any cloud infrastructure, this is a severe problem.

As of 2:54 pm Microsoft said it had "partially restored the services and continue to see improvements to Storage availability".

It indicated that the Compute services were mostly fixed, noting: "We have confirmed recovery for Compute availability. A very small subset of IaaS Virtual Machines may be affected. We are validating the restoration steps."

In other IT news

The University of Toronto has criticised Canada's Internet Service Providers for unnecessarily routing user traffic via the United States, even when both the origin and the destination of the traffic are within Canada.

Mirroring European concerns about why traffic should traverse the U.S. when it doesn't need to, the Canadian transparency study blames an unwillingness among ISPs to peer domestically for sending traffic within reach of the NSA.

The University of Toronto's Andrew Clement and Jonathan Obar have put together the report along with an interactive map, in which they rate Canadian ISPs on various transparency characteristics.

The ratings, the report says, are based on how easily users can find information including an ISP's compliance with data privacy legislation, how they report data access requests, how well they define personal information, information about where user data is stored, etc.

Against the 10 criteria used in the assessment, nobody scored highly: the best was Teksavvy, with just 3.5 stars out of a potential ten, followed by Primus on three stars.

None of the ISPs tested provide transparency reporting, and the researchers say none of the 20 carriers they examined are in full compliance with Canada's PIPEDA (Personal Information Protection and Electronic Documents Act) privacy law.

About routing, the report states: “Fewer than half of the ISP privacy policies refer to the location and jurisdiction for the information they store. Only one (Hurricane) gives an indication of where it routes customer data and none make explicit that they may route data via the US where it is subject to NSA surveillance”.

Source: The Hadoop Development Team.


       © IT Direction. All rights reserved.