Intel announces their own distribution of Apache Hadoop

Intel decides that they should talk more about big data, and so they did.

Feb 27, 2013 by Leo Yim

Intel decided that they should talk more about big data, and so they did.

The announcement is Intel is beginning to sell their own distribution of Apache Hadoop with support for AES instructions for hardware handling AES encryption rounds on their Xeon CPUs. Available the second quarter of this year along with a set of Intel software that goes with the Hadoop distribution. What they are selling is a subscription model of support in two tiers, each lasting for 12 months, with a Standard tier for non-production environments and the Premium tier for production.

There is a Intel Manager for Apache Hadoop that goes with the distribution, it’s a management console with monitoring, even logging and wizards, and Intel Active Tuner for Apache Hadoop for automatic performance tuning for Hadoop workloads. Intel Labs is also working on Intel Graph Builder for visualizing data sets and the relationships between large volumes of unstructured data.

As for the performance numbers, Intel states that with Intel distribution that is optimized for Intel 10GbE networking adapters, Intel SSD technology and 10 nodes of dual Intel Xeon E5-2690 instead of 1 node of dual Intel Xeon X5690, the time for analyzing 1 TB of data will go from 4 hours to a mere 7 minutes. [Independent testing not provided, these are Intel’s numbers.] Intel also states up to 8.5x speed-up in Hive queries and the product brief mentions a 6.2x speedup and a 19.8x speedup can be achieved in encrypting/decrypting data using an optimized version of OpenSSL with AES instructions support. Intel points out that the sweet-spot for Hadoop would be dual-processor Xeon nodes, but says that one of their China customers put that on 8P server nodes.

Intel promises that there will be more than 20 launching partners, including some big names, and then go on stating their continuous research and investments in building a ecosystem around the big data market. Additionally, Intel claims two major China mobile phone service provider as their customers that pushed the decision for Intel releasing their Apache Hadoop distribution globally. And so you know the rest of the PR talking points.

Intel also talks about bringing enhancements available on their distribution of Hadoop back to the Open Source community. They expect to include enhancements to HDFS, HBase, Hive and Yarn, and say they will submit the enhancements on a regular basis, however, they note they cannot control when the enhancements will appear on the next main Hadoop release.

This announcement shouldn’t come as a surprise, as Intel and others are jumping on the bandwagon of big data, and this announcement to quote from their own words, would be their “initial step towards addressing current challenges in the big data market“. So that means we should be expecting more. However, how does Intel’s offering stack up to other vendors, for instance Cloudera, Oracle and Hortonworks besides the only differentiator, hardware AES instructions? Just because Intel is the leader, will Intel naturally accelerate the adoption of Apache Hadoop, as the partners said in the webcast? We will remain skeptical for the time being.

To make things worse, this announcement gave no love for Atom i n theultra-dense space, since Atom processor cores at the moment are unable to support AES instructions. As we continue to see increased density, per-core performance, performance-per-watt and the natural scalability of such servers towards highly-distributive workloads like Hadoop, Intel will be in a bad position for value ultra-dense server market, even with the hardware AES support and additional performance it brought to the table. Fortunately for Intel, these value ultra-dense servers doesn’t even have ECC memory support to persuade enterprises to make that switch, but that’s going to change soon with ARM Cortex-A50 series CPU cores.

Another challenge would be the inclusion of GPU compute for MapReduce, the programming model behind Hadoop, and worse yet, OpenCL that combines CPU and GPU processing power and work on the same data set. If Intel doesn’t act quickly enough to cover their big data software package to ultra-dense and parallel-processing product lineup, a.k.a Xeon Phi, then the software distribution won’t have much of an impact.

So this is Intel’s first step towards Hadoop infrastructure market. Announcements like this don’t seem as if they will make a huge bang on the Hadoop community, as it’s another distribution which is going to fight for market share. While we have zero information about the actual performance, both in computing performance and business performance, besides what Intel tells everyone, we cannot make conclusive remarks about this Intel distribution of Apache Hadoop, we are going to just buy some more popcorn (or bacon for that matter) instead and keep an eye on this space. S|A

Bio
Latest Posts

Leo Yim

Author at SemiAccurate

Leo Yim is our correspondent from the far flung reaches of East Asia. Fluent in Mandarin, Cantonese and English he'd rather be talking about computers no matter what language. A true detail man he dreams of building gaming rigs from workstation class parts.

Latest posts by Leo Yim (see all)

AMD announces FirePro W9100 and new branding program - Mar 27, 2014
AMD Hawaii-based FirePro to launch - Mar 17, 2014
Microsoft officially turns down Mantle - Oct 16, 2013
AMD to livestream GPU ’14 Product Showcase - Sep 20, 2013
The Microsoft Xbox One backpedaling chronicled - Aug 16, 2013

Thank you, Subscribers!

Thank you to our Subscribers, past and present.

You are appreciated.

You are what keeps SemiAccurate going, what allows us to maintain our journalism, what keeps us ad-free, what allows us to tell it like it is, it is still just you. You, the reader and subscriber, we thank you.

If you want to know more about subscriptions, both free and paid, the information can be found here.

For more on our track record of leading edge journalism see Fully Accurate.
Our Writers

Charlie Demerjian is the founder of Stone Arch Networking Services and S|A.

SemiAccurate.com is a technology news site; addressing hardware design, software selection, customization, security and maintenance, with over one million views per month. He is a technologist and analyst specializing in semiconductors, system and network architecture.

As head writer of SemiAccurate.com, he regularly advises writers, analysts, and industry executives on technical matters and long lead industry trends.

Thomas Ryan is a GIS Programmer and freelance technology writer from Seattle, WA. You can find his work on SemiAccurate and PCWorld.
Tweets from https://twitter.com/SemiAccurate/lists/writers

SemiAccurate

On Target Technology News

Hot Article AMD to differentiate cores

Hot Article Intel foundry customer bails out

Hot Article Coffee Lake is going to impact Intel’s margins

Hot Article SemiAccurate digs up Intel Coffee Lake specs

Intel announces their own distribution of Apache Hadoop

Intel decides that they should talk more about big data, and so they did.

Leo Yim

Latest posts by Leo Yim (see all)