Intel announces their own distribution of Apache Hadoop

Intel decides that they should talk more about big data, and so they did.

Intel decided that they should talk more about big data, and so they did.

The announcement is Intel is beginning to sell their own distribution of Apache Hadoop with support for AES instructions for hardware handling AES encryption rounds on their Xeon CPUs.  Available the second quarter of this year along with a set of Intel software that goes with the Hadoop distribution. What they are selling is a subscription model of support in two tiers, each lasting for 12 months, with a Standard tier for non-production environments and the Premium tier for production.

There is a Intel Manager for Apache Hadoop that goes with the distribution, it’s a management console with monitoring, even logging and wizards, and Intel Active Tuner for Apache Hadoop for automatic performance tuning for Hadoop workloads. Intel Labs is also working on Intel Graph Builder for visualizing data sets and the relationships between large volumes of unstructured data.

As for the performance numbers, Intel states that with Intel distribution that is optimized for Intel 10GbE networking adapters, Intel SSD technology and 10 nodes of dual Intel Xeon E5-2690 instead of 1 node of dual Intel Xeon X5690, the time for analyzing 1 TB of data will go from 4 hours to a mere 7 minutes. [Independent testing not provided, these are Intel’s numbers.] Intel also states up to 8.5x speed-up in Hive queries and the product brief mentions a 6.2x speedup and a 19.8x speedup can be achieved in encrypting/decrypting data using an optimized version of OpenSSL with AES instructions support. Intel points out that the sweet-spot for Hadoop would be dual-processor Xeon nodes, but says that one of their China customers put that on 8P server nodes.

Intel promises that there will be more than 20 launching partners, including some big names, and then go on stating their continuous research and investments in building a ecosystem around the big data market. Additionally, Intel claims two major China mobile phone service provider as their customers that pushed the decision for Intel releasing their Apache Hadoop distribution globally. And so you know the rest of the PR talking points.

Intel Hadoop Distribution

Intel also talks about bringing enhancements available on their distribution of Hadoop back to the Open Source community.  They expect to include enhancements to HDFS, HBase, Hive and Yarn, and say they will submit the enhancements on a regular basis, however, they note they cannot control when the enhancements will appear on the next main Hadoop release.

This announcement shouldn’t come as a surprise, as Intel and others are jumping on the bandwagon of big data, and this announcement to quote from their own words, would be their “initial step towards addressing current challenges in the big data market“. So that means we should be expecting more. However, how does Intel’s offering stack up to other vendors, for instance Cloudera, Oracle and Hortonworks besides the only differentiator, hardware AES instructions? Just because Intel is the leader, will Intel naturally accelerate the adoption of Apache Hadoop, as the partners said in the webcast? We will remain skeptical for the time being.

To make things worse, this announcement gave no love for Atom i n theultra-dense space, since Atom processor cores at the moment are unable to support AES instructions. As we continue to see increased density, per-core performance, performance-per-watt and the natural scalability of such servers towards highly-distributive workloads like Hadoop, Intel will be in a bad position for value ultra-dense server market, even with the hardware AES support and additional performance it brought to the table. Fortunately for Intel, these value ultra-dense servers doesn’t even have ECC memory support to persuade enterprises to make that switch, but that’s going to change soon with ARM Cortex-A50 series CPU cores.

Another challenge would be the inclusion of GPU compute for MapReduce, the programming model behind Hadoop, and worse yet, OpenCL that combines CPU and GPU processing power and work on the same data set. If Intel doesn’t act quickly enough to cover their big data software package to ultra-dense and parallel-processing product lineup, a.k.a Xeon Phi, then the software distribution won’t have much of an impact.

So this is Intel’s first step towards Hadoop infrastructure market.  Announcements like this don’t seem as if they will make a huge bang on the Hadoop community, as it’s another distribution which is going to fight for market share. While we have zero information about the actual performance, both in computing performance and business performance, besides what Intel tells everyone, we cannot make conclusive remarks about this Intel distribution of Apache Hadoop, we are going to just buy some more popcorn (or bacon for that matter) instead and keep an eye on this space. S|A

The following two tabs change content below.

Leo Yim

Author at SemiAccurate
Leo Yim is our correspondent from the far flung reaches of East Asia. Fluent in Mandarin, Cantonese and English he'd rather be talking about computers no matter what language. A true detail man he dreams of building gaming rigs from workstation class parts.