Hadoop will be coming to enterprise data centers soon as the big data bandwagon picks up steam. Speed of deployment is crucial. How fast can you deploy Hadoop and deliver business value?
Big data refers to running analytics against large volumes of unstructured data of all sorts to get closer to the customer, combat fraud, mine new opportunities, and more. Published reports have companies spending $4.3 billion on big data technologies by the end of 2012. But big data begets more big data, triggering even more spending, which Gartner estimates will hit $34 billion in 2013 and reach as much as $232 billion over a five-year period.
Most enterprises deploy Hadoop on large farms of commodity Intel servers. But that doesn’t have to be the case. Any server capable of running Java and Linux can handle Hadoop. The mainframe, for instance, should make an ideal Hadoop host because of the sheer scalability of the machine. The same goes for IBM’s Power line or the big servers from Oracle/Sun and HP, including HP’s new top-of-the-line Itanium server.
At its core, Hadoop is a Linux-based Java program and is usually deployed on x86-based systems. The Hadoop community has effectively disguised Hadoop's complexity to speed adoption by the mainstream IT community. Sqoop imports data from relational databases into Hadoop. Hive lets you query that data using a SQL-like language called HiveQL. Pig is a high-level platform for creating the MapReduce programs used with Hadoop. With tools like these, any competent data center IT group could embark on Hadoop big data initiatives.
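To give a sense of what tools like Pig and Hive hide, here is a minimal sketch of the MapReduce pattern itself, written in plain Python rather than against Hadoop's actual Java API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are illustrative assumptions, and everything runs in a single process, whereas Hadoop distributes the same three steps across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    In Hadoop the framework does this between map and reduce;
    here it is a simple in-memory dictionary (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data begets more big data", "Hadoop handles big data"]
    print(word_count(sample))
```

A HiveQL query or a Pig script expresses this kind of grouping and counting in a line or two, which is why those tools lower the barrier for IT groups that know SQL but not MapReduce programming.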
Big data analytics, however, doesn’t even require stock Apache Hadoop. Commercial distributions and alternatives such as Hortonworks Data Platform (HDP), MapR, IBM GPFS-SNC (Shared Nothing Cluster), Lustre, HPCC Systems, Backtype Storm (acquired by Twitter), and three from Microsoft (Azure Table, Project Daytona, LINQ) all promise big data analytics capabilities.
Appliances are shaping up as an increasingly popular way to get big data deployed fast. Appliances trade flexibility for speed and ease of deployment. Because the hardware and software arrive pre-configured and integrated, the system is ready to run right out of the box. The appliance typically comes with built-in analytics software that effectively masks big data complexity.
For enterprise data centers, the three primary big data appliance players are:
- IBM: PureData, the newest member of its PureSystems family of expert systems. PureData is delivered as an appliance that promises to let organizations quickly analyze petabytes of data and then intelligently apply those insights in addressing business issues across their organization. The machines come as three workload-specific models, optimized for transactional, operational, or big data analytics workloads.
- Oracle: the Oracle Big Data Appliance is an engineered system optimized for acquiring, organizing, and loading unstructured data into Oracle Database 11g. It combines optimized hardware components with new software to deliver a big data solution. It incorporates Cloudera’s Apache Hadoop with Cloudera Manager. A set of connectors is also available to help with the integration of data.
- EMC: the Greenplum modular data computing appliance includes Greenplum Database for structured data, Greenplum HD for unstructured data, and DIA Modules for Greenplum partner applications such as business intelligence (BI) and extract, transform, and load (ETL) applications. All of these are configured into one appliance cluster via a high-speed, high-performance, low-latency interconnect.
And there are more. HP offers HP AppSystem for Apache Hadoop, an enterprise-ready appliance that simplifies and speeds deployment while optimizing performance and analysis of extreme scale-out Hadoop workloads. NetApp offers an enterprise-class Hadoop appliance that may be the best bargain given NetApp’s inclusive storage pricing approach.
As much as enterprise data centers loathe deploying appliances, if you are under pressure to get on the big data bandwagon fast and start showing business value almost immediately, appliances will be your best bet. And there are plenty to choose from.