Posts Tagged Hadoop

Winning the Coming Talent War Mainframe Style

The next frontier in the ongoing talent war, according to McKinsey, will be deep analytics, a critical weapon required to probe big data in the competition underpinning new waves of productivity, growth, and innovation. Are you ready to compete and win in this technical talent war?

Similarly, Information Week contends that data expertise is needed to take advantage of data mining, text mining, forecasting, and machine learning techniques. As it turns out, the mainframe is ideally positioned to win if you can attract the right talent.

Finding, hiring, and keeping good talent within the technology realm is the number one concern cited by 41% of senior executives, hiring managers, and team leaders responding to the latest Harris Allied Tech Hiring and Retention Survey. Retention of existing talent was the next biggest concern, cited by 19.1%.

This past fall, CA published the results of its latest mainframe survey that came to similar conclusions. It found three major trends on the current and future role of the mainframe:

  1. The mainframe is playing an increasingly strategic role in managing the evolving needs of the enterprise
  2. The mainframe as an enabler of innovation as big data and cloud computing transform the face of enterprise IT
  3. Demand for tech talent with cross-disciplinary skills to fill critical mainframe workforce needs in this new view of enterprise IT

In the CA survey, 76% of global respondents believe their organizations will face a shortage of mainframe skills in the future, yet almost all respondents, 98%, felt their organizations were moderately or highly prepared to ensure the continuity of their mainframe workforce. Meanwhile, only 8% indicated having great difficulty finding qualified mainframe talent, while 61% reported having some difficulty in doing so.

The Harris survey was conducted in September and October 2012. Its message is clear: don’t be fooled by the national unemployment figures, currently hovering above 8%. “In the technology space in particular, concerns over the ability to attract game-changing talent has become institutional and are keeping all levels of management awake at night,” notes Harris Allied Managing Director Kathy Harris.

The reason, as suggested in recent IBM studies, is that success with critical new technologies around big data, analytics, cloud computing, social business, virtualization, and mobile increasingly gives top-performing organizations their competitive advantage. The lingering recession, however, has taken its toll: unless your data center has been charged to proactively keep up, it is probably saddled with 5-year-old skills at best and 10-year-old skills more likely.

The Harris study picked up on this. When asked the primary reason they thought people left their organization, 20% of respondents said people left for more exciting job opportunities or the chance to get their hands on some hot new technology.

Some companies recognize the problem and belatedly are trying to get back into the tech talent race. When Harris asked what companies are doing to attract this kind of top talent, 38% said they were now offering great opportunities for career growth. Others, 28%, were offering opportunities for professional development to recruit top tech pros. A smaller number, 24.5%, were offering competitive compensation packages, while fewer still, 9%, were offering competitive benefits packages.

To retain the top tech talent they already had, 33.6% were offering opportunities for professional development, the single most common strategy used to retain employees. Others, 24.5%, offered opportunities for career advancement, while 23.6% offered competitive salaries. A few still hoped a telecommuting option or competitive bonuses would do the trick.

Clearly mainframe shops, like IT in general, are facing a transition as Linux, Java, SOA, cloud computing, analytics, big data, mobile, and social play increasing roles in the organization and the mainframe gains the capabilities to play in all these arenas. Advanced mainframe skills like CICS are great, but they are just a start. You also need REST, Hadoop, and a slew of mobile, cloud, and data management skill sets. At the same time, hybrid systems and expert integrated systems like IBM PureSystems and zEnterprise/zBX give shops the ability to tap a broader array of tech talent while baking in much of the expertise required.



Speed Time to Big Data with Appliances

Hadoop will be coming to enterprise data centers soon as the big data bandwagon picks up steam. Speed of deployment is crucial. How fast can you deploy Hadoop and deliver business value?

Big data refers to running analytics against large volumes of unstructured data of all sorts to get closer to the customer, combat fraud, mine new opportunities, and more. Published reports have companies spending $4.3 billion on big data technologies by the end of 2012. But big data begets more big data, triggering even more spending, estimated by Gartner to hit $34 billion for 2013 and over a 5-year period to reach as much as $232 billion.

Most enterprises deploy Hadoop on large farms of commodity Intel servers. But that doesn’t have to be the case. Any server capable of running Java and Linux can handle Hadoop. The mainframe, for instance, should make an ideal Hadoop host because of the sheer scalability of the machine. Same with IBM’s Power line or the big servers from Oracle/Sun and HP, including HP’s new top of the line Itanium server.

At its core, Hadoop is a Java program that typically runs on Linux and is usually deployed on x86-based systems. The Hadoop community has effectively disguised Hadoop to speed adoption by the mainstream IT community through tools like Sqoop, which imports data from relational databases into Hadoop, and Hive, which enables you to query the data using a SQL-like language called HiveQL. Pig is a high-level platform for creating the MapReduce programs used with Hadoop. So any competent data center IT group could embark on Hadoop big data initiatives.

Big data analytics, however, doesn’t even require the stock Apache Hadoop distribution. Commercial Hadoop distributions and alternatives like Hortonworks Data Platform (HDP), MapR, IBM GPFS-SNC (Shared Nothing Cluster), Lustre, HPCC Systems, Backtype Storm (acquired by Twitter), and three from Microsoft (Azure Table, Project Daytona, LINQ) all promise big data analytics capabilities.

Appliances are shaping up as an increasingly popular way to get big data deployed fast. Appliances trade flexibility for speed and ease of deployment. By packaging hardware and software that are pre-configured and integrated, they are ready to run right out of the box. The appliance typically comes with built-in analytics software that effectively masks big data complexity.

For enterprise data centers, there are three primary big data appliance players:

  • IBM: PureData, the newest member of its PureSystems family of expert systems. PureData is delivered as an appliance that promises to let organizations quickly analyze petabytes of data and then intelligently apply those insights in addressing business issues across their organization. The machines come as three workload-specific models optimized for transactional, operational, or big data analytics workloads.
  • Oracle: the Oracle Big Data Appliance is an engineered system optimized for acquiring, organizing, and loading unstructured data into Oracle Database 11g. It combines optimized hardware components with new software to deliver a big data solution. It incorporates Cloudera’s Apache Hadoop with Cloudera Manager. A set of connectors is also available to help with the integration of data.
  • EMC: the Greenplum modular data computing appliance includes Greenplum Database for structured data, Greenplum HD for unstructured data, and DIA Modules for Greenplum partner applications such as business intelligence (BI) and extract, transform, and load (ETL) applications, all configured into one appliance cluster via a high-speed, high-performance, low-latency interconnect.

And there are more. HP offers HP AppSystem for Apache Hadoop, an enterprise-ready appliance that simplifies and speeds deployment while optimizing performance and analysis of extreme scale-out Hadoop workloads. NetApp offers an enterprise-class Hadoop appliance that may be the best bargain given NetApp’s inclusive storage pricing approach.

As much as enterprise data centers loathe deploying appliances, if you are under pressure to get on the big data bandwagon fast and start showing business value almost immediately, appliances will be your best bet. And there are plenty to choose from.


Next Up: Dynamic Data Warehousing

Enterprise data warehousing (EDW) has been around for well over a decade. IBM has long promoted it across all its platforms. So have Oracle and HP and many others.

The traditional EDW, however, has been sidelined even at a time when data is exploding at a tremendous rate and new data types, from sensor data to smartphone and social media data to video data, are becoming common. IBM recently projected a 44-fold increase in data and content, reaching 35 zettabytes by 2020. In short, the world of data has changed dramatically since organizations began building conventional data warehouses. Now the EDW should accommodate these new types of data and be flexible enough to handle rapidly changing forms of data.

Data warehousing as it is mainly practiced today is too complex, too difficult to deploy, too dependent on tuning, and too inefficient when it comes to bringing in analytics, observed Phil Francisco, VP at Netezza, an IBM acquisition that makes data warehouse appliances. The result is a delay in delivering the answers from the EDW that business managers need. And without fast analytics to deliver business insights, well, what’s the point?

In addition, the typical EDW requires too many people to maintain and administer, which makes it too costly, Francisco continued. Restructuring the conventional EDW to accommodate new data types and new data formats—in short, a new enterprise data model—is a mammoth undertaking that companies wisely shy away from. But IBM is moving beyond basic EDW to something Francisco describes as an enterprise data hub, which entails an enterprise data store surrounded by myriad special purpose data marts and special purpose processors for various analytics and such.

IBM’s recommendation: evolve the traditional enterprise data warehouse into what it calls the enterprise data hub, a more flexible systems architecture. This will entail consolidating the infrastructure and reducing the data mart sprawl. It also will simplify analytics, mainly by deploying analytic appliances like IBM’s Netezza. Finally, organizations will need data governance and lifecycle management, probably through automated policy-based controls. The result should be better information faster and delivered in a more flexible and cost-effective way.

Ultimately, IBM wants to see organizations build out this enterprise data hub with a variety of BI and analytic engines connected to it for analyzing streamed data and vast amounts of unstructured data of the type Hadoop has shown itself particularly good at handling. BottomlineIT wrote about Hadoop in the enterprise back in February here.

The payback from all of this, according to IBM, will be increased enterprise agility and faster deployment of analytics, which should result in increased business performance. The consolidated enterprise data warehouse also should lower the TCO for the EDW and speed time to business value. All desirable things, no doubt, but for many organizations this will require a gradual process and a significant investment in new tools and technologies, from specialized appliances to analytics.

A case in point is Florida Hospital, Orlando, which deployed a z10 mainframe with DB2 10, which provides enhanced temporal data capabilities. The primary goal is to convert its 15 years of clinical patient data into an analytical data warehouse for use in leading-edge medical and genetics research. The hospital’s plan calls for getting the data up and running on DB2 10 this year and attaching the Smart Analytics Optimizer as an appliance in Q1 2012. Then it can begin cranking up the research analytics. Top management has bought into this plan for now, but a lot can change in the next year, the earliest the first fruits of the hospital’s analytical medical data exploration are likely to hit.

Oracle has its own EDW success stories here. Hotwire, a leading discount travel site, for example, works with major travel providers to help them fill seats, hotel rooms, and rental cars that would otherwise go unsold. It deployed Oracle’s Exadata Database Machine to improve data warehouse performance and to scale for growing business needs.

IBM does not envision the enterprise data hub as a platform-specific effort. Although EDW runs on IBM’s mainframe, much of the activity is steered to the company’s midsize UNIX/Linux Power Systems server platform. Oracle and HP offer x86-based EDW platforms, and HP is actively partnering with Microsoft on its EDW offering.

In an IBM study, 50% of business managers complained they don’t have the information they need to do their jobs, and 60% of CEOs admitted they need to do a better job of capturing and understanding information rapidly in order to make swift business decisions. That should be a signal to revamp your EDW now.


Hadoop Aims for the Enterprise

Hadoop, the open source data storage and processing framework modeled on techniques Google developed to handle its massive data needs, is coming to the enterprise data center. Are you interested?

Behind Hadoop is MapReduce, a programming model and software framework that enables the creation of applications able to rapidly process vast amounts of data in parallel on large clusters of compute nodes. Hadoop is an open source project of the Apache Software Foundation and can be found here.

Specifically, Hadoop offers a framework for running applications on large clusters built from commodity hardware. It uses a style of processing called Map/Reduce, which, as Apache explains it, divides an application into many small fragments of work, each of which may be executed on any node in the cluster. A key part of Hadoop is the Hadoop Distributed File System (HDFS), which reliably stores very large files across nodes in the cluster. Both Map/Reduce and HDFS are designed so that node failures are automatically handled by the framework. Hadoop nodes consist of a server with storage.

Hadoop moves computation to the data itself. Computation consists of a map phase, which produces sorted key/value pairs, and a reduce phase. According to IBM, a distributor of Hadoop, data is initially processed by map functions, which run in parallel across the cluster. The reduce phase aggregates and reduces the map results and completes the job.
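To make the two phases concrete, here is a minimal single-process sketch of a word count written in the map/reduce style. It is plain Python rather than the Hadoop API, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration. On a real cluster the map calls would run in parallel on the nodes that hold the data, and the framework would do the sorting and grouping.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key and return the groups sorted by key.

    In Hadoop the framework does this between map and reduce; here it is
    a simple in-memory stand-in.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce step: aggregate all values seen for one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data begets big data", "data hub"]))
```

Each map call sees only its own line, which is why the work can be spread across many nodes. Only the reduce step needs every value for a given key in one place.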

HDFS breaks stored data into large blocks and replicates them across the cluster, providing highly available parallel processing and redundancy for both the data and the jobs. Hadoop distributions provide a set of base class libraries for writing Map/Reduce jobs and interacting with HDFS.
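The block-and-replica idea can be sketched in a few lines of Python. This is a toy model, not HDFS code. The 64 MB block size and replication factor of 3 are illustrative assumptions, the node names are hypothetical, and the round-robin placement is a simplification standing in for whatever placement policy a real cluster uses.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size):
    """Return the sizes (in bytes) of the blocks a file would be cut into."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified placement for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed_nodes):
    """List the blocks that still have at least one replica on a live node."""
    return [
        block
        for block, replicas in placement.items()
        if any(node not in failed_nodes for node in replicas)
    ]


if __name__ == "__main__":
    blocks = split_into_blocks(200 * MB, 64 * MB)  # assumed 64 MB block size
    placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"], 3)
    print(placement)
    print(readable_blocks(placement, failed_nodes={"n1", "n2"}))
```

With three copies of each block on four nodes, any two nodes can fail in this model and every block is still readable, which is the redundancy the paragraph above describes.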

The attraction of Hadoop is its resilience and its ability to find and retrieve data fast from vast unstructured volumes. Hadoop, or some variation of it, is critical for massive websites like Google, Facebook, Yahoo and others. It also is a component of IBM’s Watson. But where would Hadoop play in the enterprise?

Cloudera has staked out its position as a provider of Apache Hadoop for the enterprise. It primarily targets companies in financial services, Web, telecommunications, and government with Cloudera Enterprise. It includes the tools, platform, and services necessary to use Hadoop in an enterprise production environment, ideally within what amounts to a private cloud.

But there are other players plying the enterprise Hadoop waters. IBM offers its own Hadoop distribution. So does Yahoo. You also can get it directly from the Hadoop Apache community.

So what are those enterprise Hadoop applications likely to be? A few come immediately to mind:

  • Large scale analytics
  • Processing of massive amounts of sensor or surveillance data
  • Private clouds running social media-like applications
  • Fraud applications that must analyze massive amounts of dynamic data fast

Hadoop is like other new technologies that emerge. Did your organization know what it might do with the Web, rich media, solid state disk, or the cloud when they first appeared? Not likely, but it probably knows now. It will be the same with Hadoop.

