Make way for Hadoop in the 'Big Data' craze
SAN FRANCISCO (MarketWatch) -- When Mac computer users learned this week that their hotel searches on the Orbitz travel site produced higher-priced offerings than PC users get, it was one of the higher-profile examples of a hot topic sweeping the database software industry.
Increasingly, that trend depends on a technology platform called Hadoop, analysts say. In the case of Orbitz, insight gleaned from Hadoop, an open-source technology, showed that people browsing on Apple computers preferred fancier hotels.
"So when you're using a Mac, you're given quality rankings first," said Omer Trajman, an executive at Cloudera Inc., which offers IT services based on Hadoop and has Orbitz as a customer.
Big businesses are eager to find new ways to make use of torrents of information, letting them process data from a range of sources -- from Twitter and
Hadoop has emerged as a leading platform in that trend, which many observers have dubbed "Big Data." Most Fortune 500 companies are using such technology to harness value from vast and exponentially increasing flows of information, according to UBS analyst Brent Thill.
The technology -- which offers the ability to sort through data from diverse sources with exceptional speed -- is seen complementing the incumbent database players led by
But analysts say the platform also poses a threat to technology's big guns, in the same way that the rise of software as a Web-based service has disrupted the way tech companies have expanded in business software.
The Hadoop disruption is being led by small, closely held companies in Silicon Valley: Cloudera in Palo Alto, Hortonworks in Sunnyvale and MapR in San Jose. The firms all offer services and support to businesses using Hadoop, which, as an open-source platform, is being fine-tuned and expanded by a community of developers.
"Data is exploding," JMP Securities analyst Patrick Walravens said. "Why is it exploding? Because you have social media."
A report from the brokerage late last year, based on data from Informatica, gives some sense of the scope of the Big Data phenomenon: 600 blog posts, 34,000 tweets and 240,000 pieces of content were being published on the Web every minute.
"With data spilling everywhere, there's the desire to understand what it means and do it in real time," said analyst Roger Kay of Endpoint Technologies Associates.
That has led to a shift in focus in the market for database systems toward "unstructured" data -- that is, information created, sometimes randomly, by different kinds of devices and in different formats, big chunks of it running outside a company's network.
By contrast, traditional database systems from the likes of Oracle and IBM typically rely on so-called structured data, or information culled from standard business operations such as sales, inventory management or financial record-keeping.
"Some of the traditional ways to store and process data are just not structured to handle the increasing volume, velocity and variety of data," said Greg McDowell, a JMP Securities analyst.
"Traditional databases have columns and structures -- name, rank, serial number, date of entry, date of departure," Kay said. "In a Hadoop cluster, it's unstructured. You don't know what the structure is."
And there's money to be had in figuring out how to make sense of all that unstructured data.
Companies could use the information to figure out, even in real time, which products are selling or not, the best time and place to sell them, and to whom. Data analyses could even drill down based on specific factors or conditions, such as users' location, the time of day and even the weather.
"There are so many use cases," said Peter Goldmacher, analyst with Cowen & Co. "That's the real opportunity -- if I had this data, how can I use it?"
Hadoop was created by computer scientist Doug Cutting, who developed the platform based on data-indexing research from Google Inc. Cutting, now Cloudera's chief architect, named the technology after his son's yellow stuffed-elephant toy, which went on to become the platform's logo.
Trajman, Cloudera's vice president for technology solutions, said the platform makes possible "very rich data processing and advanced analysis."
And big corporations are taking notice, said McDowell of JMP Securities.
The market for Big Data, in general, is expected to grow sharply in the next decade.
Investor interest in the trend took a new turn in April with the highly successful initial public offering of
JMP Securities estimates the market was about $9 billion in 2011, or about 2% of about $407 billion spent on business software, storage and servers. In 10 years, Big Data spending will reach $86 billion, or roughly 11% of total enterprise IT spending, according to JMP.
McDowell said Hadoop won't completely replace traditional database platforms, saying, "The relational database is not going to go away. These new technologies will cohabitate with a lot of the existing environment."
J.P. Morgan's John DiFucci says that new Big Data technologies are likely to work alongside existing tools, rather than displace them. Cloudera's Trajman said his company's partners include Oracle, IBM and Teradata.
However, Oracle Chief Executive Larry Ellison seemed to play down the buzz around the trend, suggesting that it isn't sharply different from the kind of data processing his company is already doing.
Asked at Oracle's unveiling of its cloud platform about the company's strategy, Ellison said Oracle itself uses Hadoop, but he also added, "There's big data with a capital B and capital D, and then there's big data ... So whatever you mean by Big Data is yes, we are big into Big Data and we are big into big data."
Cowen's Goldmacher said the rise of new data processing systems has been having an impact on the world's top player in the database market, Oracle. He noted that growth in Oracle's database business has slowed, mainly because rivals have been able to attract customers with lower-cost technologies.
Oracle similarly faces a dilemma with the rise of software applications sold as a Web-based service or through a subscription system, which is radically different from Oracle's traditional licensing model.
"You're going to see continued pressure on pricing," Goldmacher said. "The issue for Oracle is not the technology -- it's the pricing model."
Meanwhile, McDowell argued that "we're still in the very early innings of big data and Hadoop."
For companies looking for new ways to process their data, he added, "the biggest challenge is betting on an architecture that two years from now may be obsolete."