Data is exploding at rates our industry has never seen before, and the huge opportunity to leverage that data is stymied by the archaic licensing practices still in use at old-school software companies. Currently, the big guys like Oracle, IBM, SAP, Teradata and the other proprietary database and data warehouse vendors have a very simple answer to “big data” environments – just keep charging more money, a lot more money. The only “winners” in this scenario are the software sales reps. Our industry is artificially slowed in order to prop up these old-school business models. These vendors can’t afford to innovate in licensing, and they surely don’t want to kill the golden goose – the perpetual license fee.
A major gaming company, for example, had been using Oracle for its database and BI technology. With traffic reaching 100 million to 1 billion impressions per day, the database giant’s only answer was to sell more expensive licenses. Even then, the best it could do was analyze four days’ worth of information at a time.
Organizations like Mozilla, Facebook, Amazon, Yahoo, RealNetworks and many others are now collecting immense amounts of structured and unstructured data. The size of weblogs alone can be enormous. Management wants to be able to triangulate what people are doing on their sites in order to do a better job of:
a) Turning prospects into customers
b) Offering customers what they want in a more timely manner
c) Spotting trends and reacting to them in real time.
Any company, small or large, that is trying to sift through terabytes of structured and complex data on an hourly, daily or weekly basis for any kind of analytics had better take a long, hard look at what it is really paying for. Just as the worldwide recession of ’08–’09 brought tremendous attention to lower-cost, better-value alternatives like Pentaho, the “big data” movement is doing the same thing in the DB/DW space. And where do you find some of the best innovations in tech? The answer is open source.
Specifically, an open source technology called Apache Hadoop is delivering that better value proposition for big data. It is also the only technology capable of handling some of these big data applications at all. Sounds great, right? Well, not exactly. The issue with Hadoop is that it is a very technical product with a command-line interface. Once your data gets into Hadoop, how do you get it out? How do you analyze it? If only there were an ETL and BI product tightly integrated with Hadoop, and available with the right licensing terms…
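To make that concrete, here is a minimal sketch of what “analyzing data in Hadoop” looks like today without a visual tool: a hand-written Java MapReduce job that counts hits per page in a weblog. The class names, log format and paths are illustrative assumptions on my part, not code from any particular deployment.

```java
// Illustrative sketch: counting page hits in raw Hadoop MapReduce.
// Assumes a space-delimited weblog where field 7 is the requested URL.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PageHitCount {

  // Mapper: emit (page, 1) for every log line.
  public static class HitMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text page = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] fields = value.toString().split(" ");
      if (fields.length > 6) {
        page.set(fields[6]); // the requested URL, in this assumed format
        context.write(page, ONE);
      }
    }
  }

  // Reducer: sum the hit counts for each page.
  public static class HitReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
        Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "page hit count");
    job.setJarByClass(PageHitCount.class);
    job.setMapperClass(HitMapper.class);
    job.setReducerClass(HitReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. a logs directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

And that is the easy part: you still have to compile it, bundle it into a jar, submit it to the cluster from the command line, and then pull the results back out of HDFS as plain files before anyone can visualize them. That gap is exactly what a tightly integrated ETL and BI layer fills.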
Today I’m proud to announce that Pentaho has done just that. Earlier today, May 19th, we announced our plans to deliver the industry’s first complete end-to-end data integration and business intelligence platform to support Apache Hadoop. Over the next few months we’ll be rolling out versions of our Pentaho Data Integration product and our BI Suite products that will provide Hadoop installations with a rich, visual analytical solution. Early feedback from joint Hadoop-Pentaho sites has been extremely positive, and the excitement level is high.
Hadoop came out of the Apache open source community. It is the best technology around for storing monster data sets. Until recently, only a small number of organizations used it, primarily those with deep technical resources. As the technology matures, however, the audience is widening, and now, with a rich ETL and analytical solution on top of it, it is about to get even bigger.
Stay tuned to our website and to this blog, as I’ll be sharing many success stories over the next 90 days. And most importantly, watch out for the ‘Golden Goose’ licensing schemes from the old-school vendors.
Visit www.pentaho.com/hadoop to watch a demo of Pentaho Enterprise integration with Hadoop and reserve your place in the beta program.