There has definitely been an evolution in how the industry talks about data. About five years ago the term ‘Big Data’ emerged to describe the sheer volume of data being collected. Soon after, the definition expanded to better explain what Big Data really is: not just big, but data that moves extremely fast, often lacks structure, varies greatly from existing data, doesn’t fit well with more traditional database technologies, and frankly, is best described as “messy”.
Fast-forward to 2015 and Pentaho’s announcement this week of version 5.3, which delivers on-demand big data analytics at scale on Amazon Web Services and Cloudera Impala. This release is driven by what we see in more and more of our customers (a new data term for you): EXTREME data problems! Our customer NASDAQ is a very interesting example of where traditional relational data systems have maxed out and have been replaced by cloud architectures that include Hadoop, Pentaho and AWS Redshift. You can read their story here. What NASDAQ found was that moving vast amounts of data at extreme levels (10 billion rows every day) was more easily accomplished by combining cloud and big data technologies, creating a more scalable solution that is highly elastic.
We’ve seen many of our customers processing vast volumes of data in Hadoop with the help of Pentaho to enable analytics at scale like never before. The biggest challenge these customers face is getting the results out of Hadoop and into the hands of the users who can make the most of fresh insights. That’s where Pentaho 5.3 comes into play. This release opens the data refinery to Amazon Redshift AND Cloudera Impala to push the limits of analytics through blended and governed data delivery on demand. In addition to adding Redshift and Impala support to the data refinery, 5.3 includes several other key features:
- Advanced Auto-Modeling – Advances in auto-modeling speed up the creation of generated data models and make them more sophisticated, offering better analytics and ease of use.
- Additional Hadoop Support – Support for the latest Hadoop distributions from Cloudera and MapR, Hadoop cluster naming for simplified connectivity and management, and enhanced performance for scale-out integration jobs.
- Analyzer API Enhancements – Complete control over the end-user experience for highly tailored, easy-to-deliver embedded analytics.
- Simplified Customer Experience – A simpler mechanism for embedding analytics, plus documentation improvements that make learning easier.
If your data is big, messy, extreme or just plain annoying and needs to be tamed, I encourage you to learn more about Pentaho 5.3. Check out the great resources like the video and white paper to get started taming your data today.
Product Marketing, Big Data