Xyratex and Pentaho – Making Big Data, Fast Data.

February 11, 2013

Pentaho and Xyratex today announced our strategic partnership to deliver the world’s first integrated Big Data analytics and scalable storage solution.  We have been working on this joint initiative for some time with the ClusterStor team at Xyratex. ClusterStor is the world’s fastest and most performant storage sub-system.  It will be significantly enhanced by the addition of Hortonworks Hadoop and Pentaho Business Analytics.

Xyratex and Pentaho will make Big Data, Fast Data.  This solves a key pain point for Xyratex’s customers. With compute, storage, database and analytics in one truly integrated platform, this appliance will eliminate large data silos and put all of that Big Data into the hands of business users.  And it will do that fast!  The ClusterStor, Hadoop and Pentaho Big Data Appliance will deliver business analytics on huge data sets at the lowest TCO, allowing ClusterStor customers to realize business value from their data with a very short time to value.

Xyratex has taken the complexity of deploying Hadoop away from the customer with this integrated appliance. Critically, ClusterStor also meets all the key criteria for deploying an enterprise-class Big Data solution: scalability, best-in-class performance, reliability and rapid time to value.

7 Things to Ask When Looking for Self-Service Analytics for Big Data Stores

December 18, 2012

Self-service analytics – the ability for non-technical business users to intuitively perform ad hoc reporting and analysis on business data residing in corporate databases and spreadsheets – has been a staple of BI and business analytics tools for years.  However, these traditional tools simply don’t work against the new breed of “big data” platforms such as Hadoop and NoSQL databases, which have given up the traditional relational SQL interface in exchange for massive scalability and the flexibility to store unstructured data.

Meanwhile, some new specialized but limited big data analytics tools, designed specifically to work with the new breed of big data platforms, have been released to the market.

So what are the questions you should be asking when looking to provide your data analysts and business users easy self-service analytics for big data platforms such as Hadoop, MongoDB, Cassandra or HBase?

1. Do you have more than one kind of big data store, for example Hadoop as well as HBase, MongoDB or Cassandra?

A:   Chances are you do, or will in the future, to take advantage of the relative strengths of these big data platforms. Consider the fact that most new “big data analytics” tools are capable of self-service analytics against a single big data platform, most often just Hadoop.

2. Would you prefer to use the same tool for big data stores in addition to your traditional relational data stores?

A:   Most new big data analytics tools can only access the big data platform they were designed for, and force you to load data from traditional stores into the big data store. For example, they force you to move data from a low-latency relational database into high-latency (but of course massively scalable) Hadoop, or hard-to-query Cassandra or MongoDB. This makes no sense.

3. Are you ok waiting minutes or even hours to access your big data?

A:   Many traditional BI tools have taken the lowest common denominator type of approach to integrating with big data platforms, for example using Hive with Hadoop. These “batch oriented” interfaces make it impossible to perform speed-of-thought analysis – it’s likely you’ve forgotten the question you asked by the time the data comes back.

4. Are you ok using a spreadsheet-like interface to access and analyze your data?

A:  This, plus maybe basic dashboards, is all that most of the new breed of big data analytics tools offer.  Most business users are far more comfortable with an intuitive drag & drop graphical interface for interacting with and visualizing their data across different dimensions and measures. For example, many users are stumped when it comes to typing in arcane spreadsheet formulae to work with their data.

5. Do you need complete BI capabilities, including reporting, interactive visualization, and predictive analytics?

A: Most new big data analytics tools offer just a basic subset of these capabilities – for example a spreadsheet-like interface and some lightweight dashboard visualizations.  They don’t let you build highly formatted reports, drag & drop data items in a graphical data visualization interface, or make predictions based on prior history.

6. Do you need to enrich your big data with data from outside of the big data platform?

A: Most big data platforms simply leave it up to you to do this manually, or at best force you to inefficiently load copies of the enrichment data, for example customer demographic attributes such as age, income and location, into your big data platform.

7. Is the big data you want to analyze bigger than the amount of memory you have available?

A: Some new big data analytics tools resolve the big data access latency issue by copying data into an in-memory data store. This works well when the data volumes are low, but blows up when your data is bigger than your memory. Does the big data analytics tool provide alternatives to in-memory, such as switching to a more scalable and high-performance MPP or columnar analytic database as the speed-of-thought data cache?
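To make question 7 concrete, here is a minimal Python sketch of the kind of decision a tool might make when picking its speed-of-thought cache. It is an illustration only, not how any particular product works: the function name, the backend labels and the 80% memory headroom are all assumptions made for this example.

```python
GIB = 1024 ** 3


def choose_cache_backend(dataset_bytes: int,
                         available_memory_bytes: int,
                         headroom: float = 0.8) -> str:
    """Pick a cache for interactive analysis (hypothetical helper).

    headroom is the fraction of memory we are willing to fill with data;
    the 0.8 default is an assumption, not a recommendation.
    """
    if dataset_bytes < 0 or available_memory_bytes <= 0:
        raise ValueError("sizes must be positive")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")

    memory_budget = available_memory_bytes * headroom
    if dataset_bytes <= memory_budget:
        # Small enough: keep the working set in memory.
        return "in_memory"
    # Too big for memory: fall back to an MPP or columnar analytic database.
    return "analytic_database"


print(choose_cache_backend(4 * GIB, 16 * GIB))    # in_memory
print(choose_cache_backend(200 * GIB, 16 * GIB))  # analytic_database
```

The point of the sketch is simply that the fallback path has to exist: a tool that only has the first branch stops working once the data outgrows memory.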

Here at Pentaho, we are striving to provide the industry’s most mature and comprehensive big data analytics product, and we think you’ll like our answers to every one of the questions listed above.

Let me know what you think. Leave a comment below or @ian_fyfe

Ian Fyfe
Big Data Product Marketing


Impala – A New Era for BI on Hadoop

November 30, 2012

With the recent announcement of Impala, also known as Cloudera Enterprise RTQ (Real Time Query), I expect the interest in and adoption of Hadoop to go from merely intense to crazy.  We applaud Cloudera’s investment in creating Impala, as it is a huge step forward in making Hadoop accessible using existing BI tools.

What is Impala?  Simply put, it enables all of the SQL-based BI and business analytics tools that have been built over the past couple of decades to now work directly on top of Hadoop, providing interactive response times not previously attainable with Hadoop, and many times faster than Hive, the existing SQL-like alternative. And Impala provides pretty complete SQL support, including join and aggregate functions – must-have functions for analytics.
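For readers less familiar with SQL, the sketch below shows what a join combined with aggregate functions looks like. It runs against an in-memory SQLite database from Python purely as a stand-in, so that it is runnable anywhere; it does not connect to Impala or Hadoop, the table and column names are made up for illustration, and the exact dialect Impala accepts may differ.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a SQL engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
CREATE TABLE customers (customer_id INTEGER, region TEXT);
INSERT INTO customers VALUES (1, 'East'), (2, 'West');
INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0);
""")

# Join the two (hypothetical) tables, then aggregate per region.
query = """
SELECT c.region, COUNT(*) AS order_count, SUM(o.amount) AS revenue
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.customer_id
GROUP BY c.region
ORDER BY c.region
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('East', 2, 75.0), ('West', 1, 40.0)]
conn.close()
```

Queries of this shape are what SQL-based BI tools generate behind the scenes when a user drags a dimension and a measure onto a chart.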

For enterprises this analytic query speed and expressiveness is huge – it means they are now much less likely to need to extract data out of Hadoop and load it into a data mart or warehouse for interactive visualization.  Instead they can use their favorite business analytics tool directly against Hadoop. But of course only Pentaho provides the integrated end-to-end data integration and business analytics capability for both ingesting and processing data inside of Hadoop, as well as interactively visualizing and analyzing Hadoop data.

Over the past few months Cloudera and Pentaho have been partnering closely at all levels including marketing, sales and engineering.  We are proud of the role we played in assisting Cloudera with validating and testing Impala against realistic BI workloads and use cases.  Based on the extremely strong interest we’ve seen, as evidenced by the lines at our booth at the recent Strata big data conference in New York City, the combination of Pentaho’s visual development and interactive visualization for Hadoop with the break-through performance of Cloudera Impala is very compelling for a huge number of enterprises.

– Ian Fyfe, Chief Technology Evangelist, Pentaho


Going mobile this year? What’s your biggest big data challenge?

November 16, 2012

We received insightful responses to the polls from our “Mobile and Big Data go Instant and Interactive” webinars about the challenges users of all types face with business analytics. The complexity of data integration, lack of skills and resources, and the need to analyze unstructured data were identified as the most significant big data challenges by more than 80% of attendees. Half of our attendees either have a mobile BI solution in place or plan to have one in the future.

What does this mean for the future of analytics? Whether mobilizing your sales force or empowering data analysts to discover meaning from data in Hadoop, a complete business analytics solution must address the business pressures of a continual inundation of data and the need to access and interact with that data instantly in simple, familiar ways.

It is not surprising that the response to Pentaho Business Analytics 4.8 has been overwhelmingly positive: it offers the best of analytics in a mobile-optimized experience for business users, while Instaview broadens big data access for data analysts doing data discovery.

If you missed out on our webinar, access the on demand recording at:

Watch the Pentaho 4.8 On-Demand Webinar

Data Integration and business analytics in a single, unified, modern platform — Pentaho is the future of analytics

Let me know what you think about Pentaho 4.8.

Donna Prlich

Director, Product Marketing


Because You Don’t Have Time to F* Around.

November 5, 2012

At Pentaho we are confident that we are providing the most complete solution for big data analytics. But that doesn’t mean there isn’t room for improvement, and that is where you come in. The big data market is rapidly growing and evolving, and we want to ensure we are at the forefront.

Pentaho invites you to participate in our first Big Data Product Strategy Survey. The survey only takes 3 – 5 minutes, can be taken anonymously and you will automatically be entered to win a $100 American Express gift card!*

Click here to take the survey now and help Pentaho provide the big data product that meets your needs – because you are busy and don’t have time to f* around with your big data!

*You must enter your email address at the end of the survey to be contacted to receive your gift card or copy of the final report.

Pentaho’s November Euro-Trip!

November 2, 2012

With October coming to an end, Pentaho is getting ready for a Euro-trip!  Our October events concluded in New York City this past week at O’Reilly’s Strata Conference.

November’s forecast is Big Data and long flights.  First stop is “Big Data Days 2012” in Munich, Germany on November 6-7.  This is a great conference for networking and is focused on linking together business and IT concepts.  Do we have a booth?

We have one stop on our tour in the U.S. of A. – in the hometown of Pentaho’s headquarters, Orlando, FL, for TDWI.  TDWI’s World Conference Series will be held at The Renaissance Orlando Hotel at SeaWorld November 11-16.  TDWI promises an in-depth look at the emerging trends for the upcoming year in Big Data.  If you are at TDWI don’t forget to stop by our booth (rumor has it there are some t-shirts left over from Strata).

From there we take a short trip to Zurich for DW2012 on November 12-13.  In its twelfth year, DW2012 is a great place to meet leaders in Big Data and BI – but be warned, the conference is hosted in German, so if your answer to “Sprechen Sie Deutsch?” is no, you might want to hire a translator.  Do we have a booth?

Next stop – Milan, Italy.  The Big Data Congress on November 22nd is a free conference built around “themed conversations” between big data solutions suppliers and users.  Not only is it going to be highly informative, it is free, and in Milan – how many more reasons do you need to attend?

Finally, we wrap up the month in London for Enterprise Business Intelligence (EBI) 2012 on November 28th at the Russell Hotel.  Pentaho is proud to be a silver sponsor of the event, which is put together by Whitehall Media and features leaders from the BI world.

To register for any of the events, or for more information visit Pentaho’s event page.

If you are attending any of these events and would like to set up an in-person meeting with Pentaho, contact us!

