7 Things to Ask When Looking for Self-Service Analytics for Big Data Stores

December 18, 2012

Self-service analytics, the ability for non-technical business users to intuitively perform ad hoc reporting and analysis on business data residing in corporate databases and spreadsheets, has been a staple of BI and business analytics tools for years. However, these traditional tools simply don’t work against the new breed of “big data” platforms such as Hadoop and NoSQL databases, which have given up the traditional relational SQL interface in exchange for massive scalability and the flexibility to store unstructured data.

Meanwhile, some new, specialized but limited big data analytics tools have come to market that are designed specifically to work with these big data platforms.

So what questions should you be asking when looking to provide your data analysts and business users with easy self-service analytics for big data platforms such as Hadoop, MongoDB, Cassandra or HBase?

1. Do you have more than one kind of big data store, for example Hadoop as well as HBase, MongoDB or Cassandra?

A: Chances are you do, or will in the future, to take advantage of the relative strengths of these big data platforms. Keep in mind that most new “big data analytics” tools support self-service analytics against only a single big data platform, most often just Hadoop.

2. Would you prefer to use the same tool for big data stores in addition to your traditional relational data stores?

A: Most new big data analytics tools can only access the big data platform they were designed for, and force you to load data from traditional stores into the big data store. For example, they force you to move data from a low-latency relational database into high-latency (but of course massively scalable) Hadoop, or hard-to-query Cassandra or MongoDB. This makes no sense.

3. Are you OK waiting minutes or even hours to access your big data?

A: Many traditional BI tools have taken a lowest-common-denominator approach to integrating with big data platforms, for example using Hive with Hadoop. These “batch-oriented” interfaces make it impossible to perform speed-of-thought analysis – it’s likely you’ve forgotten the question you asked by the time the data comes back.

4. Are you OK using a spreadsheet-like interface to access and analyze your data?

A: This, plus maybe basic dashboards, is all that most of the new breed of big data analytics tools offer. Most business users are far more comfortable with an intuitive drag & drop graphical interface for interacting with and visualizing their data across different dimensions and measures. For example, many users are stumped when it comes to typing in arcane spreadsheet formulae to work with their data.

5. Do you need complete BI capabilities, including reporting, interactive visualization, and predictive analytics?

A: Most new big data analytics tools offer just a basic subset of these capabilities – for example a spreadsheet-like interface and some lightweight dashboard visualizations.  They don’t let you build highly formatted reports, drag & drop data items in a graphical data visualization interface, or make predictions based on prior history.

6. Do you need to enrich your big data with data from outside of the big data platform?

A: Most big data platforms simply leave it up to you to do this manually, or at best force you to inefficiently load copies of the enrichment data, for example customer demographic attributes such as age, income and location, into your big data platform.

7. Is the big data you want to analyze bigger than the amount of memory you have available?

A: Some new big data analytics tools resolve the big data access latency issue by copying data into an in-memory data store. This works well when the data volumes are low, but blows up when your data is bigger than your memory. Does the big data analytics tool provide alternatives to in-memory storage, such as switching to a more scalable and high-performance MPP or columnar analytic database as the speed-of-thought data cache?

Here at Pentaho, we are striving to provide the industry’s most mature and comprehensive big data analytics product, and we think you’ll like the answers we have to every one of the questions listed above.

Let me know what you think. Leave a comment below or reach me at @ian_fyfe.

Ian Fyfe
Big Data Product Marketing


Big Data Speeds Across the Chasm

December 14, 2012

Last week I visited our European team and met with customers, prospects, press and analysts to learn and talk about big data. My week in Europe confirmed my belief that we are definitely in the right business at the most exciting possible time. In a region that is rife with economic challenges, my conversations were optimistic and inspiring.

Seasoned industry people will be familiar with Geoffrey Moore’s famous curve showing the phases of technology adoption, in which the toughest challenge is ‘crossing the chasm’ between the early adopters and the early majority. With some technologies, this journey can take years. Many never make it across.

Figure: Technology Adoption Lifecycle

After speaking to me and executives from MapR, Cloudera and ParAccel, Brian McKenna of Computer Weekly proposes in his article “Big data analytics set to confound conventional adoption curve in UK” that big data adoption is moving relatively fast in the UK and Europe. The UK industry analyst Clive Longbottom, whom I met with, reinforced this, saying that big data adoption in the UK was only three months behind the US.

Of course the real proof is in what customers are doing. During my visit, our customer Carsten Bomsdorf of Travian Games presented at the Big Data Analytics conference in London about how his company uses Pentaho to analyze the behavior of its 140 million gamers to continuously innovate its award-winning products. And in marked contrast to last year, every single European customer and prospect I met with was either executing or actively planning for big data analytics.

Why is the adoption curve for big data moving faster than other technologies, even in Europe’s more traditionally risk- and hype-averse markets? The answer is economic urgency. Big data analytics has demonstrated that it can help companies identify new revenue streams – even needles in haystacks – regardless of the economic climate. Quite simply, big data is the ultimate tool for matching supply with demand.

If Europe’s enthusiasm for big data is anything to go on, I have to conclude that 2013 really will be the year that it starts to enter mainstream production. Fasten your seat belts – it’s going to be a wild ride!

Quentin Gallivan, CEO, Pentaho

