Self-service analytics, the ability for non-technical business users to intuitively perform ad hoc reporting and analysis on business data residing in corporate databases and spreadsheets, has been a staple of BI and business analytics tools for years. However, these traditional tools simply don’t work against the new breed of “big data” platforms such as Hadoop and NoSQL databases, which have traded the traditional relational SQL interface for massive scalability and the flexibility to store unstructured data.
Meanwhile, some new, specialized but limited big data analytics tools have come to market that are designed specifically to work with these big data platforms.
So what questions should you be asking when looking to give your data analysts and business users easy self-service analytics for big data platforms such as Hadoop, MongoDB, Cassandra or HBase?
1. Do you have more than one kind of big data store, for example Hadoop as well as HBase, MongoDB or Cassandra?
A: Chances are you do, or will in the future, to take advantage of the relative strengths of these big data platforms. Keep in mind that most new “big data analytics” tools support self-service analytics against only a single big data platform, most often just Hadoop.
2. Would you prefer to use the same tool for big data stores in addition to your traditional relational data stores?
A: Most new big data analytics tools can only access the big data platform they were designed for, and force you to load data from traditional stores into the big data store. For example, they force you to move data from a low-latency relational database into high-latency (but of course massively scalable) Hadoop, or hard-to-query Cassandra or MongoDB. This makes no sense.
3. Are you OK waiting minutes or even hours to access your big data?
A: Many traditional BI tools have taken a lowest-common-denominator approach to integrating with big data platforms, for example using Hive with Hadoop. These “batch-oriented” interfaces make speed-of-thought analysis impossible: you’ve likely forgotten the question you asked by the time the data comes back.
4. Are you OK using a spreadsheet-like interface to access and analyze your data?
A: This, plus maybe basic dashboards, is all that most of the new breed of big data analytics tools offer. Most business users are far more comfortable with an intuitive drag & drop graphical interface for interacting with and visualizing their data across different dimensions and measures. For example, many users are stumped when it comes to typing in arcane spreadsheet formulae to work with their data.
5. Do you need complete BI capabilities, including reporting, interactive visualization, and predictive analytics?
A: Most new big data analytics tools offer just a basic subset of these capabilities – for example a spreadsheet-like interface and some lightweight dashboard visualizations. They don’t let you build highly formatted reports, drag & drop data items in a graphical data visualization interface, or make predictions based on prior history.
6. Do you need to enrich your big data with data from outside of the big data platform?
A: Most big data platforms simply leave it up to you to do this manually, or at best force you to inefficiently load copies of the enrichment data, for example customer demographic attributes such as age, income and location, into your big data platform.
7. Is the big data you want to analyze bigger than the amount of memory you have available?
A: Some new big data analytics tools resolve the big data access latency issue by copying data into an in-memory data store. This works well when data volumes are low, but blows up when your data is bigger than your memory. Does the big data analytics tool provide alternatives to in-memory, such as switching to a more scalable, high-performance MPP or columnar analytic database as the speed-of-thought data cache?
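To make the trade-off in question 7 concrete, here is a minimal Python sketch of the kind of decision a tool might make when choosing a data cache. It is only an illustration under stated assumptions: the function name `choose_cache`, the tier labels, and the 50% memory headroom are hypothetical and do not describe any particular product.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def choose_cache(data_bytes: int, available_memory_bytes: int,
                 headroom: float = 0.5) -> str:
    """Pick a cache tier for interactive analysis (hypothetical helper).

    Assumption: only a fraction of memory (`headroom`) is budgeted for
    cached data, leaving the rest for query processing.
    """
    if data_bytes < 0 or available_memory_bytes < 0:
        raise ValueError("sizes must be non-negative")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")

    budget = available_memory_bytes * headroom
    # Data that fits the budget can be cached in memory; anything larger
    # falls back to an MPP or columnar analytic database.
    return "in_memory" if data_bytes <= budget else "analytic_db"


print(choose_cache(2 * GIB, 16 * GIB))    # small data set
print(choose_cache(200 * GIB, 16 * GIB))  # data larger than memory
```

With these assumed numbers, the 2 GiB data set fits the memory budget, while the 200 GiB data set would be routed to the analytic database instead of failing to load.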
Here at Pentaho, we are striving to provide the industry’s most mature and comprehensive big data analytics product, and we think you’ll like the answers we have to every one of the questions listed above.
Let me know what you think. Leave a comment below or reach me at @ian_fyfe.
Big Data Product Marketing