As the new product marketing manager at Pentaho, I wanted to write my first blog post about my experience attending the four-day Pentaho BI Suite Bootcamp. This class is key to better understanding the architecture of the Pentaho BI Suite and getting hands-on experience with ETL/Data Integration, Analysis and Reporting.
I have to say that coming from an enterprise data integration vendor such as IBM, I was a bit skeptical of any open source solution. What I expected was the bare bones: limited data type coverage, no metadata, basic data modeling and simple BI reporting. What I discovered was the other extreme. I was taken by surprise by both the breadth and the depth of the solution.
Here is my list of lucky 13 highlights from the class:
1. Coverage of all types of data from what you expect (DB2, Oracle, & SQL Server) to Hadoop, Netezza, Teradata, SAP ERP systems, and of course MySQL and PostgreSQL.
2. Parallel processing of data transformations, with an ‘intuitive’ UI. Unlike other tools on the market that are code generators, whether from open source or major proprietary ETL vendors, Pentaho Data Integration eliminates the need for coding. Talk about a low TCO!
3. A shared repository gives a great opportunity for teamwork and collaboration in the process.
4. Can handle very sophisticated dimensional modeling concepts: combination, degenerate, conformed, and slowly changing dimensions, as well as star and snowflake schemas.
5. Power of OLAP for dynamic aggregation, so developers don’t have to figure out the reporting requirements ahead of time with SQL. This is an extraordinary piece of the puzzle.
6. Caching and aggregate-aware constructs, so lookups don’t need to hit the database for every row. Wait times are much shorter and latency is reduced.
7. Removing the SQL layer for BI developers and data analysts. This has two benefits: (1) it reduces complexity, and (2) it protects databases from novice SQL users who could harm the system. Instead, a set of “business objects” (customer, product, etc.) is presented for building reports and dashboards.
8. Covers all types of reporting needs: production reporting (for managed, high-volume, pixel-perfect distribution), ad hoc analysis (for interactive analysis to measure business performance against time, location, product lines), operational reporting (for detailed reporting on the current state of data supporting ERP and CRM applications), and data mining (for predictive analysis to detect fraud / next best offers).
9. Full drill-in and drill-out capabilities on reports. You can drill in from summary information to details, and also drill from report to report with hyperlinks.
10. Parametrized reports for the interactivity factor needed for analysis. Examples: What if I change the time dimension, or the region? What changes will I see, and how do I need to adjust my operations accordingly?
11. Dashboards, scorecards, and metrics that allow business users to assemble previously built reports, queries, and analyses with “zero training” needed.
12. A quick installation got me up and ready in a few minutes. Quote from a customer in the class: “was able to kick this off in 10 minutes”. This customer had evaluated Pentaho’s other open source rivals and found Pentaho to be superior.
13. And now, for the cream of the crop… Something that the big guys of ETL are light years behind: Pentaho’s integrated ETL, Visualization, and Modeling environment. It was really cool to be able to create a live connection to a transactional (OLTP) system, select some tables, and from there click right into the modeling and data visualization environments, analyze the content of those tables, do some profiling, and decide the best ETL paths as well as the best data structures for our new target schema. This was really exciting for me to see, as I have seen struggles around it before. Pentaho not only has all the pieces of the puzzle (ETL, Analysis, Modeling, OLAP, Predictive), which a lot of other vendors don’t even have, but it has also ‘integrated’ them into one development suite that helps the user walk through the process of building a BI solution one step at a time and on a consistent path to completion.
Seeing this set of “sophisticated” ETL, Data Modeling, OLAP, Visualization, and BI tools left me with one puzzling thought. This is what I wrote in my notes at the end of the class: “I can’t believe this is all so cheap! Where I was we used to charge hundreds if not millions of dollars for this!”
Product Marketing Manager