Rackspace brings ETL to the Cloud with Pentaho: Hadoop Summit Q&A

June 27, 2013

This week Pentaho has been meeting with the movers and shakers of the Apache Hadoop community in San Jose, at the 6th annual Hadoop Summit. Pentaho and Rackspace are drawing attention on this final day of the show with the announcement of a partnership that brings ETL to the cloud. We’re introducing Rackspace Big Data, a powerful enterprise-grade Hadoop as a Service solution. As the industry leader in cost-effective data integration for Hadoop, Pentaho is proud to team with Rackspace, the industry leader in enterprise IaaS, to deliver this new era of big data in the cloud.


(L) Eddie White, EVP business development, Pentaho | (R) Sean Anderson, product marketing manager for cloud big data solutions, Rackspace Hosting

To learn more about the news, we’re talking today with Pentaho’s Eddie White, executive vice president of business development.

Give us a quick overview of this Rackspace news, and how Pentaho is involved.

Rackspace Big Data is an exciting Hadoop as a Service offering with full enterprise features. This is the next evolution in the big data ecosystem, delivering the ongoing structure to allow enterprise customers to choose a variety of consumption models over time. Customers can choose managed dedicated servers, and public, private or hybrid cloud options. Pentaho was chosen as the only Hadoop ETL / Data integration partner for this Cloud Tools Hadoop offering.

So is this a solution for enterprise customers looking to grow their big data operations?

Yes, absolutely. Hadoop as a Service is an attractive alternative for customers that need enterprise-level infrastructure support. Pentaho gives Rackspace a partner with the skills and talent on board to deliver big data for production environments, along with the support and stability that Rackspace customers demand from their service-level agreements. Enterprises are looking for a cloud partner with an enterprise-grade infrastructure to support running their business, not just test and development efforts.

What makes up this Hadoop as a Service model?

Together, Rackspace, Hortonworks and Pentaho have delivered an offering that makes Hadoop as a Service easy to use and easy to adopt. Rackspace Big Data includes the Hortonworks Data Platform for Hadoop; Pentaho Business Analytics as the ETL / Big Data Integration partner; and Karmasphere providing Hadoop analytics.

Rackspace excels at the enterprise IaaS model, and now they’ve partnered with Hortonworks and Pentaho to introduce an easy-to-use, consume-as-you-scale Hadoop as a Service offering – so customers can get started today, confident their solution will scale along with their big data needs. Rackspace chose to partner with Pentaho because it is the industry-leading Hadoop ETL and Big Data Analytics platform. Rackspace Big Data offers a range of models to meet any organization’s changing needs, from dedicated to hybrid, and for private and public clouds. And the offering ensures the ability to bi-directionally move data in and out of enterprise clusters, with minimal technical effort and cost.

What does Pentaho Data Integration bring to Rackspace Big Data?

Rather than speak for our partner, I’ll let Sean Anderson, Rackspace Hosting’s product marketing manager for cloud big data solutions, answer that. He sums up what Pentaho brings to the partnership nicely:

“Pentaho Data Integration is all about easing adoption and enhancing utilization of Rackspace big data platforms, with native, easy-to-use data integration. Pentaho is leading the innovation of Hadoop Integration and Analytics, and the upcoming cloud offering with Rackspace reduces the barriers to instant success with Hadoop, so customers can adopt and deploy quickly, delivering faster ROI,” said Anderson.

“Pentaho’s powerful data integration engine serves as a platform, enabling delivery of that content right into an enterprise’s pre-existing business intelligence and analytics tools,” continued Anderson. “Rackspace Big Data customers who require multiple data stores can leverage the ease of operation inherent in the visual ETL tool Pentaho provides. Customers will be able to complement their platform offering by adding the validated Pentaho tool via the Cloud Tools Marketplace.”

A key takeaway is that Rackspace Big Data customers may choose to bridge to the Pentaho Business Analytics platform. As an example, Pentaho’s full suite can be used where a Rackspace customer wants to use both Hortonworks and ObjectRocket. We bring the data in both of these databases to life for the Rackspace customer.

Why is Pentaho excited about this announcement?

This is exciting news because it is Pentaho’s first strategic cloud partnership. As the big data market has matured, it’s now time for production workloads to be moved over to Big Data Service offerings. Rackspace is the recognized leader in providing the enterprise with IaaS, with an enterprise-grade support model. We see Rackspace as a natural partner for us to make our move into this space. We are market leaders in our respective categories with proven experience that enterprises trust for service, reliability, scalability and support. As the market for Hadoop and Big Data is developing and maturing, we see Rackspace as the natural strategic partner for Pentaho to begin providing Big Data / Hadoop as a Service.

How can organizations buy Rackspace Big Data?

For anyone looking to leverage Hadoop as a Service, Rackspace Big Data is available directly from Rackspace. For more information and pricing visit: www.rackspace.com/big-data. Pentaho will also be in the Rackspace Cloud Tools marketplace.

“There is nothing more constant than change”—Heraclitus 535BC

June 26, 2013

Change and more change. It’s been incredible watching the evolution of and innovation in the big data market. A few years ago we were helping customers understand Hadoop and the value it could bring in analyzing large volumes of unstructured data. Flash forward to today, as we attend our third Hadoop Summit in San Jose, and we see the advances customers have made in adopting these technologies in their production big data environments.

It’s the value of a continuum of innovation. As the market matures we are only limited by what we don’t leave ourselves open to. Think for a minute about the next “big data,” because there will be one. We can’t anticipate what it will look like, where it will come from or how much of it will be of value, in the same way we couldn’t predict the advent of Facebook or Twitter.

We do know that innovation is a constant. Today’s big data will be tomorrow’s “traditional” data.

Pentaho’s announcements today of an adaptive big data layer and Pentaho Labs anticipate just this type of change. We’ve simplified, for Pentaho and our customers, the ability to leverage current and new big data technologies like Hadoop, NoSQL and specialized big data stores.

In the spirit of innovation (which stems from our open source history) we’ve established Pentaho Labs – our place for free thinking innovation that leads to new capabilities in our platform in areas like real time and predictive analytics.

Being a leader at the forefront of a disruptive and ever-changing market means embracing change and innovation. That’s the future of analytics.

Donna Prlich
Senior Director, Product Marketing, Pentaho

Informatica jumps on the Pentaho bandwagon

June 12, 2013

You know that a technology megatrend has truly arrived when the large vendors start to jump on the bandwagon. Informatica recently announced Informatica Vibe™, its new virtual data machine (VDM), an embeddable data management engine that allows developers to “Map Once, Deploy Anywhere,” including into Hadoop, without generating or writing code. According to Informatica, developers can instantly become Hadoop developers without having to acquire new skills. Sound familiar?

I applaud Informatica’s efforts – but not for innovating or changing the landscape in data integration.  What I applaud them for is recognizing that the landscape for data integration has indeed changed, and it was time for them to join the party. “Vibe” itself may be new, but it is not a new concept, nor unique to the industry.  In fact, Pentaho recognized the need for a modern, agile, adaptive approach to data integration for OEMs and customers. We pioneered the Kettle “design once, run anywhere” embeddable virtual data engine back in 2005. And let’s set the record straight – Pentaho extended its lightweight data integration capabilities to Hadoop over three years ago as noted in this 2010 press release.

Over the past three years, Pentaho has delivered on Big Data Integration with many successful Hadoop customers, such as BeachMint, MobileThink, TravelTainment and Travian Games and continued our innovation — with not only Hadoop but also NoSQL, Analytical Engines, and other specialized Big Data stores. We have added test, deploy and real time monitoring functionality.  The Pentaho engine is embedded in multiple SaaS, Cloud, and customer applications today such as Marketo, Paytronix, Sharable Ink and Soliditet, with many more on the horizon. Our VDM is completely customer extensible and open. We insulate customers from changes in their data volumes, types, sources, computing platforms, and user types.  In fact, what Informatica states as intention and direction with Vibe, Pentaho Data Integration delivers today, and we continue to lead in this new landscape.


The Data Integration market has changed: the old, heavyweight, proprietary infrastructure players must adapt to current market demands. Agile, extensible, open, embeddable engines with pluggable infrastructures are the base, but it doesn’t end there. Companies of all sizes and verticals are requiring shorter development cycles, broad and deep big data ecosystem support, attractive price points and rich functionality, all without vendor lock-in. Informatica is adapting to play in the big data integration world by rebranding its products and signaling a new direction. Tony Baer, principal analyst at Ovum, summarizes this adaptation in his blog, “Informatica aims to get its vibe back.”

The game is on and Pentaho is at the forefront. We have very exciting big data integration news in store for you at the Hadoop Summit in San Jose on June 26-27 that unfortunately I have to keep the lid on for now. Stay tuned!


Richard Daley

Co-founder and chief strategy officer

Beer + Pizza + Pentaho = Pentaho London User Group

June 11, 2013

Official photo of Pentaho Day 2013 – Fortaleza – Brazil

Guest Post – Pedro Alves, Senior VP of Community, Pentaho

Hello everyone!

Exactly two months after the Pentaho Community event in Brazil, which had an all-time record of roughly 200 attendees (see photo on right), we’re hosting our next User Group meeting in London. Shared points between both events? Similar topics, amazing people willing to share their experiences and learn from others, and the always fundamental beer and pizza! Unfortunately, the amazing Brazilian weather won’t be the same. Chances are it will be cold and wet. Welcome to London :)

The event will be held at the Skills Matter Exchange in the Clerkenwell area of London on Thursday, June 20, 2013 at 6.00 pm. We’re targeting the Pentaho Community, which in my definition includes customers, users, developers, basically anyone that’s willing to spend some time helping to make the product better. It’s one of my main goals as Senior VP of Community to create conditions for that to happen, and I’m interested in hearing ideas and feedback that anyone is willing to share.

Here’s the current agenda:

  • Matt Casters, creator and architect of PDI / Kettle will lead a demo and discussion on how Pentaho supports Hadoop and big data analytics.
  • Dave Romano will lead a talk on how big data start-up Causata has been using Pentaho, specifically covering its use of a custom repository, step plug-ins and embedding Kettle.
  • Pedro Alves will present CPK, the Community Plugin Kickstarter, a tool that allows non-developers to create Pentaho plugins.
  • Simon Raybould will describe the dashboard-centric implementation at Found, heavily centered around Ctools and Mondrian.

Please note that most of the presentations are technically oriented, mainly of interest to consultants and developers. We invite you to propose discussions, technical presentations, user stories and hosted Q&A sessions to educate and inspire other users.

There will be plenty of time before and after the meetup for informal networking. For more information and to register, please visit this link. On behalf of PLUG (Pentaho London User Group) organiser Dan Keeley and Pentaho, we hope to see you on June 20th.

Pedro Alves, Senior VP of Community, Pentaho

The Road to Success with Big Data – A Closer Look at Expectations vs. the Reality

June 5, 2013

Big Data is complex. The technologies in Big Data are rapidly maturing, but are still in many ways in an adolescent phase. While Hadoop is dominating the charts for Big Data technologies, in recent years we have seen a variety of technologies born out of the early starters in this space, such as Google, Yahoo, Facebook and Cloudera. To name a few:

  • MapReduce: Programming model in Java for parallel processing of large data sets in Hadoop clusters
  • Pig: A high-level scripting language to create data flows from and to Hadoop
  • Hive: SQL-like access for data in Hadoop
  • Impala: SQL query engine that runs inside Hadoop for faster query response times
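To make the contrast between these layers concrete, here is a minimal, in-memory sketch of the MapReduce programming model, written in Python for brevity rather than in Java on a real cluster. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration; a real Hadoop job distributes these phases across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on an in-memory input."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "hadoop stores big data"]
    print(word_count(sample))
```

In a SQL-like layer such as Hive, roughly the same result could be expressed as a single `GROUP BY` query over a table of words, which is the convenience the abstraction layers above are trading on.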

It’s clear that the spectrum of interaction and interfacing with Hadoop has matured beyond pure programming in Java into abstraction layers that look and feel like SQL. Much of this is due to the lack of resources and talent in big data, hence the mantra: “the more we make Big Data feel like structured data, the better adoption it will gain.”

But wait, not so fast. You can make Hadoop act like a SQL data store; however, there are consequences, as Chris Deptula from OpenBI explains in his blog, A Cautionary Tale for Becoming too Reliant on Hive. You are forgoing flexibility and speed if you choose Hive for a more complex query as opposed to pure programming or using a visual interface to MapReduce.

This goes to show that there are numerous advancements in Hadoop that have yet to be achieved, in this case better performance optimization in Hive. I come from a relational world, namely DB2, where we spent a tremendous amount of time making this high-performance transactional database, which was developed in the 1970s, even more powerful in the 2000s. That journey continues today.

Granted, the rate of innovation is much faster today than it was 10, 20, 30 years ago, but we are not yet at the finish line with Hadoop. We need to understand the realities of what Hadoop can and cannot do today, while we forge ahead with big data innovation.

Here are a few areas of opportunity for innovation in Hadoop and strategies to fill the gap:

  • High-Performance Analytics: Hadoop was never built to be a high-performance data interaction platform. Although there are newer technologies that are cracking the nut on real-time access and interactivity with Hadoop, fast analytics still need multi-dimensional cubes, in-memory and caching technology, analytic databases or a combination of them.
  • Security: There are security risks within Hadoop. It would not be in your best interest to open the gates for all users to access information within Hadoop. Until this gap is closed further, a data access layer can help you extract just the right data out of Hadoop for interaction.
  • APIs: Business applications have lived a long time on relational data sources. However, with web, mobile and social applications, there is a need to read, write and update data in NoSQL data stores such as Hadoop. Instead of direct programming, APIs can simplify this effort for millions of developers who are building the next generation of applications.
  • Data Integration, Enrichment, Quality Control and Movement: While Hadoop stands strong in storing massive amounts of unstructured / semi-structured data, it is not the only infrastructure in place in today’s data management environments. Therefore, easy integration with other data sources is critical for long-term success.

The road to success with Hadoop is full of opportunities and obstacles, and it is important to understand what is possible today and what to expect next. With all the hype around big data, it is easy to expect Hadoop to do anything and everything. However, successful companies are those that choose the combination of technologies that works best for them.

What are your Hadoop expectations?

– Farnaz Erfan, Product Marketing, Pentaho

Pentaho Concierge Services – 5 FAQ

June 3, 2013

Anthony DeShazor, VP of Enterprise Architecture and Principal Enterprise Architect at Pentaho

Last month Pentaho announced Pentaho Concierge Support Services. We sat down with Anthony DeShazor, VP of Enterprise Architecture and Principal Enterprise Architect at Pentaho, and asked him the top 5 frequently asked questions from our customers about this new service. Here is a summary of what we learned:

1. Pentaho recently announced Pentaho Concierge Services. What does this service offer clients?

Concierge Services provide a stronger partnership with Pentaho. However, it goes much deeper than that. Concierge Services allow customers greater access to Pentaho resources in order to maximize the return on investment in Pentaho. With almost on-demand access to an assigned solution architect and invitations to exclusive technical events, Concierge customers have ongoing access to technical and architectural expertise and to experience in implementing large and complex Pentaho solutions. The solution architect will be a valued partner in developing a long-term vision and a plan of implementation. Each solution architect will only serve a few customers, allowing the architect to develop intimate knowledge of customer goals, implementation, and environment. Moreover, this knowledge helps other areas of Pentaho provide better service to Concierge customers.

2. Who is this geared towards? Only Fortune 500 companies, or can smaller, fast-growing companies also take advantage of it?

Concierge Services are targeted for the larger and more complex implementations of Pentaho. These implementations are usually for the larger customers. However, smaller fast growing companies can take advantage of the program. In particular, Concierge is a great fit for customers of all sizes who have the internal technical skill but need ongoing technical guidance.

3. Does this Concierge service help clients that are evaluating / building big data projects?

Absolutely! The solution architects who provide Concierge are some of our most experienced architects. The team has experience in implementing Pentaho in many environments: big data, SaaS, enterprise, etc. Concierge is tailored to help customers develop a vision for their implementation that could include big data and predictive analytics.

4. How is this different than technical support?

Concierge Services and Technical Support are complementary in that they work together to help customers through their implementation; however, they have specific roles. Technical Support helps with questions related to product features and issues. Concierge provides assistance on strategic questions such as “What is the best way to implement my strict security requirements in Pentaho deployed in a multi-tenant SaaS environment?” or “Can you provide feedback on the architecture of my Pentaho-Hadoop solution?” These requests are not necessarily related to product features but require an analysis of the requirements and experience in large implementations.

5. Is this a charged offering? What should one expect to pay?

Yes, this is a charged offering, but there are two different levels of engagement that clients can choose between depending on their needs: Concierge and Concierge with Strategic Solution Architect. I could go into the details of the long list of what is included in each of the services, or you can see an easy-to-read checklist of all inclusions on our website at http://www.pentaho.com/services/concierge/.

Thanks Anthony!

Let us know if you have any questions and answers you would like us to add. Leave your questions in the comments section below.

To learn more about Pentaho Concierge Services visit: http://www.pentaho.com/services/concierge/

