Big Data Analytics: At The Tip Of Your Tongue

Imagine reliably asking Amazon Alexa on an Echo or Echo Dot, Google Home, or a chatbot to run analytics queries against a big data platform. For example, “What were the top three revenue-generating products last week?” or, better yet, “Start my Spark cluster,” all without firing up your computer, scrolling through a report, combing through spreadsheet columns, or asking an analyst or a data admin. Big Data at the tip of your tongue, pun intended.

The concept of conversing with a computer is fascinating and has been around for a while (think Star Trek’s “LCARS” and HAL from “2001: A Space Odyssey”). While we might be a long way off from those realities, recent advances in natural language and AI technologies from Amazon, Google, Microsoft, IBM, and others have brought us closer. We can expect many new, creative services to be built in the near future.

Meanwhile, in the big data space, with massive amounts of data being generated, advances in machine learning algorithms, and ever faster computing at scale, it’s only a matter of time before Artificial Intelligence (AI) and Machine Learning (ML) also power big data analytics, in ways and at speeds never experienced before. These systems will meld with innovations in adjacent areas such as the Internet of Things (IoT), cloud computing, and natural language processing.

“The movement towards conversational interfaces will accelerate,” says Stuart Frankel, CEO of Narrative Science. “The recent, combined efforts of a number of innovative tech giants point to a coming year when interacting with technology through conversation becomes the norm. Are conversational interfaces really a big deal? They’re game-changing. Since the advent of computers, we have been forced to speak the language of computers in order to communicate with them and now we’re teaching them to communicate in our language.”

“The early adopters of AI and machine learning in analytics will gain a huge first-mover advantage in the digitalization of business.” — Quentin Gallivan, CEO, Pentaho

All of this was racing through my mind as I absorbed the barrage of information at sessions, bootcamps, and workshops at AWS re:Invent 2016 in Las Vegas a couple of weeks ago.

Meet Qulexa (Qubole + Alexa)

So, inspired by industry predictions, recent technological advancements, and my own thoughts, I’ve created a voice-enabled conversational interface that runs against a Big Data platform in the cloud. Meet Qulexa!

Although it’s just a sample application, Qulexa is, in my mind, a preview of where we’re headed: accessing insights through AI, ML, and Big Data technologies in ways that were never possible before.

Below I’ve outlined the technical details of Qulexa. I know, I know: if you’d rather skip the reading and get your hands dirty, you can find the codebase on GitHub.

Technical Know-how & Prerequisites

Even though this is a pretty lightweight application, I’ve used several technologies that hopefully get my point across. Walking through the entire codebase is beyond the scope of this post, so a good understanding and working knowledge of the following technologies is expected.

You will also need:

QDS (Qubole Data Service) is a big data platform that enables organizations to process and analyze large amounts of structured, unstructured, and raw data using engines such as Hadoop 1 & 2, Spark, Presto, and HBase on public cloud infrastructure.

NOTE: Google developers can use the recently announced Actions on Google to create similar conversational interfaces and applications for Google Home. (In fact, I might port this app myself, so stay tuned!)

Application At A Glance

At a high level, here’s what you can do out of the box by talking to Alexa on an Amazon Echo or Echo Dot:

  • List all clusters in your QDS account
  • Start a Spark, Hadoop, Presto, HBase, or Airflow cluster in your QDS account
  • Stop any of your clusters
  • Retrieve results of a saved query
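Each of these voice commands ultimately becomes a call to the QDS REST API. The following is a hypothetical sketch, not Qulexa’s actual code. The intent names (`ListClustersIntent`, `StartClusterIntent`, `StopClusterIntent`, `QueryResultsIntent`), the slot names, and the `buildQdsRequest` helper are made up for illustration. The endpoints assume QDS API v1.2 conventions: an `X-AUTH-TOKEN` header, `PUT .../clusters/<label>/state` with `start` or `terminate`, and `GET .../commands/<id>/results`. Check the QDS API reference before relying on them.

```javascript
// Hypothetical mapping from voice intents to QDS REST requests (not Qulexa's code).
// Assumes the QDS API v1.2 base URL and X-AUTH-TOKEN header; verify against the QDS docs.
const QDS_BASE_URL = "https://api.qubole.com/api/v1.2";

// Returns a plain description of the HTTP request for a given intent.
function buildQdsRequest(intentName, slots, apiToken) {
  const headers = {
    "X-AUTH-TOKEN": apiToken,
    "Content-Type": "application/json",
    Accept: "application/json",
  };

  switch (intentName) {
    case "ListClustersIntent":
      // List all clusters in the account.
      return { method: "GET", url: `${QDS_BASE_URL}/clusters`, headers };

    case "StartClusterIntent":
    case "StopClusterIntent": {
      // Assumed: clusters are addressed by label (e.g. "spark") and their
      // state is changed to "start" or "terminate".
      const label = encodeURIComponent(slots.clusterLabel);
      const state = intentName === "StartClusterIntent" ? "start" : "terminate";
      return {
        method: "PUT",
        url: `${QDS_BASE_URL}/clusters/${label}/state`,
        headers,
        body: JSON.stringify({ state }),
      };
    }

    case "QueryResultsIntent": {
      // Fetch the results of a previously run (saved) query by its ID.
      const id = encodeURIComponent(slots.queryId);
      return { method: "GET", url: `${QDS_BASE_URL}/commands/${id}/results`, headers };
    }

    default:
      throw new Error(`Unsupported intent: ${intentName}`);
  }
}
```

Returning a plain request description keeps the mapping easy to unit test. The skill handler can then pass it to any HTTP client, for example Node 18+’s built-in `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.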

Important: The commands listed above are only a handful of what’s possible. QDS provides an extensive set of REST APIs you can build conversational interfaces for, such as adding nodes to a cluster, executing Spark, Presto, Hive, or Pig commands, executing workflows, scheduling jobs, and running Spark Notebooks.
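As one example of such an extension, the question from the introduction (“What were the top three revenue generating products last week?”) could be answered by submitting a Presto command. This is a hypothetical sketch. The `sales` table and its columns, the helper names, and the request body (`command_type: "PrestoCommand"`, `query`, `label`) are assumptions based on the QDS commands API, and “last week” is approximated as the last seven days.

```javascript
// Hypothetical sketch: answer "top three products last week" with a Presto
// command submitted through the QDS commands API. Table, columns, and request
// fields are assumptions for illustration.
const QDS_BASE_URL = "https://api.qubole.com/api/v1.2";

// "Last week" is approximated as the last seven days.
const TOP_PRODUCTS_SQL = `
SELECT product_name, SUM(revenue) AS total_revenue
FROM sales
WHERE sale_date >= date_add('day', -7, current_date)
GROUP BY product_name
ORDER BY total_revenue DESC
LIMIT 3`;

// Describes the POST that submits the query to the Presto cluster with the given label.
function buildPrestoCommandRequest(sql, clusterLabel, apiToken) {
  return {
    method: "POST",
    url: `${QDS_BASE_URL}/commands`,
    headers: { "X-AUTH-TOKEN": apiToken, "Content-Type": "application/json" },
    body: JSON.stringify({ command_type: "PrestoCommand", query: sql, label: clusterLabel }),
  };
}

// Turns parsed result rows ([productName, revenue] pairs) into a sentence for Alexa.
function rowsToSpeech(rows) {
  if (rows.length === 0) return "I didn't find any sales for last week.";
  const items = rows.map(
    ([name, revenue], i) => `${i + 1}, ${name}, with ${Math.round(revenue)} dollars`
  );
  return `The top ${rows.length} products were: ${items.join("; ")}.`;
}
```

QDS commands run asynchronously, so a real skill would likely need to poll the command’s status, or have Alexa say the query has started, before it fetches the results and hands the parsed rows to `rowsToSpeech`.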

Configuration

Before you run or test this app in your environment, be sure to update the following attributes in config.js:

  • quboleAPIToken
  • alexaAppId
  • quboleQueryId
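A minimal config.js might look like the following. Only the three attribute names come from this post. The environment-variable fallbacks, the placeholder values, and the `findUnsetKeys` helper are illustrative assumptions, added so the skill can detect placeholders that were never filled in. Reading the API token from an environment variable also keeps it out of source control.

```javascript
// config.js (sketch). The three attribute names come from the post; the
// environment-variable fallbacks and findUnsetKeys helper are hypothetical.
const config = {
  // QDS API token used to authenticate REST calls. Treat it as a secret.
  quboleAPIToken: process.env.QUBOLE_API_TOKEN || "<your-qds-api-token>",
  // Application (skill) ID from the Alexa developer console.
  alexaAppId: process.env.ALEXA_APP_ID || "<your-alexa-app-id>",
  // ID of the saved QDS query whose results Alexa reads back.
  quboleQueryId: process.env.QUBOLE_QUERY_ID || "<your-qds-query-id>",
};

// Returns the names of required settings that are empty or still placeholders.
function findUnsetKeys(cfg) {
  return ["quboleAPIToken", "alexaAppId", "quboleQueryId"].filter(
    (key) => !cfg[key] || String(cfg[key]).startsWith("<")
  );
}

module.exports = { config, findUnsetKeys };
```

The skill handler could `require("./config")` and log the result of `findUnsetKeys(config)` at startup, which fails loudly instead of sending a placeholder token to QDS.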

Testing

If you don’t have access to an Alexa-enabled device such as an Amazon Echo or Echo Dot, you can use Amazon’s browser-based Alexa Skill Testing Tool instead.
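For quick checks without any device at all, you can also exercise a handler locally with a hand-built request. This sketch is hypothetical. It builds a trimmed-down Alexa `IntentRequest` envelope and returns a `PlainText` speech response in the Alexa Skills Kit JSON format. The intent and slot names (`StartClusterIntent`, `ClusterType`) and the stub `handle` function are assumptions, and the stub only reports what the real skill would do instead of calling QDS.

```javascript
// Hypothetical local test harness (not Qulexa's code): build a trimmed-down
// Alexa IntentRequest and run it through a stub handler.
function makeIntentRequest(intentName, slots = {}) {
  const slotObjects = {};
  for (const [name, value] of Object.entries(slots)) {
    slotObjects[name] = { name, value };
  }
  return {
    version: "1.0",
    session: { new: true, application: { applicationId: "<your-alexa-app-id>" } },
    request: {
      type: "IntentRequest",
      requestId: "local-test-1",
      timestamp: new Date().toISOString(),
      intent: { name: intentName, slots: slotObjects },
    },
  };
}

// Wraps text in the Alexa Skills Kit response envelope.
function speak(text, shouldEndSession = true) {
  return {
    version: "1.0",
    response: { outputSpeech: { type: "PlainText", text }, shouldEndSession },
  };
}

// Stub handler: says what it would do instead of calling QDS.
function handle(event) {
  if (event.request.type !== "IntentRequest") {
    return speak("Welcome. Ask me to list, start, or stop a cluster.", false);
  }
  const { intent } = event.request;
  if (intent.name === "StartClusterIntent") {
    const slot = intent.slots.ClusterType;
    return speak(`Starting your ${slot ? slot.value : "default"} cluster.`);
  }
  return speak("Sorry, I can't do that yet.");
}

const reply = handle(makeIntentRequest("StartClusterIntent", { ClusterType: "Spark" }));
console.log(reply.response.outputSpeech.text); // Starting your Spark cluster.
```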

Ok, where’s the code?

Calm down, it’s available here on GitHub.

In Summary

Forward-thinking companies and technologies will change how we generate and gain insights, in ways and at speeds never possible before. If you’d like to join the conversation or have any feedback or comments, I’d love to hear from you. Feel free to reach out to me on Twitter or LinkedIn.
