
Google ups Big Data profile with Spark & Hadoop in cloud

The new Google Cloud Dataproc service sits between two existing options: managing the Spark data processing engine or the Hadoop framework directly on virtual machines, and using a fully managed service like Cloud Dataflow, which lets you orchestrate your data pipelines on Google’s platform. “Cloud Dataproc minimizes the time you spend on administration and management”, the company said.


Google is adding another product to its range of big data services on the Google Cloud Platform today.

Cloud Dataproc, which the search giant launched into open beta on Wednesday, is a new piece of its big data portfolio that’s designed to help companies create clusters quickly, manage them easily and turn them off when they’re not needed.

Cloud Dataproc offers a number of advantages over both traditional, on-premises products and competing cloud services, Google said. “In the time it takes you to read this blog post, you can have a Spark or Hadoop cluster created, configured, and ready to work for you”, said Google product manager James Malone, announcing the new service in a post on the firm’s blog.

“In addition to this low price, Cloud Dataproc clusters can include preemptible instances that have lower compute prices, reducing your costs even further”, Malone wrote. Whereas many providers round usage up to the nearest hour, Cloud Dataproc bills by the minute, with a 10-minute minimum. The service also integrates with other Google Cloud Platform offerings, including Cloud Storage, BigQuery, and Cloud Bigtable. “For example, you can use Cloud Dataproc to effortlessly ETL [extract, transform and load] terabytes of raw log data directly into BigQuery for business reporting”, he noted.

Greg DeMichillie, director of product management for Google Cloud Platform, told me Dataproc users will be able to spin up a Hadoop cluster in under 90 seconds, significantly faster than other services, and that Google will charge only 1 cent per virtual CPU per hour in the cluster.
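To make the pricing concrete, here is a rough sketch in Python of how the quoted Dataproc fee might add up under minute-by-minute billing compared with a provider that rounds up to the hour. The 1-cent rate and 10-minute minimum come from the figures above; the function names, the rounding of partial minutes up to the next whole minute, and the example cluster size are assumptions made for illustration. The sketch covers only the Dataproc fee, not the cost of the underlying virtual machines.

```python
import math

PRICE_PER_VCPU_HOUR = 0.01  # USD, the Dataproc fee quoted in the article
MIN_BILLED_MINUTES = 10     # 10-minute minimum quoted in the article


def billed_minutes_per_minute(runtime_minutes: float) -> int:
    """Minutes charged under per-minute billing with a minimum.

    Assumption: a partial minute is rounded up to the next whole minute.
    """
    return max(MIN_BILLED_MINUTES, math.ceil(runtime_minutes))


def billed_minutes_hourly(runtime_minutes: float) -> int:
    """Minutes charged when usage is rounded up to the next whole hour."""
    return max(1, math.ceil(runtime_minutes / 60)) * 60


def dataproc_fee(vcpus: int, billed_minutes: int) -> float:
    """Fee in USD for a cluster with `vcpus` virtual CPUs."""
    return vcpus * PRICE_PER_VCPU_HOUR * billed_minutes / 60


if __name__ == "__main__":
    # Hypothetical job: a 16-vCPU cluster that runs for 25 minutes.
    vcpus, runtime = 16, 25
    per_minute = dataproc_fee(vcpus, billed_minutes_per_minute(runtime))
    hourly = dataproc_fee(vcpus, billed_minutes_hourly(runtime))
    print(f"per-minute billing: ${per_minute:.4f}")
    print(f"hourly rounding:    ${hourly:.4f}")
```

For short jobs the gap between the two schemes is widest, which is why the billing granularity matters most for clusters that are created, used briefly and turned off.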


Google Inc (NASDAQ:GOOG) claims that corporate users can now run Spark and Hadoop without an administrator or special software. This is because users interact with Spark or Hadoop clusters through the Google Developers Console. When a cluster is no longer in use, it can be turned off to avoid spending money needlessly.
