BettaClowd

Building a Better Agile Cloud with programmability and other cool networking and cloud things! Please see the archive for past demo videos.

Thursday, May 4, 2017

This post combines Parts 1 and 2.

Part (1): Hortonworks Hadoop on Bare Metal vs. Metacloud OpenStack
Video Overview: (If you're not interested in the overview/comparison, scroll down to the Part (2) demo.)


Hortonworks Background:
The Apache Ambari project is aimed at making Hadoop management simpler by developing software for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari provides an intuitive, easy-to-use Hadoop management web UI backed by its RESTful APIs.
The Hortonworks Data Platform (HDP) is an enterprise-ready, open-source Apache Hadoop framework that completely addresses the needs of data-at-rest processing, powers real-time customer applications, and accelerates decision-making and innovation.
Hortonworks Dataflow (HDF): accelerates deployment of big data infrastructure and enables real-time analysis via an intuitive graphical user interface. HDF simplifies, streamlines and secures real-time data collection from distributed, heterogeneous data sources and provides a coding-free, off-the-shelf UI for on-time big data insights. HDF provides a better approach to data collection which is simple, secure, flexible and easily extensible.
Spark: by moving computation into memory, Spark enables a wide variety of processing, including traditional batch jobs, interactive analysis, and real-time streaming.
Spark enables applications in Hadoop clusters to run faster by caching datasets. With the data available in RAM instead of on disk, performance improves dramatically, especially for iterative algorithms that access the same dataset repeatedly.
Apache Kafka is a free and open-source distributed streaming platform. Apache Kafka for transport and Spark for analysis are becoming a very common pairing in the industry.
The data can also be ingested into the enterprise distributed data lake, traditional SQL databases and NoSQL databases where it can be used to power dashboards, reporting, interactive analysis, data mining and machine learning.
Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. 
Data-in-motion is handled by HDF, which collects real-time data from the source, then filters, analyzes, and delivers it to the target data store. HDF efficiently collects, aggregates, and transports large amounts of streaming event data, processing it in real time as necessary before sending it on.
HDF was designed specifically to meet the practical challenges of collecting data from a wide range of disparate data sources securely, efficiently, and over a geographically dispersed and possibly fragmented network.
Streaming analytics connects to external data sources, enabling applications to integrate that data into the application flow.
Building a next-generation big data architecture requires simplified and centralized management, high performance, and a linearly scaling infrastructure and software platform. Big data is now all about data-in-motion, data-at-rest, and analytic applications.
Cisco UCS (bare-metal option):
This CVD, http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/UCS_CVDs/Cisco_UCS_Integrated_Infrastructure_for_Big_Data_and_Analytics_with_Hortonworks_and_HDF.html, describes in detail the process for installing Hortonworks 2.4.2 with Apache Spark, Kafka, and Storm, including the configuration details of the cluster. It also details application configuration for the HDF libraries.
The CVD explains how to build an HDP/HDF cluster using Cisco Fabric Interconnects and UCS Manager to deploy the UCS C-Series servers, along with scripts to install RHEL 7.2 and build the Ambari server and agents needed to deploy the Hadoop cluster and other applications.
Metacloud OpenStack:
In a nutshell, Metacloud is intended for customers who are looking for a public cloud experience delivered behind their own firewalls. The hardware is installed either in their own brick-and-mortar data centers or in a carrier-neutral colo facility.
Metacloud is a service in which Cisco delivers and lifecycle-manages both the hardware and a fully functional, out-of-the-box OpenStack deployment, with day-2 management services and a 99.99% SLA for the availability of the OpenStack services and APIs.
The Part (2) demo will provide insights into using Ansible as a deployment tool to orchestrate an end-to-end automation workflow.
Conclusion:
Metacloud makes OpenStack easier because the Cisco service includes full day 0-2 lifecycle management of the hardware (compute, network, and storage) along with the OpenStack software, while still giving the customer admin access to the OpenStack APIs. Consequently, admins can leverage these APIs to automate the deployment of virtual machines and other resources to host the analytical applications and the Hadoop data cluster. For example, Heat, Ansible, or other orchestration toolsets can give clients a one-touch Hadoop workflow for building the entire platform.
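To make that concrete, here is a minimal, hypothetical Heat template for booting a single Hadoop node and its data volume through the OpenStack APIs. This is not the template used in the demo; the image, flavor, keypair, and network names are all placeholders.

heat_template_version: 2015-10-15

description: Minimal sketch of a one-touch Hadoop data node (placeholder names)

resources:
  hadoop_data_volume:
    type: OS::Cinder::Volume
    properties:
      size: 100                   # GB of block storage for HDFS data (placeholder)

  hadoop_node:
    type: OS::Nova::Server
    properties:
      name: hadoop-data1
      image: centos-7             # placeholder image name
      flavor: m1.xlarge           # placeholder flavor
      key_name: hadoop-key        # placeholder keypair
      networks:
        - network: private-net    # placeholder tenant network

  volume_attach:
    type: OS::Cinder::VolumeAttachment
    properties:
      instance_uuid: { get_resource: hadoop_node }
      volume_id: { get_resource: hadoop_data_volume }

An admin could launch a template like this with openstack stack create -t hadoop-node.yaml hadoop-node (the file and stack names are placeholders); the rest of this post uses Ansible instead.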
In the past, bare metal was necessary from a performance perspective but lacked a common framework for orchestration and lifecycle management. With options such as PCI pass-through or the Ironic bare-metal service in OpenStack, performance is no longer an inhibitor of virtualized big data. The aforementioned CVD documents the process of building Cisco rack servers with UCS Manager and then scripting the Ambari and Hadoop deployment. As previously mentioned, the OpenStack APIs offer a more elegant and holistic "one-touch" Hadoop-as-a-service delivery option. Second, Metacloud offers the alternative of having Cisco manage the underlying hardware, which removes another operational burden from customer infrastructure teams, who would otherwise need to explore public cloud for a similar arrangement.
Part (2): Demo of deploying a Hortonworks Hadoop cluster with Ansible.
Demo Video: Don't forget to set YouTube to 720p HD....


What you will see in this Demo:
•Ansible to provision VMs and storage in OpenStack (a rough task sketch follows this list)
•Ansible to install the Ambari server and agents on the VMs
•Ansible to launch Blueprints to build the Hortonworks Hadoop cluster from the APIs
•Run a simple Ansible playbook to configure YARN and MapReduce to provide a word count
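As a rough sketch of the first bullet (not the repo's exact tasks), the VM and storage provisioning can be done with Ansible's OpenStack modules. The instance, volume, image, flavor, keypair, and network names below are placeholders, and authentication is assumed to come from OS_* environment variables or clouds.yaml.

- name: Provision Hadoop VMs and storage on Metacloud OpenStack (sketch)
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Create a keypair for the cluster
      os_keypair:
        name: hadoop-key                      # placeholder keypair name
        public_key_file: ~/.ssh/id_rsa.pub

    - name: Create a data volume for HDFS
      os_volume:
        display_name: hadoop-data1-vol        # placeholder volume name
        size: 100

    - name: Boot the Ambari master instance
      os_server:
        name: ambari-master                   # placeholder instance name
        image: centos-7                       # placeholder image
        flavor: m1.xlarge                     # placeholder flavor
        key_name: hadoop-key
        network: private-net                  # placeholder tenant network
        auto_ip: yes
      register: master_vm

    - name: Attach the data volume to the instance
      os_server_volume:
        server: ambari-master
        volume: hadoop-data1-vol

    - name: Record the assigned IP for the inventory and /etc/hosts files
      lineinfile:
        dest: ./host.txt
        line: "{{ master_vm.server.public_v4 }} ambari-master"   # returned field name may vary by SDK version
        create: yes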

Download the YAML and JSON files from GitHub:
Readme:

metacloud_ansible_hadoop

Deploying Hadoop with Ansible on Cisco Metacloud OpenStack
Step 1: Run the Bash script to start the Ansible playbooks
~$ . hadoopplaybook
It invokes these playbooks:
hadoopvm.yaml # creates volumes, keypairs, and instances, and copies the assigned IPs to host.txt and other txt files to build the Ansible inventory and /etc/hosts for the virtual machines
installambari-master.yaml # copy host files, mount volume, install Ambari server and agent
installambari-data1.yaml # copy host files, mount volume, install Ambari agent
installambari-data2.yaml # copy host files, mount volume, install Ambari agent
blueprintcluster.yaml # curl API commands to register and deploy the Hadoop cluster blueprint into Ambari
wordcount.yaml # configures and runs YARN and MapReduce to provide a simple word count (sketched below)
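For orientation, the Ambari agent install and the word-count run might boil down to tasks along these lines. This is a simplified sketch rather than the repo's actual playbooks; the repository URL and the example jar path are placeholders for whatever matches your HDP version.

- name: Install the Ambari agent on a data node (sketch)
  hosts: data1
  become: true
  tasks:
    - name: Add the Ambari yum repository            # placeholder repo URL/version
      get_url:
        url: http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.4.2.0/ambari.repo
        dest: /etc/yum.repos.d/ambari.repo

    - name: Install the Ambari agent package
      yum:
        name: ambari-agent
        state: present

    - name: Point the agent at the Ambari server
      lineinfile:
        dest: /etc/ambari-agent/conf/ambari-agent.ini
        regexp: '^hostname='
        line: hostname=ambari-master

    - name: Start and enable the Ambari agent
      service:
        name: ambari-agent
        state: started
        enabled: true

- name: Run the YARN/MapReduce word count (sketch)
  hosts: ambari-master
  become: true
  become_user: hdfs
  tasks:
    - name: Run the stock word-count example against a file already in HDFS   # input/output paths are placeholders
      shell: >
        hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar
        wordcount /tmp/input.txt /tmp/wordcount-out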
Access Ambari at http://<ambari-master IP address>:8080
Default login: admin/admin
How to create a blueprint for Hortonworks: https://community.hortonworks.com/articles/47170/automate-hdp-installation-using-ambari-blueprints.html
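The curl calls in blueprintcluster.yaml map onto Ambari's blueprint REST API: first POST the blueprint, then POST the cluster creation template that maps hosts to host groups. A rough equivalent using Ansible's uri module might look like this; the blueprint name, cluster name, JSON file names, and the ambari_master_ip variable are placeholders.

- name: Register and deploy an Ambari blueprint (sketch)
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Register the blueprint with Ambari
      uri:
        url: "http://{{ ambari_master_ip }}:8080/api/v1/blueprints/hadoop-bp"    # placeholder blueprint name
        method: POST
        user: admin
        password: admin
        force_basic_auth: yes
        headers:
          X-Requested-By: ambari
        body: "{{ lookup('file', 'blueprint.json') }}"                            # placeholder file name
        body_format: json
        status_code: 201

    - name: Create the cluster from the blueprint and host mapping
      uri:
        url: "http://{{ ambari_master_ip }}:8080/api/v1/clusters/hadoopcluster"   # placeholder cluster name
        method: POST
        user: admin
        password: admin
        force_basic_auth: yes
        headers:
          X-Requested-By: ambari
        body: "{{ lookup('file', 'hostmapping.json') }}"                          # placeholder file name
        body_format: json
        status_code: 202                                                          # Ambari accepts the request and builds the cluster asynchronously

The second call returns a request resource that Ambari works through in the background; its progress shows up in the Ambari UI on port 8080.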

Please check out my other cloud and Ansible demos in my archives at http://bettaclowd.blogspot.com