Amazon Web Services (AWS) is a secure cloud services platform, offering compute power, database storage, content delivery, and other functionalities to help businesses scale and grow.

It gives organizations a secure and robust platform to develop their custom cloud-based solutions and has several unique features that make it one of the most reliable and flexible cloud platforms, such as:

  • Mobile-friendly access through AWS Mobile Hub and AWS Mobile SDK
  • Fully managed purpose-built Databases
  • Serverless cloud functions
  • Range of storage options that are affordable and scalable.
  • Unbeatable security and compliance

Following are some core services offered by AWS:

AWS Core services
  1. An EC2 instance is a virtual server in Amazon's Elastic Compute Cloud (EC2) for running applications on the AWS infrastructure.
  2. Amazon Elastic Block Store (EBS) is a cloud-based block storage system provided by AWS that is best used for storing persistent data.
  3. Amazon Virtual Private Cloud (Amazon VPC) enables us to launch AWS resources into a virtual network that we have defined. This virtual network closely resembles a traditional network that we would operate in our own data center, with the benefits of using the scalable infrastructure of AWS.
  4. Amazon S3 or Amazon Simple Storage Service is a service offered by Amazon Web Services that provides object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its global e-commerce network.
  5. AWS security groups (SGs) are associated with EC2 instances and provide security at the protocol and port access level. Each security group — working much the same way as a firewall — contains a set of rules that filter traffic coming into and out of an EC2 instance.

Let us look more deeply at one of AWS's core services – AWS CloudFormation – that is key for managing workloads on AWS.

1.   CloudFormation

AWS CloudFormation is a service that helps us model and set up our Amazon Web Services resources so that we can spend less time managing those resources and more time focusing on our applications that run in AWS.  We create a template that describes all the AWS resources that we want (like Amazon EC2 instances or S3 buckets), and AWS CloudFormation takes care of provisioning and configuring those resources for us. We don't need to individually create and configure AWS resources and figure out what's dependent on what; AWS CloudFormation handles all of that.

A stack is a collection of AWS resources that you can manage as a single unit. In other words, we can create, update, or delete a collection of resources by creating, updating, or deleting stacks. All the resources in a stack are defined by the stack's AWS CloudFormation template.

2.   CloudFormation template

CloudFormation templates can be written in either JSON or YAML.  The structure of the template in YAML is given below:

---
AWSTemplateFormatVersion: "version date"

Description:
  String
Metadata:
  template metadata
Parameters:
  set of parameters
Mappings:
  set of mappings
Conditions:
  set of conditions
Resources:
  set of resources
Outputs:
  set of outputs

In the above YAML file:

  1. AWSTemplateFormatVersion - The AWS CloudFormation template version that the template conforms to.
  2. Description - A text string that describes the template.
  3. Metadata - Objects that provide additional information about the template.
  4. Parameters - Values to pass to our template at runtime (when we create or update a stack). We can refer to parameters from the Resources and Outputs sections of the template.
  5. Mappings - A mapping of keys and associated values that we can use to specify conditional parameter values, like a lookup table. We can match a key to a corresponding value by using the Fn::FindInMap intrinsic function in the Resources and Outputs sections.
  6. Conditions - Conditions that control whether certain resources are created or whether certain resource properties are assigned a value during stack creation or update. For example, we can conditionally create a resource that depends on whether the stack is for a production or test environment.
  7. Resources - Specifies the stack resources and their properties, such as an Amazon Elastic Compute Cloud instance or an Amazon Simple Storage Service bucket.  We can refer to resources in the Resources and Outputs sections of the template.
  8. Outputs - Describes the values that are returned whenever we view our stack's properties. For example, we can declare an output for an S3 bucket name and then call the aws cloudformation describe-stacks CLI command to view the name.

Resources is the only required section in the CloudFormation template.  All other sections are optional.

3.   CloudFormation template to create S3 bucket

S3template.yml

Resources:
  HelloBucket:
    Type: AWS::S3::Bucket

In AWS Console, go to CloudFormation and click on Create Stack

Upload the template file we created. It will be stored in an S3 location managed by CloudFormation.

Click next and give a stack name

Click Next and then "Create stack".  After a few minutes, you can see that the stack creation is completed.

Clicking on the Resources tab, you can see that the S3 bucket has been created with the name "s3-stack-hellobucket-buhpx7oucrgn". AWS generated this name automatically since we didn't specify the BucketName property in the YAML.

Note that deleting the stack will delete the S3 bucket which it had created.
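
The same create and delete operations can also be scripted. Below is a minimal boto3 sketch, assuming AWS credentials and a default region are configured and the template above is saved locally as S3template.yml (the stack name s3-stack is a placeholder):

import boto3

cfn = boto3.client('cloudformation')

# Create the stack from the local template file
with open('S3template.yml') as f:
    template_body = f.read()

cfn.create_stack(StackName='s3-stack', TemplateBody=template_body)

# Block until the stack (and hence the bucket) has been created
cfn.get_waiter('stack_create_complete').wait(StackName='s3-stack')

# Deleting the stack later also deletes the S3 bucket it created
# cfn.delete_stack(StackName='s3-stack')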

4.   Intrinsic functions

AWS CloudFormation provides several built-in functions that help you manage your stacks.

In the below example, we create two resources - a Security Group and an EC2 Instance, which uses this Security Group.  We can refer to the Security Group resource using the !Ref function.

Ec2template.yml

Resources:
  Ec2Instance:
    Type: 'AWS::EC2::Instance'
    Properties:
      SecurityGroups:
        - !Ref InstanceSecurityGroup
      KeyName: mykey
      ImageId: ''
  InstanceSecurityGroup:
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupDescription: Enable SSH access via port 22
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: '22'
          ToPort: '22'
          CidrIp: 0.0.0.0/0

Some other commonly used intrinsic functions are

  1. Fn::GetAtt - returns the value of an attribute from a resource in the template.
  2. Fn::Join - appends a set of values into a single value, separated by the specified delimiter. If the delimiter is an empty string, the values are concatenated with no delimiter.
  3. Fn::Sub - substitutes variables in an input string with values that you specify. In our templates, we can use this function to construct commands or outputs that include values that aren't available until we create or update a stack.

5.   Parameters

Parameters enable us to pass custom values to our template each time we create or update a stack.

TemplateWithParameters.yaml

Parameters: 
  InstanceTypeParameter: 
    Type: String
    Default: t2.micro
    AllowedValues: 
      - t2.micro
      - m1.small
      - m1.large
    Description: Enter t2.micro, m1.small, or m1.large. Default is t2.micro.
Resources:
  Ec2Instance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType:
        Ref: InstanceTypeParameter
      ImageId: ami-0ff8a91507f77f867
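
When we create a stack from this template, the parameter value is supplied at that time. A minimal boto3 sketch, where the stack name and local file path are placeholders:

import boto3

cfn = boto3.client('cloudformation')

with open('TemplateWithParameters.yaml') as f:
    template_body = f.read()

# Override the default t2.micro with one of the allowed values
cfn.create_stack(
    StackName='ec2-param-stack',
    TemplateBody=template_body,
    Parameters=[
        {'ParameterKey': 'InstanceTypeParameter', 'ParameterValue': 'm1.small'}
    ]
)
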
6.   Pseudo Parameters

Pseudo parameters are parameters that are predefined by AWS CloudFormation. We do not declare them in our template; we use them the same way we would use a parameter, as the argument for the Ref function.

Commonly used pseudo parameters:

  1. AWS::Region - Returns a string representing the AWS Region in which the encompassing resource is being created, such as us-west-2
  2. AWS::StackName - Returns the name of the stack as specified during cloudformation create-stack, such as teststack

7.   Mappings

The optional Mappings section matches a key to a corresponding set of named values. For example, if we want to set values based on a Region, we can create a mapping that uses the Region name as a key and contains the values we want to specify for each specific Region. We use the Fn::FindInMap intrinsic function to retrieve values in a map.

We cannot include parameters, pseudo parameters, or intrinsic functions in the Mappings section.

TemplateWithMappings.yaml

AWSTemplateFormatVersion: "2010-09-09"
Mappings: 
  RegionMap: 
    us-east-1:
      HVM64: ami-0ff8a91507f77f867
      HVMG2: ami-0a584ac55a7631c0c
    us-west-1:
      HVM64: ami-0bdb828fd58c52235
      HVMG2: ami-066ee5fd4a9ef77f1
    eu-west-1:
      HVM64: ami-047bb4163c506cd98
      HVMG2: ami-0a7c483d527806435
    ap-northeast-1:
      HVM64: ami-06cd52961ce9f0d85
      HVMG2: ami-053cdd503598e4a9d
    ap-southeast-1:
      HVM64: ami-08569b978cc4dfa10
      HVMG2: ami-0be9df32ae9f92309
Resources: 
  myEC2Instance: 
    Type: "AWS::EC2::Instance"
    Properties: 
      ImageId: !FindInMap [RegionMap, !Ref "AWS::Region", HVM64]
      InstanceType: m1.small

8.   Outputs

The optional Outputs section declares output values that we can import into other stacks (to create cross-stack references), return in response (to describe stack calls), or view on the AWS CloudFormation console. For example, we can output the S3 bucket name for a stack to make the bucket easier to find.

In the below example, the output named StackVPC returns the ID of a VPC, and then exports the value for cross-stack referencing with the name VPCID appended to the stack's name.

Outputs:
  StackVPC:
    Description: The ID of the VPC
    Value: !Ref MyVPC
    Export:
      Name: !Sub "${AWS::StackName}-VPCID"
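
Once the stack is created, the output and its export name can be read programmatically as well. A minimal boto3 sketch, where the stack name my-vpc-stack is a placeholder:

import boto3

cfn = boto3.client('cloudformation')

# describe-stacks returns the Outputs section declared in the template
stack = cfn.describe_stacks(StackName='my-vpc-stack')['Stacks'][0]
for output in stack.get('Outputs', []):
    print(output['OutputKey'], output['OutputValue'], output.get('ExportName'))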

There is little doubt that data will guide the next generation of business strategy and will bring new efficiencies across industries. But for that to happen, organizations must be able to extract insights from their data.

Qubole is an ideal platform to activate end-to-end data processing in organizations. It combines all types of data – structured, unstructured, and legacy offline data – into a single data pipeline and turns it into rich insights by adding AI, ML, and deep analytics tools to the mix.

It scales seamlessly to accommodate more users and new data without adding administrative overheads and lowers cloud costs significantly. Simply put, Qubole is a platform that puts big data on the cloud to power business decisions based on real-time analytics.

At CloudIQ Technologies, our data experts have deployed Qubole's cloud-native data systems for many of our clients, and the results have been outstanding. Here is an article from one of our data engineers that provides an overview of how to set up Qubole to use the AWS environment and to create and run Spark clusters.

AWS Access Configuration:

In order for Qubole to create and run a cluster, we have to grant Qubole access to our AWS environment. We can grant access based on a key or a role. We will use role-based authentication.

Step 1: Login to Qubole

Step 2: Click on the menu at the top left corner and select "Account Settings" under the Control Panel.

Step 3: Scroll down to Access settings

Step 4: Switch Access mode to “IAM Role”

Step 5: Copy the Trusted Principal AWS Account ID and External ID

Step 6: Use the copied values to create a QuboleAccessRole in the AWS account (using the cloudformation template)

Step 7: Copy the Role ARN of the QuboleAccessRole and enter it in the Role ARN field

Step 8: Enter the S3 bucket location where the Qubole metadata will be stored in the "Default Location" field.

Step 9: Click Save

Spark Cluster
Create a cluster

The below steps will help create a new Spark cluster in Qubole.

Step 1: Click on the top-left dropdown menu and select "Cluster"

Step 2: Click on “+New” button

Step 3: Select “Spark” and click “Next”

Step 4: Provide a name for the cluster in the “Cluster Labels” field

Step 5: Select the version of Spark to run, Master Node Type, Worker Node Type, Minimum and Maximum nodes

Step 6: Select Region as us-west-2

Step 7: Select Availability Zone as us-west-2a

Step 8: Click “Next”

Step 9: In the Composition screen, you can select the type of nodes that will be spun up.

Step 10: In the Advanced Configuration screen, proceed to EC2 settings

Step 11: Enter “QuboleDualIAMRole” in the “Instance Profile” field

Step 12: Select “AppVPC” in VPC field

Step 13: Select “AppPrivateSNA” under Subnet field

Step 14: Enter the ip address of the Bastion node in the “Bastion Node” field

Step 15: Scroll to the bottom and enter “AppQuboleClusterSG” (security group for the cluster) in the “Persistent Security Group” field

Step 16: Click on “Create”

Run a cluster

To start a cluster, click on the dropdown menu at the top left corner and select Cluster. Then click on the "Start" button next to the cluster that needs to be started. A cluster is also started automatically when a job is submitted to it.

Submit a job

One of the simplest ways to run a Spark job is to submit it through the workbench. You can navigate to the workbench from the drop-down menu at the top left corner. In the workbench, click on "+Create New", then select "Spark" next to the title of the job. Once you select Spark, an optional drop-down appears where you can choose "Python". In the last drop-down menu, select the Spark cluster where you want to execute the job; if this cluster is not active, it will be activated automatically. Enter your Spark job in the window below and, when complete, click on "Run" to run the job.
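
As an illustration, a small self-contained PySpark snippet like the one below could be pasted into the workbench (the S3 output path is a hypothetical placeholder):

from pyspark.sql import SparkSession

# The workbench normally provides a Spark session; building one explicitly
# keeps the snippet self-contained.
spark = SparkSession.builder.appName("workbench-demo").getOrCreate()

# Create a tiny DataFrame, run a simple aggregation, and write the result to S3
data = [("Ricky", 22), ("Jeff", 36), ("Geddy", 62)]
df = spark.createDataFrame(data, ["name", "age"])
df.groupBy().avg("age").show()

df.write.mode("overwrite").parquet("s3://my-example-bucket/workbench-demo/")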

Airflow Cluster

The Airflow scheduler can be used to run various jobs in a sequence. Let's take a look at configuring an Airflow cluster in Qubole.

Setting up DataStore

The first step in creating an Airflow cluster is to set up a datastore. Make sure that the MySQL database is up and running and contains a database for Airflow. Now, select "Explore" from the dropdown menu at the top left corner. On the left-hand menu, drop down the selection menu showing "Qubole Hive" and select "+Add Data Store".

In the new screen, provide a name for the data store. Select "MySQL" as the database type. Enter the database name for the Airflow database (the database should already exist in MySQL). Enter the host address as "hmklabsbienvironment.cq8z1kp7ikd8.us-west-2.rds.amazonaws.com". Enter the username and password. Make sure to select "Skip Validation"; since the MySQL database is in a private VPC, Qubole does not have access to it and will not be able to validate.

Configuring Airflow Cluster

Step 1: Click on the top left drop-down menu and select "Cluster"

Step 2: Click on “+New” button

Step 3: Select “Airflow” in the type of cluster and click “Next”

Step 4: Give a cluster name. Select the airflow version, node type.

Step 5: Select the datastore which points to the MySQL

Step 6: Select the us-west-2 as the Region

Step 7: Select us-west-2a as the Availability zone

Step 8: Click next to go to Advanced Configuration

Step 9: Select AppVPC as the VPC

Step 10: Select AppPrivateSNA as the Subnet

Step 11: Enter the Bastion Node information

Step 12: Scroll to the bottom and enter AppQuboleClusterSG as the Persistent Security Groups

Step 13: Click on create

Once the cluster is created, you can run it by clicking on "Start" next to the cluster's name.
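
Once the Airflow cluster is running, jobs are expressed as DAGs. Below is a minimal sketch of a DAG, assuming an Airflow 1.x cluster; the DAG id, schedule, and bash command are illustrative placeholders:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

# Runs once a day; the bash command stands in for a real job (e.g. a Spark submit)
with DAG(
    dag_id="sample_daily_job",
    default_args=default_args,
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    say_hello = BashOperator(task_id="say_hello", bash_command="echo 'hello from airflow'")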

Containers are being embraced at a breakneck speed – developers love them, and they are great for business because they deliver speed and scale in a cost-efficient manner. So much so, that container technology seems to be overtaking VMs – especially with container orchestration tools like Kubernetes, making them simpler to manage and extracting higher efficiency and speed from them.

Kubernetes cluster architecture

Kubernetes provides an open-source platform for simplifying multi-cloud environments. The disparities between different cloud providers are a roadblock for developers and Kubernetes helps by streamlining and standardizing container-based applications.

Kubernetes clusters are the architectural foundation that drives this simplicity and makes it possible for users to get the functionality they need at scale and with ease. Here are some of the functionalities of Kubernetes -

  • Kubernetes distributes workload efficiently across all open resources and reduces traffic spikes or outages.
  • It simplifies application deployment regardless of the size of the cluster
  • It automates horizontal scaling
  • It monitors against app failure with constant node and container health checks and performs self-healing and replication to resolve any failure issues.

All this makes the work of developers faster and frees up their time and attention from trivial repetitive tasks allowing them to build applications better and faster! For the organization, the benefits are three-fold - higher productivity, better products and, finally, cost efficiencies.

Let’s move to the specifics now and find out how to set up a Kubernetes Cluster on the RHEL 7.6 operating system on AWS.

Prerequisites:
  • You should have a VPC available.
  • A subnet within that VPC, into which you will place your cluster.
  • You should have Security Groups for the Control Plane Load Balancer and the Nodes created.
  • You should have created the Control Plane Load Balancer.
  • A bastion host, or jump box, with a public IP within your VPC from which you can secure shell into your VMs.
  • A pem file for your AWS region, which you will use to secure shell into your VMs.

Creating the IAM Roles

You will need to create 2 IAM roles: one for the Master(s), and one for the worker nodes.

Master Role

To create an IAM role, go to the IAM (Identity and Access Management) page in the AWS console. On the left-hand menu, click ‘Roles’. Then click ‘Create Role’.

Select the service that will use this role. By default, it is EC2, which is what we want. Then click "Next: Permissions".

Click ‘Create Policy’. The Create Policy page opens in a new tab.

Click on the ‘JSON tab’. Then paste this json into it:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ec2:*",
                "elasticloadbalancing:*",
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:GetRepositoryPolicy",
                "ecr:DescribeRepositories",
                "ecr:ListImages",
                "ecr:BatchGetImage",
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:UpdateAutoScalingGroup"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}

This json defines the permissions that your master nodes will need.

Click ‘Review Policy’. Then give your policy a name and a description.

Click ‘Create Policy’ and your policy is created!

Back on the Create Role page, refresh your policy list, and filter for the policy you just created. Select it and click ‘Next: Tags’.

You should add 2 tags: Name, with a name for your role, and KubernetesCluster, with the name of the cluster that you are going to create. Click ‘Next: Review’.

Give your role a name and a description. Click ‘Create Role’ and your role is created!

Node Role

For the node role, you will follow similar steps, except that you will use the following json:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ec2:Describe*",
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:GetRepositoryPolicy",
                "ecr:DescribeRepositories",
                "ecr:ListImages",
                "ecr:BatchGetImage"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}

Provisioning the VMs
Provisioning the Master

We will use RHEL 7.6 for our cluster because RHEL 8.0 uses iptables v1.8, and kube-proxy does not work well with iptables v1.8. However, kube-proxy works with iptables v1.4, which is installed on RHEL 7.6. We will use the x86_64 architecture.

Log into the AWS console. Go to the EC2 home page and click ‘Launch Instance’. We will search under Community AMIs for our image.

Click ‘Select’. Then choose your instance type. T2.medium should suffice for a Kubernetes master. Click ‘Next: Configure Instance Details’.

We will use only 1 instance. For an HA cluster, you will want more. Select your network and your subnet. For the purposes of this tutorial, we will enable auto-assigning a public IP.  In production, you would probably not want your master to have a public IP.  But you would need to make sure that your subnet is configured correctly with the appropriate NAT and route tables. Select the IAM role you created. Then click ‘Next: Add Storage’.

The default, 10 GB of storage, should be adequate for a Kubernetes master. Click ‘Next: Add Tags’.

We will add 3 tags: Name, with the name of your master; KubernetesCluster, with the name of your cluster; and kubernetes.io/cluster/<name of your cluster>, with the value owned. Click ‘Next: Configure Security Group’.

Select "Select an existing security group" and select the security group you created for your Kubernetes nodes. Click ‘Review and Launch’.

Click ‘Launch’. Select "Choose an Existing Key Pair". Select the key pair from the drop-down. Check the "I acknowledge" box. You should have the private key file saved on the machine from which you plan to secure shell into your master; otherwise you will not be able to ssh into the master! Click ‘Launch Instances’ and your master is created.

Provisioning the Auto Scaling Group

Your worker nodes should be behind an Auto Scaling group. Under Auto Scaling in the left-hand menu of the AWS console, click ‘Auto Scaling Groups’. Click ‘Create Auto Scaling Group’. On the next page, click ‘Get Started’.

Under “Choose AMI”, select RHEL 7.6 x86_64 under Community AMIs, as you did for the master.

When choosing your instance type, be mindful of what applications you want to run on your Kubernetes cluster and their resource needs. Be sure to provision a size with sufficient CPUs and memory.

Under “Configure Details”, give your autoscaling group a name and select the IAM role you configured for your Kubernetes nodes.

When selecting your storage size, be mindful of the storage requirements of your applications that you want to run on Kubernetes. A database application, for example, would need plenty of storage.

Select the security group that you configured for Kubernetes nodes.

Click ‘Create Launch Configuration’. Then select your key pair as you did for the master. Click ‘Create Launch Configuration’ and you are taken to the ‘Configure Auto Scaling Group Details’ page. Give your group a name. Select a group size. For our purpose, 2 nodes will suffice. Select the same subnet on which you placed your master. Click ‘Next: Configure Scaling policies’.

For this tutorial, we will select "Keep this group at its initial size". For a production cluster with variability in usage, you may want to use scaling policies to adjust the capacity of the group. Click ‘Next: Configure Notifications’.

We will not add any notifications in this tutorial. Click ‘Next: Configure Tags’.

We will add 3 tags: Name, with the name of your nodes; KubernetesCluster, with the name of your cluster; and kubernetes.io/cluster/<your cluster name>, with the value owned. Click ‘Review’.

Click Create Auto Scaling Group and your auto-scaling group is created!

Installing Kubernetes

Specific steps need to be followed to install Kubernetes. Run the following steps as sudo on your master(s) and worker nodes.

 # add docker repo

yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

# install container-selinux

 yum install -y http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.107-1.el7_6.noarch.rpm

# install docker

yum install docker-ce

# enable docker

systemctl enable --now docker

# create Kubernetes repo. The 2 urls after gpgkey have to be on 1 line.

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF

# configure selinux

setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

# install kubelet, kubeadm, kubectl, and Kubernetes-cni. We found that version 1.13.2 works well with RHEL 7.6.

yum install -y kubelet-1.13.2 kubeadm-1.13.2 kubectl-1.13.2 kubernetes-cni-0.6.0-0.x86_64 --disableexcludes=kubernetes --nogpgcheck

# enable kubelet

systemctl enable --now kubelet

# Run the following command as a regular user.

sudo usermod -a -G docker $USER

Creating the Kubernetes Cluster

First, add your master(s) to the control plane load balancer as follows. Log into the AWS console, EC2 service, and on the left-hand menu, under Load Balancing, click ‘Load Balancers’. Select your load balancer and click the Instances tab in the bottom window. Click ‘Edit Instances’.

Select your master(s) and click ‘Save’.

We will create the Kubernetes cluster via a config file. You will need a token, the master’s private DNS name taken from the AWS console, the Load Balancer’s IP, and the Load Balancer’s DNS name. You can generate a Kubernetes token by running the following command on a machine on which you have installed kubeadm:

kubeadm token generate

To get the load balancer’s IP, you must execute a dig command. You install dig by running the following command as sudo:

yum install bind-utils

Then you execute the following command:

dig +short <load balancer dns>

Then you create the following yaml file:

 ---
 apiVersion: kubeadm.k8s.io/v1beta1
 kind: InitConfiguration
 bootstrapTokens:
 - groups:
   - "system:bootstrappers:kubeadm:default-node-token"
   token: "<token>"
   ttl: "0s"
   usages:
   - signing
   - authentication
 nodeRegistration:
   name: "<master private dns>"
   kubeletExtraArgs:
     cloud-provider: "aws"
 ---
 apiVersion: kubeadm.k8s.io/v1beta1
 kind: ClusterConfiguration
 kubernetesVersion: "v1.13.2"
 apiServer:
   timeoutForControlPlane: 10m0s
   certSANs:
   - "<Load balancer IPV4>"
   extraArgs:
     cloud-provider: "aws"
 clusterName: kubernetes
 controlPlaneEndpoint: "<load balancer DNS>:6443"
 controllerManager:
   extraArgs:
     cloud-provider: "aws"
     allocate-node-cidrs: "false"
 dns:
   type: CoreDNS

You then bootstrap the cluster with the following command as sudo:

kubeadm init --config kubeadm.yaml --ignore-preflight-errors=all

I had a timeout error on the first attempt, but the command ran successfully the second time. Make a note of the output because you will need it to configure the nodes.

You then configure kubectl as follows:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

After this, there are some components that need to be installed on Kubernetes on AWS:

# Grant the "admin" user complete access to the cluster

kubectl create clusterrolebinding admin-cluster-binding --clusterrole=cluster-admin --user=admin

# Add-on for networking providers, so pods can communicate. 
# Currently either calico.yaml or weave.yaml

kubectl apply -f https://aws-quickstart.s3.amazonaws.com/quickstart-vmware/scripts/weave.yaml

# Install the Kubernetes dashboard

kubectl apply -f https://aws-quickstart.s3.amazonaws.com/quickstart-vmware/scripts/dashboard.yaml

# Install the default StorageClass

kubectl apply -f https://aws-quickstart.s3.amazonaws.com/quickstart-vmware/scripts/default.storageclass.yaml

# Set up the network policy blocking the AWS metadata endpoint from the default namespace.

kubectl apply -f https://aws-quickstart.s3.amazonaws.com/quickstart-vmware/scripts/network-policy.yaml

Then you have to configure kubelet arguments:

sudo vi /var/lib/kubelet/kubeadm-flags.env

And add the following parameters:

--cloud-provider=aws --hostname-override=<the node name>

After editing the kubeadm-flags.env file:

sudo systemctl restart kubelet

Finally, you have to label your master with the provider ID. That way, any load balancers you create for this node will automatically add the node as an AWS instance:

kubectl patch node <node name> -p '{"spec":{"providerID":"aws:///<availability zone>/<instance ID>"}}'

You can join worker nodes to the cluster by running the following command as sudo, which should have been printed out after running kubeadm init on the master:

kubeadm join <load balancer dns>:6443 --token <token> --discovery-token-ca-cert-hash <discovery token ca cert hash> --ignore-preflight-errors=all

Be sure to configure kubelet arguments on each node and patch them using kubectl as you did for the master.
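
As a quick sanity check that the nodes have joined, you can run kubectl get nodes, or use the official Kubernetes Python client (pip install kubernetes); a minimal sketch:

from kubernetes import client, config

# Uses the kubeconfig written to $HOME/.kube/config earlier
config.load_kube_config()

v1 = client.CoreV1Api()
for node in v1.list_node().items:
    print(node.metadata.name, node.status.node_info.kubelet_version)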

Your Kubernetes cluster on AWS is now ready!

In this article we will discuss how to create security groups in AWS for Kubernetes. The goal is to set up a Kubernetes cluster on AWS EC2, having provisioned your virtual machines. You are going to need two security groups: one for the control plane load balancer, and another for the VMs.

Creating a Security Group through the AWS Console

Prerequisite: You should have a VPC (virtual private cloud) set up.

Log into the AWS EC2 (or VPC) console. On the left-hand menu, under Network and Security, click Security Groups.

Click on Create Security Group.

Enter a Name and a Description for your Security Group. Then select your VPC from the drop-down menu. Click Add Rule.

You will need 2 TCP ingress rules, one over port 6443, another over port 443. We are choosing to allow the Source from anywhere. In production you may want to restrict the CIDR, IP, or security group that can reach this load balancer.

We are choosing to leave the outbound rules as default, in which all outbound traffic is permitted.

Click Create and your security group is created!

Select your security group in the console. You may want to give your security group a Name (in addition to the Group Name that you specified when creating it).

But you are not done yet: you must add tags to your security group. These tags will alert AWS that this security group is to be used for Kubernetes. Click on the Tags tab at the bottom of the window. Then click Add/Edit Tags.

You will need 2 tags:
  • Name: KubernetesCluster. Value: <the name of your Kubernetes cluster>
  • Name: kubernetes.io/cluster/<the name of your Kubernetes cluster>. Value: owned

Click Save and your tags are saved!
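
If you prefer to script this step, the same security group can be created, opened on ports 6443 and 443, and tagged with boto3. A minimal sketch, in which the group name, VPC ID, cluster name, and CIDR are placeholders:

import boto3

ec2 = boto3.client('ec2')
cluster_name = 'my-k8s-cluster'   # placeholder cluster name

sg = ec2.create_security_group(
    GroupName='k8s-control-plane-lb',
    Description='Control plane load balancer for Kubernetes',
    VpcId='vpc-0123456789abcdef0',   # placeholder VPC ID
)
sg_id = sg['GroupId']

# Ingress on 6443 and 443; tighten the CIDR for production use
ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[
        {'IpProtocol': 'tcp', 'FromPort': port, 'ToPort': port,
         'IpRanges': [{'CidrIp': '0.0.0.0/0'}]}
        for port in (6443, 443)
    ],
)

# Tags that tell AWS this security group belongs to the Kubernetes cluster
ec2.create_tags(
    Resources=[sg_id],
    Tags=[
        {'Key': 'KubernetesCluster', 'Value': cluster_name},
        {'Key': 'kubernetes.io/cluster/' + cluster_name, 'Value': 'owned'},
    ],
)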

Creating a Security Group for the Virtual Machines

Follow the steps above to create a security group for your virtual machines. Here are the ports that you will need to open for your control plane VMs:

The master node:
  1. 22 for SSH from your bastion host
  2. 6443 for the Kubernetes API Server
  3. 2379-2380 for the ETCD server
  4. 10250 for the Kubelet health check
  5. 10252 for the Kube controller manager
  6. 10255 for the read only kubelet API
The worker nodes:
  1. 22 for SSH
  2. 10250 for the kubelet health check
  3. 30000-32767 for external applications. However, it is more likely that you will expose external applications to outside the cluster via load balancers and restrict access to these ports to within your VPC.
  4. 10255 for the read only kubelet API

We have chosen to combine the master and the worker rules into one security group for convenience. You may want to separate them into 2 security groups for extra security.

Follow the step-by-step instructions detailed above and you will have successfully created AWS Security Groups for Kubernetes.

Analytics

Analytics is the discovery, interpretation, and communication of meaningful patterns in data; and the process of applying those patterns towards effective decision making. In other words, analytics can be understood as the connective tissue between data and effective decision making, within an organization. Organizations may apply analytics to business data to describe, predict, and improve business performance.
Big data analytics is the complex process of examining large and varied data sets -- or big data -- to uncover information including hidden patterns, unknown correlations, market trends and customer preferences that can help organizations make informed business decisions.

Glue, Athena, and QuickSight are three services under the Analytics group of services offered by AWS. Glue is used for ETL, Athena for interactive queries, and QuickSight for Business Intelligence (BI).

Glue

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. We can create and run an ETL job with a few clicks in the AWS Management Console. We simply point AWS Glue to our data stored on AWS, and AWS Glue discovers our data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, our data is immediately searchable, queryable, and available for ETL.

In this blog, we will look at two components of Glue: Crawlers and Jobs.

Glue Crawlers

Glue crawlers can scan data in all kinds of repositories, classify it, extract schema information from it, and store the metadata automatically in the AWS Glue Data Catalog. From there it can be used to guide ETL operations.

Suppose we have a file named people.json in S3 with the below contents:

{"name":"Ricky","age":22}
{"name":"Jeff","age":36}
{"name":"Geddy","age":62}

Below are the steps to crawl this data and create a table in AWS Glue to store this data:

  1. On the AWS Glue Console, click “Crawlers” and then “Add Crawler”
  2. Give a name for your crawler and click next
  3. Select S3 as data source and under “Include path” give the location of json file on S3.
  4. Since we are going to crawl data from only 1 dataset, select No in next screen and click Next
  5. In next screen select an IAM role which has access to the S3 data store
  6. Select Frequency as “Run on demand” in next screen.
  7. Select a Database to store the crawler’s output. I chose a database named “saravanan” in the screen below. If no database exists, Add a database using the link given
  8. Review all details in next step and click Finish
  9. On next screen, click on “Run it now” to run the crawler
  10. The crawler runs for around a minute and finally you will be able to see status as Stopping / Ready with Tables added count as 1.
  11. Now you can go to Tables link and see that a table named “people_json” has been created under “Saravanan” database.
  12. Using the “View details” Action, and then scrolling down, you can see the schema for the table which Glue has automatically inferred and generated.
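
The same crawler can also be created and started from code. A minimal boto3 sketch, where the crawler name, IAM role, and S3 path are placeholders that should match your own setup:

import boto3

glue = boto3.client('glue')

glue.create_crawler(
    Name='people-json-crawler',
    Role='AWSGlueServiceRole-demo',      # an IAM role with access to the S3 path
    DatabaseName='saravanan',            # the database that will hold the table
    Targets={'S3Targets': [{'Path': 's3://my-example-bucket/people/'}]},
)

glue.start_crawler(Name='people-json-crawler')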

Glue Jobs

The AWS Glue Jobs system provides managed infrastructure to orchestrate our ETL workflow. We can create jobs in AWS Glue that automate the scripts we use to extract, transform, and transfer data to different locations. Jobs can be scheduled and chained, or they can be triggered by events such as the arrival of new data.

To add a new job using the console

  1. Open the AWS Glue console, and choose the Jobs tab.
  2. Choose Add job and follow the instructions in the Add job wizard. In our example, the job copies data from the table we created earlier to a Parquet file named people-parquet in the same S3 bucket.

After the above job runs and completes, you will be able to verify in S3 that the output Parquet file has been created.

DynamicFrame

Glue Jobs use a data structure named DynamicFrame. A DynamicFrame is similar to a Spark DataFrame, except that each record is self-describing, so no schema is required initially. Instead, AWS Glue computes a schema on-the-fly when required, and explicitly encodes schema inconsistencies using a choice (or union) type.

Instead of just using the Python job that Glue generates, we can code our own jobs using DynamicFrames and have Glue run them. The job below selects specific fields from two Glue tables, renames some of the fields, joins the tables, and writes the joined table to S3 in Parquet format.

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glueContext = GlueContext(SparkContext.getOrCreate())

users = glueContext.create_dynamic_frame.from_catalog(
    database="saravanan",
    table_name="users")
users_courses = glueContext.create_dynamic_frame.from_catalog(
    database="saravanan",
    table_name="users_courses")

# Keep only the fields we need and rename the ambiguous ones
users = users.select_fields(['AccountName', 'Id', 'UserName', 'FullName', 'Active']) \
    .rename_field('Active', 'UserActive')
users_courses = users_courses.select_fields(['UserId', 'Id', 'Name', 'Code', 'Active',
        'Complete', 'PercentageComplete', 'Overdue']) \
    .rename_field('Id', 'Course_Id') \
    .rename_field('Name', 'CourseName') \
    .rename_field('Code', 'CourseCode') \
    .rename_field('Active', 'CourseActive') \
    .rename_field('Complete', 'CourseComplete') \
    .rename_field('PercentageComplete', 'CoursePercentageComplete') \
    .rename_field('Overdue', 'CourseOverDue')

# Join on the user id and drop the duplicate key column
joined_table = Join.apply(users, users_courses, 'Id', 'UserId').drop_fields(['Id'])

joined_table.toDF().write.parquet('s3://saravanan-glue/parquet_partitioned',
    partitionBy=['AccountName'])

Athena

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and we pay only for the queries that we run.

Athena is easy to use. We simply point to our data in Amazon S3, define the schema, and start querying using standard SQL. Most results are delivered within seconds. With Athena, there's no need for complex ETL jobs to prepare our data for analysis. This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets.

Athena is integrated out-of-the-box with the AWS Glue Data Catalog, allowing us to create a unified metadata repository across various services, crawl data sources to discover schemas and populate the Catalog with new and modified table and partition definitions, and maintain schema versioning.

Since Athena uses the same Data Catalog as Glue, we will be able to query and view the properties of the people_json table that we created earlier using Glue.
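
The table can also be queried programmatically. A minimal boto3 sketch, where the S3 location for query results is a placeholder:

import boto3
import time

athena = boto3.client('athena')

query = athena.start_query_execution(
    QueryString='SELECT name, age FROM people_json',
    QueryExecutionContext={'Database': 'saravanan'},
    ResultConfiguration={'OutputLocation': 's3://my-example-bucket/athena-results/'},
)
query_id = query['QueryExecutionId']

# Poll until the query finishes, then print the rows (the first row is the header)
while athena.get_query_execution(QueryExecutionId=query_id)['QueryExecution']['Status']['State'] in ('QUEUED', 'RUNNING'):
    time.sleep(1)

for row in athena.get_query_results(QueryExecutionId=query_id)['ResultSet']['Rows']:
    print([col.get('VarCharValue') for col in row['Data']])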

We can also create a new table using data from an S3 bucket.


Unlike Glue, we have to explicitly give the data format (CSV, JSON, etc) and specify the column names and types while creating the table in Athena.


We can also manually create and query tables using standard SQL in the Athena query editor.

QuickSight

Amazon QuickSight is a fast, cloud-powered business intelligence (BI) service that makes it easy for us to deliver insights to everyone in our organization.

QuickSight lets us create and publish interactive dashboards that can be accessed from browsers or mobile devices. We can embed dashboards into our applications, providing our customers with powerful self-service analytics.

QuickSight easily scales to tens of thousands of users without any software to install, servers to deploy, or infrastructure to manage.

Below are the steps to create a sample Analysis in QuickSight:

  1. Any Analysis in QuickSight requires data from a Data Set. First click on the “Manage data” link at top right to list the Data Sets we currently have.
  2. To create a new Data Set, click the “New data set” link
  3. We can create Data Set from any of the Data sources listed here – uploading a file, S3, Athena table, etc.
  4. For our example, I am selecting Athena as data source and giving it a name “Athena source”. Then we must map this to a database / table in Athena.
  5. After we select the Athena table, QuickSight provides us an option to import the data to SPICE. SPICE is Amazon QuickSight's in-memory optimized calculation engine, designed specifically for fast, ad hoc data visualization. SPICE stores our data in a system architected for high availability, where it is saved until we choose to delete it.
  6. Using the Edit/Preview Data option above allows us to select the columns to be included in Data set and rename them if required.
  7. Once we click the “Save & visualize” link above, QuickSight starts creating an Analysis for us. For our exercise we will select the Table visual type from the list.
  8. Add Account Name and User_id by dragging them from “Fields list” to “Group by” and course_active to “Value”
  9. Now we will add 2 parameters for Account Name and Learner id by clicking on Parameters at bottom left. While creating the parameter use the option “Link to a data set field” for Values and link the parameter to the appropriate column in the Athena table
  10. Once the parameters are added, create controls for the parameters. If we are adding 2 parameters with controls, we have the option of showing only relevant values for the second parameter based on the values selected for the first parameter. For this, select the "Show relevant values only" checkbox.
  11. Next, add 2 Custom filters for Account Name and Learner id. These filters should be mapped to the parameters we created earlier. For this, choose the Filter type as "Custom filter" and select the "Use parameters" checkbox.
  12. Now using the Visualize option, we can verify if our Controls are working correctly
  13. To share the Dashboard with others, use the share option on top towards the right and use publish dashboard. We can search for users of the AWS account by email and publish to selective users.

Introduction to Alexa

Amazon's Alexa is the voice-activated, interactive AI bot, or intelligent personal assistant in the cloud, that lets people speak with their Amazon Echo, Echo Dot, and other Amazon smart home devices. Alexa is designed to respond to a number of commands and converse with people.

Alexa Skills are apps that give Alexa even more abilities. These skills can let her speak to more devices or websites. When the Alexa device is connected to the internet through Wi-Fi or Bluetooth, it wakes up merely on hearing "Alexa". Alexa Skills radically expand the bot's repertoire, allowing users to perform more actions with voice-activated control through Alexa.

Overview of Alexa Skill

The most important part of an Alexa skill is its interaction design. Alexa skills don't have visual feedback like web or desktop applications and must guide the user through the skill using voice. Every Alexa skill reply needs to tell the user clearly what the next options are.

Functional Architecture

An Alexa skill is a small application that interacts with Alexa via an AWS Lambda function.

Designing the Alexa Skill

The most important aspect of the skill is its vocal interface. The skill should interact naturally with the user. The components of an Alexa Skill are:

  • Alexa requires a wake word, often called the wake phrase, which alerts the device to expect a command immediately after. The default wake phrase is "Alexa"; it can also be "Amazon", "Echo", or "Computer".
  • The launch phrase is the word that tells Alexa to trigger a skill. Examples of launch phrases are "open", "ask", "start", and "launch".
  • The invocation name is the name of the skill.
  • Intents are the goals that the user is trying to achieve by invoking the skill.
  • An utterance tells Alexa what the skill should do. Apart from static utterances such as start and launch, dynamic commands can be added. These dynamic commands are called slots.
  • Each intent can contain one or more slots. A slot is a variable that is parsed and exposed to the application code.

Sample Utterances

Alexa has a built-in natural language processing engine. To map a verbal phrase to an intent, Alexa handles the complexity of natural language processing with the help of a manually curated file, SampleUtterances.txt.

The first word in each line of SampleUtterances.txt is the intent name. The application code reads the value of the intent name and responds appropriately. Following the intent name is the phrase that the user says to achieve that intent. The user's phrase may also contain the dynamic parts defined as slots in the intent, and the application is free to react differently based on the presence or value of a slot. To give Alexa the best chance of understanding users, it is recommended to include as many sample utterances as possible. Depending on the skill, there could be any number of ever-changing sample utterances.

The below example sums up the entire vocal interface.

Build and Publish a new skill

Building and publishing a new skill in Alexa comprises the below steps:

1. Create and Configure Skill :

Create an Alexa skill using https://developer.amazon.com/alexa-skills-kit. This opens the skill information page, where we can specify the name of the skill and the invocation name.

2. Create Interaction Model:

The interaction model is a set of rules that defines the way the user interacts with your skill. As part of the interaction model, intents and utterances are defined. The intent schema should be in JSON format, and it should define an array of intents, each with a name and an optional list of dynamic parts (slots). Alexa will automatically train itself with the provided interaction model.

3. Coding the Backend system:

Once the interaction model has been designed, the Lambda function has to be coded and deployed.

For each intent, an input/output contract has to be implemented. The input is an IntentRequest which is a representation of the user’s request and includes all the slot values.

The response from Alexa can take multiple forms:

  • Ask the user a question and wait for response.
  • Give the details to the user and shut down.
  • Say nothing and shut down.

Alexa can either respond verbally, or the response can be displayed on the phone.

4. Deploying the Backend system:

The skill can be deployed as an AWS Lambda function with code written in Java, Node.js, Python, or C#. The simplest approach would be to code it in Node.js.
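
As an illustration of the request/response contract, here is a minimal Python handler written without any SDK; the HelloIntent name and the reply texts are hypothetical:

def build_response(speech_text, end_session=True):
    # Wrap plain text in the JSON envelope Alexa expects back from the skill
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": end_session,
        },
    }

def lambda_handler(event, context):
    request = event["request"]
    if request["type"] == "LaunchRequest":
        # Ask the user a question and keep the session open for a reply
        return build_response("Welcome. What would you like to do?", end_session=False)
    if request["type"] == "IntentRequest" and request["intent"]["name"] == "HelloIntent":
        # Give the details to the user and shut down
        return build_response("Hello from the skill.")
    return build_response("Sorry, I did not understand that.")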

5. Testing the Skill:

Testing of the Skill can either be done through the test simulator available in the Developer console account or through the device connected to the development account.

6. Publishing the Skill:

To publish the skill, it has to be submitted by filling out the "Publishing Information" and "Privacy & Compliance" sections.

This is the fifth blog in our series helping you understand all about cloud, for when you are in a dilemma about choosing Azure or AWS, or both if needed.

Before we jumpstart on the actual comparison chart of Azure and AWS, we would like to bring you some basics on data analytics and the current trends on the subject.

If you would rather have a quick look at the comparison table, click here

This blog is intended to help you strategize your data analytics initiatives so that you can make the most informed decision possible by analyzing all the data you need in real time. Furthermore, we also will help you draw comparisons between Azure and AWS, the two leaders in cloud, and their capabilities in Big Data and Analytics as published in a handout released by Microsoft.

Beyond doubt, this is an era of data. Every touch point of your business generates volumes of data, and this data cannot simply be whisked away or cast aside, as valuable business insights can be unearthed from it with a little effort. Here's where your data analytics infrastructure helps.

A 2017 Planning Guide for Data and Analytics, published by Gartner and written by the analyst John Hagerty, states the following key findings:

  • Data and analytics must drive modern business operations, not just reflect them. Technical professionals must holistically manage an end-to-end data and analytics architecture to acquire, organize, analyze and deliver insights to support that goal.
  • Analytics are now infused in places where they never existed before.
  • Executives will seek strategies to better manage and monetize data for internal and external business ecosystems.
  • Data gravity is rapidly shifting to the cloud, with IoT, data providers and cloud-native applications leading the way. It is no longer a question of "if" for using cloud for data and analytics; it's "how."

The last point emphasizes how prominent a role cloud plays when it comes to data analytics, and if you have thoughts on who and how, Gartner in its latest Magic Quadrant has said that AWS and Azure are the top leaders. Now, if you are in doubt whether to go the Azure way or AWS, or both, here is the comparison table showing their respective Big Data and Analytics capabilities:

Service | Description | AWS | Azure
Elastic data warehouse | A fully managed data warehouse that analyzes data using business intelligence tools. | Redshift | SQL Data Warehouse
Big data processing | Supports technologies that break up large data processing tasks into multiple jobs, and then combine the results to enable massive parallelism. | Elastic MapReduce (EMR) | HDInsight
Data orchestration | Processes and moves data between different compute and storage services, as well as on-premises data sources, at specified intervals. | Data Pipeline | Data Factory
Data orchestration | Cloud-based ETL/data integration service that orchestrates and automates the movement and transformation of data from various sources. | AWS Glue Data Catalog | Data Factory + Data Catalog
Analytics | Storage and analysis platforms that create insights from massive quantities of data, or data that originates from many sources. | Kinesis Analytics | Stream Analytics, Data Lake Analytics, Data Lake Store
Streaming data | Allow mass ingestion of small data inputs, typically from devices and sensors, to process and route data. | Kinesis Streams, Kinesis Firehose | Event Hubs, Event Hubs Capture
Visualization | Perform ad hoc analysis and develop business insights from data. | QuickSight (Preview) | Power BI
Visualization | Allows visualization and data analysis tools to be embedded in applications. | (none) | Power BI Embedded
Search | A scalable search server based on Apache Lucene. | Elasticsearch Service | Marketplace - Elasticsearch
Search | Delivers full-text search and related search analytics and capabilities. | CloudSearch | Search
Machine learning | Produces an end-to-end workflow to create, process, refine, and publish predictive models from complex data sets. | Machine Learning | Machine Learning
Data discovery | Provides the ability to better register, enrich, discover, understand, and consume data sources. | (none) | Data Catalog
Data discovery | A serverless interactive query service that uses standard SQL for analyzing databases. | Amazon Athena | Data Lake Analytics

Click here to read the entire guide published by Microsoft Azure Team:

This is our fourth blog in the series of blogs intended to help you embark on a cloud strategy, most importantly when you are in a dilemma about choosing AWS or Azure, the two prominent cloud players today.

If you had missed our earlier blogs, click here

1st Blog - Compute

2nd Blog- Storage

3rd Blog- CDN & Networking

Before we jumpstart on the actual comparison chart of Azure and AWS, we would like to bring you some basics on the database aspect of cloud strategy.

If you would rather have a quick look at the database comparison table, click here

Through this blog, let's understand the database aspect of your cloud strategy. As per the guide, database services refer to options for storing data, whether it's a managed relational SQL database that's globally distributed or a multi-model NoSQL database designed for any scale.

When you decide cloud, one of the critical decisions you face is which database to use - SQL or NoSQL. Though SQL has an impressive track record, NoSQL is not far behind as it is gradually making notable gains and has many proponents. Once you have picked your database, the other big decision to make is which cloud vendor to choose amongst the many vendors.

Here’s where you consider Gartner’s prediction; the research company published a document that states

“Public cloud services, such as Amazon Web Services (AWS), Microsoft Azure and IBM Cloud, are innovation juggernauts that offer highly operating-cost-competitive alternatives to traditional, on-premises hosting environments.

Cloud databases are now essential for emerging digital business use cases, next-generation applications and initiatives such as IoT. Gartner recommends that enterprises make cloud databases the preferred deployment model for all new business processes, workloads, and applications. As such, architects and tech professionals should start building a cloud-first data strategy now, if they haven't done so already”

Reiterating the trend, Gartner has recently published a new Magic Quadrant for infrastructure-as-a-service (IaaS) that, surprising nobody, has Amazon Web Services and Microsoft alone in the leaders' quadrant, with a few others outside of the box.

Now, the question really is, Azure or AWS for your cloud data? Or should it be both? Here’s a quick comparison table to guide you.

Service | Description | AWS | Azure
Relational database | SQL Database is a high-performance, reliable, and secure database you can use to build data-driven applications and websites, without needing to manage infrastructure. | RDS | SQL Database, including Postgres and MySQL
NoSQL - document storage | A globally distributed, multi-model database that natively supports multiple data models: key-value, documents, graphs, and columnar. | DynamoDB | Cosmos DB
NoSQL - key/value storage | A non-relational data store for semi-structured data. | DynamoDB and SimpleDB | Table Storage
Caching | An in-memory, distributed caching service that provides a high-performance store typically used to offload non-transactional work from a database. | ElastiCache | Redis Cache
Database migration | Focuses on migration of database schema and data from one database format to a specific database technology in the cloud. | Database Migration Service (Preview) | SQL Database Migration Wizard

Click here to read the entire guide published by Microsoft Azure Team:

In line with our latest blog series highlighting how common cloud services are made available via Azure and Amazon Web Services (AWS), as published by Microsoft, this third blog in the series helps you understand Cloud Networking and Content Delivery capabilities of both Azure and AWS.

Before we jumpstart on the actual comparison chart of Azure and AWS, we would like to bring you some basics on cloud content delivery networking and the current trends on the subject.

If you would rather have a quick look at the comparison table, click here

When we talk about a cloud Content Delivery Network (CDN) and the related networking capabilities, it includes all the hardware and software that allows you to easily provision private networks, connect your cloud application to your on-premises datacenters, and more.

According to Gartner, Content delivery networks (CDNs) are a type of distributed computing infrastructure, where devices (servers or appliances) reside in multiple points of presence on multi-hop packet-routing networks, such as the Internet, or on private WANs. A CDN can be used to distribute rich media downloads or streams, deliver software packages and updates, and provide services such as global load balancing, Secure Sockets Layer acceleration and dynamic application acceleration via WAN optimization techniques.

In simpler terms, these highly distributed server platforms are optimized to deliver content in a way that improves customer experience. Hence, it is important to decrease latency by keeping the data closer to the users and to protect it from security threats, while ensuring rapid, streamlined content delivery, including general web delivery, content purge, content caching, and tracking history for as long as 90 days.

As per G2Crowd.com, most organizations use CDN services, such as web caching, request routing, and server-load balancing, to reduce load times and improve website performance. Further, to qualify as a CDN provider, a service provider must:

  • Allow access to a geographically dispersed network of PoPs in multiple data centers
  • Help websites access this network to deliver content to website visitors
  • Offer services designed to improve website performance
  • Provide scalable Internet bandwidth allowances according to customer needs
  • Maintain data center(s) of servers to reduce the possibility of overloading individual instances

With this background, let’s look at the AWS vs Azure comparison chart in terms of Networking and Content Delivery Capabilities:

| Service | Description | AWS | Azure |
| --- | --- | --- | --- |
| Cloud virtual networking | Provides an isolated, private environment in the cloud. | Virtual Private Cloud | Virtual Network |
| Cross-premises connectivity | Connects Azure virtual networks to other Azure virtual networks or customer on-premises networks. It also supports VPN tunneling. | AWS VPN Gateway | VPN Gateway |
| Domain name system management | Manage DNS records using the same credentials, billing, and support contract as other Azure services. | Route 53 | DNS |
| Domain name system management | Service that hosts domain names, routes users to Internet applications, manages traffic to apps, and improves app availability with automatic failover. | Route 53 | Traffic Manager |
| Content delivery network | Global content delivery network that transfers audio, video, applications, images, and other files. | CloudFront | Content Delivery Network |
| Dedicated network | Establishes a dedicated, private network connection from a location to the cloud provider. | Direct Connect | ExpressRoute |
| Load balancing | Automatically distributes incoming application traffic to add scale, handle failover, and route to a collection of resources. | Elastic Load Balancing | Load Balancer, Application Gateway |
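
As a hypothetical illustration of the cloud virtual networking row above, the boto3 sketch below provisions an isolated network (a VPC) with one subnet and an internet gateway; the CIDR ranges are placeholders chosen purely for the example.

```python
# Hypothetical sketch: provisioning an isolated virtual network (a VPC) with boto3.
# CIDR ranges are placeholders chosen for illustration only.
import boto3

ec2 = boto3.client("ec2")

# Create the VPC itself (the isolated, private network).
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")
vpc_id = vpc["Vpc"]["VpcId"]

# Carve out one subnet inside it.
subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24")

# Attach an internet gateway so resources in public subnets can reach the internet.
igw = ec2.create_internet_gateway()
ec2.attach_internet_gateway(
    InternetGatewayId=igw["InternetGateway"]["InternetGatewayId"],
    VpcId=vpc_id,
)

print("Created VPC", vpc_id, "with subnet", subnet["Subnet"]["SubnetId"])
```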

To read more of the Microsoft guide, which covers the cloud in detail by drawing comparisons between Azure and AWS, click here (link to PDF download).

You may also like to read the previous blogs in this series; if so, please click here:

http://cloudiqtech.com/azure-vs-aws-compute/
http://cloudiqtech.com/aws-vs-azure-cloud-storage/

Azure or AWS or Azure & AWS? What’s your cloud strategy for Storage?

This is the second blog in our latest series helping you understand the cloud, especially when you are unsure whether to go with Azure, AWS, or both.

To read our first blog, covering cloud strategy in general and compute in particular, click here…

Moving on, in this blog let's look at what Azure and AWS offer when it comes to storage capabilities for your cloud infrastructure.

Globally, CIOs are increasingly looking to stop running their own data centers and move to the cloud, which is evident from the projection made by a leading research firm, MarketsandMarkets. It reported that the global cloud storage market would grow from $18.87 billion in 2015 to $65.41 billion by 2020, at a compound annual growth rate (CAGR) of 28.2 percent during the forecast period.

Reinforcing this, 451 Research's Voice of the Enterprise survey last year stated that public cloud storage spending will double by next year (2017). "IT managers are recognizing the need for storage transformation to meet the realities of the new digital economy, especially in terms of improved efficiency and agility in the face of relentless data growth," said Simon Robinson, research vice president at 451 and research director of the new Voice of the Enterprise: Storage service. "It's clear from our Q4 study that emerging options, especially public cloud storage and all-flash array technologies, will be increasingly important components in this transformation," he added.

As we see, many companies are undoubtedly moving to cloud storage. But the big question remains: whom should you choose from the gamut of leading public cloud players such as Azure and AWS? Should it be Azure alone for your cloud storage, AWS alone, or a combination of both?

This needs a thorough understanding. To help you decide, we are reproducing a guide published by Microsoft that outlines Azure's capabilities in comparison to AWS when it comes to cloud strategy. We will look at the storage part in this blog, but before that, a little background on cloud storage.

When we talk about cloud storage device mechanisms, we include all logical units of data storage, covering everything from files, blocks, and datasets to objects and their respective storage interfaces. These instances of virtual storage devices are designed specifically for cloud-based provisioning and can be scaled as needed. It should be noted that different cloud service consumers use different technologies to interface with virtualized cloud storage devices.
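
As a quick, hypothetical illustration of the object storage interface in particular, the boto3 sketch below writes and reads an object in Amazon S3; the bucket name and key are placeholders, and the bucket is assumed to already exist.

```python
# Hypothetical sketch: the object-storage interface in practice, using boto3 against S3.
# The bucket name and key are placeholders; the bucket is assumed to exist already.
import boto3

s3 = boto3.client("s3")

# Store an object: data is addressed by bucket + key, not by a file-system path.
s3.put_object(
    Bucket="example-backup-bucket",
    Key="reports/2019/summary.json",
    Body=b'{"status": "ok"}',
)

# Retrieve it later from anywhere with the same bucket + key.
obj = s3.get_object(Bucket="example-backup-bucket", Key="reports/2019/summary.json")
print(obj["Body"].read())
```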

| Service | Description | AWS | Azure |
| --- | --- | --- | --- |
| Object storage | Object storage service for use cases including cloud apps, content distribution, backup, archiving, disaster recovery, and big data analytics. | Simple Storage Services (S3) | Storage—Block Blob (for content logs, files) (Standard—Hot) |
| Virtual Server disk infrastructure | SSD storage optimized for I/O-intensive read/write operations. | Elastic Block Store (EBS) | Disk Storage—Page Blobs (for VHDs or other random-write type data), Disk Storage—Premium Storage |
| Shared file storage | A simple interface to create and configure file systems quickly as well as share common files. | Elastic File System | File Storage (file share between VMs) |
| Archiving—cool storage | A lower-cost tier for storing data that is infrequently accessed and long-lived. | S3 IA, Glacier | Storage—Hot, Cool & Archive Tier |
| Backup | Backup and archival solutions that allow files and folders to be backed up and recovered from the cloud, and provide off-site protection against data loss. | Backup and Recovery | Backup |
| Hybrid storage | Integrates on-premises IT environments with cloud storage. Automates data management and storage, plus supports disaster recovery. | Storage Gateway | StorSimple |
| Bulk data transfer | A data transport solution that uses secure disks and appliances to transfer substantial amounts of data. | AWS Import/Export Disk | Import/Export |
| Bulk data transfer | Petabyte- to exabyte-scale data transport solution. | AWS Import/Export Snowball, AWS Snowball Edge, AWS Snowmobile | Data Box |
| Disaster recovery | Automates protection and replication of virtual machines with health monitoring, recovery plans, and recovery plan testing. |  | Site Recovery |

For a more detailed understanding, download the document here.

Surprisingly, as per an article published by Gartner, "cloud computing is still perplexing to many CIOs even after a decade of cloud." While cloud computing is a foundation for digital business, Gartner estimates that less than one-third of enterprises have a documented cloud strategy. This comes as a surprise given that cloud has evolved from a disruption into the indispensable technology of today and tomorrow, strategically adopted all along by many progressive companies.

In the same article Donna Scott, Vice President and distinguished analyst at Gartner states that “Cloud computing will become the dominant design style for new applications and for refactoring a large number of existing applications over the next 10-plus years”. She also added that “A cloud strategy clearly defines the business outcomes you seek, and how you are going to get there. Having a cloud strategy will enable you to apply its tenets quickly with fewer delays, thus speeding the arrival of your ultimate business outcomes.”

However, it is easier said than done. Many top businesses still have questions: how do we make the most of cloud computing? What kinds of architectures and techniques need to be strategized to support the many flavors of evolving cloud computing? Private or public? Hybrid or public? Azure or AWS, or should it be a hybrid combination?

Through a series of blogs, we intend to answer these questions. To start, we would like to present a comparative cloud service map, as published by Microsoft, focusing on Azure and AWS, both leaders in public cloud platforms.

The well-researched article draws detailed comparisons between Azure and AWS, showing how common cloud services across areas such as Marketplace, Compute, Storage, Networking, Database, Analytics, Big Data, Intelligence, IoT, Mobile, and Enterprise Integration are made available via Azure and Amazon Web Services (AWS).

It should be noted that, as prominent public cloud platform providers, Azure and AWS each offer businesses a wide and comprehensive set of capabilities across the globe. Many organizations have chosen one of them, or both, depending upon their needs, in order to gain more agility and flexibility while minimizing risk and maximizing the benefits of a multi-cloud environment.

For starters, let's begin with compute and the points one should consider and compare before deciding on the Azure or AWS approach, or a combination of both.

| Service | Description | AWS | Azure |
| --- | --- | --- | --- |
| Virtual servers | Allows users to deploy, manage, and maintain OS and server software; instance types provide configurations of CPU/RAM. Also offers a lightweight, simplified product offering users can choose from when building out a virtual machine. | Elastic Compute Cloud (EC2) VMs, Amazon Lightsail | Virtual Machines, Virtual Machine Images |
| Container management | Supports Docker containers and allows users to run applications on managed instance clusters. | EC2 Container Service (ECS) | Container Service |
| Container management | Allows customers to store Docker-formatted images. Used to create all types of container deployments on Azure. | EC2 Container Registry | Container Registry |
| Microservice-based applications | Orchestrates and manages the execution, lifetime, and resilience of complex, interrelated code components that can be either stateless or stateful. |  | Service Fabric |
| Backend process logic | Integrates systems and runs backend processes in response to events or schedules without provisioning or managing servers. | Lambda | Functions, Event Grid |
| Job orchestration | When processing across hundreds or thousands of compute nodes, this tool orchestrates the tasks and interactions between the compute resources that are necessary. | AWS Batch | Batch |
| Scalability | Automatically changes the number of instances providing a compute workload. Users set defined metrics and thresholds that determine if the platform adds or removes instances. | AWS Auto Scaling | Virtual Machine Scale Sets, App Service Scale Capability (PaaS), AutoScaling |
| Pre-defined templates | Community-led templates for creating and deploying virtual machine-based solutions. | AWS Quick Start | Quickstart templates |
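
To make the serverless "Backend process logic" row above concrete, here is a minimal, hypothetical Python handler of the kind AWS Lambda invokes for each event (Azure Functions follows a similar model); the event shape is a generic placeholder rather than one tied to a specific trigger.

```python
# Hypothetical sketch: the kind of function an event-driven service (AWS Lambda here)
# runs in response to an event, with no server to provision or manage.
# The event shape below is a generic placeholder, not tied to a specific trigger.
import json

def lambda_handler(event, context):
    """Entry point invoked for each event (e.g. an API call, queue message, or schedule)."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```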

For a more detailed understanding, download the document here.

A microservices-based architecture introduces agility and flexibility and supports a sustainable DevOps culture, ensuring closer collaboration within businesses; the good news is that it is actually happening for those who have embraced it.

True, monolithic application architectures have enabled businesses to benefit from IT all along, as they are built as a single codebase and are simple to develop, test, and run. Because they are also based on a logical, modular, hexagonal or layered architecture (a presentation layer responsible for handling HTTP requests and responding with either HTML or JSON/XML, a business logic layer, database access, and application integration), they cover and tie together all processes and functions to an extent.

Despite these ground-level facts, monolithic software, which was instrumental in helping businesses embrace IT in their initial stages and which still exists today, is running into problems. Increasingly complex business operating conditions are largely to blame.

So, how do businesses today address the new pressures caused by digitization, continuous technology disruptions, increased customer awareness and interactions, and sudden regulatory interventions? The answer lies in the agility, flexibility, and scalability of the underlying IT infrastructure, the pillars of rapid adaptability to change.

A monolithic app, even when it is based on a well-designed three-tier architecture, loses fluidity in the long run and turns rigid. Irrespective of its modularity, modules are still dependent on each other, and even a minimal change in one module requires regeneration and deployment of all artifacts in each server pool touched across the distributed environment.

Besides, whenever there is a critical problem, the blame game starts among the UI developers, business logic experts, backend developers, database programmers, and so on, as each is predominantly an expert in their own domain but has little knowledge of the other processes. As the complexity of business operations sets in, the agility, flexibility, and scalability of your software are severely tested in a monolithic environment.

Here is where microservices play a huge role: the underlying architecture helps you break your software applications into independent, loosely coupled services that can be deployed and managed at that level alone, without having to depend on other services.

For example, if your project requires you to design and manage inventory, sales, shipping, billing, and UI shopping-cart modules, you can break each one out as an independently deployable service. Each has its own database, and monitoring and maintenance of application servers are handled independently, since the architecture allows you to decentralize the database and reduce complexity. It also enables continuous delivery and deployment of large, complex applications, which means the technology evolves along with the business; a sketch of one such service follows below.
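
The sketch below is a minimal, hypothetical example of one such independently deployable service, an inventory service written with Flask; the routes, port, and the in-memory dictionary standing in for the service's own database are illustrative assumptions, not part of the original guide.

```python
# Hypothetical sketch: a tiny, independently deployable "inventory" microservice.
# In a real system this service would own its own database; a dict stands in here.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for the service's private datastore.
inventory = {"sku-123": {"name": "widget", "quantity": 42}}

@app.route("/inventory/<sku>", methods=["GET"])
def get_item(sku):
    item = inventory.get(sku)
    if item is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(item)

@app.route("/inventory/<sku>", methods=["PUT"])
def update_item(sku):
    inventory[sku] = request.get_json()
    return jsonify(inventory[sku])

if __name__ == "__main__":
    # Each microservice runs and scales on its own; other services reach it only over HTTP.
    app.run(port=5001)
```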

The other important aspect is that microservices promote a culture in which whoever develops a service is also responsible for managing it. This avoids the hand-over concept and the misunderstandings and conflicts that follow whenever there is a crisis.

In line with the DevOps concept, microservices enable easy collaboration between the development and operations teams, as they embrace and work with a common toolset that establishes common terminology as well as processes for requirements, dependencies, and problems. There is no denying that DevOps and microservices work better when applied together.

Perhaps that is why companies like Netflix and Amazon are embracing the concept of microservices in their products. For other businesses embracing it, a new environment emerges in which agility, flexibility, and closer collaboration between business and technology become a reality, providing the much-needed edge in these challenging times.

CloudIQ is a leading Cloud Consulting and Solutions firm that helps businesses solve today’s problems and plan the enterprise of tomorrow by integrating intelligent cloud solutions. We help you leverage the technologies that make your people more productive, your infrastructure more intelligent, and your business more profitable. 
