Welcome to Cloud View! We hope you had a joyful and fun holiday! NOW with the festivities behind us, it’s time for business again! Let’s start with what IT Leaders are planning for 2020. Leader’s Speak Industry insights AI Solutions As a digital-first service provider, we at CloudIQ help implement AI solutions across organizations. Impact […]


Welcome to Cloud View!

The cloud computing landscape is evolving, innovating, and expanding almost at the speed of thought! Every week there are hundreds of announcements, insights, opinions, and studies discussing new trends, ideas, and technological breakthroughs. To help you get the most relevant information, we have put together a weekly curated list of must-read articles under Cloud View.


Cloud computing in 2020

As 2019 winds down, our eyes turn to the next year. Let’s find out what the pundits are predicting for cloud computing in 2020.
Hybrid cloud is passé as onmi-cloud becomes the preferred enterprise approach; Kubernetes is all set to become the dominant talking point in tech conversations in 2020, and AI will become omnipresent.

How will upcoming technology affect humans – at work or at home?

How can IT leaders invest in today’s technology for a future payoff? Here are the top 10 strategic tech trends by Gartner – a look into the future. A must-read before planning for the coming year.

AI and Robotics

No talk of the future is complete without AI and robotics! Keith Shaw, editor-in-chief of Robotics Business Review, shares some insights on where robotics is heading.

Industry Insights

Microsoft and Google Cloud’s battle for the enterprise

Google Cloud is aggressively trying to elbow into the cloud market, leading to a battle royale. All the cloud giants have deep pockets and are not afraid of investing BIG, which is great for innovation and enterprise clients. Check out the whole article for a more in-depth look at how the race for cloud dominance is playing out.  

Confidential Computing

Microsoft brings confidential computing capability to Kubernetes workloads – an additional layer of security to keep business data safe.

This week at CloudIQ

CloudIQ Technologies is now a Kubernetes Certified Service Provider (KCSP)

Cloud Native Computing Foundation (CNCF) recognizes CloudIQ as one of the few (122 as on Nov 19, 2019) service providers worldwide to receive KCSP certification. As a KCSP, CloudIQ will be able to access collaborative group support to help its clients develop and deploy cloud native applications quickly and efficiently. The CNCF partnership program also puts CloudIQ in touch with organizations looking to design, adopt and implement cloud native solutions.

Understanding Kubernetes Concepts – A QuickStart Guide 

Container technology made software development more agile, however, containers need to be tracked, monitored, and managed, which is where container orchestration and Kubernetes come in. Here is a quick start guide to understanding Kubernetes concepts.

At a Glance

Seattle, WA, November 21, 2019 – CloudIQ Technologies, a fast-growing premier cloud consulting and solutions provider, announced today its status as a Kubernetes Certified Service Provider (KCSP).

Cloud Native Computing Foundation(CNCF) is a non-profit member of the Linux Foundation that promotes cloud-native computing and the certification body for Kubernetes. CNCF recognizes CloudIQ as one of the few (121 as on date) service providers worldwide to receive KCSP certification.

“We are so thrilled to welcome CloudIQ Technologies into the CNCF family,” said Dan Kohn, Executive Director of the Cloud Native Computing Foundation. “As part of our select group of Kubernetes Certified Services Providers (KCSPs), CloudIQ will be an integral part of our Kubernetes platform outreach and will be available to help organizations successfully adopt Kubernetes.”

As a KCSP, CloudIQ will be able to access collaborative group support to help its clients develop and deploy cloud native applications quickly and efficiently. The CNCF partnership program also puts CloudIQ in touch with organizations looking to design, adopt and implement cloud native solutions.

CloudIQ’s technology experts, who have more than a decade of varied technical and business experience, deliver the full range of cloud-native / open source solutions to its clients through its team of Azure & AWS certified architects, Certified Kubernetes Administrators (CKA), Certified Kubernetes Developers, and DevSecOps professionals.

“We have been extremely happy to work closely with the CNCF community to push the boundaries of the Kubernetes platform further and pass on the benefits of cloud-native technologies to our clients,” said Mr. Prem Kandalu, CEO. “As a part of the Cloud Native Computing Foundation, we will continue to develop new capabilities and strengthen our cloud strategy and business. We hope our contribution to the pooled expertise at CNCF delivers greater speed, scale, economic advantages to developers across the board and leads to the growth of the entire K8s community.”

About CloudIQ Tech:

CloudIQ is a leading cloud consulting and solutions firm with deep industry expertise in building cloud-native solutions that help customers realize the cost, scale, and security benefits of the cloud. As strategic advisors to several Fortune 500 companies, CloudIQ provides operational strategies, monitoring & assistance with command centers, and application transformation guidance for comprehensive cloud migrations.

For more information visit our site https://cloudiqtech.com
Or you are welcome to contact us by calling us at +1 (206) 203-4151
Or mailing at sales@cloudiqtech.com

Cosmos DB is Microsoft Azure’s hugely successful tool to help their clients manage data on a global scale. This multi-model database service allows Azure platform users to elastically and independently scale throughput and storage across any number of Azure regions worldwide.

As Cosmos DB supports multiple data models, you can take advantage of fast, single-digit-millisecond data access using any of your favorite APIs, including SQL, MongoDB, Cassandra, Tables, or Gremlin. Being a NoSQL database, anyone with experience in MongoDB can easily work with Cosmos DB. Meanwhile, by supporting SQL APIs, it makes it easy to interact with SQL knowledge.

Why Cosmos DB?

For organizations looking to build a flexible and scalable database that is globally distributed, Cosmos DB is especially useful as it

  • provides a ready-to-use, extremely dynamic database service
  • guarantees low latency of less than 10 milliseconds when reading data and less than 15 milliseconds when writing data.
  • offers customers a faster, completely seamless experience.
  • offers 99.99% availability

Here are some tried and tested tips from our senior Azure expert on how to get the most out of Cosmos DB.

Data Modeling

Cosmos DB is great because it lets you model semi/unstructured aggregates and dynamic entities. This means it’s very easy to model ever-changing entities, entities that don’t all share the same attributes and hierarchical aggregates. To model for Cosmos, you need to think in terms of hierarchy and aggregates instead of entities and relations. NoSQL lets you, say, store a thing that has other things, which have things of their own. Give me the whole hierarchy of things back. So, you don’t have a person, rental addresses, and a relation between them. Instead, you have rental records, which aggregate for each person what rental addresses they’ve had.

The following NO SQL rules will perfectly match cosmos DB too.

  • Should be used as a complement to an existing or additional database.
  • Deal with PACELC theorem, an extension of CAP theorem.
  • A data modeler should think in terms of queries instead of in terms of storage
Connection Types

Cosmos DB can be connected to the application by 2 modes.

  • Gateway mode
  • Direct mode

The gateway mode is the default mode in Microsoft.Azure.DocumentDB SDK. It uses HTTPS port with a single endpoint. While the direct mode is the default for .NET V3 SDK. This uses TCP and HTTPS for connectivity.

The gateway mode is better while your application runs within a corporate network with strict network rules because it has a single endpoint that can be configured to the firewall for security. Meanwhile, the gateway mode performance will be low when compared to the direct mode.

There is also an option to connect through a RESTful programming model provided by the SDK. All the CRUD can be done through REST calls. This method is recommended if you need a client App to do database access directly instead of providing API. Thus, the overhead of providing an API wrapper to consume the Cosmos DB can be eradicated, and a performance payoff can be prevented.

The recommended mode is always the direct mode in most of the scenarios, which provides better performance.

I am taking the popular volcano data for comparing the response time between the SDK and RESTful model.

Query Executed in both the versions.

SELECT * FROM c where c.Status="Holocene"
Response details of the SDK

The query was responded with the data in 3710 ms.

Response details of the RESTful model

The query was responded with data in 5810 ms.

If we developed an API with this mode, then the response time taken by our API, too needs to be considered. So, using RESTful model in API will be a trade-off with the performance. Use this mode to query from the client directly.

Partitioning the DB

The logical partition is the primary key to provide performance to the cosmos transactions. For example, if you have a database with a list of student data of around 1500 from a school. Now, a simple search for a student named “Peter” will lead to a search through all the 1500 data entries, which consumes high throughput to get the result. Now split the data logically by the “Grade” they belong to. Now, querying like a student named “Peter” from “Grade 5” will lead the system to search only 30 or 40 students from the total 1500 saving the throughput consumption and elevating the performance as compared to the earlier approach.

The common phenomenon will be a. Any key such as city, state, country kind of properties can be used as Partition Key.

a. Any key such as city, state, country kind of properties can be used as Partition Key.
 b. No partition is required up to 10 GB.
 c. The query must be provided with the partition key to be searched.
 d. The property selected as partition key must be in all the documents in the container.

I am taking the popular Volcano data for testing the performance with and without Partition Key.

1. I initially created a collection without a partition key. The performance for the given query is

SELECT * FROM c where c.Status="Holocene"
Partition key range id0
Retrieved document count200
Retrieved document size (in bytes)100769
Output document count200
Output document size (in bytes)101069
Index hit document count200
Index lookup time (ms)0.21
Document load time (ms)1.29
Query engine execution time (ms)0.33
System function execution time (ms)0
User defined function execution time (ms)0
Document write time (ms)0.52

2. Then I recreated the same collection with “/country” as a partition. Now the same query with Japan as partition value results with the given values.

Partition key range id0
Retrieved document count16
Retrieved document size (in bytes)7887
Output document count16
Output document size (in bytes)7952
Index hit document count16
Index lookup time (ms)0.23
Document load time (ms)0.17
Query engine execution time (ms)0.06
System function execution time (ms)0
User defined function execution time (ms)0
Document write time (ms)0.01
Tune the Index

Indexing is always a top priority item in the checklist to tune performance. Indexing is an internal job that keeps track of the metadata about the data, which helps in finding the result set data for a query. By default, all the properties of a Cosmos container will be indexed. But it is not necessary as it is a useless overhead to the DB, and also keeping track of a lot of data consumes enough RUs, which is not cost-effective. The better approach is to exclude all the paths from indexing and add only age paths that are used for querying in the application. 

Indexing ModeWith Default IndexingWith Custom Indexing
RU's Consumed3146.1519.83
Output Doc Count100100
Doc load time (In ms)646.512.32
Query engine execution time (In ms)434.264.96
System function execution time (In ms)57.032.41

The execution of the query, by default, will return 100 documents. We can increase the number of documents by providing “maxItemCount” value. The maximum size will be 1000 documents. But it is not meaningful to take 1000 documents at a time from the DB except in some scenarios. To improve the performance and to show a crisp result set to the user, always reduce the “maxItemCount”. Unlike SQL databases, pagination is the default behavior in Cosmos. So, even though you are going to provide maximum count as 1000 and the result for the query is going to be more than that, then the Cosmos is going to return a token named “Continuation Token”. This token consists of a unique value that points to the query we did and the page number. If the total result for the query is provided by the Cosmos, then we can do the logic of pagination at the front end. Thus, by reducing the number of documents per response, we can save the throughput consumption, network data transaction, and increase performance.

Throughput Management

RU ’s or Request Units is the common term we always come across when using Cosmos DB. When you read a document from a container or write a document to it, then you are trading an RU with the Cosmos for your operation. It is like currency in our common world. Without money, you can’t buy anything, and without RU, you can’t query anything. You can buy only the items that cost equivalent to or less than the money you have in hand. Similarly, you can query the data that equals the RU’s you have.

If you have large data and if the query needs to traverse deep into the collection, then you need enough RU’s. Adding a property in the index will consume some RU’s. So, if all the properties are indexed, then your RU’s pay off will be high, and you will lack RU’s for querying. So, always add in the index only the properties that are needed while querying.

Index properly, save the RU’s, and utilize it during querying. For example, if you have 100K documents in the container. The Cosmos DB is consuming 1000 RU’s to do a query operation up to 50K documents with the default indexing, then our query will not reach the rest of the 50K documents, and we will never receive them in our result set. If appropriately indexed, the same query will consume only 400 RU’s to penetrate all the 100K documents.

Startup latency

The very first query will always be a bit late because of the time it takes to awaken the connection. To overcome this latency, it is best practice to call “OpenAsync()” in SDK 2 in the beginning while creating the connection.

await client.OpenAsync();
Singleton Connection

The best approach is to connect the DB and keep the connection alive for all the instances of the application. Also, polling the DB within a period will keep the connection alive. This reduces the DB connectivity latency.


Make sure the Cosmos and the applications are grouped within the same Azure Region. This reduces the latency a lot. The lowest possible latency is achieved by ensuring the calling application is located within the same Azure region as the provisioned Azure Cosmos DB endpoint.

Programming Best Practices
  • Always use the latest SDK version.
  • Use Streaming API (in SDK 3) that can receive and return data without serializing. Helpful when your API is just a relay and not doing any logical operations on the data.
  • Tune the queries.
  • Implement retry logic with reasonable waiting time to prevent throttle during a busy time.

By carefully analyzing all the above factors, we can improve the Cosmos DB query performance substantially.

Kubernetes is the reigning market leader when it comes to container orchestration! Any organization working with the container ecosystem is either already using Kubernetes or considering it. However, despite the undoubted ease and speed Kubernetes bring to the container ecosystem, they also need specialized expertise to deploy and manage.

Many organizations consider the DIY approach to Kubernetes and if you have an in-house IT team with the requisite experience or if your requirements are large enough to justify the cost of hiring a dedicated Kubernetes team – then an internal Kubernetes strategy could certainly be beneficial.

However, if you don’t fall in the category mentioned above, then managed Kubernetes is the smartest and most cost-effective way ahead. With professionals in the picture, you can be assured of getting long term strategy, seamless implementation, and dedicated on-going service, which will

  • reduce deployment time
  • provide 24×7 support
  • handle all upgrades and fixes
  • troubleshoot as and when needed

Kubernetes solution providers offer a wide range of services – from fully managed to bare bone implementation to preconfigured Kubernetes environments on SaaS models to training for your in-house staff.

Look at your operational needs and your budget and explore the market for Kubernetes services options before you pick the service and the digital partner that ticks all your boxes.   

Meanwhile, do look at our tutorial on troubleshooting Kubernetes deployments.

Kubernetes deployments issues are not always easy to troubleshoot. In some cases, the errors can be resolved easily, and, in some cases, detecting errors requires us to dig deeper and run various commands to identify and resolve the issues.

The first step is to list down all pods after installing your application. The following command lists down all pods in all namespaces.

kubectl get pods -A

If you find any issues on the pod status, you can then use kubectl describe, kubectl logs, kubectl exec commands to get more detailed information.

Debugging Pods
Pod Status Shows ImagePullBackOff or ErrImagePull

This status indicates that your pod could not run because the pod could not pull the image from the container registry. To confirm this, run the kubectl describe command along with the pod identifier to display the details.

kubectl describe pod <pod-identifier>

This command will provide more information about the issue.

  • Image name or tag incorrect.
    • Check the image name and tag and try to pull the image manually on the host using docker pull to verify.
  • Authentication failure related to Container registry.
    • Check the secrets, roles, service principal related to your container registry and try to pull the image manually on the host using docker pull to verify.
docker pull <image-name:tag> 
Pod Status Shows Waiting

This status indicates your pod has been scheduled to a worker node, but it can’t run on that machine. To confirm this, run the kubectl describe command along with the pod identifier to display the details.

kubectl describe pod <pod-identifier> -n <namespace>

The most common causes related to this issue are

  • Image name or tag incorrect.
    • Check the image name and tag and try to pull the image manually on the host using docker pull to verify.
  • Authentication failure related to Container registry.
    • Check the secrets, roles, service principal related to your container registry and try to pull the image manually on the host using docker pull to verify.
Pod Status Shows Pending or CrashLoopBackOff

This status indicates your pod could not be scheduled on a node for various reasons like resource constraints (insufficient CPU or memory resources), volume mounting issues.  To confirm this, run the kubectl describe command along with the pod identifier to display the details.

kubectl describe pod <pod-identifier> -n <namespace>

This command will provide more information about the issue. Most common issues are

  • Insufficient resources
    • If resources are insufficient, clean up your existing resources or scaling your nodes (vertically or horizontally) to increase the resources.
  • Volume mounting
    • Check your volume’s mounting definition and storage classes.
  • Using hostPort
    • When you bind a Pod to a hostPort, there are a limited number of places that a pod can be scheduled. In most cases, hostPort is unnecessary, try using a Service object to expose your pod. If you do require hostPort, then you can only schedule as many Pods as there are nodes in your Kubernetes cluster
Pod is crashing or unhealthy

Sometimes the scheduled pods are crashing or unhealthy.  Run kubectl logs to find the root cause.

kubectl logs <pod_identifier> -n <namespace>

If you have multiple containers, run the following command to find the root cause.

kubectl logs <pod_identifier> -c <container_name> -n <namespace>

If your container has previously crashed, you can access the previous container’s crash log with:

kubectl logs –previous <pod_identifier> -c <container_name> -n <namespace>

If your pod is running but with 0/1 ready state or 0/2 ready state (in case if you have multiple containers in your pod), then you need to verify the readiness. Check the health check (readiness probe) in this case.

Most common issues are

  • Application issues
    • Run the below command to check the logs.
               kubectl logs <pod_identifier> -c <container_name> -n <namespace>
  • Run the below command to verify the events.
               kubectl describe <pod_identifier> -n <namespace>
  • Readiness probe health check failed
    • Check the health check (readiness probe) in this case. Also, check the READY column of the kubectl get pods output to find out if the readiness probe is executing positively.
    • Run the below command to check the logs.
         kubectl logs <pod_identifier> -c <container_name> -n <namespace>
  • Run the below command to verify the events.
         kubectl describe <pod_identifier> -n <namespace>
  • Liveness probe health check failed
    • Check the health check (liveness probe) in this case. Also, check the RESTARTS column of the kubectl get pods output. To find out if the liveness probe is executing positively.
    • Run the below command to check the logs.
         kubectl logs <pod_identifier> -c <container_name> -n <namespace>
  • Run the below command to verify the events.
         kubectl describe <pod_identifier> -n <namespace>
Pod is running but has application issues

In some cases, the pods are running, but the output of the application is incorrect. In this case, you should run the following to find the root cause.

  • Run the below command and identify the issue.
kubectl logs <pod_identifier> -c <container_name> -n <namespace>
  • If you are interested in the last n lines of logs run
kubectl logs <pod_identifier> -c <container_name> --tail <n-lines> -n <namespace>
  • Run the commands inside the container using
kubectl exec <pod_identifier> -c <container_name> /bin/bash -n <namespace>

Run the commands like ‘curl’ or ‘ps’ ‘ls’ to troubleshoot the issue after you get into the container.

Pod is running and working but cannot access through services

In some cases, the pods are working as expected but cannot access through the services. Most common causes of this issue are

  • Service not registered properly
    • Check that the service exists and describe the service and validate the pod selectors to run the following commands.
kubectl get svc
kubectl describe svc <svc-name>
kubectl get endpoints
  • Run the following commands to verify pod selector
kubectl get pods --selector=name={name},{label-name}={label-value}
  • The service may be deployed in a different namespace.
    • Verify that the pod’s containerPort matches up with the Service’s containerPort
  • Service is registered properly but has a DNS issue
    • Get into the container using exec command and run nslookup using the following command
kubectl get endpoints
kubectl exec <pod_identifier> -c <container_name> /bin/bash
nslookup <service-name>
  • If you have any issues to run the command for curl or nslookup. Deploy debugging pod using image yauritux/busybox-curl in the same namespace to verify. Please run the following commands to verify
kubectl run --generator=run-pod/v1 -it --rm <name> --image=yauritux/busybox-curl -n <namespace>
  • Run the following to verify within the container
curl http://<servicename>
telnet <service-ip> <service-port>
nslookup <servicename>

Amazon Web Services (AWS) is a secure cloud services platform, offering compute power, database storage, content delivery, and other functionalities to help businesses scale and grow.

It gives organizations a secure and robust platform to develop their custom cloud-based solutions and has several unique features that make it one of the most reliable and flexible cloud platform such as

  • Mobile-friendly access through AWS Mobile Hub and AWS Mobile SDK
  • Fully managed purpose-built Databases
  • Serverless cloud functions
  • Range of storage options that are affordable and scalable.
  • Unbeatable security and compliance

Following are some core services offered by AWS:

AWS Core services
  1. An EC2 instance is a virtual server in Amazon’s Elastic Compute Cloud (EC2) for running applications on the AWS infrastructure.
  2. Amazon Elastic Block Store (EBS) is a cloud-based block storage system provided by AWS that is best used for storing persistent data.
  3. Amazon Virtual Private Cloud (Amazon VPC) enables us to launch AWS resources into a virtual network that we have defined. This virtual network closely resembles a traditional network that we would operate in our own data center, with the benefits of using the scalable infrastructure of AWS.
  4. Amazon S3 or Amazon Simple Storage Service is a service offered by Amazon Web Services that provides object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its global e-commerce network.
  5. AWS security groups (SGs) are associated with EC2 instances and provide security at the protocol and port access level. Each security group — working much the same way as a firewall — contains a set of rules that filter traffic coming into and out of an EC2 instance.

Let us look more deeply at one of AWS’s core services – AWS CloudFormation – that is key for managing workloads on AWS.

1.   CloudFormation

AWS CloudFormation is a service that helps us model and set up our Amazon Web Services resources so that we can spend less time managing those resources and more time focusing on our applications that run in AWS.  We create a template that describes all the AWS resources that we want (like Amazon EC2 instances or S3 buckets), and AWS CloudFormation takes care of provisioning and configuring those resources for us. We don’t need to individually create and configure AWS resources and figure out what’s dependent on what; AWS CloudFormation handles all of that.

A stack is a collection of AWS resources that you can manage as a single unit. In other words, we can create, update, or delete a collection of resources by creating, updating, or deleting stacks. All the resources in a stack are defined by the stack’s AWS CloudFormation template.

2.   CloudFormation template

CloudFormation templates can be written in either JSON or YAML.  The structure of the template in YAML is given below:

AWSTemplateFormatVersion: "version date"

  template metadata
  set of parameters
  set of mappings
  set of conditions
  set of resources
  set of outputs

In the above yaml file,

  1. AWSTemplateFormatVersion – The AWS CloudFormation template version that the template conforms to.
  2. Description – A text string that describes the template.
  3. Metadata – Objects that provide additional information about the template.
  4. Parameters – Values to pass to our template at runtime (when we create or update a stack). We can refer to parameters from the Resources and Outputs sections of the template.
  5. Mappings – A mapping of keys and associated values that we can use to specify conditional parameter values, like a lookup table. We can match a key to a corresponding value by using the Fn::FindInMap intrinsic function in the Resources and Outputs sections.
  6. Conditions – Conditions that control whether certain resources are created or whether certain resource properties are assigned a value during stack creation or update. For example, we can conditionally create a resource that depends on whether the stack is for a production or test environment.
  7. Resources – Specifies the stack resources and their properties, such as an Amazon Elastic Compute Cloud instance or an Amazon Simple Storage Service bucket.  We can refer to resources in the Resources and Outputs sections of the template.
  8. Outputs – Describes the values that are returned whenever we view our stack’s properties. For example, we can declare an output for an S3 bucket name and then call the AWS cloudformation describe-stacks AWS CLI command to view the name.

Resources is the only required section in the CloudFormation template.  All other sections are optional.

3.   CloudFormation template to create S3 bucket


    Type: AWS::S3::Bucket

In AWS Console, go to CloudFormation and click on Create Stack

Upload the template file which we created.  This will get stored in an S3 location, as shown below.

Click next and give a stack name

Click Next and then “Create stack”.  After a few minutes, you can see that the stack creation is completed.

Clicking on the Resource tab, you can see that the S3 bucket has been created with name “s3-stack-hellobucket-buhpx7oucrgn”.  AWS has provided this same since we didn’t specify the BucketName property in YAML.

Note that deleting the stack will delete the S3 bucket which it had created.

4.   Intrinsic functions

AWS CloudFormation provides several built-in functions that help you manage your stacks.

In the below example, we create two resources – a Security Group and an EC2 Instance, which uses this Security Group.  We can refer to the Security Group resource using the !Ref function.


    Type: 'AWS::EC2::Instance'
        - !Ref InstanceSecurityGroup
      KeyName: mykey
      ImageId: ''
    Type: 'AWS::EC2::SecurityGroup'
      GroupDescription: Enable SSH access via port 22
        - IpProtocol: tcp
          FromPort: '22'
          ToPort: '22'

Some other commonly used intrinsic functions are

  1. Fn::GetAtt – returns the value of an attribute from a resource in the template.
  2. Fn::Join – appends a set of values into a single value, separated by the specified delimiter. If a delimiter is an empty string, the set of values are concatenated with no delimiter.
  3. Fn::Sub – substitutes variables in an input string with values that you specify. In our templates, we can use this function to construct commands or outputs that include values that aren’t available until we create or update a stack.
5.   Parameters

Parameters enable us to input custom values to your template each time you create or update a stack.


    Type: String
    Default: t2.micro
      - t2.micro
      - m1.small
      - m1.large
    Description: Enter t2.micro, m1.small, or m1.large. Default is t2.micro.
    Type: AWS::EC2::Instance
        Ref: InstanceTypeParameter
      ImageId: ami-0ff8a91507f77f867
6.   Pseudo Parameters

Pseudo parameters are parameters that are predefined by AWS CloudFormation. We do not declare them in our template. Use them the same way as we would a parameter as the argument for the Ref function.

Commonly used pseudo parameters:

  1. AWS: Region – Returns a string representing the AWS Region in which the encompassing resource is being created, such as us-west-2
  2. AWS::StackName – Returns the name of the stack as specified during cloudformation create-stack, such as teststack
7.   Mappings

The optional Mappings section matches a key to a corresponding set of named values. For example, if you want to set values based on a region, we can create a mapping that uses the region name as a key and contains the values we want to specify for each specific region. We use the Fn::FindInMap intrinsic function to retrieve values in a map.

We cannot include parameters, pseudo parameters, or intrinsic functions in the Mappings section.


AWSTemplateFormatVersion: "2010-09-09"
      HVM64: ami-0ff8a91507f77f867
      HVMG2: ami-0a584ac55a7631c0c
      HVM64: ami-0bdb828fd58c52235
      HVMG2: ami-066ee5fd4a9ef77f1
      HVM64: ami-047bb4163c506cd98
      HVMG2: ami-0a7c483d527806435
      HVM64: ami-06cd52961ce9f0d85
HVMG2: ami-053cdd503598e4a9d
      HVM64: ami-08569b978cc4dfa10
      HVMG2: ami-0be9df32ae9f92309
    Type: "AWS::EC2::Instance"
      ImageId: !FindInMap [RegionMap, !Ref "AWS::Region", HVM64]
      InstanceType: m1.small
8.   Outputs

The optional Outputs section declares output values that we can import into other stacks (to create cross-stack references), return in response (to describe stack calls), or view on the AWS CloudFormation console. For example, we can output the S3 bucket name for a stack to make the bucket easier to find.

In the below example, the output named StackVPC returns the ID of a VPC, and then exports the value for cross-stack referencing with the name VPCID appended to the stack’s name.

    Description: The ID of the VPC
    Value: !Ref MyVPC
      Name: !Sub "${AWS::StackName}-VPCID"

There is little doubt that data will guide the next generation of business strategy and will bring new efficiencies across industries. But for that to happen, organizations must be able to extract insights from their data.

Qubole is an ideal platform to activate end-to-end data processing in organizations. It combines all types of data – structured, unstructured, and legacy offline data – into a single data pipeline and turns it into rich insights by adding AI, ML, and deep analytics tools to the mix.

It scales seamlessly to accommodate more users and new data without adding administrative overheads and lowers cloud costs significantly. Simply put, Qubole is a platform that puts big data on the cloud to power business decisions based on real-time analytics.

At CloudIQ Technologies, our data experts have deployed Qubole’s cloud-native data systems for many of our clients, and the results have been outstanding. Here is an article from one of our data engineers that provides an overview of how to setup Qubole to use AWS environment and create and run spark clusters.

AWS Access Configuration:

In order for Qubole to create and run a cluster, we have to grant Qubole access to our AWS environment. We can grant access based on a key or a role. We will use role-based authentication.

Step 1: Login to Qubole

Step 2: Click on the menu at the top left corner and select “Account Settings” under the Control Panel.

Step 3: Scroll down to Access settings

Step 4: Switch Access mode to “IAM Role”

Step 5: Copy the Trusted Principal AWS Account ID and External ID

Step 6: Use the copied values to create a QuboleAccessRole in the AWS account (using the cloudformation template)

Step 7: Copy the Role ARN of the QuboleAccessRole and enter it in the Role ARN field

Step 8: Enter the S3 bucket location where the Qubole metadata will be stored in the “Default Location” field.

Step 9: Click Save

Spark Cluster
Create a cluster

The below steps will help create a new Spark cluster in Qubole.

Step 1: Click on the top-left dropdown menu and select “Cluster”

Step 2: Click on “+New” button

Step 3: Select “Spark” and click “Next”

Step 4: Provide a name for the cluster in the “Cluster Labels” field

Step 5: Select the version of Spark to run, Master Node Type, Worker Node Type, Minimum and Maximum nodes

Step 6: Select Region as us-west-2

Step 7: Select Availability Zone as us-west-2a

Step 8: Click “Next”

Step 9: In the Composition screen, you can select the type of nodes that will be spun up.

Step 10: In the Advanced Configuration screen, proceed to EC2 settings

Step 11: Enter “QuboleDualIAMRole” in the “Instance Profile” field

Step 12: Select “AppVPC” in VPC field

Step 13: Select “AppPrivateSNA” under Subnet field

Step 14: Enter the ip address of the Bastion node in the “Bastion Node” field

Step 15: Scroll to the bottom and enter “AppQuboleClusterSG” (security group for the cluster) in the “Persistent Security Group” field

Step 16: Click on “Create”

Run a cluster

To start a cluster, click on the dropdown menu on the top left corner and select cluster. Now click on “Start” button next to the cluster that needs to be started. A cluster is also automatically started when a job is submitted for the cluster.

Submit a job

One of the simplest ways to run a spark job is to submit it through the workbench. You can navigate to the workbench from the drop-down menu at the top left corner. In the workbench, click on “+Create New”. Then select “Spark” next to the title of the job. Once you select Spark, an optional drop-down appears where you can choose “Python”. In the last drop-down menu, select the spark cluster where you want to execute the job. If this cluster is not active, it will be activated automatically. Enter your spark job in the window below. When complete, click on “Run” to run the job.

Airflow Cluster

Airflow scheduler can be used to run various jobs in a sequence. Let’s take a look at configuring an Airflow cluster in Qubole.

Setting up DataStore

The first step in creating an airflow cluster is to set up a datastore. Make sure that the MySQL db is up and running and contains a database for airflow. Now, select “Explore” from the dropdown menu at the top left corner. On the left hand menu, drop down the selection menu showing “Qubole Hive” and select “+Add Data Store”

In the new screen, provide a name for the data store. Select “MySQL” as the database type. Enter the database name for the airflow database (The database should already be created in MySQL). Enter the host address as “hmklabsbienvironment.cq8z1kp7ikd8.us-west-2.rds.amazonaws.com”. Enter the username and password. Make sure to select “Skip Validation”. Since the MySQL db is in a private VPC, Qubole does not have access to it and will not be able to validate.

Configuring Airflow Cluster

Step 1: Click on the top left drop-down menu and select “Cluster”

Step 2: Click on “+New” button

Step 3: Select “Airflow” in the type of cluster and click “Next”

Step 4: Give a cluster name. Select the airflow version, node type.

Step 5: Select the datastore which points to the MySQL

Step 6: Select the us-west-2 as the Region

Step 7: Select us-west-2a as the Availability zone

Step 8: Click next to go to Advanced Configuration

Step 9: Select AppVPC as the VPC

Step 10: Select AppPrivateSNA as the Subnet

Step 11: Enter the Bastion Node information

Step 12: Scroll to the bottom and enter AppQuboleClusterSG as the Persistent Security Groups

Step 13: Click on create

Once the cluster is created, you can run it by clicking on “Start” next to the cluster’s name.

CloudIQ is a leading Cloud Consulting and Solutions firm that helps businesses solve today’s problems and plan the enterprise of tomorrow by integrating intelligent cloud solutions. We help you leverage the technologies that make your people more productive, your infrastructure more intelligent, and your business more profitable. 


626 120th Ave NE, B102, Bellevue,

WA, 98005.



Chennai One IT SEZ,

Module No:5-C, Phase ll, 2nd Floor, North Block, Pallavaram-Thoraipakkam 200 ft road, Thoraipakkam, Chennai – 600097

© 2019 CloudIQ Technologies. All rights reserved.

Get in touch

Please contact us using the form below


626 120th Ave NE, B102, Bellevue, WA, 98005.

+1 (206) 203-4151



Chennai One IT SEZ,

Module No:5-C, Phase ll, 2nd Floor, North Block, Pallavaram-Thoraipakkam 200 ft road, Thoraipakkam, Chennai – 600097