Welcome to Cloud View!

The cloud computing landscape is evolving, innovating, and expanding almost at the speed of thought! Every week there are hundreds of announcements, insights, opinions, and studies discussing new trends, ideas, and technological breakthroughs. To help you get the most relevant information, we have put together a weekly curated list of must-read articles under Cloud View.

Predictions

Cloud computing in 2020

As 2019 winds down, our eyes turn to the next year. Let’s find out what the pundits are predicting for cloud computing in 2020.
Hybrid cloud is passé as omni-cloud becomes the preferred enterprise approach; Kubernetes is all set to become the dominant talking point in tech conversations in 2020; and AI will become omnipresent.

How will upcoming technology affect humans – at work or at home?

How can IT leaders invest in today’s technology for a future payoff? Here are the top 10 strategic tech trends by Gartner – a look into the future. A must-read before planning for the coming year.

AI and Robotics

No talk of the future is complete without AI and robotics! Keith Shaw, editor-in-chief of Robotics Business Review, shares some insights on where robotics is heading.

Industry Insights

Microsoft and Google Cloud’s battle for the enterprise

Google Cloud is aggressively trying to elbow into the cloud market, leading to a battle royale. All the cloud giants have deep pockets and are not afraid of investing BIG, which is great for innovation and enterprise clients. Check out the whole article for a more in-depth look at how the race for cloud dominance is playing out.  

Confidential Computing

Microsoft brings confidential computing capability to Kubernetes workloads – an additional layer of security to keep business data safe.

This week at CloudIQ

CloudIQ Technologies is now a Kubernetes Certified Service Provider (KCSP)

Cloud Native Computing Foundation (CNCF) recognizes CloudIQ as one of the few (122 as of Nov 19, 2019) service providers worldwide to receive KCSP certification. As a KCSP, CloudIQ will be able to access collaborative group support to help its clients develop and deploy cloud-native applications quickly and efficiently. The CNCF partnership program also puts CloudIQ in touch with organizations looking to design, adopt, and implement cloud-native solutions.

Understanding Kubernetes Concepts – A QuickStart Guide 

Container technology made software development more agile; however, containers need to be tracked, monitored, and managed, which is where container orchestration and Kubernetes come in. Here is a quick-start guide to understanding Kubernetes concepts.

At a Glance



Seattle, WA, November 21, 2019 – CloudIQ Technologies, a fast-growing premier cloud consulting and solutions provider, announced today its status as a Kubernetes Certified Service Provider (KCSP).

The Cloud Native Computing Foundation (CNCF), a non-profit part of the Linux Foundation, promotes cloud-native computing and is the certification body for Kubernetes. CNCF recognizes CloudIQ as one of the few (121 to date) service providers worldwide to receive KCSP certification.

“We are so thrilled to welcome CloudIQ Technologies into the CNCF family,” said Dan Kohn, Executive Director of the Cloud Native Computing Foundation. “As part of our select group of Kubernetes Certified Services Providers (KCSPs), CloudIQ will be an integral part of our Kubernetes platform outreach and will be available to help organizations successfully adopt Kubernetes.”

As a KCSP, CloudIQ will be able to access collaborative group support to help its clients develop and deploy cloud native applications quickly and efficiently. The CNCF partnership program also puts CloudIQ in touch with organizations looking to design, adopt and implement cloud native solutions.

CloudIQ’s technology experts, who have more than a decade of varied technical and business experience, deliver the full range of cloud-native / open source solutions to its clients through its team of Azure & AWS certified architects, Certified Kubernetes Administrators (CKA), Certified Kubernetes Developers, and DevSecOps professionals.

“We have been extremely happy to work closely with the CNCF community to push the boundaries of the Kubernetes platform further and pass on the benefits of cloud-native technologies to our clients,” said Mr. Prem Kandalu, CEO. “As a part of the Cloud Native Computing Foundation, we will continue to develop new capabilities and strengthen our cloud strategy and business. We hope our contribution to the pooled expertise at CNCF delivers greater speed, scale, and economic advantages to developers across the board and leads to the growth of the entire K8s community.”

About CloudIQ Tech:

CloudIQ is a leading cloud consulting and solutions firm with deep industry expertise in building cloud-native solutions that help customers realize the cost, scale, and security benefits of the cloud. As strategic advisors to several Fortune 500 companies, CloudIQ provides operational strategies, monitoring & assistance with command centers, and application transformation guidance for comprehensive cloud migrations.

For more information, visit https://cloudiqtech.com, call us at +1 (206) 203-4151, or email sales@cloudiqtech.com.

Cosmos DB is Microsoft Azure’s hugely successful tool for helping clients manage data on a global scale. This multi-model database service allows Azure platform users to elastically and independently scale throughput and storage across any number of Azure regions worldwide.

As Cosmos DB supports multiple data models, you can take advantage of fast, single-digit-millisecond data access using any of your favorite APIs, including SQL, MongoDB, Cassandra, Tables, or Gremlin. Since it is a NoSQL database, anyone with MongoDB experience can work with Cosmos DB easily, and because it also supports SQL APIs, existing SQL knowledge carries over directly.

Why Cosmos DB?

For organizations looking to build a flexible and scalable database that is globally distributed, Cosmos DB is especially useful as it:

  • provides a ready-to-use, fully managed, and highly elastic database service
  • guarantees low latency of less than 10 milliseconds for reads and less than 15 milliseconds for writes
  • offers customers a faster, completely seamless experience
  • offers 99.99% availability

Here are some tried and tested tips from our senior Azure expert on how to get the most out of Cosmos DB.

Data Modeling

Cosmos DB is great because it lets you model semi-structured or unstructured aggregates and dynamic entities. This makes it easy to model entities that change constantly, entities that don’t all share the same attributes, and hierarchical aggregates. To model for Cosmos DB, you need to think in terms of hierarchies and aggregates instead of entities and relations. NoSQL lets you store a thing that contains other things, which contain things of their own, and then ask for the whole hierarchy back in a single request. So you don’t have a person, rental addresses, and a relation between them; instead, you have rental records, which aggregate for each person the rental addresses they’ve had.
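As a rough sketch of this aggregate-first approach (the entity and property names below are illustrative assumptions, not a real schema), a rental record that embeds a person’s rental addresses could be written to Cosmos DB as a single document using the azure-cosmos Python SDK:

# Illustrative only: one aggregate document that embeds the person's
# rental addresses instead of normalizing them into a separate table.
rental_record = {
    "id": "rental-jane-doe",
    "person": {"firstName": "Jane", "lastName": "Doe"},
    "rentalAddresses": [
        {"street": "12 Elm St", "city": "Seattle", "from": "2016-01", "to": "2018-06"},
        {"street": "98 Oak Ave", "city": "Bellevue", "from": "2018-07", "to": None},
    ],
}

# container.create_item(body=rental_record)  # writes the whole hierarchy as one item

Reading the record back returns the entire hierarchy in one call, which is exactly the “give me the whole hierarchy of things back” pattern described above.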

The following NoSQL rules apply equally well to Cosmos DB:

  • It should be used as a complement to an existing or additional database.
  • Plan around the PACELC theorem, an extension of the CAP theorem.
  • A data modeler should think in terms of queries instead of in terms of storage.
Connection Types

An application can connect to Cosmos DB in one of two modes:

  • Gateway mode
  • Direct mode

The gateway mode is the default in the Microsoft.Azure.DocumentDB SDK. It uses HTTPS over a single endpoint, while the direct mode, which is the default for the .NET V3 SDK, uses TCP as well as HTTPS for connectivity.

The gateway mode is the better choice when your application runs inside a corporate network with strict network rules, because its single endpoint can be added to the firewall configuration for security. However, gateway mode performance is lower than that of direct mode.

There is also the option of connecting through Cosmos DB’s RESTful programming model, where all CRUD operations are done through REST calls. This method is recommended when a client app needs to access the database directly instead of going through an API layer; it removes the overhead of building an API wrapper around Cosmos DB and avoids the corresponding performance penalty.

In most scenarios the recommended mode is the direct mode, as it provides better performance.

I am using the popular volcano dataset to compare the response times of the SDK and the RESTful model.

Query executed in both versions:

SELECT * FROM c where c.Status="Holocene"
Response details of the SDK

The query returned the data in 3,710 ms.

Response details of the RESTful model

The query returned the data in 5,810 ms.

If we build an API on top of this mode, the response time of our own API also needs to be taken into account, so using the RESTful model inside an API is a trade-off against performance. Use this mode when querying directly from the client.

Partitioning the DB

The logical partition key is the primary lever for getting good transaction performance out of Cosmos DB. For example, suppose you have a database holding around 1,500 student records for a school. A simple search for a student named “Peter” has to scan all 1,500 entries, which consumes a lot of throughput to produce the result. Now split the data logically by the “Grade” the students belong to. A query for a student named “Peter” in “Grade 5” makes the system search only the 30 or 40 students in that grade out of the total 1,500, saving throughput and improving performance compared to the earlier approach (a quick sketch follows the guidelines below).

The common guidelines are:

 a. Any key such as a city, state, or country property can be used as the partition key.
 b. No partitioning is required for up to 10 GB of data.
 c. Queries must supply the partition key to be searched.
 d. The property selected as the partition key must be present in all the documents in the container.
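Here is a minimal sketch of the student example using the azure-cosmos Python SDK (the account endpoint, key, and database/container names are placeholders assumed for illustration):

from azure.cosmos import CosmosClient, PartitionKey

# Placeholder endpoint and key for illustration.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<primary-key>")
database = client.create_database_if_not_exists("school")

# "/grade" is the logical partition key, so students are grouped by grade.
students = database.create_container_if_not_exists(
    id="students", partition_key=PartitionKey(path="/grade")
)

# Supplying the partition key confines the search to one logical partition
# (the 30-40 "Grade 5" students) instead of scanning all 1,500 documents.
results = students.query_items(
    query="SELECT * FROM c WHERE c.name = @name",
    parameters=[{"name": "@name", "value": "Peter"}],
    partition_key="Grade 5",
)
for student in results:
    print(student["id"])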

I am using the popular volcano dataset to test performance with and without a partition key.

1. I initially created a collection without a partition key. The performance for the given query is

SELECT * FROM c where c.Status="Holocene"
Resultset:

Metric                                     | Value
Partition key range id                     | 0
Retrieved document count                   | 200
Retrieved document size (in bytes)         | 100769
Output document count                      | 200
Output document size (in bytes)            | 101069
Index hit document count                   | 200
Index lookup time (ms)                     | 0.21
Document load time (ms)                    | 1.29
Query engine execution time (ms)           | 0.33
System function execution time (ms)        | 0
User defined function execution time (ms)  | 0
Document write time (ms)                   | 0.52

2. Then I recreated the same collection with “/country” as the partition key. The same query, run with Japan as the partition value, now produces the following metrics.

Resultset:

Metric                                     | Value
Partition key range id                     | 0
Retrieved document count                   | 16
Retrieved document size (in bytes)         | 7887
Output document count                      | 16
Output document size (in bytes)            | 7952
Index hit document count                   | 16
Index lookup time (ms)                     | 0.23
Document load time (ms)                    | 0.17
Query engine execution time (ms)           | 0.06
System function execution time (ms)        | 0
User defined function execution time (ms)  | 0
Document write time (ms)                   | 0.01
Tune the Index

Indexing is always a top-priority item on the checklist for tuning performance. Indexing is an internal job that keeps track of metadata about the data, which helps in finding the result set for a query. By default, all the properties of a Cosmos container are indexed. This is usually unnecessary overhead for the DB, and keeping track of so much metadata consumes a significant number of RUs, which is not cost-effective. The better approach is to exclude all paths from indexing and then include only the paths that are used for querying in the application. A sketch of such a custom policy follows the comparison table below.

Metric                                  | With Default Indexing | With Custom Indexing
RU's consumed                           | 3146.15               | 19.83
Output doc count                        | 100                   | 100
Doc load time (in ms)                   | 646.51                | 2.32
Query engine execution time (in ms)     | 434.26                | 4.96
System function execution time (in ms)  | 57.03                 | 2.41
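As a hedged sketch of such a custom policy with the azure-cosmos Python SDK (the paths below are assumptions based on the volcano data used earlier), everything is excluded by default and only the queried properties are indexed:

from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<primary-key>")
database = client.create_database_if_not_exists("volcanodb")

# Exclude every path, then include only the properties the application filters on
# ("Status" and "Country" are assumed query paths here).
custom_indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/Status/?"}, {"path": "/Country/?"}],
    "excludedPaths": [{"path": "/*"}],
}

volcanoes = database.create_container_if_not_exists(
    id="volcano",
    partition_key=PartitionKey(path="/Country"),
    indexing_policy=custom_indexing_policy,
)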
Paging

By default, the execution of a query returns 100 documents. We can increase the number of documents per page by providing a “maxItemCount” value, up to a maximum of 1,000 documents. Pulling 1,000 documents at a time from the DB is rarely meaningful, though; to improve performance and show a crisp result set to the user, keep “maxItemCount” small. Unlike SQL databases, pagination is the default behavior in Cosmos DB. If the result set is larger than the page size you asked for, Cosmos returns a “Continuation Token”, a unique value that identifies the query and the page position, which you send back to fetch the next page. If Cosmos provides the total result count for the query, the pagination logic can then be handled at the front end. By reducing the number of documents per response, we save throughput, cut network traffic, and improve performance.
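A minimal paging sketch with the azure-cosmos Python SDK (the container names and the page size of 50 are assumptions for illustration); each page exposes a continuation token that can be handed back to fetch the next page:

from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<primary-key>")
volcanoes = client.get_database_client("volcanodb").get_container_client("volcano")

results = volcanoes.query_items(
    query='SELECT * FROM c WHERE c.Status = "Holocene"',
    enable_cross_partition_query=True,
    max_item_count=50,               # documents per page instead of the default 100
)

pager = results.by_page()            # page-by-page iterator
first_page = list(next(pager))       # materialize only the first 50 documents
token = pager.continuation_token     # return this to the caller for "next page"

# A later request resumes from where the previous page ended.
next_pager = volcanoes.query_items(
    query='SELECT * FROM c WHERE c.Status = "Holocene"',
    enable_cross_partition_query=True,
    max_item_count=50,
).by_page(continuation_token=token)
second_page = list(next(next_pager))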

Throughput Management

RUs, or Request Units, are a term we constantly come across when using Cosmos DB. When you read a document from a container or write a document to it, you trade RUs with Cosmos for that operation. It is like currency in the real world: without money you can’t buy anything, and without RUs you can’t query anything. You can buy only items that cost no more than the money you have in hand; similarly, you can only run the queries that your available RUs can pay for.

If you have a lot of data and a query needs to traverse deep into the collection, you need enough RUs. Every property added to the index also consumes RUs to maintain, so if all properties are indexed, your RU spend will be high and you will run short of RUs for querying. So always add to the index only the properties that are needed while querying.

Index properly, save RUs, and spend them on queries instead. For example, suppose a container holds 100K documents and, with default indexing, Cosmos DB consumes 1,000 RUs on a query after covering only 50K documents; the query will never reach the remaining 50K documents, and they will never appear in our result set. Properly indexed, the same query consumes only 400 RUs to cover all 100K documents.

Startup latency

The very first query will always be a little slow because of the time it takes to establish the connection. To overcome this latency, it is best practice to call “OpenAsync()” (in SDK 2) up front when creating the connection.

await client.OpenAsync();
Singleton Connection

The best approach is to create a single client connection to the database and keep it alive across all instances of the application. Polling the database periodically will also keep the connection alive. This reduces database connectivity latency.
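One way to follow this advice in Python is to construct the client once at module load and import it wherever it is needed (the endpoint and key below are placeholders):

# cosmos.py - build the client once; every module that imports it reuses the
# same warmed-up connections instead of opening new ones per request.
from azure.cosmos import CosmosClient

ENDPOINT = "https://<account>.documents.azure.com:443/"   # placeholder
KEY = "<primary-key>"                                      # placeholder

client = CosmosClient(ENDPOINT, credential=KEY)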

Regions

Make sure Cosmos DB and the applications that call it are grouped within the same Azure region; the lowest possible latency is achieved when the calling application is located in the same Azure region as the provisioned Azure Cosmos DB endpoint.

Programming Best Practices
  • Always use the latest SDK version.
  • Use Streaming API (in SDK 3) that can receive and return data without serializing. Helpful when your API is just a relay and not doing any logical operations on the data.
  • Tune the queries.
  • Implement retry logic with reasonable waiting time to prevent throttle during a busy time.

By carefully analyzing all the above factors, we can improve the Cosmos DB query performance substantially.

Amazon Web Services (AWS) is a secure cloud services platform, offering compute power, database storage, content delivery, and other functionalities to help businesses scale and grow.

It gives organizations a secure and robust platform to develop their custom cloud-based solutions and has several unique features that make it one of the most reliable and flexible cloud platforms, such as:

  • Mobile-friendly access through AWS Mobile Hub and AWS Mobile SDK
  • Fully managed purpose-built Databases
  • Serverless cloud functions
  • Range of storage options that are affordable and scalable.
  • Unbeatable security and compliance

Following are some core services offered by AWS:

AWS Core services
  1. An EC2 instance is a virtual server in Amazon’s Elastic Compute Cloud (EC2) for running applications on the AWS infrastructure.
  2. Amazon Elastic Block Store (EBS) is a cloud-based block storage system provided by AWS that is best used for storing persistent data.
  3. Amazon Virtual Private Cloud (Amazon VPC) enables us to launch AWS resources into a virtual network that we have defined. This virtual network closely resembles a traditional network that we would operate in our own data center, with the benefits of using the scalable infrastructure of AWS.
  4. Amazon S3 or Amazon Simple Storage Service is a service offered by Amazon Web Services that provides object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its global e-commerce network.
  5. AWS security groups (SGs) are associated with EC2 instances and provide security at the protocol and port access level. Each security group — working much the same way as a firewall — contains a set of rules that filter traffic coming into and out of an EC2 instance.

Let us look more deeply at one of AWS’s core services – AWS CloudFormation – that is key for managing workloads on AWS.

1.   CloudFormation

AWS CloudFormation is a service that helps us model and set up our Amazon Web Services resources so that we can spend less time managing those resources and more time focusing on our applications that run in AWS.  We create a template that describes all the AWS resources that we want (like Amazon EC2 instances or S3 buckets), and AWS CloudFormation takes care of provisioning and configuring those resources for us. We don’t need to individually create and configure AWS resources and figure out what’s dependent on what; AWS CloudFormation handles all of that.

A stack is a collection of AWS resources that you can manage as a single unit. In other words, we can create, update, or delete a collection of resources by creating, updating, or deleting stacks. All the resources in a stack are defined by the stack’s AWS CloudFormation template.

2.   CloudFormation template

CloudFormation templates can be written in either JSON or YAML.  The structure of the template in YAML is given below:

---
AWSTemplateFormatVersion: "version date"

Description:
  String
Metadata:
  template metadata
Parameters:
  set of parameters
Mappings:
  set of mappings
Conditions:
  set of conditions
Resources:
  set of resources
Outputs:
  set of outputs

In the above YAML file:

  1. AWSTemplateFormatVersion – The AWS CloudFormation template version that the template conforms to.
  2. Description – A text string that describes the template.
  3. Metadata – Objects that provide additional information about the template.
  4. Parameters – Values to pass to our template at runtime (when we create or update a stack). We can refer to parameters from the Resources and Outputs sections of the template.
  5. Mappings – A mapping of keys and associated values that we can use to specify conditional parameter values, like a lookup table. We can match a key to a corresponding value by using the Fn::FindInMap intrinsic function in the Resources and Outputs sections.
  6. Conditions – Conditions that control whether certain resources are created or whether certain resource properties are assigned a value during stack creation or update. For example, we can conditionally create a resource that depends on whether the stack is for a production or test environment.
  7. Resources – Specifies the stack resources and their properties, such as an Amazon Elastic Compute Cloud instance or an Amazon Simple Storage Service bucket.  We can refer to resources in the Resources and Outputs sections of the template.
  8. Outputs – Describes the values that are returned whenever we view our stack’s properties. For example, we can declare an output for an S3 bucket name and then call the aws cloudformation describe-stacks CLI command to view the name.

Resources is the only required section in the CloudFormation template.  All other sections are optional.

3.   CloudFormation template to create S3 bucket

S3template.yml

Resources:
  HelloBucket:
    Type: AWS::S3::Bucket

In AWS Console, go to CloudFormation and click on Create Stack

Upload the template file which we created. It will get stored in an S3 location.

Click next and give a stack name

Click Next and then “Create stack”.  After a few minutes, you can see that the stack creation is completed.

Clicking on the Resources tab, you can see that the S3 bucket has been created with the name “s3-stack-hellobucket-buhpx7oucrgn”. AWS generated this name since we didn’t specify the BucketName property in the YAML.

Note that deleting the stack will delete the S3 bucket which it had created.

4.   Intrinsic functions

AWS CloudFormation provides several built-in functions that help you manage your stacks.

In the below example, we create two resources – a Security Group and an EC2 Instance, which uses this Security Group.  We can refer to the Security Group resource using the !Ref function.

Ec2template.yml

Resources:
  Ec2Instance:
    Type: 'AWS::EC2::Instance'
    Properties:
      SecurityGroups:
        - !Ref InstanceSecurityGroup
      KeyName: mykey
      ImageId: ''
  InstanceSecurityGroup:
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupDescription: Enable SSH access via port 22
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: '22'
          ToPort: '22'
          CidrIp: 0.0.0.0/0

Some other commonly used intrinsic functions are

  1. Fn::GetAtt – returns the value of an attribute from a resource in the template.
  2. Fn::Join – appends a set of values into a single value, separated by the specified delimiter. If the delimiter is an empty string, the values are concatenated with no delimiter.
  3. Fn::Sub – substitutes variables in an input string with values that you specify. In our templates, we can use this function to construct commands or outputs that include values that aren’t available until we create or update a stack. A short combined example follows.
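As a small illustration (building on the Ec2Instance resource from the template above; the output names are hypothetical), the three functions can be combined in an Outputs section:

Outputs:
  InstancePublicIp:
    Description: Public IP of the instance, fetched with Fn::GetAtt
    Value: !GetAtt Ec2Instance.PublicIp
  InstanceLabel:
    Description: Stack-qualified label built with Fn::Sub
    Value: !Sub "${AWS::StackName}-web-instance"
  RegionAndAz:
    Description: Region and availability zone joined with Fn::Join
    Value: !Join ["/", [!Ref "AWS::Region", !GetAtt Ec2Instance.AvailabilityZone]]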
5.   Parameters

Parameters enable us to pass custom values into the template each time we create or update a stack.

TemplateWithParameters.yaml

Parameters: 
  InstanceTypeParameter: 
    Type: String
    Default: t2.micro
    AllowedValues: 
      - t2.micro
      - m1.small
      - m1.large
    Description: Enter t2.micro, m1.small, or m1.large. Default is t2.micro.
Resources:
  Ec2Instance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType:
        Ref: InstanceTypeParameter
      ImageId: ami-0ff8a91507f77f867
6.   Pseudo Parameters

Pseudo parameters are parameters that are predefined by AWS CloudFormation. We do not declare them in our template; we use them the same way we would use a parameter, as the argument for the Ref function.

Commonly used pseudo parameters:

  1. AWS::Region – Returns a string representing the AWS Region in which the encompassing resource is being created, such as us-west-2.
  2. AWS::StackName – Returns the name of the stack as specified during cloudformation create-stack, such as teststack. (A short example follows.)
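For instance (a hypothetical snippet, not part of the original templates), both pseudo parameters can be referenced inside Fn::Sub without being declared anywhere in the template:

Outputs:
  DeploymentSummary:
    Description: Where this stack is running
    Value: !Sub "Stack ${AWS::StackName} is deployed in ${AWS::Region}"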
7.   Mappings

The optional Mappings section matches a key to a corresponding set of named values. For example, if we want to set values based on a region, we can create a mapping that uses the region name as a key and contains the values we want to specify for each region. We use the Fn::FindInMap intrinsic function to retrieve values in a map.

We cannot include parameters, pseudo parameters, or intrinsic functions in the Mappings section.

TemplateWithMappings.yaml

AWSTemplateFormatVersion: "2010-09-09"
Mappings: 
  RegionMap: 
    us-east-1:
      HVM64: ami-0ff8a91507f77f867
      HVMG2: ami-0a584ac55a7631c0c
    us-west-1:
      HVM64: ami-0bdb828fd58c52235
      HVMG2: ami-066ee5fd4a9ef77f1
    eu-west-1:
      HVM64: ami-047bb4163c506cd98
      HVMG2: ami-0a7c483d527806435
    ap-northeast-1:
      HVM64: ami-06cd52961ce9f0d85
      HVMG2: ami-053cdd503598e4a9d
    ap-southeast-1:
      HVM64: ami-08569b978cc4dfa10
      HVMG2: ami-0be9df32ae9f92309
Resources: 
  myEC2Instance: 
    Type: "AWS::EC2::Instance"
    Properties: 
      ImageId: !FindInMap [RegionMap, !Ref "AWS::Region", HVM64]
      InstanceType: m1.small
8.   Outputs

The optional Outputs section declares output values that we can import into other stacks (to create cross-stack references), return in response to describe-stack calls, or view in the AWS CloudFormation console. For example, we can output the S3 bucket name for a stack to make the bucket easier to find.

In the below example, the output named StackVPC returns the ID of a VPC, and then exports the value for cross-stack referencing with the name VPCID appended to the stack’s name.

Outputs:
  StackVPC:
    Description: The ID of the VPC
    Value: !Ref MyVPC
    Export:
      Name: !Sub "${AWS::StackName}-VPCID"

As organizations start to create and maintain clusters in AKS (Azure Kubernetes Service), they also need a cloud-based identity and access management service to access other Azure cloud resources and services. Azure Active Directory (AAD) pod identity gives users this control by assigning identities to individual pods.

Without these controls, accounts may get access to resources and services they don’t require, and it can become hard for IT teams to track which set of credentials was used to make changes.

Azure AD Pod identity is just one small part of the container and Kubernetes management process and as you delve deeper, you will realize the true power that Kubernetes and Containers bring to your DevOps ecosystem.

Here is a more detailed look at how to use AAD pod identity for connecting pods in AKS cluster with Azure Key Vault.

Pod Identity

Integrate your key management system with Kubernetes using pod identity. Secrets, certificates, and keys in a key management system become a volume accessible to pods. The volume is mounted into the pod, and its data is available directly in the container file system for your application.

On an existing AKS cluster –

Deploy Key Vault FlexVolume to your AKS cluster with this command:

  • kubectl create -f https://raw.githubusercontent.com/Azure/kubernetes-keyvault-flexvol/master/deployment/kv-flexvol-installer.yaml
1. Create the Deployment

Run this command to create the aad-pod-identity deployment on an RBAC-enabled cluster:

  • kubectl apply -f https://raw.githubusercontent.com/Azure/aad-pod-identity/master/deploy/infra/deployment-rbac.yaml

Or run this command to deploy to a non-RBAC cluster:

  • kubectl apply -f https://raw.githubusercontent.com/Azure/aad-pod-identity/master/deploy/infra/deployment.yaml
2. Create an Azure Identity

Create an Azure managed identity.

Command: az identity create -g ResourceGroupNameOfAKSService -n aks-pod-identity (the managed identity name)

Output:

{
  "clientId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "clientSecretUrl": "https://control-westus.identity.azure.net/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourcegroups/aks_dev_rg_wu/providers/Microsoft.ManagedIdentity/userAssignedIdentities/aks-pod-identity/credentials?tid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx&oid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx&aid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourcegroups/aks_dev_rg_wu/providers/Microsoft.ManagedIdentity/userAssignedIdentities/aks-pod-identity",
  "location": "westus",
  "name": "aks-pod-identity",
  "principalId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "resourceGroup": "au10515_aks_dev_rg_wu",
  "tags": {},
  "tenantId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "type": "Microsoft.ManagedIdentity/userAssignedIdentities"
}

Assign Cluster SPN Role

Command for getting the AKS service principal ID: az aks show -g <resourcegroup> -n <name> --query servicePrincipalProfile.clientId -o tsv

Command: az role assignment create --role "Managed Identity Operator" --assignee <AKSServicePrincipalId> --scope <ID of Managed identity>

Assign Azure Identity Roles

Command: az role assignment create --role Reader --assignee <Principal ID of Managed identity> --scope <KeyVault Resource ID>

Set policy to access keys in your Key Vault

Command: az keyvault set-policy -n dev-kv --key-permissions get --spn <Client ID of Managed identity>

Set policy to access secrets in your Key Vault

Command: az keyvault set-policy -n dev-kv --secret-permissions get --spn <Client ID of Managed identity>

Set policy to access certs in your Key Vault

Command: az keyvault set-policy -n dev-kv --certificate-permissions get --spn <Client ID of Managed identity>

3. Install the Azure Identity

Save this Kubernetes manifest to a file named aadpodidentity.yaml:

apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentity
metadata:
  name: <a-idname>
spec:
  type: 0
  ResourceID: /subscriptions/<subid>/resourcegroups/<resourcegroup>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<name>
  ClientID: <clientId>

Replace the placeholders with your user identity values. Set type: 0 for user-assigned MSI or type: 1 for Service Principal.

Finally, save your changes to the file, then create the AzureIdentity resource in your cluster:

kubectl apply -f aadpodidentity.yaml

4. Install the Azure Identity Binding

Save this Kubernetes manifest to a file named aadpodidentitybinding.yaml:

apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentityBinding
metadata:
  name: demo1-azure-identity-binding
spec:
  AzureIdentity: <a-idname>
  Selector: <label value to match>

Replace the placeholders with your values. Ensure that the AzureIdentity name matches the one in aadpodidentity.yaml.

Finally, save your changes to the file, then create the AzureIdentityBinding resource in your cluster:

kubectl apply -f aadpodidentitybinding.yaml

Sample Nginx Deployment for accessing key vault secret using Pod Identity

Save this sample nginx pod manifest to a file named nginx-pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: nginx-flex-kv-podid
    aadpodidbinding: 
  name: nginx-flex-kv-podid
spec:
  containers:
  - name: nginx-flex-kv-podid
    image: nginx
    volumeMounts:
    - name: test
      mountPath: /kvmnt
      readOnly: true
  volumes:
  - name: test
    flexVolume:
      driver: "azure/kv"
      options:
        usepodidentity: "true"         # [OPTIONAL] if not provided, will default to "false"
        keyvaultname: ""               # the name of the KeyVault
        keyvaultobjectnames: ""        # list of KeyVault object names (semi-colon separated)
        keyvaultobjecttypes: secret    # list of KeyVault object types: secret, key or cert (semi-colon separated)
        keyvaultobjectversions: ""     # [OPTIONAL] list of KeyVault object versions (semi-colon separated), will get latest if empty
        resourcegroup: ""              # the resource group of the KeyVault
        subscriptionid: ""             # the subscription ID of the KeyVault
        tenantid: ""            # the tenant ID of the KeyVault
Points to remember when implementing Azure AD Pod Identity in a cluster
  • Azure AD Pod Identity is currently bound to the default namespace. Deploying an Azure Identity and its binding to other namespaces will not work!
  • Pods from all namespaces can be executed in the context of an Azure Identity deployed to the default namespace (related to the point above).
  • Every pod developer can add the aadpodidbinding label to his or her pod and use your Azure Identity.
  • Azure Identity Binding does not use the default Kubernetes label-selection mechanism.

There is little doubt that data will guide the next generation of business strategy and will bring new efficiencies across industries. But for that to happen, organizations must be able to extract insights from their data.

Qubole is an ideal platform to activate end-to-end data processing in organizations. It combines all types of data – structured, unstructured, and legacy offline data – into a single data pipeline and turns it into rich insights by adding AI, ML, and deep analytics tools to the mix.

It scales seamlessly to accommodate more users and new data without adding administrative overheads and lowers cloud costs significantly. Simply put, Qubole is a platform that puts big data on the cloud to power business decisions based on real-time analytics.

At CloudIQ Technologies, our data experts have deployed Qubole’s cloud-native data systems for many of our clients, and the results have been outstanding. Here is an article from one of our data engineers that provides an overview of how to set up Qubole to use an AWS environment and create and run Spark clusters.

AWS Access Configuration:

In order for Qubole to create and run a cluster, we have to grant Qubole access to our AWS environment. We can grant access based on a key or a role. We will use role-based authentication.

Step 1: Login to Qubole

Step 2: Click on the menu at the top left corner and select “Account Settings” under the Control Panel.

Step 3: Scroll down to Access settings

Step 4: Switch Access mode to “IAM Role”

Step 5: Copy the Trusted Principal AWS Account ID and External ID

Step 6: Use the copied values to create a QuboleAccessRole in the AWS account (using the cloudformation template)

Step 7: Copy the Role ARN of the QuboleAccessRole and enter it in the Role ARN field

Step 8: Enter the S3 bucket location where the Qubole metadata will be stored in the “Default Location” field.

Step 9: Click Save

Spark Cluster
Create a cluster

The below steps will help create a new Spark cluster in Qubole.

Step 1: Click on the top-left dropdown menu and select “Cluster”

Step 2: Click on “+New” button

Step 3: Select “Spark” and click “Next”

Step 4: Provide a name for the cluster in the “Cluster Labels” field

Step 5: Select the version of Spark to run, Master Node Type, Worker Node Type, Minimum and Maximum nodes

Step 6: Select Region as us-west-2

Step 7: Select Availability Zone as us-west-2a

Step 8: Click “Next”

Step 9: In the Composition screen, you can select the type of nodes that will be spun up.

Step 10: In the Advanced Configuration screen, proceed to EC2 settings

Step 11: Enter “QuboleDualIAMRole” in the “Instance Profile” field

Step 12: Select “AppVPC” in VPC field

Step 13: Select “AppPrivateSNA” under Subnet field

Step 14: Enter the IP address of the Bastion node in the “Bastion Node” field

Step 15: Scroll to the bottom and enter “AppQuboleClusterSG” (security group for the cluster) in the “Persistent Security Group” field

Step 16: Click on “Create”

Run a cluster

To start a cluster, click on the dropdown menu on the top left corner and select cluster. Now click on “Start” button next to the cluster that needs to be started. A cluster is also automatically started when a job is submitted for the cluster.

Submit a job

One of the simplest ways to run a Spark job is to submit it through the workbench. You can navigate to the workbench from the drop-down menu at the top left corner. In the workbench, click on “+Create New”, then select “Spark” next to the title of the job. Once you select Spark, an optional drop-down appears where you can choose “Python”. In the last drop-down menu, select the Spark cluster where you want to execute the job; if this cluster is not active, it will be activated automatically. Enter your Spark job in the window below and, when complete, click on “Run” to run the job.
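Here is a minimal PySpark job you might paste into the workbench (the dataset is synthetic and purely for illustration):

from pyspark.sql import SparkSession

# The Qubole workbench typically provides a SparkSession, but building one
# explicitly also works and keeps the snippet self-contained.
spark = SparkSession.builder.appName("workbench-smoke-test").getOrCreate()

# Tiny synthetic dataset: count rows per status.
data = [("Krakatoa", "Holocene"), ("Vesuvius", "Holocene"), ("Olympus Mons", "Extinct")]
df = spark.createDataFrame(data, ["name", "status"])
df.groupBy("status").count().show()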

Airflow Cluster

The Airflow scheduler can be used to run various jobs in a sequence. Let’s take a look at configuring an Airflow cluster in Qubole.

Setting up DataStore

The first step in creating an Airflow cluster is to set up a datastore. Make sure that the MySQL DB is up and running and contains a database for Airflow. Now, select “Explore” from the dropdown menu at the top left corner. On the left-hand menu, open the selection menu showing “Qubole Hive” and select “+Add Data Store”.

In the new screen, provide a name for the data store. Select “MySQL” as the database type. Enter the database name for the airflow database (The database should already be created in MySQL). Enter the host address as “hmklabsbienvironment.cq8z1kp7ikd8.us-west-2.rds.amazonaws.com”. Enter the username and password. Make sure to select “Skip Validation”. Since the MySQL db is in a private VPC, Qubole does not have access to it and will not be able to validate.

Configuring Airflow Cluster

Step 1: Click on the top left drop-down menu and select “Cluster”

Step 2: Click on “+New” button

Step 3: Select “Airflow” in the type of cluster and click “Next”

Step 4: Give the cluster a name. Select the Airflow version and node type.

Step 5: Select the datastore which points to the MySQL

Step 6: Select the us-west-2 as the Region

Step 7: Select us-west-2a as the Availability zone

Step 8: Click next to go to Advanced Configuration

Step 9: Select AppVPC as the VPC

Step 10: Select AppPrivateSNA as the Subnet

Step 11: Enter the Bastion Node information

Step 12: Scroll to the bottom and enter AppQuboleClusterSG as the Persistent Security Groups

Step 13: Click on create

Once the cluster is created, you can run it by clicking on “Start” next to the cluster’s name.

CloudIQ is a leading Cloud Consulting and Solutions firm that helps businesses solve today’s problems and plan the enterprise of tomorrow by integrating intelligent cloud solutions. We help you leverage the technologies that make your people more productive, your infrastructure more intelligent, and your business more profitable. 

US

626 120th Ave NE, B102, Bellevue,

WA, 98005.

 sales@cloudiqtech.com

INDIA

Chennai One IT SEZ,

Module No:5-C, Phase ll, 2nd Floor, North Block, Pallavaram-Thoraipakkam 200 ft road, Thoraipakkam, Chennai – 600097


© 2019 CloudIQ Technologies. All rights reserved.