When Kubernetes v1.0 was released on July 21, 2015, it redefined the container technology landscape. The bottlenecks of deploying, scaling, and managing containerized applications became simpler and faster to handle through intelligent automation.

Container technology made software development more agile and brought in resource efficiency, and containers made scaling smoother and faster. However, containers also need to be tracked, monitored, and managed, which is where container orchestration and Kubernetes come in.

Kubernetes is an open-source container-orchestration system for automating application deployment, scaling, and management.  It was originally designed by Google and is now maintained by the Cloud Native Computing Foundation.

What does Kubernetes do?

Kubernetes allows you to leverage the full potential of your container ecosystem. With automation, it streamlines container workflow and frees up the IT team to concentrate on their core areas of application development by removing the need to manage container networking, storage, logs, alerting, etc. Overall, it automates deploying, scaling, and managing of containerized applications on a cluster of servers.

Key Benefits of Kubernetes

Flexibility for scaling – it enables horizontal infrastructure scaling by quickly adding or removing servers. Kubernetes has the option of automating vertical scaling, too, by taking into account application-provided metrics.

High availability – health checks and self-healing designed into Kubernetes allow it to maintain high availability of applications and infrastructure.

Enhanced deployment speed – with automated rollouts and rollbacks, canary deployments, and wide-ranging support for a variety of programming languages, Kubernetes speeds up the process of building, testing, and deploying new software.

Let’s understand more about Kubernetes concepts

1. Kubernetes Objects

Kubernetes contains several abstractions that represent the state of our system: deployed containerized applications and workloads, their associated network and disk resources, and other information about what our Kubernetes cluster is doing. These abstractions are represented by objects in the Kubernetes API.  The basic Kubernetes objects include:

  • Pod
  • Service
  • Volume
  • Namespace

In this blog, we will look at the Pod and Service objects.

2. Pods

A pod is a higher level of abstraction grouping containerized components.  A pod consists of one or more containers that are guaranteed to be co-located on the host machine and can share resources.  The basic scheduling unit in Kubernetes is a pod.  The host machines on which the pods are scheduled are called Nodes.

3. Pod definition yaml

Kubernetes objects are mostly created by declaring their configuration in a YAML file.
Given below is a YAML file that defines a simple pod.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 8080

In the above yaml file,

  1. apiVersion – denotes which version of the Kubernetes API we are using to create this object.  For the Pod object, the apiVersion is always v1.
  2. kind – specifies what kind of object we want to create.
  3. metadata – has data to uniquely identify the object (name) and labels.
  4. Labels are key/value pairs that are attached to objects.  Labels are intended to be used to specify identifying attributes of objects that are meaningful and relevant to users but do not directly imply semantics to the core system.  So, in the above example, instead of “name: nginx” we could have “appname: nginx”, “name: mynginxapp”, or anything we like.
  5. spec – defines the object specification and differs for each object type.  For the Pod object, the spec has an array of containers, since a pod consists of one or more containers.
  6. For each container, we provide the attributes below:
  • name – the name of the container.  This can be different from the name of the pod and is not related to it.
  • image – the name of the Docker image used to run this container.
  • ports – the ports in this container to be exposed outside the pod.  Here we are running the nginx web server on port 8080 and exposing it.

Assuming the above pod definition is saved in a file named pod-definition.yaml, the pod is created by executing the Kubernetes command below:

$ kubectl create -f pod-definition.yaml
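Once the command returns, we can check that the pod is up and inspect its details (standard kubectl commands; the pod name nginx comes from the metadata above):

$ kubectl get pods
$ kubectl describe pod nginx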

4. Pod communication and need for services

Each pod in Kubernetes is assigned a unique Pod IP address within the cluster, which allows applications to use ports without the risk of conflict. 

Within the pod, all containers can reference each other on localhost, but a container within one pod has no way of directly addressing another container within another pod; for that, it must use the Pod IP Address.

An application developer should never use the Pod IP address directly to reference or invoke a capability in another pod, though, because Pod IP addresses are ephemeral – the pod being referenced may get a different Pod IP address on restart.  Instead, we should use a reference to a Service, which tracks the target pods and their current Pod IP addresses.

5. Services

In Kubernetes, a Service is an abstraction that defines a logical set of Pods and a policy by which to access them. The set of Pods targeted by a Service is usually determined by a selector.

Sample YAML for a service to expose the pod(s) which we created earlier is given below:

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    name: nginx
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
      nodePort: 30230
  type: NodePort

This yaml file defines a service named my-service which is used to access the Pods which have a label ‘name: nginx’. 

  1. The selector field of the service must match the label field of the Pods to which we want to connect.
  2. There are 3 ports defined in the above YAML file:
  • port is the port number that makes the service visible to other services running within the same Kubernetes cluster.
  • targetPort is the port on the Pod where the application is running.  This is an optional field; if not provided, Kubernetes assigns it the same value as the port field.
  • nodePort is the port on which the service can be accessed by external users.  A NodePort can only take values from 30000 to 32767.  If this optional field is not provided in the definition, Kubernetes automatically assigns a value for the NodePort service.

To create the service object, enter the above yaml code in a file named service-defn.yaml and execute the command given below:

$ kubectl create -f service-defn.yaml
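Once the service is created, we can view its cluster IP and the node port that was assigned (30230, as requested in the YAML above):

$ kubectl get service my-service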

6. Types of Services

In the above example, we have type: NodePort for the service.  The different values allowed for the type field are:

  1. ClusterIP: Exposes the Service on a cluster-internal IP. Choosing this value makes the Service only reachable from within the cluster. This is the default Service type.
  2. NodePort: Exposes the Service on each Node’s IP at a static port (the NodePort).  A ClusterIP Service, to which the NodePort Service routes, is automatically created.  We will be able to contact the NodePort Service from outside the cluster by requesting <NodeIP>:<NodePort> (see the example request after this list).
  3. LoadBalancer: Exposes the Service externally using a cloud provider’s load balancer. NodePort and ClusterIP Services, to which the external load balancer routes, are automatically created.
  4. ExternalName: Maps the Service to the contents of the externalName field (e.g., foo.bar.example.com), by returning a CNAME record with its value.  No proxying of any kind is set up.  We need CoreDNS version 1.7 or higher to use the ExternalName type.
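For example, the NodePort service defined earlier can be reached from outside the cluster through any node’s IP on the nodePort we requested (the node IP below is a placeholder):

$ curl http://<NodeIP>:30230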

Many of you are running your mission-critical applications on containers, and if you haven't already deployed Kubernetes to manage your container ecosystem, then chances are you soon will.

If you are considering a Kubernetes implementation, then there are several ways to go about it –

  • In-house Kubernetes deployment – if you have a large enough IT team with the requisite expertise in Kubernetes architecture and deployment, then getting your Kubernetes cluster up and running in-house is certainly a possibility. Kubernetes deployment is a complex process and requires a mix of specific skill sets. Also running and monitoring a Kubernetes platform requires the full-time services of a dedicated team, and your requirement must justify this additional cost.
  • SaaS solutions for Kubernetes – if your business needs are specific and straightforward, then you can explore the market for pre-designed Kubernetes offerings on a SaaS payment model.
  • Fully outsourced (managed) Kubernetes services – if budget permits and your business demands, then bringing in the professionals is a safe and hassle-free solution. From infrastructure assessments to building a Kubernetes strategy to engineering, deploying, and managing enterprise-wide Kubernetes solutions – you can outsource your entire project to experts.
  • Many service providers like CloudIQ also offer day-to-day management and support as well as Kubernetes training to your IT staff to set up internal management expertise.

If the last decade of cloud has taught us anything, it is that when it comes to technology, bringing in professionals to do the job always turns out to be the best option in the long run. Kubernetes is a sophisticated platform that requires specialized competencies. Here is a look at one of our tutorials on Kubernetes Networking – how it all works under the hood.

KUBERNETES NETWORKING – DATA PLANE

In Kubernetes, applications run as a set of pods with their own IP address and port. Kubernetes provides an abstract way to expose the applications/pods as a network service. The various forms of service abstraction include ClusterIP, NodePort, LoadBalancer & Ingress. When service requests enter the Kubernetes cluster, the service abstractions have to be directed to individual service endpoints of Pods. This data plane function is implemented using a Linux kernel feature – iptables.

Iptables is used to set up, maintain, and inspect the tables of IP packet filter rules in the Linux kernel. Several different tables may be defined. Each table contains a number of built-in chains and may also contain user-defined chains. Each chain is a list of rules which can match a set of packets. Each rule specifies what to do with a packet that matches. This is called a 'target', which may be a jump (-j) to a user-defined chain in the same table.

The service (SVC) to service endpoint (SEP) mappings are programmed using KUBE-SERVICES user-defined chains in the NAT (Network Address Translation) table. The contents of the iptables rules can be extracted using the “iptables-save” command:

# Generated by iptables-save v1.6.0 on Mon Sep 16 08:00:17 2019
*nat
:PREROUTING ACCEPT [1:52]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [23:1438]
:POSTROUTING ACCEPT [10:592]
:DOCKER - [0:0]
:IP-MASQ-AGENT - [0:0]

:KUBE-SERVICES - [0:0]

-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES

Let's consider the Services in the following example.

cloudiq@hubandspoke:~$ kubectl get svc --namespace=workshop-development

NAME          ciq-ingress-workshop-development-nginx-ingress-controller
TYPE          LoadBalancer
CLUSTER-IP    192.168.5.65
EXTERNAL-IP   10.82.0.97
PORT(S)       80:30512/TCP,443:31512/TCP
AGE           19h

Here we have the following service abstractions that are defined.

LoadBalancerIP=10.82.0.97

NodePort=30512/31512

ClusterIP=192.168.5.65

The above services have to be translated to individual service endpoints. The rules performing matching and translation are programmed using custom chains in the NAT table of iptables, as shown below.

Let’s look at the LoadBalancer IP 10.82.0.97 service first:

cloudiq@hubandspoke:~$ cat ciq-dev-aks-iptables-save.output | grep 10.82.0.97
-A KUBE-SERVICES -d 10.82.0.97/32 -p tcp -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:http loadbalancer IP" -m tcp --dport 80 -j KUBE-FW-SXB4UOYSLPHVISJM
-A KUBE-SERVICES -d 10.82.0.97/32 -p tcp -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https loadbalancer IP" -m tcp --dport 443 -j KUBE-FW-JLRSZDR3OXJ4SUA2

Let’s look at the HTTPS service available on port 443.

cloudiq@hubandspoke:~$ cat ciq-dev-aks-iptables-save.output | grep KUBE-FW-JLRSZDR3OXJ4SUA2

:KUBE-FW-JLRSZDR3OXJ4SUA2 - [0:0]
-A KUBE-FW-JLRSZDR3OXJ4SUA2 -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https loadbalancer IP" -j KUBE-MARK-MASQ
-A KUBE-FW-JLRSZDR3OXJ4SUA2 -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https loadbalancer IP" -j KUBE-SVC-JLRSZDR3OXJ4SUA2
-A KUBE-FW-JLRSZDR3OXJ4SUA2 -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https loadbalancer IP" -j KUBE-MARK-DROP
-A KUBE-SERVICES -d 10.82.0.97/32 -p tcp -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https loadbalancer IP" -m tcp --dport 443 -j KUBE-FW-JLRSZDR3OXJ4SUA2

Below we see the NodePort and ClusterIP translation. The service (SVC) chain points to two different service endpoints. To select between the two, a random probability is evaluated: the first SEP rule is matched with probability 0.5, and the remaining traffic falls through to the second SEP rule.

cloudiq@hubandspoke:~$ cat ciq-dev-aks-iptables-save.output | grep KUBE-SVC-JLRSZDR3OXJ4SUA2

:KUBE-SVC-JLRSZDR3OXJ4SUA2 - [0:0]
-A KUBE-FW-JLRSZDR3OXJ4SUA2 -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https loadbalancer IP" -j KUBE-SVC-JLRSZDR3OXJ4SUA2
-A KUBE-NODEPORTS -p tcp -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https" -m tcp --dport 31512 -j KUBE-SVC-JLRSZDR3OXJ4SUA2
-A KUBE-SERVICES -d 192.168.5.65/32 -p tcp -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-JLRSZDR3OXJ4SUA2
-A KUBE-SVC-JLRSZDR3OXJ4SUA2 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-4R3FOXQSM5T2ZADC
-A KUBE-SVC-JLRSZDR3OXJ4SUA2 -j KUBE-SEP-PI7R3ONIYH4XJLMW

In the SEP service endpoints, the actual DNAT is performed.

cloudiq@hubandspoke:~$ cat ciq-dev-aks-iptables-save.output | grep KUBE-SEP-4R3FOXQSM5T2ZADC
:KUBE-SEP-4R3FOXQSM5T2ZADC - [0:0]
-A KUBE-SEP-4R3FOXQSM5T2ZADC -s 10.82.0.10/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-4R3FOXQSM5T2ZADC -p tcp -m tcp -j DNAT --to-destination 10.82.0.10:443
-A KUBE-SVC-JLRSZDR3OXJ4SUA2 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-4R3FOXQSM5T2ZADC
cloudiq@hubandspoke:~$ cat ciq-dev-aks-iptables-save.output | grep KUBE-SEP-PI7R3ONIYH4XJLMW
:KUBE-SEP-PI7R3ONIYH4XJLMW - [0:0]
-A KUBE-SEP-PI7R3ONIYH4XJLMW -s 10.82.0.82/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-PI7R3ONIYH4XJLMW -p tcp -m tcp -j DNAT --to-destination 10.82.0.82:443
-A KUBE-SVC-JLRSZDR3OXJ4SUA2 -j KUBE-SEP-PI7R3ONIYH4XJLMW

Automation testing helps complete the entire software testing life cycle (STLC) in less time and improves the efficiency of the testing process.

Test Automation enables teams to verify functionality, test for regression and run simultaneous tests efficiently. In this article we will take a detailed look at the Automation Testing Tools available, standards and best practices to be followed during Test Automation.

Following the best practices for the Software Testing Life Cycle (unit testing, integration testing & system testing) ensures that the client gets the software as intended, without any bugs. End-to-end testing is the methodology used to test whether the flow of an application performs as designed from start to finish. Carrying out end-to-end tests helps identify system dependencies and ensures that the right information flows across the various system components.

Ultimately Automation Testing increases the speed of test execution and the test coverage.

When to Choose Automation Testing
  • There is a lot of regression work
  • The GUI stays the same, but there are frequent functional changes
  • Requirements do not change frequently
  • Load and performance testing with many virtual users
  • Repetitive test cases that lend themselves well to automation and save time
  • Huge projects
  • Projects that need to test the same areas repeatedly

Steps to Implement Automation Testing
  • Identify areas within software to automate
  • Choose the appropriate tool for test automation
  • Write test scripts
  • Develop test suites
  • Execute test scripts
  • Build result reports
  • Find possible bugs or performance issues
Choosing your Automation Testing Tool

The strategy to adopt test automation should clearly define when to opt for automation, its scope and selection of the right kind of tools for execution. And when it comes to tools the top ones to go for are

  • Cypress
  • Selenium
  • Protractor
  • Appium(Mobile)
Why Cypress?

Cypress is a JavaScript based testing framework built for the modern web. Cypress helps to create End-to-end tests, Integration tests and Unit tests. Cypress takes a different approach compared to other testing frameworks, since it’s executed in the same run loop as the application. It also leverages a Node.js server to handle any task that needs to happen outside of the browser. With its ability to understand everything happening inside and outside of the browser, it produces more consistent results.

Key Features of Cypress
  • Automatic Waiting – No need for adding wait and sleep.
  • Spies, Stubs, and Clocks – Verify and control the behaviour of functions, server responses, or timers.
  • Network traffic control and monitoring – Easily control, stub, and test edge cases without involving your server. You can stub network traffic however you like.
  • Consistent Results – the Cypress architecture doesn’t use Selenium or WebDriver, so tests are fast, consistent, and flake-free.
  • Screenshots and Videos – View screenshots taken automatically on failure, or videos of your entire test suite when run from the CLI.
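To give a feel for what a Cypress test looks like, here is a minimal illustrative spec (the URL and the asserted text are assumptions); notice that there are no explicit waits, since Cypress retries automatically:

// cypress/integration/home.spec.ts
describe('Home page', () => {
  it('loads and shows the welcome text', () => {
    cy.visit('http://localhost:4200');            // app under test, assumed to be running locally
    cy.contains('Welcome').should('be.visible');  // Cypress waits and retries this assertion
  });
});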
Azure CICD Setup with Cypress

Cypress runs on most of the following CI providers.

Azure DevOps / VSTS CI / TeamFoundation
BitBucket
CircleCI
Docker
GitLab
Jenkins
TravisCI

Azure DevOps – Steps to Integrate Cypress Automation Tests
  • Pre-Build Testing
  • Install the Node modules and run the application in test mode
  • Run the tests
  • Publish the test results
  • Cypress Containerization
  • Build the Docker container for Cypress
  • Push the image to the container registry
  • Publish the build

Before we get started, here are the basic Cypress commands:

Clean up the old results:
$ rm -rf cypress/reports/

Run the Cypress application with the required spec file:
$ cypress run --spec "cypress/integration/**/*.spec.ts"   # mention your spec file

Configure the mocha reporter path for publishing test results:
--reporter junit --reporter-options 'mochaFile=cypress/reports/test-output-[hash].xml,toConsole=true'

Uninstall the packages when done:
$ npm uninstall cypress-multi-reporters; npm uninstall cypress-promise; npm uninstall cypress

Pre-Build Testing

It is critical to test the application before the Build, Deployment or Release. Essentially the process involves regression and smoke testing. And don’t forget the sanity checks before the build is deployed in the staging environment.

Cypress comes in handy for testing angular / JavaScript applications before they are deployed to staging or production environment.

Install the Node module and run application in test mode

Install the required Node modules for the application, then run the application in test mode.

$ npm install --save-dev start-server-and-test

$ start-server-and-test start http://localhost:4200
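In practice, start-server-and-test is usually wired through npm scripts so that a single command starts the app, waits for the URL, and then runs Cypress. A minimal sketch of a package.json excerpt (the script names are assumptions):

{
  "scripts": {
    "start": "ng serve",
    "cy:run": "cypress run",
    "ci": "start-server-and-test start http://localhost:4200 cy:run"
  }
}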

Publish the test results

The results of the Cypress test execution are stored in the specified path and are added to the Azure DevOps test results. Cypress supports the JUnit, Mocha, and Mochawesome test result reporter formats, and provides options to create customised test results and merge all the test results as well.
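As a hedged sketch, the publish step in the Azure DevOps pipeline could look like the following (the task version and file pattern are assumptions; the pattern matches the mocha reporter configuration shown earlier):

- task: PublishTestResults@2
  inputs:
    testResultsFormat: 'JUnit'
    testResultsFiles: 'cypress/reports/test-output-*.xml'
    mergeTestResults: true
  condition: always()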

Cypress Containerization

Cypress supports docker containerization and that makes it easy to set it up in a cluster environment like AKS. The Cypress base images are available at the link below.

https://github.com/cypress-io/cypress-docker-images

Copy the package.json and UI source code to the app folder and run the Cypress test. The following pipeline snippet runs the Docker commands and executes the tests.

- script: |
    docker run -d -it --name cypressName:cypressImageTag bash
    docker commit -p cypressName:cypressImageTag
    docker stop cypressName
    docker rm -f cypressName

- script: docker tag cypressName:cypressImageTag
  displayName: Tag Cypress image

- task: Docker@1
  displayName: Push image To Registry
  inputs:
    command: push
    azureSubscriptionEndpoint: azureSubscriptionEndpoint
    azureContainerRegistry: $(azureContainerRegistry)
    imageName: acrImageName:BuildId

- script: sudo rm -rf /test-results/*
  displayName: Removing Previous Results

- task: ShellScript@2
  displayName: 'Bash Script - cypress base image post-deployment'
  inputs:
    scriptPath: ./cypress-deployment.sh
    args: $(azureRegistry) $(cypressImageName) $(azureContainerValue) $(CYPRESS_OPTIONS)
  continueOnError: true

- task: PublishTestResults@1
  displayName: 'Publish Test Results ./test-results-*.xml'
  inputs:

cypress-base-image-post-deployment.sh

docker run -v $systemSourceDirectory:/app/cypress/reports --name vca-arp-ui \
  $cypress_Latestimage npx cypress run $cypressOptions bash

Now the container should be set up on your local machine and start running your specs.

Cypress is simple and easily integrates with your CI environment. Apart from the browser support, Cypress reduces the efforts of manual testing and is relatively faster when compared to other automation testing tools.

In this article we will discuss how to create security groups in AWS for Kubernetes. The goal is to set up a Kubernetes cluster on AWS EC2, having provisioned your virtual machines. You are going to need two security groups: one for the control plane load balancer, and another for the VMs.

Creating a Security Group through the AWS Console

Prerequisite: You should have a VPC (virtual private cloud) set up.

Log into the AWS EC2 (or VPC) console. On the left-hand menu, under Network and Security, click Security Groups.

Click on Create Security Group.

Enter a Name and a Description for your Security Group. Then select your VPC from the drop-down menu. Click Add Rule.

You will need 2 TCP ingress rules, one over port 6443, another over port 443. We are choosing to allow the Source from anywhere. In production you may want to restrict the CIDR, IP, or security group that can reach this load balancer.

We are choosing to leave the outbound rules as default, in which all outbound traffic is permitted.

Click Create and your security group is created!

Select your security group in the console. You may want to give your security group a Name (in addition to the Group Name that you specified when creating it).

But you are not done yet: you must add tags to your security group. These tags will alert AWS that this security group is to be used for Kubernetes. Click on the Tags tab at the bottom of the window. Then click Add/Edit Tags.

You will need 2 tags:
  • Name: KubernetesCluster. Value: <the name of your Kubernetes cluster>
  • Name: kubernetes.io/cluster/<the name of your Kubernetes cluster>. Value: owned

Click Save and your tags are saved!
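If you prefer the AWS CLI over the console, the same tags can be applied with a command like the one below (the security group ID and cluster name are placeholders):

$ aws ec2 create-tags --resources <security-group-id> \
    --tags Key=KubernetesCluster,Value=<your-cluster-name> \
           Key=kubernetes.io/cluster/<your-cluster-name>,Value=owned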

Creating a Security Group for the Virtual Machines

Follow the steps above to create a security group for your virtual machines. Here are the ports that you will need to open for your control plane VMs:

The master node:
  1. 22 for SSH from your bastion host
  2. 6443 for the Kubernetes API Server
  3. 2379-2380 for the ETCD server
  4. 10250 for the Kubelet health check
  5. 10252 for the Kube controller manager
  6. 10255 for the read only kubelet API
The worker nodes:
  1. 22 for SSH
  2. 10250 for the kubelet health check
  3. 30000-32767 for external applications. However, it is more likely that you will expose external applications outside the cluster via load balancers and restrict access to these ports to within your VPC.
  4. 10255 for the read only kubelet API

We have chosen to combine the master and the worker rules into one security group for convenience. You may want to separate them into 2 security groups for extra security.

Follow the step-by-step instructions detailed above and you will have successfully created AWS Security Groups for Kubernetes.

Introduction:

Cloud Foundry is an open-source cloud application platform with a container-based architecture. It provides cloud instances and is mainly used to deploy applications directly into a cloud environment. Instead of running the app separately, you use the CF CLI (Command Line Interface) tool to deploy, test, configure, and manage apps on CF.

Features of Cloud Foundry:
  • An open-source, cloud-native platform
  • Fast and easy to build, test, deploy, manage & scale apps
  • Works with any language or framework
  • Highly adaptable
  • Lets you see the running status of apps
  • Lets you scale apps up or down and debug them on CF
How to interact with CF?
  • Command Line Interface (CLI): from terminal / command prompt
  • IDE plugins
Org and App Space Roles:

CF uses role-based access control, with each role granting permissions in either an organization or an application space.

Organisation :
  • An Organisation or org represents an organisational account and groups together users, resources, applications, and environments.
  • Each organisation has a resource quota, and everything within the org shares the same resources and domains.
  • Organisations segregate tenants in a Cloud Foundry installation.

To list all orgs that the user has access to, run the command below in the terminal:

cf orgs
Space:
  • An organisation can have separate spaces for the development, staging, and production versions of its apps.
  • A space can also have its own quota.
  • A space is a shared location for developing and running apps.
  • Every application and service is scoped to a space.

To list all spaces in the current org:

cf spaces
Relationship between org, space and Apps:

(Diagram: an org contains multiple spaces; each space contains apps and their service instances.)
Before pushing the app into Cloud Foundry, ensure that you:
  • Log into Cloud Foundry using the cf login command
    • cf login -a API-URL
  • Enter the correct credentials when prompted for a username and password
  • Select the org and space where the app is to be pushed
  • Then push the application using cf push
How to deploy an app into cf?

To deploy an application, you need to push its code to the Cloud Foundry instance. The push command is used to push the application to Cloud Foundry. The arguments may vary depending on the application type; however, it is best practice to specify all the arguments in a file called manifest.yml.

The manifest provides consistency and reproducibility. An app can also specify its service instance dependencies in the manifest.yml file, and they will be bound to the app automatically (see the sketch after the sample manifest below).

# Start a new app called "myapp"
# If there's a manifest.yml in the current folder,
# the config will be read from there
cf push
Manifest Format

Manifests are written in YAML. The manifest below illustrates some YAML conventions, as follows:

  • The manifest file begins with three dashes.
  • The applications block begins with a heading followed by a colon.
  • The app name is preceded by a single dash and one space.
  • Subsequent lines in the block are indented two spaces to align with name
Sample manifest.yml

---
applications:
- name: my-app
  memory: 512M
  instances: 2
  buildpack: nodejs_buildpack
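As mentioned earlier, service instance dependencies can also be declared in the manifest so they are bound automatically. A minimal sketch, assuming a service instance named my-db already exists in the space:

---
applications:
- name: my-app
  memory: 512M
  instances: 2
  buildpack: nodejs_buildpack
  services:
  - my-db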

Buildpack:
  • A Cloud Foundry component that resolves app’s runtime dependencies
  • It provides framework and run time support for applications.
  • It is used to determine what dependencies to download
  • It is used to tell how to configure applications to communicate with different services.
  • It is used to compile or prepare the application for launch.
What happens when you push an app using cf push?
  • Upload: the app files are sent to CF
  • Staging: an executable artifact (a droplet) is created
  • Running: the app starts on an app host

The app then receives web requests (if it binds to a TCP port).

List of cf commands:
  • cf target – Sets or views the targeted organization or space
  • cf stop – Stops an application
  • cf start – Starts an app
  • cf set-env – Sets an environment variable for an application (cf set-env var_name var_value)
  • cf services – Lists all of the services that are available in the current space
  • cf restart – Stops all instances of the app, then starts them again. This causes downtime.
  • cf restage – Recreates the app’s executable artifact using the latest pushed app files and the latest environment (variables, service bindings, buildpack, stack, etc.). This action will cause app downtime.
  • cf rename – Renames an app
  • cf push – Deploys a new application
  • cf marketplace – Lists all of the services that are available in the marketplace
  • cf logs – Displays the STDOUT and STDERR log streams of an application
  • cf login -a API-URL – Logs in to CF
  • cf help – Shows help
  • cf events – Displays runtime events that are related to an application
  • cf delete – Deletes an existing application
  • cf create-space – Creates a space
  • cf bind-service – Binds an existing service instance to your application
  • cf apps – Lists all of the applications that you deployed in the current space. The status of each application is also displayed.
  • cf api – Views the current API endpoint
  • cf -v – Displays the version of the Cloud Foundry command line interface
What is New Relic Synthetics?

New Relic Synthetics is a set of automated, scriptable tools to monitor websites, critical business transactions, and API endpoints. Detailed results from each individual monitor run can also be viewed. With access to New Relic Insights, in-depth queries can be run against data from Synthetics monitors, and custom dashboards can be created.

Features of New Relic Synthetics:
  • Easy to set up real time instrumentation and analytics
  • REST API functions
  • Real browsers
  • Comparative charting with Browser
  • New Relic Insights support
  • Advanced scripted monitoring
  • Global test coverage
Different types of Synthetic Monitor:

There are four types of monitor.

a) Ping monitor:

Ping monitors are the simplest type of monitor. These monitors are used to check if an application is online. The Synthetics ping monitor uses a simple Java HTTP client to make requests to your site.

b) API tests:

API tests are used to monitor API endpoints. This can ensure that the app server works in addition to the corresponding website. New Relic uses the “http request module” internally to make HTTP calls to the API endpoint and validate the results.

c) Browser:

Simple browser monitors essentially are simple, pre-built scripted browser monitors. These monitors make a request to the site using an instance of Google Chrome.

d) Script_Browser:

Scripted browser monitors are used for more sophisticated, customized monitoring. A custom script can be created to navigate to the website, take specific actions and ensure that the specific resources are present.

Creation of Synthetic Monitor:

API Test Monitor:

Step 1:

  • Login to new relic monitor

Step 2 - Create synthetic monitor

  • In the New Relic dashboard, click “Synthetics”, then click “Add new” in the top-right corner.

Step 3: Enter the Required Details

  • Select “API Test” as the monitor type.
  • Enter the monitor name under Details.
  • Select one location for the monitor under Monitoring locations.
  • Set the schedule – set the frequency for monitoring. For example, on selecting a frequency of 10 minutes, the monitor would run the check every 10 minutes.
  • Set the notification – notifications to email IDs can be set with the help of a new alert policy, or appended to an existing alert policy. For an existing alert policy, click on “Add to an existing alert policy” and select the policy. For a new policy, an email address and policy name have to be given. There are three types of policy:
    1. By policy – only one open incident at a time for this alert policy.
    2. By condition – only one open incident at a time per alert condition.
    3. By condition and entity – open an incident every time a condition is violated.
  • Only on completing the above steps can the script be written, by clicking on “Write your script”.
  • Click on “Create monitor” after the monitor creation steps are done.
PING Monitor:

Step 1:

  • Login to new relic monitor

Step 2 - Create synthetic monitor

  • In the New Relic dashboard, click “Synthetics”, then click “Add new” in the top-right corner.

Step 3: Enter the Required Details

  • Select “Ping” as the monitor type.
  • Enter the monitor name under Details.
  • Enter the URL and the expected response for that URL.
  • Select one location for the monitor under Monitoring locations.
  • Set the schedule – set the frequency for monitoring. For example, on selecting a frequency of 10 minutes, the monitor would run the check every 10 minutes.
  • Set the notification – notifications to email IDs can be set with the help of a new alert policy, or appended to an existing alert policy. For an existing alert policy, click on “Add to an existing alert policy” and select the policy. For a new policy, an email address and policy name have to be given. There are three types of policy:
    1. By policy – only one open incident at a time for this alert policy.
    2. By condition – only one open incident at a time per alert condition.
    3. By condition and entity – open an incident every time a condition is violated.
  • Only on completing the above steps is the Ping monitor created, by clicking on “Create monitor”.

Synthetic Monitor Functionality:

API Test:
Pass Scenario:

The script below stores data using the POST method and then passes the result to the callback function. A callback function is simply a function that is passed into another function as an argument.

Here, the callback function has three arguments: error, response, and body.

In the script below, the value of “gear” is compared with “10” from the JSON body. Both values are the same, hence no assertion error is triggered.

In case of value mismatch, an assertion error is thrown.
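As an illustrative sketch of the pass scenario described above (the endpoint URL and payload are assumptions; $http is the HTTP client that New Relic provides to scripted API monitors):

var assert = require('assert');

// POST a JSON payload and validate the value of "gear" in the response body
$http.post('https://example.com/api/vehicle',        // hypothetical endpoint
  { json: { gear: 10 } },
  function (error, response, body) {                 // callback arguments: error, response, body
    assert.equal(response.statusCode, 200, 'Expected a 200 OK response');
    assert.equal(body.gear, 10, 'Expected gear to be 10');
  });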

Failure scenario:

In the script below, the values do not match the JSON body value, hence an assertion error is thrown.

In case of an assertion error, an alert is sent to the email ID given in the notification channel. The assertion error will not be resolved until the value is changed back to “10”.

Mail Alert: (Ping & API Test)

The error log can be seen as below:

After the error is fixed, an update would be sent to the notification channel

Delete a Monitor: (Ping & API Test)
  • From the monitors list, select the monitor that needs to be deleted.
  • In the selected monitor, under Settings, click on General to view the monitor settings page.
  • Select the trash icon; an alert popup is shown. Click “OK” in the popup and the monitor will be deleted.

What is Cross Browser Testing?

Cross-browser testing is a type of functional testing that checks whether a web application works as expected on different browsers.

(Or)

Cross-browser testing is basically running the same set of test cases multiple times on different browsers.

The two main intents of cross-browser testing are:

  1. Functionality of the application – does it work the same way in every browser?
  2. Appearance of the page in different browsers – is it the same, is it different, is one better than the other, etc.

Note: In recent years, mobile browsers have also been included in the scope of cross-browser testing.

When can this testing be started?

Any testing reaps the best benefits when it is done early. Therefore, the industry recommendation is to start as soon as the page designs are available, because finding and fixing bugs in the early stages is very cost-effective. Finding bugs after release or after completion of the application is far more expensive.

Cross Browser testing through Manual:

Sure, it can be done manually. First, the business needs to identify all the browsers that the application needs to support. The tester then needs to run all the test cases against every identified browser and observe whether the appearance and functionality are the same.

Through manual testing, it is not possible to cover many browsers and their major versions. So, performing cross-browser testing manually is costly and time-consuming too.

In an Agile world, it is not advisable to do all cross-browser testing manually.

Cross Browser testing through Automation:

As stated above, Cross-browser testing is basically running the same set of test cases multiple times on different browsers. This type of repeated task is best suited for automation. Thus, it’s more cost and time effective to perform this testing by using tools.

Selenium for Cross Browser Testing:

Selenium is well known for automated testing of web-based applications. Just by changing the browser to be used for running the test cases, Selenium makes it very easy to run the same test cases multiple times using different browsers.

Note: Rest of this blog we are going to see how Selenium can be used for Cross-Browser Testing.

Advantages of choosing Selenium:
  • Open source
  • Supports programming languages like Java, Perl, Python, C#, Ruby, Groovy, JavaScript, etc.
  • Platform Independent: Supports (OS) like Windows, Mac, Linux, UNIX, etc.
  • Supports multiple browsers namely, Internet Explorer, Chrome, Firefox, Opera, Safari, etc
  • Ease of implementation
  • Reusability

By using TestNG along with Selenium Grid, we can achieve parallel test execution on different browsers on different machines. Let’s look at TestNG and Selenium Grid in the following topics.

TestNG:

TestNG is an automation testing framework in which NG stands for “Next Generation”. TestNG is inspired by JUnit and uses annotations (@). Default Selenium tests do not generate test results in a proper format; using TestNG, we can generate properly formatted test results.

Why TestNG?
  • Multiple test cases can be grouped easily by putting them into a testng.xml file, in which you can set priorities that decide which test case should be executed first.
  • The same test case can be executed multiple times without loops just by using the keyword ‘invocation count’.
  • Using TestNG, you can execute multiple test cases on multiple browsers (see the sample testng.xml after this list).
  • It can be easily integrated with tools like Maven, Jenkins, etc.
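A minimal sketch of such a testng.xml is shown below. It runs the same test class in parallel against two browsers, passing the browser name as a parameter (the class name tests.CrossBrowserTest is hypothetical):

<suite name="Cross browser suite" parallel="tests" thread-count="2">
  <test name="Chrome run">
    <parameter name="browser" value="chrome"/>
    <classes>
      <class name="tests.CrossBrowserTest"/>
    </classes>
  </test>
  <test name="Firefox run">
    <parameter name="browser" value="firefox"/>
    <classes>
      <class name="tests.CrossBrowserTest"/>
    </classes>
  </test>
</suite>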
Selenium Grid

Selenium Grid is a part of the Selenium suite that specialises in running multiple tests across different browsers, operating systems, and machines. You can connect to it with Selenium RemoteWebDriver by specifying the browser, browser version, and operating system you want.

Components of Selenium Grid
Hub:

In Selenium Grid, the hub is the central point that we load our tests into. The hub also acts as a server, controlling the network of test machines. A Selenium Grid has only one hub, and it is the master of the network.

Nodes

In Selenium Grid, a node is a test machine that connects to the hub. The hub uses this test machine to run tests on. A Grid network can have multiple nodes. Each node can have a different platform, i.e. a different operating system and browsers, and a node does not need to run on the same platform as the hub.

Advantages of Selenium Grid
  • Selenium Grid allows running multiple tests across different web browsers, operating systems, and machines. This ensures compatibility of the application under test across multiple combinations of web browsers, operating system, and hardware architecture
  • It speeds up test suite completion time, as it can run multiple tests in parallel. For example, if we have 10 nodes and need to execute a test suite of 50 tests, it will take roughly one-tenth of the time that a single machine running the suite without Selenium Grid would take.
Disadvantage of Selenium Grid
  • Extra cost to project as it requires additional machines as Nodes
Grid Code Snippets:
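As an illustrative sketch (the jar version and hub address are assumptions), a classic Selenium Grid standalone server is started in hub and node roles like this, after which tests point their RemoteWebDriver at http://<hub-ip>:4444/wd/hub:

$ java -jar selenium-server-standalone-3.141.59.jar -role hub
$ java -jar selenium-server-standalone-3.141.59.jar -role node -hub http://<hub-ip>:4444/grid/register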

What is Jenkins?

Jenkins is an open source automation tool written in Java with plugins built for Continuous Integration purposes. Jenkins is used to build and test your software projects continuously, making it easier for developers to integrate changes to the project and for users to obtain a fresh build. It also allows you to continuously deliver your software by integrating with a large number of testing and deployment technologies.

With Jenkins, organizations can accelerate the software development process through automation. Jenkins integrates development life-cycle processes of all kinds, including build, document, test, package, stage, deploy, static analysis and much more.

Jenkins achieves Continuous Integration with the help of plugins. Plugins allow the integration of various DevOps stages. If you want to integrate a particular tool, you need to install the plugins for that tool, for example: Git, Maven 2 project, Amazon EC2, HTML publisher, etc.

Advantages of Jenkins include:

  • It is an open source tool with great community support.
  • It is easy to install.
  • It has 1000+ plugins to ease your work. If a plugin does not exist, you can code it and share with the community.
  • It is free of cost.
  • It is built with Java and hence is portable to all major platforms.
What is Continuous Integration?

Continuous Integration is a development practice in which the developers are required to commit changes to the source code in a shared repository several times a day or more frequently. Every commit made in the repository is then built. This allows the teams to detect the problems early. Apart from this, depending on the Continuous Integration tool, there are several other functions like deploying the build application on the test server, providing the concerned teams with the build and test results etc.

Continuous Integration with Jenkins
  • First, a developer commits the code to the source code repository. Meanwhile, the Jenkins server checks the repository at regular intervals for changes.
  • Soon after a commit occurs, the Jenkins server detects the changes that have occurred in the source code repository. Jenkins will pull those changes and will start preparing a new build.
  • If the build fails, then the concerned team will be notified.
  • If the build is successful, then Jenkins deploys the build to the test server.
  • After testing, Jenkins generates feedback and then notifies the developers about the build and test results.
  • It will continue to check the source code repository for changes made in the source code and the whole process keeps on repeating.
Jenkins Distributed Architecture

Jenkins uses a Master-Slave architecture to manage distributed builds. In this architecture, the Master and Slaves communicate through the TCP/IP protocol.

Jenkins Master

Your main Jenkins server is the Master. The Master’s job is to handle:

  • Scheduling build jobs.
  • Dispatching builds to the slaves for the actual execution.
  • Monitoring the slaves (possibly taking them online and offline as required).
  • Recording and presenting the build results.
  • A Master instance of Jenkins can also execute build jobs directly.
Jenkins Slave

A Slave is a Java executable that runs on a remote machine. Following are the characteristics of Jenkins Slaves:

  • It listens for requests from the Jenkins Master instance.
  • Slaves can run on a variety of operating systems.
  • The job of a Slave is to do as they are told to, which involves executing build jobs dispatched by the Master.
  • You can configure a project to always run on a particular Slave machine, or a particular type of Slave machine, or simply let Jenkins pick the next available Slave.
What is a Jenkins pipeline?

A pipeline is a collection of jobs that brings the software from version control into the hands of the end users by using automation tools. It is a feature used to incorporate continuous delivery in our software development workflow.

Over the years, there have been multiple Jenkins pipeline releases, including Jenkins Build Flow, the Jenkins Build Pipeline plugin, Jenkins Workflow, etc. What are the key features of these plugins?

  • They represent multiple Jenkins jobs as one whole workflow in the form of a pipeline.
  • What do these pipelines do? These pipelines are a collection of Jenkins jobs which trigger each other in a specified sequence.

Let’s look at an example. Suppose I’m developing a small application on Jenkins and I want to build, test and deploy it. To do this, I will allot 3 jobs to perform each process. So, job1 would be for the build, job2 would perform tests and job3 would handle deployment. I can use the Jenkins Build Pipeline plugin to perform this task. After creating three jobs and chaining them in a sequence, the build pipeline plugin will run these jobs as a pipeline.

This approach is effective for deploying small applications. But what happens when there are complex pipelines with several processes (build, test, unit test, integration test, pre-deploy, deploy, monitor) running hundreds of jobs?

The maintenance cost for such a complex pipeline is huge and increases with the number of processes. It also becomes tedious to build and manage such a vast number of jobs. To overcome this issue, a new feature called Jenkins Pipeline Project was introduced.

The key feature of this pipeline is to define the entire deployment flow through code. What does this mean? It means that all the standard jobs defined by Jenkins are manually written as one whole script and they can be stored in a version control system. It basically follows the ‘pipeline as code’ discipline. Instead of building several jobs for each phase, you can now code the entire workflow and put it in a Jenkinsfile. Below is a list of reasons why you should use the Jenkins Pipeline.

Jenkins Pipeline Advantages
  • It models simple to complex pipelines as code by using Groovy DSL (Domain Specific Language)
  • The code is stored in a text file called the Jenkinsfile which can be checked into a SCM (Source Code Management)
  • Improves user interface by incorporating user input within the pipeline
  • It is durable in terms of unplanned restart of the Jenkins master
  • It can restart from saved checkpoints
  • It supports complex pipelines by incorporating conditional loops, fork or join operations and allowing tasks to be performed in parallel
  • It can integrate with several other plugins
What is a Jenkinsfile?

A Jenkinsfile is a text file that stores the entire workflow as code and it can be checked into a SCM on your local system. How is this advantageous? This enables the developers to access, edit and check the code at all times.

The Jenkinsfile is written using the Groovy DSL and it can be created through a text/groovy editor or through the configuration page on the Jenkins instance. It is written based on two syntaxes, namely:

  • Declarative pipeline syntax
  • Scripted pipeline syntax

Declarative pipeline is a relatively new feature that supports the pipeline as code concept. It makes the pipeline code easier to read and write. This code is written in a Jenkinsfile which can be checked into a source control management system such as Git.

The scripted pipeline, on the other hand, is the traditional way of writing the code; in this pipeline, the Jenkinsfile is written on the Jenkins UI instance. Though both of these pipelines are based on the Groovy DSL, the scripted pipeline uses a stricter Groovy-based syntax because it was the first pipeline to be built on the Groovy foundation. Since this Groovy scripting was not typically desirable to all users, the declarative pipeline was introduced to offer a simpler and more opinionated Groovy syntax.

The declarative pipeline is defined within a block labelled ‘pipeline’, whereas the scripted pipeline is defined within a ‘node’ block.

An example Jenkinsfile looks like this:

pipeline {
  environment {
    BUILD_SCRIPTS_GIT="http://10.100.100.10:7990/scm/~myname/mypipeline.git"
    BUILD_SCRIPTS='mypipeline'
    BUILD_HOME='/var/lib/jenkins/workspace'
  }
  agent any
  stages {
    stage('Checkout: Code') {
      steps {
        sh "mkdir -p $WORKSPACE/repo;\
        git config --global user.email 'email@address.com';\
        git config --global user.name 'myname';\
        git config --global push.default simple;\
        git clone $BUILD_SCRIPTS_GIT repo/$BUILD_SCRIPTS"
        sh "chmod -R +x $WORKSPACE/repo/$BUILD_SCRIPTS"
      }
    }
    stage('Yum: Updates') {
      steps {
        sh "sudo chmod +x $WORKSPACE/repo/$BUILD_SCRIPTS/scripts/update.sh"
        sh "sudo $WORKSPACE/repo/$BUILD_SCRIPTS/scripts/update.sh"
      }
    }
  }
  post {
    always {
      cleanWs()
    }
  }
}

The above Jenkinsfile does the following:

  • sets up environment variables
  • pulls data down from a git repo
  • sets it up in a Jenkins workspace
  • runs a script under scripts/
  • once complete, cleans up the workspace (whether successful or not)
Pipeline concepts
  • Pipeline

This is a user defined block which contains all the processes such as build, test, deploy, etc. It is a collection of all the stages in a Jenkinsfile. All the stages and steps are defined within this block. It is the key block for a declarative pipeline syntax.

  • Node

A node is a machine that executes an entire workflow. It is a key part of the scripted pipeline syntax.

There are various mandatory sections which are common to both the declarative and scripted pipelines, such as stages, agent and steps that must be defined within the pipeline. These are explained below:

  • Agent

An agent is a directive that can run multiple builds with only one instance of Jenkins. This feature helps to distribute the workload to different agents and execute several projects within a single Jenkins instance. It instructs Jenkins to allocate an executor for the builds.

A single agent can be specified for an entire pipeline or specific agents can be allotted to execute each stage within a pipeline. Few of the parameters used with agents are:

  • Any

Runs the pipeline/ stage on any available agent.

  • None

This parameter is applied at the root of the pipeline and it indicates that there is no global agent for the entire pipeline and each stage must specify its own agent.

  • Label

Executes the pipeline/stage on the labelled agent.

  • Docker

This parameter uses a Docker container as the execution environment for the pipeline or a specific stage. In the example below, Docker is used to pull an Ubuntu image, and that image is then used as the execution environment in which to run commands.
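A minimal sketch of that idea in a declarative Jenkinsfile, assuming the public ubuntu image from Docker Hub:

pipeline {
  agent { docker { image 'ubuntu:latest' } }   // run the stages inside an Ubuntu container
  stages {
    stage('Check environment') {
      steps {
        sh 'cat /etc/os-release'               // executes inside the container
      }
    }
  }
}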

  • Stages

This block contains all the work that needs to be carried out. The work is specified in the form of stages. There can be more than one stage within this directive. Each stage performs a specific task. In the following example, I’ve created multiple stages, each performing a specific task.

  • Steps

A series of steps can be defined within a stage block. These steps are carried out in sequence to execute a stage. There must be at least one step within a steps directive. In the following example I’ve implemented an echo command within the build stage. This command is executed as a part of the ‘Build’ stage.
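A minimal sketch of the stages and steps described above, with an echo step inside the ‘Build’ stage (the stage names are illustrative):

pipeline {
  agent any
  stages {
    stage('Build') {
      steps {
        echo 'Building the application...'     // a single step within the Build stage
      }
    }
    stage('Test') {
      steps {
        echo 'Running the tests...'
      }
    }
    stage('Deploy') {
      steps {
        echo 'Deploying to the test server...'
      }
    }
  }
}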

Continuous Integration (CI) is a development practice where developers integrate code into a shared repository frequently, preferably several times a day. Each integration can then be verified by an automated build and automated tests. While automated testing is not strictly part of CI it is typically implied.

One of the key benefits of integrating regularly is that you can detect errors quickly and locate them more easily. As each change introduced is typically small, pinpointing the specific change that introduced a defect can be done quickly.

In recent years CI has become a best practice for software development and is guided by a set of key principles. Among them are revision control, build automation and automated testing.

Benefits and Advantages of Continuous Integration

Continuous Integration has many benefits. A good CI setup speeds up your workflow and encourages the team to push every change without being afraid of breaking anything. There are more benefits to it than just working with a better software release process. Continuous Integration brings great business benefits as well.

  • Reduces the time and effort for integrations of different code changes
  • Enables a quick feedback mechanism on every change
  • Allows earlier detection and prevention of defects
  • Helps collaboration between team members so recent code is always shared
  • Reduces manual testing effort
  • Building features more incrementally saves time on the debugging side so you can focus on adding features
  • First step into fully automating the whole release process
  • Prevents divergence in different branches as they are integrated regularly
Continuous Integration Tools

Jenkins

Jenkins is a cross-platform open source CI tool written in Java. It offers configuration through both the GUI interface and the console commands. Jenkins is a very flexible tool to use because it offers an extension of features through plugins. Its plugin list is very broad, and one can easily add their own plugins to that list. Furthermore, Jenkins can distribute software builds and test loads on several machines.

Travis CI

Travis CI is an open source CI service, free for all open source projects hosted on GitHub. Since Travis CI is hosted, it is platform independent. It is configured using a .travis.yml file, which contains actionable data. Travis CI supports a variety of software languages, and the build configuration for each of those languages is complete. Travis CI uses virtual machines to create applications.

TeamCity

TeamCity is a sophisticated Java-based CI tool offered by JetBrains. It supports Java, .NET, and Ruby platforms. TeamCity has a range of free plugins available, developed both by JetBrains and third parties. It also offers integration with several IDEs including Eclipse, IntelliJ IDEA, and Visual Studio. Moreover, TeamCity allows simultaneous running of multiple builds and tests in different platforms and environments.

GitLab CI

GitLab CI is hosted on the free hosting service GitLab.com, and it offers Git repository management with features such as access control, bug tracking, and code review. GitLab CI is completely unified with GitLab, and it can easily be used to link projects using the GitLab API. GitLab CI build processes are coded in the Go language and can execute on several operating systems such as Windows, Linux, Docker, OSX, and FreeBSD.

CircleCI

CircleCI is a CI tool that supports only GitHub-hosted projects. It supports several languages, including Java, Python, Ruby/Rails, Node.js, PHP, Scala, and Haskell. It offers services based on containers. CircleCI offers one container free, and any number of projects can be built on it. It offers up to five levels of parallelization (1x, 4x, 8x, 12x and 16x), so a maximum parallelization of 16x can be achieved in one build. CircleCI also supports the Docker platform.

Bamboo

Bamboo is a CI tool developed by Atlassian. Bamboo is available in two versions, cloud and server. For the cloud version, Atlassian offers a hosting service with the help of an Amazon EC2 account. For the server version, self-hosting needs to be done. Bamboo supports the well-known Atlassian products JIRA and Bitbucket.

Machine Learning

Artificial Intelligence

Artificial intelligence (AI) is the simulation of human intelligence processes by machines, especially computer systems. These processes include learning (the acquisition of information and rules for using the information), reasoning (using rules to reach approximate or definite conclusions) and self-correction.

Machine Learning

Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

In traditional programming, data and a program are run on the computer to produce an output. In machine learning, data and the output are run on the computer to create a program. That program can then be used like any traditionally written program.

Machine learning algorithms are often categorized as supervised or unsupervised.

Supervised Learning

Supervised learning is learning in which we teach or train the machine using data that is well labelled, which means the data is already tagged with the correct answer. The machine is then provided with a new set of examples (data), so that the supervised learning algorithm analyses the training data (the set of training examples) and produces a correct outcome from the labelled data.

Classification algorithms and regression algorithms are types of supervised learning. Classification algorithms are used when the outputs are restricted to a limited set of values. For a classification algorithm that filters emails, the input would be an incoming email, and the output would be the name of the folder in which to file the email. For an algorithm that identifies spam emails, the output would be the prediction of either "spam" or "not spam", represented by the Boolean values true and false. Regression algorithms are named for their continuous outputs, meaning they may have any value within a range. Examples of a continuous value are the temperature, length, or price of an object.

Unsupervised Learning

Unsupervised learning is the training of a machine using information that is neither classified nor labelled, allowing the algorithm to act on that information without guidance. Here the task of the machine is to group unsorted information according to similarities, patterns, and differences without any prior training on the data. The most common unsupervised learning method is cluster analysis, or clustering, which is used in exploratory data analysis to find hidden patterns or groupings in data.
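
As a small illustration of clustering, the sketch below groups a handful of made-up 2-D points into three clusters using scikit-learn's KMeans; the data and the number of clusters are assumptions chosen for the example.

import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D data points (made up for illustration)
X = np.array([[1, 2], [1, 4], [2, 3],
              [8, 8], [9, 9], [8, 10],
              [15, 1], [16, 2], [14, 1]])

# Group the unlabelled points into 3 clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels)                   # cluster assigned to each point
print(kmeans.cluster_centers_)  # centre of each discovered cluster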

Some simple Machine Learning algorithms

Linear Regression

Here, we establish a relationship between independent and dependent variables by fitting the best line. It is used to estimate real values (cost of houses, number of calls, total sales, etc.) based on a continuous variable(s).

As an example, such a model is used later in this post to predict ice cream sales based on the temperature in a city.

We need a weight (w) and a bias (b) to fit a straight line, y = wx + b. Represented as a diagram, this single weight-and-bias unit is the simplest neural network. A neural network is a system of hardware and/or software patterned after the operation of neurons in the human brain.

Logistic Regression

Logistic Regression is a classification algorithm used to estimate discrete binary values (like 0/1, yes/no, true/false) based on a given set of independent variables. Typically, this involves fitting a curve that separates two distinct classes of data points.

Viewed as a neural network, logistic regression has multiple weighted inputs plus a bias, and two output nodes, one per class.
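
As a small illustration, the sketch below fits scikit-learn's LogisticRegression to a made-up pass/fail dataset; the data values and the prediction point are assumptions chosen for the example.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hours studied (independent variable) and pass/fail outcome (made up for illustration)
hours = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0], [4.5], [5.0]])
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(hours, passed)

# Predicted class (0 or 1) and probability of passing for 2.75 hours of study
print(clf.predict([[2.75]]))
print(clf.predict_proba([[2.75]])[0, 1])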

Deep Learning

Deep learning is a specific method of machine learning, and it’s based primarily on the use of neural networks.

In traditional supervised machine learning, systems require an expert to use his or her domain knowledge to specify the information (called features) in the input data that will best lead to a well-trained system. In Deep Learning, rather than specifying the features in our data that we think will lead to the best classification accuracy, we let the machine find this information on its own. Often, it can look at the problem in a way that even an expert wouldn’t have been able to imagine.

Neural Network Terminology

Activation function

The activation function of a node defines the output of that node, or "neuron", given an input or set of inputs. This output is then used as input for the next node, and so on, until a desired solution to the original problem is found. Some commonly used activation functions are the sigmoid, tanh, and ReLU functions, sketched below.
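
A minimal sketch of these three functions in NumPy (the definitions are standard; the sample inputs are arbitrary):

import numpy as np

def sigmoid(z):
    # squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # squashes any real value into the range (-1, 1)
    return np.tanh(z)

def relu(z):
    # keeps positive values, clips negative values to 0
    return np.maximum(0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))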

Input / Output / Hidden Layers

As the name suggests, the input layer is the one which receives the input and is essentially the first layer of the network. The output layer is the one which generates the output and is the final layer of the network. The processing layers are the hidden layers within the network. These hidden layers perform specific tasks on the incoming data and pass the output they generate on to the next layer. The input and output layers are visible to us, while the intermediate layers are hidden.

Forward propagation

Forward Propagation refers to the movement of the input through the hidden layers to the output layers. In forward propagation, the information travels in a single direction FORWARD. The input layer supplies the input to the hidden layers and then the output is generated. There is no backward movement.
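
As a small sketch, one forward pass through a single hidden layer can be written in NumPy as follows; the layer sizes and the random input are arbitrary choices for the example.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 3 input features, 4 hidden units, 1 output unit (shapes chosen for illustration)
x = np.random.randn(3, 1)
W1 = np.random.randn(4, 3); b1 = np.zeros((4, 1))
W2 = np.random.randn(1, 4); b2 = np.zeros((1, 1))

# Forward propagation: input -> hidden layer -> output
Z1 = W1 @ x + b1
A1 = sigmoid(Z1)
Z2 = W2 @ A1 + b2
y_hat = sigmoid(Z2)
print(y_hat)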

Cost / Loss function

When we build a network, the network tries to predict the output as close as possible to the actual value. We measure this accuracy of the network using the loss function. The loss function tries to penalize the network when it makes errors. Our objective while running the network is to increase our prediction accuracy and to reduce the error, hence minimizing the loss function. The most optimized output is the one with the least value of the loss function. If we define the loss function to be the mean squared error, it can be written as –

C = (1/m) ∑ (y - a)², where m is the number of training inputs, a is the predicted value, and y is the actual value for that example.

The learning process revolves around minimizing the cost.
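
For example, the mean squared error can be computed in a couple of lines of NumPy (the values below are arbitrary):

import numpy as np

y = np.array([3.0, 5.0, 7.0])   # actual values
a = np.array([2.5, 5.5, 6.0])   # predicted values

mse = np.mean((y - a) ** 2)
print(mse)   # 0.5 = (0.25 + 0.25 + 1.0) / 3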

Gradient Descent

Gradient descent is an optimization algorithm for minimizing the cost. To think of it intuitively, while climbing down a hill you should take small steps and walk down instead of just jumping down at once. So, if we start from a point x, we move down a little, i.e. delta h, update our position to x - delta h, and keep doing the same till we reach the bottom. Consider the bottom to be the minimum cost point.

Mathematically, to find the local minimum of a function one takes steps proportional to the negative of the gradient of the function.
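
A minimal sketch of gradient descent for the straight-line model y = wx + b with a mean squared error cost is given below; the data points and the learning rate are made up for the example.

import numpy as np

# Made-up data that roughly follows y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.1, 10.8])

w, b = 0.0, 0.0          # start from arbitrary parameters
learning_rate = 0.01
m = len(x)

for step in range(5000):
    a = w * x + b                       # predictions
    error = a - y
    dw = (2.0 / m) * np.sum(error * x)  # gradient of the cost with respect to w
    db = (2.0 / m) * np.sum(error)      # gradient of the cost with respect to b
    w -= learning_rate * dw             # step in the negative gradient direction
    b -= learning_rate * db

print(w, b)   # should end up close to w = 2 and b = 1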

Learning Rate

The rate at which we descend towards the minima of the cost function is the learning rate. We should choose the learning rate carefully: it should be neither so large that the optimal solution is missed nor so low that it takes forever for the network to converge.

Backpropagation

When we define a neural network, we assign random weights and bias values to our nodes. Once we have received the output for a single iteration, we can calculate the error of the network. This error is then fed back to the network along with the gradient of the cost function to update the weights of the network. The weights are updated so that the errors in subsequent iterations are reduced. This updating of weights using the gradient of the cost function is known as back-propagation.

Steps in training a Neural Network
  • Initialize weights and biases.
  • Forward propagation: using the input X, weights W, and biases b, for every layer we compute Z and A, the linear and non-linear activations. At the final layer, we compute f(A^(L-1)), which could be a sigmoid, softmax, or linear function of A^(L-1), and this gives the prediction y_hat.
  • Compute the loss function: this is a function of the actual label y and the predicted label y_hat. It captures how far off our predictions are from the actual target. Our objective is to minimize this loss function.
  • Backward propagation: in this step, we calculate the gradients of the loss function f(y, y_hat) with respect to A, W, and b, called dA, dW, and db, and use these gradients to update the parameter values from the last layer to the first.
  • Repeat steps 2–4 for n iterations/epochs till we feel we have minimized the loss function without overfitting the training data (a sketch of these steps follows this list).
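
Putting these steps together, the sketch below trains a one-hidden-layer network in NumPy on a toy XOR dataset; the architecture, data, learning rate, and iteration count are all assumptions made for this example.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset: 2 input features, 4 examples (XOR pattern), labels in {0, 1}
X = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]])        # shape (2, 4)
Y = np.array([[0, 1, 1, 0]])        # shape (1, 4)
m = X.shape[1]

# 1. Initialize weights and biases
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 2)); b1 = np.zeros((4, 1))
W2 = rng.standard_normal((1, 4)); b2 = np.zeros((1, 1))
lr = 1.0

for i in range(20000):
    # 2. Forward propagation
    Z1 = W1 @ X + b1;  A1 = sigmoid(Z1)
    Z2 = W2 @ A1 + b2; A2 = sigmoid(Z2)        # y_hat

    # 3. Compute the loss (mean squared error)
    loss = np.mean((Y - A2) ** 2)

    # 4. Backward propagation: (scaled) gradients of the loss w.r.t. W2, b2, W1, b1
    dZ2 = (A2 - Y) * A2 * (1 - A2)
    dW2 = (dZ2 @ A1.T) / m;  db2 = np.mean(dZ2, axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)
    dW1 = (dZ1 @ X.T) / m;   db1 = np.mean(dZ1, axis=1, keepdims=True)

    # Update the parameters in the negative gradient direction
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(loss)          # the loss should have dropped well below its starting value
print(A2.round(2))   # predictions should approach [0, 1, 1, 0] for most initializations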
Machine Learning using Python

Simple Machine Learning models like Linear Regression can be trained using the Python library scikit-learn. Neural Networks are built and trained using libraries such as Keras, TensorFlow, or PyTorch.

In the simple example below, we build a linear regression model to predict ice cream sales based on temperature. 80% of the available data is used to train the model, and the remaining 20% is used to test it.

  
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# load the dataset: temperature (in Fahrenheit) vs ice cream sales
data = {'Temperature_in_Fahrenheit': [58, 62, 52, 60, 66, 74, 68, 80, 76, 74, 64],
        'Ice_Cream_sales': [215, 325, 185, 332, 406, 522, 412, 614, 544, 445, 408]}

df = pd.DataFrame(data, columns=['Temperature_in_Fahrenheit', 'Ice_Cream_sales'])

X = df[['Temperature_in_Fahrenheit']]
Y = df['Ice_Cream_sales']

# split X and Y into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=1)

# create the linear regression object and train it on the training set
reg = LinearRegression()
reg.fit(X_train, y_train)

# predict sales for the test set
y_predict = reg.predict(X_test)

# plot residual errors in the training data
plt.scatter(reg.predict(X_train), reg.predict(X_train) - y_train,
            color="green", s=10, label='Train data')
# plot residual errors in the test data
plt.scatter(reg.predict(X_test), reg.predict(X_test) - y_test,
            color="blue", s=10, label='Test data')
# plot the line of zero residual error
plt.hlines(y=0, xmin=0, xmax=700, linewidth=2)
plt.legend(loc='upper right')
plt.title("Residual errors")
plt.show()

RABBITMQ

What is RabbitMQ?

RabbitMQ is an open source message broker software. It accepts messages from producers and delivers them to consumers. It acts like a middleman that can be used to reduce the load and delivery times of web application servers.

Features of RabbitMQ:

  • Robust messaging for building applications in a distributed manner.
  • Easy to use
  • Runs on all major Operating Systems.
  • Supports a huge number of developer platforms
  • Supports multiple messaging protocols, message queuing, delivery acknowledgement, flexible routing to queues, and multiple exchange types.
  • Open source and commercially supported
How Does RabbitMQ Work?

The producer sends messages to an exchange. An exchange is responsible for routing the messages to the different queues: it accepts messages from the producer application and routes them to message queues with the help of bindings and routing keys. A binding is a link between a queue and an exchange. Consumers then receive messages from the queue.
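
To make this flow concrete, the sketch below uses the pika Python client to declare an exchange, bind a queue to it with a routing key, and publish a message; the exchange name, queue name, and routing key are made up for the example.

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

# Declare a direct exchange and a queue, then bind them with a routing key
channel.exchange_declare(exchange='orders', exchange_type='direct')
channel.queue_declare(queue='order_processing')
channel.queue_bind(exchange='orders', queue='order_processing', routing_key='new_order')

# The producer publishes to the exchange; the binding routes the message to the queue
channel.basic_publish(exchange='orders', routing_key='new_order', body='Order created')

connection.close()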

Prerequisites:
  • RabbitMQ
  • Python

How to Send and Receive a message using RABBITMQ?

Send a Message using RabbitMQ:

The following program, send.py, will send a single message to the queue.

Step 1: Establish a connection with the RabbitMQ server.

 
       
import pika

connection = pika.BlockingConnection(
    pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

Step 2: Create a hello queue to which the message will be delivered:


channel.queue_declare(queue='hello')

Step 3: Publish the message, specifying the exchange in the exchange parameter and the queue name in the routing_key parameter, so the message goes to the intended queue.


channel.basic_publish(exchange='', routing_key='hello', body='Hello RabbitMQ!')
print(" [x] Sent 'Hello RabbitMQ!'")

Step 4: Close the connection to make sure the network buffers are flushed and our message is actually delivered to RabbitMQ.


connection.close()
Receive a message using RabbitMQ:

The following program, receive.py, will receive messages from the queue.

Step 1: Receiving works by subscribing a callback function to a queue. Whenever a message is received, this callback function is called by the Pika library. The following function will print the contents of the message to the screen.


def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)
 

Step 2: Next, we need to tell RabbitMQ that this particular callback function should receive messages from our hello queue:


channel.basic_consume(
    queue='hello', on_message_callback=callback, auto_ack=True)

Step 3: Finally, enter a never-ending loop that waits for data and runs the callback whenever necessary.


print(' [*] Waiting for messages. To exit press CTRL+C')
channel.start_consuming()
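
For reference, a complete receive.py might look like the sketch below; it assumes the same localhost connection settings used in send.py and declares the hello queue again so that the consumer can be started before the producer.

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

# Declaring the queue again is harmless; it simply makes sure the queue exists
channel.queue_declare(queue='hello')

def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)

channel.basic_consume(queue='hello', on_message_callback=callback, auto_ack=True)

print(' [*] Waiting for messages. To exit press CTRL+C')
channel.start_consuming()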
     
   

Step 4: Open a terminal and run send.py. The producer program will stop after every run:

python send.py
[x] Sent 'Hello RabbitMQ!'

We can then go to the web browser, hit the URL http://localhost:15672/, and see the count of messages sent in the dashboard.

Step 5: Open a terminal and run the receive.py program.

python receive.py
[*] Waiting for messages. To exit press CTRL+C
[x] Received 'Hello RabbitMQ!'

If the Ready and Total message counts in the dashboard are zero, it confirms that the messages have been received by the consumer.

Note: You can keep sending messages through RabbitMQ. As you will notice, the receive.py program doesn't exit; it stays ready to receive further messages and may be interrupted with Ctrl-C.

ELK – Elasticsearch, Logstash & Kibana

Introduction

As more and more IT infrastructures move to public clouds such as Amazon Web Services, Microsoft Azure, and Google Cloud, public cloud security tools and logging platforms are both becoming more and more critical.

The ELK Stack is popular because it fulfills a specific need in the log management and log analysis space. In cloud-based infrastructures, consolidating log outputs to a central location from different sources like web servers, mail servers, database servers, and network appliances can be particularly useful. This is especially true when trying to make better data-informed decisions. The ELK stack simplifies searching and analyzing data by providing insights in real-time from the log data.

It is common to run the full ELK stack rather than each individual component separately. Each of these services plays an important role, and in order to perform under high demand, it is advantageous to deploy each service on its own server.

Why ELK?
• Rapid on-premise (or cloud) installation and easy deployment
• Scales vertically and horizontally
• Easy-to-use and varied APIs
• Ease of writing queries, a lot easier than writing a MapReduce job
• Availability of libraries for most programming/scripting languages
• Elastic offers a host of language clients for Elasticsearch, including Ruby, Python, PHP, Perl, .NET, Java, JavaScript, and more
• Tools availability
• It’s free (open source), and it’s quick

The ELK Stack is used for log collection, indexing, and visualization of the collected log data. We can collect any type of logs (Windows event logs, HTTP logs, Apache server logs, etc.) in the ELK Stack, as per configuration.

Logstash:

Logstash is a light-weight, open-source, server-side data processing pipeline that allows you to collect data from a variety of sources, transform it on the fly, and send it to your desired destination. It is most often used as a data pipeline for Elasticsearch, an open-source analytics and search engine. Because of its tight integration with Elasticsearch, powerful log processing capabilities, and over 200 pre-built open-source plugins that can help you easily index your data, Logstash is a popular choice for loading data into Elasticsearch.

Logstash allows you to easily ingest unstructured data from a variety of data sources including system logs, website logs, and application server logs. Logstash offers pre-built filters, so you can readily transform common data types, index them in Elasticsearch, and start querying without having to build custom data transformation pipelines.

The server component of Logstash processes incoming logs. In other words, Logstash collects, parses, and enriches logs before indexing them into Elasticsearch.

It is the pipeline that collects log data and pushes the collected data to Elasticsearch.

Logstash Input Plugins

• Stdin – reads events from standard input
• File – streams events from files (similar to “tail -0F”)
• Syslog – reads syslog messages as events
• Eventlog – pulls events from the Windows Event Log
• Imap – reads mail from an IMAP server
• Rss – captures events from RSS/Atom feeds
• Snmptrap – creates events based on SNMP trap messages
• Twitter – reads events from the Twitter Streaming API
• Irc – reads events from an IRC server
• Exec – captures the output of a shell command as an event
• Elasticsearch – reads query results from an Elasticsearch cluster

Logstash Filter Plugins

• Grok – parses unstructured event data into fields
• Mutate – performs mutations on fields
• Geoip – adds geographical information about an IP address
• Date – parses dates from fields to use as the Logstash timestamp for an event
• Cidr – checks IP addresses against a list of network blocks
• Drop – drops all events

Logstash Output Plugins

• Stdout – prints events to the standard output
• Csv – writes events to disk in a delimited format
• Email – sends an email to a specified address when output is received
• Elasticsearch – stores logs in Elasticsearch
• Exec – runs a command for a matching event
• File – writes events to files on disk
• MongoDB – writes events to MongoDB
• Redmine – creates tickets using the Redmine API

Elasticsearch

Elasticsearch is the data storage and indexing component of the ELK Stack: it stores the data and indexes it.

It is a search engine based on Lucene that provides distributed search with an HTTP web interface and schema-free JSON documents.

The distributed nature of Elasticsearch enables it to process large volumes of data in parallel, quickly finding the best matches for your queries. Elasticsearch operations such as reading or writing data usually take less than a second to complete. This lets you use Elasticsearch for near real-time use cases such as application monitoring and anomaly detection.

An index is like a 'database' in a relational database. It has a mapping which defines multiple types. We can think of an index as a type of data organization mechanism, allowing the user to partition data a certain way.

Other key concepts of Elasticsearch are replicas and shards, the mechanism Elasticsearch uses to distribute data around the cluster. Elasticsearch implements a clustered architecture that uses sharding to distribute data across multiple nodes, and replication to provide high availability.

The index is a logical namespace which maps to one or more primary shards and can have zero or more replica shards. A shard is a Lucene index and an Elasticsearch index is a collection of shards. The application talks to an index, and Elasticsearch routes the requests to the appropriate shards.
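
As an illustration, a log document can be indexed and searched from Python with the official elasticsearch client; the index name, document fields, and query below are made up for the example, and the exact client API varies slightly between client versions.

from datetime import datetime
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Index a single log document into the 'web-logs' index
es.index(index="web-logs", document={
    "timestamp": datetime.utcnow().isoformat(),
    "level": "ERROR",
    "message": "Connection timed out while calling the payment service",
    "host": "web-01",
})

# Search for error-level log entries
response = es.search(index="web-logs", query={"match": {"level": "ERROR"}})
for hit in response["hits"]["hits"]:
    print(hit["_source"]["message"])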

Kibana:

Kibana is the visualization web interface through which we can visualize the indexed log data. Kibana is an open-source data visualization and exploration tool used for log and time-series analytics, application monitoring, and operational intelligence use cases. It offers powerful and easy-to-use features such as histograms, line graphs, pie charts, heat maps, and built-in geospatial support.

CloudIQ is a leading Cloud Consulting and Solutions firm that helps businesses solve today’s problems and plan the enterprise of tomorrow by integrating intelligent cloud solutions. We help you leverage the technologies that make your people more productive, your infrastructure more intelligent, and your business more profitable. 
