What is Jenkins?

Jenkins is an open source automation tool written in Java, with plugins built for Continuous Integration. Jenkins builds and tests your software projects continuously, which makes it easier for developers to integrate changes into the project and for users to obtain a fresh build. It also allows you to continuously deliver your software by integrating with a large number of testing and deployment technologies.

 

With Jenkins, organizations can accelerate the software development process through automation. Jenkins integrates development life-cycle processes of all kinds, including build, document, test, package, stage, deploy, static analysis and much more.

 

Jenkins achieves Continuous Integration with the help of plugins. Plugins allow the integration of various DevOps stages. If you want to integrate a particular tool, you need to install the plugin for that tool, for example Git, Maven 2 project, Amazon EC2 or HTML Publisher.

 

Advantages of Jenkins include:

  • It is an open source tool with great community support.
  • It is easy to install.
  • It has 1000+ plugins to ease your work. If a plugin does not exist, you can code it and share with the community.
  • It is free of cost.
  • It is built with Java and hence, it is portable to all the major platforms

 

What is Continuous Integration?

Continuous Integration is a development practice in which the developers are required to commit changes to the source code in a shared repository several times a day or more frequently. Every commit made in the repository is then built. This allows the teams to detect the problems early. Apart from this, depending on the Continuous Integration tool, there are several other functions like deploying the build application on the test server, providing the concerned teams with the build and test results etc.

 

Continuous Integration with Jenkins
  • First, a developer commits the code to the source code repository. Meanwhile, the Jenkins server checks the repository at regular intervals for changes.
  • Soon after a commit occurs, the Jenkins server detects the changes that have occurred in the source code repository. Jenkins will pull those changes and will start preparing a new build.
  • If the build fails, then the concerned team will be notified.
  • If the build is successful, then Jenkins deploys the build to the test server.
  • After testing, Jenkins generates feedback and then notifies the developers about the build and test results.
  • It will continue to check the source code repository for changes made in the source code and the whole process keeps on repeating.
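The cycle described above can be sketched in a few lines of Python. This is only an illustrative simulation, not Jenkins code: the function name run_ci_cycle, the commit ids and the build/test callables are hypothetical stand-ins for what the Jenkins server does internally.

```python
from typing import Callable, List


def run_ci_cycle(last_built: str, head: str,
                 build: Callable[[str], bool],
                 test: Callable[[str], bool]) -> List[str]:
    """Simulate one polling cycle and return the events in order."""
    events: List[str] = []
    if head == last_built:
        # Nothing new in the repository since the last build.
        events.append("no changes")
        return events
    events.append(f"pull {head}")
    if not build(head):
        events.append("notify team: build failed")
        return events
    events.append("deploy to test server")
    result = "passed" if test(head) else "failed"
    events.append(f"notify developers: tests {result}")
    return events


# Hypothetical commit ids; both callables simply report success here.
events = run_ci_cycle("abc123", "def456",
                      build=lambda commit: True,
                      test=lambda commit: True)
print(events)
```

In a real setup this function would be called at every polling interval, which corresponds to the last bullet in the list.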

 

Jenkins Distributed Architecture

Jenkins uses a Master-Slave architecture to manage distributed builds (current Jenkins documentation calls these the controller and agents). In this architecture, Master and Slave communicate over TCP/IP.

 

Jenkins Master

Your main Jenkins server is the Master. The Master’s job is to handle:

  • Scheduling build jobs.
  • Dispatching builds to the slaves for the actual execution.
  • Monitoring the slaves (possibly taking them online and offline as required).
  • Recording and presenting the build results.
  • A Master instance of Jenkins can also execute build jobs directly.
Jenkins Slave

A Slave is a Java executable that runs on a remote machine. Following are the characteristics of Jenkins Slaves:

  • It listens for requests from the Jenkins Master instance.
  • Slaves can run on a variety of operating systems.
  • The job of a Slave is to do as it is told, which involves executing build jobs dispatched by the Master.
  • You can configure a project to always run on a particular Slave machine, or a particular type of Slave machine, or simply let Jenkins pick the next available Slave.
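The last point, choosing where a job runs, can be illustrated with a small Python model. It is a sketch under assumptions, not the Jenkins scheduler: the Agent class, the pick_agent function and the machine names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Agent:
    """Hypothetical model of a slave machine."""
    name: str
    labels: Set[str] = field(default_factory=set)
    busy: bool = False


def pick_agent(agents: List[Agent],
               name: Optional[str] = None,
               label: Optional[str] = None) -> Optional[Agent]:
    """Return the first free agent matching the constraints, or None."""
    for agent in agents:
        if agent.busy:
            continue
        if name is not None and agent.name != name:
            continue  # project is pinned to a particular machine
        if label is not None and label not in agent.labels:
            continue  # project needs a particular type of machine
        return agent
    return None


agents = [
    Agent("linux-1", {"linux"}, busy=True),
    Agent("linux-2", {"linux", "docker"}),
    Agent("win-1", {"windows"}),
]
print(pick_agent(agents))                   # next available agent
print(pick_agent(agents, label="windows"))  # a particular type of agent
print(pick_agent(agents, name="linux-1"))   # pinned agent is busy
```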

 

What is a Jenkins pipeline?

A pipeline is a collection of jobs that brings the software from version control into the hands of the end users by using automation tools. It is a feature used to incorporate continuous delivery in our software development workflow.

 

Over the years, there have been multiple Jenkins pipeline releases, including Jenkins Build Flow, the Jenkins Build Pipeline plugin, Jenkins Workflow, etc. What are the key features of these plugins?

  • They represent multiple Jenkins jobs as one whole workflow in the form of a pipeline.
  • These pipelines are a collection of Jenkins jobs which trigger each other in a specified sequence.

Let’s look at an example. Suppose I’m developing a small application and I want to use Jenkins to build, test and deploy it. To do this, I will allot three jobs, one for each process: job1 for the build, job2 for the tests and job3 for the deployment. I can use the Jenkins Build Pipeline plugin to perform this task. After creating three jobs and chaining them in a sequence, the plugin will run these jobs as a pipeline.

 

This approach is effective for deploying small applications. But what happens when there are complex pipelines with several processes (build, test, unit test, integration test, pre-deploy, deploy, monitor) running hundreds of jobs?

 

The maintenance cost for such a complex pipeline is huge and increases with the number of processes. It also becomes tedious to build and manage such a vast number of jobs. To overcome this issue, a new feature called Jenkins Pipeline Project was introduced.

 

The key feature of this pipeline is that it defines the entire deployment flow through code. What does this mean? It means that all the standard jobs defined by Jenkins are written by hand as one whole script, which can be stored in a version control system. It follows the ‘pipeline as code’ discipline. Instead of building several jobs for each phase, you can now code the entire workflow and put it in a Jenkinsfile. Below is a list of reasons why you should use the Jenkins Pipeline.

 

Jenkins Pipeline Advantages
  • It models simple to complex pipelines as code by using a Groovy DSL (Domain Specific Language)
  • The code is stored in a text file called the Jenkinsfile, which can be checked into an SCM (Source Code Management) system
  • It improves the user interface by incorporating user input within the pipeline
  • It is durable across unplanned restarts of the Jenkins master
  • It can restart from saved checkpoints
  • It supports complex pipelines by incorporating conditional loops, fork or join operations and allowing tasks to be performed in parallel
  • It can integrate with several other plugins

 

What is a Jenkinsfile?

A Jenkinsfile is a text file that stores the entire workflow as code, and it can be checked into an SCM repository alongside your project. How is this advantageous? It enables the developers to access, edit and review the pipeline code at all times.

 

The Jenkinsfile is written using the Groovy DSL and it can be created through a text/groovy editor or through the configuration page on the Jenkins instance. It is written based on two syntaxes, namely:

  • Declarative pipeline syntax
  • Scripted pipeline syntax

Declarative pipeline is a relatively new feature that supports the pipeline as code concept. It makes the pipeline code easier to read and write. This code is written in a Jenkinsfile which can be checked into a source control management system such as Git.

 

The scripted pipeline, by contrast, is the traditional way of writing the code. It is often written directly in the Jenkins UI, although it can also be stored in a Jenkinsfile. Though both pipelines are based on the Groovy DSL, the scripted pipeline uses stricter Groovy-based syntax because it was the first pipeline to be built on the Groovy foundation. Since this Groovy script did not suit all users, the declarative pipeline was introduced to offer a simpler and more opinionated Groovy syntax.

 

The declarative pipeline is defined within a block labelled ‘pipeline’, whereas the scripted pipeline is defined within a ‘node’ block.

 

An example Jenkinsfile looks like this:

 

The above Jenkinsfile does the following:

  • sets up environment variables
  • pulls data down from a Git repo
  • sets it up in a Jenkins workspace
  • runs a script under scripts/
  • finishes by cleaning up the workspace (whether the run was successful or not)

 

Pipeline concepts
  • Pipeline

This is a user defined block which contains all the processes such as build, test, deploy, etc. It is a collection of all the stages in a Jenkinsfile. All the stages and steps are defined within this block. It is the key block for a declarative pipeline syntax.

 

  • Node

A node is a machine that executes an entire workflow. It is a key part of the scripted pipeline syntax.

The declarative pipeline has several mandatory sections, such as agent, stages and steps, that must be defined within the pipeline block. These are explained below:

 

  • Agent

An agent is a directive that can run multiple builds with only one instance of Jenkins. This feature helps to distribute the workload to different agents and execute several projects within a single Jenkins instance. It instructs Jenkins to allocate an executor for the builds.

A single agent can be specified for an entire pipeline, or specific agents can be allotted to execute each stage within a pipeline. A few of the parameters used with agents are:

 

  • Any

Runs the pipeline/ stage on any available agent.

 

  • None

This parameter is applied at the root of the pipeline and it indicates that there is no global agent for the entire pipeline and each stage must specify its own agent.

 

  • Label

Executes the pipeline/stage on the labelled agent.

 

  • Docker

This parameter uses a Docker container as the execution environment for the pipeline or a specific stage. In the example below, I’m using Docker to pull an Ubuntu image. This image can then be used as an execution environment to run multiple commands.

 

  • Stages

This block contains all the work that needs to be carried out. The work is specified in the form of stages. There can be more than one stage within this directive. Each stage performs a specific task. In the following example, I’ve created multiple stages, each performing a specific task.

 

  • Steps

A series of steps can be defined within a stage block. These steps are carried out in sequence to execute a stage. There must be at least one step within a steps directive. In the following example I’ve implemented an echo command within the build stage. This command is executed as a part of the ‘Build’ stage.


Continuous Integration (CI) is a development practice where developers integrate code into a shared repository frequently, preferably several times a day. Each integration can then be verified by an automated build and automated tests. While automated testing is not strictly part of CI, it is typically implied.

 

One of the key benefits of integrating regularly is that you can detect errors quickly and locate them more easily. As each change introduced is typically small, pinpointing the specific change that introduced a defect can be done quickly.

 

In recent years CI has become a best practice for software development and is guided by a set of key principles. Among them are revision control, build automation and automated testing.

 

Benefits and Advantages of Continuous Integration

Continuous Integration has many benefits. A good CI setup speeds up your workflow and encourages the team to push every change without being afraid of breaking anything. There are more benefits to it than just working with a better software release process. Continuous Integration brings great business benefits as well.

  • Reduces the time and effort for integrations of different code changes
  • Enables a quick feedback mechanism on every change
  • Allows earlier detection and prevention of defects
  • Helps collaboration between team members so recent code is always shared
  • Reduces manual testing effort
  • Building features more incrementally saves time on the debugging side so you can focus on adding features
  • First step into fully automating the whole release process
  • Prevents divergence in different branches as they are integrated regularly
Continuous Integration Tools

Jenkins

Jenkins is a cross-platform open source CI tool written in Java. It offers configuration through both the GUI and console commands. Jenkins is a very flexible tool because its features can be extended through plugins. Its plugin list is very broad, and one can easily add one’s own plugins to that list. Furthermore, Jenkins can distribute software builds and test loads across several machines.

 

Travis CI

Travis CI is an open source CI service, free for all open source projects hosted on GitHub. Since Travis CI is hosted, it is platform independent. It is configured using .travis.yml files, which describe how the project is to be built. Travis CI supports a variety of programming languages, and it provides a build configuration for each of them. Travis CI uses virtual machines to build applications.

 

TeamCity

TeamCity is a sophisticated Java-based CI tool offered by JetBrains. It supports Java, .NET and Ruby platforms. TeamCity has a range of free plugins available, developed both by JetBrains and by third parties. It also offers integration with several IDEs, including Eclipse, IntelliJ IDEA and Visual Studio. Moreover, TeamCity allows multiple builds and tests to run simultaneously on different platforms and environments.

 

GitLab CI

GitLab CI is available on the hosting service GitLab.com, which offers Git repository management with features such as access control, bug tracking and code review. GitLab CI is fully integrated with GitLab, and it can easily be hooked into projects using the GitLab API. The GitLab Runner, which processes builds, is written in the Go language and can execute on several platforms, such as Windows, Linux, Docker, macOS and FreeBSD.

 

CircleCI

CircleCI is a hosted CI tool that works with projects on GitHub. It supports several languages, including Java, Python, Ruby/Rails, Node.js, PHP, Scala and Haskell. It offers services based on containers. CircleCI offers one container free, and any number of projects can be built on it. It offers up to five levels of parallelization (1x, 4x, 8x, 12x and 16x), so a maximum parallelization of 16x can be achieved in one build. CircleCI also supports the Docker platform.

 

Bamboo

Bamboo is a CI tool developed by Atlassian. Bamboo is available in two versions, cloud and server. For the cloud version, Atlassian offers a hosting service with the help of an Amazon EC2 account. For the server version, you need to host it yourself. Bamboo integrates with the well-known Atlassian products Jira and Bitbucket.


Machine Learning

Artificial Intelligence

Artificial intelligence (AI) is the simulation of human intelligence processes by machines, especially computer systems. These processes include learning (the acquisition of information and rules for using the information), reasoning (using rules to reach approximate or definite conclusions) and self-correction.

 

Machine Learning

Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

 

In Traditional Programming, data and program are run on the computer to produce the output. In Machine Learning, data and output are run on the computer to create a program. The program can be used in traditional programming.

 


Machine learning algorithms are often categorized as supervised or unsupervised.

 


Supervised Learning

Supervised learning is learning in which we teach or train the machine using well-labelled data, meaning that each training example is already tagged with the correct answer. The supervised learning algorithm analyses this training data (the set of training examples). The machine is then given a new set of examples, for which it predicts an outcome based on what it learned from the labelled data.

 

Classification algorithms and regression algorithms are types of supervised learning. Classification algorithms are used when the outputs are restricted to a limited set of values. For a classification algorithm that filters emails, the input would be an incoming email, and the output would be the name of the folder in which to file the email. For an algorithm that identifies spam emails, the output would be the prediction of either “spam” or “not spam”, represented by the Boolean values true and false. Regression algorithms are named for their continuous outputs, meaning they may have any value within a range. Examples of a continuous value are the temperature, length, or price of an object.

 

Unsupervised Learning

Unsupervised learning is the training of a machine using information that is neither classified nor labelled, allowing the algorithm to act on that information without guidance. Here the task of the machine is to group unsorted information according to similarities, patterns and differences without any prior training on the data. The most common unsupervised learning method is cluster analysis, or clustering, which is used in exploratory data analysis to find hidden patterns or groupings in data.
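As a rough illustration of clustering, here is a minimal one-dimensional k-means sketch in plain Python. The choice of k-means, the function name kmeans_1d, the sample points and the starting centers are all assumptions made for this example; the text above does not prescribe a particular algorithm.

```python
from typing import List, Tuple


def kmeans_1d(points: List[float], centers: List[float],
              iterations: int = 10) -> Tuple[List[float], List[List[float]]]:
    """Group unlabelled 1-D points around the given starting centers."""
    clusters: List[List[float]] = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Made-up data with two obvious groups and no labels.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers, clusters)
```

No labels are supplied anywhere: the grouping emerges only from the distances between the points.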

 

Some simple Machine Learning algorithms

Linear Regression

Here, we establish a relationship between independent and dependent variables by fitting the best line. It is used to estimate real values (cost of houses, number of calls, total sales, etc.) based on one or more continuous variables.
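Fitting the best line can be sketched with the closed-form least-squares solution for a single input variable. The function name fit_line and the temperature/sales numbers are made up for illustration; the data is constructed to lie exactly on a line so the result is easy to check by hand.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Return slope w and intercept b of the least-squares line y = w*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    w = covariance / variance
    b = mean_y - w * mean_x
    return w, b


# Hypothetical data: sales = 10 * temperature - 90, with no noise.
temps = [20.0, 25.0, 30.0, 35.0]
sales = [110.0, 160.0, 210.0, 260.0]
w, b = fit_line(temps, sales)
print(f"w = {w}, b = {b}")
```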

 

The model below is used to predict ice cream sales based on the temperature in a city.

 


 

We need a weight (w) and a bias (b) to fit a straight line (y = wx + b), and this can be represented diagrammatically as given below:

 


The above diagram is the simplest neural network. A neural network is a system of hardware and/or software patterned after the operation of neurons in the human brain.

 

Logistic Regression

Logistic Regression is a classification algorithm used to estimate discrete binary values (like 0/1, yes/no, true/false) based on a given set of independent variables. Typically, this involves fitting a curve to separate two distinct classes of data points.
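A minimal sketch of how a logistic regression model turns inputs into a binary decision: a weighted sum is passed through the sigmoid function and thresholded. The weights, bias and 0.5 threshold below are hypothetical values chosen for illustration, not trained parameters.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features: Sequence[float],
                  weights: Sequence[float], bias: float) -> float:
    """Probability of the positive class for one example."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict_label(features: Sequence[float], weights: Sequence[float],
                  bias: float, threshold: float = 0.5) -> int:
    """Return 1 or 0 depending on which side of the threshold we land."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical parameters for a model with two input variables.
weights, bias = [1.0, -0.5], -1.0
print(predict_label([3.0, 1.0], weights, bias))  # z = 1.5
print(predict_label([0.0, 2.0], weights, bias))  # z = -2.0
```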

 


The neural network for logistic regression has multiple weights and a bias applied to its inputs, and 2 output nodes, as shown below:

 


 

Deep Learning

Deep learning is a specific method of machine learning, and it’s based primarily on the use of neural networks.

 


In traditional supervised machine learning, systems require an expert to use his or her domain knowledge to specify the information (called features) in the input data that will best lead to a well-trained system. In Deep Learning, rather than specifying the features in our data that we think will lead to the best classification accuracy, we let the machine find this information on its own. Often, it can look at the problem in a way that even an expert wouldn’t have been able to imagine.

 


 

Neural Network Terminology

Activation function

The activation function of a node defines the output of that node, or “neuron”, given an input or set of inputs. This output is then used as input for the next node and so on until a desired solution to the original problem is found. Some of the commonly used activation functions are given below
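Three widely used activation functions are sigmoid, tanh and ReLU. The selection is an assumption made for this sketch; it is not necessarily the set the original figure showed.

```python
import math


def sigmoid(z: float) -> float:
    """Maps any input to (0, 1); often used for probabilities."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z: float) -> float:
    """Maps any input to (-1, 1), centred on zero."""
    return math.tanh(z)


def relu(z: float) -> float:
    """Passes positive inputs through and clips negatives to zero."""
    return max(0.0, z)


for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  "
          f"tanh={tanh(z):.3f}  relu={relu(z):.1f}")
```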

 

 

Input / Output / Hidden Layers

As the name suggests, the input layer is the one which receives the input and is essentially the first layer of the network. The output layer is the one which generates the output and is the final layer of the network. The processing layers are the hidden layers within the network. These hidden layers perform specific tasks on the incoming data and pass the output they generate on to the next layer. The input and output layers are the ones visible to us, while the intermediate layers are hidden.


 

Forward propagation

Forward propagation refers to the movement of the input through the hidden layers to the output layer. In forward propagation, the information travels in a single direction: forward. The input layer supplies the input to the hidden layers and then the output is generated. There is no backward movement.
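Forward propagation can be sketched as a loop that feeds the output of each layer into the next. The layer representation (a list of weight rows, a list of biases and an activation function), the helper names and all numeric values below are assumptions made for this example.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A layer is (weight rows, biases, activation); one weight row per node.
Layer = Tuple[List[List[float]], List[float], Callable[[float], float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs: Sequence[float], weights: List[List[float]],
          biases: List[float],
          activation: Callable[[float], float]) -> List[float]:
    """Compute the outputs of every node in one layer."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs: Sequence[float], layers: List[Layer]) -> List[float]:
    """Pass the input through each layer in order; nothing flows backward."""
    activations = list(inputs)
    for weights, biases, activation in layers:
        activations = dense(activations, weights, biases, activation)
    return activations


# Hypothetical network: 2 inputs -> 2 hidden nodes -> 1 output node.
network: List[Layer] = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.1], sigmoid),
    ([[1.0, -1.0]], [0.0], sigmoid),
]
prediction = forward([1.0, 2.0], network)
print(prediction)
```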

 

Cost / Loss function

When we build a network, the network tries to predict the output as close as possible to the actual value. We measure this accuracy of the network using the loss function. The loss function tries to penalize the network when it makes errors. Our objective while running the network is to increase our prediction accuracy and to reduce the error, hence minimizing the loss function. The most optimized output is the one with the least value of the loss function. If we define the loss function to be the mean squared error, it can be written as –

 

$C = \frac{1}{m} \sum (y - a)^2$, where m is the number of training inputs, a is the predicted value and y is the actual value of that example.
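The mean squared error translates directly into code. The function name mse and the sample values are chosen for illustration only.

```python
from typing import Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error: the average of (y - a)**2 over all examples."""
    m = len(actual)
    return sum((y - a) ** 2 for y, a in zip(actual, predicted)) / m


# Two made-up examples: errors of 1 and -2 give (1 + 4) / 2.
cost = mse([3.0, 5.0], [2.0, 7.0])
print(cost)
```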

 

The learning process revolves around minimizing the cost.

 

Gradient Descent

Gradient descent is an optimization algorithm for minimizing the cost. To think of it intuitively, while climbing down a hill you should take small steps and walk down instead of jumping down all at once. So, if we start from a point x, we move down a little, i.e. by delta h, update our position to x - delta h, and keep doing the same until we reach the bottom. Consider the bottom to be the minimum cost point.

 

Mathematically, to find the local minimum of a function one takes steps proportional to the negative of the gradient of the function.
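A minimal sketch of this rule on a one-variable function. The example function f(x) = (x - 3)^2, the starting point and the step size are assumptions chosen so the minimum (x = 3) is known in advance.

```python
from typing import Callable


def gradient_descent(grad: Callable[[float], float], start: float,
                     learning_rate: float = 0.1, steps: int = 100) -> float:
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        # Step size is proportional to the negative of the gradient.
        x = x - learning_rate * grad(x)
    return x


# f(x) = (x - 3)**2 has gradient 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2.0 * (x - 3.0), start=0.0)
print(minimum)
```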


 

Learning Rate

The rate at which we descend towards the minimum of the cost function is the learning rate. We should choose the learning rate carefully: it should be neither so large that the optimal solution is missed nor so small that the network takes forever to converge.
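The effect of the learning rate can be seen on the simple cost f(x) = x^2, whose minimum is at 0. The three rates below are illustrative picks: one too small, one reasonable and one too large.

```python
def descend(learning_rate: float, start: float = 10.0,
            steps: int = 20) -> float:
    """Run gradient descent on f(x) = x**2 and return the final x."""
    x = start
    for _ in range(steps):
        x -= learning_rate * 2.0 * x  # the gradient of x**2 is 2x
    return x


for rate in (0.01, 0.4, 1.1):
    print(f"learning rate {rate}: final x = {descend(rate):.4g}")
```

With the smallest rate the result is still far from 0 after 20 steps, the middle rate gets very close to 0, and the largest rate overshoots on every step so the value moves further away from the minimum.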

 

Backpropagation

When we define a neural network, we assign random weights and bias values to our nodes. Once we have received the output for a single iteration, we can calculate the error of the network. This error is then fed back to the network along with the gradient of the cost function, and the weights are updated so that the error in subsequent iterations is reduced. This updating of weights using the gradient of the cost function is known as backpropagation.

 

Steps in training a Neural Network
  1. Initialize weights and biases.
  2. Forward propagation: Using the input X, weights W and biases b, for every layer we compute Z and A, the linear and non-linear activations. At the final layer, we compute f(A^(L-1)), which could be a sigmoid, softmax or linear function of A^(L-1), and this gives the prediction y_hat.
  3. Compute the loss function: This is a function of the actual label y and the predicted label y_hat. It captures how far off our predictions are from the actual target. Our objective is to minimize this loss function.
  4. Backward propagation: In this step, we calculate the gradients of the loss function f(y, y_hat) with respect to A, W and b, called dA, dW and db. Using these gradients, we update the values of the parameters from the last layer to the first.
  5. Repeat steps 2–4 for n iterations/epochs until we feel we have minimized the loss function, without overfitting the training data.
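The training steps can be sketched for the smallest possible network: a single neuron with a linear activation, trained with mean squared error. This is a deliberate simplification of the multi-layer case; the function name train, the data and the hyperparameters are assumptions for illustration.

```python
import random
from typing import List, Tuple


def train(xs: List[float], ys: List[float], learning_rate: float = 0.05,
          epochs: int = 2000, seed: int = 0) -> Tuple[float, float, float]:
    """Train y_hat = w*x + b and return (w, b, last computed loss)."""
    rng = random.Random(seed)
    # Initialize the weight and bias randomly.
    w = rng.uniform(-1.0, 1.0)
    b = rng.uniform(-1.0, 1.0)
    m = len(xs)
    loss = 0.0
    for _ in range(epochs):
        # Forward propagation: compute the predictions.
        preds = [w * x + b for x in xs]
        # Loss: mean squared error between predictions and labels.
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / m
        # Backward propagation: gradients of the loss w.r.t. w and b.
        dw = sum(2.0 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / m
        db = sum(2.0 * (p - y) for p, y in zip(preds, ys)) / m
        # Parameter update, then repeat for the next epoch.
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b, loss


# Made-up data generated from y = 2x + 1.
w, b, loss = train([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(f"w = {w:.4f}, b = {b:.4f}, loss = {loss:.2e}")
```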

 

Machine Learning using Python

Simple Machine Learning models like Linear Regression can be trained using the Python library scikit-learn. Neural networks are built and trained using libraries such as Keras, TensorFlow or PyTorch.

 

In the simple example below, we are building a linear regression model to predict ice cream sales based on temperature. 80% of the available data is used for training, and we are using the remaining 20% of the data for testing our model.
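A minimal sketch of this workflow follows. It is written in plain Python with a closed-form least-squares fit rather than scikit-learn, so that it runs without extra packages. The temperature and sales figures are synthetic, generated from a made-up line plus random noise, and the names fit_line, train_set and test_set are chosen for this example.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]  # (temperature, sales)


def fit_line(points: List[Point]) -> Tuple[float, float]:
    """Least-squares slope and intercept for sales = w * temperature + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    w = (sum((x - mean_x) * (y - mean_y) for x, y in points)
         / sum((x - mean_x) ** 2 for x, _ in points))
    return w, mean_y - w * mean_x


rng = random.Random(42)
# Synthetic data: sales = 12 * temperature - 100, plus noise in [-5, 5].
data: List[Point] = [(float(t), 12.0 * t - 100.0 + rng.uniform(-5.0, 5.0))
                     for t in range(15, 35)]

# Shuffle, then keep 80% for training and 20% for testing.
rng.shuffle(data)
split = int(0.8 * len(data))
train_set, test_set = data[:split], data[split:]

w, b = fit_line(train_set)

# Evaluate on the held-out data with the mean absolute error.
mae = sum(abs((w * x + b) - y) for x, y in test_set) / len(test_set)
print(f"w = {w:.2f}, b = {b:.2f}, test MAE = {mae:.2f}")
```

Evaluating on data the model never saw during fitting is what makes the test error a fair estimate of how the model would do on new temperatures.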

 
