What is Jenkins?

Jenkins is an open source automation server written in Java, with plugins built for Continuous Integration. Jenkins is used to build and test your software projects continuously, making it easier for developers to integrate changes into the project and for users to obtain a fresh build. It also allows you to continuously deliver your software by integrating with a large number of testing and deployment technologies.

With Jenkins, organizations can accelerate the software development process through automation. Jenkins integrates development life-cycle processes of all kinds, including build, document, test, package, stage, deploy, static analysis and much more.

Jenkins achieves Continuous Integration with the help of plugins. Plugins allow the integration of various DevOps stages. If you want to integrate a particular tool, you need to install the plugin for that tool, for example Git, Maven 2 Project, Amazon EC2 or HTML Publisher.

Advantages of Jenkins include:

  • It is an open source tool with great community support.
  • It is easy to install.
  • It has 1000+ plugins to ease your work. If a plugin does not exist, you can write it and share it with the community.
  • It is free of cost.
  • It is built with Java and hence is portable to all the major platforms.
What is Continuous Integration?

Continuous Integration is a development practice in which developers commit changes to the source code in a shared repository several times a day or more frequently. Every commit made to the repository is then built. This allows teams to detect problems early. Apart from this, depending on the Continuous Integration tool, there are several other functions, such as deploying the built application to a test server and providing the concerned teams with the build and test results.

Continuous Integration with Jenkins
  • First, a developer commits the code to the source code repository. Meanwhile, the Jenkins server checks the repository at regular intervals for changes (this polling can be configured with a trigger, as sketched after this list).
  • Soon after a commit occurs, the Jenkins server detects the changes in the source code repository. Jenkins pulls those changes and starts preparing a new build.
  • If the build fails, the concerned team is notified.
  • If the build is successful, Jenkins deploys the build to the test server.
  • After testing, Jenkins generates feedback and notifies the developers about the build and test results.
  • Jenkins continues to check the source code repository for changes, and the whole process keeps repeating.
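
A minimal sketch of how such periodic polling can be configured in a declarative pipeline (the five-minute interval, repository checkout and build command are assumptions for illustration):

pipeline {
    agent any
    triggers {
        // Ask Jenkins to poll the source code repository roughly every five minutes
        pollSCM('H/5 * * * *')
    }
    stages {
        stage('Build') {
            steps {
                // Check out the code this job is configured for and build it
                checkout scm
                sh 'make build'   // hypothetical build command
            }
        }
    }
}
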
Jenkins Distributed Architecture

Jenkins uses a Master-Slave architecture to manage distributed builds. In this architecture, the Master and the Slaves communicate over the TCP/IP protocol.

Jenkins Master

Your main Jenkins server is the Master. The Master’s job is to handle:

  • Scheduling build jobs.
  • Dispatching builds to the slaves for the actual execution.
  • Monitoring the slaves (possibly taking them online and offline as required).
  • Recording and presenting the build results.
  • A Master instance of Jenkins can also execute build jobs directly.
Jenkins Slave

A Slave is a Java executable that runs on a remote machine. Following are the characteristics of Jenkins Slaves:

  • It listens for requests from the Jenkins Master instance.
  • Slaves can run on a variety of operating systems.
  • The job of a Slave is to do as it is told, which involves executing build jobs dispatched by the Master.
  • You can configure a project to always run on a particular Slave machine or a particular type of Slave machine, or simply let Jenkins pick the next available Slave (see the sketch below).
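
For example, a declarative pipeline can be pinned to a particular type of Slave through a label; this is a minimal sketch, and the 'linux' label is an assumption for illustration:

pipeline {
    // Run only on Slaves (agents) that carry the 'linux' label
    agent { label 'linux' }
    stages {
        stage('Build') {
            steps {
                echo 'Building on a labelled Slave'
            }
        }
    }
}
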
What is a Jenkins pipeline?

A pipeline is a collection of jobs that brings the software from version control into the hands of the end users by using automation tools. It is a feature used to incorporate continuous delivery in our software development workflow.

Over the years, there have been multiple Jenkins pipeline releases, including Jenkins Build Flow, the Jenkins Build Pipeline plugin, Jenkins Workflow and others. What are the key features of these plugins?

  • They represent multiple Jenkins jobs as one whole workflow in the form of a pipeline.
  • What do these pipelines do? These pipelines are a collection of Jenkins jobs which trigger each other in a specified sequence.

Let's look at an example. Suppose I'm developing a small application on Jenkins and I want to build, test and deploy it. To do this, I will allot three jobs, one to each process: job1 for the build, job2 for the tests and job3 for the deployment. I can use the Jenkins Build Pipeline plugin to perform this task. After creating the three jobs and chaining them in a sequence, the plugin will run these jobs as a pipeline.

This approach is effective for deploying small applications. But what happens when there are complex pipelines with several processes (build, test, unit test, integration test, pre-deploy, deploy, monitor) running hundreds of jobs?

The maintenance cost for such a complex pipeline is huge and increases with the number of processes. It also becomes tedious to build and manage such a vast number of jobs. To overcome this issue, a new feature called Jenkins Pipeline Project was introduced.

The key feature of this pipeline is that the entire deployment flow is defined through code. What does this mean? It means that all the standard jobs defined by Jenkins are written as one whole script, and that script can be stored in a version control system. It basically follows the 'pipeline as code' discipline. Instead of building several jobs for each phase, you can now code the entire workflow and put it in a Jenkinsfile. Below is a list of reasons why you should use the Jenkins Pipeline.

Jenkins Pipeline Advantages
  • It models simple to complex pipelines as code by using Groovy DSL (Domain Specific Language)
  • The code is stored in a text file called the Jenkinsfile which can be checked into a SCM (Source Code Management)
  • Improves user interface by incorporating user input within the pipeline
  • It is durable in terms of unplanned restart of the Jenkins master
  • It can restart from saved checkpoints
  • It supports complex pipelines by incorporating conditional loops and fork/join operations, and by allowing tasks to be performed in parallel (see the sketch after this list)
  • It can integrate with several other plugins
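
As an illustration of the parallel support mentioned above, a declarative pipeline can fan out stages like this (a minimal sketch; the stage contents are placeholders):

pipeline {
    agent any
    stages {
        stage('Tests') {
            // Run both test stages at the same time on available executors
            parallel {
                stage('Unit tests') {
                    steps { echo 'Running unit tests' }
                }
                stage('Integration tests') {
                    steps { echo 'Running integration tests' }
                }
            }
        }
    }
}
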
What is a Jenkinsfile?

A Jenkinsfile is a text file that stores the entire workflow as code, and it can be checked into an SCM on your local system. How is this advantageous? It enables the developers to access, edit and check the code at all times.

The Jenkinsfile is written using the Groovy DSL, and it can be created through a text/Groovy editor or through the configuration page of the Jenkins instance. It is written using one of two syntaxes, namely:

  • Declarative pipeline syntax
  • Scripted pipeline syntax

Declarative pipeline is a relatively new feature that supports the pipeline as code concept. It makes the pipeline code easier to read and write. This code is written in a Jenkinsfile which can be checked into a source control management system such as Git.

The scripted pipeline, on the other hand, is the traditional way of writing the code; in this pipeline, the Jenkinsfile is typically written on the Jenkins UI instance. Though both pipelines are based on the Groovy DSL, the scripted pipeline uses a stricter Groovy-based syntax because it was the first pipeline to be built on the Groovy foundation. Since this Groovy scripting was not desirable to all users, the declarative pipeline was introduced to offer a simpler and more opinionated syntax.

The declarative pipeline is defined within a block labelled 'pipeline', whereas the scripted pipeline is defined within a 'node' block.
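
For comparison, a minimal scripted pipeline wrapped in a node block might look like the sketch below (a hello-world example, not the Jenkinsfile that follows):

node {
    stage('Build') {
        echo 'Building...'
    }
    stage('Test') {
        echo 'Testing...'
    }
}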

An example Jenkinsfile looks like this:

pipeline {
    environment {
        BUILD_SCRIPTS_GIT = "http://10.100.100.10:7990/scm/~myname/mypipeline.git"
        BUILD_SCRIPTS = 'mypipeline'
        BUILD_HOME = '/var/lib/jenkins/workspace'
    }
    agent any
    stages {
        stage('Checkout: Code') {
            steps {
                sh "mkdir -p $WORKSPACE/repo;\
                    git config --global user.email 'email@address.com';\
                    git config --global user.name 'myname';\
                    git config --global push.default simple;\
                    git clone $BUILD_SCRIPTS_GIT repo/$BUILD_SCRIPTS"
                sh "chmod -R +x $WORKSPACE/repo/$BUILD_SCRIPTS"
            }
        }
        stage('Yum: Updates') {
            steps {
                sh "sudo chmod +x $WORKSPACE/repo/$BUILD_SCRIPTS/scripts/update.sh"
                sh "sudo $WORKSPACE/repo/$BUILD_SCRIPTS/scripts/update.sh"
            }
        }
    }
    post {
        always {
            cleanWs()
        }
    }
}

The above Jenkinsfile does the following:

  • sets up environment variables
  • pulls the code down from a Git repository
  • sets it up in the Jenkins workspace
  • runs a script under scripts/
  • once complete, cleans up the workspace (whether the build succeeded or not)
Pipeline concepts
  • Pipeline

This is a user defined block which contains all the processes such as build, test, deploy, etc. It is a collection of all the stages in a Jenkinsfile. All the stages and steps are defined within this block. It is the key block for a declarative pipeline syntax.

  • Node

A node is a machine that executes an entire workflow. It is a key part of the scripted pipeline syntax.

There are various mandatory sections which are common to both the declarative and scripted pipelines, such as stages, agent and steps that must be defined within the pipeline. These are explained below:

  • Agent

The agent directive specifies where the pipeline, or a specific stage, will run, which allows multiple builds to be executed with only one instance of Jenkins. This feature helps distribute the workload to different agents and execute several projects within a single Jenkins instance. It instructs Jenkins to allocate an executor and workspace for the build.

A single agent can be specified for the entire pipeline, or specific agents can be allotted to execute each stage within the pipeline. A few of the parameters used with agent are:

  • Any

Runs the pipeline/stage on any available agent.

  • None

This parameter is applied at the root of the pipeline and it indicates that there is no global agent for the entire pipeline and each stage must specify its own agent.

  • Label

Executes the pipeline/stage on the labelled agent.

  • Docker

This parameter uses a Docker container as the execution environment for the pipeline or for a specific stage. In the example below, I'm using Docker to pull an ubuntu image. This image can then be used as an execution environment to run multiple commands.
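
A sketch of what that example might look like (the ubuntu tag and the commands are assumptions):

pipeline {
    // Pull the ubuntu image and run the pipeline steps inside a container based on it
    agent { docker { image 'ubuntu:latest' } }
    stages {
        stage('Run commands') {
            steps {
                sh 'cat /etc/os-release'
                sh 'echo "Running inside the ubuntu container"'
            }
        }
    }
}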

  • Stages

This block contains all the work that needs to be carried out. The work is specified in the form of stages. There can be more than one stage within this directive. Each stage performs a specific task. In the following example, I’ve created multiple stages, each performing a specific task.
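
A sketch of such a multi-stage pipeline (the stage names and echo messages are placeholders):

pipeline {
    agent any
    stages {
        stage('Build') {
            steps { echo 'Compiling the application' }
        }
        stage('Test') {
            steps { echo 'Running the test suite' }
        }
        stage('Deploy') {
            steps { echo 'Deploying to the test server' }
        }
    }
}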

  • Steps

A series of steps can be defined within a stage block. These steps are carried out in sequence to execute a stage. There must be at least one step within a steps directive. In the following example I’ve implemented an echo command within the build stage. This command is executed as a part of the ‘Build’ stage.
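
A sketch of that Build stage with a single echo step:

pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                // A steps block must contain at least one step; echo is the simplest
                echo 'Build stage is running'
            }
        }
    }
}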

RabbitMQ

What is RabbitMQ?

RabbitMQ is an open source message broker software. It accepts messages from producers and delivers them to consumers. It acts as a middleman that can be used to reduce the load and delivery times of web application servers.

Features of RabbitMQ:

  • Robust messaging for building applications in a distributed manner.
  • Easy to use
  • Runs on all major Operating Systems.
  • Supports a huge number of developer platforms
  • Supports multiple messaging protocols, message queuing, delivery acknowledgement, flexible routing to queues and multiple exchange types.
  • Open source and commercially supported
How Does RabbitMQ Work?

The producer sends messages to an exchange. An exchange is responsible for routing the messages to different queues. It accepts messages from the producer application and routes them to message queues with the help of bindings and routing keys. A binding is a link between a queue and an exchange. Consumers then receive messages from the queue.
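
A minimal sketch of these pieces in Python with the pika library (the exchange, queue and routing key names are assumptions for illustration):

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

# Declare a direct exchange and a queue, then bind them with a routing key
channel.exchange_declare(exchange='orders', exchange_type='direct')
channel.queue_declare(queue='order_queue')
channel.queue_bind(exchange='orders', queue='order_queue', routing_key='new_order')

# A message published with the matching routing key is routed to order_queue
channel.basic_publish(exchange='orders', routing_key='new_order', body='order #1')
connection.close()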

Prerequisites:
  • RabbitMQ
  • Python

How to Send and Receive a Message using RabbitMQ?

Send a Message using RabbitMQ:

The following program, send.py, will send a single message to the queue.

Step 1: Establish a connection with the RabbitMQ server:

import pika

connection = pika.BlockingConnection(
    pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

Step 2: Create a hello queue to which the message will be delivered:

channel.queue_declare(queue='hello')

Step 3: Publish the message. The exchange and routing_key parameters tell RabbitMQ which exchange to use and which queue the message should go to; an empty exchange name selects the default exchange, which routes the message to the queue named by routing_key:

channel.basic_publish(exchange='', routing_key='hello', body='Hello RabbitMQ!')
print(" [x] Sent 'Hello RabbitMQ!'")

Step 4: Close the connection to make sure the network buffers are flushed and the message is actually delivered to RabbitMQ:

connection.close()
 
Receive a Message using RabbitMQ:

The following program, receive.py, receives messages from the queue and prints them to the screen.

Step 1: Receiving works by subscribing a callback function to a queue. Whenever a message arrives, the Pika library calls this callback function. The following function will print the contents of the message to the screen:

def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)

Step 2: Next, we need to tell RabbitMQ that this particular callback function should receive messages from our hello queue:

channel.basic_consume(
    queue='hello', on_message_callback=callback, auto_ack=True)

Step 3: And finally, enter a never-ending loop that waits for data and runs the callback whenever necessary:

print(' [*] Waiting for messages. To exit press CTRL+C')
channel.start_consuming()

Step 4: Open a terminal and run send.py. The producer program will exit after every run:

python send.py
[x] Sent 'Hello RabbitMQ!'

We can then go to the web browser, hit the URL http://localhost:15672/, and see the count of messages sent in the RabbitMQ management dashboard.

Step 5: Open another terminal and run the receive.py program:

python receive.py
[*] Waiting for messages. To exit press CTRL+C
[x] Received 'Hello RabbitMQ!'

If the Ready and Total message counts in the dashboard drop to zero, it confirms that the messages have been received by the consumer.

Note: You can keep sending messages through RabbitMQ. As you will notice, the receive.py program doesn't exit; it stays ready to receive further messages and can be interrupted with Ctrl-C.
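
For reference, the complete receive.py assembled from the steps above might look like this (a sketch assuming the same localhost connection setup as send.py):

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

# Declaring the queue again is safe; it simply ensures the queue exists
channel.queue_declare(queue='hello')

def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)

channel.basic_consume(queue='hello', on_message_callback=callback, auto_ack=True)

print(' [*] Waiting for messages. To exit press CTRL+C')
channel.start_consuming()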

Messaging system

A messaging system is responsible for transferring data from one application to another, so the applications can focus on the data without worrying about how to share it. This is especially important in Big Data, where an enormous volume of data has to be moved between systems.

Two types of messaging patterns are available:

Point-to-Point Messaging System

In a point-to-point system, messages are persisted in a queue. One or more consumers can consume the messages in the queue, but a particular message can be consumed by a maximum of one consumer only.

Publish-Subscribe Messaging System

In the publish-subscribe system, messages are persisted in a topic. Unlike the point-to-point system, consumers can subscribe to one or more topics and consume all the messages in those topics. In the publish-subscribe system, message producers are called publishers and message consumers are called subscribers.

Kafka

Apache Kafka is a distributed publish-subscribe messaging system and a robust queue that can handle a high volume of data and enables us to pass messages from one end-point to another.

The main Kafka terminologies are described below.

Topics

A stream of messages belonging to a category is called a topic. Data is stored in topics. Kafka topics are analogous to radio/TV channels: multiple consumers can subscribe to the same topic and consume the messages.

Topics are split into partitions. For each topic, Kafka keeps a minimum of one partition. Each such partition contains messages in an immutable ordered sequence.

Partition offset

Each message within a partition has a unique sequence id called the offset. For each topic, the Kafka cluster maintains such a partitioned log.

Each partition is an ordered, immutable sequence of records that is continually appended to—a structured commit log. The records in the partitions are each assigned a sequential id number called the offset that uniquely identifies each record within the partition.

The Kafka cluster durably persists all published records—whether or not they have been consumed—using a configurable retention period. For example, if the retention policy is set to two days, then for the two days after a record is published, it is available for consumption, after which it will be discarded to free up space.
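
For instance, a two-day retention period could be configured in the broker's server.properties like this (a sketch; log.retention.hours is only one of several retention-related settings):

# Keep records for two days (48 hours) before they become eligible for deletion
log.retention.hours=48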

Replicas of partition

Replicas are nothing but backups of a partition. Data is never read from or written to replicas directly; they are used only to prevent data loss.

Brokers

Brokers are simple systems responsible for maintaining the published data. Each broker may have zero or more partitions per topic.

Kafka Cluster

A Kafka deployment that has more than one broker is called a Kafka cluster.

Kafka Cluster Architecture

Zookeeper

ZooKeeper is used for managing and coordinating the Kafka brokers. The ZooKeeper service is mainly used to notify producers and consumers about the presence of a new broker in the Kafka system, or about the failure of a broker.

Consumer Group

Consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group.
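
With kafka-python, a consumer joins a group by passing a group_id; running several copies of this script spreads the topic's partitions across them (a sketch, and the group name is an assumption):

from kafka import KafkaConsumer

# All consumers started with the same group_id share the work of reading the 'test' topic
consumer = KafkaConsumer('test',
                         bootstrap_servers=['localhost:9092'],
                         group_id='demo-group',
                         auto_offset_reset='earliest')

for msg in consumer:
    print(msg.value)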

Kafka features
  1. High throughput: provides support for hundreds of thousands of messages per second with modest hardware.
  2. Scalability: a highly scalable distributed system with no downtime.
  3. Data loss: Kafka ensures no data loss once configured properly.
  4. Stream processing: Kafka can be used along with real-time streaming applications like Spark and Storm.
  5. Durability: provides support for persisting messages on disk.
  6. Replication: messages can be replicated across clusters, which supports multiple subscribers.
Installing and Getting started

Prerequisite: Install Java

  1. Download kafka .tgz file from https://kafka.apache.org/downloads
  2. Untar the file and go into the Kafka directory:

> tar -xzf kafka_2.11-2.1.0.tgz
> cd kafka_2.11-2.1.0

  3. Start the ZooKeeper server using the properties in zookeeper.properties:

> bin/zookeeper-server-start.sh config/zookeeper.properties

  4. Start the Kafka broker using the properties in server.properties:

> bin/kafka-server-start.sh config/server.properties
            
Create a topic

Let's create a topic named "test" with a single partition and only one replica. In the command below, 2181 is the port number we specified in zookeeper.properties:

> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

To list the current topics, we can query ZooKeeper using the command below:

> bin/kafka-topics.sh --list --zookeeper localhost:2181
test

Command line producer

Kafka comes with a command line client that will take input from a file or from standard input and send it out as messages to the Kafka cluster. By default, each line will be sent as a separate message. Run the producer and then type a few messages into the console to send to the server.

(In the command below, 9092 is the port number we configured for the broker in server.properties.)

> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
This is a message
This is another message

Command line consumer

Kafka also has a command line consumer that will dump out messages to standard output.

                
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
This is a message
This is another message

Kafka producer and consumer using python

The Kafka producer and consumer can be coded in many languages, such as Java and Python. In this section, we will see how to send and receive messages from a Kafka topic using Python.

  1. First we have to install the kafka-python package using the Python package manager:

pip install kafka-python

  2. The Python program below reads records from an input file and sends them as messages to the test topic which we created in the previous section.

    Producer.py

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers=['localhost:9092'])

# Read the input records; the file path is from the original example
with open('/example/data/inputfile.txt') as f:
    lines = f.readlines()

for line in lines:
    # kafka-python expects message values as bytes, so encode each line
    producer.send('test', line.encode('utf-8'))

# Block until all buffered messages have actually been delivered to the broker
producer.flush()

The Python program below consumes the messages from the Kafka topic and prints them on the screen.

Consumer.py

                
from kafka import KafkaConsumer

consumer = KafkaConsumer(bootstrap_servers=['localhost:9092'], auto_offset_reset='earliest')
consumer.subscribe(['test'])

for msg in consumer:
    print(msg)

In the above program, we have

auto_offset_reset='earliest'

This will cause the consumer program to read all the messages from the beginning if the same program is run again and again.

Instead, if we have

auto_offset_reset='latest'

the consumer program will start from the end of the log and read only the messages that are produced after it starts.
