Kubernetes Deployment Controller – An Inside Look

Container and container orchestration have become the default system for any DevOps team that wants to scale on-demand, reduce costs, and deliver faster. And to get the best out of container technology, Kubernetes is the way to go. A recommended Kubernetes practice is to manage pods through a Deployment; this way, they can be monitored and restarted if a failure occurs.

A deployment is created by using a Kubernetes Deployment Controller object. The application (in a container) is deployed to Kubernetes by declaratively passing a desired state to the Kubernetes Deployment Controller. A K8s deployment controller object is utilized for monitoring, management of upgrade, downgrade, and scaling of services (e.g., pods) without any disruption or downtime. This is made possible because the deployment controller is the single source of truth for the sizes of new and old replica sets. It maintains multiple replica sets, and when you describe a desired state, the DC changes the actual state at the correct pace.

Here’s a detailed look at the inner workings of Kubernetes Deployment Controller

K8s deployment controller is responsible for the following functions

– Managing a set of pods in the form of Replica Sets & Hash-based labels
– Rolling out new versions of application through new Replica Sets
– Rolling back to old versions of application through old Replica Sets
– Pause & Resume Rollout/Rollback functions
– Scale-Up/Down functions

“The Kubernetes controller manager is a daemon that embeds the core control loops shipped with Kubernetes. In applications of robotics and automation, a control loop is a non-terminating loop that regulates the state of the system. In Kubernetes, a controller is a control loop that watches the shared state of the cluster through the apiserver and makes changes attempting to move the current state towards the desired state. Examples of controllers that ship with Kubernetes today are the replication controller, endpoints controller, namespace controller, and serviceaccounts controller.”

func NewControllerInitializers(loopMode ControllerLoopMode) map[string]InitFunc {
        controllers := map[string]InitFunc{}
        controllers["endpoint"] = startEndpointController
        controllers["endpointslice"] = startEndpointSliceController
        controllers["replicationcontroller"] = startReplicationController
        controllers["podgc"] = startPodGCController
        controllers["resourcequota"] = startResourceQuotaController
        controllers["namespace"] = startNamespaceController
        controllers["serviceaccount"] = startServiceAccountController
        controllers["garbagecollector"] = startGarbageCollectorController
        controllers["daemonset"] = startDaemonSetController
        controllers["job"] = startJobController
        controllers["deployment"] = startDeploymentController
        controllers["replicaset"] = startReplicaSetController
        controllers["horizontalpodautoscaling"] = startHPAController
        controllers["disruption"] = startDisruptionController
        controllers["statefulset"] = startStatefulSetController
        controllers["cronjob"] = startCronJobController
        controllers["csrsigning"] = startCSRSigningController
        controllers["csrapproving"] = startCSRApprovingController
        controllers["csrcleaner"] = startCSRCleanerController
        controllers["ttl"] = startTTLController
        controllers["bootstrapsigner"] = startBootstrapSignerController
        controllers["tokencleaner"] = startTokenCleanerController
        controllers["nodeipam"] = startNodeIpamController
        controllers["nodelifecycle"] = startNodeLifecycleController
 	if loopMode == IncludeCloudLoops {
                controllers["service"] = startServiceController
                controllers["route"] = startRouteController
                controllers["cloud-node-lifecycle"] = startCloudNodeLifecycleController
                // TODO: volume controller into the IncludeCloudLoops only set.
        }
        controllers["persistentvolume-binder"] = startPersistentVolumeBinderController
        controllers["attachdetach"] = startAttachDetachController
        controllers["persistentvolume-expander"] = startVolumeExpandController
        controllers["clusterrole-aggregation"] = startClusterRoleAggregrationController
        controllers["pvc-protection"] = startPVCProtectionController
        controllers["pv-protection"] = startPVProtectionController
        controllers["ttl-after-finished"] = startTTLAfterFinishedController
        controllers["root-ca-cert-publisher"] = startRootCACertPublisher

        return controllers
}

Let’s look at inside workings of “Deployment” Controller. It watches for following object updates.

func startDeploymentController(ctx ControllerContext) (http.Handler, bool, error) {
        if !ctx.AvailableResources[schema.GroupVersionResource{Group: "apps", Version: "v1", Resource: "deployments"}] {
                return nil, false, nil
        }
        dc, err := deployment.NewDeploymentController(
                ctx.InformerFactory.Apps().V1().Deployments(),
                ctx.InformerFactory.Apps().V1().ReplicaSets(),
                ctx.InformerFactory.Core().V1().Pods(),
                ctx.ClientBuilder.ClientOrDie("deployment-controller"),
        )
        if err != nil {
                return nil, true, fmt.Errorf("error creating Deployment controller: %v", err)
        }
        go dc.Run(int(ctx.ComponentConfig.DeploymentController.ConcurrentDeploymentSyncs), ctx.Stop)
        return nil, true, nil
}

The “Deployment Controller” initializes the following Event handlers.

dInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
                AddFunc:    dc.addDeployment,
                UpdateFunc: dc.updateDeployment,
                // This will enter the sync loop and no-op, because the deployment has been deleted from the store.
                DeleteFunc: dc.deleteDeployment,
        })
        rsInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
                AddFunc:    dc.addReplicaSet,
                UpdateFunc: dc.updateReplicaSet,
                DeleteFunc: dc.deleteReplicaSet,
        })
        podInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
                DeleteFunc: dc.deletePod,
        })

Since Kubernetes uses asynchronous programming, the events are processed through work queues and workers.

func (dc *DeploymentController) addDeployment(obj interface{}) {
        d := obj.(*apps.Deployment)
        klog.V(4).Infof("Adding deployment %s", d.Name)
        dc.enqueueDeployment(d)
}

The items from the queue are handled by “syncDeployment” handler. Some of the functions done by the handler are shown below.

// List ReplicaSets owned by this Deployment, while reconciling ControllerRef
        // through adoption/orphaning.
        rsList, err := dc.getReplicaSetsForDeployment(d)
	
	// List all Pods owned by this Deployment, grouped by their ReplicaSet.
        // Current uses of the podMap are:
        //
        // * check if a Pod is labeled correctly with the pod-template-hash label.
        // * check that no old Pods are running in the middle of Recreate Deployments.
        podMap, err := dc.getPodMapForDeployment(d, rsList)

	// Update deployment conditions with an Unknown condition when pausing/resuming
        // a deployment. In this way, we can be sure that we won't timeout when a user
        // resumes a Deployment with a set progressDeadlineSeconds.
        if err = dc.checkPausedConditions(d); err != nil {
                return err
        }

	// rollback is not re-entrant in case the underlying replica sets are updated with a new
        // revision so we should ensure that we won't proceed to update replica sets until we
        // make sure that the deployment has cleaned up its rollback spec in subsequent enqueues.
        if getRollbackTo(d) != nil {
                return dc.rollback(d, rsList)
        }

        scalingEvent, err := dc.isScalingEvent(d, rsList)
        if err != nil {
                return err
        }
        if scalingEvent {
                return dc.sync(d, rsList)
        }

        switch d.Spec.Strategy.Type {
        case apps.RecreateDeploymentStrategyType:
                return dc.rolloutRecreate(d, rsList, podMap)
        case apps.RollingUpdateDeploymentStrategyType:
                return dc.rolloutRolling(d, rsList)
        }

Sync is responsible for reconciling deployments on scaling events or when they are paused.

func (dc *DeploymentController) sync(d *apps.Deployment, rsList []*apps.ReplicaSet) error {
        newRS, oldRSs, err := dc.getAllReplicaSetsAndSyncRevision(d, rsList, false)
        if err != nil {
                return err
        }
        if err := dc.scale(d, newRS, oldRSs); err != nil {
                // If we get an error while trying to scale, the deployment will be requeued
                // so we can abort this resync
                return err
        }

        // Clean up the deployment when it's paused and no rollback is in flight.
        if d.Spec.Paused && getRollbackTo(d) == nil {
                if err := dc.cleanupDeployment(oldRSs, d); err != nil {
                        return err
                }
        }

        allRSs := append(oldRSs, newRS)
        return dc.syncDeploymentStatus(allRSs, newRS, d)
}


// scale scales proportionally in order to mitigate risk. Otherwise, scaling up can increase the size
// of the new replica set and scaling down can decrease the sizes of the old ones, both of which would
// have the effect of hastening the rollout progress, which could produce a higher proportion of unavailable
// replicas in the event of a problem with the rolled out a template. Should run only on scaling events or
// when a deployment is paused and not during the normal rollout process.

func (dc *DeploymentController) scale(deployment *apps.Deployment, newRS 
*apps.ReplicaSet, oldRSs []*apps.ReplicaSet) error {

 	// If there is only one active replica set then we should scale that up to the full count of the
        // deployment. If there is no active replica set, then we should scale up the newest replica set.
        if activeOrLatest := deploymentutil.FindActiveOrLatest(newRS, oldRSs); activeOrLatest != nil {


	// If the new replica set is saturated, old replica sets should be fully scaled down.
        // This case handles replica set adoption during a saturated new replica set.
        if deploymentutil.IsSaturated(deployment, newRS) {

 // There are old replica sets with pods, and the new replica set is not saturated. 
        // We need to proportionally scale all replica sets (new and old) in case of a
        // rolling deployment.
        if deploymentutil.IsRollingUpdate(deployment) {

		// Number of additional replicas that can be either added or removed from the total
                // replicas count. These replicas should be distributed proportionally to the active
                // replica sets.
                deploymentReplicasToAdd := allowedSize - allRSsReplicas

                // The additional replicas should be distributed proportionally amongst the active
                // replica sets from the larger to the smaller in size replica set. Scaling direction
                // drives what happens in case we are trying to scale replica sets of the same size.
                // In such a case when scaling up, we should scale up newer replica sets first, and
                // when scaling down, we should scale down older replica sets first.

We hope this article helped you understand the inner workings of Kubernetes deployment controller. If you would like to learn more about Kubernetes and get certified, join our 2-day Kubernetes workshop.