kubernetes ceph-csi分析目录导航

概述

kube-controller-manager组件中,有两个controller与存储相关,分别是PV controller与AD controller。

基于tag v1.17.4

https://github.com/kubernetes/kubernetes/releases/tag/v1.17.4

AD Cotroller分析

AD Cotroller作用

AD Cotroller全称Attachment/Detachment 控制器,主要负责创建、删除VolumeAttachment对象,并调用volume plugin来做存储设备的Attach/Detach操作(将数据卷挂载到特定node节点上/从特定node节点上解除挂载),以及更新node.Status.VolumesAttached等。

不同的volume plugin的Attach/Detach操作逻辑有所不同,如通过ceph-csi(out-tree volume plugin)来使用ceph存储,则的Attach/Detach操作只是修改VolumeAttachment对象的状态,而不会真正的将数据卷挂载到节点/从节点上解除挂载。

kubelet启动参数--enable-controller-attach-detach

AD Cotroller与kubelet中的volume manager逻辑相似,都可以做Attach/Detach操作,但是kube-controller-manager与kubelet中,只会有一个组件做Attach/Detach操作,通过kubelet启动参数--enable-controller-attach-detach设置。设置为 true 表示启用kube-controller-manager的AD controller来做Attach/Detach操作,同时禁用 kubelet 执行 Attach/Detach 操作(默认值为 true)。

注意

当k8s通过ceph-csi来使用ceph存储,volume plugin为ceph-csi,AD controller的Attach/Detach操作,只是创建/删除VolumeAttachment对象,而不会真正的将数据卷挂载到节点/从节点上解除挂载;csi-attacer组件也不会做挂载/解除挂载操作,只是更新VolumeAttachment对象,真正的节点挂载/解除挂载操作由kubelet中的volume manager调用volume plugin(ceph-csi)来完成。

两个关键结构体

(1)desiredStateOfWorld: 记录着集群中期望要挂载到node的pod的volume信息,简称DSW,数据来源:pod里声明的volume列表。

(2)actualStateOfWorld: 记录着集群中实际已经挂载到node节点的volume信息,简称ASW,数据来源:node对象中的node.Status.VolumesAttached属性值列表。

node.Status.VolumesInUse作用

node对象中的node.Status.VolumesInUse记录的是已经attach到该node节点上,并已经mount了的volume信息。

该属性由kubelet的volume manager来更新。当一个volume加入dsw中,就会被更新到node.Status.VolumesInUse中,直到该volume在asw与dsw中均不存在或已经处于unmounted状态时,会从node.Status.VolumesInUse中移除。

AD controller做volume的dettach操作前,会先判断该属性,如果该属性值中含有该volume,则说明volume还在被使用,返回dettach失败的错误。

node.Status.VolumesAttached作用

node对象中的node.Status.VolumesAttached记录的是已经attach到该node节点上的volume信息。

该属性由kube-controller-manager中的AD controller根据asw的值来更新。

attachDetachController.Run源码分析

直接看到attachDetachControllerRun()方法,来分析attachDetachController的主体处理逻辑。

主要调用了以下4个方法,下面会逐一分析:

(1)adc.populateActualStateOfWorld()

(2)adc.populateDesiredStateOfWorld()

(3)adc.reconciler.Run()

(4)adc.desiredStateOfWorldPopulator.Run()

// pkg/controller/volume/attachdetach/attach_detach_controller.go
func (adc *attachDetachController) Run(stopCh <-chan struct{}) {
defer runtime.HandleCrash()
defer adc.pvcQueue.ShutDown() klog.Infof("Starting attach detach controller")
defer klog.Infof("Shutting down attach detach controller") synced := []kcache.InformerSynced{adc.podsSynced, adc.nodesSynced, adc.pvcsSynced, adc.pvsSynced}
if adc.csiNodeSynced != nil {
synced = append(synced, adc.csiNodeSynced)
}
if adc.csiDriversSynced != nil {
synced = append(synced, adc.csiDriversSynced)
} if !kcache.WaitForNamedCacheSync("attach detach", stopCh, synced...) {
return
} err := adc.populateActualStateOfWorld()
if err != nil {
klog.Errorf("Error populating the actual state of world: %v", err)
}
err = adc.populateDesiredStateOfWorld()
if err != nil {
klog.Errorf("Error populating the desired state of world: %v", err)
}
go adc.reconciler.Run(stopCh)
go adc.desiredStateOfWorldPopulator.Run(stopCh)
go wait.Until(adc.pvcWorker, time.Second, stopCh)
metrics.Register(adc.pvcLister,
adc.pvLister,
adc.podLister,
adc.actualStateOfWorld,
adc.desiredStateOfWorld,
&adc.volumePluginMgr,
adc.csiMigratedPluginManager,
adc.intreeToCSITranslator) <-stopCh
}

1 adc.populateActualStateOfWorld

作用:初始化actualStateOfWorld结构体。

主要逻辑:遍历node对象,获取node.Status.VolumesAttached与node.Status.VolumesInUse,再调用adc.actualStateOfWorld.MarkVolumeAsAttached与adc.processVolumesInUse更新actualStateOfWorld信息。

// pkg/controller/volume/attachdetach/attach_detach_controller.go
func (adc *attachDetachController) populateActualStateOfWorld() error {
klog.V(5).Infof("Populating ActualStateOfworld")
nodes, err := adc.nodeLister.List(labels.Everything())
if err != nil {
return err
} for _, node := range nodes {
nodeName := types.NodeName(node.Name)
for _, attachedVolume := range node.Status.VolumesAttached {
uniqueName := attachedVolume.Name
// The nil VolumeSpec is safe only in the case the volume is not in use by any pod.
// In such a case it should be detached in the first reconciliation cycle and the
// volume spec is not needed to detach a volume. If the volume is used by a pod, it
// its spec can be: this would happen during in the populateDesiredStateOfWorld which
// scans the pods and updates their volumes in the ActualStateOfWorld too.
err = adc.actualStateOfWorld.MarkVolumeAsAttached(uniqueName, nil /* VolumeSpec */, nodeName, attachedVolume.DevicePath)
if err != nil {
klog.Errorf("Failed to mark the volume as attached: %v", err)
continue
}
adc.processVolumesInUse(nodeName, node.Status.VolumesInUse)
adc.addNodeToDswp(node, types.NodeName(node.Name))
}
}
return nil
}

2 adc.populateDesiredStateOfWorld

作用:初始化desiredStateOfWorld结构体。

主要逻辑:遍历pod列表,遍历pod的volume信息

(1)根据pod的volume信息来初始化desiredStateOfWorld;

(2)从actualStateOfWorld中查询,如果pod的volume已经attach到node了,则更新actualStateOfWorld,将该volume标记为已attach。

// pkg/controller/volume/attachdetach/attach_detach_controller.go
func (adc *attachDetachController) populateDesiredStateOfWorld() error {
klog.V(5).Infof("Populating DesiredStateOfworld") pods, err := adc.podLister.List(labels.Everything())
if err != nil {
return err
}
for _, pod := range pods {
podToAdd := pod
adc.podAdd(podToAdd)
for _, podVolume := range podToAdd.Spec.Volumes {
nodeName := types.NodeName(podToAdd.Spec.NodeName)
// The volume specs present in the ActualStateOfWorld are nil, let's replace those
// with the correct ones found on pods. The present in the ASW with no corresponding
// pod will be detached and the spec is irrelevant.
volumeSpec, err := util.CreateVolumeSpec(podVolume, podToAdd.Namespace, nodeName, &adc.volumePluginMgr, adc.pvcLister, adc.pvLister, adc.csiMigratedPluginManager, adc.intreeToCSITranslator)
if err != nil {
klog.Errorf(
"Error creating spec for volume %q, pod %q/%q: %v",
podVolume.Name,
podToAdd.Namespace,
podToAdd.Name,
err)
continue
}
plugin, err := adc.volumePluginMgr.FindAttachablePluginBySpec(volumeSpec)
if err != nil || plugin == nil {
klog.V(10).Infof(
"Skipping volume %q for pod %q/%q: it does not implement attacher interface. err=%v",
podVolume.Name,
podToAdd.Namespace,
podToAdd.Name,
err)
continue
}
volumeName, err := volumeutil.GetUniqueVolumeNameFromSpec(plugin, volumeSpec)
if err != nil {
klog.Errorf(
"Failed to find unique name for volume %q, pod %q/%q: %v",
podVolume.Name,
podToAdd.Namespace,
podToAdd.Name,
err)
continue
}
if adc.actualStateOfWorld.IsVolumeAttachedToNode(volumeName, nodeName) {
devicePath, err := adc.getNodeVolumeDevicePath(volumeName, nodeName)
if err != nil {
klog.Errorf("Failed to find device path: %v", err)
continue
}
err = adc.actualStateOfWorld.MarkVolumeAsAttached(volumeName, volumeSpec, nodeName, devicePath)
if err != nil {
klog.Errorf("Failed to update volume spec for node %s: %v", nodeName, err)
}
}
}
} return nil
}

3 adc.reconciler.Run

主要是调用rc.reconcile做desiredStateOfWorld与actualStateOfWorld之间的调谐:对比desiredStateOfWorld与actualStateOfWorld,做attach与detach操作,更新actualStateOfWorld,并根据actualStateOfWorld更新node对象的.Status.VolumesAttached

// pkg/controller/volume/attachdetach/reconciler/reconciler.go
func (rc *reconciler) Run(stopCh <-chan struct{}) {
wait.Until(rc.reconciliationLoopFunc(), rc.loopPeriod, stopCh)
} // reconciliationLoopFunc this can be disabled via cli option disableReconciliation.
// It periodically checks whether the attached volumes from actual state
// are still attached to the node and update the status if they are not.
func (rc *reconciler) reconciliationLoopFunc() func() {
return func() { rc.reconcile() if rc.disableReconciliationSync {
klog.V(5).Info("Skipping reconciling attached volumes still attached since it is disabled via the command line.")
} else if rc.syncDuration < time.Second {
klog.V(5).Info("Skipping reconciling attached volumes still attached since it is set to less than one second via the command line.")
} else if time.Since(rc.timeOfLastSync) > rc.syncDuration {
klog.V(5).Info("Starting reconciling attached volumes still attached")
rc.sync()
}
}
}

3.1 rc.reconcile

主要逻辑:

(1)遍历actualStateOfWorld中已经attached的volume,判断desiredStateOfWorld中是否存在,如果不存在,则调用rc.attacherDetacher.DetachVolume执行该volume的Detach操作;

(2)遍历desiredStateOfWorld中期望被attached的volume,判断actualStateOfWorld中是否已经attached到node上,如果没有,则先调用rc.isMultiAttachForbidden判断该volume的AccessModes是否支持多节点挂载,如支持,则继续调用rc.attacherDetacher.AttachVolume执行该volume的attach操作;

(3)调用rc.nodeStatusUpdater.UpdateNodeStatuses():根据从actualStateOfWorld获取已经attached到node的volume,更新node.Status.VolumesAttached的值。

// pkg/controller/volume/attachdetach/reconciler/reconciler.go
func (rc *reconciler) reconcile() {
// Detaches are triggered before attaches so that volumes referenced by
// pods that are rescheduled to a different node are detached first. // Ensure volumes that should be detached are detached.
for _, attachedVolume := range rc.actualStateOfWorld.GetAttachedVolumes() {
if !rc.desiredStateOfWorld.VolumeExists(
attachedVolume.VolumeName, attachedVolume.NodeName) {
// Don't even try to start an operation if there is already one running
// This check must be done before we do any other checks, as otherwise the other checks
// may pass while at the same time the volume leaves the pending state, resulting in
// double detach attempts
if rc.attacherDetacher.IsOperationPending(attachedVolume.VolumeName, "") {
klog.V(10).Infof("Operation for volume %q is already running. Can't start detach for %q", attachedVolume.VolumeName, attachedVolume.NodeName)
continue
} // Set the detach request time
elapsedTime, err := rc.actualStateOfWorld.SetDetachRequestTime(attachedVolume.VolumeName, attachedVolume.NodeName)
if err != nil {
klog.Errorf("Cannot trigger detach because it fails to set detach request time with error %v", err)
continue
}
// Check whether timeout has reached the maximum waiting time
timeout := elapsedTime > rc.maxWaitForUnmountDuration
// Check whether volume is still mounted. Skip detach if it is still mounted unless timeout
if attachedVolume.MountedByNode && !timeout {
klog.V(5).Infof(attachedVolume.GenerateMsgDetailed("Cannot detach volume because it is still mounted", ""))
continue
} // Before triggering volume detach, mark volume as detached and update the node status
// If it fails to update node status, skip detach volume
err = rc.actualStateOfWorld.RemoveVolumeFromReportAsAttached(attachedVolume.VolumeName, attachedVolume.NodeName)
if err != nil {
klog.V(5).Infof("RemoveVolumeFromReportAsAttached failed while removing volume %q from node %q with: %v",
attachedVolume.VolumeName,
attachedVolume.NodeName,
err)
} // Update Node Status to indicate volume is no longer safe to mount.
err = rc.nodeStatusUpdater.UpdateNodeStatuses()
if err != nil {
// Skip detaching this volume if unable to update node status
klog.Errorf(attachedVolume.GenerateErrorDetailed("UpdateNodeStatuses failed while attempting to report volume as attached", err).Error())
continue
} // Trigger detach volume which requires verifying safe to detach step
// If timeout is true, skip verifySafeToDetach check
klog.V(5).Infof(attachedVolume.GenerateMsgDetailed("Starting attacherDetacher.DetachVolume", ""))
verifySafeToDetach := !timeout
err = rc.attacherDetacher.DetachVolume(attachedVolume.AttachedVolume, verifySafeToDetach, rc.actualStateOfWorld)
if err == nil {
if !timeout {
klog.Infof(attachedVolume.GenerateMsgDetailed("attacherDetacher.DetachVolume started", ""))
} else {
metrics.RecordForcedDetachMetric()
klog.Warningf(attachedVolume.GenerateMsgDetailed("attacherDetacher.DetachVolume started", fmt.Sprintf("This volume is not safe to detach, but maxWaitForUnmountDuration %v expired, force detaching", rc.maxWaitForUnmountDuration)))
}
}
if err != nil && !exponentialbackoff.IsExponentialBackoff(err) {
// Ignore exponentialbackoff.IsExponentialBackoff errors, they are expected.
// Log all other errors.
klog.Errorf(attachedVolume.GenerateErrorDetailed("attacherDetacher.DetachVolume failed to start", err).Error())
}
}
} rc.attachDesiredVolumes() // Update Node Status
err := rc.nodeStatusUpdater.UpdateNodeStatuses()
if err != nil {
klog.Warningf("UpdateNodeStatuses failed with: %v", err)
}
}
rc.attachDesiredVolumes

主要逻辑:调用rc.attacherDetacher.AttachVolume触发attach逻辑。

// pkg/controller/volume/attachdetach/reconciler/reconciler.go
func (rc *reconciler) attachDesiredVolumes() {
// Ensure volumes that should be attached are attached.
for _, volumeToAttach := range rc.desiredStateOfWorld.GetVolumesToAttach() {
if rc.actualStateOfWorld.IsVolumeAttachedToNode(volumeToAttach.VolumeName, volumeToAttach.NodeName) {
// Volume/Node exists, touch it to reset detachRequestedTime
if klog.V(5) {
klog.Infof(volumeToAttach.GenerateMsgDetailed("Volume attached--touching", ""))
}
rc.actualStateOfWorld.ResetDetachRequestTime(volumeToAttach.VolumeName, volumeToAttach.NodeName)
continue
}
// Don't even try to start an operation if there is already one running
if rc.attacherDetacher.IsOperationPending(volumeToAttach.VolumeName, "") {
if klog.V(10) {
klog.Infof("Operation for volume %q is already running. Can't start attach for %q", volumeToAttach.VolumeName, volumeToAttach.NodeName)
}
continue
} if rc.isMultiAttachForbidden(volumeToAttach.VolumeSpec) {
nodes := rc.actualStateOfWorld.GetNodesForAttachedVolume(volumeToAttach.VolumeName)
if len(nodes) > 0 {
if !volumeToAttach.MultiAttachErrorReported {
rc.reportMultiAttachError(volumeToAttach, nodes)
rc.desiredStateOfWorld.SetMultiAttachError(volumeToAttach.VolumeName, volumeToAttach.NodeName)
}
continue
}
} // Volume/Node doesn't exist, spawn a goroutine to attach it
if klog.V(5) {
klog.Infof(volumeToAttach.GenerateMsgDetailed("Starting attacherDetacher.AttachVolume", ""))
}
err := rc.attacherDetacher.AttachVolume(volumeToAttach.VolumeToAttach, rc.actualStateOfWorld)
if err == nil {
klog.Infof(volumeToAttach.GenerateMsgDetailed("attacherDetacher.AttachVolume started", ""))
}
if err != nil && !exponentialbackoff.IsExponentialBackoff(err) {
// Ignore exponentialbackoff.IsExponentialBackoff errors, they are expected.
// Log all other errors.
klog.Errorf(volumeToAttach.GenerateErrorDetailed("attacherDetacher.AttachVolume failed to start", err).Error())
}
}
}
rc.nodeStatusUpdater.UpdateNodeStatuses

主要逻辑:从actualStateOfWorld获取已经attach到node的volume,更新node对象的node.Status.VolumesAttached属性值。

// pkg/controller/volume/attachdetach/statusupdater/node_status_updater.go
func (nsu *nodeStatusUpdater) UpdateNodeStatuses() error {
// TODO: investigate right behavior if nodeName is empty
// kubernetes/kubernetes/issues/37777
nodesToUpdate := nsu.actualStateOfWorld.GetVolumesToReportAttached()
for nodeName, attachedVolumes := range nodesToUpdate {
nodeObj, err := nsu.nodeLister.Get(string(nodeName))
if errors.IsNotFound(err) {
// If node does not exist, its status cannot be updated.
// Do nothing so that there is no retry until node is created.
klog.V(2).Infof(
"Could not update node status. Failed to find node %q in NodeInformer cache. Error: '%v'",
nodeName,
err)
continue
} else if err != nil {
// For all other errors, log error and reset flag statusUpdateNeeded
// back to true to indicate this node status needs to be updated again.
klog.V(2).Infof("Error retrieving nodes from node lister. Error: %v", err)
nsu.actualStateOfWorld.SetNodeStatusUpdateNeeded(nodeName)
continue
} if err := nsu.updateNodeStatus(nodeName, nodeObj, attachedVolumes); err != nil {
// If update node status fails, reset flag statusUpdateNeeded back to true
// to indicate this node status needs to be updated again
nsu.actualStateOfWorld.SetNodeStatusUpdateNeeded(nodeName) klog.V(2).Infof(
"Could not update node status for %q; re-marking for update. %v",
nodeName,
err) // We currently always return immediately on error
return err
}
}
return nil
} func (nsu *nodeStatusUpdater) updateNodeStatus(nodeName types.NodeName, nodeObj *v1.Node, attachedVolumes []v1.AttachedVolume) error {
node := nodeObj.DeepCopy()
node.Status.VolumesAttached = attachedVolumes
_, patchBytes, err := nodeutil.PatchNodeStatus(nsu.kubeClient.CoreV1(), nodeName, nodeObj, node)
if err != nil {
return err
} klog.V(4).Infof("Updating status %q for node %q succeeded. VolumesAttached: %v", patchBytes, nodeName, attachedVolumes)
return nil
}

4 adc.desiredStateOfWorldPopulator.Run

作用:更新desiredStateOfWorld,跟踪desiredStateOfWorld初始化后的后续变化更新。

主要调用了两个方法:

(1)dswp.findAndRemoveDeletedPods:更新desiredStateOfWorld,从中删除已经不存在的pod;

(2)dswp.findAndAddActivePods:更新desiredStateOfWorld,将新增的pod volume加入desiredStateOfWorld。

// pkg/controller/volume/attachdetach/populator/desired_state_of_world_populator.go
func (dswp *desiredStateOfWorldPopulator) Run(stopCh <-chan struct{}) {
wait.Until(dswp.populatorLoopFunc(), dswp.loopSleepDuration, stopCh)
} func (dswp *desiredStateOfWorldPopulator) populatorLoopFunc() func() {
return func() {
dswp.findAndRemoveDeletedPods() // findAndAddActivePods is called periodically, independently of the main
// populator loop.
if time.Since(dswp.timeOfLastListPods) < dswp.listPodsRetryDuration {
klog.V(5).Infof(
"Skipping findAndAddActivePods(). Not permitted until %v (listPodsRetryDuration %v).",
dswp.timeOfLastListPods.Add(dswp.listPodsRetryDuration),
dswp.listPodsRetryDuration) return
}
dswp.findAndAddActivePods()
}
}

4.1 dswp.findAndRemoveDeletedPods

主要逻辑:

(1)从desiredStateOfWorld中取出pod列表;

(2)查询该pod对象是否还存在于etcd;

(3)不存在则调用dswp.desiredStateOfWorld.DeletePod将该pod从desiredStateOfWorld中删除。

// pkg/controller/volume/attachdetach/populator/desired_state_of_world_populator.go
// Iterate through all pods in desired state of world, and remove if they no
// longer exist in the informer
func (dswp *desiredStateOfWorldPopulator) findAndRemoveDeletedPods() {
for dswPodUID, dswPodToAdd := range dswp.desiredStateOfWorld.GetPodToAdd() {
dswPodKey, err := kcache.MetaNamespaceKeyFunc(dswPodToAdd.Pod)
if err != nil {
klog.Errorf("MetaNamespaceKeyFunc failed for pod %q (UID %q) with: %v", dswPodKey, dswPodUID, err)
continue
} // Retrieve the pod object from pod informer with the namespace key
namespace, name, err := kcache.SplitMetaNamespaceKey(dswPodKey)
if err != nil {
utilruntime.HandleError(fmt.Errorf("error splitting dswPodKey %q: %v", dswPodKey, err))
continue
}
informerPod, err := dswp.podLister.Pods(namespace).Get(name)
switch {
case errors.IsNotFound(err):
// if we can't find the pod, we need to delete it below
case err != nil:
klog.Errorf("podLister Get failed for pod %q (UID %q) with %v", dswPodKey, dswPodUID, err)
continue
default:
volumeActionFlag := util.DetermineVolumeAction(
informerPod,
dswp.desiredStateOfWorld,
true /* default volume action */) if volumeActionFlag {
informerPodUID := volutil.GetUniquePodName(informerPod)
// Check whether the unique identifier of the pod from dsw matches the one retrieved from pod informer
if informerPodUID == dswPodUID {
klog.V(10).Infof("Verified pod %q (UID %q) from dsw exists in pod informer.", dswPodKey, dswPodUID)
continue
}
}
} // the pod from dsw does not exist in pod informer, or it does not match the unique identifier retrieved
// from the informer, delete it from dsw
klog.V(1).Infof("Removing pod %q (UID %q) from dsw because it does not exist in pod informer.", dswPodKey, dswPodUID)
dswp.desiredStateOfWorld.DeletePod(dswPodUID, dswPodToAdd.VolumeName, dswPodToAdd.NodeName)
}
}

4.2 dswp.findAndAddActivePods()

主要逻辑:

(1)从etcd中获取全部pod信息;

(2)调用util.ProcessPodVolumes,将新增pod volume加入desiredStateOfWorld。

// pkg/controller/volume/attachdetach/populator/desired_state_of_world_populator.go
func (dswp *desiredStateOfWorldPopulator) findAndAddActivePods() {
pods, err := dswp.podLister.List(labels.Everything())
if err != nil {
klog.Errorf("podLister List failed: %v", err)
return
}
dswp.timeOfLastListPods = time.Now() for _, pod := range pods {
if volutil.IsPodTerminated(pod, pod.Status) {
// Do not add volumes for terminated pods
continue
}
util.ProcessPodVolumes(pod, true,
dswp.desiredStateOfWorld, dswp.volumePluginMgr, dswp.pvcLister, dswp.pvLister, dswp.csiMigratedPluginManager, dswp.intreeToCSITranslator) } }

小结

attachDetachController.Run中4个主要方法作用:

(1)adc.populateActualStateOfWorld():初始化actualStateOfWorld结构体。

(2)adc.populateDesiredStateOfWorld():初始化desiredStateOfWorld结构体。

(3)adc.reconciler.Run():做desiredStateOfWorld与actualStateOfWorld之间的调谐:对比desiredStateOfWorld与actualStateOfWorld,做attach与detach操作,更新actualStateOfWorld,并根据actualStateOfWorld更新node对象的.Status.VolumesAttached

(4)adc.desiredStateOfWorldPopulator.Run():更新desiredStateOfWorld,跟踪desiredStateOfWorld初始化后的后续变化更新。

NewAttachDetachController源码分析

接下来看到NewAttachDetachController方法,简单分析一下它的EventHandler。从代码中可以看到,主要是注册了pod对象与node对象的EventHandler。

// pkg/controller/volume/attachdetach/attach_detach_controller.go
// NewAttachDetachController returns a new instance of AttachDetachController.
func NewAttachDetachController(
... podInformer.Informer().AddEventHandler(kcache.ResourceEventHandlerFuncs{
AddFunc: adc.podAdd,
UpdateFunc: adc.podUpdate,
DeleteFunc: adc.podDelete,
}) ... nodeInformer.Informer().AddEventHandler(kcache.ResourceEventHandlerFuncs{
AddFunc: adc.nodeAdd,
UpdateFunc: adc.nodeUpdate,
DeleteFunc: adc.nodeDelete,
}) ...
}

1 adc.podAdd

与adc.podUpdate一致,看下面adc.podUpdate的分析。

2 adc.podUpdate

作用:主要是更新dsw,将新pod的volume加入dsw中。

主要看到util.ProcessPodVolumes方法。

// pkg/controller/volume/attachdetach/attach_detach_controller.go
func (adc *attachDetachController) podUpdate(oldObj, newObj interface{}) {
pod, ok := newObj.(*v1.Pod)
if pod == nil || !ok {
return
}
if pod.Spec.NodeName == "" {
// Ignore pods without NodeName, indicating they are not scheduled.
return
} volumeActionFlag := util.DetermineVolumeAction(
pod,
adc.desiredStateOfWorld,
true /* default volume action */) util.ProcessPodVolumes(pod, volumeActionFlag, /* addVolumes */
adc.desiredStateOfWorld, &adc.volumePluginMgr, adc.pvcLister, adc.pvLister, adc.csiMigratedPluginManager, adc.intreeToCSITranslator)
}
util.ProcessPodVolumes

主要是更新dsw,将新pod的volume加入dsw中。

// pkg/controller/volume/attachdetach/util/util.go
// ProcessPodVolumes processes the volumes in the given pod and adds them to the
// desired state of the world if addVolumes is true, otherwise it removes them.
func ProcessPodVolumes(pod *v1.Pod, addVolumes bool, desiredStateOfWorld cache.DesiredStateOfWorld, volumePluginMgr *volume.VolumePluginMgr, pvcLister corelisters.PersistentVolumeClaimLister, pvLister corelisters.PersistentVolumeLister, csiMigratedPluginManager csimigration.PluginManager, csiTranslator csimigration.InTreeToCSITranslator) {
if pod == nil {
return
} if len(pod.Spec.Volumes) <= 0 {
klog.V(10).Infof("Skipping processing of pod %q/%q: it has no volumes.",
pod.Namespace,
pod.Name)
return
} nodeName := types.NodeName(pod.Spec.NodeName)
if nodeName == "" {
klog.V(10).Infof(
"Skipping processing of pod %q/%q: it is not scheduled to a node.",
pod.Namespace,
pod.Name)
return
} else if !desiredStateOfWorld.NodeExists(nodeName) {
// If the node the pod is scheduled to does not exist in the desired
// state of the world data structure, that indicates the node is not
// yet managed by the controller. Therefore, ignore the pod.
klog.V(4).Infof(
"Skipping processing of pod %q/%q: it is scheduled to node %q which is not managed by the controller.",
pod.Namespace,
pod.Name,
nodeName)
return
} // Process volume spec for each volume defined in pod
for _, podVolume := range pod.Spec.Volumes {
volumeSpec, err := CreateVolumeSpec(podVolume, pod.Namespace, nodeName, volumePluginMgr, pvcLister, pvLister, csiMigratedPluginManager, csiTranslator)
if err != nil {
klog.V(10).Infof(
"Error processing volume %q for pod %q/%q: %v",
podVolume.Name,
pod.Namespace,
pod.Name,
err)
continue
} attachableVolumePlugin, err :=
volumePluginMgr.FindAttachablePluginBySpec(volumeSpec)
if err != nil || attachableVolumePlugin == nil {
klog.V(10).Infof(
"Skipping volume %q for pod %q/%q: it does not implement attacher interface. err=%v",
podVolume.Name,
pod.Namespace,
pod.Name,
err)
continue
} uniquePodName := util.GetUniquePodName(pod)
if addVolumes {
// Add volume to desired state of world
_, err := desiredStateOfWorld.AddPod(
uniquePodName, pod, volumeSpec, nodeName)
if err != nil {
klog.V(10).Infof(
"Failed to add volume %q for pod %q/%q to desiredStateOfWorld. %v",
podVolume.Name,
pod.Namespace,
pod.Name,
err)
} } else {
// Remove volume from desired state of world
uniqueVolumeName, err := util.GetUniqueVolumeNameFromSpec(
attachableVolumePlugin, volumeSpec)
if err != nil {
klog.V(10).Infof(
"Failed to delete volume %q for pod %q/%q from desiredStateOfWorld. GetUniqueVolumeNameFromSpec failed with %v",
podVolume.Name,
pod.Namespace,
pod.Name,
err)
continue
}
desiredStateOfWorld.DeletePod(
uniquePodName, uniqueVolumeName, nodeName)
}
}
return
}

3 adc.podDelete

作用:将volume从dsw中删除。

// pkg/controller/volume/attachdetach/attach_detach_controller.go
func (adc *attachDetachController) podDelete(obj interface{}) {
pod, ok := obj.(*v1.Pod)
if pod == nil || !ok {
return
} util.ProcessPodVolumes(pod, false, /* addVolumes */
adc.desiredStateOfWorld, &adc.volumePluginMgr, adc.pvcLister, adc.pvLister, adc.csiMigratedPluginManager, adc.intreeToCSITranslator)
}

4 adc.nodeAdd

主要是调用了adc.nodeUpdate方法进行处理,所以作用与adc.nodeUpdate基本相似,看adc.nodeUpdate的分析。

// pkg/controller/volume/attachdetach/attach_detach_controller.go
func (adc *attachDetachController) nodeAdd(obj interface{}) {
node, ok := obj.(*v1.Node)
// TODO: investigate if nodeName is empty then if we can return
// kubernetes/kubernetes/issues/37777
if node == nil || !ok {
return
}
nodeName := types.NodeName(node.Name)
adc.nodeUpdate(nil, obj)
// kubernetes/kubernetes/issues/37586
// This is to workaround the case when a node add causes to wipe out
// the attached volumes field. This function ensures that we sync with
// the actual status.
adc.actualStateOfWorld.SetNodeStatusUpdateNeeded(nodeName)
}

5 adc.nodeUpdate

作用:往dsw中添加node,根据node.Status.VolumesInUse来更新asw。

// pkg/controller/volume/attachdetach/attach_detach_controller.go
func (adc *attachDetachController) nodeUpdate(oldObj, newObj interface{}) {
node, ok := newObj.(*v1.Node)
// TODO: investigate if nodeName is empty then if we can return
if node == nil || !ok {
return
} nodeName := types.NodeName(node.Name)
adc.addNodeToDswp(node, nodeName)
adc.processVolumesInUse(nodeName, node.Status.VolumesInUse)
}
5.1 adc.addNodeToDswp

往dsw中添加node

func (adc *attachDetachController) addNodeToDswp(node *v1.Node, nodeName types.NodeName) {
if _, exists := node.Annotations[volumeutil.ControllerManagedAttachAnnotation]; exists {
keepTerminatedPodVolumes := false if t, ok := node.Annotations[volumeutil.KeepTerminatedPodVolumesAnnotation]; ok {
keepTerminatedPodVolumes = (t == "true")
} // Node specifies annotation indicating it should be managed by attach
// detach controller. Add it to desired state of world.
adc.desiredStateOfWorld.AddNode(nodeName, keepTerminatedPodVolumes)
}
}
5.2 adc.processVolumesInUse

根据node.Status.VolumesInUse来更新asw

// processVolumesInUse processes the list of volumes marked as "in-use"
// according to the specified Node's Status.VolumesInUse and updates the
// corresponding volume in the actual state of the world to indicate that it is
// mounted.
func (adc *attachDetachController) processVolumesInUse(
nodeName types.NodeName, volumesInUse []v1.UniqueVolumeName) {
klog.V(4).Infof("processVolumesInUse for node %q", nodeName)
for _, attachedVolume := range adc.actualStateOfWorld.GetAttachedVolumesForNode(nodeName) {
mounted := false
for _, volumeInUse := range volumesInUse {
if attachedVolume.VolumeName == volumeInUse {
mounted = true
break
}
}
err := adc.actualStateOfWorld.SetVolumeMountedByNode(attachedVolume.VolumeName, nodeName, mounted)
if err != nil {
klog.Warningf(
"SetVolumeMountedByNode(%q, %q, %v) returned an error: %v",
attachedVolume.VolumeName, nodeName, mounted, err)
}
}
}

6 adc.nodeDelete

作用:从dsw中删除node以及该node相关的挂载信息

func (adc *attachDetachController) nodeDelete(obj interface{}) {
node, ok := obj.(*v1.Node)
if node == nil || !ok {
return
} nodeName := types.NodeName(node.Name)
if err := adc.desiredStateOfWorld.DeleteNode(nodeName); err != nil {
// This might happen during drain, but we still want it to appear in our logs
klog.Infof("error removing node %q from desired-state-of-world: %v", nodeName, err)
} adc.processVolumesInUse(nodeName, node.Status.VolumesInUse)
}

小结

pod对象与node对象的EventHandler主要作用分别如下:

(1)adc.podAdd:更新dsw,将新pod的volume加入dsw中;

(2)adc.podUpdate:更新dsw,将新pod的volume加入dsw中;

(3)adc.podDelete:将volume从dsw中删除;

(4)adc.nodeAdd:往dsw中添加node,根据node.Status.VolumesInUse来更新asw;

(5)adc.nodeUpdate:往dsw中添加node,根据node.Status.VolumesInUse来更新asw;

(6)adc.nodeDelete:从dsw中删除node以及该node相关的挂载信息。

总结

AD Cotroller作用

AD Cotroller全称Attachment/Detachment 控制器,主要负责创建、删除VolumeAttachment对象,并调用volume plugin来做存储设备的Attach/Detach操作(将数据卷挂载到特定node节点上/从特定node节点上解除挂载),以及更新node.Status.VolumesAttached等。

不同的volume plugin的Attach/Detach操作逻辑有所不同,如通过ceph-csi(out-tree volume plugin)来使用ceph存储,则的Attach/Detach操作只是修改VolumeAttachment对象的状态,而不会真正的将数据卷挂载到节点/从节点上解除挂载。

两个关键结构体

(1)desiredStateOfWorld: 记录着集群中期望要挂载到node的pod的volume信息,简称DSW,数据来源:pod里声明的volume列表。

(2)actualStateOfWorld: 记录着集群中实际已经挂载到node节点的volume信息,简称ASW,数据来源:node对象中的node.Status.VolumesAttached属性值列表。

AD controller会做desiredStateOfWorld与actualStateOfWorld之间的调谐:对比desiredStateOfWorld与actualStateOfWorld,对volume做attach或detach操作。

kube-controller-manager源码分析-AD controller分析的更多相关文章

  1. k8s client-go源码分析 informer源码分析(5)-Controller&Processor源码分析

    client-go之Controller&Processor源码分析 1.controller与Processor概述 Controller Controller从DeltaFIFO中pop ...

  2. Spring mvc之源码 handlerMapping和handlerAdapter分析

    Spring mvc之源码 handlerMapping和handlerAdapter分析 本篇并不是具体分析Spring mvc,所以好多细节都是一笔带过,主要是带大家梳理一下整个Spring mv ...

  3. ArrayList源码和多线程安全问题分析

    1.ArrayList源码和多线程安全问题分析 在分析ArrayList线程安全问题之前,我们线对此类的源码进行分析,找出可能出现线程安全问题的地方,然后代码进行验证和分析. 1.1 数据结构 Arr ...

  4. Okhttp3源码解析(3)-Call分析(整体流程)

    ### 前言 前面我们讲了 [Okhttp的基本用法](https://www.jianshu.com/p/8e404d9c160f) [Okhttp3源码解析(1)-OkHttpClient分析]( ...

  5. Okhttp3源码解析(2)-Request分析

    ### 前言 前面我们讲了 [Okhttp的基本用法](https://www.jianshu.com/p/8e404d9c160f) [Okhttp3源码解析(1)-OkHttpClient分析]( ...

  6. HashMap的源码学习以及性能分析

    HashMap的源码学习以及性能分析 一).Map接口的实现类 HashTable.HashMap.LinkedHashMap.TreeMap 二).HashMap和HashTable的区别 1).H ...

  7. ThreadLocal源码及相关问题分析

    前言 在高并发的环境下,当我们使用一个公共的变量时如果不加锁会出现并发问题,例如SimpleDateFormat,但是加锁的话会影响性能,对于这种情况我们可以使用ThreadLocal.ThreadL ...

  8. 物联网防火墙himqtt源码之MQTT协议分析

    物联网防火墙himqtt源码之MQTT协议分析 himqtt是首款完整源码的高性能MQTT物联网防火墙 - MQTT Application FireWall,C语言编写,采用epoll模式支持数十万 ...

  9. Netty 源码学习——客户端流程分析

    Netty 源码学习--客户端流程分析 友情提醒: 需要观看者具备一些 NIO 的知识,否则看起来有的地方可能会不明白. 使用版本依赖 <dependency> <groupId&g ...

随机推荐

  1. 老vue项目webpack3升级到webpack5全过程记录(一)

    背景 19年新建的vue项目,使用的是webpack3,随着项目的积累,组件的增多导致本地构建,线上打包等操作速度极慢,非常影响开发效率和部署效率,基于此问题,本次对webpack及相关插件进行了优化 ...

  2. 网络编程-UDP的服务器和客户端----keep on going never give up

    1 //**************************************服务器********************************************** 2 #inclu ...

  3. golang:Channel协程间通信

    channel是Go语言中的一个核心数据类型,channel是一个数据类型,主要用来解决协程的同步问题以及协程之间数据共享(数据传递)的问题.在并发核心单元通过它就可以发送或者接收数据进行通讯,这在一 ...

  4. [c++] 模板、迭代器、泛型

    模板 函数模板:重载的进一步抽象,只需定义一个函数体即可用于所有类型 在C++中,数据的类型也可以通过参数来传递,在函数定义时可以不指明具体的数据类型,当发生函数调用时,编译器可以根据传入的实参自动推 ...

  5. python文件对象几种操作模式区别——文件操作方法详解

    文件对象的字节模式/b模式(以utf-8编码为例) 读操作 写操作 指针操作 ASCII字节 返回bytes/字节类型的Ascii 写入bytes类型字节 例如:b'This is ascii' 使用 ...

  6. 使用find命令查找大文件

    使用find命令查找大文件 find命令是Linux系统管理员工具库中最强大的工具之一.它允许您根据不同的标准(包括文件大小)搜索文件和目录. 例如,如果在当前工作目录中要搜索大小超过100MB的文件 ...

  7. centos7安装google-chrome

    完整的安装步骤:https://www.tecmint.com/install-google-chrome-on-redhat-centos-fedora-linux/ 1.简单安装测试版:sudo ...

  8. JQuery.Gantt开发指南(转)

    说明 日前需要用到甘特图,以下转载内容源自网络. • 概述 1.JQuery.Gantt是一个开源的基于JQuery库的用于实现甘特图效果的可扩展功能的JS组件库. •前端页面 o 资源引用 首先我们 ...

  9. 重新整理 .net core 实践篇—————配置系统之强类型配置[十]

    前言 前文中我们去获取value值的时候,都是通过configurationRoot 来获取的,如configurationRoot["key"],这种形式. 这种形式有一个不好的 ...

  10. 处理SpringMVC中遇到的乱码问题

    乱码在日常开发写代码中是非常常见的,以前乱码使用的是通过设置一个过滤器解决, 现在可以使用SpringMVC给提供的过滤器,在web.xml设置,这比我们自己写的过滤器强大的的多. 注意:每次修改了x ...