28 August, 2023

Kubernetes Container Runtime Interface (CRI)

Notes

I am writing this post to finally deliver on something I promised a long time ago.

The source code versions of the components covered in this post are:

Kubernetes 1.24
CRI 0.25.0
Containerd 1.6

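A container runtime is the component responsible for managing and executing containers. It turns a container image into actual container processes running on the host, and provides image management, container lifecycle management, resource isolation, filesystem setup, network configuration, and related functions.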

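Several container runtimes are in common use, and they differ in the features and performance they offer. What they share is that they all implement the Container Runtime Interface (CRI), so they can be integrated with Kubernetes or other container orchestration systems for container scheduling and management.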

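With CRI, we can also switch "freely" between container runtimes without recompiling Kubernetes. Put simply, CRI defines every operation on containers and serves as the standard interface between the container orchestration system and the container runtime.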

The Past and Present of CRI

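CRI was first introduced in Kubernetes 1.5, with v1alpha1 as its initial version. Before that, Kubernetes had to maintain support for each container runtime inside the kubelet source code.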

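Once CRI existed, kubelet only needed to support CRI itself. Because the container runtimes of the time did not yet implement CRI, kubelet talked to them through an intermediate layer, the CRI shim (a gRPC server).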
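The role of the shim can be sketched as a plain adapter. This is only an illustration under stated assumptions: the Runtime interface, legacyEngine, and its Launch method are all hypothetical names invented here, and the real shim is a gRPC server rather than an in-process wrapper.

```go
package main

import "fmt"

// Runtime is a hypothetical, heavily reduced stand-in for the CRI surface
// that kubelet depends on. It is not the real CRI interface.
type Runtime interface {
	RunPodSandbox(name string) (string, error)
}

// legacyEngine is a made-up runtime with its own native API that knows
// nothing about CRI.
type legacyEngine struct{}

// Launch is the engine's native entry point (hypothetical).
func (legacyEngine) Launch(kind, name string) string {
	return kind + "-" + name
}

// shim adapts the native API to the Runtime interface, which is the role
// a CRI shim plays between kubelet and a runtime without CRI support.
type shim struct {
	engine legacyEngine
}

// RunPodSandbox validates the request and translates it into a native call.
func (s shim) RunPodSandbox(name string) (string, error) {
	if name == "" {
		return "", fmt.Errorf("sandbox name must not be empty")
	}
	return s.engine.Launch("sandbox", name), nil
}

func main() {
	// The caller only sees the Runtime interface, never the engine.
	var rt Runtime = shim{}
	id, err := rt.RunPodSandbox("nginx")
	fmt.Println(id, err)
}
```

The caller depends only on the interface, so swapping the engine means swapping the adapter, not the caller.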

Kubernetes 1.24, released last year, officially removed dockershim, which simplified the interaction with container runtimes.

Kubernetes currently supports CRI v1alpha2 and v1. The v1 version was introduced in Kubernetes 1.23.

Every time kubelet starts, it first tries to connect to the container runtime using the v1 API. Only if that fails does it fall back to v1alpha2.
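The fallback logic can be sketched roughly as follows. This is not kubelet's code: the probe type, negotiate, and onlyV1alpha2 are hypothetical names, and a real client would issue a Version RPC over gRPC instead of calling a local function.

```go
package main

import (
	"errors"
	"fmt"
)

// probe is a hypothetical function that checks whether the runtime answers
// on the given CRI API version; a real client would issue a Version RPC.
type probe func(apiVersion string) error

// negotiate tries each version in order of preference and returns the
// first one the runtime accepts, or the last error if none is accepted.
func negotiate(p probe, preferred ...string) (string, error) {
	lastErr := errors.New("no API version given")
	for _, v := range preferred {
		if err := p(v); err != nil {
			lastErr = fmt.Errorf("%s: %w", v, err)
			continue
		}
		return v, nil
	}
	return "", lastErr
}

// onlyV1alpha2 simulates an older runtime that implements v1alpha2 only.
func onlyV1alpha2(v string) error {
	if v == "v1alpha2" {
		return nil
	}
	return errors.New("unimplemented")
}

func main() {
	// v1 is tried first; v1alpha2 is used only because v1 fails.
	v, err := negotiate(onlyV1alpha2, "v1", "v1alpha2")
	fmt.Println(v, err)
}
```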

kubelet and CRI

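In my earlier kubelet source code analysis, I mentioned that Kubelet#syncLoop() continuously watches for changes coming from files, the apiserver, and HTTP, and updates pod state accordingly. That article stopped at this point, because everything afterwards is handed over to the container runtime, which creates and runs the sandbox and the containers; see kubeGenericRuntimeManager#SyncPod().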

At startup, kubelet initializes the CRI client, establishes a connection to the container runtime, and confirms the CRI version.

While a pod is being created, each of the following steps goes through CRI to interact with the container runtime:

Creating the sandbox
Creating containers
Pulling images
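The steps above can be sketched as a sequence of calls against a fake runtime. Everything here is an assumption for illustration: the cri interface is a reduced, hypothetical view of the real services, recorder is a fake, and the ordering (sandbox first, then pull, create, and start per container) is a simplification of what SyncPod does. In particular, the real kubelet consults the image pull policy before pulling.

```go
package main

import "fmt"

// cri is a hypothetical, reduced view of the RuntimeService and
// ImageService calls involved in starting a pod.
type cri interface {
	RunPodSandbox(pod string) (string, error)
	PullImage(image string) error
	CreateContainer(sandboxID, name, image string) (string, error)
	StartContainer(containerID string) error
}

// recorder is a fake runtime that only records the calls it receives.
type recorder struct{ calls []string }

func (r *recorder) RunPodSandbox(pod string) (string, error) {
	r.calls = append(r.calls, "RunPodSandbox")
	return "sb-" + pod, nil
}

func (r *recorder) PullImage(image string) error {
	r.calls = append(r.calls, "PullImage")
	return nil
}

func (r *recorder) CreateContainer(sandboxID, name, image string) (string, error) {
	r.calls = append(r.calls, "CreateContainer")
	return sandboxID + "/" + name, nil
}

func (r *recorder) StartContainer(containerID string) error {
	r.calls = append(r.calls, "StartContainer")
	return nil
}

// container describes one container of the pod (hypothetical type).
type container struct{ name, image string }

// startPod runs the sandbox first, then pulls, creates, and starts each
// container inside it, stopping at the first error.
func startPod(rt cri, pod string, containers []container) error {
	sandboxID, err := rt.RunPodSandbox(pod)
	if err != nil {
		return fmt.Errorf("run sandbox: %w", err)
	}
	for _, c := range containers {
		if err := rt.PullImage(c.image); err != nil {
			return fmt.Errorf("pull %s: %w", c.image, err)
		}
		id, err := rt.CreateContainer(sandboxID, c.name, c.image)
		if err != nil {
			return fmt.Errorf("create %s: %w", c.name, err)
		}
		if err := rt.StartContainer(id); err != nil {
			return fmt.Errorf("start %s: %w", c.name, err)
		}
	}
	return nil
}

func main() {
	rt := &recorder{}
	err := startPod(rt, "web", []container{{name: "nginx", image: "nginx:1.25"}})
	fmt.Println(rt.calls, err)
}
```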

Source code reference

Next, using Containerd as the example, let's look at how kubelet's requests are handled.

Containerd and CRI

Containerd's criService implements RuntimeServiceServer and ImageServiceServer, the server-side interfaces of the CRI services RuntimeService and ImageService.

criService is further wrapped in an instrumentedService, which ensures that every operation is executed in the k8s.io namespace.
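The wrapping can be pictured as a decorator that fixes the namespace on the context before delegating. This is a minimal sketch under assumptions, not Containerd's implementation: sandboxRunner, core, instrumented, nsKey, and run are hypothetical names, and the sketch stores the namespace with the standard library's context.WithValue instead of Containerd's own namespaces package.

```go
package main

import (
	"context"
	"fmt"
)

// nsKey is a private context key used only by this sketch.
type nsKey struct{}

// criNamespace is the namespace the wrapper forces on every call.
const criNamespace = "k8s.io"

// sandboxRunner is a hypothetical one-method slice of the CRI server.
type sandboxRunner interface {
	RunPodSandbox(ctx context.Context, name string) (string, error)
}

// core stands in for criService: it requires a namespace in the context.
type core struct{}

func (core) RunPodSandbox(ctx context.Context, name string) (string, error) {
	ns, ok := ctx.Value(nsKey{}).(string)
	if !ok {
		return "", fmt.Errorf("no namespace in context")
	}
	return ns + "/" + name, nil
}

// instrumented stands in for instrumentedService: it sets the namespace
// on the context and then delegates to the wrapped service.
type instrumented struct{ next sandboxRunner }

func (i instrumented) RunPodSandbox(ctx context.Context, name string) (string, error) {
	return i.next.RunPodSandbox(context.WithValue(ctx, nsKey{}, criNamespace), name)
}

// run is a small helper that calls a runner with a fresh background context.
func run(s sandboxRunner, name string) (string, error) {
	return s.RunPodSandbox(context.Background(), name)
}

func main() {
	id, err := run(instrumented{next: core{}}, "nginx")
	fmt.Println(id, err)
}
```

Calling core directly with a bare context fails in this sketch, which is the point of putting the wrapper in front of every entry point.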

RuntimeServiceServer

type RuntimeServiceServer interface {
	// Version returns the runtime name, runtime version, and runtime API version.
	Version(context.Context, *VersionRequest) (*VersionResponse, error)
	// RunPodSandbox creates and starts a pod-level sandbox. Runtimes must ensure
	// the sandbox is in the ready state on success.
	RunPodSandbox(context.Context, *RunPodSandboxRequest) (*RunPodSandboxResponse, error)
	// StopPodSandbox stops any running process that is part of the sandbox and
	// reclaims network resources (e.g., IP addresses) allocated to the sandbox.
	// If there are any running containers in the sandbox, they must be forcibly
	// terminated.
	// This call is idempotent, and must not return an error if all relevant
	// resources have already been reclaimed. kubelet will call StopPodSandbox
	// at least once before calling RemovePodSandbox. It will also attempt to
	// reclaim resources eagerly, as soon as a sandbox is not needed. Hence,
	// multiple StopPodSandbox calls are expected.
	StopPodSandbox(context.Context, *StopPodSandboxRequest) (*StopPodSandboxResponse, error)
	// RemovePodSandbox removes the sandbox. If there are any running containers
	// in the sandbox, they must be forcibly terminated and removed.
	// This call is idempotent, and must not return an error if the sandbox has
	// already been removed.
	RemovePodSandbox(context.Context, *RemovePodSandboxRequest) (*RemovePodSandboxResponse, error)
	// PodSandboxStatus returns the status of the PodSandbox. If the PodSandbox is not
	// present, returns an error.
	PodSandboxStatus(context.Context, *PodSandboxStatusRequest) (*PodSandboxStatusResponse, error)
	// ListPodSandbox returns a list of PodSandboxes.
	ListPodSandbox(context.Context, *ListPodSandboxRequest) (*ListPodSandboxResponse, error)
	// CreateContainer creates a new container in specified PodSandbox
	CreateContainer(context.Context, *CreateContainerRequest) (*CreateContainerResponse, error)
	// StartContainer starts the container.
	StartContainer(context.Context, *StartContainerRequest) (*StartContainerResponse, error)
	// StopContainer stops a running container with a grace period (i.e., timeout).
	// This call is idempotent, and must not return an error if the container has
	// already been stopped.
	// The runtime must forcibly kill the container after the grace period is
	// reached.
	StopContainer(context.Context, *StopContainerRequest) (*StopContainerResponse, error)
	// RemoveContainer removes the container. If the container is running, the
	// container must be forcibly removed.
	// This call is idempotent, and must not return an error if the container has
	// already been removed.
	RemoveContainer(context.Context, *RemoveContainerRequest) (*RemoveContainerResponse, error)
	// ListContainers lists all containers by filters.
	ListContainers(context.Context, *ListContainersRequest) (*ListContainersResponse, error)
	// ContainerStatus returns status of the container. If the container is not
	// present, returns an error.
	ContainerStatus(context.Context, *ContainerStatusRequest) (*ContainerStatusResponse, error)
	// UpdateContainerResources updates ContainerConfig of the container synchronously.
	// If runtime fails to transactionally update the requested resources, an error is returned.
	UpdateContainerResources(context.Context, *UpdateContainerResourcesRequest) (*UpdateContainerResourcesResponse, error)
	// ReopenContainerLog asks runtime to reopen the stdout/stderr log file
	// for the container. This is often called after the log file has been
	// rotated. If the container is not running, container runtime can choose
	// to either create a new log file and return nil, or return an error.
	// Once it returns error, new container log file MUST NOT be created.
	ReopenContainerLog(context.Context, *ReopenContainerLogRequest) (*ReopenContainerLogResponse, error)
	// ExecSync runs a command in a container synchronously.
	ExecSync(context.Context, *ExecSyncRequest) (*ExecSyncResponse, error)
	// Exec prepares a streaming endpoint to execute a command in the container.
	Exec(context.Context, *ExecRequest) (*ExecResponse, error)
	// Attach prepares a streaming endpoint to attach to a running container.
	Attach(context.Context, *AttachRequest) (*AttachResponse, error)
	// PortForward prepares a streaming endpoint to forward ports from a PodSandbox.
	PortForward(context.Context, *PortForwardRequest) (*PortForwardResponse, error)
	// ContainerStats returns stats of the container. If the container does not
	// exist, the call returns an error.
	ContainerStats(context.Context, *ContainerStatsRequest) (*ContainerStatsResponse, error)
	// ListContainerStats returns stats of all running containers.
	ListContainerStats(context.Context, *ListContainerStatsRequest) (*ListContainerStatsResponse, error)
	// PodSandboxStats returns stats of the pod sandbox. If the pod sandbox does not
	// exist, the call returns an error.
	PodSandboxStats(context.Context, *PodSandboxStatsRequest) (*PodSandboxStatsResponse, error)
	// ListPodSandboxStats returns stats of the pod sandboxes matching a filter.
	ListPodSandboxStats(context.Context, *ListPodSandboxStatsRequest) (*ListPodSandboxStatsResponse, error)
	// UpdateRuntimeConfig updates the runtime configuration based on the given request.
	UpdateRuntimeConfig(context.Context, *UpdateRuntimeConfigRequest) (*UpdateRuntimeConfigResponse, error)
	// Status returns the status of the runtime.
	Status(context.Context, *StatusRequest) (*StatusResponse, error)
	// CheckpointContainer checkpoints a container
	CheckpointContainer(context.Context, *CheckpointContainerRequest) (*CheckpointContainerResponse, error)
	// GetContainerEvents gets container events from the CRI runtime
	GetContainerEvents(*GetEventsRequest, RuntimeService_GetContainerEventsServer) error
}

ImageServiceServer

type ImageServiceServer interface {
	// ListImages lists existing images.
	ListImages(context.Context, *ListImagesRequest) (*ListImagesResponse, error)
	// ImageStatus returns the status of the image. If the image is not
	// present, returns a response with ImageStatusResponse.Image set to
	// nil.
	ImageStatus(context.Context, *ImageStatusRequest) (*ImageStatusResponse, error)
	// PullImage pulls an image with authentication config.
	PullImage(context.Context, *PullImageRequest) (*PullImageResponse, error)
	// RemoveImage removes the image.
	// This call is idempotent, and must not return an error if the image has
	// already been removed.
	RemoveImage(context.Context, *RemoveImageRequest) (*RemoveImageResponse, error)
	// ImageFSInfo returns information of the filesystem that is used to store images.
	ImageFsInfo(context.Context, *ImageFsInfoRequest) (*ImageFsInfoResponse, error)
}

Below, we use sandbox creation as the example and walk through the Containerd source code.

Containerd Source Code Analysis

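A request to create a sandbox container arrives over CRI's UDS (Unix domain socket) endpoint /runtime.v1.RuntimeService/RunPodSandbox and enters criService's handling flow. criService#RunPodSandbox() is responsible for creating and running the sandbox container and for making sure the container ends up in a healthy state.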
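The path /runtime.v1.RuntimeService/RunPodSandbox follows the gRPC full method name layout, "/package.Service/Method". The sketch below shows how such a name breaks down; splitFullMethod is a hypothetical helper written for this post and is not part of Containerd or gRPC.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFullMethod breaks a gRPC full method name of the form
// "/package.Service/Method" into its service and method parts.
func splitFullMethod(full string) (service, method string, err error) {
	if !strings.HasPrefix(full, "/") {
		return "", "", fmt.Errorf("malformed method name %q", full)
	}
	rest := full[1:]
	// The last slash separates the qualified service name from the method.
	i := strings.LastIndex(rest, "/")
	if i <= 0 || i == len(rest)-1 {
		return "", "", fmt.Errorf("malformed method name %q", full)
	}
	return rest[:i], rest[i+1:], nil
}

func main() {
	service, method, err := splitFullMethod("/runtime.v1.RuntimeService/RunPodSandbox")
	fmt.Println(service, method, err)
}
```

Here runtime.v1 is the protobuf package of the CRI v1 API, RuntimeService is the service, and RunPodSandbox is the method that ends up in criService#RunPodSandbox().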