Using Kubernetes for CI Build Jobs and General-Purpose Processing: Part 2

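This is the second article in a two-part series on Kubernetes. In Part 1 we reviewed the basic building blocks for deploying workloads on Kubernetes: Docker images/containers and Kubernetes pods. In this article we use the Kubernetes Job object, which offers better fault tolerance and scalability. Before diving in, it is strongly recommended to read Part 1 first, because this article builds on the building blocks introduced there.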

Running the example code

Running the code requires the following prerequisites. It is recommended to run the commands yourself as you read along:

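Install Docker.
Install kubectl and connect to a Kubernetes cluster. You can use kind or Minikube to run a local development cluster.
Fork the example code repository so that you can publish releases to it.
Create a personal access token with permission to manage releases.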

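All commands in this article should be run from the root directory of the repository. After forking the repository, clone it and open a terminal in the repository root.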

To make the code examples easier to run, set the following environment variables in your shell (replace the YourGitHub* values with your own details):

export GITHUB_TOKEN=YourGitHubPersonalAccessToken
export GITHUB_USER=YourGitHubUserName

Also set the following variable to keep the example commands shorter and easier to work with:

QUEUE_IMAGE=ghcr.io/orihoch/k8s-ci-processing-jobs-builder-queue

Hardening workload execution with the Kubernetes Job object

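In the previous article we saw how to run pods on a Kubernetes cluster. That works for most use cases, but it has some drawbacks. A Kubernetes cluster can be dynamic: nodes can be stopped for upgrades, or a pod can be scheduled on a node without enough RAM, which causes the pod to be terminated unexpectedly. For this reason it is generally recommended not to use pods directly. It is better to use a higher-level abstraction and let Kubernetes handle such unexpected failures.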

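For CI build jobs and other one-off processing tasks, the recommended abstraction is the Kubernetes Job object. A Job object manages and schedules pods and makes sure the job runs to completion.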

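Jobs are defined in Kubernetes YAML files (all example YAML files are in the manifests/ directory of the repository). We use envsubst for shell templating, which makes it simple to create multiple objects from the same template.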

Let's start with a simple example that wraps the pod used in Part 1 in a Kubernetes Job object:

# manifests/single-pod-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "builder-$TAG-$OS-$ARCH"
spec:
  template:
    spec:
      containers:
      - name: builder
        image: $BUILDER_IMAGE
        args: ["$OS/$ARCH", "$GITHUB_USER", "$TAG"]
        env:
        - name: TOKEN
          value: "$GITHUB_TOKEN"
      restartPolicy: Never

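name: the name of the Job object to create. It is generated dynamically by envsubst from environment variables.
image: the Docker image to deploy. This is the same image used in Part 1 to compile and publish a simple hello world Go binary. It could be replaced with a C++ compilation or any other build or processing task.
args: the arguments passed to the image. In this example they are the OS/architecture to compile for, and the GitHub username and tag name under which the binary is published.
env: the GitHub token is added to the container as an environment variable, which allows the script to publish the binary to GitHub.
restartPolicy: by default, Kubernetes restarts pods that stop unexpectedly. Our build script only needs to run once, so the pod itself is never restarted; retries are left to the Job object.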

Let's publish a new version, v0.0.3, by deploying a few jobs to the cluster with the following commands:

cat manifests/single-pod-job.yaml | OS=linux ARCH=amd64 TAG=v0.0.3 envsubst  | kubectl apply -f -
cat manifests/single-pod-job.yaml | OS=linux ARCH=386 TAG=v0.0.3 envsubst    | kubectl apply -f -
cat manifests/single-pod-job.yaml | OS=windows ARCH=arm TAG=v0.0.3 envsubst  | kubectl apply -f -

Let's break down what each of these commands does:

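- cat manifests/single-pod-job.yaml |
Reads the manifest file and pipes its contents to the next command.
- OS=linux ARCH=amd64 TAG=v0.0.3 envsubst |
Runs envsubst, which ships with GNU gettext and is available on most Linux systems, and provides basic templating. It replaces the environment variable references in the manifest with their actual values, so the same file can be reused with different values. The result is piped to the next command.
- kubectl apply -f -
Applies the manifest to the Kubernetes cluster, creating the Job object.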
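
To make the substitution step concrete, here is a minimal Python sketch of what envsubst does to the manifest. It is an illustration under assumptions, not the tool's implementation: the render helper and the shortened template are hypothetical, and only the $NAME and ${NAME} forms are handled.

```python
import re

# Shortened, hypothetical excerpt of manifests/single-pod-job.yaml.
TEMPLATE = '''metadata:
  name: "builder-$TAG-$OS-$ARCH"
spec:
  template:
    spec:
      containers:
      - name: builder
        args: ["$OS/$ARCH", "$GITHUB_USER", "$TAG"]
'''

# Matches ${NAME} or $NAME, where NAME is a valid shell identifier.
_VAR = re.compile(r"\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))")


def render(template: str, env: dict) -> str:
    """Replace $NAME / ${NAME} with values from env; unset names become empty."""
    def lookup(match):
        name = match.group(1) or match.group(2)
        return env.get(name, "")
    return _VAR.sub(lookup, template)


# One rendered manifest per target, all from the same template.
targets = [("linux", "amd64"), ("linux", "386"), ("windows", "arm")]
rendered = [
    render(TEMPLATE, {"OS": os_name, "ARCH": arch, "TAG": "v0.0.3",
                      "GITHUB_USER": "YourGitHubUserName"})
    for os_name, arch in targets
]
print(rendered[0])
```

The point of the sketch is that the template stays fixed while only the variable values change per target, which is why a single manifest file can produce all three jobs.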

After running the commands above, you can see the pods, just as in the previous article:

kubectl get pods

This time, however, you can also see the Job objects:

kubectl get jobs

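The Job object tracks its pods and makes sure each one runs to completion. If a pod fails, the Job object retries by scheduling a new pod, up to 6 times by default (configurable through the backoffLimit attribute). This means that unexpected failures such as a node failure or insufficient RAM do not prevent the job from finishing, because Kubernetes keeps retrying it.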
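
As a hedged illustration of where backoffLimit sits in the Job spec, the following Python sketch builds a Job like the one above as a dictionary and prints it as JSON, a format that kubectl apply -f - also accepts. The build_job helper and the image name are hypothetical placeholders, and the value 3 is an arbitrary example.

```python
import json


def build_job(name, image, args, backoff_limit=6):
    """Return a batch/v1 Job manifest as a plain dict.

    backoff_limit maps to spec.backoffLimit; 6 mirrors the default
    retry count mentioned in the text.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": backoff_limit,
            "template": {
                "spec": {
                    "containers": [
                        {"name": "builder", "image": image, "args": args}
                    ],
                    "restartPolicy": "Never",
                }
            },
        },
    }


job = build_job(
    name="builder-v0.0.3-linux-amd64",
    image="example.invalid/builder",  # placeholder, not the real builder image
    args=["linux/amd64", "YourGitHubUserName", "v0.0.3"],
    backoff_limit=3,
)
print(json.dumps(job, indent=2))
```

Note that backoffLimit belongs to the Job spec, not to the pod template, which is why it sits next to template rather than inside it.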

Once the jobs have completed, delete the Job objects to clean up their pods and keep the cluster tidy:

kubectl delete job builder-v0.0.3-linux-amd64 builder-v0.0.3-linux-386 builder-v0.0.3-windows-arm

When you delete a Job object, the pods it created are deleted as well.

Scaling with a job queue

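The build script supports 44 OS/architecture combinations. If your cluster has enough capacity, it is best to build them all in parallel. So far, however, every example required scheduling each job separately. An advantage of the Kubernetes Job object is that it can start many pods in parallel and wait for all of them to finish processing.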

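To use this feature we need a queue that stores the items to be processed, plus the queue-handling logic: fetching items from the queue, handling timeouts and errors, and so on. There are different ways to implement this, and your company may already have a suitable solution in place. In this example I use a simple queue based on Redis and a minimal amount of Python glue code.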
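
The queue-handling logic described above can be sketched in a few lines of Python. This is a simplified, in-memory stand-in and not the repository's Redis-based code: the drain function, the max_attempts value, and the fake_build handler are all hypothetical, and timeouts are not covered.

```python
from collections import deque


def drain(queue, handler, max_attempts=3):
    """Process items until the queue is empty.

    A failed item is put back at the end of the queue until it has been
    attempted max_attempts times; after that it is recorded as failed.
    """
    done, failed = [], []
    while queue:
        item, attempts = queue.popleft()
        try:
            handler(item)
            done.append(item)
        except Exception:
            if attempts + 1 < max_attempts:
                queue.append((item, attempts + 1))
            else:
                failed.append(item)
    return done, failed


# Hypothetical handler: pretend that one target always fails to build.
def fake_build(target):
    if target == "plan9/arm":
        raise RuntimeError(f"cannot build {target}")


# Each queue entry is (item, number of attempts so far).
work = deque((t, 0) for t in ["linux/amd64", "plan9/arm", "windows/386"])
done, failed = drain(work, fake_build)
print("done:", done)
print("failed:", failed)
```

A real queue server adds what this sketch lacks: the items survive a worker crash, and several workers on different machines can take items from the same queue.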

All of the code is in the builder-queue/ directory. I will highlight the key parts:

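builder-queue/Dockerfile: extends the builder image that contains the build script. It can easily be modified to add queue capabilities to any build script or processing task. We add Python 3 and the rq library, which helps implement the queue.
builder-queue/builder_queue_entrypoint.sh: overrides the builder entrypoint and adds commands for working with the queue: getting information about the queue, adding items to the queue, and running a worker that processes items from the queue.
builder-queue/builder_queue.py: adds items to the queue. Each supported OS/architecture is added to the queue as a separate item.
builder-queue/builder_queue_lib.py: processes each item by calling the original entrypoint of the builder script, which compiles for the given OS/architecture and publishes the binary.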

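To deploy the job queue we first need a queue server. In this example we use Redis. The repository contains a simple YAML file, manifests/redis.yaml, with a Redis deployment and service. The following commands deploy it and wait for the deployment to become available: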

kubectl apply -f manifests/redis.yaml && \
kubectl wait deployment/redis --for condition=available

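kubectl apply -f manifests/redis.yaml: applies the given manifest file to the cluster. In this case we do not need to pass it through envsubst because there are no environment variables to replace.
kubectl wait deployment/redis --for condition=available: waits until the Redis deployment is available.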

To add jobs to the queue and query the queue status we need access to the Redis server. The kubectl port-forward command enables this:

kubectl port-forward deployment/redis 6379 &

Local port 6379 is now forwarded to the Redis deployment on your Kubernetes cluster.

Let's publish a new version, v0.0.4. Run the following command to add all OS/architecture combinations to the queue:

docker run --network host $QUEUE_IMAGE --rq-add all $GITHUB_USER v0.0.4

You can look at the code to see what this command does. In essence, it adds items to the Redis queue, one item per OS/architecture.

Everything is now in place to run the actual workload, using the following YAML:

# manifests/multi-pod-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "builder-queue"
spec:
  parallelism: 4
  template:
    spec:
      containers:
      - name: builder-queue
        image: ghcr.io/orihoch/k8s-ci-processing-jobs-builder-queue
        args: ["--rq-worker"]
        env:
        - name: TOKEN
          value: "$GITHUB_TOKEN"
        - name: RQ_REDIS_HOST
          value: "redis"
      restartPolicy: OnFailure

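The main difference from the earlier simple job is the parallelism attribute. In this example it is set to 4, which means 4 pods are started in parallel. We use the builder-queue image with the --rq-worker argument, which processes jobs from the queue until no items remain. We set restartPolicy to OnFailure so that a pod is restarted if an error occurs. When no items remain in the queue, the process exits with a success status and the pod is not restarted.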
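
The behaviour of the parallel workers can be modelled with a small Python sketch. This is an analogy using threads and an in-memory queue, not how Kubernetes or rq is implemented: the worker and run functions are hypothetical, each thread plays the role of one pod, and returning from the function corresponds to the process exiting successfully.

```python
import queue
import threading

PARALLELISM = 4  # mirrors the parallelism value in the manifest


def worker(items, results, lock):
    """Take items until the queue is empty, then return (like a pod exiting)."""
    while True:
        try:
            item = items.get_nowait()
        except queue.Empty:
            return  # nothing left: exit successfully, no restart needed
        with lock:
            results.append(f"built {item}")


def run(targets, parallelism=PARALLELISM):
    """Fill the queue, start the workers, and wait for all of them to finish."""
    items = queue.Queue()
    for target in targets:
        items.put(target)
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(items, results, lock))
        for _ in range(parallelism)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run([f"os{i}/arch" for i in range(10)])
print(len(results), "items processed")
```

Because every worker stops on its own once the queue is empty, raising the parallelism value changes only how many workers share the items, not the total amount of work done.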

Deploy this job with the following command:

cat manifests/multi-pod-job.yaml | envsubst | kubectl apply -f -

Check that the pods were created and wait for them to start running:

kubectl get pods

You should see 4 pods, matching the number set in the parallelism attribute. While the pods are running, you can check the queue status with the following command:

docker run --network host $QUEUE_IMAGE --rq-info

You should see that all workers are busy and that the number of items in the queue slowly decreases.

Once all items in the queue have been processed, every pod should show a status of Completed.

You can now stop the port forwarding to the Redis deployment by running the following command:

kill %1

Finally, delete the Job object to clean up the pods it created and keep the cluster tidy:

kubectl delete job builder-queue

Summary

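This article builds on the previous one and takes full advantage of the fault tolerance and scalability that Kubernetes provides. I also recommend reading more about the Kubernetes Job object to learn about all of its available features and configuration options.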

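The hello world program we built with Go in these examples could just as well be a C++ compilation or any other build job, data processing job, or time-consuming task. By changing the parallelism attribute you can easily scale the number of pods from the 4 in this example to hundreds running in parallel. CI systems are effective at handling CI jobs, but they sometimes have limitations, and in those cases the Kubernetes Job object is a useful tool to have in your toolbox.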

Ori Hoch

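Ori is a DevOps consultant with more than 15 years of technical project experience, ranging from small startups to large companies. Ori specializes in helping teams upgrade their DevOps, CI/CD, and automation systems, as well as Kubernetes and cloud-native systems. Ori has long been active in open data and open source projects and has made many contributions. For details, see his GitHub profile: https://github.com/OriHoch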
