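Design and Implementation of an Elastic Self-Organizing Multi-Cluster Management System - AET - Application of Electronic Technique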

Cyber Security and Data Governance

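Xia Lingming, Zhou Jun, Zhao Feng

Future Network Research Center, Network Communication and Security Purple Mountain Laboratory, Nanjing, Jiangsu 211111, China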

Abstract: When cloud-native technologies such as Kubernetes are applied in industry, their carrying capacity is limited, they cannot meet higher availability requirements, and they are prone to cloud-vendor lock-in. Strategies such as "Eastern Data, Western Computing" depend on multi-cluster management technology, yet traditional cloud management platforms struggle to deploy and govern services across multiple clouds. To address these problems, this paper proposes two new concepts, software-defined self-organizing infrastructure management and idempotent hierarchical scheduling, and implements an elastic infrastructure management architecture that takes the cluster as its smallest unit. Multiple Kubernetes clusters can be organized into arbitrary topologies, such as centralized, decentralized, or tree-shaped, for cross-cloud application scheduling and management. The scheme was tested on a tree-shaped cluster structure and compared with other schemes. The results indicate that it can meet the organization and management needs of massive numbers of clusters in future distributed cloud scenarios, while keeping the latency of joining a new cluster at no more than 1 s and the application scheduling latency at no more than 200 ms.

Keywords: self-organizing infrastructure; distributed cloud; idempotent hierarchical scheduling

CLC number: TP393    Document code: A    DOI: 10.19358/j.issn.2097-1788.2023.12.014
Citation format: Xia Lingming, Zhou Jun, Zhao Feng. Design and implementation of an elastic self-organizing multi-cluster management system [J]. Cyber Security and Data Governance, 2023, 42(12): 84-89.

Introduction

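A single Kubernetes [1] cluster cannot meet requirements for edge deployment, geographic distribution, and resource management. Typical multi-cluster scenarios such as "Eastern Data, Western Computing" [2] therefore have to address cluster access control, cluster resource abstraction, permission management, application management, multi-cluster scheduling, service maintenance, multi-tenancy, and multi-cluster service discovery [3-5], which greatly increases the complexity and difficulty of multi-cluster solutions. In both the community and industry, cluster topologies are currently dominated by a two-layer parent-child architecture: the parent cluster acts as the control cluster, and the remaining clusters are child clusters that carry the workloads. The four mainstream solutions are the KubeFed [6-7] federation scheme, Karmada [8], Clusternet [9], and Admiralty [10]. KubeFed and Karmada belong to the same category: they use Template, Override, and Propagation to define a workload's common configuration, cluster-specific configuration, and scheduling policy, respectively. Karmada evolved from Kubernetes Federation but supports richer pluggable scheduling capabilities and features such as multi-cluster services (Multi-Cluster Service), and it has become a CNCF incubating project. However, both support only a centralized two-layer architecture, so their scalability and carrying capacity face theoretical bottlenecks. Clusternet is a multi-cluster solution that follows the OCM model and has been accepted as a CNCF sandbox project; a child cluster joins the parent cluster at startup by means of a controlled token.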
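The division of labor among Template, Override, and Propagation can be illustrated with a minimal sketch. It is not the KubeFed or Karmada API: the function names `render_for_clusters` and `set_path`, the dotted-path override format, and the cluster names are all hypothetical, chosen only to show how a common configuration, cluster-specific settings, and a placement policy might combine into per-cluster manifests.

```python
from copy import deepcopy


def set_path(obj, dotted_path, value):
    """Set a nested dictionary value addressed by a dotted path (hypothetical helper)."""
    keys = dotted_path.split(".")
    for key in keys[:-1]:
        obj = obj.setdefault(key, {})
    obj[keys[-1]] = value


def render_for_clusters(template, overrides, propagation):
    """Return {cluster_name: manifest} for every cluster the policy selects."""
    rendered = {}
    for cluster in propagation["clusters"]:
        manifest = deepcopy(template)  # common configuration (Template)
        for path, value in overrides.get(cluster, {}).items():
            set_path(manifest, path, value)  # cluster-specific configuration (Override)
        rendered[cluster] = manifest
    return rendered


# Illustrative inputs; names and values are assumptions, not taken from the paper.
template = {
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {"replicas": 2, "image": "nginx:1.25"},
}
overrides = {"cluster-west": {"spec.replicas": 5}}
propagation = {"clusters": ["cluster-east", "cluster-west"]}  # scheduling policy (Propagation)

manifests = render_for_clusters(template, overrides, propagation)
```

In this sketch the parent cluster would hold the three inputs and push each rendered manifest to the matching child cluster, which mirrors the two-layer parent-child arrangement described above.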

Article download: https://www.chinaaet.com/resource/share/2000005882

Original content statement: This content is original to the AET website. Reproduction without authorization is prohibited.
