Skip to content

自定义 Insight 组件调度策略

当部署可观测平台 Insight 到 Kubernetes 环境时,正确的资源管理和优化至关重要。 Insight 包含多个核心组件,如 Prometheus、OpenTelemetry、FluentBit、Vector、Elasticsearch 等, 这些组件在运行过程中可能因为资源占用问题对集群内其他 Pod 的性能产生负面影响。 为了有效地管理资源并优化集群的运行,节点亲和性成为一项重要的配置选项。

本文将重点探讨如何通过污点节点亲和性的配置策略,使得每个组件能够在适当的节点上运行, 并避免资源竞争或争用,从而确保整个 Kubernetes 集群的稳定性和高效性。

通过污点为 Insight 配置专有节点

由于 Insight Agent 包含了 DaemonSet 组件,所以本节所述的配置方式是让除了 Insight DameonSet 之外的其余组件均运行在专有节点上。

该方式是通过为专有节点添加污点(taint),并配合污点容忍度(tolerations)来实现的。 更多细节可以参考 Kubernetes 官方文档

可以参考如下命令为节点添加及移除污点:

# 添加污点
kubectl taint nodes worker1 node.daocloud.io=insight-only:NoSchedule

# 移除污点
kubectl taint nodes worker1 node.daocloud.io:NoSchedule-

有以下两种途径让 Insight 组件调度至专有节点:

1. 为每个组件添加污点容忍度

针对 insight-serverinsight-agent 两个 Chart 分别进行配置:

server:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"

ui:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"

runbook:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"

# mysql:
victoria-metrics-k8s-stack:
  victoria-metrics-operator:
    tolerations:
      - key: "node.daocloud.io"
        operator: "Equal"
        value: "insight-only"
        effect: "NoSchedule"
  vmcluster:
    spec:
      vmstorage:
        tolerations:
          - key: "node.daocloud.io"
            operator: "Equal"
            value: "insight-only"
            effect: "NoSchedule"
      vmselect:
        tolerations:
          - key: "node.daocloud.io"
            operator: "Equal"
            value: "insight-only"
            effect: "NoSchedule"
      vminsert:
        tolerations:
          - key: "node.daocloud.io"
            operator: "Equal"
            value: "insight-only"
            effect: "NoSchedule"
  vmalert:
    spec:
      tolerations:
        - key: "node.daocloud.io"
          operator: "Equal"
          value: "insight-only"
          effect: "NoSchedule"
  alertmanager:
    spec:
      tolerations:
        - key: "node.daocloud.io"
          operator: "Equal"
          value: "insight-only"
          effect: "NoSchedule"

jaeger:
  collector:
    tolerations:
      - key: "node.daocloud.io"
        operator: "Equal"
        value: "insight-only"
        effect: "NoSchedule"
  query:
    tolerations:
      - key: "node.daocloud.io"
        operator: "Equal"
        value: "insight-only"
        effect: "NoSchedule"

opentelemetry-collector-aggregator:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"

opentelemetry-collector:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"

grafana-operator:
  operator:
    tolerations:
      - key: "node.daocloud.io"
        operator: "Equal"
        value: "insight-only"
        effect: "NoSchedule"
  grafana:
    tolerations:
      - key: "node.daocloud.io"
        operator: "Equal"
        value: "insight-only"
        effect: "NoSchedule"
kibana:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"

elastic-alert:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"

vector:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      tolerations:
        - key: "node.daocloud.io"
          operator: "Equal"
          value: "insight-only"
          effect: "NoSchedule"
  prometheus-node-exporter:
    tolerations:
      - effect: NoSchedule
        operator: Exists
  prometheusOperator:
    tolerations:
      - key: "node.daocloud.io"
        operator: "Equal"
        value: "insight-only"
        effect: "NoSchedule"

kube-state-metrics:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"
opentelemetry-operator:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"
opentelemetry-collector:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"
tailing-sidecar-operator:
  operator:
    tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"
opentelemetry-kubernetes-collector:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"
prometheus-blackbox-exporter:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"
etcd-exporter:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule" 

2. 通过命名空间级别配置

insight-system 命名空间的 Pod 都容忍 node.daocloud.io=insight-only 污点。

  1. 调整 apiserver 的配置文件 /etc/kubernetes/manifests/kube-apiserver.yaml,放开 PodTolerationRestriction,PodNodeSelector, 参考下图:

    insight-ns-toleration

  2. insight-system 命名空间增加注解:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: insight-system
      annotations:
        scheduler.alpha.kubernetes.io/defaultTolerations: '[{"operator": "Equal", "effect": "NoSchedule", "key": "node.daocloud.io", "value": "insight-only"}]'
    

重启 insight-system 命名空间下面的组件即可正常容忍 insight-system 下的 Pod 调度。

为节点添加 Label 和节点亲和性来管理组件调度

Info

节点亲和性概念上类似于 nodeSelector,它使你可以根据节点上的 标签(label) 来约束 Pod 可以调度到哪些节点上。 节点亲和性有两种:

  1. requiredDuringSchedulingIgnoredDuringExecution:调度器只有在规则被满足的时候才能执行调度。此功能类似于 nodeSelector, 但其语法表达能力更强。
  2. preferredDuringSchedulingIgnoredDuringExecution:调度器会尝试寻找满足对应规则的节点。如果找不到匹配的节点,调度器仍然会调度该 Pod。

更过细节请参考 kubernetes 官方文档

为了实现不同用户对 Insight 组件调度的灵活需求,Insight 分别提供了较为细粒度的 Label 来实现不同组件的调度策略,以下是标签与组件的关系说明:

标签 Key 标签 Value 说明
node.daocloud.io/insight-any 任意值,推荐用 true 代表 Insight 所有组件优先考虑带了该标签的节点
node.daocloud.io/insight-prometheus 任意值,推荐用 true 特指 Prometheus 组件
node.daocloud.io/insight-vmstorage 任意值,推荐用 true 特指 VictoriaMetrics vmstorage 组件
node.daocloud.io/insight-vector 任意值,推荐用 true 特指 Vector 组件
node.daocloud.io/insight-otel-col 任意值,推荐用 true 特指 OpenTelemetry 组件

可以参考如下命令为节点添加及移除标签:

# 为 node8 添加标签,先将 insight-prometheus 调度到 node8 
kubectl label nodes node8 node.daocloud.io/insight-prometheus=true

# 移除 node8 的 node.daocloud.io/insight-prometheus 标签
kubectl label nodes node8 node.daocloud.io/insight-prometheus-

以下是 insight-prometheus 组件在部署时默认的亲和性偏好:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - preference:
        matchExpressions:
        - key: node-role.kubernetes.io/control-plane
          operator: DoesNotExist
      weight: 1
    - preference:
        matchExpressions:
        - key: node.daocloud.io/insight-prometheus # (1)!
          operator: Exists
      weight: 2
    - preference:
        matchExpressions:
        - key: node.daocloud.io/insight-any
          operator: Exists
      weight: 3
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          podAffinityTerm:
            topologyKey: kubernetes.io/hostname
            labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/instance
                  operator: In
                  values:
                    - insight-agent-kube-prometh-prometheus
  1. 先将 insight-prometheus 调度到带有 node.daocloud.io/insight-prometheus 标签的节点