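DingTalk alerts for Kubernetes events with kube-eventer
Introduction
Kubernetes has two types of events. A Warning event means that the state transition that produced it happened between unexpected states. A Normal event means that the state reached matches the expected state. Take the lifecycle of a Pod as an example: when a Pod is created, it first enters the Pending state while its image is pulled. Once the image has been pulled and the health checks pass, the Pod becomes Running, and Normal events are generated. If the Pod later dies at runtime because of an OOM or some other cause and enters the Failed state, which is unexpected, Kubernetes generates a Warning event. In scenarios like this, monitoring events lets us promptly spot problems that resource monitoring tends to miss.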
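The Normal/Warning distinction can be sketched as a simple filter over event objects. This is a minimal illustration, assuming event JSON in the shape returned by `kubectl get events -o json`; the sample events and the `warning_events` helper are hypothetical and not part of kube-eventer.

```python
import json


def warning_events(events_json):
    """Return only the Warning events from an EventList JSON document."""
    items = json.loads(events_json).get("items", [])
    return [event for event in items if event.get("type") == "Warning"]


# Hypothetical sample, shaped like `kubectl get events -o json` output.
SAMPLE = json.dumps({
    "items": [
        {"type": "Normal", "reason": "Started",
         "message": "Started container app",
         "involvedObject": {"kind": "Pod", "name": "web-0"}},
        {"type": "Warning", "reason": "BackOff",
         "message": "Back-off restarting failed container",
         "involvedObject": {"kind": "Pod", "name": "web-1"}},
    ]
})

if __name__ == "__main__":
    for event in warning_events(SAMPLE):
        obj = event["involvedObject"]
        print(f'{obj["kind"]}/{obj["name"]}: {event["reason"]}: {event["message"]}')
```

A tool such as kube-eventer does this kind of selection continuously against the API server and forwards the matches to a sink.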
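For event monitoring, the Kubernetes community used to provide a simple event export capability in Heapster. When Heapster was deprecated, that capability was archived along with it. To fill the gap, Alibaba Cloud Container Service released and open-sourced kube-eventer, a tool for exporting Kubernetes events. It can export Kubernetes events to DingTalk robots, SLS (Log Service), Kafka, InfluxDB, and other destinations.
Official repository: https://github.com/AliyunContainerService/kube-eventer
Supported notification channels:
- DingTalk robot
- WeChat
- SLS (Log Service)
- Kafka (open-source message queue)
- InfluxDB (time-series database)
- Elasticsearch
- MongoDB
- ...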
Deployment
Create a YAML file
Add the following content
DingTalk alerts
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: kube-eventer
  name: kube-eventer
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-eventer
  template:
    metadata:
      labels:
        app: kube-eventer
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      dnsPolicy: ClusterFirstWithHostNet
      serviceAccount: kube-eventer
      containers:
        - image: registry.aliyuncs.com/acs/kube-eventer-amd64:v1.2.0-484d9cd-aliyun
          name: kube-eventer
          command:
            - "/kube-eventer"
            # Read events from the in-cluster API server.
            - "--source=kubernetes:https://kubernetes.default"
            # DingTalk sink: robot webhook URL with its access_token, a label, and the event level to forward.
            - --sink=dingtalk:https://oapi.dingtalk.com/robot/send?access_token=da26636c287511a04ea3e57fdfd860439b2644ae50d0dee18311ab9c1c5a669b&label=huang&level=Warning
          env:
            # Use the Asia/Shanghai time zone inside the container.
            - name: TZ
              value: "Asia/Shanghai"
          volumeMounts:
            - name: localtime
              mountPath: /etc/localtime
              readOnly: true
            - name: zoneinfo
              mountPath: /usr/share/zoneinfo
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
            limits:
              cpu: 500m
              memory: 250Mi
      volumes:
        - name: localtime
          hostPath:
            path: /etc/localtime
        - name: zoneinfo
          hostPath:
            path: /usr/share/zoneinfo
---
# Read-only access to events (and configmaps) across the cluster.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-eventer
rules:
  - apiGroups:
      - ""
    resources:
      - configmaps
      - events
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-eventer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-eventer
subjects:
  - kind: ServiceAccount
    name: kube-eventer
    namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-eventer
  namespace: kube-system
```
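Before deploying, it can help to confirm that the robot token in the sink URL works at all. The sketch below only builds a plain text message request for the DingTalk robot endpoint used in the manifest; `build_dingtalk_request` is a hypothetical helper, `<ACCESS_TOKEN>` is a placeholder, and the actual send is left commented out. A robot configured with security settings such as keywords or signing may reject a message like this one.

```python
import json
import urllib.request

DINGTALK_ENDPOINT = "https://oapi.dingtalk.com/robot/send"


def build_dingtalk_request(access_token, content):
    """Build (but do not send) a text-message POST for a DingTalk robot."""
    url = f"{DINGTALK_ENDPOINT}?access_token={access_token}"
    body = json.dumps({"msgtype": "text", "text": {"content": content}})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # <ACCESS_TOKEN> is a placeholder; substitute the robot's real token.
    request = build_dingtalk_request("<ACCESS_TOKEN>", "kube-eventer connectivity test")
    print(request.get_method(), request.full_url)
    # Uncomment to actually send the message:
    # with urllib.request.urlopen(request, timeout=5) as response:
    #     print(response.read().decode("utf-8"))
```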
Feishu alerts
```yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: kube-eventer
  name: kube-eventer
  namespace: kube-system
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: kube-eventer
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: kube-eventer
    spec:
      containers:
        - command:
            - /kube-eventer
            - '--source=kubernetes:https://kubernetes.default'
            # Webhook sink pointing at a Feishu bot; the request body comes from the custom-body ConfigMap.
            - >-
              --sink=webhook:https://open.feishu.cn/open-apis/bot/v2/hook/1d4ec947-141d-4b39-a2ab-8f4c60e42f67?level=Warning&namespaces=[^kube-system]&reason=[^(FailedComputeMetricsReplicas|FailedGetResourceMetric)]&method=POST&header=Content-Type=application/json&custom_body_configmap=custom-body&custom_body_configmap_namespace=kube-system
          env:
            - name: TZ
              value: Asia/Shanghai
          image: 'registry.aliyuncs.com/acs/kube-eventer-amd64:v1.2.0-484d9cd-aliyun'
          imagePullPolicy: IfNotPresent
          name: kube-eventer
          resources:
            limits:
              cpu: 500m
              memory: 250Mi
            requests:
              cpu: 100m
              memory: 100Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /etc/localtime
              name: localtime
              readOnly: true
            - mountPath: /usr/share/zoneinfo
              name: zoneinfo
              readOnly: true
      dnsPolicy: ClusterFirstWithHostNet
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: kube-eventer
      serviceAccountName: kube-eventer
      terminationGracePeriodSeconds: 30
      volumes:
        - hostPath:
            path: /etc/localtime
            type: ''
          name: localtime
        - hostPath:
            path: /usr/share/zoneinfo
            type: ''
          name: zoneinfo
```
Custom template for Feishu alerts
```yaml
---
apiVersion: v1
data:
  content: |-
    {
      "msg_type": "interactive",
      "card": {
        "config": {
          "wide_screen_mode": true,
          "enable_forward": true
        },
        "header": {
          "title": {
            "tag": "plain_text",
            "content": "测试k8s"
          },
          "template": "Red"
        },
        "elements": [
          {
            "tag": "div",
            "text": {
              "tag": "lark_md",
              "content": "**EventName:** {{ .Name }}\n**EventNamespace:** {{ .Namespace }}\n**EventType:** {{ .Type }}\n**EventKind:** {{ .InvolvedObject.Kind }}\n**EventReason:** {{ .Reason }}\n**EventTime:** {{ .LastTimestamp }}\n**EventMessage:** {{ .Message }}"
            }
          }
        ]
      }
    }
kind: ConfigMap
metadata:
  name: custom-body
  namespace: kube-system
```
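To see how the `{{ .Field }}` placeholders in the template map onto event fields, the following sketch substitutes them in Python and checks whether the result is still valid JSON. It is only an approximation: the placeholders are Go template syntax, this code handles nothing beyond plain field references, and the `render` and `is_valid_json` helpers and the sample event are hypothetical. Whether kube-eventer escapes field values itself is not stated here, so the `escape` switch simply illustrates why a message containing double quotes is worth testing.

```python
import json
import re

# A shortened body template using the same placeholder style as the ConfigMap.
TEMPLATE = ('{"tag": "lark_md", "content": '
            '"**EventReason:** {{ .Reason }}\\n**EventMessage:** {{ .Message }}"}')

FIELD = re.compile(r"\{\{\s*\.([A-Za-z.]+)\s*\}\}")


def lookup(event, path):
    """Resolve a dotted path such as 'InvolvedObject.Kind' in a nested dict."""
    value = event
    for key in path.split("."):
        value = value[key]
    return str(value)


def render(template, event, escape=False):
    """Replace each {{ .Path }} placeholder with the matching event field."""
    def substitute(match):
        text = lookup(event, match.group(1))
        if escape:
            text = json.dumps(text)[1:-1]  # JSON-escape, drop the outer quotes
        return text
    return FIELD.sub(substitute, template)


def is_valid_json(text):
    """Report whether the text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    event = {"Reason": "Failed", "Message": 'Failed to pull image "nginx"',
             "InvolvedObject": {"Kind": "Pod"}}
    print(is_valid_json(render(TEMPLATE, event)))               # False
    print(is_valid_json(render(TEMPLATE, event, escape=True)))  # True
```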
Deploy
```shell
kubectl apply -f eventer.yaml
```
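Field descriptions
The Webhook sink can filter events by Kind, Reason, Level, and Namespace, and its generic design lets it export data to custom webhook systems, DingTalk, WeChat, BearyChat, Slack, and so on. So how is the generic webhook used to do this? Let's first look at the Webhook sink's parameters and their definitions.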
```
# --sink=webhook:<WEBHOOK_URL>&level=<Normal or Warning, Warning default>
--sink=webhook:<WEBHOOK_URL>?level=Warning&namespaces=ns1,ns2&kinds=Node,Pod&method=POST&header=customHeaderKey=customHeaderValue
```
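The sink argument is just a URL with query parameters, so it can be assembled programmatically. The sketch below is a hypothetical helper (`build_webhook_sink` is not part of kube-eventer) that reproduces the syntax line above. Values are joined as-is without percent-encoding, mirroring the form shown above.

```python
def build_webhook_sink(webhook_url, level="Warning", namespaces=(), kinds=(),
                       method=None, headers=None):
    """Assemble a --sink=webhook:... argument from its parts."""
    params = [("level", level)]
    if namespaces:
        params.append(("namespaces", ",".join(namespaces)))
    if kinds:
        params.append(("kinds", ",".join(kinds)))
    if method:
        params.append(("method", method))
    # Each header becomes its own header=<key>=<value> query field.
    for key, value in (headers or {}).items():
        params.append(("header", f"{key}={value}"))
    query = "&".join(f"{name}={value}" for name, value in params)
    return f"--sink=webhook:{webhook_url}?{query}"


if __name__ == "__main__":
    print(build_webhook_sink(
        "<WEBHOOK_URL>",
        namespaces=["ns1", "ns2"],
        kinds=["Node", "Pod"],
        method="POST",
        headers={"customHeaderKey": "customHeaderValue"},
    ))
```

Generating the argument this way keeps the separators (`?` before the first parameter, `&` between the rest) consistent when the filters change.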
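- level - Event level (optional. Default: Warning. Options: Warning and Normal)
- namespaces - Namespaces to filter on (optional. Default: all namespaces. Separate multiple namespaces with commas. Regexp patterns are supported)
- kinds - Kinds to filter on (optional. Default: all kinds. Separate multiple kinds with commas. Options: Node, Pod, and so on)
- reason - Reason to filter on (optional. Default: empty. Regexp patterns are supported). Multiple reason fields may be used in the query.
- method - HTTP method used to send the request (optional. Default: GET)
- header - Header to include in the request (optional. Default: empty). Multiple header fields may be used in the query.
- custom_body_configmap - Name of the ConfigMap holding the request body template. The template can be used to customize the request body. (Optional.)
- custom_body_configmap_namespace - Namespace of the ConfigMap holding the request body template. (Optional.)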
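Because `namespaces` and `reason` accept regular expressions, it is worth checking what a pattern really matches before relying on it. The sketch below tests the pattern `[^kube-system]` from the Feishu example against a few namespace names. It assumes the pattern is applied as an unanchored search, which is an assumption about kube-eventer's behaviour rather than something stated here; `passes_filter` is a hypothetical helper. The point it illustrates is standard regex semantics: square brackets form a character class, and `e-s` inside it is a range, so the pattern means "contains a character outside b, e to s, t, u, y" rather than "is not kube-system".

```python
import re

PATTERN = "[^kube-system]"


def passes_filter(pattern, value):
    """True when the pattern matches anywhere in the value (unanchored search)."""
    return re.search(pattern, value) is not None


if __name__ == "__main__":
    for namespace in ["kube-system", "default", "test", "monitoring"]:
        print(namespace, passes_filter(PATTERN, namespace))
    # kube-system True   (the literal '-' is outside the class)
    # default True       ('d' is outside the class)
    # test False         (every character is inside the class)
    # monitoring False   (every character is inside the class)
```

Under that assumption the pattern does not exclude `kube-system` and does exclude names such as `test`, so listing the wanted namespaces explicitly may be more predictable.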