k8s使用

  • kubectl创建别名
alias k=kubectl
  • tab补全命令
yum install -y bash-completion
source /usr/share/bash-completion/bash_completion
source <(kubectl completion bash | sed s/kubectl/k/g)
  • kubernetes运行应用
kubectl run kubia --image=luksa/kubia --port=8080 --generator=run/v1
#--image=luksa/kubia 容器运行时所需镜像
#--port=8080 监听8080端口
  • 创建一个服务
#创建一个LoadBalancer服务
kubectl expose pod kubia --type=LoadBalancer --name kubia-http
#查看
kubectl get svc
kubectl scale rc nginx-test --replicas=3
  • 公共配置参数
--log-backtrace-at traceLocation 记录日志每到 file:行号时打印一次stack trace 默认值0
--log-dir string 日志文件路径
--log-flush-frequency duration 设置flush日志文件的时间间隔 默认值5s
--logtostderr 设置为true表示将日志输出到stderr 不输出到日志文件
--alsologtostderr 设置为true表示将日志输出到日志文件同时输出到stderr
--stderrthreshold severity 将threshold级别以上的日志输出到stderr 默认值2
--v Level 配置日志级别
--vmodule moduleSpec 详细日志级别
--version version[=true] 输出其版本号
  • kube-apiserver 启动参数
--admission-control strings 对发送给apiserver的任何请求进行准入控制 配置为一个准入控制器列表
AlwaysAdmit 允许所有请求
AlwaysDeny 禁止所有请求
AlwaysPullImages 启动容器之前总是去下载镜像
DefaultStorageClass 实现共享存储动态供应 为未指定StorageClass或PV的PVC匹配默认StorageClass
DefaultTolerationSeconds 设置默认容忍时间 5min
DenyEscalatingExec 拦截 所有exec和attach到具有特权的Pod上的请求
DenyExecOnPrivileged 拦截所有想在privileged container上执行命令的请求
ImagePolicyWebhook 允许后端webhook程序完成admission controller
LimitRanger 配额管理
NamespaceLifecycle 拒绝在不存在namespace中创建资源对象的请求 删除namespace时删除所有对象
PodPreset pod启动时注入应用所需设置
PodSecurityPolicy 对pod进行安全策略控制
ResourceQuota 配额管理
……
--advertise-address ip 广播给集群所有成员自己的IP地址
--allow-privileged 配置为true 允许pod中运行拥有系统特权的容器应用
--anonymous-auth 配置为true表示apiserver接收匿名请求 默认值true
--apiserver-count 集群中运行apiserver数量 默认值1
--authorization-mode 认证模式列表 多个以逗号分隔
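
The admission controller list above mentions ResourceQuota/LimitRanger for quota management; a minimal ResourceQuota sketch for illustration (the name, namespace and limits are assumptions, not from the original notes):
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota       # example name (assumed)
  namespace: dev            # example namespace (assumed)
spec:
  hard:
    pods: "10"              # at most 10 Pods in this namespace
    requests.cpu: "4"       # total CPU requests of all Pods
    requests.memory: 8Gi    # total memory requests of all Pods
    limits.cpu: "8"
    limits.memory: 16Gi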

Pod 资源文件详细说明

属性名称 取值类型 是否必需 取值说明
apiVersion String yes v1
kind String yes Pod
metadata Object yes 元数据
metadata.name String yes Pod的名称
metadata.namespace String yes Pod所属名称空间
metadata.labels[] List 自定义标签列表
metadata.annotation[] List 自定义注解列表
spec Object yes Pod中容器详细定义
spec.containers[] List yes Pod中的容器列表
spec.containers[].name String yes 容器的名称
spec.containers[].image String yes 容器的镜像名称
spec.containers[].imagePullPolicy String 获取镜像策略
spec.containers[].command[] List 容器启动命令列表
spec.containers[].args[] List 启动命令参数列表
spec.containers[].workingDir String 容器工作目录
spec.containers[].volumeMounts[] List 容器存储卷配置
spec.containers[].volumeMounts[].name String 共享存储卷名称
spec.containers[].volumeMounts[].mountPath String 存储卷容器内挂载绝对路径
spec.containers[].volumeMounts[].readOnly Boolean 是否只读模式,默认读写模式
spec.containers[].ports[] List 容器暴露的端口号列表
spec.containers[].ports[].name String 端口的名称
spec.containers[].ports[].containerPort Int 容器需要监听的端口号
spec.containers[].ports[].hostPort Int 默认与containerPort一致
spec.containers[].ports[].protocol String 端口协议TCP UDP 默认TCP
spec.containers[].env[] List 容器需要环境变量列表
spec.containers[].env[].name String 环境变量的名称
spec.containers[].env[].value String 环境变量的值
spec.containers[].resources Object 资源限制和资源请求设置
spec.containers[].resources.limits Object 资源限制的设置
spec.containers[].resources.limits.cpu String CPU限制 单位为core数
spec.containers[].resources.limits.memory String 内存限制 单位MiB/GiB
spec.containers[].resources.requests Object 请求限制的设置
spec.containers[].resources.requests.cpu String CPU请求 单位为core数
spec.containers[].resources.requests.memory String 内存请求 单位MiB/GiB
spec.volumes[] List Pod定义共享存储卷列表
spec.volumes[].name String 共享存储卷的名称
spec.volumes[].emptyDir Object 与Pod同生命周期的临时目录
spec.volumes[].hostPath Object Pod所在宿主机的目录
spec.volumes[].hostPath.path String Pod所在宿主机的目录
spec.volumes[].secret Object 挂载预定义secret对象到容器
spec.volumes[].configMap Object 挂载预定义configMap对象到容器
spec.containers[].livenessProbe Object 健康检查配置
spec.containers[].livenessProbe.exec Object 使用exec方式
spec.containers[].livenessProbe.exec.command[] String 指定命令或脚本
spec.containers[].livenessProbe.httpGet Object 使用httpGet方式 path port
spec.containers[].livenessProbe.tcpSocket Object 使用tcpSocket方式
spec.containers[].livenessProbe.initialDelaySeconds Number 启动后首次探测时间 单位s
spec.containers[].livenessProbe.timeoutSeconds Number 探测超时时间 默认1s
spec.containers[].livenessProbe.periodSeconds Number 探测时间间隔 默认10s
spec.restartPolicy String 重启策略
spec.nodeSelector Object Pod调度到包含label的Node key:value格式指定
spec.imagePullSecrets Object Pull镜像使用secret
spec.hostNetwork Boolean 是否使用主机网络模式
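
A minimal Pod manifest sketch that ties the table above to concrete YAML (all names, the image and the values are placeholders, not from the original notes):
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod              # metadata.name
  namespace: default          # metadata.namespace
  labels:
    app: demo                 # metadata.labels[]
spec:
  containers:
  - name: demo                # spec.containers[].name
    image: nginx              # spec.containers[].image
    imagePullPolicy: IfNotPresent
    ports:
    - containerPort: 80       # spec.containers[].ports[].containerPort
    env:
    - name: DEMO_ENV          # spec.containers[].env[]
      value: "1"
    resources:
      requests:
        cpu: "0.1"            # spec.containers[].resources.requests.cpu
        memory: 64Mi
      limits:
        cpu: "0.5"            # spec.containers[].resources.limits.cpu
        memory: 128Mi
    volumeMounts:
    - name: tmp-vol           # spec.containers[].volumeMounts[].name
      mountPath: /tmp/data
  volumes:
  - name: tmp-vol
    emptyDir: {}              # spec.volumes[].emptyDir
  restartPolicy: Always       # spec.restartPolicy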

Service 资源文件详细说明

属性名称 取值类型 是否必需 取值说明
apiVersion String yes v1
kind String yes Service
metadata Object yes 元数据
metadata.name String yes Pod的名称
metadata.namespace String yes Pod所属名称空间
metadata.labels[] List 自定义标签列表
metadata.annotation[] List 自定义注解列表
spec Object yes Service的详细定义
spec.selector[] List yes 选择指定label标签的Pod
spec.type String yes service的类型默认ClusterIP
spec.clusterIP String 虚拟服务IP地址
spec.sessionAffinity String 是否支持session 默认为空 可选ClientIP 同一客户端到同一后端Pod
spec.ports[] List service需要暴露端口列表
spec.ports[].name String 端口名称
spec.ports[].protocol String 端口协议 TCP UDP 默认TCP
spec.ports[].port int 服务监听端口号
spec.ports[].targetPort int 需要转发到后端Pod的端口号
spec.ports[].nodePort int 当type=NodePort时 映射宿主机端口号
status object 当type=LoadBalancer时 设置外部负载均衡器地址
status.loadBalancer object 外部负载均衡器
status.loadBalancer.ingress object 外部负载均衡器
status.loadBalancer.ingress.ip string 外部负载均衡器的IP地址
status.loadBalancer.ingress.hostname string 外部负载均衡器的主机名
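
A matching Service manifest sketch for the fields in this table (names and port numbers are placeholders):
apiVersion: v1
kind: Service
metadata:
  name: demo-svc
  namespace: default
spec:
  type: NodePort              # spec.type
  selector:
    app: demo                 # spec.selector
  sessionAffinity: ClientIP   # spec.sessionAffinity
  ports:
  - name: http                # spec.ports[].name
    protocol: TCP
    port: 80                  # spec.ports[].port
    targetPort: 80            # spec.ports[].targetPort
    nodePort: 30080           # spec.ports[].nodePort, only meaningful when type=NodePort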

进入容器

kubectl exec -it podname -c containername -n namespace -- shell command

VOLUME

apiVersion: apps/v1  #注意版本号
kind: Deployment
metadata:
name: nginx-dep
spec:
selector: #属性,选择器
matchLabels:
app: nginx
replicas: 1 #管理的副本个数
template: #模板属性
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
volumeMounts: #定义挂载卷
- mountPath: "/var/log/nginx"
name: nginx-vol
- name: busybox
image: busybox
command: ["sh", "-c", "tail -f /logs/access.log"]
volumeMounts:
- mountPath: /logs
name: nginx-vol
volumes: #定义共享卷
- name: nginx-vol
emptyDir: {}

CONFIGMAP

kubectl create configmap user-config --from-file=./
kubectl create configmap log-config --from-file=./2.txt
#查看
kubectl get cm/user-config -o yaml

apiVersion: v1 #注意版本号
kind: ConfigMap
metadata:
name: test-configmap
data:
apploglevel: info
appdatadir: /var/data
apiVersion: apps/v1  #注意版本号
kind: Deployment
metadata:
name: nginx-dep-configmap
spec:
selector: #属性,选择器
matchLabels:
app: nginx
replicas: 1 #管理的副本个数
template: #模板属性
metadata:
labels:
app: nginx
spec:
containers:
- name: busybox
image: busybox
command: ["sh", "-c", "env | grep APP"]
env:
- name: APPLOGLEVEL
valueFrom:
configMapKeyRef:
name: test-configmap
key: apploglevel
- name: APPDATADIR
valueFrom:
configMapKeyRef:
name: test-configmap
key: appdatadir
restartPolicy: Never
#envFrom
apiVersion: apps/v1 #注意版本号
kind: Deployment
metadata:
name: nginx-dep-configmap
spec:
selector: #属性,选择器
matchLabels:
app: nginx
replicas: 1 #管理的副本个数
template: #模板属性
metadata:
labels:
app: nginx
spec:
containers:
- name: busybox
image: busybox
command: ["sh", "-c", "env"]
envFrom:
- configMapRef:
name: test-configmap
#configmap热更新
apiVersion: v1
kind: ConfigMap
metadata:
name: reload-config
data:
logLevel: INFO
---
apiVersion: extensions/v1beta1
#apiVersion: apps/v1 要加上selector
kind: Deployment
metadata:
name: nginx-ig
spec:
replicas: 2
template:
metadata:
labels:
name: nginx
spec:
containers:
- name: nginx
image: nginx:latest
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
volumeMounts:
- name: reload-volume
mountPath: /etc/config
volumes:
- name: reload-volume
configMap:
name: reload-config

kubectl exec nginx-ig-b898c76f5-2w8ws -it -- cat /etc/config/logLevel
kubectl edit configmaps reload-config
kubectl exec nginx-ig-b898c76f5-2w8ws -it -- cat /etc/config/logLevel
#注意:使用configmap挂载env不会同步更新,使用configmap挂载volume中的数据需要一段时间(约10s)才能同步更新
#将pod信息注入环境变量
apiVersion: apps/v1 #注意版本号
kind: Deployment
metadata:
name: nginx-dep-configmap
spec:
selector: #属性,选择器
matchLabels:
app: nginx
replicas: 1 #管理的副本个数
template: #模板属性
metadata:
labels:
app: nginx
spec:
containers:
- name: busybox
image: busybox
command: ["sh", "-c", "env"]
env:
- name: MY_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: MY_POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
#将资源限制信息注入环境变量
apiVersion: apps/v1 #注意版本号
kind: Deployment
metadata:
name: nginx-dep-configmap
spec:
selector: #属性,选择器
matchLabels:
app: nginx
replicas: 1 #管理的副本个数
template: #模板属性
metadata:
labels:
app: nginx
spec:
containers:
- name: busybox
image: busybox
command: ["sh", "-c", "env"]
resources:
requests:
memory: "32Mi"
cpu: "125m"
limits:
memory: "64Mi"
cpu: "250m"
env:
- name: MY_CPU_REQUEST
valueFrom:
resourceFieldRef:
containerName: busybox
resource: requests.cpu
- name: MY_MEM_LIMIT
valueFrom:
resourceFieldRef:
containerName: busybox
resource: limits.memory

Pod健康检查

#livenessProbe 检查容器是否存活(running)
#1.ExecAction 容器内执行命令,该命令返回码为0表明容器健康
apiVersion: v1
kind: Pod
metadata:
labels:
test: liveness
name: liveness-exec
spec:
containers:
- name: liveness
image: busybox
command: ["sh", "-c", "echo ok > /tmp/health","sleep 10","rm -rf /tmp/health","sleep 600"]
livenessProbe:
exec:
command: ["cat", "/tmp/health"]
initialDelaySeconds: 15
timeoutSeconds: 1
#2.TCPSocketAction
apiVersion: v1
kind: Pod
metadata:
name: liveness-socket
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
livenessProbe:
tcpSocket:
port: 81
#首次探测时间
initialDelaySeconds: 5
#每隔多少s探测一次
periodSeconds: 2
#检查失败尝试几次
failureThreshold: 3
#3.HTTPGetAction
apiVersion: v1
kind: Pod
metadata:
name: liveness-http
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
livenessProbe:
httpGet:
path: /index1.html
port: 80
#首次探测时间
initialDelaySeconds: 20
timeoutSeconds: 1
#检查失败尝试几次
failureThreshold: 3

#ReadinessProbe 检查容器是否启动完成(ready)
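
The readinessProbe mentioned above has no example in these notes; a minimal sketch (path, port and thresholds are illustrative assumptions):
apiVersion: v1
kind: Pod
metadata:
  name: readiness-http
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    readinessProbe:           # the Pod only becomes Ready (receives Service traffic) after this succeeds
      httpGet:
        path: /index.html
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 3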

调度器

#公平
#资源利用率高
#效率
#灵活

#自定义调度器
apiVersion: v1
kind: Pod
metadata:
name: busybox
labels:
name: bb-test
spec:
schedulerName: my-scheduler
containers:
- image: busybox
command:
- sleep
- "3600"
name: busybox
#节点亲和性
#requiredDuringSchedulingIgnoredDuringExecution 硬策略
#preferredDuringSchedulingIgnoredDuringExecution 软策略

apiVersion: v1
kind: Pod
metadata:
name: with-node-affinity
spec:
affinity:
nodeAffinity:
#硬策略
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- 192.168.7.152
containers:
- name: with-node-affinity
image: nginx
imagePullPolicy: "IfNotPresent"
---
apiVersion: v1
kind: Pod
metadata:
name: with-node-affinity2
spec:
affinity:
nodeAffinity:
#软策略
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- 192.168.7.153
containers:
- name: with-node-affinity2
image: nginx
imagePullPolicy: "IfNotPresent"
#键值运算关系
In label的值在某个列表中
NotIn label的值不在某个列表中
Gt label的值大于某个值
Lt label的值小于某个值
Exists 某个label存在
DoesNotExist 某个label不存在

Pod调度

#requiredDuringSchedulingIgnoredDuringExecution 硬策略
#preferredDuringSchedulingIgnoredDuringExecution 软策略

apiVersion: v1
kind: Pod
metadata:
name: with-pod-affinity
spec:
affinity:
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- pod-1
topologyKey: kubernetes.io/hostname
containers:
- name: with-pod-affinity
image: nginx
imagePullPolicy: "IfNotPresent"
---
apiVersion: v1
kind: Pod
metadata:
name: with-pod-affinity2
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- pod-1
topologyKey: kubernetes.io/hostname
containers:
- name: with-pod-affinity
image: nginx
imagePullPolicy: "IfNotPresent"

#1.Deployment 全自动调度
#2.定向调度 NodeSelector
#给Node打标签
kubectl label nodes node-01 key=val
#查看标签
kubectl get nodes --show-labels
#删除标签
kubectl label nodes node-01 key-
#修改标签
kubectl label nodes node-01 key=val2 --overwrite
#例子
apiVersion: v1
kind: Pod
metadata:
name: busybox
spec:
containers:
- image: busybox
command:
- sleep
- "3600"
name: busybox
nodeSelector:
zone: north

#NodeAffinity Node亲和性调度
#PodAffinity
#亲和性/反亲和性调度策略
调度策略 匹配标签 操作符 是否支持拓扑域 调度目标
nodeAffinity 节点 In,NotIn,Exists,DoesNotExist,Gt,Lt 否 指定主机
podAffinity Pod In,NotIn,Exists,DoesNotExist 是 Pod与指定Pod同一拓扑域
podAntiAffinity Pod In,NotIn,Exists,DoesNotExist 是 Pod与指定Pod同一拓扑域
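
The examples above only use the required (hard) form of pod affinity; for reference, a sketch of the preferred (soft) form of podAntiAffinity with a weight (the pod name is a placeholder):
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity3
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - pod-1
          topologyKey: kubernetes.io/hostname
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: "IfNotPresent"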

污点(Taint) 容忍(Toleration)

#Taint
key=value:effect
每个污点有一个key和value作为污点标签,其中value可以为空,effect描述污点作用
effect支持:
NoSchedule: 不会将pod调度到具有此污点的Node上
PreferNoSchedule: 尽量避免将pod调度到具有此污点的Node上
NoExecute: 不会将pod调度到具有此污点的Node上,同时将Node上已存在Pod驱逐出去

污点设置
kubectl taint nodes 192.168.7.152 check=ropon:NoExecute
查看
kubectl describe node 192.168.7.152|grep Taints
删除
kubectl taint nodes 192.168.7.152 check:NoExecute-

#Toleration
pod.spec.tolerations

tolerations:
- key: "check"
operator: "Exists"
value: "ropon"
effect: "NoSchedule"
#停留在node污点的时间
tolerationSeconds: 60

apiVersion: v1
kind: Pod
metadata:
name: with-pod-affinity2
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- pod-1
topologyKey: kubernetes.io/hostname
tolerations:
- key: "check"
operator: "Equal"
value: "ropon"
effect: "NoExecute"
tolerationSeconds: 60
containers:
- name: with-pod-affinity
image: nginx
imagePullPolicy: "IfNotPresent"

其中key,value,effect与Node上设置taint必须一致
operator的值Exists将会忽略value值
tolerationSeconds 描述当前Pod需要被驱逐时在Node上保留运行时间

不指定key值时容忍所有污点key
tolerations:
- operator: "Exists"

不指定effect值时容忍所有污点作用
tolerations:
- key: "key"
operator: "Exists"

DaemonSet 每个Node上调度一个Pod

#守护进程
#日志采集
#监控程序

apiVersion: apps/v1
kind: DaemonSet
metadata:
name: daemonset-demo
spec:
selector:
matchLabels:
app: testbb
template:
metadata:
labels:
app: testbb
spec:
containers:
- image: busybox
command:
- sleep
- "3600"
name: busybox

Job 批处理调度

#spec.template格式同Pod
#RestartPolicy仅支持Never或OnFailure
#单个Pod时,默认Pod成功运行后Job即结束
#spec.completions标志Job结束需要成功运行的Pod个数,默认为1
#spec.parallelism标志并行运行的Pod个数,默认为1
#spec.activeDeadlineSeconds标志失败Pod最大重试时间
apiVersion: batch/v1
kind: Job
metadata:
name: job-demo
spec:
template:
metadata:
name: job-demo
spec:
containers:
- image: busybox
command:
- sleep
- "40"
name: busybox
restartPolicy: Never
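
The completions/parallelism fields described above are not used in the example; a sketch that needs 5 successful Pods while running at most 2 in parallel (names are placeholders):
apiVersion: batch/v1
kind: Job
metadata:
  name: job-parallel-demo
spec:
  completions: 5            # the Job finishes after 5 Pods succeed
  parallelism: 2            # run at most 2 Pods at the same time
  template:
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["sh", "-c", "sleep 10"]
      restartPolicy: Never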

CronJob 基于时间的Job

#spec.schedule 调度必需字段,指定任务运行周期
#spec.jobTemplate Job模板必需字段,指定需要运行的任务
#spec.startingDeadlineSeconds启动Job期限
#spec.concurrencyPolicy并发策略,默认Allow;Forbid禁止并发;Replace取消当前任务并用新任务替换
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: cronjob-demo
spec:
schedule: "*/1 * * * *"
jobTemplate:
spec:
template:
spec:
containers:
- image: busybox
command:
- sh
- -c
- date;echo hello world
name: busybox
restartPolicy: OnFailure
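
concurrencyPolicy and startingDeadlineSeconds are listed above but not shown; a sketch with illustrative values:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cronjob-forbid-demo
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Forbid        # skip a new run while the previous one is still active
  startingDeadlineSeconds: 120     # give up a run that cannot start within 2 minutes
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: busybox
            image: busybox
            command: ["sh", "-c", "date"]
          restartPolicy: OnFailure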

Service 服务

#ClusterIP 默认类型
#NodePort
#LoadBalancer
#ExternalName

apiVersion: apps/v1 #注意版本号
kind: Deployment
metadata:
name: myapp-dep
spec:
selector: #属性,选择器
matchLabels:
app: myapp
replicas: 3 #管理的副本个数
template: #模板属性
metadata:
labels:
app: myapp
spec:
containers:
- name: myapp
image: nginx
imagePullPolicy: "IfNotPresent"
ports:
- name: http
containerPort: 80
---
apiVersion: v1 #注意版本号
kind: Service
metadata:
name: myapp
spec:
type: ClusterIP
selector: #属性,选择器
app: myapp
ports:
- name: http
port: 80
targetPort: 80

Headless Service

apiVersion: v1  #注意版本号
kind: Service
metadata:
name: myapp-headless
spec:
clusterIP: "None"
selector: #属性,选择器
app: myapp
ports:
- name: http
port: 80
targetPort: 80

#测试
dig @172.20.0.23 myapp-headless.default.svc.cluster.local

NodePort

apiVersion: v1  #注意版本号
kind: Service
metadata:
name: myapp-nodeport
spec:
type: NodePort
selector: #属性,选择器
app: myapp
ports:
- name: http
port: 80
targetPort: 80

LoadBalancer

实际与NodePort方式一样
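
A minimal LoadBalancer sketch for comparison: the manifest only differs in the type field, and on cloud providers an external load balancer address is filled into status.loadBalancer (names are placeholders):
apiVersion: v1
kind: Service
metadata:
  name: myapp-lb
spec:
  type: LoadBalancer
  selector:
    app: myapp
  ports:
  - name: http
    port: 80
    targetPort: 80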

ExternalName

apiVersion: v1  #注意版本号
kind: Service
metadata:
name: myapp-ex1
spec:
type: ExternalName
externalName: test.ropon.top

#测试
dig @172.20.0.23 myapp-ex1.default.svc.cluster.local

Ingress

kubectl apply -f /etc/ansible/manifests/ingress/nginx-ingress/nginx-ingress.yaml
kubectl apply -f /etc/ansible/manifests/ingress/nginx-ingress/nginx-ingress-svc.yaml
#ingress http
apiVersion: extensions/v1beta1
#apiVersion: apps/v1 要加上selector
kind: Deployment
metadata:
name: nginx-ig
spec:
replicas: 2
template:
metadata:
labels:
name: nginx
spec:
containers:
- name: nginx
image: nginx:latest
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: nginx-svc
spec:
ports:
- port: 80
targetPort: 80
protocol: TCP
selector:
name: nginx
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: nginx-test
spec:
rules:
- host: test1.ropon.top
http:
paths:
- path: /
backend:
serviceName: nginx-svc
servicePort: 80
#https ingress
kubectl create secret tls ropon-tls --cert ropon.top.crt --key ropon.top.key

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: nginx-test-https
spec:
tls:
- hosts:
- test2.ropon.top
secretName: ropon-tls
rules:
- host: test2.ropon.top
http:
paths:
- path: /
backend:
serviceName: nginx-svc
servicePort: 80
#basicauth
htpasswd -c auth ropon
kubectl create secret generic basic-auth --from-file=auth

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: nginx-test-auth
annotations:
nginx.ingress.kubernetes.io/auth-type: basic
nginx.ingress.kubernetes.io/auth-secret: basic-auth
nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required - ropon'
spec:
rules:
- host: test3.ropon.top
http:
paths:
- path: /
backend:
serviceName: nginx-svc
servicePort: 80
#rewrite
#nginx.ingress.kubernetes.io/rewrite-target 重定向目标URL
#nginx.ingress.kubernetes.io/ssl-redirect
#nginx.ingress.kubernetes.io/force-ssl-redirect 强制重定向https
#nginx.ingress.kubernetes.io/app-root
#nginx.ingress.kubernetes.io/use-regex 使用正则

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: nginx-test-rewrite
annotations:
nginx.ingress.kubernetes.io/rewrite-target: https://test2.ropon.top:23457
spec:
rules:
- host: test4.ropon.top
http:
paths:
- path: /
backend:
serviceName: nginx-svc
servicePort: 80

Secret

#Service Account
#访问kubernetes api由kubernetes自动创建并且自动挂载到Pod的/run/secrets/kubernetes.io/serviceaccount
#Opaque base64编码格式的secret 用来存储密码 密钥
#kubernetes.io/dockerconfigjson 用来存储私有docker registry的认证信息
#Service Account
kubectl exec nginx-ig-b898c76f5-2w8ws -- ls /run/secrets/kubernetes.io/serviceaccount
#Opaque
echo "ropon"|base64
echo "123456"|base64

apiVersion: v1
kind: Secret
metadata:
name: mysecret
type: Opaque
data:
username: cm9wb24K
password: MTIzNDU2Cg==
---
apiVersion: extensions/v1beta1
#apiVersion: apps/v1 要加上selector
kind: Deployment
metadata:
name: nginx-secret-test
spec:
replicas: 2
template:
metadata:
labels:
name: nginx
spec:
containers:
- name: nginx
image: nginx:latest
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
volumeMounts:
- name: secrets
mountPath: "/test"
readOnly: true
volumes:
- name: secrets
secret:
secretName: mysecret

#测试
kubectl exec nginx-secret-test-5d9f5c4bc-l6jjk -- cat /test/password
#secret导入环境变量
apiVersion: extensions/v1beta1
#apiVersion: apps/v1 要加上selector
kind: Deployment
metadata:
name: nginx-secret-test1
spec:
replicas: 1
template:
metadata:
labels:
name: nginx
spec:
containers:
- name: nginx
image: nginx:latest
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
env:
- name: TEST_USER
valueFrom:
secretKeyRef:
name: mysecret
key: username

#测试
kubectl exec nginx-secret-test1-5d89cd9486-zzjw4 -- env
#创建docker registry
kubectl create secret docker-registry myregistrykey --docker-server= --docker-username= --docker-password= --docker-email=

apiVersion: extensions/v1beta1
#apiVersion: apps/v1 要加上selector
kind: Deployment
metadata:
name: nginx-secret-test2
spec:
replicas: 1
template:
metadata:
labels:
name: nginx
spec:
containers:
- name: nginx
image: nginx:latest
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
imagePullSecrets:
- name: myregistrykey

Volume

#emptyDir
#暂存空间 共享数据

apiVersion: extensions/v1beta1
#apiVersion: apps/v1 要加上selector
kind: Deployment
metadata:
name: nginx-vol-test
spec:
replicas: 1
template:
metadata:
labels:
name: nginx
spec:
containers:
- name: nginx
image: nginx:latest
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
volumeMounts:
- name: cache-vol
mountPath: "/cache"
- name: busybox
image: busybox
imagePullPolicy: IfNotPresent
command:
- sleep
- "3600"
volumeMounts:
- name: cache-vol
mountPath: "/test"
volumes:
- name: cache-vol
emptyDir: {}

#测试
kubectl exec pod nginx-vol-test-5bc5485bdb-tk7wm -c nginx -it -- /bin/sh
kubectl exec pod/nginx-vol-test-5bc5485bdb-tk7wm -c busybox -it -- /bin/sh
#hostPath
#将节点的文件或目录挂载到集群中
#"" 默认 不做任何检查
#DirectoryOrCreate 指定路径不存在则创建空目录 权限755 与kubelet具有相同组和所有权
#Directory 指定路径下必须存在目录
#FileOrCreate 指定文件路径不存在则创建空文件 权限644 与kubelet具有相同组和所有权
#File 指定路径下必须存在文件
#Socket 指定路径下必须存在套接字
#CharDevice 指定路径下必须存在字符设备
#BlockDevice 指定路径下必须存在块设备
mkdir /www
echo "hello" > /www/index.html
date >> /www/index.html

apiVersion: extensions/v1beta1
#apiVersion: apps/v1 要加上selector
kind: Deployment
metadata:
name: nginx-vol-test1
spec:
replicas: 3
template:
metadata:
labels:
name: nginx
spec:
containers:
- name: nginx
image: nginx:latest
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
volumeMounts:
- name: nginx-vol
mountPath: "/usr/share/nginx/html"
volumes:
- name: nginx-vol
hostPath:
path: /www
type: Directory

PV PVC

PV是集群中的资源 独立于Pod的生命周期
PVC 用户存储请求 与Pod类似,Pod消耗节点资源(CPU和内存),PVC消耗PV资源
PV访问模式
ReadWriteOnce RWO 该卷可被单个节点读写挂载
ReadOnlyMany ROX 该卷可被多个节点读挂载
ReadWriteMany RWX 该卷可被多个节点读写挂载

回收策略
Retain 保留手动回收
Recycle 回收
Delete 删除

状态
Available 可用
Bound 已绑定
Released 已释放
Failed 失败
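
A PVC sketch that could bind to the nfspv1 PV defined below (the size and storageClassName follow that example; the name is a placeholder):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc-demo
spec:
  accessModes:
  - ReadWriteOnce           # must be offered by the PV
  storageClassName: nfs     # matches the PV's storageClassName
  resources:
    requests:
      storage: 2Gi          # must not exceed the PV's capacity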
#部署PV
apiVersion: v1
kind: PersistentVolume
metadata:
name: nfspv1
spec:
capacity:
storage: 8Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: nfs
nfs:
path: /home/k8sdata
server: 172.16.7.151
---
#创建服务并使用PVC
apiVersion: v1
kind: Service
metadata:
name: nginx
labels:
app: nginx
spec:
ports:
- port: 80
name: web
clusterIP: None
selector:
app: nginx
---
#部署statefulset
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: web
spec:
selector:
matchLabels:
app: nginx
serviceName: "nginx"
replicas: 3
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:latest
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
name: web
volumeMounts:
- name: www
mountPath: "/usr/share/nginx/html"
volumeClaimTemplates:
- metadata:
name: www
spec:
accessModes: ["ReadWriteMany"]
storageClassName: nfs
resources:
requests:
storage: 2Gi

新增Node节点

#安装ansible
yum install -y ansible
#安装pip
yum install -y python-pip
#安装netaddr
pip install netaddr -i https://mirrors.aliyun.com/pypi/simple/
pip install configparser -i https://mirrors.aliyun.com/pypi/simple/
pip install --upgrade pip -i https://mirrors.aliyun.com/pypi/simple/
pip install zipp -i https://mirrors.aliyun.com/pypi/simple/

动态PV

#github地址:
https://github.com/kubernetes-incubator/external-storage/tree/master/nfs/deploy/kubernetes
#创建RBAC授权
cat rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: nfs-client-provisioner
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: nfs-client-provisioner-runner
rules:
- apiGroups: [""]
resources: ["persistentvolumes"]
verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
resources: ["persistentvolumeclaims"]
verbs: ["get", "list", "watch", "update"]
- apiGroups: ["storage.k8s.io"]
resources: ["storageclasses"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["events"]
verbs: ["list", "watch", "create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: run-nfs-client-provisioner
subjects:
- kind: ServiceAccount
name: nfs-client-provisioner
namespace: default
roleRef:
kind: ClusterRole
name: nfs-client-provisioner-runner
apiGroup: rbac.authorization.k8s.io
#创建Storageclass类
cat storageclass-nfs.yaml
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
name: managed-nfs-storage
provisioner: fuseim.pri/ifs
#创建nfs的deployment
cat deployment-nfs.yaml
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: nfs-client-provisioner
spec:
replicas: 1
strategy:
type: Recreate
template:
metadata:
labels:
app: nfs-client-provisioner
spec:
imagePullSecrets:
- name: registry-pull-secret
serviceAccount: nfs-client-provisioner
containers:
- name: nfs-client-provisioner
image: lizhenliang/nfs-client-provisioner:v2.0.0
volumeMounts:
- name: nfs-client-root
mountPath: /persistentvolumes
env:
- name: PROVISIONER_NAME
value: fuseim.pri/ifs
- name: NFS_SERVER
value: 172.16.7.151
- name: NFS_PATH
value: /home/k8sdata
volumes:
- name: nfs-client-root
nfs:
server: 172.16.7.151
path: /home/k8sdata
#使用statefulset创建nginx服务动态供给pv
apiVersion: v1
kind: Service
metadata:
name: nginx
labels:
app: nginx
spec:
ports:
- port: 80
name: web
clusterIP: None
selector:
app: nginx
---
#部署statefulset
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: web
spec:
selector:
matchLabels:
app: nginx
serviceName: "nginx"
replicas: 3
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:latest
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
name: web
volumeMounts:
- name: www
mountPath: "/usr/share/nginx/html"
volumeClaimTemplates:
- metadata:
name: www
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: "managed-nfs-storage"
resources:
requests:
storage: 2Gi

StatefulSet

#Pod名称:$(statefulset名称)-$(序号)
#StatefulSet为每个Pod副本创建一个DNS域名
#格式:$(podname).$(headlessservername).$(namespace).svc.cluster.local 通过域名通信并非Pod IP
#StatefulSet使用Headless服务控制Pod的域名
#格式:$(servicename).$(namespace).svc.cluster.local
#根据volumeClaimTemplates为每个Pod创建一个pvc
#删除Pod不会删除其pvc,手工删除pvc将自动释放pv

#StatefulSet启动顺序
#有序部署:部署StatefulSet 多个副本 顺序创建(0~N-1) 下一个Pod运行之前 之前Pod必须是Running或Ready
#有序删除:Pod被删除时 顺序删除(N-1~0)
#有序扩展:Pod扩展 之前Pod必须是Running或Ready

#使用场景
持久化存储 Pod重新调度后还能访问相同数据 基于PVC实现
稳定网络标识符 Pod重新调度后其PodName和HostName不变
有序部署 有序扩展 基于init containers实现
有序收缩

集群安全

Authentication
RBAC
Role ClusterRole RoleBinding ClusterRoleBinding
k8s没有提供用户管理
ApiServer会把客户端证书CN字段作为User 把names.O字段作为Group

kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
namespace: default
name: pod-reader
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get","watch","list"]

ClusterRole 具有与Role相同的权限角色控制能力 不同的是ClusterRole是集群级别
集群级别的资源控制
非资源类型endpoints
所有命名空间资源控制

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: secret-reader
rules:
- apiGroups: [""]
resources: ["secrets"]
verbs: ["get","watch","list"]

RoleBinding ClusterRoleBinding
RoleBinding可以将角色中定义权限赋予用户或用户组 包含一组权限列表(subjects)
权限列表包含不同形式权限资源类型(user,groups,service accounts)
RoleBinding包含对被Bind的Role引用,RoleBinding适用于某个命名空间内的授权
ClusterRoleBinding适用于集群范围内的授权

#将default命名空间pod-reader Role授予ropon用户
#此后ropon用户名在default命名空间中将具有pod-reader的权限
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: pod-reader
namespace: default
subjects:
- kind: User
name: ropon
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: Role
name: pod-reader
apiGroup: rbac.authorization.k8s.io

RoleBinding同样可以引用ClusterRole来对当前namespace内用户、用户组或ServiceAccount进行授权
允许集群管理员在整个集群内定义一些通用的ClusterRole,然后再不同namespace中使用RoleBinding来引用

#RoleBinding引用一个ClusterRole,这个ClusterRole具有整个集群内对secrets的访问权限
#但其授权用户ropon只能访问development空间中的secrets(因为RoleBinding定义在development命名空间)
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: read-secrets
namespace: development
subjects:
- kind: User
name: ropon
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: ClusterRole
name: read-secrets
apiGroup: rbac.authorization.k8s.io

#使用ClusterRoleBinding可以对整个集群中所有命名空间资源权限进行授权
#ClusterRoleBinding授权manager组所有用户名在全部命名空间对secrets进行访问
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: read-secrets-global
namespace: development
subjects:
- kind: Group
name: manager
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: ClusterRole
name: read-secrets
apiGroup: rbac.authorization.k8s.io
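
The notes above mention that subjects can also be service accounts, which none of the examples show; a sketch binding the pod-reader Role to a hypothetical ServiceAccount named demo-sa:
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: pod-reader-sa
  namespace: default
subjects:
- kind: ServiceAccount      # demo-sa is an assumed ServiceAccount
  name: demo-sa
  namespace: default
roleRef:
  kind: Role
  name: pod-reader          # the Role defined earlier in this section
  apiGroup: rbac.authorization.k8s.io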

Resources

Kubernetes集群内一些资源一般以其名称字符串来表示,这些字符串一般会在API的URL地址中出现
某些资源还包含子资源,比如logs资源属于pods的子资源
GET /api/v1/namespaces/{namespace}/pods/{name}/log

#定义pods资源logs访问权限的Role
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: pod-and-pod-logs-reader
namespace: default
rules:
- apiGroups: [""]
resources: ["pods/log"]
verbs: ["get","list"]

RoleBinding和ClusterRoleBinding可以将Role绑定subjects
subjects可以是groups、users或者service accounts
subjects中Users使用字符串表示,可以是普通的名字字符串,也可以是email地址
还可以是字符串形式的数字ID,但前缀不能以system开头
同理Groups格式与Users相同,都为一个字符串,前缀不能以system开头

实战 创建一个用户只能管理dev空间

useradd devuser
passwd devuser
kubectl create namespace dev
cat dev-csr.json
{
"CN": "devuser",
"hosts": [],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "HangZhou",
"L": "XS",
"O": "k8s",
"OU": "System"
}
]
}

cfssl gencert -ca=./ca.pem -ca-key=./ca-key.pem -profile=kubernets ./dev-csr.json|cfssljson -bare devuser
#设置集群参数
export KUBE_APISERVER="https://192.168.7.150:6443"
kubectl config set-cluster cluster1 \
--certificate-authority=/etc/kubernetes/ssl/ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=devuser.kubeconfig
#设置客户端认证参数
kubectl config set-credentials devuser \
--client-certificate=/etc/kubernetes/ssl/kubelet.pem \
--client-key=/etc/kubernetes/ssl/kubelet-key.pem \
--embed-certs=true \
--kubeconfig=devuser.kubeconfig
#设置上下文参数
kubectl config set-context cluster1 \
--cluster=cluster1 \
--user=devuser \
--namespace=dev \
--kubeconfig=devuser.kubeconfig
#进行RoleBinding角色绑定
kubectl create rolebinding devuser-admin-binding --clusterrole=admin --user=devuser --namespace=dev
cp devuser.kubeconfig /home/devuser/.kube/config
#切换devuser用户并切换上下文
cd /home/devuser/.kube
kubectl config use-context cluster1 --kubeconfig=config

Helm

cat helm-rabc-config.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: tiller
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: tiller
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- kind: ServiceAccount
name: tiller
namespace: kube-system

kubectl create -f helm-rabc-config.yaml
helm init --service-account tiller --history-max 200 --tiller-image registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.13.1 --stable-repo-url https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts --upgrade
kubectl get pod -n kube-system -l name=tiller
#替换helm的repo为阿里镜像仓库
helm repo remove stable
helm repo add stable https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts
helm repo update

cat Chart.yaml
name: hello-world
version: 1.0.0

cat templates/deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: hello-world
spec:
replicas: 2
template:
metadata:
labels:
app: hello-world
spec:
containers:
- name: hello-world
image: nginx
ports:
- containerPort: 80
protocol: TCP

cat templates/service.yaml
apiVersion: v1
kind: Service
metadata:
name: hello-world
spec:
type: NodePort
ports:
- port: 80
targetPort: 80
protocol: TCP
selector:
app: hello-world

#安装
helm install .
#列出已经部署的Release
helm ls
#查询具体Release的状态
helm status XXXXX
#删除所有与具体Release相关的kubernetes资源
helm delete XXXXX
helm rollback

#Debug 使用模板动态生成k8s资源清单 能提前预览生成的结果
#--dry-run --debug 选项打印出 生成的清单文件内容 但不执行部署
helm install . --dry-run --debug --set image.tag=latest

Prometheus


go操作etcd

package main

import (
"context"
"crypto/tls"
"crypto/x509"
"fmt"
"io/ioutil"
"log"
"time"

"go.etcd.io/etcd/clientv3"
)

// etcd client put/get demo
// use etcd/clientv3

func main() {
//使用https链接etcd
var etcdCert = "./etcd.pem"
var etcdCertKey = "./etcd-key.pem"
var etcdCa = "./ca.pem"

cert, err := tls.LoadX509KeyPair(etcdCert, etcdCertKey)
if err != nil {
return
}

caData, err := ioutil.ReadFile(etcdCa)
if err != nil {
return
}

pool := x509.NewCertPool()
pool.AppendCertsFromPEM(caData)

_tlsConfig := &tls.Config{
Certificates: []tls.Certificate{cert},
RootCAs: pool,
}
cli, err := clientv3.New(clientv3.Config{
Endpoints: []string{"https://192.168.7.150:2379"},
DialTimeout: 5 * time.Second,
TLS: _tlsConfig,
})
if err != nil {
// handle error!
fmt.Printf("connect to etcd failed, err:%v\n", err)
return
}
fmt.Println("connect to etcd success")
defer cli.Close()
//// put
//ctx, cancel := context.WithTimeout(context.Background(), time.Second)
//_, err = cli.Put(ctx, "ropon", "666")
//cancel()
//if err != nil {
// fmt.Printf("put to etcd failed, err:%v\n", err)
// return
//}
//// get
//ctx, cancel = context.WithTimeout(context.Background(), time.Second)
//resp, err := cli.Get(ctx, "ropon")
//cancel()
//if err != nil {
// fmt.Printf("get from etcd failed, err:%v\n", err)
// return
//}
//for _, ev := range resp.Kvs {
// fmt.Printf("%s:%s\n", ev.Key, ev.Value)
//}

//// watch key:q1mi change
//rch := cli.Watch(context.Background(), "west") // <-chan WatchResponse
//for wresp := range rch {
// for _, ev := range wresp.Events {
// fmt.Printf("Type: %s Key:%s Value:%s\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
// }
//}

// 创建一个5秒的租约
resp, err := cli.Grant(context.TODO(), 5)
if err != nil {
log.Fatal(err)
}

// 5秒钟之后, /ropon/ 这个key就会被移除
_, err = cli.Put(context.TODO(), "/ropon/", "8888", clientv3.WithLease(resp.ID))
if err != nil {
log.Fatal(err)
}

// the key '/ropon/' will be kept alive as long as the lease is renewed
ch, kaerr := cli.KeepAlive(context.TODO(), resp.ID)
if kaerr != nil {
log.Fatal(kaerr)
}
for {
ka := <-ch
fmt.Println("ttl:", ka.TTL)
}
}

k8s搭建devops环境

主要内容:

  • 使用kubeadm搭建kubernetes环境
  • 安装flannel网络插件
  • 搭建nfs服务器
  • 安装nfs provisioner
  • 安装helm
  • 安装nginx ingress
  • 安装Jenkins
  • 安装gitlab
  • 安装harbor

具体步骤:

安装ansible、expect

#批量授权脚本
cat plssh.sh
#!/bin/bash
# Author: Ropon
# Blog: https://www.ropon.top
declare -A CserverLst
CserverLst=([s1]="192.168.8.151" [s2]="192.168.8.152")
cport="22"
cpasswd="ropon.top"
ansible_host="/etc/ansible/hosts"
tmpsshfile="/tmp/ssh.exp"
flag="k8snode"

yum install -y ansible expect
echo '#!/usr/bin/expect
spawn ssh-keygen
expect {
"*.ssh/id_rsa*" {exp_send "\r";exp_continue}
"*passphrase*" {exp_send "\r";exp_continue}
"*again*" {exp_send "\r"}
}' > $tmpsshfile
expect $tmpsshfile
sleep 1
echo "[$flag]" >> $ansible_host

for key in ${!CserverLst[*]}; do
cat > $tmpsshfile << EOF
#!/usr/bin/expect
spawn ssh-copy-id ${CserverLst[$key]} -p ${cport}
expect {
"*yes/no*" {exp_send "yes\r";exp_continue}
"*password*" {exp_send "${cpasswd}\r";exp_continue}
}
EOF
expect $tmpsshfile
echo "${CserverLst[$key]} ansible_ssh_port=${cport}" >> $ansible_host
done
ansible $flag -m ping

kubernetes

#配置:至少2台2核心4GB
#检查系统主机名
#master和node执行以下命令检查:
cat /etc/redhat-release
lscpu|grep CPU
#修改主机名
hostnamectl set-hostname master01
hostnamectl set-hostname node01
hostnamectl set-hostname node02
#查看修改结果
hostnamectl status
#配置hosts文件
echo "127.0.0.1 $(hostname)" >> /etc/hosts
cat >> /etc/hosts << EOF
192.168.8.150 master01
192.168.8.151 node01
192.168.8.152 node02
EOF
#关闭防护墙
systemctl disable firewalld
systemctl stop firewalld
systemctl disable iptables
systemctl stop iptables
sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
setenforce 0
#禁用swap
swapoff -a
sed -i.bak '/swap/s/^/#/' /etc/fstab

#master node节点批量执行
echo "
#新增br_netfilter ipvs模块
#!/bin/bash
# Author: Ropon
# Blog: https://www.ropon.top
cat > /etc/sysconfig/modules/br_netfilter_ipvs.modules << EOF
modprobe br_netfilter
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
modprobe nf_conntrack_ipv4
EOF
chmod 755 /etc/sysconfig/modules/br_netfilter_ipvs.modules
cat > /etc/rc.sysinit << EOF
#!/bin/bash
for file in /etc/sysconfig/modules/*.modules ; do
[ -x $file ] && \$file
done
EOF
#优化内核参数
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
modprobe br_netfilter
sysctl -p /etc/sysctl.d/k8s.conf
lsmod |grep br_netfilter" > netfilter.sh

ansible k8snode -m copy -a 'src=/root/netfilter.sh dest=/root/netfilter.sh mode=744'
ansible k8snode -m shell -a 'bash /root/netfilter.sh'
echo "#设置源
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
yum clean all
yum makecache fast -y" > yum.sh

ansible k8snode -m copy -a 'src=/root/yum.sh dest=/root/yum.sh mode=744'
ansible k8snode -m shell -a 'bash /root/yum.sh'

#安装docker
echo "yum install -y yum-utils device-mapper-persistent-data lvm2
yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum install docker-ce-18.09.0 docker-ce-cli-18.09.0 containerd.io-1.2.13 -y
mkdir -p /etc/docker
#k8s推荐使用systemd,然而docker默认以cgroupfs方式启动,故做以下修改
#k8s配置文件/var/lib/kubelet/kubeadm-flags.env
tee /etc/docker/daemon.json <<-'EOF'
{
"exec-opts": ["native.cgroupdriver=systemd"],
"registry-mirrors": ["https://xxx.mirror.aliyuncs.com"]
}
EOF
systemctl daemon-reload
systemctl start docker
systemctl enable docker" > docker.sh

ansible k8snode -m copy -a 'src=/root/docker.sh dest=/root/docker.sh mode=744'
ansible k8snode -m shell -a 'bash /root/docker.sh'

#安装k8s
#master node
echo "yum install -y kubelet-1.16.9 kubeadm-1.16.9 kubectl-1.16.9
systemctl enable kubelet" > k8s.sh

ansible k8snode -m copy -a 'src=/root/k8s.sh dest=/root/k8s.sh mode=744'
ansible k8snode -m shell -a 'bash /root/k8s.sh'

#master初始化集群
kubeadm init --kubernetes-version=1.16.9 \
--apiserver-advertise-address=192.168.8.150 \
--image-repository registry.aliyuncs.com/google_containers \
--service-cidr=10.96.0.0/12 \
--pod-network-cidr=10.244.0.0/16
#安装flannel插件
kubectl apply -f kube-flannel.yml
#配置kubectl
mkdir -p /root/.kube
cp /etc/kubernetes/admin.conf /root/.kube/config
#node执行加入集群
kubeadm join 192.168.8.150:6443 --token xxxxxxxxxxxxxxxx \
--discovery-token-ca-cert-hash sha256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
#命令补全
#创建别名
alias k=kubectl
yum install -y bash-completion
source /usr/share/bash-completion/bash_completion
source <(kubectl completion bash)
cd ~;echo "source <(kubectl completion bash)" >> .bashrc
#配置node kubectl
mkdir -p /root/.kube
cp /etc/kubernetes/admin.conf /root/.kube/config
scp -P 22 /root/.kube/config node01:/root/.kube/config
scp -P 22 /root/.kube/config node02:/root/.kube/config
#测试
kubectl get node -A
#安装nfs
yum install nfs-utils rpcbind -y
systemctl enable rpcbind.service
systemctl enable nfs.service
mkdir /home/k8sdata
chown nfsnobody.nfsnobody /home/k8sdata
echo "/home/k8sdata 192.168.8.150(rw,sync,root_squash) 192.168.8.151(rw,sync,root_squash) 192.168.8.152(rw,sync,root_squash)">>/etc/exports
systemctl start rpcbind
systemctl start nfs
showmount -e localhost
#测试
showmount -e 192.168.8.150
mkdir /test
mount 192.168.8.150:/home/k8sdata /test/
cd /test/
echo "ok" > test.txt
#安装provisioner
kubectl apply -f rbac.yaml
kubectl apply -f storageclass-nfs.yaml
#注意修改nfs服务地址
kubectl apply -f deployment-nfs.yaml
#安装helm
wget http://panel.ropon.top/soft/helm-v3.2.4-linux-amd64.tar.gz
tar xf helm-v3.2.4-linux-amd64.tar.gz
mv linux-amd64/helm /usr/bin
helm version
rm -rf helm-v3.2.4-linux-amd64.tar.gz linux-amd64/
#添加国内源
helm repo add stable https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts
helm repo update
#安装ingress
#注意修改api-server - --service-node-port-range=1-65535
/etc/kubernetes/manifests/kube-apiserver.yaml
systemctl daemon-reload
systemctl restart kubelet
kubectl apply -f ingress.yaml
kubectl apply -f ingress-svc.yaml
#安装jenkins
helm search repo stable/jenkins
helm pull stable/jenkins
#修改values.yaml文件
Image: "jenkinsci/blueocean"
ImageTag: "latest"
ImagePullPolicy: "IfNotPresent"
HostName: jenkins.ropon.top
AdminPassword: xxxxxx
#修改Jenkins时间
JavaOpts: >
-Djava.awt.headless=true
-Dorg.apache.commons.jelly.tags.fmt.timeZone=Asia/Shanghai
-Dfile.encoding=UTF-8
ServiceType: ClusterIP
#LoadBalancerSourceRanges:
#- 0.0.0.0/0
#取消自动安装插件
InstallPlugins:
#- kubernetes:1.1
#- workflow-aggregator:2.5
#- workflow-job:2.15
#- credentials-binding:1.13
#- git:3.6.4
StorageClass: "managed-nfs-storage"
rbac:
install: true
helm install jenkins .
#安装jenkins插件
#更新源(web面板修改)
https://mirrors.tuna.tsinghua.edu.cn/jenkins/updates/update-center.json
cd /var/jenkins_home/updates
sed -i 's/http:\/\/updates.jenkins-ci.org\/download/https:\/\/mirrors.tuna.tsinghua.edu.cn\/jenkins/g' default.json && sed -i 's/http:\/\/www.google.com/https:\/\/www.baidu.com/g' default.json
#手工安装以下插件
Chinese
pipeline
kubernetes
gitlab
#安装gitlab
cat > gitlab-setup.sh << EOF
#!/bin/bash
mkdir -p /home/gitlab
docker run --detach \\
--hostname xxxx.ropon.top \\
--env GITLAB_OMNIBUS_CONFIG="external_url 'http://xxxx.ropon.top/'; gitlab_rails['gitlab_shell_ssh_port'] = 6022;" \\
--publish 443:443 --publish 80:80 --publish 6022:22 \\
--name gitlab \\
--restart always \\
--volume /home/gitlab/config:/etc/gitlab \\
--volume /home/gitlab/logs:/var/log/gitlab \\
--volume /home/gitlab/data:/var/opt/gitlab \\
--cpus 2 \\
--memory 2048MB \\
gitlab/gitlab-ce:11.2.2-ce.0
EOF
sh gitlab-setup.sh
#启动https
/etc/gitlab/gitlab.rb
nginx['redirect_http_to_https'] =true
nginx['ssl_certificate'] = "/etc/gitlab/ssl/server.crt"
nginx['ssl_certificate_key'] = "/etc/gitlab/ssl/server.key"
#安装harbor
wget http://panel.ropon.top/k8s/harbor-offline-installer-v1.8.2.tgz
tar xf harbor-offline-installer-v1.8.2.tgz
#修改harbor.yml文件
hostname: xxxx.ropon.top
#开启https
port: 443
certificate: /home/harbor/ropon.top.crt
private_key: /home/harbor/ropon.top.key
#下载docker-compose
wget http://panel.ropon.top/soft/docker-compose-Linux-x86_64
mv docker-compose-Linux-x86_64 /usr/bin/docker-compose
./prepare
./install.sh
#jenkins配置k8s
https://kubernetes.default
default
http://jenkins.default:8080
jenkins-agent.default:50000
#配置gitlab
#新建任务,进入流水线任务编辑,勾选Build when a change is pushed to GitLab
Admin area => Settings => Outbound requests 勾选
project => Settings => Integrations
#创建拉取镜像秘钥
kubectl create secret docker-registry hellogoregistrykey --docker-server=xxxx.ropon.top --docker-username=admin --docker-password=xxxxxx --docker-email=ropon@ropon.top
#之前使用iptables后启动ipvs
kubectl -n kube-system edit cm kube-proxy
mode: "ipvs"
#删除之前pod等待重建
kubectl get pod -n kube-system |grep kube-proxy | awk '{system("kubectl delete pod "$1" -n kube-system")}'

Pipeline

def gitlabUrl = "gitlab.ropon.top"
def harborUrl = "harbor.ropon.top"
def GroupName = "testgo"
def projectName = "hellogo"
def imageTag = "latest"
def kubectlImage = "lachlanevenson/k8s-kubectl:v1.16.9"
def branchName = "master"
def gitAuthName = "gitlab-auth-user"
def harborAuthName = "harbor-auth-user"
def sendmsgAuthName = "sendmsg-auth-user"
def msgText = "构建完成,请测试"

podTemplate(cloud: 'kubernetes',containers: [
containerTemplate(name: 'docker', image: 'docker:stable', command: 'cat', ttyEnabled: true),
containerTemplate(name: 'kubectl', image: "${kubectlImage}", command: 'cat', ttyEnabled: true)
],
volumes: [
hostPathVolume(hostPath: '/var/run/docker.sock', mountPath: '/var/run/docker.sock'),
hostPathVolume(hostPath: '/root/.kube', mountPath: '/root/.kube')
]
)

{
node (POD_LABEL) {
stage('pull code') {
checkout([$class: 'GitSCM', branches: [[name: "*/${branchName}"]], doGenerateSubmoduleConfigurations: false, extensions: [], submoduleCfg: [], userRemoteConfigs: [[credentialsId: "${gitAuthName}", url: "http://${gitlabUrl}/${GroupName}/${projectName}.git"]]])
}
container('docker') {
stage('docker-build') {
withCredentials([usernamePassword(credentialsId: "${harborAuthName}", passwordVariable: 'password', usernameVariable: 'username')]) {
sh "docker login -u $username -p $password $harborUrl"
}
sh "docker build -t ${projectName}:${imageTag} ."
def imageName = "${projectName}:${imageTag}"
def remoteImageName = "${harborUrl}/${GroupName}/${imageName}"
sh "docker tag $imageName $remoteImageName"
sh "docker push $remoteImageName"
sh "docker rmi $imageName"
sh "docker rmi $remoteImageName"
}
}
container('kubectl') {
stage('k8s deploy') {
sh "kubectl --kubeconfig=/root/.kube/config apply -f deployment.yaml"
}
}
stage('send msg') {
withCredentials([usernamePassword(credentialsId: "${sendmsgAuthName}", passwordVariable: 'password', usernameVariable: 'username')]) {
sh "wget http://panel.ropon.top/soft/sendmsg && chmod +x sendmsg && ./sendmsg $password $username $msgText"
}
}
}
}

Dockerfile

FROM golang:1.13-alpine3.10 as builder
ENV GO111MODULE=on \
CGO_ENABLED=0 \
GOOS=linux \
GOARCH=amd64 \
GOPROXY=https://goproxy.cn

COPY . /app/
RUN cd /app && go build -o hellogo .

FROM scratch
COPY --from=builder /app/hellogo /hellogo
ENTRYPOINT ["/hellogo"]

Deployment

apiVersion: v1  #注意版本号
kind: Service
metadata:
name: myapp
spec:
type: ClusterIP
selector: #属性,选择器
app: hello
ports:
- name: http
port: 9000
targetPort: 9000
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: hellogo
spec:
rules:
- host: hellogo.ropon.top
http:
paths:
- backend:
serviceName: myapp
servicePort: 9000
---
apiVersion: apps/v1 #描述文件遵循apps/v1版本的Kubernetes API
kind: Deployment #创建资源类型为Deployment
metadata: #该资源元数据
name: test-hello #Deployment名称
spec: #Deployment的规格说明
selector:
matchLabels:
app: hello
replicas: 2 #指定副本数为2
template: #定义Pod的模板
metadata: #定义Pod的元数据
labels: #定义label(标签)
app: hello #label的key和value分别为app和hello
spec: #Pod的规格说明
imagePullSecrets:
- name: hellogoregistrykey
containers:
- name: hellogo #容器的名称
image: harbor.ropon.top/testgo/hellogo:v4 #创建容器所使用的镜像
imagePullPolicy: IfNotPresent
ports:
- containerPort: 9000

k8s v1.20.11

#docker版本
yum install -y yum-utils device-mapper-persistent-data lvm2
yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum install docker-ce-19.03.15 docker-ce-cli-19.03.15 containerd.io-1.2.13 -y
mkdir -p /etc/docker
tee /etc/docker/daemon.json <<-EOF
{
"exec-opts": ["native.cgroupdriver=systemd"],
"registry-mirrors": ["https://xxxxx.mirror.aliyuncs.com"]
}
EOF
systemctl daemon-reload
systemctl start docker
systemctl enable docker

#k8s版本
yum install -y kubelet-1.20.11 kubeadm-1.20.11 kubectl-1.20.11
systemctl enable kubelet

#其他同上

go格式化输出json

var prettyJSON bytes.Buffer
err := json.Indent(&prettyJSON, body, "", "\t")
if err != nil {
log.Println("JSON parse error: ", err)
return
}
fmt.Println(string(prettyJSON.Bytes()))

python格式化输出字符串

# 方式一
# val = {"host": ip, "ttl": 60 }
# cmd_string = f"/bin/etcdctl put /coredns/{flag} '{json.dumps(val)}'"
# 方式二
# cmd_string = """/bin/etcdctl put /coredns/{flag} '{{"host": "{ip}","ttl": 60}}'""".format(flag=flag, ip=ip)
# 方式三
# cmd_string = """/bin/etcdctl put /coredns/{0} '{{"host": "{1}","ttl": 60}}'""".format(flag, ip)
# 方式四
# cmd_string = f"""/bin/etcdctl put /coredns/{flag} '{{"host": "{ip}","ttl": 60}}'"""
# print(cmd_string)

任务调度demo

  • 模拟任务调度
  • 同步(串行)任务/异步(并行)任务
  • 遇到同步任务需执行完成后再执行后续任务
package main

import (
"context"
"fmt"
"sync"
"time"
)

func test(wg *sync.WaitGroup, ctx context.Context, jobId int) {
defer wg.Done()
for i := 0; i < 5; i++ {
select {
case <-ctx.Done():
fmt.Printf("任务Id:%d,异常退出\n", jobId)
return
default:
fmt.Printf("任务Id:%d,执行第%d次\n", jobId, i)
if jobId > 2 {
time.Sleep(time.Second * 5)
} else {
time.Sleep(time.Second * 2)
}

}
}
}

func main() {
ctx, cancel := context.WithCancel(context.Background())
wg := new(sync.WaitGroup)
go func() {
time.Sleep(time.Second * 20)
cancel()
}()
for i := 0; i < 5; i++ {
wg.Add(1)
go test(wg, ctx, i)

if i < 1 {
wg.Wait()
select {
case <-ctx.Done():
fmt.Println("main1 异常退出")
return
default:
fmt.Println("1 select")
}
}
}
wg.Wait()
select {
case <-ctx.Done():
fmt.Println("main2 异常退出")
return
default:
fmt.Println("2 select")
}
//测试阻塞
select {}
}

从0开始搭建运维平台

为了便于管理,安装jumpserver

  • 生成加密私钥

    if [ ! "$SECRET_KEY" ]; then
    SECRET_KEY=$(cat /dev/urandom | tr -dc A-Za-z0-9 | head -c 50)
    echo "SECRET_KEY=$SECRET_KEY" >>~/.bashrc
    echo $SECRET_KEY
    else
    echo $SECRET_KEY
    fi
    if [ ! "$BOOTSTRAP_TOKEN" ]; then
    BOOTSTRAP_TOKEN=$(cat /dev/urandom | tr -dc A-Za-z0-9 | head -c 16)
    echo "BOOTSTRAP_TOKEN=$BOOTSTRAP_TOKEN" >>~/.bashrc
    echo $BOOTSTRAP_TOKEN
    else
    echo $BOOTSTRAP_TOKEN
    fi
  • 创建mysql数据库及账号密码

    create database jumpserver default charset 'utf8' collate 'utf8_bin';
    grant all on jumpserver.* to 'jumpserver'@'%' identified by 'xxxxxxxxxx';
  • 通过docker启动jumpserver

    #xxx是目录 xxxx对外暴露端口 xxxx.com是某个域名
    docker run --name jms_all -d -v /xxx/jumpserver:/opt/jumpserver/data/media -p xxxx:80 -p xxxx:2222 -e SECRET_KEY=xxx -e BOOTSTRAP_TOKEN=xxx -e DB_HOST=1.1.1.1 -e DB_PORT=3306 -e DB_USER=jumpserver -e DB_PASSWORD=xxx -e DB_NAME=jumpserver -e REDIS_HOST=1.1.1.1 -e REDIS_PORT=6379 dockerhub.xxxx.com/jumpserver/jms_all:2.1.1
  • 配置jumpserver

    #创建管理用户xxx_root
    useradd xxx_root
    #根据提示生成一对公私钥,建议设置私钥密码
    ssh-keygen
    #建议在ecs模版集成xxx_root管理用户
    useradd xxx_root
    mkdir -p /home/xxx_root/.ssh
    chmod 700 /home/xxx_root/.ssh
    echo "sshkey" > /home/xxx_root/.ssh/authorized_keys
    chmod 600 /home/xxx_root/.ssh/authorized_keys
    chown xxx_root.xxx_root -R /home/xxx_root
    echo "xxx_root ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
    #登录jumpserver根据提示配置
    #系统设置
    #管理用户(xxx_root)
    #系统用户(jump_root/backend/web/devops)
    #资产列表

开通ack专用版

  • 1.20.4/docker/flanel/ipvs

  • 初始化

    #master节点三个,调整apiSever服务端口访问1024-65535
    #node节点任意
    安装node_exporter/dnsmasq/crontab(清理go_log日志/docker镜像)
    创建命名空间并配置拉取镜像密钥
    推荐给节点打标签分组groupname=xxx_ops
  • 配置资产管理系统(resources-collector/cmdb)

    #resources-collector主要从云厂商拉取资源
    创建数据库cmdb
    #cmdb对内提供资源/服务信息相关接口
  • 新建k8s-manager

    对k8sApi封装,方便对内调用
  • 新建configmgr

    配置中心
  • 新建opscenter

    统一认证网关
  • 新建dbman

    数据库操作相关
  • 新建ops-helper

    对运维助手封装
  • 新建ops-frontend-v2

    运维平台前端管理页面
  • 新建publish-system-v2

    发布系统

缓存淘汰策略

淘汰策略

  • FIFO(First In First Out)

    先进先出,也就是淘汰缓存中最老(最早添加)的记录,创建一个队列,新增记录添加到队尾,当内存不足时,淘汰队首;

    但是很多场景下,部分记录虽然是最早添加的但也经常被访问,这类数据会被频繁添加缓存然后又被淘汰,导致命中率降低

  • LFU(Least Frequently Used)

    最少使用,也就是淘汰缓存中访问频率最低的记录,LFU需要维护一个按访问次数排序的队列,每次访问次数加1,队列重新排序,

    当内存不足时,淘汰访问次数最少的记录,维护每个记录的访问次数,对内存消耗较高,另外访问模式发生变化,LFU需要时间去适应,也就是说LFU算法受历史数据影响较大,比如某个记录历史访问很高,但在某个时间点后几乎不再被访问,因历史访问次数过高,迟迟不能被淘汰

  • LRU(Least Recently Used)

    最近最少使用,创建一个队列,如果某个记录被访问了,则移动到队尾,那么队首则是最少访问的数据,当内存不足时,淘汰该记录即可

Go语言实现LRU

  • 字典/双向链表(Map list.List)
type Cache struct {
maxBytes int64 //最大容量
uBytes int64 //已使用容量
ll *list.List //双向链表
cache map[string]*list.Element //缓存数据
OnRemoved func(key string, value Value) //当记录被淘汰时回调
}

type Value interface {
Len() int
}

type entry struct {
key string
value Value
}

func New(maxBytes int64, onRemoved func(string, Value)) *Cache {
return &Cache{
maxBytes: maxBytes,
ll: list.New(),
cache: make(map[string]*list.Element),
OnRemoved: onRemoved,
}
}
  • 对缓存增删改查
//增/改
func (c *Cache) Add(key string, value Value) {
//如果键存在,更新键值并将其移到队尾,因双向链表队尾是相对的
if ele, ok := c.cache[key]; ok {
c.ll.MoveToBack(ele)
kv := ele.Value.(*entry)
c.uBytes += int64(value.Len()) - int64(kv.value.Len())
kv.value = value
} else {
//不存在则新增并向队尾添加节点,并在字典中添加key和节点映射关系
//更新已使用容量,如果设置最大容量,则移除最少访问的节点
ele := c.ll.PushBack(&entry{key: key, value: value})
c.cache[key] = ele
c.uBytes += int64(len(key)) + int64(value.Len())
}
for c.maxBytes != 0 && c.uBytes > c.maxBytes {
c.RemoveOldEle()
}
}

//删
func (c *Cache) RemoveOldEle() {
//取队首节点
ele := c.ll.Front()
if ele != nil {
//从链表删除并从cache删除该节点的映射关系
c.ll.Remove(ele)
kv := ele.Value.(*entry)
delete(c.cache, kv.key)
//更新已使用容量
c.uBytes -= int64(len(kv.key)) + int64(kv.value.Len())
//回调函数
if c.OnRemoved != nil {
c.OnRemoved(kv.key, kv.value)
}
}
}

//查
func (c *Cache) Get(key string) (value Value, ok bool) {
//从cache中找到双向链表的节点并将该节点移到队尾
if ele, ok := c.cache[key]; ok {
c.ll.MoveToBack(ele)
kv := ele.Value.(*entry)
return kv.value, ok
}
return
}
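
The tests below also call c.Len(), which is not shown in the snippets above; a minimal sketch that simply reports the number of cached entries:

// Len returns the number of entries currently in the cache.
func (c *Cache) Len() int {
    return c.ll.Len()
}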
  • Tests
package lru

import (
    "reflect"
    "testing"
)

type String string

func (s String) Len() int {
    return len(s)
}

func TestGet(t *testing.T) {
    c := New(0, nil)
    c.Add("key1", String("val1"))
    if v, ok := c.Get("key1"); !ok || string(v.(String)) != "val1" {
        t.Fatalf("cache hit key1=val1 failed")
    }
    if _, ok := c.Get("key2"); ok {
        t.Fatalf("cache miss key2 failed")
    }
}

func TestRemoveOldEle(t *testing.T) {
    k1, k2, k3 := "key1", "key2", "key3"
    v1, v2, v3 := "val1", "val2", "val3"
    maxBytes := len(k1 + k2 + v1 + v2)
    c := New(int64(maxBytes), nil)
    c.Add(k1, String(v1))
    c.Add(k2, String(v2))
    c.Add(k3, String(v3))

    if _, ok := c.Get("key1"); ok || c.Len() != 2 {
        t.Fatalf("removeoldele key1 failed")
    }
}

func TestOnRemoved(t *testing.T) {
    keys := make([]string, 0)
    callback := func(key string, value Value) {
        keys = append(keys, key)
    }
    c := New(int64(10), callback)
    c.Add("k1", String("v1"))
    c.Add("k2", String("v2"))
    c.Add("k3", String("v3"))
    c.Add("k4", String("k4"))

    expect := []string{"k1", "k2"}
    if !reflect.DeepEqual(expect, keys) {
        t.Fatalf("call onremoved failed, expect keys equals to %s, get %s", expect, keys)
    }
}

Code review platform

SonarQube

sonarqube
#1. Start a PostgreSQL container
docker run --name db -e POSTGRES_USER=sonar -e POSTGRES_PASSWORD=codoon.com -d postgres
#2. Create volumes
docker volume create sonarqube_data
docker volume create sonarqube_extensions
docker volume create sonarqube_logs
#3. Start SonarQube
docker run -d --name sonarqube -p 9000:9000 --link db -e SONAR_JDBC_URL=jdbc:postgresql://db:5432/sonar -e SONAR_JDBC_USERNAME=sonar -e SONAR_JDBC_PASSWORD=codoon.com -v sonarqube_data:/opt/sonarqube/data -v sonarqube_extensions:/opt/sonarqube/extensions -v sonarqube_logs:/opt/sonarqube/logs sonarqube:8.9.3-community

prometheus + confd + etcd auto-discovery

  • Architecture

    1. All Prometheus configuration files are generated by confd from data read out of etcd
    2. Collection is done by node-exporter, kafka-exporter, mysql-exporter, etc.; on startup each exporter calls the cmdb API so that its own target data is written into etcd (see the registration sketch after this list)
    3. codoon-alert interacts with etcd to configure rules, alert silencing, and so on
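
    The real flow registers through the cmdb API, but the end result is a key under /prometheus/discovery/host/<name> holding a JSON document like the ones in the simulation examples further down. A minimal Go sketch that writes such a key directly with the etcd v3 client; the endpoint, host name and labels here are illustrative:

    package main

    import (
        "context"
        "encoding/json"
        "log"
        "time"

        clientv3 "go.etcd.io/etcd/client/v3"
    )

    // target mirrors the JSON documents stored under /prometheus/discovery/host/*.
    type target struct {
        Name    string  `json:"name"`
        Address string  `json:"address"`
        Labels  []label `json:"labels,omitempty"`
    }

    type label struct {
        Key string `json:"key"`
        Val string `json:"val"`
    }

    func main() {
        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{"http://127.0.0.1:2379"}, // assumed etcd endpoint
            DialTimeout: 5 * time.Second,
        })
        if err != nil {
            log.Fatal(err)
        }
        defer cli.Close()

        t := target{
            Name:    "test1",           // hypothetical host name
            Address: "10.12.10.1:9091", // exporter address
            Labels:  []label{{Key: "type", Val: "host"}},
        }
        val, err := json.Marshal(t)
        if err != nil {
            log.Fatal(err)
        }

        ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
        defer cancel()
        // confd watches this prefix, regenerates target_host.json and reloads Prometheus.
        if _, err := cli.Put(ctx, "/prometheus/discovery/host/"+t.Name, string(val)); err != nil {
            log.Fatal(err)
        }
    }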
  • Main configuration file

    global:
      scrape_interval: 10s     # scrape interval
      scrape_timeout: 10s      # scrape timeout
      evaluation_interval: 15s # rule evaluation interval
    alerting:
      alertmanagers:
      - scheme: http
        timeout: 10s
        api_version: v1
        static_configs:
        - targets:
          - 127.0.0.1:9093
    rule_files:
    - /codoon/prometheus/etc/rules/rule_*.yml
    scrape_configs:
    - job_name: prometheus
      honor_timestamps: true
      scrape_interval: 10s
      scrape_timeout: 10s
      metrics_path: /metrics
      scheme: http
      static_configs:
      - targets:
        - 127.0.0.1:9090
    - job_name: codoon_ops
      honor_timestamps: true
      scrape_interval: 10s
      scrape_timeout: 10s
      metrics_path: /metrics
      scheme: http
      file_sd_configs:
      - files:
        - /codoon/prometheus/etc/targets/target_*.json
        refresh_interval: 20s # how often the target files are re-read
  • Prometheus startup commands

    /codoon/prometheus/prometheus --web.enable-lifecycle --config.file=/codoon/prometheus/etc/prometheus.yml --storage.tsdb.path=/codoon/prometheus

    nohup ./prometheus --web.enable-lifecycle --config.file=./etc/prometheus.yml --storage.tsdb.path=/codoon/prometheus --web.external-url=xxx.com/ > prometheus.log 2>&1 &
  • confd configuration files

    #Service discovery
    #conf.d/discovery_host.toml
    [template]
    src = "discovery_host.tmpl"
    dest = "/codoon/prometheus/etc/targets/target_host.json"
    mode = "0777"
    keys = [
    "/prometheus/discovery/host",
    ]
    reload_cmd = "curl -XPOST 'http://127.0.0.1:9090/-/reload'"

    #templates/discovery_host.tmpl
    [
    {{- range $index, $info := getvs "/prometheus/discovery/host/*" -}}
    {{- $data := json $info -}}
    {{- if ne $index 0 }},{{- end }}
    {
      "targets": [
        "{{$data.address}}"
      ],
      "labels":{
        "instance": "{{$data.name}}"
        {{- if $data.labels -}}
        {{- range $data.labels -}}
        ,"{{.key}}": "{{.val}}"
        {{- end}}
        {{- end}}
      }
    }{{- end }}
    ]

    #Rule distribution
    #conf.d/rule_host.toml
    [template]
    src = "rule_host.tmpl"
    dest = "/codoon/prometheus/etc/rules/rule_host.yml"
    mode = "0777"
    keys = [
    "/prometheus/rule/host",
    ]
    reload_cmd = "curl -XPOST 'http://127.0.0.1:9090/-/reload'"

    #templates/rule_host.tmpl
    groups:
    - name: host
      rules:
    {{- range $info := getvs "/prometheus/rule/host/*"}}
    {{- $data := json $info}}
    {{- if $data.status}}
      - alert: {{$data.alert}}
        expr: {{$data.expr}}
        for: {{$data.for}}
    {{- if $data.labels}}
        labels:
    {{- range $data.labels}}
          {{.key}}: {{.val}}
    {{- end}}
    {{- end}}
        annotations:
    {{- if $data.summary}}
          summary: "{{$data.summary}}"
    {{- end}}
    {{- if $data.description}}
          description: "{{$data.description}}"
    {{- end}}
    {{- end }}
    {{- end }}
  • confd startup commands

    /codoon/prometheus/confd-0.16.0-linux-amd64 -confdir /codoon/prometheus/confd/ -backend etcdv3  -watch -node http://127.0.0.1:2379

    nohup ./confd-0.16.0-linux-amd64 -confdir ./confd/ -backend etcdv3 -watch -node http://127.0.0.1:2379 > confd.log 2>&1 &
  • Simulating service discovery

    #By default the only label is instance: name
    etcdctl put /prometheus/discovery/host/test1 '{"name":"test1","address":"10.12.10.1:9091"}'
    #Custom labels
    etcdctl put /prometheus/discovery/host/test2 '{"name":"test2","address":"10.12.10.1:9092","labels":[{"key":"label1","val":"test1"},{"key":"label2","val":"test2"}]}'
  • Simulating rule distribution

    etcdctl put /prometheus/rule/host/test1 '{"alert":"test1 is down","expr":"up == 0","for":"30s","summary":"s1","description":"d1"}'
    #Custom labels
    etcdctl put /prometheus/rule/host/test2 '{"alert":"test2 is down","expr":"up == 0","for":"1m","summary":"s1","description":"d1","labels":[{"key":"label1","val":"test1"},{"key":"label2","val":"test2"}]}'

  • alertmanager

    nohup ./alertmanager-0.21.0.linux-amd64/alertmanager --config.file=alertmanager-0.21.0.linux-amd64/alertmanager.yml > alertmanager.log 2>&1 &
  • Common PromQL queries

    /prometheus/rule/host/nodata
    #No data
    {"status":true,"alert":"no data","expr":"up == 0","for":"5m","summary":"no data","description":"{{$labels.instance}} no data for 5m, curr: {{ $value }}","labels":[{"key":"diyk","val":"diyv"}]}

    /prometheus/rule/host/availcpult20
    #CPU availability below 20%
    {"status":true,"alert":"avail cpu lt 20%","expr":"avg(rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) by (type,instance,env,ip) < 0.2","for":"5m","summary":"avail cpu lt 20%","description":"avail cpu lt 20% for 5m, curr: {{ $value }}","labels":[{"key":"diyk","val":"diyv"}]}

    /prometheus/rule/host/availmemlt20
    #Memory availability below 20%
    {"status":true,"alert":"avail mem lt 20%","expr":"1-(node_memory_MemTotal_bytes - node_memory_Cached_bytes - node_memory_Buffers_bytes - node_memory_MemFree_bytes) /node_memory_MemTotal_bytes < 0.2","for":"5m","summary":"avail mem lt 20%","description":"avail mem lt 20% for 5m, curr: {{ $value }}","labels":[{"key":"diyk","val":"diyv"}]}

    /prometheus/rule/host/availdisklt20
    #Disk availability below 20%
    {"status":true,"alert":"avail disk lt 20%","expr":"node_filesystem_avail_bytes{fstype=~\"ext.*|xfs\",mountpoint!~\".*docker.*|.*pod.*|.*container|.*kubelet\"} /node_filesystem_size_bytes{fstype=~\"ext.*|xfs\",mountpoint!~\".*docker.*|.*pod.*|.*container|.*kubelet\"} < 0.2","for":"5m","summary":"avail disk lt 20%","description":"mount: {{ $labels.mountpoint }} avail lt 20G for 5m, curr: {{ $value }}","labels":[{"key":"diyk","val":"diyv"}]}

    /prometheus/rule/host/load1toohigh
    #1-minute load
    {"status":true,"alert":"load1 is too high","expr":"node_load1/2 > on(type,instance,env,ip) count(node_cpu_seconds_total{mode=\"system\"}) by (type,instance,env,ip)","for":"5m","summary":"load1 is too high","description":"load1 is too high for 5m, curr: {{ $value }}","labels":[{"key":"diyk","val":"diyv"}]}

    /prometheus/rule/host/useiopsgt80
    #IOPS utilization above 80%
    {"status": true,"alert":"iops too high","expr":"rate(node_disk_io_time_seconds_total[5m]) > 0.8","for":"5m","summary":"iops too high","description":"iops too high for 5m, curr: {{ $value }}","labels":[{"key":"diyk","val":"diyv"}]}


    (1 - (node_memory_MemFree_bytes{origin_prometheus=~"$origin_prometheus",job=~"$job"} + node_memory_Buffers_bytes{origin_prometheus=~"$origin_prometheus",job=~"$job"} + node_memory_Cached_bytes{origin_prometheus=~"$origin_prometheus",job=~"$job"}) / node_memory_MemTotal_bytes{origin_prometheus=~"$origin_prometheus",job=~"$job"}) * 100


    ((node_memory_MemTotal_bytes{origin_prometheus=~"$origin_prometheus",job=~"$job"} - node_memory_MemFree_bytes{origin_prometheus=~"$origin_prometheus",job=~"$job"} - node_memory_Buffers_bytes{origin_prometheus=~"$origin_prometheus",job=~"$job"} - node_memory_Cached_bytes) / (node_memory_MemTotal_bytes{origin_prometheus=~"$origin_prometheus",job=~"$job"} )) * 100

    #Alert rules, summarized
    1-minute load greater than the number of CPU cores, sustained for 5m
    node_load1 > on(instance,ip) count(node_cpu_seconds_total{mode="system"}) by (instance,ip)

    CPU availability below 20%, sustained for 5m
    avg(rate(node_cpu_seconds_total{mode="system"}[5m])) by (instance) *100
    avg(rate(node_cpu_seconds_total{mode="user"}[5m])) by (instance) *100
    avg(rate(node_cpu_seconds_total{mode="iowait"}[5m])) by (instance) *100
    avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) *100

    Disk availability below 20% and free space below 20G, sustained for 5m
    (node_filesystem_avail_bytes{fstype=~"ext.*|xfs",mountpoint!~".*pod.*|.*docker-lib.*"} / node_filesystem_size_bytes{fstype=~"ext.*|xfs",mountpoint!~".*pod.*|.*docker-lib.*"} < 0.2) and node_filesystem_avail_bytes{fstype=~"ext.*|xfs",mountpoint!~".*pod.*|.*docker-lib.*"} < 20*1024^3

    Memory usage above 80%, sustained for 5m
    (node_memory_MemTotal_bytes - node_memory_Cached_bytes - node_memory_Buffers_bytes - node_memory_MemFree_bytes) /node_memory_MemTotal_bytes

    IOPS: writes above 300, reads above 2000, sustained for 5m
    rate(node_disk_reads_completed_total[5m]) > 1000 or rate(node_disk_writes_completed_total[5m]) > 200

    NIC: total traffic over 1 hour; 5-minute rates
    increase(node_network_receive_bytes_total[60m]) /1024/1024
    increase(node_network_transmit_bytes_total[60m]) /1024/1024
    rate(node_network_receive_bytes_total[5m])*8
    rate(node_network_transmit_bytes_total[5m])*8
  • temp

     {"status": true,"alert":"rw iops too high","expr":"rate(node_disk_io_time_seconds_total[5m]) > 0.8","for":"5m","summary":"iops too high","description":"iops too high for 5m, curr: {{ $value }}","labels":[{"key":"receiver","val":"xxxx,xxxx,xxx"}

    etcdctl put /prometheus/discovery/host/codoon-istio-master01 '{"name":"codoon-istio-master01","address":"10.10.16.73:9100","labels": [{"key":"type","val":"host"},{"key":"ip","val":"10.10.16.73"}]}'

    etcdctl put /prometheus/rule/host/cpuavail20 '{"alert":"cpu avail less 20","expr":"avg(rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) by (instance) < 0.2","for":"5m","summary":"avail less 20","description":"cpu avail less 20 for 5m, curr: {{ $value }}","labels":[{"key":"receiver","val":"xxx"}]}'

    etcdctl put /prometheus/rule/host/memuse80 '{"alert":"mem use gt 80","expr":"(node_memory_MemTotal_bytes - node_memory_Cached_bytes - node_memory_Buffers_bytes - node_memory_MemFree_bytes) /node_memory_MemTotal_bytes > 0.8","for":"5m","summary":"use gt 80","description":"mem use gt 80 for 5m, curr: {{ $value }}","labels":[{"key":"receiver","val":"xxx"}]}'

    etcdctl put /prometheus/rule/host/iopsth '{"alert":"rw iops too high","expr":"rate(node_disk_reads_completed_total[5m]) > 1000 or rate(node_disk_writes_completed_total[5m]) > 200","for":"5m","summary":"iops too high","description":"iops too high for 5m, curr: {{ $value }}","labels":[{"key":"receiver","val":"xxxx"}]}'

    {
    "status": true,
    "alert": "avail disk lt 20%",
    "expr": "node_filesystem_avail_bytes{fstype=~\"ext.*|xfs\",mountpoint!~\".*docker.*|.*pod.*|.*container|.*kubelet\"} /node_filesystem_size_bytes{fstype=~\"ext.*|xfs\",mountpoint!~\".*docker.*|.*pod.*|.*container|.*kubelet\"} < 0.2 and node_filesystem_avail_bytes{fstype=~\"ext.*|xfs\",mountpoint!~\".*docker.*|.*pod.*|.*container|.*kubelet\"} < 50*1024^3",
    "for": "2m",
    "summary": "avail disk lt 20%",
    "description": "mount: {{ $labels.mountpoint }} avail lt 20% for 2m, curr: {{ $value }}",
    "labels": [{
    "key": "severity",
    "val": "warnning"
    }]
    }

    etcdctl put /prometheus/rule/host/load1too2high '{"status":true,"alert":"load1 is too2 high","expr":"node_load1 > on(type,instance,env,ip) count(node_cpu_seconds_total{mode=\"system\"}) by (type,instance,env,ip) /1.5","for":"2m","summary":"load1 is too2 high","description":"load1 is too2 high for 2m, curr: {{ $value }}","labels":[{"key":"severity","val":"critical"}]}'
  • Startup script (systemd)

    vim /usr/lib/systemd/system/prometheus.service
    [Unit]
    Description=prometheus
    Documentation=codoon_ops
    After=network.target
    [Service]
    EnvironmentFile=-/etc/sysconfig/prometheus
    User=prometheus
    ExecStart=/usr/local/prometheus/prometheus \
    --web.enable-lifecycle \
    --storage.tsdb.path=/codoon/prometheus/data \
    --config.file=/codoon/prometheus/etc/prometheus.yml \
    --web.listen-address=0.0.0.0:9090 \
    --web.external-url= $PROM_EXTRA_ARGS \
    --log.level=debug
    Restart=on-failure
    StartLimitInterval=1
    RestartSec=3
    [Install]
    WantedBy=multi-user.target

    systemctl daemon-reload
    systemctl enable prometheus

  • docker

    docker run --name promconfd -d -v /codoon/prometheus/etc:/opt/prometheus/etc -v /codoon/prometheus/data:/opt/prometheus/data -v /codoon/prometheus/confd/etc:/opt/confd/etc -p 9090:9090 dockerhub.xxxx.com/prom/prometheus:v2.24.1
  • Deployment layout

    prometheus + confd are deployed with Docker on the prom-monitor host
    TSDB data path: /codoon/prometheus/data
    Prometheus config path: /codoon/prometheus/etc
    confd config path: /codoon/prometheus/confd/etc

    ops-etcd0|1|2
    etcd-based service auto-discovery
    /prometheus/discovery/host/*
    /prometheus/discovery/db/*
    ...

    Automatic rule distribution
    /prometheus/rule/host/*
    /prometheus/rule/host/*
    ...
  • Alert delivery policy

    1. For a warnning-level alert, wait 1 minute before the first send; if a critical-level alert of the same type fires in the meantime, send that immediately and drop the warnning-level one (a sketch of this logic follows below)
    2. warnning-level alerts are re-sent every 20 minutes
    3. critical-level alerts are re-sent every 10 minutes
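
    A minimal sketch of this delivery policy in Go; the severities and intervals come from the list above ("warnning" is spelled as it appears in the rule labels), everything else is illustrative:

    package alertpolicy

    import "time"

    // Severity values as they appear in the alert labels above.
    const (
        SevWarning  = "warnning" // spelled as configured in the rules
        SevCritical = "critical"
    )

    // Re-send intervals from the policy above.
    var resendInterval = map[string]time.Duration{
        SevWarning:  20 * time.Minute,
        SevCritical: 10 * time.Minute,
    }

    // Initial hold time for warnning-level alerts.
    const warningDelay = time.Minute

    type alertState struct {
        firstSeen time.Time // when this alert first fired
        lastSent  time.Time // when it was last delivered
    }

    // shouldSend reports whether an alert with the given severity should be sent now.
    // criticalActive says whether a critical alert of the same type is currently firing.
    func shouldSend(sev string, st *alertState, criticalActive bool, now time.Time) bool {
        if sev == SevWarning {
            // A warnning alert is dropped outright while a critical of the same type
            // is firing, and otherwise held for one minute before its first delivery.
            if criticalActive {
                return false
            }
            if now.Sub(st.firstSeen) < warningDelay {
                return false
            }
        }
        // Rate-limit re-sends to the per-severity interval (20m warnning, 10m critical).
        if !st.lastSent.IsZero() && now.Sub(st.lastSent) < resendInterval[sev] {
            return false
        }
        st.lastSent = now
        return true
    }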
  • Silence configuration

    Configured through opscenter; it works by filtering on labels and picks the best match
    Inhibition: for the same alert, the higher-severity instance automatically inhibits the lower-severity one, and the inhibition is lifted once the higher-severity alert resolves
    Silence configuration is stored in ops-etcd under /prometheus/silencev2

    Supports regex matching on alertname:instance:labels..., i.e. alert name, instance, IP, severity, and so on
    Add a silence (POST)
    curl -X POST -H 'Content-Type: application/json' -d '{"sc_key":"tidb","sc_val":"instance:severity:alertname:tidb-(node|ssd-[0-9]+)warnning(load1.*|avail cpu.*)"}' codoon-alert.in.xxx.com:8875/backend/codoon_alert/api/v1/silence
    Delete a silence (DELETE)
    curl -X DELETE codoon-alert.in.xxx.com:8875/backend/codoon_alert/api/v1/silence/tidb
    List silences (GET)
    curl codoon-alert.in.xx.com:8875/backend/codoon_alert/api/v1/silence

    View the alertconfig settings (GET)
    curl codoon-alert.in.xxx.com:8875/backend/codoon_alert/api/v1/alertconfig?cfg_key=notice|wait|clear|reslove


    {
    "data": {
    "apitmporcheckall": "instance:alertname:(nginx-api-tmp|apicheck(-[0-9])?)(.*)",
    "intwarnall": "instance:severity:alertname:integrationwarnning(.*)",
    "istio": "instance:severity:alertname:(codoon[0-9]+istio)warnning(load1.*)",
    "monitor_roy": "instance:severity:alertname:monitor_roywarnning(load1.*)",
    "testall": "instance:alertname:testall(.*)",
    "tidb": "instance:severity:alertname:tidb-(node|ssd-[0-9]+)warnning(load1.*|avail cpu.*)"
    },
    "description": "ok",
    "status": "OK"
    }
  • Alert configuration

    Same principle as the silence configuration: filter on labels and pick the best match by default. Label matching checks = and != first, then =~ and !~ (regex); see the matcher sketch below.
    Alert receiver configuration is stored in ops-etcd under /prometheus/receiver
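
    A minimal sketch of that matching order in Go; the matcher struct and label map are illustrative stand-ins, not the actual codoon-alert types:

    package alertmatch

    import "regexp"

    // matcher is a single label condition, e.g. severity="critical" or instance=~"tidb-.*".
    type matcher struct {
        name  string
        op    string // "=", "!=", "=~", "!~"
        value string
    }

    // matches applies the matchers to an alert's label set: exact operators
    // (= and !=) are checked first, regex operators (=~ and !~) afterwards.
    func matches(labels map[string]string, ms []matcher) bool {
        // pass 1: = and !=
        for _, m := range ms {
            v := labels[m.name]
            switch m.op {
            case "=":
                if v != m.value {
                    return false
                }
            case "!=":
                if v == m.value {
                    return false
                }
            }
        }
        // pass 2: =~ and !~ (fully anchored regexes)
        for _, m := range ms {
            v := labels[m.name]
            switch m.op {
            case "=~", "!~":
                re, err := regexp.Compile("^(?:" + m.value + ")$")
                if err != nil {
                    return false
                }
                if re.MatchString(v) == (m.op == "!~") {
                    return false
                }
            }
        }
        return true
    }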
  • Alert templates

    Customized through opscenter; when more than 3 alerts fire at once they are collapsed automatically,
    and an additional email containing the full alert details is sent
    Template configuration is stored in ops-etcd under /prometheus/template
  • Other notes

    Alerts labeled type=service look up their receivers from cmdb by service name (service=xxx)
    To stop receiving resolved notifications, set resolved=no in the labels
    Pod CPU/memory alerts (pprof_type=memory/cpu) also send a pprof profile
    Service error/panic alerts (log_type: ERRO/PANIC) fetch the details from Loki and include them
    servicemap maps log file names to services; it is kept up to date by watching err_check/service_map (a minimal watch sketch follows)
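
    A minimal sketch of that watch in Go, assuming err_check and service_map are key prefixes in the same ops-etcd cluster (the endpoint and prefix handling are illustrative; err_check would be watched the same way):

    package main

    import (
        "context"
        "log"
        "time"

        clientv3 "go.etcd.io/etcd/client/v3"
    )

    func main() {
        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{"http://127.0.0.1:2379"}, // assumed ops-etcd endpoint
            DialTimeout: 5 * time.Second,
        })
        if err != nil {
            log.Fatal(err)
        }
        defer cli.Close()

        // Watch the (assumed) service_map prefix and log every change; this is where
        // an updated log-name -> service mapping would be picked up.
        for resp := range cli.Watch(context.Background(), "service_map", clientv3.WithPrefix()) {
            for _, ev := range resp.Events {
                log.Printf("%s %s = %s", ev.Type, ev.Kv.Key, ev.Kv.Value)
            }
        }
    }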