Health Checks for Spring Boot 2 Application Pods in OpenShift

1. Prepare the Spring Boot test project. It requires a Java 8 JDK or greater and Maven 3.3.x or greater.

git clone https://github.com/megadotnet/Openshift-healthcheck-demo.git

This assumes you are already comfortable with basic Java application development and that an OpenShift container platform is up and running. Our test project depends on Spring Boot Actuator 2, which brings the following new features:

    • Support for Jersey RESTful web services
    • Support for reactive WebFlux web applications
    • New endpoint mappings
    • Simplified creation of user-defined endpoints
    • Improved endpoint security

Actuator ships with a set of built-in endpoints (health, info, metrics, env, beans, and so on).

In Spring Boot 2.x, for security reasons, only two endpoints are exposed over the web by default: /actuator/health and /actuator/info. The others can be switched on in the configuration file.
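A minimal sketch of that configuration in application.yml, using the standard Spring Boot 2 Actuator property names (the demo project may organize its configuration differently):

management:
  endpoints:
    web:
      exposure:
        include: "health,info"   # list the endpoints to expose over HTTP
  endpoint:
    health:
      show-details: always       # include component details in /actuator/health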

Deploying the compiled jar to the OpenShift container platform

The OpenShift deployment process is summarized below (a binary build is used here).
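A command sequence along these lines produces the output that follows; the names match the demo, but treat this as a sketch and adjust the image stream and flags to your environment and oc version:

oc new-build openshift/s2i-java:latest --name=health-demo --binary=true
oc start-build health-demo --from-dir=oc-build --follow
oc new-app health-demo
oc expose svc/health-demo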

--> Found image dc046fe (16 months old) in image stream "openshift/s2i-java" under tag "latest" for "s2i-java:latest"

Java S2I builder 1.0

--------------------

Platform for building Java (fatjar) applications with maven or gradle

Tags: builder, maven-3, gradle-2.6, java, microservices, fatjar

* A source build using binary input will be created

* The resulting image will be pushed to image stream "health-demo:latest"

* A binary build was created, use 'start-build --from-dir' to trigger a new build

--> Creating resources with label app=health-demo ...

imagestream "health-demo" created

buildconfig "health-demo" created

--> Success

Uploading directory "oc-build" as binary input for the build ...

build "health-demo-1" started

--> Found image fb46616 (5 minutes old) in image stream "hshreport-stage/health-demo" under tag "latest" for "health-demo:latest"

Java S2I builder 1.0

--------------------

Platform for building Java (fatjar) applications with maven or gradle

Tags: builder, maven-3, gradle-2.6, java, microservices, fatjar

* This image will be deployed in deployment config "health-demo"

* Ports 7575/tcp, 8080/tcp will be load balanced by service "health-demo"

* Other containers can access this service through the hostname "health-demo"

--> Creating resources with label app=health-demo ...

deploymentconfig "health-demo" created

service "health-demo" created

--> Success

Run 'oc status' to view your app.

route "health-demo" exposed

The last step exposes a route, which makes the application easy to reach during the demo.

Demo walkthrough:

Round 1

Edit the YAML of the freshly deployed DeploymentConfig and add a readiness probe:

---

readinessProbe:
   failureThreshold: 3
   httpGet:
     path: /actuator/health
     port: 8080
     scheme: HTTP
   initialDelaySeconds: 10
   periodSeconds: 10
   successThreshold: 1
   timeoutSeconds: 1

A few of these parameters deserve explanation (a worked example follows the list):

      • initialDelaySeconds: how many seconds to wait after the container starts before running the first probe.
      • periodSeconds: how often the probe runs. Defaults to 10 seconds; the minimum is 1 second.
      • timeoutSeconds: probe timeout. Defaults to 1 second; the minimum is 1 second.
      • successThreshold: after a failure, the minimum number of consecutive successes for the probe to be considered successful again. Defaults to 1; must be 1 for liveness; the minimum is 1.
      • failureThreshold: after a success, the minimum number of consecutive failures for the probe to be considered failed. Defaults to 3; the minimum is 1.
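As a quick sanity check on these numbers: with periodSeconds=10 and failureThreshold=3 as above, a pod whose health endpoint starts failing is marked NotReady and dropped from the Service endpoints after roughly 3 × 10 = 30 seconds in the worst case (plus up to one timeoutSeconds per attempt).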

Alternatively, configure the readiness probe with the OpenShift CLI:

oc set probe dc/app-cli \
  --readiness \
  --get-url=http://:8080/notreal \
  --initial-delay-seconds=5
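For the demo DeploymentConfig itself, the equivalent would look roughly like the following; the flags are standard oc set probe options, though exact names can vary slightly between oc versions:

oc set probe dc/health-demo --readiness \
  --get-url=http://:8080/actuator/health \
  --initial-delay-seconds=10 --period-seconds=10 --timeout-seconds=1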

$ oc get pod -w

# After the DeploymentConfig change, the pod has been redeployed

NAME                  READY   STATUS      RESTARTS   AGE
health-demo-1-build   0/1     Completed   0          16m
health-demo-2-sqh4z   1/1     Running     0          11m

Now call the HTTP API that stops Tomcat: curl http://${value-name-app}-MY_PROJECT_NAME.LOCAL_OPENSHIFT_HOSTNAME/api/stop. Note that the exact URL depends on the DNS setup of your deployment.
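To confirm that the health endpoint really stops answering, you can probe it from inside the pod (a sketch; it assumes curl is available in the image, and the pod name is the one from the listing above):

oc rsh health-demo-2-sqh4z curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8080/actuator/health

Once the Tomcat context is stopped, this returns a non-200 status (the probe events below show 404).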

The application log shows:

Stopping Tomcat context.

2020-01-11 22:17:21.004 INFO 1 --- [nio-8080-exec-9] o.apache.catalina.core.StandardWrapper : Waiting for [1] instance(s) to be deallocated for Servlet [dispatcherServlet]

2020-01-11 22:17:22.008 INFO 1 --- [nio-8080-exec-9] o.apache.catalina.core.StandardWrapper : Waiting for [1] instance(s) to be deallocated for Servlet [dispatcherServlet]

2020-01-11 22:17:23.012 INFO 1 --- [nio-8080-exec-9] o.apache.catalina.core.StandardWrapper : Waiting for [1] instance(s) to be deallocated for Servlet [dispatcherServlet]

2020-01-11 22:17:23.114 INFO 1 --- [nio-8080-exec-9] o.a.c.c.C.[Tomcat].[localhost].[/] : Destroying Spring FrameworkServlet 'dispatcherServlet'

# Watch the pod status

$ oc get pod -w

NAME                  READY   STATUS      RESTARTS   AGE
health-demo-1-build   0/1     Completed   0          16m
health-demo-2-sqh4z   1/1     Running     0          11m
health-demo-2-sqh4z   0/1     Running     0          13m

# Examine the pod's detailed description

$ oc describe pod/health-demo-2-sqh4z

Name: health-demo-2-sqh4z

Namespace: hshreport-stage

Security Policy: restricted

Node: openshift-lb-02.hsh.io/10.108.78.145

Start Time: Sat, 11 Jan 2020 22:08:59 +0800

Labels: app=health-demo

deployment=health-demo-2

deploymentconfig=health-demo

Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"hshreport-stage","name":"health-demo-2","uid":"e6436263-347b-11ea-856c...

openshift.io/deployment-config.latest-version=2

openshift.io/deployment-config.name=health-demo

openshift.io/deployment.name=health-demo-2

openshift.io/generated-by=OpenShiftNewApp

openshift.io/scc=restricted

Status: Running

IP: 10.131.5.124

Controllers: ReplicationController/health-demo-2

Containers:

health-demo:

Container ID: docker://25cdf63f55d839610287b4e2a3cc67182377bfe5010990357f83329286c7e64f

Image: docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6

Image ID: docker-pullable://docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6

Ports: 7575/TCP, 8080/TCP

State: Running

Started: Sat, 11 Jan 2020 22:09:09 +0800

Ready: False

Restart Count: 0

Readiness: http-get http://:8080/actuator/health delay=10s timeout=1s period=10s #success=1 #failure=3

Environment:

APP_OPTIONS: -Xmx512m -Xss512k -Djava.net.preferIPv4Stack=true -Dfile.encoding=utf-8

DEPLOYER: liu.xxxxx (Administrator) (cicd-1.1.24)

REVISION:

SPRING_PROFILES_ACTIVE: stage

TZ: Asia/Shanghai

Mounts:

/var/run/secrets/kubernetes.io/serviceaccount from default-token-n4klp (ro)

Conditions:

Type Status

Initialized True

Ready False

PodScheduled True

Volumes:

default-token-n4klp:

Type: Secret (a volume populated by a Secret)

SecretName: default-token-n4klp

Optional: false

QoS Class: BestEffort

Node-Selectors: region=primary

Tolerations: <none>

Events:

FirstSeen LastSeen Count From SubObjectPath Type Reason Message

--------- -------- ----- ---- ------------- -------- ------ -------

16m 16m 1 default-scheduler Normal Scheduled Successfully assigned health-demo-2-sqh4z to openshift-lb-02.hsh.io

16m 16m 1 kubelet, openshift-lb-02.hsh.io spec.containers{health-demo} Normal Pulling pulling image "docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6"

16m 16m 1 kubelet, openshift-lb-02.hsh.io spec.containers{health-demo} Normal Pulled Successfully pulled image "docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6"

15m 15m 1 kubelet, openshift-lb-02.hsh.io spec.containers{health-demo} Normal Created Created container

15m 15m 1 kubelet, openshift-lb-02.hsh.io spec.containers{health-demo} Normal Started Started container

15m 15m 1 kubelet, openshift-lb-02.hsh.io spec.containers{health-demo} Warning Unhealthy Readiness probe failed: Get http://10.131.5.124:8080/actuator/health: dial tcp 10.131.5.124:8080: getsockopt: connection refused

7m 5m 16 kubelet, openshift-lb-02.hsh.io spec.containers{health-demo} Warning Unhealthy Readiness probe failed: HTTP probe failed with statuscode: 404

Note the Warning events above: the pod was not restarted, because only a readiness probe is configured.
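With only a readiness probe, the failing pod is simply taken out of load balancing. One way to confirm this is to look at the Service endpoints while the pod is NotReady (illustrative output; the pod's address is removed until the probe succeeds again):

$ oc get endpoints health-demo
NAME          ENDPOINTS   AGE
health-demo   <none>      20m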

Round 2

Now add a liveness check to the deployed application. /actuator/health is the default health check endpoint of the Spring Boot 2 sample project.

Edit the DeploymentConfig so that it has both readiness and liveness probes:

---

livenessProbe:
   failureThreshold: 3
   httpGet:
     path: /actuator/health
     port: 8080
     scheme: HTTP
   initialDelaySeconds: 60
   periodSeconds: 10
   successThreshold: 1
   timeoutSeconds: 1

name: health-demo

ports:
   -
     containerPort: 7575
     protocol: TCP
   -
     containerPort: 8080
     protocol: TCP

readinessProbe:
   failureThreshold: 3
   httpGet:
     path: /actuator/health
     port: 8080
     scheme: HTTP
   initialDelaySeconds: 10
   periodSeconds: 10
   successThreshold: 1
   timeoutSeconds: 1

The same change can also be made through the web console; the fields in the web UI map one-to-one to the YAML parameters shown above.

The oc CLI equivalents (configuring liveness/readiness probes on DeploymentConfigs):

oc set probe dc cotd1 --liveness -- echo ok

oc set probe dc/cotd1 --readiness --get-url=http://:8080/index.php --initial-delay-seconds=2

A TCP example:

oc set probe dc/blog --readiness --liveness --open-tcp 8080

Removing probes:

$ oc set probe dc/blog --readiness --liveness --remove

After the stop API is called, the pod is still running, but requests through the router return:

Application is not available

After hitting the stop URL, Tomcat is stopped; part of the application log in the pod:

Stopping Tomcat context.

2020-01-11 22:17:21.004 INFO 1 --- [nio-8080-exec-9] o.apache.catalina.core.StandardWrapper : Waiting for [1] instance(s) to be deallocated for Servlet [dispatcherServlet]

2020-01-11 22:17:22.008 INFO 1 --- [nio-8080-exec-9] o.apache.catalina.core.StandardWrapper : Waiting for [1] instance(s) to be deallocated for Servlet [dispatcherServlet]

2020-01-11 22:17:23.012 INFO 1 --- [nio-8080-exec-9] o.apache.catalina.core.StandardWrapper : Waiting for [1] instance(s) to be deallocated for Servlet [dispatcherServlet]

2020-01-11 22:17:23.114 INFO 1 --- [nio-8080-exec-9] o.a.c.c.C.[Tomcat].[localhost].[/] : Destroying Spring FrameworkServlet 'dispatcherServlet'

A moment later, watch the pod status:

$ oc get pod -w

NAME                  READY   STATUS      RESTARTS   AGE
health-demo-1-build   0/1     Completed   0          33m
health-demo-3-02v11   1/1     Running     0          5m
health-demo-3-02v11   0/1     Running     0          7m
health-demo-3-02v11   0/1     Running     1          7m

$ oc get pod

NAME                  READY   STATUS      RESTARTS   AGE
health-demo-1-build   0/1     Completed   0          36m
health-demo-3-02v11   1/1     Running     1          8m

Request the greeting endpoint again:

curl http://${value-name-app}-MY_PROJECT_NAME.LOCAL_OPENSHIFT_HOSTNAME/api/greeting?name=s2i

The browser then shows:

{"content":"Hello, s2i!"} (the recovery took 41.783 seconds)
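The roughly 42-second recovery matches what the probe settings predict: with periodSeconds=10 and failureThreshold=3, the liveness probe needs about 20–30 seconds of consecutive failures before the kubelet kills the container, and the application then takes another 8–10 seconds to start (see the startup log below).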

At this point the container has been restarted; the application log shows:

2020-01-11 22:35:13.597 INFO 1 --- [ main] s.b.a.e.w.s.WebMvcEndpointHandlerMapping : Mapped "{[/actuator],methods=[GET],produces=[application/vnd.spring-boot.actuator.v2+json || application/json]}" onto protected java.util.Map<java.lang.String, java.util.Map<java.lang.String, org.springframework.boot.actuate.endpoint.web.Link>> org.springframework.boot.actuate.endpoint.web.servlet.WebMvcEndpointHandlerMapping.links(javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse)

2020-01-11 22:35:13.750 INFO 1 --- [ main] o.s.j.e.a.AnnotationMBeanExporter : Registering beans for JMX exposure on startup

2020-01-11 22:35:13.873 INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8080 (http) with context path ''

2020-01-11 22:35:13.882 INFO 1 --- [ main] dev.snowdrop.example.ExampleApplication : Started ExampleApplication in 8.061 seconds (JVM running for 9.682)

2020-01-11 22:35:22.445 INFO 1 --- [nio-8080-exec-1] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring FrameworkServlet 'dispatcherServlet'

2020-01-11 22:35:22.445 INFO 1 --- [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet : FrameworkServlet 'dispatcherServlet': initialization started

2020-01-11 22:35:22.485 INFO 1 --- [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet : FrameworkServlet 'dispatcherServlet': initialization completed in 39 ms

$ oc describe pod/health-demo-3-02v11

Name: health-demo-3-02v11

Namespace: hshreport-stage

Security Policy: restricted

Node: openshift-node-04.hsh.io/10.108.78.139

Start Time: Sat, 11 Jan 2020 22:32:12 +0800

Labels: app=health-demo

deployment=health-demo-3

deploymentconfig=health-demo

Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"hshreport-stage","name":"health-demo-3","uid":"23ad2f21-347f-11ea-856c...

openshift.io/deployment-config.latest-version=3

openshift.io/deployment-config.name=health-demo

openshift.io/deployment.name=health-demo-3

openshift.io/generated-by=OpenShiftNewApp

openshift.io/scc=restricted

Status: Running

IP: 10.129.5.178

Controllers: ReplicationController/health-demo-3

Containers:

health-demo:

Container ID: docker://3e5a6b081022c914d8e118dce829294570e54f441b84394a2b13f6eebb4f5c74

Image: docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6

Image ID: docker-pullable://docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6

Ports: 7575/TCP, 8080/TCP

State: Running

Started: Sat, 11 Jan 2020 22:35:04 +0800

Last State: Terminated

Reason: Error

Exit Code: 143

Started: Sat, 11 Jan 2020 22:32:15 +0800

Finished: Sat, 11 Jan 2020 22:35:03 +0800

Ready: True

Restart Count: 1

Liveness: http-get http://:8080/actuator/health delay=60s timeout=1s period=10s #success=1 #failure=3

Readiness: http-get http://:8080/actuator/health delay=10s timeout=1s period=10s #success=1 #failure=3

Environment:

APP_OPTIONS: -Xmx512m -Xss512k -Djava.net.preferIPv4Stack=true -Dfile.encoding=utf-8

DEPLOYER: liu.xxxxxx(Administrator) (cicd-1.1.24)

REVISION:

SPRING_PROFILES_ACTIVE: stage

TZ: Asia/Shanghai

Mounts:

/var/run/secrets/kubernetes.io/serviceaccount from default-token-n4klp (ro)

Conditions:

Type Status

Initialized True

Ready True

PodScheduled True

Volumes:

default-token-n4klp:

Type: Secret (a volume populated by a Secret)

SecretName: default-token-n4klp

Optional: false

QoS Class: BestEffort

Node-Selectors: region=primary

Tolerations: <none>

Events:

FirstSeen LastSeen Count From SubObjectPath Type Reason Message

--------- -------- ----- ---- ------------- -------- ------ -------

17m 17m 1 default-scheduler Normal Scheduled Successfully assigned health-demo-3-02v11 to openshift-node-04.hsh.io

15m 14m 3 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Warning Unhealthy Liveness probe failed: HTTP probe failed with statuscode: 404

15m 14m 3 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Warning Unhealthy Readiness probe failed: HTTP probe failed with statuscode: 404

17m 14m 2 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Normal Pulling pulling image "docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6"

17m 14m 2 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Normal Pulled Successfully pulled image "docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6"

17m 14m 2 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Normal Created Created container

14m 14m 1 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Normal Killing Killing container with id docker://health-demo:pod "health-demo-3-02v11_hshreport-stage(27e5a1da-347f-11ea-856c-0050568d3d78)" container "health-demo" is unhealthy, it will be killed and re-created.

17m 14m 2 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Normal Started Started container

Now call the stop HTTP API a second time.

$ oc get pod -w

NAME                  READY   STATUS      RESTARTS   AGE
health-demo-1-build   0/1     Completed   0          47m
health-demo-3-02v11   1/1     Running     1          19m
health-demo-3-02v11   0/1     Running     1          19m
health-demo-3-02v11   0/1     Running     2          20m
health-demo-3-02v11   1/1     Running     2          20m

$ oc get pod

NAME                  READY   STATUS      RESTARTS   AGE
health-demo-1-build   0/1     Completed   0          49m
health-demo-3-02v11   1/1     Running     2          21

The HTTP request returns:

{"content":"Hello, s2i!"} (the recovery took 51.984 seconds)

$ oc describe pod/health-demo-3-02v11

Name: health-demo-3-02v11

Namespace: hshreport-stage

Security Policy: restricted

Node: openshift-node-04.hsh.io/10.108.78.139

Start Time: Sat, 11 Jan 2020 22:32:12 +0800

Labels: app=health-demo

deployment=health-demo-3

deploymentconfig=health-demo

Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"hshreport-stage","name":"health-demo-3","uid":"23ad2f21-347f-11ea-856c...

openshift.io/deployment-config.latest-version=3

openshift.io/deployment-config.name=health-demo

openshift.io/deployment.name=health-demo-3

openshift.io/generated-by=OpenShiftNewApp

openshift.io/scc=restricted

Status: Running

IP: 10.129.5.178

Controllers: ReplicationController/health-demo-3

Containers:

health-demo:

Container ID: docker://e12d1975aa26b07643ae1666ae6bce7ceab4f25fb4c6c947427ba526ad6fdf7b

Image: docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6

Image ID: docker-pullable://docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6

Ports: 7575/TCP, 8080/TCP

State: Running

Started: Sat, 11 Jan 2020 22:47:14 +0800

Last State: Terminated

Reason: Error

Exit Code: 143

Started: Sat, 11 Jan 2020 22:35:04 +0800

Finished: Sat, 11 Jan 2020 22:47:02 +0800

Ready: True

Restart Count: 2

Liveness: http-get http://:8080/actuator/health delay=60s timeout=1s period=10s #success=1 #failure=3

Readiness: http-get http://:8080/actuator/health delay=10s timeout=1s period=10s #success=1 #failure=3

Environment:

APP_OPTIONS: -Xmx512m -Xss512k -Djava.net.preferIPv4Stack=true -Dfile.encoding=utf-8

DEPLOYER: liu.xxxxx (Administrator) (cicd-1.1.24)

REVISION:

SPRING_PROFILES_ACTIVE: stage

TZ: Asia/Shanghai

Mounts:

/var/run/secrets/kubernetes.io/serviceaccount from default-token-n4klp (ro)

Conditions:

Type Status

Initialized True

Ready True

PodScheduled True

Volumes:

default-token-n4klp:

Type: Secret (a volume populated by a Secret)

SecretName: default-token-n4klp

Optional: false

QoS Class: BestEffort

Node-Selectors: region=primary

Tolerations: <none>

Events:

FirstSeen LastSeen Count From SubObjectPath Type Reason Message

--------- -------- ----- ---- ------------- -------- ------ -------

21m 21m 1 default-scheduler Normal Scheduled Successfully assigned health-demo-3-02v11 to openshift-node-04.hsh.io

21m 6m 3 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Normal Pulling pulling image "docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6"

19m 6m 6 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Warning Unhealthy Liveness probe failed: HTTP probe failed with statuscode: 404

19m 6m 6 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Warning Unhealthy Readiness probe failed: HTTP probe failed with statuscode: 404

18m 6m 2 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Normal Killing Killing container with id docker://health-demo:pod "health-demo-3-02v11_hshreport-stage(27e5a1da-347f-11ea-856c-0050568d3d78)" container "health-demo" is unhealthy, it will be killed and re-created.

21m 6m 3 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Normal Pulled Successfully pulled image "docker-registry.default.svc:5000/hshreport-stage/health-demo@sha256:292f09b7d9ca9bc12560febe3f4ba73e50b3c1a5701cbd55689186e844157fb6"

21m 6m 3 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Normal Created Created container

21m 6m 3 kubelet, openshift-node-04.hsh.io spec.containers{health-demo} Normal Started Started container

# Take a look at the events

$ oc get ev

LASTSEEN FIRSTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE

29m 41m 6 health-demo-3-02v11 Pod spec.containers{health-demo} Warning Unhealthy kubelet, openshift-node-04.hsh.io Liveness probe failed: HTTP probe failed with statuscode: 404

29m 41m 6 health-demo-3-02v11 Pod spec.containers{health-demo} Warning Unhealthy kubelet, openshift-node-04.hsh.io Readiness probe failed: HTTP probe failed with statuscode: 404

29m 41m 2 health-demo-3-02v11 Pod spec.containers{health-demo} Normal Killing kubelet, openshift-node-04.hsh.io Killing container with id docker://health-demo:pod "health-demo-3-02v11_hshreport-stage(27e5a1da-347f-11ea-856c-0050568d3d78)" container "health-demo" is unhealthy, it will be killed and re-created.

19m 19m 1 health-demo-3-02v11 Pod spec.containers{health-demo} Normal Killing kubelet, openshift-node-04.hsh.io Killing container with id docker://health-demo:Need to kill Pod

44m 44m 1 health-demo-3-deploy Pod Normal Scheduled default-scheduler Successfully assigned health-demo-3-deploy to openshift-lb-02.hsh.io

44m 44m 1 health-demo-3-deploy Pod spec.containers{deployment} Normal Pulled kubelet, openshift-lb-02.hsh.io Container image "openshift/origin-deployer:v3.6.1" already present on machine

44m 44m 1 health-demo-3-deploy Pod spec.containers{deployment} Normal Created kubelet, openshift-lb-02.hsh.io Created container

44m 44m 1 health-demo-3-deploy Pod spec.containers{deployment} Normal Started kubelet, openshift-lb-02.hsh.io Started container

44m 44m 1 health-demo-3 ReplicationController Normal SuccessfulCreate replication-controller Created pod: health-demo-3-02v11

19m 19m 1 health-demo-3 ReplicationController Normal SuccessfulDelete

Notice that the pod name health-demo-3-02v11 has not changed: the liveness probe restarts the container inside the pod rather than replacing the pod. That concludes the demo.

Summary

Liveness probes can perform three kinds of checks (a YAML sketch follows the list):

HTTP(S) checks—Checks a given URL endpoint served by the container, and evaluates the HTTP response code.

Container execution check—A command, typically a script, that’s run at intervals to verify that the container is behaving as expected. A non-zero exit code from the command results in a liveness check failure.

TCP socket checks—Checks that a TCP connection can be established on a specific TCP port in the application pod.
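An illustrative sketch of how the three check types map onto the probe definition (the values are examples, not taken from the demo):

# HTTP(S) check
livenessProbe:
  httpGet:
    path: /actuator/health
    port: 8080
    scheme: HTTP

# Container execution check (a non-zero exit code fails the probe)
livenessProbe:
  exec:
    command: ["sh", "-c", "test -f /tmp/app-healthy"]

# TCP socket check
livenessProbe:
  tcpSocket:
    port: 8080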


The difference between readiness and liveness

Readiness asks whether the pod may receive traffic; liveness asks whether the process is still alive. When a readiness check fails, the pod's IP is removed from the endpoints of every Service that selects the pod, so none of those Services forward requests to it anymore. When a liveness check fails, the container is killed outright, and if the restart policy is Always the container is restarted.

Both probes monitor the state of the container process. The difference is that the readiness probe decides whether the process's address appears in the Service load-balancing list, while the liveness probe decides whether the process should be restarted to recover from a fault. They exist and work side by side throughout the process's lifetime, with separate responsibilities.

The kubelet can optionally run two kinds of probes against a container and react to the results:

  • livenessProbe: indicates whether the container is running. If the liveness probe fails, the kubelet kills the container, and the container is then subject to its restart policy. If no liveness probe is provided, the default state is Success.
  • readinessProbe: indicates whether the container is ready to serve requests. If the readiness probe fails, the endpoints controller removes the pod's IP address from the endpoints of all Services that match the pod. Before the initial delay, the readiness state defaults to Failure. If no readiness probe is provided, the default state is Success.

Best practices

Generally speaking, the liveness probe should be less aggressive than the readiness probe. When a backend process is under heavy load, the readiness probe can temporarily take it out of the forwarding list; the liveness probe, by contrast, decides whether the process gets restarted, and at that moment a restart is usually not what is needed. So the liveness check can use a somewhat longer period and tolerate more failures. Tune the exact values to your own situation.
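A sketch of that advice in probe form; the numbers are illustrative starting points, not values taken from the demo:

readinessProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3
livenessProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 30        # check less often than readiness
  failureThreshold: 5      # tolerate more failures before restarting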


That is all for today. I hope it is a useful reference for cloud native work, technical leadership, enterprise management, system architecture design and evaluation, team management, project management, product management, and team building.

For more on software design and architecture, IT systems, enterprise informatization, and team management, follow my WeChat subscription account.

Author: Petter Liu
Source: http://www.cnblogs.com/wintersun/
The copyright of this article belongs to the author and the blog site. You are welcome to repost it, but without the author's consent this notice must be retained and a clearly visible link to the original article must be provided on the reposted page; otherwise the author reserves the right to pursue legal liability.
This article is also published on my independent blog, Petter Liu Blog.
