11. Getting Started with Consul

Getting Started with Prometheus / 2022-06-30

Most of this material follows the official documentation: https://learn.hashicorp.com/tutorials/consul

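The basic operations material draws on the article 《微服务注册中心:Consul概念与基础操作》 (Microservice Registry: Consul Concepts and Basic Operations) by 程序员架构进阶.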

1. Service Discovery with Consul

1.1 Installation

Consul ships as a single binary file, which makes it very lightweight to install.

1.1.1 Download
wget https://releases.hashicorp.com/consul/1.12.2/consul_1.12.2_linux_amd64.zip
1.1.2 Extract
sudo unzip ~/Downloads/consul_1.12.2_linux_amd64.zip -d /usr/local/bin
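
The download step can also be scripted. The following is a minimal sketch, not part of the original walkthrough: release_url rebuilds the URL pattern seen in the wget command above, and sha256_of computes a digest of the downloaded archive so it can be compared by hand against whatever checksum the publisher provides. Both function names and the default os_name/arch values are illustrative assumptions.

```python
import hashlib


def release_url(version, os_name="linux", arch="amd64"):
    """Build a download URL that follows the pattern of the wget command above."""
    return (
        "https://releases.hashicorp.com/consul/"
        f"{version}/consul_{version}_{os_name}_{arch}.zip"
    )


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    print(release_url("1.12.2"))
```

Calling sha256_of on the downloaded zip file gives a value to compare before unpacking the binary into /usr/local/bin.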

1.2 Basic Operations

Run consul --help, or just consul, to print the command-line help.

1.2.1 Command Help
sanxi@sanxi-PC:~$ consul
Usage: consul [--version] [--help] <command> [<args>]

Available commands are:
    acl            Interact with Consul's ACLs
    agent          Runs a Consul agent
    catalog        Interact with the catalog
    config         Interact with Consul's Centralized Configurations
    connect        Interact with Consul Connect
    debug          Records a debugging archive for operators
    event          Fire a new event
    exec           Executes a command on Consul nodes
    force-leave    Forces a member of the cluster to enter the "left" state
    info           Provides debugging information for operators.
    intention      Interact with Connect service intentions
    join           Tell Consul agent to join cluster
    keygen         Generates a new encryption key
    keyring        Manages gossip layer encryption keys
    kv             Interact with the key-value store
    leave          Gracefully leaves the Consul cluster and shuts down
    lock           Execute a command holding a lock
    login          Login to Consul using an auth method
    logout         Destroy a Consul token created with login
    maint          Controls node or service maintenance mode
    members        Lists the members of a Consul cluster
    monitor        Stream logs from a Consul agent
    operator       Provides cluster-level tools for Consul operators
    reload         Triggers the agent to reload configuration files
    rtt            Estimates network round trip time between nodes
    services       Interact with services
    snapshot       Saves, restores and inspects snapshots of Consul server state
    tls            Builtin helpers for creating CAs and certificates
    validate       Validate config files/directories
    version        Prints the Consul version
    watch          Watch for changes in Consul
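1.2.2 Starting in Dev Mode

For learning purposes, starting the agent in development mode with -dev is enough. A production environment runs agents in server mode (-server) as a cluster.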

sanxi@sanxi-PC:/usr/local$ ./consul agent -dev
==> Starting Consul agent...
           Version: '1.12.2'
           Node ID: 'c9e0d203-af0b-5441-3918-fe3a41c37ea9'
         Node name: 'sanxi-PC'
        Datacenter: 'dc1' (Segment: '<all>')
            Server: true (Bootstrap: false)
       Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
      Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:

2022-06-30T09:05:38.866+0800 [INFO]  agent.server.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:c9e0d203-af0b-5441-3918-fe3a41c37ea9 Address:127.0.0.1:8300}]"
2022-06-30T09:05:38.866+0800 [INFO]  agent.server.raft: entering follower state: follower="Node at 127.0.0.1:8300 [Follower]" leader=
2022-06-30T09:05:38.866+0800 [INFO]  agent.server.serf.wan: serf: EventMemberJoin: sanxi-PC.dc1 127.0.0.1
2022-06-30T09:05:38.868+0800 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: sanxi-PC 127.0.0.1
2022-06-30T09:05:38.868+0800 [INFO]  agent.router: Initializing LAN area manager
2022-06-30T09:05:38.868+0800 [INFO]  agent.server: Adding LAN server: server="sanxi-PC (Addr: tcp/127.0.0.1:8300) (DC: dc1)"
2022-06-30T09:05:38.868+0800 [INFO]  agent.server.autopilot: reconciliation now disabled
2022-06-30T09:05:38.868+0800 [WARN]  agent: [core]grpc: addrConn.createTransport failed to connect to {dc1-127.0.0.1:8300 sanxi-PC <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:0->127.0.0.1:8300: operation was canceled". Reconnecting...
2022-06-30T09:05:38.868+0800 [INFO]  agent.server: Handled event for server in area: event=member-join server=sanxi-PC.dc1 area=wan
2022-06-30T09:05:38.868+0800 [WARN]  agent: [core]grpc: addrConn.createTransport failed to connect to {dc1-127.0.0.1:8300 sanxi-PC <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:0->127.0.0.1:8300: operation was canceled". Reconnecting...
2022-06-30T09:05:38.868+0800 [WARN]  agent: [core]grpc: addrConn.createTransport failed to connect to {dc1-127.0.0.1:8300 sanxi-PC.dc1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:0->127.0.0.1:8300: operation was canceled". Reconnecting...
2022-06-30T09:05:38.869+0800 [DEBUG] agent.server.autopilot: autopilot is now running
2022-06-30T09:05:38.869+0800 [DEBUG] agent.server.autopilot: state update routine is now running
2022-06-30T09:05:38.869+0800 [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=udp
2022-06-30T09:05:38.869+0800 [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=tcp
2022-06-30T09:05:38.869+0800 [INFO]  agent: Starting server: address=127.0.0.1:8500 network=tcp protocol=http
2022-06-30T09:05:38.869+0800 [INFO]  agent: started state syncer
2022-06-30T09:05:38.869+0800 [INFO]  agent: Consul agent running!
2022-06-30T09:05:38.869+0800 [INFO]  agent: Started gRPC server: address=127.0.0.1:8502 network=tcp
...
...
1.2.3 Discovering Members

The output above shows that Consul's HTTP service listens on port 8500 by default, so we can open it directly in a browser.

[Screenshot: the Consul web UI opened in a browser]

Now open a second terminal window and run the members command.

# The -detailed option shows more information
# Columns: node, address, status, type, software version, protocol version, datacenter, partition, segment
sanxi@sanxi-PC:/usr/local$ consul members
Node      Address         Status  Type    Build   Protocol  DC   Partition  Segment
sanxi-PC  127.0.0.1:8301  alive   server  1.12.2  2         dc1  default    <all>
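
If the member list needs to be consumed by a script, the table can be split into records. This is only a sketch under the assumption that no column value contains whitespace, which holds for the output shown above; parse_members and live_members are hypothetical helper names, and live_members assumes the consul binary is on PATH.

```python
import subprocess


def parse_members(output):
    """Turn the table printed by `consul members` into a list of dicts."""
    lines = [line for line in output.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def live_members():
    """Run `consul members` against the local agent and parse its output."""
    completed = subprocess.run(
        ["consul", "members"], capture_output=True, text=True, check=True
    )
    return parse_members(completed.stdout)


# Sample copied from the transcript above.
SAMPLE = """\
Node      Address         Status  Type    Build   Protocol  DC   Partition  Segment
sanxi-PC  127.0.0.1:8301  alive   server  1.12.2  2         dc1  default    <all>
"""

if __name__ == "__main__":
    for member in parse_members(SAMPLE):
        print(member["Node"], member["Status"], member["Address"])
```
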
1.2.4 HTTP API

Port 8300 is Consul's server RPC port, while 8500 is the port of its HTTP service. The following request asks the HTTP API for the nodes in the catalog:

sanxi@sanxi-PC:/usr/local$ curl localhost:8500/v1/catalog/nodes
[
    {
        "ID": "c9e0d203-af0b-5441-3918-fe3a41c37ea9",
        "Node": "sanxi-PC",
        "Address": "127.0.0.1",
        "Datacenter": "dc1",
        "TaggedAddresses": {
            "lan": "127.0.0.1",
            "lan_ipv4": "127.0.0.1",
            "wan": "127.0.0.1",
            "wan_ipv4": "127.0.0.1"
        },
        "Meta": {
            "consul-network-segment": ""
        },
        "CreateIndex": 13,
        "ModifyIndex": 14
    }
]
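
The same request can be made from a program. The sketch below uses only the Python standard library; fetch_json and node_addresses are illustrative names, and the base URL assumes a local dev agent listening on port 8500 as in the transcript.

```python
import json
import urllib.request


def fetch_json(path, base="http://localhost:8500"):
    """GET a Consul HTTP API path and decode the JSON body."""
    with urllib.request.urlopen(base + path, timeout=5) as response:
        return json.load(response)


def node_addresses(nodes):
    """Map node name to address from a /v1/catalog/nodes response."""
    return {node["Node"]: node["Address"] for node in nodes}


# Trimmed copy of the response shown above.
SAMPLE_NODES = [
    {"ID": "c9e0d203-af0b-5441-3918-fe3a41c37ea9", "Node": "sanxi-PC",
     "Address": "127.0.0.1", "Datacenter": "dc1"}
]

if __name__ == "__main__":
    print(node_addresses(SAMPLE_NODES))
```

With an agent running, node_addresses(fetch_json("/v1/catalog/nodes")) would produce the same mapping from live data.
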
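1.2.5 DNS Interface

Besides the HTTP API, you can also discover nodes through the DNS interface. Unless caching is enabled, the DNS interface forwards each query to a Consul server. It listens on port 8600 by default and can be queried with dig. Note that the example below asks for Judiths-MBP.node.consul, a node name that does not exist in this datacenter (the local node is sanxi-PC), which is why the status is NXDOMAIN and there is no ANSWER section; querying sanxi-PC.node.consul instead should return the node's A record.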

sanxi@sanxi-PC:/usr/local$ dig @127.0.0.1 -p 8600 Judiths-MBP.node.consul

; <<>> DiG 9.11.5-P4-5.1+dde-Debian <<>> @127.0.0.1 -p 8600 Judiths-MBP.node.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 55667
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;Judiths-MBP.node.consul.       IN      A

;; AUTHORITY SECTION:
consul.                 0       IN      SOA     ns.consul. hostmaster.consul. 1656553988 3600 600 86400 0

;; Query time: 0 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: 四 6月 30 09:53:08 CST 2022
;; MSG SIZE  rcvd: 102
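
To see what dig is doing on the wire, here is a rough sketch that builds a one-question DNS query by hand and reads only the status code and answer count from the reply header. It is deliberately minimal (UDP only, no answer parsing); build_query, parse_header and lookup are hypothetical names, and the host and port defaults assume the dev agent from this walkthrough.

```python
import socket
import struct

RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}


def build_query(name, qtype=1, query_id=0x1234):
    """Encode a minimal DNS query: one question, class IN, type A by default."""
    # Flags 0x0100 sets only the "recursion desired" bit, as dig does.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def parse_header(response):
    """Return (status name, answer count) from the 12-byte DNS header."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    rcode = flags & 0x000F
    return RCODES.get(rcode, str(rcode)), ancount


def lookup(name, host="127.0.0.1", port=8600, timeout=2.0):
    """Send the query over UDP to the agent's DNS port and summarize the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (host, port))
        response, _ = sock.recvfrom(4096)
    return parse_header(response)
```

With the agent running, lookup("sanxi-PC.node.consul") would report the status and answer count that dig prints in its HEADER and flags lines.
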
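1.2.6 Stopping the Agent

Stopping the agent with consul leave shuts it down gracefully: the other members of the datacenter are notified and the node is removed from the catalog. If a node instead becomes unreachable because of a failure, it is marked as critical and Consul automatically tries to reconnect to it, rather than removing it right away.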

sanxi@sanxi-PC:/usr/local$ consul leave
Graceful leave complete

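1.3 Registering a Service

This section covers registering a service with Consul and checking its health.

Consul is most often used for service discovery: it provides a DNS interface that downstream services use to find the addresses of the services they depend on. For that to work, each service first has to be registered with Consul. There are several ways to register one: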

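  • Manual registration
  • Registration by a configuration management tool at deployment time
  • Automatic registration through a container orchestration platform integration (typically K8S)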
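1.3.1 Defining a Service

A service can be registered either by providing a service definition (the most common way) or by calling the HTTP API. This walkthrough uses a service definition.

First, create a configuration directory for Consul, for example /etc/consul.d/. Consul loads every configuration file in that directory.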

sudo mkdir /etc/consul.d
sudo chown -R sanxi:sanxi /etc/consul.d/  # change the owner and group to the user that runs consul

Next, create a JSON file named prometheus.json in the consul.d directory to hold the service definition.

vim /etc/consul.d/prometheus.json

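A reference configuration follows.

It defines a service named prometheus that listens on port 9090, with the tag rails as an extra way to locate the service.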

{
  "service": {
    "name": "prometheus",
    "tags": [
      "rails"
    ],
    "port": 9090
  }
}

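Reload Consul to apply the configuration. This assumes the agent was started with -config-dir=/etc/consul.d so that it reads the files in that directory.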

consul reload
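
For comparison, the HTTP API route mentioned at the start of this subsection could look roughly like the sketch below. It targets the agent endpoint /v1/agent/service/register with a PUT request; build_registration and registration_request are hypothetical helper names, and the request is only prepared here, not sent.

```python
import json
import urllib.request


def build_registration(name, port, tags=()):
    """Build the JSON body for the agent's service registration endpoint."""
    return {"Name": name, "Tags": list(tags), "Port": port}


def registration_request(payload, base="http://localhost:8500"):
    """Prepare (but do not send) the PUT request that registers the service."""
    return urllib.request.Request(
        base + "/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    payload = build_registration("prometheus", 9090, ["rails"])
    request = registration_request(payload)
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To register against a running agent, the request would be sent with:
    # urllib.request.urlopen(request, timeout=5)
```

A service registered this way lives in the agent's state rather than in a file under /etc/consul.d, so a reload is not needed.
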
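1.3.2 Querying a Service

Once the service is defined, it can be queried through either the HTTP API or the DNS interface.

1.3.2.1 DNS Query

Query syntax: NAME.service.consul, where NAME is the service name given in the name field of the JSON file, prometheus in this article.

DNS queries can also filter services by tag, using the format TAG.NAME.service.consul. With the rails tag defined above, an SRV query for rails.prometheus.service.consul returns all records of the prometheus service that carry the rails tag.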

sanxi@sanxi-PC:~$ dig @127.0.0.1 -p 8600 prometheus.service.consul SRV

; <<>> DiG 9.11.5-P4-5.1+dde-Debian <<>> @127.0.0.1 -p 8600 prometheus.service.consul SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25293
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 3
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;prometheus.service.consul.     IN      SRV

;; ANSWER SECTION:
prometheus.service.consul. 0    IN      SRV     1 1 9090 sanxi-PC.node.dc1.consul.

;; ADDITIONAL SECTION:
sanxi-PC.node.dc1.consul. 0     IN      A       127.0.0.1
sanxi-PC.node.dc1.consul. 0     IN      TXT     "consul-network-segment="

;; Query time: 0 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: 四 6月 30 15:01:02 CST 2022
;; MSG SIZE  rcvd: 150
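
The naming scheme and the SRV answer above can be handled in a script along these lines. This is an illustrative sketch: service_dns_name and parse_srv_line are made-up helper names, and the parser assumes the eight whitespace-separated fields that dig prints for an SRV record.

```python
def service_dns_name(name, tag=None, domain="consul"):
    """Build NAME.service.consul, or TAG.NAME.service.consul when a tag is given."""
    parts = [tag] if tag else []
    return ".".join(parts + [name, "service", domain])


def parse_srv_line(line):
    """Split one SRV answer line from dig into its fields."""
    owner, ttl, _rr_class, rr_type, priority, weight, port, target = line.split()
    if rr_type != "SRV":
        raise ValueError(f"not an SRV record: {line!r}")
    return {
        "name": owner,
        "ttl": int(ttl),
        "priority": int(priority),
        "weight": int(weight),
        "port": int(port),
        "target": target,
    }


# Answer line copied from the dig output above.
SAMPLE_SRV = "prometheus.service.consul. 0    IN      SRV     1 1 9090 sanxi-PC.node.dc1.consul."

if __name__ == "__main__":
    print(service_dns_name("prometheus", tag="rails"))
    record = parse_srv_line(SAMPLE_SRV)
    print(record["target"], record["port"])
```
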
1.3.2.2 HTTP API

The HTTP query syntax is: curl http://localhost:8500/v1/catalog/service/NAME

curl http://localhost:8500/v1/catalog/service/prometheus

The response has the following format and lists every node that hosts the service:

[
    {
        "ID": "bd64eadd-16cb-0ed4-afb2-5d234f127562",
        "Node": "sanxi-PC",
        "Address": "127.0.0.1",
        "Datacenter": "dc1",
        "TaggedAddresses": {
            "lan": "127.0.0.1",
            "lan_ipv4": "127.0.0.1",
            "wan": "127.0.0.1",
            "wan_ipv4": "127.0.0.1"
        },
        "NodeMeta": {
            "consul-network-segment": ""
        },
        "ServiceKind": "",
        "ServiceID": "prometheus",
        "ServiceName": "prometheus",
        "ServiceTags": [
            "rails"
        ],
        "ServiceAddress": "",
        "ServiceWeights": {
            "Passing": 1,
            "Warning": 1
        },
        "ServiceMeta": {},
        "ServicePort": 9090,
        "ServiceSocketPath": "",
        "ServiceEnableTagOverride": false,
        "ServiceProxy": {
            "Mode": "",
            "MeshGateway": {},
            "Expose": {}
        },
        "ServiceConnect": {},
        "CreateIndex": 16,
        "ModifyIndex": 16
    }
]
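
A client usually only needs a host and port from this response. The sketch below shows one way to extract them, falling back to the node's Address when ServiceAddress is empty, as it is in the output above; service_endpoints is a hypothetical helper name and the sample is a trimmed copy of that output.

```python
def service_endpoints(entries):
    """Return "host:port" strings from a /v1/catalog/service/NAME response."""
    endpoints = []
    for entry in entries:
        # An empty ServiceAddress means the service uses the node's address.
        host = entry.get("ServiceAddress") or entry["Address"]
        endpoints.append(f"{host}:{entry['ServicePort']}")
    return endpoints


# Trimmed copy of the catalog response shown above.
SAMPLE_CATALOG = [
    {
        "Node": "sanxi-PC",
        "Address": "127.0.0.1",
        "ServiceName": "prometheus",
        "ServiceTags": ["rails"],
        "ServiceAddress": "",
        "ServicePort": 9090,
    }
]

if __name__ == "__main__":
    print(service_endpoints(SAMPLE_CATALOG))
```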

To return only healthy instances, query http://HOST:8500/v1/health/service/NAME?passing instead:

curl 'http://localhost:8500/v1/health/service/prometheus?passing'

Response:

[
    {
        "Node": {
            "ID": "bd64eadd-16cb-0ed4-afb2-5d234f127562",
            "Node": "sanxi-PC",
            "Address": "127.0.0.1",
            "Datacenter": "dc1",
            "TaggedAddresses": {
                "lan": "127.0.0.1",
                "lan_ipv4": "127.0.0.1",
                "wan": "127.0.0.1",
                "wan_ipv4": "127.0.0.1"
            },
            "Meta": {
                "consul-network-segment": ""
            },
            "CreateIndex": 11,
            "ModifyIndex": 15
        },
        "Service": {
            "ID": "prometheus",
            "Service": "prometheus",
            "Tags": [
                "rails"
            ],
            "Address": "",
            "Meta": null,
            "Port": 9090,
            "Weights": {
                "Passing": 1,
                "Warning": 1
            },
            "EnableTagOverride": false,
            "Proxy": {
                "Mode": "",
                "MeshGateway": {},
                "Expose": {}
            },
            "Connect": {},
            "CreateIndex": 16,
            "ModifyIndex": 16
        },
        "Checks": [
            {
                "Node": "sanxi-PC",
                "CheckID": "serfHealth",
                "Name": "Serf Health Status",
                "Status": "passing",
                "Notes": "",
                "Output": "Agent alive and reachable",
                "ServiceID": "",
                "ServiceName": "",
                "ServiceTags": [],
                "Type": "",
                "Interval": "",
                "Timeout": "",
                "ExposedPort": 0,
                "Definition": {},
                "CreateIndex": 11,
                "ModifyIndex": 11
            }
        ]
    }
]
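
When the health endpoint is queried without ?passing, the same filtering can be done on the client. The following sketch assumes the response shape shown above (a list of entries, each with Service and Checks); passing_instances and check_summary are illustrative names.

```python
def passing_instances(entries):
    """Keep entries from /v1/health/service/NAME whose checks all pass."""
    return [
        entry
        for entry in entries
        if all(check["Status"] == "passing" for check in entry["Checks"])
    ]


def check_summary(entry):
    """Map each CheckID of one entry to its current status."""
    return {check["CheckID"]: check["Status"] for check in entry["Checks"]}


# Trimmed copy of the health response shown above.
SAMPLE_HEALTH = [
    {
        "Node": {"Node": "sanxi-PC", "Address": "127.0.0.1"},
        "Service": {"ID": "prometheus", "Port": 9090},
        "Checks": [{"CheckID": "serfHealth", "Status": "passing"}],
    }
]

if __name__ == "__main__":
    for entry in passing_instances(SAMPLE_HEALTH):
        print(entry["Service"]["ID"], check_summary(entry))
```
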
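1.3.3 Updating a Service

Since services are discovered automatically, they naturally need health checks as well.

Consul lets you add a health check either by editing the JSON file or through the HTTP API. This example edits the file.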

vim /etc/consul.d/prometheus.json

A reference configuration follows:

{
  "service": {
    "name": "prometheus",
    "tags": [
      "rails"
    ],
    "port": 9090,
    "check": {
      "args": [
        "curl",
        "localhost:9090"
      ],
      "interval": "10s"
    }
  }
}

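The check block in the example above adds a script-based health check: every 10 seconds, curl is run as the same user that started Consul to check whether the service is alive. Script checks are disabled by default, so the agent has to be started with -enable-script-checks or -enable-local-script-checks for this check to be accepted.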
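
A script check is judged by the exit code of the command it runs: 0 counts as passing, 1 as warning, and any other code as critical. The sketch below imitates that mapping outside of Consul so a check command can be tried by hand. run_script_check is a hypothetical helper, and treating a timeout or a missing binary as critical is an assumption of this sketch.

```python
import subprocess
import sys


def status_from_exit_code(code):
    """Map a check command's exit code to a Consul-style status."""
    if code == 0:
        return "passing"
    if code == 1:
        return "warning"
    return "critical"


def run_script_check(args, timeout=10):
    """Run the command once and report the status its exit code implies."""
    try:
        completed = subprocess.run(
            args,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Assumption of this sketch: failure to run counts as critical.
        return "critical"
    return status_from_exit_code(completed.returncode)


if __name__ == "__main__":
    # Stand-in commands that simply exit with a chosen code.
    for code in (0, 1, 7):
        command = [sys.executable, "-c", f"import sys; sys.exit({code})"]
        print(code, run_script_check(command))
```

Passing the args list from the service definition to run_script_check shows what the check would report while the service is up or down.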

Now reload the Consul configuration so the change takes effect.

sanxi@sanxi-PC:~$ consul reload
Configuration reload triggered

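Now I deliberately stop Prometheus to see what the query returns.

Compared with the earlier successful query, the ANSWER SECTION is gone, and the flags line shows QUERY: 1, ANSWER: 0 with status NXDOMAIN. The service is still defined in the configuration, but it is currently unreachable, so its health check fails and the DNS interface returns no record for it.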

sanxi@sanxi-PC:~$ dig @127.0.0.1 -p 8600 prometheus.service.consul

; <<>> DiG 9.11.5-P4-5.1+dde-Debian <<>> @127.0.0.1 -p 8600 prometheus.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 22414
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;prometheus.service.consul.     IN      A

;; AUTHORITY SECTION:
consul.                 0       IN      SOA     ns.consul. hostmaster.consul. 1656574343 3600 600 86400 0

;; Query time: 0 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: 四 6月 30 15:32:23 CST 2022
;; MSG SIZE  rcvd: 104

1.4 Security Hardening

Consul supports two kinds of protection: authentication and encryption.
