Elasticsearch cluster scale-out and scale-in
I. Scenario
The company is about to run a sales promotion; traffic is expected to rise for a while and return to normal once the campaign ends. Elasticsearch needs to be scaled out temporarily, so we first run a scale-out/scale-in experiment in the test environment.
II. Goals
- Verify a lossless scale-out and scale-in procedure
- Observe whether shards spread to the new nodes when scaling out with an index whose shard count equals the node count
- Observe whether shards spread to the new nodes when scaling out with an index whose shard count is greater than the node count
III. Conclusions first, to save you time
- When scaling out, Elasticsearch discovers nodes automatically: the existing nodes need no changes at all, a new instance joins as long as its cluster name matches when it starts, and it is best to list both old and new addresses in the new instances' configuration so they can find the existing cluster
- After scaling out, even an index with fewer shards than the new node count gets rebalanced where appropriate, let alone indices whose shard count exceeds the node count
- To scale in, first exclude the nodes from allocation, wait for their data to finish moving off, then stop the nodes; this gives a lossless scale-in
- If a node is killed abruptly while the cluster still has more than the minimum required number of nodes, its data is re-allocated to the remaining nodes automatically; the cluster goes yellow (unhealthy) for a while, and if the node has not recovered after a timeout it is evicted from the cluster
- After scaling in, if the IPs in "cluster.routing.allocation.exclude._ip" are not cleared, restarting the removed nodes lets them rejoin the cluster, but no data will be allocated to them
IV. Environment
A. Version
The production Elasticsearch cluster was built quite a while ago, so the version is 6.4.3.
However, according to the official documentation, the operations involved here also apply to 7.x and 8.x.
B. Deployment plan
Cluster name | Node IP | Node name | Role |
---|---|---|---|
es-cluster-qufudcj | 192.168.0.191 | node-192.168.0.191 | existing |
es-cluster-qufudcj | 192.168.0.192 | node-192.168.0.192 | existing |
es-cluster-qufudcj | 192.168.0.193 | node-192.168.0.193 | existing |
es-cluster-qufudcj | 192.168.0.151 | node-192.168.0.151 | new |
es-cluster-qufudcj | 192.168.0.152 | node-192.168.0.152 | new |
es-cluster-qufudcj | 192.168.0.153 | node-192.168.0.153 | new |
C. Current cluster state
First confirm that the cluster and the indices are healthy right now; we will check them again during and after the experiment.
1. Node info
[root@server01 ~]# curl -X GET http://192.168.0.191:9200/_cat/nodes?v
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.0.191 29 96 0 0.00 0.01 0.05 mdi - node-192.168.0.191
192.168.0.192 55 88 0 0.00 0.01 0.05 mdi * node-192.168.0.192
192.168.0.193 50 87 0 0.02 0.02 0.05 mdi - node-192.168.0.193
2. Existing data on each node
[root@server01 ~]# curl -X GET http://192.168.0.191:9200/_cat/allocation?v
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
12 25.7mb 2.4gb 14.3gb 16.7gb 14 192.168.0.192 192.168.0.192 node-192.168.0.192
11 25.6mb 2.4gb 14.3gb 16.7gb 14 192.168.0.193 192.168.0.193 node-192.168.0.193
11 25.6mb 2.7gb 14gb 16.7gb 16 192.168.0.191 192.168.0.191 node-192.168.0.191
3. Index health
[root@server01 ~]# curl -X GET http://192.168.0.191:9200/_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open ayaka-user _LUVqSo3TvS_K4IiIHW9DA 3 1 0 0 1.5kb 783b
green open zt_task SkiuBb4YSC6W2RCyHDdp5A 6 1 12992 0 38.8mb 19.4mb
green open toherotest KosnbDyBRDG5zG4MV01TWQ 5 1 3 0 20.3kb 10.1kb
green open zt_action VtvsaDIkQwC2iZwSc-i7vA 3 1 184527 0 38.1mb 19mb
4. Shard allocation of the indices
Two indices were prepared for this experiment (a sketch of how such indices can be created follows the list):
- zt_task: 6 shards, 1 replica, to observe whether shards spread to the new nodes when the shard count is greater than the node count
- zt_action: 3 shards, 1 replica, to observe whether shards spread to the new nodes when the shard count equals the node count
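For reference, this is roughly how an index with an explicit shard and replica count can be created; the mapping is omitted and not part of the original setup, so treat this as a sketch only:
curl -H "Content-Type: application/json" -X PUT "http://192.168.0.191:9200/zt_task?pretty" -d '
{
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 1
  }
}'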
[root@server01 ~]# curl -X GET http://192.168.0.191:9200/_cat/shards?v
index shard prirep state docs store ip node
zt_task 2 r STARTED 2200 3.2mb 192.168.0.193 node-192.168.0.193
zt_task 2 p STARTED 2200 3.2mb 192.168.0.192 node-192.168.0.192
zt_task 1 r STARTED 2162 3.2mb 192.168.0.193 node-192.168.0.193
zt_task 1 p STARTED 2162 3.2mb 192.168.0.191 node-192.168.0.191
zt_task 5 p STARTED 2094 3.1mb 192.168.0.193 node-192.168.0.193
zt_task 5 r STARTED 2094 3.1mb 192.168.0.191 node-192.168.0.191
zt_task 4 r STARTED 2181 3.2mb 192.168.0.193 node-192.168.0.193
zt_task 4 p STARTED 2181 3.2mb 192.168.0.192 node-192.168.0.192
zt_task 3 r STARTED 2198 3.2mb 192.168.0.191 node-192.168.0.191
zt_task 3 p STARTED 2198 3.2mb 192.168.0.192 node-192.168.0.192
zt_task 0 p STARTED 2157 3.2mb 192.168.0.191 node-192.168.0.191
zt_task 0 r STARTED 2157 3.2mb 192.168.0.192 node-192.168.0.192
zt_action 2 p STARTED 61809 6.3mb 192.168.0.193 node-192.168.0.193
zt_action 2 r STARTED 61809 6.3mb 192.168.0.191 node-192.168.0.191
zt_action 1 r STARTED 61281 6.3mb 192.168.0.193 node-192.168.0.193
zt_action 1 p STARTED 61281 6.3mb 192.168.0.192 node-192.168.0.192
zt_action 0 r STARTED 61437 6.3mb 192.168.0.191 node-192.168.0.191
zt_action 0 p STARTED 61437 6.3mb 192.168.0.192 node-192.168.0.192
5. Shard recovery progress
This API is mainly used while a scale-out or scale-in is in progress, to watch shards and replicas being moved.
Read the fields carefully: for example, when bytes (bytes to recover) is 0, the corresponding bytes_percent (percentage of bytes recovered) is also 0%, which is normal.
[root@server01 ~]# curl -X GET http://192.168.0.191:9200/_cat/recovery?v
index shard time type stage source_host source_node target_host target_node repository snapshot files files_recovered files_percent files_total bytes bytes_recovered bytes_percent bytes_total translog_ops translog_ops_recovered translog_ops_percent
zt_task 0 46ms empty_store done n/a n/a 192.168.0.191 node-192.168.0.191 n/a n/a 0 0 0.0% 0 0 0 0.0% 0 0 0 100.0%
zt_task 0 262ms peer done 192.168.0.191 node-192.168.0.191 192.168.0.192 node-192.168.0.192 n/a n/a 10 10 100.0% 10 3382690 3382690 100.0% 3382690 0 0 100.0%
zt_task 1 218ms peer done 192.168.0.191 node-192.168.0.191 192.168.0.193 node-192.168.0.193 n/a n/a 10 10 100.0% 10 3408271 3408271 100.0% 3408271 0 0 100.0%
zt_task 1 705ms peer done 192.168.0.192 node-192.168.0.192 192.168.0.191 node-192.168.0.191 n/a n/a 10 10 100.0% 10 3408271 3408271 100.0% 3408271 0 0 100.0%
zt_task 2 681ms peer done 192.168.0.192 node-192.168.0.192 192.168.0.193 node-192.168.0.193 n/a n/a 10 10 100.0% 10 3436120 3436120 100.0% 3436120 0 0 100.0%
zt_task 2 276ms peer done 192.168.0.151 node-192.168.0.151 192.168.0.192 node-192.168.0.192 n/a n/a 10 10 100.0% 10 3436120 3436120 100.0% 3436120 0 0 100.0%
zt_task 3 713ms peer done 192.168.0.192 node-192.168.0.192 192.168.0.191 node-192.168.0.191 n/a n/a 10 10 100.0% 10 3449383 3449383 100.0% 3449383 0 0 100.0%
zt_task 3 171ms peer done 192.168.0.191 node-192.168.0.191 192.168.0.192 node-192.168.0.192 n/a n/a 10 10 100.0% 10 3449383 3449383 100.0% 3449383 0 0 100.0%
zt_task 4 800ms peer done 192.168.0.192 node-192.168.0.192 192.168.0.193 node-192.168.0.193 n/a n/a 10 10 100.0% 10 3382222 3382222 100.0% 3382222 0 0 100.0%
zt_task 4 55ms empty_store done n/a n/a 192.168.0.192 node-192.168.0.192 n/a n/a 0 0 0.0% 0 0 0 0.0% 0 0 0 100.0%
zt_task 5 59ms empty_store done n/a n/a 192.168.0.193 node-192.168.0.193 n/a n/a 0 0 0.0% 0 0 0 0.0% 0 0 0 100.0%
zt_task 5 63ms peer done 192.168.0.193 node-192.168.0.193 192.168.0.191 node-192.168.0.191 n/a n/a 1 1 100.0% 1 230 230 100.0% 230 0 0 100.0%
zt_action 0 819ms peer done 192.168.0.192 node-192.168.0.192 192.168.0.191 node-192.168.0.191 n/a n/a 27 27 100.0% 27 6642036 6642036 100.0% 6642036 0 0 100.0%
zt_action 0 438ms peer done 192.168.0.191 node-192.168.0.191 192.168.0.192 node-192.168.0.192 n/a n/a 27 27 100.0% 27 6642036 6642036 100.0% 6642036 0 0 100.0%
zt_action 1 841ms peer done 192.168.0.192 node-192.168.0.192 192.168.0.193 node-192.168.0.193 n/a n/a 27 27 100.0% 27 6671167 6671167 100.0% 6671167 0 0 100.0%
zt_action 1 33ms empty_store done n/a n/a 192.168.0.192 node-192.168.0.192 n/a n/a 0 0 0.0% 0 0 0 0.0% 0 0 0 100.0%
zt_action 2 43ms empty_store done n/a n/a 192.168.0.193 node-192.168.0.193 n/a n/a 0 0 0.0% 0 0 0 0.0% 0 0 0 100.0%
zt_action 2 882ms peer done 192.168.0.193 node-192.168.0.193 192.168.0.191 node-192.168.0.191 n/a n/a 27 27 100.0% 27 6692940 6692940 100.0% 6692940 0 0 100.0%
V. Scale-out experiment
A. Configuration changes
1. Disable shard reallocation
This step is unnecessary if you are only adding a single node. But I am adding three, and they will not all finish starting at the same instant, which might cause some short-lived allocation churn (a guess), so to be safe I disabled allocation first.
Send a PUT request
curl -H "Content-Type: application/json" -X PUT "http://192.168.0.191:9200/_cluster/settings?pretty" -d '
{
"persistent": {
"cluster.routing.allocation.enable": "none"
}
}'
Parameter notes
persistent — make the change persistent
- Transient: the change stays in effect until the whole cluster restarts; after a full cluster restart it is cleared.
- Persistent: the change is permanent; it survives a full cluster restart and overrides options in the static configuration file.
cluster.routing.allocation.enable — shard allocation mode
- all (default): allows allocation for all kinds of shards; setting the value to null has the same effect, since it resets the setting to its default.
- primaries: allows allocation only for primary shards.
- new_primaries: allows allocation only for primary shards of new indices.
- none: no shard allocation of any kind is allowed for any index.
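To double-check what is currently applied, the settings API can be read back; a small sketch (flat_settings just makes the keys easier to scan):
curl -X GET "http://192.168.0.191:9200/_cluster/settings?flat_settings&pretty"
And sending null later resets the setting to its default:
curl -H "Content-Type: application/json" -X PUT "http://192.168.0.191:9200/_cluster/settings?pretty" -d '
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}'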
2. Edit the main configuration file of the new instances
The three existing instances' configuration files need no changes and no restart.
Compared with the old instances, the new instances' configuration only needs two changes.
In config/elasticsearch.yml, change the following two settings:
- node.name: must be unique on every node, old or new; for convenience I suffix it with the node's IP.
- discovery.zen.ping.unicast.hosts: on the three new instances this list should contain all six IPs, i.e. both old and new addresses
discovery.zen.ping.unicast.hosts: ['192.168.0.151', '192.168.0.152', '192.168.0.153', '192.168.0.191', '192.168.0.192', '192.168.0.193']
Everything else can stay the same as on the old instances; whether a node can become master or hold data is up to your own requirements.
cluster.name must be identical across all six instances! A sketch of a new node's configuration follows.
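A minimal elasticsearch.yml sketch for one of the new instances (192.168.0.151); the network, port and path values here are assumptions and should simply be copied from your existing nodes:
cluster.name: es-cluster-qufudcj
node.name: node-192.168.0.151
network.host: 192.168.0.151
http.port: 9200
transport.tcp.port: 9300
discovery.zen.ping.unicast.hosts: ['192.168.0.151', '192.168.0.152', '192.168.0.153', '192.168.0.191', '192.168.0.192', '192.168.0.193']
# keep discovery.zen.minimum_master_nodes and the path.* settings consistent with the existing nodes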
B. Performing the scale-out
1. Start the new instances
I manage them with an ansible-playbook here
[root@server01 playbook]# ansible-playbook -i es-inventory.ini -l @tmp.txt --tags=start deploy_elasticsearch.yml
Watch out for the firewall on the new machines: the Elasticsearch ports must be open, otherwise startup fails because the node cannot reach the master, with errors like the one below (the Chinese text in the log, 没有到主机的路由, is the system's "No route to host" message):
[2022-06-21T16:42:31,417][INFO ][o.e.d.z.ZenDiscovery ] [node-192.168.0.151] failed to send join request to master [{node-192.168.0.192}{C1I0ULqvTgWC34RL3U7jGw}{AhJmE_wiSG-rCJzZ0Zz4NA}{192.168.0.192}{192.168.0.192:9300}{ml.machine_memory=3936079872, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], reason [RemoteTransportException[[node-192.168.0.192][192.168.0.192:9300][internal:discovery/zen/join]]; nested: ConnectTransportException[[node-192.168.0.151][192.168.0.151:9300] connect_exception]; nested: IOException[没有到主机的路由: 192.168.0.151/192.168.0.151:9300]; nested: IOException[没有到主机的路由]; ]
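A quick sketch of opening the ports on each new machine, assuming firewalld is in use there (9200 for HTTP, 9300 for transport; adjust if your cluster uses different ports):
[root@middleware04 ~]# firewall-cmd --permanent --add-port=9200/tcp --add-port=9300/tcp
[root@middleware04 ~]# firewall-cmd --reload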
2. Verify the cluster nodes
The _cat/nodes API shows that the new nodes have been discovered automatically and have joined the cluster
[root@server01 playbook]# curl -X GET http://192.168.0.191:9200/_cat/nodes?v
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.0.191 23 97 0 0.01 0.04 0.05 mdi - node-192.168.0.191
192.168.0.192 69 88 0 0.00 0.02 0.05 mdi * node-192.168.0.192
192.168.0.193 71 87 0 0.00 0.01 0.05 mdi - node-192.168.0.193
192.168.0.151 20 86 0 0.00 0.03 0.05 mdi - node-192.168.0.151
192.168.0.153 19 87 0 0.01 0.04 0.05 mdi - node-192.168.0.153
192.168.0.152 19 87 0 0.00 0.03 0.05 mdi - node-192.168.0.152
However, the _cat/allocation API shows that no data has been allocated to the newly joined nodes.
That is because we disabled shard allocation before scaling out, so this confirms the setting took effect.
[root@server01 playbook]# curl -X GET http://192.168.0.191:9200/_cat/allocation?v
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
0 0b 2.4gb 23.3gb 25.7gb 9 192.168.0.151 192.168.0.151 node-192.168.0.151
0 0b 2.4gb 23.3gb 25.7gb 9 192.168.0.152 192.168.0.152 node-192.168.0.152
0 0b 2.4gb 23.3gb 25.7gb 9 192.168.0.153 192.168.0.153 node-192.168.0.153
1 25.6mb 2.7gb 14gb 16.7gb 16 192.168.0.191 192.168.0.191 node-192.168.0.191
12 25.7mb 2.4gb 14.3gb 16.7gb 14 192.168.0.192 192.168.0.192 node-192.168.0.192
11 25.6mb 2.4gb 14.3gb 16.7gb 14 192.168.0.193 192.168.0.193 node-192.168.0.193
Viewed from the index side through the _cat/shards API, the result is the same: no data has been allocated to the new nodes.
[root@server01 playbook]# curl -X GET http://192.168.0.191:9200/_cat/shards?v
index shard prirep state docs store ip node
zt_task 2 r STARTED 2200 3.2mb 192.168.0.193 node-192.168.0.193
zt_task 2 p STARTED 2200 3.2mb 192.168.0.192 node-192.168.0.192
zt_task 1 r STARTED 2162 3.2mb 192.168.0.193 node-192.168.0.193
zt_task 1 p STARTED 2162 3.2mb 192.168.0.191 node-192.168.0.191
zt_task 5 p STARTED 2094 3.1mb 192.168.0.193 node-192.168.0.193
zt_task 5 r STARTED 2094 3.1mb 192.168.0.191 node-192.168.0.191
zt_task 4 r STARTED 2181 3.2mb 192.168.0.193 node-192.168.0.193
zt_task 4 p STARTED 2181 3.2mb 192.168.0.192 node-192.168.0.192
zt_task 3 r STARTED 2198 3.2mb 192.168.0.191 node-192.168.0.191
zt_task 3 p STARTED 2198 3.2mb 192.168.0.192 node-192.168.0.192
zt_task 0 p STARTED 2157 3.2mb 192.168.0.191 node-192.168.0.191
zt_task 0 r STARTED 2157 3.2mb 192.168.0.192 node-192.168.0.192
zt_action 2 p STARTED 61809 6.3mb 192.168.0.193 node-192.168.0.193
zt_action 2 r STARTED 61809 6.3mb 192.168.0.191 node-192.168.0.191
zt_action 1 r STARTED 61281 6.3mb 192.168.0.193 node-192.168.0.193
zt_action 1 p STARTED 61281 6.3mb 192.168.0.192 node-192.168.0.192
zt_action 0 r STARTED 61437 6.3mb 192.168.0.191 node-192.168.0.191
zt_action 0 p STARTED 61437 6.3mb 192.168.0.192 node-192.168.0.192
3. Re-enable data allocation
Now that the new nodes have started and joined the cluster, re-enable allocation by setting the value back to all
curl -H "Content-Type: application/json" -X PUT "http://192.168.0.191:9200/_cluster/settings?pretty" -d '
{
"persistent": {
"cluster.routing.allocation.enable": "all"
}
}'
4. Check the cluster and data again
If there is a lot of data, you can watch the allocation progress through the _cat/recovery API mentioned earlier
- The API returns a lot of fields, so the h parameter is used to pick the fields to return and the v parameter prints the header; the \ before & is shell escaping
- As noted earlier, when bytes or files is zero, the corresponding percentage being zero is also normal
[root@server01 playbook]# curl -X GET http://192.168.0.191:9200/_cat/recovery?h=index,shard,time,type,stage,target_host,files,files_percent,bytes,bytes_percent,translog_ops,translog_ops_percent\&v
index shard time type stage target_host files files_percent bytes bytes_percent translog_ops translog_ops_percent
zt_task 0 258ms peer done 192.168.0.153 10 100.0% 3382690 100.0% 0 100.0%
zt_task 0 316ms peer done 192.168.0.152 10 100.0% 3382690 100.0% 0 100.0%
zt_task 1 267ms peer done 192.168.0.151 10 100.0% 3408271 100.0% 0 100.0%
zt_task 1 184ms peer done 192.168.0.152 10 100.0% 3408271 100.0% 0 100.0%
zt_task 2 681ms peer done 192.168.0.193 10 100.0% 3436120 100.0% 0 100.0%
zt_task 2 684ms peer done 192.168.0.153 10 100.0% 3436120 100.0% 0 100.0%
zt_task 3 713ms peer done 192.168.0.191 10 100.0% 3449383 100.0% 0 100.0%
zt_task 3 171ms peer done 192.168.0.192 10 100.0% 3449383 100.0% 0 100.0%
zt_task 4 683ms peer done 192.168.0.151 10 100.0% 3382222 100.0% 0 100.0%
zt_task 4 55ms empty_store done 192.168.0.192 0 0.0% 0 0.0% 0 100.0%
zt_task 5 59ms empty_store done 192.168.0.193 0 0.0% 0 0.0% 0 100.0%
zt_task 5 63ms peer done 192.168.0.191 1 100.0% 230 100.0% 0 100.0%
zt_action 0 825ms peer done 192.168.0.153 27 100.0% 6642036 100.0% 0 100.0%
zt_action 0 819ms peer done 192.168.0.191 27 100.0% 6642036 100.0% 0 100.0%
zt_action 1 802ms peer done 192.168.0.151 27 100.0% 6671167 100.0% 0 100.0%
zt_action 1 33ms empty_store done 192.168.0.192 0 0.0% 0 0.0% 0 100.0%
zt_action 2 43ms empty_store done 192.168.0.193 0 0.0% 0 0.0% 0 100.0%
zt_action 2 330ms peer done 192.168.0.152 27 100.0% 6692940 100.0% 0 100.0%
Now all six nodes hold data
[root@server01 playbook]# curl -X GET http://192.168.0.191:9200/_cat/allocation?v
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
6 12.8mb 2.4gb 23.2gb 25.7gb 9 192.168.0.151 192.168.0.151 node-192.168.0.151
5 12.8mb 2.4gb 23.3gb 25.7gb 9 192.168.0.153 192.168.0.153 node-192.168.0.153
6 12.8mb 2.4gb 14.3gb 16.7gb 14 192.168.0.193 192.168.0.193 node-192.168.0.193
6 12.8mb 2.4gb 14.3gb 16.7gb 14 192.168.0.192 192.168.0.192 node-192.168.0.192
6 12.8mb 2.6gb 14gb 16.7gb 16 192.168.0.191 192.168.0.191 node-192.168.0.191
5 12.8mb 2.4gb 23.3gb 25.7gb 9 192.168.0.152 192.168.0.152 node-192.168.0.152
The zt_task index was planned with six shards and one replica; you can see they are now spread evenly across old and new nodes
[root@server01 playbook]# curl -X GET http://192.168.0.191:9200/_cat/shards/zt_task?v
index shard prirep state docs store ip node
zt_task 2 r STARTED 2200 3.2mb 192.168.0.193 node-192.168.0.193
zt_task 2 p STARTED 2200 3.2mb 192.168.0.153 node-192.168.0.153
zt_task 1 r STARTED 2162 3.2mb 192.168.0.151 node-192.168.0.151
zt_task 1 p STARTED 2162 3.2mb 192.168.0.152 node-192.168.0.152
zt_task 5 p STARTED 2094 3.1mb 192.168.0.193 node-192.168.0.193
zt_task 5 r STARTED 2094 3.1mb 192.168.0.191 node-192.168.0.191
zt_task 4 r STARTED 2181 3.2mb 192.168.0.151 node-192.168.0.151
zt_task 4 p STARTED 2181 3.2mb 192.168.0.192 node-192.168.0.192
zt_task 3 r STARTED 2198 3.2mb 192.168.0.191 node-192.168.0.191
zt_task 3 p STARTED 2198 3.2mb 192.168.0.192 node-192.168.0.192
zt_task 0 r STARTED 2157 3.2mb 192.168.0.153 node-192.168.0.153
zt_task 0 p STARTED 2157 3.2mb 192.168.0.152 node-192.168.0.152
The zt_action index has only three shards and one replica, yet it was also rebalanced fairly evenly, and some primary shards were even moved onto the new nodes (p marks a primary shard, r marks a replica shard)
[root@server01 playbook]# curl -X GET http://192.168.0.191:9200/_cat/shards/zt_action?v
index shard prirep state docs store ip node
zt_action 2 p STARTED 61809 6.3mb 192.168.0.193 node-192.168.0.193
zt_action 2 r STARTED 61809 6.3mb 192.168.0.152 node-192.168.0.152
zt_action 1 r STARTED 61281 6.3mb 192.168.0.151 node-192.168.0.151
zt_action 1 p STARTED 61281 6.3mb 192.168.0.192 node-192.168.0.192
zt_action 0 p STARTED 61437 6.3mb 192.168.0.153 node-192.168.0.153
zt_action 0 r STARTED 61437 6.3mb 192.168.0.191 node-192.168.0.191
C. Scale-out complete
This gives the following conclusions
- Elasticsearch has automatic node discovery: the existing nodes need no changes at all, a new instance joins as long as its cluster name matches when it starts, and it is best to list both old and new addresses in the new instances' configuration so they can find the existing cluster
- Even an index with fewer shards than the post-scale-out node count gets rebalanced where appropriate
VI. Scale-in experiment
A. Configuration changes
First scale in two nodes following the normal procedure and observe the result; then kill one node abruptly and observe again.
1. Exclude certain nodes from data allocation
We first exclude 192.168.0.151 and 192.168.0.152. For a normal scale-in you would simply list every machine being removed here; I keep one node back for the later experiment
curl -H "Content-Type: application/json" -X PUT "http://192.168.0.191:9200/_cluster/settings?pretty" -d '
{
"transient" : {
"cluster.routing.allocation.exclude._ip" : "192.168.0.151,192.168.0.152"
}
}'
After the request succeeds, check again: the nodes are still in the cluster
[root@server01 ~]# curl -X GET http://192.168.0.191:9200/_cat/nodes?v
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.0.191 74 97 0 0.00 0.01 0.05 mdi - node-192.168.0.191
192.168.0.192 66 88 0 0.02 0.02 0.05 mdi * node-192.168.0.192
192.168.0.193 49 87 0 0.00 0.01 0.05 mdi - node-192.168.0.193
192.168.0.151 60 87 0 0.00 0.01 0.05 mdi - node-192.168.0.151
192.168.0.153 51 88 0 0.00 0.01 0.05 mdi - node-192.168.0.153
192.168.0.152 56 87 0 0.00 0.01 0.05 mdi - node-192.168.0.152
But 192.168.0.151 and 192.168.0.152 no longer hold any data (see the shards and disk.indices columns)
[root@server01 ~]# curl -X GET http://192.168.0.191:9200/_cat/allocation?v
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
9 16.1mb 2.4gb 14.3gb 16.7gb 14 192.168.0.192 192.168.0.192 node-192.168.0.192
8 16mb 2.4gb 14.3gb 16.7gb 14 192.168.0.193 192.168.0.193 node-192.168.0.193
9 22.3mb 2.7gb 14gb 16.7gb 16 192.168.0.191 192.168.0.191 node-192.168.0.191
0 0b 2.4gb 23.3gb 25.7gb 9 192.168.0.151 192.168.0.151 node-192.168.0.151
8 22.4mb 2.4gb 23.3gb 25.7gb 9 192.168.0.153 192.168.0.153 node-192.168.0.153
0 0b 2.4gb 23.3gb 25.7gb 9 192.168.0.152 192.168.0.152 node-192.168.0.152
B. Performing the scale-in
1. Stop the two nodes
Stop the Elasticsearch instances on 192.168.0.151 and 192.168.0.152.
Mine are managed as systemd services; stop them however you prefer.
[root@middleware04 ~]# systemctl stop elasticsearch
[root@middleware05 ~]# systemctl stop elasticsearch
2. Check the cluster and indices
The two nodes are gone
[root@server01 playbook]# curl -X GET http://192.168.0.191:9200/_cat/nodes?v
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.0.192 26 90 0 0.16 0.09 0.07 mdi * node-192.168.0.192
192.168.0.153 63 88 0 0.00 0.01 0.05 mdi - node-192.168.0.153
192.168.0.191 26 97 0 0.11 0.05 0.06 mdi - node-192.168.0.191
192.168.0.193 60 87 0 0.06 0.03 0.05 mdi - node-192.168.0.193
[root@server01 playbook]# curl -X GET http://192.168.0.191:9200/_cat/allocation?v
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
8 22.4mb 2.4gb 14.3gb 16.7gb 14 192.168.0.193 192.168.0.193 node-192.168.0.193
8 16mb 2.4gb 14.3gb 16.7gb 14 192.168.0.192 192.168.0.192 node-192.168.0.192
9 16.2mb 2.4gb 23.3gb 25.7gb 9 192.168.0.153 192.168.0.153 node-192.168.0.153
9 22.4mb 2.7gb 14gb 16.7gb 16 192.168.0.191 192.168.0.191 node-192.168.0.191
The indices are healthy
[root@server01 playbook]# curl -X GET http://192.168.0.191:9200/_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open ayaka-user _LUVqSo3TvS_K4IiIHW9DA 3 1 0 0 1.5kb 783b
green open zt_task SkiuBb4YSC6W2RCyHDdp5A 6 1 12992 0 38.8mb 19.4mb
green open toherotest KosnbDyBRDG5zG4MV01TWQ 5 1 3 0 20.3kb 10.1kb
green open zt_action VtvsaDIkQwC2iZwSc-i7vA 3 1 184527 0 38.1mb 19mb
The cluster is healthy
[root@server01 playbook]# curl -X GET http://192.168.0.191:9200/_cat/health?v
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1656384217 10:43:37 es-cluster-qufudcj green 4 4 34 17 0 0 0 0 - 100.0%
C. Scale-in complete
Conclusion: exclude the nodes from allocation first, wait for the data to finish moving, then stop the nodes; this gives a lossless scale-in.
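As the conclusions at the top point out, remember to clear cluster.routing.allocation.exclude._ip afterwards, otherwise these nodes would receive no data if they were ever started and rejoined the cluster. Sending null removes the transient setting:
curl -H "Content-Type: application/json" -X PUT "http://192.168.0.191:9200/_cluster/settings?pretty" -d '
{
  "transient" : {
    "cluster.routing.allocation.exclude._ip" : null
  }
}'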
Q.E.D.