SOLR Source Code Deep-Dive Series (Part 5): Distributed Search and SolrCloud

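5.1 Preface

In the previous articles we examined in detail how SOLR builds, updates, and queries an index in standalone mode. Modern search applications, however, often need a distributed system to handle high concurrency, large data volumes, and high availability. SOLR's distributed solution is SolrCloud, which spreads indexing and query work across multiple nodes and uses ZooKeeper for cluster coordination. This article starts from SolrCloud's design philosophy, then works through its distributed architecture and key implementation details, using source code to analyze what the core components do.

By the end you will understand how SolrCloud manages shards and replicas, how it processes distributed queries and index updates, and how it keeps data consistent. This extends the earlier articles and is a key step toward running SOLR in enterprise deployments.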

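5.2 SolrCloud Design Philosophy

SolrCloud is SOLR's distributed mode, designed to overcome the limits of a single machine, such as storage capacity and query throughput. Its core design goals are:

Scalability: sharding allows the index to scale horizontally.
High availability: replicas provide failover.
Consistency: ZooKeeper maintains the cluster state.
Simplicity: clients see a single access interface that hides the distributed complexity.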

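5.2.1 Core Concepts

Collection: a logical index made up of one or more shards.
Shard: a subset of a Collection that independently stores part of the data.
Replica: a copy of a Shard; replicas of the same shard normally run on different nodes.
ZooKeeper: the distributed coordination service that stores cluster metadata such as shard state.
Leader: the primary replica of each Shard, responsible for coordinating updates.

These concepts appear in the source code as classes and data structures, which the following sections analyze one by one.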

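5.3 SolrCloud Architecture Overview

The SolrCloud architecture can be divided into the following layers:

Client layer: interacts with the cluster through CloudSolrClient.
Node layer: multiple SOLR instances (nodes), each of which can host several Cores.
Coordination layer: ZooKeeper manages cluster state and configuration.
Storage layer: the distributed index files, built on Lucene.

Architecture diagram (text description)

Client → CloudSolrClient → [Node1(Shard1 Leader), Node2(Shard1 Replica), Node3(Shard2 Leader)] → ZooKeeper → Lucene index

Note that ZooKeeper holds cluster state only; documents and queries travel directly between the client and the nodes.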

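5.4 The Role of ZooKeeper

ZooKeeper is the "brain" of SolrCloud: it stores and manages the cluster's metadata.

5.4.1 What It Stores

clusterstate.json: records the state of Collections, shards, and replicas. (Early releases kept all collections in this single file; from Solr 5 onward, each collection's state can be stored in its own /collections/<name>/state.json.)
live_nodes: the list of currently active nodes.
configs: shared configuration sets (such as solrconfig.xml and schema.xml).

Example clusterstate.json: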

{
  "my_collection": {
    "shards": {
      "shard1": {
        "replicas": {
          "core_node1": {"node_name": "node1:8983_solr", "state": "active", "leader": "true"},
          "core_node2": {"node_name": "node2:8983_solr", "state": "active"}
        }
      }
    }
  }
}

5.4.2 ClusterState

ClusterState is the in-memory representation of the ZooKeeper data and lives in org.apache.solr.common.cloud. A simplified view of the class:

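// Simplified sketch for illustration, not the verbatim Solr source.
public class ClusterState {
  // Collection name -> collection state (shards, replicas, router).
  private final Map<String, DocCollection> collectionStates;
  // Nodes that currently have an ephemeral entry under live_nodes.
  private final Set<String> liveNodes;

  public ClusterState(ZkClient zkClient) {
    this.collectionStates = loadCollections(zkClient);
    this.liveNodes = zkClient.getLiveNodes();
  }

  // In this sketch, returns null when the collection is unknown.
  public DocCollection getCollection(String collection) {
    return collectionStates.get(collection);
  }
}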

loadCollections: loads the cluster state from ZooKeeper.
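To make the relationship between the state data and live_nodes concrete, here is a minimal, dependency-free sketch. It is not Solr's ClusterState: the class name ClusterStateSketch, the record ReplicaInfo, and the method findLeader are hypothetical names chosen for illustration, and the data mirrors the example JSON above. The point it illustrates is that a replica recorded as "active" should only be trusted if its node also appears in live_nodes.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical in-memory view of cluster state; names mirror the example JSON, not Solr's classes. */
final class ClusterStateSketch {
    record ReplicaInfo(String name, String nodeName, String state, boolean leader) {}

    // collection name -> shard name -> replicas
    private final Map<String, Map<String, List<ReplicaInfo>>> collections;
    private final Set<String> liveNodes;

    ClusterStateSketch(Map<String, Map<String, List<ReplicaInfo>>> collections, Set<String> liveNodes) {
        this.collections = collections;
        this.liveNodes = liveNodes;
    }

    /** Returns the shard leader only if it is marked active and its node is live. */
    Optional<ReplicaInfo> findLeader(String collection, String shard) {
        return collections.getOrDefault(collection, Map.of())
                .getOrDefault(shard, List.of())
                .stream()
                .filter(r -> r.leader() && "active".equals(r.state()) && liveNodes.contains(r.nodeName()))
                .findFirst();
    }

    /** Builds the state from the example JSON: one shard with two replicas. */
    static ClusterStateSketch example(Set<String> liveNodes) {
        List<ReplicaInfo> replicas = List.of(
                new ReplicaInfo("core_node1", "node1:8983_solr", "active", true),
                new ReplicaInfo("core_node2", "node2:8983_solr", "active", false));
        return new ClusterStateSketch(Map.of("my_collection", Map.of("shard1", replicas)), liveNodes);
    }
}
```

With both nodes live, findLeader("my_collection", "shard1") yields core_node1. If node1 drops out of the live set, the same call returns an empty Optional even though the stored state still says "active".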

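5.5 Distributed Indexing

In a distributed environment, an indexing operation has to be coordinated across multiple nodes.

5.5.1 Client Submission

Submit a document with CloudSolrClient: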

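import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

// Newer SolrJ releases replace withZkHost with the Builder(List<String> zkHosts, Optional<String> zkChroot) constructor.
CloudSolrClient client = new CloudSolrClient.Builder().withZkHost("localhost:2181").build();
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "3");
doc.addField("title", "Distributed Test");
client.add("my_collection", doc);
// No default collection is set on the client, so name the collection when committing.
client.commit("my_collection");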

5.5.2 Shard Routing

CloudSolrClient computes the target shard from the document ID:

// Simplified sketch for illustration, not the verbatim SolrJ source.
public class CloudSolrClient extends SolrClient {
  private final ClusterState clusterState;

  @Override
  public UpdateResponse add(String collection, SolrInputDocument doc) throws SolrServerException, IOException {
    // Pick the shard that owns this document, then send the update to a node hosting it.
    String shard = routeDoc(doc, collection);
    HttpSolrClient nodeClient = getClientForShard(shard);
    return nodeClient.add(doc);
  }

  private String routeDoc(SolrInputDocument doc, String collection) {
    // The unique key (here "id") is the routing input.
    String id = doc.getFieldValue("id").toString();
    return DocRouter.getRoute(id, clusterState.getCollection(collection));
  }
}

DocRouter: by default, a hash of the document ID determines the shard.
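The hash-based assignment can be sketched without any Solr dependency. The class below is hypothetical (HashRangeRouter, Range, and shardFor are illustrative names, not Solr's API), and it makes two simplifying assumptions: String.hashCode() stands in for the MurmurHash3 function that Solr's default router uses, and the 32-bit hash space is split into equal contiguous ranges, one per shard.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of hash-range routing; not Solr's DocRouter. */
final class HashRangeRouter {
    /** Inclusive hash range owned by one shard. */
    record Range(String shard, int min, int max) {
        boolean contains(int hash) { return hash >= min && hash <= max; }
    }

    private final List<Range> ranges = new ArrayList<>();

    /** Splits the full signed 32-bit space into numShards contiguous ranges. */
    HashRangeRouter(int numShards) {
        long span = (1L << 32) / numShards;
        long start = Integer.MIN_VALUE;
        for (int i = 0; i < numShards; i++) {
            // The last shard absorbs any remainder so the ranges cover every hash value.
            long end = (i == numShards - 1) ? Integer.MAX_VALUE : start + span - 1;
            ranges.add(new Range("shard" + (i + 1), (int) start, (int) end));
            start = end + 1;
        }
    }

    /** Assumption: String.hashCode() is a stand-in for the MurmurHash3 hash Solr uses. */
    String shardFor(String docId) {
        int hash = docId.hashCode();
        for (Range r : ranges) {
            if (r.contains(hash)) return r.shard();
        }
        throw new IllegalStateException("ranges do not cover hash " + hash);
    }

    List<Range> ranges() { return List.copyOf(ranges); }
}
```

Because each shard owns a fixed range, the same ID always maps to the same shard as long as the ranges do not change, which is what lets any client or node route a document without asking a central coordinator.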

5.5.3 Leader and Replica Synchronization

After the Leader receives an update, DistributedUpdateProcessor forwards it to the Replicas:

// Simplified sketch for illustration, not the verbatim Solr source.
public class DistributedUpdateProcessor extends UpdateRequestProcessor {
  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    // Run the rest of the local processor chain first, then fan out.
    super.processAdd(cmd);
    distributeUpdate(cmd);
  }

  private void distributeUpdate(AddUpdateCommand cmd) {
    // Forward the command to the shard's replicas.
    List<Replica> replicas = getReplicas(cmd.getCollection(), cmd.getShard());
    for (Replica replica : replicas) {
      sendUpdateToReplica(replica, cmd);
    }
  }
}
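The forwarding step can be modeled in memory to show why the leader coordinates updates. This is a hypothetical sketch, not Solr's implementation: LeaderForwardingSketch and ReplicaStore are illustrative names, network calls are replaced by direct method calls, and a simple counter stands in for the version the leader stamps on each update. The version check lets a replica discard an older update that arrives after a newer one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-memory model of leader-to-replica forwarding; not Solr's implementation. */
final class LeaderForwardingSketch {
    /** A replica's local store: document id -> latest version applied. */
    static final class ReplicaStore {
        final Map<String, Long> versions = new HashMap<>();

        /** Applies the update unless the same or a newer version is already present. */
        boolean apply(String docId, long version) {
            Long current = versions.get(docId);
            if (current != null && current >= version) return false;
            versions.put(docId, version);
            return true;
        }
    }

    // Assumption: a counter stands in for the leader-assigned version.
    private final AtomicLong clock = new AtomicLong();
    private final ReplicaStore local = new ReplicaStore();
    private final List<ReplicaStore> followers = new ArrayList<>();

    void addFollower(ReplicaStore follower) { followers.add(follower); }

    /** Leader path: stamp a version, index locally, then forward to every follower. */
    long add(String docId) {
        long version = clock.incrementAndGet();
        local.apply(docId, version);
        for (ReplicaStore follower : followers) {
            follower.apply(docId, version);
        }
        return version;
    }
}
```

Routing every write for a shard through one leader gives all replicas the same version order, so replicas can converge even if forwarded updates are delivered out of order.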

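5.6 Distributed Query

A distributed query has to collect results from multiple shards and aggregate them.

5.6.1 Query Flow

Client request:

GET http://localhost:8983/solr/my_collection/select?q=title:Test

Request dispatch: ShardHandler sends the query to the shards.
Result aggregation: the per-shard results are merged into a single response.

5.6.2 ShardHandler

ShardHandler coordinates distributed queries and lives in org.apache.solr.handler.component: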

// Simplified sketch for illustration, not the verbatim Solr source.
public class HttpShardHandler extends ShardHandler {
  @Override
  public void submit(ShardRequest sreq, String shard, SolrParams params) {
    // Send the sub-request to one node that hosts the given shard.
    HttpSolrClient client = getClientForShard(shard);
    client.request(new QueryRequest(params));
  }

  @Override
  public void prepResponse(ShardRequest sreq) {
    // Combine the responses collected from all shards.
    mergeResponses(sreq.responses);
  }
}
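The aggregation step is essentially a top-N merge. The sketch below is a hypothetical, dependency-free illustration (ShardMergeSketch, Hit, and merge are not Solr names): each shard returns its own best hits with scores, and the coordinator keeps the global best `rows` of them. It assumes that scores from different shards are directly comparable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of merging per-shard top-N lists; not Solr's merge code. */
final class ShardMergeSketch {
    record Hit(String id, float score, String shard) {}

    /** Merges per-shard hit lists and keeps the global top `rows` by descending score. */
    static List<Hit> merge(List<List<Hit>> shardResponses, int rows) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> response : shardResponses) {
            all.addAll(response);
        }
        // Ties are broken by id so the merged order is deterministic.
        all.sort(Comparator.comparingDouble(Hit::score).reversed().thenComparing(Hit::id));
        return List.copyOf(all.subList(0, Math.min(rows, all.size())));
    }
}
```

Each shard only needs to return its own top `rows` hits, because no document outside a shard's local top `rows` can appear in the global top `rows`.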

5.6.3 Querying with CloudSolrClient

QueryResponse response = client.query("my_collection", new SolrQuery("title:Test"));

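5.7 Practice: Setting Up a SolrCloud Cluster

Steps

Start ZooKeeper:

zkServer.sh start

Start two SOLR nodes (if both run from the same installation, each should get its own Solr home via the -s option of the start command):

bin/solr start -c -p 8983 -z localhost:2181
bin/solr start -c -p 8984 -z localhost:2181

Create the Collection with 2 shards and a replication factor of 2:

bin/solr create -c my_collection -s 2 -rf 2

Submit documents and query:
- Add documents with CloudSolrClient.
- Run a query and observe how the documents are distributed across shards.

Verification

Open http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS to view the cluster state.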
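The same check can be scripted. The sketch below uses only the JDK's built-in HTTP client; ClusterStatusCheck, statusUri, and fetchStatus are hypothetical names, and the base URL assumes the first node from the steps above is running on port 8983. The wt=json parameter asks Solr for a JSON response.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper that calls the Collections API CLUSTERSTATUS action. */
final class ClusterStatusCheck {
    /** Builds the CLUSTERSTATUS URL for a Solr base URL such as http://localhost:8983/solr. */
    static URI statusUri(String solrBaseUrl) {
        return URI.create(solrBaseUrl + "/admin/collections?action=CLUSTERSTATUS&wt=json");
    }

    /** Fetches the raw JSON body; requires a running node at the given base URL. */
    static String fetchStatus(String solrBaseUrl) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(statusUri(solrBaseUrl)).GET().build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("CLUSTERSTATUS returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Prints the cluster status JSON, including shards, replicas, and live nodes.
        System.out.println(fetchStatus("http://localhost:8983/solr"));
    }
}
```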

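5.8 Source Code Analysis: Key Takeaways

ClusterState: manages cluster metadata.
CloudSolrClient: the bridge between the client and the cluster.
DistributedUpdateProcessor: propagates index updates to replicas.
ShardHandler: coordinates distributed queries.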

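5.9 Summary and Preview

This article dissected SolrCloud's distributed architecture, from ZooKeeper's coordinating role to the implementation of distributed querying and indexing. Through the source code we saw how SOLR achieves high availability and scalability in a cluster. The next article covers the plugin mechanism and extensibility, moving into custom development for SOLR.

Exercises

Change the sharding strategy and observe how the data distribution changes.
Add logging to ShardHandler to record the details of query dispatch.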
