Agent Discovery and Negotiation: How A2A Enables Dynamic Interaction

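Summary: The A2A (Agent2Agent) protocol uses agent discovery and negotiation to let AI agents collaborate dynamically at runtime. Agents can recognize each other and negotiate how to interact without hard-coded configuration. This article examines A2A's discovery and negotiation flow, focusing on AgentCard exchange, capability parsing, and runtime adjustment of interaction modes. It draws on the implementation in the GitHub repository, Mermaid diagrams, and code examples to show how A2A's design supports flexible multi-agent systems, and it aims to give developers a detailed technical picture.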

1. Introduction: Why Dynamic Interaction Matters

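In enterprise AI systems, agents need to work together like members of a team, handling tasks that range from expense reimbursement to customer-service conversations. Statically wired interfaces, such as hand-coded REST API integrations, struggle to keep up with this. Agents may join the system or change their capabilities at runtime, and a user interaction may shift from text to forms or even audio and video. Google's A2A (Agent2Agent) protocol addresses this problem through agent discovery and interaction negotiation, which let agents identify each other and adapt how they communicate.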

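A2A's discovery and negotiation mechanism centers on the AgentCard. Together with task management and the communication layer (JSON-RPC 2.0 over HTTP, with Server-Sent Events for streaming), it provides flexibility at runtime. This article analyzes the mechanism in depth and relates it to the implementation in Google's A2A GitHub repository.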

2. Agent Discovery: From Unknown to Trusted

2.1 What Discovery Means

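Agent discovery is the process by which a Host Agent identifies a Remote Agent at runtime and learns its identity, capabilities, and service endpoint. A2A does this by exchanging AgentCards. The idea resembles a service registry (such as ZooKeeper) or DNS, but it is lighter-weight and designed specifically for AI agents.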

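The discovery flow consists of three steps:

Request the AgentCard: the Host Agent sends a GET request to the Remote Agent's URL. The A2A specification recommends a well-known path for this, which is /.well-known/agent.json in its early versions.
Parse the AgentCard: the Host Agent extracts the Remote Agent's metadata, such as its name, capabilities, and task schema.
Validate capabilities: the Host Agent checks whether the Remote Agent supports the features it needs, such as a particular interaction mode or task type.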

The discovery sequence is shown below:

sequenceDiagram
    participant H as Host Agent
    participant R as Remote Agent
    H->>R: GET /.well-known/agent.json
    R-->>H: Return AgentCard JSON
    H->>H: Parse name, capabilities, schema
    H->>H: Validate compatibility
    H->>R: Proceed to negotiation
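The three discovery steps can be sketched with Python's standard library. This is a minimal sketch under stated assumptions. The well-known path, the REQUIRED_FIELDS tuple, and the helper names fetch_agent_card, parse_agent_card, and is_compatible are all hypothetical. The compatibility check also relies on this article's simplified interactionModes field.

```python
import json
import urllib.request
from urllib.parse import urljoin

# Assumption: the card is served at this well-known path (as in early A2A spec versions).
AGENT_CARD_PATH = "/.well-known/agent.json"
# Hypothetical minimum field set this sketch insists on.
REQUIRED_FIELDS = ("name", "url", "capabilities")


def fetch_agent_card(base_url: str, timeout: float = 5.0) -> str:
    """Step 1: GET the AgentCard and return the raw JSON text."""
    request = urllib.request.Request(
        urljoin(base_url, AGENT_CARD_PATH),  # absolute path replaces any path in base_url
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_agent_card(raw: str) -> dict:
    """Step 2: parse the JSON and reject cards missing the basic fields."""
    card = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in card]
    if missing:
        raise ValueError(f"AgentCard is missing fields: {missing}")
    return card


def is_compatible(card: dict, needed_modes: set) -> bool:
    """Step 3: True if the remote agent offers every interaction mode we need."""
    offered = set(card["capabilities"].get("interactionModes", []))
    return needed_modes <= offered
```

Keeping the network fetch separate from parsing and validation means the last two steps can be tested offline. For example, a caller can run parse_agent_card(raw) and then is_compatible(card, {"text"}).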

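2.2 The Role of the AgentCard

The AgentCard is the core of discovery. This article uses the simplified set of key fields below; see a2a.json for the full definition. The published A2A specification declares supported modalities through fields such as defaultInputModes, defaultOutputModes, and per-skill input/output modes. It does not use a capabilities.interactionModes field.

name: the agent's identifier, for example "ExpenseAgent".
url: the communication endpoint, for example https://example.com/a2a.
capabilities: the features the agent supports, including:
- streaming (boolean): whether streaming is supported.
- pushNotifications (boolean): whether push notifications are supported.
- interactionModes (array): the supported interaction modes, such as ["text", "form", "video"].
schema: the task input/output format, for example a definition of amount and currency.

Example AgentCard: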

{
  "name": "ExpenseAgent",
  "description": "Processes expense reimbursements",
  "url": "https://example.com/a2a",
  "capabilities": {
    "streaming": false,
    "pushNotifications": true,
    "interactionModes": ["text", "form"]
  },
  "schema": {
    "input": {
      "type": "object",
      "properties": {
        "amount": {"type": "number"},
        "currency": {"type": "string"}
      },
      "required": ["amount", "currency"]
    }
  }
}
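A Host Agent that works with this card in code can map the JSON onto typed objects instead of passing raw dictionaries around. The sketch below assumes the simplified field set shown above. The class names, the snake_case attribute names, and the supports helper are hypothetical and are not part of any A2A SDK.

```python
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Capabilities:
    streaming: bool = False
    push_notifications: bool = False
    interaction_modes: tuple = ()


@dataclass(frozen=True)
class AgentCard:
    name: str
    url: str
    description: str = ""
    capabilities: Capabilities = field(default_factory=Capabilities)
    input_schema: dict = field(default_factory=dict)  # contents of schema.input

    @classmethod
    def from_json(cls, raw: str) -> "AgentCard":
        """Build a typed card from the JSON text; name and url are mandatory."""
        data = json.loads(raw)
        caps = data.get("capabilities", {})
        return cls(
            name=data["name"],
            url=data["url"],
            description=data.get("description", ""),
            capabilities=Capabilities(
                streaming=bool(caps.get("streaming", False)),
                push_notifications=bool(caps.get("pushNotifications", False)),
                interaction_modes=tuple(caps.get("interactionModes", [])),
            ),
            input_schema=data.get("schema", {}).get("input", {}),
        )

    def supports(self, mode: str) -> bool:
        """True if the card lists the given interaction mode."""
        return mode in self.capabilities.interaction_modes
```

Freezing the dataclasses signals that a discovered card is read-only. If the remote agent changes, the Host Agent should fetch a fresh card rather than mutate the old one.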

2.3 Dynamic Discovery

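A2A discovery needs no prior configuration. The Host Agent only needs the Remote Agent's URL to start interacting. This brings the following advantages:

Plug and play: a new agent joins the system by publishing its AgentCard, and existing code does not have to change.
Cross-platform support: agents from different vendors, for example ones running on Google Cloud or AWS, can collaborate through the standardized AgentCard.
Fault tolerance: if a Remote Agent is unavailable, the Host Agent can try another agent's URL.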
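The fault-tolerance point can be illustrated with a failover loop over candidate URLs. In this sketch, discover_first_available and the injected fetch callable are hypothetical. In practice, fetch could be a thin wrapper that downloads and parses an AgentCard. When fetch raises OSError (an unreachable host or a timeout) or ValueError (malformed JSON), the loop moves on to the next candidate.

```python
from typing import Callable, Iterable, Optional


def discover_first_available(
    candidate_urls: Iterable[str],
    fetch: Callable[[str], dict],
    needed_mode: str,
) -> Optional[tuple]:
    """Return (url, card) for the first reachable agent that supports needed_mode,
    or None if no candidate qualifies."""
    for url in candidate_urls:
        try:
            card = fetch(url)
        except (OSError, ValueError):
            # OSError covers connection errors and timeouts; ValueError covers bad JSON.
            continue
        modes = card.get("capabilities", {}).get("interactionModes", [])
        if needed_mode in modes:
            return url, card
    return None
```

Because fetch is passed in, the loop does not depend on any particular HTTP client, and the failover logic can be tested with a fake fetcher.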

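3. Interaction Negotiation: The Key to Flexible Collaboration

3.1 What Negotiation Means

Interaction negotiation is the step in which the Host Agent and the Remote Agent, before executing a task, use the capabilities in the AgentCard to settle on a communication method and interaction mode. A2A supports several modes (text, forms, audio/video), and negotiation ensures that both sides pick the most suitable one. In the A2A specification, this negotiation happens through messages made of typed parts (text, files, structured data), each with a declared content type. The propose/accept exchange in this section is a simplified model of that idea.

Typical negotiation scenarios include:

The Host Agent proposes text interaction, and the Remote Agent confirms that it supports it.
The Remote Agent asks for form input (for example, to supply an invoice image), and the Host Agent renders the UI on the fly.
Both sides agree to switch to streaming for audio/video interaction.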

The negotiation flow is shown below:

flowchart TD
    A[Host Agent Parses AgentCard] --> B[Check interactionModes]
    B --> C{Supported Modes?}
    C -->|Text| D[Propose Text]
    C -->|Form| E[Propose Form]
    C -->|Video| F[Propose Video]
    D --> G[Remote Agent Response]
    E --> G
    F --> G
    G -->|Accept| H[Start Task]
    G -->|Suggest Alternative| I[Re-negotiate]
    I --> C
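The flowchart's loop (check modes, propose, then accept or re-negotiate) can be written as a small function. This sketch makes some assumptions. The remote side is modeled as a hypothetical callable remote_reply that returns ("accept", None) or ("suggest", alternative). A max_rounds limit, which does not appear in the diagram, prevents endless re-negotiation.

```python
from typing import Callable, Optional, Sequence

# Hypothetical reply shape: ("accept", None) or ("suggest", "<alternative mode>")
Reply = tuple


def negotiate(
    host_preferences: Sequence[str],
    remote_modes: Sequence[str],
    remote_reply: Callable[[str], Reply],
    max_rounds: int = 3,
) -> Optional[str]:
    """Return the agreed mode, or None if the two sides cannot agree."""
    # Check interactionModes: keep the host's preference order, drop unsupported modes.
    supported = [mode for mode in host_preferences if mode in remote_modes]
    if not supported:
        return None
    proposal = supported[0]
    for _ in range(max_rounds):
        verdict, alternative = remote_reply(proposal)
        if verdict == "accept":
            return proposal  # Start Task
        if alternative in supported:
            proposal = alternative  # Re-negotiate with the remote's suggestion
        else:
            return None  # The suggestion is something the host cannot handle
    return None
```

Filtering by the host's own preference order before proposing means the first proposal is already the best mode both sides list. The remote agent only needs to intervene when it has task-specific reasons to prefer something else.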

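3.2 The Negotiation Process

Negotiation usually proceeds in four steps:

Capability assessment: the Host Agent checks the Remote Agent's capabilities.interactionModes and streaming fields.
Proposal: the Host Agent sends its preferred mode, for example text.
Confirmation or adjustment: the Remote Agent either accepts the proposal or suggests an alternative, for example form.
Agreement: both sides confirm the interaction method and move on to task execution.

The negotiation sequence diagram is shown below: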

sequenceDiagram
    participant H as Host Agent
    participant R as Remote Agent
    H->>R: GET /.well-known/agent.json
    R-->>H: Return AgentCard (interactionModes: ["text", "form"])
    H->>R: Propose interaction (mode: text)
    R-->>H: Suggest form (requires additional data)
    H->>R: Agree to form
    H->>R: Submit Task (form data)
    R-->>H: Task Result
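The exchange in the diagram can be simulated in process, without any networking, to make the control flow concrete. ExpenseRemote and host_flow are hypothetical stand-ins. The remote agent counter-proposes a form because an expense claim needs structured fields. The Host Agent agrees only if it can render that mode itself.

```python
class ExpenseRemote:
    """Hypothetical in-process Remote Agent whose card lists ["text", "form"]."""

    modes = ("text", "form")

    def on_proposal(self, mode: str) -> dict:
        # Expense claims need structured fields, so anything but a form gets a counter-proposal.
        if mode == "form":
            return {"status": "accepted", "mode": "form"}
        return {"status": "suggest", "mode": "form"}

    def on_task(self, data: dict) -> dict:
        missing = [name for name in ("amount", "currency") if name not in data]
        if missing:
            return {"status": "failed", "error": f"missing fields: {missing}"}
        return {"status": "completed", "result": f"Approved {data['amount']} {data['currency']}"}


def host_flow(remote: ExpenseRemote, host_modes: set, form_data: dict) -> list:
    """Propose text, accept a suggested mode if the host can render it, then submit."""
    transcript = []
    reply = remote.on_proposal("text")
    transcript.append(("propose", "text", reply["status"]))
    if reply["status"] == "suggest" and reply["mode"] in host_modes:
        reply = remote.on_proposal(reply["mode"])  # "Agree to form"
        transcript.append(("agree", reply["mode"], reply["status"]))
    if reply["status"] != "accepted":
        transcript.append(("abort", None, "no common mode"))
        return transcript
    result = remote.on_task(form_data)  # "Submit Task (form data)"
    transcript.append(("submit", reply["mode"], result["status"]))
    return transcript
```

With host_modes {"text", "form"}, the transcript follows the diagram: the text proposal draws a form suggestion, the form is accepted, and the task completes. A host that can only do text stops after the first round.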

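3.3 Dynamic Adjustment

A2A also supports adjustments while a task is running. For example:

While processing a task, the Remote Agent finds that some data is missing and asks the Host Agent for form input. The A2A task lifecycle models this pause as the input-required state.
The Host Agent detects a change in network conditions and switches from text to audio/video streaming.

This flexibility depends on the capabilities declared in the AgentCard and on real-time updates to the task status.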
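Task-level adjustment is easiest to reason about as a state machine. The state names below follow the A2A task lifecycle (submitted, working, input-required, completed, canceled, failed). The transition table, however, is an illustrative assumption rather than a normative part of the specification, and the Task class is a hypothetical minimal container.

```python
from enum import Enum


class TaskState(str, Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    INPUT_REQUIRED = "input-required"
    COMPLETED = "completed"
    CANCELED = "canceled"
    FAILED = "failed"


# Assumed transition table (illustrative, not taken from the specification).
ALLOWED = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.CANCELED, TaskState.FAILED},
    TaskState.WORKING: {TaskState.INPUT_REQUIRED, TaskState.COMPLETED,
                        TaskState.CANCELED, TaskState.FAILED},
    TaskState.INPUT_REQUIRED: {TaskState.WORKING, TaskState.CANCELED, TaskState.FAILED},
    TaskState.COMPLETED: set(),  # terminal
    TaskState.CANCELED: set(),   # terminal
    TaskState.FAILED: set(),     # terminal
}


class Task:
    def __init__(self, task_id: str):
        self.task_id = task_id
        self.state = TaskState.SUBMITTED
        self.history = [self.state]

    def transition(self, new_state: TaskState) -> None:
        """Move to new_state, rejecting transitions the table does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        self.state = new_state
        self.history.append(new_state)
```

The "missing data, ask for a form" scenario becomes working → input-required → working → completed. Because the terminal states have no outgoing transitions, a finished task cannot be silently reopened.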

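4. Technical Implementation: Details of Discovery and Negotiation

4.1 Communication Protocols

Discovery and negotiation rely mainly on the following protocols:

HTTP: used to fetch the AgentCard (GET) and for the initial negotiation and task requests (POST). The A2A specification encodes these requests as JSON-RPC 2.0 messages.
Streaming: used for real-time negotiation and dynamic adjustment, such as pushing a change of interaction mode. The A2A specification streams updates over HTTP with Server-Sent Events (SSE) and delivers push notifications to a client-supplied webhook.

Example HTTP request (fetching the AgentCard):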
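A2A streams updates as Server-Sent Events, so a client has to split the incoming text stream into events. The sketch below is a minimal parser that handles only data: lines, comment lines, and blank-line dispatch, and it ignores the event:, id:, and retry: fields. The JSON payload shape in the test data is hypothetical.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each complete Server-Sent Event."""
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event.
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
        elif line.startswith(":"):
            continue  # comment, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    # As in the SSE processing rules, an event not terminated by a blank line is discarded.
```

Multi-line data: fields are joined with newlines before JSON decoding. This lets a server split a large payload across several lines.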

GET /.well-known/agent.json HTTP/1.1
Host: example.com
Accept: application/json

Response:

{
  "name": "ExpenseAgent",
  "url": "https://example.com/a2a",
  "capabilities": {
    "interactionModes": ["text", "form"],
    "streaming": false
  }
}

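4.2 Data Validation

When parsing the AgentCard, the Host Agent validates it against a JSON Schema (see a2a.json). For example, it can:

Check whether capabilities.interactionModes contains the required mode.
Verify that the task data conforms to schema.input.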
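Validating task data against schema.input can be sketched with a small checker for the subset of JSON Schema that the example card uses (type, properties, required). validate_input is a hypothetical helper, not a full JSON Schema implementation. Production code would normally use a complete validator, such as the validate function in the jsonschema package.

```python
from typing import Any

# Subset of JSON Schema type keywords mapped to Python types.
_JSON_TYPES = {
    "number": (int, float),
    "integer": int,
    "string": str,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_input(data: Any, schema: dict) -> list:
    """Return a list of error messages; an empty list means the data is valid."""
    if schema.get("type") == "object" and not isinstance(data, dict):
        return ["input must be an object"]
    errors = []
    for key in schema.get("required", []):
        if key not in data:
            errors.append(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key not in data or "type" not in spec:
            continue
        expected = _JSON_TYPES.get(spec["type"])
        value = data[key]
        # bool is a subclass of int in Python, but JSON treats true/false as a separate type.
        bool_as_number = isinstance(value, bool) and spec["type"] in ("number", "integer")
        if expected is not None and (not isinstance(value, expected) or bool_as_number):
            errors.append(f"field {key} should be {spec['type']}")
    return errors
```

Returning a list of messages rather than raising on the first problem lets the Remote Agent report every issue in one validation error.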

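4.3 Error Handling

Discovery and negotiation can run into the following problems:

AgentCard unavailable: the Remote Agent is offline, so the Host Agent's request fails with a connection error or timeout.
Incompatible modes: the Remote Agent does not support the proposed interaction mode, so negotiation fails.
Invalid data: the task input does not match the schema, so the Remote Agent returns a validation error.

A2A's task state machine handles these cases through the failed state and an accompanying error field.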
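The three failure classes can be mapped to a failed result with an error field, mirroring the failed state described above. The exception classes, the error codes (agent_unavailable, mode_incompatible, invalid_input), and the result shape are all hypothetical choices made for this sketch.

```python
from typing import Any, Callable


class IncompatibleModeError(Exception):
    """Hypothetical: the remote agent does not support the proposed interaction mode."""


class InputValidationError(Exception):
    """Hypothetical: the task input does not match the AgentCard schema."""


def _failed(code: str, exc: Exception) -> dict:
    return {"status": "failed", "error": {"code": code, "message": str(exc)}}


def run_step(step: Callable[[], Any]) -> dict:
    """Run one discovery/negotiation/task step and map known failures to a failed result."""
    try:
        return {"status": "completed", "result": step()}
    except OSError as exc:  # includes ConnectionError and TimeoutError
        return _failed("agent_unavailable", exc)
    except IncompatibleModeError as exc:
        return _failed("mode_incompatible", exc)
    except InputValidationError as exc:
        return _failed("invalid_input", exc)
```

Each handler returns inside its own except clause because Python unbinds the `as exc` name when the clause ends. Unexpected exceptions are deliberately not caught, so programming errors still surface.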

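5. Code Example: Implementing Discovery and Negotiation

The following expense-reimbursement agent is modeled on samples/python/agents/google_adk and shows discovery and negotiation end to end. Its a2a interface is simplified for illustration and does not exactly match the repository's API.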

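# Host Agent: discovery and negotiation
# NOTE: the a2a interface used here (A2AClient, A2AServer, AgentCard, Task and methods such as
# get_agent_card, negotiate_interaction, submit_task, handle_task, run) is a simplified,
# illustrative API, not the exact interface of the A2A repository.
import asyncio
import threading
import time

from a2a import A2AClient, A2AServer, AgentCard, Task


async def expense_client(remote_url: str):
    client = A2AClient(remote_url)

    # Discovery: fetch the Remote Agent's AgentCard
    agent_card = await client.get_agent_card()
    print(f"Discovered agent: {agent_card['name']}")

    # Capability check: this Host Agent needs text interaction
    capabilities = agent_card["capabilities"]
    if "text" not in capabilities.get("interactionModes", []):
        raise ValueError("Text interaction not supported")

    # Negotiation: propose text interaction
    negotiation = await client.negotiate_interaction({"mode": "text"})
    if negotiation["status"] != "accepted":
        # A rejection may or may not carry a suggested alternative
        print(f"Negotiation failed, suggested: {negotiation.get('suggested')}")
        return

    # Submit the task; its data must match the schema declared in the AgentCard
    task = {
        "taskId": "task-001",
        "type": "expense",
        "data": {"amount": 100, "currency": "USD"}
    }
    result = await client.submit_task(task)
    print(f"Task result: {result}")


# Remote Agent: expense reimbursement server
class ExpenseAgent(A2AServer):
    def __init__(self):
        card = AgentCard(
            name="ExpenseAgent",
            description="Processes expense reimbursements",
            url="http://localhost:8080/a2a",
            capabilities={
                "streaming": False,
                "pushNotifications": True,
                "interactionModes": ["text", "form"]
            },
            schema={
                "input": {
                    "type": "object",
                    "properties": {
                        "amount": {"type": "number"},
                        "currency": {"type": "string"}
                    },
                    "required": ["amount", "currency"]
                }
            }
        )
        super().__init__(card=card)

    async def negotiate_interaction(self, proposal: dict) -> dict:
        mode = proposal.get("mode")
        supported = self.card.capabilities["interactionModes"]
        if mode in supported:
            return {"status": "accepted", "mode": mode}
        # Suggest a mode this agent actually supports instead of a hard-coded one
        return {"status": "rejected", "suggested": supported[0] if supported else None}

    async def handle_task(self, task: Task) -> dict:
        if task["type"] != "expense":
            return {"status": "failed", "error": "Invalid task type"}
        data = task["data"]
        # Enforce the required fields declared in the AgentCard schema
        if "amount" not in data or "currency" not in data:
            return {"status": "failed", "error": "Missing amount or currency"}
        amount = data["amount"]
        if amount <= 0:
            return {"status": "failed", "error": "Invalid amount"}
        return {
            "status": "completed",
            "result": f"Approved {amount} {data['currency']}"
        }


if __name__ == "__main__":
    server = ExpenseAgent()
    # The server must be listening before the client connects. server.run() is assumed
    # to block, so it is started in a background daemon thread first.
    threading.Thread(target=server.run, kwargs={"port": 8080}, daemon=True).start()
    time.sleep(1)  # crude startup wait; real code would poll until the port accepts connections
    asyncio.run(expense_client("http://localhost:8080/a2a"))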