LangChain Query Analysis

Splitting a Question


Decomposing into Sub-Questions

When a user asks a question, there is no guarantee that the relevant results can be returned with a single query. Sometimes, to answer a question, we need to break it down into distinct sub-questions, retrieve results for each sub-question, and then answer using the accumulated context.

For example, suppose a user asks, "How do Web Voyager and Reflection Agents differ?", and we have one document that explains Web Voyager and another that explains Reflection Agents, but no document comparing the two. We will likely get better results by retrieving for "What is Web Voyager?" and "What is a Reflection Agent?" separately and combining the retrieved documents, rather than retrieving directly on the user's question.

In real-world scenarios, querying a local vector database directly with the user's question sometimes fails to surface complete, relevant information. Breaking the question into several sub-questions, querying the vector database for each one, and then synthesizing the knowledge retrieved across all sub-questions can further improve answer quality.

The process of splitting an input question into multiple distinct sub-queries is what we call query decomposition, sometimes also called sub-query generation. Below, we walk through an example of how to perform decomposition.
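The end-to-end flow this guide builds toward is: decompose the question, retrieve per sub-query, then answer over the merged context. Here is a minimal stdlib sketch of that flow, with a rule-based splitter and an in-memory dictionary standing in for the LLM analyzer and vector store built later (all names and documents are hypothetical):

```python
# Sketch of decompose -> retrieve -> combine. `decompose` and `retrieve`
# are hypothetical stand-ins for the LLM query analyzer and vector-store
# retriever built later in this guide.

DOCS = {  # hypothetical corpus: one doc per concept, none comparing them
    "what is web voyager?": "Web Voyager is a vision-driven web-browsing agent.",
    "what is a reflection agent?": "A reflection agent critiques and revises its own output.",
}

def decompose(question: str) -> list[str]:
    # Stand-in rule: split on " and "; the real version asks an LLM.
    if " and " in question:
        return [part.strip() + "?" for part in question.rstrip("?").split(" and ")]
    return [question]

def retrieve(sub_query: str) -> str:
    # Stand-in for a vector-store similarity search.
    return DOCS.get(sub_query.lower(), "")

def answer(question: str) -> str:
    sub_queries = decompose(question)
    # Accumulate the context retrieved for every sub-question.
    context = "\n".join(filter(None, (retrieve(q) for q in sub_queries)))
    # A real pipeline would now send `context` plus `question` to an LLM.
    return context

print(answer("What is Web Voyager and what is a reflection agent?"))
```

The rest of this guide replaces the rule-based `decompose` with an LLM that emits structured `SubQuery` tool calls.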

Setup

Install dependencies

# %pip install -qU langchain langchain-openai

Set environment variables

In this example we'll use OpenAI:

import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

# Optional, uncomment to trace runs with LangSmith. Sign up here: https://smith.langchain.com.
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()

Query generation

To convert user questions into a list of sub-questions, we'll use OpenAI's function-calling API, which can return multiple function calls per turn:

import datetime
from typing import Literal, Optional, Tuple

from langchain_core.pydantic_v1 import BaseModel, Field


# Define the data structure for each sub-query
class SubQuery(BaseModel):
    """Search over a database of tutorial videos about a software library."""

    sub_query: str = Field(
        ...,
        description="A very specific query against the database.",
    )


from langchain.output_parsers import PydanticToolsParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Define the prompt
system = """You are an expert at converting user questions into database queries. \
You have access to a database of tutorial videos about a software library for building LLM applications.

Perform query decomposition. Given a user question, break it down into distinct sub-questions.

If there are acronyms or words you are not familiar with, do not try to rephrase them."""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "{question}"),
    ]
)

# Define the model
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
# Bind the tool so the model is allowed to call it
llm_with_tools = llm.bind_tools([SubQuery])
# Define the parser that converts the LLM's tool calls into SubQuery objects
parser = PydanticToolsParser(tools=[SubQuery])
query_analyzer = prompt | llm_with_tools | parser
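The `|` in `prompt | llm_with_tools | parser` is LangChain's LCEL composition: each stage's output becomes the next stage's input. As a rough analogy only (not LangChain's actual classes), piped stages behave like composed functions:

```python
# Rough analogy for LCEL's `|` composition (illustrative, not LangChain's
# real implementation): each stage is a callable, and `a | b` runs a
# first, then b on a's output.

class Step:
    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)

# Hypothetical stages standing in for prompt, llm_with_tools, and parser
format_prompt = Step(lambda d: f"Q: {d['question']}")
call_model = Step(lambda text: text.upper())
parse_output = Step(lambda text: text.split(": ", 1)[1])

chain = format_prompt | call_model | parse_output
print(chain.invoke({"question": "how to do rag"}))  # HOW TO DO RAG
```

In the real chain, the prompt formats the messages, the model emits tool calls, and the parser turns those calls into `SubQuery` objects.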

Let's try it out:

query_analyzer.invoke({"question": "how to do rag"})

The result: here the question decomposes into a single sub-query.

[SubQuery(sub_query='how to do rag')]

query_analyzer.invoke(
    {
        "question": "how to use multi-modal models in a chain and turn the chain into a REST API"
    }
)

Here, two sub-questions are generated:

[SubQuery(sub_query='How to use multi-modal models in a chain?'),
 SubQuery(sub_query='How to turn a chain into a REST API?')]

query_analyzer.invoke(
    {
        "question": "What's the difference between Web Voyager and Reflection Agents? Do they use LangGraph?"
    }
)

Two sub-questions are generated here as well, but notice that they overlap:

[SubQuery(sub_query='How do Web Voyager and Reflection Agents differ? Do they use Langgraph?'),
 SubQuery(sub_query='Do Web Voyager and Reflection Agents use Langgraph?')]

Next, we give the LLM a few examples to further improve its output:

examples = [
    {
        "input": "What's chat langchain, is it a langchain template?",
        "tool_calls": [
            {"sub_query": "What is chat langchain"},
            {"sub_query": "What is a langchain template"},
        ],
    },
    {
        "input": "How would I use LangGraph to build an automaton",
        "tool_calls": [
            {"sub_query": "How to build automaton with LangGraph"},
        ],
    },
    {
        "input": "How to build multi-agent system and stream intermediate steps from it",
        "tool_calls": [
            {"sub_query": "How to build multi-agent system"},
            {"sub_query": "How to stream intermediate steps"},
            {"sub_query": "How to stream intermediate steps from multi-agent system"},
        ],
    },
    {
        "input": "What's the difference between LangChain agents and LangGraph?",
        "tool_calls": [
            {"sub_query": "What's the difference between LangChain agents and LangGraph?"},
            {"sub_query": "What are LangChain agents"},
            {"sub_query": "What is LangGraph"},
        ],
    },
]
import uuid
from typing import Dict, List

from langchain_core.messages import (
    AIMessage,
    BaseMessage,
    HumanMessage,
    SystemMessage,
    ToolMessage,
)


def tool_example_to_messages(example: Dict) -> List[BaseMessage]:
    """Convert one {input, tool_calls} example into a chat-message sequence."""
    messages: List[BaseMessage] = [HumanMessage(content=example["input"])]
    openai_tool_calls = []
    for tool_call in example["tool_calls"]:
        openai_tool_calls.append(
            {
                "id": str(uuid.uuid4()),
                "type": "function",
                "function": {
                    "name": "SubQuery",
                    "arguments": tool_call,
                },
            }
        )
    # An AI message whose tool_calls carry the expected decomposition
    messages.append(
        AIMessage(content="", additional_kwargs={"tool_calls": openai_tool_calls})
    )
    # One ToolMessage per tool call, confirming the call was correct
    tool_outputs = example.get("tool_outputs") or [
        "This is an example of a correct usage of this tool. Make sure to continue using the tool this way."
    ] * len(openai_tool_calls)
    for output, tool_call in zip(tool_outputs, openai_tool_calls):
        messages.append(ToolMessage(content=output, tool_call_id=tool_call["id"]))
    return messages


example_msgs = [msg for ex in examples for msg in tool_example_to_messages(ex)]
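For each example, `tool_example_to_messages` produces one human message, one AI message carrying the expected tool calls, and one tool message per call. The same ordering can be sketched with plain dicts and no LangChain dependency (illustrative only):

```python
import uuid

def example_to_dicts(example: dict) -> list[dict]:
    # Mirrors tool_example_to_messages with plain dicts in place of
    # LangChain message classes (illustrative sketch, not the real API).
    msgs = [{"role": "human", "content": example["input"]}]
    calls = [
        {"id": str(uuid.uuid4()), "type": "function",
         "function": {"name": "SubQuery", "arguments": tc}}
        for tc in example["tool_calls"]
    ]
    msgs.append({"role": "ai", "content": "", "tool_calls": calls})
    for call in calls:  # one tool reply per call, echoing "correct usage"
        msgs.append({"role": "tool",
                     "content": "This is an example of a correct usage of this tool.",
                     "tool_call_id": call["id"]})
    return msgs

ex = {
    "input": "What's chat langchain, is it a langchain template?",
    "tool_calls": [
        {"sub_query": "What is chat langchain"},
        {"sub_query": "What is a langchain template"},
    ],
}
print([m["role"] for m in example_to_dicts(ex)])  # ['human', 'ai', 'tool', 'tool']
```

With the four examples above, `example_msgs` concatenates these per-example sequences into one flat few-shot transcript.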
from langchain_core.prompts import MessagesPlaceholder

system = """You are an expert at converting user questions into database queries. \
You have access to a database of tutorial videos about a software library for building LLM-powered applications. \

Perform query decomposition. Given a user question, break it down into the most specific sub questions you can \
which will help you answer the original question. Each sub question should be about a single concept/fact/idea.

If there are acronyms or words you are not familiar with, do not try to rephrase them."""
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        MessagesPlaceholder("examples", optional=True),
        ("human", "{question}"),
    ]
)
query_analyzer_with_examples = (
    prompt.partial(examples=example_msgs) | llm_with_tools | parser
)
query_analyzer_with_examples.invoke(
    {
        "question": "What's the difference between web voyager and reflection agents? Do they use LangGraph?"
    }
)
This time the decomposition also includes sub-queries defining each unfamiliar term:

[SubQuery(sub_query="What's the difference between web voyager and reflection agents"),
 SubQuery(sub_query='Do web voyager and reflection agents use LangGraph'),
 SubQuery(sub_query='What is web voyager'),
 SubQuery(sub_query='What are reflection agents')]

