LangChain Tutorial (Python)

LangChain Output Parsers

Output Parsers

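A language model's output is plain text. When building AI applications, however, we usually want structured data, such as a target object or an array, that the program can work with directly. LangChain's output parsers (Output parser) convert the text returned by the model into that kind of structured format.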

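An output parser formats the result returned by a language model. Every output parser must implement two methods:

“get_format_instructions”: returns a string containing prompt text that tells the language model what format its response should follow.
“parse”: parses the content returned by the model into the target format.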
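To make this two-method contract concrete, here is a plain-Python sketch. It does not use LangChain: the class name `CommaListParser` and its instruction text are hypothetical, chosen only to show how the work is divided. `get_format_instructions` produces text that goes into the prompt, and `parse` turns the model's raw reply into a Python value.

```python
from typing import List


class CommaListParser:
    """Hypothetical parser showing the two required methods (not a LangChain class)."""

    def get_format_instructions(self) -> str:
        # Prompt text telling the model which shape its answer must take.
        return "Answer with a comma-separated list, for example: foo, bar, baz"

    def parse(self, text: str) -> List[str]:
        # Split the raw reply on commas, trim whitespace, and drop empty items.
        return [item.strip() for item in text.split(",") if item.strip()]


parser = CommaListParser()

# The format instructions are inserted into the prompt before it is sent to the model.
prompt = "List three primary colors.\n{format_instructions}".format(
    format_instructions=parser.get_format_instructions()
)

# Pretend this string is the model's reply.
raw_reply = "red, green, blue"
colors = parser.parse(raw_reply)
print(colors)  # ['red', 'green', 'blue']
```

The prompt side and the parsing side must agree: if the instructions ask for a comma-separated list, `parse` has to expect exactly that shape.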

Next, let's look at LangChain's built-in output parsers.

Pydantic Parser

PydanticOutputParser is one of LangChain's core output parsers. It is built on Python's pydantic library and converts the model's output into a Python object.

# Import the required modules
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field, validator
from langchain_openai import OpenAI

# Initialize the language model
model = OpenAI(model_name="gpt-3.5-turbo-instruct", temperature=0.0)

# Define the desired data structure by subclassing `BaseModel`
class Joke(BaseModel):
    # Each Field description tells the model what information to put in that field
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

    # Custom validation logic with Pydantic
    @validator("setup")
    def question_ends_with_question_mark(cls, field):
        # endswith() also handles an empty string safely, unlike field[-1]
        if not field.endswith("?"):
            raise ValueError("Badly formed question!")
        return field

# Create the parser, specifying the Python object we want back
parser = PydanticOutputParser(pydantic_object=Joke)

# Inject the parser's format instructions, obtained via get_format_instructions, into the prompt template
prompt = PromptTemplate(
    # The template has two variables: format_instructions receives the parser's
    # format instructions, and query receives the user's question
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# Build a workflow with an LCEL expression
prompt_and_model = prompt | model
# Run the workflow defined above
output = prompt_and_model.invoke({"query": "Tell me a joke."})
# Parse the raw model output into a Joke object
parser.invoke(output)

Example output:

Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')
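Conceptually, parsing the reply means locating the JSON in the model's text and validating it against the schema. The sketch below imitates that idea with only the standard library: a dataclass stands in for the Pydantic model, and `parse_joke` is a hypothetical helper. It is an approximation of the idea, not PydanticOutputParser's actual implementation.

```python
import json
from dataclasses import dataclass


@dataclass
class Joke:
    setup: str
    punchline: str

    def __post_init__(self) -> None:
        # Mirrors the Pydantic validator in the example above: the setup must be a question.
        if not self.setup.endswith("?"):
            raise ValueError("Badly formed question!")


def parse_joke(text: str) -> Joke:
    """Hypothetical helper: pull the JSON object out of the model text and build a Joke."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError(f"No JSON object found in: {text!r}")
    # json.JSONDecodeError is a subclass of ValueError, so bad JSON also raises ValueError.
    data = json.loads(text[start : end + 1])
    return Joke(**data)


reply = (
    "Here is a joke:\n"
    '{"setup": "Why did the chicken cross the road?", '
    '"punchline": "To get to the other side!"}'
)
print(parse_joke(reply))
# Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')
```

Validation happens at construction time, so a reply that parses as JSON but breaks the schema rule still fails loudly instead of producing a half-valid object.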

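LCEL Interface

The Runnable Interface

Output parsers implement the Runnable interface, which makes them one of the basic building blocks of the LangChain Expression Language (LCEL). They support the invoke, ainvoke, stream, astream, batch, abatch, and astream_log methods.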
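The `|` composition that LCEL relies on can be pictured with a toy class. `MiniRunnable` below is hypothetical and supports only `invoke`, `batch`, and `|`; LangChain's real Runnable also provides async and streaming methods, so treat this purely as an illustration of how a sequence passes each step's output to the next.

```python
from typing import Any, Callable, List


class MiniRunnable:
    """Hypothetical stand-in for a Runnable: wraps a function and composes with `|`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def batch(self, values: List[Any]) -> List[Any]:
        # Runs inputs one after another; a real implementation may run them concurrently.
        return [self.invoke(v) for v in values]

    def __or__(self, other: "MiniRunnable") -> "MiniRunnable":
        # `a | b` builds a new runnable that feeds a's output into b.
        return MiniRunnable(lambda value: other.invoke(self.invoke(value)))


prompt = MiniRunnable(lambda d: f"Tell me about {d['topic']}.")
fake_model = MiniRunnable(lambda text: text.upper())  # pretend LLM
parser = MiniRunnable(lambda text: text.rstrip(".").split())

chain = prompt | fake_model | parser
print(chain.invoke({"topic": "parsers"}))  # ['TELL', 'ME', 'ABOUT', 'PARSERS']
```

Because `|` returns another object with the same interface, a whole chain can itself be invoked, batched, or piped further, which is what lets a parser sit at the end of any sequence.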

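Using Output Parsers in LCEL

An output parser accepts either a string or a BaseMessage as input and can return structured data of any type. You build a parser chain by appending the parser to a Runnable sequence, then invoke the chain as a whole.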
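Accepting both input types means the parser has to normalize its input first: a completion model returns a plain string, while a chat model returns a message object. The sketch below assumes a hypothetical `FakeMessage` class with a `content` attribute as a stand-in for BaseMessage; `to_text` and `parse_numbers` are likewise illustrative names, not LangChain functions.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a chat message; only the `content` field matters here."""
    content: str


def to_text(value: Union[str, FakeMessage]) -> str:
    # Accept either a raw string (completion models) or a message object (chat models).
    if isinstance(value, str):
        return value
    return value.content


def parse_numbers(value: Union[str, FakeMessage]) -> List[int]:
    # Parse whitespace-separated integers from either input type.
    return [int(tok) for tok in to_text(value).split()]


print(parse_numbers("1 2 3"))                     # [1, 2, 3]
print(parse_numbers(FakeMessage(content="4 5")))  # [4, 5]
```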

# Append the output parser to the LCEL expression
chain = prompt | model | parser
chain.invoke({"query": "Tell me a joke."})

Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')

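Some parsers, such as SimpleJsonOutputParser, can stream partially parsed objects; others cannot. Whether streaming yields incremental results depends on whether the parser is able to build partial objects from incomplete output.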
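One way to build partial objects is to "close" the incomplete JSON text received so far and try to parse it after every chunk. The sketch below is a simplified stand-in written for illustration, not SimpleJsonOutputParser's actual code; `parse_partial_json` and `stream_partial` are hypothetical names. It tracks open strings, braces, and brackets, appends the missing closers, and emits a result only when the parsed value changes.

```python
import json
from typing import Any, Iterable, Iterator, Optional


def parse_partial_json(text: str) -> Optional[Any]:
    """Close any open string/object/array in `text` and parse it; None if still invalid."""
    closers = []  # expected closing characters, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()
    candidate = text
    if in_string:
        if escaped:
            candidate = candidate[:-1]  # drop a dangling backslash
        candidate += '"'
    candidate += "".join(reversed(closers))
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None  # e.g. a key without a value yet


def stream_partial(chunks: Iterable[str]) -> Iterator[Any]:
    """Accumulate chunks and yield each new partial object as it becomes parseable."""
    buffer, last = "", None
    for chunk in chunks:
        buffer += chunk
        parsed = parse_partial_json(buffer)
        if parsed is not None and parsed != last:
            last = parsed
            yield parsed


chunks = ['{', '"answer"', ': "', 'Ant', 'on', 'ie', '"}']
print(list(stream_partial(chunks)))
# [{}, {'answer': ''}, {'answer': 'Ant'}, {'answer': 'Anton'}, {'answer': 'Antonie'}]
```

A parser whose format cannot be completed this way (for example, one that must see the whole reply before validating a schema) has no meaningful partial result, which is why not every parser streams.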

from langchain.output_parsers.json import SimpleJsonOutputParser

json_prompt = PromptTemplate.from_template(
    "Return a JSON object with an `answer` key that answers the following question: {question}"
)
json_parser = SimpleJsonOutputParser()
json_chain = json_prompt | model | json_parser

list(json_chain.stream({"question": "Who invented the microscope?"}))

[{},
 {'answer': ''},
 {'answer': 'Ant'},
 {'answer': 'Anton'},
 {'answer': 'Antonie'},
 {'answer': 'Antonie van'},
 {'answer': 'Antonie van Lee'},
 {'answer': 'Antonie van Leeu'},
 {'answer': 'Antonie van Leeuwen'},
 {'answer': 'Antonie van Leeuwenho'},
 {'answer': 'Antonie van Leeuwenhoek'}]

In LCEL, different parsers can be combined to build more complex data-processing pipelines that suit different needs.
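As a small illustration of chaining parsing steps, the plain-Python sketch below runs a JSON-parsing step followed by a hypothetical post-processing step that extracts and cleans one field. The function names are made up for this example; in an LCEL chain each step would be appended with `|` instead of being composed by hand.

```python
import json
from functools import reduce
from typing import Any, Callable, Dict


def parse_json(text: str) -> Dict[str, Any]:
    # Step 1: raw model text -> dict.
    return json.loads(text)


def extract_answer(data: Dict[str, Any]) -> str:
    # Step 2 (hypothetical): keep only the `answer` field, trimmed of whitespace.
    return str(data.get("answer", "")).strip()


def pipeline(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    # Compose steps left to right, so each step receives the previous step's output.
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


run = pipeline(parse_json, extract_answer)
print(run('{"answer": "  Antonie van Leeuwenhoek "}'))  # Antonie van Leeuwenhoek
```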
