quick-study

LangChain consists of the following four components:

LangChain Libraries: The Python and JavaScript libraries. Contains interfaces and integrations for a myriad of components, a basic run time for combining these components into chains and agents, and off-the-shelf implementations of chains and agents.
LangChain Templates: A collection of easily deployable reference architectures for a wide variety of tasks.
LangServe: A library for deploying LangChain chains as a REST API.
LangSmith: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.

quickstart

LLM Chain

The simplest example: call the model directly

from langchain_openai import ChatOpenAI
llm = ChatOpenAI()
llm.invoke("how can langsmith help with testing?")

Add a prompt template to form an LLM chain

from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are world class technical documentation writer."),
    ("user", "{input}")
])
chain = prompt | llm
chain.invoke({"input": "how can langsmith help with testing?"})

Add an output parser

from langchain_core.output_parsers import StrOutputParser
output_parser = StrOutputParser()
chain = prompt | llm | output_parser
chain.invoke({"input": "how can langsmith help with testing?"})

Result

'Langsmith can help with testing in several ways:\n\n1. Test case generation: Langsmith can automatically generate test cases based on the specifications and requirements of the system. It uses techniques such as constraint solving and model checking to generate test inputs that cover different scenarios and edge cases.\n\n2. Test automation: Langsmith can automate the execution of test cases, reducing the manual effort required for testing. It can interact with the system under test, simulate user actions, and verify the expected behavior of the system.\n\n3. Test coverage analysis: Langsmith can analyze the coverage of test cases to identify areas of the system that have not been adequately tested. It can highlight code branches, conditions, or functionality that has not been exercised by the existing test suite, helping testers prioritize their efforts.\n\n4. Regression testing: Langsmith can automatically re-run previously executed test cases to ensure that changes to the system do not introduce new bugs or regressions. It can help in quickly identifying if any existing functionality has been affected by recent changes.\n\n5. Bug detection: Langsmith can analyze the system under test to identify potential bugs, violations of coding standards, or security vulnerabilities. It can perform static analysis on the codebase, identify potential issues, and provide recommendations for improvement.\n\nOverall, Langsmith provides automated and intelligent testing capabilities that can improve the efficiency and effectiveness of the testing process. It can help testers in generating test cases, automating test execution, analyzing coverage, and identifying bugs, ultimately leading to better quality software.'
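
To picture what the `|` operator does in `prompt | llm | output_parser`, here is a toy sketch in plain Python. It is not LangChain's implementation: the `Step` class and the `fake_llm` stand-in are hypothetical names used only for illustration, and no API call is made.

```python
from typing import Any, Callable


class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that runs a, then feeds its output to b
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value: Any) -> Any:
        return self.fn(value)


# Hypothetical stages mirroring prompt -> model -> parser
prompt = Step(lambda d: f"system: You are a technical writer.\nuser: {d['input']}")
fake_llm = Step(lambda text: {"content": f"(model reply to) {text.splitlines()[-1]}"})
output_parser = Step(lambda message: message["content"])

chain = prompt | fake_llm | output_parser
result = chain.invoke({"input": "how can langsmith help with testing?"})
print(result)
```

Each stage only needs to accept the previous stage's output, which is why the parser can be appended without changing the prompt or the model.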

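Running the LLM chain shows that its answer about LangSmith is not very accurate. An LLM is trained on data up to some cutoff date, so it cannot see anything newer.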

LangChain uses retrieval to supply additional context to the LLM, which improves the accuracy of its answers.

Retrieval Chain

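We will look up relevant documents with a retriever and then pass them into the prompt. A retriever can be backed by anything (a SQL table, the internet, and so on), but in this example we populate a vector store and use it as the retriever.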
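
As a rough illustration of "a retriever can be backed by anything", the sketch below uses a plain Python list and word overlap instead of a vector store. The `NOTES` list and `keyword_retriever` function are hypothetical and are not part of LangChain.

```python
from typing import List

# Hypothetical in-memory "knowledge base"; any backing store would do
NOTES = [
    "LangSmith lets you visualize test results.",
    "FAISS is a library for vector similarity search.",
    "LangServe deploys chains as a REST API.",
]


def keyword_retriever(query: str, k: int = 1) -> List[str]:
    """Return the k notes sharing the most lowercase words with the query."""
    query_words = set(query.lower().split())

    def overlap(note: str) -> int:
        return len(query_words & set(note.lower().rstrip(".").split()))

    return sorted(NOTES, key=overlap, reverse=True)[:k]


question = "how can langsmith help with test results"
context = keyword_retriever(question)
# The retrieved text is placed in the prompt next to the question
prompt_text = f"Context: {context[0]}\nQuestion: {question}"
print(prompt_text)
```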

pip install beautifulsoup4

from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://docs.smith.langchain.com/overview")

docs = loader.load()

docs now holds the content of the web page

[Document(page_content='\n\n\n\n\nLangSmith Overview and User Guide | 🦜️🛠️ LangSmith\n\n\n\n\n\nSkip to main content🦜️🛠️ LangSmith DocsPython...', 'language': 'en'})]

Load the OpenAI embeddings model

from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

Use this embedding model to ingest the documents into a vector store. Here we use FAISS, a local vector store.

pip install faiss-cpu
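
To make the idea of a vector store concrete, here is a toy sketch: each text is turned into a vector, and a query returns the stored texts whose vectors are closest by cosine similarity. The word-count `embed` function and the `ToyVectorStore` class are assumptions made for illustration. Real embedding models return dense float vectors, and FAISS uses optimized index structures rather than a linear scan.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: word counts keyed by word."""
    return dict(Counter(text.lower().split()))


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    def __init__(self, texts: List[str]) -> None:
        # Store each text next to its vector
        self.entries: List[Tuple[str, Dict[str, float]]] = [(t, embed(t)) for t in texts]

    def similarity_search(self, query: str, k: int = 1) -> List[str]:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore([
    "langsmith helps you debug and test chains",
    "faiss performs fast similarity search over vectors",
])
hits = store.similarity_search("how to test chains", k=1)
print(hits)
```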

Build the index, storing the document data in the vector database

from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter


text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
vector = FAISS.from_documents(documents, embeddings)
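
The splitter exists because a whole web page is usually too long to embed as one unit. A minimal sketch of the idea, assuming fixed-size character windows with overlap: this is not the recursive algorithm itself, which (as I understand it) also prefers to break on separators such as paragraph and line breaks. The `split_text` helper and its small default sizes are hypothetical.

```python
from typing import List


def split_text(text: str, chunk_size: int = 20, chunk_overlap: int = 5) -> List[str]:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks


chunks = split_text("LangSmith lets you debug, test and monitor chains.")
for chunk in chunks:
    print(repr(chunk))
```

The overlap means a sentence cut at a chunk boundary still appears intact in at least one neighboring chunk.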

Create a retrieval chain

from langchain.chains.combine_documents import create_stuff_documents_chain

prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")

document_chain = create_stuff_documents_chain(llm, prompt)

create_stuff_documents_chain validates the prompt, which must accept "context" as an input variable:

DOCUMENTS_KEY = "context"
DEFAULT_DOCUMENT_PROMPT = PromptTemplate.from_template("{page_content}")


def _validate_prompt(prompt: BasePromptTemplate) -> None:
    if DOCUMENTS_KEY not in prompt.input_variables:
        raise ValueError(
            f"Prompt must accept {DOCUMENTS_KEY} as an input variable. Received prompt "
            f"with input variables: {prompt.input_variables}"
        )

Test document_chain by passing in the context directly

from langchain_core.documents import Document
document_chain.invoke({
    "input": "how can langsmith help with testing?",
    "context": [Document(page_content="langsmith can let you visualize test results")]
})
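
Conceptually, "stuffing" means joining the text of every document and placing the result in the prompt's {context} slot. The sketch below imitates that with plain string formatting. The `stuff_documents` helper is hypothetical, and the blank-line separator is an assumption for illustration, not a statement about the library's internals.

```python
from string import Formatter
from typing import List

TEMPLATE = (
    "Answer the following question based only on the provided context:\n\n"
    "<context>\n{context}\n</context>\n\n"
    "Question: {input}"
)


def stuff_documents(template: str, question: str, page_contents: List[str]) -> str:
    """Join all document texts and place them in the template's {context} slot."""
    # Collect the placeholder names used by the template
    fields = {name for _, name, _, _ in Formatter().parse(template) if name}
    if "context" not in fields:
        raise ValueError(f"Prompt must accept context as an input variable, got {sorted(fields)}")
    return template.format(context="\n\n".join(page_contents), input=question)


final_prompt = stuff_documents(
    TEMPLATE,
    "how can langsmith help with testing?",
    ["langsmith can let you visualize test results"],
)
print(final_prompt)
```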

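So far document_chain has not used the vector database. We need to create a retrieval chain, which takes an incoming question, looks up relevant documents, then passes those documents along with the original question to the LLM and asks it to answer the original question.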

from langchain.chains import create_retrieval_chain

retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)

Invoke retrieval_chain

response = retrieval_chain.invoke({"input": "how can langsmith help with testing?"})
print(response["answer"])
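
The response is a dictionary, which is why the answer is read from response["answer"]. As I understand it, the original input and the retrieved documents travel alongside the answer. The toy pipeline below shows that flow with stand-in functions. `make_retrieval_chain`, `fake_retrieve`, and `fake_answer` are hypothetical names, and no model or vector store is involved.

```python
from typing import Callable, Dict, List


def make_retrieval_chain(
    retrieve: Callable[[str], List[str]],
    answer: Callable[[Dict], str],
) -> Callable[[Dict], Dict]:
    """Toy pipeline: look up context for the input, then answer using both."""

    def invoke(inputs: Dict) -> Dict:
        state = dict(inputs)
        state["context"] = retrieve(state["input"])  # step 1: fetch documents
        state["answer"] = answer(state)              # step 2: answer from them
        return state

    return invoke


def fake_retrieve(question: str) -> List[str]:
    # Stand-in for the vector-store retriever
    return ["LangSmith lets you visualize test results."]


def fake_answer(state: Dict) -> str:
    # Stand-in for the document chain (prompt + LLM)
    return f"Based on {len(state['context'])} document(s): {state['context'][0]}"


toy_chain = make_retrieval_chain(fake_retrieve, fake_answer)
response = toy_chain({"input": "how can langsmith help with testing?"})
print(response["answer"])
```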

Conversation Retrieval Chain

Agent

from langchain.tools.retriever import create_retriever_tool

retriever_tool = create_retriever_tool(
    retriever,
    "langsmith_search",
    "Search for information about LangSmith. For any questions about LangSmith, you must use this tool!",
)
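
A retriever tool bundles a callable with a name and a description, and the description is what the model reads when deciding whether to call it. The sketch below is a loose illustration of that shape and of dispatching a call by name. `ToyTool` and `run_tool` are hypothetical and do not reflect how LangChain agents are implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ToyTool:
    name: str
    description: str  # shown to the model so it can decide when to call the tool
    func: Callable[[str], str]


def run_tool(tools: Dict[str, ToyTool], tool_name: str, query: str) -> str:
    """Dispatch a tool call by name, as an agent loop might after the model picks one."""
    if tool_name not in tools:
        raise KeyError(f"Unknown tool: {tool_name}")
    return tools[tool_name].func(query)


search = ToyTool(
    name="langsmith_search",
    description="Search for information about LangSmith.",
    func=lambda q: f"(documents matching: {q})",
)
registry = {search.name: search}
observation = run_tool(registry, "langsmith_search", "testing")
print(observation)
```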

Building a chatbot

Retriever
