python - How to use LangChain's get_relevant_documents method to retrieve source documents only when the answer comes from the custom knowledge base

hjqgdpho · asked 5 months ago · Python

I'm building a chatbot that accesses an external knowledge base, docs. I want to get the relevant documents the bot consulted for its answer, but not when the user input is "hi", "how are you", "what is 2+2", or anything else whose answer is not retrieved from the external knowledge base docs. In that case I'd like retriever.get_relevant_documents(query), or some other line, to return an empty list or something similar.

import os
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import ConversationalRetrievalChain 
from langchain.memory import ConversationBufferMemory
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

os.environ['OPENAI_API_KEY'] = ''

custom_template = """
This is conversation with a human. Answer the questions you get based on the knowledge you have.
If you don't know the answer, just say that you don't, don't try to make up an answer.
Chat History:
{chat_history}
Follow Up Input: {question}
"""
CUSTOM_QUESTION_PROMPT = PromptTemplate.from_template(custom_template)

llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",  # Name of the language model
    temperature=0  # Parameter that controls the randomness of the generated responses
)

embeddings = OpenAIEmbeddings()

docs = [
    "Buildings are made out of brick",
    "Buildings are made out of wood",
    "Buildings are made out of stone",
    "Buildings are made out of atoms",
    "Buildings are made out of building materials",
    "Cars are made out of metal",
    "Cars are made out of plastic",
]

vectorstore = FAISS.from_texts(docs, embeddings)

retriever = vectorstore.as_retriever()

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever,
    condense_question_prompt=CUSTOM_QUESTION_PROMPT,
    memory=memory
)

query = "what are cars made of?"
result = qa({"question": query})
print(result)
print(retriever.get_relevant_documents(query))

I tried setting a score threshold on the retriever, but I still get relevant documents back with high similarity scores for queries that shouldn't match, while on other user prompts that do have relevant documents I get no relevant documents at all.

retriever = vectorstore.as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": .9})
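A debugging sketch (not from the original post): FAISS's similarity_search_with_score returns the raw index score for each hit, which for the default L2 index is a distance where lower means more similar, while the similarity_score_threshold search type filters on a normalized relevance score in [0, 1] where higher means more similar. Printing both for a knowledge-base question and for small talk makes it easier to pick a threshold that actually separates the two:

# Sketch: inspect the scores behind the retriever before choosing a threshold.
# Assumes the vectorstore built above; the query strings are just examples.
for query in ["what are cars made of?", "hello, how are you?"]:
    print(f"--- {query} ---")
    # Raw FAISS scores: L2 distances by default, so LOWER = more similar.
    for doc, score in vectorstore.similarity_search_with_score(query, k=2):
        print(f"distance={score:.3f}  {doc.page_content}")
    # Normalized relevance scores in [0, 1], HIGHER = more similar;
    # this is what score_threshold compares against.
    for doc, score in vectorstore.similarity_search_with_relevance_scores(query, k=2):
        print(f"relevance={score:.3f}  {doc.page_content}")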

neekobn8 · #1

You need to add the return_source_documents parameter to the chain, like this:

qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever,
    condense_question_prompt=CUSTOM_QUESTION_PROMPT,
    memory=memory,
    return_source_documents=True
)

query = "what are cars made of?"
result = qa({"question": query})

The result will then include the source documents that were used for the answer. To get the answer and all the relevant documents:

answer = result.get("answer")

docs = result.get("source_documents", [])
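For example (a small usage sketch on top of the chain above), print the answer and each retrieved text, falling back to a message when the list is empty:

# Usage sketch: show the answer together with the retrieved source texts.
print(answer)
if docs:
    for doc in docs:
        print("source:", doc.page_content)
else:
    print("no source documents were returned")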

yc0p9oo0 · #2

To solve this, I had to switch the chain type to RetrievalQA and introduce an agent with a tool.

import os
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.memory import ConversationBufferMemory
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.agents import Tool, initialize_agent
from langchain.agents.types import AgentType

os.environ['OPENAI_API_KEY'] = ''

system_message = """
"You are the XYZ bot."
"This is conversation with a human. Answer the questions you get based on the knowledge you have."
"If you don't know the answer, just say that you don't, don't try to make up an answer."
"""

llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",  # Name of the language model
    temperature=0  # Parameter that controls the randomness of the generated responses
)

embeddings = OpenAIEmbeddings()

docs = [
    "Buildings are made out of brick",
    "Buildings are made out of wood",
    "Buildings are made out of stone",
    "Buildings are made out of atoms",
    "Buildings are made out of building materials",
    "Cars are made out of metal",
    "Cars are made out of plastic",
]

vectorstore = FAISS.from_texts(docs, embeddings)

retriever = vectorstore.as_retriever()

memory = ConversationBufferMemory(memory_key="chat_history", input_key='input', return_messages=True, output_key='output')

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=True,
    return_source_documents=True
)

tools = [
    Tool(
        name="doc_search_tool",
        func=qa,
        description="This tool is used to retrieve information from the knowledge base"
    )
]

agent = initialize_agent(
    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
    tools=tools,
    llm=llm,
    memory=memory,
    return_source_documents=True,
    return_intermediate_steps=True,
    agent_kwargs={"system_message": system_message}
)

query1 = "what are buildings made of?"
result1 = agent(query1)

query2 = "who are you?"
result2 = agent(query2)

If the result consulted the sources, it will have a value under the key "intermediate_steps", and the source documents can then be accessed via result1["intermediate_steps"][0][1]["source_documents"].
Otherwise, when the query doesn't need the sources, result2["intermediate_steps"] will be empty.
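Building on that (my own sketch; the key names follow the answer above), a small helper that returns an empty list whenever the agent never called the tool:

def get_source_documents(result):
    # intermediate_steps is only populated when the agent used the tool,
    # so small talk like "who are you?" yields an empty list here.
    steps = result.get("intermediate_steps", [])
    if not steps:
        return []
    # Each step is an (AgentAction, observation) pair; the observation is
    # the RetrievalQA output dict, which carries the source documents.
    return steps[0][1].get("source_documents", [])

print(get_source_documents(result1))  # documents about buildings
print(get_source_documents(result2))  # []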

uxh89sit · #3

Sorry to ask here, I can't add a comment. My question is: when you added the agent, did your answers become shorter? Were you able to fix that?

bfhwhh0e · #4

This helped me get the URLs of the source documents:

# Print the metadata of every returned source document.
for x in range(len(response["source_documents"])):
    print(response["source_documents"][x].metadata)

And this version formats it a bit better:

for x in range(len(response["source_documents"])):
    raw_dict = response["source_documents"][x].metadata
    print("Page number:", raw_dict['page'], "Filename:", raw_dict['source'])
