import html
import logging
import threading
from collections import defaultdict
from difflib import SequenceMatcher
from functools import partial
from typing import Generator

import numpy as np
import tiktoken
from ktem.llms.manager import llms
from ktem.reasoning.prompt_optimization import (
    DecomposeQuestionPipeline,
    RewriteQuestionPipeline,
)
from ktem.utils.render import Render
from theflow.settings import settings as flowsettings

from kotaemon.base import (
    AIMessage,
    BaseComponent,
    Document,
    HumanMessage,
    Node,
    RetrievedDocument,
    SystemMessage,
)
from kotaemon.indices.qa.citation import CitationPipeline
from kotaemon.indices.splitters import TokenSplitter
from kotaemon.llms import ChatLLM, PromptTemplate

from ..utils import SUPPORTED_LANGUAGE_MAP
from .base import BaseReasoning

logger = logging.getLogger(__name__)

EVIDENCE_MODE_TEXT = 0
EVIDENCE_MODE_TABLE = 1
EVIDENCE_MODE_CHATBOT = 2
EVIDENCE_MODE_FIGURE = 3
MAX_IMAGES = 10
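
# These evidence modes select which QA prompt template is used in
# AnswerWithContextPipeline.get_prompt(); MAX_IMAGES caps how many retrieved
# figure images are attached to the multimodal message in stream().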


def find_text(search_span, context):
    sentence_list = search_span.split("\n")
    matches = []
    # don't search for small text
    if len(search_span) > 5:
        for sentence in sentence_list:
            match = SequenceMatcher(
                None, sentence, context, autojunk=False
            ).find_longest_match()
            if match.size > len(sentence) * 0.35:
                matches.append((match.b, match.b + match.size))

    return matches
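
# Illustrative example: find_text("quick brown fox", "the quick brown fox jumps")
# returns [(4, 19)], i.e. the (start, end) offsets of the longest fuzzy match
# of each sentence found inside `context`.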


class PrepareEvidencePipeline(BaseComponent):
    """Prepare the evidence text from the list of retrieved documents

    This step usually happens after `DocumentRetrievalPipeline`.

    Args:
        trim_func: a callback function or a BaseComponent that splits a large
            chunk of text into smaller ones. The first one will be retained.
    """

    max_context_length: int = 32000
    trim_func: TokenSplitter | None = None

    def run(self, docs: list[RetrievedDocument]) -> Document:
        evidence = ""
        images = []
        table_found = 0
        evidence_modes = []

        evidence_trim_func = (
            self.trim_func
            if self.trim_func
            else TokenSplitter(
                chunk_size=self.max_context_length,
                chunk_overlap=0,
                separator=" ",
                tokenizer=partial(
                    tiktoken.encoding_for_model("gpt-3.5-turbo").encode,
                    allowed_special=set(),
                    disallowed_special="all",
                ),
            )
        )

        for _id, retrieved_item in enumerate(docs):
            retrieved_content = ""
            page = retrieved_item.metadata.get("page_label", None)
            source = filename = retrieved_item.metadata.get("file_name", "-")
            if page:
                source += f" (Page {page})"
            if retrieved_item.metadata.get("type", "") == "table":
                evidence_modes.append(EVIDENCE_MODE_TABLE)
                if table_found < 5:
                    retrieved_content = retrieved_item.metadata.get(
                        "table_origin", retrieved_item.text
                    )
                    if retrieved_content not in evidence:
                        table_found += 1
                        evidence += (
                            f"<br><b>Table from {source}</b>\n"
                            + retrieved_content
                            + "\n<br>"
                        )
            elif retrieved_item.metadata.get("type", "") == "chatbot":
                evidence_modes.append(EVIDENCE_MODE_CHATBOT)
                retrieved_content = retrieved_item.metadata["window"]
                evidence += (
                    f"<br><b>Chatbot scenario from {filename} (Row {page})</b>\n"
                    + retrieved_content
                    + "\n<br>"
                )
            elif retrieved_item.metadata.get("type", "") == "image":
                evidence_modes.append(EVIDENCE_MODE_FIGURE)
                retrieved_content = retrieved_item.metadata.get("image_origin", "")
                retrieved_caption = html.escape(retrieved_item.get_content())
                evidence += (
                    f"<br><b>Figure from {source}</b>\n"
                    + "<img width='85%' src='<src>' "
                    + f"alt='{retrieved_caption}'/>"
                    + "\n<br>"
                )
                images.append(retrieved_content)
            else:
                if "window" in retrieved_item.metadata:
                    retrieved_content = retrieved_item.metadata["window"]
                else:
                    retrieved_content = retrieved_item.text
                retrieved_content = retrieved_content.replace("\n", " ")
                if retrieved_content not in evidence:
                    evidence += (
                        f"<br><b>Content from {source}: </b> "
                        + retrieved_content
                        + " \n<br>"
                    )

        # resolve evidence mode
        evidence_mode = EVIDENCE_MODE_TEXT
        if EVIDENCE_MODE_FIGURE in evidence_modes:
            evidence_mode = EVIDENCE_MODE_FIGURE
        elif EVIDENCE_MODE_TABLE in evidence_modes:
            evidence_mode = EVIDENCE_MODE_TABLE

        # trim the evidence to fit within max_context_length
        print("len (original)", len(evidence))
        if evidence:
            texts = evidence_trim_func([Document(text=evidence)])
            evidence = texts[0].text
            print("len (trimmed)", len(evidence))

        return Document(content=(evidence_mode, evidence, images))
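
# The packed (evidence_mode, evidence, images) tuple is unpacked downstream,
# e.g. in FullQAPipeline.stream():
#     evidence_mode, evidence, images = self.evidence_pipeline(docs).content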


DEFAULT_QA_TEXT_PROMPT = (
    "Use the following pieces of context to answer the question at the end in detail with clear explanation. "  # noqa: E501
    "If you don't know the answer, just say that you don't know, don't try to "
    "make up an answer. Give answer in "
    "{lang}.\n\n"
    "{context}\n"
    "Question: {question}\n"
    "Helpful Answer:"
)

DEFAULT_QA_TABLE_PROMPT = (
    "Use the given context: texts, tables, and figures below to answer the question, "
    "then provide answer with clear explanation. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer. Give answer in {lang}.\n\n"
    "Context:\n"
    "{context}\n"
    "Question: {question}\n"
    "Helpful Answer:"
)  # noqa

DEFAULT_QA_CHATBOT_PROMPT = (
    "Pick the most suitable chatbot scenarios to answer the question at the end, "
    "output the provided answer text. If you don't know the answer, "
    "just say that you don't know. Keep the answer as concise as possible. "
    "Give answer in {lang}.\n\n"
    "Context:\n"
    "{context}\n"
    "Question: {question}\n"
    "Answer:"
)  # noqa

DEFAULT_QA_FIGURE_PROMPT = (
    "Use the given context: texts, tables, and figures below to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "Give answer in {lang}.\n\n"
    "Context: \n"
    "{context}\n"
    "Question: {question}\n"
    "Answer: "
)  # noqa

DEFAULT_REWRITE_PROMPT = (
    "Given the following question, rephrase and expand it "
    "to help you do better answering. Maintain all information "
    "in the original question. Keep the question as concise as possible. "
    "Give answer in {lang}\n"
    "Original question: {question}\n"
    "Rephrased question: "
)  # noqa

CONTEXT_RELEVANT_WARNING_SCORE = 0.7
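
# CONTEXT_RELEVANT_WARNING_SCORE is compared against the best llm_trulens_score
# in FullQAPipeline.show_citations(); below this threshold a low-relevance
# warning is emitted to the info channel.
#
# Illustrative use of the templates above (placeholder names match the fields):
#     PromptTemplate(DEFAULT_QA_TEXT_PROMPT).populate(
#         context=evidence, question=question, lang="English"
#     )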


class AnswerWithContextPipeline(BaseComponent):
    """Answer the question based on the evidence

    Args:
        llm: the language model to generate the answer
        citation_pipeline: generates citation from the evidence
        qa_template: the prompt template for the LLM to generate an answer (refer to
            evidence_mode)
        qa_table_template: the prompt template for the LLM to generate an answer for
            tables (refer to evidence_mode)
        qa_chatbot_template: the prompt template for the LLM to generate an answer for
            pre-made scenarios (refer to evidence_mode)
        lang: the language of the answer. Currently supports English and Japanese
    """

    llm: ChatLLM = Node(default_callback=lambda _: llms.get_default())
    vlm_endpoint: str = getattr(flowsettings, "KH_VLM_ENDPOINT", "")
    use_multimodal: bool = getattr(flowsettings, "KH_REASONINGS_USE_MULTIMODAL", True)
    citation_pipeline: CitationPipeline = Node(
        default_callback=lambda _: CitationPipeline(llm=llms.get_default())
    )

    qa_template: str = DEFAULT_QA_TEXT_PROMPT
    qa_table_template: str = DEFAULT_QA_TABLE_PROMPT
    qa_chatbot_template: str = DEFAULT_QA_CHATBOT_PROMPT
    qa_figure_template: str = DEFAULT_QA_FIGURE_PROMPT

    enable_citation: bool = False
    system_prompt: str = ""
    lang: str = "English"  # supports English and Japanese
    n_last_interactions: int = 5

    def get_prompt(self, question, evidence, evidence_mode: int):
        """Prepare the prompt and other information for LLM"""
        if evidence_mode == EVIDENCE_MODE_TEXT:
            prompt_template = PromptTemplate(self.qa_template)
        elif evidence_mode == EVIDENCE_MODE_TABLE:
            prompt_template = PromptTemplate(self.qa_table_template)
        elif evidence_mode == EVIDENCE_MODE_FIGURE:
            if self.use_multimodal:
                prompt_template = PromptTemplate(self.qa_figure_template)
            else:
                prompt_template = PromptTemplate(self.qa_template)
        else:
            prompt_template = PromptTemplate(self.qa_chatbot_template)

        prompt = prompt_template.populate(
            context=evidence,
            question=question,
            lang=self.lang,
        )

        return prompt, evidence

    def run(
        self, question: str, evidence: str, evidence_mode: int = 0, **kwargs
    ) -> Document:
        return self.invoke(question, evidence, evidence_mode, **kwargs)

    def invoke(
        self,
        question: str,
        evidence: str,
        evidence_mode: int = 0,
        images: list[str] = [],
        **kwargs,
    ) -> Document:
        raise NotImplementedError

    async def ainvoke(  # type: ignore
        self,
        question: str,
        evidence: str,
        evidence_mode: int = 0,
        images: list[str] = [],
        **kwargs,
    ) -> Document:
        """Answer the question based on the evidence

        In addition to the question and the evidence, this method also takes
        evidence_mode into account. The evidence_mode tells which kind of evidence
        it is. The kind of evidence affects:
        1. How the evidence is represented.
        2. The prompt to generate the answer.

        By default, the evidence_mode is 0, which means the evidence is plain text with
        no particular semantic representation. The evidence_mode can be:
        1. "table": There will be HTML markup telling that there is a table
            within the evidence.
        2. "chatbot": There will be HTML markup telling that there is a chatbot.
            This chatbot is a scenario, extracted from an Excel file, where each
            row corresponds to an interaction.

        Args:
            question: the original question posed by the user
            evidence: the text that contains relevant information to answer the
                question (determined by the retrieval pipeline)
            evidence_mode: the mode of evidence, 0 for text, 1 for table, 2 for chatbot
        """
        raise NotImplementedError

    def stream(  # type: ignore
        self,
        question: str,
        evidence: str,
        evidence_mode: int = 0,
        images: list[str] = [],
        **kwargs,
    ) -> Generator[Document, None, Document]:
        history = kwargs.get("history", [])
        print(f"Got {len(images)} images")
        # if evidence exists, use the QA prompt
        if evidence:
            prompt, evidence = self.get_prompt(question, evidence, evidence_mode)
        else:
            prompt = question

        # retrieve the citation
        citation = None

        def citation_call():
            nonlocal citation
            citation = self.citation_pipeline(context=evidence, question=question)

        if evidence and self.enable_citation:
            # execute function call in thread
            citation_thread = threading.Thread(target=citation_call)
            citation_thread.start()
        else:
            citation_thread = None

        output = ""
        logprobs = []

        messages = []
        if self.system_prompt:
            messages.append(SystemMessage(content=self.system_prompt))
        for human, ai in history[-self.n_last_interactions :]:
            messages.append(HumanMessage(content=human))
            messages.append(AIMessage(content=ai))

        if self.use_multimodal and evidence_mode == EVIDENCE_MODE_FIGURE:
            # create image message:
            messages.append(
                HumanMessage(
                    content=[
                        {"type": "text", "text": prompt},
                    ]
                    + [
                        {
                            "type": "image_url",
                            "image_url": {"url": image},
                        }
                        for image in images[:MAX_IMAGES]
                    ],
                )
            )
        else:
            # append main prompt
            messages.append(HumanMessage(content=prompt))

        try:
            # try streaming first
            print("Trying LLM streaming")
            for out_msg in self.llm.stream(messages):
                output += out_msg.text
                logprobs += out_msg.logprobs
                yield Document(channel="chat", content=out_msg.text)
        except NotImplementedError:
            print("Streaming is not supported, falling back to normal processing")
            output = self.llm(messages).text
            yield Document(channel="chat", content=output)

        if logprobs:
            qa_score = np.exp(np.average(logprobs))
        else:
            qa_score = None

        if citation_thread:
            citation_thread.join()
        answer = Document(
            text=output,
            metadata={"citation": citation, "qa_score": qa_score},
        )

        return answer
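
    # Note: qa_score above is exp(mean(logprobs)), i.e. the geometric mean of the
    # token probabilities reported by the LLM. Citation extraction runs in a
    # background thread and is joined only after streaming finishes, so it does
    # not block token output.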


class AddQueryContextPipeline(BaseComponent):

    n_last_interactions: int = 5
    llm: ChatLLM = Node(default_callback=lambda _: llms.get_default())

    def run(self, question: str, history: list) -> Document:
        messages = [
            SystemMessage(
                content="Below is a history of the conversation so far, and a new "
                "question asked by the user that needs to be answered by searching "
                "in a knowledge base.\nYou have access to a Search index "
                "with 100's of documents.\nGenerate a search query based on the "
                "conversation and the new question.\nDo not include cited source "
                "filenames and document names e.g info.txt or doc.pdf in the search "
                "query terms.\nDo not include any text inside [] or <<>> in the "
                "search query terms.\nDo not include any special characters like "
                "'+'.\nIf the question is not in English, rewrite the query in "
                "the language used in the question.\n If the question contains enough "
                "information, return just the number 1\n If it's unnecessary to do "
                "the searching, return just the number 0."
            ),
            HumanMessage(content="How did crypto do last year?"),
            AIMessage(
                content="Summarize Cryptocurrency Market Dynamics from last year"
            ),
            HumanMessage(content="What are my health plans?"),
            AIMessage(content="Show available health plans"),
        ]
        for human, ai in history[-self.n_last_interactions :]:
            messages.append(HumanMessage(content=human))
            messages.append(AIMessage(content=ai))

        messages.append(HumanMessage(content=f"Generate search query for: {question}"))

        resp = self.llm(messages).text
        if resp == "0":
            return Document(content="")

        if resp == "1":
            return Document(content=question)

        return Document(content=resp)
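
    # Contract with the system prompt above: the LLM replies "0" when no search
    # is needed, "1" when the question can be used as the query verbatim, or the
    # rewritten search query otherwise -- hence the literal checks on `resp`.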


class FullQAPipeline(BaseReasoning):
    """Question answering pipeline. Handles everything from question to answer"""

    class Config:
        allow_extra = True

    # configuration parameters
    trigger_context: int = 150
    use_rewrite: bool = False

    retrievers: list[BaseComponent]

    evidence_pipeline: PrepareEvidencePipeline = PrepareEvidencePipeline.withx()
    answering_pipeline: AnswerWithContextPipeline = AnswerWithContextPipeline.withx()
    rewrite_pipeline: RewriteQuestionPipeline | None = None
    add_query_context: AddQueryContextPipeline = AddQueryContextPipeline.withx()

    def retrieve(
        self, message: str, history: list
    ) -> tuple[list[RetrievedDocument], list[Document]]:
        """Retrieve the documents based on the message"""
        # if len(message) < self.trigger_context:
        #     # prefer adding context for short user questions, avoid adding context for
        #     # long questions, as they are likely to contain enough information
        #     # plus, avoid the situation where the original message is already too long
        #     # for the model to handle
        #     query = self.add_query_context(message, history).content
        # else:
        #     query = message
        # print(f"Rewritten query: {query}")
        query = None
        if not query:
            # TODO: previously returned [], [] because we treated this message as
            # something like "Hello", "I need help"...
            query = message

        docs, doc_ids = [], []
        plot_docs = []

        for idx, retriever in enumerate(self.retrievers):
            retriever_node = self._prepare_child(retriever, f"retriever_{idx}")
            retriever_docs = retriever_node(text=query)

            retriever_docs_text = []
            retriever_docs_plot = []

            for doc in retriever_docs:
                if doc.metadata.get("type", "") == "plot":
                    retriever_docs_plot.append(doc)
                else:
                    retriever_docs_text.append(doc)

            for doc in retriever_docs_text:
                if doc.doc_id not in doc_ids:
                    docs.append(doc)
                    doc_ids.append(doc.doc_id)

            plot_docs.extend(retriever_docs_plot)

        info = [
            Document(
                channel="info",
                content=Render.collapsible_with_header(doc, open_collapsible=True),
            )
            for doc in docs
        ] + [
            Document(
                channel="plot",
                content=doc.metadata.get("data", ""),
            )
            for doc in plot_docs
        ]

        return docs, info
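
    # retrieve() deduplicates documents across retrievers by doc_id and routes
    # "plot"-typed documents to the "plot" channel instead of the evidence list.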

    def prepare_citations(self, answer, docs) -> tuple[list[Document], list[Document]]:
        """Prepare the citations to show on the UI"""
        with_citation, without_citation = [], []
        spans = defaultdict(list)
        has_llm_score = any("llm_trulens_score" in doc.metadata for doc in docs)

        if answer.metadata["citation"] and answer.metadata["citation"].answer:
            for fact_with_evidence in answer.metadata["citation"].answer:
                for quote in fact_with_evidence.substring_quote:
                    matched_excerpts = []
                    for doc in docs:
                        matches = find_text(quote, doc.text)

                        for start, end in matches:
                            if "|" not in doc.text[start:end]:
                                spans[doc.doc_id].append(
                                    {
                                        "start": start,
                                        "end": end,
                                    }
                                )
                                matched_excerpts.append(doc.text[start:end])

                    print("Matched citation:", quote, matched_excerpts)

        id2docs = {doc.doc_id: doc for doc in docs}
        not_detected = set(id2docs.keys()) - set(spans.keys())

        # render highlight spans
        for _id, ss in spans.items():
            if not ss:
                not_detected.add(_id)
                continue
            cur_doc = id2docs[_id]
            highlight_text = ""

            ss = sorted(ss, key=lambda x: x["start"])
            text = cur_doc.text[: ss[0]["start"]]
            for idx, span in enumerate(ss):
                to_highlight = cur_doc.text[span["start"] : span["end"]]
                if len(to_highlight) > len(highlight_text):
                    highlight_text = to_highlight
                text += Render.highlight(to_highlight)
                if idx < len(ss) - 1:
                    text += cur_doc.text[span["end"] : ss[idx + 1]["start"]]
            text += cur_doc.text[ss[-1]["end"] :]
            # add to display list
            with_citation.append(
                Document(
                    channel="info",
                    content=Render.collapsible_with_header_score(
                        cur_doc,
                        override_text=text,
                        highlight_text=highlight_text,
                        open_collapsible=True,
                    ),
                )
            )

        print("Got {} cited docs".format(len(with_citation)))

        sorted_not_detected_items_with_scores = [
            (id_, id2docs[id_].metadata.get("llm_trulens_score", 0.0))
            for id_ in not_detected
        ]
        sorted_not_detected_items_with_scores.sort(key=lambda x: x[1], reverse=True)

        for id_, _ in sorted_not_detected_items_with_scores:
            doc = id2docs[id_]
            doc_score = doc.metadata.get("llm_trulens_score", 0.0)
            is_open = not has_llm_score or (
                doc_score > CONTEXT_RELEVANT_WARNING_SCORE and len(with_citation) == 0
            )
            without_citation.append(
                Document(
                    channel="info",
                    content=Render.collapsible_with_header_score(
                        doc, open_collapsible=is_open
                    ),
                )
            )
        return with_citation, without_citation
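
    # Cited documents are rendered with their matched quotes highlighted and are
    # opened by default; documents without a matched citation are sorted by
    # llm_trulens_score so the most relevant uncited evidence appears first.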

    def show_citations(self, answer, docs):
        # show the evidence
        with_citation, without_citation = self.prepare_citations(answer, docs)
        if not with_citation and not without_citation:
            yield Document(channel="info", content="<h5><b>No evidence found.</b></h5>")
        else:
            # clear the Info panel
            max_llm_rerank_score = max(
                doc.metadata.get("llm_trulens_score", 0.0) for doc in docs
            )
            has_llm_score = any("llm_trulens_score" in doc.metadata for doc in docs)
            # clear previous info
            yield Document(channel="info", content=None)

            # yield warning message
            if has_llm_score and max_llm_rerank_score < CONTEXT_RELEVANT_WARNING_SCORE:
                yield Document(
                    channel="info",
                    content=(
                        "<h5>WARNING! Context relevance score is low. "
                        "Double check the model answer for correctness.</h5>"
                    ),
                )

            # show QA score
            qa_score = (
                round(answer.metadata["qa_score"], 2)
                if answer.metadata.get("qa_score")
                else None
            )
            if qa_score:
                yield Document(
                    channel="info",
                    content=f"<h5>Answer confidence: {qa_score}</h5>",
                )

            yield from with_citation
            if without_citation:
                yield from without_citation

    async def ainvoke(  # type: ignore
        self, message: str, conv_id: str, history: list, **kwargs  # type: ignore
    ) -> Document:  # type: ignore
        raise NotImplementedError

    def stream(  # type: ignore
        self, message: str, conv_id: str, history: list, **kwargs  # type: ignore
    ) -> Generator[Document, None, Document]:
        if self.use_rewrite and self.rewrite_pipeline:
            print("Chosen rewrite pipeline", self.rewrite_pipeline)
            message = self.rewrite_pipeline(question=message).text
            print("Rewrite result", message)

        print(f"Retrievers {self.retrievers}")
        # should populate the context
        docs, infos = self.retrieve(message, history)
        print(f"Got {len(docs)} retrieved documents")
        yield from infos

        evidence_mode, evidence, images = self.evidence_pipeline(docs).content

        def generate_relevant_scores():
            nonlocal docs
            docs = self.retrievers[0].generate_relevant_scores(message, docs)

        # generate relevance scores using the first retriever
        if evidence and self.retrievers:
            scoring_thread = threading.Thread(target=generate_relevant_scores)
            scoring_thread.start()
        else:
            scoring_thread = None

        answer = yield from self.answering_pipeline.stream(
            question=message,
            history=history,
            evidence=evidence,
            evidence_mode=evidence_mode,
            images=images,
            conv_id=conv_id,
            **kwargs,
        )

        # show the evidence
        if scoring_thread:
            scoring_thread.join()

        yield from self.show_citations(answer, docs)

        return answer
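
    # `yield from self.answering_pipeline.stream(...)` forwards streamed chunks to
    # the caller and captures the generator's return value as `answer`; relevance
    # scoring runs concurrently in `scoring_thread` and is joined before the
    # citations are rendered.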

    @classmethod
    def get_pipeline(cls, settings, states, retrievers):
        """Get the reasoning pipeline

        Args:
            settings: the settings for the pipeline
            retrievers: the retrievers to use
        """
        max_context_length_setting = settings.get("reasoning.max_context_length", 32000)

        pipeline = cls(
            retrievers=retrievers,
            rewrite_pipeline=RewriteQuestionPipeline(),
        )

        prefix = f"reasoning.options.{cls.get_info()['id']}"
        llm_name = settings.get(f"{prefix}.llm", None)
        llm = llms.get(llm_name, llms.get_default())

        # prepare evidence pipeline configuration
        evidence_pipeline = pipeline.evidence_pipeline
        evidence_pipeline.max_context_length = max_context_length_setting

        # answering pipeline configuration
        answer_pipeline = pipeline.answering_pipeline
        answer_pipeline.llm = llm
        answer_pipeline.citation_pipeline.llm = llm
        answer_pipeline.n_last_interactions = settings[f"{prefix}.n_last_interactions"]
        answer_pipeline.enable_citation = settings[f"{prefix}.highlight_citation"]
        answer_pipeline.system_prompt = settings[f"{prefix}.system_prompt"]
        answer_pipeline.qa_template = settings[f"{prefix}.qa_prompt"]
        answer_pipeline.lang = SUPPORTED_LANGUAGE_MAP.get(
            settings["reasoning.lang"], "English"
        )

        pipeline.add_query_context.llm = llm
        pipeline.add_query_context.n_last_interactions = settings[
            f"{prefix}.n_last_interactions"
        ]

        pipeline.trigger_context = settings[f"{prefix}.trigger_context"]
        pipeline.use_rewrite = states.get("app", {}).get("regen", False)
        if pipeline.rewrite_pipeline:
            pipeline.rewrite_pipeline.llm = llm
            pipeline.rewrite_pipeline.lang = SUPPORTED_LANGUAGE_MAP.get(
                settings["reasoning.lang"], "English"
            )
        return pipeline

    @classmethod
    def get_user_settings(cls) -> dict:
        from ktem.llms.manager import llms

        llm = ""
        choices = [("(default)", "")]
        try:
            choices += [(_, _) for _ in llms.options().keys()]
        except Exception as e:
            logger.exception(f"Failed to get LLM options: {e}")

        return {
            "llm": {
                "name": "Language model",
                "value": llm,
                "component": "dropdown",
                "choices": choices,
                "special_type": "llm",
                "info": (
                    "The language model to use for generating the answer. If None, "
                    "the application default language model will be used."
                ),
            },
            "highlight_citation": {
                "name": "Highlight Citation",
                "value": True,
                "component": "checkbox",
            },
            "system_prompt": {
                "name": "System Prompt",
                "value": "This is a question answering system",
            },
            "qa_prompt": {
                "name": "QA Prompt (contains {context}, {question}, {lang})",
                "value": DEFAULT_QA_TEXT_PROMPT,
            },
            "n_last_interactions": {
                "name": "Number of interactions to include",
                "value": 5,
                "component": "number",
                "info": "The maximum number of chat interactions to include in the LLM prompt",
            },
            "trigger_context": {
                "name": "Maximum message length for context rewriting",
                "value": 150,
                "component": "number",
                "info": (
                    "The maximum message length that still triggers context addition. "
                    "If the message is longer than this, it is used as is."
                ),
            },
        }

    @classmethod
    def get_info(cls) -> dict:
        return {
            "id": "simple",
            "name": "Simple QA",
            "description": (
                "Simple RAG-based question answering pipeline. This pipeline can "
                "perform both keyword search and similarity search to retrieve the "
                "context. After that it includes that context to generate the answer."
            ),
        }
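
# FullQAPipeline registers itself as the "simple" reasoning option, so its user
# settings are read from keys prefixed with "reasoning.options.simple."
# (see `prefix` in get_pipeline()).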


class FullDecomposeQAPipeline(FullQAPipeline):
    def answer_sub_questions(
        self, messages: list, conv_id: str, history: list, **kwargs
    ):
        output_str = ""
        for idx, message in enumerate(messages):
            yield Document(
                channel="chat",
                content=f"<br><b>Sub-question {idx + 1}</b>"
                f"<br>{message}<br><b>Answer</b><br>",
            )
            # should populate the context
            docs, infos = self.retrieve(message, history)
            print(f"Got {len(docs)} retrieved documents")

            yield from infos

            evidence_mode, evidence, images = self.evidence_pipeline(docs).content
            answer = yield from self.answering_pipeline.stream(
                question=message,
                history=history,
                evidence=evidence,
                evidence_mode=evidence_mode,
                images=images,
                conv_id=conv_id,
                **kwargs,
            )

            output_str += (
                f"Sub-question {idx + 1}-th: '{message}'\nAnswer: '{answer.text}'\n\n"
            )

        return output_str

    def stream(  # type: ignore
        self, message: str, conv_id: str, history: list, **kwargs  # type: ignore
    ) -> Generator[Document, None, Document]:
        sub_question_answer_output = ""
        if self.rewrite_pipeline:
            print("Chosen rewrite pipeline", self.rewrite_pipeline)
            result = self.rewrite_pipeline(question=message)
            print("Rewrite result", result)
            if isinstance(result, Document):
                message = result.text
            elif (
                isinstance(result, list)
                and len(result) > 0
                and isinstance(result[0], Document)
            ):
                yield Document(
                    channel="chat",
                    content="<h4>Sub questions and their answers</h4>",
                )
                sub_question_answer_output = yield from self.answer_sub_questions(
                    [r.text for r in result], conv_id, history, **kwargs
                )

        yield Document(
            channel="chat",
            content=f"<h4>Main question</h4>{message}<br><b>Answer</b><br>",
        )

        # should populate the context
        docs, infos = self.retrieve(message, history)
        print(f"Got {len(docs)} retrieved documents")
        yield from infos

        evidence_mode, evidence, images = self.evidence_pipeline(docs).content
        answer = yield from self.answering_pipeline.stream(
            question=message,
            history=history,
            evidence=evidence + "\n" + sub_question_answer_output,
            evidence_mode=evidence_mode,
            images=images,
            conv_id=conv_id,
            **kwargs,
        )

        # show the evidence
        with_citation, without_citation = self.prepare_citations(answer, docs)
        if not with_citation and not without_citation:
            yield Document(channel="info", content="<h5><b>No evidence found.</b></h5>")
        else:
            yield Document(channel="info", content=None)
            yield from with_citation
            yield from without_citation

        return answer

    @classmethod
    def get_user_settings(cls) -> dict:
        user_settings = super().get_user_settings()
        user_settings["decompose_prompt"] = {
            "name": "Decompose Prompt",
            "value": DecomposeQuestionPipeline.DECOMPOSE_SYSTEM_PROMPT_TEMPLATE,
        }
        return user_settings

    @classmethod
    def get_pipeline(cls, settings, states, retrievers):
        """Get the reasoning pipeline

        Args:
            settings: the settings for the pipeline
            retrievers: the retrievers to use
        """
        prefix = f"reasoning.options.{cls.get_info()['id']}"
        pipeline = cls(
            retrievers=retrievers,
            rewrite_pipeline=DecomposeQuestionPipeline(
                prompt_template=settings.get(f"{prefix}.decompose_prompt")
            ),
        )

        llm_name = settings.get(f"{prefix}.llm", None)
        llm = llms.get(llm_name, llms.get_default())

        # answering pipeline configuration
        answer_pipeline = pipeline.answering_pipeline
        answer_pipeline.llm = llm
        answer_pipeline.citation_pipeline.llm = llm
        answer_pipeline.n_last_interactions = settings[f"{prefix}.n_last_interactions"]
        answer_pipeline.enable_citation = settings[f"{prefix}.highlight_citation"]
        answer_pipeline.system_prompt = settings[f"{prefix}.system_prompt"]
        answer_pipeline.qa_template = settings[f"{prefix}.qa_prompt"]
        answer_pipeline.lang = SUPPORTED_LANGUAGE_MAP.get(
            settings["reasoning.lang"], "English"
        )

        pipeline.add_query_context.llm = llm
        pipeline.add_query_context.n_last_interactions = settings[
            f"{prefix}.n_last_interactions"
        ]

        pipeline.trigger_context = settings[f"{prefix}.trigger_context"]
        pipeline.use_rewrite = states.get("app", {}).get("regen", False)
        if pipeline.rewrite_pipeline:
            pipeline.rewrite_pipeline.llm = llm
        return pipeline

    @classmethod
    def get_info(cls) -> dict:
        return {
            "id": "complex",
            "name": "Complex QA",
            "description": (
                "Use multi-step reasoning to decompose a complex question into "
                "multiple sub-questions. This pipeline can "
                "perform both keyword search and similarity search to retrieve the "
                "context. After that it includes that context to generate the answer."
            ),
        }
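
# Minimal usage sketch (illustrative only; `settings`, `states`, and `retrievers`
# are supplied by the ktem app -- the names here are placeholders, not real objects):
#
#     pipeline = FullQAPipeline.get_pipeline(settings, states, retrievers)
#     for doc in pipeline.stream("What is X?", conv_id="conv-1", history=[]):
#         if doc.channel == "chat":
#             print(doc.content, end="")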