This change speeds up OCR extraction by skipping OCR for text that is irrelevant (that is, text not inside a table).
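As a rough illustration of the idea (not the actual implementation), the filtering step could look like the sketch below. It assumes text and table regions are axis-aligned boxes in `(x0, y0, x1, y1)` form; the names `Box`, `is_inside`, and `boxes_needing_ocr` are hypothetical.

```python
from __future__ import annotations

# Hypothetical box layout: (x0, y0, x1, y1) with x0 <= x1 and y0 <= y1.
Box = tuple[float, float, float, float]


def is_inside(inner: Box, outer: Box) -> bool:
    """Return True when `inner` lies fully within `outer`."""
    return (
        inner[0] >= outer[0]
        and inner[1] >= outer[1]
        and inner[2] <= outer[2]
        and inner[3] <= outer[3]
    )


def boxes_needing_ocr(text_boxes: list[Box], table_boxes: list[Box]) -> list[Box]:
    """Keep only the text boxes that fall inside some table region."""
    return [b for b in text_boxes if any(is_inside(b, t) for t in table_boxes)]
```

Only the boxes returned by `boxes_needing_ocr` would then be passed to the OCR engine; everything else is skipped.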
---------
Co-authored-by: Nguyen Trung Duc (john) <trungduc1992@gmail.com>
This change removes the following from `BaseComponent`:
- run_raw
- run_batch_raw
- run_document
- run_batch_document
- is_document
- is_batch
Each component is expected to support multiple types of input and a single type of output. Since we want components to work out of the box for both standardized and customized use cases, they need to accept multiple input types. At the same time, to keep components easy to understand and use, we restrict each component to a single output type.
To accommodate these changes, we also refactor some components to remove their `run_raw`, `run_batch_raw`, ... methods and to settle on a common output interface for those components.
Tests are updated accordingly.
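The "multiple input types, single output type" rule can be sketched as follows. This is a minimal illustration, not the actual `BaseComponent` code: `Document` and `UppercaseComponent` are hypothetical stand-ins, and the `run` / `__call__` pairing is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a document type; not the library's class."""

    text: str
    metadata: dict = field(default_factory=dict)


class UppercaseComponent:
    """Hypothetical component: several input types, one output type."""

    def run(self, inputs: str | Document | list[str | Document]) -> list[Document]:
        # Normalize every supported input shape to a list of Documents first.
        items = inputs if isinstance(inputs, list) else [inputs]
        docs = [i if isinstance(i, Document) else Document(text=i) for i in items]
        # The output type is always list[Document], whatever came in.
        return [Document(text=d.text.upper(), metadata=d.metadata) for d in docs]

    def __call__(self, inputs: str | Document | list[str | Document]) -> list[Document]:
        return self.run(inputs)
```

Callers can pass a raw string, a single document, or a mixed list, and always get the same output type back, so no `is_document` or `is_batch` style dispatch is needed on the caller's side.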
Commit changes:
* Add kwargs to vector store's query
* Simplify the BaseComponent
* Update tests
* Remove support for Python 3.8 and 3.9
* Bump version 0.3.0
* Fix GitHub PR caching still using the old environment after bumping the version
---------
Co-authored-by: ian <ian@cinnamon.is>
By allowing the UI outputs to be specified in code, those outputs are included in the UI YAML file whenever the user runs `kh export ...`. Otherwise, each `kh export ...` run resets the output section of the UI YAML file to the default output.
* add base Tool
* minor update test_tool
* update test dependency
* update test dependency
* Fix namespace conflict
* update test
* add base Agent Interface, add ReWoo Agent
* minor update
* update test
* fix typo
* remove unneeded print
* update rewoo agent
* add LLMTool
* update BaseAgent type
* add ReAct agent
* add ReAct agent
* minor update
* minor update
* minor update
* minor update
* update base reader with BaseComponent
* add splitter
* update agent and tool
* update vectorstores
* update load/save for indexing and retrieving pipeline
* update test_agent for more use-cases
* add missing dependency for test
* update test case for in memory vectorstore
* add TextSplitter to BaseComponent
* update type hint basetool
---------
Co-authored-by: trducng <trungduc1992@gmail.com>
* add test case for Chroma save/load
* minor name change
* add delete_collection support for chroma
* move save load to chroma
---------
Co-authored-by: Nguyen Trung Duc (john) <john@cinnamon.is>
This CL:
- Implements the logic to export logs to Excel.
- Routes the export logic in the UI.
- Demonstrates this functionality in the `./examples/promptui` project.
Implement the flow from pipeline to config to UI, and provide an example project for promptui:
- Pipeline to config: `kotaemon.contribs.promptui.config.export_pipeline_to_config`. The config follows the schema specified in this document: https://cinnamon-ai.atlassian.net/wiki/spaces/ATM/pages/2748711193/Technical+Detail. Note: this implementation excludes the logs, which will be handled in AUR-408.
- Config to UI: `kotaemon.contribs.promptui.build_from_yaml`
- Example project is located at `examples/promptui/`
* [AUR-362] Add In-memory vector store
* [AUR-362] fix delete fun input format
* [AUR-362] revise persist and from persist path to save and load
* [AUR-362] revise simple.py to in_memory.py
Document store handles storing and indexing Documents. It supports the following interfaces:
- add: add 1 or more documents into document store
- get: get a list of documents
- get_all: get all documents in a document store
- delete: delete 1 or more documents
- save: persist a document store to disk
- load: load a document store from disk
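The interface above could be sketched as a small in-memory store. This is an illustrative assumption, not the actual implementation: the class names, the ID handling, and the JSON on-disk format are all hypothetical.

```python
from __future__ import annotations

import json
import uuid
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Document:
    """Hypothetical stand-in for the library's document type."""

    text: str
    metadata: dict = field(default_factory=dict)


class InMemoryDocumentStore:
    """Hypothetical store keeping documents in a dict keyed by ID."""

    def __init__(self) -> None:
        self._store: dict[str, Document] = {}

    def add(self, docs: Document | list[Document], ids: list[str] | None = None) -> list[str]:
        docs = docs if isinstance(docs, list) else [docs]
        # Generate random IDs when the caller does not supply any.
        ids = ids if ids is not None else [uuid.uuid4().hex for _ in docs]
        if len(ids) != len(docs):
            raise ValueError("ids and docs must have the same length")
        self._store.update(zip(ids, docs))
        return ids

    def get(self, ids: str | list[str]) -> list[Document]:
        ids = ids if isinstance(ids, list) else [ids]
        return [self._store[i] for i in ids]

    def get_all(self) -> dict[str, Document]:
        return dict(self._store)

    def delete(self, ids: str | list[str]) -> None:
        ids = ids if isinstance(ids, list) else [ids]
        for i in ids:
            self._store.pop(i, None)  # Unknown IDs are ignored.

    def save(self, path: str | Path) -> None:
        payload = {i: asdict(d) for i, d in self._store.items()}
        Path(path).write_text(json.dumps(payload), encoding="utf-8")

    def load(self, path: str | Path) -> None:
        payload = json.loads(Path(path).read_text(encoding="utf-8"))
        self._store = {i: Document(**d) for i, d in payload.items()}
```

A store saved with `save` can be restored into a fresh instance with `load`, provided the document metadata is JSON-serializable.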
Design the base interface of the vector store and apply it to the Chroma vector store (wrapped around llama_index's implementation). Provide pipelines to populate the vector store and to retrieve from it.
This change provides the base interface of an embedding and wraps Langchain's OpenAI embedding. Usage is as follows:
```python
from kotaemon.embeddings import AzureOpenAIEmbeddings

model = AzureOpenAIEmbeddings(
    model="text-embedding-ada-002",
    deployment="embedding-deployment",
    openai_api_base="https://test.openai.azure.com/",
    openai_api_key="some-key",
)
output = model("Hello world")
```
- Use cases related to LLM call: https://cinnamon-ai.atlassian.net/browse/AUR-388?focusedCommentId=34873
- Sample usages: `test_llms_chat_models.py` and `test_llms_completion_models.py`:
```python
from kotaemon.llms.chats.openai import AzureChatOpenAI

model = AzureChatOpenAI(
    openai_api_base="https://test.openai.azure.com/",
    openai_api_key="some-key",
    openai_api_version="2023-03-15-preview",
    deployment_name="gpt35turbo",
    temperature=0,
    request_timeout=60,
)
output = model("hello world")
```
For the LLM-call component, I decided to wrap Langchain's LLM models and Langchain's Chat models, and to set the interface as follows:
- Completion LLM component:
```python
class CompletionLLM:
    def run_raw(self, text: str) -> LLMInterface:
        # Run text completion: str in -> LLMInterface out
        ...

    def run_batch_raw(self, text: list[str]) -> list[LLMInterface]:
        # Run text completion in batch: list[str] in -> list[LLMInterface] out
        ...

    # run_document and run_batch_document just reuse run_raw and run_batch_raw, due to unclear use case
```
- Chat LLM component:
```python
class ChatLLM:
    def run_raw(self, text: str) -> LLMInterface:
        # Run chat completion (no chat history): str in -> LLMInterface out
        ...

    def run_batch_raw(self, text: list[str]) -> list[LLMInterface]:
        # Run chat completion in batch mode (no chat history): list[str] in -> list[LLMInterface] out
        ...

    def run_document(self, text: list[BaseMessage]) -> LLMInterface:
        # Run chat completion (with chat history): list[langchain's BaseMessage] in -> LLMInterface out
        ...

    def run_batch_document(self, text: list[list[BaseMessage]]) -> list[LLMInterface]:
        # Run chat completion in batch mode (with chat history): list[list[langchain's BaseMessage]] in -> list[LLMInterface] out
        ...
```
- The LLMInterface is as follows:
```python
from dataclasses import dataclass, field


@dataclass
class LLMInterface:
    text: list[str]
    completion_tokens: int = -1
    total_tokens: int = -1
    prompt_tokens: int = -1
    logits: list[list[float]] = field(default_factory=list)
```
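To show how the pieces above fit together, here is a minimal sketch of a completion component that returns `LLMInterface`. It is an illustration only: `SimpleCompletionLLM` is a hypothetical name, and the `backend` callable stands in for a wrapped Langchain model so the example runs without any external service.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LLMInterface:
    text: list[str]
    completion_tokens: int = -1
    total_tokens: int = -1
    prompt_tokens: int = -1
    logits: list[list[float]] = field(default_factory=list)


class SimpleCompletionLLM:
    """Hypothetical wrapper; `backend` stands in for a Langchain LLM."""

    def __init__(self, backend: Callable[[str], str]) -> None:
        self._backend = backend

    def run_raw(self, text: str) -> LLMInterface:
        # str in -> LLMInterface out; token counts stay at -1 (unknown).
        return LLMInterface(text=[self._backend(text)])

    def run_batch_raw(self, text: list[str]) -> list[LLMInterface]:
        # list[str] in -> list[LLMInterface] out, one result per prompt.
        return [self.run_raw(t) for t in text]
```

For example, `SimpleCompletionLLM(backend=str.upper).run_raw("hi").text` evaluates to `["HI"]`. A real wrapper would also fill in the token counts from the provider's response when they are available.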