Commit Graph

21 Commits

Author SHA1 Message Date
Tuan Anh Nguyen Dang (Tadashi_Cin)
9a96a9b876 Add Elasticsearch Docstore (#83)
* add Elasticsearch Docstore

* update missing requirements

* add docstore

* [ignore cache] update default param

* update docstring
2023-11-21 11:59:20 +07:00
Nguyen Trung Duc (john)
0a3fc4b228 Correct the use of abstractmethod (#80)
* Correct abstractmethod usage

* Update interface

* Specify minimal llama-index version [ignore cache]

* Update examples
2023-11-20 11:18:53 +07:00
Nguyen Trung Duc (john)
d79b3744cb Simplify the BaseComponent interface (#64)
This change removes `BaseComponent`'s:

- run_raw
- run_batch_raw
- run_document
- run_batch_document
- is_document
- is_batch

Each component is expected to support multiple types of inputs and a single type of output. Because we want a component to work out of the box with both standardized and customized use cases, supporting multiple input types is expected. At the same time, to make a component easier to understand and use, we restrict each component to a single output type.

To accommodate these changes, we also refactor some components to remove their run_raw, run_batch_raw... methods, and to decide the common output interface for those components.

Tests are updated accordingly.
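
The "multiple input types, single output type" rule described above can be sketched roughly as follows. This is only an illustration: `Document`, `UppercaseComponent` and its `run` method are hypothetical names, not the actual kotaemon `BaseComponent` API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical stand-in for a standardized document type."""

    text: str


class UppercaseComponent:
    """Hypothetical component: several input types, one output type."""

    def run(self, inputs: str | Document | list[str | Document]) -> list[Document]:
        # Normalize a single item into a list so every case shares one code path.
        items = inputs if isinstance(inputs, list) else [inputs]
        docs = [
            item if isinstance(item, Document) else Document(text=item)
            for item in items
        ]
        # Whatever came in, the output is always list[Document].
        return [Document(text=doc.text.upper()) for doc in docs]


component = UppercaseComponent()
single = component.run("hello")
batch = component.run(["a", Document(text="b")])
```

Callers never need to branch on the result type: both `single` and `batch` are lists of `Document`.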

Commit changes:

* Add kwargs to vector store's query
* Simplify the BaseComponent
* Update tests
* Remove support for Python 3.8 and 3.9
* Bump version 0.3.0
* Fix GitHub PR caching still using the old environment after bumping the version

---------

Co-authored-by: ian <ian@cinnamon.is>
2023-11-13 15:10:18 +07:00
ian_Cin
6095526dc7 Add Huggingface embeddings and Cohere embeddings (#63)
* Add huggingface embeddings and cohere embeddings
* Update openai interface and the mock for newer OpenAI SDK

---------

Co-authored-by: trducng <trungduc1992@gmail.com>
2023-11-10 09:38:30 +07:00
Nguyen Trung Duc (john)
9035e25666 Upgrade the declarative pipeline for cleaner interface (#51) 2023-10-24 11:12:22 +07:00
Nguyen Trung Duc (john)
6e7905cbc0 [AUR-411] Adopt to Example2 project (#28)
Add the chatbot from Example2. Create the UI for chat.
2023-10-12 15:13:25 +07:00
ian_Cin
533fffa6db Enable caching for github actions (#43) 2023-10-12 13:52:19 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
79cc60e6a2 [AUR-429] Add MVP pipeline with Ingestion and QA stage (#39)
* add base Tool

* minor update test_tool

* update test dependency

* update test dependency

* Fix namespace conflict

* update test

* add base Agent Interface, add ReWoo Agent

* minor update

* update test

* fix typo

* remove unneeded print

* update rewoo agent

* add LLMTool

* update BaseAgent type

* add ReAct agent

* add ReAct agent

* minor update

* minor update

* minor update

* minor update

* update base reader with BaseComponent

* add splitter

* update agent and tool

* update vectorstores

* update load/save for indexing and retrieving pipeline

* update test_agent for more use-cases

* add missing dependency for test

* update test case for in memory vectorstore

* add TextSplitter to BaseComponent

* update type hint basetool

* add insurance mvp pipeline

* update requirements

* Remove redundant plugins param

* Mock GoogleSearch

---------

Co-authored-by: trducng <trungduc1992@gmail.com>
2023-10-05 12:31:33 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
56bc41b673 Update Base interface of Index/Retrieval pipeline (#36)
* add base Tool

* minor update test_tool

* update test dependency

* update test dependency

* Fix namespace conflict

* update test

* add base Agent Interface, add ReWoo Agent

* minor update

* update test

* fix typo

* remove unneeded print

* update rewoo agent

* add LLMTool

* update BaseAgent type

* add ReAct agent

* add ReAct agent

* minor update

* minor update

* minor update

* minor update

* update base reader with BaseComponent

* add splitter

* update agent and tool

* update vectorstores

* update load/save for indexing and retrieving pipeline

* update test_agent for more use-cases

* add missing dependency for test

* update test case for in memory vectorstore

* add TextSplitter to BaseComponent

* update type hint basetool

---------

Co-authored-by: trducng <trungduc1992@gmail.com>
2023-10-04 14:27:44 +07:00
cin-jacky
205955b8a3 [AUR-387, AUR-425] Add start-project to CLI (#29) 2023-10-03 11:55:34 +07:00
ian_Cin
d83c22aa4e [AUR-395, AUR-415] Adopt Example1 Injury pipeline; add .flow() for enabling bottom-up pipeline execution (#32)
* add example1/injury pipeline example
* add dotenv
* update various api
2023-10-02 16:24:56 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
f9fc02a32a [AUR-363, AUR-433, AUR-434] Add Base Tool interface with Wikipedia/Google tools (#30)
* add base Tool

* minor update test_tool

* update test dependency

* update test dependency

* Fix namespace conflict

* update test

---------

Co-authored-by: trducng <trungduc1992@gmail.com>
2023-09-29 10:18:49 +07:00
cin-jacky
317323c0e5 [AUR-424] Setup CLI interface (#25)
* [AUR-424] Setup CLI interface

* [AUR-424] fix test_vectorstore:test_query

* [AUR-424] exclude examples when setup CLI

* [AUR-424] create kh and kh --export

* [AUR-426] revise cli by using click.group

* Fix dynamic import

* [AUR-426] revert the format of import packages

* [AUR-426] set argument default

* [AUR-426] set click dependencies in setup.py

---------

Co-authored-by: trducng <trungduc1992@gmail.com>
2023-09-27 16:44:38 +09:00
Nguyen Trung Duc (john)
4f189dc931 [AUR-408] Export logs to Excel (#23)
This CL implements:

- The logic to export logs to Excel.
- The routing of the export logic in the UI.
- A demonstration of this functionality in the `./examples/promptui` project.
2023-09-25 17:20:03 +07:00
Nguyen Trung Duc (john)
c6dd01e820 [AUR-338, AUR-406, AUR-407] Export pipeline to config for PromptUI. Construct PromptUI dynamically based on config. (#16)
From pipeline > config > UI. Provide example project for promptui

- Pipeline to config: `kotaemon.contribs.promptui.config.export_pipeline_to_config`. The config follows the schema specified in this document: https://cinnamon-ai.atlassian.net/wiki/spaces/ATM/pages/2748711193/Technical+Detail. Note: this implementation excludes the logs, which will be handled in AUR-408.
- Config to UI: `kotaemon.contribs.promptui.build_from_yaml`
- Example project is located at `examples/promptui/`
2023-09-21 14:27:23 +07:00
Nguyen Trung Duc (john)
620b2b03ca [AUR-392, AUR-413, AUR-414] Define base vector store, and make use of ChromaVectorStore from llama_index. Indexing and retrieving vectors with vector store (#18)
Design the base interface of vector store, and apply it to the Chroma Vector Store (wrapped around llama_index's implementation). Provide the pipelines to populate and retrieve from vector store.
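
As a minimal sketch of what a base vector store interface with populate and retrieve operations might look like: the class and method names below (`BaseVectorStore`, `add`, `query`, `InMemoryVectorStore`) are assumptions for illustration, and a toy in-memory store with cosine similarity stands in for the Chroma wrapper used in the commit.

```python
from __future__ import annotations

import math
from abc import ABC, abstractmethod


class BaseVectorStore(ABC):
    """Hypothetical base interface; method names are illustrative only."""

    @abstractmethod
    def add(self, embeddings: list[list[float]], ids: list[str]) -> list[str]:
        """Store the embeddings and return their ids."""

    @abstractmethod
    def query(self, embedding: list[float], top_k: int = 1) -> list[tuple[str, float]]:
        """Return up to top_k (id, similarity) pairs, most similar first."""


class InMemoryVectorStore(BaseVectorStore):
    """Toy implementation using cosine similarity over a dict."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def add(self, embeddings, ids):
        for id_, vector in zip(ids, embeddings):
            self._vectors[id_] = vector
        return ids

    def query(self, embedding, top_k=1):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        scored = [(id_, cosine(embedding, vec)) for id_, vec in self._vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = InMemoryVectorStore()
store.add([[1.0, 0.0], [0.0, 1.0]], ids=["x", "y"])
nearest = store.query([0.9, 0.1], top_k=1)
```

Keeping the interface abstract means an indexing or retrieval pipeline can depend on `BaseVectorStore` alone, and a different backend can be swapped in without changing the pipeline.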
2023-09-14 14:18:20 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
21350153d4 [AUR-391, AUR-393] Add Document and DocumentReader base (#6)
* Declare BaseComponent

* Brainstorming base class for LLM call

* Define base LLM

* Add tests

* Clean telemetry environment for accurate testing

* Fix README

* Fix typing

* add base document reader

* update test

* update requirements

* Cosmetic change

* update requirements

* reformat

---------

Co-authored-by: trducng <trungduc1992@gmail.com>
2023-08-31 11:24:12 +07:00
ian_Cin
5241edbc46 [AUR-361] Setup pre-commit, pytest, GitHub actions, ssh-secret (#3)
Co-authored-by: trducng <trungduc1992@gmail.com>
2023-08-30 07:22:01 +07:00
Nguyen Trung Duc (john)
c3c25db48c [AUR-385, AUR-388] Declare BaseComponent and decide LLM call interface (#2)
- Use cases related to LLM call: https://cinnamon-ai.atlassian.net/browse/AUR-388?focusedCommentId=34873
- Sample usages: `test_llms_chat_models.py` and `test_llms_completion_models.py`:

```python
from kotaemon.llms.chats.openai import AzureChatOpenAI

model = AzureChatOpenAI(
    openai_api_base="https://test.openai.azure.com/",
    openai_api_key="some-key",
    openai_api_version="2023-03-15-preview",
    deployment_name="gpt35turbo",
    temperature=0,
    request_timeout=60,
)
output = model("hello world")
```

For the LLM-call component, I decided to wrap Langchain's LLM models and Langchain's chat models, and to set the interface as follows:

- Completion LLM component:
```python
from __future__ import annotations  # LLMInterface is defined further below


class CompletionLLM:

    def run_raw(self, text: str) -> LLMInterface:
        # Run text completion: str in -> LLMInterface out
        ...

    def run_batch_raw(self, text: list[str]) -> list[LLMInterface]:
        # Run text completion in batch: list[str] in -> list[LLMInterface] out
        ...

# run_document and run_batch_document just reuse run_raw and run_batch_raw, due to unclear use case
```

- Chat LLM component:
```python
from __future__ import annotations  # BaseMessage is langchain's message type


class ChatLLM:
    def run_raw(self, text: str) -> LLMInterface:
        # Run chat completion (no chat history): str in -> LLMInterface out
        ...

    def run_batch_raw(self, text: list[str]) -> list[LLMInterface]:
        # Run chat completion in batch mode (no chat history): list[str] in -> list[LLMInterface] out
        ...

    def run_document(self, text: list[BaseMessage]) -> LLMInterface:
        # Run chat completion (with chat history): list[langchain's BaseMessage] in -> LLMInterface out
        ...

    def run_batch_document(self, text: list[list[BaseMessage]]) -> list[LLMInterface]:
        # Run chat completion in batch mode (with chat history): list[list[langchain's BaseMessage]] in -> list[LLMInterface] out
        ...
```

- The LLMInterface is as follows:

```python
from dataclasses import dataclass, field


@dataclass
class LLMInterface:
    text: list[str]
    completion_tokens: int = -1
    total_tokens: int = -1
    prompt_tokens: int = -1
    logits: list[list[float]] = field(default_factory=list)
```
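
To show how the completion interface and `LLMInterface` fit together, here is a small self-contained sketch. `EchoCompletionLLM` is a hypothetical stand-in that makes no model calls; it only mirrors the `run_raw` / `run_batch_raw` shape described above and is not the Langchain-backed component from the commit.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LLMInterface:
    text: list[str]
    completion_tokens: int = -1
    total_tokens: int = -1
    prompt_tokens: int = -1
    logits: list[list[float]] = field(default_factory=list)


class EchoCompletionLLM:
    """Hypothetical stand-in for a completion model; makes no network calls."""

    def run_raw(self, text: str) -> LLMInterface:
        # str in -> LLMInterface out; token counts keep their -1 defaults.
        return LLMInterface(text=[f"echo: {text}"])

    def run_batch_raw(self, text: list[str]) -> list[LLMInterface]:
        # list[str] in -> list[LLMInterface] out, reusing the single-item path.
        return [self.run_raw(item) for item in text]


llm = EchoCompletionLLM()
one = llm.run_raw("hello world")
many = llm.run_batch_raw(["a", "b"])
```

The `-1` defaults act as "unknown" markers, so a backend that does not report token usage can still return a valid `LLMInterface`.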
2023-08-29 15:47:12 +07:00
Nguyen Trung Duc (john)
e9d1d5c118 [AUR-401] Disable Haystack telemetry with monkey patching (#1)
Sample Haystack log when running a pipeline. Note: the `pipeline.classname` can leak company information.

```json
{
  "hardware.cpus": 16,
  "hardware.gpus": 0,
  "libraries.colab": false,
  "libraries.cuda": false,
  "libraries.haystack": "1.20.0rc0",
  "libraries.ipython": false,
  "libraries.pytest": false,
  "libraries.ray": false,
  "libraries.torch": false,
  "libraries.transformers": "4.31.0",
  "os.containerized": false,
  "os.family": "Linux",
  "os.machine": "x86_64",
  "os.version": "6.2.0-26-generic",
  "pipeline.classname": "TempPipeline",
  "pipeline.config_hash": "07a8eddd5a6e512c0d898c6d9f445ed9",
  "pipeline.nodes.PromptNode": 1,
  "pipeline.nodes.Shaper": 1,
  "pipeline.nodes.WebRetriever": 1,
  "pipeline.run_parameters.debug": false,
  "pipeline.run_parameters.documents": [
    0
  ],
  "pipeline.run_parameters.file_paths": 0,
  "pipeline.run_parameters.labels": 0,
  "pipeline.run_parameters.meta": 1,
  "pipeline.run_parameters.params": false,
  "pipeline.run_parameters.queries": true,
  "pipeline.runs": 1,
  "pipeline.type": "Query",
  "python.version": "3.10.12"
}
```

Solution: Haystack telemetry relies on the `telemetry` variable, the `posthog` library, and the `HAYSTACK_TELEMETRY_ENABLED` environment variable. We set the environment variable to False and make sure the relevant objects are disabled.
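
The approach can be sketched roughly as below. This is not the code from the commit: `disable_telemetry` is a hypothetical helper, and a `SimpleNamespace` stands in for the real telemetry module, which is only assumed here to expose a module-level `telemetry` object.

```python
import os
from types import SimpleNamespace


def disable_telemetry(telemetry_module) -> None:
    """Turn telemetry off via the environment variable and a monkey patch."""
    # The environment variable named in the commit message.
    os.environ["HAYSTACK_TELEMETRY_ENABLED"] = "False"
    # Monkey patch: drop the module-level `telemetry` object, if there is one,
    # so that code checking it finds nothing to send events with.
    if getattr(telemetry_module, "telemetry", None) is not None:
        telemetry_module.telemetry = None


# Stand-in for the real telemetry module (assumption: it exposes `telemetry`).
fake_module = SimpleNamespace(telemetry=object())
disable_telemetry(fake_module)
```

In practice the environment variable would need to be set before the library is imported, since libraries often read such variables at import time.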
2023-08-22 10:02:46 +07:00
trducng
043209fda7 Initiate repository 2023-08-16 14:56:48 +07:00