Commit Graph

74 Commits

Author SHA1 Message Date
trducng
757aabca4d Add app title, favicon. More natural chat 2024-01-25 22:40:32 +07:00
Duc Nguyen (john)
513e86f490 Add dedicated information panel to the UI (#137)
* Allow streaming to the chatbot and the information panel without threading
* Highlight evidence in a simple manner
2024-01-25 19:07:53 +07:00
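Commit #137 streams to both the chatbot and the information panel without threading. One common way to do this is a single generator that yields a (chat_text, info_text) pair on each step. A UI framework that accepts generator callbacks (Gradio does) can then update both outputs in one loop. The sketch below is not necessarily how ktem does it. The function name `answer_stream` and its inputs are hypothetical.

```python
from typing import Iterator, List, Tuple


def answer_stream(tokens: List[str], evidence: str) -> Iterator[Tuple[str, str]]:
    """Yield (chat_text, info_text) pairs so two UI outputs update together.

    Hypothetical sketch: `tokens` stands in for a streamed LLM answer and
    `evidence` for the retrieved text shown in the information panel.
    """
    chat_text = ""
    # Show the evidence first, alongside an empty chat message.
    yield chat_text, evidence
    for token in tokens:
        chat_text += token
        # Each yield updates both outputs; no background thread is needed.
        yield chat_text, evidence
```

Because both outputs come from the same generator, they cannot drift out of sync, and there are no shared buffers to protect with locks.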
Duc Nguyen (john)
ebc61400d8 Provide a developer mode when running ktem (#135)
Implement and utilize `on_app_created` to support the developer mode.
2024-01-23 11:46:59 +07:00
Duc Nguyen (john)
2dd531114f Make ktem official (#134)
* Move kotaemon and ktem into the same folder

* Update docs

* Update CI

* Resolve mypy and isort issues

* Re-allow test pdf files
2024-01-23 10:54:18 +07:00
Duc Nguyen (john)
9c5b707010 Customize application settings (#132)
* Allow customizing the base application

* Make the core llms and embeddings customizable

* Make the settings, reasoning and index customizable

* Import from langchain_openai
2024-01-21 14:36:07 +07:00
Duc Nguyen (john)
5a9d6f75be Migrate the MVP into kotaemon (#108)
- Migrate the MVP into kotaemon.
- Preliminarily include the pipeline within the chatbot interface.
- Organize MVP as an application.

Todo:

- Add an info panel to view the planning of agents -> Fix streaming agents' output.

Resolve: #60
Resolve: #61 
Resolve: #62
2024-01-10 15:28:09 +07:00
ian_Cin
230328c62f Best docs Cinnamon will probably ever have (#105) 2023-12-20 11:30:25 +07:00
Duc Nguyen (john)
0e30dcbb06 Create Langchain LLM converter to quickly supply it to Langchain's chain (#102)
* Create Langchain LLM converter to quickly supply it to Langchain's chain

* Clean up
2023-12-11 14:55:56 +07:00
Duc Nguyen (john)
da0ac1d69f Change template to private attribute and simplify imports (#101)
---------

Co-authored-by: ian <ian@cinnamon.is>
2023-12-08 18:10:34 +07:00
Duc Nguyen (john)
1f927d3391 Upgrade promptui to conform to Gradio V4 (#98) 2023-12-07 15:24:07 +07:00
ian_Cin
797df5a69c refactor agents (#100)
* refactor agents

* minor cosmetic changes, add terminal UI for CLI

* bump to 0.3.4

* Add temporary path

* fix unclosed files in tests

---------

Co-authored-by: trducng <trungduc1992@gmail.com>
2023-12-06 17:06:29 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
d9e925eb75 Add UnstructuredReader with support for various legacy files (.doc, .xls) (#99) 2023-12-05 16:19:13 +07:00
Duc Nguyen (john)
37c744b616 Add file-based document store and vector store (#96)
* Modify docstore and vectorstore objects to be reconstructable
* Simplify the file docstore
* Use the simple file docstore and vector store in MVP
2023-12-04 17:46:00 +07:00
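Commit #96 adds a file-based document store and makes the stores reconstructable. Below is a minimal sketch of what a simple file docstore can look like. It assumes documents are plain `{doc_id: text}` mappings persisted to a single JSON file. The class and method names are hypothetical and do not reproduce kotaemon's actual classes.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional


class SimpleFileDocStore:
    """Hypothetical docstore that persists {doc_id: text} to one JSON file."""

    def __init__(self, path: str):
        self.path = Path(path)
        self._store: Dict[str, str] = {}
        # Reconstructable: a new instance pointed at the same file reloads it.
        if self.path.exists():
            self._store = json.loads(self.path.read_text(encoding="utf-8"))

    def add(self, docs: Dict[str, str]) -> None:
        self._store.update(docs)
        self._save()

    def get(self, ids: List[str]) -> List[Optional[str]]:
        # Unknown ids map to None instead of raising.
        return [self._store.get(i) for i in ids]

    def delete(self, ids: List[str]) -> None:
        for i in ids:
            self._store.pop(i, None)
        self._save()

    def _save(self) -> None:
        # Rewrite the whole file on every change: simple, not efficient.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self._store), encoding="utf-8")
```

Storing only the file path as state is what makes the object easy to reconstruct: recreating it with the same path restores the same contents.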
Duc Nguyen (john)
0ce3a8832f Provide type hints for pass-through Langchain and Llama-index objects (#95) 2023-12-04 10:59:13 +07:00
Duc Nguyen (john)
e34b1e4c6d Refactor the index component and update the MVP insurance accordingly (#90)
Refactor the `kotaemon/pipelines` module to `kotaemon/indices`. Create the VectorIndex.

Note: I currently place `qa` inside `kotaemon/indices` because, at the moment, `qa` is only used in RAG. That said, `qa` could become an independent module at `kotaemon/qa`. Since this is easy to change later, I am going with the first option for now and will revisit it.
2023-11-30 18:35:07 +07:00
Nguyen Trung Duc (john)
8e3a1d193f Refactor agents and tools (#91)
* Move tools to agents

* Move agents to a dedicated place

* Remove subclassing BaseAgent from BaseTool
2023-11-30 09:52:08 +07:00
ian_Cin
4256030b4f Adopt pyproject.toml (#89)
* ditching setup.py in favour of pyproject.toml; bump to 0.3.2

* bump to 0.3.3
2023-11-29 14:58:35 +07:00
ian_Cin
8e0779a22d Enforce all IO objects to be subclassed from Document (#88)
* enforce Document as IO

* Separate rerankers, splitters and extractors (#85)

* partially refactor imports

* add text to embedding outputs

---------

Co-authored-by: Nguyen Trung Duc (john) <trungduc1992@gmail.com>
2023-11-27 16:35:09 +07:00
Nguyen Trung Duc (john)
2186c5558f Separate rerankers, splitters and extractors (#85) 2023-11-27 14:25:54 +07:00
ian_Cin
0dede9c82d Subclass chat messages from Document (#86) 2023-11-27 10:38:19 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
3ac277cc0b Update Elasticsearch store delete() (#84) 2023-11-21 15:29:00 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
9a96a9b876 Add Elasticsearch Docstore (#83)
* add Elasticsearch Docstore

* update missing requirements

* add docstore

* [ignore cache] update default param

* update docstring
2023-11-21 11:59:20 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
8bb7ad91e0 Add Langchain Agent wrapper with OpenAI Function / Self-ask agent support (#82)
* update Param() type hint in MVP

* update default embedding endpoint

* update Langchain agent wrapper

* update langchain agent
2023-11-20 16:26:08 +07:00
Nguyen Trung Duc (john)
0a3fc4b228 Correct the use of abstractmethod (#80)
* Correct abstractmethod usage

* Update interface

* Specify minimal llama-index version [ignore cache]

* Update examples
2023-11-20 11:18:53 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
98509f886c Update splitters + metadata extractor interface to conform with new LlamaIndex design (#81)
* change splitter to a general doc parser class to fit the new llama-index design
* move the splitter interface
2023-11-20 10:09:30 +07:00
Nguyen Trung Duc (john)
98c76c4700 Refactor excel Loader (#79) 2023-11-16 11:30:11 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
cc1e75b3c6 Add Citation pipeline (#78)
* add rerankers in retrieving pipeline

* update example MVP pipeline

* add citation pipeline and function call interface

* change return type of QA and AgentPipeline to Document
2023-11-16 11:24:35 +07:00
Nguyen Trung Duc (john)
f8b8d86d4e Move LLM-related components into LLM module (#74)
* Move splitter into indexing module
* Rename post_processing module to parsers
* Migrate LLM-specific composite pipelines into llms module

This change moves the `splitters` module into the `indexing` module. The `indexing` module will be created soon to house indexing-related components.

This change renames the `post_processing` module to `parsers`. "Post-processing" is a generic term that conveys very little information. In the future, we will add other extractors, such as a metadata extractor, to the `parsers` module.

This change moves the composite elements into the `llms` module. These elements assume that their internal nodes are LLM-specific, so moving them into the `llms` module makes them more discoverable and simplifies the codebase structure.
2023-11-15 16:26:53 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
9945afdf6f Add Reranker implementation and integration in Retrieving pipeline (#77)
* Add base Reranker
* Add LLM Reranker
* Add Cohere Reranker
* Add integration of Rerankers in Retrieving pipeline
2023-11-15 16:03:51 +07:00
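Commit #77 introduces a base Reranker with LLM and Cohere implementations. Below is a minimal sketch of that shape: an abstract base class that defines the reranking loop, and a subclass that supplies the score. A toy keyword-overlap scorer stands in for the LLM or Cohere scoring, which is not reproduced here. All class names are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseReranker(ABC):
    """Hypothetical reranker interface: reorder documents for a query."""

    @abstractmethod
    def score(self, query: str, doc: str) -> float:
        """Return a relevance score; higher means more relevant."""

    def rerank(self, query: str, docs: List[str], top_k: int = 3) -> List[str]:
        # sorted() stays stable with reverse=True, so ties keep input order.
        ranked = sorted(docs, key=lambda d: self.score(query, d), reverse=True)
        return ranked[:top_k]


class KeywordOverlapReranker(BaseReranker):
    """Toy scorer: fraction of query words that appear in the document."""

    def score(self, query: str, doc: str) -> float:
        q = set(query.lower().split())
        d = set(doc.lower().split())
        return len(q & d) / len(q) if q else 0.0
```

Keeping `rerank` in the base class means each concrete reranker only has to implement `score`. A retrieving pipeline can then swap rerankers without changing its own code.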
Nguyen Trung Duc (john)
b52f312d8e Use new Langchain's dedicated Azure OpenAI embedding class (#76)
* Use new Langchain's dedicated Azure OpenAI embedding class

* Update test
2023-11-15 14:46:32 +07:00
Nguyen Trung Duc (john)
b159897ac6 Combine docstores and vectorstores within a storages component (#72) 2023-11-14 17:50:57 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
640962e916 Update retrieving + agent pipeline (#71) 2023-11-14 16:40:13 +07:00
Nguyen Trung Duc (john)
693ed39de4 Move prompts into LLMs module (#70)
Since prompts are only used within LLMs, it is reasonable to keep them in the LLM module. This makes the module easier to discover and the codebase less complicated.

Changes:

* Move prompt components into llms
* Bump version 0.3.1
* Make pip install dependencies in eager mode

---------

Co-authored-by: ian <ian@cinnamon.is>
2023-11-14 16:00:10 +07:00
Nguyen Trung Duc (john)
8532138842 Move Document and other interfaces into base/schema (#69) 2023-11-14 11:51:10 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
4704e2c11a Add new OCRReader with PDF+OCR text merging (#66)
This change speeds up OCR extraction by allowing OCR to be skipped for irrelevant text (text that is not in a table).

---------

Co-authored-by: Nguyen Trung Duc (john) <trungduc1992@gmail.com>
2023-11-13 17:43:02 +07:00
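Commit #66 merges PDF text with OCR text so that OCR can be skipped outside tables. Below is a minimal sketch of that merge, under several assumptions. Text blocks and table regions are given as `(x0, y0, x1, y1)` boxes. The OCR engine is any callable that takes a box. Reading order is ignored for brevity. The function names and box format are hypothetical, not the OCRReader's API.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed page coordinates


def overlaps(a: Box, b: Box) -> bool:
    """True when two axis-aligned boxes intersect with positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_page_text(
    pdf_blocks: List[Tuple[Box, str]],
    table_boxes: List[Box],
    ocr_fn: Callable[[Box], str],
) -> List[str]:
    """Hypothetical merge: PDF text outside tables, OCR text for tables only."""
    # Keep embedded PDF text that does not touch any table region.
    merged = [
        text
        for box, text in pdf_blocks
        if not any(overlaps(box, t) for t in table_boxes)
    ]
    # Run the expensive OCR step only on the table regions.
    merged.extend(ocr_fn(t) for t in table_boxes)
    return merged
```

The speed-up comes from calling `ocr_fn` once per table rather than once per page region. Ordinary paragraphs reuse the text already embedded in the PDF.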
Nguyen Trung Duc (john)
d79b3744cb Simplify the BaseComponent interface (#64)
This change removes the following `BaseComponent` methods:

- run_raw
- run_batch_raw
- run_document
- run_batch_document
- is_document
- is_batch

Each component is expected to support multiple input types and a single output type. Since we want components to work out of the box for both standardized and customized use cases, supporting multiple input types is expected. At the same time, to make components easier to understand and use, we restrict each component to a single output type.

To accommodate these changes, we also refactor some components to remove their run_raw, run_batch_raw... methods, and to settle on a common output interface for those components.

Tests are updated accordingly.

Commit changes:

* Add kwargs to vector store's query
* Simplify the BaseComponent
* Update tests
* Remove support for Python 3.8 and 3.9
* Bump version 0.3.0
* Fix GitHub PR caching still using the old environment after bumping the version

---------

Co-authored-by: ian <ian@cinnamon.is>
2023-11-13 15:10:18 +07:00
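Commit #64 states the contract that each component accepts several input types but always returns one output type. The sketch below shows that contract. It uses a minimal stand-in `Document` dataclass and a toy `UppercaseComponent`. Both are hypothetical and are not kotaemon's actual classes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Union


@dataclass
class Document:
    """Hypothetical minimal stand-in for a Document with text and metadata."""

    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class UppercaseComponent:
    """Hypothetical component: accepts str or Document, always returns Document."""

    def run(self, value: Union[str, Document]) -> Document:
        # Normalize every accepted input type into one internal form.
        if isinstance(value, Document):
            doc = value
        elif isinstance(value, str):
            doc = Document(text=value)
        else:
            raise TypeError(f"Unsupported input type: {type(value).__name__}")
        # A single output type keeps downstream usage predictable.
        return Document(text=doc.text.upper(), metadata=dict(doc.metadata))
```

Normalizing inputs at the top of `run` replaces the removed `run_raw`/`run_document` split with one entry point. Callers can therefore pass either a raw string or a `Document`.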
ian_Cin
6095526dc7 Add Huggingface embeddings and Cohere embeddings (#63)
* Add huggingface embeddings and cohere embeddings
* Update openai interface and the mock for newer OpenAI SDK

---------

Co-authored-by: trducng <trungduc1992@gmail.com>
2023-11-10 09:38:30 +07:00
Nguyen Trung Duc (john)
9035e25666 Upgrade the declarative pipeline for cleaner interface (#51) 2023-10-24 11:12:22 +07:00
Nguyen Trung Duc (john)
aab982ddc4 Provide ready-made binaries for Mac and Linux to support share tunneling (#49) 2023-10-17 17:19:29 +07:00
ian_Cin
2b779926c6 Directly cache the Python installation instead of creating a virtual env; add option to ignore caching (#45)
- Directly cache the Python installation instead of creating a virtual env
- Add an option to ignore caching by including `[ignore cache]` in the commit message
2023-10-16 15:27:14 +07:00
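Later commits in this log (for example #80 and #83) put the marker `[ignore cache]` in their messages to skip the CI cache. Below is a minimal sketch of the check such a workflow could perform before restoring its cache. It is written in Python for illustration, although a real GitHub Actions workflow would more likely express this in YAML or shell. The function name and the case-insensitive matching are assumptions.

```python
def should_use_cache(commit_message: str, marker: str = "[ignore cache]") -> bool:
    """Return False when the commit message asks CI to skip the cache.

    Hypothetical helper; case-insensitive matching is an assumption.
    """
    return marker.lower() not in commit_message.lower()
```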
Nguyen Trung Duc (john)
da6b35f520 Allow persisting the expected output in the code (#46)
By allowing the UI outputs to be specified in the code, every time the user runs `kh export ...`, those outputs are included in the UI YAML file. Otherwise, each `kh export ...` run resets the output section of the UI YAML file to the default output.
2023-10-13 10:26:48 +07:00
Nguyen Trung Duc (john)
6e7905cbc0 [AUR-411] Adopt to Example2 project (#28)
Add the chatbot from Example2. Create the UI for chat.
2023-10-12 15:13:25 +07:00
ian_Cin
533fffa6db Enable caching for github actions (#43) 2023-10-12 13:52:19 +07:00
ian_Cin
84f1fa8cbd [AUR-395] Adopt Example1 disclaimer pipeline (#42)
* Adopt Example1 disclaimer pipeline
* Update Document class
* Add composite components
* Modify Extractor behaviours
2023-10-10 15:42:48 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
79cc60e6a2 [AUR-429] Add MVP pipeline with Ingestion and QA stage (#39)
* add base Tool

* minor update test_tool

* update test dependency

* update test dependency

* Fix namespace conflict

* update test

* add base Agent Interface, add ReWoo Agent

* minor update

* update test

* fix typo

* remove unneeded print

* update rewoo agent

* add LLMTool

* update BaseAgent type

* add ReAct agent

* add ReAct agent

* minor update

* minor update

* minor update

* minor update

* update base reader with BaseComponent

* add splitter

* update agent and tool

* update vectorstores

* update load/save for indexing and retrieving pipeline

* update test_agent for more use-cases

* add missing dependency for test

* update test case for in memory vectorstore

* add TextSplitter to BaseComponent

* update type hint basetool

* add insurance mvp pipeline

* update requirements

* Remove redundant plugins param

* Mock GoogleSearch

---------

Co-authored-by: trducng <trungduc1992@gmail.com>
2023-10-05 12:31:33 +07:00
ian_Cin
2638152054 [Feat] Add support for f-string syntax in PromptTemplate (#38)
* Add support for f-string syntax in PromptTemplate
2023-10-04 16:40:09 +07:00
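Commit #38 adds f-string syntax to PromptTemplate. This means placeholders such as `{question}` are detected in the template and filled from keyword arguments. Below is a minimal sketch built on the standard library's `string.Formatter().parse`, which yields `(literal_text, field_name, format_spec, conversion)` tuples. The class name `FStringTemplate` and its `populate` method are hypothetical and are not kotaemon's PromptTemplate API.

```python
from string import Formatter
from typing import List


class FStringTemplate:
    """Hypothetical minimal f-string-style prompt template."""

    def __init__(self, template: str):
        self.template = template
        # Formatter.parse yields (literal, field_name, format_spec, conversion);
        # field_name is None for literal-only chunks such as trailing text.
        self.placeholders: List[str] = []
        for _, field_name, _, _ in Formatter().parse(template):
            if field_name is not None and field_name not in self.placeholders:
                self.placeholders.append(field_name)

    def populate(self, **kwargs) -> str:
        # Fail early with a clear message instead of a bare KeyError.
        missing = [p for p in self.placeholders if p not in kwargs]
        if missing:
            raise ValueError(f"Missing values for: {missing}")
        return self.template.format(**kwargs)
```

Extracting placeholders up front lets the template report exactly which variables it still needs before any formatting happens.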
Tuan Anh Nguyen Dang (Tadashi_Cin)
56bc41b673 Update Base interface of Index/Retrieval pipeline (#36)
* add base Tool

* minor update test_tool

* update test dependency

* update test dependency

* Fix namespace conflict

* update test

* add base Agent Interface, add ReWoo Agent

* minor update

* update test

* fix typo

* remove unneeded print

* update rewoo agent

* add LLMTool

* update BaseAgent type

* add ReAct agent

* add ReAct agent

* minor update

* minor update

* minor update

* minor update

* update base reader with BaseComponent

* add splitter

* update agent and tool

* update vectorstores

* update load/save for indexing and retrieving pipeline

* update test_agent for more use-cases

* add missing dependency for test

* update test case for in memory vectorstore

* add TextSplitter to BaseComponent

* update type hint basetool

---------

Co-authored-by: trducng <trungduc1992@gmail.com>
2023-10-04 14:27:44 +07:00
Nguyen Trung Duc (john)
49ed3f6994 [AUR-405] Auto-generate markdown documentation from pipeline (#33)
* Create a script to auto-generate markdown docs from pipeline
* Clean up documentation for Chain-of-Thought
2023-10-04 10:54:24 +07:00
Nguyen Trung Duc (john)
6ab1854532 feat: Add chain-of-thought (#37)
* Add chain-of-thought

* Use BasePromptComponent

* Add terminate callback for the chain-of-thought
2023-10-04 02:16:33 +07:00
Nguyen Trung Duc (john)
f80a4ea883 [AUR-425] Fix the cookiecutter command (#35) 2023-10-03 12:13:10 +07:00