Commit Graph

10 Commits

Author SHA1 Message Date
ian_Cin
4256030b4f Adopt pyproject.toml (#89)
* Ditch setup.py in favour of pyproject.toml; bump to 0.3.2

* Bump to 0.3.3
2023-11-29 14:58:35 +07:00
Nguyen Trung Duc (john)
0a3fc4b228 Correct the use of abstractmethod (#80)
* Correct abstractmethod usage (see the sketch below)

* Update interface

* Specify minimal llama-index version [ignore cache]

* Update examples
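
For context, a generic sketch of correct `abstractmethod` usage (illustrative only, not the repository's actual classes): the decorator only takes effect when the class uses `ABCMeta`, e.g. by inheriting from `abc.ABC`, and subclasses must override every abstract method before they can be instantiated.

```python
from abc import ABC, abstractmethod


class BaseRunner(ABC):  # hypothetical base class, for illustration only
    @abstractmethod
    def run(self, text: str) -> str:
        """Subclasses must provide an implementation."""
        ...


class EchoRunner(BaseRunner):
    def run(self, text: str) -> str:
        return text


EchoRunner().run("ok")  # works
# BaseRunner()          # TypeError: can't instantiate abstract class
```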
2023-11-20 11:18:53 +07:00
Nguyen Trung Duc (john)
693ed39de4 Move prompts into LLMs module (#70)
Since the only usage of prompts is within the LLMs, it is reasonable to keep them within the LLM module. This way, the modules are easier to discover and the code base is less complicated.

Changes:

* Move prompt components into llms
* Bump version 0.3.1
* Make pip install dependencies in eager mode

---------

Co-authored-by: ian <ian@cinnamon.is>
2023-11-14 16:00:10 +07:00
Nguyen Trung Duc (john)
d79b3744cb Simplify the BaseComponent interface (#64)
This change removes the following `BaseComponent` methods:

- run_raw
- run_batch_raw
- run_document
- run_batch_document
- is_document
- is_batch

Each component is expected to support multiple types of inputs and a single type of output. Since we want components to work out-of-the-box with both standardized and customized use cases, supporting multiple input types is expected. At the same time, to reduce the complexity of understanding how to use a component, we restrict each component to a single output type.
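
As an illustration of the intended shape (a hypothetical sketch, not the actual kotaemon code), a component exposes one entry point that accepts several input types but always returns the same output type:

```python
from typing import Union


class NormalizeText:
    """Hypothetical component: accepts multiple input types (str or
    list[str]) but has a single output type (list[str])."""

    def run(self, text: Union[str, list[str]]) -> list[str]:
        # Fold every accepted input type into one internal form ...
        items = [text] if isinstance(text, str) else text
        # ... and always return the single agreed-upon output type.
        return [item.strip().lower() for item in items]
```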

To accommodate these changes, we also refactor some components to remove their run_raw, run_batch_raw... methods, and to decide on a common output interface for those components.

Tests are updated accordingly.

Commit changes:

* Add kwargs to vector store's query
* Simplify the BaseComponent
* Update tests
* Remove support for Python 3.8 and 3.9
* Bump version 0.3.0
* Fix GitHub PR caching still using the old environment after bumping the version

---------

Co-authored-by: ian <ian@cinnamon.is>
2023-11-13 15:10:18 +07:00
Nguyen Trung Duc (john)
9035e25666 Upgrade the declarative pipeline for cleaner interface (#51) 2023-10-24 11:12:22 +07:00
Nguyen Trung Duc (john)
6e7905cbc0 [AUR-411] Adapt to the Example2 project (#28)
Add the chatbot from Example2. Create the UI for chat.
2023-10-12 15:13:25 +07:00
cin-jacky
205955b8a3 [AUR-387, AUR-425] Add start-project to CLI (#29) 2023-10-03 11:55:34 +07:00
Nguyen Trung Duc (john)
c3c25db48c [AUR-385, AUR-388] Declare BaseComponent and decide LLM call interface (#2)
- Use cases related to LLM call: https://cinnamon-ai.atlassian.net/browse/AUR-388?focusedCommentId=34873
- Sample usages: `test_llms_chat_models.py` and `test_llms_completion_models.py`:

```python
from kotaemon.llms.chats.openai import AzureChatOpenAI

model = AzureChatOpenAI(
    openai_api_base="https://test.openai.azure.com/",
    openai_api_key="some-key",
    openai_api_version="2023-03-15-preview",
    deployment_name="gpt35turbo",
    temperature=0,
    request_timeout=60,
)
output = model("hello world")
```

For the LLM-call component, I decided to wrap around Langchain's LLM models and Langchain's Chat models, and set the interface as follows:

- Completion LLM component:
```python
class CompletionLLM:

    def run_raw(self, text: str) -> LLMInterface:
        # Run text completion: str in -> LLMInterface out
        ...

    def run_batch_raw(self, text: list[str]) -> list[LLMInterface]:
        # Run text completion in batch: list[str] in -> list[LLMInterface] out
        ...

# run_document and run_batch_document just reuse run_raw and run_batch_raw, due to an unclear use case
```

- Chat LLM component:
```python
class ChatLLM:
    def run_raw(self, text: str) -> LLMInterface:
        # Run chat completion (no chat history): str in -> LLMInterface out
        ...

    def run_batch_raw(self, text: list[str]) -> list[LLMInterface]:
        # Run chat completion in batch mode (no chat history): list[str] in -> list[LLMInterface] out
        ...

    def run_document(self, text: list[BaseMessage]) -> LLMInterface:
        # Run chat completion (with chat history): list[langchain's BaseMessage] in -> LLMInterface out
        ...

    def run_batch_document(self, text: list[list[BaseMessage]]) -> list[LLMInterface]:
        # Run chat completion in batch mode (with chat history): list[list[langchain's BaseMessage]] in -> list[LLMInterface] out
        ...
```
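
A hedged usage sketch of the chat interface above, reusing the `model` from the earlier example and assuming it implements `ChatLLM` (the message contents are illustrative):

```python
from langchain.schema import HumanMessage, SystemMessage

history = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Summarize the release notes."),
]

# Chat completion with history: list[BaseMessage] in -> LLMInterface out
with_history = model.run_document(history)

# Chat completion without history: str in -> LLMInterface out
no_history = model.run_raw("hello world")
```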

- The `LLMInterface` is as follows:

```python
from dataclasses import dataclass, field


@dataclass
class LLMInterface:
    text: list[str]
    completion_tokens: int = -1
    total_tokens: int = -1
    prompt_tokens: int = -1
    logits: list[list[float]] = field(default_factory=list)
```
2023-08-29 15:47:12 +07:00
Nguyen Trung Duc (john)
e9d1d5c118 [AUR-401] Disable Haystack telemetry with monkey patching (#1)
Sample Haystack telemetry log when running a pipeline. Note: the `pipeline.classname` field can leak company information.

```json
{
  "hardware.cpus": 16,
  "hardware.gpus": 0,
  "libraries.colab": false,
  "libraries.cuda": false,
  "libraries.haystack": "1.20.0rc0",
  "libraries.ipython": false,
  "libraries.pytest": false,
  "libraries.ray": false,
  "libraries.torch": false,
  "libraries.transformers": "4.31.0",
  "os.containerized": false,
  "os.family": "Linux",
  "os.machine": "x86_64",
  "os.version": "6.2.0-26-generic",
  "pipeline.classname": "TempPipeline",
  "pipeline.config_hash": "07a8eddd5a6e512c0d898c6d9f445ed9",
  "pipeline.nodes.PromptNode": 1,
  "pipeline.nodes.Shaper": 1,
  "pipeline.nodes.WebRetriever": 1,
  "pipeline.run_parameters.debug": false,
  "pipeline.run_parameters.documents": [
    0
  ],
  "pipeline.run_parameters.file_paths": 0,
  "pipeline.run_parameters.labels": 0,
  "pipeline.run_parameters.meta": 1,
  "pipeline.run_parameters.params": false,
  "pipeline.run_parameters.queries": true,
  "pipeline.runs": 1,
  "pipeline.type": "Query",
  "python.version": "3.10.12"
}
```

Solution: Haystack telemetry relies on the `telemetry` variable, the `posthog` library, and the `HAYSTACK_TELEMETRY_ENABLED` environment variable. We set the environment variable to False and make sure the relevant objects are disabled.
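
A minimal sketch of that approach (the exact patch in this commit may differ, and the module path for the `telemetry` object is an assumption):

```python
import os

# Opt out before Haystack initializes its telemetry machinery.
os.environ["HAYSTACK_TELEMETRY_ENABLED"] = "False"

import haystack.telemetry

# Also neutralize the `telemetry` variable mentioned above, so nothing is
# sent even if the environment variable is read too late.
haystack.telemetry.telemetry = None
```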
2023-08-22 10:02:46 +07:00
trducng
043209fda7 Initiate repository 2023-08-16 14:56:48 +07:00