Best docs Cinnamon will probably ever have (#105)

This commit is contained in:
ian_Cin 2023-12-20 11:30:25 +07:00 committed by GitHub
parent 0e30dcbb06
commit 230328c62f
40 changed files with 1036 additions and 46 deletions


@ -3,6 +3,7 @@ repos:
rev: v4.3.0
hooks:
- id: check-yaml
args: ["--unsafe"]
- id: check-toml
- id: end-of-file-fixer
- id: trailing-whitespace
@ -55,3 +56,9 @@ repos:
"--new-type-inference",
]
exclude: "^templates/"
- repo: https://github.com/codespell-project/codespell
rev: v2.2.4
hooks:
- id: codespell
additional_dependencies:
- tomli


@ -3,6 +3,10 @@
Quick and easy AI components to build Kotaemon - applicable in client
projects.
## Documentation
https://docs.promptui.dm.cinnamon.is
## Install
```shell
@ -22,7 +26,7 @@ pip install kotaemon@git+ssh://git@github.com/Cinnamon/kotaemon.git
- Clone the repo
```shell
git clone git@github.com:Cinnamon/kotaemon.git
cd kotaemon
```

doc_env_reqs.txt Normal file

@ -0,0 +1,7 @@
mkdocs
mkdocstrings[python]
mkdocs-material
mkdocs-gen-files
mkdocs-literate-nav
mkdocs-git-revision-date-localized-plugin
mkdocs-section-index

docs/contributing.md Normal file

@ -0,0 +1,165 @@
# Getting started
## Setup
- Create a conda environment (Python 3.10 suggested)
```shell
conda create -n kotaemon python=3.10
conda activate kotaemon
```
- Clone the repo
```shell
git clone git@github.com:Cinnamon/kotaemon.git
cd kotaemon
```
- Install all
```shell
pip install -e ".[dev]"
```
- Pre-commit
```shell
pre-commit install
```
- Test
```shell
pytest tests
```
## Credential sharing
This repo uses [git-secret](https://sobolevn.me/git-secret/) to share credentials, which
internally uses `gpg` to encrypt and decrypt secret files.
This repo uses `python-dotenv` to manage credentials stored as environment variables.
Please note that the use of `python-dotenv` and these credentials is for development
purposes only. Thus, they should not be used in the main source code (i.e. `kotaemon/` and `tests/`), but can be used in `examples/`.
### Install git-secret
Please follow the [official guide](https://sobolevn.me/git-secret/installation) to install git-secret.
For Windows users, see [For Windows users](#for-windows-users).
For users who don't have sudo privileges to install packages, follow the `Manual Installation` section in the [official guide](https://sobolevn.me/git-secret/installation) and set `PREFIX` to a path you have access to. Don't forget to add `PREFIX` to your `PATH`.
### Gaining access to credentials
In order to gain access to the secret files, you must provide your gpg public key to someone who already has access and ask them to add your key to the keyring. For a quick tutorial on generating your gpg key pair, you can refer to the `Using gpg` section on the [git-secret main page](https://sobolevn.me/git-secret/).
### Decrypt the secret file
The credentials are encrypted in the `.env.secret` file. To print the decrypted content to stdout, run
```shell
git-secret cat [filename]
```
Or to get the decrypted `.env` file, run
```shell
git-secret reveal [filename]
```
### For Windows users
git-secret is currently not available for Windows, so the easiest way is to use it under WSL (please use the latest version of WSL2). From there you have two options:
1. Using the gpg of WSL.
This is the most straightforward option since you would use WSL just like any other Unix environment. However, the downside is that you have to make WSL your main environment, which means WSL must have write permission on your repo. To achieve this, you must either:
- Clone and store your repo inside WSL's file system.
- Provide WSL with the necessary permissions on your Windows file system. This can be achieved by setting the `automount` options for WSL. To do that, add the following content to `/etc/wsl.conf` and then restart your subsystem.
```shell
[automount]
options = "metadata,umask=022,fmask=011"
```
This grants full permissions to the owning user.
2. Using the gpg of Windows but with git-secret from WSL.
For those who use Windows as the main environment, switching back and forth between Windows and WSL is inconvenient. You can instead stay within your Windows environment and apply some tricks to use `git-secret` from WSL.
- Install and setup `gpg` on Windows.
- Install `git-secret` on WSL. Now in Windows, you can invoke `git-secret` using `wsl git-secret`.
- Alternatively, you can set up aliases in CMD to shorten the syntax. Please refer to [this SO answer](https://stackoverflow.com/a/65823225) for instructions. Some recommended aliases are:
```bat
@echo off
:: Commands
DOSKEY ls=dir /B $*
DOSKEY ll=dir /a $*
DOSKEY git-secret=wsl git-secret $*
DOSKEY gs=wsl git-secret $*
```
Now you can invoke `git-secret` in CMD using `git-secret` or `gs`.
- For PowerShell users, similar behaviour can be achieved using `Set-Alias` and `profile.ps1`. Please refer to this [SO thread](https://stackoverflow.com/questions/61081434/how-do-i-create-a-permanent-alias-file-in-powershell-core) as an example.
# PR guideline
## Common conventions
- Review should be done as soon as possible (within 2 business days).
- PR title: [ticket] One-line description (example: [AUR-385, AUR-388] Declare BaseComponent and decide LLM call interface).
- [Encouraged] Provide a quick description in the PR, so that:
- Reviewers can quickly understand the direction of the PR.
- It will be included in the commit message when the PR is merged.
## Environment caching on PR
- To speed up CI, environments are cached based on the version specified in `__init__.py`.
- Since dependency versions in `setup.py` are not pinned, you need to bump the version in order to use a new environment. That environment will then be cached and used by your subsequent commits within the PR, until you bump the version again.
- The new environment created during your PR is cached and will be available to others once the PR is merged.
- If you are experimenting with new dependencies and want a fresh environment every time, add `[ignore cache]` in your commit message. The CI will create a fresh environment to run your commit and then discard it.
- If your PR includes updated dependencies, the recommended workflow is:
  - Do development as usual.
  - When you want to run the CI, push a commit whose message contains `[ignore cache]`.
  - Once the PR is final, bump the version in `__init__.py` and push a final commit that does not contain `[ignore cache]`.
Examples: https://github.com/Cinnamon/kotaemon/pull/2
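The `[ignore cache]` convention above can be sketched as a simple commit-message check (a hypothetical illustration; the actual logic lives in the CI workflow files):

```python
def should_use_cached_env(commit_message: str) -> bool:
    """Return False when the commit opts out of the cached environment."""
    # "[ignore cache]" anywhere in the message requests a fresh environment.
    return "[ignore cache]" not in commit_message

print(should_use_cached_env("Add docs for promptui"))        # True
print(should_use_cached_env("Try new deps [ignore cache]"))  # False
```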
## Merge PR guideline
- Use the squash and merge option.
- The 1st line of the message is the PR title.
- The text area is the PR description.
![image](https://github.com/Cinnamon/kotaemon/assets/35283585/e2593010-d7ef-46e3-8719-6fcae0315b5d)
![image](https://github.com/Cinnamon/kotaemon/assets/35283585/bfe6a117-85cd-4dd4-b432-197c791a9901)
## Develop pipelines
- Nodes
- Params
- Run function
```python
from kotaemon.base import BaseComponent


class Pipeline(BaseComponent):
    llm: AzureOpenAIEmbedding
    doc_store: BaseDocumentStore

    def run(self, input1, input2) -> str:
        input = input1 + input2
        output = self.llm(input)
        self.doc_store.add(output)
        return output


pipeline = Pipeline(llm=AzureOpenAILLM(), doc_store=InMemoryDocstore())
output = pipeline("this is text1", "this is text2")
```


@ -0,0 +1,71 @@
# Creating a component
A fundamental concept in kotaemon is "component".
Anything that isn't data or data structure is a "component". A component can be
thought of as a step within a pipeline. It takes in some input, processes it,
and returns an output, just the same as a Python function! The output will then
become an input for the next component in a pipeline. In fact, a pipeline is just
a component; more precisely, a nested component: a component that makes use of one or more other components in
the processing step. So in reality, there isn't a difference between a pipeline
and a component! Because of that, in kotaemon, we consider them the
same and call both "component".
To define a component, you will:
1. Create a class that subclasses from `kotaemon.base.BaseComponent`
2. Declare init params with type annotation
3. Declare nodes (nodes are just other components!) with type annotation
4. Implement the processing logic in `run`.
The syntax of a component is as follows:
```python
from kotaemon.base import BaseComponent
from kotaemon.llms import AzureChatOpenAI
from kotaemon.parsers import RegexExtractor

class FancyPipeline(BaseComponent):
    param1: str = "This is param1"
    param2: int = 10
    param3: float

    node1: BaseComponent   # this is a node because of BaseComponent type annotation
    node2: AzureChatOpenAI # this is also a node because AzureChatOpenAI subclasses BaseComponent
    node3: RegexExtractor  # this is also a node because RegexExtractor subclasses BaseComponent

    def run(self, some_text: str):
        prompt = (self.param1 + some_text) * int(self.param2 + self.param3)
        llm_pred = self.node2(prompt).text
        matches = self.node3(llm_pred)
        return matches
```
Then this component can be used as follows:
```python
llm = AzureChatOpenAI(endpoint="some-endpoint")
extractor = RegexExtractor(pattern=["yes", "Yes"])
component = FancyPipeline(
    param1="Hello",
    param3=1.5,
    node1=llm,
    node2=llm,
    node3=extractor,
)
component("goodbye")
```
This way, we can define each operation as a reusable component, and use them to
compose larger reusable components!
## Benefits of component
By defining a component as above, we formally encapsulate all the necessary
information inside a single class. This introduces several benefits:
1. Allow tools like promptui to inspect the inner workings of a component in
   order to automatically generate the promptui.
2. Allow visualizing a pipeline for debugging purposes.
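As a rough illustration of the first benefit, a tool can tell nodes from params purely by inspecting type annotations. The snippet below sketches the idea with a plain-Python stand-in for `BaseComponent` (illustrative only, not promptui's actual implementation):

```python
from typing import get_type_hints


class Base:
    """Stand-in for kotaemon.base.BaseComponent (hypothetical)."""


class Pipeline(Base):
    param1: str = "hello"  # plain type annotation -> treated as a param
    node1: Base            # Base subclass annotation -> treated as a node


hints = get_type_hints(Pipeline)
nodes = [n for n, t in hints.items() if isinstance(t, type) and issubclass(t, Base)]
params = [n for n in hints if n not in nodes]
print(nodes, params)  # ['node1'] ['param1']
```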

docs/data-components.md Normal file

@ -0,0 +1,32 @@
The data & data structure components include:
- The `Document` class.
- The document store.
- The vector store.
### Data Loader
- PdfLoader
- Layout-aware with table parsing PdfLoader
- MathPixLoader: To use this loader, you need a MathPix API key; refer to the [MathPix docs](https://docs.mathpix.com/#introduction) for more information
- OCRLoader: This loader uses lib-table and a Flax pipeline to perform OCR and read table structure from PDF files (TODO: add more info about deployment of this module).
- Output:
  - Document: text + metadata to identify whether it is a table or not
```
- "source": source file name
- "type": "table" or "text"
- "table_origin": original table in markdown format (to be fed to an LLM or visualized using external tools)
- "page_label": page number in the original PDF document
```
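The metadata fields above can be used to route table documents differently from plain text. A small sketch (field names are from this page; the values and the `Document`-as-dict shape are illustrative):

```python
# A hypothetical loader output, represented as a plain dict.
doc = {
    "text": "| a | b |",
    "metadata": {
        "source": "report.pdf",
        "type": "table",
        "table_origin": "| a | b |",
        "page_label": "3",
    },
}


def is_table(d) -> bool:
    """Check the `type` metadata to decide whether this document is a table."""
    return d["metadata"].get("type") == "table"

print(is_table(doc))  # True
```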
### Document Store
- InMemoryDocumentStore
### Vector Store
- ChromaVectorStore
- InMemoryVectorStore


@ -0,0 +1,5 @@
.language-pycon .gp,
.language-pycon .go {
/* Generic.Prompt, Generic.Output */
user-select: none;
}

docs/index.md Normal file

@ -0,0 +1 @@
--8<-- "README.md"

docs/overview.md Normal file

@ -0,0 +1,75 @@
## Introduction
The `kotaemon` library focuses on the AI building blocks used to implement Kotaemon. It can be used in both client projects and product development. It consists of base interfaces, core components, and a list of utilities:
- Base interfaces: `kotaemon` defines the base interface of a component in a pipeline. A pipeline is also a component. By clearly defining this interface, a pipeline of steps can be easily constructed and orchestrated.
- Core components: `kotaemon` implements (or wraps 3rd-party libraries like Langchain, llama-index,... when possible) commonly used components in kotaemon use cases. Some of these components are: LLM, vector store, document store, retriever... For a detailed list and description of these components, please refer to the [Pipeline Components](Pipeline-Components) and [Data & Data Structure Components](Data-&-Data-Structure-Components) sections.
- List of utilities: `lib-knowledge` provides utilities and tools that are usually needed in client projects. For example, it provides a prompt engineering UI for AI developers in a project to quickly create a prompt engineering tool for DMs and QALs. It also provides a command to quickly spin up a project code base. For a full list and description of these utilities, please refer to the [Utilities](Utilities) section.
```mermaid
mindmap
root((kotaemon))
Base Interfaces
Document
LLMInterface
RetrievedDocument
BaseEmbeddings
BaseChat
BaseCompletion
...
Core Components
LLMs
AzureOpenAI
OpenAI
Embeddings
AzureOpenAI
OpenAI
HuggingFaceEmbedding
VectorStore
InMemoryVectorstore
ChromaVectorstore
Agent
Tool
DocumentStore
...
Utilities
Scaffold project
PromptUI
Documentation Support
```
## Expected benefit
Before `kotaemon`:
- Starting everything from scratch.
- Knowledge and expertise are fragmented.
- Nothing to reuse.
- No way to collaborate between tech and non-tech experts.
`kotaemon` aims to revolutionize the way we build LLM-related projects. It helps the company side-step those issues by:
- Standardizing the interface to (1) make building LLM pipelines clearer and (2) make it easier to integrate pipelines between different projects.
- Centralizing LLM-related technical components in one place, avoiding fragmented technology development and making LLM-related technology easy to find inside the company.
- Centralizing bug fixes and improvements in one place.
- Reducing boilerplate code during project development.
- Lightning-fast prototyping.
## Install
kotaemon can be installed from source with:
```shell
pip install kotaemon@git+ssh://git@github.com/Cinnamon/kotaemon.git
```
or from Cinnamon's internal Python package index:
```shell
pip install kotaemon --extra-index-url https://ian_devpi.promptui.dm.cinnamon.is/root/packages
```
## Example use cases
- Start a project from scratch: `kh start-project`
- Run the prompt engineering UI tool: `kh promptui export`, then `kh promptui run`.


@ -0,0 +1,65 @@
# import shutil
from pathlib import Path
from typing import Any, Iterable

import mkdocs_gen_files

# get the root source code directory
doc_dir_name = "docs"
doc_dir = Path(__file__)
while doc_dir.name != doc_dir_name and doc_dir != doc_dir.parent:
    doc_dir = doc_dir.parent
if doc_dir == doc_dir.parent:
    raise ValueError(f"root_name ({doc_dir_name}) not in path ({str(Path(__file__))}).")


def generate_docs_for_examples_readme(
    examples_dir: Path, target_doc_folder: str, ignored_modules: Iterable[Any] = []
):
    if not examples_dir.is_dir():
        raise ModuleNotFoundError(str(examples_dir))

    nav = mkdocs_gen_files.Nav()

    for path in sorted(examples_dir.rglob("*README.md")):
        # ignore modules with name starts with underscore (i.e. __init__)
        if path.name.startswith("_") or path.name.startswith("test"):
            continue

        module_path = path.parent.relative_to(examples_dir).with_suffix("")
        doc_path = path.parent.relative_to(examples_dir).with_suffix(".md")
        full_doc_path = Path(target_doc_folder, doc_path)
        parts = list(module_path.parts)
        identifier = ".".join(parts)

        if "tests" in parts:
            continue

        ignore = False
        for each_module in ignored_modules:
            if identifier.startswith(each_module):
                ignore = True
                break
        if ignore:
            continue

        nav_titles = [name.replace("_", " ").title() for name in parts]
        nav[nav_titles] = doc_path.as_posix()

        with mkdocs_gen_files.open(full_doc_path, "w") as f:
            f.write(f'--8<-- "{path.relative_to(examples_dir.parent)}"')

        mkdocs_gen_files.set_edit_path(
            full_doc_path, Path("..") / path.relative_to(examples_dir.parent)
        )

    with mkdocs_gen_files.open(f"{target_doc_folder}/NAV.md", "w") as nav_file:
        nav_file.writelines(nav.build_literate_nav())


generate_docs_for_examples_readme(
    examples_dir=doc_dir.parent / "examples",
    target_doc_folder="examples",
)


@ -0,0 +1,75 @@
# import shutil
from pathlib import Path
from typing import Any, Iterable

import mkdocs_gen_files

# get the root source code directory
doc_dir_name = "docs"
doc_dir = Path(__file__)
while doc_dir.name != doc_dir_name and doc_dir != doc_dir.parent:
    doc_dir = doc_dir.parent
if doc_dir == doc_dir.parent:
    raise ValueError(f"root_name ({doc_dir_name}) not in path ({str(Path(__file__))}).")


def generate_docs_for_src_code(
    code_dir: Path, target_doc_folder: str, ignored_modules: Iterable[Any] = []
):
    if not code_dir.is_dir():
        raise ModuleNotFoundError(str(code_dir))

    nav = mkdocs_gen_files.Nav()

    for path in sorted(code_dir.rglob("*.py")):
        # ignore modules with name starts with underscore (i.e. __init__)
        # if path.name.startswith("_") or path.name.startswith("test"):
        #     continue

        module_path = path.relative_to(code_dir).with_suffix("")
        doc_path = path.relative_to(code_dir).with_suffix(".md")
        full_doc_path = Path(target_doc_folder, doc_path)
        parts = list(module_path.parts)

        if parts[-1] == "__init__":
            doc_path = doc_path.with_name("index.md")
            full_doc_path = full_doc_path.with_name("index.md")
            parts.pop()
            if not parts:
                continue

        if "tests" in parts:
            continue

        identifier = ".".join(parts)
        ignore = False
        for each_module in ignored_modules:
            if identifier.startswith(each_module):
                ignore = True
                break
        if ignore:
            continue

        nav_titles = [name.replace("_", " ").title() for name in parts]
        nav[nav_titles] = doc_path.as_posix()

        with mkdocs_gen_files.open(full_doc_path, "w") as f:
            f.write(f"::: {identifier}")

        # this method works in docs folder
        mkdocs_gen_files.set_edit_path(
            full_doc_path, Path("..") / path.relative_to(code_dir.parent)
        )

    with mkdocs_gen_files.open(f"{target_doc_folder}/NAV.md", "w") as nav_file:
        nav_file.writelines(nav.build_literate_nav())


generate_docs_for_src_code(
    code_dir=doc_dir.parent / "kotaemon",
    target_doc_folder="reference",
    ignored_modules={"contribs"},
)

File diff suppressed because one or more lines are too long (6 files)


@ -0,0 +1,2 @@
!function(){"use strict";var e;e=function(e){"true"===localStorage.getItem("data-md-prefers-color-scheme")&&document.querySelector("body").setAttribute("data-md-color-scheme",e.matches?"dracula":"default")},new MutationObserver((function(t){t.forEach((function(t){if("childList"===t.type&&t.addedNodes.length)for(var a=0;a<t.addedNodes.length;a++){var r=t.addedNodes[a];if(1===r.nodeType&&"body"===r.tagName.toLowerCase()){d=r,o=void 0,c=void 0,l=void 0,o="not all"!==window.matchMedia("(prefers-color-scheme)").media,c=localStorage.getItem("data-md-color-scheme"),l=localStorage.getItem("data-md-prefers-color-scheme"),c||(c="dracula"),l||(l="false"),"true"===l&&o?c=window.matchMedia("(prefers-color-scheme: dark)").matches?"dracula":"default":l="false",d.setAttribute("data-md-prefers-color-scheme",l),d.setAttribute("data-md-color-scheme",c),o&&window.matchMedia("(prefers-color-scheme: dark)").addListener(e);break}}var d,o,c,l}))})).observe(document.querySelector("html"),{childList:!0}),window.toggleScheme=function(){var e=document.querySelector("body"),t="not all"!==window.matchMedia("(prefers-color-scheme)").media,a=e.getAttribute("data-md-color-scheme"),r=e.getAttribute("data-md-prefers-color-scheme");t&&"default"===a&&"true"!==r?(r="true",a=window.matchMedia("(prefers-color-scheme: dark)").matches?"dracula":"default"):t&&"true"===r?(r="false",a="dracula"):"dracula"===a?(r="false",a="default"):(r="false",a="dracula"),localStorage.setItem("data-md-prefers-color-scheme",r),e.setAttribute("data-md-prefers-color-scheme",r),e.setAttribute("data-md-color-scheme",a)}}();
//# sourceMappingURL=material-extra-theme-TVq-kNRT.js.map


@ -0,0 +1 @@
{"version":3,"file":"material-extra-theme-TVq-kNRT.js","sources":["material-extra-theme.js"],"sourcesContent":["(() => {\n\n const preferToggle = e => {\n if (localStorage.getItem(\"data-md-prefers-color-scheme\") === \"true\") {\n document.querySelector(\"body\").setAttribute(\"data-md-color-scheme\", (e.matches) ? \"dracula\" : \"default\")\n }\n }\n\n const setupTheme = body => {\n const preferSupported = window.matchMedia(\"(prefers-color-scheme)\").media !== \"not all\"\n let scheme = localStorage.getItem(\"data-md-color-scheme\")\n let prefers = localStorage.getItem(\"data-md-prefers-color-scheme\")\n\n if (!scheme) {\n scheme = \"dracula\"\n }\n if (!prefers) {\n prefers = \"false\"\n }\n\n if (prefers === \"true\" && preferSupported) {\n scheme = (window.matchMedia(\"(prefers-color-scheme: dark)\").matches) ? \"dracula\" : \"default\"\n } else {\n prefers = \"false\"\n }\n\n body.setAttribute(\"data-md-prefers-color-scheme\", prefers)\n body.setAttribute(\"data-md-color-scheme\", scheme)\n\n if (preferSupported) {\n const matchListener = window.matchMedia(\"(prefers-color-scheme: dark)\")\n matchListener.addListener(preferToggle)\n }\n }\n\n const observer = new MutationObserver(mutations => {\n mutations.forEach(mutation => {\n if (mutation.type === \"childList\") {\n if (mutation.addedNodes.length) {\n for (let i = 0; i < mutation.addedNodes.length; i++) {\n const el = mutation.addedNodes[i]\n\n if (el.nodeType === 1 && el.tagName.toLowerCase() === \"body\") {\n setupTheme(el)\n break\n }\n }\n }\n }\n })\n })\n\n observer.observe(document.querySelector(\"html\"), {childList: true})\n})()\n\nwindow.toggleScheme = () => {\n const body = document.querySelector(\"body\")\n const preferSupported = window.matchMedia(\"(prefers-color-scheme)\").media !== \"not all\"\n let scheme = body.getAttribute(\"data-md-color-scheme\")\n let prefer = body.getAttribute(\"data-md-prefers-color-scheme\")\n\n if (preferSupported && scheme === \"default\" && prefer !== 
\"true\") {\n prefer = \"true\"\n scheme = (window.matchMedia(\"(prefers-color-scheme: dark)\").matches) ? \"dracula\" : \"default\"\n } else if (preferSupported && prefer === \"true\") {\n prefer = \"false\"\n scheme = \"dracula\"\n } else if (scheme === \"dracula\") {\n prefer = \"false\"\n scheme = \"default\"\n } else {\n prefer = \"false\"\n scheme = \"dracula\"\n }\n localStorage.setItem(\"data-md-prefers-color-scheme\", prefer)\n body.setAttribute(\"data-md-prefers-color-scheme\", prefer)\n body.setAttribute(\"data-md-color-scheme\", scheme)\n}\n"],"names":["preferToggle","e","localStorage","getItem","document","querySelector","setAttribute","matches","MutationObserver","mutations","forEach","mutation","type","addedNodes","length","i","el","nodeType","tagName","toLowerCase","body","preferSupported","scheme","prefers","window","matchMedia","media","addListener","observe","childList","toggleScheme","getAttribute","prefer","setItem"],"mappings":"yBAAA,IAEQA,IAAe,SAAAC,GAC0C,SAAzDC,aAAaC,QAAQ,iCACvBC,SAASC,cAAc,QAAQC,aAAa,uBAAyBL,EAAEM,QAAW,UAAY,YA+BjF,IAAIC,kBAAiB,SAAAC,GACpCA,EAAUC,SAAQ,SAAAC,GAChB,GAAsB,cAAlBA,EAASC,MACPD,EAASE,WAAWC,OACtB,IAAK,IAAIC,EAAI,EAAGA,EAAIJ,EAASE,WAAWC,OAAQC,IAAK,CACnD,IAAMC,EAAKL,EAASE,WAAWE,GAE/B,GAAoB,IAAhBC,EAAGC,UAA+C,SAA7BD,EAAGE,QAAQC,cAA0B,CAlCrDC,EAmCIJ,EAlCfK,SACFC,SACAC,SAFEF,EAAwE,YAAtDG,OAAOC,WAAW,0BAA0BC,MAChEJ,EAASpB,aAAaC,QAAQ,wBAC9BoB,EAAUrB,aAAaC,QAAQ,gCAE9BmB,IACHA,EAAS,WAENC,IACHA,EAAU,SAGI,SAAZA,GAAsBF,EACxBC,EAAUE,OAAOC,WAAW,gCAAgClB,QAAW,UAAY,UAEnFgB,EAAU,QAGZH,EAAKd,aAAa,+BAAgCiB,GAClDH,EAAKd,aAAa,uBAAwBgB,GAEtCD,GACoBG,OAAOC,WAAW,gCAC1BE,YAAY3B,GAalB,KACF,CACF,CAtCW,IAAAoB,EACXC,EACFC,EACAC,CAsCJ,GACF,IAESK,QAAQxB,SAASC,cAAc,QAAS,CAACwB,WAAW,IAG/DL,OAAOM,aAAe,WACpB,IAAMV,EAAOhB,SAASC,cAAc,QAC9BgB,EAAwE,YAAtDG,OAAOC,WAAW,0BAA0BC,MAChEJ,EAASF,EAAKW,aAAa,wBAC3BC,EAASZ,EAAKW,aAAa,gCAE3BV,GAA8B,YAAXC,GAAmC,SAAXU,GAC7CA,EAAS,OACTV,EAAUE,OAAOC,WAAW,gCAAgClB,QAAW,UAAY,WAC1Ec,GAA8B,SAAXW,GAC5BA,EAAS,QACTV,EAAS,WACW,YA
AXA,GACTU,EAAS,QACTV,EAAS,YAETU,EAAS,QACTV,EAAS,WAEXpB,aAAa+B,QAAQ,+BAAgCD,GACrDZ,EAAKd,aAAa,+BAAgC0B,GAClDZ,EAAKd,aAAa,uBAAwBgB,EAC5C"}

docs/theme/main.html vendored Normal file

@ -0,0 +1,6 @@
{% extends "base.html" %}
{% block libs %}
{{ super() }}
{% include "partials/libs.html" ignore missing %}
{% endblock %}

docs/theme/partials/footer.html vendored Normal file

@ -0,0 +1,49 @@
{% import "partials/language.html" as lang with context %}
<footer class="md-footer">
{% if page.previous_page or page.next_page %}
<nav
class="md-footer__inner md-grid"
aria-label="{{ lang.t('footer.title') }}"
>
{% if page.previous_page %}
<a
href="{{ page.previous_page.url | url }}"
class="md-footer__link md-footer__link--prev"
rel="prev"
>
<div class="md-footer__button md-icon">
{% include ".icons/material/arrow-left.svg" %}
</div>
<div class="md-footer__title">
<div class="md-ellipsis">
<span class="md-footer__direction">
{{ lang.t("footer.previous") }}
</span>
{{ page.previous_page.title }}
</div>
</div>
</a>
{% endif %}
{% if page.next_page %}
<a
href="{{ page.next_page.url | url }}"
class="md-footer__link md-footer__link--next"
rel="next"
>
<div class="md-footer__title">
<div class="md-ellipsis">
<span class="md-footer__direction">
{{ lang.t("footer.next") }}
</span>
{{ page.next_page.title }}
</div>
</div>
<div class="md-footer__button md-icon">
{% include ".icons/material/arrow-right.svg" %}
</div>
</a>
{% endif %}
</nav>
{% endif %}
</footer>

docs/theme/partials/header.html vendored Normal file

@ -0,0 +1,88 @@
{% set site_url = config.site_url | d(nav.homepage.url, true) | url %}
{% if not config.use_directory_urls and site_url[0] == site_url[-1] == "." %}
{% set site_url = site_url ~ "/index.html" %}
{% endif %}
<header class="md-header" data-md-component="header">
<nav
class="md-header__inner md-grid"
aria-label="{{ lang.t('header.title') }}"
>
<a
href="{{ site_url }}"
title="{{ config.site_name | e }}"
class="md-header__button md-logo"
aria-label="{{ config.site_name }}"
>
{% include "partials/logo.html" %}
</a>
<label class="md-header__button md-icon" for="__drawer">
{% include ".icons/material/menu" ~ ".svg" %}
</label>
<div class="md-header__title" data-md-component="header-title">
<div class="md-header__ellipsis">
<div class="md-header__topic">
<span class="md-ellipsis">
{{ config.site_name }}
</span>
</div>
<div class="md-header__topic" data-md-component="header-topic">
<span class="md-ellipsis">
{% if page and page.meta and page.meta.title %}
{{ page.meta.title }}
{% else %}
{{ page.title }}
{% endif %}
</span>
</div>
</div>
</div>
<div class="md-header__options">
<div class="md-header-nav__scheme md-header-nav__button md-source__icon md-icon">
<a
href="javascript:toggleScheme();"
title="Light mode"
class="light-mode"
>
{% set icon = "material/weather-sunny" %}
{% include ".icons/" ~ icon ~ ".svg" %}
</a>
<a
href="javascript:toggleScheme();"
title="Dark mode"
class="dark-mode"
>
{% set icon = "material/weather-night" %}
{% include ".icons/" ~ icon ~ ".svg" %}
</a>
<a
href="javascript:toggleScheme();"
title="System preference"
class="system-mode"
>
{% set icon = "material/theme-light-dark" %}
{% include ".icons/" ~ icon ~ ".svg" %}
</a>
<!-- <a
href="javascript:toggleScheme();"
title="Unknown scheme"
class="unknown-mode"
>
{% set icon = "material/help-circle" %}
{% include ".icons/" ~ icon ~ ".svg" %}
</a> -->
</div>
</div>
{% if "material/search" in config.plugins %}
<label class="md-header__button md-icon" for="__search">
{% include ".icons/material/magnify.svg" %}
</label>
{% include "partials/search.html" %}
{% endif %}
{% if config.repo_url %}
<div class="md-header__source">
{% include "partials/source.html" %}
</div>
{% endif %}
</nav>
</header>

docs/theme/partials/libs.html vendored Normal file

@ -0,0 +1,2 @@
<script src="{{ 'assets/pymdownx-extras/material-extra-theme-TVq-kNRT.js' | url }}" type="text/javascript"></script>
<script src="{{ 'assets/pymdownx-extras/material-extra-3rdparty-E-i8w1WA.js' | url }}" type="text/javascript"></script>

docs/ultilities.md Normal file

@ -0,0 +1,172 @@
Details of the utilities can be found in the sub-pages of this section.
## Prompt engineering UI
![chat-ui](https://github.com/Cinnamon/kotaemon/assets/35283585/ac8f9aac-d853-4571-a48b-d866a99eaf3e)
**_Important:_** despite the name "prompt engineering UI", this tool allows DMs to test any kind of parameter that is exposed by AIRs. Prompt is one kind of param. There can be other types of params that DMs can tweak (e.g. top_k, temperature...).
**_Note:_** For hands-on examples of how to use the prompt engineering UI, refer to `./examples/promptui` and `./examples/example2/`.
In client projects, AI developers typically build the pipeline. However, for LLM projects requiring Japanese and domain expertise in prompt creation, non-technical team members (DM, BizDev, and QALs) can be more effective. To facilitate this, "xxx" offers a user-friendly prompt engineering UI that AI developers integrate into their pipelines. This enables non-technical members to adjust prompts and parameters, run experiments, and export results for optimization.
As of Sept 2023, there are 2 kinds of prompt engineering UI:
- Simple pipeline: run one-way from start to finish.
- Chat pipeline: interactive back-and-forth.
### Simple pipeline
For the simple pipeline, the supported client project workflow looks as follows:
1. [AIR] Build pipeline
2. [AIR] Export pipeline to config: `$ kh promptui export <module.path.pipelineclass> --output <path/to/config/file.yml>`
3. [AIR] Customize the config
4. [AIR] Spin up prompt engineering UI: `$ kh promptui run <path/to/config/file.yml>`
5. [DM] Change params, run inference
6. [DM] Export to Excel
7. [DM] Select the set of params that achieve the best output
The prompt engineering UI is mainly involved from step 2 to step 7 (step 1 is normal AI work in the project, while step 7 happens exclusively in the Excel file).
#### Step 2 - Export pipeline to config
Command:
```
$ kh promptui export <module.path.pipelineclass> --output <path/to/config/file.yml>
```
where:
- `<module.path.pipelineclass>` is a dot-separated path to the pipeline. For example, if your pipeline can be accessed with `from projectA.pipelines import AnsweringPipeline`, then this value is `projectA.pipelines.AnsweringPipeline`.
- `<path/to/config/file.yml>` is the target file path that the config will be exported to. If the config file already exists and contains information about other pipelines, the config of the current pipeline will be added alongside. If it already contains information about the current pipeline (from the past), the old information will be replaced.
By default, all params in a pipeline (including nested params) will be exported to the configuration file. For params that you do not wish to expose to the UI, you can directly remove them from the config YAML file. You can also annotate those params with `ignore_ui=True`, and they will be ignored in the config generation process. Example:
```python
class Pipeline(BaseComponent):
param1: str = Param(default="hello")
param2: str = Param(default="goodbye", ignore_ui=True)
```
Declared as above, `param1` will show up in the config YAML file, while `param2` will not.
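A rough sketch of how such a flag can drive config generation, using a simplified `Param` stand-in (the real `Param` class in kotaemon may differ):

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Simplified stand-in for kotaemon's Param (hypothetical fields)."""
    default: object = None
    ignore_ui: bool = False


declared = {
    "param1": Param(default="hello"),
    "param2": Param(default="goodbye", ignore_ui=True),
}

# Only params not marked ignore_ui end up in the exported config.
exported = {name: p.default for name, p in declared.items() if not p.ignore_ui}
print(exported)  # {'param1': 'hello'}
```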
#### Step 3 - Customize the config
The AIR can further edit the config file in this step to get the UI (step 4) best suited to their tasks. The exported config has this overall schema:
```
<module.path.pipelineclass1>:
params:
... (Detail param information to initiate a pipeline. This corresponds to the pipeline init parameters.)
inputs:
... (Detail the input of the pipeline e.g. a text prompt, an FNOL... This corresponds to the params of `run(...)` method.)
outputs:
... (Detail the output of the pipeline e.g. prediction, accuracy... This is the output information we wish to see in the UI.)
logs:
... (Detail what information should show up in the log.)
```
##### Input and params
The inputs section has the following overall schema:
```
inputs:
  <input-variable-name-1>:
    component: <supported-UI-component>
    params: # this section is optional
      value: <default-value>
  <input-variable-name-2>:
    ... # similar to above
params:
  <param-variable-name-1>:
    ... # similar to those in the inputs
```
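Filled in with concrete (hypothetical) values, an inputs section might look like:

```
inputs:
  prompt:
    component: text
    params:
      value: "Summarize the following document:"
  top_k:
    component: slider
params:
  temperature:
    component: number
    params:
      value: 0
```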
The list of supported prompt UI components and their corresponding Gradio UI components:
```python
COMPONENTS_CLASS = {
    "text": gr.components.Textbox,
    "checkbox": gr.components.CheckboxGroup,
    "dropdown": gr.components.Dropdown,
    "file": gr.components.File,
    "image": gr.components.Image,
    "number": gr.components.Number,
    "radio": gr.components.Radio,
    "slider": gr.components.Slider,
}
```
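Resolving a component from the config is then a plain dictionary lookup followed by instantiation. The sketch below stubs out the Gradio classes so the lookup logic is runnable on its own; `build_component` is a hypothetical helper for illustration, not promptui's actual code:

```python
def _stub(name):
    # minimal stand-in for a gradio class: records its constructor kwargs
    return type(name, (), {"__init__": lambda self, **params: setattr(self, "params", params)})

# same shape as the COMPONENTS_CLASS mapping above, with stubs instead of gr.components.*
COMPONENTS_CLASS = {
    "text": _stub("Textbox"),
    "number": _stub("Number"),
    "slider": _stub("Slider"),
}

def build_component(spec: dict):
    """Instantiate the UI class named by spec["component"], forwarding
    the optional spec["params"] as constructor keyword arguments."""
    cls = COMPONENTS_CLASS[spec["component"]]
    return cls(**spec.get("params", {}))

box = build_component({"component": "text", "params": {"value": "hello"}})
print(type(box).__name__, box.params)  # Textbox {'value': 'hello'}
```

With the real mapping, the same lookup yields actual Gradio widgets ready to be placed in the UI.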
##### Outputs
The outputs are a list of variables that we wish to show in the UI. Since function outputs in Python don't have variable names, the output declaration is slightly different from the input and param declaration:
```
outputs:
  - component: <supported-UI-component>
    step: <name-of-pipeline-step>
    item: <jsonpath expression to retrieve the info>
  - ... # similar to above
```
where:
- `component`: the same component string (with the same Gradio mapping) as in inputs & params
- `step`: the pipeline step whose output we wish to fetch and show on the UI
- `item`: a jsonpath expression to get the targeted variable from the step above
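To make the `item` mechanism concrete: it is a path expression evaluated against the step's output object. promptui uses jsonpath for this; the helper below is a deliberately simplified stand-in (dot-separated paths only, not promptui's actual code) that illustrates the idea:

```python
def get_item(data, path: str):
    """Walk a dot-separated path (a simplified stand-in for jsonpath)
    through nested dicts/lists and return the targeted value."""
    current = data
    for key in path.strip(".").split("."):
        if isinstance(current, list):
            current = current[int(key)]  # numeric segment indexes into a list
        else:
            current = current[key]  # string segment looks up a dict key
    return current

# a made-up step output: the pipeline step returned a dict of predictions
step_output = {"prediction": {"label": "positive", "score": 0.93}}
print(get_item(step_output, ".prediction.label"))  # positive
```

Real jsonpath additionally supports filters, wildcards, and slices, but the retrieve-by-path idea is the same.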
##### Logs
The logs section declares a list of sheet names and how to retrieve the desired information for each:
```
logs:
  <logname>:
    inputs:
      - name: <column name>
        step: <the pipeline step whose input we wish to see>
        variable: <the variable in the step>
      - ...
    outputs:
      - name: <column name>
        step: <the pipeline step whose output we wish to see>
        item: <how to retrieve the output of that step>
```
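For instance, a log sheet that records the prompt fed to an LLM step and the post-processed answer could be declared as follows. The sheet, step, and variable names are illustrative, not prescribed, and the `item` value stands in for whatever jsonpath expression fits your step's output:

```yaml
logs:
  results:
    inputs:
      - name: Prompt
        step: llm
        variable: prompt
    outputs:
      - name: Answer
        step: post_process
        item: .output
```

Each named sheet becomes one tab in the exported Excel file, with one row per run.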
#### Step 4 + 5 - Spin up prompt engineering UI + Perform prompt engineering
Command:
```
$ kh promptui run <path/to/config/file.yml>
```
This will generate a UI as follows:
![Screenshot from 2023-09-20 12-20-31](https://github.com/Cinnamon/kotaemon/assets/35283585/9ac1b95a-b667-42e7-b318-98a1b805d6df)
where:
- The tabs at the top of the UI correspond to the pipelines available for prompt engineering.
- The inputs and params tabs allow users to edit values (these correspond to the inputs and params in the config file).
- The outputs panel holds the UI elements that show the outputs defined in the config file.
- The Run button executes the pipeline with the supplied inputs and params, and renders the result in the outputs panel.
- The Export button exports the logs of all runs to an Excel file, which users can inspect to find the best set of params.
#### Step 6 - Export to Excel
Upon clicking Export, users can download the Excel file.
### Chat pipeline
The chat pipeline workflow is different from the simple pipeline workflow. In a simple pipeline, each Run creates a set of inputs, params and outputs for users to compare. In a chat pipeline, each Run is not a one-off run but a long interactive session. Hence, the workflow is as follows:
1. Set the desired parameters.
2. Click "New chat" to start a chat session with the supplied parameters. This set of parameters will persist until the end of the chat session; during an ongoing chat session, changing the parameters will not take effect.
3. Chat and interact with the chatbot on the right panel. Any additional inputs you supply (if any) will be passed to the chatbot.
4. During the chat, the chat log will show up in the "Output" tab. This is empty by default, so if you want to show the log here, ask the AI developers to configure the UI settings.
5. When you finish chatting, select your preference in the radio box and click "End chat". This will save the chat log and the preference to disk.
6. To compare the results of different runs, click "Export" to get an Excel spreadsheet summarizing them.

docs/upload-package.md Normal file
@ -0,0 +1,24 @@
Devpi server endpoint (subject to change): https://ian_devpi.promptui.dm.cinnamon.is/root/packages
Install devpi-client
```bash
pip install devpi-client
```
Login to the server
```bash
devpi use <server endpoint> # set server endpoint provided above
devpi login <user name> --password=<your password> # login
```
If you don't yet have an account, please contact Ian or John.
Upload your package
```bash
devpi use <package name>/dev # choose the index to upload your package to
cd <your package directory which must contain a pyproject.toml/setup.py>
devpi upload
```


@ -67,7 +67,7 @@ class LangchainAgent(BaseAgent):
def run(self, instruction: str) -> AgentOutput:
assert (
self.agent is not None
), "Lanchain AgentExecutor is not correclty initialized"
), "Lanchain AgentExecutor is not correctly initialized"
# Langchain AgentExecutor call
output = self.agent(instruction)["output"]


@ -6,16 +6,16 @@ from kotaemon.base.schema import Document
class BaseComponent(Function):
"""A component is a class that can be used to compose a pipeline
"""A component is a class that can be used to compose a pipeline.
Benefits of component:
!!! tip "Benefits of component"
- Auto caching, logging
- Allow deployment
For each component, the spirit is:
!!! tip "For each component, the spirit is"
- Tolerate multiple input types, e.g. str, Document, List[str], List[Document]
- Enforce single output type. Hence, the output type of a component should be
as generic as possible.
as generic as possible.
"""
inflow = None


@ -22,6 +22,9 @@ class Document(BaseDocument):
This class accept one positional argument `content` of an arbitrary type, which will
store the raw content of the document. If specified, the class will use
`content` to initialize the base llama_index class.
Args:
content: the raw content of the document.
"""
content: Any
@ -99,7 +102,7 @@ class RetrievedDocument(Document):
"""Subclass of Document with retrieval-related information
Attributes:
score (float): score of the document (from 0.0 to 1.0)
score (float): score of the document (from 0.0 to 1.0)
retrieval_metadata (dict): metadata from the retrieval process, can be used
by different components in a retrieved pipeline to communicate with each
other


@ -4,6 +4,7 @@ from .base import BaseLLM
from .branching import GatedBranchingPipeline, SimpleBranchingPipeline
from .chats import AzureChatOpenAI, ChatLLM
from .completions import LLM, AzureOpenAI, OpenAI
from .cot import ManualSequentialChainOfThought, Thought
from .linear import GatedLinearPipeline, SimpleLinearPipeline
from .prompts import BasePromptComponent, PromptTemplate
@ -28,4 +29,7 @@ __all__ = [
"GatedLinearPipeline",
"SimpleBranchingPipeline",
"GatedBranchingPipeline",
# chain-of-thoughts
"ManualSequentialChainOfThought",
"Thought",
]


@ -12,7 +12,8 @@ class SimpleBranchingPipeline(BaseComponent):
Attributes:
branches (List[BaseComponent]): The list of branches to be executed.
Example Usage:
Example:
```python
from kotaemon.llms import (
AzureChatOpenAI,
BasePromptComponent,
@ -45,6 +46,7 @@ class SimpleBranchingPipeline(BaseComponent):
print(pipeline(condition_text="1"))
print(pipeline(condition_text="2"))
print(pipeline(condition_text="12"))
```
"""
branches: List[BaseComponent] = Param(default_callback=lambda *_: [])
@ -87,7 +89,8 @@ class GatedBranchingPipeline(SimpleBranchingPipeline):
Attributes:
branches (List[BaseComponent]): The list of branches to be executed.
Example Usage:
Example:
```python
from kotaemon.llms import (
AzureChatOpenAI,
BasePromptComponent,
@ -119,6 +122,7 @@ class GatedBranchingPipeline(SimpleBranchingPipeline):
)
print(pipeline(condition_text="1"))
print(pipeline(condition_text="2"))
```
"""
def run(self, *, condition_text: Optional[str] = None, **prompt_kwargs):
@ -135,7 +139,7 @@ class GatedBranchingPipeline(SimpleBranchingPipeline):
Union[OutputType, None]: The output of the first branch that satisfies the
condition, or None if no branch satisfies the condition.
Raise:
Raises:
ValueError: If condition_text is None
"""
if condition_text is None:


@ -1,7 +1,9 @@
from copy import deepcopy
from typing import Callable, List
from kotaemon.base import BaseComponent, Document, Node, Param
from theflow import Function, Node, Param
from kotaemon.base import BaseComponent, Document
from .chats import AzureChatOpenAI
from .completions import LLM
@ -66,13 +68,13 @@ class Thought(BaseComponent):
prompt: str = Param(
help=(
"The prompt template string. This prompt template has Python-like "
"variable placeholders, that then will be subsituted with real values when "
"this component is executed"
"The prompt template string. This prompt template has Python-like variable"
" placeholders, that then will be substituted with real values when this"
" component is executed"
)
)
llm: LLM = Node(AzureChatOpenAI, help="The LLM model to execute the input prompt")
post_process: BaseComponent = Node(
post_process: Function = Node(
help=(
"The function post-processor that post-processes LLM output prediction ."
"It should take a string as input (this is the LLM output text) and return "
@ -83,7 +85,7 @@ class Thought(BaseComponent):
@Node.auto(depends_on="prompt")
def prompt_template(self):
"""Automatically wrap around param prompt. Can ignore"""
return BasePromptComponent(template=self.prompt)
return BasePromptComponent(self.prompt)
def run(self, **kwargs) -> Document:
"""Run the chain of thought"""
@ -113,20 +115,19 @@ class ManualSequentialChainOfThought(BaseComponent):
**Create and run a chain of thought without "+" operator:**
```python
>> from kotaemon.pipelines.cot import Thought, ManualSequentialChainOfThought
>> llm = AzureChatOpenAI(...)
>> thought1 = Thought(
prompt="Word {word} in {language} is ",
post_process=lambda string: {"translated": string},
)
>> thought2 = Thought(
prompt="Translate {translated} to Japanese",
post_process=lambda string: {"output": string},
)
>> thought = ManualSequentialChainOfThought(thoughts=[thought1, thought2], llm=llm)
>> thought(word="hello", language="French")
```pycon
>>> from kotaemon.pipelines.cot import Thought, ManualSequentialChainOfThought
>>> llm = AzureChatOpenAI(...)
>>> thought1 = Thought(
>>> prompt="Word {word} in {language} is ",
>>> post_process=lambda string: {"translated": string},
>>> )
>>> thought2 = Thought(
>>> prompt="Translate {translated} to Japanese",
>>> post_process=lambda string: {"output": string},
>>> )
>>> thought = ManualSequentialChainOfThought(thoughts=[thought1, thought2], llm=llm)
>>> thought(word="hello", language="French")
{'word': 'hello',
'language': 'French',
'translated': '"Bonjour"',


@ -21,6 +21,7 @@ class SimpleLinearPipeline(BaseComponent):
post-processor component or function.
Example Usage:
```python
from kotaemon.llms import AzureChatOpenAI, BasePromptComponent
def identity(x):
@ -41,6 +42,7 @@ class SimpleLinearPipeline(BaseComponent):
post_processor=identity,
)
print(pipeline(word="lone"))
```
"""
prompt: BasePromptComponent
@ -85,7 +87,8 @@ class GatedLinearPipeline(SimpleLinearPipeline):
condition (Callable[[IO_Type], Any]): A callable function that represents the
condition.
Example Usage:
Usage:
```{.py3 title="Example Usage"}
from kotaemon.llms import AzureChatOpenAI, BasePromptComponent
from kotaemon.parsers import RegexExtractor
@ -109,6 +112,7 @@ class GatedLinearPipeline(SimpleLinearPipeline):
)
print(pipeline(condition_text="some pattern", word="lone"))
print(pipeline(condition_text="other pattern", word="lone"))
```
"""
condition: Callable[[IO_Type], Any]


@ -72,7 +72,7 @@ class PromptTemplate:
UserWarning,
)
def populate(self, **kwargs):
def populate(self, **kwargs) -> str:
"""
Strictly populate the template with the given keyword arguments.
@ -81,7 +81,7 @@ class PromptTemplate:
Each keyword corresponds to a placeholder in the template.
Returns:
str: The populated template.
The populated template.
Raises:
ValueError: If an unknown placeholder is provided.


@ -4,7 +4,7 @@ from typing import Any, List, Type, Union
from llama_index import SimpleDirectoryReader, download_loader
from llama_index.readers.base import BaseReader
from ..base import BaseComponent, Document
from kotaemon.base import BaseComponent, Document
class AutoReader(BaseComponent):


@ -93,7 +93,7 @@ def get_rect_iou(gt_box: List[tuple], pd_box: List[tuple], iou_type=0) -> int:
# compute the intersection over union by taking the intersection
# area and dividing it by the sum of prediction + ground-truth
# areas - the interesection area
# areas - the intersection area
if iou_type == 0:
iou = interArea / float(gt_area + pd_area - interArea)
elif iou_type == 1:


@ -34,8 +34,7 @@ def read_pdf_unstructured(input_path: Union[Path, str]):
from unstructured.partition.auto import partition
except ImportError:
raise ImportError(
"Please install unstructured PDF reader \
`pip install unstructured[pdf]`"
"Please install unstructured PDF reader `pip install unstructured[pdf]`"
)
page_items = defaultdict(list)
@ -60,7 +59,7 @@ def read_pdf_unstructured(input_path: Union[Path, str]):
def merge_ocr_and_pdf_texts(
ocr_list: List[dict], pdf_text_list: List[dict], debug_info=None
):
"""Merge PDF and OCR text using IOU overlaping location
"""Merge PDF and OCR text using IOU overlapping location
Args:
ocr_list: List of OCR items {"text", "box", "location"}
pdf_text_list: List of PDF items {"text", "box", "location"}
@ -115,7 +114,7 @@ def merge_ocr_and_pdf_texts(
def merge_table_cell_and_ocr(
table_list: List[dict], ocr_list: List[dict], pdf_list: List[dict], debug_info=None
):
"""Merge table items with OCR text using IOU overlaping location
"""Merge table items with OCR text using IOU overlapping location
Args:
table_list: List of table items
"type": ("table", "cell", "text"), "text", "box", "location"}
@ -123,7 +122,7 @@ def merge_table_cell_and_ocr(
pdf_list: List of PDF items {"text", "box", "location"}
Returns:
all_table_cells: List of tables, each of table is reprented
all_table_cells: List of tables, each of table is represented
by list of cells with combined text from OCR
not_matched_items: List of PDF text which is not overlapped by table region
"""


@ -100,11 +100,14 @@ class RegexExtractor(BaseComponent):
A list contains the output ExtractorOutput for each input
Example:
document1 = Document(...)
document2 = Document(...)
document_batch = [document1, document2]
batch_output = self(document_batch)
# batch_output will be [output1_document1, output1_document2]
```pycon
>>> document1 = Document(...)
>>> document2 = Document(...)
>>> document_batch = [document1, document2]
>>> batch_output = self(document_batch)
>>> print(batch_output)
[output1_document1, output1_document2]
```
"""
# TODO: this conversion seems common
input_: list[str] = []

mkdocs.yml Normal file
@ -0,0 +1,105 @@
repo_name: Cinnamon/kotaemon
repo_url: https://github.com/Cinnamon/kotaemon
site_name: kotaemon Docs
edit_uri: edit/main/docs/
nav:
- Getting Started:
- Quick Start: index.md
- Overview: overview.md
- Contributing: contributing.md
- Tutorial:
- Data & Data Structure Components: data-components.md
- Creating a Component: create-a-component.md
- Utilities: ultilities.md
# generated using gen-files + literate-nav
- API Reference: reference/
- Use Cases: examples/
- Misc:
- Upload python package to private index: upload-package.md
markdown_extensions:
- admonition
- pymdownx.highlight:
use_pygments: true
anchor_linenums: true
line_spans: __span
linenums: true
pygments_lang_class: true
- pymdownx.inlinehilite
- pymdownx.snippets
- pymdownx.details
- pymdownx.extra
- pymdownx.tabbed:
alternate_style: true
- pymdownx.superfences:
custom_fences:
- name: mermaid
class: mermaid
format: !!python/name:pymdownx.superfences.fence_code_format
- toc:
permalink: true
title: Page contents
plugins:
- search
- gen-files:
scripts:
- docs/scripts/generate_reference_docs.py
- docs/scripts/generate_examples_docs.py
- literate-nav:
nav_file: NAV.md
- mkdocstrings:
handlers:
python:
options:
docstring_options:
ignore_init_summary: false
filters:
- "!^_"
members_order: source
separate_signature: true
paths: [kotaemon]
- git-revision-date-localized:
enable_creation_date: true
type: timeago
fallback_to_build_date: true
- section-index
theme:
features:
- content.action.edit
- content.tabs.link
- content.code.annotate
- content.code.annotations
- content.code.copy
- navigation.tabs
- navigation.top
- navigation.instant
- navigation.indexes
- toc.follow
- search.share
- search.highlight
- search.suggest
name: material
custom_dir: docs/theme
palette:
scheme: dracula
primary: deep purple
accent: deep purple
icon:
repo: fontawesome/brands/github
edit: material/pencil
view: material/eye
extra_css:
- extra/css/code_select.css
- assets/pymdownx-extras/extra-fb5a2a1c86.css
extra_javascript:
- assets/pymdownx-extras/extra-loader-MCFnu0Wd.js
validation:
absolute_links: warn
omitted_files: warn
unrecognized_links: warn


@ -70,3 +70,10 @@ kh = "kotaemon.cli:main"
Homepage = "https://github.com/Cinnamon/kotaemon/"
Repository = "https://github.com/Cinnamon/kotaemon/"
Documentation = "https://github.com/Cinnamon/kotaemon/wiki"
[tool.codespell]
skip = "*.js,*.css,*.map"
# `llm` abbreviation for large language models
ignore-words-list = "llm,fo"
quiet-level = 3
check-filenames = ""


@ -34,7 +34,7 @@ class TestChromaVectorStore:
]
assert db._collection.count() == 0, "Expected empty collection"
output = db.add(documents)
assert len(output) == 2, "Expected outputing 2 ids"
assert len(output) == 2, "Expected outputting 2 ids"
assert db._collection.count() == 2, "Expected 2 added entries"
def test_delete(self, tmp_path):