Commit Graph

39 Commits

Author SHA1 Message Date
Tuan Anh Nguyen Dang (Tadashi_Cin)
d880294153 fix: pwd change in setttings (#147) 2024-08-29 13:41:12 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
2570e11501 feat: merge develop (#123)
* Support hybrid vector retrieval

* Enable figures and table reading in Azure DI

* Retrieve with multi-modal

* Fix mixing up table

* Add txt loader

* Add Anthropic Chat

* Raising error when retrieving help file

* Allow same filename for different people if private is True

* Allow declaring extra LLM vendors

* Show chunks on the File page

* Allow elasticsearch to get more docs

* Fix Cohere response (#86)

* Fix Cohere response

* Remove Adobe pdfservice from dependency

kotaemon doesn't rely more pdfservice for its core functionality,
and pdfservice uses very out-dated dependency that causes conflict.

---------

Co-authored-by: trducng <trungduc1992@gmail.com>

* Add confidence score (#87)

* Save question answering data as a log file

* Save the original information besides the rewritten info

* Export Cohere relevance score as confidence score

* Fix style check

* Upgrade the confidence score appearance (#90)

* Highlight the relevance score

* Round relevance score. Get key from config instead of env

* Cohere return all scores

* Display relevance score for image

* Remove columns and rows in Excel loader which contains all NaN (#91)

* remove columns and rows which contains all NaN

* back to multiple joiner options

* Fix style

---------

Co-authored-by: linhnguyen-cinnamon <cinmc0019@CINMC0019-LinhNguyen.local>
Co-authored-by: trducng <trungduc1992@gmail.com>

* Track retriever state

* Bump llama-index version 0.10

* feat/save-azuredi-mhtml-to-markdown (#93)

* feat/save-azuredi-mhtml-to-markdown

* fix: replace os.path to pathlib change theflow.settings

* refactor: base on pre-commit

* chore: move the func of saving content markdown above removed_spans

---------

Co-authored-by: jacky0218 <jacky0218@github.com>

* fix: losing first chunk (#94)

* fix: losing first chunk.

* fix: update the method of preventing losing chunks

---------

Co-authored-by: jacky0218 <jacky0218@github.com>

* fix: adding the base64 image in markdown (#95)

* feat: more chunk info on UI

* fix: error when reindexing files

* refactor: allow more information exception trace when using gpt4v

* feat: add excel reader that treats each worksheet as a document

* Persist loader information when indexing file

* feat: allow hiding unneeded setting panels

* feat: allow specific timezone when creating conversation

* feat: add more confidence score (#96)

* Allow a list of rerankers

* Export llm reranking score instead of filter with boolean

* Get logprobs from LLMs

* Rename cohere reranking score

* Call 2 rerankers at once

* Run QA pipeline for each chunk to get qa_score

* Display more relevance scores

* Define another LLMScoring instead of editing the original one

* Export logprobs instead of probs

* Call LLMScoring

* Get qa_score only in the final answer

* feat: replace text length with token in file list

* ui: show index name instead of id in the settings

* feat(ai): restrict the vision temperature

* fix(ui): remove the misleading message about non-retrieved evidences

* feat(ui): show the reasoning name and description in the reasoning setting page

* feat(ui): show version on the main windows

* feat(ui): show default llm name in the setting page

* fix(conf): append the result of doc in llm_scoring (#97)

* fix: constraint maximum number of images

* feat(ui): allow filter file by name in file list page

* Fix exceeding token length error for OpenAI embeddings by chunking then averaging (#99)

* Average embeddings in case the text exceeds max size

* Add docstring

* fix: Allow empty string when calling embedding

* fix: update trulens LLM ranking score for retrieval confidence, improve citation (#98)

* Round when displaying not by default

* Add LLMTrulens reranking model

* Use llmtrulensscoring in pipeline

* fix: update UI display for trulen score

---------

Co-authored-by: taprosoft <tadashi@cinnamon.is>

* feat: add question decomposition & few-shot rewrite pipeline (#89)

* Create few-shot query-rewriting. Run and display the result in info_panel

* Fix style check

* Put the functions to separate modules

* Add zero-shot question decomposition

* Fix fewshot rewriting

* Add default few-shot examples

* Fix decompose question

* Fix importing rewriting pipelines

* fix: update decompose logic in fullQA pipeline

---------

Co-authored-by: taprosoft <tadashi@cinnamon.is>

* fix: add encoding utf-8 when save temporal markdown in vectorIndex (#101)

* fix: improve retrieval pipeline and relevant score display (#102)

* fix: improve retrieval pipeline by extending first round top_k with multiplier

* fix: minor fix

* feat: improve UI default settings and add quick switch option for pipeline

* fix: improve agent logics (#103)

* fix: improve agent progres display

* fix: update retrieval logic

* fix: UI display

* fix: less verbose debug log

* feat: add warning message for low confidence

* fix: LLM scoring enabled by default

* fix: minor update logics

* fix: hotfix image citation

* feat: update docx loader for handle merged table cells + handle zip file upload (#104)

* feat: update docx loader for handle merged table cells

* feat: handle zip file

* refactor: pre-commit

* fix: escape text in download UI

* feat: optimize vector store query db (#105)

* feat: optimize vector store query db

* feat: add file_id to chroma metadatas

* feat: remove unnecessary logs and update migrate script

* feat: iterate through file index

* fix: remove unused code

---------

Co-authored-by: taprosoft <tadashi@cinnamon.is>

* fix: add openai embedidng exponential back-off

* fix: update import download_loader

* refactor: codespell

* fix: update some default settings

* fix: update installation instruction

* fix: default chunk length in simple QA

* feat: add share converstation feature and enable retrieval history (#108)

* feat: add share converstation feature and enable retrieval history

* fix: update share conversation UI

---------

Co-authored-by: taprosoft <tadashi@cinnamon.is>

* fix: allow exponential backoff for failed OCR call (#109)

* fix: update default prompt when no retrieval is used

* fix: create embedding for long image chunks

* fix: add exception handling for additional table retriever

* fix: clean conversation & file selection UI

* fix: elastic search with empty doc_ids

* feat: add thumbnail PDF reader for quick multimodal QA

* feat: add thumbnail handling logic in indexing

* fix: UI text update

* fix: PDF thumb loader page number logic

* feat: add quick indexing pipeline and update UI

* feat: add conv name suggestion

* fix: minor UI change

* feat: citation in thread

* fix: add conv name suggestion in regen

* chore: add assets for usage doc

* chore: update usage doc

* feat: pdf viewer (#110)

* feat: update pdfviewer

* feat: update missing files

* fix: update rendering logic of infor panel

* fix: improve thumbnail retrieval logic

* fix: update PDF evidence rendering logic

* fix: remove pdfjs built dist

* fix: reduce thumbnail evidence count

* chore: update gitignore

* fix: add js event on chat msg select

* fix: update css for viewer

* fix: add env var for PDFJS prebuilt

* fix: move language setting to reasoning utils

---------

Co-authored-by: phv2312 <kat87yb@gmail.com>
Co-authored-by: trducng <trungduc1992@gmail.com>

* feat: graph rag (#116)

* fix: reload server when add/delete index

* fix: rework indexing pipeline to be able to disable vectorstore and splitter if needed

* feat: add graphRAG index with plot view

* fix: update requirement for graphRAG and lighten unnecessary packages

* feat: add knowledge network index (#118)

* feat: add Knowledge Network index

* fix: update reader mode setting for knet

* fix: update init knet

* fix: update collection name to index pipeline

* fix: missing req

---------

Co-authored-by: jeff52415 <jeff.yang@cinnamon.is>

* fix: update info panel return for graphrag

* fix: retriever setting graphrag

* feat: local llm settings (#122)

* feat: expose context length as reasoning setting to better fit local models

* fix: update context length setting for agents

* fix: rework threadpool llm call

* fix: fix improve indexing logic

* fix: fix improve UI

* feat: add lancedb

* fix: improve lancedb logic

* feat: add lancedb vectorstore

* fix: lighten requirement

* fix: improve lanceDB vs

* fix: improve UI

* fix: openai retry

* fix: update reqs

* fix: update launch command

* feat: update Dockerfile

* feat: add plot history

* fix: update default config

* fix: remove verbose print

* fix: update default setting

* fix: update gradio plot return

* fix: default gradio tmp

* fix: improve lancedb docstore

* fix: fix question decompose pipeline

* feat: add multimodal reader in UI

* fix: udpate docs

* fix: update default settings & docker build

* fix: update app startup

* chore: update documentation

* chore: update README

* chore: update README

---------

Co-authored-by: trducng <trungduc1992@gmail.com>

* chore: update README

* chore: update README

---------

Co-authored-by: trducng <trungduc1992@gmail.com>
Co-authored-by: cin-ace <ace@cinnamon.is>
Co-authored-by: Linh Nguyen <70562198+linhnguyen-cinnamon@users.noreply.github.com>
Co-authored-by: linhnguyen-cinnamon <cinmc0019@CINMC0019-LinhNguyen.local>
Co-authored-by: cin-jacky <101088014+jacky0218@users.noreply.github.com>
Co-authored-by: jacky0218 <jacky0218@github.com>
Co-authored-by: kan_cin <kan@cinnamon.is>
Co-authored-by: phv2312 <kat87yb@gmail.com>
Co-authored-by: jeff52415 <jeff.yang@cinnamon.is>
2024-08-26 08:50:37 +07:00
trducng
ebf1315569 (pump:minor) Allow the indexing pipeline to report the indexing progress onto the UI (#81)
* Turn the file indexing event to generator to report progress

* Fix React text's trimming function

* Refactor delete file into a method
2024-05-25 22:09:41 +07:00
trducng
56dfc8fb53 Allow the application name to be configurable in settings (#80)
* Make app name configurable

* Use app name in browser tab
2024-05-20 22:37:24 +07:00
trducng (john)
5ca3c25404 Avoid empty chat message (#78) 2024-05-20 16:20:50 +07:00
ian_Cin
b2296cfcdf (bump:patch) Feat: Show app version in the Help page (#68)
* typo

* show version in the Help page

* update docs

* pump duckduckgo-search

* allow app version to be set by env var
2024-05-16 14:27:51 +07:00
ian_Cin
654501e01c (bump:minor) Feat: Add mechanism for user-site update and auto creating releases (#56)
* move flowsettings.py and launch.py to root

* update docs

* sync sub package versions

* rename launch.py to app.py and make run scripts work with installation package

* add update scripts

* auto version for root package

* rename authors and update doc dir

* Update auto-bump-and-release.yaml to trigger on push to main branch

* latest as branch instead of tag

* pin deps versions

* cache the changelogs
2024-05-15 16:34:50 +07:00
Duc Nguyen (john)
a8725710af Allow users to select reasoning pipeline. Fix small issues with user UI, cohere name (#50)
* Fix user page

* Allow changing LLM in reasoning pipeline

* Fix CohereEmbedding name
2024-04-25 17:18:12 +07:00
Duc Nguyen (john)
e29bec6275 Allow file index to be private (#45)
* Fix breaking reranker

* Allow private file index

* Avoid setting default to 1 when user management is enabled
2024-04-25 14:24:35 +07:00
Duc Nguyen (john)
1b2082a140 Allow file selector to be disabled (#36)
* Allow file selector to be disabled

* Update docs and variable names
2024-04-16 18:43:56 +07:00
ian_Cin
5286ff48bc Fix info panel overflow (#33)
* update chatbot placeholder

* fix chat info panel overflow bug

* set azure_endpoint to required in AzureChatOpenAI

* update screenshots
2024-04-14 09:34:14 +07:00
ian_Cin
8985963e1e Setup app data dir (#32)
* setup local data dir

* update readme

* update chat panel

* update help page
2024-04-13 23:26:06 +07:00
Duc Nguyen (john)
5ce6bac03d Allow listing indices (#22) 2024-04-11 16:28:04 +07:00
ian_Cin
b507eef541 Improve manuals (#19)
* Rename Admin -> Resources
* Improve ui
* Update docs
2024-04-10 17:04:04 +07:00
Duc Nguyen (john)
7b3307e3c4 Provide embedding manager (#16)
* Provide the Embedding management UI

* Update Fastembed documentation

* Add validation when adding / updating embeddings

* Stop using the old ktem embeddings manager

* Set default local embedding models

* Move the local embeddings below in flowsettings

* Update flowsettings
2024-04-10 15:11:44 +07:00
ian_Cin
8001c86b16 Feat/new UI (#13)
* new custom theme

* improve css: scrollbar, header, tabs and buttons

* update settings tab

* open file index selector by default

* update chat control panel

* update chat panel

* update file index page

* cap gradio<=4.22.0

* rename admin page

* adjust UI

* update flowsettings

* auto start in browser

* change colour for edit LLM page's button
2024-04-08 22:23:00 +07:00
Duc Nguyen (john)
a203fc0f7c Allow users to add LLM within the UI (#6)
* Rename AzureChatOpenAI to LCAzureChatOpenAI
* Provide vanilla ChatOpenAI and AzureChatOpenAI
* Remove the highest accuracy, lowest cost criteria

These criteria are unnecessary. The users, not pipeline creators, should choose
which LLM to use. Furthermore, it's cumbersome to input this information,
really degrades user experience.

* Remove the LLM selection in simple reasoning pipeline
* Provide a dedicated stream method to generate the output
* Return placeholder message to chat if the text is empty
2024-04-06 11:53:17 +07:00
Duc Nguyen (john)
e187e23dd1 Improve UX (#9)
* Go to chat tab when resignin
* Allow placeholder message configurable
* Hide setting tabs if there aren't any settings
2024-04-04 15:39:24 +07:00
ian_Cin
ecf09b275f Fix UI bugs (#8)
* Auto create conversation when the user starts

* Add conversation rename rule check

* Fix empty name during save

* Confirm deleting conversation

* Show warning if users don't select file when upload files in the File Index

* Feedback when user uploads duplicated file

* Limit the file types

* Fix valid username

* Allow login when username with leading and trailing whitespaces

* Improve the user

* Disable admin panel for non-admnin user

* Refresh user lists after creating/deleting users

* Auto logging in

* Clear admin information upon signing out

* Fix unable to receive uploaded filename that include special characters, like !@#$%^&*().pdf

* Set upload validation for FileIndex

* Improve user management UI/UIX

* Show extraction error when indexing file

* Return selected user -1 when signing out

* Fix default supported file types in file index

* Validate changing password

* Allow the selector to contain mulitple gradio components

* A more tolerable placeholder screen

* Allow chat suggestion box

* Increase concurrency limit

* Make adobe loader optional

* Use BaseReasoning

---------

Co-authored-by: trducng <trungduc1992@gmail.com>
2024-04-03 16:33:54 +07:00
ian_Cin
43a18ba070 Feat/regenerate answer (#7)
* Add regen button and repharasing question on regen

* Stop appending regen messages to history, allow only one

* Add dynamic conversation state

* Allow reasoning pipeline to manipulate state

---------

Co-authored-by: albert <albert@cinnamon.is>
Co-authored-by: Duc Nguyen (john) <trungduc1992@gmail.com>
2024-04-03 15:37:55 +07:00
ian
14482e9755 bug fix: settings are not persistent 2024-03-28 16:36:05 +07:00
Duc Nguyen (john)
2950e6ed02 Improve behavior of simple reasoning (#157)
* Add base reasoning implementation

* Provide explicit async and streaming capability

* Allow refreshing the information panel
2024-03-12 13:03:38 +07:00
trducng
2b3571e892 Fix subscribing sign-in/out 2024-03-08 13:38:29 +07:00
Duc Nguyen (john)
4f356f7f9a Provide dedicated page for login (#153) 2024-03-08 08:06:51 +07:00
Duc Nguyen (john)
9725d60791 Create user management functionality (#152)
* Create user management page
* Remove old user creating UI
* Add username validation; admin user auto-creation
* Provide docs on user management
* Bump version
2024-03-07 14:19:37 +07:00
Duc Nguyen (john)
8a90fcfc99 Restructure index to allow it to be dynamically created by end-user (#151)
1. Introduce the concept of "collection_name" to docstore and vector store. Each collection can be viewed similarly to a table in a SQL database. It allows better organizing information within this data source.
2. Move the `Index` and `Source` tables from the application scope into the index scope. For each new index created by user, these tables should increase accordingly. So it depends on the index, rather than the app.
3. Make each index responsible for the UI components in the app.
4. Construct the File UI page.
2024-03-07 01:50:47 +07:00
Duc Nguyen (john)
033e7e05cc Improve kotaemon based on insights from projects (#147)
- Include static files in the package.
- More reliable information panel. Faster & not breaking randomly.
- Add directory upload.
- Enable zip file to upload.
- Allow setting endpoint for the OCR reader using environment variable.
2024-02-28 22:18:29 +07:00
trducng
89450ab661 Enable zip file upload in ktem 2024-02-20 02:59:46 +07:00
trducng
771f074c0e Add utf-8 encoding in Help Page for rendering on Windows 2024-02-05 16:42:40 +07:00
trducng
bff55230ba Reduce the default chunk size in the reasoning pipeline to fit LLM capability 2024-02-03 09:38:50 +07:00
trducng
107bc7580e Enable HTML upload 2024-02-02 11:37:57 +07:00
trducng
6ae9634399 Enable .doc file 2024-01-27 23:45:19 +07:00
trducng
23c0331bab Enable pptx support 2024-01-27 23:08:06 +07:00
trducng
c6637ca56e Relate the retrievers to the indexer 2024-01-27 16:39:40 +07:00
Duc Nguyen (john)
22c646e5c4 Add documentation about adding reasoning and indexing pipelines to the application (#138) 2024-01-26 22:31:52 +07:00
trducng
757aabca4d Add app title, favicon. More natural chat 2024-01-25 22:40:32 +07:00
Duc Nguyen (john)
513e86f490 Add dedicated information panel to the UI (#137)
* Allow streaming to the chatbot and the information panel without threading
* Highlight evidence in a simple manner
2024-01-25 19:07:53 +07:00
Duc Nguyen (john)
ebc61400d8 Provide a developer mode when running ktem (#135)
Implement and utilize `on_app_created` to support the developer mode.
2024-01-23 11:46:59 +07:00
Duc Nguyen (john)
2dd531114f Make ktem official (#134)
* Move kotaemon and ktem into same folder

* Update docs

* Update CI

* Resolve mypy, isorts

* Re-allow test pdf files
2024-01-23 10:54:18 +07:00