feat: integrate with docling (#471) bump:patch

* feat: add docling reader implementation

* feat: expose docling to UI

* fix: improve docling output parsing

* docs: update README

---------

Co-authored-by: Tadashi <tadashi@cinnamon.is>
Quang (Albert)
2024-11-16 10:04:57 +07:00
committed by GitHub
parent 5b828c213c
commit 56c40f1c05
7 changed files with 271 additions and 13 deletions


@@ -216,6 +216,17 @@ documents and developers who want to build their own RAG pipeline.
See [Local model setup](docs/local_model.md).
### Set up multimodal document parsing (OCR, table parsing, figure extraction)
These options are available:
- [Azure Document Intelligence (API)](https://azure.microsoft.com/en-us/products/ai-services/ai-document-intelligence)
- [Adobe PDF Extract (API)](https://developer.adobe.com/document-services/docs/overview/pdf-extract-api/)
- [Docling (local, open-source)](https://github.com/DS4SD/docling)
  - To use Docling, first install the required dependency: `pip install docling`
Select the corresponding loader under `Settings -> Retrieval Settings -> File loader`.
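To check that Docling works before selecting it as a file loader, a minimal standalone sketch like the one below may help. It calls Docling's `DocumentConverter` directly and is not the reader implementation added in this commit; the helper names `docling_available` and `file_to_markdown` are hypothetical, and `example.pdf` is a placeholder path.

```python
import importlib.util
from pathlib import Path


def docling_available() -> bool:
    """Return True if the `docling` package is installed in this environment."""
    return importlib.util.find_spec("docling") is not None


def file_to_markdown(path: str) -> str:
    """Convert one document to Markdown with Docling (hypothetical helper)."""
    if not docling_available():
        raise RuntimeError("Docling is not installed; run `pip install docling`")
    # Imported lazily so this module still loads when Docling is absent.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(Path(path))
    return result.document.export_to_markdown()


if __name__ == "__main__":
    if docling_available():
        # "example.pdf" is a placeholder; point this at a real file.
        print(file_to_markdown("example.pdf")[:500])
    else:
        print("Docling not found; install it with `pip install docling`.")
```

The first conversion can be slow because Docling may download its layout and table models on first use.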
### Customize your application
- By default, all application data is stored in the `./ktem_app_data` folder. You can back up or copy this folder to transfer your installation to a new machine.