Commit Graph

3 Commits

Author SHA1 Message Date
Nguyen Trung Duc (john)
8532138842 Move Document and other interface into base/schema (#69) 2023-11-14 11:51:10 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
4704e2c11a Add new OCRReader with PDF+OCR text merging (#66)
This change speeds up OCR extraction by skipping OCR for text that is irrelevant (not in a table).

---------

Co-authored-by: Nguyen Trung Duc (john) <trungduc1992@gmail.com>
2023-11-13 17:43:02 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
6c3d614973 [AUR-432] Add layout-aware table parsing PDF reader (#27)
* add OCRReader, MathPixReader and ExcelReader

* update test case for ocr reader

* reformat

* minor fix
2023-09-26 15:52:44 +07:00