docTR: Document Text Recognition
State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by PyTorch
docTR provides an easy and powerful way to extract valuable information from your documents:
🧾 for automation: seamlessly process documents for Natural Language Understanding tasks. We provide OCR predictors that parse textual information (localizing and identifying each word) in your documents.
👩‍🔬 for research: quickly compare the speed and performance of your own architectures with state-of-the-art models on public datasets.
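Each word returned by a predictor carries both its text and its location. Below is a minimal sketch of walking such a result. It assumes the nested pages → blocks → lines → words layout of the dictionary produced by docTR's `Document.export()`, with each word holding `value`, `confidence` and `geometry` keys and straight pages, so that geometry is a relative `((xmin, ymin), (xmax, ymax))` box. The `iter_words` helper and the sample data are hypothetical and hand-written, not real model output.

```python
from typing import Any, Iterator, Tuple


def iter_words(export: dict) -> Iterator[Tuple[int, str, float, Any]]:
    """Yield (page_idx, text, confidence, geometry) for every word in an exported result.

    Assumes the pages -> blocks -> lines -> words nesting of an exported
    docTR Document; only the keys accessed below are relied on.
    """
    for page_idx, page in enumerate(export["pages"]):
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    yield page_idx, word["value"], word["confidence"], word["geometry"]


# Hypothetical sample shaped like an export (hand-written, not real OCR output).
sample = {
    "pages": [
        {
            "blocks": [
                {
                    "lines": [
                        {
                            "words": [
                                {"value": "Invoice", "confidence": 0.99,
                                 "geometry": ((0.10, 0.05), (0.25, 0.08))},
                                {"value": "#42", "confidence": 0.97,
                                 "geometry": ((0.27, 0.05), (0.32, 0.08))},
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}

# Print each word with its relative bounding box (coordinates in [0, 1]).
for page_idx, text, conf, ((xmin, ymin), (xmax, ymax)) in iter_words(sample):
    print(f"page {page_idx}: {text!r} ({conf:.2f}) at x={xmin:.2f}..{xmax:.2f}, y={ymin:.2f}..{ymax:.2f}")
```

Because coordinates are relative, multiplying them by the page width and height recovers pixel positions for downstream layout or NLU steps.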
Main Features
🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
🚀 State-of-the-art performance on public document datasets, comparable to Google Vision / AWS Textract
⚡ Optimized for inference speed on both CPU & GPU
🐦 Light package, minimal dependencies
🛠️ Actively maintained by Mindee
🏭 Easy integration (available templates for browser demo & API deployment)
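The three-line workflow (load a document, build a predictor, run it) can be sketched as follows, wrapped in a hypothetical `extract_text` helper. `DocumentFile.from_images`, `ocr_predictor` and `Document.render` are docTR calls; the architecture names passed to `det_arch` and `reco_arch` are shown for illustration of the two-stage design, and `pretrained=True` downloads weights on first use. The imports are placed inside the function only so the snippet loads where docTR is not installed.

```python
def extract_text(path: str) -> str:
    """Run docTR's end-to-end OCR on an image file and return the recognised text."""
    # Lazy imports: the module stays importable even without docTR installed.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    # Two-stage predictor: a detection model localises words, a recognition model reads them.
    model = ocr_predictor(det_arch="db_resnet50", reco_arch="crnn_vgg16_bn", pretrained=True)
    doc = DocumentFile.from_images(path)  # DocumentFile.from_pdf(...) handles PDFs
    result = model(doc)
    # render() flattens the pages -> blocks -> lines -> words hierarchy into plain text.
    return result.render()
```

For example, `print(extract_text("path/to/receipt.jpg"))` prints the page text; `result.export()` gives the structured dictionary instead when word positions are needed.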
Model zoo
Text detection models
Text recognition models
SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
VIPTR from “A Vision Permutable Extractor for Fast and Efficient Scene Text Recognition”
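Recognition models from the zoo can also be run on their own, for instance on word crops produced by another detector. A minimal sketch, assuming docTR's `recognition_predictor` factory, whose returned predictor takes a list of H × W × 3 uint8 image arrays and returns one `(text, confidence)` pair per crop; the `read_word_crops` helper is hypothetical, and the architecture identifiers in the comment are examples that may vary between docTR versions.

```python
from typing import Any, List, Sequence, Tuple


def read_word_crops(crops: Sequence[Any], arch: str = "parseq") -> List[Tuple[str, float]]:
    """Recognise pre-cropped word images with a single docTR recognition model."""
    # Lazy import: the module stays importable even without docTR installed.
    from doctr.models import recognition_predictor

    # arch selects a recognition architecture, e.g. "sar_resnet31", "master",
    # "vitstr_small" or "parseq" (exact names depend on the installed version).
    predictor = recognition_predictor(arch, pretrained=True)
    # One (text, confidence) pair is returned per input crop, in order.
    return list(predictor(list(crops)))
```

Swapping `arch` is a convenient way to compare the listed recognizers on the same crops before plugging the preferred one into a full OCR predictor.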
Supported datasets
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
SROIE from ICDAR 2019.
IIIT-5k from CVIT.
Street View Text from “End-to-End Scene Text Recognition”.
SynthText from Visual Geometry Group.
SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
IC03 from ICDAR 2003.
IC13 from ICDAR 2013.
IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
IIITHWS from “Generating Synthetic Data for Text Recognition”.
WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
COCO-Text from “COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images”.
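The datasets above are exposed as Python classes in `doctr.datasets`. A minimal sketch using FUNSD, assuming the `FUNSD(train=True, download=True)` constructor, which fetches and caches the archive on first use; the `first_funsd_sample` helper is hypothetical, and the exact structure of `target` depends on the dataset and the docTR version.

```python
def first_funsd_sample():
    """Download the FUNSD training split through docTR and return its first (image, target) pair."""
    # Lazy import: the module stays importable even without docTR installed.
    from doctr.datasets import FUNSD

    train_set = FUNSD(train=True, download=True)  # downloaded and cached on the first call
    print(f"{len(train_set)} training samples")
    # Indexing yields an (image, target) pair; target holds the annotations.
    img, target = train_set[0]
    return img, target
```

The same indexing pattern applies to the other dataset classes, which makes it straightforward to benchmark your own architectures on several public datasets with a single evaluation loop.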