Paper-heavy workflows to usable data

Turn messy documents into clean business outputs.

DocUnlocked helps logistics, construction, property, bookkeeping and technical teams convert scans, forms, tables and PDFs into CSV, Excel, Word, Markdown, JSON and AI summaries.

Review before use: Uncertain fields stay visible.
Source traceability: Examples cite public documents.
No API setup first: Interface now; API later.
Real handwriting transformation
1. Import: PDF, image, web or office file
2. Detect: Layout, tables, formulas, figures
3. Rebuild: Readable document + structured data
4. Export: CSV, Excel, Word, JSON, ZIP
NIST handwriting form: Real handprinted fields and paragraph (NIST SD19 handwriting sample form)
Lab notebook: Cursive, faded ink, experiment notes (Library of Congress notebook page)
Handwritten table: Public National Archives schedule (handwritten 1930 census table)
Freight document: Public GSA Government Bill of Lading form
NIST handwriting review package: Public source
Review package preview for a real NIST handwriting form
  • Real handwriting shown
    The source is a public benchmark image, not a staged handwriting sample.
  • Review-first output
    Low-confidence fields remain visible before export.
  • Structure preserved
    Boxes, rows, sections and source crops stay attached.
  • Business exports
    CSV, Excel, Word, Markdown, JSON and summaries are the deliverable.
Clean spreadsheet rows: Ready to review
field | value | confidence
document_type | Technical report / field packet / form | High
tables | Detected table regions exported as rows | High
figures | Image and chart blocks preserved | Review
formulas | Formula-like regions extracted or preserved | Review
exceptions | Handwriting, low contrast, damaged scans | Flagged
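The confidence column in the rows above can drive a simple review queue. The following is a minimal sketch, assuming the rows are exported as CSV with the column names `field`, `value` and `confidence`, and assuming that only `High` rows are accepted without a human check. The function name `split_rows` is hypothetical and is not part of any DocUnlocked interface.

```python
import csv
import io

# Sample rows copied from the preview above; a real export would be read from a file.
SAMPLE_CSV = """field,value,confidence
document_type,Technical report / field packet / form,High
tables,Detected table regions exported as rows,High
figures,Image and chart blocks preserved,Review
formulas,Formula-like regions extracted or preserved,Review
exceptions,"Handwriting, low contrast, damaged scans",Flagged
"""


def split_rows(csv_text):
    """Split exported rows into (accepted, review_queue) by confidence label."""
    accepted, review_queue = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: only "High" is safe to accept without a human check.
        if row["confidence"].strip() == "High":
            accepted.append(row)
        else:
            review_queue.append(row)
    return accepted, review_queue


accepted, review_queue = split_rows(SAMPLE_CSV)
for row in review_queue:
    print(f"needs review: {row['field']} ({row['confidence']})")
```

Treating every label other than `High` as a review item errs on the side of showing the operator more rather than less, which matches the review-first framing.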
Workbook tabs: Ops-ready
  • Summary - document type, source, page count, extraction status.
  • Fields - normalized key/value rows with review flags.
  • Tables - shipment lines, inspection deficiencies, invoice lines or lab measurements.
  • Review - low-confidence fields and handwritten notes.
Editable recovery document: Review required
  • Recovered headings and sections.
  • Tables preserved as editable structures when possible.
  • Images, formulas and complex regions kept as traceable blocks.
  • Useful for teams that need a human-readable final file, not only CSV.
Structured technical payload: API-ready
{
  "workflow": "complex_document_recovery",
  "outputs": ["markdown", "docx", "xlsx", "html", "json", "summary", "zip"],
  "review_flags": ["formula_region", "figure_caption", "handwriting"],
  "source_trace": "public document + page coordinates"
}
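A downstream script could read a payload like this and hold back export while review flags are present. This is a minimal sketch, assuming the payload arrives as a JSON string with exactly the keys shown above. The helper names `needs_review` and `summarize` are hypothetical, not a DocUnlocked API.

```python
import json

# Payload copied from the example above.
PAYLOAD = """
{
  "workflow": "complex_document_recovery",
  "outputs": ["markdown", "docx", "xlsx", "html", "json", "summary", "zip"],
  "review_flags": ["formula_region", "figure_caption", "handwriting"],
  "source_trace": "public document + page coordinates"
}
"""


def needs_review(payload: dict) -> bool:
    """Return True when the payload carries at least one review flag."""
    return bool(payload.get("review_flags"))


def summarize(payload: dict) -> str:
    """Build a one-line status string for an operator dashboard or log."""
    flags = payload.get("review_flags", [])
    outputs = payload.get("outputs", [])
    status = "review required" if flags else "ready to export"
    flag_text = ", ".join(flags) or "no flags"
    workflow = payload.get("workflow", "unknown")
    return f"{workflow}: {len(outputs)} outputs, {status} ({flag_text})"


payload = json.loads(PAYLOAD)
print(summarize(payload))
```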
Operator summary: Fast triage
  • Document contains multi-column text, tables, figures and formula-like regions.
  • Clean sections and structured tables can feed review, reporting or downstream AI workflows.
  • Best next action: validate low-confidence figures/formulas before official reuse.
Markdown: AI-ready structure
XLSX: Table workbook
DOCX: Editable recovery
ZIP: Full package
Industry proof library

Lead with the industries still buried in paper.

The buying question is not "does OCR exist?" It is "will this work on my documents, in my workflow, without an engineering team?"

Public-source proof

Real handwriting first, with reviewable outputs.

The site now shows public handwritten sources with visible transformation steps. The promise is not magic handwriting OCR; it is a clean review package the operator can validate and export.

Real handprinted form

NIST SD19 handwriting sample form

A public NIST handprinted form shows real field boxes, digit strings and a handwritten paragraph. This is the right proof class for teams with forms completed by hand.

Handwriting, Fields, Review flags, CSV

Source: NIST SD19 handwriting sample image. Output shown as a DocUnlocked review/export package preview.

Before: Public image
Real NIST SD19 handwritten form source
After: Review package
DocUnlocked review output preview for NIST handwriting form
Real handwritten table

1930 census schedule to rows

Dense handwritten tables are where simple document demos collapse. The value is row alignment, columns, review flags and spreadsheet-ready output.

Tables, Rows, Excel, Review queue

Source: U.S. National Archives 1930 census schedule PDF. Output shown as a review/export package preview.

Before: Public PDF
Real handwritten 1930 census schedule table
After: Rows + review
DocUnlocked rows and review preview for handwritten census table
Real cursive lab notes

Lab notebook to Markdown + review queue

Labs, engineering design offices and archives need more than CSV. Cursive notes should become Markdown, searchable text, source crops and uncertainty flags.

Cursive, Markdown, Source crop, AI summary

Source: Library of Congress Alexander Graham Bell notebook page. Output shown as a review/export package preview.

Before: Public image
Real cursive lab notebook page
After: Markdown + review
DocUnlocked Markdown and review preview for cursive lab notebook
Public government PDF

IRS W-4 / vendor-admin style forms

Public tax forms prove structured field and table recovery. For live customers, this maps to W-9, onboarding packets and vendor files.

Fields, Tables, Word, JSON

Source: irs.gov W-4 PDF. Public federal document processed through MinerU for the deployed proof asset.

Before: PDF
Before screenshot of IRS W-4 public PDF
After: MinerU preview
After screenshot of extracted IRS W-4 output
Technical documents

Tables, formulas and multi-column reports

For labs, engineering offices and technical teams, the final deliverable is often Markdown, DOCX, HTML, JSON and a structured summary, not just a CSV.

Tables, Figures, Formulas, Markdown

Synthetic technical stress sample processed through the real MinerU flow. No private or customer document used.

Before: PDF
Before screenshot of technical PDF stress sample
After: MinerU preview
After screenshot of technical PDF extraction output
SEO attack surface

Own the long-tail search intent by document type.

Every vertical page targets a concrete pain: paper document in, usable output out.

Freight documents

bill of lading to CSV, freight invoice to Excel, proof of delivery OCR.

Bill of lading page

Construction paperwork

daily field report to Excel, stormwater inspection PDF extraction, contractor forms to spreadsheet.

Construction page

Property operations

inspection checklist to spreadsheet, NSPIRE checklist to CSV, maintenance forms to action list.

Property page

Bookkeeping and vendor admin

W-9 to CSV, vendor forms to spreadsheet, scanned invoice lines to Excel.

Vendor forms page

Labs and engineering design offices

technical report to Markdown, tables and formulas to JSON, PDF figures to AI-ready package.

Lab reports page

Handwritten paperwork

handwritten forms to Excel, handwritten table to spreadsheet, handwriting OCR with review flags.

Handwriting page

General scanned forms

scanned forms to Excel, PDF table to CSV, paper forms to structured data.

Scanned forms page
North star product path

Interface first. Chrome extension next. API when volume proves it.

1. Web proof lab

Users upload 5 pages and see the output before trusting the system.

2. Watched inbox

Forward PDFs from AP, dispatch or property inboxes into a review queue.

3. Chrome extension

Capture PDFs from portals and web apps without making users download/re-upload manually.

4. Workflow exports

Push CSV/XLSX/JSON/Markdown packages into Sheets, Airtable, accounting tools or ERPs.

5. API

Offer API access only after repeatable vertical workflows and willingness-to-pay are proven.

Test your own document

See what comes back before betting the workflow on it.

Upload a file you are authorized to process. The launch flow validates the file and prepares a secure checkout or support handoff depending on backend availability.


PDF, JPG, PNG, WEBP, DOC, DOCX, PPT, PPTX up to 200 pages / 200 MB.
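The limits above can be checked before a file is sent anywhere. This is a minimal sketch of such a pre-upload check, assuming "200 MB" means 200 MiB and that the page count is already known to the caller (counting pages is out of scope here). The function name `check_upload` and the message strings are hypothetical and do not describe the actual launch flow.

```python
from pathlib import Path

# Extensions taken from the list above; nothing beyond that list is assumed.
ALLOWED_EXTENSIONS = {".pdf", ".jpg", ".png", ".webp", ".doc", ".docx", ".ppt", ".pptx"}
MAX_BYTES = 200 * 1024 * 1024  # Assumption: "200 MB" is read as 200 MiB.
MAX_PAGES = 200


def check_upload(filename: str, size_bytes: int, page_count: int) -> list[str]:
    """Return a list of problems; an empty list means the file passes the limits."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file larger than 200 MB")
    if page_count > MAX_PAGES:
        problems.append("more than 200 pages")
    return problems


print(check_upload("inspection_report.pdf", 3_500_000, 12))
```

Returning every problem at once, rather than stopping at the first, lets an interface show the user everything that needs fixing in one pass.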

Do not upload documents you are not authorized to process. Cloud document parsing is used for this feature.

Launch pricing

Simple enough to test, capped enough to protect margins.

Privacy and handling

  • MinerU and payment secrets stay server-side in the backend design.
  • No private customer, Benjamin or F2M documents are used in public proof assets.
  • Cloud parsing is used; regulated or highly sensitive documents require the customer's own approval path.
  • Results must be reviewed before legal, financial or official use.
Accuracy note.
DocUnlocked is a document recovery and workflow extraction service, not a certified transcription or legal records service. Scan quality, handwriting, layout, language and file condition affect results.