Simon Willison took LlamaIndex’s LiteParse, a Node CLI for extracting text from PDFs, and got it running as a static browser app. Upload a PDF, choose whether to run OCR, and the parsing happens locally in the page.

That sounds like a small developer convenience. It is actually a pretty good pattern for a lot of AI-adjacent work: move the boring-but-sensitive preprocessing closer to the user, then send only the useful result downstream if you need a model at all.

The PDF step is usually where the workflow gets messy

LiteParse is not trying to be a magic model. It uses PDF.js and OCR tooling to solve a less glamorous problem: pulling text out of awkward layouts in an order that a human, search index, or RAG pipeline can actually use.

That matters because PDF handling is still where otherwise polished AI workflows tend to get weird. Multi-column layouts, scanned pages, figures, and citations can turn a simple “ask this document” feature into a pile of brittle glue code.
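To make the multi-column problem concrete, here is a minimal sketch of reading-order recovery. It is not LiteParse's algorithm: the TextItem shape, the readingOrder helper, and the assumption of equal-width columns with a known count are all made up for illustration.

```typescript
// Hypothetical shape for a positioned text fragment; not LiteParse's actual type.
interface TextItem {
  text: string;
  x: number; // left edge, in page units
  y: number; // distance from the top of the page
}

// Order fragments column by column, then top to bottom within each column.
// Assumes a known column count and equal-width columns, which real PDFs often violate.
function readingOrder(items: TextItem[], pageWidth: number, columns = 2): string[] {
  const columnWidth = pageWidth / columns;
  const columnOf = (item: TextItem): number =>
    Math.min(columns - 1, Math.max(0, Math.floor(item.x / columnWidth)));
  return [...items]
    .sort((a, b) => columnOf(a) - columnOf(b) || a.y - b.y || a.x - b.x)
    .map((item) => item.text);
}

// Fragments arrive in row order, which is how a naive top-to-bottom sort would read them.
const sample: TextItem[] = [
  { text: "Left 1", x: 50, y: 100 },
  { text: "Right 1", x: 350, y: 100 },
  { text: "Left 2", x: 50, y: 120 },
  { text: "Right 2", x: 350, y: 120 },
];

console.log(readingOrder(sample, 600).join(" | "));
// prints "Left 1 | Left 2 | Right 1 | Right 2"
```

A plain top-to-bottom sort would interleave the two columns line by line. Real layouts add headers that span columns, figures, and footnotes, which is where the glue code starts piling up.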

  • browser-side parsing keeps private documents off a server by default
  • OCR can be optional instead of an always-on compute tax
  • structured output gives builders something easier to inspect and test
  • visual citation ideas become more credible when page positions survive extraction
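Two of the points above can be sketched in a few lines: deciding per page whether OCR is worth running, and keeping page positions so a citation can point at a spot on the page. The Span shape, the needsOcr and locateQuote helpers, and the 20-character threshold are assumptions for illustration, not LiteParse's actual schema or logic.

```typescript
// Hypothetical output shape; LiteParse's real schema may differ.
interface Span {
  text: string;
  page: number;
  box: { x: number; y: number; width: number; height: number };
}

// Treat a page as needing OCR only when its embedded text layer is nearly empty.
// The 20-character default is an arbitrary threshold chosen for this sketch.
function needsOcr(embeddedText: string, minChars = 20): boolean {
  return embeddedText.replace(/\s+/g, "").length < minChars;
}

// Find the first span containing a quote, so a UI could highlight its box on the page.
function locateQuote(spans: Span[], quote: string): Span | undefined {
  const needle = quote.trim().toLowerCase();
  if (needle === "") return undefined;
  return spans.find((span) => span.text.toLowerCase().includes(needle));
}

const spans: Span[] = [
  { text: "Quarterly overview", page: 1, box: { x: 72, y: 90, width: 200, height: 14 } },
  { text: "Net revenue rose in the second half.", page: 2, box: { x: 72, y: 310, width: 260, height: 12 } },
];

const hit = locateQuote(spans, "net revenue");
console.log(hit ? `page ${hit.page} at (${hit.box.x}, ${hit.box.y})` : "not found");
console.log(needsOcr("   "), needsOcr("This page has a real text layer."));
```

Because the output is plain data, it is easy to write assertions against a handful of known-ugly PDFs and see exactly where extraction goes wrong.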

The builder lesson is less about LiteParse specifically and more about where work should run. If the task is deterministic, inspectable, and privacy-sensitive, the browser may be a better first runtime than a hosted API.

That does not eliminate the need for review. Willison is clear that this was largely vibe-coded with Claude Code, then sanity-checked by testing its behavior rather than reading every line. But for a static tool with no document upload path, the blast radius is refreshingly small. A sentence I do not get to write often enough.

Who should care? Anyone building document ingestion, internal research tools, lightweight RAG systems, or support workflows where users drag in PDFs and expect the system not to fall over theatrically.

The practical next move is simple: test LiteParse against your ugliest PDFs before you wire another model into the stack. If local parsing gets you cleaner text, better citations, or fewer uploads of sensitive files, that is not a demo flourish. That is workflow architecture doing its job.

In short

Simon Willison ported LlamaIndex’s LiteParse PDF parser into a browser app. The useful bit is not just PDF extraction. It is the local-first pattern for AI-adjacent tools.