From PDFs to accessible web pages: PDF importer for Drupal

Our open-source PDF importer turns uploaded PDFs into structured HTML, ready for editorial review and publishing in Drupal. It’s in active development, built in the open, and funded by councils. Help shape what comes next.

Open source

Background processing for long docs 

Plug-in-based pipelines

AI optional

Link and image handling 

Built from real-world PDFs and automated tests 

What it does

Lots of important information still arrives as PDFs. This tool helps you move that content onto the website as proper pages, without spending hours manually reformatting.

With the PDF importer you can:

  • upload a PDF
  • extract text, restore links, and pull out images into Drupal Media
  • (optionally) use AI to add structure like headings, lists, tables, sensible pagination, and page titles
  • save the result into Drupal in a consistent, reviewable format

Why it matters

PDFs can be hard to read on mobile, difficult to keep up to date, and risky for accessibility. Converting a single document into clean, structured HTML can take hours (sometimes days) if you’re doing it by hand. This module reduces that grind, so teams can publish faster and spend more time improving content quality.

“PDFs are chaos.”

That’s why the importer has automated tests using a library of real-world PDFs, so we can keep improving reliability as more organisations adopt it.

Built to be reusable across Drupal

This isn’t a one-off script or a brittle one-client solution. We’re actively working to uncouple the importer from LocalGov Drupal so it can benefit more Drupal sites and distributions.

The goal is a flexible “import engine” that can:

  • support different content models (pages, publications, documents, knowledge bases)
  • be configured for different document types and organisational needs
  • remain open source and community-driven

The importer is up for an award!

Vote for us 

How it works

At its simplest:

  1. Upload a PDF
  2. The importer runs an import pipeline (background processing for longer documents)
  3. Editors review the output in Drupal and publish
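
For readers who think in code, the steps above can be sketched as a chain of small, swappable stages. This is an illustration only, written in Python rather than the module's own Drupal code. The names `ImportedDocument`, `join_pages`, `paragraphs_to_html` and `run_pipeline` are hypothetical and are not part of the importer's API.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Callable, List


@dataclass
class ImportedDocument:
    """Hypothetical container handed from one pipeline stage to the next."""
    raw_pages: List[str]                 # text per PDF page, as extracted
    text: str = ""                       # the document as one continuous text
    html: str = ""                       # structured output for editors to review
    log: List[str] = field(default_factory=list)  # names of the stages that ran


# A stage is any callable that takes the document and returns it.
Stage = Callable[[ImportedDocument], ImportedDocument]


def join_pages(doc: ImportedDocument) -> ImportedDocument:
    """Combine the per-page text into one text, separated by blank lines."""
    doc.text = "\n\n".join(page.strip() for page in doc.raw_pages)
    doc.log.append("join_pages")
    return doc


def paragraphs_to_html(doc: ImportedDocument) -> ImportedDocument:
    """Wrap each blank-line-separated block in an escaped <p> element."""
    blocks = [b.strip() for b in doc.text.split("\n\n") if b.strip()]
    doc.html = "\n".join(f"<p>{escape(b)}</p>" for b in blocks)
    doc.log.append("paragraphs_to_html")
    return doc


def run_pipeline(doc: ImportedDocument, stages: List[Stage]) -> ImportedDocument:
    """Run the stages in order; swapping a stage changes the pipeline."""
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline(
    ImportedDocument(raw_pages=["Bin collections\n\nCheck your day.", "Report a missed bin."]),
    [join_pages, paragraphs_to_html],
)
print(result.html)
```

Because each stage has the same shape, an optional step (such as AI structuring) could be added to or left out of the list without touching the others.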

A key design decision is to run the AI structuring step once across the whole document, not page by page. That keeps titles, headings, and page breaks consistent, and avoids awkward splits in the middle of a table or list.
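
A toy example shows why this matters. The sketch below is not the importer's AI step. It uses a simple hypothetical rule (`group_blocks`, which treats consecutive lines starting with "- " as one list) as a stand-in, and the sample pages are made up. The point is only that a list running across a page break comes out as two lists when each page is processed alone, and as one list when the document is processed whole.

```python
from itertools import groupby
from typing import List, Tuple

Block = Tuple[str, List[str]]  # ("list", items) or ("para", [line])


def group_blocks(lines: List[str]) -> List[Block]:
    """Group consecutive bullet lines into one list; other lines become paragraphs."""
    blocks: List[Block] = []
    for is_bullet, run in groupby(lines, key=lambda line: line.startswith("- ")):
        run_lines = list(run)
        if is_bullet:
            blocks.append(("list", [line[2:] for line in run_lines]))
        else:
            blocks.extend(("para", [line]) for line in run_lines)
    return blocks


# Made-up sample: the bullet list continues over the page break.
pages = [
    ["What you need", "- Proof of address", "- Photo ID"],
    ["- Council tax reference", "Apply online."],
]

# Page by page: each page is structured without knowledge of its neighbours.
per_page = [block for page in pages for block in group_blocks(page)]

# Whole document: all lines are structured in one pass.
whole_doc = group_blocks([line for page in pages for line in page])

lists_per_page = [items for kind, items in per_page if kind == "list"]
lists_whole_doc = [items for kind, items in whole_doc if kind == "list"]
print(len(lists_per_page), len(lists_whole_doc))  # prints "2 1"
```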

Open source and actively funded by partners

This module is being built in the open, with funding and collaboration:

  • Prototype funded by Chicken
  • v1.0 funded by Southwark Council
  • v1.1 funded by West Lindsey District Council

If you want this tool to exist (and get better), partner funding is what makes that possible.

Want a demo or to explore co-funding? 

Get in touch and we’ll show you the importer workflow, what it already handles well, and what we’d like to build next with partners.

Join our Publication importer mailing list for updates from us