PDFs are chaos. We built a tool to fix it.

PDF importer for Drupal

Our open-source PDF importer turns uploaded PDFs into structured HTML — ready for editorial review and publishing in Drupal. Actively used in production, funded by councils, and up for a Digital Leaders award.

Open source

Background processing for long docs 

Plugin-based pipelines

AI optional

Link and image handling 

Built from real-world PDFs and automated tests 

What it does

Lots of important information still arrives as PDFs. This tool helps you move that content onto the website as proper pages, without spending hours manually reformatting.

Upload a PDF. The importer extracts the text, restores links, and pulls images into Drupal Media. Optionally, AI adds structure: headings, lists, tables, sensible pagination, and page titles. The result lands in Drupal in a consistent, reviewable format — ready for an editor to check and publish.

Manual PDF conversion can take hours, sometimes days. The importer cuts that to minutes.

Key features

  • Background processing for longer documents
  • Plugin-based pipelines — swap out the extractor, AI model, or output content type
  • AI is optional — works without it, works better with it
  • Link and image handling
  • Automated tests built from a library of real-world PDFs
  • Planned: decoupling from LocalGov Drupal, so it can work across more Drupal distributions
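
To make "plugin-based" and "AI is optional" concrete, here is a minimal sketch in Python. It is an illustration only, not the module's actual Drupal code: the names `Pipeline`, `PlainTextExtractor`, `ImportedDocument` and `paragraphs_to_html` are hypothetical, and the extractor is a stand-in that treats the upload as plain text rather than parsing a real PDF.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Callable, Optional, Protocol


@dataclass
class ImportedDocument:
    """Intermediate result handed to the editor for review (hypothetical shape)."""
    text: str
    body_html: str = ""
    log: list[str] = field(default_factory=list)


class Extractor(Protocol):
    """Any object with an extract() method can be plugged in as the extractor."""
    def extract(self, pdf_bytes: bytes) -> str: ...


class PlainTextExtractor:
    """Stand-in extractor: decodes the upload as UTF-8 instead of parsing a PDF."""
    def extract(self, pdf_bytes: bytes) -> str:
        return pdf_bytes.decode("utf-8", errors="replace")


def paragraphs_to_html(text: str) -> str:
    """Non-AI fallback: wrap each blank-line-separated block in a <p> tag."""
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    return "\n".join(f"<p>{escape(b)}</p>" for b in blocks)


@dataclass
class Pipeline:
    extractor: Extractor
    # The optional structuring step (for example an AI model). None means "no AI".
    structurer: Optional[Callable[[str], str]] = None

    def run(self, pdf_bytes: bytes) -> ImportedDocument:
        doc = ImportedDocument(text=self.extractor.extract(pdf_bytes))
        doc.log.append("extracted")
        if self.structurer is not None:
            doc.body_html = self.structurer(doc.text)
            doc.log.append("structured")
        else:
            doc.body_html = paragraphs_to_html(doc.text)
            doc.log.append("fallback")
        return doc
```

Swapping a stage means passing a different object: `Pipeline(PlainTextExtractor())` runs without any structuring step, while `Pipeline(PlainTextExtractor(), structurer=my_model)` adds one, where `my_model` is any function from text to HTML.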

Why it matters

PDFs are hard to read on mobile, difficult to keep up to date, and carry real accessibility risk. Converting one document to clean HTML by hand can take a full day. Multiply that by a council's backlog of thousands of documents, and the problem is significant.

This module reduces that grind so teams can publish faster and focus on improving content — not reformatting it.


The importer is up for an award

Our partnership with Southwark Council produced something that belongs to everyone — built in the open, tested against real council content, designed to be reused.

We want to make sure Southwark get the recognition they deserve for fully embracing Drupal and open source. 

Vote for Southwark Council at the Digital Leaders awards

Want a demo or to explore co-funding? 

Get in touch and we’ll show you the importer workflow, what it already handles well, and what we’d build next with partners.

 

How it works

  1. Upload a PDF
  2. The importer runs an import pipeline (background processing for longer documents)
  3. Editors review the output in Drupal and publish
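
Step 2 can be pictured with a small Python sketch of the general "run short jobs inline, queue long ones" pattern. This is an assumption-laden illustration, not the module's implementation: `PAGE_THRESHOLD`, `ImportRunner` and `convert` are hypothetical names, and the page describes "longer documents" without saying how length is measured, so a page count is used here purely as an example.

```python
import queue
import threading

PAGE_THRESHOLD = 20  # hypothetical cut-off; the real criterion is not documented here


def convert(name: str, pages: int) -> str:
    """Stand-in for the real import pipeline."""
    return f"{name}: {pages} pages converted"


class ImportRunner:
    def __init__(self) -> None:
        self.jobs: queue.Queue = queue.Queue()
        self.results: dict[str, str] = {}
        # A single daemon worker drains the queue in the background.
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self) -> None:
        while True:
            name, pages = self.jobs.get()
            self.results[name] = convert(name, pages)
            self.jobs.task_done()

    def submit(self, name: str, pages: int) -> str:
        """Convert short documents immediately; queue long ones for the worker."""
        if pages > PAGE_THRESHOLD:
            self.jobs.put((name, pages))
            return "queued"
        self.results[name] = convert(name, pages)
        return "done"

    def wait(self) -> None:
        """Block until every queued job has finished."""
        self.jobs.join()
```

The point of the pattern is that the person uploading is not left waiting on a long conversion: `submit` returns straight away, and the result appears for review once the worker has finished.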

One design decision worth knowing: AI structure is applied to the whole document in one pass, not page by page. That means consistent titles, headings, and page breaks — and no awkward splits mid-table or mid-list.
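
A toy Python example shows why the scope of the structuring pass matters. The `structure` function below is a simple rule-based stand-in, not the AI step the importer uses, and the sample pages are made up; the only thing it demonstrates is what happens to a list that crosses a page break.

```python
def structure(lines: list[str]) -> list[str]:
    """Group consecutive '- ' lines into one <ul>; every other line becomes a <p>."""
    out: list[str] = []
    items: list[str] = []

    def flush() -> None:
        # Close the list currently being collected, if there is one.
        if items:
            out.append("<ul>" + "".join(f"<li>{i}</li>" for i in items) + "</ul>")
            items.clear()

    for line in lines:
        if line.startswith("- "):
            items.append(line[2:])
        else:
            flush()
            out.append(f"<p>{line}</p>")
    flush()
    return out


# Made-up sample: the list starts on page 1 and continues on page 2.
pages = [
    ["Opening hours", "- Monday", "- Tuesday"],
    ["- Wednesday", "Closed on bank holidays"],
]


def page_by_page(pages: list[list[str]]) -> list[str]:
    """Structure each page in isolation, then concatenate the results."""
    return [block for page in pages for block in structure(page)]


def whole_document(pages: list[list[str]]) -> list[str]:
    """Join all pages first, then structure once."""
    return structure([line for page in pages for line in page])
```

Processed page by page, the sample yields two separate lists, because the page break cuts the list in two. Processed as a whole, it yields one list with all three days.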
 

Built to work beyond LocalGov Drupal (soon)

This isn't a one-off script. We're looking to decouple the importer from LocalGov Drupal so it can work across more Drupal sites and distributions.

The goal is a flexible import engine that supports different content models — pages, publications, documents, knowledge bases — and can be configured for different document types and organisational needs. Open source throughout.

Funded and built in the open

This module is being built in the open, with funding and collaboration from partners:

  • Prototype funded by Chicken
  • v1.0 funded by Southwark Council
  • v1.1 funded by West Lindsey District Council

If you want this tool to exist and keep improving, partner funding is what makes that possible. Get in touch to explore co-funding.

Join our PDF importer mailing list for updates from us