PDFs are chaos. We built a tool to fix it.

PDF importer for Drupal

Our award-winning open-source PDF importer turns uploaded PDFs into structured HTML, ready for editorial review and publishing in Drupal. Actively used in production, funded by councils.

Open source

Background processing for long docs 

Plugin-based pipelines

AI optional

Link and image handling 

Built from real-world PDFs and automated tests 

What it does

Lots of important information still arrives as PDFs. This tool helps you move that content onto the website as proper pages, without spending hours manually reformatting.

Upload a PDF. The importer extracts the text, restores links, and pulls images into Drupal Media. Optionally, AI adds structure: headings, lists, tables, sensible pagination, and page titles. The result lands in Drupal in a consistent, reviewable format — ready for an editor to check and publish.

Manual PDF conversion can take hours, sometimes days. This reduces it to minutes. A 50-page document typically takes around 250 minutes to copy and paste by hand; the importer can do it in under 10 minutes.

Key features

  • Background processing for longer documents
  • Plugin-based pipelines — swap out the extractor, AI model, or output content type
  • AI is optional — works without it, works better with it
  • Link and image handling
  • Automated tests built from a library of real-world PDFs
  • Currently being decoupled from LocalGov Drupal so it can work across more Drupal distributions
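
For readers curious what "plugin-based" and "AI is optional" can mean in practice, here is a minimal sketch in Python. It is not the module's code, and every name in it (`Pipeline`, `plain_extractor`, `heading_structurer`, `paragraphs_only`, `html_output`) is hypothetical. The stand-in extractor just reads UTF-8 text rather than a real PDF, and the "AI" stage is a simple rule, so treat it as an illustration of the shape of the idea only.

```python
from dataclasses import dataclass
from html import escape
from typing import Callable, List, Optional, Tuple

# All names here are hypothetical and exist only for this illustration.
Node = Tuple[str, str]                       # (HTML tag, text)
Extractor = Callable[[bytes], List[str]]     # file bytes -> text blocks
Structurer = Callable[[List[str]], List[Node]]
Output = Callable[[List[Node]], str]


def plain_extractor(data: bytes) -> List[str]:
    # Stand-in extractor: treats the input as UTF-8 text, one block per line.
    lines = data.decode("utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def paragraphs_only(blocks: List[str]) -> List[Node]:
    # Fallback when no structuring stage is configured: everything is a paragraph.
    return [("p", block) for block in blocks]


def heading_structurer(blocks: List[str]) -> List[Node]:
    # Stand-in for an optional AI stage: short all-capitals lines become headings.
    return [
        ("h2", block.title()) if block.isupper() and len(block) < 60 else ("p", block)
        for block in blocks
    ]


def html_output(nodes: List[Node]) -> str:
    return "\n".join(f"<{tag}>{escape(text)}</{tag}>" for tag, text in nodes)


@dataclass
class Pipeline:
    extractor: Extractor
    output: Output
    structurer: Optional[Structurer] = None  # the optional stage

    def run(self, data: bytes) -> str:
        blocks = self.extractor(data)
        nodes = (self.structurer or paragraphs_only)(blocks)
        return self.output(nodes)
```

Calling `Pipeline(plain_extractor, html_output).run(data)` produces paragraphs only, while passing `heading_structurer` as the third argument adds headings. Swapping any one stage leaves the others untouched, which is the point of the design.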

Why it matters

PDFs are hard to read on mobile, difficult to keep up to date, and carry real accessibility risk. Converting one document to clean HTML by hand can take a full day. Multiply that by a council's backlog of thousands of documents, and the problem is significant.

This module reduces that grind so teams can publish faster and focus on improving content — not reformatting it.

 

Image: Ru and Eve from Southwark holding Southwark's shiny new award. Eve is giving a big thumbs up.

The importer is award-winning!

Our partnership with Southwark Council produced something that belongs to everyone — built in the open, tested against real council content, designed to be reused.

We brought together real-world content, accessibility requirements and publishing workflow with Chicken's deep Drupal knowledge and open source mindset. What came out of that partnership wasn't just a tool — it shaped solutions.

— Angie Forson - Web and Digital Programme Lead, London Borough of Southwark

Want a demo or to explore co-funding? 

Get in touch and we’ll show you the importer workflow, what it already handles well, and what we’d build next with partners.

 

How it works

  1. Upload a PDF
  2. The importer runs an import pipeline (background processing for longer documents)
  3. Editors review the output in Drupal and publish

One design decision worth knowing: AI structure is applied to the whole document in one pass, not page by page. That means consistent titles, headings, and page breaks — and no awkward splits mid-table or mid-list.
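
To make the whole-document point concrete, here is a small Python sketch. It assumes a simple rule (consecutive lines starting with "- " form one list) as a stand-in for the AI step, and all function names are hypothetical rather than taken from the module. It shows why structuring page by page can split a list that crosses a page break, while a single pass over the joined pages keeps it together.

```python
from itertools import groupby
from typing import List, Tuple

Block = Tuple[str, List[str]]  # ("list" or "text", the lines in that block)


def group_blocks(lines: List[str]) -> List[Block]:
    # Rule-based stand-in for the structuring step: each run of consecutive
    # "- " lines becomes one list; each run of other lines becomes one text block.
    blocks: List[Block] = []
    for is_item, run in groupby(lines, key=lambda line: line.startswith("- ")):
        blocks.append(("list" if is_item else "text", list(run)))
    return blocks


def structure_per_page(pages: List[List[str]]) -> List[Block]:
    # Each page is structured in isolation, so a list crossing a page break is cut in two.
    return [block for page in pages for block in group_blocks(page)]


def structure_whole_document(pages: List[List[str]]) -> List[Block]:
    # Pages are joined first, so the structuring step sees the full list at once.
    return group_blocks([line for page in pages for line in page])


# A three-item list that starts on page 1 and finishes on page 2.
pages = [
    ["Bring the following:", "- Proof of address", "- Photo ID"],
    ["- Recent utility bill", "Applications close in March."],
]
```

With this input, `structure_per_page(pages)` returns two separate lists (two items, then one), whereas `structure_whole_document(pages)` returns a single three-item list between the two text blocks.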
 

Built to work beyond LocalGov Drupal (soon)

This isn't a one-off script. We're looking to decouple the importer from LocalGov Drupal so it can work across more Drupal sites and distributions.

The goal is a flexible import engine that supports different content models — pages, publications, documents, knowledge bases — and can be configured for different document types and organisational needs. Open source throughout.

Funded and built in the open

This module is being built in the open, with funding and collaboration:

  • Prototype funded by Chicken
  • v1.0 funded by Southwark Council
  • v1.1 funded by West Lindsey District Council

If you want this tool to exist and keep improving, partner funding is what makes that possible. Get in touch to explore co-funding.

Join our Publication importer mailing list for updates from us