How big a PDF can you import? We've successfully imported PDFs of up to 300 pages. The importer works best when the source document is reasonably well-structured — a chaotic 300-page PDF is going to produce chaotic HTML.
Will importing lots of PDFs dramatically increase the page count on my site? Not in the way you might worry about. The importer uses natural breaks and pagination rather than dumping everything onto one giant page. For tools like Siteimprove that charge per page, it's worth checking your contract — but in practice the impact is minimal.
Does it pull through alt text for images and tables? Not automatically, but AI can generate it. You can include a prompt instruction to generate alt text for images, and the quality is generally good enough to be a useful starting point. You'd still want a human to review it.
Can I tell it to change all headings to sentence case? Yes. You can include formatting instructions like this in the AI prompt — small, consistent fixes applied across a whole document in one pass.
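To show what "sentence case" means in practice, here is a minimal rule-based sketch. It is not how the importer or the AI prompt works; `to_sentence_case` and its `keep` allow-list are hypothetical names for illustration. A fixed rule needs an explicit list of proper nouns to leave alone, which is the kind of judgement a prompt instruction may handle more conveniently.

```python
from typing import Iterable


def to_sentence_case(heading: str, keep: Iterable[str] = ()) -> str:
    """Lower-case every word except the first, allow-listed names and acronyms."""
    keep_set = set(keep)  # hypothetical allow-list of proper nouns
    result = []
    for i, word in enumerate(heading.split()):
        if word in keep_set or (len(word) > 1 and word.isupper()):
            # Leave allow-listed names and acronyms such as "PDF" untouched.
            result.append(word)
        elif i == 0:
            # Capitalise only the first word of the heading.
            result.append(word[:1].upper() + word[1:].lower())
        else:
            result.append(word.lower())
    return " ".join(result)


print(to_sentence_case("How To Apply For A PDF Licence In Colchester", keep=["Colchester"]))
```

Note that this sketch treats any fully upper-case word as an acronym, so a heading written entirely in capitals would pass through unchanged.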
What about telephone numbers and hyperlinks that aren't active in the PDF? The AI can be prompted to recognise these patterns and make them clickable. It's not automatic right now, but it's a straightforward addition to the prompt.
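To make "recognise these patterns" concrete, here is a rough sketch of the same idea done with regular expressions instead of an AI prompt. It is not part of the importer: `linkify` and both patterns are illustrative assumptions, the phone pattern only covers simple UK-style numbers, and converting a leading 0 to +44 assumes every number is a UK one.

```python
import re
from html import escape

# Illustrative patterns only; real documents will need tuning.
PATTERN = re.compile(
    # Web addresses starting with http(s):// or www., minus trailing punctuation.
    r"(?P<url>\b(?:https?://|www\.)[^\s<>\"']*[^\s<>\"'.,;:!?)])"
    # Simple UK-style numbers such as "01206 282222" or "+44 1206 282222".
    r"|(?P<phone>(?<![\w+])(?:\+44\s?\d{2,4}|0\d{2,4})(?:\s?\d{3,4}){1,2}(?!\d))"
)


def linkify(text: str) -> str:
    """Return HTML-escaped text with URLs and phone numbers wrapped in links."""
    parts = []
    last = 0
    for m in PATTERN.finditer(text):
        parts.append(escape(text[last:m.start()]))
        matched = m.group(0)
        if m.group("url"):
            href = matched if matched.startswith("http") else "https://" + matched
        else:
            digits = re.sub(r"[^\d+]", "", matched)
            # Assumption: national numbers are UK numbers, so 0 becomes +44.
            if digits.startswith("0"):
                digits = "+44" + digits[1:]
            href = "tel:" + digits
        parts.append(f'<a href="{escape(href)}">{escape(matched)}</a>')
        last = m.end()
    parts.append(escape(text[last:]))
    return "".join(parts)


print(linkify("Call 01206 282222 or visit www.example.org.uk."))
```

A single combined pattern is used so that digits inside a web address are not also picked up as a phone number.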
Does it work with footnotes? Not fully yet. Footnote handling is an area of active development, so watch this space.
Does it work with Microsoft Copilot? Not directly. Copilot works differently from AI providers like OpenAI, Google Gemini, or AWS Bedrock and isn't compatible with the Drupal AI module in the same way. There are several solid alternatives depending on how your council is set up — Gemini works well (Colchester are using it without issues), and AWS Bedrock is worth exploring if you're hosted in that environment. Get in touch and we can help you find the right fit.
Can I edit the content after it's been imported? Yes. Once imported, the content behaves like any other content in Drupal. You can edit, correct, and update it as normal before publishing.
Does it work with Word documents? Not directly, but you can save a Word doc as a PDF first and then import it. This works particularly well when the original document is fairly plain — images and basic formatting come through cleanly.
What happens if the PDF contains personal information? You can add instructions to the AI prompt to strip or flag personal data during extraction. Alternatively, a custom extract plugin could handle this. Neither is built in by default, so it's worth factoring into your workflow.
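As a rough illustration of what "flagging" could look like, here is a small pattern-based check you could run over extracted text before a human review. It is a sketch under assumptions, not the importer's plugin API or a complete personal-data detector: `flag_personal_data` and the three patterns (email addresses, simple UK phone numbers, National Insurance-style numbers) are hypothetical and will miss names, addresses and much else.

```python
import re

# Illustrative patterns only; they will produce false positives and misses.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "uk_phone": re.compile(
        r"(?<![\w+])(?:\+44\s?\d{2,4}|0\d{2,4})(?:\s?\d{3,4}){1,2}(?!\d)"
    ),
    "ni_number": re.compile(r"\b[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b"),
}


def flag_personal_data(text: str) -> list[tuple[str, str, int]]:
    """Return (kind, matched text, position) for each suspected item, in order."""
    findings = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            findings.append((kind, m.group(0), m.start()))
    # Sort by position so a reviewer can work through the document top to bottom.
    return sorted(findings, key=lambda f: f[2])


sample = "Contact jane.doe@example.org or 01206 282222. NI: QQ 12 34 56 C"
for kind, value, position in flag_personal_data(sample):
    print(f"{kind} at {position}: {value}")
```

The sketch only reports what it finds rather than deleting anything, so the decision to remove or keep each item stays with a person.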
Do the imported publications appear in search results? Yes — they're web pages, so they're indexed like any other content on your site. If some of the content already exists elsewhere on your site, it's worth thinking through your content strategy to avoid duplication before you import.
Should PDFs disappear altogether, or can you have both? Both is fine, and sometimes the right answer. The ideal is to move to HTML-only over time — content designed for the web is always better than a converted PDF — but you can publish both in parallel where there's a genuine need. The importer is a stepping stone, not a permanent workaround.
Is it available for Drupal 10 and 11? It works with Drupal 10. Drupal 11 compatibility is in scope, so it's worth checking the GitHub repo for the latest status if you're planning an upgrade.
What about meeting minutes — could it handle thousands of PDFs from systems like Modern.gov? Early conversations are happening about exactly this use case. Nothing to announce yet, but it's on the radar.
Is the importer open source? Yes. You can find it at github.com/localgovdrupal/localgov_publications_importer.