Merge internal PDF Table of Contents
When working with large PDF documents, a detailed and accurate Table of Contents (ToC) is crucial for navigation. Many PDFs lack a proper ToC, or have one that is poorly integrated into the document structure. In my case, I had several PDFs that each had their own internal ToC, but those ToCs were not merged into a single, cohesive ToC for the combined document.
To address this, I wrote a Python script that merges the PDFs and carries each file's internal ToC over into the combined document, nested under a root bookmark named after the file. The script runs from the command line.
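#!/usr/bin/env python
# Reconstructed sketch: assumes pypdf. Helper and variable names, and the
# argument order (inputs first, output last), are illustrative.
import os
import sys

from pypdf import PdfReader, PdfWriter
from pypdf.generic import Destination


def add_outline_items(writer, reader, outline, parent, page_offset):
    i = 0
    while i < len(outline):
        item = outline[i]
        if isinstance(item, Destination):
            # Get title and page number
            title = item.title
            page_number = reader.get_destination_page_number(item) + page_offset
            # original_page = reader.get_page_number(item.page) + 1 # 1-based for humans
            # title = f"{item.title} (p.{original_page})"
            # page_number = original_page - 1 + page_offset # 0-based for PDF
            # Add current bookmark
            bookmark = writer.add_outline_item(title, page_number, parent=parent)
            # Look ahead: if next item is a list, treat it as children
            if i + 1 < len(outline) and isinstance(outline[i + 1], list):
                add_outline_items(writer, reader, outline[i + 1], bookmark, page_offset)
                i += 1 # Skip the children list in next iteration
        elif isinstance(item, list):
            # Unexpected nested list (should always follow a Destination)
            # Recursively process just in case
            add_outline_items(writer, reader, item, parent, page_offset)
        i += 1


if len(sys.argv) < 3:
    sys.exit(f"usage: {sys.argv[0]} INPUT.pdf [INPUT.pdf ...] OUTPUT.pdf")
writer = PdfWriter()
page_offset = 0
input_paths = sys.argv[1:-1]
output_path = sys.argv[-1]
for path in input_paths:
    reader = PdfReader(path)
    # Append pages
    for page in reader.pages:
        writer.add_page(page)
    # writer.append(reader, import_outline = False)
    # Create a new root bookmark using the filename
    root_title = os.path.basename(path)
    #PaperSize.A5.width, PaperSize.A5.height)
    root = writer.add_outline_item(root_title, page_offset)
    # Copy and nest the original outline under this root
    outline = reader.outline
    add_outline_items(writer, reader, outline, root, page_offset)
    page_offset += len(reader.pages)
# Write the output
with open(output_path, "wb") as fh:
    writer.write(fh)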
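The comments in the script describe two ideas: a list that directly follows an outline entry holds that entry's children, and every page number is shifted by the number of pages already merged. The sketch below illustrates both with plain Python data rather than pypdf objects. The `(title, page_index)` tuples, the function names `flatten_outline` and `merge_outlines`, and the sample file names are all hypothetical stand-ins chosen for illustration, not part of the script or of pypdf.

```python
def flatten_outline(outline, page_offset=0, depth=0):
    """Return (depth, title, shifted_page_index) rows in reading order.

    `outline` mimics the nested shape described above: each entry is a
    (title, page_index) tuple, and a list placed right after an entry
    holds that entry's children.
    """
    rows = []
    i = 0
    while i < len(outline):
        item = outline[i]
        if isinstance(item, list):
            # A list with no preceding entry: process it at the current depth.
            rows.extend(flatten_outline(item, page_offset, depth))
        else:
            title, page_index = item
            rows.append((depth, title, page_index + page_offset))
            # Look ahead: a list right after an entry holds its children.
            if i + 1 < len(outline) and isinstance(outline[i + 1], list):
                rows.extend(
                    flatten_outline(outline[i + 1], page_offset, depth + 1)
                )
                i += 1  # skip the children list on the next iteration
        i += 1
    return rows


def merge_outlines(documents):
    """Merge outlines of (filename, page_count, outline) triples."""
    merged = []
    page_offset = 0
    for filename, page_count, outline in documents:
        # One root entry per file, pointing at that file's first page.
        merged.append((0, filename, page_offset))
        merged.extend(flatten_outline(outline, page_offset, depth=1))
        page_offset += page_count
    return merged


docs = [
    ("a.pdf", 10, [("Intro", 0), ("Chapter 1", 2), [("Section 1.1", 3)]]),
    ("b.pdf", 5, [("Appendix", 0), [("Tables", 1)]]),
]
for depth, title, page in merge_outlines(docs):
    print("  " * depth + f"{title} -> page index {page}")
```

Page indices here are 0-based, as in the script's comments; the entries from the second file are shifted by the ten pages of the first, so its "Appendix" entry lands on page index 10.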
Since I also use NixOS, I wrote a separate flake. It provides a development environment and, with some modification, could also be used to install the script as a program.
{
  description = "Dev shell with Python and pypdf for PDF TOC merging";

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
  inputs.flake-utils.url = "github:numtide/flake-utils";

  outputs = { self, nixpkgs, flake-utils }:
    flake-utils.lib.eachDefaultSystem (system:
      let
        # Select the package set for the current system
        pkgs = nixpkgs.legacyPackages.${system};
      in {
        devShells.default = pkgs.mkShell {
          name = "pypdf-shell";
          buildInputs = [
            pkgs.python312
            pkgs.python312Packages.pypdf
          ];
        };
      });
}