About this tool
Extract every clickable link from a PDF (web URLs, cross-document references, and internal links) into a clean list. Useful for auditing what a PDF links to before publishing, archiving the references in a research paper, or migrating link structures from a PDF into a spreadsheet or database.
When to use it
- Auditing a contract or report for any external links before publishing
- Archiving the URL list from a research paper for reference checking
- Migrating link structures from a PDF report into a spreadsheet
- Gathering the links you need to verify still resolve (the verification itself needs a separate link checker)
- Producing a bibliography of online references from a research document
What to expect
Only PDF link annotations (the clickable areas) are extracted. URLs that appear only as plain text, without being formatted as hyperlinks, won't be detected; to capture those, run the Extract Text tool and pull the URLs out with a regular expression. Internal cross-references (e.g., to figures) are included alongside external URLs.
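If you take the Extract Text route, the regex step could look something like the minimal Python sketch below. The pattern is a deliberately simple assumption (it looks for `http://`, `https://`, or `www.` and stops at whitespace, quotes, and closing brackets), not the pattern any particular tool uses, and `find_plain_urls` is a hypothetical name. URLs that contain parentheses or that were broken across lines during extraction will need extra handling.

```python
import re

# Simple heuristic: a scheme or "www." followed by characters that are not
# whitespace, angle brackets, quotes, or closing brackets. This cuts off URLs
# that legitimately contain ")" (e.g. some Wikipedia links).
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"')\]]+", re.IGNORECASE)

# Punctuation that usually ends the sentence rather than the URL.
TRAILING_PUNCTUATION = ".,;:!?"


def find_plain_urls(text: str) -> list[str]:
    """Return unique URL-like strings found in text, in order of first appearance."""
    seen: dict[str, None] = {}  # dict keeps insertion order, so it doubles as an ordered set
    for match in URL_PATTERN.finditer(text):
        url = match.group(0).rstrip(TRAILING_PUNCTUATION)
        seen.setdefault(url, None)
    return list(seen)
```

For example, running it on the text "See https://example.com/report.pdf. Also www.example.org, and (https://example.net/a?b=1)." returns the three URLs without the trailing period, comma, or closing parenthesis.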
Frequently asked questions
Will plain-text URLs in the document body be extracted?
Only if they're hyperlinked, i.e., clickable. URLs typed into the body without being made into actual links won't be detected by link extraction. For those, use Extract Text and then a URL regex.
Does this include internal cross-references?
Yes. Internal links (e.g., a clickable chapter reference or figure cross-reference) are included alongside external URLs, with a label indicating they're internal.
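How this tool assigns its labels isn't described here, but the PDF format itself makes the distinction visible. In the PDF specification (ISO 32000), a link annotation either carries a `/Dest` entry (a destination in the same file) or an action dictionary under `/A` whose `/S` entry names the action type, such as `/URI` for a web address, `/GoTo` for a jump within the document, or `/GoToR` for a jump into another PDF. The sketch below labels links from that structure. It assumes annotations have already been parsed into plain Python dicts keyed by PDF names (a hypothetical representation, not this tool's internals), and `classify_link` is an illustrative name.

```python
def classify_link(annot: dict) -> tuple[str, str]:
    """Label one link annotation and return (kind, target).

    kind is "internal" (a destination in the same PDF), "external" (a URI),
    "other-file" (a GoToR jump into another PDF), or "other" (e.g. Named or
    Launch actions, returned with the action type as the target).
    """
    # A /Dest entry jumps directly to a destination in this document.
    if "/Dest" in annot:
        return ("internal", str(annot["/Dest"]))

    # Otherwise the link carries an action dictionary under /A,
    # and its /S entry names the action type.
    action = annot.get("/A", {})
    kind = action.get("/S")
    if kind == "/URI":
        return ("external", str(action.get("/URI", "")))
    if kind == "/GoTo":
        return ("internal", str(action.get("/D", "")))
    if kind == "/GoToR":
        # /F is a file specification; it may be a plain string or a dictionary.
        return ("other-file", str(action.get("/F", "")))
    return ("other", str(kind) if kind else "")
```

For instance, `{"/Subtype": "/Link", "/A": {"/S": "/URI", "/URI": "https://example.com"}}` is labeled external, while `{"/Subtype": "/Link", "/Dest": "fig3"}` is labeled internal.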
Can I check whether the links are still alive?
Not from this tool. That requires HTTP requests to each URL, which we don't do. Once you have the list, use a link-checker tool or a quick script to verify.
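As one example of such a quick script, here is a minimal sketch using only Python's standard library. It assumes you have the exported links as a list of strings; `check_url`, `check_links`, and the User-Agent value are illustrative names, not part of this tool. It sends a HEAD request to each http(s) URL, retries with GET if the server rejects HEAD with 405, and reports the status code or the connection error.

```python
import urllib.error
import urllib.request


def check_url(url: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Return (alive, detail) for one URL.

    urllib follows redirects automatically and raises HTTPError for 4xx/5xx,
    so alive is True only when the final response is a success or redirect.
    detail is the status code as a string, or a short error message.
    """
    # HEAD avoids downloading the body; some servers answer HEAD with
    # 405 Method Not Allowed, so those are retried with GET.
    for method in ("HEAD", "GET"):
        try:
            req = urllib.request.Request(
                url, method=method, headers={"User-Agent": "link-check-sketch"}
            )
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return (resp.status < 400, str(resp.status))
        except urllib.error.HTTPError as err:  # must precede URLError (its parent class)
            if method == "HEAD" and err.code == 405:
                continue
            return (False, str(err.code))
        except (urllib.error.URLError, OSError, ValueError) as err:
            # DNS failures, refused connections, timeouts, malformed URLs.
            return (False, f"error: {err}")
    return (False, "error: no response")  # not reached: the GET pass always returns


def check_links(urls: list[str]) -> list[tuple[str, bool, str]]:
    """Check each http(s) URL; skip internal references, mailto: links, and blanks."""
    results = []
    for url in (u.strip() for u in urls):
        if not url.lower().startswith(("http://", "https://")):
            continue
        alive, detail = check_url(url)
        results.append((url, alive, detail))
    return results
```

If your list is saved one URL per line in a file such as `links.txt`, you could call `check_links(open("links.txt", encoding="utf-8").read().splitlines())`. Keep in mind that some sites block automated requests or rate-limit them, so a failure here doesn't always mean the link is dead.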