How to Check Whether a PDF Is Reasonably Safe

Pere Pages
Software Engineer
[Image: a PDF document icon being inspected through a magnifying glass]

A PDF (Portable Document Format) is usually just a document. But the format can also carry active features: links, forms, embedded files, automatic actions, multimedia, and sometimes JavaScript. That's what makes "is this PDF safe?" a fair question, and also a slightly wrong one.

You can almost never prove a file is safe. The useful question is:

"Is opening this proportionate to where it came from?"

This post answers that honestly: what the real risk is, why it's usually small, and — if you're curious or genuinely suspicious — how to look inside a PDF yourself.

The honest bottom line

For an everyday PDF from a source you trust, opened in an updated mainstream viewer — your browser, or Preview on macOS — you're already in good shape. You don't need to do anything special.

Modern viewers are the reason:

| What protects you | Why it matters |
| --- | --- |
| Sandboxing | The viewer is isolated from the rest of your system |
| JavaScript off / restricted | Document scripts don't run, or run with almost no capability |
| /Launch ignored | A PDF can't quietly start another program |
| Maintained / patched | Known parser bugs get fixed |

The scary PDF exploits people remember mostly targeted old Adobe Acrobat/Reader, which executed document JavaScript and honored launch actions by default. Browser PDF engines (Chrome, Edge, Firefox) and macOS Preview were built in a more hostile era and don't behave that way.

Solid reading-focused viewers:

| System | Reasonable options |
| --- | --- |
| macOS | Preview, browser PDF viewer |
| Windows | Microsoft Edge, Chrome, Firefox |
| Linux | Evince, Okular, browser PDF viewer |

So PDFs can be dangerous — but for most files, the format isn't where your risk actually lives.

So where does the risk actually live?

Two places, in rough order of how often they bite real people.

1. The source, not the file

For ebooks and downloaded documents, the danger usually comes from where you got it:

| Source behavior | Risk |
| --- | --- |
| Fake download buttons | Malware or phishing |
| Bundled .zip files | Extra payloads |
| Fake installers | Malware |
| Browser popups | Social engineering |
| Login prompts | Credential theft |

A sketchy download page is a stronger warning sign than anything a file inspection will turn up. A clean-looking PDF from a shady source still deserves caution.

2. Active features inside the PDF

The narrower risk is the active features the format allows. They aren't malicious by themselves — plenty of legitimate PDFs use forms or links — but they're the surface an attacker would use:

| Feature | Why it can matter |
| --- | --- |
| JavaScript (/JavaScript, /JS) | Code-like behavior inside the document |
| /OpenAction, /AA | Actions that fire automatically on open |
| /Launch | Tries to start an external program |
| /EmbeddedFile | A file hidden inside the PDF |
| /AcroForm, /SubmitForm | Forms and form submission (data out) |
| /RichMedia | Embedded media; more parser surface |
| /XFA | XML Forms Architecture; advanced forms |

An updated, sandboxed viewer neutralizes most of these. Inspecting for them is useful mainly when you're suspicious enough to want a look before opening — which is the rest of this post.

Looking inside a PDF (optional)

This part is curiosity-and-suspicion territory

Everything below is a nice way to see what a PDF actually contains, and a reasonable extra step for a file you're unsure about. It is not a checklist you owe every document. If you trust the source and have a modern viewer, you can stop reading here.

The examples use a real ebook file, EL_ARTE_DE_PENSAR_Dobelli.pdf.

Is it structurally valid?

Start with qpdf:

qpdf --check EL_ARTE_DE_PENSAR_Dobelli.pdf

Example output:

checking EL_ARTE_DE_PENSAR_Dobelli.pdf
PDF Version: 1.5
File is not encrypted
File is not linearized
No syntax or stream encoding errors found; the file may still contain
errors that qpdf cannot detect

This is a good sign. It means:

| Output | Meaning |
| --- | --- |
| PDF Version: 1.5 | Normal PDF format version |
| File is not encrypted | No password or encryption hiding the contents |
| File is not linearized | Not optimized for progressive web loading; not security-relevant |
| No syntax or stream encoding errors found | The internal PDF structure looks valid |

But it does not prove the file is safe. qpdf --check checks structure; it does not detect malware.
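
If you want a script to react to the result instead of reading the text, the exit status is the part to look at. The sketch below assumes the exit codes described in the qpdf documentation (0 for no errors or warnings, 3 for warnings only, 2 for errors); check_structure is a hypothetical helper name, not something qpdf provides.

```shell
# Sketch: turn the exit status of `qpdf --check` into a one-word verdict.
# check_structure is a hypothetical helper name, not part of qpdf.
check_structure() (
    # The parentheses run the body in a subshell, so `status` stays local.
    status=0
    qpdf --check "$1" >/dev/null 2>&1 || status=$?
    case "$status" in
        0) echo "ok" ;;        # no errors or warnings reported
        3) echo "warnings" ;;  # warnings only; qpdf could still read the file
        2) echo "errors" ;;    # errors detected
        *) echo "unknown" ;;   # anything else, e.g. 127 if qpdf is not installed
    esac
)
```

Calling `check_structure EL_ARTE_DE_PENSAR_Dobelli.pdf` would print one of the four words. As with the command itself, "ok" only says the structure parsed cleanly.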

Search for active features

Look for the tokens from the table above:

strings EL_ARTE_DE_PENSAR_Dobelli.pdf | grep -Ei "javascript|openaction|launch|embeddedfile|acroform|submitform"

No output is good: none of those tokens appear in the file's raw, uncompressed bytes.

A small shell detail: when grep finds nothing it exits with code 1, and some terminals show that as a failure marker. That doesn't mean the command broke; it just means "no matches found."

One important caveat: strings reads the raw file, so it can miss tokens that live inside compressed object streams. A clean result here is reassuring but not conclusive — the unpack step below is the reliable one.
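
The exit-code detail can be made explicit in a small helper. This is a minimal sketch, assuming a grep that supports -a (GNU and BSD grep do); scan_raw is a hypothetical name. It runs grep -a on the file directly rather than piping through strings, so only one exit status is in play.

```shell
# Sketch: separate "found", "no matches", and "grep itself failed".
# scan_raw is a hypothetical helper; the pattern mirrors the first search above.
scan_raw() (
    status=0
    # -a: treat binary data as text, -E: extended regex,
    # -i: ignore case, -q: print nothing, just set the exit status
    grep -aEiq "javascript|openaction|launch|embeddedfile|acroform|submitform" "$1" \
        2>/dev/null || status=$?
    case "$status" in
        0) echo "found" ;;  # at least one token matched
        1) echo "none" ;;   # no matches; this is not a failure
        *) echo "error" ;;  # 2 or higher: unreadable file or similar problem
    esac
)
```

The same caveat applies: "none" describes the raw bytes only, not anything inside compressed streams.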

Watch out for false positives, too. Searching for links and advanced actions:

strings EL_ARTE_DE_PENSAR_Dobelli.pdf | grep -Ei "/URI|http|https|/AA|/RichMedia|/XFA"

A hit like m/aA looks like /AA (Additional Actions), but it only matched because -i made the search case-insensitive — it caught /aA inside an unrelated string. Dropping -i is stricter:

strings EL_ARTE_DE_PENSAR_Dobelli.pdf | grep -E "/URI|http|https|/AA|/RichMedia|/XFA"
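
To see the effect in isolation, here is a small sketch on made-up sample lines (they are not taken from the example file). It also shows one possible further tightening, which is an assumption on my part rather than something the checks above rely on: requiring that the token is not followed by another letter or digit, so a longer name that merely starts with /AA is not counted.

```shell
# Three made-up lines: binary-looking noise, a real /AA entry,
# and a longer name that only starts with /AA.
sample='some binary noise m/aA more noise
<< /Type /Page /AA << /O 5 0 R >> >>
<< /AAPL:Keywords (x) >>'

# Case-insensitive: every line matches, including the noise.
printf '%s\n' "$sample" | grep -ciE '/AA'                  # prints 3

# Case-sensitive: the m/aA noise drops out.
printf '%s\n' "$sample" | grep -cE '/AA'                   # prints 2

# Case-sensitive, and the next character must not be a letter or digit.
printf '%s\n' "$sample" | grep -cE '/AA([^[:alnum:]]|$)'   # prints 1
```

Stricter patterns cut noise, but they are still plain text matching and can miss names written in an unusual way.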

Unpack and search again (the reliable check)

Because features can hide in compressed streams, decompress the file first, then search:

qpdf --qdf --object-streams=disable EL_ARTE_DE_PENSAR_Dobelli.pdf unpacked.pdf

grep -aE "/JavaScript|/JS|/OpenAction|/AA|/Launch|/EmbeddedFile|/AcroForm|/SubmitForm|/URI|/RichMedia|/XFA|http|https" unpacked.pdf

No output here is a strong signal — no obvious sign of any of these:

| Token | Why it matters |
| --- | --- |
| /JavaScript or /JS | JavaScript inside the PDF |
| /OpenAction | Action triggered when opening the file |
| /AA | Additional automatic actions |
| /Launch | Attempt to launch an external program |
| /EmbeddedFile | File embedded inside the PDF |
| /AcroForm | Interactive form |
| /SubmitForm | Form submission |
| /URI, http, https | Links |
| /RichMedia | Embedded media |
| /XFA | XML Forms Architecture, advanced PDF forms |
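
If the combined search does print something, a per-token tally makes it easier to see which feature was hit. This is a rough sketch: report_tokens is a hypothetical helper, it counts matching lines rather than individual occurrences, and it does plain substring matching, so a hit is a prompt to look closer, not a verdict.

```shell
# Sketch: count matching lines per token in an already unpacked PDF.
# report_tokens is a hypothetical helper; the tokens come from the table above.
report_tokens() {
    for token in /JavaScript /JS /OpenAction /AA /Launch /EmbeddedFile \
                 /AcroForm /SubmitForm /URI /RichMedia /XFA; do
        # -a: treat binary as text, -c: count matching lines, -F: literal string.
        # grep exits 1 when the count is 0, so `|| true` keeps the loop going.
        count=$(grep -acF -- "$token" "$1" 2>/dev/null || true)
        if [ "${count:-0}" -gt 0 ]; then
            printf '%s\t%s\n' "$token" "$count"
        fi
    done
    return 0
}
```

Run against unpacked.pdf, it prints one line per token that appears and nothing at all for a clean file.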

Then remove the inspection copy:

rm unpacked.pdf
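
The unpack, search, and remove steps can be wrapped so the inspection copy is always cleaned up, even if something fails midway. This is a minimal sketch under a few assumptions: inspect_pdf is a hypothetical name, mktemp is available, and a qpdf exit status of 3 (warnings only) still leaves a usable output file.

```shell
# Sketch: unpack to a temporary file, search it, and always remove it afterwards.
# inspect_pdf is a hypothetical wrapper around the commands above.
inspect_pdf() (
    # The subshell body keeps the trap and variables local to this call.
    unpacked=$(mktemp) || exit 2
    trap 'rm -f "$unpacked"' EXIT

    status=0
    qpdf --qdf --object-streams=disable "$1" "$unpacked" >/dev/null 2>&1 || status=$?
    # Assumption: 0 means clean and 3 means warnings only; anything else is a failure.
    if [ "$status" -ne 0 ] && [ "$status" -ne 3 ]; then
        echo "qpdf could not unpack $1" >&2
        exit 2
    fi

    found=0
    grep -aE "/JavaScript|/JS|/OpenAction|/AA|/Launch|/EmbeddedFile|/AcroForm|/SubmitForm|/URI|/RichMedia|/XFA|http|https" "$unpacked" || found=$?
    exit "$found"   # 0: something matched, 1: nothing matched
)
```

`inspect_pdf EL_ARTE_DE_PENSAR_Dobelli.pdf` then prints any matching lines and exits 1 when there are none, the same convention grep uses.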

Putting it together: the example file

For EL_ARTE_DE_PENSAR_Dobelli.pdf, every check came back clean:

| Check | Result |
| --- | --- |
| qpdf --check | Passed |
| Not encrypted | Yes |
| No JavaScript found | Yes |
| No automatic open actions found | Yes |
| No embedded files found | Yes |
| No forms found | Yes |
| No links found | Yes |
| No rich media found | Yes |

Risk level: low
Safe to open: reasonably yes
Recommended viewer: simple, maintained, reading-focused PDF viewer
Remaining caution: trust the source, not just the file

A practical decision flow
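
One way to read the advice in this post as a flow is sketched below. It is my own condensation, not a rule set from any tool: triage is a hypothetical helper, and both inputs are judgment calls you make yourself.

```shell
# Sketch: the post's advice as a tiny decision function.
# triage is a hypothetical helper.
#   $1: trusted | unsure | suspicious   (how much you trust the source)
#   $2: updated | outdated              (state of your PDF viewer)
triage() {
    case "$1:$2" in
        suspicious:*)    echo "isolate: open it only in a contained environment" ;;
        *:outdated)      echo "update: switch to an updated mainstream viewer first" ;;
        trusted:updated) echo "open: nothing special needed" ;;
        *)               echo "inspect: run the qpdf and grep checks before opening" ;;
    esac
}
```

For example, `triage unsure updated` prints the "inspect" line, while `triage trusted updated` prints the "open" line.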

If a file is genuinely suspicious: isolate it

When a file is suspicious but you still need to look at it, contain it instead of trusting it:

| Method | Use when |
| --- | --- |
| Separate OS user | You want basic containment |
| Virtual machine (VM) | The file is suspicious but you need to inspect it |
| Disposable browser profile | You only need a quick visual check |
| Offline machine | You want to prevent network access |

What the checks did — and didn't — prove

The inspection above checks for common risk indicators. It does not prove a PDF is harmless: a malicious file could still exploit a vulnerability in the reader itself, which is exactly why the viewer matters more than the checklist.

That's the honest framing of the whole exercise:

  • For everyday files from sources you trust, an updated mainstream viewer is enough.
  • The biggest real-world risk is usually the source, not the PDF bytes.
  • The deep inspection is a proportionate extra step for files you're unsure about — and a genuinely interesting look at how PDFs work — not a ritual for every document.

Security is rarely about certainty. It's about reducing exposure to a level that matches where the file came from.