
Critical XXE Vulnerability in Apache Tika Allows Data Exposure via Malicious PDFs (CVE-2025-66516)

December 17th, 2025

Critical

Our Cyber Threat Intelligence Unit is monitoring a critical XML External Entity (XXE) vulnerability in Apache Tika, tracked as CVE-2025-66516. The vulnerability was publicly disclosed in early December 2025. It allows an unauthenticated attacker to embed malicious XFA (XML Forms Architecture) content in a PDF, which triggers unsafe external entity resolution when Tika parses the document. Apache Tika is widely used for document ingestion, metadata extraction, content analysis, and full-text indexing, including in search and content-ingestion pipelines. Any application or service that processes untrusted PDF documents with an affected Tika version may therefore be at risk. This CVE covers the same underlying vulnerability as CVE-2025-54988 (disclosed in August 2025), but it expands the list of affected packages. It also clarifies that deployments remain vulnerable if only the PDF module was updated: full remediation requires upgrading tika-core to 3.2.2 or later. The Apache Software Foundation has rated the issue at the maximum CVSS score of 10.0, which underscores the risk of successful exploitation and the need for prompt remediation.

Technical Details

  • CVE ID: CVE-2025-66516.

  • Severity: Critical (CVSS 10.0 - Apache Software Foundation CNA; NVD score pending).

  • Vulnerability Type: XML External Entity (XXE) injection.

  • Attack Vector: Malicious PDF with embedded XFA content.

  • Affected Components and Versions:

    • Apache Tika Core (org.apache.tika:tika-core): versions 1.13 through 3.2.1.

    • Apache Tika PDF Module: versions 2.0.0 through 3.2.1.

    • Apache Tika Parsers (1.x line): versions 1.13 through 1.28.5, where PDFParser is bundled.

  • Fixed Versions:

    • Apache Tika Core: 3.2.2 or later.

    • Apache Tika PDF Module: 3.2.2 or later.

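    • Apache Tika Parsers (1.x): Migrate to the 3.x line with tika-core 3.2.2 or later. Upgrading to 2.x alone is not sufficient, because 2.x tika-core releases fall within the affected tika-core range above.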

  • Root Cause:

    • An attacker can craft a PDF containing a malicious embedded XFA file that defines external XML entities (for example, referencing file:// paths or remote URLs).

      • When Apache Tika parses the document using default settings, it may resolve these external entities, resulting in unintended access to local files or network resources.

    • Although the PDF parser is the entry point, the underlying issue resides in tika-core:

      • As a result, upgrading only the PDF parsing module without also updating tika-core to a fixed version does not fully mitigate the vulnerability.
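
The affected ranges above can be checked mechanically during dependency review. The Python function below is a minimal sketch, not an official tool. It encodes the inclusive ranges listed in this advisory and assumes plain numeric versions such as 3.2.1. It also assumes the Maven artifact IDs tika-core, tika-parser-pdf-module, and tika-parsers under org.apache.tika, so confirm these against your own build files. Besides the range check, it flags the partial-upgrade case described under Root Cause. The function names are illustrative.

```python
from typing import Dict, List, Tuple

# Inclusive affected ranges as listed in this advisory.
# Artifact IDs are assumptions; verify them against your build.
CORE = "org.apache.tika:tika-core"
PDF_MODULE = "org.apache.tika:tika-parser-pdf-module"
PARSERS_1X = "org.apache.tika:tika-parsers"

AFFECTED: Dict[str, Tuple[str, str]] = {
    CORE: ("1.13", "3.2.1"),
    PDF_MODULE: ("2.0.0", "3.2.1"),
    PARSERS_1X: ("1.13", "1.28.5"),
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain release like '3.2.1'; qualifiers such as '-SNAPSHOT' raise ValueError."""
    parts = tuple(int(p) for p in version.split("."))
    return parts + (0,) * (3 - len(parts))  # treat '1.13' as 1.13.0


def is_affected(artifact: str, version: str) -> bool:
    """True if artifact:version falls inside one of the listed affected ranges."""
    if artifact not in AFFECTED:
        return False
    low, high = AFFECTED[artifact]
    return parse_version(low) <= parse_version(version) <= parse_version(high)


def audit(deps: Dict[str, str]) -> List[str]:
    """Report affected artifacts, plus the 'PDF module patched, core not' case."""
    findings = [f"{a}:{v} is in an affected range"
                for a, v in sorted(deps.items()) if is_affected(a, v)]
    core, pdf = deps.get(CORE), deps.get(PDF_MODULE)
    if core and pdf and is_affected(CORE, core) and not is_affected(PDF_MODULE, pdf):
        findings.append("PDF module updated but tika-core is still vulnerable")
    return findings
```

For example, audit({CORE: "3.2.1", PDF_MODULE: "3.2.2"}) returns two findings: one because tika-core is still in range, and one for the partial-upgrade warning. Pass in the resolved dependency list from your build tool rather than only the declared dependencies. This helps catch transitive copies of tika-core.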


Impact

Successful exploitation of CVE-2025-66516 may result in:

  • Sensitive data exposure, including unintended access to local files, configuration data, or credentials on systems performing document parsing.

  • Server-side request forgery (SSRF) or unintended outbound connections to internal or external network resources.

  • Denial of Service (DoS) through resource exhaustion or parser disruption.

  • Compromise of document ingestion or indexing pipelines, potentially allowing metadata leakage or corruption of downstream systems.

Available vendor advisories do not indicate that this vulnerability directly enables remote code execution. However, downstream risk may increase depending on how dependent applications handle parsed content. Given Tika’s widespread adoption across enterprise, cloud, and open-source environments, the potential exposure surface is extensive.

Detection Method

Organizations should consider the following detection and monitoring actions:

  • Review document-ingestion and parsing logs for unexpected parsing errors, XML exceptions, or anomalous behavior during PDF processing.

  • Monitor for unexpected outbound network connections originating from document-processing services.

  • Inspect incoming PDFs for embedded XFA or suspicious XML structures, including external entity declarations.

  • Perform dependency and software-composition analysis to identify outdated or vulnerable Apache Tika components.
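
A simple byte-level heuristic is one way to start inspecting incoming PDFs for XFA and entity declarations. The sketch below is an illustrative triage filter, not a PDF parser. It inflates every stream that decodes as zlib/Flate data, then searches both the raw file and the inflated streams for three markers:

  • the /XFA key
  • <!DOCTYPE declarations
  • <!ENTITY declarations that use SYSTEM or PUBLIC

The filter assumes unencrypted files. It ignores other stream filters and name obfuscation such as the hex-escaped form /X#46A, so a clean result does not prove that a file is safe. The function names are hypothetical.

```python
import re
import zlib
from typing import List

# A "stream" keyword not preceded by "end", up to the next "endstream" keyword.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
DOCTYPE_RE = re.compile(rb"<!DOCTYPE", re.IGNORECASE)
# An entity declaration that points at an external resource (file://, http://, ...).
EXTERNAL_ENTITY_RE = re.compile(rb"<!ENTITY[^>]*\b(?:SYSTEM|PUBLIC)\b", re.IGNORECASE)


def candidate_chunks(pdf: bytes) -> List[bytes]:
    """Return the raw file plus every stream body that inflates as zlib/Flate data."""
    chunks = [pdf]
    for match in STREAM_RE.finditer(pdf):
        try:
            # decompressobj tolerates trailing bytes and truncated tails.
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not Flate-encoded; other filters are out of scope for this sketch
    return chunks


def scan_pdf(pdf: bytes) -> List[str]:
    """Heuristic triage: report XFA and DTD/entity markers found anywhere in the file."""
    findings = set()
    for chunk in candidate_chunks(pdf):
        if b"/XFA" in chunk:
            findings.add("XFA form present")
        if DOCTYPE_RE.search(chunk):
            findings.add("DOCTYPE declaration found")
        if EXTERNAL_ENTITY_RE.search(chunk):
            findings.add("external entity declaration found")
    return sorted(findings)
```

A non-empty result is a reason to quarantine the file or send it for closer inspection. It is not proof of exploitation, because legitimate XFA forms exist.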

Indicators of Compromise

There are no Indicators of Compromise (IOCs) for this advisory.


Recommendations

  • Patch immediately: Upgrade Apache Tika to version 3.2.2 or later, ensuring that tika-core, the PDF module, and any parser dependencies are all updated.

  • Validate complete dependency alignment: Confirm that no legacy or transitive Tika components remain at vulnerable versions.

  • Harden parsing workflows: Disable external entity resolution and DTD processing where possible when handling untrusted documents.

  • Restrict high-risk content: Reject, quarantine, or pre-scan PDFs containing XFA or complex XML content before ingestion.

  • Isolate document processing: Run parsing services in sandboxed or containerized environments with minimal filesystem and network permissions.

  • Monitor continuously: Audit document ingestion pipelines for anomalies or unexpected access patterns following parsing operations.
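
For Tika itself, the fix is upgrading tika-core. However, "disable DTD processing" also applies to any pipeline step that extracts XFA or other XML from documents and parses it separately. The Python sketch below illustrates that idea with the standard-library expat bindings: any DOCTYPE, entity declaration, or external entity reference aborts parsing, so no entity can be declared or resolved. The function and exception names are illustrative. Established libraries such as defusedxml provide similar protections more completely. In Java parsers, the comparable control is enabling the Xerces feature http://apache.org/xml/features/disallow-doctype-decl on the parser factory.

```python
import xml.parsers.expat
from typing import List


class ForbiddenDTD(ValueError):
    """Raised when untrusted XML contains a DOCTYPE or entity declaration."""


def safe_element_names(xml_bytes: bytes) -> List[str]:
    """Parse untrusted XML, refusing any DTD; return element names in document order."""
    parser = xml.parsers.expat.ParserCreate()
    names: List[str] = []

    def start_doctype(name, system_id, public_id, has_internal_subset):
        raise ForbiddenDTD(f"DOCTYPE {name!r} rejected")

    def entity_decl(name, is_param, value, base, system_id, public_id, notation):
        raise ForbiddenDTD(f"ENTITY {name!r} rejected")

    def external_ref(context, base, system_id, public_id):
        raise ForbiddenDTD(f"external reference {system_id!r} rejected")

    # Exceptions raised inside expat handlers stop parsing and propagate to the caller.
    parser.StartDoctypeDeclHandler = start_doctype
    parser.EntityDeclHandler = entity_decl
    parser.ExternalEntityRefHandler = external_ref
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_bytes, True)
    return names
```

Rejecting the entire DTD is stricter than blocking only external entities. This trade-off suits untrusted uploads because it also blocks entity-expansion tricks that use only internal entities.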

Conclusion

CVE-2025-66516 is a critical XXE vulnerability in Apache Tika that can be exploited simply by submitting a maliciously crafted PDF. Given Tika’s extensive use in document ingestion and indexing pipelines, the risk is both wide-reaching and serious. We urge organizations to prioritize immediate patching, ensure all dependencies are fully upgraded, and harden document-processing workflows to minimize exposure and protect themselves against data leaks, unauthorized network access, and disruptions to essential services.
