CVE-2025-4517 is a critical path traversal vulnerability in the tarfile module of Python's standard library. A maliciously crafted .tar file can cause files to be written to arbitrary paths outside the extraction directory, which can overwrite sensitive files and lead to full system compromise. Because Python 3.14 and later use the affected "data" filter by default, code that relies on the default behavior is also exposed. Fixes have shipped in several Python releases. Users should upgrade to a patched version and apply strict path validation and zero-trust handling when extracting archives.
1. Vulnerability Overview
- Vulnerability Type: Path Traversal leading to Arbitrary File Write.
- CVSS Score: 9.4 (Critical) under Common Vulnerability Scoring System (CVSS) v3.1.
- Affected Component: The tarfile module in Python's standard library.
- Core Impact: When a program extracts an untrusted .tar archive, an attacker can use a maliciously crafted archive to write files to arbitrary system paths outside the target extraction directory. This may lead to sensitive files being overwritten, unauthorized code execution, or even full system compromise.
2. Technical Details
In Python's tarfile module, developers typically call tarfile.TarFile.extractall() or tarfile.TarFile.extract() to extract files.
To make extraction safer, Python introduced the filter parameter (for example, to reject dangerous symbolic links). The core issue in CVE-2025-4517 is a flaw in this very mechanism: when the filter parameter is set to "data" or "tar", the tarfile module fails to properly validate and normalize the target paths of symbolic links and hard links.
Underlying Logical Flaw:
This is essentially a mismatch between path validation and path realization: the path the filter checks is not always the path that extraction actually writes to. During low-level processing (for example, when Python's os.path.realpath() function runs into PATH_MAX limits), a malicious .tar file whose members ultimately point to ../../../../etc/passwd or other critical system locations can pass the "data" filter without being blocked.
As a result, files that should have been confined to the extraction directory escape it and are written elsewhere on the filesystem.
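To make the validation-versus-realization idea concrete, here is a minimal sketch of the kind of containment check an extraction filter depends on. It is not the tarfile module's actual implementation, and the helper name is_within is hypothetical. The point it illustrates is that such a check is only as trustworthy as the path resolution it is built on: if resolution and the real filesystem disagree, the check can pass while the write lands somewhere else.

```python
import os


def is_within(dest: str, candidate: str) -> bool:
    """Return True if `candidate`, joined onto `dest`, resolves inside `dest`.

    Hypothetical helper for illustration only. It trusts os.path.realpath()
    to report where a write would really land; the vulnerability described
    above is about cases where that assumption does not hold.
    """
    dest_real = os.path.realpath(dest)
    # An absolute candidate replaces dest entirely in os.path.join().
    target_real = os.path.realpath(os.path.join(dest_real, candidate))
    # commonpath() compares whole path components, not string prefixes,
    # so "/srv/out-evil" is not treated as being inside "/srv/out".
    return os.path.commonpath([dest_real, target_real]) == dest_real
```

With an extraction directory dest, is_within(dest, "docs/readme.txt") is True, while is_within(dest, "../../etc/passwd") and is_within(dest, "/etc/passwd") are both False.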
3. Affected Versions and High-risk Trigger Conditions
High-risk Trigger Conditions: Code is at risk when both of the following hold:
- It extracts tar archives from untrusted external sources (such as user uploads or packages downloaded over the network).
- It calls an extraction function with one of the affected filters: tarfile.TarFile.extractall(filter="data") or tarfile.TarFile.extract(filter="data"). The same applies to filter="tar", as noted in the technical details.
Special note: In Python 3.14 and later, the default value of filter changed from no filtering to "data". This means that code relying on the new default behavior is affected even if it never specifies a filter explicitly.
Python Versions with Fixes Released: The Python Software Foundation has fixed this issue in the following updated versions:
- Python 3.13.4
- Python 3.12.11
- Python 3.11.13
- Python 3.10.18
- Python 3.9.23
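As a quick way to apply the list above, the following sketch compares an interpreter version against those patched releases. The table is copied from this document, and the function name has_listed_fix is hypothetical. For any release series that is not in the list, the sketch returns None rather than guessing.

```python
import sys

# Minimum patched micro release per (major, minor) series, taken from the
# list above. Series that are not listed are deliberately left out.
PATCHED_MICRO = {
    (3, 9): 23,
    (3, 10): 18,
    (3, 11): 13,
    (3, 12): 11,
    (3, 13): 4,
}


def has_listed_fix(version=None):
    """Return True or False for a listed series, None for an unlisted one.

    `version` is a (major, minor, micro, ...) tuple; it defaults to the
    running interpreter's sys.version_info.
    """
    major, minor, micro = (version or sys.version_info)[:3]
    required = PATCHED_MICRO.get((major, minor))
    if required is None:
        return None  # not covered by the list; check the advisory directly
    return micro >= required


if __name__ == "__main__":
    print("Listed fix present:", has_listed_fix())
```

A None result should be treated as "unknown" and checked against the official advisory, not as "safe".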
4. Exploitation Scenarios
From a red teaming or penetration testing perspective, the vulnerability has a low barrier to exploitation and requires no user interaction, which makes it well suited as a foothold for initial access:
- Continuous Integration / Continuous Deployment (CI/CD) Pipelines: CI/CD runners often pull and extract build artifacts from registries or caches. A red team can plant a poisoned .tar archive that, on extraction, directly overwrites the runner's SSH keys or environment variable configuration files.
- Machine Learning (ML) Environments: External pre-trained model weights are usually packaged as .tar or .tar.gz. If automated processing scripts trust these files by default, the vulnerability can easily be triggered to overwrite the model loader's execution logic.
- Automated Sandboxes / Plugin Ecosystems: When unknown files are automatically extracted and analyzed, or third-party plugins are installed, the sandbox may be escaped, leading to compromise of the host.
5. Mitigation and Workarounds
- Upgrade: The most direct and thorough fix is to upgrade affected Python environments to the patched releases listed above.
- Code-level Workarounds:
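One possible code-level hardening, offered as a sketch and not as an official fix, is to inspect every member before extracting anything and to refuse archives that contain links or paths that leave the destination. The names safe_extract and UnsafeArchiveError are hypothetical. Rejecting all symbolic and hard links is a deliberately strict assumption: it removes the link-handling surface described above, at the cost of refusing some legitimate archives.

```python
import os
import tarfile


class UnsafeArchiveError(Exception):
    """Raised when an archive contains a member this sketch refuses."""


def safe_extract(archive_path: str, dest: str) -> None:
    """Validate every member first, then extract into `dest`.

    Assumes `dest` is a directory the caller controls and that it holds
    no pre-existing symlinks planted by an attacker.
    """
    dest_real = os.path.realpath(dest)
    with tarfile.open(archive_path) as tar:
        members = tar.getmembers()
        for member in members:
            # Strict policy: no symbolic links and no hard links at all.
            if member.issym() or member.islnk():
                raise UnsafeArchiveError(f"link member refused: {member.name}")
            # Only regular files and directories (no devices, FIFOs, ...).
            if not (member.isfile() or member.isdir()):
                raise UnsafeArchiveError(f"special member refused: {member.name}")
            # The resolved target must stay inside the destination.
            target = os.path.realpath(os.path.join(dest_real, member.name))
            if os.path.commonpath([dest_real, target]) != dest_real:
                raise UnsafeArchiveError(f"path escapes destination: {member.name}")
        # Keep the built-in filter as a second layer where it is available.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest_real, members=members, **kwargs)
```

Because validation runs over the full member list before any file is written, a refused archive leaves the destination untouched. This complements upgrading and does not replace it.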