This article analyzes the core principles, real-world impact, and remediation of path traversal (also known as directory traversal) vulnerabilities. The vulnerability arises when an application does not strictly validate user-supplied file path parameters, so an attacker can use traversal sequences such as ../ to escape the intended directory and read arbitrary sensitive files on the server (password files, source code, and so on). Under certain conditions it can even lead to remote code execution (RCE). The key defenses are to avoid passing user input directly to file APIs and, where that is unavoidable, to combine whitelist-based input validation with path canonicalization.

1. 📌 Topic Summary

This document explores the core mechanisms and defense strategies of Path Traversal (also known as Directory Traversal) vulnerabilities. Through six practical labs, it analyzes the bypass techniques that work against different defenses: blocked traversal sequences, non-recursive stripping, superfluous URL decoding, and prefix/suffix validation.


2. 🧠 Core Principle

Underlying Mechanism: A path traversal vulnerability arises when a web application directly concatenates user-supplied input (e.g., a filename) into a server file path and passes it to underlying filesystem operations without rigorous security validation. Attackers exploit the operating system's directory parsing rules by inputting special directory traversal sequences (e.g., ../ on Unix/Linux or ..\ on Windows). This causes the parsed path to "break out" of the application's designated base directory, allowing access to the filesystem root and other arbitrary file locations.
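The mechanism can be illustrated with a minimal sketch, assuming a POSIX-style filesystem. The hypothetical helper below joins a base directory and a request parameter the way a vulnerable handler would, then normalizes the result to show where the path really points. The class name, method name, and base directory are assumptions made for illustration and do not come from any particular application.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo; class name, method name, and paths are illustrative.
final class NaiveJoinDemo {

    // Mimics a vulnerable handler: the parameter is appended to the base directory
    // unchecked. normalize() then collapses "." and ".." segments purely lexically,
    // which approximates the path the filesystem would finally open.
    static String effectiveTarget(String baseDir, String userInput) {
        Path joined = Paths.get(baseDir + "/" + userInput);
        return joined.normalize().toString();
    }

    public static void main(String[] args) {
        // A benign request stays inside the base directory.
        System.out.println(effectiveTarget("/var/www/images", "218.png"));
        // Three "../" segments climb from images to www to var to the root.
        System.out.println(effectiveTarget("/var/www/images", "../../../etc/passwd"));
    }
}
```

On a POSIX system this prints /var/www/images/218.png followed by /etc/passwd: the second request has left the base directory entirely, even though the application only ever prepended /var/www/images to it.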

Terminology:

  • Path Traversal / Directory Traversal - A vulnerability in which attacker-controlled path input lets a request reach files outside the directory the application intended to expose.
  • API - Application Programming Interface. Here, it refers to the low-level functions provided by the operating system for reading and writing files.
  • URL - Uniform Resource Locator.
  • PoC - Proof of Concept (code/payload demonstrating the vulnerability).
  • RCE - Remote Code Execution: an attacker runs arbitrary system commands on the target server. It is typically the end result of a file-write or file-inclusion vulnerability.

3. 🛠️ Usage & Examples - How to Exploit

Common Scenarios: The flaw typically appears where resources are loaded dynamically from a URL parameter, such as an e-commerce site's product image endpoint: https://insecure-website.com/loadImage?filename=218.png.

Specific Examples & PoCs (Combined with Practical Labs): The following summarizes specific attack payloads for reading the standard Linux user information file /etc/passwd under various security defense scenarios:

Lab Case | Defense Mechanism | Payload | Bypass Principle
--- | --- | --- | ---
Basic Scenario | No defenses. | ../../../etc/passwd | Repeated ../ segments climb back to the filesystem root.
Absolute Path Bypass | Blocks ../ sequences but still resolves the supplied name against a default working directory. | /etc/passwd | An absolute path to the target file overrides the base directory, so no traversal sequence is needed.
Non-recursive Filtering | The application strips ../ in a single pass only. | ....//....//....//etc/passwd | Nested (doubled) sequences. Once the inner ../ is removed, the surrounding characters close up into a valid ../.
Redundant URL Decoding | Blocks standard traversal sequences but performs an additional URL decode after validation. | ..%252f..%252f..%252fetc/passwd | Double URL encoding. The first decode turns %25 into %, leaving %2f, which passes the check; the extra decode then turns %2f into /.
Path Prefix Validation | Requires the parameter to start with the expected base folder path. | /var/www/images/../../../etc/passwd | Begin with the expected directory to satisfy the check, then follow it with ../ sequences to break out.
File Extension Validation | Requires the parameter to end with an expected extension (e.g., .png). | ../../../etc/passwd%00.png | Null byte bypass. %00 is a URL-encoded null character. The application-level suffix check passes, but the underlying C/C++ filesystem API treats the null character as the end of the string and ignores the trailing .png. This mainly affects older runtimes that do not reject embedded null bytes.
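Three of the flawed defenses in the table are easy to reproduce in isolation. The sketch below is a hedged illustration, not code from the labs: the class and method names are hypothetical, and each method models only the check described in the corresponding row.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical models of the flawed checks; names are illustrative.
final class FilterBypassDemo {

    // Non-recursive filter: removes "../" in one pass and never re-checks the result.
    static String stripOnce(String input) {
        return input.replace("../", "");
    }

    // Plain URL decode, with the checked exception wrapped for convenience.
    static String urlDecode(String input) {
        try {
            return URLDecoder.decode(input, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Wrong order of operations: the check runs first, the extra decode afterwards.
    static String checkThenDecode(String onceDecoded) {
        if (onceDecoded.contains("../")) {
            throw new IllegalArgumentException("traversal sequence rejected");
        }
        return urlDecode(onceDecoded);
    }

    // Prefix check on the raw string, without resolving ".." first.
    static boolean rawPrefixCheck(String input) {
        return input.startsWith("/var/www/images/");
    }

    public static void main(String[] args) {
        System.out.println(stripOnce("....//....//....//etc/passwd"));
        // The web server performs the first decode before the application sees the value.
        String onceDecoded = urlDecode("..%252f..%252f..%252fetc/passwd");
        System.out.println(checkThenDecode(onceDecoded));
        System.out.println(rawPrefixCheck("/var/www/images/../../../etc/passwd"));
    }
}
```

The first two lines printed are both ../../../etc/passwd and the third is true: in each case the check is satisfied while the value that reaches the filesystem still contains a working traversal sequence.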

Code/Function Analysis:

  • File (Java Class): An abstract representation of file and directory pathnames. For example, new File(BASE_DIRECTORY, userInput) is used to concatenate a base directory with user input.
  • getCanonicalPath() (Java Method): Returns the canonical pathname string. This method resolves all relative path symbols like ../ and ./, as well as symbolic links, ultimately returning the real absolute path of the target file. It is a core function for defending against path traversal.
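To make the behavior of getCanonicalPath() concrete, the following sketch contrasts it with getAbsolutePath() on the same input. The class name and the sample input are assumptions chosen for illustration; the system temp directory stands in for a real base directory so the example runs anywhere.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo; names and sample input are illustrative.
final class CanonicalPathDemo {

    // getAbsolutePath() only makes the path absolute; ".." segments stay in the string.
    static String absolute(File base, String userInput) {
        return new File(base, userInput).getAbsolutePath();
    }

    // getCanonicalPath() also removes "." and ".." and resolves symbolic links.
    static String canonical(File base, String userInput) {
        try {
            return new File(base, userInput).getCanonicalPath();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File base = new File(System.getProperty("java.io.tmpdir"));
        System.out.println(absolute(base, "images/../218.png"));  // still contains ".."
        System.out.println(canonical(base, "images/../218.png")); // ".." has been resolved
    }
}
```

Only the canonical form is suitable for a containment check, because any comparison made on a string that still contains ".." says nothing about where the path finally lands.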

4. ⚠️ Risk & Impact

If this vulnerability is successfully exploited, it can lead to extremely severe consequences for the system:

  • Sensitive Information Disclosure: Attackers can read application source code, database credentials, and sensitive backend system configuration files (e.g., Linux's /etc/passwd or Windows's win.ini).
  • Business Data Tampering: If the traversal flaw also reaches a file-write operation, attackers can modify application data or system configuration files.
  • Complete System Compromise: With write access, attackers can achieve RCE and take full control of the server by writing SSH keys, overwriting scheduled tasks (cron jobs), or uploading a web shell.

5. 🛡️ Defense & Mitigation

The most effective defense strategy is to completely avoid passing user-supplied input directly to underlying filesystem APIs. If business logic makes this unavoidable, a dual-layer defense mechanism must be implemented:

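  1. Strict Input Validation: Validate the parameter before it is used, ideally by comparing it with a whitelist of permitted values, or otherwise by confirming that it contains only permitted characters (for example, alphanumerics only).
  2. Path Canonicalization & Base Directory Verification: After validation, join the input to the base directory, canonicalize the result, and confirm that it still lies inside the base directory. Compare whole path components rather than raw strings, because a plain string prefix check on /var/www/images would also accept /var/www/images-old.
// requires: import java.io.File;
// 1. Canonicalize the base directory once
File baseDir = new File(BASE_DIRECTORY).getCanonicalFile();
// 2. Join user input to the base directory and canonicalize the result
File file = new File(baseDir, userInput).getCanonicalFile();
// 3. Path.startsWith compares whole path components, not characters
if (file.toPath().startsWith(baseDir.toPath())) {
    // process file (safe, file can be processed)
} else {
    // Deny the request and log a security event
}
  3. Principle of Least Privilege: As a further safeguard, ensure the service account running the web application (e.g., www-data) has read access only to the directories it needs (e.g., /var/www/images/), and avoid granting it access to system-level directories (e.g., /etc/) wherever possible.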
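The two validation layers can be combined into one small helper. What follows is a minimal sketch under stated assumptions, not a definitive implementation: the class name, the method names, and the allowlist pattern (a bare image file name) are all hypothetical and would need to match the application's real naming scheme.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper; names and the allowlist pattern are assumptions.
final class SafeFileResolver {

    // Layer 1 (assumed policy): a bare file name such as "218.png",
    // with no separators and no dots other than the one before the extension.
    private static final Pattern ALLOWED_NAME =
            Pattern.compile("[A-Za-z0-9_-]{1,64}\\.(png|jpg|gif)");

    static boolean isAllowedName(String userInput) {
        return userInput != null && ALLOWED_NAME.matcher(userInput).matches();
    }

    // Layer 2: canonicalize both paths and compare whole path components.
    static boolean staysInside(File baseDirectory, String userInput) {
        try {
            Path base = baseDirectory.getCanonicalFile().toPath();
            Path target = new File(baseDirectory, userInput).getCanonicalFile().toPath();
            return target.startsWith(base);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the file only if both layers accept the input.
    static Optional<File> resolve(File baseDirectory, String userInput) {
        if (!isAllowedName(userInput) || !staysInside(baseDirectory, userInput)) {
            return Optional.empty();
        }
        return Optional.of(new File(baseDirectory, userInput));
    }

    public static void main(String[] args) {
        File base = new File(System.getProperty("java.io.tmpdir"), "images");
        System.out.println(resolve(base, "218.png").isPresent());             // true
        System.out.println(resolve(base, "../../../etc/passwd").isPresent()); // false
        System.out.println(staysInside(base, "../images-old/218.png"));       // false
    }
}
```

The last call shows why the containment check uses Path.startsWith rather than String.startsWith: the sibling directory images-old shares a string prefix with images, but it is a different path component and is therefore rejected.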