Web Security Vulnerabilities - XML external entity (XXE)
Table of Contents
- What is XML?
- Why We Use XML?
- What are XML Entities?
- What is Document Type Definition (DTD)?
- What are XML External Entities?
- What is XXE Injection?
- What are Types of XXE Injection Attacks?
- What is the Impact of XXE Injection?
- How to Prevent XXE Injection?
- Time to Practice
- Resources
- Conclusion
What is XML?
Extensible Markup Language, also known as XML, is a markup language similar to HTML, designed to store and transport data. It facilitates data exchange between systems such as websites, databases, and third-party applications. Unlike HTML, which has predefined tags, XML allows you to define your own tags tailored to specific needs.
Why we use XML?
- XML supports robust schema validation through XML Schema Definition (XSD) or Document Type Definition (DTD).
- XML is better suited for representing deeply nested or complex data structures compared to JSON.
- Sharing data across different systems is simplified as XML doesn’t require conversion during transfer.
JSON excels in lightweight, web-based, or mobile applications where simplicity, speed, and ease of use are critical. However, XML’s strengths in schema validation, namespaces, and document handling make it indispensable for specific use cases.
What is XML Entities?
XML Entities represent data within an XML document by using a placeholder instead of the actual data.
Example:
In the example below, we define an entity called name
with the value John
.
<!DOCTYPE note [
<!ENTITY name "John">
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>&name;</heading>
<body>Don't forget me this weekend!</body>
</note>
The entity reference &name;
will be replaced by its value (John
) by the XML parser.
Entities
<
and>
represent the characters<
and>
. These are metacharacters used to denote XML tags and must generally be represented using their entities when they appear within data.
What is Document Type Definition (DTD)?
Document Type Definition (DTD) defines the structure and legal elements of an XML document.
Example:
<!DOCTYPE note
[
<!ELEMENT note (job)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
In this example, DTD is declared using the DOCTYPE
keyword.
<!ELEMENT name (job)>
defines that name element must contain the elements: job
.
<!ELEMENT heading (#PCDATA)>
defines that heading element must be of type #PCDATA
(parsed character data).
The DTD can be fully self-contained within the document itself (known as an “internal DTD”) or can be loaded from elsewhere (known as an “external DTD”) or can be hybrid of the two.
What are XML external entities?
XML external entities (XXE) are entities defined outside of the XML document and referenced within it. They are a feature that allows the inclusion of content from external sources into an XML document. External entities are declared using the SYSTEM
keyword in the <!ENTITY>
declaration and typically point to a file or URL.
What is XXE Injection?
XML External Entity (XXE) Injection is a security vulnerability that occurs when an attacker manipulates the XML parser by including malicious external entities. This allows an attacker to view files on the application server or interact with external or internal systems.
Example:
Assume a shopping application uses XML to store and transport data on the product details page.
The XML code:
<?xml version="1.0" encoding="UTF-8"?>
<shop>
<productId>5</productId>
<name>T-Shirt</name>
</shop>
An attacker could exploit this XML code to retrieve files from the server by declaring an external entity and referencing it to retrieve the value of an entity.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE shop [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<shop>
<productId>T-Shirt</productId>
<name>&xxe;</name>
</shop>
In the above example, the attacker tries to retrieve the /etc/passwd
file from the server.
What are types of XXE Injection attacks?
-
Exploit XXE to retrieve files
When an attacker defines an external entity containing a path to the file, and the application returns the content of the file in the response. It requires you modifying the submitted XML.
-
Exploit XXE to perform SSRF
Attackers can use XXE to trigger SSRF and force the application to make request to malicious URLs.
-
Exploit blind XXE to retrieve data via error messages
Blind XXE means that an application doesn’t return data in the response, so an attacker can exploit Blind XXE via triggering parsing errors to generate an error message containing sensitive data.
-
Exploiting blind XXE exfiltrate data out-of-band
where sensitive data is transmitted from the application server to a system that the attacker controls. It involves the attacker hosting a malicious DTD on a system that they control, and then invoking the external DTD from within the in-band XXE payload.
What is the impact of XXE Injection?
XXE vulnerabilities can be quite dangerous as they can lead to the disclosure of sensitive information and other serious security issues. Exploiting of XXE Injection vulnerability may lead to:
- Local File Inclusion (LFI): An attacker can exploit XXE to read sensitive files from the server, such as configuration files, credentials, or other confidential data.
- Server Side Request Forgery (SSRF): XXE can be used to trigger the server to make requests to other systems, potentially disclosing sensitive information or performing unauthorized actions.
- Remote Code Execution (RCE): Allowing an attacker to execute arbitrary code on the server.
- Data Exfiltration: Attackers can use XXE to exfiltrate data from the server by including external entity declarations that send data to an external server controlled by the attacker.
What are types of XXE Injection?
- In an in-band XXE attack, the attacker sends the attack and receives a response through the same channel, for example, via a direct HTTP request and response.
- In an out-of-band XXE attack, the vulnerable system sends the results of an attack to a different resource controlled by the attacker. For example, the attack may be performed using a direct request but cause the hacked web server to send a sensitive file to the attacker’s own web server.
- In a blind XXE attack, the attacker does not receive any direct response or result following an attack. Instead, they observe the behavior of the vulnerable web application (for example, the error messages it generates) to determine whether the attack was successful and use this indirect feedback to exfiltrate information step-by-step.
How to prevent XXE Injection?
Most XXE Injection vulnerabilities arise because the XML parsing libraries supports features that the application doesn’t need or it’s not required, so the most effective way to prevent XXE Injection is to disable these features, including Document Type Definitions (DTDs).
Disable DTD:
Java:
javaCopy codeDocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
Python:
pythonCopy codefrom lxml import etree
parser = etree.XMLParser(resolve_entities=False)
etree.fromstring(xml_string, parser)
.NET:
csharpCopy codeXmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Prohibit;
Time to practice
Let’s practice on some labs to have better understanding on XXE
vulnerability. So, today challenges will be from Portswigger.
Challenge #1
The goal of the first challenge is to read /etc/passwd
file, so let’s start with web page.
We can see that it’s a shop. If we look around we can find check stock
feature, so let’s click it and intercept the request to burp.
We can see the request just checks for the stock and the data was sent as an XML
which indicates to XXE Injection
.
So, let’s try to test for XXE
and read internal files like /etc/passwd
using:
<!DOCTYPE test [ <!ENTITY xxe SYSTEM "file:///etc/passwd" >] >
The first thing is defining a structure of XML
document using DOCTYPE
keyword.
Then we need to represent data with XML document using ENTITY
keyword with name xxe
.
Since our goal is to read a file, we need to use the SYSTEM
keyword and pass the file using the file://
protocol.
Finally, we call the &name;
entity reference, which will be replaced with its value by the parser.
And we have solved the lab.
Challenge #2
In the second challenge, we should read /etc/hosts
file to solve the lab. So let’s start.
Let’s look around the web page and we can see upload feature in the comment section of a post.
As we know it’s an XXE
lab, so we need to upload XML
file.
As svg
format uses XML, an attacker can upload a malicious svg
image and exploit XXE
vulnerability.
So, Let’s try to upload normal svg
image and see whether the application accepts it.
We can see above, svg
files are allowed. So, let’s create local svg
file with our payload.
The payload will like the previous challenge, but here we will try to read /etc/hostname
file.
Now let’s back to comments section again and open image in new tab.
We have the lab solution, so let’s submit it and solve the lab.
Challenge #3
We can see in the challenge description that the lab server is running EC2 metadata endpoint which is http://169.254.169.254/
. So, to solve this lab is to obtain the server’s IAM secret access key from EC2 metadata endpoint.
If we check the web page, it looks like the first challenge, so let’s click check stock
and intercept the request in Burp Repeater
We can see above the data was sent is XML
, so we will use the same payload used in the first challenge to test if it’s vulnerable to XXE
or not.
We can see the application is vulnerable to XXE
and we can read files on the system.
So, what we need to access is EC2 metadata endpoint and retrieve it’s data. Let’s try to access it.
As we can access the EC2 metadata endpoint let’s keep going and follow it’s directories to access the secret key.
/latest/
/meta-data/
/iam/
/secert-credentials/
/admin
Now we have the SecretAccessKey
, let’s submit it and solve the lab.
Resources
- Portswigger - XXE
- OWASP - XXE
- Hackerone - XXE
- Imperva - XXE
- Bug Bounty Bootcamp Book
Conclusion
In this blog, we covered what is XXE, types, impact and prevention. We also discussed about basic knowledge about XML and solve labs to get better understanding.
Hope you enjoy! Thanks for reading.