Introduction
XML is a useful format for sending data from service to service and for processing data internally, but as with anything, things get dangerous as soon as user input is involved. Processing these files carries an inherent risk because many XML processors have external entities enabled by default. Not everyone knows about these settings, which makes them easy to leave switched on. External entities can be used to read files or, in some configurations, even execute code. Needless to say, we do not want this to happen.
An XXE attack occurs when a malicious actor submits data in an XML format they control (for example an XML upload, a SOAP request, or even a DOCX file, which consists of XML documents once it is extracted). The attacker inserts what is called an external entity into the XML and references that entity in one of the nodes. This can cause the system to resolve the external entity and, for example, return the contents of a file or execute code. An example is shown below:
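A minimal sketch of such a document, wrapped in Python so it can be inspected safely. The element names, the path /etc/samba/smb.conf and the helper external_entities are assumptions made for illustration. The helper only lists the external entities a document declares and never resolves them.

```python
from xml.parsers import expat

# Hypothetical attack document: element names and file path are assumptions.
PAYLOAD = """<?xml version="1.0"?>
<!DOCTYPE order [
  <!ENTITY smbConf SYSTEM "file:///etc/samba/smb.conf">
]>
<order>
  <id>1001</id>
  <comment>&smbConf;</comment>
</order>"""


def external_entities(xml_text):
    """Return {name: system identifier} for every external entity declared."""
    found = {}

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        # External entities carry a system identifier instead of an inline value.
        if system_id is not None:
            found[name] = system_id

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    # No external entity handler is installed, so nothing is fetched.
    parser.Parse(xml_text, True)
    return found
```

Calling external_entities(PAYLOAD) reports the smbConf entity together with the file it points at, which is exactly the part of the document a vulnerable parser would act on.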
Note that I specifically placed the &smbConf; external entity in the second node of my document, as I wanted to make clear that XXE attacks can occur in any node of the document.
As you may have noticed from the example above, which retrieves a file, an XXE attack of this kind has two parts. First, the external entity has to be declared in the DOCTYPE of the document.
Second, this entity has to be referenced in one of the nodes of the document.
The attacking XML contains an external entity called smbConf, which attempts to retrieve an SMB configuration file. As stated before, the attacker then tests every possible node with this external entity to see whether the file can be retrieved and displayed.
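Testing every node by hand gets tedious quickly. As a rough sketch, assuming the same hypothetical smbConf declaration as before, a tester could generate one document per leaf element from a known-good request. The names payload_variants, DOCTYPE and MARKER are illustrative only.

```python
import xml.etree.ElementTree as ET

# Assumed entity declaration; the file path is illustrative only.
DOCTYPE = '<!DOCTYPE {root} [<!ENTITY smbConf SYSTEM "file:///etc/samba/smb.conf">]>'
MARKER = "XXE_MARKER"


def payload_variants(benign_xml):
    """Build one test document per leaf element, each referencing &smbConf; there."""
    leaf_count = sum(1 for el in ET.fromstring(benign_xml).iter() if len(el) == 0)
    variants = []
    for index in range(leaf_count):
        root = ET.fromstring(benign_xml)  # fresh copy for every variant
        leaves = [el for el in root.iter() if len(el) == 0]
        leaves[index].text = MARKER
        body = ET.tostring(root, encoding="unicode")
        # Swap the marker in after serialising so the '&' is not escaped.
        variants.append(DOCTYPE.format(root=root.tag) + body.replace(MARKER, "&smbConf;"))
    return variants
```

Each variant keeps the rest of the request intact, so a response that suddenly contains file contents points directly at the node that is processed unsafely.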
Another potentially harmful way XXE can impact an organization is through an SSRF attack. When this happens, the server is made to issue an HTTP request on behalf of the attacker, with all kinds of serious side effects.
If we want to execute an SSRF attack through XXE, we need to define the URL we want the server to send a request to, like so:
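A sketch of what such a payload could look like, again wrapped in Python. The address 192.168.0.10, the /admin path, the element names and the helper declared_urls are assumptions for illustration. The helper simply lists the HTTP(S) URLs declared as external entities without requesting them.

```python
from urllib.parse import urlsplit
from xml.parsers import expat

# Hypothetical SSRF document: host, path and element names are assumptions.
SSRF_PAYLOAD = """<?xml version="1.0"?>
<!DOCTYPE order [
  <!ENTITY adminPanel SYSTEM "http://192.168.0.10/admin">
]>
<order>
  <id>&adminPanel;</id>
</order>"""


def declared_urls(xml_text):
    """List the http(s) URLs that the document declares as external entities."""
    urls = []

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        if system_id and urlsplit(system_id).scheme in ("http", "https"):
            urls.append(system_id)

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)  # declarations are recorded, never fetched
    return urls
```

The only difference from the file-reading payload is the scheme of the system identifier: http:// instead of file://.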
As we can see in the example above, the XML processor will send a request to a server on the internal network that hosts an admin panel which can only be reached from that network. The attacker can now browse this admin panel by means of SSRF. Even if no data is returned, a blind SSRF might still be possible.
Just like blind SSRF vulnerabilities, blind XXE vulnerabilities also exist. The external entity can still be processed without any data being returned in the response. These vulnerabilities are harder to find and abuse, but with some more creative techniques attackers can still find and exploit them.
It may seem that only XML files directly controlled by the user are vulnerable, but nothing could be further from the truth! Attackers can also execute what are known as XInclude attacks. Here the attacker inserts an XXE attack vector into a non-XML parameter, and the server later merges that input into an XML document. In this situation you can only insert your attack string into the XML document and do not control it in its entirety, so you have to be creative, as you cannot define or change the DOCTYPE. Luckily for the attacker, XInclude comes to the rescue: it is a separate W3C specification that lets an XML document pull in other documents, and it does not need a DOCTYPE. An example of an XInclude attack would be:
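A minimal sketch, assuming a hypothetical server that merges a productId form field into a stockCheck document. The element names, the target file and the helpers build_request and xinclude_targets are illustrative. Python's ElementTree does not process XInclude on its own, so the helper only reports what an XInclude-aware processor would be asked to load.

```python
import xml.etree.ElementTree as ET

XINCLUDE_NS = "http://www.w3.org/2001/XInclude"

# Hypothetical attacker-supplied value for a plain, non-XML form field.
USER_INPUT = (
    '<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" '
    'parse="text" href="file:///etc/passwd"/>'
)


def build_request(product_id):
    """Mimic a server that merges a parameter into an XML document unescaped."""
    return "<stockCheck><productId>" + product_id + "</productId></stockCheck>"


def xinclude_targets(xml_text):
    """Return the href of every XInclude element found in the document."""
    root = ET.fromstring(xml_text)
    return [el.get("href") for el in root.iter("{%s}include" % XINCLUDE_NS)]
```

With the input above, xinclude_targets(build_request(USER_INPUT)) finds an include element pointing at a local file, even though the attacker never touched the DOCTYPE.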
This is one of the nastiest of the OWASP Top 10 vulnerabilities, as it is often missed, slipping through the net while still having a devastating impact.
We should adopt source code analysis tools that scan our code and report issues to us. We should also write down every entry point for XML, such as XML file imports, DOCX file uploads, SVG image uploads and SOAP endpoints. All of these entry points should be tested, and not only for the regular XXE issues we know: blind XXE issues are harder to test for and require a different strategy. We need to investigate anything that might contain the vulnerability (SAML, DTD, SOAP, …) and test these endpoints thoroughly, making sure to cover every node of the XML.
If only part of the XML document can be controlled, XInclude attacks should be tested for as well.
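Entry points such as DOCX uploads are easy to forget because the XML is hidden inside a ZIP archive. As a rough sketch, an upload handler could flag archives whose XML parts declare a DOCTYPE or an entity before they reach the real parser. The function name xml_parts_with_dtd is hypothetical, and the byte search assumes ASCII-compatible encodings, so a UTF-16 part would slip past it; it also reads every part fully into memory.

```python
import io
import zipfile


def xml_parts_with_dtd(archive_bytes):
    """Return the XML parts of a ZIP-based upload that declare a DTD or entity."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            # DOCX stores its content in .xml and .rels parts.
            if not name.lower().endswith((".xml", ".rels")):
                continue
            content = archive.read(name)
            if b"<!DOCTYPE" in content or b"<!ENTITY" in content:
                flagged.append(name)
    return flagged
```

A legitimate office document normally has no reason to contain a DOCTYPE, so any flagged part deserves a closer look.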
In the first attack scenario, an attacker wants to steal the private SSH keys of their victim. They launch an XXE attack with an external entity that tries to read the victim's id_rsa file, which holds the private key. If they can also grab the known_hosts file from the .ssh folder using the same technique, they may be able to open connections to other servers.
In our second attack scenario, the attacker wants to visit a URL on a web server that only accepts connections from hosts within the same network. That restriction is meant to keep hackers away from the sensitive, unprotected data they are after, which in this example is a list of credit card details.
The web server hosts the list itself, so the SSRF attack does not even need to reach a different server but can instead connect to the loopback IP address.
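The two scenarios differ only in the system identifier of the external entity. The sketch below classifies such identifiers, which can help when reviewing suspicious requests in logs. The function classify_target and the example identifiers (user name, paths) are assumptions, not values from a real incident.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical identifiers for the two scenarios; user name and paths are assumed.
SCENARIO_TARGETS = {
    "ssh key theft": "file:///home/victim/.ssh/id_rsa",
    "credit card list": "http://127.0.0.1/creditcards",
}


def classify_target(system_id):
    """Describe what resolving an external entity with this identifier would touch."""
    parts = urlsplit(system_id)
    if parts.scheme == "file":
        return "local file read: " + parts.path
    if parts.scheme in ("http", "https"):
        try:
            address = ipaddress.ip_address(parts.hostname)
        except ValueError:
            # Not an IP literal, so it is a host name that would need DNS resolution.
            return "request to named host: " + str(parts.hostname)
        if address.is_loopback:
            return "request to the server itself (loopback)"
        if address.is_private:
            return "request into the internal network"
        return "request to a public address"
    return "other scheme: " + parts.scheme
```

Running classify_target over SCENARIO_TARGETS shows the first scenario as a local file read and the second as a request to the server itself.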
These are all nice theoretical examples, but I find that practical, real-life examples work best, which is why we will go over CVE-2018-12463, an older CVE that shows how a single bad implementation of an XML interpreter can cause serious problems. In this case, an XXE vulnerability allowed an unauthenticated user to read files and perform SSRF attacks, which makes the vulnerability even worse.
https://www.cvedetails.com/cve/CVE-2018-12463/
Even big players with large budgets are not immune to this vulnerability, which is exactly what IBM discovered with its WebSphere application. An XXE vulnerability was found that allowed attackers to consume resources and get their hands on sensitive user data. CVE-2021-20454 is a great example of why we should be very diligent in our XXE testing and might even need to consider other data formats such as JSON.
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-20454
Prevention of XXE attacks relies heavily on inventorying and protecting all possible XML entry points and making sure external entities are not enabled where they are not needed. We need to be aware that XML is more complex than it seems at first glance and that it reaches far and wide. Where possible, we should opt for a different data format such as JSON, which removes the possibility of XXE completely.
If we do use SOAP, we need to make sure we use version 1.2 or higher, as implementations of earlier versions are more likely to be susceptible to XXE. Other XML libraries in use should also be patched promptly. To aid this process, there are dependency checkers that go over the dependencies and report any outdated versions.
It goes without saying that external entities should be disabled wherever the configuration of the application or its XML parser allows this.
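How external entities are switched off depends on the parser in use. As one possible sketch using Python's built-in expat bindings, the parser below refuses any document that carries a DOCTYPE at all, which rules out entity declarations entirely; this is similar in spirit to what the defusedxml package does. The names ForbiddenXML and parse_without_dtd are hypothetical.

```python
from xml.parsers import expat


class ForbiddenXML(ValueError):
    """Raised when a document uses a feature this parser refuses to process."""


def parse_without_dtd(xml_text):
    """Parse xml_text and return its element names, rejecting any DOCTYPE."""
    names = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Entities can only be declared inside a DOCTYPE, so stop right here.
        raise ForbiddenXML("DOCTYPE declarations are not allowed")

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_text, True)
    return names
```

Ordinary documents parse as usual, while anything with a DOCTYPE raises ForbiddenXML before a single entity is looked at. Rejecting the DOCTYPE outright is stricter than disabling external entities alone, and it is only an option when the application never needs DTDs.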
In all instances where user data ends up in an XML file (including when the application merges user input into an XML file), we should implement proper data hygiene and sanitise all incoming data. The best way to do this is with a whitelisting strategy, but we realise this is not always feasible, as only allowing certain input can cause business problems.
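A minimal sketch of both approaches, assuming a hypothetical productId field that is only ever made up of letters, digits, dashes and underscores, and a free-text comment field that cannot be restricted that tightly. The pattern and function names are illustrative.

```python
import re
from xml.sax.saxutils import escape

# Assumed whitelist: product ids are 1 to 20 letters, digits, '-' or '_'.
PRODUCT_ID = re.compile(r"[A-Za-z0-9_-]{1,20}")


def product_element(user_value):
    """Validate against the whitelist, then escape before merging into XML."""
    if not PRODUCT_ID.fullmatch(user_value):
        raise ValueError("product id contains characters that are not allowed")
    return "<productId>" + escape(user_value) + "</productId>"


def comment_element(user_value):
    """Free text cannot be whitelisted as tightly, so it is only escaped."""
    return "<comment>" + escape(user_value) + "</comment>"
```

With escaping in place, markup submitted through a form field ends up as harmless text in the merged document instead of becoming elements such as an XInclude instruction.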
XSD (XML Schema Definition) is a great way to validate incoming XML, and we should make sure every incoming file meets the requirements set out in the schema.
Code review can also help us detect these issues before they hit production. This can be done manually or with the help of source code review tools, though such tools should always be used in conjunction with manual testing and code review. Reviews should pay special attention to any endpoint accepting XML input.
A last option is to install a WAF or API security firewall to increase the security of the application, but these should never be used in isolation; instead, we should use them in conjunction with the preventive measures above.
XXE is an often overlooked issue type because of the way developers learn about XML: they often neglect its more intricate features, such as external entities or XInclude. Since these issues are easy to miss and generally have such a large impact, it is important to pay close attention to any XML input point and to test it thoroughly.