Finding Issues In Regular Expression Logic Using Differential Fuzzing
Regular expressions (commonly known as regexes) have been used for years to give developers a quick way to pattern match or parse data in applications. In web security, regexes often show up as a way to vet untrusted input before it can affect downstream functions. For example, let's say we have a web application in which a POST request carries a URL parameter that tells the application where to find a resource. Typically it is a security concern if that URL points to a resource outside an acceptable allowlist of locations. A regex can be used here to lock down the acceptable resource locations while still allowing flexibility in the URL parameter itself. Any breakout of this regex into the disallowed space can result in an open redirect or, even worse, a Server-Side Request Forgery (SSRF) issue. What's worse is that regexes can be very complex structures, difficult to audit and very easy to get wrong. In this post I will outline how developers and testers can validate their assumptions about regexes using a technique called differential fuzzing.
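As a sketch of that setup (the handler name and the pattern are hypothetical, purely for illustration):

```python
import re

# A handler that only follows redirect targets matching an allowlist regex.
ALLOWED = re.compile(r"^https://([\w-]+\.)*example\.com/")

def handle_redirect(url):
    if ALLOWED.match(url):
        return ("redirect", url)  # trusted: hand off to downstream logic
    return ("reject", None)
```

The rest of this post is about what happens when a pattern like ALLOWED contains a subtle mistake.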
Binary Fuzzing? (In Websec?)
For decades, fuzzing software has been a viable strategy for finding bugs. The idea is that we take the software under test, wrap it in a harness and fire endless streams of random input at an untrusted surface hoping for a crash. Recently, tools like AFL, libFuzzer and AFL++ have gained a lot of traction in both industry and academia for what is called coverage-guided fuzzing (fuzzing with feedback that guides future test mutations). While fuzzing for memory safety issues is an interesting security concern in itself, it is typically not a bug class thought about much in web security. Specifically, in our case we would like to find issues in the logic of well-formed regexes, not memory safety issues in the regex engine itself.
Differential Fuzzing to the Rescue!
Luckily for us, we can leverage these popular fuzzers to perform a technique called differential fuzzing. Instead of a test harness in which the fuzzer sends random data to a single piece of software, we build a harness where the fuzzer sends each test case to two (or more) independent pieces of software. The two implementations consume the same test case and should logically produce the same output. The two outputs are compared at the end of each test, and if there is any discrepancy, the fuzzer flags it as a fatal issue for analysis. While the fuzzer could in theory still find a memory issue in either implementation, the more likely finding is a difference in output and parsing logic.
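Here is the idea in miniature, with a deliberately sloppy hypothetical parser standing in for the implementation under test:

```python
from urllib.parse import urlparse

def reference_host(url):
    # Trusted implementation: Python's urllib.
    return urlparse(url).netloc

def custom_host(url):
    # Hypothetical hand-rolled parser under test: take everything
    # between "://" and the first "/".
    return url.split("://", 1)[1].split("/", 1)[0]

# Most inputs agree...
assert reference_host("https://example.com/path") == custom_host("https://example.com/path")
# ...but a differential fuzzer hunts for inputs where they don't:
print(reference_host("https://example.com?q"))  # example.com
print(custom_host("https://example.com?q"))     # example.com?q -- a differential
```

A fuzzer wired to compare these two functions on every test case would flag the second input automatically.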
A Regex With A Typo (but it mostly works…)
For the purpose of this exercise we will try to validate whether a custom regex actually does the job the author intended. In our scenario we have a web application that uses a regex to check that an untrusted URL parameter points to a location deemed acceptable per an allowlist of domains.
Our Regex under test:
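The exact expression is not reproduced here, so as a stand-in, here is a hypothetical pattern of the same shape that exhibits the behavior discussed below:

```python
import re

# Hypothetical stand-in for the regex under test: intended to accept
# example1/2/3.com and any subdomain of them.
Regex = re.compile(
    r"^([\w-]+\.)*([\w-]+\.\?)?(example1|example2|example3)\.com$"
)
```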
The developer's intention here is to validate that the input URL points to a *.example1.com, *.example2.com or *.example3.com domain. For you regex-auditing experts: can you spot the bug? When prompted you can probably see the abnormality, but could you deduce just by looking at it that this small typo completely broke the regex's security function? While it's good to study regex syntax and URL format standards and stare at this expression all day, let's instead use differential fuzzing to spot the bug.
I would like to introduce a fairly new fuzzer project released by Google in December 2020 called Atheris. Atheris is a coverage-guided Python fuzzing engine. It supports fuzzing of Python code, but also native extensions written for CPython. Atheris is based on libFuzzer, and you can find the repo here: https://github.com/google/atheris. The wonderful thing about Atheris is that it is dirt simple to install and use without touching a single line of C code or playing with exotic C compilers. Typically, on an Ubuntu 20 machine you can apt install clang, then pip install atheris, and you are done. Here is an example of a very simple harness where the bug is an input of b"bad":
if data == b"bad":
In this case the software under test is the function TestOneInput. Atheris will send a stream of random test cases until it locates a crash.
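A complete version of that minimal harness, modeled on the example in the Atheris README (the __main__ guard is my own tweak for self-containedness), looks roughly like this:

```python
import sys

def TestOneInput(data):
    # Atheris calls this with a fresh bytes test case; an uncaught
    # exception is treated as a crash and reported.
    if data == b"bad":
        raise RuntimeError("Badness!")

if __name__ == "__main__":
    import atheris  # pip install atheris
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```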
Our Atheris-based Differential Fuzzer
So with this new tool we can construct the very simple differential fuzzing architecture shown above. Our strategy is to take a random test case from Atheris and process it with our sketchy regex. We then take the same input and process it through a trusted source, in this case Python's urllib. We will assume for this experiment that urllib is 100% bug free and follows the same URI parsing logic as most common web browsers and HTTP libraries. Once we have the output from both urllib and our regex, we check whether the two results disagree. More specifically, we first check whether the regex validated the input as trusted; if it did, we then validate that the domain string parsed by urllib contains one of the three allowlisted domains. If the regex flagged an input as matching/trusted and we find none of the allowlisted domains inside the authority string from urllib, we can conclude that we have bypassed the regex allowlist logic.
Execution and Implementation
Let's first create a directory called RegexFuzz. In this directory we will place the fuzzer implementation, fuzz.py, and a directory called corpus.
The corpus directory contains initial data for the fuzzer to start with; this can be very helpful for gaining code coverage early and keeping the fuzzer in the "ballpark" of where potential issues exist. As the fuzzer finds more interesting inputs, it will also store them in the corpus directory for later use. Let's create two initial corpus files with somewhat legit-looking subdomains.
At this point we can just run the fuzzer as follows:
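The setup might look like this (seed file names and contents are illustrative; the final invocation is shown as a comment since it runs until a crash is found):

```shell
mkdir -p RegexFuzz/corpus
# Two seed inputs with legit-looking subdomains:
printf 'sub1.example1.com' > RegexFuzz/corpus/seed1
printf 'sub2.example2.com' > RegexFuzz/corpus/seed2
# With RegexFuzz/fuzz.py in place, start fuzzing from inside RegexFuzz/:
#   python3 fuzz.py corpus/
```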
Before I show you the crash, let's go through the important pieces of fuzz.py. I have uploaded fuzz.py for all to check out and run for themselves:
At the beginning we pull in the libraries we need and compile our regex once in the global scope so that we don't keep recompiling it inside the fuzz loop and lose performance. We also keep a global list (called Allowlist) of what we believe are the acceptable domains per the regex.
We define the harness as TestOneInput, which accepts a Python bytes argument from Atheris called data. We then place a lower-bound limit to make sure any test case we accept contains at least 5 bytes. In reality, to find this bug we need at least 3 bytes (the first byte as a domain selector and 2 bytes to trigger the bug). Line 21 shows how we reformat the test case: we use the first byte of entropy to select a domain from the Allowlist, then prepend the remaining test bytes to the selected domain. This ensures the fuzzer's entropy is always used in selecting the allowed domain. We could instead have placed random bytes directly into the regex and seeded our corpus with example1.com, example2.com and example3.com. However, we know roughly from the regex that we need to keep the parent domain static and unfuzzed. Having the parent domain in the fuzzing path would still locate the bug, but would take much longer.
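Concretely, the reformatting step might look like this (the variable names are illustrative):

```python
Allowlist = ["example1.com", "example2.com", "example3.com"]

data = b"\x01evil."  # a sample test case from Atheris
# First byte picks the allowlisted domain; the rest is prepended to it.
domain = Allowlist[data[0] % len(Allowlist)]
candidate = data[1:].decode("utf-8") + domain
print(candidate)  # evil.example2.com
```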
On line 24 we pass our test data into the regex. If the regex match returns None, there is no need to continue the fuzz loop, and we return right away at line 29. If the match returns an object, we know the regex validated the fuzz input as trusted and we can pass that data on to urllib. urllib has a couple of requirements: it will throw an exception on malformed UTF-8, which the fuzzer is bound to create, so to keep that from registering as a crash we enclose the call in a try block and return on exception. Secondly, our regex doesn't include a scheme in its matching pattern but urllib requires one, so we give it a hardcoded "https://".
At this point we have results back from both our regex and urllib, and we know the regex validated the input as trusted. To determine whether there is a logic error and a bypass, we iterate through every domain in our Allowlist and check whether it is contained in what urllib believes to be the parsed authority string of the test case. If the loop finds a domain in the authority string, it returns, since the authority is indeed trusted. However, if the loop finds no sign of an Allowlist domain in urllib's interpretation of the authority string, we can conclude that we found a payload that passes the regex as trusted but is not in the Allowlist, and could potentially be used as a bypass.
The last two lines are global calls that invoke Atheris against our harness; sys.argv contains arguments passed through to the libFuzzer backend of Atheris.
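Putting the pieces described above together, a reconstruction of the harness along these lines (the regex and the exact details are illustrative, not the original fuzz.py) might look like:

```python
import re
import sys
from urllib.parse import urlparse

# Hypothetical stand-in for the regex under test.
Regex = re.compile(
    r"^([\w-]+\.)*([\w-]+\.\?)?(example1|example2|example3)\.com$"
)
Allowlist = ["example1.com", "example2.com", "example3.com"]

def TestOneInput(data):
    if len(data) < 5:
        return
    # First byte selects an allowlisted domain; remaining bytes are prepended.
    domain = Allowlist[data[0] % len(Allowlist)]
    try:
        candidate = data[1:].decode("utf-8") + domain
    except UnicodeDecodeError:
        return  # malformed UTF-8 is not a finding
    if Regex.match(candidate) is None:
        return  # regex rejected the input: nothing to compare
    # The regex says "trusted" -- now ask urllib which host it actually sees.
    try:
        authority = urlparse("https://" + candidate).netloc
    except ValueError:
        return
    for allowed in Allowlist:
        if allowed in authority:
            return  # urllib agrees the authority is allowlisted
    raise RuntimeError(
        "Regex bypass: %r parsed to authority %r" % (candidate, authority))

if __name__ == "__main__":
    import atheris  # pip install atheris
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```

Any test case where the regex and urllib disagree raises an exception, which Atheris reports as a crash.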
We run the fuzzer, and within seconds it finds a fuzzing differential exception! Let's zero in on the payload:
It turns out that 888888.?example2.com passes the regex successfully, while urllib parses example2.com out of the authority, leaving only the 888888. substring. Under proper URI parsing, a ? before the domain really does push it out of the authority and into the query string. We have found a bypass of the regex, but where did it go wrong? Running the fuzzer several more times, the domain is always example2.com and there is always a .? preceding it. Let's now revisit our regex:
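You can see urllib's view of the payload directly:

```python
from urllib.parse import urlparse

u = urlparse("https://888888.?example2.com")
print(u.netloc)  # 888888.
print(u.query)   # example2.com
```

Everything after the ? is treated as the query string, so the allowlisted domain never reaches the authority.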
And there it is! The question mark (quantifier operator) used to specify 0 or 1 of the preceding group was accidentally escaped during development, and most likely the developer did not have a test case to validate this specific subexpression. We can now use a payload like evil.com.?example2.com to bypass this filter.
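With a hypothetical pattern of this shape (illustrative, not the original expression), the fix is a single character:

```python
import re

# "\?" demands a literal "?" character; "?" makes the preceding group optional.
buggy = re.compile(r"^([\w-]+\.)*([\w-]+\.\?)?(example1|example2|example3)\.com$")
fixed = re.compile(r"^([\w-]+\.)*([\w-]+\.)?(example1|example2|example3)\.com$")

print(bool(buggy.match("evil.com.?example2.com")))  # True  -- the bypass
print(bool(fixed.match("evil.com.?example2.com")))  # False
print(bool(fixed.match("sub.example2.com")))        # True  -- legit input still OK
```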
By the way, how difficult was it to code this differential fuzzer? 29 lines of operational code! That is a testament to how much heavy lifting Atheris provides when fuzzing in Python.
This was just a toy example of an easy-to-miss logic bug that differential fuzzing found in short order. The technique is not exclusive to parsing URLs: if you have a regex performing some type of data parsing, and an independent library that can perform the same parsing as a trusted entity, you can use this technique to find logic errors in any custom-rolled regex.