File-based fuzzing and PyDbg

Thanks to Peter of DVLabs for a very interesting post, “MindshaRE: Hooking ReadFile and MapViewOfFile for Vulnerability Analysis”. The main point raised in that post is to gather more information before we start fuzzing a file, so that we have a better chance of getting interesting results (if we get lucky!!). Here “more information” means finding those offsets in the file being fuzzed that correspond to the arguments of some interesting functions. Once we find these offsets, it makes sense to fuzz the contents of the file at those offsets, which reduces the search space to a great extent. In the following, I will detail a lighter (and incomplete 😦 ) version of that method using PyDbg. It is lighter and incomplete because I am not providing fully working code: I am still working on code that does things completely, and I plan to write about that in a separate post. My main intention here is to dig into some interesting features of PyDbg.

So, the main steps are as follows (as you may recall from the above-mentioned post):

  1. Hook the call to CreateFile() to find out whether it opens the file we are interested in.
  2. Hook the call to ReadFile() (or fgets(), for example) and check whether it reads from a file handle that we are interested in.
  3. Set a memory breakpoint on the buffer where ReadFile() has written the file data.
  4. On a memory access violation, calculate the offset into the buffer, which “may” correspond to an offset in the file.

For simplicity, and to keep this short, let us assume that the file is read entirely and only once (yeah.. I know this is not a very practical assumption.. but it will give us some idea…). We’ll be using PyDbg’s hooking API to hook function calls.

from pydbg import *  # provides the pydbg class
import utils         # PaiMei utilities, provides hook_container

dbg = pydbg()  # get the instance of the pydbg class to be used in main
hooks = utils.hook_container()  # container that will be used to add/delete hooks
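
Arming the hooks could look roughly like the sketch below. It is only a sketch under a few assumptions: install_hooks is a hypothetical helper name, the two exit callbacks are passed in as parameters, and kernel32.dll is assumed to be already loaded in the debuggee when the helper is called. The argument counts (7 for CreateFileA, 5 for ReadFile) follow the Win32 prototypes. The calls func_resolve_debuggee(dll, function) and hooks.add(dbg, address, num_args, entry_hook, exit_hook) are used as they appear in PyDbg and PaiMei's utils; it is worth checking them against your copy.

```python
def install_hooks(dbg, hooks, on_create_exit, on_read_exit):
    """Hook CreateFileA and ReadFile on return (hypothetical helper).

    dbg   -- a pydbg instance attached to (or having loaded) the target
    hooks -- a utils.hook_container instance
    """
    # Resolve the addresses of the two API functions inside the debuggee.
    create_addr = dbg.func_resolve_debuggee("kernel32.dll", "CreateFileA")
    read_addr = dbg.func_resolve_debuggee("kernel32.dll", "ReadFile")

    # No entry hook (None); we only care about arguments plus return value.
    hooks.add(dbg, create_addr, 7, None, on_create_exit)
    hooks.add(dbg, read_addr, 5, None, on_read_exit)
    return create_addr, read_addr
```

Hooking on exit rather than on entry is a deliberate choice here: the file handle and the filled buffer only exist once the call has returned.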

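The callback function that may be used for the CreateFileA hook:

import re
from pydbg.defines import DBG_CONTINUE

openedFiles = {}  # file handle -> file name

def CreateFileReturn(dbg, argu, ret):
    print "Exiting CreateFile"
    # argu[0] is lpFileName; how the name was read is not shown in the original
    # listing, so get_ascii_string() on the debuggee memory is an assumption
    fileName = dbg.get_ascii_string(dbg.read_process_memory(argu[0], 260))
    if fileName is not False:
        if re.search(r"\.(mp3|pdf)$", fileName, re.I):
            print "created file: ", fileName
            print "return val: ", hex(ret)
            openedFiles[ret] = fileName  # remember the handle for the ReadFile hook
    return DBG_CONTINUE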

We are mainly checking the 1st argument of CreateFile (argu[0]) because it is the name of the file that is being opened. The call returns a handle to the file, which we add to the dictionary openedFiles. From this, we know which ReadFile calls to monitor!! Remember to remove the entry from openedFiles once that handle has been closed (how??.. well, hook CloseHandle()).
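
The bookkeeping around openedFiles, including the CloseHandle() part, can be sketched without any debugger at all. The helper names below (track_open, track_close, is_monitored) are hypothetical; in a real session they would be called from the exit hooks of CreateFile and CloseHandle. The failure value assumes a 32-bit target.

```python
import re

# File types we want to follow; the pattern is only an example.
INTERESTING = re.compile(r"\.(mp3|pdf)$", re.IGNORECASE)
INVALID_HANDLE_VALUE = 0xFFFFFFFF  # what CreateFile returns on failure (32-bit)

opened_files = {}  # file handle -> file name


def track_open(handle, file_name):
    """Remember the handle if the file is one we want to monitor."""
    if handle == INVALID_HANDLE_VALUE:
        return False
    if not INTERESTING.search(file_name):
        return False
    opened_files[handle] = file_name
    return True


def track_close(handle):
    """Forget a handle once CloseHandle() has been called on it."""
    return opened_files.pop(handle, None)


def is_monitored(handle):
    """True if a ReadFile call on this handle deserves a memory breakpoint."""
    return handle in opened_files
```

Removing closed handles matters because Windows is free to hand out the same handle value again for an unrelated object, and a stale entry would then make us monitor the wrong reads.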

Now we are ready to hook the ReadFile call. The 2nd argument to ReadFile is the buffer into which the data is copied. Therefore, on return, we set a memory breakpoint at the address of that buffer.

def ReadFileReturn(dbg, argu, ret):
    # argu[0] = hFile, argu[1] = lpBuffer, argu[2] = nNumberOfBytesToRead
    if argu[0] in openedFiles:
        print "setting mem BP from 0x%08x to 0x%08x" % (argu[1], argu[1] + argu[2])
        dbg.bp_set_mem(argu[1], argu[2], description="", handler=buffer_access_handler)
    return DBG_CONTINUE  # defined in pydbg.defines

In the above code, argu[1] is the address of the buffer and argu[2] is its length. Note that memory breakpoints are set at page granularity: the whole page (or pages) in which the buffer is located is affected, so an access anywhere on those pages raises the exception, and only accesses to addresses belonging to the buffer are of interest to us. The following diagram may help in understanding the layout:



    <page#1 start>
        ...
    <buffer-start>
        ...
    <address accessed>
        ...
    <page#1 end>

From the above diagram, it takes only trivial arithmetic to calculate the offset of the content that is accessed:

 OFFSET = <address accessed> - <buffer-start>
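
As a small worked example of this arithmetic, the sketch below computes which pages end up covered for a given buffer and turns an accessed address into an offset. The function names and the sample addresses are made up for illustration, and the 4 KiB page size is an assumption that holds for ordinary x86 Windows targets.

```python
PAGE_SIZE = 0x1000  # assumed page size (4 KiB)


def guarded_range(buf_start, buf_len):
    """Return (start, end) of the whole pages that cover the buffer."""
    first_page = buf_start & ~(PAGE_SIZE - 1)
    last_page = (buf_start + buf_len - 1) & ~(PAGE_SIZE - 1)
    return first_page, last_page + PAGE_SIZE


def buffer_offset(accessed, buf_start, buf_len):
    """OFFSET = accessed - buf_start, or None if the hit is outside the buffer."""
    if buf_start <= accessed < buf_start + buf_len:
        return accessed - buf_start
    return None


buf, size = 0x00A31F80, 0x200                # made-up buffer spanning two pages
start, end = guarded_range(buf, size)        # (0x00A31000, 0x00A33000)
inside = buffer_offset(0x00A31FC4, buf, size)   # 0x44
outside = buffer_offset(0x00A31010, buf, size)  # None: same page, before the buffer
```

The second lookup shows why the range check is worth having: the address sits on a covered page but not inside the buffer, so it says nothing about the file contents.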

How do we know the <address accessed>?? Well.. that is also very simple, provided we peep into the code of the pydbg class. It has several attributes that are not covered by the API documentation. Two that are very pertinent to our problem are:

self.memory_breakpoint_hit and self.violation_address. The first is the address of the memory breakpoint that got hit, i.e. the address of the buffer, and the second is the exact address that caused the memory access violation, i.e. <address accessed>. With the above formula, we can now calculate the offset into the buffer. As mentioned earlier, we assume that the whole file is copied into the buffer; if this holds, the offset corresponds to the position of the file content that was read or written. Now we can just fuzz the content around this offset to see if we get lucky!! The access violation handler used in the ReadFile hook function (i.e. buffer_access_handler) may have the following code:

from pydbg.defines import DBG_CONTINUE

def buffer_access_handler(dbg):
    if not dbg.bp_is_ours_mem(dbg.violation_address):
        print "not belonging to mem BP"
        return DBG_CONTINUE
    print "buffer accessed at bp 0x%08x\n" % dbg.memory_breakpoint_hit
    # check if it is a read or write access violation
    if dbg.write_violation:
        print "write violation from %08x on %08x of mem bp" % (dbg.exception_address, dbg.violation_address)
    else:
        print "read violation from %08x on %08x of mem bp" % (dbg.exception_address, dbg.violation_address)
    # inst was undefined in the original listing; disassembling the current
    # instruction with dbg.disasm() is an assumption about what was intended
    inst = dbg.disasm(dbg.context.Eip)
    print "## 0x%08x\t%s offset: 0x%08x ##" % (dbg.context.Eip, inst, dbg.violation_address - dbg.memory_breakpoint_hit)
    return DBG_CONTINUE
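
Once the handler has reported an offset, the "fuzz around this offset" step could be sketched as follows. This is not part of PyDbg; mutate_around and its parameters (radius, count, seed) are hypothetical names, and flipping a few random bytes is just one simple mutation strategy among many.

```python
import random


def mutate_around(data, offset, radius=4, count=1, seed=None):
    """Return a copy of data with up to `count` bytes changed near `offset`."""
    rng = random.Random(seed)
    out = bytearray(data)
    lo = max(0, offset - radius)
    hi = min(len(out), offset + radius + 1)
    if lo >= hi:
        return bytes(out)  # offset lies outside the data, nothing to mutate
    # Pick distinct positions so that every chosen byte is changed exactly once.
    positions = rng.sample(range(lo, hi), min(count, hi - lo))
    for pos in positions:
        out[pos] ^= rng.randrange(1, 256)  # XOR with a non-zero value changes the byte
    return bytes(out)
```

A typical use would be to read the original sample file, call mutate_around(data, offset) for each offset the handler printed, write the result to a new file and feed that file to the target application.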

So, PyDbg has many interesting (hidden) features that make things easier!!! I shall be publishing a more detailed post with complete code that does more than what is explained here.. till then.. happy PyDbging 🙂