Erika Noerenberg: Binary Emulation for Threat Analysis and Hunting with Binee
Erika Noerenberg speaking at the Jailbreak Brewing Company Security Summit on Friday, October 11, 2019.
In August of 2019, Carbon Black researchers Kyle Gwinnup and John Holowczak introduced and open sourced a novel tool called Binee (Binary Emulation Environment) at DEF CON 27. Binee is a complete x86 binary emulation environment focusing on introspection of all IO operations. Because Binee can run on Windows, OS X, and Linux, it can be integrated into existing analysis and processing frameworks regardless of platform.
Methods for extracting data from binaries at scale typically rely on static analysis. Binee additionally provides a method for capturing runtime information typically obtained from dynamic analysis, but at the cost and scale at which static analysis can run. Furthermore, Binee can run in the cloud at scale and output structured data to be analyzed. This can facilitate the automation of malware analysis, data extraction, and hunting across large datasets.
In this talk, I will briefly introduce Binee and demonstrate how static process emulation can assist with both malware analysis and hunting for Windows threats. I will also discuss how this capability can facilitate automation of analysis tasks, as well as preview future work currently in planning.
(Source: Jailbreak Brewing Company)
Erika Noerenberg: [00:00:39:00] A big thanks to the organizers, and to Jailbreak for hosting us here. My name is Erika Noerenberg; I'm gutterchurl on Twitter. I wanted to talk to you about binary emulation for threat analysis and threat hunting with our newly open-sourced tool called Binee. I work for a company called Carbon Black. You might have heard that we just got acquired by VMware, so admittedly the last couple of weeks have been a little hectic; we just started officially on Tuesday. So hopefully I can get through this. I've not had a lot of sleep.
Erika Noerenberg: [00:01:18:06] I also want to give big thanks to my team members who actually created this tool. It was primarily done by Kyle Gwinnup and John Holowczak. They presented it back in August at DEF CON. It was supposedly recorded, but I haven't found the video on YouTube yet. Hopefully it will be up there, because they go much more in depth into the implementation details, how it was written, and the deeper computer science side of it. Our threat analysis unit got too big, so it split into four groups. I'm only mentioning this because this was a cross-group effort. PARA was the group with Kyle, John, and some other people that came up with the idea and implemented it. They've done all the development for it.
Erika Noerenberg: [00:02:17:11] I'm on a team called TAG. TAG and NSAT do most of the reverse engineering, and our AREA 51 group has been awesome. They've built our whole infrastructure for the things PARA has built: automated detonation and all of our malware analysis capabilities. So this whole effort has been a joint thing, with all these people with different skills coming together. It's been great.
Erika Noerenberg: [00:02:46:06] So, what are we here for? I wanted to introduce Binee, the tool I mentioned, which is now fully open sourced on our GitHub. It's the Binary Emulation Environment. I want to talk a little about what it is and why it's useful. So, what's the problem? Getting information from binaries is not always an easy task. Static information is very quick to get, and you can get a lot of basic information out of it, but you're not really getting a lot of features.
Erika Noerenberg: [00:03:25:23] On the other side, you've got dynamic analysis, where you can get a lot of features, but it's very resource intensive and very time intensive. So we want to talk about how we can blend the two to get the most out of the information that we have, especially with huge datasets: if you look at VirusTotal, there are probably terabytes of malware samples being uploaded every day.
Erika Noerenberg: [00:03:56:16] So, how can we combat this problem? This actually ties into our last speaker, evm. That third bullet he didn't get to cover, the coverage between static and dynamic analysis, is what I'm going to be talking about.
Delegate: [00:04:11:10] Sweet!
Erika Noerenberg: [00:04:14:17] I don't know if Heather and Tom planned that or if it was just coincidence, but yes, we're going to go a little into that. The goal was to reduce the cost of information extraction. We've got this huge dataset of malware, and we want to get as much out of it for as little as possible. We want it to be fast, we want it to be accurate, we want to get as much data out of every binary as we can, and we want to do that at scale. I want to be able to query my entire dataset and get everything back that I want within a reasonable amount of time, not days.
Erika Noerenberg: [00:04:48:10] If you've ever done a VirusTotal Retrohunt, it can take a while, right? If you're trying to run YARA on a terabyte of samples, it's going to take forever, so we don't want to do that. We want to take the best we can from static analysis, using emulation for the static extraction, and combine that with the best of dynamic analysis, without the overhead of detonation or of actually physically running the sample.
Erika Noerenberg: [00:05:24:04] The other problem you have is anti-analysis on the dynamic side. You might be able to detonate 50 samples, but if they use anti-VM techniques, like checking for VMware, you're going to be limited. A sample might just crash out as soon as you try to run it. And on the static side, what if it's packed? What if it's completely obfuscated or encrypted? You're not going to get a lot out of it, right?
Erika Noerenberg: [00:05:55:09] What they did to try to solve this problem is emulation. There's a lot of emulation out there already; we used Unicorn and Capstone for the most part. Again, I'm not going to go into the details of that. I didn't write it, so I'm definitely not qualified to talk about it. But basically, they wanted to extend the emulators that already exist, because there are a lot of them out there, and add more mock operating system functionality.
Erika Noerenberg: [00:06:29:10] Here are just a few of the existing emulators out there. The problem was that, although there are a lot of these tools, we want to get things like values from the registry or API and system call parameters. We want the malware to be able to write files out to the system, right? We want to get all of these IOCs.
Erika Noerenberg: [00:06:54:12] I also threw PANDA in there. If you're not familiar with it, it's not exactly an emulator; it's a record-and-replay tool. Basically it records the execution, and you can replay it over and over again. It was initially developed for things like fuzzing, but it was extended to malware analysis with Malrec.
Erika Noerenberg: [00:07:21:14] So, what are we actually adding to, or extending from, the things that already exist? One is the ability to load file dependencies. Let's say your malware loads a custom DLL: we want that to be emulated in such a way that the malware gets the function calls it needs from its own DLL loading.
Erika Noerenberg: [00:07:46:11] The framework for defining API hooks was very important, because we want to be able to extend and customize this as much as possible. That also gets into the mock OS subsystem they've implemented. We don't want to mimic the entire Windows operating system, but we do want to mimic things like memory management and the registry, so that we can get those IOCs. The file system too: like I said, we want to get the files that are written out, especially if something is decrypting a payload and writing it out to disk. And we want the process structures. To make this extensible, they implemented a configuration file that lets you customize some of these things in the mock OS environment.
Erika Noerenberg: [00:08:36:21] I'm going to show you an example. This configuration file defines portions of the OS. Let's say you've got a sample that looks for a specific value in a specific registry key, or just checks whether that key exists. You can define that here in the config file, and you can literally copy values from Regedit and paste them into the file. There are other options you can define in here as well. Again, I'm not going to go into a lot of depth; they've got good documentation on the wiki.
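A minimal sketch of how pasted registry values could back a mock registry, written in Go since Binee itself is written in Go. This is not Binee's configuration loader: the one-entry-per-line KEY\ValueName: data format and every name in the code (MockRegistry, LoadMockRegistry, QueryValue, KeyExists) are assumptions made for illustration. Keys are matched case-insensitively, as the real registry does.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// MockRegistry answers registry queries from values declared in a config
// file (hypothetical type). Keys are stored lower-cased because the Windows
// registry is case-insensitive.
type MockRegistry struct {
	values map[string]string // "key\valuename" -> data
}

// LoadMockRegistry parses lines of the hypothetical form
// HKEY_...\Path\To\Key\ValueName: data, skipping blank lines and # comments.
func LoadMockRegistry(cfg string) *MockRegistry {
	r := &MockRegistry{values: make(map[string]string)}
	sc := bufio.NewScanner(strings.NewReader(cfg))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// Split on the first ": " so the data itself may contain colons.
		parts := strings.SplitN(line, ": ", 2)
		if len(parts) != 2 {
			continue
		}
		r.values[strings.ToLower(parts[0])] = parts[1]
	}
	return r
}

// QueryValue returns the data for key\name and whether it is defined,
// roughly what a RegQueryValueEx-style hook would need to answer.
func (r *MockRegistry) QueryValue(key, name string) (string, bool) {
	v, ok := r.values[strings.ToLower(key+`\`+name)]
	return v, ok
}

// KeyExists reports whether any value is defined under key, which covers
// samples that only check for the presence of a key.
func (r *MockRegistry) KeyExists(key string) bool {
	prefix := strings.ToLower(key) + `\`
	for k := range r.values {
		if strings.HasPrefix(k, prefix) {
			return true
		}
	}
	return false
}

func main() {
	cfg := `
# hypothetical entry, pasted from a Regedit view
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProductName: Windows 10 Pro
`
	reg := LoadMockRegistry(cfg)
	fmt.Println(reg.QueryValue(`HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion`, "ProductName"))
	fmt.Println(reg.KeyExists(`HKEY_CURRENT_USER\Software\NotThere`))
}
```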
Erika Noerenberg: [00:09:12:11] I want to jump right into Binee. This started as a side project about a year ago, and they got so much success with very little effort that they decided to put more into it. We've been slowly developing and testing it over the last year or so, and we finally got to a point where we thought we could release it to the community; hopefully it will be useful for other people. We have it fully integrated into our own analysis environment, so we can get these IOCs and hunt against them. It's still in development, so you're not going to run every sample and get everything out of it right now. But just like anything with malware, it's a cat-and-mouse game, right? Something changes, the malware authors discover some other way to detect that they're running in something virtual, and it goes back and forth.
Erika Noerenberg: [00:10:17:07] I wanted to start off by showing a little of the output, without a lot of context yet, just so you can see what it looks like. This is some sample output, and I've highlighted some of the API calls. You can see there's a call to CreateFile. For any hooks that are implemented like this, we get all the parameters, and for calls like CreateFile and WriteFile we get IOCs like file paths and the names of written files. Here's an example of the registry: there's an open key and a query value, and we can get that information out of it. We can set keys as well. They also ended up implementing basic threading, so if the malware is multithreaded, it will work in this system.
Erika Noerenberg: [00:11:12:23] I gave a brief overview, but like I said, I'm not going to go deep into the implementation, because I really want to focus on how we can use this as malware researchers, threat hunters, and threat researchers, and why we would want to.
Erika Noerenberg: [00:11:28:05] The first thing, of course, is that you have to parse the PE files and the DLLs. They have to be mapped into the emulator's memory, and you basically have to emulate the environment for them to run. The malware is going to expect certain things, and we need to give it that environment in order to get what we want out of it.
Erika Noerenberg: [00:11:50:23] So, what is the minimum we have to do? Here we can see a call to CreateFile. Again, we don't want to recreate the entire Windows operating system, so what is the minimum we have to give it? For this call, the return value is going to be a valid handle to the file it wants to manipulate, so the value of EAX when the call returns should be a valid handle. The stack needs to be cleaned up: you have to pop the parameters off the stack. So we need a framework and some helper functions to make this happen, so that the malware gets the environment it expects, everything is cleaned up afterwards, and execution can continue successfully.
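The minimum contract described above (result in EAX, return to the caller, stack cleanup) can be sketched with a toy register file. The CPU type and its methods are hypothetical, standing in for a real emulator such as Unicorn, and 0x80 is an arbitrary fake handle. Under stdcall the callee removes its own arguments, so finishing a hooked CreateFileA means popping the return address plus its seven parameters.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// CPU is a toy 32-bit register file plus flat little-endian memory
// (hypothetical, standing in for a real emulator's state).
type CPU struct {
	EAX, ESP, EIP uint32
	Mem           []byte
}

func (c *CPU) read32(addr uint32) uint32 {
	return binary.LittleEndian.Uint32(c.Mem[addr : addr+4])
}

func (c *CPU) push32(v uint32) {
	c.ESP -= 4
	binary.LittleEndian.PutUint32(c.Mem[c.ESP:c.ESP+4], v)
}

// Args returns the n stdcall arguments sitting just above the return address.
func (c *CPU) Args(n int) []uint32 {
	args := make([]uint32, n)
	for i := 0; i < n; i++ {
		args[i] = c.read32(c.ESP + 4 + uint32(4*i))
	}
	return args
}

// ReturnStdcall finishes a hooked stdcall API: put the result in EAX, jump to
// the return address, and pop the return address plus nArgs arguments.
func (c *CPU) ReturnStdcall(result uint32, nArgs int) {
	c.EAX = result
	c.EIP = c.read32(c.ESP)
	c.ESP += 4 + uint32(4*nArgs)
}

func main() {
	cpu := &CPU{ESP: 0x1000, Mem: make([]byte, 0x1000)}
	// The caller pushes the 7 CreateFileA arguments right to left
	// (placeholder values 7..1), then the call pushes the return address.
	for i := 7; i >= 1; i-- {
		cpu.push32(uint32(i))
	}
	cpu.push32(0x401234)
	fmt.Println(cpu.Args(7))
	cpu.ReturnStdcall(0x80, 7) // fake file handle in EAX
	fmt.Printf("EAX=%#x EIP=%#x ESP=%#x\n", cpu.EAX, cpu.EIP, cpu.ESP)
}
```

After the return, ESP is back where it was before the caller pushed anything, which is what lets execution continue cleanly.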
Erika Noerenberg: [00:12:41:09] How it does this in depth is really interesting. My co-worker John, one of the people who wrote this, is presenting it at OSDFCon next week, and he's going to go a bit more in depth on the implementation side. I don't know how many of you are going to that, but again, they did present this at DEF CON, so the video should be up eventually.
Erika Noerenberg: [00:13:04:21] The one part of the implementation I do want to go into is the hooking, because for the reverse engineers who are going to use this to look at malware, that's the part we'll touch the most. When there are new hooks, or hooks that aren't implemented, that's where we'll go in and say: I need to add this hook to get this execution, or, this is the data I want and it's not implemented right now, so I'm going to put it into the code.
Erika Noerenberg: [00:13:32:01] So, what are the requirements for hooking these functions? You need to know the calling convention, how many parameters are passed to the function, what those parameters are, and what their types are. All the basic information. What's the return value, if there is one, and what's expected in it? Here's an example hook structure: you've got the name, the parameters, and the function, if it's implemented. For most documented functions, you can find all of these details on MSDN or elsewhere online.
Erika Noerenberg: [00:14:12:18] Within Binee we've got two types of hooks. There's a full hook, of which there are actually two versions that I'll talk about a bit later, and there's a partial hook, where the function itself is actually emulated within the DLL. I'm going to go through examples of all three. Here you can see a full hook for Sleep. We have the parameters, which is the milliseconds, and then the function is defined there. For the return, we have the SkipFunctionStdCall helper. If you're implementing the actual full hook, you can return the actual value; otherwise you can basically just return success.
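A rough Go sketch of the hook shape just described: a name, parameter names, and an optional implementation, where a missing implementation marks a partial hook. The Hook and Emu types and the Ticks virtual clock are hypothetical and not Binee's actual API; the full Sleep hook simply advances a virtual clock instead of waiting, which is one common way emulators defuse sleep-based delays.

```go
package main

import "fmt"

// Emu is a hypothetical stand-in for the emulator state a hook may touch.
type Emu struct {
	Ticks uint64 // virtual clock, in milliseconds
	EAX   uint32
}

// Hook holds a name, the parameter names, and an optional implementation.
// A nil Fn marks a partial hook: the real DLL code runs under emulation and
// the hook only labels the arguments for the trace.
type Hook struct {
	Name       string
	Parameters []string
	Fn         func(e *Emu, args []uint32) uint32
}

// Full reports whether the hook replaces the API with its own implementation.
func (h Hook) Full() bool { return h.Fn != nil }

var hooks = map[string]Hook{
	// Full hook: advance the virtual clock instead of really sleeping, so
	// time-delay tricks cost nothing. Sleep returns VOID; 0 goes into EAX.
	"Sleep": {
		Name:       "Sleep",
		Parameters: []string{"dwMilliseconds"},
		Fn: func(e *Emu, args []uint32) uint32 {
			e.Ticks += uint64(args[0])
			return 0
		},
	},
	// Partial hook: parameter names only, matching memset(dest, c, count).
	"memset": {Name: "memset", Parameters: []string{"dest", "c", "count"}},
}

func main() {
	e := &Emu{}
	if h := hooks["Sleep"]; h.Full() {
		e.EAX = h.Fn(e, []uint32{60000})
	}
	fmt.Println("Sleep full:", hooks["Sleep"].Full(), "ticks:", e.Ticks)
	fmt.Println("memset full:", hooks["memset"].Full())
}
```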
Erika Noerenberg: [00:15:09:11] The parameters field defines how many parameters are retrieved and their names, which are paired with the values in the output. Here we've got a memset call. We know there's a destination and a count: how big is this going to be? In the output below, you can see the parameters that were actually passed in by the malware. So even with a partial hook like this one, we still capture the call parameters. For a lot of API calls, you can imagine there are very useful IOCs that come out of just the call parameters.
Erika Noerenberg: [00:15:53:14] Let's look at an example of a missing API hook implementation. I've got an X-Agent sample that I ran through Binee. I've obviously truncated a lot of this, but it goes through some execution, a bunch of function calls, and then we get down to this lstrcat. The asterisks on either side show that this call is completely unimplemented within Binee. So I want to say: my execution is halting here, or, because this call isn't implemented, it's failing before something I'm interested in further down in the execution. So let's add a hook for it. In this case we don't need a full hook to get what we want. We really just want a partial hook so that execution gets past it, and maybe so that we can get the parameters being passed. So we look up the MSDN documentation for lstrcat and see that it takes two strings, two LPSTRs.
Erika Noerenberg: [00:16:54:22] Down in our windows directory there's kernel32.go; this is written in Go. For any kernel32.dll API call, this is the source file you would add these hooks to. We add a very simple hook: all you have to say is that its parameters are two LPSTRs. As simple as that, right? It's not too bad. So let's see what happens after we add this hook. Now when we run our X-Agent sample, we get down to that lstrcat call, which is highlighted in orange, and we can see there's a file path there now. The function is concatenating the two strings, and now we have the actual strings to pull out. The P highlighted in blue in front of it indicates a partial hook; in the call below, the F indicates a full hook.
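The partial-hook idea, declaring only that lstrcatA takes two string parameters so the tracer can print their contents, might look like this in Go. Memory, Param, PartialHook, and FormatCall are hypothetical names, the sparse map stands in for emulator memory, and the 260-byte read limit is an arbitrary guard (it happens to equal MAX_PATH).

```go
package main

import (
	"fmt"
	"strings"
)

// Memory is a toy sparse guest address space (hypothetical).
type Memory map[uint32]byte

// ReadCString reads a NUL-terminated ANSI string at addr, reading at most
// limit bytes so a bad pointer cannot cause a runaway read.
func (m Memory) ReadCString(addr uint32, limit int) string {
	var sb strings.Builder
	for i := 0; i < limit; i++ {
		b, ok := m[addr+uint32(i)]
		if !ok || b == 0 {
			break
		}
		sb.WriteByte(b)
	}
	return sb.String()
}

// WriteCString stores s followed by a terminating NUL at addr.
func (m Memory) WriteCString(addr uint32, s string) {
	for i := 0; i < len(s); i++ {
		m[addr+uint32(i)] = s[i]
	}
	m[addr+uint32(len(s))] = 0
}

// Param is a parameter name plus its Windows type from the documentation.
type Param struct{ Name, Type string }

// PartialHook only describes the parameters; the real function body still
// runs inside the emulated DLL.
type PartialHook struct {
	Name   string
	Params []Param
}

// FormatCall renders a trace line, dereferencing string-typed arguments.
func FormatCall(m Memory, h PartialHook, args []uint32) string {
	parts := make([]string, len(h.Params))
	for i, p := range h.Params {
		switch p.Type {
		case "LPSTR", "LPCSTR":
			parts[i] = fmt.Sprintf("%s = '%s'", p.Name, m.ReadCString(args[i], 260))
		default:
			parts[i] = fmt.Sprintf("%s = %#x", p.Name, args[i])
		}
	}
	return fmt.Sprintf("%s(%s)", h.Name, strings.Join(parts, ", "))
}

func main() {
	// lstrcatA(LPSTR lpString1, LPCSTR lpString2), per its documentation.
	lstrcatA := PartialHook{Name: "lstrcatA", Params: []Param{
		{"lpString1", "LPSTR"}, {"lpString2", "LPCSTR"},
	}}
	mem := Memory{}
	mem.WriteCString(0x1000, `C:\Users\example\`)
	mem.WriteCString(0x2000, "run.bat")
	fmt.Println(FormatCall(mem, lstrcatA, []uint32{0x1000, 0x2000}))
}
```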
Erika Noerenberg: [00:17:57:21] I'm not going to go into the threading, but I'll mention that the number at the start of each line indicates the thread that's executing: one, two, three. Now we can see that apparently T Brady isn't very smart and installed some malware on his system, and you can see a CreateFile call that creates a batch file, which is then written out with WriteFile.
Erika Noerenberg: [00:18:26:22] If you install this straight from our Binee repository on GitHub and run it in Docker, as I was doing, this is the file structure you're going to have. The OS directory is where all your API DLLs go. You have to pull the DLLs and related files from a Windows system yourself, and how to do that is described in depth in the wiki. But typically these go in an OS directory.
Erika Noerenberg: [00:18:58:09] Malware... it's not up there, okay. The windows directory is where we have kernel32.go, the files with the actual hook implementations. There's also a temp directory in here. Binee is set up so that any call that writes out a file actually writes into this temp directory. To the malware, it's writing out to C:\Users\tbrady\AppData, and as far as the malware knows, that's the path it wrote to. In reality, Binee takes that and writes it out to this temporary directory. So if I list the temp directory, there's the name of that batch file. And if I cat it out, I can see that it starts rundll32 with net API and the init function.
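The temp-directory redirection can be sketched as a path-mapping function. This illustrates the idea and is not Binee's code: the per-drive subdirectory layout and the name GuestToHost are assumptions. Cleaning the path as if it were rooted keeps a sample from escaping the sandbox directory with .. segments.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// GuestToHost maps a Windows path written by the sample onto a sandbox
// directory on the analysis host (hypothetical layout: root/<drive>/...).
// Backslashes become forward slashes, and ".." segments are resolved
// before joining so the result always stays under root.
func GuestToHost(root, guest string) string {
	p := strings.ReplaceAll(guest, `\`, "/")
	drive := "c" // assume C: when the path has no drive letter
	if len(p) >= 2 && p[1] == ':' {
		drive = strings.ToLower(p[:1])
		p = p[2:]
	}
	// path.Clean drops any ".." that would climb above a rooted path.
	clean := path.Clean("/" + p)
	return path.Join(root, drive, clean)
}

func main() {
	fmt.Println(GuestToHost("/tmp/binee-temp", `C:\Users\example\AppData\Roaming\run.bat`))
}
```

The sample still sees its own C:\ path in every later call; only the host-side write lands under the sandbox root.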
Erika Noerenberg: [00:19:56:02] Here's another example, with Bad Rabbit, if you remember that malware. It was ransomware from a couple of years ago. We've got GetCommandLine, and CommandLineToArgvW is not implemented, as we can see from the asterisks. Because that's not implemented, we get a different execution path, and the process ends up exiting with basically nothing to show for it. So do we want to add a full hook? What do we need out of this? It turns out we don't necessarily need to implement the entire function. This is still a full hook, but we're basically just saying: pass execution on and report that this was successful. We're going to call this a no-op hook. It's not doing the functionality that's expected, but we don't really need it to do any real work. We just need to return success so that execution continues. So what does this look like?
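A no-op hook amounts to a function that skips the real work and hands back a success value. In this hypothetical Go sketch (HookFn, NoOp, and Dispatch are illustrative names, not Binee's API), the emulator is assumed to pop the stdcall arguments itself; 1 is an arbitrary nonzero value, and a sample that actually dereferences the CommandLineToArgvW result would need a real implementation instead.

```go
package main

import "fmt"

// HookFn receives the raw stack arguments and returns the value for EAX.
type HookFn func(args []uint32) uint32

// NoOp builds a hook that ignores its arguments and reports success with
// the given return value. Argument cleanup is assumed to happen elsewhere.
func NoOp(ret uint32) HookFn {
	return func(_ []uint32) uint32 { return ret }
}

// Dispatch looks up name in hooks; ok is false for an unimplemented API,
// which is the case a trace would mark with asterisks.
func Dispatch(hooks map[string]HookFn, name string, args []uint32) (eax uint32, ok bool) {
	fn, ok := hooks[name]
	if !ok {
		return 0, false
	}
	return fn(args), true
}

func main() {
	hooks := map[string]HookFn{
		// Nonzero tells the sample "it worked"; nothing is actually parsed.
		"CommandLineToArgvW": NoOp(1),
	}
	fmt.Println(Dispatch(hooks, "CommandLineToArgvW", []uint32{0x1000, 0x2000}))
	fmt.Println(Dispatch(hooks, "UnhookedApi", nil))
}
```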
Erika Noerenberg: [00:20:57:13] So here we've run it again. You've got that CommandLineToArgvW call, which now passes through. We run into another unimplemented function, but it turns out that doesn't really matter; execution continues. As we go down, you can see it calls CreateFile, and execution continues. So basically you've got three scenarios. In one case a partial hook works just fine. In other cases you need a full hook, but you don't necessarily need to implement the entire function like we did here; passing success on is good enough.
Erika Noerenberg: [00:21:36:01] But let's look at actually implementing a full hook. Again, here's our MSDN documentation with the definition of the function, SearchPathA, and information on the return value. For functions documented by Microsoft, we've got the parameter names, the types, the return value, and what the function does, so we have all the information we need to implement this full hook. So let's look at one where we have implemented it.
Erika Noerenberg: [00:22:14:10] Here's some malware. As you can see, it's loading wininet.dll, so we're probably going to get some interesting IOCs out of it, like URLs or IP addresses. We can see GetProcAddress calls for InternetOpen and some other functions. Again, I've truncated this a bit. Then we get down to SearchPathA, and it basically just exits after that. So we're getting nothing.
Erika Noerenberg: [00:22:49:05] So we add a full hook. It's a similar thing: we've got the parameter strings, which we pull from the documentation: the path, the file name, the extension, and the buffer. Then we've got the actual function definition, and you can see a couple of if clauses. Basically, if an error code is returned, we just return true. We forget that there was an error, because it probably doesn't matter anyway; we just tell the malware it was okay. Otherwise, we actually implement the function: return the output that would be expected, clean up the stack, and set the registers to show the function was successful. And again, any of these hooks still retrieves the parameter values, so we still get any IOCs out of it.
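Here is a sketch of what a full SearchPathA hook could compute, following the documented contract: append lpExtension only when the file name has no extension, return the copied length without the terminating NUL on success, or the required buffer size including the NUL when the buffer is too small. On failure it reproduces the choice described above of reporting success anyway. MockFS and this function signature are hypothetical; lpPath is assumed to be pre-split into directories, and lpFilePart is ignored.

```go
package main

import (
	"fmt"
	"strings"
)

// MockFS is a hypothetical set of files present in the emulated system,
// keyed by lower-cased full Windows path.
type MockFS map[string]bool

// SearchPathA returns the value to place in EAX and the string to write to
// lpBuffer. dirs stands in for lpPath after splitting on ';'.
func SearchPathA(fs MockFS, dirs []string, fileName, ext string, bufLen uint32) (uint32, string) {
	name := fileName
	// lpExtension is only used when the file name has no extension.
	if ext != "" && !strings.Contains(name, ".") {
		name += ext
	}
	for _, d := range dirs {
		full := strings.TrimRight(d, `\`) + `\` + name
		if !fs[strings.ToLower(full)] {
			continue
		}
		n := uint32(len(full))
		if n+1 > bufLen {
			// Buffer too small: report the size needed, including the NUL.
			return n + 1, ""
		}
		// Success: characters copied, excluding the NUL.
		return n, full
	}
	// The real API returns 0 here. Following the talk, the hook reports
	// success anyway so execution continues; 1 is an arbitrary nonzero value.
	return 1, ""
}

func main() {
	fs := MockFS{`c:\windows\system32\wininet.dll`: true}
	fmt.Println(SearchPathA(fs, []string{`C:\Windows\System32`}, "wininet", ".dll", 260))
	fmt.Println(SearchPathA(fs, []string{`C:\Windows\System32`}, "evil.dll", "", 260))
}
```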
Erika Noerenberg: [00:23:56:18] Here's another example, where we have the SearchPathA we just implemented, followed by a RegOpenKey call. Implementing that function doesn't take a whole lot of time or effort, because you basically have prototypes for a lot of them already. So even if you're not super familiar with coding, you've got a lot of examples in there, and if you have the documentation for the function, it's not too hard to get it implemented. And now we know some registry keys that were manipulated by this malware.
Erika Noerenberg: [00:24:39:14] So this is an iterative process, right? We implement one hook, get further into the malware, find another hook we need to implement, and so on, over and over again. That probably sounds like an arduous task, but it turns out that even with the limited implementations we have now, we see a statistically significant increase in the amount of data we're able to pull from our dataset. I'm not the statistics person; Kyle could tell you the actual number, maybe something like a 75% increase in the amount of metadata. I don't know. But it is statistically significant, and we've benefited a lot from it in our internal environment. In creating this talk, I was actually able to use this data to do some threat hunting, which I'll talk about a bit later, to identify interesting samples I could use to demonstrate some of these things.
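Because the process is iterative, one way to decide which hook to write next is to count which unimplemented APIs show up across many traces. This sketch assumes a hypothetical trace format in which an unhooked call appears with its name wrapped in asterisks; the regular expression, the Gap type, and RankMissingHooks are illustrative and not part of Binee. Each API is counted once per trace, so the ranking reflects how many samples it blocks rather than how often one sample loops over it.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// missing matches an API name wrapped in asterisks, the assumed marker for
// a call with no hook (e.g. "[1] **lstrcatA**").
var missing = regexp.MustCompile(`\*+([A-Za-z_][A-Za-z0-9_]*)\*+`)

// Gap is an unimplemented API and the number of traces it appeared in.
type Gap struct {
	API   string
	Count int
}

// RankMissingHooks counts each unhooked API once per trace and returns the
// results most common first, ties broken alphabetically.
func RankMissingHooks(traces []string) []Gap {
	counts := map[string]int{}
	for _, t := range traces {
		seen := map[string]bool{}
		sc := bufio.NewScanner(strings.NewReader(t))
		for sc.Scan() {
			m := missing.FindStringSubmatch(sc.Text())
			if m != nil && !seen[m[1]] {
				seen[m[1]] = true
				counts[m[1]]++
			}
		}
	}
	gaps := make([]Gap, 0, len(counts))
	for api, n := range counts {
		gaps = append(gaps, Gap{API: api, Count: n})
	}
	sort.Slice(gaps, func(i, j int) bool {
		if gaps[i].Count != gaps[j].Count {
			return gaps[i].Count > gaps[j].Count
		}
		return gaps[i].API < gaps[j].API
	})
	return gaps
}

func main() {
	traces := []string{
		"[1] F GetProcAddress(hModule = 0x1000, lpProcName = 'InternetOpenA')\n[1] **SearchPathA**",
		"[1] P lstrcatA(lpString1 = 'a', lpString2 = 'b')\n[2] **SearchPathA**\n[2] **CommandLineToArgvW**",
	}
	for _, g := range RankMissingHooks(traces) {
		fmt.Printf("%-20s %d\n", g.API, g.Count)
	}
}
```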
Erika Noerenberg: [00:25:42:12] There are some demos. I wanted to show some actual malware, because I know a lot of you probably do malware analysis, but there are also some contrived examples (this particular one is real malware, actually). Up on our wiki, John has put some documentation together, and they've done a video that they talked through at Black Hat; it's on the main page of our Carbon Black GitHub, so you can walk through that as well. The samples are on VirusTotal, and he walks through how they look in Binee. I definitely recommend it if you want to get your feet wet or just see how it works. This particular sample is interesting because it's packed, so it's a good example of one where we're not going to get anything out of static analysis. But just by running it through Binee, it unpacks itself: it deobfuscates a DLL and writes it to disk, and we get all this information out with very little in the way of resources or time.
Erika Noerenberg: [00:26:59:02] Here's an example with that sample. If we run Binee with verbose output, the -v flag, you can see the disassembly for the deobfuscation code that's being run.
Erika Noerenberg: [00:27:21:07] So we have this dynamic data, this kind of static-dynamic data. How do we use it to actually threat hunt, or in our daily work? The possibilities are endless, right? You can think of a lot of ways. The output you get once you've run a sample is instrumented for JSON. In our environment, we have variable JSON structures from which we can pull IOCs and imports. Instead of just getting the imports from the import table (what if it's packed or obfuscated?), we also get any dynamic imports, for example from an API resolution function, once the sample has run through Binee. So you get a lot more information, and we can query all of this metadata and correlate it with other sources we have.
Erika Noerenberg: [00:28:29:14] Some of the things we've used it for: like I said before, dynamic automated decoding and decrypting. I'll use PlugX as an example; if you're familiar with PlugX, it's been around forever. There's an encrypted payload and a DLL that loads it, decrypts it, and writes the result out to disk. There's also config data. The old process would be: I run YARA, or whatever my process is, to identify all the members of this family, everything I think is PlugX, version 1, 2, 3, 4, whatever. Then I've got my Python script that runs through and determines, okay, it's version one, so I know I need to run this decryption script against it, and this is how I dump out the data. But any time there's a variant change, your script might break, or your identification might not be picking out exactly the right variant.
Erika Noerenberg: [00:29:34:06] In this case, because you're statically unpacking it and you have the deobfuscation or decryption routines, like we saw for that other sample, we can let the malware do that for us. We have Binee unpack it and then say: okay, now that we've got that, write it out to disk, give me that config data, put it in the JSON with the IOCs, and we store that off for later. Say we're doing a research project on all the PlugX samples we've had for the last five years, maybe 50,000 samples. I've got JSON metadata for all of them, so I can ask how many of them used X or Y, query my database, get everything back, and then quickly process that JSON data with something like Python to pull out whatever information I need, at scale.
Erika Noerenberg: [00:30:39:17] Again, like I said before, hunting across large datasets: say you're ingesting millions of samples a day, like VirusTotal. We ourselves have many sources of samples that we're ingesting constantly. You can't realistically detonate every one of those or run full dynamic analysis on all of them, and how much are you going to get from regular static analysis alone? We need a way to get as much metadata as possible out of these things without spending more resources and time. And if you've used YARA, you know that you can't run a sophisticated YARA rule over huge amounts of data: it takes forever and it's very resource intensive. This gives us a way to run Binee against literally every sample that comes through and store off that metadata, so we don't have to reprocess the samples when we want to hunt later.
Erika Noerenberg: [00:31:44:22] The other thing we talked about earlier is automating the collection of runtime IOCs. We can get registry keys, URLs, and IP addresses and correlate against them, but at the scale of static analysis. Even for packed samples, if we can get them to run even partially through Binee (it doesn't have to run all the way through), we can get the files that are written out. We can get all of this data and store it off somewhere, which makes correlation very easy. For example, Kyle has implemented things like PE import clustering, which gets into malware sample fingerprinting. Let's say you want to go beyond families. Maybe you're tracking a campaign whose actors don't necessarily use the same malware family but are reusing code. A common approach is something like imphash or impfuzzy: a hash, or a fuzzy hash, of the import table used as a fingerprint to correlate samples for code reuse.
Erika Noerenberg: [00:32:56:15] Well, if you're doing that just statically and the sample is packed or encrypted, you're going to have very little information. If we can actually run it to the point where at least those imports are resolved, and we can pull that data, that gives us much richer data and a much higher fidelity picture of the code.
Erika Noerenberg: [00:33:26:19] So it's richer static data; we've got more metadata. Other than just fingerprinting, that also allows us to narrow down a dataset. Say we've got samples with matching imphashes. Maybe they're not the same family; maybe they aren't even close. You might have two samples that have nothing to do with each other, but you've still weeded down to a set of, let's say, five gigabytes of samples instead of five terabytes. Even though those five gigabytes of samples may not all be related, that's a lot easier to work with, and then you can run your YARA rules against it, or any other systems like that you have, again saving a lot of time and resources. Anybody that does this knows there's more work than you can possibly do. There are not enough of us to do all the work that comes in, so anything that saves us time and resources helps. Even though compute is a lot easier to come by and a lot cheaper than it used to be, it's still not cheap. If you're using Amazon S3 buckets and things like that, you're paying for all of this.
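The narrowing step described above amounts to bucketing stored samples by fingerprint and pulling out the bucket that contains a sample of interest. The sketch below assumes a hypothetical `Sample` record (hash plus fingerprint) already stored from earlier emulation runs; the output is only a candidate set to hand to slower tools such as YARA, not proof that the samples are related.

```go
package main

import (
	"fmt"
	"sort"
)

// Sample is a hypothetical stored record: a sample's hash plus the import
// fingerprint computed from its emulated (runtime-resolved) imports.
type Sample struct {
	SHA256      string
	Fingerprint string
}

// GroupByFingerprint buckets sample hashes by import fingerprint.
func GroupByFingerprint(samples []Sample) map[string][]string {
	groups := make(map[string][]string)
	for _, s := range samples {
		groups[s.Fingerprint] = append(groups[s.Fingerprint], s.SHA256)
	}
	return groups
}

// Related returns the other samples that share the target's fingerprint,
// sorted for stable output. Unknown targets return nil.
func Related(samples []Sample, targetSHA string) []string {
	var fp string
	found := false
	for _, s := range samples {
		if s.SHA256 == targetSHA {
			fp, found = s.Fingerprint, true
			break
		}
	}
	if !found {
		return nil
	}
	var out []string
	for _, h := range GroupByFingerprint(samples)[fp] {
		if h != targetSHA {
			out = append(out, h)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	samples := []Sample{
		{"aaa", "fp1"}, {"bbb", "fp2"}, {"ccc", "fp1"}, {"ddd", "fp1"},
	}
	fmt.Println(Related(samples, "aaa")) // [ccc ddd]
}
```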
Erika Noerenberg: [00:34:52:02] Oh, one of the things I wanted to mention, which I mentioned only briefly, is the way that I used this for this talk. We have databases of all of the JSON data that we've pulled from Binee for literally every sample that comes through our system, so I'm able to query that database and say: give me all of the samples in our entire database that have at least 20 dynamic imports from Binee. Then, out of those, I want all of the samples that have a call to InternetOpenUrl, or, if I'm looking for rootkits, the ones that call some of the Zw functions, and things like that. This allows me to take all the samples with matching metadata, narrow down, and zero in on a few that might be interesting enough to look at more in depth, so that you're not digging through 500 samples of PlugX looking for the one that demonstrates the one thing you're interested in. That helped a lot in finding samples that demonstrated the particular things I wanted to show.
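A query like the one described (at least 20 dynamic imports, plus a call to InternetOpenUrl or any Zw function) can be sketched as a filter over stored per-sample JSON. The record shape below (`sha256`, `imports`, `calls`) is a hypothetical summary format, not Binee's actual JSON output, and the records are assumed to be stored one JSON object per line.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// Record is a hypothetical per-sample summary built from emulator output.
type Record struct {
	SHA256  string   `json:"sha256"`
	Imports []string `json:"imports"` // dynamically resolved imports
	Calls   []string `json:"calls"`   // API functions the sample called
}

// Interesting reports whether a record has at least minImports resolved
// imports and called at least one function whose name starts with one of
// the given prefixes (for example "InternetOpenUrl" or "Zw").
func Interesting(r Record, minImports int, prefixes []string) bool {
	if len(r.Imports) < minImports {
		return false
	}
	for _, call := range r.Calls {
		for _, p := range prefixes {
			if strings.HasPrefix(call, p) {
				return true
			}
		}
	}
	return false
}

// Filter decodes a stream of JSON records and returns the sorted SHA-256
// values of every record that passes Interesting.
func Filter(ndjson string, minImports int, prefixes []string) ([]string, error) {
	dec := json.NewDecoder(strings.NewReader(ndjson))
	var hits []string
	for dec.More() {
		var r Record
		if err := dec.Decode(&r); err != nil {
			return nil, err
		}
		if Interesting(r, minImports, prefixes) {
			hits = append(hits, r.SHA256)
		}
	}
	sort.Strings(hits)
	return hits, nil
}

func main() {
	data := `{"sha256":"a1","imports":["kernel32.dll:CreateFileA"],"calls":["ZwQuerySystemInformation"]}
{"sha256":"b2","imports":["kernel32.dll:Sleep"],"calls":["Sleep"]}`
	hits, err := Filter(data, 1, []string{"InternetOpenUrl", "Zw"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(hits) // [a1]
}
```

The demo uses a threshold of 1 only to keep the input short; the query from the talk would pass 20. In practice this filtering would more likely run inside the database itself, but the predicate is the same.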
Erika Noerenberg: [00:36:17:16] So, what's next? In the near term, obviously, we're going to increase the fidelity of these hooks by implementing more full hooks and getting more data out. We've talked about flagging IOCs: we've got the JSON output already, but actually tagging different IOCs and things like that is a fairly easy one. Longer term, they've talked about implementing a single-step mode, almost like a debugger mode, and implementing a full network stack. Right now we can implement network functions, but we're not emulating the entire network stack. We also want to add ELF and Mach-O support. Mach-O is one that I was going to be starting on, but with the whole VMware acquisition, things are on hold a little bit. We do plan to start on that soon, probably this coming quarter, so I'm really excited about that. And then adding more anti-emulation and anti-analysis functionality and things like that.
Erika Noerenberg: [00:37:36:19] So, how can you get started? Like I said, all of this is open source on our Carbon Black GitHub at github.com/carbonblack/binee, and there are several examples and a lot of documentation in the wiki tab, so definitely check that out. It's set up with an included Dockerfile, so you can very easily run all of this code in a container without having to set up your entire environment. The only real external requirement, which is documented there, is pulling in the required Windows DLLs. For example, in my environment I've got Windows 10 and Windows 7 DLLs to run against. So import those necessary DLLs, which are described in the README, and run your samples. Just run some samples against Binee, see what happens, and implement a few hooks to see how much further you can get. Rinse and repeat. And send pull requests. If you've got issues, we're merging stuff. People have been [inaudible] our repositories and making pull requests, so we're trying to stay on top of that as well.
Erika Noerenberg: [00:38:57:04] So, please get involved and I hope that it's useful for people. Like I said, we've implemented this into our own malware analysis environment and it's really enriched our dataset and, and hunting capabilities. So, we're just looking forward to, to getting more and more fidelity and information out in the future.
Erika Noerenberg: [00:39:21:17] It's a little bit early, but I could show you the-- let me just pull this up here. Oh right, wrong screen. That's not working. I'm just going to see if we can do this. No, because it's in full screen mode.
Erika Noerenberg: [00:40:00:11] So, I apologize for my terminal colors. I tried to get something that might be high contrast, but here's an example: this is kernel32.go. These are all the hooks I was talking about that are currently implemented, and adding new hooks in here is fairly straightforward. I am not an expert programmer, I had barely touched Go, and I managed to pretty quickly implement several functions that I needed as I was going along without too much help. So, if you want to see Binee in action, let's see if this works. This is probably terri-- yeah, this is terrible. I tried to mess around with it, but when you don't have long, you don't know what it's going to look like. So, here I am in a Docker container. Just running Binee by itself, you can see the options, but running it against a sample is literally as easy as-- let's see, the X-Agent one. Just give it the file and you can see the output. So you get all of your registry activity, anything that's implemented. And like I said, this clearly didn't execute fully, but in the small amount that it actually executed, we get quite a lot. This is the same one that I showed before. We can see that this file was written out.
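The idea of a hook file like kernel32.go can be illustrated with a table that maps API names to Go handlers. This is a minimal sketch under stated assumptions: `HookFn`, `Hooks`, and the `"dll!Function"` naming are hypothetical and do not reflect Binee's actual hook types or signatures; the point is only that each emulated call is dispatched to a handler that decides the return value, and unhooked calls are surfaced so you know what to implement next.

```go
package main

import "fmt"

// HookFn is a hypothetical hook handler: it receives the emulated call's
// arguments and returns the value to place in the return register (EAX).
type HookFn func(args []uint32) uint32

// Hooks maps "dll!Function" names to handlers. Illustrative only; this is
// not Binee's hook API.
type Hooks map[string]HookFn

// Call dispatches an emulated API call. The boolean is false for functions
// with no hook, which tells the analyst what to implement next.
func (h Hooks) Call(name string, args []uint32) (uint32, bool) {
	fn, ok := h[name]
	if !ok {
		return 0, false
	}
	return fn(args), true
}

func main() {
	hooks := Hooks{
		// Return a fixed, believable process ID.
		"kernel32.dll!GetCurrentProcessId": func(args []uint32) uint32 { return 0x1234 },
		// Report "no error" (ERROR_SUCCESS is 0).
		"kernel32.dll!GetLastError": func(args []uint32) uint32 { return 0 },
	}
	for _, name := range []string{"kernel32.dll!GetCurrentProcessId", "kernel32.dll!VirtualAlloc"} {
		if ret, ok := hooks.Call(name, nil); ok {
			fmt.Printf("%s -> %#x\n", name, ret)
		} else {
			fmt.Printf("%s is not hooked yet\n", name)
		}
	}
}
```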
Erika Noerenberg: [00:41:55:18] We can see that it's written out there. Just cat that file, and there's our file. As far as the malware knows, it wrote this file out. But we can just pull it safely, without having to worry about infecting our systems, because the sample is never actually executed on the host. And that brings up a good point: it is cross platform, so you can run it on Windows, Linux, or Mac, or use the Docker container like I have here.
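The reason the dropped file can be pulled safely is that file writes are captured by the emulator rather than performed on the host. Below is a minimal sketch of that idea, assuming a hypothetical in-memory `VirtualFS`; it is not how Binee stores files internally, and it simplifies Windows semantics (every write appends, with no file pointers or truncation on create).

```go
package main

import (
	"fmt"
	"strings"
)

// VirtualFS is a hypothetical in-memory file system for an emulator:
// files the sample "writes" are kept here and never touch the host disk.
type VirtualFS struct {
	files map[string][]byte
}

func NewVirtualFS() *VirtualFS {
	return &VirtualFS{files: make(map[string][]byte)}
}

// normPath lowercases a Windows path, since Windows paths are
// case-insensitive.
func normPath(path string) string { return strings.ToLower(path) }

// Write appends data to the file at path, as successive WriteFile calls on
// one handle would, and returns the byte count reported to the sample.
func (fs *VirtualFS) Write(path string, data []byte) int {
	k := normPath(path)
	fs.files[k] = append(fs.files[k], data...)
	return len(data)
}

// Read returns a copy of a captured file so the analyst can inspect it.
func (fs *VirtualFS) Read(path string) ([]byte, bool) {
	b, ok := fs.files[normPath(path)]
	if !ok {
		return nil, false
	}
	return append([]byte(nil), b...), true
}

func main() {
	fs := NewVirtualFS()
	fs.Write(`C:\Users\victim\AppData\Local\Temp\dropped.bat`, []byte("@echo off\r\n"))
	fs.Write(`C:\Users\victim\AppData\Local\Temp\dropped.bat`, []byte("del %0\r\n"))
	if b, ok := fs.Read(`c:\users\victim\appdata\local\temp\dropped.bat`); ok {
		fmt.Printf("%q\n", b)
	}
}
```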
Erika Noerenberg: [00:42:52:09] If anybody has any questions. Yes.
Delegate: [00:42:57:19] So, is there any way to modify, like, the return codes from something that [INAUDIBLE]?
Erika Noerenberg: [00:43:04:19] So, the question was: can you modify the return codes to manipulate the behavior? Absolutely. A good example of that, which I didn't mention, is sleep. One of the things you see all the time with malware is that it wants to sleep for 11 hours or something like that. If we want it to just execute through, we can say: okay, you just called Sleep for 11 hours, but actually we're going to sleep for two milliseconds. So you're controlling what's output, and in the case of a full hook, you're actually implementing the function. For those who are maybe not familiar with how that works, think of a function like CreateFile that's natively defined in a Windows DLL. Sometimes malware wants to implement that same function differently, so it implements its own version and loads its own DLL to replace that functionality. We're basically doing the same thing: we can do whatever we want with whatever it calls. All that's required is to put the function definition and the hook definition into the appropriate file. The kernel32.go that I showed, for example, handles anything from kernel32.dll, and the same goes for user32.dll, wininet.dll, any of those. So yes, you can basically manipulate anything you want to.
Erika Noerenberg: [00:44:41:10] Any other questions? Yes.
Delegate: [00:44:45:19] Can you specify an entry point? Does it have to have a required starting point [INAUDIBLE]?
Erika Noerenberg: [00:44:54:14] So, he was asking whether you can manipulate the entry point. Again, I didn't develop it. I do believe so, but I don't know the details of the entry point implementation. Neither Kyle nor John is here, but I can certainly find out about that. I don't know for sure.
Erika Noerenberg: [00:45:19:14] Questions?
Erika Noerenberg: [00:45:24:19] But the basic idea is that this environment is entirely in your control, and they designed it to be as easily configurable and modular as possible. There's a separate config file that allows you to define your own OS environment, including any specific registry keys. One of the typical anti-analysis techniques for VM-aware malware is looking for a particular registry key that VMware always has. But sometimes it's the inverse: instead of checking for the absence or existence of a VM artifact, the malware is looking for something very specific that identifies a target system, and if it's not there, it's going to quit. For example, it might be looking for specific installed programs, let's say a specific antivirus vendor, by checking for that vendor's registry key. You can put that key into the config file and tell the malware, yes, this is the environment you're looking for. So all of that is configurable.
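A config-driven registry can be sketched as a lookup table loaded from the environment config and consulted by registry hooks. The JSON shape, the `envConfig`/`Registry` types, and the `ExampleAV` key are all hypothetical; Binee has its own config file format, which this does not reproduce. Keys absent from the config simply don't exist for the sample, and keys you add make the environment look like the one it expects.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// envConfig is a hypothetical environment config shape, for illustration.
type envConfig struct {
	Registry map[string]string `json:"registry"`
}

// Registry is an emulated registry backed entirely by the config.
type Registry struct {
	values map[string]string
}

// LoadRegistry builds the emulated registry from a config document. Paths
// are stored lowercased because registry paths are case-insensitive.
func LoadRegistry(config []byte) (*Registry, error) {
	var cfg envConfig
	if err := json.Unmarshal(config, &cfg); err != nil {
		return nil, err
	}
	r := &Registry{values: make(map[string]string)}
	for k, v := range cfg.Registry {
		r.values[strings.ToLower(k)] = v
	}
	return r, nil
}

// Query is what a hook for calls like RegOpenKeyEx/RegQueryValueEx would
// consult: configured paths exist for the sample, everything else does not.
func (r *Registry) Query(path string) (string, bool) {
	v, ok := r.values[strings.ToLower(path)]
	return v, ok
}

func main() {
	cfg := []byte(`{"registry": {"HKEY_LOCAL_MACHINE\\SOFTWARE\\ExampleAV\\Version": "10.2"}}`)
	reg, err := LoadRegistry(cfg)
	if err != nil {
		fmt.Println(err)
		return
	}
	v, ok := reg.Query(`hkey_local_machine\software\exampleav\version`)
	fmt.Println(v, ok) // 10.2 true
}
```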
Erika Noerenberg: [00:46:42:22] Yes.
Delegate: [00:46:43:00] [INAUDIBLE]
Erika Noerenberg: [00:46:53:10] So, he's asking about handling anti-analysis tricks like timing loops. That probably gets more into the work they did on the threading implementation, but off the top of my head-- oh yes, I actually did have an example of that. There's GetSystemTime or whatever; I can't think of the exact name. But a lot of times the malware will call GetTickCount and ask: is this some sort of emulation environment? If the time between this call and that call is more than or less than some threshold, then I'm going to exit. Because we control the output of each of those calls, we can work around that. You can even go straight to the check that asks whether these conditions are met and just say success. It doesn't matter what you were asking for; I'm just going to tell you it worked. So you have full control over that.
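One way to defeat a GetTickCount timing check is to make the hooked clock deterministic: every call advances by a small fixed step, so the measured delta never reveals how slowly the emulator really ran. `FakeClock` and the `looksEmulated` check below are hypothetical illustrations, not Binee code; the check only mimics the kind of comparison malware performs.

```go
package main

import "fmt"

// FakeClock is a hypothetical deterministic clock for timing-check hooks
// such as kernel32!GetTickCount. Each call advances by a fixed step.
type FakeClock struct {
	nowMs  uint32 // like GetTickCount, a 32-bit millisecond counter that wraps
	stepMs uint32
}

// GetTickCount returns the next fake tick value.
func (c *FakeClock) GetTickCount() uint32 {
	c.nowMs += c.stepMs
	return c.nowMs
}

// looksEmulated mimics a typical malware-side check: if too much time
// passed between two readings, assume an emulator or debugger. Unsigned
// subtraction keeps the delta correct across a counter wraparound.
func looksEmulated(t1, t2, thresholdMs uint32) bool {
	return t2-t1 > thresholdMs
}

func main() {
	clock := &FakeClock{nowMs: 3600000, stepMs: 15}
	t1 := clock.GetTickCount()
	// ... the emulator may spend many real milliseconds here ...
	t2 := clock.GetTickCount()
	fmt.Println(t2-t1, looksEmulated(t1, t2, 100)) // 15 false
}
```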
Erika Noerenberg: [00:47:57:04] Other questions?
Erika Noerenberg: [00:48:09:04] If there are no other questions, please feel free to come up and ask me. Please feel free to contact us through GitHub; I've put Kyle's and John's info up here as well, and you can find their information, their Twitter handles, and all that on GitHub. File issues and, like I said, make some pull requests. Please play around with it, let us know what you think, and I hope you get something out of it.