Tuesday, February 19, 2013

BinMode: Understanding Data Structures

As most analysts are aware, the tools we use provide a layer of abstraction over the data with which we're engaged.  What we see can often depend upon the tool that we're using.  For example, if a tool is written by a developer and the intended user is an administrator, then while the tool may be useful to a DFIR analyst, it may not provide all of the information that is truly useful to that analyst, based on the goals of their examination, the data that's actually presented by the tool, etc.

This is why understanding the data structures that we're working with can often be very beneficial.

By understanding what is actually available in the data structures, we can:
1.  Make better use of the information that is available.
2.  Locate deleted data in unallocated space (or other unstructured data)

A good recent example of this is the discussion of Java *.idx files, and the resulting parsers that have been created.  Understanding the actual data structures that make up the headers and subsequent sections of these files lets us understand what we're looking at.  For example, for a successful download, the header contains a field that tells us the size of the content.  Most legitimate downloads also include this information in the server response, but malicious downloads of Java content don't always include this information.  As such, we not only have a good way of determining what may be a suspicious download, but we also have a pivot point we can use...we can use the content size to look for files of that size that were created on the system.
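As a quick illustration of that pivot point, here's a minimal Python sketch that walks a mounted image and flags any file whose size matches the content-length pulled from an *.idx header.  The mount point and size value are placeholders (14226 is the content-length from the rox.jar example in the 2 Feb post below):

import os

def find_files_by_size(mount_point, target_size):
    # walk a mounted image, returning files whose size matches the
    # content-length taken from an *.idx file header
    hits = []
    for root, _dirs, files in os.walk(mount_point):
        for name in files:
            path = os.path.join(root, name)
            try:
                if os.path.getsize(path) == target_size:
                    hits.append(path)
            except OSError:
                continue  # skip files we can't stat
    return hits

for hit in find_files_by_size("F:\\", 14226):   # F:\ = hypothetical mount point
    print(hit)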

Another example of this is the IE history file format (thanks to Joachim for all the work he's done in documenting the format).  A lot of analysts run various tools to parse out the user's IE web browser history, but how many understand what's actually in the structure?  I'm not saying that you've memorized it and parse everything by hand, but rather that you know enough about it to at least be curious when something is missing.  For example, according to Joachim's documentation, each "URL" record can contain multiple time stamps, including when the page requested was last modified, last sync'd, and when it expires.  I say "can" because in some cases, these may be set to 0.  Further, according to the documentation, there's a flag setting that we can use to determine if the HTTP request was a GET or POST request.

How else can this be helpful?  Mandiant's recently released APT intel report provides a great deal of useful information, including references to "wininet" on page 31.  The WinInet API is what produces the artifacts most commonly associated with IE.  As such, if your organization uses Firefox or Chrome rather than IE, or you see that the "Default User", "NetworkService", or "LocalService" profiles begin developing quite large IE histories, this may be an indicator of malicious activity, such as malware running in those contexts and using the WinInet API.

I've used the documented format for Windows XP and 2003 Event Log records to not only parse the Event Log files on a system, but also locate and recover deleted event records from unallocated space.  In fact, I had an instance where the intruder had cleared the Security Event Log after gaining access to the system, but I was able to recover 334 deleted event records from unallocated space, including the record that showed when they'd initially logged into the system.
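For illustration, here's a minimal Python sketch of that carving approach; it scans a blob of data (say, the output of blkls run against an image) for the "LfLe" magic number that marks an EVENTLOGRECORD, and uses the leading and trailing length DWORDs as a sanity check.  The size limits are arbitrary assumptions:

import struct

def carve_evt_records(data):
    # scan a blob (e.g., unallocated space) for XP/2003 Event Log records;
    # each record starts with a 4-byte length, the "LfLe" magic, and repeats
    # the length as its final DWORD
    records = []
    idx = data.find(b"LfLe")
    while idx != -1:
        if idx >= 4:
            start = idx - 4
            (length,) = struct.unpack("<I", data[start:start + 4])
            if 0x38 <= length <= 0x10000 and start + length <= len(data):
                (trailer,) = struct.unpack("<I", data[start + length - 4:start + length])
                if trailer == length:
                    # record number, TimeGenerated and TimeWritten (Unix epoch)
                    rec_num, time_gen, time_written = struct.unpack(
                        "<III", data[start + 8:start + 20])
                    records.append((rec_num, time_gen, time_written,
                                    data[start:start + length]))
        idx = data.find(b"LfLe", idx + 4)
    return records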

Addendum, 20 Feb: I've mentioned DOSDate format time stamps in this blog before, as well as the data structures in which they're used (shell items in shellbag artifacts, ComDlg32 Registry subkey values, LNK files, Jump Lists, etc.).  They're very pervasive across all Windows platforms, and more so on Windows 7 systems.  This MS link provides some information regarding how these time stamps are constructed, as well as how they can play havoc with timeline analysis if you're not familiar with them.
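If you want to see just how simple (and how coarse) these values are, here's a short Python sketch that unpacks a DOSDate/DOSTime pair.  Note the 2-second granularity on the seconds field, and keep in mind that in shell items these values are typically stored in local time...both of which can throw off a timeline if you're not expecting them.  The sample values at the bottom are made up for illustration:

from datetime import datetime

def dosdate_to_datetime(dos_date, dos_time):
    # DOSDate bits: 0-4 = day, 5-8 = month, 9-15 = years since 1980
    # DOSTime bits: 0-4 = seconds/2, 5-10 = minutes, 11-15 = hours
    day = dos_date & 0x1F
    month = (dos_date >> 5) & 0x0F
    year = ((dos_date >> 9) & 0x7F) + 1980
    seconds = (dos_time & 0x1F) * 2
    minutes = (dos_time >> 5) & 0x3F
    hours = (dos_time >> 11) & 0x1F
    return datetime(year, month, day, hours, minutes, seconds)

print(dosdate_to_datetime(0x4253, 0x53C0))   # decodes to 2013-02-19 10:30:00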

This doesn't apply just to Windows systems.  A great example of this is Mari's recent blog post, Finding and Reverse Engineering Deleted SMS Messages.  Mari provides a complete walk-thru of the data source being examined, going so far as to not only identify the data structure, but to also demonstrate how she did this, as well as to show how someone could go about identifying deleted SMS messages.

What's really interesting is that back in the day, there used to be more of a focus on understanding data structures. For example, some DF training programs would require candidates to parse partition tables and compute NTFS MFT data runs. 
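For anyone who never went through one of those exercises, decoding an MFT data run is a good example of just how much mileage you can get out of a little structure knowledge.  Here's a minimal Python sketch (sparse runs, which have a zero-length offset field, aren't handled):

def decode_data_runs(runlist):
    # decode an NTFS runlist (from a non-resident attribute) into a list of
    # (starting cluster, cluster count) tuples; each run header byte holds
    # the size of the count field (low nibble) and offset field (high nibble)
    runs, pos, prev_lcn = [], 0, 0
    while pos < len(runlist) and runlist[pos] != 0x00:
        header = runlist[pos]
        len_size, off_size = header & 0x0F, header >> 4
        pos += 1
        count = int.from_bytes(runlist[pos:pos + len_size], "little")
        pos += len_size
        offset = int.from_bytes(runlist[pos:pos + off_size], "little", signed=True)
        pos += off_size
        prev_lcn += offset   # offsets are relative to the previous run
        runs.append((prev_lcn, count))
    return runs

print(decode_data_runs(bytes.fromhex("2118345600")))   # [(22068, 24)]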

Training
Interested in Windows DFIR training?  Windows Forensic Analysis, 11-12 Mar; Timeline Analysis, 9-10 Apr. Pricing and Calendar. Send email here to register.  Each course includes access to tools and techniques that you won't find anywhere else, as well as a demonstration of the use of the Forensic Scanner.

On 10-12 June 2013, a Windows Forensic Analysis and Registry Analysis combo course will be hosted at the Santa Cruz PD training facility.

Course descriptions and other info on the courses is available here.  Pricing for the combo course is $749 per seat, and will be listed on the ASI training page shortly.

Saturday, February 16, 2013

Java, Timelines, and Training

A lot of interesting things have gone on lately...and I'm sure that there's more to come.  I thought I'd take a moment to raise awareness of some of what happened recently.

Java In The News
Corey posted some links to tools the other day over on the jIIr blog.  In the blog post, he talks about IDX parsers...not long ago, there was a flurry of activity around Java deployment cache index (*.idx) files, and what was contained in them.  

You can find my IDX parsing code here.  I updated it recently to support a variant of version 605 index files that I happened to see (that's one of the things I love about open source....).

Okay, so what's the big deal with this stuff?  Is it just the "flavor-of-the-week" when it comes to DFIR analysis?  I wouldn't think so...in fact, if you do any sort of work with respect to malware or intrusion/compromise analysis, I would suggest to you that this is a pretty big deal.  It's likely that anyone doing any sort of DFIR analysis is going to run up against this at some point...you may not recognize it at first, but it's likely to be there.

Remember this graphic from the SANS Forensic blog post from last year?  Right in the middle of the super timeline, there are two *.idx files being created.  During a recent examination, I added the metadata extracted from *.idx files to my own timeline, because the infection routine started with Java being run.  I discuss that a little bit below.

Did anyone see this article recently?  Java exploit?  Yeah, apparently, that's what it was.  I'd bet that the systems that were examined had some pretty interesting *.idx files on them.

I would suspect that as awareness of these artifacts is raised, analysts will begin to recognize that it's now easier to perform a root cause analysis, and to determine the initial infection vector.  IMHO, the primary reason why this isn't done more often is because most assume that it takes too long to do an investigation; however, in not performing these investigations, we're missing out on a great deal of valuable information and intelligence that we can use to better protect our infrastructures.

Timeline Analysis
As I mentioned above, I used my own *.idx parser to add information from these files to a timeline...and this really helped my analysis.  

First, as I do sometimes, I parsed the data separately and created a mini-timeline from just the data from the *.idx files.  I do this because in a full timeline, many times that data just gets lost.  Now, I could have done something like creating the events file, and then pulling out the information for the *.idx files using the type command, and piping that through find.  Either way works.
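If you prefer the second approach, a minimal Python version of that filter is below; it assumes a pipe-delimited TLN events file, and the file names and the "JAVA_IDX" source tag are assumptions...use whatever your own parser actually emits:

# filter an events file (time|source|system|user|description) down to
# just the *.idx entries
with open("events.txt") as infile, open("idx_events.txt", "w") as outfile:
    for line in infile:
        if ".idx" in line.lower() or "JAVA_IDX" in line:
            outfile.write(line)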

Doing this showed me quite a bit about what the user likes to do during the day.  I think what was most revealing about this data was that it made clear how much Java is used on the Internet.  In a way, it was kind of fascinating, but at the same time, this didn't have much of anything to do with analysis goals.  The fact was that while my timeline showed modifications to several cache directories when the system was being infected,  what I did not see was the *.idx files that should have been part of the infection process.

One of the things I teach in my courses is that we need to understand artifacts so that we can not only recognize what's there, but also see when something that should be there isn't...and I had just such a case recently.  There were a couple of possibilities as to what I was seeing, and one thing I found was that the installed AV had detected the malware; it quarantined the malware (i.e., created a file in the Quarantine directory) but the AV logs also stated quite clearly that (a) it was unable to remove the files from their original location, and (b) the AV software could not alert the user (notifications were disabled).  Unfortunately, the logs did not go back all the way to the date of the infection, so I could not determine if the AV had actually detected the malware when it was installed, or if the detection I was seeing in the logs was a result of a product update that occurred after the infection.  However, I did see that a file with the extension ".idx" had been created in the Quarantine folder, and had since been deleted.  That wasn't definitive, but it was possible that the AV product had found something malicious in at least one *.idx file and had quarantined it.  This would definitely account for the modification that occurred to the cache folder during the infection routine.

So, from the data I was looking at, there were two possibilities...one was that the AV had detected the infection, and while it was able to remove some of the files, it was unable to remove the actual malware itself...and since most AV products do nothing about the malware persistence mechanisms, the malware remained active (as confirmed by other artifacts).

The other possibility was that during the infection process, the malware "cleaned up" behind itself.  After all, right there in the timeline, I could see that during the infection process, the Security Center was disabled, and the malware disabled the mechanism it used to gain access to the system.  This was a multiple-stage infection...it started with Java via IE, downloaded something that ran and created files on the system.  The second stage .exe file was deleted, but there were plenty of artifacts on the system that clearly showed not only that it had run, but also under what user context it had run.  And it was clear that the infection process had also closed off the first stage of the infection.

So, there were some interesting findings in the analysis...all of the above was determined within the first 6 hours of analysis.  As analysts, however, we're finders of fact, and we base those findings on the available data; the data that I did have available had me leaning more toward the first possibility than the second.  But the fact is that using these techniques, I had enough data to clearly identify not just the facts, but the gaps, as well.

Addendum: After my initial post, I saw this post at the KrebsOnSecurity blog, and this one on disabling Java for the browser on the Sophos site.  Within minutes, I had written a RegRipper plugin to extract the UseJava2IExplorer value.
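The plugin itself is Perl, but for anyone who wants to check the setting another way, here's a rough Python sketch using Willi Ballenthin's python-registry module.  The key path is an assumption based on the Sophos write-up (64-bit systems will also have a Wow6432Node copy), so treat it as a starting point:

from Registry import Registry  # python-registry

def check_java_iexplorer(software_hive):
    # report the UseJava2IExplorer value for each installed Java Plug-in version
    reg = Registry.Registry(software_hive)
    try:
        plugin = reg.open("JavaSoft\\Java Plug-in")
    except Registry.RegistryKeyNotFoundException:
        print("Java Plug-in key not found")
        return
    for subkey in plugin.subkeys():
        try:
            val = subkey.value("UseJava2IExplorer").value()
            print("%s: UseJava2IExplorer = %s" % (subkey.name(), val))
        except Registry.RegistryValueNotFoundException:
            print("%s: UseJava2IExplorer not set" % subkey.name())

check_java_iexplorer("SOFTWARE")   # path to an exported SOFTWARE hive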

Resources
ForensicsWiki Java Page
Mark Woan's format specification document

Training
Interested in Windows DFIR training?  Windows Forensic Analysis, 11-12 Mar; Timeline Analysis, 9-10 Apr. Pricing and Calendar. Send email here to register.  Each course includes access to tools and techniques that you won't find anywhere else, as well as a demonstration of the use of the Forensic Scanner.

On 10-12 June 2013, a Windows Forensic Analysis and Registry Analysis combo course will be hosted at the Santa Cruz PD training facility.

Course descriptions and other info on the courses is available here.  Pricing for the combo course is $749 per seat, and will be listed on the ASI training page shortly.

Wednesday, February 13, 2013

Hosted Training

On 10-12 June 2013, a Windows Forensic Analysis and Registry Analysis combo course will be hosted at the Santa Cruz PD training facility.

Course descriptions and other info on the courses is available here.  Pricing information for the combo course will be available on the ASI training page shortly.

If you are interested in registering for a seat in the training, please contact me here.  As the date of the course approaches, information regarding parking, what to bring, etc., will be provided.

Wednesday, February 06, 2013

There Are Four Lights: The Forensic Scanner

I made a push recently via social media to raise awareness about the Forensic Scanner, and based on some of what I saw come back, I'd like to take a moment to describe what the Forensic Scanner is, and perhaps clear up some misconceptions about the tool.

Just a quick reminder to everyone...in Nov, 2012, the Forensic Scanner moved from the Google Code site to this GitHub site.  If you're going to try the Forensic Scanner, make sure that you run it as Administrator...if you have an admin account and you still have UAC enabled, you won't have what you think are full Admin rights on the box.  Check out Corey's blog post on the topic.

First off, the Forensic Scanner is just a tool, nothing more.  Like any other tool, if you don't understand how it was designed to be used, you very likely won't be using it to its full capacity, or in its most effective manner.  Scanner applications have been used in various segments of infosec for quite some time.  When I did vulnerability assessments back in the late '90s, we used scanner products to do some of the heavy lifting.  Even today, there are scanners available for web app assessments, but the key point to remember is that these applications are not intended to replace analysts, or remove analysts from the picture.  Instead, they are intended to perform a wide range of repeatable tests, so that an analyst can review the results and then focus their analysis efforts.  This is also the intention of the Forensic Scanner.

The Forensic Scanner is:

A library of corporate knowledge/intel: An analyst may spend 8, 16, 24 hours or more in analysis and find something new.  Take this finding, for example.  One way to address a finding like this is for the analyst to keep it to himself...but that doesn't really help anyone, does it?  An alternative might be to hold a brown-bag lunch with other analysts, put together a PPT, and tell them what you found.  But how much more useful would it be to write a plugin, and share it with the other analysts?  Within a few minutes, other analysts would have full access to the capability (i.e., finding the issue, or not...) without ever having to have the same experiences as the first analyst.  On a team of eleven analysts, if it took 16 hours for the first analyst to find the issue, you've just saved the team, as a whole, 16 hrs x 10 analysts = 160 hours of time.  This time can mean a great deal to your customer, as you will be providing them with information they need to make critical business decisions in an extremely efficient manner.

By creating and maintaining plugins for these findings, the information is maintained in an accessible manner while the examiner who found the artifact is on vacation, or well after they left the organization.  With the proper oversight, the plugin won't simply have lines of code...it will include references and explanations, so that the findings are not only repeatable, but they can be easily understood and explained.

How are you at memorizing the paths to various web browser history files on different versions of Windows (i.e., XP vs Windows 7)?  How are you at mapping USB device usage?  Do you want to have that available at the push of a button?  That's what the Forensic Scanner can give you.
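As a small illustration of the kind of knowledge a plugin can encode, here are the well-known default locations of the IE index.dat history file (redirected profiles will obviously differ):

# default IE history (index.dat) locations by Windows version
IE_HISTORY_PATHS = {
    "XP/2003": r"%USERPROFILE%\Local Settings\History\History.IE5\index.dat",
    "Vista/7": r"%USERPROFILE%\AppData\Local\Microsoft\Windows"
               r"\History\History.IE5\index.dat",
}

for version, path in IE_HISTORY_PATHS.items():
    print("%-8s %s" % (version, path))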

A force multiplier: By looking back at your last engagement and creating or updating plugins based on your findings, and then providing them to the team, you've bridged the gap between a checklist and actually implementing the checklist.  This allows the experience of each analyst to be shared with others, which can lead to more work being done by the same number of analysts, in a much more efficient and timely manner.

A path to a competitive advantage: Analysts are going to find things that others either don't see, or haven't seen yet.  As such, writing a plugin that you keep private within your team can lead to providing better, more comprehensive results to your customers, in a more timely manner.  Based on the plugins you have in your library, you may be able to determine not only the malware that infected a system, but also the initial infection vector.  This means that you can provide not just findings, but intelligence that your customer can then use to protect themselves.

The Forensic Scanner is NOT intended to replace any of the current tools that you own and use.  Rather, the purpose of the Forensic Scanner is to augment and optimize your use of those tools that you already own, and get you to the point of deep engagement with those tools much sooner.

Deployment Scenarios
When I first had the idea for the Forensic Scanner, I had several deployment scenarios in mind, including the following:

Lab Tech: A lab tech receives an image, and as part of the verification and in-processing procedures, runs a scan of the mounted image.  The lab tech then contacts the designated analyst to let her know that the image and report are in a specific location, either in the "cloud", or to be retrieved in some other manner.  Rather than having to run through all of the checks herself, the examiner can review the report and focus her analysis faster, providing much more comprehensive and timely findings.

LE examiner: An LE examiner might be interested in P2P file sharing, and one of the biggest issues for LE (at all levels) is that the examiners are cops first.  This makes it very difficult to keep up on various analysis techniques and artifacts, but the Forensic Scanner puts that right at your fingertips.  Perhaps your cases involving illicit images don't require you to do much more than find and catalog the images, and you're done.  Or perhaps you need more...did the user actually access the images at any point?  Did the transfer of the files involve a USB storage device of any kind?  Was a digital camera or a smartphone connected to the system? 

Consultant: A consultant or even an IT security staff member can be on-site, performing triage and acquisitions.  They can run a report, and because the report contains no sensitive (PII, PHI, PCI, etc.) information, they can archive/protect the report, and ship it off to another analyst who is off-site, who can then perform analysis of the report.  The analyst can then respond to the on-site consultant, providing information that can then help them focus their efforts ("..acquire these 5 systems instead of all 300..."). 


Interested in Windows DFIR training?  Windows Forensic Analysis, 11-12 Mar; Timeline Analysis, 9-10 Apr. Pricing and Calendar. Send email here to register.  Each course includes access to tools and techniques that you won't find anywhere else, as well as a demonstration of the use of the Forensic Scanner.

Saturday, February 02, 2013

BinMode: Parsing Java *.idx files, pt. trois

Things have progressed a great deal since I last blogged on this subject.  Specifically, additional information and resources have been added to the ForensicsWiki page on this topic, and Brian has updated his Python code.  Mark Woan has created a .Net console application for parsing these files, as well, and his repo contains a PDF document that delineates the structure of the various versions of these files.

Running my own tool against the Java deployment cache on my system, I don't see much in the way of interesting data; most of what I have on this system is the result of accessing SANS webcasts.  However, parsing the data from the *.idx file that Corey provided, we see the following:


File: d:\cases\781da39f-6b6c0267.idx
URL: http://xhaito.com/work/builds/exp_files/rox.jar
IP: 91.213.217.31
content-length: 14226
last-modified: Sun Sep 12 15:15:32 2010 UTC

Server Response:
------------------------------
HTTP/1.1 200 OK
content-length: 14226
last-modified: Sun, 12 Sep 2010 15:15:32 GMT
content-type: text/plain
date: Sun, 12 Sep 2010 22:38:35 GMT
server: Apache/2
deploy-request-content-type: application/x-java-archive

The information displayed at the top of the output, above "Server Response", is from the header of the *.idx file, while the rest of the information is from Section 2 of the file.  For specifics of this data, take a look at the  PDF document that Mark provided.  Suffice to say, this is a great resource, because what you're seeing is extracted from the binary contents of the file.  Yes, the strings for the URL and IP address can be found via a text or keyword search, but an understanding of the data source and the data structure provides valuable context to the search hits.  Even better, a targeted, Sniper Forensics approach to going after the data is something that we can do now because of what we know about the data itself.

Okay...so what?  Now that we have this information available, how do we use it in exams?  Perhaps the most obvious would be to parse the contents of the *.idx files and check the output against the Malware Domain List, or "MDL".

Another method of analysis for this information would be to parse the data and correlate statistics from all of the available *.idx files (URL, IP address, content type, etc.), showing the stats as an overview before digging into the data itself.  Combining the two...MDL check and stats...would be a great way to perform data reduction.  One might incorporate checking against the MDL directly into a tool that parses the data within *.idx files for inclusion directly into a timeline, adding the pivot points directly to the timeline itself.  Incorporating this with other data...specifically, the user's web browser history...would allow an analyst to easily 'see' an Initial Infection Vector.
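As a rough sketch of combining the two ideas, the following Python snippet tallies hosts and IPs from parsed *.idx output and flags anything appearing on a locally saved copy of the MDL.  The CSV layout (file,url,ip) and the file names are assumptions...adjust them to match whatever your parser actually emits:

import csv
from collections import Counter
from urllib.parse import urlparse

with open("mdl_hosts.txt") as f:                      # saved MDL hostname list
    mdl_hosts = {line.strip().lower() for line in f if line.strip()}

url_counts, ip_counts, flagged = Counter(), Counter(), []
with open("idx_report.csv", newline="") as f:         # one row per *.idx file
    for idx_file, url, ip in csv.reader(f):
        host = urlparse(url).hostname or ""
        url_counts[host] += 1
        ip_counts[ip] += 1
        if host in mdl_hosts:
            flagged.append((idx_file, url, ip))

print("Top hosts:", url_counts.most_common(5))
print("Top IPs:  ", ip_counts.most_common(5))
for idx_file, url, ip in flagged:
    print("MDL hit:", idx_file, url, ip)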

For me, the first step is to incorporate this information into a timeline...

Addendum: I updated my code recently to provide more than the output that you see above.  The new version includes options for CSV or TLN output.  It also includes a heuristic to help detect potentially malicious Java archives, as opposed to those that may be legit.


Interested in Windows DFIR training?  Windows Forensic Analysis, 11-12 Mar; Timeline Analysis, 9-10 Apr. Pricing and Calendar. Send email here to register.