Wednesday, August 02, 2006
"Forensically Sound Duplicate"
I was reading Craig Ball's (excellent) presentations on computer forensics for lawyers (http://www.craigball.com/articles.html). One of the articles defines a forensically sound duplicate as:
"A 'forensically-sound' duplicate of a drive is, first and foremost, one created by a method which does not, in any way, alter any data on the drive being duplicated. Second, a forensically-sound duplicate must contain a copy of every bit, byte and sector of the source drive, including unallocated 'empty' space and slack space, precisely as such data appears on the source drive relative to the other data on the drive. Finally, a forensically-sound duplicate will not contain any data (except known filler characters) other than which was copied from the source drive."
There are 3 parts to this definition:
- Obtained by a method which does not, in any way, alter any data on the drive being duplicated
- That a forensically sound duplicate must contain a copy of every bit, byte and sector of the source drive
- That a forensically sound duplicate will not contain any data except filler characters (for bad areas of the media) other than that which was copied from the source media.
Picking this definition apart, the first thing I noticed (and subsequently emailed Craig about) was that the first part of the definition is often an ideal. Take, for instance, imaging RAM from a live system: the act of imaging a live system changes the RAM and consequently the data. The exception would be to use a hardware device that dumps RAM (see "A Hardware-Based Memory Acquisition Procedure for Digital Investigations" by Brian Carrier).
During the email discussions, Craig pointed out an important distinction between data alteration inherent in the acquisition process (e.g. running a program to image RAM requires the imaging program to be loaded into RAM, thereby modifying the evidence) and data alteration in an explicit manner (e.g. wiping the source evidence as it is being imaged). Remember, one of the fundamental components of digital forensics is the preservation of digital evidence.
A forensically sound duplicate should be acquired in a manner that minimizes the data alterations inherent to acquisition and does not explicitly alter the source evidence. Another way of wording this could be "an accurate representation of the source evidence". This wording is intentionally broad, allowing one to defend/explain how the acquisition was accurate.
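To make "accurate representation" concrete: in practice the claim is usually supported by comparing cryptographic hashes of the source and the duplicate. Here is a minimal sketch, assuming a Unix-like system where the source drive shows up as /dev/sdb (a hypothetical device name); conv=noerror,sync tells dd to keep going past read errors and to pad unreadable sectors with zero-filled blocks, i.e. the "known filler characters" from Craig's definition:

    # Image the drive; unreadable sectors are padded with zeros (filler characters).
    dd if=/dev/sdb of=evidence.dd bs=4096 conv=noerror,sync

    # On error-free media the two hashes should match, supporting the claim
    # of an "accurate representation". Any padded sectors must be documented.
    md5sum /dev/sdb evidence.dd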
The second part of the definition states that the duplicate should contain every bit, byte, and sector of the source evidence. Similar to the first part of the definition, this is also an ideal. When imaging a hard disk or other physical media, this part of the definition normally works well. But consider a scenario where a system with multiple terabytes of disk storage contains an executable file with malicious code. If the size of the disk (or some other technological restriction) prevents imaging every bit/byte/sector of the disk, then how should the contents of the file be analyzed if simply copying the contents of the file does not make it "forensically sound"? What about network-based evidence? According to the folks at the DFRWS 2001 conference (see the "Research Road Map" PDF), there are three "types" of digital forensic analysis that can be applied:
- Media analysis (your traditional file system style analysis)
- Code analysis (which can be further abstracted to content analysis)
- Network analysis (analyzing network data)
Since more and more of the latter two types of evidence are starting to come into play (e.g. the recent UBS trial with Keith Jones analyzing a logic bomb), a working definition of "forensically sound duplicate" shouldn't be restricted to just "media analysis". Perhaps this can be worded as "a complete representation of the source evidence". Again, intentionally broad so as to leave room for explanation of circumstances.
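For the multi-terabyte scenario above, a targeted logical copy of just the suspect file, with hashes recorded at acquisition time, may be the only practical option. A rough sketch (all paths and file names are hypothetical, and the source file system is assumed to be mounted read-only):

    # Hash the file in place, copy it, then verify the copy.
    md5sum /mnt/suspect_fs/payload.exe
    cp /mnt/suspect_fs/payload.exe ./payload.exe
    md5sum ./payload.exe    # should match the hash taken from the source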
The third part of the definition states that the duplicate will not contain any additional data (with the exception of filler characters) other than what was copied from the source medium. This part of the definition rules out "logical evidence containers", essentially any type of evidence file format that includes any type of metadata (e.g. pretty much anything "non-dd"). Compressing the image on the fly (e.g. dd piped to gzip piped to netcat) would also break this. Really, if the acquisition process introduces data not contained in the source evidence, the newly introduced data should be distinguishable from the duplication of the source evidence.
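One way to keep introduced data distinguishable is to hash the raw stream before it is compressed, so the gzip framing never mixes with the representation of the source evidence. Below is a sketch of the dd-to-gzip-to-netcat pipeline mentioned above (the host name and port are placeholders), using bash process substitution to hash the uncompressed stream in flight:

    # Hash the raw stream before compression so the gzip wrapper
    # (introduced data) stays distinguishable from the source evidence.
    dd if=/dev/sdb bs=4096 conv=noerror,sync \
      | tee >(md5sum > image.md5) \
      | gzip -c \
      | nc collector.example.com 9000

    # On the receiving end, decompress back to a raw image and re-hash:
    # nc -l -p 9000 | gunzip -c > evidence.dd && md5sum evidence.dd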
Now, beyond the 3 parts that Craig mentions, there are a few other things to examine. The first is which components of digital forensics a working definition of "forensically sound" should cover. Ideally, just the acquisition process. The analysis component of forensics, while driven by what was and was not acquired, should not be hindered by the definition of "forensically sound".
Another factor to consider is that a forensic exam should be neutral, and not "favor" one side or the other. This is for several reasons:
- Digital forensic science is a scientific discipline. Science should be as neutral as possible, introducing as little bias as possible; favoring one side or the other introduces bias.
- Oftentimes the analysis (and related conclusions) are used to support an argument for or against some theory. Not examining relevant information that could either prove or disprove a theory (i.e. inculpatory and exculpatory evidence) can lead to incorrect decisions.
So, the question of what data should and shouldn't be included in a forensically sound duplicate is hard to answer. Perhaps "all data that is relevant and reasonably believed to be relevant." The latter part could come into play when examining network traffic (especially on a large network). For instance, when monitoring a suspect on the network (sniffing traffic), if I create a filter to only log/extract traffic to and from the suspect's system, I am potentially missing other traffic on the network (sometimes this can even be legally required, as sniffing network traffic is considered a wiretap in many places). A definition of "forensically sound duplicate" shouldn't prevent this type of acquisition.
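For example, a capture restricted to the suspect's machine might look like the following (the interface, output file, and address are hypothetical); the filter expression deliberately excludes every other host on the segment:

    # Capture only packets to or from the suspect's machine; everything
    # else on the network is intentionally not collected.
    tcpdump -i eth0 -w suspect.pcap host 192.0.2.45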
So, working with some of what we have, here is perhaps a working definition (a base, if nothing else) for "forensically sound":
"A forensically sound duplicate is a complete and accurate representation
of the source evidence. A forensically sound duplicate is obtained in a
manner that may inherently (due to the acquistion tools, techniques, and
process) alter the source evidence, but does not explicitly alter the
source evidence. If data not directly contained in the source evidence is
included in the duplicate, then the introduced data must be
distinguishable from the representation of the source evidence. The use
of the term complete refers to the components of the source evidence that
are both relevant, and reasonably believed to be relevant."
At this point I'm interested in hearing comments/criticisms/etc. so as to improve this definition. If you aren't comfortable posting in a public forum, you can email me instead and I'll anonymize the contents :)
"A 'forensically-sound' duplicate of a drive is, first and foremost, one created by a method which does not, in any way, alter any data on the drive being duplicated. Second, a forensically-sound duplicate must contain a copy of every bit, byte and sector of the source drive, including unallocated 'empty' space and slack space, precisely as such data appears on the source drive relative to the other data on the drive. Finally, a forensically-sound duplicate will not contain any data (except known filler characters) other than which was copied from the source drive."
There are 3 parts to this definition:
- Obtained by a method which does not, in any way, alter any data on the drive being duplicated
- That a forensically sound duplicate must contain a copy of every bit, byte and sector of teh source drive
- That a forensically sound duplicate will not contain any data except filler characters (for bad areas of the media) other than that which was copied from the source media.
Picking this definition apart, the first thing I noticed (and subsequently emailed Craig about) was the fact that the first part of the definition is often an ideal. Take for instance imaging RAM from a live system. The act of imaging a live system changes the RAM and consequently the data. The exception would be to use a hardware device that dumps RAM (see "A Hardware-Based Memory Acquisition Procedure for Digital Investigations" by Brian Carrier.)
During the email discussions, Craig pointed out an important distinction between data alteraration inherent in the acquisition process (e.g. running a program to image RAM requires the imaging program to be loaded into RAM, thereby modifying the evidence) and data alteration in an explicit manner (e.g. wipe the source evidence as it is being imaged.) Remeber, one of the fundamental components of digital forensics is the preservation of digital evidence.
A forensically sound duplicate should be acquired in such a manner that the acquisition process minimizes the data alterations inherent to data acquisition, and not explicitly alter the source evidence. Another way of wording this could be "an accurate representation of the source evidence". This wording is intentionally broad, allowing one to defend/explain how the acquisition was accurate.
The second part of the definition states that the duplicate should contain every bit, byte, and sector of the source evidence. Similar to the first part of the definition, this is also an ideal. If imaging a hard disk or other physical media, then this part of the definition normally works well. Consider the scenario when a system with multiple terabytes of disk storage contains an executable file with malicious code. If the size of the disk (or other technological restriction) prevents imaging every bit/byte/sector of the disk, then how should the contents of the file be analyzed if simply copying the contents of the file does not make it "forensically sound"? What about network based evidence? According to the folks at the DFRWS 2001 conference (see the "Research Road Map pdf") there are 3 "types" of digital forensic analysis that can be applied:
- Media analysis (your traditional file system style analysis)
- Code analysis (which can be further abstracted to content analysis)
- Network analysis (analyzing network data)
Since more and more of the latter two types of evidence are starting to come into play (e.g. the recent UBS trial with Keith Jones analyzing a logic bomb), a working definiton of "forensically sound duplicate" shouldn't be restricted to just "media analysis". Perhaps this can be worded as "a complete representation of the source evidence". Again, intentionally broad so as to leave room for explanation of circumstances.
The third part of the definition states that the duplicate will not contain any additional data (with the exception of filler characters) other than what was copied from the source medium. This part of the definition rules out "logical evidence containers", essentially any type of evidence file format that includes any type of metadata (e.g. pretty much anything "non dd".) Also compressing the image of evidence on-the-fly (e.g. dd piped to gzip piped to netcat) would break this. Really, if the acquisition process introduces data not contained in the source evidence, the newly introduced data should be distinguishable from the duplication of the source evidence.
Now beyond the 3 parts that Craig mentions, there are a few other things to examine. First of all is what components of digital forensics should a working definition of "forensically sound" cover? Ideally just the acquisition process. The analysis component of forensics, while driven by what was and was not acquired, should not be hindered by the definition of "forensically sound".
Another fact to consider is that a forensic exam should be neutral, and not "favor" one side or the other. This is for several reasons:
- Digital forensic science is a scientific discipline. Science is ideally as neutral as possible (introducing as little bias as possible). Favoring one side or the other introduces bias.
- Often times the analysis (and related conclusions) are used to support an argument for or against some theory. Not examining relevant information that could either prove or disprove a theory (e.g. inculpatory and exculpatory evidence) can lead to incorrect decisions.
So, the question as to what data should and shouldn't be included in a forensically sound duplicate is hard to define. Perhaps "all data that is relevant and reasonably believed to be relevant." The latter part could come into play when examining network traffic (especially on a large network). For instance, when monitoring a suspect on the network (sniffing traffic) and I create a filter to only log/extract traffic to and from a system the suspect is on, I am potentially missing other traffic on the network (sometimes this can even be legally required as sniffing network traffic is considered in many places a wiretap). A definition of "forensically sound duplicate" shouldn't prevent this type of acquisition.
So, working with some of what we have, here is perhaps a (base if nothing else) working defintion for "forensically sound":
"A forensically sound duplicate is a complete and accurate representation
of the source evidence. A forensically sound duplicate is obtained in a
manner that may inherently (due to the acquistion tools, techniques, and
process) alter the source evidence, but does not explicitly alter the
source evidence. If data not directly contained in the source evidence is
included in the duplicate, then the introduced data must be
distinguishable from the representation of the source evidence. The use
of the term complete refers to the components of the source evidence that
are both relevant, and reasonably believed to be relevant."
At this point I'm interested in hearing comments/criticisms/etc. so as to improve this defintion. If you aren't comfortable posting in a public forum, you can email me instead and I'll anonymize the contents :)