Digital Forensics: 2020

Maksym Boiko, mboiko25@gmail.com, Kyiv, 2020

Some UFED 4PC extractions are saved as multi-volume ZIP archives. To verify such an extraction, you need to calculate the checksum of the whole ZIP file produced by the UFED 4PC software during export.

The following script calculates the SHA-256 checksum of each file separately and of the entire block of files whose extension starts with the character "z" (for example .z01, .z02, .zip). The hash value of the entire block of files is written on the last line.
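The hash of the "entire block" can be pictured as the hash of all volumes read back to back. The sketch below is only an illustration of that idea, not the script itself: the function name `combined_sha256` and the chunk size are assumptions, and the caller is assumed to pass the volumes in the order in which they should be joined.

```python
import hashlib


def combined_sha256(paths, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the given files read back to back."""
    digest = hashlib.sha256()
    for path in paths:
        with open(path, 'rb') as fh:
            # Read in blocks so that large volumes do not have to fit in memory.
            for chunk in iter(lambda: fh.read(chunk_size), b''):
                digest.update(chunk)
    return digest.hexdigest()
```

Because one hash object is updated with every volume in turn, the result equals the SHA-256 of a single file made by concatenating the volumes in the same order. Changing the order changes the digest.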

The script writes the per-file SHA-256 checksums in a format compatible with HashCheck, TeraCopy, etc.

You should specify two parameters on the command line: 1) the full path to the data export folder, 2) the relative or full path to the file that will receive the computed SHA-256 values:

python.exe ufed-sha256.py <path to data folder> <path to file>

Example:

python.exe ufed-sha256.py "D:\UFED 2020_07_07 (001)\FileSystem Android Backup 01" D:\checksums.sha256

import hashlib
import os
import sys
from datetime import datetime

CHUNK = 1024 * 1024  # read files in 1 MiB blocks instead of loading them whole

dir_ufed = sys.argv[1]  # full path to the data export folder
ff = sys.argv[2]        # path to the file that receives the checksums

t1 = datetime.now()
print()
print(t1, '\n')

sha256_all = hashlib.sha256()  # running hash over the entire block of files
with open(ff, 'w') as f1:
    for top, dirs, files in os.walk(dir_ufed):
        dirs.sort()  # visit subfolders in a fixed order
        for nm in sorted(files):  # fixed order: .z01, .z02, ... come before .zip
            f = os.path.join(top, nm)  # top - directory path, nm - file name
            if f[-3:-2] != 'z':  # keep only extensions starting with "z"
                continue
            sha256 = hashlib.sha256()
            with open(f, 'rb') as volume:
                for block in iter(lambda: volume.read(CHUNK), b''):
                    sha256.update(block)      # hash of this file
                    sha256_all.update(block)  # hash of all files together
            # f[3:] drops the three-character drive prefix of a full Windows path
            ln = sha256.hexdigest() + ' *' + f[3:]
            print(ln)
            f1.write(ln + '\n')

    t2 = datetime.now()
    print()
    print(t2)
    print(t2 - t1, '\n')
    ln = '[SHA256]=' + sha256_all.hexdigest()
    print(ln)
    f1.write(ln)
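The checksum file can later be checked with HashCheck or a similar tool. For illustration only, the sketch below re-reads such a file in Python and reports the files whose hash no longer matches. The names `parse_checksum_line` and `verify_checksums` are hypothetical, and `root` is assumed to be the location against which the stored paths resolve (the drive root, since the script removes the first three characters of each full path).

```python
import hashlib
import os


def parse_checksum_line(line):
    """Split a 'HEXDIGEST *path' line; return None for any other line."""
    digest, sep, path = line.rstrip('\r\n').partition(' *')
    if not sep or len(digest) != 64:
        return None
    return digest.lower(), path


def verify_checksums(checksum_file, root):
    """Return the listed paths whose current SHA-256 differs from the record."""
    mismatched = []
    with open(checksum_file) as fh:
        for line in fh:
            entry = parse_checksum_line(line)
            if entry is None:
                continue  # e.g. the trailing "[SHA256]=..." summary line
            expected, rel_path = entry
            sha256 = hashlib.sha256()
            with open(os.path.join(root, rel_path), 'rb') as data:
                # Hash the file in 1 MiB blocks.
                for chunk in iter(lambda: data.read(1024 * 1024), b''):
                    sha256.update(chunk)
            if sha256.hexdigest() != expected:
                mismatched.append(rel_path)
    return mismatched
```

An empty list means every listed file still matches its recorded checksum. The summary line with the hash of the whole block is skipped here and would have to be recomputed separately.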

Maksym Boiko, mboiko25@gmail.com, Kyiv, 2020

Remark.

Revision Save IDs (RSIDs) are an interesting feature of OOXML files. These identifiers can help an investigator compare several Microsoft Office documents in more depth and identify the parent document of a piece of text.
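As a rough illustration of such a comparison, the sketch below collects the RSID values listed in the `word/settings.xml` part of two .docx files and returns the values they have in common. The function names `read_rsids` and `shared_rsids` are hypothetical, and the sketch assumes both files contain a settings part. Shared values are only a hint of common origin and need to be interpreted together with other evidence.

```python
import xml.etree.ElementTree as ET
import zipfile

# Main WordprocessingML namespace used for the w: prefix.
W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'


def read_rsids(docx_path):
    """Return the set of <w:rsid> values found in word/settings.xml."""
    with zipfile.ZipFile(docx_path) as docx:
        root = ET.fromstring(docx.read('word/settings.xml'))
    rsids = set()
    # Each <w:rsid w:val="..."> element records one revision save ID.
    for element in root.iter('{%s}rsid' % W_NS):
        value = element.get('{%s}val' % W_NS)
        if value:
            rsids.add(value.upper())
    return rsids


def shared_rsids(docx_a, docx_b):
    """Return the RSID values that appear in both documents."""
    return read_rsids(docx_a) & read_rsids(docx_b)
```

RSIDs also appear as attributes (such as `w:rsidR`) on paragraphs and runs in `word/document.xml`; this sketch does not read them.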

Few works have been published on this topic. Perhaps the best one is “Forensic Analysis of OOXML Documents” (E. Didriksen). However, to understand all the subtleties of the WordprocessingML document structure, you need to read the relevant parts of “Standard ECMA-376 Office Open XML File Formats”. Of course, you also need to examine many OOXML files yourself.

I hope I will have more time to present a more comprehensive work with practical cases in the future.

Особливості документів WordprocessingML.pdf (in Ukrainian; the title translates as "Features of WordprocessingML Documents")

WordprocessingML-Boiko.pdf (eng.)