Maksym Boiko, mboiko25@gmail.com, Kyiv, 2019
The subject of study is deeper understanding of the SQLite3 database file
on examples of the “cache4.db” file of Telegram for Android.
Software. The study used the following software: operating
systems Windows 10, Microsoft Office 365 Home 16.0.10730.20102 (© 2012
Microsoft); sqlite 3.26.0 (https:/www.sqlite.org/copyright.html), SQLite Expert
Personal 5.3.0.339 (© 2018 Coral Creek Software), AccessData FTK Imager
4.2.0.13 (© 2016 AccessData Group, Inc.), HxD Hex Editor 2.1.0.0 (© 2002-2018
Maël Hörz).
I’d like to present you a simplified algorithm of SQLite
records’ structure analysis. I chose a “cache4.db” database file for this purpose. This file contains Telegram
message-related data. It should be said some tools that specialized on mobile
forensics doesn’t support an analyzing of secret chats of Telegram for Android
(as well as in WAL-files).
Every record from
a SQLite table contains from 2 main parts: a header and a body. Also payload
length and RowID are represented before a record header.
Usually
header length consists of a single byte. Values for every column follow the header.
If serial type is equal to 0, 8, 9, 12, 13, then a field’s length is 0 bytes.
If serial type is equal to 1, 2, 3, 4, 5, 6, 7, then a field’s length can be 1,
2, 3, 4, 6 or 8 bytes.
Table 1 Serial Type Codes Of
The Record Format
Serial Type
|
Content Size
|
Meaning
|
0
|
0
|
Value is a NULL.
|
1
|
1
|
Value is an 8-bit twos-complement integer.
|
2
|
2
|
Value is a big-endian 16-bit twos-complement
integer.
|
3
|
3
|
Value is a big-endian 24-bit twos-complement
integer.
|
4
|
4
|
Value is a big-endian 32-bit twos-complement
integer.
|
5
|
6
|
Value is a big-endian 48-bit twos-complement
integer.
|
6
|
8
|
Value is a big-endian 64-bit twos-complement
integer.
|
7
|
8
|
Value is a big-endian IEEE 754-2008 64-bit
floating point number.
|
8
|
0
|
Value is the integer 0. (Only available
for schema format 4 and higher.)
|
9
|
0
|
Value is the integer 1. (Only available
for schema format 4 and higher.)
|
10,11
|
variable
|
Reserved for
internal use. These serial type codes will never appear in a well-formed
database file, but they might be used in transient and temporary database
files that SQLite sometimes generates for its own use. The meanings of these
codes can shift from one release of SQLite to the next.
|
N≥12 and
even
|
(N-12)/2
|
Value is a BLOB that is (N-12)/2 bytes in
length.
|
N≥13 and
odd
|
(N-13)/2
|
Value is a string in the text encoding and (N-13)/2 bytes in length. The null
terminator is not stored.
|
In cases of
large BLOBs and strings, we need to convert serial type’s values for determining of fields’
length.
So, if the 1st
more significant byte (N1) of some field is more than 0x80 (128 in
decimal) then we need to get the next byte (N2) and to do some
mathematical operations. To determine a length of particular field that is
specified within record’s header, we need to perform following actions:
|
(if (N1-128)*128 + N2 is even)
|
or
|
|
|
(if (N1-128)*128 + N2 is odd)
|
Note: To calculate a payload length we only need to convert these values to
decimal. In case of determining of a
large payload’s length, we need to perform only following actions:
(N1-128)*128 + N2
First of all to ease our task we need to determine a
table with significant information, its schema, and only then we’ll be able to
simply analyze records from early mentioned table we’re interested in. For
instance, we’d like to search for message-related data in the “Messages” table.
We can find out a schema of this table in
different ways – by using sqlite dot-commands, SQLite database viewers,
HEX-viewer, or even from another similar database, etc. Knowing table’s schema,
we can disassemble every record of certain table. For example, a schema of the
“Messages” table has been received by using sqlite application and .schema
dot-command as shown on the following figure 1:
CREATE TABLE messages(
mid INTEGER PRIMARY KEY,
uid INTEGER,
read_state INTEGER,
send_state INTEGER,
date INTEGER,
data BLOB,
out INTEGER,
ttl INTEGER,
media INTEGER,
replydata BLOB,
imp INTEGER,
mention INTEGER
);
Figure
1 “Messages” table’s schema
So, we can make
a conclusion that our table has 12 columns with values that we can find by
analyzing records’ headers and bodies in way mentioned below.
Let’s
consider in details one of the records from the “Messages” table that is shown
on the following figure 2. This record is presented in raw on the figure 3. We
can see an internal structure of this record in the table 2 and on the
following figures 4, 5, 6 as well as descriptions and interpretations of
received values.
Figure 2 A record from “Messages” table
Figure 3 A record from “Messages” table (in raw)
Table 2 Record
format
№
|
Offset
|
Length
|
Value
|
Interpretation
|
Description
|
1
|
0x7e83
|
2
|
0x827a
|
378
|
Payload length
|
2
|
0x7e85
|
1
|
0x26
|
38
|
RowID
|
3
|
0x7e86
|
1
|
0x0e
|
14
|
Record header length
|
4
|
0x7e87
|
1
|
0x00
|
NULL
|
mid
|
5
|
0x7e88
|
1
|
0x03
|
3 (24-bit integer)
|
uid length
|
6
|
0x7e89
|
1
|
0x01
|
1 (8-bit integer)
|
read_state length
|
7
|
0x7e8a
|
1
|
0x08
|
0
|
send_state
|
8
|
0x7e8b
|
1
|
0x04
|
4 (32-bit integer)
|
date length
|
9
|
0x7e8c
|
2
|
0x8554
|
356 (BLOB length)
|
data length
|
10
|
0x7e8e
|
1
|
0x08
|
0
|
out
|
11
|
0x7e8f
|
1
|
0x08
|
0
|
ttl
|
12
|
0x7e90
|
1
|
0x08
|
0
|
media
|
13
|
0x7e91
|
1
|
0x00
|
NULL
|
replydata
|
14
|
0x7e92
|
1
|
0x08
|
0
|
imp
|
15
|
0x7e93
|
1
|
0x08
|
0
|
mention
|
16
|
0x7e94
|
3
|
0x0bdb28
|
777000
|
uid value
|
17
|
0x7e97
|
1
|
0x03
|
3
|
read_state value
|
18
|
0x7e98
|
4
|
0x5c39f0c0
|
1547301056
|
date value
|
19
|
0x7e9c
|
356
|
Data (BLOB)
|
Message +
metadata
|
Data (BLOB)
|
Figure 4 Payload length, RowID and record
Figure 5 Record header
Figure 6 Calculating of field’s length