середа, 1 травня 2019 р.

SCRIPTS FOR TELEGRAM CHAT EXTRACTION


Maksym Boiko, mboiko25@gmail.com, Kyiv, 2019


After some requests I decided to attach some test scripts. I didn't do them for widespread use. All of them were for my particular cases. These codes were written in 2018. But in general, an internal structure of cache4.db remains the same. 
As for telegram-parse-enc.py, sometimes it has problems with transliteration and encoding. It is good for analyzing with unknown, poorly studied structure or raw data (for instance, in cases of searching within RAM dumps). 
I have to say that in my opinion this way of analyzing intact cache4.db is not optimal. It is better to analyze it as simple SQLite database and its fields. 
For these purposes I use telegram-sqlite-secret.pytelegram-sqlite-v7.py or telegram-sqlite-v7-without-chats.py. Briefly, they perform SQLite queries and then analyze data depending on whether it is chat, secret chat, channel or not. If my memory doesn't fail me, they have similar codes. Important moment - I specify database filename in the beginning of these scripts (for instance, ff='cache4.db-wal' or conn=sqlite3.connect('cache4.db')).
I hope scripts will be useful for you.

пʼятниця, 19 квітня 2019 р.

SOME PRACTICAL ASPECTS OF SQLITE RECORD STRUCTURE ON AN EXAMPLE OF TELEGRAM “CACHE4.DB” FILE



Maksym Boiko, mboiko25@gmail.com, Kyiv, 2019

The subject of study is deeper understanding of the SQLite3 database file on examples of the “cache4.db” file of Telegram for Android.
Software. The study used the following software: operating systems Windows 10, Microsoft Office 365 Home 16.0.10730.20102 (© 2012 Microsoft); sqlite 3.26.0 (https:/www.sqlite.org/copyright.html), SQLite Expert Personal 5.3.0.339 (© 2018 Coral Creek Software), AccessData FTK Imager 4.2.0.13 (© 2016 AccessData Group, Inc.), HxD Hex Editor 2.1.0.0 (© 2002-2018 Maël Hörz).

I’d like to present you a simplified algorithm of SQLite records’ structure analysis. I chose a “cache4.db” database file for this purpose. This file contains Telegram message-related data. It should be said some tools that specialized on mobile forensics doesn’t support an analyzing of secret chats of Telegram for Android (as well as in WAL-files).
The “cache4.db” file is a SQLite database. So it has an internal structure that is described and documented in detail (see the following link https://www.sqlite.org/fileformat2.html).
Every record from a SQLite table contains from 2 main parts: a header and a body. Also payload length and RowID are represented before a record header.
Usually header length consists of a single byte. Values for every column follow the header. If serial type is equal to 0, 8, 9, 12, 13, then a field’s length is 0 bytes. If serial type is equal to 1, 2, 3, 4, 5, 6, 7, then a field’s length can be 1, 2, 3, 4, 6 or 8 bytes.
To correctly interpret header serial types, we will use explanations that are represented below and available by link https://www.sqlite.org/fileformat2.html:

Table 1 Serial Type Codes Of The Record Format
Serial Type
Content Size
Meaning
0
0
Value is a NULL.
1
1
Value is an 8-bit twos-complement integer.
2
2
Value is a big-endian 16-bit twos-complement integer.
3
3
Value is a big-endian 24-bit twos-complement integer.
4
4
Value is a big-endian 32-bit twos-complement integer.
5
6
Value is a big-endian 48-bit twos-complement integer.
6
8
Value is a big-endian 64-bit twos-complement integer.
7
8
Value is a big-endian IEEE 754-2008 64-bit floating point number.
8
0
Value is the integer 0. (Only available for schema format 4 and higher.)
9
0
Value is the integer 1. (Only available for schema format 4 and higher.)
10,11
variable
Reserved for internal use. These serial type codes will never appear in a well-formed database file, but they might be used in transient and temporary database files that SQLite sometimes generates for its own use. The meanings of these codes can shift from one release of SQLite to the next.
N≥12 and even
(N-12)/2
Value is a BLOB that is (N-12)/2 bytes in length.
N≥13 and odd
(N-13)/2
Value is a string in the text encoding and (N-13)/2 bytes in length. The null terminator is not stored.

In cases of large BLOBs and strings, we need to convert serial type’s values for determining of fields’ length.
So, if the 1st more significant byte (N1) of some field is more than 0x80 (128 in decimal) then we need to get the next byte (N2) and to do some mathematical operations. To determine a length of particular field that is specified within record’s header, we need to perform following actions:

(if (N1-128)*128 + N2 is even)
or

(if (N1-128)*128 + N2 is odd)


Note: To calculate a payload length we only need to convert these values to decimal. In case of determining of a large payload’s length, we need to perform only following actions:

(N1-128)*128 + N2

Let’s imagine that we have extracted a Telegram “cache4.db” file from Android device. (see the following link for detailedinformation about location, typical file structure, etc.)
First of all to ease our task we need to determine a table with significant information, its schema, and only then we’ll be able to simply analyze records from early mentioned table we’re interested in. For instance, we’d like to search for message-related data in the “Messages” table.
We can find out a schema of this table in different ways – by using sqlite dot-commands, SQLite database viewers, HEX-viewer, or even from another similar database, etc. Knowing table’s schema, we can disassemble every record of certain table. For example, a schema of the “Messages” table has been received by using sqlite application and .schema dot-command as shown on the following figure 1:

CREATE TABLE messages(
mid INTEGER PRIMARY KEY,
uid INTEGER,
read_state INTEGER,
send_state INTEGER,
date INTEGER,
data BLOB,
out INTEGER,
ttl INTEGER,
media INTEGER,
replydata BLOB,
imp INTEGER,
mention INTEGER
);

Figure 1 “Messages” table’s schema

So, we can make a conclusion that our table has 12 columns with values that we can find by analyzing records’ headers and bodies in way mentioned below.
Let’s consider in details one of the records from the “Messages” table that is shown on the following figure 2. This record is presented in raw on the figure 3. We can see an internal structure of this record in the table 2 and on the following figures 4, 5, 6 as well as descriptions and interpretations of received values.

Figure 2 A record from “Messages” table

Figure 3 A record from “Messages” table (in raw)
   
Table 2 Record format
Offset
Length
Value
Interpretation
Description
1
0x7e83
2
0x827a
378
Payload length
2
0x7e85
1
0x26
38
RowID
3
0x7e86
1
0x0e
14
Record header length
4
0x7e87
1
0x00
NULL
mid
5
0x7e88
1
0x03
3 (24-bit integer)
uid length
6
0x7e89
1
0x01
1 (8-bit integer)
read_state length
7
0x7e8a
1
0x08
0
send_state
8
0x7e8b
1
0x04
4 (32-bit integer)
date length
9
0x7e8c
2
0x8554
356 (BLOB length)
data length
10
0x7e8e
1
0x08
0
out
11
0x7e8f
1
0x08
0
ttl
12
0x7e90
1
0x08
0
media
13
0x7e91
1
0x00
NULL
replydata
14
0x7e92
1
0x08
0
imp
15
0x7e93
1
0x08
0
mention
16
0x7e94
3
0x0bdb28
777000
uid value
17
0x7e97
1
0x03
3
read_state value
18
0x7e98
4
0x5c39f0c0
1547301056
date value
19
0x7e9c
356
Data (BLOB)
Message + metadata
Data (BLOB)



Figure 4 Payload length, RowID and record

Figure 5 Record header

Figure 6 Calculating of field’s length



At the end, it should be said that this topic is additional for previous articles about Telegram.