Jekyll2026-04-11T10:27:58+00:00https://devos50.github.io/feed.xmlblankMy personal website. Emulating an iPod Touch 2G using QEMU2023-12-25T00:00:00+00:002023-12-25T00:00:00+00:00https://devos50.github.io/blog/2023/ipod-touch-2g-qemu<![CDATA[

In a previous blog post, I described how I managed to get an iPod Touch 1G up and running using the QEMU emulator. While I’m very happy that the emulator runs smoothly overall, its functionality is limited to some stock apps, not all of which are fully functional. Moreover, I received a few questions on whether it would be possible to run third-party apps. Unfortunately, the iPod Touch 1G (and iPhoneOS 1.0) do not come with the App Store or an SDK; thus, the amount of third-party apps that can run on this particular device and OS is limited.

Therefore, I decided to shift focus to emulating an iPod Touch 2G, arguably one of the most popular and iconic iPod Touch devices. This device was also my first Apple product, and it motivated me to pursue iOS development. I targeted iOS 2.1.1, the lowest version of iOS that this device can run. This time, I started by emulating the bootrom and working towards userland emulation and executing SpringBoard, hoping to re-use most of the emulated hardware devices from the iPod Touch 1G. In this blog post, I will describe some challenges I encountered, architectural differences between the iPod Touch 2G and the iPod Touch 1G, and my further plans for emulation. All source code and instructions on how to run the emulator can be found in this GitHub repository.

For reverse engineering and understanding the components, I gratefully used this device tree dump, which has been tremendously helpful.

iPod Touch 2G Schematic

For reference, I created the schematic below to show some of the hardware components this device uses, with related elements grouped together. The schematic is still incomplete, since there are components I have yet to look at, such as those related to video encoding/decoding. I do have some QEMU logic for all components in this diagram, though, and I believe it highlights how complicated these kinds of embedded devices are from an architectural perspective.

The First Steps with SecureROM and LLB

I started by emulating the 240.4 SecureROM bootrom, which I downloaded from here. Similar to the iPod Touch 1G, the iPod Touch 2G uses the ARMv6 instruction set, so fortunately, I didn’t have to make significant changes to the emulated CPU in QEMU. The bootrom puts the device in DFU mode when a particular button combination is pressed or loads the low-level bootloader (LLB) stored on the NOR memory. Here, things started to be different from the iPod Touch 1G.

The first difference is that the iPod Touch 2G has NOR memory accessed through the SPI controller. In comparison, the iPod Touch 1G used CFI flash, the functionality of which was already provided by QEMU. Therefore, I had to reverse engineer the communication protocol with the NOR, which was relatively easy as this protocol is rather straightforward. Its implementation can be found here.
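To give a feel for what such a protocol can look like, here is a minimal sketch of a byte-at-a-time SPI NOR model in C. It assumes the conventional SPI flash opcodes (0x03 for READ followed by a 24-bit address, 0x05 for reading the status register); whether the iPod Touch 2G NOR uses exactly these opcodes is an assumption, and the names nor_model, nor_select and nor_transfer are hypothetical and not taken from the emulator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NOR_CMD_READ        0x03 /* conventional SPI flash READ opcode (assumed) */
#define NOR_CMD_READ_STATUS 0x05 /* conventional read-status opcode (assumed) */

typedef enum { NOR_IDLE, NOR_ADDR, NOR_DATA, NOR_STATUS } nor_state;

typedef struct {
    const uint8_t *data;   /* backing NOR contents */
    size_t size;           /* size of the backing contents in bytes */
    nor_state state;       /* where we are in the current command */
    uint32_t addr;         /* current read address */
    int addr_bytes_left;   /* address bytes still expected after READ */
} nor_model;

/* Called when chip select is asserted: a new command starts. */
static void nor_select(nor_model *nor)
{
    nor->state = NOR_IDLE;
    nor->addr = 0;
    nor->addr_bytes_left = 0;
}

/* Exchange one byte: consume the byte sent by the host, return the reply. */
static uint8_t nor_transfer(nor_model *nor, uint8_t in)
{
    assert(nor != NULL && nor->data != NULL && nor->size > 0);
    switch (nor->state) {
    case NOR_IDLE:
        if (in == NOR_CMD_READ) {
            nor->state = NOR_ADDR;
            nor->addr_bytes_left = 3;
        } else if (in == NOR_CMD_READ_STATUS) {
            nor->state = NOR_STATUS;
        }
        return 0;
    case NOR_ADDR:
        /* Shift in the 24-bit address, most significant byte first. */
        nor->addr = (nor->addr << 8) | in;
        if (--nor->addr_bytes_left == 0) {
            nor->state = NOR_DATA;
        }
        return 0;
    case NOR_DATA:
        /* Stream out consecutive bytes, wrapping at the end of the array. */
        return nor->data[nor->addr++ % nor->size];
    case NOR_STATUS:
        return 0; /* this toy device is never busy */
    }
    return 0;
}
```

In this sketch, the host sends 0x03 and three address bytes, and every further byte it clocks out returns the next byte of the NOR contents. A real device model would additionally handle writes, erases, and identification commands.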

The second, more challenging difference is that the iPod Touch 2G uses the more secure IMG3 file format (instead of IMG2) to store binaries on the NOR/NAND. Luckily, IMG3 has been well-documented, but it involves quite a few cryptographic operations, including RSA signatures and hashes to guarantee authenticity and integrity. Since I aim to get an emulator as close to the hardware as possible, I wanted to avoid patching the binaries to turn off signature verification. Therefore, I had to get the PKE engine up and running, which is responsible for the modular arithmetic used when verifying RSA signatures. I computed the resulting numbers using OpenSSL’s BIGNUM functionality. Even though my implementation is a bit hacky, it seems to pass the signature verification of all IMG3 images that are loaded.
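The core operation behind RSA signature verification is modular exponentiation: the signature raised to the public exponent, modulo the public modulus, must give back the expected digest. The following is a minimal sketch of that arithmetic with toy-sized numbers, not the emulator's PKE code. The function names are hypothetical, the modulus is limited to 32 bits so that plain 64-bit integers suffice, and real verification works on much larger numbers (hence a bignum library such as OpenSSL's BIGNUM) and also checks the signature padding.

```c
#include <assert.h>
#include <stdint.h>

/* Compute (base ^ exp) mod n with square-and-multiply.
 * n must fit in 32 bits so every intermediate product fits in 64 bits. */
static uint64_t modexp_toy(uint64_t base, uint64_t exp, uint64_t n)
{
    assert(n > 0 && n <= UINT32_MAX);
    uint64_t result = 1 % n;
    base %= n;
    while (exp > 0) {
        if (exp & 1) {
            result = (result * base) % n;  /* multiply step for a set bit */
        }
        base = (base * base) % n;          /* square step for every bit */
        exp >>= 1;
    }
    return result;
}

/* Textbook RSA check without padding: sig^e mod n must equal the digest. */
static int rsa_verify_toy(uint64_t sig, uint64_t e, uint64_t n, uint64_t digest)
{
    return modexp_toy(sig, e, n) == digest % n;
}
```

With the textbook key n = 3233 (61 × 53), e = 17 and d = 2753, a signature computed as modexp_toy(65, 2753, 3233) passes rsa_verify_toy with digest 65.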

Some IMG3 images also rely on the UID key, a key unique to each Apple device and fused into the application processor during manufacturing. Since I don’t have access to this key, I generated one myself and used that one when storing IMG3 files on the NOR/NAND. I then define the same key in the AES engine.

I re-used many iPod Touch 1G emulator hardware components, including the timers, clocks, and vector interrupt controllers. It was straightforward to get past LLB as its main responsibility was initializing hardware components and loading iBoot, the main bootloader.

iBoot and the NAND

For several months, I was stuck in various functions of iBoot. The first challenge came from the LCD, which uses a different underlying communication protocol (MIPI-DSIM) than the iPod Touch 1G. The second challenge, which was much more difficult, was to get the NAND working, from which iBoot loads the kernel image. The NAND in the iPod Touch 2G has two main differences from the iPod Touch 1G. First, the NAND driver uses a different communication protocol, referred to as FMSS in the kernel. Second, there were some significant differences in the VFL/FTL specifications, so I had to spend quite some time understanding the mapping function that translates a logical block number to a physical one.

In the iPod Touch 1G and iPhoneOS 1.0, I could easily pass boot args to the kernel. However, Apple has turned off this functionality in production mode for the iPod Touch 2G. Therefore, I had to manually write this string to the appropriate location in kernel memory. There are probably better solutions than this, but some boot flags were necessary to set to get debug output printed to the terminal.

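/* Boot arguments that enable verbose kernel, PMU, I/O and USB debug output. */
const char *boot_args = "kextlog=0xfff debug=0x8 cpus=1 rd=disk0s1 serial=1 pmu-debug=0x1 io=0xffff8fff debug-usb=0xffffffff";
/* Write the string, including its NUL terminator, to the boot-args location in kernel memory. */
cpu_physical_memory_write(0x0ff2a584, boot_args, strlen(boot_args) + 1);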

I also spent quite some effort getting past the kernel image’s signature verification. For some reason, my decryption algorithm does not correctly decrypt the final block when the length of the data is not a multiple of the key size, causing the CRC computation to fail. I am still unsure what I am doing wrong here, but to work around this, I patched the expected CRC code with the code I get during decryption. This workaround correctly validates the kernel image but should still be revised.

Booting the Kernel and SpringBoard

The kernel boots most of the hardware components defined in the device tree. I did not have too many difficulties getting past the initialization of the remaining hardware components. IOKit, Apple’s framework for implementing drivers, is very well-structured, and its debug output makes it easy to identify when and where a driver gets stuck.

When the kernel has booted, it launches launchd, which executes different system daemons, including SpringBoard. The configuration of most of these daemons is located in /System/Library/LaunchDaemons. The iPod Touch 2G has many different daemons, including ones related to media, DNS, FairPlay, and iTunes. To simplify things, I only enabled the necessary daemons to get SpringBoard operational and disabled all others for the time being.

I was very happy that the multitouch driver worked out of the box since I spent quite some time getting it functional for the iPod Touch 1G. With that, I now have a basic iPod Touch 2G emulator!

Working towards WiFi

After SpringBoard became operational, I wanted to know how difficult it would be to get the WiFi controller functional. The iPod Touch 2G uses a Broadcom BCM4325, and one can communicate with that device through the SDIO controller. This driver is included in the kernel as a kernel module. I noticed that the kernel would not automatically load the BCM4325 kernel module, and to date, I’m still trying to figure out where and how exactly this module is loaded. To work around this issue, I patched the kernel and added some instructions to load this kernel module manually. Again, it’s not ideal, but at least these drivers are loaded now.

The SDIO standard is well-documented, and it took little time to get that part up and running. The BCM4325 hardware is more complicated, and the iOS kernel uploads a Linux kernel image to the device as part of the boot procedure. The hardware also has many registers, and their content is unclear. However, with some hacking, I could get past the initialization procedure, as seen in the screenshot below. The initialization procedure sometimes hangs, probably because of race conditions or other timing issues. However, full WiFi emulation is far from functional as basic primitives such as SSID scanning or connecting to a network are still not implemented.

Next Steps

I enjoyed working on this next step in emulating legacy Apple devices and improving my understanding of the hardware components in the iPod Touch 2G. I believe the iPod Touch 2G is more stable and easier to work with than the iPod Touch 1G and has quite some potential since it can natively run third-party applications. Below, I list some follow-up steps that would improve the usability of the emulator.

  • Running bash would make debugging processes, executing applications, and collecting statistics from the device much easier. I have yet to look into this.
  • I would also like to make installing third-party apps on the emulator much easier, which now requires a full rebuild of the NAND memory.
  • Some compilation issues need to be resolved for the emulator to run on other platforms, including Windows and Linux.
  • NAND persistence still needs to be fixed, e.g., writes to the NAND are not stored across sessions.

I’m also happy to announce that I will give a talk about this project at FOSDEM’24 in Brussels (see here). Hope to see you there!

As always, please let me know what you think by opening an issue on the GitHub repository.

]]><![CDATA[In a previous blog post, I described how I managed to get an iPod Touch 1G up and running using the QEMU emulator. While I’m very happy that the emulator runs smoothly overall, its functionality is limited to some stock apps, not all of which are fully functional. Moreover, I received a few questions on whether it would be possible to run third-party apps. Unfortunately, the iPod Touch 1G (and iPhoneOS 1.0) do not come with the App Store or an SDK; thus, the amount of third-party apps that can run on this particular device and OS is limited.]]>Censorship-Resistant Indexing and Search for Web32023-06-26T00:00:00+00:002023-06-26T00:00:00+00:00https://devos50.github.io/blog/2023/descan<![CDATA[

This blog post has been co-written by Georgy Ishmaev. This work is funded by the Ethereum Foundation.

Depending on who you ask, Web3 is experiencing explosive growth or still has to live up to the expectations. But either way, the decentralized apps we have now already have real users. And with these users come difficult questions of usability and, yes, inevitable design compromises that sacrifice something to get this usability now. Sometimes these compromises seem to contradict the very ideals of Web3, such as decentralization, privacy, and censorship resistance. This is almost a philosophical question. But if Web3 is the same as Web2 on these metrics, we might as well stop trying to make it. Thankfully, there seems to be some mild social consensus and acknowledgment of this thesis among parts of the blockchain community, such as the Ethereum ecosystem.

To no small degree, this acknowledgment was stimulated by the sad and disturbing developments around Tornado Cash (TC) last year, which reverberated through the community. The subsequent censorship of TC transactions by Flashbots relays, which for a brief moment became a centralized bottleneck, reminded everyone why it may not be the best idea to create critical dependencies on a trusted party in a permissionless environment.

So for Web3 (or whatever alternative to the current web) to have a winning chance, we must work consistently towards decentralization on different layers and across a broad range of use cases. Layer 1 solutions such as Ethereum provide a map showing not only the problematic points of the decentralization process but also positive examples of how we can make progress. For example, comparing the progress in permissionless consensus algorithms with decentralized storage layer solutions, we can see that the latter needs to catch up. Imagine if we get all kinds of decentralized apps, only for them to store user data on AWS; not fun at all. Hopefully, we will eventually get to the point where we have not only permissionless, trustless Dapps execution but also permissionless, trustless data storage. Imagine, it is already almost there. Great, now we can enjoy it all: “Just need to search for the address of this new Dapp in Google.”

Wait a minute; this is not how it was supposed to be at all! So yes, we need to consider not just the decentralization of different layers but also the decentralization of enabling services on the application level, which will be critical for the usability of Web3. Search engines and indexers are essential applications for the web now and will likely remain so for the foreseeable future. But if the current state of affairs is any indication, we are already becoming dependent on centralized blockchain explorers to search for addresses and transactions, or sometimes to check an asset portfolio. While some say this is not a big deal (as frontends can always be replaced), the problem runs deeper. Proper chain indexing is a serious, resource-heavy business that, at the moment, tends to favor centralized providers. Even in the best-case scenario, these providers will still be vulnerable to legal pressure. And in the much worse scenario, indexing will be done by dubious chain analytics companies who already peddle their services to DeFi frontends under the label of “digital asset compliance & risk management.”

So having a functioning decentralized search and indexing solution for Web3 is not an option but a critical necessity. The good thing is that we already see some components of these critical architectures being deployed and tested. Honorable mention goes to projects like Trueblocks, the Rotki app, and other solutions based on local-first principles. Indexing solutions, such as The Graph, try to experiment with token-based business models. However, it is safe to say that the design space in Web3 search and indexing solutions is far from being fully explored. A research and design challenge is to develop modular and interoperable solutions comprising a scalable decentralized indexing and search engine that can run in a peer-to-peer (p2p) network.

In a recent article, we introduce DeScan, a peer-to-peer system for transaction indexing and searching in Web3. The scheme below shows its functioning.

Some interesting research questions immediately pop up here. Firstly, the fact that the system is decentralized does not mean it is censorship-resistant. In peer-to-peer settings, we can still encounter adversarial nodes that try to interfere with requests and with the delivery of search results; for example, a node might deliberately withhold information. Secondly, some nodes will simply be unresponsive and fail for different reasons, for example, because their Internet connection is unstable. Thirdly, what kind of storage and bandwidth overhead will our system introduce? For our system to be usable, this functionality should be integrated or modularly compatible with existing blockchain clients, ideally without the need to run your own archival node and without the need for constant requests to full nodes. Why is this so desirable? Well, an archival node for Ethereum requires over 14TB of storage (at the moment of writing). While storage is getting cheaper, in the long run, we need more diversity than that. And if only a few full nodes have to service a vastly larger number of dependent light clients, we get a severe scalability bottleneck. So the critical component here is the choice of a decentralized data structure for indexes that can be implemented as a peer-to-peer solution.

The popular choice for a distributed data structure in a peer-to-peer network is a family of Distributed Hash Tables (DHTs). They have some compelling properties, for example, logarithmic message and time complexity for searching, but they also have certain limitations. A fundamental limitation is that standard DHTs do not allow making range queries. In Web3, data can sometimes be ordered; for example, market orders in a decentralized marketplace are sorted by price.
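To illustrate why ordering matters, the sketch below answers a range query over prices that are kept in sorted order: it locates the lower bound with a binary search and then walks forward until the upper bound is passed. This is a local, single-machine illustration with hypothetical function names, not part of DeScan. In a hash-based structure, neighbouring prices end up at unrelated positions, so there is no such contiguous run to walk.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the first price >= lo in an ascending array (binary search). */
static size_t lower_bound(const int *prices, size_t count, int lo)
{
    size_t left = 0, right = count;
    while (left < right) {
        size_t mid = left + (right - left) / 2;
        if (prices[mid] < lo) {
            left = mid + 1;
        } else {
            right = mid;
        }
    }
    return left;
}

/* Copy all prices in [lo, hi] into out; returns how many were copied. */
static size_t range_query(const int *prices, size_t count, int lo, int hi,
                          int *out, size_t out_cap)
{
    assert(prices != NULL && out != NULL && lo <= hi);
    size_t found = 0;
    for (size_t i = lower_bound(prices, count, lo);
         i < count && prices[i] <= hi && found < out_cap; i++) {
        out[found++] = prices[i];
    }
    return found;
}
```

For the sorted prices 5, 8, 12, 15, 21, 30, a query for the range 8 to 20 returns 8, 12 and 15 after a single search and three forward steps.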

A somewhat less popular but interesting alternative here is a Skip Graph data structure with similar algorithmic complexities as a DHT and support for range queries. The Skip Graph is a distributed data structure based on Skip Lists and enables search in a decentralized overlay. It consists of one or more nodes where each node has a key and a membership vector. The membership vector is usually a random bit string. A node can represent a physical machine or a peer in a peer-to-peer network, but it can also be more granular and describe some data that a particular peer stores. The Skip Graph consists of multiple levels and the main idea is that nodes that have i common prefix bits in their membership vector keep a reference to each other on level i. When searching for a node with a particular key, the search request gets routed through the network starting at the highest level, working its way downwards to the lowest level. This takes logarithmic message and time complexity, similar to a DHT.
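The linking rule described above can be sketched in a few lines of C. The sketch assumes 8-bit membership vectors read from the most significant bit and a local array of nodes sorted by key; the names sg_node, common_prefix_bits and right_neighbor are hypothetical. In an actual Skip Graph each peer stores its neighbours per level rather than scanning a global array, so this only illustrates which nodes end up linked on which level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MV_BITS 8 /* membership vector length in this sketch (assumed) */

typedef struct {
    int key;     /* search key; the node array is sorted by key */
    uint8_t mv;  /* membership vector, most significant bit first */
} sg_node;

/* Number of leading bits on which two membership vectors agree. */
static int common_prefix_bits(uint8_t a, uint8_t b)
{
    int n = 0;
    for (int bit = MV_BITS - 1; bit >= 0; bit--) {
        if (((a >> bit) & 1) != ((b >> bit) & 1)) {
            break;
        }
        n++;
    }
    return n;
}

/* Index of the right neighbour of nodes[idx] on the given level, or -1.
 * On level i a node links to the next node in key order that shares
 * at least i prefix bits with it; level 0 is a plain sorted list. */
static int right_neighbor(const sg_node *nodes, size_t count, size_t idx, int level)
{
    assert(nodes != NULL && idx < count && level >= 0 && level <= MV_BITS);
    for (size_t j = idx + 1; j < count; j++) {
        if (common_prefix_bits(nodes[idx].mv, nodes[j].mv) >= level) {
            return (int)j;
        }
    }
    return -1;
}
```

Higher levels skip over more nodes, which is what lets a search start at the top level with long jumps and descend to level 0 for the final steps.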

However, the naive implementation of a Skip Graph does not address the problem of adversarial and failing nodes. Still, would it be possible to use a Skip Graph for decentralized transaction indexing and search that is censorship-resistant and can be integrated with light clients? Our answer to that is yes. We can modify the Skip Graph to add redundancy and fault tolerance without introducing prohibitively high overhead. Four specific modifications to the original Skip Graph structure can make it tolerant to unresponsive and adversarial nodes:

  1. Acknowledgments of Search messages
  2. Extended routing tables
  3. Replicating triplets on the storage layer
  4. Operating multiple Skip Graphs

The feasibility of these modifications has been evaluated experimentally using a data set with Ethereum transactions. It shows that our modified Skip Graph can deal with up to 25% adversarial peers and 35% unresponsive peers without significant system degradation. Queries are completed well within a second, even with over 10000 peers. Furthermore, the network and storage overhead induced by individual peers decreases as the network grows, and workload distribution evens out. This means that DeScan, when implemented as a peer-to-peer overlay, can feasibly integrate with light clients that contribute a small amount of system resources.

Starting from here, we can consider more modules to enable richer functionality, crafting different rules for different types of Web3 content and content discovery mechanisms in the future. But this is already a topic for a separate blog post. If you want more technical details, please read our research paper, which presents additional explanations and experimental results. If you are interested in these ideas or would like to share yours, feel free to contact us.

]]>
<![CDATA[This blog post has been co-written by Georgy Ishmaev. This work is funded by the Ethereum Foundation.]]>
Emulating an iPod Touch 1G and iPhoneOS 1.0 using QEMU (Part II)2022-12-23T00:00:00+00:002022-12-23T00:00:00+00:00https://devos50.github.io/blog/2022/ipod-touch-qemu-pt2<![CDATA[

In my previous blog post, I described how I managed to get an iPod Touch 1G up and running using the QEMU emulator. In this follow-up post, I will outline the necessary steps to get the emulator up and running in a local environment.

Note: the instructions below have only been tested on macOS so far.

Building QEMU

The emulator is built on top of QEMU, which we should build first. Start by cloning my fork of QEMU and checking out the ipod_touch_1g branch:

git clone https://github.com/devos50/qemu
cd qemu
git checkout ipod_touch_1g

Compile QEMU by running the following commands:

mkdir build
cd build
../configure --enable-sdl --disable-cocoa --target-list=arm-softmmu --disable-capstone --disable-pie --disable-slirp --extra-cflags=-I/usr/local/opt/openssl@3/include --extra-ldflags='-L/usr/local/opt/openssl@3/lib -lcrypto'
make

Note that we’re explicitly enabling compilation of the SDL library which is used for interaction with the emulator (e.g., capturing keyboard and mouse events). Also, we only configure and build the ARM emulator. We’re also linking against OpenSSL as the AES/SHA1 engine uses some of the library’s cryptographic functions. Remember to update the include/library paths to the OpenSSL library in case they are located elsewhere. You can speed up the make command by passing the number of available CPU cores with the -j flag, e.g., use make -j6 to compile using six CPU cores. The compilation process should produce the qemu-system-arm binary in the build/arm-softmmu directory.

Downloading System Files

We need a few files to successfully boot the iPod Touch emulator to the home screen, which I published as a GitHub release for convenience. You can download all these files from here, and they include the following:

  • The S5L8900 bootrom binary, as iBoot and the kernel invoke some procedures in the bootrom logic.
  • The iBoot bootloader binary. This file is typically included in the IPSW firmware in an encrypted format, but for convenience, I’ve extracted the raw binary and included it in the GitHub repository.
  • A NOR image that contains various auxiliary files used by the bootloader. I will provide some instructions on generating this NOR image manually later in this post.
  • A NAND image that contains the root file system. I will provide some instructions on generating this NAND image manually later in this post.

Download all the required files and save them to a convenient location. You should unzip the nand_n45ap.zip file, which contains a directory named nand.

Running the Emulator

We are now ready to run the emulator from the build directory with the following command:

./arm-softmmu/qemu-system-arm -M iPod-Touch,bootrom=<path to bootrom image>,iboot=<path to iboot image>,nand=<path to nand directory> -serial mon:stdio -cpu max -m 1G -d unimp -pflash <path to NOR image>

Remember to adjust the flags so that they point to the downloaded system files. Running the command above should start the emulator, and you should see some logging output in the console:

martijndevos@iMac-van-Martijn build % ./arm-softmmu/qemu-system-arm -M iPod-Touch,bootrom=/Users/martijndevos/Documents/ipod_touch_emulation/bootrom_s5l8900,iboot=/Users/martijndevos/Documents/ipod_touch_emulation/iboot.bin,nand=/Users/martijndevos/Documents/generate_nand/nand -serial mon:stdio -cpu max -m 1G -d unimp -pflash /Users/martijndevos/Documents/generate_nor/nor.bin
WARNING: Image format was not specified for '/Users/martijndevos/Documents/generate_nor/nor.bin' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions.
Reading PMU register 118
Reading PMU register 75
Reading PMU register 87
Reading PMU register 103
iis_init()
spi_init()
Reading PMU register 75
power supply type batt
battery voltage Reading PMU register 87
error
SysCfg: version 0x00010001 with 4 entries using 200 of 8192 bytes
BDEV: protecting 0x2000-0x8000
image 0x1802bd20: bdev 0x1802b6a8 type dtre offset 0x10800 len 0x7d28
image 0x1802c170: bdev 0x1802b6a8 type batC offset 0x18d40 len 0x101e1
image 0x1802c5c0: bdev 0x1802b6a8 type logo offset 0x29a80 len 0x1c3a
image 0x1802ca10: bdev 0x1802b6a8 type nsrv offset 0x2bfc0 len 0x4695
image 0x1802ce60: bdev 0x1802b6a8 type batl offset 0x30d00 len 0xc829
image 0x1802d2b0: bdev 0x1802b6a8 type batL offset 0x3e240 len 0xe9d2
image 0x1802e888: bdev 0x1802b6a8 type recm offset 0x4d780 len 0xb594
display_init: displayEnabled: 0
otf clock divisor 5
fps set to: 59.977
SFN: 0x600, Addr: 0xfe00000, Size: 0x14001e0, hspan: 0x500, QLEN: 0x140
merlot_init() -- Universal code version 08-29-07
Merlot Panel ID (0x71c200):
   Build:          PVT1 
   Type:           TMD 
   Project/Driver: M68/NSC-Merlot 
ClcdInstallGammaTable: No Gamma table found for display_id: 0x0071c200
Reading PMU register 75
power supply type batt
battery voltage Reading PMU register 87
error
Reading PMU register 23
Reading PMU register 42
Reading PMU register 40
Reading PMU register 41
Reading PMU register 75
power supply type batt
battery voltage Reading PMU register 87
error
Reading PMU register 23
Reading PMU register 42
Reading PMU register 40
Reading PMU register 41
usb_menu_init()
vrom_late_init: unknown image crc: 0x66a3fbbf


=======================================
::
:: iBoot, Copyright 2007, Apple Inc.
::
::	BUILD_TAG: iBoot-204
::
::	BUILD_STYLE: RELEASE
::
=======================================

Reading PMU register 87
[FTL:MSG] Apple NAND Driver (AND) 0x43303032
...

If there are any issues running the above commands, please let me know by commenting on this post or by making an issue on GitHub.

Manually Generating the NOR Image

If you wish to make changes to the NOR image, e.g., to replace the boot logo, to modify the device tree, or to change the kernel bootargs, you can follow the instructions below. I created a separate tool to generate the NOR image. You can clone and compile this tool with the following commands:

git clone https://github.com/devos50/generate-ipod-touch-1g-nor
cd generate-ipod-touch-1g-nor
gcc generate_nor.c aes.c -o generate_nor -I/usr/local/Cellar/openssl@1.1/1.1.1l/include -L/usr/local/Cellar/openssl@1.1/1.1.1l/lib -lssl -lcrypto

Remember to replace the include and library paths to point to your OpenSSL installation.

You can modify the specifics of the generated NOR file by changing the generate_nor.c file. For example, the kernel bootargs can be modified here. The data directory contains various IMG2 images that will be embedded in the NOR image, including the device tree and boot logo. Generating the NOR image can be done by running the generate_nor binary:

./generate_nor

This will generate a nor.bin file.

Manually Generating the NAND Image

The NAND image is generated based on the root filesystem included in the IPSW firmware file, albeit heavily modified to bypass various checks. The necessary code can be found in this repository and be cloned as follows:

git clone https://github.com/devos50/generate-ipod-touch-1g-nand
cd generate-ipod-touch-1g-nand
gcc generate_nand.c -o generate_nand

This produces the generate_nand binary in the root directory of the repository. It expects a filesystem-readonly.img file, which can be generated using the instructions below.

Creating a Filesystem Image

As a starting point, I uploaded a writable root filesystem to GitHub. This DMG file is based on the N45AP firmware and contains various modifications for emulation purposes. On Mac, you can mount this file and make changes to it. When done, unmount the file and convert the writable DMG to a read-only one using the hdiutil tool (available on Mac):

hdiutil convert -format UDRO -o filesystem-readonly.dmg  filesystem-writable.dmg

You should then extract the filesystem partition from the DMG file. For this, I used the dmg2img tool, which can be downloaded from here (build instructions can be found in this repository too). You can see which partitions the DMG file includes using the following command:

./dmg2img -l filesystem-readonly.dmg 

This outputs something like:

dmg2img v1.6.5 (c) vu1tur ([email protected])

filesystem-readonly.dmg --> (partition list)

partition 0: Driver Descriptor Map (DDM: 0)
partition 1: Apple (Apple_partition_map: 1)
partition 2: Macintosh (Apple_Driver_ATAPI: 2)
partition 3: Mac_OS_X (Apple_HFSX: 3)
partition 4:  (Apple_Free: 4)

We need to extract the partition in HFS format, i.e., partition 3. Extract this partition using the following command:

./dmg2img -p 3 filesystem-readonly.dmg

This generates a filesystem-readonly.img, which you should copy to the directory containing the generate_nand binary. As the final step, generate the nand directory as follows:

./generate_nand
]]>
<![CDATA[In my previous blog post, I described how I managed to get an iPod Touch 1G up and running using the QEMU emulator. In this follow-up post, I will outline the necessary steps to get the emulator up and running in a local environment.]]>
Emulating an iPod Touch 1G and iPhoneOS 1.0 using QEMU (Part I)2022-10-11T00:00:00+00:002022-10-11T00:00:00+00:00https://devos50.github.io/blog/2022/ipod-touch-qemu<![CDATA[

Around a year ago, I started working on emulating an iPod Touch 1G using the QEMU emulation software. After months of reverse engineering, figuring out the specifications of various hardware components, and countless debugging runs with GDB, I now have a functional emulation of an iPod Touch that includes display rendering and multitouch support. The emulated device runs the first firmware ever released by Apple for the iPod Touch: iPhoneOS 1.0, build 3A101a. The emulator runs iBoot (the bootloader), the XNU kernel and then executes SpringBoard. SpringBoard renders the home screen and is responsible for launching other applications such as Safari and the calendar. I haven’t made any modifications to the bootloader, the kernel or other binaries being loaded. All source code can be found in my branch of QEMU. Note: the emulator requires a custom NOR and NAND image (more about that later in this post). I aim to publish another blog post soon with detailed instructions on how to generate these custom images.

The video below shows the emulator in action when booting the device and when navigating through various applications:

To achieve the above, I built upon some of the previous work on iOS/Apple device emulation by others 🚀:

The most complicated part of this project was to emulate the many hardware components included in the iPod Touch. The specifications of most of the components I had to get operational are proprietary and undocumented, making it sometimes quite difficult to emulate them properly. I do think, however, that this is the first emulated Apple product that is not only open source but also has full display support and multitouch operational (even though Corellium also offers virtualized iPhones, Corellium is commercial and closed source). In this blog post, I will outline some of the challenges I encountered, describe the steps taken during the boot process, and list some future tasks that can make the emulation even better. I did enjoy working on this emulator and learned many new things about the internals of mobile devices.

I specifically decided to focus on emulating an iPod Touch 1G running the first iOS version ever released. I did this for two reasons: first, older devices have fewer hardware components than newer devices, making it easier to build a useful device emulator. Contemporary Apple devices contain many additional hardware components, such as neural engines, secure enclaves, and a variety of sensors that will make the emulation of such devices much more difficult and time consuming. The second reason is that older iPhoneOS/iOS versions have few to no security measures implemented, such as trust caches. By focusing on the most primitive version of iPhoneOS, I didn’t have to circumvent any security mechanism.

Current Project Status

All hardware components required to execute iBoot, the XNU kernel, SpringBoard and the pre-installed iPhoneOS applications are functional. These hardware components are:

  • The AES cryptographic engine
  • The SHA1 hashing engine
  • The module for chip identification
  • The hardware clock and timer
  • The GPIO controller
  • The LCD display and framebuffers
  • The NAND controller and error-correcting code (ECC) module
  • The Flash Memory Controller (FMC), used to communicate with the NAND memory
  • The multitouch device
  • The power management unit and integrated real-time clock
  • The SDIO controller
  • The SPI controller
  • The I2C controller
  • The Vectored Interrupt Controller (VIC) and GPIO interrupt controller
  • The Direct Memory Access (DMA) controller
  • The UART controller

The following hardware components are not functional yet but are also not essential to fully boot the iPod Touch:

  • The USB OTG/Synopsys devices
  • Audio devices
  • The 802.11 WiFi controller
  • The PowerVR MBX graphics processor
  • The video encoder/decoder engine
  • The accelerometer and light sensor

The boot procedure of the iPod Touch

The diagram below shows all five steps when booting the iPod Touch to user applications:

Bootrom and the Low-Level Bootloader

The iPod Touch 1G uses the ARMv6 (little-endian) instruction set. The very first step of this project involved setting up a QEMU machine with a CPU so we could execute some code. Fortunately, QEMU supports the ARM1176 CPU and the required instruction set. After initializing the QEMU machine and some memory, we are ready to load our binaries into memory and execute some code!

The first code executed when powering on the iPod Touch is the bootrom code, presumably engineered by Samsung when the iPod Touch 1G was introduced. The bootrom is fused into the device, is read-only, and cannot be modified through software. Therefore, vulnerabilities in the bootrom are highly sought after since they cannot be fixed with a software update (Checkm8 was the last vulnerability of this kind). A dump of the bootrom code can be downloaded from this website. I initially attempted to load and execute the bootrom code in my QEMU machine. However, I quickly found that the bootrom jumps to some code that is probably also fused into the device and missing from the bootrom dump that I used (the missing code seems to be located at offset 0x22000000 in memory). Since I didn’t have a physical iPod Touch 1G at the beginning of this project, I couldn’t obtain this missing code. The low-level bootloader (LLB, step 2 in the above figure) also jumps to this mysterious code, so I shifted my focus to executing iBoot instead (step 3 in the above figure).

Fun with the iBoot Bootloader

The primary function of the iBoot bootloader is to initialize the device peripherals and to load and execute the kernel image. iBoot can also enter recovery mode that enables a re-install of iPhoneOS using iTunes. Fortunately, the openiBoot project has done a lot of work to re-implement most of the functionality that iBoot provides. This source code was instrumental for me in understanding the main logic and procedures in iBoot. Since iBoot initializes and communicates with various hardware components, I also had to focus on getting these components up and running for iBoot to run.

The first hardware component I worked on was the vectored interrupt controller (VIC). This component registers interrupt requests from other hardware components and informs the CPU when an interrupt has occurred. The iPod Touch 1G seems to be equipped with a PL192, which is well documented. After the VIC was up and running, I worked on redirecting print statements generated by the kernel to the QEMU console, which helped during the debugging process. Below you can see the console output of iBoot, up to the point where iBoot loads and decrypts the XNU kernel:

iis_init()
spi_init()
power supply type batt
battery voltage Reading PMU register 87
error
SysCfg: version 0x00010001 with 4 entries using 200 of 8192 bytes
BDEV: protecting 0x2000-0x8000
image 0x1802bd20: bdev 0x1802b6a8 type dtre offset 0x10800 len 0x7d28
image 0x1802c170: bdev 0x1802b6a8 type batC offset 0x18d40 len 0x101e1
image 0x1802c5c0: bdev 0x1802b6a8 type logo offset 0x29a80 len 0x1c3a
image 0x1802ca10: bdev 0x1802b6a8 type nsrv offset 0x2bfc0 len 0x4695
image 0x1802ce60: bdev 0x1802b6a8 type batl offset 0x30d00 len 0xc829
image 0x1802d2b0: bdev 0x1802b6a8 type batL offset 0x3e240 len 0xe9d2
image 0x1802e888: bdev 0x1802b6a8 type recm offset 0x4d780 len 0xb594
display_init: displayEnabled: 0
otf clock divisor 5
fps set to: 59.977
SFN: 0x600, Addr: 0xfe00000, Size: 0x14001e0, hspan: 0x500, QLEN: 0x140
merlot_init() -- Universal code version 08-29-07
Merlot Panel ID (0x71c200):
   Build:          PVT1 
   Type:           TMD 
   Project/Driver: M68/NSC-Merlot 
ClcdInstallGammaTable: No Gamma table found for display_id: 0x0071c200
power supply type batt
battery voltage error
power supply type batt
battery voltage error
usb_menu_init()
vrom_late_init: unknown image crc: 0x66a3fbbf


=======================================
::
:: iBoot, Copyright 2007, Apple Inc.
::
::	BUILD_TAG: iBoot-204
::
::	BUILD_STYLE: RELEASE
::
=======================================

[FTL:MSG] Apple NAND Driver (AND) 0x43303032
[NAND] Device ID           0xa514d3ad
[NAND] BANKS_TOTAL         8
[NAND] BLOCKS_PER_BANK     4096
[NAND] SUBLKS_TOTAL        4096
[NAND] USER_SUBLKS_TOTAL   3872
[NAND] PAGES_PER_SUBLK     1024
[NAND] PAGES_PER_BANK      524288
[NAND] SECTORS_PER_PAGE    4
[NAND] BYTES_PER_SPARE     64
[FTL:MSG] FIL_Init			[OK]
[FTL:MSG] BUF_Init			[OK]
[FTL:MSG] VFL_Init			[OK]
[FTL:MSG] FTL_Init			[OK]
[FTL:MSG] VFL_Open			[OK]
[FTL:MSG] FTL_Open			[OK]
Boot Failure Count: 0	Panic Fail Count: 0
Delaying boot for 0 seconds. Hit enter to break into the command prompt...
HFSInitPartition: 0x1802b8f0
Reading 8900 header with length 2048 at address 0x0b000000
Will decrypt 8900 image at address 0x0b000000 (len: 3319392 bytes)
Loading kernel cache at 0xb000000...
data starts at 0xb000180

As you can see from the above log, iBoot first initializes various hardware components; it then reads multiple images from the NOR flash memory, initializes the LCD screen, initializes the power management unit (PMU) to read the battery status, and then reads the kernel image from the NAND flash memory. Finally, it releases execution to the kernel. If the boot fails for any reason, iBoot jumps into a recovery mode that allows the execution of several debugging commands over the UART interface.

The iPod Touch 1G contains two kinds of persistent memory: NOR and NAND. The NOR memory is a relatively small block device. The primary file system is persisted in the NAND memory and is 8-32 GB in size for the iPod Touch 1G, depending on the model. For the emulator to correctly work, we need to emulate these block devices and make sure the bootloader/kernel can read from them correctly.

Constructing the NOR image

During boot, the iBoot bootloader reads multiple files stored in the NOR flash memory. These files are, for example, the Apple logo displayed when the device is booting, the recovery mode screen, the low battery screen, and the device tree. The NOR memory also contains the NVRAM and SysCfg partitions that store various device properties, such as the serial number, the MAC address, the boot arguments for the kernel, and crash logs. I wrote a custom tool to construct a valid NOR memory image from the files included in the IPSW file, and I provided this custom memory image when starting QEMU. The source code to construct this NOR image can be found in this GitHub repository.

Constructing the NAND image

One of the responsibilities of iBoot is to load the XNU kernel in memory and pass execution to it. iBoot can load the kernel image in two ways: it either reads the image from the file system in the NAND memory or it loads an image located at a particular memory offset. Since I want the emulation to be as close to an actual boot procedure as possible, I focussed on getting NAND I/O up and running. At first glance, this sounds straightforward, as NAND storage is divided into pages and each page is numbered. As such, our emulator can simply return the appropriate data in a page when iBoot or the kernel requests one. Under the hood, however, a NAND device is much more complicated than that, mainly because NAND memory requires algorithms for wear levelling. This is needed because each physical block in NAND can only be reliably erased and written a limited number of times before it degrades. NAND drivers also contain other algorithms, e.g., for error-correcting code, bad block management, and garbage collection. As a result, the physical layout of pages in the NAND memory is quite different from the logical organization of these pages.
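To make the gap between logical and physical pages concrete, here is a minimal Python sketch of a logical-to-physical page map. It is a toy model only: the class name, the append-only allocator, and the page contents are all assumptions for illustration, and it does not reflect the actual layout used by the NAND driver in iBoot or openiboot.

```python
class ToyFlashTranslationLayer:
    """Toy logical-to-physical page map (hypothetical, not Apple's layout)."""

    def __init__(self, num_physical_pages):
        self.physical = [None] * num_physical_pages  # raw page contents
        self.mapping = {}                            # logical page -> physical page
        self.next_free = 0                           # naive append-only allocator

    def write(self, logical_page, data):
        # Flash pages are not overwritten in place: every write goes to a
        # fresh physical page and the old copy simply becomes stale.
        if self.next_free >= len(self.physical):
            raise RuntimeError("out of free pages; a real driver would garbage-collect")
        self.physical[self.next_free] = data
        self.mapping[logical_page] = self.next_free
        self.next_free += 1

    def read(self, logical_page):
        # Reads are redirected through the map to wherever the page lives now.
        return self.physical[self.mapping[logical_page]]


ftl = ToyFlashTranslationLayer(num_physical_pages=8)
ftl.write(0, b"kernel")
ftl.write(1, b"rootfs")
ftl.write(0, b"kernel v2")  # rewriting logical page 0 lands on a new physical page
```

After the second write to logical page 0, the map points at physical page 2 while the stale copy still sits in physical page 0. This is why an emulator cannot simply store pages in logical order and expect the driver to find them.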

Openiboot fortunately contains an implementation of the NAND driver found in the iPod Touch 1G. This helped me not only to understand the physical layout of the NAND memory but also to understand the I/O interactions with the NAND memory. I also reviewed a leaked version of the iBoot source code that contains the source code of the NAND drivers. Similar to the NOR image, I wrote various scripts that construct a NAND image that could be read by the NAND driver. The source code can be found in this GitHub repository. The NAND image is built from the root file system included in the IPSW firmware file.

Decrypting and Loading the Kernel Image

At this point, iBoot correctly loads the kernel image from the NAND storage (located in the file system at /System/Library/Caches/com.apple.kernelcaches/kernelcache.s5l8900xrb). However, this kernel image is encrypted using a proprietary 8900 encryption scheme, and iBoot jumps to a decryption procedure in memory whose instructions I do not have. To still be able to decrypt the image, I implemented a callback at the beginning of the function being jumped to and decrypted the kernel image in QEMU logic instead. I then leave the decrypted kernel image in memory, after which iBoot jumps to the entry point of the kernel image.

There were some other hardware components that I had to get up and running before iBoot gets to the point of loading the kernel. These components include the Power Management Unit (PMU), the DMA controller, the hardware timers and clock, and the LCD display.

Emulating the XNU Kernel

Most of my reverse engineering efforts have gone into understanding the XNU kernel and emulating hardware components that are used by the kernel. Even though the XNU kernel is mostly open source, Apple seems to maintain a private fork for the kernel included in Apple devices such as the iPod Touch and iPhone. Comparing the kernel shipped in iOS with the open-source kernel code, it seems that Apple has made various changes to the iOS kernel to ensure that it can run on ARM CPUs. Additionally, no source code for device-specific drivers for hardware components is available in the open source kernel implementation.

The XNU kernel first initializes several BSD subsystems, including the memory management logic, the scheduler and support for threads. Subsequently, the kernel reads the device tree included in the NOR image. A device tree is a data structure that describes all hardware components which are part of a particular device. The kernel uses the device tree to load the appropriate drivers for all these components and to initialize these components with the correct settings. A dump of the device tree used by the iPod Touch 1G can be found here and, as you can see, contains quite a lot of information! The device tree can also reveal information about dependencies between different components. For example, it indicates that communication with the multitouch screen proceeds over an SPI interface that is controlled by an SPI controller.
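As a rough illustration of how a device tree expresses such dependencies, the Python sketch below models a tree as nested dictionaries and looks up the path to a node. The node names, the `reg` values, and the dictionary layout are hypothetical and chosen only for this example; the real device tree is a binary structure with many more nodes and properties.

```python
# Hypothetical, heavily simplified device tree. Every name and value here is
# made up for illustration.
device_tree = {
    "name": "device-tree",
    "properties": {},
    "children": [
        {
            "name": "spi0",
            "properties": {"reg": (0x1000, 0x100)},
            "children": [
                {"name": "multi-touch", "properties": {}, "children": []},
            ],
        },
        {"name": "uart0", "properties": {"reg": (0x2000, 0x100)}, "children": []},
    ],
}


def find_path(node, name, path=()):
    """Return the tuple of node names from the root down to `name`, or None."""
    path = path + (node["name"],)
    if node["name"] == name:
        return path
    for child in node["children"]:
        found = find_path(child, name, path)
        if found is not None:
            return found
    return None


touch_path = find_path(device_tree, "multi-touch")
```

In this toy tree, the path to the multitouch node passes through the SPI controller node, which is the kind of parent/child relationship that hints at which bus a component is attached to.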

Perhaps the most important field in the device tree nodes is the memory address of components. Most hardware components use a technique called memory-mapped I/O, or MMIO. With MMIO, the same address space is used to address both main memory and I/O devices. As a result, the kernel can communicate with a hardware component using ordinary memory reads and writes at the addresses assigned to that component. Implementing support for MMIO in QEMU turned out to be relatively straightforward. Some hardware components, however, do not use MMIO and have to be accessed using different hardware communication protocols, such as SPI, I2C or SDIO.
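The idea behind MMIO can be sketched in a few lines of Python: a bus keeps a list of address ranges and forwards each read or write to the device that owns the range. This is a conceptual sketch and not QEMU's API; the class names, register offsets, and the base address are all invented for the example.

```python
class ToyTimer:
    """Hypothetical device with two 32-bit registers."""
    REG_CONTROL = 0x0
    REG_COUNT = 0x4

    def __init__(self):
        self.control = 0
        self.count = 0

    def read(self, offset):
        if offset == self.REG_CONTROL:
            return self.control
        if offset == self.REG_COUNT:
            return self.count
        return 0  # unimplemented registers read as zero in this toy model

    def write(self, offset, value):
        if offset == self.REG_CONTROL:
            self.control = value & 0xFFFFFFFF
        elif offset == self.REG_COUNT:
            self.count = value & 0xFFFFFFFF


class ToyBus:
    """Routes guest memory accesses to whichever device owns the address."""

    def __init__(self):
        self.regions = []  # list of (base, size, device)

    def map_device(self, base, size, device):
        self.regions.append((base, size, device))

    def _find(self, address):
        for base, size, device in self.regions:
            if base <= address < base + size:
                return device, address - base
        raise LookupError("unmapped address 0x%08x" % address)

    def read32(self, address):
        device, offset = self._find(address)
        return device.read(offset)

    def write32(self, address, value):
        device, offset = self._find(address)
        device.write(offset, value)


bus = ToyBus()
timer = ToyTimer()
bus.map_device(0x10000000, 0x1000, timer)  # made-up base address
bus.write32(0x10000004, 1234)              # guest "stores" to the count register
```

From the guest's point of view, the store to 0x10000004 looks like a normal memory write; the bus turns it into a call on the timer model with offset 0x4.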

After the BSD subsystems are initialized, the kernel starts the IOKit framework and begins loading the drivers for the hardware components included in the device tree. Since the kernel loads quite a few drivers (roughly 30), ensuring that all of them start correctly took me a few months. The booting process occasionally got stuck because it was waiting for a particular response from a hardware component that I didn’t emulate correctly yet. Below you can see a screenshot of some of the decompiled drivers:

And some of the files in my QEMU repo:

At one point during execution, the kernel starts reading binaries from the file system in NAND. Even though I already had full NAND support to make iBoot happy, the kernel reads from the NAND storage through a Flash Memory Controller, or FMC. This turned out to be one of the most challenging hardware components that I had to emulate. The FMC was also the first hardware component I had to emulate without any documentation or source code available. Deciphering the different I/O operations performed by the FMC and ensuring that the right NAND pages are read took me several weeks of trial and error. At this moment, the NAND read operations by the FMC should work correctly but I haven’t added support for NAND write operations yet.

After all drivers have been initialized, it is time for the kernel to execute the launchd application. launchd is the first program launched by the kernel and, as the name implies, it is responsible for launching other applications and startup scripts (it also runs with PID 1). The kernel boot is considered complete when launchd is started. From this point on, the applications executed by launchd run in user space instead of kernel space. When launchd was running correctly, the next step was to launch the standard application that manages the iPod Touch’s home screen: Springboard.

Launching Springboard

The launchd application looks for startup scripts in the /System/Library/LaunchDaemons directory in the file system and executes these scripts. These startup scripts include, for example, daemons for audio control, the address book, and Bluetooth support. One of these startup scripts, com.apple.SpringBoard.plist, contains instructions to launch the Springboard.app application. Unfortunately, Springboard got stuck shortly after starting because I hadn’t implemented display rendering yet.

Let there be Display

Springboard.app contains logic for rendering the home screen, including app icons, dialog screens, and the status bar. Display rendering on the iPod Touch (or any mobile device for that matter) is typically accelerated by a hardware graphics processor. From reverse engineering, I could already see that this hardware component is quite involved and that the communication protocol between the kernel and the graphics processor is complicated. As an alternative, I started looking for a way to disable the graphics processor for the time being. Fortunately, the startup script of Springboard.app allowed me to add an environment variable LK_ENABLE_MBX2D=0 that successfully disables the graphics processor. With this option, all the display rendering is performed by the kernel instead, which is significantly slower than rendering on dedicated hardware. Despite not having hardware-accelerated rendering operational, the animations in the emulated device are pretty smooth, as also shown in the video at the beginning of the blog post.
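For readers who want to try a similar change, the sketch below shows one way to add an environment variable to a launchd job description using Python's plistlib. The job dictionary is a hypothetical, minimal stand-in (the real com.apple.SpringBoard.plist contains more keys, and the program path shown is an assumption); the `EnvironmentVariables` key is the standard launchd key for this purpose.

```python
import plistlib

# Hypothetical minimal launchd job, used only to keep the example self-contained.
job = {
    "Label": "com.apple.SpringBoard",
    "ProgramArguments": ["/System/Library/CoreServices/SpringBoard.app/SpringBoard"],
}


def add_environment_variable(job, name, value):
    """Return a copy of the job with one extra environment variable set."""
    updated = dict(job)
    env = dict(updated.get("EnvironmentVariables", {}))
    env[name] = value
    updated["EnvironmentVariables"] = env
    return updated


patched = add_environment_variable(job, "LK_ENABLE_MBX2D", "0")
xml_bytes = plistlib.dumps(patched)  # serialized as an XML property list
```

The patched property list would then have to be written back into the file system image before the NAND image is built, since the emulated NAND is generated from that file system.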

The emulated device at this point successfully boots Springboard and renders the home screen 🎉🎉🎉

Implementing Support for Multitouch

The next step for me was to add support for navigating the user interface by touching the screen. My idea was to use the same approach as the iPhone Simulator included in Xcode, where mouse clicks are converted to touches on the screen. What seems like a relatively simple problem (detecting where a user has pressed the screen, converting this touch into an (x, y) coordinate pair and passing it to the kernel) is actually very challenging. This patent granted to Apple in 2007 describes some of the required steps to accurately register user touches and gestures. In summary, the multitouch device generates frames that are read by the multitouch driver in the kernel. Each frame that contains a touch event includes detailed information about the touch in the form of an ellipse (see for example Figure 3 in the linked patent).

At one point, the kernel starts initializing the HID devices, which also includes the multitouch device. The initialization procedure of the multitouch device roughly looks as follows:

  1. Uploading calibration data: The kernel uploads calibration data to the multitouch device and calibrates the device. This calibration data is included in the file system and also embedded in the device tree.
  2. Uploading firmware data: The kernel uploads some Zephyr2 firmware data to the multitouch device. This firmware data is included in the file system and also embedded in the device tree.
  3. Reading device information: The kernel fetches various status reports from the multitouch device. These reports include information about several aspects of the multitouch device, such as versioning info and the number of touch points in the horizontal/vertical direction of the touch surface.

The kernel communicates with the multitouch device over an SPI interface. To ensure that the frames generated by the multitouch device are successfully transferred to the kernel, I had to get the SPI controller up and running. The multitouch device generates a GPIO interrupt to inform the kernel about the availability of frames, e.g., if there’s a touch or some other event to be processed. To obtain more information about the structure of frames that include touch events, I modified openiboot to initialize the multitouch device, compiled it, and logged all fields in a frame, as can be seen in the screenshot below:

By carefully analyzing the frames generated by various touches and swipes, I figured out how to convert mouse clicks in the QEMU window to touches and frames of the multitouch device. Each frame related to a touch event also includes information about the velocity of a swipe. This velocity is used, for example, when scrolling through a vertical list or when adjusting a horizontal slider. To ensure that these scrolling actions work correctly, I also had to provide a horizontal and vertical velocity in each frame generated by a touch. I compute these velocities by comparing the x/y coordinates of the previous mouse event against those of the current mouse event.
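The velocity computation described above boils down to a finite difference between two consecutive mouse events. The Python sketch below shows the idea; the sample format (x, y, timestamp in milliseconds) and the resulting unit (pixels per millisecond) are assumptions for illustration, and the units expected in the real multitouch frames may differ.

```python
def touch_velocity(prev, curr):
    """Return (vx, vy) in pixels per millisecond between two (x, y, t_ms) samples."""
    px, py, pt = prev
    cx, cy, ct = curr
    dt = ct - pt
    if dt <= 0:
        # Events with identical or out-of-order timestamps carry no usable velocity.
        return 0.0, 0.0
    return float(cx - px) / dt, float(cy - py) / dt


# A purely vertical drag of 60 pixels in 20 ms.
vx, vy = touch_velocity((100, 200, 0), (100, 260, 20))
```

A larger vertical velocity at the moment the mouse button is released would translate into a longer scroll in a list, which is why frames without velocity information make scrolling feel broken.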

Finally, I added support for the home button (activated by pressing the ‘H’ key) and the power button (activated by pressing the ‘P’ key). This step was pretty straightforward. At this point, I have a fully functional iPod Touch that boots to the home screen and that can be navigated through by using mouse clicks and the keyboard.

I also discovered that some applications crashed because critical resource files were missing. The reason for these missing files is that I’m generating the NAND storage from the root file system provided in the IPSW. However, this clean file system is populated with various files when restoring or installing iPhoneOS. In my emulation, I’m not executing the restore scripts. I also had to copy activation records from an actual device to bypass device activation.

Some other screenshots when browsing through the pre-installed iPhoneOS applications:

Known Issues and Next Steps

While I now have a functional iPod Touch emulator, there are quite a few remaining issues:

  • The device crashes when it tries to display a keyboard. It seems that this is because the libicucore.dylib (the library responsible for Unicode support) is not correctly loaded in memory, but I haven’t figured out why this exactly happens.
  • There are a few infrequent crashes related to the USB driver and Flash Memory Controller. I suspect they are race conditions introduced because hardware communication in QEMU is much faster than on an actual device which might violate some underlying assumptions in the kernel logic.
  • Advanced gestures are not supported, for example, pinching and zooming in.
  • Brightness control is also not working yet.
  • There is no persistence of the NAND memory.
  • There are various glitches when the device is powered off or goes into auto-lock mode.

It was sometimes difficult to debug and find out what was happening on the device. Most of the debugging was done by attaching a GDB debugger to the QEMU guest. It would have been helpful to have an interactive shell running. I tried to compile and run bash on the emulated device but I haven’t gotten it to run.

It would also be nice to work towards a unified infrastructure to emulate other generations of iPhones, iPod Touches, Apple TVs and perhaps even Apple Watches. However, all these devices have differences in hardware and software specifications, and emulating them could be very time-consuming. As a next step, I would like to try to get an iPod Touch 2G functional.

I hope this blog post provided some insights into the process of emulating an iPod Touch 1G. There are many details that I didn’t write about but I might write about them in other blog posts. In my next blog post, I will provide instructions on compiling QEMU, generating the custom NOR/NAND images, and running the QEMU emulation. In the meantime, please let me know if you have any ideas, suggestions, or questions about this project!

]]>
<![CDATA[Around a year ago, I started working on emulating an iPod Touch 1G using the QEMU emulation software. After months of reverse engineering, figuring out the specifications of various hardware components, and countless debugging runs with GDB, I now have a functional emulation of an iPod Touch that includes display rendering and multitouch support. The emulated device runs the first firmware ever released by Apple for the iPod Touch: iPhoneOS 1.0, build 3A101a. The emulator runs iBoot (the bootloader), the XNU kernel and then executes Springboard. Springboard renders the home screen and is responsible for launching other applications such as Safari and the calendar. I haven’t made any modifications to the bootloader, the kernel or other binaries being loaded. All source code can be found in my branch of QEMU. Note: the emulator requires a custom NOR and NAND image (more about that later in this post). I aim to publish another blog post soon with detailed instructions on how to generate these custom images.]]>
Finding Python Memory Leaks Using Meliae2016-03-31T00:00:00+00:002016-03-31T00:00:00+00:00https://devos50.github.io/blog/2016/meliae-debugging<![CDATA[

When working on the Tribler project as part of my master thesis, I was asked to investigate a memory issue. Tribler allows people to share content in a fully decentralized way and implements a Tor-like protocol that can be used to download content anonymously. By proxying traffic through other nodes, the anonymity of Tor is obtained. You can start a Python script that allows your computer to become a proxy for other users in the network. However, when running this tool for a longer period (a few days), the memory becomes filled with objects, eventually crashing the program because it runs out of memory.

To get me started, I was provided with several memory dumps from Python programs that had been running for different amounts of time (up to 13 hours). These dumps were created using meliae. Meliae can be used to dump memory to a file and provides some tools to investigate these dump files. A meliae file looks like this:

{"address": 140478590600416, "type": "str", "size": 41, "len": 4, "value": "main", "refs": []}
{"address": 140478718917648, "type": "EggInfoDistribution", "size": 64, "refs": [140478718970688, 35134224]}
{"address": 9414496, "type": "type", "size": 904, "name": "object", "refs": []}
{"address": 140478727533464, "type": "dict", "size": 3352, "len": 36, "refs": [140478729927344, 140478727576104, 140478730052640]}

As you can see, each line represents a Python object in memory, formatted as JSON. The address, type, size in memory, length, references to other objects and an optional value are visible. When dealing with large files containing many thousands of lines, it is infeasible to look at the entries manually. Fortunately, meliae provides some great tools to parse and visualize the data!
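Because every line is a standalone JSON object, a dump can also be inspected with nothing but the Python standard library. The sketch below is not part of meliae; it reuses the four sample lines shown above and sums the reported sizes per type. The function name `size_by_type` is made up for this example.

```python
import json
from collections import Counter

# The four sample lines from above, one JSON object per line.
sample_dump = """\
{"address": 140478590600416, "type": "str", "size": 41, "len": 4, "value": "main", "refs": []}
{"address": 140478718917648, "type": "EggInfoDistribution", "size": 64, "refs": [140478718970688, 35134224]}
{"address": 9414496, "type": "type", "size": 904, "name": "object", "refs": []}
{"address": 140478727533464, "type": "dict", "size": 3352, "len": 36, "refs": [140478729927344, 140478727576104, 140478730052640]}
"""


def size_by_type(lines):
    """Sum the 'size' field of every object, grouped by its 'type' field."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        obj = json.loads(line)
        totals[obj["type"]] += obj["size"]
    return totals


totals = size_by_type(sample_dump.splitlines())
```

For a real dump, the same function could be fed an open file object, since iterating over a file yields one line at a time.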

Meliae makes it possible to print an overview of the distribution of objects, based on type. We can do this with the following Python code:

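from meliae import loader

# Load the dump and pre-process it before summarizing.
om = loader.load("my_memory_file.out")
om.collapse_instance_dicts()  # fold each instance's __dict__ into the instance itself
om.compute_referrers()        # work out which objects refer to each object
om.remove_expensive_references()
om.guess_intern_dict()
s = om.summarize()
print(s)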

This will output something like the following:

Total 584629 objects, 629 types, Total size = 57.7MiB (60552402 bytes)
Index   Count   %      Size   % Cum     Max Kind
    0     222   0  12391984  20  20 4194536 frozenset
    1   11393   1  10516760  17  37  196888 dict
    2  392017  67   9408408  15  53      24 int
    3   73858  12   6097030  10  63   65573 str
    4     792   0   2448000   4  67  196944 module
    5   25303   4   1895680   3  70   80056 tuple
    6    1999   0   1807096   2  73     904 type
    7   14679   2   1761480   2  76     120 function
    8   13641   2   1746048   2  79     128 code
    9    1047   0   1164264   1  81    1112 Id
   10    1035   0   1150920   1  83    1112 Member
   11     598   0   1061568   1  84  196896 collections.defaultdict
   12    2376   0    905536   1  86  131304 set
   13    2337   0    803928   1  87     344 StringifiableFromEvent
   14    5456   0    592816   0  88   13224 list
   15     361   0    401432   0  89    1112 Method
   16    3968   0    349184   0  90      88 weakref
   17     257   0    285784   0  90    1112 RoutingNode
   18     895   0    252356   0  90    9124 unicode
   19     695   0    244640   0  91     352 EC_pub

The output above is a parsed memory dump from a tunnel script that has been running for two hours. As we see, there are 222 frozenset objects in memory, responsible for 20% of the memory usage. Dictionaries take up 17% of the memory. To find out which kind of object is causing the memory leak, we run the loader tool again on the dump from a longer run (only the four types of objects that contribute most to the memory usage are shown):

Total 786833 objects, 638 types, Total size = 113.7MiB (119259945 bytes)
Index   Count   %      Size   % Cum     Max Kind
     0   41202   5  41953200  35  35  196888 dict
     1     222   0  12391984  10  45 4194536 frozenset
     2  133092  16  10226137   8  54   65573 str
     3  414492  52   9947808   8  62      24 int

The number of dictionaries in this dump is significantly larger than in our previous dump! Somehow, there must be some dictionaries that are not being cleaned up by the garbage collector.

To get a bit more insight into the increase in memory, I plotted the number of dict objects in memory against the run time of the program:

Apparently, the number of dictionaries in memory increases almost linearly over time.

To get more information about the specific dictionaries that are causing havoc, I collected all dictionaries in the dump with the shorter running time and all dictionaries in the dump with the longer running time. Next, I kept only the dictionaries that are present solely in the dump with the longer running time. These dictionary objects must have been created between the two memory dumps. I’ve written the following small Python script to do this:

from meliae import loader

def get_dicts(file_name):
    # Load a dump and return all objects of type 'dict'.
    om = loader.load(file_name)
    om.collapse_instance_dicts()
    om.compute_referrers()
    om.remove_expensive_references()
    om.guess_intern_dict()
    return om.get_all('dict')

dicts_600 = get_dicts("memory-600.00.out")
dicts_6000 = get_dicts("memory-6000.00.out")

# Collect the addresses of all dictionaries in the earlier dump.
set_addresses_600 = set()
for item in dicts_600:
    set_addresses_600.add(item.address)

# Find dictionaries in dicts_6000 whose address is not in set_addresses_600.
diff_list = []
for item in dicts_6000:
    if item.address not in set_addresses_600:
        diff_list.append(item)

diff_list now contains all dictionaries that have been added between the time of dumps of the two files. Let’s get some more information about the specific dictionary objects:

>>> print len(diff_list)
4874
>>> diff_list[300]
dict(140478455628136 1048B 24refs 1par)
>>> diff_list[800]
dict(140478457973568 1048B 24refs 1par)
>>> diff_list[1350]
dict(140478459290240 1048B 24refs 1par)

Interesting, these dictionary objects are very similar. We can print all the references of a specific object:

>>> diff_list[300].refs_as_dict()
{'message"': tuple(140478449713488 64B 1refs 2par), 'log_text"': 'Stopping protocol <Tribler.community.tunnel.tunnel_community.TunnelExitSocket instance at 0x7fc3ba8d"', 'system"': '-"', 'log_namespace"': 'log_legacy"', 'format"': '%(log_legacy)s"', 'isError"': 0, 'log_level"': NamedConstant(140478715765776 344B 7refs 100par), 'log_format"': '{log_text}"', 'log_legacy"': StringifiableFromEvent(140478109273360 344B 3refs 1par), 'log_system"': '-"', 'log_time"': float(140478327820632 24B 2par), 'time"': float(140478327820632 24B 2par)}

So this is the actual content of the dictionary. It appears to be some kind of logging entry. We can see which object is referencing the dictionary object above:

>>> diff_list[300].p
[collections.deque(140478716210768 20080B 2337refs 1par)]

Our log entries seem to be stored in a collections.deque. Moreover, we see that this deque holds 2337 items. Let’s see which object is pointing to it:

>>> diff_list[300].p[0].p
[LimitedHistoryLogObserver(140478715938064 344B 3refs 2par)]

Now things get interesting. The LimitedHistoryLogObserver is part of the Twisted framework we are using to implement asynchronous programming. Our log gets filled with log messages. These log messages are generated when an event happens in Twisted (for instance, when we start or stop listening on a specific port).

At this point, I started searching the internet for known problems with the LimitedHistoryLogObserver. I found this issue, which describes the same problem. A workaround for the LimitedHistoryLogObserver getting filled with log messages can be found here. Seems that our observer is not really that limited :)
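As a generic illustration of what a capped history buffer looks like, the Python sketch below uses a collections.deque with a maxlen. It is not Twisted's implementation; the class name `BoundedHistory` and the chosen size are assumptions made for the example.

```python
from collections import deque


class BoundedHistory:
    """Hypothetical observer that remembers only the most recent events."""

    def __init__(self, size):
        # With maxlen set, appending to a full deque silently drops the
        # oldest entry, so memory use stays capped at `size` events.
        self._events = deque(maxlen=size)

    def __call__(self, event):
        self._events.append(event)

    def __len__(self):
        return len(self._events)


history = BoundedHistory(size=3)
for i in range(10):
    history({"log_text": "event %d" % i})
```

Note that even a capped buffer keeps up to `size` events alive, so with a large cap the memory usage can keep growing for a long time before it levels off, which looks very much like a leak in the dumps.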

Meliae contains many more tools to investigate objects and get more interesting information:

>>> dir(diff_list[300])
['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__', '__getitem__', '__hash__', '__init__', '__len__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_intern_from_cache', 'address', 'all', 'c', 'children', 'compute_total_size', 'iter_recursive_refs', 'num_parents', 'num_referrers', 'num_refs', 'p', 'parents', 'ref_list', 'referrers', 'refs_as_dict', 'size', 'to_json', 'total_size', 'type_str', 'value']

Meliae can also be combined with other tools such as objgraph. Objgraph can generate graphs that show references between objects. In my next blog post, I will write more about this tool. I hope you enjoyed this blog post and if you have any comments or questions, please let me know.

]]>
<![CDATA[When working on the Tribler project as part of my master thesis, I was asked to investigate a memory issue. Tribler allows people to share content in a fully decentralized way and implements a Tor-like protocol that can be used to download content anonymously. By proxying traffic through other nodes, the anonymity of Tor is obtained. You can start a Python script that allows your computer to become a proxy for other users in the network. However, when running this tool for a longer period (a few days), the memory becomes filled with objects, eventually crashing the program because it runs out of memory.]]>