January 6, 2022

Android ABX – Binary XML

On Android 12, a new challenger approaches in the form of a new file format that is replacing XML data in a number of key artefacts. In this blog, Principal Analyst Alex Caithness explains the new ‘ABX’ format, provides open-source code for reading the data and emphasises the importance of collaboration in the digital forensics community.

Collaboration within the Digital Forensics community is a wonderful thing.

It was like any other Saturday night when Alexis Brignoni (aka Brigs, prominent digital forensics practitioner and researcher, and creator of the LEAPP series of open source forensic tools: https://github.com/abrignoni) put up the Bat Signal. He was alerting me to a finding made by Josh Hickman (who, amongst many things, maintains an excellent set of free Android test images: https://thebinaryhick.blog/): an XML file on Android 12 had been replaced by some kind of binary format with an ‘ABX’ header.

Figure 1. Time to get busy!

Since the initial discovery of this single file, ABX files have been popping up in multiple locations on Android 12 test images, primarily, but not exclusively, in system artefacts. Much like on iOS, where we can find property lists (plists) in both text and binary formats, it appears that this is a reality we are heading towards on Android too. As Alexis Brignoni put it: “ABX is to XML as bplist is to plist”.

Possibly not everyone’s idea of a fun Saturday evening, this challenge was impossible for me to ignore, so equipped with beer and pretzels (it was the weekend after all) and with a test file provided as a starting point (one of many that would be helpfully filtered through to me by Josh and Alexis during the course of the research) I started digging.

A few Google searches identified a binary XML format which is used to compress the ‘AndroidManifest.xml’ file within APK (Android Application Package) files, but a quick perusal of the format suggested that the ‘ABX’ file identified was something different (if you’re interested, there are details on this older format here: https://github.com/google/android-classyshark/blob/master/ClassySharkWS/src/com/google/classyshark/silverghost/translator/xml/XmlDecompressor.java).

With that dead end met, I headed instead to the Android Code Search website and started with some obvious search terms, in particular that ‘ABX’ file signature. That led me to the following: https://cs.android.com/android/platform/superproject/+/master:frameworks/base/core/java/com/android/internal/util/BinaryXmlSerializer.java which, from a high level, described the structure of the test file I had access to. Good, time to dig in and understand it in more detail then!

Introduction to the ABX File Format

The ABX file format is a binary serialisation of XML data (or rather a subset of XML: some features of XML like namespaces are not supported). It allows an XML document to be stored in a binary format which is both more compact and able to be read more quickly (around a 2.4x reduction in file size and 4.3x faster to read according to Google’s own benchmarks).

Figure 2. A small XML document
Figure 3. The same XML document stored as an ABX file

The file begins with the file signature ‘ABX\0’ encoded as ASCII. According to the source code, the nul (0x00) character at the end of the signature represents the file format version so could change in the future if the format is updated.
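As a minimal sketch, the signature check described above might look like this in Python. The function name `looks_like_abx` is hypothetical, and treating the fourth byte as a version number simply follows the description in the source code.

```python
from typing import Optional, Tuple

ABX_MAGIC = b"ABX"  # first three bytes of the signature


def looks_like_abx(data: bytes) -> Tuple[bool, Optional[int]]:
    """Return (is_abx, version) for the first bytes of a file.

    The version is the fourth byte, which is 0x00 in the files described here.
    """
    if len(data) < 4 or data[:3] != ABX_MAGIC:
        return False, None
    return True, data[3]


print(looks_like_abx(b"ABX\x00\x10\x32"))  # (True, 0)
print(looks_like_abx(b"<?xml version="))   # (False, None)
```

Returning the version byte separately means a caller can flag a file that has the ‘ABX’ prefix but an unexpected version, rather than silently rejecting it.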

The file is encoded as a sequence of XML ‘events’ which describe the document when read in order. This is based upon the idea of a ‘Pull Parser’ (https://developer.android.com/reference/org/xmlpull/v1/XmlPullParser) which conceptually would represent the document in Figure 2 as:


Document Starts

Tag Starts: Name is ‘root’

Tag Starts: Name is ‘myElement’; it has 1 attribute called ‘anAttribute’ with the value ‘value of an attribute’

Text: ‘Content of the first myElement’

Tag Ends: Name is ‘myElement’

Tag Starts: Name is ‘myElement’; it has 1 attribute called ‘anAttribute’ with the value ‘some more attribute data’

Text: ‘Content of the Second myElement’

Tag Ends: Name is ‘myElement’

Tag Ends: Name is ‘root’

Document Ends
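
The event sequence above can be reproduced with a short Python sketch. Two assumptions apply: the document text is my reconstruction of Figure 2 from the event list (written without whitespace between elements), and Python's expat module is a callback-based parser rather than a true pull parser, so the events are simply collected into a list in document order.

```python
import xml.parsers.expat

# Assumed reconstruction of the Figure 2 document, written without
# whitespace between elements so that no extra text events are produced.
DOCUMENT = (
    "<root>"
    '<myElement anAttribute="value of an attribute">'
    "Content of the first myElement</myElement>"
    '<myElement anAttribute="some more attribute data">'
    "Content of the Second myElement</myElement>"
    "</root>"
)


def list_events(xml_text: str) -> list:
    """Collect the document's XML events in the order they occur."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver each run of text as a single event
    parser.StartElementHandler = lambda name, attrs: events.append(
        ("Tag Starts", name, attrs)
    )
    parser.EndElementHandler = lambda name: events.append(("Tag Ends", name))
    parser.CharacterDataHandler = lambda text: events.append(("Text", text))

    # expat has no document start/end callbacks, so these are added by hand.
    events.append(("Document Starts",))
    parser.Parse(xml_text, True)
    events.append(("Document Ends",))
    return events


for event in list_events(DOCUMENT):
    print(event)
```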


The ABX format ‘interns’ element names, attribute names and, optionally, attribute values. This means that the text data which represents these values is written to the file itself only once; if the same text is to be written again, it is instead referred to by an ID in a lookup table. You can see evidence of this behaviour in Figure 3, where the string ‘myElement’ appears only once, despite there being four tags with that name in the source data: each subsequent use of this text is ‘interned’.

The format also allows for ‘strongly-typed’ attribute values. In ‘vanilla’ XML, all data in an attribute is text, but ABX allows a programmer to store numbers, binary data and Boolean values natively, so that, for example, a floating-point number can be stored in the data as an IEEE 754 value rather than as text, an integer can be stored as a two's-complement value rather than as text, and so on. The use of native data formats for strongly-typed data is an important feature to note, as it may mean that numeric identifiers (like user IDs) which were previously responsive to text searches will no longer provide hits where an XML file has migrated to using ABX.
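
To illustrate why text searches can miss strongly-typed values, here is a small Python sketch. The user ID of 1000 is a made-up example, and the big-endian byte order is an assumption based on the behaviour of Java's DataOutput, which the Android code builds upon.

```python
import struct

user_id = 1000  # hypothetical identifier, for illustration only

as_text = str(user_id).encode("utf-8")  # how the value appears in textual XML
as_int32 = struct.pack(">i", user_id)   # 4-byte big-endian two's-complement
as_float = struct.pack(">f", 1.5)       # IEEE 754 single precision

print(as_text)              # b'1000'
print(as_int32.hex())       # 000003e8
print(as_float.hex())       # 3fc00000
print(as_text in as_int32)  # False: a text search for "1000" finds nothing
```

In practice this means a keyword search for an identifier may need to be repeated against the decoded XML rather than the raw ABX bytes.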

ABX File Format in Depth

The code which governs the creation of ABX data can be found in BinaryXmlSerializer.java (https://android.googlesource.com/platform/frameworks/base/+/refs/heads/master/core/java/com/android/internal/util/BinaryXmlSerializer.java). This code makes use of the FastDataOutput class (https://android.googlesource.com/platform/frameworks/base/+/refs/heads/master/core/java/com/android/internal/util/FastDataOutput.java) for the raw output of the data, which includes the ‘interning’ behaviour described in the previous section, in addition to some constant values defined in XmlPullParser (https://android.googlesource.com/platform/libcore/+/refs/heads/master/xml/src/main/java/org/xmlpull/v1/XmlPullParser.java).

What follows is an explanation of this format based upon my understanding of this code and testing we have undertaken.

The format starts with the signature 0x41425800 (ABX\0), which is then followed by one or more ‘tokens’. Each token is made up of a token type byte followed by data, encoded according to the data type given in the type byte.

The type byte contains two pieces of information. The least significant 4 bits (the lower nibble) give the type of XML event for this token; the most significant 4 bits (the upper nibble) give the data type for the token’s data.
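
Splitting a type byte into its two nibbles can be sketched in Python as follows. The function name is hypothetical and the byte 0x32 is an illustrative value. Event value 2 is START_TAG in XmlPullParser; the data type value 3 should correspond to STRING_INTERNED, but check the constants in the linked BinaryXmlSerializer source before relying on that.

```python
from typing import Tuple


def split_type_byte(type_byte: int) -> Tuple[int, int]:
    """Return (event, data_type) from a token's type byte."""
    event = type_byte & 0x0F             # lower nibble: XML event
    data_type = (type_byte & 0xF0) >> 4  # upper nibble: data type
    return event, data_type


print(split_type_byte(0x32))  # (2, 3)
```

Note that this sketch shifts the data type down into the range 0 to 15 for readability; code which compares against constants defined in their shifted form would mask with 0xF0 and skip the shift.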

All data written as STRING_INTERNED within a file is given an ID: the first ID assigned is 0, and each subsequent ID is incremented by 1. Whenever an interned string is written, the writer first checks whether that string has already been written to the file. If it has, its ID is written as an unsigned 16-bit integer. If this is the first time the string has been written, the maximum 16-bit value (0xFFFF) is written, followed by the length of the string in bytes as a 16-bit integer, followed by that many bytes of UTF-8 encoded data; the new string is then given the next available ID. A maximum of 65534 (0xFFFE) strings can be interned. Any string first encountered after that point is stored as if it were being written for the first time, even on repeated writes.
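
The interning scheme can be sketched as a small Python round trip. This is an illustration of the behaviour described above rather than a port of the Android code: the class names are hypothetical, and big-endian byte order is assumed (as used by Java's DataOutput).

```python
import io
import struct

NEW_STRING_MARKER = 0xFFFF  # signals that the string's bytes follow inline
MAX_INTERNED = 0xFFFE       # limit on interned strings, as described above


class InternWriter:
    def __init__(self, stream):
        self._stream = stream
        self._ids = {}  # string -> ID, assigned in order of first use

    def write_interned(self, text: str) -> None:
        if text in self._ids:
            # Seen before: write only the 16-bit ID.
            self._stream.write(struct.pack(">H", self._ids[text]))
            return
        raw = text.encode("utf-8")
        self._stream.write(struct.pack(">HH", NEW_STRING_MARKER, len(raw)))
        self._stream.write(raw)
        if len(self._ids) < MAX_INTERNED:
            self._ids[text] = len(self._ids)


class InternReader:
    def __init__(self, stream):
        self._stream = stream
        self._strings = []  # position in the list is the string's ID

    def read_interned(self) -> str:
        (ref,) = struct.unpack(">H", self._stream.read(2))
        if ref != NEW_STRING_MARKER:
            return self._strings[ref]
        (length,) = struct.unpack(">H", self._stream.read(2))
        text = self._stream.read(length).decode("utf-8")
        self._strings.append(text)
        return text


buffer = io.BytesIO()
writer = InternWriter(buffer)
for name in ("myElement", "anAttribute", "myElement"):
    writer.write_interned(name)
data = buffer.getvalue()
print(data.hex())  # the second 'myElement' is just the two bytes 0000

reader = InternReader(io.BytesIO(data))
print([reader.read_interned() for _ in range(3)])
```

The reader has to build its lookup table in exactly the same order as the writer, which is why an ABX file must be decoded from the start: an ID met partway through the file only makes sense if every earlier interned string has been seen.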

For all token types other than the ATTRIBUTE type, data of the type defined by the data type follows directly after the type byte. With an ATTRIBUTE, an interned string (and it’s always an interned string, regardless of the type byte) follows the type byte, which is the name of the attribute, followed then by the value encoded as per the type byte.
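
As a hedged sketch of the ATTRIBUTE layout, the following Python reads a single attribute token. The numeric constants are my recollection of the values in the linked BinaryXmlSerializer source and should be verified there; only three data types are handled; the absence of a payload for Boolean values is likewise an assumption; and the attribute name ‘userId’ is invented for the example.

```python
import io
import struct

# Assumed constants: check them against the linked BinaryXmlSerializer source.
ATTRIBUTE = 0xF
TYPE_STRING_INTERNED = 0x3
TYPE_INT = 0x6
TYPE_BOOLEAN_TRUE = 0xC
TYPE_BOOLEAN_FALSE = 0xD


def read_interned(stream, table: list) -> str:
    """Read an interned string, adding it to the table on first sight."""
    (ref,) = struct.unpack(">H", stream.read(2))
    if ref != 0xFFFF:
        return table[ref]
    (length,) = struct.unpack(">H", stream.read(2))
    text = stream.read(length).decode("utf-8")
    table.append(text)
    return text


def read_attribute(stream, table: list):
    """Read one ATTRIBUTE token and return (name, value)."""
    type_byte = stream.read(1)[0]
    event, data_type = type_byte & 0x0F, type_byte >> 4
    if event != ATTRIBUTE:
        raise ValueError("not an ATTRIBUTE token")
    # The name is always an interned string, whatever the data type says.
    name = read_interned(stream, table)
    if data_type == TYPE_STRING_INTERNED:
        value = read_interned(stream, table)
    elif data_type == TYPE_INT:
        (value,) = struct.unpack(">i", stream.read(4))
    elif data_type in (TYPE_BOOLEAN_TRUE, TYPE_BOOLEAN_FALSE):
        value = data_type == TYPE_BOOLEAN_TRUE  # the type itself carries the value
    else:
        raise NotImplementedError(f"data type {data_type:#x} not handled here")
    return name, value


# Hand-built token: type byte 0x6F, new interned name 'userId', 32-bit value.
token = bytes([0x6F]) + b"\xff\xff\x00\x06userId" + struct.pack(">i", 1000)
print(read_attribute(io.BytesIO(token), []))  # ('userId', 1000)
```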

As an example, let’s work through a few tokens:

Figure 4. Decoding a token at offset 4
Figure 5. Decoding a token at offset 5
Figure 6. Decoding a token at offset 220

Scripts and Test Data

As usual, we at CCL are committed to empowering the digital forensics community through open source, so we’re very happy to announce that we have released a Python module (ccl_abx) which can be used by other projects to convert ABX files back to their XML representation for processing. The module also works as a command line utility for converting ABX data.

In addition to the Python module, we have also included a number of test ABX files, and their XML equivalents so that you can confirm for yourself that the module is operating correctly, as well as a Java application (makeabx) which uses the original code from the Android source (patched to remove extraneous dependencies) to convert an XML file to ABX so that you can generate your own test data.

The repository containing the code can be found here: https://github.com/cclgroupltd/android-bits
