Collected notes on Local Android Backups

They are difficult and convoluted and not at all nice.
The documentation is poor, widely spread, and difficult to find or trust across the internet.
You are expected to back up to Google's computers instead. But I don't want to do that.

Android changes a lot, and it's difficult to know if each piece of available information is still relevant.

Okay. Here's what I figured out. It may not all be correct.

Android has a convoluted storage structure.
The internal storage device is mounted in a bunch of different places with counterintuitive names, including:

/data/media/{uid}/
/sdcard # nb this is the virtual emulated sdcard, it's actually internal flash storage.
/storage/self/primary # When booted, not in recovery

The SD card is mounted at

/external_sd
/storage/{Partition-UUID} # When booted, not in recovery

/data is a hidden folder containing both App APKs and App Data. It is inaccessible on a non-rooted device, but can be accessed through TWRP recovery or similar.

Android Applications are installed to /data/app/{appbundleid}
They store their app data in one or more of these locations, listed in order of likelihood:

/data/data/{appbundleid}/
/data/media/0/Android/{data,media,obb}/{appbundleid}
/external_sd/Android/{data,media,obb}/{appbundleid}
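To make the brace notation concrete, here is a small Python sketch that expands the three locations above for one package. The function name candidate_data_dirs and the package com.example.app are made up for illustration, and user 0 and the /external_sd mount point are just the values used in these notes; they may differ on your device.

```python
def candidate_data_dirs(package, user=0, sd_mount="/external_sd"):
    """List the places an app may keep its data, most likely first."""
    dirs = [f"/data/data/{package}"]
    # Internal storage first, then the removable SD card.
    for root in (f"/data/media/{user}", sd_mount):
        for kind in ("data", "media", "obb"):
            dirs.append(f"{root}/Android/{kind}/{package}")
    return dirs


if __name__ == "__main__":
    for path in candidate_data_dirs("com.example.app"):
        print(path)
```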

Backup Format?

Android under the hood makes use of unix permissions and SELinux attributes. These need backing up.

TWRP makes backups from recovery mode.
It creates tarballs using a custom tar format via their own fork of libtar.

This fork is not easy to compile for non-Android, non-ARM systems.

The format includes double entries for every file, and uses custom headers.
One entry is mostly blank and includes the SELinux info.
The other includes the file data.
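As a rough illustration of the double-entry idea, here is a Python sketch that keeps only the entry carrying file data for each path. It rests on two assumptions that I have not verified against a real TWRP archive: that Python's tarfile module can parse the archive at all, and that both entries of a pair share the same path. The helper name data_entries is made up.

```python
import io
import tarfile


def data_entries(archive):
    """Return one member per path, preferring the entry that carries data.

    Assumes the layout described above: the same path appears twice, once as
    a mostly blank entry and once with the real file contents.
    """
    best = {}
    for member in archive.getmembers():
        seen = best.get(member.name)
        if seen is None or member.size > seen.size:
            best[member.name] = member
    return list(best.values())


if __name__ == "__main__":
    # Build a tiny stand-in archive with a blank entry and a data entry.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.addfile(tarfile.TarInfo("data/app/base.apk"))
        real = tarfile.TarInfo("data/app/base.apk")
        real.size = 9
        tar.addfile(real, io.BytesIO(b"apk bytes"))
    buf.seek(0)
    with tarfile.open(fileobj=buf) as tar:
        for member in data_entries(tar):
            print(member.name, member.size)
```

If a real backup does not parse this way, the blog posts listed further down describe stripping the extra entries at the byte level instead.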

In short, TWRP makes custom tarballs with custom tools, which are only really useful for a whole disk backup and restore. This effectively limits your backups to the same device and same ROM version.
If for example, your phone breaks and you buy a new one, accessing your backup data isn't going to be simple.

What is backed up?

By Default, TWRP backs up these partitions:

/boot
/super
/data

However, as explicitly stated, and confusingly worded due to Android partition naming conventions, TWRP DOES NOT back up what it calls "Internal Storage".

In this case "Internal Storage" = /data/media
i.e. the user-visible internal storage files.

This is fine: you can back up your camera roll without complicated root imaging software.

The problem is that some apps do store data under /data/media/0/Android, which gets excluded from this backup.

https://www.reddit.com/r/Xiaomi/comments/fj36oz/information_on_backing_up_and_restoring_with_twrp/

seemebreakthis reckons you can just create your own tarball from recovery if you want this. Seems sensible. Just BE AWARE that TWRP doesn't back this up by default.

Note: just running 'tar' from an ADB shell does NOT produce a custom TWRP tarball with preserved SELinux data. Later you'll see that preserved SELinux data doesn't seem too important for app files.
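A minimal sketch of making your own tarball of /data/media/0/Android from a computer, in Python. It assumes the phone is booted into recovery with adb available, that the recovery's tar accepts -c, -f - and -C, and that adb exec-out works there; I have not checked every recovery build. The function names are made up. This gives a plain tar, not a TWRP-format one.

```python
import subprocess


def tar_over_adb_cmd(parent="/data/media/0", name="Android"):
    """Build an adb command that streams a plain tar of parent/name to stdout."""
    return ["adb", "exec-out", f"tar -cf - -C {parent} {name}"]


def pull_tarball(dest, parent="/data/media/0", name="Android"):
    """Run the command and write the tar stream to dest on the computer."""
    with open(dest, "wb") as out:
        subprocess.run(tar_over_adb_cmd(parent, name), stdout=out, check=True)


if __name__ == "__main__":
    # Only prints the command; call pull_tarball("android.tar") to run it.
    print(" ".join(tar_over_adb_cmd()))
```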

What's out there?

https://gamerdonkey.com/posts/2020/installing-a-lost-android-app-from-a-twrp-full-backup/

Here gamerdonkey, in 2020, examines the structure of the tarballs.
He is able to strip the SELinux entries from his TWRP backup, leaving only file contents.
From there he extracts an APK file and re-installs it.

https://www.semipol.de/posts/2016/07/android-manually-restoring-apps-from-a-twrp-backup/

Semipol, in 2016, extracts app data from backed-up apps and restores it to a phone with that app installed. He notes that each app is assigned its own Linux user account at install time, that the app's file ownership needs to match this account, and that the SELinux attributes need to be restored. He explains how to do this with dumpsys, chown and restorecon.
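A rough Python sketch of those three steps, as I understand them. It assumes the output of `dumpsys package <bundleid>` contains a line of the form userId=NNNNN, and that the resulting commands are run in a root shell (for example from recovery). The function names are made up and this is not Semipol's actual script.

```python
import re


def parse_user_id(dumpsys_output):
    """Pull the numeric userId out of `dumpsys package <bundleid>` output."""
    match = re.search(r"userId=(\d+)", dumpsys_output)
    if match is None:
        raise ValueError("no userId= line found")
    return int(match.group(1))


def restore_commands(package, user_id):
    """Shell commands to fix ownership and SELinux labels on restored data."""
    path = f"/data/data/{package}"
    return [
        f"chown -R {user_id}:{user_id} {path}",  # match the app's Linux user
        f"restorecon -R {path}",                 # reset SELinux labels
    ]


if __name__ == "__main__":
    sample = "Packages:\n  Package [com.pal.train]\n    userId=10548\n"
    uid = parse_user_id(sample)
    for cmd in restore_commands("com.pal.train", uid):
        print(cmd)
```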

https://newspaint.wordpress.com/2016/05/03/restoring-selinux-labels-after-restoring-from-data-backup-to-android/

Provides context for Semipol's post.

https://gist.github.com/danpawlik/4df7715f4d0f4c67ffffcdd18d0159e7

danpawlik (in 2020) has packaged the above into a script which basically extracts the whole TWRP backup into the /data/data directory, queries the GIDs on the restored backup, and makes sure every file in the {bundleid} directory has that ownership.

https://stosb.com/blog/recovering-data-from-a-corrupt-tar-archive/

Tom Hacohen, in 2018, is able to forensics his way into getting some valuable data out of a TWRP tar backup. It was apparently corrupted, but it's also possible that a regular GNU/BSD version of tar was just choking on the custom-tar-format archive.

Either way, he was able to get his data out of the archive using standard tools, which is reassuring.

(NB. In my personal trials on a Mac, BSD tar utterly choked on my TWRP backups, while GNU tar fared better.)

https://github.com/simon816/libtar-twrp

Finally, Simon816 in 2024 has created a config to allow compiling libtar-twrp for non-Android Linux devices. w568w kindly packaged this as an AUR package for Arch Linux (installable with yay).

I have found the following:

  1. Using tar from TWRP does NOT preserve the SELinux attributes. I think this is done by the twrp binary, but there's no option to use this on arbitrary directories (though apparently, if you rename the /data/media directory, it will then be included in a /data partition backup).
  2. Using tar like this will preserve UID and GID, which can then be restored.
  3. The files have ownership {APPUSER:APPUSER} in the /data/data folder, but everything in /data/media is owned by user 'media_rw' with id 1023. The groups are app specific though.
  4. There seems to be a scheme where a different leading digit is put in front of the app's number to form the UID/GID for different files relating to the app.
    e.g. these are some of the files from an app, Trainpal:
        bundleID: com.pal.train
        userId: 10548

        /data/media/0/Android/data/com.pal.train
            UID 1023: media_rw
            GID 30548: u0_a548_ext
        /data/media/0/Android/data/com.pal.train/cache
            UID 1023: media_rw
            GID 40548: u0_a548_ext_cache
        /data/data/com.pal.train/
            UID 10548: u0_a548
            GID 10548: u0_a548
  5. The userId for the app is assigned at install time. Uninstall and reinstall the same app and it gets a new userId. danpawlik above appears to be having success by just restoring the whole backup including the APK, and then reapplying the permissions based on the backed-up userId. It makes sense that this should work.
    Semipol instead installed the app first, then reapplied ownership based on the app's new userId.
  6. For /data/media, doing a tar backup while booted into recovery doesn't yield any more files than a simple adb pull /storage/self/primary from a computer when booted normally.
    However, it's possible this may change as Google continue to tighten up access to the /storage/self/primary/Android directory for users.
  7. To the question: can you back up and restore apps without root?
    Yes, if you have a custom recovery like TWRP,
    and you're happy monkeying around extracting the files from the backup and deploying them via adb.
    Probably you should just root and use SwiftBackup though.
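The id pattern seen for Trainpal, and the problem of a reinstalled app getting a new userId, can be sketched in Python like this. The offsets 10000, 30000 and 40000 are inferred from that one example (10548, 30548, 40548) for the primary Android user; treat them as an assumption rather than a documented rule. The function names are made up.

```python
APP_START = 10000            # u0_a548           -> 10548
EXT_GID_START = 30000        # u0_a548_ext       -> 30548
EXT_CACHE_GID_START = 40000  # u0_a548_ext_cache -> 40548


def ids_for(user_id):
    """Derive the related ids for an app from its userId."""
    n = user_id - APP_START
    return {
        "app": user_id,
        "ext": EXT_GID_START + n,
        "ext_cache": EXT_CACHE_GID_START + n,
    }


def remap(old_id, old_user_id, new_user_id):
    """Translate an id found in a backup to the reinstalled app's id.

    Ids that do not belong to the app (e.g. 1023, media_rw) pass through.
    """
    old, new = ids_for(old_user_id), ids_for(new_user_id)
    for key, value in old.items():
        if old_id == value:
            return new[key]
    return old_id


if __name__ == "__main__":
    print(ids_for(10548))
    # Hypothetical case: the app was reinstalled and now has userId 10600.
    print(remap(40548, 10548, 10600))
```

A restore script along Semipol's lines could run each backed-up UID and GID through remap before calling chown.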