ART Oat Format
1. ART Oat Format
1.1. ELF
An oat file is an ELF file. More precisely, it is a shared object (so) file, but its structure is much simpler than that of a normal so:
1.1.1. section headers
Section Headers:
  [Nr] Name      Type     Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]           NULL     00000000 000000 000000 00      0   0    0
  [ 1] .dynsym   DYNSYM   000000d4 0000d4 000040 10   A  2   0    4
  [ 2] .dynstr   STRTAB   00000114 000114 000046 01   A  0   0    1
  [ 3] .hash     HASH     0000015c 00015c 000020 04   A  1   0    4
  [ 4] .rodata   PROGBITS 00001000 001000 181000 00   A  0   0 4096
  [ 5] .text     PROGBITS 00182000 182000 138554 00  AX  0   0 4096
  [ 6] .dynamic  DYNAMIC  002bb000 2bb000 000038 08   A  1   0 4096
  [ 7] .shstrtab STRTAB   00000000 2bb038 000038 01      0   0    1
1.1.2. program headers
Program Headers:
  Type    Offset   VirtAddr   PhysAddr   FileSiz  MemSiz   Flg Align
  PHDR    0x000034 0x00000034 0x00000034 0x000a0  0x000a0  R   0x4
  LOAD    0x000000 0x00000000 0x00000000 0x182000 0x182000 R   0x1000
  LOAD    0x182000 0x00182000 0x00182000 0x138554 0x138554 R E 0x1000
  LOAD    0x2bb000 0x002bb000 0x002bb000 0x00038  0x00038  RW  0x1000
  DYNAMIC 0x2bb000 0x002bb000 0x002bb000 0x00038  0x00038  RW  0x1000
1.1.3. Section to Segment mapping
Segment Sections...
  00
  01     .dynsym .dynstr .hash .rodata
  02     .text
  03     .dynamic
  04     .dynamic
1.1.4. Dynamic sections
Dynamic section at offset 0x2bb000 contains 7 entries:
  Tag        Type     Name/Value
  0x00000004 (HASH)   0x15c
  0x00000005 (STRTAB) 0x114
  0x00000006 (SYMTAB) 0xd4
  0x0000000b (SYMENT) 16 (bytes)
  0x0000000a (STRSZ)  70 (bytes)
  0x0000000e (SONAME) Library soname: [system@[email protected]@classes.dex]
  0x00000000 (NULL)   0x0
1.1.5. dynsym
Symbol table '.dynsym' contains 4 entries:
  Num: Value    Size     Type   Bind   Vis     Ndx Name
    0: 00000000 0        NOTYPE LOCAL  DEFAULT UND
    1: 00001000 0x181000 OBJECT GLOBAL DEFAULT   4 oatdata
    2: 00182000 0x138554 OBJECT GLOBAL DEFAULT   5 oatexec
    3: 002ba550 4        OBJECT GLOBAL DEFAULT   5 oatlastword
1.1.6. hash
Histogram for bucket list length (total of 2 buckets):
  Length Number % of total Coverage
       0      0 (  0.0%)
       1      1 ( 50.0%)     33.3%
       2      1 ( 50.0%)    100.0%
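1.1.7. Summary
- An oat lacks several sections that a normal so always has, for example: relocation information (.rela.xxx), plt, got.plt, got, init, fini, and so on.
- An oat has no symtab, and its dynsym is tiny. In fact, the key information in an oat is the two symbols listed in dynsym: oatdata and oatexec. Specifically, oatdata points to the rodata section (0x1000) and oatexec points to the text section (0x182000).
- A normal so usually maps text and rodata into the same RE segment. An oat does it differently: rodata is mapped into its own R segment, and text is mapped into a separate RE segment.
Presumably the main reason for wrapping an oat in ELF is to let the loader map in the rodata and text described by the program header table, and to let a program obtain the absolute load addresses of the oatdata and oatexec symbols easily through dlsym.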
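The relationship between the dynsym symbols and the sections can be checked against the readelf numbers above. The following is only a minimal sketch: the values are copied from the dumps in this section, while the dictionary layout and the helper name `section_of` are made up for illustration.

```python
# Section and symbol values copied from the readelf output above.
SECTIONS = {
    ".rodata": {"addr": 0x00001000, "size": 0x181000},
    ".text": {"addr": 0x00182000, "size": 0x138554},
}
SYMBOLS = {
    "oatdata": {"value": 0x00001000, "size": 0x181000},
    "oatexec": {"value": 0x00182000, "size": 0x138554},
    "oatlastword": {"value": 0x002BA550, "size": 4},
}


def section_of(addr):
    """Return the name of the section whose address range contains addr."""
    for name, sec in SECTIONS.items():
        if sec["addr"] <= addr < sec["addr"] + sec["size"]:
            return name
    return None


# oatdata spans exactly .rodata, and oatexec spans exactly .text.
for sym, sec in (("oatdata", ".rodata"), ("oatexec", ".text")):
    assert SYMBOLS[sym]["value"] == SECTIONS[sec]["addr"]
    assert SYMBOLS[sym]["size"] == SECTIONS[sec]["size"]

# oatlastword is the last 4-byte word of .text.
text_end = SECTIONS[".text"]["addr"] + SECTIONS[".text"]["size"]
assert SYMBOLS["oatlastword"]["value"] == text_end - 4

print(section_of(SYMBOLS["oatlastword"]["value"]))  # prints ".text"
```

All three symbols therefore describe nothing more than the bounds of .rodata and .text, which fits the guess that ELF is used here mainly as a container for the loader.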
1.2. Oat
1.2.1. oatdata
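The main contents of oatdata are:
- oat_header
- oat_dex_file
  oat_dex_file holds the header information for one dex file. A single oat can contain the information for multiple dex files.
  - dex_file_location_ stores the name of the original dex file
  - dex_file_pointer_ points to dex_file, which holds the contents of the dex
  - oat_class_offsets_pointer_ points to the oat_class array, which holds the information for every class in the dex file
- dex_file
  dex_file is a complete copy of the original dex file.
- oat_class
  Holds the information for one class of a dex file.
  - status is the status of the class, e.g. kStatusResolved, kStatusInitialized
  - type records how much of the class was compiled, e.g. kOatClassAllCompiled, kOatClassSomeCompiled, kOatClassNoneCompiled
  - methods_pointer_ points to oat_method_offsets, whose code_offset points to the compiled code in oatexec.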
OatFile
+---------------+
| OatHeader     |
+---------------+----------------------------+
| OatDexFile[0] | dex_file_location_         |
|               +----------------------------+
|               | dex_file_pointer_          +---+
|               +----------------------------+   |
|               | oat_class_offsets_pointer_ +---+--+
+---------------+----------------------------+   |  |
| OatDexFile[1] |                                |  |
| ...           |                                |  |
+---------------+                                |  |
| DexFile[0]    | <------------------------------+  |
+---------------+                                   |
| DexFile[1]    |                                   |
| ...           |                                   |
+---------------+---------------------+             |
| OatClass[0]   | status              | <-----------+
|               +---------------------+
|               | type                |
|               +---------------------+-------------+
|               | methods_pointer_[0] | code_offset +--> code in oatexec
|               |                     +-------------+
|               |                     | frame_size  |
|               |                     | ...         |
|               +---------------------+-------------+
|               | methods_pointer_[1] |
|               | ...                 |
+---------------+---------------------+
| OatClass[1]   |
| ...           |
+---------------+
| GCMap         |
| ...           |
| GCMap         |
+---------------+
| Vmap Table    |
| ...           |
| Vmap Table    |
+---------------+
| Mapping Table |
| ...           |
| Mapping Table |
+---------------+
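The pointer chain in the layout above (OatDexFile, then OatClass, then methods_pointer_[i], then code_offset) can be modelled in a few lines. This is a simplified sketch and not the real ART data structures: the Python class names only mirror the names used in this article, `code_address` and the base address 0x70000000 are hypothetical, and code_offset is assumed to be relative to the start of oatdata, as the offsets printed by oatdump suggest. The code_offset value itself is borrowed from the oatdump output later in this article.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OatMethodOffsets:
    code_offset: int  # assumed to be relative to the start of oatdata


@dataclass
class OatClass:
    status: str
    type: str
    methods: List[OatMethodOffsets] = field(default_factory=list)


@dataclass
class OatDexFile:
    dex_file_location: str
    oat_classes: List[OatClass] = field(default_factory=list)


def code_address(oatdata_begin: int, dex: OatDexFile,
                 class_idx: int, method_idx: int) -> Optional[int]:
    """Follow OatDexFile -> OatClass -> methods[i] -> code_offset."""
    oat_class = dex.oat_classes[class_idx]
    if oat_class.type == "kOatClassNoneCompiled":
        return None  # nothing was compiled, so there is no code to point at
    return oatdata_begin + oat_class.methods[method_idx].code_offset


core = OatDexFile(
    "/system/framework/core-libart.jar",
    [OatClass("kStatusInitialized", "kOatClassAllCompiled",
              [OatMethodOffsets(code_offset=0x01EDE08C)])],
)
# 0x70000000 is a made-up load address for oatdata.
print(hex(code_address(0x70000000, core, 0, 0)))  # prints 0x71ede08c
```

The sketch ignores the kOatClassSomeCompiled case, where only some of the methods have an entry.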
1.2.2. oatexec
The layout of oatexec is:
+----------------------------------------+
| interpreter_to_interpreter_bridge_     |
| interpreter_to_compiled_code_bridge_   |
| ...                                    |
| quick_to_interpreter_bridge_           |
+----------------------------------------+
| OatQuickMethodHeader                   |
|   +-----------------------+            |
|   | mapping_table_offset_ |            |
|   | vmap_table_offset_    |            |
|   | gc_map_offset_        |            |
|   | frame_info_           |            |
|   | code_size_            |            |
|   +-----------------------+            |
| native code                            |
+----------------------------------------+
| OatQuickMethodHeader                   |
| native code                            |
+----------------------------------------+
| ...                                    |
+----------------------------------------+
1.2.2.1. trampoline
Several trampoline functions are stored at the very beginning of oatexec:
- interpreter_to_interpreter_bridge_
- interpreter_to_compiled_code_bridge_
- jni_dlsym_lookup_
- quick_generic_jni_trampoline_
- quick_imt_conflict_trampoline_
- quick_resolution_trampoline_
- quick_to_interpreter_bridge_
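How these trampolines are used:
When the image writer generates an image, it has to write every ArtMethod into the boot image. Some ArtMethods have no corresponding quick code, for example methods that failed to compile, or native methods that were not compiled by the jni compiler.
In that case the image writer sets the ArtMethod's entry_point_from_quick_compiled_code_ to the appropriate trampoline; see ImageWriter::GetQuickCode.
OatWriter::InitOatCode is responsible for writing these functions at the beginning of the oatexec part.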
oatdump output:
// start of oatexec
EXECUTABLE OFFSET: 0x01ede000
// location of interpreter_to_interpreter_bridge_; as expected, it sits at the very start of oatexec
INTERPRETER TO INTERPRETER BRIDGE OFFSET: 0x01ede000
// location of interpreter_to_compiled_code_bridge_; so one trampoline takes 0x10 (16 bytes)
INTERPRETER TO COMPILED CODE BRIDGE OFFSET: 0x01ede010
JNI DLSYM LOOKUP OFFSET: 0x01ede020
....
QUICK TO INTERPRETER BRIDGE OFFSET: 0x01ede060

OatDexFile:
location: /system/framework/core-libart.jar
0: Ljava/lang/Object; (offset=0x019228b8) (type_idx=1337) (StatusInitialized) (OatClassAllCompiled)
  0: void java.lang.Object.<init>() (dex_method_idx=13004)
    ....
    // location of the first OatQuickMethodHeader, right after the trampolines (0x01ede070)
    OatQuickMethodHeader (offset=0x01ede070)
      mapping_table: (offset=0x01cf2e04)
      vmap_table: (offset=0x01ea5d51)
    CODE: (code_offset=0x01ede08c size_offset=0x01ede088 size=8)...
      0x01ede08c: d65f03c0 ret
      0x01ede090: 00000000 unallocated (Unallocated)
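The offsets in the dump follow a simple pattern, which can be reproduced with a little arithmetic. The sketch below assumes that the seven trampolines are laid out in the order listed earlier and that each takes 0x10 bytes. The dump only shows the first three offsets and the last one, so the offsets in between are an inference, not something read from oatdump.

```python
# Trampoline names in the order they are listed in this section.
TRAMPOLINES = [
    "interpreter_to_interpreter_bridge_",
    "interpreter_to_compiled_code_bridge_",
    "jni_dlsym_lookup_",
    "quick_generic_jni_trampoline_",
    "quick_imt_conflict_trampoline_",
    "quick_resolution_trampoline_",
    "quick_to_interpreter_bridge_",
]
EXECUTABLE_OFFSET = 0x01EDE000  # from the oatdump output
TRAMPOLINE_SIZE = 0x10          # spacing observed in this particular dump


def trampoline_offsets(base=EXECUTABLE_OFFSET, size=TRAMPOLINE_SIZE):
    """Assign consecutive, equally sized slots starting at base."""
    return {name: base + i * size for i, name in enumerate(TRAMPOLINES)}


offsets = trampoline_offsets()
first_method_header = EXECUTABLE_OFFSET + len(TRAMPOLINES) * TRAMPOLINE_SIZE

# These match the values that oatdump printed.
assert offsets["interpreter_to_compiled_code_bridge_"] == 0x01EDE010
assert offsets["jni_dlsym_lookup_"] == 0x01EDE020
assert offsets["quick_to_interpreter_bridge_"] == 0x01EDE060
assert first_method_header == 0x01EDE070
```

The same dump also shows that the code starts 0x1c bytes after its OatQuickMethodHeader (0x01ede08c versus 0x01ede070), and that size_offset is the 4 bytes right before the code.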
1.2.2.2. managed code
The rest of oatexec mainly holds the native code of every compiled method, together with its OatQuickMethodHeader. The chain OatClass -> methods_pointer_[i] -> code_offset in oatdata points into this region.
1.3. Oat Loading
1.3.1. Oat Location
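The oat file for an apk/jar mainly lives in one of two locations:
- /data/dalvik-cache/<…apk..>.oat
- oat/<arch>/xxx.odex under the directory that contains the apk
The former is called the oat file and mainly refers to an oat generated at run time.
The latter is called the odex file and mainly refers to an oat produced by pre-compilation, for example:
- the odex for /system/app/Browser/Browser2.apk is /system/app/Browser/oat/arm64/Browser2.odex; this file is pre-compiled
- the odex for /data/app/Test/base.apk is /data/app/Test/oat/arm/base.odex; this odex is not pre-compiled, it is generated on the device by dex2oat.
When an oat is loaded, these two locations are searched in turn.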
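The odex naming rule can be written down as a small path computation. This is only a sketch derived from the two examples above; `odex_path` is a hypothetical helper, not an ART function, and the dalvik-cache name is left out because its exact form is not spelled out here.

```python
import posixpath


def odex_path(apk_path, arch):
    """Return <apk dir>/oat/<arch>/<apk name without extension>.odex."""
    directory, filename = posixpath.split(apk_path)
    stem, _ext = posixpath.splitext(filename)
    return posixpath.join(directory, "oat", arch, stem + ".odex")


print(odex_path("/system/app/Browser/Browser2.apk", "arm64"))
# prints /system/app/Browser/oat/arm64/Browser2.odex
print(odex_path("/data/app/Test/base.apk", "arm"))
# prints /data/app/Test/oat/arm/base.odex
```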
1.3.2. boot.oat
Runtime::Init
  ImageSpace::Init
    space->OpenOatFile(image_filename, error_msg)
      OatFile::Open(oat_filename, oat_filename, image_header.GetOatDataBegin(),...)
        /* Eventually boot.oat is loaded as an ELF by the loader. requested_base
           is the address at which the .rodata of boot.oat has to be loaded.

           Here requested_base is simply oat_data_begin_ from boot.art. The
           parameter is only used for validation, because boot.oat itself
           already carries this address information. */
        OpenElfFile(file.get(), location, requested_base,...)
1.3.3. app.oat
PathClassLoader
  new DexFile
    DexFile_openDexFileNative
      OpenDexFilesFromOat
        // is the oat file or the odex file up to date?
        if (!oat_file_assistant.IsUpToDate()):
          /* try to regenerate it by running dex2oat or patchoat again */
          oat_file_assistant.MakeUpToDate(filter_, /*out*/ &error_msg)
        /* pick the oat or the odex file */
        oat_file(oat_file_assistant.GetBestOatFile());
        if (oat_file != nullptr):
          /* load the app image, if there is one */
          image_space(oat_file_assistant.OpenImageSpace(source_oat_file));
          runtime->GetHeap()->AddSpace(image_space.get());
        /* if no oat was found, fall back to the original dex file, e.g. classes.dex inside the apk */
        if (dex_files.empty()):
          DexFile::Open(dex_location, dex_location, /*out*/ &error_msg, &dex_files)

OatFileAssistant::IsUpToDate:
  return OatFileIsUpToDate() || OdexFileIsUpToDate()
    GetOatFile()
      /* requested_base is nullptr here, which means the app's oat is not loaded at a fixed address */
      OatFile::Open(oat_location, oat_location, nullptr, ...)
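The decision flow in OpenDexFilesFromOat can be condensed into a small function. This is a loose sketch rather than the ART logic: `pick_code_source` and its parameters are hypothetical, `make_up_to_date` stands in for MakeUpToDate and is assumed to report whether a usable oat exists afterwards, and preferring the oat over the odex is an assumption based on the order of the checks in IsUpToDate.

```python
def pick_code_source(oat_up_to_date, odex_up_to_date, make_up_to_date):
    """Return "oat", "odex" or "dex" for one dex location.

    make_up_to_date is a callable that tries to regenerate the oat file
    and returns True if an up-to-date oat exists afterwards.
    """
    if not (oat_up_to_date or odex_up_to_date):
        # Neither file is usable: try to regenerate before giving up.
        oat_up_to_date = make_up_to_date()
    if oat_up_to_date:
        return "oat"
    if odex_up_to_date:
        return "odex"
    # No usable oat at all: fall back to the original classes.dex.
    return "dex"


# A process that is not allowed to regenerate the oat falls back to the dex.
print(pick_code_source(False, False, lambda: False))  # prints dex
```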
1.4. Oat Up To Date
OatFileAssistant::IsUpToDate:
  /* OatFileIsUpToDate checks whether the oat under dalvik-cache is up to date;
   * OdexFileIsUpToDate checks whether
   * /{system,data}/app/pkg/oat/<arch>/xxx.odex is up to date */
  return OatFileIsUpToDate() || OdexFileIsUpToDate()

OatFileIsUpToDate:
  /* Whether an oat is up to date is decided mainly by checksums that tie
   * the oat to its dex and image files:
   *
   * 1. OatHeader->adler32_checksum_ must match ImageHeader->oat_checksum_
   *
   * 2. OatDexFile->dex_file_location_checksum_ stored in the oat must match
   *    the crc32 of the original dex
   *
   * 3. OatHeader->image_file_location_oat_checksum_ must match the
   *    combined_image_checksum_ built from the oat_checksum_ of every
   *    image file it depends on */
  GivenOatFileIsUpToDate(*oat_file)
some notes about the oat consistency
Three kinds of files need to be considered when talking about `oat consistency`:
- original dex file (dex in apk/jar)
- generated oat file
- generated art file (the image file)
oat file consistency
An oat is an ELF file. When Android tries to load an oat file, it checks whether the file is well-formed and regenerates the oat if it is not. The check only covers the ELF format, so an oat that passes it may still be corrupted.
art file consistency
An art file is not an ELF file, so Android only runs some sanity checks against the file format. To detect cases such as dex2oat being killed in the middle of image writing, the ImageHeader is written last, so that the sanity check against the image header detects the corruption.
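Writing the header last is a general way to make an interrupted write detectable. The sketch below illustrates the idea with a made-up file format. The magic value, the 8-byte header layout and both function names are hypothetical, and the header is assumed to sit at the start of the file while being written last in time.

```python
import io
import struct

MAGIC = b"art\n"                 # hypothetical magic for this sketch
HEADER = struct.Struct("<4sI")   # magic + payload length


def write_image(buf, payload, crash_before_header=False):
    """Write the payload first and fill in the header only at the very end."""
    buf.seek(0)
    buf.write(b"\0" * HEADER.size)  # reserve space for the header
    buf.write(payload)
    if crash_before_header:
        return  # simulates the writer being killed midway
    buf.seek(0)
    buf.write(HEADER.pack(MAGIC, len(payload)))


def is_valid_image(buf):
    """Sanity check: the header must carry the magic and the right length."""
    data = buf.getvalue()
    if len(data) < HEADER.size:
        return False
    magic, length = HEADER.unpack_from(data, 0)
    return magic == MAGIC and length == len(data) - HEADER.size


complete, interrupted = io.BytesIO(), io.BytesIO()
write_image(complete, b"objects")
write_image(interrupted, b"objects", crash_before_header=True)
print(is_valid_image(complete), is_valid_image(interrupted))  # prints True False
```

An interrupted write leaves the reserved header zeroed, so the magic check fails even though the payload bytes are on disk.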
use checksum to maintain consistency between dex/oat/art
- the crc32 computed from the dex file needs to be consistent with `OatDexFile->dex_file_location_checksum_`
- `ImageHeader->oat_checksum_` needs to be consistent with `OatHeader->adler32_checksum_`
- `OatHeader->image_file_location_oat_checksum_` needs to be consistent with a `combined value` computed from the `ImageHeader->oat_checksum_` of both boot.art and app.art
The OatHeader and ImageHeader checksums are written as the last step of the oat/art writer.
- some offsets need to be consistent between oat and art
When loading an oat file, Android uses these checksums to check whether it is up to date, and regenerates the oat file if it is not.
For example:
original apk is changed
When the app is launched, Android finds that the crc32 checksum computed from the original dex file is not consistent with `OatDexFile->dex_file_location_checksum_`, so it refuses to use the oat file and falls back to interpreter mode. Because of permission restrictions, the app process cannot regenerate the oat file itself and leaves that to the system_server.
boot.oat is changed
When loading an app oat file, Android detects that `OatHeader->image_file_location_oat_data_begin_` or `OatHeader->image_file_location_oat_checksum_` is not consistent with boot.art, so the app oat is not up to date.
dex2oat is killed in the middle of image writing
In this case the checksums in the oat and the art will not match, so both the oat and the art will be regenerated.
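The three checksum rules can be sketched as one function that reports why an oat is stale. This is an illustration only: `OatInfo` and `stale_reasons` are hypothetical names, `zlib.crc32` over the raw bytes stands in for however the dex checksum is really obtained, and the combined image checksum is passed in as a ready-made value because the way it is combined is not described here.

```python
import zlib
from dataclasses import dataclass


@dataclass
class OatInfo:
    adler32_checksum: int                  # OatHeader->adler32_checksum_
    dex_file_location_checksum: int        # OatDexFile->dex_file_location_checksum_
    image_file_location_oat_checksum: int  # OatHeader->image_file_location_oat_checksum_


def stale_reasons(oat, dex_bytes, image_oat_checksum, combined_image_checksum):
    """Return the list of consistency rules that the oat violates."""
    reasons = []
    if zlib.crc32(dex_bytes) != oat.dex_file_location_checksum:
        reasons.append("dex changed")
    if image_oat_checksum != oat.adler32_checksum:
        reasons.append("oat and image mismatch")
    if combined_image_checksum != oat.image_file_location_oat_checksum:
        reasons.append("boot image changed")
    return reasons


dex = b"dex\n035 example payload"
oat = OatInfo(
    adler32_checksum=zlib.adler32(b"oat contents"),
    dex_file_location_checksum=zlib.crc32(dex),
    image_file_location_oat_checksum=0x1234,
)
image_oat = zlib.adler32(b"oat contents")

print(stale_reasons(oat, dex, image_oat, 0x1234))            # prints []
print(stale_reasons(oat, dex + b"!", image_oat, 0x1234))     # prints ['dex changed']
print(stale_reasons(oat, dex, image_oat, 0x9999))            # prints ['boot image changed']
```

The second and third calls correspond to the "original apk is changed" and "boot.oat is changed" cases above.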