ART Oat Format

1. ART Oat Format

1.1. ELF

An oat file is an ELF file. More precisely, it is a shared object file (so), but its structure is much simpler than that of a normal so:

1.1.1. section headers

Section Headers:
[Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
[ 0]                   NULL            00000000 000000 000000 00      0   0  0
[ 1] .dynsym           DYNSYM          000000d4 0000d4 000040 10   A  2   0  4
[ 2] .dynstr           STRTAB          00000114 000114 000046 01   A  0   0  1
[ 3] .hash             HASH            0000015c 00015c 000020 04   A  1   0  4
[ 4] .rodata           PROGBITS        00001000 001000 181000 00   A  0   0 4096
[ 5] .text             PROGBITS        00182000 182000 138554 00  AX  0   0 4096
[ 6] .dynamic          DYNAMIC         002bb000 2bb000 000038 08   A  1   0 4096
[ 7] .shstrtab         STRTAB          00000000 2bb038 000038 01      0   0  1

1.1.2. program headers

Program Headers:
Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
PHDR           0x000034 0x00000034 0x00000034 0x000a0 0x000a0 R   0x4
LOAD           0x000000 0x00000000 0x00000000 0x182000 0x182000 R   0x1000
LOAD           0x182000 0x00182000 0x00182000 0x138554 0x138554 R E 0x1000
LOAD           0x2bb000 0x002bb000 0x002bb000 0x00038 0x00038 RW  0x1000
DYNAMIC        0x2bb000 0x002bb000 0x002bb000 0x00038 0x00038 RW  0x1000

1.1.3. Section to Segment mapping

Segment Sections...
00
01     .dynsym .dynstr .hash .rodata
02     .text
03     .dynamic
04     .dynamic

1.1.4. Dynamic sections

Dynamic section at offset 0x2bb000 contains 7 entries:
Tag        Type                         Name/Value
0x00000004 (HASH)                       0x15c
0x00000005 (STRTAB)                     0x114
0x00000006 (SYMTAB)                     0xd4
0x0000000b (SYMENT)                     16 (bytes)
0x0000000a (STRSZ)                      70 (bytes)
0x0000000e (SONAME)                     Library soname: [system@[email protected]@classes.dex]
0x00000000 (NULL)                       0x0

1.1.5. dynsym

Symbol table '.dynsym' contains 4 entries:
Num:    Value  Size Type    Bind   Vis      Ndx Name
0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
1: 00001000 0x181000 OBJECT  GLOBAL DEFAULT    4 oatdata
2: 00182000 0x138554 OBJECT  GLOBAL DEFAULT    5 oatexec
3: 002ba550     4 OBJECT  GLOBAL DEFAULT    5 oatlastword

1.1.6. hash

Histogram for bucket list length (total of 2 buckets):
 Length  Number     % of total  Coverage
      0  0          (  0.0%)
      1  1          ( 50.0%)     33.3%
      2  1          ( 50.0%)    100.0%
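The histogram can be reproduced by hand. The sketch below assumes the .hash section uses the classic System V ELF hash function with the 2 buckets reported above; the helper names `elf_hash` and `bucket_chains` are made up for this note.

```python
def elf_hash(name: str) -> int:
    """Classic System V ELF symbol hash, kept within 32 bits."""
    h = 0
    for byte in name.encode("ascii"):
        h = ((h << 4) + byte) & 0xFFFFFFFF
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g
    return h


def bucket_chains(names, nbucket):
    """Group symbol names by the bucket they hash into."""
    chains = {bucket: [] for bucket in range(nbucket)}
    for name in names:
        chains[elf_hash(name) % nbucket].append(name)
    return chains


if __name__ == "__main__":
    for bucket, names in bucket_chains(["oatdata", "oatexec", "oatlastword"], 2).items():
        print(bucket, names)
```

With two buckets only the lowest bit of the hash matters, so one bucket ends up holding one symbol and the other holds two, which agrees with the histogram.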

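1.1.7. Summary

  1. An oat file lacks several sections that a normal so must have, e.g. relocation information (.rela.xxx), plt, got.plt, got, init, fini.
  2. An oat file has no symtab, and its dynsym is very small. The essential information in an oat file is the two symbols listed in dynsym, oatdata and oatexec: oatdata points to the rodata section (0x1000) and oatexec points to the text section (0x182000).
  3. A normal so usually maps text and rodata into the same RE segment. An oat file does it differently: rodata is mapped into its own R segment and text into a separate RE segment.

Presumably the main reason for wrapping oat in ELF is to let the loader map in the rodata and text described by the program header table, and to let the program easily obtain the absolute load addresses of oatdata and oatexec through dlsym.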
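As a rough illustration of what a dlsym-style lookup has to do, the sketch below decodes 16-byte Elf32_Sym entries (matching SYMENT above) and adds a load base to st_value. The tables are rebuilt by hand from the readelf listing rather than read from a real oat file, and `parse_dynsym`, `runtime_address`, and the load base are hypothetical names and values.

```python
import struct

# Elf32_Sym: st_name, st_value, st_size, st_info, st_other, st_shndx (16 bytes).
ELF32_SYM = struct.Struct("<IIIBBH")


def parse_dynsym(dynsym: bytes, dynstr: bytes) -> dict:
    """Map each named symbol to its (st_value, st_size) pair."""
    symbols = {}
    for offset in range(0, len(dynsym), ELF32_SYM.size):
        st_name, st_value, st_size, _info, _other, _shndx = ELF32_SYM.unpack_from(dynsym, offset)
        end = dynstr.index(b"\0", st_name)
        name = dynstr[st_name:end].decode("ascii")
        if name:  # entry 0 is the unnamed null symbol
            symbols[name] = (st_value, st_size)
    return symbols


def runtime_address(load_base: int, st_value: int) -> int:
    """Address a dlsym-style lookup would report for a library mapped at load_base."""
    return load_base + st_value


# Synthetic tables rebuilt from the readelf listing (not read from a real file).
DYNSTR = b"\0oatdata\0oatexec\0oatlastword\0"
DYNSYM = b"".join([
    ELF32_SYM.pack(0, 0, 0, 0, 0, 0),
    ELF32_SYM.pack(1, 0x00001000, 0x181000, 0x11, 0, 4),   # OBJECT GLOBAL, .rodata
    ELF32_SYM.pack(9, 0x00182000, 0x138554, 0x11, 0, 5),   # OBJECT GLOBAL, .text
    ELF32_SYM.pack(17, 0x002BA550, 4, 0x11, 0, 5),         # last word of .text
])

if __name__ == "__main__":
    base = 0x70000000  # hypothetical load base
    for name, (value, size) in parse_dynsym(DYNSYM, DYNSTR).items():
        print(f"{name}: {runtime_address(base, value):#x} (size {size:#x})")
```

Note that oatlastword sits exactly 4 bytes before the end of oatexec (0x182000 + 0x138554 - 4 = 0x2ba550).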

1.2. Oat

1.2.1. oatdata

The main contents of oatdata are:

  1. oat_header
  2. oat_dex_file: the header information for one dex file. One oat can hold information for several dex files.
    • dex_file_location_ stores the name of the original dex file
    • dex_file_pointer_ points to dex_file, which holds the dex contents
    • oat_class_offsets_pointer_ points to the oat_class array, which stores information about every class in the dex file
  3. dex_file: a complete copy of the original dex file.
  4. oat_class: information about one class in a dex file.
    • status is the status of the class, e.g. kStatusResolved, kStatusInitialized
    • type says how much of the class was compiled, e.g. kOatClassAllCompiled, kOatClassSomeCompiled, kOatClassNoneCompiled
    • methods_pointer_ points to oat_method_offsets, whose code_offset points to the compiled code in oatexec.

-+------------------------------------------------------------------------+
 | OatFile-+------------------------------------------------------------+ |
 |         | OatHeader                                                  | |
 |        -+-----------------+                                          | |
 |         | OatDexFile [0] -+-----------------------------+            | |
 |         |                 | dex_file_location_          |            | |
 |         |                -+-----------------------------+            | |
 |         |                 | dex_file_pointer_          -+---+        | |
 |         |                -+-----------------------------+   |        | |
 |         |                 | oat_class_offsets_pointer_ -+---+--+     | |
 |         |                -+-----------------------------+   |  |     | |
 |         | OatDexFile [1]  |                                 |  |     | |
 |         |                 |                                 |  |     | |
 |         |                 |                                 |  |     | |
 |         | ...             |                                 |  |     | |
 |        -+-----------------+<--------------------------------+  |     | |
 |         | DexFile[0]      |                                    |     | |
 |        -+------------+----+                                    |     | |
 |         | DexFile[1]      |                                    |     | |
 |         | ...             |                                    |     | |
 |        -+-----------------+<-----------------------------------+     | |
 |         | OatClass[0]    -+---------------------+                    | |
 |         |                 | status              |                    | |
 |         |                -+---------------------+                    | |
 |         |                 | type                |                    | |
 |         |                -+---------------------+                    | |
 |         |                 | methods_pointer_[0]-+-------------+      | |
 |         |                 |                     | code_offset-+------+-+-> code in oatexec
 |         |                 |                    -+-------------+      | |
 |         |                 |                     | frame_size  |      | |
 |         |                 |                     | ...         |      | |
 |         |                 |                    -+-------------+      | |
 |         |                 | methods_pointer_[1] |                    | |
 |         |                 | ...                 |                    | |
 |         |                -+---------------------+                    | |
 |         | OatClass[1]     |                                          | |
 |         | ...             |                                          | |
 |        -+-----------------+------------------------------------------+ |
 |         | GCMap           |                                            |
 |         | ...             |                                            |
 |         | GCMap           |                                            |
 |         +-----------------+                                            |
 |         | Vmap Table      |                                            |
 |         | ...             |                                            |
 |         | Vmap Table      |                                            |
 |         +-----------------+                                            |
 |         +-----------------+                                            |
 |         | Mapping Table   |                                            |
 |         | ...             |                                            |
 |         | Mapping Table   |                                            |
 |         +-----------------+                                            |
-+------------------------------------------------------------------------+
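The pointer chain in the diagram (OatDexFile -> OatClass -> methods_pointer_ -> code_offset) can be modelled with a few plain records. This is only a sketch of the lookup path: the Python field names mirror the diagram, `find_code_offset` is a made-up helper, and it ignores details such as how a partially compiled class indexes its methods.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OatMethodOffsets:
    code_offset: int  # where the compiled code of the method starts


@dataclass
class OatClass:
    status: str   # e.g. "kStatusInitialized"
    type: str     # e.g. "kOatClassAllCompiled"
    methods: List[OatMethodOffsets] = field(default_factory=list)


@dataclass
class OatDexFile:
    dex_file_location: str
    dex_file: bytes  # full copy of the original dex
    oat_classes: List[OatClass] = field(default_factory=list)


def find_code_offset(oat_dex_file: OatDexFile, class_index: int,
                     method_index: int) -> Optional[int]:
    """Follow OatDexFile -> OatClass -> method entry -> code_offset."""
    oat_class = oat_dex_file.oat_classes[class_index]
    if oat_class.type == "kOatClassNoneCompiled":
        return None  # nothing in oatexec for this class
    return oat_class.methods[method_index].code_offset


if __name__ == "__main__":
    object_class = OatClass("kStatusInitialized", "kOatClassAllCompiled",
                            [OatMethodOffsets(code_offset=0x01EDE08C)])
    core = OatDexFile("/system/framework/core-libart.jar", b"", [object_class])
    print(hex(find_code_offset(core, 0, 0)))
```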

1.2.2. oatexec

The layout of oatexec is:

  -+------------------------------------------------------------+
   |interpreter_to_interpreter_bridge_                          |
   |interpreter_to_compiled_code_bridge_                        |
   |...                                                         |
   |quick_to_interpreter_bridge_                                |
   +------------------------------------------------------------+
   |OatQuickMethodHeader +------------------------+             |
   |                     |mapping_table_offset_   |             |
   |                     |vmap_table_offset_      |             |
   |                     |gc_map_offset_          |             |
   |                     |frame_info_             |             |
   |                     |code_size_              |             |
   |                     +------------------------+             |
   |native code                                                 |
   +------------------------------------------------------------+
   |OatQuickMethodHeader                                        |
   |native code                                                 |
   +------------------------------------------------------------+
   |...                                                         |
   |                                                            |
   |                                                            |
  -+------------------------------------------------------------+

1.2.2.1. trampoline

oatexec starts with several trampoline functions:

  1. interpreter_to_interpreter_bridge_
  2. interpreter_to_compiled_code_bridge_
  3. jni_dlsym_lookup_
  4. quick_generic_jni_trampoline_
  5. quick_imt_conflict_trampoline_
  6. quick_resolution_trampoline_
  7. quick_to_interpreter_bridge_

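How these trampolines are used:

When the image writer generates an image, it has to write every ArtMethod into the boot image. Some ArtMethods have no quick code, e.g. methods that failed to compile, or native methods that were not compiled by the JNI compiler.

In that case the image writer sets the ArtMethod's entry_point_from_quick_compiled_code_ to the corresponding trampoline; see ImageWriter::GetQuickCode.

OatWriter::InitOatCode is responsible for writing these functions at the beginning of oatexec.

oatdump output:

// start of oatexec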
EXECUTABLE OFFSET:
0x01ede000

// location of interpreter_to_interpreter_bridge_, clearly at the very start of oatexec
INTERPRETER TO INTERPRETER BRIDGE OFFSET:
0x01ede000

// location of interpreter_to_compiled_code_bridge_, which shows that one trampoline is 0x10 (16 bytes) long
INTERPRETER TO COMPILED CODE BRIDGE OFFSET:
0x01ede010

JNI DLSYM LOOKUP OFFSET:
0x01ede020

....

QUICK TO INTERPRETER BRIDGE OFFSET:
0x01ede060

OatDexFile:
location: /system/framework/core-libart.jar
0: Ljava/lang/Object; (offset=0x019228b8) (type_idx=1337) (StatusInitialized) (OatClassAllCompiled)
  0: void java.lang.Object.<init>() (dex_method_idx=13004)
    ....

    // location of the first OatQuickMethodHeader, right after the trampolines (0x01ede070)
    OatQuickMethodHeader (offset=0x01ede070)
      mapping_table: (offset=0x01cf2e04)
      vmap_table: (offset=0x01ea5d51)

    CODE: (code_offset=0x01ede08c size_offset=0x01ede088 size=8)...
      0x01ede08c: d65f03c0	ret
      0x01ede090: 00000000	unallocated (Unallocated)

1.2.2.2. managed code

The rest of oatexec mainly holds the native code of every compiled method together with its OatQuickMethodHeader. In oatdata, OatClass -> methods_pointer_ [i] -> code_offset points into this region.
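The offsets in the oatdump output above can be cross-checked with a little arithmetic. The sketch assumes the seven trampolines are laid out back to back in the order listed earlier, each 0x10 bytes long as in this particular dump (the size will differ on other architectures). The constant and function names are made up.

```python
TRAMPOLINES = [
    "interpreter_to_interpreter_bridge_",
    "interpreter_to_compiled_code_bridge_",
    "jni_dlsym_lookup_",
    "quick_generic_jni_trampoline_",
    "quick_imt_conflict_trampoline_",
    "quick_resolution_trampoline_",
    "quick_to_interpreter_bridge_",
]

EXECUTABLE_OFFSET = 0x01EDE000  # from the dump
TRAMPOLINE_SIZE = 0x10          # observed in this dump only
CODE_OFFSET = 0x01EDE08C        # code of java.lang.Object.<init>()


def trampoline_offsets(base, size, names):
    """Assign consecutive, equally sized slots starting at base."""
    return {name: base + index * size for index, name in enumerate(names)}


OFFSETS = trampoline_offsets(EXECUTABLE_OFFSET, TRAMPOLINE_SIZE, TRAMPOLINES)
FIRST_HEADER = EXECUTABLE_OFFSET + len(TRAMPOLINES) * TRAMPOLINE_SIZE
HEADER_SIZE = CODE_OFFSET - FIRST_HEADER  # bytes between header start and code
SIZE_OFFSET = CODE_OFFSET - 4             # code_size_ sits right before the code

if __name__ == "__main__":
    for name, offset in OFFSETS.items():
        print(f"{offset:#010x} {name}")
    print(f"first header {FIRST_HEADER:#010x}, header size {HEADER_SIZE:#x}")
```

The header comes out as 0x1c (28) bytes. That would be consistent with the fields in the layout diagram if every field were a 32-bit word and frame_info_ held three of them, but that reading is an assumption rather than something the dump states.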

1.3. Oat Loading

1.3.1. Oat Location

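The oat file for an apk/jar is found in one of two locations:

  1. /data/dalvik-cache/<…apk..>.oat
  2. oat/<arch>/xxx.odex under the directory that contains the apk

The former is called the oat file and mostly refers to an oat generated at run time.

The latter is called the odex file and mostly refers to an oat generated by pre-compilation, for example:

  1. The odex for /system/app/Browser/Browser2.apk is /system/app/Browser/oat/arm64/Browser2.odex; this file is pre-compiled.
  2. The odex for /data/app/Test/base.apk is /data/app/Test/oat/arm/base.odex; this odex is not pre-compiled but generated on the device by dex2oat.

When an oat is loaded, these two locations are searched in turn.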
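The odex naming rule can be written down directly from the two examples. The helper below is a sketch with a made-up name, `odex_location`, and covers only the odex path; the dalvik-cache name is left out because its exact form is not spelled out here.

```python
import posixpath


def odex_location(dex_location: str, isa: str) -> str:
    """Return <dir>/oat/<isa>/<name>.odex for an apk or jar path."""
    directory, filename = posixpath.split(dex_location)
    stem, _extension = posixpath.splitext(filename)
    return posixpath.join(directory, "oat", isa, stem + ".odex")


if __name__ == "__main__":
    print(odex_location("/system/app/Browser/Browser2.apk", "arm64"))
    print(odex_location("/data/app/Test/base.apk", "arm"))
```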

1.3.2. boot.oat

Runtime::Init
  ImageSpace::Init
    space->OpenOatFile(image_filename, error_msg)
      OatFile::Open(oat_filename, oat_filename, image_header.GetOatDataBegin(),...)
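      /* The loader eventually loads boot.oat as an ELF file.
         requested_base is the address at which the .rodata of boot.oat
         must be loaded; here it is the oat_data_begin_ stored in
         boot.art. The parameter is used only for verification, because
         boot.oat itself already contains this address. */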
      OpenElfFile(file.get(), location, requested_base,...)

1.3.3. app.oat

PathClassLoader
  new DexFile
    DexFile_openDexFileNative
      OpenDexFilesFromOat
        // is the oat file or odex file up to date?
        if (!oat_file_assistant.IsUpToDate()):
          /* try to regenerate it by running dex2oat or patchoat */
          oat_file_assistant.MakeUpToDate(filter_, /*out*/ &error_msg)
        /* pick the oat or odex file */
        oat_file(oat_file_assistant.GetBestOatFile());
        if (oat_file != nullptr):
          /* load the app image, if one exists */
          image_space(oat_file_assistant.OpenImageSpace(source_oat_file));
          runtime->GetHeap()->AddSpace(image_space.get());
        /* if no oat was found, fall back to the original dex file, e.g. classes.dex in the apk */
        if (dex_files.empty()):
          DexFile::Open(dex_location, dex_location, /*out*/ &error_msg, &dex_files)

OatFileAssistant::IsUpToDate:
  return OatFileIsUpToDate() || OdexFileIsUpToDate()

GetOatFile()
  /* requested_base is nullptr here, meaning the app's oat is not loaded at a fixed address */
  OatFile::Open(oat_location, oat_location, nullptr, ...)
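The control flow of the OpenDexFilesFromOat trace boils down to "refresh if stale, prefer the oat, fall back to the raw dex". The sketch below shows only that decision order. All four callables are hypothetical stand-ins for the ART calls in the trace, not real APIs.

```python
def open_dex_files(is_up_to_date, make_up_to_date, load_from_oat, load_raw_dex):
    """Mirror the decision order of the trace using injected callables."""
    if not is_up_to_date():
        make_up_to_date()          # may or may not succeed
    dex_files = load_from_oat()    # empty list when no usable oat exists
    if not dex_files:
        dex_files = load_raw_dex() # e.g. classes.dex inside the apk
    return dex_files


if __name__ == "__main__":
    files = open_dex_files(
        is_up_to_date=lambda: False,
        make_up_to_date=lambda: None,
        load_from_oat=lambda: [],
        load_raw_dex=lambda: ["classes.dex"],
    )
    print(files)
```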

1.4. Oat Up To Date

OatFileAssistant::IsUpToDate:
  /* OatFileIsUpToDate checks whether the oat under dalvik-cache is up to date;
   * OdexFileIsUpToDate checks whether
   * /{system,data}/app/pkg/oat/<arch>/xxx.odex is up to date */
  return OatFileIsUpToDate() || OdexFileIsUpToDate()

OatFileIsUpToDate:
  /* Whether an oat is up to date with its dex or image is decided
   * mainly by comparing checksums:
   *
   * 1. OatHeader->adler32_checksum_ must match
   *    ImageHeader->oat_checksum_
   *
   * 2. the OatDexFile->dex_file_location_checksum_ stored in the oat
   *    must match the crc32 of the original dex
   *
   * 3. OatHeader->image_file_location_oat_checksum_ must match the
   *    combined_image_checksum_ computed from the oat_checksum_ of
   *    every image file it depends on
   */
  GivenOatFileIsUpToDate(*oat_file)
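The three comparisons can be sketched with Python's zlib module. The record types and `oat_is_up_to_date` are hypothetical simplifications. The crc32 is taken over the raw dex bytes as described above, and the combined image checksum is passed in as a value because how it is derived is not covered here.

```python
import zlib
from dataclasses import dataclass


@dataclass
class OatChecksums:
    adler32_checksum: int                  # OatHeader
    dex_file_location_checksum: int        # OatDexFile
    image_file_location_oat_checksum: int  # OatHeader


@dataclass
class ImageChecksums:
    oat_checksum: int                      # ImageHeader


def oat_is_up_to_date(oat: OatChecksums, dex_bytes: bytes,
                      own_image: ImageChecksums,
                      combined_image_checksum: int) -> bool:
    """Apply the three checksum comparisons listed in the comment."""
    oat_matches_image = oat.adler32_checksum == own_image.oat_checksum
    dex_matches = zlib.crc32(dex_bytes) == oat.dex_file_location_checksum
    images_match = oat.image_file_location_oat_checksum == combined_image_checksum
    return oat_matches_image and dex_matches and images_match


if __name__ == "__main__":
    dex = b"illustrative dex payload"
    oat = OatChecksums(0x1234, zlib.crc32(dex), 0xBEEF)
    print(oat_is_up_to_date(oat, dex, ImageChecksums(0x1234), 0xBEEF))
    print(oat_is_up_to_date(oat, dex + b"!", ImageChecksums(0x1234), 0xBEEF))
```

Changing a single byte of the dex changes its crc32, so the second call reports a stale oat.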


Some notes about oat consistency

There are three kinds of files to consider when talking about `oat consistency`:

  1. the original dex file (the dex in the apk/jar)
  2. the generated oat file
  3. the generated art file (the image file)

Consistency is maintained at several levels:

  1. oat file consistency

    An oat file is an ELF file. When Android tries to load it, it checks whether the file is well formed and regenerates it if not. The check only covers the ELF format, so an oat file that passes it may still be corrupted.

  2. art file consistency

    An art file is not an ELF file, so Android only runs some sanity checks against the file format. To detect cases such as dex2oat being killed in the middle of image writing, the ImageHeader is written last, so the sanity check against the image header detects the corruption.

  3. checksums that keep dex/oat/art consistent

    • the crc32 computed from the dex file must match `OatDexFile->dex_file_location_checksum_`
    • `ImageHeader->oat_checksum_` must match `OatHeader->adler32_checksum_`
    • `OatHeader->image_file_location_oat_checksum_` must match a combined value computed from the `ImageHeader->oat_checksum_` of both boot.art and app.art

    These checksums are stored in OatHeader and ImageHeader and are written last by the oat/art writers.

  4. some offsets must be consistent between oat and art

When loading an oat file, Android uses these checksums to check whether it is up to date and regenerates the oat file if it is not.

For example:

  1. the original apk is changed

    When launching the app, Android finds that the crc32 checksum computed from the original dex file does not match `OatDexFile->dex_file_location_checksum_`, so it refuses to use the oat file and falls back to interpreter mode. Because of permission restrictions, the app process cannot regenerate the oat file and leaves that to system_server.

  2. boot.oat is changed

    When loading the app oat file, Android detects that `OatHeader->image_file_location_oat_data_begin_` or `OatHeader->image_file_location_oat_checksum_` does not match boot.art, so the app oat is not up to date.

  3. dex2oat is killed in the middle of image writing

    In this case the checksums in the oat and art files do not match, so both are regenerated.

Author: [email protected]
Date:
Last updated: 2023-11-20 Mon 16:08

Creative Commons License