Happen Before

1. Happen Before

1.1. Overview

https://web.archive.org/web/20190512022934/https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4.5

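Happens-before can be viewed as Java's specification of where the JVM must insert barriers. A barrier serves two purposes:

Instruction reordering (by the compiler or the CPU) cannot cross the barrier
The barrier guarantees visibility: data changes made before the barrier are visible after the barrier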

The happens-before relations defined by the JLS include:

monitor
    hb (monitor unlock, monitor enter)

volatile
    hb (volatile write, volatile read)

thread
    hb (default value initialization, action in thread)
    hb (start thread, action in the started thread)
    hb (action in thread, joining on the thread)

<init>
    hb (final field write, final field read)
    hb (<init>, finalizer)

In the implementation, the JVM has to insert hardware barrier instructions at the appropriate places, for example the DMB instruction on ARM.

1.2. volatile

public class playground {

    // Not volatile: the reader thread below may never observe the writer's updates.
    private static int sGlobalValue = 0;

    public static void main(String[] args) {
        new Thread() {
            @Override
            public void run() {
                int local_value = sGlobalValue;
                while (local_value < 5) {
                    if(local_value != sGlobalValue) {
                        System.out.println("Got sGlobalValue: " + sGlobalValue);
                        local_value= sGlobalValue;
                    }
                }
            }
        }.start();

        new Thread() {
            @Override
            public void run() {
                int local_value = sGlobalValue;
                while (sGlobalValue <5) {
                    sGlobalValue = ++local_value;
                    System.out.println("Set sGlobalValue: " + sGlobalValue);
                    try {
                        Thread.sleep(500);
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }
            }
        }.start();
    }
}
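
For comparison, here is a minimal sketch of the same reader/writer pattern with the shared field declared volatile, so that each write happens-before the later reads that observe it. The class name VolatilePlayground, the helper runOnce and the shortened sleep are assumptions made for illustration and are not part of the original example.

```java
// Sketch: same reader/writer pattern as above, but the shared field is volatile.
class VolatilePlayground {

    // volatile: a write happens-before every later read of this field
    private static volatile int sGlobalValue = 0;

    // Runs one writer and one reader; returns the last value the reader observed.
    static int runOnce() {
        sGlobalValue = 0;
        final int[] lastSeen = {0};

        Thread reader = new Thread(() -> {
            int localValue = sGlobalValue;
            while (localValue < 5) {
                int current = sGlobalValue; // volatile read, re-executed on every iteration
                if (current != localValue) {
                    System.out.println("Got sGlobalValue: " + current);
                    localValue = current;
                }
            }
            lastSeen[0] = localValue;
        });

        Thread writer = new Thread(() -> {
            for (int i = 1; i <= 5; i++) {
                sGlobalValue = i; // volatile write
                System.out.println("Set sGlobalValue: " + i);
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });

        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // lastSeen is a plain array; join() makes the reader's write visible here
        return lastSeen[0];
    }

    public static void main(String[] args) {
        System.out.println("Reader finished at: " + runOnce());
    }
}
```

The reader may skip intermediate values, but it is guaranteed to eventually observe the final write and terminate.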

1.2.1. Implementation of volatile

In the bytecode, the field is only flagged as volatile; no special instructions are generated

volatile int x = 1;
int x = 1;

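Both declarations generate the same IPUT instruction; x is merely flagged as volatile, and the VM gives volatile fields special treatment when it handles IPUT

When the VM opens a dex file, it rewrites every IPUT on a volatile field into the internal IPUT_VOLATILE instruction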

openDexFile
  dvmRawDexFileOpenArray
    dvmPrepareDexInMemory
      rewriteDex
        verifyAndOptimizeClasses
          verifyAndOptimizeClass
            dvmOptimizeClass
              optimizeMethod

optimizeMethod
  case OP_IPUT:
  quickOpc = OP_IPUT_QUICK;
  // volatile only needs special handling on multi-core (SMP)
  if (forSmp):
      volatileOpc = OP_IPUT_VOLATILE;
  goto rewrite_inst_field;

rewrite_inst_field:
  if (volatileOpc != OP_NOP && dvmIsVolatileField(instField)):
    updateOpcode(method, insns, volatileOpc);

OP_IPUT_VOLATILE vs. OP_IPUT

void dvmSetFieldInt(Object* obj, int offset, s4 val) {
    ((JValue*)BYTE_OFFSET(obj, offset))->i = val;
}


void dvmSetFieldIntVolatile(Object* obj, int offset, s4 val) {
    s4* ptr = &((JValue*)BYTE_OFFSET(obj, offset))->i;
    ANDROID_MEMBAR_STORE();
    *ptr = val;
    ANDROID_MEMBAR_FULL();
}

For OP_IPUT, the value is simply stored to the corresponding memory location; for OP_IPUT_VOLATILE, memory barrier operations are added before and after the store
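
As a rough Java-level analogy of the store path above, the sketch below places explicit fences around a plain store using the static fence methods of java.lang.invoke.VarHandle (Java 9 and later). The class FencedCell and the mapping of each fence to ANDROID_MEMBAR_STORE and ANDROID_MEMBAR_FULL are assumptions made for illustration; this is not how a JVM implements volatile, and ordinary code should simply declare the field volatile.

```java
import java.lang.invoke.VarHandle;

// Analogy only: explicit fences around a plain store, mirroring the shape of
// dvmSetFieldIntVolatile. Real Java code should declare the field volatile.
class FencedCell {

    private int value; // plain (non-volatile) field

    void set(int v) {
        VarHandle.storeStoreFence(); // roughly where ANDROID_MEMBAR_STORE sits
        value = v;
        VarHandle.fullFence();       // roughly where ANDROID_MEMBAR_FULL sits
    }

    int get() {
        int v = value;
        VarHandle.acquireFence();    // later loads and stores stay after this load
        return v;
    }

    public static void main(String[] args) {
        FencedCell cell = new FencedCell();
        cell.set(7);
        System.out.println(cell.get());
    }
}
```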

ANDROID_MEMBAR_STORE & ANDROID_MEMBAR_FULL

#if ANDROID_SMP == 0
#define ANDROID_MEMBAR_STORE android_compiler_barrier
#define ANDROID_MEMBAR_FULL android_compiler_barrier
#else
#define ANDROID_MEMBAR_STORE android_memory_store_barrier
#define ANDROID_MEMBAR_FULL android_memory_barrier
#endif

On a single core no CPU memory barrier is needed; a compiler barrier is enough

android_compiler_barrier

__asm__ __volatile__ ("" : : : "memory");

android_memory_store_barrier

__asm__ __volatile__ ("dmb st" : : : "memory");

android_memory_barrier

__asm__ __volatile__ ("dmb" : : : "memory");

1.2.2. barrier

1.2.2.1. compiler barrier

It is only a directive to the compiler: compile-time instruction reordering is not allowed to cross this barrier

1.2.2.2. cpu barrier

A CPU barrier constrains instruction reordering at the CPU level and the visibility of memory writes across cores (cache coherence)

1.2.3. A barrier covers more than the visibility of a single object

public class Test {

    private static int sGlobalValue = 0;
    private static volatile int mark = 0;

    public static void main(String[] args) {
        new Thread() {
            @Override
            public void run() {
                int local_value = sGlobalValue;
                while (local_value < 5) {
                    int tmp = mark;
                    if(local_value != sGlobalValue) {
                        System.out.println("Got sGlobalValue: " + sGlobalValue);
                        local_value= sGlobalValue;
                    }
                }
            }
        } .start();

        new Thread() {
            @Override
            public void run() {
                int local_value = sGlobalValue;
                while (sGlobalValue <5) {
                    sGlobalValue = ++local_value;
                    mark = 1;
                    System.out.println("Set sGlobalValue: " + sGlobalValue);
                    try {
                        Thread.sleep(500);
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }
            }
        } .start();
    }
}

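The example above works as expected: although sGlobalValue is not volatile, the barriers caused by the reads and writes of mark indirectly synchronize sGlobalValue as well.

So the key point of happens-before is the barrier, not just the visibility of the object involved (a reference) itself. For example: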

class Test {
    int a;
    int b;
    public Test() {
        this.a = 1;
        this.b = 1;
    }
}

volatile Test mTest;

// thread1:
mTest = new Test();

// thread2:
if (mTest != null) {
    assert(mTest.a == 1);
    assert(mTest.b == 1);
}

Not only is mTest itself guaranteed to be visible, its fields are visible too, because the fields are initialized before the assignment to mTest.
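
The fragment above can be turned into a small runnable sketch. The names VolatilePublication, Config and publishAndRead are assumptions made for illustration; the point is only that the plain fields a and b are written before the volatile store, so a reader that sees the non-null reference also sees both fields.

```java
class VolatilePublication {

    static class Config {
        int a; // deliberately neither final nor volatile
        int b;

        Config() {
            this.a = 1;
            this.b = 1;
        }
    }

    // The volatile reference is the only synchronization between the threads.
    private static volatile Config sConfig;

    // Returns a + b as observed by the reader thread.
    static int publishAndRead() {
        sConfig = null;
        final int[] sum = {0};

        Thread reader = new Thread(() -> {
            Config c;
            while ((c = sConfig) == null) { // volatile read
                Thread.yield();
            }
            // The field writes in the constructor happen-before the volatile
            // write, which happens-before the volatile read that returned c.
            sum[0] = c.a + c.b;
        });
        Thread writer = new Thread(() -> {
            sConfig = new Config(); // volatile write after the fields are set
        });

        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println("a + b = " + publishAndRead());
    }
}
```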

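1.2.4. volatile and registers

When gcc handles the C volatile keyword, it avoids keeping the variable in a register. A Java JIT may well make the same consideration, but Dalvik's JIT does not take this into account:

For all store-related operations (IPUT, SPUT, …), each MIR ends up generating an HIR that writes the value of a temporary register to memory (see dalvik::storeBaseIndexed)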

1.3. synchronized

synchronized (that is, MonitorEnter and MonitorExit) is also a barrier

OP_MONITOR_ENTER
  dvmLockObject(self, obj);
    if (LW_SHAPE(thin) == LW_SHAPE_THIN):
        if (LW_LOCK_OWNER(thin) == threadId):
           // the current thread already holds the lock; nothing more to do
        else if (LW_LOCK_OWNER(thin) == 0):
            android_atomic_acquire_cas(thin, newThin, (int32_t*)thinp)
              int status = android_atomic_cas(old_value, new_value, ptr);
              // barrier!
              android_memory_barrier();
        else:
           // spin to acquire the THIN lock
          for (;;):
            android_atomic_acquire_cas(thin, newThin, (int32_t *)thinp)

     else:
       // FAT
       lockMonitor(self, LW_MONITOR(obj->lock));
         dvmLockMutex(&mon->lock);
           // pthread is responsible for issuing the barrier
           pthread_mutex_lock()
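
At the Java level, the effect of hb (monitor unlock, monitor enter) can be sketched with a counter whose field is plain and is only ever touched inside synchronized methods. The class MonitorCounter and the helper runThreads are assumptions made for illustration.

```java
class MonitorCounter {

    private int count = 0; // plain field, guarded by the monitor of this object

    synchronized void increment() {
        count++; // each unlock publishes the update to the next thread that locks
    }

    synchronized int get() {
        return count;
    }

    // Starts the given number of threads, each incrementing perThread times.
    static int runThreads(int threads, int perThread) {
        MonitorCounter counter = new MonitorCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("count = " + runThreads(4, 10_000));
    }
}
```

Because every access goes through the same monitor, no increment is lost and the final value equals threads * perThread.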

1.4. Thread

The entry point of a Java thread also acts as a barrier; this is guaranteed by pthread_create

pthread_create
  ScopedPthreadMutexLocker start_locker(start_mutex);
    pthread_mutex_lock(mu_);
      ANDROID_MEMBAR_FULL();
  __pthread_clone(start_routine, child_stack, flags, arg);
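
The start and join rules listed in the overview can be sketched in Java as follows. Both fields are plain; the class StartJoinDemo and the helper compute are assumptions made for illustration.

```java
class StartJoinDemo {

    private static int sInput;  // plain field, written before start()
    private static int sOutput; // plain field, written by the worker thread

    static int compute(int input) {
        sInput = input; // happens-before every action in the started thread

        Thread worker = new Thread(() -> {
            sOutput = sInput * 2; // guaranteed to see the value written above
        });
        worker.start();
        try {
            worker.join(); // the worker's actions happen-before join() returning
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sOutput;
    }

    public static void main(String[] args) {
        System.out.println("result = " + compute(21));
    }
}
```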

1.5. final field

A barrier must also be inserted when a constructor performs the initial assignment of a final field

optimizeMethod
  needRetBar = needsReturnBarrier(method);
  switch (opc):
    // final fields can only be assigned in a constructor, so the opcode is OP_RETURN_VOID
    case OP_RETURN_VOID:
      if (needRetBar):
          rewriteReturnVoid(method, insns);

needsReturnBarrier:
  // must be a constructor
  if (strcmp(method->name, "<init>") != 0):
    return false;
  int idx = clazz->ifieldCount;
  while (--idx >= 0):
    // the class has a final field
    if (dvmIsFinalField(&clazz->ifields[idx]))
      return true;
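
What this barrier buys at the Java level can be sketched as follows: an object with only final fields is published through a plain, racy reference, and a reader that happens to see the reference must still see the final fields initialized. The names FinalFieldDemo, Point and readerSawOnlyInitializedPoints are assumptions made for illustration, and the reader loop is bounded because a plain read is not guaranteed to ever observe the publication.

```java
class FinalFieldDemo {

    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Deliberately plain: the reference is published through a data race.
    private static Point sPoint;

    // Returns false only if the reader ever saw a Point with unset final fields.
    static boolean readerSawOnlyInitializedPoints() {
        sPoint = null;
        final boolean[] ok = {true};

        Thread reader = new Thread(() -> {
            // Bounded loop: the plain read may never see a non-null reference.
            for (int i = 0; i < 1_000_000; i++) {
                Point p = sPoint;
                if (p != null && (p.x != 3 || p.y != 4)) {
                    ok[0] = false;
                }
            }
        });
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1_000; i++) {
                sPoint = new Point(3, 4); // no volatile, no lock
            }
        });

        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ok[0];
    }

    public static void main(String[] args) {
        System.out.println("only initialized points seen: "
                + readerSawOnlyInitializedPoints());
    }
}
```

If x and y were not final, the same racy publication would give no such guarantee.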

1.6. finalizer

A barrier is needed between the constructor and the finalizer

needsReturnBarrier:
  // must be a constructor
  if (strcmp(method->name, "<init>") != 0):
    return false;

  // the class has a finalizer
  if (IS_CLASS_FLAG_SET(clazz, CLASS_ISFINALIZABLE))
    return true;

1.7. MESI

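1.8. Miscellaneous

1.8.1. happens-before and safe construction

Safe construction means not letting this escape before the constructor returns, for example:

Do not assign this to a static variable inside the constructor
Do not register this as a listener inside the constructor
…

One reason is that before the constructor returns, the values of the object's final fields are not guaranteed to be visible to other threads, because the happens-before for final fields is implemented by inserting a barrier when the constructor returns. So it is unsafe even if the escaping operation is the very last statement of the constructor: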

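class Test {
    static Test sTest; // reachable from other threads
    int a;
    final int b;
    public Test() {
        a = 1;
        b = 2;
        // unsafe: this escapes before the constructor returns
        sTest = this;
    }
}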
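
One common way to avoid the escape is a private constructor plus a static factory that publishes the object only after the constructor has returned. The sketch below assumes a hypothetical registry (REGISTRY) standing in for "register as a listener"; the names SafeListener and create are also illustrative.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SafeListener {

    // Hypothetical registry standing in for a listener list.
    static final List<SafeListener> REGISTRY = new CopyOnWriteArrayList<>();

    private final int id;

    // Private constructor: this never escapes from here.
    private SafeListener(int id) {
        this.id = id;
    }

    // Publication happens only after the constructor has returned.
    static SafeListener create(int id) {
        SafeListener listener = new SafeListener(id);
        REGISTRY.add(listener);
        return listener;
    }

    int id() {
        return id;
    }

    public static void main(String[] args) {
        SafeListener listener = create(7);
        System.out.println("registered: " + REGISTRY.contains(listener));
    }
}
```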

1.8.2. AtomicInteger

The value field of AtomicInteger is itself declared volatile, so its get and set come with barriers
In addition, AtomicInteger uses CAS to provide methods such as getAndIncrement, which gives atomicity
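
A minimal sketch of the atomicity that getAndIncrement adds on top of volatile visibility. The class AtomicCounterDemo and the helper runThreads are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

class AtomicCounterDemo {

    // Starts the given number of threads, each incrementing perThread times.
    static int runThreads(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.getAndIncrement(); // atomic read-modify-write
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("count = " + runThreads(4, 10_000));
    }
}
```

A plain volatile int with ++ would be visible to all threads but could still lose updates, because ++ is a separate read and write.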

1.8.3. LockSupport

The park method of LockSupport provides the locking primitive for the java.util.concurrent library; ultimately park relies on futex or pthread
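
A minimal sketch of how park and unpark are typically paired with a volatile flag. The class ParkDemo, the fields sReady and sPayload and the helper waitForPayload are assumptions made for illustration. park may return without a matching unpark, so the waiter re-checks the flag in a loop.

```java
import java.util.concurrent.locks.LockSupport;

class ParkDemo {

    private static volatile boolean sReady; // the flag carries the happens-before edge
    private static int sPayload;            // plain field, published via sReady

    static int waitForPayload() {
        sReady = false;
        final int[] result = {0};

        Thread waiter = new Thread(() -> {
            while (!sReady) {       // re-check: park() may return spuriously
                LockSupport.park(); // blocks until a permit is available
            }
            result[0] = sPayload;   // visible because of the volatile read above
        });
        waiter.start();

        sPayload = 42;              // plain write before the volatile write
        sReady = true;
        LockSupport.unpark(waiter); // grants the permit, even if park() comes later

        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("payload = " + waitForPayload());
    }
}
```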

1.8.4. pthread

Functions such as pthread_mutex_lock all include barriers