Go 1.13: Changes to defer

Go 1.13 has been released, and the Runtime section of the release notes says that most uses of defer are now about 30% faster. So where does that 30% come from?

As we know, a defer statement used to be lowered into two runtime calls: runtime.deferproc and runtime.deferreturn.
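As a rough mental model of that pair, here is a minimal sketch of a per-goroutine chain of defer records. The names deferRecord, goroutine, pushDefer, and runDefers are hypothetical stand-ins for this illustration, not the runtime's own API. The real deferreturn also checks which stack frame a record belongs to, which this toy version leaves out.

```go
package main

import "fmt"

// deferRecord is a hypothetical stand-in for the runtime's _defer record.
type deferRecord struct {
	fn   func()
	link *deferRecord // the record that was registered just before this one
}

// goroutine is a hypothetical stand-in for the runtime's g; it holds the head of the chain.
type goroutine struct {
	head *deferRecord
}

// pushDefer plays the role of deferproc: it links a new record at the head of the chain.
func (g *goroutine) pushDefer(fn func()) {
	g.head = &deferRecord{fn: fn, link: g.head}
}

// runDefers plays the role of deferreturn: it pops and runs records, newest first.
func (g *goroutine) runDefers() {
	for g.head != nil {
		d := g.head
		g.head = d.link
		d.fn()
	}
}

// demo registers three functions and returns the order in which they ran.
func demo() []int {
	var order []int
	g := &goroutine{}
	for i := 1; i <= 3; i++ {
		i := i // capture the value of this iteration
		g.pushDefer(func() { order = append(order, i) })
	}
	g.runDefers()
	return order
}

func main() {
	fmt.Println(demo()) // prints [3 2 1]
}
```

In this model the cost of a defer is mostly the cost of obtaining a record, which is the part the stack-versus-heap choice affects.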

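Now the deferproc step has gained a new variant, deferprocStack, and the compiler chooses between deferproc and deferprocStack. Since the release notes say that most use cases got faster, the compiler must be picking deferprocStack in most cases.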

Here is the code, from panic.go in the Go runtime:

// deferprocStack queues a new deferred function with a defer record on the stack.
// The defer record must have its siz and fn fields initialized.
// All other fields can contain junk.
// The defer record must be immediately followed in memory by
// the arguments of the defer.
// Nosplit because the arguments on the stack won't be scanned
// until the defer record is spliced into the gp._defer list.
//go:nosplit
func deferprocStack(d *_defer) {
	gp := getg()
	if gp.m.curg != gp {
		// go code on the system stack can't defer
		throw("defer on system stack")
	}
	// siz and fn are already set.
	// The other fields are junk on entry to deferprocStack and
	// are initialized here.
	d.started = false
	d.heap = false
	d.openDefer = false
	d.sp = getcallersp()
	d.pc = getcallerpc()
	d.framepc = 0
	d.varp = 0
	// The lines below implement:
	//   d.panic = nil
	//   d.fd = nil
	//   d.link = gp._defer
	//   gp._defer = d
	// But without write barriers. The first three are writes to
	// the stack so they don't need a write barrier, and furthermore
	// are to uninitialized memory, so they must not use a write barrier.
	// The fourth write does not require a write barrier because we
	// explicitly mark all the defer structures, so we don't need to
	// keep track of pointers to them with a write barrier.
	*(*uintptr)(unsafe.Pointer(&d._panic)) = 0
	*(*uintptr)(unsafe.Pointer(&d.fd)) = 0
	*(*uintptr)(unsafe.Pointer(&d.link)) = uintptr(unsafe.Pointer(gp._defer))
	*(*uintptr)(unsafe.Pointer(&gp._defer)) = uintptr(unsafe.Pointer(d))

	return0()
	// No code can go here - the C return register has
	// been set and must not be clobbered.
}

Now let's see what the compiler chooses:

package main

func main() {
    defer println(1)
}

The relevant part of the generated assembly:

        0x001d 00029 (./main.go:4)      PCDATA  $0, $0
        0x001d 00029 (./main.go:4)      PCDATA  $1, $0
        0x001d 00029 (./main.go:4)      MOVL    $8, ""..autotmp_1+8(SP)
        0x0025 00037 (./main.go:4)      PCDATA  $0, $1
        0x0025 00037 (./main.go:4)      LEAQ    "".wrap·1·f(SB), AX
        0x002c 00044 (./main.go:4)      PCDATA  $0, $0
        0x002c 00044 (./main.go:4)      MOVQ    AX, ""..autotmp_1+32(SP)
        0x0031 00049 (./main.go:4)      MOVQ    $1, ""..autotmp_1+56(SP)
        0x003a 00058 (./main.go:4)      PCDATA  $0, $1
        0x003a 00058 (./main.go:4)      LEAQ    ""..autotmp_1+8(SP), AX
        0x003f 00063 (./main.go:4)      PCDATA  $0, $0
        0x003f 00063 (./main.go:4)      MOVQ    AX, (SP)
        0x0043 00067 (./main.go:4)      CALL    runtime.deferprocStack(SB)
        0x0048 00072 (./main.go:4)      TESTL   AX, AX
        0x004a 00074 (./main.go:4)      JNE     9

The new deferprocStack is indeed called.

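What about the old deferproc? Look at the defer struct: it gained a heap field that records whether the record was allocated on the heap or on the stack. (The deferprocStack listing above also sets openDefer, framepc, varp, and fd, which are not in the struct below, so that listing appears to come from a later revision of the runtime than the 1.13 struct shown here.)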

type _defer struct {
	siz     int32   // includes both arguments and results
	started bool
	heap    bool    // <-- newly added field
	sp      uintptr // sp at time of defer
	pc      uintptr
	fn      *funcval
	_panic  *_panic // panic that is running defer
	link    *_defer
}

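Before 1.13, every defer went through deferproc. There was a deferpool to recycle records, but it was still not enough. The community has long complained that defer is slow, and this release finally answers that complaint.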

How does the compiler decide whether a defer record lives on the heap or on the stack?

    case ODEFER:
        d := callDefer
        if n.Esc == EscNever {
            d = callDeferStack
        }
        s.call(n.Left, d)

Here n.Esc is the escape analysis result stored on the AST node. It is set to EscNever mainly by the following code:

    case ODEFER:
        if e.loopdepth == 1 { // top level
            n.Esc = EscNever // force stack allocation of defer record (see ssa.go)
            break
        }

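Judging from this, a defer that sits inside one or more for loops is most likely no longer marked EscNever. Let's try it: take the earlier program and wrap the defer in a for loop: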

package main

func main() {
	for i := 0; i < 10; i++ {
		defer println(1)
	}
}

Look at the assembly: the familiar recipe is back, with a call to runtime.deferproc.

        0x0035 00053 (./main.go:5)      MOVL    $8, (SP)
        0x003c 00060 (./main.go:5)      PCDATA  $0, $1
        0x003c 00060 (./main.go:5)      LEAQ    "".wrap·1·f(SB), AX
        0x0043 00067 (./main.go:5)      PCDATA  $0, $0
        0x0043 00067 (./main.go:5)      MOVQ    AX, 8(SP)
        0x0048 00072 (./main.go:5)      MOVQ    $1, 16(SP)
        0x0051 00081 (./main.go:5)      CALL    runtime.deferproc(SB)
        0x0056 00086 (./main.go:5)      TESTL   AX, AX
        0x0058 00088 (./main.go:5)      JNE     92
        0x005a 00090 (./main.go:5)      JMP     33
        0x005c 00092 (./main.go:5)      XCHGL   AX, AX
        0x005d 00093 (./main.go:5)      CALL    runtime.deferreturn(SB)
        0x0062 00098 (./main.go:5)      MOVQ    32(SP), BP
        0x0067 00103 (./main.go:5)      ADDQ    $40, SP
        0x006b 00107 (./main.go:5)      RET
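Falling back to the heap inside a loop seems reasonable: a defer in a loop can execute any number of times, so the number of live records is only known at run time, and presumably a single stack slot reserved at compile time cannot hold them all. The sketch below shows that behavior and a common workaround, wrapping the loop body in a function literal so that the defer sits at the top level of its own function. The function names are made up for this example, and we have not checked the assembly of the second variant, so treat the claim that it takes the deferprocStack path as an assumption.

```go
package main

import "fmt"

// deferInLoop registers one defer per iteration. All of them stay pending
// until the function returns, and how many there are depends on n.
func deferInLoop(n int) (log []string) {
	for i := 0; i < n; i++ {
		i := i // capture the value of this iteration
		defer func() { log = append(log, fmt.Sprintf("deferred %d", i)) }()
		log = append(log, fmt.Sprintf("body %d", i))
	}
	return log
}

// deferPerIteration wraps the loop body in a function literal. Each defer is
// at the top level of that literal and runs as soon as the iteration ends.
func deferPerIteration(n int) (log []string) {
	for i := 0; i < n; i++ {
		func() {
			defer func() { log = append(log, fmt.Sprintf("deferred %d", i)) }()
			log = append(log, fmt.Sprintf("body %d", i))
		}()
	}
	return log
}

func main() {
	fmt.Printf("%q\n", deferInLoop(2))       // ["body 0" "body 1" "deferred 1" "deferred 0"]
	fmt.Printf("%q\n", deferPerIteration(2)) // ["body 0" "deferred 0" "body 1" "deferred 1"]
}
```

Note that the two variants are not interchangeable: the second one runs each deferred call at the end of its iteration instead of at the end of the enclosing function, which is often what loop code holding locks or file handles wants anyway.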

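So that is why defer got faster: the defer record is now allocated on the stack. And since we rarely call defer inside a loop, the release notes are right to say that most uses benefit.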
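To check the difference on your own machine, a small benchmark along these lines may help. It is only a sketch: bump, deferTopLevel, deferInLoop, and measure are names invented for this example, and the numbers depend heavily on the Go version. From Go 1.14 on, top-level defers are usually open-coded and skip deferprocStack entirely, so the gap you see there reflects that optimization instead of the 1.13 change.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the deferred call from being optimized away.
var sink int

//go:noinline
func bump() { sink++ }

// deferTopLevel has its defer at the top level of the function.
//
//go:noinline
func deferTopLevel() {
	defer bump()
}

// deferInLoop has its defer inside a loop that runs exactly once.
//
//go:noinline
func deferInLoop() {
	for i := 0; i < 1; i++ {
		defer bump()
	}
}

// measure runs fn under testing.Benchmark and returns nanoseconds per call.
func measure(fn func()) float64 {
	r := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			fn()
		}
	})
	return float64(r.T.Nanoseconds()) / float64(r.N)
}

func main() {
	fmt.Printf("top-level defer: %.1f ns/op\n", measure(deferTopLevel))
	fmt.Printf("defer in loop:   %.1f ns/op\n", measure(deferInLoop))
}
```

Both functions do the same work, one deferred call each, so any difference in the reported times should come from how the defer record is set up.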