Go Service Optimization Tips

20 June 2019

Introduction

This article covers several optimization techniques that may be useful in Go services.

Tip 1: Pool objects with sync.Pool so they can be reused

Reusing pooled objects avoids the cost of creating the same objects over and over, and it reduces pressure on the garbage collector.

Usage example:

package main

import (
	"fmt"
	"sync"
)

var bufpool = sync.Pool{
	New: func() interface{} {
		buf := make([]byte, 0, 512)
		return &buf
	},
}

func main() {
	b1 := *bufpool.Get().(*[]byte)
	b1 = append(b1, []byte("aaaaa")...)
	fmt.Println(b1, len(b1), cap(b1))
	fmt.Printf("%p, %p \n", b1, &b1)
	bufpool.Put(&b1)
	b2 := *bufpool.Get().(*[]byte)
	fmt.Println(b2, len(b2), cap(b2))
	fmt.Printf("%p, %p \n", b2, &b2)
	bufpool.Put(&b2)
}

The example program prints:

[97 97 97 97 97] 5 512
0xc4200a6000, 0xc4200a2020
[97 97 97 97 97] 5 512
0xc4200a6000, 0xc4200a20a0

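b2 and b1 point at the same underlying array, and b2 still holds the contents of b1 when it is taken out of the pool. That is not what we want: b2 is expected to be an empty object, so the object must be reset before it is Put back. In the example above, adding b1 = b1[0:0] before bufpool.Put(&b1) produces the following output: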

[97 97 97 97 97] 5 512
0xc420096000, 0xc42000a060
[] 0 512
0xc420096000, 0xc42000a0e0

This matches the expected behavior.
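The reset step is easy to forget, so one option is to hide it behind a pair of helpers. The following is a minimal sketch under that assumption; getBuf and putBuf are hypothetical names that do not appear in the example above. Note that sync.Pool makes no promise about which object Get returns, so the sketch only relies on the buffer being empty, not on it being the same buffer.

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out *[]byte values with 512 bytes of capacity.
var bufPool = sync.Pool{
	New: func() interface{} {
		buf := make([]byte, 0, 512)
		return &buf
	},
}

// getBuf fetches a buffer from the pool (hypothetical helper).
func getBuf() *[]byte {
	return bufPool.Get().(*[]byte)
}

// putBuf truncates the buffer to length zero, keeping its capacity,
// and then returns it to the pool (hypothetical helper).
func putBuf(b *[]byte) {
	*b = (*b)[:0]
	bufPool.Put(b)
}

func main() {
	b := getBuf()
	*b = append(*b, "aaaaa"...)
	fmt.Println(len(*b), cap(*b))
	putBuf(b)

	// Whether or not the pool returns the same buffer, it is empty here.
	b2 := getBuf()
	fmt.Println(len(*b2), cap(*b2))
	putBuf(b2)
}
```

Working with the *[]byte directly, instead of copying the slice header into a local variable as the original example does, also means the pointer that goes back into the pool is the one that came out of it.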

Tip 2: Avoid structs that contain pointers as keys of large maps

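When map keys contain pointers, garbage collection takes longer, because the GC has to follow those pointers and visit all of the data. Take map[string]int as an example: the key is a string, and a string in Go is represented by the following struct: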

type StringHeader struct {
    Data uintptr
    Len  int
}

See StringHeader for details. A string contains a pointer, so the GC spends more time on it than on a key type without pointers.

Example:

package main

import (
    "fmt"
    "runtime"
    "strconv"
    "time"
)

const numElements = 1000000

var foo = map[string]int{}

func case1() {
    for i := 0; i < numElements; i++ {
        foo[strconv.Itoa(i)] = i
    }

}

var foo2 = map[int]int{}

func case2() {
    for i := 0; i < numElements; i++ {
        foo2[i] = i
    }
}
func timeGC() {
    t := time.Now()
    runtime.GC()
    fmt.Println("gc took time:", time.Since(t))
}
func main() {
    case1()
    //case2()
    for {
        timeGC()
        time.Sleep(1 * time.Second)
    }

}

With case2() commented out and case1() enabled, the output is:

gc took time: 40.927788ms
gc took time: 40.265383ms
gc took time: 40.235497ms
gc took time: 40.562543ms
gc took time: 41.379995ms
gc took time: 40.582498ms
gc took time: 42.926792ms

With case1() commented out and case2() enabled, the output is:

gc took time: 285.715µs
gc took time: 159.778µs
gc took time: 158.922µs
gc took time: 168.993µs
gc took time: 159.776µs
gc took time: 175.365µs

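The difference in GC time is huge: roughly 40 ms per cycle with string keys versus roughly 160 µs with int keys.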

So when working with large maps, avoid using structs that contain pointers as keys whenever possible.
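When the keys really are text, one way to apply this tip is to copy short strings into a fixed-size byte array, which holds no pointers. The following is only a sketch under stated assumptions: key16 and toKey are hypothetical names, keys are assumed to fit in 16 bytes, and keys are assumed not to end in zero bytes (otherwise "a" and "a\x00" would map to the same key).

```go
package main

import "fmt"

// key16 is a hypothetical fixed-size key type. A byte array contains
// no pointers, so map[key16]int has pointer-free keys and values.
type key16 [16]byte

// toKey copies s into a key16. It reports false when s does not fit.
func toKey(s string) (key16, bool) {
	var k key16
	if len(s) > len(k) {
		return k, false
	}
	copy(k[:], s)
	return k, true
}

func main() {
	m := make(map[key16]int)
	for i, s := range []string{"alpha", "beta", "gamma"} {
		if k, ok := toKey(s); ok {
			m[k] = i
		}
	}

	k, _ := toKey("beta")
	v, found := m[k]
	fmt.Println(v, found)
}
```

Whether this pays off depends on the workload: the conversion costs a copy on every lookup, so it is worth measuring GC time with a benchmark like the one above before adopting it.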

Tip 3: Use strings.Builder to concatenate strings

Go 1.10 introduced strings.Builder for more efficient string concatenation. Internally, a Builder keeps appending data to a byte buffer. It is implemented in src/strings/builder.go, and the struct is defined as follows:

type Builder struct {
    addr *Builder // of receiver, to detect copies by value
    buf  []byte
}

The following example compares the performance:

// main.go
package main

import "strings"

var strs = []string{
	"here's",
	"a",
	"some",
	"long",
	"list",
	"of",
	"strings",
	"for",
	"you",
}

func buildStrNaive() string {
	var s string

	for _, v := range strs {
		s += v
	}

	return s
}
func buildStrBuilder(grow bool) string {
	b := strings.Builder{}
	if grow {
		b.Grow(60)
	}
	for _, v := range strs {
		b.WriteString(v)
	}
	return b.String()
}

// main_test.go
package main

import "testing"

func BenchmarkBuildStr(b *testing.B) {
	b.Run("Naive", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			buildStrNaive()
		}
	})
	b.Run("builder-0", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			buildStrBuilder(false)
		}
	})
	b.Run("builder-1", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			buildStrBuilder(true)
		}
	})
}

Running the benchmark with go test -bench=. -benchmem gives the following results:

goos: darwin
goarch: amd64
BenchmarkBuildStr/Naive-4         	 3424706	       374 ns/op	     216 B/op	       8 allocs/op
BenchmarkBuildStr/builder-0-4     	 7817630	       176 ns/op	     120 B/op	       4 allocs/op
BenchmarkBuildStr/builder-1-4     	17136417	        67.4 ns/op	      64 B/op	       1 allocs/op
PASS

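With space preallocated through Builder.Grow(), the Builder is about 5.5 times as fast as naive concatenation (67.4 ns/op versus 374 ns/op). Even without preallocation it is roughly twice as fast (176 ns/op).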
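The benchmark passes a hard-coded 60 to Grow. When the inputs are not known in advance, the size can be computed from them first. A minimal sketch, assuming a hypothetical helper named concat:

```go
package main

import (
	"fmt"
	"strings"
)

// concat joins parts with a strings.Builder whose capacity is
// reserved up front, so the internal buffer is allocated once.
func concat(parts []string) string {
	n := 0
	for _, p := range parts {
		n += len(p)
	}

	var b strings.Builder
	b.Grow(n) // reserve exactly the number of bytes that will be written
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	fmt.Println(concat([]string{"here's", "a", "list"}))
}
```

The extra pass over the input only sums lengths, which is cheap compared with the reallocations it avoids.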

Tip 4: Use the strconv package instead of fmt

When converting an integer to a string, strconv.Itoa performs much better than fmt.Sprintf. fmt.Sprintf takes its arguments as interface{}, which has the following drawbacks:

Type safety is lost.
Converting a variable to interface{} can require a memory allocation.

Next, a benchmark compares the two ways of converting an integer to a string:

package main

import (
	"fmt"
	"strconv"
)

func strconvFmt(b int) string {
	return strconv.Itoa(b)
}
func fmtFmt(b int) string {
	return fmt.Sprintf("%d", b)
}
func main(){}

Test file:

package main

import "testing"

func BenchmarkFmt(b *testing.B) {
	big := 10000
	small := 10
	b.Run("strconv_small", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			strconvFmt(small)
		}
	})
	b.Run("strconv_big", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			strconvFmt(big)
		}
	})
	b.Run("fmt.Sprintf_small", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			fmtFmt(small)
		}
	})
	b.Run("fmt.Sprintf_big", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			fmtFmt(big)
		}
	})

}

Running go test -bench=BenchmarkFmt -benchmem gives the following results:

goos: darwin
goarch: amd64
BenchmarkFmt/strconv_small-4         	305850991	         3.92 ns/op	       0 B/op	       0 allocs/op
BenchmarkFmt/strconv_big-4           	30948387	        32.7 ns/op	       5 B/op	       1 allocs/op
BenchmarkFmt/fmt.Sprintf_small-4     	10915570	       105 ns/op	      16 B/op	       2 allocs/op
BenchmarkFmt/fmt.Sprintf_big-4       	10358809	       113 ns/op	      16 B/op	       2 allocs/op
PASS

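strconv.Itoa is more than 3 times as fast as fmt.Sprintf. In addition, strconv.Itoa has a fast path for small non-negative integers (0 through 99) that needs no allocation at all, so it is faster still. See the code in src/strconv/itoa.go for details.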
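The strconv package also offers strconv.AppendInt, which writes the digits into a caller-supplied byte slice. When the result ends up in a buffer anyway, this can avoid the intermediate string. A minimal sketch, assuming a hypothetical helper named appendID and an "id=" prefix chosen purely for illustration:

```go
package main

import (
	"fmt"
	"strconv"
)

// appendID appends "id=<n>" to dst and returns the extended slice.
// It allocates only if dst lacks the capacity for the result.
func appendID(dst []byte, n int) []byte {
	dst = append(dst, "id="...)
	return strconv.AppendInt(dst, int64(n), 10)
}

func main() {
	buf := make([]byte, 0, 32) // reused across iterations
	for _, n := range []int{10, 10000} {
		buf = appendID(buf[:0], n)
		fmt.Println(string(buf))
	}
}
```

This combines naturally with the pooled buffers from Tip 1.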

Tip 5: Use the unsafe package when converting []byte to string

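Converting a []byte to a string with string(byteSlice) corresponds to the OARRAYBYTESTR operation (src/cmd/compile/internal/gc/walk.go, src/cmd/compile/internal/gc/syntax.go). At compile time this operation is mapped to the runtime function slicebytetostring, which is defined in src/runtime/string.go. That function has to allocate memory, which affects performance.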

// Buf is a fixed-size buffer for the result,
// it is not nil if the result does not escape.
func slicebytetostring(buf *tmpBuf, b []byte) (str string) {
    l := len(b)
    if l == 0 {
        // Turns out to be a relatively common case.
        // Consider that you want to parse out data between parens in "foo()bar",
        // you find the indices and convert the subslice to string.
        return ""
    }

    var p unsafe.Pointer
    if buf != nil && len(b) <= len(buf) {
        p = unsafe.Pointer(buf)
    } else {
        p = mallocgc(uintptr(len(b)), nil, false)
    }
    stringStructOf(&str).str = p
    stringStructOf(&str).len = len(b)
    memmove(p, (*(*slice)(unsafe.Pointer(&b))).array, uintptr(len(b)))
    return
}

The following benchmark compares the two approaches:

package main

import (
	"fmt"
	"testing"
	"unsafe"
)

func BenchmarkTostr(b *testing.B) {
	bs := []byte("hello go")
	var str string
	b.Run("unsafe", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			str = *(*string)(unsafe.Pointer(&bs))
		}

	})
	b.Run("normal", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			str = string(bs)
		}

	})
	fmt.Println(str)
}

Benchmark results:

goos: darwin
goarch: amd64
BenchmarkTostr/unsafe-4         	1000000000	         0.789 ns/op	       0 B/op	       0 allocs/op
BenchmarkTostr/normal-4         	64188240	        16.3 ns/op	       8 B/op	       1 allocs/op
hello go
PASS

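The results show that the normal conversion performs one extra memory allocation and is more than 20 times slower. So in some cases it is worth considering the unsafe package for converting []byte to string.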
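The conversion from the benchmark can be wrapped in a small function so that the unsafe cast lives in one place. This is a sketch, with bytesToString as a hypothetical name. The key assumption is that the caller never modifies the slice afterwards: the returned string shares the slice's memory, so a later write to the slice would change a value that Go code expects to be immutable.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString reinterprets b as a string without copying.
// The result shares memory with b, so b must not be modified
// for as long as the returned string is in use.
func bytesToString(b []byte) string {
	// A slice header starts with the same two fields as a string
	// header (data pointer, length), which is what makes this cast work.
	return *(*string)(unsafe.Pointer(&b))
}

func main() {
	bs := []byte("hello go")
	s := bytesToString(bs)
	fmt.Println(s, len(s))
}
```

This fits cases such as a read-only buffer that is parsed once and then discarded; where the slice is reused or pooled, the ordinary string(bs) copy is the safer choice.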

Notes

The Go version used is:

$ go version
go version go1.13 darwin/amd64

References

Simple techniques to optimise Go programs