go服务优化技巧
简介
本文介绍go服务可能会用到的几个优化技巧
技巧1: sync.Pool 池化某些对象,实现复用
池化后进行对象复用,可以减少对象重复创建的开销,并且可以减轻gc的压力。
使用示例:
package main
import (
	"fmt"
	"sync"
)
var bufpool = sync.Pool{
	New: func() interface{} {
		buf := make([]byte, 0, 512)
		return &buf
	},
}
func main() {
	b1 := *bufpool.Get().(*[]byte)
	b1 = append(b1, []byte("aaaaa")...)
	fmt.Println(b1, len(b1), cap(b1))
	fmt.Printf("%p, %p \n", b1, &b1)
	bufpool.Put(&b1)
	b2 := *bufpool.Get().(*[]byte)
	fmt.Println(b2, len(b2), cap(b2))
	fmt.Printf("%p, %p \n", b2, &b2)
	bufpool.Put(&b2)
}
示例程序输出结果为:
[97 97 97 97 97] 5 512
0xc4200a6000, 0xc4200a2020
[97 97 97 97 97] 5 512
0xc4200a6000, 0xc4200a20a0
b2和b1对应的底层页数组的地址是一致的,并且b2从pool里取出来时,保留了b1的值,这样是不合理的,
b2的预期值应该是个空对象,所以在put之前需要把对象归零。
如上示例,在bufpoo.Put(&b1) 之前增加b1=b1[0:0] 后输出结果如下:
[97 97 97 97 97] 5 512
0xc420096000, 0xc42000a060
[] 0 512
0xc420096000, 0xc42000a0e0
符合预期
技巧2: 避免用带有指针的结构体对象做大map的key
用带指针的对象做map的key, 在gc时会耗费更多的时间,因为gc需要根据指针去遍历所有的数据。
比如map[string]int string 做map的key,string在go里用如下结构体实现:
type StringHeader struct {
    Data uintptr
    Len  int
}
详细介绍在StringHeader string中是包含指针的,所以相比用无指针的对象做key,gc会更耗时。
示例:
package main
import (
    "fmt"
    "runtime"
    "strconv"
    "time"
)
const numElements = 1000000
var foo = map[string]int{}
func case1() {
    for i := 0; i < numElements; i++ {
        foo[strconv.Itoa(i)] = i
    }
}
var foo2 = map[int]int{}
func case2() {
    for i := 0; i < numElements; i++ {
        foo2[i] = i
    }
}
func timeGC() {
    t := time.Now()
    runtime.GC()
    fmt.Println("gc took time:", time.Since(t))
}
func main() {
    case1()
    //case2()
    for {
        timeGC()
        time.Sleep(1 * time.Second)
    }
}
注释case2(), 打开case1()时,输出如下:
gc took time: 40.927788ms
gc took time: 40.265383ms
gc took time: 40.235497ms
gc took time: 40.562543ms
gc took time: 41.379995ms
gc took time: 40.582498ms
gc took time: 42.926792ms
注释case1(), 打开case2()时,输出如下:
gc took time: 285.715µs
gc took time: 159.778µs
gc took time: 158.922µs
gc took time: 168.993µs
gc took time: 159.776µs
gc took time: 175.365µs
gc耗时差别巨大。
所以在使用大map时,尽量避免使用带指针的结构体对象做key。
技巧3: 使用 strings.Builder 来拼接字符串
Go 1.10 版本, 提供了strings.Builder 来更高效的进行字符串的拼接,
Builder 底层实现是向一个byte 的 buffer 中不断写入数据.
Builder 实现在src/string/builder.go 中, 结构定义如下:
type Builder struct {
    addr *Builder // of receiver, to detect copies by value
    buf  []byte
}
通过例子对比下性能差异
// main.go
package main
import "strings"
var strs = []string{
	"here's",
	"a",
	"some",
	"long",
	"list",
	"of",
	"strings",
	"for",
	"you",
}
func buildStrNaive() string {
	var s string
	for _, v := range strs {
		s += v
	}
	return s
}
func buildStrBuilder(grow bool) string {
	b := strings.Builder{}
	if grow {
		b.Grow(60)
	}
	for _, v := range strs {
		b.WriteString(v)
	}
	return b.String()
}
// main_test.go
package main
import "testing"
func BenchmarkBuildStr(b *testing.B) {
	b.Run("Naive", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			buildStrNaive()
		}
	})
	b.Run("builder-0", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			buildStrBuilder(false)
		}
	})
	b.Run("builder-1", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			buildStrBuilder(true)
		}
	})
}
通过go test -bench=. -benchmem进行基准测试,结果如下
goos: darwin
goarch: amd64
BenchmarkBuildStr/Naive-4         	 3424706	       374 ns/op	     216 B/op	       8 allocs/op
BenchmarkBuildStr/builder-0-4     	 7817630	       176 ns/op	     120 B/op	       4 allocs/op
BenchmarkBuildStr/builder-1-4     	17136417	        67.4 ns/op	      64 B/op	       1 allocs/op
PASS
在通过 Builder.Grow() 提前预分配空间的情况下性能提升了4倍, 即使不提前预分配空间也能提升一倍多。
技巧4: 使用strconv包替代fmt包
在把整数转为字符串时,strconv.Itoa性能会比fmt.Sprintf好很多,fmt.Sprintf 使用接口interface{}作为参数,存在如下缺点:
- 失去了类型安全
 - 变量转为interface{}时会进行内存申请
 
接下来, 通过基准测试对比两种把整数转为字符串的方法
package main
import (
	"fmt"
	"strconv"
)
func strconvFmt(b int) string {
	return strconv.Itoa(b)
}
func fmtFmt(b int) string {
	return fmt.Sprintf("%d", b)
}
func main(){}
test文件:
func BenchmarkFmt(b *testing.B) {
	big := 10000
	small := 10
	b.Run("strconv_small", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			strconvFmt(small)
		}
	})
	b.Run("strconv_big", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			strconvFmt(big)
		}
	})
	b.Run("fmt.Sprintf_small", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			fmtFmt(small)
		}
	})
	b.Run("fmt.Sprintf_big", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			fmtFmt(big)
		}
	})
}
执行go test -bench=BenchmarkFmt -benchmem得到如下结果
goos: darwin
goarch: amd64
BenchmarkFmt/strconv_small-4         	305850991	         3.92 ns/op	       0 B/op	       0 allocs/op
BenchmarkFmt/strconv_big-4           	30948387	        32.7 ns/op	       5 B/op	       1 allocs/op
BenchmarkFmt/fmt.Sprintf_small-4     	10915570	       105 ns/op	      16 B/op	       2 allocs/op
BenchmarkFmt/fmt.Sprintf_big-4       	10358809	       113 ns/op	      16 B/op	       2 allocs/op
PASS
strconv.Itoa的性能是fmt.Sprintf的3倍多,并且strconv.Itoa在处理绝对值小于100的整数时做了优化,
不需要进行alloc操作,性能更高。详情可以看src/strconv/itoa.go中的代码。
技巧5: []byte 转 string 时,用 unsafe 包
string(byteSlice) 把[]byte 转为 string 对应的是OARRAYBYTESTR操作 (src/cmd/compile/internal/gc/walk.go,src/cmd/compile/internal/gc/syntax.go),
此操作在编译阶段会映射成运行时函数slicebytetostring,  此函数定义在src/runtime/string.go 中, 函数中需要进行内存申请,性能会有影响。
// Buf is a fixed-size buffer for the result,
// it is not nil if the result does not escape.
func slicebytetostring(buf *tmpBuf, b []byte) (str string) {
    l := len(b)
    if l == 0 {
        // Turns out to be a relatively common case.
        // Consider that you want to parse out data between parens in "foo()bar",
        // you find the indices and convert the subslice to string.
        return ""
    }
    var p unsafe.Pointer
    if buf != nil && len(b) <= len(buf) {
        p = unsafe.Pointer(buf)
    } else {
        p = mallocgc(uintptr(len(b)), nil, false)
    }
    stringStructOf(&str).str = p
    stringStructOf(&str).len = len(b)
    memmove(p, (*(*slice)(unsafe.Pointer(&b))).array, uintptr(len(b)))
    return
}
下面通过基准测试对比下:
func BenchmarkTostr(b *testing.B) {
	bs := []byte("hello go")
	var str string
	b.Run("unsafe", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			str = *(*string)(unsafe.Pointer(&bs))
		}
	})
	b.Run("normal", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			str = string(bs)
		}
	})
	fmt.Println(str)
}
基准测试结果:
goos: darwin
goarch: amd64
BenchmarkTostr/unsafe-4         	1000000000	         0.789 ns/op	       0 B/op	       0 allocs/op
BenchmarkTostr/normal-4         	64188240	        16.3 ns/op	       8 B/op	       1 allocs/op
hello go
PASS
从测试结果看,多了一次内存分配,性能差了20多倍
所以在某些情况下,可以考虑使用 unsafe 包实现[]byte 转为string
说明
go 的版本信息为:
$ go version
go version go1.13 darwin/amd64
       