Go API Performance Testing – Akatsuki Sasori's Blog

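If we go live without a full picture of the project's overall performance, all sorts of problems can surface as traffic grows, such as high CPU usage or high memory usage. To avoid these bottlenecks, we need to profile the program during development.

Go ships with a good set of built-in tools for performance tuning and monitoring, which makes profiling a Go program much more efficient. In Go development, performance analysis is usually done with the built-in pprof packages.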

pprof

pprof is a profiling tool for Go programs that can analyze CPU usage, memory usage, and more. Go has profile sampling built in: import runtime/pprof or net/http/pprof to obtain a profile of the program, then analyze that profile.

runtime/pprof can also produce pprof data for command-line programs and test programs.
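As a minimal sketch of using runtime/pprof directly in a command-line program, the following captures a CPU profile into memory while a function runs. The names busyWork and captureCPUProfile are hypothetical helpers made up for this illustration; a real program would more likely write the profile to a file and open it with go tool pprof.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// busyWork is a hypothetical CPU-bound function that only exists to give
// the profiler something to sample.
func busyWork(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// captureCPUProfile runs fn with CPU profiling enabled and returns the raw
// profile bytes (the same gzip-compressed format go tool pprof reads).
func captureCPUProfile(fn func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	fn()
	pprof.StopCPUProfile() // flushes the profile into buf
	return buf.Bytes(), nil
}

func main() {
	data, err := captureCPUProfile(func() { busyWork(50_000_000) })
	if err != nil {
		fmt.Println("profiling failed:", err)
		return
	}
	fmt.Printf("captured %d bytes of CPU profile data\n", len(data))
}
```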

net/http/pprof is just a thin wrapper around runtime/pprof that exposes the profiles on an HTTP port.

Using pprof

Install: go get -v -u github.com/gin-contrib/pprof

Import: import "github.com/gin-contrib/pprof"

Register the middleware on the router (g is the gin engine):

pprof.Register(g)

Collecting profile data

Running go tool pprof against http://127.0.0.1:8080/debug/pprof/profile fetches a CPU profile and opens it for analysis:

go tool pprof http://127.0.0.1:8080/debug/pprof/profile

Fetching profile over HTTP from http://127.0.0.1:8080/debug/pprof/profile

Saved profile in /home/jack/pprof/pprof.apiserver (deleted).samples.cpu.002.pb.gz

File: apiserver (deleted)

Type: cpu

Time: Jul 28, 2020 at 10:42pm (PDT)

Duration: 30s, Total samples = 0

Entering interactive mode (type "help" for commands, "o" for options)

(pprof) top10 [top10 was typed here]

Showing nodes accounting for 0, 0% of 0 total

flat flat% sum% cum cum%

(pprof)

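The topN output shows which functions use the most CPU time; those functions are the likely performance problems. (The session above reports Total samples = 0, so its table is empty, presumably because the server was idle while sampling.)

If the text output is not intuitive enough, you can also generate an SVG graph with the svg command (make sure graphviz is installed on the system).

You can also open http://localhost:8080/debug/pprof in a browser to see the current state of the API service, including CPU and memory usage.

After running the command, wait 30s while pprof collects samples.

My local environment is a VMware virtual machine, so on the host machine I use the virtual machine's IP.

Before an API goes live, we need to know its performance: the maximum request volume the API server can handle and where its bottlenecks are. Based on business demand, we can then tune the API or scale it out or in. This lets the API serve traffic stably and return responses within a reasonable time.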

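API performance metrics

API performance testing broadly covers the performance of the API framework and the performance of a specific API. The latter depends on how that API is implemented, for example whether it connects to a database or runs complex logic, so discussing a single API's performance apart from its implementation is meaningless.

There are 3 main metrics for API performance:

Concurrency (Concurrent)

Concurrency is the number of users using the system at the same time within a given time window.

In the broad sense, concurrency is the number of users using the system at the same time, and those users may be calling different APIs. In the strict sense, it is the number of users requesting the same API at the same time. This section uses the strict sense.

Queries per second (QPS)

QPS measures how much traffic a given query server handles within a specified period.

QPS = concurrency / average response time (in seconds).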

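Response time (TTLB)

Response time is the total time from the client sending a request to the client receiving the response. It starts when the client issues the request and ends when the client has received the server's response. Some tools call it TTLB (Time to last byte: the time from sending a request until the client receives the last byte of the response). Response time is usually measured in seconds or milliseconds.

QPS is the most important metric for API performance, but a QPS figure must state the concurrency it was measured at, otherwise it is meaningless, because QPS differs at different concurrency levels. For example, 100 QPS with a single user and 100 QPS with 100 users are two different things: the former means the API can execute 100 requests serially in one second, while the latter means the API can handle 100 requests in one second at a concurrency of 100. At the same QPS, higher concurrency indicates better API performance and stronger concurrent processing ability.

When concurrency is set too high, the API has to handle many requests at once and switches between them frequently, so less time goes into actually processing requests and QPS drops. Response time also grows when concurrency is too high. Every API has a concurrency level at which its QPS peaks, but that level is not necessarily the best one; the average response time at that concurrency has to be considered too.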

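Linux has many web performance testing tools; commonly used ones are JMeter, ab, Webbench, and wrk. Each has its own strengths, and this section uses wrk to test the API. wrk is very simple, easy to install, produces fairly professional results, and supports Lua scripts for building more complex test scenarios.

Installing wrk

Installation steps (run as root):

1. Clone the wrk repo

git clone https://github.com/wg/wrk

2. Build with make and install the binary

cd wrk
make
cp ./wrk /usr/bin

Using wrk

wrk is not complicated to use. Run wrk --help to see all of its options: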

$ wrk --help
Usage: wrk <options> <url>
  Options:
    -c, --connections <N>  Connections to keep open
    -d, --duration    <T>  Duration of test
    -t, --threads     <N>  Number of threads to use
    -s, --script      <S>  Load Lua script file
    -H, --header      <H>  Add header to request
        --latency          Print latency statistics
        --timeout     <T>  Socket/request timeout
    -v, --version          Print version details
  Numeric arguments may include a SI unit (1k, 1M, 1G)
  Time arguments may include a time unit (2s, 2m, 2h)

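Commonly used options:

-t: number of threads (don't use too many; 2 to 4 times the number of cores is enough, and more than that lowers efficiency because of excessive thread switching)
-c: number of concurrent connections
-d: test duration, 10s by default
-T: request timeout (short form of --timeout)
-H: HTTP header to send with each request; some APIs require certain headers, which can be passed with -H
--latency: print the response time distribution
-s: Lua script to load; Lua scripts allow more complex requests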

Interpreting wrk results

A simple test looks like this:

$ wrk -t144 -c3000 -d30s -T30s --latency http://127.0.0.1:8080/sd/health
Running 30s test @ http://127.0.0.1:8080/sd/health
  144 threads and 3000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    32.01ms   39.32ms 488.62ms   87.93%
    Req/Sec     1.00k   251.79     3.35k    69.00%
  Latency Distribution
     50%   25.05ms
     75%   55.36ms
     90%   78.45ms
     99%  166.76ms
  4329733 requests in 30.10s, 1.81GB read
  Socket errors: connect 0, read 5, write 0, timeout 64
Requests/sec: 143850.26
Transfer/sec:     61.46MB

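144 threads and 3000 connections: 144 threads simulate 3000 connections, matching the -t and -c options
Thread Stats: per-thread statistics

- Latency: response time, with the average, standard deviation, maximum, and the share of samples within one standard deviation of the average
- Req/Sec: requests completed per second by each thread, with the same average, standard deviation, maximum, and share within one standard deviation

Latency Distribution: response time distribution

- 50%: 50% of requests completed within 25.05ms
- 75%: 75% of requests completed within 55.36ms
- 90%: 90% of requests completed within 78.45ms
- 99%: 99% of requests completed within 166.76ms

4329733 requests in 30.10s, 1.81GB read: total requests completed in 30.10s (4329733) and the amount of data read (1.81GB)
Socket errors: connect 0, read 5, write 0, timeout 64: error counts
Requests/sec: QPS
Transfer/sec: amount of data transferred per second (data throughput, not transactions per second)

Another run: 144 threads, 3000 connections, -d test duration of 30s, -T request timeout of 30s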

wrk -t144 -c3000 -d30s -T30s --latency http://127.0.0.1:8080/sd/health

Running 30s test @ http://127.0.0.1:8080/sd/health

144 threads and 3000 connections

Thread Stats Avg Stdev Max +/- Stdev

Latency 584.28ms 2.87s 29.95s 95.33%

Req/Sec 6.25k 4.95k 12.16k 40.52%

Latency Distribution

50% 64.00us

75% 81.00us

90% 282.00us

99% 17.21s

330070 requests in 30.10s, 141.02MB read

Socket errors: connect 0, read 0, write 0, timeout 3

Requests/sec: 10965.79

Transfer/sec: 4.69MB
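When comparing several runs, it can help to pull the QPS figure out of wrk's text output programmatically. The following is a sketch under the assumption that the output contains a line starting with "Requests/sec:" as in the runs above; parseRequestsPerSec is a hypothetical helper, not part of wrk.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseRequestsPerSec scans wrk's text output line by line and returns the
// number that follows the "Requests/sec:" label.
func parseRequestsPerSec(output string) (float64, error) {
	const label = "Requests/sec:"
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, label) {
			value := strings.TrimSpace(strings.TrimPrefix(line, label))
			return strconv.ParseFloat(value, 64)
		}
	}
	return 0, errors.New("no Requests/sec line found")
}

func main() {
	// Tail of the wrk output shown above.
	out := "Socket errors: connect 0, read 0, write 0, timeout 3\n" +
		"Requests/sec: 10965.79\n" +
		"Transfer/sec: 4.69MB\n"

	qps, err := parseRequestsPerSec(out)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Printf("QPS: %.2f\n", qps) // prints QPS: 10965.79
}
```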

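Performance analysis

While the load test is running, run go tool pprof http://127.0.0.1:8080/debug/pprof/profile to collect 30s of profile data and look at the 10 functions with the highest cumulative time.

Inside the pprof prompt, add the -cum flag to top so the list is sorted by cumulative time: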

(pprof) top -cum
Showing nodes accounting for 6.38s, 3.58% of 178.08s total
Dropped 510 nodes (cum <= 0.89s)
Showing top 10 nodes out of 199
      flat  flat%   sum%        cum   cum%
     0.31s  0.17%  0.17%    104.25s 58.54%  net/http.(*conn).serve
     0.08s 0.045%  0.22%     63.81s 35.83%  net/http.serverHandler.ServeHTTP
     0.09s 0.051%  0.27%     63.73s 35.79%  vendor/github.com/gin-gonic/gin.(*Engine).ServeHTTP
     0.15s 0.084%  0.35%     63.08s 35.42%  vendor/github.com/gin-gonic/gin.(*Engine).handleHTTPRequest
     0.09s 0.051%   0.4%     60.25s 33.83%  runtime.mcall
     0.22s  0.12%  0.53%     59.60s 33.47%  vendor/github.com/gin-gonic/gin.(*Context).Next
     0.04s 0.022%  0.55%     59.58s 33.46%  vendor/github.com/gin-gonic/gin.RecoveryWithWriter.func1
     0.50s  0.28%  0.83%     59.20s 33.24%  runtime.schedule
     0.02s 0.011%  0.84%     59.16s 33.22%  apiserver/router/middleware.NoCache
     4.88s  2.74%  3.58%     57.76s 32.43%  runtime.findrunnable
....

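This lets you locate the time-consuming functions and fix the problem in a targeted way.

Everything above was run by hand; next, let's put together an automated script to run it.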
