前些天朋友遇到一个关于以太坊使用的leveldb导致的数组越界问题,一起讨论了很久。如果大家持续使用以太坊节点,迟早也会遇到此问题,在本篇文章中给大家分析一下,做好提前准备。
我们先看一下具体的异常信息,对于普通的异常重启geth节点即可解决,但如果遇到下面这个异常信息,重启或升级版本都是无法解决的。
INFO [04-28|10:03:35] Starting peer-to-peer node instance=Geth/v1.7.3-stable/linux-amd64/go1.9INFO [04-28|10:03:35] Allocated cache and file handles database=/mnt/data/eth/geth/chaindata cache=128 handles=1024panic: runtime error: index out of rangegoroutine 1 [running]:github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb.shortenb(0x10040ff1126, 0x4, 0xc4204bf9f8) /mnt/go/src/github.com/ethereum/go-ethereum-1.7.3/build/_workspace/src/github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb/util.go:30 +0x14dgithub.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb.(*version).computeCompaction(0xc4416120f0) /mnt/go/src/github.com/ethereum/go-ethereum-1.7.3/build/_workspace/src/github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb/version.go:395 +0x4b3github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb.(*versionStaging).finish(0xc4204bfd18, 0xc4201e8000) /mnt/go/src/github.com/ethereum/go-ethereum-1.7.3/build/_workspace/src/github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb/version.go:510 +0x935github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb.(*version).spawn(0xc420182230, 0xc4201e8000, 0xc420182230) /mnt/go/src/github.com/ethereum/go-ethereum-1.7.3/build/_workspace/src/github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb/version.go:279 +0x7agithub.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb.(*session).commit(0xc4201d0240, 0xc4201e8000, 0x0, 0x0) /mnt/go/src/github.com/ethereum/go-ethereum-1.7.3/build/_workspace/src/github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb/session.go:195 +0x88github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb.(*DB).recoverJournal(0xc4200ee600, 0xc4200ee600, 0xc420068660) /mnt/go/src/github.com/ethereum/go-ethereum-1.7.3/build/_workspace/src/github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb/db.go:538 +0xdb8github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb.openDB(0xc4201d0240, 0x0, 0x0, 0xc4201d0240) /mnt/go/src/github.com/ethereum/go-ethereum-1.7.3/build/_workspace/src/github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb/db.go:122 +0x6bagithub.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb.Open(0x185e540, 0xc4202047e0, 0xc4204c02f0, 0x0, 0x0, 0x0) /mnt/go/src/github.com/ethereum/go-ethereum-1.7.3/build/_workspace/src/github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb/db.go:194 +0x100github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb.OpenFile(0xc4201f7ba0, 0x1c, 0xc4204c02f0, 0xc4201c5bb0, 0x4, 0x4) /mnt/go/src/github.com/ethereum/go-ethereum-1.7.3/build/_workspace/src/github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb/db.go:216 +0x97github.com/ethereum/go-ethereum/ethdb.NewLDBDatabase(0xc4201f7ba0, 0x1c, 0x80, 0x400, 0x1c, 0x0, 0x0) /mnt/go/src/github.com/ethereum/go-ethereum-1.7.3/build/_workspace/src/github.com/ethereum/go-ethereum/ethdb/database.go:72 +0x363github.com/ethereum/go-ethereum/node.(*ServiceContext).OpenDatabase(0xc4202a2da0, 0xf4ad6c, 0x9, 0x80, 0x400, 0x0, 0x0, 0x0, 0x0) /mnt/go/src/github.com/ethereum/go-ethereum-1.7.3/build/_workspace/src/github.com/ethereum/go-ethereum/node/service.go:46 +0x133github.com/ethereum/go-ethereum/eth.CreateDB(0xc4202a2da0, 0xc4203f4800, 0xf4ad6c, 0x9, 0x0, 0x0, 0x0, 0x0) /mnt/go/src/github.com/ethereum/go-ethereum-1.7.3/build/_workspace/src/github.com/ethereum/go-ethereum/eth/backend.go:201 +0x5dgithub.com/ethereum/go-ethereum/eth.New(0xc4202a2da0, 0xc4203f4800, 0x181d560, 0xc4204c5808, 0x417268) /mnt/go/src/github.com/ethereum/go-ethereum-1.7.3/build/_workspace/src/github.com/ethereum/go-ethereum/eth/backend.go:111 +0x93github.com/ethereum/go-ethereum/cmd/utils.RegisterEthService.func2(0xc4202a2da0, 0xc4204b4420, 0xc4204c5b18, 0x0, 0xc4204b4450) /mnt/go/src/github.com/ethereum/go-ethereum-1.7.3/build/_workspace/src/github.com/ethereum/go-ethereum/cmd/utils/flags.go:1065 +0x3dgithub.com/ethereum/go-ethereum/node.(*Node).Start(0xc4201f4480, 0x0, 0x0) /mnt/go/src/github.com/ethereum/go-ethereum-1.7.3/build/_workspace/src/github.com/ethereum/go-ethereum/node/node.go:175 +0x433github.com/ethereum/go-ethereum/cmd/utils.StartNode(0xc4201f4480) /mnt/go/src/github.com/ethereum/go-ethereum-1.7.3/build/_workspace/src/github.com/ethereum/go-ethereum/cmd/utils/cmd.go:62 +0x2fmain.startNode(0xc420230840, 0xc4201f4480) /mnt/go/src/github.com/ethereum/go-ethereum-1.7.3/build/_workspace/src/github.com/ethereum/go-ethereum/cmd/geth/main.go:225 +0x43main.geth(0xc420230840, 0xffbbe0, 0xb2d05e00) /mnt/go/src/github.com/ethereum/go-ethereum-1.7.3/build/_workspace/src/github.com/ethereum/go-ethereum/cmd/geth/main.go:215 +0x43github.com/ethereum/go-ethereum/vendor/gopkg.in/urfave/cli%2ev1.HandleAction(0xdd0280, 0xffd068, 0xc420230840, 0xc420230840, 0xc4204c5f40) /mnt/go/src/github.com/ethereum/go-ethereum-1.7.3/build/_workspace/src/github.com/ethereum/go-ethereum/vendor/gopkg.in/urfave/cli.v1/app.go:490 +0xd2github.com/ethereum/go-ethereum/vendor/gopkg.in/urfave/cli%2ev1.(*App).Run(0xc420250000, 0xc4200100e0, 0xe, 0xe, 0x0, 0x0) /mnt/go/src/github.com/ethereum/go-ethereum-1.7.3/build/_workspace/src/github.com/ethereum/go-ethereum/vendor/gopkg.in/urfave/cli.v1/app.go:264
先查看一下异常信息的第一段代码位置:
/mnt/go/src/github.com/ethereum/go-ethereum-1.7.3/build/_workspace/src/github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb/util.go:30
进入此源代码之后,能够看到如下代码:
var bunits = [...]string{"", "Ki", "Mi", "Gi"}func shortenb(bytes int) string { i := 0 for ; bytes > 1024 && i < 4; i++ { bytes /= 1024 } return fmt.Sprintf("%d%sB", bytes, bunits[i])}
其中异常就发生在return代码部分,也就是通过bunits[i]获取数据时,i的值超出了bunits数组的范围。
看这段代码,当shortenb传入的bytes<1024 * 1024 * 1024是没问题的,i <= 3。但是,当bytes>1024 * 1024 * 1024 * 1024时,也就是单位到TB的时候,i的值将等于4,此时将发生数组越界异常。
为什么刚才说大家迟早会遇到这个问题呢,就是当我们同步区块链数据一开始就使用full或者很早就采用full模式的话,数据量很快会到达TB级别,而leveldb的这段代码,当到达TB级别之后就会出现数组越界异常。
上面已经分析了问题的原因,那么怎么解决这个问题呢?将数组bunits再扩展一个“Ti”项?这样修改不敢打包票会修复问题,因为只是在数组里面添加一个类型,不确定其他地方是否能够使用此类型。如果要这样修改,可能需要通读相关的代码,然后测试验证才可以。
另外一种比较轻量级的改动是将for循环中i<4的判断修改为i<3,修改后的代码为:
var bunits = [...]string{"", "Ki", "Mi", "Gi"}func shortenb(bytes int) string { i := 0 for ; bytes > 1024 && i < 3; i++ { bytes /= 1024 } return fmt.Sprintf("%d%sB", bytes, bunits[i])}
这样再拿上面bytes>1024 * 1024 * 1024 * 1024计算一下,当单位编程TB的时候,会使用1024GB,符合原来数组的最大单位。
PS:当然,修改之后大家是需要进行相应级别数据量的测试验证的。