JVM configuration of the historical node
Installation: not enough direct memory at startup
The node failed to start: it needs 38 GB of direct memory, but only 25 GB was configured.
Not enough direct memory.
Please adjust -XX:MaxDirectMemorySize, druid.processing.buffer.sizeBytes,
or druid.processing.numThreads: maxDirectMemory[26,843,545,600],
memoryNeeded[38,654,705,664]
= druid.processing.buffer.sizeBytes[1,073,741,824] *( druid.processing.numThreads[35] + 1 )
-XX:MaxDirectMemorySize=25g
In the historical node configuration:
druid.processing.buffer.sizeBytes=1073741824
druid.processing.numThreads=35
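To make the arithmetic in that error explicit, here is a minimal sketch of my own (not Druid code) of the same check it performs:

public class DirectMemoryCheck {
    public static void main(String[] args) {
        long sizeBytes = 1_073_741_824L;            // druid.processing.buffer.sizeBytes
        int numThreads = 35;                         // druid.processing.numThreads
        long maxDirect = 25L * 1024 * 1024 * 1024;   // -XX:MaxDirectMemorySize=25g
        long needed = sizeBytes * (numThreads + 1);
        System.out.printf("needed=%,d  max=%,d  fits=%b%n", needed, maxDirect, needed <= maxDirect);
        // Prints needed=38,654,705,664  max=26,843,545,600  fits=false, matching the error above.
    }
}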
Running
The historical node crashed.
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000048d95000, 12288, 0) failed; error='Cannot allocate memory' (errno=12)
After investigation, the machine simply ran out of memory: it has 64 GB in total, newly started services took 20 GB, and the historical node had been allocated 25 GB heap + 25 GB direct memory.
Restarted with the allocation reduced to 15 GB (heap) + 15 GB (direct):
-Xmx15g
-XX:MaxDirectMemorySize=15g
-XX:+DisableExplicitGC
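As a rough sanity check (my own back-of-the-envelope sketch, not anything Druid enforces), the heap, the direct memory, and the other services all have to fit inside physical RAM:

public class MemoryBudget {
    public static void main(String[] args) {
        int totalGb = 64;            // physical RAM on the machine
        int otherServicesGb = 20;    // the newly started services
        int heapGb = 15;             // -Xmx15g
        int directGb = 15;           // -XX:MaxDirectMemorySize=15g
        int headroomGb = totalGb - otherServicesGb - heapGb - directGb;
        // ~14 GB left for the OS, page cache, metaspace, thread stacks, etc.
        // With the old 25 GB + 25 GB setting the headroom was -6 GB, hence errno=12.
        System.out.println("headroom (GB): " + headroomGb);
    }
}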
After finishing Loading segment for the 64,828 local segments and moving on to Announcing segment, the node crashed while loading some additional segments assigned by the coordinator.
INFO [ZkCoordinator-loading-12] io.druid.server.coordination.ZkCoordinator - Loading segment[64082/64828][xxx_stat_2015-04-16T11:00:00.000+08:00_2015-04-16T12:00:00.000+08:00_2015-04-16T12:18:11.745+08:00_2]
......
INFO [main] io.druid.server.coordination.BatchDataSegmentAnnouncer - Announcing segment[xxx_stat_2015-08-19T11:00:00.000+08:00_2015-08-19T12:00:00.000+08:00_2015-08-19T12:17:07.390+08:00] at path[/druid/segments/xxx.xxx.xxx.com:8070/xxx.xxx.xxx.com:8070_historical__default_tier_2017-03-28T15:14:21.586+08:00_a3aaa395f9c24d769ed62c5e5dec84d5257]
......
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0
Following the hellojava blog, the generated hs_err_pid16234.log shows roughly the following kinds of entries under Dynamic libraries, 65,536 in total, including some system libraries and jar files, but the vast majority are the historical node's persistent segment data. For what an hs_err_pid file contains, see http://www.raychase.net/1459.
Dynamic libraries
46338000-4633b000 ---p 46338000 00:00 0
32d3000000-32d3015000 r-xp 00000000 08:01 459257 /lib64/libselinux.so.1
2aaabbdfd000-2aaabbe06000 r--s 00068000 08:06 106758257 /disk1/xxxx/druid-0.9.1.1/lib/druid-indexing-service-0.9.1.1.jar
2aac9cb92000-2aac9cba3000 r--s 00000000 08:31 162054309 /disk4/xxxx/druid-0.9.1.1-historical-data/persistent/xxx_stat/2016-04-22T09:00:00.000+08:00_2016-04-22T10:00:00.000+08:00/2016-04-22T10:30:13.104+08:00/0/00000.smoosh
This data comes from
/proc/{pid}/maps
On Linux, max_map_count limits the number of VMAs (virtual memory areas) a single process may own; for background on virtual memory, see the blog post "Understanding Virtual Memory" (理解虚拟内存).
cat /proc/sys/vm/max_map_count
65536
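A small sketch of my own for watching how close a process is to this limit (Linux only; pass the historical node's pid as the argument, or it defaults to the current JVM):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class MapCount {
    public static void main(String[] args) throws Exception {
        String pid = args.length > 0 ? args[0] : "self";
        // Each line of /proc/<pid>/maps is one VMA.
        int vmas = Files.readAllLines(Paths.get("/proc/" + pid + "/maps")).size();
        List<String> limitLine = Files.readAllLines(Paths.get("/proc/sys/vm/max_map_count"));
        long limit = Long.parseLong(limitLine.get(0).trim());
        System.out.printf("VMAs=%d, vm.max_map_count=%d, used=%.1f%%%n",
                vmas, limit, 100.0 * vmas / limit);
    }
}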
Looking at the decompiled source of sun.nio.ch.FileChannelImpl.map, the caller of map0:
try {
    // First attempt to mmap the requested region.
    var7 = this.map0(var6, var34, var15);
} catch (OutOfMemoryError var30) {
    // Out of mappings (or address space): ask the GC to unmap
    // MappedByteBuffers that are no longer referenced...
    System.gc();
    try {
        Thread.sleep(100L);
    } catch (InterruptedException var29) {
        Thread.currentThread().interrupt();
    }
    try {
        // ...then retry once. If the retry also fails, give up.
        var7 = this.map0(var6, var34, var15);
    } catch (OutOfMemoryError var28) {
        throw new IOException("Map failed", var28);
    }
}
Here, after an OutOfMemoryError from map0, recovery depends on the explicit System.gc() call unmapping MappedByteBuffers that are no longer referenced. Our startup flags included -XX:+DisableExplicitGC, which turns System.gc() into a no-op, so after the first OOM the rescue did nothing.
So the root cause: the historical node's data volume kept growing, the number of mapped files grew with it, and with -XX:+DisableExplicitGC set the process went straight to OOM once the mapping count reached max_map_count.
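For intuition, here is a hypothetical, self-contained repro (my own sketch, not Druid code): it keeps mapping the same small file while pinning the references, so the VMA count climbs until the kernel refuses with ENOMEM and FileChannelImpl.map ends in "Map failed":

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class MapFailedRepro {
    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("mmap-repro", ".bin");
        try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "rw");
             FileChannel ch = raf.getChannel()) {
            raf.setLength(4096);
            List<MappedByteBuffer> pinned = new ArrayList<>();
            int count = 0;
            while (true) {
                // Every map() call adds one VMA; holding the reference keeps it alive,
                // so here even the System.gc() rescue in FileChannelImpl.map cannot help.
                // In Druid's case the rescue only works when old segment buffers are
                // already unreachable and explicit GC has not been disabled.
                pinned.add(ch.map(FileChannel.MapMode.READ_ONLY, 0, 4096));
                if (++count % 10000 == 0) {
                    System.out.println("mappings so far: " + count);
                }
            }
        }
    }
}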
After switching to the -XX:+ExplicitGCInvokesConcurrent recommended in that post, a new error appeared: java.lang.OutOfMemoryError: unable to create new native thread
2017-03-28T16:54:20,482 ERROR [main-EventThread] org.apache.zookeeper.ClientCnxn - Caught unexpected throwable
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method) ~[?:1.8.0_40]
......
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505) [zookeeper-3.4.8.jar:3.4.8--1]
Java HotSpot(TM) 64-Bit Server VM warning: Attempt to deallocate stack guard pages failed.
......
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00002aaaacd88000, 65536, 1) failed; error='Cannot allocate memory' (errno=12)
Check whether ulimit caps the number of user processes (-a lists all limits, -u shows the max user processes):
ulimit -u
3276400
No limit there.
Checking the new hs_err_pidxxx.log, Dynamic libraries still showed exactly 65,536 entries. The new flag probably did let the GC run, but the segment data alone likely needs more than 65,536 mappings, and with everything else on top the limit is certainly exceeded.
Had ops raise max_map_count to 200000:
echo 200000 > /proc/sys/vm/max_map_count
# or:
sysctl -w vm.max_map_count=200000
Now things look normal; checking the running process, Dynamic libraries has more than 80,000 entries:
wc -l /proc/24992/maps
86654 /proc/24992/maps
After running for a while, I noticed the mapping count had dropped back:
wc -l /proc/24992/maps
65765 /proc/24992/maps
wc -l /proc/24992/maps
65714 /proc/24992/maps
Limit on the number of query rows
com.metamx.common.ISE: Maximum number of rows [500000] reached
Configured on the historical and coordinator nodes:
druid.query.groupBy.maxResults=5000000
All data being dropped
2016-12-23 10:26:18,141 [KafkaConsumer-CommitThread] INFO c.m.tranquility.kafka.KafkaConsumer - Flushed {topic_a={receivedCount=691, sentCount=0, droppedCount=691, unparseableCount=0}, bid_sdk={receivedCount=65384, sentCount=0, droppedCount=65384, unparseableCount=0}} pending messages in 6ms and committed offsets in 50ms.
The messages are stuck pending because consumption is too slow. Fixes:
- consumer.numThreads: the number of consumer threads started; increase it (defaults to cores - 1).
- For high-volume topics, start multiple Tranquility instances with the same kafka.group.id.
- Increase "task.partitions", the number of partitions, i.e. the number of peons a task starts (defaults to "1").
- Add more MiddleManager nodes, or raise the number of tasks each can handle via druid.worker.capacity.
Tranquility configuration:
"properties" : {
"task.partitions" : "1",
"task.replicants" : "1"
}
MiddleManager configuration:
druid.worker.capacity = 7
For example, with MiddleManager nodes on 3 different machines and 2 Tranquility instances, setting task.partitions to 3 and leaving task.replicants at the default 1, each MiddleManager node will start two peons, one for each Tranquility instance's task.
Of course, the partitions, the replicants, and the MiddleManagers' druid.worker.capacity cannot be raised without bound. Roughly:
druid.worker.capacity >= (number of Tranquility instances * partitions * replicants) / (number of MiddleManager nodes)
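Expressed as code, the rule of thumb looks like this (a sketch of the inequality above, plugging in the example deployment; not a Druid API):

public class WorkerCapacityCheck {
    static boolean enoughCapacity(int tranquilityInstances, int partitions,
                                  int replicants, int middleManagers, int workerCapacity) {
        int totalTasks = tranquilityInstances * partitions * replicants;
        // Tasks are spread across MiddleManagers, so round up per node.
        int tasksPerMiddleManager = (int) Math.ceil((double) totalTasks / middleManagers);
        return workerCapacity >= tasksPerMiddleManager;
    }

    public static void main(String[] args) {
        // The example above: 2 Tranquility instances * 3 partitions * 1 replicant = 6 tasks
        // over 3 MiddleManagers => 2 peons each, so druid.worker.capacity = 7 is plenty.
        System.out.println(enoughCapacity(2, 3, 1, 3, 7)); // true
    }
}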