tpch.q17 product tests fails due exceeding travis memory limit #9295

kokosing · 2017-11-07T09:33:14Z

The tpch.q17 product test is failing consistently because of the limited memory available on Travis. One way to fix this is to quarantine (disable) the test. A better option would be to investigate why memory consumption increased, because this test used to pass.

2017-11-07 14:31:19 INFO: [229 of 403] sql_tests.testcases.hive_tpch.q17 (Groups: tpch)
presto-master_1       | 2017-11-07T14:32:15.023+0545	WARN	node-state-poller-0	com.facebook.presto.metadata.RemoteNodeState	Node state update request to http://172.18.0.7:8081/v1/info/state has not returned in 10.66s
presto-master_1       | 2017-11-07T14:32:17.415+0545	WARN	ContinuousTaskStatusFetcher-20171107_084619_00381_sxx3w.1.0-1320	com.facebook.presto.server.remotetask.RequestErrorTracker	Error getting task status 20171107_084619_00381_sxx3w.1.0: java.util.concurrent.TimeoutException: Total timeout 10000 ms elapsed: http://172.18.0.7:8081/v1/task/20171107_084619_00381_sxx3w.1.0
presto-master_1       | 2017-11-07T14:32:20.918+0545	WARN	node-state-poller-0	com.facebook.presto.metadata.RemoteNodeState	Node state update request to http://172.18.0.7:8081/v1/info/state has not returned in 16.55s
presto-worker_1       | Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000cc000000, 175112192, 0) failed; error='Cannot allocate memory' (errno=12)
presto-worker_1       | #
presto-worker_1       | # There is insufficient memory for the Java Runtime Environment to continue.
presto-worker_1       | # Native memory allocation (mmap) failed to map 175112192 bytes for committing reserved memory.
presto-worker_1       | # An error report file with more information is saved as:
presto-master_1       | 2017-11-07T14:32:27.304+0545	WARN	node-state-poller-0	com.facebook.presto.metadata.RemoteNodeState	Node state update request to http://172.18.0.7:8081/v1/info/state has not returned in 22.94s
presto-worker_1       | # /var/presto/hs_err_pid5.log


kokosing · 2017-11-07T11:58:54Z

Nothing obvious could have introduced this change recently: no memory-intensive code change was merged recently, and the Travis environment did not change.

Since we already have a couple of tpch product test queries marked as big_query (the ones that cause problems on Travis because of their resource consumption), I think it will be good enough to mark this one as big_query as well.

See #9298

findepi · 2017-11-09T20:34:54Z

q17 still fails from time to time in product tests, even after #9232, e.g. https://travis-ci.org/prestodb/presto/jobs/299754172 (a PR build for JDBC connector changes, so the change does not affect Hive or the product tests)

findepi · 2017-11-11T15:27:03Z

from https://travis-ci.org/prestodb/presto/jobs/300219197, during q17 execution:

presto-worker_1       | Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000cc400000, 174063616, 0) failed; error='Cannot allocate memory' (errno=12)
presto-worker_1       | #
presto-worker_1       | # There is insufficient memory for the Java Runtime Environment to continue.
presto-worker_1       | # Native memory allocation (mmap) failed to map 174063616 bytes for committing reserved memory.

from https://travis-ci.org/prestodb/presto/jobs/300027569

presto-worker_1       | /docker/volumes/conf/docker/files/presto-launcher-wrapper.sh: line 24:     6 Killed                  /docker/volumes/presto-server/bin/launcher -Dnode.id="${HOSTNAME}" -Dcatalog.config-dir="${PRESTO_CONFIG_DIRECTORY}"/catalog --config="${PRESTO_CONFIG_DIRECTORY}/${CONFIG}".properties --jvm-config="${PRESTO_CONFIG_DIRECTORY}"/jvm.config --log-levels-file="${PRESTO_CONFIG_DIRECTORY}"/log.properties --data-dir=/var/presto "$@"
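The two excerpts above show different failure signatures: in one the JVM itself reports that os::commit_memory failed with errno=12, in the other the shell reports that the launcher process was killed. A minimal sketch for telling them apart when scanning a Travis log could look like the following. The function name `classify` and the returned dictionary keys are hypothetical, and the regular expressions are derived only from the lines quoted in this issue, so they may need adjusting for other JVM or shell versions.

```python
import re

# HotSpot warning printed when committing reserved memory fails.
# Pattern is based only on the log lines quoted in this issue.
COMMIT_FAILED = re.compile(
    r"os::commit_memory\((0x[0-9a-fA-F]+), (\d+), \d+\) failed; "
    r"error='([^']*)' \(errno=(\d+)\)"
)

# Shell message printed when a child process was terminated by SIGKILL,
# e.g. "script.sh: line 24:     6 Killed   <command>".
KILLED = re.compile(r"line \d+:\s+(\d+) Killed\s")


def classify(line):
    """Return a dict describing the failure in `line`, or None if no match."""
    m = COMMIT_FAILED.search(line)
    if m:
        size = int(m.group(2))
        return {
            "kind": "commit_memory_failed",
            "bytes": size,
            "mib": size / (1024 * 1024),
            "errno": int(m.group(4)),
        }
    m = KILLED.search(line)
    if m:
        return {"kind": "killed", "pid": int(m.group(1))}
    return None


if __name__ == "__main__":
    import sys

    # Usage: python classify_log.py < travis.log
    for raw in sys.stdin:
        result = classify(raw)
        if result is not None:
            print(result)
```

Applied to the lines quoted above, the requested sizes work out to 167 MiB (175112192 bytes) and 166 MiB (174063616 bytes). The sketch only reports what the log says; whether the "Killed" case was caused by the kernel OOM killer is not something the log line alone confirms.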

findepi · 2017-11-11T16:22:15Z

Setting MALLOC_ARENA_MAX={1,2} in docker containers used for product tests didn't fix the issue (#9310, #9331)

kokosing · 2017-11-23T09:26:57Z

Fixed with #9298

kokosing self-assigned this Nov 7, 2017

This was referenced Nov 7, 2017

Fix error message with invalid catalog and schema #8281

Merged

Add tpch.q17 test query to big_query tests group #9298

Merged

findepi mentioned this issue Nov 8, 2017

Set MALLOC_ARENA_MAX=2 for product tests docker Java containers #9310

Closed

findepi mentioned this issue Nov 11, 2017

[for Travis, ignore] #9331

Closed

findepi mentioned this issue Nov 23, 2017

Decimal as optionally default type for fixed point literals (v2) #9369

Closed

kokosing closed this as completed Nov 23, 2017
