Memory limit exceeded in JOINs caused by "big columns" in ClickHouse

A "big column" is a column whose per-row values are very large, typically 100 KiB or more. Such columns can cause JOINs (typically LEFT JOIN and INNER JOIN) to fail with a memory-limit-exceeded exception.

Common JOIN algorithms

This article discusses the commonly used JOIN algorithms: partial merge join and hash join. The direct join algorithm is out of scope here.

The memory-limit-exceeded error generally looks like this:

2024.01.16 17:04:30.084819 [ 3654586 ] {52653888-f266-443a-beba-348c66a88e60} <Error> TCPHandler: Code: 241. DB::Exception: Memory limit (for query) exceeded: would use 33.61 GiB (attempt to allocate chunk of 34359738368 bytes), maximum: 27.94 GiB.: While executing JoiningTransform. (MEMORY_LIMIT_EXCEEDED), Stack trace (when copying this message, always include the lines below):

Test data

To analyze the memory-limit problem of JOINs on a "big column", the following test data was prepared.

The test data consists of two tables:

table1 (big_columns_table) - the N side of the relation: 100,000 rows, two columns: id and fk (foreign key)

table2 (dim_table1) - the 1 side of the relation: 10,000 rows, two columns: id and value (value is a very long String column; each cell is roughly 500 KiB)

The queries that create the test data are as follows:

drop database if exists `big_columns_join`;
create database `big_columns_join`;
use big_columns_join;

create table `big_columns_join`.`big_columns_table` (
`id` UInt64,
`fk` UInt64
) ENGINE = MergeTree() order by id
as (select
`number`,
rand(4) % 10000
from numbers(100000)
settings max_block_size = 100
);

create table `big_columns_join`.`dim_table1` (
`id` UInt64,
`value` String
) ENGINE = MergeTree() order by id
as (select
`number`,
repeat(toString(rand(1)), 100000)
from numbers(10000));
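
To sanity-check the generated data, a query like the following can be used (a sketch; the exact average size depends on rand(), but each value cell should be on the order of several hundred KiB):

-- verify the sizes of the generated test data (illustrative check, not part of the original setup)
select count() as rows, formatReadableSize(avg(length(value))) as avg_value_size
from `big_columns_join`.`dim_table1`;

select count() as rows, uniqExact(fk) as distinct_fk
from `big_columns_join`.`big_columns_table`;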

Test query

The test query is a simple LEFT JOIN.

select
*
from
`big_columns_join`.`big_columns_table`
left join `big_columns_join`.`dim_table1` on `big_columns_join`.`big_columns_table`.`fk` = `big_columns_join`.`dim_table1`.`id`
format Null;

Hash join algorithm

Before running the test query, set the max_joined_block_size_rows threshold.

set join_algorithm = 'hash';
set max_joined_block_size_rows = 10000; -- a smaller threshold should need less memory.
-- run the test query

The test result:

With the hash join algorithm, the query exceeds the memory limit no matter how the max_joined_block_size_rows threshold is tuned.

The error message is as follows:

0. Poco::Exception::Exception(String const&, int) @ 0x000000001504f759 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
1. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000b243077 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
2. DB::Exception::Exception<char const*, char const*, String, long&, String, char const*, std::basic_string_view<char, std::char_traits<char>>>(int, FormatStringHelperImpl<std::type_identity<char const*>::type, std::type_identity<char const*>::type, std::type_identity<String>::type, std::type_identity<long&>::type, std::type_identity<String>::type, std::type_identity<char const*>::type, std::type_identity<std::basic_string_view<char, std::char_traits<char>>>::type>, char const*&&, char const*&&, String&&, long&, String&&, char const*&&, std::basic_string_view<char, std::char_traits<char>>&&) @ 0x000000000b255d97 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
3. MemoryTracker::allocImpl(long, bool, MemoryTracker*, double) @ 0x000000000b2551b9 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
4. MemoryTracker::allocImpl(long, bool, MemoryTracker*, double) @ 0x000000000b254d6a in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
5. CurrentMemoryTracker::alloc(long) @ 0x000000000b21b30f in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
6. Allocator<false, false>::realloc(void*, unsigned long, unsigned long, unsigned long) @ 0x000000000b218368 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
7. void DB::PODArrayBase<1ul, 4096ul, Allocator<false, false>, 63ul, 64ul>::resize<>(unsigned long) @ 0x0000000006343dc6 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
8. DB::ColumnString::insertFrom(DB::IColumn const&, unsigned long) @ 0x0000000011c0ae2c in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
9. void DB::(anonymous namespace)::AddedColumns::appendFromBlock<false>(DB::Block const&, unsigned long) @ 0x00000000113d24ca in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
10. DB::PODArray<char8_t, 4096ul, Allocator<false, false>, 63ul, 64ul> DB::(anonymous namespace)::joinRightColumns<(DB::JoinKind)1, (DB::JoinStrictness)3, DB::ColumnsHashing::HashMethodOneNumber<PairNoInit<unsigned long, DB::RowRefList>, DB::RowRefList const, unsigned long, false, true, false>, HashMapTable<unsigned long, HashMapCell<unsigned long, DB::RowRefList, DefaultHash<unsigned long>, HashTableNoState, PairNoInit<unsigned long, DB::RowRefList>>, DefaultHash<unsigned long>, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>>, true, false>(std::vector<DB::ColumnsHashing::HashMethodOneNumber<PairNoInit<unsigned long, DB::RowRefList>, DB::RowRefList const, unsigned long, false, true, false>, std::allocator<DB::ColumnsHashing::HashMethodOneNumber<PairNoInit<unsigned long, DB::RowRefList>, DB::RowRefList const, unsigned long, false, true, false>>>&&, std::vector<HashMapTable<unsigned long, HashMapCell<unsigned long, DB::RowRefList, DefaultHash<unsigned long>, HashTableNoState, PairNoInit<unsigned long, DB::RowRefList>>, DefaultHash<unsigned long>, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>> const*, std::allocator<HashMapTable<unsigned long, HashMapCell<unsigned long, DB::RowRefList, DefaultHash<unsigned long>, HashTableNoState, PairNoInit<unsigned long, DB::RowRefList>>, DefaultHash<unsigned long>, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>> const*>> const&, DB::(anonymous namespace)::AddedColumns&, DB::JoinStuff::JoinUsedFlags&) @ 0x00000000113d602a in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
11. void DB::HashJoin::joinBlockImpl<(DB::JoinKind)1, (DB::JoinStrictness)3, DB::HashJoin::MapsTemplate<DB::RowRefList>>(DB::Block&, DB::Block const&, std::vector<DB::HashJoin::MapsTemplate<DB::RowRefList> const*, std::allocator<DB::HashJoin::MapsTemplate<DB::RowRefList> const*>> const&, bool) const @ 0x0000000011493867 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
12. DB::HashJoin::joinBlock(DB::Block&, std::shared_ptr<DB::ExtraBlock>&) @ 0x00000000113a40e9 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
13. DB::JoiningTransform::readExecute(DB::Chunk&) @ 0x0000000012a32b90 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
14. DB::JoiningTransform::transform(DB::Chunk&) @ 0x0000000012a31905 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
15. DB::JoiningTransform::work() @ 0x0000000012a313aa in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
16. DB::ExecutionThreadContext::executeTask() @ 0x0000000012810d13 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
17. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x0000000012808ad0 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
18. void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreads()::$_0, void ()>>(std::__function::__policy_storage const*) @ 0x000000001280965b in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
19. ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::worker(std::__list_iterator<ThreadFromGlobalPoolImpl<false>, void*>) @ 0x000000000b317483 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
20. ThreadFromGlobalPoolImpl<false>::ThreadFromGlobalPoolImpl<void ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::scheduleImpl<void>(std::function<void ()>, Priority, std::optional<unsigned long>, bool)::'lambda0'()>(void&&)::'lambda'()::operator()() @ 0x000000000b31a5bf in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
21. ThreadPoolImpl<std::thread>::worker(std::__list_iterator<std::thread, void*>) @ 0x000000000b314b07 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
22. void* std::__thread_proxy[abi:v15000]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void ThreadPoolImpl<std::thread>::scheduleImpl<void>(std::function<void ()>, Priority, std::optional<unsigned long>, bool)::'lambda0'()>>(void*) @ 0x000000000b318a2e in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
23. ? @ 0x00007f6db212b609 in ?
24. ? @ 0x00007f6db2050133 in ?

The template method joinRightColumns looks up the matching rows of the right-hand table for each join key, obtains each match's row number, and then calls appendFromBlock(block, row_num) to produce one row of the JOIN result.

appendFromBlock calls IColumn::insertFrom to insert one row at a time. Here lies the problem: the ColumnString content grows row by row, which triggers repeated memory reallocations and copies large amounts of data between buffers along the way.
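
To see how much memory a query actually peaked at, the query log can be consulted (a sketch, assuming query logging is enabled, which it is by default):

-- inspect the peak memory attributed to recent queries (illustrative)
select event_time, type, formatReadableSize(memory_usage) as peak_memory, query
from system.query_log
where type in ('QueryFinish', 'ExceptionWhileProcessing')
order by event_time desc
limit 5;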

Partial merge join algorithm

Before running the test query, set the max_joined_block_size_rows threshold.

set join_algorithm = 'partial_merge';
set max_joined_block_size_rows = 10000; -- a smaller threshold should need less memory.
-- run the test query (first run)
set max_joined_block_size_rows = 1000000; -- a larger threshold may need more memory.
-- run the test query (second run)

The test result:

With the partial merge join algorithm, lowering the max_joined_block_size_rows threshold keeps the query within the memory limit.

When max_joined_block_size_rows is large (the second run above), the query exceeds the memory limit.

The error message:

2024.01.16 17:04:30.084819 [ 3654586 ] {52653888-f266-443a-beba-348c66a88e60} <Error> TCPHandler: Code: 241. DB::Exception: Memory limit (for query) exceeded: would use 33.61 GiB (attempt to allocate chunk of 34359738368 bytes), maximum: 27.94 GiB.: While executing JoiningTransform. (MEMORY_LIMIT_EXCEEDED), Stack trace (when copying this message, always include the lines below):

0. /home/ubuntu/depot/ch-pro/contrib/llvm-project/libcxx/include/exception:134: std::exception::capture() @ 0x000000000b609a42 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
1. /home/ubuntu/depot/ch-pro/contrib/llvm-project/libcxx/include/exception:112: std::exception::exception[abi:v15000]() @ 0x000000000b609a0d in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
2. /home/ubuntu/depot/ch-pro/base/poco/Foundation/src/Exception.cpp:27: Poco::Exception::Exception(String const&, int) @ 0x000000002562ed40 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
3. /home/ubuntu/depot/ch-pro/src/Common/Exception.cpp:96: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x00000000144166ee in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
4. /home/ubuntu/depot/ch-pro/src/Common/Exception.h:73: DB::Exception::Exception(String&&, int, bool) @ 0x000000000b5fba8a in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
5. /home/ubuntu/depot/ch-pro/src/Common/Exception.h:101: DB::Exception::Exception<char const*, char const*, String, long&, String, char const*, std::basic_string_view<char, std::char_traits<char>>>(int, FormatStringHelperImpl<std::type_identity<char const*>::type, std::type_identity<char const*>::type, std::type_identity<String>::type, std::type_identity<long&>::type, std::type_identity<String>::type, std::type_identity<char const*>::type, std::type_identity<std::basic_string_view<char, std::char_traits<char>>>::type>, char const*&&, char const*&&, String&&, long&, String&&, char const*&&, std::basic_string_view<char, std::char_traits<char>>&&) @ 0x0000000014435b3d in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
6. /home/ubuntu/depot/ch-pro/src/Common/MemoryTracker.cpp:339: MemoryTracker::allocImpl(long, bool, MemoryTracker*, double) @ 0x00000000144340b0 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
7. /home/ubuntu/depot/ch-pro/src/Common/MemoryTracker.cpp:397: MemoryTracker::allocImpl(long, bool, MemoryTracker*, double) @ 0x0000000014434402 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
8. /home/ubuntu/depot/ch-pro/src/Common/CurrentMemoryTracker.cpp:58: CurrentMemoryTracker::allocImpl(long, bool) @ 0x00000000143bb7db in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
9. /home/ubuntu/depot/ch-pro/src/Common/CurrentMemoryTracker.cpp:90: CurrentMemoryTracker::alloc(long) @ 0x00000000143bb9e1 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
10. /home/ubuntu/depot/ch-pro/src/Common/Allocator.h:161: Allocator<false, false>::realloc(void*, unsigned long, unsigned long, unsigned long) @ 0x00000000143b5eba in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
11. /home/ubuntu/depot/ch-pro/src/Common/PODArray.h:165: void DB::PODArrayBase<1ul, 4096ul, Allocator<false, false>, 63ul, 64ul>::realloc<>(unsigned long) @ 0x000000000b667142 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
12. /home/ubuntu/depot/ch-pro/src/Common/PODArray.h:0: void DB::PODArrayBase<1ul, 4096ul, Allocator<false, false>, 63ul, 64ul>::resize<>(unsigned long) @ 0x000000000b666ee6 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
13. /home/ubuntu/depot/ch-pro/src/Columns/ColumnString.h:147: DB::ColumnString::insertFrom(DB::IColumn const&, unsigned long) @ 0x000000001f0eaaff in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
14. /home/ubuntu/depot/ch-pro/src/Columns/IColumn.h:179: DB::IColumn::insertManyFrom(DB::IColumn const&, unsigned long, unsigned long) @ 0x000000000b60192b in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
15. /home/ubuntu/depot/ch-pro/src/Interpreters/MergeJoin.cpp:404: DB::(anonymous namespace)::copyRightRange(DB::Block const&, DB::Block const&, std::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn>>>&, unsigned long, unsigned long) @ 0x000000001e999757 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
16. /home/ubuntu/depot/ch-pro/src/Interpreters/MergeJoin.cpp:432: bool DB::(anonymous namespace)::joinEquals<true>(DB::Block const&, DB::Block const&, DB::Block const&, std::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn>>>&, std::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn>>>&, DB::MergeJoinEqualRange&, unsigned long) @ 0x000000001e997d86 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
17. /home/ubuntu/depot/ch-pro/src/Interpreters/MergeJoin.cpp:892: bool DB::MergeJoin::leftJoin<true>(DB::MergeJoinCursor&, DB::Block const&, DB::MergeJoin::RightBlockInfo&, std::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn>>>&, std::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn>>>&, unsigned long&) @ 0x000000001e9b2a7d in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
18. /home/ubuntu/depot/ch-pro/src/Interpreters/MergeJoin.cpp:793: void DB::MergeJoin::joinSortedBlock<false, true>(DB::Block&, std::shared_ptr<DB::ExtraBlock>&) @ 0x000000001e99b738 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
19. /home/ubuntu/depot/ch-pro/src/Interpreters/MergeJoin.cpp:733: DB::MergeJoin::joinBlock(DB::Block&, std::shared_ptr<DB::ExtraBlock>&) @ 0x000000001e997512 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
20. /home/ubuntu/depot/ch-pro/src/Processors/Transforms/JoiningTransform.cpp:206: DB::JoiningTransform::readExecute(DB::Chunk&) @ 0x00000000208ca1e3 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
21. /home/ubuntu/depot/ch-pro/src/Processors/Transforms/JoiningTransform.cpp:191: DB::JoiningTransform::transform(DB::Chunk&) @ 0x00000000208c9f4e in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
22. /home/ubuntu/depot/ch-pro/src/Processors/Transforms/JoiningTransform.cpp:125: DB::JoiningTransform::work() @ 0x00000000208c985d in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
23. /home/ubuntu/depot/ch-pro/src/Processors/Executors/ExecutionThreadContext.cpp:47: DB::executeJob(DB::ExecutingGraph::Node*, DB::ReadProgressCallback*) @ 0x00000000203e5263 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
24. /home/ubuntu/depot/ch-pro/src/Processors/Executors/ExecutionThreadContext.cpp:95: DB::ExecutionThreadContext::executeTask() @ 0x00000000203e4fa0 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
25. /home/ubuntu/depot/ch-pro/src/Processors/Executors/PipelineExecutor.cpp:251: DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x00000000203ca041 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
26. /home/ubuntu/depot/ch-pro/src/Processors/Executors/PipelineExecutor.cpp:217: DB::PipelineExecutor::executeSingleThread(unsigned long) @ 0x00000000203ca357 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
27. /home/ubuntu/depot/ch-pro/src/Processors/Executors/PipelineExecutor.cpp:341: DB::PipelineExecutor::spawnThreads()::$_0::operator()() const @ 0x00000000203cb098 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
28. /home/ubuntu/depot/ch-pro/contrib/llvm-project/libcxx/include/__functional/invoke.h:394: decltype(std::declval<DB::PipelineExecutor::spawnThreads()::$_0&>()()) std::__invoke[abi:v15000]<DB::PipelineExecutor::spawnThreads()::$_0&>(DB::PipelineExecutor::spawnThreads()::$_0&) @ 0x00000000203caff5 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
29. /home/ubuntu/depot/ch-pro/contrib/llvm-project/libcxx/include/__functional/invoke.h:480: void std::__invoke_void_return_wrapper<void, true>::__call<DB::PipelineExecutor::spawnThreads()::$_0&>(DB::PipelineExecutor::spawnThreads()::$_0&) @ 0x00000000203cafd5 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
30. /home/ubuntu/depot/ch-pro/contrib/llvm-project/libcxx/include/__functional/function.h:235: std::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreads()::$_0, void ()>::operator()[abi:v15000]() @ 0x00000000203cafb5 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
31. /home/ubuntu/depot/ch-pro/contrib/llvm-project/libcxx/include/__functional/function.h:716: void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreads()::$_0, void ()>>(std::__function::__policy_storage const*) @ 0x00000000203caf80 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse

MergeJoin::leftJoin implements the LEFT JOIN for the partial merge join algorithm. It eventually calls the copyRightRange method, which copies the matched content of the right-hand table. During this copy IColumn::insertFrom is called, which reallocates memory via realloc; this is the point where the memory-limit exception can be thrown.
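
A rough estimate shows why the joined block size matters here: with the test data's roughly 500 KiB per value cell, one joined block needs about max_joined_block_size_rows times 500 KiB just for the copied right-hand value column (the figures below are illustrative, not measurements):

-- back-of-the-envelope estimate of the right-hand `value` column in one joined block
select
formatReadableSize(10000 * 500 * 1024) as block_of_10k_rows,   -- ~4.77 GiB
formatReadableSize(1000000 * 500 * 1024) as block_of_1m_rows;  -- ~476.84 GiB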

LEFT ANY JOIN

LEFT ANY JOIN takes a different code path, and that code path ignores the max_joined_block_size_rows setting. As a result, LEFT ANY JOIN still hits the memory-limit-exceeded exception even when max_joined_block_size_rows is lowered.
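
For illustration, the ANY variant of the test query looks like this (a sketch; per the behavior described above, it can still fail even with the small threshold):

set join_algorithm = 'partial_merge';
set max_joined_block_size_rows = 10000; -- ignored on this code path

select *
from `big_columns_join`.`big_columns_table` as t1
left any join `big_columns_join`.`dim_table1` as t2 on t1.fk = t2.id
format Null;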

The memory problem with "auto"

According to the documentation, setting join_algorithm = 'auto' enables automatic switching between JOIN algorithms: ClickHouse first tries the hash join algorithm and switches to partial merge join once memory exceeds a threshold. There are two thresholds, max_bytes_in_join and max_rows_in_join, which cap the memory footprint in bytes and the number of rows of the hash table built by the hash join; crossing either one counts as exceeding the threshold.

In practice, however, the memory consumed by the abandoned hash join attempt is not actually freed, so after the switch to partial merge join the query still exceeds the memory limit in cases where it otherwise would not. Consider the following example:

SELECT
t1.id AS id,
t2.value AS value
FROM big_columns_join.big_columns_table AS t1
LEFT JOIN big_columns_join.dim_table1 AS t2 ON t1.fk = t2.id
SETTINGS join_algorithm = 'auto', max_bytes_in_join = '15G'
FORMAT `Null`;

The log output is as follows:

2024.01.19 11:31:40.580304 [ 3654586 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> JoiningTransform: Before join block: 'id UInt64 UInt64(size = 0), fk UInt64 UInt64(size = 0)'
2024.01.19 11:31:40.580692 [ 3654586 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> JoiningTransform: After join block: 'id UInt64 UInt64(size = 0), fk UInt64 UInt64(size = 0), value String String(size = 0)'
2024.01.19 11:31:40.680382 [ 3654935 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 1.01 GiB.
2024.01.19 11:31:40.753297 [ 3654852 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 2.01 GiB.
2024.01.19 11:31:40.833103 [ 3654862 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 3.01 GiB.
2024.01.19 11:31:40.905190 [ 3654854 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 4.01 GiB.
2024.01.19 11:31:41.047932 [ 3654900 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 6.01 GiB.
2024.01.19 11:31:41.117219 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 7.00 GiB.
2024.01.19 11:31:41.198586 [ 3654851 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 8.01 GiB.
2024.01.19 11:31:41.272147 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 9.00 GiB.
2024.01.19 11:31:41.342679 [ 3654935 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 10.00 GiB.
2024.01.19 11:31:41.416512 [ 3654851 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 11.01 GiB.
2024.01.19 11:31:41.486753 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 12.01 GiB.
2024.01.19 11:31:41.560137 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 13.00 GiB.
2024.01.19 11:31:41.630755 [ 3654862 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 14.01 GiB.
2024.01.19 11:31:41.633781 [ 3654935 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Trace> HashJoin: (0x7f8299383918) Join data is being released, 15001390080 bytes and 8940 rows in hash table
2024.01.19 11:31:41.666781 [ 3654935 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MergeJoin: Joining keys: left [fk], right [t2.id]
2024.01.19 11:31:41.666943 [ 3654935 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Trace> HashJoin: (0x7f8299383918) Join data has been already released
2024.01.19 11:31:41.764950 [ 3654852 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 15.00 GiB.
2024.01.19 11:31:42.024590 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 16.14 GiB.
2024.01.19 11:31:42.221853 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 17.64 GiB.
2024.01.19 11:31:42.481914 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 19.64 GiB.
2024.01.19 11:31:43.001865 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 23.64 GiB.
2024.01.19 11:31:44.610495 [ 3654586 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Error> executeQuery: Code: 241. DB::Exception: Memory limit (for query) exceeded: would use 31.64 GiB (attempt to allocate chunk of 17179869184 bytes), maximum: 27.94 GiB.: While executing MergeSortingTransform: While executing FillingRightJoinSide. (MEMORY_LIMIT_EXCEEDED) (version 23.8.3.1.alex-build-debug) (from [::ffff:127.0.0.1]:33076) (in query: select t1.id as id, t2.value as `value` from `big_columns_join`.`big_columns_table` as t1 left join `big_columns_join`.`dim_table1` as t2 on t1.fk = t2.id settings join_algorithm='auto',max_bytes_in_join=15000000000 format Null;), Stack trace (when copying this message, always include the lines below):

The log shows that hash join runs first and memory usage climbs all the way to 15 GiB, at which point ClickHouse switches to partial merge join (log line 17); it also reports that the 15 GiB used by the hash join is being released (log line 16) and, later, that it has already been released (log line 18). Yet the tracked memory usage does not actually drop; it keeps growing from the 15 GiB mark. With little memory left for the partial merge join, the query soon fails with the MEMORY_LIMIT_EXCEEDED exception.

The only workaround found so far is to lower the threshold, but that forces queries that could have finished with hash join to fall back to the slower partial merge join, hurting overall performance.
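
As a sketch of that workaround (the '1G' value is illustrative, not a recommendation): lower max_bytes_in_join so the fallback happens before hash join has consumed most of the memory budget, and keep the joined block small for the merge phase.

select t1.id as id, t2.value as value
from `big_columns_join`.`big_columns_table` as t1
left join `big_columns_join`.`dim_table1` as t2 on t1.fk = t2.id
settings join_algorithm = 'auto', max_bytes_in_join = '1G', max_joined_block_size_rows = 10000
format Null;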
