【clickhouse】Clickhouse查询之ArrayJOIN

【clickhouse】Clickhouse查询之ArrayJOIN

2023年6月27日发(作者:)

【clickhouse】Clickhouse查询之ArrayJOIN⽂章⽬录1.概述转载:Array JOIN ⼦句允许在数据表的内部,与数组或者嵌套的字段进⾏JOIN操作,从⽽将⼀⾏数据变多⾏。适⽤于⾏转列操作。CREATE TABLE ( `province` String, `arr_city` Array(String), `arr_rank` Array(UInt8))ENGINE = Log

Ok.

0 rows in set. Elapsed: 0.012 sec.

insert into city values('hubei',['wuhan','xiangyang'],[1,2]),('guangdong',['guangzhou','shenzhen','zhuhai'],[1,2,3]);

insert into city values('beijing',[],[10]),('shanghai',[],[20]);

查看原始数据:Clickhouse> select * from city;

SELECT *FROM city

┌─province──┬─arr_city──────────────────────────┬─arr_rank─┐│ hubei │ ['wuhan','xiangyang'] │ [1,2] ││ guangdong │ ['guangzhou','shenzhen','zhuhai'] │ [1,2,3] ││ beijing │ [] │ [10] ││ shanghai │ [] │ [20] ││ hongkong │ [] │ [] │└───────────┴───────────────────────────────────┴──────────┘

5 rows in set. Elapsed: 0.003 sec.

ARRAY JOIN:

Clickhouse> select province,city from city array join arr_city as city;

SELECT

province, cityFROM cityARRAY JOIN arr_city AS city

┌─province──┬─city──────┐│ hubei │ wuhan ││ hubei │ xiangyang ││ guangdong │ guangzhou ││ guangdong │ shenzhen ││ guangdong │ zhuhai │└───────────┴───────────┘

5 rows in set. Elapsed: 0.002 sec.

Clickhouse> select province,arr_city,city from city array join arr_city as city;

SELECT

province, arr_city, cityFROM cityARRAY JOIN arr_city AS city

┌─province──┬─arr_city──────────────────────────┬─city──────┐│ hubei │ ['wuhan','xiangyang'] │ wuhan ││ hubei │ ['wuhan','xiangyang'] │ xiangyang ││ guangdong │ ['guangzhou','shenzhen','zhuhai'] │ guangzhou ││ guangdong │ ['guangzhou','shenzhen','zhuhai'] │ shenzhen ││ guangdong │ ['guangzhou','shenzhen','zhuhai'] │ zhuhai │└───────────┴───────────────────────────────────┴───────────┘

5 rows in set. Elapsed: 0.002 sec.

2. LEFT Array JOIN :Clickhouse> select province,arr_city,city from city LEFT array join arr_city as city FORMAT PrettyCompactMonoBlock;

SELECT

province, arr_city, cityFROM cityLEFT ARRAY JOIN arr_city AS cityFORMAT PrettyCompactMonoBlock

┌─province──┬─arr_city──────────────────────────┬─city──────┐│ hubei │ ['wuhan','xiangyang'] │ wuhan ││ hubei │ ['wuhan','xiangyang'] │ xiangyang ││ guangdong │ ['guangzhou','shenzhen','zhuhai'] │ guangzhou ││ guangdong │ ['guangzhou','shenzhen','zhuhai'] │ shenzhen ││ guangdong │ ['guangzhou','shenzhen','zhuhai'] │ zhuhai ││ beijing │ [] │ ││ shanghai │ [] │ ││ hongkong │ [] │ │└───────────┴───────────────────────────────────┴───────────┘

8 rows in set. Elapsed: 0.010 sec. .

当同时对数组字段进⾏ARRAY JOIN 操作时候,查询的计算逻辑是按⾏合并⽽不是产⽣笛卡尔积: Clickhouse> select province,arr_city,arr_rank,v ,arrayMap(x->x*x,arr_rank) mapv,v1 from city c left array join arr_rank as v,mapv as v1 FORMAT PrettyCompactMonoBlock;

SELECT

province,

arr_city,

arr_rank,

v,

arrayMap(x -> (x * x), arr_rank) AS mapv,

v1FROM city AS cLEFT ARRAY JOIN

arr_rank AS v,

mapv AS v1FORMAT PrettyCompactMonoBlock

┌─province──┬─arr_city──────────────────────────┬─arr_rank─┬──v─┬─mapv────┬──v1─┐│ hubei │ ['wuhan','xiangyang'] │ [1,2] │ 1 │ [1,4] │ 1 ││ hubei │ ['wuhan','xiangyang'] │ [1,2] │ 2 │ [1,4] │ 4 ││ guangdong │ ['guangzhou','shenzhen','zhuhai'] │ [1,2,3] │ 1 │ [1,4,9] │ 1 ││ guangdong │ ['guangzhou','shenzhen','zhuhai'] │ [1,2,3] │ 2 │ [1,4,9] │ 4 ││ guangdong │ ['guangzhou','shenzhen','zhuhai'] │ [1,2,3] │ 3 │ [1,4,9] │ 9 ││ beijing │ [] │ [10] │ 10 │ [100] │ 100 ││ shanghai │ [] │ [20] │ 20 │ [400] │ 400 ││ hongkong │ [] │ [] │ 0 │ [] │ 0 │└───────────┴───────────────────────────────────┴──────────┴────┴─────────┴─────┘

8 rows in set. Elapsed: 0.004 sec.

---

Clickhouse> SELECT province , arr_rank, a, num, mapped:-] FROM city:-] ARRAY JOIN arr_rank AS a, arrayEnumerate(arr_rank) AS num, arrayMap(x -> x + 1, arr_rank) AS mapped;

SELECT

province,

arr_rank,

a,

num,

mappedFROM cityARRAY JOIN

arr_rank AS a,

arrayEnumerate(arr_rank) AS num,

arrayMap(x -> (x + 1), arr_rank) AS mapped

┌─province──┬─arr_rank─┬──a─┬─num─┬─mapped─┐│ hubei │ [1,2] │ 1 │ 1 │ 2 ││ hubei │ [1,2] │ 2 │ 2 │ 3 ││ guangdong │ [1,2,3] │ 1 │ 1 │ 2 ││ guangdong │ [1,2,3] │ 2 │ 2 │ 3 ││ guangdong │ [1,2,3] │ 3 │ 3 │ 4 ││ beijing │ [10] │ 10 │ 1 │ 11 ││ shanghai │ [20] │ 20 │ 1 │ 21 │└───────────┴──────────┴────┴─────┴────────┘

7 rows in set. Elapsed: 0.005 sec.

arr_rank 和mapped 数组并没有产⽣笛卡尔积。嵌套数据类型本质是数组,ARRAY JOIN 也⽀持嵌套数据类型。嵌套表创建:Clickhouse> create table t_nest(province String, addr Nested(no UInt8,city String,area String)) engine=MergeTree() order by province;

CREATE TABLE t_nest( `province` String, `addr` Nested( no UInt8, city String, area String))ENGINE = MergeTree()ORDER BY province

Clickhouse> desc t_nest;

DESCRIBE TABLE t_nest

┌─name──────┬─type──────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐│ province │ String │ │ │ │ │ ││ │ Array(UInt8) │ │ │ │ │ ││ │ Array(String) │ │ │ │ │ ││ │ Array(String) │ │ │ │ │ │└───────────┴───────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘

4 rows in set. Elapsed: 0.003 sec.

Clickhouse> insert into t_nest values('Hubei',[1,2,3],['wuhan','wuhan','wuhan'],['wuchang','hankou','hanyang']);

INSERT INTO t_nest VALUES

Ok.

1 rows in set. Elapsed: 0.005 sec.

Clickhouse> insert into t_nest values('Guangdong',[10,20,30],['guangzhou','shenzhen','shenzhen'],['baiyun','nanshan','baoan']);

INSERT INTO t_nest VALUES

Ok.

1 rows in set. Elapsed: 0.002 sec.

Clickhouse> insert into t_nest values('Shanghai',[],[],[]);

INSERT INTO t_nest VALUES

ouse> select * from t_nest FORMAT PrettyCompactMonoBlock;

SELECT *FROM t_nestFORMAT PrettyCompactMonoBlock

┌─province──┬─────┬────────────────────────────┬───────────────────────┐│ Guangdong │ [10,20,30] │ ['guangzhou','shenzhen','shenzhen'] │ ['baiyun','nanshan','baoan'] ││ Hubei │ [1,2,3] │ ['wuhan','wuhan','wuhan'] │ ['wuchang','hankou','hanyang'] ││ Shanghai │ [] │ [] │ [] │└───────────┴────────────┴─────────────────────────────────────┴────────────────────────────────┘

3 rows in set. Elapsed: 0.003 sec.

对嵌套类型数据的访问 : JOIN 可以直接使⽤字段列名:Clickhouse> select province,,, from t_nest array join addr FORMAT PrettyCompactMonoBlock;

SELECT

province, , , OM t_nestARRAY JOIN addrFORMAT PrettyCompactMonoBlock

┌─province──┬──┬──┬──┐│ Hubei │ 1 │ wuhan │ wuchang ││ Hubei │ 2 │ wuhan │ hankou ││ Hubei │ 3 │ wuhan │ hanyang ││ Guangdong │ 10 │ guangzhou │ baiyun ││ Guangdong │ 20 │ shenzhen │ nanshan ││ Guangdong │ 30 │ shenzhen │ baoan │└───────────┴─────────┴───────────┴───────────┘

6 rows in set. Elapsed: 0.005 sec.

2.也可以使⽤点访问符号的形式:Clickhouse> select province,,, from t_nest array join ,, FORMAT PrettyCompactMonoBlock;

SELECT

province, , , OM t_nestARRAY JOIN

, , RMAT PrettyCompactMonoBlock

┌─province──┬──┬──┬──┐│ Hubei │ 1 │ wuhan │ wuchang ││ Hubei │ 2 │ wuhan │ hankou ││ Hubei │ 3 │ wuhan │ hanyang ││ Guangdong │ 10 │ guangzhou │ baiyun ││ Guangdong │ 20 │ shenzhen │ nanshan ││ Guangdong │ 30 │ shenzhen │ baoan │└───────────┴─────────┴───────────┴───────────┘

6 rows in set. Elapsed: 0.004 sec.

可以看到少了⼀条记录:Clickhouse> select province,,, from t_nest left array join ,, FORMAT PrettyCompactMonoBlock;

SELECT

province, , , OM t_nestLEFT ARRAY JOIN

, , RMAT PrettyCompactMonoBlock

┌─province──┬──┬──┬──┐│ Shanghai │ 0 │ │ ││ Guangdong │ 10 │ guangzhou │ baiyun ││ Guangdong │ 20 │ shenzhen │ nanshan ││ Guangdong │ 30 │ shenzhen │ baoan ││ Hubei │ 1 │ wuhan │ wuchang ││ Hubei │ 2 │ wuhan │ hankou ││ Hubei │ 3 │ wuhan │ hanyang │└───────────┴─────────┴───────────┴───────────┘

7 rows in set. Elapsed: 0.003 sec.

3.嵌套类型也⽀持ARRAY JOIN部分嵌套字段:Clickhouse> select province,,, from t_nest left array join FORMAT PrettyCompactMonoBlock;

SELECT

province, , , OM t_nestLEFT ARRAY JOIN AT PrettyCompactMonoBlock

┌─province──┬──┬────────────────────────────┬───────────────────────┐│ Shanghai │ 0 │ [] │ [] ││ Hubei │ 1 │ ['wuhan','wuhan','wuhan'] │ ['wuchang','hankou','hanyang'] ││ Hubei │ 2 │ ['wuhan','wuhan','wuhan'] │ ['wuchang','hankou','hanyang'] ││ Hubei │ 3 │ ['wuhan','wuhan','wuhan'] │ ['wuchang','hankou','hanyang'] ││ Guangdong │ 10 │ ['guangzhou','shenzhen','shenzhen'] │ ['baiyun','nanshan','baoan'] ││ Guangdong │ 20 │ ['guangzhou','shenzhen','shenzhen'] │ ['baiyun','nanshan','baoan'] ││ Guangdong │ 30 │ ['guangzhou','shenzhen','shenzhen'] │ ['baiyun','nanshan','baoan'] │└───────────┴─────────┴─────────────────────────────────────┴────────────────────────────────┘

7 rows in set. Elapsed: 0.002 sec.

在此种情形下只有array join的数组才会被展开。4.在嵌套类型时候可以通过别名的⽅式访问原始数组:Clickhouse> select province,,,,,, from t_nest left array join addr as a FORMAT PrettyCompactMonoBlock;

SELECT

province, , , , , , OM t_nestLEFT ARRAY JOIN addr AS aFORMAT PrettyCompactMonoBlock

┌─province──┬─────┬────────────────────────────┬───────────────────────┬──┬─────┬───┐│ Guangdong │ [10,20,30] │ ['guangzhou','shenzhen','shenzhen'] │ ['baiyun','nanshan','baoan'] │ 10 │ guangzhou │ baiyun ││ Guangdong │ [10,20,30] │ ['guangzhou','shenzhen','shenzhen'] │ ['baiyun','nanshan','baoan'] │ 20 │ shenzhen │ nanshan ││ Guangdong │ [10,20,30] │ ['guangzhou','shenzhen','shenzhen'] │ ['baiyun','nanshan','baoan'] │ 30 │ shenzhen │ baoan ││ Shanghai │ [] │ [] │ [] │ 0 │ │ ││ Hubei │ [1,2,3] │ ['wuhan','wuhan','wuhan'] │ ['wuchang','hankou','hanyang'] │ 1 │ wuhan │ wuchang ││ Hubei │ [1,2,3] │ ['wuhan','wuhan','wuhan'] │ ['wuchang','hankou','hanyang'] │ 2 │ wuhan │ hankou ││ Hubei │ [1,2,3] │ ['wuhan','wuhan','wuhan'] │ ['wuchang','hankou','hanyang'] │ 3 │ wuhan │ hanyang │└───────────┴────────────┴─────────────────────────────────────┴────────────────────────────────┴──────┴───────────┴─────────┘

7 rows in set. Elapsed: 0.004 sec.

发布者:admin,转转请注明出处:http://www.yc00.com/web/1687816451a47650.html

相关推荐

发表回复

评论列表(0条)

  • 暂无评论

联系我们

400-800-8888

在线咨询: QQ交谈

邮件:admin@example.com

工作时间:周一至周五,9:30-18:30,节假日休息

关注微信