Classic Spark SQL / Hive SQL / MySQL Interview Practice Questions (Part 1)


Published June 23, 2023

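Most solutions read from the author's test tables (for example, test_3 in Question 1), each holding the data described in that question's requirement. Where a table name did not survive in the source, the table named in the requirement is used.

Question 1

Requirement: A table order has the fields date_time, order_id, user_id, amount. Sample row: 2020-10-10,1003003981,00000001,1000. Use SQL to compute:
(1) For each month of 2019: the number of orders, the number of users, and the total transaction amount.
(2) The number of new customers in October 2020 (users whose first order was placed in October 2020).

Solution:

(1)
    SELECT t1.year_month,
           count(t1.order_id) AS order_cnt,
           count(DISTINCT t1.user_id) AS user_cnt,
           sum(t1.amount) AS total_amount
    FROM (SELECT order_id, user_id, amount,
                 date_format(date_time, 'yyyy-MM') AS year_month
          FROM test_3
          WHERE date_format(date_time, 'yyyy') = '2019') t1
    GROUP BY t1.year_month;

(2) The inner query keeps the users whose earliest order falls in October 2020; the outer query counts them. Without the outer count, the query would return one row per new user rather than a single total.
    SELECT count(*) AS new_user_cnt
    FROM (SELECT user_id
          FROM test_3
          GROUP BY user_id
          HAVING date_format(min(date_time), 'yyyy-MM') = '2020-10') t;

Question 2

Requirement: The data below records customer visits to shops. The visit log is stored in the table user_visit; user_id is the visitor's id and shop_name is the name of the shop visited. Data:

+-------+-----------+
|user_id|  shop_name|
+-------+-----------+
|     u1|beautiful_a|
|     u2|beautiful_b|
|     u1|beautiful_b|
|     u3|beautiful_c|
|     u4|beautiful_b|
|     u1|beautiful_a|
|     u5|beautiful_b|
|     u4|beautiful_b|
|     u6|beautiful_c|
|     u1|beautiful_b|
|     u2|beautiful_a|
|     u5|beautiful_a|
+-------+-----------+

Solution:

(1) Unique visitors (UV) per shop:
    SELECT shop_name, count(*) AS uv
    FROM (SELECT user_id, shop_name
          FROM user_visit
          GROUP BY user_id, shop_name) t
    GROUP BY shop_name;

(2) The three most frequent visitors of each shop:
    SELECT t2.shop_name, t2.user_id, t2.cnt
    FROM (SELECT t1.*,
                 row_number() OVER (PARTITION BY t1.shop_name ORDER BY t1.cnt DESC) AS rn
          FROM (SELECT user_id, shop_name, count(*) AS cnt
                FROM user_visit
                GROUP BY user_id, shop_name) t1) t2
    WHERE rn < 4;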
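As a cross-check on the two Question 2 queries, the same logic can be sketched in plain Python over the sample rows. This is only an illustration, not part of the original answer: the names VISITS, shop_uv and top_visitors are hypothetical, and ties are broken by user_id here, whereas row_number() in SQL breaks ties in an unspecified order.

```python
from collections import Counter, defaultdict

# (user_id, shop_name) rows copied from the Question 2 sample data.
VISITS = [
    ("u1", "beautiful_a"), ("u2", "beautiful_b"), ("u1", "beautiful_b"),
    ("u3", "beautiful_c"), ("u4", "beautiful_b"), ("u1", "beautiful_a"),
    ("u5", "beautiful_b"), ("u4", "beautiful_b"), ("u6", "beautiful_c"),
    ("u1", "beautiful_b"), ("u2", "beautiful_a"), ("u5", "beautiful_a"),
]


def shop_uv(visits):
    """Distinct visitors per shop: deduplicate (user, shop) pairs, then count per shop."""
    uv = Counter()
    for _user, shop in set(visits):
        uv[shop] += 1
    return dict(uv)


def top_visitors(visits, n=3):
    """Top-n visitors per shop by visit count, mirroring row_number() <= n."""
    counts = Counter(visits)  # (user, shop) -> number of visits
    per_shop = defaultdict(list)
    for (user, shop), cnt in counts.items():
        per_shop[shop].append((user, cnt))
    # Assumption: ties are broken by user_id so the result is deterministic.
    return {
        shop: sorted(rows, key=lambda r: (-r[1], r[0]))[:n]
        for shop, rows in per_shop.items()
    }


if __name__ == "__main__":
    print(shop_uv(VISITS))
    print(top_visitors(VISITS))
```

On the sample data, beautiful_b has four distinct visitors, so the top-3 cut actually drops a row there, which makes it a useful case for checking the rn < 4 filter.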

Question 3

Requirement: Given the following user visit data:

+-------+----------+-----------+
|user_id|visit_date|visit_count|
+-------+----------+-----------+
|    u01| 2017/1/21|          5|
|    u02| 2017/1/23|          6|
|    u03| 2017/1/22|          8|
|    u04| 2017/1/20|          3|
|    u01| 2017/1/23|          6|
|    u01| 2017/2/21|          8|
|    u02| 2017/1/23|          6|
|    u01| 2017/2/22|          4|
+-------+----------+-----------+

Use SQL to compute each user's cumulative visit count, as shown below:

+-------+-----------+---------------------+---------------+
|user_id|visit_month|month_total_visit_cnt|total_visit_cnt|
+-------+-----------+---------------------+---------------+
|    u01|    2017-01|                   11|             11|
|    u01|    2017-02|                   12|             23|
|    u02|    2017-01|                   12|             12|
|    u03|    2017-01|                    8|              8|
|    u04|    2017-01|                    3|              3|
+-------+-----------+---------------------+---------------+

Solution:

    SELECT t2.user_id, t2.visit_month, month_total_visit_cnt,
           sum(month_total_visit_cnt) OVER (PARTITION BY user_id ORDER BY visit_month) AS total_visit_cnt
    FROM (SELECT user_id, visit_month,
                 sum(visit_count) AS month_total_visit_cnt
          FROM (SELECT user_id,
                       date_format(regexp_replace(visit_date, '/', '-'), 'yyyy-MM') AS visit_month,
                       visit_count
                FROM test_1) t1
          GROUP BY user_id, visit_month) t2
    ORDER BY t2.user_id, t2.visit_month;

Question 4

Requirement: Table user (user_id, name, age) stores user information, and table view_record (user_id, movie_name) stores viewing records. Rank the age groups (one group per 10 years, with everyone over 70 in a single group) by the number of movies watched.

Solution: The requirement asks for a ranking by view count, so the result is ordered by view_cnt rather than by the age group label.

    SELECT t2.age_group, sum(t1.cnt) AS view_cnt
    FROM (SELECT user_id, count(*) AS cnt
          FROM view_record
          GROUP BY user_id) t1
    JOIN (SELECT user_id,
                 CASE WHEN age > 0 AND age <= 10 THEN '0-10'
                      WHEN age > 10 AND age <= 20 THEN '10-20'
                      WHEN age > 20 AND age <= 30 THEN '20-30'
                      WHEN age > 30 AND age <= 40 THEN '30-40'
                      WHEN age > 40 AND age <= 50 THEN '40-50'
                      WHEN age > 50 AND age <= 60 THEN '50-60'
                      WHEN age > 60 AND age <= 70 THEN '60-70'
                      ELSE '70+' END AS age_group
          FROM `user`) t2
      ON t1.user_id = t2.user_id
    GROUP BY t2.age_group
    ORDER BY view_cnt DESC;

Question 5

Requirement: Given the log below, use SQL to find the total number and average age of all users and of active users. (An active user is one with visit records on two consecutive days.) The columns are date, user, and age.

+----------+-------+---+
| date_time|user_id|age|
+----------+-------+---+
|2019-02-12|      2| 19|
|2019-02-11|      1| 23|
|2019-02-11|      3| 39|
|2019-02-11|      1| 23|
|2019-02-11|      3| 39|
|2019-02-13|      1| 23|
|2019-02-15|      2| 19|
|2019-02-11|      2| 19|
|2019-02-11|      1| 23|
|2019-02-16|      2| 19|
+----------+-------+---+

Solution: The innermost query deduplicates to one row per user per day and numbers each user's days in date order. Subtracting that row number from the date yields the same flag for every day in a consecutive run, so a (user_id, flag) group with at least two rows marks an active user. The two halves of the UNION ALL fill complementary columns with zeros, and the outer sum() merges them into one row.

    SELECT sum(total_user_cnt) AS total_user_cnt,
           sum(total_user_avg_age) AS total_user_avg_age,
           sum(two_days_cnt) AS two_days_cnt,
           sum(avg_age) AS avg_age
    FROM (SELECT 0 AS total_user_cnt,
                 0 AS total_user_avg_age,
                 count(*) AS two_days_cnt,
                 cast(sum(age) / count(*) AS decimal(5,2)) AS avg_age
          FROM (SELECT user_id, max(age) AS age
                FROM (SELECT user_id, max(age) AS age
                      FROM (SELECT user_id, age,
                                   date_sub(date_time, rn) AS flag
                            FROM (SELECT date_time, user_id, max(age) AS age,
                                         row_number() OVER (PARTITION BY user_id ORDER BY date_time) AS rn
                                  FROM test_5
                                  GROUP BY date_time, user_id) t1) t2
                      GROUP BY user_id, flag
                      HAVING count(*) >= 2) t3
                GROUP BY user_id) t4
          UNION ALL
          SELECT count(*) AS total_user_cnt,
                 cast(sum(age) / count(*) AS decimal(5,2)) AS total_user_avg_age,
                 0 AS two_days_cnt,
                 0 AS avg_age
          FROM (SELECT user_id, max(age) AS age
                FROM test_5
                GROUP BY user_id) t5) t6;

Question 6

Requirement: Write SQL that returns, for every user, the amount of their first purchase in October 2020. Table order has the fields: buyer user_id, amount money, purchase time pay_time (format: 2017-10-01), and order id order_id.

Solution:

    SELECT user_id, pay_time, money, order_id
    FROM (SELECT user_id, money, pay_time, order_id,
                 row_number() OVER (PARTITION BY user_id ORDER BY pay_time) AS rn
          FROM `order`
          WHERE date_format(pay_time, 'yyyy-MM') = '2020-10') t
    WHERE rn = 1;

Question 7

Requirement: Given the account table below, write SQL to find the top 3 accounts by money (the gold_coin column) in each zone.

    dist_id   string  'zone id'
    account   string  'account'
    gold_coin int     'gold coins'

Solution:

    SELECT dist_id, account, gold_coin
    FROM (SELECT dist_id, account, gold_coin,
                 row_number() OVER (PARTITION BY dist_id ORDER BY gold_coin DESC) AS rn
          FROM test_9) t
    WHERE rn <= 3;

Question 8

Requirement: The recharge log table credit_log has the following fields:

    `dist_id`     int     'zone id'
    `account`     string  'account'
    `money`       int     'recharge amount'
    `create_time` string  'order time'

Write SQL to find, for August 8, 2020, the account with the largest recharge amount in each zone. Required result columns: zone id, account, amount, recharge time.

Solution: The query ranks accounts by their total recharge for the day. Because the amount is an aggregate over the day, no single recharge time is returned.

    WITH temp AS (
        SELECT dist_id, account, sum(`money`) AS sum_money
        FROM test_8
        WHERE date_format(create_time, 'yyyy-MM-dd') = '2020-08-08'
        GROUP BY dist_id, account
    )
    SELECT t1.dist_id, t1.account, t1.sum_money
    FROM (SELECT temp.dist_id, temp.account, temp.sum_money,
                 rank() OVER (PARTITION BY temp.dist_id ORDER BY temp.sum_money DESC) AS ranks
          FROM temp) t1
    WHERE ranks = 1;

Question 9

Requirement: An online server access log has the following format (answer in SQL). The columns are time, interface, and IP.

+-------------------+--------------------+------------+
|date_time          |interface           |ip          |
+-------------------+--------------------+------------+
|2016-11-09 15:22:05|/request/user/logout| 110.32.5.23|
|2020-09-28 14:23:1 |/api_v1/user/detail | 57.2.1.16  |
|2020-09-28 14:59:40|/api_v2/read/buy    | 172.6.5.166|
+-------------------+--------------------+------------+

Find the top 10 IP addresses that accessed the /api_v1/user/detail interface during the 14:00 hour (14:00 to 15:00) on September 28, 2020.

Solution:

    SELECT ip, count(*) AS cnt
    FROM test_7
    WHERE date_format(date_time, 'yyyy-MM-dd HH') >= '2020-09-28 14'
      AND date_format(date_time, 'yyyy-MM-dd HH') < '2020-09-28 15'
      AND interface = '/api_v1/user/detail'
    GROUP BY ip
    ORDER BY cnt DESC
    LIMIT 10;
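Of the solutions above, the one for Question 5 relies on the least obvious step: subtracting a per-user row number from the visit date so that consecutive days share the same value. The Python sketch below replays that step on the Question 5 sample log. It is an illustration under stated assumptions rather than the author's method: the names LOG and user_stats are hypothetical, and a user's age is taken as the maximum seen, mirroring max(age) in the SQL.

```python
from datetime import date, timedelta

# (date_time, user_id, age) rows copied from the Question 5 sample log.
LOG = [
    ("2019-02-12", 2, 19), ("2019-02-11", 1, 23), ("2019-02-11", 3, 39),
    ("2019-02-11", 1, 23), ("2019-02-11", 3, 39), ("2019-02-13", 1, 23),
    ("2019-02-15", 2, 19), ("2019-02-11", 2, 19), ("2019-02-11", 1, 23),
    ("2019-02-16", 2, 19),
]


def user_stats(log):
    """Count and average age for all users and for users active on 2+ consecutive days."""
    ages = {}  # user_id -> age (max seen, like max(age) in the SQL)
    days = {}  # user_id -> set of distinct visit dates
    for day, user, age in log:
        ages[user] = max(ages.get(user, age), age)
        days.setdefault(user, set()).add(date.fromisoformat(day))

    active = set()
    for user, visited in days.items():
        runs = {}  # flag -> number of days in that consecutive run
        for rn, day in enumerate(sorted(visited), start=1):
            # Same idea as date_sub(date_time, rn): consecutive days map to one flag.
            flag = day - timedelta(days=rn)
            runs[flag] = runs.get(flag, 0) + 1
        if max(runs.values()) >= 2:
            active.add(user)

    def avg_age(users):
        return round(sum(ages[u] for u in users) / len(users), 2) if users else 0.0

    return {
        "total_user_cnt": len(ages),
        "total_user_avg_age": avg_age(ages),
        "two_days_cnt": len(active),
        "avg_age": avg_age(active),
    }


if __name__ == "__main__":
    print(user_stats(LOG))
```

On the sample log only user 2 qualifies as active (February 11 and 12, then 15 and 16); user 1 visited on February 11 and 13, which are not adjacent.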

Posted by admin. Please credit the source when reposting: http://www.yc00.com/web/1687518229a16422.html
