0%

Hadoop-20-Hive案例实战

第一题

现有数据score_info.txt文件,文件内容如下所示:

1
2
3
4
5
6
7
8
9
1001    01 90
1001 02 90
1001 03 90
1002 01 85
1002 02 85
1002 03 70
1003 01 70
1003 02 70
1003 03 85

编写HiveQL代码,完成以下问题:

  1. 创建score_info表,字段名分别为uidsubject_idscore

    1
    2
    3
    4
    5
    6
    7
    8
    create table if not exists score_info(
    uid string,
    subject_id string,
    score int
    )
    row format delimited
    fields terminated by '\t'
    lines terminated by '\n';
  2. 将数据导入score_info表。

    1
    2
    3
    load data local 
    inpath '/opt/module/hive-3.1.2/datas/score_info.txt'
    overwrite into table score_info;
  3. 求出每门学科的平均成绩。

    1
    2
    3
    4
    5
    6
    select uid,
    score,
    subject_id,
    avg(score) over (partition by subject_id) avg_score
    from
    score_info
  4. 找出所有科目成绩都大于某个学科平均成绩的学生。

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    select
    t2.uid
    from
    (
    select
    t1.uid,
    if(score > avg_score, 0, 1) flag
    from(
    select uid,
    score,
    subject_id,
    avg(score) over (partition by subject_id) avg_score
    from
    score_info
    ) t1
    ) t2
    group by uid
    having sum(t2.flag) = 0;

第二题

现有数据action.txt文件,文件内容如下所示:

1
2
3
4
5
6
7
8
u01 2017/1/21  5
u02 2017/1/23 6
u03 2017/1/22 8
u04 2017/1/20 3
u01 2017/1/23 6
u01 2017/2/21 8
u02 2017/1/23 6
u01 2017/2/22 4

编写HiveQL代码,完成以下问题:

  1. 创建action表,字段名分别为userIdvisitDatevisitCount

    1
    2
    3
    4
    5
    6
    7
    8
    create table if not exists action(
    userid string,
    visitDate string,
    visitCount int
    )
    row format delimited
    fields terminated by '\t'
    lines terminated by '\n';
  2. 将数据导入action表。

    1
    2
    3
    load data local 
    inpath '/opt/module/hive-3.1.2/datas/action.txt'
    into table action;
  3. 统计出每个用户的月累计访问次数。

    1
    2
    3
    4
    5
    select
    userid,
    substring(visitDate, 0, 6) visitMonth,
    sum(visitCount) over (partition by userid, month(visitDate)) sumCount
    from action;
  4. 统计出每个用户的逐月累计访问次数。

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    select
    userid,
    visitMonth,
    max(sumCount)
    from(
    select
    uid,
    substring(visitDate, 0, 6) visitMonth,
    sum(visitCount) over (partition by userid, month(visitDate) order by visitDate) sumCount
    from action) t1
    group by userid, visitMonth;
-------------本文结束感谢您的阅读-------------