Repository: 老师 / sm-yw-dianshang2
Commit 797b0710, authored 2 years ago by 老师
123
Parent: 4502ad4b (branch: main)
Showing 1 changed file: hadoop/hive.txt (+15 -3)
Create the database: create database db_hive;
use db_hive;
// Create the table: each imported row is split on ',' and the pieces are matched to the columns in order
-create table us_counties(cv_date string,county string,state string,cases int,deaths int) row format delimited fields terminated by ',';
+create table us_counties(cv_date date,county string,state string,cases int,deaths int) row format delimited fields terminated by ',';
// The NameNode is in safe mode; this usually happens after a MapReduce job is forcibly terminated
bin/hdfs dfsadmin -safemode leave
Load a local file into Hive
load data local inpath '/home/hadoop/us-counties.csv' into table db_hive.us_counties;
create table user_behavtior(uid string,gid string,behavior string,time string) row format delimited fields terminated by ',';
load data local inpath '/home/hadoop/Processed_UserBehavior.csv' into table db_hive.user_behavtior;
...
...
@@ -38,4 +37,17 @@ db_hive.user_behavtior u
where u.behavior='buy'
group by u.uid
sort by fnum desc
-limit 0,10
\ No newline at end of file
+limit 0,10
create table us_counties(cv_date date,county string,state string,cases int,deaths int) row format delimited fields terminated by ',';
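1) Compute the cumulative confirmed cases and cumulative deaths in the US as of each day. Group by the date field (cv_date in this table) and sum the cases and deaths fields.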
select cv_date,sum(cases),sum(deaths)
from db_hive.us_counties
group by cv_date
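3) Compute the cumulative confirmed cases and deaths for each US state as of 5.19. First filter the rows for 5.19, then group by state and sum the cases and deaths fields.
4) Find the ten US states with the most confirmed cases as of 5.19. Register the result DataFrame from 3) as a temporary table, sort by confirmed cases in descending order, and take the first 10 states.
5) Find the ten US states with the most deaths as of 5.19. Register the result DataFrame from 3) as a temporary table, sort by deaths in descending order, and take the first 10 states.
6) Find the ten US states with the fewest confirmed cases as of 5.19. Register the result DataFrame from 3) as a temporary table, sort by confirmed cases in ascending order, and take the first 10 states.
7) Find the ten US states with the fewest deaths as of 5.19. Register the result DataFrame from 3) as a temporary table, sort by deaths in ascending order, and take the first 10 states.
8) Compute the case fatality rate for the whole US and for each state as of 5.19. Fatality rate = deaths / confirmed cases; register the result DataFrame from 3) as a temporary table, then apply the formula.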
\ No newline at end of file
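Tasks 3) to 8) above are listed without queries. The following is a rough HiveQL sketch of how they might be written against db_hive.us_counties, not the author's solution. Two assumptions are made: the year is taken to be 2020 (the notes only say "5.19"), and a Hive view with the hypothetical name state_totals_0519 stands in for the "register the DataFrame as a temporary table" step that the notes describe in Spark terms.

```sql
-- 3) Per-state cumulative cases and deaths as of 5.19.
-- Assumption: the year is 2020; the notes only give "5.19".
-- state_totals_0519 is a hypothetical view name used for illustration.
create view if not exists db_hive.state_totals_0519 as
select state,
       sum(cases)  as total_cases,
       sum(deaths) as total_deaths
from db_hive.us_counties
where cv_date = date '2020-05-19'
group by state;

-- 4) Ten states with the most confirmed cases.
select state, total_cases
from db_hive.state_totals_0519
order by total_cases desc
limit 10;

-- 5) Ten states with the most deaths.
select state, total_deaths
from db_hive.state_totals_0519
order by total_deaths desc
limit 10;

-- 6) Ten states with the fewest confirmed cases.
select state, total_cases
from db_hive.state_totals_0519
order by total_cases asc
limit 10;

-- 7) Ten states with the fewest deaths.
select state, total_deaths
from db_hive.state_totals_0519
order by total_deaths asc
limit 10;

-- 8) Fatality rate = deaths / confirmed cases, per state.
select state,
       round(total_deaths / total_cases, 4) as fatality_rate
from db_hive.state_totals_0519;

-- 8) Fatality rate for the whole US, from the same per-state totals.
select round(sum(total_deaths) / sum(total_cases), 4) as fatality_rate
from db_hive.state_totals_0519;
```

The sketch uses order by rather than sort by for the top-ten queries, because order by gives a single global ordering, whereas sort by only orders rows within each reducer.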