【ClickHouse】4:clickhouse基本操作二 建库建表导数据
背景介绍:
有三台CentOS7服务器安装了ClickHouse
HostName | IP | 安装程序 | 程序端口 |
centf8118.sharding1.db | 192.168.81.18 | clickhouse-server,clickhouse-client | 9000 |
centf8119.sharding2.db | 192.168.81.19 | clickhouse-server,clickhouse-client | 9000 |
centf8120.sharding3.db | 192.168.81.20 | clickhouse-server,clickhouse-client | 9000 |
1:创建数据库
CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster] [ENGINE = engine(...)]
CREATE DATABASE testdb; //创建数据库
DROP DATABASE testdb; //删除数据库
2:建表
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [compression_codec] [TTL expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [compression_codec] [TTL expr2],
...
) ENGINE = engine
CREATE TABLE test_table(
province String,
province_name String,
create_date date
) ENGINE = MergeTree(create_date, (province), 8192); 如果直接这样执行会报错,有两种方法解决:
1: 在每一行后面加右斜杠,比如:
CREATE TABLE test_table( \
province String, \
province_name String, \
create_date date \
) ENGINE = MergeTree(create_date, (province), 8192); 2: 在登录的时候加 -m参数支持多行模式,比如:
clickhouse-client -m
ENGINE:是表的引擎类型,最常用的MergeTree。还有一个Log引擎也是比较常用。MergeTree要求有一个日期字段,还有主键。Log没有这个限制。
create_date:是表的日期字段,一个表必须要有一个日期字段。
province:是表的主键,主键可以有多个字段,每个字段用逗号分隔
8192:是索引粒度,用默认值8192即可。
3:导入数据
3.1:普通的CSV文件导入。
cat > test_table.csv << EOF
WA,WA_NAME,2020-08-25
CA,CA_NAME,2020-09-25
OR,OR_NAME,2020-10-25
EOF
–导数
clickhouse-client --query "INSERT INTO testdb.test_table FORMAT CSV" < test_table.csv;
–或者用管道的方式
cat test_table.csv | clickhouse-client --query “INSERT INTO testdb.test_table FORMAT CSV”
查看结果数据:
centf8118.sharding1.db :) select * from test_table limit 2 ; SELECT *
FROM test_table
LIMIT 2 ┌─province─┬─province_name─┬─create_date─┐
│ WA │ WA_NAME │ 2020-08-25 │
└──────────┴───────────────┴─────────────┘
┌─province─┬─province_name─┬─create_date─┐
│ CA │ CA_NAME │ 2020-09-25 │
└──────────┴───────────────┴─────────────┘ 2 rows in set. Elapsed: 0.004 sec.
3.2:特殊的CSV文件导入(包含回车换行,转义符等)。
–建表(sql语句)
CREATE TABLE testdb.test_table3 ( id1 UInt32, id2 Float32, name1 String, name2 String, date1 Date, date2 DateTime) ENGINE = Log;
–测试数据(Linux命令)
[root@centf8118 tmp]# cat test_table3.csv
1,123.456,”abc 123”,” abc" "'123”,2020-08-26,2020-08-26 17:08:09
–执行导入(Linux命令)
clickhouse-client --query "INSERT INTO testdb.test_table3 FORMAT CSV" < test_table3.csv
–查看结果(sql语句)
centf8118.sharding1.db :) select * from test_table3; SELECT *
FROM test_table3 ┌─id1─┬─────id2─┬─name1─────┬─name2────────────┬──────date1─┬───────────────date2─┐
│ 1 │ 123.456 │ ”abc 123” │ ” abc" "'123” │ 2020-08-26 │ 2020-08-26 17:08:09 │
└─────┴─────────┴───────────┴──────────────────┴────────────┴─────────────────────┘
3.3:尝试导入官方提供的大数据表
官方文档: https://clickhouse.tech/docs/en/getting-started/tutorial/
1>>下载并提取表数据,提取的文件大小约为10GB。
我这里下载到/data/clickhouse/tmp/目录下。
curl https://clickhouse-datasets.s3.yandex.net/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv
curl https://clickhouse-datasets.s3.yandex.net/visits/tsv/visits_v1.tsv.xz | unxz --threads=`nproc` > visits_v1.tsv
[root@centf8118 tmp]# curl https://clickhouse-datasets.s3.yandex.net/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 802M 100 802M 0 0 8074k 0 0:01:41 0:01:41 --:--:-- 9047k
[root@centf8118 tmp]# curl https://clickhouse-datasets.s3.yandex.net/visits/tsv/visits_v1.tsv.xz | unxz --threads=`nproc` > visits_v1.tsv
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 405M 100 405M 0 0 8790k 0 0:00:47 0:00:47 --:--:-- 9911k
查看目录下文件大小为9.8G。
[root@centf8118 ~]# du -sh /data/clickhouse/tmp
9.8G /data/clickhouse/tmp
2>>创建新的数据库:tutorial
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS tutorial"
3>>创建要导入数据的表:tutorial.hits_v1和tutorial.visits_v1
创建tutorial.hits_v1表:
CREATE TABLE tutorial.hits_v1
(
`WatchID` UInt64,
`JavaEnable` UInt8,
`Title` String,
`GoodEvent` Int16,
`EventTime` DateTime,
`EventDate` Date,
`CounterID` UInt32,
`ClientIP` UInt32,
`ClientIP6` FixedString(16),
`RegionID` UInt32,
`UserID` UInt64,
`CounterClass` Int8,
`OS` UInt8,
`UserAgent` UInt8,
`URL` String,
`Referer` String,
`URLDomain` String,
`RefererDomain` String,
`Refresh` UInt8,
`IsRobot` UInt8,
`RefererCategories` Array(UInt16),
`URLCategories` Array(UInt16),
`URLRegions` Array(UInt32),
`RefererRegions` Array(UInt32),
`ResolutionWidth` UInt16,
`ResolutionHeight` UInt16,
`ResolutionDepth` UInt8,
`FlashMajor` UInt8,
`FlashMinor` UInt8,
`FlashMinor2` String,
`NetMajor` UInt8,
`NetMinor` UInt8,
`UserAgentMajor` UInt16,
`UserAgentMinor` FixedString(2),
`CookieEnable` UInt8,
`JavascriptEnable` UInt8,
`IsMobile` UInt8,
`MobilePhone` UInt8,
`MobilePhoneModel` String,
`Params` String,
`IPNetworkID` UInt32,
`TraficSourceID` Int8,
`SearchEngineID` UInt16,
`SearchPhrase` String,
`AdvEngineID` UInt8,
`IsArtifical` UInt8,
`WindowClientWidth` UInt16,
`WindowClientHeight` UInt16,
`ClientTimeZone` Int16,
`ClientEventTime` DateTime,
`SilverlightVersion1` UInt8,
`SilverlightVersion2` UInt8,
`SilverlightVersion3` UInt32,
`SilverlightVersion4` UInt16,
`PageCharset` String,
`CodeVersion` UInt32,
`IsLink` UInt8,
`IsDownload` UInt8,
`IsNotBounce` UInt8,
`FUniqID` UInt64,
`HID` UInt32,
`IsOldCounter` UInt8,
`IsEvent` UInt8,
`IsParameter` UInt8,
`DontCountHits` UInt8,
`WithHash` UInt8,
`HitColor` FixedString(1),
`UTCEventTime` DateTime,
`Age` UInt8,
`Sex` UInt8,
`Income` UInt8,
`Interests` UInt16,
`Robotness` UInt8,
`GeneralInterests` Array(UInt16),
`RemoteIP` UInt32,
`RemoteIP6` FixedString(16),
`WindowName` Int32,
`OpenerName` Int32,
`HistoryLength` Int16,
`BrowserLanguage` FixedString(2),
`BrowserCountry` FixedString(2),
`SocialNetwork` String,
`SocialAction` String,
`HTTPError` UInt16,
`SendTiming` Int32,
`DNSTiming` Int32,
`ConnectTiming` Int32,
`ResponseStartTiming` Int32,
`ResponseEndTiming` Int32,
`FetchTiming` Int32,
`RedirectTiming` Int32,
`DOMInteractiveTiming` Int32,
`DOMContentLoadedTiming` Int32,
`DOMCompleteTiming` Int32,
`LoadEventStartTiming` Int32,
`LoadEventEndTiming` Int32,
`NSToDOMContentLoadedTiming` Int32,
`FirstPaintTiming` Int32,
`RedirectCount` Int8,
`SocialSourceNetworkID` UInt8,
`SocialSourcePage` String,
`ParamPrice` Int64,
`ParamOrderID` String,
`ParamCurrency` FixedString(3),
`ParamCurrencyID` UInt16,
`GoalsReached` Array(UInt32),
`OpenstatServiceName` String,
`OpenstatCampaignID` String,
`OpenstatAdID` String,
`OpenstatSourceID` String,
`UTMSource` String,
`UTMMedium` String,
`UTMCampaign` String,
`UTMContent` String,
`UTMTerm` String,
`FromTag` String,
`HasGCLID` UInt8,
`RefererHash` UInt64,
`URLHash` UInt64,
`CLID` UInt32,
`YCLID` UInt64,
`ShareService` String,
`ShareURL` String,
`ShareTitle` String,
`ParsedParams` Nested(
Key1 String,
Key2 String,
Key3 String,
Key4 String,
Key5 String,
ValueDouble Float64),
`IslandID` FixedString(16),
`RequestNum` UInt32,
`RequestTry` UInt8
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID)
SETTINGS index_granularity = 8192;
创建tutorial.visits_v1表:
CREATE TABLE tutorial.visits_v1
(
`CounterID` UInt32,
`StartDate` Date,
`Sign` Int8,
`IsNew` UInt8,
`VisitID` UInt64,
`UserID` UInt64,
`StartTime` DateTime,
`Duration` UInt32,
`UTCStartTime` DateTime,
`PageViews` Int32,
`Hits` Int32,
`IsBounce` UInt8,
`Referer` String,
`StartURL` String,
`RefererDomain` String,
`StartURLDomain` String,
`EndURL` String,
`LinkURL` String,
`IsDownload` UInt8,
`TraficSourceID` Int8,
`SearchEngineID` UInt16,
`SearchPhrase` String,
`AdvEngineID` UInt8,
`PlaceID` Int32,
`RefererCategories` Array(UInt16),
`URLCategories` Array(UInt16),
`URLRegions` Array(UInt32),
`RefererRegions` Array(UInt32),
`IsYandex` UInt8,
`GoalReachesDepth` Int32,
`GoalReachesURL` Int32,
`GoalReachesAny` Int32,
`SocialSourceNetworkID` UInt8,
`SocialSourcePage` String,
`MobilePhoneModel` String,
`ClientEventTime` DateTime,
`RegionID` UInt32,
`ClientIP` UInt32,
`ClientIP6` FixedString(16),
`RemoteIP` UInt32,
`RemoteIP6` FixedString(16),
`IPNetworkID` UInt32,
`SilverlightVersion3` UInt32,
`CodeVersion` UInt32,
`ResolutionWidth` UInt16,
`ResolutionHeight` UInt16,
`UserAgentMajor` UInt16,
`UserAgentMinor` UInt16,
`WindowClientWidth` UInt16,
`WindowClientHeight` UInt16,
`SilverlightVersion2` UInt8,
`SilverlightVersion4` UInt16,
`FlashVersion3` UInt16,
`FlashVersion4` UInt16,
`ClientTimeZone` Int16,
`OS` UInt8,
`UserAgent` UInt8,
`ResolutionDepth` UInt8,
`FlashMajor` UInt8,
`FlashMinor` UInt8,
`NetMajor` UInt8,
`NetMinor` UInt8,
`MobilePhone` UInt8,
`SilverlightVersion1` UInt8,
`Age` UInt8,
`Sex` UInt8,
`Income` UInt8,
`JavaEnable` UInt8,
`CookieEnable` UInt8,
`JavascriptEnable` UInt8,
`IsMobile` UInt8,
`BrowserLanguage` UInt16,
`BrowserCountry` UInt16,
`Interests` UInt16,
`Robotness` UInt8,
`GeneralInterests` Array(UInt16),
`Params` Array(String),
`Goals` Nested(
ID UInt32,
Serial UInt32,
EventTime DateTime,
Price Int64,
OrderID String,
CurrencyID UInt32),
`WatchIDs` Array(UInt64),
`ParamSumPrice` Int64,
`ParamCurrency` FixedString(3),
`ParamCurrencyID` UInt16,
`ClickLogID` UInt64,
`ClickEventID` Int32,
`ClickGoodEvent` Int32,
`ClickEventTime` DateTime,
`ClickPriorityID` Int32,
`ClickPhraseID` Int32,
`ClickPageID` Int32,
`ClickPlaceID` Int32,
`ClickTypeID` Int32,
`ClickResourceID` Int32,
`ClickCost` UInt32,
`ClickClientIP` UInt32,
`ClickDomainID` UInt32,
`ClickURL` String,
`ClickAttempt` UInt8,
`ClickOrderID` UInt32,
`ClickBannerID` UInt32,
`ClickMarketCategoryID` UInt32,
`ClickMarketPP` UInt32,
`ClickMarketCategoryName` String,
`ClickMarketPPName` String,
`ClickAWAPSCampaignName` String,
`ClickPageName` String,
`ClickTargetType` UInt16,
`ClickTargetPhraseID` UInt64,
`ClickContextType` UInt8,
`ClickSelectType` Int8,
`ClickOptions` String,
`ClickGroupBannerID` Int32,
`OpenstatServiceName` String,
`OpenstatCampaignID` String,
`OpenstatAdID` String,
`OpenstatSourceID` String,
`UTMSource` String,
`UTMMedium` String,
`UTMCampaign` String,
`UTMContent` String,
`UTMTerm` String,
`FromTag` String,
`HasGCLID` UInt8,
`FirstVisit` DateTime,
`PredLastVisit` Date,
`LastVisit` Date,
`TotalVisits` UInt32,
`TraficSource` Nested(
ID Int8,
SearchEngineID UInt16,
AdvEngineID UInt8,
PlaceID UInt16,
SocialSourceNetworkID UInt8,
Domain String,
SearchPhrase String,
SocialSourcePage String),
`Attendance` FixedString(16),
`CLID` UInt32,
`YCLID` UInt64,
`NormalizedRefererHash` UInt64,
`SearchPhraseHash` UInt64,
`RefererDomainHash` UInt64,
`NormalizedStartURLHash` UInt64,
`StartURLDomainHash` UInt64,
`NormalizedEndURLHash` UInt64,
`TopLevelDomain` UInt64,
`URLScheme` UInt64,
`OpenstatServiceNameHash` UInt64,
`OpenstatCampaignIDHash` UInt64,
`OpenstatAdIDHash` UInt64,
`OpenstatSourceIDHash` UInt64,
`UTMSourceHash` UInt64,
`UTMMediumHash` UInt64,
`UTMCampaignHash` UInt64,
`UTMContentHash` UInt64,
`UTMTermHash` UInt64,
`FromHash` UInt64,
`WebVisorEnabled` UInt8,
`WebVisorActivity` UInt32,
`ParsedParams` Nested(
Key1 String,
Key2 String,
Key3 String,
Key4 String,
Key5 String,
ValueDouble Float64),
`Market` Nested(
Type UInt8,
GoalID UInt32,
OrderID String,
OrderPrice Int64,
PP UInt32,
DirectPlaceID UInt32,
DirectOrderID UInt32,
DirectBannerID UInt32,
GoodID String,
GoodName String,
GoodQuantity Int32,
GoodPrice Int64),
`IslandID` FixedString(16)
)
ENGINE = CollapsingMergeTree(Sign)
PARTITION BY toYYYYMM(StartDate)
ORDER BY (CounterID, StartDate, intHash32(UserID), VisitID)
SAMPLE BY intHash32(UserID)
SETTINGS index_granularity = 8192;
4>>导入数据
clickhouse-client --query "INSERT INTO tutorial.hits_v1 FORMAT TSV" --max_insert_block_size=100000 < hits_v1.tsv clickhouse-client --query "INSERT INTO tutorial.visits_v1 FORMAT TSV" --max_insert_block_size=100000 < visits_v1.tsv
数据导入完后,优化一下表
clickhouse-client --query "OPTIMIZE TABLE tutorial.hits_v1 FINAL" clickhouse-client --query "OPTIMIZE TABLE tutorial.visits_v1 FINAL"
优化完之后,查看下表数据量
clickhouse-client --query "SELECT COUNT(*) FROM tutorial.hits_v1" clickhouse-client --query "SELECT COUNT(*) FROM tutorial.visits_v1"
[root@centf8118 tmp]# clickhouse-client --query "SELECT COUNT(*) FROM tutorial.hits_v1"
8873898
[root@centf8118 tmp]# clickhouse-client --query "SELECT COUNT(*) FROM tutorial.visits_v1"
1676861
查询数据
SELECT
StartURL AS URL,
AVG(Duration) AS AvgDuration
FROM tutorial.visits_v1
WHERE StartDate BETWEEN '2014-03-23' AND '2014-03-30'
GROUP BY URL
ORDER BY AvgDuration DESC
LIMIT 10 SELECT
sum(Sign) AS visits,
sumIf(Sign, has(Goals.ID, 1105530)) AS goal_visits,
(100. * goal_visits) / visits AS goal_percent
FROM tutorial.visits_v1
WHERE (CounterID = 912887) AND (toYYYYMM(StartDate) = 201403) AND (domain(StartURL) = 'yandex.ru')
【ClickHouse】4:clickhouse基本操作二 建库建表导数据的更多相关文章
- MySQL建库建表
一直使用SQL SERVER 数据库:最近项目使用MY SQL感觉还是有一点不适应.不过熟悉之后就会好很多. MY SQL 安装之后会有一个管理工具MySQL Workbench 感觉不太好用,数据库 ...
- 【ITOO 2】.NET 动态建库建表:使用SQL字符串拼接方式
导读:在最近接手的项目(高效云平台)中,有一个需求是要当企业用户注册时,给其动态的新建一个库和表.刚开始接手的时候,是一点头绪都没有,然后查了一些资料,也问了问上一版本的师哥师姐,终于有了点头绪.目前 ...
- 【ITOO 3】.NET 动态建库建表:实用EF框架提供的codeFirst实现动态建库
导读:在上篇博客中,介绍了使用SQL字符拼接的方式,实现动态建库建表的方法.这样做虽然也能够实现效果,但是,太麻烦,而且,如果改动表结构,字段的话,会对代码修改很多.但是EF给我们提供了一种代码先行的 ...
- SQL Server建库-建表-建约束
----------------------------------------SQL Server建库-建表-建约束创建School数据库------------------------------ ...
- 使用T-sql建库建表建约束
为什么要使用sql语句建库建表? 现在假设这样一个场景,公司的项目经过测试没问题后需要在客户的实际环境中进行演示,那就需要对数据进行移植,现在问题来了:客户的数据库版本和公司开发阶段使用的数据库不兼容 ...
- mysql那些事(4)建库建表编码的选择
mysql建数据库或者建表的时候会遇到选择编码的问题,以前我们都是习惯性的选择utf8,但是在mysql在5.5.3版本后加了utf8mb4的编码,utf8mb4可以存4个字节Unicode,mb4就 ...
- C# 利用*.SQL文件自动建库建表等的类
/// <summary> /// 自动建库建表 /// </summary> public class OperationSqlFile { SqlConnection sq ...
- Mysql建库建用户建表等常用命令
格式: mysql -h主机地址 -u用户名 -p用户密码 1.连接到本机上的MYSQL.首先打开DOS窗口,然后进入目录mysql\bin,再键入命令mysql -u root -p,回车后提示你输 ...
- oracle 11g 建库 建表 增 删 改 查 约束
一.建库 1.(点击左上角带绿色+号的按钮) 2.(进入这个界面,passowrd为密码.填写完后点击下面一排的Test按钮进行测试,无异常就点击Connect) 二.建表 1-1. create t ...
- 【平台开发】— 4.mysql建库建表
本想着把前端脚手架run起来了,然后就可以借着登录来捋一下前后端交互的过程.但是后端导入JPA的时候就发现了,还没有数据库. 既然是本着学习的目的,那咱也不想只在后端写死返回的数据,要做就做全套. 一 ...
随机推荐
- LogAgen的工作流程
LogAgen的工作流程: 一.读日志 --tailf 第三方库 新建tail_test/main.go package main import ( "fmt" "git ...
- Java简单实现MQ架构和思路01
实现一个 MQ(消息队列)架构可以涉及到很多方面,包括消息的生产和消费.消息的存储和传输.消息的格式和协议等等.下面是一个简单的 MQ 架构的实现示例,仅供参考: 定义消息格式和协议:我们可以定义一个 ...
- docker安装MySQL8.0.35主从复制(实战保姆级)
很久没有记录了,今天有时间就记录一下最近安装遇到的问题 liunx安装docker这个是前提,就不多过述 1 准备两台服务器 10.104.13.139 10.104.13.140 2 确保liunx ...
- IceRPC之深入理解调度管道->快乐的RPC
作者引言 很高兴啊,我们来到了IceRPC之深入理解调度管道->快乐的RPC,为上篇的续篇,深入理解常见的调度类型, 基础引导,有点小压力,打好基础,才能让自已不在迷茫,快乐的畅游世界. 传入请 ...
- Java8 Lambda表达式入门
可能很多人都听说过java8的新特性----Lambada表达式,但可能很多人都不知道Lambda表达式到底有什么用,下面我带大家理解一下Lambada表达式. 在平时的编程中,我们常常会用到匿名内部 ...
- 阿里云入选Gartner「边缘分发平台市场指南」代表厂商
近日,全球技术研究与咨询机构Gartner首次发布边缘分发平台市场指南报告<Market Guide for Edge Distribution Platforms>,阿里云凭借内容分发网 ...
- REACT列表过度
<TransitionGroup> <CSSTransition> <li>aaaa</li> </CSSTransition> </ ...
- The sultion of P4959
problem & blog 首先我们看到 \(x,y\) 有可能为负数,所以我们先把它旋转到第一象限. 然后我们发现如果 \(x_a \ge x_b\) 且 \(y_a \ge y_b\) ...
- INFINI Easysearch 与兆芯完成产品兼容互认证
近日,极限科技旗下软件产品 INFINI Easysearch 搜索引擎软件 V1.0 与兆芯完成兼容性测试,功能与稳定性良好,并获得兆芯产品兼容互认证书. 此次兼容适配基于银河麒麟高级服务器操作系统 ...
- 已将此(这些)订阅标记为不活动,必须将其重新初始化。需要删除 NoSync 订阅,然后重
已将此(这些)订阅标记为不活动,必须将其重新初始化.需要删除 NoSync 订阅,然后重 查找状态不正常的发布 use distribution go select status,*from dbo. ...