1. Install Docker on your Mac
2. It's recommended to set the Docker registry mirror to http://hub-mirrors.c.163.com
3. Pull the client image: docker pull yandex/clickhouse-client
4. Pull the server image: docker pull yandex/clickhouse-server
5. Create a dedicated network:
docker network create ck_net
6. Start the server container from the downloaded image:
docker run -d --name ck-server --network=ck_net --ulimit nofile=262144:262144 --volume=$HOME/www/clickhouse:/var/lib/clickhouse yandex/clickhouse-server
7. Verify it started: run docker ps and check that the ck-server container shows as Up (docker logs ck-server will show the server log if anything looks off)
8. Enter the container: docker exec -it ck-server /bin/bash
9. Start the client: clickhouse-client
10. In the client, list the databases: show databases; yep, exactly the same as MySQL
11. Create a database to play with: create database if not exists jack;
12. Switch to it: use jack; heh, same as MySQL again
13. Now create a table:
CREATE TABLE jack.user
(
user_id UInt32,
message String,
timestamp DateTime,
metric Float32
)
ENGINE = MergeTree -- note: the engine is different from MySQL
PRIMARY KEY (user_id, timestamp)
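To double-check what actually got created, ClickHouse can echo the definition and the column types back:
SHOW CREATE TABLE jack.user;
DESCRIBE TABLE jack.user;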
14. Insert a few rows:
INSERT INTO jack.user (user_id, message, timestamp, metric) VALUES
(101, 'Hello, ClickHouse!', now(), -1.0 ),
(102, 'Insert a lot of rows per batch', yesterday(), 1.41421 ),
(102, 'Sort your data based on your commonly-used queries', today(), 2.718 ),
(101, 'Granules are the smallest chunks of data read', now() + 5, 3.14159 );
15. Query it and witness the magic:
select * from user; yeah, you guessed it
You can also add sorting: select * from user order by timestamp asc (or desc)
16. Now for something a little different:
select * from user order by timestamp desc format TabSeparated
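TabSeparated is just one of ClickHouse's built-in output formats; the FORMAT clause only changes how the result is rendered. For instance, the same query emitted as newline-delimited JSON (JSONEachRow is another standard format):
select * from user order by timestamp desc format JSONEachRow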
17. Let's try something big
UK property price paid data
This dataset covers property prices in England and Wales since 1995. It's about 4 GiB uncompressed and takes roughly 278 MiB inside ClickHouse. That's more than 20 million rows, so proceed with care: the load takes a very long time, not least because of the Great Wall.
- Create the table
CREATE TABLE uk_price_paid
(
price UInt32,
date Date,
postcode1 LowCardinality(String),
postcode2 LowCardinality(String),
type Enum8('terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4, 'other' = 0),
is_new UInt8,
duration Enum8('freehold' = 1, 'leasehold' = 2, 'unknown' = 0),
addr1 String,
addr2 String,
street LowCardinality(String),
locality LowCardinality(String),
town LowCardinality(String),
district LowCardinality(String),
county LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (postcode1, postcode2, addr1, addr2);
- The main event: loading the data
INSERT INTO uk_price_paid
WITH
splitByChar(' ', postcode) AS p
SELECT
toUInt32(price_string) AS price,
parseDateTimeBestEffortUS(time) AS date,
p[1] AS postcode1,
p[2] AS postcode2,
transform(a, ['T', 'S', 'D', 'F', 'O'], ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type,
b = 'Y' AS is_new,
transform(c, ['F', 'L', 'U'], ['freehold', 'leasehold', 'unknown']) AS duration,
addr1,
addr2,
street,
locality,
town,
district,
county
FROM url(
'http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.csv',
'CSV',
'uuid_string String,
price_string String,
time String,
postcode String,
a String,
b String,
c String,
addr1 String,
addr2 String,
street String,
locality String,
town String,
district String,
county String,
d String,
e String'
) SETTINGS max_http_get_redirects=10;
This streams the rows from pp-complete.csv into ClickHouse non-stop. It takes a very long time…… night falls, everyone close your eyes.
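If you get impatient, you can peek at the running import from a second clickhouse-client session; a minimal sketch against the system.processes table:
SELECT elapsed, read_rows, formatReadableSize(read_bytes) AS read
FROM system.processes
WHERE query LIKE 'INSERT INTO uk_price_paid%';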
- It takes a very long time……
- It takes a very long time……
- It takes a very long time……
- a very long time……
- a long time……
- long……
- ……
Woke up from a nap and it had finally finished.
Nearly an hour of waiting; the flowers have all wilted~
- Quick, verify the result
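A plain count is enough for this (a minimal sketch):
SELECT count() FROM uk_price_paid;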
28 million rows, 0.023s
Hehe, pretty great, right?
- Check how big the table is
SELECT formatReadableSize(total_bytes)
FROM system.tables
WHERE name = 'uk_price_paid'
About 300 MiB, so the compression is doing well.
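For a closer look at the compression ratio, the compressed and uncompressed sizes of the active data parts can be compared; a sketch against system.parts:
SELECT
formatReadableSize(sum(data_compressed_bytes)) AS compressed,
formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.parts
WHERE table = 'uk_price_paid' AND active;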
- A quick warm-up
Average price per year:
SELECT
toYear(date) AS year,
round(avg(price)) AS price,
bar(price, 0, 1000000, 80)
FROM uk_price_paid
GROUP BY year
ORDER BY year
It processed 28.21 million rows in 2.571s. I was hoping for under a second, so that's a bit disappointing (other articles report second-level latency on 20-trillion-row datasets), but it's probably just my wife's beat-up old Mac. Still, it's far better than MySQL: back when our user table reached 22 million rows, I couldn't even query it from the application any more.
- Average price per year in London:
SELECT
toYear(date) AS year,
round(avg(price)) AS price,
bar(price, 0, 2000000, 100)
FROM uk_price_paid
WHERE town = 'LONDON'
GROUP BY year
ORDER BY year

Thankfully this one came back in 0.247s.
Tip: London prices clearly took a hit in 2020! Not that that's surprising……
- Only buy the most expensive, never the most sensible. Aunt Wang has the money, so let's look up the priciest house prices around London for her:
SELECT
town,
district,
count() AS c,
round(avg(price)) AS price,
bar(price, 0, 5000000, 100)
FROM uk_price_paid
WHERE date >= '2020-01-01'
GROUP BY
town,
district
HAVING c >= 100
ORDER BY price DESC
LIMIT 10;
These are the top 10; the chart for the top 100 was too big to screenshot.
Ouch, 0.675s. Not great. Is there any way to make it faster?
- Yes: build a projection
Let's create an aggregate projection over the dimensions toYear(date), district, and town:
ALTER TABLE uk_price_paid
ADD PROJECTION projection_by_year_district_town(
SELECT
toYear(date),
district,
town,
avg(price),
sum(price),
count()
GROUP BY
toYear(date),
district,
town
)
At this point I ran the query again: 0.746s, even slower than before.
That's because the projection still has to be populated for existing data (without a MATERIALIZE step, ClickHouse only builds the projection for newly inserted rows):
ALTER TABLE uk_price_paid
MATERIALIZE PROJECTION projection_by_year_district_town
SETTINGS mutations_sync = 1
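With mutations_sync = 1 the statement waits for the rewrite to finish. If you drop that setting and let it run in the background, progress can be checked instead; a sketch against system.mutations:
SELECT command, is_done
FROM system.mutations
WHERE table = 'uk_price_paid';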
Now let's test it:
Re-run the three queries: price per year, London average price per year, and the top-100 prices.
The numbers come out at 0.342s, 0.934s, and 0.822s, versus 2.571s, 0.247s, and 1.010s before; so everything dropped a bit except the London average price per year.
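To confirm the projection is actually being picked up, the query plan can be inspected; a sketch (the exact output wording depends on the ClickHouse version):
EXPLAIN
SELECT
toYear(date) AS year,
round(avg(price)) AS price
FROM uk_price_paid
GROUP BY year
ORDER BY year;
If the projection kicks in, its name should show up in the plan's ReadFromMergeTree step; newer versions also record the projections a finished query used in the projections column of system.query_log.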