1 00:00:00,000 --> 00:00:08,469 foreign 2 00:00:00,500 --> 00:00:08,469 [Music] 3 00:00:11,300 --> 00:00:17,580 runs a team of solution Architects at 4 00:00:15,360 --> 00:00:19,619 Aiven who focus on a whole collection of 5 00:00:17,580 --> 00:00:22,740 Open Source tools running in public 6 00:00:19,619 --> 00:00:24,779 clouds today our speaker will cover 7 00:00:22,740 --> 00:00:26,820 the difference between regular databases 8 00:00:24,779 --> 00:00:28,260 and columnar databases and how to get 9 00:00:26,820 --> 00:00:30,300 the best performance out of the ClickHouse 10 00:00:28,260 --> 00:00:32,070 columnar database please welcome our 11 00:00:30,300 --> 00:00:34,380 speaker thank you very much 12 00:00:32,070 --> 00:00:37,559 [Applause] 13 00:00:34,380 --> 00:00:39,120 um just like every good cooking show 14 00:00:37,559 --> 00:00:40,800 folks I like to start with something 15 00:00:39,120 --> 00:00:43,620 that is something we prepared earlier 16 00:00:40,800 --> 00:00:45,420 when we get to the demo this is what 17 00:00:43,620 --> 00:00:46,739 you'll be seeing uh there's just a 18 00:00:45,420 --> 00:00:49,260 little ClickHouse service that I've 19 00:00:46,739 --> 00:00:51,960 created here if anyone really cares it's 20 00:00:49,260 --> 00:00:53,280 running in Google in southeast one but 21 00:00:51,960 --> 00:00:54,719 that sort of doesn't matter too much to us 22 00:00:53,280 --> 00:00:56,940 and you'll see that I've also created 23 00:00:54,719 --> 00:00:58,980 this data file uh that does some things 24 00:00:56,940 --> 00:01:02,579 like create some tables and load some 25 00:00:58,980 --> 00:01:06,540 test data for us more from our demo 26 00:01:02,579 --> 00:01:09,479 service later in the presentation 27 00:01:06,540 --> 00:01:11,159 so welcome everyone thank you very much 28 00:01:09,479 --> 00:01:13,380 I want to start by saying thank you for 29 00:01:11,159 --> 00:01:16,280 joining us here today my name's Troy 30 00:01:13,380 --> 00:01:19,979 Sellers 
I'm a Staff solution architect 31 00:01:16,280 --> 00:01:21,720 for a company called Aiven 32 00:01:19,979 --> 00:01:23,220 um we do many things at Aiven one of 33 00:01:21,720 --> 00:01:26,520 the things that I think I'm most proud 34 00:01:23,220 --> 00:01:28,920 of is the company's contribution to the 35 00:01:26,520 --> 00:01:32,880 open source Community we were founded in 36 00:01:28,920 --> 00:01:34,380 2016 uh by four pretty passionate uh 37 00:01:32,880 --> 00:01:36,540 software Engineers that were committing 38 00:01:34,380 --> 00:01:38,040 to the Postgres project at the time and 39 00:01:36,540 --> 00:01:39,900 one of the things they wanted to do was 40 00:01:38,040 --> 00:01:41,880 give back to the open source community 41 00:01:39,900 --> 00:01:43,740 and so they had this idea they could 42 00:01:41,880 --> 00:01:46,500 create a company that runs purely open 43 00:01:43,740 --> 00:01:49,439 source data technology and today we've 44 00:01:46,500 --> 00:01:51,540 got 10 I think I think it's around 10 45 00:01:49,439 --> 00:01:53,759 Engineers that work for our open source 46 00:01:51,540 --> 00:01:56,159 program office whose entire job is to 47 00:01:53,759 --> 00:01:58,860 commit back Upstream into projects like 48 00:01:56,159 --> 00:02:00,540 Kafka projects like Postgres and 49 00:01:58,860 --> 00:02:02,460 projects like OpenSearch 50 00:02:00,540 --> 00:02:04,619 and so I'm really proud to be able to 51 00:02:02,460 --> 00:02:07,560 come down and talk to everyone about 52 00:02:04,619 --> 00:02:10,440 open source technology and what we do as 53 00:02:07,560 --> 00:02:12,840 staff solution architects at Aiven we are 54 00:02:10,440 --> 00:02:15,180 the technical support for our customer 55 00:02:12,840 --> 00:02:18,060 base in a lot of scenarios 56 00:02:15,180 --> 00:02:20,280 when you run 12 different open source 57 00:02:18,060 --> 00:02:23,280 data projects what that 58 00:02:20,280 --> 00:02:26,280 means is I am a jack of all 
trades and a 59 00:02:23,280 --> 00:02:28,379 master of none so I want to get started 60 00:02:26,280 --> 00:02:30,060 today who here is an expert in 61 00:02:28,379 --> 00:02:32,420 ClickHouse 62 00:02:30,060 --> 00:02:35,040 perfect 63 00:02:32,420 --> 00:02:37,319 my just in time learning will suffice 64 00:02:35,040 --> 00:02:39,599 excellent so that doesn't surprise me 65 00:02:37,319 --> 00:02:41,760 right like I um first heard 66 00:02:39,599 --> 00:02:43,379 about ClickHouse about mid last year 67 00:02:41,760 --> 00:02:44,580 when it came up internally at the company 68 00:02:43,379 --> 00:02:46,980 that we were going to start running this 69 00:02:44,580 --> 00:02:49,080 service for our customers and so I 70 00:02:46,980 --> 00:02:51,060 jumped in and this is a good example of 71 00:02:49,080 --> 00:02:52,440 what this presentation might be it's the 72 00:02:51,060 --> 00:02:55,140 things I had to learn about to 73 00:02:52,440 --> 00:02:57,300 understand what ClickHouse was in order 74 00:02:55,140 --> 00:02:58,200 to talk to people like yourselves about 75 00:02:57,300 --> 00:03:00,300 it 76 00:02:58,200 --> 00:03:01,680 so thank you for joining me today's 77 00:03:00,300 --> 00:03:04,260 journey is going to be pretty simple 78 00:03:01,680 --> 00:03:06,000 we're going to sort of start off and 79 00:03:04,260 --> 00:03:07,500 we're going to cover like what is just 80 00:03:06,000 --> 00:03:09,180 the main difference like we want to talk 81 00:03:07,500 --> 00:03:11,159 about what the problem is that 82 00:03:09,180 --> 00:03:12,720 ClickHouse is trying to solve and I 83 00:03:11,159 --> 00:03:13,980 think it's an important spot to start 84 00:03:12,720 --> 00:03:17,400 there 85 00:03:13,980 --> 00:03:19,019 um because that'll lead us to what 86 00:03:17,400 --> 00:03:20,519 ClickHouse is and what are the business cases 87 00:03:19,019 --> 00:03:22,440 that we should be thinking about using 88 00:03:20,519 --> 00:03:23,819 it and how you know why 
does it work the 89 00:03:22,440 --> 00:03:25,980 way it does and all those kind of things 90 00:03:23,819 --> 00:03:27,480 we'll show you a little demo because 91 00:03:25,980 --> 00:03:29,340 what's this technology presentation 92 00:03:27,480 --> 00:03:31,200 without a demo 93 00:03:29,340 --> 00:03:32,940 and then we'll talk a little bit about 94 00:03:31,200 --> 00:03:35,819 some of the things that you should not 95 00:03:32,940 --> 00:03:37,140 do with clickhouse okay this is not the 96 00:03:35,819 --> 00:03:39,180 kind of thing that you like throw all 97 00:03:37,140 --> 00:03:41,040 our databases out we have a new database 98 00:03:39,180 --> 00:03:42,659 now I'd encourage you not to think about 99 00:03:41,040 --> 00:03:45,420 it like that 100 00:03:42,659 --> 00:03:47,340 uh and finally uh there'll be some QR 101 00:03:45,420 --> 00:03:50,340 codes and stuff like that so you can 102 00:03:47,340 --> 00:03:51,780 scan for further resources 103 00:03:50,340 --> 00:03:53,400 um which is mainly just links to 104 00:03:51,780 --> 00:03:55,200 documentation 105 00:03:53,400 --> 00:03:56,940 uh sound good 106 00:03:55,200 --> 00:03:58,560 so thank you everyone in the room thank 107 00:03:56,940 --> 00:04:00,000 you everyone online for joining us I 108 00:03:58,560 --> 00:04:02,280 know we're going out live today as well 109 00:04:00,000 --> 00:04:04,440 so that's pretty exciting for me it's my 110 00:04:02,280 --> 00:04:06,360 first live stream presentation ever I 111 00:04:04,440 --> 00:04:09,299 think so let's talk a little bit about 112 00:04:06,360 --> 00:04:11,400 data and the purpose of data 113 00:04:09,299 --> 00:04:13,560 and what that means to people let's 114 00:04:11,400 --> 00:04:16,079 imagine a couple of things that we're 115 00:04:13,560 --> 00:04:18,419 all pretty familiar with okay uh online 116 00:04:16,079 --> 00:04:20,820 shopping right 117 00:04:18,419 --> 00:04:22,620 um some iot devices there's that concept 118 00:04:20,820 --> 
00:04:25,560 of this you know this stream of events 119 00:04:22,620 --> 00:04:26,580 any kind of application that you have on 120 00:04:25,560 --> 00:04:28,800 your phone 121 00:04:26,580 --> 00:04:31,620 right the amount of data that's coming 122 00:04:28,800 --> 00:04:34,620 out of today's modern world uh to 123 00:04:31,620 --> 00:04:36,240 roll out an often overused trope uh is 124 00:04:34,620 --> 00:04:38,520 growing exponentially and there's no 125 00:04:36,240 --> 00:04:40,800 sign of stopping that and this is the 126 00:04:38,520 --> 00:04:43,199 kind of thing that really helps us as 127 00:04:40,800 --> 00:04:44,820 technologists ClickHouse is the kind of 128 00:04:43,199 --> 00:04:47,280 database that helps us as technologists 129 00:04:44,820 --> 00:04:49,680 make sense of all that data okay as 130 00:04:47,280 --> 00:04:52,639 the size of data increases and the 131 00:04:49,680 --> 00:04:55,020 volume and frequency of data increases 132 00:04:52,639 --> 00:04:57,479 historically we had to make decisions 133 00:04:55,020 --> 00:04:59,460 about which data we thought was going to 134 00:04:57,479 --> 00:05:01,740 be useful to support the business cases 135 00:04:59,460 --> 00:05:03,360 we're trying to support and quite often 136 00:05:01,740 --> 00:05:04,860 we had to make decisions about which 137 00:05:03,360 --> 00:05:06,900 data should we throw away 138 00:05:04,860 --> 00:05:09,419 which data should we not collect at all 139 00:05:06,900 --> 00:05:10,860 okay and then we had to sort of look 140 00:05:09,419 --> 00:05:12,479 forward into the future and try and 141 00:05:10,860 --> 00:05:13,800 imagine what questions people were going 142 00:05:12,479 --> 00:05:16,020 to ask us 143 00:05:13,800 --> 00:05:18,600 so we could keep that data 144 00:05:16,020 --> 00:05:20,280 um that is to keep that data 145 00:05:18,600 --> 00:05:22,740 ready to be queried 146 00:05:20,280 --> 00:05:24,960 um this was kind of a function of 147 
00:05:22,740 --> 00:05:26,880 storage being a little bit expensive it 148 00:05:24,960 --> 00:05:30,240 was kind of a function of like how hard 149 00:05:26,880 --> 00:05:31,740 it is to process data sets uh in certain 150 00:05:30,240 --> 00:05:33,960 scenarios with some sort of more 151 00:05:31,740 --> 00:05:36,180 traditional Technologies but today 152 00:05:33,960 --> 00:05:38,580 you'll see this is pretty relevant when 153 00:05:36,180 --> 00:05:41,460 we talk about analytics and the analytic 154 00:05:38,580 --> 00:05:43,080 power of clickhouse this is a really 155 00:05:41,460 --> 00:05:45,240 good example of where this starts to 156 00:05:43,080 --> 00:05:48,120 come in has anyone heard of Apache Kafka 157 00:05:45,240 --> 00:05:49,440 everyone and everyone that's definitely 158 00:05:48,120 --> 00:05:50,820 the type of question that everyone's 159 00:05:49,440 --> 00:05:52,919 supposed to raise their hand to it's 160 00:05:50,820 --> 00:05:55,500 just to see if everyone's awake but that 161 00:05:52,919 --> 00:05:59,400 concept of like you now have an 162 00:05:55,500 --> 00:06:01,259 unbounded continuous stream of data okay 163 00:05:59,400 --> 00:06:03,539 so how do you start to choose which 164 00:06:01,259 --> 00:06:05,460 event in this stream is going to be 165 00:06:03,539 --> 00:06:07,500 relevant okay and these kind of 166 00:06:05,460 --> 00:06:09,419 real-time streaming event-driven 167 00:06:07,500 --> 00:06:11,639 architectures as they're called really 168 00:06:09,419 --> 00:06:13,320 highlight that problem how do you make 169 00:06:11,639 --> 00:06:14,400 the decisions about which data you want 170 00:06:13,320 --> 00:06:15,960 to keep 171 00:06:14,400 --> 00:06:17,880 and it's an interesting thing to think 172 00:06:15,960 --> 00:06:19,800 about we've all sort of heard these 173 00:06:17,880 --> 00:06:21,979 terms does it go into a data warehouse a 174 00:06:19,800 --> 00:06:24,300 data Lake do you put it in a database 175 00:06:21,979 --> 
00:06:25,520 these things mean different things to 176 00:06:24,300 --> 00:06:28,020 different people 177 00:06:25,520 --> 00:06:29,580 my favorite follow-up question when 178 00:06:28,020 --> 00:06:30,960 anyone tells me they have a data 179 00:06:29,580 --> 00:06:31,979 warehouse is what do you think that 180 00:06:30,960 --> 00:06:34,680 means 181 00:06:31,979 --> 00:06:36,780 and I've heard answers from 182 00:06:34,680 --> 00:06:39,840 what I think an actual data warehouse is 183 00:06:36,780 --> 00:06:42,060 to I've run it in Postgres to we have S3 184 00:06:39,840 --> 00:06:44,580 like there's all sorts of things 185 00:06:42,060 --> 00:06:46,199 it's a really big umbrella term 186 00:06:44,580 --> 00:06:48,419 and so we want to talk a little bit 187 00:06:46,199 --> 00:06:51,780 about where does ClickHouse fit into 188 00:06:48,419 --> 00:06:55,080 this uh at Aiven we are fair to say 189 00:06:51,780 --> 00:06:57,960 pretty popular with Postgres we 190 00:06:55,080 --> 00:07:00,300 do love Postgres we run a lot of it like 191 00:06:57,960 --> 00:07:02,759 I said our first service we ever ran uh 192 00:07:00,300 --> 00:07:05,280 was Postgres but it's fair to say that 193 00:07:02,759 --> 00:07:07,740 Postgres is not the solution here okay 194 00:07:05,280 --> 00:07:10,139 you're going to run into problems just 195 00:07:07,740 --> 00:07:12,660 with size and storage on disk and stuff 196 00:07:10,139 --> 00:07:16,319 and that is because of a fundamental 197 00:07:12,660 --> 00:07:18,960 nature in the way that these things work 198 00:07:16,319 --> 00:07:20,580 um if you're not familiar with this like 199 00:07:18,960 --> 00:07:22,620 you've probably already all heard about 200 00:07:20,580 --> 00:07:25,139 this difference between transaction 201 00:07:22,620 --> 00:07:27,539 processing and analytic processing so 202 00:07:25,139 --> 00:07:29,580 let's imagine that online store scenario 203 00:07:27,539 --> 00:07:31,500 okay 
you've got a shopping cart you've 204 00:07:29,580 --> 00:07:34,020 got someone that comes in let's say that 205 00:07:31,500 --> 00:07:35,699 Catherine here she wants to come in and 206 00:07:34,020 --> 00:07:37,380 go shopping and there's an awful lot of 207 00:07:35,699 --> 00:07:38,940 data that Catherine might interact with 208 00:07:37,380 --> 00:07:40,620 but there's some interesting things 209 00:07:38,940 --> 00:07:42,240 right she might update her delivery 210 00:07:40,620 --> 00:07:44,819 preferences she might update her phone 211 00:07:42,240 --> 00:07:47,699 number her email address there is a row 212 00:07:44,819 --> 00:07:49,919 in a transactional database that she is 213 00:07:47,699 --> 00:07:52,259 going to be interacting with 214 00:07:49,919 --> 00:07:54,720 okay and the the way that that database 215 00:07:52,259 --> 00:07:58,139 works is built to make sure that you can 216 00:07:54,720 --> 00:08:01,020 find that row update that row very 217 00:07:58,139 --> 00:08:02,340 efficiently very quickly okay indexing 218 00:08:01,020 --> 00:08:04,020 and all this kind of stuff we'll talk a 219 00:08:02,340 --> 00:08:06,599 little bit about it but it really solves 220 00:08:04,020 --> 00:08:09,180 that problem Catherine wants to come on 221 00:08:06,599 --> 00:08:10,380 in and update a record that's particular 222 00:08:09,180 --> 00:08:12,780 to her 223 00:08:10,380 --> 00:08:15,720 she wants that to be in a transaction 224 00:08:12,780 --> 00:08:17,460 okay it needs to be 225 00:08:15,720 --> 00:08:19,440 um consistent 226 00:08:17,460 --> 00:08:22,020 then there's this different type of 227 00:08:19,440 --> 00:08:23,639 interaction that happens on this online 228 00:08:22,020 --> 00:08:26,400 store 229 00:08:23,639 --> 00:08:28,259 while Catherine is navigating that she's 230 00:08:26,400 --> 00:08:31,800 looking at products she's clicking on 231 00:08:28,259 --> 00:08:33,899 links okay she's coming in and out 232 00:08:31,800 --> 00:08:35,279 having 
different sessions there's all 233 00:08:33,899 --> 00:08:36,719 sorts of things she's adding things to 234 00:08:35,279 --> 00:08:38,580 carts she's removing things from cart 235 00:08:36,719 --> 00:08:40,620 she's might be clicking the buy button 236 00:08:38,580 --> 00:08:43,560 there's all these different events on 237 00:08:40,620 --> 00:08:45,240 this kind of online store that are 238 00:08:43,560 --> 00:08:48,060 different to that 239 00:08:45,240 --> 00:08:50,040 concept of like her single record and 240 00:08:48,060 --> 00:08:52,019 these events we typically want to start 241 00:08:50,040 --> 00:08:53,160 to do some kind of analytical processing 242 00:08:52,019 --> 00:08:55,019 on 243 00:08:53,160 --> 00:08:56,640 what is the most popular product people 244 00:08:55,019 --> 00:08:58,740 are looking at what is the most popular 245 00:08:56,640 --> 00:09:01,500 you know product that's been added to 246 00:08:58,740 --> 00:09:03,720 cart in the last 12 hours right these 247 00:09:01,500 --> 00:09:06,140 kind of things typically are different 248 00:09:03,720 --> 00:09:09,779 queries from a transactional database 249 00:09:06,140 --> 00:09:10,860 typically they aggregate large volumes 250 00:09:09,779 --> 00:09:14,519 of data 251 00:09:10,860 --> 00:09:16,200 okay typically the data is immutable 252 00:09:14,519 --> 00:09:18,480 this event stream data is pretty 253 00:09:16,200 --> 00:09:22,080 interesting about that typically 254 00:09:18,480 --> 00:09:24,120 and they don't retrieve an entire row of 255 00:09:22,080 --> 00:09:27,000 data okay they're going to aggregate 256 00:09:24,120 --> 00:09:29,180 across just a few Columns of that data 257 00:09:27,000 --> 00:09:29,180 set 258 00:09:29,820 --> 00:09:34,140 this is like sort of the main takeaway 259 00:09:32,220 --> 00:09:36,180 for you take away from what's the 260 00:09:34,140 --> 00:09:39,480 difference here between click house and 261 00:09:36,180 --> 00:09:41,399 a transactional database it's this 
slide 262 00:09:39,480 --> 00:09:44,220 okay if you're online you can flip to 263 00:09:41,399 --> 00:09:46,980 YouTube now uh that's that's it okay 264 00:09:44,220 --> 00:09:48,959 this is the thing is that when you're in 265 00:09:46,980 --> 00:09:50,700 an analytical processing database you 266 00:09:48,959 --> 00:09:52,920 usually only want to retrieve some of 267 00:09:50,700 --> 00:09:56,040 the data to do some aggregations across 268 00:09:52,920 --> 00:09:58,380 it and that leads us quite nicely into 269 00:09:56,040 --> 00:10:00,000 how ClickHouse does this and what 270 00:09:58,380 --> 00:10:03,300 ClickHouse is for 271 00:10:00,000 --> 00:10:05,160 so a brief history again for those like 272 00:10:03,300 --> 00:10:06,060 no one's really familiar with it 273 00:10:05,160 --> 00:10:08,760 ClickHouse 274 00:10:06,060 --> 00:10:11,279 that use case we just described that 275 00:10:08,760 --> 00:10:13,320 capturing web analytics was exactly what 276 00:10:11,279 --> 00:10:15,060 this database was designed and built 277 00:10:13,320 --> 00:10:19,140 for it's built by a company called 278 00:10:15,060 --> 00:10:21,000 Yandex in 2016 and open sourced out of 279 00:10:19,140 --> 00:10:22,920 the generosity of their heart for all of 280 00:10:21,000 --> 00:10:25,019 us to enjoy and love 281 00:10:22,920 --> 00:10:26,339 Yandex are the second largest web 282 00:10:25,019 --> 00:10:29,760 analytics platform in the world 283 00:10:26,339 --> 00:10:30,600 according to Wikipedia based out of 284 00:10:29,760 --> 00:10:32,880 Russia 285 00:10:30,600 --> 00:10:34,860 but that concept of like ingesting 286 00:10:32,880 --> 00:10:37,260 hundreds of millions and billions of 287 00:10:34,860 --> 00:10:39,120 rows of data really really quickly and 288 00:10:37,260 --> 00:10:40,800 then running aggregation queries on it 289 00:10:39,120 --> 00:10:42,420 is exactly what this technology was 290 00:10:40,800 --> 00:10:44,279 built for 291 00:10:42,420 --> 00:10:46,680 since 
then I thought it's really good to 292 00:10:44,279 --> 00:10:49,320 talk about like just the open source 293 00:10:46,680 --> 00:10:51,380 Community behind this project like the 294 00:10:49,320 --> 00:10:54,899 community's really taken this project 295 00:10:51,380 --> 00:10:56,880 and run with it there's a lot of 296 00:10:54,899 --> 00:10:58,620 total contributors there's a 297 00:10:56,880 --> 00:11:01,320 real strong collection of active 298 00:10:58,620 --> 00:11:03,540 contributors to this project and it's 299 00:11:01,320 --> 00:11:04,380 really becoming popular out there in the 300 00:11:03,540 --> 00:11:06,060 world 301 00:11:04,380 --> 00:11:08,220 it's one of those things that we don't 302 00:11:06,060 --> 00:11:11,700 take on lightly as well like 303 00:11:08,220 --> 00:11:13,860 there's fairly strong consistent criteria 304 00:11:11,700 --> 00:11:15,360 about us adopting a project to operate 305 00:11:13,860 --> 00:11:16,140 for our customers 306 00:11:15,360 --> 00:11:18,000 um 307 00:11:16,140 --> 00:11:20,160 and that's definitely one of them 308 00:11:18,000 --> 00:11:22,260 so we talked about this idea of what 309 00:11:20,160 --> 00:11:24,240 row-orientated data looks like in 310 00:11:22,260 --> 00:11:26,339 Postgres now when you sort of scratch 311 00:11:24,240 --> 00:11:28,980 underneath the surface what that means 312 00:11:26,339 --> 00:11:31,440 is that the rows themselves are 313 00:11:28,980 --> 00:11:33,720 typically stored in contiguous files on 314 00:11:31,440 --> 00:11:35,700 disks okay and so it means that if you 315 00:11:33,720 --> 00:11:37,440 want a row you can go and find that row 316 00:11:35,700 --> 00:11:39,779 and you're not scanning across different 317 00:11:37,440 --> 00:11:41,339 parts on disk you're actually getting to 318 00:11:39,779 --> 00:11:43,140 it really quickly and you can go right 319 00:11:41,339 --> 00:11:44,760 this row is this big I can just scan 320 00:11:43,140 --> 00:11:47,579 
this many bytes and I have that row of 321 00:11:44,760 --> 00:11:48,720 data okay this works really well but 322 00:11:47,579 --> 00:11:50,760 you can see if you just want to 323 00:11:48,720 --> 00:11:53,100 aggregate one column there's an awful 324 00:11:50,760 --> 00:11:56,700 lot of data you have to read off disk 325 00:11:53,100 --> 00:11:59,339 hold in memory and throw away okay it's 326 00:11:56,700 --> 00:12:00,540 wasteful in that kind of sense and so 327 00:11:59,339 --> 00:12:04,339 you're kind of reading the whole 328 00:12:00,540 --> 00:12:04,339 database to do an aggregation query 329 00:12:04,380 --> 00:12:10,560 where if you think about a column 330 00:12:07,440 --> 00:12:15,240 orientated database rather than storing 331 00:12:10,560 --> 00:12:17,339 rows you store columns as files and if 332 00:12:15,240 --> 00:12:19,800 you get into the uh you know workings of 333 00:12:17,339 --> 00:12:21,660 the ClickHouse database when you open up 334 00:12:19,800 --> 00:12:23,579 and see the file system this is exactly 335 00:12:21,660 --> 00:12:25,920 what you'll see you'll see an index 336 00:12:23,579 --> 00:12:28,440 you'll see a few other indexes in there 337 00:12:25,920 --> 00:12:31,740 for each table and then you'll 338 00:12:28,440 --> 00:12:34,500 see a file per column 339 00:12:31,740 --> 00:12:36,720 okay and that makes it really simple to 340 00:12:34,500 --> 00:12:39,000 say aggregate all the data in one column 341 00:12:36,720 --> 00:12:41,279 because now I only have to go and read a 342 00:12:39,000 --> 00:12:43,380 single file okay I don't have to pull 343 00:12:41,279 --> 00:12:45,600 that record out of individual files or 344 00:12:43,380 --> 00:12:48,060 places on disk 345 00:12:45,600 --> 00:12:51,180 and this is why this sort of really 346 00:12:48,060 --> 00:12:53,579 helps one of the sort of first kind of 347 00:12:51,180 --> 00:12:56,459 things that makes ClickHouse incredibly 348 00:12:53,579 --> 00:12:58,680 
performant for this is just the very 349 00:12:56,459 --> 00:13:01,620 simple fact that the way data is stored 350 00:12:58,680 --> 00:13:03,540 on disk is different and this columnar 351 00:13:01,620 --> 00:13:05,160 database is not brand new to 352 00:13:03,540 --> 00:13:06,839 ClickHouse there's heaps of other services that 353 00:13:05,160 --> 00:13:07,740 do this columnar stuff 354 00:13:06,839 --> 00:13:11,279 um 355 00:13:07,740 --> 00:13:12,540 what's interesting about that also is 356 00:13:11,279 --> 00:13:14,639 that this is one of the things that 357 00:13:12,540 --> 00:13:18,540 ClickHouse does really well is that now 358 00:13:14,639 --> 00:13:23,880 that you have Columns of data you also 359 00:13:18,540 --> 00:13:25,620 have Columns of the exact same data type 360 00:13:23,880 --> 00:13:27,480 and it's not immediately obvious but 361 00:13:25,620 --> 00:13:30,120 what ClickHouse allows you to do here is 362 00:13:27,480 --> 00:13:32,459 now you can specify individual 363 00:13:30,120 --> 00:13:34,380 compression algorithms that are suited 364 00:13:32,459 --> 00:13:36,779 for the particular data type that you're 365 00:13:34,380 --> 00:13:39,540 storing on disk so not only is 366 00:13:36,779 --> 00:13:41,760 ClickHouse incredibly fast at retrieving 367 00:13:39,540 --> 00:13:45,000 data it's incredibly efficient at 368 00:13:41,760 --> 00:13:46,740 storing it on disk like it's incredibly 369 00:13:45,000 --> 00:13:49,139 efficient there's a benchmark post 370 00:13:46,740 --> 00:13:51,360 somewhere on our blog that just talks 371 00:13:49,139 --> 00:13:54,120 about the size we benchmarked ClickHouse 372 00:13:51,360 --> 00:13:55,800 against a Postgres service on a data set 373 00:13:54,120 --> 00:13:56,880 of about four and a half billion rows of 374 00:13:55,800 --> 00:14:00,480 data 375 00:13:56,880 --> 00:14:02,480 and the size on disk was about half 376 00:14:00,480 --> 00:14:04,620 like and it's just because of this 377 00:14:02,480 --> 
00:14:06,660 efficiency that you get if you've got 378 00:14:04,620 --> 00:14:09,120 all integers or if you've got all time 379 00:14:06,660 --> 00:14:10,920 stamps or if you've got all floats you 380 00:14:09,120 --> 00:14:13,320 can start to specify individual 381 00:14:10,920 --> 00:14:16,920 compression algorithms on those as you 382 00:14:13,320 --> 00:14:18,839 write that to disk it's kind of useful 383 00:14:16,920 --> 00:14:20,519 here's an example of how you'd create 384 00:14:18,839 --> 00:14:22,920 that and sort of if you wanted to get 385 00:14:20,519 --> 00:14:24,899 down and dirty with it yourself when you 386 00:14:22,920 --> 00:14:26,940 create those tables you can specify 387 00:14:24,899 --> 00:14:29,779 individually how that data is going to 388 00:14:26,940 --> 00:14:29,779 be stored on disk 389 00:14:30,060 --> 00:14:34,920 the second thing to talk about is in 390 00:14:32,700 --> 00:14:37,380 this column orientated fashion and by 391 00:14:34,920 --> 00:14:38,940 the way this is a broad brush on just 392 00:14:37,380 --> 00:14:41,639 some of the things that ClickHouse does 393 00:14:38,940 --> 00:14:44,880 to make things incredibly efficient It 394 00:14:41,639 --> 00:14:46,199 Is by no means an exhaustive list so if 395 00:14:44,880 --> 00:14:47,940 you're interested in some of this stuff 396 00:14:46,199 --> 00:14:49,620 then I definitely recommend hitting the 397 00:14:47,940 --> 00:14:52,380 documentation for ClickHouse 398 00:14:49,620 --> 00:14:54,300 but ClickHouse has a primary index who 399 00:14:52,380 --> 00:14:56,399 knows what a primary index is 400 00:14:54,300 --> 00:14:57,839 yep again another one just to check 401 00:14:56,399 --> 00:14:58,500 everyone's awake 402 00:14:57,839 --> 00:15:01,139 um 403 00:14:58,500 --> 00:15:02,880 so there is a primary index in 404 00:15:01,139 --> 00:15:04,740 ClickHouse one of the first surprising things 405 00:15:02,880 --> 00:15:07,940 about a primary index that I found is 406 00:15:04,740 --> 
00:15:07,940 that it's not unique 407 00:15:08,399 --> 00:15:11,880 there is no requirement for uniqueness 408 00:15:10,139 --> 00:15:13,560 in your primary index 409 00:15:11,880 --> 00:15:16,800 uh the second thing 410 00:15:13,560 --> 00:15:20,339 is it's what's called a sparse index and 411 00:15:16,800 --> 00:15:23,339 so what that means is for every 8192 412 00:15:20,339 --> 00:15:26,120 rows of data you have an entry in your 413 00:15:23,339 --> 00:15:26,120 primary index 414 00:15:26,459 --> 00:15:29,820 the thing again that's not immediately 415 00:15:28,320 --> 00:15:32,040 obvious this means that this has got 416 00:15:29,820 --> 00:15:35,220 some kind of sort order in it 417 00:15:32,040 --> 00:15:37,260 okay now this sounds really odd 418 00:15:35,220 --> 00:15:39,120 but then you think about what analytic 419 00:15:37,260 --> 00:15:42,000 queries do 420 00:15:39,120 --> 00:15:44,160 okay so analytic queries probably don't 421 00:15:42,000 --> 00:15:45,839 go and fetch a single line of data from 422 00:15:44,160 --> 00:15:48,420 the database 423 00:15:45,839 --> 00:15:51,240 okay analytic queries probably aggregate 424 00:15:48,420 --> 00:15:53,820 over millions or hundreds of millions or 425 00:15:51,240 --> 00:15:55,800 billions of rows of data 426 00:15:53,820 --> 00:15:58,320 and if you think let's call that 427 00:15:55,800 --> 00:16:01,800 ten thousand because the maths is easier 428 00:15:58,320 --> 00:16:05,339 there's now a hundred rows 429 00:16:01,800 --> 00:16:08,040 in the index for every million records 430 00:16:05,339 --> 00:16:10,380 okay and so that when I want to go and 431 00:16:08,040 --> 00:16:11,699 scan a subset of data I can go and pick 432 00:16:10,380 --> 00:16:13,920 out blocks 433 00:16:11,699 --> 00:16:17,100 um they're called granules of data 434 00:16:13,920 --> 00:16:19,079 and sort of read that into memory 435 00:16:17,100 --> 00:16:20,399 and process it really quickly one of the 
436 00:16:19,079 --> 00:16:22,500 things you'll find when you start to 437 00:16:20,399 --> 00:16:25,320 work with ClickHouse it's very very good 438 00:16:22,500 --> 00:16:26,880 at figuring out exactly what data to 439 00:16:25,320 --> 00:16:29,040 read into memory 440 00:16:26,880 --> 00:16:31,740 okay and it's very good at leaving 441 00:16:29,040 --> 00:16:34,560 data on disk if it doesn't need it 442 00:16:31,740 --> 00:16:37,560 this sparse index is one of those things 443 00:16:34,560 --> 00:16:40,560 now you can specify different primary 444 00:16:37,560 --> 00:16:43,699 keys and sort orders on tables typically 445 00:16:40,560 --> 00:16:43,699 they're the same value 446 00:16:44,480 --> 00:16:48,899 8192 apparently makes more sense for 447 00:16:46,980 --> 00:16:52,019 those that think in binary 448 00:16:48,899 --> 00:16:54,300 I'm one of the 10 people I think in 449 00:16:52,019 --> 00:16:56,459 normal numbers 450 00:16:54,300 --> 00:16:58,860 um and it just sort of makes that a bit 451 00:16:56,459 --> 00:17:01,680 easier to look at the second 452 00:16:58,860 --> 00:17:03,240 thing that you can Define one of the 453 00:17:01,680 --> 00:17:07,980 other things I should say that you can 454 00:17:03,240 --> 00:17:10,439 Define in ClickHouse is a skip index 455 00:17:07,980 --> 00:17:11,819 um now it's kind of analogous if 456 00:17:10,439 --> 00:17:13,740 you're familiar with sort of more 457 00:17:11,819 --> 00:17:16,439 relational data structures to defining 458 00:17:13,740 --> 00:17:17,880 secondary indexes on your database so if 459 00:17:16,439 --> 00:17:19,260 you're familiar with like a primary 460 00:17:17,880 --> 00:17:21,000 index and then you can define a 461 00:17:19,260 --> 00:17:22,919 secondary index to say help search 462 00:17:21,000 --> 00:17:25,740 performance or something like that in a 463 00:17:22,919 --> 00:17:26,939 Postgres service it's kind of analogous 464 00:17:25,740 --> 00:17:27,540 to that 465 00:17:26,939 
--> 00:17:31,320 um 466 00:17:27,540 --> 00:17:34,320 but remember we don't store data in rows 467 00:17:31,320 --> 00:17:36,000 so having a secondary index on that row 468 00:17:34,320 --> 00:17:38,820 of data sort of doesn't really make 469 00:17:36,000 --> 00:17:40,559 sense so what you can Define is a thing 470 00:17:38,820 --> 00:17:44,539 called a skip index 471 00:17:40,559 --> 00:17:47,100 and Skip index pardon me skip indexes 472 00:17:44,539 --> 00:17:49,440 allows you to sort of tell the the 473 00:17:47,100 --> 00:17:52,740 clickhouse query engine is that look you 474 00:17:49,440 --> 00:17:54,539 can go to this index and skip over a 475 00:17:52,740 --> 00:17:56,940 certain amount of data 476 00:17:54,539 --> 00:17:58,620 all right inside that it defines what 477 00:17:56,940 --> 00:18:00,480 you can skip when you're processing it 478 00:17:58,620 --> 00:18:03,120 and a good example of something like 479 00:18:00,480 --> 00:18:05,539 this might be um 480 00:18:03,120 --> 00:18:05,539 sorry 481 00:18:07,679 --> 00:18:13,679 you think about like you've got a Time 482 00:18:10,740 --> 00:18:15,360 series data right that let's go back to 483 00:18:13,679 --> 00:18:17,580 that online web store it's a great 484 00:18:15,360 --> 00:18:19,679 example you are going to have time 485 00:18:17,580 --> 00:18:21,120 series data excellent use case for your 486 00:18:19,679 --> 00:18:22,320 primary key it's going to be ordered 487 00:18:21,120 --> 00:18:23,580 it's going to be blocked it's going to 488 00:18:22,320 --> 00:18:25,980 be searchable right all that kind of 489 00:18:23,580 --> 00:18:28,500 stuff but then you might say well you 490 00:18:25,980 --> 00:18:30,299 know what we've got a pretty well-known 491 00:18:28,500 --> 00:18:33,419 requirement that we want to aggregate 492 00:18:30,299 --> 00:18:35,220 every 403 error 493 00:18:33,419 --> 00:18:37,440 okay we want to know when people are 494 00:18:35,220 --> 00:18:39,059 trying to log in and it's failing 
and we 495 00:18:37,440 --> 00:18:41,760 want to we want to be able to aggregate 496 00:18:39,059 --> 00:18:44,220 that so that kind of HTTP error might be 497 00:18:41,760 --> 00:18:46,140 make a really good skip index you want 498 00:18:44,220 --> 00:18:48,539 to say we're going to query this table a 499 00:18:46,140 --> 00:18:50,760 lot for all the 403 errors and it will 500 00:18:48,539 --> 00:18:54,120 keep a secondary index on disk about 501 00:18:50,760 --> 00:18:56,220 wherein each block the 403 errors live 502 00:18:54,120 --> 00:18:57,780 and so you can go and query the primary 503 00:18:56,220 --> 00:18:59,039 index it'll query the secondary index 504 00:18:57,780 --> 00:19:00,360 and it'll only pull that data together 505 00:18:59,039 --> 00:19:02,400 for that query 506 00:19:00,360 --> 00:19:05,039 there's a lot more to talk about with 507 00:19:02,400 --> 00:19:06,900 skip indexes this is sort of one of the 508 00:19:05,039 --> 00:19:08,580 the interesting pieces of clickhouse 509 00:19:06,900 --> 00:19:10,020 that when you get right into the details 510 00:19:08,580 --> 00:19:11,700 you'll start to talk more about it 511 00:19:10,020 --> 00:19:14,280 there's a number of sort of other 512 00:19:11,700 --> 00:19:15,419 mechanisms around indexing that's 513 00:19:14,280 --> 00:19:17,940 worthwhile to understand about 514 00:19:15,419 --> 00:19:19,679 clickhouse but like where I find that's 515 00:19:17,940 --> 00:19:21,600 really interesting is that all these 516 00:19:19,679 --> 00:19:24,299 things are kind of focused on 517 00:19:21,600 --> 00:19:27,360 how to take a query and not retrieve 518 00:19:24,299 --> 00:19:28,860 data from disk and it's very efficient 519 00:19:27,360 --> 00:19:31,320 at that 520 00:19:28,860 --> 00:19:32,820 and so to answer finally answer the sort 521 00:19:31,320 --> 00:19:34,140 of the fourth part of this what I 522 00:19:32,820 --> 00:19:36,299 thought was important to talk about in 523 00:19:34,140 --> 00:19:39,240 the 
time we had today when we talk about 524 00:19:36,299 --> 00:19:41,280 the speed is like we've covered it's a 525 00:19:39,240 --> 00:19:44,100 columnar database and the 526 00:19:41,280 --> 00:19:45,660 advantages of that for querying we've 527 00:19:44,100 --> 00:19:47,220 talked about you know the data 528 00:19:45,660 --> 00:19:48,539 compression by type which is kind of 529 00:19:47,220 --> 00:19:51,299 used we've talked about this physical 530 00:19:48,539 --> 00:19:53,220 sparse index the primary keys and then 531 00:19:51,299 --> 00:19:56,100 the last thing is this vectorized query 532 00:19:53,220 --> 00:19:59,039 execution which is a very long-winded 533 00:19:56,100 --> 00:20:01,380 way of saying that 534 00:19:59,039 --> 00:20:03,840 it uses multiple processors 535 00:20:01,380 --> 00:20:05,280 okay so because it was written in the 536 00:20:03,840 --> 00:20:07,980 world of 537 00:20:05,280 --> 00:20:10,140 um multi-processor computers they've 538 00:20:07,980 --> 00:20:12,720 understood that they can take a column 539 00:20:10,140 --> 00:20:15,120 of data and separate it 540 00:20:12,720 --> 00:20:18,660 and run it on the processors that are 541 00:20:15,120 --> 00:20:20,640 available and aggregate the result uh 542 00:20:18,660 --> 00:20:22,020 in the end there and there's a lot of 543 00:20:20,640 --> 00:20:23,220 stuff that happens underneath the covers 544 00:20:22,020 --> 00:20:24,720 this is one of those nice things about 545 00:20:23,220 --> 00:20:27,480 clickhouse that you never have to worry 546 00:20:24,720 --> 00:20:30,299 about it just does it and so it really 547 00:20:27,480 --> 00:20:32,460 takes advantage of 548 00:20:30,299 --> 00:20:33,780 the hardware that's available uh to the 549 00:20:32,460 --> 00:20:37,020 servers 550 00:20:33,780 --> 00:20:38,760 so it's really nice and again I sort of 551 00:20:37,020 --> 00:20:39,960 mentioned the Benchmark that we did it's 552 00:20:38,760 -->
00:20:42,360 worthwhile if you're really interested 553 00:20:39,960 --> 00:20:44,340 to jump out and have a look at it while 554 00:20:42,360 --> 00:20:45,960 you're looking at benchmarks I would 555 00:20:44,340 --> 00:20:47,460 encourage you to all to take that with 556 00:20:45,960 --> 00:20:48,380 the largest grain of salt that you can 557 00:20:47,460 --> 00:20:50,400 find 558 00:20:48,380 --> 00:20:53,580 benchmarks make an awful lot of 559 00:20:50,400 --> 00:20:55,440 assumptions about data 560 00:20:53,580 --> 00:20:57,179 and so the data set that we've 561 00:20:55,440 --> 00:20:59,700 benchmarked our stuff on might not be 562 00:20:57,179 --> 00:21:01,320 the data set that you want to run on 563 00:20:59,700 --> 00:21:04,200 um but it's it's an interesting sort of 564 00:21:01,320 --> 00:21:06,840 heuristic to think about 565 00:21:04,200 --> 00:21:09,840 so let's talk a little bit about uh 566 00:21:06,840 --> 00:21:12,240 let's see click house in action 567 00:21:09,840 --> 00:21:14,820 um and and we'll we'll see how we can 568 00:21:12,240 --> 00:21:16,020 run some queries and stuff here oh yeah 569 00:21:14,820 --> 00:21:18,720 so I was going to tell you what what 570 00:21:16,020 --> 00:21:21,179 we're going to show let's do that 571 00:21:18,720 --> 00:21:22,320 oh yeah there's a data set that is all 572 00:21:21,179 --> 00:21:24,960 the menus 573 00:21:22,320 --> 00:21:28,500 uh for restaurants in New York from like 574 00:21:24,960 --> 00:21:30,120 1850 uh onwards it's on the clickhouse 575 00:21:28,500 --> 00:21:32,580 documentation there's a whole bunch of 576 00:21:30,120 --> 00:21:35,280 test data sets uh this data set I think 577 00:21:32,580 --> 00:21:38,100 it's about 1.3 million rows so it's not 578 00:21:35,280 --> 00:21:41,900 massively huge but it's kind of fun to 579 00:21:38,100 --> 00:21:41,900 play with it'll work today 580 00:21:42,780 --> 00:21:46,380 um 581 00:21:44,340 --> 00:21:48,780 I am going to use 582 00:21:46,380 --> 
00:21:52,140 my clickhouse client today 583 00:21:48,780 --> 00:21:53,820 excellent okay so let's start with just 584 00:21:52,140 --> 00:21:56,340 sort of have a look at what we've done 585 00:21:53,820 --> 00:21:58,500 here so the first thing to note is that 586 00:21:56,340 --> 00:22:00,360 the clickhouse data set comes with four 587 00:21:58,500 --> 00:22:02,700 tables 588 00:22:00,360 --> 00:22:04,380 and we've created this denormalized 589 00:22:02,700 --> 00:22:05,580 table here as well 590 00:22:04,380 --> 00:22:08,820 okay 591 00:22:05,580 --> 00:22:10,740 remember what we spoke about 592 00:22:08,820 --> 00:22:12,720 the difference between analytical 593 00:22:10,740 --> 00:22:14,640 processing and transaction processing 594 00:22:12,720 --> 00:22:17,039 how one's row based and one's column 595 00:22:14,640 --> 00:22:18,840 based the fact that if you've got column 596 00:22:17,039 --> 00:22:22,260 based data sets and column based 597 00:22:18,840 --> 00:22:24,299 processing means that really wide data 598 00:22:22,260 --> 00:22:26,820 tables with plenty like hundreds and 599 00:22:24,299 --> 00:22:30,179 hundreds and thousands of columns 600 00:22:26,820 --> 00:22:32,460 work really well in Click house 601 00:22:30,179 --> 00:22:34,020 it works really really well allows you 602 00:22:32,460 --> 00:22:35,400 index it allows you to query it back 603 00:22:34,020 --> 00:22:38,400 remember we're only ever going to pull 604 00:22:35,400 --> 00:22:40,020 columns so you get no penalty at all for 605 00:22:38,400 --> 00:22:41,580 having denormalized data so it's a 606 00:22:40,020 --> 00:22:43,020 really good thing to do so what's 607 00:22:41,580 --> 00:22:45,179 happened here as part of the script is 608 00:22:43,020 --> 00:22:47,900 we've taken that and we've denormalized 609 00:22:45,179 --> 00:22:47,900 that data set 610 00:22:49,380 --> 00:22:54,360 um so let's have a look at some queries 611 00:22:52,080 --> 00:22:56,100 and because no one wants to watch 
me fat 612 00:22:54,360 --> 00:22:58,020 finger a keyboard 613 00:22:56,100 --> 00:23:00,179 I'm going to copy and paste that so 614 00:22:58,020 --> 00:23:03,419 let's take a quick query and run that 615 00:23:00,179 --> 00:23:06,299 here so what have we got we've taken 616 00:23:03,419 --> 00:23:08,659 per decade a count and average price of 617 00:23:06,299 --> 00:23:11,100 the menu and you'll see 618 00:23:08,659 --> 00:23:13,380 that things were pretty stable for a 619 00:23:11,100 --> 00:23:15,780 while and then things got expensive if 620 00:23:13,380 --> 00:23:18,179 anyone's been to Manhattan lately you 621 00:23:15,780 --> 00:23:20,039 probably understand this to be true what 622 00:23:18,179 --> 00:23:21,720 is really interesting about clickhouse 623 00:23:20,039 --> 00:23:24,240 it's probably hard to see from up the 624 00:23:21,720 --> 00:23:26,460 back especially this is what I kind of 625 00:23:24,240 --> 00:23:29,159 like down the bottom that took point so 626 00:23:26,460 --> 00:23:31,320 two tenths of a second to process 1.3 627 00:23:29,159 --> 00:23:34,919 million rows of data and it actually 628 00:23:31,320 --> 00:23:36,840 scanned in 54 Meg of data which is by no 629 00:23:34,919 --> 00:23:38,880 means the entire data set 630 00:23:36,840 --> 00:23:41,460 okay and when you start to play with 631 00:23:38,880 --> 00:23:42,600 clickhouse this kind of response works 632 00:23:41,460 --> 00:23:45,120 really well 633 00:23:42,600 --> 00:23:47,460 think about the use case where this is 634 00:23:45,120 --> 00:23:49,440 really well deployed into your business 635 00:23:47,460 --> 00:23:51,600 okay what's really fantastic about 636 00:23:49,440 --> 00:23:53,700 clickhouse is real-time analytics 637 00:23:51,600 --> 00:23:55,559 especially if you want to expose this 638 00:23:53,700 --> 00:23:57,419 out to say your customers or to people 639 00:23:55,559 --> 00:24:00,419 on a mobile phone or you want to have 640 00:23:57,419 --> 00:24:02,700 that
analytics engine out in your 641 00:24:00,419 --> 00:24:06,600 customers world where they can sort of 642 00:24:02,700 --> 00:24:08,280 query it themselves or you know use a UI 643 00:24:06,600 --> 00:24:09,900 to pull data back that's relevant to 644 00:24:08,280 --> 00:24:12,900 them and they get a really performant 645 00:24:09,900 --> 00:24:12,900 experience 646 00:24:13,860 --> 00:24:20,400 um let's have a little look at dishes 647 00:24:17,159 --> 00:24:22,020 what could you buy that had potatoes in 648 00:24:20,400 --> 00:24:24,539 it in 1850 649 00:24:22,020 --> 00:24:27,720 in 1850 in New York you could have 650 00:24:24,539 --> 00:24:32,340 mashed baked plain or boiled 651 00:24:27,720 --> 00:24:35,100 so culinary Extravaganza in 1850s with 652 00:24:32,340 --> 00:24:38,280 potatoes let's have a look let's see 653 00:24:35,100 --> 00:24:40,440 what's happened in 2010 did chefs get 654 00:24:38,280 --> 00:24:42,360 any more imaginative 655 00:24:40,440 --> 00:24:45,720 um let's look at that 656 00:24:42,360 --> 00:24:48,140 they got more imaginative and uh a 657 00:24:45,720 --> 00:24:50,460 little bit wordier 658 00:24:48,140 --> 00:24:52,320 and it's probably about here they 659 00:24:50,460 --> 00:24:54,179 realize that menus are printed on A4 660 00:24:52,320 --> 00:24:56,100 pieces of paper 661 00:24:54,179 --> 00:24:57,840 so but is that yeah is that just 662 00:24:56,100 --> 00:24:59,640 something that's relevant to potatoes or 663 00:24:57,840 --> 00:25:01,380 did we spot maybe is there a trend there 664 00:24:59,640 --> 00:25:03,120 so let's go and have a look 665 00:25:01,380 --> 00:25:06,480 um can we pull out like what's the 666 00:25:03,120 --> 00:25:09,200 length the average length of menu titles 667 00:25:06,480 --> 00:25:09,200 across the years 668 00:25:09,480 --> 00:25:13,140 and as we support yes in the 2000s 669 00:25:11,700 --> 00:25:15,539 people realize that perhaps we're 670 00:25:13,140 --> 00:25:17,820 getting out of mind uh and 
we've started 671 00:25:15,539 --> 00:25:19,980 to drop back so some fun things uh to 672 00:25:17,820 --> 00:25:21,720 play around with clickhouse again 673 00:25:19,980 --> 00:25:23,760 we're looking at that two hundredths of 674 00:25:21,720 --> 00:25:25,200 a second row these and these are great 675 00:25:23,760 --> 00:25:27,000 like if you look at some of the queries 676 00:25:25,200 --> 00:25:28,640 of what you're doing here you don't 677 00:25:27,000 --> 00:25:30,240 count roundings 678 00:25:28,640 --> 00:25:32,039 extracting things you're running 679 00:25:30,240 --> 00:25:34,620 functions like they are they're queries 680 00:25:32,039 --> 00:25:36,299 that are scanning the entire data set 681 00:25:34,620 --> 00:25:39,840 and this is sort of a really good 682 00:25:36,299 --> 00:25:43,080 example of a queries that work better in 683 00:25:39,840 --> 00:25:45,600 an analytical processing scenario and B 684 00:25:43,080 --> 00:25:48,480 things that pardon me 685 00:25:45,600 --> 00:25:49,919 uh you know that how clickhouse works 686 00:25:48,480 --> 00:25:52,020 and and how it can make some fun stuff 687 00:25:49,919 --> 00:25:53,520 there as well 688 00:25:52,020 --> 00:25:54,419 my favorite let's see if we can figure 689 00:25:53,520 --> 00:25:56,760 out 690 00:25:54,419 --> 00:25:59,760 what the average price of a beer in 691 00:25:56,760 --> 00:26:02,779 uh was and it looks like things got 692 00:25:59,760 --> 00:26:05,820 really expensive there again at 693 00:26:02,779 --> 00:26:07,380 1980s for a buck for a beer 694 00:26:05,820 --> 00:26:08,880 um I don't know I don't trust the data 695 00:26:07,380 --> 00:26:10,679 on that one for some reason I'm not sure 696 00:26:08,880 --> 00:26:11,580 what happened to 28 dollars for beers 697 00:26:10,679 --> 00:26:13,559 but anyway 698 00:26:11,580 --> 00:26:15,059 uh again just some fun things to have 699 00:26:13,559 --> 00:26:17,159 but I definitely encourage you to go and 700 00:26:15,059 --> 00:26:18,600 
have a look at the clickhouse data set 701 00:26:17,159 --> 00:26:20,700 there's a lot of different ones in there 702 00:26:18,600 --> 00:26:22,200 there's uh all the rides on taxis and 703 00:26:20,700 --> 00:26:24,299 stuff there and you can really 704 00:26:22,200 --> 00:26:27,179 sort of get a feel for how to set this 705 00:26:24,299 --> 00:26:28,500 up and load data ingesting data actually 706 00:26:27,179 --> 00:26:30,000 is one of those things that's really 707 00:26:28,500 --> 00:26:30,779 really useful 708 00:26:30,000 --> 00:26:33,299 um 709 00:26:30,779 --> 00:26:35,580 clickhouse is very good at the 710 00:26:33,299 --> 00:26:37,020 the interesting thing about having a 711 00:26:35,580 --> 00:26:37,799 data set 712 00:26:37,020 --> 00:26:40,020 um 713 00:26:37,799 --> 00:26:42,900 like in columnar format and clickhouse 714 00:26:40,020 --> 00:26:45,539 does this very well is that it's really 715 00:26:42,900 --> 00:26:46,919 easy to write data to disk and so if 716 00:26:45,539 --> 00:26:49,559 you're going to load data in and you're 717 00:26:46,919 --> 00:26:52,260 going to set this up in production like 718 00:26:49,559 --> 00:26:55,500 inserting a single row at a time is a 719 00:26:52,260 --> 00:26:57,539 bad idea when you insert data into 720 00:26:55,500 --> 00:27:00,419 clickhouse it takes it and writes it to 721 00:26:57,539 --> 00:27:02,220 disk and then comes back in time and 722 00:27:00,419 --> 00:27:04,260 merges it in and 723 00:27:02,220 --> 00:27:06,299 fills it out and does the things behind 724 00:27:04,260 --> 00:27:08,400 the scenes but its first job is to get 725 00:27:06,299 --> 00:27:10,620 the data and write it to disk 726 00:27:08,400 --> 00:27:12,779 okay so if you're going to write one row 727 00:27:10,620 --> 00:27:15,360 at a time you're gonna get a lot of 728 00:27:12,779 --> 00:27:17,400 these really really small part files 729 00:27:15,360 --> 00:27:19,140 inside your clickhouse data they 730
00:27:17,400 --> 00:27:19,980 recommend like a million records at a 731 00:27:19,140 --> 00:27:22,380 time 732 00:27:19,980 --> 00:27:25,760 to ingest data and so it's a really 733 00:27:22,380 --> 00:27:25,760 really neat kind of scenario 734 00:27:25,799 --> 00:27:31,580 one of the things that we didn't see 735 00:27:27,659 --> 00:27:31,580 pardon me I'll go through this again 736 00:27:35,279 --> 00:27:39,299 this concept these are the data tables 737 00:27:37,679 --> 00:27:41,100 that we're just looking at there's an 738 00:27:39,299 --> 00:27:42,960 interesting thing that might 739 00:27:41,100 --> 00:27:46,500 be new it was new to me this concept of 740 00:27:42,960 --> 00:27:48,539 a table engine and so when you define a 741 00:27:46,500 --> 00:27:53,760 table in clickhouse you can actually 742 00:27:48,539 --> 00:27:55,260 Define per table how data is stored on 743 00:27:53,760 --> 00:27:56,580 disk like how are you going to treat 744 00:27:55,260 --> 00:27:57,419 that data and there's a number of 745 00:27:56,580 --> 00:28:01,559 different 746 00:27:57,419 --> 00:28:03,720 table engines the merge tree family of 747 00:28:01,559 --> 00:28:06,059 table engines is by far and away the 748 00:28:03,720 --> 00:28:08,520 most common and popular one it's the one 749 00:28:06,059 --> 00:28:11,580 that's for the scenarios we've discussed 750 00:28:08,520 --> 00:28:14,580 is uh is exactly what that's built for 751 00:28:11,580 --> 00:28:15,900 for ingest and analytic queries there's 752 00:28:14,580 --> 00:28:18,360 a few others we'll talk about a couple 753 00:28:15,900 --> 00:28:21,600 others in a second this replicated 754 00:28:18,360 --> 00:28:24,020 merge tree is just a merge tree but also 755 00:28:21,600 --> 00:28:27,000 one that is replicated across multiple nodes 756 00:28:24,020 --> 00:28:28,740 clickhouse supports replication across 757 00:28:27,000 --> 00:28:30,299 multiple nodes for high availability and 758 00:28:28,740 --> 00:28:33,179
all that kind of stuff 759 00:28:30,299 --> 00:28:35,760 I did speak a bunch about primary Keys 760 00:28:33,179 --> 00:28:38,640 remember and you'll notice there's no 761 00:28:35,760 --> 00:28:40,440 Declaration of a primary key there but 762 00:28:38,640 --> 00:28:42,960 there's an order by ID 763 00:28:40,440 --> 00:28:45,720 okay so whenever you specify that sort 764 00:28:42,960 --> 00:28:48,179 order order by this ID you're telling it 765 00:28:45,720 --> 00:28:49,679 that I want it to be a primary key as 766 00:28:48,179 --> 00:28:51,299 well so we're going to do the index on 767 00:28:49,679 --> 00:28:52,440 that sort order 768 00:28:51,299 --> 00:28:56,419 um 769 00:28:52,440 --> 00:28:56,419 just for those that are playing at home 770 00:28:56,480 --> 00:29:00,960 some other things that clickhouse 771 00:28:58,980 --> 00:29:03,000 does is allow some really interesting 772 00:29:00,960 --> 00:29:05,640 type of data formats to pull data into 773 00:29:03,000 --> 00:29:07,140 the database as well there's a lot of 774 00:29:05,640 --> 00:29:09,120 different functions there's a lot of 775 00:29:07,140 --> 00:29:12,720 ways to pull data from other data 776 00:29:09,120 --> 00:29:15,500 sources into clickhouse 777 00:29:12,720 --> 00:29:18,500 Parquet Avro 778 00:29:15,500 --> 00:29:21,059 CSV TSV 779 00:29:18,500 --> 00:29:22,679 protobuf all these kind of things are 780 00:29:21,059 --> 00:29:24,179 available for you to pull in there's an 781 00:29:22,679 --> 00:29:25,500 awful lot of formats to pull into the 782 00:29:24,179 --> 00:29:28,679 database and the clickhouse 783 00:29:25,500 --> 00:29:31,740 engineers have made that very easy to do 784 00:29:28,679 --> 00:29:35,000 you can pull data from S3 you can pull 785 00:29:31,740 --> 00:29:35,000 data from all different sources 786 00:29:35,580 --> 00:29:40,740 we spoke about that this is a good 787 00:29:38,520 --> 00:29:43,260 example this is what we did run okay so 788 00:29:40,740 -->
00:29:44,940 this is how we just pull and denormalize 789 00:29:43,260 --> 00:29:47,039 that data once we got it in there we 790 00:29:44,940 --> 00:29:50,039 just selected 791 00:29:47,039 --> 00:29:51,299 um create the table and you can sort of 792 00:29:50,039 --> 00:29:53,880 create 793 00:29:51,299 --> 00:29:55,679 as a select it's kind of a neat little 794 00:29:53,880 --> 00:29:57,299 way of building data inside 795 00:29:55,679 --> 00:29:59,279 clickhouse as well there's an awful lot 796 00:29:57,299 --> 00:30:01,020 of different ways to do this functions 797 00:29:59,279 --> 00:30:03,000 or table engines and stuff we won't sort 798 00:30:01,020 --> 00:30:04,860 of talk about that 799 00:30:03,000 --> 00:30:06,419 but what about that scenario we touched 800 00:30:04,860 --> 00:30:10,080 on as like well you've got this stream 801 00:30:06,419 --> 00:30:11,820 of data in Apache Kafka it's unbounded 802 00:30:10,080 --> 00:30:15,179 it's never going to stop how do you get 803 00:30:11,820 --> 00:30:17,159 that across into this world of I want to 804 00:30:15,179 --> 00:30:20,039 be able to run a table 805 00:30:17,159 --> 00:30:21,480 um and do some real-time stuff well 806 00:30:20,039 --> 00:30:24,360 there's this thing in clickhouse called 807 00:30:21,480 --> 00:30:26,580 the Kafka table engine which connects to 808 00:30:24,360 --> 00:30:28,440 a topic inside Apache Kafka and 809 00:30:26,580 --> 00:30:30,179 represents itself as a consumer group on 810 00:30:28,440 --> 00:30:31,080 that topic 811 00:30:30,179 --> 00:30:32,760 um 812 00:30:31,080 --> 00:30:34,500 what that means what's 813 00:30:32,760 --> 00:30:36,360 interesting about that is that you sort 814 00:30:34,500 --> 00:30:38,279 of have to build inside your world a 815 00:30:36,360 --> 00:30:40,559 materialized view welcome to another 816 00:30:38,279 --> 00:30:43,260 table engine uh inside 817 00:30:40,559 --> 00:30:44,760 clickhouse because and for those of us 818 00:30:43,260
--> 00:30:46,679 that have played with Kafka a little bit 819 00:30:44,760 --> 00:30:48,059 you sort of get to read data from a 820 00:30:46,679 --> 00:30:50,700 topic once 821 00:30:48,059 --> 00:30:53,039 and then you advance the offset and so 822 00:30:50,700 --> 00:30:55,919 the idea behind the way clickhouse works 823 00:30:53,039 --> 00:30:57,899 is the Kafka engine reads data once 824 00:30:55,919 --> 00:30:59,100 the materialized view takes that data 825 00:30:57,899 --> 00:31:02,279 and puts it in a table that you're going 826 00:30:59,100 --> 00:31:04,799 to query multiple times and then your 827 00:31:02,279 --> 00:31:07,159 Kafka engine is ready to read data 828 00:31:04,799 --> 00:31:09,659 at the latest offset in Kafka because 829 00:31:07,159 --> 00:31:10,799 with Kafka you don't want to be reading off 830 00:31:09,659 --> 00:31:13,020 disk 831 00:31:10,799 --> 00:31:14,760 uh it's an interesting sort of pattern 832 00:31:13,020 --> 00:31:16,020 uh here at Aiven it's one of those things 833 00:31:14,760 --> 00:31:17,940 we like to make easier those 834 00:31:16,020 --> 00:31:20,039 Integrations so you can click a 835 00:31:17,940 --> 00:31:23,039 couple of buttons and get that done 836 00:31:20,039 --> 00:31:25,020 this is just an example of running those 837 00:31:23,039 --> 00:31:28,940 kind of things I mean 838 00:31:25,020 --> 00:31:28,940 this is just a recording of it working 839 00:31:29,539 --> 00:31:34,980 so if it's just a recording is it really 840 00:31:32,760 --> 00:31:36,779 working uh no that's just kind of 841 00:31:34,980 --> 00:31:39,360 how this works this is a good example 842 00:31:36,779 --> 00:31:41,640 of running real-time analytics off a 843 00:31:39,360 --> 00:31:43,620 real-time data stream that you want to 844 00:31:41,640 --> 00:31:44,880 do things like expose back to customers 845 00:31:43,620 --> 00:31:47,279 or to different internal business 846 00:31:44,880 --> 00:31:49,140 stakeholders these kind of
things really 847 00:31:47,279 --> 00:31:51,539 work really well 848 00:31:49,140 --> 00:31:53,159 um and it's a nice pattern 849 00:31:51,539 --> 00:31:55,080 to get used to 850 00:31:53,159 --> 00:31:57,539 the other sort of postgres or the other 851 00:31:55,080 --> 00:31:59,880 table engine we support at Aiven is this 852 00:31:57,539 --> 00:32:02,580 concept of reading data from postgres so 853 00:31:59,880 --> 00:32:03,320 I'll go back to our online store analogy 854 00:32:02,580 --> 00:32:06,659 again 855 00:32:03,320 --> 00:32:09,240 Catherine was looking at product ID 138 856 00:32:06,659 --> 00:32:10,740 right but maybe in this table you want 857 00:32:09,240 --> 00:32:13,679 to know what the name of that product is 858 00:32:10,740 --> 00:32:17,940 what its price is perhaps right and so you 859 00:32:13,679 --> 00:32:20,159 can query postgres inside clickhouse 860 00:32:17,940 --> 00:32:21,960 and you can pull up and query that data 861 00:32:20,159 --> 00:32:24,120 as a reference table and the 862 00:32:21,960 --> 00:32:26,220 postgres engine will keep that data 863 00:32:24,120 --> 00:32:29,520 table in postgres available to the 864 00:32:26,220 --> 00:32:31,919 service periodically inside clickhouse 865 00:32:29,520 --> 00:32:34,380 and you can then start to use that to 866 00:32:31,919 --> 00:32:36,960 build your queries in clickhouse and 867 00:32:34,380 --> 00:32:39,539 get some more end-user meaningful type 868 00:32:36,960 --> 00:32:41,700 of reference data if that's kind of one 869 00:32:39,539 --> 00:32:43,140 of your things you can do there is an 870 00:32:41,700 --> 00:32:45,120 interesting pattern we're working on at 871 00:32:43,140 --> 00:32:47,880 the moment in the SA group is to 872 00:32:45,120 --> 00:32:50,880 use this concept with the ability to 873 00:32:47,880 --> 00:32:53,279 select from into as a way to Archive 874 00:32:50,880 --> 00:32:55,860 data out of postgres 875 00:32:53,279 --> 00:32:57,840 is
kind of a fun kind of world we've got 876 00:32:55,860 --> 00:33:00,659 a bunch of customers that use postgres 877 00:32:57,840 --> 00:33:02,520 for time series database because like I 878 00:33:00,659 --> 00:33:04,140 said we love postgres and so they're 879 00:33:02,520 --> 00:33:05,760 like hey I've got terabytes of this time 880 00:33:04,140 --> 00:33:07,919 series data and postgres how do I get it 881 00:33:05,760 --> 00:33:09,539 to clickhouse all right well maybe we 882 00:33:07,919 --> 00:33:12,860 can run that select into and then you 883 00:33:09,539 --> 00:33:12,860 can clean out that postgres table 884 00:33:13,140 --> 00:33:17,960 uh interesting world we live in now 885 00:33:15,480 --> 00:33:17,960 okay 886 00:33:18,260 --> 00:33:22,640 I hope that some of these things that we 887 00:33:21,539 --> 00:33:25,919 spoke about 888 00:33:22,640 --> 00:33:27,960 sort of fired that use case of uh maybe 889 00:33:25,919 --> 00:33:31,200 I shouldn't use this for transactions 890 00:33:27,960 --> 00:33:34,100 it's not a transactional database 891 00:33:31,200 --> 00:33:37,260 right it's not a transactional database 892 00:33:34,100 --> 00:33:40,620 you can do deletes because we all have 893 00:33:37,260 --> 00:33:42,419 to be gdpr compliant okay but you don't 894 00:33:40,620 --> 00:33:44,220 want to be sort of updating and deleting 895 00:33:42,419 --> 00:33:46,860 individual records in clickhouse you're 896 00:33:44,220 --> 00:33:50,039 not it's not going to like it 897 00:33:46,860 --> 00:33:51,480 it is not a key Value Store 898 00:33:50,039 --> 00:33:55,440 okay 899 00:33:51,480 --> 00:33:57,899 if you want to run redis run redis 900 00:33:55,440 --> 00:34:02,640 um it is not a file store 901 00:33:57,899 --> 00:34:04,919 if you want to S3 run S3 or S3 things 902 00:34:02,640 --> 00:34:07,200 right it's not a file store and it is 903 00:34:04,919 --> 00:34:09,780 not a document store 904 00:34:07,200 --> 00:34:12,359 okay while I say that it does support 
905 00:34:09,780 --> 00:34:14,159 Json there's like Json engines and stuff 906 00:34:12,359 --> 00:34:16,980 in there to support Json so if you want 907 00:34:14,159 --> 00:34:18,419 to drop documents in there but again the 908 00:34:16,980 --> 00:34:21,060 use cases that you typically use 909 00:34:18,419 --> 00:34:23,220 something like mongodb for back to 910 00:34:21,060 --> 00:34:26,159 that point they can 911 00:34:23,220 --> 00:34:28,080 be update heavy as well right you don't 912 00:34:26,159 --> 00:34:31,339 want to be updating database records 913 00:34:28,080 --> 00:34:31,339 very often inside clickhouse 914 00:34:31,980 --> 00:34:36,419 so I'm going to stop um I think I'm 915 00:34:34,440 --> 00:34:37,859 close to running out of time I wanted to 916 00:34:36,419 --> 00:34:40,260 make sure we left some time for any 917 00:34:37,859 --> 00:34:41,820 questions uh either in the room or 918 00:34:40,260 --> 00:34:44,159 if we've got the capability to take them 919 00:34:41,820 --> 00:34:46,080 online I'm not entirely sure 920 00:34:44,159 --> 00:34:48,000 um but we'll have a microphone if anyone 921 00:34:46,080 --> 00:34:49,619 wants some questions while someone 922 00:34:48,000 --> 00:34:51,599 thinks about the hardest question 923 00:34:49,619 --> 00:34:53,940 that you can stump me with that's not 924 00:34:51,599 --> 00:34:57,839 that hard by the way what we've got here 925 00:34:53,940 --> 00:34:59,820 over on your left my right is 926 00:34:57,839 --> 00:35:01,800 just a link to a whole bunch of 927 00:34:59,820 --> 00:35:03,480 resources that Elena put together that I 928 00:35:01,800 --> 00:35:05,940 found really helpful to get started for 929 00:35:03,480 --> 00:35:08,640 this on the left 930 00:35:05,940 --> 00:35:10,380 my left your right is a link to a GitHub 931 00:35:08,640 --> 00:35:12,599 repository that is that terraform script 932 00:35:10,380 --> 00:35:14,760 that I built and loaded all that data if 933
00:35:12,599 --> 00:35:18,420 you want and if you want to run that one 934 00:35:14,760 --> 00:35:20,880 there is a free trial for Aiven so you 935 00:35:18,420 --> 00:35:23,339 can go and sort of run clickhouse 936 00:35:20,880 --> 00:35:24,660 and do that as well I will thoroughly 937 00:35:23,339 --> 00:35:27,060 recommend 938 00:35:24,660 --> 00:35:29,579 um if you're interested getting onto 939 00:35:27,060 --> 00:35:31,440 clickhouse and starting with the 940 00:35:29,579 --> 00:35:34,140 documentation there it's very very well 941 00:35:31,440 --> 00:35:36,000 documented it's very well explained 942 00:35:34,140 --> 00:35:37,680 um but yeah I'll take questions we've 943 00:35:36,000 --> 00:35:39,480 got one at the back I'll just wait for 944 00:35:37,680 --> 00:35:41,820 you to get the mic if we can just so 945 00:35:39,480 --> 00:35:43,560 those online can um 946 00:35:41,820 --> 00:35:45,900 can hold you accountable in the future 947 00:35:43,560 --> 00:35:47,579 years you mentioned like uh pulling in 948 00:35:45,900 --> 00:35:49,440 data so if you have like say a large 949 00:35:47,579 --> 00:35:52,800 number of clients whether they be say 950 00:35:49,440 --> 00:35:55,140 mobile app clients or IoT devices you 951 00:35:52,800 --> 00:35:57,540 said it's not good to write one at a 952 00:35:55,140 --> 00:35:59,220 time like uh data one at a time what 953 00:35:57,540 --> 00:36:02,339 would be the recommended way to stream 954 00:35:59,220 --> 00:36:04,020 data in is it directly into 955 00:36:02,339 --> 00:36:05,460 clickhouse or should you be putting 956 00:36:04,020 --> 00:36:07,859 something in front of it and then 957 00:36:05,460 --> 00:36:09,480 whatever that service is then streams the 958 00:36:07,859 --> 00:36:11,220 data into clickhouse yeah it's a good 959 00:36:09,480 --> 00:36:13,820 question typically I'd probably have 960 00:36:11,220 --> 00:36:16,800 something like Kafka in front of that 961 00:36:13,820 --> 00:36:19,079 Kafka is
definitely 962 00:36:16,800 --> 00:36:20,940 um going to handle sort of the ingest on 963 00:36:19,079 --> 00:36:22,560 that does some nice things like order 964 00:36:20,940 --> 00:36:25,940 things in time as well gives you the 965 00:36:22,560 --> 00:36:29,160 ability to process it in other scenarios 966 00:36:25,940 --> 00:36:30,839 there's also a fun conversation to have 967 00:36:29,160 --> 00:36:33,119 around iot devices around network 968 00:36:30,839 --> 00:36:35,339 security and network boundaries because 969 00:36:33,119 --> 00:36:39,480 now you've got to expose endpoints 970 00:36:35,339 --> 00:36:40,680 somewhere to the internet potentially 971 00:36:39,480 --> 00:36:43,440 um 972 00:36:40,680 --> 00:36:45,900 and so again like having either you know 973 00:36:43,440 --> 00:36:47,160 an API and Kafka in front of that I 974 00:36:45,900 --> 00:36:49,260 would prefer to do that rather than 975 00:36:47,160 --> 00:36:50,880 expose my clickhouse service and then 976 00:36:49,260 --> 00:36:52,380 the last which you absolutely hit on 977 00:36:50,880 --> 00:36:54,119 you're still just going to try and 978 00:36:52,380 --> 00:36:55,859 insert one record at a time just at 979 00:36:54,119 --> 00:36:58,140 massive volume I would imagine you would 980 00:36:55,859 --> 00:36:59,760 break click house pretty quickly what 981 00:36:58,140 --> 00:37:01,260 the Kafka engine does which is not 982 00:36:59,760 --> 00:37:03,839 immediately obvious is it actually reads 983 00:37:01,260 --> 00:37:06,000 in batches it'll take that and and take 984 00:37:03,839 --> 00:37:07,560 those batches off that topic and insert 985 00:37:06,000 --> 00:37:08,880 them for you so it handles that for you 986 00:37:07,560 --> 00:37:12,260 as well 987 00:37:08,880 --> 00:37:12,260 good question thank you very much 988 00:37:15,599 --> 00:37:20,099 I wanted to ask is there any integration 989 00:37:17,760 --> 00:37:22,320 with redis 990 00:37:20,099 --> 00:37:24,420 uh is there any integration 
with Redis 991 00:37:22,320 --> 00:37:26,820 that's an excellent question is not at 992 00:37:24,420 --> 00:37:29,640 the moment on the Aiven service I don't 993 00:37:26,820 --> 00:37:33,660 know if there's a Redis 994 00:37:29,640 --> 00:37:35,339 function I didn't see one but the amount 995 00:37:33,660 --> 00:37:36,720 of things that I don't know could fill 996 00:37:35,339 --> 00:37:38,460 this room 997 00:37:36,720 --> 00:37:40,440 um so I don't know to be honest I don't 998 00:37:38,460 --> 00:37:42,540 I don't think so but 999 00:37:40,440 --> 00:37:45,260 I'd love to be proven wrong 1000 00:37:42,540 --> 00:37:45,260 is there 1001 00:37:45,420 --> 00:37:49,859 yeah I don't know 1002 00:37:47,060 --> 00:37:53,119 I was about to have a conversation but 1003 00:37:49,859 --> 00:37:53,119 the answer's I don't know sorry 1004 00:37:55,680 --> 00:38:00,119 anything else it was either a 1005 00:37:57,599 --> 00:38:02,460 wonderfully concise and easy to digest 1006 00:38:00,119 --> 00:38:03,900 talk or not 1007 00:38:02,460 --> 00:38:06,359 um 1008 00:38:03,900 --> 00:38:08,400 all right hey folks thanks very much for 1009 00:38:06,359 --> 00:38:09,720 joining us today I hope you have a 1010 00:38:08,400 --> 00:38:11,940 wonderful conference as the first of 1011 00:38:09,720 --> 00:38:13,800 three days so I'd like to thank the 1012 00:38:11,940 --> 00:38:15,060 organizers for having us here 1013 00:38:13,800 --> 00:38:17,760 um it's and I'd like to thank everyone 1014 00:38:15,060 --> 00:38:20,640 for turning up as well it's it's always 1015 00:38:17,760 --> 00:38:22,440 a wonderful show of support for open 1016 00:38:20,640 --> 00:38:25,020 source communities that you folks take 1017 00:38:22,440 --> 00:38:26,579 time out of your busy day and I hope you 1018 00:38:25,020 --> 00:38:28,380 have a wonderful day and and learn some 1019 00:38:26,579 --> 00:38:31,280 interesting things 1020 00:38:28,380 --> 00:38:35,169 thank you very much 1021 00:38:31,280 -->
00:38:35,169 [Applause]