Welcome back, everyone. Hope you had a nice break, got some food, a bit of exercise, hopefully a little bit of sunshine, wherever you are in the world. And welcome back to the second half of DevOps. For our first talk back here in the DevOps track, we have a talk by Peter Chu called "Things Might Go Wrong in a Data-Intensive Application". I'm intrigued to find out what possible things could go wrong; as we all know, the possibilities are endless. So with that, take it away, Peter.

Hello, everyone. I'm here to talk about my "dev oops" experience building a data-intensive application.

So, what is a data-intensive application? Well, I think there is no formal definition, so in my talk I will adopt an idea from "Designing Data-Intensive Applications", written by Martin Kleppmann. In this book, we call an application data-intensive if data is its primary challenge: the quantity of data, the complexity of data, or the speed at which it is changing, rather than CPU cycles being the bottleneck.
From this idea, we can say almost all successful businesses are based on data-intensive systems. To design a data-intensive application, we need to consider not only functional requirements but also non-functional requirements, such as reliability and scalability. And we already have many success stories from Google, Amazon, and Facebook. It's tempting to think we can just follow their architectures to build reliable and scalable systems: just introduce things like Kubernetes, and, whoa, now we have a system that stays reliable and scalable. Well, I won't say that's totally wrong, but I think we should notice the bias here. Those stories are just a snapshot of their current situation. They may have different requirements compared to your situation, and they don't tell you how to get there, or how many mistakes they made in the past. And that's why we are here today.
I am a software engineer from Taiwan, as well as a Pythonista like all of you. I am a PSF contributing member and help organize PyCon Taiwan. By the way, PyCon Taiwan will be held virtually this year, so everyone is welcome to join us online from October 2nd to 3rd. You can find more information on our website if you're interested.

Back on track. The main point is that I've been working on data-intensive systems for many years, and I'm glad to share my experience with you today.

Let me introduce the case we'll study today. It's a data management platform. We host user data for various usage patterns and workloads, such as online streaming, cold-data archiving, file distribution, things like that. Currently it hosts several petabytes and transfers several terabytes a day. In case you don't have a feel for these numbers, I googled some figures for you. For example, the GitHub Arctic Code Vault is 21 terabytes, and one well-known company's data lake currently holds 34 petabytes of data. And here is a fun fact:
if we put all our physical disks on the ground, we could cover a whole football field. Of course, this case cannot be compared to those giants, but I think it is a practical case for most people, since not everyone has the opportunity to build Facebook or Google from scratch.

This is what it looks like. I think it's a typical architecture for a large-scale distributed system. At the top of the diagram we have reverse proxies and stateless application servers, which are responsible for receiving and serving requests. At the bottom of the diagram, you can see we have various kinds of data technologies to store structured and unstructured data. We use sharding and partitioning to distribute load across different nodes, and we adopt distributed file systems to store unstructured data. There are also some other subsystems for job processing and data analysis.

This is just a very rough diagram, intended to give you an idea. I won't go through it deeply, since today we
focus on the mistakes we made, not how we succeeded.

In the following time, I will tell you about four of the many incidents from the past years. The first two are about scalability; the others are about reliability. Finally, we will review what we can learn from these incidents. So here we go: incident number one.

One day we had a new customer. They uploaded data generated by thousands of devices to our platform, 24/7, 365 days a year. We had never seen this usage pattern before, and, as you might expect, we could not handle it.

I traced the issue and figured out the problem was in how we used the database. For efficiency, we used optimistic locking in our system. Optimistic locking means the system doesn't lock the data explicitly before manipulating it. We used it because we assumed our workloads would not cause contention very often. But we were wrong.

A straightforward solution is to switch to pessimistic locking, which ensures only one thread can
manipulate the data at a time. But that would actually cause many problems: the performance of other usage patterns would degrade, since their workloads would suffer from unnecessary locking operations.

In the end, I designed a hybrid and adaptive approach to address the issue. Hybrid here means we introduced both optimistic and pessimistic locking into our system. Operations that may encounter contention, such as writes, use pessimistic locking; the others keep using optimistic locking as before. Adaptive, on the other hand, means that by default, operations just need to obtain a lock from the application server before processing. This is not a real pessimistic lock, since operations may still conflict while doing updates in the database; we call it a local lock. Once the system detects that write conflicts are actually occurring, it escalates to real pessimistic locking automatically; we call that a global lock.
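The hybrid, adaptive scheme just described can be sketched in a few lines of Python. This is a minimal single-process illustration, not the speaker's actual implementation: all names are invented, a `version` counter plays the role of the optimistic compare-and-set, and a per-key conflict count decides when a key escalates to the serialized "global lock" path.

```python
import threading

class Record:
    """A stored value plus a version counter for optimistic writes."""
    def __init__(self, value=None):
        self.value = value
        self.version = 0

class AdaptiveStore:
    """Hybrid locking sketch: optimistic compare-and-set by default,
    escalating to a serialized ("global") path once a key has seen
    enough write conflicts."""

    ESCALATE_AFTER = 3  # conflicts on a key before escalating

    def __init__(self):
        self._records = {}
        self._lock = threading.Lock()  # stands in for the app-server lock
        self._conflicts = {}           # key -> conflict count observed

    def read(self, key):
        rec = self._records[key]
        return rec.value, rec.version

    def write(self, key, value, expected_version):
        """Returns True on success; False means the caller raced another
        writer (optimistic path) and should re-read and retry."""
        with self._lock:
            rec = self._records.setdefault(key, Record())
            if self._conflicts.get(key, 0) >= self.ESCALATE_AFTER:
                # "global lock" path: writes are serialized, never rejected
                rec.value = value
                rec.version += 1
                return True
            if rec.version != expected_version:
                # optimistic write lost the race: count it and fail
                self._conflicts[key] = self._conflicts.get(key, 0) + 1
                return False
            rec.value = value
            rec.version += 1
            return True
```

Low-contention keys pay no blocking cost, while hot keys stop burning retries once conflicts are detected, which is the point of the adaptive design.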
It will be clearer in this diagram: local locks can be obtained from each application server, and global locks are obtained from the databases. With this approach, we can satisfy all users and their usage patterns.

This is how the case was solved, but what can we learn from it? Well, in my opinion, the root cause is that we didn't predict a usage pattern like that. In all our test scenarios, optimistic locking just worked fine, until we encountered this one. This is a classic scalability challenge we must be aware of while building a data-intensive application.

We will come back to review the lessons learned later; let's go to the next incident first.

So what happened this time? We had an optional data management feature for users. Basically, it scans for and removes expired files for the user. It was just a prototype at the time, since almost no one had used it for many years. In fact, it was implemented as a simple cron job.
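A prototype like the one described, a cron job that scans for and removes expired files, might look something like this minimal sketch. The function names and the metadata shape (a mapping from file name to expiry timestamp) are invented for illustration; the real job would read whatever metadata store the platform used.

```python
import time

def scan_expired(files, now=None):
    """Return the names of files whose expiry time has passed.

    `files` maps a file name to its expiry timestamp (seconds since
    the epoch), standing in for the real metadata store.
    """
    now = time.time() if now is None else now
    return [name for name, expires_at in files.items() if expires_at <= now]

def purge(files, now=None):
    """The cron job body: delete every expired entry and report them."""
    expired = scan_expired(files, now)
    for name in expired:
        del files[name]  # the real job would also delete from storage
    return expired
```

Invoked once per schedule slot from crontab, this is perfectly fine for a handful of users, and exactly the kind of design that falls over when one user suddenly multiplies the request volume.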
We didn't care about it, until one day a user found it and made a million times more requests to this poor cron job.

Now I knew this feature was necessary for our customers, so I reimplemented it in a robust, production-ready way. What we needed was a job processing system. We found one on GitHub called Resque, which uses Redis as a job queue and can process tasks in a distributed way.

So what can we learn from this incident? Again, of course, this is a scalability challenge. The difference is that the pattern was exactly what we expected this time; what we didn't expect was the load.

The next one is about reliability. We know security is also an important non-functional requirement for a data-intensive application, so we (more specifically, my boss) decided to outsource our data protection module to a professional security service provider. The project ran smoothly, until we deployed it. We started to receive complaints
that their data was corrupted. That was very weird, because not all, but just part, of the data was corrupted. What was going on here?

I found the problem was in the encryption process. The encryption algorithm is block-based, so we have to add padding before we encrypt. Let me explain how padding works, in case you don't have the related background. Assume the cipher's block size is 16 bytes in the following examples. First example: if the input size is 12 bytes, we need to append four bytes of value 4 to the data. After that, it becomes 16 bytes and satisfies the rule that the size of the data must be a multiple of the block size. In the other case, the data size is exactly 16 bytes. What should we do? We still have to add padding: this time we append sixteen bytes of value 16 at the end of the data. It already satisfied the rule, so why should we do that, you may ask? Because in the decryption process, we need to remove the padding to recover the original data.
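The scheme described here is PKCS#7 padding. Below is a minimal sketch of a correct pad/unpad pair, plus a reconstruction of the kind of bug described, handling only inputs that need partial padding and skipping the already-aligned case. This is my guess at the failure mode, not the provider's actual code.

```python
BLOCK = 16  # cipher block size in bytes, as in the talk's examples

def pad(data: bytes) -> bytes:
    """PKCS#7: append n bytes of value n, where 1 <= n <= BLOCK."""
    n = BLOCK - len(data) % BLOCK   # n == BLOCK when already aligned
    return data + bytes([n]) * n

def unpad(data: bytes) -> bytes:
    """The last byte says how many padding bytes to strip."""
    return data[:-data[-1]]

def buggy_pad(data: bytes) -> bytes:
    """Reconstructed bug: no padding added when the input is already a
    multiple of BLOCK, so unpad() later eats real data bytes."""
    n = BLOCK - len(data) % BLOCK
    if n == BLOCK:
        return data                 # wrong: "case two" is skipped
    return data + bytes([n]) * n
```

A 12-byte input round-trips through either version, but a 16-byte input survives only the correct one: after `buggy_pad`, `unpad` strips however many bytes the last *data* byte happens to claim, which is exactly the partial, size-dependent corruption the users reported.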
The decryption routine uses the last byte to identify how many bytes of padding need to be removed. That's why we need the padding in this example, even though the original data size is already a multiple of the block size.

So now we come back to the original problem. Data corruption occurred because the implementation of our encryption algorithm only considered case one in our example, not case two. That's why some users complained their data was corrupted while others' was not.

Fixing the bug itself was trivial, but we had to fix the corrupted data in the system as well. This is where reliability issues differ from others: we don't only fix the bug. Identifying the corrupted data was also trivial: we just filtered for corrupted data using the metadata stored in the database, then corrected the corresponding data. Take this table as an example. The data corresponding to the first and second rows is not affected by the bug, since it was not processed by the incorrect algorithm.
The third row is also not affected, since its size is not a multiple of the block size, even though it was indeed processed by the defective algorithm. The fourth row is affected, but already fixed; we can tell by the version number. We need this column because we will correct files in a distributed way; we will discuss that later. The last row is the only one that needs to be fixed here, since it was processed by the incorrect algorithm and its size is a multiple of the block size. It's time to fix it, and mark it as fixed afterwards.

Indeed, in general we would say this is just a silly bug, and most times we could fix it offhand. But for a large-scale system like ours, it's not funny: the bug affected millions of files in our system. You can imagine it was a very stressful emergency situation. We evaluated the situation and found out that if we used a
basic Python script to fix it, we would need hundreds of days to correct all the affected data. There was no way we could do that; no user would wait that long for us. So we decided to use a distributed approach to get things done. We used a job processing system called Gearman this time. We didn't use Redis as the job queue, since we wanted tasks to be kept in persistent storage, not just in memory in Redis.

So the root cause is obvious: we had an unreliable provider. But why couldn't we figure out the problem before we deployed it? Remember, data is affected by this bug only if its size is a multiple of the block size, and the block size is 128 bytes. So any given test file had roughly a 1-in-128 chance, about 0.7 percent, of hitting the bug in testing.

So the lesson here is: besides increasing test coverage, we also need to consider how to tolerate software faults and human errors in production, because there is no way to prove that code
is correct or not.

Okay, the last one. This is the last incident I will show today. We didn't just corrupt user data; guess what, we lost data too. We lost it not because we don't care about reliability: we do store multiple replicas in multiple data centers while receiving user data. That's what we did in our system design.

So what happened? The background is that the system considers not only reliability but also load balancing. We consider both disk usage and failure domains while choosing a location to store data. This implies that a newly added node has a higher chance of receiving data, since it has more resources available compared to older nodes in general. And here is another fact: new machines have a higher chance of failure; this is called the bathtub curve. These two things combined caused the incident:
data was written to a newly added node, and the node broke. That's why.

This is another classic reliability challenge, caused by hardware faults. On a small scale, these things rarely happen; many people never back up their computers because they don't think this will happen to their machine. But in a large-scale system, faults happen regularly, and the reality is that it's almost impossible to solve this completely.

I can tell you how much work we did to deal with it. We ran week-long tests, continuously writing and reading data in the system. While doing lab tests, we even went to the data center and pulled network cables directly to see what would happen. We also analyzed the system's reliability with theoretical methods such as Markov chains. If you're interested, you can go and look at the SLA;
that is, the service level agreement of your cloud provider. I think no one guarantees 100% reliability.

In the end, what can we learn from these stories? Three takeaways. Before we start, I have to remind you that a lot of this is subjective. You can agree or not, and you're welcome to discuss it with me later.

First: no silver bullet. Classic. We've seen successful stories from Google or Facebook. When I was a junior engineer, I may have thought their stories were so amazing that we should just keep following their tech stacks to build reliable and scalable systems. However, after my journey, I know there are many things going on behind the scenes. But why can't we just follow some blueprints to build a reliable and scalable system, like building a bridge or a tower?

In my opinion, first, there is no way to enumerate all possible reliability hazards, just like the ones I showed you
previously. Even with extensive preparation, faults still occur. Second, patterns and load keep changing while your business expands. You cannot get a final version of a spec that tells you how many kinds of workloads you will have and how many requests per second are needed. As a software engineer, I think that's a very interesting and very unique challenge we need to deal with.

Second takeaway: you may think those incidents would not have happened if you had some fancy techniques. Yeah, that's fair. I believe a talented engineer can prevent some of the many incidents we had. She can, in many of these scenarios, use event sourcing to scale the system out, or introduce Paxos to get strong consistency, or adopt consistent hash rings and erasure coding in the system. But there's one more thing I haven't mentioned yet:
beyond reliability and scalability, maintainability is also a key factor. In his book, Martin Kleppmann states that reliability, scalability, and maintainability are the three most important concerns when designing a data-intensive application. Although the incidents I mentioned are primarily about scalability and reliability, maintainability also plays an important role behind the scenes.

Generally speaking, a data-intensive application is almost always a large-scale system, right? It's essential to have a talented team to support and evolve it; no one can do it by herself. Here is an example I heard about: Kafka. We know it's a message queue, or stream processing system, right? Usually it's just one component in a data-intensive application. But you know what? At LinkedIn, they had hundreds of engineers maintaining it.

Why do I mention this here? Because I know you are the kind of people who are willing to challenge
yourself that's a good 550 00:25:33,120 --> 00:25:37,840 thing that's a good thing 551 00:25:35,120 --> 00:25:39,919 but i recommend you to think twice 552 00:25:37,840 --> 00:25:42,320 before you introduce some advanced 553 00:25:39,919 --> 00:25:44,880 techniques into your stack 554 00:25:42,320 --> 00:25:48,559 consider whether your team has the ability to 555 00:25:44,880 --> 00:25:48,559 maintain them or not 556 00:25:51,120 --> 00:25:54,559 now you may think 557 00:25:52,960 --> 00:25:56,880 you talked a lot but 558 00:25:54,559 --> 00:25:58,880 does that mean there isn't 559 00:25:56,880 --> 00:26:00,400 anything we can do to build a reliable and 560 00:25:58,880 --> 00:26:03,120 scalable system 561 00:26:00,400 --> 00:26:05,200 i have no answers here but 562 00:26:03,120 --> 00:26:07,120 i think the most important thing within 563 00:26:05,200 --> 00:26:08,159 all these stories is 564 00:26:07,120 --> 00:26:11,360 people 565 00:26:08,159 --> 00:26:12,880 the people behind the systems and machines 566 00:26:11,360 --> 00:26:13,840 matter 567 00:26:12,880 --> 00:26:17,760 people 568 00:26:13,840 --> 00:26:21,600 are the most important part of a service 569 00:26:17,760 --> 00:26:22,960 actually we engineers provide service to 570 00:26:21,600 --> 00:26:24,720 our customers 571 00:26:22,960 --> 00:26:27,840 not the machines 572 00:26:24,720 --> 00:26:29,360 machines are just our interfaces and 573 00:26:27,840 --> 00:26:32,559 tools 574 00:26:29,360 --> 00:26:35,440 we need talented engineers to notice 575 00:26:32,559 --> 00:26:36,880 problems and fix them before bad 576 00:26:35,440 --> 00:26:39,120 things happen 577 00:26:36,880 --> 00:26:41,840 we need engineers to locate and 578 00:26:39,120 --> 00:26:46,000 troubleshoot reliability problems 579 00:26:41,840 --> 00:26:48,320 also we need sre engineers to build an 580 00:26:46,000 --> 00:26:50,240 infra that has the observability to 581 00:26:48,320 --> 00:26:53,279 surface errors 582 00:26:50,240 -->
00:26:55,760 the infra can also help developers to 583 00:26:53,279 --> 00:26:57,360 figure out problems in a more productive 584 00:26:55,760 --> 00:27:00,159 way 585 00:26:57,360 --> 00:27:02,559 finally we need supervisors and project 586 00:27:00,159 --> 00:27:05,760 managers to understand it's important to 587 00:27:02,559 --> 00:27:09,039 have a good development culture 588 00:27:05,760 --> 00:27:11,520 this is my zen of data intensive system 589 00:27:09,039 --> 00:27:14,799 design 590 00:27:11,520 --> 00:27:16,240 if you have experience like what i shared today 591 00:27:14,799 --> 00:27:19,039 you may have 592 00:27:16,240 --> 00:27:22,159 similar feelings about these things if 593 00:27:19,039 --> 00:27:23,039 not i learned these things the hard way 594 00:27:22,159 --> 00:27:26,000 and 595 00:27:23,039 --> 00:27:29,919 i hope this talk will save you from 596 00:27:26,000 --> 00:27:29,919 repeating the same mistakes 597 00:27:30,640 --> 00:27:33,880 thank you 598 00:27:34,080 --> 00:27:38,399 well thanks peter that was 599 00:27:36,159 --> 00:27:39,919 excellent um lots that i agree with 600 00:27:38,399 --> 00:27:41,600 there which is just tremendous i 601 00:27:39,919 --> 00:27:44,000 particularly liked the emphasis on 602 00:27:41,600 --> 00:27:45,520 maintainability uh that you mentioned 603 00:27:44,000 --> 00:27:46,640 towards the end of the talk i 604 00:27:45,520 --> 00:27:49,600 think that's something that we often 605 00:27:46,640 --> 00:27:52,480 overlook and the fact that it's all 606 00:27:49,600 --> 00:27:54,960 being looked after and managed by humans 607 00:27:52,480 --> 00:27:57,360 and largely as all of us here in the 608 00:27:54,960 --> 00:27:58,880 devops track are well aware the humans 609 00:27:57,360 --> 00:28:00,880 are where a lot of the challenge comes 610 00:27:58,880 --> 00:28:02,720 from so thank you so much for that talk 611 00:28:00,880 --> 00:28:05,279 um now peter is actually here in the 612 00:28:02,720 -->
00:28:07,200 chat so if you have questions about 613 00:28:05,279 --> 00:28:09,279 anything that was in the talk there 614 00:28:07,200 --> 00:28:10,320 do drop them into the chat and hopefully 615 00:28:09,279 --> 00:28:11,360 peter will be able to answer those 616 00:28:10,320 --> 00:28:12,559 questions 617 00:28:11,360 --> 00:28:15,200 um 618 00:28:12,559 --> 00:28:17,600 thank you so much that was tremendous 619 00:28:15,200 --> 00:28:18,880 now between now and the next talk we 620 00:28:17,600 --> 00:28:19,600 have another break 621 00:28:18,880 --> 00:28:21,520 so 622 00:28:19,600 --> 00:28:23,840 drop in some questions uh have a chat 623 00:28:21,520 --> 00:28:25,279 amongst yourselves um otherwise yes grab 624 00:28:23,840 --> 00:28:27,279 a quick drink 625 00:28:25,279 --> 00:28:29,440 stretch your legs and we'll be back at 626 00:28:27,279 --> 00:28:31,840 about 3 p.m. for the next talk see you 627 00:28:29,440 --> 00:28:31,840 soon