Welcome back, everyone. Hope you had a nice break, got some food, a bit of exercise, hopefully a little bit of sunshine, wherever you are in the world. And welcome back to the second half of DevOps. For our first talk back here in the DevOps track, we have a talk by Peter Chu called "Things Might Go Wrong in a Data-Intensive Application". I'm intrigued to find out what possible things could go wrong; as we all know, the possibilities are endless. So with that, take it away, Peter.

Hello, everyone. I'm here to talk about my "dev oops" experience building a data-intensive application.

So, what is a data-intensive application? Well, I think there is no formal definition, so in my talk I will adopt an idea from "Designing Data-Intensive Applications", written by Martin Kleppmann. In this book, we call an application data-intensive if data is its primary challenge: the quantity of data, the complexity of data, or the speed at which it is changing, rather than CPU cycles being the bottleneck.
From this idea, we can say almost all successful businesses are based on data-intensive systems. To design a data-intensive application, we need to consider not only functional requirements but also non-functional requirements, such as reliability and scalability. And we already have many success stories from Google, Amazon, and Facebook. It's tempting to think we can just follow their architectures to build reliable and scalable systems: just introduce things like Kubernetes, and, whoa, now we have a system that stays reliable and scalable. Well, I won't say that's totally wrong, but I think we should notice the bias here. Those stories are just a snapshot of their current situation. They may have different requirements compared to your situation, and they don't tell you how to get there, or how many mistakes they made in the past. And that's why we are here today.
I am a software engineer from Taiwan, as well as a Pythonista like all of you. I am a PSF contributing member and help organize PyCon Taiwan. By the way, PyCon Taiwan will be held virtually this year, so everyone is welcome to join us online from October 2nd to 3rd. You can find more information on our website if you're interested.

Back on track. The main point is that I've been working on data-intensive systems for many years, and I'm glad to share my experience with you today.

Let me introduce the case we'll study today. It's a data management platform. We host user data for various usage patterns and workloads, such as online streaming, cold-data archiving, file distribution, things like that. Currently it hosts several petabytes and transfers several terabytes a day. In case you don't have a feel for these numbers, I googled some figures for you. For example, the GitHub Arctic Code Vault is 21 terabytes, and one well-known company's data lake currently holds 34 petabytes of data. And here is a fun fact:
if we put all our physical disks on the ground, we could cover a whole football field. Of course, this case cannot be compared to those giants, but I think it is a practical case for most people, since not everyone has the opportunity to build Facebook or Google from scratch.

This is what it looks like. I think it's a typical architecture for a large-scale distributed system. At the top of the diagram we have reverse proxies and stateless application servers, which are responsible for receiving and serving requests. At the bottom of the diagram, you can see we have various kinds of data technologies to store structured and unstructured data. We use sharding and partitioning to distribute load across different nodes, and we adopt distributed file systems to store unstructured data. There are also some other subsystems for job processing and data analysis.

This is just a very rough diagram, intended to give you an idea. I won't go through it deeply, since today we
focus on the mistakes we made, not how we succeeded.

In the following time, I will tell you about four of the many incidents from the past years. The first two are about scalability; the others are about reliability. Finally, we will review what we can learn from these incidents. So here we go: incident number one.

One day we had a new customer. They uploaded data generated by thousands of devices to our platform, 24/7, 365 days a year. We had never seen this usage pattern before, and, as you might expect, we could not handle it.

I traced the issue and figured out the problem was in how we used the database. For efficiency, we used optimistic locking in our system. Optimistic locking means the system doesn't lock the data explicitly before manipulating it. We used it because we assumed our workloads would not cause contention very often. But we were wrong.

A straightforward solution is to switch to pessimistic locking, which ensures only one thread can
manipulate the data at a time. But that would actually cause many problems: the performance of other usage patterns would degrade, since their workloads would suffer from unnecessary locking operations.

In the end, I designed a hybrid and adaptive approach to address the issue. Hybrid here means we introduced both optimistic and pessimistic locking into our system. Operations that may encounter contention, such as writes, use pessimistic locking; the others keep using optimistic locking as before. Adaptive, on the other hand, means that by default, operations just need to obtain a lock from the application server before processing. This is not a real pessimistic lock, since operations may still conflict while doing updates in the database; we call it a local lock. Once the system detects that write conflicts are actually occurring, it escalates to real pessimistic locking automatically; we call that a global lock.
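The hybrid, adaptive scheme just described can be sketched in a few lines of Python. This is a minimal single-process illustration, not the speaker's actual implementation: all names are invented, a `version` counter plays the role of the optimistic compare-and-set, and a per-key conflict count decides when a key escalates to the serialized "global lock" path.

```python
import threading

class Record:
    """A stored value plus a version counter for optimistic writes."""
    def __init__(self, value=None):
        self.value = value
        self.version = 0

class AdaptiveStore:
    """Hybrid locking sketch: optimistic compare-and-set by default,
    escalating to a serialized ("global") path once a key has seen
    enough write conflicts."""

    ESCALATE_AFTER = 3  # conflicts on a key before escalating

    def __init__(self):
        self._records = {}
        self._lock = threading.Lock()  # stands in for the app-server lock
        self._conflicts = {}           # key -> conflict count observed

    def read(self, key):
        rec = self._records[key]
        return rec.value, rec.version

    def write(self, key, value, expected_version):
        """Returns True on success; False means the caller raced another
        writer (optimistic path) and should re-read and retry."""
        with self._lock:
            rec = self._records.setdefault(key, Record())
            if self._conflicts.get(key, 0) >= self.ESCALATE_AFTER:
                # "global lock" path: writes are serialized, never rejected
                rec.value = value
                rec.version += 1
                return True
            if rec.version != expected_version:
                # optimistic write lost the race: count it and fail
                self._conflicts[key] = self._conflicts.get(key, 0) + 1
                return False
            rec.value = value
            rec.version += 1
            return True
```

Low-contention keys pay no blocking cost, while hot keys stop burning retries once conflicts are detected, which is the point of the adaptive design.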
It will be clearer in this diagram: local locks can be obtained from each application server, and global locks are obtained from the databases. With this approach, we can satisfy all users and their usage patterns.

This is how the case was solved, but what can we learn from it? Well, in my opinion, the root cause is that we didn't predict a usage pattern like that. In all our test scenarios, optimistic locking just worked fine, until we encountered this one. This is a classic scalability challenge we must be aware of while building a data-intensive application.

We will come back to review the lessons learned later; let's go to the next incident first.

So what happened this time? We had an optional data management feature for users. Basically, it scans for and removes expired files for the user. It was just a prototype at the time, since almost no one had used it for many years. In fact, it was implemented as a simple cron job.
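A prototype like the one described, a cron job that scans for and removes expired files, might look something like this minimal sketch. The function names and the metadata shape (a mapping from file name to expiry timestamp) are invented for illustration; the real job would read whatever metadata store the platform used.

```python
import time

def scan_expired(files, now=None):
    """Return the names of files whose expiry time has passed.

    `files` maps a file name to its expiry timestamp (seconds since
    the epoch), standing in for the real metadata store.
    """
    now = time.time() if now is None else now
    return [name for name, expires_at in files.items() if expires_at <= now]

def purge(files, now=None):
    """The cron job body: delete every expired entry and report them."""
    expired = scan_expired(files, now)
    for name in expired:
        del files[name]  # the real job would also delete from storage
    return expired
```

Invoked once per schedule slot from crontab, this is perfectly fine for a handful of users, and exactly the kind of design that falls over when one user suddenly multiplies the request volume.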
We didn't care about it, until one day a user found it and made a million times more requests to this poor cron job.

Now I knew this feature was necessary for our customers, so I reimplemented it in a robust, production-ready way. What we needed was a job processing system. We found one on GitHub called Resque, which uses Redis as a job queue and can process tasks in a distributed way.

So what can we learn from this incident? Again, of course, this is a scalability challenge. The difference is that the pattern was exactly what we expected this time; what we didn't expect was the load.

The next one is about reliability. We know security is also an important non-functional requirement for a data-intensive application, so we (more specifically, my boss) decided to outsource our data protection module to a professional security service provider. The project ran smoothly, until we deployed it. We started to receive complaints
that their data was corrupted. That was very weird, because not all, but just part, of the data was corrupted. What was going on here?

I found the problem was in the encryption process. The encryption algorithm is block-based, so we have to add padding before we encrypt. Let me explain how padding works, in case you don't have the related background. Assume the cipher's block size is 16 bytes in the following examples. First example: if the input size is 12 bytes, we need to append four bytes of value 4 to the data. After that, it becomes 16 bytes and satisfies the rule that the size of the data must be a multiple of the block size. In the other case, the data size is exactly 16 bytes. What should we do? We still have to add padding: this time we append sixteen bytes of value 16 at the end of the data. It already satisfied the rule, so why should we do that, you may ask? Because in the decryption process, we need to remove the padding to recover the original data.
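The scheme described here is PKCS#7 padding. Below is a minimal sketch of a correct pad/unpad pair, plus a reconstruction of the kind of bug described, handling only inputs that need partial padding and skipping the already-aligned case. This is my guess at the failure mode, not the provider's actual code.

```python
BLOCK = 16  # cipher block size in bytes, as in the talk's examples

def pad(data: bytes) -> bytes:
    """PKCS#7: append n bytes of value n, where 1 <= n <= BLOCK."""
    n = BLOCK - len(data) % BLOCK   # n == BLOCK when already aligned
    return data + bytes([n]) * n

def unpad(data: bytes) -> bytes:
    """The last byte says how many padding bytes to strip."""
    return data[:-data[-1]]

def buggy_pad(data: bytes) -> bytes:
    """Reconstructed bug: no padding added when the input is already a
    multiple of BLOCK, so unpad() later eats real data bytes."""
    n = BLOCK - len(data) % BLOCK
    if n == BLOCK:
        return data                 # wrong: "case two" is skipped
    return data + bytes([n]) * n
```

A 12-byte input round-trips through either version, but a 16-byte input survives only the correct one: after `buggy_pad`, `unpad` strips however many bytes the last *data* byte happens to claim, which is exactly the partial, size-dependent corruption the users reported.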
The decryption routine uses the last byte to identify how many bytes of padding need to be removed. That's why we need the padding in this example, even though the original data size is already a multiple of the block size.

So now we come back to the original problem. Data corruption occurred because the implementation of our encryption algorithm only considered case one in our example, not case two. That's why some users complained their data was corrupted while others' was not.

Fixing the bug itself was trivial, but we had to fix the corrupted data in the system as well. This is where reliability issues differ from others: we don't only fix the bug. Identifying the corrupted data was also trivial: we just filtered for corrupted data using the metadata stored in the database, then corrected the corresponding data. Take this table as an example. The data corresponding to the first and second rows is not affected by the bug, since it was not processed by the incorrect algorithm.
The third row is also not affected, since its size is not a multiple of the block size, even though it was indeed processed by the defective algorithm. The fourth row is affected, but already fixed; we can tell by the version number. We need this column because we will correct files in a distributed way; we will discuss that later. The last row is the only one that needs to be fixed here, since it was processed by the incorrect algorithm and its size is a multiple of the block size. It's time to fix it, and mark it as fixed afterwards.

Indeed, in general we would say this is just a silly bug, and most times we could fix it offhand. But for a large-scale system like ours, it's not funny: the bug affected millions of files in our system. You can imagine it was a very stressful emergency situation. We evaluated the situation and found out that if we used a
basic Python script to fix it, we would need hundreds of days to correct all the affected data. There was no way we could do that; no user would wait that long for us. So we decided to use a distributed approach to get things done. We used a job processing system called Gearman this time. We didn't use Redis as the job queue, since we wanted tasks to be kept in persistent storage, not just in memory in Redis.

So the root cause is obvious: we had an unreliable provider. But why couldn't we figure out the problem before we deployed it? Remember, data is affected by this bug only if its size is a multiple of the block size, and the block size is 128 bytes. So any given test file had roughly a 1-in-128 chance, about 0.7 percent, of hitting the bug in testing.

So the lesson here is: besides increasing test coverage, we also need to consider how to tolerate software faults and human errors in production, because there is no way to prove that code
is correct or not.

Okay, the last one. This is the last incident I will show today. We didn't just corrupt user data; guess what, we lost data too. We lost it not because we don't care about reliability: we do store multiple replicas in multiple data centers while receiving user data. That's what we did in our system design.

So what happened? The background is that the system considers not only reliability but also load balancing. We consider both disk usage and failure domains while choosing a location to store data. This implies that a newly added node has a higher chance of receiving data, since it has more resources available compared to older nodes in general. And here is another fact: new machines have a higher chance of failure; this is called the bathtub curve. These two things combined caused the incident:
data was written to a newly added node, and the node broke. That's why.

This is another classic reliability challenge, caused by hardware faults. On a small scale, these things rarely happen; many people never back up their computers because they don't think this will happen to their machine. But in a large-scale system, faults happen regularly, and the reality is that it's almost impossible to solve this completely.

I can tell you how much work we did to deal with it. We ran week-long tests, continuously writing and reading data in the system. While doing lab tests, we even went to the data center and pulled network cables directly to see what would happen. We also analyzed the system's reliability with theoretical methods such as Markov chains. If you're interested, you can go and look at the SLA;
that is, the service level agreement of your cloud provider. I think no one guarantees 100% reliability.

In the end, what can we learn from these stories? Three takeaways. Before we start, I have to remind you that a lot of this is subjective. You can agree or not, and you're welcome to discuss it with me later.

First: no silver bullet. Classic. We've seen successful stories from Google or Facebook. When I was a junior engineer, I may have thought their stories were so amazing that we should just keep following their tech stacks to build reliable and scalable systems. However, after my journey, I know there are many things going on behind the scenes. But why can't we just follow some blueprints to build a reliable and scalable system, like building a bridge or a tower?

In my opinion, first, there is no way to enumerate all possible reliability hazards, just like the ones I showed you
previously. Even with extensive preparation, faults still occur. Second, patterns and load keep changing while your business expands. You cannot get a final version of a spec that tells you how many kinds of workloads you will have and how many requests per second are needed. As a software engineer, I think that's a very interesting and very unique challenge we need to deal with.

Second takeaway: you may think those incidents would not have happened if you had some fancy techniques. Yeah, that's fair. I believe a talented engineer can prevent some of the many incidents we had. She can, in many of these scenarios, use event sourcing to scale the system out, or introduce Paxos to get strong consistency, or adopt consistent hash rings and erasure coding in the system. But there's one more thing I haven't mentioned yet:
beyond reliability and scalability, maintainability is also a key factor. In his book, Martin Kleppmann states that reliability, scalability, and maintainability are the three most important concerns when designing a data-intensive application. Although the incidents I mentioned are primarily about scalability and reliability, maintainability also plays an important role behind the scenes.

Generally speaking, a data-intensive application is almost always a large-scale system, right? It's essential to have a talented team to support and evolve it; no one can do it by herself. Here is an example I heard about: Kafka. We know it's a message queue, or stream processing system, right? Usually it's just one component in a data-intensive application. But you know what? At LinkedIn, they had hundreds of engineers maintaining it.

Why do I mention this here? Because I know you are the kind of people who are willing to challenge
yourself that's a good 550 00:25:33,120 --> 00:25:37,840 thing that's a good thing 551 00:25:35,120 --> 00:25:39,919 but i recommend you to think twice 552 00:25:37,840 --> 00:25:42,320 before you introduce some advanced 553 00:25:39,919 --> 00:25:44,880 techniques into your stack 554 00:25:42,320 --> 00:25:48,559 consider whether your team has the ability to 555 00:25:44,880 --> 00:25:48,559 maintain them or not 556 00:25:51,120 --> 00:25:54,559 now you may think 557 00:25:52,960 --> 00:25:56,880 you talked a lot but 558 00:25:54,559 --> 00:25:58,880 does that mean there isn't 559 00:25:56,880 --> 00:26:00,400 anything we can do to build a reliable and 560 00:25:58,880 --> 00:26:03,120 scalable system 561 00:26:00,400 --> 00:26:05,200 i have no answers here but 562 00:26:03,120 --> 00:26:07,120 i think the most important thing within 563 00:26:05,200 --> 00:26:08,159 all these stories is 564 00:26:07,120 --> 00:26:11,360 people 565 00:26:08,159 --> 00:26:12,880 the people behind the systems and machines 566 00:26:11,360 --> 00:26:13,840 matter 567 00:26:12,880 --> 00:26:17,760 people 568 00:26:13,840 --> 00:26:21,600 are the most important part of a service 569 00:26:17,760 --> 00:26:22,960 actually we engineers provide service to 570 00:26:21,600 --> 00:26:24,720 our customers 571 00:26:22,960 --> 00:26:27,840 not the machines 572 00:26:24,720 --> 00:26:29,360 machines are just our interfaces and 573 00:26:27,840 --> 00:26:32,559 tools 574 00:26:29,360 --> 00:26:35,440 we need talented engineers to notice 575 00:26:32,559 --> 00:26:36,880 problems and fix them before bad 576 00:26:35,440 --> 00:26:39,120 things happen 577 00:26:36,880 --> 00:26:41,840 we need engineers to locate and 578 00:26:39,120 --> 00:26:46,000 troubleshoot reliability problems 579 00:26:41,840 --> 00:26:48,320 also we need sre engineers to build an 580 00:26:46,000 --> 00:26:50,240 infra that has the observability to 581 00:26:48,320 --> 00:26:53,279 surface errors 582 00:26:50,240 -->
00:26:55,760 the infra can also help developers to 583 00:26:53,279 --> 00:26:57,360 figure out problems in a more productive 584 00:26:55,760 --> 00:27:00,159 way 585 00:26:57,360 --> 00:27:02,559 finally we need supervisors and project 586 00:27:00,159 --> 00:27:05,760 managers to understand it's important to 587 00:27:02,559 --> 00:27:09,039 have a good development culture 588 00:27:05,760 --> 00:27:11,520 this is my zen of data intensive system 589 00:27:09,039 --> 00:27:14,799 design 590 00:27:11,520 --> 00:27:16,240 if you have experience like what i shared today 591 00:27:14,799 --> 00:27:19,039 you may have 592 00:27:16,240 --> 00:27:22,159 similar feelings about these things if 593 00:27:19,039 --> 00:27:23,039 not i learned these things the hard way 594 00:27:22,159 --> 00:27:26,000 and 595 00:27:23,039 --> 00:27:29,919 i hope this talk will save you from 596 00:27:26,000 --> 00:27:29,919 repeating the same mistakes 597 00:27:30,640 --> 00:27:33,880 thank you 598 00:27:34,080 --> 00:27:38,399 well thanks peter that was 599 00:27:36,159 --> 00:27:39,919 excellent um lots that i agree with 600 00:27:38,399 --> 00:27:41,600 there which is just tremendous i 601 00:27:39,919 --> 00:27:44,000 particularly liked the emphasis on 602 00:27:41,600 --> 00:27:45,520 maintainability uh that you mentioned 603 00:27:44,000 --> 00:27:46,640 towards the end of the talk i 604 00:27:45,520 --> 00:27:49,600 think that's something that we often 605 00:27:46,640 --> 00:27:52,480 overlook and the fact that it's all 606 00:27:49,600 --> 00:27:54,960 being looked after and managed by humans 607 00:27:52,480 --> 00:27:57,360 and largely as all of us here in the 608 00:27:54,960 --> 00:27:58,880 devops track are well aware the humans 609 00:27:57,360 --> 00:28:00,880 are where a lot of the challenge comes 610 00:27:58,880 --> 00:28:02,720 from so thank you so much for that talk 611 00:28:00,880 --> 00:28:05,279 um now peter is actually here in the 612 00:28:02,720 -->
00:28:07,200 chat so if you have questions about 613 00:28:05,279 --> 00:28:09,279 anything that was in the talk there 614 00:28:07,200 --> 00:28:10,320 do drop them into the chat and hopefully 615 00:28:09,279 --> 00:28:11,360 peter will be able to answer those 616 00:28:10,320 --> 00:28:12,559 questions 617 00:28:11,360 --> 00:28:15,200 um 618 00:28:12,559 --> 00:28:17,600 thank you so much that was tremendous 619 00:28:15,200 --> 00:28:18,880 now between now and the next talk we 620 00:28:17,600 --> 00:28:19,600 have another break 621 00:28:18,880 --> 00:28:21,520 so 622 00:28:19,600 --> 00:28:23,840 drop in some questions uh have a chat 623 00:28:21,520 --> 00:28:25,279 amongst yourselves um otherwise yes grab 624 00:28:23,840 --> 00:28:27,279 a quick drink 625 00:28:25,279 --> 00:28:29,440 stretch your legs and we'll be back at 626 00:28:27,279 --> 00:28:31,840 about 3 p.m. for the next talk see you 627 00:28:29,440 --> 00:28:31,840 soon