1 00:00:12,080 --> 00:00:15,440 welcome back everyone to the devops 2 00:00:13,679 --> 00:00:19,039 track here at pycon 3 00:00:15,440 --> 00:00:21,439 pyconline 2021 it's uh it's wonderful 4 00:00:19,039 --> 00:00:24,160 to have you all here and uh we welcome 5 00:00:21,439 --> 00:00:25,760 our next speaker with a tremendous talk 6 00:00:24,160 --> 00:00:27,599 it's molly rowe who is here to talk 7 00:00:25,760 --> 00:00:29,039 about metrics for good 8 00:00:27,599 --> 00:00:32,239 not evil 9 00:00:29,039 --> 00:00:33,680 molly over to you 10 00:00:32,239 --> 00:00:36,399 thank you 11 00:00:33,680 --> 00:00:38,000 hey there everyone my name is molly i'm 12 00:00:36,399 --> 00:00:39,680 here today to talk to you about metrics 13 00:00:38,000 --> 00:00:40,960 for good not evil thank you for coming 14 00:00:39,680 --> 00:00:43,520 to my talk 15 00:00:40,960 --> 00:00:46,000 um so i'm the head of people and culture 16 00:00:43,520 --> 00:00:47,280 for record point i have a pretty weird 17 00:00:46,000 --> 00:00:48,640 background 18 00:00:47,280 --> 00:00:50,160 particularly compared to a lot of the 19 00:00:48,640 --> 00:00:51,440 other people who are talking at pycon 20 00:00:50,160 --> 00:00:53,920 this weekend 21 00:00:51,440 --> 00:00:56,239 um i've been a scientist a project 22 00:00:53,920 --> 00:00:59,280 manager a scrum lead a hybrid cloud 23 00:00:56,239 --> 00:01:02,000 consultant a recruiter a manager and now 24 00:00:59,280 --> 00:01:02,960 this head of people and culture role 25 00:01:02,000 --> 00:01:04,960 um 26 00:01:02,960 --> 00:01:06,560 i've spent the last seven or so years 27 00:01:04,960 --> 00:01:09,040 trying to solve the pain points that 28 00:01:06,560 --> 00:01:11,360 plague so many tech companies 29 00:01:09,040 --> 00:01:13,840 using people-centric solutions 30 00:01:11,360 --> 00:01:16,799 so a combination of tooling technology 31 00:01:13,840 --> 00:01:19,360 metrics and empathy 32 00:01:16,799 --> 00:01:21,759 over the last seven 
years i've kind of 33 00:01:19,360 --> 00:01:24,840 run about almost a thousand interviews 34 00:01:21,759 --> 00:01:27,360 globally for engineering sre and devops 35 00:01:24,840 --> 00:01:29,119 roles um and 36 00:01:27,360 --> 00:01:30,720 in every single one of those i've been 37 00:01:29,119 --> 00:01:33,040 asking the candidates 38 00:01:30,720 --> 00:01:35,119 what kind of company do you want to work 39 00:01:33,040 --> 00:01:37,759 for and then taking that data back and 40 00:01:35,119 --> 00:01:40,000 trying to make that a reality 41 00:01:37,759 --> 00:01:42,159 so i kind of use this constant influx of 42 00:01:40,000 --> 00:01:43,759 user feedback to iterate on my idea of 43 00:01:42,159 --> 00:01:45,759 what engineers want 44 00:01:43,759 --> 00:01:47,840 and what businesses need 45 00:01:45,759 --> 00:01:51,280 and the complexity of the problem that 46 00:01:47,840 --> 00:01:51,280 separates those two things 47 00:01:52,159 --> 00:01:56,079 and kind of off the back of that i've 48 00:01:54,000 --> 00:01:58,000 built this role at record point 49 00:01:56,079 --> 00:02:00,079 for any of us who any of those who 50 00:01:58,000 --> 00:02:01,119 haven't heard of us we're a software as 51 00:02:00,079 --> 00:02:03,040 a service 52 00:02:01,119 --> 00:02:05,680 software engineering company 53 00:02:03,040 --> 00:02:07,439 um that produces a federated information 54 00:02:05,680 --> 00:02:09,599 management product 55 00:02:07,439 --> 00:02:11,360 we try to reduce the cost and the risk 56 00:02:09,599 --> 00:02:12,560 and the manual effort associated with 57 00:02:11,360 --> 00:02:16,319 going through 58 00:02:12,560 --> 00:02:18,640 data and records management audits 59 00:02:16,319 --> 00:02:21,599 we're a scale-up of about 75 people 60 00:02:18,640 --> 00:02:24,000 globally and 40 to 45 of those are 61 00:02:21,599 --> 00:02:26,319 software engineers and sres 62 00:02:24,000 --> 00:02:27,760 this means that a big chunk of my staff 63 00:02:26,319 --> 
00:02:30,959 are engineers 64 00:02:27,760 --> 00:02:32,319 and that plays a big role in my employee 65 00:02:30,959 --> 00:02:34,879 engagement 66 00:02:32,319 --> 00:02:37,360 the overall culture of the business and 67 00:02:34,879 --> 00:02:39,760 our retention rates so everything that i 68 00:02:37,360 --> 00:02:41,840 can do to improve the processes and 69 00:02:39,760 --> 00:02:43,599 remove the pain points and improve the 70 00:02:41,840 --> 00:02:45,599 developer experience 71 00:02:43,599 --> 00:02:47,599 uh is a benefit to 72 00:02:45,599 --> 00:02:51,360 the whole business 73 00:02:47,599 --> 00:02:52,160 which is why i care about slos 74 00:02:51,360 --> 00:02:54,239 so 75 00:02:52,160 --> 00:02:56,239 record point has 76 00:02:54,239 --> 00:02:58,000 gone through over the past several years 77 00:02:56,239 --> 00:02:59,760 a pretty common experience that i've 78 00:02:58,000 --> 00:03:01,440 seen across multiple different software 79 00:02:59,760 --> 00:03:03,840 engineering companies 80 00:03:01,440 --> 00:03:05,840 where we've had a shifting power balance 81 00:03:03,840 --> 00:03:07,920 between our engineering department and 82 00:03:05,840 --> 00:03:09,840 our product department 83 00:03:07,920 --> 00:03:11,040 we've had leadership transitions that 84 00:03:09,840 --> 00:03:13,599 have meant that there's been kind of 85 00:03:11,040 --> 00:03:14,560 rapid changes in technical direction as 86 00:03:13,599 --> 00:03:17,200 well as 87 00:03:14,560 --> 00:03:19,200 changes in overall delivery focus 88 00:03:17,200 --> 00:03:21,440 and even though both sides of that coin 89 00:03:19,200 --> 00:03:24,080 both product and engineering have had 90 00:03:21,440 --> 00:03:26,400 their time in the sun had their control 91 00:03:24,080 --> 00:03:26,400 um 92 00:03:26,879 --> 00:03:30,879 it hasn't really benefited anyone 93 00:03:29,120 --> 00:03:32,959 so until 94 00:03:30,879 --> 00:03:35,040 you find a balanced approach where those 95 00:03:32,959 --> 
00:03:38,080 two factions are working if not in 96 00:03:35,040 --> 00:03:40,400 harmony at least in concert 97 00:03:38,080 --> 00:03:41,440 you're going to have struggles 98 00:03:40,400 --> 00:03:43,120 so 99 00:03:41,440 --> 00:03:44,640 i kind of went out to the market and 100 00:03:43,120 --> 00:03:47,040 looked for 101 00:03:44,640 --> 00:03:48,879 a magic wand for how to solve this 102 00:03:47,040 --> 00:03:50,400 oppositional nature that you see in so 103 00:03:48,879 --> 00:03:52,560 many software engineering businesses 104 00:03:50,400 --> 00:03:53,920 between product and engineering 105 00:03:52,560 --> 00:03:55,200 uh and i wanted to see where i could find 106 00:03:53,920 --> 00:03:57,840 one 107 00:03:55,200 --> 00:04:00,560 and like most things in life 108 00:03:57,840 --> 00:04:02,239 i found that in google in this case 109 00:04:00,560 --> 00:04:04,239 actually in google 110 00:04:02,239 --> 00:04:06,159 so google's development of the sre 111 00:04:04,239 --> 00:04:09,920 handbook and the incredible 112 00:04:06,159 --> 00:04:12,319 documentation around slos slas 113 00:04:09,920 --> 00:04:14,080 and slis has kind of led me down this 114 00:04:12,319 --> 00:04:16,400 rabbit hole of telemetry and 115 00:04:14,080 --> 00:04:18,079 visualization and metrics 116 00:04:16,400 --> 00:04:20,160 that have taught me about finding a 117 00:04:18,079 --> 00:04:21,440 common language of care 118 00:04:20,160 --> 00:04:22,960 within 119 00:04:21,440 --> 00:04:26,240 that dynamic 120 00:04:22,960 --> 00:04:26,240 and the care for the user 121 00:04:27,040 --> 00:04:29,360 so 122 00:04:27,759 --> 00:04:31,280 a common complaint that i come across 123 00:04:29,360 --> 00:04:32,720 from engineers is that product 124 00:04:31,280 --> 00:04:35,280 management are changing the priorities 125 00:04:32,720 --> 00:04:37,840 too often and work never gets fully 126 00:04:35,280 --> 00:04:39,759 completed or the work that gets time of 127 00:04:37,840 --> 00:04:41,600 day 
is always feature development work 128 00:04:39,759 --> 00:04:43,600 and never platform or systems 129 00:04:41,600 --> 00:04:45,360 improvements 130 00:04:43,600 --> 00:04:47,600 and the reciprocal complaint comes out 131 00:04:45,360 --> 00:04:50,240 of product you know we're not delivering 132 00:04:47,600 --> 00:04:53,360 client value services being interrupted 133 00:04:50,240 --> 00:04:55,600 by issues and bugs and crappy code and 134 00:04:53,360 --> 00:04:57,680 unreliable platforms 135 00:04:55,600 --> 00:05:00,160 and this rhetoric 136 00:04:57,680 --> 00:05:01,919 justified or not from both sides is kind 137 00:05:00,160 --> 00:05:04,320 of founded in the fact that those teams 138 00:05:01,919 --> 00:05:06,400 are incentivized differently 139 00:05:04,320 --> 00:05:08,639 product is often measured on 140 00:05:06,400 --> 00:05:10,639 delivery of skus 141 00:05:08,639 --> 00:05:13,039 rather than overall customer experience 142 00:05:10,639 --> 00:05:14,880 and engineers measured on so many things 143 00:05:13,039 --> 00:05:17,280 right features shipped deployment 144 00:05:14,880 --> 00:05:19,440 frequency velocity 145 00:05:17,280 --> 00:05:21,120 and this really often leaves sre 146 00:05:19,440 --> 00:05:22,720 holding the bag 147 00:05:21,120 --> 00:05:24,479 they're responsible for the platform 148 00:05:22,720 --> 00:05:26,240 availability and the latency and the 149 00:05:24,479 --> 00:05:27,919 mean time to resolve whenever something 150 00:05:26,240 --> 00:05:29,919 goes wrong 151 00:05:27,919 --> 00:05:32,400 but all of those metrics are heavily 152 00:05:29,919 --> 00:05:33,919 influenced by not only 153 00:05:32,400 --> 00:05:37,280 the priorities that come out of 154 00:05:33,919 --> 00:05:40,800 engineering and product but also the 155 00:05:37,280 --> 00:05:42,240 work that's done by those two teams 156 00:05:40,800 --> 00:05:44,240 and this is where the power of data 157 00:05:42,240 --> 00:05:46,720 comes in and this is where 
the role of 158 00:05:44,240 --> 00:05:48,560 sre into the future comes in 159 00:05:46,720 --> 00:05:51,280 because sre are holding the keys to the 160 00:05:48,560 --> 00:05:53,840 kingdom when it comes to 161 00:05:51,280 --> 00:05:57,360 data and being able to create a common 162 00:05:53,840 --> 00:05:58,160 language between those two factions 163 00:05:57,360 --> 00:05:59,759 um 164 00:05:58,160 --> 00:06:01,199 for some organizations 165 00:05:59,759 --> 00:06:03,199 uh those keys are still under 166 00:06:01,199 --> 00:06:05,280 construction and that's okay because all 167 00:06:03,199 --> 00:06:06,880 the raw materials are still there 168 00:06:05,280 --> 00:06:09,520 all of the raw materials and all of the 169 00:06:06,880 --> 00:06:10,560 raw data exists within your services 170 00:06:09,520 --> 00:06:14,560 today 171 00:06:10,560 --> 00:06:14,560 and the future is about how you use it 172 00:06:15,120 --> 00:06:19,039 i personally have 173 00:06:17,280 --> 00:06:21,199 quite an obsession with metrics and data 174 00:06:19,039 --> 00:06:22,960 in general and a belief that until 175 00:06:21,199 --> 00:06:24,319 something's measured it can't really be 176 00:06:22,960 --> 00:06:26,639 improved 177 00:06:24,319 --> 00:06:27,600 at least not on purpose 178 00:06:26,639 --> 00:06:28,560 um 179 00:06:27,600 --> 00:06:30,800 generally 180 00:06:28,560 --> 00:06:33,199 and look i'll admit this talking about 181 00:06:30,800 --> 00:06:34,880 metrics will lose the attention of the 182 00:06:33,199 --> 00:06:37,280 audience and cause people's eyes to 183 00:06:34,880 --> 00:06:38,240 glaze over i know that 184 00:06:37,280 --> 00:06:40,240 but 185 00:06:38,240 --> 00:06:42,880 it's actually because most people have a 186 00:06:40,240 --> 00:06:44,720 pretty crappy experience with metrics 187 00:06:42,880 --> 00:06:47,199 almost all of us have sat down in a 188 00:06:44,720 --> 00:06:48,960 meeting or a retro and gone through the 189 00:06:47,199 --> 00:06:50,160 
data and gone 190 00:06:48,960 --> 00:06:52,479 yeah 191 00:06:50,160 --> 00:06:54,160 but that's not right because 192 00:06:52,479 --> 00:06:55,120 x 193 00:06:54,160 --> 00:06:56,720 and 194 00:06:55,120 --> 00:06:59,680 going we all end up going down this 195 00:06:56,720 --> 00:07:02,080 justification or explanation pathway of 196 00:06:59,680 --> 00:07:05,120 why that data is incorrect that even to 197 00:07:02,080 --> 00:07:07,440 our own ears feels like excuses 198 00:07:05,120 --> 00:07:09,360 and that's not our fault 199 00:07:07,440 --> 00:07:12,160 this is because in general people are 200 00:07:09,360 --> 00:07:14,319 really terrible at setting metrics and 201 00:07:12,160 --> 00:07:16,400 we often default to setting metrics that 202 00:07:14,319 --> 00:07:17,599 address the symptom of a problem rather 203 00:07:16,400 --> 00:07:20,400 than a root 204 00:07:17,599 --> 00:07:20,400 a root cause 205 00:07:20,479 --> 00:07:24,960 um so let me tell you a quick story 206 00:07:22,319 --> 00:07:27,039 about metrics gone wrong 207 00:07:24,960 --> 00:07:29,280 back in the days of colonially occupied 208 00:07:27,039 --> 00:07:30,720 india the british governor was concerned 209 00:07:29,280 --> 00:07:32,160 with the number of venomous cobras in 210 00:07:30,720 --> 00:07:33,919 the streets of delhi 211 00:07:32,160 --> 00:07:36,800 he decided to implement a scheme 212 00:07:33,919 --> 00:07:39,680 offering cash rewards for cobra heads so 213 00:07:36,800 --> 00:07:41,280 basically he equated dead snakes equals 214 00:07:39,680 --> 00:07:42,319 less snakes in the street 215 00:07:41,280 --> 00:07:45,440 seems 216 00:07:42,319 --> 00:07:47,199 okay on the surface um 217 00:07:45,440 --> 00:07:49,919 initially the scheme seemed to be 218 00:07:47,199 --> 00:07:51,599 successful they were redeeming 219 00:07:49,919 --> 00:07:52,800 or lots of people were redeeming the 220 00:07:51,599 --> 00:07:55,199 cash prize 221 00:07:52,800 --> 00:07:57,280 um and you know 
assuming that this 222 00:07:55,199 --> 00:07:58,560 continues on track the number of snakes 223 00:07:57,280 --> 00:08:00,560 in the street was 224 00:07:58,560 --> 00:08:03,840 you know surely going to decrease 225 00:08:00,560 --> 00:08:06,879 however over time that didn't happen 226 00:08:03,840 --> 00:08:09,120 and they were trying to figure out why 227 00:08:06,879 --> 00:08:12,080 so upon investigation 228 00:08:09,120 --> 00:08:14,560 what they found was that some very 229 00:08:12,080 --> 00:08:16,560 innovative individuals and then all of 230 00:08:14,560 --> 00:08:20,319 their enterprising neighbors 231 00:08:16,560 --> 00:08:22,000 had started farming cobras 232 00:08:20,319 --> 00:08:22,879 yeah 233 00:08:22,000 --> 00:08:24,960 they 234 00:08:22,879 --> 00:08:26,400 basically started farming these cobras 235 00:08:24,960 --> 00:08:29,360 with the full intent to kill them and 236 00:08:26,400 --> 00:08:31,680 then redeem them for the cash prize 237 00:08:29,360 --> 00:08:33,120 in a rage the governor canceled the 238 00:08:31,680 --> 00:08:34,880 scheme and said this is not in the 239 00:08:33,120 --> 00:08:36,479 spirit of what we intended i was trying 240 00:08:34,880 --> 00:08:39,039 to protect you 241 00:08:36,479 --> 00:08:41,760 no more no more money for snakes 242 00:08:39,039 --> 00:08:43,919 and so what you then have is a populace 243 00:08:41,760 --> 00:08:45,200 who have snake farms in their houses and 244 00:08:43,919 --> 00:08:47,440 backyards 245 00:08:45,200 --> 00:08:49,360 who no longer have an incentive to do so 246 00:08:47,440 --> 00:08:51,600 to keep those snakes 247 00:08:49,360 --> 00:08:53,040 and so they did the easiest and most 248 00:08:51,600 --> 00:08:56,720 predictable thing 249 00:08:53,040 --> 00:08:56,720 and let the snakes go into the streets 250 00:08:56,959 --> 00:09:00,399 so 251 00:08:58,720 --> 00:09:02,399 with a well-meaning well-intentioned 252 00:09:00,399 --> 00:09:04,959 scheme to reduce the number of 
venomous 253 00:09:02,399 --> 00:09:07,279 cobras and protect the population 254 00:09:04,959 --> 00:09:09,440 the actual result was that by several 255 00:09:07,279 --> 00:09:13,120 orders of magnitude they increased the 256 00:09:09,440 --> 00:09:13,120 number of snakes in the streets in delhi 257 00:09:13,200 --> 00:09:17,839 this law of unintended consequences has 258 00:09:16,080 --> 00:09:20,480 come to be known as the cobra effect 259 00:09:17,839 --> 00:09:22,959 it's so synonymous with this scenario 260 00:09:20,480 --> 00:09:24,880 that i've just um described 261 00:09:22,959 --> 00:09:27,279 and you know you might be tempted to say 262 00:09:24,880 --> 00:09:29,680 look molly that was 200 years ago in 263 00:09:27,279 --> 00:09:31,519 colonial india this is not a great 264 00:09:29,680 --> 00:09:35,399 example people wouldn't do anything 265 00:09:31,519 --> 00:09:35,399 quite so stupid anymore 266 00:09:36,240 --> 00:09:39,200 well 267 00:09:37,040 --> 00:09:42,640 the most recent entry into the cobra 268 00:09:39,200 --> 00:09:45,360 effect hall of fame is uh it goes to the 269 00:09:42,640 --> 00:09:47,680 university in the usa who recently had 270 00:09:45,360 --> 00:09:50,800 to release a statement that said 271 00:09:47,680 --> 00:09:53,600 to any students who have voluntarily 272 00:09:50,800 --> 00:09:54,880 exposed themselves to covid 19 if you 273 00:09:53,600 --> 00:09:57,279 continue 274 00:09:54,880 --> 00:09:59,040 to any students who have done that you 275 00:09:57,279 --> 00:10:00,880 run the risk of being suspended or 276 00:09:59,040 --> 00:10:02,480 expelled 277 00:10:00,880 --> 00:10:04,880 they had to release this statement 278 00:10:02,480 --> 00:10:07,040 because the local plasma donation center 279 00:10:04,880 --> 00:10:09,519 had increased the money reward for 280 00:10:07,040 --> 00:10:11,760 donating plasma for anyone who had 281 00:10:09,519 --> 00:10:13,680 active covid 19 antibodies so someone 282 00:10:11,760 --> 
00:10:16,880 who had had covid to more than a 283 00:10:13,680 --> 00:10:17,920 hundred dollars per donation 284 00:10:16,880 --> 00:10:19,360 the scheme was completely 285 00:10:17,920 --> 00:10:21,200 well-intentioned they were trying to 286 00:10:19,360 --> 00:10:25,279 treat people who were in critical care 287 00:10:21,200 --> 00:10:27,839 with covid which is a use of plasma 288 00:10:25,279 --> 00:10:30,800 and instead what they created was 289 00:10:27,839 --> 00:10:33,279 more people in critical care with covid 290 00:10:30,800 --> 00:10:34,880 in that area 291 00:10:33,279 --> 00:10:37,519 so 292 00:10:34,880 --> 00:10:39,360 this is me circling back to why most 293 00:10:37,519 --> 00:10:41,279 people have a terrible traditional 294 00:10:39,360 --> 00:10:43,519 experience with metrics 295 00:10:41,279 --> 00:10:46,000 humans are pretty bad at using critical 296 00:10:43,519 --> 00:10:47,920 analysis when it comes to incentives and 297 00:10:46,000 --> 00:10:49,839 are rarely just looking for an empirical 298 00:10:47,920 --> 00:10:51,920 outcome like you should rarely be 299 00:10:49,839 --> 00:10:53,519 looking for an empirical outcome you're 300 00:10:51,920 --> 00:10:56,720 often trying to actually influence the 301 00:10:53,519 --> 00:10:59,519 behaviors that drive that outcome 302 00:10:56,720 --> 00:11:01,120 you're not really looking for snake heads 303 00:10:59,519 --> 00:11:03,760 i hope 304 00:11:01,120 --> 00:11:05,519 you're actually looking for less snakes 305 00:11:03,760 --> 00:11:08,079 you're not really looking for more test 306 00:11:05,519 --> 00:11:10,079 coverage you're looking for better code 307 00:11:08,079 --> 00:11:12,160 so when you're proxying the behavior 308 00:11:10,079 --> 00:11:13,760 that you want with an easy to measure 309 00:11:12,160 --> 00:11:15,760 symptom 310 00:11:13,760 --> 00:11:18,320 this is a common downfall of general 311 00:11:15,760 --> 00:11:18,320 metrics 312 00:11:19,760 --> 00:11:23,040 it's pretty 
much just a case of be 313 00:11:21,360 --> 00:11:24,959 careful what you wish for 314 00:11:23,040 --> 00:11:26,720 um there's a couple of easy ways to 315 00:11:24,959 --> 00:11:29,279 counteract a lot of these pitfalls that 316 00:11:26,720 --> 00:11:30,399 go beyond imagine how you would game the 317 00:11:29,279 --> 00:11:32,079 system 318 00:11:30,399 --> 00:11:33,680 because people are infinitely 319 00:11:32,079 --> 00:11:37,040 resourceful particularly when you 320 00:11:33,680 --> 00:11:37,040 incentivize them to be so 321 00:11:37,120 --> 00:11:40,800 pretty much everyone should be familiar 322 00:11:38,560 --> 00:11:43,040 with this cost speed quality trade-off 323 00:11:40,800 --> 00:11:45,120 triangle and this is actually a great 324 00:11:43,040 --> 00:11:46,399 way to build metrics 325 00:11:45,120 --> 00:11:49,120 because you should have metrics that 326 00:11:46,399 --> 00:11:51,200 contradict each other 327 00:11:49,120 --> 00:11:53,440 what that allows you to do is create a 328 00:11:51,200 --> 00:11:56,639 balance where 329 00:11:53,440 --> 00:11:58,959 to exceed or to game a metric or to 330 00:11:56,639 --> 00:12:01,040 push one metric really hard you're going 331 00:11:58,959 --> 00:12:02,959 to influence those other metrics to 332 00:12:01,040 --> 00:12:05,440 their detriment and therefore have an 333 00:12:02,959 --> 00:12:08,240 overall worse outcome than if you hadn't 334 00:12:05,440 --> 00:12:10,320 like tried to game that singular metric 335 00:12:08,240 --> 00:12:12,800 a really great example of this is nicole 336 00:12:10,320 --> 00:12:14,639 forsgren's metrics for 337 00:12:12,800 --> 00:12:15,920 high performing teams 338 00:12:14,639 --> 00:12:17,680 she outlines these in her book 339 00:12:15,920 --> 00:12:19,200 accelerate which i highly recommend you 340 00:12:17,680 --> 00:12:21,920 read 341 00:12:19,200 --> 00:12:24,160 but basically the premise is that 342 00:12:21,920 --> 00:12:26,720 lead time deployment 
frequency change 343 00:12:24,160 --> 00:12:28,720 fail ratio and mean time to resolve are 344 00:12:26,720 --> 00:12:31,120 metrics that are indicators of high 345 00:12:28,720 --> 00:12:32,800 performing teams 346 00:12:31,120 --> 00:12:35,279 lead time and deployment frequency 347 00:12:32,800 --> 00:12:37,440 obviously correlate to speed and change 348 00:12:35,279 --> 00:12:41,839 fail ratio and mean time to resolve 349 00:12:37,440 --> 00:12:41,839 obviously correlate to quality or um 350 00:12:42,959 --> 00:12:47,440 reliability sorry thank you um 351 00:12:46,079 --> 00:12:48,880 and so 352 00:12:47,440 --> 00:12:51,120 you know if you're trying to optimize 353 00:12:48,880 --> 00:12:52,959 for your change fail ratio and you know 354 00:12:51,120 --> 00:12:55,200 you're trying to push out perfect code 355 00:12:52,959 --> 00:12:56,560 that has no bugs you're going to impact 356 00:12:55,200 --> 00:12:59,680 your lead time and your deployment 357 00:12:56,560 --> 00:13:01,680 frequency negatively 358 00:12:59,680 --> 00:13:03,839 these are the kind of principles that 359 00:13:01,680 --> 00:13:05,440 led me to the investigation of slos 360 00:13:03,839 --> 00:13:07,600 right the new problem is how do we 361 00:13:05,440 --> 00:13:10,000 define metrics that encompass the whole 362 00:13:07,600 --> 00:13:11,920 product and incentivize good behaviors 363 00:13:10,000 --> 00:13:13,360 between competing elements 364 00:13:11,920 --> 00:13:16,000 it starts with kind of finding that 365 00:13:13,360 --> 00:13:17,920 balance of power between two factions 366 00:13:16,000 --> 00:13:20,320 so no longer talking about speed and 367 00:13:17,920 --> 00:13:22,880 quality or reliability but now talking 368 00:13:20,320 --> 00:13:24,240 about product and engineering as the 369 00:13:22,880 --> 00:13:28,000 business 370 00:13:24,240 --> 00:13:28,000 kind of equivalents of those things 371 00:13:28,480 --> 00:13:31,920 the thing that's often 372 00:13:30,079 --> 00:13:34,480 
kind of missing when you do have two 373 00:13:31,920 --> 00:13:36,800 powerful factions vying for control 374 00:13:34,480 --> 00:13:38,720 is an impartial arbiter particularly 375 00:13:36,800 --> 00:13:40,240 when those factions have different 376 00:13:38,720 --> 00:13:41,519 objectives and they're being measured 377 00:13:40,240 --> 00:13:44,800 differently 378 00:13:41,519 --> 00:13:45,519 so the role of sre that's where we come 379 00:13:44,800 --> 00:13:47,279 in 380 00:13:45,519 --> 00:13:50,399 so sre 381 00:13:47,279 --> 00:13:52,880 owning the data across the business and 382 00:13:50,399 --> 00:13:54,399 being able to provide a common language 383 00:13:52,880 --> 00:13:56,560 and a common 384 00:13:54,399 --> 00:13:57,760 thread of discussion between those two 385 00:13:56,560 --> 00:14:00,240 factions 386 00:13:57,760 --> 00:14:02,560 with data to back it up becomes a new 387 00:14:00,240 --> 00:14:05,440 way forward 388 00:14:02,560 --> 00:14:07,519 the role of sre as defined by google 389 00:14:05,440 --> 00:14:09,519 is responsible for so many pieces of 390 00:14:07,519 --> 00:14:11,680 data so they're responsible for the 391 00:14:09,519 --> 00:14:13,680 availability the latency the performance 392 00:14:11,680 --> 00:14:16,000 the efficiency the change management 393 00:14:13,680 --> 00:14:17,120 monitoring emergency response capacity 394 00:14:16,000 --> 00:14:19,120 planning 395 00:14:17,120 --> 00:14:21,600 of all of their services 396 00:14:19,120 --> 00:14:23,680 and this places sre in this pivotal area 397 00:14:21,600 --> 00:14:26,720 of control as the provider and the 398 00:14:23,680 --> 00:14:26,720 purveyor of data 399 00:14:26,959 --> 00:14:32,720 if we rephrase this balance again but 400 00:14:30,000 --> 00:14:34,880 now in terms of sre as the middleman 401 00:14:32,720 --> 00:14:37,279 providing a common language that impacts 402 00:14:34,880 --> 00:14:39,440 both product and engineering we're now 403 00:14:37,279 --> 00:14:40,560 
talking about speed and reliability 404 00:14:39,440 --> 00:14:41,519 again 405 00:14:40,560 --> 00:14:43,760 but 406 00:14:41,519 --> 00:14:45,680 where we have slos 407 00:14:43,760 --> 00:14:47,360 or any kind of metric to come in and 408 00:14:45,680 --> 00:14:49,920 bridge the gap and provide a 409 00:14:47,360 --> 00:14:52,320 quantifiable answer to what is important 410 00:14:49,920 --> 00:14:54,720 at any given time and why 411 00:14:52,320 --> 00:14:58,079 as well as that common language to how 412 00:14:54,720 --> 00:14:59,279 to productively discuss it in a way that 413 00:14:58,079 --> 00:15:02,160 we're bringing 414 00:14:59,279 --> 00:15:05,440 logic and data into what is often a very 415 00:15:02,160 --> 00:15:05,440 emotional exchange 416 00:15:07,040 --> 00:15:10,720 so 417 00:15:08,560 --> 00:15:13,600 expectations from users around features 418 00:15:10,720 --> 00:15:16,480 reliability availability security and 419 00:15:13,600 --> 00:15:17,760 quality are all increasing exponentially 420 00:15:16,480 --> 00:15:19,920 right 421 00:15:17,760 --> 00:15:21,760 but most organizations are making 422 00:15:19,920 --> 00:15:24,160 trade-offs and are not really set up to 423 00:15:21,760 --> 00:15:26,639 deliver on all of these vectors which is 424 00:15:24,160 --> 00:15:28,399 the speed and reliability compromise 425 00:15:26,639 --> 00:15:31,040 that we've been talking about because it 426 00:15:28,399 --> 00:15:32,560 really is fundamentally underlying the 427 00:15:31,040 --> 00:15:33,839 the conflict between engineering and 428 00:15:32,560 --> 00:15:36,000 product 429 00:15:33,839 --> 00:15:38,320 on one axis there's this desire and need 430 00:15:36,000 --> 00:15:40,959 to have rock solid stability 431 00:15:38,320 --> 00:15:43,360 um and reliability but the challenge is 432 00:15:40,959 --> 00:15:44,880 if that's what you're optimizing for 433 00:15:43,360 --> 00:15:46,160 you're not innovating to your full 434 00:15:44,880 --> 00:15:47,920 
potential 435 00:15:46,160 --> 00:15:50,399 and you're going to have issues within 436 00:15:47,920 --> 00:15:52,000 the market if not now in the future 437 00:15:50,399 --> 00:15:53,920 and on the other hand you can't spend 438 00:15:52,000 --> 00:15:56,639 all of your time pushing for feature 439 00:15:53,920 --> 00:15:58,720 delivery without regard to stability or 440 00:15:56,639 --> 00:16:00,800 you'll rapidly accrue risk and technical 441 00:15:58,720 --> 00:16:03,440 debt and potentially churn your existing 442 00:16:00,800 --> 00:16:03,440 customers 443 00:16:04,800 --> 00:16:07,920 so 444 00:16:06,240 --> 00:16:09,920 what i haven't defined for you and i 445 00:16:07,920 --> 00:16:12,880 want to touch on really briefly is what 446 00:16:09,920 --> 00:16:14,560 are slas slos and slis 447 00:16:12,880 --> 00:16:17,600 this all comes straight from the google 448 00:16:14,560 --> 00:16:18,880 handbook sre handbook please i encourage 449 00:16:17,600 --> 00:16:21,759 you to read it 450 00:16:18,880 --> 00:16:24,720 it is much more thrilling than it sounds 451 00:16:21,759 --> 00:16:27,120 but slos slis and slas are exclusively 452 00:16:24,720 --> 00:16:29,600 used as metrics that capture parts of 453 00:16:27,120 --> 00:16:31,920 your user journey such as availability 454 00:16:29,600 --> 00:16:34,240 or request latency or throughput or 455 00:16:31,920 --> 00:16:36,399 error rate they give you a metric for 456 00:16:34,240 --> 00:16:38,639 both sides of that coin 457 00:16:36,399 --> 00:16:40,959 um as like the coin of product versus 458 00:16:38,639 --> 00:16:43,279 engineering if you assume that both 459 00:16:40,959 --> 00:16:44,240 engineering and product care about your 460 00:16:43,279 --> 00:16:47,199 user 461 00:16:44,240 --> 00:16:47,199 which i hope they do 462 00:16:47,440 --> 00:16:51,360 so your sla is your service level 463 00:16:49,279 --> 00:16:53,600 agreement it's your external metric to 464 00:16:51,360 --> 00:16:56,000 which your business 
has committed to 465 00:16:53,600 --> 00:16:57,839 legally and with generally a monetary 466 00:16:56,000 --> 00:17:00,079 obligation to meet 467 00:16:57,839 --> 00:17:02,720 this is often something 468 00:17:00,079 --> 00:17:04,640 reasonable but after which it's the kind 469 00:17:02,720 --> 00:17:05,760 of the baseline after which your clients 470 00:17:04,640 --> 00:17:08,480 are 471 00:17:05,760 --> 00:17:09,360 not happy with their service 472 00:17:08,480 --> 00:17:12,240 um 473 00:17:09,360 --> 00:17:14,959 your slo is your service level objective 474 00:17:12,240 --> 00:17:16,640 it is the internal target for the metric 475 00:17:14,959 --> 00:17:18,400 that you're measuring that should 476 00:17:16,640 --> 00:17:20,880 represent 477 00:17:18,400 --> 00:17:23,760 where your client starts to feel pain it 478 00:17:20,880 --> 00:17:26,480 it represents your optimal point where 479 00:17:23,760 --> 00:17:27,839 you want to hit from a reliability 480 00:17:26,480 --> 00:17:31,039 versus speed perspective or a 481 00:17:27,839 --> 00:17:32,720 reliability versus risk perspective 482 00:17:31,039 --> 00:17:35,120 and your sli is your service level 483 00:17:32,720 --> 00:17:36,559 indicator it's the it's the measure of 484 00:17:35,120 --> 00:17:37,600 service reliability it's what you're 485 00:17:36,559 --> 00:17:39,120 measuring 486 00:17:37,600 --> 00:17:40,960 um 487 00:17:39,120 --> 00:17:42,240 slis will tell you that something is 488 00:17:40,960 --> 00:17:46,919 wrong and you need to use all of your 489 00:17:42,240 --> 00:17:46,919 other metrics to figure out what that is 490 00:17:48,000 --> 00:17:51,039 so 491 00:17:49,360 --> 00:17:52,320 this is a good depiction of how this 492 00:17:51,039 --> 00:17:54,160 works right 493 00:17:52,320 --> 00:17:56,320 so 494 00:17:54,160 --> 00:17:58,320 your agreement is just enough to stop 495 00:17:56,320 --> 00:18:00,320 your customer being unhappy or leaving 496 00:17:58,320 --> 00:18:02,799 or churning 
that's what they've agreed 497 00:18:00,320 --> 00:18:05,039 to as an acceptable level 498 00:18:02,799 --> 00:18:07,039 your objective your slo has to be 499 00:18:05,039 --> 00:18:08,799 tighter than that agreement and it 500 00:18:07,039 --> 00:18:10,559 should represent your desired user 501 00:18:08,799 --> 00:18:12,480 experience 502 00:18:10,559 --> 00:18:14,480 breaching your objective has to have 503 00:18:12,480 --> 00:18:16,799 consequences as well you shouldn't just 504 00:18:14,480 --> 00:18:18,880 be only if we breach the sla is there a 505 00:18:16,799 --> 00:18:21,120 problem breaching your objective is 506 00:18:18,880 --> 00:18:22,240 where you need to be able to leverage 507 00:18:21,120 --> 00:18:24,240 that data 508 00:18:22,240 --> 00:18:25,919 within the business to change priorities 509 00:18:24,240 --> 00:18:27,120 of what's being worked on because 510 00:18:25,919 --> 00:18:28,400 you're heading in the wrong 511 00:18:27,120 --> 00:18:30,160 direction 512 00:18:28,400 --> 00:18:33,200 and it allows you to be proactive before 513 00:18:30,160 --> 00:18:36,799 there is a monetary problem 514 00:18:33,200 --> 00:18:39,600 where you're impacting your sla 515 00:18:36,799 --> 00:18:40,960 anything above your slo means that 516 00:18:39,600 --> 00:18:43,039 you're spending 517 00:18:40,960 --> 00:18:45,039 too much time on reliability and you're 518 00:18:43,039 --> 00:18:46,880 wasting effort that could be used to 519 00:18:45,039 --> 00:18:50,080 deliver features 520 00:18:46,880 --> 00:18:52,799 so anything better than your slo is 521 00:18:50,080 --> 00:18:52,799 wasted effort 522 00:18:54,320 --> 00:18:58,720 very quickly about error budgets error 523 00:18:56,160 --> 00:19:00,559 budgets is the next stage in this 524 00:18:58,720 --> 00:19:03,280 scenario where an error budget is 525 00:19:00,559 --> 00:19:04,799 monitoring your slo over time 526 00:19:03,280 --> 00:19:08,400 if your slo 527 00:19:04,799 --> 00:19:10,400 as in 
the previous slide is 99.95 percent 528 00:19:08,400 --> 00:19:12,160 availability then 529 00:19:10,400 --> 00:19:15,520 your error budget would be 1 minus your 530 00:19:12,160 --> 00:19:18,160 slo 0.05 percent 531 00:19:15,520 --> 00:19:19,120 this means that if you map that out over 532 00:19:18,160 --> 00:19:21,600 the month 533 00:19:19,120 --> 00:19:24,480 0.05 percent of the minutes in the month gives 534 00:19:21,600 --> 00:19:26,080 you about 22 minutes 535 00:19:24,480 --> 00:19:28,080 sre practices encourage you to 536 00:19:26,080 --> 00:19:30,960 strategically burn that budget to zero 537 00:19:28,080 --> 00:19:33,200 on purpose to do things like 538 00:19:30,960 --> 00:19:34,799 deliver new features run expected 539 00:19:33,200 --> 00:19:37,120 systems changes 540 00:19:34,799 --> 00:19:39,840 use planned downtime or just do a 541 00:19:37,120 --> 00:19:41,760 slightly risky experiment it means that 542 00:19:39,840 --> 00:19:43,840 if you're hitting or using your error 543 00:19:41,760 --> 00:19:46,000 budget you're running as fast as you 544 00:19:43,840 --> 00:19:47,760 possibly can without impacting your 545 00:19:46,000 --> 00:19:49,840 availability and without impacting your 546 00:19:47,760 --> 00:19:51,280 client 547 00:19:49,840 --> 00:19:53,360 error budgets are not something you need 548 00:19:51,280 --> 00:19:55,600 to do right away if you're investigating 549 00:19:53,360 --> 00:19:58,559 slos for your own business start at the 550 00:19:55,600 --> 00:20:00,080 start start with slos and slis and 551 00:19:58,559 --> 00:20:01,280 setting those metrics and starting to 552 00:20:00,080 --> 00:20:03,360 measure them and make sure they're the 553 00:20:01,280 --> 00:20:04,799 right thing error budgets are things for 554 00:20:03,360 --> 00:20:06,640 down the track they don't have to be 555 00:20:04,799 --> 00:20:10,159 something that you start with you don't 556 00:20:06,640 --> 00:20:10,159 have to go all out right away 557 00:20:10,960 --> 00:20:16,720 so martin fowler
very famously 2018 said 558 00:20:14,320 --> 00:20:18,720 evidence refutes the bimodal it 559 00:20:16,720 --> 00:20:21,120 notion that you have to choose between 560 00:20:18,720 --> 00:20:23,520 speed and stability instead speed 561 00:20:21,120 --> 00:20:25,520 depends on stability so good it practice 562 00:20:23,520 --> 00:20:27,440 gives you both 563 00:20:25,520 --> 00:20:28,960 so 564 00:20:27,440 --> 00:20:31,919 what does good i.t practice actually 565 00:20:28,960 --> 00:20:33,200 look like and how does data and slos and 566 00:20:31,919 --> 00:20:35,200 all of the things that i've talked about 567 00:20:33,200 --> 00:20:38,240 implementation of sre within a business 568 00:20:35,200 --> 00:20:40,960 how does that get you there 569 00:20:38,240 --> 00:20:44,080 using your metrics for good starts with 570 00:20:40,960 --> 00:20:46,960 what are you trying to get out of them 571 00:20:44,080 --> 00:20:48,960 so common language slos provide you with 572 00:20:46,960 --> 00:20:50,880 that common language between product and 573 00:20:48,960 --> 00:20:53,440 engineering of how do you talk about 574 00:20:50,880 --> 00:20:56,960 what is important and what is important 575 00:20:53,440 --> 00:20:56,960 should almost always be your customer 576 00:20:58,080 --> 00:21:02,559 you also now have hard data to influence 577 00:21:00,400 --> 00:21:04,559 that as i said previously like quite 578 00:21:02,559 --> 00:21:05,840 emotional decision product and 579 00:21:04,559 --> 00:21:09,039 engineering are 580 00:21:05,840 --> 00:21:10,480 often rewarded and incentivized on the 581 00:21:09,039 --> 00:21:12,320 metrics that they as individual 582 00:21:10,480 --> 00:21:15,200 departments care about rather than that 583 00:21:12,320 --> 00:21:17,919 centralized core component of the client 584 00:21:15,200 --> 00:21:19,919 so having hard data that backs up that 585 00:21:17,919 --> 00:21:22,480 work needs to happen to stabilize the 586 00:21:19,919 --> 00:21:24,480 product 
is really important 587 00:21:22,480 --> 00:21:26,000 it also allows you to be proactive you 588 00:21:24,480 --> 00:21:27,760 know when things are trending in a bad 589 00:21:26,000 --> 00:21:30,640 direction where you're going to breach 590 00:21:27,760 --> 00:21:32,640 an slo or an sla you also know that you 591 00:21:30,640 --> 00:21:34,880 can be proactively releasing innovative 592 00:21:32,640 --> 00:21:37,520 pieces of work because you're consuming 593 00:21:34,880 --> 00:21:38,559 your error budget to do so 594 00:21:37,520 --> 00:21:40,320 and 595 00:21:38,559 --> 00:21:42,840 the justification piece is really about 596 00:21:40,320 --> 00:21:45,600 how you have that productive 597 00:21:42,840 --> 00:21:47,360 discussion so having slos in place for 598 00:21:45,600 --> 00:21:50,000 your production services allows you to 599 00:21:47,360 --> 00:21:51,600 remove all of that emotional ambiguity 600 00:21:50,000 --> 00:21:54,320 when it comes to figuring out the impact 601 00:21:51,600 --> 00:21:56,720 of an unplanned change or outage 602 00:21:54,320 --> 00:21:59,039 um businesses often refuse to invest in 603 00:21:56,720 --> 00:22:00,720 availability or reliability until the 604 00:21:59,039 --> 00:22:02,799 bottom line's impacted 605 00:22:00,720 --> 00:22:04,480 so it's really important to have that 606 00:22:02,799 --> 00:22:06,720 data to back up what you're trying to 607 00:22:04,480 --> 00:22:06,720 say 608 00:22:07,280 --> 00:22:11,280 this is where sre comes in as a whole 609 00:22:09,280 --> 00:22:13,200 and the role of sre may change going 610 00:22:11,280 --> 00:22:16,320 forward in the future at least 611 00:22:13,200 --> 00:22:17,200 from kind of what i understand behind it 612 00:22:16,320 --> 00:22:19,280 um 613 00:22:17,200 --> 00:22:21,600 initiatives like slos can be really 614 00:22:19,280 --> 00:22:24,080 difficult to get buy-in for and the 615 00:22:21,600 --> 00:22:26,000 owner to kick off really has to be the 616 00:22:24,080 --> 
00:22:28,320 sre team 617 00:22:26,000 --> 00:22:30,960 [Music] 618 00:22:28,320 --> 00:22:32,799 but it can't be them in isolation you 619 00:22:30,960 --> 00:22:34,480 know you're talking about metrics of 620 00:22:32,799 --> 00:22:36,559 what impacts the clients and what the 621 00:22:34,480 --> 00:22:38,400 clients or the customers care about 622 00:22:36,559 --> 00:22:40,080 which means that you need to be involved 623 00:22:38,400 --> 00:22:42,159 with your product owners with your 624 00:22:40,080 --> 00:22:43,919 customers themselves or with your 625 00:22:42,159 --> 00:22:45,760 customer success teams as well as your 626 00:22:43,919 --> 00:22:47,440 engineers who have to do 627 00:22:45,760 --> 00:22:49,200 the work to make sure that we're hitting 628 00:22:47,440 --> 00:22:52,000 those slos because your slos shouldn't 629 00:22:49,200 --> 00:22:55,799 be aspirational they need to be 630 00:22:52,000 --> 00:22:55,799 something that you can achieve 631 00:22:56,400 --> 00:23:01,280 there are lots of selling points behind 632 00:22:58,880 --> 00:23:03,440 slos as a program 633 00:23:01,280 --> 00:23:05,039 but in general 634 00:23:03,440 --> 00:23:06,240 the ability to surface your technical 635 00:23:05,039 --> 00:23:09,280 debt 636 00:23:06,240 --> 00:23:11,760 related to reliability or lack thereof 637 00:23:09,280 --> 00:23:14,000 means that you know you can advocate for 638 00:23:11,760 --> 00:23:15,760 what you need in terms of allocation of 639 00:23:14,000 --> 00:23:17,360 engineering resources 640 00:23:15,760 --> 00:23:19,600 you're reducing your manual effort from 641 00:23:17,360 --> 00:23:21,200 an sre perspective of generating this 642 00:23:19,600 --> 00:23:22,240 data you know what you're going to talk 643 00:23:21,200 --> 00:23:24,080 about you know what you're going to 644 00:23:22,240 --> 00:23:26,159 review you know what the metrics are and 645 00:23:24,080 --> 00:23:28,400 what we care about 646 00:23:26,159 --> 00:23:31,200 and as i 
said you're also reducing your 647 00:23:28,400 --> 00:23:33,120 monetary risk to the 648 00:23:31,200 --> 00:23:35,280 business so this is how you sell it into 649 00:23:33,120 --> 00:23:37,520 the business is reducing that 650 00:23:35,280 --> 00:23:39,280 monetary risk or risk of churn by 651 00:23:37,520 --> 00:23:42,320 improving your customer 652 00:23:39,280 --> 00:23:42,320 satisfaction rates 653 00:23:42,480 --> 00:23:46,559 um 654 00:23:44,720 --> 00:23:49,520 these are not for everyone i'm not 655 00:23:46,559 --> 00:23:52,159 trying to say that slos or slis or slas 656 00:23:49,520 --> 00:23:53,200 any of those things are for every 657 00:23:52,159 --> 00:23:55,120 business 658 00:23:53,200 --> 00:23:56,720 uh there is a level of maturity 659 00:23:55,120 --> 00:23:58,400 particularly in the businesses that have 660 00:23:56,720 --> 00:24:00,880 actually already implemented this to 661 00:23:58,400 --> 00:24:02,960 great success the googles the 662 00:24:00,880 --> 00:24:04,480 evernotes the twitters 663 00:24:02,960 --> 00:24:06,240 those guys are huge 664 00:24:04,480 --> 00:24:07,840 you don't have to start there you don't 665 00:24:06,240 --> 00:24:09,919 have to look at them and go this is this 666 00:24:07,840 --> 00:24:12,159 huge unachievable mountain 667 00:24:09,919 --> 00:24:14,480 um you know you can start with two to 668 00:24:12,159 --> 00:24:16,720 three slos and iterate and work your way 669 00:24:14,480 --> 00:24:18,400 up you can start to build out that team 670 00:24:16,720 --> 00:24:20,480 or you can build it internally as a 671 00:24:18,400 --> 00:24:22,320 proof of concept within your sre team 672 00:24:20,480 --> 00:24:24,000 and then start to use 673 00:24:22,320 --> 00:24:25,840 the quality and the relevance of the 674 00:24:24,000 --> 00:24:28,080 data that you're producing 675 00:24:25,840 --> 00:24:29,760 to get buy-in from other pieces of the 676 00:24:28,080 --> 00:24:33,360 business as well 677
00:24:29,760 --> 00:24:35,760 and drive that adoption 678 00:24:33,360 --> 00:24:37,279 failure is going to happen like this is 679 00:24:35,760 --> 00:24:39,600 the devops track you guys have been 680 00:24:37,279 --> 00:24:42,400 listening to failure and learnings and 681 00:24:39,600 --> 00:24:44,320 you know amazing triumphs through that 682 00:24:42,400 --> 00:24:46,240 all day 683 00:24:44,320 --> 00:24:47,919 as you implement these kinds of things 684 00:24:46,240 --> 00:24:49,679 failures are going to occur slos are 685 00:24:47,919 --> 00:24:52,480 going to be breached systems are made by 686 00:24:49,679 --> 00:24:54,320 humans and we've already discussed 687 00:24:52,480 --> 00:24:55,520 in quite a bit of detail how humans are 688 00:24:54,320 --> 00:24:57,760 imperfect 689 00:24:55,520 --> 00:24:59,919 so what's important is learning from 690 00:24:57,760 --> 00:25:01,840 these and continuing to iterate on your 691 00:24:59,919 --> 00:25:03,840 slos you should be going back and 692 00:25:01,840 --> 00:25:05,039 looking at them on a regular cadence to 693 00:25:03,840 --> 00:25:06,799 make sure that they're reflecting the 694 00:25:05,039 --> 00:25:09,679 things that you want or the things that 695 00:25:06,799 --> 00:25:11,919 your clients still want 696 00:25:09,679 --> 00:25:11,919 um 697 00:25:12,000 --> 00:25:16,400 and like you have to be collaborative in 698 00:25:14,240 --> 00:25:17,679 this iteration as well you have to 699 00:25:16,400 --> 00:25:19,200 incorporate those other parts of the 700 00:25:17,679 --> 00:25:21,520 business that have those touch points 701 00:25:19,200 --> 00:25:22,880 with the clients as well as the ones 702 00:25:21,520 --> 00:25:25,200 that pay the bills 703 00:25:22,880 --> 00:25:26,480 uh to make sure that you're delivering 704 00:25:25,200 --> 00:25:27,840 not just for yourself and your 705 00:25:26,480 --> 00:25:30,240 department but also for the whole 706 00:25:27,840 --> 00:25:30,240 business 707
00:25:32,159 --> 00:25:36,880 the other piece to all of this right is 708 00:25:35,360 --> 00:25:38,080 the blameless mentality and how 709 00:25:36,880 --> 00:25:39,600 important it is 710 00:25:38,080 --> 00:25:41,760 because you're not going to get it right 711 00:25:39,600 --> 00:25:43,039 straight away or even 712 00:25:41,760 --> 00:25:45,520 continuously 713 00:25:43,039 --> 00:25:47,679 what the clients want is going to change 714 00:25:45,520 --> 00:25:49,679 um but it's not about who tripped over 715 00:25:47,679 --> 00:25:51,520 the power cord but how do we stop people 716 00:25:49,679 --> 00:25:52,799 from tripping over the power cord next 717 00:25:51,520 --> 00:25:56,080 time 718 00:25:52,799 --> 00:25:58,400 slos should never ever ever be tied to 719 00:25:56,080 --> 00:26:01,279 individual performance metrics 720 00:25:58,400 --> 00:26:03,440 they need to be the goal around elixir 721 00:26:01,279 --> 00:26:05,600 the goal should always be defining more 722 00:26:03,440 --> 00:26:07,520 slos to get greater visibility and 723 00:26:05,600 --> 00:26:10,000 understanding rather than blaming teams 724 00:26:07,520 --> 00:26:12,320 for not meeting slos your slos are meant 725 00:26:10,000 --> 00:26:14,240 to be a way to empower the discussion of 726 00:26:12,320 --> 00:26:17,760 what needs to happen next and what went 727 00:26:14,240 --> 00:26:17,760 wrong and how we fix it in the future 728 00:26:19,440 --> 00:26:23,200 i just want to circle quickly back to 729 00:26:21,520 --> 00:26:25,360 martin fowler's idea of good i.t 730 00:26:23,200 --> 00:26:27,279 practices and bridging the gap between 731 00:26:25,360 --> 00:26:30,080 speed and quality 732 00:26:27,279 --> 00:26:31,600 so record point my company is still 733 00:26:30,080 --> 00:26:35,760 in its infancy when it comes to 734 00:26:31,600 --> 00:26:38,000 implementing any of these slos slis slas 735 00:26:35,760 --> 00:26:39,679 and i came to talk to you today about 736
00:26:38,000 --> 00:26:41,200 the research that i'd done in the hopes 737 00:26:39,679 --> 00:26:43,760 that other people had seen these 738 00:26:41,200 --> 00:26:45,919 problems in their own organizations and 739 00:26:43,760 --> 00:26:48,799 found this interesting 740 00:26:45,919 --> 00:26:52,960 um as you know a way to 741 00:26:48,799 --> 00:26:55,520 bridge that gap and go forward 742 00:26:52,960 --> 00:26:57,440 um slos are a great way to start 743 00:26:55,520 --> 00:26:58,960 leveraging your existing data and 744 00:26:57,440 --> 00:27:00,720 setting metrics that really mean 745 00:26:58,960 --> 00:27:03,120 something both to your customer and to 746 00:27:00,720 --> 00:27:04,840 your product and addressing a root cause 747 00:27:03,120 --> 00:27:06,880 rather than just 748 00:27:04,840 --> 00:27:09,760 symptoms um 749 00:27:06,880 --> 00:27:11,840 and the more research that i dive into 750 00:27:09,760 --> 00:27:13,919 of this side of sre 751 00:27:11,840 --> 00:27:16,559 and its potential role as the collector 752 00:27:13,919 --> 00:27:18,640 and the curator of data and acting as a 753 00:27:16,559 --> 00:27:21,360 mediator between the two factions within 754 00:27:18,640 --> 00:27:23,279 the business means that they get to be 755 00:27:21,360 --> 00:27:25,039 an impartial driver for the great 756 00:27:23,279 --> 00:27:27,600 customer experience 757 00:27:25,039 --> 00:27:29,600 and the more i believe that even if this 758 00:27:27,600 --> 00:27:31,520 isn't the final answer 759 00:27:29,600 --> 00:27:33,279 i think it's a really good step forward 760 00:27:31,520 --> 00:27:35,520 for technology in general and the way 761 00:27:33,279 --> 00:27:38,080 that our businesses need to continue to 762 00:27:35,520 --> 00:27:39,360 move forward 763 00:27:38,080 --> 00:27:41,919 i'm certainly not trying to say that 764 00:27:39,360 --> 00:27:43,760 slos will solve any problem or even that 765 00:27:41,919 --> 00:27:45,600 they're right for every
business 766 00:27:43,760 --> 00:27:47,279 but i am trying to say that this 767 00:27:45,600 --> 00:27:49,279 framework is something that i see as a 768 00:27:47,279 --> 00:27:53,039 potential balm for the pain that i have 769 00:27:49,279 --> 00:27:57,440 seen across so many businesses globally 770 00:27:53,039 --> 00:27:57,440 um and every day in my own business 771 00:27:58,240 --> 00:28:02,720 it's highly likely that i'll be back 772 00:28:00,000 --> 00:28:04,799 here next year for the we tried this and 773 00:28:02,720 --> 00:28:06,720 this is what we learnt uh metrics for 774 00:28:04,799 --> 00:28:08,000 good not evil part two 775 00:28:06,720 --> 00:28:09,520 and we can definitely go over the 776 00:28:08,000 --> 00:28:11,039 blooper reel then 777 00:28:09,520 --> 00:28:12,399 um but i hope that this has been a bit 778 00:28:11,039 --> 00:28:14,559 of an inspiration to do your own 779 00:28:12,399 --> 00:28:16,480 research and explore if this might be a 780 00:28:14,559 --> 00:28:18,159 good solution to your problems 781 00:28:16,480 --> 00:28:20,640 thank you so much for your time i know 782 00:28:18,159 --> 00:28:21,679 this is not a normal talk even for this 783 00:28:20,640 --> 00:28:23,039 track 784 00:28:21,679 --> 00:28:25,120 but it's something that i really wanted 785 00:28:23,039 --> 00:28:27,120 to share and if you've got your own 786 00:28:25,120 --> 00:28:29,039 horror stories or success stories for 787 00:28:27,120 --> 00:28:30,880 implementing slos in your business i'd 788 00:28:29,039 --> 00:28:32,480 love to hear them please hit me up on 789 00:28:30,880 --> 00:28:33,679 linkedin or twitter 790 00:28:32,480 --> 00:28:35,200 and uh 791 00:28:33,679 --> 00:28:37,200 if you've got nothing else out of this 792 00:28:35,200 --> 00:28:41,360 talk please remember that whatever else 793 00:28:37,200 --> 00:28:41,360 you do don't put bounties on snake heads 794 00:28:42,559 --> 00:28:47,039 thank you dawn that has that has been a 795 00:28:44,320 -->
00:28:48,159 tremendous talk thank you um 796 00:28:47,039 --> 00:28:50,640 it's 797 00:28:48,159 --> 00:28:52,320 yes sorry molly what am i saying dawn i 798 00:28:50,640 --> 00:28:54,000 have my next talk already queued up in 799 00:28:52,320 --> 00:28:55,600 my head excellent more failure for the 800 00:28:54,000 --> 00:28:58,960 devops track 801 00:28:55,600 --> 00:29:00,559 um my apologies molly uh metrics are close 802 00:28:58,960 --> 00:29:02,559 to my heart though i loved everything 803 00:29:00,559 --> 00:29:04,159 that you were talking about there um 804 00:29:02,559 --> 00:29:05,840 people in the chat also really really 805 00:29:04,159 --> 00:29:08,559 got a lot out of that we have a couple 806 00:29:05,840 --> 00:29:10,480 of questions already queued up um 807 00:29:08,559 --> 00:29:12,080 so if you could jump into the chat and 808 00:29:10,480 --> 00:29:13,840 uh and talk to people about those 809 00:29:12,080 --> 00:29:16,080 questions either just in the chat for 810 00:29:13,840 --> 00:29:18,080 the room itself or into the hallway 811 00:29:16,080 --> 00:29:20,320 track that would be superb um people 812 00:29:18,080 --> 00:29:21,919 would love to talk to you some more 813 00:29:20,320 --> 00:29:23,600 all right no worries i'll jump in now 814 00:29:21,919 --> 00:29:25,840 thank you great 815 00:29:23,600 --> 00:29:28,000 and we have another short break now um 816 00:29:25,840 --> 00:29:30,399 we'll be back in 15 minutes for our 817 00:29:28,000 --> 00:29:32,559 final talk of the day um which is by 818 00:29:30,399 --> 00:29:34,240 dawn this time getting people's names 819 00:29:32,559 --> 00:29:37,200 right is important uh which is about 820 00:29:34,240 --> 00:29:40,000 accessibility uh accessibility overlays 821 00:29:37,200 --> 00:29:41,840 a cautionary tale hmm i wonder what the 822 00:29:40,000 --> 00:29:45,039 cautionary tale will be 823 00:29:41,840 --> 00:29:46,880 and uh yes so grab a quick drink ask 824 00:29:45,039 -->
00:29:49,039 some questions about molly's talk there 825 00:29:46,880 --> 00:29:53,159 about metrics and we will see you back 826 00:29:49,039 --> 00:29:53,159 here in about 15 minutes
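
worked example for the error budget arithmetic at around 19:08 in the talk: a minimal sketch in python, assuming a 30 day month. the function names and the downtime figure in the usage lines are hypothetical and are not from the talk or its slides.

```python
MINUTES_PER_DAY = 24 * 60


def error_budget_minutes(slo_percent: float, days: int = 30) -> float:
    """Allowed minutes of unavailability in the window for an availability slo.

    The budget is (1 - slo) applied to the total minutes in the window.
    The 30 day default is an assumption; the talk only says "the month".
    """
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in the range (0, 100]")
    total_minutes = days * MINUTES_PER_DAY
    budget_fraction = 1 - slo_percent / 100  # 99.95 percent -> 0.0005
    return total_minutes * budget_fraction


def budget_remaining(slo_percent: float, downtime_minutes: float, days: int = 30) -> float:
    """Minutes of budget left after the downtime already spent (negative means the slo is breached)."""
    return error_budget_minutes(slo_percent, days) - downtime_minutes


if __name__ == "__main__":
    # 99.95 percent over 30 days: 43200 minutes * 0.0005, the "about 22 minutes" from the talk
    print(round(error_budget_minutes(99.95), 1))
    # hypothetical: 10 minutes already spent on a risky release
    print(round(budget_remaining(99.95, downtime_minutes=10), 1))
```

a negative result from budget_remaining is the signal the talk describes for changing priorities, and a large unspent budget is room for releases or experiments.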