1 00:00:00,000 --> 00:00:08,469 foreign 2 00:00:00,500 --> 00:00:08,469 [Music] 3 00:00:11,820 --> 00:00:16,800 good morning again everyone I hope you 4 00:00:14,519 --> 00:00:20,100 enjoyed your morning tea 5 00:00:16,800 --> 00:00:23,160 now we're getting a talk from Paul 6 00:00:20,100 --> 00:00:25,920 McKenney he's a software engineer who's 7 00:00:23,160 --> 00:00:27,180 best known for his work on RCU in the 8 00:00:25,920 --> 00:00:29,880 Linux kernel 9 00:00:27,180 --> 00:00:32,099 he's been coding for more than four 10 00:00:29,880 --> 00:00:34,380 decades so he has a bit to teach us and 11 00:00:32,099 --> 00:00:37,559 today he's going to talk to some of the 12 00:00:34,380 --> 00:00:40,340 lessons he's learned in that time please 13 00:00:37,559 --> 00:00:40,340 welcome Paul 14 00:00:42,760 --> 00:00:47,760 [Applause] 15 00:00:45,239 --> 00:00:50,039 thank you Russell 16 00:00:47,760 --> 00:00:53,820 so as Russell said I've been doing this 17 00:00:50,039 --> 00:00:56,460 for a while uh for almost 50 years in 18 00:00:53,820 --> 00:00:58,020 terms of just coding uh 45 supporting 19 00:00:56,460 --> 00:01:00,719 myself 20 00:00:58,020 --> 00:01:02,520 and uh I can sum up some cautionary 21 00:01:00,719 --> 00:01:04,619 quotes which are mostly saying that if 22 00:01:02,520 --> 00:01:06,840 you want to actually get what you want 23 00:01:04,619 --> 00:01:09,900 you have to first know what you want 24 00:01:06,840 --> 00:01:12,780 and this is all very nice to say but uh 25 00:01:09,900 --> 00:01:15,119 you know here we live in the real world 26 00:01:12,780 --> 00:01:17,100 so I'll go back to 1990 something called 27 00:01:15,119 --> 00:01:20,759 stochastic fairness queuing 28 00:01:17,100 --> 00:01:23,400 and that involved these guys this is a 29 00:01:20,759 --> 00:01:25,200 Cisco AGS gateway from late 80s early 30 00:01:23,400 --> 00:01:26,700 90s and I'm one of the people that's 31 00:01:25,200 --> 00:01:29,280 actually programmed the innards of this 32
00:01:26,700 --> 00:01:31,860 without actually ever working for Cisco 33 00:01:29,280 --> 00:01:34,020 this was part of a research project and 34 00:01:31,860 --> 00:01:35,939 the problem we were looking at was this 35 00:01:34,020 --> 00:01:37,560 kind of thing a queuing problem so you 36 00:01:35,939 --> 00:01:39,720 have a router like the one we saw on the 37 00:01:37,560 --> 00:01:41,460 previous page we got a bulk source it's 38 00:01:39,720 --> 00:01:43,380 just maybe FTP or something just 39 00:01:41,460 --> 00:01:45,000 blasting bits out there and a couple of 40 00:01:43,380 --> 00:01:46,380 interactive sources that are just trying 41 00:01:45,000 --> 00:01:48,479 to echo characters 42 00:01:46,380 --> 00:01:51,720 problem is back then a one megabit 43 00:01:48,479 --> 00:01:53,520 network was really fast and these poor 44 00:01:51,720 --> 00:01:55,920 little one-character packets get stuck in this 45 00:01:53,520 --> 00:01:57,720 mass of red packets that are filling up 46 00:01:55,920 --> 00:01:59,820 the router's queue and your echoing 47 00:01:57,720 --> 00:02:02,700 takes forever 48 00:01:59,820 --> 00:02:05,219 now the uh theoretical solution back 49 00:02:02,700 --> 00:02:07,439 then was something called fair queuing 50 00:02:05,219 --> 00:02:11,400 so what you do is you identify each flow 51 00:02:07,439 --> 00:02:14,340 of traffic with its TCP/IP quadruple the 52 00:02:11,400 --> 00:02:16,560 two addresses and the two ports and uh 53 00:02:14,340 --> 00:02:18,900 you use a data structure to give 54 00:02:16,560 --> 00:02:21,300 each flow its own queue 55 00:02:18,900 --> 00:02:22,860 well the problem with that uh and this I 56 00:02:21,300 --> 00:02:24,540 guess hasn't changed much is that CPUs 57 00:02:22,860 --> 00:02:26,280 have a hard time keeping up with networking 58 00:02:24,540 --> 00:02:27,420 and back then the CPUs were really 59 00:02:26,280 --> 00:02:30,599 really slow 60 00:02:27,420 --> 00:02:33,060 and so this wasn't really practical 61
00:02:30,599 --> 00:02:35,640 so the trick we used was instead to use 62 00:02:33,060 --> 00:02:36,959 hashing so we wanted to give each flow 63 00:02:35,640 --> 00:02:39,120 its own queue but only with high 64 00:02:36,959 --> 00:02:40,560 probability all right so sometimes you'd 65 00:02:39,120 --> 00:02:41,640 collide and that'd be bad most of the time 66 00:02:40,560 --> 00:02:44,819 you wouldn't 67 00:02:41,640 --> 00:02:46,739 and so we hashed the IP/port quadruple 68 00:02:44,819 --> 00:02:48,980 wonderful end-to-end fairness with the end-to-end 69 00:02:46,739 --> 00:02:52,140 principle it was great 70 00:02:48,980 --> 00:02:53,879 and my vision at that time was we were 71 00:02:52,140 --> 00:02:55,680 going to have stochastic fair queuing 72 00:02:53,879 --> 00:02:56,940 routers throughout the internet and that 73 00:02:55,680 --> 00:02:59,220 would be wonderful we'd have fairness 74 00:02:56,940 --> 00:03:00,420 and it'd be great of course you haven't heard 75 00:02:59,220 --> 00:03:01,739 of this 76 00:03:00,420 --> 00:03:03,360 and there's a good reason why you 77 00:03:01,739 --> 00:03:05,160 haven't heard of this 78 00:03:03,360 --> 00:03:08,760 and that's because this was not a vision 79 00:03:05,160 --> 00:03:12,540 this was a delusion this never happened 80 00:03:08,760 --> 00:03:15,180 and what happened instead was that the 81 00:03:12,540 --> 00:03:18,000 backbone tended to be overprovisioned 82 00:03:15,180 --> 00:03:21,120 and there was not end-to-end fairness 83 00:03:18,000 --> 00:03:23,459 applied but rather hop by hop 84 00:03:21,120 --> 00:03:25,319 and they had the fairness applied at the 85 00:03:23,459 --> 00:03:26,760 endpoints into the houses shown at the 86 00:03:25,319 --> 00:03:28,739 bottom there 87 00:03:26,760 --> 00:03:30,120 in fact I would get calls from engineers 88 00:03:28,739 --> 00:03:31,680 who'd read my paper and say hey this 89 00:03:30,120 --> 00:03:34,620 sounds great we'd really like this 90 00:03:31,680 -->
00:03:37,560 but is it okay if we use Ethernet MAC 91 00:03:34,620 --> 00:03:40,200 addresses instead of using the IP tuple 92 00:03:37,560 --> 00:03:44,480 quadruple I go well yeah that'll work 93 00:03:40,200 --> 00:03:44,480 but why okay thanks bye 94 00:03:45,480 --> 00:03:49,739 anyway uh the problem was I solved the 95 00:03:48,299 --> 00:03:51,659 wrong problem 96 00:03:49,739 --> 00:03:53,700 I was gonna solve end-to-end fairness 97 00:03:51,659 --> 00:03:55,379 because that's what the people around me 98 00:03:53,700 --> 00:03:57,299 thought was important the correct 99 00:03:55,379 --> 00:03:59,760 problem was instead hop-by-hop fairness 100 00:03:57,299 --> 00:04:01,680 and endpoint fairness but by sheer dumb 101 00:03:59,760 --> 00:04:03,659 luck my algorithm handled both 102 00:04:01,680 --> 00:04:05,760 the other problem I had was that I was 103 00:04:03,659 --> 00:04:07,140 writing research-quality code I might 104 00:04:05,760 --> 00:04:09,599 have been insulted by that at the time 105 00:04:07,140 --> 00:04:11,640 but I've learned differently since 106 00:04:09,599 --> 00:04:13,080 I just needed to get a paper out uh 107 00:04:11,640 --> 00:04:15,239 fortunately nobody asked for the code 108 00:04:13,080 --> 00:04:16,680 they just implemented good code based on 109 00:04:15,239 --> 00:04:18,600 the paper 110 00:04:16,680 --> 00:04:20,820 this was used heavily until about 2015 111 00:04:18,600 --> 00:04:22,079 and for this sort of thing 25 years is 112 00:04:20,820 --> 00:04:23,820 not a bad run 113 00:04:22,079 --> 00:04:26,040 there's something called fq_codel and 114 00:04:23,820 --> 00:04:28,440 CAKE that also addressed bufferbloat 115 00:04:26,040 --> 00:04:30,360 and these three guys did a lot to make 116 00:04:28,440 --> 00:04:32,100 that happen I suppose if you're gonna be 117 00:04:30,360 --> 00:04:35,040 obsoleted you could do a lot worse than 118 00:04:32,100 --> 00:04:37,740 be obsoleted by these three guys 119 00:04:35,040 --> 00:04:40,199 so
the thing was I had a bad idea that 120 00:04:37,740 --> 00:04:41,520 was badly implemented I was trying to do 121 00:04:40,199 --> 00:04:44,280 the wrong thing and didn't do a very 122 00:04:41,520 --> 00:04:48,000 good job of it but fortunately my idea 123 00:04:44,280 --> 00:04:50,419 was resuscitated by sheer dumb luck 124 00:04:48,000 --> 00:04:52,440 uh and uh the thing here is that 125 00:04:50,419 --> 00:04:54,180 premature abstraction is the root of all 126 00:04:52,440 --> 00:04:56,280 evil used to be performance now I think 127 00:04:54,180 --> 00:04:59,699 it's abstraction 128 00:04:56,280 --> 00:05:01,380 uh the problem was that the end-to-end 129 00:04:59,699 --> 00:05:03,600 abstraction was a very powerful one and 130 00:05:01,380 --> 00:05:06,240 happened to work in a lot of places this 131 00:05:03,600 --> 00:05:08,100 just was not one of them 132 00:05:06,240 --> 00:05:10,500 and the problem was that I failed to 133 00:05:08,100 --> 00:05:12,419 live among my users I was living among a 134 00:05:10,500 --> 00:05:13,199 bunch of researchers and standards 135 00:05:12,419 --> 00:05:15,720 people 136 00:05:13,199 --> 00:05:19,639 but engineers who were living among the 137 00:05:15,720 --> 00:05:19,639 users made things right 138 00:05:19,919 --> 00:05:24,600 so let's go back even further to the 139 00:05:21,960 --> 00:05:26,520 1980s this is customer relationship 140 00:05:24,600 --> 00:05:27,780 management I did an application based on 141 00:05:26,520 --> 00:05:29,699 one of these things 142 00:05:27,780 --> 00:05:31,620 this is an Otrona Attache which was 143 00:05:29,699 --> 00:05:34,080 very popular for a couple of years in 144 00:05:31,620 --> 00:05:36,360 the early 1980s the reason it was so 145 00:05:34,080 --> 00:05:38,340 popular was that it used a Z80 and ran 146 00:05:36,360 --> 00:05:40,500 CP/M instead of using the more expensive 147 00:05:38,340 --> 00:05:42,960 16-bit processors so it was quite a bit 148 00:05:40,500 -->
00:05:44,340 cheaper which was fine until the 16-bit 149 00:05:42,960 --> 00:05:45,900 processors got cheap enough that it 150 00:05:44,340 --> 00:05:49,020 didn't matter 151 00:05:45,900 --> 00:05:50,880 but at that point there was a customer 152 00:05:49,020 --> 00:05:53,340 that asked me to build it to spec under 153 00:05:50,880 --> 00:05:55,080 contract so I did that work and they 154 00:05:53,340 --> 00:05:56,520 loved it it was great they 155 00:05:55,080 --> 00:05:59,400 were really happy I'd done all the stuff 156 00:05:56,520 --> 00:06:01,139 and made it work and that was great but 157 00:05:59,400 --> 00:06:03,300 their prospective customers weren't 158 00:06:01,139 --> 00:06:05,460 quite so happy 159 00:06:03,300 --> 00:06:07,680 in fact their prospective customers were 160 00:06:05,460 --> 00:06:09,300 so unhappy that they went out of 161 00:06:07,680 --> 00:06:11,360 business 162 00:06:09,300 --> 00:06:14,880 uh by sheer dumb luck they paid me before 163 00:06:11,360 --> 00:06:16,979 they went bankrupt so uh you know uh 164 00:06:14,880 --> 00:06:18,419 this is a bad idea I claim it's well 165 00:06:16,979 --> 00:06:19,979 implemented the code's lost in the mists 166 00:06:18,419 --> 00:06:23,000 of time so who knows 167 00:06:19,979 --> 00:06:23,000 but hey I got paid 168 00:06:24,000 --> 00:06:29,940 the problem here was just the computer 169 00:06:27,600 --> 00:06:32,100 field was quite immature I was the only 170 00:06:29,940 --> 00:06:33,720 one on the team that had any experience 171 00:06:32,100 --> 00:06:35,280 and I didn't have any contact whatsoever 172 00:06:33,720 --> 00:06:38,900 with the customer and that was just 173 00:06:35,280 --> 00:06:38,900 built to fail which it did 174 00:06:41,699 --> 00:06:45,960 so this quote from about that same time 175 00:06:43,919 --> 00:06:48,240 everyone knows debugging is twice as hard as 176 00:06:45,960 --> 00:06:50,880 writing a program in the first place 177 00:06:48,240 --> 00:06:52,080
well that's what we got hit by 178 00:06:50,880 --> 00:06:53,639 the thing is 179 00:06:52,080 --> 00:06:55,080 while you're programming you're living 180 00:06:53,639 --> 00:06:56,340 in blissful ignorance of a lot of the 181 00:06:55,080 --> 00:06:57,539 requirements yeah you got the ones you 182 00:06:56,340 --> 00:06:59,940 know about and you're coding those and 183 00:06:57,539 --> 00:07:02,280 it's great uh but those other 184 00:06:59,940 --> 00:07:04,380 requirements are really still there okay 185 00:07:02,280 --> 00:07:07,440 and they hit you really hard during 186 00:07:04,380 --> 00:07:09,000 debugging and this lack of requirements 187 00:07:07,440 --> 00:07:12,780 lack of knowledge of requirements is only 188 00:07:09,000 --> 00:07:15,300 one cause of Kernighan's observation 189 00:07:12,780 --> 00:07:17,340 my problem was that I failed to 190 00:07:15,300 --> 00:07:19,979 understand that I was competing with a 191 00:07:17,340 --> 00:07:21,419 file cabinet 192 00:07:19,979 --> 00:07:24,979 and you know what 193 00:07:21,419 --> 00:07:24,979 the file cabinet won 194 00:07:26,099 --> 00:07:31,380 so again uh it's a live among your users 195 00:07:29,580 --> 00:07:36,419 thing let's go back even further to the 196 00:07:31,380 --> 00:07:38,460 1970s uh this was uh the first paying 197 00:07:36,419 --> 00:07:40,500 assignment I had now I worked my way 198 00:07:38,460 --> 00:07:42,060 through university the first time uh 199 00:07:40,500 --> 00:07:43,680 working on the program that figured out 200 00:07:42,060 --> 00:07:45,479 which student went into which dorm room 201 00:07:43,680 --> 00:07:47,940 and also how much they should be charged 202 00:07:45,479 --> 00:07:51,479 uh this is not the computer used but it 203 00:07:47,940 --> 00:07:53,099 was the same type a CDC 3300 uh it was 204 00:07:51,479 --> 00:07:54,840 configured slightly differently 205 00:07:53,099 --> 00:07:56,039 but it filled a room just like that one 206 00:07:54,840 -->
00:07:58,319 did 207 00:07:56,039 --> 00:08:00,599 and we were using punch cards in Fortran 208 00:07:58,319 --> 00:08:02,940 I kid you not Fortran for a business 209 00:08:00,599 --> 00:08:04,080 application program but such was the 210 00:08:02,940 --> 00:08:06,120 time 211 00:08:04,080 --> 00:08:08,819 but we moved with the times later on we 212 00:08:06,120 --> 00:08:11,460 moved to a Cyber 73 which is a successor 213 00:08:08,819 --> 00:08:12,900 of the famous CDC 6600 supercomputer 214 00:08:11,460 --> 00:08:17,400 from the mid 60s 215 00:08:12,900 --> 00:08:19,560 and uh we went to punch cards in COBOL 216 00:08:17,400 --> 00:08:21,720 now there were a lot of issues with this 217 00:08:19,560 --> 00:08:23,639 things being as they were one of them 218 00:08:21,720 --> 00:08:25,440 was that the code I received had some 219 00:08:23,639 --> 00:08:27,120 temporal confusion 220 00:08:25,440 --> 00:08:29,099 normally the students were charged for 221 00:08:27,120 --> 00:08:31,259 an entire term so this didn't matter but 222 00:08:29,099 --> 00:08:33,899 uh one feature they'd asked for before I 223 00:08:31,259 --> 00:08:35,339 showed up was to make it so that a 224 00:08:33,899 --> 00:08:36,779 student could arrive part way through 225 00:08:35,339 --> 00:08:38,820 the term and they could have their 226 00:08:36,779 --> 00:08:40,620 charges prorated over the amount of the 227 00:08:38,820 --> 00:08:42,240 term they actually resided in the dorm 228 00:08:40,620 --> 00:08:44,940 room 229 00:08:42,240 --> 00:08:46,980 okay so the problem was that the 230 00:08:44,940 --> 00:08:48,240 computer program had 231 00:08:46,980 --> 00:08:50,820 some uh 232 00:08:48,240 --> 00:08:53,040 interesting ideas about different 233 00:08:50,820 --> 00:08:55,680 days having different values 234 00:08:53,040 --> 00:08:57,300 and a student started on Friday and was 235 00:08:55,680 --> 00:08:59,040 not amused with the bill 236 00:08:57,300 --> 00:09:00,420 my
manager had some interesting 237 00:08:59,040 --> 00:09:02,760 suggestions for what that student might 238 00:09:00,420 --> 00:09:05,279 have wished you the money instead but my 239 00:09:02,760 --> 00:09:07,080 job was to fix the problem 240 00:09:05,279 --> 00:09:09,420 and the problem was that months vary in 241 00:09:07,080 --> 00:09:12,420 length I mean this month for example has 242 00:09:09,420 --> 00:09:14,940 31 days next month has 30 days last 243 00:09:12,420 --> 00:09:17,459 month had 28 days except that last month 244 00:09:14,940 --> 00:09:19,140 next year has 29 days 245 00:09:17,459 --> 00:09:21,480 and there's a lot of corner cases and it's 246 00:09:19,140 --> 00:09:23,519 easy to get that wrong and uh the code 247 00:09:21,480 --> 00:09:24,959 had gotten it wrong 248 00:09:23,519 --> 00:09:26,760 the solution was something called the 249 00:09:24,959 --> 00:09:28,320 jdate algorithm and no this has nothing 250 00:09:26,760 --> 00:09:30,120 to do with the website this is before 251 00:09:28,320 --> 00:09:32,040 the web showed up this is just a nice 252 00:09:30,120 --> 00:09:34,440 numerical algorithm that turns month day 253 00:09:32,040 --> 00:09:36,300 year into a sequential number for each 254 00:09:34,440 --> 00:09:38,459 day going up and also switches them back 255 00:09:36,300 --> 00:09:40,140 which makes this sort of thing trivial I 256 00:09:38,459 --> 00:09:42,120 applied that and life got easy really 257 00:09:40,140 --> 00:09:44,160 quickly 258 00:09:42,120 --> 00:09:45,300 uh this was a good idea implemented 259 00:09:44,160 --> 00:09:46,920 poorly 260 00:09:45,300 --> 00:09:48,899 now I've kind of broken the rules here 261 00:09:46,920 --> 00:09:51,540 because I'm telling on my predecessor 262 00:09:48,899 --> 00:09:54,540 not myself so I should fix that and to 263 00:09:51,540 --> 00:09:58,080 do that we'll go ahead to the mid-1990s 264 00:09:54,540 --> 00:09:59,580 this is a Sequent Symmetry system and 265 00:09:58,080 -->
00:10:00,839 these were used to make clustered database 266 00:09:59,580 --> 00:10:02,940 servers 267 00:10:00,839 --> 00:10:04,740 and the idea was you had a shared disk 268 00:10:02,940 --> 00:10:06,420 system so each server here could access 269 00:10:04,740 --> 00:10:07,920 all the disks 270 00:10:06,420 --> 00:10:09,779 and the cool thing about that was there was 271 00:10:07,920 --> 00:10:11,339 a locking algorithm a distributed lock 272 00:10:09,779 --> 00:10:13,019 algorithm that coordinated 273 00:10:11,339 --> 00:10:14,640 all of that but the cool thing was that 274 00:10:13,019 --> 00:10:16,680 if one of those guys went down you could 275 00:10:14,640 --> 00:10:18,720 still access all the data 276 00:10:16,680 --> 00:10:20,640 of course if you're doing something like 277 00:10:18,720 --> 00:10:22,019 that failover is kind of tricky so you 278 00:10:20,640 --> 00:10:23,640 really want to make sure that you test 279 00:10:22,019 --> 00:10:25,740 this thing all right you need to test it 280 00:10:23,640 --> 00:10:27,420 frequently more recently we have a 281 00:10:25,740 --> 00:10:29,100 concept called Chaos Monkey to do that 282 00:10:27,420 --> 00:10:32,180 but that was a long time in the future 283 00:10:29,100 --> 00:10:32,180 back in the early 90s 284 00:10:32,279 --> 00:10:36,779 but for some reason one of our customers 285 00:10:35,279 --> 00:10:38,640 objected 286 00:10:36,779 --> 00:10:40,019 to having this thing tested every 287 00:10:38,640 --> 00:10:42,899 evening 288 00:10:40,019 --> 00:10:46,260 which we were inadvertently doing 289 00:10:42,899 --> 00:10:48,720 fortunately for us the uh system crashed 290 00:10:46,260 --> 00:10:50,700 immediately after backups completed 291 00:10:48,720 --> 00:10:53,820 but still this was not a good look and 292 00:10:50,700 --> 00:10:55,440 the customer complained vociferously 293 00:10:53,820 --> 00:10:56,700 problem was I mean the standard thing 294 00:10:55,440 --> 00:10:58,500 back then you collect your
crash dump 295 00:10:56,700 --> 00:11:00,240 from the customer analyze it and fix the 296 00:10:58,500 --> 00:11:01,980 bug except that the crash dump was just 297 00:11:00,240 --> 00:11:04,860 a mess there was no useful information 298 00:11:01,980 --> 00:11:06,660 in it and we were for a long time unable 299 00:11:04,860 --> 00:11:08,640 to reproduce it in the lab we tried what 300 00:11:06,660 --> 00:11:11,519 they did it worked fine 301 00:11:08,640 --> 00:11:13,980 until finally we did find a test case 302 00:11:11,519 --> 00:11:16,440 that uh between five hours and 24 hours 303 00:11:13,980 --> 00:11:18,120 77 hours it would fall over 304 00:11:16,440 --> 00:11:21,660 problem with this is it takes a long time 305 00:11:18,120 --> 00:11:23,459 to test an alleged fix and uh there's 306 00:11:21,660 --> 00:11:25,560 this U.S. holiday at the end of May 307 00:11:23,459 --> 00:11:29,040 called Memorial Day and it was the start 308 00:11:25,560 --> 00:11:30,779 of that Memorial Day three-day weekend 309 00:11:29,040 --> 00:11:31,920 but you know we had a problem we needed to 310 00:11:30,779 --> 00:11:33,120 fix it the customer was getting really 311 00:11:31,920 --> 00:11:34,860 upset 312 00:11:33,120 --> 00:11:36,779 uh we finally got a hint from a stack 313 00:11:34,860 --> 00:11:38,820 trace and that pointed the finger at 314 00:11:36,779 --> 00:11:42,779 this piece of the code 315 00:11:38,820 --> 00:11:44,339 and it translated an address to a data 316 00:11:42,779 --> 00:11:47,040 structure that represented a two-meg 317 00:11:44,339 --> 00:11:50,220 section of memory unfortunately memory 318 00:11:47,040 --> 00:11:51,899 was very scarce and expensive and so we 319 00:11:50,220 --> 00:11:53,519 couldn't afford to align the memory 320 00:11:51,899 --> 00:11:55,500 regions so that meant that we'd have 321 00:11:53,519 --> 00:11:57,000 something like this the address was up 322 00:11:55,500 --> 00:11:59,100 there at that yellow arrow 323 00:11:57,000 --> 00:12:00,839 where the
beginning of the region was 324 00:11:59,100 --> 00:12:02,459 down in one two-meg region and the 325 00:12:00,839 --> 00:12:04,140 actual address while still in that 326 00:12:02,459 --> 00:12:05,940 region was in the next 327 00:12:04,140 --> 00:12:07,680 two-meg chunk of memory 328 00:12:05,940 --> 00:12:09,120 and that was okay we had some code to 329 00:12:07,680 --> 00:12:10,500 fix that we would go it would translate 330 00:12:09,120 --> 00:12:12,420 it to the thing it would say wait a 331 00:12:10,500 --> 00:12:14,040 minute this entry is that blue 332 00:12:12,420 --> 00:12:16,500 guy up there not the yellow guy so let's 333 00:12:14,040 --> 00:12:17,880 go to the previous one and yay life is 334 00:12:16,500 --> 00:12:20,160 happy and let's go 335 00:12:17,880 --> 00:12:22,440 except that compilers sometimes mess you 336 00:12:20,160 --> 00:12:24,779 up and this was an 80x86 337 00:12:22,440 --> 00:12:26,820 processor which has 32 bits and 338 00:12:24,779 --> 00:12:29,339 doesn't have very many registers so the 339 00:12:26,820 --> 00:12:30,360 compiler decided that uh no I don't have 340 00:12:29,339 --> 00:12:31,620 enough registers I'm going to 341 00:12:30,360 --> 00:12:32,940 refetch the stupid thing every time I 342 00:12:31,620 --> 00:12:34,200 don't care about your temp variable I'm 343 00:12:32,940 --> 00:12:36,540 not going to use it 344 00:12:34,200 --> 00:12:39,600 and in that case of course things can go 345 00:12:36,540 --> 00:12:41,399 bad uh we go and we translate to the 346 00:12:39,600 --> 00:12:43,680 thing we check whether it's a null pointer we 347 00:12:41,399 --> 00:12:45,720 say nope it's not null but then the guy 348 00:12:43,680 --> 00:12:47,940 up there goes away now the pointer is null 349 00:12:45,720 --> 00:12:49,380 we compare a null pointer is less than any 350 00:12:47,940 --> 00:12:51,120 other pointer so this is great we got 351 00:12:49,380 --> 00:12:52,860 the right guy we won't decrement
and 352 00:12:51,120 --> 00:12:56,060 bang things go bad 353 00:12:52,860 --> 00:12:56,060 that's what was happening 354 00:12:56,760 --> 00:13:01,320 okay this is an easy fix I've shown the 355 00:12:59,040 --> 00:13:03,000 fix in Linux kernelese uh you just do 356 00:13:01,320 --> 00:13:04,860 READ_ONCE() basically tell the compiler 357 00:13:03,000 --> 00:13:06,240 you can't trust this pick it up and use 358 00:13:04,860 --> 00:13:07,860 what you picked up don't pick it up 359 00:13:06,240 --> 00:13:10,440 again until I tell 360 00:13:07,860 --> 00:13:12,120 you to and in that case it works great 361 00:13:10,440 --> 00:13:13,980 the memory can be freed we still have the 362 00:13:12,120 --> 00:13:15,300 old value which is not null we do the 363 00:13:13,980 --> 00:13:16,260 comparison everything works out and it's 364 00:13:15,300 --> 00:13:18,959 great 365 00:13:16,260 --> 00:13:21,180 this is a good idea implemented poorly 366 00:13:18,959 --> 00:13:23,040 and for any compiler writers that might 367 00:13:21,180 --> 00:13:24,300 happen to be in the audience despite what 368 00:13:23,040 --> 00:13:26,639 you may have heard and what you may feel 369 00:13:24,300 --> 00:13:29,279 volatile is your friend okay 370 00:13:26,639 --> 00:13:30,899 it's needed by device drivers so if 371 00:13:29,279 --> 00:13:33,660 volatile doesn't work your credit card 372 00:13:30,899 --> 00:13:35,880 doesn't work either all right 373 00:13:33,660 --> 00:13:38,220 and uh that's what READ_ONCE() does it 374 00:13:35,880 --> 00:13:40,260 does a volatile read and that is the 375 00:13:38,220 --> 00:13:42,060 story of how I deprived myself 376 00:13:40,260 --> 00:13:45,420 and my colleagues of Memorial Day 377 00:13:42,060 --> 00:13:48,380 weekend okay didn't work out very well 378 00:13:45,420 --> 00:13:48,380 but here I am 379 00:13:48,839 --> 00:13:52,440 well now I'd like to go back to the 380 00:13:50,399 --> 00:13:54,660 1970s this is my very first professional
381 00:13:52,440 --> 00:13:55,860 project I talked earlier about a first 382 00:13:54,660 --> 00:13:58,740 project but that was the first paid 383 00:13:55,860 --> 00:14:01,079 project this uh involved a computer that 384 00:13:58,740 --> 00:14:03,420 was off-site inside our high school this 385 00:14:01,079 --> 00:14:05,459 is what you saw a Model 33 Teletype 386 00:14:03,420 --> 00:14:06,720 paper tape the whole show author Cat 387 00:14:05,459 --> 00:14:08,639 uppercase only 388 00:14:06,720 --> 00:14:11,220 the actual computer was connected by a 389 00:14:08,639 --> 00:14:12,360 leased line 110 baud it was a couple hours' 390 00:14:11,220 --> 00:14:13,800 drive away 391 00:14:12,360 --> 00:14:15,420 and I was really impressed with this 392 00:14:13,800 --> 00:14:18,000 when I first saw it it could type faster 393 00:14:15,420 --> 00:14:19,380 than I could all right wow 394 00:14:18,000 --> 00:14:21,680 um and it didn't involve a bunch of 395 00:14:19,380 --> 00:14:21,680 cards 396 00:14:21,779 --> 00:14:25,380 now this was a pro bono computer dating 397 00:14:23,820 --> 00:14:27,720 program they had a dance the National 398 00:14:25,380 --> 00:14:29,220 Honor Society it was a fundraiser 399 00:14:27,720 --> 00:14:31,079 and uh 400 00:14:29,220 --> 00:14:33,779 I was in charge of writing the code that 401 00:14:31,079 --> 00:14:36,600 uh did the matching now I had a pretty 402 00:14:33,779 --> 00:14:38,880 good idea what the boys were after but 403 00:14:36,600 --> 00:14:40,139 um at that time and place doing similar 404 00:14:38,880 --> 00:14:41,220 research on what the girls were after 405 00:14:40,139 --> 00:14:43,620 would have been considered highly 406 00:14:41,220 --> 00:14:45,720 inappropriate so instead I interrogated 407 00:14:43,620 --> 00:14:47,459 my home economics teacher I didn't 408 00:14:45,720 --> 00:14:49,019 actually take any courses from her but I 409 00:14:47,459 --> 00:14:51,000 interrogated her anyway 410 00:14:49,019 --> 00:14:52,560 and
extracted some questions from her 411 00:14:51,000 --> 00:14:54,600 and that allowed me to make a simple 412 00:14:52,560 --> 00:14:57,060 Hamming distance matching with the expected 413 00:14:54,600 --> 00:14:59,279 1970s constraints on matches 414 00:14:57,060 --> 00:15:01,079 so we sold the students questionnaires 415 00:14:59,279 --> 00:15:03,000 they filled them out and gave them back 416 00:15:01,079 --> 00:15:04,740 to us we transcribed the 417 00:15:03,000 --> 00:15:06,480 results to paper tape and then read that 418 00:15:04,740 --> 00:15:08,220 paper tape into the program 419 00:15:06,480 --> 00:15:10,339 this was simple and effective it worked 420 00:15:08,220 --> 00:15:10,339 great 421 00:15:10,519 --> 00:15:16,920 except that we did have one dissatisfied 422 00:15:15,300 --> 00:15:19,500 customer 423 00:15:16,920 --> 00:15:21,660 it seems that a senior girl got matched 424 00:15:19,500 --> 00:15:24,240 only with freshman boys which upset 425 00:15:21,660 --> 00:15:25,620 her greatly and we looked at the form 426 00:15:24,240 --> 00:15:26,880 and she really did check the seniors 427 00:15:25,620 --> 00:15:28,440 only box 428 00:15:26,880 --> 00:15:30,180 uh you know we checked the program and it 429 00:15:28,440 --> 00:15:31,560 looked to be fine and so we checked the 430 00:15:30,180 --> 00:15:33,240 paper tape and sure enough this was a 431 00:15:31,560 --> 00:15:35,399 data entry error 432 00:15:33,240 --> 00:15:37,500 let this be a lesson to you having a 433 00:15:35,399 --> 00:15:39,300 correct program is not enough 434 00:15:37,500 --> 00:15:41,639 the environment in which that program 435 00:15:39,300 --> 00:15:43,440 runs and the processes surrounding 436 00:15:41,639 --> 00:15:45,180 that program are also very very 437 00:15:43,440 --> 00:15:46,800 important 438 00:15:45,180 --> 00:15:49,320 now I could claim hey the code was 439 00:15:46,800 --> 00:15:51,060 correct I'm good except uh and this was 440 00:15:49,320 --> 00:15:53,040 a good
idea implemented properly as far 441 00:15:51,060 --> 00:15:55,139 as the code is concerned 442 00:15:53,040 --> 00:15:56,459 but I was also the overall project lead 443 00:15:55,139 --> 00:15:58,199 I was in charge of this whole thing as 444 00:15:56,459 --> 00:16:00,000 well as the guy writing the code 445 00:15:58,199 --> 00:16:01,800 that meant that if there's a problem 446 00:16:00,000 --> 00:16:04,399 it's my responsibility no matter where 447 00:16:01,800 --> 00:16:04,399 the problem is 448 00:16:05,940 --> 00:16:11,820 and you know uh looking back on it it 449 00:16:09,540 --> 00:16:15,980 might have been a mistake to entrust the 450 00:16:11,820 --> 00:16:15,980 data entry to the freshman boys 451 00:16:18,240 --> 00:16:22,139 but also 452 00:16:20,279 --> 00:16:23,399 I missed a chance to learn from that 453 00:16:22,139 --> 00:16:25,740 young lady 454 00:16:23,399 --> 00:16:27,300 and there's a quote more recently a lot 455 00:16:25,740 --> 00:16:29,820 of success in life and business comes 456 00:16:27,300 --> 00:16:31,620 from knowing what you want to avoid 457 00:16:29,820 --> 00:16:33,060 and I don't really know if she knew what 458 00:16:31,620 --> 00:16:34,199 she wanted but she sure knew what she 459 00:16:33,060 --> 00:16:35,820 didn't want 460 00:16:34,199 --> 00:16:38,339 and if I had learned that that was 461 00:16:35,820 --> 00:16:40,920 important rather than just being uh you 462 00:16:38,339 --> 00:16:42,600 know I've got this angry customer 463 00:16:40,920 --> 00:16:44,759 I could have saved myself a lot of 464 00:16:42,600 --> 00:16:47,839 trouble later in life I eventually 465 00:16:44,759 --> 00:16:47,839 learned it so here we are 466 00:16:48,899 --> 00:16:51,839 let's get into this century I've been 467 00:16:50,579 --> 00:16:53,100 doing a lot of stuff from the past 468 00:16:51,839 --> 00:16:54,420 century let's at least get into this 469 00:16:53,100 --> 00:16:56,220 century 470 00:16:54,420 --> 00:16:57,420 with real-time
Linux I'm not showing 471 00:16:56,220 --> 00:16:59,160 a picture of a machine here just because 472 00:16:57,420 --> 00:17:01,259 there's so much variety there's just 473 00:16:59,160 --> 00:17:03,540 a whole range of things that might use 474 00:17:01,259 --> 00:17:06,780 real-time Linux these days 475 00:17:03,540 --> 00:17:09,179 but I'll go back to 2004 and uh my 476 00:17:06,780 --> 00:17:10,439 employer at the time IBM got a lot of 477 00:17:09,179 --> 00:17:12,059 requests for something called real-time 478 00:17:10,439 --> 00:17:15,059 Linux but it was actually enterprise 479 00:17:12,059 --> 00:17:17,400 grade real-time Linux by which the 480 00:17:15,059 --> 00:17:18,660 people making the requests for proposals 481 00:17:17,400 --> 00:17:20,699 meant something that IBM would stand 482 00:17:18,660 --> 00:17:22,679 behind and fix everything for 483 00:17:20,699 --> 00:17:24,839 problem was that no such thing existed 484 00:17:22,679 --> 00:17:27,360 and IBM had rather strict rules for 485 00:17:24,839 --> 00:17:29,220 contracts calling for mythical creatures 486 00:17:27,360 --> 00:17:33,299 and these rules can be stated in two 487 00:17:29,220 --> 00:17:35,220 words and those two words are no bid 488 00:17:33,299 --> 00:17:37,440 so there was a lot of no bid on 489 00:17:35,220 --> 00:17:38,760 real-time Linux contracts 490 00:17:37,440 --> 00:17:40,080 except 491 00:17:38,760 --> 00:17:41,940 at that time there were some chips 492 00:17:40,080 --> 00:17:43,140 coming out 493 00:17:41,940 --> 00:17:45,000 they were ARM chips and they had 494 00:17:43,140 --> 00:17:48,179 multiple CPUs 495 00:17:45,000 --> 00:17:49,980 I saw a press release on these 496 00:17:48,179 --> 00:17:52,320 things it's like wait a minute 497 00:17:49,980 --> 00:17:54,480 we can take one of the CPUs and mark it 498 00:17:52,320 --> 00:17:56,039 as non-real-time so CPU zero will be our 499 00:17:54,480 --> 00:17:57,900 sacrificial lamb 500
and then all the other CPUs are 501 00:17:57,900 --> 00:18:00,179 restricted to only running real-time 502 00:17:59,160 --> 00:18:01,980 code 503 00:18:00,179 --> 00:18:04,140 and to make it so I can get rid of the 504 00:18:01,980 --> 00:18:06,000 arrows I'm going to change the legend so 505 00:18:04,140 --> 00:18:07,559 we have NRT for non-real time and RT for 506 00:18:06,000 --> 00:18:08,760 real time we can drop the arrows at that 507 00:18:07,559 --> 00:18:10,320 point 508 00:18:08,760 --> 00:18:12,120 and then we have the green real-time 509 00:18:10,320 --> 00:18:14,280 code running on CPUs one through three 510 00:18:12,120 --> 00:18:16,500 and if anything happens non-real time 511 00:18:14,280 --> 00:18:19,559 it's on CPU zero 512 00:18:16,500 --> 00:18:21,360 of course at that point even doing a 513 00:18:19,559 --> 00:18:22,980 system call in the Linux kernel was non-real 514 00:18:21,360 --> 00:18:24,240 time pretty much any system call was 515 00:18:22,980 --> 00:18:26,460 going to get the big kernel lock and it 516 00:18:24,240 --> 00:18:28,620 was game over but we could deal 517 00:18:26,460 --> 00:18:30,900 with that what happens is if that 518 00:18:28,620 --> 00:18:31,980 happens we migrate 519 00:18:30,900 --> 00:18:34,200 so 520 00:18:31,980 --> 00:18:37,260 that guy that just turned red over there 521 00:18:34,200 --> 00:18:38,580 in the lower left CPU two he just 522 00:18:37,260 --> 00:18:39,720 did a system call well we're not gonna 523 00:18:38,580 --> 00:18:40,919 let him do a system call we're gonna 524 00:18:39,720 --> 00:18:43,980 migrate him 525 00:18:40,919 --> 00:18:46,440 to CPU zero 526 00:18:43,980 --> 00:18:48,000 then we'll let him do a system call 527 00:18:46,440 --> 00:18:49,380 and then when he gets done with the system call 528 00:18:48,000 --> 00:18:52,260 we'll migrate him back and he can continue 529 00:18:49,380 --> 00:18:53,700 being real time 530 00:18:52,260 --> 00:18:56,340 this is simple it's pretty 531 00:18:53,700
--> 00:18:59,700 straightforward easy concept 532 00:18:56,340 --> 00:19:01,500 of course an easy concept uh so I 533 00:18:59,700 --> 00:19:04,500 implemented a patch 534 00:19:01,500 --> 00:19:06,240 and uh it worked the tests worked out 535 00:19:04,500 --> 00:19:07,980 great 536 00:19:06,240 --> 00:19:09,059 there was a real-time effort pending 537 00:19:07,980 --> 00:19:10,559 that was a really big thing they were 538 00:19:09,059 --> 00:19:13,559 rewriting the whole kernel crashing 539 00:19:10,559 --> 00:19:14,820 everywhere uh but you know would they be 540 00:19:13,559 --> 00:19:16,260 successful if they're successful when 541 00:19:14,820 --> 00:19:18,059 would they be successful 542 00:19:16,260 --> 00:19:20,580 uh what I was doing is very pragmatic 543 00:19:18,059 --> 00:19:23,340 and I knew I could commit to it so 544 00:19:20,580 --> 00:19:24,720 what I did I informed my executives that 545 00:19:23,340 --> 00:19:27,620 real-time Linux is real and we didn't 546 00:19:24,720 --> 00:19:27,620 need to no-bid anymore 547 00:19:27,840 --> 00:19:30,559 and you know what 548 00:19:30,600 --> 00:19:35,760 those guys actually listened to me 549 00:19:32,820 --> 00:19:37,559 and so we won a large contract 550 00:19:35,760 --> 00:19:39,960 you know had a nice celebration and 551 00:19:37,559 --> 00:19:43,559 everything except that 552 00:19:39,960 --> 00:19:46,880 uh the customer rejected my idea 553 00:19:43,559 --> 00:19:46,880 that wasn't going to work for them 554 00:19:47,520 --> 00:19:51,960 now 555 00:19:48,960 --> 00:19:53,460 I can't tell you why my brilliant idea 556 00:19:51,960 --> 00:19:55,200 was rejected because I didn't have a 557 00:19:53,460 --> 00:19:56,460 security clearance and so they couldn't 558 00:19:55,200 --> 00:19:58,679 tell me 559 00:19:56,460 --> 00:20:00,840 but that's all to the good because I can 560 00:19:58,679 --> 00:20:02,880 share my speculation with you 561 00:20:00,840 --> 00:20:05,640 and my guess is if you were on a
high 562 00:20:02,880 --> 00:20:07,020 value target and there were munitions 563 00:20:05,640 --> 00:20:09,660 coming at you at a high rate of speed 564 00:20:07,020 --> 00:20:11,640 you just might want all the CPUs working 565 00:20:09,660 --> 00:20:15,240 out what to do about that 566 00:20:11,640 --> 00:20:16,320 but in any case uh my idea was rejected 567 00:20:15,240 --> 00:20:17,820 we still had these contractual 568 00:20:16,320 --> 00:20:19,200 commitments to meet the contract 569 00:20:17,820 --> 00:20:21,360 had been signed before I found out 570 00:20:19,200 --> 00:20:23,520 about this 571 00:20:21,360 --> 00:20:25,799 you remember a couple slides ago there 572 00:20:23,520 --> 00:20:28,640 was that effort I was throwing shade on 573 00:20:25,799 --> 00:20:31,020 well now it was the only game in town 574 00:20:28,640 --> 00:20:32,220 and one of the reasons I didn't 575 00:20:31,020 --> 00:20:33,840 trust it is because I was pretty sure 576 00:20:32,220 --> 00:20:36,059 they were gonna have problems with RCU and 577 00:20:33,840 --> 00:20:38,940 I was right and so it was clearly my 578 00:20:36,059 --> 00:20:40,740 job to help them with RCU 579 00:20:38,940 --> 00:20:42,539 and it was something like three from 580 00:20:40,740 --> 00:20:44,220 scratch implementations to get an RCU 581 00:20:42,539 --> 00:20:46,080 that worked well and scaled and did all that 582 00:20:44,220 --> 00:20:48,660 stuff but fortunately the first one out 583 00:20:46,080 --> 00:20:51,080 of the gate was good enough for that 584 00:20:48,660 --> 00:20:51,080 contract 585 00:20:51,660 --> 00:20:54,840 and that might sound like kind of a 586 00:20:53,220 --> 00:20:56,820 problem but you know that effort was one of 587 00:20:54,840 --> 00:20:58,320 the highlights of my career I was 588 00:20:56,820 --> 00:20:59,580 working with some really bright guys we 589 00:20:58,320 --> 00:21:01,020 were doing stuff where I wasn't 590 00:20:59,580 --> 00:21:02,880 clear we could
get it done and we got it 591 00:21:01,020 --> 00:21:05,760 done and changed the world it was it 592 00:21:02,880 --> 00:21:07,679 was a lot of fun that's uh it was it was 593 00:21:05,760 --> 00:21:09,419 a real great experience 594 00:21:07,679 --> 00:21:12,500 sort of a nerve-wracking experience at 595 00:21:09,419 --> 00:21:12,500 times but a great experience 596 00:21:12,900 --> 00:21:17,940 so there it is and uh I had a nice idea 597 00:21:15,780 --> 00:21:20,940 that collided with reality and you know 598 00:21:17,940 --> 00:21:24,380 when that happens reality wins okay 599 00:21:20,940 --> 00:21:24,380 reality wins 600 00:21:25,799 --> 00:21:30,240 um I'm going to uh 601 00:21:28,020 --> 00:21:32,340 go through this kind of quickly 602 00:21:30,240 --> 00:21:33,720 uh first off some of you may like formal 603 00:21:32,340 --> 00:21:36,720 verification some may have been burned 604 00:21:33,720 --> 00:21:39,059 by it so it's worth asking why bother in 605 00:21:36,720 --> 00:21:40,380 practice with formal verification and 606 00:21:39,059 --> 00:21:42,179 I'm going to review my experience we're 607 00:21:40,380 --> 00:21:44,039 going to go back to that National Honor 608 00:21:42,179 --> 00:21:46,860 Society computer dating program I wrote 609 00:21:44,039 --> 00:21:49,620 in the 1970s and there was only one of 610 00:21:46,860 --> 00:21:51,360 them right we only had one user so if we 611 00:21:49,620 --> 00:21:52,740 had a million year bug in that code I'm 612 00:21:51,360 --> 00:21:55,380 not sure how you'd have a million 613 00:21:52,740 --> 00:21:57,539 year bug in code that just checks and 614 00:21:55,380 --> 00:21:59,700 does Hamming distance but if we did it 615 00:21:57,539 --> 00:22:01,799 would happen once in a million years and 616 00:21:59,700 --> 00:22:03,780 at that point you know you know Murphy 617 00:22:01,799 --> 00:22:06,960 at that point he's kind of a nice guy 618 00:22:03,780 --> 00:22:08,460 sure everything that can happen will
619 00:22:06,960 --> 00:22:10,860 eventually 620 00:22:08,460 --> 00:22:13,320 maybe in geologic time 621 00:22:10,860 --> 00:22:16,260 but over time I've written code that's had 622 00:22:13,320 --> 00:22:18,840 more and more and more users 623 00:22:16,260 --> 00:22:20,340 and in 2017 a guy from ARM told me that 624 00:22:18,840 --> 00:22:22,380 there were at least 20 billion that's 625 00:22:20,340 --> 00:22:24,480 billion with a B as in 10 to the ninth 626 00:22:22,380 --> 00:22:27,000 power as in 20 times 10 to the ninth 627 00:22:24,480 --> 00:22:29,340 power Linux systems out there 628 00:22:27,000 --> 00:22:30,539 at that time across the install base a 629 00:22:29,340 --> 00:22:33,140 million year bug is happening several 630 00:22:30,539 --> 00:22:33,140 times per hour 631 00:22:33,960 --> 00:22:38,100 worse yet if the Linux kernel were to 632 00:22:36,419 --> 00:22:39,900 gain any reasonable fraction even a 633 00:22:38,100 --> 00:22:41,460 small fraction of the IoT market and 634 00:22:39,900 --> 00:22:43,380 that grows as much as people think it 635 00:22:41,460 --> 00:22:46,260 will well 636 00:22:43,380 --> 00:22:47,580 uh my fear is that Mr Murphy might 637 00:22:46,260 --> 00:22:49,919 transition from a nice guy to a 638 00:22:47,580 --> 00:22:51,840 homicidal maniac 639 00:22:49,919 --> 00:22:54,120 and I don't know about you guys but I'd 640 00:22:51,840 --> 00:22:56,460 feel really really bad if I could end up 641 00:22:54,120 --> 00:22:58,020 killing somebody 642 00:22:56,460 --> 00:22:59,760 so 643 00:22:58,020 --> 00:23:01,799 um what do we do about that 644 00:22:59,760 --> 00:23:03,360 in some cases surprisingly enough 645 00:23:01,799 --> 00:23:05,940 in some cases you can actually test your 646 00:23:03,360 --> 00:23:07,140 way out of it for example if you had a 647 00:23:05,940 --> 00:23:09,120 bunch of smartphones and people actually 648 00:23:07,140 --> 00:23:11,220 use them as phones like us old guys do 649 00:23:09,120 --> 00:23:13,679
but for servers and for smartphones 650 00:23:11,220 --> 00:23:16,200 used the new age way forget it 651 00:23:13,679 --> 00:23:18,120 plus Linux is starting to be used for safety 652 00:23:16,200 --> 00:23:19,559 critical applications not really heavy 653 00:23:18,120 --> 00:23:21,539 duty ones but still safety critical 654 00:23:19,559 --> 00:23:22,799 applications and when you're faced with 655 00:23:21,539 --> 00:23:24,780 something like that 656 00:23:22,799 --> 00:23:29,039 formal verification with its full state space 657 00:23:24,780 --> 00:23:32,340 search gets kind of attractive all right 658 00:23:29,039 --> 00:23:34,140 and uh I've done some stuff with it uh 659 00:23:32,340 --> 00:23:36,360 but most of the stuff was verifying 660 00:23:34,140 --> 00:23:38,700 design and to make this work for real 661 00:23:36,360 --> 00:23:40,500 you need regression testing as well 662 00:23:38,700 --> 00:23:42,360 and the thing is you do a design 663 00:23:40,500 --> 00:23:44,220 verification then you do a bug fix and 664 00:23:42,360 --> 00:23:45,960 is your verification still valid well 665 00:23:44,220 --> 00:23:48,299 maybe it is maybe it isn't 666 00:23:45,960 --> 00:23:50,760 uh the problem is this stuff's expensive 667 00:23:48,299 --> 00:23:52,740 at best it's exponential often undecidable 668 00:23:50,760 --> 00:23:54,179 and if you do the full up verification 669 00:23:52,740 --> 00:23:56,520 you have a full specification that's 670 00:23:54,179 --> 00:23:58,440 software it can contain a lot of bugs 671 00:23:56,520 --> 00:23:59,760 itself which means that if you do your 672 00:23:58,440 --> 00:24:01,260 formal verification your full 673 00:23:59,760 --> 00:24:03,000 specification might be driving bugs into 674 00:24:01,260 --> 00:24:03,960 your software which is the opposite of 675 00:24:03,000 --> 00:24:06,419 what we want 676 00:24:03,960 --> 00:24:08,580 as a result formal verification really 677 00:24:06,419 --> 00:24:11,340 is used heavily in practice but its
use 678 00:24:08,580 --> 00:24:14,100 is highly restricted it's powerful when 679 00:24:11,340 --> 00:24:16,580 properly used but you need to verify the 680 00:24:14,100 --> 00:24:16,580 verification 681 00:24:17,460 --> 00:24:21,840 these are some examples of places it's 682 00:24:19,260 --> 00:24:23,760 used in uh the Linux kernel is 683 00:24:21,840 --> 00:24:25,559 subject to formal verification fairly 684 00:24:23,760 --> 00:24:27,600 often 685 00:24:25,559 --> 00:24:29,700 now the way we deal with 686 00:24:27,600 --> 00:24:30,900 verifying the verifiers is we do a one-way 687 00:24:29,700 --> 00:24:33,299 bet 688 00:24:30,900 --> 00:24:34,860 so we look for a problem if we find a 689 00:24:33,299 --> 00:24:36,720 problem we do a bug report and 690 00:24:34,860 --> 00:24:38,100 treat it as real if we don't find a 691 00:24:36,720 --> 00:24:39,419 problem we say yeah well whatever we're 692 00:24:38,100 --> 00:24:42,480 not gonna trust that 693 00:24:39,419 --> 00:24:44,520 okay so the formal verification can find 694 00:24:42,480 --> 00:24:45,900 bugs for us but it can't lie to us well 695 00:24:44,520 --> 00:24:47,580 it can lie to us all it wants we just 696 00:24:45,900 --> 00:24:50,760 won't believe it if it says that the 697 00:24:47,580 --> 00:24:52,740 software is correct we'll just ignore it 698 00:24:50,760 --> 00:24:54,900 so let's get back to this cautionary 699 00:24:52,740 --> 00:24:55,980 quote a lot of success in life and 700 00:24:54,900 --> 00:24:58,200 business comes from knowing what you 701 00:24:55,980 --> 00:25:01,320 want to avoid 702 00:24:58,200 --> 00:25:03,600 uh and uh here are some quotes that are 703 00:25:01,320 --> 00:25:06,179 also pertinent I'll focus on the last 704 00:25:03,600 --> 00:25:07,799 one from the inimitable Will Rogers 705 00:25:06,179 --> 00:25:10,140 we don't know what we want but we're 706 00:25:07,799 --> 00:25:12,360 ready to bite somebody to get it 707 00:25:10,140 -->
00:25:14,760 and of course that notion leads us to 708 00:25:12,360 --> 00:25:15,960 something called natural selection 709 00:25:14,760 --> 00:25:17,460 and if we're to talk about natural 710 00:25:15,960 --> 00:25:18,900 selection we should have the great man 711 00:25:17,460 --> 00:25:21,240 here Charles Darwin looking on the 712 00:25:18,900 --> 00:25:24,059 proceedings and there he is 713 00:25:21,240 --> 00:25:26,039 and the thing is Charles Darwin was 714 00:25:24,059 --> 00:25:28,080 studying natural selection in life 715 00:25:26,039 --> 00:25:30,840 forms 716 00:25:28,080 --> 00:25:34,020 sometime later about a century later 717 00:25:30,840 --> 00:25:37,559 we learned that this natural selection 718 00:25:34,020 --> 00:25:39,600 is based on DNA which has codons which 719 00:25:37,559 --> 00:25:43,200 control proteins and much else besides 720 00:25:39,600 --> 00:25:45,960 and it's a weird form of software 721 00:25:43,200 --> 00:25:47,820 so Charles Darwin didn't realize it but 722 00:25:45,960 --> 00:25:50,820 he was studying natural selection in 723 00:25:47,820 --> 00:25:53,039 software in living software 724 00:25:50,820 --> 00:25:54,840 and if natural selection works in 725 00:25:53,039 --> 00:25:56,100 living software why not apply it to 726 00:25:54,840 --> 00:25:58,020 real software 727 00:25:56,100 --> 00:26:00,779 so we've got our randomly generated 728 00:25:58,020 --> 00:26:03,000 software instead of mutations 729 00:26:00,779 --> 00:26:04,799 um some of you may object to me calling 730 00:26:03,000 --> 00:26:06,240 your carefully crafted software randomly 731 00:26:04,799 --> 00:26:08,400 generated 732 00:26:06,240 --> 00:26:10,620 as Russell said at the beginning of this 733 00:26:08,400 --> 00:26:11,940 I've been doing this for far more 734 00:26:10,620 --> 00:26:13,620 than four decades 735 00:26:11,940 --> 00:26:16,500 if you want me to change my assessment 736 00:26:13,620 --> 00:26:18,840 I'm happy to do that after you change
737 00:26:16,500 --> 00:26:20,220 your behavior after you change your 738 00:26:18,840 --> 00:26:22,140 behavior 739 00:26:20,220 --> 00:26:24,240 okay so we got this randomly generated 740 00:26:22,140 --> 00:26:27,600 software you make validation function 741 00:26:24,240 --> 00:26:29,700 as our selection that kicks out bugs we 742 00:26:27,600 --> 00:26:31,620 fix the bugs and hopefully we inject 743 00:26:29,700 --> 00:26:33,120 fewer bugs than we fix if we manage to 744 00:26:31,620 --> 00:26:34,980 do that we've got this nice virtuous 745 00:26:33,120 --> 00:26:37,320 circle here we go around 746 00:26:34,980 --> 00:26:39,240 and uh we have robust software dropping 747 00:26:37,320 --> 00:26:40,860 out the bottom and you can use agile 748 00:26:39,240 --> 00:26:42,480 methods as well and they attempt to 749 00:26:40,860 --> 00:26:44,700 push this methodology back to the 750 00:26:42,480 --> 00:26:47,460 specification to deal with the what do 751 00:26:44,700 --> 00:26:50,419 users want problem uh with varying 752 00:26:47,460 --> 00:26:50,419 degrees of success 753 00:26:50,760 --> 00:26:59,880 okay uh there's one problem with this 754 00:26:55,320 --> 00:26:59,880 uh this applies to 755 00:27:01,200 --> 00:27:06,360 software and unfortunately there's a 756 00:27:04,320 --> 00:27:09,299 particular kind of software that it 757 00:27:06,360 --> 00:27:11,720 applies to and that kind of software is 758 00:27:09,299 --> 00:27:15,059 something called a bug 759 00:27:11,720 --> 00:27:16,440 so we don't get robust software dropping 760 00:27:15,059 --> 00:27:18,900 out of the bottom there much as we might 761 00:27:16,440 --> 00:27:20,580 like to instead we get software plus 762 00:27:18,900 --> 00:27:23,059 bugs that have adapted to the 763 00:27:20,580 --> 00:27:23,059 verification 764 00:27:24,659 --> 00:27:28,740 okay so what do we do about that well 765 00:27:27,299 --> 00:27:30,600 one thing we can do is we can get bug 766 00:27:28,740 --> 00:27:32,159
reports in the field now of course we 767 00:27:30,600 --> 00:27:34,340 get the bug report we fix the bug in the 768 00:27:32,159 --> 00:27:37,620 software but it's important to realize 769 00:27:34,340 --> 00:27:39,659 that bug report also indicates a bug in 770 00:27:37,620 --> 00:27:41,700 the validation 771 00:27:39,659 --> 00:27:43,679 the validation has a bug because it 772 00:27:41,700 --> 00:27:45,900 failed to spot a bug in the software 773 00:27:43,679 --> 00:27:48,600 so don't just fix the bug in the software 774 00:27:45,900 --> 00:27:49,980 fix the bug in the validation as well if 775 00:27:48,600 --> 00:27:51,659 you do that then you've got a better 776 00:27:49,980 --> 00:27:53,820 chance you can at least get rid of the 777 00:27:51,659 --> 00:27:56,039 bugs that people run into 778 00:27:53,820 --> 00:27:57,900 and make it so that such bugs will be 779 00:27:56,039 --> 00:27:59,700 caught in the future 780 00:27:57,900 --> 00:28:01,080 but there's another issue 781 00:27:59,700 --> 00:28:03,059 okay you're going to validate your 782 00:28:01,080 --> 00:28:04,500 intended use cases naturally right you 783 00:28:03,059 --> 00:28:06,059 write test cases and validation whatever 784 00:28:04,500 --> 00:28:08,220 you can do verification whatever you use 785 00:28:06,059 --> 00:28:09,720 for the use cases you envision 786 00:28:08,220 --> 00:28:11,580 you do some major development that 787 00:28:09,720 --> 00:28:13,860 generates bugs if everything's working 788 00:28:11,580 --> 00:28:16,020 perfectly it never is but let's 789 00:28:13,860 --> 00:28:18,419 give it the benefit of the doubt you'll 790 00:28:16,020 --> 00:28:20,279 get rid of all the bugs that are covered 791 00:28:18,419 --> 00:28:22,500 by the current validation 792 00:28:20,279 --> 00:28:24,059 then you do some more development get 793 00:28:22,500 --> 00:28:26,580 rid of some more 794 00:28:24,059 --> 00:28:28,559 and you know what somebody says hey we 795 00:28:26,580 -->
00:28:31,200 can use this for something else 796 00:28:28,559 --> 00:28:35,240 but your software is protected from 797 00:28:31,200 --> 00:28:35,240 those new use cases by walls of bugs 798 00:28:35,279 --> 00:28:38,240 one way out of this is open source 799 00:28:37,260 --> 00:28:40,500 software 800 00:28:38,240 --> 00:28:42,419 the thing is that the people trying 801 00:28:40,500 --> 00:28:44,159 to use it for the new use cases can fix 802 00:28:42,419 --> 00:28:45,539 the bugs or pay somebody else to fix the 803 00:28:44,159 --> 00:28:48,419 bugs or do something anyway they have 804 00:28:45,539 --> 00:28:49,860 some options with proprietary they 805 00:28:48,419 --> 00:28:51,120 don't release the software and they may just 806 00:28:49,860 --> 00:28:53,700 be stuck 807 00:28:51,120 --> 00:28:56,640 so that can help to some extent and of 808 00:28:53,700 --> 00:28:58,260 course you know if 809 00:28:56,640 --> 00:29:00,659 you have a choice 810 00:28:58,260 --> 00:29:02,760 yes you're going to fix the software 811 00:29:00,659 --> 00:29:04,320 uh to fix the bugs that are 812 00:29:02,760 --> 00:29:06,059 affecting your customers if you have two 813 00:29:04,320 --> 00:29:07,320 bugs you'll prioritize the one 814 00:29:06,059 --> 00:29:09,480 affecting your customers before the one 815 00:29:07,320 --> 00:29:12,000 that doesn't but if you only ever do 816 00:29:09,480 --> 00:29:13,620 that if you only ever fix the bugs that 817 00:29:12,000 --> 00:29:16,039 affect your customers this is where 818 00:29:13,620 --> 00:29:16,039 you're headed 819 00:29:16,260 --> 00:29:20,340 one way to get out of this is to add 820 00:29:18,240 --> 00:29:22,679 paranoia to the bug reports and right 821 00:29:20,340 --> 00:29:24,840 now a pretty good form of paranoia is 822 00:29:22,679 --> 00:29:26,100 the fuzzer uh these have been doing a pretty 823 00:29:24,840 --> 00:29:27,360 good job in the Linux kernel finding 824 00:29:26,100 --> 00:29:30,059 all sorts of stuff 825
00:29:27,360 --> 00:29:31,980 and hopefully that's helping and making 826 00:29:30,059 --> 00:29:34,080 things better 827 00:29:31,980 --> 00:29:35,580 all right all right 828 00:29:34,080 --> 00:29:37,080 so 829 00:29:35,580 --> 00:29:39,179 the other piece you got to keep in mind 830 00:29:37,080 --> 00:29:42,000 is natural selection is a euphemism 831 00:29:39,179 --> 00:29:43,860 for something that's not very pretty 832 00:29:42,000 --> 00:29:45,899 if you don't believe me on that just 833 00:29:43,860 --> 00:29:47,940 think hard and carefully about what 834 00:29:45,899 --> 00:29:49,500 happens to the organism that is not 835 00:29:47,940 --> 00:29:51,960 selected okay 836 00:29:49,500 --> 00:29:54,659 and the same thing applies to software 837 00:29:51,960 --> 00:29:57,679 if your tests are not failing they are 838 00:29:54,659 --> 00:29:57,679 not improving your software 839 00:29:57,960 --> 00:30:01,980 if your tests are not failing they are 840 00:30:00,120 --> 00:30:03,240 not improving your software if they're 841 00:30:01,980 --> 00:30:04,799 not failing there's a bug in your test 842 00:30:03,240 --> 00:30:07,500 there's bugs in that software somewhere 843 00:30:04,799 --> 00:30:09,539 and that test is failing to find them 844 00:30:07,500 --> 00:30:11,760 in addition if your users are not 845 00:30:09,539 --> 00:30:12,840 complaining they are not improving your 846 00:30:11,760 --> 00:30:14,159 software 847 00:30:12,840 --> 00:30:16,020 this is a little bit harder to take 848 00:30:14,159 --> 00:30:17,820 sometimes it's really painful when users 849 00:30:16,020 --> 00:30:19,799 complain but that's how they improve 850 00:30:17,820 --> 00:30:21,779 your software 851 00:30:19,799 --> 00:30:24,360 okay so why would they fail to complain 852 00:30:21,779 --> 00:30:25,860 well the usual cause is the first one 853 00:30:24,360 --> 00:30:27,299 there they aren't using your software 854 00:30:25,860 --> 00:30:28,919 at all 855 00:30:27,299 -->
00:30:30,539 they might not know who to complain to 856 00:30:28,919 --> 00:30:31,740 it might be the last n times they 857 00:30:30,539 --> 00:30:33,059 complained either nothing useful 858 00:30:31,740 --> 00:30:34,980 happened or worse yet they were yelled 859 00:30:33,059 --> 00:30:36,240 at or otherwise belittled by the way if 860 00:30:34,980 --> 00:30:38,340 you've been doing this last bit here 861 00:30:36,240 --> 00:30:40,740 yelling at them and belittling them 862 00:30:38,340 --> 00:30:43,200 please stop that if you spell user 863 00:30:40,740 --> 00:30:44,640 l-u-s-e-r please stop that too 864 00:30:43,200 --> 00:30:45,720 you're not helping 865 00:30:44,640 --> 00:30:47,580 yourself 866 00:30:45,720 --> 00:30:49,559 or it could be your software's successful 867 00:30:47,580 --> 00:30:50,640 I mean it just works and nobody has to 868 00:30:49,559 --> 00:30:53,279 do anything with it and it's faded into 869 00:30:50,640 --> 00:30:55,679 the woodwork and it's wonderful 870 00:30:53,279 --> 00:30:57,059 well that's called a technical success 871 00:30:55,679 --> 00:30:59,039 of course if that happens you probably 872 00:30:57,059 --> 00:31:02,480 have to find something else to do 873 00:30:59,039 --> 00:31:02,480 but that's success 874 00:31:03,179 --> 00:31:08,159 so another cautionary quote is from the 875 00:31:06,000 --> 00:31:09,360 inimitable Steve Jobs customers don't 876 00:31:08,159 --> 00:31:11,039 know what they want until we've shown 877 00:31:09,360 --> 00:31:14,520 them and that's part of the problem here 878 00:31:11,039 --> 00:31:17,880 now Henry Ford has a possibly apocryphal 879 00:31:14,520 --> 00:31:20,279 quote involving horses but uh this one's 880 00:31:17,880 --> 00:31:21,960 verified if there is any one secret of 881 00:31:20,279 --> 00:31:23,340 success it lies in the ability to get 882 00:31:21,960 --> 00:31:24,659 the other person's point of view and see 883 00:31:23,340 --> 00:31:26,100 things from
that person's angle as well 884 00:31:24,659 --> 00:31:28,140 as from your own 885 00:31:26,100 --> 00:31:29,100 in other words if your customers don't 886 00:31:28,140 --> 00:31:30,840 know what they want you need to 887 00:31:29,100 --> 00:31:32,820 understand them well enough to see what 888 00:31:30,840 --> 00:31:34,740 they want or need 889 00:31:32,820 --> 00:31:36,419 and that's not something we software guys 890 00:31:34,740 --> 00:31:37,020 are really good at 891 00:31:36,419 --> 00:31:39,120 um 892 00:31:37,020 --> 00:31:40,380 we work with things rather than people 893 00:31:39,120 --> 00:31:42,120 that's what we're trained to 894 00:31:40,380 --> 00:31:43,320 do that's what we're supposed to do and 895 00:31:42,120 --> 00:31:45,179 that's why I keep saying you must live 896 00:31:43,320 --> 00:31:47,159 among your users given that we're not 897 00:31:45,179 --> 00:31:49,380 very good at it we have to have the 898 00:31:47,159 --> 00:31:50,700 users around us full time so 899 00:31:49,380 --> 00:31:52,620 that we can see what's going on we can 900 00:31:50,700 --> 00:31:55,940 understand where they're coming from by 901 00:31:52,620 --> 00:31:55,940 repeated exposure 902 00:31:56,399 --> 00:32:00,299 and here's the tricky part you have to 903 00:31:58,380 --> 00:32:02,279 complain on their behalf they may be 904 00:32:00,299 --> 00:32:03,720 perfectly satisfied they're doing 905 00:32:02,279 --> 00:32:05,279 something that's repetitive that's taking a lot 906 00:32:03,720 --> 00:32:07,279 of time that's causing problems for other 907 00:32:05,279 --> 00:32:10,140 people that's consuming a lot of money 908 00:32:07,279 --> 00:32:11,340 uh but they're getting their job done 909 00:32:10,140 --> 00:32:13,500 they're turning the crank it's 910 00:32:11,340 --> 00:32:14,460 working for them they're happy in that 911 00:32:13,500 --> 00:32:16,980 case you have to complain on their 912 00:32:14,460 --> 00:32:19,320 behalf which can be
uncomfortable or 913 00:32:16,980 --> 00:32:21,779 even sometimes dangerous but it's a very 914 00:32:19,320 --> 00:32:24,059 important skill as you move up to more 915 00:32:21,779 --> 00:32:26,220 senior levels and yeah I'm at a senior 916 00:32:24,059 --> 00:32:29,039 level I don't bleach my hair this is in 917 00:32:26,220 --> 00:32:32,220 fact my natural hair color 918 00:32:29,039 --> 00:32:34,320 so to sum up you know users people don't 919 00:32:32,220 --> 00:32:36,480 know what they want 920 00:32:34,320 --> 00:32:37,980 but that's no excuse not just for 921 00:32:36,480 --> 00:32:40,799 software developers but for engineers in 922 00:32:37,980 --> 00:32:44,399 general uh we are nonetheless required 923 00:32:40,799 --> 00:32:47,460 to get people things they want and need 924 00:32:44,399 --> 00:32:48,779 perhaps it's some consolation to know 925 00:32:47,460 --> 00:32:51,120 that you've only failed if you've given 926 00:32:48,779 --> 00:32:52,679 up until then it's called learning maybe 927 00:32:51,120 --> 00:32:54,539 living 928 00:32:52,679 --> 00:32:56,760 among them and learning what they want 929 00:32:54,539 --> 00:32:58,140 and maybe more important you're not a 930 00:32:56,760 --> 00:32:59,760 failure until you start blaming others 931 00:32:58,140 --> 00:33:00,600 for your mistakes so you know the whole 932 00:32:59,760 --> 00:33:01,919 point of this you're going to make 933 00:33:00,600 --> 00:33:03,480 mistakes you need to own them and 934 00:33:01,919 --> 00:33:05,039 improve based on them and then you can 935 00:33:03,480 --> 00:33:07,580 give your users what they don't know 936 00:33:05,039 --> 00:33:07,580 that they want 937 00:33:07,620 --> 00:33:12,600 okay I had it easy actually uh the first 938 00:33:11,039 --> 00:33:14,940 part of my career was just doing assigned 939 00:33:12,600 --> 00:33:16,320 work uh it was easy to figure out what 940 00:33:14,940 --> 00:33:18,000 to do to keep people out of court or
941 00:33:16,320 --> 00:33:19,440 even out of jail that was most of what I was 942 00:33:18,000 --> 00:33:22,320 doing in the early 80s 943 00:33:19,440 --> 00:33:24,480 keeping users happy and fulfilling research 944 00:33:22,320 --> 00:33:26,519 contract terms wasn't too hard 945 00:33:24,480 --> 00:33:28,019 prior to Unix it was pretty 946 00:33:26,519 --> 00:33:30,120 straightforward what I had to do now things are 947 00:33:28,019 --> 00:33:32,640 more challenging more recently the last 948 00:33:30,120 --> 00:33:35,159 20 years there's a bunch of stuff that 949 00:33:32,640 --> 00:33:36,899 uh I've had to work on and most recently 950 00:33:35,159 --> 00:33:38,340 at my current employer easing the 951 00:33:36,899 --> 00:33:40,320 administration of large data centers which 952 00:33:38,340 --> 00:33:42,960 has been a lot of fun a lot of really 953 00:33:40,320 --> 00:33:44,820 cool challenges over that time and uh 954 00:33:42,960 --> 00:33:46,740 you know I'm proud of my accomplishments 955 00:33:44,820 --> 00:33:48,659 but a lot of you guys are working on 956 00:33:46,740 --> 00:33:50,940 stuff that is much more complex and way 957 00:33:48,659 --> 00:33:53,940 more user-centric okay 958 00:33:50,940 --> 00:33:56,220 uh and uh hopefully what I've said here 959 00:33:53,940 --> 00:33:58,559 has helped but in the meantime my job's 960 00:33:56,220 --> 00:34:00,600 to provide reliable infrastructure for what 961 00:33:58,559 --> 00:34:03,620 you guys are doing so I'm here for 962 00:34:00,600 --> 00:34:05,940 you down underneath 963 00:34:03,620 --> 00:34:07,860 so uh at this point we've got some time 964 00:34:05,940 --> 00:34:10,139 for questions and I'm going to show one 965 00:34:07,860 --> 00:34:12,179 technical success I mean sure my code 966 00:34:10,139 --> 00:34:14,040 has its problems on the planet Earth but 967 00:34:12,179 --> 00:34:16,139 so far on planet Mars it's been a 968 00:34:14,040 --> 00:34:17,580 technical success no complaints 969 00:34:16,139 --> 00:34:20,659
along with everybody else in the Linux 970 00:34:17,580 --> 00:34:20,659 kernel back in the day 971 00:34:22,020 --> 00:34:26,240 so in any case thank you for your time and 972 00:34:24,000 --> 00:34:26,240 attention 973 00:34:26,460 --> 00:34:29,240 over to you 974 00:34:29,520 --> 00:34:35,839 Israel we have about four minutes of 975 00:34:33,599 --> 00:34:35,839 questions 976 00:34:42,320 --> 00:34:45,560 on here 977 00:34:51,780 --> 00:34:55,740 I think disappeared um do you have any 978 00:34:54,060 --> 00:34:57,839 advice you mentioned earlier about uh 979 00:34:55,740 --> 00:34:59,940 customers sometimes don't know what they 980 00:34:57,839 --> 00:35:02,099 want do you have any particular advice 981 00:34:59,940 --> 00:35:04,080 on how to coax them into a state where 982 00:35:02,099 --> 00:35:06,240 they can know what they want 983 00:35:04,080 --> 00:35:09,260 beyond becoming a subject matter expert 984 00:35:06,240 --> 00:35:09,260 in whatever it is they're doing 985 00:35:10,560 --> 00:35:14,520 unfortunately becoming a subject 986 00:35:12,900 --> 00:35:16,440 matter expert in what they're doing is 987 00:35:14,520 --> 00:35:17,760 the safest and probably most productive 988 00:35:16,440 --> 00:35:20,160 route that's what I'm getting at with 989 00:35:17,760 --> 00:35:21,180 the live among them but 990 00:35:20,160 --> 00:35:24,540 um 991 00:35:21,180 --> 00:35:27,180 other than that uh one thing is to 992 00:35:24,540 --> 00:35:29,400 work on your communication skills learn how 993 00:35:27,180 --> 00:35:31,380 to understand what people are 994 00:35:29,400 --> 00:35:32,880 saying listen to them and talk 995 00:35:31,380 --> 00:35:35,339 to people who are subject matter 996 00:35:32,880 --> 00:35:37,260 experts the thing is there's a bit 997 00:35:35,339 --> 00:35:38,820 of an impedance mismatch so somebody who 998 00:35:37,260 --> 00:35:40,020 understands really well what the 999 00:35:38,820 --> 00:35:41,700 customer is wanting
may or may not 1000 00:35:40,020 --> 00:35:44,339 understand what your capabilities and 1001 00:35:41,700 --> 00:35:47,160 limitations are so they may ask for 1002 00:35:44,339 --> 00:35:48,599 something that's that's impossible or 1003 00:35:47,160 --> 00:35:49,560 they may fail to ask for something 1004 00:35:48,599 --> 00:35:51,420 simple that would really help them 1005 00:35:49,560 --> 00:35:53,940 because they think it's complicated 1006 00:35:51,420 --> 00:35:56,099 uh so uh if you're going to become a 1007 00:35:53,940 --> 00:35:57,660 subject matter expert that's safest okay 1008 00:35:56,099 --> 00:35:59,400 live among your users and become a 1009 00:35:57,660 --> 00:36:00,720 subject matter expert but if you can't 1010 00:35:59,400 --> 00:36:02,339 do that you need to really really 1011 00:36:00,720 --> 00:36:05,579 seriously work on your communication 1012 00:36:02,339 --> 00:36:07,859 skills and uh how to understand what 1013 00:36:05,579 --> 00:36:10,220 people are getting at some of the agile 1014 00:36:07,859 --> 00:36:12,599 methods can help they're not perfect but 1015 00:36:10,220 --> 00:36:14,700 if you can produce prototypes for them 1016 00:36:12,599 --> 00:36:16,380 and show them that can help sometimes 1017 00:36:14,700 --> 00:36:19,040 so I'm sorry I don't know of a silver 1018 00:36:16,380 --> 00:36:19,040 bullet 1019 00:36:19,920 --> 00:36:23,300 but that's what I know right now 1020 00:36:26,220 --> 00:36:34,640 what else all right I've got one for you 1021 00:36:29,339 --> 00:36:34,640 uh what cool what bug caused you the most 1022 00:36:35,579 --> 00:36:40,140 I think well yeah what bug caused you 1023 00:36:38,040 --> 00:36:44,180 the most heartache 1024 00:36:40,140 --> 00:36:44,180 what bug caused me the most heartache 1025 00:36:44,520 --> 00:36:48,359 well the the one I talked about that 1026 00:36:46,619 --> 00:36:49,740 cost myself and my colleagues a 1027 00:36:48,359 --> 00:36:51,420 Memorial Day certainly took me a while
1028 00:36:49,740 --> 00:36:53,780 to live that one down I'll give you that 1029 00:36:51,420 --> 00:36:53,780 right now 1030 00:36:54,180 --> 00:36:57,560 oh uh 1031 00:36:57,960 --> 00:37:03,359 that's uh uh more recently I suppose so 1032 00:37:01,320 --> 00:37:04,800 what ones caused me trouble recently a 1033 00:37:03,359 --> 00:37:07,260 lot of them are bugs that didn't seem 1034 00:37:04,800 --> 00:37:10,020 like bugs to start with so uh one of the 1035 00:37:07,260 --> 00:37:11,640 things that happened uh upon uh joining 1036 00:37:10,020 --> 00:37:15,119 Facebook now Meta was that 1037 00:37:11,640 --> 00:37:17,160 they needed a new variant of RCU for 1038 00:37:15,119 --> 00:37:19,859 tracing for BPF the the Berkeley 1039 00:37:17,160 --> 00:37:22,079 packet filter software so we talked and 1040 00:37:19,859 --> 00:37:23,700 we put something together and unknown to 1041 00:37:22,079 --> 00:37:26,280 all of us it had a bug 1042 00:37:23,700 --> 00:37:27,720 uh it had worked fine in our tests it 1043 00:37:26,280 --> 00:37:28,800 seemed to work fine in production and 1044 00:37:27,720 --> 00:37:32,099 everything right 1045 00:37:28,800 --> 00:37:34,440 but uh uh what happened was it was uh it 1046 00:37:32,099 --> 00:37:36,359 allowed things to it did better than we 1047 00:37:34,440 --> 00:37:38,339 thought and they ran into some update 1048 00:37:36,359 --> 00:37:39,720 side performance issues which weren't 1049 00:37:38,339 --> 00:37:42,180 bugs until they hit them and suddenly it 1050 00:37:39,720 --> 00:37:43,500 was a bug and that came over it came 1051 00:37:42,180 --> 00:37:45,060 along twice that didn't really 1052 00:37:43,500 --> 00:37:47,339 cause me heartache I don't think but it 1053 00:37:45,060 --> 00:37:48,720 I'm just giving you uh another way these 1054 00:37:47,339 --> 00:37:50,520 things could happen 1055 00:37:48,720 --> 00:37:52,140 um I did it I did things as best I could 1056 00:37:50,520 --> 00:37:54,000
and and I really thought I 1057 00:37:52,140 --> 00:37:56,940 did fine for six or eight months and 1058 00:37:54,000 --> 00:37:59,160 then these new things came up 1059 00:37:56,940 --> 00:38:01,500 okay uh and and I'm going to give 1060 00:37:59,160 --> 00:38:03,480 another example uh it's oscilloscope 1061 00:38:01,500 --> 00:38:04,920 manufacturers they produced the same 1062 00:38:03,480 --> 00:38:06,480 oscilloscopes as before and suddenly got 1063 00:38:04,920 --> 00:38:08,220 tons of bug reports 1064 00:38:06,480 --> 00:38:10,560 because suddenly people were trying to 1065 00:38:08,220 --> 00:38:13,400 do that instead of spinning dials to zoom 1066 00:38:10,560 --> 00:38:13,400 in on the traces 1067 00:38:14,460 --> 00:38:17,000 so does it just 1068 00:38:18,960 --> 00:38:24,780 we've got time for one more question 1069 00:38:23,220 --> 00:38:25,500 thank you 1070 00:38:24,780 --> 00:38:28,680 um 1071 00:38:25,500 --> 00:38:31,200 you said open source is sort of 1072 00:38:28,680 --> 00:38:35,339 part of the solution uh and I agree with 1073 00:38:31,200 --> 00:38:37,500 you on the principle that open source 1074 00:38:35,339 --> 00:38:40,680 gives customers the ability to sort of 1075 00:38:37,500 --> 00:38:43,140 directly contribute but I've also seen 1076 00:38:40,680 --> 00:38:45,480 the sort of patches welcome attitude 1077 00:38:43,140 --> 00:38:48,420 from open source how do we 1078 00:38:45,480 --> 00:38:51,119 sort of make open source more friendly 1079 00:38:48,420 --> 00:38:53,160 to customers and users so they don't 1080 00:38:51,119 --> 00:38:55,200 feel like the only way to 1081 00:38:53,160 --> 00:38:57,440 contribute is by fixing their own 1082 00:38:55,200 --> 00:38:57,440 problem 1083 00:38:58,020 --> 00:39:02,880 that's an excellent question you make an 1084 00:39:00,180 --> 00:39:04,440 excellent point at the same time the 1085 00:39:02,880 --> 00:39:06,119 fact that they can fix the problem by 1086 00:39:04,440 -->
00:39:07,800 contributing is an option they have with 1087 00:39:06,119 --> 00:39:09,720 open source that they don't have in many 1088 00:39:07,800 --> 00:39:12,000 cases with proprietary software so let's not 1089 00:39:09,720 --> 00:39:14,040 lose sight of that but you're right we 1090 00:39:12,000 --> 00:39:18,480 need to do better okay 1091 00:39:14,040 --> 00:39:19,980 uh the CII uh initiative is one way we're 1092 00:39:18,480 --> 00:39:22,200 helping do that by funding some people 1093 00:39:19,980 --> 00:39:24,300 to make it work better 1094 00:39:22,200 --> 00:39:26,460 and perhaps there's more things to be done 1095 00:39:24,300 --> 00:39:30,119 along those lines 1096 00:39:26,460 --> 00:39:30,119 uh the 1097 00:39:30,900 --> 00:39:36,599 uh it may be that we can do a better job 1098 00:39:34,560 --> 00:39:38,520 of reaching out depending on what the 1099 00:39:36,599 --> 00:39:39,960 bug is okay I mean I'm not suggesting 1100 00:39:38,520 --> 00:39:41,760 doing this for an RCU bug all right 1101 00:39:39,960 --> 00:39:43,680 although I've come across some really 1102 00:39:41,760 --> 00:39:45,839 brilliant students in my time and some 1103 00:39:43,680 --> 00:39:47,160 of them a few of them a very few have 1104 00:39:45,839 --> 00:39:48,180 managed to actually jump in and do this 1105 00:39:47,160 --> 00:39:51,960 kind of thing 1106 00:39:48,180 --> 00:39:54,420 but perhaps we need to uh keep track of 1107 00:39:51,960 --> 00:39:56,339 the requests and see if we can get 1108 00:39:54,420 --> 00:39:57,720 people to come in people who are 1109 00:39:56,339 --> 00:39:59,700 interested in learning to use those as 1110 00:39:57,720 --> 00:40:01,800 learning processes for some of the bugs 1111 00:39:59,700 --> 00:40:04,920 now that's not a full solution 1112 00:40:01,800 --> 00:40:06,720 uh but uh it's a problem bigger than I 1113 00:40:04,920 --> 00:40:08,160 am 1114 00:40:06,720 --> 00:40:11,180 but a problem we do need to take 1115 00:40:08,160 --> 00:40:11,180 seriously
I agree with you 1116 00:40:11,520 --> 00:40:15,960 all right thank you Paul we'll have to 1117 00:40:13,800 --> 00:40:17,640 bring it to an end now we've got the 1118 00:40:15,960 --> 00:40:19,859 next session starting in about nine 1119 00:40:17,640 --> 00:40:21,119 minutes thank you again 1120 00:40:19,859 --> 00:40:22,900 thank you all for your time and 1121 00:40:21,119 --> 00:40:28,619 attention 1122 00:40:22,900 --> 00:40:28,619 [Applause]