1 00:00:00,000 --> 00:00:08,469 [Music] 2 00:00:00,500 --> 00:00:08,469 [Music] 3 00:00:12,790 --> 00:00:16,020 [Music] 4 00:00:14,160 --> 00:00:18,539 so many people here 5 00:00:16,020 --> 00:00:19,800 okay so let me introduce 6 00:00:18,539 --> 00:00:24,240 our speaker 7 00:00:19,800 --> 00:00:28,080 Rohan McClure so Rohan is a first year 8 00:00:24,240 --> 00:00:29,160 grad at IBM mostly hacking on the Linux 9 00:00:28,080 --> 00:00:32,160 kernel 10 00:00:29,160 --> 00:00:35,040 the Linux kernel strives to be backward 11 00:00:32,160 --> 00:00:36,420 compatible for user space 12 00:00:35,040 --> 00:00:40,440 today 13 00:00:36,420 --> 00:00:43,260 Rohan will talk about the time that he 14 00:00:40,440 --> 00:00:45,780 accidentally broke that promise 15 00:00:43,260 --> 00:00:49,909 please welcome Rohan 16 00:00:45,780 --> 00:00:49,909 [Applause] 17 00:00:50,280 --> 00:00:55,620 two apologies before I begin I was awake 18 00:00:53,399 --> 00:00:58,199 in Canberra at four at 4am this morning 19 00:00:55,620 --> 00:00:59,879 and so I may be a bit loopy the other 20 00:00:58,199 --> 00:01:01,860 apology is for presenting a talk about 21 00:00:59,879 --> 00:01:04,379 the Linux kernel from an Apple laptop 22 00:01:01,860 --> 00:01:06,420 but um thankfully for your sake I'll save 23 00:01:04,379 --> 00:01:09,420 you the second-hand stress of watching 24 00:01:06,420 --> 00:01:11,520 me xrandr so um 25 00:01:09,420 --> 00:01:13,680 I'd like to begin today with an 26 00:01:11,520 --> 00:01:15,780 acknowledgment of country I begin today 27 00:01:13,680 --> 00:01:17,720 by acknowledging the Wurundjeri Woi 28 00:01:15,780 --> 00:01:20,100 Wurrung people of the Kulin Nation 29 00:01:17,720 --> 00:01:22,439 traditional custodians of the land on 30 00:01:20,100 --> 00:01:23,820 which we gather today and I pay my 31 00:01:22,439 --> 00:01:26,159 respects to their Elders past and 32 00:01:23,820 --> 00:01:27,720 present I extend that respect to 33 00:01:26,159 --> 00:01:29,939 
Aboriginal and Torres Strait Islander 34 00:01:27,720 --> 00:01:32,159 peoples here today 35 00:01:29,939 --> 00:01:34,500 hi my name is Rohan thanks for coming to 36 00:01:32,159 --> 00:01:36,180 my talk so for the last year and a bit 37 00:01:34,500 --> 00:01:40,259 I've been hacking on the kernel as well 38 00:01:36,180 --> 00:01:42,780 as OpenSSL and uh just in my first patch 39 00:01:40,259 --> 00:01:45,119 series I managed to really max out on 40 00:01:42,780 --> 00:01:48,799 the number of people I could irritate 41 00:01:45,119 --> 00:01:51,420 um so uh I've made my first patch series 42 00:01:48,799 --> 00:01:52,799 and a month after it's upstreamed I get 43 00:01:51,420 --> 00:01:55,740 this sort of feedback 44 00:01:52,799 --> 00:01:57,899 this breaks PowerPC 32-bit 45 00:01:55,740 --> 00:01:59,640 uh 46 00:01:57,899 --> 00:02:03,360 probably breaks every syscall with a 47 00:01:59,640 --> 00:02:04,320 64-bit argument all right great great 48 00:02:03,360 --> 00:02:06,540 um 49 00:02:04,320 --> 00:02:08,220 what sorts of damage exactly well a bit of an 50 00:02:06,540 --> 00:02:10,440 introduction to the kernel perhaps it 51 00:02:08,220 --> 00:02:13,140 needs no introduction but 52 00:02:10,440 --> 00:02:14,480 I'll give a a brief one at a thousand 53 00:02:13,140 --> 00:02:16,980 feet 54 00:02:14,480 --> 00:02:18,840 there's a sort of dichotomy between 55 00:02:16,980 --> 00:02:20,640 kernel space which is privileged and 56 00:02:18,840 --> 00:02:23,580 user space which is basically 57 00:02:20,640 --> 00:02:25,020 unprivileged and essentially two 58 00:02:23,580 --> 00:02:26,940 security rings in this sort of 59 00:02:25,020 --> 00:02:28,739 monolithic kernel 60 00:02:26,940 --> 00:02:30,720 um and so user space relies heavily on 61 00:02:28,739 --> 00:02:32,640 the kernel then for its process 62 00:02:30,720 --> 00:02:36,360 management for its file system access 63 00:02:32,640 --> 00:02:39,420 for just general access to resources 64 
00:02:36,360 --> 00:02:41,940 etc and so with that comes a number of 65 00:02:39,420 --> 00:02:44,459 software contracts and so if you were 66 00:02:41,940 --> 00:02:45,800 let's say trying to break all of user 67 00:02:44,459 --> 00:02:48,239 space that is all 68 00:02:45,800 --> 00:02:49,860 unprivileged programs on your system 69 00:02:48,239 --> 00:02:53,280 what might you do well you might 70 00:02:49,860 --> 00:02:55,620 actually violate these contracts 71 00:02:53,280 --> 00:02:57,959 um for those of us unfamiliar with the 72 00:02:55,620 --> 00:03:00,300 idea of a software contract essentially 73 00:02:57,959 --> 00:03:03,420 if you've programmed at all you've used 74 00:03:00,300 --> 00:03:05,700 some sort of API an API stands for 75 00:03:03,420 --> 00:03:07,620 application programming interface and 76 00:03:05,700 --> 00:03:08,879 critically I'm going to draw on this 77 00:03:07,620 --> 00:03:12,060 distinction later 78 00:03:08,879 --> 00:03:14,760 it's a sort of contract on source code 79 00:03:12,060 --> 00:03:16,680 it's usually not difficult to consume 80 00:03:14,760 --> 00:03:19,379 someone else's library when you have 81 00:03:16,680 --> 00:03:22,080 tooling which supports you doing so 82 00:03:19,379 --> 00:03:25,019 that is there's a common ABI which I'll 83 00:03:22,080 --> 00:03:26,940 get to in a moment so that you can call 84 00:03:25,019 --> 00:03:28,920 library code provided the parameters 85 00:03:26,940 --> 00:03:31,200 which the documentation if there is 86 00:03:28,920 --> 00:03:34,620 documentation or examples if there are 87 00:03:31,200 --> 00:03:38,280 examples or code if there is code 88 00:03:34,620 --> 00:03:41,459 um you can usually call that library 89 00:03:38,280 --> 00:03:43,080 um subject to its specification and an 90 00:03:41,459 --> 00:03:45,659 API implies that that library is 91 00:03:43,080 --> 00:03:48,840 intended to be stable to some extent and 92 00:03:45,659 --> 00:03:51,080 it has exported functions and 
you can 93 00:03:48,840 --> 00:03:54,120 call it without risk of personal injury 94 00:03:51,080 --> 00:03:57,000 at any rate but uh contrast that 95 00:03:54,120 --> 00:03:58,920 however to an ABI an application binary 96 00:03:57,000 --> 00:04:01,260 interface 97 00:03:58,920 --> 00:04:03,959 what keeps programs which were compiled 98 00:04:01,260 --> 00:04:06,239 yesterday or just as easily 20 years ago 99 00:04:03,959 --> 00:04:07,980 what keeps them running on modern 100 00:04:06,239 --> 00:04:10,080 systems well that's that's an 101 00:04:07,980 --> 00:04:12,780 application binary interface and these 102 00:04:10,080 --> 00:04:15,720 are extremely prescriptive this covers 103 00:04:12,780 --> 00:04:17,699 stuff like calling convention as well as 104 00:04:15,720 --> 00:04:19,260 the focus of this talk is on the syscall 105 00:04:17,699 --> 00:04:22,199 interface so I'll get to that in a 106 00:04:19,260 --> 00:04:23,759 moment but um ABI is extremely 107 00:04:22,199 --> 00:04:26,580 prescriptive is what's worth knowing 108 00:04:23,759 --> 00:04:29,040 it's down to exact register allocation 109 00:04:26,580 --> 00:04:29,639 and stack layout 110 00:04:29,040 --> 00:04:32,460 um 111 00:04:29,639 --> 00:04:34,620 uh with an API you can call your library 112 00:04:32,460 --> 00:04:36,600 inasmuch as your toolchain has 113 00:04:34,620 --> 00:04:38,820 support for that perhaps a foreign 114 00:04:36,600 --> 00:04:41,340 function interface with an ABI it 115 00:04:38,820 --> 00:04:43,979 prescribes exactly how binaries look and 116 00:04:41,340 --> 00:04:45,540 how the kernel must respond so there are 117 00:04:43,979 --> 00:04:48,000 actually a number of ABIs it's not 118 00:04:45,540 --> 00:04:51,060 always just user space to kernel 119 00:04:48,000 --> 00:04:53,100 um usually when it comes to ABI the base 120 00:04:51,060 --> 00:04:55,500 right is a what's your object format 121 00:04:53,100 --> 00:04:57,780 which you're compiling to 122 00:04:55,500 --> 
00:05:00,360 um if you're compiling code for modern 123 00:04:57,780 --> 00:05:03,360 Linux that's almost always ELF the 124 00:05:00,360 --> 00:05:08,460 Executable and Linkable Format 125 00:05:03,360 --> 00:05:10,860 on top of ELF you get further uh uh 126 00:05:08,460 --> 00:05:13,139 further specifications which come from 127 00:05:10,860 --> 00:05:15,360 your choice of architecture the machine 128 00:05:13,139 --> 00:05:18,900 which you're compiling for architecture 129 00:05:15,360 --> 00:05:20,360 is sorry ABI by the way is not ISA 130 00:05:18,900 --> 00:05:23,220 instruction set 131 00:05:20,360 --> 00:05:25,919 ABI defines things like calling 132 00:05:23,220 --> 00:05:28,259 convention for example and then even 133 00:05:25,919 --> 00:05:30,900 the operating system can further 134 00:05:28,259 --> 00:05:34,620 constrain 135 00:05:30,900 --> 00:05:38,340 ABI that is how libraries and their 136 00:05:34,620 --> 00:05:40,560 consumers operate but um 137 00:05:38,340 --> 00:05:42,720 the particular level of ABI we're going 138 00:05:40,560 --> 00:05:46,139 to talk about is what's called the 139 00:05:42,720 --> 00:05:49,800 kernel to user space ABI and that one is 140 00:05:46,139 --> 00:05:51,780 stable intended to be stable there's 141 00:05:49,800 --> 00:05:53,820 another ABI in the kernel which is the 142 00:05:51,780 --> 00:05:56,160 kernel to modules ABI but that's not the 143 00:05:53,820 --> 00:05:58,580 one I broke so we're going to talk about 144 00:05:56,160 --> 00:05:58,580 this one 145 00:05:58,860 --> 00:06:02,759 here 146 00:06:00,000 --> 00:06:05,580 so about the Linux kernel to user space 147 00:06:02,759 --> 00:06:07,620 ABI this particular bit of the ABI is 148 00:06:05,580 --> 00:06:09,780 about the sort of tools which arbitrary 149 00:06:07,620 --> 00:06:12,180 binaries on a Linux system should be 150 00:06:09,780 --> 00:06:13,320 able to rely upon existing on their 151 00:06:12,180 --> 00:06:15,780 system 152 00:06:13,320 --> 00:06:18,360 
so they include things like the vDSO 153 00:06:15,780 --> 00:06:21,600 that's the virtual dynamic shared object 154 00:06:18,360 --> 00:06:25,020 it's a library which is always mapped 155 00:06:21,600 --> 00:06:27,360 into process memory whenever you load 156 00:06:25,020 --> 00:06:30,539 any sort of ELF file 157 00:06:27,360 --> 00:06:32,280 as well as that the syscall interface 158 00:06:30,539 --> 00:06:36,180 I'm going to talk more about those 159 00:06:32,280 --> 00:06:39,919 in a moment the vDSO actually uh 160 00:06:36,180 --> 00:06:42,419 exists to circumvent some of the 161 00:06:39,919 --> 00:06:44,039 performance ramifications of syscalls 162 00:06:42,419 --> 00:06:45,240 which almost always imply a sort of 163 00:06:44,039 --> 00:06:49,560 context switch 164 00:06:45,240 --> 00:06:52,199 and so some syscalls system calls have been 165 00:06:49,560 --> 00:06:54,539 opted out so that your libc can can 166 00:06:52,199 --> 00:06:56,460 choose to call things via the vDSO 167 00:06:54,539 --> 00:06:57,720 if appropriate 168 00:06:56,460 --> 00:07:00,060 um 169 00:06:57,720 --> 00:07:02,880 as well as that there are also file 170 00:07:00,060 --> 00:07:04,680 systems so Unix participates in a sort 171 00:07:02,880 --> 00:07:09,080 of sorry Linux participates in a sort of 172 00:07:04,680 --> 00:07:12,180 Unix file-based ideology and so 173 00:07:09,080 --> 00:07:15,000 access to your sysfs procfs and the 174 00:07:12,180 --> 00:07:17,759 little the little-known configfs are 175 00:07:15,000 --> 00:07:20,759 also deemed to be ABI this talk is going 176 00:07:17,759 --> 00:07:24,060 to focus on syscalls however so 177 00:07:20,759 --> 00:07:25,680 um what actually is a syscall I think at a 178 00:07:24,060 --> 00:07:27,599 thousand feet again the way I describe 179 00:07:25,680 --> 00:07:30,419 it is a sort of pseudo function call 180 00:07:27,599 --> 00:07:33,660 that is uh when it comes to control flow 181 00:07:30,419 --> 00:07:35,580 the the same 
pattern exists so your your 182 00:07:33,660 --> 00:07:37,400 code will eventually yield when you 183 00:07:35,580 --> 00:07:40,380 initiate a syscall 184 00:07:37,400 --> 00:07:41,639 then the syscall should terminate 185 00:07:40,380 --> 00:07:44,160 hopefully 186 00:07:41,639 --> 00:07:48,419 um and then uh execution should be 187 00:07:44,160 --> 00:07:50,280 resumed at your calling process and so 188 00:07:48,419 --> 00:07:52,440 Linux is a bit interesting I'll talk 189 00:07:50,280 --> 00:07:53,160 more about this in a moment but this is 190 00:07:52,440 --> 00:07:55,319 um 191 00:07:53,160 --> 00:07:58,199 this is the sort of code which the 192 00:07:55,319 --> 00:07:59,580 kernel is expected to support right so I 193 00:07:58,199 --> 00:08:03,060 don't know how familiar you are with 194 00:07:59,580 --> 00:08:05,099 Power assembly I'll walk uh this 195 00:08:03,060 --> 00:08:09,000 through with you 196 00:08:05,099 --> 00:08:11,520 um so uh as per the calling convention 197 00:08:09,000 --> 00:08:13,680 we're going to specify in r0 what sort 198 00:08:11,520 --> 00:08:16,319 of syscall we want to do clearly we want 199 00:08:13,680 --> 00:08:19,860 to do an exit first parameter is zero 200 00:08:16,319 --> 00:08:21,479 exit with status zero syscall 201 00:08:19,860 --> 00:08:22,740 um we have two syscall instructions on 202 00:08:21,479 --> 00:08:24,960 Power 203 00:08:22,740 --> 00:08:26,520 um your libc will choose the most 204 00:08:24,960 --> 00:08:28,080 recent one 205 00:08:26,520 --> 00:08:31,319 uh 206 00:08:28,080 --> 00:08:33,659 in short this syscall instruction is a way 207 00:08:31,319 --> 00:08:38,339 for unprivileged code 208 00:08:33,659 --> 00:08:40,339 to cause the kernel to facilitate some 209 00:08:38,339 --> 00:08:44,880 sort of critical function call 210 00:08:40,339 --> 00:08:48,000 and so nearly always the way this is 211 00:08:44,880 --> 00:08:51,180 implemented in hardware is by means of 212 00:08:48,000 --> 00:08:52,200 an 
interrupt this syscall is a synchronous 213 00:08:51,180 --> 00:08:54,480 interrupt 214 00:08:52,200 --> 00:08:56,880 in general your interrupt hardware works 215 00:08:54,480 --> 00:08:59,100 for events which can fire at any time 216 00:08:56,880 --> 00:09:00,120 but this is an interrupt instigated by 217 00:08:59,100 --> 00:09:02,399 code 218 00:09:00,120 --> 00:09:04,019 and that interrupt handler is expected 219 00:09:02,399 --> 00:09:07,399 to fulfill the contract of whatever 220 00:09:04,019 --> 00:09:07,399 syscall you just called 221 00:09:07,980 --> 00:09:13,320 a really critical syscall is fork for 222 00:09:11,220 --> 00:09:16,200 example and this is just to motivate how 223 00:09:13,320 --> 00:09:19,459 how privileged how much reflection on 224 00:09:16,200 --> 00:09:22,680 program data the syscalls end up 225 00:09:19,459 --> 00:09:24,660 invoking so this is a process and 226 00:09:22,680 --> 00:09:26,519 threading control syscall and it's 227 00:09:24,660 --> 00:09:30,000 being abstracted out by libc so I can 228 00:09:26,519 --> 00:09:32,040 just call the fork function 229 00:09:30,000 --> 00:09:34,680 um and so fork has to look at the 230 00:09:32,040 --> 00:09:36,779 entirety of your process state 231 00:09:34,680 --> 00:09:38,580 and it what it does is it it duplicates 232 00:09:36,779 --> 00:09:40,200 that thread 233 00:09:38,580 --> 00:09:41,700 um that is it duplicates its memory 234 00:09:40,200 --> 00:09:43,500 mappings and it keeps the same 235 00:09:41,700 --> 00:09:47,399 instruction pointer in the resultant 236 00:09:43,500 --> 00:09:49,620 thread and so when the threads wake up 237 00:09:47,399 --> 00:09:51,720 they both resume execution after the 238 00:09:49,620 --> 00:09:54,060 call site and you'll notice that we take 239 00:09:51,720 --> 00:09:56,640 a return value pid 240 00:09:54,060 --> 00:09:58,800 um that's our process ID it's zero if 241 00:09:56,640 --> 00:10:01,220 we're Frankenstein's monster it's the 242 
00:09:58,800 --> 00:10:03,420 name of Frankenstein's monster otherwise 243 00:10:01,220 --> 00:10:05,880 and you you branch on that condition 244 00:10:03,420 --> 00:10:08,760 figure out whether am I the child thread 245 00:10:05,880 --> 00:10:10,860 or am I or am I the parent 246 00:10:08,760 --> 00:10:12,720 um and so syscalls yeah they're a sort of 247 00:10:10,860 --> 00:10:15,300 function they or rather they have a 248 00:10:12,720 --> 00:10:17,640 function-like kind of interface but 249 00:10:15,300 --> 00:10:19,680 clearly but interestingly the the 250 00:10:17,640 --> 00:10:22,260 function calling ABI in ELF is not 251 00:10:19,680 --> 00:10:25,200 actually doesn't 100% line up with um 252 00:10:22,260 --> 00:10:26,880 with uh syscall calling conventions but 253 00:10:25,200 --> 00:10:29,040 clearly this is a very privileged 254 00:10:26,880 --> 00:10:32,220 operation which I 255 00:10:29,040 --> 00:10:35,160 uh well a few more bits of aside 256 00:10:32,220 --> 00:10:37,680 uh before we cut to the chase so Linux 257 00:10:35,160 --> 00:10:41,220 is a bit interesting 258 00:10:37,680 --> 00:10:43,800 um so uh above you can correct me on the 259 00:10:41,220 --> 00:10:45,720 nuance of this later but above other 260 00:10:43,800 --> 00:10:49,500 free and open source operating systems 261 00:10:45,720 --> 00:10:53,160 Linux has a particularly high commitment 262 00:10:49,500 --> 00:10:55,620 to keep a stable syscall interface that's 263 00:10:53,160 --> 00:10:57,720 because you don't usually as a 264 00:10:55,620 --> 00:11:00,480 programmer invoke syscalls directly you 265 00:10:57,720 --> 00:11:01,920 might use libc and uh Linux doesn't 266 00:11:00,480 --> 00:11:04,079 doesn't 267 00:11:01,920 --> 00:11:06,240 doesn't in principle prefer a a 268 00:11:04,079 --> 00:11:08,399 canonical libc implementation and so 269 00:11:06,240 --> 00:11:11,279 the syscall interface is deemed to be it's 270 00:11:08,399 --> 00:11:13,200 deemed to be authoritative 271 
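[Editor's sketch] The point about libc sitting between a program and the syscall interface can be illustrated with a small C fragment. This is a minimal sketch, assuming a Linux system with a libc that provides the generic `syscall()` helper; the two function names below are hypothetical and do not come from the talk.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel for our PID by syscall number,
 * going through libc's generic syscall() trampoline only. */
static pid_t pid_via_raw_syscall(void)
{
    return (pid_t)syscall(SYS_getpid);
}

/* Hypothetical helper: the same request through the dedicated libc wrapper. */
static pid_t pid_via_libc(void)
{
    return getpid();
}
```

Both paths should report the same PID, since the wrapper ultimately relies on the same kernel contract; that contract, rather than any one libc, is what the talk describes as authoritative.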
00:11:11,279 --> 00:11:16,380 um certainly your proprietary operating 272 00:11:13,200 --> 00:11:19,019 systems avoid syscalls at all for user 273 00:11:16,380 --> 00:11:20,760 space so there's libSystem on on macOS 274 00:11:19,019 --> 00:11:21,899 syscall interface is not deemed to be 275 00:11:20,760 --> 00:11:25,860 stable 276 00:11:21,899 --> 00:11:28,440 um and then you have ntdll.dll on Windows 277 00:11:25,860 --> 00:11:30,360 um but yeah Linux has this kind of 278 00:11:28,440 --> 00:11:32,760 extreme attention to detail when it 279 00:11:30,360 --> 00:11:34,860 comes to ABI it's best summed up 280 00:11:32,760 --> 00:11:37,200 by this one particular Linus quote which 281 00:11:34,860 --> 00:11:39,959 is of course in all caps WE DO NOT BREAK 282 00:11:37,200 --> 00:11:42,680 USERSPACE and uh so that's what I did 283 00:11:39,959 --> 00:11:46,440 all right well 284 00:11:42,680 --> 00:11:48,660 all right um so uh so we have our 285 00:11:46,440 --> 00:11:50,700 interrupt code and um because the kernel 286 00:11:48,660 --> 00:11:53,760 is essentially just this stitched-together 287 00:11:50,700 --> 00:11:56,240 big old massive ELF binary stitched 288 00:11:53,760 --> 00:11:59,880 together by linker scripts 289 00:11:56,240 --> 00:12:02,220 we have these symbols in here for the 290 00:11:59,880 --> 00:12:04,560 actual body of a syscall our interrupt 291 00:12:02,220 --> 00:12:07,620 code is going to call this body uh 292 00:12:04,560 --> 00:12:10,440 interrupts are handled by the kernel 293 00:12:07,620 --> 00:12:13,440 um and so they used to have uh this sort 294 00:12:10,440 --> 00:12:15,839 of calling convention however to give a 295 00:12:13,440 --> 00:12:20,820 little bit more freedom to how we write 296 00:12:15,839 --> 00:12:23,060 our interrupt code we decided to use 297 00:12:20,820 --> 00:12:25,440 some ugly and horrible macros 298 00:12:23,060 --> 00:12:27,300 to change the calling convention 299 00:12:25,440 --> 00:12:30,440 to look like 
this you don't need to know 300 00:12:27,300 --> 00:12:32,579 what a pt_regs is it comes from ptrace 301 00:12:30,440 --> 00:12:33,720 I can't tell you any more of the 302 00:12:32,579 --> 00:12:35,519 entomology 303 00:12:33,720 --> 00:12:36,839 etymology dragon 304 00:12:35,519 --> 00:12:39,240 um there 305 00:12:36,839 --> 00:12:41,279 um it's it's in actual fact just 306 00:12:39,240 --> 00:12:44,700 a structure which represents the 307 00:12:41,279 --> 00:12:46,380 registers of the caller of the syscall 308 00:12:44,700 --> 00:12:48,779 because it's an interrupt right so we're 309 00:12:46,380 --> 00:12:51,540 we're there's an entirely new context 310 00:12:48,779 --> 00:12:54,300 between user user space code which 311 00:12:51,540 --> 00:12:56,579 called us and kernel code we keep that 312 00:12:54,300 --> 00:12:58,320 context and we're expected to save and 313 00:12:56,579 --> 00:13:00,560 restore it accordingly hence a context 314 00:12:58,320 --> 00:13:00,560 switch 315 00:13:00,899 --> 00:13:04,380 um and so we want to change that calling 316 00:13:02,279 --> 00:13:07,380 convention uh 317 00:13:04,380 --> 00:13:10,380 stay cleared for a bit of motivation on 318 00:13:07,380 --> 00:13:12,839 why we do that exactly but the the kind 319 00:13:10,380 --> 00:13:15,180 of crown jewels is uh I've been inducted 320 00:13:12,839 --> 00:13:18,120 onto the hardening team and uh so we 321 00:13:15,180 --> 00:13:19,860 decided we'd do a hardening and part 322 00:13:18,120 --> 00:13:24,300 of that included 323 00:13:19,860 --> 00:13:26,880 um uh we wanted to essentially uh 324 00:13:24,300 --> 00:13:27,540 sanitize whatever 325 00:13:26,880 --> 00:13:29,899 um 326 00:13:27,540 --> 00:13:33,000 uh whatever continuity there is between 327 00:13:29,899 --> 00:13:36,480 user space code and kernel code and so 328 00:13:33,000 --> 00:13:39,060 whenever user space can directly or 329 00:13:36,480 --> 00:13:42,180 directly enough invoke an interrupt 330 
00:13:39,060 --> 00:13:44,339 of any sort we don't want the 331 00:13:42,180 --> 00:13:46,500 architected register state of that user 332 00:13:44,339 --> 00:13:48,300 code in any way influencing the 333 00:13:46,500 --> 00:13:50,459 speculative runtime behavior of the 334 00:13:48,300 --> 00:13:52,380 kernel and so what do we do we have to 335 00:13:50,459 --> 00:13:53,480 save even more registers for this 336 00:13:52,380 --> 00:13:56,639 mitigation 337 00:13:53,480 --> 00:13:59,639 we sanitize those registers we set them 338 00:13:56,639 --> 00:14:01,560 to zero and then we restore those 339 00:13:59,639 --> 00:14:04,260 registers before we return 340 00:14:01,560 --> 00:14:05,639 and if it's not obvious to you that 341 00:14:04,260 --> 00:14:09,540 there's an exploit kind of in the 342 00:14:05,639 --> 00:14:12,180 waiting here then um that's fine it 343 00:14:09,540 --> 00:14:14,579 wasn't obvious to me either 344 00:14:12,180 --> 00:14:17,160 um but let's just sort of dive into the 345 00:14:14,579 --> 00:14:20,279 activity of kernel hardening right so 346 00:14:17,160 --> 00:14:23,399 there's been a sort of seismic shift in 347 00:14:20,279 --> 00:14:25,500 the world of security research with 348 00:14:23,399 --> 00:14:26,459 Spectre and Meltdown which is to say 349 00:14:25,500 --> 00:14:29,720 that 350 00:14:26,459 --> 00:14:33,000 um user space may not be able to 351 00:14:29,720 --> 00:14:36,120 leak secrets from the kernel directly 352 00:14:33,000 --> 00:14:39,300 but it might be able to do so in an 353 00:14:36,120 --> 00:14:41,220 implied fashion through timing attacks 354 00:14:39,300 --> 00:14:43,199 and these timing attacks become 355 00:14:41,220 --> 00:14:46,160 increasingly available with an awareness 356 00:14:43,199 --> 00:14:48,120 of of the reality which is that 357 00:14:46,160 --> 00:14:51,420 microarchitectures are necessarily 358 00:14:48,120 --> 00:14:53,160 speculative on on superscalar processors 359 00:14:51,420 --> 
00:14:55,380 and um 360 00:14:53,160 --> 00:14:56,820 and so what we're doing is uh is 361 00:14:55,380 --> 00:14:59,160 hardening measures right we're just 362 00:14:56,820 --> 00:15:01,440 trying to produce isolation between user 363 00:14:59,160 --> 00:15:03,540 code and kernel code and so I can't 364 00:15:01,440 --> 00:15:06,300 actually guarantee you that I I've 365 00:15:03,540 --> 00:15:09,600 foreseen exploits per se 366 00:15:06,300 --> 00:15:11,220 but um it's a bit tedious to just chase 367 00:15:09,600 --> 00:15:13,680 after exploits when they become 368 00:15:11,220 --> 00:15:15,660 discovered right you'd rather harden the 369 00:15:13,680 --> 00:15:18,000 kernel against against the prospect of 370 00:15:15,660 --> 00:15:21,240 people gluing together info leaks and 371 00:15:18,000 --> 00:15:24,420 gadgets and one day creating an exploit 372 00:15:21,240 --> 00:15:26,459 and so a bit of a bit of a of a red flag 373 00:15:24,420 --> 00:15:28,139 which alerts you to to the fact that 374 00:15:26,459 --> 00:15:31,920 this might need hardening is that there 375 00:15:28,139 --> 00:15:37,260 is shared architected state uh that is 376 00:15:31,920 --> 00:15:40,620 the registers of the calling code are 377 00:15:37,260 --> 00:15:43,620 um are in fact retained although cleared 378 00:15:40,620 --> 00:15:45,959 pretty quickly by by kernel code 379 00:15:43,620 --> 00:15:47,519 if um by the way if there's a if there's 380 00:15:45,959 --> 00:15:50,160 an architectural bug we've got bigger 381 00:15:47,519 --> 00:15:52,500 problems so uh we're on the micro- 382 00:15:50,160 --> 00:15:55,440 architectural level here 383 00:15:52,500 --> 00:15:57,060 um okay so a quick aside as to how syscall 384 00:15:55,440 --> 00:15:59,820 wrappers are implemented in the kernel I 385 00:15:57,060 --> 00:16:01,980 promise to not show you too much code 386 00:15:59,820 --> 00:16:07,139 um so there's this SYSCALL_DEFINE4 387 00:16:01,980 --> 00:16:09,899 macro and so this kind of 
creates the uh 388 00:16:07,139 --> 00:16:12,480 signature the the prototype what's it 389 00:16:09,899 --> 00:16:16,620 called in C the the prototype for 390 00:16:12,480 --> 00:16:18,839 um uh for the fallocate fallocate uh 391 00:16:16,620 --> 00:16:20,820 syscall handler 392 00:16:18,839 --> 00:16:22,440 um and it's going to prepend this sys 393 00:16:20,820 --> 00:16:23,760 underscore prefix to the beginning 394 00:16:22,440 --> 00:16:25,800 because no one should be directly 395 00:16:23,760 --> 00:16:27,839 calling it by name 396 00:16:25,800 --> 00:16:30,000 um and so if you're going to create your 397 00:16:27,839 --> 00:16:33,060 own syscall if you're mucking about the 398 00:16:30,000 --> 00:16:35,940 kernel you'll define the syscall like 399 00:16:33,060 --> 00:16:39,060 such and then you'll add a number in a 400 00:16:35,940 --> 00:16:41,820 in a table to say when r0 is this number 401 00:16:39,060 --> 00:16:42,839 in kernel please call this syscall 402 00:16:41,820 --> 00:16:45,540 for me 403 00:16:42,839 --> 00:16:48,060 what we're going to do is uh in order to 404 00:16:45,540 --> 00:16:51,120 get a new calling convention for syscall 405 00:16:48,060 --> 00:16:53,579 handlers uh like any good C programmer 406 00:16:51,120 --> 00:16:55,560 would we're going to write an ugly macro 407 00:16:53,579 --> 00:16:56,940 um and uh that ugly macro better 408 00:16:55,560 --> 00:17:01,440 implement the calling convention 409 00:16:56,940 --> 00:17:03,660 perfectly with no bugs ever for 410 00:17:01,440 --> 00:17:06,900 our syscall handler and so clearly this is 411 00:17:03,660 --> 00:17:09,540 this is where my bug came from 412 00:17:06,900 --> 00:17:12,079 um as a bit of an aside so I work on on 413 00:17:09,540 --> 00:17:15,780 Power Systems Linux on Power 414 00:17:12,079 --> 00:17:18,419 we have our own subsystem arch/powerpc 415 00:17:15,780 --> 00:17:21,000 and uh there are about 12 million lines 416 00:17:18,419 --> 00:17:23,040 of code in the 
kernel uh not including 417 00:17:21,000 --> 00:17:26,459 modules not including user space 418 00:17:23,040 --> 00:17:29,460 libraries tests etc about four million 419 00:17:26,459 --> 00:17:31,679 of those lines is architecture specific 420 00:17:29,460 --> 00:17:35,760 and about half a million of those lines 421 00:17:31,679 --> 00:17:37,200 sit in arch/powerpc so it's a big 422 00:17:35,760 --> 00:17:39,660 subsystem 423 00:17:37,200 --> 00:17:41,840 why is it so big well 424 00:17:39,660 --> 00:17:45,419 you could hint at a few things I suppose 425 00:17:41,840 --> 00:17:48,900 for comparison ARM has spun off its 426 00:17:45,419 --> 00:17:51,419 32-bit and 64-bit subsystems 427 00:17:48,900 --> 00:17:53,760 um and so we're like x86 in that respect 428 00:17:51,419 --> 00:17:57,120 we they're both under a common 429 00:17:53,760 --> 00:17:58,799 subdirectory but unlike x86 we're bi- 430 00:17:57,120 --> 00:18:00,419 endian 431 00:17:58,799 --> 00:18:02,760 so you end up with this kind of horrible 432 00:18:00,419 --> 00:18:04,320 matrix of kind of platforms running on 433 00:18:02,760 --> 00:18:07,140 arch/powerpc 434 00:18:04,320 --> 00:18:09,140 and sometimes the same code services all 435 00:18:07,140 --> 00:18:12,740 four of these platforms 436 00:18:09,140 --> 00:18:15,660 ABIs are fiddly enough and so with four 437 00:18:12,740 --> 00:18:17,220 sub-architectures of sorts uh yeah 438 00:18:15,660 --> 00:18:20,160 necessarily there there are going to be 439 00:18:17,220 --> 00:18:22,080 at least four ABIs I need to implement 440 00:18:20,160 --> 00:18:23,460 and ABIs right are meant to be 441 00:18:22,080 --> 00:18:28,160 implemented by toolchains because 442 00:18:23,460 --> 00:18:28,160 they're fiddly and not by crappy macros 443 00:18:28,200 --> 00:18:33,240 so how did I do in my patch series my 444 00:18:30,900 --> 00:18:34,740 first patch series on the kernel 445 00:18:33,240 --> 00:18:37,320 um well as a little baby kernel 446 00:18:34,740 
--> 00:18:39,419 developer I had to touch a lot of 447 00:18:37,320 --> 00:18:42,360 um untouched legacy code 448 00:18:39,419 --> 00:18:45,059 and uh I started off with six patches my 449 00:18:42,360 --> 00:18:47,340 first revision and uh if you don't know 450 00:18:45,059 --> 00:18:51,059 in the kernel we um 451 00:18:47,340 --> 00:18:53,900 we do our communication over email which 452 00:18:51,059 --> 00:18:57,660 is insane to me 453 00:18:53,900 --> 00:19:00,179 and uh once you send off saying hi I'm a 454 00:18:57,660 --> 00:19:01,860 young and not yet cynical kernel 455 00:19:00,179 --> 00:19:03,720 developer and I would like to modernize 456 00:19:01,860 --> 00:19:06,480 your stuff you get a laundry list right 457 00:19:03,720 --> 00:19:08,760 of things which you should fix and so uh 458 00:19:06,480 --> 00:19:12,419 by the time I I submitted part of my 459 00:19:08,760 --> 00:19:14,640 patches it was revision six and uh I had 460 00:19:12,419 --> 00:19:17,640 25 patches on the list which was 461 00:19:14,640 --> 00:19:20,039 ridiculous most of them got into 462 00:19:17,640 --> 00:19:21,960 kernel 6.1 but the actual hardening 463 00:19:20,039 --> 00:19:23,520 component got into 6.2 because again 464 00:19:21,960 --> 00:19:25,860 if you went to Russell's talk everyone 465 00:19:23,520 --> 00:19:29,460 will complain if you make your CPUs like 466 00:19:25,860 --> 00:19:30,059 two percent slower so 467 00:19:29,460 --> 00:19:32,760 um 468 00:19:30,059 --> 00:19:34,980 yeah like where I really 469 00:19:32,760 --> 00:19:37,860 um hit a wall is with these 32-bit 470 00:19:34,980 --> 00:19:39,900 calling conventions I I really 471 00:19:37,860 --> 00:19:42,600 um did work hard to to make these work 472 00:19:39,900 --> 00:19:44,220 did I miss the slide no all right I 473 00:19:42,600 --> 00:19:47,160 really did work hard to make these work 474 00:19:44,220 --> 00:19:50,580 but there are just some cursed 475 00:19:47,160 --> 00:19:53,820 um uh special 
cases so here's a here's a 476 00:19:50,580 --> 00:19:57,120 relatively normal syscall sys_fadvise64 477 00:19:53,820 --> 00:20:00,240 where does the 64 come from well 478 00:19:57,120 --> 00:20:02,700 this loff_t you kind of you know if you 479 00:20:00,240 --> 00:20:05,760 know that's always 64-bit irrespective 480 00:20:02,700 --> 00:20:08,160 of of what toolchain you're using 481 00:20:05,760 --> 00:20:09,480 um I think that's a POSIX thing 482 00:20:08,160 --> 00:20:11,940 um and uh 483 00:20:09,480 --> 00:20:15,620 yeah well some architectures actually 484 00:20:11,940 --> 00:20:15,620 want you to implement this one too 485 00:20:16,260 --> 00:20:19,520 um this is this is where it gets a bit 486 00:20:17,700 --> 00:20:21,960 cursed you notice that the advice 487 00:20:19,520 --> 00:20:24,480 parameter actually changes its position 488 00:20:21,960 --> 00:20:27,419 this is probably the most attention 489 00:20:24,480 --> 00:20:28,440 you've ever paid to this fact but um the 490 00:20:27,419 --> 00:20:30,960 reason why it does that is because 491 00:20:28,440 --> 00:20:33,419 there's a sort of soft rule of no more 492 00:20:30,960 --> 00:20:35,460 than six parameters to syscalls because 493 00:20:33,419 --> 00:20:38,039 calling conventions on all these 494 00:20:35,460 --> 00:20:39,600 architectures do register-bound syscalls 495 00:20:38,039 --> 00:20:41,100 most of the architectures do 496 00:20:39,600 --> 00:20:42,720 register-bound syscalls for the first 497 00:20:41,100 --> 00:20:45,480 six at least 498 00:20:42,720 --> 00:20:46,919 um and so that's it's kind of common 499 00:20:45,480 --> 00:20:50,160 um and uh 500 00:20:46,919 --> 00:20:53,580 see 32-bit machines still need to be 501 00:20:50,160 --> 00:20:55,679 able to send these uh these two 64-bit 502 00:20:53,580 --> 00:20:57,299 params that's the the loff_ts I'm 503 00:20:55,679 --> 00:20:59,820 pointing to 504 00:20:57,299 --> 00:21:02,760 how do you send a 64-bit parameter with 505 
00:20:59,820 --> 00:21:05,280 32-bit argument uh registers well you 506 00:21:02,760 --> 00:21:07,620 obviously split it in two and so that 507 00:21:05,280 --> 00:21:09,480 comes to six and why did we have to move 508 00:21:07,620 --> 00:21:12,360 advice to the back 509 00:21:09,480 --> 00:21:14,179 well it's because these architectures in 510 00:21:12,360 --> 00:21:16,679 their function calling conventions 511 00:21:14,179 --> 00:21:19,679 specify that these register pairs 512 00:21:16,679 --> 00:21:21,900 which unite to become a single 64-bit 513 00:21:19,679 --> 00:21:24,299 value these register pairs better be odd 514 00:21:21,900 --> 00:21:26,600 even in their in their indexing all 515 00:21:24,299 --> 00:21:26,600 right so 516 00:21:27,360 --> 00:21:32,940 um yeah so some 32-bit archs kind of 517 00:21:30,000 --> 00:21:34,620 expect the the syscall handlers to look a 518 00:21:32,940 --> 00:21:38,039 bit like that 519 00:21:34,620 --> 00:21:39,480 and then the order of uh lo and hi in 520 00:21:38,039 --> 00:21:41,039 each case is going to be dependent on 521 00:21:39,480 --> 00:21:41,760 endianness 522 00:21:41,039 --> 00:21:44,340 um 523 00:21:41,760 --> 00:21:46,440 yeah well all of these are implemented 524 00:21:44,340 --> 00:21:48,539 not in generic code they're implemented 525 00:21:46,440 --> 00:21:51,059 in PowerPC and they were last edited in 526 00:21:48,539 --> 00:21:52,799 2005.
so so I had to clean up a whole 527 00:21:51,059 --> 00:21:55,620 bunch of them they weren't even using 528 00:21:52,799 --> 00:21:57,360 the macro I showed earlier so yeah a lot 529 00:21:55,620 --> 00:21:59,280 of fix up I won't bore you with the 530 00:21:57,360 --> 00:22:01,980 details and then there was this little 531 00:21:59,280 --> 00:22:04,440 gem all right well so it turns out 532 00:22:01,980 --> 00:22:07,020 because this is a syscall right and we 533 00:22:04,440 --> 00:22:08,580 don't break user space this was ABI for 534 00:22:07,020 --> 00:22:10,679 a while this was this was 535 00:22:08,580 --> 00:22:14,220 I couldn't change this without much 536 00:22:10,679 --> 00:22:17,400 arguing so in short there was this crazy 537 00:22:14,220 --> 00:22:19,919 um like PowerPC 32-bit only uh syscall 538 00:22:17,400 --> 00:22:21,720 where um it would check whether the 539 00:22:19,919 --> 00:22:23,640 first parameter is a pointer or not by 540 00:22:21,720 --> 00:22:27,059 saying is it bigger than a prohibitively 541 00:22:23,640 --> 00:22:29,220 large number of file descriptors and um 542 00:22:27,059 --> 00:22:32,220 if so it would arbitrarily decide use 543 00:22:29,220 --> 00:22:36,179 these old semantics if not use these new 544 00:22:32,220 --> 00:22:37,799 ones and that's nuts so I after quite a 545 00:22:36,179 --> 00:22:39,120 bit of arguing and then resuscitating 546 00:22:37,799 --> 00:22:40,980 all the conversations on the mailing 547 00:22:39,120 --> 00:22:42,480 list and saying no look we really were 548 00:22:40,980 --> 00:22:44,159 going in this direction just people 549 00:22:42,480 --> 00:22:45,840 didn't have time 550 00:22:44,159 --> 00:22:48,659 um we should just replace this with 551 00:22:45,840 --> 00:22:51,000 something sensible and so I 552 00:22:48,659 --> 00:22:52,880 finally cut out this thing and replaced 553 00:22:51,000 --> 00:22:55,980 it with sys_select 554 00:22:52,880 --> 00:22:58,260 besides the way I'd set things up you 555
00:22:55,980 --> 00:23:00,000 couldn't reliably recurse syscall 556 00:22:58,260 --> 00:23:03,360 handlers so this kind of had to go 557 00:23:00,000 --> 00:23:04,919 anyway details 558 00:23:03,360 --> 00:23:07,980 all right so how did we do with 559 00:23:04,919 --> 00:23:10,679 performance right so we're having to 560 00:23:07,980 --> 00:23:13,260 save more registers in the way that 561 00:23:10,679 --> 00:23:14,539 we hope to sanitize all of the user's 562 00:23:13,260 --> 00:23:17,820 registers 563 00:23:14,539 --> 00:23:19,919 and then eventually restore them and so 564 00:23:17,820 --> 00:23:23,220 saving to the stack restoring from the 565 00:23:19,919 --> 00:23:26,400 stack it's going to be hella slow right 566 00:23:23,220 --> 00:23:28,620 oh sorry sanitizing more registers 567 00:23:26,400 --> 00:23:30,080 clearly that's more work on every 568 00:23:28,620 --> 00:23:34,260 context switch 569 00:23:30,080 --> 00:23:35,640 and it turns out it made things faster 570 00:23:34,260 --> 00:23:36,900 um I made context switches faster 571 00:23:35,640 --> 00:23:40,440 accidentally 572 00:23:36,900 --> 00:23:42,120 um and so I've hence been summarily um 573 00:23:40,440 --> 00:23:44,940 removed from the hardening team because 574 00:23:42,120 --> 00:23:46,860 that goes against our our secondary goal 575 00:23:44,940 --> 00:23:50,039 primary goal 576 00:23:46,860 --> 00:23:52,500 just making context switches slower 577 00:23:50,039 --> 00:23:54,780 um it turns out that I made them 14% 578 00:23:52,500 --> 00:23:56,460 faster you can ask me in the questions 579 00:23:54,780 --> 00:23:59,100 how 580 00:23:56,460 --> 00:24:01,559 uh and uh even with the mitigation 581 00:23:59,100 --> 00:24:04,799 kicked in context switches are still 5.6% 582 00:24:01,559 --> 00:24:07,380 faster so that's nice 583 00:24:04,799 --> 00:24:09,840 all right so you didn't come here to 584 00:24:07,380 --> 00:24:12,179 hear me waffle about syscalls you came to 585 00:24:09,840 -->
00:24:13,200 see me break things so 586 00:24:12,179 --> 00:24:16,140 um 587 00:24:13,200 --> 00:24:19,400 right well this is the feedback I got 588 00:24:16,140 --> 00:24:21,360 and um after a little bit of a frantic 589 00:24:19,400 --> 00:24:23,340 flurry of trying to figure out what was 590 00:24:21,360 --> 00:24:25,620 going wrong this is what I figured out 591 00:24:23,340 --> 00:24:27,720 what my macro was doing if you 592 00:24:25,620 --> 00:24:29,159 um don't like debugging macros then uh 593 00:24:27,720 --> 00:24:32,460 good 594 00:24:29,159 --> 00:24:34,200 um so so I my macro spits out these 595 00:24:32,460 --> 00:24:35,940 three symbols 596 00:24:34,200 --> 00:24:38,760 um just in various stages of trying to 597 00:24:35,940 --> 00:24:40,620 unpack this args parameter 598 00:24:38,760 --> 00:24:42,000 and uh 599 00:24:40,620 --> 00:24:44,039 you don't need to see all of them 600 00:24:42,000 --> 00:24:45,120 because uh right here you can probably 601 00:24:44,039 --> 00:24:47,700 see the problem 602 00:24:45,120 --> 00:24:49,559 if you're facilitating a 32-bit calling 603 00:24:47,700 --> 00:24:52,080 convention by the way you should be able 604 00:24:49,559 --> 00:24:54,600 to run 32-bit programs on a 64-bit 605 00:24:52,080 --> 00:24:55,440 machine there needs to be compatibility 606 00:24:54,600 --> 00:24:58,980 um 607 00:24:55,440 --> 00:25:00,480 if you're running 32-bit my macro 608 00:24:58,980 --> 00:25:02,900 basically 609 00:25:00,480 --> 00:25:07,740 it cuts out here it sees that this is a 610 00:25:02,900 --> 00:25:09,659 uh four-arity syscall and so it only emits 611 00:25:07,740 --> 00:25:11,580 the first four registers so it just it 612 00:25:09,659 --> 00:25:13,740 breaks calling convention and I knew 613 00:25:11,580 --> 00:25:17,640 about this right I just I just didn't 614 00:25:13,740 --> 00:25:20,240 know that there were any syscalls left in 615 00:25:17,640 --> 00:25:23,039 the kernel after I had done all my work 616 00:25:20,240
--> 00:25:24,840 that were expected to run on both 32 and 617 00:25:23,039 --> 00:25:27,720 64. 618 00:25:24,840 --> 00:25:31,080 and so the rest of this is all a wash 619 00:25:27,720 --> 00:25:33,480 um but yeah that's some cursed macro 620 00:25:31,080 --> 00:25:34,860 and uh yeah just to kind of spell it out 621 00:25:33,480 --> 00:25:39,240 a little bit further 622 00:25:34,860 --> 00:25:41,460 uh we only pass in four registers uh the 623 00:25:39,240 --> 00:25:43,140 the last two r5 and r6 should be 624 00:25:41,460 --> 00:25:44,820 parts of offset 625 00:25:43,140 --> 00:25:48,059 of the offset parameter but instead 626 00:25:44,820 --> 00:25:51,120 they're interpreted as a random bit of 627 00:25:48,059 --> 00:25:53,880 offset and a random bit of len and so 628 00:25:51,120 --> 00:25:55,919 this this is what the macro should spit 629 00:25:53,880 --> 00:25:59,760 out but it doesn't 630 00:25:55,919 --> 00:26:02,340 which is not good and uh yeah so the the 631 00:25:59,760 --> 00:26:07,020 exact part of offset and len which are 632 00:26:02,340 --> 00:26:10,799 represented by each each of r5 and r6 uh 633 00:26:07,020 --> 00:26:12,840 basically that's sort of random but um 634 00:26:10,799 --> 00:26:15,960 but not quite random that's determined 635 00:26:12,840 --> 00:26:19,080 by our endianness which brings us to 636 00:26:15,960 --> 00:26:20,940 the next bug I wrote 637 00:26:19,080 --> 00:26:22,679 um and uh there's nothing really deep 638 00:26:20,940 --> 00:26:25,260 here but I thought I would give back to 639 00:26:22,679 --> 00:26:30,480 the community and um I would define this 640 00:26:25,260 --> 00:26:32,100 compat_arg_u64_dual macro there's a 641 00:26:30,480 --> 00:26:34,320 subsystem in the kernel you may be 642 00:26:32,100 --> 00:26:38,760 interested to know called asm-generic 643 00:26:34,320 --> 00:26:41,820 and uh asm stands for assembly and 644 00:26:38,760 --> 00:26:45,539 it's you know it's like generic assembly 645
00:26:41,820 --> 00:26:47,700 um essentially it's bits of macros 646 00:26:45,539 --> 00:26:49,919 um which uh all architectures could 647 00:26:47,700 --> 00:26:52,320 benefit from there being common code and 648 00:26:49,919 --> 00:26:54,240 people override their own macros and so 649 00:26:52,320 --> 00:26:55,679 I overload these macros 650 00:26:54,240 --> 00:26:58,260 so that everyone can have a 651 00:26:55,679 --> 00:27:00,600 compatibility version of fallocate 652 00:26:58,260 --> 00:27:04,860 everyone who chooses to 653 00:27:00,600 --> 00:27:07,559 uh and that compat_arg_u64_dual that's 654 00:27:04,860 --> 00:27:09,960 going to split offset into a low portion 655 00:27:07,559 --> 00:27:12,000 and a high portion and then the order of 656 00:27:09,960 --> 00:27:14,580 those low portion and high portion is 657 00:27:12,000 --> 00:27:16,380 going to be determined by endianness well 658 00:27:14,580 --> 00:27:17,880 I got that wrong I got that flipped 659 00:27:16,380 --> 00:27:19,440 around the other side and my patch 660 00:27:17,880 --> 00:27:21,299 series went through six revisions and 661 00:27:19,440 --> 00:27:22,020 somehow no one caught that 662 00:27:21,299 --> 00:27:23,820 um 663 00:27:22,020 --> 00:27:26,159 so so I really was just hitting high 664 00:27:23,820 --> 00:27:30,360 scores here and I had in fact bricked 665 00:27:26,159 --> 00:27:31,740 RISC-V 32-bit and PowerPC 32-bit at 666 00:27:30,360 --> 00:27:34,200 this stage 667 00:27:31,740 --> 00:27:35,159 um a prominent member of the kernel 668 00:27:34,200 --> 00:27:37,020 community 669 00:27:35,159 --> 00:27:39,240 made a comment about this see the 670 00:27:37,020 --> 00:27:41,220 blunders which I'd made right they were 671 00:27:39,240 --> 00:27:43,919 they were great 672 00:27:41,220 --> 00:27:46,559 um in nature but not great enough to 673 00:27:43,919 --> 00:27:48,120 topple two whole architectures because um 674 00:27:46,559 --> 00:27:50,580 no one's really running these
upstream 675 00:27:48,120 --> 00:27:53,580 kernels but a certain prominent member 676 00:27:50,580 --> 00:27:56,159 of the community decided to chime in 677 00:27:53,580 --> 00:27:58,080 saying you know is it just me or did 678 00:27:56,159 --> 00:28:00,179 this break every architecture did did 679 00:27:58,080 --> 00:28:01,440 this grad just manage to destroy 680 00:28:00,179 --> 00:28:03,240 everything 681 00:28:01,440 --> 00:28:06,059 um and people just didn't scream because 682 00:28:03,240 --> 00:28:08,159 32-bit code has become so rare 683 00:28:06,059 --> 00:28:11,220 um thankfully it was it was just RISC-V 684 00:28:08,159 --> 00:28:14,340 and PowerPC so no one noticed but 685 00:28:11,220 --> 00:28:14,340 um but 686 00:28:15,000 --> 00:28:17,120 um 687 00:28:17,159 --> 00:28:20,880 but you know every cloud has a silver 688 00:28:19,380 --> 00:28:23,539 lining and I got my first shout 689 00:28:20,880 --> 00:28:23,539 out from Linus 690 00:28:25,679 --> 00:28:30,900 so uh by means of conclusion essentially 691 00:28:28,440 --> 00:28:34,440 yeah I'm a first year grad 692 00:28:30,900 --> 00:28:36,840 um I get to work on the Linux kernel 693 00:28:34,440 --> 00:28:38,940 which supports the vast majority of of 694 00:28:36,840 --> 00:28:40,679 web servers which is kind of amazing and 695 00:28:38,940 --> 00:28:42,900 it's just kind of amazing how how easily 696 00:28:40,679 --> 00:28:45,360 someone can throw their hat into the 697 00:28:42,900 --> 00:28:48,600 ring and work on 698 00:28:45,360 --> 00:28:51,000 um essentially just infrastructure which 699 00:28:48,600 --> 00:28:53,580 which everyone depends upon 700 00:28:51,000 --> 00:28:56,100 and um essentially my ability to 701 00:28:53,580 --> 00:28:59,940 contribute as much as I have is really 702 00:28:56,100 --> 00:29:01,860 owed greatly to the rigorous uh 703 00:28:59,940 --> 00:29:04,980 reviewing this is not a joke by the way 704 00:29:01,860 --> 00:29:08,580 these these these tests uh the
people 705 00:29:04,980 --> 00:29:09,779 really did boot kernels on 32-bit and uh 706 00:29:08,580 --> 00:29:12,480 it turns out they they didn't actually 707 00:29:09,779 --> 00:29:14,700 corrupt enough to not boot 708 00:29:12,480 --> 00:29:16,500 um which is delightful so it was one of 709 00:29:14,700 --> 00:29:18,419 those kind of wicked problems 710 00:29:16,500 --> 00:29:21,059 um but they certainly were corrupting 711 00:29:18,419 --> 00:29:24,120 um and really anyway my ability to get 712 00:29:21,059 --> 00:29:26,159 this far is very much I'm thankful for 713 00:29:24,120 --> 00:29:28,080 for the testers the reviewers and the 714 00:29:26,159 --> 00:29:30,840 maintainers who make this possible 715 00:29:28,080 --> 00:29:32,580 essentially what I want to say is that 716 00:29:30,840 --> 00:29:34,980 persistence seems to be a sort of key 717 00:29:32,580 --> 00:29:36,539 skill for kernel developers 718 00:29:34,980 --> 00:29:39,000 um and that if you if you're someone who 719 00:29:36,539 --> 00:29:41,820 enjoys low-level programming and you're 720 00:29:39,000 --> 00:29:44,039 willing to go through a mailing list to 721 00:29:41,820 --> 00:29:45,899 go through multiple rounds of review you 722 00:29:44,039 --> 00:29:48,360 you really can actually contribute to 723 00:29:45,899 --> 00:29:50,340 some pretty significant software 724 00:29:48,360 --> 00:29:52,740 projects 725 00:29:50,340 --> 00:29:55,399 and uh there's always work it seems 726 00:29:52,740 --> 00:29:57,480 manually manipulating registers 727 00:29:55,399 --> 00:30:00,360 and but if you have this sort of 728 00:29:57,480 --> 00:30:02,340 meticulous attention to detail yeah you 729 00:30:00,360 --> 00:30:03,179 can you can contribute on most things so 730 00:30:02,340 --> 00:30:05,700 I hope 731 00:30:03,179 --> 00:30:07,799 in hearing this talk uh this encourages 732 00:30:05,700 --> 00:30:09,539 you to contribute to the projects which 733 00:30:07,799 --> 00:30:10,440 interest you 734
00:30:09,539 --> 00:30:12,000 um 735 00:30:10,440 --> 00:30:15,260 and I hope that your first patch series 736 00:30:12,000 --> 00:30:15,260 is not quite this cursed 737 00:30:15,419 --> 00:30:18,790 um 738 00:30:16,200 --> 00:30:26,820 any questions yep 739 00:30:18,790 --> 00:30:28,799 [Applause] 740 00:30:26,820 --> 00:30:33,299 so how did you speed up context 741 00:30:28,799 --> 00:30:34,919 yeah yeah good one so uh should I show 742 00:30:33,299 --> 00:30:37,559 some kernel code for this one that might 743 00:30:34,919 --> 00:30:42,419 take a little a little while go ahead 744 00:30:37,559 --> 00:30:46,140 okay uh all right I closed my editor 745 00:30:42,419 --> 00:30:47,460 um okay well in in short 746 00:30:46,140 --> 00:30:49,860 uh 747 00:30:47,460 --> 00:30:51,840 there's been a number of reforms to 748 00:30:49,860 --> 00:30:55,380 replace what is a whole bunch of 749 00:30:51,840 --> 00:30:57,320 assembly so we we we use uh GNU 750 00:30:55,380 --> 00:30:59,820 assembler to 751 00:30:57,320 --> 00:31:01,860 write our interrupt handlers at the 752 00:30:59,820 --> 00:31:04,980 exact region in virtual memory where 753 00:31:01,860 --> 00:31:06,600 interrupt handlers are expected and then we 754 00:31:04,980 --> 00:31:08,460 just have this we have a number of 755 00:31:06,600 --> 00:31:10,260 trampolines which eventually call into C 756 00:31:08,460 --> 00:31:13,620 code and then the rest of everything is 757 00:31:10,260 --> 00:31:15,659 performed in in C code in short when I 758 00:31:13,620 --> 00:31:16,799 changed calling conventions I know what 759 00:31:15,659 --> 00:31:20,720 I'll do 760 00:31:16,799 --> 00:31:20,720 I'll just show this slide 761 00:31:21,419 --> 00:31:25,260 yeah so when I changed calling 762 00:31:23,220 --> 00:31:27,779 conventions 763 00:31:25,260 --> 00:31:29,580 for the syscall handlers I was also 764 00:31:27,779 --> 00:31:32,760 able to change the calling 765 00:31:29,580 --> 00:31:34,500 convention of some of these
uh C 766 00:31:32,760 --> 00:31:35,760 functions which facilitate these 767 00:31:34,500 --> 00:31:38,520 interrupts 768 00:31:35,760 --> 00:31:42,600 and that might not sound like a lot but 769 00:31:38,520 --> 00:31:45,360 um in principle these syscall handlers can 770 00:31:42,600 --> 00:31:48,000 take up to six arguments so that's six 771 00:31:45,360 --> 00:31:49,500 of the scratch registers uh consumed and 772 00:31:48,000 --> 00:31:53,279 so what I did is I just freed up 773 00:31:49,500 --> 00:31:56,820 register allocation in short so uh it it 774 00:31:53,279 --> 00:31:59,039 uh decreased the the total stack 775 00:31:56,820 --> 00:32:02,700 utilization and it turns out that yeah I 776 00:31:59,039 --> 00:32:04,860 got more than 10% speed-up which is kind 777 00:32:02,700 --> 00:32:06,840 of phenomenal that's actually not at all 778 00:32:04,860 --> 00:32:08,880 what I expected 779 00:32:06,840 --> 00:32:12,740 um yeah there's another part to that but 780 00:32:08,880 --> 00:32:12,740 I've gone any other questions 781 00:32:15,899 --> 00:32:18,440 hey 782 00:32:25,140 --> 00:32:29,880 um sort of a two-parter like uh how much 783 00:32:27,480 --> 00:32:32,039 background did you have like in kernel 784 00:32:29,880 --> 00:32:34,080 development or like 785 00:32:32,039 --> 00:32:36,899 um looking at this kind of low-level 786 00:32:34,080 --> 00:32:40,820 development before you got stuck into 787 00:32:36,899 --> 00:32:43,500 this work and secondly why was RISC-V 788 00:32:40,820 --> 00:32:45,480 impacted along with PowerPC good one 789 00:32:43,500 --> 00:32:48,539 okay uh 790 00:32:45,480 --> 00:32:50,820 uh yes so what background did I have I 791 00:32:48,539 --> 00:32:54,000 primarily got into low-level programming 792 00:32:50,820 --> 00:32:55,740 actually by just writing simulations 793 00:32:54,000 --> 00:32:57,840 um I've touched a bit of Fortran but 794 00:32:55,740 --> 00:33:00,419 thankfully mostly in C 795 00:32:57,840 --> 00:33:02,100 um and uh so
yeah just working with low 796 00:33:00,419 --> 00:33:06,240 level libraries like 797 00:33:02,100 --> 00:33:08,659 um MPI uh and OpenMP and whatnot yeah 798 00:33:06,240 --> 00:33:11,460 and I just discovered that I I just like 799 00:33:08,659 --> 00:33:14,159 tuning optimizing things 800 00:33:11,460 --> 00:33:15,720 um which is cool uh 801 00:33:14,159 --> 00:33:17,460 yeah 802 00:33:15,720 --> 00:33:20,460 um and then later on I got into kernel 803 00:33:17,460 --> 00:33:22,679 dev I took a class at uni 804 00:33:20,460 --> 00:33:24,899 um that kind of interested me to read a 805 00:33:22,679 --> 00:33:26,519 whole bunch of kernel code but didn't 806 00:33:24,899 --> 00:33:28,320 have any projects immediately which I 807 00:33:26,519 --> 00:33:29,159 necessarily wanted to write in the 808 00:33:28,320 --> 00:33:30,720 kernel 809 00:33:29,159 --> 00:33:33,179 um and so yeah I really did come in 810 00:33:30,720 --> 00:33:35,700 blind at some rate in terms of the the 811 00:33:33,179 --> 00:33:37,559 context of what I was looking at and um 812 00:33:35,700 --> 00:33:41,519 actually this was a this is a really 813 00:33:37,559 --> 00:33:42,799 good first task working on uh syscall 814 00:33:41,519 --> 00:33:46,860 handlers 815 00:33:42,799 --> 00:33:49,019 in that uh essentially so the the syscall 816 00:33:46,860 --> 00:33:52,880 interface is I imagine probably the 817 00:33:49,019 --> 00:33:57,000 largest user to to core kernel 818 00:33:52,880 --> 00:33:59,940 interface and so uh the user user code 819 00:33:57,000 --> 00:34:01,620 needs to request resources of of all 820 00:33:59,940 --> 00:34:04,080 sorts and so it should eventually touch 821 00:34:01,620 --> 00:34:05,580 all subsystems in the core kernel and so 822 00:34:04,080 --> 00:34:07,980 yeah I got to read quite a bit of code 823 00:34:05,580 --> 00:34:09,300 that way so it's a good induction 824 00:34:07,980 --> 00:34:12,480 um and then your second question how did 825 00:34:09,300 --> 00:34:14,159 I
break RISC-V as well 826 00:34:12,480 --> 00:34:16,440 um let's go back to that slide because I 827 00:34:14,159 --> 00:34:18,500 don't think I explained it super well 828 00:34:16,440 --> 00:34:18,500 um 829 00:34:19,980 --> 00:34:27,060 in short I um I influenced this 830 00:34:23,099 --> 00:34:29,580 compat_arg_u64_dual macro that was already 831 00:34:27,060 --> 00:34:32,760 implemented and being consumed by 832 00:34:29,580 --> 00:34:35,700 RISC-V and um uh 833 00:34:32,760 --> 00:34:37,460 I opted in PowerPC to using it so we 834 00:34:35,700 --> 00:34:40,379 could remove some of our own custom 835 00:34:37,460 --> 00:34:41,220 syscall handlers which were basically the 836 00:34:40,379 --> 00:34:46,139 same 837 00:34:41,220 --> 00:34:47,820 uh but critically I I exchanged their um 838 00:34:46,139 --> 00:34:50,240 working version of that macro for one 839 00:34:47,820 --> 00:34:50,240 which broke 840 00:34:51,980 --> 00:35:01,619 uh yeah I um I misplaced a um 841 00:34:58,200 --> 00:35:04,980 yeah #ifdef big endian 842 00:35:01,619 --> 00:35:07,440 um turns out not all architectures have 843 00:35:04,980 --> 00:35:09,240 a #ifdef little endian which is which is 844 00:35:07,440 --> 00:35:13,020 weird and more architectures are likely 845 00:35:09,240 --> 00:35:14,880 to have a a macro called defined config 846 00:35:13,020 --> 00:35:16,020 big endian than config little endian 847 00:35:14,880 --> 00:35:18,000 which is weird because more 848 00:35:16,020 --> 00:35:19,980 architectures I imagine are little endian 849 00:35:18,000 --> 00:35:21,839 than and whatever 850 00:35:19,980 --> 00:35:24,359 fun 851 00:35:21,839 --> 00:35:26,640 yes indeed well I suppose if little 852 00:35:24,359 --> 00:35:27,599 endian's the default then 853 00:35:26,640 --> 00:35:31,680 anyway 854 00:35:27,599 --> 00:35:34,680 whatever but um in short I uh 855 00:35:31,680 --> 00:35:36,780 I I realize this that oh if I have if 856 00:35:34,680 --> 00:35:38,460 config little endian
I'm gonna break a 857 00:35:36,780 --> 00:35:40,859 whole bunch of architectures and so I 858 00:35:38,460 --> 00:35:41,700 decided let's not use that so I decided 859 00:35:40,859 --> 00:35:44,400 to use 860 00:35:41,700 --> 00:35:46,260 #ifndef big endian except I missed 861 00:35:44,400 --> 00:35:48,180 the N is down the end and so did my 862 00:35:46,260 --> 00:35:50,700 reviewers apparently so 863 00:35:48,180 --> 00:35:52,280 kind of been that important yeah 864 00:35:50,700 --> 00:35:55,579 um 865 00:35:52,280 --> 00:35:55,579 any more questions 866 00:35:57,960 --> 00:36:01,980 so I have a fully non non-technical 867 00:35:59,940 --> 00:36:04,200 question as a newcomer to the Linux 868 00:36:01,980 --> 00:36:05,820 community what if anything do you think 869 00:36:04,200 --> 00:36:08,700 should be different about the Linux 870 00:36:05,820 --> 00:36:10,320 development culture to get more people 871 00:36:08,700 --> 00:36:13,380 like yourselves who are newcomers to 872 00:36:10,320 --> 00:36:14,820 involved and working upstream yeah I 873 00:36:13,380 --> 00:36:17,640 would say that I mean the grads are 874 00:36:14,820 --> 00:36:19,440 today are natives right in in at very 875 00:36:17,640 --> 00:36:21,780 least GitHub
887 00:36:46,619 --> 00:36:52,619 um and uh the way CI works kernel CI there's 888 00:36:50,640 --> 00:36:53,579 a talk um happening in this conference 889 00:36:52,619 --> 00:36:57,480 about it 890 00:36:53,579 --> 00:36:59,460 um kernel CI uh has quite a quite a 891 00:36:57,480 --> 00:37:02,940 long history there are some really good 892 00:36:59,460 --> 00:37:06,240 examples of it with um uh so for example 893 00:37:02,940 --> 00:37:08,579 the 0-day CI which Intel does is 894 00:37:06,240 --> 00:37:10,380 um has been very helpful 895 00:37:08,579 --> 00:37:11,700 um but ultimately you know we built up 896 00:37:10,380 --> 00:37:14,160 this matrix right of the sub 897 00:37:11,700 --> 00:37:15,839 architectures present present on on 898 00:37:14,160 --> 00:37:18,420 PowerPC that's really the tip of the 899 00:37:15,839 --> 00:37:20,760 iceberg because there are any number of 900 00:37:18,420 --> 00:37:23,579 modules and configs of any sort so 901 00:37:20,760 --> 00:37:25,380 essentially the the matrix of things 902 00:37:23,579 --> 00:37:26,820 which you might want to test becomes 903 00:37:25,380 --> 00:37:27,660 enormous 904 00:37:26,820 --> 00:37:30,240 um 905 00:37:27,660 --> 00:37:32,460 yeah yeah there are there are some very 906 00:37:30,240 --> 00:37:33,780 good talks from previous LCAs on on the 907 00:37:32,460 --> 00:37:36,420 state of testing but yeah it's 908 00:37:33,780 --> 00:37:38,400 fundamentally it's it's testing and how 909 00:37:36,420 --> 00:37:40,859 do you contribute 910 00:37:38,400 --> 00:37:42,660 um setting up your your mail client to 911 00:37:40,859 --> 00:37:44,400 be able to contribute is it's kind of 912 00:37:42,660 --> 00:37:46,079 yeah 913 00:37:44,400 --> 00:37:48,839 um non-trivial 914 00:37:46,079 --> 00:37:51,240 um and so in just decreasing that that 915 00:37:48,839 --> 00:37:55,560 barrier to entry 916 00:37:51,240 --> 00:37:57,820 um yeah I'd say barrier to entry yep 917 00:37:55,560 --> 00:38:01,030 just thank you 918
00:37:57,820 --> 00:38:01,030 [Applause]