2 00:00:00,500 --> 00:00:08,469 [Music] 3 00:00:11,300 --> 00:00:15,660 Hi everyone, welcome back from morning 4 00:00:13,380 --> 00:00:19,560 tea. Hope you had a good one. 5 00:00:15,660 --> 00:00:23,279 We have an interesting talk up 6 00:00:19,560 --> 00:00:24,779 this morning: "What Has RCU Done 7 00:00:23,279 --> 00:00:27,660 Lately?" 8 00:00:24,779 --> 00:00:30,180 And this will be from Paul McKenney. 9 00:00:27,660 --> 00:00:32,340 He is a kernel hacker working on various 10 00:00:30,180 --> 00:00:34,739 parts of the read-copy-update 11 00:00:32,340 --> 00:00:37,620 implementation in the Linux kernel, along 12 00:00:34,739 --> 00:00:39,300 with his team. Today's talk will be 13 00:00:37,620 --> 00:00:42,420 covering all of the recent work in this 14 00:00:39,300 --> 00:00:44,820 implementation, as well as RCU work going 15 00:00:42,420 --> 00:00:47,140 on outside the kernel. So please give 16 00:00:44,820 --> 00:00:51,000 Paul a warm welcome. 17 00:00:47,140 --> 00:00:52,680 [Applause] 18 00:00:51,000 --> 00:00:56,340 Thank you, Sam. 19 00:00:52,680 --> 00:00:57,600 So this is the team. Boqun was in a 20 00:00:56,340 --> 00:01:00,480 reflective mood when he took the picture, 21 00:00:57,600 --> 00:01:02,940 but there we are. 22 00:01:00,480 --> 00:01:04,979 One thing that I sometimes get 23 00:01:02,940 --> 00:01:07,260 questions about is whether RCU really is 24 00:01:04,979 --> 00:01:09,840 still changing. I remember a poor guy was 25 00:01:07,260 --> 00:01:11,159 very surprised by this. I didn't 26 00:01:09,840 --> 00:01:13,439 have the heart to tell him that not only 27 00:01:11,159 --> 00:01:15,240 is RCU under active development, but 28 00:01:13,439 --> 00:01:17,040 spinlocks and atomic operations are also 29 00:01:15,240 --> 00:01:19,799 under active development. 30 00:01:17,040 --> 00:01:22,740 So first we're going to do an RCU review, 31 00:01:19,799 --> 00:01:24,600 and this is going to 
be quick. There are 32 00:01:22,740 --> 00:01:26,159 some URLs at the bottom of the slide 33 00:01:24,600 --> 00:01:28,380 here, and those will be available to 34 00:01:26,159 --> 00:01:30,659 people. They give you a much longer and 35 00:01:28,380 --> 00:01:32,820 gentler introduction to this, but 36 00:01:30,659 --> 00:01:35,460 we're just going through it 37 00:01:32,820 --> 00:01:36,720 as a reminder. So the purpose of RCU, in 38 00:01:35,460 --> 00:01:37,560 short, is to work around the laws of 39 00:01:36,720 --> 00:01:39,900 physics. 40 00:01:37,560 --> 00:01:41,579 Now, when I was a kid, I was well aware 41 00:01:39,900 --> 00:01:43,200 that the finite speed of light was a 42 00:01:41,579 --> 00:01:44,040 problem for things like, say, interstellar 43 00:01:43,200 --> 00:01:46,680 travel, 44 00:01:44,040 --> 00:01:49,560 but I would have been quite surprised to 45 00:01:46,680 --> 00:01:52,560 find that the finite speed of light and 46 00:01:49,560 --> 00:01:55,439 the non-zero size of atoms were going to be 47 00:01:52,560 --> 00:01:57,740 a problem for computation. But here we 48 00:01:55,439 --> 00:01:57,740 are today. 49 00:01:57,899 --> 00:02:02,460 This kind of gives a quick overview of 50 00:02:00,360 --> 00:02:05,159 what RCU does at a high level. 51 00:02:02,460 --> 00:02:06,659 What we have is time going from top to 52 00:02:05,159 --> 00:02:08,640 bottom (it's really ordering, but 53 00:02:06,659 --> 00:02:10,920 let's call it time anyway) in each of 54 00:02:08,640 --> 00:02:13,260 four scenarios. In the scenario in the upper 55 00:02:10,920 --> 00:02:15,720 left we have a reader. Readers start with rcu_read_lock(), 56 00:02:13,260 --> 00:02:17,280 they end with rcu_read_unlock(), and it 57 00:02:15,720 --> 00:02:19,739 started before 58 00:02:17,280 --> 00:02:22,200 an object was removed there. 59 00:02:19,739 --> 00:02:24,000 And that means that this reader might 60 00:02:22,200 --> 00:02:27,060 have a reference to that object, 61 00:02:24,000 --> 00:02:29,580 and that in 
turn means that the reader 62 00:02:27,060 --> 00:02:32,280 had better be done before we free that 63 00:02:29,580 --> 00:02:33,780 memory. I mean, use-after-frees are just 64 00:02:32,280 --> 00:02:35,580 as bad with RCU as they are 65 00:02:33,780 --> 00:02:38,280 anywhere else. Please don't do that, right? 66 00:02:35,580 --> 00:02:40,440 And RCU's purpose is to prevent that use 67 00:02:38,280 --> 00:02:42,900 after free, and in this case 68 00:02:40,440 --> 00:02:44,940 synchronize_rcu() would wait until all 69 00:02:42,900 --> 00:02:46,680 the pre-existing readers got done before 70 00:02:44,940 --> 00:02:48,300 it returned, thus preventing that use 71 00:02:46,680 --> 00:02:50,459 after free. 72 00:02:48,300 --> 00:02:52,200 In the upper right we have it the other 73 00:02:50,459 --> 00:02:55,140 way around: we have a reader that stayed 74 00:02:52,200 --> 00:02:57,480 around and was still in force after the 75 00:02:55,140 --> 00:02:59,819 free, and therefore it had better be the case 76 00:02:57,480 --> 00:03:02,519 that that reader started after the 77 00:02:59,819 --> 00:03:03,959 removal, so that the freeing can't 78 00:03:02,519 --> 00:03:05,040 inconvenience it. It can't possibly have 79 00:03:03,959 --> 00:03:07,379 a reference to the thing that was 80 00:03:05,040 --> 00:03:08,940 removed, so it's okay still. 81 00:03:07,379 --> 00:03:10,739 We can have the belt-and-suspenders 82 00:03:08,940 --> 00:03:13,140 scenario, which is in the lower left 83 00:03:10,739 --> 00:03:15,180 there: not only were we not around when 84 00:03:13,140 --> 00:03:17,220 we got to freeing the object, but we also 85 00:03:15,180 --> 00:03:18,540 weren't there to see it in the first 86 00:03:17,220 --> 00:03:21,420 place. 87 00:03:18,540 --> 00:03:23,400 The bad scenario, the lower right, is a 88 00:03:21,420 --> 00:03:24,959 bug in RCU. Here we have a reader that 89 00:03:23,400 --> 00:03:27,060 was there before the removal and was 90 00:03:24,959 --> 00:03:29,459 still there after the free. 
That'll be a use 91 00:03:27,060 --> 00:03:31,080 after free, potentially, and again that's 92 00:03:29,459 --> 00:03:32,819 a bug in RCU. What should have happened 93 00:03:31,080 --> 00:03:34,260 is that synchronize_rcu() should not have 94 00:03:32,819 --> 00:03:36,060 returned so soon. 95 00:03:34,260 --> 00:03:37,920 So that's kind of a quick view of the 96 00:03:36,060 --> 00:03:41,340 base semantics. 97 00:03:37,920 --> 00:03:43,980 This is code that relies on it, and what 98 00:03:41,340 --> 00:03:45,959 we're doing here is we're combining 99 00:03:43,980 --> 00:03:48,720 temporal synchronization (again, really 100 00:03:45,959 --> 00:03:50,519 ordering) on the y-axis going down, and we 101 00:03:48,720 --> 00:03:53,220 also have address-space synchronization 102 00:03:50,519 --> 00:03:54,720 going from left to right. 103 00:03:53,220 --> 00:03:56,459 So we have this curconfig variable. 104 00:03:54,720 --> 00:03:58,980 It's a pointer; initially it pointed to 105 00:03:56,459 --> 00:04:00,599 that blue object. And so we have a reader 106 00:03:58,980 --> 00:04:02,099 on the left that started early; we 107 00:04:00,599 --> 00:04:04,019 have a reader on the right that started 108 00:04:02,099 --> 00:04:06,000 late. The reader on the left started 109 00:04:04,019 --> 00:04:09,360 early, so it gets the old object, the blue 110 00:04:06,000 --> 00:04:11,340 structure, 37, 46. Because it has a pointer 111 00:04:09,360 --> 00:04:13,920 to that, it'll still use that consistent 112 00:04:11,340 --> 00:04:15,480 pair of values and not be confused by 113 00:04:13,920 --> 00:04:18,000 the later update. 114 00:04:15,480 --> 00:04:20,220 Again, the reader came first; synchronize_rcu() 115 00:04:18,000 --> 00:04:23,220 has to wait for it, so the free can't 116 00:04:20,220 --> 00:04:25,620 happen until that reader gets done. 117 00:04:23,220 --> 00:04:27,240 The other reader came later, therefore it 118 00:04:25,620 --> 00:04:29,699 doesn't get a reference to the old object; 119 00:04:27,240 --> 
00:04:32,820 instead it gets the new object, and again 120 00:04:29,699 --> 00:04:34,440 it gets a consistent set of values. So 121 00:04:32,820 --> 00:04:36,300 what happens is that in some sense we have a 122 00:04:34,440 --> 00:04:39,240 kind of zone of confusion between 123 00:04:36,300 --> 00:04:41,880 those two dotted horizontal blue lines, 124 00:04:39,240 --> 00:04:44,160 and the confusion is resolved by the 125 00:04:41,880 --> 00:04:47,160 change in address space between the blue 126 00:04:44,160 --> 00:04:50,100 and the green structure. 127 00:04:47,160 --> 00:04:51,900 Now, it may seem kind of crazy that you'd 128 00:04:50,100 --> 00:04:54,360 have a reader that might be using stuff 129 00:04:51,900 --> 00:04:56,400 after it was obsolete, but keep in mind 130 00:04:54,360 --> 00:04:58,620 that most of our computers are 131 00:04:56,400 --> 00:05:00,240 interfacing to the real world. And so see 132 00:04:58,620 --> 00:05:01,680 that red dotted line up there earlier? 133 00:05:00,240 --> 00:05:04,979 That's when the change happened in the 134 00:05:01,680 --> 00:05:06,660 real world. And so we could be totally 135 00:05:04,979 --> 00:05:08,040 consistent in the computer, but we'd 136 00:05:06,660 --> 00:05:10,500 still be out of sync with the real world. 137 00:05:08,040 --> 00:05:13,919 RCU is just admitting that we do have 138 00:05:10,500 --> 00:05:16,560 that lack of synchronization. 139 00:05:13,919 --> 00:05:19,440 All right, we talked about time and space. 140 00:05:16,560 --> 00:05:21,540 We've seen the first three of those, and 141 00:05:19,440 --> 00:05:23,400 they delimit time for us, or ordering 142 00:05:21,540 --> 00:05:25,139 if you prefer. call_rcu() is like 143 00:05:23,400 --> 00:05:26,940 synchronize_rcu() except asynchronous: you 144 00:05:25,139 --> 00:05:28,680 give it a function and it calls it later. 145 00:05:26,940 --> 00:05:31,560 And the bottom three: 146 00:05:28,680 --> 00:05:33,780 the first two, RCU 147 00:05:31,560 --> 00:05:36,060 
dereference and friends, load pointers, and 148 00:05:33,780 --> 00:05:38,100 rcu_assign_pointer() updates them. We used 149 00:05:36,060 --> 00:05:40,979 an atomic exchange in the example just 150 00:05:38,100 --> 00:05:42,780 to make it fit on one slide. 151 00:05:40,979 --> 00:05:44,280 All right, and this is what we're doing 152 00:05:42,780 --> 00:05:46,860 here. If you have reader-writer locking, 153 00:05:44,280 --> 00:05:48,600 that works great, by the way, but 154 00:05:46,860 --> 00:05:49,680 because in classic reader-writer 155 00:05:48,600 --> 00:05:52,680 locking you have the same lock 156 00:05:49,680 --> 00:05:54,840 object, and that has to be atomically 157 00:05:52,680 --> 00:05:56,580 manipulated, that's going to mean delays 158 00:05:54,840 --> 00:05:58,380 as it gets batted around among the CPUs, 159 00:05:56,580 --> 00:06:00,780 and those delays are represented by 160 00:05:58,380 --> 00:06:03,120 that red in there. 161 00:06:00,780 --> 00:06:04,740 We have latencies to get the global 162 00:06:03,120 --> 00:06:06,180 agreement we need to switch from readers 163 00:06:04,740 --> 00:06:08,580 to updaters and back. 164 00:06:06,180 --> 00:06:11,220 In contrast, with RCU we don't have that 165 00:06:08,580 --> 00:06:13,080 latency. It's not 166 00:06:11,220 --> 00:06:15,000 necessary to have a single global shared 167 00:06:13,080 --> 00:06:16,860 object, and so we can get much more 168 00:06:15,000 --> 00:06:18,840 efficient use of the CPU. 169 00:06:16,860 --> 00:06:20,940 Okay, well, it's easy to say that; you 170 00:06:18,840 --> 00:06:22,860 might want some evidence. Here's one 171 00:06:20,940 --> 00:06:24,660 straightforward thing. This is a crude 172 00:06:22,860 --> 00:06:26,280 benchmark where we use an empty read-side 173 00:06:24,660 --> 00:06:27,900 critical section, so we read-acquire 174 00:06:26,280 --> 00:06:29,880 the lock, we immediately release it; 175 00:06:27,900 --> 00:06:31,979 we do rcu_read_lock() and we 
immediately do 176 00:06:29,880 --> 00:06:33,419 rcu_read_unlock(). As you can see, the 177 00:06:31,979 --> 00:06:36,360 reader-writer lock, even though not much 178 00:06:33,419 --> 00:06:37,860 is happening otherwise, on 200 CPUs is 179 00:06:36,360 --> 00:06:40,440 chewing up the better part of 10 180 00:06:37,860 --> 00:06:43,139 microseconds just to acquire the lock 181 00:06:40,440 --> 00:06:45,240 and to release it, nothing else happening. 182 00:06:43,139 --> 00:06:47,819 RCU, on the other hand, is flat all the 183 00:06:45,240 --> 00:06:49,259 way out to 200 CPUs, and we've measured 184 00:06:47,819 --> 00:06:51,360 in labs (we haven't published it; we 185 00:06:49,259 --> 00:06:52,979 measured in labs) out to a thousand. You 186 00:06:51,360 --> 00:06:54,900 see that jitter down there; around [inaudible] 187 00:06:52,979 --> 00:06:57,479 CPUs is most prominent. That's due to 188 00:06:54,900 --> 00:06:59,160 hyperthreading. The thing is that one of 189 00:06:57,479 --> 00:07:01,259 these RCU 190 00:06:59,160 --> 00:07:03,300 threads can use up more than half of a 191 00:07:01,259 --> 00:07:05,699 core, so if you have two of them on the same 192 00:07:03,300 --> 00:07:07,080 core, they fight a little bit and get a 193 00:07:05,699 --> 00:07:09,780 little bit lower throughput, 194 00:07:07,080 --> 00:07:12,060 and we don't control which thread runs 195 00:07:09,780 --> 00:07:13,259 on which CPU, so we see that jitter in 196 00:07:12,060 --> 00:07:14,940 the middle there. 197 00:07:13,259 --> 00:07:16,500 Of course, an empty read-side critical 198 00:07:14,940 --> 00:07:18,240 section is kind of useless; not entirely 199 00:07:16,500 --> 00:07:21,120 useless, but not what you normally want. 200 00:07:18,240 --> 00:07:23,340 So if we add something in the critical 201 00:07:21,120 --> 00:07:24,900 sections, you can see the latency, the 202 00:07:23,340 --> 00:07:26,819 duration of that critical section, on the 203 00:07:24,900 --> 00:07:28,800 bottom, ranging from 100 nanoseconds to 204 00:07:26,819 
--> 00:07:30,660 10 microseconds, 205 00:07:28,800 --> 00:07:32,880 and we still have the nanoseconds per 206 00:07:30,660 --> 00:07:34,740 operation on the y-axis. What you can see 207 00:07:32,880 --> 00:07:36,360 from this is that the performance 208 00:07:34,740 --> 00:07:39,060 disadvantage of reader-writer locking 209 00:07:36,360 --> 00:07:41,280 decreases as you increase the critical 210 00:07:39,060 --> 00:07:43,860 section duration. As you'd expect, the 211 00:07:41,280 --> 00:07:46,800 overhead drops down into the noise. 212 00:07:43,860 --> 00:07:49,620 However, it also decreases with 213 00:07:46,800 --> 00:07:51,000 decreasing numbers of CPUs. In short, if 214 00:07:49,620 --> 00:07:53,220 you have really big critical sections 215 00:07:51,000 --> 00:07:55,440 and you have almost no CPUs, reader-writer 216 00:07:53,220 --> 00:07:57,120 locking will do just fine for you. On the 217 00:07:55,440 --> 00:07:58,560 other hand, if you have lots of CPUs and 218 00:07:57,120 --> 00:08:00,900 very short critical sections, like you're 219 00:07:58,560 --> 00:08:03,660 protecting a very fast data structure 220 00:08:00,900 --> 00:08:06,060 like a hash table, RCU has great benefits 221 00:08:03,660 --> 00:08:09,000 early on. 222 00:08:06,060 --> 00:08:11,280 All right, there are also restrictions. RCU 223 00:08:09,000 --> 00:08:13,380 is a specialized tool. It works best when 224 00:08:11,280 --> 00:08:16,319 you mostly have readers, over there on 225 00:08:13,380 --> 00:08:18,900 the right, and stale and inconsistent data is 226 00:08:16,319 --> 00:08:20,460 okay. Now, you can restore consistency if 227 00:08:18,900 --> 00:08:22,919 you want; there are patterns that do that. 228 00:08:20,460 --> 00:08:26,039 And there are also a couple of 229 00:08:22,919 --> 00:08:28,319 use cases when you're all updating and 230 00:08:26,039 --> 00:08:29,460 you need consistent data, but they're 231 00:08:28,319 --> 00:08:31,560 more rare. 232 00:08:29,460 --> 00:08:33,060 The most common uses are up over 
233 00:08:31,560 --> 00:08:35,760 towards the blue section. 234 00:08:33,060 --> 00:08:37,020 And the other piece is that RCU is most 235 00:08:35,760 --> 00:08:38,459 frequently used for linked data 236 00:08:37,020 --> 00:08:40,860 structures. There are some non-linked data 237 00:08:38,459 --> 00:08:43,860 structure use cases, but they're rare. 238 00:08:40,860 --> 00:08:46,020 And the use cases are here. These are 239 00:08:43,860 --> 00:08:48,180 the same two URLs we saw earlier, and 240 00:08:46,020 --> 00:08:50,220 they go through this in detail. What we 241 00:08:48,180 --> 00:08:52,080 saw in that little code example was the 242 00:08:50,220 --> 00:08:54,240 upper left there, the quasi reader-writer 243 00:08:52,080 --> 00:08:56,100 lock. And as for the complexity in 244 00:08:54,240 --> 00:08:57,480 RCU: it's quite simple just in and of 245 00:08:56,100 --> 00:08:59,040 itself. You wait for all pre-existing 246 00:08:57,480 --> 00:09:01,200 readers; what could be simpler? The 247 00:08:59,040 --> 00:09:02,519 complexity is making it work with the 248 00:09:01,200 --> 00:09:03,720 use case and fitting in with everything 249 00:09:02,519 --> 00:09:05,459 else. That's where a lot of the 250 00:09:03,720 --> 00:09:08,459 complexity is. 251 00:09:05,459 --> 00:09:10,680 Okay, so we blasted through a review, 252 00:09:08,459 --> 00:09:12,480 and again, if this is your first exposure 253 00:09:10,680 --> 00:09:14,339 to RCU, please take a look at some of the 254 00:09:12,480 --> 00:09:15,959 other material, including those two URLs. 255 00:09:14,339 --> 00:09:17,880 But we're going to go ahead and look at 256 00:09:15,959 --> 00:09:19,740 some changes that have happened more 257 00:09:17,880 --> 00:09:21,120 recently, and this is just a quick list 258 00:09:19,740 --> 00:09:23,760 of the ones we're going to go into a 259 00:09:21,120 --> 00:09:26,279 little bit of detail on. But again, this 260 00:09:23,760 --> 00:09:28,019 is going to be a bit of a flyover. Each 261 00:09:26,279 
--> 00:09:29,399 of these topics, most of them, could be a 262 00:09:28,019 --> 00:09:31,080 full presentation. We're going to give 263 00:09:29,399 --> 00:09:32,580 you a taste of what's happened, as 264 00:09:31,080 --> 00:09:34,260 opposed to trying to dive deep into any 265 00:09:32,580 --> 00:09:35,519 one of them. 266 00:09:34,260 --> 00:09:37,740 All right. 267 00:09:35,519 --> 00:09:40,260 Flavor consolidation is a follow-up; I 268 00:09:37,740 --> 00:09:43,560 actually presented this at linux.conf.au in 269 00:09:40,260 --> 00:09:46,680 2019, so it'll be flying even faster. I 270 00:09:43,560 --> 00:09:48,839 got an email from Linus Torvalds that CC'd 271 00:09:46,680 --> 00:09:50,820 security@kernel.org and didn't CC LKML, 272 00:09:48,839 --> 00:09:52,380 which is usually a hint that 273 00:09:50,820 --> 00:09:53,820 you're not going to be minding your own 274 00:09:52,380 --> 00:09:56,940 business for a little while. 275 00:09:53,820 --> 00:09:59,279 There was an exploit. So, don't do 276 00:09:56,940 --> 00:10:01,680 this. What don't you do? And Linus 277 00:09:59,279 --> 00:10:03,360 said, look, can we have some way of making 278 00:10:01,680 --> 00:10:05,760 this mistake harder to make or 279 00:10:03,360 --> 00:10:07,920 impossible to make? And the problem was 280 00:10:05,760 --> 00:10:10,200 we had different types of readers, 281 00:10:07,920 --> 00:10:11,640 and you had to make sure you got the 282 00:10:10,200 --> 00:10:13,560 readers paired up with the proper 283 00:10:11,640 --> 00:10:15,000 updater. If you did one reader and the 284 00:10:13,560 --> 00:10:19,080 other updater, like happened in the 285 00:10:15,000 --> 00:10:20,160 exploit, then you could have all 286 00:10:19,080 --> 00:10:22,800 sorts of problems, like Linux security 287 00:10:20,160 --> 00:10:25,740 holes, which is what actually happened. 288 00:10:22,800 --> 00:10:28,080 So I had a conceptually simple 289 00:10:25,740 --> 00:10:29,519 fix, which was to say, okay, 
synchronize_rcu() 290 00:10:28,080 --> 00:10:30,899 is going to match up with everything; 291 00:10:29,519 --> 00:10:32,640 we're not going to have different kinds 292 00:10:30,899 --> 00:10:34,620 of updates. Use synchronize_rcu() and that 293 00:10:32,640 --> 00:10:36,300 will take care of you. And that's 294 00:10:34,620 --> 00:10:38,160 nice and simple, other than it took me a 295 00:10:36,300 --> 00:10:40,620 good year to actually make that 296 00:10:38,160 --> 00:10:43,380 transition. But one thing that is not 297 00:10:40,620 --> 00:10:49,500 simple is if you need to backport a 298 00:10:43,380 --> 00:10:52,380 patch from, say, 6.2 to 4.16 or 4.19. 299 00:10:49,500 --> 00:10:54,720 That's a bit of a gap there. 300 00:10:52,380 --> 00:10:56,579 Okay, so if it's 4.20 or later that 301 00:10:54,720 --> 00:10:59,399 you're backporting to, no problem, it just 302 00:10:56,579 --> 00:11:00,899 works. But if it's 4.19 or earlier, you 303 00:10:59,399 --> 00:11:03,600 have to take a look at what your patch 304 00:11:00,899 --> 00:11:06,000 is, because the semantics changed. Now, if 305 00:11:03,600 --> 00:11:08,100 your readers are only rcu_read_lock(), then 306 00:11:06,000 --> 00:11:10,440 you're golden; just take it back, it'll be 307 00:11:08,100 --> 00:11:12,779 fine. Otherwise, there's a 308 00:11:10,440 --> 00:11:15,779 synchronize_rcu_mult() that does 309 00:11:12,779 --> 00:11:18,120 multiple waits concurrently, as 310 00:11:15,779 --> 00:11:20,040 shown in that little example there, and 311 00:11:18,120 --> 00:11:22,860 you can chain the different 312 00:11:20,040 --> 00:11:26,160 call_rcu()s together, one after another, by 313 00:11:22,860 --> 00:11:29,640 having the one callback invoke 314 00:11:26,160 --> 00:11:30,300 the next call_rcu() variant, and get that to 315 00:11:29,640 --> 00:11:31,920 work. 316 00:11:30,300 --> 00:11:35,399 Okay, so it's something that can be 317 00:11:31,920 --> 00:11:37,079 handled, but be careful. If you fail to do 318 
00:11:35,399 --> 00:11:39,360 this, if you just backport it and you 319 00:11:37,079 --> 00:11:41,160 should have made a change, the bugs are 320 00:11:39,360 --> 00:11:42,720 really difficult to reproduce and 321 00:11:41,160 --> 00:11:45,540 difficult to find. 322 00:11:42,720 --> 00:11:46,980 Okay, so with that guy taken care of, what 323 00:11:45,540 --> 00:11:49,100 I learned out of this is that a key security 324 00:11:46,980 --> 00:11:51,660 property is ease of use. 325 00:11:49,100 --> 00:11:53,279 My code was in some sense correct, or 326 00:11:51,660 --> 00:11:55,019 close to it, but 327 00:11:53,279 --> 00:11:58,500 it didn't interact very well with the 328 00:11:55,019 --> 00:12:00,480 people, so that I had to fix. 329 00:11:58,500 --> 00:12:02,160 Joel Fernandes, starting in v5.4, 330 00:12:00,480 --> 00:12:05,579 made some mods to list_for_each_entry_rcu() 331 00:12:02,160 --> 00:12:07,620 and hlist_for_each_entry_rcu(). The 332 00:12:05,579 --> 00:12:09,300 old style, which is still supported: 333 00:12:07,620 --> 00:12:11,279 what happens is you loop through (it's 334 00:12:09,300 --> 00:12:14,339 like list_for_each_entry()), you loop through an 335 00:12:11,279 --> 00:12:15,779 RCU-protected list, and what it does, if 336 00:12:14,339 --> 00:12:17,820 you have lockdep enabled, is say, hey, 337 00:12:15,779 --> 00:12:20,279 there'd better be an rcu_read_lock() in 338 00:12:17,820 --> 00:12:21,720 effect, otherwise this isn't safe 339 00:12:20,279 --> 00:12:24,060 and I'm going to complain. 340 00:12:21,720 --> 00:12:26,220 And that's okay, except that sometimes 341 00:12:24,060 --> 00:12:27,779 you'll have shared code that is also 342 00:12:26,220 --> 00:12:30,060 called from the update side, so it might be that 343 00:12:27,779 --> 00:12:32,040 it's okay to be an RCU reader or it's 344 00:12:30,060 --> 00:12:34,440 okay to hold event_mutex. 345 00:12:32,040 --> 00:12:36,779 And so what Joel did was he added 346 00:12:34,440 --> 00:12:38,279 an optional lockdep 
347 00:12:36,779 --> 00:12:39,240 expression to these guys, shown on the 348 00:12:38,279 --> 00:12:41,040 bottom there. 349 00:12:39,240 --> 00:12:42,779 And what you can do is put something 350 00:12:41,040 --> 00:12:45,300 like lockdep_is_held(&event_mutex) 351 00:12:42,779 --> 00:12:47,579 in that optional position, and 352 00:12:45,300 --> 00:12:49,320 that'll tell lockdep, okay, we either have 353 00:12:47,579 --> 00:12:51,360 to be an RCU reader or we have to hold 354 00:12:49,320 --> 00:12:53,880 this lock; either way is fine. And so 355 00:12:51,360 --> 00:12:56,639 you're exposing more of your design to 356 00:12:53,880 --> 00:12:58,620 lockdep, and it allows shared code 357 00:12:56,639 --> 00:13:01,320 to work better. 358 00:12:58,620 --> 00:13:03,180 Okay, so this gave us debuggability 359 00:13:01,320 --> 00:13:05,040 improvements without an API explosion, 360 00:13:03,180 --> 00:13:08,399 which is a good thing. 361 00:13:05,040 --> 00:13:10,740 Uladzislau Rezki, in 5.9, gave us a 362 00:13:08,399 --> 00:13:13,500 single-argument kfree_rcu(). 363 00:13:10,740 --> 00:13:16,019 And what happened: the old way of 364 00:13:13,500 --> 00:13:19,200 doing things, you have kvfree_rcu() or 365 00:13:16,019 --> 00:13:21,180 kfree_rcu() (they're synonyms at this point). 366 00:13:19,200 --> 00:13:23,760 You gave it a pointer, that's p, and 367 00:13:21,180 --> 00:13:26,160 you also gave it the field name of an 368 00:13:23,760 --> 00:13:28,380 rcu_head field in the object, and that's 369 00:13:26,160 --> 00:13:30,180 the rh there. Now, the nice thing about 370 00:13:28,380 --> 00:13:32,100 this: it just, bang, happens. It never 371 00:13:30,180 --> 00:13:33,660 sleeps, and it continues to be 372 00:13:32,100 --> 00:13:35,100 supported. It's still useful; 373 00:13:33,660 --> 00:13:37,980 we'll still have it. 374 00:13:35,100 --> 00:13:40,079 But one problem with it, in some cases: if 375 00:13:37,980 --> 00:13:42,300 you have a really small structure 376 
00:13:40,079 --> 00:13:43,680 and there's a huge 377 00:13:42,300 --> 00:13:46,620 number of them in the kernel, 378 00:13:43,680 --> 00:13:48,959 the extra eight bytes, or four bytes in a 379 00:13:46,620 --> 00:13:50,100 32-bit system, of that rcu_head can be a 380 00:13:48,959 --> 00:13:52,200 real problem. 381 00:13:50,100 --> 00:13:54,660 And so a new way you could do it is 382 00:13:52,200 --> 00:13:57,779 leave off that second argument: just say 383 00:13:54,660 --> 00:14:00,000 kvfree_rcu(p), like it shows there, and 384 00:13:57,779 --> 00:14:03,660 in that case you don't need that rcu_head 385 00:14:00,000 --> 00:14:07,139 structure in your object. 386 00:14:03,660 --> 00:14:09,180 But the trade-off is that it's going to 387 00:14:07,139 --> 00:14:12,180 allocate memory to track this thing, and 388 00:14:09,180 --> 00:14:15,000 of course, if you're out of memory, it can 389 00:14:12,180 --> 00:14:16,620 sleep. So, you know, there's a latency 390 00:14:15,000 --> 00:14:19,620 issue, and also you can't use it from 391 00:14:16,620 --> 00:14:21,660 atomic context. 392 00:14:19,620 --> 00:14:23,459 All right, this has been around for a 393 00:14:21,660 --> 00:14:25,139 while, and that was wonderful. It 394 00:14:23,459 --> 00:14:27,720 helped out some people with the memory 395 00:14:25,139 --> 00:14:31,019 footprint, except that it turned out to 396 00:14:27,720 --> 00:14:33,120 be an ease-of-use problem again. 397 00:14:31,019 --> 00:14:35,399 And the problem is that it was just 398 00:14:33,120 --> 00:14:37,980 really easy to just forget that field, 399 00:14:35,399 --> 00:14:41,820 and so instead of typing kvfree_rcu(p, rh), 400 00:14:37,980 --> 00:14:43,320 just kvfree_rcu(p). The 401 00:14:41,820 --> 00:14:45,180 compiler is fine with that, and it worked 402 00:14:43,320 --> 00:14:46,620 great unless you called it from atomic 403 00:14:45,180 --> 00:14:47,459 context, in which case lockdep would 404 00:14:46,620 --> 00:14:50,820 
complain. 405 00:14:47,459 --> 00:14:52,920 But sometimes you're in an 406 00:14:50,820 --> 00:14:55,199 environment that can sleep, 407 00:14:52,920 --> 00:14:57,480 but you don't want that extra sleep from 408 00:14:55,199 --> 00:14:59,579 kvfree_rcu(). It might be that you 409 00:14:57,480 --> 00:15:01,199 have an SLA to meet and that sleep is 410 00:14:59,579 --> 00:15:03,300 going to be too much. 411 00:15:01,199 --> 00:15:04,620 So Eric Dumazet ran into this, found 412 00:15:03,300 --> 00:15:07,019 this problem 413 00:15:04,620 --> 00:15:10,019 in Google's fleet. And so what 414 00:15:07,019 --> 00:15:12,779 happened was that we no longer allow the 415 00:15:10,019 --> 00:15:15,899 single-argument version of kvfree_rcu(); 416 00:15:12,779 --> 00:15:20,100 instead you have to put the _mightsleep 417 00:15:15,899 --> 00:15:22,440 suffix on it. Okay, and what that means 418 00:15:20,100 --> 00:15:23,760 then is that you have a bigger Hamming 419 00:15:22,440 --> 00:15:27,180 distance between the correct and the 420 00:15:23,760 --> 00:15:28,560 incorrect variant of it. And the 421 00:15:27,180 --> 00:15:29,880 nice thing about _mightsleep is you 422 00:15:28,560 --> 00:15:31,380 look at this thing and it tells you 423 00:15:29,880 --> 00:15:33,480 right there, hey, this might sleep, which 424 00:15:31,380 --> 00:15:36,720 is added documentation. 425 00:15:33,480 --> 00:15:38,100 While he was at it, Vlad also used 426 00:15:36,720 --> 00:15:41,040 the polled grace-period API we'll 427 00:15:38,100 --> 00:15:44,459 describe next, and that really seriously 428 00:15:41,040 --> 00:15:46,320 reduced the memory footprint, 429 00:15:44,459 --> 00:15:48,180 under a micro-benchmark, but still, it was 430 00:15:46,320 --> 00:15:50,339 an order-of-magnitude-style reduction, 431 00:15:48,180 --> 00:15:52,920 which is great. 432 00:15:50,339 --> 00:15:54,660 So what I was surprised by is that sometimes 433 00:15:52,920 --> 00:15:58,160 ease of use requires 
more typing rather 434 00:15:54,660 --> 00:15:58,160 than less, but here we are. 435 00:15:58,440 --> 00:16:02,040 Okay, I said polled grace-period APIs, and 436 00:16:00,720 --> 00:16:03,600 here they are. 437 00:16:02,040 --> 00:16:05,639 I'm not going to go through this in 438 00:16:03,600 --> 00:16:08,040 detail; I'm just noting that the first 439 00:16:05,639 --> 00:16:09,959 move in this direction happened in 3.14, 440 00:16:08,040 --> 00:16:12,300 almost 10 years ago. 441 00:16:09,959 --> 00:16:14,399 And the way they're used is on this 442 00:16:12,300 --> 00:16:16,260 slide. So what happens is you call 443 00:16:14,399 --> 00:16:19,560 get_state_synchronize_rcu(); it just hands you 444 00:16:16,260 --> 00:16:21,779 back an unsigned long. You put it in a 445 00:16:19,560 --> 00:16:23,100 cookie, you go do something else, and 446 00:16:21,779 --> 00:16:25,440 something else might take a fair amount 447 00:16:23,100 --> 00:16:27,480 of time, and then you pass the cookie to 448 00:16:25,440 --> 00:16:29,399 cond_synchronize_rcu(), or 449 00:16:27,480 --> 00:16:31,500 cond_synchronize_rcu_expedited() if you 450 00:16:29,399 --> 00:16:33,899 want things to go fast. And what it 451 00:16:31,500 --> 00:16:35,940 does is it says, hey, has the grace 452 00:16:33,899 --> 00:16:37,920 period corresponding to this cookie 453 00:16:35,940 --> 00:16:39,420 ended already? And if it has, it says, okay, 454 00:16:37,920 --> 00:16:42,480 great, I'll just return. 455 00:16:39,420 --> 00:16:44,040 Otherwise it invokes synchronize_rcu() or 456 00:16:42,480 --> 00:16:45,420 synchronize_rcu_expedited(), 457 00:16:44,040 --> 00:16:47,399 depending on which one you 458 00:16:45,420 --> 00:16:50,040 chose at that point. So if you do 459 00:16:47,399 --> 00:16:53,040 something that takes a while, you get safety 460 00:16:50,040 --> 00:16:56,160 but you don't get the extra overhead. 461 00:16:53,040 --> 00:16:57,660 And so I'll just 462 00:16:56,160 --> 
00:17:00,240 give you an example usage here before 463 00:16:57,660 --> 00:17:01,920 getting to the next thing. 464 00:17:00,240 --> 00:17:03,540 And so what we have here is a 465 00:17:01,920 --> 00:17:05,400 state diagram. We've got a single element 466 00:17:03,540 --> 00:17:07,079 that's in an RCU-protected data structure, 467 00:17:05,400 --> 00:17:08,880 but it's a cache-like thing where you age 468 00:17:07,079 --> 00:17:11,280 things out of the cache, and you age it 469 00:17:08,880 --> 00:17:12,120 from left to right there: one, two, three, 470 00:17:11,280 --> 00:17:14,280 four. 471 00:17:12,120 --> 00:17:15,959 So initially, on the left, we've got it: 472 00:17:14,280 --> 00:17:18,179 it's actually in the cache, readers can 473 00:17:15,959 --> 00:17:20,100 get at it, and that means you have to be 474 00:17:18,179 --> 00:17:21,600 careful with it. So we find something 475 00:17:20,100 --> 00:17:23,880 that hasn't been used for a while, or 476 00:17:21,600 --> 00:17:25,140 maybe there's something about it, and we decide 477 00:17:23,880 --> 00:17:27,600 we're going to pull it out of the reader- 478 00:17:25,140 --> 00:17:30,960 visible section, and on the way out, in 479 00:17:27,600 --> 00:17:32,460 state two there, we give it a 480 00:17:30,960 --> 00:17:34,799 cookie saying, okay, the grace period 481 00:17:32,460 --> 00:17:37,260 started, and then it goes to 482 00:17:34,799 --> 00:17:39,120 state three. Now, in state three it could 483 00:17:37,260 --> 00:17:41,640 be that a reader says, no, wait a minute, I want 484 00:17:39,120 --> 00:17:43,559 that back, and so we 485 00:17:41,640 --> 00:17:46,620 list_add_rcu() it back into the 486 00:17:43,559 --> 00:17:48,299 reader-visible chunk. Or we might note 487 00:17:46,620 --> 00:17:50,280 that the readers are all done, the grace 488 00:17:48,299 --> 00:17:51,720 period's over, in which case we can move 489 00:17:50,280 --> 00:17:54,000 it to the fourth section, where it can 490 00:17:51,720 --> 
00:17:56,580 sit for a while if need if desirable 491 00:17:54,000 --> 00:17:58,740 where all the readers are done it's uh 492 00:17:56,580 --> 00:18:01,140 it could be used anytime and the cool 493 00:17:58,740 --> 00:18:03,600 thing about that is if you get an OOM 494 00:18:01,140 --> 00:18:05,039 event an out-of-memory event all the 495 00:18:03,600 --> 00:18:06,780 guys in state four can be immediately 496 00:18:05,039 --> 00:18:08,580 freed which allows you to have the 497 00:18:06,780 --> 00:18:11,280 RCU readers and have it be safe and 498 00:18:08,580 --> 00:18:12,780 have the delays that make it so that you 499 00:18:11,280 --> 00:18:14,100 can immediately respond to this out of 500 00:18:12,780 --> 00:18:15,900 memory event 501 00:18:14,100 --> 00:18:17,700 on the next slide we have the same thing 502 00:18:15,900 --> 00:18:19,980 only we're showing the primitives used 503 00:18:17,700 --> 00:18:22,080 on those transitions from left to right 504 00:18:19,980 --> 00:18:23,520 list_del_rcu() has been around for a long time 505 00:18:22,080 --> 00:18:25,980 and that's what we use to remove this 506 00:18:23,520 --> 00:18:27,960 element safely from the list of readers 507 00:18:25,980 --> 00:18:30,480 then we use get_state_synchronize_rcu() 508 00:18:27,960 --> 00:18:33,240 which we've seen before or uh 509 00:18:30,480 --> 00:18:34,799 start_poll_synchronize_rcu() which uh 510 00:18:33,240 --> 00:18:37,500 get_state_synchronize_rcu() just gives you a cookie 511 00:18:34,799 --> 00:18:39,240 uh start_poll_synchronize_rcu() gives 512 00:18:37,500 --> 00:18:42,240 you a cookie and it makes sure that the 513 00:18:39,240 --> 00:18:44,580 grace periods you need are going to start 514 00:18:42,240 --> 00:18:47,100 uh poll_state_synchronize_rcu() just 515 00:18:44,580 --> 00:18:49,500 polls the thing and says uh is it done 516 00:18:47,100 --> 00:18:52,679 yet and if it is that's our Y there and 517 00:18:49,500 --> 00:18:54,179 we can move it over now uh we this 
thing 518 00:18:52,679 --> 00:18:55,860 is subject to counter overflow we've 519 00:18:54,179 --> 00:18:58,020 only got a limited number of bits and 520 00:18:55,860 --> 00:19:00,299 they can overflow and so there's a 521 00:18:58,020 --> 00:19:02,940 get_completed_synchronize_rcu() that gives us 522 00:19:00,299 --> 00:19:04,440 a cookie that is perma-expired if you 523 00:19:02,940 --> 00:19:06,480 get a cookie from 524 00:19:04,440 --> 00:19:07,559 get_completed_synchronize_rcu() poll_state_synchronize_rcu() 525 00:19:06,480 --> 00:19:09,360 is always going to say true 526 00:19:07,559 --> 00:19:11,400 yep that thing's expired 527 00:19:09,360 --> 00:19:13,260 and that means you can avoid counter-wrap 528 00:19:11,400 --> 00:19:14,880 problems which aren't much of a problem 529 00:19:13,260 --> 00:19:18,140 on 64-bit systems but there still are 530 00:19:14,880 --> 00:19:18,140 32-bit systems running around 531 00:19:18,179 --> 00:19:22,740 okay so uh 532 00:19:20,940 --> 00:19:25,020 here's the same sort of thing again 533 00:19:22,740 --> 00:19:26,580 except instead of using 534 00:19:25,020 --> 00:19:28,080 cond_synchronize_rcu() we're using 535 00:19:26,580 --> 00:19:30,240 poll_state_synchronize_rcu() and what you'd like to have happen 536 00:19:28,080 --> 00:19:32,340 but which does not work you'd like to 537 00:19:30,240 --> 00:19:34,200 get your cookie and then if you somehow 538 00:19:32,340 --> 00:19:35,700 do a synchronize_rcu() you would like it 539 00:19:34,200 --> 00:19:37,919 to be unconditionally the case that 540 00:19:35,700 --> 00:19:39,960 poll_state_synchronize_rcu() says yeah that 541 00:19:37,919 --> 00:19:42,059 grace period is done but there are two 542 00:19:39,960 --> 00:19:43,200 reasons that doesn't happen one is 543 00:19:42,059 --> 00:19:45,840 counter wrap which we've already 544 00:19:43,200 --> 00:19:47,700 discussed the other one is that we have 545 00:19:45,840 --> 00:19:50,580 both normal and synchronous and 546 00:19:47,700 --> 
00:19:53,340 expedited grace periods the normal ones 547 00:19:50,580 --> 00:19:55,380 are slow but use less CPU the expedited 548 00:19:53,340 --> 00:19:57,059 ones are much faster but hit the CPUs a 549 00:19:55,380 --> 00:19:59,580 lot harder and interrupt other CPUs and 550 00:19:57,059 --> 00:20:00,780 aren't so good for real time 551 00:19:59,580 --> 00:20:02,940 now 552 00:20:00,780 --> 00:20:04,500 get_state_synchronize_rcu() returns a 553 00:20:02,940 --> 00:20:07,080 single counter so these are mushed 554 00:20:04,500 --> 00:20:10,500 together and because we had 128 bits of 555 00:20:07,080 --> 00:20:13,980 state represented in 64 bits we lose things 556 00:20:10,500 --> 00:20:15,900 and what we lose is that if if this 557 00:20:13,980 --> 00:20:18,000 synchronize_rcu() happens to overlap with 558 00:20:15,900 --> 00:20:19,020 a synchronize_rcu_expedited() somewhere 559 00:20:18,000 --> 00:20:21,299 else 560 00:20:19,020 --> 00:20:23,760 one or the other of those events can be 561 00:20:21,299 --> 00:20:25,740 lost and so that means we get a false 562 00:20:23,760 --> 00:20:29,700 positive trigger however 563 00:20:25,740 --> 00:20:32,340 if you duplicate that synchronize_rcu() then you're 564 00:20:29,700 --> 00:20:34,440 guaranteed that one or the other of them 565 00:20:32,340 --> 00:20:37,460 will win either that or something in 566 00:20:34,440 --> 00:20:39,840 between them wins either way 567 00:20:37,460 --> 00:20:42,059 the cookie will notice a grace period 568 00:20:39,840 --> 00:20:44,640 passing now you can still get false 569 00:20:42,059 --> 00:20:46,980 positive WARN_ONs there but those will 570 00:20:44,640 --> 00:20:49,140 only be due to counter wrap 571 00:20:46,980 --> 00:20:51,240 most of the time you don't care most the 572 00:20:49,140 --> 00:20:53,520 time uh wait one grace period wait two 573 00:20:51,240 --> 00:20:54,900 grace periods who cares but sometimes 574 00:20:53,520 --> 00:20:56,460 you do care and sometimes you're willing 575 00:20:54,900 
--> 00:20:58,260 to pay a price of a larger data 576 00:20:56,460 --> 00:21:00,720 structure and so each of these things 577 00:20:58,260 --> 00:21:03,600 has a _full suffix 578 00:21:00,720 --> 00:21:04,980 and there you're getting two counters so 579 00:21:03,600 --> 00:21:06,840 you have a full 580 00:21:04,980 --> 00:21:08,280 representation of the state 581 00:21:06,840 --> 00:21:10,080 now 582 00:21:08,280 --> 00:21:12,539 that might bloat your data structure 583 00:21:10,080 --> 00:21:15,900 however it turns out that there's only a 584 00:21:12,539 --> 00:21:18,120 few values of each cookie that can 585 00:21:15,900 --> 00:21:19,500 possibly correspond to a grace period 586 00:21:18,120 --> 00:21:22,679 not yet being done 587 00:21:19,500 --> 00:21:25,080 and those two uh CPP macros the 588 00:21:22,679 --> 00:21:27,419 NUM_ACTIVE CPP macros tell you in each case 589 00:21:25,080 --> 00:21:29,340 not full and full how many of those 590 00:21:27,419 --> 00:21:31,260 there can be and so you can have an 591 00:21:29,340 --> 00:21:33,120 array of that size and then know that 592 00:21:31,260 --> 00:21:34,980 you can only have that many that are 593 00:21:33,120 --> 00:21:36,360 waiting still and so instead of having 594 00:21:34,980 --> 00:21:38,640 the cookie in each element of your data 595 00:21:36,360 --> 00:21:41,240 structure you can put in that very small 596 00:21:38,640 --> 00:21:41,240 array size 597 00:21:41,640 --> 00:21:47,220 so using the full same sort of way here 598 00:21:44,960 --> 00:21:48,600 because we have a structure we pass it 599 00:21:47,220 --> 00:21:51,059 by reference to 600 00:21:48,600 --> 00:21:53,159 get_state_synchronize_rcu_full() in this case we only need one 601 00:21:51,059 --> 00:21:55,260 grace period and 602 00:21:53,159 --> 00:21:56,880 poll_state_synchronize_rcu_full() can trigger but again only 603 00:21:55,260 --> 00:21:58,260 due to counter wrap and we got two counters 604 00:21:56,880 
--> 00:22:00,539 and they both have to wrap just the 605 00:21:58,260 --> 00:22:01,679 wrong way so you have to work to make 606 00:22:00,539 --> 00:22:04,559 this happen 607 00:22:01,679 --> 00:22:06,900 especially on a 64-bit system 608 00:22:04,559 --> 00:22:09,059 so what this means is we have a lockless 609 00:22:06,900 --> 00:22:11,760 grace period API if you really want to 610 00:22:09,059 --> 00:22:13,860 you can interact with RCU grace periods 611 00:22:11,760 --> 00:22:17,760 from an NMI handler 612 00:22:13,860 --> 00:22:19,500 which seems kind of unusual we really do 613 00:22:17,760 --> 00:22:22,260 have people wanting this and using this 614 00:22:19,500 --> 00:22:24,780 for example from atomic context or 615 00:22:22,260 --> 00:22:26,100 hardware interrupt handlers 616 00:22:24,780 --> 00:22:27,539 okay 617 00:22:26,100 --> 00:22:30,059 I'm not I'm going to go through these 618 00:22:27,539 --> 00:22:34,080 really quickly mostly these are things 619 00:22:30,059 --> 00:22:35,159 used for uh for tracing freeing trampolines 620 00:22:34,080 --> 00:22:37,679 especially 621 00:22:35,159 --> 00:22:39,539 and these guys are all very specialized 622 00:22:37,679 --> 00:22:41,100 uh here they are the main thing I'm 623 00:22:39,539 --> 00:22:42,360 going to do is and by the way 624 00:22:41,100 --> 00:22:43,919 implementing these and getting them 625 00:22:42,360 --> 00:22:47,059 working and speeding them up I was 626 00:22:43,919 --> 00:22:49,980 greatly assisted by Neeraj Upadhyay 627 00:22:47,059 --> 00:22:51,780 and the key thing is these are for 628 00:22:49,980 --> 00:22:54,299 tracing and BPF if you really need them 629 00:22:51,780 --> 00:22:56,039 okay fine but talk to us talk to the 630 00:22:54,299 --> 00:22:58,140 tracing folk talk to the BPF folk 631 00:22:56,039 --> 00:23:00,179 because it's really easy to abuse them 632 00:22:58,140 --> 00:23:02,640 in a way that messes it up for them all 633 00:23:00,179 --> 00:23:05,100 right so if you need these 
make sure you 634 00:23:02,640 --> 00:23:08,640 let us know 635 00:23:05,100 --> 00:23:10,260 all right uh this next topic is callback 636 00:23:08,640 --> 00:23:11,280 offloading and de-offloading this is 637 00:23:10,260 --> 00:23:13,980 something contributed by Frederic 638 00:23:11,280 --> 00:23:15,659 Weisbecker it's available in v5.12 now 639 00:23:13,980 --> 00:23:17,700 first off what the heck's offloading or 640 00:23:15,659 --> 00:23:20,039 de-offloading and why do you care 641 00:23:17,700 --> 00:23:22,559 this is the classic way that callbacks 642 00:23:20,039 --> 00:23:24,419 are invoked so what happens remember we 643 00:23:22,559 --> 00:23:27,600 have readers and we have grace periods 644 00:23:24,419 --> 00:23:29,340 and what happens with call_rcu() is that 645 00:23:27,600 --> 00:23:30,780 you have a function that's invoked and 646 00:23:29,340 --> 00:23:33,780 these are kept track of with a list of 647 00:23:30,780 --> 00:23:35,880 little rcu_head structures and then once 648 00:23:33,780 --> 00:23:37,440 the grace period has ended we grab that 649 00:23:35,880 --> 00:23:39,419 rcu_head and we call the function on 650 00:23:37,440 --> 00:23:41,159 behalf of the argument that's in that 651 00:23:39,419 --> 00:23:44,100 rcu_head structure 652 00:23:41,159 --> 00:23:45,360 and those callbacks by default today and 653 00:23:44,100 --> 00:23:46,140 it used to be the only way you could do 654 00:23:45,360 --> 00:23:49,020 it 655 00:23:46,140 --> 00:23:52,320 are invoked in softirq environment all 656 00:23:49,020 --> 00:23:54,720 right so CPU 0 does a call_rcu() that 657 00:23:52,320 --> 00:23:57,600 enqueues something sometime later uh 658 00:23:54,720 --> 00:23:59,880 softirq happens and we free 659 00:23:57,600 --> 00:24:02,280 some memory usually and life is good 660 00:23:59,880 --> 00:24:04,260 except that this thing is invoked at 661 00:24:02,280 --> 00:24:05,640 a high priority and it's likely 662 00:24:04,260 --> 00:24:07,860 disrupting 
whatever's intended to 663 00:24:05,640 --> 00:24:09,240 execute about this time I mean it 664 00:24:07,860 --> 00:24:11,940 doesn't matter you might have been 665 00:24:09,240 --> 00:24:14,039 running a priority-99 real-time SCHED_FIFO 666 00:24:11,940 --> 00:24:17,159 task the highest priority you get as a 667 00:24:14,039 --> 00:24:18,659 user it'll still interrupt you okay and 668 00:24:17,159 --> 00:24:21,360 the the real-time guys don't like that 669 00:24:18,659 --> 00:24:23,460 for good for good reason all right 670 00:24:21,360 --> 00:24:26,340 so what we did and this is something 671 00:24:23,460 --> 00:24:28,679 that Jim Houston and Joe Korty uh uh 672 00:24:26,340 --> 00:24:31,260 spearheaded uh back in a special purpose 673 00:24:28,679 --> 00:24:33,000 real-time Linux back in the day 674 00:24:31,260 --> 00:24:36,120 but what we do is instead of having the 675 00:24:33,000 --> 00:24:37,440 CPU do it we offload these things so uh 676 00:24:36,120 --> 00:24:40,080 if CPU 677 00:24:37,440 --> 00:24:41,940 CPU one there does a call_rcu() there's a 678 00:24:40,080 --> 00:24:44,700 grace period but instead of invoking it 679 00:24:41,940 --> 00:24:47,159 at softirq on that CPU it instead 680 00:24:44,700 --> 00:24:49,740 gets invoked on an on a kthread a separate 681 00:24:47,159 --> 00:24:51,780 thread that's associated with that CPU 682 00:24:49,740 --> 00:24:53,100 but does not necessarily run on it it's 683 00:24:51,780 --> 00:24:55,799 going to run wherever the scheduler 684 00:24:53,100 --> 00:24:57,780 sends it to or you can manually place it 685 00:24:55,799 --> 00:25:00,120 somewhere if you want to 686 00:24:57,780 --> 00:25:02,760 but even if it does run on CPU one it is 687 00:25:00,120 --> 00:25:05,580 running as a normal task and that means 688 00:25:02,760 --> 00:25:07,260 it won't preempt high priority tasks so 689 00:25:05,580 --> 00:25:09,299 this works better for real time even if 690 00:25:07,260 --> 00:25:12,059 it is running on the same 
CPU 691 00:25:09,299 --> 00:25:14,460 all right so we have two ways you can 692 00:25:12,059 --> 00:25:16,860 run callbacks they can be not offloaded 693 00:25:14,460 --> 00:25:19,020 uh and run in softirq context which 694 00:25:16,860 --> 00:25:20,340 is a little bit more efficient or you 695 00:25:19,020 --> 00:25:23,220 can offload them and have better real 696 00:25:20,340 --> 00:25:25,740 time and HPC properties 697 00:25:23,220 --> 00:25:27,900 problem is and this is great it reduces 698 00:25:25,740 --> 00:25:30,000 jitter and saves power the saved power 699 00:25:27,900 --> 00:25:32,760 was a surprise but a very nice one 700 00:25:30,000 --> 00:25:35,520 except that you have to choose at boot 701 00:25:32,760 --> 00:25:37,320 time which CPUs are offloaded back in 702 00:25:35,520 --> 00:25:38,940 the day this wasn't a problem but we 703 00:25:37,320 --> 00:25:42,000 have lots and lots of hardware threads 704 00:25:38,940 --> 00:25:43,380 and lots of cores and having that baked 705 00:25:42,000 --> 00:25:47,100 in for all of runtime is becoming 706 00:25:43,380 --> 00:25:49,500 inconvenient and so what happens is that 707 00:25:47,100 --> 00:25:52,860 Frederic made it so that there's some 708 00:25:49,500 --> 00:25:56,400 new internal APIs rcu_nocb_cpu_offload() 709 00:25:52,860 --> 00:25:58,260 uh and rcu_nocb_cpu_deoffload() 710 00:25:56,400 --> 00:26:00,059 on the bottom there and what those 711 00:25:58,260 --> 00:26:02,940 do is you can tell one CPU to switch 712 00:26:00,059 --> 00:26:04,440 between those two modes at runtime and 713 00:26:02,940 --> 00:26:05,880 you have to have a particular your 714 00:26:04,440 --> 00:26:08,279 kernel has to be set up properly for 715 00:26:05,880 --> 00:26:09,840 this by default this you won't get this 716 00:26:08,279 --> 00:26:13,260 but then you won't get offloading in the 717 00:26:09,840 --> 00:26:14,940 first place all right now 718 00:26:13,260 --> 00:26:17,220 um you're going well I'm a system 
719 00:26:14,940 --> 00:26:18,659 administrator or I'm a user why do I 720 00:26:17,220 --> 00:26:21,600 have to write kernel code to change this 721 00:26:18,659 --> 00:26:23,820 and that's the current state this is a 722 00:26:21,600 --> 00:26:26,100 first step towards runtime adjustment of 723 00:26:23,820 --> 00:26:28,500 the nohz_full which is the 724 00:26:26,100 --> 00:26:31,740 essentially hardware response you know 725 00:26:28,500 --> 00:26:33,900 uh bare metal response from a given CPU 726 00:26:31,740 --> 00:26:35,520 and that once we get that all the way 727 00:26:33,900 --> 00:26:37,919 done this is just one step in a very 728 00:26:35,520 --> 00:26:40,020 large puzzle for this then there will 729 00:26:37,919 --> 00:26:42,059 clearly be user-space controls 730 00:26:40,020 --> 00:26:43,980 up to that point if somebody needs it 731 00:26:42,059 --> 00:26:48,059 let us know but we don't know of a use 732 00:26:43,980 --> 00:26:49,380 case for just the callback offloading 733 00:26:48,059 --> 00:26:51,720 okay so that's some great work from 734 00:26:49,380 --> 00:26:54,779 Frederic uh one of the things is 735 00:26:51,720 --> 00:26:56,120 putting SRCU on a memory diet 736 00:26:54,779 --> 00:26:59,220 back before 737 00:26:56,120 --> 00:27:02,279 v4.12 the srcu_struct was a small thing 738 00:26:59,220 --> 00:27:04,500 but the problem was that there was a 739 00:27:02,279 --> 00:27:06,240 single callback list and that meant that 740 00:27:04,500 --> 00:27:07,020 there was global contention on this 741 00:27:06,240 --> 00:27:09,179 thing 742 00:27:07,020 --> 00:27:11,940 and we were having more and more use of 743 00:27:09,179 --> 00:27:14,039 SRCU and thus more potential for there 744 00:27:11,940 --> 00:27:16,080 being lots of call_srcu()s or lots 745 00:27:14,039 --> 00:27:17,820 of synchronize_srcu()s concurrently and 746 00:27:16,080 --> 00:27:21,299 that would cause lock contention 747 00:27:17,820 --> 00:27:25,020 so in v4.12 we added 
the srcu_node 748 00:27:21,299 --> 00:27:27,779 combining tree and that's just a way of 749 00:27:25,020 --> 00:27:29,820 causing small groups of CPUs to contend 750 00:27:27,779 --> 00:27:30,659 the winner of that contends and so on up 751 00:27:29,820 --> 00:27:32,520 the tree 752 00:27:30,659 --> 00:27:34,380 so you can have a huge number of CPUs 753 00:27:32,520 --> 00:27:36,480 and keep the lock contention 754 00:27:34,380 --> 00:27:38,880 fixed and bounded 755 00:27:36,480 --> 00:27:41,340 problem is that this was allocated at 756 00:27:38,880 --> 00:27:43,860 build time and the only thing we had was 757 00:27:41,340 --> 00:27:47,100 NR_CPUS and in some distros this 758 00:27:43,860 --> 00:27:49,200 thing can range up to 4096 759 00:27:47,100 --> 00:27:50,520 um and that can be a problem and it's not 760 00:27:49,200 --> 00:27:53,700 that much of a problem if you get that 761 00:27:50,520 --> 00:27:56,760 big of a machine it's only 26k 762 00:27:53,700 --> 00:27:58,799 but uh the thing is people put 763 00:27:56,760 --> 00:28:00,120 srcu_structs inside of other structures and 764 00:27:58,799 --> 00:28:01,260 so they have a fast path that wants 765 00:28:00,120 --> 00:28:04,140 something underneath 766 00:28:01,260 --> 00:28:06,900 the srcu_struct and suddenly they 767 00:28:04,140 --> 00:28:08,520 can't use the short assembly immediate 768 00:28:06,900 --> 00:28:09,960 offsets and the compiler generates worse 769 00:28:08,520 --> 00:28:12,179 code for them on their fast path and 770 00:28:09,960 --> 00:28:14,880 that wasn't considered a friendly act 771 00:28:12,179 --> 00:28:17,419 so in v5.17 what we did was we made it 772 00:28:14,880 --> 00:28:21,419 so that the srcu_node combining tree is 773 00:28:17,419 --> 00:28:22,620 allocated separately and it's optional 774 00:28:21,419 --> 00:28:25,140 all right 775 00:28:22,620 --> 00:28:27,240 and because it's allocated separately at 776 00:28:25,140 --> 00:28:30,659 runtime instead of having to use the 
big 777 00:28:27,240 --> 00:28:33,000 NR_CPUS huge thing we can use nr_cpu_ids 778 00:28:30,659 --> 00:28:35,340 which is set to the biggest number of 779 00:28:33,000 --> 00:28:37,080 CPUs this particular system booting 780 00:28:35,340 --> 00:28:40,620 right now could possibly have which is 781 00:28:37,080 --> 00:28:42,299 usually way way smaller than 4096 so 782 00:28:40,620 --> 00:28:45,120 even if you do have to allocate the 783 00:28:42,299 --> 00:28:47,220 combining tree it's smaller and in most 784 00:28:45,120 --> 00:28:48,480 cases you don't need it all right and so 785 00:28:47,220 --> 00:28:51,179 in a lot of cases you can just have the 786 00:28:48,480 --> 00:28:55,320 srcu_struct and a null little pointer 787 00:28:51,179 --> 00:28:56,760 so uh how do you make this work well by 788 00:28:55,320 --> 00:28:58,200 default what happens the default 789 00:28:56,760 --> 00:29:00,960 boot parameters are on the bottom there 790 00:28:58,200 --> 00:29:02,640 if you have less than 128 CPUs you just 791 00:29:00,960 --> 00:29:04,620 have this srcu_struct with a null 792 00:29:02,640 --> 00:29:06,779 pointer and that's all you have 793 00:29:04,620 --> 00:29:09,000 if you have a big system then contention 794 00:29:06,779 --> 00:29:11,340 is more of a problem and so what we do 795 00:29:09,000 --> 00:29:12,600 is uh when we when you allocate the 796 00:29:11,340 --> 00:29:14,880 structure when you initialize the 797 00:29:12,600 --> 00:29:16,799 structure it unconditionally allocates 798 00:29:14,880 --> 00:29:20,179 the array but again it's smaller it's 799 00:29:16,799 --> 00:29:20,179 just nr_cpu_ids 800 00:29:20,640 --> 00:29:25,620 okay that's the default approach there 801 00:29:23,159 --> 00:29:28,200 are other approaches you can set the 802 00:29:25,620 --> 00:29:30,539 instead of setting the 803 00:29:28,200 --> 00:29:32,039 uh convert_to_big to three you can set 804 00:29:30,539 --> 00:29:34,020 it to zero in that case it says just 805 
00:29:32,039 --> 00:29:35,880 never allocate it which might be a good 806 00:29:34,020 --> 00:29:37,980 choice if on a small system 807 00:29:35,880 --> 00:29:41,480 and the other thing you can do is you 808 00:29:37,980 --> 00:29:41,480 can you can say uh 809 00:29:41,640 --> 00:29:45,059 convert_to_big is one in which case 810 00:29:43,440 --> 00:29:47,159 it'll always allocate it no matter how 811 00:29:45,059 --> 00:29:48,960 big your system is and if you have a 812 00:29:47,159 --> 00:29:51,120 situation where you have a kernel module 813 00:29:48,960 --> 00:29:53,159 that's just hammering SRCU updates 814 00:29:51,120 --> 00:29:55,020 heavily you might do that just to get it 815 00:29:53,159 --> 00:29:57,000 out of the way and there's another one 816 00:29:55,020 --> 00:29:58,860 on the bottom for rcutorture the 817 00:29:57,000 --> 00:30:00,419 other thing we're just showing the 818 00:29:58,860 --> 00:30:03,120 lower bit of this thing you can use the 819 00:30:00,419 --> 00:30:04,380 lower nibble excuse me if you use the 820 00:30:03,120 --> 00:30:06,419 next nibble up 821 00:30:04,380 --> 00:30:08,460 uh that says do conversion on contention 822 00:30:06,419 --> 00:30:10,500 and that seems like a lot nicer right 823 00:30:08,460 --> 00:30:12,840 because you just let it be there and if 824 00:30:10,500 --> 00:30:14,580 you have contention you increase it 825 00:30:12,840 --> 00:30:16,620 um like this okay 826 00:30:14,580 --> 00:30:17,820 and if you do that then it would just 827 00:30:16,620 --> 00:30:20,700 leave it alone if you have contention it 828 00:30:17,820 --> 00:30:21,600 would allocate it up uh the problem with 829 00:30:20,700 --> 00:30:24,059 this 830 00:30:21,600 --> 00:30:26,220 is that uh it hasn't yet earned our 831 00:30:24,059 --> 00:30:28,020 trust this is a large increase in state 832 00:30:26,220 --> 00:30:29,880 space there's a lot of things that can go 833 00:30:28,020 --> 00:30:31,620 wrong we need a lot of experience on it 
834 00:30:29,880 --> 00:30:32,940 before we make it be default it's there 835 00:30:31,620 --> 00:30:34,860 if you want to play with it if you need 836 00:30:32,940 --> 00:30:37,140 it great use it let us know how it works 837 00:30:34,860 --> 00:30:39,419 and at some point we will make this 838 00:30:37,140 --> 00:30:41,460 default unless something breaks but it's 839 00:30:39,419 --> 00:30:42,960 not there yet and once we make this 840 00:30:41,460 --> 00:30:45,620 the default perhaps we can remove some of 841 00:30:42,960 --> 00:30:48,299 the of the other options we'll see 842 00:30:45,620 --> 00:30:50,700 uh and the overall lesson here is memory 843 00:30:48,299 --> 00:30:53,340 is cheap compared to when I was uh much 844 00:30:50,700 --> 00:30:55,980 younger but it's not that cheap there 845 00:30:53,340 --> 00:30:58,080 are other things that uh run into there 846 00:30:55,980 --> 00:30:59,820 this is just a cheat sheet for people 847 00:30:58,080 --> 00:31:02,039 looking at the at the presentation an 848 00:30:59,820 --> 00:31:02,940 organized view of the of the kernel boot 849 00:31:02,039 --> 00:31:05,220 parameters 850 00:31:02,940 --> 00:31:07,380 another thing that happened and this is 851 00:31:05,220 --> 00:31:10,080 also an assist by Frederic Weisbecker 852 00:31:07,380 --> 00:31:13,080 is that the printk guys need an NMI-safe 853 00:31:10,080 --> 00:31:16,080 SRCU to have a lockless printk and so 854 00:31:13,080 --> 00:31:17,700 we added the NMI-safe variants of that a 855 00:31:16,080 --> 00:31:20,460 key thing here is that you can't mix and 856 00:31:17,700 --> 00:31:23,399 match if you have an srcu_struct 857 00:31:20,460 --> 00:31:25,559 and you do srcu_read_lock_nmisafe() you 858 00:31:23,399 --> 00:31:27,779 have to always use the NMI-safe variants 859 00:31:25,559 --> 00:31:29,940 of the readers on that structure 860 00:31:27,779 --> 00:31:31,919 if you ever use an srcu_read_unlock() without 861 00:31:29,940 --> 00:31:33,840 the _nmisafe you 
don't get to use NMI 862 00:31:31,919 --> 00:31:36,240 safe on that structure ever the 863 00:31:33,840 --> 00:31:38,700 reason for that is that in theory you can mix 864 00:31:36,240 --> 00:31:40,440 and match but it's really you have to be 865 00:31:38,700 --> 00:31:43,320 really careful about it 866 00:31:40,440 --> 00:31:46,080 and we also have cross-CPU SRCU readers 867 00:31:43,320 --> 00:31:48,179 now so that you don't have to be on the 868 00:31:46,080 --> 00:31:52,620 same task or CPU 869 00:31:48,179 --> 00:31:55,799 okay so specialized SRCU readers 870 00:31:52,620 --> 00:31:57,720 um uh this is something Vlad Rezki 871 00:31:55,799 --> 00:32:00,899 contributed I'm not going to go through 872 00:31:57,720 --> 00:32:02,399 that in detail but uh expedited RCU CPU 873 00:32:00,899 --> 00:32:04,919 stall warnings are something that 874 00:32:02,399 --> 00:32:06,960 used to be in the 1990s in DYNIX/ptx 875 00:32:04,919 --> 00:32:09,899 quite short they're quite long in Linux 876 00:32:06,960 --> 00:32:12,179 and he came up with 20 milliseconds for 877 00:32:09,899 --> 00:32:14,520 expedited grace periods and uh 878 00:32:12,179 --> 00:32:16,740 collection there was a few mods needed 879 00:32:14,520 --> 00:32:18,840 to make this work uh once they got those 880 00:32:16,740 --> 00:32:21,240 mods done they reduced the maximum 881 00:32:18,840 --> 00:32:22,860 latency on their tests of expedited 882 00:32:21,240 --> 00:32:25,080 grace periods from about two seconds to 883 00:32:22,860 --> 00:32:26,580 two milliseconds that's three orders of 884 00:32:25,080 --> 00:32:28,020 magnitude which is pretty freaking 885 00:32:26,580 --> 00:32:30,059 impressive 886 00:32:28,020 --> 00:32:31,919 so there were a few other issues that 887 00:32:30,059 --> 00:32:34,860 needed to be fixed Vlad Rezki took 888 00:32:31,919 --> 00:32:37,020 care of those in v6.0 and this looks like 889 00:32:34,860 --> 00:32:41,520 it's working quite well and is helping a 890 00:32:37,020 --> 
00:32:44,460 lot with some of the uh small scale 891 00:32:41,520 --> 00:32:46,500 um embedded systems 892 00:32:44,460 --> 00:32:47,760 and so at last our stall warnings at 893 00:32:46,500 --> 00:32:50,480 least for one aspect are faster than the 894 00:32:47,760 --> 00:32:50,480 1990s 895 00:32:50,940 --> 00:32:54,899 we're going to go through lazy RCU 896 00:32:52,500 --> 00:32:57,480 callbacks fairly quickly uh these this 897 00:32:54,899 --> 00:32:59,100 is a system that is mostly idle and it 898 00:32:57,480 --> 00:33:00,840 wakes up and does something every once in a while 899 00:32:59,100 --> 00:33:02,940 so we've got the app and when it wakes 900 00:33:00,840 --> 00:33:04,559 up it closes a file when it gets done 901 00:33:02,940 --> 00:33:06,360 well you close the file in your 902 00:33:04,559 --> 00:33:08,760 multi-threaded app the kernel has to do 903 00:33:06,360 --> 00:33:10,620 a call_rcu() to clean up that file 904 00:33:08,760 --> 00:33:11,880 descriptor safely and that means RCU 905 00:33:10,620 --> 00:33:13,799 schedules a grace period if these 906 00:33:11,880 --> 00:33:15,179 things happen fairly far spaced apart 907 00:33:13,799 --> 00:33:16,799 which they do if the if the thing's 908 00:33:15,179 --> 00:33:19,260 almost idle you get lots of grace 909 00:33:16,799 --> 00:33:20,880 periods and that costs you lots of 910 00:33:19,260 --> 00:33:22,799 battery power 911 00:33:20,880 --> 00:33:24,419 and my personal preference was for the 912 00:33:22,799 --> 00:33:26,279 app just to leave the file open for 913 00:33:24,419 --> 00:33:28,320 crying out loud and it'll save a lot of 914 00:33:26,279 --> 00:33:30,120 time and effort but uh organizational 915 00:33:28,320 --> 00:33:31,440 problems apparently prevented that 916 00:33:30,120 --> 00:33:33,899 straightforward solution being 917 00:33:31,440 --> 00:33:35,519 implemented so what we did instead and 918 00:33:33,899 --> 00:33:38,399 this is uh Joel Fernandes with assist 919 00:33:35,519 --> 
00:33:39,840 from Frederic Weisbecker it's in v6.2 is 920 00:33:38,399 --> 00:33:41,700 that you just wait for up to 10 seconds 921 00:33:39,840 --> 00:33:43,559 before you worry about the grace period 922 00:33:41,700 --> 00:33:45,360 after all the call_rcu() all it's going to 923 00:33:43,559 --> 00:33:46,440 do is free some memory so you know who 924 00:33:45,360 --> 00:33:49,799 cares 925 00:33:46,440 --> 00:33:51,659 and that gives us a some tens of percent 926 00:33:49,799 --> 00:33:54,360 in some cases a few percent or a few 927 00:33:51,659 --> 00:33:56,880 tens of percent uh improvement in 928 00:33:54,360 --> 00:34:00,179 battery lifetime on those systems so 929 00:33:56,880 --> 00:34:01,799 that's pretty impressive of course uh I 930 00:34:00,179 --> 00:34:03,240 would I still argue that if they fix 931 00:34:01,799 --> 00:34:05,460 this in user space and didn't do the 932 00:34:03,240 --> 00:34:07,140 call_rcu()s they'd have even more battery 933 00:34:05,460 --> 00:34:09,179 saving and who knows maybe that'll 934 00:34:07,140 --> 00:34:11,820 happen sometime 935 00:34:09,179 --> 00:34:14,339 all right and there's more details on uh 936 00:34:11,820 --> 00:34:16,740 what uh how to set it up and what its 937 00:34:14,339 --> 00:34:18,780 limitations are uh you guys already knew 938 00:34:16,740 --> 00:34:20,879 laziness is a virtue and this is yet another 939 00:34:18,780 --> 00:34:22,980 piece of proof 940 00:34:20,879 --> 00:34:24,839 there's a call_rcu_hurry() you can use if 941 00:34:22,980 --> 00:34:26,700 laziness is not a virtue for your 942 00:34:24,839 --> 00:34:27,960 particular use case that will just 943 00:34:26,700 --> 00:34:30,599 always immediately start the grace 944 00:34:27,960 --> 00:34:32,460 period if there isn't one already going 945 00:34:30,599 --> 00:34:33,780 okay uh this is a bunch of other stuff 946 00:34:32,460 --> 00:34:35,760 that's going on I'm not going to go 947 00:34:33,780 --> 00:34:37,080 through it in detail but uh you can look 948 
00:34:35,760 --> 00:34:38,820 at it if you'd like 949 00:34:37,080 --> 00:34:40,200 uh looking to the future 950 00:34:38,820 --> 00:34:41,520 uh the main thing I'm going to cover 951 00:34:40,200 --> 00:34:43,800 here is the bottom this is the common 952 00:34:41,520 --> 00:34:45,659 case most changes to RCU were things I 953 00:34:43,800 --> 00:34:48,500 would never have guessed and I don't 954 00:34:45,659 --> 00:34:48,500 expect that to change 955 00:34:48,599 --> 00:34:54,720 that's a long-term trend the upper 956 00:34:51,240 --> 00:34:56,820 line is RCU commits the lower line is 957 00:34:54,720 --> 00:34:59,040 the ones done by not me 958 00:34:56,820 --> 00:35:00,540 early on it was Dipankar Sarma doing 959 00:34:59,040 --> 00:35:03,780 most of them so most of them were not me 960 00:35:00,540 --> 00:35:06,839 but more lately since I took over for 961 00:35:03,780 --> 00:35:08,700 real-time RCU it's been mostly me and 962 00:35:06,839 --> 00:35:10,140 that's a percentage which is noisy and 963 00:35:08,700 --> 00:35:13,140 is it increasing and decreasing who 964 00:35:10,140 --> 00:35:15,720 knows so we smooth it out over two years 965 00:35:13,140 --> 00:35:19,140 uh in April the two years ending in 966 00:35:15,720 --> 00:35:21,359 April of 2017 there were 47 46 967 00:35:19,140 --> 00:35:24,000 contributors and 74% of the patches came 968 00:35:21,359 --> 00:35:25,920 from me popping ahead to March of this 969 00:35:24,000 --> 00:35:27,780 year the two years ending March of this 970 00:35:25,920 --> 00:35:30,240 year there were 67 contributors which is 971 00:35:27,780 --> 00:35:32,040 good that's more and only 60% of them 972 00:35:30,240 --> 00:35:33,839 were from me which kind of feels bad to 973 00:35:32,040 --> 00:35:36,119 me but it's actually pretty healthy for 974 00:35:33,839 --> 00:35:37,560 the community in general uh you know I I 975 00:35:36,119 --> 00:35:39,420 don't bleach my hair this is my 976 00:35:37,560 --> 00:35:42,180 natural 
hair color 977 00:35:39,420 --> 00:35:44,400 all right uh the other thing is that 978 00:35:42,180 --> 00:35:47,400 software I wrote in 1977 is still in use 979 00:35:44,400 --> 00:35:48,960 that was more than 45 years ago it seems 980 00:35:47,400 --> 00:35:51,540 like a fair bet that the Linux kernel 981 00:35:48,960 --> 00:35:54,180 might be in use 45 years from now 982 00:35:51,540 --> 00:35:55,980 admittedly the software I wrote in 1977 is 983 00:35:54,180 --> 00:35:58,680 only used in a museum but it was used 984 00:35:55,980 --> 00:36:00,000 this year so you know the danger is 985 00:35:58,680 --> 00:36:01,560 real 986 00:36:00,000 --> 00:36:03,119 and so therefore a first public 987 00:36:01,560 --> 00:36:05,160 announcement Joel Fernandes and Boqun 988 00:36:03,119 --> 00:36:06,960 Feng will be handling the RCU release 989 00:36:05,160 --> 00:36:09,420 into the 6.4 merge window they've 990 00:36:06,960 --> 00:36:11,339 started on that my goal is to have at 991 00:36:09,420 --> 00:36:14,160 least one more group uh doing that in 992 00:36:11,339 --> 00:36:15,540 2023 by the end of 2024 I'd like each of 993 00:36:14,160 --> 00:36:17,280 the five to have done a release on their 994 00:36:15,540 --> 00:36:18,599 own but we'll see how it goes I may have 995 00:36:17,280 --> 00:36:19,980 to adjust that 996 00:36:18,599 --> 00:36:22,920 and at this point I'm going to skip 997 00:36:19,980 --> 00:36:25,560 ahead through the standardization and uh 998 00:36:22,920 --> 00:36:28,020 right to the summary here uh we've been 999 00:36:25,560 --> 00:36:29,640 through this uh the main thing is that 1000 00:36:28,020 --> 00:36:31,920 RCU is still under active development 1001 00:36:29,640 --> 00:36:33,900 driven by the needs of its users past 1002 00:36:31,920 --> 00:36:35,579 present and future the and future part 1003 00:36:33,900 --> 00:36:37,380 being the thing we talked about most 1004 00:36:35,579 --> 00:36:39,960 recently 1005 00:36:37,380 --> 00:36:41,880 uh okay
oops I'm going backwards that's 1006 00:36:39,960 --> 00:36:43,980 bad this is a bunch more information 1007 00:36:41,880 --> 00:36:45,420 that's slightly available we 1008 00:36:43,980 --> 00:36:48,119 blasted through a bunch of stuff you can 1009 00:36:45,420 --> 00:36:50,880 get the full story here 1010 00:36:48,119 --> 00:36:54,740 and uh here we are again all six of us 1011 00:36:50,880 --> 00:36:54,740 and uh I'm happy to take questions 1012 00:37:01,560 --> 00:37:06,140 and if there are no questions I can ask you 1013 00:37:03,119 --> 00:37:06,140 guys questions I guess 1014 00:37:07,040 --> 00:37:10,460 anyone questions 1015 00:37:12,480 --> 00:37:17,599 nope 1016 00:37:14,579 --> 00:37:17,599 oh yep 1017 00:37:19,920 --> 00:37:25,200 are you talking about the srcu_struct in 1018 00:37:22,800 --> 00:37:27,540 there and how a lot of use cases inline 1019 00:37:25,200 --> 00:37:29,579 that into their other structures uh how 1020 00:37:27,540 --> 00:37:31,320 big is that now yeah 1021 00:37:29,579 --> 00:37:33,859 so you said that's more I was curious 1022 00:37:31,320 --> 00:37:33,859 how much more 1023 00:37:34,440 --> 00:37:38,220 um I think it's a few hundred bytes let 1024 00:37:36,420 --> 00:37:41,180 me go back and not I guess it doesn't 1025 00:37:38,220 --> 00:37:41,180 really matter but uh 1026 00:37:42,000 --> 00:37:45,900 yeah okay so 1027 00:37:44,820 --> 00:37:47,900 there 1028 00:37:45,900 --> 00:37:47,900 um 1029 00:37:48,720 --> 00:37:52,680 but if you uh so first off it's 1030 00:37:51,660 --> 00:37:54,380 going to vary from architecture to 1031 00:37:52,680 --> 00:37:58,440 architecture based on the size of things 1032 00:37:54,380 --> 00:38:00,900 uh one thing would be to uh just look 1033 00:37:58,440 --> 00:38:02,520 at it with uh with any of the tools 1034 00:38:00,900 --> 00:38:04,320 looking through the DWARF information if 1035 00:38:02,520 --> 00:38:06,359 you have a hard time getting it uh drop 1036 00:38:04,320
--> 00:38:08,760 me an email and I'll mail you back uh 1037 00:38:06,359 --> 00:38:11,579 what the size of it is for the runs I 1038 00:38:08,760 --> 00:38:14,339 run on x86 64-bit 1039 00:38:11,579 --> 00:38:18,020 but the thing is almost all of 1040 00:38:14,339 --> 00:38:18,020 srcu_struct was that combining tree 1041 00:38:20,640 --> 00:38:26,160 um actually I could try something here 1042 00:38:23,220 --> 00:38:29,480 I'll try something stupid 1043 00:38:26,160 --> 00:38:29,480 um I will try to 1044 00:38:29,820 --> 00:38:32,900 uh let's see 1045 00:38:33,720 --> 00:38:36,320 try that 1046 00:38:36,839 --> 00:38:40,140 okay forget it I won't try something 1047 00:38:38,280 --> 00:38:41,520 stupid because it's not cooperating when 1048 00:38:40,140 --> 00:38:43,440 I'm in this mode I was going to try to 1049 00:38:41,520 --> 00:38:47,359 pop up an xterm and look at it but I 1050 00:38:43,440 --> 00:38:47,359 didn't set up for that so no joy 1051 00:38:47,400 --> 00:38:50,940 but it's an excellent question again I 1052 00:38:49,500 --> 00:38:53,339 believe it's on the order of a few 1053 00:38:50,940 --> 00:38:56,540 hundred bytes it's way way smaller than 1054 00:38:53,339 --> 00:38:56,540 26 kilobytes 1055 00:38:57,440 --> 00:39:01,280 anybody else got any questions 1056 00:39:08,940 --> 00:39:14,760 sorry 1057 00:39:10,640 --> 00:39:17,400 if uh nobody has any other questions uh 1058 00:39:14,760 --> 00:39:20,520 we'll say thank you uh to Paul please 1059 00:39:17,400 --> 00:39:22,079 give him a wonderful round of applause 1060 00:39:20,520 --> 00:39:24,380 and thank you all for your time and 1061 00:39:22,079 --> 00:39:24,380 attention