1 00:00:00,000 --> 00:00:08,469 [Music] 2 00:00:00,500 --> 00:00:08,469 [Music] 3 00:00:12,139 --> 00:00:16,740 hello everyone this is I believe the 4 00:00:15,000 --> 00:00:18,660 last talk of the day 5 00:00:16,740 --> 00:00:21,000 um and we have with us uh David Gibson 6 00:00:18,660 --> 00:00:23,520 who has had a 20 plus year career 7 00:00:21,000 --> 00:00:26,400 working almost entirely on open source 8 00:00:23,520 --> 00:00:28,920 projects uh a lot of this work is in and 9 00:00:26,400 --> 00:00:30,599 around the Linux kernel and David will 10 00:00:28,920 --> 00:00:32,579 be describing an older networking 11 00:00:30,599 --> 00:00:35,460 technology uh and how it's being 12 00:00:32,579 --> 00:00:38,239 modernized for use today so please give 13 00:00:35,460 --> 00:00:38,239 David a warm welcome 14 00:00:38,650 --> 00:00:45,120 [Applause] 15 00:00:41,100 --> 00:00:48,239 hello uh I'll start by acknowledging the 16 00:00:45,120 --> 00:00:51,059 traditional custodians of this land the 17 00:00:48,239 --> 00:00:53,219 Wurundjeri Woi Wurrung people and 18 00:00:51,059 --> 00:00:56,719 I extend my respects to their Elders 19 00:00:53,219 --> 00:00:56,719 past present and emerging 20 00:00:56,879 --> 00:01:02,719 all right 21 00:00:59,420 --> 00:01:02,719 hang up 22 00:01:03,239 --> 00:01:06,240 so 23 00:01:04,320 --> 00:01:10,140 just as we start quick show of hands 24 00:01:06,240 --> 00:01:13,500 here who has used dial-up internet 25 00:01:10,140 --> 00:01:17,040 who has used a BBS system 26 00:01:13,500 --> 00:01:19,880 all right who has used uh Slirp to get 27 00:01:17,040 --> 00:01:19,880 their dial-up internet 28 00:01:19,920 --> 00:01:23,580 all right a few people 29 00:01:21,659 --> 00:01:25,860 and who recognizes the similarly dated 30 00:01:23,580 --> 00:01:28,920 quote there 31 00:01:25,860 --> 00:01:31,080 or reference rather 32 00:01:28,920 --> 00:01:34,320 anyway we're going to talk about ancient 33 00:01:31,080 --> 00:01:36,840 history
specifically the 1990s 34 00:01:34,320 --> 00:01:38,939 so this is the world of dial up 35 00:01:36,840 --> 00:01:41,400 well dial up things dial up internet so 36 00:01:38,939 --> 00:01:43,619 modems and dial up from home are an 37 00:01:41,400 --> 00:01:45,720 established technology at this point 38 00:01:43,619 --> 00:01:48,479 internet access is common at 39 00:01:45,720 --> 00:01:51,119 universities at this point and it's 40 00:01:48,479 --> 00:01:52,079 there in some organizations but not a 41 00:01:51,119 --> 00:01:54,600 lot 42 00:01:52,079 --> 00:01:57,119 commercial ISPs are still kind of 43 00:01:54,600 --> 00:01:59,520 uncommon and kind of expensive 44 00:01:57,119 --> 00:02:01,020 uh however and this will become 45 00:01:59,520 --> 00:02:02,280 important in a moment there are quite a 46 00:02:01,020 --> 00:02:05,579 few students and staff I think this is 47 00:02:02,280 --> 00:02:08,759 more in the US than here uh have access 48 00:02:05,579 --> 00:02:11,520 to dial up shell accounts but not dial 49 00:02:08,759 --> 00:02:14,700 up internet accounts 50 00:02:11,520 --> 00:02:17,099 so what does what does dial up 51 00:02:14,700 --> 00:02:18,540 look like if it's not internet 52 00:02:17,099 --> 00:02:22,680 um so a few of you might be familiar 53 00:02:18,540 --> 00:02:24,900 with BBSes these are old systems a text 54 00:02:22,680 --> 00:02:26,280 interface you dial into this system and 55 00:02:24,900 --> 00:02:28,319 you can do a bunch of stuff you have 56 00:02:26,280 --> 00:02:30,900 these menus you could exchange users 57 00:02:28,319 --> 00:02:35,040 exchange messages upload and download 58 00:02:30,900 --> 00:02:36,599 files play various games uh bits and 59 00:02:35,040 --> 00:02:39,300 pieces they vary in exactly what they 60 00:02:36,599 --> 00:02:41,580 could do but these were around and 61 00:02:39,300 --> 00:02:43,319 popular for a little while 62 00:02:41,580 --> 00:02:48,540 so what does that look like from a 63 00:02:43,319 -->
00:02:50,819 systems point of view you've got your uh 64 00:02:48,540 --> 00:02:53,280 home computer here uh probably not a 65 00:02:50,819 --> 00:02:54,000 laptop in 1995 but we'll just go with 66 00:02:53,280 --> 00:02:56,819 that 67 00:02:54,000 --> 00:02:59,760 you've got your terminal program there 68 00:02:56,819 --> 00:03:03,180 and it connects through to some 69 00:02:59,760 --> 00:03:04,319 software running on the BBS system 70 00:03:03,180 --> 00:03:07,019 um and you say surely it's more 71 00:03:04,319 --> 00:03:08,940 complicated than that and well yes 72 00:03:07,019 --> 00:03:11,120 there's modems and stuff 73 00:03:08,940 --> 00:03:14,280 but that's not what this talk is about 74 00:03:11,120 --> 00:03:17,099 for our purposes your software 75 00:03:14,280 --> 00:03:20,879 talks to their software in user space on 76 00:03:17,099 --> 00:03:22,800 both ends and uh that's how a bulletin 77 00:03:20,879 --> 00:03:24,659 board system works 78 00:03:22,800 --> 00:03:26,340 so meanwhile at the university which 79 00:03:24,659 --> 00:03:29,760 does have internet what does that look 80 00:03:26,340 --> 00:03:32,819 like they've got an internet host it's 81 00:03:29,760 --> 00:03:35,340 got some kind of real uh network 82 00:03:32,819 --> 00:03:37,620 connection to the outside world even in 83 00:03:35,340 --> 00:03:38,940 the 90s it was probably Ethernet but 84 00:03:37,620 --> 00:03:41,220 there were a few other options which 85 00:03:38,940 --> 00:03:42,900 were more common then than they are now 86 00:03:41,220 --> 00:03:44,819 and you could run whatever network 87 00:03:42,900 --> 00:03:48,659 applications on there 88 00:03:44,819 --> 00:03:52,140 um you know a Gopher server or uh what 89 00:03:48,659 --> 00:03:55,319 else was around then email of course 90 00:03:52,140 --> 00:03:57,299 uh FTP would have been around then IRC 91 00:03:55,319 --> 00:03:59,700 yeah various things and they they 92 00:03:57,299 --> 00:04:03,180 connect out to the
world uh through 93 00:03:59,700 --> 00:04:06,360 primarily TCP and UDP sockets and 94 00:04:03,180 --> 00:04:07,200 um that all goes over the the L2 95 00:04:06,360 --> 00:04:10,760 um 96 00:04:07,200 --> 00:04:10,760 Ethernet or whatever network 97 00:04:10,860 --> 00:04:15,180 and again there's more complexity here 98 00:04:13,560 --> 00:04:16,799 in what that connection actually means 99 00:04:15,180 --> 00:04:18,720 there's various levels of router and so 100 00:04:16,799 --> 00:04:21,079 forth but that's also not what this talk 101 00:04:18,720 --> 00:04:21,079 is about 102 00:04:21,479 --> 00:04:28,560 so if you did manage to get dial up into 103 00:04:24,360 --> 00:04:30,660 that at this time uh how did that happen 104 00:04:28,560 --> 00:04:32,220 so you have your home computer again you 105 00:04:30,660 --> 00:04:34,160 have some kind of access point maybe you 106 00:04:32,220 --> 00:04:36,660 know someone at the university maybe you 107 00:04:34,160 --> 00:04:38,820 uh had access to you managed to have 108 00:04:36,660 --> 00:04:41,460 access to an ISP or something 109 00:04:38,820 --> 00:04:44,040 and we've seen we can get a dial-up 110 00:04:41,460 --> 00:04:46,320 serial link from your software to their 111 00:04:44,040 --> 00:04:47,759 software so we want to change this into 112 00:04:46,320 --> 00:04:49,919 a network link 113 00:04:47,759 --> 00:04:51,960 and the way that was generally done at 114 00:04:49,919 --> 00:04:54,060 the time was using a program called 115 00:04:51,960 --> 00:04:56,280 slattach which kind of wired that serial 116 00:04:54,060 --> 00:04:58,860 link into the kernel and said please 117 00:04:56,280 --> 00:05:00,380 treat it as a network link using SLIP 118 00:04:58,860 --> 00:05:03,780 which is the protocol 119 00:05:00,380 --> 00:05:05,340 Serial Line IP so internet over a serial 120 00:05:03,780 --> 00:05:08,520 line 121 00:05:05,340 --> 00:05:10,560 now the important thing about this is it 122 00:05:08,520 -->
00:05:12,840 needs you to have root or at least some 123 00:05:10,560 --> 00:05:16,759 kind of privilege on both ends 124 00:05:12,840 --> 00:05:18,600 slattach needs to talk to the kernel 125 00:05:16,759 --> 00:05:20,460 and 126 00:05:18,600 --> 00:05:23,220 um 127 00:05:20,460 --> 00:05:26,699 and tell it to to plumb the serial link 128 00:05:23,220 --> 00:05:30,180 into the kernel's networking and so you 129 00:05:26,699 --> 00:05:33,840 need root on both ends which is why you 130 00:05:30,180 --> 00:05:36,240 can't just do this on on any old system 131 00:05:33,840 --> 00:05:38,759 so like I said ISPs are a bit uncommon 132 00:05:36,240 --> 00:05:40,100 at this stage but some people had shell 133 00:05:38,759 --> 00:05:42,240 accounts 134 00:05:40,100 --> 00:05:44,520 so they don't have root on the 135 00:05:42,240 --> 00:05:46,080 university server but they can run 136 00:05:44,520 --> 00:05:48,960 whatever they 137 00:05:46,080 --> 00:05:51,060 more or less whatever they want in user 138 00:05:48,960 --> 00:05:52,860 space on that machine so that's the the 139 00:05:51,060 --> 00:05:55,139 red is the things you control the black 140 00:05:52,860 --> 00:05:57,240 is the things you don't 141 00:05:55,139 --> 00:06:00,539 your home computer you can connect up to 142 00:05:57,240 --> 00:06:05,639 there you can run you can run Elm to do 143 00:06:00,539 --> 00:06:07,620 your email you can run a news client 144 00:06:05,639 --> 00:06:09,600 um things and they can they can do 145 00:06:07,620 --> 00:06:12,900 network stuff 146 00:06:09,600 --> 00:06:15,539 what you can't do is run Mosaic on your 147 00:06:12,900 --> 00:06:18,960 home PC and and download interlaced 148 00:06:15,539 --> 00:06:20,940 pictures of cats for example and I hear 149 00:06:18,960 --> 00:06:22,740 that was all the rage at the time 150 00:06:20,940 --> 00:06:25,380 so 151 00:06:22,740 --> 00:06:26,580 so the situation here is you've got root 152 00:06:25,380 --> 00:06:29,520 and you've
got control over the 153 00:06:26,580 --> 00:06:31,080 networking uh over here on your home 154 00:06:29,520 --> 00:06:33,180 system 155 00:06:31,080 --> 00:06:35,280 but you don't have any connectivity 156 00:06:33,180 --> 00:06:37,919 any network connectivity over here 157 00:06:35,280 --> 00:06:39,120 you have network connectivity but you 158 00:06:37,919 --> 00:06:40,919 don't have root 159 00:06:39,120 --> 00:06:42,720 so is there some way we can connect the 160 00:06:40,919 --> 00:06:44,340 two together 161 00:06:42,720 --> 00:06:46,500 and 162 00:06:44,340 --> 00:06:48,419 um there was a program called 163 00:06:46,500 --> 00:06:52,580 Slirp exactly how that's spelled and 164 00:06:48,419 --> 00:06:52,580 capitalized varies quite a bit 165 00:06:53,100 --> 00:06:56,699 um 166 00:06:54,419 --> 00:06:57,840 but on your end this looks just like a 167 00:06:56,699 --> 00:07:00,120 SLIP link 168 00:06:57,840 --> 00:07:02,880 um you you wire this SLIP link into the 169 00:07:00,120 --> 00:07:05,460 kernel but on the other end that 170 00:07:02,880 --> 00:07:08,400 link terminates in user space it doesn't 171 00:07:05,460 --> 00:07:11,520 talk to the or it doesn't talk in a 172 00:07:08,400 --> 00:07:14,000 network administration capacity to the 173 00:07:11,520 --> 00:07:16,979 uh to the server end 174 00:07:14,000 --> 00:07:19,680 uh instead it it makes magically makes 175 00:07:16,979 --> 00:07:23,819 this happen by talking regular user 176 00:07:19,680 --> 00:07:25,560 space TCP and UDP sockets so how does it 177 00:07:23,819 --> 00:07:29,819 do that 178 00:07:25,560 --> 00:07:32,940 well we can uh and uh 179 00:07:29,819 --> 00:07:34,319 the the effect of this is uh once once 180 00:07:32,940 --> 00:07:36,479 you've got this set up it looks like 181 00:07:34,319 --> 00:07:38,340 you've got a a network connection out to 182 00:07:36,479 --> 00:07:41,220 the outside world and exactly how you 183 00:07:38,340 --> 00:07:44,180 wire that in kind
of no longer matters 184 00:07:41,220 --> 00:07:47,639 how does that work we'll zoom in onto 185 00:07:44,180 --> 00:07:49,259 this university server here we've got 186 00:07:47,639 --> 00:07:52,020 the internet coming in through some kind 187 00:07:49,259 --> 00:07:55,139 of network interface and its driver 188 00:07:52,020 --> 00:07:58,080 and inside the kernel over there there's 189 00:07:55,139 --> 00:08:00,479 a TCP/IP stack which effectively splits 190 00:07:58,080 --> 00:08:02,240 that thing up into the various 191 00:08:00,479 --> 00:08:04,199 UDP and TCP 192 00:08:02,240 --> 00:08:09,120 theoretically a few other things mostly 193 00:08:04,199 --> 00:08:12,120 TCP and UDP connections into user space 194 00:08:09,120 --> 00:08:14,759 so what Slirp does is it takes another 195 00:08:12,120 --> 00:08:16,020 TCP stack specifically the one from 4.4 196 00:08:14,759 --> 00:08:19,139 BSD 197 00:08:16,020 --> 00:08:20,940 and turns it on its head and takes all 198 00:08:19,139 --> 00:08:23,639 those sockets and puts them back 199 00:08:20,940 --> 00:08:26,039 together into an L2 stream which it 200 00:08:23,639 --> 00:08:27,060 sends over a SLIP connection back to 201 00:08:26,039 --> 00:08:29,039 your end 202 00:08:27,060 --> 00:08:30,360 if you think this sounds like a gigantic 203 00:08:29,039 --> 00:08:32,700 hack 204 00:08:30,360 --> 00:08:36,680 it is 205 00:08:32,700 --> 00:08:36,680 but it was a very useful one at the time 206 00:08:40,039 --> 00:08:45,000 okay so 207 00:08:42,360 --> 00:08:48,060 let's uh dial the clock forward a bit 208 00:08:45,000 --> 00:08:50,700 ISPs became common and cheap PPP 209 00:08:48,060 --> 00:08:54,000 replaced SLIP broadband replaced dial up 210 00:08:50,700 --> 00:08:56,880 no room for this hack anymore right 211 00:08:54,000 --> 00:08:58,200 okay well virtual machines became a 212 00:08:56,880 --> 00:09:02,519 thing uh 213 00:08:58,200 --> 00:09:04,500 I guess in the 2010s ish 214 00:09:02,519 --> 00:09:05,880 so this
might look a bit odd to you 215 00:09:04,500 --> 00:09:07,440 because you might be used to thinking of 216 00:09:05,880 --> 00:09:09,540 the virtual machine as running on top of 217 00:09:07,440 --> 00:09:11,760 the host machine but what I'm looking 218 00:09:09,540 --> 00:09:14,279 at here is the network point of view 219 00:09:11,760 --> 00:09:15,779 so from that point of view the virtual 220 00:09:14,279 --> 00:09:17,399 machine is essentially an independent 221 00:09:15,779 --> 00:09:19,380 system from the host so you have the 222 00:09:17,399 --> 00:09:22,980 host machine here 223 00:09:19,380 --> 00:09:25,080 you have QEMU which creates the uh 224 00:09:22,980 --> 00:09:27,180 virtual machine there are other things 225 00:09:25,080 --> 00:09:30,300 but QEMU is what we'll talk about here 226 00:09:27,180 --> 00:09:31,980 and so that creates this uh virtual 227 00:09:30,300 --> 00:09:34,080 machine 228 00:09:31,980 --> 00:09:35,459 and you kind of automatically have a 229 00:09:34,080 --> 00:09:38,279 link from the virtual machine through to 230 00:09:35,459 --> 00:09:40,800 QEMU because QEMU creates it and emulates 231 00:09:38,279 --> 00:09:43,740 that network so you've got this emulated 232 00:09:40,800 --> 00:09:45,660 Ethernet there just by the nature of how 233 00:09:43,740 --> 00:09:47,220 it's implemented 234 00:09:45,660 --> 00:09:48,660 so what you can what can you do with 235 00:09:47,220 --> 00:09:51,000 that well one thing you can do is if 236 00:09:48,660 --> 00:09:53,760 you've got two virtual machines 237 00:09:51,000 --> 00:09:54,540 you've got two QEMUs two things wired up 238 00:09:53,760 --> 00:09:58,200 there 239 00:09:54,540 --> 00:09:59,940 you can make a socket between those QEMUs 240 00:09:58,200 --> 00:10:02,580 and you can 241 00:09:59,940 --> 00:10:03,660 connect the 242 00:10:02,580 --> 00:10:06,120 um 243 00:10:03,660 --> 00:10:08,820 connect those all back to back and give 244 00:10:06,120 --> 00:10:10,260 you what looks like a a
network 245 00:10:08,820 --> 00:10:12,360 connection directly between those two 246 00:10:10,260 --> 00:10:14,399 virtual machines 247 00:10:12,360 --> 00:10:15,839 um that's usually that's what the 248 00:10:14,399 --> 00:10:17,640 -net socket or -net stream 249 00:10:15,839 --> 00:10:19,260 options of QEMU 250 00:10:17,640 --> 00:10:23,519 um they're basically equivalent stream's 251 00:10:19,260 --> 00:10:25,980 just a bit newer for reasons 252 00:10:23,519 --> 00:10:27,360 but connecting to another VM isn't all 253 00:10:25,980 --> 00:10:29,040 that exciting you really want to connect 254 00:10:27,360 --> 00:10:31,560 to the outside world 255 00:10:29,040 --> 00:10:34,980 so how would you do that 256 00:10:31,560 --> 00:10:37,140 with the QEMU -net tap option and what that 257 00:10:34,980 --> 00:10:39,720 does is 258 00:10:37,140 --> 00:10:42,060 uh it takes that emulated thing and it 259 00:10:39,720 --> 00:10:44,459 uses the tap device or the tun device 260 00:10:42,060 --> 00:10:47,040 there's a few other variants of it 261 00:10:44,459 --> 00:10:49,980 and wires those up to give you what 262 00:10:47,040 --> 00:10:52,079 looks like a network connection from the 263 00:10:49,980 --> 00:10:55,019 virtual machine to the host machine 264 00:10:52,079 --> 00:10:57,120 but you'll notice again here that again 265 00:10:55,019 --> 00:10:59,040 you need root or at least some kind of 266 00:10:57,120 --> 00:11:01,980 elevated privilege on that host machine 267 00:10:59,040 --> 00:11:04,560 in order to wire up the tap device into 268 00:11:01,980 --> 00:11:06,899 the host's networking stack 269 00:11:04,560 --> 00:11:09,360 if you're deploying a you know 270 00:11:06,899 --> 00:11:10,980 production system of VMs that's probably 271 00:11:09,360 --> 00:11:12,540 not a problem you can wire everything up 272 00:11:10,980 --> 00:11:14,820 you set up your VMs to have the right 273 00:11:12,540 --> 00:11:16,140 sort of accessibility this is typically 274 00:11:14,820
--> 00:11:18,000 what you do 275 00:11:16,140 --> 00:11:20,700 if however you're just developing you're 276 00:11:18,000 --> 00:11:23,220 just testing you don't really want to be 277 00:11:20,700 --> 00:11:24,959 root for this so 278 00:11:23,220 --> 00:11:26,820 now you've got a host machine where you 279 00:11:24,959 --> 00:11:28,860 don't have privilege 280 00:11:26,820 --> 00:11:30,420 and you want to give network to your VM 281 00:11:28,860 --> 00:11:33,240 and this should look familiar at this 282 00:11:30,420 --> 00:11:34,220 point you've got one place where you 283 00:11:33,240 --> 00:11:36,420 have 284 00:11:34,220 --> 00:11:38,220 root access in fact you completely 285 00:11:36,420 --> 00:11:39,959 control the virtual hardware 286 00:11:38,220 --> 00:11:41,940 but no connectivity 287 00:11:39,959 --> 00:11:44,279 and you've got another place where you 288 00:11:41,940 --> 00:11:45,660 have connectivity but you don't have 289 00:11:44,279 --> 00:11:48,620 root access you don't have network 290 00:11:45,660 --> 00:11:48,620 administrative access 291 00:11:49,380 --> 00:11:57,240 so we use Slirp uh QEMU's -net user is 292 00:11:53,519 --> 00:11:59,459 Slirp it's actually based on a newer 293 00:11:57,240 --> 00:12:02,519 version of that very same code base in a 294 00:11:59,459 --> 00:12:04,459 form called libslirp and it basically 295 00:12:02,519 --> 00:12:08,160 works exactly the same way it takes the 296 00:12:04,459 --> 00:12:10,500 emulated Ethernet of QEMU wires it up to 297 00:12:08,160 --> 00:12:13,260 a bunch of sockets that QEMU sends out to 298 00:12:10,500 --> 00:12:17,160 the outside world and it looks like you 299 00:12:13,260 --> 00:12:19,380 have network connections hooray 300 00:12:17,160 --> 00:12:21,720 VMs are a bit passé these days these 301 00:12:19,380 --> 00:12:24,620 days it's all about containers so how 302 00:12:21,720 --> 00:12:24,620 does networking work there 303 00:12:25,380 --> 00:12:30,060 well in this case 304 00:12:26,820
--> 00:12:32,700 um Podman could be Docker whatever 305 00:12:30,060 --> 00:12:34,200 uh uh just sets up a virtual Ethernet 306 00:12:32,700 --> 00:12:35,820 device between your container and the 307 00:12:34,200 --> 00:12:38,100 host the data doesn't actually go 308 00:12:35,820 --> 00:12:40,019 through Podman in this case it just 309 00:12:38,100 --> 00:12:42,000 configures it 310 00:12:40,019 --> 00:12:44,160 um but this is typically how it does it 311 00:12:42,000 --> 00:12:46,680 it creates a veth device uh could be 312 00:12:44,160 --> 00:12:49,620 macvlan there's a few options 313 00:12:46,680 --> 00:12:51,720 um wires them straight up again 314 00:12:49,620 --> 00:12:54,060 needs root on the host 315 00:12:51,720 --> 00:12:56,160 uh or at least some kind of raised 316 00:12:54,060 --> 00:12:59,240 privilege some kind of privileged helper 317 00:12:56,160 --> 00:12:59,240 program something like that 318 00:12:59,399 --> 00:13:03,420 typical way it's done 319 00:13:01,380 --> 00:13:05,519 more recently 320 00:13:03,420 --> 00:13:08,040 uh Podman has added an option for what 321 00:13:05,519 --> 00:13:09,959 they call rootless networking so how can 322 00:13:08,040 --> 00:13:12,600 you give a container network access if 323 00:13:09,959 --> 00:13:14,579 you don't have root on host 324 00:13:12,600 --> 00:13:16,560 and it's Slirp 325 00:13:14,579 --> 00:13:20,220 in particular it's a variant called 326 00:13:16,560 --> 00:13:22,260 slirp4netns that is Slirp hooked up to 327 00:13:20,220 --> 00:13:24,839 talk to a network namespace like you have in 328 00:13:22,260 --> 00:13:26,399 a container rather than a VM or a 329 00:13:24,839 --> 00:13:27,959 virtual Ethernet or or anything else 330 00:13:26,399 --> 00:13:29,519 like that 331 00:13:27,959 --> 00:13:32,399 so 332 00:13:29,519 --> 00:13:35,399 so this supposedly ancient hack is is 333 00:13:32,399 --> 00:13:37,980 still seeing some life 334 00:13:35,399 --> 00:13:39,720 um little diversion uh containers and 335
00:13:37,980 --> 00:13:41,700 Kubernetes together 336 00:13:39,720 --> 00:13:45,120 um people may or may not have heard of of 337 00:13:41,700 --> 00:13:47,459 KubeVirt it's a it's an extension to 338 00:13:45,120 --> 00:13:51,120 Kubernetes essentially designed to run 339 00:13:47,459 --> 00:13:52,440 VM workloads on a Kubernetes cluster 340 00:13:51,120 --> 00:13:54,540 um this is quite useful if you've got 341 00:13:52,440 --> 00:13:56,839 existing VM workloads and you want to 342 00:13:54,540 --> 00:13:59,399 move them into cloud environments 343 00:13:56,839 --> 00:14:01,680 and specifically how it does that is it 344 00:13:59,399 --> 00:14:04,620 essentially runs QEMU inside a Kubernetes 345 00:14:01,680 --> 00:14:06,240 pod to fire up the VMs but the important 346 00:14:04,620 --> 00:14:08,519 thing is it's not just enough to run 347 00:14:06,240 --> 00:14:10,399 these VMs they need to talk to the rest 348 00:14:08,519 --> 00:14:13,220 of the cluster because you could have 349 00:14:10,399 --> 00:14:17,100 apps that are split across some 350 00:14:13,220 --> 00:14:20,040 containerized pods some VM pods 351 00:14:17,100 --> 00:14:22,019 whatever so all these components need to 352 00:14:20,040 --> 00:14:24,060 talk to each other the VM networking 353 00:14:22,019 --> 00:14:26,100 needs to integrate with the Kubernetes 354 00:14:24,060 --> 00:14:27,660 networking 355 00:14:26,100 --> 00:14:29,040 right so what's that's gonna what is 356 00:14:27,660 --> 00:14:31,320 that going to look like 357 00:14:29,040 --> 00:14:32,220 you've got a Kubernetes pod 358 00:14:31,320 --> 00:14:34,680 um 359 00:14:32,220 --> 00:14:36,120 it talks to the internet but more 360 00:14:34,680 --> 00:14:38,339 immediately it talks to the rest of the 361 00:14:36,120 --> 00:14:40,680 cluster that's kind of significant in 362 00:14:38,339 --> 00:14:43,079 this case inside the pod you've got QEMU 363 00:14:40,680 --> 00:14:44,820 it creates a virtual machine again 364 00:14:43,079 -->
00:14:47,339 this might look a bit odd the virtual 365 00:14:44,820 --> 00:14:48,899 machine not being inside the pod as it 366 00:14:47,339 --> 00:14:50,760 were but this is a networking view and 367 00:14:48,899 --> 00:14:54,420 from that point of view it's a it's a 368 00:14:50,760 --> 00:14:56,220 different network space from the uh it's 369 00:14:54,420 --> 00:15:00,240 a different host in the networking sense 370 00:14:56,220 --> 00:15:02,880 from the from the pod 371 00:15:00,240 --> 00:15:04,620 and so you might say well KubeVirt creates 372 00:15:02,880 --> 00:15:06,720 that pod so it's got administrative 373 00:15:04,620 --> 00:15:10,500 access over it right so it can just do 374 00:15:06,720 --> 00:15:12,540 the typical thing wires up the macvlan 375 00:15:10,500 --> 00:15:15,060 um gives you a virtual Ethernet 376 00:15:12,540 --> 00:15:18,540 hunky dory 377 00:15:15,060 --> 00:15:20,339 it can uh it does that right now it 378 00:15:18,540 --> 00:15:22,079 works for some things 379 00:15:20,339 --> 00:15:24,060 but not for other things there's 380 00:15:22,079 --> 00:15:26,760 actually two modes it can use 381 00:15:24,060 --> 00:15:28,199 um there's bridge mode uh I there's a 382 00:15:26,760 --> 00:15:31,199 lot of details I could go into here 383 00:15:28,199 --> 00:15:32,880 that's kind of a focus of this 384 00:15:31,199 --> 00:15:34,680 um so if this doesn't make a lot of 385 00:15:32,880 --> 00:15:36,540 sense to you it doesn't really matter 386 00:15:34,680 --> 00:15:37,199 we'll come back to other things in a 387 00:15:36,540 --> 00:15:38,579 minute 388 00:15:37,199 --> 00:15:41,160 you've got bridge mode in bridge mode 389 00:15:38,579 --> 00:15:42,600 the VM essentially takes over the IP 390 00:15:41,160 --> 00:15:44,600 address of the pod takes over the 391 00:15:42,600 --> 00:15:47,760 network interface of the pod 392 00:15:44,600 --> 00:15:49,380 uh that works great when it works uh 393 00:15:47,760 --> 00:15:51,600 pods can have multiple
containers in 394 00:15:49,380 --> 00:15:53,579 them if you have other containers often 395 00:15:51,600 --> 00:15:54,779 called sidecar containers this will 396 00:15:53,579 --> 00:15:58,079 completely break them because they don't 397 00:15:54,779 --> 00:16:00,120 have an IP anymore so 398 00:15:58,079 --> 00:16:03,240 that's good until it's not there's also 399 00:16:00,120 --> 00:16:06,720 masquerading mode in that way 400 00:16:03,240 --> 00:16:09,420 it uses the kernel NAT masquerading 401 00:16:06,720 --> 00:16:11,880 to connect the VM's interface to the 402 00:16:09,420 --> 00:16:14,279 external pod interface 403 00:16:11,880 --> 00:16:16,560 uh there's a couple of problems here one 404 00:16:14,279 --> 00:16:18,240 is that the guest doesn't see the the 405 00:16:16,560 --> 00:16:19,079 doesn't see the same IP as the pod 406 00:16:18,240 --> 00:16:22,620 anymore 407 00:16:19,079 --> 00:16:25,079 and Kubernetes apps tend to assume 408 00:16:22,620 --> 00:16:27,060 that the pod IP is something that 409 00:16:25,079 --> 00:16:28,620 everybody agrees on that that all the 410 00:16:27,060 --> 00:16:31,160 components of the app know about and 411 00:16:28,620 --> 00:16:33,660 don't have to translate in any way 412 00:16:31,160 --> 00:16:35,459 it also breaks service meshes this is 413 00:16:33,660 --> 00:16:36,000 something I don't know a lot about 414 00:16:35,459 --> 00:16:38,279 um 415 00:16:36,000 --> 00:16:41,459 it's related to the fact that in service 416 00:16:38,279 --> 00:16:43,800 meshes uh a sidecar 417 00:16:41,459 --> 00:16:46,259 a sidecar routes the traffic and expects 418 00:16:43,800 --> 00:16:47,579 it to come from the pod's user space and 419 00:16:46,259 --> 00:16:49,680 in this case because of the way it's 420 00:16:47,579 --> 00:16:51,660 routed through the kernel NAT it's not 421 00:16:49,680 --> 00:16:53,759 coming from user space and everything 422 00:16:51,660 --> 00:16:55,440 falls down horribly 423 00:16:53,759 -->
00:16:56,820 so we need something different in order 424 00:16:55,440 --> 00:16:59,040 to get KubeVirt there this is the original 425 00:16:56,820 --> 00:17:01,139 motivating case for passt and pasta which 426 00:16:59,040 --> 00:17:03,660 I'm going to talk about in a minute 427 00:17:01,139 --> 00:17:05,040 we need a different approach um in 428 00:17:03,660 --> 00:17:07,260 addition to those kind of fundamental 429 00:17:05,040 --> 00:17:09,000 problems actually although you can get 430 00:17:07,260 --> 00:17:10,400 privilege in that Kubernetes pod it's 431 00:17:09,000 --> 00:17:12,839 kind of a pain 432 00:17:10,400 --> 00:17:14,939 Kubernetes tends to not make that very 433 00:17:12,839 --> 00:17:17,280 easy and KubeVirt has to use a bunch of 434 00:17:14,939 --> 00:17:19,860 tricks in order to get the privilege it 435 00:17:17,280 --> 00:17:22,559 needs to set that up 436 00:17:19,860 --> 00:17:25,140 so we want VM traffic to appear as if it's 437 00:17:22,559 --> 00:17:27,720 coming from the pod user space we want it 438 00:17:25,140 --> 00:17:29,040 to connect the L2 to the L4 interface we 439 00:17:27,720 --> 00:17:31,020 already have 440 00:17:29,040 --> 00:17:34,500 Slirp again right 441 00:17:31,020 --> 00:17:36,539 well no not quite 442 00:17:34,500 --> 00:17:39,720 there's a few problems with Slirp the 443 00:17:36,539 --> 00:17:41,580 big one is that Slirp always NATs Slirp 444 00:17:39,720 --> 00:17:43,559 obviously being a user space program has 445 00:17:41,580 --> 00:17:45,900 no power to allocate new IP addresses 446 00:17:43,559 --> 00:17:47,100 out of anywhere or to control routing 447 00:17:45,900 --> 00:17:49,700 on the host 448 00:17:47,100 --> 00:17:52,620 uh so the way it does that is it gives 449 00:17:49,700 --> 00:17:55,080 the guest 450 00:17:52,620 --> 00:17:57,480 an address on a on a private network 451 00:17:55,080 --> 00:18:00,419 usually a 10 dot network 452 00:17:57,480 --> 00:18:02,460 and inside the Slirp code this is not 453
00:18:00,419 --> 00:18:05,340 using the kernel NAT but inside Slirp's 454 00:18:02,460 --> 00:18:07,799 internal TCP stack it includes address 455 00:18:05,340 --> 00:18:09,840 translation to translate that into 456 00:18:07,799 --> 00:18:12,299 addresses in the outside world 457 00:18:09,840 --> 00:18:14,820 like I said Kubernetes really isn't like 458 00:18:12,299 --> 00:18:18,380 that uh pod IPs are expected to be 459 00:18:14,820 --> 00:18:18,380 global across the cluster 460 00:18:19,520 --> 00:18:23,340 security is not great either it's got a 461 00:18:21,960 --> 00:18:26,039 pretty poor track record particularly 462 00:18:23,340 --> 00:18:27,299 for resource leaks memory leaks and the 463 00:18:26,039 --> 00:18:29,460 like 464 00:18:27,299 --> 00:18:31,320 a complete TCP/IP stack is actually pretty 465 00:18:29,460 --> 00:18:33,120 complicated so there's a fairly large 466 00:18:31,320 --> 00:18:35,100 attack surface there and Slirp has a 467 00:18:33,120 --> 00:18:37,200 bunch of extra features for 468 00:18:35,100 --> 00:18:39,360 this that and the other increases it 469 00:18:37,200 --> 00:18:42,600 still further it's an old code base 470 00:18:39,360 --> 00:18:44,820 fairly difficult to maintain 471 00:18:42,600 --> 00:18:48,179 and in the case of libslirp as it's 472 00:18:44,820 --> 00:18:52,140 used in QEMU that is literally a library 473 00:18:48,179 --> 00:18:54,720 it shares address space with QEMU so 474 00:18:52,140 --> 00:18:56,340 if you compromise some of QEMU you can 475 00:18:54,720 --> 00:18:58,980 attack libslirp and the other way 476 00:18:56,340 --> 00:19:01,140 around so we've got no isolation between 477 00:18:58,980 --> 00:19:03,600 the components there 478 00:19:01,140 --> 00:19:04,980 performance is also pretty bad 479 00:19:03,600 --> 00:19:06,299 um because Slirp was never really built 480 00:19:04,980 --> 00:19:08,460 for performance it turns out that even 481 00:19:06,299 --> 00:19:10,380 in the mid 90s it's pretty easy to keep
482 00:19:08,460 --> 00:19:12,660 up with a with a modem with a dial-up 483 00:19:10,380 --> 00:19:14,400 modem 484 00:19:12,660 --> 00:19:16,620 um since then computers have changed a 485 00:19:14,400 --> 00:19:18,720 bit networks have changed a lot and you 486 00:19:16,620 --> 00:19:22,260 really need different techniques to have 487 00:19:18,720 --> 00:19:24,900 uh decent uh performance 488 00:19:22,260 --> 00:19:26,940 in particular Slirp has no support for 489 00:19:24,900 --> 00:19:29,820 TCP window scaling 490 00:19:26,940 --> 00:19:32,760 um which means you could have at most 64k 491 00:19:29,820 --> 00:19:36,840 of data kind of in flight without an ack 492 00:19:32,760 --> 00:19:39,740 which is plenty on a 56k modem but not 493 00:19:36,840 --> 00:19:45,240 on a multi-gigabit 494 00:19:39,740 --> 00:19:48,299 virtual network link on your container 495 00:19:45,240 --> 00:19:50,400 uh it also has uh not very much IPv6 496 00:19:48,299 --> 00:19:51,480 support it's kind of been hacked in but 497 00:19:50,400 --> 00:19:52,740 there's a bunch of things it doesn't 498 00:19:51,480 --> 00:19:54,720 support 499 00:19:52,740 --> 00:19:57,000 all right so where are we now we want 500 00:19:54,720 --> 00:19:59,419 something that's like Slirp but not 501 00:19:57,000 --> 00:19:59,419 Slirp 502 00:20:00,780 --> 00:20:06,900 and here is passt 503 00:20:02,640 --> 00:20:10,260 so passt is a modern written from scratch 504 00:20:06,900 --> 00:20:12,780 L2 to L4 bridge so 505 00:20:10,260 --> 00:20:15,660 same concept as Slirp connecting a 506 00:20:12,780 --> 00:20:18,120 virtual Ethernet or a virtual L2 network 507 00:20:15,660 --> 00:20:21,240 connection to regular old user 508 00:20:18,120 --> 00:20:23,280 unprivileged TCP sockets 509 00:20:21,240 --> 00:20:24,720 uh stands for Plug A Simple Socket 510 00:20:23,280 --> 00:20:27,900 Transport 511 00:20:24,720 --> 00:20:31,559 um I believe it's passt is is a word 512 00:20:27,900 --> 00:20:33,360 meaning kind of adequate in German kind
German kind 513 00:20:31,559 --> 00:20:35,400 of slang 514 00:20:33,360 --> 00:20:38,580 um it was originally written by Stefano 515 00:20:35,400 --> 00:20:41,820 Brivio starting late in 2020 516 00:20:38,580 --> 00:20:43,460 I started working on it uh just under a 517 00:20:41,820 --> 00:20:46,440 year ago 518 00:20:43,460 --> 00:20:50,900 and so I'm the second main contributor 519 00:20:46,440 --> 00:20:53,400 to it it's about 9,500 lines of C 520 00:20:50,900 --> 00:20:55,620 excluding comments 521 00:20:53,400 --> 00:20:58,980 and we've got a bunch of particular 522 00:20:55,620 --> 00:21:01,820 design goals uh for this to 523 00:20:58,980 --> 00:21:05,580 address those problems we saw with slirp 524 00:21:01,820 --> 00:21:08,100 so we want to avoid dependencies so that 525 00:21:05,580 --> 00:21:10,860 we don't have a big supply chain of 526 00:21:08,100 --> 00:21:14,340 dependencies to to check 527 00:21:10,860 --> 00:21:18,419 we want to avoid NAT we can't completely 528 00:21:14,340 --> 00:21:19,580 but we mostly can and we'll see how in a 529 00:21:18,419 --> 00:21:22,200 moment 530 00:21:19,580 --> 00:21:24,600 uh we want to avoid dynamic memory 531 00:21:22,200 --> 00:21:26,160 allocation entirely if you're not 532 00:21:24,600 --> 00:21:28,080 allocating memory dynamically it's very 533 00:21:26,160 --> 00:21:30,480 hard to leak it 534 00:21:28,080 --> 00:21:32,280 we want it to perform reasonably we want 535 00:21:30,480 --> 00:21:34,200 it to be security conscious and we want 536 00:21:32,280 --> 00:21:37,640 it to support IPv6 537 00:21:34,200 --> 00:21:37,640 just as well as IPv4 538 00:21:37,919 --> 00:21:44,220 so how does this work we have passt and 539 00:21:40,799 --> 00:21:46,380 we have QEMU running on the same host I 540 00:21:44,220 --> 00:21:47,820 ran out of space a bit there so that QEMU 541 00:21:46,380 --> 00:21:50,240 is running on the host it's just a bit 542 00:21:47,820 --> 00:21:50,240 offset 543 00:21:50,640 -->
00:21:54,840 we connect them with a Unix domain 544 00:21:52,919 --> 00:21:56,820 socket and what we do here is we use 545 00:21:54,840 --> 00:21:58,980 exactly the same protocol that we talked 546 00:21:56,820 --> 00:22:01,320 about briefly earlier where QEMU can 547 00:21:58,980 --> 00:22:03,419 talk to another QEMU back to back and do 548 00:22:01,320 --> 00:22:06,260 networking over that so we talk that 549 00:22:03,419 --> 00:22:09,120 same QEMU socket protocol 550 00:22:06,260 --> 00:22:12,020 and using that we connect our virtual 551 00:22:09,120 --> 00:22:15,720 machine up to passt it does the 552 00:22:12,020 --> 00:22:17,460 slirp-like bridging thing and we have a 553 00:22:15,720 --> 00:22:21,600 network connection so this is again 554 00:22:17,460 --> 00:22:23,400 using the -netdev stream option in QEMU 555 00:22:21,600 --> 00:22:24,780 it does actually have to be -netdev 556 00:22:23,400 --> 00:22:27,179 stream these days this is one of those 557 00:22:24,780 --> 00:22:29,520 slight differences -netdev socket 558 00:22:27,179 --> 00:22:32,280 could only do TCP sockets and we prefer 559 00:22:29,520 --> 00:22:34,700 to do this over a Unix domain socket minor 560 00:22:32,280 --> 00:22:34,700 details 561 00:22:35,700 --> 00:22:40,140 like internally 562 00:22:37,559 --> 00:22:43,200 um so over here on the left we have this 563 00:22:40,140 --> 00:22:45,179 virtual L2 so this is coming in over the 564 00:22:43,200 --> 00:22:47,580 socket from QEMU 565 00:22:45,179 --> 00:22:50,039 uh we essentially split that up based on 566 00:22:47,580 --> 00:22:51,539 the protocol uh TCP and UDP are simple 567 00:22:50,039 --> 00:22:55,799 enough we just translate those into 568 00:22:51,539 --> 00:22:59,820 socket calls on the host kernel 569 00:22:55,799 --> 00:23:02,580 uh we can't deal uh entirely with ICMP 570 00:22:59,820 --> 00:23:05,460 and ICMPv6 we can handle some things 571 00:23:02,580 --> 00:23:07,559 though in particular uh in modern kernels 572 00:23:05,460
--> 00:23:10,260 we have what are called ping sockets 573 00:23:07,559 --> 00:23:12,179 which do allow you to do pings as 574 00:23:10,260 --> 00:23:13,320 regular unprivileged sockets so we can 575 00:23:12,179 --> 00:23:15,659 do that 576 00:23:13,320 --> 00:23:18,720 and for ICMPv6 there's a subset 577 00:23:15,659 --> 00:23:21,179 called NDP neighbor discovery protocol so 578 00:23:18,720 --> 00:23:22,740 that's how the guest can find out an 579 00:23:21,179 --> 00:23:24,720 address find out what its route is 580 00:23:22,740 --> 00:23:26,400 supposed to be we internally answer 581 00:23:24,720 --> 00:23:28,020 queries from that so that the guest does 582 00:23:26,400 --> 00:23:31,260 what we expect it to 583 00:23:28,020 --> 00:23:34,440 likewise we answer DHCP and DHCPv6 584 00:23:31,260 --> 00:23:37,679 and ARP requests all internally 585 00:23:34,440 --> 00:23:39,059 and to do that we use netlink to talk to 586 00:23:37,679 --> 00:23:42,179 the host kernel to get the information 587 00:23:39,059 --> 00:23:45,120 we need to supply there 588 00:23:42,179 --> 00:23:47,580 so how do we avoid NAT 589 00:23:45,120 --> 00:23:50,940 uh the guest sees the same IP address 590 00:23:47,580 --> 00:23:53,179 as the host even for IPv6 yes that's 591 00:23:50,940 --> 00:23:53,179 weird 592 00:23:53,520 --> 00:24:01,980 uh and it gets that IP address from uh 593 00:23:58,679 --> 00:24:05,460 it can get it from DHCP or DHCPv6 or 594 00:24:01,980 --> 00:24:07,320 from NDP any of those methods will let 595 00:24:05,460 --> 00:24:08,760 the guest discover the address it's 596 00:24:07,320 --> 00:24:10,700 supposed to have which it'll be told is the 597 00:24:08,760 --> 00:24:13,260 same address as the host 598 00:24:10,700 --> 00:24:15,840 you can configure it but by default it 599 00:24:13,260 --> 00:24:17,640 takes it from the host interface with the 600 00:24:15,840 --> 00:24:20,940 default route so it'll 601 00:24:17,640 --> 00:24:23,700 usually take
the the primary in a sense 602 00:24:20,940 --> 00:24:25,620 network interface of the host 603 00:24:23,700 --> 00:24:27,780 this works great for the guest 604 00:24:25,620 --> 00:24:29,340 connecting out to the outside world 605 00:24:27,780 --> 00:24:31,200 but obviously if you've got the same 606 00:24:29,340 --> 00:24:34,260 address as the host there's no way to 607 00:24:31,200 --> 00:24:36,840 actually address the host itself 608 00:24:34,260 --> 00:24:40,020 we have a special case NAT for that so 609 00:24:36,840 --> 00:24:42,440 we take a special address that the guest 610 00:24:40,020 --> 00:24:45,539 can use to address the host 611 00:24:42,440 --> 00:24:47,760 actually specifically it's mapped to the 612 00:24:45,539 --> 00:24:50,520 to 127.0.0.1 to the loopback address on the 613 00:24:47,760 --> 00:24:52,380 host so we actually talk to the guest as if 614 00:24:50,520 --> 00:24:54,840 uh we talk to the host as if we were 615 00:24:52,380 --> 00:24:58,080 local rather than talk to the host as if 616 00:24:54,840 --> 00:25:00,360 we were from somewhere else which is 617 00:24:58,080 --> 00:25:02,280 usually what we want 618 00:25:00,360 --> 00:25:04,440 this is a bit limited I'll come back to 619 00:25:02,280 --> 00:25:07,260 this in a minute there's also another 620 00:25:04,440 --> 00:25:08,159 special case for handling DNS queries 621 00:25:07,260 --> 00:25:09,900 um 622 00:25:08,159 --> 00:25:12,419 but I mean this is a different trade-off 623 00:25:09,900 --> 00:25:15,120 from NAT it's in some ways less 624 00:25:12,419 --> 00:25:17,820 elegant than using NAT 625 00:25:15,120 --> 00:25:20,820 but it works much nicer in a thing like 626 00:25:17,820 --> 00:25:24,179 Kubernetes where NAT really confuses 627 00:25:20,820 --> 00:25:26,900 higher level software 628 00:25:24,179 --> 00:25:26,900 security 629 00:25:27,840 --> 00:25:33,000 we have a bunch of bunch of approaches 630 00:25:30,600 --> 00:25:34,440 um using some more modern socket 631
00:25:33,000 --> 00:25:37,320 interfaces 632 00:25:34,440 --> 00:25:38,880 uh we don't need a complete TCP state 633 00:25:37,320 --> 00:25:41,039 machine we can get away with a kind of 634 00:25:38,880 --> 00:25:43,679 cut down one that simplifies things 635 00:25:41,039 --> 00:25:46,260 smaller attack surface like I said we avoid 636 00:25:43,679 --> 00:25:50,640 dynamic allocation completely and that's 637 00:25:46,260 --> 00:25:51,900 actually enforced with seccomp so even 638 00:25:50,640 --> 00:25:53,820 if something gets in and tries to 639 00:25:51,900 --> 00:25:55,080 allocate memory it'll get a SIGSYS and 640 00:25:53,820 --> 00:25:58,980 just die 641 00:25:55,080 --> 00:26:01,080 we use a bunch of uh static checkers the 642 00:25:58,980 --> 00:26:04,620 routine checks are cppcheck and clang-tidy 643 00:26:01,080 --> 00:26:05,700 we run Coverity Scan from time to 644 00:26:04,620 --> 00:26:07,980 time 645 00:26:05,700 --> 00:26:09,539 no external dependencies so what you see 646 00:26:07,980 --> 00:26:12,919 is what you get it does use the standard 647 00:26:09,539 --> 00:26:12,919 C library but nothing more than that 648 00:26:13,140 --> 00:26:19,860 moreover we do a bunch of sort of second 649 00:26:16,440 --> 00:26:23,880 layer security things we isolate 650 00:26:19,860 --> 00:26:26,159 ourselves using Linux namespaces so 651 00:26:23,880 --> 00:26:28,500 and we actually use pivot_root so once 652 00:26:26,159 --> 00:26:30,419 the thing has configured itself you 653 00:26:28,500 --> 00:26:33,840 cannot touch the host file system from 654 00:26:30,419 --> 00:26:36,240 inside the passt process context 655 00:26:33,840 --> 00:26:38,100 um and we also isolate a bunch of other 656 00:26:36,240 --> 00:26:41,820 namespaces so 657 00:26:38,100 --> 00:26:43,500 most of the host systems you can't uh 658 00:26:41,820 --> 00:26:46,500 after at least after configuration you 659 00:26:43,500 --> 00:26:48,600 can no longer touch on the host uh like 660 00:26:46,500
--> 00:26:51,480 I said we use seccomp we filter down to 661 00:26:48,600 --> 00:26:54,840 currently only 26 syscalls on x86_64 662 00:26:51,480 --> 00:26:56,460 the exact set we need uh varies 663 00:26:54,840 --> 00:26:59,279 depending on the architecture for 664 00:26:56,460 --> 00:27:02,940 complicated reasons 665 00:26:59,279 --> 00:27:04,380 uh we drop capabilities uh in both where 666 00:27:02,940 --> 00:27:07,080 we started and in that isolated 667 00:27:04,380 --> 00:27:08,700 namespace when you create a namespace 668 00:27:07,080 --> 00:27:10,559 you get capabilities in there but we 669 00:27:08,700 --> 00:27:14,460 drop them again 670 00:27:10,559 --> 00:27:16,799 and we ship with AppArmor and SELinux 671 00:27:14,460 --> 00:27:19,740 profiles included so we've sort of got 672 00:27:16,799 --> 00:27:22,020 multiple layers of isolation here to try 673 00:27:19,740 --> 00:27:24,059 and keep this secure 674 00:27:22,020 --> 00:27:26,820 uh performance 675 00:27:24,059 --> 00:27:29,039 um we do a bunch of things in TCP we 676 00:27:26,820 --> 00:27:31,559 actually advertise a very large MTU to 677 00:27:29,039 --> 00:27:36,360 the guest a 64k MTU so much bigger than 678 00:27:31,559 --> 00:27:40,440 a normal Ethernet would be like um 1.5k 679 00:27:36,360 --> 00:27:41,760 and we coalesce our TCP segments as they 680 00:27:40,440 --> 00:27:42,539 come in 681 00:27:41,760 --> 00:27:45,000 um 682 00:27:42,539 --> 00:27:47,000 which means that even if we've got a lot 683 00:27:45,000 --> 00:27:50,220 of packets coming in from the outside 684 00:27:47,000 --> 00:27:52,919 we can do relatively few packets into 685 00:27:50,220 --> 00:27:54,960 the guest which reduces the overhead 686 00:27:52,919 --> 00:27:58,380 reduces the number of syscalls 687 00:27:54,960 --> 00:28:01,020 for UDP we use these uh sendmmsg 688 00:27:58,380 --> 00:28:03,419 and recvmmsg there's two m's 689 00:28:01,020 --> 00:28:05,340 there which matters uh which
are single 690 00:28:03,419 --> 00:28:06,900 syscalls that can send or receive a 691 00:28:05,340 --> 00:28:09,539 whole batch of packets a bunch of 692 00:28:06,900 --> 00:28:10,860 datagrams with one call 693 00:28:09,539 --> 00:28:14,940 um 694 00:28:10,860 --> 00:28:19,140 again less syscalls uh we use we've got an 695 00:28:14,940 --> 00:28:19,919 AVX2 accelerated checksum on x86 696 00:28:19,140 --> 00:28:21,360 um 697 00:28:19,919 --> 00:28:23,700 in theory could be done on other 698 00:28:21,360 --> 00:28:25,980 platforms just hasn't been yet 699 00:28:23,700 --> 00:28:27,659 and we use some buffer pools where we 700 00:28:25,980 --> 00:28:29,880 partially pre-generate the headers 701 00:28:27,659 --> 00:28:33,000 the main benefit of that is data 702 00:28:29,880 --> 00:28:36,120 locality not so much just saving the 703 00:28:33,000 --> 00:28:38,279 writing that's fairly cheap but it does 704 00:28:36,120 --> 00:28:39,779 improve uh the locality 705 00:28:38,279 --> 00:28:42,299 I'm not going to give you numbers today 706 00:28:39,779 --> 00:28:45,779 they're a bit in flux 707 00:28:42,299 --> 00:28:50,220 um it is much faster than slirp 708 00:28:45,779 --> 00:28:53,600 it is pretty comparable to a tap just a 709 00:28:50,220 --> 00:28:56,580 plain single threaded tap connection 710 00:28:53,600 --> 00:28:58,679 it is significantly slower than 711 00:28:56,580 --> 00:29:01,140 multi-queue tap at least once you get to 712 00:28:58,679 --> 00:29:03,900 sort of four or eight queues 713 00:29:01,140 --> 00:29:06,600 uh we probably won't ever fully compete 714 00:29:03,900 --> 00:29:09,779 with that but we hope to be competitive 715 00:29:06,600 --> 00:29:12,000 enough to be useful in in real world uh 716 00:29:09,779 --> 00:29:13,919 real world situations 717 00:29:12,000 --> 00:29:17,179 like I said we've got full IPv6 support 718 00:29:13,919 --> 00:29:20,039 we support NDP internally and DHCPv6 719 00:29:17,179 --> 00:29:22,980 there's basically it's pretty
much 720 00:29:20,039 --> 00:29:24,299 completely symmetric between v4 and v6 721 00:29:22,980 --> 00:29:26,880 support 722 00:29:24,299 --> 00:29:28,919 and we have uh slirp had this as well 723 00:29:26,880 --> 00:29:30,000 but it's a bit more flexible in passt you 724 00:29:28,919 --> 00:29:33,419 can 725 00:29:30,000 --> 00:29:36,600 control which ports on the host will 726 00:29:33,419 --> 00:29:38,700 forward into the guest and you can 727 00:29:36,600 --> 00:29:40,740 select all the ports which is something KubeVirt 728 00:29:38,700 --> 00:29:44,299 wants to do or just particular 729 00:29:40,740 --> 00:29:44,299 ranges just particular ports 730 00:29:45,179 --> 00:29:48,539 pasta 731 00:29:46,159 --> 00:29:49,980 this is a variant on the same thing so 732 00:29:48,539 --> 00:29:51,960 this is the equivalent of slirp4netns 733 00:29:49,980 --> 00:29:56,039 this is basically the same thing 734 00:29:51,960 --> 00:29:58,100 but instead of a VM you have a namespace 735 00:29:56,039 --> 00:30:00,840 or a container 736 00:29:58,100 --> 00:30:02,760 as your guest 737 00:30:00,840 --> 00:30:04,980 this does require privilege in that 738 00:30:02,760 --> 00:30:07,440 namespace but not on the host 739 00:30:04,980 --> 00:30:09,919 uh it's implemented in the same binary 740 00:30:07,440 --> 00:30:09,919 as passt 741 00:30:09,960 --> 00:30:14,940 it's also got an accelerated path for 742 00:30:12,059 --> 00:30:17,580 local to local that uses splice for TCP 743 00:30:14,940 --> 00:30:21,179 which goes very very fast 744 00:30:17,580 --> 00:30:23,880 uh you can forward ports outwards as well 745 00:30:21,179 --> 00:30:26,880 which 746 00:30:23,880 --> 00:30:29,840 the semantics are a little bit weird but 747 00:30:26,880 --> 00:30:29,840 um it can be useful 748 00:30:30,600 --> 00:30:35,580 all right so what uh pasta looks like 749 00:30:33,000 --> 00:30:37,440 there is um you've got a network 750 00:30:35,580 --> 00:30:40,799 namespace that could be a container it
could 751 00:30:37,440 --> 00:30:44,399 be one you've set up by some other means 752 00:30:40,799 --> 00:30:46,500 um and that just uh we use a tap device 753 00:30:44,399 --> 00:30:48,960 to connect that through now you might 754 00:30:46,500 --> 00:30:50,700 ask why is the tap device okay here but 755 00:30:48,960 --> 00:30:52,679 it wasn't okay 756 00:30:50,700 --> 00:30:53,880 uh in the other cases and the the answer 757 00:30:52,679 --> 00:30:55,860 is that 758 00:30:53,880 --> 00:30:58,159 in this case we only need the privilege 759 00:30:55,860 --> 00:31:01,260 in the container and not on the host end 760 00:30:58,159 --> 00:31:05,000 which is what makes this work whereas 761 00:31:01,260 --> 00:31:05,000 tap in other situations would not work 762 00:31:05,880 --> 00:31:12,720 uh we have packages for Fedora uh CentOS 763 00:31:09,779 --> 00:31:14,700 uh Debian Ubuntu unofficial ones for a 764 00:31:12,720 --> 00:31:18,960 couple of other things 765 00:31:14,700 --> 00:31:21,179 uh we've got some more on the way 766 00:31:18,960 --> 00:31:24,659 in theory it should be just about 767 00:31:21,179 --> 00:31:25,520 neutral to architecture it's tested on 768 00:31:24,659 --> 00:31:29,700 on 769 00:31:25,520 --> 00:31:33,000 non-x86 a little bit but not a lot 770 00:31:29,700 --> 00:31:35,700 uh we support glibc and we're working on 771 00:31:33,000 --> 00:31:38,580 uh musl support right now 772 00:31:35,700 --> 00:31:41,100 it is Linux only uh it uses a bunch of 773 00:31:38,580 --> 00:31:43,860 Linux specific kernel features there are 774 00:31:41,100 --> 00:31:46,620 alternatives uh on some other OSes 775 00:31:43,860 --> 00:31:48,000 but not ones you can just drop in 776 00:31:46,620 --> 00:31:52,380 um they're ones where we'd need a 777 00:31:48,000 --> 00:31:54,480 specific port to other operating systems 778 00:31:52,380 --> 00:31:55,679 and we're working uh or something we've 779 00:31:54,480 --> 00:31:58,440 been working a lot on is integration 780
00:31:55,679 --> 00:32:01,020 with other things so just recently uh 781 00:31:58,440 --> 00:32:04,140 support's gone into libvirt it actually 782 00:32:01,020 --> 00:32:06,120 went in libvirt 9.0.0 but there are some 783 00:32:04,140 --> 00:32:09,380 bugs if you've got SELinux enabled 784 00:32:06,120 --> 00:32:11,580 which should now be fixed in 9.2.0 785 00:32:09,380 --> 00:32:14,100 so that's you've got a network 786 00:32:11,580 --> 00:32:15,120 configuration option in 787 00:32:14,100 --> 00:32:17,940 um 788 00:32:15,120 --> 00:32:19,500 in libvirt that says use passt instead of 789 00:32:17,940 --> 00:32:21,240 whatever other options you had for 790 00:32:19,500 --> 00:32:25,080 networking 791 00:32:21,240 --> 00:32:27,419 uh we we've got this supported in Podman 792 00:32:25,080 --> 00:32:29,159 as an alternative to slirp4netns for 793 00:32:27,419 --> 00:32:31,620 rootless networking 794 00:32:29,159 --> 00:32:34,740 and uh 795 00:32:31,620 --> 00:32:37,399 uh KubeVirt uh it is in there we've hit 796 00:32:34,740 --> 00:32:39,779 some complications which we are 797 00:32:37,399 --> 00:32:41,640 gradually working on 798 00:32:39,779 --> 00:32:43,559 um it's not as smooth as we would like 799 00:32:41,640 --> 00:32:45,360 at the moment but we're hoping it'll get 800 00:32:43,559 --> 00:32:48,679 there 801 00:32:45,360 --> 00:32:48,679 all right where to from here 802 00:32:48,779 --> 00:32:53,760 uh probably the highest priority is we 803 00:32:51,480 --> 00:32:55,440 do have some uh 804 00:32:53,760 --> 00:32:56,640 less than optimal things that we're 805 00:32:55,440 --> 00:32:58,500 trying to work on 806 00:32:56,640 --> 00:33:00,720 um like I said the NAT is a bit 807 00:32:58,500 --> 00:33:03,539 inflexible and it's got some weird edge 808 00:33:00,720 --> 00:33:06,779 cases uh we want to improve that make it 809 00:33:03,539 --> 00:33:09,539 more consistent uh more flexible 810 00:33:06,779 --> 00:33:11,279 uh and more robust against different 811
00:33:09,539 --> 00:33:13,260 configurations 812 00:33:11,279 --> 00:33:14,700 um so that's something 813 00:33:13,260 --> 00:33:16,799 we're not working on right at this 814 00:33:14,700 --> 00:33:18,960 instant but it's kind of high on the 815 00:33:16,799 --> 00:33:20,820 queue for us to get at 816 00:33:18,960 --> 00:33:23,580 uh one problem and this is one of the 817 00:33:20,820 --> 00:33:25,200 complications with KubeVirt is it can 818 00:33:23,580 --> 00:33:27,120 use a lot of kernel memory if you're 819 00:33:25,200 --> 00:33:29,340 forwarding a lot of ports so they were 820 00:33:27,120 --> 00:33:31,860 forwarding essentially all the ports 821 00:33:29,340 --> 00:33:36,480 from the pod into the VM 822 00:33:31,860 --> 00:33:39,919 so if you've got TCP UDP IPv4 and IPv6 823 00:33:36,480 --> 00:33:42,120 that's about 200,000 listening sockets 824 00:33:39,919 --> 00:33:43,799 that takes up quite a lot of kernel 825 00:33:42,120 --> 00:33:45,600 memory we've got a couple of things 826 00:33:43,799 --> 00:33:47,220 there where we can reduce it we've 827 00:33:45,600 --> 00:33:49,320 reduced it a little bit 828 00:33:47,220 --> 00:33:50,820 there's only so much we can do we do 829 00:33:49,320 --> 00:33:52,260 actually have some ideas for some kernel 830 00:33:50,820 --> 00:33:54,059 extensions which would let us 831 00:33:52,260 --> 00:33:56,039 bring this way down not sure where 832 00:33:54,059 --> 00:33:57,419 that'll go but we've got a few ideas 833 00:33:56,039 --> 00:34:00,539 there 834 00:33:57,419 --> 00:34:03,240 testing in CI is uh 835 00:34:00,539 --> 00:34:05,100 not as good as we'd like it is a bit too 836 00:34:03,240 --> 00:34:07,260 fragile and dependent on the host's 837 00:34:05,100 --> 00:34:09,000 network setup 838 00:34:07,260 --> 00:34:12,480 so we hope to improve that improve 839 00:34:09,000 --> 00:34:12,480 coverage there 840 00:34:13,080 --> 00:34:16,919 uh big thing in terms of performance 841 00:34:15,179 --> 00:34:19,440
we'd like to add support for 842 00:34:16,919 --> 00:34:23,159 vhost-user so instead of using that QEMU 843 00:34:19,440 --> 00:34:25,619 socket protocol we interact with QEMU via 844 00:34:23,159 --> 00:34:28,440 vhost-user 845 00:34:25,619 --> 00:34:30,480 that should go a lot faster at least if 846 00:34:28,440 --> 00:34:31,980 we can multi-thread it which is perhaps 847 00:34:30,480 --> 00:34:33,899 not trivial but 848 00:34:31,980 --> 00:34:36,359 um we're looking at it 849 00:34:33,899 --> 00:34:40,139 we'd like to do fuzz testing uh 850 00:34:36,359 --> 00:34:42,300 theoretically easy practically kind of a 851 00:34:40,139 --> 00:34:43,080 pain to get all the bits lined up to 852 00:34:42,300 --> 00:34:45,540 work 853 00:34:43,080 --> 00:34:47,580 and we'd like it to be more portable 854 00:34:45,540 --> 00:34:49,679 particularly BSD and Darwin it would be 855 00:34:47,580 --> 00:34:50,700 nice to have 856 00:34:49,679 --> 00:34:52,980 um 857 00:34:50,700 --> 00:34:54,659 ports there we think there are the 858 00:34:52,980 --> 00:34:57,140 network features but we kind of aren't 859 00:34:54,659 --> 00:34:59,820 experts in that area 860 00:34:57,140 --> 00:35:01,920 modernize writing something new in C 861 00:34:59,820 --> 00:35:04,020 that's security conscious these days is 862 00:35:01,920 --> 00:35:05,580 perhaps an interesting choice 863 00:35:04,020 --> 00:35:07,440 um 864 00:35:05,580 --> 00:35:10,440 we have certainly thought about using 865 00:35:07,440 --> 00:35:13,140 Rust in in places to improve that kind 866 00:35:10,440 --> 00:35:14,760 of memory safety it's not trivial 867 00:35:13,140 --> 00:35:17,780 because we have a bunch of low-level 868 00:35:14,760 --> 00:35:22,760 interactions so we'd certainly need 869 00:35:17,780 --> 00:35:22,760 uh unsafe and a few things to deal with 870 00:35:23,820 --> 00:35:28,740 possible use cases we might add 871 00:35:26,700 --> 00:35:30,900 um it could replace -net user the 872 00:35:28,740 -->
00:35:32,700 current -net user in QEMU uh 873 00:35:30,900 --> 00:35:34,440 theoretically straightforward nobody's 874 00:35:32,700 --> 00:35:37,560 had time to work on it yet 875 00:35:34,440 --> 00:35:38,460 may have some uses in Kata Containers 876 00:35:37,560 --> 00:35:40,859 um 877 00:35:38,460 --> 00:35:42,619 this is an interesting one CLAT is a is 878 00:35:40,859 --> 00:35:47,660 an 879 00:35:42,619 --> 00:35:51,240 internet draft for running IPv4 only 880 00:35:47,660 --> 00:35:52,920 applications in an IPv6 only network 881 00:35:51,240 --> 00:35:55,800 with a sort of 882 00:35:52,920 --> 00:35:59,400 it's kind of like NAT but IPv4 to IPv6 883 00:35:55,800 --> 00:36:01,740 in a sense and passt could possibly be 884 00:35:59,400 --> 00:36:04,079 extended to do that and we're interested 885 00:36:01,740 --> 00:36:06,540 to hear of new use cases that other 886 00:36:04,079 --> 00:36:08,880 people might have 887 00:36:06,540 --> 00:36:10,560 we would welcome contributions uh 888 00:36:08,880 --> 00:36:13,260 there's two main people we've got a few 889 00:36:10,560 --> 00:36:15,060 other people uh it's quite active at 890 00:36:13,260 --> 00:36:16,980 this time and 891 00:36:15,060 --> 00:36:18,960 we're sort of really on the cusp of 892 00:36:16,980 --> 00:36:21,660 going from an experimental thing to 893 00:36:18,960 --> 00:36:24,960 something that can really be used so 894 00:36:21,660 --> 00:36:27,859 um if people are interested in 895 00:36:24,960 --> 00:36:27,859 uh 896 00:36:27,900 --> 00:36:32,160 there are places you can contribute 897 00:36:30,000 --> 00:36:34,740 um there's a link there to another 898 00:36:32,160 --> 00:36:37,740 presentation that Stefano did at I think 899 00:36:34,740 --> 00:36:40,460 last year's KVM Forum which has some 900 00:36:37,740 --> 00:36:40,460 slightly different information 901 00:36:41,579 --> 00:36:47,760 uh all right credits there 902 00:36:45,060 --> 00:36:48,480 uh so that's basically the end 903
00:36:47,760 --> 00:36:50,460 um 904 00:36:48,480 --> 00:36:53,099 if people want to ask questions I 905 00:36:50,460 --> 00:36:55,980 suggest you prepare that and meanwhile I 906 00:36:53,099 --> 00:36:57,960 will give a brief demo here 907 00:36:55,980 --> 00:37:00,380 I think we have like eight-ish minutes 908 00:36:57,960 --> 00:37:00,380 left 909 00:37:07,619 --> 00:37:09,720 I think about David 910 00:37:09,060 --> 00:37:12,480 um 911 00:37:09,720 --> 00:37:14,640 it's a question around how you found the 912 00:37:12,480 --> 00:37:17,880 L4 decapsulation maps to the modern 913 00:37:14,640 --> 00:37:22,320 socket API like could either be a 914 00:37:17,880 --> 00:37:25,140 nightmare or brilliant like ah 915 00:37:22,320 --> 00:37:26,760 I'm not totally clear which 916 00:37:25,140 --> 00:37:29,099 aspect of that you're meaning 917 00:37:26,760 --> 00:37:31,500 um is it is it a fairly linear process 918 00:37:29,099 --> 00:37:34,500 to go from your L4 decapsulation into 919 00:37:31,500 --> 00:37:36,300 the socket calls available or is there a 920 00:37:34,500 --> 00:37:40,260 lot of code that has to manage like your 921 00:37:36,300 --> 00:37:43,140 sequence numbering your uh TCP options 922 00:37:40,260 --> 00:37:44,339 in between it's not trivial 923 00:37:43,140 --> 00:37:47,160 um it's not 924 00:37:44,339 --> 00:37:49,220 super complicated either 925 00:37:47,160 --> 00:37:49,220 um 926 00:37:51,200 --> 00:37:56,520 so 927 00:37:53,040 --> 00:37:58,260 so here we have it uh we can build it 928 00:37:56,520 --> 00:37:59,520 oops 929 00:37:58,260 --> 00:38:02,240 so 930 00:37:59,520 --> 00:38:04,859 it doesn't take terribly long to compile 931 00:38:02,240 --> 00:38:07,040 because that's less than 10,000 lines of 932 00:38:04,859 --> 00:38:07,040 code 933 00:38:10,260 --> 00:38:15,300 and the easiest way to invoke it 934 00:38:12,720 --> 00:38:16,260 is uh with pasta the 935 00:38:15,300 --> 00:38:18,420 um 936 00:38:16,260 --> 00:38:21,480
namespace version if you run it with no 937 00:38:18,420 --> 00:38:22,920 arguments it will create a new namespace 938 00:38:21,480 --> 00:38:26,960 for you so actually I'll show you here 939 00:38:22,920 --> 00:38:29,760 so I'm not root as you can see 940 00:38:26,960 --> 00:38:32,220 uh on this system we have a couple of 941 00:38:29,760 --> 00:38:33,960 network interfaces we have the main one 942 00:38:32,220 --> 00:38:37,040 being the the wireless here which is the 943 00:38:33,960 --> 00:38:37,040 one that's connected at the moment 944 00:38:37,560 --> 00:38:42,300 we can uh 945 00:38:40,200 --> 00:38:44,520 just run pasta without things and now 946 00:38:42,300 --> 00:38:46,260 we appear to be root 947 00:38:44,520 --> 00:38:47,940 um we're not really root we're root 948 00:38:46,260 --> 00:38:52,260 within this namespace 949 00:38:47,940 --> 00:38:54,780 so if we actually try to for example uh 950 00:38:52,260 --> 00:38:57,420 you know create a file 951 00:38:54,780 --> 00:38:58,560 that will fail because we're not root in 952 00:38:57,420 --> 00:39:01,260 the 953 00:38:58,560 --> 00:39:04,260 init namespace 954 00:39:01,260 --> 00:39:06,660 but we are root within this user and 955 00:39:04,260 --> 00:39:08,940 network namespace where it actually 956 00:39:06,660 --> 00:39:10,380 copies that interface name which is why 957 00:39:08,940 --> 00:39:11,400 it looks like the same device here but 958 00:39:10,380 --> 00:39:13,619 you see we don't see the other 959 00:39:11,400 --> 00:39:17,160 interfaces so this is actually the 960 00:39:13,619 --> 00:39:20,900 pasta-created uh interface 961 00:39:17,160 --> 00:39:20,900 and we can 962 00:39:21,660 --> 00:39:26,599 uh 963 00:39:23,460 --> 00:39:29,520 run DHCP on that 964 00:39:26,599 --> 00:39:30,960 and again that permission denied is just 965 00:39:29,520 --> 00:39:33,359 because we're not root in the main 966 00:39:30,960 --> 00:39:36,660 namespace I haven't told this to put its 967 00:39:33,359 -->
00:39:38,160 uh its log file somewhere else 968 00:39:36,660 --> 00:39:41,700 and that will have configured the 969 00:39:38,160 --> 00:39:45,540 interface I don't does the LCA network 970 00:39:41,700 --> 00:39:48,180 have IPv6 doesn't look like it so IPv6 971 00:39:45,540 --> 00:39:50,420 is not going to work here and now we can 972 00:39:48,180 --> 00:39:54,300 do something like 973 00:39:50,420 --> 00:39:57,140 try to connect to the outside world 974 00:39:54,300 --> 00:39:57,140 and 975 00:39:57,380 --> 00:40:02,820 yes of course traceroute is going to be 976 00:40:00,060 --> 00:40:04,260 a bit weird because everything's kind of 977 00:40:02,820 --> 00:40:07,380 proxied through our 978 00:40:04,260 --> 00:40:10,980 user space intermediary 979 00:40:07,380 --> 00:40:13,040 but we can do something like 980 00:40:10,980 --> 00:40:13,040 um 981 00:40:16,200 --> 00:40:23,240 uh 982 00:40:18,900 --> 00:40:23,240 just uh get a web page and we have 983 00:40:24,359 --> 00:40:32,599 something there so not a very extensive 984 00:40:27,780 --> 00:40:32,599 demo but there it is any other questions 985 00:40:33,599 --> 00:40:37,760 or things you'd like me to try on this 986 00:40:35,400 --> 00:40:37,760 demo 987 00:40:42,440 --> 00:40:46,980 I'd be interested in hearing a little 988 00:40:44,400 --> 00:40:49,619 bit about it can you yeah I should know 989 00:40:46,980 --> 00:40:51,540 that hello I'd be interested in hearing 990 00:40:49,619 --> 00:40:53,599 a little bit about 991 00:40:51,540 --> 00:40:53,599 um 992 00:40:54,060 --> 00:40:57,960 what 993 00:40:56,099 --> 00:40:59,760 what would be involved in using Rust 994 00:40:57,960 --> 00:41:01,800 instead of C so 995 00:40:59,760 --> 00:41:04,920 you said you'd have to use unsafe a lot 996 00:41:01,800 --> 00:41:06,720 does that sort of negate the benefits of 997 00:41:04,920 --> 00:41:10,260 using a language like Rust in place of C 998 00:41:06,720 --> 00:41:13,320 or is it not necessarily 999 00:41:10,260 -->
00:41:16,280 um it does it depends how much you have 1000 00:41:13,320 --> 00:41:16,280 to use it you do have to be 1001 00:41:19,440 --> 00:41:22,560 it at least marks the bit you need to be 1002 00:41:21,420 --> 00:41:23,579 careful of 1003 00:41:22,560 --> 00:41:25,560 um 1004 00:41:23,579 --> 00:41:27,960 in order to do it well you do need to 1005 00:41:25,560 --> 00:41:29,820 think pretty carefully about what your 1006 00:41:27,960 --> 00:41:31,500 abstractions are and what the invariants 1007 00:41:29,820 --> 00:41:32,960 are you need to maintain in the unsafe 1008 00:41:31,500 --> 00:41:36,180 code 1009 00:41:32,960 --> 00:41:39,420 which can be 1010 00:41:36,180 --> 00:41:44,960 tricky it doesn't completely negate 1011 00:41:39,420 --> 00:41:44,960 the the benefits but it does mean 1012 00:41:45,060 --> 00:41:49,579 they're much more conditional and 1013 00:41:48,480 --> 00:41:53,880 um 1014 00:41:49,579 --> 00:41:56,040 they're harder to harder to reach 1015 00:41:53,880 --> 00:41:57,660 if that makes sense 1016 00:41:56,040 --> 00:42:00,359 yeah 1017 00:41:57,660 --> 00:42:01,440 one one option we we would have that we 1018 00:42:00,359 --> 00:42:04,440 could look at if we were doing that 1019 00:42:01,440 --> 00:42:06,119 would be to put all the sort of 1020 00:42:04,440 --> 00:42:08,640 configuration and startup command line 1021 00:42:06,119 --> 00:42:10,800 parsing that stuff into Rust and then 1022 00:42:08,640 --> 00:42:13,800 basically have a C module that 1023 00:42:10,800 --> 00:42:15,240 runs the engine uh once that's all set 1024 00:42:13,800 --> 00:42:17,220 up 1025 00:42:15,240 --> 00:42:20,099 um 1026 00:42:17,220 --> 00:42:22,859 that makes a lot of theoretical sense 1027 00:42:20,099 --> 00:42:24,780 um again it's a bit of a pain because we 1028 00:42:22,859 --> 00:42:27,900 have to 1029 00:42:24,780 --> 00:42:30,060 we have to pack all the information we 1030 00:42:27,900 --> 00:42:32,280 need for that engine which is not huge
1031 00:42:30,060 --> 00:42:33,960 but it's not nothing 1032 00:42:32,280 --> 00:42:35,700 and we have to package that in a way 1033 00:42:33,960 --> 00:42:38,460 that both the C code and the rust code 1034 00:42:35,700 --> 00:42:41,220 can access which can be done but it's 1035 00:42:38,460 --> 00:42:43,440 just another layer of complication so 1036 00:42:41,220 --> 00:42:48,440 there's a bunch of options but it's 1037 00:42:43,440 --> 00:42:48,440 it's not kind of a no-brainer to do yeah 1038 00:42:49,560 --> 00:42:53,540 it looks like this question up the back 1039 00:42:51,359 --> 00:42:53,540 there 1040 00:42:58,079 --> 00:43:02,819 I was wondering the no dynamic memory 1041 00:43:01,200 --> 00:43:05,579 allocation I can see that it'd make a 1042 00:43:02,819 --> 00:43:08,160 big difference to the security how hard 1043 00:43:05,579 --> 00:43:11,339 was it sort of dealing with that or it's 1044 00:43:08,160 --> 00:43:14,160 fairly stateless so it doesn't uh it's 1045 00:43:11,339 --> 00:43:16,800 not that bad um that honestly when I 1046 00:43:14,160 --> 00:43:17,839 came on that was already essentially 1047 00:43:16,800 --> 00:43:22,319 done 1048 00:43:17,839 --> 00:43:25,020 uh the main thing it requires is a 1049 00:43:22,319 --> 00:43:27,060 interesting hack in TCP 1050 00:43:25,020 --> 00:43:28,500 the perhaps the biggest barrier there is 1051 00:43:27,060 --> 00:43:29,940 the most obvious way to implement 1052 00:43:28,500 --> 00:43:34,619 something like this is for every 1053 00:43:29,940 --> 00:43:36,119 connection you have a buffer that 1054 00:43:34,619 --> 00:43:37,980 stashes information where it's in 1055 00:43:36,119 --> 00:43:39,480 between the two sides and if you had to 1056 00:43:37,980 --> 00:43:40,560 allocate that every time you made a new 1057 00:43:39,480 --> 00:43:43,380 connection 1058 00:43:40,560 --> 00:43:46,680 that would be a problem 1059 00:43:43,380 --> 00:43:48,720 um now you could do that and we we do do 1060 00:43:46,680 --> 
00:43:50,760 you use a bunch of static arrays so we 1061 00:43:48,720 --> 00:43:52,380 just have a connection table not 1062 00:43:50,760 --> 00:43:54,180 containing a buffer in fact but we do 1063 00:43:52,380 --> 00:43:56,480 have a big connection table with 1064 00:43:54,180 --> 00:43:59,400 pre-allocated 1065 00:43:56,480 --> 00:44:01,560 uh do that would be even huger than it 1066 00:43:59,400 --> 00:44:03,240 is if we had buffers in there the way 1067 00:44:01,560 --> 00:44:06,000 we're able to get away with it without 1068 00:44:03,240 --> 00:44:08,640 buffers is a bit of a hack using the 1069 00:44:06,000 --> 00:44:11,339 MSG_PEEK option what we actually do 1070 00:44:08,640 --> 00:44:13,980 is when we take data from the TCP socket 1071 00:44:11,339 --> 00:44:15,960 we initially just peek it so it stays in 1072 00:44:13,980 --> 00:44:18,000 the kernel buffers 1073 00:44:15,960 --> 00:44:20,640 and we only discard it from the kernel 1074 00:44:18,000 --> 00:44:22,859 buffers when it gets acknowledged from 1075 00:44:20,640 --> 00:44:25,020 the other side so we're kind of able to 1076 00:44:22,859 --> 00:44:27,180 abuse the kernel socket buffers there to 1077 00:44:25,020 --> 00:44:30,619 avoid allocating our own buffers which 1078 00:44:27,180 --> 00:44:30,619 is an interesting trick 1079 00:44:34,319 --> 00:44:36,859 together 1080 00:44:38,700 --> 00:44:44,420 I think somebody there 1081 00:44:41,640 --> 00:44:44,420 had a question 1082 00:44:49,200 --> 00:44:52,619 so I noticed you said you're porting 1083 00:44:51,359 --> 00:44:54,180 it 1084 00:44:52,619 --> 00:44:57,900 I know you said you were porting it to 1085 00:44:54,180 --> 00:44:59,579 Amazon is is dynamic memory allocation 1086 00:44:57,900 --> 00:45:01,740 issue there because obviously libc 1087 00:44:59,579 --> 00:45:04,500 might decide to do allocations you're 1088 00:45:01,740 --> 00:45:07,020 not expecting so sorry port to 1089 00:45:04,500 --> 00:45:10,800 um for Alpine 1090 
00:45:07,020 uh and your musl oh right 1091 00:45:10,800 --> 00:45:13,380 um sorry what's the so so libc 1092 00:45:13,380 --> 00:45:16,680 might decide to do dynamic memory 1093 00:45:15,000 --> 00:45:19,319 allocation that you're not expecting is 1094 00:45:16,680 --> 00:45:21,300 that that that's probably less of a 1095 00:45:19,319 --> 00:45:23,280 problem with with musl than it was 1096 00:45:21,300 --> 00:45:25,260 with glibc glibc allocates in a bunch 1097 00:45:23,280 --> 00:45:27,300 of places we do have a bunch of hacks 1098 00:45:25,260 --> 00:45:29,400 there to convince glibc to not call 1099 00:45:27,300 --> 00:45:31,619 glibc we actually avoid several 1100 00:45:29,400 --> 00:45:34,619 glibc functions which would be obvious 1101 00:45:31,619 --> 00:45:37,319 choices like the um we basically have 1102 00:45:34,619 --> 00:45:39,599 our own uh stuff to talk the syslog 1103 00:45:37,319 --> 00:45:41,160 socket rather than using the normal openlog 1104 00:45:39,599 --> 00:45:43,619 and syslog functions because those 1105 00:45:41,160 --> 00:45:45,119 allocate in glibc there are a bunch of 1106 00:45:43,619 --> 00:45:46,560 hacks like that to avoid that it's 1107 00:45:45,119 --> 00:45:49,220 probably easier in musl than is in 1108 00:45:46,560 --> 00:45:49,220 glibc 1109 00:45:50,880 --> 00:45:53,000 um 1110 00:45:53,720 --> 00:45:58,140 all right time is up for questions thank 1111 00:45:56,339 --> 00:46:01,840 you very much 1112 00:45:58,140 --> 00:46:01,840 [Applause]
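The buffer-avoidance trick described in the answer about dynamic memory allocation (peek at socket data so it stays in the kernel buffers, and only discard it once the other side has acknowledged it) can be sketched roughly as below. This is a minimal illustration and not the project's actual code: the function names `peek_pending`, `discard_acked` and `demo_peek_then_discard` are hypothetical, the scratch size is arbitrary, and a Unix-domain stream socket pair stands in for a real TCP connection so the sketch runs without any network setup.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Copy pending bytes out of the socket WITHOUT consuming them.
 * MSG_PEEK leaves the data queued in the kernel receive buffer,
 * so the program needs no per-connection buffer of its own. */
ssize_t peek_pending(int fd, void *buf, size_t len)
{
    return recv(fd, buf, len, MSG_PEEK | MSG_DONTWAIT);
}

/* Once the far side has acknowledged `acked` bytes, drop exactly that
 * many from the kernel buffer by reading them into a small on-stack
 * scratch area (hypothetical size) and throwing them away. */
ssize_t discard_acked(int fd, size_t acked)
{
    char scratch[4096];
    size_t done = 0;

    while (done < acked) {
        size_t want = acked - done;
        if (want > sizeof(scratch))
            want = sizeof(scratch);
        ssize_t n = recv(fd, scratch, want, MSG_DONTWAIT);
        if (n <= 0)
            return done ? (ssize_t)done : n;
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Small self-check: returns 0 if peeked data stays queued until it is
 * explicitly discarded. */
int demo_peek_then_discard(void)
{
    int sv[2];
    char buf[32];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    assert(send(sv[0], "hello world", 11, 0) == 11);

    /* Peeking twice sees the same 11 bytes: nothing was consumed. */
    assert(peek_pending(sv[1], buf, sizeof(buf)) == 11);
    assert(peek_pending(sv[1], buf, sizeof(buf)) == 11);

    /* Pretend the peer acknowledged the first 6 bytes ("hello "). */
    assert(discard_acked(sv[1], 6) == 6);

    /* Only the unacknowledged tail is still queued in the kernel. */
    assert(peek_pending(sv[1], buf, sizeof(buf)) == 5);
    assert(memcmp(buf, "world", 5) == 0);

    close(sv[0]);
    close(sv[1]);
    return 0;
}
```

Calling `demo_peek_then_discard()` from a `main` exercises the whole sequence. The point of the design is that unacknowledged data is never copied into long-lived program memory, which is what lets a static connection table work without per-connection buffers.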