Joel: [inaudible] — welcome back to LCA 2022. [inaudible] — I'm Joel. If you're just joining us, welcome to LCA. We have with us Fraser Tweedale, another fellow Red Hatter, based out of Brisbane. Fraser works on security and identity solutions at Red Hat, and today he's going to be telling us about upcoming and already-delivered kernel and Kubernetes security features that enable better container isolation and secure deployment of systemd-based workloads. [inaudible], Fraser — take it away.

Fraser: So, welcome to my presentation, and thanks to everyone — especially those attending live, because in this particular time slot there's gold in them thar tracks, and I'm a little resentful to have to present in this time slot. Some of those other prezzos are great, so good luck to them also. Let's get into it.

This talk is Creative Commons Attribution licensed, except where otherwise noted. The slides are available now, and if you're the sort of person who likes to follow the links and open up two dozen tabs to go deep after — or during — the prezzo, there are hyperlinks in the PDF, so go grab the PDF from Speaker Deck and you can follow along from there. I will be available in the chat room, and in the general conference chat rooms following my presentation, if we have questions that we don't have time to address during this time slot.

So today I will talk briefly about containers — what they are — and container standards, and I will give an overview of Kubernetes and OpenShift, particularly with respect to their container runtimes. These are huge products; there's no way to cover even a sizeable chunk of them in depth, so we just have to focus on this one specific area of the runtime. Then I'll talk about FreeIPA, which is the application that is the subject of my team's efforts, and we'll conclude by talking about systemd-based workloads on Kubernetes and OpenShift — the challenges, the workarounds, and the solutions for running them.
So, what is a container? Ask ten different people and you might get ten different answers, and they might all be correct answers, because the concept of a container is really an abstraction over isolation and confinement mechanisms. Most commonly, when people are talking about containers, they're talking about some OS-level virtualization where all of the containers — the containerized processes — share the host kernel. So it's the host kernel running everything, but presenting to the confined processes a restricted view of the system, and imposing resource limits. Non-Linux implementations of OS-level virtualization include FreeBSD jails and Solaris zones.

It's important to make a distinction between the container — which is actually the confinement mechanisms and the processes running in that environment — and container images. If you're talking about Dockerfiles and "building a container", what you're really talking about is building a container image, which defines the filesystem contents intended for use in or with a container, as well as metadata about how that container should be run: what process to run, the setting of environment variables, and similar things.

On Linux, containers usually consist of a bunch of disparate security mechanisms offered by the kernel. These include namespaces — and there are different kinds of namespaces: namespaces for PIDs (process IDs), mount namespaces, network device namespaces, the cgroup namespace, and some other kinds. Each kind of namespace presents a restricted view to the namespaced process or processes. For example, in a PID namespace, a process running in that namespace can see other PIDs inside that namespace, but nothing outside the namespace. A mount namespace has a dedicated mount table for the processes in that namespace; manipulating that mount table won't affect the mount tables in other mount namespaces, and manipulating the system mount table also will not affect the container's mount table.

A container may include SELinux or AppArmor confinement — those are mandatory access control mechanisms. And a container may use capabilities or seccomp to restrict what the process can do in terms of system calls: what system calls are allowed, what arguments are allowed in a system call, or potentially modifying the behaviour of particular system calls.

Now, recently the Open Container Initiative — which is an initiative of the Linux Foundation — has been developing specifications for various aspects of this whole container ecosystem. The one we're going to talk about today is the runtime specification.
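(An aside, not from the talk: the kernel namespaces described a moment ago are easy to see on any Linux system — each process's namespace memberships are exposed under /proc.)

```shell
# Each process has a set of namespace membership links under /proc/<pid>/ns.
# Each link reads as "type:[inode]"; two processes in the same namespace of
# a given kind see the same inode number there.
ls -l /proc/self/ns

# For example, the PID namespace of the current process:
readlink /proc/self/ns/pid    # e.g. pid:[4026531836]
```

Tools like unshare(1) create new namespaces of these kinds; comparing the inode numbers before and after is a quick way to see which namespaces were actually replaced.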
They also have specifications for the image format and other things. The runtime specification is a specification for a low-level runtime interface. It is not Linux-specific, so it encompasses Solaris containers, Windows containers, even virtual machines — treating VMs as a kind of container within the OCI definition of the container abstraction. OCI implementations include runc, which is the reference implementation for Linux; crun, which is another Linux implementation; and Kata Containers, which is an implementation of the virtual-machine instantiation of a container.

The runtime specification uses a JSON configuration format. It's quite verbose, but there's a link here to an example if you want to see that. The general things that will be specified for any container include the mounts, the process that should be executed and its environment, and lifecycle hooks for running commands — either on the container host or inside the container — upon container start, stop, creation, destruction, et cetera.
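The full config.json is too verbose to reproduce here, but a heavily trimmed sketch of the general shape — process, mounts, and hooks — looks roughly like this (all paths and values are illustrative, not from the talk):

```json
{
  "ociVersion": "1.0.2",
  "process": {
    "args": ["/usr/bin/myapp"],
    "env": ["PATH=/usr/bin:/bin"],
    "cwd": "/"
  },
  "mounts": [
    { "destination": "/proc", "type": "proc", "source": "proc" }
  ],
  "hooks": {
    "createRuntime": [
      { "path": "/usr/local/bin/setup-net" }
    ]
  }
}
```

The createRuntime hooks run on the host during container creation; other hook points run inside the container or at other lifecycle stages.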
container host or inside the 195 00:07:24,240 --> 00:07:28,880 container upon 196 00:07:26,319 --> 00:07:31,440 container start stop creation 197 00:07:28,880 --> 00:07:33,280 destruction etc 198 00:07:31,440 --> 00:07:35,919 and the linux specific 199 00:07:33,280 --> 00:07:38,319 tunables for a container include 200 00:07:35,919 --> 00:07:40,160 capability set so restricting the 201 00:07:38,319 --> 00:07:41,599 capability bounding set for the 202 00:07:40,160 --> 00:07:43,680 container 203 00:07:41,599 --> 00:07:44,400 what namespaces should be used or should 204 00:07:43,680 --> 00:07:47,759 be 205 00:07:44,400 --> 00:07:49,840 created and new for that container 206 00:07:47,759 --> 00:07:51,759 what c group the container should live 207 00:07:49,840 --> 00:07:56,000 in sys controls that should be set 208 00:07:51,759 --> 00:07:56,000 setcomp profile and similar things 209 00:07:56,560 --> 00:08:01,120 so now let's talk about 210 00:07:58,479 --> 00:08:03,520 kubernetes and openshift so kubernetes 211 00:08:01,120 --> 00:08:06,479 is a container orchestration system uh 212 00:08:03,520 --> 00:08:09,599 if you see k8s or cates written that's 213 00:08:06,479 --> 00:08:11,280 an abbreviation for kubernetes 214 00:08:09,599 --> 00:08:14,240 kubernetes 215 00:08:11,280 --> 00:08:16,639 works with a cluster of machines 216 00:08:14,240 --> 00:08:18,639 with distributed configuration and 217 00:08:16,639 --> 00:08:20,960 distributing the 218 00:08:18,639 --> 00:08:23,039 application workloads across the cluster 219 00:08:20,960 --> 00:08:26,240 and it has a declarative configuration 220 00:08:23,039 --> 00:08:28,000 format typically using json or yaml if 221 00:08:26,240 --> 00:08:29,599 you're interacting with the system as a 222 00:08:28,000 --> 00:08:31,360 human 223 00:08:29,599 --> 00:08:34,159 but also there are 224 00:08:31,360 --> 00:08:36,640 apis for most languages you could poke a 225 00:08:34,159 --> 00:08:38,560 stick at in particular go but also 226 
It has integration with most cloud providers, including the big obvious ones. All the Kubernetes documentation, blog posts, tutorials, guides, et cetera live at kubernetes.io, and the GitHub home page is github.com/kubernetes — that's where all of the source code lives.

So, some of the terminology that we need to use today — and again, we're focusing mainly on this area of the runtime; there are many, many other things that we have no time to cover today.

A container is an isolated or confined process or process tree. It may be a Windows container, a Linux container, a VM-based container — that's really up to how that workload will be sandboxed. A pod is a group of one or more related containers. A typical example would be some HTTP application and its database — you know, Apache and MySQL — bundled together in a pod as a single application, but with the different components actually being separate containers within the pod.

A namespace is a scope for objects, and it's also an authentication and authorization scope — such as for a single application, team, or project. (This is a Kubernetes namespace, a different thing from the kernel namespaces discussed earlier.) The nodes are the machines in the cluster where the pods are executed. Typically there are two kinds of nodes. One class is the control-plane nodes, where the logic that is Kubernetes lives — managing all of the networking, pod scheduling, auto-scaling, all of that stuff; they're also called the master nodes. And then there are the worker nodes, which is where the business applications running on the cluster will be executed.

On a node, the agent that executes a pod is called kubelet. Kubelet observes the distributed configuration, and if it sees that a particular pod has been scheduled to it, or has been removed from it, then it will effect those changes on that node.
It does that by creating a sandbox. A sandbox is the isolation or confinement mechanism — or mechanisms — to be used for a pod. One pod, one sandbox: all of the containers for a pod, if there are multiple, will run inside the same sandbox.

The Container Runtime Interface (CRI) is the interface that kubelet uses to talk to a container runtime and tell it to create, start, stop, and destroy sandboxes and containers. Two implementations for Linux include CRI-O and containerd.

So, visualizing this: this whole diagram is one Kubernetes node. On the left we have kubelet, and kubelet talks to a CRI runtime using CRI, which has a protobuf wire format. The CRI runtime will then do something to create and manage the containers as requested by kubelet. If we instantiate the abstract CRI runtime as CRI-O: CRI-O is a program that uses an OCI runtime to manage the containers.
Now, you can plug in different OCI runtimes, so we'll go a step further and say, well, we can use runc for that. And so now we have this fully instantiated container runtime setup. That is one way that you could do it in a Kubernetes node; different distributions of Kubernetes will use different CRI runtimes, and, if they're using an OCI runtime, it might be runc, or it might be crun, or it might be something else.

I mentioned that the definitions are declarative, and so here's a very simple pod definition. We have the "kind" field, which says what kind of object it is. This is the YAML serialization of this data, but you could work with it as JSON or many other formats. In the pod spec we have a list of containers — in this case the list has one container — and we have to specify a container image, which will be a reference to some image registry that the cluster can pull the container image from.
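A minimal pod definition of the kind being described might look like this (the name and image reference are illustrative — the slide's exact example isn't reproduced here):

```yaml
kind: Pod
apiVersion: v1
metadata:
  name: hello
spec:
  containers:
  - name: hello
    # a reference to an image in some registry the cluster can pull from
    image: registry.example.com/hello:latest
```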
We may also specify the command to execute and environment variables to set. You don't always need to set these, because they may have default values specified in the image metadata, but you can set them in the pod spec, or override them if you need to.

OpenShift — also called OpenShift Container Platform, or OCP — is Red Hat's commercially supported enterprise container platform. It's based on Kubernetes. There's also an upstream distribution called OKD. The latest stable release is version 4.9, and it's no coincidence that in the diagram previously I used CRI-O and runc as the example, because those are the components used in OpenShift.

All existing Kubernetes terminology applies in OpenShift, and there are additional concepts in OpenShift with respect to the runtime and pods. The two that you need to know about today are projects and security context constraints.
Projects extend the namespace concept with additional attributes and metadata. Security context constraints (SCCs) are policies that affect the SELinux context of a pod, the seccomp profile, the capability bounding set, the user IDs that the container can run as, and similar mechanisms.

So, the OpenShift runtime environment today uses SELinux for confinement. It creates sandboxes with a set of namespaces, including a unique cgroup namespace, a PID namespace, mount namespaces, a UTS namespace — which allows setting a different hostname and some other system identifiers — and network devices.

Each project in an OpenShift cluster gets assigned a unique user ID range, and by default any container that is part of a pod within that project's namespace must run as a user ID from that range. The range is something large — it might be one billion to one billion and ten thousand, or something like that — and it's unique per project in the cluster.
These restrictions can be circumvented via the runAsUser property, which is part of the pod spec: you can request that it be run as a different user. But in creating that pod, you also have to do it via an account that has permission to use SCCs that allow that pod to be run as root on the host, or as some other low-value user ID on the host, rather than a UID from the assigned range for that namespace.

This is usually a very bad idea, because if you're running as root on the host — even with these other kinds of confinement — then if you break out of your SELinux confinement, or you can escape your mount namespace, you're root on the node. Basically, your node and your cluster are owned. So it's not something that we want to be doing, especially as people working on and developing a very security-minded and security-sensitive project.
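(As an aside: the runAsUser request just mentioned lives in the pod spec's security context. A hedged sketch — whether it is honoured depends on which SCCs the creating account is allowed to use; the name and image are illustrative:)

```yaml
kind: Pod
apiVersion: v1
metadata:
  name: uid-demo
spec:
  containers:
  - name: main
    image: registry.example.com/some-image:latest
    securityContext:
      # Requests UID 0 rather than a UID from the project's assigned range.
      # Only admitted if the creating account may use an SCC (e.g. anyuid)
      # that permits it -- and, per the talk, this is usually a bad idea.
      runAsUser: 0
```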
OK, what's that project? FreeIPA. FreeIPA is an open-source identity management solution. That means you can define the users, groups, and services in your organization; it provides authentication mechanisms; and it lets you define and enforce access policies.

FreeIPA is made up of a lot of moving parts, several of which are very large, mature, independent projects in their own right. These components include 389 DS, which is our database and LDAP server; MIT Kerberos, which is an implementation of the Kerberos authentication protocol; an HTTP API and also a web UI, which are provided by an application running behind an Apache reverse proxy; Dogtag PKI, which is an X.509 certificate authority; and SSSD, a client component that provides authentication mechanisms and user lookup facilities to applications running on an enrolled host.
Those are the main moving parts, but there are others.

FreeIPA is available as part of RHEL, and so is commercially supported as part of RHEL. And we have Fedora, where we do all of our upstream development — features land there first, usually — and we have community support. freeipa.org is the website if you want to learn more.

Why do we want to run it on Kubernetes or OpenShift? Well, as organizations move more applications onto container orchestration systems like OpenShift, they may also wish to have the identity management system — providing the identity services that those applications need — running alongside those applications on the cluster. The cluster itself also has identity needs: you need to have users and groups within the OpenShift cluster itself, and you might want to tie that in with your organization's identity management system — or you might want to deploy your organization's identity management system on an OpenShift cluster.
Also, for access to the nodes directly: in OpenShift, all the nodes are Red Hat CoreOS nodes — previously Atomic — so they're immutable systems. Currently, the way you would log on to a node is via a debug shell, or via SSH, but there's a single user account. That's not really adequate for organizations in highly regulated industries — telcos, banking, government, et cetera — so there is actually a need for proper identity management and access control mechanisms for logging on to the nodes of the cluster as well.

Finally, there's this idea that you might want to offer identity management as a service: a service provider — which in the future may be Red Hat, maybe someone else, maybe multiple companies — offering FreeIPA as a turnkey managed service, where an organization says: yes, we're going to need FreeIPA for our organization.
The organization says: yes, we're going to need FreeIPA; I have an account with a cloud provider; I can see it in the service catalog; click OK, deploy; and a short time later the identity management system is set up and we can give the keys to the customer. Typically these would be deployed on the service provider's own infrastructure, managed by the service provider, and co-tenanted, so multiple different customers with their application instances living side by side on the cluster.

So if we want to start doing this sort of thing, we need containers for FreeIPA; we need to put this stuff in a container so we can run it in a container orchestration system. Our current approach is to encapsulate the whole RHEL- or Fedora-based system in a container, wrapping everything up together, where PID 1, the container's entry point, will be systemd, and just as if you were deploying on a VM or bare metal, systemd will start and monitor all the services that need to be brought up on that application instance.

This is a very stark contrast to a microservice architecture, or you might have heard the buzzword "cloud native", where all of the main components of the application would be running as separate containers within a pod, or maybe even in separate pods, wired up to talk to each other to do all the things they need to do, but all of them running separately rather than together in a single sandbox. We're doing the exact opposite. We call it a monolithic container; I don't know if there's any better or established terminology for what we're trying to do. I think there might not be, because usually when we talk to people about this there are howls of dismay: why would you do it like that, that's not how you're supposed to build applications. But we have reasons to avoid re-architecting if we can.
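As a rough sketch of the approach just described (a whole OS image whose entry point is systemd), a Containerfile along these lines illustrates the idea; the base image and package set here are placeholders, not the actual FreeIPA container build:

```shell
# A minimal sketch (not the real freeipa-container build): a Fedora-based
# image whose entry point is systemd, so it boots like a tiny VM and then
# starts and supervises whatever services the "monolith" should run.
cat > Containerfile <<'EOF'
FROM registry.fedoraproject.org/fedora:35
# Install systemd plus the services the monolithic container should run
RUN dnf -y install systemd && dnf clean all
# systemd as PID 1 starts and monitors everything else
ENTRYPOINT ["/sbin/init"]
EOF

# Locally you could build and run it with podman, e.g.:
#   podman build -t systemd-mono .
#   podman run -d --name mono systemd-mono
```

The same image then runs unchanged whether the target is podman on a laptop or a pod on a cluster, which is exactly what makes the approach attractive despite the howls of dismay.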
First of all, it's a very big upfront engineering effort to re-architect FreeIPA to be cloud native, to break everything up. As I mentioned, a lot of the components of FreeIPA are big legacy projects in their own right, making all kinds of assumptions about how they're deployed and what users they're running as. And many of these projects Red Hat is just a contributor to, among many; we don't steer all of the projects that we actually use as part of FreeIPA. So it's not only an engineering effort but also a political effort to make the changes that we would need to do this.

Furthermore, we would face an increase in ongoing costs as we support FreeIPA with two completely different application architectures. Why? Because FreeIPA is available in RHEL, we commercially support RHEL, and the existing and near-future releases are going to use the status quo application architecture and deployment paradigm. We need to support that for five years minimum, realistically more like ten, so we would face significant ongoing costs if we had to re-architect FreeIPA and build it in a completely different way for the cloud. Now, if we were starting from scratch today it would be a no-brainer: of course we would do it the cloud-native way. But FreeIPA is an old project; things were done differently back then. It's not anything to criticize, it's just the reality of where we're at as a team, and the business consequences of the different decisions that we have to make.

So, to do what we want to do, unsurprisingly there are many challenges. Some of the main areas are the runtime, which is what I'm going deep on in this presentation; some challenges with volumes and mounts, such as dealing with mounts that aren't necessarily owned by
root inside the container, and various assumptions about the file system that don't necessarily hold in a cloud environment; and ingress, that is, getting traffic to the cluster. A lot of applications today are just HTTP and that's it, but FreeIPA uses LDAP and Kerberos, we use SRV records for service discovery, and we have services that use both TCP and UDP to communicate. I've got some links there to blog posts where I've written about some of the challenges that we have with ingress, but that is a whole other talk; maybe next year.

So, for the runtime, what are the challenges, and what are the workarounds and solutions? The first one is that systemd and other components in FreeIPA, and you can abstract this to any kind of quote-unquote legacy application that you might want to bundle up in a monolithic container and run on Kubernetes or OpenShift: in these situations, these
applications may be expecting to run as root or some other specific UID, and by default, on a Kubernetes or OpenShift cluster, that's not going to happen: by default it will try to run your application as some unprivileged UID.

A possible solution to this is user namespaces, yet another kind of namespace, one that abstracts and constrains the host's (the parent's) user namespace and maps a range of host user IDs to a range of container UIDs, potentially starting at a different number, say zero, root. This is supported in the OCI runtime spec and in OCI runtime implementations, and in CRI-O, the version that shipped in OpenShift 4.7, there is an annotation-based user namespace feature. So via a pod annotation, and assuming the cluster is configured to enable this behavior, you can cause your
pod to be executed in its own user namespace. At the moment this requires the pod to be admitted to the cluster by an account that can use the anyuid SCC, or a similar SCC that allows pods to be run as root or as privileged users on the host. The workload itself will not run as those users; it's a workaround, because the Kubernetes API and the Kubernetes data model have no awareness yet of user namespaces.

Visualizing this: you have a host user namespace, which is 0 through 2^32, and some slice of this namespace can be mapped to a particular process, which may be a container. Here we have container A: inside the container there's a range of user IDs, 0 through 65535, so if you're running as user 200000 on the host, inside that user namespace you will appear to be root. Or you can think of it vice versa: for container B, the process that within the container is running as UID 0 is actually, in the host user namespace, running as user ID 26536.

Once your cluster is configured to support user namespaces, to actually enable the use of this feature, or opt into it, in the pod spec you need to specify a couple of annotations. The first, io.openshift.builder: "true", just says to the container runtime: I am a container, I want to run in builder mode, and in builder mode user namespaces are enabled. Then you also have to specify the userns-mode annotation, which says how to actually generate the mapping of host user IDs to container user IDs. One of these modes is auto, the most convenient and the most secure option, because it just allows you to say what size of UID range you need, say 65536, and it will find an available portion of the host UID space.
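Concretely, the opt-in looks something like the pod metadata below. The image name is a placeholder, and the full annotation key and the auto:size value syntax follow CRI-O's documented convention for its annotation-based feature:

```shell
# Sketch of a pod spec opting into the annotation-based user namespace
# feature (image name is a placeholder, not the real FreeIPA image).
cat > pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  annotations:
    io.openshift.builder: "true"
    io.kubernetes.cri-o.userns-mode: "auto:size=65536"
spec:
  containers:
  - name: nginx
    image: example.com/systemd-nginx:latest
EOF
# Admit it with an account that can use the anyuid SCC, e.g.:
#   oc create -f pod.yaml
```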
It maps that range onto the container. Kubernetes itself has no support for user namespaces, but it is a long-running and ongoing discussion; the first proposal was maybe two or three years ago and there have been a few iterations. There's a fair bit of discussion about it at the moment with this third proposal, number 3065. I still need to catch up on this; it happened over December and Christmas, so I need to get up to date with what's actually happening in the Kubernetes upstream in terms of user namespace support.

Okay, so that's user namespaces; the next challenge is cgroups. OpenShift creates a unique cgroup for each container, and it creates a cgroup namespace for that container, which makes the container's cgroup appear to be the root cgroup inside the container. So when you mount the cgroup filesystem, what appears inside the container as /sys/fs/cgroup will actually be something like /sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/..., blah blah blah, some container ID: some deeply nested cgroup for the specific container. In order to bring up the system and do the service management, systemd needs write access to the cgroup to create new scopes and slices, but it doesn't have it.

So the solution: let's modify the runtime so that it will change the owner of the cgroup to the user ID that's actually running the container process. I implemented this and submitted a pull request, and during the discussion and review of that pull request people pointed out that crun, one of the other runtimes, already does this, and we found that there were discrepancies between what different runtimes were doing in this regard. The conclusion was: let's define a semantics for cgroup ownership in the OCI runtime spec. So I shelved that pull request, went and made proposals to the OCI runtime spec to define this semantics, there was some discussion about it, and eventually that pull request was accepted; after that we were able to get the runc pull request merged.

So what is the semantics? If and only if cgroups v2 is in use, and the container has its own cgroup namespace, and the cgroupfs is to be mounted read-write, then the runtime should change the owner of the container's cgroup to the host UID that maps, in the user namespace, to the UID of the process that is the container's entry point. Hopefully that makes sense. TL;DR: it will chown the cgroup to the user that corresponds to root inside the container's user namespace, in the common case. It's a little more nuanced than that in terms of what actually gets chowned, though.
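The ownership rule just stated can be sketched as a tiny helper: given the preconditions and the container's uid_map, pick the host UID that container root maps to. This is an illustration of the spec's rule, not runc's actual implementation:

```shell
# Sketch of the OCI cgroup-ownership semantics (not runc's real code).
# Preconditions, all of which must hold before any chown happens:
#   - cgroups v2 in use
#   - the container has its own cgroup namespace
#   - the cgroupfs is mounted read-write
# host_uid_for_container_root "<uid_map line>": print the host UID that
# container UID 0 maps to. uid_map lines are "<cont-uid> <host-uid> <len>".
host_uid_for_container_root() {
    set -- $1                  # split the uid_map line into its three fields
    if [ "$1" -eq 0 ]; then    # this extent covers container UID 0
        echo "$2"              # ...so its host-side start is the new owner
    fi
}

host_uid_for_container_root "0 200000 65536"   # prints 200000
```

With the demo's mapping, the runtime would chown the container's cgroup to host UID 200000, which is what root inside the container maps to.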
The cgroup directory itself is chowned, so that systemd or the container process can create new sub-cgroups, new scopes, new slices, along with only the other files mentioned in /sys/kernel/cgroup/delegate, so that it can move processes and threads into and out of the cgroups in its cgroup namespace. Why only these? Because if you chowned everything in the container's cgroup it would allow the container to elevate its own resource limits and do other things that it should not be allowed to do.

We need to use cgroups v2; this is required for secure cgroup delegation. cgroups v2 is implemented, it works, but it is not yet the default configuration, though that is on the roadmap. So, how to configure the cluster? In OpenShift we use an object type called MachineConfig. We need to specify kernel arguments to turn on cgroups v2, and we need to change a couple of files that specify allowed ranges of host UIDs that can be mapped
to child user namespaces, and we also need to deploy an experimental version of runc, because the runc changes have not yet made their way into a release. We use a systemd unit to do that, and that systemd unit will run rpm-ostree override replace if the desired version of runc is not yet present on that node.

Okay, demo time. I have here a cluster; I only deployed this cluster about an hour ago. oc get machineconfig: there is a machine config here that I already deployed, this one, idm-410. The three slides that I just showed you, that's that machine config. There's a lot more information when viewing the object itself; hang on, idm-410. Sorry, I was showing all machine configs; so here's the data that was in the last few slides.

I'm going to create a new project called test.
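The runc replacement step described above can be sketched as a systemd unit along these lines; the unit name, RPM path and version check are illustrative placeholders, not the ones actually used on the cluster:

```shell
# Illustrative systemd unit for swapping in an experimental runc via
# rpm-ostree (unit name, RPM path and version test are placeholders).
cat > replace-runc.service <<'EOF'
[Unit]
Description=Replace runc with an experimental build
Before=crio.service

[Service]
Type=oneshot
# Only replace if the desired build is not already installed on the node
ExecStart=/bin/sh -c 'rpm -q runc | grep -q experimental || \
    rpm-ostree override replace /opt/runc-experimental.rpm'

[Install]
WantedBy=multi-user.target
EOF
```

On an immutable CoreOS node, rpm-ostree override replace is the supported way to swap a base package, which is why a oneshot unit like this is used rather than a plain package install.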
I'm going to create a user called test. I'm going to assign edit permissions on this project to user test, to allow it to create pods, and I'm going to assign it the anyuid SCC, which is the workaround that we need to be able to admit a pod that's using user namespaces and running as UID 0 in the user namespace, even though that will map to unprivileged users. And I will create a pod with a systemd container just running nginx. This is the same as I showed you on the slide with the example pod spec for using user namespaces; in particular, there are those two annotations.

So if we have a look at the pod, and particular attributes, the node name (what node is it being scheduled on) and the statuses of the containers in that pod, we can see we're running on this worker node, worker node A, and the pod is running. Let's now open a debug shell on that worker node and have a poke around.
To see what's happening under the hood, we're going to need to copy the container ID, or a prefix of it. We need to chroot /host on the worker node, and now we can run crictl, which is a tool for interacting with the container runtime on the worker node: crictl inspect with that ID, and we'll use jq to pull out just the PID of the process. Now, if we do cat /proc/65317/uid_map, what this file shows us is that UID 0 inside the user namespace of this process is mapped to UID 200000 on the host. It's 200000 because of the modifications to /etc/subuid and /etc/subgid that we made: that's just the start of the allowable range of assignments, so as the first container using a user namespace that was created, that's where it started to allocate host UIDs from.
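The uid_map arithmetic can be checked by hand: each line maps an extent of container UIDs onto host UIDs, so container UID u lands on host UID 200000 + u. A small sketch, using the map line from the demo:

```shell
# Translate a container UID to a host UID using one uid_map extent,
# "<container-start> <host-start> <length>" (here the demo's mapping).
map="0 200000 65536"
container_to_host() {
    set -- $map "$1"                 # $1..$3 = extent fields, $4 = container uid
    if [ "$4" -ge "$1" ] && [ "$4" -lt $(($1 + $3)) ]; then
        echo $(($2 + $4 - $1))       # host start plus offset into the extent
    fi
}
container_to_host 0      # root in the container: prints 200000
container_to_host 1000   # prints 201000
```

This is the same lookup the kernel performs whenever a UID crosses the namespace boundary, for example when a file owned by container root is viewed from the host.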
The size of the range is 65536, as requested. Now, if we have a look at the processes running in the container, we can do pgrep --ns, which means: show me all of the processes running in the same set of namespaces as process 65317. Let's pipe that to xargs ps -o, and we'll just have a look at the user, the PID and the command line, and sort by PID. So this is what's running in the container: init, that's systemd; we see a bunch of other systemd daemons running; and then, toward the end, nginx, the master process and the workers for the nginx server. So this is a systemd-based container running in a user namespace. The last thing to look at is the cgroup, and for that we use oc rsh, a remote shell for running commands inside a container.
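The pipeline just described works against any target process; here it is run against the current shell instead of the container's init (PID 65317 in the demo):

```shell
# Show user, PID and command line for every process sharing all
# namespaces with a target PID, sorted by PID: the demo's pipeline.
# The target here is the current shell rather than the container's init.
target=$$
pgrep --ns "$target" | xargs ps -o user,pid,args --sort pid
```

Run as an unprivileged user this only matches your own processes; matching another user's namespace set, as in the node debug shell, requires root.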
With a remote shell into the nginx pod (I think I'm almost out of time), it will suffice to demonstrate that /sys/fs/cgroup inside the container is owned by root, and that its inode is not the inode of /sys/fs/cgroup on the host, or rather in the host cgroup namespace, which is inode 1, if you were to find this directory from the host's point of view. Actually, we can have a look here: ls -ai /sys/fs/cgroup. We can see that only those files that were allowed to be chowned (init.scope, memory.oom.group, the directory itself) are owned by root inside this user namespace; everything else is nobody. It's actually root in the host user namespace, but as that's unmapped, and unmapped UIDs get interpreted as UID 65534, it shows up as the nobody user.

Okay, so that concludes the demo. Here's a slide with some links to resources.
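Going back to the unmapped-UID point for a moment: the value an unmapped UID collapses to is not hardcoded in userspace; the kernel exposes it as the "overflow" UID, which is 65534, the nobody user, on stock Linux systems:

```shell
# Unmapped UIDs are reported as the kernel's overflow UID, which is the
# nobody user (65534) on stock Linux systems.
cat /proc/sys/kernel/overflowuid
```

So when ls inside the user namespace shows "nobody" for the host-root-owned cgroup files, it is really showing this overflow value, not a nobody account that owns anything.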
The main repo for this project doesn't have much in it yet, but we intend to expand it. My experimental runc builds are at my homepage on Fedora People. I blog extensively about all of this research and development, so if you look at the containers tag on my blog you'll find a lot of interesting stuff. And I have a YouTube video of the demo, with a bit more in it than what I was able to show you just now.

As for the future: user namespace support in Kubernetes itself is an ongoing discussion. On OpenShift we can use the annotation-based user namespace support, and we can combine this with the feature to own the container cgroup, which at the moment is only in an experimental build; then you can do it as I've just demonstrated. But the question of official support is still an ongoing discussion that we're having with the OpenShift project and its product management.
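For reference, the annotation-based user namespace support mentioned above is, as far as I understand it, driven by the CRI-O pod annotation `io.kubernetes.cri-o.userns-mode`. The sketch below is an assumption-laden illustration, not an official recipe: the pod name and image are placeholders, and the 65536 range echoes the size seen in the demo.

```shell
# Hypothetical sketch: write a pod manifest requesting an automatically
# allocated user namespace with a 65536-UID range via the CRI-O
# annotation (assumed here; whether the cluster honours it depends on
# the CRI-O configuration).
cat > nginx-userns-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  annotations:
    io.kubernetes.cri-o.userns-mode: "auto:size=65536"
spec:
  containers:
  - name: nginx
    image: registry.example.com/nginx   # placeholder image
EOF

# Then, against a cluster where CRI-O allows this annotation:
#   oc create -f nginx-userns-pod.yaml
```

With that in place, the checks from the demo (the UID map, the process list, the cgroup inode) can be repeated against the new pod.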
We are looking for allies, that is, other teams in a similar boat to us, with legacy applications that would benefit from these features being matured, released and officially supported in OpenShift. At the end of the day we might not get what we want, and we may end up having to re-architect FreeIPA for the cloud. That would not be the end of the world, but we're hoping we can keep pressing on along the path I've been speaking about in this presentation. And that concludes it.

Thank you very much, Fraser. I know I've certainly hit the container root requirement mapping issue multiple times in the last several years, and as you said, ideally you just don't do that, but that's not always the life we live, unfortunately. We have kind of run out of time, so for the few questions that came in, we'll drop them into the chat window, and hopefully Michael and Fraser will be able to sort those out
as we get prepared for the next speaker, who is going to be coming up very shortly. We've got ten minutes of changeover, so go grab a drink and we'll see you back in ten. Thank you!