Hello everyone, and welcome to the first round of talks for day one of the general track. I'm Ash, I'll be session chairing this block, and for our first talk we have Nick Moore talking about the primordial code. So, everybody give a big warm welcome to Nick.

Thank you. There we go. Just to start off, content warnings: we mention infectious and genetic diseases, including cancer and COVID, but not in any gruesome detail or anything like that. Just content warnings.

All right. My name is Nick Moore. I'm a consultant. I work in software development and systems architecture for a number of clients in a number of industries through my own company, Mnemote. I've been doing that for a lot of years now and it's awesome fun. You should try it. I've presented here previously about MicroPython, about the Internet of Things, about visual programming languages, and about functional programming in JavaScript.
Not at this conference, but at linux.conf.au, I've also talked about how to do NoSQL in an SQL database. And now I'm having the luck of being able to try my hand at being a bioinformatician.

To start with, thanks and apologies to all my colleagues at the Walter and Eliza Hall Institute of Medical Research, the University of Washington Department of Genome Sciences, and the Brotman Baty Institute. The mistakes and oversimplifications in this presentation are all mine. They've been incredibly patient helping me learn and find my feet in this really amazing field.

There are, or at least there will be, the slides, the notes, and the errata for this presentation up at the URL on that QR code. It'll be updated soon enough, and that way any really gratuitous mistakes I make I can at least apologize for later.

All right, let's talk about bioinformatics. What is bioinformatics? Does anyone other than Alan and Estelle know what bioinformatics is? Cool. That was way more people than I expected. Hopefully you don't all work for WEHI.
So, bioinformatics is the study of biological systems, kind of numerically. And because biological systems are a smidge complicated, that basically involves computers.

We're really lucky: computers and biology have kind of developed in parallel over the last couple of centuries. It dates back to the invention of computing in the 19th century, and the idea of data as a continuous tape, for telegraphy and things like that. At roughly the same time, we were discovering that the inheritance of characteristics in plants and animals was governed by discrete inheritable units. They weren't called genes yet, but that's what we're talking about: genes. And Darwin was working on natural selection, and on how all of this could end up with the diversity of life we see.
At the same time, we actually did discover the DNA molecule. We discovered this enormously long, wiggly molecule. It wasn't very interesting, it just looks like boogers, and we had no idea what it was for. I just think it's funny that at the start of this story we have DNA already lying around and everyone goes, "Oh, gross."

Anyway, at some point people worked out that these inheritable units had a relationship: some were more likely to be inherited together, and some were less likely. With paper and pen, people worked out that there was a linear pattern to this. There were closer ones and further ones, and the distances kind of added up along a long line, which is really interesting when you think about it. Koltsov proposed that this was a giant hereditary molecule, a molecule passed down from a parent organism to a child organism, and that this must be how heredity works. I don't think he knew it was DNA at that point.
It's one of those things where the Soviet world and the Western world maybe weren't talking that much. Unfortunately, his idea ran afoul of Lysenkoist, Lamarckist theories of evolution, and so it didn't go too well for him.

At the same time, Alan Turing was working on the fundamentals of computing, and he came up with the idea of a universal computing machine that moves back and forth along a tape. It's not really DNA, but it's the same idea of a linear organization of information being developed at the same time. Boolean logic got applied to circuits, with the theory behind that developed by people like Claude Shannon. And then, of course, the war distracted people somewhat into cryptanalysis, which ended up giving us general-purpose computing, which was kind of awesome.

In the 40s, Frederick Sanger, working, I think, on the theory that proteins were what you inherited, worked out how to sequence proteins. That was pretty awesome, and he won a Nobel Prize for it. But it turned out that that's not actually how things are inherited.
Oh yeah, and in the meantime Alan, after the war, goes on to work on morphogenesis, which is basically bioinformatics. So obviously this is an interesting field that's developing.

In about '44, DNA turned out to be the carrier of genetic information. That horrible booger-looking molecule turned out to be where all of our heredity comes from. Watson and Crick and Rosalind Franklin worked out its structure, the double helix that you're familiar with and that you'll see in a minute. Frederick Sanger, not put off by the fact that the molecule he'd been working on wasn't the source of heredity, goes and works out how to sequence DNA and wins another Nobel Prize. You know, just another Tuesday.

And then in the 2000s we developed DNA editing. We worked out mechanisms that let people actually edit DNA molecules and change genes. That's pretty awesome.
In recent decades, we've worked out how to change genes in a living cell and how to change genes in a living person. This has actually been used to cure people's genetic diseases, which I think is just amazingly awesome.

All right. To make any sense of this presentation, we're going to have to teach you a bit of biology: about 4 billion years of biology in about 10 minutes. So, hang on.

Organisms are made of cells. A cell is like a tiny computer. It's an encapsulation. It has a bunch of code inside it, which we call genes. That code makes proteins, the little machines that make the cell work.

The cell is sort of defined by its membrane, its edge, the outside container that keeps the outsides out and the insides in. You can think of that as a bit like a firewall. It has various ports through it, called channels, that let information in the form of molecules come into and out of the cell under the cell's control.
So the cell kind of controls its own gateways. And that's really what a cell is: a collection of stuff in one place. There are different kinds of cell. That's a typical bacterial cell, a typical animal cell, and a typical plant cell. They differ in a lot of ways, but they have a lot in common, and largely that's the idea that the genetic information is hoarded very carefully in the center.

All right. DNA itself, the molecule we're talking about, is indeed a very long molecule.
This is just a little section of a very long, spiraling molecule, meters in length, composed of these bases, these subunits, and the subunits pair up. So it's like a very long tape, but instead of two symbols, one and zero, we have four: A, C, G, and T. They complement each other and build themselves up like a ladder, so you get that symmetry.

That's more or less exactly what Koltsov described, funnily enough, even though he had none of the equipment required to see or work out any of this. He knew that if we're going to record all this stuff as some very long molecule, it'll have to zip together with some other very long molecule to stabilize it. Anyway, that's pretty amazing.

When I talk about genomes coming up, we'll talk in terms of base pairs, or megabase pairs, or whatever. A base pair is effectively worth two bits, because there are four options for a base pair and log2(4) = 2. That's just to give you an idea of what the currency is here.
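To make "two bits per base" concrete, here is a minimal Python sketch. It assumes an arbitrary mapping of A=00, C=01, G=10, T=11, and real file formats may choose differently. The sketch packs a sequence into an integer, unpacks it again, and uses the pairing rules (A with T, C with G) to build the partner strand of the ladder.

```python
# Minimal sketch: 2-bit packing of DNA plus the complementary strand.
# The A/C/G/T -> bits mapping is an illustrative assumption, not a standard.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}
# Base pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def pack(seq: str) -> int:
    """Pack a DNA string into an integer, 2 bits per base, first base most significant."""
    value = 0
    for base in seq:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def unpack(value: int, length: int) -> str:
    """Recover `length` bases from a packed integer.

    The length has to be stored separately, because leading A's are
    all-zero bits and would otherwise vanish.
    """
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


def reverse_complement(seq: str) -> str:
    """The partner strand, read in the same direction convention as the input."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


packed = pack("ACGT")
print(bin(packed))                 # 0b11011
print(unpack(packed, 4))           # ACGT
print(reverse_complement("AACG"))  # CGTT
```

With this particular mapping, the code of a base's partner is simply 3 minus its own code (A=0 pairs with T=3, C=1 with G=2). That is a side effect of the assumed ordering and nothing more.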
So in about '77 we started being able to sequence whole organisms. The first one sequenced, I believe, was a bacteriophage, a virus that attacks bacteria, called φX174. It has only about five kilobases of single-stranded DNA and only 11 genes. Anyone here who programs very old computers will sympathize. Those 11 genes actually overlap and interfere with each other, and the genome is very compact, because it manages to fit a lot of potential into such a tiny number of instructions. Incredible.

A couple of decades later, we're sequencing much more complicated organisms, like E. coli, a pretty common human gut bacterium, which has millions of bases, and yeast (ask me about my yeasts), which has millions of bases and thousands of genes. We eventually work up to the fruit fly, an organism beloved of biologists because you can catch them in a milk bottle, breed them, mail them to people, and things like that.
They're very good to work with. They have over a hundred million base pairs. And eventually, in about 2003, we worked up to sequencing a whole human genome, which is about three billion base pairs. As a bonus, we all carry around two copies of it, and it encodes about 20,000 genes. So a human genome is quite a large project.

This is just a chart giving an idea of genome sizes across different species. Part of the point of this chart is actually the opposite of what I just said. Mammals are down here in this little region, but here are flowering plants, which turn out to have way bigger genomes sometimes, or not. There's huge variation in genome size between species; I think that's the point of that one. But the funny thing about it is that we're all using basically the same genetic code. There's so much in common between different organisms.
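As a rough back-of-the-envelope sketch, the genome sizes above can be converted into storage at two bits per base. The figures are the talk's round approximations (about five kilobases for the phage, about three billion base pairs per copy of the human genome), and the sketch ignores real-world file formats, sequencing quality scores, and compression.

```python
BITS_PER_BASE = 2  # four possible bases -> log2(4) = 2 bits


def packed_bytes(num_bases: int) -> int:
    """Bytes needed to store num_bases at 2 bits each, rounded up to a whole byte."""
    return (num_bases * BITS_PER_BASE + 7) // 8


phage = 5_000                   # "about five kilobases"
human_one_copy = 3_000_000_000  # "about three billion base pairs"

print(packed_bytes(phage))               # 1250
print(packed_bytes(human_one_copy))      # 750000000, i.e. about 750 MB
print(packed_bytes(2 * human_one_copy))  # 1500000000 for both copies
```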
This is, I've been told, a compulsory slide that must be included in every single bioinformatics presentation: the cost to sequence a human genome. It falls precipitously. But wait. Those of you who aren't terrible science nerds may not have noticed that that is a log scale. The top point there is $100 million to sequence a single human genome, and by this end it falls to less than $1,000 to sequence a single human genome. That's kind of awesome.

So the chart actually understates it. If I plotted it on a linear scale, it would just drop like a stone and then run flat along the bottom.

This is really exciting for a couple of reasons. One, of course, is that it allows us to do precision medicine: to really analyze one particular patient, get to the heart of what the hell is wrong with them, and maybe find a specific cure for that specific person. And that's really cool. But it also lets us look more broadly at what a human genome is.
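A little arithmetic makes the log-scale point clearer. This minimal sketch uses the talk's approximate endpoints of $100 million and $1,000. The drop is a factor of 100,000, or five orders of magnitude. On a log axis, each factor of ten takes up the same vertical distance, whether it is $100 million to $10 million or $10,000 to $1,000.

```python
import math

start_cost = 100_000_000  # approx. USD per genome at the top of the chart (talk's figure)
end_cost = 1_000          # approx. USD per genome at the end of the chart (talk's figure)

fold_drop = start_cost // end_cost
print(fold_drop)  # 100000

# On a log-scale axis, vertical position is proportional to log10(cost).
orders_of_magnitude = math.log10(start_cost) - math.log10(end_cost)
print(orders_of_magnitude)

# Equal ratios cover equal vertical distance on a log axis.
top_step = math.log10(100_000_000) - math.log10(10_000_000)
bottom_step = math.log10(10_000) - math.log10(1_000)
print(math.isclose(top_step, bottom_step))  # True
```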
The Human Genome Project produced essentially one sequence. It's combined from a couple of people, I think, but it basically represents one person, right? The pangenome is a representation of all of the possibilities for healthy humans worldwide. It opens up the possibility of doing medicine for, for example, people in developing countries who aren't really in the market for precision medicine, where maybe we can at least understand the conditions that apply to them more easily, which I think is really awesome too. So these pangenome projects are rolling out to look at thousands, tens of thousands, and hundreds of thousands of people across the world's population, and to build up a picture of what all of that combined might be.

All right. So let's talk about genes. Let's talk about evolution. The reason we can evolve is that we can change, and the reason we can change is that we can make mistakes.
Mistakes creep in whenever a cell replicates, or whenever a cell receives radiation damage or something like that. Viruses can write themselves back into your DNA. Most of these changes are very unhelpful and not good for you. But on evolutionary time scales the good outweighs the bad, and you end up evolving new abilities. It's not quite like the X-Men or whatever. It doesn't help an individual very often, but over billions of years you get somewhere.

The kinds of changes we're talking about are substitution of a single base for another base, insertion of some bases, deletion of some bases, or duplication: sometimes the mechanism gets a bit gummed up and you end up with little duplicated runs. As a programmer, you can imagine doing this to your own code, right? Very often it would not be helpful, but every now and then it might turn out that it actually improves something. The duplication is especially interesting.
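To make the programmer's analogy concrete, here is a toy Python sketch of the four kinds of change listed above, applied to a DNA string. The function names and 0-based positions are illustrative assumptions. Real mutations happen chemically, not through string slicing.

```python
def substitute(seq: str, pos: int, base: str) -> str:
    """Replace the single base at `pos` with `base`."""
    return seq[:pos] + base + seq[pos + 1:]


def insert(seq: str, pos: int, bases: str) -> str:
    """Insert `bases` before position `pos`."""
    return seq[:pos] + bases + seq[pos:]


def delete(seq: str, pos: int, length: int) -> str:
    """Remove `length` bases starting at `pos`."""
    return seq[:pos] + seq[pos + length:]


def duplicate(seq: str, start: int, end: int) -> str:
    """Tandem duplication: repeat seq[start:end] immediately after itself."""
    return seq[:end] + seq[start:end] + seq[end:]


original = "ACGTAC"
print(substitute(original, 2, "T"))  # ACTTAC
print(insert(original, 3, "GG"))     # ACGGGTAC
print(delete(original, 1, 2))        # ATAC
print(duplicate(original, 0, 3))     # ACGACGTAC
```

After `duplicate`, the run `ACG` exists twice. That is the situation the talk turns to next, where one copy can drift while the other keeps doing the original job.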
If you copied a function in a piece of code, you'd now have two copies that both kind of do the job. That frees one of them up to change a bit more without losing the job the first one was doing. So that's part of the mechanism of evolution. I think it's really interesting that if replication and radiation protection and all that stuff were perfect, nothing would ever evolve. We wouldn't be here. The system has to be imperfect for it to work. I think that's amazing.

Bacteria have an even better trick up their sleeve: horizontal gene transfer. A bacterium can build a thing called a plasmid and inject it into one of its neighbors using these little hairy tube things, giving away some of its genes, and not necessarily to the same species. This seems really remarkable, because it means that bacteria are the first free software advocates.
What's really interesting about this one, right, is that it's not the bacterium's survival that matters in the long run. It's the gene's survival. If I'm a bacterium with a gene for antibiotic resistance and I give that gene to a neighbor, that neighbor can develop the same antibiotic resistance, and the gene will now be replicated by that neighbor. Even if I now get outcompeted, the gene lives on. And we've all seen open source libraries that do this too, you know.

So I want to talk about a concept at the center of biology, called the central dogma of molecular biology. The idea is that DNA gets transcribed into RNA, and RNA gets translated into proteins, right? It's very orderly, very neat: DNA turns into RNA turns into proteins. You have code, you compile it, you link it, you run it. Fine.

Of course, there's also reverse transcription. That's when a virus writes from RNA back into DNA. There's also replication, where DNA gets copied into more DNA. So it's not quite that neat.
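The information flows described above can be sketched as string transformations. This is a simplification that assumes we are working with the DNA coding strand. Messenger RNA matches that strand's sequence, except that RNA uses U (uracil) where DNA uses T. The function names are hypothetical, and `reverse_transcribe` glosses over the fact that reverse transcriptase first builds a complementary strand.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def transcribe(coding_strand: str) -> str:
    """DNA -> RNA: the message matches the coding strand, with U in place of T."""
    return coding_strand.replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """RNA -> DNA (simplified): the DNA strand with the same sequence, T in place of U."""
    return rna.replace("U", "T")


def replicate(strand: str) -> str:
    """DNA -> DNA: build the complementary partner strand, read in the same direction."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(strand))


gene = "ATGTTTGGCTAA"
message = transcribe(gene)
print(message)                      # AUGUUUGGCUAA
print(reverse_transcribe(message))  # ATGTTTGGCTAA
print(replicate(gene))              # TTAGCCAAACAT
```

Replicating the partner strand gives back the original, which is why each daughter molecule ends up with the same information.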
And sometimes RNA actually has a useful job to do itself. That's called non-coding RNA. Think of it as a data segment in your code: it gets used directly rather than being turned into a protein. But that's fine; that's not too complicated. Except for RNA polymerase, right? RNA polymerase is a protein, and it does the job of transcription. I think that's okay, though, because we've got the protein at the end, the thing we just compiled and ran, and that's our compiler. Hang on. RNA also needs to be spliced, and splicing is kind of complicated. It's done by proteins and non-coding RNA, and so is translation.

So we need to run the compiler in order to run the compiler. This starts to get a bit tricky. The only reason this works at all is that it's bootstrapped.
The cell that you grew from, the oocyte, is full of enough of these logical units to keep building itself for long enough to get itself up and running. It's a bit like a boot disk. There's enough of an operating system on there that it can make another boot disk, and another boot disk, and copy errors kind of creep in.

We call this bootstrapping, from the metaphor of pulling yourself up by your own bootstraps, which is of course impossible. But it raises the question: when did the first cell manage to do this? And that's something we just don't know. There's no source control in biology. We can't check out the first version and find out. It's very frustrating.

But we can look at the common features. (Oh, that's not the right slide.) We can look at the common features. This is actually one of the common features of all these different branches of life. So what have they all got in common? From that, we can theorize the existence of a common ancestor that must have arisen at some point.
Possibly it had RNA instead of DNA and sort of started halfway through. Possibly a lot of things. And that's where all life evolved from, in its different ways.

This is just a little example. I said before that proteins are like little machines, and I wanted to demonstrate to you that this is not a metaphor. This is not one of those "endofunctors in the whatever" things. This is an actual, real machine. It's called ATP synthase. It's a protein complex: the different colored bits are different proteins, and they all lock together to build a little machine. It has a rotating crankshaft, and it literally works in either direction: protons push it around and it makes ATP, or ATP pushes it around and it pumps protons. It's like an engine in every single cell of every single organism, and there are slightly different versions of it around.
Again, not great source control, but it is one of the basic units of what we call life. You could also say that life is just ATP synthase's way of making more ATP synthase. It depends on your perspective.

Okay, translation. I talked about translation before: translation is when you turn RNA into a protein. There's a lookup table of which bits of RNA turn into which amino acids, and then the amino acids get assembled into a protein, into an exciting little machine. This looks a little bit computery, doesn't it? We've got different encodings. We've got start codons, which say where to start running, and stop codons, which say when to stop running. It looks a bit like a decode table from an early microprocessor or something like that.

But there are no loops, there's no branching, there's nothing. So how the hell is this a program? You can't really have a program without loops and branching and all of that sort of stuff, can you?
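The lookup-table part of translation really is a table. Here is a minimal sketch of the standard genetic code (NCBI translation table 1) plus a simplified translator that starts at the first AUG and stops at the first stop codon. Real ribosomes choose start codons using surrounding sequence context, so treat `translate` as an illustration rather than a faithful model.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), RNA alphabet.
# Codons are enumerated with each position in U, C, A, G order; '*' = stop.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str, table: dict[str, str] = CODON_TABLE) -> str:
    """Find the first AUG start codon, then read codons three at a time
    until a stop codon or the end of the sequence."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = table[mrna[i:i + 3]]
        if aa == "*":  # stop codon: stop running
            break
        protein.append(aa)
    return "".join(protein)


print(translate("GGAUGGCCUUUUAAGG"))  # MAF
```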
The answer is that the chromosome, the thing that connects all the genes up, isn't just the genes. It also contains control regions, a bit like a library file. There are sections that say how much protein you should make using this gene, how much the gene should be activated, how much it should be repressed, and so on, and also how it should be spliced. All of that can vary based on external stimuli or on other things happening inside the cell, such as the expression of other genes. So it's actually really complicated: the program is in the interactions between genes.

As the simplest possible example: in E. coli, there's a group of genes that make enzymes that break lactose down. Lactose is a sugar, but a slightly more complicated one. E. coli would really rather just eat glucose lollies than drink milk.
But in the presence of lactose and the absence of glucose, it can make an enzyme that breaks the lactose down into glucose and galactose. There's no point wasting its energy making those enzymes if they're just going to lie around the place being useless. So it has this little bit of logic in its genetics that says, "Okay, if there's lactose hanging around and I don't have enough glucose, then make some enzymes to convert it." So it looks a bit like that, except, being biology, it's all squishy. None of it is purely Boolean. It enhances it a bit, it represses it a bit; it's all very variable and squishy. That's the way of these things. And this is the simplest possible gene regulation.

Most gene regulation maps look more like that, where you have very many genes enhancing or suppressing other genes, and many outside influences coming in. Many causes, many effects. This is not happy programming territory.
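The lac decision described above can be written both ways: as the Boolean `if` it resembles, and as a graded response closer to the "squishy" reality. This is a sketch under heavy assumptions: the Hill-function form is a common modelling convention, and the constants `k` and `n` are invented. In the real operon, lactose acts through allolactose releasing the LacI repressor and low glucose acts through cAMP-CAP activation; none of that mechanism is modelled here.

```python
def lac_boolean(lactose: bool, glucose: bool) -> bool:
    """The 'if statement' version: make the enzymes only when lactose is
    present and glucose is absent."""
    return lactose and not glucose


def hill_activation(x: float, k: float, n: float) -> float:
    """Rises smoothly from 0 toward 1 as x increases; equals 0.5 at x == k."""
    return x**n / (k**n + x**n)


def hill_repression(x: float, k: float, n: float) -> float:
    """Falls smoothly from 1 toward 0 as x increases; equals 0.5 at x == k."""
    return k**n / (k**n + x**n)


def lac_graded(lactose: float, glucose: float) -> float:
    """A 'squishy' version returning an expression level between 0 and 1.
    k and n are made-up illustration values, not measured ones."""
    return hill_activation(lactose, k=1.0, n=2) * hill_repression(glucose, k=1.0, n=2)


for lac, glc in [(0.0, 0.0), (5.0, 0.0), (5.0, 5.0), (1.0, 1.0)]:
    print(f"lactose={lac} glucose={glc} -> expression {lac_graded(lac, glc):.2f}")
```

The Boolean version answers only "on" or "off"; the graded one lets lots of lactose partly overcome some glucose, which is the kind of in-between behavior the talk calls squishy.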
This is real spaghetti code. Even picking apart the simplest one of these relationships is extremely difficult. I think this is one of the frontiers of biology at this point: understanding how these things interact, building these maps, and trying to get an idea of what the hell is going on inside the cell.

All right, one last little example about existing biology. I talked before about all these different forms of life descending from one place. A really interesting thing happened about two billion years ago, when some particularly entrepreneurial archaea absorbed some aerobic bacteria and just put them to work. They set up the first companies; they were the founders. The bacteria moved in and got protection in return for providing a lot of energy to the cell that contained them. And you have this encapsulation going on.
So, inside a eukaryote. I should just explain: eukaryotes include animals and plants and yeasts and slime molds and all sorts of things, and humans, so that's nice. Inside the eukaryotic cell we actually have a lot of encapsulation going on. We have a nucleus, which is the central, most defended part of the cell. That's where the DNA lives. That's your kernel code, the most protected part. Around that, you have other, user-space parts of the code in the cytoplasm, and you also have these mitochondria, which run separately, out there doing their jobs, a little bit like a container image. Weirdly, they're running an older version of the operating system. If you think of all of this translation-table stuff as an operating system for a cell, the mitochondria are still running the one they came with, or some version of it. It uses a slightly different encoding.
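To make "a slightly different encoding" concrete: the vertebrate mitochondrial code (NCBI translation table 2) reassigns four codons relative to the standard code. This sketch rebuilds the standard table and diffs it against that variant. Mitochondria in other lineages use other variants, so this is one example rather than *the* mitochondrial code.

```python
from itertools import product

BASES = "UCAG"
# Standard code (NCBI table 1), codons enumerated in U, C, A, G order; '*' = stop.
STANDARD = dict(zip(
    ("".join(c) for c in product(BASES, repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))

# Vertebrate mitochondrial code (NCBI table 2): the same table with four
# codons reassigned.
VERTEBRATE_MITO = {**STANDARD, "UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"}


def differences(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Codons whose meaning differs between two tables, as (a, b) pairs."""
    return {codon: (a[codon], b[codon]) for codon in a if a[codon] != b[codon]}


for codon, (nuclear, mito) in sorted(differences(STANDARD, VERTEBRATE_MITO).items()):
    print(f"{codon}: standard {nuclear} -> vertebrate mitochondrial {mito}")
```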
It has its own enzymes, its own everything, inside that little encapsulation. They even replicate themselves: as the cell replicates, they also replicate and send some copies this way and some copies that way. So they're in there, replicating along with the cell, but within their own thing.

Plants do even better. They did it twice: they also absorbed a cyanobacterium, and that's how they do photosynthesis. So plants are actually way ahead of us. I believe there's some alga that has done it three times and also has things called nitroplasts, or something, which are basically another one of these. Oh wow, this is good.

Many eukaryotes, though not all, are multicellular. Specialization is done at runtime. Every cell in your body has the same genome. At runtime, while you are alive, your cells work out what they're meant to be doing. They shift into that mode and do that job. They communicate with each other, exchanging messages through those cell walls.
If a cell is like a computer, then your body is like a whole network of computers interacting all the time. Even funnier, things like slime molds and some bacteria can form big groups when times are tough, and then disperse into individual cells when they'd rather do that. They're very interesting: multicellular life turns out to be kind of optional.

All right, applications. Biology is really cool, but it's awesome that we can actually do useful things with it. For example, we can look at genetic disorders. There are various genes where, if the gene doesn't work as well as it should, you may have problems with that function in your body. For example, G6PD is a gene. It makes a protein called G6PD (very imaginative, thank you, biologists), which is part of the cell's antioxidant defenses. If it's faulty, you might get a thing called hemolytic anemia, which is where your red blood cells explode. Extremely metal; quite painful, I imagine. Not a very pleasant thing to have.
You probably don't know if you have it unless you eat the wrong thing and wake up in hospital, I imagine.

What we want to do is find out about those variants before we have people lying around in hospital. So the kind of experiments I've been working on with people take genes like G6PD and alter them: take every possible base in the gene and make every single one-letter typo that could possibly happen in G6PD. We run through those one-letter typos and build a library of them.

Then, ask me about my yeasts: we insert them into yeasts. This is an awesome thing. A yeast, a single-celled little thing you make beer with (this is brewer's yeast), can run the software that runs a human just fine. They're on the same operating system. I think that is just stunning. The nice thing about yeasts is that they double in population every hour and a half in ideal conditions, and you can kill a billion of them before lunchtime and no one cares.
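Here's one way to read "every single one-letter typo": at each position, substitute each of the three other bases, giving three variants per base. The six-base `toy_gene` is a hypothetical stand-in (the real G6PD coding sequence is much longer), and real mutational-scanning libraries are often designed at the codon level instead. The `fold_growth` helper just does the arithmetic for the 1.5-hour doubling time mentioned above, assuming ideal exponential growth.

```python
from typing import Iterator

BASES = "ACGT"


def single_base_variants(seq: str) -> Iterator[tuple[int, str, str, str]]:
    """Yield every possible one-letter substitution as
    (0-based position, reference base, alternate base, mutated sequence)."""
    for pos, ref in enumerate(seq):
        for alt in BASES:
            if alt != ref:
                yield pos, ref, alt, seq[:pos] + alt + seq[pos + 1:]


def fold_growth(hours: float, doubling_time_hours: float = 1.5) -> float:
    """Population multiplier after `hours` of ideal exponential growth."""
    return 2 ** (hours / doubling_time_hours)


toy_gene = "ATGGCC"  # hypothetical stand-in for a real coding sequence
library = list(single_base_variants(toy_gene))
print(len(library))      # 18: three substitutions per base
print(library[0])        # (0, 'A', 'C', 'CTGGCC')
print(fold_growth(12))   # 256.0: eight doublings in 12 hours
```

A gene of length L gives 3L single-base variants, so the library grows linearly with gene length, which is what makes testing all of them in one pooled yeast culture feasible.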
This is awesome for science, because this is what we like to do. This is the experimental setup. It's really just a fancy home-brewing setup called a turbidostat, which keeps the turbidity, meaning the population density, the same. The yeasts think they're growing indefinitely. We feed them, then we torture them with bleach, and we see which ones survive and which ones don't. From that, we can work out which of the variants we introduced really jam up the works and which ones don't really matter.

We can compare them by looking at their population over time, and we can see that some things really just fade out, some things thrive, and some are in the middle. They're kind of mediocre variants, which is even more interesting, I think.

From that, we can build up a map of those variants. Just to zoom in on that bottom graph: this is the map, showing the blue ones, which we were pretty sure were going to be duds.
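Comparing populations over time can be boiled down to one number per variant. A common approach, though not necessarily the one used in this study, is the least-squares slope of log2(variant count / wild-type count) against time: strongly negative means the variant fades out, near zero means it keeps pace, and in between is "mediocre". The sampling times and counts below are invented, and the wild-type counts are assumed to be already normalized.

```python
import math


def log_ratio_slope(times: list[float], variant_counts: list[int], wt_counts: list[int]) -> float:
    """Least-squares slope of log2(variant / wild type) against time.
    Positive: gains ground on wild type; negative: fades out."""
    ys = [math.log2(v / w) for v, w in zip(variant_counts, wt_counts)]
    mean_t = sum(times) / len(times)
    mean_y = sum(ys) / len(ys)
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    var = sum((t - mean_t) ** 2 for t in times)
    return cov / var


times = [0, 6, 12, 18]                # hours; made-up sampling schedule
wild_type = [1000, 1000, 1000, 1000]  # made-up, already-normalized counts
variants = {
    "fades_out": [1000, 500, 250, 125],
    "mediocre": [1000, 800, 640, 512],
    "neutral": [1000, 1000, 1000, 1000],
}
for name, counts in variants.items():
    print(name, round(log_ratio_slope(times, counts, wild_type), 3))
```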
The red ones were ones we were pretty sure were going to be fine. The gray ones are the whole population we tested. So we've been able to sort the population of never-before-seen variants into good, bad and kind of mediocre, which is really interesting, and it has real clinical meaning for people: is this a thing I have to worry about, or not?

Other genes this has been done for include tumor suppressor genes and things like that. A good or a less effective version of the gene might deeply affect your odds of getting cancer in life, which is a good thing to know.

All right. We can also build up a map of which part of the protein looks like it does what, and that's what this figure shows. If that looks incomprehensible to you, welcome to the club. I'm the numbers guy, not the squiggly-drawings guy. All right, I'm going to rush through to one more thing. I think I'm just going to run through the question time. Sorry. Ask me later.
We want to be able to debug these things. It's a really complicated system, but we want to know how it works, right? Who's done hello-world debugging? Print-statement debugging? Yeah, it's awesome, isn't it? Sometimes you just have to. If you've ever worked in embedded systems, though, you've discovered that bugs happen so quickly that the first "H" doesn't even have time to come out of the UART before the whole thing crashes and restarts. So what we do there is light up LEDs instead: LEDs light up one after another. What the hell does this have to do with cellular biology? Well: jellyfish. Jellyfish fluoresce. If you take a fluorescent jellyfish protein and couple it to the gene you care about, then when that gene is expressed, the fluorescent protein is also expressed and it glows. Now you can sort the cells by how much the protein was expressed, and count them that way rather than just counting their population going up and down. And this is a real experiment.
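The fluorescence-sorting readout described above can be turned into a score per variant. This is a minimal sketch, loosely in the spirit of how bin-sorting experiments are commonly scored: cells are sorted into four brightness bins, each variant's score is its average bin index weighted by cell counts, and scores are then rescaled so a null control (for example, a nonsense variant) scores 0 and wild type scores 1. The bin counts are made up, and published analyses involve more normalization than this.

```python
def weighted_bin_score(bin_counts: list[int]) -> float:
    """Average bin index (0 = dimmest ... n-1 = brightest), weighted by how
    many of a variant's cells landed in each bin."""
    total = sum(bin_counts)
    return sum(i * c for i, c in enumerate(bin_counts)) / total


def normalized_score(variant: list[int], wild_type: list[int], null: list[int]) -> float:
    """Rescale so the null control scores 0 and wild type scores 1."""
    v, wt, nl = (weighted_bin_score(x) for x in (variant, wild_type, null))
    return (v - nl) / (wt - nl)


# Made-up cell counts in four fluorescence bins, dimmest to brightest.
wild_type = [5, 10, 35, 50]
nonsense = [80, 15, 4, 1]
missense = [30, 40, 20, 10]
print(round(normalized_score(missense, wild_type, nonsense), 2))  # 0.41
```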
It's a technique called VAMP-seq, and you can use it to analyze what's going on in these cells.

All right, this is another cool example. Who's got an email that looks a bit like this? Not one of those characters is ASCII. They're all random weird things from different languages. They get past your spam filter because none of the words it's looking for are in there. You can do the same thing with DNA or with RNA. Take messenger RNA, which I imagine you're all familiar with by now. It's been on the news. It's very scary. It's not very scary. It's in all your cells: it's the instructions for how to make a protein. Normally your body would filter that out; it would not accept that mRNA. But it was discovered that if we replace some of the bases with bases that look a lot like them but aren't quite the same, the mRNA gets past the spam filter, but the cell will still read it and will still go and make you COVID spike proteins that you can then have an immune response to.
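The spam-filter analogy can be sketched as a toy, with every biological detail heavily simplified. A naive keyword filter misses look-alike Unicode characters; in the same way, a naive "sensor" that flags a hypothetical uridine-rich motif misses RNA in which uridine (U) has been swapped for pseudouridine (written here as Ψ), while the "reader" still treats Ψ as U. `SUSPICIOUS_MOTIF` is invented for illustration; real innate immune sensing of RNA is far more involved.

```python
# Part 1: a keyword filter fooled by look-alike characters.
# U+0415 is CYRILLIC CAPITAL LETTER IE, which looks like a Latin "E".
spam = "FR\u0415\u0415 MON\u0415Y"
print("FREE" in spam, spam.isascii())  # False False

# Part 2: the same trick with RNA bases (toy model).
SUSPICIOUS_MOTIF = "UUUU"  # hypothetical stand-in for what innate sensors detect


def sensor_flags(rna: str) -> bool:
    """Naive 'spam filter': flags RNA containing the motif, matching exact letters only."""
    return SUSPICIOUS_MOTIF in rna


def modify_uridines(rna: str) -> str:
    """Swap every uridine for pseudouridine."""
    return rna.replace("U", "Ψ")


def as_read_by_ribosome(rna: str) -> str:
    """The 'reader' decodes pseudouridine as if it were uridine."""
    return rna.replace("Ψ", "U")


mrna = "AUGGCCUUUUAA"
modified = modify_uridines(mrna)
print(sensor_flags(mrna), sensor_flags(modified))  # True False
print(as_read_by_ribosome(modified) == mrna)       # True
```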
This technique was pioneered largely by Katalin Karikó, who in the '80s was working on double-stranded RNA. She moved to the US under somewhat dubious visa circumstances and went on to research mRNA, and then RNA pharmaceuticals. (I'm hurrying this up because we're getting close to the end.) That work included the pseudouridine technique, invented in 2005, which went on to become the mRNA vaccines we all know and many of us have experienced, and she won a Nobel Prize in Physiology or Medicine for it. This is my thank-you slide for that. It's made my life a lot more pleasant, and I'm sure a lot of other people's too.

mRNA vaccines are now being looked at for many other purposes. Oh yeah, I ruined my own joke. Many other purposes, including cancer therapies and things like that.

Since I've been talking about biology at this software conference, I want to move on and talk about politics. I mean, not really, but sort of, right? All of this stuff has only been possible because of open collaboration.
This is something that should be very dear to our hearts in the programming world as well. People send each other samples, people send each other fruit flies, people send each other sequencing data; people share and collaborate. This stuff is not possible while hiding in your basement, not working with other people.

The concept of immunotherapy for cancer is incredibly important. A lot of these therapies are also easier and cheaper than radiotherapy and the like. So there's huge potential to rid the developing world of diseases like malaria, which are horrible and terrible. Really, the applications of this stuff are bounded only by our imagination and our willingness to just have a go. Spend a couple of decades working on something that nobody else thinks is going to work, and you might end up curing a pandemic. You never know.

All right, thank you very much. I think I've run through all my question time. Possibly, precisely; possibly not.
So, thank you very much, and please ask me questions after the end of this session. Thank you.