[Music]

Thanks for coming. I'm Lily. Um, we'll talk about some falsehoods about reality. I chose this topic. I can't remember why. And, you know, I've been kicking myself for it, but as we do... Oh, and then I did that. We can edit this out in post, right?

Anyway, hi, I'm Lily. I am an information security engineer. I'm an ex-historian. I'm a synthesizer enthusiast. And I am a professionally anxious human.

I like working at the edges of systems because, in my opinion, the most interesting things happen there. It's also where things tend to break down, and where you can find the things that break. And every time there is another cycle of "move fast and break things", we in the security space see a lot of the broken things. You may have noticed that we are in a cycle like that right now.
And that means I'm seeing a lot of the broken aspects of projects and products. In my view, a good portion of them stem from a belief, held by someone central to that work, that the information that feeds and drives their systems is real in some kind of objective way: that their thing is naturally going to handle the complexity of the real world with computers, because computers are very complicated and reality is simple.

The core issue here is that reality is not literally possible to capture objectively, but we need to represent reality to our computers in order to do our jobs. So phrases like "the map is not the territory" are common, and they're quotable, because they come up so often in our work. Put simply, most of the work we do with computers has some kind of relationship to things happening in pieces of the world somewhere along the line. And in order to deal with that, we have to find a way to represent the world to the computer in order to make sense of it.
So we have to make decisions to turn those things into data sets and models and abstract systems that can stand in for the things that we want to talk about, help us represent them, and let us make new observations and test hypotheses about whatever this thing is.

The "falsehoods programmers believe" part of this talk's title, "Falsehoods Programmers Believe About Reality", refers to a long-running format for lists about things that seem simple to deal with on the surface and that get more complicated the more you dig into them. There are many excellent lists, and talks as well, about how hard it is to handle every single edge case for names and time and geography and gender and email addresses and phone numbers, and, you know, the list goes on. You can find a list of lists at that link.

And this title is also sort of, kind of, a little bit clickbait to get you here, because I suspect that a lot of you will already know most of, or parts of, or some of this stuff I'm going to talk about.
And if you haven't yet run into it, you will, the longer you go on doing this kind of work. But this talk is not just for the people who are in this room, or the people who are watching it online, live or in the future. It's also for the people who would never have chosen to watch this talk. It's to give all of you who did choose to watch it more tools that you can use to have these kinds of conversations with the people in your lives, in both personal and professional capacities, who have fallen into the idea that we can capture reality objectively and are going ahead and building things based on this premise. And because reality is slippery and brains are weird, sometimes the person who needs that reminder is you or me. We forget some of this in small ways all the time.

So although "the map is not the territory" was a phrase coined in the 1930s, it's only one expression of a very old observation. It's not by any means a new idea.
There's been so much philosophy and literature and all kinds of other stuff about it that if I started to list it, we wouldn't be doing anything else for the rest of this time slot. I will have to gloss over things like semiotics, which I am just as disappointed about as you are. So here are some works you can dip into later if you feel like going further down this rabbit hole.

Anyway, this map-territory thing has become increasingly load-bearing as we have made things more complex, and as the pieces of the world that we're dealing with in our data get more and more nuanced. Back in the time of the ancient Greeks, they had started to tease out some of these ideas, but most of the time they were the kinds of musings you would get at parties from people who had ingested some really interesting stuff.
By the time we got to the Industrial Revolution, and humanity started trying to make and organize things in a common way at some kind of scale, these little reality maps became way more important, because they were necessary for anything to be organized or consistent at a scale larger than the number of people who could fit in a town hall at one time. And by the time we had gotten to the point of making computers about it, we had made things nice and complicated for ourselves. We'd invented the concept of middle managers, and our species was capable of screwing up a whole lot of things way faster.

But in our current moment, we have dialed it up further than even the ancient Greeks would ever have believed. We've taken perfectly good databases and we have given them anxiety. We have built large language models on the back of literally everything on the entire open web, and quite a few things that aren't. We've developed algorithms that make decisions about people and what they will see and what they will hear as they go about their days. And we've based them on, you know, mass analytics gathering and physical sensors and where you bought lunch yesterday and who you went to school with.

So we need to continue to make this observation, especially now, because we are beyond big data and we're into the realm of sort of [unclear] big data to make decisions about the world and the people and the things in it. Terms like digital twins and world models and ground truth get thrown around in so many places, and they have the effect of carrying on this idea that we are able to measure everything objectively, and then probably make money from it. We've taken all of these maps from philosophy, to industry, to individually shaping the information that people receive on a daily basis, and this whole set of things is still not actually the territory. Not even close. Like, just ask an LLM how many L's there are in "surveillance capitalism".
And then, like, ask it again if it's really sure.

So let's talk about some of the thinking traps that we fall into every now and then. I've broken some of these fallacies up into a few statements that we can talk through. Like any structure we put over reality, the points I want to make about each of these statements are going to overlap with each other a little bit, and contradict each other in some places. But having worked on this talk for so long, I consider all of that to be part of the fun that comes with the territory, so to speak.

So here we go. Surely the more data we have, the more perfectly we can represent reality.

To an extent, this is true. If you have one data source, you have one view of the world. If you have two, you have two views that allow you to approximate something more general about the world. And if you have 500 or 10 billion sources, then you probably end up with something that is increasingly nuanced, but it's still not reality.
It's a bunch of different models of slices of reality with your interpretation laid over the top. "Raw data" as a term is an oxymoron. There's also a book with this title, "Raw Data" Is an Oxymoron, that goes into examples of how and why this is so.

All data is the result of someone's choices: to measure something, how to measure it, what that measurement stands for, and so on. Raw data, I feel, is more like raw milk. It's unprocessed, unpasteurized, and full of bugs. It needs some kind of processing to improve its quality and make it fit for consumption. There are plenty of things that cause problems with raw data, like interference and changing parameters and faulty sensors and faulty assumptions.

Quality data sources are what really make the difference here. Things do tend to work better when you choose what you need for the purposes you need it for, and you build your models and systems with that purpose in mind.
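A small simulation can make the "more data is not more reality" point concrete. This is a minimal sketch, not something from the talk: it assumes a hypothetical sensor with a fixed calibration offset (`BIAS`) plus random noise, and all the numbers are made up. Averaging more readings shrinks the random noise, but the average converges on the biased value rather than the true one.

```python
import random
import statistics

# Hypothetical numbers for illustration only.
TRUE_VALUE = 20.0  # the "territory": the actual temperature
BIAS = 0.8         # a calibration error shared by every reading
NOISE_SD = 1.0     # standard deviation of random measurement noise


def read_sensor(rng: random.Random) -> float:
    """One simulated reading: truth + systematic bias + random noise."""
    return TRUE_VALUE + BIAS + rng.gauss(0.0, NOISE_SD)


def mean_of_readings(n: int, seed: int = 42) -> float:
    """Average n readings from a seeded generator so results are repeatable."""
    rng = random.Random(seed)
    return statistics.fmean(read_sensor(rng) for _ in range(n))


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        estimate = mean_of_readings(n)
        print(f"n={n:>7}: estimate={estimate:.3f} error={estimate - TRUE_VALUE:+.3f}")
```

As `n` grows, the error settles near `+0.8` instead of shrinking to zero: more readings from the same flawed source produce a more confident map, not a more accurate one. Fixing that takes a better or independent source, which is the quality-over-quantity point.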
In the era of scaling everything and relentless growth and "line go up", it's almost a default assumption, I feel, that some places will work with the view that quality can be coaxed out of a model later, and that the quantity of sources is what matters most. To an extent that works, but it is also very brittle. You have to work with your model just so, and finesse it just right, to get the thing that you want. Many of us are probably intimately familiar with how badly this goes when someone at your workplace decides to wire up 20 years of SharePoint to an LLM to finally solve the problem of nobody ever being able to find anything around here. What's supposed to come out the other end is insights. What comes out is usually, like, an air horn noise.
And to correct for it, we often end up having to write prompts so complex that it sounds like you're trying to outwit a genie, covering literally all the edge cases so it doesn't grant your wish in the most cursed way possible.

So many talks at this conference have been about the issues that exist in all data sets and models, and particularly how Python helped people work with them. And this is a Python conference, so, cool. There was Ishad Zaman's talk about counting the neurons in mouse brains. That was a really cool one. There was Renee Noble's talk about matching student names in a register to what the students write down when they sign in, and how even multi-round approaches with a range of techniques and technologies couldn't match all of the names 100% correctly. And there was Dave Cole's talk about finding workable and realistic road trip routes for electric vehicles that accounted for all of the broken chargers.
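The name-matching problem mentioned above is easy to reproduce in miniature. This is a hedged sketch, not the approach from that talk: it uses the standard library's `difflib.get_close_matches` with its default similarity cutoff of 0.6, and the register and sign-in names are invented for the example.

```python
import difflib

# Hypothetical class register: what the school's system says.
REGISTER = ["Jon Smith", "John Smyth", "Margaret Chen"]


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't count."""
    return " ".join(name.lower().split())


def candidates(signed_name: str, cutoff: float = 0.6) -> list[str]:
    """Return register entries similar to a sign-in name, most similar first."""
    lookup = {normalise(n): n for n in REGISTER}
    matches = difflib.get_close_matches(
        normalise(signed_name), list(lookup), n=3, cutoff=cutoff
    )
    return [lookup[m] for m in matches]


if __name__ == "__main__":
    for signed in ["Margret Chen", "John Smith", "Peggy Chen"]:
        print(f"{signed!r:15} -> {candidates(signed)}")
```

The misspelled "Margret Chen" resolves cleanly to one entry. "John Smith" is close to two different students, and string similarity alone cannot say which one actually signed in. "Peggy Chen" (Peggy being a traditional nickname for Margaret) falls below the cutoff, and lowering the cutoff far enough to catch it would tend to let in more wrong matches. Every threshold is another decision about what counts as "the same name".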
The way that many AI product companies market large language models, and encourage users to think about them, is, I think, a pretty big example of this fallacy. It's not really the fault of the LLMs. The technology itself is not something that should be about being a mirror of the world. But it is very tempting, and very profitable, to fall into the idea that it should be, especially when you're told it's this general-purpose everything machine. It's what I call the sparkle emoji effect. You put all of the LLM-backed features behind a little button with a sparkle emoji on it, and that encourages the idea that it's magic, with a side dose of "it's very complicated, don't think about it, it'll Just Work™".
And when deciding whether an LLM is appropriate to use, it's actually really important that users know the basics of what it actually does, which is mathematically predicting and outputting tokens based on the tokens that were input, and what it doesn't do, which is think and reason in some kind of objective and neutral fashion, or at all.

It feels to me like the root of this idea is, "Check it out! We've scraped all of the data from the entire internet and all of human everything, and we are regurgitating it back to you." But that isn't actually everything. It's all written data that has been made available in a digital form, which favors the written languages and styles and topics used by the groups of people who have had the most access, and the earliest access, to the internet and to publication technologies in all the years gone by. That is a small slice of perspectives when we're talking about the scale of humanity. But it's also a small slice of what language is.
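The "predicting tokens from tokens" description can be illustrated with a deliberately tiny stand-in. This is a toy bigram model, nowhere near how production LLMs are built (they use neural networks over subword tokens and long contexts), and the corpus and function names are invented for the example. It still shows two properties described here: it only outputs a statistically likely next token given the previous one, and it can only ever reflect the written text it was given.

```python
from collections import Counter, defaultdict
from typing import Optional

# A hypothetical, tiny "training corpus". It is all this model will ever know.
CORPUS = "the map is not the territory . the map is useful . the territory is big ."

Model = dict[str, Counter]


def train_bigrams(text: str) -> Model:
    """For each token, count which tokens followed it in the training text."""
    tokens = text.split()
    follows: Model = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(model: Model, token: str) -> Optional[str]:
    """Most frequent follower of `token` (ties: first seen), or None if unseen."""
    counts = model.get(token)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def generate(model: Model, start: str, max_tokens: int = 8) -> list[str]:
    """Greedily chain predictions. Nothing here checks meaning or truth."""
    out = [start]
    for _ in range(max_tokens):
        nxt = predict_next(model, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out


if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    print(" ".join(generate(model, "the")))
    print(predict_next(model, "reality"))
```

Asking for the token after `"reality"` returns `None`: the word never appeared in the corpus, so as far as this model is concerned it doesn't exist. Greedy generation from `"the"` just loops through "the map is not the map is not the", which looks fluent and means nothing.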
We've had written language for maybe 6,000 years, which is nothing compared to the 100,000-ish years that we have had language in general as a species. Written language can seem like the most important thing today, but language itself is also gesture and tone and posture and facial expression and timing. And in an overwhelming number of cases, it is spoken and it is interactive. It assumes communities and societies and others at the other end. It is so bound up with our species that it has impacted the evolution of our hands and our faces and our larynxes and our limbs compared to other species. You should really check out Stephen Levinson's work on this. He's a linguist who has spent over 40 years researching the intersection of linguistics and evolutionary biology. It's really cool.
Anyway, when we take just the internet-accessible written parts of some languages, turn them into a chatbot, and market it as the basis of all objective human knowledge and expression and everything, it really isn't that. It is very good at natural language processing and semantic search, and it can be made partially reliable at performing a subset of tasks if you hold it like that, and don't use too many run-on sentences or low-resource languages, and, you know, maybe give it all the answers before you start. But it is not a perfect representation of reality, and it is also not a therapist or a romantic partner or an intern. It can't even run a drive-through, because ingesting a bunch of training manuals on how to run a drive-through is not the same as having the experience of being a teenage human being who knows that the audio quality sucks, and that the people placing the orders at the other end are probably tired and distracted and occasionally likely to mess with you for no good reason.
It is hard to measure that experience in data.

And yeah, this is another one of these fallacies about data: it assumes everything is measurable, when that is not at all true. How do you measure justice, or security, or love? These things are really important, but they are not measurable. You end up having to use some things as proxies for other things, and sometimes we get so caught up in the proxy that we forget it isn't actually measuring the thing itself. Speaking as a security person, patching more vulnerabilities in your app's code doesn't necessarily mean your app is more secure than it was before. It might still be perfectly possible for the app to be breached because a core developer's GitHub password was phished and the attackers can now control your build servers. It's all down to what we choose to observe, or what we think we observe, because reality doesn't look the same to everyone.

For a start, not all of us can see the same color spectrum. Not all of us have vision.
We also have opinions and histories that shape us and what we think is good and what isn't, and those views can change constantly. Some of you watching this talk probably disagree with what I'm saying or how I'm saying it. Some of you might be bored, and some of you might be fascinated, or distracted by social media or by the ineffable nature of the universe, which, fair enough.

But we often fall into this trap of thinking that there's only one reality experienced by everyone, because it is more or less a defense mechanism. It's not actually good for our brains to be constantly aware of the subjective nature of reality every waking moment. It can be liberating, to an extent, to realize that a lot of nonsense like "I'm really bad at painting" is a matter of perspective and taste and experience. But that constant awareness makes it very hard to participate in wider functional systems with others, like buying groceries or merging in traffic or holding a conversation.
We need consensus to get anything done on a cooperative level, so we approximate constantly, and this is usually mostly good enough. Take web standards. There is nothing to stop you from writing a web app that uses GET requests to make changes to the application state. It's really not advisable, but in my time as a penetration tester I have seen so many apps built this way. And it works, but not all of the other systems make the same assumptions about how that's going to work, and it gets weird really fast.

And this is how all human understanding works. We've decided that words in a language represent things and ideas, and mostly we share those understandings in order to communicate. It's also the foundation of all science. One paper, no matter how well researched or thoroughly documented, is just an expression of one experience.
But if we do enough experiments in enough ways, gather enough perspectives over time, put them together, discuss them robustly and widely among a diverse community, and constantly search for new input, we approach scientific consensus. That tells us what is more likely to be happening in the world around us, and it is a pretty solid base to build additional things on top of. Being aware of shared consensus becomes important when we're in charge of making decisions, or have the ability to influence decisions, about how a slice of reality will be interpreted for a data set or a model or a system.

Our roles as observers also mean that we become describers, and we bring our biases along whether we are aware of them or not. We are biased in the very act of choosing what to observe. We then make choices about how to describe the world to our model or our data set, and those choices will reflect our experiences and our education and our beliefs in some way or another.
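The GET-request example above is the most concrete case in this section of a shared assumption that other systems rely on. HTTP's specification defines GET as a "safe" method, and software such as link prefetchers, caches, and crawlers acts on that. The toy sketch below is only an illustration: the `App` class, its routes, and the `prefetch_links` helper are all hypothetical, and nothing goes over a real network. It shows how a client that is merely following links, assuming GET is harmless, deletes data from an app that breaks the convention.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    """A hypothetical in-memory web app; requests are (method, path) pairs."""
    items: dict = field(default_factory=lambda: {1: "laptop", 2: "monitor", 3: "keyboard"})
    delete_via_get: bool = True  # True reproduces the anti-pattern from the talk

    def handle(self, method: str, path: str) -> str:
        # Listing items is read-only, so GET is the right method here.
        if method == "GET" and path == "/items":
            links = (f'<a href="/items/{i}/delete">delete {name}</a>'
                     for i, name in self.items.items())
            return "\n".join(links)
        # Deleting changes state; since HTTP treats GET as safe, this belongs behind POST.
        if path.startswith("/items/") and path.endswith("/delete"):
            allowed = {"GET", "POST"} if self.delete_via_get else {"POST"}
            if method not in allowed:
                return "405 Method Not Allowed"
            item_id = int(path.split("/")[2])
            self.items.pop(item_id, None)
            return "deleted"
        return "404 Not Found"


def prefetch_links(app: App, page: str) -> None:
    """A hypothetical prefetcher: assumes GET is safe and requests every linked URL."""
    for chunk in page.split('href="')[1:]:
        app.handle("GET", chunk.split('"')[0])


unsafe = App(delete_via_get=True)
prefetch_links(unsafe, unsafe.handle("GET", "/items"))
print(unsafe.items)  # {} : the "harmless" prefetch deleted everything

safe = App(delete_via_get=False)
prefetch_links(safe, safe.handle("GET", "/items"))
print(safe.items)  # all three items survive; deletion now requires POST
```

Both apps "work" when a person clicks the links by hand; the difference only shows up when another system applies its own, standards-based assumption about what GET means.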
Nothing is too small to be subject to some kind of opinion. Any time you change a column name at work, someone gets mad about it. You've experienced this. There is no data set or model or system that is not biased. You cannot remove bias. You can only stay open to more quality input and treat the result as an evolving consensus over time.

Which leads us to the issue of categorization. There was a really spicy lightning talk at the conference yesterday about phylogenetic bracketing: every terrestrial vertebrate is a fish. So, is a hot dog a sandwich?

No.

Are jaffles dumplings? Good question.

I can see some nodding heads about the hot dogs, and I've seen a lot of people shouting no as well. You're all really confident. All of you. Is cereal a soup?

Cool.

Reality has no inherent structure. Any structure that we put on reality is artificial, and it's imposed. And these structures can be really helpful.
They help you decide what to order from a menu, for example, or where to find things in a supermarket. But they are still made up, and there will always be outliers and overlaps and gray areas and debates, and I'd be very happy to get into some debates later, because those are fun.

Categories, though, are really powerful, and like all models they carry the power to shape the world. People use categories quite validly as shorthand for many things. As we said before, we have to delegate, trust each other, and operate by some kind of consensus to get through life. But categories are arbitrary and sticky and political. At their best, they are a way to understand something about the world. At their worst, they lead to oppression and human rights abuses. When people mistake them for objective truth, and especially when people use them as a weapon to control the world and force it to fit the description, it gets really bad.
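The hot dog question can be written down as code to show that a category is a definition someone picks, not a property the data reveals. In this sketch the `Food` fields and both definitions of "sandwich" are invented for illustration; neither is meant as the correct one. The same object gets a different answer depending on which predicate you adopt, and nothing in the data itself settles which predicate is right.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Food:
    name: str
    bread_pieces: int            # number of separate pieces of bread
    bread_is_hinged: bool        # one piece of bread split or folded around the filling
    filling_between_bread: bool


# Two hypothetical definitions of "sandwich"; each one is a choice, not a fact.
def sandwich_strict(f: Food) -> bool:
    """Filling between two separate slices of bread."""
    return f.bread_pieces == 2 and f.filling_between_bread


def sandwich_structural(f: Food) -> bool:
    """Filling held inside bread, whether two slices or one hinged roll."""
    return f.filling_between_bread and (f.bread_pieces == 2 or f.bread_is_hinged)


hot_dog = Food("hot dog", bread_pieces=1, bread_is_hinged=True, filling_between_bread=True)
blt = Food("BLT", bread_pieces=2, bread_is_hinged=False, filling_between_bread=True)

for food in (hot_dog, blt):
    print(food.name, sandwich_strict(food), sandwich_structural(food))
# hot dog False True
# BLT True True
```

A schema that stores `is_sandwich` as a plain boolean has quietly picked one of these definitions on everyone's behalf.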
The anthropologist and political scientist James C. Scott wrote about this extensively in his 1998 book Seeing Like a State, which goes deeply into how governments, companies, and other groups have attempted to impose categories on the world at scale. The goal, whether consciously or unconsciously expressed, is to make the world legible. As I think Jordi Miller put it in his lightning talk yesterday, you alter reality to match the software.

This makes a lot of things much easier to manage, especially when it comes to deciding how to allocate resources or plan for the future at the scale of a country: how to ensure that everyone in that country has the best chance to receive basic medical care and an education, and how to respond effectively and efficiently to disasters. But it doesn't necessarily improve things. Making the world legible to a government or to a computer flattens and erases local knowledge and complexity.
That complexity was often doing really important work in the context where it arose, and it's often really hard to understand and measure outside that context, so it gets discarded as unimportant, or just annoying.

Scott gives a lot of examples in his book, including one about scientific forestry in late 18th-century Prussia and Saxony. There were diverse old-growth forests in these areas that had been used by local communities for centuries, for everything from medicines to animal fodder to building materials and food; they were hunting grounds, the whole thing. But when the state's wood supply started to decline, the government decided to systematically replace the forests with ordered rows of single-species timber plantations. I think it was Norway spruce. Now the state could read the forest from tables in an office, because it had been planted specifically that way. They got rid of the old one.
This also enabled them to predict things like yields and revenues with really precise calculations. And for about 80 years, this approach was extremely successful. These plantations produced uniform, high-quality timber with unprecedented efficiency. But once the first lot of trees had been harvested and the second rotation had been established, it became really clear that these simplified forests were collapsing. The second lot did not grow as well as the first, because the complex ecological relationships between the local communities, the diverse plant and animal species, and the old-growth forest had been sustaining the soil health and the nutrient cycling and pest resistance and all that sort of stuff. The new forests, especially that first one, had been living off what the old forests had built up over centuries. And once that inheritance ran out, timber yields crashed and the forests died.
So what looked like irrelevant complexity to state planners was actually the foundation that made the whole system work. It's not just forests, though. When powerful groups create categories, and when there are strong incentives to fit into those categories or serious disadvantages for standing out, people and places start to reshape themselves to match the description.

There's a particular kind of aesthetic: industrial lighting and reclaimed wood and minimalist furniture. You know the one. Probably a couple of Monstera plants, the whole thing. It's coffee shops and Airbnbs and co-working spaces, all of that. The Verge journalist Kyle Chayka decided to call this AirSpace. This was back in 2016, when it was already a whole thing: it's everywhere, why? He needed a name for this homogenized interior design vibe because, in his view, it had been popping up all over the planet without any connection to local context.
It looked the same in Barcelona as it did in New York, as it did in Istanbul, as it did in Shanghai, everywhere. Once he started to dig into it, he found that this aesthetic was pretty much down to platform-based algorithmic trends. These kinds of spaces photograph really well for Instagram, they appeal to the demographic that uses these platforms, and they get recommended by the same apps that guide people to discover places to visit. So more people made their spaces look like that so that they would also get favored.

Spotify's genre categories work in a similar way for music. Spotify will commission and generate its own music specifically to fit the vibe of a particular playlist. But as the journalist Liz Pelly documented in her recent book Mood Machine, the platform also uses its artist analytics dashboards,
the ones the artists themselves use, to nudge musicians to write and market their music to fit more closely into these categories, with the aim of increasing their chances of ending up on these playlists. That has become essential for any kind of visibility, and therefore income, on the platform. So if we make reality fit the description, we get the cultural equivalent of beige.

This one is the main trap. It's not actually useful or desirable to do this. For one thing, we'll be here literally forever if we try. As I said earlier, reality has no inherent structure; there are only the structures that we give it. This mindset also leads to overcapture of data, the indiscriminate inclusion of lower-quality and extraneous data, bad decisions, inflexible systems, and surveillance capitalism. For developers, this can also be really paralyzing. I've definitely started projects only to get completely hung up on trying to design a system that would gracefully handle as many of my potential future use cases as possible.
And sometimes I get so stuck there that I never end up building anything.

This doesn't mean that it is useless to make models. We just have to be aware that what we are doing is making a representation of the world for our specific purpose. And once you get there, that can be pretty liberating. Once I remember that this thing only has to work for my purpose right now, and I take the trouble to articulate what that purpose actually is, I am able to move more confidently, and I'm much less precious about changing things later if my current tools don't fit the job.

So let's talk about abstractions, and get into some principles for how to use them mindfully and well.

Abstractions are tools, not mirrors. Your tools are there to be useful, not perfect. As Bill Kent says (he wrote a book called Data and Reality, and the second edition is the one you should really check out; the book was first published in 1978, and it is very good), there is an important difference between truth and utility.
We want things that are useful, at least in this business; otherwise we'd be philosophers and artists. We are only working with computers. We have a task to do with what we're coding, and if we don't, we should ask ourselves what we're doing. We need to leave the hard stuff to the artists.

We should design with context and with purpose. Every abstraction should start with a clear sense of why you are building it and what you need it to do. Thinking in Systems by Dana Meadows is full of so many quotes that I could have put in many places in this talk, but I've restrained myself to this one: "There are no separate systems. The world is a continuum. Where to draw a boundary around a system depends on the purpose of the discussion and the questions that we want to ask." This means being explicit about your constraints. What problem are you solving, and for whom? Under what conditions? What are you optimizing for?
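To make "draw the boundary according to purpose" concrete, here is a minimal sketch of one physical laptop described twice, once for the talk's repair-tracking example and once for budgeting. Everything in it is an illustrative assumption: the class names, the fields, the asset tag, the price, and the choice of straight-line depreciation (cost spread evenly over a useful life) for the budget view.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


# Purpose 1 (hypothetical): "which laptops in our inventory need repairs?"
@dataclass
class RepairRecord:
    asset_tag: str
    needs_repair: bool
    fault_note: str = ""  # free text, because real faults rarely fit a tidy enum


# Purpose 2 (hypothetical): "what do laptops cost us per year?"
@dataclass
class BudgetLine:
    category: str
    purchase_date: date
    cost: Decimal
    useful_life_years: int

    def annual_depreciation(self) -> Decimal:
        # Straight-line depreciation: spread the cost evenly over the useful life.
        return self.cost / self.useful_life_years


# The same laptop, modeled separately for each purpose.
repair_view = RepairRecord(asset_tag="LT-0042", needs_repair=True, fault_note="sticky trackpad")
budget_view = BudgetLine("laptops", date(2024, 7, 1), Decimal("2400.00"), useful_life_years=3)

print(repair_view.needs_repair)           # True
print(budget_view.annual_depreciation())  # 800.00
```

Neither record tries to be the laptop. The repair record has no price, and the budget line has no idea the trackpad is sticky; each keeps only what its question needs, so either can change without dragging the other along.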
The same piece of reality might need completely different abstractions depending on whether you're trying to track inventory, understand user behavior, or plan a budget. Design for your actual purpose, not for some imagined perfect system.

And abstractions have consequences. Acknowledging that we're working with abstractions of reality doesn't mean we magically avoid the inherent issues with measuring the world, and it doesn't mean those abstractions have no consequences. We've covered this: the choices we make about what we measure and how we name it have an impact on the world. These are technical decisions, but sometimes they are also decisions that affect people's lives and opportunities and access to resources, which means we have to be thoughtful. When we're designing our tools, we should ask: Who benefits from this design? Who might be excluded or misrepresented? What behavior is this design likely to encourage? And sometimes the answer is: it's fine.
We're just trying to track which laptops in our inventory need repairs. But sometimes it is bigger than that.

So yeah, we're never going to be able to build perfect representations of reality. We're going to keep running into edge cases and categories that don't quite fit, and data that tells us conflicting things about the world. And I am really here to say that that's okay. Once we stop aiming to capture reality completely, we can start building things that are useful. We can focus on solving problems instead of getting stuck in debates about whether a hot dog is a sandwich, unless that is your thing. I'm not here to judge. We can build abstractions that work well enough for their purposes, that acknowledge their limitations, and that treat the people who have to live with them with respect. The map isn't the territory, but a decent map, built mindfully and with a purpose, can get you where you need to go.

Thanks.

[Applause]