Welcome back, hope you enjoyed your break. Our first speaker for this session will be David Rollinson. He will be giving an introduction to causal inference with Python. I'd like to welcome you on stage.

[Applause]

Okay, hopefully that's going to pop up. All right, thanks. It's great to be here, and I'm really excited to be talking to you today about causal inference with Python. It's a topic I've become increasingly passionate about over the last few years, because I've seen how much it can impact the way we do data science and machine learning in industry. This talk has two parts. In the first part I'm going to try to convince you that you too should be interested and passionate about causal inference and, more broadly, causality. In the second part we'll work through a simple example with a Python library called DoWhy, which enables you to calculate cause and effect in Python.

As soon as you start looking into causal inference you'll encounter the term causality. At first it seems like a bit of a nebulous concept, and it doesn't really have a very specific definition; it encompasses a range of topics around the science of cause and effect. And this topic really is everywhere. There are many questions you'll encounter in a data science role that are inherently causal. If you look out for the words in red here, like "what would happen if" or "why did this happen": I call these questions inherently causal because to answer them properly you really need an understanding of causation, not just association or correlation.
What's interesting is that most of the machine learning models you'll encounter are not explicitly causal, even when they're being used to address these causal questions. One thing I often hear, particularly from machine learning and AI people, is "can't you do predictions with an associative model?" And it's true, you can; that's one of their core capabilities. The difference is that with a causal model you're more likely to get accurate answers when you're asking questions in a changed context, where the statistics of the data you're going to use the model on are different for some reason. That difference can disrupt an associative model, whereas a causal model should be able to handle those disruptions because of the causal modelling. For example, if you're going to make an intervention, which is a change to the system, that's going to change the statistics, and if you want the model to cope with that, a causal model is preferable. Of course it may not be an intervention that you're making but one you can't control, such as climate change: you're aware it's coming, you have some understanding of the effects it might have, and you want to model that.

In a lot of the research we see, particularly observational studies, there's often a statement like "doing X may reduce the risk of Y". This guy on Twitter, or X, or whatever it is this week, pointed out that this is an explicitly causal statement, but then later in the paper you get a statement like "this is just an associational study, so you can't actually say anything about cause and effect". It's almost like Schrödinger's cat, where the study is in two states at the same time: it wants you to draw a causal conclusion, but you're not allowed to say that.
So I feel there's an internal contradiction there, and if people were aware that it's actually quite easy to embrace causality and add causal thinking to these studies, they would do it a lot more often.

And it's not just researchers hedging their bets about whether their research covers causality or not. There has also been research into the conclusions people draw from associative studies: if they read that there's an association between X and Y, people often conclude that X causes Y. That may be true, but it may also not be, and there's a huge number of examples showing how easily you can find a spurious or false correlation. Tyler Vigen, the website at the bottom there, has a whole site full of hilarious correlations with no real causal relationship, just to show how easy it is to discover a false relationship.

There is one experimental design that is widely understood to establish a causal relationship, and that's the randomized controlled trial, or RCT. An RCT has two key elements that enable it to do that. The first is randomization: whatever the factors are that affect the whole study population, they're going to be present in both of the groups you produce, because you've randomized the assignment of people to those two groups. Whatever the confounding factors are, they'll be present in both groups. Then you make some interventional change to just one of those groups. The combination of the random assignment and the change to just one group allows you to conclude that the differences between those groups are due to the intervention, and not to other factors hidden in the background.
But randomized controlled trials are not always possible, and they're not always practical. For example, if your question is about something that happened in the past, then unless you can time travel you can't go back, change it, and see what would have happened. There are also many situations where it's unethical or impractical to run a randomized controlled trial: you can't take a group of kids, get half of them to smoke 20 cigarettes a day for 20 years, and see what happens.

So if you can't do a randomized controlled trial, can you still model causality? The answer is yes. You basically need two things: first you need some data, and secondly you need a causal model. There are many types of causal model, but most commonly the model is produced either by drawing on the knowledge of experts, where the process of gathering, discussing and teasing out that knowledge is called elicitation, or by learning the causal model from the data, which is called causal discovery. Causal inference is the process of using the model once you've got it; discovery is the process of learning a model from data; and elicitation is the process of learning a model from experts. There can also be a bit of mixing: you can take some expert domain knowledge and use it to restrict the range of models considered during causal discovery.

In my day job I work for WSP, an engineering consulting company, and what has really drawn me to the causality space is the number of opportunities we encounter where we have clients with vast quantities of detailed historical data.
Because a lot of these are infrastructure engineering systems, well-defined and well-controlled, the clients also have expert domain knowledge about them, and the questions they come to us with are often causal questions. For example, in managing a lot of the critical infrastructure we have around Australia, we get questions like: over the last 10 years we've invested X million dollars applying these policies to renew pipe networks or road networks; if we had invested a different amount of money, or invested in different practices or policies or technologies, what would have happened? What would the service level of our railways or our roads have been under those conditions? All of these are generally causal questions, because they involve exploring the outcomes that would have occurred under different conditions that aren't represented in the data.

So that was the first part of the talk, where I tried to convince you that you should be interested in causality. The second part looks specifically at a Python library called DoWhy, which I've been working with quite a bit. DoWhy is part of an ecosystem called PyWhy, which contains a few major packages: DoWhy, which is about causal effects; EconML, because a lot of the people working in the causal inference space come from econometrics and epidemiology and have brought in a lot of their methods; and causal learners, that is, causal discovery algorithms. This talk is going to focus mostly on the DoWhy part.
DoWhy is well documented; the user guide is all on the PyWhy site, and it's not just the bare bones of "this is how you install it, here's one simple introduction". It's actually pretty detailed and covers a lot of the background concepts, so it's a really recommended read. I'm going to show a few snippets of code for the rest of the talk, and that code is in a public GitHub repo which I made just for this talk, so if you want to have a look afterwards you can see what happened, play with the code, and maybe do some experiments of your own. Everything there is very simple.

One of the things I really like about DoWhy is that it imposes a four-step process on modelling a causal inference problem. The four steps are: first, model the problem (I'll explain what these are as we go); second, use that model to identify an estimand; third, use the estimand and your data to estimate an effect; and finally, the fourth step, try to refute that estimate. To explain what those words mean, we'll go through an example.

The example I picked is called the LaLonde data set. It's really old, from the late 1970s I think, and it's a very simple, small data set. Essentially, there was a training program, and they wanted to understand whether that program had actually produced a benefit for the people who participated in it. So they looked at the wages of participants three years later, in 1978, and compared them to another group of people who hadn't participated in the program. You can see the data there; it's in the repo. There are two columns we're really interested in: whether they undertook the training, and their wage three years later, in 1978 (I told you it was a very old example). There are also a few other columns for variables that they thought might also have affected the answer.
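To make the data step concrete, here is a minimal sketch, not shown in the talk, of loading that data with pandas. The file name and the column names (treat, educ, age, re78, as in the standard LaLonde data) are assumptions about what the repo's CSV contains.

    # Sketch only: file and column names are assumed, not taken from the talk.
    # pip install dowhy pandas
    import pandas as pd

    df = pd.read_csv("lalonde.csv")        # hypothetical file name from the repo
    print(df[["treat", "re78"]].head())    # treatment flag and 1978 wage
    print(df.columns.tolist())             # the other candidate covariates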
So remember I said that to do causality without randomized controlled trials you need two things. Firstly you need some data, and we just looked at that CSV file. Secondly you need a causal model, so the next thing to look at is how you describe a causal model in DoWhy, the Python library.

DoWhy wants you to provide your domain knowledge about the system in question as a directed acyclic graph. Directed means there are arrows between the variables, and the variables are effectively just the columns in your data file. Acyclic means there are no loops. Those are really the only constraints we have, except that the graph needs to include at least the treatment, which is the cause we want to vary, and the outcome whose effect we want to understand.
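As a small illustration of those two constraints, here is a sketch using networkx; it is not part of the talk, and the variable names are just placeholder column names.

    # Sketch: checking the "directed, acyclic" requirement with networkx.
    import networkx as nx

    g = nx.DiGraph()
    g.add_edges_from([
        ("educ", "treat"),   # education affects eligibility for training
        ("educ", "re78"),    # ...and wages
        ("treat", "re78"),   # treatment -> outcome must be in the graph
    ])
    print(nx.is_directed_acyclic_graph(g))   # True: no loops so far

    g.add_edge("re78", "educ")               # adding this edge creates a cycle
    print(nx.is_directed_acyclic_graph(g))   # now False: not a valid DAG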
The aim is to include in that graph all of the relevant direct causal relationships: you don't want to include a mere correlation, only relationships that are causal. There's a bit of judgment, and some expediency and practicality, in deciding which variables and which interactions to include. That's a whole topic in itself, but one tip I can give is that you can always create multiple models and compare the results you get with the different models. One of the great things about creating this graph is that it becomes a specific, precise, documented description of the assumptions and beliefs you're bringing to the study. If you'd just done one of those Schrödinger studies from before, essentially none of this would have been stated; whatever assumptions you make about confounding variables are left for the reader to interpret. Whereas if you embrace causality and draw a causal diagram, a DAG like this, you're making those assumptions explicit, so even if they're wrong, at least people can see what they are.

Now, DoWhy wants you to provide the causal model as a string, which you can see on the left there. It looks a bit complicated, so I'll break it down so we can understand how it works. The first part essentially declares the variables. Remember, the variables are just the relevant columns in your data file (you don't have to use all of them), and in this string we declare them simply by listing them by name. Once we've declared the variables, the next step is to create the edges in our graph. A good starting point is to say that, in this case, there is an edge, a causal effect, between whether the participant received the training course and their wage. Here that's a direct effect. It doesn't have to be: it might be that doing the training affects some other variable and that other variable affects wages, but in this case it's direct. To tell DoWhy that you've got this direct effect, you use the arrow operator, which you can see in the red box on the left. Having created that first edge, we keep populating the graph with all the other edges just by adding them to the string. In the next one we consider the impact of the number of years of education the person has had; you consult your experts and they say yes, that would affect wages as well, and in this study it also affected whether people were eligible for the training program, so we represent that by adding those two edges. The rest of the string just repeats that process, adding all the other edges. I don't claim this is the correct causal diagram, it's just an example, but essentially the diagram is a representation of the string on the left.
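The slide itself isn't reproduced here, but the string would look roughly like the sketch below, assuming the DOT-style "digraph" syntax that DoWhy accepts and the same placeholder column names as above.

    # Sketch of the kind of graph string described above (DOT syntax).
    # Variable names are illustrative, not the talk's slide.
    graph = """digraph {
        treat; re78; educ; age;
        treat -> re78;
        educ -> re78;
        educ -> treat;
        age -> re78;
        age -> treat;
    }"""
    # "treat -> re78" is the arrow operator mentioned above: a direct causal
    # edge from the training variable to the 1978 wage.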
So that was the first step, and that's really the bulk of the work you have to do as a user of the DoWhy library. Once you've created that graph as a string and you've got your data as a pandas DataFrame, you pass them both into an object that DoWhy calls a CausalModel: you say the treatment here is the training variable and the outcome is the wages in 1978, and you pass in the data and your graph. That's it for the first step.

The second step is to call a thing named identify_effect. All of the remaining steps are literally just one function call each, so it's actually very easy. As I mentioned earlier, identify_effect produces a thing called an estimand, which you may not have heard of before. Essentially an estimand is a way to estimate the desired quantity: a strategy, or a procedure, that will enable you to calculate the quantity you're interested in. It's worth noting that this is not always possible; you can create a graph for which there is no valid estimand. It's also possible to create a graph where there are multiple estimands, in which case DoWhy will return them all and you can choose between them. In this case we've got a backdoor estimand.
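Putting those first two steps together, a sketch might look like this; CausalModel and identify_effect are DoWhy's actual entry points, while the column names and the graph string are the assumed ones from the sketches above.

    # Steps 1 and 2: wrap the data and graph in a CausalModel, then identify.
    from dowhy import CausalModel

    model = CausalModel(
        data=df,             # the pandas DataFrame loaded earlier
        treatment="treat",   # the cause we want to vary
        outcome="re78",      # the effect we want to understand
        graph=graph,         # the domain-knowledge DAG as a string
    )

    identified_estimand = model.identify_effect()
    print(identified_estimand)   # in the talk's example this is a backdoor estimand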
The other thing happening under the hood when DoWhy does this identification step is that it's analysing the graph, the domain knowledge you've provided, and working out the roles of all the variables in this problem. That's a really key step, because it determines which variables you should be controlling, or conditioning, for, and also which variables you should not be conditioning for. That's really interesting, because some people tend to think "I should just control for as many variables as possible", but that's actually harmful in some situations and can eliminate, or incorrectly bias, the effect you're looking for. So that analysis of the graph is really important.

To illustrate the effect this can have, there's a phenomenon known as Simpson's paradox. In Simpson's paradox you've got a whole study population where the relationship between some property X and some property Y has a certain direction; you can see the strong magenta line there basically saying that an increase in X decreases the value of Y, and that is true over the whole population. But if you bring in an additional variable, which divides the population into these four colour groups, then within each of those groups the relationship between X and Y is completely the opposite. So if you hadn't brought in and controlled for that variable appropriately, your conclusion would have been the opposite of what it should be. Hopefully that intuitively makes sense; in this example you can see how it works, but without the colouring it's actually really hard to grasp that two totally different, contradictory outcomes can be present in one set of data.
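The slide's plot isn't reproduced here, but the phenomenon is easy to recreate; this is a small self-contained simulation, not the talk's data.

    # A small simulation of Simpson's paradox (not the talk's figure).
    # Within each group, X pushes Y up, but pooled across groups the trend
    # reverses because the group means line up the other way.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    frames = []
    for g in range(4):
        x = rng.normal(loc=3 * g, scale=0.5, size=200)
        y = 10 - 2.5 * g + 1.0 * (x - 3 * g) + rng.normal(0, 0.3, size=200)
        frames.append(pd.DataFrame({"x": x, "y": y, "group": g}))
    data = pd.concat(frames, ignore_index=True)

    print(data["x"].corr(data["y"]))                    # negative: pooled trend
    print(data.groupby("group").apply(lambda d: d["x"].corr(d["y"])))  # positive within each group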
The third of our four steps is estimating the effect. Again, it's just a single function call, so it's very easy to do. You can select from a range of estimation methods that are built into and supported by DoWhy, and you can also access models from the EconML package. Having done this on our data set, we get the result that the causal estimate is 1629, in this case 1,629 dollars more. And because we've got a causal model, we can actually make a causal interpretation: taking as a prior assumption the graph, the domain knowledge we provided, and accepting it as correct, then on average completing this training course causes participants to earn 1,629 dollars more than not completing the training. So you can see that by bringing the causal analysis and the causal model into the study, we go from a statement about one variable being associated with another to an actual causal interpretation.
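Step three as a sketch: the method name below is one of DoWhy's built-in backdoor estimators, but the talk doesn't say which estimator produced the 1,629 dollar figure, so don't expect this exact number.

    # Step 3: estimate the effect using the estimand identified above.
    estimate = model.estimate_effect(
        identified_estimand,
        method_name="backdoor.propensity_score_matching",  # one built-in choice
    )
    print(estimate.value)   # average causal effect of training on the 1978 wage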
The next and final step in the DoWhy paradigm for handling causal inference is refutation, and basically that means stress-testing your model to see whether this is a real effect. From the magnitude of the variables alone you might not be sure whether it's a weak but legitimate effect, or a strong effect that's biased or confounded in some way. DoWhy provides a number of tools to help you gain confidence and understand how statistically robust that effect is, and you can access all of them through the refute_estimate function by specifying the name of the test you want to run. In this case what I've done is use a placebo treatment, which essentially means we randomize all of the treatments but keep the outcomes and all the other variables the same. Because we've randomized the treatment, we would expect the effect to disappear, and in this case, fortunately, it does: the effect has gone down from about 1,600 dollars to just two dollars, so it's pretty much gone.
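And step four, the placebo check just described, as a sketch; placebo_treatment_refuter is DoWhy's name for this refuter, and other refuters (random common cause, data subset) are available the same way.

    # Step 4: refute the estimate with a placebo treatment.
    # Randomising the treatment column should make the estimated effect
    # collapse towards zero if the original effect was genuine.
    refutation = model.refute_estimate(
        identified_estimand,
        estimate,
        method_name="placebo_treatment_refuter",
        placebo_type="permute",   # shuffle the existing treatment assignments
    )
    print(refutation)             # the new (placebo) effect should be near zero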
Now there's one extra bit I wanted to add to this talk, which is about counterfactual outcomes. A counterfactual outcome is looking back and asking what would have happened if things had been different, if we had done something differently. The great thing about having a causal model is that we can actually answer this. We can first look at what actually did happen to the participants in this study. The red box at the bottom shows that the average outcome over all the participants is a 5,300 dollar average wage in 1978 (there's been a lot of inflation since then). If we look at the outcome for just the control group, who didn't receive the training, the average wage is 4,500 dollars, and the average outcome for the treated group, who did receive the training, is 6,300 dollars. So at surface level it looks like there was an increase in wage for that group, which matches our causal effect that doing the training did increase their wage. That's all looking good.

DoWhy provides this thing called the do operator, which is a way to express an intervention or to apply a counterfactual scenario. To illustrate that, I've added a couple of extra outcomes. First, the outcome over all the participants if none of them had received any training: the average outcome goes down from 5,300 dollars to 4,600 dollars, so you can see that if we take away the training, all the participants become more like the controls. We also have the counterfactual outcome as if we had provided training to all the participants, which increases the average wage to 6,200 dollars. Those numbers make sense: we're basically making the population look more like the group who did receive training, or more like the group who didn't. And that's really one of the key powers of this approach: it enables you to answer questions like "what if we rolled out that program more widely?" or "what if we replaced all of these old devices with some new device, what would actually happen?" This allows us to answer those questions.
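A rough sketch of the comparison just described. The observed group averages are plain pandas; the interventional averages use DoWhy's do-sampler, which is exposed as a pandas accessor in dowhy.api, but the exact keyword names there vary between DoWhy versions, so treat that part as indicative only. Column names and the graph string are again the assumed ones from the earlier sketches.

    # Observed averages: everyone, controls only, treated only.
    print(df["re78"].mean())
    print(df.loc[df["treat"] == 0, "re78"].mean())
    print(df.loc[df["treat"] == 1, "re78"].mean())

    # Interventional averages via DoWhy's do-sampler (pandas accessor).
    # NOTE: keyword names (dot_graph, variable_types, ...) are my reading of
    # the dowhy.api documentation and may differ in your DoWhy version.
    import dowhy.api  # noqa: F401  (registers the .causal accessor)

    for forced in (0, 1):                  # do(treat = 0) and do(treat = 1)
        sampled = df.causal.do(
            x={"treat": forced},
            outcome="re78",
            dot_graph=graph,
            variable_types={"treat": "b", "re78": "c", "educ": "c", "age": "c"},
        )
        print(forced, sampled["re78"].mean())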
Before I wrap up, I just want to quickly mention an app that I created based on the DoWhy library. The app aims to make some of the topics we've talked about today, like causality, accessible to a wider audience, and specifically to make these techniques available to scientists, engineers and other people who aren't necessarily data scientists or Python developers, and so can't access libraries like DoWhy directly. The app includes a causal diagram editor that lets you explore how different models of your system would be represented and how you can use them in your studies.

So that pretty much wraps things up. I hope I've at least made you intrigued about causality and causal inference. I believe we should be using these methods more widely, discussing them, and thinking about cause and effect explicitly, especially in observational studies. There's a particular opportunity we're seeing where organizations have a huge amount of historical data along with detailed domain knowledge, which makes this very accessible. If you're thinking about doing causal inference, then I recommend DoWhy: it's under active development and it's easy to use. And as mentioned, the code for this talk is available at the link there. Thanks for listening.