[Music]

I'd like to welcome Lizzie Silva to talk about causal discovery in Python. Causal discovery means learning what causes what from your data. That's an incredibly important thing for anyone who wants to draw conclusions from their data, or to use their data to develop a strategic plan for meaningful decisions about what to do next. Lizzie Silva is a senior data scientist at WSP. She has broad interests in applied data science and has worked on projects in electricity distribution, water distribution, abandoned mine shaft detection, fish ecology, and arthritis monitoring via wearable devices, among others. She did her PhD in causal discovery at Carnegie Mellon University. Her pastimes include singing in choirs and running the monthly Melbourne Machine Learning and AI Meetup, as well as the Melbourne chapter of Puzzled Pint. She'll present a review and comparison of software available for causal discovery in Python: first a brief intro to causal discovery, then a run through various packages intended for that purpose, and if there's time available, maybe a quick demonstration at the end. Everybody, please welcome Lizzie Silva.

Thanks, Genevieve, and thank you all for being here. I have had spoilers for my first slide, which is: what is causal discovery? For those unfamiliar, causal discovery is about learning what causes what. Great, you can see my cursor. Causal discovery is what you would do first, before later doing causal effect estimation, that is, learning how much of an effect each feature has. The input to a causal discovery algorithm is some kind of table, with features in the columns and observations in the rows. The output is a graph, with each feature being one node in the graph, and an arrow from one feature to another if there's a causal effect. So if smoking causes lung cancer, we have an arrow from smoking to lung cancer.
In between is the causal discovery algorithm, which is very complicated. It can be explained in detail, but in 30 minutes that's not really doable. So what I'm going to do is point you, if you want the mathematical details, to a previous talk I gave on this topic; that QR code will be on the final slide as well. Instead, I'm just going to give you some reason to believe you might want to do this, some intuition about why it might work, and then go straight into how you would do it in Python.

So when would I want to use causal discovery, given that we've all heard that correlation is not causation and we don't want to make mistakes? The trouble is that we have to take actions in the world, and we can't always do a randomized controlled trial to find out what causes what. It is unethical to force people to smoke to see whether it gives them lung cancer. So sometimes we have to make causal conclusions from observational data.

Now, usually we'd like to rely on our background knowledge about what causes what. But unfortunately, sometimes we don't know all the causal relationships. In this case we can add what we do know as constraints and use causal discovery to learn the rest.

Sometimes our experts are not perfect, and they may believe something that's not true. In this case it might be worth just trying causal discovery and seeing what it gets you. You may find that the model you get out of a causal discovery algorithm actually fits the data much better than an expert guess.

And the last situation is that you might just have way too many features to have any idea what's going on. If you're trying to learn a genetic regulatory network over 20,000 genes, you won't have background knowledge about which genes affect the others. In this case causal discovery is a great way to come up with a first guess, to generate hypotheses and prioritize which experiments you're going to run first.
When can't I use causal discovery? If you have measured only two things, and those two things have a Gaussian distribution, by which I mean the bell curve, the normal distribution, and they have a linear relationship between them, in that case you're very sad. Pardon me. You cannot do anything with causal discovery in this situation; that has been proven, and this is why you've heard that correlation is not causation.

However, in every other case we can learn something. For example, let's say we've only measured two features, but those features have a non-Gaussian distribution. So this is not the bell curve; this is a uniform distribution. In this case, if I get the direction of causation right (that's the little tick mark), with the predictor X1 predicting the effect X2, then you can see the amount of noise I'm adding has no relationship to the X1 value. But if I get the direction of causation wrong, suddenly the amount of noise depends on the value of X1, in a way that feels counterintuitive. We expect, intuitively, that noise is just random other stuff that's happening, and it should have nothing to do with the relationship between X1 and X2. So depending on which direction you pick, the dependence of the noise on the presumed cause changes, and that asymmetry is what you can exploit. This is a situation in which you can use linear non-Gaussian acyclic models to learn causal relationships.

If you have Gaussian noise but nonlinear relationships, you can use the same kind of trick: if I get the direction of causation right, the noise is unrelated to my predictor. This is real data: this is rings on abalone predicting their length, rings basically being a proxy for age. Age increases the length of the abalone, but the length doesn't increase the age.
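That intuition can be checked numerically. Below is a small illustrative sketch (not from the talk) that simulates the uniform-noise case, regresses in both directions, and measures how strongly the residuals depend on the presumed cause. It assumes the `dcor` package for a distance-correlation dependence measure; any nonlinear independence test would do.

```python
# Rough sketch of the additive-noise direction test on simulated data.
# Assumes numpy, scikit-learn and the `dcor` package are installed.
import numpy as np
from sklearn.linear_model import LinearRegression
import dcor

rng = np.random.default_rng(0)
x1 = rng.uniform(size=2000)              # cause, with a non-Gaussian (uniform) distribution
x2 = x1 + 0.5 * rng.uniform(size=2000)   # effect = linear function of cause + uniform noise

def residual_dependence(cause, effect):
    """Regress effect on cause and measure how much the residuals depend on the cause."""
    model = LinearRegression().fit(cause.reshape(-1, 1), effect)
    residuals = effect - model.predict(cause.reshape(-1, 1))
    return dcor.distance_correlation(cause, residuals)

# In the correct direction the residuals are (nearly) independent of the cause;
# in the wrong direction the dependence is noticeably larger.
print("x1 -> x2:", residual_dependence(x1, x2))
print("x2 -> x1:", residual_dependence(x2, x1))
```

Plain correlation can't distinguish the two directions here; it is the dependence between the residuals and the presumed cause that breaks the symmetry.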
So this is just trying to give you some intuition that it might be possible to learn causal relationships if you have data from a certain distribution.

The other thing you might have is more than two features. On this one slide, on the right-hand side I want to give you some feeling for why this is doable, and on the left-hand side why it's hard. This is the observable universe in one picture, and the number of directed acyclic graphs, or causal models, over 25 features is at least 10 orders of magnitude larger than the number of atoms in the known universe. The space of models in which we're trying to find the true model is very large; that's why it's a hard problem when you have more than two features.

However, it is doable, because we sometimes have these V-structures, where you've got two things that are not causally related but both influence a common effect. In this example, there's no dependence between the battery charge and the fuel tank level in a car; they both influence whether the car starts. So they're independent, but when you condition on their common effect (let's say the car did not start, and you know that the battery is fully charged), you can learn something about the level of the fuel tank: you learn that it's empty. So what we see with these V-structures is independence of the predictors, but dependence conditional on their common effect. That unusual pattern of independence and conditional dependence indicates the presence of a V-structure, and when you see it in the data, you can learn that the V-structure exists and orient those edges in your causal model. So it's doable, but it is computationally intense.
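A quick simulated version of that car example (illustrative numbers, not the speaker's) shows the pattern: the two causes are uncorrelated overall, but become strongly negatively correlated once you condition on the common effect.

```python
# Simulated V-structure: battery and fuel independently cause "starts".
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
battery = (rng.random(n) < 0.9).astype(float)   # 1 = battery charged
fuel = (rng.random(n) < 0.9).astype(float)      # 1 = fuel in the tank
starts = battery * fuel                          # the car starts only if both

# Marginally, battery and fuel are independent (correlation near zero).
print("overall:", np.corrcoef(battery, fuel)[0, 1])

# Conditioning on the common effect (car did not start) induces dependence:
# among non-starting cars, a charged battery makes an empty tank much more likely.
failed = starts == 0
print("given no start:", np.corrcoef(battery[failed], fuel[failed])[0, 1])
```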
So if you're in this situation, my suggestion for which algorithm to use depends on the number of features you're trying to learn over. I will say that the GRaSP and BOSS algorithms were invented after I graduated, so I don't really understand how they work, but I do understand that they're way more accurate than anything that existed when I was studying, especially on dense graphs. So I recommend them up to maybe 100 features. After that, you want something more efficient, like the PC algorithm or FGES.

Other questions influence the choice of algorithm. You may believe that there are some latent confounders that you have not measured in your data set, in which case there are algorithms, all based on FCI, which will represent the extra uncertainty related to those confounders. Many of these algorithms take a statistical test or a score function as a plugin, and those tests and score functions depend on whether the features are continuous or discrete or a mixture of each. When you have a mixture of continuous and discrete features, that is actually the hardest case, because you need a score or a test that has similar levels of power no matter how many of the features are discrete or continuous.

People often ask me: what about time? Doesn't time make it much easier? It's actually much easier when you just have a snapshot of a population, because typically, when people want to take time into account, it's because they've only measured one system over time. If you measure, say, Australia's economy over time, your sample size is one; there's one Australia, and you want to use the different points in time as effectively different samples. But you can have strange things happening, like different causal relationships operating depending on seasonality. So time actually makes things much more difficult, and I won't talk about it further.
So instead, I'm now going to jump to a worked example. For this worked example, I wanted a really small data set, just three features, where we could all agree on the causal model. So I picked the relationship between altitude, latitude, and temperature. I hope we can agree that warming up a location will not move it, but moving a location may change its temperature, so we can hopefully agree on these causal directions. This data set also has some nice nonlinearity and non-Gaussianity, and it has a V-structure, so it should be extremely learnable.

I also want to mention Tetrad. Tetrad was the first software available for causal discovery. It is a massive Java library, it has been in development since the early 90s, and it's mostly the work of Joe Ramsey, who's pictured here holding his Markov blanket, which is a joke for the real stats nerds in the audience. And it has everything. Joe is at the Carnegie Mellon University philosophy department, which is where I studied, and the philosophy department has realised that most data scientists use Python, so they now have not one but two Python libraries for causal discovery.

The first is called causal-learn, developed in 2020, and this is part of the Carnegie Mellon University centre for causal learning and reasoning. All of the implementations in causal-learn are new; they're all 100% Python. And because they're 100% Python, they have limited performance for really large graphs; they don't have all the optimizations that are in Tetrad. But it's really actively maintained, and it's got a large user community, by the standards of causal discovery user communities.

So I will now jump to VS Code. I wanted to show you in 3D the relationship between temperature, latitude, and altitude. This is temperature on the y-axis, latitude and altitude on the horizontal axes, and you can see that there's this sort of surface of different temperatures at different locations.
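The plotting code itself isn't reproduced in the transcript; a minimal sketch of a 3D view like the one described, assuming a hypothetical cities.csv with altitude, latitude, and temperature columns, could look like this:

```python
# Minimal sketch of the 3D view shown in the demo (file and column names assumed).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("cities.csv")   # hypothetical data set

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(df["latitude"], df["altitude"], df["temperature"], s=5)
ax.set_xlabel("latitude")
ax.set_ylabel("altitude")
ax.set_zlabel("temperature")     # temperature on the vertical axis
plt.show()
```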
So, causal-learn. We'll start by importing our libraries and loading our data, and then learning a graph is just this one line. It's really quick, and it learns the right graph. But you may have noticed a little tell here: what happens if I change this seed to one? It picks up the relationship between latitude and temperature, but it can't orient it, because it no longer has a V-structure, and it has missed the altitude-to-temperature relationship. If I don't set the random seed at all and just keep re-running this, it flips between the two. There is some randomness in the permutation searches; I'm not exactly sure where it is, I dug around in the code and I couldn't see it. But I will say that they take a score function, and the default score function is best for Gaussian data, which this is not, so I haven't exactly tried to optimize this. But that is causal-learn on this example.
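The demo's exact call isn't reproduced in the transcript (it appears to use one of the score-based permutation searches); as a hedged sketch of the same workflow, here is roughly what a causal-learn run looks like using the PC algorithm, whose documented interface I'm more confident about. The file and column names are assumptions.

```python
# Minimal causal-learn sketch (file and column names are assumptions).
import pandas as pd
from causallearn.search.ConstraintBased.PC import pc
from causallearn.utils.GraphUtils import GraphUtils

cols = ["altitude", "latitude", "temperature"]
data = pd.read_csv("cities.csv")[cols].to_numpy()

# PC with the default Fisher-Z independence test; for non-Gaussian data you can
# pass indep_test="kci" instead (slower, but nonparametric).
cg = pc(data, alpha=0.05, indep_test="fisherz")

# Render the learned graph to a PNG (requires pydot/graphviz).
pyd = GraphUtils.to_pydot(cg.G, labels=cols)
pyd.write_png("learned_graph.png")
```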
The second Python package, also produced by the CMU philosophy department, is py-tetrad. It's a Python wrapper for Tetrad. It has some Python API in it: it's got this TetradSearch class, some data set type translators and graph translators, and you can also access the rest of Tetrad through JPype. Given that Tetrad has everything, this is really powerful. It's also got a bunch of optimizations for dealing with large graphs, and all the newest algorithms get implemented in Tetrad first. So yes, it's maintained, and it's got a small but dedicated user base, but it's mostly maintained by Joe, who's a Java guy, so it could use some help with the Python packaging, getting it up on PyPI, things like that.

How about learning that same model in py-tetrad? Once again, load up our data, and again it's one line... oh, two... it's a few lines. In this case I don't get a nice image out of the box, but you can read these edges, and they are what we want: we've got the altitude-to-temperature edge and the latitude-to-temperature edge. And I can output an image of that graph, which is looking good. Good job, py-tetrad.
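Those few lines aren't reproduced in the transcript; the sketch below is modelled on the py-tetrad example scripts as I recall them. Because py-tetrad wraps Java, a JVM must be started first, and module paths, jar locations, and method names vary by version and install, so treat everything here as an assumption rather than a definitive recipe.

```python
# Rough py-tetrad sketch, modelled on the project's example scripts.
# Paths, module names and method names may differ between versions/installs.
import jpype.imports
import pandas as pd

try:
    # Jar location is an assumption; point it at your local Tetrad jar.
    jpype.startJVM(classpath=["resources/tetrad-current.jar"])
except OSError:
    pass  # JVM already running

import pytetrad.tools.TetradSearch as ts   # module path may vary by install

df = pd.read_csv("cities.csv")[["altitude", "latitude", "temperature"]]

search = ts.TetradSearch(df)
search.use_sem_bic()          # choose a score for the search
search.run_boss()             # or run_fges(), run_grasp(), ...
print(search.get_string())    # text listing of the learned edges
```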
Now, I should admit that I sign myself up for talks in order to force myself to do things that are on my to-do list, so the rest of the packages that I'm about to talk about are ones that I had intended to understand; I put them in the abstract so that I had to review them.

I really wanted to love the Causal Discovery Toolbox. It's built by Diviyan Kalainathan, who's now at Fentech, and he did his PhD in causal discovery under Isabelle Guyon, who set up the causal pairs challenge. So it has a real emphasis on the pairwise methods, things like the additive noise method I mentioned earlier, and it's also got some deep learning methods. The other things, related to using lots of features, are imports from the pcalg R package. Unfortunately, it's no longer maintained; it hasn't been updated since 2022.

Let's just go over that same example in CDT. Come on... there we are. Load in our data. You have to feed it a complete graph as an input, and then it learns that altitude causes latitude. So this is using the additive noise method, and if we review our data, the noise does look roughly additive, depending on what point on the surface you're at. But it may be that it's just not working at a pairwise level, because if I just look at one view of this, the noise doesn't look additive. So I'm not entirely sure why CDT is not learning the right thing, but just based on that quick evaluation, I can't exactly recommend it.

The next option was the CausalNex package. This is another one built by an actual company: it's built by McKinsey's AI consulting team, QuantumBlack. I'm not sure if it will be maintained further; the last commit was nine months ago. Its emphasis is on learning the size of causal effects, not causal discovery, but it does have a couple of causal discovery methods, and they're gradient-descent-based methods. So if you wanted to fit causal discovery into a deep learning pipeline, this would be the way to do it, except that they're not very accurate. If I try to learn this graph once again, using the same cities data, it outputs an HTML file, and it learns that latitude and temperature cause altitude. But I do like this clicky-drag thing; it's extremely satisfying. So I can't really recommend the gradient-descent-based methods either. They're based on a relaxation of the acyclicity constraint, and they don't work very well.
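For reference, the CausalNex structure-learning call being described looks roughly like this: from_pandas runs a NOTEARS-style continuous optimisation and returns a weighted StructureModel. The file name and pruning threshold below are illustrative assumptions.

```python
# Sketch of NOTEARS-style structure learning with CausalNex.
import pandas as pd
from causalnex.structure.notears import from_pandas

df = pd.read_csv("cities.csv")[["altitude", "latitude", "temperature"]]

sm = from_pandas(df)                     # gradient-descent structure learning
sm.remove_edges_below_threshold(0.8)     # prune weak edges (threshold is illustrative)

print(list(sm.edges(data="weight")))     # learned directed edges with weights
```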
The last package that I put in the abstract and promised to talk about was Tigramite. When I went and reviewed it, it turned out that it only has time series methods, and it's also very complicated academic code. So I will say that if you have time series data and you know what you're doing, Tigramite is the way to go, but I cannot do my three-variable example in it.
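For anyone who does have time series data and wants a feel for Tigramite, a typical PCMCI run on toy data looks roughly like the sketch below. Import paths have moved between Tigramite versions, so treat them as assumptions.

```python
# Sketch of PCMCI on toy time series data with Tigramite.
import numpy as np
from tigramite import data_processing as pp
from tigramite.pcmci import PCMCI
# In older Tigramite versions this import was: from tigramite.independence_tests import ParCorr
from tigramite.independence_tests.parcorr import ParCorr

rng = np.random.default_rng(0)
data = rng.standard_normal((500, 3))            # (time steps, variables) toy data
dataframe = pp.DataFrame(data, var_names=["X", "Y", "Z"])

pcmci = PCMCI(dataframe=dataframe, cond_ind_test=ParCorr())
results = pcmci.run_pcmci(tau_max=2, pc_alpha=0.05)

# p-values for each (cause, effect, lag) triple
print(results["p_matrix"].shape)                # (3, 3, tau_max + 1)
```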
So that is basically it. That's my review of the causal discovery packages in Python, and, shock surprise, I recommend the two produced by my academic department. In summary: we all know that correlation doesn't equal causation, but in some cases, under some circumstances, you can make an assumption and infer causation. Most users who want to do this in Python should use causal-learn. If you have a large set of features, like a genetic regulatory network, or a niche use case, I suggest using py-tetrad instead. And if you're an expert and you have time series data, Tigramite. That is it. Thank you. Any questions?

[Applause]

Thanks. Apologies if this is more for your 101 talk, which I will go and watch, but what else can you do with these models? The outputs you showed were finding the correct causal relationship, but can these models then be used to predict things? If you get new values of these input features, can you provide estimations of whether something is going to happen or not?

Yeah, great question: how is a causal model different from a statistical model? The difference is that you're making predictions about what would happen if you were to intervene and change something in the world. Given the smoking, yellow fingers, and lung cancer example, I'm predicting that if I were to paint someone's fingers yellow, they wouldn't get lung cancer, even though there is a correlation between having yellow fingers and having lung cancer because of the nicotine stains. The causal effects run from smoking to lung cancer and from smoking to having yellow fingers. So you're predicting the result of interventions. A statistical model just predicts what will happen if there are no interventions, if you're drawing from the same distribution and not intervening and changing anything. So we want to use causal models if we want to do anything in the world. Causal models are widely used in advertising, because advertisers want to know the causal effect of showing someone this ad: maybe they would have gone and bought the product anyway, so does the ad actually change their propensity to buy? They're widely used in health, which is a much nicer and more fun example. And I think they should be used more often in asset maintenance. Yeah, there's a lot of applications.

Thank you. Everybody, please join me again in thanking Lizzie Silva. Lizzie, we have a gift for you; thank you so much for your talk.