[Music]

I'd like to welcome Lizzie Silva to talk about causal discovery in Python. Causal discovery means learning what causes what from your data. That's an incredibly important thing for anyone who wants to draw conclusions from their data, or to use their data to develop a strategic plan for meaningful decisions about what to do next. Lizzie Silva is a senior data scientist at WSP. She has broad interests in applied data science and has worked on projects in electricity distribution, water distribution, abandoned mine shaft detection, fish ecology, and arthritis monitoring via wearable devices, among others. She did her PhD in causal discovery at Carnegie Mellon University. Her pastimes include singing in choirs and running the monthly Melbourne Machine Learning and AI Meetup, as well as the Melbourne chapter of Puzzled Pint. She'll present a review and comparison of software available for causal discovery in Python: first a brief intro to causal discovery, then a run through various packages intended for that purpose, and if there's time available, maybe a quick demonstration at the end. Everybody, please welcome Lizzie Silva.

Thanks, Genevieve, and thank you all for being here. I have had spoilers for my first slide, which is: what is causal discovery? For those unfamiliar, causal discovery is about learning what causes what. Great, you can see my cursor. Causal discovery is what you would do first, before later doing causal effect estimation, that is, learning how much of an effect each feature has. The input to a causal discovery algorithm is some kind of table, with features in the columns and observations in the rows. The output is a graph, with each feature being one node in the graph, and an arrow from one feature to another if there's a causal effect. So if smoking causes lung cancer, we have an arrow from smoking to lung cancer.
In between is the causal discovery algorithm, which is very complicated. It can be explained in detail, but in 30 minutes that's not really doable. So what I'm going to do is point you, if you want the mathematical details, to a previous talk I gave on this topic; that QR code will be on the final slide as well. Instead, I'm just going to give you some reason to believe you might want to do this, some intuition about why it might work, and then go straight into how you would do it in Python.

So when would I want to use causal discovery, given that we've all heard that correlation is not causation and we don't want to make mistakes? The trouble is that we have to take actions in the world, and we can't always do a randomized controlled trial to find out what causes what. It is unethical to force people to smoke to see whether it gives them lung cancer. So sometimes we have to make causal conclusions from observational data.

Now, usually we'd like to rely on our background knowledge about what causes what. But unfortunately, sometimes we don't know all the causal relationships. In this case we can add what we do know as constraints and use causal discovery to learn the rest.

Sometimes our experts are not perfect, and they may believe something that's not true. In this case it might be worth just trying causal discovery and seeing what it gets you. You may find that the model you get out of a causal discovery algorithm actually fits the data much better than an expert guess.

And the last situation is that you might just have way too many features to have any idea what's going on. If you're trying to learn a genetic regulatory network over 20,000 genes, you won't have background knowledge about which genes affect the others. In this case causal discovery is a great way to come up with a first guess, to generate hypotheses and prioritize which experiments you're going to run first.
When can't I use causal discovery? If you have measured only two things, and those two things have a Gaussian distribution, by which I mean the bell curve, the normal distribution, and they have a linear relationship between them, in that case you're very sad. Pardon me. You cannot do anything with causal discovery in this situation; that has been proven, and this is why you've heard that correlation is not causation.

However, in every other case we can learn something. For example, let's say we've only measured two features, but those features have a non-Gaussian distribution. So this is not the bell curve; this is a uniform distribution. In this case, if I get the direction of causation right (that's the little tick mark), with the predictor X1 predicting the effect X2, then you can see the amount of noise I'm adding has no relationship to the X1 value. But if I get the direction of causation wrong, suddenly the amount of noise depends on the value of X1, in a way that feels counterintuitive. We expect, intuitively, that noise is just random other stuff that's happening, and it should have nothing to do with the relationship between X1 and X2. So depending on which direction you pick, the dependence of the noise on the presumed cause changes, and that asymmetry is what you can exploit. This is a situation in which you can use linear non-Gaussian acyclic models to learn causal relationships.

If you have Gaussian noise but nonlinear relationships, you can use the same kind of trick: if I get the direction of causation right, the noise is unrelated to my predictor. This is real data: this is rings on abalone predicting their length, rings basically being a proxy for age. Age increases the length of the abalone, but the length doesn't increase the age.
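That intuition can be checked numerically. Below is a small illustrative sketch (not from the talk) that simulates the uniform-noise case, regresses in both directions, and measures how strongly the residuals depend on the presumed cause. It assumes the `dcor` package for a distance-correlation dependence measure; any nonlinear independence test would do.

```python
# Rough sketch of the additive-noise direction test on simulated data.
# Assumes numpy, scikit-learn and the `dcor` package are installed.
import numpy as np
from sklearn.linear_model import LinearRegression
import dcor

rng = np.random.default_rng(0)
x1 = rng.uniform(size=2000)              # cause, with a non-Gaussian (uniform) distribution
x2 = x1 + 0.5 * rng.uniform(size=2000)   # effect = linear function of cause + uniform noise

def residual_dependence(cause, effect):
    """Regress effect on cause and measure how much the residuals depend on the cause."""
    model = LinearRegression().fit(cause.reshape(-1, 1), effect)
    residuals = effect - model.predict(cause.reshape(-1, 1))
    return dcor.distance_correlation(cause, residuals)

# In the correct direction the residuals are (nearly) independent of the cause;
# in the wrong direction the dependence is noticeably larger.
print("x1 -> x2:", residual_dependence(x1, x2))
print("x2 -> x1:", residual_dependence(x2, x1))
```

Plain correlation can't distinguish the two directions here; it is the dependence between the residuals and the presumed cause that breaks the symmetry.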
So this is just trying to give you some intuition that it might be possible to learn causal relationships if you have data from a certain distribution.

The other thing you might have is more than two features. On this one slide, on the right-hand side I want to give you some feeling for why this is doable, and on the left-hand side why it's hard. This is the observable universe in one picture, and the number of directed acyclic graphs, or causal models, over 25 features is at least 10 orders of magnitude larger than the number of atoms in the known universe. The space of models in which we're trying to find the true model is very large; that's why it's a hard problem when you have more than two features.

However, it is doable, because we sometimes have these V-structures, where you've got two things that are not causally related but both influence a common effect. In this example, there's no dependence between the battery charge and the fuel tank level in a car; they both influence whether the car starts. So they're independent, but when you condition on their common effect (let's say the car did not start, and you know that the battery is fully charged), you can learn something about the level of the fuel tank: you learn that it's empty. So what we see with these V-structures is independence of the predictors, but dependence conditional on their common effect. That unusual pattern of independence and conditional dependence indicates the presence of a V-structure, and when you see it in the data, you can learn that the V-structure exists and orient those edges in your causal model. So it's doable, but it is computationally intense.
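A quick simulated version of that car example (illustrative numbers, not the speaker's) shows the pattern: the two causes are uncorrelated overall, but become strongly negatively correlated once you condition on the common effect.

```python
# Simulated V-structure: battery and fuel independently cause "starts".
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
battery = (rng.random(n) < 0.9).astype(float)   # 1 = battery charged
fuel = (rng.random(n) < 0.9).astype(float)      # 1 = fuel in the tank
starts = battery * fuel                          # the car starts only if both

# Marginally, battery and fuel are independent (correlation near zero).
print("overall:", np.corrcoef(battery, fuel)[0, 1])

# Conditioning on the common effect (car did not start) induces dependence:
# among non-starting cars, a charged battery makes an empty tank much more likely.
failed = starts == 0
print("given no start:", np.corrcoef(battery[failed], fuel[failed])[0, 1])
```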
So if you're in this situation, my suggestion for which algorithm to use depends on the number of features you're trying to learn over. I will say that the GRaSP and BOSS algorithms were invented after I graduated, so I don't really understand how they work, but I do understand that they're way more accurate than anything that existed when I was studying, especially on dense graphs. So I recommend them up to maybe 100 features. After that, you want something more efficient, like the PC algorithm or FGES.

Other questions influence the choice of algorithm. You may believe that there are some latent confounders that you have not measured in your data set, in which case there are algorithms, all based on FCI, which will represent the extra uncertainty related to those confounders. Many of these algorithms take a statistical test or a score function as a plugin, and those tests and score functions depend on whether the features are continuous or discrete or a mixture of each. When you have a mixture of continuous and discrete features, that is actually the hardest case, because you need a score or a test that has similar levels of power no matter how many of the features are discrete or continuous.

People often ask me: what about time? Doesn't time make it much easier? It's actually much easier when you just have a snapshot of a population, because typically, when people want to take time into account, it's because they've only measured one system over time. If you measure, say, Australia's economy over time, your sample size is one; there's one Australia, and you want to use the different points in time as effectively different samples. But you can have strange things happening, like different causal relationships operating depending on seasonality. So time actually makes things much more difficult, and I won't talk about it further.
So instead, I'm now going to jump to a worked example. For this worked example, I wanted a really small data set, just three features, where we could all agree on the causal model. So I picked the relationship between altitude, latitude, and temperature. I hope we can agree that warming up a location will not move it, but moving a location may change its temperature, so we can hopefully agree on these causal directions. This data set also has some nice nonlinearity and non-Gaussianity, and it has a V-structure, so it should be extremely learnable.

I also want to mention Tetrad. Tetrad was the first software available for causal discovery. It is a massive Java library, it has been in development since the early 90s, and it's mostly the work of Joe Ramsey, who's pictured here holding his Markov blanket, which is a joke for the real stats nerds in the audience. And it has everything. Joe is at the Carnegie Mellon University philosophy department, which is where I studied, and the philosophy department has realised that most data scientists use Python, so they now have not one but two Python libraries for causal discovery.

The first is called causal-learn, developed in 2020, and this is part of the Carnegie Mellon University centre for causal learning and reasoning. All of the implementations in causal-learn are new; they're all 100% Python. And because they're 100% Python, they have limited performance for really large graphs; they don't have all the optimizations that are in Tetrad. But it's really actively maintained, and it's got a large user community, by the standards of causal discovery user communities.

So I will now jump to VS Code. I wanted to show you in 3D the relationship between temperature, latitude, and altitude. This is temperature on the y-axis, latitude and altitude on the horizontal axes, and you can see that there's this sort of surface of different temperatures at different locations.
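The plotting code itself isn't reproduced in the transcript; a minimal sketch of a 3D view like the one described, assuming a hypothetical cities.csv with altitude, latitude, and temperature columns, could look like this:

```python
# Minimal sketch of the 3D view shown in the demo (file and column names assumed).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("cities.csv")   # hypothetical data set

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(df["latitude"], df["altitude"], df["temperature"], s=5)
ax.set_xlabel("latitude")
ax.set_ylabel("altitude")
ax.set_zlabel("temperature")     # temperature on the vertical axis
plt.show()
```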
So, causal-learn. We'll start by importing our libraries and loading our data, and then learning a graph is just this one line. It's really quick, and it learns the right graph. But you may have noticed a little tell here: what happens if I change this seed to one? It picks up the relationship between latitude and temperature, but it can't orient it, because it no longer has a V-structure, and it has missed the altitude-to-temperature relationship. If I don't set the random seed at all and just keep re-running this, it flips between the two. There is some randomness in the permutation searches; I'm not exactly sure where it is, I dug around in the code and I couldn't see it. But I will say that they take a score function, and the default score function is best for Gaussian data, which this is not, so I haven't exactly tried to optimize this. But that is causal-learn on this example.
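The demo's exact call isn't reproduced in the transcript (it appears to use one of the score-based permutation searches); as a hedged sketch of the same workflow, here is roughly what a causal-learn run looks like using the PC algorithm, whose documented interface I'm more confident about. The file and column names are assumptions.

```python
# Minimal causal-learn sketch (file and column names are assumptions).
import pandas as pd
from causallearn.search.ConstraintBased.PC import pc
from causallearn.utils.GraphUtils import GraphUtils

cols = ["altitude", "latitude", "temperature"]
data = pd.read_csv("cities.csv")[cols].to_numpy()

# PC with the default Fisher-Z independence test; for non-Gaussian data you can
# pass indep_test="kci" instead (slower, but nonparametric).
cg = pc(data, alpha=0.05, indep_test="fisherz")

# Render the learned graph to a PNG (requires pydot/graphviz).
pyd = GraphUtils.to_pydot(cg.G, labels=cols)
pyd.write_png("learned_graph.png")
```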
The second Python package, also produced by the CMU philosophy department, is py-tetrad. It's a Python wrapper for Tetrad. It has some Python API in it: it's got this TetradSearch class, some data set type translators and graph translators, and you can also access the rest of Tetrad through JPype. Given that Tetrad has everything, this is really powerful. It's also got a bunch of optimizations for dealing with large graphs, and all the newest algorithms get implemented in Tetrad first. So yes, it's maintained, and it's got a small but dedicated user base, but it's mostly maintained by Joe, who's a Java guy, so it could use some help with the Python packaging, getting it up on PyPI, things like that.

How about learning that same model in py-tetrad? Once again, load up our data, and again it's one line... oh, two... it's a few lines. In this case I don't get a nice image out of the box, but you can read these edges, and they are what we want: we've got the altitude-to-temperature edge and the latitude-to-temperature edge. And I can output an image of that graph, which is looking good. Good job, py-tetrad.
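Those few lines aren't reproduced in the transcript; the sketch below is modelled on the py-tetrad example scripts as I recall them. Because py-tetrad wraps Java, a JVM must be started first, and module paths, jar locations, and method names vary by version and install, so treat everything here as an assumption rather than a definitive recipe.

```python
# Rough py-tetrad sketch, modelled on the project's example scripts.
# Paths, module names and method names may differ between versions/installs.
import jpype.imports
import pandas as pd

try:
    # Jar location is an assumption; point it at your local Tetrad jar.
    jpype.startJVM(classpath=["resources/tetrad-current.jar"])
except OSError:
    pass  # JVM already running

import pytetrad.tools.TetradSearch as ts   # module path may vary by install

df = pd.read_csv("cities.csv")[["altitude", "latitude", "temperature"]]

search = ts.TetradSearch(df)
search.use_sem_bic()          # choose a score for the search
search.run_boss()             # or run_fges(), run_grasp(), ...
print(search.get_string())    # text listing of the learned edges
```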
Now, I should admit that I sign myself up for talks in order to force myself to do things that are on my to-do list, so the rest of the packages that I'm about to talk about are ones that I had intended to understand; I put them in the abstract so that I had to review them.

I really wanted to love the Causal Discovery Toolbox. It's built by Diviyan Kalainathan, who's now at Fentech, and he did his PhD in causal discovery under Isabelle Guyon, who set up the causal pairs challenge. So it has a real emphasis on the pairwise methods, things like the additive noise method I mentioned earlier, and it's also got some deep learning methods. The other things, related to using lots of features, are imports from the pcalg R package. Unfortunately, it's no longer maintained; it hasn't been updated since 2022.

Let's just go over that same example in CDT. Come on... there we are. Load in our data. You have to feed it a complete graph as an input, and then it learns that altitude causes latitude. So this is using the additive noise method, and if we review our data, the noise does look roughly additive, depending on what point on the surface you're at. But it may be that it's just not working at a pairwise level, because if I just look at one view of this, the noise doesn't look additive. So I'm not entirely sure why CDT is not learning the right thing, but just based on that quick evaluation, I can't exactly recommend it.

The next option was the CausalNex package. This is another one built by an actual company: it's built by McKinsey's AI consulting team, QuantumBlack. I'm not sure if it will be maintained further; the last commit was nine months ago. Its emphasis is on learning the size of causal effects, not causal discovery, but it does have a couple of causal discovery methods, and they're gradient-descent-based methods. So if you wanted to fit causal discovery into a deep learning pipeline, this would be the way to do it, except that they're not very accurate. If I try to learn this graph once again, using the same cities data, it outputs an HTML file, and it learns that latitude and temperature cause altitude. But I do like this clicky-drag thing; it's extremely satisfying. So I can't really recommend the gradient-descent-based methods either. They're based on a relaxation of the acyclicity constraint, and they don't work very well.
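For reference, the CausalNex structure-learning call being described looks roughly like this: from_pandas runs a NOTEARS-style continuous optimisation and returns a weighted StructureModel. The file name and pruning threshold below are illustrative assumptions.

```python
# Sketch of NOTEARS-style structure learning with CausalNex.
import pandas as pd
from causalnex.structure.notears import from_pandas

df = pd.read_csv("cities.csv")[["altitude", "latitude", "temperature"]]

sm = from_pandas(df)                     # gradient-descent structure learning
sm.remove_edges_below_threshold(0.8)     # prune weak edges (threshold is illustrative)

print(list(sm.edges(data="weight")))     # learned directed edges with weights
```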
The last package that I put in the abstract and promised to talk about was Tigramite. When I went and reviewed it, it turned out that it only has time series methods, and it's also very complicated academic code. So I will say that if you have time series data and you know what you're doing, Tigramite is the way to go, but I cannot do my three-variable example in it.
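For anyone who does have time series data and wants a feel for Tigramite, a typical PCMCI run on toy data looks roughly like the sketch below. Import paths have moved between Tigramite versions, so treat them as assumptions.

```python
# Sketch of PCMCI on toy time series data with Tigramite.
import numpy as np
from tigramite import data_processing as pp
from tigramite.pcmci import PCMCI
# In older Tigramite versions this import was: from tigramite.independence_tests import ParCorr
from tigramite.independence_tests.parcorr import ParCorr

rng = np.random.default_rng(0)
data = rng.standard_normal((500, 3))            # (time steps, variables) toy data
dataframe = pp.DataFrame(data, var_names=["X", "Y", "Z"])

pcmci = PCMCI(dataframe=dataframe, cond_ind_test=ParCorr())
results = pcmci.run_pcmci(tau_max=2, pc_alpha=0.05)

# p-values for each (cause, effect, lag) triple
print(results["p_matrix"].shape)                # (3, 3, tau_max + 1)
```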
So that is basically it. That's my review of the causal discovery packages in Python, and, shock surprise, I recommend the two produced by my academic department. In summary: we all know that correlation doesn't equal causation, but in some cases, under some circumstances, you can make an assumption and infer causation. Most users who want to do this in Python should use causal-learn. If you have a large set of features, like a genetic regulatory network, or a niche use case, I suggest using py-tetrad instead. And if you're an expert and you have time series data, Tigramite. That is it. Thank you. Any questions?

[Applause]

Thanks. Apologies if this is more for your 101 talk, which I will go and watch, but what else can you do with these models? The outputs you showed were finding the correct causal relationship, but can these models then be used to predict things? If you get new values of these input features, can you provide estimations of whether something is going to happen or not?

Yeah, great question: how is a causal model different from a statistical model? The difference is that you're making predictions about what would happen if you were to intervene and change something in the world. Given the smoking, yellow fingers, and lung cancer example, I'm predicting that if I were to paint someone's fingers yellow, they wouldn't get lung cancer, even though there is a correlation between having yellow fingers and having lung cancer because of the nicotine stains. The causal effects run from smoking to lung cancer and from smoking to having yellow fingers. So you're predicting the result of interventions. A statistical model just predicts what will happen if there are no interventions, if you're drawing from the same distribution and not intervening and changing anything. So we want to use causal models if we want to do anything in the world. Causal models are widely used in advertising, because advertisers want to know the causal effect of showing someone this ad: maybe they would have gone and bought the product anyway, so does the ad actually change their propensity to buy? They're widely used in health, which is a much nicer and more fun example. And I think they should be used more often in asset maintenance. Yeah, there's a lot of applications.

Thank you. Everybody, please join me again in thanking Lizzie Silva. Lizzie, we have a gift for you; thank you so much for your talk.