Welcome back, hope you enjoyed your break. Our first speaker for this session will be David Rollinson. He will be giving an introduction to causal inference with Python. I'd like to welcome you on stage.

[Applause]

Okay, hopefully that's going to pop up. All right, thanks. It's great to be here, and I'm really excited to be talking to you today about causal inference with Python. It's a topic I've become increasingly passionate about over the last few years, because I've seen how much it can impact the way we do data science and machine learning in industry. This talk has two parts. In the first part I'm going to try to convince you that you too should be interested and passionate about causal inference and, more broadly, causality. In the second part we'll work through a simple example with a Python library called DoWhy, which enables you to calculate cause and effect in Python.

As soon as you start looking into causal inference you'll encounter the term causality. At first it seems like a bit of a nebulous concept, and it doesn't really have a very specific definition; it encompasses a range of topics around the science of cause and effect. And this topic really is everywhere. There are many questions you'll encounter in a data science role that are inherently causal. If you look out for the words in red here, like "what would happen if" or "why did this happen": I call these questions inherently causal because to answer them properly you really need an understanding of causation, not just association or correlation.
What's interesting is that most of the machine learning models you'll encounter are not explicitly causal, even when they're being used to address these causal questions. One thing I often hear, particularly from machine learning and AI people, is "can't you do predictions with an associative model?" And it's true, you can; that's one of their core capabilities. The difference is that with a causal model you're more likely to get accurate answers when you're asking questions in a changed context, where the statistics of the data you're going to use the model on are different for some reason. That difference can disrupt an associative model, whereas a causal model should be able to handle those disruptions because of the causal modelling. For example, if you're going to make an intervention, which is a change to the system, that's going to change the statistics, and if you want the model to cope with that, a causal model is preferable. Of course it may not be an intervention that you're making but one you can't control, such as climate change: you're aware it's coming, you have some understanding of the effects it might have, and you want to model that.

In a lot of the research we see, particularly observational studies, there's often a statement like "doing X may reduce the risk of Y". This guy on Twitter, or X, or whatever it is this week, pointed out that this is an explicitly causal statement, but then later in the paper you get a statement like "this is just an associational study, so you can't actually say anything about cause and effect". It's almost like Schrödinger's cat, where the study is in two states at the same time: it wants you to draw a causal conclusion, but you're not allowed to say that.
So I feel there's an internal contradiction there, and if people were aware that it's actually quite easy to embrace causality and add causal thinking to these studies, they would do it a lot more often.

And it's not just researchers hedging their bets about whether their research covers causality or not. There has also been research into the conclusions people draw from associative studies: if they read that there's an association between X and Y, people often conclude that X causes Y. That may be true, but it may also not be, and there's a huge number of examples showing how easily you can find a spurious or false correlation. Tyler Vigen, the website at the bottom there, has a whole site full of hilarious correlations with no real causal relationship, just to show how easy it is to discover a false relationship.

There is one experimental design that is widely understood to establish a causal relationship, and that's the randomized controlled trial, or RCT. An RCT has two key elements that enable it to do that. The first is randomization: whatever the factors are that affect the whole study population, they're going to be present in both of the groups you produce, because you've randomized the assignment of people to those two groups. Whatever the confounding factors are, they'll be present in both groups. Then you make some interventional change to just one of those groups. The combination of the random assignment and the change to just one group allows you to conclude that the differences between those groups are due to the intervention, and not to other factors hidden in the background.
But randomized controlled trials are not always possible, and they're not always practical. For example, if your question is about something that happened in the past, then unless you can time travel you can't go back, change it, and see what would have happened. There are also many situations where it's unethical or impractical to run a randomized controlled trial: you can't take a group of kids, get half of them to smoke 20 cigarettes a day for 20 years, and see what happens.

So if you can't do a randomized controlled trial, can you still model causality? The answer is yes. You basically need two things: first you need some data, and secondly you need a causal model. There are many types of causal model, but most commonly the model is produced either by drawing on the knowledge of experts, where the process of gathering, discussing and teasing out that knowledge is called elicitation, or by learning the causal model from the data, which is called causal discovery. Causal inference is the process of using the model once you've got it; discovery is the process of learning a model from data; and elicitation is the process of learning a model from experts. There can also be a bit of mixing: you can take some expert domain knowledge and use it to restrict the range of models considered during causal discovery.

In my day job I work for WSP, an engineering consulting company, and what has really drawn me to the causality space is the number of opportunities we encounter where we have clients with vast quantities of detailed historical data.
Because a lot of these are infrastructure engineering systems, well-defined and well-controlled, the clients also have expert domain knowledge about them, and the questions they come to us with are often causal questions. For example, in managing a lot of the critical infrastructure we have around Australia, we get questions like: over the last 10 years we've invested X million dollars applying these policies to renew pipe networks or road networks; if we had invested a different amount of money, or invested in different practices or policies or technologies, what would have happened? What would the service level of our railways or our roads have been under those conditions? All of these are generally causal questions, because they involve exploring the outcomes that would have occurred under different conditions that aren't represented in the data.

So that was the first part of the talk, where I tried to convince you that you should be interested in causality. The second part looks specifically at a Python library called DoWhy, which I've been working with quite a bit. DoWhy is part of an ecosystem called PyWhy, which contains a few major packages: DoWhy, which is about causal effects; EconML, because a lot of the people working in the causal inference space come from econometrics and epidemiology and have brought in a lot of their methods; and causal learners, that is, causal discovery algorithms. This talk is going to focus mostly on the DoWhy part.
DoWhy is well documented; the user guide is all on the PyWhy site, and it's not just the bare bones of "this is how you install it, here's one simple introduction". It's actually pretty detailed and covers a lot of the background concepts, so it's a really recommended read. I'm going to show a few snippets of code for the rest of the talk, and that code is in a public GitHub repo which I made just for this talk, so if you want to have a look afterwards you can see what happened, play with the code, and maybe do some experiments of your own. Everything there is very simple.

One of the things I really like about DoWhy is that it imposes a four-step process on modelling a causal inference problem. The four steps are: first, model the problem (I'll explain what these are as we go); second, use that model to identify an estimand; third, use the estimand and your data to estimate an effect; and finally, the fourth step, try to refute that estimate. To explain what those words mean, we'll go through an example.

The example I picked is called the LaLonde data set. It's really old, from the late 1970s I think, and it's a very simple, small data set. Essentially, there was a training program, and they wanted to understand whether that program had actually produced a benefit for the people who participated in it. So they looked at the wages of participants three years later, in 1978, and compared them to another group of people who hadn't participated in the program. You can see the data there; it's in the repo. There are two columns we're really interested in: whether they undertook the training, and their wage three years later, in 1978 (I told you it was a very old example). There are also a few other columns for variables that they thought might also have affected the answer.
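To make the data step concrete, here is a minimal sketch, not shown in the talk, of loading that data with pandas. The file name and the column names (treat, educ, age, re78, as in the standard LaLonde data) are assumptions about what the repo's CSV contains.

    # Sketch only: file and column names are assumed, not taken from the talk.
    # pip install dowhy pandas
    import pandas as pd

    df = pd.read_csv("lalonde.csv")        # hypothetical file name from the repo
    print(df[["treat", "re78"]].head())    # treatment flag and 1978 wage
    print(df.columns.tolist())             # the other candidate covariates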
So remember I said that to do causality without randomized controlled trials you need two things. Firstly you need some data, and we just looked at that CSV file. Secondly you need a causal model, so the next thing to look at is how you describe a causal model in DoWhy, the Python library.

DoWhy wants you to provide your domain knowledge about the system in question as a directed acyclic graph. Directed means there are arrows between the variables, and the variables are effectively just the columns in your data file. Acyclic means there are no loops. Those are really the only constraints we have, except that the graph needs to include at least the treatment, which is the cause we want to vary, and the outcome whose effect we want to understand.
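As a small illustration of those two constraints, here is a sketch using networkx; it is not part of the talk, and the variable names are just placeholder column names.

    # Sketch: checking the "directed, acyclic" requirement with networkx.
    import networkx as nx

    g = nx.DiGraph()
    g.add_edges_from([
        ("educ", "treat"),   # education affects eligibility for training
        ("educ", "re78"),    # ...and wages
        ("treat", "re78"),   # treatment -> outcome must be in the graph
    ])
    print(nx.is_directed_acyclic_graph(g))   # True: no loops so far

    g.add_edge("re78", "educ")               # adding this edge creates a cycle
    print(nx.is_directed_acyclic_graph(g))   # now False: not a valid DAG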
The aim is to include in that graph all of the relevant direct causal relationships: you don't want to include a mere correlation, only relationships that are causal. There's a bit of judgment, and some expediency and practicality, in deciding which variables and which interactions to include. That's a whole topic in itself, but one tip I can give is that you can always create multiple models and compare the results you get with the different models. One of the great things about creating this graph is that it becomes a specific, precise, documented description of the assumptions and beliefs you're bringing to the study. If you'd just done one of those Schrödinger studies from before, essentially none of this would have been stated; whatever assumptions you make about confounding variables are left for the reader to interpret. Whereas if you embrace causality and draw a causal diagram, a DAG like this, you're making those assumptions explicit, so even if they're wrong, at least people can see what they are.

Now, DoWhy wants you to provide the causal model as a string, which you can see on the left there. It looks a bit complicated, so I'll break it down so we can understand how it works. The first part essentially declares the variables. Remember, the variables are just the relevant columns in your data file (you don't have to use all of them), and in this string we declare them simply by listing them by name. Once we've declared the variables, the next step is to create the edges in our graph. A good starting point is to say that, in this case, there is an edge, a causal effect, between whether the participant received the training course and their wage. Here that's a direct effect. It doesn't have to be: it might be that doing the training affects some other variable and that other variable affects wages, but in this case it's direct. To tell DoWhy that you've got this direct effect, you use the arrow operator, which you can see in the red box on the left. Having created that first edge, we keep populating the graph with all the other edges just by adding them to the string. In the next one we consider the impact of the number of years of education the person has had; you consult your experts and they say yes, that would affect wages as well, and in this study it also affected whether people were eligible for the training program, so we represent that by adding those two edges. The rest of the string just repeats that process, adding all the other edges. I don't claim this is the correct causal diagram, it's just an example, but essentially the diagram is a representation of the string on the left.
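The slide itself isn't reproduced here, but the string would look roughly like the sketch below, assuming the DOT-style "digraph" syntax that DoWhy accepts and the same placeholder column names as above.

    # Sketch of the kind of graph string described above (DOT syntax).
    # Variable names are illustrative, not the talk's slide.
    graph = """digraph {
        treat; re78; educ; age;
        treat -> re78;
        educ -> re78;
        educ -> treat;
        age -> re78;
        age -> treat;
    }"""
    # "treat -> re78" is the arrow operator mentioned above: a direct causal
    # edge from the training variable to the 1978 wage.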
So that was the first step, and that's really the bulk of the work you have to do as a user of the DoWhy library. Once you've created that graph as a string and you've got your data as a pandas DataFrame, you pass them both into an object that DoWhy calls a CausalModel: you say the treatment here is the training variable and the outcome is the wages in 1978, and you pass in the data and your graph. That's it for the first step.

The second step is to call a thing named identify_effect. All of the remaining steps are literally just one function call each, so it's actually very easy. As I mentioned earlier, identify_effect produces a thing called an estimand, which you may not have heard of before. Essentially an estimand is a way to estimate the desired quantity: a strategy, or a procedure, that will enable you to calculate the quantity you're interested in. It's worth noting that this is not always possible; you can create a graph for which there is no valid estimand. It's also possible to create a graph where there are multiple estimands, in which case DoWhy will return them all and you can choose between them. In this case we've got a backdoor estimand.
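Putting those first two steps together, a sketch might look like this; CausalModel and identify_effect are DoWhy's actual entry points, while the column names and the graph string are the assumed ones from the sketches above.

    # Steps 1 and 2: wrap the data and graph in a CausalModel, then identify.
    from dowhy import CausalModel

    model = CausalModel(
        data=df,             # the pandas DataFrame loaded earlier
        treatment="treat",   # the cause we want to vary
        outcome="re78",      # the effect we want to understand
        graph=graph,         # the domain-knowledge DAG as a string
    )

    identified_estimand = model.identify_effect()
    print(identified_estimand)   # in the talk's example this is a backdoor estimand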
The other thing happening under the hood when DoWhy does this identification step is that it's analysing the graph, the domain knowledge you've provided, and working out the roles of all the variables in this problem. That's a really key step, because it determines which variables you should be controlling, or conditioning, for, and also which variables you should not be conditioning for. That's really interesting, because some people tend to think "I should just control for as many variables as possible", but that's actually harmful in some situations and can eliminate, or incorrectly bias, the effect you're looking for. So that analysis of the graph is really important.

To illustrate the effect this can have, there's a phenomenon known as Simpson's paradox. In Simpson's paradox you've got a whole study population where the relationship between some property X and some property Y has a certain direction; you can see the strong magenta line there basically saying that an increase in X decreases the value of Y, and that is true over the whole population. But if you bring in an additional variable, which divides the population into these four colour groups, then within each of those groups the relationship between X and Y is completely the opposite. So if you hadn't brought in and controlled for that variable appropriately, your conclusion would have been the opposite of what it should be. Hopefully that intuitively makes sense; in this example you can see how it works, but without the colouring it's actually really hard to grasp that two totally different, contradictory outcomes can be present in one set of data.
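The slide's plot isn't reproduced here, but the phenomenon is easy to recreate; this is a small self-contained simulation, not the talk's data.

    # A small simulation of Simpson's paradox (not the talk's figure).
    # Within each group, X pushes Y up, but pooled across groups the trend
    # reverses because the group means line up the other way.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    frames = []
    for g in range(4):
        x = rng.normal(loc=3 * g, scale=0.5, size=200)
        y = 10 - 2.5 * g + 1.0 * (x - 3 * g) + rng.normal(0, 0.3, size=200)
        frames.append(pd.DataFrame({"x": x, "y": y, "group": g}))
    data = pd.concat(frames, ignore_index=True)

    print(data["x"].corr(data["y"]))                    # negative: pooled trend
    print(data.groupby("group").apply(lambda d: d["x"].corr(d["y"])))  # positive within each group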
The third of our four steps is estimating the effect. Again, it's just a single function call, so it's very easy to do. You can select from a range of estimation methods that are built into and supported by DoWhy, and you can also access models from the EconML package. Having done this on our data set, we get the result that the causal estimate is 1629, in this case 1,629 dollars more. And because we've got a causal model, we can actually make a causal interpretation: taking as a prior assumption the graph, the domain knowledge we provided, and accepting it as correct, then on average completing this training course causes participants to earn 1,629 dollars more than not completing the training. So you can see that by bringing the causal analysis and the causal model into the study, we go from a statement about one variable being associated with another to an actual causal interpretation.
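Step three as a sketch: the method name below is one of DoWhy's built-in backdoor estimators, but the talk doesn't say which estimator produced the 1,629 dollar figure, so don't expect this exact number.

    # Step 3: estimate the effect using the estimand identified above.
    estimate = model.estimate_effect(
        identified_estimand,
        method_name="backdoor.propensity_score_matching",  # one built-in choice
    )
    print(estimate.value)   # average causal effect of training on the 1978 wage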
The next and final step in the DoWhy paradigm for handling causal inference is refutation, and basically that means stress-testing your model to see whether this is a real effect. From the magnitude of the variables alone you might not be sure whether it's a weak but legitimate effect, or a strong effect that's biased or confounded in some way. DoWhy provides a number of tools to help you gain confidence and understand how statistically robust that effect is, and you can access all of them through the refute_estimate function by specifying the name of the test you want to run. In this case what I've done is use a placebo treatment, which essentially means we randomize all of the treatments but keep the outcomes and all the other variables the same. Because we've randomized the treatment, we would expect the effect to disappear, and in this case, fortunately, it does: the effect has gone down from about 1,600 dollars to just two dollars, so it's pretty much gone.
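And step four, the placebo check just described, as a sketch; placebo_treatment_refuter is DoWhy's name for this refuter, and other refuters (random common cause, data subset) are available the same way.

    # Step 4: refute the estimate with a placebo treatment.
    # Randomising the treatment column should make the estimated effect
    # collapse towards zero if the original effect was genuine.
    refutation = model.refute_estimate(
        identified_estimand,
        estimate,
        method_name="placebo_treatment_refuter",
        placebo_type="permute",   # shuffle the existing treatment assignments
    )
    print(refutation)             # the new (placebo) effect should be near zero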
Now there's one extra bit I wanted to add to this talk, which is about counterfactual outcomes. A counterfactual outcome is looking back and asking what would have happened if things had been different, if we had done something differently. The great thing about having a causal model is that we can actually answer this. We can first look at what actually did happen to the participants in this study. The red box at the bottom shows that the average outcome over all the participants is a 5,300 dollar average wage in 1978 (there's been a lot of inflation since then). If we look at the outcome for just the control group, who didn't receive the training, the average wage is 4,500 dollars, and the average outcome for the treated group, who did receive the training, is 6,300 dollars. So at surface level it looks like there was an increase in wage for that group, which matches our causal effect that doing the training did increase their wage. That's all looking good.

DoWhy provides this thing called the do operator, which is a way to express an intervention or to apply a counterfactual scenario. To illustrate that, I've added a couple of extra outcomes. First, the outcome over all the participants if none of them had received any training: the average outcome goes down from 5,300 dollars to 4,600 dollars, so you can see that if we take away the training, all the participants become more like the controls. We also have the counterfactual outcome as if we had provided training to all the participants, which increases the average wage to 6,200 dollars. Those numbers make sense: we're basically making the population look more like the group who did receive training, or more like the group who didn't. And that's really one of the key powers of this approach: it enables you to answer questions like "what if we rolled out that program more widely?" or "what if we replaced all of these old devices with some new device, what would actually happen?" This allows us to answer those questions.
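A rough sketch of the comparison just described. The observed group averages are plain pandas; the interventional averages use DoWhy's do-sampler, which is exposed as a pandas accessor in dowhy.api, but the exact keyword names there vary between DoWhy versions, so treat that part as indicative only. Column names and the graph string are again the assumed ones from the earlier sketches.

    # Observed averages: everyone, controls only, treated only.
    print(df["re78"].mean())
    print(df.loc[df["treat"] == 0, "re78"].mean())
    print(df.loc[df["treat"] == 1, "re78"].mean())

    # Interventional averages via DoWhy's do-sampler (pandas accessor).
    # NOTE: keyword names (dot_graph, variable_types, ...) are my reading of
    # the dowhy.api documentation and may differ in your DoWhy version.
    import dowhy.api  # noqa: F401  (registers the .causal accessor)

    for forced in (0, 1):                  # do(treat = 0) and do(treat = 1)
        sampled = df.causal.do(
            x={"treat": forced},
            outcome="re78",
            dot_graph=graph,
            variable_types={"treat": "b", "re78": "c", "educ": "c", "age": "c"},
        )
        print(forced, sampled["re78"].mean())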
Before I wrap up, I just want to quickly mention an app that I created based on the DoWhy library. The app aims to make some of the topics we've talked about today, like causality, accessible to a wider audience, and specifically to make these techniques available to scientists, engineers and other people who aren't necessarily data scientists or Python developers, and so can't access libraries like DoWhy directly. The app includes a causal diagram editor that lets you explore how different models of your system would be represented and how you can use them in your studies.

So that pretty much wraps things up. I hope I've at least made you intrigued about causality and causal inference. I believe we should be using these methods more widely, discussing them, and thinking about cause and effect explicitly, especially in observational studies. There's a particular opportunity we're seeing where organizations have a huge amount of historical data along with detailed domain knowledge, which makes this very accessible. If you're thinking about doing causal inference, then I recommend DoWhy: it's under active development and it's easy to use. And as mentioned, the code for this talk is available at the link there. Thanks for listening.