[Music]

Hello everybody, welcome to the last talk in this session just before lunch. We've got a great talk coming up: this is Anna Tisch, who's here to talk to us about performant Python. Please welcome her to the stage.

Thank you, thank you. Good afternoon. As you heard, my name is Anna, and today we're going to do a brief introduction to performance analysis of Python code.

I work for Microsoft on the Azure SDK for Python, so throughout this talk you'll probably hear me refer to customers or the customer experience. That's because my background is in building Python libraries that we ship on PyPI for other developers to use, so the word "customer" may mean different things to you depending on who it is that you write Python code for. We're going to talk about what performance means and why it matters, and then we're going to look at how to profile your code, how to process and interpret those results, and how to actually turn that into improved performance. And we are going to focus on CPython today.
So let's jump right in: what do I mean when I say performance? Performance can mean different things to different projects. It can refer to resource use, like CPU utilization, memory usage, or network bandwidth, or it could mean something like throughput or latency. For this talk we're going to focus on the efficient execution of CPU-bound Python code, so we're not going to cover I/O or concurrent programming, like asynchronous or multi-threaded code, but the tools we look at can support those as well.

So why do we care about this? Python doesn't have a great reputation for performance, and it's usually not the language of first choice for people looking to build really performant systems; and even if we are using Python, we can just use a C extension to improve performance. But it doesn't actually have to be this way. CPython is built on top of C, and if we know how to really leverage that and write our Python code carefully, we can actually get pretty awesome performance. If you're writing code for customers, they might
have existing projects or ecosystems already built in Python, or they may just be purely Python-trained devs, and asking them to rewrite components in another language just to achieve performance might not even be feasible.

Also, just because performance might not be the primary focus of your project, that doesn't mean it needs to run slow. We really like it when all of our code runs fast, and if you do want to fall back to using a C extension, there can be some drawbacks to that, which I'll talk about in a little bit.

If you have a large codebase, you don't need to worry about doing performance analysis for all of it. It can be quite an expensive, time-consuming process, so you really want to be strategic and focus that time where it's going to matter: in the critical path or hot path through your code, in areas of really high reuse, or in areas that you're going to distribute to other teams to use. That way, if it's running as efficiently as possible, everybody benefits.

So let's jump straight in: how do we analyze our code for
performance? To start with, we need to know what it is we're going to measure. This could be throughput (how much data we can process in a certain amount of time) or latency (how quickly we respond to requests). Once we know what we're measuring, we need to know what success looks like. Is there a specific benchmark we need to meet? Maybe we simply want to check our code and minimize any bottlenecks, or maybe we have a certain window of time, two weeks or a sprint, and we want to make as much progress as we can in that time frame.

The example we're going to work through today is actually a real issue that a customer opened on our GitHub repo. Between version one and version two of a library that we shipped, the customer reported a 20x performance regression in model deserialization: in this case, that is the process of loading JSON data from a file on disk and then turning it into a Python object. So that is what we're trying to measure here; that is our performance metric: the speed of
deserialization from raw JSON data into a Python object, and our goal here is to reach parity with version one.

So this is the code. As you can see, we have our model, DocumentAnalysis. What this model represents is a deconstructed PDF document: it breaks down individual pages and what's on each page, paragraphs of text, images, tables, and things like that. This is the loading of the JSON data from disk, and then this is the one problematic line: we're just constructing an object by passing in that raw JSON data.

So how do we profile this? We've got a few tools that we can use, and the one we're going to focus on today is cProfile, which is the profiler that ships with CPython. If you have access to PyCharm, PyCharm has a really awesome suite of performance analysis tools; the default profiler with PyCharm is yappi, so you can also use yappi directly if you don't have PyCharm. SnakeViz and VizTracer are two visualization tools; SnakeViz is the one we're going to use today,
but I use both SnakeViz and VizTracer regularly; they operate on the outputs of cProfile. And Austin is another profiler, and this one has a really nice VS Code extension.

When you're doing profiling, it's really important that you try to run these tests in an isolated, well-defined, and reproducible environment. I do not recommend doing this on your local dev machine or laptop; you will get very inconsistent and inaccurate results.

So how do we run the profiler? Very simple command line here: we're just invoking the cProfile module, giving it an arbitrary output name (this can be anything), and then passing in the script we just looked at, perf_test.py. We can tell it's run because we've got one print line there. Then, to load that into SnakeViz, we're just passing in the output file that we defined, and we get this nice big table of numbers. So what are we looking at here? The first thing we can see is that we have function names and line numbers.
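As a rough sketch of that flow (the file and function names here are arbitrary placeholders, not the talk's actual script), the command-line invocation and an equivalent in-Python inspection with pstats look like this:

```python
# Shell usage (illustrative):
#   python -m cProfile -o perf_results.prof perf_test.py
#   snakeviz perf_results.prof

# The same stats file can also be produced and inspected from Python directly:
import cProfile
import pstats

def work():
    # stand-in workload in place of the real perf_test.py
    return sum(i * i for i in range(10_000))

profiler = cProfile.Profile()
result = profiler.runcall(work)              # profile a single call
profiler.dump_stats("perf_results.prof")     # file SnakeViz can load

stats = pstats.Stats("perf_results.prof")
stats.strip_dirs().sort_stats("cumulative").print_stats(10)
```

The `.prof` output is the same either way, so you can mix the command-line and programmatic styles freely.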
At the top we can see perf_test.py; that's the module we were running, so that looks good. Next we have this constructor, which is actually the constructor of that DocumentAnalysis model, so again that's what we would expect. Third thing down we have another constructor, and this is actually the base class of that model (the model inherits from this one), so that one's also expected. Then we have a whole bunch of utility functions, all called some variation on "deserialize", so that sounds reasonable. And then we have a couple of copy functions at the bottom here; we're going to get to that in a bit.

So the numbers: what numbers do we have? The first column is the number of calls, which is how many times each of these functions was called. The perf_test module was run once; that sounds about right. The model constructor was also run once; perfect. For the base model constructor we can see that it was originally called once, but the second number here indicates that it was called recursively 709 times. What that means is that even though we have one
class that we're creating, it is actually made up of 709 smaller classes that are all being instantiated as part of that deserialization process.

The next column is total time (tottime), and this is the length of time spent running a function, exclusive of any sub-functions that it calls into. These are the numbers we want to use if we're isolating just the lines of code in the function itself to figure out how long they take to run, and then we have that broken down on a per-call basis. The next column is the cumulative time (cumtime), which is the amount of time spent in that function inclusive of any sub-functions it calls into. We want to use these numbers when we're trying to get a sense of where time is being spent to achieve certain tasks, and again we have that broken down per call.

So how do I use this information? My process when I'm profiling code is to look for three red flags. The first one is probably very obvious: which areas are computationally expensive, that is, how much time things are taking to run.
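To make those columns concrete on a toy example (this is not the talk's model, just a deliberately recursive function so that ncalls shows its recursive "total/primitive" form):

```python
import cProfile
import pstats

def fib(n):
    # deliberately recursive, so the ncalls column shows recursion
    return n if n < 2 else fib(n - 1) + fib(n - 2)

profiler = cProfile.Profile()
profiler.runcall(fib, 15)

# tottime: time inside fib itself, excluding its sub-calls
# cumtime: time in fib including every recursive sub-call
stats = pstats.Stats(profiler)
stats.sort_stats("tottime").print_stats("fib")
```

For a recursive function like this, tottime and cumtime for the top-level call diverge sharply, which is exactly the distinction the two columns exist to capture.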
For this we're using our total time and cumulative time, and the per-call run times. The next thing I look for is areas that are excessively repetitive: functions that are being called more times than seems reasonable for what they're doing, and that's the ncalls column. And lastly, and this one can have some interesting results, time wasted unproductively. This is when I'm looking for function calls that don't look like they should even be there, or at the very least shouldn't be represented so high in the numbers.

Before we start iterating on this code, we're going to clean it up a little bit, because before, we were just running from the command line, and that was analyzing the entire script, when in reality we only want to analyze the performance of that one line, the constructor for the object. So we're going to import cProfile as a module, and we're going to run it in a context manager like this, so that we can output our file at the end. I'm also going to take that one line and
I'm going to run it in a loop, and there are two reasons for this. One is that it's going to average out any variation in the runtime a little bit; the first time it runs it's going to be a wee bit slower because it's doing some caching, and we want to average that out. The second reason is that we're dealing with very, very tiny numbers here; a lot of this is in the millisecond and microsecond range, so by running it more times we boost it into more human-readable seconds and get a better sense of what we're doing.

Okay, so we've rerun our profiler and the results are a little bit cleaner now. We no longer have the module at the top there; we've jumped straight into our constructor, and we can see, as expected, that it ran 50 times in our loop. Now, the cool thing about SnakeViz (and a lot of the profiling tools will give you a similar output) is that we get this kind of graphical breakdown of where time and resources are being spent as the code runs.
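Pulling those pieces together, the setup just described (cProfile as a context manager around a 50-iteration loop) might be sketched like this; the model class and file names are illustrative stand-ins, not the SDK's actual code:

```python
import cProfile
import json

class DocumentAnalysis:
    # hypothetical stand-in for the real model being deserialized
    def __init__(self, raw):
        self.pages = [dict(page) for page in raw.get("pages", [])]

raw = json.loads('{"pages": [{"number": 1}, {"number": 2}]}')

with cProfile.Profile() as profiler:
    for _ in range(50):                  # loop to average out warm-up and tiny timings
        DocumentAnalysis(raw)

profiler.dump_stats("perf_results.prof")  # load this file into SnakeViz
```

Only the one line under test sits inside the profiled region, so the stats no longer include script start-up or the JSON load from disk.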
At the top here we can see we spent roughly 36 seconds in our model constructor; that is the entire deserialization process, and all of that time is shared with the time spent in the base class constructor. From there the process breaks into multiple paths depending on the call stack, and immediately we have a bit of a red flag here: why is our process spending so much time copying data? This is making up a huge chunk of our runtime, so the first thing I'm going to do with these results is figure out what's going on there.

And this is actually the base class for our model, and we can see that every time it is constructed, we're doing a full copy of the JSON data that is being passed in. If you recall from the previous table, this function is being called 709 times on that data. In this kind of deserialization process it might be reasonable to copy the data once, because a dictionary is a mutable object, but definitely not 709 times. So as a first step, we're just going to clean that up and rerun our profiler.
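In simplified form (an illustrative stand-in, not the actual SDK base class), the pattern at fault and the fix look something like this:

```python
import copy

class ModelBefore:
    def __init__(self, data):
        # full deep copy on every construction; with nested models,
        # this repeats for every one of the 709 instantiations
        self.data = copy.deepcopy(data)

class ModelAfter:
    def __init__(self, data):
        # reference the mapping directly; if isolation from the caller's
        # mutable dict is needed, copy once at the top level instead
        self.data = data

payload = {"pages": [{"words": list(range(100))} for _ in range(50)]}
a = ModelBefore(payload)
b = ModelAfter(payload)
```

The trade-off is that the "after" version shares the caller's dictionary, so the single defensive copy, if wanted, belongs at the outermost constructor only.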
Immediately you can see we've dropped down to 5.92 seconds. So in one iteration on this profile data we found a massive bottleneck, and simply removing it has cleaned up our data a lot: we're spending a lot more time now actually doing the deserialization. But now we have this sort of big orange rectangle to the side here, and that looks a wee bit suspicious.

So we run our numbers again, this time sorted by total time, and we can see we've got this serialize function here, and it's one of the most expensive functions. That strikes me as a bit odd, because we're deserializing, so why have we got such an expensive function here that's doing serializing? Let's take a closer look at that.

This is the function in question, and it turns out it was actually being used as a fallback: any time we had data we didn't recognize, where we didn't know what it was or what to do with it, we called this function. As you can see, it's sort of a big if/else statement doing a whole lot of isinstance checks: is it a list, is it a dictionary, is
it a date time is it a 338 00:13:06,839 --> 00:13:11,800 Time Delta and if it's none of these 339 00:13:09,600 --> 00:13:13,959 things we get to the end of the function 340 00:13:11,800 --> 00:13:16,839 and do absolutely nothing um so we're 341 00:13:13,959 --> 00:13:19,000 returning the data completely unaltered 342 00:13:16,839 --> 00:13:21,519 uh I happen to know that the data set 343 00:13:19,000 --> 00:13:24,680 we're working on is almost entirely 344 00:13:21,519 --> 00:13:27,000 strings integers and floats and so this 345 00:13:24,680 --> 00:13:29,680 function was being called on all uh 346 00:13:27,000 --> 00:13:32,720 every uh data entry in that file 347 00:13:29,680 --> 00:13:33,959 and doing absolutely nothing with it so 348 00:13:32,720 --> 00:13:36,560 there are two things we could do with 349 00:13:33,959 --> 00:13:38,279 this we could either refactor it to xit 350 00:13:36,560 --> 00:13:40,360 earlier in the case of a simple data 351 00:13:38,279 --> 00:13:41,920 type uh or we could refactor this 352 00:13:40,360 --> 00:13:44,600 function out 353 00:13:41,920 --> 00:13:46,839 completely and Not only was serialize 354 00:13:44,600 --> 00:13:48,800 one of our most expensive functions uh 355 00:13:46,839 --> 00:13:51,320 but we can see here that serialized date 356 00:13:48,800 --> 00:13:53,120 time also featured as well as time Delta 357 00:13:51,320 --> 00:13:55,279 so that one function was accounting for 358 00:13:53,120 --> 00:13:58,040 three of our Most Wanted expensive 359 00:13:55,279 --> 00:14:01,079 functions so we got rid of 360 00:13:58,040 --> 00:14:02,959 that run our profiler and now we're down 361 00:14:01,079 --> 00:14:04,240 to 4.18 seconds so we're shaving off a 362 00:14:02,959 --> 00:14:06,680 little bit more 363 00:14:04,240 --> 00:14:08,560 time uh so now we're spending a lot more 364 00:14:06,680 --> 00:14:10,680 time actually doing the deserialization 365 00:14:08,560 --> 00:14:12,320 and that's the goal here is to maximize 366 
The goal is to maximize the amount of time we're actually doing the work we want to be doing. Once we've achieved that, we can start fine-tuning that workload as well. So: this deserialize-with-callable was one of our most expensive functions, and we can see that here when we sort by total time. Again it shows up as one of the most expensive functions, and it's run something like 161,000 times, so we should try to optimize it a wee bit.

This is that function, and it looks a little bit similar to the last one: we've got this if/elif structure going on, and again more of these isinstance checks. In Python we have a preference for the pattern of "ask forgiveness rather than permission" instead of "look before you leap", which is what we're seeing here. So we can refactor this method to be a wee bit more assertive about the data types we're working with, and we can also shuffle the order around a bit. I happen to know that this scenario here happens more frequently than that one does.
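That restructuring might look like this sketch (illustrative, not the library's exact code, and assuming the payload values are strings or lists of strings):

```python
# Before: look-before-you-leap, with the common case buried at the bottom
def deserialize_lbyl(value, converter):
    if isinstance(value, (list, tuple)):
        return [converter(v) for v in value]
    if isinstance(value, str):
        return converter(value)      # the most frequent case, checked last
    return value

# After: ask-forgiveness, with the most frequent case attempted first
def deserialize_eafp(value, converter):
    try:
        return converter(value)      # common scenario exits immediately
    except (TypeError, ValueError):
        return [converter(v) for v in value]
```

The EAFP version pays nothing on the hot path and only incurs exception-handling cost in the rarer scenarios, which is why reordering for the high-traffic case matters.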
So in that case we should move it further up, so that the function exits faster in the highest-traffic scenario.

This is our restructured function: we moved to a try/except style rather than an isinstance style, and we've reordered some of the operations so that the function is going to be a little more efficient in our most frequent scenario.

And we have rerun the numbers again: we are now down to 3.38 seconds, so we've made a wee bit more progress. This is a process we can keep iterating on; I would say my next target would probably be these light blue boxes here, and I'd like to figure out what's going on there. If we sort our table by number of calls, there's another bit of a red flag: we're running isinstance 736,000 times. That would probably be another thing I'd start looking into. Now, to be clear, we are talking about microseconds here; I could refactor away all of these isinstance calls and I'm saving a total of 0.16 seconds across 50 models. So the point
here is not necessarily to remove all of those calls and shave off a tiny amount of time. Rather, when I look at that, it indicates to me that maybe we're doing something else very inefficiently that requires us to check the type that many times, and perhaps a more substantial refactor could result in more substantial performance improvements.

So what are we doing here? Every time we see one of these expensive functions in our profile results, we want to optimize it. The first question is why the method is even there. Is it necessary? Could we simply refactor it out completely, like that serialize function? Can the function be replaced with a faster one; can we utilize something from the built-in library that's already optimized in C to make it run faster; or maybe can we refactor it to run fewer times? Next, if it's one of these big multi-path functions, like those big if statements, can we optimize it so that the majority case is the most efficient? And lastly, are we doubling up on any
logic? Are we doing a lot of type verification in multiple places on the same data that could be simplified?

Once we've started iterating on the code, we want to verify that we've actually made improvements, so we're going to have a look at the timeit module. Again, this ships with the standard library, and it's a really useful tool for doing A/B comparisons of code paths and a great way of validating whether you've met your success criteria. It can also be a really good tool to help you prevent performance regressions going forward, so that once you've optimized your code, it doesn't slip back into poor performance over time. Here, too, the results can be very inconsistent; I wouldn't recommend running this on your laptop. If you can use an isolated environment that can be easily reproduced, that will give you much more accurate and reusable results.

How do we use timeit? Very simple: we're importing it as a module, and then we're just passing in the
function we want to measure (in this case the construction of our object) and the number of times we want to run it. So we've got A/B here with our original model and our improved one, and we can see from these numbers that we've achieved about a 4x performance gain. We've still got quite a long way to go before we hit that 20x target we want to meet to resolve this customer issue; we haven't closed it yet, this is an ongoing investigation.

So, step five: as you can see, we did about four iterations and managed to make some massive improvements. Often those first few iterations are going to yield some low-hanging fruit that can give you really big gains with minimal code changes. Each subsequent iteration is likely to give you smaller returns on investment, or might require bigger structural changes in order to see those perf gains. If you do find yourself needing to do massive refactoring, make sure you have a very thorough test suite in place first.
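That kind of A/B comparison looks roughly like this (the model classes here are stand-ins, not the actual SDK code):

```python
import timeit

class OriginalModel:
    def __init__(self, **kwargs):
        # Simulates a slower constructor that normalizes every field.
        for key, value in kwargs.items():
            setattr(self, key.lower(), str(value))

class ImprovedModel:
    __slots__ = ("name", "size")  # fixed attributes, no per-instance dict

    def __init__(self, name, size):
        self.name = name
        self.size = size

N = 100_000
a = timeit.timeit(lambda: OriginalModel(name="x", size=1), number=N)
b = timeit.timeit(lambda: ImprovedModel("x", 1), number=N)
print(f"original: {a:.3f}s  improved: {b:.3f}s  ratio: {a / b:.1f}x")
```

Run both sides with the same `number` on the same machine, and the ratio is your A/B result.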
There is actually no end point here: there is no maximum performance. It can always be faster; you will always find something else you could tweak to shave off an extra few microseconds, and in some cases that level of fine-tuning will be important.

So here are a couple of rules of thumb: things to keep in mind when you're writing your code, so that you don't have to do performance analysis later, because your code will already be running at its most optimal. To start with, you want to make use of as many of Python's built-in functions as possible. As we mentioned, CPython is built on C, and a lot of those built-in functions are actually highly fine-tuned C code under the hood, so the more we leverage them, the more we're making use of Python's raw performance. In terms of the code that you write, a couple of tricks: a local variable lookup is going to be faster than a global variable lookup; however, a hardcoded value is going to be even faster.
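You can see those name-lookup differences with timeit itself. A rough sketch (absolute numbers will vary by machine, but the ordering is the point):

```python
import timeit

LIMIT = 1000  # module-level (global) name

def use_global():
    total = 0
    for _ in range(10_000):
        total += LIMIT   # global lookup on every iteration
    return total

def use_local(limit=LIMIT):
    total = 0
    for _ in range(10_000):
        total += limit   # local lookup: faster
    return total

def use_constant():
    total = 0
    for _ in range(10_000):
        total += 1000    # literal baked into the bytecode: fastest
    return total

for fn in (use_global, use_local, use_constant):
    print(fn.__name__, timeit.timeit(fn, number=200))
```

Binding a global to a default argument, as `use_local` does, is a classic (if slightly ugly) trick for hot loops.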
Index access and iteration are faster on a list, but lookup is faster in a dictionary. Dictionary lookup is faster than accessing an object attribute, and instantiating a dictionary or a tuple is going to be faster than an object, so if you're creating a lot of objects in your code and using them as a means to pass around data, you might find you get a really nice performance improvement simply by replacing them with dictionaries. If you're concatenating data, bytearray is going to be faster than bytes; however, if you're slicing data, memoryview is the way to go. List and dictionary comprehensions are highly optimized in Python, so if you're doing a lot of looping or building dictionaries or things like that, try to squeeze that code into a comprehension, because that way you're going to leverage a lot of Python's native performance. And if you have a lot of branching code paths, like those if/elif statements we saw before, try to optimize them so that your highest-traffic scenario is the first to exit.
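A couple of those rules, measured the same way with timeit (again a sketch; the classes and sizes are arbitrary):

```python
import timeit

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

point_obj = Point(1, 2)
point_dict = {"x": 1, "y": 2}

# Dictionary lookup vs. attribute access on an equivalent object.
t_attr = timeit.timeit(lambda: point_obj.x, number=500_000)
t_dict = timeit.timeit(lambda: point_dict["x"], number=500_000)

# Building a dict in a plain loop vs. a dict comprehension.
def build_loop():
    out = {}
    for i in range(1000):
        out[i] = i * i
    return out

def build_comprehension():
    return {i: i * i for i in range(1000)}

t_loop = timeit.timeit(build_loop, number=1000)
t_comp = timeit.timeit(build_comprehension, number=1000)
print(t_attr, t_dict, t_loop, t_comp)
```

Both versions produce identical results; only the cost of getting there differs.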
This might sacrifice some performance in your low-traffic scenarios, so whether you want to make that compromise depends on your needs. And lastly, do try to use the most recent version of Python that you can. If we're writing really good, Pythonic code that leverages the performance of the C-implemented standard library, then the more we can upgrade, the more we make use of whatever performance improvements are built into Python itself, especially if you can be on 3.11 or higher, because that release included some really great perf work.

So there is a downside: the more you iterate on your code to improve performance, the more you might find it goes against your instincts as a programmer. We mentioned before how a global variable is a lot less performant than, say, a hardcoded value, and object attribute lookup is a lot more expensive than just using a dictionary, and this goes on. The most efficient code you will find is one giant, long, flat function.
Whereas it's much easier to read, more modular, and more testable to break that out into lots of utility functions. So there are compromises here, depending on where you fall between code readability and maintainability versus raw performance, and where that line sits for your project is going to depend on your needs. You can compensate for this a little bit with lots of inline documentation: if you find yourself making a non-developer-friendly design decision in order to achieve optimal performance, make sure that is documented in the code. That way, if someone else comes along to maintain it and decides to clean up after you, they're not going to revert one of your performance enhancements. So, yes: your project's needs will determine where you fall between maintainability and performance.

That brings us to whether you should just use an extension. There are a couple of downsides to using a compiled extension. One is that it's difficult to build from source
code unless you have all of the build tools installed on your machine. This may or may not be a problem depending on who you plan to distribute the code to. If you want it to be widely distributed, for example to publish it as a package on PyPI, you will need to build, test, and maintain many wheels across many platforms, and that requires some extensive CI setup and infrastructure to keep everything properly maintained and tested. When errors do occur, it can be more difficult for customers to diagnose them themselves if they are pure Python developers. And we did find, in the projects where we have used C extensions, that it can be harder for people to engage in the open-source process when we have components in a language they're not fluent in, so it can be harder for them to file issue reports and open community contribution PRs to help resolve them.

Having said all that, if those caveats do not impact your project, you can achieve
orders of magnitude performance gains using an extension. In the projects where we have used them, my recommendation is to keep it very targeted: try to find very specific functions that you want to speed up, with well-defined inputs and outputs, and implement just those in your C extension. Don't try to wrap very large portions of code with complex logic, because it can make the maintenance burden more extensive going forward, especially if the maintainers of the project are likely to change over time and it might be inherited by a new set of developers who need to understand the goals of that code. Lastly, always provide a pure-Python fallback. In the case of a build failure, installation failure, incompatibility, or anything like that, even if it is much slower, giving your customers an option to fall back to a pure-Python implementation that they know will always work, and will always understand, is a really customer-friendly thing to do.
So, in conclusion, this is the process that I use when I'm doing performance analysis of my code. To start with, I figure out what's important performance-wise for that project: what it is we want to measure, what success looks like, and what benchmark we're trying to reach. Then we profile it: we find out what the expensive functions are and how many times they're being called, and we optimize them as much as possible. Then we verify, to make sure we have actually made improvements. And we keep going, and we keep going, and we keep improving, and hopefully we hit that target number, or maybe it's time-boxed and your manager says "enough".

So that covers pretty much everything I wanted to go over today. These slides will be in my GitHub repo if you want to click through to any of the links to the tools and things like that; my GitHub alias is just my name. A big thank you to Swa and Adam, who are current and former colleagues who helped
contribute to this content, and a big thank you to my team, who allow me to keep writing Python.

[Applause]