[Music]

Hello everybody, welcome to the last talk in this session just before lunch. We've got a great talk coming up: this is Anna Tisch, who's here to talk to us about performant Python. Please welcome her to the stage.

Thank you, thank you. Good afternoon. As you heard, my name is Anna, and today we're going to do a brief introduction to performance analysis of Python code.

I work for Microsoft on the Azure SDK for Python, so throughout this talk you'll probably hear me refer to customers or the customer experience. That's because my background is in building Python libraries that we ship on PyPI for other developers to use, so the word "customer" may mean different things to you depending on who it is that you write Python code for. We're going to talk about what performance means and why it matters, and then we're going to look at how to profile your code, how to process and interpret those results, and how to actually turn that into improved performance. And we are going to focus on CPython today.
So let's jump right in: what do I mean when I say performance? Performance can mean different things to different projects. It can refer to resource use, like CPU utilization, memory usage, or network bandwidth, or it could mean something like throughput or latency. For this talk we're going to focus on the efficient execution of CPU-bound Python code, so we're not going to cover I/O or concurrent programming, like asynchronous or multi-threaded code, but the tools we look at can support those as well.

So why do we care about this? Python doesn't have a great reputation for performance, and it's usually not the language of first choice for people looking to build really performant systems; and even if we are using Python, we can just use a C extension to improve performance. But it doesn't actually have to be this way. CPython is built on top of C, and if we know how to really leverage that and write our Python code carefully, we can actually get pretty awesome performance. If you're writing code for customers, they might
have existing projects or ecosystems already built in Python, or they may just be purely Python-trained devs, and asking them to rewrite components in another language just to achieve performance might not even be feasible.

Also, just because performance might not be the primary focus of your project, that doesn't mean it needs to run slow. We really like it when all of our code runs fast, and if you do want to fall back to using a C extension, there can be some drawbacks to that, which I'll talk about in a little bit.

If you have a large codebase, you don't need to worry about doing performance analysis for all of it. It can be quite an expensive, time-consuming process, so you really want to be strategic and focus that time where it's going to matter: in the critical path or hot path through your code, in areas of really high reuse, or in areas that you're going to distribute to other teams to use. That way, if it's running as efficiently as possible, everybody benefits.

So let's jump straight in: how do we analyze our code for
performance? To start with, we need to know what it is we're going to measure. This could be throughput (how much data we can process in a certain amount of time) or latency (how quickly we respond to requests). Once we know what we're measuring, we need to know what success looks like. Is there a specific benchmark we need to meet? Maybe we simply want to check our code and minimize any bottlenecks, or maybe we have a certain window of time, two weeks or a sprint, and we want to make as much progress as we can in that time frame.

The example we're going to work through today is actually a real issue that a customer opened on our GitHub repo. Between version one and version two of a library that we shipped, the customer reported a 20x performance regression in model deserialization: in this case, that is the process of loading JSON data from a file on disk and then turning it into a Python object. So that is what we're trying to measure here; that is our performance metric: the speed of
deserialization from raw JSON data into a Python object, and our goal here is to reach parity with version one.

So this is the code. As you can see, we have our model, DocumentAnalysis. What this model represents is a deconstructed PDF document: it breaks down individual pages and what's on each page, paragraphs of text, images, tables, and things like that. This is the loading of the JSON data from disk, and then this is the one problematic line: we're just constructing an object by passing in that raw JSON data.

So how do we profile this? We've got a few tools that we can use, and the one we're going to focus on today is cProfile, which is the profiler that ships with CPython. If you have access to PyCharm, PyCharm has a really awesome suite of performance analysis tools; the default profiler with PyCharm is yappi, so you can also use yappi directly if you don't have PyCharm. SnakeViz and VizTracer are two visualization tools; SnakeViz is the one we're going to use today,
but I use both SnakeViz and VizTracer regularly; they operate on the outputs of cProfile. And Austin is another profiler, and this one has a really nice VS Code extension.

When you're doing profiling, it's really important that you try to run these tests in an isolated, well-defined, and reproducible environment. I do not recommend doing this on your local dev machine or laptop; you will get very inconsistent and inaccurate results.

So how do we run the profiler? Very simple command line here: we're just invoking the cProfile module, giving it an arbitrary output name (this can be anything), and then passing in the script we just looked at, perf_test.py. We can tell it's run because we've got one print line there. Then, to load that into SnakeViz, we're just passing in the output file that we defined, and we get this nice big table of numbers. So what are we looking at here? The first thing we can see is that we have function names and line numbers.
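As a rough sketch of that flow (the file and function names here are arbitrary placeholders, not the talk's actual script), the command-line invocation and an equivalent in-Python inspection with pstats look like this:

```python
# Shell usage (illustrative):
#   python -m cProfile -o perf_results.prof perf_test.py
#   snakeviz perf_results.prof

# The same stats file can also be produced and inspected from Python directly:
import cProfile
import pstats

def work():
    # stand-in workload in place of the real perf_test.py
    return sum(i * i for i in range(10_000))

profiler = cProfile.Profile()
result = profiler.runcall(work)              # profile a single call
profiler.dump_stats("perf_results.prof")     # file SnakeViz can load

stats = pstats.Stats("perf_results.prof")
stats.strip_dirs().sort_stats("cumulative").print_stats(10)
```

The `.prof` output is the same either way, so you can mix the command-line and programmatic styles freely.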
At the top we can see perf_test.py; that's the module we were running, so that looks good. Next we have this constructor, which is actually the constructor of that DocumentAnalysis model, so again that's what we would expect. Third thing down we have another constructor, and this is actually the base class of that model (the model inherits from this one), so that one's also expected. Then we have a whole bunch of utility functions, all called some variation on "deserialize", so that sounds reasonable. And then we have a couple of copy functions at the bottom here; we're going to get to that in a bit.

So the numbers: what numbers do we have? The first column is the number of calls, which is how many times each of these functions was called. The perf_test module was run once; that sounds about right. The model constructor was also run once; perfect. For the base model constructor we can see that it was originally called once, but the second number here indicates that it was called recursively 709 times. What that means is that even though we have one
class that we're creating, it is actually made up of 709 smaller classes that are all being instantiated as part of that deserialization process.

The next column is total time (tottime), and this is the length of time spent running a function, exclusive of any sub-functions that it calls into. These are the numbers we want to use if we're isolating just the lines of code in the function itself to figure out how long they take to run, and then we have that broken down on a per-call basis. The next column is the cumulative time (cumtime), which is the amount of time spent in that function inclusive of any sub-functions it calls into. We want to use these numbers when we're trying to get a sense of where time is being spent to achieve certain tasks, and again we have that broken down per call.

So how do I use this information? My process when I'm profiling code is to look for three red flags. The first one is probably very obvious: which areas are computationally expensive, that is, how much time things are taking to run.
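To make those columns concrete on a toy example (this is not the talk's model, just a deliberately recursive function so that ncalls shows its recursive "total/primitive" form):

```python
import cProfile
import pstats

def fib(n):
    # deliberately recursive, so the ncalls column shows recursion
    return n if n < 2 else fib(n - 1) + fib(n - 2)

profiler = cProfile.Profile()
profiler.runcall(fib, 15)

# tottime: time inside fib itself, excluding its sub-calls
# cumtime: time in fib including every recursive sub-call
stats = pstats.Stats(profiler)
stats.sort_stats("tottime").print_stats("fib")
```

For a recursive function like this, tottime and cumtime for the top-level call diverge sharply, which is exactly the distinction the two columns exist to capture.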
For this we're using our total time and cumulative time, and the per-call run times. The next thing I look for is areas that are excessively repetitive: functions that are being called more times than seems reasonable for what they're doing, and that's the ncalls column. And lastly, and this one can have some interesting results, time wasted unproductively. This is when I'm looking for function calls that don't look like they should even be there, or at the very least shouldn't be represented so high in the numbers.

Before we start iterating on this code, we're going to clean it up a little bit, because before, we were just running from the command line, and that was analyzing the entire script, when in reality we only want to analyze the performance of that one line, the constructor for the object. So we're going to import cProfile as a module, and we're going to run it in a context manager like this, so that we can output our file at the end. I'm also going to take that one line and
I'm going to run it in a loop, and there are two reasons for this. One is that it's going to average out any variation in the runtime a little bit; the first time it runs it's going to be a wee bit slower because it's doing some caching, and we want to average that out. The second reason is that we're dealing with very, very tiny numbers here; a lot of this is in the millisecond and microsecond range, so by running it more times we boost it into more human-readable seconds and get a better sense of what we're doing.

Okay, so we've rerun our profiler and the results are a little bit cleaner now. We no longer have the module at the top there; we've jumped straight into our constructor, and we can see, as expected, that it ran 50 times in our loop. Now, the cool thing about SnakeViz (and a lot of the profiling tools will give you a similar output) is that we get this kind of graphical breakdown of where time and resources are being spent as the code runs.
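Pulling those pieces together, the setup just described (cProfile as a context manager around a 50-iteration loop) might be sketched like this; the model class and file names are illustrative stand-ins, not the SDK's actual code:

```python
import cProfile
import json

class DocumentAnalysis:
    # hypothetical stand-in for the real model being deserialized
    def __init__(self, raw):
        self.pages = [dict(page) for page in raw.get("pages", [])]

raw = json.loads('{"pages": [{"number": 1}, {"number": 2}]}')

with cProfile.Profile() as profiler:
    for _ in range(50):                  # loop to average out warm-up and tiny timings
        DocumentAnalysis(raw)

profiler.dump_stats("perf_results.prof")  # load this file into SnakeViz
```

Only the one line under test sits inside the profiled region, so the stats no longer include script start-up or the JSON load from disk.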
At the top here we can see we spent roughly 36 seconds in our model constructor; that is the entire deserialization process, and all of that time is shared with the time spent in the base class constructor. From there the process breaks into multiple paths depending on the call stack, and immediately we have a bit of a red flag here: why is our process spending so much time copying data? This is making up a huge chunk of our runtime, so the first thing I'm going to do with these results is figure out what's going on there.

And this is actually the base class for our model, and we can see that every time it is constructed, we're doing a full copy of the JSON data that is being passed in. If you recall from the previous table, this function is being called 709 times on that data. In this kind of deserialization process it might be reasonable to copy the data once, because a dictionary is a mutable object, but definitely not 709 times. So as a first step, we're just going to clean that up and rerun our profiler.
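In simplified form (an illustrative stand-in, not the actual SDK base class), the pattern at fault and the fix look something like this:

```python
import copy

class ModelBefore:
    def __init__(self, data):
        # full deep copy on every construction; with nested models,
        # this repeats for every one of the 709 instantiations
        self.data = copy.deepcopy(data)

class ModelAfter:
    def __init__(self, data):
        # reference the mapping directly; if isolation from the caller's
        # mutable dict is needed, copy once at the top level instead
        self.data = data

payload = {"pages": [{"words": list(range(100))} for _ in range(50)]}
a = ModelBefore(payload)
b = ModelAfter(payload)
```

The trade-off is that the "after" version shares the caller's dictionary, so the single defensive copy, if wanted, belongs at the outermost constructor only.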
Immediately you can see we've dropped down to 5.92 seconds. So in one iteration on this profile data we found a massive bottleneck, and simply removing it has cleaned up our data a lot: we're spending a lot more time now actually doing the deserialization. But now we have this sort of big orange rectangle to the side here, and that looks a wee bit suspicious.

So we run our numbers again, this time sorted by total time, and we can see we've got this serialize function here, and it's one of the most expensive functions. That strikes me as a bit odd, because we're deserializing, so why have we got such an expensive function here that's doing serializing? Let's take a closer look at that.

This is the function in question, and it turns out it was actually being used as a fallback: any time we had data we didn't recognize, where we didn't know what it was or what to do with it, we called this function. As you can see, it's sort of a big if/else statement doing a whole lot of isinstance checks: is it a list, is it a dictionary, is
it a date time is it a 338 00:13:06,839 --> 00:13:11,800 Time Delta and if it's none of these 339 00:13:09,600 --> 00:13:13,959 things we get to the end of the function 340 00:13:11,800 --> 00:13:16,839 and do absolutely nothing um so we're 341 00:13:13,959 --> 00:13:19,000 returning the data completely unaltered 342 00:13:16,839 --> 00:13:21,519 uh I happen to know that the data set 343 00:13:19,000 --> 00:13:24,680 we're working on is almost entirely 344 00:13:21,519 --> 00:13:27,000 strings integers and floats and so this 345 00:13:24,680 --> 00:13:29,680 function was being called on all uh 346 00:13:27,000 --> 00:13:32,720 every uh data entry in that file 347 00:13:29,680 --> 00:13:33,959 and doing absolutely nothing with it so 348 00:13:32,720 --> 00:13:36,560 there are two things we could do with 349 00:13:33,959 --> 00:13:38,279 this we could either refactor it to xit 350 00:13:36,560 --> 00:13:40,360 earlier in the case of a simple data 351 00:13:38,279 --> 00:13:41,920 type uh or we could refactor this 352 00:13:40,360 --> 00:13:44,600 function out 353 00:13:41,920 --> 00:13:46,839 completely and Not only was serialize 354 00:13:44,600 --> 00:13:48,800 one of our most expensive functions uh 355 00:13:46,839 --> 00:13:51,320 but we can see here that serialized date 356 00:13:48,800 --> 00:13:53,120 time also featured as well as time Delta 357 00:13:51,320 --> 00:13:55,279 so that one function was accounting for 358 00:13:53,120 --> 00:13:58,040 three of our Most Wanted expensive 359 00:13:55,279 --> 00:14:01,079 functions so we got rid of 360 00:13:58,040 --> 00:14:02,959 that run our profiler and now we're down 361 00:14:01,079 --> 00:14:04,240 to 4.18 seconds so we're shaving off a 362 00:14:02,959 --> 00:14:06,680 little bit more 363 00:14:04,240 --> 00:14:08,560 time uh so now we're spending a lot more 364 00:14:06,680 --> 00:14:10,680 time actually doing the deserialization 365 00:14:08,560 --> 00:14:12,320 and that's the goal here is to maximize 366 
The goal is to maximize the amount of time we're actually doing the work we want to be doing. Once we've achieved that, we can start fine-tuning that workload as well. So: this deserialize-with-callable was one of our most expensive functions, and we can see that here when we sort by total time. Again it shows up as one of the most expensive functions, and it's run something like 161,000 times, so we should try to optimize it a wee bit.

This is that function, and it looks a little bit similar to the last one: we've got this if/elif structure going on, and again more of these isinstance checks. In Python we have a preference for the pattern of "ask forgiveness rather than permission" instead of "look before you leap", which is what we're seeing here. So we can refactor this method to be a wee bit more assertive about the data types we're working with, and we can also shuffle the order around a bit. I happen to know that this scenario here happens more frequently than that one does.
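That restructuring might look like this sketch (illustrative, not the library's exact code, and assuming the payload values are strings or lists of strings):

```python
# Before: look-before-you-leap, with the common case buried at the bottom
def deserialize_lbyl(value, converter):
    if isinstance(value, (list, tuple)):
        return [converter(v) for v in value]
    if isinstance(value, str):
        return converter(value)      # the most frequent case, checked last
    return value

# After: ask-forgiveness, with the most frequent case attempted first
def deserialize_eafp(value, converter):
    try:
        return converter(value)      # common scenario exits immediately
    except (TypeError, ValueError):
        return [converter(v) for v in value]
```

The EAFP version pays nothing on the hot path and only incurs exception-handling cost in the rarer scenarios, which is why reordering for the high-traffic case matters.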
So in that case we should move it further up, so that the function exits faster in the highest-traffic scenario.

This is our restructured function: we moved to a try/except style rather than an isinstance style, and we've reordered some of the operations so that the function is going to be a little more efficient in our most frequent scenario.

And we have rerun the numbers again: we are now down to 3.38 seconds, so we've made a wee bit more progress. This is a process we can keep iterating on; I would say my next target would probably be these light blue boxes here, and I'd like to figure out what's going on there. If we sort our table by number of calls, there's another bit of a red flag: we're running isinstance 736,000 times. That would probably be another thing I'd start looking into. Now, to be clear, we are talking about microseconds here; I could refactor away all of these isinstance calls and I'm saving a total of 0.16 seconds across 50 models. So the point
here is not necessarily to remove all of those calls and shave off a tiny amount of time. Rather, when I look at that, it indicates to me that maybe we're doing something else very inefficiently that requires us to check the type that many times, and perhaps a more substantial refactor could result in more substantial performance improvements.

So what are we doing here? Every time we see one of these expensive functions in our profile results, we want to optimize it. The first question is why the method is even there. Is it necessary? Could we simply refactor it out completely, like that serialize function? Can the function be replaced with a faster one; can we utilize something from the built-in library that's already optimized in C to make it run faster; or maybe can we refactor it to run fewer times? Next, if it's one of these big multi-path functions, like those big if statements, can we optimize it so that the majority case is the most efficient? And lastly, are we doubling up on any
logic? Are we doing a lot of type verification in multiple places on the same data that could be simplified?

Once we've started iterating on the code, we want to verify that we've actually made improvements, so we're going to have a look at the timeit module. Again, this ships with the standard library, and it's a really useful tool for doing A/B comparisons of code paths and a great way of validating whether you've met your success criteria. It can also be a really good tool to help you prevent performance regressions going forward, so that once you've optimized your code, it doesn't slip back into poor performance over time. Here, too, the results can be very inconsistent; I wouldn't recommend running this on your laptop. If you can use an isolated environment that can be easily reproduced, that will give you much more accurate and reusable results.

How do we use timeit? Very simple: we're importing it as a module, and then we're just passing in the
function we want to measure (in this case the construction of our object) and the number of times we want to run it. So we've got A/B here with our original model and our improved one, and we can see from these numbers that we've achieved about a 4x performance gain. We've still got quite a long way to go before we hit that 20x target we want to meet to resolve this customer issue; we haven't closed it yet, this is an ongoing investigation.

So, step five: as you can see, we did about four iterations and managed to make some massive improvements. Often those first few iterations are going to yield some low-hanging fruit that can give you really big gains with minimal code changes. Each subsequent iteration is likely to give you smaller returns on investment, or might require bigger structural changes in order to see those perf gains. If you do find yourself needing to do massive refactoring, make sure you have a very thorough test suite in place first.
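That kind of A/B comparison looks roughly like this (the model classes here are stand-ins, not the actual SDK code):

```python
import timeit

class OriginalModel:
    def __init__(self, **kwargs):
        # Simulates a slower constructor that normalizes every field.
        for key, value in kwargs.items():
            setattr(self, key.lower(), str(value))

class ImprovedModel:
    __slots__ = ("name", "size")  # fixed attributes, no per-instance dict

    def __init__(self, name, size):
        self.name = name
        self.size = size

N = 100_000
a = timeit.timeit(lambda: OriginalModel(name="x", size=1), number=N)
b = timeit.timeit(lambda: ImprovedModel("x", 1), number=N)
print(f"original: {a:.3f}s  improved: {b:.3f}s  ratio: {a / b:.1f}x")
```

Run both sides with the same `number` on the same machine, and the ratio is your A/B result.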
There is actually no end point here: there is no maximum performance. It can always be faster; you will always find something else you could tweak to shave off an extra few microseconds, and in some cases that level of fine-tuning will be important.

So here are a couple of rules of thumb: things to keep in mind when you're writing your code, so that you don't have to do performance analysis later, because your code will already be running at its most optimal. To start with, you want to make use of as many of Python's built-in functions as possible. As we mentioned, CPython is built on C, and a lot of those built-in functions are actually highly fine-tuned C code under the hood, so the more we leverage them, the more we're making use of Python's raw performance. In terms of the code that you write, a couple of tricks: a local variable lookup is going to be faster than a global variable lookup; however, a hardcoded value is going to be even faster.
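You can see those name-lookup differences with timeit itself. A rough sketch (absolute numbers will vary by machine, but the ordering is the point):

```python
import timeit

LIMIT = 1000  # module-level (global) name

def use_global():
    total = 0
    for _ in range(10_000):
        total += LIMIT   # global lookup on every iteration
    return total

def use_local(limit=LIMIT):
    total = 0
    for _ in range(10_000):
        total += limit   # local lookup: faster
    return total

def use_constant():
    total = 0
    for _ in range(10_000):
        total += 1000    # literal baked into the bytecode: fastest
    return total

for fn in (use_global, use_local, use_constant):
    print(fn.__name__, timeit.timeit(fn, number=200))
```

Binding a global to a default argument, as `use_local` does, is a classic (if slightly ugly) trick for hot loops.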
Index access and iteration are faster on a list, but lookup is faster in a dictionary. Dictionary lookup is faster than accessing an object attribute, and instantiating a dictionary or a tuple is going to be faster than an object, so if you're creating a lot of objects in your code and using them as a means to pass around data, you might find you get a really nice performance improvement simply by replacing them with dictionaries. If you're concatenating data, bytearray is going to be faster than bytes; however, if you're slicing data, memoryview is the way to go. List and dictionary comprehensions are highly optimized in Python, so if you're doing a lot of looping or building dictionaries or things like that, try to squeeze that code into a comprehension, because that way you're going to leverage a lot of Python's native performance. And if you have a lot of branching code paths, like those if/elif statements we saw before, try to optimize them so that your highest-traffic scenario is the first to exit.
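A couple of those rules, measured the same way with timeit (again a sketch; the classes and sizes are arbitrary):

```python
import timeit

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

point_obj = Point(1, 2)
point_dict = {"x": 1, "y": 2}

# Dictionary lookup vs. attribute access on an equivalent object.
t_attr = timeit.timeit(lambda: point_obj.x, number=500_000)
t_dict = timeit.timeit(lambda: point_dict["x"], number=500_000)

# Building a dict in a plain loop vs. a dict comprehension.
def build_loop():
    out = {}
    for i in range(1000):
        out[i] = i * i
    return out

def build_comprehension():
    return {i: i * i for i in range(1000)}

t_loop = timeit.timeit(build_loop, number=1000)
t_comp = timeit.timeit(build_comprehension, number=1000)
print(t_attr, t_dict, t_loop, t_comp)
```

Both versions produce identical results; only the cost of getting there differs.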
This might sacrifice some performance in your low-traffic scenarios, so whether you want to make that compromise depends on your needs. And lastly, do try to use the most recent version of Python that you can. If we're writing really good, Pythonic code that leverages the performance of the C-implemented standard library, then the more we can upgrade, the more we make use of whatever performance improvements are built into Python itself, especially if you can be on 3.11 or higher, because that release included some really great perf work.

So there is a downside: the more you iterate on your code to improve performance, the more you might find it goes against your instincts as a programmer. We mentioned before how a global variable is a lot less performant than, say, a hardcoded value, and object attribute lookup is a lot more expensive than just using a dictionary, and this goes on. The most efficient code you will find is one giant, long, flat function.
Whereas it's much easier to read, more modular, and more testable to break that out into lots of utility functions. So there are compromises here, depending on where you fall between code readability and maintainability versus raw performance, and where that line sits for your project is going to depend on your needs. You can compensate for this a little bit with lots of inline documentation: if you find yourself making a non-developer-friendly design decision in order to achieve optimal performance, make sure that is documented in the code. That way, if someone else comes along to maintain it and decides to clean up after you, they're not going to revert one of your performance enhancements. So, yes: your project's needs will determine where you fall between maintainability and performance.

That brings us to whether you should just use an extension. There are a couple of downsides to using a compiled extension. One is that it's difficult to build from source
code unless you have all of the build tools installed on your machine. This may or may not be a problem depending on who you plan to distribute the code to. If you want it to be widely distributed, for example to publish it as a package on PyPI, you will need to build, test, and maintain many wheels across many platforms, and that requires some extensive CI setup and infrastructure to keep everything properly maintained and tested. When errors do occur, it can be more difficult for customers to diagnose them themselves if they are pure Python developers. And we did find, in the projects where we have used C extensions, that it can be harder for people to engage in the open-source process when we have components in a language they're not fluent in, so it can be harder for them to file issue reports and open community contribution PRs to help resolve them.

Having said all that, if those caveats do not impact your project, you can achieve
orders of magnitude performance gains using an extension. In the projects where we have used them, my recommendation is to keep it very targeted: try to find very specific functions that you want to speed up, with well-defined inputs and outputs, and implement just those in your C extension. Don't try to wrap very large portions of code with complex logic, because it can make the maintenance burden more extensive going forward, especially if the maintainers of the project are likely to change over time and it might be inherited by a new set of developers who need to understand the goals of that code. Lastly, always provide a pure-Python fallback. In the case of a build failure, installation failure, incompatibility, or anything like that, even if it is much slower, giving your customers an option to fall back to a pure-Python implementation that they know will always work, and will always understand, is a really customer-friendly thing to do.
So, in conclusion, this is the process that I use when I'm doing performance analysis of my code. To start with, I figure out what's important performance-wise for that project: what it is we want to measure, what success looks like, and what benchmark we're trying to reach. Then we profile it: we find out what the expensive functions are and how many times they're being called, and we optimize them as much as possible. Then we verify, to make sure we have actually made improvements. And we keep going, and we keep going, and we keep improving, and hopefully we hit that target number, or maybe it's time-boxed and your manager says "enough".

So that covers pretty much everything I wanted to go over today. These slides will be in my GitHub repo if you want to click through to any of the links to the tools and things like that; my GitHub alias is just my name. A big thank you to Swa and Adam, who are current and former colleagues who helped
contribute to this content, and a big thank you to my team, who allow me to keep writing Python.

[Applause]