1 00:00:06,320 --> 00:00:11,499 [Music] 2 00:00:15,599 --> 00:00:20,080 hello everyone and welcome back to the 3 00:00:17,520 --> 00:00:24,160 gear order theater at linux confer you 4 00:00:20,080 --> 00:00:25,680 2022. uh our next speaker is uh rude who 5 00:00:24,160 --> 00:00:27,119 is a distinguished 6 00:00:25,680 --> 00:00:29,279 engineer in the oracle linux and 7 00:00:27,119 --> 00:00:31,199 virtualization organization at oracle 8 00:00:29,279 --> 00:00:33,120 corporation where he works in the tools 9 00:00:31,199 --> 00:00:35,040 team in this talk he's presenting an 10 00:00:33,120 --> 00:00:37,440 overview of g for fng which is a next 11 00:00:35,040 --> 00:00:39,520 generation profiling tool for linux uh 12 00:00:37,440 --> 00:00:41,360 he is awake and here as you can see and 13 00:00:39,520 --> 00:00:43,280 available to answer questions in chat 14 00:00:41,360 --> 00:00:45,039 throughout the talk and at the end if 15 00:00:43,280 --> 00:00:47,440 there is more to talk about but here is 16 00:00:45,039 --> 00:00:47,440 the video 17 00:00:51,760 --> 00:00:55,360 welcome to this presentation about 18 00:00:53,280 --> 00:00:57,600 g-profangie the next generation gnu 19 00:00:55,360 --> 00:00:59,120 profiling tool my name is root from the 20 00:00:57,600 --> 00:01:01,600 pass and this is joint work with 21 00:00:59,120 --> 00:01:02,800 vladimir zensef i will be presenting 22 00:01:01,600 --> 00:01:04,640 today 23 00:01:02,800 --> 00:01:06,080 we both work in the oracle linux 24 00:01:04,640 --> 00:01:07,920 engineering and virtualization 25 00:01:06,080 --> 00:01:10,080 organization and within that we're part 26 00:01:07,920 --> 00:01:11,920 of the compilers and tool chain team and 27 00:01:10,080 --> 00:01:13,760 both of us are deeply involved with g 28 00:01:11,920 --> 00:01:15,520 profinji 29 00:01:13,760 --> 00:01:17,600 and in this presentation would like to 30 00:01:15,520 --> 00:01:19,920 show you what this tool is about so 31 00:01:17,600 --> 00:01:22,400 after a brief introduction into into the 32 00:01:19,920 --> 00:01:24,159 tool i want to show as many examples as 33 00:01:22,400 --> 00:01:26,240 possible i'd like to show you how you 34 00:01:24,159 --> 00:01:29,200 can use this tool to identify 35 00:01:26,240 --> 00:01:31,200 bottlenecks in your application 36 00:01:29,200 --> 00:01:33,040 so the examples are all result driven i 37 00:01:31,200 --> 00:01:35,200 want to show you the kind of information 38 00:01:33,040 --> 00:01:37,680 you can get i won't have a lot of time 39 00:01:35,200 --> 00:01:39,520 to go into the details behind it but 40 00:01:37,680 --> 00:01:42,159 again i want to show you the complete 41 00:01:39,520 --> 00:01:43,759 overview of what you can do 42 00:01:42,159 --> 00:01:46,560 with this tool 43 00:01:43,759 --> 00:01:49,360 for those of you that like to learn more 44 00:01:46,560 --> 00:01:51,600 we have a public repo and within that 45 00:01:49,360 --> 00:01:52,799 there's a mini tutorial in tech info 46 00:01:51,600 --> 00:01:54,320 format 47 00:01:52,799 --> 00:01:56,399 at the end of this presentation i'd like 48 00:01:54,320 --> 00:01:59,680 to say a couple of things about where we 49 00:01:56,399 --> 00:02:01,680 would like to take this tool from here 50 00:01:59,680 --> 00:02:04,799 for the outside world it all started at 51 00:02:01,680 --> 00:02:07,520 august on august 11 2021 when we 52 00:02:04,799 --> 00:02:09,520 submitted our code for review to the bin 53 00:02:07,520 --> 00:02:11,039 utils community 54 00:02:09,520 --> 00:02:12,800 we have been working on it for quite a 55 00:02:11,039 --> 00:02:15,200 while of course but that was the the 56 00:02:12,800 --> 00:02:18,160 time the first time that we went public 57 00:02:15,200 --> 00:02:19,760 and asked for inclusion into bin utils 58 00:02:18,160 --> 00:02:21,280 if you'd like those emails are all 59 00:02:19,760 --> 00:02:22,640 archived you can you can check the 60 00:02:21,280 --> 00:02:25,120 postings and you can check the 61 00:02:22,640 --> 00:02:27,120 discussions that that followed 62 00:02:25,120 --> 00:02:30,080 and to our great pleasure 63 00:02:27,120 --> 00:02:32,560 on december 13 2021 64 00:02:30,080 --> 00:02:34,640 um our software was in approved for 65 00:02:32,560 --> 00:02:37,120 inclusion in into the bin neutrals 66 00:02:34,640 --> 00:02:38,160 collection so in a future release you 67 00:02:37,120 --> 00:02:40,480 will have 68 00:02:38,160 --> 00:02:42,480 you will have this tool but again 69 00:02:40,480 --> 00:02:44,560 there's already a repo you can download 70 00:02:42,480 --> 00:02:46,400 and install the software today if you'd 71 00:02:44,560 --> 00:02:49,120 like to do so 72 00:02:46,400 --> 00:02:50,800 a little bit of history about g profinji 73 00:02:49,120 --> 00:02:52,879 but i'll keep it really brief just to 74 00:02:50,800 --> 00:02:55,280 put things into perspective 75 00:02:52,879 --> 00:02:58,000 it all started with the oracle developer 76 00:02:55,280 --> 00:02:59,760 studio performance analyzer and that was 77 00:02:58,000 --> 00:03:02,879 a performance analysis tool that was 78 00:02:59,760 --> 00:03:04,720 developed for over 20 years 79 00:03:02,879 --> 00:03:06,800 in those years many internal and 80 00:03:04,720 --> 00:03:09,040 external users have been using this tool 81 00:03:06,800 --> 00:03:11,280 and often with pretty ugly real world 82 00:03:09,040 --> 00:03:13,280 applications 83 00:03:11,280 --> 00:03:15,120 the focus was on the spark processor 84 00:03:13,280 --> 00:03:17,440 studio compilers and solar operating 85 00:03:15,120 --> 00:03:20,080 system but there has been a linux 86 00:03:17,440 --> 00:03:22,720 version an x86 linux version for over 10 87 00:03:20,080 --> 00:03:25,040 years so we took this sourcetree as the 88 00:03:22,720 --> 00:03:27,200 basis for g profinji 89 00:03:25,040 --> 00:03:28,959 and we did a lot of work on it first of 90 00:03:27,200 --> 00:03:31,200 all we created the standalone version on 91 00:03:28,959 --> 00:03:33,120 linux so you can just install this this 92 00:03:31,200 --> 00:03:34,879 profiling tool you don't need to install 93 00:03:33,120 --> 00:03:37,040 all of the studio anymore it's a 94 00:03:34,879 --> 00:03:39,200 standalone repo now 95 00:03:37,040 --> 00:03:40,799 part of bin new tilts of course but 96 00:03:39,200 --> 00:03:42,000 again we took it out of the studio 97 00:03:40,799 --> 00:03:44,720 environment 98 00:03:42,000 --> 00:03:46,720 we adapted the code to the to adhere to 99 00:03:44,720 --> 00:03:49,040 the new coding standards 100 00:03:46,720 --> 00:03:50,400 we changed the build process so that 101 00:03:49,040 --> 00:03:51,920 it's compliant with the other bin 102 00:03:50,400 --> 00:03:54,799 neutrals components 103 00:03:51,920 --> 00:03:57,280 and we added a first port to arm so we 104 00:03:54,799 --> 00:03:59,360 have support for i'm now very excited 105 00:03:57,280 --> 00:04:01,760 about that 106 00:03:59,360 --> 00:04:04,879 uh we also fixed several bugs and we 107 00:04:01,760 --> 00:04:06,560 completely redesigned the user interface 108 00:04:04,879 --> 00:04:09,120 so for those of you that are familiar 109 00:04:06,560 --> 00:04:11,680 with the studio product uh you will find 110 00:04:09,120 --> 00:04:14,480 similar things but certainly the the ui 111 00:04:11,680 --> 00:04:16,720 has been completely changed 112 00:04:14,480 --> 00:04:18,560 so what is g proven g 113 00:04:16,720 --> 00:04:20,639 it it's a tool that collects and 114 00:04:18,560 --> 00:04:23,520 displays application level performance 115 00:04:20,639 --> 00:04:25,840 data that's what it does it's meant to 116 00:04:23,520 --> 00:04:28,960 profile your application and we support 117 00:04:25,840 --> 00:04:30,240 the following languages c c plus java 118 00:04:28,960 --> 00:04:32,240 and scala 119 00:04:30,240 --> 00:04:34,560 we do have full support for the gcc 120 00:04:32,240 --> 00:04:37,600 compilers and we support various 121 00:04:34,560 --> 00:04:38,479 processors from intel amd and arm 122 00:04:37,600 --> 00:04:40,240 and 123 00:04:38,479 --> 00:04:43,360 you can just you can use 124 00:04:40,240 --> 00:04:45,280 user tool to profile on those processors 125 00:04:43,360 --> 00:04:47,199 one of the key features of this tool is 126 00:04:45,280 --> 00:04:48,720 that there's no need to recompile the 127 00:04:47,199 --> 00:04:50,880 code you can 128 00:04:48,720 --> 00:04:52,880 profile the binary as is and i will 129 00:04:50,880 --> 00:04:54,479 demonstrate that both on the slides and 130 00:04:52,880 --> 00:04:56,160 in the live demo 131 00:04:54,479 --> 00:04:58,639 we also have full support for 132 00:04:56,160 --> 00:05:00,320 multi-threading 133 00:04:58,639 --> 00:05:02,560 applications parallelized to either with 134 00:05:00,320 --> 00:05:05,199 posix threads openmp or java threads 135 00:05:02,560 --> 00:05:06,960 they can all be profiled and as i will 136 00:05:05,199 --> 00:05:08,560 show as well there's no need to do 137 00:05:06,960 --> 00:05:10,400 anything different in the tool the tool 138 00:05:08,560 --> 00:05:13,520 will automatically adapt to a 139 00:05:10,400 --> 00:05:15,600 multi-threading environment 140 00:05:13,520 --> 00:05:18,160 so how does it all work 141 00:05:15,600 --> 00:05:20,320 well it's a it's a two-step approach 142 00:05:18,160 --> 00:05:22,240 step one we collect the performance data 143 00:05:20,320 --> 00:05:23,919 on the target executable you define the 144 00:05:22,240 --> 00:05:26,400 job you run you're going to run it and 145 00:05:23,919 --> 00:05:28,240 you run it under control of our tool and 146 00:05:26,400 --> 00:05:30,720 while it's running it will collect the 147 00:05:28,240 --> 00:05:32,560 data and in the next step you start 148 00:05:30,720 --> 00:05:34,639 looking at the data you display the data 149 00:05:32,560 --> 00:05:36,800 and do your analysis 150 00:05:34,639 --> 00:05:38,720 and in in that analysis phase there's 151 00:05:36,800 --> 00:05:41,280 information available at the function 152 00:05:38,720 --> 00:05:43,520 source and disassembly level and we have 153 00:05:41,280 --> 00:05:46,479 quite quite a number of views into the 154 00:05:43,520 --> 00:05:48,880 data and in my experience what it means 155 00:05:46,479 --> 00:05:51,199 is that already a single run can provide 156 00:05:48,880 --> 00:05:52,880 a lot of insight 157 00:05:51,199 --> 00:05:55,039 one of the things that makes life easier 158 00:05:52,880 --> 00:05:57,680 certainly for me but i'm sure for other 159 00:05:55,039 --> 00:06:00,000 people as well is scripting support 160 00:05:57,680 --> 00:06:02,080 so in a completely automated way you can 161 00:06:00,000 --> 00:06:04,400 generate your profiles and you can get 162 00:06:02,080 --> 00:06:05,759 the information on your screen i use 163 00:06:04,400 --> 00:06:07,919 that all the time 164 00:06:05,759 --> 00:06:09,680 we also have filters so you can zoom in 165 00:06:07,919 --> 00:06:11,680 on the area of interest 166 00:06:09,680 --> 00:06:13,600 and that's very helpful too 167 00:06:11,680 --> 00:06:15,680 and another really nice feature that we 168 00:06:13,600 --> 00:06:17,520 have i think is really nice is 169 00:06:15,680 --> 00:06:19,520 comparison of profiles when you think 170 00:06:17,520 --> 00:06:21,360 about it profiling is about comparing 171 00:06:19,520 --> 00:06:23,759 you have a good good run and you have a 172 00:06:21,360 --> 00:06:25,360 bad run that's all of a sudden is slower 173 00:06:23,759 --> 00:06:26,880 or you're looking at some performance 174 00:06:25,360 --> 00:06:28,400 regression in a multi-threaded 175 00:06:26,880 --> 00:06:29,919 application 176 00:06:28,400 --> 00:06:31,680 you want to look at the good case versus 177 00:06:29,919 --> 00:06:33,440 the bad case and see what's different 178 00:06:31,680 --> 00:06:35,520 and we support that at the tool level so 179 00:06:33,440 --> 00:06:39,120 you can compare profiles and and again 180 00:06:35,520 --> 00:06:41,199 for that i have an example as well 181 00:06:39,120 --> 00:06:44,000 a little bit of a comparison with g prof 182 00:06:41,199 --> 00:06:45,759 um as sort of suggested by the name 183 00:06:44,000 --> 00:06:47,680 we'd like to see this as the next 184 00:06:45,759 --> 00:06:49,599 version of gproff in a way although 185 00:06:47,680 --> 00:06:51,919 architecturally things are very very 186 00:06:49,599 --> 00:06:54,000 different we hope that those people 187 00:06:51,919 --> 00:06:56,319 using gproff today will switch over to 188 00:06:54,000 --> 00:06:58,319 dprofing although both of them will 189 00:06:56,319 --> 00:07:01,199 continue to exist 190 00:06:58,319 --> 00:07:03,039 first of all gproff uses tracing 191 00:07:01,199 --> 00:07:05,199 instrumentation you have to recompile 192 00:07:03,039 --> 00:07:07,840 your code the compiler will insert 193 00:07:05,199 --> 00:07:10,319 library calls to collect the data 194 00:07:07,840 --> 00:07:13,199 we don't need that we use the sampling 195 00:07:10,319 --> 00:07:15,840 so what we do we take the binary and and 196 00:07:13,199 --> 00:07:18,400 we we run it as is and meanwhile we'll 197 00:07:15,840 --> 00:07:19,840 sample the data and i'll have a slide on 198 00:07:18,400 --> 00:07:21,759 that to explain that a little in a 199 00:07:19,840 --> 00:07:23,520 little bit more detail 200 00:07:21,759 --> 00:07:25,280 so with gproff you need to have access 201 00:07:23,520 --> 00:07:27,039 to the source code you need to recompile 202 00:07:25,280 --> 00:07:29,199 the code you don't have to do that in 203 00:07:27,039 --> 00:07:31,759 our case you can essentially profile any 204 00:07:29,199 --> 00:07:34,080 sort of binary 205 00:07:31,759 --> 00:07:36,639 gproff has somewhat fallen behind in 206 00:07:34,080 --> 00:07:38,800 terms of support for modern features and 207 00:07:36,639 --> 00:07:40,000 weak pretty much up to speed for example 208 00:07:38,800 --> 00:07:42,639 we support 209 00:07:40,000 --> 00:07:45,840 profiling of shared libraries and 210 00:07:42,639 --> 00:07:45,840 multi-threaded applications 211 00:07:46,240 --> 00:07:50,160 gprov also has limited customization i 212 00:07:48,639 --> 00:07:52,080 think while we have for example the 213 00:07:50,160 --> 00:07:54,639 scripting commands and we have lots of 214 00:07:52,080 --> 00:07:57,360 options to really tailor the environment 215 00:07:54,639 --> 00:07:59,440 tailor what you want to profile 216 00:07:57,360 --> 00:08:00,879 gprof does not have filters we have them 217 00:07:59,440 --> 00:08:02,479 and once you get the hang of it you'll 218 00:08:00,879 --> 00:08:03,919 find that they're very very powerful 219 00:08:02,479 --> 00:08:05,599 very helpful 220 00:08:03,919 --> 00:08:07,440 another big thing is is that you can't 221 00:08:05,599 --> 00:08:08,639 compare profiles with gproff while we 222 00:08:07,440 --> 00:08:10,319 can do that 223 00:08:08,639 --> 00:08:12,240 and gproff does not have support for 224 00:08:10,319 --> 00:08:15,120 event counters while we support event 225 00:08:12,240 --> 00:08:17,120 counters so for example you can 226 00:08:15,120 --> 00:08:19,039 you can monitor cache misses now to be 227 00:08:17,120 --> 00:08:21,280 honest that's the footnote here 228 00:08:19,039 --> 00:08:23,120 we're not up to speed on the most recent 229 00:08:21,280 --> 00:08:25,280 processors we're working on that that's 230 00:08:23,120 --> 00:08:27,520 work in progress but you can definitely 231 00:08:25,280 --> 00:08:30,080 for a whole range of processors access 232 00:08:27,520 --> 00:08:32,000 the event counters 233 00:08:30,080 --> 00:08:34,000 so i now want to show you a little bit 234 00:08:32,000 --> 00:08:35,760 what goes on under the hood it is a 235 00:08:34,000 --> 00:08:37,760 simplification but hopefully it will 236 00:08:35,760 --> 00:08:39,279 give you feel and help you understand 237 00:08:37,760 --> 00:08:41,039 how we can do this 238 00:08:39,279 --> 00:08:42,479 with your code without asking you to 239 00:08:41,039 --> 00:08:44,240 recompile 240 00:08:42,479 --> 00:08:46,160 the application 241 00:08:44,240 --> 00:08:48,080 you run your program under control of 242 00:08:46,160 --> 00:08:49,920 our tool and what we do at certain 243 00:08:48,080 --> 00:08:52,160 intervals and by default and that's 10 244 00:08:49,920 --> 00:08:54,320 milliseconds we stop the program when we 245 00:08:52,160 --> 00:08:56,320 stop the program we record information 246 00:08:54,320 --> 00:08:59,120 of interest things we'd like to know we 247 00:08:56,320 --> 00:09:01,760 let go again and again by default 10 248 00:08:59,120 --> 00:09:03,760 milliseconds later we stop it again 249 00:09:01,760 --> 00:09:05,519 that sampling interval is by the way 250 00:09:03,760 --> 00:09:07,360 that's one of the options you can you 251 00:09:05,519 --> 00:09:09,279 can choose you can change it you can you 252 00:09:07,360 --> 00:09:11,279 can put it to a longer sampling rate or 253 00:09:09,279 --> 00:09:12,480 a shorter sampling rate depending on 254 00:09:11,279 --> 00:09:14,959 your needs 255 00:09:12,480 --> 00:09:16,880 so when the program is finished we have 256 00:09:14,959 --> 00:09:19,120 an overview of where the time was spent 257 00:09:16,880 --> 00:09:21,839 and that's the one of the things that we 258 00:09:19,120 --> 00:09:24,320 we will show to you 259 00:09:21,839 --> 00:09:26,480 of the things to be aware of and you may 260 00:09:24,320 --> 00:09:28,720 ask yourself with sampling it's 261 00:09:26,480 --> 00:09:31,200 statistical sampling if you run exactly 262 00:09:28,720 --> 00:09:33,519 the same job again the numbers 263 00:09:31,200 --> 00:09:35,279 will be slightly different but in my 264 00:09:33,519 --> 00:09:37,279 experience the 265 00:09:35,279 --> 00:09:38,959 deviation is really really small you can 266 00:09:37,279 --> 00:09:43,040 just ignore it 267 00:09:38,959 --> 00:09:43,040 this works really well in practice 268 00:09:43,279 --> 00:09:48,560 so here's the command structure 269 00:09:46,399 --> 00:09:50,640 all the commands start with g prof and g 270 00:09:48,560 --> 00:09:52,800 so we make them unique in that way so 271 00:09:50,640 --> 00:09:54,560 they start with g profit g then there's 272 00:09:52,800 --> 00:09:55,519 some functionality whatever you want to 273 00:09:54,560 --> 00:09:57,760 do 274 00:09:55,519 --> 00:09:59,519 optionally a qualifier to further 275 00:09:57,760 --> 00:10:02,240 specify what you want to do and 276 00:09:59,519 --> 00:10:04,240 optionally options to that qualifier to 277 00:10:02,240 --> 00:10:05,600 customize the behavior that's typically 278 00:10:04,240 --> 00:10:08,000 how you use it 279 00:10:05,600 --> 00:10:10,399 so examples are to collect performance 280 00:10:08,000 --> 00:10:12,959 information you you do gproff and g 281 00:10:10,399 --> 00:10:15,680 collect app to display the information 282 00:10:12,959 --> 00:10:17,839 you do gprofng display text 283 00:10:15,680 --> 00:10:19,519 we also have an archiving feature and 284 00:10:17,839 --> 00:10:21,600 you just call it archive because we 285 00:10:19,519 --> 00:10:24,160 don't need anything else at that level 286 00:10:21,600 --> 00:10:25,120 so these are examples of the ui 287 00:10:24,160 --> 00:10:27,920 we have 288 00:10:25,120 --> 00:10:29,680 five commands currently supported 289 00:10:27,920 --> 00:10:31,839 as i've just shown collect the 290 00:10:29,680 --> 00:10:33,120 application level performance data 291 00:10:31,839 --> 00:10:34,959 collect app 292 00:10:33,120 --> 00:10:36,320 display the performance data in ascii 293 00:10:34,959 --> 00:10:38,399 format 294 00:10:36,320 --> 00:10:40,720 we can archive an experiment directory 295 00:10:38,399 --> 00:10:42,480 so what we do we sort of package the 296 00:10:40,720 --> 00:10:44,640 environment and that includes a shared 297 00:10:42,480 --> 00:10:47,279 libraries your source code if you like 298 00:10:44,640 --> 00:10:49,440 you can sort of customize that but we 299 00:10:47,279 --> 00:10:51,760 sort of freeze make a snapshot of the 300 00:10:49,440 --> 00:10:53,920 entire environment so you can always go 301 00:10:51,760 --> 00:10:55,920 back and analyze that environment just 302 00:10:53,920 --> 00:10:57,360 in case libraries on your system have 303 00:10:55,920 --> 00:10:59,040 changed so 304 00:10:57,360 --> 00:11:00,480 so you can completely archive an 305 00:10:59,040 --> 00:11:02,480 experiment and 306 00:11:00,480 --> 00:11:04,560 i wouldn't say replay but analyze the 307 00:11:02,480 --> 00:11:07,200 data later even things even if things 308 00:11:04,560 --> 00:11:08,640 have changed on your system 309 00:11:07,200 --> 00:11:10,480 we have a simple tool it's currently 310 00:11:08,640 --> 00:11:12,880 quite simple it displays the source and 311 00:11:10,480 --> 00:11:15,680 disassembly interleaved and you give it 312 00:11:12,880 --> 00:11:17,680 an object file that will show it 313 00:11:15,680 --> 00:11:19,680 our plan is in the future that it will 314 00:11:17,680 --> 00:11:21,519 show you more more information about 315 00:11:19,680 --> 00:11:22,640 what the compiler has been doing to your 316 00:11:21,519 --> 00:11:24,959 code so 317 00:11:22,640 --> 00:11:26,320 uh maybe show you optimization messages 318 00:11:24,959 --> 00:11:28,800 for example 319 00:11:26,320 --> 00:11:31,920 a tool that's in progress and currently 320 00:11:28,800 --> 00:11:34,640 doesn't do much is called display html 321 00:11:31,920 --> 00:11:36,320 it it generates an html structure and it 322 00:11:34,640 --> 00:11:38,560 allows you to analyze the performance 323 00:11:36,320 --> 00:11:41,200 data in a browser and we're working hard 324 00:11:38,560 --> 00:11:43,680 to finish that and have that available 325 00:11:41,200 --> 00:11:46,399 relatively soon so that's called display 326 00:11:43,680 --> 00:11:46,399 html 327 00:11:46,880 --> 00:11:51,120 i now want to briefly talk about 328 00:11:49,440 --> 00:11:53,839 two important concepts they're called 329 00:11:51,120 --> 00:11:56,320 inclusive and exclusive metrics they're 330 00:11:53,839 --> 00:11:58,720 not restricted to g profit ng it's part 331 00:11:56,320 --> 00:12:00,720 of the sort of the profiling terminology 332 00:11:58,720 --> 00:12:03,200 but i would like to explain them here 333 00:12:00,720 --> 00:12:05,440 you'll also see them in the examples 334 00:12:03,200 --> 00:12:06,480 a metric is something that you measure 335 00:12:05,440 --> 00:12:07,920 that could be 336 00:12:06,480 --> 00:12:09,680 cpu time 337 00:12:07,920 --> 00:12:11,839 level one cache misses 338 00:12:09,680 --> 00:12:13,920 number of branches branch instructions 339 00:12:11,839 --> 00:12:16,000 executed whatever 340 00:12:13,920 --> 00:12:18,959 so that's a metric when we look at the 341 00:12:16,000 --> 00:12:20,480 inclusive value we include everything 342 00:12:18,959 --> 00:12:22,480 underneath the caller so you have a 343 00:12:20,480 --> 00:12:24,720 certain point in your call tree and you 344 00:12:22,480 --> 00:12:27,120 count everything below that 345 00:12:24,720 --> 00:12:28,800 so for example if it's the cpu time it 346 00:12:27,120 --> 00:12:30,800 will give you the total time of that 347 00:12:28,800 --> 00:12:32,800 function including calling other 348 00:12:30,800 --> 00:12:34,880 functions 349 00:12:32,800 --> 00:12:36,959 the opposite sort of metric is the 350 00:12:34,880 --> 00:12:39,279 exclusive metric where we exclude 351 00:12:36,959 --> 00:12:41,680 everything outside of the caller so 352 00:12:39,279 --> 00:12:43,360 that's like the pure time or pure number 353 00:12:41,680 --> 00:12:45,279 of cache misses you spent in that 354 00:12:43,360 --> 00:12:46,320 function outside of calling order 355 00:12:45,279 --> 00:12:48,320 functions 356 00:12:46,320 --> 00:12:49,600 and both metrics have their value and 357 00:12:48,320 --> 00:12:52,320 you can get them very easily with 358 00:12:49,600 --> 00:12:53,839 gprofan g and i will show that here's a 359 00:12:52,320 --> 00:12:55,920 little example 360 00:12:53,839 --> 00:12:56,639 this is part of my call tree and these 361 00:12:55,920 --> 00:12:58,560 are 362 00:12:56,639 --> 00:13:01,120 in total five functions and i'm showing 363 00:12:58,560 --> 00:13:02,399 the exclusive time here of what they 364 00:13:01,120 --> 00:13:05,040 what they spend 365 00:13:02,399 --> 00:13:06,880 and if i would make a table this is what 366 00:13:05,040 --> 00:13:08,399 i would get so the orange function let's 367 00:13:06,880 --> 00:13:10,560 call it function a 368 00:13:08,399 --> 00:13:12,399 it's exclusive time is 10 that's the 369 00:13:10,560 --> 00:13:13,920 number in the box but it's calling those 370 00:13:12,399 --> 00:13:15,680 other functions and when you add up the 371 00:13:13,920 --> 00:13:17,760 numbers you'll see that the total time 372 00:13:15,680 --> 00:13:20,480 is 75 seconds 373 00:13:17,760 --> 00:13:22,480 b is the gray one let's call it b 374 00:13:20,480 --> 00:13:24,399 is what we call a leave function it's 375 00:13:22,480 --> 00:13:26,720 not calling anything else the inclusive 376 00:13:24,399 --> 00:13:29,200 and exclusive times are the same in both 377 00:13:26,720 --> 00:13:30,639 case 20 seconds the same is true for the 378 00:13:29,200 --> 00:13:31,680 for the pink one 379 00:13:30,639 --> 00:13:33,920 function d 380 00:13:31,680 --> 00:13:36,240 c is different that's the light blue one 381 00:13:33,920 --> 00:13:37,920 c is calling something else 382 00:13:36,240 --> 00:13:39,120 function e 383 00:13:37,920 --> 00:13:40,959 so it's 384 00:13:39,120 --> 00:13:43,360 exclusive time is five seconds but it's 385 00:13:40,959 --> 00:13:46,959 calling something that takes 25 seconds 386 00:13:43,360 --> 00:13:49,040 so the inclusive time of that one is 30. 387 00:13:46,959 --> 00:13:51,680 and again in this way you can you can 388 00:13:49,040 --> 00:13:53,360 navigate through the data the inclusive 389 00:13:51,680 --> 00:13:55,920 time for example gives you the most 390 00:13:53,360 --> 00:13:58,560 expensive path in your program and the 391 00:13:55,920 --> 00:14:00,560 exclusive time will tell you how much 392 00:13:58,560 --> 00:14:02,639 that particular function contributes to 393 00:14:00,560 --> 00:14:04,240 that total 394 00:14:02,639 --> 00:14:06,480 let's hide him for a demo i think i'd 395 00:14:04,240 --> 00:14:08,880 like to show you a brief just few minute 396 00:14:06,480 --> 00:14:10,800 demo how you use this tool and actually 397 00:14:08,880 --> 00:14:13,360 i want to show you how easy it is to get 398 00:14:10,800 --> 00:14:16,639 a basic profile on your screen so let's 399 00:14:13,360 --> 00:14:18,240 get started and let's have the demo 400 00:14:16,639 --> 00:14:20,480 and i want to show you how you can make 401 00:14:18,240 --> 00:14:23,600 a profile for your own application using 402 00:14:20,480 --> 00:14:26,320 g profinji my application is a simple 403 00:14:23,600 --> 00:14:28,560 matrix vector algorithm parallelized 404 00:14:26,320 --> 00:14:31,040 with basic threads and the way i run it 405 00:14:28,560 --> 00:14:33,279 is i specify the matrix sizes the number 406 00:14:31,040 --> 00:14:35,360 of rows and columns and the number of 407 00:14:33,279 --> 00:14:37,279 threads that i want to use which is two 408 00:14:35,360 --> 00:14:39,839 in this case 409 00:14:37,279 --> 00:14:42,240 so the program runs as usual it ran to 410 00:14:39,839 --> 00:14:44,639 completion all as well the question is 411 00:14:42,240 --> 00:14:46,399 how do you make a profile out of that 412 00:14:44,639 --> 00:14:48,639 well it's really easy 413 00:14:46,399 --> 00:14:50,399 you run the program as before same 414 00:14:48,639 --> 00:14:53,120 environment variable same settings 415 00:14:50,399 --> 00:14:54,880 nothing changes the only thing you do is 416 00:14:53,120 --> 00:14:56,320 you're going to run it under control of 417 00:14:54,880 --> 00:14:58,160 gprofan g 418 00:14:56,320 --> 00:14:59,839 you tell it what you want to do in this 419 00:14:58,160 --> 00:15:03,199 case i want to collect performance 420 00:14:59,839 --> 00:15:05,440 information on my application so gprofan 421 00:15:03,199 --> 00:15:08,079 g collect app and then the way you run 422 00:15:05,440 --> 00:15:08,079 your program 423 00:15:08,160 --> 00:15:12,680 when you do that you'll see an extra 424 00:15:10,399 --> 00:15:15,519 line creating experiment directory 425 00:15:12,680 --> 00:15:16,360 test.one.er and a process id 426 00:15:15,519 --> 00:15:19,040 the 427 00:15:16,360 --> 00:15:20,639 test.one.er that's a regular linux 428 00:15:19,040 --> 00:15:23,600 directory that has the internal 429 00:15:20,639 --> 00:15:25,519 information for gprofng to to show you 430 00:15:23,600 --> 00:15:28,720 the performance data 431 00:15:25,519 --> 00:15:31,120 and that's clearly a default name if you 432 00:15:28,720 --> 00:15:34,639 would run this again you would guess get 433 00:15:31,120 --> 00:15:36,720 a test.2.er test.3 and so forth 434 00:15:34,639 --> 00:15:39,920 and of course you can change that name 435 00:15:36,720 --> 00:15:42,720 if you if you want by default we never 436 00:15:39,920 --> 00:15:45,040 overwrite an experiment directory so 437 00:15:42,720 --> 00:15:47,360 we'll keep on numbering but again you 438 00:15:45,040 --> 00:15:49,600 can use your own own name for the 439 00:15:47,360 --> 00:15:51,199 directory and that's what you use in 440 00:15:49,600 --> 00:15:52,160 practice to give it a more meaningful 441 00:15:51,199 --> 00:15:54,959 name 442 00:15:52,160 --> 00:15:57,440 in any case we have our test.one.er and 443 00:15:54,959 --> 00:15:59,920 now we can start doing the analysis so 444 00:15:57,440 --> 00:16:02,320 we start graphing g again but this time 445 00:15:59,920 --> 00:16:05,440 i want to display the information i want 446 00:16:02,320 --> 00:16:07,680 to do so in ascii or text format and i 447 00:16:05,440 --> 00:16:08,800 want to analyze the information in that 448 00:16:07,680 --> 00:16:11,519 directory 449 00:16:08,800 --> 00:16:13,360 and here's how you do that 450 00:16:11,519 --> 00:16:15,279 now what happens when i use it the 451 00:16:13,360 --> 00:16:17,279 command this way i get into an 452 00:16:15,279 --> 00:16:18,800 interpreter mode in interactive mode i 453 00:16:17,279 --> 00:16:21,680 can give commands 454 00:16:18,800 --> 00:16:23,680 and i can sort of navigate my way but to 455 00:16:21,680 --> 00:16:26,079 be honest i never use it this way so 456 00:16:23,680 --> 00:16:28,880 i'll get out of that because the way i 457 00:16:26,079 --> 00:16:31,600 like to use it and i would sort of 458 00:16:28,880 --> 00:16:33,279 recommend you do with that as well is 459 00:16:31,600 --> 00:16:34,240 you give the commands at the command 460 00:16:33,279 --> 00:16:37,120 line 461 00:16:34,240 --> 00:16:39,360 as you invoke g profanji so here what 462 00:16:37,120 --> 00:16:41,120 i'm doing is i'm asking for the function 463 00:16:39,360 --> 00:16:44,000 information i want to get the function 464 00:16:41,120 --> 00:16:45,199 overview for that experiment and that's 465 00:16:44,000 --> 00:16:47,920 how you do it 466 00:16:45,199 --> 00:16:50,399 and this is what i get on my screen 467 00:16:47,920 --> 00:16:52,399 and what we see we see that 468 00:16:50,399 --> 00:16:54,560 the two columns the exclusive and 469 00:16:52,399 --> 00:16:56,959 inclusive total cpu time with their 470 00:16:54,560 --> 00:16:58,000 respective values and the name of the 471 00:16:56,959 --> 00:17:00,000 function 472 00:16:58,000 --> 00:17:01,759 the first line is 473 00:17:00,000 --> 00:17:03,839 requires a little bit of explanation 474 00:17:01,759 --> 00:17:04,640 that's called total you will always see 475 00:17:03,839 --> 00:17:07,600 it 476 00:17:04,640 --> 00:17:09,839 and that's generated by g profit g that 477 00:17:07,600 --> 00:17:12,480 has the total value for that metric in 478 00:17:09,839 --> 00:17:15,679 this case the total cpu time so what it 479 00:17:12,480 --> 00:17:16,799 tells us that the total job took 1.691 480 00:17:15,679 --> 00:17:18,120 seconds 481 00:17:16,799 --> 00:17:21,520 and out of that 482 00:17:18,120 --> 00:17:24,319 1.611 so pretty much almost everything 483 00:17:21,520 --> 00:17:25,679 was taken by a function called mxv core 484 00:17:24,319 --> 00:17:28,319 so if we ever want to do some 485 00:17:25,679 --> 00:17:30,640 optimization mxv underscore core is the 486 00:17:28,319 --> 00:17:33,280 one to to look at 487 00:17:30,640 --> 00:17:36,080 so that's a very basic 488 00:17:33,280 --> 00:17:38,160 but useful profile and 489 00:17:36,080 --> 00:17:41,039 the steps i've been showing you are 490 00:17:38,160 --> 00:17:42,640 really all that is needed to get it you 491 00:17:41,039 --> 00:17:45,679 first you collect the performance 492 00:17:42,640 --> 00:17:47,679 information using gprofanzcollect app 493 00:17:45,679 --> 00:17:50,640 and then you display the information 494 00:17:47,679 --> 00:17:52,960 using g preference e display text 495 00:17:50,640 --> 00:17:55,200 and that's all that it takes 496 00:17:52,960 --> 00:17:56,960 now there's a lot more to say about this 497 00:17:55,200 --> 00:17:59,640 and there's a lot more information that 498 00:17:56,960 --> 00:18:02,559 you can get out of even this simple 499 00:17:59,640 --> 00:18:04,480 test.one.er directory and for that i'll 500 00:18:02,559 --> 00:18:07,200 switch back to the slides and i'll show 501 00:18:04,480 --> 00:18:09,440 you lots of examples of all the kinds of 502 00:18:07,200 --> 00:18:12,400 information you can get so let's go back 503 00:18:09,440 --> 00:18:12,400 to the slides now 504 00:18:13,760 --> 00:18:17,840 well i hope you found the demo uh useful 505 00:18:16,480 --> 00:18:20,160 i now want to show you some more 506 00:18:17,840 --> 00:18:22,480 examples on the slides i just wanna 507 00:18:20,160 --> 00:18:24,000 these um this is this is kind of showing 508 00:18:22,480 --> 00:18:25,919 you all the kind of things you can do 509 00:18:24,000 --> 00:18:28,000 but actually there's even more i don't 510 00:18:25,919 --> 00:18:29,679 have time to cover that but these are 511 00:18:28,000 --> 00:18:31,280 some of the most common things that you 512 00:18:29,679 --> 00:18:33,360 may be interested in 513 00:18:31,280 --> 00:18:35,919 so i want to show you how to get a basic 514 00:18:33,360 --> 00:18:37,440 profile you just saw it in the demo but 515 00:18:35,919 --> 00:18:39,280 i wanted to again cover that on the 516 00:18:37,440 --> 00:18:41,440 slide because that's the very first step 517 00:18:39,280 --> 00:18:42,960 that you need to do then you start 518 00:18:41,440 --> 00:18:45,679 displaying the information at the 519 00:18:42,960 --> 00:18:47,520 function source disassembly level you 520 00:18:45,679 --> 00:18:49,520 can customize the data collection and 521 00:18:47,520 --> 00:18:51,760 the displays i want to show you the 522 00:18:49,520 --> 00:18:53,280 scripting i want to show the kind of 523 00:18:51,760 --> 00:18:55,200 things you can do with a multi-threaded 524 00:18:53,280 --> 00:18:58,960 application and i want to show you an 525 00:18:55,200 --> 00:19:00,480 example of comparing profiles 526 00:18:58,960 --> 00:19:03,039 you've just seen this in the demo but 527 00:19:00,480 --> 00:19:04,880 let me let me just repeat it 528 00:19:03,039 --> 00:19:07,200 if this is how you normally run your 529 00:19:04,880 --> 00:19:09,120 program which is is the matrix vector 530 00:19:07,200 --> 00:19:11,200 multiplication parallelized with posix 531 00:19:09,120 --> 00:19:13,760 threads all you need to do to get your 532 00:19:11,200 --> 00:19:16,000 profile is run your program 533 00:19:13,760 --> 00:19:18,960 as before but under control of the 534 00:19:16,000 --> 00:19:21,200 gprofng collect app command 535 00:19:18,960 --> 00:19:23,520 so that starts collecting the the 536 00:19:21,200 --> 00:19:25,919 application level performance data 537 00:19:23,520 --> 00:19:28,080 and as you saw in the demo it will 538 00:19:25,919 --> 00:19:28,960 create an experiment directory 539 00:19:28,080 --> 00:19:31,600 um 540 00:19:28,960 --> 00:19:33,600 i've talked about the naming there and 541 00:19:31,600 --> 00:19:36,320 we see here clearly we did not have any 542 00:19:33,600 --> 00:19:38,640 experiment before we did not specify the 543 00:19:36,320 --> 00:19:40,880 experiment name so the default name is 544 00:19:38,640 --> 00:19:42,799 test.1.er 545 00:19:40,880 --> 00:19:44,320 and that will contain our performance 546 00:19:42,799 --> 00:19:46,400 information 547 00:19:44,320 --> 00:19:48,960 so the next step is to start analyzing 548 00:19:46,400 --> 00:19:50,080 that information and although i i will 549 00:19:48,960 --> 00:19:53,200 show 550 00:19:50,080 --> 00:19:55,120 these commands um in a separate way you 551 00:19:53,200 --> 00:19:57,039 can of course combine them on a command 552 00:19:55,120 --> 00:19:58,720 line and get all that information with 553 00:19:57,039 --> 00:20:00,080 one command you don't have to do those 554 00:19:58,720 --> 00:20:01,840 individual 555 00:20:00,080 --> 00:20:04,000 typing of these individual commands you 556 00:20:01,840 --> 00:20:06,799 can concatenate those 557 00:20:04,000 --> 00:20:09,280 those options in a way i just want to 558 00:20:06,799 --> 00:20:10,559 say that because otherwise you you yeah 559 00:20:09,280 --> 00:20:11,840 you may find it 560 00:20:10,559 --> 00:20:13,840 inconvenient 561 00:20:11,840 --> 00:20:16,000 so let's say i want to get a basic 562 00:20:13,840 --> 00:20:18,159 function overview and all i need to do 563 00:20:16,000 --> 00:20:19,679 is add the functions option it's 564 00:20:18,159 --> 00:20:23,440 actually a command 565 00:20:19,679 --> 00:20:25,039 to the gpro display text tool and i feed 566 00:20:23,440 --> 00:20:27,600 it to the name of the experiment 567 00:20:25,039 --> 00:20:29,679 directory test.1 or dr and this is what 568 00:20:27,600 --> 00:20:31,760 i get on my screen again you've seen it 569 00:20:29,679 --> 00:20:34,240 in the demo i just want to repeat it 570 00:20:31,760 --> 00:20:37,280 here and i also want to mention again 571 00:20:34,240 --> 00:20:40,640 this function called total that that is 572 00:20:37,280 --> 00:20:42,720 created by the uh the gproff and g tool 573 00:20:40,640 --> 00:20:44,640 and that's the total time or the total 574 00:20:42,720 --> 00:20:46,080 metric i should say if it's cash misses 575 00:20:44,640 --> 00:20:47,919 it will show you the total number of 576 00:20:46,080 --> 00:20:49,600 misses so that's an artificially 577 00:20:47,919 --> 00:20:51,840 generated function it's always on the 578 00:20:49,600 --> 00:20:54,240 top because it includes everything 579 00:20:51,840 --> 00:20:56,000 so that that gives the whole measurement 580 00:20:54,240 --> 00:20:57,679 a reference so we can see what the total 581 00:20:56,000 --> 00:21:00,080 time is in this case we see the 582 00:20:57,679 --> 00:21:02,320 inclusive and exclusive times and we see 583 00:21:00,080 --> 00:21:04,320 the distribution over the various parts 584 00:21:02,320 --> 00:21:06,240 of the program 585 00:21:04,320 --> 00:21:08,799 another kind of nice little feature is 586 00:21:06,240 --> 00:21:10,400 the lines option that will show you the 587 00:21:08,799 --> 00:21:12,080 what we call hot lines those are the 588 00:21:10,400 --> 00:21:13,679 most time consuming lines so you can 589 00:21:12,080 --> 00:21:16,240 very quickly see where is your time 590 00:21:13,679 --> 00:21:19,440 spent and what we see in this case that 591 00:21:16,240 --> 00:21:22,960 the two top lines are in mxv core line 592 00:21:19,440 --> 00:21:26,320 36 and somewhat in line 37 but clearly 593 00:21:22,960 --> 00:21:27,679 line 36 in file mxv dot c 594 00:21:26,320 --> 00:21:30,400 is a very 595 00:21:27,679 --> 00:21:32,320 very important one for the performance 596 00:21:30,400 --> 00:21:33,760 usually as soon as you've seen the 597 00:21:32,320 --> 00:21:36,640 functions and maybe you've used the 598 00:21:33,760 --> 00:21:38,960 lines command to to tell you where the 599 00:21:36,640 --> 00:21:40,480 time is spent at the source line level 600 00:21:38,960 --> 00:21:42,640 you usually want to look at the source 601 00:21:40,480 --> 00:21:45,440 and you use the source option for that 602 00:21:42,640 --> 00:21:47,280 and i'm going to specify source and then 603 00:21:45,440 --> 00:21:50,400 i need to specify my target function 604 00:21:47,280 --> 00:21:52,000 which is in this case mxv core so that's 605 00:21:50,400 --> 00:21:52,799 uh that's where i want to see the source 606 00:21:52,000 --> 00:21:54,720 for 607 00:21:52,799 --> 00:21:56,159 and that's what i get so i get the 608 00:21:54,720 --> 00:21:59,760 source code 609 00:21:56,159 --> 00:22:01,520 and it's it has the timings in the 610 00:21:59,760 --> 00:22:03,520 the leftmost columns 611 00:22:01,520 --> 00:22:05,280 and um to make life a little easier we 612 00:22:03,520 --> 00:22:07,600 mark the most expensive one with the 613 00:22:05,280 --> 00:22:09,919 double hash symbol the color coding is 614 00:22:07,600 --> 00:22:11,520 mine by the way the red coloring that's 615 00:22:09,919 --> 00:22:12,400 on your terminal will all be the same 616 00:22:11,520 --> 00:22:14,720 color 617 00:22:12,400 --> 00:22:16,080 but um what it what it shows is where 618 00:22:14,720 --> 00:22:19,200 we're spending the time and those are 619 00:22:16,080 --> 00:22:20,000 those two lines 36 and 37 we've already 620 00:22:19,200 --> 00:22:22,080 seen 621 00:22:20,000 --> 00:22:24,640 uh with the lines overview 622 00:22:22,080 --> 00:22:26,159 so that's how easy it is to identify 623 00:22:24,640 --> 00:22:28,000 where in the source are you spending a 624 00:22:26,159 --> 00:22:29,840 lot of the time 625 00:22:28,000 --> 00:22:32,400 you can go deeper you can go down to the 626 00:22:29,840 --> 00:22:34,480 disassembly instruction level 627 00:22:32,400 --> 00:22:36,080 that's sometimes very helpful because it 628 00:22:34,480 --> 00:22:38,159 may not be clear when looking at the 629 00:22:36,080 --> 00:22:40,799 source where the time is spent so i'd 630 00:22:38,159 --> 00:22:42,320 like to do that and you get the same 631 00:22:40,799 --> 00:22:44,799 information but now at the instruction 632 00:22:42,320 --> 00:22:46,559 level and again i color coded the most 633 00:22:44,799 --> 00:22:48,400 interesting part red 634 00:22:46,559 --> 00:22:49,919 the tool won't do that for you 635 00:22:48,400 --> 00:22:52,240 but you will see the marker the double 636 00:22:49,919 --> 00:22:54,080 hash symbol and you can search for that 637 00:22:52,240 --> 00:22:56,159 and find out where most of the time is 638 00:22:54,080 --> 00:22:57,840 spent and now at the instruction level 639 00:22:56,159 --> 00:23:00,159 so you can see which instructions are 640 00:22:57,840 --> 00:23:02,320 the most expensive ones 641 00:23:00,159 --> 00:23:03,760 i mentioned customization a couple of 642 00:23:02,320 --> 00:23:06,400 times you have different ways to 643 00:23:03,760 --> 00:23:08,880 customize the information for example on 644 00:23:06,400 --> 00:23:10,880 the display text command 645 00:23:08,880 --> 00:23:12,559 you can limit the number of functions 646 00:23:10,880 --> 00:23:15,039 you want to see in this case and do a 647 00:23:12,559 --> 00:23:17,120 limit five and you can define the 648 00:23:15,039 --> 00:23:19,440 metrics you want to see 649 00:23:17,120 --> 00:23:22,240 we were showing the the function name 650 00:23:19,440 --> 00:23:24,559 the inclusive and exclusive times but i 651 00:23:22,240 --> 00:23:26,880 want to do it a little bit different 652 00:23:24,559 --> 00:23:29,200 what i want to show is i want to show 653 00:23:26,880 --> 00:23:31,280 the name of the function followed by the 654 00:23:29,200 --> 00:23:33,520 exclusive time that's the e 655 00:23:31,280 --> 00:23:35,600 i want to see the time that's the dot 656 00:23:33,520 --> 00:23:38,240 and i want to see a percentage i didn't 657 00:23:35,600 --> 00:23:39,760 show that so far so what i want to look 658 00:23:38,240 --> 00:23:41,919 at is the 659 00:23:39,760 --> 00:23:44,640 both the times and and the percentage 660 00:23:41,919 --> 00:23:47,440 distribution and that's what i need to 661 00:23:44,640 --> 00:23:49,520 uh to do to get it 662 00:23:47,440 --> 00:23:52,480 and this is what i see i see that that 663 00:23:49,520 --> 00:23:54,400 function mxv core is taking 95 percent 664 00:23:52,480 --> 00:23:55,520 of the time so it's by far dominating 665 00:23:54,400 --> 00:23:56,799 the entire 666 00:23:55,520 --> 00:23:58,480 uh run 667 00:23:56,799 --> 00:24:00,880 typing those things in gets starts to 668 00:23:58,480 --> 00:24:02,720 get tedious so what you really uh and 669 00:24:00,880 --> 00:24:04,960 probably end up doing is like what i do 670 00:24:02,720 --> 00:24:06,960 i put those commands in the script it 671 00:24:04,960 --> 00:24:09,039 also serves as a nice way to document 672 00:24:06,960 --> 00:24:11,200 how you got your profile 673 00:24:09,039 --> 00:24:13,120 we accept the hash line starting with 674 00:24:11,200 --> 00:24:15,600 the hash as as comments so you can add 675 00:24:13,120 --> 00:24:17,679 comments if you like as i'm showing here 676 00:24:15,600 --> 00:24:19,520 and the first blue line i defined the 677 00:24:17,679 --> 00:24:21,520 matrix to be the name 678 00:24:19,520 --> 00:24:23,520 followed by the inclusive 679 00:24:21,520 --> 00:24:25,440 total cpu time as a number and a 680 00:24:23,520 --> 00:24:28,159 percentage and the same for the 681 00:24:25,440 --> 00:24:31,120 exclusive as a number and a percentage 682 00:24:28,159 --> 00:24:33,279 i want to sort the timings 683 00:24:31,120 --> 00:24:34,960 relative to the total exclusive total 684 00:24:33,279 --> 00:24:36,960 cpu time it's one of those things you 685 00:24:34,960 --> 00:24:39,200 can you can select 686 00:24:36,960 --> 00:24:41,520 uh i only want to see five lines just in 687 00:24:39,200 --> 00:24:43,679 case just for the demonstration purposes 688 00:24:41,520 --> 00:24:45,919 and i want to see the function overview 689 00:24:43,679 --> 00:24:47,279 so with that that 690 00:24:45,919 --> 00:24:49,440 that script 691 00:24:47,279 --> 00:24:52,159 i feed that into the display text using 692 00:24:49,440 --> 00:24:54,000 the script option i specify that file 693 00:24:52,159 --> 00:24:55,919 and then i got an information 694 00:24:54,000 --> 00:24:58,480 information like this so now i got two 695 00:24:55,919 --> 00:25:00,320 tables inclusive and exclusive 696 00:24:58,480 --> 00:25:01,600 and what has been added are the 697 00:25:00,320 --> 00:25:03,360 percentages 698 00:25:01,600 --> 00:25:05,679 so now you can immediately see that the 699 00:25:03,360 --> 00:25:10,080 second most expensive function is called 700 00:25:05,679 --> 00:25:11,760 init data and inclusive time is 4.5 701 00:25:10,080 --> 00:25:13,840 so it doesn't contribute much but it's 702 00:25:11,760 --> 00:25:14,960 the second most expensive function in my 703 00:25:13,840 --> 00:25:17,520 program 704 00:25:14,960 --> 00:25:19,919 you can customize the data collection 705 00:25:17,520 --> 00:25:22,480 part i just show a simple example here i 706 00:25:19,919 --> 00:25:24,080 already sort of alluded to it you can 707 00:25:22,480 --> 00:25:25,600 you can give the experiment a more 708 00:25:24,080 --> 00:25:27,279 meaningful name 709 00:25:25,600 --> 00:25:28,400 we have a lowercase option and an 710 00:25:27,279 --> 00:25:29,760 uppercase 711 00:25:28,400 --> 00:25:32,000 o option 712 00:25:29,760 --> 00:25:35,120 the lowercase o option will not 713 00:25:32,000 --> 00:25:37,279 overwrite an existing experiment the 714 00:25:35,120 --> 00:25:39,200 capital o option will overwrite the 715 00:25:37,279 --> 00:25:40,720 existing experiment i like that a lot 716 00:25:39,200 --> 00:25:42,640 but if you want to be cautious and not 717 00:25:40,720 --> 00:25:44,880 override existing experiments you can 718 00:25:42,640 --> 00:25:47,279 use the lower case so in this way i can 719 00:25:44,880 --> 00:25:49,919 give the experiment any sort of 720 00:25:47,279 --> 00:25:51,600 meaningful name or whatever name i like 721 00:25:49,919 --> 00:25:54,400 there's more things you can customize 722 00:25:51,600 --> 00:25:56,400 you can set sampling granularity you can 723 00:25:54,400 --> 00:25:58,400 add labels to the experiment like 724 00:25:56,400 --> 00:26:00,720 comments and they'll be shown by the 725 00:25:58,400 --> 00:26:03,200 tools so you can do some sort of simple 726 00:26:00,720 --> 00:26:05,039 form of documenting your experiment i 727 00:26:03,200 --> 00:26:07,279 already mentioned you can archive 728 00:26:05,039 --> 00:26:08,960 including sources objects and so forth 729 00:26:07,279 --> 00:26:12,400 in the experiment and package the whole 730 00:26:08,960 --> 00:26:14,320 thing up support for multi-threading 731 00:26:12,400 --> 00:26:16,480 there's nothing you need to to change 732 00:26:14,320 --> 00:26:18,640 you run your multi-threaded code as 733 00:26:16,480 --> 00:26:20,960 before in this case the only change is 734 00:26:18,640 --> 00:26:23,760 i'm running my matrix vector code 735 00:26:20,960 --> 00:26:25,760 and i i use the dash capital o to give 736 00:26:23,760 --> 00:26:26,640 the experiment a somewhat meaningful 737 00:26:25,760 --> 00:26:30,240 name 738 00:26:26,640 --> 00:26:31,520 and i use the t2 option to request two 739 00:26:30,240 --> 00:26:34,159 threads 740 00:26:31,520 --> 00:26:37,760 so i run it as before again under 741 00:26:34,159 --> 00:26:38,799 control of gproff ng collect 742 00:26:37,760 --> 00:26:41,600 and 743 00:26:38,799 --> 00:26:43,760 i can then start analyzing that data and 744 00:26:41,600 --> 00:26:45,520 for example here i'm i'm playing a bit 745 00:26:43,760 --> 00:26:47,760 with the metrics i change the metrics 746 00:26:45,520 --> 00:26:50,320 but i also want to highlight the threats 747 00:26:47,760 --> 00:26:53,600 command that will give me information 748 00:26:50,320 --> 00:26:56,640 across the threads what is showing me is 749 00:26:53,600 --> 00:26:58,480 the metric values that are requested 750 00:26:56,640 --> 00:27:00,799 for all threads and in this case i'm 751 00:26:58,480 --> 00:27:02,799 looking at the total cpu time and that's 752 00:27:00,799 --> 00:27:04,960 a great way to find a load imbalance in 753 00:27:02,799 --> 00:27:06,720 your multi-threaded application 754 00:27:04,960 --> 00:27:08,720 for example here and this is again a 755 00:27:06,720 --> 00:27:10,400 very simple example we see that uh 756 00:27:08,720 --> 00:27:12,799 threat number two and three they both 757 00:27:10,400 --> 00:27:13,840 take about one second about 47 percent 758 00:27:12,799 --> 00:27:16,240 of the time 759 00:27:13,840 --> 00:27:19,039 but threat number one hardly takes any 760 00:27:16,240 --> 00:27:20,480 time it's five percent of the total 761 00:27:19,039 --> 00:27:22,640 and that's by design that's like the 762 00:27:20,480 --> 00:27:24,720 control thread it's in charge of 763 00:27:22,640 --> 00:27:27,679 executing the program but it will not do 764 00:27:24,720 --> 00:27:30,399 any of the computations so it's mostly 765 00:27:27,679 --> 00:27:33,039 doing the administrative work behind the 766 00:27:30,399 --> 00:27:35,200 scenes to make things happen so this is 767 00:27:33,039 --> 00:27:36,799 perfectly understandable but i wanted to 768 00:27:35,200 --> 00:27:38,880 show it to you that this is a really 769 00:27:36,799 --> 00:27:40,240 good way to find load balance issues in 770 00:27:38,880 --> 00:27:43,120 an application 771 00:27:40,240 --> 00:27:45,600 as before you can put things into a into 772 00:27:43,120 --> 00:27:47,360 a file and your commands and here i'm 773 00:27:45,600 --> 00:27:49,679 showing you the filtering 774 00:27:47,360 --> 00:27:51,440 so the red commands select a certain 775 00:27:49,679 --> 00:27:53,279 thread it doesn't have to be one it can 776 00:27:51,440 --> 00:27:56,000 be a range of threads it can be any sort 777 00:27:53,279 --> 00:27:58,640 of arbitrary combination but one what i 778 00:27:56,000 --> 00:28:00,240 want to show here is that for each 779 00:27:58,640 --> 00:28:01,600 thread i want to see the function 780 00:28:00,240 --> 00:28:03,600 profile i don't want to see the 781 00:28:01,600 --> 00:28:05,600 accumulated profile as i was just 782 00:28:03,600 --> 00:28:08,000 showing i want to show it for each 783 00:28:05,600 --> 00:28:09,520 function on an individual basis so i 784 00:28:08,000 --> 00:28:12,000 want to see it with that one two and 785 00:28:09,520 --> 00:28:13,120 three and here's what here's what 786 00:28:12,000 --> 00:28:15,279 happens 787 00:28:13,120 --> 00:28:16,480 i i got the information for thread 788 00:28:15,279 --> 00:28:18,559 number one 789 00:28:16,480 --> 00:28:20,559 and that's confirmed in the the part 790 00:28:18,559 --> 00:28:22,960 that i color-coded red out of the three 791 00:28:20,559 --> 00:28:25,120 i'm selecting thread number one and it 792 00:28:22,960 --> 00:28:27,679 shows me where the time is spent 793 00:28:25,120 --> 00:28:30,080 and likewise you can do the same for 794 00:28:27,679 --> 00:28:31,919 thread number two and number three and 795 00:28:30,080 --> 00:28:33,919 again that will that will tell you on a 796 00:28:31,919 --> 00:28:36,240 thread by thread basis where the time is 797 00:28:33,919 --> 00:28:37,760 spent and that can really help that in 798 00:28:36,240 --> 00:28:40,159 combination with the load balancing 799 00:28:37,760 --> 00:28:42,320 information can really help you to 800 00:28:40,159 --> 00:28:44,080 quickly find out why is that thread so 801 00:28:42,320 --> 00:28:45,120 slow why is it much lower than the other 802 00:28:44,080 --> 00:28:47,760 ones 803 00:28:45,120 --> 00:28:49,679 i i said it several times i 804 00:28:47,760 --> 00:28:51,440 i'm really happy with our comparison of 805 00:28:49,679 --> 00:28:53,679 profiles feature and i now want to show 806 00:28:51,440 --> 00:28:55,520 you a simple example 807 00:28:53,679 --> 00:28:57,760 i'm also using the hardware event 808 00:28:55,520 --> 00:29:00,159 counter support with the dash h option 809 00:28:57,760 --> 00:29:02,320 as shown in this dotted box 810 00:29:00,159 --> 00:29:04,399 and i want to look at last level cache 811 00:29:02,320 --> 00:29:06,480 misses llm 812 00:29:04,399 --> 00:29:08,799 so i'm asking the tool 813 00:29:06,480 --> 00:29:10,320 show me the last level cache misses and 814 00:29:08,799 --> 00:29:11,279 i want to do that for one and two 815 00:29:10,320 --> 00:29:13,440 threads 816 00:29:11,279 --> 00:29:15,600 so now i have two profiles one for one 817 00:29:13,440 --> 00:29:16,880 thread one for two threads and i'd like 818 00:29:15,600 --> 00:29:18,960 to compare 819 00:29:16,880 --> 00:29:20,720 well how do i do that well 820 00:29:18,960 --> 00:29:22,720 again i put things in a script as i 821 00:29:20,720 --> 00:29:25,600 always do and that script is pretty 822 00:29:22,720 --> 00:29:28,000 simple i limit the output to five lines 823 00:29:25,600 --> 00:29:31,679 i only want to look at the the last 824 00:29:28,000 --> 00:29:33,840 level misses and i enable comparison on 825 00:29:31,679 --> 00:29:35,440 so i want to compare 826 00:29:33,840 --> 00:29:37,360 the numbers and that will give me the 827 00:29:35,440 --> 00:29:39,440 table that is shown here 828 00:29:37,360 --> 00:29:40,480 so now for both experiments i get the 829 00:29:39,440 --> 00:29:43,120 values 830 00:29:40,480 --> 00:29:44,159 and you can you can side by side compare 831 00:29:43,120 --> 00:29:47,440 them 832 00:29:44,159 --> 00:29:49,440 so that's very very useful but 833 00:29:47,440 --> 00:29:51,200 on purpose i pick this experiment 834 00:29:49,440 --> 00:29:53,600 because you can also see the numbers are 835 00:29:51,200 --> 00:29:54,640 very large and sometimes it's even hard 836 00:29:53,600 --> 00:29:57,039 to say 837 00:29:54,640 --> 00:29:59,919 to see whether something is smaller or 838 00:29:57,039 --> 00:30:01,120 bigger so we have a flavors of the 839 00:29:59,919 --> 00:30:02,960 comparison 840 00:30:01,120 --> 00:30:03,919 where in a different version of that 841 00:30:02,960 --> 00:30:07,440 script 842 00:30:03,919 --> 00:30:09,440 i asked the tool to show me the ratios 843 00:30:07,440 --> 00:30:11,279 and now i get the ratios and can 844 00:30:09,440 --> 00:30:13,679 immediately see which one is lower and 845 00:30:11,279 --> 00:30:16,000 which one is higher so we see that the 846 00:30:13,679 --> 00:30:18,720 total number for some reason the total 847 00:30:16,000 --> 00:30:20,720 number of last level misses drops when 848 00:30:18,720 --> 00:30:22,559 i'm running on two threads 849 00:30:20,720 --> 00:30:25,600 not really surprised because i'm using 850 00:30:22,559 --> 00:30:30,000 more cash base but that's shown here so 851 00:30:25,600 --> 00:30:31,679 now um i got about a 20 22 reduction 852 00:30:30,000 --> 00:30:33,279 in the number of cash misses so that's 853 00:30:31,679 --> 00:30:35,520 kind of nice nice little bonus of 854 00:30:33,279 --> 00:30:38,640 running in parallel and with the compare 855 00:30:35,520 --> 00:30:40,480 ratio option that's really easy to get 856 00:30:38,640 --> 00:30:42,240 you can also look at deltas by the way i 857 00:30:40,480 --> 00:30:44,799 don't have that on the slide but another 858 00:30:42,240 --> 00:30:47,440 one is to to look at delta 859 00:30:44,799 --> 00:30:48,159 i've shown you several features already 860 00:30:47,440 --> 00:30:51,039 but 861 00:30:48,159 --> 00:30:52,960 there's a lot more to uh to explore i 862 00:30:51,039 --> 00:30:54,559 didn't even talk about the call tree we 863 00:30:52,960 --> 00:30:56,320 have more filters 864 00:30:54,559 --> 00:30:58,080 you can get a lot more information on 865 00:30:56,320 --> 00:31:00,799 the experiments and the conditions like 866 00:30:58,080 --> 00:31:02,399 uh the job you were running there's more 867 00:31:00,799 --> 00:31:05,360 customization 868 00:31:02,399 --> 00:31:07,679 and we have pretty decent support for 869 00:31:05,360 --> 00:31:09,360 hardware event counters i've shown you 870 00:31:07,679 --> 00:31:11,120 one example but for example you can 871 00:31:09,360 --> 00:31:13,679 monitor more than one event i've shown 872 00:31:11,120 --> 00:31:15,600 only one event you can monitor as many 873 00:31:13,679 --> 00:31:16,799 events as the hardware supports for 874 00:31:15,600 --> 00:31:18,880 example 875 00:31:16,799 --> 00:31:21,279 so hopefully this gave you um some sort 876 00:31:18,880 --> 00:31:23,360 of idea of what griffin g is about this 877 00:31:21,279 --> 00:31:25,039 is all available today these examples 878 00:31:23,360 --> 00:31:27,440 were made with the version that's 879 00:31:25,039 --> 00:31:29,440 available in our repo 880 00:31:27,440 --> 00:31:30,720 and the details are in the in the bin 881 00:31:29,440 --> 00:31:33,360 neutrals 882 00:31:30,720 --> 00:31:34,960 archives so what are what are our 883 00:31:33,360 --> 00:31:37,519 current priorities well first of all 884 00:31:34,960 --> 00:31:39,200 fixed bugs if you have a bug we hope not 885 00:31:37,519 --> 00:31:40,880 but if you run into a bug please let us 886 00:31:39,200 --> 00:31:43,039 know we want to fix that as quickly as 887 00:31:40,880 --> 00:31:45,279 possible we also want to help you 888 00:31:43,039 --> 00:31:47,840 getting started 889 00:31:45,279 --> 00:31:49,679 and outside of those two top priorities 890 00:31:47,840 --> 00:31:52,399 for development we're working on a 891 00:31:49,679 --> 00:31:54,720 porting guide we currently support intel 892 00:31:52,399 --> 00:31:56,240 amd and arm 893 00:31:54,720 --> 00:31:58,000 if people want to port to another 894 00:31:56,240 --> 00:31:59,760 platform it's open source we would 895 00:31:58,000 --> 00:32:00,399 encourage that would like to work with 896 00:31:59,760 --> 00:32:02,640 you 897 00:32:00,399 --> 00:32:04,799 on that and we're working on a document 898 00:32:02,640 --> 00:32:05,840 that tells you where changes need to be 899 00:32:04,799 --> 00:32:08,080 made 900 00:32:05,840 --> 00:32:10,159 we're also working on a gui 901 00:32:08,080 --> 00:32:12,840 which is a really good way to display 902 00:32:10,159 --> 00:32:15,840 and analyze that data that really adds 903 00:32:12,840 --> 00:32:18,159 value and we're busy finishing a tool 904 00:32:15,840 --> 00:32:20,799 that generates an html based system to 905 00:32:18,159 --> 00:32:21,919 browse the data in your in your daily 906 00:32:20,799 --> 00:32:23,440 browser 907 00:32:21,919 --> 00:32:26,480 other things we're looking at but we're 908 00:32:23,440 --> 00:32:28,320 definitely open for suggestions is 909 00:32:26,480 --> 00:32:30,399 improved support for hardware event 910 00:32:28,320 --> 00:32:31,919 counters in particular or more recent 911 00:32:30,399 --> 00:32:33,600 processors 912 00:32:31,919 --> 00:32:34,720 uh we'd like to show you some more 913 00:32:33,600 --> 00:32:36,480 metrics 914 00:32:34,720 --> 00:32:38,880 another thing is that we'd like to have 915 00:32:36,480 --> 00:32:40,960 the tool the ability to attach to a 916 00:32:38,880 --> 00:32:43,760 running processor you grab the process 917 00:32:40,960 --> 00:32:45,600 you monitor you let go again so you can 918 00:32:43,760 --> 00:32:46,640 take a snapshot while the program is 919 00:32:45,600 --> 00:32:48,559 running 920 00:32:46,640 --> 00:32:50,480 please send your feedback or anything 921 00:32:48,559 --> 00:32:53,039 you want to discuss with us to the bin 922 00:32:50,480 --> 00:32:55,039 utils alias it's shown here 923 00:32:53,039 --> 00:32:56,399 that's the recommended way to 924 00:32:55,039 --> 00:32:58,159 communicate 925 00:32:56,399 --> 00:33:01,120 last but not least i want to give you a 926 00:32:58,159 --> 00:33:02,159 sneak preview into the gui 927 00:33:01,120 --> 00:33:03,840 we have 928 00:33:02,159 --> 00:33:05,360 quite a number of views including a 929 00:33:03,840 --> 00:33:07,120 flame chart 930 00:33:05,360 --> 00:33:09,440 we show you a timeline that's the 931 00:33:07,120 --> 00:33:11,600 color-coded behavior of the program over 932 00:33:09,440 --> 00:33:13,840 time so you can see the dynamics of the 933 00:33:11,600 --> 00:33:15,200 program this is showing a multi-threaded 934 00:33:13,840 --> 00:33:16,720 application you can look at the 935 00:33:15,200 --> 00:33:18,960 individual threads 936 00:33:16,720 --> 00:33:21,360 you can zoom in to really zoom in on 937 00:33:18,960 --> 00:33:23,039 certain behavior like to intro like to 938 00:33:21,360 --> 00:33:24,640 zoom in on 939 00:33:23,039 --> 00:33:26,559 and of course we have the comparison the 940 00:33:24,640 --> 00:33:29,600 comparison works in the gui as well and 941 00:33:26,559 --> 00:33:32,480 it's a great way to quickly find find 942 00:33:29,600 --> 00:33:34,960 out why run why one run for example is 943 00:33:32,480 --> 00:33:36,799 is slower than another one 944 00:33:34,960 --> 00:33:38,720 so um 945 00:33:36,799 --> 00:33:40,640 i hope this gave you a good overview 946 00:33:38,720 --> 00:33:44,240 first overview of what g profit g is 947 00:33:40,640 --> 00:33:47,600 about and uh let's now um spend time on 948 00:33:44,240 --> 00:33:50,000 answering your questions thank you 949 00:33:47,600 --> 00:33:51,519 thank you so much for that talk rude um 950 00:33:50,000 --> 00:33:53,039 we've got plenty of time now we've got 951 00:33:51,519 --> 00:33:55,200 like 11 minutes to do some more 952 00:33:53,039 --> 00:33:57,519 conversation and questions about it so 953 00:33:55,200 --> 00:33:58,320 i'll let you do your quick talk if you 954 00:33:57,519 --> 00:34:00,559 want 955 00:33:58,320 --> 00:34:02,720 yep all right well thank you okay 956 00:34:00,559 --> 00:34:04,640 um yeah i hope you found it useful we 957 00:34:02,720 --> 00:34:06,960 both vladimir and i are very excited 958 00:34:04,640 --> 00:34:08,639 about this we we know we have a long way 959 00:34:06,960 --> 00:34:10,399 to go there's lots of things that we 960 00:34:08,639 --> 00:34:12,639 want to add and 961 00:34:10,399 --> 00:34:16,000 and it's quite a quite an effort so 962 00:34:12,639 --> 00:34:18,800 definitely as i um as i put on the slide 963 00:34:16,000 --> 00:34:22,000 uh please contact us preferably through 964 00:34:18,800 --> 00:34:24,079 the bin utils male alias that's we have 965 00:34:22,000 --> 00:34:26,240 just been approved for inclusion in bin 966 00:34:24,079 --> 00:34:28,480 neutrals and we're trying to get onto 967 00:34:26,240 --> 00:34:31,520 the next release not the first coming up 968 00:34:28,480 --> 00:34:33,599 release but maybe the next one so about 969 00:34:31,520 --> 00:34:35,520 mid mid this year 970 00:34:33,599 --> 00:34:39,040 and again we really want to hear your 971 00:34:35,520 --> 00:34:40,960 wishes and prioritize them and and 972 00:34:39,040 --> 00:34:44,159 and of course give you give you what 973 00:34:40,960 --> 00:34:46,800 you're asking for um there are two 974 00:34:44,159 --> 00:34:48,720 developments that i wanted to mention um 975 00:34:46,800 --> 00:34:51,359 the gui the gui is 976 00:34:48,720 --> 00:34:53,440 is in fairly good shape um it's not done 977 00:34:51,359 --> 00:34:55,760 yet but we think that with the gui 978 00:34:53,440 --> 00:34:57,440 especially the timeline 979 00:34:55,760 --> 00:34:59,599 the timeline will 980 00:34:57,440 --> 00:35:02,000 will show you color-coded behavior of 981 00:34:59,599 --> 00:35:02,960 your application so every function has a 982 00:35:02,000 --> 00:35:05,040 color 983 00:35:02,960 --> 00:35:07,119 and we show you the color-coded call 984 00:35:05,040 --> 00:35:08,720 stacks as a function of time and and 985 00:35:07,119 --> 00:35:11,760 that'll work in multi-threaded 986 00:35:08,720 --> 00:35:13,839 environments as well so the question was 987 00:35:11,760 --> 00:35:15,520 can i see wait time yes you will 988 00:35:13,839 --> 00:35:18,320 definitely see wait time you literally 989 00:35:15,520 --> 00:35:21,040 see a gap there if if a thread is not 990 00:35:18,320 --> 00:35:23,359 doing anything it'll show up as a gap 991 00:35:21,040 --> 00:35:25,440 now we want to show more metrics 992 00:35:23,359 --> 00:35:27,760 currently we only show the total cpu 993 00:35:25,440 --> 00:35:29,119 time and i'm the first one to admit that 994 00:35:27,760 --> 00:35:31,440 that's not enough 995 00:35:29,119 --> 00:35:33,200 for serious performance analysis so it's 996 00:35:31,440 --> 00:35:34,880 another thing high on the list to give 997 00:35:33,200 --> 00:35:37,599 you more metrics 998 00:35:34,880 --> 00:35:39,760 a little bit unfortunate is that we 999 00:35:37,599 --> 00:35:41,839 we can't always get out of the linux 1000 00:35:39,760 --> 00:35:43,760 kernel what we um 1001 00:35:41,839 --> 00:35:45,680 what we'd like to do but there's always 1002 00:35:43,760 --> 00:35:47,920 room for improvement so again your 1003 00:35:45,680 --> 00:35:49,920 feedback there is is also definitely 1004 00:35:47,920 --> 00:35:52,400 appreciated but i think the timeline 1005 00:35:49,920 --> 00:35:55,040 will help a lot of people to analyze the 1006 00:35:52,400 --> 00:35:56,320 dynamic behavior of their program i use 1007 00:35:55,040 --> 00:35:58,720 it all the time 1008 00:35:56,320 --> 00:36:00,000 in addition to that um and that's my 1009 00:35:58,720 --> 00:36:01,839 this is why i couldn't answer the 1010 00:36:00,000 --> 00:36:03,359 question about ptrace that's like 1011 00:36:01,839 --> 00:36:05,839 vladimir's 1012 00:36:03,359 --> 00:36:08,400 vladimir's territory i'm working on a 1013 00:36:05,839 --> 00:36:10,320 tool to generate an html directory 1014 00:36:08,400 --> 00:36:12,400 structure so you can browse through the 1015 00:36:10,320 --> 00:36:14,640 performance data in your browser you 1016 00:36:12,400 --> 00:36:16,079 don't need anything special you you load 1017 00:36:14,640 --> 00:36:18,560 up the directory 1018 00:36:16,079 --> 00:36:20,800 and patch it in and and 1019 00:36:18,560 --> 00:36:22,000 grab your browser and look at it so 1020 00:36:20,800 --> 00:36:23,440 that's some of the things that we're 1021 00:36:22,000 --> 00:36:24,839 working on now let me look at the 1022 00:36:23,440 --> 00:36:26,720 questions 1023 00:36:24,839 --> 00:36:29,200 um 1024 00:36:26,720 --> 00:36:31,040 let's see 1025 00:36:29,200 --> 00:36:32,320 i think that's more like a 1026 00:36:31,040 --> 00:36:34,640 comment 1027 00:36:32,320 --> 00:36:36,480 to diagnose issues to diagnose issues 1028 00:36:34,640 --> 00:36:38,480 like the directory okay you can all see 1029 00:36:36,480 --> 00:36:43,280 that i uh 1030 00:36:38,480 --> 00:36:43,280 and uh okay yeah yeah 1031 00:36:44,400 --> 00:36:50,480 yeah so that was the comment indeed i 1032 00:36:46,880 --> 00:36:52,839 think that's a comment about when we uh 1033 00:36:50,480 --> 00:36:55,520 when we attach to a running 1034 00:36:52,839 --> 00:36:57,280 process we have some ideas what we can 1035 00:36:55,520 --> 00:36:58,880 do there but right now you only get the 1036 00:36:57,280 --> 00:37:01,280 main process and 1037 00:36:58,880 --> 00:37:03,280 now i i think that's we don't even 1038 00:37:01,280 --> 00:37:05,440 mention this because i think that's a 1039 00:37:03,280 --> 00:37:07,280 non-feature at the moment 1040 00:37:05,440 --> 00:37:09,680 so um 1041 00:37:07,280 --> 00:37:12,480 so yeah the 1042 00:37:09,680 --> 00:37:14,960 um so again attaching to a running 1043 00:37:12,480 --> 00:37:17,200 process is another very very big big 1044 00:37:14,960 --> 00:37:18,880 thing a lot of applications have that 1045 00:37:17,200 --> 00:37:21,599 and you just want to monitor it while 1046 00:37:18,880 --> 00:37:22,560 it's running um maybe for a few minutes 1047 00:37:21,599 --> 00:37:25,599 whatever 1048 00:37:22,560 --> 00:37:28,240 so um yeah we we we recognized that and 1049 00:37:25,599 --> 00:37:31,040 um that's um i would say very yeah 1050 00:37:28,240 --> 00:37:32,320 definitely in the top three 1051 00:37:31,040 --> 00:37:33,920 uh would you like me to read out the 1052 00:37:32,320 --> 00:37:34,640 other open questions in the questions 1053 00:37:33,920 --> 00:37:36,480 tim 1054 00:37:34,640 --> 00:37:39,599 oh let me see let me yeah i can do it 1055 00:37:36,480 --> 00:37:41,359 okay thanks um 1056 00:37:39,599 --> 00:37:44,480 have you run from g prof and g on g 1057 00:37:41,359 --> 00:37:47,119 profangie um yes we do 1058 00:37:44,480 --> 00:37:49,839 we we try to be very performance aware 1059 00:37:47,119 --> 00:37:51,680 of course um we didn't do that recently 1060 00:37:49,839 --> 00:37:53,520 because to be honest we were just so 1061 00:37:51,680 --> 00:37:56,160 focused on getting something out of the 1062 00:37:53,520 --> 00:37:58,720 door but yes um yeah 1063 00:37:56,160 --> 00:38:00,800 overhead that's um yeah that's another 1064 00:37:58,720 --> 00:38:01,920 good um good question 1065 00:38:00,800 --> 00:38:03,680 uh 1066 00:38:01,920 --> 00:38:06,320 usually it all depends on what you're 1067 00:38:03,680 --> 00:38:08,800 doing um just this weekend i got an 1068 00:38:06,320 --> 00:38:11,440 email from somebody i think um his he 1069 00:38:08,800 --> 00:38:13,839 was using 160 threads 1070 00:38:11,440 --> 00:38:17,040 uh or around that number because that's 1071 00:38:13,839 --> 00:38:19,920 what his hardware had and on a very 1072 00:38:17,040 --> 00:38:20,720 intensive java program and he reported 1073 00:38:19,920 --> 00:38:24,240 10 1074 00:38:20,720 --> 00:38:26,079 10 is about what we see as a maximum 1075 00:38:24,240 --> 00:38:28,640 um of course 1076 00:38:26,079 --> 00:38:30,880 it depends on the sampling granularity 1077 00:38:28,640 --> 00:38:34,000 you can put that to low or high 1078 00:38:30,880 --> 00:38:36,400 but i would say um anything more than 10 1079 00:38:34,000 --> 00:38:38,400 would would kind of uh warrant our 1080 00:38:36,400 --> 00:38:39,839 investigation to see what's going on 1081 00:38:38,400 --> 00:38:41,760 yeah 1082 00:38:39,839 --> 00:38:44,880 uh 1083 00:38:41,760 --> 00:38:47,200 about language yeah another the is there 1084 00:38:44,880 --> 00:38:48,720 okay let me answer first uh give the 1085 00:38:47,200 --> 00:38:51,920 question is that thinking about 1086 00:38:48,720 --> 00:38:53,839 languages built on top of c like python 1087 00:38:51,920 --> 00:38:56,640 yeah we would like to do that of course 1088 00:38:53,839 --> 00:38:59,200 um you can't already do some basic 1089 00:38:56,640 --> 00:39:01,200 profiling but you'll you'll get exposed 1090 00:38:59,200 --> 00:39:02,160 to the raw call stacks of whatever you 1091 00:39:01,200 --> 00:39:05,760 do 1092 00:39:02,160 --> 00:39:07,520 um and um but definitely uh 1093 00:39:05,760 --> 00:39:10,400 would be great to have like a tool for 1094 00:39:07,520 --> 00:39:12,480 python or rust or other things so 1095 00:39:10,400 --> 00:39:14,320 um yeah that's also actually where we 1096 00:39:12,480 --> 00:39:16,880 hope the community will help and 1097 00:39:14,320 --> 00:39:19,200 volunteers are always welcome so 1098 00:39:16,880 --> 00:39:20,960 so technically we can 1099 00:39:19,200 --> 00:39:24,240 the infrastructure would support it but 1100 00:39:20,960 --> 00:39:26,160 we need higher high level support 1101 00:39:24,240 --> 00:39:30,079 how does g profit g compared to other 1102 00:39:26,160 --> 00:39:32,000 system level profiling tools like perf 1103 00:39:30,079 --> 00:39:34,560 me being a performance analyst i would 1104 00:39:32,000 --> 00:39:36,480 like to have both we certainly do not 1105 00:39:34,560 --> 00:39:38,480 have any idea to 1106 00:39:36,480 --> 00:39:40,320 compete or replace perth i think perth 1107 00:39:38,480 --> 00:39:41,920 is very strong in certain areas 1108 00:39:40,320 --> 00:39:45,200 especially like the hardware counter 1109 00:39:41,920 --> 00:39:47,839 support and improv is really strong 1110 00:39:45,200 --> 00:39:50,480 we try to also cover other areas i think 1111 00:39:47,839 --> 00:39:52,240 like our display 1112 00:39:50,480 --> 00:39:54,880 display capabilities are maybe a little 1113 00:39:52,240 --> 00:39:56,480 easier than perth but 1114 00:39:54,880 --> 00:39:58,160 the more the better again as a 1115 00:39:56,480 --> 00:39:59,920 performance person i want to have as 1116 00:39:58,160 --> 00:40:01,920 many tools as possible 1117 00:39:59,920 --> 00:40:02,960 so uh yeah 1118 00:40:01,920 --> 00:40:05,200 um 1119 00:40:02,960 --> 00:40:07,440 let me see 1120 00:40:05,200 --> 00:40:08,640 can i get the file name line info so 1121 00:40:07,440 --> 00:40:11,200 that 1122 00:40:08,640 --> 00:40:14,160 i'm not sure can i get the file name 1123 00:40:11,200 --> 00:40:15,520 colon conan line hash info 1124 00:40:14,160 --> 00:40:17,680 so that m 1125 00:40:15,520 --> 00:40:19,599 x next arrow works 1126 00:40:17,680 --> 00:40:21,520 i'm not sure i understand that i'm sorry 1127 00:40:19,599 --> 00:40:23,040 i don't think i understand the question 1128 00:40:21,520 --> 00:40:25,200 that's 1129 00:40:23,040 --> 00:40:26,160 um 1130 00:40:25,200 --> 00:40:27,920 we 1131 00:40:26,160 --> 00:40:30,480 do have 1132 00:40:27,920 --> 00:40:32,960 ideas to integrate 1133 00:40:30,480 --> 00:40:34,800 compiler messages with the source 1134 00:40:32,960 --> 00:40:36,160 the original tool in the studio 1135 00:40:34,800 --> 00:40:38,319 environment had that but that was 1136 00:40:36,160 --> 00:40:40,240 integrated into the studio compiler so 1137 00:40:38,319 --> 00:40:41,119 another project we have on our list is 1138 00:40:40,240 --> 00:40:43,359 to 1139 00:40:41,119 --> 00:40:44,800 give you compiler feedback interleaved 1140 00:40:43,359 --> 00:40:46,319 with the source 1141 00:40:44,800 --> 00:40:47,680 would be great to see what the compiler 1142 00:40:46,319 --> 00:40:50,319 did to your source 1143 00:40:47,680 --> 00:40:52,400 so you can check it and correct it if if 1144 00:40:50,319 --> 00:40:55,040 it doesn't do what you like it to do so 1145 00:40:52,400 --> 00:40:56,720 um but i'm i'm sorry i'm afraid i don't 1146 00:40:55,040 --> 00:40:59,760 really 1147 00:40:56,720 --> 00:40:59,760 understand that question 1148 00:41:00,560 --> 00:41:04,280 and see in the chat 1149 00:41:07,839 --> 00:41:10,160 okay 1150 00:41:15,520 --> 00:41:22,800 i've got two more questions okay yeah 1151 00:41:19,280 --> 00:41:24,079 oh i'm sorry i'm a vi guy um the the 1152 00:41:22,800 --> 00:41:26,640 question here is 1153 00:41:24,079 --> 00:41:30,640 amex next ever refers to emacs compile 1154 00:41:26,640 --> 00:41:32,640 mode i'm i'm sorry i i grew up with vi 1155 00:41:30,640 --> 00:41:35,119 and i never got beyond that even 1156 00:41:32,640 --> 00:41:37,760 switching to vim was already a major 1157 00:41:35,119 --> 00:41:41,520 breakthrough in emotionally 1158 00:41:37,760 --> 00:41:43,119 i'm sorry i'm sorry i i i i i i'm afraid 1159 00:41:41,520 --> 00:41:45,440 i can't help you 1160 00:41:43,119 --> 00:41:45,440 um 1161 00:41:46,240 --> 00:41:49,200 um 1162 00:41:47,200 --> 00:41:51,200 okay the the other question is i guess 1163 00:41:49,200 --> 00:41:53,119 the idea is to keep your performance 1164 00:41:51,200 --> 00:41:55,200 information in some sort of revision 1165 00:41:53,119 --> 00:41:57,280 control so that you can monitor 1166 00:41:55,200 --> 00:42:00,160 performance regressions 1167 00:41:57,280 --> 00:42:02,319 yeah that's currently up to you we we 1168 00:42:00,160 --> 00:42:05,119 we don't provide anything like that you 1169 00:42:02,319 --> 00:42:07,520 know ideally you would use this tool as 1170 00:42:05,119 --> 00:42:09,040 a as a lower level engine to drive those 1171 00:42:07,520 --> 00:42:11,359 kind of things to 1172 00:42:09,040 --> 00:42:13,040 to keep track of of um of like 1173 00:42:11,359 --> 00:42:14,800 regressions 1174 00:42:13,040 --> 00:42:17,040 the tool doesn't doesn't yet support 1175 00:42:14,800 --> 00:42:18,640 that that's that's 1176 00:42:17,040 --> 00:42:21,359 maybe one day 1177 00:42:18,640 --> 00:42:24,720 and yeah 1178 00:42:21,359 --> 00:42:26,560 i do that now myself again in vi 1179 00:42:24,720 --> 00:42:30,760 so 1180 00:42:26,560 --> 00:42:30,760 that's not optimal but 1181 00:42:42,480 --> 00:42:46,240 there's another continuation on the mx 1182 00:42:44,720 --> 00:42:49,040 conversation in the questions yeah i'm 1183 00:42:46,240 --> 00:42:49,040 using yeah 1184 00:42:51,280 --> 00:42:56,640 oh okay okay well okay i just 1185 00:42:54,480 --> 00:42:58,400 you use next arrow to navigate through 1186 00:42:56,640 --> 00:43:00,480 the source so annotating the output 1187 00:42:58,400 --> 00:43:02,640 allows one to get to the source quickly 1188 00:43:00,480 --> 00:43:05,040 vi has support for this as also well 1189 00:43:02,640 --> 00:43:07,359 thank you i learned something about vi 1190 00:43:05,040 --> 00:43:09,680 after all those the years thank you it 1191 00:43:07,359 --> 00:43:11,760 shows what kind of a dinosaur i i 1192 00:43:09,680 --> 00:43:12,800 actually am 1193 00:43:11,760 --> 00:43:15,520 yeah we 1194 00:43:12,800 --> 00:43:19,040 the gui has source navigation i we don't 1195 00:43:15,520 --> 00:43:22,079 have any integration with editors um 1196 00:43:19,040 --> 00:43:23,839 in the html version i try to um you know 1197 00:43:22,079 --> 00:43:25,680 i try to give you some some sort of 1198 00:43:23,839 --> 00:43:26,560 visual guidance about where time is 1199 00:43:25,680 --> 00:43:31,359 spent 1200 00:43:26,560 --> 00:43:31,359 but we we don't have any specific um 1201 00:43:31,920 --> 00:43:37,359 formats for for 1202 00:43:33,920 --> 00:43:38,240 for things like emax or vi 1203 00:43:37,359 --> 00:43:40,800 and 1204 00:43:38,240 --> 00:43:43,520 with the html tool you'll you'll you'll 1205 00:43:40,800 --> 00:43:46,079 get the raw files you can always load 1206 00:43:43,520 --> 00:43:48,000 them yourself into your editor uh you 1207 00:43:46,079 --> 00:43:49,599 can do that of course as well with the 1208 00:43:48,000 --> 00:43:51,760 with the command line 1209 00:43:49,599 --> 00:43:55,920 so i think i don't think we have a lot 1210 00:43:51,760 --> 00:43:55,920 to offer there unfortunately you know 1211 00:44:08,480 --> 00:44:10,480 is there any 1212 00:44:09,599 --> 00:44:12,800 uh 1213 00:44:10,480 --> 00:44:14,160 recent news since you recorded the talk 1214 00:44:12,800 --> 00:44:15,200 that you need to update 1215 00:44:14,160 --> 00:44:18,160 um 1216 00:44:15,200 --> 00:44:21,520 um the one thing is is that yeah things 1217 00:44:18,160 --> 00:44:21,520 the one thing is is that we 1218 00:44:21,680 --> 00:44:27,119 we we we have had a positive review in 1219 00:44:24,880 --> 00:44:28,000 bin newton's 1220 00:44:27,119 --> 00:44:31,040 um 1221 00:44:28,000 --> 00:44:34,000 from ben util so we we hope to be part 1222 00:44:31,040 --> 00:44:36,880 of that later this year um and we're 1223 00:44:34,000 --> 00:44:38,960 targeting maybe the release around mid 1224 00:44:36,880 --> 00:44:40,960 this year around the summer there's one 1225 00:44:38,960 --> 00:44:42,880 coming up earlier but i don't think we 1226 00:44:40,960 --> 00:44:44,400 should force ourselves into it and then 1227 00:44:42,880 --> 00:44:46,880 have all sorts of problems we just need 1228 00:44:44,400 --> 00:44:48,800 a bit more time to be ready for that 1229 00:44:46,880 --> 00:44:51,440 and again if you'd like to communicate 1230 00:44:48,800 --> 00:44:54,079 with us um you can always find this on 1231 00:44:51,440 --> 00:44:55,839 the bin neutrals alias 1232 00:44:54,079 --> 00:44:57,440 that's that's sort of the recommended 1233 00:44:55,839 --> 00:44:59,119 way so others can see the questions as 1234 00:44:57,440 --> 00:45:00,960 well but of course if you drop us an 1235 00:44:59,119 --> 00:45:02,079 email we're not going to ignore you at 1236 00:45:00,960 --> 00:45:04,400 all so 1237 00:45:02,079 --> 00:45:05,440 so absolutely not 1238 00:45:04,400 --> 00:45:07,280 thank you 1239 00:45:05,440 --> 00:45:08,880 all right thank you so much rude uh this 1240 00:45:07,280 --> 00:45:10,720 is a really interesting talk about a new 1241 00:45:08,880 --> 00:45:13,920 tool uh 1242 00:45:10,720 --> 00:45:16,480 our next presentation is at 4 40 pm uh 1243 00:45:13,920 --> 00:45:19,280 aedt and it's tisch with confessions of 1244 00:45:16,480 --> 00:45:20,800 a crypto miner so we'll see you all back 1245 00:45:19,280 --> 00:45:24,920 then thank you so much 1246 00:45:20,800 --> 00:45:24,920 thanks thanks caitlyn thanks