>> Welcome back, everyone. I hope you had a nice, relaxing break. First up after our break, we have Garth Kidd, here to tell us what our Python was up to at 3:00 a.m. Garth straps open source libraries together for a living. His Halloween plan, for the last four years, has been to wire a microcontroller to a cardboard cutout and have it yell "exterminate" at children. You sound like a lot of fun to have as a neighbor. Garth will be taking questions at the end of his time. If you have any questions, pop them in the questions tab in the chat in Venueless and I will be passing your questions along to Garth at the end. All right. Thank you, Garth. Over to you.

GARTH KIDD: Good day, everybody. I'm here to talk about observability with OpenTelemetry and Python. This bit at the beginning doesn't have its own span. At the end, there will be time for questions.

I'm Garth. I'm speaking to you from [Indiscernible] country. I had a popular question on Stack Overflow once and nearly got sued by Apple. I'm your guide today because I've been practicing observability with Honeycomb, OpenCensus and OpenTelemetry.

What's OpenTelemetry? OpenTelemetry is an observability framework for cloud native software. They break it down on their website, but you don't have to read this slide; you can read the next one. OpenTelemetry is software to help you understand your software's performance and behavior. I'll break it down a little more. It's a collection of other people's code, in the form of tools, APIs and SDKs, to help you instrument your code to generate, collect and export telemetry data: traces, metrics and logs, regardless of which language, framework, library, service, database or cloud vendor you're dealing with. The goal is to ship everything except the query platform. It is super ambitious.

Logs, metrics and traces. Before I tell you how to do OpenTelemetry tracing in Python and how to get the best results, I should narrow down what we mean, because we've got a long history with logs and metrics, decades of it; traces, not so long, albeit that this is a very Linux bias right now. That's long enough for us to feel certain about what we mean when we say "logs, metrics or traces," but it's not necessarily long enough that whoever we're talking to agrees. There's a Peter Bourgon post about the three pillars of observability, and the idea that any one of these systems could do all three of the jobs if they just tried hard enough.
I think [Indiscernible] baked that into their definition of the APM segment: if you want to get into that upper right hand corner, you've got to do a lot. I fundamentally disagree. Each one of these systems loves one piece of your observability and tolerates the others at best.

Logging systems love your messages. Everything else is optional. For example, you may send syslog packets and discover they ignore the timestamp, and then, when they fix the timestamp, they're still not paying any attention to the structured data in the protocol. A logging vendor will always keep your message.

Metrics systems love your measurements. They might tolerate metadata, but maybe they insist on it being a number. They might let you add key value pairs and then warn you not to. And then time itself is fuzzy in metrics land. They know when they scraped it, but it's not necessarily part of the data.

And tracing systems love your span context. This is super. Microservices scatter our logs, right? Traces help us join them back together, and they add this extra timestamp that tells us, for sure, whether the number we're looking at is when it started or when it ended.

Each system has its own piece of the puzzle. That might seem academic, but we are now responsible for solving the puzzle. [Laughter]. Welcome to the second wave of DevOps, developers. It's our turn. Our observability, though, is broken up. It is spread across logging and metrics systems. It's spread out in each of them. We can't keep our eye on it all in one place.

Some of this we've done to ourselves. This example here uses string interpolation, and it's spreading the observability out across three messages when it could have used just one. Some of the problem is forced on us. Metrics, for example, have to be separate, and you've kind of got the idea that you might be able to use these key value pairs to tie them back together, but the vendors warn you. Prometheus tells you: please keep the cardinality, the number of unique values, to less than 10, and put those key value pairs only on a few metrics. They say the vast majority of your metrics should have no labels. This means we just can't decide, at random, that we're going to break down 400s by customer ID.

So what OpenTelemetry does for us is bring it all together. We get the span from tracing. We get the key value pairs, the measurements, the metadata from our structured logging, and it's all in one spot.
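[Illustrative sketch, not from the slides: the same information as three interpolated log lines, versus kept together on a single span. The "app." attribute keys are made up; "enduser.id" is a semantic convention key.]

    import logging

    from opentelemetry import trace

    log = logging.getLogger(__name__)
    tracer = trace.get_tracer(__name__)

    # Spread across three interpolated log messages:
    def handle_order_with_logs(order_id, user_id, total):
        log.info("handling order %s", order_id)
        log.info("order %s belongs to user %s", order_id, user_id)
        log.info("order %s total is %.2f", order_id, total)

    # Kept together as one request-scoped event:
    def handle_order_with_a_span(order_id, user_id, total):
        with tracer.start_as_current_span("handle_order") as span:
            span.set_attribute("app.order_id", order_id)
            span.set_attribute("enduser.id", user_id)
            span.set_attribute("app.order_total", total)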
And then, better yet, they couple it with these strong conventions for what they expect to see. And it's not just for the attributes alone; there are these other things called "resources," and I'll get to them. They define top level groups of attributes and resources, and they define which ones they expect to see, which ones they expect to see together, and what the values should look like. The attributes differ request by request. The resources stay the same for any one process, broken down by which host, which lambda, and so on. You've got the HTTP ones: the method, the route, the status code. The network ones, which tell you which IP address was talking to which, from which port to which. The database ones, like the statement. And then some others, including, vitally, the end user ID and the service identity, which help you tie all this back to your business. The code namespace is less helpful than you'd think.

They insist these all get put together, so you can rely on being able to break down your observability by any of these and filter by any of the others, and you get all of these for free from code that somebody else wrote, so nobody's leaving them out. You can rely on it all being there, and this gives you that happy spot in the middle of Peter Bourgon's diagram, the happy combination of the data from tracing, logging and metrics. We've got the request-scoped events he talked about, and that is what you get from OpenTelemetry tracing.

Excuse me. So, now we know what we're talking about: OpenTelemetry tracing. Let's talk about how to do it in Python. Like I was saying, it's an observability framework for cloud native software, a collection of other people's code. For this talk, I'm going to be concentrating on the API surface, on how you instrument your code for observability. Python is stable for tracing, alpha for metrics and not implemented at all for logging, and for the examples I'm going to be concentrating on Flask and one use of requests.

If you maintain an open source package and you really get enthusiastic about this, you can take a dependency on the API and start calling it from your package, so it's pre-integrated with OpenTelemetry, and that'll save everybody else monkey patching it later. It comes with a no-operation backend that does literally nothing, so you can call it safe in the knowledge that nothing bad will happen. The SDK does the heavy lifting. The exporter sends it to somebody else. And we've got these two instrumentation packages, for Flask and requests.

Here's a simple Python example. I'm going to dig in a little on PEP 343 context managers; that's the thing behind the "with" statement.
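[Illustrative sketch, not the slide: a span created with a "with" statement, using the public opentelemetry-api surface. The span name and the "app." attribute key are made up.]

    from opentelemetry import trace
    from opentelemetry.trace import Status, StatusCode

    tracer = trace.get_tracer(__name__)

    def convert(image_name):
        # The "with" statement starts the span and ends it when the block exits.
        with tracer.start_as_current_span("convert-image") as span:
            span.set_attribute("app.image.name", image_name)  # made-up key
            # ... do the actual work here ...
            span.set_status(Status(StatusCode.OK))  # optional; unset reads as okay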
The "with" statement calls somebody and gets an object, then calls into the object to do work at the beginning using a dunder (thanks, Joachim) called "enter." It gets a variable to use during the block and tidies up at the end by calling "exit." Here we're creating a span, setting attributes and then setting the status when we're done, though that's optional; it defaults to okay.

Most of the code, though, is not going to be created by you. It's going to be created by an integration written by somebody else. Let's look at Flask. A bit of a disclaimer here: I'm a Flask beginner. I had no idea how the app worked until Vibhu's talk just now. Let's add tracing. I've shoved most of it in a different file, but the point I really want to make here is that you can get good results without touching every function you wrote. You hook up the automatic instrumentation at the beginning.

The hookup code, though, looks like this. Don't panic. It's a whole bunch of boilerplate: import some modules, create a few objects, hook them together and run the monkey patching code. It uses wrapt to help it reach into Flask. If you uninstrument, it quietly pulls all the code back out. It's scary. Anyway, if that's too much work, you can use a vendor distribution. I'm pretty sure all the vendors will end up shipping one for you.

Either way, once you've hooked up all of the automatic instrumentation, you've got free attributes. I'm sorry, I got lost. Having all these semantic attributes for free, let's add our business concerns. This is awful code; call out to Simon. I don't know whether using the ast library to parse a request variable is a great idea, but let's have some fun. One point I want to make here is that you're not seeing any code to set the span status. That got done for us, and I wouldn't go to any extra effort to do it myself. All we have to do is the line the orange arrow points to: get the current span and add an attribute that ties back to our business. That's all you've got to do. And what we get for that is that mixture of all of the automatic stuff with our stuff in the one observability structure. Now we can relate our business concerns to the rest.
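[Illustrative sketch, not the slide: the rough shape of that hookup boilerplate plus a handler adding a business attribute. The package and class names are from a recent opentelemetry-python release; the console exporter, the route and the "app.page_id" key are assumptions.]

    from flask import Flask
    import requests

    from opentelemetry import trace
    from opentelemetry.instrumentation.flask import FlaskInstrumentor
    from opentelemetry.instrumentation.requests import RequestsInstrumentor
    from opentelemetry.sdk.resources import Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

    # Boilerplate: create a provider, attach an exporter, make it the global one.
    provider = TracerProvider(resource=Resource.create({"service.name": "pages"}))
    provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)

    app = Flask(__name__)
    FlaskInstrumentor().instrument_app(app)  # monkey patches the request lifecycle
    RequestsInstrumentor().instrument()      # and outbound calls made with requests

    @app.route("/pages/<page_id>")
    def get_page(page_id):
        # The Flask span already exists; hang our business metadata off it.
        trace.get_current_span().set_attribute("app.page_id", page_id)
        requests.get("https://example.org/")  # shows up as a child HTTP client span
        return "ok"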
Something goes wrong, though. Let's say something crashes out. The only special handling here is for Flask, so we get a status 400. The OpenTelemetry handling of exceptions is automatic. And what you get is an extra part of the observability structure reflecting that there was an error, plus an event to give you all the detail you could want there, and most importantly, for my money, it keeps everything else intact, along with whatever other metrics and measurements.

Let's add some counters. We'll use another context manager; we're using "with" again. The context manager is maintaining a counter that we can increment. So, like I was saying, it does work at the beginning, gives you a value and tidies up at the end, and so we create the counter on the orange line and then we use it in that "pages_out += 1" thing there. If it's too much magic for you, use try/finally.

The implementation of the context manager looks like this, and it's not far from the "hello world" for context managers. The dunders are all here. There's your "enter," which is the work at the beginning and providing the variable. There's your "exit," and we return False. One thing I should call out here is that the OpenTelemetry teams are very careful to avoid ever crashing, so you might feel like you need to defensively wrap this in another [Indiscernible], but you can probably get away without it. And then "iadd" is the other dunder method, which lets you do the incrementing.

A context manager to add measurements looks like this, and it's using a different style of context manager, where it wraps a generator. It takes the first result from it and treats that first block of code as the "enter." It treats what you yielded as the context variable, and then it re-enters the generator and treats anything that happens after the yield as your "exit." It's [Indiscernible] fun, but like I said, this is boilerplate; you just get to call it later, and there will be some code that shows off [Indiscernible].

Consider adding baggage. It goes alongside your spans, propagating by itself. It's another collection of key value pairs that gets sent to other services in the request headers, like you can see at the bottom of the slide there. It gets copied to the span attributes by you; it does not happen automatically. So it's a little bit of extra work, but it's super valuable, because otherwise you'll hit one of these database spans and the database span doesn't have the end user ID you need to track down the user who's doing the thing, right? So the code for baggage looks like this. The API is a bit awkward to me, but that's okay. Context managers, again, to the rescue.
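[Illustrative sketch, not the slide: one way to write that baggage helper, using the generator style of context manager just described. The "with_baggage" name is made up; "enduser.id" is the semantic convention key mentioned here.]

    from contextlib import contextmanager

    from opentelemetry import baggage, context, trace

    @contextmanager
    def with_baggage(key, value):
        # Attach a context carrying the baggage entry, copy it onto the current
        # span by hand (that part is not automatic), and detach when we're done.
        token = context.attach(baggage.set_baggage(key, value))
        trace.get_current_span().set_attribute(key, value)
        try:
            yield
        finally:
            context.detach(token)

    # Usage inside a request handler might look something like:
    #     with with_baggage("enduser.id", user_id):
    #         call_downstream_services()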
We can define one and then, with the baggage context of the end user ID, we can go off and do stuff with our span, and that all works. I bet you could use Christopher's talk. I also bet you could write Flask middleware to do this, but I'm a beginner here and I couldn't pull any of those things off. It's your call.

But that's all you need. Most of the work is in boilerplate and helper methods, and then you just sprinkle a few extra bits of code across your request handlers to help you tie your metadata and your measurements into the observability that's being output. Here, you can see the duration... oh, I'm sorry, I forgot to modify the code on that one. You get all the observability you need and you can just enjoy the results.

All right. So that's how we do OpenTelemetry tracing. But I should give you some overall advice for how to get the best results on your journey from nothing to, you know, observability. If it's a green field opportunity, I'd [Indiscernible] metrics and logging first. You can derive metrics and logs from scoped events, but nobody cares. Then, instrument your inbound and your outbound requests; you've got 33 automatic integrations available to you. If you have to write any, please contribute them back.

Add resources to make sure you can break down by release, branch, commit hash, whatever else is relevant, and, quick as a flash, cashing in your [Indiscernible] wins, send it somewhere. Ideally, you send it to some dynamic dashboard, like Honeycomb. They argue you need a column store so you can query it fast without any indexing, which would, given the dimensionality we're talking about, give you write amplification. Do check the dimensionality and the cardinality impact. You don't want to tag by user ID and have it cost you $4,500 a month. You can send up to 100 kilobytes per event, and they advise you to keep stuffing those attributes in. Look at it every time you ship; you need to be familiar with it to rely on it in an emergency.

Then loop back and start adding attributes to your outer spans. You want to double check you're getting the semantic span attributes and resources, contribute those resources upstream if you need to, and add your measurements and, most importantly, your metadata. You want your end user ID, you want an account ID, you want a project ID; anything else that is relevant to your business needs to be in there so you can break down by it later.
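[Illustrative sketch for the "add resources" advice above: release information attached as resources when the provider is built. "service.name", "service.version" and "deployment.environment" are semantic convention keys; the "app.release." ones are made up for your own breakdowns.]

    from opentelemetry import trace
    from opentelemetry.sdk.resources import Resource
    from opentelemetry.sdk.trace import TracerProvider

    resource = Resource.create({
        "service.name": "pages",
        "service.version": "1.4.2",
        "deployment.environment": "prod",
        "app.release.branch": "main",      # made-up keys
        "app.release.commit": "abc1234",
    })
    trace.set_tracer_provider(TracerProvider(resource=resource))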
Once you've finished that, add more attributes. These are from Honeycomb. If you watched the presentation on PyPI and you want to see how the other community lives, you can climb into Honeycomb and go slicing and dicing. It's a heap of fun. If you're adding everything that's relevant to you, you will have more than this. Seriously, add even more attributes to your spans. This was from one of my services; it was an image conversion package I put in an Amazon Lambda function. The release information would be in resources, except the age in days, which would have to be dynamic. You can see I'm counting all of the input images, all of the output images, how long it took, and this is still not enough. That's it, it's 50 columns. It's not really feeling so log-y anymore, is it? Don't worry. Don't panic. Your existing skills will get you here.

And then, like I was saying earlier, you probably want to propagate baggage. The trace context is free and ties all of these together, but if you want those interior spans to have your business metadata, the end user ID, the account ID or the project ID, baggage is definitely worth it. You know, you might literally not find the trace to look at without the baggage, querying just the events you're getting from those database queries.

Excuse me. And then, I guess finally, for the, you know, stern warnings based on bitter experience: don't lean too hard on tracing. Like, don't try to fill in gaps in your trace waterfalls. Don't wrap every function call in a span. Don't write separate spans for your controller layer and a bunch of separate spans for your model layer. In short, don't trace like you log, otherwise you'll end up staring at it like your logs, and it's not that effective. Instead, summarize what's going on inside all of those and put it on the attributes of the outermost span. When you see that suspicious bit at the beginning and the suspicious bit at the end, don't panic. We know some of it's decoding and encoding; that's what measurements are for.
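[Illustrative sketch of that last point: a measurement-style context manager that records a phase's duration as an attribute on the current span instead of opening a child span for it. The helper name and the "app." key are made up.]

    import time
    from contextlib import contextmanager

    from opentelemetry import trace

    @contextmanager
    def measure_ms(attribute_name):
        # Time the block and record it on the *current* span as an attribute.
        start = time.monotonic()
        try:
            yield
        finally:
            elapsed_ms = (time.monotonic() - start) * 1000.0
            trace.get_current_span().set_attribute(attribute_name, elapsed_ms)

    # Usage: summarize the decode phase without a separate span for it.
    #     with measure_ms("app.decode_duration_ms"):
    #         decode(payload)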
If you are worried about the fire hose of information that's going off to the destination here, don't panic. You can sample it if it gets overwhelming. Because all the observability is together in one structure, it's representative. Genuinely, all of those things did happen together, so if you take that one event and throw away the next nine, that's okay. Or, if it's from your load balancer's health checks, you can throw away 99 in 100. However you do it, you end up with operationally useful observability still coming out, and the vendors at the other end of this can compensate. You can crank the volume up and down based on the bill at the other end, and you'll still be able to enjoy the results.

You can do the sampling in two different places. The OpenTelemetry API, rather, the SDK, has this interior processing pipeline, which you configured in that boilerplate at the beginning, and you can configure sampling there, either explicitly in your code or by setting a couple of environment variables. Or you can do it out of process: you might be running the OpenTelemetry collector, the one shipped by the project, and it can also be configured to do this kind of sampling. Remember, the surviving events are completely intact and you can make perfectly useful decisions based on them. Most importantly, look at it every time you ship. You might find something.

Anyway, we're done. This is the bit at the end with no span. I've talked to you about logs, metrics and tracing, about how to do OpenTelemetry-style tracing in Python, and given you a little bit of advice on how to get the best results. I hope all of this helps you know what your Python was up to at 3:00 in the morning. Thank you very much.

>> [Indiscernible], Garth. I was on mute. It was bound to happen. Thank you so much for that, Garth. You've certainly opened up a whole new world for me. I'll have to have a serious think about that. Okay. So, we have a question. But we have three minutes, so there's time for a few questions. If you are watching this, please go put questions in the question tab now. All right.

GARTH KIDD: Thank you, brave questioner.

>> Yes, it's always very brave of the person who starts us off. All right. Our first question is: if you already have metrics, in what scenarios would you want to set counter or duration attributes on a span, if they've understood the use case correctly?

GARTH KIDD: Yeah. Good question. I think, mainly, if you've got metrics and you're already getting all the answers you ever need from them, first, you can just stop there. That's fine.
If you find yourself ever baffled by the relationship between what's happening on one graph and what's happening on another graph, though, and you can't relate those to each other, then that's exactly the time you would want to go and look at the OpenTelemetry tracing data to see if you can figure out what's going on. Especially if, like I was saying, you've left a lot of the pertinent detail that would help you tie those things together out of the labels or tags that you're putting on your metrics, for budget reasons.

I think the number that is most going to drive you over to your OpenTelemetry data is latency. Those duration_ms columns are just super. But, I mean, I've also found, like, the logarithm of the number of seconds since launch is a wonderful one; I'm sure you can put that on your metrics as well. If you whack that on a graph and break down by everything else, you can correlate problems with when something rebooted or relaunched. I hope I answered the question.

>> You definitely answered the question that I was going to ask, which was, like, at what point would you want to move on to trying this. Yeah.

GARTH KIDD: Flipping through dashboards or glazing your way through logs. Yeah.

[Laughter].

>> I think we've all been there. All right. So, quickly, can you quickly recap what a span is versus what baggage is?

GARTH KIDD: Oh. Right. So, um, spans are the data structures we end up sending to the machinery we're querying with, and you've got a span context, which is basically an index into that: it's got your trace ID, the parent span ID and the span ID. We send the span context and baggage as headers to let somebody else know how to join in on an existing trace. But the baggage is a separate key value store; you could shove anything you want in there, whereas anything you put in the span is definitely going to get sent to the machinery at the end, unless some machinery somewhere in the middle strips it out. That aside, yeah, I might head to the discussion room, I'll probably go for room number two, and try to unpack it more later. I haven't thought it through enough to answer that one.

>> Putting you right on the spot with all these questions. We are out of time, but there is one more question there. So, would you prefer me to send people over to the video hallway chat?

GARTH KIDD: Yeah, I'll go to hallway chat number two.

>> Okay.
Video hallway chat number two. I'll go and drop the last question, the one we didn't have time for, in the text chat over there in a few minutes... oh, and another one's popped up. Head on over to the video hallway chat, room number two, to chat with Garth after this talk. Okay. We're just going to have a short break... oh, thank you again, Garth. That was great. We're going to have a short break and our next talk will begin in 14 minutes. We'll see you all then.