>> Welcome back, everyone. I hope you had a nice, relaxing break. First up after our break, we have Garth Kidd, here to tell us what our Python was up to at 3:00 a.m. Garth straps open source libraries together for a living. His Halloween plan, for the last four years, has been to wire a microcontroller to a cardboard cutout and have it yell "exterminate" at children. You sound like a lot of fun to have as a neighbor. Garth will be taking questions at the end of his time. If you have any questions, pop them in the questions tab in the chat in Venueless and I will be passing your questions along to Garth at the end. All right. Thank you, Garth. Over to you.

GARTH KIDD: Good day, everybody. I'm here to talk about observability with OpenTelemetry and Python. This bit at the beginning doesn't have its own span. At the end, there will be time for questions.

I'm Garth. I'm speaking to you from [Indiscernible] country. I had a popular question on Stack Overflow once and nearly got sued by Apple. I'm your guide today because I've been practicing observability with Honeycomb, OpenCensus and OpenTelemetry.

What's OpenTelemetry? OpenTelemetry is an observability framework for cloud native software. They break it down on their website, but you don't have to read this slide; you can read the next one. OpenTelemetry is software to help you understand your software's performance and behavior. I'll break it down a little more. It's a collection of other people's code, in the form of tools, APIs and SDKs, to help you instrument your code to generate, collect and export telemetry data: traces, metrics and logs, regardless of which language, framework, library, service, database or cloud vendor you're dealing with. The goal is to ship everything except the query platform. It is super ambitious.

Logs, metrics and traces. Before I tell you how to do OpenTelemetry tracing in Python and how to get the best results, I should narrow down what we mean, because we've got a long history with logs and metrics, decades of it; traces, not so long, albeit that this is a very Linux bias right now. That's long enough for us to feel certain about what we mean when we say "logs, metrics or traces," but it's not necessarily long enough that whoever we're talking to agrees. There's a Peter Bourgon post about the three pillars of observability, and the idea that any one of these systems could do all three of the jobs if they just tried hard enough.
I think [Indiscernible] baked that into their definition of the APM segment: if you want to get into that upper right hand corner, you've got to do a lot. I fundamentally disagree. Each one of these systems loves one piece of your observability and tolerates the others at best.

Logging systems love your messages. Everything else is optional. For example, you may send syslog packets and discover they ignore the timestamp, and then, when they fix the timestamp, they're still not paying any attention to the structured data in the protocol. A logging vendor will always keep your message.

Metrics systems love your measurements. They might tolerate metadata, but maybe they insist on it being a number. They might let you add key value pairs and then warn you not to. And then time itself is fuzzy in metrics land. They know when they scraped it, but it's not necessarily part of the data.

And tracing systems love your span context. This is super. Microservices scatter our logs, right? Traces help us join them back together, and they add this extra timestamp that tells us, for sure, whether the number we're looking at is when it started or when it ended.

Each system has its own piece of the puzzle. That might seem academic, but we are now responsible for solving the puzzle. [Laughter]. Welcome to the second wave of DevOps, developers. It's our turn. Our observability, though, is broken up. It is spread across logging and metrics systems. It's spread out in each of them. We can't keep our eye on it all in one place.

Some of this we've done to ourselves. This example here uses string interpolation, and it's spreading the observability out across three messages when it could have used just one. Some of the problem is forced on us. Metrics, for example, have to be separate, and you've kind of got the idea that you might be able to use these key value pairs to tie them back together, but the vendors warn you. Prometheus tells you: please keep the cardinality, the number of unique values, to less than 10, and put those key value pairs only on a few metrics. They say the vast majority of your metrics should have no labels. This means we just can't decide, at random, that we're going to break down 400s by customer ID.

So what OpenTelemetry does for us is bring it all together. We get the span from tracing. We get the key value pairs, the measurements, the metadata from our structured logging, and it's all in one spot.
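[Illustrative sketch, not from the slides: the same information as three interpolated log lines, versus kept together on a single span. The "app." attribute keys are made up; "enduser.id" is a semantic convention key.]

    import logging

    from opentelemetry import trace

    log = logging.getLogger(__name__)
    tracer = trace.get_tracer(__name__)

    # Spread across three interpolated log messages:
    def handle_order_with_logs(order_id, user_id, total):
        log.info("handling order %s", order_id)
        log.info("order %s belongs to user %s", order_id, user_id)
        log.info("order %s total is %.2f", order_id, total)

    # Kept together as one request-scoped event:
    def handle_order_with_a_span(order_id, user_id, total):
        with tracer.start_as_current_span("handle_order") as span:
            span.set_attribute("app.order_id", order_id)
            span.set_attribute("enduser.id", user_id)
            span.set_attribute("app.order_total", total)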
And then, better yet, they couple it with these strong conventions for what they expect to see. And it's not just for the attributes alone; there are these other things called "resources," and I'll get to them. They define top level groups of attributes and resources, and they define which ones they expect to see, which ones they expect to see together, and what the values should look like. The attributes differ request by request. The resources stay the same for any one process, broken down by which host, which lambda, and so on. You've got the HTTP ones: the method, the route, the status code. The network ones, which tell you which IP address was talking to which, from which port to which. The database ones, like the statement. And then some others, including, vitally, the end user ID and the service identity, which help you tie all this back to your business. The code namespace is less helpful than you'd think.

They insist these all get put together, so you can rely on being able to break down your observability by any of these and filter by any of the others, and you get all of these for free from code that somebody else wrote, so nobody's leaving them out. You can rely on it all being there, and this gives you that happy spot in the middle of Peter Bourgon's diagram, the happy combination of the data from tracing, logging and metrics. We've got the request-scoped events he talked about, and that is what you get from OpenTelemetry tracing.

Excuse me. So, now we know what we're talking about: OpenTelemetry tracing. Let's talk about how to do it in Python. Like I was saying, it's an observability framework for cloud native software, a collection of other people's code. For this talk, I'm going to be concentrating on the API surface, on how you instrument your code for observability. Python is stable for tracing, alpha for metrics and not implemented at all for logging, and for the examples I'm going to be concentrating on Flask and one use of requests.

If you maintain an open source package and you really get enthusiastic about this, you can take a dependency on the API and start calling it from your package, so it's pre-integrated with OpenTelemetry, and that'll save everybody else monkey patching it later. It comes with a no-operation backend that does literally nothing, so you can call it safe in the knowledge that nothing bad will happen. The SDK does the heavy lifting. The exporter sends it to somebody else. And we've got these two instrumentation packages, for Flask and requests.

Here's a simple Python example. I'm going to dig in a little on PEP 343 context managers; that's the thing behind the "with" statement.
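[Illustrative sketch, not the slide: a span created with a "with" statement, using the public opentelemetry-api surface. The span name and the "app." attribute key are made up.]

    from opentelemetry import trace
    from opentelemetry.trace import Status, StatusCode

    tracer = trace.get_tracer(__name__)

    def convert(image_name):
        # The "with" statement starts the span and ends it when the block exits.
        with tracer.start_as_current_span("convert-image") as span:
            span.set_attribute("app.image.name", image_name)  # made-up key
            # ... do the actual work here ...
            span.set_status(Status(StatusCode.OK))  # optional; unset reads as okay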
The "with" statement calls somebody and gets an object, then calls into the object to do work at the beginning using a dunder (thanks, Joachim) called "enter." It gets a variable to use during the block and tidies up at the end by calling "exit." Here we're creating a span, setting attributes and then setting the status when we're done, though that's optional; it defaults to okay.

Most of the code, though, is not going to be created by you. It's going to be created by an integration written by somebody else. Let's look at Flask. A bit of a disclaimer here: I'm a Flask beginner. I had no idea how the app worked until Vibhu's talk just now. Let's add tracing. I've shoved most of it in a different file, but the point I really want to make here is that you can get good results without touching every function you wrote. You hook up the automatic instrumentation at the beginning.

The hookup code, though, looks like this. Don't panic. It's a whole bunch of boilerplate: import some modules, create a few objects, hook them together and run the monkey patching code. It uses wrapt to help it reach into Flask. If you uninstrument, it quietly pulls all the code back out. It's scary. Anyway, if that's too much work, you can use a vendor distribution. I'm pretty sure all the vendors will end up shipping one for you.

Either way, once you've hooked up all of the automatic instrumentation, you've got free attributes. I'm sorry, I got lost. Having all these semantic attributes for free, let's add our business concerns. This is awful code; call out to Simon. I don't know whether using the ast library to parse a request variable is a great idea, but let's have some fun. One point I want to make here is that you're not seeing any code to set the span status. That got done for us, and I wouldn't go to any extra effort to do it myself. All we have to do is the line the orange arrow points to: get the current span and add an attribute that ties back to our business. That's all you've got to do. And what we get for that is that mixture of all of the automatic stuff with our stuff in the one observability structure. Now we can relate our business concerns to the rest.
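[Illustrative sketch, not the slide: the rough shape of that hookup boilerplate plus a handler adding a business attribute. The package and class names are from a recent opentelemetry-python release; the console exporter, the route and the "app.page_id" key are assumptions.]

    from flask import Flask
    import requests

    from opentelemetry import trace
    from opentelemetry.instrumentation.flask import FlaskInstrumentor
    from opentelemetry.instrumentation.requests import RequestsInstrumentor
    from opentelemetry.sdk.resources import Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

    # Boilerplate: create a provider, attach an exporter, make it the global one.
    provider = TracerProvider(resource=Resource.create({"service.name": "pages"}))
    provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)

    app = Flask(__name__)
    FlaskInstrumentor().instrument_app(app)  # monkey patches the request lifecycle
    RequestsInstrumentor().instrument()      # and outbound calls made with requests

    @app.route("/pages/<page_id>")
    def get_page(page_id):
        # The Flask span already exists; hang our business metadata off it.
        trace.get_current_span().set_attribute("app.page_id", page_id)
        requests.get("https://example.org/")  # shows up as a child HTTP client span
        return "ok"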
Something goes wrong, though. Let's say something crashes out. The only special handling here is for Flask, so we get a status 400. The OpenTelemetry handling of exceptions is automatic. And what you get is an extra part of the observability structure reflecting that there was an error, plus an event to give you all the detail you could want there, and most importantly, for my money, it keeps everything else intact, along with whatever other metrics and measurements.

Let's add some counters. We'll use another context manager; we're using "with" again. The context manager is maintaining a counter that we can increment. So, like I was saying, it does work at the beginning, gives you a value and tidies up at the end, and so we create the counter on the orange line and then we use it in that "pages_out += 1" thing there. If it's too much magic for you, use try/finally.

The implementation of the context manager looks like this, and it's not far from the "hello world" for context managers. The dunders are all here. There's your "enter," which is the work at the beginning and providing the variable. There's your "exit," and we return False. One thing I should call out here is that the OpenTelemetry teams are very careful to avoid ever crashing, so you might feel like you need to defensively wrap this in another [Indiscernible], but you can probably get away without it. And then "iadd" is the other dunder method, which lets you do the incrementing.

A context manager to add measurements looks like this, and it's using a different style of context manager, where it wraps a generator. It takes the first result from it and treats that first block of code as the "enter." It treats what you yielded as the context variable, and then it re-enters the generator and treats anything that happens after the yield as your "exit." It's [Indiscernible] fun, but like I said, this is boilerplate; you just get to call it later, and there will be some code that shows off [Indiscernible].

Consider adding baggage. It goes alongside your spans, propagating by itself. It's another collection of key value pairs that gets sent to other services in the request headers, like you can see at the bottom of the slide there. It gets copied to the span attributes by you; it does not happen automatically. So it's a little bit of extra work, but it's super valuable, because otherwise you'll hit one of these database spans and the database span doesn't have the end user ID you need to track down the user who's doing the thing, right? So the code for baggage looks like this. The API is a bit awkward to me, but that's okay. Context managers, again, to the rescue.
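[Illustrative sketch, not the slide: one way to write that baggage helper, using the generator style of context manager just described. The "with_baggage" name is made up; "enduser.id" is the semantic convention key mentioned here.]

    from contextlib import contextmanager

    from opentelemetry import baggage, context, trace

    @contextmanager
    def with_baggage(key, value):
        # Attach a context carrying the baggage entry, copy it onto the current
        # span by hand (that part is not automatic), and detach when we're done.
        token = context.attach(baggage.set_baggage(key, value))
        trace.get_current_span().set_attribute(key, value)
        try:
            yield
        finally:
            context.detach(token)

    # Usage inside a request handler might look something like:
    #     with with_baggage("enduser.id", user_id):
    #         call_downstream_services()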
We can define one and then, with the baggage context of the end user ID, we can go off and do stuff with our span, and that all works. I bet you could use Christopher's talk. I also bet you could write Flask middleware to do this, but I'm a beginner here and I couldn't pull any of those things off. It's your call.

But that's all you need. Most of the work is in boilerplate and helper methods, and then you just sprinkle a few extra bits of code across your request handlers to help you tie your metadata and your measurements into the observability that's being output. Here, you can see the duration... oh, I'm sorry, I forgot to modify the code on that one. You get all the observability you need and you can just enjoy the results.

All right. So that's how we do OpenTelemetry tracing. But I should give you some overall advice for how to get the best results on your journey from nothing to, you know, observability. If it's a green field opportunity, I'd [Indiscernible] metrics and logging first. You can derive metrics and logs from scoped events, but nobody cares. Then, instrument your inbound and your outbound requests; you've got 33 automatic integrations available to you. If you have to write any, please contribute them back.

Add resources to make sure you can break down by release, branch, commit hash, whatever else is relevant, and, quick as a flash, cashing in your [Indiscernible] wins, send it somewhere. Ideally, you send it to some dynamic dashboard, like Honeycomb. They argue you need a column store so you can query it fast without any indexing, which would, given the dimensionality we're talking about, give you write amplification. Do check the dimensionality and the cardinality impact. You don't want to tag by user ID and have it cost you $4,500 a month. You can send up to 100 kilobytes per event, and they advise you to keep stuffing those attributes in. Look at it every time you ship; you need to be familiar with it to rely on it in an emergency.

Then loop back and start adding attributes to your outer spans. You want to double check you're getting the semantic span attributes and resources, contribute those resources upstream if you need to, and add your measurements and, most importantly, your metadata. You want your end user ID, you want an account ID, you want a project ID; anything else that is relevant to your business needs to be in there so you can break down by it later.
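[Illustrative sketch for the "add resources" advice above: release information attached as resources when the provider is built. "service.name", "service.version" and "deployment.environment" are semantic convention keys; the "app.release." ones are made up for your own breakdowns.]

    from opentelemetry import trace
    from opentelemetry.sdk.resources import Resource
    from opentelemetry.sdk.trace import TracerProvider

    resource = Resource.create({
        "service.name": "pages",
        "service.version": "1.4.2",
        "deployment.environment": "prod",
        "app.release.branch": "main",      # made-up keys
        "app.release.commit": "abc1234",
    })
    trace.set_tracer_provider(TracerProvider(resource=resource))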
Once you've finished that, add more attributes. These are from Honeycomb. If you watched the presentation on PyPI and you want to see how the other community lives, you can climb into Honeycomb and go slicing and dicing. It's a heap of fun. If you're adding everything that's relevant to you, you will have more than this. Seriously, add even more attributes to your spans. This was from one of my services; it was an image conversion package I put in an Amazon Lambda function. The release information would be in resources, except the age in days, which would have to be dynamic. You can see I'm counting all of the input images, all of the output images, how long it took, and this is still not enough. That's it, it's 50 columns. It's not really feeling so log-y anymore, is it? Don't worry. Don't panic. Your existing skills will get you here.

And then, like I was saying earlier, you probably want to propagate baggage. The trace context is free and ties all of these together, but if you want those interior spans to have your business metadata, the end user ID, the account ID or the project ID, baggage is definitely worth it. You know, you might literally not find the trace to look at without the baggage, querying just the events you're getting from those database queries.

Excuse me. And then, I guess finally, for the, you know, stern warnings based on bitter experience: don't lean too hard on tracing. Like, don't try to fill in gaps in your trace waterfalls. Don't wrap every function call in a span. Don't write separate spans for your controller layer and a bunch of separate spans for your model layer. In short, don't trace like you log, otherwise you'll end up staring at it like your logs, and it's not that effective. Instead, summarize what's going on inside all of those and put it on the attributes of the outermost span. When you see that suspicious bit at the beginning and the suspicious bit at the end, don't panic. We know some of it's decoding and encoding; that's what measurements are for.
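[Illustrative sketch of that last point: a measurement-style context manager that records a phase's duration as an attribute on the current span instead of opening a child span for it. The helper name and the "app." key are made up.]

    import time
    from contextlib import contextmanager

    from opentelemetry import trace

    @contextmanager
    def measure_ms(attribute_name):
        # Time the block and record it on the *current* span as an attribute.
        start = time.monotonic()
        try:
            yield
        finally:
            elapsed_ms = (time.monotonic() - start) * 1000.0
            trace.get_current_span().set_attribute(attribute_name, elapsed_ms)

    # Usage: summarize the decode phase without a separate span for it.
    #     with measure_ms("app.decode_duration_ms"):
    #         decode(payload)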
If you are worried about the fire hose of information that's going off to the destination here, don't panic. You can sample it if it gets overwhelming. Because all the observability is together in one structure, it's representative. Genuinely, all of those things did happen together, so if you take that one event and throw away the next nine, that's okay. Or, if it's from your load balancer's health checks, you can throw away 99 in 100. However you do it, you end up with operationally useful observability still coming out, and the vendors at the other end of this can compensate. You can crank the volume up and down based on the bill at the other end, and you'll still be able to enjoy the results.

You can do the sampling in two different places. The OpenTelemetry API, rather, the SDK, has this interior processing pipeline, which you configured in that boilerplate at the beginning, and you can configure sampling there, either explicitly in your code or by setting a couple of environment variables. Or you can do it out of process: you might be running the OpenTelemetry collector, the one shipped by the project, and it can also be configured to do this kind of sampling. Remember, the surviving events are completely intact and you can make perfectly useful decisions based on them. Most importantly, look at it every time you ship. You might find something.

Anyway, we're done. This is the bit at the end with no span. I've talked to you about logs, metrics and tracing, about how to do OpenTelemetry-style tracing in Python, and given you a little bit of advice on how to get the best results. I hope all of this helps you know what your Python was up to at 3:00 in the morning. Thank you very much.

>> [Indiscernible], Garth. I was on mute. It was bound to happen. Thank you so much for that, Garth. You've certainly opened up a whole new world for me. I'll have to have a serious think about that. Okay. So, we have a question. But we have three minutes, so there's time for a few questions. If you are watching this, please go put questions in the question tab now. All right.

GARTH KIDD: Thank you, brave questioner.

>> Yes, it's always very brave of the person who starts us off. All right. Our first question is: if you already have metrics, in what scenarios would you want to set counter or duration attributes on a span, if they've understood the use case correctly?

GARTH KIDD: Yeah. Good question. I think, mainly, if you've got metrics and you're already getting all the answers you ever need from them, first, you can just stop there. That's fine.
If you find yourself ever baffled by the relationship between what's happening on one graph and what's happening on another graph, though, and you can't relate those to each other, then that's exactly the time you would want to go and look at the OpenTelemetry tracing data to see if you can figure out what's going on. Especially if, like I was saying, you've left a lot of the pertinent detail that would help you tie those things together out of the labels or tags that you're putting on your metrics, for budget reasons.

I think the number that is most going to drive you over to your OpenTelemetry data is latency. Those duration_ms columns are just super. But, I mean, I've also found, like, the logarithm of the number of seconds since launch is a wonderful one; I'm sure you can put that on your metrics as well. If you whack that on a graph and break down by everything else, you can correlate problems with when something rebooted or relaunched. I hope I answered the question.

>> You definitely answered the question that I was going to ask, which was, like, at what point would you want to move on to trying this. Yeah.

GARTH KIDD: Flipping through dashboards or glazing your way through logs. Yeah.

[Laughter].

>> I think we've all been there. All right. So, quickly, can you quickly recap what a span is versus what baggage is?

GARTH KIDD: Oh. Right. So, um, spans are the data structures we end up sending to the machinery we're querying with, and you've got a span context, which is basically an index into that: it's got your trace ID, the parent span ID and the span ID. We send the span context and baggage as headers to let somebody else know how to join in on an existing trace. But the baggage is a separate key value store; you could shove anything you want in there, whereas anything you put in the span is definitely going to get sent to the machinery at the end, unless some machinery somewhere in the middle strips it out. That aside, yeah, I might head to the discussion room, I'll probably go for room number two, and try to unpack it more later. I haven't thought it through enough to answer that one.

>> Putting you right on the spot with all these questions. We are out of time, but there is one more question there. So, would you prefer me to send people over to the video hallway chat?

GARTH KIDD: Yeah, I'll go to hallway chat number two.

>> Okay.
Video hallway chat number two. I'll go and drop the last question, the one we didn't have time for, in the text chat over there in a few minutes... oh, and another one's popped up. Head on over to the video hallway chat, room number two, to chat with Garth after this talk. Okay. We're just going to have a short break... oh, thank you again, Garth. That was great. We're going to have a short break and our next talk will begin in 14 minutes. We'll see you all then.