1
00:00:00,000 --> 00:00:05,910
We'd

2
00:00:02,830 --> 00:00:05,910
[Music]

3
00:00:10,160 --> 00:00:14,960
like to welcome to the stage at this

4
00:00:11,440 --> 00:00:17,520
time Andressa Delo Cabistani uh who will

5
00:00:14,960 --> 00:00:20,480
speak with us on self-healing system for

6
00:00:17,520 --> 00:00:23,720
UI tests using ML. Thank you.

7
00:00:20,480 --> 00:00:23,720
Thank you.

8
00:00:28,000 --> 00:00:34,559
So hi. Oh my god, you are a lot of

9
00:00:31,119 --> 00:00:36,160
people. So, hi, my name is Andresa

10
00:00:34,559 --> 00:00:39,360
Cabeni.

11
00:00:36,160 --> 00:00:42,719
Uh, and I am not a machine learning

12
00:00:39,360 --> 00:00:45,520
engineer nor a data scientist. I work as

13
00:00:42,719 --> 00:00:48,559
a software quality engineer. And I want

14
00:00:45,520 --> 00:00:51,600
to show today that there are things we

15
00:00:48,559 --> 00:00:53,120
can do using machine learning in my area

16
00:00:51,600 --> 00:00:54,719
too.

17
00:00:53,120 --> 00:00:57,600
So today I'll talk about the

18
00:00:54,719 --> 00:01:00,640
self-healing system for UI tests that we

19
00:00:57,600 --> 00:01:03,440
built in my team this year. This system

20
00:01:00,640 --> 00:01:06,479
started as a project during an AI

21
00:01:03,440 --> 00:01:11,640
hackathon and ended up as a solution to

22
00:01:06,479 --> 00:01:11,640
make our UI test suites more reliable.

23
00:01:11,680 --> 00:01:17,600
But before I start, let me introduce

24
00:01:13,920 --> 00:01:19,759
myself a little bit. So as I said, my

25
00:01:17,600 --> 00:01:21,439
name is Andrea.

26
00:01:19,759 --> 00:01:24,720
Probably you already know that because

27
00:01:21,439 --> 00:01:27,600
of my accent, but I am Brazilian.

28
00:01:24,720 --> 00:01:29,680
Brazil is having an out-of-season carnival

29
00:01:27,600 --> 00:01:33,840
right now celebrating that the former

30
00:01:29,680 --> 00:01:36,960
president was sentenced to 27 years of

31
00:01:33,840 --> 00:01:40,880
prison for attempting a coup d'état.

32
00:01:36,960 --> 00:01:44,240
So, yay Brazil.

33
00:01:40,880 --> 00:01:46,720
Uh, I work at Red Hat and I love

34
00:01:44,240 --> 00:01:49,360
programming. What I like the most about

35
00:01:46,720 --> 00:01:52,000
programming uh is the fact that we are

36
00:01:49,360 --> 00:01:54,960
always learning. Uh when I'm not

37
00:01:52,000 --> 00:01:58,000
working, I am probably singing K-pop

38
00:01:54,960 --> 00:02:00,719
Demon Hunters songs with my daughters.

39
00:01:58,000 --> 00:02:03,119
Those are they, by the way. I didn't just

40
00:02:00,719 --> 00:02:07,439
get images of random girls from the

41
00:02:03,119 --> 00:02:09,920
internet. Um and the last thing is that

42
00:02:07,439 --> 00:02:11,680
I love Star Wars, but who doesn't,

43
00:02:09,920 --> 00:02:15,520
right?

44
00:02:11,680 --> 00:02:20,080
So, enough about me. Let's do a quick

45
00:02:15,520 --> 00:02:22,560
overview of how a UI test works.

46
00:02:20,080 --> 00:02:25,840
Basically, we need to use an automation

47
00:02:22,560 --> 00:02:28,959
framework, for example, Selenium or the

48
00:02:25,840 --> 00:02:32,400
one I built this whole project around uh

49
00:02:28,959 --> 00:02:36,720
called Playwright, and tell the automation

50
00:02:32,400 --> 00:02:40,080
framework to identify a specific element

51
00:02:36,720 --> 00:02:42,000
on the page and perform an action on

52
00:02:40,080 --> 00:02:45,840
that element.

53
00:02:42,000 --> 00:02:48,879
An example to understand that is if I

54
00:02:45,840 --> 00:02:51,599
have a test to validate that the

55
00:02:51,599 --> 00:02:54,560
login functionality is working on a

56
00:02:51,599 --> 00:02:58,400
page. I tell the automation framework to

57
00:02:54,560 --> 00:03:01,360
navigate to the page under test. Select

58
00:02:58,400 --> 00:03:03,840
the username input box and fill that

59
00:03:01,360 --> 00:03:06,720
with the correct username.

60
00:03:03,840 --> 00:03:09,440
Select the password input box and fill

61
00:03:06,720 --> 00:03:12,480
that with the correct password. Click

62
00:03:09,440 --> 00:03:16,319
the login button and then assert that

63
00:03:12,480 --> 00:03:19,840
the login functionality was successful.

64
00:03:16,319 --> 00:03:22,480
Or like the example in the slide shows,

65
00:03:19,840 --> 00:03:26,480
go to the PyCon Australia

66
00:03:22,480 --> 00:03:30,879
2025 website and click the link

67
00:03:26,480 --> 00:03:37,080
identified by that huge CSS selector

68
00:03:30,879 --> 00:03:37,080
which is the about link in that page.

69
00:03:39,440 --> 00:03:45,440
To add some context about Python

70
00:03:42,640 --> 00:03:48,959
Playwright, the first thing to understand

71
00:03:45,440 --> 00:03:52,239
is that Python Playwright has a Page

72
00:03:48,959 --> 00:03:55,280
class with methods that are used to

73
00:03:52,239 --> 00:03:58,480
locate elements in the page. For

74
00:03:55,280 --> 00:04:01,920
example, get_by_role

75
00:03:58,480 --> 00:04:04,720
that receives the type of role, like

76
00:04:01,920 --> 00:04:07,680
button, and the name of that button, for

77
00:04:04,720 --> 00:04:10,879
example, submit.

78
00:04:07,680 --> 00:04:14,640
These methods return a Playwright

79
00:04:10,879 --> 00:04:19,280
locator instance. The locator instances

80
00:04:14,640 --> 00:04:22,400
make it possible to click, fill, check,

81
00:04:19,280 --> 00:04:24,960
and perform other Playwright actions. In

82
00:04:22,400 --> 00:04:27,919
this slide, we can see a video I

83
00:04:24,960 --> 00:04:31,360
recorded of myself pointing the

84
00:04:27,919 --> 00:04:35,120
mouse cursor at different elements

85
00:04:31,360 --> 00:04:37,520
using Playwright's code generator

86
00:04:35,120 --> 00:04:42,400
just to show some of the ways Playwright

87
00:04:37,520 --> 00:04:46,320
uses to identify elements in a page. For

88
00:04:42,400 --> 00:04:50,000
example, get_by_role, passing link

89
00:04:46,320 --> 00:04:53,759
as the role and setting the name to About, as an

90
00:04:50,000 --> 00:04:57,600
alternative to calling the locator method

91
00:04:53,759 --> 00:05:00,320
and using that long awful CSS selector

92
00:04:57,600 --> 00:05:03,280
from the other slides.

93
00:05:00,320 --> 00:05:05,680
And please take a good look at that

94
00:05:03,280 --> 00:05:09,560
about link in the page because I'll talk

95
00:05:05,680 --> 00:05:09,560
about it a lot today.

96
00:05:11,120 --> 00:05:16,720
So when we are testing something like

97
00:05:13,759 --> 00:05:20,160
the login functionality,

98
00:05:16,720 --> 00:05:22,479
we call it an end-to-end test.

99
00:05:20,160 --> 00:05:25,440
And if I have a test suite full of end

100
00:05:22,479 --> 00:05:28,880
to end tests that are triggered in a

101
00:05:25,440 --> 00:05:32,560
CI/CD pipeline to run every time new

102
00:05:28,880 --> 00:05:35,600
pieces of code are merged. We call them

103
00:05:32,560 --> 00:05:38,160
regression tests.

104
00:05:35,600 --> 00:05:41,840
These regression tests run so we can

105
00:05:38,160 --> 00:05:45,039
lower the risk that the changes affect

106
00:05:41,840 --> 00:05:48,160
other functionality before the code continues

107
00:05:45,039 --> 00:05:52,000
its way to prod. And when a failure

108
00:05:48,160 --> 00:05:54,320
occurs in these regression tests, we stop

109
00:05:52,000 --> 00:05:59,479
the release of the functionality and

110
00:05:54,320 --> 00:05:59,479
start to investigate what happened.

111
00:06:04,479 --> 00:06:09,520
And here is where the UI test problem

112
00:06:07,919 --> 00:06:13,440
begins.

113
00:06:09,520 --> 00:06:17,919
Sometimes even a small front end change

114
00:06:13,440 --> 00:06:20,240
to an attribute can break a selector

115
00:06:17,919 --> 00:06:23,440
making it impossible for an automation

116
00:06:20,240 --> 00:06:27,039
framework to identify the element in the

117
00:06:23,440 --> 00:06:30,400
page to perform the expected action

118
00:06:27,039 --> 00:06:33,680
until a timeout error occurs.

119
00:06:30,400 --> 00:06:36,560
So we stop the whole pipeline and start

120
00:06:33,680 --> 00:06:39,120
to investigate what is wrong just to

121
00:06:36,560 --> 00:06:43,280
find out that the ID changed from

122
00:06:39,120 --> 00:06:46,319
#username to #user, and

123
00:06:43,280 --> 00:06:47,840
this didn't affect the login

124
00:06:46,319 --> 00:06:51,280
functionality

125
00:06:47,840 --> 00:06:56,120
Login is still working; it's just a

126
00:06:51,280 --> 00:06:56,120
UI test framework limitation

127
00:06:56,160 --> 00:07:03,360
so as good engineers we wanted to create

128
00:06:59,840 --> 00:07:06,720
a solution for this problem. A system

129
00:07:03,360 --> 00:07:10,080
where, when a timeout error occurs during

130
00:07:06,720 --> 00:07:12,240
an action like click, fill, check, etc.,

131
00:07:10,080 --> 00:07:15,599
because Playwright couldn't find the

132
00:07:12,240 --> 00:07:18,479
element, the test would heal itself.

133
00:07:15,599 --> 00:07:21,599
Sounds a lot like a Jedi mind trick,

134
00:07:18,479 --> 00:07:25,319
right? It heals itself and makes the

135
00:07:21,599 --> 00:07:25,319
pipeline succeed.

136
00:07:25,840 --> 00:07:33,680
It is not as cool as a Jedi mind trick, but

137
00:07:30,560 --> 00:07:35,520
we can do cool flowcharts with

138
00:07:33,680 --> 00:07:41,039
it.

139
00:07:35,520 --> 00:07:44,800
So let's start with the successful case.

140
00:07:41,039 --> 00:07:48,960
The test starts and tells Playwright to

141
00:07:44,800 --> 00:07:52,639
perform the click action on an element.

142
00:07:48,960 --> 00:07:54,879
If Playwright succeeds in clicking

143
00:07:52,639 --> 00:07:57,520
the element,

144
00:07:54,879 --> 00:08:00,160
fingerprints for the element are saved

145
00:07:57,520 --> 00:08:02,000
in a database and the test will

146
00:08:00,160 --> 00:08:07,440
continue.

147
00:08:02,000 --> 00:08:10,080
Now the case that interests us more:

148
00:08:07,440 --> 00:08:13,759
the one where the action wasn't

149
00:08:10,080 --> 00:08:17,120
successful. In this case, if Playwright

150
00:08:13,759 --> 00:08:19,120
can't find the element, the healing is

151
00:08:17,120 --> 00:08:22,000
triggered.

152
00:08:19,120 --> 00:08:24,800
When the healing is triggered, there are

153
00:08:22,000 --> 00:08:28,160
a bunch of things that need to happen

154
00:08:24,800 --> 00:08:31,039
before attempting to heal. The first

155
00:08:28,160 --> 00:08:33,440
thing in the healing process is to

156
00:08:31,039 --> 00:08:36,399
extract information

157
00:08:33,440 --> 00:08:39,919
about the failed element, so the system

158
00:08:36,399 --> 00:08:42,719
can search in the database for similar

159
00:08:39,919 --> 00:08:46,320
fingerprints.

160
00:08:42,719 --> 00:08:49,680
After finding similar fingerprints,

161
00:08:46,320 --> 00:08:52,959
multiple alternatives are generated and

162
00:08:49,680 --> 00:08:54,640
sent to an endpoint where they will be

163
00:08:52,959 --> 00:08:58,399
classified

164
00:08:54,640 --> 00:09:02,000
and ranked by confidence score, a number

165
00:08:58,399 --> 00:09:05,440
that expresses how likely they are to be a

166
00:09:02,000 --> 00:09:08,480
successful substitute.

167
00:09:05,440 --> 00:09:11,760
The healing system will iterate over the

168
00:09:08,480 --> 00:09:14,720
ranked alternatives and start to test them

169
00:09:11,760 --> 00:09:18,000
one by one.

170
00:09:14,720 --> 00:09:21,519
Both successful and unsuccessful healing

171
00:09:18,000 --> 00:09:24,720
events will be stored in the database

172
00:09:21,519 --> 00:09:27,839
for future learning. The healing system

173
00:09:24,720 --> 00:09:30,800
will try until an alternative is

174
00:09:27,839 --> 00:09:33,839
successful or until it exhausts all

175
00:09:30,800 --> 00:09:36,560
alternatives with confidence score equal

176
00:09:33,839 --> 00:09:40,600
or greater than the threshold, which

177
00:09:36,560 --> 00:09:40,600
defaults to 0.5.
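
The loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `heal` function, the `try_action` callback and the selector strings are hypothetical names, and only the 0.5 default threshold comes from the talk.

```python
from typing import Callable, Iterable, Optional, Tuple

DEFAULT_THRESHOLD = 0.5  # default confidence threshold mentioned in the talk


def heal(
    ranked: Iterable[Tuple[str, float]],
    try_action: Callable[[str], bool],
    threshold: float = DEFAULT_THRESHOLD,
) -> Optional[str]:
    """Try alternatives in descending confidence; stop at the first success."""
    ordered = sorted(ranked, key=lambda item: item[1], reverse=True)
    for selector, confidence in ordered:
        if confidence < threshold:
            break  # every remaining alternative is below the threshold
        if try_action(selector):
            return selector
    return None


# Hypothetical data: only the role-based candidate "works" on the page.
candidates = [("text=Sponsor", 0.12), ("role=link[name='About']", 0.91), ("#abt", 0.55)]
healed = heal(candidates, try_action=lambda s: s == "role=link[name='About']")
```

A real system would also record each attempt, successful or not, as the talk describes; that bookkeeping is left out here to keep the control flow visible.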

178
00:09:42,880 --> 00:09:48,040
water. Okay. So

179
00:09:48,320 --> 00:09:55,360
since we didn't want our existing test

180
00:09:52,080 --> 00:09:58,800
suites using pytest-playwright

181
00:09:55,360 --> 00:10:02,080
to need many changes when starting

182
00:09:58,800 --> 00:10:05,120
to use the self-healing system we

183
00:10:02,080 --> 00:10:09,279
implemented a wrapper around Playwright's

184
00:10:05,120 --> 00:10:11,120
page called self-healing page, which is

185
00:10:09,279 --> 00:10:14,800
very creative

186
00:10:11,120 --> 00:10:18,880
And in normal Python Playwright we

187
00:10:14,800 --> 00:10:22,000
have a page object and all methods used

188
00:10:18,880 --> 00:10:25,839
to locate an element return a locator

189
00:10:22,000 --> 00:10:28,560
instance. Self-healing page has all the

190
00:10:25,839 --> 00:10:31,600
same methods, but they return a

191
00:10:28,560 --> 00:10:34,560
self-healing locator instance and

192
00:10:31,600 --> 00:10:37,839
self-healing locator implements the same

193
00:10:34,560 --> 00:10:40,240
interface as locator but with the

194
00:10:37,839 --> 00:10:42,720
healing functionality inside all

195
00:10:40,240 --> 00:10:47,240
methods.

196
00:10:42,720 --> 00:10:47,240
The action methods, actually.
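
A rough sketch of that wrapper idea, using stand-in classes so it runs without a browser. Everything here is an assumption for illustration: `FakeLocator` imitates a Playwright locator, the built-in `TimeoutError` stands in for Playwright's own timeout exception, and the class names and pre-ranked `alternatives` list are hypothetical.

```python
class FakeLocator:
    """Stand-in for a Playwright locator; raises TimeoutError if the element is missing."""

    def __init__(self, selector, present):
        self.selector = selector
        self._present = present  # selectors that "exist" on the fake page

    def click(self):
        if self.selector not in self._present:
            raise TimeoutError(f"no element for {self.selector!r}")
        return f"clicked {self.selector}"


class SelfHealingLocator:
    """Same action interface as the wrapped locator, plus a healing fallback."""

    def __init__(self, locator, make_locator, alternatives):
        self._locator = locator
        self._make_locator = make_locator  # builds a locator from a selector
        self._alternatives = alternatives  # hypothetical, already ranked

    def click(self):
        try:
            return self._locator.click()
        except TimeoutError:
            for selector in self._alternatives:
                try:
                    return self._make_locator(selector).click()
                except TimeoutError:
                    continue
            raise  # no alternative worked: surface the original timeout


class SelfHealingPage:
    """Exposes a locating method that returns SelfHealingLocator instances."""

    def __init__(self, present, alternatives):
        self._present = present
        self._alternatives = alternatives

    def locator(self, selector):
        def make(s):
            return FakeLocator(s, self._present)

        return SelfHealingLocator(make(selector), make, self._alternatives)


page = SelfHealingPage(present={"#user"}, alternatives=["#login", "#user"])
result = page.locator("#username").click()
```

Because the wrapper keeps the same method names, test code that calls `locator(...).click()` does not need to know which kind of page it was given.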

197
00:10:48,160 --> 00:10:54,560
So in this slide we can see a

198
00:10:51,680 --> 00:10:58,079
comparison. We can see the changes

199
00:10:54,560 --> 00:11:01,839
needed to initialize a PyCon AU page

200
00:10:58,079 --> 00:11:04,079
instance for a system using only

201
00:11:01,839 --> 00:11:07,920
pytest-playwright, without the healing

202
00:11:04,079 --> 00:11:12,000
functionality. And how to initialize

203
00:11:07,920 --> 00:11:14,720
using the healing system. Without any

204
00:11:12,000 --> 00:11:18,000
healing, we just need to pass the page

205
00:11:14,720 --> 00:11:20,399
fixture to PyCon AU page and it's ready

206
00:11:18,000 --> 00:11:23,680
to be used.

207
00:11:20,399 --> 00:11:26,959
In the second way, we need to create a page

208
00:11:23,680 --> 00:11:29,600
object with healing capabilities

209
00:11:26,959 --> 00:11:31,760
which I called self-healing page in

210
00:11:29,600 --> 00:11:35,279
snake case

211
00:11:31,760 --> 00:11:39,600
Since we can't pass the page fixture

212
00:11:35,279 --> 00:11:43,279
from pytest-playwright directly to PyCon AU

213
00:11:39,600 --> 00:11:46,240
page, we pass it to the class

214
00:11:43,279 --> 00:11:48,560
self-healing

215
00:11:46,240 --> 00:11:52,079
page to build a page with healing

216
00:11:48,560 --> 00:11:55,279
capabilities that we want to pass to

217
00:11:52,079 --> 00:11:58,720
PyCon AU page.

218
00:11:55,279 --> 00:12:02,560
Self-healing page will accept both page

219
00:11:58,720 --> 00:12:05,440
and the healing DB as arguments.

220
00:12:02,560 --> 00:12:08,639
After that, we just pass self-healing

221
00:12:05,440 --> 00:12:12,079
page object to PyCon AU page and the

222
00:12:08,639 --> 00:12:16,800
fixture is ready to be used.

223
00:12:12,079 --> 00:12:21,040
So all the tests using the fixture PyCon

224
00:12:16,800 --> 00:12:24,160
AU page won't need to change. The only

225
00:12:21,040 --> 00:12:27,360
changes were in how to set up the

226
00:12:24,160 --> 00:12:32,040
fixture and all methods used are

227
00:12:27,360 --> 00:12:32,040
compatible with both pages.

228
00:12:34,959 --> 00:12:41,680
So we already know from the flowchart

229
00:12:38,000 --> 00:12:44,240
what is the flow of the system.

230
00:12:41,680 --> 00:12:47,839
But what happens in the code when I use

231
00:12:44,240 --> 00:12:51,519
the fixture PyCon AU page with healing

232
00:12:47,839 --> 00:12:57,160
capabilities inside my test and tell

233
00:12:51,519 --> 00:12:57,160
Playwright to click on that About link.

234
00:12:57,279 --> 00:13:03,680
First, the get_by_role method inside

235
00:13:00,720 --> 00:13:06,240
self-healing page will return a

236
00:13:03,680 --> 00:13:09,600
self-healing locator.

237
00:13:06,240 --> 00:13:12,560
If you paid attention to the About link

238
00:13:09,600 --> 00:13:15,600
like I told you to do, you already know

239
00:13:12,560 --> 00:13:19,120
that what we are

240
00:13:15,600 --> 00:13:21,839
doing to locate the About link on

241
00:13:19,120 --> 00:13:25,200
the page is a good way.

242
00:13:21,839 --> 00:13:27,200
So the click action is successful

243
00:13:25,200 --> 00:13:29,440
without any trigger to the healing

244
00:13:27,200 --> 00:13:34,079
system. And when an action is

245
00:13:29,440 --> 00:13:38,800
successful, the action method will call

246
00:13:34,079 --> 00:13:44,279
the method to insert the fingerprint and

247
00:13:38,800 --> 00:13:44,279
save the successful interaction

248
00:13:44,560 --> 00:13:49,600
in a database

249
00:13:47,120 --> 00:13:52,560
and store all the

250
00:13:49,600 --> 00:13:56,680
fingerprints in the database. But what

251
00:13:52,560 --> 00:13:56,680
are those fingerprints?

252
00:13:59,360 --> 00:14:02,360
Oh,

253
00:14:02,480 --> 00:14:05,959
just a second.

254
00:14:06,880 --> 00:14:12,720
I am running.

255
00:14:10,639 --> 00:14:14,560
Okay.

256
00:14:12,720 --> 00:14:18,000
Okay.

257
00:14:14,560 --> 00:14:21,360
So what are those

258
00:14:18,000 --> 00:14:25,440
fingerprints? They are information

259
00:14:21,360 --> 00:14:28,240
about the element we just clicked.

260
00:14:25,440 --> 00:14:31,519
This information includes the tag name, which is

261
00:14:28,240 --> 00:14:35,040
an a here for a link; the inner

262
00:14:31,519 --> 00:14:37,519
text, which we know is About; whether

263
00:14:35,040 --> 00:14:40,560
there are accessibility attributes;

264
00:14:37,519 --> 00:14:43,920
the page this element is located on; and the

265
00:14:40,560 --> 00:14:46,160
type of locator method used to get

266
00:14:43,920 --> 00:14:49,440
that element.

267
00:14:46,160 --> 00:14:52,959
And this is all important information

268
00:14:49,440 --> 00:14:56,079
that we will need as soon as the locator

269
00:14:52,959 --> 00:14:58,800
is not valid anymore

270
00:14:56,079 --> 00:15:04,000
and needs to heal.
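
As a rough picture of what one stored fingerprint might hold, here is a small sketch. The field names, the placeholder URL and the `Fingerprint` class itself are assumptions made for illustration; the talk only lists the kinds of information that are saved.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Fingerprint:
    """Hypothetical record of an element that was interacted with successfully."""

    tag_name: str      # e.g. "a" for a link
    inner_text: str    # visible text of the element
    page_url: str      # page the element was found on
    locator_type: str  # which locating method was used
    accessibility: dict = field(default_factory=dict)  # role, aria-* attributes


about_link = Fingerprint(
    tag_name="a",
    inner_text="About",
    page_url="https://example.org/",  # placeholder, not the real site URL
    locator_type="get_by_role",
    accessibility={"role": "link"},
)

# A plain dict is convenient for inserting into whatever database is used.
record = asdict(about_link)
```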

271
00:14:58,800 --> 00:15:05,760
But now let's simulate a failure.

272
00:15:04,000 --> 00:15:08,480
So

273
00:15:05,760 --> 00:15:11,199
inside our test, we are trying to locate

274
00:15:08,480 --> 00:15:13,199
an incorrect name

275
00:15:11,199 --> 00:15:15,360
ABT.

276
00:15:13,199 --> 00:15:19,199
Well, we know this is supposed to

277
00:15:15,360 --> 00:15:22,240
trigger a healing, but in the code,

278
00:15:19,199 --> 00:15:25,680
what happens?

279
00:15:22,240 --> 00:15:29,680
The timeout error is handled inside each

280
00:15:25,680 --> 00:15:32,720
action method in self-healing locator.

281
00:15:29,680 --> 00:15:35,920
So when the timeout error happens during

282
00:15:32,720 --> 00:15:38,880
the click, it will first check if the

283
00:15:35,920 --> 00:15:43,040
healing service is available and if it

284
00:15:38,880 --> 00:15:44,959
is, it will attempt to heal.

285
00:15:43,040 --> 00:15:47,440
The first step when the system is

286
00:15:44,959 --> 00:15:51,360
attempting to heal

287
00:15:47,440 --> 00:15:54,880
is to get more information

288
00:15:51,360 --> 00:15:58,560
about the failing element.

289
00:15:54,880 --> 00:16:01,040
The extract element context method is called, and

290
00:15:58,560 --> 00:16:04,240
the first thing it does is to start a

291
00:16:01,040 --> 00:16:07,680
dictionary called context with

292
00:16:04,240 --> 00:16:11,360
default values for each fingerprint.

293
00:16:07,680 --> 00:16:15,199
After that, it extracts the exact page

294
00:16:11,360 --> 00:16:18,480
the element is supposed to be on and

295
00:16:15,199 --> 00:16:20,639
adds it to the context, and it's possible to

296
00:16:18,480 --> 00:16:24,399
try and get more elements from the body

297
00:16:20,639 --> 00:16:27,360
in that page. It's possible to infer

298
00:16:24,399 --> 00:16:30,800
some data about that element too from

299
00:16:27,360 --> 00:16:35,360
the role link. For example, it's possible

300
00:16:30,800 --> 00:16:37,920
to infer that the tag name is probably a.

301
00:16:35,360 --> 00:16:44,199
After adding all this information to the

302
00:16:37,920 --> 00:16:44,199
context dict, it returns the context.
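
The steps just described could look something like the sketch below. The function signature, the dictionary keys and the `ROLE_TO_TAG` table are hypothetical; the real method presumably reads these values from the page and the failed locator rather than taking them as arguments.

```python
# Illustrative guesses at which tag a given role usually maps to.
ROLE_TO_TAG = {"link": "a", "button": "button", "textbox": "input"}


def extract_element_context(page_url, locator_type, role="", name=""):
    """Build a context dict for a locator that failed to find its element."""
    # Start from default values for every fingerprint field.
    context = {
        "tag_name": "",
        "inner_text": "",
        "page_url": "",
        "locator_type": "",
        "role": "",
    }
    # Record the page the element was expected on and how it was located.
    context["page_url"] = page_url
    context["locator_type"] = locator_type
    context["role"] = role
    context["inner_text"] = name
    # Infer what can be inferred: a link role most likely means an <a> tag.
    context["tag_name"] = ROLE_TO_TAG.get(role, "")
    return context


ctx = extract_element_context(
    "https://example.org/", "get_by_role", role="link", name="Abt"
)
```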

303
00:16:49,120 --> 00:16:54,959
Now it's time for the healing system to

304
00:16:51,600 --> 00:16:58,160
generate all possible alternatives for

305
00:16:54,959 --> 00:17:01,040
the failing locator.

306
00:16:58,160 --> 00:17:03,839
The context dict is passed as an argument

307
00:17:01,040 --> 00:17:08,079
and is used as the basis to query the

308
00:17:03,839 --> 00:17:10,480
database to get similar fingerprints.

309
00:17:08,079 --> 00:17:13,919
The similar fingerprints are passed to

310
00:17:10,480 --> 00:17:16,559
strategies to generate an exhaustive

311
00:17:13,919 --> 00:17:19,280
number of possibilities

312
00:17:16,559 --> 00:17:22,079
and it can be a really big number of

313
00:17:19,280 --> 00:17:26,319
possibilities. For the example we are

314
00:17:22,079 --> 00:17:28,480
working on here, it suggests 133

315
00:17:26,319 --> 00:17:31,440
alternatives.

316
00:17:28,480 --> 00:17:34,320
A lot of them are rubbish, of course, but

317
00:17:31,440 --> 00:17:37,360
that's not for me to classify them as

318
00:17:34,320 --> 00:17:41,000
rubbish. That's the job for the random

319
00:17:37,360 --> 00:17:41,000
forest classifier.

320
00:17:41,039 --> 00:17:48,000
Sending 133 alternatives wouldn't be

321
00:17:44,799 --> 00:17:50,400
very easy to add in a slide. So I just

322
00:17:48,000 --> 00:17:53,679
added some good alternatives and some

323
00:17:50,400 --> 00:17:56,320
not-so-good alternatives and made

324
00:17:53,679 --> 00:17:58,640
a call to the endpoint where the model

325
00:17:56,320 --> 00:18:02,160
is being served.

326
00:17:58,640 --> 00:18:04,640
The model then classified all of them.

327
00:18:02,160 --> 00:18:06,880
Let's start analyzing the first one.

328
00:18:04,640 --> 00:18:10,799
It's 0.92.

329
00:18:06,880 --> 00:18:13,520
It is a really high confidence score.

330
00:18:10,799 --> 00:18:17,919
The one I used before as an

331
00:18:13,520 --> 00:18:21,600
example of success is the third one

332
00:18:17,919 --> 00:18:25,280
and is classified as 0.91.

333
00:18:21,600 --> 00:18:27,760
Again, very high confidence score.

334
00:18:25,280 --> 00:18:31,840
And we can see that the weirder it gets,

335
00:18:27,760 --> 00:18:34,240
the less likely it is to succeed. The

336
00:18:31,840 --> 00:18:37,679
endpoint not only returns all

337
00:18:34,240 --> 00:18:40,720
alternatives classified, but also returns

338
00:18:37,679 --> 00:18:44,240
a ranked alternatives list organized

339
00:18:40,720 --> 00:18:46,720
from the most likely to succeed to the least

340
00:18:44,240 --> 00:18:49,360
likely. And this is good because as soon

341
00:18:46,720 --> 00:18:53,360
as the healing system finds a good

342
00:18:49,360 --> 00:18:56,360
locator, it just stops. No need to try

343
00:18:53,360 --> 00:18:56,360
others.

344
00:18:59,120 --> 00:19:04,960
Scikit-learn's random forest classifier

345
00:19:02,880 --> 00:19:07,760
is a supervised machine learning

346
00:19:04,960 --> 00:19:10,880
algorithm which means it learns from

347
00:19:07,760 --> 00:19:13,679
historical examples where we already

348
00:19:10,880 --> 00:19:16,720
know the correct outcomes.

349
00:19:13,679 --> 00:19:20,080
In the self-healing system's case, this

350
00:19:16,720 --> 00:19:23,039
means training on thousands of past UI

351
00:19:20,080 --> 00:19:26,799
healing events that are labeled as

352
00:19:23,039 --> 00:19:30,080
either success or failure.

353
00:19:26,799 --> 00:19:33,919
The algorithm builds hundreds of

354
00:19:30,080 --> 00:19:36,080
decision trees, which are the forest,

355
00:19:33,919 --> 00:19:38,400
where each tree is trained on a

356
00:19:36,080 --> 00:19:40,960
different random subset of this

357
00:19:38,400 --> 00:19:43,679
historical data and uses a random

358
00:19:40,960 --> 00:19:47,200
selection of features at each decision

359
00:19:43,679 --> 00:19:49,440
point. When making predictions,

360
00:19:47,200 --> 00:19:52,799
all trees vote on the most likely

361
00:19:49,440 --> 00:19:56,080
outcome, and the majority decision

362
00:19:52,799 --> 00:20:00,000
becomes the final prediction.
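
The voting idea can be shown with a toy forest. This is only a sketch of the principle: the three hand-written "trees" and their feature names are invented for illustration, whereas a real forest learns its trees from labeled data. As far as scikit-learn is concerned, its `RandomForestClassifier` combines trees by averaging their predicted class probabilities rather than counting hard votes, so the confidence here is a simplified stand-in.

```python
from collections import Counter


def forest_predict(trees, features):
    """Return the majority label and the share of trees voting for success (1)."""
    votes = [tree(features) for tree in trees]
    label, _count = Counter(votes).most_common(1)[0]
    confidence = votes.count(1) / len(votes)
    return label, confidence


# Hypothetical one-rule "trees": each looks at a single feature.
trees = [
    lambda f: 1 if f["text_similarity"] > 0.5 else 0,
    lambda f: 1 if f["same_role"] else 0,
    lambda f: 1 if f["complexity"] <= 2 else 0,
]

label, confidence = forest_predict(
    trees, {"text_similarity": 0.6, "same_role": True, "complexity": 5}
)
```

Two of the three toy trees vote for success here, so the candidate is labeled a success with a confidence of about 0.67.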

363
00:19:56,080 --> 00:20:03,840
It basically uses the wisdom of crowds

364
00:20:00,000 --> 00:20:07,440
principle: that the idea of an

365
00:20:03,840 --> 00:20:09,039
individual can inherently be

366
00:20:07,440 --> 00:20:11,440
biased,

367
00:20:09,039 --> 00:20:14,640
whereas

368
00:20:11,440 --> 00:20:17,200
taking the average knowledge of a

369
00:20:14,640 --> 00:20:20,240
crowd can result in eliminating the bias

370
00:20:17,200 --> 00:20:24,480
or noise to produce a clearer and more

371
00:20:20,240 --> 00:20:26,960
coherent result. So, English is my

372
00:20:24,480 --> 00:20:29,960
second language.

373
00:20:26,960 --> 00:20:29,960
So

374
00:20:30,559 --> 00:20:36,640
since machine mach machine learning oh

375
00:20:33,840 --> 00:20:39,360
my god it's getting hard.

376
00:20:36,640 --> 00:20:42,159
So since machine learning algorithms can

377
00:20:39,360 --> 00:20:45,440
only understand numbers we need to

378
00:20:42,159 --> 00:20:48,559
convert everything about a UI element

379
00:20:45,440 --> 00:20:52,000
and its alternatives into numerical

380
00:20:48,559 --> 00:20:55,360
representations. For example, when a

381
00:20:52,000 --> 00:20:59,039
test fails on a selector like get_by_role

382
00:20:55,360 --> 00:21:03,679
link name abt, the healing system

383
00:20:59,039 --> 00:21:06,400
doesn't just see text. It sees measurable

384
00:21:03,679 --> 00:21:08,640
characteristics.

385
00:21:06,400 --> 00:21:11,360
And the transformation from human

386
00:21:08,640 --> 00:21:15,120
readable information to numerical data

387
00:21:11,360 --> 00:21:17,120
is what allows the AI to make

388
00:21:15,120 --> 00:21:20,120
mathematical comparisons and

389
00:21:17,120 --> 00:21:20,120
predictions.

390
00:21:20,559 --> 00:21:27,280
The self-healing system extracts 85

391
00:21:24,159 --> 00:21:30,480
distinct numerical features

392
00:21:27,280 --> 00:21:32,880
across six categories, each designed to

393
00:21:30,480 --> 00:21:36,000
capture different aspects of what makes

394
00:21:32,880 --> 00:21:39,520
a good healing choice.

395
00:21:36,000 --> 00:21:42,080
The selector features convert basic

396
00:21:39,520 --> 00:21:46,159
characteristics of the selectors into

397
00:21:42,080 --> 00:21:50,320
numbers. For example, string length

398
00:21:46,159 --> 00:21:54,640
becomes a character count. Complexity

399
00:21:50,320 --> 00:21:57,600
becomes a numerical score based on how

400
00:21:54,640 --> 00:22:02,400
many conditions and nested structures

401
00:21:57,600 --> 00:22:04,320
exist. For example, a simple ID

402
00:22:02,400 --> 00:22:08,640
#submit-button

403
00:22:04,320 --> 00:22:12,559
gets a complexity score of one.

404
00:22:08,640 --> 00:22:16,000
But that nested selector in this slide

405
00:22:12,559 --> 00:22:18,159
gets a complexity score way higher

406
00:22:16,000 --> 00:22:21,159
because of multiple conditions and

407
00:22:18,159 --> 00:22:21,159
nesting.
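
One simple way such selector features could be computed is sketched below. The scoring rule (one point per selector part, attribute condition and pseudo-class) is an assumption chosen so that a bare ID scores one; the talk does not spell out the actual formula, and both selector strings are made up.

```python
import re


def selector_features(selector):
    """Turn a CSS selector string into a length and a rough complexity score."""
    # Split on descendant (whitespace) and child (>) combinators.
    parts = [p for p in re.split(r"\s*>\s*|\s+", selector.strip()) if p]
    attribute_conditions = selector.count("[")
    pseudo_classes = len(re.findall(r":[a-z-]+", selector))
    return {
        "length": len(selector),
        "complexity": len(parts) + attribute_conditions + pseudo_classes,
    }


simple = selector_features("#submit-button")
nested = selector_features("div.nav > ul li:nth-child(2) a[href='/about']")
```

With this rule the bare ID scores 1, while the nested selector scores 6: four parts, one attribute condition and one pseudo-class.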

408
00:22:22,960 --> 00:22:30,080
The similarity features transform

409
00:22:25,440 --> 00:22:33,360
the concept of how similar are those two

410
00:22:30,080 --> 00:22:37,440
selectors into precise numerical

411
00:22:33,360 --> 00:22:40,159
measurements using Jaccard similarity.

412
00:22:37,440 --> 00:22:43,679
Jaccard similarity compares two sets of

413
00:22:40,159 --> 00:22:47,760
data and returns a value from zero

414
00:22:43,679 --> 00:22:50,640
to one that expresses how similar they are.

415
00:22:47,760 --> 00:22:53,679
Basically, it is the ratio between the

416
00:22:50,640 --> 00:22:57,440
number of observations in both sets and

417
00:22:53,679 --> 00:23:00,640
the number in either set. So we can see

418
00:22:57,440 --> 00:23:04,400
that after excluding repeated

419
00:23:00,640 --> 00:23:08,640
characters from both sets, the Jaccard

420
00:23:04,400 --> 00:23:11,919
similarity is very high because only the

421
00:23:08,640 --> 00:23:17,200
u in about

422
00:23:11,919 --> 00:23:20,320
is different. Also, since from abt to

423
00:23:17,200 --> 00:23:24,000
about there are only two characters

424
00:23:20,320 --> 00:23:27,200
added, the edit distance between the

425
00:23:24,000 --> 00:23:30,480
two words is two. And there are also

426
00:23:27,200 --> 00:23:33,520
semantic similarity features that check

427
00:23:30,480 --> 00:23:37,679
abt and about,

428
00:23:33,520 --> 00:23:41,039
and abt and sponsor and understand that

429
00:23:37,679 --> 00:23:42,960
abt and sponsor are too unlikely to be the

430
00:23:41,039 --> 00:23:46,400
same element.

431
00:23:42,960 --> 00:23:50,919
But about is very likely to be the link

432
00:23:46,400 --> 00:23:50,919
name Playwright is looking for.
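
A minimal sketch of the two string measures mentioned above: Jaccard similarity over character sets and edit (Levenshtein) distance. The function names are illustrative and not the project's own, and the comparison is made on the bare names only; scores computed over full selector strings would differ.

```python
def jaccard(a, b):
    """Size of the intersection of two character sets divided by their union."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty strings are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete from a
                    current[j - 1] + 1,      # insert into a
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


about_score = jaccard("abt", "about")
sponsor_score = jaccard("abt", "sponsor")
distance = edit_distance("abt", "about")
```

For these inputs the character sets of abt and about share three of five distinct characters, abt and sponsor share none, and two insertions turn abt into about.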

433
00:23:52,320 --> 00:23:57,840
The context features convert

434
00:23:54,799 --> 00:24:01,360
environmental information about the web

435
00:23:57,840 --> 00:24:05,679
page and HTML elements into numerical

436
00:24:01,360 --> 00:24:09,600
form. Look at all the extracted information

437
00:24:05,679 --> 00:24:12,240
for the PyCon AU web page in the home path

438
00:24:09,600 --> 00:24:14,559
and about path.

439
00:24:12,240 --> 00:24:17,760
Just from all this information, the

440
00:24:14,559 --> 00:24:21,320
system already knows these are two

441
00:24:17,760 --> 00:24:21,320
different paths.

442
00:24:21,360 --> 00:24:28,480
In these context features, the HTML

443
00:24:24,400 --> 00:24:31,679
element type becomes a categorical

444
00:24:28,480 --> 00:24:36,240
number, where button equals 1, input

445
00:24:31,679 --> 00:24:39,600
equals 2, link equals 3,

446
00:24:36,240 --> 00:24:42,240
etc. So when healing a failed get_by_role

447
00:24:39,600 --> 00:24:45,919
link name abt

448
00:24:42,240 --> 00:24:48,240
the system knows it's targeting a link

449
00:24:45,919 --> 00:24:51,720
element and can factor that into its

450
00:24:48,240 --> 00:24:51,720
healing predictions.
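
The categorical encoding mentioned here amounts to a lookup table. In this sketch the three codes are the ones given in the talk; the fallback code 0 for unknown element types is an assumption.

```python
ELEMENT_TYPE_CODES = {"button": 1, "input": 2, "link": 3}  # codes from the talk
UNKNOWN_ELEMENT_TYPE = 0  # assumed fallback for types not in the table


def encode_element_type(element_type):
    """Map an element type name to its categorical number."""
    return ELEMENT_TYPE_CODES.get(element_type.lower(), UNKNOWN_ELEMENT_TYPE)


link_code = encode_element_type("link")
```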

451
00:24:52,000 --> 00:24:55,679
Now what I believe to be the most

452
00:24:53,760 --> 00:24:58,559
sophisticated

453
00:24:55,679 --> 00:25:02,320
transformation: the reliability

454
00:24:58,559 --> 00:25:07,240
features. They convert best practices for

455
00:25:02,320 --> 00:25:07,240
selectors into numerical scores.

456
00:25:07,279 --> 00:25:14,000
The presence of an ID selector

457
00:25:10,880 --> 00:25:16,240
becomes uses ID selector equals one,

458
00:25:14,000 --> 00:25:21,760
good for reliability,

459
00:25:16,240 --> 00:25:25,279
and fragile

460
00:25:21,760 --> 00:25:31,279
patterns like nth-child receive a

461
00:25:25,279 --> 00:25:34,000
penalty for bad reliability.

462
00:25:31,279 --> 00:25:37,600
Role-based selectors like get_by_role

463
00:25:34,000 --> 00:25:41,919
link name about get positive reliability

464
00:25:37,600 --> 00:25:46,320
scores because they use semantic HTML

465
00:25:41,919 --> 00:25:49,600
roles rather than brittle CSS paths.

466
00:25:46,320 --> 00:25:52,799
The system literally counts things like

467
00:25:49,600 --> 00:25:54,799
nesting depth (selector nesting depth

468
00:25:52,799 --> 00:25:59,120
equals three),

469
00:25:54,799 --> 00:26:01,440
attribute conditions, and CSS

470
00:25:59,120 --> 00:26:04,240
combinators to create numerical

471
00:26:01,440 --> 00:26:08,200
reliability indicators that predict

472
00:26:04,240 --> 00:26:08,200
selector stability.
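
A rough sketch of how such reliability indicators could be counted from a selector string. The feature names, the regexes, and the +1/-1 weights are assumptions for illustration; the talk does not show the actual implementation or scoring.

```python
import re


def reliability_features(selector: str) -> dict:
    """Turn a selector string into numeric reliability indicators."""
    # Drop (...) and [...] contents so their spaces and symbols are not
    # mistaken for CSS combinators.
    bare = re.sub(r"\([^)]*\)|\[[^\]]*\]", "", selector).strip()
    # Split on explicit combinators (>, +, ~) or descendant whitespace.
    parts = [p for p in re.split(r"\s*[>+~]\s*|\s+", bare) if p]
    features = {
        "uses_id_selector": int(re.search(r"#[\w-]+", bare) is not None),
        "uses_nth_child": int(":nth-child" in selector),
        "uses_role_selector": int("getByRole" in selector),
        "selector_nesting_depth": len(parts),
        "attribute_conditions": len(re.findall(r"\[[^\]]+\]", selector)),
        "css_combinators": max(len(parts) - 1, 0),
    }
    # Hypothetical weights: reward stable patterns, penalize fragile ones.
    features["reliability_score"] = (
        features["uses_id_selector"]
        + features["uses_role_selector"]
        - features["uses_nth_child"]
    )
    return features


print(reliability_features("div.container > ul > li:nth-child(2)"))
```

In this sketch, the nested nth-child path gets a depth of three and a negative score, while an ID selector or a role-based locator scores positively.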

473
00:26:11,679 --> 00:26:18,400
The DOM features analyze where an

474
00:26:14,080 --> 00:26:21,279
element is within the page: the hierarchical

475
00:26:18,400 --> 00:26:25,600
structure, like the address of an

476
00:26:21,279 --> 00:26:28,480
element in the HTML document.

477
00:26:25,600 --> 00:26:31,200
When a UI test fails, knowing the

478
00:26:28,480 --> 00:26:33,840
structural context helps predict which

479
00:26:31,200 --> 00:26:37,039
alternative will work in the same

480
00:26:33,840 --> 00:26:41,039
location. The features transform this

481
00:26:37,039 --> 00:26:44,159
address into numerical data and also create

482
00:26:41,039 --> 00:26:47,120
binary flags from parent information to

483
00:26:44,159 --> 00:26:50,240
know if it is inside a form, a nav or a

484
00:26:47,120 --> 00:26:53,200
container. This is important for

485
00:26:50,240 --> 00:26:57,039
healing because elements within forms

486
00:26:53,200 --> 00:27:00,080
typically use semantic attributes like

487
00:26:57,039 --> 00:27:01,600
name or ID. So if the element has a form

488
00:27:00,080 --> 00:27:05,200
parent

489
00:27:01,600 --> 00:27:10,039
the model knows that it should

490
00:27:05,200 --> 00:27:10,039
prioritize form specific selectors.
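
One way to picture these DOM features, as a sketch only: it assumes the element's ancestors are available as a list of tag names from the document root down to the direct parent. The feature names and the set of tags treated as a "container" are hypothetical.

```python
from typing import Dict, Sequence

# Assumption: which tags count as a generic "container" is not stated in the talk.
CONTAINER_TAGS = {"div", "section", "main"}


def dom_features(ancestors: Sequence[str]) -> Dict[str, int]:
    """Convert an element's ancestor chain into numeric DOM features."""
    tags = [t.lower() for t in ancestors]
    return {
        # The "address" as a number: how deep the element sits in the tree.
        "dom_depth": len(tags),
        # Binary flags derived from parent information.
        "has_form_parent": int("form" in tags),
        "has_nav_parent": int("nav" in tags),
        "has_container_parent": int(any(t in CONTAINER_TAGS for t in tags)),
    }


print(dom_features(["html", "body", "div", "form"]))
```

An element with `has_form_parent` set to 1 is the case where form-specific selectors, such as those based on name or ID, would be prioritized.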

491
00:27:11,360 --> 00:27:17,600
The last category is the text features.

492
00:27:14,799 --> 00:27:20,400
The text features analyze the

493
00:27:17,600 --> 00:27:24,400
actual readable content associated with

494
00:27:20,400 --> 00:27:28,559
UI elements, for example the text "About"

495
00:27:24,400 --> 00:27:32,320
for that link. Text content is often

496
00:27:28,559 --> 00:27:36,720
the most stable aspect of web interfaces

497
00:27:32,320 --> 00:27:39,840
while CSS classes and DOM structure

498
00:27:36,720 --> 00:27:42,640
frequently change. User-visible

499
00:27:39,840 --> 00:27:44,320
labels like "About" tend to remain

500
00:27:42,640 --> 00:27:47,279
consistent.

501
00:27:44,320 --> 00:27:49,200
The system examines the inner text and

502
00:27:47,279 --> 00:27:51,600
text content to create multiple

503
00:27:49,200 --> 00:27:54,480
numerical features, like whether it has an

504
00:27:51,600 --> 00:27:58,240
inner text, the length of the inner

505
00:27:54,480 --> 00:28:01,279
text, the length of the text.

506
00:27:58,240 --> 00:28:04,240
It also checks if the text is numerical

507
00:28:01,279 --> 00:28:06,240
or if it contains spaces. And this is

508
00:28:04,240 --> 00:28:09,039
important for healing because knowing

509
00:28:06,240 --> 00:28:12,399
that it has an inner text like "About"

510
00:28:09,039 --> 00:28:15,039
makes getByRole('link', { name: 'About' }) a

511
00:28:12,399 --> 00:28:19,760
preferred alternative when healing the

512
00:28:15,039 --> 00:28:24,480
failing element. Finally, let's see in

513
00:28:19,760 --> 00:28:26,640
practice how this works. I have a

514
00:28:24,480 --> 00:28:31,799
video here

515
00:28:26,640 --> 00:28:31,799
for a test without the healing.

516
00:28:31,919 --> 00:28:39,919
So, this will take a while. Spoiler

517
00:28:35,919 --> 00:28:43,480
alert: it will fail. It will trigger a

518
00:28:39,919 --> 00:28:43,480
timeout error.

519
00:28:51,120 --> 00:28:57,520
I am running out of time. So, I will

520
00:28:53,520 --> 00:29:04,159
just move it forward. Okay. So, a

521
00:28:57,520 --> 00:29:07,039
timeout error occurred. So

522
00:29:04,159 --> 00:29:10,200
now we are going to see how it works in

523
00:29:07,039 --> 00:29:10,200
a test

524
00:29:13,760 --> 00:29:20,320
with the healing and I will move forward

525
00:29:17,279 --> 00:29:23,679
because it takes around 30 seconds

526
00:29:20,320 --> 00:29:25,520
for the timeout to

527
00:29:23,679 --> 00:29:28,240
trigger

528
00:29:25,520 --> 00:29:32,159
and the healing to start, and then it will just

529
00:29:28,240 --> 00:29:37,880
click around

530
00:29:32,159 --> 00:29:37,880
the page and do other things, and then

531
00:29:40,880 --> 00:29:45,399
boom, successful. So

532
00:29:45,840 --> 00:29:50,559
final considerations, really quick. To

533
00:29:48,640 --> 00:29:52,960
finish this talk, there are a couple of

534
00:29:50,559 --> 00:29:55,760
things I want to mention about what is

535
00:29:52,960 --> 00:30:01,200
the future

536
00:29:55,760 --> 00:30:08,000
for this project. So

537
00:30:01,200 --> 00:30:12,080
this was trained with synthetic data. So

538
00:30:08,000 --> 00:30:14,960
now, moving forward, as more healing

539
00:30:12,080 --> 00:30:14,960
events

540
00:30:14,960 --> 00:30:20,000
start to happen in our tests,

541
00:30:20,000 --> 00:30:23,600
we will have more data to retrain the

542
00:30:23,600 --> 00:30:30,399
model and then refine this model later.

543
00:30:27,120 --> 00:30:34,000
And this is important because

544
00:30:30,399 --> 00:30:34,000
it's a powerful thing

545
00:30:34,000 --> 00:30:41,039
to have synthetic data generated but

546
00:30:37,679 --> 00:30:44,320
it is not very good for real

547
00:30:41,039 --> 00:30:49,039
problems. So as soon as we have more real

548
00:30:44,320 --> 00:30:51,120
data, the model will be refined. Also,

549
00:30:49,039 --> 00:30:55,279
the other consideration is a

550
00:30:51,120 --> 00:30:57,520
notification system, because we want

551
00:30:55,279 --> 00:31:00,720
that for a pipeline. We want the

552
00:30:57,520 --> 00:31:03,360
pipeline to succeed but we don't want to

553
00:31:00,720 --> 00:31:07,360
keep healing all the time. So we need to

554
00:31:03,360 --> 00:31:10,799
be notified when a healing has

555
00:31:07,360 --> 00:31:13,440
happened, so we can change the code,

556
00:31:10,799 --> 00:31:17,360
right? We don't want to keep using

557
00:31:13,440 --> 00:31:22,960
these healings all the time. So

558
00:31:17,360 --> 00:31:25,600
okay, so thank you. That was my talk, and

559
00:31:22,960 --> 00:31:28,880
I probably don't have more time now for

560
00:31:25,600 --> 00:31:32,580
questions but I'll be around if needed

561
00:31:28,880 --> 00:31:36,160
to talk about it. Thank you.

562
00:31:32,580 --> 00:31:36,160
[Applause]