1 00:00:12,960 --> 00:00:17,279 hi everyone so i am indranil ghosh from 2 00:00:15,839 --> 00:00:19,840 the school of fundamental sciences 3 00:00:17,279 --> 00:00:21,439 massey university so today i am going 4 00:00:19,840 --> 00:00:24,160 to talk about 5 00:00:21,439 --> 00:00:26,480 football uh soccer data analysis uh 6 00:00:24,160 --> 00:00:27,840 a pythonic introduction this is my twitter 7 00:00:26,480 --> 00:00:30,480 handle 8 00:00:27,840 --> 00:00:32,960 and this is the link to my website 9 00:00:30,480 --> 00:00:35,200 thanks pycon au for giving me this 10 00:00:32,960 --> 00:00:38,239 opportunity to speak at this conference 11 00:00:35,200 --> 00:00:40,960 so like i'm really glad that uh i'm 12 00:00:38,239 --> 00:00:43,360 present here with you all so this is my 13 00:00:40,960 --> 00:00:44,879 second year speaking at this renowned 14 00:00:43,360 --> 00:00:47,039 conference and 15 00:00:44,879 --> 00:00:48,879 welcome everyone so i'm going to 16 00:00:47,039 --> 00:00:51,039 talk about how to get open access event 17 00:00:48,879 --> 00:00:52,960 data from statsbomb using the package 18 00:00:51,039 --> 00:00:54,719 called statsbombpy in python 19 00:00:52,960 --> 00:00:56,719 then how to draw a soccer pitch using 20 00:00:54,719 --> 00:00:58,399 mplsoccer package 21 00:00:56,719 --> 00:01:00,719 how to visualize a pass network for a 22 00:00:58,399 --> 00:01:03,760 particular team in a particular match 23 00:01:00,719 --> 00:01:06,960 and how to use uh networkx module to 24 00:01:03,760 --> 00:01:08,479 analyze the pass network to draw pass 25 00:01:06,960 --> 00:01:09,760 maps along uh 26 00:01:08,479 --> 00:01:12,080 their corresponding and their 27 00:01:09,760 --> 00:01:13,119 corresponding heat maps 28 00:01:12,080 --> 00:01:14,880 of a 29 00:01:13,119 --> 00:01:16,320 of players from a particular team in a 30 00:01:14,880 --> 00:01:18,320 particular match 31 00:01:16,320 --> 00:01:19,840 and finally we will talk about how to 32
00:01:18,320 --> 00:01:21,680 implement computational geometric 33 00:01:19,840 --> 00:01:23,360 concepts like convex hulls voronoi 34 00:01:21,680 --> 00:01:25,520 diagrams and delaunay triangulations 35 00:01:23,360 --> 00:01:27,520 using a python package called scipy.spatial 36 00:01:25,520 --> 00:01:29,520 like any other package we use 37 00:01:27,520 --> 00:01:31,119 pip to install statsbombpy 38 00:01:29,520 --> 00:01:33,360 and then uh 39 00:01:31,119 --> 00:01:35,520 the open data from statsbombpy and 40 00:01:33,360 --> 00:01:37,600 statsbomb can be accessed and like uh please 41 00:01:35,520 --> 00:01:38,720 go through the terms and conditions uh 42 00:01:37,600 --> 00:01:40,560 section 43 00:01:38,720 --> 00:01:42,799 stated at their github documentation 44 00:01:40,560 --> 00:01:45,119 we'll go step by step uh to understand 45 00:01:42,799 --> 00:01:47,759 how to extract the relevant data 46 00:01:45,119 --> 00:01:49,920 and uh using the statsbombpy package 47 00:01:47,759 --> 00:01:51,280 so we also need to import the package 48 00:01:49,920 --> 00:01:52,799 and next we get access to the 49 00:01:51,280 --> 00:01:54,560 competitions data set what the 50 00:01:52,799 --> 00:01:56,799 competitions data set looks uh like 51 00:01:54,560 --> 00:01:58,240 comp data set looks like so we have a 52 00:01:56,799 --> 00:02:00,240 competition id 53 00:01:58,240 --> 00:02:02,719 and a season id so both these together 54 00:02:00,240 --> 00:02:06,960 act as uh like 55 00:02:02,719 --> 00:02:09,280 a unique id for extracting a particular 56 00:02:06,960 --> 00:02:11,200 row we also have the country name here 57 00:02:09,280 --> 00:02:12,640 actually it's the continent uh from 58 00:02:11,200 --> 00:02:14,000 where the 59 00:02:12,640 --> 00:02:16,239 league belongs 60 00:02:14,000 --> 00:02:18,800 uh so here it's europe uh the 61 00:02:16,239 --> 00:02:21,599 competition name is uh champions league 62 00:02:18,800 --> 00:02:24,480 uh so and the
gender is male so 63 00:02:21,599 --> 00:02:26,800 unfortunately the first 15 rows here 64 00:02:24,480 --> 00:02:28,720 that i am showing is uh 65 00:02:26,800 --> 00:02:30,319 of gender male but if you print the 66 00:02:28,720 --> 00:02:33,360 whole uh 67 00:02:30,319 --> 00:02:36,080 data set you will see there are 68 00:02:33,360 --> 00:02:40,000 like rows where gender is female too 69 00:02:36,080 --> 00:02:41,920 season name is uh 2018-19 or like 17-18 70 00:02:40,000 --> 00:02:44,800 and so on you can extract the column 71 00:02:41,920 --> 00:02:46,800 names of comp and this is what ah the 72 00:02:44,800 --> 00:02:49,280 column names are competition id season 73 00:02:46,800 --> 00:02:51,200 id and so on ah now let us make sense of 74 00:02:49,280 --> 00:02:53,120 a particular row from the comp data set 75 00:02:51,200 --> 00:02:54,800 so for example if we look into the 76 00:02:53,120 --> 00:02:56,800 row where the competition id is 16 and 77 00:02:54,800 --> 00:02:58,400 the season id is one we notice that the 78 00:02:56,800 --> 00:02:59,760 country name is europe the competition 79 00:02:58,400 --> 00:03:03,599 name is champions league and the season 80 00:02:59,760 --> 00:03:04,720 name is 2017-18 and so on so suppose 81 00:03:03,599 --> 00:03:06,720 we are satisfied with the above 82 00:03:04,720 --> 00:03:09,599 information and we want to analyze a 83 00:03:06,720 --> 00:03:11,840 game from the 2017-18 champions league season 84 00:03:09,599 --> 00:03:13,040 so we use both this competition id and 85 00:03:11,840 --> 00:03:13,920 season id 86 00:03:13,040 --> 00:03:15,920 and 87 00:03:13,920 --> 00:03:17,920 this is what we write uh to extract that 88 00:03:15,920 --> 00:03:19,519 particular row so mat equals to sb dot 89 00:03:17,920 --> 00:03:22,239 matches competition id equals to 16 and 90 00:03:19,519 --> 00:03:24,080 season id equals to 1 so this gives 91 00:03:22,239 --> 00:03:26,879 us the particular match from where we 92 00:03:24,080
--> 00:03:30,640 will extract the event data 93 00:03:26,879 --> 00:03:32,560 so this is uh the mat data set and we 94 00:03:30,640 --> 00:03:34,480 see it also has a match id which is 95 00:03:32,560 --> 00:03:37,599 again the unique id for this particular 96 00:03:34,480 --> 00:03:38,640 data set mat but here we have only one 97 00:03:37,599 --> 00:03:40,879 row 98 00:03:38,640 --> 00:03:43,599 for this data set and the match date 99 00:03:40,879 --> 00:03:44,400 when the match took place uh the kickoff 100 00:03:43,599 --> 00:03:46,799 time 101 00:03:44,400 --> 00:03:49,400 uh as we know it's a european champions 102 00:03:46,799 --> 00:03:51,760 league uh game so it was on 103 00:03:49,400 --> 00:03:53,519 2017-18 the home team was real madrid 104 00:03:51,760 --> 00:03:55,360 and the away team was liverpool and the 105 00:03:53,519 --> 00:03:58,400 home team score was three and away team 106 00:03:55,360 --> 00:04:00,560 score was one now we use this match id 107 00:03:58,400 --> 00:04:01,840 uh to extract the events and store it in 108 00:04:00,560 --> 00:04:04,159 events data set 109 00:04:01,840 --> 00:04:05,519 okay ah so the events data set uh 110 00:04:04,159 --> 00:04:07,200 fetches us the events data for the 111 00:04:05,519 --> 00:04:09,519 particular match and that's what it 112 00:04:07,200 --> 00:04:11,439 looks like it has a 113 00:04:09,519 --> 00:04:13,040 large number of rows 114 00:04:11,439 --> 00:04:15,120 and columns so we see that we are able 115 00:04:13,040 --> 00:04:18,160 to get access to all the events from uh 116 00:04:15,120 --> 00:04:20,400 the real madrid versus liverpool match 117 00:04:18,160 --> 00:04:23,520 and like we can also print the column 118 00:04:20,400 --> 00:04:27,360 names using events.columns uh now we 119 00:04:23,520 --> 00:04:31,520 will see how we can draw a 120 00:04:27,360 --> 00:04:31,520 football pitch using mplsoccer 121 00:04:31,680 --> 00:04:36,560 and this package was developed by uh 122
00:04:34,639 --> 00:04:38,560 and developed and maintained by anmol 123 00:04:36,560 --> 00:04:40,000 durgapal and andrew rowlinson please go 124 00:04:38,560 --> 00:04:41,520 and check them out 125 00:04:40,000 --> 00:04:43,440 and like 126 00:04:41,520 --> 00:04:44,639 like any other package we pip install 127 00:04:43,440 --> 00:04:46,880 it first 128 00:04:44,639 --> 00:04:48,639 and then we uh 129 00:04:46,880 --> 00:04:49,600 we also need to import the matplotlib 130 00:04:48,639 --> 00:04:53,040 and the 131 00:04:49,600 --> 00:04:54,840 matplotlib and the pitch classes uh so 132 00:04:53,040 --> 00:04:57,120 this is what we do here import 133 00:04:54,840 --> 00:05:00,000 matplotlib.pyplot as plt and uh from 134 00:04:57,120 --> 00:05:03,440 mplsoccer.pitch import 135 00:05:00,000 --> 00:05:06,160 Pitch but here p is capital and uh with 136 00:05:03,440 --> 00:05:08,080 the function Pitch with again p capital 137 00:05:06,160 --> 00:05:10,400 uh we set the pitch color to be 138 00:05:08,080 --> 00:05:12,800 grass the line color to be white 139 00:05:10,400 --> 00:05:14,240 stripe to be true and so on okay and 140 00:05:12,800 --> 00:05:16,160 these are different parameters that you 141 00:05:14,240 --> 00:05:18,560 can enter and if you draw it you will 142 00:05:16,160 --> 00:05:21,199 end up with this pitch so this is the 143 00:05:18,560 --> 00:05:23,600 default one uh the statsbomb uh type 144 00:05:21,199 --> 00:05:26,960 pitch uh which is provided by the 145 00:05:23,600 --> 00:05:30,160 statsbombpy package where uh the x and 146 00:05:26,960 --> 00:05:33,360 like x axis ranges from zero to 120 and 147 00:05:30,160 --> 00:05:35,440 the y axis ranges from 80 to zero 148 00:05:33,360 --> 00:05:37,120 and but there are different uh types of 149 00:05:35,440 --> 00:05:39,120 pitches that you can draw with uh 150 00:05:37,120 --> 00:05:40,960 statsbombpy package uh except the default 151 00:05:39,120 --> 00:05:42,479 one like apart from the
default one you 152 00:05:40,960 --> 00:05:44,320 can also set the color to be black and 153 00:05:42,479 --> 00:05:46,400 this is what you will end up with then 154 00:05:44,320 --> 00:05:48,960 the default one statsbomb other than 155 00:05:46,400 --> 00:05:51,360 statsbomb there are different uh 156 00:05:48,960 --> 00:05:54,320 other pitches like opta tracab skillcorner 157 00:05:51,360 --> 00:05:55,520 wyscout metricasports uefa custom 158 00:05:54,320 --> 00:05:56,800 and so on 159 00:05:55,520 --> 00:05:58,639 okay 160 00:05:56,800 --> 00:06:01,280 like this can be set using pitch type 161 00:05:58,639 --> 00:06:04,160 argument inside the Pitch function 162 00:06:01,280 --> 00:06:07,520 here i have set the pitch type to be uefa 163 00:06:04,160 --> 00:06:10,000 and here we see that x ranges from 0 to 164 00:06:07,520 --> 00:06:12,160 slightly above 100 and y ranges from 0 165 00:06:10,000 --> 00:06:13,520 to 70. these are the most basic concepts 166 00:06:12,160 --> 00:06:15,759 covering the topic of drawing and 167 00:06:13,520 --> 00:06:19,039 visualizing a football pitch using mplsoccer 168 00:06:15,759 --> 00:06:20,639 and now we will use uh 169 00:06:19,039 --> 00:06:23,680 the concepts from complex network 170 00:06:20,639 --> 00:06:25,840 analysis to draw a pass network uh for a 171 00:06:23,680 --> 00:06:27,440 particular team from a particular match 172 00:06:25,840 --> 00:06:29,759 and then we will analyze that network 173 00:06:27,440 --> 00:06:31,440 using networkx package for that we need 174 00:06:29,759 --> 00:06:33,120 to pip install networkx we import 175 00:06:31,440 --> 00:06:35,680 networkx as nx 176 00:06:33,120 --> 00:06:38,560 we also pip install seaborn 177 00:06:35,680 --> 00:06:40,160 and import it as sns if you go back and 178 00:06:38,560 --> 00:06:41,680 look into the events data set you will 179 00:06:40,160 --> 00:06:43,280 notice that there is a column named 180 00:06:41,680 --> 00:06:45,039 tactics that provides us
with the team 181 00:06:43,280 --> 00:06:46,319 lineups formations player ids and 182 00:06:45,039 --> 00:06:49,759 their jersey numbers from both the 183 00:06:46,319 --> 00:06:52,560 teams okay so the corresponding uh 184 00:06:49,759 --> 00:06:54,240 row values for the column type give us an 185 00:06:52,560 --> 00:06:56,319 idea about whether it was the starting 186 00:06:54,240 --> 00:06:58,000 eleven formation or was a tactical 187 00:06:56,319 --> 00:06:59,520 shift so let us generate a completely 188 00:06:58,000 --> 00:07:01,440 new data set only focusing on the 189 00:06:59,520 --> 00:07:03,120 tactics and the type columns 190 00:07:01,440 --> 00:07:05,440 and we'll filter the data in such a way 191 00:07:03,120 --> 00:07:07,919 that the tactics column has no rows set 192 00:07:05,440 --> 00:07:09,599 to NaN okay so that's what we do here 193 00:07:07,919 --> 00:07:11,759 so this is what it looks like here we 194 00:07:09,599 --> 00:07:14,639 see 195 00:07:11,759 --> 00:07:15,599 the entry uh of the tactics column 196 00:07:14,639 --> 00:07:16,479 uh 197 00:07:15,599 --> 00:07:19,840 is a 198 00:07:16,479 --> 00:07:22,319 dictionary where the key formation 199 00:07:19,840 --> 00:07:25,520 uh gives the formation of the players 200 00:07:22,319 --> 00:07:25,520 the key lineup gives the 201 00:07:25,759 --> 00:07:29,759 player ids names and jersey numbers 202 00:07:28,400 --> 00:07:31,199 for both the teams let's focus only on 203 00:07:29,759 --> 00:07:34,240 the tactics for the starting eleven 204 00:07:31,199 --> 00:07:36,720 setup from both the teams and uh like uh 205 00:07:34,240 --> 00:07:38,240 this is given so this i got this idea 206 00:07:36,720 --> 00:07:41,919 from uh 207 00:07:38,240 --> 00:07:44,960 like from the mplsoccer 208 00:07:41,919 --> 00:07:47,759 package uh documentation uh website and 209 00:07:44,960 --> 00:07:49,599 also from the uh youtube video by mckay 210 00:07:47,759 --> 00:07:51,840 johns and like uh you should go and 211
00:07:49,599 --> 00:07:54,080 check both these resources out 212 00:07:51,840 --> 00:07:55,520 okay so we will build the analysis uh 213 00:07:54,080 --> 00:07:57,039 build and analyze the pass network 214 00:07:55,520 --> 00:07:58,960 generated from among the starting 11 215 00:07:57,039 --> 00:08:00,560 players from either of the teams 216 00:07:58,960 --> 00:08:03,360 okay if you look into the first two rows 217 00:08:00,560 --> 00:08:05,680 of the type column in tact we see that 218 00:08:03,360 --> 00:08:08,080 they are set as starting 11 219 00:08:05,680 --> 00:08:10,240 one for each team so let us separately 220 00:08:08,080 --> 00:08:12,160 fetch the data for the teams so that's 221 00:08:10,240 --> 00:08:14,080 what we do here so it's as i said it was 222 00:08:12,160 --> 00:08:16,960 a the tactics column is made up of a 223 00:08:14,080 --> 00:08:18,960 python dictionary object and for now we 224 00:08:16,960 --> 00:08:19,919 are only interested in the key lineup 225 00:08:18,960 --> 00:08:22,319 okay 226 00:08:19,919 --> 00:08:23,440 so we just extract the information from 227 00:08:22,319 --> 00:08:25,840 the lineup 228 00:08:23,440 --> 00:08:27,039 key we then convert that into a data 229 00:08:25,840 --> 00:08:30,479 frame using 230 00:08:27,039 --> 00:08:32,080 from_dict the from_dict function of the 231 00:08:30,479 --> 00:08:34,719 pandas data frame 232 00:08:32,080 --> 00:08:36,080 we end up with this okay so we see that 233 00:08:34,719 --> 00:08:37,599 the 234 00:08:36,080 --> 00:08:39,039 for the 235 00:08:37,599 --> 00:08:42,000 column player 236 00:08:39,039 --> 00:08:44,640 suppose the id is 5597 and his name was 237 00:08:42,000 --> 00:08:46,720 keylor navas and his position is a 238 00:08:44,640 --> 00:08:48,560 goalkeeper and 239 00:08:46,720 --> 00:08:50,000 his id is one and the jersey number is 240 00:08:48,560 --> 00:08:51,519 also given so we are basically 241 00:08:50,000 --> 00:08:53,680 interested in the players names and 242
00:08:51,519 --> 00:08:55,440 their corresponding jersey numbers so we 243 00:08:53,680 --> 00:08:57,279 will use a simple for loop and store the 244 00:08:55,440 --> 00:08:59,600 information in separate dictionaries for 245 00:08:57,279 --> 00:09:00,959 both the teams okay so that's what we do 246 00:08:59,600 --> 00:09:02,240 here okay so now we have collected the 247 00:09:00,959 --> 00:09:04,160 names and the jersey numbers of the 248 00:09:02,240 --> 00:09:05,839 players the starting eleven players 249 00:09:04,160 --> 00:09:08,080 uh from both the teams from the events 250 00:09:05,839 --> 00:09:10,240 data set we'll extract out the relevant 251 00:09:08,080 --> 00:09:12,000 columns for our pass network analysis 252 00:09:10,240 --> 00:09:15,360 and these are the relevant columns 253 00:09:12,000 --> 00:09:17,760 minute second team type location 254 00:09:15,360 --> 00:09:19,760 pass end location pass outcome and player 255 00:09:17,760 --> 00:09:22,480 so this is the first 10 rows of the 256 00:09:19,760 --> 00:09:23,680 events pn data frame so minute second 257 00:09:22,480 --> 00:09:25,519 team 258 00:09:23,680 --> 00:09:27,120 type location pass end location pass 259 00:09:25,519 --> 00:09:28,720 outcome player next step is to filter 260 00:09:27,120 --> 00:09:30,800 out the data set by teams and store them 261 00:09:28,720 --> 00:09:33,200 as new data sets okay one for liverpool 262 00:09:30,800 --> 00:09:34,720 and one for real madrid 263 00:09:33,200 --> 00:09:36,560 that's what we do here as we are only 264 00:09:34,720 --> 00:09:38,560 interested in the pass uh network 265 00:09:36,560 --> 00:09:40,720 generation we will filter the data sets 266 00:09:38,560 --> 00:09:43,200 by keeping only those rows where type is 267 00:09:40,720 --> 00:09:45,120 set to be pass and will discard all 268 00:09:43,200 --> 00:09:47,200 the other types again view the first 10 269 00:09:45,120 --> 00:09:51,040 rows of the filtered data sets so we see 270 00:09:47,200 -->
00:09:53,120 that for real madrid uh here the team is 271 00:09:51,040 --> 00:09:55,279 only real madrid and like it was the 272 00:09:53,120 --> 00:09:56,959 type is only pass 273 00:09:55,279 --> 00:09:59,020 location 274 00:09:56,959 --> 00:10:00,560 here if you notice the location is a 275 00:09:59,020 --> 00:10:02,880 [Music] 276 00:10:00,560 --> 00:10:04,880 like it's a list of two numbers which 277 00:10:02,880 --> 00:10:07,200 gives the x and y coordinates and same 278 00:10:04,880 --> 00:10:09,600 for pass end location okay so later we 279 00:10:07,200 --> 00:10:12,000 will uh like separate this location 280 00:10:09,600 --> 00:10:13,440 column into location x and location y uh 281 00:10:12,000 --> 00:10:14,959 for the 282 00:10:13,440 --> 00:10:16,640 one column for x coordinates and the 283 00:10:14,959 --> 00:10:18,560 other column for y coordinates and same 284 00:10:16,640 --> 00:10:19,839 for pass end location let us now uh very 285 00:10:18,560 --> 00:10:22,480 carefully observe the datasets so 286 00:10:19,839 --> 00:10:24,240 suppose from events pn uh real dataset we 287 00:10:22,480 --> 00:10:26,640 are focusing on the second and the third 288 00:10:24,240 --> 00:10:29,040 row command which makes the pass at 289 00:10:26,640 --> 00:10:30,720 around zeroth minute and tenth second 290 00:10:29,040 --> 00:10:33,040 second row and dani carvajal receives 291 00:10:30,720 --> 00:10:35,920 the pass at around 0th minute and 11 292 00:10:33,040 --> 00:10:38,000 seconds so in both the data sets we now 293 00:10:35,920 --> 00:10:39,920 need to add two extra columns named as 294 00:10:38,000 --> 00:10:42,160 pass maker and pass receiver where the 295 00:10:39,920 --> 00:10:43,600 pass maker column would be the same as the 296 00:10:42,160 --> 00:10:45,680 player column and the pass receiver 297 00:10:43,600 --> 00:10:48,160 column would be the player column whose 298 00:10:45,680 --> 00:10:49,920 index would be shifted by one place in 299 00:10:48,160 -->
00:10:51,839 the negative direction which i learned uh 300 00:10:49,920 --> 00:10:54,240 from the 301 00:10:51,839 --> 00:10:56,640 like mplsoccer's uh documentation and 302 00:10:54,240 --> 00:10:58,800 the youtube video by mckay johns so go 303 00:10:56,640 --> 00:11:00,480 again go check them out 304 00:10:58,800 --> 00:11:02,640 so this can be achieved by the shift 305 00:11:00,480 --> 00:11:04,640 function provided by pandas so we will 306 00:11:02,640 --> 00:11:06,880 perform the operation on both the uh 307 00:11:04,640 --> 00:11:08,000 both events pn real and events pn uh 308 00:11:06,880 --> 00:11:09,040 liv 309 00:11:08,000 --> 00:11:10,720 okay 310 00:11:09,040 --> 00:11:13,440 so that's what we do here so this is the 311 00:11:10,720 --> 00:11:16,079 modified data set um so 312 00:11:13,440 --> 00:11:17,920 we see that uh we have added 313 00:11:16,079 --> 00:11:19,680 the pass maker uh column and the pass 314 00:11:17,920 --> 00:11:20,959 receiver column uh now there might be 315 00:11:19,680 --> 00:11:23,839 passes which are which were not 316 00:11:20,959 --> 00:11:25,839 successful okay so one trick like one 317 00:11:23,839 --> 00:11:28,480 thing to notice is that in statsbomb 318 00:11:25,839 --> 00:11:30,160 data passes whose pass outcome are set 319 00:11:28,480 --> 00:11:32,240 as NaN are actually the successful 320 00:11:30,160 --> 00:11:34,320 passes okay so this is very important 321 00:11:32,240 --> 00:11:36,560 thing to keep in mind we'll again filter 322 00:11:34,320 --> 00:11:38,720 the data sets by successful passes 323 00:11:36,560 --> 00:11:39,760 uh so these are the modified data sets 324 00:11:38,720 --> 00:11:41,200 again 325 00:11:39,760 --> 00:11:43,279 so it seems we have been able to 326 00:11:41,200 --> 00:11:44,880 logically clean and modify the data sets 327 00:11:43,279 --> 00:11:46,480 now we are only focused on building the 328 00:11:44,880 --> 00:11:47,920 pass network among the players who are 329
00:11:46,480 --> 00:11:49,440 in the starting eleven 330 00:11:47,920 --> 00:11:51,120 okay 331 00:11:49,440 --> 00:11:53,200 so for that we need the time when the 332 00:11:51,120 --> 00:11:55,360 first substitution took place there were 333 00:11:53,200 --> 00:11:57,040 three substitutions for real madrid and 334 00:11:55,360 --> 00:11:59,760 the first substitution took place at 335 00:11:57,040 --> 00:12:01,519 36th minute at 17 second 336 00:11:59,760 --> 00:12:02,639 okay and same for liverpool there were 337 00:12:01,519 --> 00:12:05,040 two substitutions and the first 338 00:12:02,639 --> 00:12:07,200 substitution took place at 29 2013 339 00:12:05,040 --> 00:12:09,600 second 340 00:12:07,200 --> 00:12:12,240 so that's what we do here with this 341 00:12:09,600 --> 00:12:13,760 sample code we like try to find out the 342 00:12:12,240 --> 00:12:17,120 minute and second 343 00:12:13,760 --> 00:12:18,560 so we use the minute 344 00:12:17,120 --> 00:12:21,279 value to 345 00:12:18,560 --> 00:12:23,680 like discard all those uh rows which 346 00:12:21,279 --> 00:12:25,680 appeared after that 347 00:12:23,680 --> 00:12:27,600 minute okay 348 00:12:25,680 --> 00:12:30,720 all those events that occurred after 349 00:12:27,600 --> 00:12:32,720 that uh time now from the data sets we 350 00:12:30,720 --> 00:12:34,800 split the location and the pass end 351 00:12:32,720 --> 00:12:36,880 location into two columns as i said one 352 00:12:34,800 --> 00:12:38,880 for uh x coordinates and another for y 353 00:12:36,880 --> 00:12:41,120 coordinates so inspired by the way given 354 00:12:38,880 --> 00:12:43,839 here uh if you click in this link 355 00:12:41,120 --> 00:12:45,839 you will see uh in the mplsoccer uh 356 00:12:43,839 --> 00:12:47,519 documentation page we will take the 357 00:12:45,839 --> 00:12:49,600 average locations of the starting eleven 358 00:12:47,519 --> 00:12:52,000 players uh on the field for for a 359 00:12:49,600 --> 00:12:53,920 unified
construction of the pass network 360 00:12:52,000 --> 00:12:55,200 so for that we will use the aggregate 361 00:12:53,920 --> 00:12:57,120 function 362 00:12:55,200 --> 00:12:58,720 and what it does is that uh we'll count 363 00:12:57,120 --> 00:12:59,600 the number of passes created by this 364 00:12:58,720 --> 00:13:02,240 player 365 00:12:59,600 --> 00:13:04,560 okay so 366 00:13:02,240 --> 00:13:06,639 here av loc real gives the average 367 00:13:04,560 --> 00:13:08,800 location of the player of a particular 368 00:13:06,639 --> 00:13:10,560 player from a particular team in that 369 00:13:08,800 --> 00:13:13,279 match as i said using the aggregate 370 00:13:10,560 --> 00:13:15,279 function gives the mean of 371 00:13:13,279 --> 00:13:17,360 the x and y coordinates for a 372 00:13:15,279 --> 00:13:19,040 particular player and for casemiro 373 00:13:17,360 --> 00:13:21,600 we see that the mean is 374 00:13:19,040 --> 00:13:22,959 eight four five in the x coordinate and 375 00:13:21,600 --> 00:13:26,079 thirty one point eight three six in the 376 00:13:22,959 --> 00:13:27,600 y coordinate and he completed uh eleven 377 00:13:26,079 --> 00:13:29,120 passes once we sort out the starting 378 00:13:27,600 --> 00:13:30,560 eleven players average locations in a 379 00:13:29,120 --> 00:13:33,120 game we'll try to figure out the number 380 00:13:30,560 --> 00:13:34,959 of times a particular pass maker passes 381 00:13:33,120 --> 00:13:36,399 the ball to a particular pass receiver 382 00:13:34,959 --> 00:13:38,399 because just to keep the direction of 383 00:13:36,399 --> 00:13:40,480 pass in mind that is a pass from player 384 00:13:38,399 --> 00:13:42,480 a to player b is not identical to 385 00:13:40,480 --> 00:13:44,959 the pass from player b to player a so it 386 00:13:42,480 --> 00:13:46,399 would give a directed ah network so 387 00:13:44,959 --> 00:13:48,079 we'll use group by and the count 388 00:13:46,399 --> 00:13:50,399 function to count the number
of rows 389 00:13:48,079 --> 00:13:52,880 where a unique player a passes the ball to 390 00:13:50,399 --> 00:13:54,880 another unique player b 391 00:13:52,880 --> 00:13:57,360 so here we use the group by function and 392 00:13:54,880 --> 00:13:59,440 here we just reset the index accordingly 393 00:13:57,360 --> 00:14:01,120 and we see that uh here the pass maker 394 00:13:59,440 --> 00:14:03,760 is casemiro and the pass receiver is 395 00:14:01,120 --> 00:14:06,079 carvajal and casemiro passed only once 396 00:14:03,760 --> 00:14:08,800 in the entire game to uh dani carvajal 397 00:14:06,079 --> 00:14:10,800 he passed six times to toni kroos so we 398 00:14:08,800 --> 00:14:12,880 see that six plus seven eight nine ten 399 00:14:10,800 --> 00:14:15,040 eleven so yeah there there were eleven 400 00:14:12,880 --> 00:14:16,959 passes if you remember for uh casemiro 401 00:14:15,040 --> 00:14:18,959 then we'll merge the data sets av 402 00:14:16,959 --> 00:14:21,199 loc real and pass real and let us 403 00:14:18,959 --> 00:14:23,839 identify the left and right data frames 404 00:14:21,199 --> 00:14:25,839 for uh performing the merge okay here 405 00:14:23,839 --> 00:14:28,000 av loc real is the left data frame and 406 00:14:25,839 --> 00:14:31,440 the pass real is the right 407 00:14:28,000 --> 00:14:33,360 so we just merge them and uh 408 00:14:31,440 --> 00:14:35,279 this is what the new data set looks like 409 00:14:33,360 --> 00:14:37,760 so pass maker pass receiver the number 410 00:14:35,279 --> 00:14:39,040 of passes uh given by the pass maker to 411 00:14:37,760 --> 00:14:41,680 pass receiver 412 00:14:39,040 --> 00:14:43,760 pass makers average x location 413 00:14:41,680 --> 00:14:44,800 x coordinate pass makers average y 414 00:14:43,760 --> 00:14:46,560 coordinate 415 00:14:44,800 --> 00:14:48,959 and the total number of 416 00:14:46,560 --> 00:14:51,279 passes completed by the pass maker so we 417 00:14:48,959 --> 00:14:53,519 see that it's 11 for all
the 418 00:14:51,279 --> 00:14:56,079 cases rows where uh the pass maker is 419 00:14:53,519 --> 00:14:56,079 casemiro 420 00:14:56,160 --> 00:15:01,360 and same for liverpool 421 00:14:59,040 --> 00:15:03,040 uh so finally uh we will again perform a 422 00:15:01,360 --> 00:15:04,720 merge on these updated data sets for 423 00:15:03,040 --> 00:15:06,320 adding the average locations of the pass 424 00:15:04,720 --> 00:15:08,720 receivers and the number of times the 425 00:15:06,320 --> 00:15:11,040 receiver received the ball okay 426 00:15:08,720 --> 00:15:11,040 so 427 00:15:11,199 --> 00:15:14,000 that's what we do 428 00:15:12,399 --> 00:15:14,800 and uh 429 00:15:14,000 --> 00:15:16,480 like 430 00:15:14,800 --> 00:15:18,880 we added this column number of passes 431 00:15:16,480 --> 00:15:21,920 received by that particular uh 432 00:15:18,880 --> 00:15:24,320 player so casemiro i think received 24 433 00:15:21,920 --> 00:15:26,079 passes then lastly we will replace the 434 00:15:24,320 --> 00:15:27,199 players names with their jersey numbers 435 00:15:26,079 --> 00:15:29,600 okay 436 00:15:27,199 --> 00:15:31,360 and then we just visualize the pass 437 00:15:29,600 --> 00:15:33,519 pass network for both the teams so we 438 00:15:31,360 --> 00:15:36,240 use the Pitch function to draw the pitch 439 00:15:33,519 --> 00:15:38,079 and then we use arrows to draw the edges 440 00:15:36,240 --> 00:15:42,000 between the nodes 441 00:15:38,079 --> 00:15:44,639 and then pitch dot scatter uh 442 00:15:42,000 --> 00:15:46,320 so pitch dot arrows function uh does that 443 00:15:44,639 --> 00:15:49,199 and pitch dot scatter function draws the 444 00:15:46,320 --> 00:15:51,680 nodes the thicker the edge uh the number 445 00:15:49,199 --> 00:15:53,600 of more the number of 446 00:15:51,680 --> 00:15:55,839 passes were played so that's what it 447 00:15:53,600 --> 00:15:58,639 actually uh means 448 00:15:55,839 --> 00:16:00,240 and yeah so this is the 449 00:15:58,639 -->
00:16:01,600 pass network for real madrid against 450 00:16:00,240 --> 00:16:05,839 liverpool 451 00:16:01,600 --> 00:16:08,320 and this is for liverpool okay 452 00:16:05,839 --> 00:16:09,920 and now as we have drawn the visualize 453 00:16:08,320 --> 00:16:13,920 the pass networks we can 454 00:16:09,920 --> 00:16:15,279 perform some complex network analysis by 455 00:16:13,920 --> 00:16:16,560 analyzing our networks using some 456 00:16:15,279 --> 00:16:18,800 metrics 457 00:16:16,560 --> 00:16:18,800 and 458 00:16:19,040 --> 00:16:23,440 as uh let us first develop the 459 00:16:20,480 --> 00:16:26,079 isomorphic graph uh to the ones that we 460 00:16:23,440 --> 00:16:29,120 just saw uh and by using the network 461 00:16:26,079 --> 00:16:30,720 package networkx package and the thing 462 00:16:29,120 --> 00:16:33,120 is uh we don't need the average 463 00:16:30,720 --> 00:16:35,040 locations here okay uh we are only 464 00:16:33,120 --> 00:16:36,880 interested in the 465 00:16:35,040 --> 00:16:38,959 nodes and the information about the 466 00:16:36,880 --> 00:16:41,040 nodes and the edges but not about the 467 00:16:38,959 --> 00:16:43,040 locations what we do is that 468 00:16:41,040 --> 00:16:44,959 we create this data set pass real new 469 00:16:43,040 --> 00:16:46,160 from pass real 470 00:16:44,959 --> 00:16:48,720 new where 471 00:16:46,160 --> 00:16:50,399 the pass maker we only take the columns 472 00:16:48,720 --> 00:16:52,800 pass maker pass receiver and number of 473 00:16:50,399 --> 00:16:54,880 passes use the nx.DiGraph function 474 00:16:52,800 --> 00:16:56,560 because it's a directed graph so DiGraph 475 00:16:54,880 --> 00:16:59,120 function from networkx 476 00:16:56,560 --> 00:17:01,040 and then we add the edge 477 00:16:59,120 --> 00:17:02,880 along with the weight yes so this is 478 00:17:01,040 --> 00:17:04,720 what it looks like okay now we will use 479 00:17:02,880 --> 00:17:06,079 networkx to find out the node degrees 480
00:17:04,720 --> 00:17:08,559 from uh 481 00:17:06,079 --> 00:17:10,880 pass network of real madrid so 482 00:17:08,559 --> 00:17:11,760 by using the nx.degree function on g 483 00:17:10,880 --> 00:17:13,520 real 484 00:17:11,760 --> 00:17:14,880 and we see that these are the node 485 00:17:13,520 --> 00:17:16,400 degrees 486 00:17:14,880 --> 00:17:17,679 so node degrees means the number of 487 00:17:16,400 --> 00:17:20,319 passes 488 00:17:17,679 --> 00:17:21,760 the player was actually 489 00:17:20,319 --> 00:17:25,120 involved in 490 00:17:21,760 --> 00:17:26,559 so for real madrid we see uh that jersey 491 00:17:25,120 --> 00:17:28,079 number eight 492 00:17:26,559 --> 00:17:30,720 was involved in the maximum number of 493 00:17:28,079 --> 00:17:32,799 passes we can create the in degrees and 494 00:17:30,720 --> 00:17:34,640 out degrees too so that means the 495 00:17:32,799 --> 00:17:37,919 number of in degrees means the number of 496 00:17:34,640 --> 00:17:40,240 passes a player received and out degrees 497 00:17:37,919 --> 00:17:42,480 means the number of passes the player 498 00:17:40,240 --> 00:17:44,400 gave to some other players we can also 499 00:17:42,480 --> 00:17:47,200 generate adjacency matrix 500 00:17:44,400 --> 00:17:48,640 and like we see the diagonal had uh all 501 00:17:47,200 --> 00:17:50,400 zeros that means 502 00:17:48,640 --> 00:17:52,240 there were no self loops so a player 503 00:17:50,400 --> 00:17:54,400 cannot pass to himself now we can work 504 00:17:52,240 --> 00:17:56,799 on a metric that focuses on geodesic 505 00:17:54,400 --> 00:17:58,559 distance between two player nodes uh 506 00:17:56,799 --> 00:18:00,720 two player nodes in a graph one way to 507 00:17:58,559 --> 00:18:03,679 implement this is to divide one by the 508 00:18:00,720 --> 00:18:05,840 weight column and that will give a 509 00:18:03,679 --> 00:18:08,000 new graph this is the 510 00:18:05,840 --> 00:18:09,039 graph that we end up with 511
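[editor's sketch] The graph steps described above (building a directed, weighted pass graph, reading degrees, checking for self loops, and inverting the weights) can be sketched roughly as follows. This is a minimal illustration, not the speaker's code: the pass counts are made up, the variable names are hypothetical, and passing weight="weight" so that degrees count passes rather than distinct partners is an assumption about how the talk computed them.

```python
import networkx as nx

# Hypothetical (pass maker, pass receiver, number of passes) rows keyed by
# jersey number; these counts are invented for illustration only.
passes = [(1, 4, 3), (4, 12, 5), (12, 7, 2), (4, 8, 6), (8, 7, 1), (8, 4, 4)]

# Directed graph: a pass from a to b is a different edge than b to a.
G = nx.DiGraph()
for maker, receiver, n in passes:
    G.add_edge(maker, receiver, weight=n)

# Weighted degrees: total passes a player was involved in, received, and made.
deg = dict(G.degree(weight="weight"))
in_deg = dict(G.in_degree(weight="weight"))
out_deg = dict(G.out_degree(weight="weight"))

# A player cannot pass to himself, so there should be no self loops.
self_loops = nx.number_of_selfloops(G)

# Inverse-weight copy: frequently used links become "short" distances,
# which is what a geodesic (shortest path) metric needs.
H = nx.DiGraph()
for u, v, data in G.edges(data=True):
    H.add_edge(u, v, weight=1 / data["weight"])
```

With the inverse weights in place, shortest-path routines treat the most frequently used passing links as the cheapest ones.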
00:18:08,000 --> 00:18:10,799 and 512 00:18:09,039 --> 00:18:12,640 we'll perform the same operations to 513 00:18:10,799 --> 00:18:14,799 create a modified graph for 514 00:18:12,640 --> 00:18:17,039 liverpool too 515 00:18:14,799 --> 00:18:19,039 and now using this modified graph we can 516 00:18:17,039 --> 00:18:20,640 calculate the all pairs shortest paths 517 00:18:19,039 --> 00:18:22,720 between the nodes players for both the 518 00:18:20,640 --> 00:18:24,559 teams let us compute first for real 519 00:18:22,720 --> 00:18:26,960 madrid if you intend to find the 520 00:18:24,559 --> 00:18:29,520 shortest path between two players like 521 00:18:26,960 --> 00:18:31,679 what would have been the shortest 522 00:18:29,520 --> 00:18:34,160 path for a pass from one player to 523 00:18:31,679 --> 00:18:36,080 another so suppose we want to calculate 524 00:18:34,160 --> 00:18:38,320 the shortest path from the goalkeeper 525 00:18:36,080 --> 00:18:40,640 keylor navas to the forward cristiano 526 00:18:38,320 --> 00:18:43,679 ronaldo we'll just type the following so 527 00:18:40,640 --> 00:18:46,480 print uh this real one to seven and we 528 00:18:43,679 --> 00:18:48,960 see that from keylor navas to sergio ramos 529 00:18:46,480 --> 00:18:50,799 and from sergio ramos to 530 00:18:48,960 --> 00:18:53,120 whoever the player 531 00:18:50,799 --> 00:18:56,240 was with jersey number 12 to cristiano 532 00:18:53,120 --> 00:18:58,480 ronaldo i think uh 533 00:18:56,240 --> 00:19:00,080 okay it was marcelo and with him 534 00:18:58,480 --> 00:19:01,760 ultimately passing to uh cristiano 535 00:19:00,080 --> 00:19:03,919 ronaldo would have been the 536 00:19:01,760 --> 00:19:05,200 uh 537 00:19:03,919 --> 00:19:07,679 like 538 00:19:05,200 --> 00:19:09,120 would have been the uh shortest path and 539 00:19:07,679 --> 00:19:10,559 this seems like a good post match 540 00:19:09,120 --> 00:19:13,360 analysis tool 541 00:19:10,559 --> 00:19:17,440 i got this idea from discussing
with uh 542 00:19:13,360 --> 00:19:20,240 sarah babu and he is from iist india 543 00:19:17,440 --> 00:19:22,000 now we can calculate the eccentricity uh 544 00:19:20,240 --> 00:19:24,160 which is based on the shortest distance 545 00:19:22,000 --> 00:19:26,559 so eccentricity of a player node p tells 546 00:19:24,160 --> 00:19:29,679 us how far the furthest player node from 547 00:19:26,559 --> 00:19:32,480 p is positioned in the pass network so 548 00:19:29,679 --> 00:19:33,840 this is for all the players for 549 00:19:32,480 --> 00:19:35,840 real madrid 550 00:19:33,840 --> 00:19:38,480 and this is the average uh 551 00:19:35,840 --> 00:19:41,520 eccentricity and for liverpool also 552 00:19:38,480 --> 00:19:42,720 we calculate the average eccentricity 553 00:19:41,520 --> 00:19:43,919 we can also calculate the average 554 00:19:42,720 --> 00:19:46,160 clustering coefficient of a player 555 00:19:43,919 --> 00:19:48,320 calculated it's 0.182 and for liverpool 556 00:19:46,160 --> 00:19:50,000 it's 0.276 557 00:19:48,320 --> 00:19:52,640 so the average clustering coefficient 558 00:19:50,000 --> 00:19:54,400 lies in the range 0 and 1 where a value 559 00:19:52,640 --> 00:19:56,320 of 0 denotes the fact that none of the 560 00:19:54,400 --> 00:19:58,080 nodes are connected to each other and 561 00:19:56,320 --> 00:20:00,000 value of 1 denotes the fact that the 562 00:19:58,080 --> 00:20:01,840 network is a clique that is each node is 563 00:20:00,000 --> 00:20:04,880 connected to all the other nodes of the 564 00:20:01,840 --> 00:20:07,840 network also can compute the centrality 565 00:20:04,880 --> 00:20:09,520 of each player uh and 566 00:20:07,840 --> 00:20:10,880 for each node in either team's pass 567 00:20:09,520 --> 00:20:12,880 network and understand which player was 568 00:20:10,880 --> 00:20:14,720 the most important so more the 569 00:20:12,880 --> 00:20:17,280 centrality of a particular player more 570 00:20:14,720 --> 00:20:19,919
important he was in that match 571 00:20:17,280 --> 00:20:22,400 uh so for real madrid it was casemiro 572 00:20:19,919 --> 00:20:24,320 and for liverpool it was milner right so 573 00:20:22,400 --> 00:20:26,640 next we'll talk about uh pass map and 574 00:20:24,320 --> 00:20:27,760 the corresponding heat maps so we will 575 00:20:26,640 --> 00:20:30,080 again uh 576 00:20:27,760 --> 00:20:32,080 from the same material again work on the 577 00:20:30,080 --> 00:20:33,280 fetching the pass data visualize the 578 00:20:32,080 --> 00:20:34,720 pass map 579 00:20:33,280 --> 00:20:36,559 and 580 00:20:34,720 --> 00:20:37,520 as we saw in so this is the events data 581 00:20:36,559 --> 00:20:38,799 set 582 00:20:37,520 --> 00:20:40,720 and 583 00:20:38,799 --> 00:20:42,240 so we'll 584 00:20:40,720 --> 00:20:44,480 like take the 585 00:20:42,240 --> 00:20:46,240 columns team type minute location 586 00:20:44,480 --> 00:20:47,520 pass end location pass outcome and 587 00:20:46,240 --> 00:20:49,760 player 588 00:20:47,520 --> 00:20:51,360 okay so here we will 589 00:20:49,760 --> 00:20:53,280 focus on a particular player and let's 590 00:20:51,360 --> 00:20:54,159 choose toni kroos because he is one of 591 00:20:53,280 --> 00:20:56,720 the 592 00:20:54,159 --> 00:20:58,480 best midfield builders of this uh era 593 00:20:56,720 --> 00:21:00,960 from events pass 594 00:20:58,480 --> 00:21:03,679 we only take the player whose name is 595 00:21:00,960 --> 00:21:06,000 toni kroos and 596 00:21:03,679 --> 00:21:08,320 this is what we end up with the data set 597 00:21:06,000 --> 00:21:10,880 like the type column in events pass p1 598 00:21:08,320 --> 00:21:13,360 has event types other than passes but 599 00:21:10,880 --> 00:21:16,880 we'll only set to pass 600 00:21:13,360 --> 00:21:19,520 and that's what we do here and this is 601 00:21:16,880 --> 00:21:20,559 the data set where the type is only pass 602 00:21:19,520 --> 00:21:23,520 and 603 00:21:20,559 --> 00:21:26,960 like
this is what the 604 00:21:23,520 --> 00:21:29,760 uh pass map looks for uh for toni kroos 605 00:21:26,960 --> 00:21:32,159 so we use the pitch function and like for 606 00:21:29,760 --> 00:21:34,640 successful passes we set the color 607 00:21:32,159 --> 00:21:36,240 green and for the unsuccessful passes we 608 00:21:34,640 --> 00:21:37,440 for the unsuccessful passes we set the 609 00:21:36,240 --> 00:21:39,280 color red 610 00:21:37,440 --> 00:21:41,760 and 611 00:21:39,280 --> 00:21:43,840 for the heat map we use the seaborn 612 00:21:41,760 --> 00:21:46,640 kde plot 613 00:21:43,840 --> 00:21:50,320 yeah again i got this idea from uh 614 00:21:46,640 --> 00:21:52,640 like uh mckay johns' tutorial and like 615 00:21:50,320 --> 00:21:53,840 if you 616 00:21:52,640 --> 00:21:55,039 want you can also calculate the 617 00:21:53,840 --> 00:21:57,360 percentage of successful and 618 00:21:55,039 --> 00:22:00,240 unsuccessful passes for toni kroos and 619 00:21:57,360 --> 00:22:02,559 we see that he had around 91 percent 620 00:22:00,240 --> 00:22:04,640 successful passes so these are if you 621 00:22:02,559 --> 00:22:07,280 want you can draw the uh frequency 622 00:22:04,640 --> 00:22:08,880 distribution so now we will focus on 623 00:22:07,280 --> 00:22:11,039 applying some computational geometric 624 00:22:08,880 --> 00:22:12,559 concepts on 625 00:22:11,039 --> 00:22:14,320 uh the 626 00:22:12,559 --> 00:22:16,080 event and tracking data 627 00:22:14,320 --> 00:22:17,919 so for that we'll visualize the convex 628 00:22:16,080 --> 00:22:19,600 hulls from players' event data 629 00:22:17,919 --> 00:22:21,600 so we'll study how to develop a convex 630 00:22:19,600 --> 00:22:23,440 hull around these points uh and from 631 00:22:21,600 --> 00:22:25,679 where a player had made a pass or had 632 00:22:23,440 --> 00:22:26,960 taken a shot in a particular game so 633 00:22:25,679 --> 00:22:29,440 mathematically if these points are 634 00:22:26,960 --> 00:22:31,440
contained in a set x then the convex hull 635 00:22:29,440 --> 00:22:33,520 is the smallest convex set that contains 636 00:22:31,440 --> 00:22:36,000 x uh so this figure has been adapted 637 00:22:33,520 --> 00:22:37,120 from this wikipedia article uh if you go 638 00:22:36,000 --> 00:22:38,840 and you will see 639 00:22:37,120 --> 00:22:42,240 it has a lot of information about convex 640 00:22:38,840 --> 00:22:43,919 hulls so we'll use uh the scipy package 641 00:22:42,240 --> 00:22:45,600 uh 642 00:22:43,919 --> 00:22:47,600 go download it 643 00:22:45,600 --> 00:22:49,120 uh so as we have been doing till now let 644 00:22:47,600 --> 00:22:50,480 us pick the important columns from the 645 00:22:49,120 --> 00:22:53,840 events data set 646 00:22:50,480 --> 00:22:56,240 like team location type and player 647 00:22:53,840 --> 00:22:58,159 here we only need the location of the 648 00:22:56,240 --> 00:23:00,720 players so it seems like we only need 649 00:22:58,159 --> 00:23:03,280 our four columns for now uh 650 00:23:00,720 --> 00:23:05,840 and we are focusing on pass and shot 651 00:23:03,280 --> 00:23:08,640 events okay so we set the type to be 652 00:23:05,840 --> 00:23:10,400 either pass or shot so that's what we 653 00:23:08,640 --> 00:23:12,080 do here then we'll next split the data 654 00:23:10,400 --> 00:23:13,679 into two data sets one for real madrid 655 00:23:12,080 --> 00:23:16,000 and one for liverpool we will now 656 00:23:13,679 --> 00:23:17,840 extract the event data for toni kroos uh 657 00:23:16,000 --> 00:23:20,559 from events hull real 658 00:23:17,840 --> 00:23:22,159 uh so we set the player to be toni kroos 659 00:23:20,559 --> 00:23:23,280 and this can be applied to all the other 660 00:23:22,159 --> 00:23:25,360 players 661 00:23:23,280 --> 00:23:26,159 and before computing and visualizing the 662 00:23:25,360 --> 00:23:28,240 pass 663 00:23:26,159 --> 00:23:30,159 visualizing the convex hull it is 664 00:23:28,240 --> 00:23:31,840 good
practice to discard the outliers 665 00:23:30,159 --> 00:23:34,720 okay from the data sets 666 00:23:31,840 --> 00:23:36,240 so we use the interquartile range 667 00:23:34,720 --> 00:23:38,799 and we'll find the interquartile ranges 668 00:23:36,240 --> 00:23:40,559 for the columns location x location y 669 00:23:38,799 --> 00:23:42,159 from the events hull toni and then 670 00:23:40,559 --> 00:23:44,159 compute the upper and lower bounds for 671 00:23:42,159 --> 00:23:45,679 the data any points lying beyond these 672 00:23:44,159 --> 00:23:49,520 bounds will be considered as 673 00:23:45,679 --> 00:23:50,559 outliers and will be discarded okay 674 00:23:49,520 --> 00:23:52,240 so 675 00:23:50,559 --> 00:23:54,880 this is what we uh do here this is the 676 00:23:52,240 --> 00:23:56,000 box plot for uh toni kroos's uh location 677 00:23:54,880 --> 00:23:57,600 coordinates 678 00:23:56,000 --> 00:24:00,159 and we see these are the so these are 679 00:23:57,600 --> 00:24:01,760 the whisker plots 680 00:24:00,159 --> 00:24:04,320 and for location x these are the 681 00:24:01,760 --> 00:24:05,039 outliers and for location y we don't 682 00:24:04,320 --> 00:24:07,440 have 683 00:24:05,039 --> 00:24:08,320 those outliers but yeah this 684 00:24:07,440 --> 00:24:11,360 uh 685 00:24:08,320 --> 00:24:13,440 we can discard the outliers uh 686 00:24:11,360 --> 00:24:15,120 and next let us look into the events hull 687 00:24:13,440 --> 00:24:16,559 toni data set 688 00:24:15,120 --> 00:24:19,200 and first we'll collect all the points 689 00:24:16,559 --> 00:24:21,120 from the two columns as a 2d matrix 690 00:24:19,200 --> 00:24:23,360 this uh comes in aid while drawing the 691 00:24:21,120 --> 00:24:25,279 convex hull 692 00:24:23,360 --> 00:24:28,400 so points hull equals to events hull toni 693 00:24:25,279 --> 00:24:30,559 location x location y dot values 694 00:24:28,400 --> 00:24:32,320 allows us to do that now let us use the 695 00:24:30,559 --> 00:24:34,880 convex hull
function from scipy dot 696 00:24:32,320 --> 00:24:36,000 spatial uh 697 00:24:34,880 --> 00:24:36,960 package 698 00:24:36,000 --> 00:24:39,600 and 699 00:24:36,960 --> 00:24:41,520 we apply convex hull on this 700 00:24:39,600 --> 00:24:43,919 location x and location 701 00:24:41,520 --> 00:24:46,320 y for events hull toni 702 00:24:43,919 --> 00:24:48,480 and then we just collect the useful 703 00:24:46,320 --> 00:24:50,799 information and we visualize the convex hull 704 00:24:48,480 --> 00:24:52,240 so this is the convex hull of toni kroos's 705 00:24:50,799 --> 00:24:53,919 field coverage 706 00:24:52,240 --> 00:24:56,480 now we can draw the convex hull for other 707 00:24:53,919 --> 00:24:59,039 players too from either of the two and 708 00:24:56,480 --> 00:25:01,520 uh next what we do is that we uh draw 709 00:24:59,039 --> 00:25:03,600 the uh 710 00:25:01,520 --> 00:25:05,760 we learn how to get the tracking data 711 00:25:03,600 --> 00:25:08,480 for a particular match uh 712 00:25:05,760 --> 00:25:11,840 uh like for each for 713 00:25:08,480 --> 00:25:13,760 for some particular instances of a match 714 00:25:11,840 --> 00:25:15,279 and then draw the corresponding 715 00:25:13,760 --> 00:25:17,279 delaunay triangulations and voronoi 716 00:25:15,279 --> 00:25:18,960 diagrams 717 00:25:17,279 --> 00:25:20,640 so we'll try to understand how to get 718 00:25:18,960 --> 00:25:22,880 the tracking data from particular game 719 00:25:20,640 --> 00:25:24,159 using statsbomb api so again it's given 720 00:25:22,880 --> 00:25:25,919 in their uh 721 00:25:24,159 --> 00:25:27,120 documentation please go and follow that 722 00:25:25,919 --> 00:25:29,919 if you want 723 00:25:27,120 --> 00:25:32,159 so we need to import useful classes from 724 00:25:29,919 --> 00:25:35,120 the mplsoccer dot statsbomb module and 725 00:25:32,159 --> 00:25:36,480 as you remember we use the match id 18245 726 00:25:35,120 --> 00:25:39,360 and 727 00:25:36,480 --> 00:25:41,600 uh from mpl
soccer dot statsbomb we import 728 00:25:39,360 --> 00:25:43,279 read event event 729 00:25:41,600 --> 00:25:45,360 the code is given here this is the 730 00:25:43,279 --> 00:25:48,559 events and tracking look at the event 731 00:25:45,360 --> 00:25:50,080 and tracking data sets so if we look 732 00:25:48,559 --> 00:25:51,840 closely into the tracking data set we 733 00:25:50,080 --> 00:25:54,240 understand that the column id represents 734 00:25:51,840 --> 00:25:56,720 a unique id for a shot freeze frame 735 00:25:54,240 --> 00:25:58,480 that is a particular instance of a match 736 00:25:56,720 --> 00:26:00,799 now we'll only extract the relevant 737 00:25:58,480 --> 00:26:03,120 columns id player name the locations x 738 00:26:00,799 --> 00:26:05,600 and y and the team 739 00:26:03,120 --> 00:26:08,559 uh now let us try collecting the jersey 740 00:26:05,600 --> 00:26:09,760 numbers uh we will use a different 741 00:26:08,559 --> 00:26:12,159 and easier approach from the one we have 742 00:26:09,760 --> 00:26:13,279 done here the blog that i have written 743 00:26:12,159 --> 00:26:14,799 so 744 00:26:13,279 --> 00:26:16,480 uh yeah but 745 00:26:14,799 --> 00:26:19,279 we can follow this to get the player 746 00:26:16,480 --> 00:26:20,960 information use the following command by 747 00:26:19,279 --> 00:26:22,960 passing the match id so player info equals 748 00:26:20,960 --> 00:26:24,480 to sb dot lineups match id equals to the 749 00:26:22,960 --> 00:26:26,799 one that we have been using 750 00:26:24,480 --> 00:26:28,240 here is the data set we see that it has 751 00:26:26,799 --> 00:26:30,320 jersey number the country the player 752 00:26:28,240 --> 00:26:31,919 belongs to player nickname player name 753 00:26:30,320 --> 00:26:33,679 and player id now let us select a 754 00:26:31,919 --> 00:26:35,440 particular id from tracking data set 755 00:26:33,679 --> 00:26:36,240 representing an instance of a particular 756 00:26:35,440 --> 00:26:38,720 game 757
00:26:36,240 --> 00:26:40,240 and we will filter tracking by an id 758 00:26:38,720 --> 00:26:41,520 value um 759 00:26:40,240 --> 00:26:43,039 which will give us the information of 760 00:26:41,520 --> 00:26:45,039 the locations of the players on the pitch 761 00:26:43,039 --> 00:26:47,840 at that moment 762 00:26:45,039 --> 00:26:50,720 uh so these are all the unique ids for a 763 00:26:47,840 --> 00:26:52,960 particular snap from a game 764 00:26:50,720 --> 00:26:55,440 let us filter the data set now so we use 765 00:26:52,960 --> 00:26:57,760 this shot id to be this we just select 766 00:26:55,440 --> 00:26:59,600 a particular value from the id column 767 00:26:57,760 --> 00:27:02,400 uh and then tracking filter equals the 768 00:26:59,600 --> 00:27:04,080 tracking tracking id where the id is the 769 00:27:02,400 --> 00:27:07,760 shot id 770 00:27:04,080 --> 00:27:10,960 and same for event and uh the 771 00:27:07,760 --> 00:27:12,080 filtered data set looks like this 772 00:27:10,960 --> 00:27:13,600 okay 773 00:27:12,080 --> 00:27:15,360 so we will now compute the delaunay 774 00:27:13,600 --> 00:27:16,880 triangulations from our team's player 775 00:27:15,360 --> 00:27:18,320 locations to get an idea about the 776 00:27:16,880 --> 00:27:19,440 possible links created among the 777 00:27:18,320 --> 00:27:20,880 teammates 778 00:27:19,440 --> 00:27:22,320 so delaunay triangulations help us 779 00:27:20,880 --> 00:27:24,159 understand the links 780 00:27:22,320 --> 00:27:26,720 uh between teammates of a particular 781 00:27:24,159 --> 00:27:29,120 team go look into the wikipedia article 782 00:27:26,720 --> 00:27:30,720 from where i took this 783 00:27:29,120 --> 00:27:32,559 diagram uh to understand more about the 784 00:27:30,720 --> 00:27:35,440 delaunay triangulations 785 00:27:32,559 --> 00:27:36,559 so we will import the delaunay uh 786 00:27:35,440 --> 00:27:38,799 class from 787 00:27:36,559 --> 00:27:40,320 scipy dot spatial 788 00:27:38,799
--> 00:27:41,840 we visualize the triangulations and the 789 00:27:40,320 --> 00:27:43,120 player positions of the instance on that 790 00:27:41,840 --> 00:27:45,760 pitch 791 00:27:43,120 --> 00:27:48,880 and this is the code to do that uh go 792 00:27:45,760 --> 00:27:52,159 check it out and yeah so this is the 793 00:27:48,880 --> 00:27:54,399 delaunay triangulation for real madrid 794 00:27:52,159 --> 00:27:56,799 so i've linked to the real madrid 795 00:27:54,399 --> 00:27:58,720 players 796 00:27:56,799 --> 00:28:00,159 and the red nodes indicate the locations 797 00:27:58,720 --> 00:28:02,799 of liverpool's players and the white 798 00:28:00,159 --> 00:28:04,399 nodes indicate real madrid's and the 799 00:28:02,799 --> 00:28:05,919 black lines indicate the direct links 800 00:28:04,399 --> 00:28:07,600 between the players of one particular team 801 00:28:05,919 --> 00:28:09,600 at a particular moment forming the 802 00:28:07,600 --> 00:28:11,760 delaunay triangulations so the book 803 00:28:09,600 --> 00:28:14,080 soccermatics uh in his book 804 00:28:11,760 --> 00:28:16,159 soccermatics by dr sumpter he mentions that 805 00:28:14,080 --> 00:28:17,840 these lines have two useful indications 806 00:28:16,159 --> 00:28:19,039 first they portray the availability of a 807 00:28:17,840 --> 00:28:20,799 player 808 00:28:19,039 --> 00:28:22,159 uh availability of passes among the 809 00:28:20,799 --> 00:28:24,080 players from a particular team and 810 00:28:22,159 --> 00:28:25,840 second they also indicate the no man's 811 00:28:24,080 --> 00:28:28,240 lands for players from the opposition 812 00:28:25,840 --> 00:28:30,240 team meaning that if an 813 00:28:28,240 --> 00:28:32,000 opposition player is on one of these 814 00:28:30,240 --> 00:28:34,480 linking lines 815 00:28:32,000 --> 00:28:36,799 then they are at a disadvantage 816 00:28:34,480 --> 00:28:38,559 uh so 817 00:28:36,799 --> 00:28:40,640 and we can also draw the voronoi 818 00:28:38,559
--> 00:28:42,399 diagrams uh again look into the 819 00:28:40,640 --> 00:28:43,600 wikipedia article to learn more about 820 00:28:42,399 --> 00:28:46,240 voronoi diagrams 821 00:28:43,600 --> 00:28:48,720 we use uh scipy spatial's voronoi and 822 00:28:46,240 --> 00:28:51,600 voronoi plot 2d classes to draw the 823 00:28:48,720 --> 00:28:53,440 voronoi diagrams 824 00:28:51,600 --> 00:28:55,039 and this is the 825 00:28:53,440 --> 00:28:56,640 voronoi diagram 826 00:28:55,039 --> 00:28:59,120 and the voronoi diagrams give us the 827 00:28:56,640 --> 00:29:00,799 zones of each and every player 828 00:28:59,120 --> 00:29:02,960 on the pitch at a particular moment by 829 00:29:00,799 --> 00:29:05,039 breaking the pitch into distinct regions 830 00:29:02,960 --> 00:29:06,799 belonging to the players indicating the 831 00:29:05,039 --> 00:29:08,000 field coverage of each player at that 832 00:29:06,799 --> 00:29:10,240 moment 833 00:29:08,000 --> 00:29:12,640 so yes this completes our section and i 834 00:29:10,240 --> 00:29:14,640 think that's it for my presentation so 835 00:29:12,640 --> 00:29:15,440 these are the references go check them 836 00:29:14,640 --> 00:29:17,919 out 837 00:29:15,440 --> 00:29:19,919 at the end thank you wear masks get 838 00:29:17,919 --> 00:29:22,919 vaccinated and stay safe 839 00:29:19,919 --> 00:29:22,919 thanks 840 00:29:26,640 --> 00:29:28,720 you
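The Delaunay and Voronoi steps described in the talk can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the speaker's code: the coordinates are made-up positions for a single freeze frame, and the variable names (`team_xy`, `links`, `zones`) are hypothetical.

```python
import numpy as np
from scipy.spatial import Delaunay, Voronoi

# Hypothetical (x, y) pitch coordinates for players in one freeze frame.
team_xy = np.array([
    [10.0, 10.0], [110.0, 10.0], [110.0, 70.0], [10.0, 70.0],
    [60.0, 40.0], [40.0, 30.0], [80.0, 55.0],
])

# Delaunay triangulation: each row of tri.simplices holds the indices
# of three players that form one triangle.
tri = Delaunay(team_xy)
links = set()
for a, b, c in tri.simplices:
    for i, j in ((a, b), (b, c), (a, c)):
        # Store each link once, as an ordered pair of player indices.
        links.add((int(min(i, j)), int(max(i, j))))

# Voronoi diagram: vor.point_region maps each player to an entry of
# vor.regions, a list of vertex indices where -1 marks an unbounded zone.
vor = Voronoi(team_xy)
zones = {i: vor.regions[vor.point_region[i]] for i in range(len(team_xy))}
bounded = [i for i, region in zones.items() if region and -1 not in region]
```

For plotting, `tri.simplices` can be passed to matplotlib's `triplot`, and `scipy.spatial.voronoi_plot_2d(vor)` draws the zones; unbounded zones on the edge of the formation would still need clipping to the pitch boundary.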