Hello, and welcome again to the Data Science and Analytics track here at PyCon AU 2021. Next up, from Mysore, India, is Gajendra Deshpande. They hold a Master's in Technology in Computer Science and Engineering from, um, oh, I got that wrong, and also a postgraduate diploma in cyber law and cyber forensics from the National Law School of India University in Bengaluru. They have presented talks and papers, and served as a reviewer and program committee member, at more conferences and on more continents than I can count. Gajendra is also a GitHub certified Campus Advisor, and they lead the PyData Belagavi chapter as well as the OWASP Belagavi chapter.

In the talk today, Gajendra reminds us that time series is one of the important as well as lesser-known fields of machine learning. In time series forecasting, we analyze time-dependent data to predict long-term and seasonal trends. Please give a warm welcome to Gajendra Deshpande and the presentation on time
series forecasting techniques in Python.

Thank you for the nice introduction. It is always wonderful to speak at PyCon conferences; this is my second talk at PyCon Australia online. Today I will be presenting a talk on time series forecasting using Python. It will be a basic talk, so we will not be covering any advanced topics here.

In my talk today I will discuss an introduction to time series forecasting, then the basic steps in forecasting, then time series forecasting techniques, then Python libraries for time series forecasting, and finally we will summarize and take up some questions.

First we need to define what a time series is. It is a collection of data points at constant time intervals, analyzed to determine a long-term trend. You can see on this slide a picture showing temperature data, temperature collected at various hours, and based on the historical data
we are going to predict future trends. Say, for example, we have collected temperature data for around one month; then we are going to predict what the temperature will be the next day. Or maybe we are collecting temperature data for the month of September over the years, and we are going to predict what the temperature and the weather will be for September 2022. This is what we do in time series forecasting. Of course, it is not only related to weather: anything we want to forecast which is time dependent, we can forecast, and Python has got great support through its libraries.

Now, there are some application areas which I have listed. I may not be able to explain everything, but I will explain a few. Weather forecasting I have just explained; it may be used in the agricultural industry, and by farmers also.
So before sowing seeds, a farmer wants to know the weather forecast for the next few days, so that he can plan his farming activities accordingly.

Then, similarly, earthquake prediction. There are patterns to earthquakes: whatever the environmental conditions are in an area where earthquakes happen frequently, based on those, they try to predict when an earthquake may occur.

Then statistics, which has got lots of libraries and lots of applications related to forecasting. Then also pattern recognition, signal processing, control engineering, mathematical finance, and astronomy: many applications are there. If I have to give one more example, say there is a website, and I want to predict the number of visitors
in the next few days based on the previous data. Historical data is very, very important for us; based on that only we can forecast.

Now, there are common time series patterns. Whenever we work with a time series, it is very important to visualize the data, because visualization will reveal the patterns in the data. You can see here there are five common time series patterns. The first one is level, the second one is trend, the next one is seasonal or periodic, then cyclical, then irregular fluctuations.

Irregular fluctuations are random in nature, and most of the time this data will be random, so we cannot say that the data
will always increase or will always decrease. Of course, there will be a pattern in it, but an irregular fluctuation is random data; there is a lot of randomness in the data.

The next one is the trend. A trend shows whether the data is increasing or decreasing. On this slide you can see there is a line which is growing, which means the trend is increasing. Of course, a trend may be decreasing also, but that is not shown in this example.

Then next is seasonal or periodic. Say, for example, in India we have got three seasons: one is the rainy season, the second one is the winter season, and the third one is the summer season. There is a lot of temperature variation: in summer it will be hot, in the rainy season there will be a lot of rain, and in the winter season it will be very cold. So there will be different temperatures in these
seasons. Similarly, these seasons have different periods: some patterns may repeat daily, some weekly, some monthly, and some may repeat yearly. In astronomy also there are some patterns; for example, the moon's cycles, or the constellations, have got some pattern of occurrence.

The next is the cyclical pattern. In a cyclical pattern we are going to have both increasing and decreasing movements, but it will not be just one phase; there will be multiple repetitive phases. You can see here that there is an upward trend, then downward, then again an upward trend, then again a downward trend, so this is cyclical. And irregular fluctuations we have discussed: that is the random one. So these are the common time series patterns, and of course they also affect our results.
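To make these patterns concrete, here is a small dependency-free sketch, not from the talk, that builds a synthetic series out of exactly the components just described: a constant level, an increasing trend, a repeating seasonal cycle, and random irregular fluctuations. All names and numbers are illustrative.

```python
import math
import random

random.seed(42)

def make_series(n=48, period=12):
    """Combine the level, trend, seasonal, and irregular components into one series."""
    series = []
    for t in range(n):
        level = 10.0                                          # constant base level
        trend = 0.5 * t                                       # steadily increasing trend
        seasonal = 3.0 * math.sin(2 * math.pi * t / period)   # repeats every `period` steps
        irregular = random.gauss(0, 0.5)                      # random fluctuations
        series.append(level + trend + seasonal + irregular)
    return series

data = make_series()
print(data[:3])
```

Plotting such a series is exactly the kind of preliminary visualization the speaker recommends before choosing a model.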
Then there are some basic steps in performing forecasting. The first one is defining the problem, then collecting the information, then performing a preliminary exploratory analysis, then choosing and fitting a model, then using and evaluating a forecasting model.

The first step, defining the problem, is very, very important: we need a problem to work on, and here we need to define what our problem is and what we want to achieve. The problem and the goal should be defined here.

The next is collecting the information. Here we are building the data set by collecting the historical data. We also need inputs from the domain experts; if you are a domain expert, you can directly work on it. Then perform a preliminary exploratory analysis:
in a preliminary exploratory analysis you are going to use tools such as visualization to identify the kind of pattern in the data, and once you identify the pattern, you can select a model to fit. So in the preliminary exploratory analysis you are basically trying to find the pattern in the data by performing several operations, such as visualization and so on.

Then choose and fit a model. Here you are going to experiment with different models, and the one which gives you less error compared to the other models is the one you are going to select. Then use and evaluate the forecasting model in your application: finally, you are going to deploy it in your application.
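The choose-and-fit and evaluate steps above can be sketched without any library: hold out the last few observations, forecast them with two candidate models, and keep whichever gives the lower error. The two candidates here, a last-value forecast and an overall-mean forecast, are deliberately naive stand-ins, not methods from the talk.

```python
def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# A small series with an upward trend (stand-in for collected historical data).
history = [10, 12, 13, 15, 16, 18, 19, 21]
train, test = history[:6], history[6:]

# Candidate 1: naive forecast, repeating the last training value.
naive_preds = [train[-1]] * len(test)
# Candidate 2: mean forecast, repeating the training average.
mean_preds = [sum(train) / len(train)] * len(test)

scores = {"naive": mse(test, naive_preds), "mean": mse(test, mean_preds)}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

On a trending series the naive forecast wins, which is the kind of comparison the speaker describes: select the model with less error.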
Now, these are the common time series methods supported by the statsmodels package in Python. The first one is autoregression, then moving average; these are the two basic time series methods. If you look at the rest, they are combinations of the first two, with additional parameters such as seasonality added to the basic methods. So: autoregression, then moving average, then autoregressive moving average, then autoregressive integrated moving average, then seasonal autoregressive integrated moving average, then seasonal autoregressive integrated moving average with exogenous regressors, then vector autoregression moving average, then Holt-Winters exponential smoothing, and finally the Dickey-Fuller test.

What happens in autoregression is that we predict the future based on the past data. The past data is given as input to a regression, and then it will try to predict the
values. So we can just try to test this with an application.

Before that, let us see the source code. What we are doing here is importing the package, and we are also importing the random module, because we want to generate the data set randomly. This is just for the example; you can of course go for a real-world data set. Then we fit the model. Here we are using the AutoReg function, and it takes two parameters, the data and the lags. Here we want to predict one value ahead. Then we fit the model, make the prediction, and print the prediction.
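The code the speaker walks through uses AutoReg(data, lags=1) from statsmodels.tsa.ar_model. As a dependency-free sketch of the same idea, an AR(1) model can be fitted by ordinary least squares on consecutive pairs of observations; the variable names below are illustrative, not from the talk.

```python
import random

random.seed(0)
# Random data, as in the talk; a real-world data set would go here instead.
data = [random.random() for _ in range(100)]

# Build (previous value, current value) pairs: lags=1 in AutoReg terms.
x = data[:-1]
y = data[1:]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares for y_t = c + phi * y_{t-1}.
phi = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
      sum((xi - mean_x) ** 2 for xi in x)
c = mean_y - phi * mean_x

# Make and print a one-step-ahead prediction from the last observed value.
prediction = c + phi * data[-1]
print(prediction)
```

With statsmodels installed, the equivalent of the code on the slide would be AutoReg(data, lags=1).fit() followed by predict.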
Next is the moving average. In the moving average, the last n data points are taken to calculate the average. Here we are importing the ARIMA model, because we do not have a direct moving-average module; but the first parameters of the order, that is, autoregression and integration, are set to zero, and that makes it a moving average. So it will compute a moving average on the random data.

The next is the autoregressive moving average, ARMA. It is the combination of the previous two models, that is, autoregression and moving average.

And the next one is the autoregressive integrated moving average, ARIMA. Again it is a combination of the previous two, but here an additional parameter has been added: the integration feature.
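In statsmodels the speaker gets a pure moving-average model by passing order=(0, 0, 1) to ARIMA, zeroing out the autoregressive and integration terms. Strictly, that statistical MA(q) model regresses on past forecast errors; the simpler averaging of the last n points that the speaker describes can be sketched directly, with illustrative names:

```python
def moving_average_forecast(series, n=3):
    """Forecast the next value as the mean of the last n observations."""
    window = series[-n:]
    return sum(window) / len(window)

data = [12, 14, 13, 15, 16, 15, 17]
print(moving_average_forecast(data, n=3))  # -> 16.0, the mean of 16, 15, 17
```

Larger n smooths out more of the random fluctuation but reacts more slowly to a trend.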
The integration feature is related to, not seasonality, sorry, it is related to stationarity. What happens is, if your data is not stationary, then you have to make it stationary whenever you are performing time series forecasting. That is what happens in the ARIMA model.

Then, in the SARIMA model, the seasonality values are added. It directly combines the ARIMA model with the ability to model autoregression, differencing, and moving average at the seasonal level. So SARIMA is suitable for univariate time series with trend or seasonal components.
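The integration step that ARIMA and SARIMA add is differencing, which is what turns a non-stationary series into a stationary one. A dependency-free sketch of the idea: a series with a steady upward trend has a changing mean, but its first differences are constant. The numbers are illustrative.

```python
def difference(series, d=1):
    """Apply first differencing d times (the 'I' in ARIMA)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

trending = [3, 5, 7, 9, 11, 13]   # mean keeps growing: not stationary
diffed = difference(trending)
print(diffed)  # -> [2, 2, 2, 2, 2]: constant mean, stationary
```

The Dickey-Fuller test mentioned in the method list (adfuller in statsmodels) is the standard statistical check for whether such differencing is still needed.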
All the previous methods we have discussed are suitable for univariate time series analysis; there are some methods which are suitable for multivariate analysis, which we will see later. Internally, all these methods use a Kalman filter for fitting the values when using the predict method.

Now the next one is the SARIMAX model, that is, SARIMA with exogenous regressors. You can see here that we are using two data sets, data one and data two. In the SARIMAX function, the first parameter is the data, and the second parameter is the exogenous regressor values. These exogenous variables are also called covariates, and they can be thought of as parallel input sequences that have observations at the same time steps as the original series. That means that at each time step, there are multiple values being observed. Again, this method is suitable for univariate time series.
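The idea of an exogenous regressor, observed in parallel at the same time steps, can be sketched with a plain two-variable least-squares fit: the target depends both on its own previous value and on the covariate. In statsmodels this corresponds to the exog argument of SARIMAX; everything below is an illustrative stand-in for the concept, not the SARIMAX internals.

```python
# Covariate observed in parallel with the target, one value per time step.
exog = [1.0, 2.0, 1.0, 3.0, 2.0, 4.0, 3.0]

# Target generated by a known rule, y_t = 0.5*y_{t-1} + 2.0*x_t,
# so we can check that the fit recovers the coefficients.
target = [2.0]
for t in range(1, len(exog)):
    target.append(0.5 * target[t - 1] + 2.0 * exog[t])

# Least squares for y_t = a*y_{t-1} + b*x_t: solve the 2x2 normal equations.
u = target[:-1]   # lagged target
v = exog[1:]      # aligned covariate
y = target[1:]
suu = sum(ui * ui for ui in u)
svv = sum(vi * vi for vi in v)
suv = sum(ui * vi for ui, vi in zip(u, v))
suy = sum(ui * yi for ui, yi in zip(u, y))
svy = sum(vi * yi for vi, yi in zip(v, y))
det = suu * svv - suv * suv
a = (suy * svv - suv * svy) / det
b = (suu * svy - suv * suy) / det
print(round(a, 6), round(b, 6))  # recovers the generating coefficients 0.5 and 2.0
```

Because the covariate is observed, not forecast, the model explains the target with outside information at every step, which is exactly what the second parameter in the speaker's SARIMAX call supplies.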
Now, vector autoregression moving average, VARMA. This uses the autoregression and moving average methods combined with vectors, and it is suitable for multivariate time series forecasting. That is why you can see in this code that a list is created: the first vector is generated randomly, while the second vector's values are dependent on the first vector. Basically you need two sets of values here, so a row will have both values, vector one as well as vector two; note that vector two's value is always dependent on vector one. So this is suitable for multivariate time series forecasting, but without trend and seasonal components.
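The data construction the speaker describes, a random first vector with a second vector that depends on it, can be sketched directly. The dependence can then be recovered by a simple least-squares fit, which is the core idea that statsmodels' VARMAX generalizes; the 0.1 coupling below is an illustrative assumption.

```python
import random

random.seed(1)

# Vector one: random values. Vector two: always depends on vector one.
v1 = [random.random() for _ in range(50)]
v2 = [x + 0.1 * random.random() for x in v1]

# Each row of the multivariate data set holds both values for one time step.
rows = list(zip(v1, v2))

# Recover the dependence of v2 on v1 by least squares through the origin.
beta = sum(a * b for a, b in rows) / sum(a * a for a, _ in rows)
print(round(beta, 3))  # close to 1, since v2 is v1 plus a small random offset
```

A full VARMA model would additionally regress each vector on lagged values of both vectors, but the row structure, one tuple of observations per time step, is the same.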
The next one is Holt-Winters exponential smoothing. This method uses smoothing techniques; specifically, Holt-Winters exponential smoothing uses a triple exponential smoothing method. It models the next time step as an exponentially weighted linear function of observations at prior time steps, taking trend and seasonality into account. So again, it is suitable for univariate time series, but with trend or seasonal components.
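The exponentially weighted idea at the heart of Holt-Winters can be sketched with single exponential smoothing, where each new smoothed value mixes the latest observation with the previous smoothed value. Holt-Winters' triple smoothing extends this with additional equations for the trend and seasonal components; statsmodels provides the full method as ExponentialSmoothing. The alpha value here is an illustrative choice.

```python
def exponential_smoothing(series, alpha=0.5):
    """Single exponential smoothing: s_t = alpha*x_t + (1 - alpha)*s_{t-1}."""
    smoothed = [series[0]]  # initialize with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

data = [3.0, 10.0, 12.0, 13.0, 12.0, 10.0, 12.0]
smoothed = exponential_smoothing(data, alpha=0.5)
print(smoothed)
```

Earlier observations still influence every smoothed value, but with exponentially decaying weight, which is what makes the next-step forecast an exponentially weighted function of the past.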
Then next is the package called Prophet, by Facebook. It is open-source software released by Facebook's core data science team. It is a procedure for forecasting time series data based on an additive model, where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. It is robust to missing data and shifts in the trend, and typically handles outliers well. You can visit the URL for more information. It also supports R, and it has been widely used at Facebook and in its applications.

The next package is Darts. It is a Python library for easy manipulation and forecasting of time series. It contains a variety of models, from classics such as ARIMA to deep neural networks. The models can all be used in the same way, using fit and predict functions, similar to scikit-learn. It also makes it easy to backtest models, combine the predictions of several models, and take external regressors into account. It supports both univariate and multivariate time series and models.
Neural networks can be trained on multiple time series, and some of the models offer probabilistic forecasts.

Now, if you see this slide, it shows the various capabilities of the darts library. For discovery, it supports seasonality and trend checks, visualizations, and new models. For preprocessing, it supports normalization, interpolation of missing values, seasonality or trend removal, and up- or down-sampling. Then it supports the classic forecasting models which we have seen in our slides, such as autoregression, moving average, ARIMA, SARIMAX, et cetera, and it also integrates Facebook's Prophet library and many others. For model selection and evaluation, it supports backtesting, residual analysis, grid search, and various metrics.
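Those evaluation metrics are what drive model selection. A self-contained sketch of two common ones, MAE and RMSE, with made-up numbers standing in for two candidate models' forecasts:

```python
import numpy as np

def mae(actual, forecast):
    """Mean absolute error: average magnitude of the forecast errors."""
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(forecast))))

def rmse(actual, forecast):
    """Root mean squared error: penalizes large errors more heavily."""
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(forecast)) ** 2)))

actual  = [112, 118, 132, 129, 121]   # observed values
model_a = [110, 120, 130, 131, 119]   # hypothetical forecasts from model A
model_b = [100, 125, 140, 120, 130]   # hypothetical forecasts from model B

print("A:", mae(actual, model_a), rmse(actual, model_a))  # → A: 2.0 2.0
print("B:", mae(actual, model_b), rmse(actual, model_b))
```

Model A has the lower error on both metrics, so under this criterion it would be the one to keep, which is exactly the selection rule the summary below recommends.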
So, in summary, we have discussed the basics of time series forecasting, then different time series forecasting techniques, and then different Python libraries for performing time series forecasting quickly. Of course, these are not the only libraries; you can also go for TensorFlow, for example, where there is nice documentation available.

So, depending on your application and depending on the pattern in the data, you should select the model, and when you are selecting the model, of course, there are different options. You need to select the option which gives you less error and which performs better compared to the others.

So thank you, everyone, for attending my talk. Thank you.

Thank you, Gajendra. We have a couple of questions for you from the audience; I'm going to read them to you. The first one is:
What advice do you have for someone starting a time series project for the first time?

Okay, so if you are starting a time series project for the first time, then first learn about statistics and the different statistical methods which we have discussed; you should have a basic understanding of statistics and mathematics. Then learn the basic libraries in Python, so pandas DataFrames, NumPy, SciPy, and statsmodels; they will help. Then you can learn a specific Python package which satisfies your requirements; of course, we have got many options, but you should see which one satisfies your requirements best. And then you have to select a domain in which you want to perform forecasting. If you have domain knowledge, then it is fine; if you don't have domain knowledge, then you need to collaborate with someone. Then collect the data, and once you have the data, you can use the packages and start forecasting, or even start with time series analysis.
Thank you. Another question they have is about Prophet and darts, about performance: do you have any benchmarks for the performance of each library? Which is more performant, or better, or maybe has a better use in some cases?

I'm not aware of any benchmarks, but maybe we can use the metrics and test them ourselves. But no, I'm not aware of benchmarks.

So I think this is it, and thank you so much, Gajendra. I want to ask the people on Venueless to give a big virtual applause. Thank you.

Yeah, thank you.

Next up on this stage, in about 15 minutes at 2:30 Eastern European time, we'll have Mali and Mangami talking about a beginner's guide to GPUs for Pythonistas. See you then.