2 00:00:08,220 --> 00:00:13,860 welcome back everyone uh we're gonna be 3 00:00:10,800 --> 00:00:16,020 hearing from Felipe who is 4 00:00:13,860 --> 00:00:17,760 a programmer involved in everything from 5 00:00:16,020 --> 00:00:21,420 satellite communications to machine 6 00:00:17,760 --> 00:00:24,900 learning and we'll be learning from this 7 00:00:21,420 --> 00:00:27,119 well-versed programmer about learning to 8 00:00:24,900 --> 00:00:28,680 sketch with differentiable 9 00:00:27,119 --> 00:00:30,240 rendering 10 00:00:28,680 --> 00:00:32,960 all right 11 00:00:30,240 --> 00:00:32,960 thank you 12 00:00:34,980 --> 00:00:40,440 all right so let's do this 13 00:00:38,280 --> 00:00:42,120 um we are going to be learning to sketch 14 00:00:40,440 --> 00:00:43,860 actually the computer is going to be 15 00:00:42,120 --> 00:00:46,620 learning to sketch 16 00:00:43,860 --> 00:00:48,300 um and there are a few things about 17 00:00:46,620 --> 00:00:51,000 sketching that are interesting for a 18 00:00:48,300 --> 00:00:56,039 computer one of them is that it's not 19 00:00:51,000 --> 00:00:58,199 very computer-like so let's get into it 20 00:00:56,039 --> 00:01:01,260 um first question is 21 00:00:58,199 --> 00:01:05,159 can computers actually sketch 22 00:01:01,260 --> 00:01:07,439 and turns out yes and they are very 23 00:01:05,159 --> 00:01:11,340 good at it actually they are too good at 24 00:01:07,439 --> 00:01:16,020 it so let's move the goalposts and 25 00:01:11,340 --> 00:01:18,540 let's ask can computers sketch badly now 26 00:01:16,020 --> 00:01:21,900 now that computers are not having a good 27 00:01:18,540 --> 00:01:25,100 time let's actually show what this looks 28 00:01:21,900 --> 00:01:29,580 like so if you have a picture like this 29 00:01:25,100 --> 00:01:32,040 this is what I want uh none of this 30 00:01:29,580 --> 00:01:32,759 high quality stuff 31 00:01:32,040 --> 00:01:34,680 
um 32 00:01:32,759 --> 00:01:36,540 all right so let's get into how you 33 00:01:34,680 --> 00:01:38,759 actually do this and you do it with 34 00:01:36,540 --> 00:01:40,320 machine learning 35 00:01:38,759 --> 00:01:42,479 um and machine learning is something 36 00:01:40,320 --> 00:01:44,340 like this you have a bunch of numbers 37 00:01:42,479 --> 00:01:46,740 you have something there in the middle 38 00:01:44,340 --> 00:01:48,540 and you have a bunch of numbers in the 39 00:01:46,740 --> 00:01:51,720 other end and the thing in the middle 40 00:01:48,540 --> 00:01:53,399 that's actually neural networks but we 41 00:01:51,720 --> 00:01:55,259 don't like neural networks here so we 42 00:01:53,399 --> 00:01:56,939 are going to switch that over for 43 00:01:55,259 --> 00:02:00,119 something 44 00:01:56,939 --> 00:02:01,920 that I'm going to talk about later and 45 00:02:00,119 --> 00:02:03,479 in the output we also don't like a bunch 46 00:02:01,920 --> 00:02:05,880 of numbers we are trying to sketch here 47 00:02:03,479 --> 00:02:08,700 so that's going to be a sketch 48 00:02:05,880 --> 00:02:11,459 all right and so it's something like 49 00:02:08,700 --> 00:02:13,739 this in the output I put a very bad 50 00:02:11,459 --> 00:02:16,800 scatter V Line there that's something 51 00:02:13,739 --> 00:02:18,540 like what you want and the thing in the 52 00:02:16,800 --> 00:02:20,040 middle that's just a function that Maps 53 00:02:18,540 --> 00:02:21,620 a bunch of numbers 54 00:02:20,040 --> 00:02:24,420 to sketches 55 00:02:21,620 --> 00:02:27,480 now how does machine learning actually 56 00:02:24,420 --> 00:02:29,520 work it works by 57 00:02:27,480 --> 00:02:33,120 say that you have the numbers you use 58 00:02:29,520 --> 00:02:35,580 those as knobs and then you turn them a 59 00:02:33,120 --> 00:02:36,900 little and the thing in the output is 60 00:02:35,580 --> 00:02:38,220 going to change 61 00:02:36,900 --> 00:02:41,879 all right 62 00:02:38,220 --> 00:02:44,459 
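As a rough illustration of the "numbers as knobs" idea just described, here is a minimal Python sketch. It is not the speaker's code: the hypothetical `render` function maps four numbers (the two end points of one straight stroke) to a tiny grid of pixels, and `changed_pixels` counts how much the picture changes when one knob is turned. The function names, the 8x8 grid and the dense sampling along the stroke are all assumptions made for this example.

```python
def render(knobs, size=8):
    """Map four numbers (x0, y0, x1, y1) to a size x size grid of 0/1 pixels."""
    x0, y0, x1, y1 = knobs
    grid = [[0] * size for _ in range(size)]
    steps = 4 * size  # sample densely along the stroke so no pixel is skipped
    for i in range(steps + 1):
        t = i / steps
        x = round(x0 + t * (x1 - x0))
        y = round(y0 + t * (y1 - y0))
        if 0 <= x < size and 0 <= y < size:
            grid[y][x] = 1
    return grid


def changed_pixels(a, b):
    """Count the pixels that differ between two rendered grids."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


before = render([1, 1, 6, 1])  # a horizontal stroke
after = render([1, 1, 6, 3])   # turn one knob (y1) a little

for row in after:
    print("".join("#" if p else "." for p in row))
print("pixels changed:", changed_pixels(before, after))
```

The count says that the picture changed but not which way to turn the knob to get closer to a picture you want; a hard on/off renderer like this one gives no such hint.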
so that's basically how machine learning 63 00:02:41,879 --> 00:02:47,660 works I'm leaving out a very major part 64 00:02:44,459 --> 00:02:51,060 but we're going to talk about that later 65 00:02:47,660 --> 00:02:53,879 another thing that we need is rendering 66 00:02:51,060 --> 00:02:56,400 so I said you have some thing that 67 00:02:53,879 --> 00:02:59,599 generated sketches and that thing is a 68 00:02:56,400 --> 00:03:03,360 function but what is that actually 69 00:02:59,599 --> 00:03:05,879 so sketches are usually made of a bunch 70 00:03:03,360 --> 00:03:09,360 of lines and if you want to render a 71 00:03:05,879 --> 00:03:10,860 line you basically select two points and 72 00:03:09,360 --> 00:03:14,040 you say I want to draw a line from one 73 00:03:10,860 --> 00:03:16,140 point to the other and then you check 74 00:03:14,040 --> 00:03:18,840 every single of those queries on the 75 00:03:16,140 --> 00:03:19,500 screen and if they intersect with the 76 00:03:18,840 --> 00:03:22,739 line 77 00:03:19,500 --> 00:03:26,840 you fill them up all right so that's how 78 00:03:22,739 --> 00:03:29,940 you draw lines with a computer very nice 79 00:03:26,840 --> 00:03:32,840 but what is this business about being 80 00:03:29,940 --> 00:03:35,640 differentiable 81 00:03:32,840 --> 00:03:39,480 that's where the interesting things 82 00:03:35,640 --> 00:03:42,299 start to come in I said that I left out 83 00:03:39,480 --> 00:03:46,080 a very major part about machine learning 84 00:03:42,299 --> 00:03:48,120 and this is that part now the thing is 85 00:03:46,080 --> 00:03:49,640 when you generate those sketches here 86 00:03:48,120 --> 00:03:53,239 for example 87 00:03:49,640 --> 00:03:53,239 function sketches 88 00:03:54,000 --> 00:03:58,980 you actually want the computer to learn 89 00:03:56,280 --> 00:04:00,900 how to make a good sketch or in our case 90 00:03:58,980 --> 00:04:04,400 a bad sketch 91 00:04:00,900 --> 00:04:07,200 so you need to subtract whatever 92 
00:04:04,400 --> 00:04:09,780 you are generating from whatever you actually 93 00:04:07,200 --> 00:04:12,420 want to generate and when we say like 94 00:04:09,780 --> 00:04:14,879 subtract you may notice I put a question 95 00:04:12,420 --> 00:04:17,699 mark between parentheses there because 96 00:04:14,879 --> 00:04:21,019 it's not very clear what you mean like 97 00:04:17,699 --> 00:04:23,400 subtract sketches what is that actually 98 00:04:21,019 --> 00:04:25,820 so basically we are going to represent 99 00:04:23,400 --> 00:04:28,440 every sketch as a big bunch of numbers 100 00:04:25,820 --> 00:04:30,180 and then you're going to subtract each 101 00:04:28,440 --> 00:04:33,720 one of those numbers from one sketch to 102 00:04:30,180 --> 00:04:35,940 the other so that's what we mean 103 00:04:33,720 --> 00:04:37,919 um so for example let's say that you 104 00:04:35,940 --> 00:04:42,660 have this blue line and the blue line is 105 00:04:37,919 --> 00:04:45,540 what you want to draw but your little uh 106 00:04:42,660 --> 00:04:48,120 rendering engine is actually creating 107 00:04:45,540 --> 00:04:50,520 the line in red so you need to figure out 108 00:04:48,120 --> 00:04:53,699 what you need to change to make that 109 00:04:50,520 --> 00:04:55,280 little line in red go to the position of 110 00:04:53,699 --> 00:04:58,740 the line in blue 111 00:04:55,280 --> 00:05:01,919 and how you do that is basically you need 112 00:04:58,740 --> 00:05:04,320 to figure out a way of understanding how 113 00:05:01,919 --> 00:05:06,840 far away they are the issue here is that 114 00:05:04,320 --> 00:05:10,380 like if you just subtract those two 115 00:05:06,840 --> 00:05:13,440 lines there is no overlap right so there 116 00:05:10,380 --> 00:05:15,720 is no way of knowing how incorrect I am 117 00:05:13,440 --> 00:05:18,960 in here because basically I'm 100% 118 00:05:15,720 --> 00:05:21,060 incorrect the line is not overlapping 119 00:05:18,960 --> 00:05:24,240 what do we do 
there we need a way 120 00:05:21,060 --> 00:05:27,419 of tracing back the influences of each 121 00:05:24,240 --> 00:05:30,300 pixel as a way of saying okay I 122 00:05:27,419 --> 00:05:32,280 created this line in red but I want to 123 00:05:30,300 --> 00:05:34,380 create the line in blue but how far away 124 00:05:32,280 --> 00:05:38,340 am I from the blue line 125 00:05:34,380 --> 00:05:41,220 and it's basically just like Minesweeper 126 00:05:38,340 --> 00:05:42,419 and the idea here is that if you are 127 00:05:41,220 --> 00:05:45,419 incorrect 128 00:05:42,419 --> 00:05:46,919 we should know by how much and that's 129 00:05:45,419 --> 00:05:49,259 kind of the information that you get in 130 00:05:46,919 --> 00:05:51,840 Minesweeper and the way that we do it 131 00:05:49,259 --> 00:05:53,580 here is basically if you want to 132 00:05:51,840 --> 00:05:55,320 approach some line instead of just 133 00:05:53,580 --> 00:05:58,080 saying this is the line that I want you 134 00:05:55,320 --> 00:06:00,360 also say this is one pixel away from the 135 00:05:58,080 --> 00:06:01,139 line this is two pixels away from the 136 00:06:00,360 --> 00:06:03,680 line 137 00:06:01,139 --> 00:06:06,600 etc etc etc 138 00:06:03,680 --> 00:06:10,979 and to be a bit more visual let's say 139 00:06:06,600 --> 00:06:15,060 that this little blob is a pixel that 140 00:06:10,979 --> 00:06:17,580 you generated and the big gap in the 141 00:06:15,060 --> 00:06:19,919 middle that's the line that you want to 142 00:06:17,580 --> 00:06:22,800 generate now if you generate things like 143 00:06:19,919 --> 00:06:26,400 this the little pixel has no idea where 144 00:06:22,800 --> 00:06:28,199 to go because as it sees the world it's 145 00:06:26,400 --> 00:06:31,440 just a flat surface 146 00:06:28,199 --> 00:06:34,860 so what you do is you basically 147 00:06:31,440 --> 00:06:36,900 create pixel slides so you want to put 148 00:06:34,860 --> 00:06:39,060 pixels in places where they can actually 
149 00:06:36,900 --> 00:06:43,020 find lines and find where they want to 150 00:06:39,060 --> 00:06:46,020 go by creating this smooth 151 00:06:43,020 --> 00:06:48,660 um slide-like surface 152 00:06:46,020 --> 00:06:50,520 all right so that's what differentiable 153 00:06:48,660 --> 00:06:53,460 means you are actually instead of 154 00:06:50,520 --> 00:06:55,560 creating things like this you are 155 00:06:53,460 --> 00:06:57,539 creating things like these because 156 00:06:55,560 --> 00:07:01,020 pixels like them 157 00:06:57,539 --> 00:07:03,240 all right so you have the 158 00:07:01,020 --> 00:07:05,400 wanted versus actual now you know which 159 00:07:03,240 --> 00:07:07,500 direction you need to move and you can 160 00:07:05,400 --> 00:07:09,419 map all of that back to the input space 161 00:07:07,500 --> 00:07:13,139 all right 162 00:07:09,419 --> 00:07:15,120 so let's get more into rendering again 163 00:07:13,139 --> 00:07:18,360 um you have this thing called Bezier 164 00:07:15,120 --> 00:07:20,520 curves and they are essentially we don't 165 00:07:18,360 --> 00:07:22,380 like straight things here so we are 166 00:07:20,520 --> 00:07:24,120 going to bend them 167 00:07:22,380 --> 00:07:26,280 um all right so instead of having 168 00:07:24,120 --> 00:07:27,599 straight lines now you're going to have 169 00:07:26,280 --> 00:07:29,759 curves 170 00:07:27,599 --> 00:07:31,800 very nice and you can see there is a 171 00:07:29,759 --> 00:07:34,199 middle point there instead of just 172 00:07:31,800 --> 00:07:37,139 having the start and end points and 173 00:07:34,199 --> 00:07:38,819 that's called a control point so now if 174 00:07:37,139 --> 00:07:40,800 you want to represent that curve you 175 00:07:38,819 --> 00:07:42,960 have six numbers two of them are the 176 00:07:40,800 --> 00:07:45,840 start point two of them are the end point 177 00:07:42,960 --> 00:07:47,220 and two of them are the little gray 178 00:07:45,840 --> 00:07:51,000 point in 
there 179 00:07:47,220 --> 00:07:53,280 all right so that's Bezier curves 180 00:07:51,000 --> 00:07:54,840 um there's a bit of math there uh but 181 00:07:53,280 --> 00:07:56,400 it's basically just a set of equations 182 00:07:54,840 --> 00:07:59,340 that we can use we don't care too much 183 00:07:56,400 --> 00:08:02,400 about what exactly they're saying 184 00:07:59,340 --> 00:08:04,259 um and we can jump into code now 185 00:08:02,400 --> 00:08:07,259 all right 186 00:08:04,259 --> 00:08:09,720 we are going to use PyTorch uh we are 187 00:08:07,259 --> 00:08:13,500 also going to use PyTorch Lightning and 188 00:08:09,720 --> 00:08:15,960 we're going to use JupyterLab 189 00:08:13,500 --> 00:08:17,660 all right so to start with we want to 190 00:08:15,960 --> 00:08:20,340 draw those lines 191 00:08:17,660 --> 00:08:22,620 and basically we are going to translate 192 00:08:20,340 --> 00:08:24,000 that set of equations that we had into 193 00:08:22,620 --> 00:08:26,580 Python code 194 00:08:24,000 --> 00:08:30,120 and in machine learning we like to do 195 00:08:26,580 --> 00:08:32,099 things in big batches because you want 196 00:08:30,120 --> 00:08:34,500 to do everything on the GPU which is much 197 00:08:32,099 --> 00:08:38,700 faster than the CPU because it has a 198 00:08:34,500 --> 00:08:41,520 bunch of small computers all bunched 199 00:08:38,700 --> 00:08:43,500 together so to do computation on the GPU 200 00:08:41,520 --> 00:08:47,880 you essentially do the same computation 201 00:08:43,500 --> 00:08:49,680 in all of those little computers and if 202 00:08:47,880 --> 00:08:52,320 you see those two first lines there 203 00:08:49,680 --> 00:08:55,380 where it says curves and then colon 204 00:08:52,320 --> 00:08:58,080 colon that's basically adding dimensions 205 00:08:55,380 --> 00:09:00,480 so we can do a bunch of those at the 206 00:08:58,080 --> 00:09:03,120 same time 207 00:09:00,480 --> 00:09:05,700 um and then we have more code to 208 00:09:03,120 --> 
00:09:08,339 actually generate those lines in here we 209 00:09:05,700 --> 00:09:10,260 are basically generating a grid so we 210 00:09:08,339 --> 00:09:11,760 can generate those curves and the way 211 00:09:10,260 --> 00:09:13,980 that you generate a grid is just you 212 00:09:11,760 --> 00:09:16,200 generate a line and then from that line 213 00:09:13,980 --> 00:09:19,019 you use something called the Cartesian 214 00:09:16,200 --> 00:09:22,320 product which basically turns lines into 215 00:09:19,019 --> 00:09:23,160 squares sort of 216 00:09:22,320 --> 00:09:26,220 um 217 00:09:23,160 --> 00:09:27,959 and then you use a lot more math to 218 00:09:26,220 --> 00:09:30,120 actually compute those lines and the 219 00:09:27,959 --> 00:09:33,360 distances to those lines so all of this 220 00:09:30,120 --> 00:09:35,880 is basically about doing this thing here 221 00:09:33,360 --> 00:09:38,100 where instead of just having like the 222 00:09:35,880 --> 00:09:42,240 red pixels there you actually have the 223 00:09:38,100 --> 00:09:44,360 dim red and the even dimmer uh red in 224 00:09:42,240 --> 00:09:44,360 there 225 00:09:45,120 --> 00:09:49,380 right so we generate that and the lines 226 00:09:48,180 --> 00:09:53,040 that actually do that distance 227 00:09:49,380 --> 00:09:56,399 calculation are those around 228 00:09:53,040 --> 00:09:58,080 there basically you are calculating 229 00:09:56,399 --> 00:10:00,300 the distance to the start point of the line 230 00:09:58,080 --> 00:10:02,339 and then every single point in the line 231 00:10:00,300 --> 00:10:04,200 and then the distance to the end point 232 00:10:02,339 --> 00:10:06,600 and then combining all of those together 233 00:10:04,200 --> 00:10:08,279 and what does that actually look like 234 00:10:06,600 --> 00:10:10,680 because now I'm talking too much 235 00:10:08,279 --> 00:10:12,839 about code so it's 236 00:10:10,680 --> 00:10:15,480 nice to bring it back this is what it 237 
00:10:12,839 --> 00:10:17,839 looks like up there you have those six 238 00:10:15,480 --> 00:10:20,700 numbers that I said represent curves 239 00:10:17,839 --> 00:10:23,040 and down here you have the actual 240 00:10:20,700 --> 00:10:25,940 rendering of that so basically just 241 00:10:23,040 --> 00:10:28,800 drawing lines on the screen very nice 242 00:10:25,940 --> 00:10:31,260 and now we get into the actual machine 243 00:10:28,800 --> 00:10:33,300 learning part so so far what we have 244 00:10:31,260 --> 00:10:35,459 done is we have built something that 245 00:10:33,300 --> 00:10:37,860 can draw on the screen but instead of 246 00:10:35,459 --> 00:10:40,140 drawing on the screen the usual way it 247 00:10:37,860 --> 00:10:42,360 draws on the screen with the little 248 00:10:40,140 --> 00:10:43,019 pixel slides 249 00:10:42,360 --> 00:10:47,399 um 250 00:10:43,019 --> 00:10:49,620 so to build something uh with PyTorch 251 00:10:47,399 --> 00:10:51,420 essentially we use those modules which 252 00:10:49,620 --> 00:10:53,339 are classes and we are going to 253 00:10:51,420 --> 00:10:55,920 create one that's called the rasterizer 254 00:10:53,339 --> 00:10:58,740 and all that it's doing is calling that 255 00:10:55,920 --> 00:11:00,899 code that I just showed to generate a 256 00:10:58,740 --> 00:11:02,940 sketch so basically you have this bunch 257 00:11:00,899 --> 00:11:06,000 of control points which are like the 258 00:11:02,940 --> 00:11:09,380 input points for the curves and then you 259 00:11:06,000 --> 00:11:12,060 generate a final sketch from that 260 00:11:09,380 --> 00:11:14,399 then I'm going to create another class 261 00:11:12,060 --> 00:11:16,740 called Sketch and that's going to be 262 00:11:14,399 --> 00:11:18,120 the model the thing that learns how to 263 00:11:16,740 --> 00:11:20,640 draw these sketches 264 00:11:18,120 --> 00:11:23,220 and all that it's going to do is create 265 00:11:20,640 --> 00:11:24,660 a variable called rasterizer and 
it's 266 00:11:23,220 --> 00:11:26,760 just going to call the previous code 267 00:11:24,660 --> 00:11:27,959 this thing 268 00:11:26,760 --> 00:11:29,519 um 269 00:11:27,959 --> 00:11:34,260 and 270 00:11:29,519 --> 00:11:36,360 save that in the rasterizer variable and 271 00:11:34,260 --> 00:11:38,820 define a forward pass which is 272 00:11:36,360 --> 00:11:40,980 basically how do we use this thing to 273 00:11:38,820 --> 00:11:42,540 generate some kind of output so if you 274 00:11:40,980 --> 00:11:44,339 remember I showed like you're basically 275 00:11:42,540 --> 00:11:46,380 mapping a bunch of numbers to a bunch of 276 00:11:44,339 --> 00:11:48,600 other numbers this is the thing that 277 00:11:46,380 --> 00:11:52,680 does that so basically we are mapping like 278 00:11:48,600 --> 00:11:55,380 this bunch of points into this image or 279 00:11:52,680 --> 00:11:57,480 sketch if you will 280 00:11:55,380 --> 00:12:00,360 um and then how do you actually train 281 00:11:57,480 --> 00:12:01,980 this thing so here we use a little trick 282 00:12:00,360 --> 00:12:04,680 just to make sure that we don't generate 283 00:12:01,980 --> 00:12:06,839 points that are outside of the screen 284 00:12:04,680 --> 00:12:08,640 um but we use something called a loss 285 00:12:06,839 --> 00:12:10,860 function to actually calculate the 286 00:12:08,640 --> 00:12:13,100 difference between what we generated and 287 00:12:10,860 --> 00:12:16,260 what we wanted to generate 288 00:12:13,100 --> 00:12:18,480 and for loss functions there are a bunch 289 00:12:16,260 --> 00:12:20,279 of different ways that you can uh 290 00:12:18,480 --> 00:12:21,959 calculate differences the simplest way 291 00:12:20,279 --> 00:12:24,060 is just what I said initially like you 292 00:12:21,959 --> 00:12:26,459 just have this bunch of numbers and then 293 00:12:24,060 --> 00:12:29,279 you subtract them and you use that as a 294 00:12:26,459 --> 00:12:31,200 difference but nowadays there are ways 295 00:12:29,279 
--> 00:12:33,180 that are more interesting and 296 00:12:31,200 --> 00:12:34,980 more natural ways of generating those 297 00:12:33,180 --> 00:12:39,000 differences and those are basically 298 00:12:34,980 --> 00:12:41,640 things like using neural networks to 299 00:12:39,000 --> 00:12:44,160 actually calculate whether something looks 300 00:12:41,640 --> 00:12:44,959 like it's the same or looks like it's 301 00:12:44,160 --> 00:12:48,060 different 302 00:12:44,959 --> 00:12:51,180 as opposed to just numbers without any 303 00:12:48,060 --> 00:12:53,519 meaning and what that does is it 304 00:12:51,180 --> 00:12:56,100 creates difference functions that are 305 00:12:53,519 --> 00:12:57,779 actually kind of like what people see so 306 00:12:56,100 --> 00:12:59,519 if you have two images and you 307 00:12:57,779 --> 00:13:01,860 look at them and they are very 308 00:12:59,519 --> 00:13:04,200 similar that function is going to say 309 00:13:01,860 --> 00:13:05,639 okay the difference is zero and if 310 00:13:04,200 --> 00:13:07,139 they're very different that function is 311 00:13:05,639 --> 00:13:10,260 just going to say okay the difference is 312 00:13:07,139 --> 00:13:13,620 very high or one for example 313 00:13:10,260 --> 00:13:16,740 um and I also sneaked something in here 314 00:13:13,620 --> 00:13:19,019 uh Sobel differences because those are 315 00:13:16,740 --> 00:13:21,899 interesting for sketches 316 00:13:19,019 --> 00:13:24,480 um the Sobel operator is something that 317 00:13:21,899 --> 00:13:26,820 you use to basically calculate outlines 318 00:13:24,480 --> 00:13:28,320 from images which is something that 319 00:13:26,820 --> 00:13:30,959 makes a lot of sense when you are 320 00:13:28,320 --> 00:13:32,600 drawing sketches because sketches have a 321 00:13:30,959 --> 00:13:35,220 lot of outlines 322 00:13:32,600 --> 00:13:37,079 and then you could combine all of those 323 00:13:35,220 --> 00:13:38,880 for example but the idea here is that 
324 00:13:37,079 --> 00:13:41,880 you have a bunch of ways that you can 325 00:13:38,880 --> 00:13:44,519 generate those loss functions and it's 326 00:13:41,880 --> 00:13:47,480 very important to get this right because 327 00:13:44,519 --> 00:13:50,579 that's going to be what teaches 328 00:13:47,480 --> 00:13:52,620 the machine or the computer how they are 329 00:13:50,579 --> 00:13:56,339 going to draw those sketches 330 00:13:52,620 --> 00:13:59,399 all right and once once we have all of 331 00:13:56,339 --> 00:14:00,899 that in we can just train the model and 332 00:13:59,399 --> 00:14:03,240 training in here is not going to be 333 00:14:00,899 --> 00:14:05,160 something where like you get thousands 334 00:14:03,240 --> 00:14:06,779 and thousands or millions of images and 335 00:14:05,160 --> 00:14:08,160 you train the model with that but it's 336 00:14:06,779 --> 00:14:10,560 going to be something where you get the 337 00:14:08,160 --> 00:14:12,899 single image and then you try to make 338 00:14:10,560 --> 00:14:15,300 the model output something that looks 339 00:14:12,899 --> 00:14:16,980 like that image in that thing because 340 00:14:15,300 --> 00:14:20,760 it's composed of a bunch of lines and 341 00:14:16,980 --> 00:14:22,740 because we created this loss function 342 00:14:20,760 --> 00:14:24,959 um that tends to create something that 343 00:14:22,740 --> 00:14:27,779 looks like sketches is going to generate 344 00:14:24,959 --> 00:14:31,320 something that looks like sketches so 345 00:14:27,779 --> 00:14:33,779 for example if we have this picture 346 00:14:31,320 --> 00:14:35,820 once you run that we are going to 347 00:14:33,779 --> 00:14:39,420 generate something like this 348 00:14:35,820 --> 00:14:42,060 which is properly bad and exactly what I 349 00:14:39,420 --> 00:14:43,320 wanted in this case 350 00:14:42,060 --> 00:14:46,980 um 351 00:14:43,320 --> 00:14:48,540 if we input something like this 352 00:14:46,980 --> 00:14:50,040 perfect 353 
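The single-image fitting described above can be sketched without any framework. What follows is a minimal stand-in under stated assumptions, not the talk's PyTorch code: one quadratic Bezier curve (six numbers) is drawn with an intensity that fades with distance, and its six numbers are adjusted to imitate a target picture. Forward differences take the place of PyTorch's automatic differentiation, distance is measured to sampled points rather than to the exact curve, the loss is a plain mean squared difference, and every name (`bezier_points`, `soft_render`, `loss`, `gradient`, `fit`) and constant (grid size, falloff width, learning rate, step count) is hypothetical.

```python
import math

SIZE = 12     # the image is SIZE x SIZE pixels (assumed)
SIGMA = 2.0   # width of the intensity falloff around the curve, in pixels (assumed)
SAMPLES = 12  # number of segments used to sample the curve (assumed)


def bezier_points(curve):
    """Sample a quadratic Bezier curve given as six numbers:
    start (x0, y0), control (cx, cy), end (x1, y1)."""
    x0, y0, cx, cy, x1, y1 = curve
    pts = []
    for i in range(SAMPLES + 1):
        t = i / SAMPLES
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        pts.append((a * x0 + b * cx + c * x1, a * y0 + b * cy + c * y1))
    return pts


def soft_render(curve):
    """Intensity fades smoothly with distance to the nearest sampled point,
    so pixels one or two steps away from the curve still carry a signal."""
    pts = bezier_points(curve)
    image = []
    for py in range(SIZE):
        for px in range(SIZE):
            d2 = min((px - x) ** 2 + (py - y) ** 2 for x, y in pts)
            image.append(math.exp(-d2 / SIGMA ** 2))
    return image


def loss(curve, target):
    """Mean squared difference between the rendered curve and the target."""
    image = soft_render(curve)
    return sum((a - b) ** 2 for a, b in zip(image, target)) / len(target)


def gradient(curve, target, eps=1e-3):
    """Forward-difference estimate of d(loss)/d(knob), standing in for autograd."""
    base = loss(curve, target)
    grads = []
    for i in range(len(curve)):
        nudged = list(curve)
        nudged[i] += eps
        grads.append((loss(nudged, target) - base) / eps)
    return grads


def fit(curve, target, steps=80, lr=4.0):
    """Plain gradient descent on the six curve parameters."""
    curve = list(curve)
    for _ in range(steps):
        g = gradient(curve, target)
        curve = [p - lr * gi for p, gi in zip(curve, g)]
    return curve


target = soft_render([2.0, 9.0, 6.0, 1.0, 10.0, 9.0])  # the "picture" to imitate
start = [2.0, 6.0, 6.0, 6.0, 10.0, 6.0]                # a flat initial stroke
fitted = fit(start, target)
print("loss before:", loss(start, target))
print("loss after: ", loss(fitted, target))
```

Swapping `soft_render` for a hard on/off renderer would leave most of these estimated gradients at zero whenever the strokes do not overlap, which is the situation the smooth falloff is meant to avoid.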
00:14:48,540 --> 00:14:52,800 all right 354 00:14:50,040 --> 00:14:54,779 if we input something like this all 355 00:14:52,800 --> 00:14:58,440 right it's getting too good I think 356 00:14:54,779 --> 00:15:01,019 maybe not that one but yeah a bunch of 357 00:14:58,440 --> 00:15:03,360 different examples and the interesting thing 358 00:15:01,019 --> 00:15:06,060 here is that you're not just capturing 359 00:15:03,360 --> 00:15:08,579 like okay does this thing that's being 360 00:15:06,060 --> 00:15:11,220 generated look exactly like the picture 361 00:15:08,579 --> 00:15:14,160 it doesn't but it captures a lot of the 362 00:15:11,220 --> 00:15:15,300 essence of the picture which is what I 363 00:15:14,160 --> 00:15:17,519 want in the first place 364 00:15:15,300 --> 00:15:19,800 the reason that I did not want like 365 00:15:17,519 --> 00:15:22,620 the good sketches is because I think 366 00:15:19,800 --> 00:15:24,779 it's way more interesting to capture 367 00:15:22,620 --> 00:15:27,300 the essence of a picture and that's 368 00:15:24,779 --> 00:15:29,100 essentially what a sketch is as opposed 369 00:15:27,300 --> 00:15:30,540 to just capturing what the picture is 370 00:15:29,100 --> 00:15:32,160 and that's the reason that it's not 371 00:15:30,540 --> 00:15:34,620 super straightforward to do all of that 372 00:15:32,160 --> 00:15:35,940 you need this differentiable rendering 373 00:15:34,620 --> 00:15:37,440 business 374 00:15:35,940 --> 00:15:39,680 um to actually be able to get through 375 00:15:37,440 --> 00:15:39,680 that 376 00:15:43,199 --> 00:15:49,260 all right and for all of that 377 00:15:46,440 --> 00:15:51,420 um there is also a better version of 378 00:15:49,260 --> 00:15:54,480 all of this that I have done here which is 379 00:15:51,420 --> 00:15:57,779 this thing called CLIPasso and it 380 00:15:54,480 --> 00:16:00,120 basically uses a model that's very good 381 00:15:57,779 --> 00:16:02,519 at telling like this image looks like 382 
00:16:00,120 --> 00:16:05,399 this other image but they're not the 383 00:16:02,519 --> 00:16:07,800 same and that's the CLIP model and they 384 00:16:05,399 --> 00:16:09,899 use that together with a lot of this 385 00:16:07,800 --> 00:16:12,240 differentiable rendering thing to 386 00:16:09,899 --> 00:16:15,720 actually generate sketches that capture 387 00:16:12,240 --> 00:16:17,519 a lot of like what you need to 388 00:16:15,720 --> 00:16:19,800 actually tell that for example this is a 389 00:16:17,519 --> 00:16:20,940 flamingo and then they have like I don't 390 00:16:19,800 --> 00:16:23,160 know five 391 00:16:20,940 --> 00:16:25,380 lines in there and then you know that's 392 00:16:23,160 --> 00:16:28,199 a flamingo so 393 00:16:25,380 --> 00:16:30,779 that's uh I also have got their 394 00:16:28,199 --> 00:16:31,980 GitHub there you can take a look at how 395 00:16:30,779 --> 00:16:35,639 they did that 396 00:16:31,980 --> 00:16:38,540 all right and that's it 397 00:16:35,639 --> 00:16:38,540 questions 398 00:16:40,630 --> 00:16:49,059 [Applause] 399 00:16:57,540 --> 00:17:01,980 that was super interesting 400 00:16:59,699 --> 00:17:04,740 um I think I saw in some of the Python 401 00:17:01,980 --> 00:17:07,620 there it was 128 lines was what you were 402 00:17:04,740 --> 00:17:10,919 working with yes yep so you basically 403 00:17:07,620 --> 00:17:13,740 create where is that I think yep yeah 404 00:17:10,919 --> 00:17:15,839 you basically create a big vector that's 405 00:17:13,740 --> 00:17:17,699 composed of other vectors that each one 406 00:17:15,839 --> 00:17:20,160 of those has like six numbers representing 407 00:17:17,699 --> 00:17:21,600 one of those curves and then like you 408 00:17:20,160 --> 00:17:23,220 say a number of those to actually 409 00:17:21,600 --> 00:17:25,880 generate the sketches so in this case 410 00:17:23,220 --> 00:17:28,439 yep 128 did you um did you do much 411 00:17:25,880 --> 00:17:30,059 experimentation with 
drastically 412 00:17:28,439 --> 00:17:33,299 dropping or increasing the number of 413 00:17:30,059 --> 00:17:35,460 lines yes yep in this specific case 414 00:17:33,299 --> 00:17:37,620 because the loss functions that I use 415 00:17:35,460 --> 00:17:40,919 are very simple it tends to do better 416 00:17:37,620 --> 00:17:42,780 with more lines so it creates something 417 00:17:40,919 --> 00:17:45,720 that resembles more of the initial image 418 00:17:42,780 --> 00:17:48,000 with more lines but if you use a better 419 00:17:45,720 --> 00:17:51,780 loss function which is the case in here 420 00:17:48,000 --> 00:17:53,340 then you can go down like to five lines 421 00:17:51,780 --> 00:17:55,140 and still generate something that looks 422 00:17:53,340 --> 00:17:56,940 like the initial image and that's 423 00:17:55,140 --> 00:17:58,860 exactly what they're demonstrating here 424 00:17:56,940 --> 00:18:01,340 like the same algorithm different number 425 00:17:58,860 --> 00:18:01,340 of lines 426 00:18:16,200 --> 00:18:20,400 so is the only difference between your 427 00:18:19,140 --> 00:18:22,860 approach and the one from this paper 428 00:18:20,400 --> 00:18:25,200 the loss function itself 429 00:18:22,860 --> 00:18:26,640 mostly yes there are a few other things 430 00:18:25,200 --> 00:18:30,539 that you can do to improve the quality 431 00:18:26,640 --> 00:18:32,700 for example uh in here what we are doing 432 00:18:30,539 --> 00:18:34,919 is initially like the initial condition 433 00:18:32,700 --> 00:18:36,179 the initial set of lines they are 434 00:18:34,919 --> 00:18:37,559 completely random 435 00:18:36,179 --> 00:18:40,080 but one thing that you can do to 436 00:18:37,559 --> 00:18:43,799 actually improve this is select hotspots 437 00:18:40,080 --> 00:18:45,539 so basically use any function like 438 00:18:43,799 --> 00:18:48,000 for example even the Sobel operator 439 00:18:45,539 --> 00:18:49,620 you could use that to select areas of 440 00:18:48,000 
--> 00:18:52,320 interest and then you put the lines 441 00:18:49,620 --> 00:18:55,140 initially close to those areas and then 442 00:18:52,320 --> 00:18:57,600 the results are going to be quite a bit 443 00:18:55,140 --> 00:18:59,700 better than if you don't do that so yeah 444 00:18:57,600 --> 00:19:01,679 like in general it's the same approach 445 00:18:59,700 --> 00:19:03,360 different loss functions but there are a 446 00:19:01,679 --> 00:19:06,000 few other things that you can do like 447 00:19:03,360 --> 00:19:10,340 that generally improve quality in 448 00:19:06,000 --> 00:19:10,340 slight ways cool thank you 449 00:19:13,740 --> 00:19:18,299 um how expensive or how slow is it to 450 00:19:15,960 --> 00:19:21,660 run the neural network 451 00:19:18,299 --> 00:19:24,299 um trainer rather than right 452 00:19:21,660 --> 00:19:26,160 yeah so it's quite a bit more expensive 453 00:19:24,299 --> 00:19:28,980 you usually are using a pre-trained 454 00:19:26,160 --> 00:19:30,840 neural network in those cases so it's 455 00:19:28,980 --> 00:19:32,760 not going to be as expensive as like 456 00:19:30,840 --> 00:19:34,559 training the network because like 457 00:19:32,760 --> 00:19:37,020 you are just using it as a loss 458 00:19:34,559 --> 00:19:39,000 function not really training it 459 00:19:37,020 --> 00:19:42,000 um so it's not really much more 460 00:19:39,000 --> 00:19:43,500 expensive but because the model is so 461 00:19:42,000 --> 00:19:46,380 simple it's basically just like 462 00:19:43,500 --> 00:19:48,539 generating curves when you add any 463 00:19:46,380 --> 00:19:51,900 neural network the performance drops a 464 00:19:48,539 --> 00:19:54,840 lot so for example uh I did like 10 465 00:19:51,900 --> 00:19:57,780 epochs for all of those and they took on 466 00:19:54,840 --> 00:20:00,419 the CPU on the Mac it takes a few 467 00:19:57,780 --> 00:20:02,760 minutes I would say two minutes but like 468 00:20:00,419 --> 00:20:05,640 if you put a 
function that uses a neural 469 00:20:02,760 --> 00:20:08,240 network it goes like to maybe four so it 470 00:20:05,640 --> 00:20:08,240 doubles the time 471 00:20:22,020 --> 00:20:26,340 yeah so I've noticed those um all the 472 00:20:24,960 --> 00:20:28,799 curves they're only curved in one 473 00:20:26,340 --> 00:20:30,860 direction is it possible to deal with 474 00:20:28,799 --> 00:20:33,539 multi uh curved 475 00:20:30,860 --> 00:20:36,179 curves a curve that goes in multiple 476 00:20:33,539 --> 00:20:39,960 directions like like an S yep I'm 477 00:20:36,179 --> 00:20:42,179 following you um so there is a paper uh 478 00:20:39,960 --> 00:20:43,500 I did not put it in here but I can try 479 00:20:42,179 --> 00:20:45,720 to find it later 480 00:20:43,500 --> 00:20:47,280 um there is a paper that exactly does 481 00:20:45,720 --> 00:20:49,020 that like what's the difference if you 482 00:20:47,280 --> 00:20:50,700 have straight lines if you have curved 483 00:20:49,020 --> 00:20:53,039 lines if you have lines that can curve 484 00:20:50,700 --> 00:20:55,200 in multiple places 485 00:20:53,039 --> 00:20:57,780 um and what they found is like the 486 00:20:55,200 --> 00:21:01,320 visual difference is very small because 487 00:20:57,780 --> 00:21:02,880 what you can do is for example instead of a 488 00:21:01,320 --> 00:21:05,160 line that curves in let's say six places 489 00:21:02,880 --> 00:21:07,679 you can just use six lines that curve 490 00:21:05,160 --> 00:21:10,260 right so just by bumping that number of 491 00:21:07,679 --> 00:21:12,960 total strokes you can do sort of the same 492 00:21:10,260 --> 00:21:15,299 thing and it's in general way less 493 00:21:12,960 --> 00:21:18,440 expensive to use these curves that just 494 00:21:15,299 --> 00:21:18,440 curve in a single place 495 00:21:19,380 --> 00:21:23,940 um a question from the Discord is uh how 496 00:21:22,020 --> 00:21:25,620 long did it take to train in total and 497 00:21:23,940 --> 00:21:28,620 
what kind of hardware are you using 498 00:21:25,620 --> 00:21:31,919 right so yeah as I said like 499 00:21:28,620 --> 00:21:33,960 for those examples couple minutes each 500 00:21:31,919 --> 00:21:36,059 because like there is no training the 501 00:21:33,960 --> 00:21:38,159 model you are training on a specific 502 00:21:36,059 --> 00:21:40,559 image like you just want oh I want the 503 00:21:38,159 --> 00:21:42,200 model to generate a specific image so I 504 00:21:40,559 --> 00:21:44,400 want to like 505 00:21:42,200 --> 00:21:46,620 approximate all of those parameters that 506 00:21:44,400 --> 00:21:48,720 generate the curves I want to 507 00:21:46,620 --> 00:21:51,059 approximately adjust those based on this 508 00:21:48,720 --> 00:21:54,059 specific image but yeah a couple minutes 509 00:21:51,059 --> 00:21:56,840 on the Mac's CPU it's not even using 510 00:21:54,059 --> 00:21:56,840 the GPU in here 511 00:22:00,059 --> 00:22:03,960 um some of these sketches 512 00:22:02,100 --> 00:22:05,880 remind me of the sort of sketches you'll 513 00:22:03,960 --> 00:22:07,860 see in an art gallery underneath the 514 00:22:05,880 --> 00:22:09,960 actual painting maybe I should sell them 515 00:22:07,860 --> 00:22:12,179 no no but I guess the point that I'm 516 00:22:09,960 --> 00:22:14,100 making is that um could you use this 517 00:22:12,179 --> 00:22:17,400 technique as like a first approximation 518 00:22:14,100 --> 00:22:19,380 of an image and then use once you've 519 00:22:17,400 --> 00:22:21,539 got these boundaries use those 520 00:22:19,380 --> 00:22:23,340 boundaries to sort of do a next 521 00:22:21,539 --> 00:22:25,820 iteration of some sort of image 522 00:22:23,340 --> 00:22:25,820 recognition 523 00:22:26,220 --> 00:22:30,419 you can do one of the things that I'm 524 00:22:28,140 --> 00:22:32,760 quite interested in is the uses that 525 00:22:30,419 --> 00:22:35,100 this could have for animation because 526 00:22:32,760 --> 00:22:36,840 one of
like the highest costs that you 527 00:22:35,100 --> 00:22:38,880 have in animation is actually creating 528 00:22:36,840 --> 00:22:40,620 in-between frames so you have someone 529 00:22:38,880 --> 00:22:42,539 that draws like the keyframes and then 530 00:22:40,620 --> 00:22:45,179 someone has to go and actually render 531 00:22:42,539 --> 00:22:47,220 each one of those and if you want to 532 00:22:45,179 --> 00:22:49,260 apply techniques like this to that 533 00:22:47,220 --> 00:22:51,120 problem the thing is like you want to 534 00:22:49,260 --> 00:22:53,280 generate a bunch of sketches to train 535 00:22:51,120 --> 00:22:55,500 other models right so if you have things 536 00:22:53,280 --> 00:22:57,480 like this maybe you could use them even like 537 00:22:55,500 --> 00:23:00,360 as a first approximation to generate 538 00:22:57,480 --> 00:23:02,700 something else or you could use them to 539 00:23:00,360 --> 00:23:05,220 generate training sets that you can then 540 00:23:02,700 --> 00:23:07,020 use to create models for example that 541 00:23:05,220 --> 00:23:09,120 help people with like generating 542 00:23:07,020 --> 00:23:11,820 in-between frames in animations so yeah 543 00:23:09,120 --> 00:23:14,100 definitely something that's possible and 544 00:23:11,820 --> 00:23:17,760 it's actually an active area of research 545 00:23:14,100 --> 00:23:19,860 uh like this thing came out last year so 546 00:23:17,760 --> 00:23:22,700 people are actively working like what 547 00:23:19,860 --> 00:23:22,700 can you use this for 548 00:23:34,140 --> 00:23:39,900 when you sketch you use different 549 00:23:36,600 --> 00:23:42,000 pressures with your pen is that also a 550 00:23:39,900 --> 00:23:45,179 possibility that you can like 551 00:23:42,000 --> 00:23:48,780 differently it is yeah 552 00:23:45,179 --> 00:23:50,460 um so in here as you guys could see like 553 00:23:48,780 --> 00:23:52,559 I'm trying to go for the simplest 554 00:23:50,460 --> 00:23:54,299 approach like I'm
not using multiple 555 00:23:52,559 --> 00:23:56,700 curves that can bend in multiple places 556 00:23:54,299 --> 00:23:59,100 I'm using like the simplest version of a 557 00:23:56,700 --> 00:24:01,740 curve I'm also not using colors and I'm 558 00:23:59,100 --> 00:24:03,240 not using different pressures but one 559 00:24:01,740 --> 00:24:05,159 thing that you can do is just like in 560 00:24:03,240 --> 00:24:08,039 your definition of a curve let me just 561 00:24:05,159 --> 00:24:10,080 find that real quick in your definition 562 00:24:08,039 --> 00:24:11,520 of a curve you could just say I want an 563 00:24:10,080 --> 00:24:13,080 extra parameter here that's going to be 564 00:24:11,520 --> 00:24:15,480 the initial pressure and the final 565 00:24:13,080 --> 00:24:17,460 pressure for example and that's going to 566 00:24:15,480 --> 00:24:19,440 improve like the kind of strokes that you 567 00:24:17,460 --> 00:24:21,720 can do a lot and you could also say I 568 00:24:19,440 --> 00:24:24,360 want three extra parameters uh for 569 00:24:21,720 --> 00:24:25,860 representing the color or you can go 570 00:24:24,360 --> 00:24:28,200 crazy with this like you just extend the 571 00:24:25,860 --> 00:24:29,640 set of parameters that you are training the 572 00:24:28,200 --> 00:24:31,980 only thing that you have to be mindful 573 00:24:29,640 --> 00:24:34,080 of is like when you're actually writing 574 00:24:31,980 --> 00:24:36,720 the thing that takes those curves and 575 00:24:34,080 --> 00:24:38,460 generates like the final image you have 576 00:24:36,720 --> 00:24:41,100 to be mindful that they are actually 577 00:24:38,460 --> 00:24:42,360 differentiable but otherwise yeah you 578 00:24:41,100 --> 00:24:45,200 could go crazy with the kind of 579 00:24:42,360 --> 00:24:45,200 parameters that you're taking 580 00:24:47,700 --> 00:24:51,120 sorry can you say it again I think that 581 00:24:49,740 --> 00:24:53,340 the question was are there loss 582 00:24:51,120 --> 00:24:56,940
functions that support that already 583 00:24:53,340 --> 00:24:59,760 yeah yeah yeah yes yep in this case like 584 00:24:56,940 --> 00:25:02,520 the loss function that I use here it 585 00:24:59,760 --> 00:25:04,500 supports color images but I'm 586 00:25:02,520 --> 00:25:06,960 converting everything to grayscale and 587 00:25:04,500 --> 00:25:08,400 in terms of like different pressures 588 00:25:06,960 --> 00:25:09,900 um that would be supported out of the 589 00:25:08,400 --> 00:25:13,620 box because you are just comparing like 590 00:25:09,900 --> 00:25:16,679 two images so if you have a stroke that 591 00:25:13,620 --> 00:25:19,020 initially for example has a 592 00:25:16,679 --> 00:25:22,380 either a 593 00:25:19,020 --> 00:25:23,760 higher brightness or a bigger stroke for 594 00:25:22,380 --> 00:25:26,159 example depending on the way that you 595 00:25:23,760 --> 00:25:26,760 want to represent like the pressure 596 00:25:26,159 --> 00:25:28,620 um 597 00:25:26,760 --> 00:25:30,120 when comparing the pixels the pixels 598 00:25:28,620 --> 00:25:33,320 don't care right it's just the way that 599 00:25:30,120 --> 00:25:33,320 we are generating those pixels 600 00:25:38,340 --> 00:25:43,620 hi thanks a lot for the talk um my 601 00:25:40,799 --> 00:25:46,620 question is during training did you ever 602 00:25:43,620 --> 00:25:48,600 encounter situations where the gradient 603 00:25:46,620 --> 00:25:50,400 descent algorithm did not converge and 604 00:25:48,600 --> 00:25:54,020 you just end up with a bunch of random 605 00:25:50,400 --> 00:25:54,020 squiggly lines that say the 606 00:25:54,480 --> 00:25:58,020 not really 607 00:25:56,279 --> 00:26:00,360 um because like the model is so simple 608 00:25:58,020 --> 00:26:02,340 and the gradients in this case are very 609 00:26:00,360 --> 00:26:06,299 simple as well 610 00:26:02,340 --> 00:26:09,200 um but they only converge up to a point 611 00:26:06,299 --> 00:26:11,580 right so you cannot improve the quality
612 00:26:09,200 --> 00:26:14,760 much further than what you're seeing 613 00:26:11,580 --> 00:26:16,080 here like with those models specifically 614 00:26:14,760 --> 00:26:17,400 um because like this is the maximum 615 00:26:16,080 --> 00:26:18,960 convergence that you're actually going 616 00:26:17,400 --> 00:26:20,279 to get but yeah I did not have any 617 00:26:18,960 --> 00:26:22,880 issues with actually getting to 618 00:26:20,279 --> 00:26:22,880 convergence 619 00:26:25,200 --> 00:26:28,860 I've got a question over here 620 00:26:27,120 --> 00:26:30,480 here have any more Q 621 00:26:28,860 --> 00:26:32,059 it's the little pixel going down the 622 00:26:30,480 --> 00:26:34,559 road 623 00:26:32,059 --> 00:26:37,100 I need to do more of those they're great 624 00:26:34,559 --> 00:26:37,100 yeah 625 00:26:38,940 --> 00:26:42,620 yeah I'm so happy 626 00:26:45,900 --> 00:26:50,279 um and one actual question as well is 627 00:26:48,179 --> 00:26:51,600 there any way of converting that into 3D 628 00:26:50,279 --> 00:26:53,220 or would that be an entirely different 629 00:26:51,600 --> 00:26:55,919 ballpark 630 00:26:53,220 --> 00:26:58,260 that's an interesting question 631 00:26:55,919 --> 00:27:00,600 um the concept of differentiable 632 00:26:58,260 --> 00:27:03,059 rendering exists in 3D so instead of 633 00:27:00,600 --> 00:27:05,520 extracting like a sketch or extracting 634 00:27:03,059 --> 00:27:08,159 like the set of parameters for the curves 635 00:27:05,520 --> 00:27:10,020 you're actually extracting for example 636 00:27:08,159 --> 00:27:12,419 the meshes that you could use to render 637 00:27:10,020 --> 00:27:14,340 that image and there's a lot of research 638 00:27:12,419 --> 00:27:17,659 on that but yeah I don't know much about 639 00:27:14,340 --> 00:27:17,659 that but yeah it's possible 640 00:27:21,059 --> 00:27:25,320 uh are there any like open-source 641 00:27:23,039 --> 00:27:27,480 packages for helping people get started 642 00:27:25,320 -->
00:27:28,799 with this 643 00:27:27,480 --> 00:27:33,960 um 644 00:27:28,799 --> 00:27:34,799 for rendering I believe there is 645 00:27:33,960 --> 00:27:36,659 um 646 00:27:34,799 --> 00:27:39,600 in that same paper that I referenced 647 00:27:36,659 --> 00:27:41,039 that I did not really put in here but I 648 00:27:39,600 --> 00:27:44,179 do believe they have an open-source 649 00:27:41,039 --> 00:27:47,640 version of like their rendering approach 650 00:27:44,179 --> 00:27:50,100 and also for the loss functions there 651 00:27:47,640 --> 00:27:52,520 are packages uh one of them I'm using 652 00:27:50,100 --> 00:27:55,799 here is 653 00:27:52,520 --> 00:27:59,340 this package and it basically provides 654 00:27:55,799 --> 00:28:02,299 uh perceptual loss functions that you 655 00:27:59,340 --> 00:28:02,299 could use out of the box 656 00:28:08,220 --> 00:28:13,640 all right let's give a round of applause 657 00:28:10,799 --> 00:28:13,640 for Felipe
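
Editor's note on the pressure question from the Q&A: the answer was that a curve definition can take extra parameters (for example an initial and a final pressure) as long as the code that turns curves into pixels stays differentiable, and that a pixel-comparing loss then works unchanged. The block below is only a rough sketch of that idea in plain Python, not the speaker's code. Everything in it is an assumption made for illustration: the names `bezier_point`, `render_stroke` and `mse_loss`, the choice of a quadratic Bezier curve as the "single bend" stroke, stroke width as the stand-in for pressure, the Gaussian falloff, and the plain mean squared error in place of the perceptual loss mentioned in the talk.

```python
import math


def bezier_point(p0, p1, p2, t):
    """Point on a quadratic Bezier curve (a curve with one bend), t in [0, 1]."""
    u = 1.0 - t
    x = u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0]
    y = u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1]
    return x, y


def render_stroke(stroke, size=16, samples=32):
    """Soft-rasterize one stroke into a size x size grayscale image.

    stroke = (p0, p1, p2, w_start, w_end). The two widths are the
    hypothetical "initial pressure" and "final pressure" parameters.
    Each sample point deposits ink with a Gaussian falloff, and samples
    are merged with a soft union, so every pixel is a smooth function
    of all stroke parameters.
    """
    p0, p1, p2, w_start, w_end = stroke
    image = [[0.0] * size for _ in range(size)]
    for i in range(samples):
        t = i / (samples - 1)
        cx, cy = bezier_point(p0, p1, p2, t)
        # Pressure (here: width) is interpolated along the stroke.
        width = (1.0 - t) * w_start + t * w_end
        for row in range(size):
            for col in range(size):
                d2 = (col - cx) ** 2 + (row - cy) ** 2
                ink = math.exp(-d2 / (2.0 * width * width))
                # Soft union: 1 - (1 - a) * (1 - b) stays within [0, 1].
                image[row][col] = 1.0 - (1.0 - image[row][col]) * (1.0 - ink)
    return image


def mse_loss(image_a, image_b):
    """Mean squared pixel difference; it only ever sees pixels."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(image_a, image_b):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return total / count


# Two strokes on the same path that differ only in their pressure values.
light = ((2.0, 8.0), (8.0, 8.0), (14.0, 8.0), 0.5, 0.5)
pressed = ((2.0, 8.0), (8.0, 8.0), (14.0, 8.0), 0.5, 2.0)
loss = mse_loss(render_stroke(light), render_stroke(pressed))
```

The loss function never looks at the pressure values directly. It only compares the two rendered images, which matches the point made in the answer that the pixels do not care how they were generated. In an actual training setup the renderer would be written in a framework with automatic differentiation so that gradients of the loss flow back to the stroke parameters.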