1 00:00:06,320 --> 00:00:11,499 [Music] 2 00:00:15,440 --> 00:00:18,720 hello everyone and welcome back from 3 00:00:17,039 --> 00:00:20,800 your lunch i hope you had a great time 4 00:00:18,720 --> 00:00:23,279 uh and hopefully had some good food 5 00:00:20,800 --> 00:00:26,400 uh next up in the kia ora theater at 6 00:00:23,279 --> 00:00:28,720 linux conference 2022 is mike cohen a 7 00:00:26,400 --> 00:00:30,160 renowned digital forensic engineer and 8 00:00:28,720 --> 00:00:32,559 senior software engineer who describes 9 00:00:30,160 --> 00:00:34,239 himself as a digital paleontologist 10 00:00:32,559 --> 00:00:36,160 mike is the founder and creator of 11 00:00:34,239 --> 00:00:37,760 velociraptor which is an advanced open 12 00:00:36,160 --> 00:00:40,399 source digital forensic and incident 13 00:00:37,760 --> 00:00:43,200 response framework supporting 14 00:00:40,399 --> 00:00:45,200 linux mac os and windows there's so many 15 00:00:43,200 --> 00:00:46,640 thousand digital forensics 16 00:00:45,200 --> 00:00:47,920 uh 17 00:00:46,640 --> 00:00:49,600 today mike is talking through hunting 18 00:00:47,920 --> 00:00:52,480 for threats on a linux host using 19 00:00:49,600 --> 00:00:53,440 velociraptor and its query language vql 20 00:00:52,480 --> 00:00:56,399 take it off 21 00:00:53,440 --> 00:00:58,399 thanks very much thank you so um i'm 22 00:00:56,399 --> 00:01:01,280 really glad to be here today and talk to 23 00:00:58,399 --> 00:01:03,520 you about velociraptor uh and today 24 00:01:01,280 --> 00:01:06,080 we're going to cover some of the 25 00:01:03,520 --> 00:01:07,680 linux aspects of typical investigations 26 00:01:06,080 --> 00:01:10,000 that we see 27 00:01:07,680 --> 00:01:10,720 when we're doing linux incident response 28 00:01:10,000 --> 00:01:13,040 or 29 00:01:10,720 --> 00:01:14,880 forensic investigations to give you guys 30 00:01:13,040 --> 00:01:18,240 a little bit of a taste of 31 00:01:14,880 --> 00:01:21,920 how to do forensic
response at scale and 32 00:01:18,240 --> 00:01:23,360 how velociraptor can make that easier so 33 00:01:21,920 --> 00:01:26,080 specifically 34 00:01:23,360 --> 00:01:28,479 velociraptor has a lot of capabilities 35 00:01:26,080 --> 00:01:30,799 we're really not going to even touch on 36 00:01:28,479 --> 00:01:33,119 many of the capabilities that it has 37 00:01:30,799 --> 00:01:35,680 uh and we're going to cover a lot of the 38 00:01:33,119 --> 00:01:38,000 things very briefly but because 39 00:01:35,680 --> 00:01:39,360 this is a linux conference um then we're 40 00:01:38,000 --> 00:01:41,520 going to really look at 41 00:01:39,360 --> 00:01:42,399 typical linux use cases 42 00:01:41,520 --> 00:01:44,640 and 43 00:01:42,399 --> 00:01:46,720 since it's an open source conference so 44 00:01:44,640 --> 00:01:49,360 i'm hoping to give you guys a bit of an 45 00:01:46,720 --> 00:01:52,799 idea as to how to join the open source 46 00:01:49,360 --> 00:01:55,280 uh project and contribute and uh and 47 00:01:52,799 --> 00:01:57,520 use that so um 48 00:01:55,280 --> 00:02:00,799 what is velociraptor and i've spoken 49 00:01:57,520 --> 00:02:02,960 about uh velociraptor at linux conf um i 50 00:02:00,799 --> 00:02:05,119 think last year there was um 51 00:02:02,960 --> 00:02:08,000 a workshop about it so you know we 52 00:02:05,119 --> 00:02:10,879 covered it in a lot of depth uh but it 53 00:02:08,000 --> 00:02:13,920 is an open source tool uh that's really 54 00:02:10,879 --> 00:02:15,440 designed for um for digital forensic and 55 00:02:13,920 --> 00:02:18,080 incident response 56 00:02:15,440 --> 00:02:20,160 and uh and also alerting and detection 57 00:02:18,080 --> 00:02:23,840 and essentially it's a way of making it 58 00:02:20,160 --> 00:02:25,760 easy for us to manage uh or investigate 59 00:02:23,840 --> 00:02:26,800 uh at scale 60 00:02:25,760 --> 00:02:28,480 so 61 00:02:26,800 --> 00:02:30,720 the thing that makes velociraptor 62 00:02:28,480
--> 00:02:33,680 really cool is that it has a query 63 00:02:30,720 --> 00:02:35,519 language called vql and vql is really 64 00:02:33,680 --> 00:02:36,400 kind of at the core of velociraptor it 65 00:02:35,519 --> 00:02:37,840 makes it 66 00:02:36,400 --> 00:02:39,519 do everything and we're going to cover 67 00:02:37,840 --> 00:02:40,640 some of the um 68 00:02:39,519 --> 00:02:42,800 some of the things that we can do with 69 00:02:40,640 --> 00:02:45,040 vql today and how we can use that in the 70 00:02:42,800 --> 00:02:47,840 real world um so 71 00:02:45,040 --> 00:02:49,360 you know it's just a bit of a taste 72 00:02:47,840 --> 00:02:50,879 um oops 73 00:02:49,360 --> 00:02:53,200 all right so let's have a look at 74 00:02:50,879 --> 00:02:55,840 generally what velociraptor looks like 75 00:02:53,200 --> 00:02:57,920 so we have a velociraptor server usually 76 00:02:55,840 --> 00:03:01,120 we deploy it in the cloud 77 00:02:57,920 --> 00:03:03,440 and basically it connects with uh 78 00:03:01,120 --> 00:03:06,720 assets which could be laptops or 79 00:03:03,440 --> 00:03:09,200 servers or essentially any kind of 80 00:03:06,720 --> 00:03:11,760 system that runs the agent so we have 81 00:03:09,200 --> 00:03:12,640 support for windows mac os and linux 82 00:03:11,760 --> 00:03:14,959 agents 83 00:03:12,640 --> 00:03:18,000 today we'll talk about linux 84 00:03:14,959 --> 00:03:19,920 but the agents are connected 85 00:03:18,000 --> 00:03:22,400 persistently to the server so that means 86 00:03:19,920 --> 00:03:23,519 that we can investigate each of these 87 00:03:22,400 --> 00:03:25,280 agents 88 00:03:23,519 --> 00:03:26,640 you know within seconds we don't 89 00:03:25,280 --> 00:03:27,920 need to wait for them to poll or 90 00:03:26,640 --> 00:03:30,159 anything like that we can immediately 91 00:03:27,920 --> 00:03:32,400 get results from them and then we have 92 00:03:30,159 --> 00:03:34,720 the admin ui which 93 00:03:32,400 -->
00:03:37,519 we use that to manage the deployment so 94 00:03:34,720 --> 00:03:40,000 i actually have a bit of a demo today so 95 00:03:37,519 --> 00:03:42,879 i'm just going to show you guys what the 96 00:03:40,000 --> 00:03:43,920 admin ui looks like and as you can see 97 00:03:42,879 --> 00:03:46,720 um 98 00:03:43,920 --> 00:03:49,280 we have just the welcome screen 99 00:03:46,720 --> 00:03:51,519 there is a dashboard over here that uh 100 00:03:49,280 --> 00:03:53,680 you know just tells us uh some 101 00:03:51,519 --> 00:03:55,519 information about this deployment 102 00:03:53,680 --> 00:03:58,239 uh like how much disk space there is and 103 00:03:55,519 --> 00:04:01,840 things like that and uh today in this 104 00:03:58,239 --> 00:04:04,319 demonstration i have um i have about a 105 00:04:01,840 --> 00:04:06,480 thousand clients connected so a thousand 106 00:04:04,319 --> 00:04:08,480 endpoints connected and uh and the 107 00:04:06,480 --> 00:04:10,799 server is you know kind of waiting for 108 00:04:08,480 --> 00:04:12,959 us we're gonna do some interesting 109 00:04:10,799 --> 00:04:14,560 work on that so if i just um 110 00:04:12,959 --> 00:04:16,479 search for my clients i can see these 111 00:04:14,560 --> 00:04:17,759 are all my clients here 112 00:04:16,479 --> 00:04:19,440 and 113 00:04:17,759 --> 00:04:21,680 you know i can look at each of them 114 00:04:19,440 --> 00:04:25,440 randomly and see some information about 115 00:04:21,680 --> 00:04:28,240 it including um collecting some telemetry 116 00:04:25,440 --> 00:04:30,880 you know about like how much cpu and 117 00:04:28,240 --> 00:04:33,120 usage it's you know that each client is 118 00:04:30,880 --> 00:04:34,800 taking each endpoint is taking 119 00:04:33,120 --> 00:04:37,919 but uh that's just 120 00:04:34,800 --> 00:04:38,800 um so just showing you how i can control 121 00:04:37,919 --> 00:04:42,160 each 122 00:04:38,800 --> 00:04:44,720 client each uh we call clients the um 123
00:04:42,160 --> 00:04:47,840 the assets right so they are the clients 124 00:04:44,720 --> 00:04:49,520 okay so typically um we it's very 125 00:04:47,840 --> 00:04:51,840 efficient it's really fast designed to 126 00:04:49,520 --> 00:04:52,880 collect a lot of data real quickly 127 00:04:51,840 --> 00:04:56,240 um 128 00:04:52,880 --> 00:04:58,479 because most of the work is done by 129 00:04:56,240 --> 00:05:00,400 using the query language which runs on 130 00:04:58,479 --> 00:05:02,639 the endpoint so you'll see that later 131 00:05:00,400 --> 00:05:04,720 when we're going to do some pretty heavy 132 00:05:02,639 --> 00:05:06,560 lifting and you'll see the endpoints are 133 00:05:04,720 --> 00:05:08,320 doing a lot of work so even if we hunt 134 00:05:06,560 --> 00:05:10,880 for it with 135 00:05:08,320 --> 00:05:12,000 many many endpoints then we will 136 00:05:10,880 --> 00:05:13,680 be able to 137 00:05:12,000 --> 00:05:14,880 uh very quickly 138 00:05:13,680 --> 00:05:16,880 um 139 00:05:14,880 --> 00:05:19,199 see um 140 00:05:16,880 --> 00:05:22,000 we're going to very quickly uh see that 141 00:05:19,199 --> 00:05:24,880 it you know they'll scale really quickly 142 00:05:22,000 --> 00:05:27,039 all right so um 143 00:05:24,880 --> 00:05:29,919 the idea behind vql is instead of 144 00:05:27,039 --> 00:05:33,440 having specific analysis modules 145 00:05:29,919 --> 00:05:37,039 um we have generic what we call vql 146 00:05:33,440 --> 00:05:40,080 plugins and those plugins uh perform 147 00:05:37,039 --> 00:05:42,080 some low-level forensic analysis such as 148 00:05:40,080 --> 00:05:44,240 uh parsing files 149 00:05:42,080 --> 00:05:46,960 um you know in the windows world we 150 00:05:44,240 --> 00:05:49,360 have you know ntfs parsing mft and so on 151 00:05:46,960 --> 00:05:52,240 uh in the linux world we have parsing 152 00:05:49,360 --> 00:05:53,440 using grok sqlite and so on uh and 153 00:05:52,240 --> 00:05:55,680 binary parsing we're
going to look at 154 00:05:53,440 --> 00:05:58,639 some of those today but instead of just 155 00:05:55,680 --> 00:06:00,400 having like a module that just does you 156 00:05:58,639 --> 00:06:02,240 know we're going to look at 157 00:06:00,400 --> 00:06:03,919 you know browser history 158 00:06:02,240 --> 00:06:07,360 we have generic parsers and then the 159 00:06:03,919 --> 00:06:09,440 query uses those to uh to build a more 160 00:06:07,360 --> 00:06:10,880 complicated query 161 00:06:09,440 --> 00:06:13,039 parser out of that 162 00:06:10,880 --> 00:06:15,199 so this is the point of having the 163 00:06:13,039 --> 00:06:17,520 query language we can string together 164 00:06:15,199 --> 00:06:20,400 different basic building blocks to 165 00:06:17,520 --> 00:06:21,360 create a more complex and capable 166 00:06:20,400 --> 00:06:22,560 um 167 00:06:21,360 --> 00:06:23,360 capability 168 00:06:22,560 --> 00:06:25,520 so 169 00:06:23,360 --> 00:06:27,199 because this is all about open source 170 00:06:25,520 --> 00:06:29,199 and this conference you know really 171 00:06:27,199 --> 00:06:31,759 focuses on a lot of the open source 172 00:06:29,199 --> 00:06:33,520 aspects as well uh because it's open 173 00:06:31,759 --> 00:06:36,639 source we have a vibrant community of 174 00:06:33,520 --> 00:06:38,319 people who write these vql queries for 175 00:06:36,639 --> 00:06:40,160 us so 176 00:06:38,319 --> 00:06:42,000 if you just wanted to know 177 00:06:40,160 --> 00:06:44,160 how to do a particular forensic analysis 178 00:06:42,000 --> 00:06:46,720 or look for a particular 179 00:06:44,160 --> 00:06:48,880 threat then probably there's going to be 180 00:06:46,720 --> 00:06:51,280 someone that has written a vql query 181 00:06:48,880 --> 00:06:53,360 that they would share with the world and 182 00:06:51,280 --> 00:06:55,199 that allows us to kind of 183 00:06:53,360 --> 00:06:57,599 crowdsource these capabilities so we 184 00:06:55,199 -->
00:06:59,280 have on our website let me just 185 00:06:57,599 --> 00:07:00,880 quickly point out 186 00:06:59,280 --> 00:07:03,039 uh so this is our 187 00:07:00,880 --> 00:07:04,479 website docs.velociraptor.app 188 00:07:03,039 --> 00:07:06,479 there's going to be links at the you 189 00:07:04,479 --> 00:07:09,680 know at the end to it but we have this 190 00:07:06,479 --> 00:07:12,560 thing called uh the artifact exchange 191 00:07:09,680 --> 00:07:13,840 and this is where people share 192 00:07:12,560 --> 00:07:15,039 their different artifacts so you can see 193 00:07:13,840 --> 00:07:16,639 there's a whole bunch of different 194 00:07:15,039 --> 00:07:18,800 artifacts here 195 00:07:16,639 --> 00:07:20,080 um you know for example 196 00:07:18,800 --> 00:07:22,880 log4j 197 00:07:20,080 --> 00:07:24,160 uh detection you know someone has 198 00:07:22,880 --> 00:07:25,520 contributed a 199 00:07:24,160 --> 00:07:28,560 log4j 200 00:07:25,520 --> 00:07:31,759 artifact and this is the vql that runs 201 00:07:28,560 --> 00:07:33,919 so we can simply share those uh easily 202 00:07:31,759 --> 00:07:37,199 so let me just show you quickly 203 00:07:33,919 --> 00:07:39,120 uh in velociraptor what we call artifacts are 204 00:07:37,199 --> 00:07:41,520 those vql libraries 205 00:07:39,120 --> 00:07:44,000 that contain those queries and so these 206 00:07:41,520 --> 00:07:45,919 are the ones that come you know uh built 207 00:07:44,000 --> 00:07:47,440 in and you can see that you know this is 208 00:07:45,919 --> 00:07:49,520 the query here 209 00:07:47,440 --> 00:07:51,919 and these are all built in 210 00:07:49,520 --> 00:07:55,680 right and we can actually 211 00:07:51,919 --> 00:07:59,039 leverage that uh artifact exchange to 212 00:07:55,680 --> 00:08:01,520 obtain all of those community sourced 213 00:07:59,039 --> 00:08:03,120 artifacts so these are built-in and i 214 00:08:01,520 --> 00:08:04,800 can simply 215 00:08:03,120 --> 00:08:06,800 import 216
00:08:04,800 --> 00:08:08,879 those artifacts 217 00:08:06,800 --> 00:08:13,520 so i just 218 00:08:08,879 --> 00:08:14,639 choose to run a server collection uh and 219 00:08:13,520 --> 00:08:16,000 search for 220 00:08:14,639 --> 00:08:17,440 import 221 00:08:16,000 --> 00:08:18,560 in the artifact 222 00:08:17,440 --> 00:08:20,080 and 223 00:08:18,560 --> 00:08:22,240 select this 224 00:08:20,080 --> 00:08:24,879 uh this artifact so it's it's like 225 00:08:22,240 --> 00:08:27,280 there's a built-in artifact that a 226 00:08:24,879 --> 00:08:29,599 built-in query that populates the server 227 00:08:27,280 --> 00:08:31,520 with the community queries basically so 228 00:08:29,599 --> 00:08:32,479 when we when we collect that from the 229 00:08:31,520 --> 00:08:36,000 server 230 00:08:32,479 --> 00:08:38,880 um then it will go off and fetch 231 00:08:36,000 --> 00:08:41,120 uh you know and fetch the um 232 00:08:38,880 --> 00:08:42,959 all the other artifacts and insert them 233 00:08:41,120 --> 00:08:44,800 into the server we're going to use those 234 00:08:42,959 --> 00:08:46,080 today so that's why i need to do that 235 00:08:44,800 --> 00:08:47,600 first 236 00:08:46,080 --> 00:08:49,839 and you can see that 237 00:08:47,600 --> 00:08:52,000 uh now when i look at all of our so 238 00:08:49,839 --> 00:08:54,560 these are like our artifacts which are 239 00:08:52,000 --> 00:08:56,320 saved queries essentially there are the 240 00:08:54,560 --> 00:08:58,000 built-in ones from before but then there 241 00:08:56,320 --> 00:08:59,360 are ones with the 242 00:08:58,000 --> 00:09:01,200 little user icon these are the 243 00:08:59,360 --> 00:09:03,600 contributed artifacts that came from the 244 00:09:01,200 --> 00:09:04,880 artifact exchange so we can see these 245 00:09:03,600 --> 00:09:07,120 are all the ones and we're going to use 246 00:09:04,880 --> 00:09:09,920 some of those today so so now we have 247 00:09:07,120 --> 00:09:11,600 those loaded so we can use them 
248 00:09:09,920 --> 00:09:13,839 um so let's 249 00:09:11,600 --> 00:09:17,839 uh so the artifact exchange again is a 250 00:09:13,839 --> 00:09:21,120 place for exchanging uh your uh these 251 00:09:17,839 --> 00:09:24,160 community contributed artifacts queries 252 00:09:21,120 --> 00:09:26,320 uh and we just imported it from the um 253 00:09:24,160 --> 00:09:28,240 artifact exchange by just going to new 254 00:09:26,320 --> 00:09:29,040 collection from the server 255 00:09:28,240 --> 00:09:30,720 so 256 00:09:29,040 --> 00:09:32,480 um so let's have a look at some actual 257 00:09:30,720 --> 00:09:34,480 examples like how can we 258 00:09:32,480 --> 00:09:35,200 actually use this vql 259 00:09:34,480 --> 00:09:38,800 to 260 00:09:35,200 --> 00:09:41,279 um to create some actual 261 00:09:38,800 --> 00:09:42,240 um something useful right 262 00:09:41,279 --> 00:09:44,399 so 263 00:09:42,240 --> 00:09:47,040 let's uh and i'm gonna go through a bit 264 00:09:44,399 --> 00:09:48,080 of the process of creating your own 265 00:09:47,040 --> 00:09:50,399 content 266 00:09:48,080 --> 00:09:51,440 to try and give you guys the idea 267 00:09:50,399 --> 00:09:54,320 of 268 00:09:51,440 --> 00:09:56,399 uh how you can use vql creatively to 269 00:09:54,320 --> 00:09:59,360 make some new content to make new 270 00:09:56,399 --> 00:10:00,480 detections and new ideas 271 00:09:59,360 --> 00:10:01,839 so 272 00:10:00,480 --> 00:10:02,640 the first example we're going to look 273 00:10:01,839 --> 00:10:05,040 at 274 00:10:02,640 --> 00:10:07,600 is detecting ssh login events and 275 00:10:05,040 --> 00:10:09,600 because you know linux typically 276 00:10:07,600 --> 00:10:13,200 uh a lot of the investigations that you 277 00:10:09,600 --> 00:10:14,560 know we do on linux are around ssh 278 00:10:13,200 --> 00:10:16,959 compromise 279 00:10:14,560 --> 00:10:19,600 lateral movement happens by 280 00:10:16,959 --> 00:10:21,440 compromising ssh keys 281
00:10:19,600 --> 00:10:24,320 and you know and then sometimes we have 282 00:10:21,440 --> 00:10:25,200 to go through and recover um 283 00:10:24,320 --> 00:10:27,200 you know 284 00:10:25,200 --> 00:10:29,200 who logged into this machine where 285 00:10:27,200 --> 00:10:32,399 did they come from these kind of things 286 00:10:29,200 --> 00:10:35,279 so ssh is a big part of linux 287 00:10:32,399 --> 00:10:36,320 investigations not the only part but 288 00:10:35,279 --> 00:10:38,959 we're going to look at 289 00:10:36,320 --> 00:10:40,640 that as an example today 290 00:10:38,959 --> 00:10:41,680 so we're going to look at 291 00:10:40,640 --> 00:10:45,440 how do we 292 00:10:41,680 --> 00:10:47,760 leverage ssh logs to try and understand 293 00:10:45,440 --> 00:10:49,920 how this kind of attack chain 294 00:10:47,760 --> 00:10:52,640 occurs 295 00:10:49,920 --> 00:10:55,600 so let's take a look at 296 00:10:52,640 --> 00:10:58,399 what does an ssh log look like and 297 00:10:55,600 --> 00:11:01,040 you've all seen i'm sure 298 00:10:58,399 --> 00:11:02,640 ssh logs uh typically they are logged 299 00:11:01,040 --> 00:11:05,440 through syslog 300 00:11:02,640 --> 00:11:07,920 uh and there's a file in syslog 301 00:11:05,440 --> 00:11:09,200 /var/log/auth.log and it contains or on 302 00:11:07,920 --> 00:11:12,240 different systems it's in a different 303 00:11:09,200 --> 00:11:15,519 location perhaps uh but essentially 304 00:11:12,240 --> 00:11:17,680 syslog is uh is the de facto the default 305 00:11:15,519 --> 00:11:20,240 logging system on linux so i think 306 00:11:17,680 --> 00:11:22,399 pretty much all linux systems use syslog 307 00:11:20,240 --> 00:11:24,160 but syslog is not 308 00:11:22,399 --> 00:11:26,560 uh especially 309 00:11:24,160 --> 00:11:28,640 easy to work with the 310 00:11:26,560 --> 00:11:31,200 the difficulty with syslog is that it 311 00:11:28,640 --> 00:11:33,440 consists of line based unstructured 312 00:11:31,200
--> 00:11:35,200 logs so it's essentially just 313 00:11:33,440 --> 00:11:37,920 you know like a print you know statement 314 00:11:35,200 --> 00:11:40,720 essentially you're printing a a line 315 00:11:37,920 --> 00:11:44,320 and that means something right but from 316 00:11:40,720 --> 00:11:46,720 a um a dfir perspective or you know an 317 00:11:44,320 --> 00:11:49,120 investigation of forensics it's it's 318 00:11:46,720 --> 00:11:52,639 unstructured so it's very hard to 319 00:11:49,120 --> 00:11:54,720 to uh to associate it with anything you 320 00:11:52,639 --> 00:11:56,720 know like to make queries on it because 321 00:11:54,720 --> 00:11:58,320 it's uh it's unstructured 322 00:11:56,720 --> 00:12:00,639 so typically 323 00:11:58,320 --> 00:12:01,519 this is what it looks like uh this is a 324 00:12:00,639 --> 00:12:03,360 line 325 00:12:01,519 --> 00:12:05,680 and it has all the key pieces of 326 00:12:03,360 --> 00:12:07,120 information in it that we want but 327 00:12:05,680 --> 00:12:09,120 they're kind of like all over the place 328 00:12:07,120 --> 00:12:10,880 right so it has the date and as you can 329 00:12:09,120 --> 00:12:12,480 see in syslog even it doesn't have the 330 00:12:10,880 --> 00:12:13,920 year which is 331 00:12:12,480 --> 00:12:15,920 terrible 332 00:12:13,920 --> 00:12:18,399 and then it has the host name it has the 333 00:12:15,920 --> 00:12:20,959 servers the service and then and then it 334 00:12:18,399 --> 00:12:22,720 has some key pieces of information like 335 00:12:20,959 --> 00:12:24,639 whether the key was accepted the 336 00:12:22,720 --> 00:12:27,120 connection was accepted or rejected so 337 00:12:24,639 --> 00:12:28,480 we have the word accepted here 338 00:12:27,120 --> 00:12:30,800 and then we have 339 00:12:28,480 --> 00:12:32,639 what kind of authentication it was from 340 00:12:30,800 --> 00:12:34,720 here and then who's the user 341 00:12:32,639 --> 00:12:37,519 and ip addresses and so on 342 00:12:34,720 --> 
00:12:39,760 and this is really bad this is really 343 00:12:37,519 --> 00:12:42,079 hard to 344 00:12:39,760 --> 00:12:43,920 query against right so when we do an 345 00:12:42,079 --> 00:12:47,040 investigation usually what we need to do 346 00:12:43,920 --> 00:12:49,360 is convert these unstructured 347 00:12:47,040 --> 00:12:52,000 you know essentially text soup i would 348 00:12:49,360 --> 00:12:54,399 say into structured logs that we can 349 00:12:52,000 --> 00:12:57,120 query you know in a proper way and 350 00:12:54,399 --> 00:12:58,639 usually the way this works um well you 351 00:12:57,120 --> 00:13:00,560 know i mean you can write like regular 352 00:12:58,639 --> 00:13:02,800 expressions to try and find little bits 353 00:13:00,560 --> 00:13:03,920 and pieces from that and you know 354 00:13:02,800 --> 00:13:06,160 essentially 355 00:13:03,920 --> 00:13:08,240 the way uh that the industry has kind of 356 00:13:06,160 --> 00:13:09,760 settled on solving this problem is using 357 00:13:08,240 --> 00:13:12,959 something called grok 358 00:13:09,760 --> 00:13:14,880 uh grok is just like a way of 359 00:13:12,959 --> 00:13:17,519 expressing very complicated regular 360 00:13:14,880 --> 00:13:20,320 expressions in a little bit simpler way 361 00:13:17,519 --> 00:13:22,240 so these end up essentially being very 362 00:13:20,320 --> 00:13:24,000 large regular expressions still 363 00:13:22,240 --> 00:13:26,000 and you're kind of matching that against 364 00:13:24,000 --> 00:13:28,320 what the log is supposed to look like and 365 00:13:26,000 --> 00:13:29,279 sometimes it sort of works 366 00:13:28,320 --> 00:13:31,519 so 367 00:13:29,279 --> 00:13:32,839 that's kind of i guess that's the 368 00:13:31,519 --> 00:13:35,760 state 369 00:13:32,839 --> 00:13:38,560 of that's the state of 370 00:13:35,760 --> 00:13:40,560 of logging on linux is not great so um 371 00:13:38,560 --> 00:13:43,279 so this is the best we can do so let's 372 00:13:40,560 -->
00:13:44,079 just have a look at how we can use vql 373 00:13:43,279 --> 00:13:46,000 to 374 00:13:44,079 --> 00:13:47,839 get some structured information from 375 00:13:46,000 --> 00:13:51,120 these syslogs and i'm going to show you 376 00:13:47,839 --> 00:13:52,959 how to quickly write a vql query so the 377 00:13:51,120 --> 00:13:54,959 first thing that we do is we have this 378 00:13:52,959 --> 00:13:56,480 thing called a notebook and a notebook 379 00:13:54,959 --> 00:13:59,120 is like something 380 00:13:56,480 --> 00:14:01,040 that we can use to build up uh to build 381 00:13:59,120 --> 00:14:02,079 vql and run it interactively sort of 382 00:14:01,040 --> 00:14:04,800 like 383 00:14:02,079 --> 00:14:06,880 if you've ever used a jupyter notebook 384 00:14:04,800 --> 00:14:08,320 so it's sort of similar to that so i'm 385 00:14:06,880 --> 00:14:09,920 going to open this notebook here that 386 00:14:08,320 --> 00:14:10,959 i've prepared earlier just for the sake 387 00:14:09,920 --> 00:14:13,120 of time 388 00:14:10,959 --> 00:14:15,839 uh and go through the example of 389 00:14:13,120 --> 00:14:17,600 parsing ssh logs and so i'm going to 390 00:14:15,839 --> 00:14:20,320 give some i'm going to talk about some 391 00:14:17,600 --> 00:14:23,199 of the vql and point out how it's used 392 00:14:20,320 --> 00:14:25,199 to parse these logs so just for uh to 393 00:14:23,199 --> 00:14:27,199 get better real estate on the screen i'm 394 00:14:25,199 --> 00:14:29,279 just going to change it into full screen 395 00:14:27,199 --> 00:14:31,040 so it's a little bit easier to see so a 396 00:14:29,279 --> 00:14:32,480 notebook consists of 397 00:14:31,040 --> 00:14:34,399 these are called cells and they're kind 398 00:14:32,480 --> 00:14:37,120 of invisible initially but if you click 399 00:14:34,399 --> 00:14:39,040 on them then you know they become obvious 400 00:14:37,120 --> 00:14:41,040 and then we can edit each cell so each 401 00:14:39,040 --> 00:14:43,279 cell is
like it's kind of quick it has a 402 00:14:41,040 --> 00:14:46,320 query and then it runs that query so you 403 00:14:43,279 --> 00:14:47,680 can see here this is the vql query 404 00:14:46,320 --> 00:14:49,600 uh here 405 00:14:47,680 --> 00:14:52,240 and so the first query we're just going 406 00:14:49,600 --> 00:14:53,440 to grab the files out of the auth logs 407 00:14:52,240 --> 00:14:56,160 right so 408 00:14:53,440 --> 00:14:58,240 sorry grab the lines out of the auth log so 409 00:14:56,160 --> 00:15:00,720 you know as i said syslog is just a line 410 00:14:58,240 --> 00:15:02,639 based format so it's just they're just 411 00:15:00,720 --> 00:15:04,560 straightforward lines and so you can see 412 00:15:02,639 --> 00:15:06,000 that this query what it does 413 00:15:04,560 --> 00:15:07,279 uh there is this thing called a plugin 414 00:15:06,000 --> 00:15:08,399 called parse 415 00:15:07,279 --> 00:15:10,480 uh 416 00:15:08,399 --> 00:15:12,560 uh parse_lines and then parse_lines 417 00:15:10,480 --> 00:15:14,320 basically grabs each line and puts it 418 00:15:12,560 --> 00:15:17,040 out into a variable called line so this 419 00:15:14,320 --> 00:15:18,720 is a query and as a 420 00:15:17,040 --> 00:15:19,920 because it's a query it returns a series 421 00:15:18,720 --> 00:15:22,320 of rows 422 00:15:19,920 --> 00:15:24,320 and columns right so queries 423 00:15:22,320 --> 00:15:26,720 always return rows and columns and so we 424 00:15:24,320 --> 00:15:28,240 can have this is actually a 425 00:15:26,720 --> 00:15:30,639 whole bunch of rows 426 00:15:28,240 --> 00:15:31,680 and that's the column called line right 427 00:15:30,639 --> 00:15:34,000 so 428 00:15:31,680 --> 00:15:36,240 so this is how we would uh now in vql 429 00:15:34,000 --> 00:15:37,600 in here uh you know we can use 430 00:15:36,240 --> 00:15:39,759 command line completion and things like 431 00:15:37,600 --> 00:15:42,240 that so we can see like what parameters 432 00:15:39,759 -->
00:15:44,320 does you know this plugin use 433 00:15:42,240 --> 00:15:45,680 uh you know or we could do like you know 434 00:15:44,320 --> 00:15:47,680 select 435 00:15:45,680 --> 00:15:48,880 star from 436 00:15:47,680 --> 00:15:50,880 and then we can see these are all the 437 00:15:48,880 --> 00:15:51,759 plugins that we could use 438 00:15:50,880 --> 00:15:54,160 um 439 00:15:51,759 --> 00:15:57,040 you know parse and we can search for it 440 00:15:54,160 --> 00:15:59,040 so this is the uh preferred interface 441 00:15:57,040 --> 00:16:01,040 to write your query because it really 442 00:15:59,040 --> 00:16:02,800 helps you with writing the query once 443 00:16:01,040 --> 00:16:04,959 you write the query and you click save 444 00:16:02,800 --> 00:16:06,800 then it recalculates it so in this 445 00:16:04,959 --> 00:16:08,560 particular case we are pulling 50 lines 446 00:16:06,800 --> 00:16:10,240 out of the auth log so that's the first 447 00:16:08,560 --> 00:16:11,759 step is to just get the lines out but 448 00:16:10,240 --> 00:16:14,079 again they're not structured at this 449 00:16:11,759 --> 00:16:16,240 point so what i want to do is i want to 450 00:16:14,079 --> 00:16:18,720 convert them into something structured 451 00:16:16,240 --> 00:16:20,560 and i use the grok expression that's the 452 00:16:18,720 --> 00:16:22,720 big expression that you saw before and 453 00:16:20,560 --> 00:16:23,600 these things are available on the 454 00:16:22,720 --> 00:16:25,440 net 455 00:16:23,600 --> 00:16:26,880 um and there's libraries of them so it's 456 00:16:25,440 --> 00:16:30,320 not like you have to come up with them 457 00:16:26,880 --> 00:16:32,480 yourself but it basically expands into 458 00:16:30,320 --> 00:16:34,480 a big regular expression that matches 459 00:16:32,480 --> 00:16:37,440 that line like i mentioned before and it 460 00:16:34,480 --> 00:16:38,800 converts it into a structured uh format 461 00:16:37,440 --> 00:16:40,959 and you can
see that's that's the 462 00:16:38,800 --> 00:16:43,600 structured format it's like it creates 463 00:16:40,959 --> 00:16:45,759 uh uh the whole thing basically uh 464 00:16:43,600 --> 00:16:47,759 splits into a dictionary and then you 465 00:16:45,759 --> 00:16:49,600 know it has these different fields so it 466 00:16:47,759 --> 00:16:51,680 pulls out specific things and you'll see 467 00:16:49,600 --> 00:16:54,320 that the the sad thing is that the time 468 00:16:51,680 --> 00:16:56,480 stamp again is has no year in it so it's 469 00:16:54,320 --> 00:16:58,320 like it's not easy to parse 470 00:16:56,480 --> 00:16:59,759 um but you know 471 00:16:58,320 --> 00:17:02,079 we've got all the key pieces of 472 00:16:59,759 --> 00:17:03,920 information whether it was accepted you 473 00:17:02,079 --> 00:17:06,959 know what kind of thing it was public 474 00:17:03,920 --> 00:17:09,439 key private key etc program etc so so we 475 00:17:06,959 --> 00:17:11,280 can use that to essentially pull out 476 00:17:09,439 --> 00:17:12,480 these structured information so let me 477 00:17:11,280 --> 00:17:15,360 just um 478 00:17:12,480 --> 00:17:16,799 so now once once we have that we want to 479 00:17:15,360 --> 00:17:18,319 actually create something called an 480 00:17:16,799 --> 00:17:19,760 artifact because we don't want people to 481 00:17:18,319 --> 00:17:22,079 have to type 482 00:17:19,760 --> 00:17:24,000 all of this vql into the gui each time 483 00:17:22,079 --> 00:17:26,480 right it's kind of a pain and error 484 00:17:24,000 --> 00:17:28,319 prone so what we want to do is have have 485 00:17:26,480 --> 00:17:29,919 it somehow encapsulate so we can publish 486 00:17:28,319 --> 00:17:31,440 it in an artifact 487 00:17:29,919 --> 00:17:33,360 so luckily 488 00:17:31,440 --> 00:17:36,160 uh that let me just get out of full 489 00:17:33,360 --> 00:17:37,039 screen mode and go back to our 490 00:17:36,160 --> 00:17:39,520 uh 491 00:17:37,039 --> 00:17:43,520 artifact library 
here and if i search 492 00:17:39,520 --> 00:17:45,919 for ssh then luckily uh oh no this is 493 00:17:43,520 --> 00:17:47,760 this is this uh there is a built-in one 494 00:17:45,919 --> 00:17:48,720 which is actually exactly the same as 495 00:17:47,760 --> 00:17:49,520 before 496 00:17:48,720 --> 00:17:51,600 uh 497 00:17:49,520 --> 00:17:53,760 it's just now it's just kind of like 498 00:17:51,600 --> 00:17:56,000 encapsulated inside of this thing called 499 00:17:53,760 --> 00:17:57,600 artifact which is we can just use 500 00:17:56,000 --> 00:18:00,320 so we don't need to type any of these 501 00:17:57,600 --> 00:18:02,400 queries in we could just use them and 502 00:18:00,320 --> 00:18:04,400 you can edit it and customize it you 503 00:18:02,400 --> 00:18:05,280 know so this is the query that we've had 504 00:18:04,400 --> 00:18:06,799 before it's a little bit more 505 00:18:05,280 --> 00:18:08,480 complicated now because it's going to 506 00:18:06,799 --> 00:18:10,000 look for different files in different 507 00:18:08,480 --> 00:18:12,480 places because it could be a number of 508 00:18:10,000 --> 00:18:15,120 auth logs and it could be zipped up and 509 00:18:12,480 --> 00:18:16,240 etc right so but you know this is a very 510 00:18:15,120 --> 00:18:18,640 simple thing 511 00:18:16,240 --> 00:18:20,320 um and uh let's just uh let's just find 512 00:18:18,640 --> 00:18:24,080 my favorite machine 513 00:18:20,320 --> 00:18:25,919 uh let's uh pick up uh this one 514 00:18:24,080 --> 00:18:28,640 one of my recent hosts 515 00:18:25,919 --> 00:18:30,640 and uh and this is this is uh i've got a 516 00:18:28,640 --> 00:18:32,720 tag on it called mike right so i've got 517 00:18:30,640 --> 00:18:34,480 a label on that machine so i can go to 518 00:18:32,720 --> 00:18:35,919 it straight away quickly 519 00:18:34,480 --> 00:18:37,840 uh and let's have a look at all the 520 00:18:35,919 --> 00:18:39,919 artifacts that we've collected before so 521 00:18:37,840 --> 
00:18:42,559 i've collected some other ones before 522 00:18:39,919 --> 00:18:45,200 right for example i grabbed like uh 523 00:18:42,559 --> 00:18:46,960 different files and so on uh but let me 524 00:18:45,200 --> 00:18:49,679 just add uh 525 00:18:46,960 --> 00:18:50,640 let me just search for this ssh 526 00:18:49,679 --> 00:18:53,280 login 527 00:18:50,640 --> 00:18:56,559 okay so uh in this case what i want to 528 00:18:53,280 --> 00:18:58,480 do is search for that uh ssh login again 529 00:18:56,559 --> 00:19:00,320 it just it just goes through and it 530 00:18:58,480 --> 00:19:02,960 takes parameters here 531 00:19:00,320 --> 00:19:04,720 uh so this is just the defaults 532 00:19:02,960 --> 00:19:07,039 so then the next step i'll configure the 533 00:19:04,720 --> 00:19:09,120 parameters for this artifact 534 00:19:07,039 --> 00:19:10,640 uh and you'll notice that i mean the the 535 00:19:09,120 --> 00:19:12,160 vql is in there but i don't really 536 00:19:10,640 --> 00:19:13,520 need to know anything about it so i 537 00:19:12,160 --> 00:19:15,919 don't need to really parse it or 538 00:19:13,520 --> 00:19:17,919 anything um i've got some defaults that 539 00:19:15,919 --> 00:19:19,760 i can change like maybe if my logs are 540 00:19:17,919 --> 00:19:21,600 in a different place i can look for them 541 00:19:19,760 --> 00:19:23,120 uh and this is the grok expression that 542 00:19:21,600 --> 00:19:25,919 i can maybe tweak a little bit maybe 543 00:19:23,120 --> 00:19:27,600 it's a non-conventional version of ssh 544 00:19:25,919 --> 00:19:29,120 and the logs are a little bit different 545 00:19:27,600 --> 00:19:31,520 yeah that happens 546 00:19:29,120 --> 00:19:34,000 um but anyway the defaults are usually 547 00:19:31,520 --> 00:19:35,840 fine we'll just launch it and uh and go 548 00:19:34,000 --> 00:19:37,919 off and collect it and you'll see that 549 00:19:35,840 --> 00:19:40,240 it's you know it's finished in 0.15 550 00:19:37,919 --> 
seconds it just got essentially as soon 551 00:19:40,240 --> 00:19:44,160 as i tasked this endpoint it went off 552 00:19:42,320 --> 00:19:46,080 and collected that thing and parsed it 553 00:19:44,160 --> 00:19:48,640 right and then if i look at my results 554 00:19:46,080 --> 00:19:50,080 then i've got all the ssh logins all the 555 00:19:48,640 --> 00:19:52,720 ssh logs 556 00:19:50,080 --> 00:19:54,640 um and you can see clearly that this is 557 00:19:52,720 --> 00:19:55,919 a problematic machine right right away 558 00:19:54,640 --> 00:19:59,120 why because we're seeing all these 559 00:19:55,919 --> 00:20:01,440 failed password logins so somehow this 560 00:19:59,120 --> 00:20:02,960 machine is getting uh if you've ever run 561 00:20:01,440 --> 00:20:04,159 you know linux machines on the internet 562 00:20:02,960 --> 00:20:07,280 of course they're going to get brute 563 00:20:04,159 --> 00:20:09,039 forced all the time so these failed 564 00:20:07,280 --> 00:20:11,120 passwords you know you're going to see 565 00:20:09,039 --> 00:20:12,559 them a lot if your machine is available 566 00:20:11,120 --> 00:20:14,480 on the internet 567 00:20:12,559 --> 00:20:16,720 you'll see the accepted public key which 568 00:20:14,480 --> 00:20:17,520 is the legitimate users using keys and 569 00:20:16,720 --> 00:20:19,919 that's 570 00:20:17,520 --> 00:20:21,200 prob that's fine probably 571 00:20:19,919 --> 00:20:23,120 but then you have a whole bunch of 572 00:20:21,200 --> 00:20:24,960 passwords here 573 00:20:23,120 --> 00:20:27,039 but what would be really bad what would 574 00:20:24,960 --> 00:20:29,120 be really bad is if this machine had a 575 00:20:27,039 --> 00:20:31,440 successful attempt with a password 576 00:20:29,120 --> 00:20:33,520 because that is not good right like so 577 00:20:31,440 --> 00:20:35,120 normally you're supposed to use keys and 578 00:20:33,520 --> 00:20:38,480 if someone's brute forcing the password 579 00:20:35,120 --> 00:20:40,799 and got in
then you know then that's not 580 00:20:38,480 --> 00:20:42,640 good right so what we can do is we can 581 00:20:40,799 --> 00:20:44,320 do this thing called post processing of 582 00:20:42,640 --> 00:20:46,559 the data so we've collected all the data 583 00:20:44,320 --> 00:20:49,039 with the artifact from this machine 584 00:20:46,559 --> 00:20:51,360 and i can open up a notebook just to 585 00:20:49,039 --> 00:20:53,120 post-process that one collection 586 00:20:51,360 --> 00:20:54,640 and uh and it's the same thing but what 587 00:20:53,120 --> 00:20:56,799 i'm going to do is i'm going to change 588 00:20:54,640 --> 00:20:58,720 this query and i'm just going to add 589 00:20:56,799 --> 00:21:00,720 conditions so it returns all these rows 590 00:20:58,720 --> 00:21:02,480 but i just want to see the rows 591 00:21:00,720 --> 00:21:04,240 where the result 592 00:21:02,480 --> 00:21:05,760 right matches 593 00:21:04,240 --> 00:21:08,480 accepted 594 00:21:05,760 --> 00:21:11,520 all right because so that uh equal tilde 595 00:21:08,480 --> 00:21:13,120 is the regular expression match operator 596 00:21:11,520 --> 00:21:14,559 uh and 597 00:21:13,120 --> 00:21:15,679 uh method 598 00:21:14,559 --> 00:21:19,039 matches 599 00:21:15,679 --> 00:21:20,720 password okay so if someone got in with 600 00:21:19,039 --> 00:21:22,000 a password you know that would be super 601 00:21:20,720 --> 00:21:24,240 bad right 602 00:21:22,000 --> 00:21:26,240 and so immediately that thing pops up to 603 00:21:24,240 --> 00:21:28,159 me it's like hey that is not cool right 604 00:21:26,240 --> 00:21:30,559 someone used the password now it could 605 00:21:28,159 --> 00:21:33,840 well be configured to do that maybe it's 606 00:21:30,559 --> 00:21:35,280 okay but usually um that requires 607 00:21:33,840 --> 00:21:36,640 further in 608 00:21:35,280 --> 00:21:37,360 inspection 609 00:21:36,640 --> 00:21:40,000 so 610 00:21:37,360 --> 00:21:42,240 so let's just go back to the slides 611 
00:21:40,000 --> 00:21:46,240 and recap so we don't go too far ahead 612 00:21:42,240 --> 00:21:49,280 of the of the slides so we used vql we 613 00:21:46,240 --> 00:21:50,880 could use vql to parse each line out of 614 00:21:49,280 --> 00:21:54,159 the file 615 00:21:50,880 --> 00:21:58,240 and then we applied a grok expression to 616 00:21:54,159 --> 00:22:01,360 create a structure out of the 617 00:21:58,240 --> 00:22:02,960 text soup of the syslog right and then 618 00:22:01,360 --> 00:22:05,280 um and then we 619 00:22:02,960 --> 00:22:07,840 wrapped it in something called an 620 00:22:05,280 --> 00:22:11,360 artifact which basically is a yaml file 621 00:22:07,840 --> 00:22:14,559 with metadata so it has a name and it 622 00:22:11,360 --> 00:22:16,640 has parameters that are declared as part 623 00:22:14,559 --> 00:22:20,159 of the the artifact 624 00:22:16,640 --> 00:22:22,559 and you can see that here if i um 625 00:22:20,159 --> 00:22:22,559 simply 626 00:22:22,960 --> 00:22:27,200 find it again so this it's built in 627 00:22:24,640 --> 00:22:29,200 right but in this case but um 628 00:22:27,200 --> 00:22:31,440 you can click edit and then you can see 629 00:22:29,200 --> 00:22:33,600 this is what an artifact looks like 630 00:22:31,440 --> 00:22:36,000 right it has different parts the name 631 00:22:33,600 --> 00:22:37,919 description references and then it has 632 00:22:36,000 --> 00:22:39,600 this parameters section and those are 633 00:22:37,919 --> 00:22:42,400 the things that we can change you know 634 00:22:39,600 --> 00:22:44,159 when we run it so essentially that query 635 00:22:42,400 --> 00:22:45,600 you don't need to really kind of i mean 636 00:22:44,159 --> 00:22:47,919 you can look at it right but you don't 637 00:22:45,600 --> 00:22:51,120 need to really type it each time once 638 00:22:47,919 --> 00:22:53,440 that artifact is created then it's just 639 00:22:51,120 --> 00:22:55,440 ready to be used by anyone right 640 00:22:53,440 --> 
00:22:57,039 um and so it can be easily discovered we 641 00:22:55,440 --> 00:23:00,000 just searched for it and ran it and it 642 00:22:57,039 --> 00:23:02,400 was done right so all we did is we can 643 00:23:00,000 --> 00:23:04,960 search for it in our artifact library 644 00:23:02,400 --> 00:23:06,960 which is that um third 645 00:23:04,960 --> 00:23:08,480 thing here 646 00:23:06,960 --> 00:23:10,080 the 647 00:23:08,480 --> 00:23:12,240 the 648 00:23:10,080 --> 00:23:14,240 view artifacts screen right 649 00:23:12,240 --> 00:23:15,919 uh and then uh and then we selected it 650 00:23:14,240 --> 00:23:18,559 we can look at it we can customize it 651 00:23:15,919 --> 00:23:21,039 and we can collect it so we've collected 652 00:23:18,559 --> 00:23:23,280 it on a system this was the system that 653 00:23:21,039 --> 00:23:25,440 we were looking at so the one that shows 654 00:23:23,280 --> 00:23:27,840 up up the top here so you know in this 655 00:23:25,440 --> 00:23:29,520 case it was this machine here 656 00:23:27,840 --> 00:23:31,520 right and then we went to collected 657 00:23:29,520 --> 00:23:32,799 artifacts and we collected that artifact 658 00:23:31,520 --> 00:23:34,640 and you know 659 00:23:32,799 --> 00:23:36,559 um so that's what we're doing with that 660 00:23:34,640 --> 00:23:38,799 machine just specifically 661 00:23:36,559 --> 00:23:41,039 but now uh we would really like to be 662 00:23:38,799 --> 00:23:42,640 able to do it like everywhere like you 663 00:23:41,039 --> 00:23:44,799 know look we have a thousand machines 664 00:23:42,640 --> 00:23:47,360 right so we want to know 665 00:23:44,799 --> 00:23:49,840 you know did anyone you know 666 00:23:47,360 --> 00:23:52,240 brute force a password on any of our you 667 00:23:49,840 --> 00:23:54,559 know machines so going from 668 00:23:52,240 --> 00:23:57,120 investigating one machine to 669 00:23:54,559 --> 00:23:59,039 investigating a thousand machines is easy 670 00:23:57,120 --> 
00:24:01,760 it's called a hunt so we just go and hit 671 00:23:59,039 --> 00:24:02,840 hunt manager create a new hunt 672 00:24:01,760 --> 00:24:06,960 look 673 00:24:02,840 --> 00:24:08,559 for password logins description 674 00:24:06,960 --> 00:24:10,320 uh and this tells me like how many 675 00:24:08,559 --> 00:24:12,240 machines it's expecting that it will 676 00:24:10,320 --> 00:24:14,159 apply to and if i say run everywhere 677 00:24:12,240 --> 00:24:16,720 that's all my deployment 678 00:24:14,159 --> 00:24:18,240 i can match it by label and you know 679 00:24:16,720 --> 00:24:20,240 there's only one machine that has that 680 00:24:18,240 --> 00:24:23,840 label so i can just target it just with 681 00:24:20,240 --> 00:24:25,919 that uh label or if i just match it by 682 00:24:23,840 --> 00:24:28,880 um all my linux machines in this case 683 00:24:25,919 --> 00:24:30,559 they're all linux anyway so so that you 684 00:24:28,880 --> 00:24:33,360 know that's all of them 685 00:24:30,559 --> 00:24:35,520 um and then if i just simply click uh 686 00:24:33,360 --> 00:24:38,159 search for ssh 687 00:24:35,520 --> 00:24:39,840 uh we're gonna do the same thing but on 688 00:24:38,159 --> 00:24:41,279 all our thousand machines and and then 689 00:24:39,840 --> 00:24:42,320 just go right so 690 00:24:41,279 --> 00:24:43,840 uh 691 00:24:42,320 --> 00:24:46,400 when we you know see how it's in the 692 00:24:43,840 --> 00:24:47,919 paused state so we can just start it 693 00:24:46,400 --> 00:24:49,279 all right and you can see that as soon 694 00:24:47,919 --> 00:24:50,320 as i click start it's starting to 695 00:24:49,279 --> 00:24:51,919 schedule it 696 00:24:50,320 --> 00:24:54,159 and it goes off 697 00:24:51,919 --> 00:24:56,559 uh you know scheduling it for all the 698 00:24:54,159 --> 00:24:58,400 machines right 200 300 it's going to go 699 00:24:56,559 --> 00:25:00,240 off and collect the results from every 700 00:24:58,400 --> 00:25:01,840 single one now because
each one of them 701 00:25:00,240 --> 00:25:04,320 is doing it in parallel they are all 702 00:25:01,840 --> 00:25:05,679 coming back pretty quick um and so you 703 00:25:04,320 --> 00:25:08,640 know we can 704 00:25:05,679 --> 00:25:10,720 we can see the um the results as they 705 00:25:08,640 --> 00:25:12,240 come in so let me just uh let me just 706 00:25:10,720 --> 00:25:14,000 see where are we 707 00:25:12,240 --> 00:25:16,000 okay so that's hunting and processing so 708 00:25:14,000 --> 00:25:17,600 the same thing is happening but on 709 00:25:16,000 --> 00:25:19,200 multiple machines 710 00:25:17,600 --> 00:25:21,520 and you can do the post processing on 711 00:25:19,200 --> 00:25:24,159 the hunt as well and 712 00:25:21,520 --> 00:25:26,640 and we've seen that let me just move on 713 00:25:24,159 --> 00:25:28,720 to the next example real quick 714 00:25:26,640 --> 00:25:31,120 um and so in this example we're talking 715 00:25:28,720 --> 00:25:34,720 about unsecured ssh keys so again ssh 716 00:25:31,120 --> 00:25:35,679 is our theme today so let's look at um 717 00:25:34,720 --> 00:25:37,440 how 718 00:25:35,679 --> 00:25:39,679 ssh keys should be protected now we all 719 00:25:37,440 --> 00:25:41,200 know that we need to protect our ssh 720 00:25:39,679 --> 00:25:44,000 keys with at least a passphrase 721 00:25:41,200 --> 00:25:45,600 because if we don't then someone 722 00:25:44,000 --> 00:25:47,520 that can break into that machine they 723 00:25:45,600 --> 00:25:51,039 can just use those ssh keys the 724 00:25:47,520 --> 00:25:53,200 unprotected ones to uh laterally move 725 00:25:51,039 --> 00:25:54,559 from that machine to all the other 726 00:25:53,200 --> 00:25:57,200 machines in the environment right 727 00:25:54,559 --> 00:25:58,720 without having any impediments right so 728 00:25:57,200 --> 00:26:01,919 essentially that key 729 00:25:58,720 --> 00:26:04,159 becomes you know a liability so we need 730 00:26:01,919 --> 00:26:06,240 to uh
protect them with a password but 731 00:26:04,159 --> 00:26:08,799 you know like in many environments for 732 00:26:06,240 --> 00:26:10,960 instance in aws when you get a key pair 733 00:26:08,799 --> 00:26:12,240 they're not encrypted or not protected 734 00:26:10,960 --> 00:26:13,679 and a lot of people just look at them 735 00:26:12,240 --> 00:26:15,200 and they're like okay cool and they use 736 00:26:13,679 --> 00:26:16,960 them but they don't realize they need to 737 00:26:15,200 --> 00:26:19,360 go through that extra step of actually 738 00:26:16,960 --> 00:26:21,520 encrypting them uh and so they have a 739 00:26:19,360 --> 00:26:23,520 lot of these keys lying around the 740 00:26:21,520 --> 00:26:24,880 environment that are not protected so 741 00:26:23,520 --> 00:26:26,400 what we want to do here is we want to 742 00:26:24,880 --> 00:26:27,840 find you know all the keys in the 743 00:26:26,400 --> 00:26:30,400 environment that are not properly 744 00:26:27,840 --> 00:26:33,120 encrypted so let's take a look at how we 745 00:26:30,400 --> 00:26:35,760 can parse these these files this private 746 00:26:33,120 --> 00:26:39,200 key format so let me just show you like 747 00:26:35,760 --> 00:26:42,000 how you come up with this idea this query 748 00:26:39,200 --> 00:26:43,760 for something quite new so again i've 749 00:26:42,000 --> 00:26:45,440 got an example here 750 00:26:43,760 --> 00:26:47,600 of um 751 00:26:45,440 --> 00:26:49,679 of you know a new notebook i'll just 752 00:26:47,600 --> 00:26:52,400 make it full screen again 753 00:26:49,679 --> 00:26:54,080 and we can see that the first query the 754 00:26:52,400 --> 00:26:56,159 first thing i'm going to do is i'm just 755 00:26:54,080 --> 00:26:57,520 going to read that file right this is my 756 00:26:56,159 --> 00:26:59,520 private key 757 00:26:57,520 --> 00:27:02,640 i'm just going to read it and 758 00:26:59,520 --> 00:27:04,720 i have a read_file function in vql and i 759 00:27:02,640 --> 
00:27:07,279 can see that it looks like you know it 760 00:27:04,720 --> 00:27:09,279 has this uh header a private key and 761 00:27:07,279 --> 00:27:12,159 then it has a whole bunch of what looks 762 00:27:09,279 --> 00:27:14,480 to be base64 encoded something and then 763 00:27:12,159 --> 00:27:16,880 there's a tail on the end so that's 764 00:27:14,480 --> 00:27:19,440 that's cool so then clearly the thing in 765 00:27:16,880 --> 00:27:22,720 between the the thing between here and 766 00:27:19,440 --> 00:27:25,600 there looks to be base64 767 00:27:22,720 --> 00:27:27,679 uh encrypted uh encoded right so let's 768 00:27:25,600 --> 00:27:29,440 um let's take a look at 769 00:27:27,679 --> 00:27:31,120 how we decode it 770 00:27:29,440 --> 00:27:32,480 and um 771 00:27:31,120 --> 00:27:34,559 what i'm going to do the first thing is 772 00:27:32,480 --> 00:27:35,919 i'm going to use a regular expression to 773 00:27:34,559 --> 00:27:37,840 pull out 774 00:27:35,919 --> 00:27:40,799 the data between the key 775 00:27:37,840 --> 00:27:43,520 the start the the header and the end 776 00:27:40,799 --> 00:27:46,240 right and that will give me the first 777 00:27:43,520 --> 00:27:48,559 part which is just that base64 bit 778 00:27:46,240 --> 00:27:52,399 right and then in the second part i'm 779 00:27:48,559 --> 00:27:54,320 going to decode it with base64 decoding 780 00:27:52,399 --> 00:27:55,600 and so i can see straight away this is 781 00:27:54,320 --> 00:27:58,399 what you know so there's a whole bunch 782 00:27:55,600 --> 00:28:00,640 of binary data but straight away you can 783 00:27:58,399 --> 00:28:02,640 see that there is something here right 784 00:28:00,640 --> 00:28:05,440 it says open ssh key 785 00:28:02,640 --> 00:28:06,720 and it has none and none and if you look 786 00:28:05,440 --> 00:28:08,240 and there's a whole bunch of information 787 00:28:06,720 --> 00:28:10,880 here that could be useful as well like 788 00:28:08,240 --> 00:28:12,159 inside
the key so like where it was made 789 00:28:10,880 --> 00:28:14,799 and things like that 790 00:28:12,159 --> 00:28:16,880 and you know the the type of key that it 791 00:28:14,799 --> 00:28:20,480 is and so on so there's some information 792 00:28:16,880 --> 00:28:23,039 here that is quite useful but uh but all 793 00:28:20,480 --> 00:28:24,480 this thing is you know binary so now we 794 00:28:23,039 --> 00:28:27,120 have this problem of like okay we have 795 00:28:24,480 --> 00:28:29,520 all this binary data and i can see stuff 796 00:28:27,120 --> 00:28:30,640 in there but i have no idea how to parse 797 00:28:29,520 --> 00:28:31,840 it out 798 00:28:30,640 --> 00:28:33,200 so let's uh 799 00:28:31,840 --> 00:28:35,120 so let's 800 00:28:33,200 --> 00:28:38,640 do some research on google what is this 801 00:28:35,120 --> 00:28:40,960 binary blob uh what is the structure 802 00:28:38,640 --> 00:28:43,520 so we have this there is a someone has 803 00:28:40,960 --> 00:28:46,240 done some research which is great 804 00:28:43,520 --> 00:28:49,279 describing the uh the format so 805 00:28:46,240 --> 00:28:50,720 let's go to that site here 806 00:28:49,279 --> 00:28:52,799 okay so this is just a page on the 807 00:28:50,720 --> 00:28:55,200 internet that explains the format and 808 00:28:52,799 --> 00:28:57,600 you can see here uh the format is a 809 00:28:55,200 --> 00:28:59,919 binary format and there is like there is 810 00:28:57,600 --> 00:29:02,799 a description of you know so there's 811 00:28:59,919 --> 00:29:05,679 like the length and and some um 812 00:29:02,799 --> 00:29:07,840 uh it explains how the binary data is 813 00:29:05,679 --> 00:29:09,520 you know structured right so we can take 814 00:29:07,840 --> 00:29:12,720 this information and we can build a 815 00:29:09,520 --> 00:29:15,840 binary parser to extract this info these 816 00:29:12,720 --> 00:29:17,679 uh fields out of the binary data 817 00:29:15,840 --> 00:29:19,440 so uh like you know you can
write a 818 00:29:17,679 --> 00:29:20,159 python script or something to get it out 819 00:29:19,440 --> 00:29:22,880 but 820 00:29:20,159 --> 00:29:24,000 in vql we actually have a built-in binary 821 00:29:22,880 --> 00:29:25,760 parser 822 00:29:24,000 --> 00:29:28,000 and i just want to quickly show you that 823 00:29:25,760 --> 00:29:29,679 i'm not going to go into details about 824 00:29:28,000 --> 00:29:32,640 the parser 825 00:29:29,679 --> 00:29:34,320 uh but in this private keys one 826 00:29:32,640 --> 00:29:36,159 i'm just going to show you 827 00:29:34,320 --> 00:29:38,159 what the parser looks like so it's kind of 828 00:29:36,159 --> 00:29:41,279 like a descriptive thing there is a 829 00:29:38,159 --> 00:29:43,200 profile that we can use to describe 830 00:29:41,279 --> 00:29:45,360 how each field is laid out so you know 831 00:29:43,200 --> 00:29:47,360 that's that's the header there's a magic 832 00:29:45,360 --> 00:29:50,159 string at offset zero and then the 833 00:29:47,360 --> 00:29:52,960 length of the cipher is at offset 15 and 834 00:29:50,159 --> 00:29:55,279 it's a uint32 big endian 835 00:29:52,960 --> 00:29:56,960 and then the cipher itself which is the 836 00:29:55,279 --> 00:29:59,440 string that describes you know what 837 00:29:56,960 --> 00:30:01,679 cipher is used to encrypt it uh is at 838 00:29:59,440 --> 00:30:04,240 offset 19 and you know the length is 839 00:30:01,679 --> 00:30:06,720 given by you know that other field so so 840 00:30:04,240 --> 00:30:10,000 we can have this so this is this is this 841 00:30:06,720 --> 00:30:12,159 is a description of uh of the binary 842 00:30:10,000 --> 00:30:14,240 format and we can use that to parse the 843 00:30:12,159 --> 00:30:16,320 keys out so again the same thing we're 844 00:30:14,240 --> 00:30:17,919 going to go out to our machine here 845 00:30:16,320 --> 00:30:20,320 and we're going to add another 846 00:30:17,919 --> 00:30:22,399 collection and let's look for our 847 00:30:20,320 --> 
00:30:24,559 private keys 848 00:30:22,399 --> 00:30:25,840 okay so here's our private key on a real 849 00:30:24,559 --> 00:30:27,520 machine 850 00:30:25,840 --> 00:30:29,760 and we're going to collect this artifact 851 00:30:27,520 --> 00:30:31,600 again we don't need to necessarily 852 00:30:29,760 --> 00:30:34,000 understand the vql we just need to know 853 00:30:31,600 --> 00:30:36,720 how to use it so if we go over here we 854 00:30:34,000 --> 00:30:38,799 can change the parameters and this one 855 00:30:36,720 --> 00:30:39,840 basically there's a bit more complexity 856 00:30:38,799 --> 00:30:41,679 in the 857 00:30:39,840 --> 00:30:43,279 uh in this artifact because it can 858 00:30:41,679 --> 00:30:44,640 search for keys everywhere and we want 859 00:30:43,279 --> 00:30:46,880 to make sure that it doesn't go into 860 00:30:44,640 --> 00:30:49,520 proc and you know and then get lost in 861 00:30:46,880 --> 00:30:51,520 there right so uh so we can so there's a 862 00:30:49,520 --> 00:30:53,360 few more functions of functionality than 863 00:30:51,520 --> 00:30:55,520 you know we've just described uh but 864 00:30:53,360 --> 00:30:56,320 basically we go off and and collect this 865 00:30:55,520 --> 00:30:58,559 thing 866 00:30:56,320 --> 00:31:00,399 and it comes back in you know a matter 867 00:30:58,559 --> 00:31:02,720 of seconds and we can say oh look you 868 00:31:00,399 --> 00:31:04,640 know this user has a key and it's 869 00:31:02,720 --> 00:31:06,880 protected so that's great so so this is 870 00:31:04,640 --> 00:31:07,840 good right we checked it and and that's 871 00:31:06,880 --> 00:31:11,840 good 872 00:31:07,840 --> 00:31:14,159 so um but you know maybe that key has 873 00:31:11,840 --> 00:31:16,640 other that user has other keys lying 874 00:31:14,159 --> 00:31:18,480 around right so we only by default 875 00:31:16,640 --> 00:31:21,200 search for the keys if you look at the 876 00:31:18,480 --> 00:31:24,320 parameter uh or the the default 877 
00:31:21,200 --> 00:31:26,880 parameter uh only uses it in slash home 878 00:31:24,320 --> 00:31:28,880 slash ubuntu slash .ssh which is the location 879 00:31:26,880 --> 00:31:32,080 where normally the keys sit right but 880 00:31:28,880 --> 00:31:34,399 let's uh let's um search we copy that 881 00:31:32,080 --> 00:31:35,919 artifact and we we can tell it to search 882 00:31:34,399 --> 00:31:36,720 you know everywhere 883 00:31:35,919 --> 00:31:38,640 so 884 00:31:36,720 --> 00:31:40,320 that's the default search pattern which 885 00:31:38,640 --> 00:31:42,399 is a wildcard 886 00:31:40,320 --> 00:31:44,559 right and we can just make it if if we 887 00:31:42,399 --> 00:31:47,600 do star star that's like a recursive 888 00:31:44,559 --> 00:31:50,320 search for this the system we look for 889 00:31:47,600 --> 00:31:52,480 pem id_rsa or id_dsa these are the three 890 00:31:50,320 --> 00:31:54,640 types of names that we're gonna search 891 00:31:52,480 --> 00:31:57,360 for and uh and you know we go ahead and 892 00:31:54,640 --> 00:31:58,559 we do that and uh and it's gonna take a 893 00:31:57,360 --> 00:31:59,840 little bit longer because it's gonna 894 00:31:58,559 --> 00:32:01,039 search through the whole system so i'm 895 00:31:59,840 --> 00:32:03,440 going to leave it for a couple of 896 00:32:01,039 --> 00:32:06,559 seconds and we'll come back to it later 897 00:32:03,440 --> 00:32:08,320 but you can see that just recapping uh 898 00:32:06,559 --> 00:32:10,720 we read the file we base64 899 00:32:08,320 --> 00:32:12,960 decoded it we noticed some binary data 900 00:32:10,720 --> 00:32:14,960 and we created a parser for it now 901 00:32:12,960 --> 00:32:17,039 because the parser is in vql we don't 902 00:32:14,960 --> 00:32:20,080 really need to rebuild or recompile or 903 00:32:17,039 --> 00:32:22,000 redeploy anything right we just we just 904 00:32:20,080 --> 00:32:24,559 you know write the vql it's descriptive 905 00:32:22,000 --> 00:32:26,880 and it the vql can
go ahead and uh 906 00:32:24,559 --> 00:32:29,279 dissect that data out of the endpoint 907 00:32:26,880 --> 00:32:31,519 right so so then you know we wrote it 908 00:32:29,279 --> 00:32:32,240 into an artifact and then we collected 909 00:32:31,519 --> 00:32:34,320 it 910 00:32:32,240 --> 00:32:36,640 from the artifact uh 911 00:32:34,320 --> 00:32:40,000 um repository here 912 00:32:36,640 --> 00:32:42,320 and uh let me just see if it's finished 913 00:32:40,000 --> 00:32:44,640 yeah it's it's it's only taken uh 16 914 00:32:42,320 --> 00:32:46,640 seconds to go over the file system it's 915 00:32:44,640 --> 00:32:48,480 only a cloud vm so it's quite small it 916 00:32:46,640 --> 00:32:49,519 could take longer on other systems but 917 00:32:48,480 --> 00:32:52,799 we can see 918 00:32:49,519 --> 00:32:55,360 uh that this user has some aws keys 919 00:32:52,799 --> 00:32:56,799 here and they don't have any ciphers so 920 00:32:55,360 --> 00:32:59,519 this immediately because that's the 921 00:32:56,799 --> 00:33:00,880 default way that aws creates those keys 922 00:32:59,519 --> 00:33:02,399 this user didn't 923 00:33:00,880 --> 00:33:03,919 you know go through the extra step of 924 00:33:02,399 --> 00:33:06,960 re-securing their keys after they 925 00:33:03,919 --> 00:33:09,039 downloaded them from the aws console so 926 00:33:06,960 --> 00:33:10,960 this is this is really problematic and 927 00:33:09,039 --> 00:33:12,640 this is a really big deal we can see 928 00:33:10,960 --> 00:33:14,399 lateral movement through these keys all 929 00:33:12,640 --> 00:33:15,120 the time right because people don't do 930 00:33:14,399 --> 00:33:18,399 that 931 00:33:15,120 --> 00:33:20,640 so this is now we can go ahead and uh 932 00:33:18,399 --> 00:33:22,399 and you know tell the user hey you know 933 00:33:20,640 --> 00:33:24,000 you've done the wrong thing let's fix it 934 00:33:22,399 --> 00:33:26,559 but let's just think about what actually 935 00:33:24,000 --> 00:33:28,320 
happened here um is that we could 936 00:33:26,559 --> 00:33:31,279 actually do that we could actually do 937 00:33:28,320 --> 00:33:33,120 that as a hunt on all the systems um 938 00:33:31,279 --> 00:33:34,799 maybe we can do that real quick 939 00:33:33,120 --> 00:33:36,399 um you know 940 00:33:34,799 --> 00:33:39,519 i've shown you how to do that before but 941 00:33:36,399 --> 00:33:41,600 like you know uh search 942 00:33:39,519 --> 00:33:45,360 search for pem 943 00:33:41,600 --> 00:33:47,600 uh and again we do the same thing 944 00:33:45,360 --> 00:33:49,519 for the private key so it's the same 945 00:33:47,600 --> 00:33:51,760 process right but we're just gonna do it 946 00:33:49,519 --> 00:33:53,200 you know everywhere instead of on one 947 00:33:51,760 --> 00:33:55,279 machine 948 00:33:53,200 --> 00:33:57,120 and then go for it and what's going to 949 00:33:55,279 --> 00:33:58,720 happen now is that all of our machines 950 00:33:57,120 --> 00:34:00,320 are going to go like all thousand of 951 00:33:58,720 --> 00:34:02,080 them and it could be more right they're 952 00:34:00,320 --> 00:34:03,919 going to go and search for that on their 953 00:34:02,080 --> 00:34:05,919 own system but because each one is doing 954 00:34:03,919 --> 00:34:08,079 it sort of in parallel then it still 955 00:34:05,919 --> 00:34:10,399 doesn't take very long to uh to do that 956 00:34:08,079 --> 00:34:12,320 so they all come back and we're getting the 957 00:34:10,399 --> 00:34:15,119 the results you know so like just like 958 00:34:12,320 --> 00:34:16,960 before uh it's very calculated so before 959 00:34:15,119 --> 00:34:18,639 we found all the logins right the same 960 00:34:16,960 --> 00:34:20,000 thing so they all came back from all the 961 00:34:18,639 --> 00:34:22,399 machines right then we could still do 962 00:34:20,000 --> 00:34:25,119 the post processing then uh in this case 963 00:34:22,399 --> 00:34:26,800 we uh we're doing the the same thing uh 964 00:34:25,119 --> 
00:34:29,359 see if the results are here yet no 965 00:34:26,800 --> 00:34:31,679 they're still coming um and 966 00:34:29,359 --> 00:34:34,480 uh and then we can we can find that oh 967 00:34:31,679 --> 00:34:36,079 here we go so we've got some data there 968 00:34:34,480 --> 00:34:38,000 and uh 969 00:34:36,079 --> 00:34:40,480 yeah so we can we can then see you know 970 00:34:38,000 --> 00:34:41,919 everybody's you know keys and someone 971 00:34:40,480 --> 00:34:42,720 that that are 972 00:34:41,919 --> 00:34:44,720 uh 973 00:34:42,720 --> 00:34:46,560 this this all these machines thousand 974 00:34:44,720 --> 00:34:47,359 machines are kind of virtual all the 975 00:34:46,560 --> 00:34:48,720 same 976 00:34:47,359 --> 00:34:50,079 machines so we're going to get the same 977 00:34:48,720 --> 00:34:52,800 data but 978 00:34:50,079 --> 00:34:54,320 you get the idea of hunting so this is 979 00:34:52,800 --> 00:34:56,399 cool the other thing that's cool about 980 00:34:54,320 --> 00:34:58,240 it so this is how we created a new hunt 981 00:34:56,399 --> 00:35:00,640 we configured it 982 00:34:58,240 --> 00:35:03,040 and then we ran it now the interesting 983 00:35:00,640 --> 00:35:05,520 thing about it is that we haven't 984 00:35:03,040 --> 00:35:07,520 actually downloaded anyone's keys right 985 00:35:05,520 --> 00:35:09,280 so it's not like we went out 986 00:35:07,520 --> 00:35:11,119 grabbed all the keys and ran a python 987 00:35:09,280 --> 00:35:12,560 script locally to check are they 988 00:35:11,119 --> 00:35:14,240 encrypted because obviously that would 989 00:35:12,560 --> 00:35:15,040 be like really bad right because we 990 00:35:14,240 --> 00:35:17,520 don't 991 00:35:15,040 --> 00:35:20,960 copy everybody's private keys right so 992 00:35:17,520 --> 00:35:22,720 having it done by the end point uh means 993 00:35:20,960 --> 00:35:25,599 means that we can we we don't have to 994 00:35:22,720 --> 00:35:27,599 get the data essentially all right last 995 
00:35:25,599 --> 00:35:30,640 uh last example recovering deleted logs 996 00:35:27,599 --> 00:35:32,240 so we looked at uh how you know the logs 997 00:35:30,640 --> 00:35:34,079 look in syslog 998 00:35:32,240 --> 00:35:37,280 but let's say in a lot of cases you know 999 00:35:34,079 --> 00:35:39,359 people delete the logs or uh the compromise 1000 00:35:37,280 --> 00:35:42,000 happened so long ago that the logs got 1001 00:35:39,359 --> 00:35:43,760 rotated uh maybe a few weeks you know 1002 00:35:42,000 --> 00:35:45,520 before normally it's like four weeks 1003 00:35:43,760 --> 00:35:47,760 after and then they get rotated out 1004 00:35:45,520 --> 00:35:50,560 depending on the rotation policy uh 1005 00:35:47,760 --> 00:35:52,960 those logs can be aggressively rotated 1006 00:35:50,560 --> 00:35:54,720 so in that case we really we really need 1007 00:35:52,960 --> 00:35:56,560 to go back in time and try and find 1008 00:35:54,720 --> 00:35:59,359 forensic evidence of 1009 00:35:56,560 --> 00:36:01,760 these compromises from the logs uh and 1010 00:35:59,359 --> 00:36:03,599 we try to recover deleted logs now if we 1011 00:36:01,760 --> 00:36:05,280 if we are back to that you know this 1012 00:36:03,599 --> 00:36:07,040 is not good right it's not a good 1013 00:36:05,280 --> 00:36:07,839 outcome it's better to have the logs right 1014 00:36:07,040 --> 00:36:09,680 but 1015 00:36:07,839 --> 00:36:11,200 if you are struggling 1016 00:36:09,680 --> 00:36:13,200 and we don't have logs then we can use a 1017 00:36:11,200 --> 00:36:14,400 technique called carving and carving is 1018 00:36:13,200 --> 00:36:16,160 a very simple technique where we 1019 00:36:14,400 --> 00:36:17,760 basically look for patterns in 1020 00:36:16,160 --> 00:36:19,280 unstructured data 1021 00:36:17,760 --> 00:36:21,760 and the idea is that when someone 1022 00:36:19,280 --> 00:36:23,760 deletes those logs then the data is 1023 00:36:21,760 --> 00:36:26,480 still on the disk so we we might be able
1024 00:36:23,760 --> 00:36:29,119 to find it you know from just the disk 1025 00:36:26,480 --> 00:36:31,280 unstructured data so let me just show 1026 00:36:29,119 --> 00:36:32,560 you an example of how that's how we how 1027 00:36:31,280 --> 00:36:34,079 we do that 1028 00:36:32,560 --> 00:36:35,839 so in this 1029 00:36:34,079 --> 00:36:37,680 particular example let me just make it 1030 00:36:35,839 --> 00:36:38,960 full screen again so the idea is to try 1031 00:36:37,680 --> 00:36:41,520 and look for 1032 00:36:38,960 --> 00:36:43,760 uh patterns that look like a syslog 1033 00:36:41,520 --> 00:36:46,320 message right so uh we've seen before 1034 00:36:43,760 --> 00:36:48,240 the syslog starts with uh jan feb 1035 00:36:46,320 --> 00:36:50,960 mar and so on the month name and then it 1036 00:36:48,240 --> 00:36:53,200 has the day of the month and then it 1037 00:36:50,960 --> 00:36:55,760 has you know the the time and then we 1038 00:36:53,200 --> 00:36:57,520 know that you know it's a line so we're 1039 00:36:55,760 --> 00:37:00,320 going to take all the characters until 1040 00:36:57,520 --> 00:37:02,880 the next new line so that's one line so 1041 00:37:00,320 --> 00:37:04,400 when we do this we can so we're going to 1042 00:37:02,880 --> 00:37:07,280 do this query here 1043 00:37:04,400 --> 00:37:09,440 we use a tool called yara which is uh 1044 00:37:07,280 --> 00:37:11,920 used for essentially applying regular 1045 00:37:09,440 --> 00:37:14,880 expressions at scale it's very fast and 1046 00:37:11,920 --> 00:37:16,079 efficient and so on uh and so we can do 1047 00:37:14,880 --> 00:37:17,760 that on 1048 00:37:16,079 --> 00:37:19,680 ideally we want to do it on a device in 1049 00:37:17,760 --> 00:37:21,599 the end but for testing we're just going 1050 00:37:19,680 --> 00:37:22,800 to do it on the real file to find the 1051 00:37:21,599 --> 00:37:25,040 right 1052 00:37:22,800 --> 00:37:27,520 regular expression and so on and so you 1053 00:37:25,040 --> 
00:37:29,359 know we go ahead we we grab that file 1054 00:37:27,520 --> 00:37:30,720 and we run the yara expression on it and 1055 00:37:29,359 --> 00:37:33,280 it's supposed to 1056 00:37:30,720 --> 00:37:35,599 uh hit on all of the 1057 00:37:33,280 --> 00:37:37,839 all of the lines that sort of look like 1058 00:37:35,599 --> 00:37:40,480 that right that sort of look like maybe 1059 00:37:37,839 --> 00:37:42,880 a syslog line right so when when i run 1060 00:37:40,480 --> 00:37:44,720 this the first query so that's this one 1061 00:37:42,880 --> 00:37:47,359 uh you can see that there is you know 1062 00:37:44,720 --> 00:37:49,040 some hex data here and it matches you 1063 00:37:47,359 --> 00:37:51,440 know that kind of pattern right so we 1064 00:37:49,040 --> 00:37:54,880 got the jan 16 and then it goes all the 1065 00:37:51,440 --> 00:37:56,640 way to and again no no year right but um 1066 00:37:54,880 --> 00:37:58,720 you know so this basically pulls out our 1067 00:37:56,640 --> 00:38:01,359 log lines or things that look sort of 1068 00:37:58,720 --> 00:38:04,640 like a log line so you know uh then what 1069 00:38:01,359 --> 00:38:07,680 we're going to do is we're going to uh 1070 00:38:04,640 --> 00:38:10,079 extract the actual hit and look for only 1071 00:38:07,680 --> 00:38:12,240 things that sort of look like maybe ssh 1072 00:38:10,079 --> 00:38:13,760 logins right so it has to have the word 1073 00:38:12,240 --> 00:38:16,000 either accepted or failed that's really 1074 00:38:13,760 --> 00:38:18,240 all we care about in this case and so 1075 00:38:16,000 --> 00:38:20,160 you know that's the second query here 1076 00:38:18,240 --> 00:38:22,720 and so you can see the hit 1077 00:38:20,160 --> 00:38:25,599 is uh essentially 1078 00:38:22,720 --> 00:38:27,839 you know uh the ssh 1079 00:38:25,599 --> 00:38:29,920 keys right so it's accepted a login 1080 00:38:27,839 --> 00:38:31,440 accepted login and so on right so so 1081 00:38:29,920 --> 00:38:33,440 
basically all we do is we write this 1082 00:38:31,440 --> 00:38:34,880 thing and then we just 1083 00:38:33,440 --> 00:38:37,359 and then again we do the same thing with 1084 00:38:34,880 --> 00:38:39,760 grok and all the rest right and so we 1085 00:38:37,359 --> 00:38:41,920 can apply that and get 1086 00:38:39,760 --> 00:38:43,280 uh and then carve out 1087 00:38:41,920 --> 00:38:44,800 uh 1088 00:38:43,280 --> 00:38:46,800 the 1089 00:38:44,800 --> 00:38:48,800 ssh logs that could be deleted so let's 1090 00:38:46,800 --> 00:38:50,800 have a look at this one comes from the 1091 00:38:48,800 --> 00:38:54,240 exchange so again it's a probably 1092 00:38:50,800 --> 00:38:56,560 probably contributed content and again 1093 00:38:54,240 --> 00:38:59,119 this is the query this is the regular 1094 00:38:56,560 --> 00:39:02,240 expression that defines the yara rule 1095 00:38:59,119 --> 00:39:06,000 and then this is the grok expression 1096 00:39:02,240 --> 00:39:07,760 okay so let's go over here and carve it 1097 00:39:06,000 --> 00:39:10,640 now carving takes a long time because 1098 00:39:07,760 --> 00:39:12,720 you are really looking at the raw disk 1099 00:39:10,640 --> 00:39:13,520 right so if we actually look at 1100 00:39:12,720 --> 00:39:15,520 uh 1101 00:39:13,520 --> 00:39:17,359 it's it's it's looking at the raw device 1102 00:39:15,520 --> 00:39:20,400 and then just looking for patterns that 1103 00:39:17,359 --> 00:39:22,640 sort of look like ssh um 1104 00:39:20,400 --> 00:39:24,320 messages right so it could take a while 1105 00:39:22,640 --> 00:39:26,480 to do it's going to scan all the disk 1106 00:39:24,320 --> 00:39:28,400 this is a cloud machine so it's not that 1107 00:39:26,480 --> 00:39:29,680 big but it could take a while so let's 1108 00:39:28,400 --> 00:39:32,400 just leave it 1109 00:39:29,680 --> 00:39:35,119 and we'll come back to that 1110 00:39:32,400 --> 00:39:36,720 okay so just recapping 1111 00:39:35,119 --> 00:39:38,640 and 
then we're going to see some hits 1112 00:39:36,720 --> 00:39:41,200 over here um we're going to see that 1113 00:39:38,640 --> 00:39:43,359 later so in the last five minutes i just 1114 00:39:41,200 --> 00:39:45,920 want to show you guys another cool 1115 00:39:43,359 --> 00:39:48,640 feature in velociraptor which is about 1116 00:39:45,920 --> 00:39:50,160 monitoring events from the endpoints so 1117 00:39:48,640 --> 00:39:52,400 normally when we run a query we talked 1118 00:39:50,160 --> 00:39:54,839 about vql and you can see it's really 1119 00:39:52,400 --> 00:39:58,240 quick and it finishes and gives you 1120 00:39:54,839 --> 00:40:01,040 a table but it doesn't have to finish 1121 00:39:58,240 --> 00:40:03,119 right so the query actually returns data 1122 00:40:01,040 --> 00:40:05,280 as soon as it's available so that means 1123 00:40:03,119 --> 00:40:07,280 that if we can write a query that runs 1124 00:40:05,280 --> 00:40:09,200 sort of forever right 1125 00:40:07,280 --> 00:40:12,480 then as soon as something happens it 1126 00:40:09,200 --> 00:40:14,800 will return stream data back so vql can 1127 00:40:12,480 --> 00:40:18,000 support streaming queries and that is 1128 00:40:14,800 --> 00:40:19,760 where event queries 1129 00:40:18,000 --> 00:40:22,400 go in so you can write 1130 00:40:19,760 --> 00:40:24,720 a query that is running all the time but 1131 00:40:22,400 --> 00:40:26,160 it's constantly streaming back 1132 00:40:24,720 --> 00:40:28,400 uh 1133 00:40:26,160 --> 00:40:30,480 you know rows and then that row those 1134 00:40:28,400 --> 00:40:32,560 rows can simply be forwarded to the 1135 00:40:30,480 --> 00:40:34,720 server and then we are collecting those 1136 00:40:32,560 --> 00:40:35,920 as events so we can use that for a 1137 00:40:34,720 --> 00:40:38,319 number of things we can use it for 1138 00:40:35,920 --> 00:40:41,839 monitoring and also we can use it for 1139 00:40:38,319 --> 00:40:43,760 response for creating uh for automating 1140 
00:40:41,839 --> 00:40:46,079 response so we can go and do stuff based 1141 00:40:43,760 --> 00:40:48,160 on those those queries so here's an 1142 00:40:46,079 --> 00:40:51,520 example of i just wanted to show you 1143 00:40:48,160 --> 00:40:53,680 guys the query here how do we turn that 1144 00:40:51,520 --> 00:40:56,000 other query that we did before which was 1145 00:40:53,680 --> 00:40:58,160 remember we had 1146 00:40:56,000 --> 00:41:01,520 parse_lines which just goes off and 1147 00:40:58,160 --> 00:41:04,000 reads the lines but there is a similar 1148 00:41:01,520 --> 00:41:07,200 event version of that query called watch_syslog 1149 00:41:04,000 --> 00:41:09,839 and that is watching the the line 1150 00:41:07,200 --> 00:41:11,680 so it's essentially like a tail uh uh a 1151 00:41:09,839 --> 00:41:13,680 tail -f or you know something 1152 00:41:11,680 --> 00:41:15,200 like that um 1153 00:41:13,680 --> 00:41:17,359 or or a 1154 00:41:15,200 --> 00:41:19,760 less with a tail following or whatever 1155 00:41:17,359 --> 00:41:22,240 right so it looks at the end of the file 1156 00:41:19,760 --> 00:41:24,240 it watches for new lines to appear 1157 00:41:22,240 --> 00:41:26,480 and then it releases each line into the 1158 00:41:24,240 --> 00:41:29,359 query so it never terminates but once we 1159 00:41:26,480 --> 00:41:31,680 run this query it will always work and 1160 00:41:29,359 --> 00:41:34,240 and then just grok the lines as they 1161 00:41:31,680 --> 00:41:36,480 come and then filter them out and if 1162 00:41:34,240 --> 00:41:38,160 they're ssh then it will say hey that's 1163 00:41:36,480 --> 00:41:40,240 that's you know an interesting one and 1164 00:41:38,160 --> 00:41:44,079 it will pass it on and so we can use 1165 00:41:40,240 --> 00:41:46,240 that to um to monitor for ssh logins so 1166 00:41:44,079 --> 00:41:48,079 there is this artifact here which is 1167 00:41:46,240 --> 00:41:49,920 windows event ssh login i'll just 1168 
00:41:48,079 --> 00:41:52,079 quickly show you that 1169 00:41:49,920 --> 00:41:54,640 so again we're going to look for ssh and 1170 00:41:52,079 --> 00:41:56,240 this one is an event version of that and 1171 00:41:54,640 --> 00:41:58,560 we can see that it's a client it's a 1172 00:41:56,240 --> 00:42:00,240 slightly different type but it's still 1173 00:41:58,560 --> 00:42:03,440 the same kind of general structure it's 1174 00:42:00,240 --> 00:42:06,079 still an artifact and but to install it 1175 00:42:03,440 --> 00:42:07,599 we have to go into this screen here 1176 00:42:06,079 --> 00:42:10,880 which shows us 1177 00:42:07,599 --> 00:42:14,079 uh the event monitoring on the client so 1178 00:42:10,880 --> 00:42:16,560 we can target it specifically to a label 1179 00:42:14,079 --> 00:42:18,720 group say mike 1180 00:42:16,560 --> 00:42:21,760 and you know and then otherwise it's 1181 00:42:18,720 --> 00:42:24,079 kind of the same um ui right we just 1182 00:42:21,760 --> 00:42:26,400 select which ones we want and then we 1183 00:42:24,079 --> 00:42:27,280 can configure them and so on right 1184 00:42:26,400 --> 00:42:29,200 and 1185 00:42:27,280 --> 00:42:31,119 and then once we do that then the events 1186 00:42:29,200 --> 00:42:33,280 start streaming in 1187 00:42:31,119 --> 00:42:35,440 so we can see that so in this case for 1188 00:42:33,280 --> 00:42:37,200 instance uh you can see that there is 1189 00:42:35,440 --> 00:42:39,119 one event that came in 1190 00:42:37,200 --> 00:42:40,720 i had them on before so i can show you 1191 00:42:39,119 --> 00:42:42,800 what it looks like 1192 00:42:40,720 --> 00:42:44,880 when someone logs in then immediately 1193 00:42:42,800 --> 00:42:46,960 that event is streamed to the server so 1194 00:42:44,880 --> 00:42:49,119 it's not log forwarding it's not just 1195 00:42:46,960 --> 00:42:51,119 forwarding all the logs indiscriminately 1196 00:42:49,119 --> 00:42:53,280 it's doing the querying and processing 1197 00:42:51,119 
--> 00:42:54,960 on the endpoint directly and then just 1198 00:42:53,280 --> 00:42:56,319 forwarding back 1199 00:42:54,960 --> 00:42:57,680 uh just 1200 00:42:56,319 --> 00:42:59,520 those ones that are relevant to the 1201 00:42:57,680 --> 00:43:00,400 query right so we can do we can do both 1202 00:42:59,520 --> 00:43:02,319 we can 1203 00:43:00,400 --> 00:43:05,200 uh forward all the events or we can do 1204 00:43:02,319 --> 00:43:07,119 the post uh process the pre-filtering 1205 00:43:05,200 --> 00:43:08,960 and the processing on the endpoint and 1206 00:43:07,119 --> 00:43:10,880 just forward back those really 1207 00:43:08,960 --> 00:43:13,119 high-valued i mean this is a really 1208 00:43:10,880 --> 00:43:15,119 high-valued event uh you know and it 1209 00:43:13,119 --> 00:43:16,800 could be sitting between thousands of 1210 00:43:15,119 --> 00:43:18,720 syslog lines right we don't care about 1211 00:43:16,800 --> 00:43:21,280 those we just care about this one so 1212 00:43:18,720 --> 00:43:22,240 they all go in the same place right uh 1213 00:43:21,280 --> 00:43:24,560 then the 1214 00:43:22,240 --> 00:43:26,720 let me just finally the last uh thing 1215 00:43:24,560 --> 00:43:28,480 that i wanted to show you guys 1216 00:43:26,720 --> 00:43:30,480 uh so this is how we collect the events 1217 00:43:28,480 --> 00:43:32,400 right we just run and we see those uh 1218 00:43:30,480 --> 00:43:35,119 things the last thing that i wanted to 1219 00:43:32,400 --> 00:43:37,520 show uh to talk about is sysmon and 1220 00:43:35,119 --> 00:43:40,880 sysmon is really exciting sysmon is 1221 00:43:37,520 --> 00:43:42,960 like the default um i guess kernel 1222 00:43:40,880 --> 00:43:44,480 events monitoring tool for windows so 1223 00:43:42,960 --> 00:43:47,119 it's been around for a long time on 1224 00:43:44,480 --> 00:43:50,240 windows and just recently they've 1225 00:43:47,119 --> 00:43:51,839 released a sysmon for linux based on 1226 00:43:50,240 --> 
00:43:54,000 ebpf 1227 00:43:51,839 --> 00:43:55,440 and we've talked a lot about ebpf in 1228 00:43:54,000 --> 00:43:58,800 this conference especially in the kernel 1229 00:43:55,440 --> 00:44:01,520 hacking uh miniconf early on uh early on 1230 00:43:58,800 --> 00:44:03,680 in conference so ebpf is a method for us 1231 00:44:01,520 --> 00:44:05,599 to be able to get information from the 1232 00:44:03,680 --> 00:44:08,000 kernel about things like process 1233 00:44:05,599 --> 00:44:09,599 execution network connections all that 1234 00:44:08,000 --> 00:44:11,119 really good stuff from the detection 1235 00:44:09,599 --> 00:44:14,000 perspective 1236 00:44:11,119 --> 00:44:16,960 and sysmon is now a nice easy way of 1237 00:44:14,000 --> 00:44:19,200 getting into that um it's still immature 1238 00:44:16,960 --> 00:44:20,480 you know it's still a little bit buggy 1239 00:44:19,200 --> 00:44:22,800 but it has a lot of interest from the 1240 00:44:20,480 --> 00:44:26,240 community everybody's excited about it 1241 00:44:22,800 --> 00:44:28,240 um it still has some shortfalls 1242 00:44:26,240 --> 00:44:30,319 but i'll just show you what it looks 1243 00:44:28,240 --> 00:44:32,160 like that this is the sysmon uh the 1244 00:44:30,319 --> 00:44:33,680 synthetic version itself just writes to 1245 00:44:32,160 --> 00:44:35,119 syslog which is 1246 00:44:33,680 --> 00:44:36,720 terrible because then you have to apply 1247 00:44:35,119 --> 00:44:37,920 these regular expressions to get the 1248 00:44:36,720 --> 00:44:39,599 data out 1249 00:44:37,920 --> 00:44:41,839 um so i've 1250 00:44:39,599 --> 00:44:43,440 written a patch to fix it to 1251 00:44:41,839 --> 00:44:46,079 write it to a unix domain socket so it's a 1252 00:44:43,440 --> 00:44:48,319 lot more efficient and json encoded 1253 00:44:46,079 --> 00:44:50,640 and so we can use this plugin called 1254 00:44:48,319 --> 00:44:53,920 netcat which connects to the unix domain 1255 00:44:50,640 --> 00:44:55,599 socket 
and reads all the lines out uh 1256 00:44:53,920 --> 00:44:57,839 and but otherwise it's exactly the same 1257 00:44:55,599 --> 00:44:59,839 parse that json and then show it so 1258 00:44:57,839 --> 00:45:01,040 let me just quickly show you what that 1259 00:44:59,839 --> 00:45:03,520 looks like 1260 00:45:01,040 --> 00:45:05,119 uh so all we do is we just collect that 1261 00:45:03,520 --> 00:45:08,160 from our endpoint 1262 00:45:05,119 --> 00:45:10,079 uh and you can see that it's basically 1263 00:45:08,160 --> 00:45:12,079 uh it's it's giving us this structured 1264 00:45:10,079 --> 00:45:14,160 information about 1265 00:45:12,079 --> 00:45:16,240 you know process execution like here's a 1266 00:45:14,160 --> 00:45:18,640 ps command line that ran 1267 00:45:16,240 --> 00:45:20,240 uh you know where it ran from and so on 1268 00:45:18,640 --> 00:45:22,720 a lot of these fields are they kind of 1269 00:45:20,240 --> 00:45:24,960 all only make sense on windows but maybe 1270 00:45:22,720 --> 00:45:26,800 there isn't really an equivalent 1271 00:45:24,960 --> 00:45:28,319 you know source data source for it on 1272 00:45:26,800 --> 00:45:30,319 linux but 1273 00:45:28,319 --> 00:45:32,000 um but you can use them to just like 1274 00:45:30,319 --> 00:45:33,920 filter and say oh you know when this 1275 00:45:32,000 --> 00:45:35,760 process ran what was the parent process 1276 00:45:33,920 --> 00:45:38,319 what did it do and then you can write 1277 00:45:35,760 --> 00:45:40,079 detections based on that so again 1278 00:45:38,319 --> 00:45:42,000 so much mike we've run up against time 1279 00:45:40,079 --> 00:45:45,520 and we need to keep on schedule no 1280 00:45:42,000 --> 00:45:49,440 worries uh well so just last last slide 1281 00:45:45,520 --> 00:45:51,599 um just references check out the uh the 1282 00:45:49,440 --> 00:45:53,839 the website the github uh and the 1283 00:45:51,599 --> 00:45:54,720 discord and um thank you very much for 1284 00:45:53,839 --> 
00:45:56,079 your time 1285 00:45:54,720 --> 00:45:58,400 thank you very much um i hope that you 1286 00:45:56,079 --> 00:46:00,560 can drop some of those links and answer 1287 00:45:58,400 --> 00:46:01,680 the questions in the text chat in 1288 00:46:00,560 --> 00:46:06,640 venueless 1289 00:46:01,680 --> 00:46:07,440 uh up next at three at uh 2 25 p.m adt 1290 00:46:06,640 --> 00:46:10,240 is j 1291 00:46:07,440 --> 00:46:11,680 rosen rosenbaum with rolfer initiative 1292 00:46:10,240 --> 00:46:13,119 how to make the world of ai a more 1293 00:46:11,680 --> 00:46:14,400 ethical place 1294 00:46:13,119 --> 00:46:17,400 thank you very much mike 1295 00:46:14,400 --> 00:46:17,400 thanks