1 00:00:06,320 --> 00:00:11,499 [Music] 2 00:00:15,360 --> 00:00:20,480 i'd like to introduce claire 3 00:00:17,520 --> 00:00:22,160 um claire is an urban planner 4 00:00:20,480 --> 00:00:24,480 and programmer and is currently 5 00:00:22,160 --> 00:00:26,000 undertaking a phd in the intersection of 6 00:00:24,480 --> 00:00:28,240 these two fields 7 00:00:26,000 --> 00:00:30,320 they find themselves writing scripts 8 00:00:28,240 --> 00:00:32,399 frequently during their research to help 9 00:00:30,320 --> 00:00:35,120 them record and make sense of the large 10 00:00:32,399 --> 00:00:38,000 amounts of qualitative data necessary to 11 00:00:35,120 --> 00:00:39,680 answer questions about the adoption of 12 00:00:38,000 --> 00:00:41,200 digital technology and the future of 13 00:00:39,680 --> 00:00:43,280 urban planning work 14 00:00:41,200 --> 00:00:45,440 outside of their phd claire works with 15 00:00:43,280 --> 00:00:47,760 others in urban planning pro in the 16 00:00:45,440 --> 00:00:49,680 urban planning profession to advocate 17 00:00:47,760 --> 00:00:51,920 for the use of open technology and 18 00:00:49,680 --> 00:00:54,640 standards to ensure good governance 19 00:00:51,920 --> 00:00:57,280 and our cities and regions so over to 20 00:00:54,640 --> 00:00:59,440 you claire 21 00:00:57,280 --> 00:01:03,440 hi thanks for that 22 00:00:59,440 --> 00:01:05,119 great looks like it's all working 23 00:01:03,440 --> 00:01:06,560 um 24 00:01:05,119 --> 00:01:08,560 let's get started 25 00:01:06,560 --> 00:01:11,119 so before we begin i would like to 26 00:01:08,560 --> 00:01:12,880 acknowledge the wonga people who are the 27 00:01:11,119 --> 00:01:14,400 traditional owners of the land from 28 00:01:12,880 --> 00:01:16,400 which i am speaking 29 00:01:14,400 --> 00:01:19,360 to you from today and acknowledge their 30 00:01:16,400 --> 00:01:21,840 elders past present and emerging and to 31 00:01:19,360 --> 00:01:24,080 also acknowledge that sovereignty was 32 00:01:21,840 
--> 00:01:26,640 never ceded 33 00:01:24,080 --> 00:01:28,720 so hi my name is claire daniel and i'm 34 00:01:26,640 --> 00:01:31,200 an urban planner and data science type 35 00:01:28,720 --> 00:01:34,560 person currently undertaking a phd at 36 00:01:31,200 --> 00:01:36,960 unsw and now as part of this phd i 37 00:01:34,560 --> 00:01:39,600 conducted a citation network analysis of 38 00:01:36,960 --> 00:01:41,759 the planning support systems literature 39 00:01:39,600 --> 00:01:43,040 now as fascinating as planning support 40 00:01:41,759 --> 00:01:46,320 systems are 41 00:01:43,040 --> 00:01:48,560 and as interesting as this niche area of 42 00:01:46,320 --> 00:01:50,799 academic endeavor is 43 00:01:48,560 --> 00:01:52,799 i've made the call that it is probably not 44 00:01:50,799 --> 00:01:55,600 relevant to most of you watching today 45 00:01:52,799 --> 00:01:58,159 however the process of undertaking a 46 00:01:55,600 --> 00:02:00,560 citation network analysis may well be 47 00:01:58,159 --> 00:02:02,159 quite relevant for people within the 48 00:02:00,560 --> 00:02:04,399 galleries libraries and museums 49 00:02:02,159 --> 00:02:07,680 community so it is that which i thought 50 00:02:04,399 --> 00:02:10,239 i'd talk to you about today 51 00:02:07,680 --> 00:02:13,120 so right from the start what is citation 52 00:02:10,239 --> 00:02:15,120 network analysis well it is a type of 53 00:02:13,120 --> 00:02:17,200 systematic literature review which is 54 00:02:15,120 --> 00:02:19,920 essentially conceptually quite simple 55 00:02:17,200 --> 00:02:22,959 but can be computationally taxing 56 00:02:19,920 --> 00:02:24,879 and um as we know in academic literature 57 00:02:22,959 --> 00:02:26,640 there are very strict conventions 58 00:02:24,879 --> 00:02:28,959 saying if there's any information or 59 00:02:26,640 --> 00:02:31,280 ideas that have been published elsewhere 60 00:02:28,959 --> 00:02:33,440 that those are referenced within 61 00:02:31,280 --> 
00:02:37,440 scholarly publications and there is a 62 00:02:33,440 --> 00:02:41,680 huge body of standards and strict 63 00:02:37,440 --> 00:02:43,680 syntax governing the process of doing so 64 00:02:41,680 --> 00:02:46,080 um and a citation network analysis 65 00:02:43,680 --> 00:02:48,319 simply assumes that if one document has 66 00:02:46,080 --> 00:02:51,040 cited another document that these two 67 00:02:48,319 --> 00:02:53,200 documents are somehow related and what 68 00:02:51,040 --> 00:02:55,840 you can then do is you can map out all 69 00:02:53,200 --> 00:02:57,200 of these citation relationships using a 70 00:02:55,840 --> 00:02:59,599 formal mathematical network 71 00:02:57,200 --> 00:03:02,159 representation or mathematical graph 72 00:02:59,599 --> 00:03:04,560 upon which you can then do 73 00:03:02,159 --> 00:03:06,000 various quantitative statistics which 74 00:03:04,560 --> 00:03:09,280 will give you insights into the 75 00:03:06,000 --> 00:03:11,760 structure of your research field 76 00:03:09,280 --> 00:03:14,720 so next up in our program how to 77 00:03:11,760 --> 00:03:17,760 citation network analysis well according 78 00:03:14,720 --> 00:03:19,360 to zhao and strotmann 2015 there are five 79 00:03:17,760 --> 00:03:21,040 steps which i will be going through in 80 00:03:19,360 --> 00:03:23,840 more detail 81 00:03:21,040 --> 00:03:27,040 step one delineation of your research 82 00:03:23,840 --> 00:03:29,599 field now this is difficult you do have 83 00:03:27,040 --> 00:03:31,440 to put a boundary around the research 84 00:03:29,599 --> 00:03:32,480 papers that you want to include you 85 00:03:31,440 --> 00:03:35,840 cannot 86 00:03:32,480 --> 00:03:37,840 feasibly do an analysis of everything so 87 00:03:35,840 --> 00:03:39,920 like most forms of systematic review 88 00:03:37,840 --> 00:03:43,440 this delineation is usually done with a 89 00:03:39,920 --> 00:03:44,480 keyword search in a citation database 90 00:03:43,440 --> 
um 91 00:03:44,480 --> 00:03:48,239 and so what are your options for 92 00:03:46,400 --> 00:03:50,799 databases 93 00:03:48,239 --> 00:03:53,200 well traditionally there have been only 94 00:03:50,799 --> 00:03:56,080 two databases with decent coverage of 95 00:03:53,200 --> 00:03:58,799 published papers that is scopus and 96 00:03:56,080 --> 00:04:00,400 web of science of course google scholar 97 00:03:58,799 --> 00:04:02,239 has for 98 00:04:00,400 --> 00:04:04,000 a long period of time now provided a 99 00:04:02,239 --> 00:04:06,319 free means of searching the academic 100 00:04:04,000 --> 00:04:07,519 literature but currently provides no 101 00:04:06,319 --> 00:04:09,920 easy way 102 00:04:07,519 --> 00:04:11,920 to download data from that system in 103 00:04:09,920 --> 00:04:13,680 bulk and provides no api to its 104 00:04:11,920 --> 00:04:15,439 databases 105 00:04:13,680 --> 00:04:17,440 so traditionally web of science and 106 00:04:15,439 --> 00:04:20,320 scopus have been the only way to do 107 00:04:17,440 --> 00:04:22,079 broad scale citation analysis 108 00:04:20,320 --> 00:04:24,880 now my university is subscribed to 109 00:04:22,079 --> 00:04:26,720 scopus and for my analysis i used the 110 00:04:24,880 --> 00:04:28,400 scopus api 111 00:04:26,720 --> 00:04:30,320 um but the fact that i was advised to 112 00:04:28,400 --> 00:04:31,440 use the proprietary api for this 113 00:04:30,320 --> 00:04:34,160 analysis 114 00:04:31,440 --> 00:04:35,759 kind of seemed like another artifact of 115 00:04:34,160 --> 00:04:38,560 the structural problems that we have in 116 00:04:35,759 --> 00:04:41,440 academia with unjust academic paywalls 117 00:04:38,560 --> 00:04:43,840 and citation data is really important so 118 00:04:41,440 --> 00:04:47,120 the amount of academic literature is 119 00:04:43,840 --> 00:04:50,080 growing exponentially and without access 120 00:04:47,120 --> 00:04:51,440 to the citation data we kind of create 121 00:04:50,080 --> 
00:04:53,680 the risk of 122 00:04:51,440 --> 00:04:55,520 duplication of effort creating even more 123 00:04:53,680 --> 00:04:58,240 research silos 124 00:04:55,520 --> 00:05:00,479 and even more importantly 125 00:04:58,240 --> 00:05:03,039 ensure like not 126 00:05:00,479 --> 00:05:03,840 preventing access to academic research 127 00:05:03,039 --> 00:05:06,000 from 128 00:05:03,840 --> 00:05:08,240 which is often publicly funded 129 00:05:06,000 --> 00:05:10,160 uh making it harder for for people who 130 00:05:08,240 --> 00:05:12,400 need to actually use that research to 131 00:05:10,160 --> 00:05:13,360 access it 132 00:05:12,400 --> 00:05:15,280 so 133 00:05:13,360 --> 00:05:17,440 in preparation for this talk i have 134 00:05:15,280 --> 00:05:19,919 actually done a little bit of reading 135 00:05:17,440 --> 00:05:23,039 about some of the new initiatives that 136 00:05:19,919 --> 00:05:25,039 are changing this status quo um and to 137 00:05:23,039 --> 00:05:26,479 qualify this information i'm not 138 00:05:25,039 --> 00:05:30,320 personally involved in any of the 139 00:05:26,479 --> 00:05:31,600 projects that i am about to mention 140 00:05:30,320 --> 00:05:33,440 so 141 00:05:31,600 --> 00:05:36,320 it seems that the initiative for open 142 00:05:33,440 --> 00:05:38,800 citations has been a big influence in 143 00:05:36,320 --> 00:05:39,919 recent years to make citation data more 144 00:05:38,800 --> 00:05:42,160 accessible 145 00:05:39,919 --> 00:05:44,800 crossref of course has existed since 146 00:05:42,160 --> 00:05:48,240 2000 and crossref is the registration 147 00:05:44,800 --> 00:05:50,639 agency for digital object identifiers 148 00:05:48,240 --> 00:05:53,440 for scholarly work and it maintains an 149 00:05:50,639 --> 00:05:54,960 open infrastructure an open database 150 00:05:53,440 --> 00:05:57,759 to which 151 00:05:54,960 --> 00:05:59,840 various publishers submit 152 00:05:57,759 --> 00:06:02,720 details of their publications and it 153 
00:05:59,840 --> 00:06:06,800 maintains an open api 154 00:06:02,720 --> 00:06:09,039 um however in 2017 just one percent of 155 00:06:06,800 --> 00:06:11,919 the eligible papers 156 00:06:09,039 --> 00:06:14,960 that were listed in crossref 157 00:06:11,919 --> 00:06:17,120 uh contained open citation data and this 158 00:06:14,960 --> 00:06:20,080 is where the initiative for 159 00:06:17,120 --> 00:06:22,000 open citations comes in so this 160 00:06:20,080 --> 00:06:24,400 initiative was supporting publishers to 161 00:06:22,000 --> 00:06:26,479 open this citation data to 162 00:06:24,400 --> 00:06:28,880 allow it to be openly available in 163 00:06:26,479 --> 00:06:30,080 crossref and there's been massive 164 00:06:28,880 --> 00:06:32,880 success 165 00:06:30,080 --> 00:06:34,080 because as of october 2021 the 166 00:06:32,880 --> 00:06:36,240 percentage of 167 00:06:34,080 --> 00:06:39,520 relevant articles that now provide this 168 00:06:36,240 --> 00:06:42,000 data openly has gone up to a whopping 169 00:06:39,520 --> 00:06:42,000 88 percent 170 00:06:42,400 --> 00:06:46,639 separate to this um there was an 171 00:06:44,720 --> 00:06:48,880 initiative by microsoft called microsoft 172 00:06:46,639 --> 00:06:50,880 academic and this was doing something 173 00:06:48,880 --> 00:06:52,960 similar to the way that google 174 00:06:50,880 --> 00:06:55,199 automatically compiles its records in 175 00:06:52,960 --> 00:06:57,440 google scholar but unlike google 176 00:06:55,199 --> 00:06:59,599 microsoft academic made 177 00:06:57,440 --> 00:07:02,560 its data available 178 00:06:59,599 --> 00:07:04,800 under a license like an open attribution 179 00:07:02,560 --> 00:07:06,960 type of license 180 00:07:04,800 --> 00:07:08,960 um and so from these initiatives you 181 00:07:06,960 --> 00:07:10,720 start to see a number of open projects 182 00:07:08,960 --> 00:07:13,520 for searching the literature kind of 183 00:07:10,720 --> 00:07:15,360 starting to bloom um 
that rely on one or 184 00:07:13,520 --> 00:07:17,599 the other of these two databases or a 185 00:07:15,360 --> 00:07:19,840 combination of both and this is quite 186 00:07:17,599 --> 00:07:22,400 good because the search functions built 187 00:07:19,840 --> 00:07:24,960 into the crossref infrastructure 188 00:07:22,400 --> 00:07:27,039 are very rudimentary essentially it's 189 00:07:24,960 --> 00:07:29,919 designed to return the details of an 190 00:07:27,039 --> 00:07:33,120 individual paper if you feed it uh 191 00:07:29,919 --> 00:07:36,080 individual identification information um 192 00:07:33,120 --> 00:07:37,840 like the doi or the 193 00:07:36,080 --> 00:07:39,199 title and author details so on and so 194 00:07:37,840 --> 00:07:41,520 forth 195 00:07:39,199 --> 00:07:43,440 um however 196 00:07:41,520 --> 00:07:45,520 microsoft academic has actually been 197 00:07:43,440 --> 00:07:47,599 retired as of about two weeks ago 198 00:07:45,520 --> 00:07:49,440 december 2021 199 00:07:47,599 --> 00:07:52,319 um and i'm not quite sure what 200 00:07:49,440 --> 00:07:54,879 downstream impact this might have on 201 00:07:52,319 --> 00:07:56,639 these projects 202 00:07:54,879 --> 00:07:58,560 luckily however when we look at the 203 00:07:56,639 --> 00:08:01,360 overall coverage of each of these 204 00:07:58,560 --> 00:08:04,400 foundation databases there's been some 205 00:08:01,360 --> 00:08:07,520 analysis done by martín-martín et al 206 00:08:04,400 --> 00:08:10,479 and crossref has really shot up so 207 00:08:07,520 --> 00:08:12,639 in mid 2021 they finally convinced 208 00:08:10,479 --> 00:08:14,879 elsevier who i hear is one of the 209 00:08:12,639 --> 00:08:16,720 largest academic publishers in the world 210 00:08:14,879 --> 00:08:19,280 finally convinced them to make their 211 00:08:16,720 --> 00:08:20,560 citation data openly available in 212 00:08:19,280 --> 00:08:22,720 crossref 213 00:08:20,560 --> 00:08:24,639 um and according to the 
study it is 214 00:08:22,720 --> 00:08:26,319 looking like the 215 00:08:24,639 --> 00:08:28,960 citation databases that have been built 216 00:08:26,319 --> 00:08:31,840 from that data are now rivaling 217 00:08:28,960 --> 00:08:34,000 uh the proprietary databases 218 00:08:31,840 --> 00:08:35,360 like scopus dimensions web of science 219 00:08:34,000 --> 00:08:38,000 in terms of 220 00:08:35,360 --> 00:08:40,399 the scope of their coverage so that is 221 00:08:38,000 --> 00:08:41,360 good news 222 00:08:40,399 --> 00:08:44,080 right 223 00:08:41,360 --> 00:08:46,399 so once we have chosen our database 224 00:08:44,080 --> 00:08:48,399 there will be various python packages 225 00:08:46,399 --> 00:08:51,680 that will assist with extracting data 226 00:08:48,399 --> 00:08:54,399 from some of the more well-known apis 227 00:08:51,680 --> 00:08:57,040 and most of these apis will return to 228 00:08:54,399 --> 00:08:59,040 you a long list of metadata about 229 00:08:57,040 --> 00:09:01,360 every individual paper the title the 230 00:08:59,040 --> 00:09:03,600 abstract details journal details 231 00:09:01,360 --> 00:09:07,279 institutional details and of course a 232 00:09:03,600 --> 00:09:09,440 list of digital ids for their reference 233 00:09:07,279 --> 00:09:10,880 list 234 00:09:09,440 --> 00:09:14,000 onto step two 235 00:09:10,880 --> 00:09:16,720 construction of the network 236 00:09:14,000 --> 00:09:18,560 so first to construct a raw citation 237 00:09:16,720 --> 00:09:21,680 network for 238 00:09:18,560 --> 00:09:24,399 our purposes we need to construct a 239 00:09:21,680 --> 00:09:27,440 raw adjacency matrix so this is a large 240 00:09:24,399 --> 00:09:29,920 and sparse matrix of ones and zeros 241 00:09:27,440 --> 00:09:32,399 where one axis represents the citing 242 00:09:29,920 --> 00:09:34,959 papers and the other axis represents 243 00:09:32,399 --> 00:09:38,480 the cited papers and so you can see on 244 00:09:34,959 --> 
00:09:40,640 the slide if paper a cites paper b 245 00:09:38,480 --> 00:09:44,720 you then put a one in the corresponding 246 00:09:40,640 --> 00:09:46,880 cell it is as simple as that 247 00:09:44,720 --> 00:09:48,800 and so for those of you who are working 248 00:09:46,880 --> 00:09:50,640 with people perhaps who aren't python 249 00:09:48,800 --> 00:09:52,880 users or aren't programmers 250 00:09:50,640 --> 00:09:54,880 the good news is there is plenty of free 251 00:09:52,880 --> 00:09:56,399 and open source software that has 252 00:09:54,880 --> 00:09:58,880 graphical user interfaces that will 253 00:09:56,399 --> 00:10:01,200 construct these networks automatically 254 00:09:58,880 --> 00:10:02,800 from data that is downloaded manually 255 00:10:01,200 --> 00:10:06,079 from the websites of these various 256 00:10:02,800 --> 00:10:08,720 citation databases and some of the 257 00:10:06,079 --> 00:10:10,399 more useful ones i have put up on the 258 00:10:08,720 --> 00:10:12,160 screen 259 00:10:10,399 --> 00:10:14,399 for those that are python users there 260 00:10:12,160 --> 00:10:16,880 may be packages that do a lot of this 261 00:10:14,399 --> 00:10:19,040 but for my part i constructed a list of 262 00:10:16,880 --> 00:10:21,120 the citation pairs 263 00:10:19,040 --> 00:10:23,600 and then i gave this to the networkx 264 00:10:21,120 --> 00:10:27,279 package to construct a graph and then 265 00:10:23,600 --> 00:10:29,600 generate the raw adjacency matrix for me 266 00:10:27,279 --> 00:10:32,320 which was a fairly straightforward 267 00:10:29,600 --> 00:10:33,360 process 268 00:10:32,320 --> 00:10:35,920 so 269 00:10:33,360 --> 00:10:38,720 we have our adjacency matrix we've got 270 00:10:35,920 --> 00:10:40,839 our raw network set up the second 271 00:10:38,720 --> 00:10:43,200 half of step two comes with a major 272 00:10:40,839 --> 00:10:45,040 methodological consideration 273 00:10:43,200 --> 00:10:47,519 which is which 274 00:10:45,040 --> 
00:10:50,000 connectedness measure to use 275 00:10:47,519 --> 00:10:52,320 so essentially the raw citation network 276 00:10:50,000 --> 00:10:55,120 is very sparse uh it contains a lot of 277 00:10:52,320 --> 00:10:57,200 zeros it contains a lot of blank space 278 00:10:55,120 --> 00:11:00,320 um and therefore it's very hard to 279 00:10:57,200 --> 00:11:02,240 calculate any meaningful statistics with 280 00:11:00,320 --> 00:11:03,360 this 281 00:11:02,240 --> 00:11:05,519 matrix 282 00:11:03,360 --> 00:11:07,600 so instead what we do is we calculate 283 00:11:05,519 --> 00:11:09,920 connectedness measures to solidify this 284 00:11:07,600 --> 00:11:12,800 matrix and we use the raw adjacency 285 00:11:09,920 --> 00:11:15,440 matrix to calculate either a co-citation 286 00:11:12,800 --> 00:11:18,320 matrix or a bibliographic coupling 287 00:11:15,440 --> 00:11:19,680 matrix these things are both useful 288 00:11:18,320 --> 00:11:20,880 but they measure slightly different 289 00:11:19,680 --> 00:11:22,320 things so it's important to know the 290 00:11:20,880 --> 00:11:24,800 difference between these two things when 291 00:11:22,320 --> 00:11:27,360 you interpret your results 292 00:11:24,800 --> 00:11:30,720 so to try to explain a co-citation 293 00:11:27,360 --> 00:11:34,160 matrix represents how many times paper a 294 00:11:30,720 --> 00:11:37,360 and paper b appear in the same reference 295 00:11:34,160 --> 00:11:40,480 list of another article 296 00:11:37,360 --> 00:11:43,040 um and a bibliographic coupling matrix 297 00:11:40,480 --> 00:11:45,839 on the other hand measures how many 298 00:11:43,040 --> 00:11:47,040 references paper a and paper b have in 299 00:11:45,839 --> 00:11:48,160 common 300 00:11:47,040 --> 00:11:50,880 and so 301 00:11:48,160 --> 00:11:53,600 as you imagine the co-citation matrix is 302 00:11:50,880 --> 00:11:56,079 most useful in the identification of 303 00:11:53,600 --> 00:11:58,320 groups of influential papers important 304 
00:11:56,079 --> 00:12:00,639 in defining the past direction and 305 00:11:58,320 --> 00:12:02,880 structure of your research field while 306 00:12:00,639 --> 00:12:04,000 the bibliographic coupling matrix is a 307 00:12:02,880 --> 00:12:06,079 little more useful for the 308 00:12:04,000 --> 00:12:08,800 identification of clusters of papers that 309 00:12:06,079 --> 00:12:10,800 draw on similar ideas and it's a 310 00:12:08,800 --> 00:12:13,120 slightly more useful measure for 311 00:12:10,800 --> 00:12:16,399 classifying recent papers that have yet 312 00:12:13,120 --> 00:12:16,399 to be cited by others 313 00:12:16,880 --> 00:12:21,360 so in python all you need to do here is 314 00:12:19,279 --> 00:12:24,560 transpose the raw adjacency matrix and 315 00:12:21,360 --> 00:12:27,040 multiply it by the original adjacency 316 00:12:24,560 --> 00:12:30,480 matrix and to further solidify it i've applied 317 00:12:27,040 --> 00:12:32,639 a cosine similarity function 318 00:12:30,480 --> 00:12:34,639 on to step three which is multivariate 319 00:12:32,639 --> 00:12:36,639 statistical analysis 320 00:12:34,639 --> 00:12:39,279 so the standard method of citation 321 00:12:36,639 --> 00:12:41,839 analysis is then to perform multivariate 322 00:12:39,279 --> 00:12:44,800 statistical analysis or factor analysis 323 00:12:41,839 --> 00:12:46,800 on our matrix this is a linear 324 00:12:44,800 --> 00:12:49,839 statistical method and it's a great 325 00:12:46,800 --> 00:12:53,120 way to identify a smaller number of 326 00:12:49,839 --> 00:12:54,959 factors 327 00:12:53,120 --> 00:12:58,160 in the relationships between a 328 00:12:54,959 --> 00:12:59,920 large number of underlying variables and 329 00:12:58,160 --> 00:13:00,959 in our analysis we have a lot of 330 00:12:59,920 --> 00:13:02,720 underlying 331 00:13:00,959 --> 00:13:04,959 variables because every 332 00:13:02,720 --> 00:13:07,360 individual paper in the group of papers 333 
00:13:04,959 --> 00:13:10,959 that we are studying our research corpus 334 00:13:07,360 --> 00:13:12,800 is an individual variable um so 335 00:13:10,959 --> 00:13:14,560 what we really need to do is we need to 336 00:13:12,800 --> 00:13:17,279 group these papers together we need to 337 00:13:14,560 --> 00:13:18,720 identify a few underlying research 338 00:13:17,279 --> 00:13:21,200 themes 339 00:13:18,720 --> 00:13:23,839 um and these underlying 340 00:13:21,200 --> 00:13:25,680 factors these underlying 341 00:13:23,839 --> 00:13:27,440 research themes might represent 342 00:13:25,680 --> 00:13:30,000 different topics or different lines of 343 00:13:27,440 --> 00:13:32,079 inquiry within your research field 344 00:13:30,000 --> 00:13:36,079 um and this type of statistical analysis 345 00:13:32,079 --> 00:13:38,480 has proved to be quite a robust way of 346 00:13:36,079 --> 00:13:41,040 doing this kind of characterization of 347 00:13:38,480 --> 00:13:43,120 your research field 348 00:13:41,040 --> 00:13:45,760 so principal component analysis is the 349 00:13:43,120 --> 00:13:47,760 most common type of factor analysis used 350 00:13:45,760 --> 00:13:49,920 and there are various statistical 351 00:13:47,760 --> 00:13:52,959 packages in python that will 352 00:13:49,920 --> 00:13:54,800 help you do this 353 00:13:52,959 --> 00:13:57,199 step four network analysis and 354 00:13:54,800 --> 00:13:59,199 visualization so in addition to the 355 00:13:57,199 --> 00:14:01,199 factor analysis there are various other 356 00:13:59,199 --> 00:14:03,760 quantitative measures you can use to 357 00:14:01,199 --> 00:14:05,279 analyze your network um you have your 358 00:14:03,760 --> 00:14:07,839 algorithms that will do your network 359 00:14:05,279 --> 00:14:09,839 partitioning although that's not as good 360 00:14:07,839 --> 00:14:12,160 as your factor analysis because network 361 00:14:09,839 --> 00:14:14,800 partitioning algorithms will 
force you 362 00:14:12,160 --> 00:14:16,320 to classify a paper in one theme or the 363 00:14:14,800 --> 00:14:18,480 other when in reality it could be 364 00:14:16,320 --> 00:14:19,680 related to more than one different 365 00:14:18,480 --> 00:14:20,880 theme 366 00:14:19,680 --> 00:14:21,920 um 367 00:14:20,880 --> 00:14:23,519 but 368 00:14:21,920 --> 00:14:25,120 the other useful thing that you can do 369 00:14:23,519 --> 00:14:27,519 with your network is to calculate 370 00:14:25,120 --> 00:14:29,360 various measures of centrality so you 371 00:14:27,519 --> 00:14:31,600 have your things like your degree 372 00:14:29,360 --> 00:14:33,760 centrality which is the number of times 373 00:14:31,600 --> 00:14:35,839 a paper is cited by others in that 374 00:14:33,760 --> 00:14:37,760 network um all the way through to 375 00:14:35,839 --> 00:14:40,000 something like betweenness centrality 376 00:14:37,760 --> 00:14:42,880 which means that your paper kind of 377 00:14:40,000 --> 00:14:45,680 forms a node in a frequent 378 00:14:42,880 --> 00:14:48,079 path a path that's frequently um 379 00:14:45,680 --> 00:14:49,600 used if you were to run an algorithm 380 00:14:48,079 --> 00:14:51,680 going from one 381 00:14:49,600 --> 00:14:53,760 side of your network to the other for 382 00:14:51,680 --> 00:14:55,600 instance and a high 383 00:14:53,760 --> 00:14:57,120 betweenness centrality could indicate that 384 00:14:55,600 --> 00:15:00,240 that paper has some kind of boundary 385 00:14:57,120 --> 00:15:01,760 spanning properties um and is 386 00:15:00,240 --> 00:15:06,079 spanning across different research 387 00:15:01,760 --> 00:15:06,079 ideas or different research silos 388 00:15:06,320 --> 00:15:10,160 there are lots of python packages out 389 00:15:08,320 --> 00:15:12,639 there that will help you do network 390 00:15:10,160 --> 00:15:14,639 analysis i used igraph because it is 391 00:15:12,639 --> 00:15:17,440 what i'm most familiar 
with but there's 392 00:15:14,639 --> 00:15:20,320 the aforementioned networkx and at 393 00:15:17,440 --> 00:15:22,079 pycon this year i went to a talk by 394 00:15:20,320 --> 00:15:25,040 someone who is 395 00:15:22,079 --> 00:15:27,199 helping out with kglab which is a 396 00:15:25,040 --> 00:15:28,240 python project kind of looking to 397 00:15:27,199 --> 00:15:30,160 integrate 398 00:15:28,240 --> 00:15:33,120 all of these things 399 00:15:30,160 --> 00:15:35,120 um and whilst these python packages will 400 00:15:33,120 --> 00:15:39,040 draw decent visual 401 00:15:35,120 --> 00:15:40,959 representations of the raw networks i do 402 00:15:39,040 --> 00:15:43,440 recommend using software with a graphical 403 00:15:40,959 --> 00:15:45,920 user interface because that will make it 404 00:15:43,440 --> 00:15:48,000 easier for you to play with your visual 405 00:15:45,920 --> 00:15:49,120 properties if you are doing your network 406 00:15:48,000 --> 00:15:51,920 diagrams 407 00:15:49,120 --> 00:15:54,000 um doing graphics programmatically can 408 00:15:51,920 --> 00:15:54,720 be a faff 409 00:15:54,000 --> 00:15:58,000 so 410 00:15:54,720 --> 00:16:00,000 step five interpretation of results 411 00:15:58,000 --> 00:16:02,880 and validation 412 00:16:00,000 --> 00:16:05,600 um so the important thing here is to go 413 00:16:02,880 --> 00:16:07,440 back and look at each of the subgroups 414 00:16:05,600 --> 00:16:09,360 of papers that you have identified so 415 00:16:07,440 --> 00:16:12,959 your factor analysis will give every 416 00:16:09,360 --> 00:16:15,279 individual paper a score for each factor 417 00:16:12,959 --> 00:16:18,079 and you look at the groups of papers 418 00:16:15,279 --> 00:16:20,880 which uh have a high score within each 419 00:16:18,079 --> 00:16:23,600 factor usually above 0.3 or 0.5 420 00:16:20,880 --> 00:16:24,880 depending on what tolerance you choose 421 00:16:23,600 --> 00:16:27,600 and so you look at all those high 422 00:16:24,880 
--> 00:16:28,720 scoring papers in each under each factor 423 00:16:27,600 --> 00:16:30,399 as a group 424 00:16:28,720 --> 00:16:32,320 um and you can do things like 425 00:16:30,399 --> 00:16:34,160 descriptive statistics looking at the 426 00:16:32,320 --> 00:16:35,920 type the the universities and the 427 00:16:34,160 --> 00:16:38,720 countries that are contributing research 428 00:16:35,920 --> 00:16:41,199 in that subgroup um but most importantly 429 00:16:38,720 --> 00:16:43,440 is to do your qualitative analysis and 430 00:16:41,199 --> 00:16:45,199 now while this might seem quite daunting 431 00:16:43,440 --> 00:16:48,560 when you have thousands of research 432 00:16:45,199 --> 00:16:50,560 papers it is actually quite gratifying 433 00:16:48,560 --> 00:16:52,399 at least in my experience 434 00:16:50,560 --> 00:16:54,240 when even just skimming through the 435 00:16:52,399 --> 00:16:57,040 titles of those subgroups just how 436 00:16:54,240 --> 00:17:00,079 quickly those research themes emerge and 437 00:16:57,040 --> 00:17:02,839 how easy it is to delineate those 438 00:17:00,079 --> 00:17:05,600 different groups of papers within your 439 00:17:02,839 --> 00:17:07,280 network um and to give you a better idea 440 00:17:05,600 --> 00:17:09,600 of the type of insights that are 441 00:17:07,280 --> 00:17:11,919 possible for from citation network 442 00:17:09,600 --> 00:17:13,600 analysis i would like to run through 443 00:17:11,919 --> 00:17:16,799 some of the key figures 444 00:17:13,600 --> 00:17:18,880 from my upcoming paper 445 00:17:16,799 --> 00:17:21,039 um so this is just a raw citation 446 00:17:18,880 --> 00:17:24,000 network showing a number of different 447 00:17:21,039 --> 00:17:27,600 keyword searches 448 00:17:24,000 --> 00:17:29,600 this shows the kind of research output 449 00:17:27,600 --> 00:17:31,760 that was um 450 00:17:29,600 --> 00:17:33,840 downloaded for each of those keyword 451 00:17:31,760 --> 00:17:36,559 searches over time and 
you can see the 452 00:17:33,840 --> 00:17:37,760 trends in those different fields 453 00:17:36,559 --> 00:17:39,039 over time 454 00:17:37,760 --> 00:17:41,600 with my 455 00:17:39,039 --> 00:17:43,840 fields kind of modest 456 00:17:41,600 --> 00:17:46,320 modest increment kind of being dwarfed 457 00:17:43,840 --> 00:17:48,160 by exponential increases in the interest 458 00:17:46,320 --> 00:17:50,720 in things like smart cities and urban 459 00:17:48,160 --> 00:17:52,320 science and urban analytics 460 00:17:50,720 --> 00:17:55,200 um 461 00:17:52,320 --> 00:17:57,679 this is a diagram that shows kind of the 462 00:17:55,200 --> 00:18:01,760 first order relationships between 463 00:17:57,679 --> 00:18:03,919 various different keyword searches 464 00:18:01,760 --> 00:18:06,160 um and how closely though those 465 00:18:03,919 --> 00:18:08,799 different areas are related 466 00:18:06,160 --> 00:18:10,640 and this is my 467 00:18:08,799 --> 00:18:13,039 research corpus itself so this is the 468 00:18:10,640 --> 00:18:15,679 planning support systems literature and 469 00:18:13,039 --> 00:18:18,400 it's been colored by the highest scoring 470 00:18:15,679 --> 00:18:20,400 factor for each paper and gratifyingly 471 00:18:18,400 --> 00:18:21,200 you can already see those clusters kind 472 00:18:20,400 --> 00:18:22,320 of 473 00:18:21,200 --> 00:18:23,200 within the 474 00:18:22,320 --> 00:18:27,120 raw 475 00:18:23,200 --> 00:18:29,840 um citation network as well 476 00:18:27,120 --> 00:18:31,600 and the size of each of the little 477 00:18:29,840 --> 00:18:33,520 bubbles is corresponds to the number of 478 00:18:31,600 --> 00:18:34,960 citations 479 00:18:33,520 --> 00:18:35,760 and like 480 00:18:34,960 --> 00:18:39,120 you 481 00:18:35,760 --> 00:18:40,880 and similar to the having those large 482 00:18:39,120 --> 00:18:42,400 those few papers with a large number of 483 00:18:40,880 --> 00:18:44,320 citations that's quite 484 00:18:42,400 --> 00:18:46,000 quite 
normal in the citation network so 485 00:18:44,320 --> 00:18:47,919 you'll have 486 00:18:46,000 --> 00:18:50,559 um 487 00:18:47,919 --> 00:18:52,640 a it's kind of like any social network 488 00:18:50,559 --> 00:18:55,679 it follows a power law you'll find that 489 00:18:52,640 --> 00:18:57,919 there are papers that act as hubs within 490 00:18:55,679 --> 00:19:00,080 the networks which have exponentially 491 00:18:57,919 --> 00:19:01,679 more citations 492 00:19:00,080 --> 00:19:03,520 um than 493 00:19:01,679 --> 00:19:05,600 the majority of 494 00:19:03,520 --> 00:19:08,400 papers within your network which might 495 00:19:05,600 --> 00:19:10,240 only have one or two 496 00:19:08,400 --> 00:19:12,080 uh you can do these kinds of descriptive 497 00:19:10,240 --> 00:19:14,880 statistics on the metadata like the 498 00:19:12,080 --> 00:19:17,440 countries the institutions um the most 499 00:19:14,880 --> 00:19:20,400 prolific authors 500 00:19:17,440 --> 00:19:23,200 here's another representation that i did 501 00:19:20,400 --> 00:19:27,200 of my different research streams kind of 502 00:19:23,200 --> 00:19:29,120 showing the overlap between the papers 503 00:19:27,200 --> 00:19:30,000 within those different research streams 504 00:19:29,120 --> 00:19:31,919 and again those bubbles are 505 00:19:30,000 --> 00:19:35,679 proportionate to the number 506 00:19:31,919 --> 00:19:38,240 of papers within the research streams 507 00:19:35,679 --> 00:19:40,160 uh here's another showing each of my 508 00:19:38,240 --> 00:19:42,080 individual factors or research streams 509 00:19:40,160 --> 00:19:45,120 to change changing those different 510 00:19:42,080 --> 00:19:45,120 groups over time 511 00:19:45,520 --> 00:19:48,799 so 512 00:19:46,559 --> 00:19:51,440 um that's some of the things that you 513 00:19:48,799 --> 00:19:53,360 can do with it and in conclusion 514 00:19:51,440 --> 00:19:55,039 performing a citation network analysis 515 00:19:53,360 --> 00:19:56,480 of course 
doesn't replace a traditional 516 00:19:55,039 --> 00:19:58,000 literature review 517 00:19:56,480 --> 00:19:59,440 doing a citation network analysis 518 00:19:58,000 --> 00:20:01,200 doesn't tell you 519 00:19:59,440 --> 00:20:02,960 um anything about the quality of the 520 00:20:01,200 --> 00:20:05,840 research itself or even that much about 521 00:20:02,960 --> 00:20:08,320 its findings but it is a really good 522 00:20:05,840 --> 00:20:10,000 shortcut to kind of that overall view of 523 00:20:08,320 --> 00:20:11,600 your research field and identifying 524 00:20:10,000 --> 00:20:14,720 those key players 525 00:20:11,600 --> 00:20:17,679 and those key lines of inquiry 526 00:20:14,720 --> 00:20:19,280 um one word of warning for anyone who is 527 00:20:17,679 --> 00:20:21,440 going to attempt this or 528 00:20:19,280 --> 00:20:23,039 advising anyone to attempt this that 529 00:20:21,440 --> 00:20:25,039 there are hundreds of different 530 00:20:23,039 --> 00:20:27,600 statistics that can be calculated from 531 00:20:25,039 --> 00:20:29,760 these citation data sets uh which was a 532 00:20:27,600 --> 00:20:31,200 little overwhelming and therefore 533 00:20:29,760 --> 00:20:32,880 it's kind of a challenge for 534 00:20:31,200 --> 00:20:35,760 the researcher to condense all of these 535 00:20:32,880 --> 00:20:37,919 statistics into some kind of meaningful 536 00:20:35,760 --> 00:20:40,240 and useful story 537 00:20:37,919 --> 00:20:42,960 finally though to reiterate academic 538 00:20:40,240 --> 00:20:45,440 literature is growing exponentially and 539 00:20:42,960 --> 00:20:48,080 opening this data and these tools will 540 00:20:45,440 --> 00:20:50,799 become more and more important both as a 541 00:20:48,080 --> 00:20:53,120 means of just retaining 542 00:20:50,799 --> 00:20:54,240 the ability to keep track of research 543 00:20:53,120 --> 00:20:56,400 findings 544 00:20:54,240 --> 00:20:59,280 um which will prevent duplication of 545 00:20:56,400 
--> 00:21:02,960 effort but also make research more 546 00:20:59,280 --> 00:21:06,159 accessible to everyone 547 00:21:02,960 --> 00:21:08,320 here is the main how-to text that i use 548 00:21:06,159 --> 00:21:11,200 when i perform my own analysis and it's 549 00:21:08,320 --> 00:21:13,120 a really useful textbook that is free to 550 00:21:11,200 --> 00:21:14,080 download 551 00:21:13,120 --> 00:21:15,919 um 552 00:21:14,080 --> 00:21:17,919 also if you're in a specific field i 553 00:21:15,919 --> 00:21:19,919 would recommend that you look for 554 00:21:17,919 --> 00:21:21,520 previously published papers that utilize 555 00:21:19,919 --> 00:21:22,640 this method to get an 556 00:21:21,520 --> 00:21:24,960 idea 557 00:21:22,640 --> 00:21:26,720 of how how it could be used in your own 558 00:21:24,960 --> 00:21:30,480 speciality 559 00:21:26,720 --> 00:21:32,640 um the full paper of my analysis will be 560 00:21:30,480 --> 00:21:35,039 published very shortly it's been 561 00:21:32,640 --> 00:21:37,919 accepted and i've submitted the final 562 00:21:35,039 --> 00:21:40,159 documents when it is published i will 563 00:21:37,919 --> 00:21:42,480 make sure the pre-print will go up on my 564 00:21:40,159 --> 00:21:46,320 personal website and the code will be 565 00:21:42,480 --> 00:21:47,360 available on my github account 566 00:21:46,320 --> 00:21:50,080 so 567 00:21:47,360 --> 00:21:52,799 thank you everyone and 568 00:21:50,080 --> 00:21:54,799 happy to take any discussion or any 569 00:21:52,799 --> 00:21:57,200 questions 570 00:21:54,799 --> 00:22:00,400 wonderful thank you so much claire 571 00:21:57,200 --> 00:22:03,039 um we currently have two questions uh 572 00:22:00,400 --> 00:22:05,200 first one is i'm on the periphery of the 573 00:22:03,039 --> 00:22:07,679 research world this seems to be useful 574 00:22:05,200 --> 00:22:10,080 to use uh connections between papers 575 00:22:07,679 --> 00:22:13,600 rather than the content of the paper is 576 00:22:10,080 
--> 00:22:15,600 that another area of research 577 00:22:13,600 --> 00:22:18,080 connections between papers rather than 578 00:22:15,600 --> 00:22:20,080 content so 579 00:22:18,080 --> 00:22:21,120 okay so yes 580 00:22:20,080 --> 00:22:23,679 um 581 00:22:21,120 --> 00:22:26,000 the citation network analysis is good 582 00:22:23,679 --> 00:22:27,520 for looking at getting that overall 583 00:22:26,000 --> 00:22:30,400 structure 584 00:22:27,520 --> 00:22:32,000 of the field um 585 00:22:30,400 --> 00:22:33,520 and those identifying those 586 00:22:32,000 --> 00:22:35,200 relationships and those different 587 00:22:33,520 --> 00:22:37,760 research streams 588 00:22:35,200 --> 00:22:39,360 you might identify kind of silos of 589 00:22:37,760 --> 00:22:41,520 thought or or 590 00:22:39,360 --> 00:22:43,039 kind of different groups of universities 591 00:22:41,520 --> 00:22:44,559 that might be collaborating really 592 00:22:43,039 --> 00:22:45,360 closely together 593 00:22:44,559 --> 00:22:47,840 um 594 00:22:45,360 --> 00:22:49,520 it's a really good way of making sure 595 00:22:47,840 --> 00:22:53,200 you don't miss 596 00:22:49,520 --> 00:22:55,200 those really key pieces of research that 597 00:22:53,200 --> 00:22:57,440 have been done before that have been 598 00:22:55,200 --> 00:22:58,159 cited thousands of times and because 599 00:22:57,440 --> 00:22:59,840 they 600 00:22:58,159 --> 00:23:02,159 i know i've done that before and i've 601 00:22:59,840 --> 00:23:04,000 always felt stupid when i'm like oh i 602 00:23:02,159 --> 00:23:05,280 should have known about this 603 00:23:04,000 --> 00:23:08,320 um 604 00:23:05,280 --> 00:23:11,039 but yes it doesn't it doesn't help you 605 00:23:08,320 --> 00:23:13,600 really evaluate the quality of the 606 00:23:11,039 --> 00:23:16,080 research as such so you still need to do 607 00:23:13,600 --> 00:23:17,919 your traditional literature review or 608 00:23:16,080 --> 00:23:20,400 you can do another type of systematic 
609 00:23:17,919 --> 00:23:23,679 literature review where you do 610 00:23:20,400 --> 00:23:26,799 go in and read the papers exhaustively 611 00:23:23,679 --> 00:23:28,240 and that kind of thing um yeah so this 612 00:23:26,799 --> 00:23:29,919 is just another 613 00:23:28,240 --> 00:23:31,840 another useful tool and those tools are 614 00:23:29,919 --> 00:23:34,080 being um 615 00:23:31,840 --> 00:23:36,320 with the with this data becoming so much 616 00:23:34,080 --> 00:23:38,559 more open and a number of those open 617 00:23:36,320 --> 00:23:40,159 source projects both with the apis but 618 00:23:38,559 --> 00:23:43,600 also um 619 00:23:40,159 --> 00:23:46,640 the free open source software which 620 00:23:43,600 --> 00:23:48,480 anyone can can download and use 621 00:23:46,640 --> 00:23:51,279 without necessarily 622 00:23:48,480 --> 00:23:53,679 a statistics degree or a computer 623 00:23:51,279 --> 00:23:55,440 programming degree um 624 00:23:53,679 --> 00:23:57,200 it's kind of something that i would 625 00:23:55,440 --> 00:24:00,320 recommend most people give a go before 626 00:23:57,200 --> 00:24:02,799 they start a major research project 627 00:24:00,320 --> 00:24:06,240 yeah interestingly um because i work in 628 00:24:02,799 --> 00:24:07,520 the library sector we um try and link up 629 00:24:06,240 --> 00:24:10,080 between universities and their 630 00:24:07,520 --> 00:24:12,080 repositories as to who's got the same 631 00:24:10,080 --> 00:24:13,600 papers and if there's duplicates do we 632 00:24:12,080 --> 00:24:14,880 really need to store the duplicates so 633 00:24:13,600 --> 00:24:15,840 that would be an interesting study as 634 00:24:14,880 --> 00:24:18,159 well 635 00:24:15,840 --> 00:24:20,720 um another question for you did claire 636 00:24:18,159 --> 00:24:23,600 consider creating an intermediary mind 637 00:24:20,720 --> 00:24:26,080 map of what she had found or taxonomy so 638 00:24:23,600 --> 00:24:29,120 adding a halfway step that is more 
an 639 00:24:26,080 --> 00:24:32,120 art than science 640 00:24:29,120 --> 00:24:32,120 right 641 00:24:32,240 --> 00:24:37,200 um 642 00:24:34,640 --> 00:24:40,080 i suppose 643 00:24:37,200 --> 00:24:41,440 what i did to make sense of it so when 644 00:24:40,080 --> 00:24:44,640 you get your 645 00:24:41,440 --> 00:24:46,880 results from your factor analysis 646 00:24:44,640 --> 00:24:49,279 basically it will have 647 00:24:46,880 --> 00:24:51,840 it will give you an eigenvalue 648 00:24:49,279 --> 00:24:54,080 for each factor and that kind of 649 00:24:51,840 --> 00:24:56,320 represents 650 00:24:54,080 --> 00:24:57,440 it kind of gives it each factor scored 651 00:24:56,320 --> 00:24:58,840 and 652 00:24:57,440 --> 00:25:02,720 which tells you 653 00:24:58,840 --> 00:25:04,880 the um how well that factor 654 00:25:02,720 --> 00:25:06,960 is describing what percentage of 655 00:25:04,880 --> 00:25:09,840 variation in your network that factor is 656 00:25:06,960 --> 00:25:11,360 describing how important is it to the 657 00:25:09,840 --> 00:25:13,440 entire structure of the network and it 658 00:25:11,360 --> 00:25:15,520 will you'll usually have a few that have 659 00:25:13,440 --> 00:25:17,120 really high and then it will 660 00:25:15,520 --> 00:25:18,559 will drop off 661 00:25:17,120 --> 00:25:20,400 kind of like that 662 00:25:18,559 --> 00:25:23,630 and you kind of go one two three four 663 00:25:20,400 --> 00:25:25,440 five so the numbers on my um 664 00:25:23,630 --> 00:25:29,840 [Music] 665 00:25:25,440 --> 00:25:29,840 let me see if i can bring up my 666 00:25:30,080 --> 00:25:37,200 screen again 667 00:25:32,720 --> 00:25:38,960 am i still showing my screen yeah oops 668 00:25:37,200 --> 00:25:42,320 let's see how we go 669 00:25:38,960 --> 00:25:45,200 so if i skip through 670 00:25:42,320 --> 00:25:45,200 and look at 671 00:25:47,120 --> 00:25:52,799 that's not that's not helping 672 00:25:50,159 --> 00:25:54,400 stop sharing let's try again share 
share 673 00:25:52,799 --> 00:25:59,279 screen 674 00:25:54,400 --> 00:26:02,880 share screen uh screen too 675 00:25:59,279 --> 00:26:04,480 and then people can probably see that 676 00:26:02,880 --> 00:26:06,720 i'll just bring out my powerpoint that 677 00:26:04,480 --> 00:26:10,159 might be easier okay 678 00:26:06,720 --> 00:26:11,679 sorry back on track 679 00:26:10,159 --> 00:26:15,200 all right so each of those numbers 680 00:26:11,679 --> 00:26:17,840 represents the kind of ranked order of 681 00:26:15,200 --> 00:26:21,279 those factors and what i've done 682 00:26:17,840 --> 00:26:23,360 is kind of grouped them 683 00:26:21,279 --> 00:26:25,919 in um 684 00:26:23,360 --> 00:26:28,159 brought four broader things 685 00:26:25,919 --> 00:26:30,559 um and i've done that by analyzing how 686 00:26:28,159 --> 00:26:31,600 much they overlap in a more quantitative 687 00:26:30,559 --> 00:26:33,840 way 688 00:26:31,600 --> 00:26:36,960 um but there's also a little bit of 689 00:26:33,840 --> 00:26:36,960 qualitative kind of 690 00:26:37,279 --> 00:26:41,279 yeah assigning one 691 00:26:39,360 --> 00:26:44,159 assigning a bubble to a category 692 00:26:41,279 --> 00:26:47,120 particularly on those edges um was 693 00:26:44,159 --> 00:26:48,159 was a qualitative decision so 694 00:26:47,120 --> 00:26:49,360 yeah 695 00:26:48,159 --> 00:26:51,440 wonderful 696 00:26:49,360 --> 00:26:53,520 next question as you say there are many 697 00:26:51,440 --> 00:26:55,279 different analysis options for your 698 00:26:53,520 --> 00:26:57,120 network as well as 699 00:26:55,279 --> 00:26:59,279 all the choices for how you construct 700 00:26:57,120 --> 00:27:01,279 your network can you tell us more about 701 00:26:59,279 --> 00:27:03,840 how you determined what story you wanted 702 00:27:01,279 --> 00:27:06,799 to tell and which technical choices 703 00:27:03,840 --> 00:27:07,840 would be useful for that 704 00:27:06,799 --> 00:27:09,760 right 705 00:27:07,840 --> 00:27:12,080 
okay so the first major decision was 706 00:27:09,760 --> 00:27:14,240 co-citation or bibliographic coupling 707 00:27:12,080 --> 00:27:16,799 um i went with bibliographic coupling 708 00:27:14,240 --> 00:27:19,200 because i knew there would be a lot of 709 00:27:16,799 --> 00:27:20,960 useful papers that probably haven't had 710 00:27:19,200 --> 00:27:23,440 many citations 711 00:27:20,960 --> 00:27:25,679 um it's still limited citation network 712 00:27:23,440 --> 00:27:27,120 analysis is better at looking at what's 713 00:27:25,679 --> 00:27:28,960 happened in the past than what is 714 00:27:27,120 --> 00:27:30,640 happening right now or what might happen 715 00:27:28,960 --> 00:27:33,919 in the future 716 00:27:30,640 --> 00:27:36,159 um so i went bibliographic coupling and 717 00:27:33,919 --> 00:27:36,159 then 718 00:27:36,320 --> 00:27:41,520 then yeah it took it took a while i just 719 00:27:39,200 --> 00:27:42,559 generated big dashboards for every 720 00:27:41,520 --> 00:27:44,640 single 721 00:27:42,559 --> 00:27:46,720 factor and you just go through it and 722 00:27:44,640 --> 00:27:49,120 you're like okay this this this thing is 723 00:27:46,720 --> 00:27:51,919 different about this factor suddenly 724 00:27:49,120 --> 00:27:54,720 twice as many researchers from china are 725 00:27:51,919 --> 00:27:58,240 kind of in this bubble why is that 726 00:27:54,720 --> 00:28:00,880 um so going through those like you can 727 00:27:58,240 --> 00:28:03,760 get python to print out however many 728 00:28:00,880 --> 00:28:05,679 charts of your most frequent words or 729 00:28:03,760 --> 00:28:06,960 your institutions and countries and all 730 00:28:05,679 --> 00:28:07,919 that kind of thing 731 00:28:06,960 --> 00:28:09,440 um 732 00:28:07,919 --> 00:28:11,760 yeah and just combing through and being 733 00:28:09,440 --> 00:28:13,760 like okay that's that's different that's 734 00:28:11,760 --> 00:28:16,080 different why is that different 735 00:28:13,760 --> 
00:28:19,600 um but it is it is a time consuming 736 00:28:16,080 --> 00:28:22,000 process to make it 737 00:28:19,600 --> 00:28:24,799 to to kind of go through all of that 738 00:28:22,000 --> 00:28:26,399 information even if it is technically 739 00:28:24,799 --> 00:28:28,159 especially if you may be using a piece 740 00:28:26,399 --> 00:28:30,480 of software you might be able to bring 741 00:28:28,159 --> 00:28:32,320 up your network in a matter of minutes 742 00:28:30,480 --> 00:28:35,919 but um yeah 743 00:28:32,320 --> 00:28:37,440 making sense of it can take a while 744 00:28:35,919 --> 00:28:40,320 uh i recommend looking for those 745 00:28:37,440 --> 00:28:40,320 outliers yeah 746 00:28:40,399 --> 00:28:43,120 one last question before we go to the 747 00:28:42,000 --> 00:28:44,960 break 748 00:28:43,120 --> 00:28:47,440 have you been able to determine if a 749 00:28:44,960 --> 00:28:50,799 citation provider has an impact on 750 00:28:47,440 --> 00:28:52,559 articles that are unearthed 751 00:28:50,799 --> 00:28:54,720 has an impact on articles that are 752 00:28:52,559 --> 00:28:57,360 unearthed i have not personally done that 753 00:28:54,720 --> 00:28:58,799 research i am certain 754 00:28:57,360 --> 00:29:01,760 that there are 755 00:28:58,799 --> 00:29:03,200 papers out there with people who have 756 00:29:01,760 --> 00:29:05,120 looked at that 757 00:29:03,200 --> 00:29:08,080 um so 758 00:29:05,120 --> 00:29:11,039 the person who 759 00:29:08,080 --> 00:29:13,679 did this kind of analysis 760 00:29:11,039 --> 00:29:15,440 is probably a good place to start 761 00:29:13,679 --> 00:29:17,679 in terms of 762 00:29:15,440 --> 00:29:19,520 that kind of research so they've been 763 00:29:17,679 --> 00:29:21,919 really interested in the coverage of all 764 00:29:19,520 --> 00:29:23,840 these different citation databases 765 00:29:21,919 --> 00:29:27,039 um there was another study at flight 766 00:29:23,840 --> 00:29:29,360 past that looked at the quality 
of the 767 00:29:27,039 --> 00:29:31,120 search functions um and there's probably 768 00:29:29,360 --> 00:29:33,600 various people who could do statistical 769 00:29:31,120 --> 00:29:37,360 analysis looking at 770 00:29:33,600 --> 00:29:40,399 um the correlation between citations and 771 00:29:37,360 --> 00:29:43,520 its source within a database or 772 00:29:40,399 --> 00:29:46,480 its accessibility within any of these 773 00:29:43,520 --> 00:29:46,480 these kind of tools 774 00:29:46,720 --> 00:29:50,559 wonderful well thank you so much if 775 00:29:48,799 --> 00:29:53,840 anyone has any other questions for 776 00:29:50,559 --> 00:29:56,559 claire claire will be in the 777 00:29:53,840 --> 00:29:58,799 go glam community chat 778 00:29:56,559 --> 00:30:02,720 channel and 779 00:29:58,799 --> 00:30:07,000 we will be back with bonnie at 11 30 a.m 780 00:30:02,720 --> 00:30:07,000 and we'll see you there thanks