We are delighted to be welcoming Amanda remotely. So, Amanda, are you there? Yes, fantastic. Hi Amanda, can you hear me?

I can, thank you so much. I'm delighted to be here.

Awesome, and our audio levels are great in the room to hear you as well. So without too much delay we're going to hand over. Amanda, take it away.

Thank you so much. Hello again, and thank you everyone for joining me today to talk about navigating the murky areas where ethical, legal, and policy challenges present themselves when measuring open source.

So, to start off, some admin notes for those whom it helps: you can find these slides and very detailed speaker notes at bit.ly, that's bit.ly slash OSS dash d-r-a-g-o-n-s dash r-a-i-n-b-o-w-s-n-e-k-2023 dash lights.

I would also like to begin by acknowledging and paying respect to the traditional owners of the various lands from which I am calling today. I'm joining you from land which has served as a site of sustenance, community, meeting, and exchange among Indigenous peoples since time immemorial. The Western Abenaki are the traditional stewards of these forest lands and waters, which they call Ndakinna, or homeland. We respect their spiritual and lived connections to this region and remember the hardships they've endured, both past and present, including violence and forced displacement at the hands of colonizing peoples. As we gather for this physically dispersed and virtually constructed meeting, and as we share our own knowledge, teaching, and learning practices, may we give thanks for the opportunity to share in the joys of this place, and protect it.

I would also like to let everyone know that I will be speaking today, both directly and indirectly, about exclusion, the potential and reality of harm against marginalized groups, and how technology can be used to identify people who have not
chosen that. If these topics are best covered elsewhere, in a different format, or not at all for you, I wish you all the warmth and best community this space has to offer. You will have time before I get into these topics to not be present in the space anymore, if that works best for you.

So, some context about myself. I am a pale white woman with light hair and eyes who wears glasses; my hair is different now. I am a researcher and engineer at Google, where I'm currently leading a team focused on research and education in our open source programs office. I'm also one of Google's co-leads for Project OCEAN and, along with the indomitable Katie McLaughlin, who I know is also part of this lovely community, external faculty for the Vermont Complex Systems Center. I'm also a co-founder of Open Source Stories with Julia Ferraioli, I sit on the board of directors for the Computational Democracy Project, and I once wrote a book with Alice Zheng on feature engineering. I'm also queer, a very proud mama of two smaller humans, a U.S. Navy veteran, and I'm lucky enough to currently live in the state of Vermont in the United States.

And while I am fortunate enough to be here, my state and the areas I love continue to experience life-altering flooding as the direct result of human-accelerated climate change. If you want to learn more about what Vermont is experiencing and how you can help, Vermont Public Radio consistently has the most accurate coverage and updates for our local communities and those who care about us. I tell all of you this because it's important for us to share and understand where we come from, and since I cannot be there with you today, it is important for you to know where my heart and my head are as we talk today.

So, to kick off the conversation, I wanted to talk about where I started working in this space. In 2019 I was working in Google Cloud as a
developer relations engineering manager, with a wonderful team and a nagging feeling about the general understanding of how open source worked. Basically, I had this kind of research idea that I couldn't figure out where to put or what to do with, and fortunately many of my friends from Google also frequent the half-bakery of ideas and love to introduce you to the friends you haven't met yet. This is how I met Julia Ferraioli of the open source programs office. Together we developed and pitched the idea for what would become Project OCEAN: a pilot for the team, and a way of moving forward academic, industry, and community research of open source at a global scale.

So despite launching a new initiative in early 2020, we were still able to achieve many of the goals we had, and the work continues with a range of internal, academic, and community partners today. One of our most audacious goals was mapping out the entire open source ecosystem and then sharing it with everyone. Honestly, at the time it didn't feel that audacious, because we're Google, we know how to organize information and make it accessible and useful, or so I was told.

But one of the first barriers we continued to run against was defining the problem space of open source with our stakeholders and the communities we work with. There are many mental models and analogies about open source that exist. Some of these models aid our understanding of a complex ecosystem; however, we found that we don't challenge these enough to truly investigate the risks and the problems in open source now, rather than continually building for how we've been burned previously. We lack sufficient baselines to gain a true census of how many people are involved in open source, how many people are holding up our technical world, and where we might be at risk of the whole modern tech stack simply
falling to the ground. Organizationally, this means that we can't move forward effectively and sustainably until we break the popular but over-reductionist model of modern digital infrastructure.

So, this XKCD comic is frequently shown to demonstrate the gaps that exist in both our understanding and our failure to adequately respond to the challenge of shoring up open source. More often than not, it's used to show the brittleness of the open source ecosystem and why further large-scale investment is critical. My counter to this comic, as developed with Nicky Ringland, is that the reality of open source is that billions of dollars are already spent globally on sustaining open source. So the initial comic has been used effectively to centralize buckets of money, attention, and power, and to build walls around organizational spaces for collaboration. However, it isn't that we aren't investing; it's that we are making localized decisions about investment and support for open source while we lack a resilient way to approach a global, decentralized, and equitably resourced ecosystem.

Black swans of open source past are frequently cited as the risk we will see happen again if we don't invest in a centralized initiative now (now, and maybe yesterday). This may be true, or it may be that despite any amount of investment or organization, the next problem will present itself in a way that will also only be clearly visible through hindsight.

So this brings us to the impetus behind mapping out open source ecosystems as Julia and I envisioned it for Project OCEAN. Quite simply, we are looking for that magic spot of any research which allows us to have better data and better models, so that we can make better, data-informed decisions. We want to know where we are, so that we can work collectively in our
communities to figure out where to go next.

Along the way, with my many collaborators and colleagues, including, as I said, your very own Katie McLaughlin, who I believe we call glasnt, we have found ourselves frequently in the spots on the map that remain unclear, where there's no clear policy or approach to navigating the data and abstract representation of open source. These are the spots cartographers frequently mark as "here be dragons" on their maps: the fuzzy places which may hold treasure or danger.

And so today I'd like to walk through a few of these areas, specifically around the ethical, legal, and policy challenges we continue to navigate. As much as I'd like to have clear, well-lit paths for you to also navigate these waters, I don't. I can share what I've learned, what we should talk about as a community, and what questions researchers can ask themselves when approaching the same challenges. My hope is that together we continue to challenge assumptions, avoid the most harm, and find the best way forward each time we approach the home of a dragon.

So first, let's explore some of the ethical challenges which arise when working within research about open source. One of the key cultural tenets of open source software, and the communities that build around it, is working in the open. This places much, but not all, of the conversation about these communities online, in a way that is visible to anyone who wants to join the effort. This is the intent of being open: to make work, community, and innovation transparent. There's a subset of the larger open source movement which identifies as free software (one could argue it's the opposite way around, that open source is a subset of free software; we're not here to debate that today), but here free hinges on the concept of libre rather than being tied to a cost-savings measure. This is a very different concept than open or free,
meaning "please scrape our data and tell us about ourselves." There's no universal consent form that any one person or community is signing up for when placing their information online, yet this is how we've been advancing our knowledge of the problems that exist in open source and advocating for their resolution. The ethical challenge here: when and where can we assume consent for online content to be analyzed? How do we even find and respect these barriers when they are so unclear?

This brings us to the next ethical challenge, which primarily applies to folks working in institutions with IRBs. For those not familiar, IRB stands for institutional review board; IRBs review research studies to ensure that they comply with applicable regulations, meet commonly accepted ethical standards, follow institutional policies, and adequately protect research participants. And as we just talked about the lack of explicit consent to be a test subject in aggregate, there is an institutional gap: many IRBs do not require review of analysis and research when it is conducted in aggregate from metadata, such as the kind that you collect from platform APIs. So even if the markers of the metadata could potentially be identifying, such as mailing list information, where email addresses and location data do not adequately provide anonymization, individuals whose work is represented are not considered to be research participants for data which already exists and is open to many.

So this brings the ethical challenge for us: when, where, and how do we identify that a research study should be reviewed by an IRB, and how do we advocate for that kind of oversight within the research community? When is it beneficial to the group being studied? When is it actually not a necessary step towards framing ethical research problems?
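To make concrete why aggregate platform metadata of the kind just described can fail to anonymize, here is a minimal sketch in Python. The records, field names, and join rule are hypothetical, not drawn from any actual study or API; the point is only that ordinary fields like email domain and timezone act as quasi-identifiers that can re-link an "aggregate" record to a named person.

```python
# Minimal sketch (hypothetical data and field names): metadata pulled "in
# aggregate" from platform APIs can still single people out, because email
# domain + timezone act as quasi-identifiers even after usernames are dropped.

commit_metadata = [
    {"email_domain": "example.edu", "tz": "UTC-5",  "active_hours": "nights"},
    {"email_domain": "bigco.com",   "tz": "UTC+10", "active_hours": "mornings"},
    {"email_domain": "example.edu", "tz": "UTC+10", "active_hours": "nights"},
]

mailing_list_metadata = [
    {"email_domain": "example.edu", "tz": "UTC+10", "signed_name": "R. Maintainer"},
]

def reidentify(commits, posts):
    """Join two 'anonymized' sources on shared quasi-identifiers."""
    matches = []
    for c in commits:
        for p in posts:
            if (c["email_domain"], c["tz"]) == (p["email_domain"], p["tz"]):
                matches.append((c, p))
    return matches

# A single match means the "aggregate" commit record now has a name attached.
print(reidentify(commit_metadata, mailing_list_metadata))
```

Dropping usernames, in other words, is not the same as anonymizing: a handful of routine metadata fields can be enough to single someone out.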
This brings us to the next step, problem framing. Humans are complex creatures; as much as it serves us to reduce the problem space, no one person readily sits in a single-dimensional cluster. When we fail to recognize this, we are at risk of problem framing becoming too reductive. We have to ask: when and where are we actively erasing people, or reducing their identities to vectors which become harmful when applied to policy? One for all is not all for one. We must continue to be conscientious of categorization and of the reduction of identities to single categories.

Continuing on with the concept and challenge of self-identification markers, we must always be aware that people from vulnerable communities may, and do, intentionally separate their identities across multiple online communities and spaces. Now more than ever, we have to be aware of how our aggregated work may unintentionally place someone from a digital space into a place where, physically, they may be subject to real-world consequences because of who they are and how they exist in the world.

So this brings us to a fundamental technique of working within and across open source research: combining information across multiple trace systems to gain a more complete picture of a community is a common technique. For example, if we want to see how software changes over time, or a community changes over time, we may combine data from software repositories, issue trackers, and maintainer mailing lists. Since not every system identifier is the same, we can use anti-aliasing methods to consolidate work to individuals. Here I refer to anti-aliasing as sampling techniques to combine information into single identifiers.
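As a concrete illustration of the consolidation step just described, here is a minimal sketch in Python of linking records from several trace systems into single individuals through shared identifiers. The records and the union-find linking rule are hypothetical stand-ins for whatever a real study would use; this is not Project OCEAN's actual method.

```python
# Minimal sketch of "anti-aliasing" across trace systems: records from git,
# an issue tracker, and a mailing list are merged into per-person clusters
# whenever they share an identifier (email, username, signed name).
from collections import defaultdict

records = [
    {"system": "git",     "ids": {"dev@example.org", "octocat"}},
    {"system": "issues",  "ids": {"octocat"}},
    {"system": "mailing", "ids": {"dev@example.org", "Devon Example"}},
    {"system": "issues",  "ids": {"someone-else"}},
]

parent = {}  # union-find forest over identifiers

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

# Any two identifiers appearing in the same record are treated as one person.
for rec in records:
    ids = list(rec["ids"])
    for other in ids[1:]:
        union(ids[0], other)

# Collect which systems each consolidated person appears in.
clusters = defaultdict(set)
for rec in records:
    clusters[find(next(iter(rec["ids"])))].add(rec["system"])

for root, systems in clusters.items():
    print(root, sorted(systems))
```

The same mechanics that give researchers a more complete picture of a community are what can re-link identities that someone deliberately kept separate, which is exactly the tension raised next.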
The ethical question this brings us to: does anti-aliasing across datasets potentially create opportunities for harm for members of open source communities? We talked before about how individuals may separate themselves online across digital spaces. One of these specific separations is when a trans person who works in open source moves their work and digital identity as they come out as trans. They may not change over all of their professional community work from their dead name to their present identity. When this is uncovered through inferred methods, how do we honor the person, and their life, who we have unintentionally outed beyond their original trusted community? The impetus and the work is not on them; it is on us as researchers to make sure that we are honoring the people in the spaces we are working with.

So we're going to pause on ethical challenges for now to wade into the waters where lawyers hesitate to be cited. A quick note: this is a picture of a lawyer who accidentally used a chat product filter to show themselves as a cat during a trial online in 2021.
Now, I am not a lawyer and I am not a cat, but any good lawyer cat gets super fidgety when engineers talk about legal matters, so I will not get into exact specifics today. I will generally refer to things about regulations and laws and other good ideas which I think maintain good social structure, and which I approve of wholeheartedly when they are applied fairly and not used against people to keep them in oppressive states. Please consult your own lawyer cats as needed for further clarification, advice, and the dark matters to avoid.

Again, I am not a lawyer and I'm not your lawyer, but when I talk with lawyer cats about data, and what is data, and where the law sees what is data, we get into some very twisty, thorny conversations. There is no global legal standard on what is information, what is data, and what is the differentiation between data and a dataset. Even more complex is that there's no universal legal standard on what data is, who information belongs to, what is public, what is open, and how we can apply that across all the different use cases. So when you see open data, you have to keep in mind that open to use may not be open to us, depending on who that "us" is.

So, yes, Amanda, you may say, but what about this concept I hear about in copyright law which refers to fair use? Not like anyone's talking about that these days. Again, this is not standardized across every country, institution, organization, and industry. The most recent advances in text generation and summarization have revisited and accelerated the challenges to what exactly constitutes fair use. So if you want to give your favorite intellectual property lawyer cat a headache, ask them what's currently changing, or up for change, in legislation, regulation, and case law surrounding fair use and large language models. I've included in the references one recent viewpoint from Van Lindberg, who also
wrote a book on intellectual property and open source.

In the meantime, there are licensing standards that exist and are used by open source communities to explain how you can use the information they post online. Where we get into legal challenges here is what license and which terms and conditions apply where, and for what use. Open source software by definition falls under licenses as outlined by the Open Source Definition and held by the Open Source Initiative. However, metadata about the software, which is the data exhaust created by the platforms they use, proprietary or otherwise, and the licenses which apply to the data they use, can all be different. In addition, some information depends on use, commercial versus non-commercial; some depends on explicit licensing, or what it means when no license is attached at all but the data was created by a citizen of a specific country. Untangling these licenses and use cases is not a step that can be skipped when designing a research problem concerning open source.
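One way to keep that untangling explicit is to record, for each data source, the terms that govern its metadata and check them against the intended use before anything is aggregated. The sketch below assumes hypothetical source names and simplified term sets rather than any real licensing taxonomy; it is a conservative filter, not legal advice.

```python
# Minimal sketch (hypothetical sources and terms): gate each data source on
# whether its terms clearly cover the intended use before aggregation.

INTENDED_USE = {"research", "non_commercial"}

data_sources = [
    {"name": "project_repo_code",    "terms": {"research", "commercial", "non_commercial"}},
    {"name": "platform_api_events",  "terms": {"research", "non_commercial"}},
    {"name": "mailing_list_archive", "terms": None},  # no explicit license attached
]

def usable(source, intended_use):
    """Allow a source only when its terms clearly cover the intended use."""
    if source["terms"] is None:
        # Unknown terms: route to a lawyer cat rather than assuming openness.
        return False
    return intended_use <= source["terms"]

for src in data_sources:
    status = "ok to include" if usable(src, INTENDED_USE) else "needs review"
    print(f"{src['name']}: {status}")
```

Treating "no license attached" as "needs review" rather than as "open" is the conservative default the discussion above suggests.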
We've talked about ethical and legal challenges, which are necessary building blocks to begin to address where policy challenges present themselves when researching open source communities. That was necessary, and this is a lot.

Okay, so one widely used governance model within open source is to have large Foundations, capital L capital F, shepherd projects and portfolios for open source maintainers, enabling them to offload some of the work of maintaining a globally used technology product. However, this brings us back to the problem of aggregated consent and individual consent when working with data about people. When communities choose to work with larger foundations, sometimes a large Foundation, capital L capital F, when does that foundation begin to represent the individuals and collective groups within? What do communities give up when they join a larger organization, and what do they retain? Who ultimately gets to make this decision? Do they know they're making that decision? Do they know who they're making that decision for? Do they know who they continue to make that decision for, and when do they get to change that, should they decide? Can foundations sign up projects and maintainers for larger initiatives or research studies? Can they change mandatory communication and operational platforms, terms, and conditions without the consent of the individual maintainer communities?

I file this under policy because we may not always be aware of what visibility and privacy we are signing commitments to when we are joining a large organization, regardless of the good intent of either the organization or the project maintainers themselves. So what do we require of marginalized people in a community when a larger organization ethically drifts away from their best interests? Do they have to bring themselves, or their identities, forward to confront those changes? How are we thinking about members of our communities when things start to drift and we have ethical challenges that no longer align, and how do we then disentangle from commitments that we may have signed?

This all culminates in how we create community consent models and care of the commons which allow for the people who create the data to have the most control over how it is used, who has access to it, and for what purposes. The platforms, both public and private, both commercial and non-commercial, that a community chooses to use ultimately place additional policy and conditions on their members, in a way which not every person may fully agree to. As researchers, this returns us to the fundamental concept of building with and
not for. Every open source project is a part of a larger socio-technical system; we cannot ignore the people for the bits.

Speaking of not ignoring the people for the bits, you cannot ask me to talk about policy without talking about open source. I absolutely cannot leave any public space these days without taking the time to reinforce these three simple words: open source matters. As someone who works for a big corporation, I am highly and always aware that not only does my precise language build trust, but the washing and generalizing of language allows all sorts of mega evil tyrannical actors to use their power and privilege to avoid accountability and responsibility. Thankfully for me, two of my included references, "Why open source matters" and, just released today, "Open (For Business): Big Tech, Concentrated Power, and the Political Economy of Open AI," speak to this important issue even more thoroughly than we have time for today. I would encourage every person here, and everyone who is not, to read these thoroughly and consider what is in your power to take action on the ideas and challenges presented by them.

So while these challenges haven't been solved in the last 20-ish minutes, I do want to offer some tools to help move open source research and understanding forward. Absolutely shameless self-promotion: together with Julia Ferraioli and Juniper Lovato, we recently published a set of best practices for open source researchers. We cover recommendations in the areas of data ethics, research best practices, respect and equity considerations, and how to approach ecosystem integrity while avoiding reductionist thinking. This is not an exhaustive list, but we hope it is a good starting point.

If you have an appetite for a book and one paper isn't enough, my first recommendation remains Data Feminism, the 2020 book by Catherine
D'Ignazio and Lauren F. Klein. It is expert, nuanced, and highly critical of their own methods, which I always appreciate. And if one book isn't enough, VM Brasseur's book Forge Your Future with Open Source will give you a deeper understanding of open source community work: how it's grown over time, where practitioners are present, what kind of workflows they share online, and why. This isn't a book to teach you about research best practices, but rather to teach you about how open source is the way it is now, and where you can meet ecosystems where they are. And of course, searching through the breadth and depth of all of these references will pile up your paper pile even quicker than I can, including the one that's taking over my dog's bed.

To end, I want to encourage you to keep asking yourself, and each other, the hard questions about how we design problems, what data we collect, and the conclusions we draw about these ecosystems. After all, we're all in this open source raft together. Thank you.

Thank you very much, Amanda. And can I just get everyone to keep applauding. [Applause]

Amanda, thank you for your persistence presenting remotely; it's difficult to present to a non-existent audience, but I can assure you there were giggles at the intellectual property lawyer cat. So thank you very much for your talk. We do have some time for a question or two if there's anything from the audience. Now I'm going to squint and see if we can see through the lights: are there any questions here?

Amanda, there are no immediate questions. I will share with everyone that Amanda will be more present in Discord throughout the conference, and is more active on Mastodon than elsewhere, so if you'd like to reach out after the conference or during the conference, you can do so online. Amanda, we will take a break
now. Thank you very much for joining us remotely, and have a wonderful rest of your evening.

Thank you so much.

Thank you, everyone.