*Music*

Herald: Who of you is using Facebook? Twitter? Diaspora? *concerned noise* All the data you enter there goes to a server and ends up in the hands of somebody who uses it, and the next talk is especially about that, because there are also intelligent machines and intelligent algorithms that try to make something out of that data. So the post-doc researcher Jennifer Helsby of the University of Chicago, who works at the intersection of policy and technology, will now ask you the question: To whom would we give that power?

Dr. Helsby: Thanks. *applause*

Okay, so, today I'm going to do a brief tour of intelligent systems and how they're currently used, and then we're going to look at some examples with respect to the properties we might care about these systems having. I'll talk a little bit about some of the work that's been done in academia on these topics, and then we'll talk about some promising paths forward.

I want to start with Kranzberg's First Law of Technology: technology is neither good nor bad, but it also isn't neutral. Technology shapes our world, and it can act as a liberating force, or as an oppressive and controlling force. So, in this talk, I'm going to focus on some of the aspects of intelligent systems that might be more controlling in nature.

As we all know, because of the rapidly decreasing cost of storage and computation, along with the rise of new sensor technologies, data collection devices are being pushed into every aspect of our lives: in our homes, our cars, in our pockets, on our wrists. Data collection systems act as intermediaries for a huge amount of human communication, and much of this data sits in government and corporate databases.

In order to make use of this data, we need to be able to make some inferences. One way of approaching this is to hire a lot of humans, have them manually examine the data and acquire expert knowledge of the domain, and then perhaps they can make some decisions, or at least some recommendations, based on it.
However, there are some problems with this. One is that it's slow, and thus expensive. It's also biased: we know that humans have all sorts of biases, both conscious and unconscious, and it would be nice to have a system that did not have these inaccuracies. It's also not very transparent: I might not really know the factors that led to some decision being made. Even humans themselves often don't really understand why they came to a given decision, because decisions are partly emotional in nature. And thus these human decision-making systems are often difficult to audit.

So, another way to proceed is that maybe I study the system and the data carefully and write down the best rules for making a decision, or I can have a machine dynamically figure out the best rules, as in machine learning. Maybe this is a better approach. It's certainly fast, and thus cheap. Maybe I can construct the system in such a way that it doesn't have the biases that are inherent in human decision making. And since I've written these rules down, or a computer has learned them, I can just show them to somebody, right? And then they can audit it. So, more and more decision making is being done in this way.

In this model, we take data, we make an inference based on that data using these algorithms, and then we can take actions. And when we take this more scientific approach to making decisions and optimizing for a desired outcome, we can take an experimental approach, so we can determine which actions are most effective in achieving that outcome. Maybe there are some types of communication styles that are most effective with certain people. I can perhaps deploy some individualized incentives to get the outcome that I desire. And if I carefully design an experiment with the environment in which people make these decisions, perhaps even very small changes can introduce significant changes in people's behavior.
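To make that experimental approach concrete, here is a minimal sketch in Python of a randomized comparison between two message variants, assuming made-up variant names and effect sizes; it is purely illustrative and not drawn from any system described in the talk.

```python
import random

random.seed(0)

# Hypothetical experiment: each user is randomly shown one of two message
# variants, and we record whether they performed the desired action.
VARIANTS = {"variant_a": 0.040, "variant_b": 0.052}  # made-up true effect sizes

results = {name: [] for name in VARIANTS}
for _ in range(10_000):
    name = random.choice(list(VARIANTS))       # random assignment
    acted = random.random() < VARIANTS[name]   # did the nudge "work"?
    results[name].append(acted)

for name, outcomes in results.items():
    print(f"{name}: shown {len(outcomes)} times, "
          f"{sum(outcomes) / len(outcomes):.1%} took the desired action")

# In a real system the better-performing variant would then be shown to
# everyone, and the next small tweak would be tested against it.
```

The point is only the shape of the loop: randomly assign, measure the outcome, and keep whatever moves people most toward the desired behavior.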
So, through these mechanisms, and this experimental approach, I can maximize the probability that humans do what I want.

Algorithmic decision making is being used in industry and in lots of other areas, from astrophysics to medicine, and is now moving into new domains, including government applications. We have recommendation engines like Netflix, Yelp, and SoundCloud that direct our attention to what we should watch and listen to. Since 2009, Google has used personalized search results, even if you're not logged in to your Google account. We also have algorithmic curation and filtering, as in the case of Facebook's News Feed, Google News, and Yahoo News, which show you which news articles, for example, you should be looking at. This is important, because a lot of people get their news from these media. We even have algorithmic journalists: automatic systems generate articles about weather, traffic, or sports instead of a human.

Another, more recent application is the use of predictive systems in political campaigns. Political campaigns now take this approach to predict, on an individual basis, which candidate voters are likely to vote for, and then they can target, on an individual basis, those who can be persuaded otherwise. And finally, in the public sector, we're starting to use predictive systems in areas from policing to health, education, and energy.

So, there are some advantages to this. One is that we can automate aspects of our lives that we consider to be mundane, using systems that are intelligent and adaptive enough. We can make use of all the data and really extract the pieces of information we care about. And we can spend money in the most effective way, using this experimental approach to optimize actions to produce desired outcomes.
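As a rough sketch of the individual-level prediction and targeting pattern just described for campaigns, the following Python fragment trains a model on invented voter data, scores unseen voters, and picks the most "persuadable" ones to contact. The features, labels, and thresholds are all hypothetical; no real campaign's pipeline is being reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Hypothetical voter features (age, past turnout, donations, ...) and labels
# (1 if the voter supported the candidate in a past survey), all synthetic.
n_voters = 10_000
X = rng.normal(size=(n_voters, 4))
y = (X @ np.array([0.8, -0.5, 0.3, 0.1]) + rng.normal(scale=1.0, size=n_voters)) > 0

# 1. Inference: learn a model of support from historical data.
model = LogisticRegression().fit(X[:8000], y[:8000])

# 2. Prediction: score every remaining voter.
support_prob = model.predict_proba(X[8000:])[:, 1]

# 3. Action: target only the voters whose predicted support is near 50%,
#    i.e. those the model considers most "persuadable".
persuadable = np.argsort(np.abs(support_prob - 0.5))[:500]
print(f"Targeting {len(persuadable)} of {len(support_prob)} scored voters")
```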
So, we can embed intelligence into all of these mundane objects and enable them to make decisions for us, and that's what we're doing more and more: we can have an object that decides for us what temperature to set our house to, what we should be doing, and so on.

But there might be some implications here. We want these systems that work on this data to increase the opportunities available to us, but it might be that there are some implications that we have not carefully thought through. This is a new area, and people are only starting to scratch the surface of what the problems might be. In some cases, these systems might narrow the options available to people, and this approach subjects people to suggestive messaging intended to nudge them toward a desired outcome. Some people may have a problem with that. Values we care about are not going to be baked into these systems by default.

It's also the case that some algorithmic systems facilitate work that we do not like, for example in the case of mass surveillance. And even the same system, used by different people or organizations, can have very different consequences. For example, if I can predict with high accuracy, based on, say, search queries, who is going to be admitted to a hospital, some people would be interested in knowing that; you might be interested in having your doctor know that. But that same predictive model in the hands of an insurance company has a very different implication.

So, the point here is that these systems structure and influence how humans interact with each other, with society, and with government. And if they constrain what people can do, we should really care about this. So now I'm going to go to a sort of extreme case, just as an example, and that's the Chinese Social Credit System.
This is probably one of the more ambitious uses of data: it is used to rank each citizen in China based on their behavior. Right now, there are various pilot systems deployed by various companies doing this in China. They're currently voluntary, and by 2020 one of these systems, or a combination of them, is going to be chosen and made mandatory for everyone.

In this system, a huge range of data sources about each citizen is used. Some of the data sources are your financial data, your criminal history, how many points you have on your driver's license, and medical information: for example, if you take birth control pills, that's incorporated. Your purchase history is used: if you purchase games, for example, you are down-ranked in the system. Some of the systems, though not all of them, incorporate social media monitoring, which makes sense: if you're a state like China, you probably want to know about the political statements people are making on social media. And one of the more interesting parts is social network analysis: looking at the relationships between people. If you have a close relationship with somebody and they have a low credit score, that can have implications for your own credit score.

The way these scores are generated is secret. According to the call for these systems put out by the government, the goal is to "carry forward the sincerity and traditional virtues" and establish the idea of a "sincerity culture."

But wait, it gets better: there's a portal that enables citizens to look up the citizen score of anyone. And many people like this system; they think it's a fun game. They boast about it on social media and put their score in their dating profile, because if you're ranked highly you're part of an exclusive club: you can get VIP treatment at hotels and other companies.
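To illustrate the social-network component just mentioned, here is a deliberately toy sketch of how a score could be dragged down by a low-scoring contact. Every name, number, and weighting rule here is invented; how the real pilot systems combine their inputs is not public.

```python
# Toy illustration only: hypothetical base scores and a made-up rule that
# blends in the average score of a person's contacts.
base_scores = {
    "alice": 720,
    "bob": 540,    # low score, e.g. from purchases the system dislikes
    "carol": 680,
}
contacts = {
    "alice": ["bob", "carol"],
    "bob": ["alice"],
    "carol": ["alice"],
}

def adjusted_score(person, weight=0.2):
    """Blend a person's own score with the mean score of their contacts."""
    own = base_scores[person]
    friends = contacts[person]
    friend_mean = sum(base_scores[f] for f in friends) / len(friends)
    return (1 - weight) * own + weight * friend_mean

for person in base_scores:
    print(person, round(adjusted_score(person)))
# Alice's score drops simply because she knows Bob, which is exactly the
# guilt-by-association dynamic described above.
```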
The downside, though, is that if you're excluded from that club, your weak score may have other implications, like being unable to get access to credit, housing, or jobs. There is some reporting that even travel visas might be restricted if your score is particularly low.

A system like this, for a state, is really the optimal solution to the problem of the public. It constitutes a very subtle and insidious mechanism of social control. You don't need to spend a lot of money on police or prisons if you can set up a system where people discourage one another from anti-social acts like political action in exchange for a coupon for a free Uber ride.

So, there are a lot of legitimate questions here: What protections does user data have in this scheme? Do any safeguards exist to prevent tampering? What mechanism, if any, is there to prevent false input data from creating erroneous inferences? Is there any way that people can fix their score once they're ranked poorly, or does it end up becoming a self-fulfilling prophecy? Your weak score means you have less access to jobs and credit, and now you have limited access to opportunity.

So, let's take a step back. What do we want? We probably don't want that, but as advocates we really want to understand what questions we should be asking of these systems. Right now there's very little oversight, and we want to make sure that we don't sleepwalk our way into a situation where we've lost even more power to these centralized systems of control. And if you're an implementer, we want to understand what we can be doing better. Are there better ways to implement these systems? Are there values that, as humans, we care about that we should make sure these systems have?

The first thing that most people in the room might think about is privacy, which is, of course, of the utmost importance.
We need privacy, and there is already a good discussion about the importance of protecting user data where possible. So, in this talk, I'm going to focus on the other aspects of algorithmic decision making, which I think have received less attention, because it's not just privacy that we need to worry about here. We also want systems that are fair and equitable. We want transparent systems: we don't want opaque decisions to be made about us, decisions that might have serious impacts on our lives. And we need some accountability mechanisms. For the rest of this talk we're going to go through each of these and look at some examples.

The first thing is fairness. As I said in the beginning, this is one area where there might be an advantage to making decisions by machine, especially in areas where there have historically been fairness issues with decision making, such as law enforcement.

This is one way that police departments use predictive models. The idea is that police would like to allocate resources more effectively, and they would also like to enable proactive policing. If you can predict where crimes are going to occur, or who is going to commit crimes, then you can put cops in those places, or perhaps have them follow those people, and then the crimes will not occur. It's sort of the pre-crime approach.

There are a few ways of going about this. One is individual-level prediction: you take each citizen and estimate the risk that they will participate, say, in violence, based on some data, and then you can flag the people considered particularly high risk. This is currently done in the U.S., in Chicago, by the Chicago Police Department. They maintain a heat list of individuals considered most likely to commit, or be the victim of, violence. And this is done using data that the police maintain.
The features used in this predictive model include things derived from individuals' criminal history: for example, have they been involved in gun violence in the past? Do they have narcotics arrests? And so on. But another thing incorporated in the Chicago Police Department model is information derived from social network analysis: who you interact with, as noted in police data. For example, your co-arrestees, or the people you are seen with when officers conduct field interviews. All of this is incorporated into the risk score.

Another way to proceed, which is the method most companies that sell products like this to the police have taken, is instead to predict which areas are likely to have crimes committed in them. So, I take my city, I put a grid down, and then I use crime statistics, and maybe some ancillary data sources, to determine which areas have the highest risk of crimes occurring in them. I can flag those areas and send police officers to them.

So now, let's look at some of the tools used for this geographic-level prediction. Here are three companies that sell these geographic-level predictive policing systems. PredPol has a system that uses primarily crime statistics: only the time, place, and type of crime, to predict where crimes will occur. HunchLab uses a wider range of data sources, including, for example, weather. And Hitachi has a newer predictive crime analytics tool that also incorporates social media, the first one, to my knowledge, to do so. These systems are in use in more than 50 cities in the U.S.

So, why do police departments buy this? Some police departments are interested in buying systems like this because they're marketed as impartial systems, as a way to police in an unbiased way.
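As a very rough sketch of this grid-based idea (and not any vendor's actual algorithm), the following fragment bins invented incident coordinates into cells and ranks the cells by recorded incident counts. Whatever is in the incident database is what ends up driving where patrols go.

```python
from collections import Counter
import random

random.seed(1)
CELL = 0.005  # hypothetical cell size in degrees, roughly a few blocks

# Made-up historical incidents as (latitude, longitude) pairs.
incidents = [(41.88 + random.gauss(0, 0.01), -87.63 + random.gauss(0, 0.01))
             for _ in range(2000)]

def cell_of(lat, lon):
    """Snap a coordinate to the south-west corner of its grid cell."""
    return (round(lat // CELL * CELL, 3), round(lon // CELL * CELL, 3))

counts = Counter(cell_of(lat, lon) for lat, lon in incidents)

# "Prediction" here is just ranking cells by past recorded counts: the cells
# with the most entries in the database get flagged for patrol.
for cell, n in counts.most_common(5):
    print(f"cell {cell}: {n} recorded incidents -> flagged for patrol")
```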
So, the companies that sell these systems make statements like this (by the way, the references will all be at the end, on the slides). For example, Hitachi's predictive crime analytics claims that the system is anonymous, because it points you to an area; it doesn't tell you to look for a particular person. PredPol reassures people that it eliminates any civil-liberties or profiling concerns. And HunchLab notes that its system fairly represents priorities for public safety and is unbiased by race or ethnicity, for example.

So, let's take a minute to describe in more detail what we mean when we talk about fairness. When we talk about fairness, we mean a few things. One is fairness with respect to individuals: if I'm very similar to somebody, and we go through some process, and there are two very different outcomes to that process, we would consider that unfair. We want similar people to be treated in a similar way.

But there are also certain protected attributes that we wouldn't want someone to discriminate on. So there's this other property, group fairness: we can look at the statistical parity between groups, based on gender, race, and so on, and see whether they're treated in a similar way. We might not expect that in some cases, for example if the base rates in each group are very different.

And then there's also fairness in errors. All predictive systems are going to make errors, and if the errors are concentrated, then that may also represent unfairness. This concern arose recently with Facebook, because people with Native American names had their profiles flagged as fraudulent far more often than those with white American names.

So these are the sorts of things we worry about, and each of these can be expressed as a metric; if you're interested in more, you should check those two papers out.
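For a sense of what these metrics look like in practice, here is a small sketch on made-up audit numbers: statistical parity compares how often two groups are flagged, and fairness in errors asks whether the system's mistakes are concentrated in one group. The groups and counts are hypothetical.

```python
# Hypothetical audit data: for each group, how many people were flagged,
# and how many of the flagged people turned out to be false positives.
audit = {
    "group_a": {"total": 1000, "flagged": 80, "false_positives": 20},
    "group_b": {"total": 1000, "flagged": 230, "false_positives": 110},
}

def flag_rate(g):
    return audit[g]["flagged"] / audit[g]["total"]

def false_positive_share(g):
    return audit[g]["false_positives"] / audit[g]["flagged"]

# Statistical parity: ratio of flagging rates between groups (1.0 = parity).
parity_ratio = flag_rate("group_b") / flag_rate("group_a")
print(f"flag rates: {flag_rate('group_a'):.1%} vs {flag_rate('group_b'):.1%}")
print(f"statistical parity ratio: {parity_ratio:.2f}")

# Fairness in errors: are the system's mistakes concentrated in one group?
print(f"false positive share among flagged: "
      f"{false_positive_share('group_a'):.1%} vs {false_positive_share('group_b'):.1%}")
```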
So, how can potential issues with predictive policing have implications for these principles? One problem is the training data that's used. Some of these systems only use crime statistics, and all of them use crime statistics in some way. The problem is that crime databases contain only crimes that have been detected. Right? The police are only going to detect crimes that they know are happening, either through patrol and their own investigation, or because they've been alerted to a crime, for example by a citizen calling the police. So a citizen has to feel like they *can* call the police, like that's a good idea. Some crimes suffer from this problem less than others: gun violence, for example, is much easier to detect than fraud, which is very difficult to detect.

Now, the racial profiling aspect might come in because of biased policing in the past. For example, for marijuana, black people are arrested in the U.S. at rates four times that of white people, even though the rates of use in the two groups are the same to within a few percent. So, this is where problems can arise.

Let's go back to geographic-level predictive policing. The danger here is that, unless the system is very carefully constructed, this sort of crime-area ranking might again become a self-fulfilling prophecy. If you send police officers to these areas, you further scrutinize them, and then again you're only detecting a subset of crimes, and the cycle continues.

One obvious issue is that the claim about geographic-based crime prediction being anonymous is not true, because race and location are very strongly correlated in the U.S., and that is something machine-learning systems can potentially learn.

Another issue concerns individual fairness: suppose my home sits within one of these boxes.
Some of these boxes are very small; in PredPol, for example, a box is 500 by 500 feet, so maybe only a few houses. The implication is that you might have police officers sitting in a cruiser outside your home, while a few doors down someone may not be within that box and doesn't have this. That may represent unfairness.

So, there are real questions here, especially because there's no opt-out. There's no way to opt out of this system: if you live in a city that has it, you have to deal with it.

And it's quite difficult to find out what's really going on, because the algorithm is secret. In most cases, we don't know the full details of the inputs; we have some idea about what features are used, but that's about it. We also don't know the output: that would mean knowing police allocation and police strategies. In order to nail down what's really going on here, and to verify the validity of these companies' claims, it may be necessary to have a third party come in, examine the inputs and outputs of the system, and say concretely what's going on. If everything is fine and dandy, then this shouldn't be a problem. That's potentially one role that advocates can play: maybe we should start pushing for audits of systems used in this way, because they can have serious implications for people's lives.

We'll return to this idea a little bit later, but for now this leads us nicely to transparency. We want to know what these systems are doing, but it's very hard, for the reasons described earlier. Even in the case of something like trying to understand Google's search algorithm, it's difficult because it's personalized: by construction, each user is only seeing one endpoint. So, it's a very isolating system.
What do other people see? One reason it's difficult to make some of these systems transparent is, simply, the complexity of the algorithms. An algorithm can become so complex that it's difficult to comprehend, even for the designer or the implementer of the system. The designer might know that the algorithm maximizes some metric, say accuracy, but they may not always have a solid understanding of what the algorithm is doing for all inputs, certainly with respect to fairness. So, in some cases, it might not be appropriate to use an extremely complex model; it might be better to use a simpler system with human-interpretable features.

Another issue that arises from the opacity of these systems and their centralized control is that it makes them very influential, and thus an excellent target for manipulation or tampering. This might be tampering done by the organization that controls the system, by an insider at one of these organizations, or by anyone who is able to compromise their security.

There is an interesting academic work that looked at the possibility of slightly modifying search rankings to shift people's political views. People are most likely to click on the top search results (90% of clicks go to the first page of results), so perhaps by reshuffling things a little bit, or by dropping some search results, you can influence people's views in a coherent way, and maybe you can make it so subtle that no one is able to notice.

In this academic study, they ran an experiment around the 2014 Indian election. They used real voters, and they kept the size of the experiment small enough that it was not going to influence the outcome of the election.
The researchers took people, determined their political leaning, and segmented them into control and treatment groups, where the treatment was manipulation of the search ranking results. Then they had these people browse the web. What they found is that this mechanism is very effective at shifting people's voter preferences: in this study, they were able to introduce a 20% shift. Even alerting users to the fact that this was going to be done, telling them "we are going to manipulate your search results, really pay attention," did not decrease the magnitude of the effect.

The margins in many elections are incredibly small, and the authors estimate that this shift could change the outcome of about 25% of elections worldwide, if it were done. And the bias is so small that no one can tell. All humans, no matter how smart and resistant to manipulation we think we are, are subject to this sort of manipulation, and we really can't tell. I'm not saying that this is occurring, but right now there is no regulation to stop it, and there is no way we could reliably detect it, so there's a huge amount of power here. Something to think about.

But it's not only corporations that are interested in this sort of behavioral manipulation. In 2010, UK Prime Minister David Cameron created the UK Behavioural Insights Team, informally called the Nudge Unit. What they do is use behavioral science and this predictive-analytics approach, with experimentation, to have people make better decisions for themselves and society, as determined by the UK government. And as of a few months ago, after an executive order signed by Obama in September, the United States now has its own Nudge Unit.

To be clear, I don't think this is some sort of malicious plot.
I think there *can* be huge value in these sorts of initiatives, positively impacting people's lives. But when this sort of behavioral manipulation is being done, even in part openly, oversight is pretty important, and we really need to consider what these systems are optimizing for. And that's something we might not always know, or at least understand.

For industry, we do have a pretty good understanding: industry cares about optimizing for time spent on the website. Facebook wants you to spend more time on Facebook; they want you to click on ads, click on newsfeed items, they want you to like things. And, fundamentally: profit. Already this has had some pretty serious implications in the last 10 years, in media for example: optimizing for click-through rate in journalism has produced a race to the bottom in terms of quality.

Another issue is that optimizing for what people like might not always be the best approach. Facebook officials have said publicly that Facebook's goal is to make you happy: they want you to open that newsfeed and just feel great. But there's an issue there, right? Because a lot of people, about 40% according to Pew Research, get their news from Facebook. If people don't want to see war and corpses, because it makes them feel sad, then this is not a system that is going to optimize for an informed population. It's not going to produce a population that is ready to engage in civic life. It's going to produce an amused population whose time is occupied by cat pictures.

In politics, we have a similar optimization problem. The political campaigns that use these predictive systems are optimizing for votes for the desired candidate, of course.
So, instead of a political campaign being (and maybe this is a naive view) an open discussion of the issues facing the country, it becomes this micro-targeted persuasion game, and the people who get targeted are a very small subset of all people: only the people who are on the edge, maybe disinterested. Those are the people who are going to get attention from political candidates.

In policy, as with these Nudge Units, these systems are being used to enable better use of government services. There are some good projects that have come out of this: increasing voter registration, improving health outcomes, improving education outcomes. But some of the predictive systems that we're starting to see in government are optimizing for compliance, as is the case with predictive policing. So this is something that we need to watch carefully.

I think this is a nice quote that sort of describes the problem. In some ways we might be narrowing our horizon, and the danger is that these tools are separating people. This is particularly bad for political action, because political action requires people to have shared experience, so that they are able to act collectively to exert pressure to fix problems.

So, finally: accountability. We need some oversight mechanisms, for example in the case of errors. This is particularly important for civil or bureaucratic systems. When an algorithm produces some decision, we don't always want humans to just defer to the machine, and that deference might be part of the problem. There are starting to be cases of computer algorithms yielding a decision and humans being unable to correct an obvious error.
There's a case in Georgia, in the United States, where two young people, twins, went to the Department of Motor Vehicles to get their driver's licenses. However, they were both flagged by a fraud algorithm that uses facial recognition to look for similar faces, and I guess the people who designed the system didn't think of the possibility of twins. Yeah. So they just left without their driver's licenses; the people at the Department of Motor Vehicles were unable to correct this. That is one kind of implication: it's like something out of Kafka.

But there are also cases of errors being made and people not noticing until after actions have been taken, some of them very serious, because people simply deferred to the machine. This is an example from San Francisco. An ALPR, an Automated License Plate Reader, is a device that uses image recognition to detect and read license plates, and usually to compare them against a known list of plates of interest. San Francisco uses these, mounted on police cars.

In this case, a San Francisco ALPR got a hit on a car, and it was the car of a 47-year-old woman with no criminal history. It was a false hit: the image was blurry, and it matched erroneously against one of the plates of interest, which happened to belong to a stolen vehicle. So they conducted a traffic stop on her, they took her out of the vehicle, they searched her and the vehicle, she got a pat-down, and they had her kneel at gunpoint, in the street.

So, how much oversight should be present depends on the implications of the system.
It's certainly the case that for some of these decision-making systems, an error might not be that important; it could be relatively harmless. But in this case, an error in an algorithmic decision led to a totally innocent person literally having a gun pointed at her.

That brings us to the fact that we need some way of getting information about what is going on in these systems. We don't want to have to wait for events like these before we're able to learn anything about a system. Auditing is one option: independently verifying the statements of companies in situations where we can see inputs and outputs. For example, this could be done with Google or Facebook. If you have the inputs of a system, say from test accounts or real accounts, maybe you can pool people's information together.

That was done during the 2012 Obama campaign by ProPublica. People noticed that they were getting different emails from the Obama campaign, and were interested to see which factors the emails were being varied on. I think about 200 people submitted emails, and from that they were able to determine some information about how the emails were being varied. So there have been some successful attempts at this: compare inputs, then look at why one item was shown to one user and not another, and see if there are any statistical differences. There are some potential legal issues with test accounts, so that's something to think about; I'm not a lawyer.

So, for example, if you want to examine ad-targeting algorithms, one way to proceed is to construct a browsing profile and then examine what ads are served back to you.
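Before turning to the academic study, here is a bare-bones sketch of the comparison such an audit ultimately boils down to: count how often the ad of interest is served to each group of otherwise identical profiles and check whether the difference is larger than chance would explain. The counts below are invented.

```python
import math

# Hypothetical audit result: number of browsing sessions per treatment group
# and how many of those sessions were served the ad of interest.
served = {"treatment_1_male": 190, "treatment_2_female": 35}
sessions = {"treatment_1_male": 500, "treatment_2_female": 500}

def rate(group):
    return served[group] / sessions[group]

def two_proportion_z(g1, g2):
    """z statistic for the difference in serving rates between two groups."""
    p1, p2 = rate(g1), rate(g2)
    n1, n2 = sessions[g1], sessions[g2]
    pooled = (served[g1] + served[g2]) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

g1, g2 = "treatment_1_male", "treatment_2_female"
print(f"serving rates: {rate(g1):.1%} vs {rate(g2):.1%} "
      f"(ratio {rate(g1) / rate(g2):.1f}x)")
# A large |z| means the gap is very unlikely to be chance alone.
print(f"z = {two_proportion_z(g1, g2):.1f}")
```

Real tools automate the tedious parts of this, such as building the profiles, randomizing assignment, and collecting the served ads.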
807 00:34:12,989 --> 00:34:14,119 And so this is something that 808 00:34:14,119 --> 00:34:16,250 academic researchers have looked at, 809 00:34:16,250 --> 00:34:17,489 because, at the time at least, 810 00:34:17,489 --> 00:34:20,879 you didn't need to make an account to do this. 811 00:34:20,879 --> 00:34:24,768 So, this was a study that was presented at 812 00:34:24,768 --> 00:34:27,799 Privacy Enhancing Technologies last year, 813 00:34:27,799 --> 00:34:31,149 and in this study, the researchers 814 00:34:31,149 --> 00:34:33,179 generate some browsing profiles 815 00:34:33,179 --> 00:34:35,909 that differ only by one characteristic, 816 00:34:35,909 --> 00:34:37,690 so they're basically identical in every way 817 00:34:37,690 --> 00:34:39,049 except for one thing. 818 00:34:39,049 --> 00:34:42,359 And that is denoted by Treatment 1 and 2. 819 00:34:42,359 --> 00:34:44,460 So this is a randomized, controlled trial, 820 00:34:44,460 --> 00:34:46,389 but I left out the randomization part 821 00:34:46,389 --> 00:34:48,220 for simplicity. 822 00:34:48,220 --> 00:34:54,799 So, in one study, they applied a treatment of gender. 823 00:34:54,799 --> 00:34:56,799 So, they had the browsing profiles 824 00:34:56,799 --> 00:34:59,319 in Treatment 1 be male browsing profiles, 825 00:34:59,319 --> 00:35:02,029 and the browsing profiles in Treatment 2 be female. 826 00:35:02,029 --> 00:35:04,430 And they wanted to see: is there any difference 827 00:35:04,430 --> 00:35:06,079 in the way that ads are targeted 828 00:35:06,079 --> 00:35:08,710 if browsing profiles are effectively identical 829 00:35:08,710 --> 00:35:11,019 except for gender? 830 00:35:11,019 --> 00:35:14,710 So, it turns out that there *was*. 831 00:35:14,710 --> 00:35:19,180 So, a 3rd-party site was showing Google ads 832 00:35:19,180 --> 00:35:21,289 for senior executive positions 833 00:35:21,289 --> 00:35:23,980 at a rate 6 times higher to the fake men 834 00:35:23,980 --> 00:35:27,059 than to the fake women in this study. 835 00:35:27,059 --> 00:35:30,109 So, this sort of auditing is not going to 836 00:35:30,109 --> 00:35:32,779 be able to determine everything 837 00:35:32,779 --> 00:35:34,930 that algorithms are doing, but it can 838 00:35:34,930 --> 00:35:36,519 sometimes uncover interesting, 839 00:35:36,519 --> 00:35:40,900 at least statistical, differences. 840 00:35:40,900 --> 00:35:47,099 So, this leads us to the fundamental issue: 841 00:35:47,099 --> 00:35:49,180 Right now, we're really not in control 842 00:35:49,180 --> 00:35:50,510 of some of these systems, 843 00:35:50,510 --> 00:35:54,480 and we really need these predictive systems 844 00:35:54,480 --> 00:35:56,119 to be controlled by us, 845 00:35:56,119 --> 00:35:57,819 in order for them not to be used 846 00:35:57,819 --> 00:36:00,109 as a system of control. 847 00:36:00,109 --> 00:36:03,220 So there are some technologies that I'd like 848 00:36:03,220 --> 00:36:06,890 to point you all to. 849 00:36:06,890 --> 00:36:08,319 We need tools in the digital commons 850 00:36:08,319 --> 00:36:11,160 that can help address some of these concerns. 851 00:36:11,160 --> 00:36:13,349 So, the first thing is that of course 852 00:36:13,349 --> 00:36:14,730 we know that minimizing the amount of 853 00:36:14,730 --> 00:36:17,069 data available can help in some contexts, 854 00:36:17,069 --> 00:36:18,980 which we can do by making systems 855 00:36:18,980 --> 00:36:22,779 that are private by design, and by default.
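Coming back to the audit experiment above: once ad observations have been collected for the two treatment groups, the comparison itself can be simple. A toy sketch with invented numbers, using a permutation test to ask whether the gap between groups is larger than chance:

```python
# Toy sketch of an ad-targeting audit: compare how often a given ad
# category was shown to two treatment groups of otherwise-identical
# browsing profiles, and test the difference with a permutation test.
# All counts here are fabricated for illustration.
import random

# Hypothetical counts of "senior executive" job ads shown to each profile.
treatment_1 = [6, 8, 7, 9, 5, 7, 8, 6, 7, 9]   # e.g. profiles declared male
treatment_2 = [1, 2, 0, 1, 2, 1, 0, 2, 1, 1]   # e.g. profiles declared female

def mean(xs):
    return sum(xs) / len(xs)

observed_diff = mean(treatment_1) - mean(treatment_2)

# Permutation test: shuffle the group labels many times and count how often
# a difference at least as large arises by chance.
pooled = treatment_1 + treatment_2
n1 = len(treatment_1)
random.seed(0)
trials = 10_000
extreme = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = mean(pooled[:n1]) - mean(pooled[n1:])
    if abs(diff) >= abs(observed_diff):
        extreme += 1

p_value = extreme / trials
print(f"observed difference: {observed_diff:.2f} ads per profile")
print(f"permutation p-value: {p_value:.4f}")
```

The statistics here are deliberately minimal; dedicated auditing toolkits automate the treatment assignment, data collection, and significance testing at much larger scale.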
856 00:36:22,779 --> 00:36:24,549 Another thing is that these audit tools 857 00:36:24,549 --> 00:36:25,890 might be useful. 858 00:36:25,890 --> 00:36:30,720 And, so, these 2 nice examples in academia... 859 00:36:30,720 --> 00:36:34,359 the ad experiment that I just showed was done 860 00:36:34,359 --> 00:36:36,120 using AdFisher. 861 00:36:36,120 --> 00:36:38,200 So, these are 2 toolkits that you can use 862 00:36:38,200 --> 00:36:41,440 to start doing this sort of auditing. 863 00:36:41,440 --> 00:36:44,579 Another technology that is generally useful, 864 00:36:44,579 --> 00:36:46,700 but particularly in the case of prediction 865 00:36:46,700 --> 00:36:48,789 it's useful to maintain access to 866 00:36:48,789 --> 00:36:50,289 as many sites as possible, 867 00:36:50,289 --> 00:36:52,589 through anonymity systems like Tor, 868 00:36:52,589 --> 00:36:54,319 because it's impossible to personalize 869 00:36:54,319 --> 00:36:55,650 when everyone looks the same. 870 00:36:55,650 --> 00:36:59,130 So this is a very important technology. 871 00:36:59,130 --> 00:37:01,519 Something that doesn't really exist, 872 00:37:01,519 --> 00:37:03,630 but that I think is pretty important, 873 00:37:03,630 --> 00:37:05,829 is having some tool to view the landscape. 874 00:37:05,829 --> 00:37:08,160 So, as we know from these few studies 875 00:37:08,160 --> 00:37:10,440 that have been done, 876 00:37:10,440 --> 00:37:12,059 different people are not seeing the internet 877 00:37:12,059 --> 00:37:12,950 in the same way. 878 00:37:12,950 --> 00:37:15,730 This is one reason why we don't like censorship. 879 00:37:15,730 --> 00:37:17,880 But, rich and poor people, 880 00:37:17,880 --> 00:37:19,659 from academic research we know that 881 00:37:19,659 --> 00:37:23,790 there is widespread price discrimination on the internet, 882 00:37:23,790 --> 00:37:25,650 so rich and poor people see a different view 883 00:37:25,650 --> 00:37:26,970 of the Internet, 884 00:37:26,970 --> 00:37:28,400 men and women see a different view 885 00:37:28,400 --> 00:37:29,940 of the Internet. 886 00:37:29,940 --> 00:37:31,200 We wanna know how different people 887 00:37:31,200 --> 00:37:32,450 see the same site, 888 00:37:32,450 --> 00:37:34,329 and this could be the beginning of 889 00:37:34,329 --> 00:37:36,329 a defense system for this sort of 890 00:37:36,329 --> 00:37:41,730 manipulation/tampering that I showed earlier. 891 00:37:41,730 --> 00:37:45,549 Another interesting approach is obfuscation: 892 00:37:45,549 --> 00:37:46,980 injecting noise into the system. 893 00:37:46,980 --> 00:37:49,190 So there's an interesting browser extension 894 00:37:49,190 --> 00:37:51,720 called AdNauseam, that's for Firefox, 895 00:37:51,720 --> 00:37:54,579 which clicks on every single ad you're served, 896 00:37:54,579 --> 00:37:55,680 to inject noise. 897 00:37:55,680 --> 00:37:57,019 So that's, I think, an interesting approach 898 00:37:57,019 --> 00:38:00,170 that people haven't looked at too much. 899 00:38:00,170 --> 00:38:03,780 So in terms of policy, 900 00:38:03,780 --> 00:38:06,530 Facebook and Google, these internet giants, 901 00:38:06,530 --> 00:38:08,829 have billions of users, 902 00:38:08,829 --> 00:38:12,220 and sometimes they like to call themselves 903 00:38:12,220 --> 00:38:13,769 new public utilities, 904 00:38:13,769 --> 00:38:15,000 and if that's the case then 905 00:38:15,000 --> 00:38:17,549 it might be necessary to subject them 906 00:38:17,549 --> 00:38:20,539 to additional regulation.
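Coming back to the obfuscation idea for a moment: here is a toy illustration, not how AdNauseam is actually built, of why injecting random clicks blunts profiling. Mixing uniform random ad clicks into a user's real clicks flattens the observed interest distribution, which we can see by comparing its entropy before and after; the categories and counts are invented.

```python
# Toy illustration of obfuscation by noise injection: mixing uniform random
# ad clicks into a user's real clicks flattens the interest profile, so the
# observed distribution reveals less about the user. Numbers are invented.
import math
import random

categories = ["politics", "travel", "health", "finance", "sports"]

# Hypothetical "real" clicks, heavily concentrated on one interest.
real_clicks = ["health"] * 40 + ["politics"] * 5 + ["travel"] * 5

# Obfuscation: add one uniformly random click per real click.
random.seed(0)
noise_clicks = [random.choice(categories) for _ in real_clicks]

def entropy(clicks):
    """Shannon entropy (bits) of the click distribution over categories."""
    counts = {c: clicks.count(c) for c in set(clicks)}
    total = len(clicks)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

print(f"entropy without noise: {entropy(real_clicks):.2f} bits")
print(f"entropy with noise:    {entropy(real_clicks + noise_clicks):.2f} bits")
# Higher entropy means the observed profile is less informative
# about the user's true interests.
```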
907 00:38:20,539 --> 00:38:21,990 Another problem that's come up, 908 00:38:21,990 --> 00:38:23,539 for example with some of the studies 909 00:38:23,539 --> 00:38:24,900 that Facebook has done, 910 00:38:24,900 --> 00:38:29,039 is sometimes a lack of ethics review. 911 00:38:29,039 --> 00:38:31,059 So, for example, in academia, 912 00:38:31,059 --> 00:38:33,859 if you're gonna do research involving humans, 913 00:38:33,859 --> 00:38:35,390 there's an Institutional Review Board 914 00:38:35,390 --> 00:38:36,970 that you go to that verifies that 915 00:38:36,970 --> 00:38:39,140 you're doing things in an ethical manner. 916 00:38:39,140 --> 00:38:40,910 And some companies do have internal 917 00:38:40,910 --> 00:38:43,029 review processes like this, but it might 918 00:38:43,029 --> 00:38:45,119 be important to have an independent 919 00:38:45,119 --> 00:38:48,200 ethics board that does this sort of thing. 920 00:38:48,200 --> 00:38:50,849 And we *really* need 3rd-party auditing. 921 00:38:50,849 --> 00:38:54,519 So, for example, some companies 922 00:38:54,519 --> 00:38:56,220 don't want auditing to be done 923 00:38:56,220 --> 00:38:59,190 because of IP concerns, 924 00:38:59,190 --> 00:39:00,579 and if that's the concern 925 00:39:00,579 --> 00:39:03,180 maybe having a set of people 926 00:39:03,180 --> 00:39:05,680 that are not paid by the company 927 00:39:05,680 --> 00:39:07,200 to check how some of these systems 928 00:39:07,200 --> 00:39:08,640 are being implemented, 929 00:39:08,640 --> 00:39:11,240 could help give us confidence that 930 00:39:11,240 --> 00:39:16,979 things are being done in a reasonable way. 931 00:39:16,979 --> 00:39:20,269 So, in closing, 932 00:39:20,269 --> 00:39:23,180 algorithmic decision making is here, 933 00:39:23,180 --> 00:39:26,140 and it's barreling forward at a very fast rate, 934 00:39:26,140 --> 00:39:27,890 and we need to figure out what 935 00:39:27,890 --> 00:39:30,410 the guide rails should be, 936 00:39:30,410 --> 00:39:31,380 and how to install them 937 00:39:31,380 --> 00:39:33,119 to handle some of the potential threats. 938 00:39:33,119 --> 00:39:35,470 There's a huge amount of power here. 939 00:39:35,470 --> 00:39:37,910 We need more openness in these systems. 940 00:39:37,910 --> 00:39:39,589 And, right now, 941 00:39:39,589 --> 00:39:41,559 with the intelligent systems that do exist, 942 00:39:41,559 --> 00:39:43,920 we don't know what's occurring really, 943 00:39:43,920 --> 00:39:46,510 and we need to watch carefully 944 00:39:46,510 --> 00:39:49,099 where and how these systems are being used. 945 00:39:49,099 --> 00:39:50,690 And I think this community has 946 00:39:50,690 --> 00:39:53,940 an important role to play in this fight, 947 00:39:53,940 --> 00:39:55,730 to study what's being done, 948 00:39:55,730 --> 00:39:57,160 to show people what's being done, 949 00:39:57,160 --> 00:39:58,670 to raise the debate and advocate, 950 00:39:58,670 --> 00:40:01,200 and, where necessary, to resist. 951 00:40:01,200 --> 00:40:03,339 Thanks. 952 00:40:03,339 --> 00:40:13,129 *applause* 953 00:40:13,129 --> 00:40:17,519 Herald: So, let's have a question and answer. 954 00:40:17,519 --> 00:40:19,080 Microphone 2, please. 955 00:40:19,080 --> 00:40:20,199 Mic 2: Hi there. 956 00:40:20,199 --> 00:40:23,259 Thanks for the talk. 
957 00:40:23,259 --> 00:40:26,230 Since these pre-crime softwares also 958 00:40:26,230 --> 00:40:27,359 arrived here in Germany 959 00:40:27,359 --> 00:40:29,680 with the start of the so-called CopWatch system 960 00:40:29,680 --> 00:40:32,779 in southern Germany, and Bavaria and Nuremberg especially, 961 00:40:32,779 --> 00:40:35,420 where they try to predict burglary crime 962 00:40:35,420 --> 00:40:37,460 using that criminal record 963 00:40:37,460 --> 00:40:40,170 geographical analysis, like you explained, 964 00:40:40,170 --> 00:40:43,380 leads me to a 2-fold question: 965 00:40:43,380 --> 00:40:47,900 first, have you heard of any research 966 00:40:47,900 --> 00:40:49,760 that measures the effectiveness 967 00:40:49,760 --> 00:40:53,690 of such measures, at all? 968 00:40:53,690 --> 00:40:57,040 And, second: 969 00:40:57,040 --> 00:41:00,599 What do you think of the game theory 970 00:41:00,599 --> 00:41:02,690 if the thieves or the bad guys 971 00:41:02,690 --> 00:41:07,619 know the system, and when they game the system, 972 00:41:07,619 --> 00:41:09,980 they will probably win, 973 00:41:09,980 --> 00:41:11,640 since one police officer in an interview said 974 00:41:11,640 --> 00:41:14,019 this system is used to reduce 975 00:41:14,019 --> 00:41:16,460 the personal costs of policing, 976 00:41:16,460 --> 00:41:19,460 so they just send the guys where the red flags are, 977 00:41:19,460 --> 00:41:22,290 and the others take the day off. 978 00:41:22,290 --> 00:41:24,360 Dr. Helsby: Yup. 979 00:41:24,360 --> 00:41:27,150 Um, so, with respect to 980 00:41:27,150 --> 00:41:30,990 testing the effectiveness of predictive policing, 981 00:41:30,990 --> 00:41:31,990 the companies, 982 00:41:31,990 --> 00:41:33,910 some of them do randomized, controlled trials 983 00:41:33,910 --> 00:41:35,240 and claim a reduction in policing. 984 00:41:35,240 --> 00:41:38,349 The best independent study that I've seen 985 00:41:38,349 --> 00:41:40,680 is by the RAND Corporation 986 00:41:40,680 --> 00:41:43,120 that did a study in, I think, 987 00:41:43,120 --> 00:41:44,920 Shreveport, Louisiana, 988 00:41:44,920 --> 00:41:47,589 and in their report they claim 989 00:41:47,589 --> 00:41:50,190 that there was no statistically significant 990 00:41:50,190 --> 00:41:52,900 difference, they didn't find any reduction. 991 00:41:52,900 --> 00:41:54,099 And it *was* specifically looking at 992 00:41:54,099 --> 00:41:56,730 property crime, which I think you mentioned. 993 00:41:56,730 --> 00:41:59,480 So, I think right now there's sort of 994 00:41:59,480 --> 00:42:01,069 conflicting reports between 995 00:42:01,069 --> 00:42:06,180 the independent auditors and these company claims. 996 00:42:06,180 --> 00:42:09,289 So there definitely needs to be more study. 997 00:42:09,289 --> 00:42:12,240 And then, the 2nd thing...sorry, remind me what it was? 998 00:42:12,240 --> 00:42:15,189 Mic 2: What about the guys gaming the system? 999 00:42:15,189 --> 00:42:16,949 Dr. Helsby: Oh, yeah. 1000 00:42:16,949 --> 00:42:18,900 I think it's a legitimate concern. 1001 00:42:18,900 --> 00:42:22,480 Like, if all the outputs were just immediately public, 1002 00:42:22,480 --> 00:42:24,599 then, yes, everyone knows the location 1003 00:42:24,599 --> 00:42:26,549 of all police officers, 1004 00:42:26,549 --> 00:42:29,009 and I imagine that people would have 1005 00:42:29,009 --> 00:42:30,779 a problem with that. 1006 00:42:30,779 --> 00:42:32,679 Yup. 1007 00:42:32,679 --> 00:42:35,990 Herald: Microphone #4, please.
1008 00:42:35,990 --> 00:42:39,369 Mic 4: Yeah, this is not actually a question, 1009 00:42:39,369 --> 00:42:40,779 but just a comment. 1010 00:42:40,779 --> 00:42:42,970 I've enjoyed your talk very much, 1011 00:42:42,970 --> 00:42:47,789 in particular after watching 1012 00:42:47,789 --> 00:42:52,270 the talk in Hall 1 earlier in the afternoon. 1013 00:42:52,270 --> 00:42:55,730 The "Say Hi to Your New Boss", about 1014 00:42:55,730 --> 00:42:59,609 algorithms that are trained with big data, 1015 00:42:59,609 --> 00:43:02,390 and finally make decisions. 1016 00:43:02,390 --> 00:43:08,210 And I think these 2 talks are kind of complementary, 1017 00:43:08,210 --> 00:43:11,309 and if people are interested in the topic 1018 00:43:11,309 --> 00:43:14,710 they might want to check out the other talk 1019 00:43:14,710 --> 00:43:16,259 and watch it later, because these 1020 00:43:16,259 --> 00:43:17,319 fit very well together. 1021 00:43:17,319 --> 00:43:19,589 Dr. Helsby: Yeah, it was a great talk. 1022 00:43:19,589 --> 00:43:22,130 Herald: Microphone #2, please. 1023 00:43:22,130 --> 00:43:25,049 Mic 2: Um, yeah, you mentioned 1024 00:43:25,049 --> 00:43:27,319 the need to have some kind of 3rd-party auditing 1025 00:43:27,319 --> 00:43:30,900 or some kind of way to 1026 00:43:30,900 --> 00:43:31,930 peek into these algorithms 1027 00:43:31,930 --> 00:43:33,079 and to see what they're doing, 1028 00:43:33,079 --> 00:43:34,420 and to see if they're being fair. 1029 00:43:34,420 --> 00:43:36,199 Can you talk a little bit more about that? 1030 00:43:36,199 --> 00:43:38,059 Like, going forward, 1031 00:43:38,059 --> 00:43:40,690 some kind of regulatory structures 1032 00:43:40,690 --> 00:43:44,200 would probably have to emerge 1033 00:43:44,200 --> 00:43:47,200 to analyze and to look at 1034 00:43:47,200 --> 00:43:49,339 these black boxes that are just sort of 1035 00:43:49,339 --> 00:43:51,309 popping up everywhere and, you know, 1036 00:43:51,309 --> 00:43:52,939 controlling more and more of the things 1037 00:43:52,939 --> 00:43:56,150 in our lives, and important decisions. 1038 00:43:56,150 --> 00:43:58,539 So, just, what kind of discussions 1039 00:43:58,539 --> 00:43:59,460 are there for that? 1040 00:43:59,460 --> 00:44:01,809 And what kind of possibility is there for that? 1041 00:44:01,809 --> 00:44:04,900 And, I'm sure that companies would be 1042 00:44:04,900 --> 00:44:08,000 very, very resistant to 1043 00:44:08,000 --> 00:44:09,890 any kind of attempt to look into 1044 00:44:09,890 --> 00:44:13,890 algorithms, and to... 1045 00:44:13,890 --> 00:44:15,070 Dr. Helsby: Yeah, I mean, definitely 1046 00:44:15,070 --> 00:44:18,069 companies would be very resistant to 1047 00:44:18,069 --> 00:44:19,670 having people look into their algorithms. 1048 00:44:19,670 --> 00:44:22,190 So, if you wanna do a very rigorous 1049 00:44:22,190 --> 00:44:23,339 audit of what's going on 1050 00:44:23,339 --> 00:44:25,660 then it's probably necessary to have 1051 00:44:25,660 --> 00:44:26,589 a few people come in 1052 00:44:26,589 --> 00:44:28,900 and sign NDAs, and then 1053 00:44:28,900 --> 00:44:31,039 look through the systems. 1054 00:44:31,039 --> 00:44:33,140 So, that's one way to proceed. 
1055 00:44:33,140 --> 00:44:35,049 But, another way to proceed that-- 1056 00:44:35,049 --> 00:44:38,720 so, these academic researchers have done 1057 00:44:38,720 --> 00:44:40,009 a few experiments 1058 00:44:40,009 --> 00:44:42,809 and found some interesting things, 1059 00:44:42,809 --> 00:44:45,500 and that's sort of all the attempts at auditing 1060 00:44:45,500 --> 00:44:46,450 that we've seen: 1061 00:44:46,450 --> 00:44:48,490 there was 1 attempt in 2012 for the Obama campaign, 1062 00:44:48,490 --> 00:44:49,910 but there's really not been any 1063 00:44:49,910 --> 00:44:51,500 sort of systematic attempt-- 1064 00:44:51,500 --> 00:44:52,589 you know, like, in censorship 1065 00:44:52,589 --> 00:44:54,539 we see a systematic attempt to 1066 00:44:54,539 --> 00:44:56,779 do measurement as often as possible, 1067 00:44:56,779 --> 00:44:58,240 check what's going on, 1068 00:44:58,240 --> 00:44:59,339 and that itself, you know, 1069 00:44:59,339 --> 00:45:00,900 can act as an oversight mechanism. 1070 00:45:00,900 --> 00:45:01,880 But, right now, 1071 00:45:01,880 --> 00:45:03,900 I think many of these companies 1072 00:45:03,900 --> 00:45:05,259 realize no one is watching, 1073 00:45:05,259 --> 00:45:07,160 so there's no real push to have 1074 00:45:07,160 --> 00:45:10,440 people verify: are you being fair when you 1075 00:45:10,440 --> 00:45:11,539 implement this system? 1076 00:45:11,539 --> 00:45:12,969 Because no one's really checking. 1077 00:45:12,969 --> 00:45:13,980 Mic 2: Do you think that, 1078 00:45:13,980 --> 00:45:15,339 at some point, it would be like 1079 00:45:15,339 --> 00:45:19,059 an FDA or SEC, to give some American examples... 1080 00:45:19,059 --> 00:45:21,490 an actual government regulatory agency 1081 00:45:21,490 --> 00:45:24,960 that has the power and ability to 1082 00:45:24,960 --> 00:45:27,930 not just sort of look and try to 1083 00:45:27,930 --> 00:45:31,710 reverse engineer some of these algorithms, 1084 00:45:31,710 --> 00:45:33,920 but actually peek in there and make sure 1085 00:45:33,920 --> 00:45:36,420 that things are fair, because it seems like 1086 00:45:36,420 --> 00:45:38,240 there's just-- it's so important now 1087 00:45:38,240 --> 00:45:41,769 that, again, it could be the difference between 1088 00:45:41,769 --> 00:45:42,930 life and death, between 1089 00:45:42,930 --> 00:45:44,589 getting a job, not getting a job, 1090 00:45:44,589 --> 00:45:46,130 being pulled over, not being pulled over, 1091 00:45:46,130 --> 00:45:48,069 being racially profiled, not racially profiled, 1092 00:45:48,069 --> 00:45:49,410 things like that. Dr. Helsby: Right. 1093 00:45:49,410 --> 00:45:50,430 Mic 2: Is it moving in that direction? 1094 00:45:50,430 --> 00:45:52,249 Or is it way too early for it? 1095 00:45:52,249 --> 00:45:55,110 Dr. Helsby: I mean, so some people have... 1096 00:45:55,110 --> 00:45:56,859 someone has called for, like, 1097 00:45:56,859 --> 00:45:59,079 a Federal Search Commission, 1098 00:45:59,079 --> 00:46:00,930 or like a Federal Algorithms Commission, 1099 00:46:00,930 --> 00:46:03,200 that would do this sort of oversight work, 1100 00:46:03,200 --> 00:46:06,130 but it's in such early stages right now 1101 00:46:06,130 --> 00:46:09,970 that there's no real push for that. 1102 00:46:09,970 --> 00:46:13,330 But I think it's a good idea. 1103 00:46:13,330 --> 00:46:15,729 Herald: And again, #2 please. 1104 00:46:15,729 --> 00:46:17,059 Mic 2: Thank you again for your talk.
1105 00:46:17,059 --> 00:46:19,309 I was just curious if you can point 1106 00:46:19,309 --> 00:46:20,440 to any examples of 1107 00:46:20,440 --> 00:46:22,619 either current producers or consumers 1108 00:46:22,619 --> 00:46:24,029 of these algorithmic systems 1109 00:46:24,029 --> 00:46:26,390 who are actively and publicly trying 1110 00:46:26,390 --> 00:46:27,720 to do so in a responsible manner 1111 00:46:27,720 --> 00:46:29,720 by describing what they're trying to do 1112 00:46:29,720 --> 00:46:31,380 and how they're going about it? 1113 00:46:31,380 --> 00:46:37,210 Dr. Helsby: So, yeah, there are some companies, 1114 00:46:37,210 --> 00:46:39,000 for example, like DataKind, 1115 00:46:39,000 --> 00:46:42,710 that try to deploy algorithmic systems 1116 00:46:42,710 --> 00:46:44,640 in as responsible a way as possible, 1117 00:46:44,640 --> 00:46:47,250 for like public policy. 1118 00:46:47,250 --> 00:46:49,549 Like, I actually also implement systems 1119 00:46:49,549 --> 00:46:51,750 for public policy in a transparent way. 1120 00:46:51,750 --> 00:46:54,329 Like, all the code is in GitHub, etc. 1121 00:46:54,329 --> 00:47:00,020 And it is also the case, to give credit to 1122 00:47:00,020 --> 00:47:01,990 Google, and these giants, 1123 00:47:01,990 --> 00:47:06,109 that they're trying to implement transparency systems 1124 00:47:06,109 --> 00:47:08,170 that help you understand. 1125 00:47:08,170 --> 00:47:09,289 This has been done with respect to 1126 00:47:09,289 --> 00:47:12,329 how your data is being collected, 1127 00:47:12,329 --> 00:47:14,579 but for example if you go on Amazon.com 1128 00:47:14,579 --> 00:47:17,890 you can see a recommendation has been made, 1129 00:47:17,890 --> 00:47:19,420 and that is pretty transparent. 1130 00:47:19,420 --> 00:47:21,480 You can see "this item was recommended to me," 1131 00:47:21,480 --> 00:47:25,039 so you know that prediction is being used in this case, 1132 00:47:25,039 --> 00:47:27,089 and it will say why prediction is being used: 1133 00:47:27,089 --> 00:47:29,230 because you purchased some item. 1134 00:47:29,230 --> 00:47:30,380 And Google has a similar thing, 1135 00:47:30,380 --> 00:47:32,420 if you go to like Google Ad Settings, 1136 00:47:32,420 --> 00:47:35,249 you can even turn off personalization of ads 1137 00:47:35,249 --> 00:47:36,380 if you want, 1138 00:47:36,380 --> 00:47:38,119 and you can also see some of the inferences 1139 00:47:38,119 --> 00:47:39,400 that have been learned about you. 1140 00:47:39,400 --> 00:47:40,819 A subset of the inferences that have been 1141 00:47:40,819 --> 00:47:41,700 learned about you. 1142 00:47:41,700 --> 00:47:43,940 So, like, what interests... 1143 00:47:43,940 --> 00:47:47,869 Herald: A question from the internet, please? 1144 00:47:47,869 --> 00:47:50,930 Signal Angel: Yes, billetQ is asking 1145 00:47:50,930 --> 00:47:54,479 how do you avoid biases in machine learning? 1146 00:47:54,479 --> 00:47:57,380 I assume an analysis system, for example, 1147 00:47:57,380 --> 00:48:00,420 could be biased against women and minorities, 1148 00:48:00,420 --> 00:48:04,960 if used for hiring decisions based on known data. 1149 00:48:04,960 --> 00:48:06,499 Dr. Helsby: Yeah, so one thing is to 1150 00:48:06,499 --> 00:48:08,529 just explicitly check. 1151 00:48:08,529 --> 00:48:12,199 So, you can check to see how 1152 00:48:12,199 --> 00:48:14,309 positive outcomes are being distributed 1153 00:48:14,309 --> 00:48:16,779 among those protected classes.
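A minimal sketch of that kind of explicit check, with fabricated records standing in for a model's decisions and a protected attribute: compare the rate of positive outcomes across groups and look at the gap.

```python
# Toy sketch of an explicit fairness check: compare the rate of positive
# outcomes (e.g. "recommended for interview") across a protected attribute.
# The records are fabricated for illustration.
records = [
    {"group": "women", "positive": True},
    {"group": "women", "positive": False},
    {"group": "women", "positive": False},
    {"group": "men",   "positive": True},
    {"group": "men",   "positive": True},
    {"group": "men",   "positive": False},
]

def positive_rate(records, group):
    members = [r for r in records if r["group"] == group]
    return sum(r["positive"] for r in members) / len(members)

rates = {g: positive_rate(records, g) for g in {"women", "men"}}
for group, rate in sorted(rates.items()):
    print(f"{group}: positive outcome rate {rate:.2f}")

# A large gap between groups (here 0.33 vs 0.67) is a signal that the
# system, or the data it was trained on, deserves closer scrutiny.
print(f"disparity (max - min): {max(rates.values()) - min(rates.values()):.2f}")
```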
1154 00:48:16,779 --> 00:48:19,210 You could also incorporate these sorts of 1155 00:48:19,210 --> 00:48:21,440 fairness constraints in the function 1156 00:48:21,440 --> 00:48:24,069 that you optimize when you train the system, 1157 00:48:24,069 --> 00:48:25,950 and so, if you're interested in reading more 1158 00:48:25,950 --> 00:48:28,960 about this, the 2 papers-- 1159 00:48:28,960 --> 00:48:31,909 let me go to References-- 1160 00:48:31,909 --> 00:48:32,730 there's a good paper called 1161 00:48:32,730 --> 00:48:35,339 Fairness Through Awareness that describes 1162 00:48:35,339 --> 00:48:37,499 how to go about doing this, 1163 00:48:37,499 --> 00:48:39,579 so I recommend this person read that. 1164 00:48:39,579 --> 00:48:40,970 It's good. 1165 00:48:40,970 --> 00:48:43,400 Herald: Microphone 2, please. 1166 00:48:43,400 --> 00:48:45,400 Mic 2: Thanks again for your talk. 1167 00:48:45,400 --> 00:48:49,649 Umm, hello? 1168 00:48:49,649 --> 00:48:50,999 Okay. 1169 00:48:50,999 --> 00:48:52,960 Umm, I see of course a problem with 1170 00:48:52,960 --> 00:48:54,619 all the black boxes that you describe 1171 00:48:54,619 --> 00:48:57,069 with regards to the crime systems, 1172 00:48:57,069 --> 00:48:59,569 but when we look at the advertising systems 1173 00:48:59,569 --> 00:49:02,169 in many cases they are very networked. 1174 00:49:02,169 --> 00:49:04,160 There are many different systems collaborating 1175 00:49:04,160 --> 00:49:07,109 and exchanging data via open APIs: 1176 00:49:07,109 --> 00:49:08,720 RESTful APIs, and various 1177 00:49:08,720 --> 00:49:11,720 demand-side platforms and audience-exchange platforms, 1178 00:49:11,720 --> 00:49:12,539 and everything. 1179 00:49:12,539 --> 00:49:15,420 So, can that help to at least 1180 00:49:15,420 --> 00:49:22,160 increase awareness on where targeting, personalization 1181 00:49:22,160 --> 00:49:23,679 might be happening? 1182 00:49:23,679 --> 00:49:26,190 I mean, I'm looking at systems like 1183 00:49:26,190 --> 00:49:29,539 BuiltWith, that surface what kind of 1184 00:49:29,539 --> 00:49:31,380 JavaScript libraries are used elsewhere. 1185 00:49:31,380 --> 00:49:32,999 So, is that something that could help 1186 00:49:32,999 --> 00:49:35,670 at least to give a better awareness 1187 00:49:35,670 --> 00:49:38,690 and list all the points where 1188 00:49:38,690 --> 00:49:41,409 you might be targeted... 1189 00:49:41,409 --> 00:49:43,070 Dr. Helsby: So, like, with respect to 1190 00:49:43,070 --> 00:49:46,460 advertising, the fact that there is behind the scenes 1191 00:49:46,460 --> 00:49:48,450 this like complicated auction process 1192 00:49:48,450 --> 00:49:50,650 that's occurring, just makes things 1193 00:49:50,650 --> 00:49:51,819 a lot more complicated. 1194 00:49:51,819 --> 00:49:54,170 So, for example, I said briefly 1195 00:49:54,170 --> 00:49:57,269 that they found that there's this statistical difference 1196 00:49:57,269 --> 00:49:59,099 between how men and women are treated, 1197 00:49:59,099 --> 00:50:01,339 but it doesn't necessarily mean that 1198 00:50:01,339 --> 00:50:03,640 "Oh, the algorithm is definitely biased."
1199 00:50:03,640 --> 00:50:06,369 It could be because of this auction process, 1200 00:50:06,369 --> 00:50:10,569 it could be that women are considered 1201 00:50:10,569 --> 00:50:12,630 more valuable when it comes to advertising, 1202 00:50:12,630 --> 00:50:15,099 and so these executive ads are getting 1203 00:50:15,099 --> 00:50:17,160 outbid by some other ads, 1204 00:50:17,160 --> 00:50:18,890 and so there's a lot of potential 1205 00:50:18,890 --> 00:50:20,490 causes for that. 1206 00:50:20,490 --> 00:50:22,829 So, I think it just makes things a lot more complicated. 1207 00:50:22,829 --> 00:50:25,910 I don't know if it helps with the bias at all. 1208 00:50:25,910 --> 00:50:27,410 Mic 2: Well, the question was more 1209 00:50:27,410 --> 00:50:30,299 in the direction... can it help to surface 1210 00:50:30,299 --> 00:50:32,499 and make people aware of that fact? 1211 00:50:32,499 --> 00:50:34,930 I mean, I can talk to my kids probably, 1212 00:50:34,930 --> 00:50:36,259 and they will probably understand, 1213 00:50:36,259 --> 00:50:38,420 but I can't explain that to my grandma, 1214 00:50:38,420 --> 00:50:43,150 who's also, umm, looking at an iPad. 1215 00:50:43,150 --> 00:50:44,289 Dr. Helsby: So, the fact that 1216 00:50:44,289 --> 00:50:45,690 the systems are... 1217 00:50:45,690 --> 00:50:48,509 I don't know if I understand. 1218 00:50:48,509 --> 00:50:50,529 Mic 2: OK. I think that the main problem 1219 00:50:50,529 --> 00:50:53,710 is that we are behind the industry's efforts 1220 00:50:53,710 --> 00:50:57,179 at targeting us, and many people 1221 00:50:57,179 --> 00:51:00,579 do know, but a lot more people don't know, 1222 00:51:00,579 --> 00:51:03,160 and making them aware of the fact 1223 00:51:03,160 --> 00:51:07,269 that they are a target, in a way, 1224 00:51:07,269 --> 00:51:10,990 is something that can only be shown 1225 00:51:10,990 --> 00:51:14,779 by a 3rd party that has that data at its disposal, 1226 00:51:14,779 --> 00:51:16,339 and makes audits in a way-- 1227 00:51:16,339 --> 00:51:17,929 maybe in an automated way. 1228 00:51:17,929 --> 00:51:19,170 Dr. Helsby: Right. 1229 00:51:19,170 --> 00:51:21,410 Yeah, I think it certainly could help with advocacy 1230 00:51:21,410 --> 00:51:23,059 if that's the point, yeah. 1231 00:51:23,059 --> 00:51:26,079 Herald: Another question from the internet, please. 1232 00:51:26,079 --> 00:51:29,319 Signal Angel: Yes, on IRC they are asking 1233 00:51:29,319 --> 00:51:31,440 if we know that prediction in some cases 1234 00:51:31,440 --> 00:51:34,460 provides an influence that cannot be controlled. 1235 00:51:34,460 --> 00:51:38,480 So, r4v5 would like to know from you 1236 00:51:38,480 --> 00:51:41,519 if there are some cases or areas where 1237 00:51:41,519 --> 00:51:45,060 machine learning simply shouldn't go? 1238 00:51:45,060 --> 00:51:48,349 Dr. Helsby: Umm, so I think... 1239 00:51:48,349 --> 00:51:52,559 I mean, yes, I think that it is the case 1240 00:51:52,559 --> 00:51:54,650 that in some cases machine learning 1241 00:51:54,650 --> 00:51:56,180 might not be appropriate. 1242 00:51:56,180 --> 00:51:58,359 For example, if you use machine learning 1243 00:51:58,359 --> 00:52:00,970 to decide who should be searched. 1244 00:52:00,970 --> 00:52:02,619 I don't think it should be the case that 1245 00:52:02,619 --> 00:52:03,809 machine learning algorithms should 1246 00:52:03,809 --> 00:52:05,440 ever be used to determine 1247 00:52:05,440 --> 00:52:08,430 probable cause, or something like that.
1248 00:52:08,430 --> 00:52:12,339 So, if it's just one piece of evidence 1249 00:52:12,339 --> 00:52:13,299 that you consider, 1250 00:52:13,299 --> 00:52:14,990 and there's human oversight always, 1251 00:52:14,990 --> 00:52:18,519 *maybe* it's fine, but 1252 00:52:18,519 --> 00:52:20,839 we should be very suspicious and hesitant 1253 00:52:20,839 --> 00:52:22,119 in certain contexts where 1254 00:52:22,119 --> 00:52:24,529 the ramifications are very serious. 1255 00:52:24,529 --> 00:52:27,259 Like the No Fly List, and so on. 1256 00:52:27,259 --> 00:52:29,200 Herald: And #2 again. 1257 00:52:29,200 --> 00:52:30,809 Mic 2: A second question 1258 00:52:30,809 --> 00:52:33,509 that just occurred to me, if you don't mind. 1259 00:52:33,509 --> 00:52:35,339 Umm, until the advent of 1260 00:52:35,339 --> 00:52:36,559 algorithmic systems, 1261 00:52:36,559 --> 00:52:40,470 when there've been cases of serious harm 1262 00:52:40,470 --> 00:52:42,799 that's resulted to individuals or groups, 1263 00:52:42,799 --> 00:52:44,579 and it's been demonstrated that 1264 00:52:44,579 --> 00:52:46,029 it's occurred because of 1265 00:52:46,029 --> 00:52:49,400 an individual or a system of people 1266 00:52:49,400 --> 00:52:53,019 being systematically biased, then often 1267 00:52:53,019 --> 00:52:55,130 one of the actions that's taken is 1268 00:52:55,130 --> 00:52:56,869 pressure's applied, and then 1269 00:52:56,869 --> 00:52:59,660 people are required to change, 1270 00:52:59,660 --> 00:53:01,049 and hopefully be held responsible, 1271 00:53:01,049 --> 00:53:02,910 and then change the way that they do things 1272 00:53:02,910 --> 00:53:06,400 to try to remove bias from that system. 1273 00:53:06,400 --> 00:53:07,839 What's the current thinking about 1274 00:53:07,839 --> 00:53:10,299 how we can go about doing that 1275 00:53:10,299 --> 00:53:12,599 when the systems that are doing that 1276 00:53:12,599 --> 00:53:13,650 are algorithmic? 1277 00:53:13,650 --> 00:53:15,999 Is it just going to be human oversight, 1278 00:53:15,999 --> 00:53:16,910 and humans are gonna have to be 1279 00:53:16,910 --> 00:53:18,379 held responsible for the oversight? 1280 00:53:18,379 --> 00:53:20,890 Dr. Helsby: So, in terms of bias, 1281 00:53:20,890 --> 00:53:22,569 if we're concerned about bias towards 1282 00:53:22,569 --> 00:53:24,019 particular types of people, 1283 00:53:24,019 --> 00:53:25,710 that's something that we can optimize for. 1284 00:53:25,710 --> 00:53:28,839 So, we can train systems that are unbiased 1285 00:53:28,839 --> 00:53:30,019 in this way. 1286 00:53:30,019 --> 00:53:32,109 So that's one way to deal with it. 1287 00:53:32,109 --> 00:53:34,039 But there's always gonna be errors, 1288 00:53:34,039 --> 00:53:35,420 so that's sort of a separate issue 1289 00:53:35,420 --> 00:53:37,509 from the bias, and in the case 1290 00:53:37,509 --> 00:53:39,180 where there are errors, 1291 00:53:39,180 --> 00:53:40,539 there must be oversight. 1292 00:53:40,539 --> 00:53:45,079 So, one way that one could improve 1293 00:53:45,079 --> 00:53:46,410 the way that this is done 1294 00:53:46,410 --> 00:53:48,160 is by making sure that you're 1295 00:53:48,160 --> 00:53:50,799 keeping track of confidence of decisions. 1296 00:53:50,799 --> 00:53:54,039 So, if you have a low confidence prediction, 1297 00:53:54,039 --> 00:53:56,259 then maybe a human should come in and check things. 1298 00:53:56,259 --> 00:53:58,809 So, that might be one way to proceed.
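As a minimal sketch of that idea, assuming a model that reports a confidence score with each prediction (the threshold and cases below are invented): predictions under the threshold go to a human instead of triggering automatic action.

```python
# Toy sketch of confidence-based human oversight: predictions whose
# confidence falls below a threshold are routed to a human reviewer
# instead of being acted on automatically. Values are invented.
REVIEW_THRESHOLD = 0.90  # hypothetical policy: below this, a human decides

predictions = [
    {"case": "A", "label": "flag", "confidence": 0.97},
    {"case": "B", "label": "flag", "confidence": 0.62},
    {"case": "C", "label": "clear", "confidence": 0.88},
]

def route(prediction):
    if prediction["confidence"] >= REVIEW_THRESHOLD:
        return "automatic action"
    return "send to human reviewer"

for p in predictions:
    print(f"case {p['case']}: {p['label']} "
          f"(confidence {p['confidence']:.2f}) -> {route(p)}")
```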
1299 00:54:02,099 --> 00:54:03,990 Herald: So, there's no more question. 1300 00:54:03,990 --> 00:54:06,199 I close this talk now, 1301 00:54:06,199 --> 00:54:08,239 and thank you very much 1302 00:54:08,239 --> 00:54:09,410 and a big applause to 1303 00:54:09,410 --> 00:54:11,780 Jennifer Helsby! 1304 00:54:11,780 --> 00:54:16,310 *roaring applause*