*32c3 preroll music*

Angel: I introduce Whitney Merrill. She is an attorney in the US, and she just recently, actually last week, graduated with her CS master's in Illinois.

*applause*

Angel: Without further ado: 'Predicting Crime In A Big Data World'.

*cautious applause*

Whitney Merrill: Hi everyone. Thank you so much for coming. I know it's been an exhausting Congress, so I appreciate you guys coming to hear me talk about Big Data and crime prediction. This is kind of a hobby of mine: in my last semester at Illinois I decided to poke around at what's currently happening, how these algorithms are being used, and to figure out what kind of information can be gathered. So, I have about 30 minutes with you guys. I'm gonna do a broad overview of the types of programs. I'm gonna talk about what Predictive Policing is, the data used, similar systems in other areas where predictive algorithms are trying to better society, and current uses in policing. I'm gonna talk a little bit about their effectiveness and then give you some final thoughts.

So, imagine, in the very near future, a Police officer is walking down the street wearing a camera on her collar. In her ear is a feed of information about the people and cars she passes, alerting her to individuals and cars that might fit a particular crime or the profile of a criminal. Early in the day she examined a map highlighting hotspots for crime. In the area she's been set to patrol, the predictive policing software indicates that there is an 82% chance of burglary at 2 pm, and it's currently 2:10 pm. As she passes one individual, her camera captures the individual's face and runs it through a coordinated Police database - all of the Police departments that use this database are sharing information. Facial recognition software indicates that the person is Bobby Burglar, who was previously convicted of burglary, was recently released and is now currently on parole. The voice in her ear whispers: 50 percent likely to commit a crime. Can she stop and search him? Should she chat him up? Should she see how he acts? Does she need additional information to stop and detain him? And does it matter that he's carrying a large duffle bag? Did the algorithm take this into account, or did it just look at his face?
What information was being collected at the time the algorithm chose to say 50%, to provide the final analysis?

So, another thought I'm gonna have you guys think about as I go through this presentation is this quote, which is more favorable towards Police algorithms: "As people become data plots and probability scores, law enforcement officials and politicians alike can point and say: 'Technology is void of the racist, profiling bias of humans.'" Is that true? Well, they probably will point and say that, but is it actually void of the racist, profiling bias of humans? And I'm gonna talk about that as well.

So, Predictive Policing explained. Who and what? First of all, Predictive Policing actually isn't new. All we're doing is adding technology, doing better, faster aggregation of data. Analysts in Police departments have been doing this by hand for decades. These techniques are used to create profiles that accurately match likely offenders with specific past crimes. So, there's individual targeting, and then we have location-based targeting. For location-based targeting, the goal is to help Police forces deploy their resources in a correct, efficient manner. The predictions can be as simple as recommending that general crime may happen in a particular area, or as specific as what type of crime will happen in a one-block radius. They take into account the time of day, the recent data collected, when in the year it's happening, the weather, etc.

So, another really quick thing worth going over, because not everyone is familiar with machine learning. This is a very basic breakdown of training an algorithm on a data set. You collect data from many different sources, you put it all together, you clean it up, and you split it into 3 sets: a training set, a validation set and a test set. The training set is what is going to develop the rules that will determine the final outcome. You're gonna use the validation set to optimize it, and finally apply the test set to establish a confidence level. There you'll set a support level, where you say you need a certain amount of data to determine whether or not the algorithm has enough information to make a prediction.
So, rules with a low support level are less likely to be statistically significant. And the confidence level, in the end: if there's an 85% confidence level, that means there's an 85% chance that a suspect meeting the rule in question is engaged in criminal conduct.

So, what does this mean? Well, it encourages collection and hoarding of data about crimes and individuals, because you want as much information as possible so that you detect even the less likely scenarios. Information sharing is also encouraged because it's easier; it's done by third parties, or even what are called fourth parties, and shared amongst departments. And here, your criminal data analysis, again, was being done by analysts in Police departments for decades, but the information sharing and the amount of information they could aggregate was just significantly more difficult.

So, what are these Predictive Policing algorithms and software... what are they doing? Are they determining guilt and innocence? Unlike a thoughtcrime, they are not saying this person is guilty, this person is innocent. They're creating a probability of whether or not the person has likely committed a crime or will likely commit a crime. And it can only say something about the future and the past. This here is a picture from one particular piece of software provided by HunchLab; patterns emerge here from past crimes that can profile criminal types and associations, detect crime patterns, etc. Generally, these types of algorithms use unsupervised learning; that means someone is not going through and labeling true-false, good-bad, good-bad. There's just 1) too much information and 2) they're trying to do clustering, to determine the things that are similar.

So, really quickly, I'm also gonna talk about the data that's used. There are several different types: personal characteristics, demographic information, activities of individuals, scientific data, etc. This comes from all sorts of sources. One that really shocked me - and I'll talk about it a little bit more later - is that the radiation detectors on New York City Police are constantly taking in data, and they're so sensitive they can detect if you've had a recent medical treatment that involves radiation.
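As a rough illustration of the training breakdown described above - splitting the data into training, validation and test sets, then checking the support and confidence of a learned rule - here is a minimal sketch in Python. All data, feature names and thresholds are invented for illustration; the products discussed in this talk do not publish their internals, so this is not how any of them actually work.

```python
import random

def split_dataset(records, train=0.6, validation=0.2, seed=42):
    """Shuffle and split records into training, validation and test sets (rest is test)."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def rule_support_and_confidence(records, antecedent, outcome):
    """Support: fraction of records matching the rule's antecedent.
    Confidence: fraction of matching records where the outcome also holds."""
    matching = [r for r in records if antecedent(r)]
    support = len(matching) / len(records) if records else 0.0
    confidence = (sum(1 for r in matching if outcome(r)) / len(matching)
                  if matching else 0.0)
    return support, confidence

# Invented toy data: each record is (hour_of_day, prior_arrests, burglary_reported).
data = [(random.randint(0, 23), random.randint(0, 5), random.random() < 0.1)
        for _ in range(1000)]
train_set, validation_set, test_set = split_dataset(data)

# A hypothetical rule: "afternoon hours predict a reported burglary".
support, confidence = rule_support_and_confidence(
    train_set,
    antecedent=lambda r: 12 <= r[0] <= 16,
    outcome=lambda r: r[2])
print(f"support={support:.2f} confidence={confidence:.2f}")
```

A rule whose support falls below the chosen support level would simply be discarded before its confidence is trusted, which is the point made above about low-support rules not being statistically significant.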
Facial recognition and biometrics are clearly in use here, and the third-party doctrine - which basically says, in the United States, that you have no reasonable expectation of privacy in data you share with third parties - facilitates easy collection for Police officers and Government officials, because they can go and ask for the information without any sort of warrant. For a really great overview: a friend of mine, Dia, did a talk here at CCC on "The architecture of a street level panopticon". It gives a really great overview of how this type of data is collected on the streets. Worth checking out, 'cause I'm gonna gloss over the types of data.

There is in the United States what they call the Multistate Anti-Terrorism Information Exchange Program, which uses everything from credit history, your concealed weapons permits, aircraft pilot licenses, fishing licenses, etc., all searchable and shared amongst Police departments and Government officials. And this is just more information. So, if they can collect it, they will aggregate it into a database.

So, what are the current uses? There are many, many different companies currently making software and marketing it to Police departments. All of them are slightly different and have different features, but currently it's a competition to get clients, Police departments, etc. The more Police departments you have, the more data sharing you can sell, saying: "Oh, by enrolling you'll now have x, y and z Police departments' data to access", etc. These here are Hitachi and HunchLab. They both do hotspot targeting; it's not individual targeting - those are a lot rarer. And it's actually being used in my home town, which I'll talk about in a little bit. Here, the appropriate tactics are automatically displayed for officers when they're entering mission areas. So HunchLab will tell an officer: "Hey, you're entering an area where there's gonna be burglary, so you should keep an eye out, be aware". And this is updating in real time, and they're hoping it mitigates crime.

Here are 2 other ones. The Domain Awareness System was created in New York City after 9/11 in conjunction with Microsoft. New York City actually makes money selling it to other cities to use.
CCTV camera feeds are collected, and they can... if they say there's a man wearing a red shirt, the software will look for people wearing red shirts and at least alert Police departments to people who meet this description walking in public in New York City. The other one is by IBM, and there are quite a few of these; it's generally another hotspot targeting system, and each has a few different features.

Worth mentioning, too, is the Heat List. This targeted individuals. I'm from the city of Chicago, I grew up in the city. There are currently 420 names on it - as of when this came out about a year ago - of individuals who are 500 times more likely than average to be involved in violence. Individual names, passed around to each Police officer in Chicago. They consider the rap sheet, disturbance calls, social network, etc. But one of the main things they considered in placing mainly young black individuals on this list was known acquaintances and their arrest histories. So if kids or young teenagers went to school with several people in a gang - and that individual may not even be involved in a gang - they're more likely to appear on the list. The list has been heavily criticized for being racist, and for not giving these children or young individuals on the list a chance to change their history, because it's being decided for them. They're being told: "You are likely to be a criminal, and we're gonna watch you". Officers in Chicago visited these individuals; they would do a knock-and-announce, with a knock on the door and: "Hi, I'm here, just checking up, what are you up to?". Which you don't need any special suspicion to do. But it's, you know, kind of a harassment that might cause feedback back into the data collected.

This is PRECOBS. It's currently used here in Hamburg. They actually went to Chicago and visited the Chicago Police Department to learn about Predictive Policing tactics in Chicago, to implement it throughout Germany, Hamburg and Berlin. It's used generally to forecast repeat offenses. Again, when training data sets you need enough data points to predict crime. So crimes that are less likely to happen, or happen very rarely: much harder to predict. Crimes that aren't reported: much harder to predict.
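The hotspot products mentioned above are proprietary, but the basic location-based forecasting idea - and the near-repeat assumption behind repeat-offense tools like PRECOBS - can be sketched very crudely: bin past incidents into grid cells and weight recent incidents more heavily. This is a toy illustration with invented coordinates and parameters, not a description of how any named product works.

```python
from collections import defaultdict
from math import exp

def hotspot_scores(incidents, cell_size=0.005, half_life_days=14.0):
    """Score grid cells from past incidents: newer incidents count more.

    incidents: list of (latitude, longitude, days_ago) tuples.
    Returns a dict mapping grid cell -> decayed incident count.
    """
    scores = defaultdict(float)
    for lat, lon, days_ago in incidents:
        cell = (round(lat / cell_size), round(lon / cell_size))
        # Exponential decay: an incident loses half its weight every half_life_days.
        scores[cell] += exp(-days_ago * 0.693 / half_life_days)
    return scores

# Invented example incidents: (latitude, longitude, days since the report).
past_burglaries = [
    (53.5501, 9.9930, 2), (53.5503, 9.9934, 5),
    (53.5510, 9.9950, 30), (53.5712, 10.0010, 1),
]
scores = hotspot_scores(past_burglaries)
top_cells = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3]
print(top_cells)  # cells a patrol-assignment layer might flag for the day
```

Note that such a sketch can only ever see reported incidents, which is exactly the limitation described above for rare and unreported crimes.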
So a lot of these... like, pieces of software rely on algorithms that are hoping there's the same sort of picture, so that they can predict where and when and what type of crime will happen. PRECOBS is actually a play on the precogs from the movie 'Minority Report', if you're familiar with it - the 3 psychics who predict crimes before they happen.

So there are other, similar systems in the world that are being used to predict whether or not something will happen. The first one is 'Disease and Diagnosis'. They found that algorithms are actually better than doctors at predicting what disease an individual has. It's kind of shocking. The other is 'Security Clearance' in the US, which allows access to classified documents. There's no automatic access in the US, so every person who wants to see some sort of secret, cleared document must go through this process. And it's vetting individuals, so it's an opt-in process. But here they're trying to predict who will disclose information, who will break the clearance system. Here, they're probably much more comfortable with a high error rate. Because they have so many people competing for a particular job, to get clearance, that if they're wrong - that somebody probably won't disclose information - they don't care; they'd just rather eliminate them than take the risk.

So, I'm an attorney in the US. I have this urge to talk about US law. It also seems to impact a lot of people internationally. Here we're talking about the targeting of individuals, not hotspots. Targeting of individuals is not as widespread currently. However, it's happening in Chicago, other cities are considering implementing programs, and there are grants right now to encourage Police departments to figure out target lists.

So in the US, suspicion is based on the totality of the circumstances. That's the whole picture. The Police officer must look at the whole picture of what's happening before they can detain an individual. It's supposed to be a balanced assessment of relative weights, meaning, you know, if you know that the person is a pastor, then maybe pacing in front of a liquor store is not as suspicious as it would be for somebody who's been convicted of 3 burglaries.
It has to be 'based on specific and articulable facts'. And Police officers can use experience and common sense to determine whether or not they have that suspicion... Large amounts of networked data generally can provide individualized suspicion. The principal components here are the events leading up to the stop-and-search - what is the person doing right before they're detained - as well as the use of historical facts known about that individual, the crime, the area in which it's happening, etc. So it can rely on both things.

No court in the US has really put out a percentage for what Probable Cause and Reasonable Suspicion require. 'Probable Cause' is what you need to get a warrant to search and seize an individual. 'Reasonable Suspicion' is needed to do stop-and-frisk in the US - stop an individual and question them. And this is a little bit different from what they call 'Consensual Encounters', where a Police officer goes up to you and chats you up; with 'Reasonable Suspicion' you're actually detained. But I had a law professor who basically said: "30%... 45% seems like a really good number, just to show how low it really is". You don't even need to be 50% sure that somebody has committed a crime. So, officers can draw from their own experience to determine 'Probable Cause'. And the UK has a similar 'Reasonable Suspicion' standard, which depends on the circumstances of each case. I'm not as familiar with UK law, but I believe that some of the analysis around 'Reasonable Suspicion' is similar.

Is this like a black box? So, I threw this slide in for those who are interested in comparing this to US law. Generally, a dog sniff in the US falls under a particular set of legal history, which is: a dog can go up, sniff for drugs, alert, and that is completely okay. And the Police officers can use that data to detain and further search an individual. So is an algorithm similar to the dog, which is kind of a black box? Information goes in, it's processed, information comes out and a prediction is made. Police rely in 'Good Faith' on the 'Totality of the Circumstances' to make their decision.
So there's really no... if they're relying on the algorithm, and in that situation think that everything's okay, we might reach a level of 'Reasonable Suspicion' where the officer can now pat down the person he's stopped on the street, or whom the algorithm has alerted to.

So, the big question is, you know: "Could the officer consult predictive software in an individual analysis? Could he say: '60% likely to commit a crime'?" In my hypo: does that mean the officer can, without looking at anything else, detain that individual? And the answer is "Probably not". One: Predictive Policing algorithms just cannot take in the Totality of the Circumstances. They have to be frequently updated; there are things happening that the algorithm possibly could not have taken into account. The problem here is that the algorithm itself, the prediction itself, becomes part of the Totality of the Circumstances, which I'm going to talk about a little bit more later.

But officers have to have Reasonable Suspicion before the stop occurs. Retroactive justification is not sufficient. So, the algorithm can't just say "60% likely", you detain the individual, and then you figure out why you've detained the person. It has to be before the detention actually happens. And the suspicion must relate to current criminal activity. The person must be doing something to indicate criminal activity. Just the fact that an algorithm says, based on these facts, "60%" - even without articulating why the algorithm has chosen that - isn't enough. Maybe you can see a gun-shaped bulge in the pocket, etc.

So, effectiveness... the Totality of the Circumstances - can the algorithms keep up? Generally, probably not. Missing data; not capable of processing this data in real time. The algorithm doesn't know, and the Police officer probably doesn't know, all of the facts. So the Police officer can take the algorithm into consideration, but the problem here is: did the algorithm know that the individual was active in the community, or was a politician, or was a personal friend of the officer, etc.? It can't just be relied upon. What if the algorithm did take into account that the individual was a pastor?
Now that information is counted twice, and the balancing for the Totality of the Circumstances is off. Humans here must be the final decider.

What are the problems? Well, there's bad underlying data. There's no transparency into what kind of data is being used, how it was collected, how old it is, how often it's been updated, or whether or not it's been verified. There could just be noise in the training data. Honestly, the data is biased. It was collected by individuals in the US, and there have been several studies showing that young black individuals are stopped more often than whites. And this is going to cause a collection bias. It's gonna be drastically disproportionate to the makeup of the population of cities; and as more data is collected on minorities and refugees in poor neighborhoods, it's gonna feed back in, and of course the system only has data on those groups, and it provides feedback and says: "More crime is likely to happen here, because that's where the data was collected".

So, what's an acceptable error rate? Well, it depends on the burden of proof. Harm is different for an opt-in system. You know, what's my harm if I don't get clearance, or I don't get the job? But I'm opting in, I'm asking to be considered for employment. In the US, what's an error? If you search and find nothing, but you think you have Reasonable Suspicion based on good faith - both on the algorithm and what you witness - the US says that there's no 4th Amendment violation, even if nothing has happened. So the cost of a false positive here is very low. In Big Data generally, and machine learning, it's great! Like, a 1% error rate is fantastic! But that's pretty large for the number of individuals stopped each day, or who might be subject to these algorithms. Because even though there are only 400 individuals on the list in Chicago, those individuals have been listed basically as targets by the Chicago Police Department.

Other problems include database errors. Exclusion of evidence in the US only happens when there's gross negligence or systematic misconduct. That's very difficult to prove, especially when a lot of people view these algorithms as a black box: data goes in, predictions come out, everyone's happy. You rely and trust on the quality of IBM, HunchLab, etc. to provide good software.
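To put the "1% error is fantastic" point above into absolute numbers, here is a back-of-the-envelope calculation. The screening volume, base rate and detection rate are all invented for illustration.

```python
def flagged_counts(population, base_rate, false_positive_rate, true_positive_rate):
    """How many innocent vs. actually-involved people get flagged at given error rates."""
    offenders = population * base_rate
    non_offenders = population - offenders
    false_flags = non_offenders * false_positive_rate   # innocent people flagged
    true_flags = offenders * true_positive_rate         # actual offenders flagged
    return false_flags, true_flags

# Hypothetical numbers: 50,000 people screened, 0.5% actually involved in a crime,
# a "great" 1% false-positive rate and a 90% detection rate.
false_flags, true_flags = flagged_counts(50_000, 0.005, 0.01, 0.90)
print(f"wrongly flagged: {false_flags:.0f}, correctly flagged: {true_flags:.0f}")
# Roughly 500 innocent people flagged vs. 225 actual offenders: most flags are wrong.
```

Because the base rate of serious crime is low, even a small false-positive rate produces more wrong flags than right ones, which is the scale problem described above.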
Finally, some more concerns I have include feedback loops, auditing of and access to data and algorithms, and prediction thresholds. How certain must a prediction be - before it's reported to the Police - that the person might commit a crime, or that crime might happen in a particular area? Especially if Reasonable Suspicion is as low as 35%. And Reasonable Suspicion in the US has been held to be satisfied by: that guy drives a car that drug dealers like to drive, and he's in the DEA database as a possible drug dealer. That was enough to stop and search him.

So, are there positives? Well, PredPol, which is one of the services that provides Predictive Policing software, says that since these cities have implemented it, crime has been dropping. In L.A., a 13% reduction in crime in one division. There was even one day where they had no crime reported. Santa Cruz: a 25-29% reduction, 9% fewer assaults, etc. One: these are Police departments self-reporting these successes, so, you know, take it for what it is, and it's reiterated by the people selling the software. But perhaps it is actually reducing crime. It's kind of hard to tell, because there's a feedback loop. Do we know that crime is really being reduced? Will it affect the data that is collected in the future? It's really hard to know. Because if you send Police officers into a community, it's more likely that they're going to affect that community and that data collection. Will more crimes happen because people feel like the Police are harassing them? It's very likely, and it's a problem here.

So, some final thoughts. Predictive Policing programs are not going anywhere. They're only just getting started. And I think that more analysis, more transparency, more access to data needs to happen around these algorithms. There needs to be regulation. Currently, a very successful way in which these companies get data is that they buy it from third-party sources and then sell it to Police departments. So perhaps PredPol might get information from Google, Facebook, social media accounts; aggregate the data themselves; and then turn around and sell it to Police departments, or provide access to Police departments. And generally, the Courts are gonna have to begin to work out how to handle this type of data.
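Returning to the feedback-loop concern raised above, a deliberately crude simulation can show how patrol allocation driven by recorded crime reinforces itself even when the underlying crime rates are identical. Every rate and count here is invented; it is a sketch of the dynamic, not of any real department's data.

```python
import random

def simulate_feedback(rounds=20, true_rate=0.05, detection_boost=3.0, seed=1):
    """Two neighborhoods with identical true crime rates. Patrols go wherever
    recorded crime is highest, and patrolled areas get more crime recorded."""
    rng = random.Random(seed)
    recorded = [10, 12]      # slight, arbitrary difference in historical records
    trials_per_round = 100   # stand-in for opportunities for crime per round
    for _ in range(rounds):
        patrolled = recorded.index(max(recorded))   # patrol the "hotter" area
        for n in range(len(recorded)):
            true_crimes = sum(rng.random() < true_rate for _ in range(trials_per_round))
            detection = 0.9 if n == patrolled else 0.9 / detection_boost
            recorded[n] += sum(rng.random() < detection for _ in range(true_crimes))
    return recorded

print(simulate_feedback())
# Despite identical underlying crime, the neighborhood that started with slightly
# more recorded crime ends up with far more, reinforcing the next round of predictions.
```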
There's no case law, at least in the US, that really establishes how to handle predictive algorithms in determining what the analysis says. And so there really needs to be a lot more research and thought put into this. And one of the big things, in order for this to actually be useful - if this is a tactic that has been used by Police departments for decades - is that we need to eliminate the bias in the data sets. Because right now, all that it's doing is facilitating and continuing the bias set in the database. And it's incredibly difficult. It's data collected by humans, and it causes an initial selection bias, which is gonna have to stop for this to be successful. And perhaps these systems can cause implicit bias or confirmation bias; e.g. Police are going to believe what they've been told. So if a Police officer goes on duty to an area and an algorithm says: "You're 70% likely to find a burglar in this area", are they gonna find a burglar because they've been told "You might find a burglar"?

And finally, the US border. There is no 4th Amendment protection at the US border. It's an exception to the warrant requirement. This means no suspicion is needed to conduct a search. So this data is gonna go into the way you're examined when you cross the border. And aggregate data can be used to refuse you entry into the US, etc. And I think that's pretty much it. And so, a few minutes for questions.

*applause*

Whitney: Thank you!

Herald: Thanks a lot for your talk, Whitney. We have about 4 minutes left for questions. So please line up at the microphones, and remember to make short and easy questions. Microphone No. 2, please.

Question: Just a comment: if I wanted to run a crime organization, like, I would target the PRECOBS here in Hamburg, maybe. So I can take the crime to the scenes where PRECOBS doesn't expect it.

Whitney: Possibly. And I think this is a big problem in getting availability of data, in that there's a good argument for Police departments to say: "We don't want to tell you what our tactics are for Policing, because it might move crime".

Herald: Do we have questions from the internet? Yes, then please, one question from the internet.
Signal Angel: Is there evidence that data like the use of encrypted messaging systems, encrypted emails, VPN, Tor - with automated requests to the ISP used to obtain real names - is collected and contributes to the scoring?

Whitney: I'm not sure if that's being taken into account by Predictive Policing algorithms, or by the software being used. I know that Police departments do take those things into consideration. And considering that in the US the Totality of the Circumstances is how you evaluate suspicion, they are gonna take all of those things into account; they actually kind of have to take them into account.

Herald: Okay, microphone No. 1, please.

Question: In your example you mentioned disease tracking; e.g. Google Flu Trends is a good example of preventive predictive policing. Are there any examples where - instead of increasing Policing in the lives of communities - sociologists or social workers are called in to use predictive tools, instead of more criminalization?

Whitney: I'm not aware of whether Police departments are sending social workers instead of Police officers. But that wouldn't surprise me, because algorithms are being used to flag suspected child abuse, and in the US they're gonna send a social worker in that regard. So I would not be surprised if that's also being considered, since that's part of the resources.

Herald: OK, so if you have a really short question, then microphone No. 2, please. Last question.

Question: Okay, thank you for the talk. This talk, as well as a few others, brought up the debate about the fine-tuning that is required between false positives and preventing crimes or terror. Now, it's a different situation if a Policeman, or a system, is predicting that somebody is stealing a piece of paper from someone, or that someone is planning a terror attack. And the justification to prevent it at the expense of false positives is different in these cases. How do you make sure that the decision, or the fine-tuning, is not made deep down in the algorithm by the programmers, but rather by the customer - the Policemen or the authorities?

Whitney: I can imagine that Police officers are using common sense in that, and their knowledge about the situation, and even what they're being told by the algorithm.
You hope that they're gonna take... they probably are gonna take terrorism to a different level than a common burglary, or the stealing of a piece of paper, or a non-violent crime. And that fine-tuning is probably done on a Police-department-by-Police-department basis.

Herald: Thank you! This was Whitney Merrill. Give her a warm round of applause, please!

Whitney: Thank you!

*applause*

*postroll music*

*Subtitles created by c3subtitles.de in the year 2016. Join and help us!*