1 00:00:01,905 --> 00:00:04,705 ♪ ♪ 2 00:00:06,805 --> 00:00:08,738 SARAH BRAYNE: We live in this era where 3 00:00:08,738 --> 00:00:11,038 we leave digital traces throughout the course 4 00:00:11,038 --> 00:00:13,138 of our everyday lives. 5 00:00:13,138 --> 00:00:14,871 ANDY CLARNO: What is this data, 6 00:00:14,871 --> 00:00:16,138 how is it collected, how is it being used? 7 00:00:16,138 --> 00:00:20,705 NARRATOR: One way it's being used is to make predictions 8 00:00:20,705 --> 00:00:22,038 about who might commit a crime... 9 00:00:22,038 --> 00:00:23,838 Hey, give me all your money, man! 10 00:00:23,838 --> 00:00:25,371 NARRATOR: ...and who should get bail. 11 00:00:25,371 --> 00:00:27,305 JUDGE: On count one, you're charged with felony intimidation... 12 00:00:27,305 --> 00:00:30,305 ANDREW FERGUSON: The idea is that if you look at past crimes, 13 00:00:30,305 --> 00:00:32,838 you might be able to predict the future. 14 00:00:32,838 --> 00:00:34,938 WILLIAM ISAAC: We want safer communities, 15 00:00:34,938 --> 00:00:36,638 we want societies that are less incarcerated. 16 00:00:36,638 --> 00:00:39,438 NARRATOR: But is that what we're getting? 17 00:00:39,438 --> 00:00:41,071 Are the predictions reliable? 18 00:00:41,071 --> 00:00:43,571 CATHY O'NEIL: I think algorithms can, 19 00:00:43,571 --> 00:00:44,838 in many cases, be better than people. 20 00:00:44,838 --> 00:00:47,338 But, of course, algorithms don't have consciousness. 21 00:00:47,338 --> 00:00:50,238 The algorithm only knows what it's been fed. 22 00:00:50,238 --> 00:00:51,305 RUHA BENJAMIN: Because it's technology, 23 00:00:51,305 --> 00:00:54,005 we don't question them as much as we might 24 00:00:54,005 --> 00:00:56,838 a racist judge or a racist officer. 25 00:00:56,838 --> 00:01:00,038 They're behind this veneer of neutrality. 26 00:01:00,038 --> 00:01:03,238 ISAAC: We need to know who's accountable 27 00:01:03,238 --> 00:01:06,338 when systems harm the communities 28 00:01:06,338 --> 00:01:07,605 that they're designed to serve. 29 00:01:07,605 --> 00:01:11,605 NARRATOR: Can we trust the justice of predictive algorithms? 30 00:01:11,605 --> 00:01:13,105 And should we? 31 00:01:13,105 --> 00:01:14,905 "Computers Vs. Crime," 32 00:01:14,905 --> 00:01:17,738 right now, on "NOVA." 33 00:01:39,905 --> 00:01:42,005 (computers booting up) 34 00:01:42,005 --> 00:01:44,771 ♪ ♪ 35 00:01:44,771 --> 00:01:47,438 NARRATOR: We live in a world of big data, 36 00:01:47,438 --> 00:01:49,671 where computers look for patterns 37 00:01:49,671 --> 00:01:52,205 in vast collections of information 38 00:01:52,205 --> 00:01:54,038 in order to predict the future. 39 00:01:54,038 --> 00:01:58,705 And we depend on their accuracy. 40 00:01:58,705 --> 00:02:00,505 Is it a good morning for jogging? 41 00:02:00,505 --> 00:02:02,971 Will this become cancer? 42 00:02:02,971 --> 00:02:05,371 What movie should I choose? 43 00:02:05,371 --> 00:02:08,038 The best way to beat traffic? 44 00:02:08,038 --> 00:02:09,771 Your computer can tell you. 45 00:02:09,771 --> 00:02:13,405 Similar computer programs, called predictive algorithms, 46 00:02:13,405 --> 00:02:17,005 are mining big data to make predictions 47 00:02:17,005 --> 00:02:19,271 about crime and punishment-- 48 00:02:19,271 --> 00:02:22,638 reinventing how our criminal legal system works. 
49 00:02:22,638 --> 00:02:24,671 Policing agencies have used these computer algorithms 50 00:02:24,671 --> 00:02:30,205 in an effort to predict where the next crime will occur 51 00:02:30,205 --> 00:02:31,871 and even who the perpetrator will be. 52 00:02:31,871 --> 00:02:34,205 ASSISTANT DISTRICT ATTORNEY: Here, the state is recommending... 53 00:02:34,205 --> 00:02:35,005 NARRATOR: Judges use them 54 00:02:35,005 --> 00:02:37,105 to determine who should get bail 55 00:02:37,105 --> 00:02:38,571 and who shouldn't. 56 00:02:38,571 --> 00:02:41,205 JUDGE: If you fail to appear next time, you get no bond. 57 00:02:41,205 --> 00:02:44,171 NARRATOR: It may sound like the police of the future 58 00:02:44,171 --> 00:02:45,671 in the movie "Minority Report." I'm placing you under arrest 59 00:02:45,671 --> 00:02:47,638 for the future murder of Sarah Marks. 60 00:02:47,638 --> 00:02:49,571 NARRATOR: But fiction it's not. 61 00:02:49,571 --> 00:02:55,138 How do these predictions actually work? 62 00:02:55,138 --> 00:02:56,938 Can computer algorithms 63 00:02:56,938 --> 00:03:01,438 make our criminal legal system more equitable? 64 00:03:01,438 --> 00:03:08,305 Are these algorithms truly fair and free of human bias? 65 00:03:11,205 --> 00:03:12,305 ANDREW PAPACHRISTOS: I grew up in Chicago 66 00:03:12,305 --> 00:03:14,771 in the 1980s and early 1990s. 67 00:03:14,771 --> 00:03:21,671 ♪ ♪ 68 00:03:24,305 --> 00:03:26,471 My dad was an immigrant from Greece, 69 00:03:26,471 --> 00:03:29,705 we worked in my family's restaurant, 70 00:03:29,705 --> 00:03:31,271 called KaMar's. 71 00:03:31,271 --> 00:03:37,005 NARRATOR: Andrew Papachristos was a 16-year-old kid 72 00:03:37,005 --> 00:03:39,771 in the North Side of Chicago in the 1990s. 73 00:03:39,771 --> 00:03:42,938 I spent a lot of my formative years busing tables, 74 00:03:42,938 --> 00:03:47,405 serving people hamburgers and gyros. 75 00:03:47,405 --> 00:03:48,938 It kind of was a whole family affair. 76 00:03:48,938 --> 00:03:53,938 NARRATOR: Young Papachristos was aware the streets could be dangerous, 77 00:03:53,938 --> 00:04:00,038 but never imagined the violence would touch him or his family. 78 00:04:00,038 --> 00:04:02,871 REPORTER: Two more gang-related murders Monday night. 79 00:04:02,871 --> 00:04:05,171 PAPACHRISTOS: And of course, you know, the '80s and '90s in Chicago 80 00:04:05,171 --> 00:04:06,638 was some of the historically most violent periods in Chicago. 81 00:04:06,638 --> 00:04:11,705 Street corner drug markets, street organizations. 82 00:04:11,705 --> 00:04:14,938 And then like a lot of other businesses on our, on our block 83 00:04:14,938 --> 00:04:16,171 and in our neighborhood, 84 00:04:16,171 --> 00:04:19,471 local gangs tried to extort my family and the business. 85 00:04:19,471 --> 00:04:23,471 And my dad had been running KaMar's for 30 years 86 00:04:23,471 --> 00:04:24,871 and kind of just said no. 87 00:04:24,871 --> 00:04:27,071 ♪ ♪ 88 00:04:27,071 --> 00:04:32,171 (sirens blaring) 89 00:04:32,171 --> 00:04:37,071 NARRATOR: Then, one night, the family restaurant burned to the ground. 90 00:04:37,071 --> 00:04:40,071 Police suspected arson. 91 00:04:40,071 --> 00:04:42,038 PAPACHRISTOS: It was quite a shock to our family, 92 00:04:42,038 --> 00:04:43,305 'cause everybody in the neighborhood worked 93 00:04:43,305 --> 00:04:45,871 in the restaurant at one point in their life. 94 00:04:45,871 --> 00:04:51,538 And my parents lost 30 years of their lives. 
95 00:04:51,538 --> 00:04:54,505 That was really one of the events that made me want to 96 00:04:54,505 --> 00:04:55,471 understand violence. 97 00:04:55,471 --> 00:04:56,905 Like, how could this happen? 98 00:04:56,905 --> 00:04:59,205 ♪ ♪ 99 00:04:59,205 --> 00:05:00,471 NARRATOR: About a decade later, 100 00:05:00,471 --> 00:05:06,071 Papachristos was a graduate student searching for answers. 101 00:05:06,071 --> 00:05:07,105 PAPACHRISTOS: In graduate school, 102 00:05:07,105 --> 00:05:11,171 I was working on a violence prevention program 103 00:05:11,171 --> 00:05:12,138 that brought together community members, 104 00:05:12,138 --> 00:05:17,305 including street outreach workers. 105 00:05:17,305 --> 00:05:19,338 And we were sitting at a table, 106 00:05:19,338 --> 00:05:21,571 and one of these outreach workers asked me, 107 00:05:21,571 --> 00:05:22,871 the university student, 108 00:05:22,871 --> 00:05:24,771 "Who's next? 109 00:05:24,771 --> 00:05:27,138 Who's going to get shot next?" 110 00:05:30,538 --> 00:05:32,905 And where that led was me sitting down 111 00:05:32,905 --> 00:05:36,571 with stacks of shooting and, and homicide files 112 00:05:36,571 --> 00:05:39,505 with a red pen and a legal pad, 113 00:05:39,505 --> 00:05:41,638 by hand creating these network images 114 00:05:41,638 --> 00:05:43,305 of, this person shot this person, 115 00:05:43,305 --> 00:05:46,038 and this person was involved with this group and this event, 116 00:05:46,038 --> 00:05:49,605 and creating a web of these relationships. 117 00:05:49,605 --> 00:05:50,938 And then I learned that 118 00:05:50,938 --> 00:05:52,971 there's this whole science about networks. 119 00:05:52,971 --> 00:05:55,371 I didn't have to invent anything. 120 00:05:55,371 --> 00:05:57,371 ♪ ♪ 121 00:05:57,371 --> 00:06:00,638 NARRATOR: Social network analysis was already influencing 122 00:06:00,638 --> 00:06:01,805 popular culture. 123 00:06:01,805 --> 00:06:07,205 "Six Degrees of Separation" was a play on Broadway. 124 00:06:07,205 --> 00:06:10,938 And then, there was Six Degrees of Kevin Bacon. 125 00:06:10,938 --> 00:06:12,371 PAPACHRISTOS: The idea was, you would play this game, 126 00:06:12,371 --> 00:06:15,105 and whoever got the shortest distance to Kevin Bacon 127 00:06:15,105 --> 00:06:16,738 would win. 128 00:06:16,738 --> 00:06:19,538 So Robert De Niro was in a movie with so-and-so, 129 00:06:19,538 --> 00:06:20,805 who was in a movie with Kevin Bacon. 130 00:06:20,805 --> 00:06:24,405 It was creating, essentially, a series of ties 131 00:06:24,405 --> 00:06:26,205 among movies and actors. 132 00:06:26,205 --> 00:06:28,905 And in fact, there's a mathematics 133 00:06:28,905 --> 00:06:30,038 behind that principle. 134 00:06:30,038 --> 00:06:34,571 It's actually old mathematical graph theory, right? 135 00:06:34,571 --> 00:06:36,471 That goes back to 1900s mathematics. 136 00:06:36,471 --> 00:06:41,005 And lots of scientists started seeing that there were 137 00:06:41,005 --> 00:06:42,071 mathematical principles, 138 00:06:42,071 --> 00:06:46,671 and computational resources-- computers, data-- 139 00:06:46,671 --> 00:06:48,938 were at a point that you could test those things. 140 00:06:48,938 --> 00:06:51,071 So it was in a very exciting time. 141 00:06:51,071 --> 00:06:54,438 We looked at arrest records, at police stops, 142 00:06:54,438 --> 00:06:56,538 and we looked at victimization records. 143 00:06:56,538 --> 00:06:57,705 Who was the victim of a homicide 144 00:06:57,705 --> 00:07:00,705 or a non-fatal shooting? 
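The "degrees of separation" idea is, at bottom, just shortest-path distance in a graph. As a rough illustration only (the names and ties below are invented, and this is not code from the research described here), a breadth-first search over a tiny co-appearance network computes that distance:

```python
from collections import deque

# Invented "who appeared with whom" ties, purely for illustration.
ties = {
    "De Niro": {"Streep", "Pacino"},
    "Streep": {"De Niro", "Bacon"},
    "Pacino": {"De Niro"},
    "Bacon": {"Streep"},
}

def degrees_of_separation(graph, start, target):
    """Breadth-first search: number of ties on the shortest path between two people."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        if person == target:
            return dist
        for other in graph.get(person, ()):
            if other not in seen:
                seen.add(other)
                queue.append((other, dist + 1))
    return None  # the two people are not connected at all

print(degrees_of_separation(ties, "Pacino", "Bacon"))  # -> 3
```

The same calculation, run over co-arrest and victimization records instead of movie credits, is what the network models described next build on.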
145 00:07:00,705 --> 00:07:02,038 ♪ ♪ 146 00:07:02,038 --> 00:07:08,038 The statistical model starts by creating the social networks of, 147 00:07:08,038 --> 00:07:09,438 say, everybody who may have been arrested 148 00:07:09,438 --> 00:07:10,705 in a, in a particular neighborhood. 149 00:07:10,705 --> 00:07:14,305 So Person A and Person B were in a robbery together, 150 00:07:14,305 --> 00:07:16,505 they have a tie, and then Person B and Person C 151 00:07:16,505 --> 00:07:19,971 were, were stopped by the police in another instance. 152 00:07:19,971 --> 00:07:22,205 And it creates networks of thousands of people. 153 00:07:22,205 --> 00:07:25,271 Understanding that events are connected, 154 00:07:25,271 --> 00:07:26,671 places are connected. 155 00:07:26,671 --> 00:07:28,438 That there are old things, like disputes between crews, 156 00:07:28,438 --> 00:07:35,138 which actually drive behavior for generations. 157 00:07:35,138 --> 00:07:37,171 What we saw was striking. 158 00:07:37,171 --> 00:07:38,938 (snaps): And you could see it immediately, 159 00:07:38,938 --> 00:07:40,438 and you could see it a mile away. 160 00:07:40,438 --> 00:07:42,838 Which was, gunshot victims clumped together. 161 00:07:42,838 --> 00:07:45,838 You, you very rarely see one victim. 162 00:07:45,838 --> 00:07:47,271 You see two, three, four. 163 00:07:47,271 --> 00:07:48,405 Sometimes they string across time and space. 164 00:07:48,405 --> 00:07:53,938 And then the model predicts, what's the probability 165 00:07:53,938 --> 00:07:56,571 that this is going to lead to a shooting 166 00:07:56,571 --> 00:07:59,038 on the same pathway in the future? 167 00:07:59,038 --> 00:08:02,438 (gun firing, people shouting) 168 00:08:02,438 --> 00:08:05,138 REPORTER: Another young man lies dead. 169 00:08:05,138 --> 00:08:08,105 NARRATOR: In Boston, Papachristos found 170 00:08:08,105 --> 00:08:10,505 that 85% of all gunshot injuries 171 00:08:10,505 --> 00:08:13,338 occurred within a single social network. 172 00:08:13,338 --> 00:08:15,271 Individuals in this network 173 00:08:15,271 --> 00:08:17,505 were less than five handshakes away 174 00:08:17,505 --> 00:08:20,205 from the victim of a gun homicide 175 00:08:20,205 --> 00:08:22,738 or non-fatal shooting. 176 00:08:22,738 --> 00:08:24,838 The closer a person was 177 00:08:24,838 --> 00:08:26,071 connected to a gunshot victim, he found, 178 00:08:26,071 --> 00:08:32,505 the greater the probability that that person would be shot. 179 00:08:32,505 --> 00:08:35,571 Around 2011, when Papachristos was presenting 180 00:08:35,571 --> 00:08:38,505 his groundbreaking work on social networks 181 00:08:38,505 --> 00:08:40,305 and gang violence, 182 00:08:40,305 --> 00:08:43,905 the Chicago Police Department wanted to know more. 183 00:08:43,905 --> 00:08:45,038 PAPACHRISTOS: We were at a conference. 184 00:08:45,038 --> 00:08:46,905 The then-superintendent of the police department, 185 00:08:46,905 --> 00:08:49,205 he was asking me a bunch of questions. 186 00:08:49,205 --> 00:08:50,705 He had clearly read the paper. 187 00:08:50,705 --> 00:08:53,271 NARRATOR: The Chicago Police Department 188 00:08:53,271 --> 00:08:55,771 was working on its own predictive policing program 189 00:08:55,771 --> 00:08:57,571 to fight crime. 190 00:08:57,571 --> 00:09:00,271 They were convinced that Papachristos's model 191 00:09:00,271 --> 00:09:05,971 could make their new policing model even more effective.
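A minimal sketch of that kind of network calculation, using invented records and the open-source networkx library (this is not the actual Chicago or Boston model): build a co-arrest network, then count how many "handshakes" separate each person from a prior gunshot victim.

```python
import networkx as nx

# Hypothetical records: each pair was arrested or stopped together.
co_arrests = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
gunshot_victims = {"E"}  # hypothetical prior shooting victim

G = nx.Graph(co_arrests)

# Network distance ("handshakes") from each person to the nearest prior victim.
for person in sorted(G.nodes):
    dists = [nx.shortest_path_length(G, person, v)
             for v in gunshot_victims if nx.has_path(G, person, v)]
    print(person, "->", min(dists) if dists else None)
```

In the published research, being fewer handshakes away from a prior victim was associated with a higher probability of being shot; a real model combines a measure like this with many other factors.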
193 00:09:05,971 --> 00:09:07,171 LOGAN KOEPKE: Predictive policing involves 194 00:09:07,171 --> 00:09:12,605 looking to historical crime data to predict future events, 195 00:09:12,605 --> 00:09:14,805 either where police believe crime may occur 196 00:09:14,805 --> 00:09:18,605 or who might be involved in certain crimes. 197 00:09:18,605 --> 00:09:21,371 ♪ ♪ 198 00:09:21,371 --> 00:09:22,571 So it's the use of historical data to forecast a future event. 199 00:09:22,571 --> 00:09:27,138 NARRATOR: At the core of these programs is software, 200 00:09:27,138 --> 00:09:30,138 which, like all computer programs, 201 00:09:30,138 --> 00:09:33,405 is built around an algorithm. 202 00:09:33,405 --> 00:09:35,371 So, think of an algorithm like a recipe. 203 00:09:35,371 --> 00:09:40,005 ♪ ♪ 204 00:09:40,005 --> 00:09:41,805 You have inputs, 205 00:09:41,805 --> 00:09:44,605 which are your ingredients, you have the algorithm, 206 00:09:44,605 --> 00:09:45,438 which is the steps. 207 00:09:45,438 --> 00:09:50,271 ♪ ♪ 208 00:09:50,271 --> 00:09:51,771 And then there's the output, 209 00:09:51,771 --> 00:09:53,871 which is hopefully the delicious cake you're making. 210 00:09:57,405 --> 00:09:59,105 GROUP: Happy birthday! 211 00:09:59,105 --> 00:10:02,538 So one way to think about algorithms is to think about 212 00:10:02,538 --> 00:10:03,805 the hiring process. 213 00:10:03,805 --> 00:10:07,171 In fact, recruiters have been studied for a hundred years. 214 00:10:07,171 --> 00:10:10,538 And it turns out many human recruiters 215 00:10:10,538 --> 00:10:13,371 have a standard algorithm 216 00:10:13,371 --> 00:10:16,038 when they're looking at a résumé. 217 00:10:16,038 --> 00:10:19,471 So they start with your name, and then they look to see 218 00:10:19,471 --> 00:10:21,271 where you went to school, and then finally, 219 00:10:21,271 --> 00:10:23,905 they look at what your last job was. 220 00:10:23,905 --> 00:10:26,438 If they don't see the pattern they're looking for... 221 00:10:26,438 --> 00:10:28,138 (bell dings) ...that's all the time you get. 222 00:10:28,138 --> 00:10:32,105 And in a sense, that's exactly what artificial intelligence 223 00:10:32,105 --> 00:10:35,371 is doing, as well, in a very basic level. 224 00:10:35,371 --> 00:10:37,738 It's recognizing sets of patterns and using that 225 00:10:37,738 --> 00:10:42,338 to decide what the next step in its decision process would be. 226 00:10:42,338 --> 00:10:47,371 ♪ ♪ 227 00:10:47,371 --> 00:10:48,571 NARRATOR: What is commonly referred to 228 00:10:48,571 --> 00:10:51,605 as artificial intelligence, or A.I., 229 00:10:51,605 --> 00:10:54,338 is a process called machine learning, 230 00:10:54,338 --> 00:10:56,871 where a computer algorithm will adjust on its own, 231 00:10:56,871 --> 00:10:58,005 without human instructions, 232 00:10:58,005 --> 00:11:03,205 in response to the patterns it finds in the data. 233 00:11:03,205 --> 00:11:07,005 These powerful processes can analyze more data 234 00:11:07,005 --> 00:11:08,005 than any person can, 235 00:11:08,005 --> 00:11:13,271 and find patterns never recognized before. 236 00:11:13,271 --> 00:11:15,071 The principles for machine learning 237 00:11:15,071 --> 00:11:16,138 were invented in the 1950s, 238 00:11:16,138 --> 00:11:22,271 but began proliferating only after about 2010. 239 00:11:22,271 --> 00:11:23,371 What we consider machine learning today 240 00:11:23,371 --> 00:11:28,071 came about because hard drives became very cheap.
241 00:11:28,071 --> 00:11:31,238 So it was really easy to get a lot of data on everyone 242 00:11:31,238 --> 00:11:32,938 in every aspect of life. 243 00:11:32,938 --> 00:11:35,705 And the question is, what can we do with all of that data? 244 00:11:35,705 --> 00:11:39,305 Those new uses are things like predictive policing, 245 00:11:39,305 --> 00:11:42,238 they are things like deciding whether or not a person's 246 00:11:42,238 --> 00:11:44,338 going to get a job or not, 247 00:11:44,338 --> 00:11:45,805 or be invited for a job interview. 248 00:11:45,805 --> 00:11:51,271 NARRATOR: So how does a powerful tool like machine learning work? 249 00:11:51,271 --> 00:11:53,771 Take the case of a hiring algorithm. 250 00:11:53,771 --> 00:11:56,738 First, a computer needs to understand the objective. 251 00:11:56,738 --> 00:11:59,271 Here, the objective is identifying 252 00:11:59,271 --> 00:12:01,605 the best candidate for the job. 253 00:12:01,605 --> 00:12:04,671 The algorithm looks at résumés of former job candidates 254 00:12:04,671 --> 00:12:10,571 and searches for keywords in résumés of successful hires. 255 00:12:10,571 --> 00:12:14,438 The résumés are what's called training data. 256 00:12:14,438 --> 00:12:18,038 The algorithm assigns values to each keyword. 257 00:12:18,038 --> 00:12:20,471 Words that appear more frequently in the résumés 258 00:12:20,471 --> 00:12:23,838 of successful candidates are given more value. 259 00:12:23,838 --> 00:12:27,438 The system learns from past résumés the patterns 260 00:12:27,438 --> 00:12:31,138 of qualities that are associated with successful hires. 261 00:12:31,138 --> 00:12:33,271 Then it makes its predictions by identifying these 262 00:12:33,271 --> 00:12:36,505 same patterns from the résumés of potential candidates. 263 00:12:36,505 --> 00:12:39,371 ♪ ♪ 264 00:12:39,371 --> 00:12:41,305 In a similar way, 265 00:12:41,305 --> 00:12:43,671 the Chicago police wanted to find patterns in crime reports 266 00:12:43,671 --> 00:12:47,538 and arrest records to predict who would be connected 267 00:12:47,538 --> 00:12:48,671 to violence in the future. 268 00:12:48,671 --> 00:12:54,805 They thought Papachristos's model could help. 269 00:12:54,805 --> 00:12:58,071 Obviously we wanted to, and tried, and framed and wrote 270 00:12:58,071 --> 00:12:59,238 all the caveats and made our recommendations to say, 271 00:12:59,238 --> 00:13:03,738 "This research should be in this public health space." 272 00:13:03,738 --> 00:13:06,305 But once the math is out there, 273 00:13:06,305 --> 00:13:07,505 once the statistics are out there, 274 00:13:07,505 --> 00:13:10,938 people can also take it and do what they want with it. 275 00:13:13,038 --> 00:13:16,705 NARRATOR: While Papachristos saw the model as a tool to identify 276 00:13:16,705 --> 00:13:17,871 future victims of gun violence, 277 00:13:17,871 --> 00:13:21,971 CPD saw the chance to identify not only future victims, 278 00:13:21,971 --> 00:13:24,838 but future criminals. 279 00:13:26,805 --> 00:13:28,871 First it took me, you know, by, by surprise, 280 00:13:28,871 --> 00:13:30,205 and then it got me worried. 281 00:13:30,205 --> 00:13:31,471 What is it gonna do? 282 00:13:31,471 --> 00:13:33,738 Who is it gonna harm? 283 00:13:33,738 --> 00:13:35,971 ♪ ♪ 284 00:13:35,971 --> 00:13:39,005 NARRATOR: What the police wanted to predict was who was at risk 285 00:13:39,005 --> 00:13:42,405 for being involved in future violence. 286 00:13:42,405 --> 00:13:43,871 Gimme all your money, man.
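The résumé-screening example above can be reduced to a toy sketch of the keyword-weighting idea (the résumés and hiring labels below are invented, and no real system is this simple): words that show up often in the résumés of past successful hires get more weight, and a new résumé is scored by how much of that pattern it contains.

```python
from collections import Counter

# Invented training data: past résumés labeled by whether the person was hired.
past = [("python sql leadership", True),
        ("retail cashier", False),
        ("python statistics leadership", True)]

# Weight each keyword by how often it appears in résumés of successful hires.
weights = Counter()
for text, hired in past:
    if hired:
        weights.update(text.split())

def score(resume):
    """Higher score = more overlap with the patterns seen in past successful hires."""
    return sum(weights[word] for word in resume.split())

print(score("python leadership"))  # -> 4
print(score("retail cashier"))     # -> 0
```

The same mechanics show how historical bias gets in: whatever happens to correlate with past hires is rewarded, whether or not it has anything to do with the job.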
287 00:13:43,871 --> 00:13:47,471 NARRATOR: Training on hundreds of thousands of arrest records, 288 00:13:47,471 --> 00:13:51,371 the computer algorithm looks for patterns or factors 289 00:13:51,371 --> 00:13:53,838 associated with violent crime 290 00:13:53,838 --> 00:13:55,271 to calculate the risk that an individual 291 00:13:55,271 --> 00:14:00,571 will be connected to future violence. 292 00:14:00,571 --> 00:14:03,205 Using social network analysis, 293 00:14:03,205 --> 00:14:05,238 arrest records of associates 294 00:14:05,238 --> 00:14:08,371 are also included in that calculation. 295 00:14:08,371 --> 00:14:14,038 The program was called the Strategic Subject List, or SSL. 296 00:14:14,038 --> 00:14:16,405 It would be one of the most controversial 297 00:14:16,405 --> 00:14:17,871 in Chicago policing history. 298 00:14:17,871 --> 00:14:20,938 ANDY CLARNO: The idea behind the Strategic Subjects List, 299 00:14:20,938 --> 00:14:22,871 or the SSL, 300 00:14:22,871 --> 00:14:24,438 was to try to identify the people who would be 301 00:14:24,438 --> 00:14:29,671 most likely to become involved as what they called 302 00:14:29,671 --> 00:14:32,671 a "party to violence," either as a shooter or a victim. 303 00:14:32,671 --> 00:14:34,405 ♪ ♪ 304 00:14:34,405 --> 00:14:36,671 NARRATOR: Chicago police would use Papachristos's research 305 00:14:36,671 --> 00:14:40,438 to evaluate what was called an individual's 306 00:14:40,438 --> 00:14:43,505 "co-arrest network." 307 00:14:43,505 --> 00:14:45,605 And the way that the Chicago Police Department 308 00:14:45,605 --> 00:14:49,138 calculated an individual's network was through 309 00:14:49,138 --> 00:14:50,638 kind of two degrees of removal. 310 00:14:50,638 --> 00:14:53,971 Anybody that I'd been arrested with and anybody that they 311 00:14:53,971 --> 00:14:56,605 would, had been arrested with counted as people who were 312 00:14:56,605 --> 00:14:58,271 within my network. 313 00:14:58,271 --> 00:15:01,671 So my risk score would be based on my individual history 314 00:15:01,671 --> 00:15:03,871 of arrest and victimization, as well as the histories 315 00:15:03,871 --> 00:15:06,605 of arrest and victimization of people within that 316 00:15:06,605 --> 00:15:10,071 two-degree network of mine. 317 00:15:10,071 --> 00:15:11,571 It was colloquially known as the "heat list." 318 00:15:11,571 --> 00:15:12,705 If you were hot, you were on it. 319 00:15:12,705 --> 00:15:15,571 And they gave you literally a risk score. 320 00:15:15,571 --> 00:15:16,671 At one time, it was zero to 500-plus. 321 00:15:16,671 --> 00:15:18,871 If you're 500-plus, you are a high-risk person. 322 00:15:18,871 --> 00:15:21,705 ♪ ♪ 323 00:15:21,705 --> 00:15:22,905 And if you made this heat list, 324 00:15:22,905 --> 00:15:26,938 you might find a detective knocking on your front door. 325 00:15:26,938 --> 00:15:32,138 ♪ ♪ 326 00:15:32,138 --> 00:15:33,171 NARRATOR: Trying to predict 327 00:15:33,171 --> 00:15:38,538 future criminal activity is not a new idea. 328 00:15:38,538 --> 00:15:41,171 Scotland Yard in London began using this approach 329 00:15:41,171 --> 00:15:45,171 by mapping crime events in the 1930s. 330 00:15:48,871 --> 00:15:50,105 But in the 1990s, 331 00:15:50,105 --> 00:15:55,138 it was New York City Police Commissioner William Bratton 332 00:15:55,138 --> 00:15:57,938 who took crime mapping to another level. 333 00:15:57,938 --> 00:16:00,771 BRATTON: I run the New York City Police Department. 
334 00:16:00,771 --> 00:16:03,171 My competition is the criminal element. 335 00:16:03,171 --> 00:16:06,771 NARRATOR: Bratton convinced policing agencies across the country 336 00:16:06,771 --> 00:16:09,205 that data-driven policing was the key 337 00:16:09,205 --> 00:16:10,571 to successful policing strategies. 338 00:16:10,571 --> 00:16:12,571 Part of this is to prevent crime in the first place. 339 00:16:12,571 --> 00:16:17,571 ♪ ♪ 340 00:16:17,571 --> 00:16:18,738 NARRATOR: Bratton was inspired 341 00:16:18,738 --> 00:16:22,071 by the work of his own New York City Transit Police. 342 00:16:22,071 --> 00:16:24,071 As you see all those, 343 00:16:24,071 --> 00:16:25,938 uh, dots on the map, 344 00:16:25,938 --> 00:16:27,305 that's our opponents. 345 00:16:27,305 --> 00:16:29,171 NARRATOR: It was called Charts of the Future, 346 00:16:29,171 --> 00:16:34,005 and credited with cutting subway felonies by 27% 347 00:16:34,005 --> 00:16:34,638 and robberies by a third. 348 00:16:34,638 --> 00:16:39,438 Bratton saw potential. 349 00:16:39,438 --> 00:16:42,305 He ordered all New York City precincts 350 00:16:42,305 --> 00:16:43,271 to systematically map crime, 351 00:16:43,271 --> 00:16:48,538 collect data, find patterns, report back. 352 00:16:48,538 --> 00:16:50,938 The new approach was called CompStat. 353 00:16:50,938 --> 00:16:54,738 BRAYNE: CompStat, I think, in a way, is kind of a precursor 354 00:16:54,738 --> 00:16:55,905 of predictive policing, 355 00:16:55,905 --> 00:17:00,605 in the sense that many of the same principles there-- 356 00:17:00,605 --> 00:17:03,071 you know, using data tracking, year-to-dates, 357 00:17:03,071 --> 00:17:06,438 identifying places where law enforcement interventions 358 00:17:06,438 --> 00:17:07,571 could be effective, et cetera-- 359 00:17:07,571 --> 00:17:10,271 really laid the groundwork for predictive policing. 360 00:17:10,271 --> 00:17:12,705 ♪ ♪ 361 00:17:12,705 --> 00:17:14,538 NARRATOR: By the early 2000s, 362 00:17:14,538 --> 00:17:16,971 as computational power increased, 363 00:17:16,971 --> 00:17:18,771 criminologists were convinced this new data trove 364 00:17:18,771 --> 00:17:23,705 could be used in machine learning to create models 365 00:17:23,705 --> 00:17:24,671 that predict when and where 366 00:17:24,671 --> 00:17:27,838 crime would happen in the future. 367 00:17:27,838 --> 00:17:30,038 ♪ ♪ 368 00:17:30,038 --> 00:17:32,105 REPORTER: L.A. police now say the gunmen opened fire 369 00:17:32,105 --> 00:17:33,805 with a semi-automatic weapon. 370 00:17:33,805 --> 00:17:35,405 NARRATOR: In 2008, 371 00:17:35,405 --> 00:17:38,805 now as chief of the Los Angeles Police Department, 372 00:17:38,805 --> 00:17:41,471 Bratton joined with academics at U.C.L.A. 373 00:17:41,471 --> 00:17:44,705 to help launch a predictive policing system 374 00:17:44,705 --> 00:17:46,171 called PredPol, 375 00:17:46,171 --> 00:17:49,105 powered by a machine learning algorithm. 376 00:17:49,105 --> 00:17:52,538 ♪ ♪ 377 00:17:52,538 --> 00:17:54,171 ISAAC: PredPol started 378 00:17:54,171 --> 00:17:56,105 as a spin-off of a set of, like, 379 00:17:56,105 --> 00:18:00,038 government contracts that were related to military work. 380 00:18:00,038 --> 00:18:02,171 They were developing 381 00:18:02,171 --> 00:18:05,571 a form of an algorithm that was used to predict I.E.D.s.
382 00:18:05,571 --> 00:18:07,771 (device explodes) 383 00:18:07,771 --> 00:18:08,971 And it was a technique that was used 384 00:18:08,971 --> 00:18:13,238 to also detect aftershocks and seismographic activity. 385 00:18:13,238 --> 00:18:15,871 (dogs barking and whining, objects clattering) 386 00:18:15,871 --> 00:18:17,138 And after those contracts ended, 387 00:18:17,138 --> 00:18:19,305 the company decided they wanted to apply this 388 00:18:19,305 --> 00:18:20,471 in the domain of, of policing 389 00:18:20,471 --> 00:18:22,405 domestically in the United States. 390 00:18:22,405 --> 00:18:25,271 (radio beeping) 391 00:18:25,271 --> 00:18:27,271 NARRATOR: The PredPol model 392 00:18:27,271 --> 00:18:28,671 relies on three types of historical data: 393 00:18:28,671 --> 00:18:35,271 type of crime, crime location, and time of crime, 394 00:18:35,271 --> 00:18:37,405 going back two to five years. 395 00:18:37,405 --> 00:18:38,971 The algorithm is looking for patterns 396 00:18:38,971 --> 00:18:44,271 to identify locations where crime is most likely to occur. 397 00:18:44,271 --> 00:18:46,871 As new crime incidents are reported, 398 00:18:46,871 --> 00:18:51,738 they get folded into the calculation. 399 00:18:51,738 --> 00:18:52,805 The predictions are displayed on a map 400 00:18:52,805 --> 00:18:57,405 as 500 x 500 foot areas that officers are then 401 00:18:57,405 --> 00:18:59,838 directed to patrol. 402 00:18:59,838 --> 00:19:01,705 ISAAC: And then from there, the algorithm says, 403 00:19:01,705 --> 00:19:04,805 "Okay, based on what we know about the kind of 404 00:19:04,805 --> 00:19:06,638 "very recent history, 405 00:19:06,638 --> 00:19:08,771 "where is likely that we'll see crime 406 00:19:08,771 --> 00:19:11,105 in the next day or the next hour?" 407 00:19:11,105 --> 00:19:14,238 ♪ ♪ 408 00:19:14,238 --> 00:19:15,771 BRAYNE: One of the key reasons 409 00:19:15,771 --> 00:19:16,805 that police start using these tools 410 00:19:16,805 --> 00:19:20,071 is the efficient and even, to a certain extent, 411 00:19:20,071 --> 00:19:21,371 like in their logic, 412 00:19:21,371 --> 00:19:24,305 more fair, um, and, and justifiable allocation 413 00:19:24,305 --> 00:19:25,538 of their police resources. 414 00:19:25,538 --> 00:19:28,138 ♪ ♪ 415 00:19:28,138 --> 00:19:29,805 NARRATOR: By 2013, 416 00:19:29,805 --> 00:19:33,405 in addition to PredPol, predictive policing systems 417 00:19:33,405 --> 00:19:37,105 developed by companies like HunchLab, IBM, and Palantir 418 00:19:37,105 --> 00:19:39,505 were in use across the country. 419 00:19:39,505 --> 00:19:42,005 (radios running in background) 420 00:19:42,005 --> 00:19:44,905 And computer algorithms 421 00:19:44,905 --> 00:19:46,171 were also being adopted in courtrooms. 422 00:19:46,171 --> 00:19:52,871 BAILIFF: 21CF3810, State of Wisconsin versus Chantille... 423 00:19:52,871 --> 00:19:55,838 KATHERINE FORREST: These tools are used in pretrial determinations, 424 00:19:55,838 --> 00:19:58,971 they're used in sentencing determinations, 425 00:19:58,971 --> 00:20:00,305 and they're used in housing determinations. 426 00:20:00,305 --> 00:20:05,738 They're also used, importantly, in the plea bargaining phase. 
427 00:20:05,738 --> 00:20:08,138 They're used really throughout the entire process 428 00:20:08,138 --> 00:20:13,005 to try to do what judges have been doing, 429 00:20:13,005 --> 00:20:15,205 which is the very, very difficult task 430 00:20:15,205 --> 00:20:16,405 of trying to understand and predict 431 00:20:16,405 --> 00:20:21,671 what will a human being do tomorrow, or the next day, 432 00:20:21,671 --> 00:20:23,371 or next month, or three years from now. 433 00:20:23,371 --> 00:20:25,338 ASSISTANT DISTRICT ATTORNEY: Bail forfeited. 434 00:20:25,338 --> 00:20:27,671 He failed to appear 12/13/21. 435 00:20:27,671 --> 00:20:29,538 Didn't even make it to preliminary hearing. 436 00:20:29,538 --> 00:20:33,205 The software tools are an attempt to try to predict 437 00:20:33,205 --> 00:20:34,371 it better than humans can. 438 00:20:34,371 --> 00:20:36,371 MICHELLE HAVAS: On count one, you're charged with 439 00:20:36,371 --> 00:20:38,271 felony intimidation of a victim. 440 00:20:38,271 --> 00:20:40,871 SWEENEY: So, in the United States, you're innocent 441 00:20:40,871 --> 00:20:44,305 until you've been proven guilty, but you've been arrested. 442 00:20:44,305 --> 00:20:45,805 Now that you've been arrested, 443 00:20:45,805 --> 00:20:48,171 a judge has to decide whether or not 444 00:20:48,171 --> 00:20:49,538 you get out on bail, 445 00:20:49,538 --> 00:20:51,705 or how high or low that bail should be. 446 00:20:51,705 --> 00:20:55,205 You're charged with driving on a suspended license. 447 00:20:55,205 --> 00:20:56,405 I've set that bond at $1,000. 448 00:20:56,405 --> 00:20:58,838 No insurance, I've set that bond at $1,000. 449 00:20:58,838 --> 00:21:01,638 ALISON SHAMES: One of the problems is, 450 00:21:01,638 --> 00:21:04,405 judges often are relying on money bond 451 00:21:04,405 --> 00:21:05,605 or financial conditions of release. 452 00:21:05,605 --> 00:21:08,205 JUDGE: So I'm going to lower his fine 453 00:21:08,205 --> 00:21:10,005 to make it a bit more reasonable. 454 00:21:10,005 --> 00:21:13,171 So instead of $250,000 cash, 455 00:21:13,171 --> 00:21:15,305 surety is $100,000. 456 00:21:15,305 --> 00:21:17,871 SHAMES: It allows people who have access to money to be released. 457 00:21:17,871 --> 00:21:20,571 If you are poor, you are often being detained pretrial. 458 00:21:20,571 --> 00:21:27,071 Approximately 70% of the people in jail are there on pretrial. 459 00:21:27,071 --> 00:21:29,505 These are people who are presumed innocent, 460 00:21:29,505 --> 00:21:32,538 but are detained during the pretrial stage of their case. 461 00:21:32,538 --> 00:21:38,205 NARRATOR: Many jurisdictions use pretrial assessment algorithms 462 00:21:38,205 --> 00:21:42,105 with a goal to reduce jail populations and decrease 463 00:21:42,105 --> 00:21:43,571 the impact of judicial bias. 464 00:21:43,571 --> 00:21:49,238 SHAMES: The use of a tool like this takes historical data 465 00:21:49,238 --> 00:21:51,371 and assesses, based on research, 466 00:21:51,371 --> 00:21:56,871 associates factors that are predictive of the two outcomes 467 00:21:56,871 --> 00:21:59,138 that the judge is concerned with. 468 00:21:59,138 --> 00:22:02,071 That's community safety and whether that person 469 00:22:02,071 --> 00:22:05,871 will appear back in court during the pretrial period. 470 00:22:05,871 --> 00:22:08,305 ♪ ♪ 471 00:22:08,305 --> 00:22:11,905 NARRATOR: Many of these algorithms are based on a concept called 472 00:22:11,905 --> 00:22:12,838 a regression model. 
473 00:22:12,838 --> 00:22:17,038 The earliest, called linear regression, 474 00:22:17,038 --> 00:22:24,338 dates back to 19th-century mathematics. 475 00:22:24,338 --> 00:22:27,071 O'NEIL: At the end of the day, machine learning algorithms 476 00:22:27,071 --> 00:22:29,905 do exactly what linear regression does, 477 00:22:29,905 --> 00:22:31,971 which is predict-- 478 00:22:31,971 --> 00:22:34,605 based on the initial conditions, the situation they're seeing-- 479 00:22:34,605 --> 00:22:36,338 predict what will happen in the future, 480 00:22:36,338 --> 00:22:38,038 whether that's, like, in the next one minute 481 00:22:38,038 --> 00:22:40,438 or the next four years. 482 00:22:41,805 --> 00:22:44,571 NARRATOR: Throughout the United States, over 60 jurisdictions 483 00:22:44,571 --> 00:22:49,371 use predictive algorithms as part of the legal process. 484 00:22:49,371 --> 00:22:53,105 One of the most widely used is COMPAS. 485 00:22:53,105 --> 00:22:55,171 The COMPAS algorithm weighs factors, 486 00:22:55,171 --> 00:22:57,271 including a defendant's answers to a questionnaire, 487 00:22:57,271 --> 00:23:02,571 to provide a risk assessment score. 488 00:23:02,571 --> 00:23:07,271 These scores are used every day by judges to guide decisions 489 00:23:07,271 --> 00:23:12,571 about pretrial detention, bail, and even sentencing. 490 00:23:12,571 --> 00:23:15,705 But the reliability of the COMPAS algorithm 491 00:23:15,705 --> 00:23:17,271 has been questioned. 492 00:23:17,271 --> 00:23:24,071 In 2016, ProPublica published an investigative report 493 00:23:24,071 --> 00:23:25,671 on the COMPAS risk assessment tool. 494 00:23:25,671 --> 00:23:31,305 BENJAMIN: Investigators wanted to see if the scores were accurate 495 00:23:31,305 --> 00:23:33,238 in predicting whether these individuals 496 00:23:33,238 --> 00:23:35,871 would commit a future crime. 497 00:23:35,871 --> 00:23:38,171 And they found two things that were interesting. 498 00:23:38,171 --> 00:23:42,938 One was that the score was remarkably unreliable 499 00:23:42,938 --> 00:23:46,838 in predicting who would commit a, a crime in the future 500 00:23:46,838 --> 00:23:48,138 over this two-year period. 501 00:23:48,138 --> 00:23:52,338 But then the other thing that ProPublica investigators found 502 00:23:52,338 --> 00:23:57,705 was that Black people were much more likely to be deemed 503 00:23:57,705 --> 00:24:01,605 high risk and white people low risk. 504 00:24:01,605 --> 00:24:04,838 NARRATOR: This was true even in cases when the Black person 505 00:24:04,838 --> 00:24:06,871 was arrested for a minor offense and the white person 506 00:24:06,871 --> 00:24:12,238 in question was arrested for a more serious crime. 507 00:24:12,238 --> 00:24:17,905 BENJAMIN: This ProPublica study was one of the first to begin 508 00:24:17,905 --> 00:24:22,105 to burst the bubble of technology 509 00:24:22,105 --> 00:24:24,738 as somehow objective and neutral. 510 00:24:24,738 --> 00:24:31,571 NARRATOR: The article created a national controversy. 511 00:24:31,571 --> 00:24:35,171 But at Dartmouth, a student convinced her professor 512 00:24:35,171 --> 00:24:36,538 they should both be more than stunned. 513 00:24:36,538 --> 00:24:40,571 HANY FARID: As it turns out, one of my students, Julia Dressel, 514 00:24:40,571 --> 00:24:42,405 reads the same article and said, 515 00:24:42,405 --> 00:24:44,205 "This is terrible. 516 00:24:44,205 --> 00:24:45,538 We should do something about it." 
(chuckles) 517 00:24:45,538 --> 00:24:48,271 This is the difference between an awesome idealistic student 518 00:24:48,271 --> 00:24:50,338 and a jaded, uh, professor. 519 00:24:50,338 --> 00:24:52,405 And I thought, "I think you're right." 520 00:24:52,405 --> 00:24:54,738 And as we were sort of struggling to understand 521 00:24:54,738 --> 00:24:58,771 the underlying roots of the bias in the algorithms, 522 00:24:58,771 --> 00:25:00,938 we asked ourselves a really simple question: 523 00:25:00,938 --> 00:25:05,105 are the algorithms today, are they doing better than humans? 524 00:25:05,105 --> 00:25:07,905 Because presumably, that's why you have these algorithms, 525 00:25:07,905 --> 00:25:11,571 is that they eliminate some of the bias and the prejudices, 526 00:25:11,571 --> 00:25:14,071 either implicit or explicit, in the human judgment. 527 00:25:14,071 --> 00:25:17,871 NARRATOR: To analyze COMPAS's risk assessment accuracy, 528 00:25:17,871 --> 00:25:20,871 they used the crowdsourcing platform Mechanical Turk. 529 00:25:20,871 --> 00:25:25,571 Their online study included 400 participants 530 00:25:25,571 --> 00:25:29,171 who evaluated 1,000 defendants. 531 00:25:29,171 --> 00:25:30,771 FARID: We asked participants to 532 00:25:30,771 --> 00:25:34,438 read a very short paragraph about an actual defendant. 533 00:25:34,438 --> 00:25:35,871 How old they were, 534 00:25:35,871 --> 00:25:37,871 whether they were male or female, 535 00:25:37,871 --> 00:25:40,405 what their prior juvenile conviction record was, 536 00:25:40,405 --> 00:25:42,438 and their prior adult conviction record. 537 00:25:42,438 --> 00:25:45,071 And, importantly, we didn't tell people their race. 538 00:25:45,071 --> 00:25:46,371 And then we ask a very simple question, 539 00:25:46,371 --> 00:25:48,071 "Do you think this person will commit a crime 540 00:25:48,071 --> 00:25:49,971 in the next two years?" 541 00:25:49,971 --> 00:25:50,938 Yes, no. 542 00:25:50,938 --> 00:25:53,938 And again, these are non-experts. 543 00:25:53,938 --> 00:25:55,605 These are people being paid 544 00:25:55,605 --> 00:25:58,438 a couple of bucks online to answer a survey. 545 00:25:58,438 --> 00:26:00,638 No criminal justice experience, 546 00:26:00,638 --> 00:26:02,505 don't know anything about the defendants. 547 00:26:02,505 --> 00:26:05,571 They were as accurate as the commercial software 548 00:26:05,571 --> 00:26:07,371 being used in the courts today, 549 00:26:07,371 --> 00:26:09,238 one particular piece of software. 550 00:26:09,238 --> 00:26:11,505 That was really surprising. 551 00:26:11,505 --> 00:26:13,771 We would've expected a little bit of improvement. 552 00:26:13,771 --> 00:26:14,971 After all, the algorithm has access 553 00:26:14,971 --> 00:26:17,838 to huge amounts of training data. 554 00:26:19,338 --> 00:26:22,205 NARRATOR: And something else puzzled the researchers. 555 00:26:22,205 --> 00:26:24,505 The MTurk workers' answers to questions 556 00:26:24,505 --> 00:26:27,838 about who would commit crimes in the future and who wouldn't 557 00:26:27,838 --> 00:26:30,738 showed a surprising pattern of racial bias, 558 00:26:30,738 --> 00:26:33,871 even though race wasn't indicated 559 00:26:33,871 --> 00:26:35,938 in any of the profiles. 
560 00:26:35,938 --> 00:26:38,605 They were more likely to say a person of color 561 00:26:38,605 --> 00:26:41,338 will be high risk when they weren't, 562 00:26:41,338 --> 00:26:44,071 and they were more likely to say that a white person 563 00:26:44,071 --> 00:26:46,838 would not be high risk when in fact they were. 564 00:26:46,838 --> 00:26:49,705 And this made no sense to us at all. 565 00:26:49,705 --> 00:26:50,938 You don't know the race of the person. 566 00:26:50,938 --> 00:26:53,071 How is it possible that you're biased against them? 567 00:26:53,071 --> 00:26:54,838 (radios running in background) 568 00:26:54,838 --> 00:26:56,871 In this country, if you are a person of color, 569 00:26:56,871 --> 00:26:59,738 you are significantly more likely, historically, 570 00:26:59,738 --> 00:27:00,871 to be arrested, 571 00:27:00,871 --> 00:27:03,571 to be charged, and to be convicted of a crime. 572 00:27:03,571 --> 00:27:06,305 So in fact, prior convictions 573 00:27:06,305 --> 00:27:09,471 is a proxy for your race. 574 00:27:09,471 --> 00:27:11,505 Not a perfect proxy, but it is correlated, 575 00:27:11,505 --> 00:27:13,638 because of the historical inequities 576 00:27:13,638 --> 00:27:14,805 in the criminal justice system 577 00:27:14,805 --> 00:27:17,771 and policing in this country. 578 00:27:17,771 --> 00:27:19,171 (siren blaring) 579 00:27:19,171 --> 00:27:22,105 MAN: It's my car, bro, come on, what are y'all doing? 580 00:27:22,105 --> 00:27:23,705 Like, this, this is racial profiling. 581 00:27:23,705 --> 00:27:25,738 NARRATOR: Research indicates a Black person 582 00:27:25,738 --> 00:27:29,271 is five times more likely to be stopped without cause 583 00:27:29,271 --> 00:27:30,771 than a white person. 584 00:27:30,771 --> 00:27:33,138 Black people are at least twice as likely 585 00:27:33,138 --> 00:27:35,838 as white people to be arrested for drug offenses, 586 00:27:35,838 --> 00:27:37,938 even though Black and white people 587 00:27:37,938 --> 00:27:39,871 use drugs at the same rate. 588 00:27:39,871 --> 00:27:43,671 Black people are also about 12 times 589 00:27:43,671 --> 00:27:45,971 more likely to be wrongly convicted of drug crimes. 590 00:27:45,971 --> 00:27:51,705 FORREST: Historically, Black men have been arrested at higher levels 591 00:27:51,705 --> 00:27:53,005 than other populations. 592 00:27:53,005 --> 00:27:58,338 Therefore, the tool predicts that a Black man, for instance, 593 00:27:58,338 --> 00:28:00,671 will be arrested at a rate and recidivate at a rate 594 00:28:00,671 --> 00:28:04,271 that is higher than a white individual. 595 00:28:06,405 --> 00:28:09,138 FARID: And so what was happening is, you know, the big data, 596 00:28:09,138 --> 00:28:10,338 the big machine learning folks are saying, 597 00:28:10,338 --> 00:28:13,238 "Look, we're not giving it race-- it can't be racist." 598 00:28:13,238 --> 00:28:15,638 But that is spectacularly naive, 599 00:28:15,638 --> 00:28:18,438 because we know that other things correlate with race. 600 00:28:18,438 --> 00:28:19,605 In this case, number of prior convictions. 601 00:28:19,605 --> 00:28:23,371 And so when you train an algorithm on historical data, 602 00:28:23,371 --> 00:28:24,538 well, guess what. 603 00:28:24,538 --> 00:28:26,905 It's going to reproduce history-- of course it will. 
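One way to make that disparity concrete is to compute error rates separately for each group, which is essentially what the ProPublica analysis did. The handful of records below is invented purely for illustration and is not COMPAS data:

```python
# Invented records: (group, labeled_high_risk, reoffended_within_two_years)
records = [
    ("Black", True,  False), ("Black", True,  True),
    ("Black", False, False), ("Black", True,  False),
    ("white", False, True),  ("white", False, False),
    ("white", True,  True),  ("white", False, True),
]

def false_positive_rate(rows):
    """Share of people who did NOT reoffend but were still labeled high risk."""
    did_not_reoffend = [r for r in rows if not r[2]]
    flagged = [r for r in did_not_reoffend if r[1]]
    return len(flagged) / len(did_not_reoffend)

for group in ("Black", "white"):
    rows = [r for r in records if r[0] == group]
    print(group, round(false_positive_rate(rows), 2))  # Black 0.67, white 0.0
```

A higher false positive rate for one group means its members are more often labeled high risk when they do not go on to reoffend.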
604 00:28:28,505 --> 00:28:31,238 NARRATOR: Compounding the problem is the fact that 605 00:28:31,238 --> 00:28:32,771 predictive algorithms can't be put on the witness stand 606 00:28:32,771 --> 00:28:38,238 and interrogated about their decision-making processes. 607 00:28:38,238 --> 00:28:39,738 FORREST: Many defendants have had difficulty 608 00:28:39,738 --> 00:28:45,205 getting access to the underlying information 609 00:28:45,205 --> 00:28:47,305 that tells them, 610 00:28:47,305 --> 00:28:50,638 what was the data set that was used to assess me? 611 00:28:50,638 --> 00:28:53,305 What were the inputs that were used? 612 00:28:53,305 --> 00:28:55,205 How were those inputs weighted? 613 00:28:55,205 --> 00:28:57,638 So you've got what can be, these days, 614 00:28:57,638 --> 00:28:58,871 increasingly, a black box. 615 00:28:58,871 --> 00:29:02,438 A lack of transparency. 616 00:29:04,271 --> 00:29:05,671 NARRATOR: Some black box algorithms get their name 617 00:29:05,671 --> 00:29:08,505 from a lack of transparency about the code 618 00:29:08,505 --> 00:29:10,505 and data inputs they use, 619 00:29:10,505 --> 00:29:13,571 which can be deemed proprietary. 620 00:29:13,571 --> 00:29:17,705 But that's not the only kind of black box. 621 00:29:17,705 --> 00:29:21,071 A black box is any system which is so complicated 622 00:29:21,071 --> 00:29:24,171 that you can see what goes in and you can see what comes out, 623 00:29:24,171 --> 00:29:26,971 but it's impossible to understand 624 00:29:26,971 --> 00:29:29,271 what's going on inside it. 625 00:29:29,271 --> 00:29:32,238 All of those steps in the algorithm 626 00:29:32,238 --> 00:29:37,238 are hidden inside phenomenally complex math 627 00:29:37,238 --> 00:29:39,605 and processes. 628 00:29:39,605 --> 00:29:43,071 FARID: And I would argue that when you are using algorithms 629 00:29:43,071 --> 00:29:45,905 in mission-critical applications, 630 00:29:45,905 --> 00:29:47,071 like criminal justice system, 631 00:29:47,071 --> 00:29:49,438 we should not be deploying black box algorithms. 632 00:29:55,405 --> 00:29:58,371 NARRATOR: PredPol, like many predictive platforms, 633 00:29:58,371 --> 00:30:00,738 claimed a proven record for crime reduction. 634 00:30:00,738 --> 00:30:05,538 In 2015, PredPol published its algorithm 636 00:30:05,538 --> 00:30:08,505 in a peer-reviewed journal. 637 00:30:08,505 --> 00:30:11,138 William Isaac and Kristian Lum, 638 00:30:11,138 --> 00:30:13,005 research scientists who investigate 639 00:30:13,005 --> 00:30:14,905 predictive policing platforms, 640 00:30:14,905 --> 00:30:17,371 analyzed the algorithm. 641 00:30:17,371 --> 00:30:19,638 ISAAC: We just kind of saw the algorithm 642 00:30:19,638 --> 00:30:21,271 as going back to the same one or two blocks 643 00:30:21,271 --> 00:30:24,438 every single time. 644 00:30:26,205 --> 00:30:27,771 And that's kind of strange, 645 00:30:27,771 --> 00:30:30,838 because if you had a truly predictive policing system, 646 00:30:30,838 --> 00:30:33,371 you wouldn't necessarily see it going to the same locations 647 00:30:33,371 --> 00:30:35,505 over and over again. 648 00:30:38,338 --> 00:30:40,271 NARRATOR: For their experiment, 649 00:30:40,271 --> 00:30:41,638 Isaac and Lum used a different data set, 650 00:30:41,638 --> 00:30:47,471 public health data, to map illicit drug use in Oakland.
651 00:30:47,471 --> 00:30:50,938 ISAAC: So, a good chunk of the city was kind of 652 00:30:50,938 --> 00:30:53,671 evenly distributed in terms of where 653 00:30:53,671 --> 00:30:55,405 potential illicit drug use might be. 654 00:30:55,405 --> 00:30:59,338 But the police predictions were clustering around areas 655 00:30:59,338 --> 00:31:02,138 where police had, you know, 656 00:31:02,138 --> 00:31:04,238 historically found incidents of illicit drug use. 657 00:31:04,238 --> 00:31:08,071 Specifically, we saw significant numbers of neighborhoods 658 00:31:08,071 --> 00:31:10,171 that were predominantly non-white and lower-income 659 00:31:10,171 --> 00:31:16,138 being deliberate targets of the predictions. 660 00:31:16,138 --> 00:31:19,471 NARRATOR: Even though illicit drug use was a citywide problem, 661 00:31:19,471 --> 00:31:21,138 the algorithm focused its predictions 662 00:31:21,138 --> 00:31:25,305 on low-income neighborhoods and communities of color. 663 00:31:26,471 --> 00:31:29,738 ISAAC: The reason why is actually really important. 664 00:31:29,738 --> 00:31:31,005 It's very hard to divorce 665 00:31:31,005 --> 00:31:33,071 these predictions from those histories 666 00:31:33,071 --> 00:31:35,938 and legacies of over-policing. 667 00:31:37,138 --> 00:31:41,338 As a result of that, they manifest themselves in the data. 668 00:31:41,338 --> 00:31:43,571 NARRATOR: In an area where there is more police presence, 669 00:31:43,571 --> 00:31:46,171 more crime is uncovered. 670 00:31:47,471 --> 00:31:49,771 The crime data indicates to the algorithm 671 00:31:49,771 --> 00:31:52,438 that the heavily policed neighborhood 672 00:31:52,438 --> 00:31:55,205 is where future crime will be found, 673 00:31:55,205 --> 00:31:56,471 even though there may be other neighborhoods 674 00:31:56,471 --> 00:32:02,105 where crimes are being committed at the same or higher rate. 675 00:32:03,305 --> 00:32:05,738 ISAAC: Every new prediction that you generate 676 00:32:05,738 --> 00:32:08,105 is going to be increasingly dependent 677 00:32:08,105 --> 00:32:11,471 on the behavior of the algorithm in the past. 678 00:32:11,471 --> 00:32:13,371 So, you know, if you go ten days, 20 days, 679 00:32:13,371 --> 00:32:15,838 30 days into the future, right, after using an algorithm, 680 00:32:15,838 --> 00:32:19,371 all of those predictions have changed the behavior 681 00:32:19,371 --> 00:32:20,705 of the police department 682 00:32:20,705 --> 00:32:23,738 and are now being folded back into the next day's prediction. 683 00:32:26,171 --> 00:32:27,571 NARRATOR: The result can be a feedback loop 684 00:32:27,571 --> 00:32:32,005 that reinforces historical policing practices. 685 00:32:34,438 --> 00:32:35,871 SWEENEY: All of these different types 686 00:32:35,871 --> 00:32:38,205 of machine learning algorithms are all trying to help us 687 00:32:38,205 --> 00:32:41,071 figure out, are there some patterns in this data? 688 00:32:41,071 --> 00:32:43,738 It's up to us to then figure out, 689 00:32:43,738 --> 00:32:46,238 are those legitimate patterns, do they, 690 00:32:46,238 --> 00:32:47,405 are they useful patterns? 691 00:32:47,405 --> 00:32:49,105 Because the computer has no idea. 692 00:32:49,105 --> 00:32:51,371 It didn't make a logical association. 693 00:32:51,371 --> 00:32:54,805 It just made it, made a correlation. 
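A toy simulation can make the feedback loop concrete. All numbers here are invented and this is not PredPol's algorithm: two areas have the same underlying offense rate, but offenses are only likely to be recorded where patrols are sent, and patrols are sent wherever the records are highest.

```python
import random
random.seed(0)

TRUE_RATE = 5                  # same underlying offenses per day in both areas
recorded = {"A": 12, "B": 10}  # historical recorded counts (A was policed a bit more)

for day in range(30):
    # "Prediction": send the patrol where past records show the most crime.
    patrolled = max(recorded, key=recorded.get)
    for area in recorded:
        offenses = sum(random.random() < 0.5 for _ in range(2 * TRUE_RATE))
        # Only a fraction of offenses is observed, a much larger one where police are sent.
        seen_frac = 0.6 if area == patrolled else 0.1
        recorded[area] += sum(random.random() < seen_frac for _ in range(offenses))

print(recorded)  # records pile up in "A" even though the underlying rates are identical
```

After a few simulated weeks the recorded counts diverge sharply even though nothing about the underlying behavior differs, which mirrors the pattern described in the Oakland analysis.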
694 00:32:56,071 --> 00:32:59,338 MING: My favorite definition of artificial intelligence 695 00:32:59,338 --> 00:33:02,805 is, it's any autonomous system 696 00:33:02,805 --> 00:33:04,371 that can make decisions under uncertainty. 697 00:33:04,371 --> 00:33:09,905 You can't make decisions under uncertainty without bias. 698 00:33:11,305 --> 00:33:14,771 In fact, it's impossible to escape from having bias. 699 00:33:14,771 --> 00:33:16,471 It's a mathematical reality 700 00:33:16,471 --> 00:33:21,338 about any intelligent system, even us. 701 00:33:21,338 --> 00:33:23,038 (siren blaring in distance) 702 00:33:23,038 --> 00:33:26,171 NARRATOR: And even if the goal is to get rid of prejudice, 703 00:33:26,171 --> 00:33:31,705 bias in the historical data can undermine that objective. 704 00:33:33,738 --> 00:33:35,505 Amazon discovered this 705 00:33:35,505 --> 00:33:38,071 when they began a search for top talent 706 00:33:38,071 --> 00:33:40,738 with a hiring algorithm whose training data 707 00:33:40,738 --> 00:33:45,071 depended on hiring successes from the past. 708 00:33:45,071 --> 00:33:49,338 MING: Amazon, somewhat famously within the A.I. industry, 709 00:33:49,338 --> 00:33:54,205 they tried to build a hiring algorithm. 710 00:33:54,205 --> 00:33:56,871 They had a massive data set. 711 00:33:56,871 --> 00:33:58,571 They had all the right answers, 712 00:33:58,571 --> 00:34:00,971 because they knew literally who got hired 713 00:34:00,971 --> 00:34:04,038 and who got that promotion in their first year. 714 00:34:04,038 --> 00:34:05,771 (typing) 715 00:34:05,771 --> 00:34:07,871 NARRATOR: The company created multiple models 716 00:34:07,871 --> 00:34:10,138 to review past candidates' résumés 717 00:34:10,138 --> 00:34:15,838 and identify some 50,000 key terms. 718 00:34:15,838 --> 00:34:18,471 MING: What Amazon actually wanted to achieve 719 00:34:18,471 --> 00:34:21,105 was to diversify their hiring. 720 00:34:21,105 --> 00:34:24,771 Amazon, just like every other tech company, 721 00:34:24,771 --> 00:34:25,838 and a lot of other companies, as well, 722 00:34:25,838 --> 00:34:29,971 has enormous bias built into its hiring history. 723 00:34:29,971 --> 00:34:35,538 It was always biased, strongly biased, in favor of men, 724 00:34:35,538 --> 00:34:37,838 in favor, generally, 725 00:34:37,838 --> 00:34:40,771 of white or sometimes Asian men. 726 00:34:40,771 --> 00:34:44,038 Well, they went and built a hiring algorithm. 727 00:34:44,038 --> 00:34:45,371 And sure enough, this thing was 728 00:34:45,371 --> 00:34:50,005 the most sexist recruiter you could imagine. 729 00:34:50,005 --> 00:34:52,638 If you said the word "women's" in your résumé, 730 00:34:52,638 --> 00:34:53,971 then it wouldn't hire you. 731 00:34:53,971 --> 00:34:54,971 If you went to a women's college, 732 00:34:54,971 --> 00:34:57,838 it didn't want to hire you. 733 00:34:57,838 --> 00:35:00,405 So they take out all the gender markers, 734 00:35:00,405 --> 00:35:02,271 and all of the women's colleges-- 735 00:35:02,271 --> 00:35:04,305 all the things that explicitly says, 736 00:35:04,305 --> 00:35:05,605 "This is a man," and, "This is a woman," 737 00:35:05,605 --> 00:35:08,671 or even the ones that, obviously, implicitly say it. 738 00:35:08,671 --> 00:35:11,105 So they did that. 739 00:35:11,105 --> 00:35:13,671 And then they trained up their new deep neural network 740 00:35:13,671 --> 00:35:16,171 to decide who Amazon would hire. 741 00:35:16,171 --> 00:35:18,405 And it did something amazing.
742 00:35:18,405 --> 00:35:19,638 It did something no human could do. 743 00:35:19,638 --> 00:35:22,938 It figured out who was a woman and it wouldn't hire them. 744 00:35:24,571 --> 00:35:26,271 It was able to look through 745 00:35:26,271 --> 00:35:29,571 all of the correlations that existed 746 00:35:29,571 --> 00:35:30,771 in that massive data set 747 00:35:30,771 --> 00:35:35,638 and figure out which ones most strongly correlated 748 00:35:35,638 --> 00:35:37,438 with someone getting a promotion. 749 00:35:37,438 --> 00:35:40,871 And the single biggest correlate 750 00:35:40,871 --> 00:35:42,671 of getting a promotion was being a man. 751 00:35:42,671 --> 00:35:46,738 And it figured those patterns out and didn't hire women. 752 00:35:47,971 --> 00:35:54,038 NARRATOR: Amazon abandoned its hiring algorithm in 2017. 753 00:35:54,038 --> 00:35:55,938 Remember the way machine learning works, right? 754 00:35:55,938 --> 00:35:57,905 It's like a student who doesn't really understand 755 00:35:57,905 --> 00:35:59,171 the material in the class. 756 00:35:59,171 --> 00:36:03,105 They got a bunch of questions, they got a bunch of answers. 758 00:36:03,105 --> 00:36:04,605 And now they're trying to pattern match 759 00:36:04,605 --> 00:36:06,171 for a new question and say, "Oh, wait. 760 00:36:06,171 --> 00:36:08,071 "Let me find an answer that looks pretty much 761 00:36:08,071 --> 00:36:09,838 like the questions and answers I saw before." 762 00:36:09,838 --> 00:36:13,071 The algorithm only worked because someone has said, 763 00:36:13,071 --> 00:36:16,605 "Oh, this person whose data you have, 764 00:36:16,605 --> 00:36:18,171 "they were a good employee. 765 00:36:18,171 --> 00:36:19,405 This other person was a bad employee," or, "This person 766 00:36:19,405 --> 00:36:22,471 performed well," or, "This person did not perform well." 767 00:36:24,971 --> 00:36:27,205 O'NEIL: Because algorithms don't just look for patterns, 768 00:36:27,205 --> 00:36:29,305 they look for patterns of success, however it's defined. 769 00:36:29,305 --> 00:36:32,805 But the definition of success is really critically important 770 00:36:32,805 --> 00:36:35,805 to what that end up, ends up being. 771 00:36:35,805 --> 00:36:37,738 And a lot of, a lot of opinion 772 00:36:37,738 --> 00:36:40,738 is embedded in, what, what does success look like? 773 00:36:43,071 --> 00:36:44,605 NARRATOR: In the case of algorithms, 774 00:36:44,605 --> 00:36:47,505 human choices play a critical role. 775 00:36:47,505 --> 00:36:50,171 O'NEIL: The data itself was curated. 776 00:36:50,171 --> 00:36:52,605 Someone decided what data to collect. 777 00:36:52,605 --> 00:36:55,705 Somebody decided what data was not relevant, right? 778 00:36:55,705 --> 00:36:58,238 And they don't exclude it necessarily 779 00:36:58,238 --> 00:36:59,838 intentionally-- they could be blind spots. 780 00:36:59,838 --> 00:37:03,405 NARRATOR: The need to identify such oversights 781 00:37:03,405 --> 00:37:04,405 becomes more urgent 782 00:37:04,405 --> 00:37:09,571 as technology takes on more decision making. 783 00:37:09,571 --> 00:37:11,971 ♪ ♪ 784 00:37:11,971 --> 00:37:15,205 Consider facial recognition technology, 785 00:37:15,205 --> 00:37:16,371 used by law enforcement 786 00:37:16,371 --> 00:37:19,105 in cities around the world for surveillance.
787 00:37:22,938 --> 00:37:24,638 In Detroit, 2018, 788 00:37:24,638 --> 00:37:28,471 law enforcement looked to facial recognition technology 789 00:37:28,471 --> 00:37:29,871 when $3,800 worth of watches 790 00:37:29,871 --> 00:37:34,438 were stolen from an upscale boutique. 791 00:37:35,771 --> 00:37:38,071 Police ran a still frame from the shop's surveillance video 792 00:37:38,071 --> 00:37:42,838 through their facial recognition system to find a match. 793 00:37:42,838 --> 00:37:46,238 How do I turn a face into numbers 794 00:37:46,238 --> 00:37:47,871 that equations can act with? 795 00:37:47,871 --> 00:37:50,671 You turn the individual pixels in the picture of that face 796 00:37:50,671 --> 00:37:53,938 into values. 797 00:37:56,138 --> 00:37:59,471 What it's really looking for are complex patterns 798 00:37:59,471 --> 00:38:01,738 across those pixels. 799 00:38:01,738 --> 00:38:04,705 The sequence of taking a pattern of numbers 800 00:38:04,705 --> 00:38:06,671 and transforming it 801 00:38:06,671 --> 00:38:08,371 into little edges and angles, 802 00:38:08,371 --> 00:38:14,105 then transforming that into eyes and cheekbones and mustaches. 803 00:38:15,638 --> 00:38:16,771 NARRATOR: To find that match, 804 00:38:16,771 --> 00:38:21,905 the system can be trained on billions of photographs. 805 00:38:23,538 --> 00:38:26,105 Facial recognition uses a class of machine learning 806 00:38:26,105 --> 00:38:28,438 called deep learning. 807 00:38:28,438 --> 00:38:30,671 The models built by deep learning techniques 808 00:38:30,671 --> 00:38:35,838 are called neural networks. 809 00:38:35,838 --> 00:38:37,038 VENKATASUBRAMANIAN: A neural network 810 00:38:37,038 --> 00:38:39,271 is, you know, stylized as, you know, trying to model 811 00:38:39,271 --> 00:38:41,805 how neural pathways work in the brain. 812 00:38:43,571 --> 00:38:44,571 You can think of a neural network 813 00:38:44,571 --> 00:38:50,038 as a collection of neurons. 814 00:38:50,038 --> 00:38:51,638 So you put some values into a neuron, 815 00:38:51,638 --> 00:38:55,905 and if they're, sufficiently, they add up to some number, 816 00:38:55,905 --> 00:38:57,238 or they cross some threshold, 817 00:38:57,238 --> 00:39:01,205 this one will fire and send off a new number to the next neuron. 818 00:39:01,205 --> 00:39:03,671 NARRATOR: At a certain threshold, 819 00:39:03,671 --> 00:39:05,971 the neuron will fire to the next neuron. 820 00:39:05,971 --> 00:39:10,671 If it's below the threshold, the neuron doesn't fire. 821 00:39:10,671 --> 00:39:12,771 This process repeats and repeats 822 00:39:12,771 --> 00:39:14,905 across hundreds, possibly thousands of layers, 823 00:39:14,905 --> 00:39:18,071 making connections like the neurons in our brain. 824 00:39:18,071 --> 00:39:19,838 ♪ ♪ 825 00:39:19,838 --> 00:39:24,138 The output is a predictive match. 826 00:39:27,271 --> 00:39:28,438 Based on a facial recognition match, 827 00:39:28,438 --> 00:39:32,471 in January 2020, the police arrested Robert Williams 828 00:39:32,471 --> 00:39:35,005 for the theft of the watches. 829 00:39:37,038 --> 00:39:38,838 The next day, he was released. 830 00:39:38,838 --> 00:39:42,738 Not only did Williams have an alibi, 831 00:39:42,738 --> 00:39:46,238 but it wasn't his face. 832 00:39:46,238 --> 00:39:49,071 MING: To be very blunt about it, these algorithms are probably 833 00:39:49,071 --> 00:39:51,671 dramatically over-trained on white faces. 
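As Venkatasubramanian describes it, each neuron takes in numbers (here, stand-ins for pixel values), weights them, adds them up, and fires a value on to the next layer only if the total crosses a threshold; stacking many such layers is what turns raw pixels into edges, then facial features, then a predicted match. A minimal sketch of one thresholded neuron in Python, with invented weights rather than anything taken from a real face model:

    # One artificial neuron: weight the inputs, sum them, fire if over the threshold.
    def neuron(inputs, weights, threshold):
        total = sum(x * w for x, w in zip(inputs, weights))
        return 1.0 if total >= threshold else 0.0   # fire / don't fire

    # A tiny two-layer "network": toy pixel intensities in, one output value out.
    pixels = [0.2, 0.9, 0.4, 0.7]
    hidden = [
        neuron(pixels, [0.5, -0.2, 0.8, 0.1], threshold=0.5),   # sum 0.31, does not fire
        neuron(pixels, [-0.3, 0.9, 0.2, 0.4], threshold=0.6),   # sum 1.11, fires
    ]
    output = neuron(hidden, [0.7, 0.6], threshold=0.5)
    print(output)   # 1.0 -- this toy detector fired on the input pattern

In a real facial recognition network the weights are not hand-picked; they are learned from the training photographs, which is why Ming's point about whose faces those photographs over- or under-represent matters so much.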
834 00:39:51,671 --> 00:39:57,038 ♪ ♪ 835 00:39:57,038 --> 00:39:59,305 So, of course, algorithms that start out bad 836 00:39:59,305 --> 00:40:01,138 can be improved, in general. 837 00:40:01,138 --> 00:40:04,405 The Gender Shades project found that 838 00:40:04,405 --> 00:40:07,405 certain facial recognition technology, 839 00:40:07,405 --> 00:40:09,171 when they actually tested it on Black women, 840 00:40:09,171 --> 00:40:15,238 it was 65% accurate, whereas for white men, it was 99% accurate. 841 00:40:16,605 --> 00:40:19,738 How did they improve it? Because they did. 842 00:40:19,738 --> 00:40:21,571 They built an algorithm 843 00:40:21,571 --> 00:40:23,638 that was trained on more diverse data. 844 00:40:23,638 --> 00:40:26,438 So I don't think it's completely a lost cause 845 00:40:26,438 --> 00:40:30,238 to improve algorithms to be better. 846 00:40:31,705 --> 00:40:36,305 MAN (in ad voiceover): I used to think my job was all about arrests. 847 00:40:36,305 --> 00:40:38,038 LESLIE KENNEDY: There was a commercial a few years ago 848 00:40:38,038 --> 00:40:40,638 that showed a police officer going to a gas station 849 00:40:40,638 --> 00:40:42,705 and then waiting for the criminal to show up. 850 00:40:42,705 --> 00:40:44,238 MAN: We analyze crime data, 851 00:40:44,238 --> 00:40:46,305 spot patterns, 852 00:40:46,305 --> 00:40:49,171 and figure out where to send patrols. 853 00:40:49,171 --> 00:40:51,438 They said, "Well, our algorithm will tell you exactly 854 00:40:51,438 --> 00:40:53,471 where the crime, the next crime is going to take place." 855 00:40:53,471 --> 00:40:56,271 Well, that's just silly, uh, it, that's not how it works. 856 00:40:56,271 --> 00:40:59,571 MAN: By stopping it before it happens. 857 00:40:59,571 --> 00:41:00,438 (sighs) 858 00:41:00,438 --> 00:41:03,105 MAN: Let's build a smarter planet. 859 00:41:08,238 --> 00:41:10,571 ♪ ♪ 860 00:41:10,571 --> 00:41:12,971 JOEL CAPLAN: Understanding what it is about these places 861 00:41:12,971 --> 00:41:15,671 that enable crime problems 862 00:41:15,671 --> 00:41:18,405 to emerge and/or persist. 863 00:41:18,405 --> 00:41:21,438 NARRATOR: At Rutgers University, 864 00:41:21,438 --> 00:41:22,671 the researchers who invented 865 00:41:22,671 --> 00:41:26,371 the crime mapping platform called Risk Terrain Modeling, 866 00:41:26,371 --> 00:41:28,371 or RTM, 867 00:41:28,371 --> 00:41:30,871 bristle at the term "predictive policing." 868 00:41:30,871 --> 00:41:36,105 CAPLAN (voiceover): We don't want to predict, we want to prevent. 869 00:41:37,338 --> 00:41:40,005 I worked as a police officer a long time ago, 870 00:41:40,005 --> 00:41:41,271 in the early 2000s. 871 00:41:41,271 --> 00:41:46,871 Police collected data for as long as police have existed. 872 00:41:46,871 --> 00:41:50,105 Now there's a greater recognition 873 00:41:50,105 --> 00:41:52,605 that data can have value. 874 00:41:52,605 --> 00:41:55,305 But it's not just about the data. 875 00:41:55,305 --> 00:41:56,771 It's about how you analyze it, how you use those results. 876 00:41:56,771 --> 00:42:00,405 There's only two data sets that risk terrain modeling uses.
877 00:42:00,405 --> 00:42:02,671 These data sets are local, 878 00:42:02,671 --> 00:42:07,871 current information about crime incidents within a given area 879 00:42:07,871 --> 00:42:11,038 and information about environmental features 880 00:42:11,038 --> 00:42:12,571 that exist in that landscape, 881 00:42:12,571 --> 00:42:13,738 such as bars, fast food restaurants, 882 00:42:13,738 --> 00:42:17,905 convenience stores, schools, parks, alleyways. 883 00:42:19,405 --> 00:42:20,871 KENNEDY: The algorithm is basically 884 00:42:20,871 --> 00:42:24,105 the relationship between these environmental features 885 00:42:24,105 --> 00:42:27,071 and the, the outcome data, which in this case is crime. 886 00:42:27,071 --> 00:42:28,971 The algorithm provides you with a map 887 00:42:28,971 --> 00:42:32,005 of the distribution of the risk values. 888 00:42:33,271 --> 00:42:35,605 ALEJANDRO GIMÉNEZ-SANTANA: This is the highest-risk area, 889 00:42:35,605 --> 00:42:37,005 on this commercial corridor on Bloomfield Avenue. 890 00:42:37,005 --> 00:42:41,738 NARRATOR: But the algorithm isn't intended for use just by police. 891 00:42:41,738 --> 00:42:44,871 Criminologist Alejandro Giménez-Santana 892 00:42:44,871 --> 00:42:47,238 leads the Newark Public Safety Collaborative, 893 00:42:47,238 --> 00:42:51,038 a collection of 40 community organizations. 894 00:42:51,038 --> 00:42:53,271 They use RTM as a diagnostic tool 895 00:42:53,271 --> 00:42:58,271 to understand not just where crime may happen next, 896 00:42:58,271 --> 00:43:00,838 but why. 897 00:43:00,838 --> 00:43:03,471 GIMÉNEZ-SANTANA: Through RTM, we identify this commercial corridor 898 00:43:03,471 --> 00:43:05,005 on Bloomfield Avenue, which is where we are right now, 899 00:43:05,005 --> 00:43:08,738 as a risky area for auto theft due to car idling. 900 00:43:08,738 --> 00:43:09,871 So why is this space 901 00:43:09,871 --> 00:43:12,638 particularly problematic when it comes to auto theft? 902 00:43:14,171 --> 00:43:16,038 One is because we're in a commercial corridor, 903 00:43:16,038 --> 00:43:17,538 where there's high density of people 904 00:43:17,538 --> 00:43:20,905 who go to the beauty salon or to go to a restaurant. 905 00:43:20,905 --> 00:43:22,538 Uber delivery and Uber Eats, 906 00:43:22,538 --> 00:43:24,771 delivery people who come to grab orders that also, 907 00:43:24,771 --> 00:43:26,838 and leave their cars running 908 00:43:26,838 --> 00:43:28,438 create the conditions for this crime 909 00:43:28,438 --> 00:43:31,238 to be concentrated in this particular area. 910 00:43:31,238 --> 00:43:33,171 What the data showed us was, 911 00:43:33,171 --> 00:43:35,971 there was a tremendous rise in auto vehicle thefts. 912 00:43:35,971 --> 00:43:39,238 But we convinced the police department 913 00:43:39,238 --> 00:43:41,505 to take a more social service approach. 914 00:43:41,505 --> 00:43:43,971 NARRATOR: Community organizers convinced police 915 00:43:43,971 --> 00:43:46,838 not to ticket idling cars, 916 00:43:46,838 --> 00:43:48,371 and let organizers create 917 00:43:48,371 --> 00:43:51,671 an effective public awareness poster campaign instead. 918 00:43:51,671 --> 00:43:54,171 And we put it out to the Newark students 919 00:43:54,171 --> 00:43:57,171 to submit in this flyer campaign, 920 00:43:57,171 --> 00:44:00,205 and have their artwork on the actual flyer. 921 00:44:00,205 --> 00:44:02,871 GIMÉNEZ-SANTANA: As you can see, this is the commercial corridor 922 00:44:02,871 --> 00:44:04,538 on Bloomfield Avenue.
923 00:44:04,538 --> 00:44:05,771 The site score shows a six, 924 00:44:05,771 --> 00:44:07,338 which means that we are at the highest risk of auto theft 925 00:44:07,338 --> 00:44:09,538 in this particular location. 926 00:44:09,538 --> 00:44:11,105 And as I move closer to the end 927 00:44:11,105 --> 00:44:14,805 of the commercial corridor, the site risk score is coming down. 928 00:44:14,805 --> 00:44:17,071 NARRATOR: This is the first time in Newark 929 00:44:17,071 --> 00:44:19,505 that police data for crime occurrences 930 00:44:19,505 --> 00:44:23,305 have been shared widely with community members. 931 00:44:23,305 --> 00:44:26,238 ELVIS PEREZ: The kind of data we share is incident-related data-- 932 00:44:26,238 --> 00:44:29,338 sort of time, location, that sort of information. 933 00:44:29,338 --> 00:44:31,471 We don't discuss any private arrest information. 934 00:44:31,471 --> 00:44:35,538 We're trying to avoid a crime. 935 00:44:37,905 --> 00:44:39,638 NARRATOR: In 2019, 936 00:44:39,638 --> 00:44:42,605 Caplan and Kennedy formed a start-up at Rutgers 937 00:44:42,605 --> 00:44:45,538 to meet the rising demand for their technology. 938 00:44:45,538 --> 00:44:49,138 Despite the many possible applications for RTM, 939 00:44:49,138 --> 00:44:51,871 from tracking public health issues 940 00:44:51,871 --> 00:44:53,571 to understanding vehicle crashes, 941 00:44:53,571 --> 00:44:59,738 law enforcement continues to be its principal application. 942 00:44:59,738 --> 00:45:01,438 Like any other technology, 943 00:45:01,438 --> 00:45:04,338 risk terrain modeling can be used for the public good 944 00:45:04,338 --> 00:45:06,438 when people use it wisely. 945 00:45:08,838 --> 00:45:14,871 ♪ ♪ 946 00:45:14,871 --> 00:45:17,838 We as academics and scientists, we actually need to be critical, 947 00:45:17,838 --> 00:45:20,338 because it could be the best model in the world, 948 00:45:20,338 --> 00:45:21,738 it could be very good predictions, 949 00:45:21,738 --> 00:45:22,905 but how you use those predictions 950 00:45:22,905 --> 00:45:24,771 matters, in some ways, even more. 951 00:45:24,771 --> 00:45:26,105 REPORTER: The police department 952 00:45:26,105 --> 00:45:28,805 had revised the SSL numerous times... 953 00:45:28,805 --> 00:45:30,105 NARRATOR: In 2019, 954 00:45:30,105 --> 00:45:34,671 Chicago's inspector general contracted the RAND Corporation 955 00:45:34,671 --> 00:45:38,038 to evaluate the Strategic Subject List, 956 00:45:38,038 --> 00:45:39,571 the predictive policing platform 957 00:45:39,571 --> 00:45:45,738 that incorporated Papachristos's research on social networks. 958 00:45:45,738 --> 00:45:47,405 PAPACHRISTOS: I never wanted to go down this path 959 00:45:47,405 --> 00:45:51,105 of who was the person that was the potential suspect. 960 00:45:51,105 --> 00:45:53,071 And that problem is not necessarily 961 00:45:53,071 --> 00:45:55,038 with the statistical model, it's the fact that someone 962 00:45:55,038 --> 00:45:57,171 took victim and made him an offender. 963 00:45:57,171 --> 00:46:00,138 You've criminalized someone who is at risk, 964 00:46:00,138 --> 00:46:01,671 that you should be prioritizing saving their life. 965 00:46:01,671 --> 00:46:07,271 NARRATOR: It turned out that some 400,000 people were included on the SSL. 966 00:46:07,271 --> 00:46:13,605 Of those, 77% were Black or Hispanic. 967 00:46:15,138 --> 00:46:18,105 The inspector general's audit revealed 968 00:46:18,105 --> 00:46:20,771 that SSL scores were unreliable. 
969 00:46:20,771 --> 00:46:23,638 The RAND Corporation found the program had no impact 970 00:46:23,638 --> 00:46:28,238 on homicide or victimization rates. 971 00:46:28,238 --> 00:46:31,371 (protesters chanting) 972 00:46:31,371 --> 00:46:34,305 NARRATOR: The program was shut down. 973 00:46:37,171 --> 00:46:38,271 But data collection continues 974 00:46:38,271 --> 00:46:41,671 to be essential to law enforcement. 975 00:46:41,671 --> 00:46:45,305 ♪ ♪ 976 00:46:45,305 --> 00:46:47,938 O'NEIL: There are things about us that we might not even 977 00:46:47,938 --> 00:46:51,305 be aware of that are sort of being collected 978 00:46:51,305 --> 00:46:52,938 by the data brokers 979 00:46:52,938 --> 00:46:55,071 and will be held against us for the rest of our lives-- 980 00:46:55,071 --> 00:46:58,605 held against people forever, digitally. 981 00:46:59,738 --> 00:47:03,138 NARRATOR: Data is produced and collected. 982 00:47:03,138 --> 00:47:05,505 But is it accurate? 983 00:47:05,505 --> 00:47:08,505 And can the data be properly vetted? 984 00:47:08,505 --> 00:47:09,638 PAPACHRISTOS: And that was one of the critiques 985 00:47:09,638 --> 00:47:12,871 of not just the Strategic Subjects List, 986 00:47:12,871 --> 00:47:14,171 but the gang database in Chicago. 987 00:47:14,171 --> 00:47:18,071 Any data source that treats data as a stagnant, forever condition 988 00:47:18,071 --> 00:47:20,371 is a problem. 989 00:47:23,405 --> 00:47:25,438 WOMAN: The gang database has been around for four years. 990 00:47:25,438 --> 00:47:28,271 It'll be five in January. 991 00:47:28,271 --> 00:47:30,905 We want to get rid of surveillance 992 00:47:30,905 --> 00:47:33,005 in Black and brown communities. 993 00:47:33,005 --> 00:47:34,838 BENJAMIN: In places like Chicago, 994 00:47:34,838 --> 00:47:37,371 in places like L.A., where I grew up, 995 00:47:37,371 --> 00:47:40,671 there are gang databases with tens of thousands 996 00:47:40,671 --> 00:47:43,771 of people listed, their names listed in these databases. 997 00:47:43,771 --> 00:47:45,905 Just by simply having a certain name 998 00:47:45,905 --> 00:47:47,905 and coming from a certain ZIP code 999 00:47:47,905 --> 00:47:50,505 could land you in these databases. 1000 00:47:50,505 --> 00:47:53,038 Do you all feel safe in Chicago? 1001 00:47:53,038 --> 00:47:54,338 DARRELL DACRES: The cops pulled up out of nowhere. 1002 00:47:54,338 --> 00:47:59,238 Didn't ask any questions, just immediately start beating on us. 1003 00:47:59,238 --> 00:48:00,471 And basically were saying, like, 1004 00:48:00,471 --> 00:48:02,838 what are, what are we doing over here, you know, like, 1005 00:48:02,838 --> 00:48:04,505 in this, in this gangbang area? 1006 00:48:04,505 --> 00:48:07,038 I was already labeled as a gangbanger 1007 00:48:07,038 --> 00:48:08,871 from that area because of where I lived. 1008 00:48:08,871 --> 00:48:10,738 I, I just happened to live there. 1009 00:48:12,738 --> 00:48:13,905 NARRATOR: The Chicago gang database 1010 00:48:13,905 --> 00:48:17,705 is shared with hundreds of law enforcement agencies. 1011 00:48:17,705 --> 00:48:19,538 Even if someone is wrongly included, 1012 00:48:19,538 --> 00:48:24,738 there is no mechanism to have their name removed. 1013 00:48:24,738 --> 00:48:27,171 If you try to apply for an apartment, 1014 00:48:27,171 --> 00:48:29,638 or if you try to apply for a job or a college, 1015 00:48:29,638 --> 00:48:34,838 or even in a, um, a house, it will show 1016 00:48:34,838 --> 00:48:37,805 that you are in this record of a gang database.
1017 00:48:37,805 --> 00:48:39,605 I was arrested for peacefully protesting. 1018 00:48:39,605 --> 00:48:42,105 And they told me that, "Well, 1019 00:48:42,105 --> 00:48:43,971 you're in the gang database." 1020 00:48:43,971 --> 00:48:46,338 But I was never in no gang. 1021 00:48:46,338 --> 00:48:47,471 MAN: Because you have a gang designation, 1022 00:48:47,471 --> 00:48:49,505 you're a security threat group, 1023 00:48:49,505 --> 00:48:51,205 right? 1024 00:48:51,205 --> 00:48:52,505 NARRATOR: Researchers and activists 1025 00:48:52,505 --> 00:48:54,905 have been instrumental in dismantling 1026 00:48:54,905 --> 00:48:57,371 some of these systems. 1027 00:48:57,371 --> 00:48:58,471 And so we continue to push back. 1028 00:48:58,471 --> 00:48:59,338 I mean, the fight is not going to finish 1029 00:48:59,338 --> 00:49:01,205 until we get rid of the database. 1030 00:49:01,205 --> 00:49:02,605 ♪ ♪ 1031 00:49:02,605 --> 00:49:05,238 FERGUSON: I think what we're seeing now 1032 00:49:05,238 --> 00:49:07,771 is not a move away from data. 1033 00:49:07,771 --> 00:49:11,305 It's just a move away from this term "predictive policing." 1034 00:49:11,305 --> 00:49:14,238 But we're seeing big companies, 1035 00:49:14,238 --> 00:49:15,671 big tech, enter the policing space. 1036 00:49:15,671 --> 00:49:19,571 We're seeing the reality that almost all policing now 1037 00:49:19,571 --> 00:49:21,905 is data-driven. 1038 00:49:21,905 --> 00:49:23,638 You're seeing these same police departments 1039 00:49:23,638 --> 00:49:25,271 invest heavily in the technology, 1040 00:49:25,271 --> 00:49:28,405 including other forms of surveillance technology, 1041 00:49:28,405 --> 00:49:30,605 including other forms of databases 1042 00:49:30,605 --> 00:49:32,438 to sort of manage policing. 1043 00:49:32,438 --> 00:49:33,838 (chanting): We want you out! 1044 00:49:33,838 --> 00:49:37,371 NARRATOR: More citizens are calling for regulations 1045 00:49:37,371 --> 00:49:38,571 to audit algorithms 1046 00:49:38,571 --> 00:49:42,205 and guarantee they're accomplishing what they promise 1047 00:49:42,205 --> 00:49:43,438 without harm. 1048 00:49:43,438 --> 00:49:46,905 BRAYNE: Ironically, there is very little data 1049 00:49:46,905 --> 00:49:49,305 on police use of big data. 1050 00:49:49,305 --> 00:49:52,338 And there is no systematic data 1051 00:49:52,338 --> 00:49:54,771 at a national level on how these tools are used. 1052 00:49:54,771 --> 00:49:57,805 The deployment of these tools 1053 00:49:57,805 --> 00:50:00,105 so far outpaces 1054 00:50:00,105 --> 00:50:02,638 legal and regulatory responses to them. 1055 00:50:02,638 --> 00:50:03,605 What you have happening 1056 00:50:03,605 --> 00:50:06,638 is essentially this regulatory Wild West. 1057 00:50:08,005 --> 00:50:09,338 O'NEIL: And we're, like, "Well, it's an algorithm, 1058 00:50:09,338 --> 00:50:11,371 let's, let's just throw it into production." 1059 00:50:11,371 --> 00:50:13,071 Without testing it to whether 1060 00:50:13,071 --> 00:50:18,971 it "works" sufficiently, um, at all. 1061 00:50:20,338 --> 00:50:22,838 NARRATOR: Multiple requests for comment 1062 00:50:22,838 --> 00:50:23,971 from police agencies and law enforcement officials 1063 00:50:23,971 --> 00:50:27,771 in several cities, including Chicago and New York, 1064 00:50:27,771 --> 00:50:31,938 were either declined or went unanswered. 
1065 00:50:31,938 --> 00:50:36,971 ♪ ♪ 1066 00:50:36,971 --> 00:50:40,538 Artificial intelligence must serve people, 1067 00:50:40,538 --> 00:50:42,671 and therefore artificial intelligence 1068 00:50:42,671 --> 00:50:44,338 must always comply with people's rights. 1069 00:50:44,338 --> 00:50:49,538 NARRATOR: The European Union is preparing to implement legislation 1070 00:50:49,538 --> 00:50:51,438 to regulate artificial intelligence. 1071 00:50:51,438 --> 00:50:56,838 In 2021, bills to regulate data science algorithms 1072 00:50:56,838 --> 00:50:59,671 were introduced in 17 states, 1073 00:50:59,671 --> 00:51:03,505 and enacted in Alabama, Colorado, 1074 00:51:03,505 --> 00:51:05,471 Illinois, and Mississippi. 1075 00:51:05,471 --> 00:51:07,671 SWEENEY: If you look carefully on electrical devices, 1076 00:51:07,671 --> 00:51:10,771 you'll see "U.L.," for Underwriters Laboratory. 1077 00:51:10,771 --> 00:51:11,938 That's a process that came about 1078 00:51:11,938 --> 00:51:13,538 so that things, when you plugged them in, 1079 00:51:13,538 --> 00:51:14,905 didn't blow up in your hand. 1080 00:51:14,905 --> 00:51:16,371 That's the same kind of idea 1081 00:51:16,371 --> 00:51:18,971 that we need in these algorithms. 1082 00:51:21,305 --> 00:51:24,271 O'NEIL: We can adjust it to make it better than the past, 1083 00:51:24,271 --> 00:51:25,905 and we can do it carefully, 1084 00:51:25,905 --> 00:51:28,105 and we can do it with, with precision 1085 00:51:28,105 --> 00:51:30,838 in an ongoing conversation about what it means to us 1086 00:51:30,838 --> 00:51:34,071 that it is, it's biased in the right way. 1087 00:51:34,071 --> 00:51:35,405 I don't think you remove bias, 1088 00:51:35,405 --> 00:51:37,905 but you get to a bias that you can live with, 1089 00:51:37,905 --> 00:51:40,638 that you, you think is moral. 1090 00:51:40,638 --> 00:51:43,005 To be clear, like, I, I think we can do better, 1091 00:51:43,005 --> 00:51:44,571 but often doing better 1092 00:51:44,571 --> 00:51:47,571 would look like we don't use this at all. 1093 00:51:47,571 --> 00:51:48,771 (radio running) 1094 00:51:48,771 --> 00:51:51,071 FARID: There's nothing fundamentally wrong 1095 00:51:51,071 --> 00:51:52,338 with trying to predict the future, 1096 00:51:52,338 --> 00:51:54,505 as long as you understand how the algorithms are working, 1097 00:51:54,505 --> 00:51:55,538 how are they being deployed. 1098 00:51:55,538 --> 00:51:59,138 What is the consequence of getting it right? 1099 00:51:59,138 --> 00:52:00,471 And most importantly is, 1100 00:52:00,471 --> 00:52:03,138 what is the consequence of getting it wrong? 1101 00:52:03,138 --> 00:52:04,338 OFFICER: Keep your hands on the steering wheel! 1102 00:52:04,338 --> 00:52:06,005 MAN: My hands haven't moved off the steering wheel! 1103 00:52:06,005 --> 00:52:07,838 MAN 2: Are you gonna arrest me? 1104 00:52:07,838 --> 00:52:08,705 MAN 1: Officer, what are we here for? 1105 00:52:08,705 --> 00:52:09,505 OFFICER: We just want to talk with... 1106 00:52:31,338 --> 00:52:34,538 ♪ ♪ 1107 00:52:55,505 --> 00:52:58,571 ANNOUNCER: This program is available with PBS Passport 1108 00:52:58,571 --> 00:53:00,871 and on Amazon Prime Video. 1109 00:53:00,871 --> 00:53:04,838 ♪ ♪ 1110 00:53:15,371 --> 00:53:20,938 ♪ ♪