*Music*

Herald: Who of you is using Facebook? Twitter? Diaspora? *concerned noise* All the data you enter there goes to a server and ends up in the hands of somebody who uses it, and the next talk is especially about that, because there are also intelligent machines and intelligent algorithms that try to make something out of that data. So the post-doc researcher Jennifer Helsby of the University of Chicago, who works at the intersection of policy and technology, will now ask you the question: To whom would we give that power?

Dr. Helsby: Thanks. *applause*

Okay, so, today I'm going to do a brief tour of intelligent systems and how they're currently used, and then we're going to look at some examples with respect to the properties we might care about these systems having. I'll talk a little bit about some of the work that's been done in academia on these topics, and then we'll talk about some promising paths forward.

I want to start with Kranzberg's First Law of Technology: technology is neither good nor bad, but it also isn't neutral. Technology shapes our world, and it can act as a liberating force, or as an oppressive and controlling force. So, in this talk, I'm going to focus on some of the aspects of intelligent systems that might be more controlling in nature.

As we all know, because of the rapidly decreasing cost of storage and computation, along with the rise of new sensor technologies, data collection devices are being pushed into every aspect of our lives: in our homes, our cars, in our pockets, on our wrists. Data collection systems act as intermediaries for a huge amount of human communication, and much of this data sits in government and corporate databases.

In order to make use of this data, we need to be able to make some inferences. One way of approaching this is to hire a lot of humans, have them manually examine the data and acquire expert knowledge of the domain, and then perhaps they can make some decisions, or at least some recommendations, based on it.
However, there are some problems with this. One is that it's slow, and thus expensive. It's also biased: we know that humans have all sorts of biases, both conscious and unconscious, and it would be nice to have a system that did not have these inaccuracies. It's also not very transparent: I might not really know the factors that led to some decision being made. Even humans themselves often don't really understand why they came to a given decision, because decisions are partly emotional in nature. And thus these human decision-making systems are often difficult to audit.

So, another way to proceed is that maybe I study the system and the data carefully and write down the best rules for making a decision, or I can have a machine dynamically figure out the best rules, as in machine learning. Maybe this is a better approach. It's certainly fast, and thus cheap. Maybe I can construct the system in such a way that it doesn't have the biases that are inherent in human decision making. And since I've written these rules down, or a computer has learned them, I can just show them to somebody, right? And then they can audit it. So, more and more decision making is being done in this way.

In this model, we take data, we make an inference based on that data using these algorithms, and then we can take actions. And when we take this more scientific approach to making decisions and optimizing for a desired outcome, we can take an experimental approach, so we can determine which actions are most effective in achieving that outcome. Maybe there are some types of communication styles that are most effective with certain people. I can perhaps deploy some individualized incentives to get the outcome that I desire. And if I carefully design an experiment with the environment in which people make these decisions, perhaps even very small changes can introduce significant changes in people's behavior.
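To make that experimental approach concrete, here is a minimal sketch in Python of a randomized comparison between two message variants, assuming made-up variant names and effect sizes; it is purely illustrative and not drawn from any system described in the talk.

```python
import random

random.seed(0)

# Hypothetical experiment: each user is randomly shown one of two message
# variants, and we record whether they performed the desired action.
VARIANTS = {"variant_a": 0.040, "variant_b": 0.052}  # made-up true effect sizes

results = {name: [] for name in VARIANTS}
for _ in range(10_000):
    name = random.choice(list(VARIANTS))       # random assignment
    acted = random.random() < VARIANTS[name]   # did the nudge "work"?
    results[name].append(acted)

for name, outcomes in results.items():
    print(f"{name}: shown {len(outcomes)} times, "
          f"{sum(outcomes) / len(outcomes):.1%} took the desired action")

# In a real system the better-performing variant would then be shown to
# everyone, and the next small tweak would be tested against it.
```

The point is only the shape of the loop: randomly assign, measure the outcome, and keep whatever moves people most toward the desired behavior.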
So, through these mechanisms, and this experimental approach, I can maximize the probability that humans do what I want.

Algorithmic decision making is being used in industry and in lots of other areas, from astrophysics to medicine, and is now moving into new domains, including government applications. We have recommendation engines like Netflix, Yelp, and SoundCloud that direct our attention to what we should watch and listen to. Since 2009, Google has used personalized search results, even if you're not logged in to your Google account. We also have algorithmic curation and filtering, as in the case of Facebook's News Feed, Google News, and Yahoo News, which show you which news articles, for example, you should be looking at. This is important, because a lot of people get their news from these media. We even have algorithmic journalists: automatic systems generate articles about weather, traffic, or sports instead of a human.

Another, more recent application is the use of predictive systems in political campaigns. Political campaigns now take this approach to predict, on an individual basis, which candidate voters are likely to vote for, and then they can target, on an individual basis, those who can be persuaded otherwise. And finally, in the public sector, we're starting to use predictive systems in areas from policing to health, education, and energy.

So, there are some advantages to this. One is that we can automate aspects of our lives that we consider to be mundane, using systems that are intelligent and adaptive enough. We can make use of all the data and really extract the pieces of information we care about. And we can spend money in the most effective way, using this experimental approach to optimize actions to produce desired outcomes.
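As a rough sketch of the individual-level prediction and targeting pattern just described for campaigns, the following Python fragment trains a model on invented voter data, scores unseen voters, and picks the most "persuadable" ones to contact. The features, labels, and thresholds are all hypothetical; no real campaign's pipeline is being reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Hypothetical voter features (age, past turnout, donations, ...) and labels
# (1 if the voter supported the candidate in a past survey), all synthetic.
n_voters = 10_000
X = rng.normal(size=(n_voters, 4))
y = (X @ np.array([0.8, -0.5, 0.3, 0.1]) + rng.normal(scale=1.0, size=n_voters)) > 0

# 1. Inference: learn a model of support from historical data.
model = LogisticRegression().fit(X[:8000], y[:8000])

# 2. Prediction: score every remaining voter.
support_prob = model.predict_proba(X[8000:])[:, 1]

# 3. Action: target only the voters whose predicted support is near 50%,
#    i.e. those the model considers most "persuadable".
persuadable = np.argsort(np.abs(support_prob - 0.5))[:500]
print(f"Targeting {len(persuadable)} of {len(support_prob)} scored voters")
```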
So, we can embed intelligence into all of these mundane objects and enable them to make decisions for us, and that's what we're doing more and more: we can have an object that decides for us what temperature to set our house to, what we should be doing, and so on.

But there might be some implications here. We want these systems that work on this data to increase the opportunities available to us, but it might be that there are some implications that we have not carefully thought through. This is a new area, and people are only starting to scratch the surface of what the problems might be. In some cases, these systems might narrow the options available to people, and this approach subjects people to suggestive messaging intended to nudge them toward a desired outcome. Some people may have a problem with that. Values we care about are not going to be baked into these systems by default.

It's also the case that some algorithmic systems facilitate work that we do not like, for example in the case of mass surveillance. And even the same system, used by different people or organizations, can have very different consequences. For example, if I can predict with high accuracy, based on, say, search queries, who is going to be admitted to a hospital, some people would be interested in knowing that; you might be interested in having your doctor know that. But that same predictive model in the hands of an insurance company has a very different implication.

So, the point here is that these systems structure and influence how humans interact with each other, with society, and with government. And if they constrain what people can do, we should really care about this. So now I'm going to go to a sort of extreme case, just as an example, and that's the Chinese Social Credit System.
This is probably one of the more ambitious uses of data: it is used to rank each citizen in China based on their behavior. Right now, there are various pilot systems deployed by various companies doing this in China. They're currently voluntary, and by 2020 one of these systems, or a combination of them, is going to be chosen and made mandatory for everyone.

In this system, a huge range of data sources about each citizen is used. Some of the data sources are your financial data, your criminal history, how many points you have on your driver's license, and medical information: for example, if you take birth control pills, that's incorporated. Your purchase history is used: if you purchase games, for example, you are down-ranked in the system. Some of the systems, though not all of them, incorporate social media monitoring, which makes sense: if you're a state like China, you probably want to know about the political statements people are making on social media. And one of the more interesting parts is social network analysis: looking at the relationships between people. If you have a close relationship with somebody and they have a low credit score, that can have implications for your own credit score.

The way these scores are generated is secret. According to the call for these systems put out by the government, the goal is to "carry forward the sincerity and traditional virtues" and establish the idea of a "sincerity culture."

But wait, it gets better: there's a portal that enables citizens to look up the citizen score of anyone. And many people like this system; they think it's a fun game. They boast about it on social media and put their score in their dating profile, because if you're ranked highly you're part of an exclusive club: you can get VIP treatment at hotels and other companies.
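To illustrate the social-network component just mentioned, here is a deliberately toy sketch of how a score could be dragged down by a low-scoring contact. Every name, number, and weighting rule here is invented; how the real pilot systems combine their inputs is not public.

```python
# Toy illustration only: hypothetical base scores and a made-up rule that
# blends in the average score of a person's contacts.
base_scores = {
    "alice": 720,
    "bob": 540,    # low score, e.g. from purchases the system dislikes
    "carol": 680,
}
contacts = {
    "alice": ["bob", "carol"],
    "bob": ["alice"],
    "carol": ["alice"],
}

def adjusted_score(person, weight=0.2):
    """Blend a person's own score with the mean score of their contacts."""
    own = base_scores[person]
    friends = contacts[person]
    friend_mean = sum(base_scores[f] for f in friends) / len(friends)
    return (1 - weight) * own + weight * friend_mean

for person in base_scores:
    print(person, round(adjusted_score(person)))
# Alice's score drops simply because she knows Bob, which is exactly the
# guilt-by-association dynamic described above.
```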
The downside, though, is that if you're excluded from that club, your weak score may have other implications, like being unable to get access to credit, housing, or jobs. There is some reporting that even travel visas might be restricted if your score is particularly low.

A system like this, for a state, is really the optimal solution to the problem of the public. It constitutes a very subtle and insidious mechanism of social control. You don't need to spend a lot of money on police or prisons if you can set up a system where people discourage one another from anti-social acts like political action in exchange for a coupon for a free Uber ride.

So, there are a lot of legitimate questions here: What protections does user data have in this scheme? Do any safeguards exist to prevent tampering? What mechanism, if any, is there to prevent false input data from creating erroneous inferences? Is there any way that people can fix their score once they're ranked poorly, or does it end up becoming a self-fulfilling prophecy? Your weak score means you have less access to jobs and credit, and now you have limited access to opportunity.

So, let's take a step back. What do we want? We probably don't want that, but as advocates we really want to understand what questions we should be asking of these systems. Right now there's very little oversight, and we want to make sure that we don't sleepwalk our way into a situation where we've lost even more power to these centralized systems of control. And if you're an implementer, we want to understand what we can be doing better. Are there better ways to implement these systems? Are there values that, as humans, we care about that we should make sure these systems have?

The first thing that most people in the room might think about is privacy, which is, of course, of the utmost importance.
We need privacy, and there is already a good discussion about the importance of protecting user data where possible. So, in this talk, I'm going to focus on the other aspects of algorithmic decision making, which I think have received less attention, because it's not just privacy that we need to worry about here. We also want systems that are fair and equitable. We want transparent systems: we don't want opaque decisions to be made about us, decisions that might have serious impacts on our lives. And we need some accountability mechanisms. For the rest of this talk we're going to go through each of these and look at some examples.

The first thing is fairness. As I said in the beginning, this is one area where there might be an advantage to making decisions by machine, especially in areas where there have historically been fairness issues with decision making, such as law enforcement.

This is one way that police departments use predictive models. The idea is that police would like to allocate resources more effectively, and they would also like to enable proactive policing. If you can predict where crimes are going to occur, or who is going to commit crimes, then you can put cops in those places, or perhaps have them follow those people, and then the crimes will not occur. It's sort of the pre-crime approach.

There are a few ways of going about this. One is individual-level prediction: you take each citizen and estimate the risk that they will participate, say, in violence, based on some data, and then you can flag the people considered particularly high risk. This is currently done in the U.S., in Chicago, by the Chicago Police Department. They maintain a heat list of individuals considered most likely to commit, or be the victim of, violence. And this is done using data that the police maintain.
The features used in this predictive model include things derived from individuals' criminal history: for example, have they been involved in gun violence in the past? Do they have narcotics arrests? And so on. But another thing incorporated in the Chicago Police Department model is information derived from social network analysis: who you interact with, as noted in police data. For example, your co-arrestees, or the people you are seen with when officers conduct field interviews. All of this is incorporated into the risk score.

Another way to proceed, which is the method most companies that sell products like this to the police have taken, is instead to predict which areas are likely to have crimes committed in them. So, I take my city, I put a grid down, and then I use crime statistics, and maybe some ancillary data sources, to determine which areas have the highest risk of crimes occurring in them. I can flag those areas and send police officers to them.

So now, let's look at some of the tools used for this geographic-level prediction. Here are three companies that sell these geographic-level predictive policing systems. PredPol has a system that uses primarily crime statistics: only the time, place, and type of crime, to predict where crimes will occur. HunchLab uses a wider range of data sources, including, for example, weather. And Hitachi has a newer predictive crime analytics tool that also incorporates social media, the first one, to my knowledge, to do so. These systems are in use in more than 50 cities in the U.S.

So, why do police departments buy this? Some police departments are interested in buying systems like this because they're marketed as impartial systems, as a way to police in an unbiased way.
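As a very rough sketch of this grid-based idea (and not any vendor's actual algorithm), the following fragment bins invented incident coordinates into cells and ranks the cells by recorded incident counts. Whatever is in the incident database is what ends up driving where patrols go.

```python
from collections import Counter
import random

random.seed(1)
CELL = 0.005  # hypothetical cell size in degrees, roughly a few blocks

# Made-up historical incidents as (latitude, longitude) pairs.
incidents = [(41.88 + random.gauss(0, 0.01), -87.63 + random.gauss(0, 0.01))
             for _ in range(2000)]

def cell_of(lat, lon):
    """Snap a coordinate to the south-west corner of its grid cell."""
    return (round(lat // CELL * CELL, 3), round(lon // CELL * CELL, 3))

counts = Counter(cell_of(lat, lon) for lat, lon in incidents)

# "Prediction" here is just ranking cells by past recorded counts: the cells
# with the most entries in the database get flagged for patrol.
for cell, n in counts.most_common(5):
    print(f"cell {cell}: {n} recorded incidents -> flagged for patrol")
```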
So, the companies that sell these systems make statements like this (by the way, the references will all be at the end, on the slides). For example, Hitachi's predictive crime analytics claims that the system is anonymous, because it points you to an area; it doesn't tell you to look for a particular person. PredPol reassures people that it eliminates any civil-liberties or profiling concerns. And HunchLab notes that its system fairly represents priorities for public safety and is unbiased by race or ethnicity, for example.

So, let's take a minute to describe in more detail what we mean when we talk about fairness. When we talk about fairness, we mean a few things. One is fairness with respect to individuals: if I'm very similar to somebody, and we go through some process, and there are two very different outcomes to that process, we would consider that unfair. We want similar people to be treated in a similar way.

But there are also certain protected attributes that we wouldn't want someone to discriminate on. So there's this other property, group fairness: we can look at the statistical parity between groups, based on gender, race, and so on, and see whether they're treated in a similar way. We might not expect that in some cases, for example if the base rates in each group are very different.

And then there's also fairness in errors. All predictive systems are going to make errors, and if the errors are concentrated, then that may also represent unfairness. This concern arose recently with Facebook, because people with Native American names had their profiles flagged as fraudulent far more often than those with white American names.

So these are the sorts of things we worry about, and each of these can be expressed as a metric; if you're interested in more, you should check those two papers out.
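For a sense of what these metrics look like in practice, here is a small sketch on made-up audit numbers: statistical parity compares how often two groups are flagged, and fairness in errors asks whether the system's mistakes are concentrated in one group. The groups and counts are hypothetical.

```python
# Hypothetical audit data: for each group, how many people were flagged,
# and how many of the flagged people turned out to be false positives.
audit = {
    "group_a": {"total": 1000, "flagged": 80, "false_positives": 20},
    "group_b": {"total": 1000, "flagged": 230, "false_positives": 110},
}

def flag_rate(g):
    return audit[g]["flagged"] / audit[g]["total"]

def false_positive_share(g):
    return audit[g]["false_positives"] / audit[g]["flagged"]

# Statistical parity: ratio of flagging rates between groups (1.0 = parity).
parity_ratio = flag_rate("group_b") / flag_rate("group_a")
print(f"flag rates: {flag_rate('group_a'):.1%} vs {flag_rate('group_b'):.1%}")
print(f"statistical parity ratio: {parity_ratio:.2f}")

# Fairness in errors: are the system's mistakes concentrated in one group?
print(f"false positive share among flagged: "
      f"{false_positive_share('group_a'):.1%} vs {false_positive_share('group_b'):.1%}")
```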
So, how can potential issues with predictive policing have implications for these principles? One problem is the training data that's used. Some of these systems only use crime statistics, and all of them use crime statistics in some way. The problem is that crime databases contain only crimes that have been detected. Right? The police are only going to detect crimes that they know are happening, either through patrol and their own investigation, or because they've been alerted to a crime, for example by a citizen calling the police. So a citizen has to feel like they *can* call the police, like that's a good idea. Some crimes suffer from this problem less than others: gun violence, for example, is much easier to detect than fraud, which is very difficult to detect.

Now, the racial profiling aspect might come in because of biased policing in the past. For example, for marijuana, black people are arrested in the U.S. at rates four times that of white people, even though the rates of use in the two groups are the same to within a few percent. So, this is where problems can arise.

Let's go back to geographic-level predictive policing. The danger here is that, unless the system is very carefully constructed, this sort of crime-area ranking might again become a self-fulfilling prophecy. If you send police officers to these areas, you further scrutinize them, and then again you're only detecting a subset of crimes, and the cycle continues.

One obvious issue is that the claim about geographic-based crime prediction being anonymous is not true, because race and location are very strongly correlated in the U.S., and that is something machine-learning systems can potentially learn.

Another issue concerns individual fairness: suppose my home sits within one of these boxes.
Some of these boxes are very small; in PredPol, for example, a box is 500 by 500 feet, so maybe only a few houses. The implication is that you might have police officers sitting in a cruiser outside your home, while a few doors down someone may not be within that box and doesn't have this. That may represent unfairness.

So, there are real questions here, especially because there's no opt-out. There's no way to opt out of this system: if you live in a city that has it, you have to deal with it.

And it's quite difficult to find out what's really going on, because the algorithm is secret. In most cases, we don't know the full details of the inputs; we have some idea about what features are used, but that's about it. We also don't know the output: that would mean knowing police allocation and police strategies. In order to nail down what's really going on here, and to verify the validity of these companies' claims, it may be necessary to have a third party come in, examine the inputs and outputs of the system, and say concretely what's going on. If everything is fine and dandy, then this shouldn't be a problem. That's potentially one role that advocates can play: maybe we should start pushing for audits of systems used in this way, because they can have serious implications for people's lives.

We'll return to this idea a little bit later, but for now this leads us nicely to transparency. We want to know what these systems are doing, but it's very hard, for the reasons described earlier. Even in the case of something like trying to understand Google's search algorithm, it's difficult because it's personalized: by construction, each user is only seeing one endpoint. So, it's a very isolating system.
What do other people see? One reason it's difficult to make some of these systems transparent is, simply, the complexity of the algorithms. An algorithm can become so complex that it's difficult to comprehend, even for the designer or the implementer of the system. The designer might know that the algorithm maximizes some metric, say accuracy, but they may not always have a solid understanding of what the algorithm is doing for all inputs, certainly with respect to fairness. So, in some cases, it might not be appropriate to use an extremely complex model; it might be better to use a simpler system with human-interpretable features.

Another issue that arises from the opacity of these systems and their centralized control is that it makes them very influential, and thus an excellent target for manipulation or tampering. This might be tampering done by the organization that controls the system, by an insider at one of these organizations, or by anyone who is able to compromise their security.

There is an interesting academic work that looked at the possibility of slightly modifying search rankings to shift people's political views. People are most likely to click on the top search results (90% of clicks go to the first page of results), so perhaps by reshuffling things a little bit, or by dropping some search results, you can influence people's views in a coherent way, and maybe you can make it so subtle that no one is able to notice.

In this academic study, they ran an experiment around the 2014 Indian election. They used real voters, and they kept the size of the experiment small enough that it was not going to influence the outcome of the election.
The researchers took people, determined their political leaning, and segmented them into control and treatment groups, where the treatment was manipulation of the search ranking results. Then they had these people browse the web. What they found is that this mechanism is very effective at shifting people's voter preferences: in this study, they were able to introduce a 20% shift. Even alerting users to the fact that this was going to be done, telling them "we are going to manipulate your search results, really pay attention," did not decrease the magnitude of the effect.

The margins in many elections are incredibly small, and the authors estimate that this shift could change the outcome of about 25% of elections worldwide, if it were done. And the bias is so small that no one can tell. All humans, no matter how smart and resistant to manipulation we think we are, are subject to this sort of manipulation, and we really can't tell. I'm not saying that this is occurring, but right now there is no regulation to stop it, and there is no way we could reliably detect it, so there's a huge amount of power here. Something to think about.

But it's not only corporations that are interested in this sort of behavioral manipulation. In 2010, UK Prime Minister David Cameron created the UK Behavioural Insights Team, informally called the Nudge Unit. What they do is use behavioral science and this predictive-analytics approach, with experimentation, to have people make better decisions for themselves and society, as determined by the UK government. And as of a few months ago, after an executive order signed by Obama in September, the United States now has its own Nudge Unit.

To be clear, I don't think this is some sort of malicious plot.
I think there *can* be huge value in these sorts of initiatives, positively impacting people's lives. But when this sort of behavioral manipulation is being done, even in part openly, oversight is pretty important, and we really need to consider what these systems are optimizing for. And that's something we might not always know, or at least understand.

For industry, we do have a pretty good understanding: industry cares about optimizing for time spent on the website. Facebook wants you to spend more time on Facebook; they want you to click on ads, click on newsfeed items, they want you to like things. And, fundamentally: profit. Already this has had some pretty serious implications in the last 10 years, in media for example: optimizing for click-through rate in journalism has produced a race to the bottom in terms of quality.

Another issue is that optimizing for what people like might not always be the best approach. Facebook officials have said publicly that Facebook's goal is to make you happy: they want you to open that newsfeed and just feel great. But there's an issue there, right? Because a lot of people, about 40% according to Pew Research, get their news from Facebook. If people don't want to see war and corpses, because it makes them feel sad, then this is not a system that is going to optimize for an informed population. It's not going to produce a population that is ready to engage in civic life. It's going to produce an amused population whose time is occupied by cat pictures.

In politics, we have a similar optimization problem. The political campaigns that use these predictive systems are optimizing for votes for the desired candidate, of course.
So, instead of a political campaign being (and maybe this is a naive view) an open discussion of the issues facing the country, it becomes this micro-targeted persuasion game, and the people who get targeted are a very small subset of all people: only the people who are on the edge, maybe disinterested. Those are the people who are going to get attention from political candidates.

In policy, as with these Nudge Units, these systems are being used to enable better use of government services. There are some good projects that have come out of this: increasing voter registration, improving health outcomes, improving education outcomes. But some of the predictive systems that we're starting to see in government are optimizing for compliance, as is the case with predictive policing. So this is something that we need to watch carefully.

I think this is a nice quote that sort of describes the problem. In some ways we might be narrowing our horizon, and the danger is that these tools are separating people. This is particularly bad for political action, because political action requires people to have shared experience, so that they are able to act collectively to exert pressure to fix problems.

So, finally: accountability. We need some oversight mechanisms, for example in the case of errors. This is particularly important for civil or bureaucratic systems. When an algorithm produces some decision, we don't always want humans to just defer to the machine, and that deference might be part of the problem. There are starting to be cases of computer algorithms yielding a decision and humans being unable to correct an obvious error.
There's a case in Georgia, in the United States, where two young people, twins, went to the Department of Motor Vehicles to get their driver's licenses. However, they were both flagged by a fraud algorithm that uses facial recognition to look for similar faces, and I guess the people who designed the system didn't think of the possibility of twins. Yeah. So they just left without their driver's licenses; the people at the Department of Motor Vehicles were unable to correct this. That is one kind of implication: it's like something out of Kafka.

But there are also cases of errors being made and people not noticing until after actions have been taken, some of them very serious, because people simply deferred to the machine. This is an example from San Francisco. An ALPR, an Automated License Plate Reader, is a device that uses image recognition to detect and read license plates, and usually to compare them against a known list of plates of interest. San Francisco uses these, mounted on police cars.

In this case, a San Francisco ALPR got a hit on a car, and it was the car of a 47-year-old woman with no criminal history. It was a false hit: the image was blurry, and it matched erroneously against one of the plates of interest, which happened to belong to a stolen vehicle. So they conducted a traffic stop on her, they took her out of the vehicle, they searched her and the vehicle, she got a pat-down, and they had her kneel at gunpoint, in the street.

So, how much oversight should be present depends on the implications of the system.
It's certainly the case that for some of these decision-making systems, an error might not be that important; it could be relatively harmless. But in this case, an error in an algorithmic decision led to a totally innocent person literally having a gun pointed at her.

That brings us to the fact that we need some way of getting information about what is going on in these systems. We don't want to have to wait for events like these before we're able to learn anything about a system. Auditing is one option: independently verifying the statements of companies in situations where we can see inputs and outputs. For example, this could be done with Google or Facebook. If you have the inputs of a system, say from test accounts or real accounts, maybe you can pool people's information together.

That was done during the 2012 Obama campaign by ProPublica. People noticed that they were getting different emails from the Obama campaign, and were interested to see which factors the emails were being varied on. I think about 200 people submitted emails, and from that they were able to determine some information about how the emails were being varied. So there have been some successful attempts at this: compare inputs, then look at why one item was shown to one user and not another, and see if there are any statistical differences. There are some potential legal issues with test accounts, so that's something to think about; I'm not a lawyer.

So, for example, if you want to examine ad-targeting algorithms, one way to proceed is to construct a browsing profile and then examine what ads are served back to you.
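Before turning to the academic study, here is a bare-bones sketch of the comparison such an audit ultimately boils down to: count how often the ad of interest is served to each group of otherwise identical profiles and check whether the difference is larger than chance would explain. The counts below are invented.

```python
import math

# Hypothetical audit result: number of browsing sessions per treatment group
# and how many of those sessions were served the ad of interest.
served = {"treatment_1_male": 190, "treatment_2_female": 35}
sessions = {"treatment_1_male": 500, "treatment_2_female": 500}

def rate(group):
    return served[group] / sessions[group]

def two_proportion_z(g1, g2):
    """z statistic for the difference in serving rates between two groups."""
    p1, p2 = rate(g1), rate(g2)
    n1, n2 = sessions[g1], sessions[g2]
    pooled = (served[g1] + served[g2]) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

g1, g2 = "treatment_1_male", "treatment_2_female"
print(f"serving rates: {rate(g1):.1%} vs {rate(g2):.1%} "
      f"(ratio {rate(g1) / rate(g2):.1f}x)")
# A large |z| means the gap is very unlikely to be chance alone.
print(f"z = {two_proportion_z(g1, g2):.1f}")
```

Real tools automate the tedious parts of this, such as building the profiles, randomizing assignment, and collecting the served ads.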
807 00:34:12,989 --> 00:34:14,119 And so this is something that 808 00:34:14,119 --> 00:34:16,250 academic researchers have looked at, 809 00:34:16,250 --> 00:34:17,489 because, at the time at least, 810 00:34:17,489 --> 00:34:20,879 you didn't need to make an account to do this. 811 00:34:20,879 --> 00:34:24,768 So, this was a study that was presented at 812 00:34:24,768 --> 00:34:27,799 Privacy Enhancing Technologies last year, 813 00:34:27,799 --> 00:34:31,149 and in this study, the researchers 814 00:34:31,149 --> 00:34:33,179 generate some browsing profiles 815 00:34:33,179 --> 00:34:35,909 that differ only by one characteristic, 816 00:34:35,909 --> 00:34:37,690 so they're basically identical in every way 817 00:34:37,690 --> 00:34:39,049 except for one thing. 818 00:34:39,049 --> 00:34:42,359 And that is denoted by Treatment 1 and 2. 819 00:34:42,359 --> 00:34:44,460 So this is a randomized, controlled trial, 820 00:34:44,460 --> 00:34:46,389 but I left out the randomization part 821 00:34:46,389 --> 00:34:48,220 for simplicity. 822 00:34:48,220 --> 00:34:54,799 So, in one study, they applied a treatment of gender. 823 00:34:54,799 --> 00:34:56,799 So, they had the browsing profiles 824 00:34:56,799 --> 00:34:59,319 in Treatment 1 be male browsing profiles, 825 00:34:59,319 --> 00:35:02,029 and the browsing profiles in Treatment 2 be female. 826 00:35:02,029 --> 00:35:04,430 And they wanted to see: is there any difference 827 00:35:04,430 --> 00:35:06,079 in the way that ads are targeted 828 00:35:06,079 --> 00:35:08,710 if browsing profiles are effectively identical 829 00:35:08,710 --> 00:35:11,019 except for gender? 830 00:35:11,019 --> 00:35:14,710 So, it turns out that there *was*. 831 00:35:14,710 --> 00:35:19,180 So, a 3rd-party site was showing Google ads 832 00:35:19,180 --> 00:35:21,289 for senior executive positions 833 00:35:21,289 --> 00:35:23,980 at a rate 6 times higher to the fake men 834 00:35:23,980 --> 00:35:27,059 than to the fake women in this study. 835 00:35:27,059 --> 00:35:30,109 So, this sort of auditing is not going to 836 00:35:30,109 --> 00:35:32,779 be able to determine everything 837 00:35:32,779 --> 00:35:34,930 that algorithms are doing, but it can 838 00:35:34,930 --> 00:35:36,519 sometimes uncover interesting, 839 00:35:36,519 --> 00:35:40,900 at least statistical, differences. 840 00:35:40,900 --> 00:35:47,099 So, this leads us to the fundamental issue: 841 00:35:47,099 --> 00:35:49,180 Right now, we're really not in control 842 00:35:49,180 --> 00:35:50,510 of some of these systems, 843 00:35:50,510 --> 00:35:54,480 and we really need these predictive systems 844 00:35:54,480 --> 00:35:56,119 to be controlled by us, 845 00:35:56,119 --> 00:35:57,819 in order for them not to be used 846 00:35:57,819 --> 00:36:00,109 as a system of control. 847 00:36:00,109 --> 00:36:03,220 So there are some technologies that I'd like 848 00:36:03,220 --> 00:36:06,890 to point you all to. 849 00:36:06,890 --> 00:36:08,319 We need tools in the digital commons 850 00:36:08,319 --> 00:36:11,160 that can help address some of these concerns. 851 00:36:11,160 --> 00:36:13,349 So, the first thing is that of course 852 00:36:13,349 --> 00:36:14,730 we know that minimizing the amount of 853 00:36:14,730 --> 00:36:17,069 data available can help in some contexts, 854 00:36:17,069 --> 00:36:18,980 which we can do by making systems 855 00:36:18,980 --> 00:36:22,779 that are private by design, and by default.
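Coming back to the audit experiment above: once ad observations have been collected for the two treatment groups, the comparison itself can be simple. A toy sketch with invented numbers, using a permutation test to ask whether the gap between groups is larger than chance:

```python
# Toy sketch of an ad-targeting audit: compare how often a given ad
# category was shown to two treatment groups of otherwise-identical
# browsing profiles, and test the difference with a permutation test.
# All counts here are fabricated for illustration.
import random

# Hypothetical counts of "senior executive" job ads shown to each profile.
treatment_1 = [6, 8, 7, 9, 5, 7, 8, 6, 7, 9]   # e.g. profiles declared male
treatment_2 = [1, 2, 0, 1, 2, 1, 0, 2, 1, 1]   # e.g. profiles declared female

def mean(xs):
    return sum(xs) / len(xs)

observed_diff = mean(treatment_1) - mean(treatment_2)

# Permutation test: shuffle the group labels many times and count how often
# a difference at least as large arises by chance.
pooled = treatment_1 + treatment_2
n1 = len(treatment_1)
random.seed(0)
trials = 10_000
extreme = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = mean(pooled[:n1]) - mean(pooled[n1:])
    if abs(diff) >= abs(observed_diff):
        extreme += 1

p_value = extreme / trials
print(f"observed difference: {observed_diff:.2f} ads per profile")
print(f"permutation p-value: {p_value:.4f}")
```

The statistics here are deliberately minimal; dedicated auditing toolkits automate the treatment assignment, data collection, and significance testing at much larger scale.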
856 00:36:22,779 --> 00:36:24,549 Another thing is that these audit tools 857 00:36:24,549 --> 00:36:25,890 might be useful. 858 00:36:25,890 --> 00:36:30,720 And, so, these 2 nice examples in academia... 859 00:36:30,720 --> 00:36:34,359 the ad experiment that I just showed was done 860 00:36:34,359 --> 00:36:36,120 using AdFisher. 861 00:36:36,120 --> 00:36:38,200 So, these are 2 toolkits that you can use 862 00:36:38,200 --> 00:36:41,440 to start doing this sort of auditing. 863 00:36:41,440 --> 00:36:44,579 Another technology that is generally useful, 864 00:36:44,579 --> 00:36:46,700 but particularly in the case of prediction 865 00:36:46,700 --> 00:36:48,789 it's useful to maintain access to 866 00:36:48,789 --> 00:36:50,289 as many sites as possible, 867 00:36:50,289 --> 00:36:52,589 through anonymity systems like Tor, 868 00:36:52,589 --> 00:36:54,319 because it's impossible to personalize 869 00:36:54,319 --> 00:36:55,650 when everyone looks the same. 870 00:36:55,650 --> 00:36:59,130 So this is a very important technology. 871 00:36:59,130 --> 00:37:01,519 Something that doesn't really exist, 872 00:37:01,519 --> 00:37:03,630 but that I think is pretty important, 873 00:37:03,630 --> 00:37:05,829 is having some tool to view the landscape. 874 00:37:05,829 --> 00:37:08,160 So, as we know from these few studies 875 00:37:08,160 --> 00:37:10,440 that have been done, 876 00:37:10,440 --> 00:37:12,059 different people are not seeing the internet 877 00:37:12,059 --> 00:37:12,950 in the same way. 878 00:37:12,950 --> 00:37:15,730 This is one reason why we don't like censorship. 879 00:37:15,730 --> 00:37:17,880 But, rich and poor people, 880 00:37:17,880 --> 00:37:19,659 from academic research we know that 881 00:37:19,659 --> 00:37:23,790 there is widespread price discrimination on the internet, 882 00:37:23,790 --> 00:37:25,650 so rich and poor people see a different view 883 00:37:25,650 --> 00:37:26,970 of the Internet, 884 00:37:26,970 --> 00:37:28,400 men and women see a different view 885 00:37:28,400 --> 00:37:29,940 of the Internet. 886 00:37:29,940 --> 00:37:31,200 We wanna know how different people 887 00:37:31,200 --> 00:37:32,450 see the same site, 888 00:37:32,450 --> 00:37:34,329 and this could be the beginning of 889 00:37:34,329 --> 00:37:36,329 a defense system for this sort of 890 00:37:36,329 --> 00:37:41,730 manipulation/tampering that I showed earlier. 891 00:37:41,730 --> 00:37:45,549 Another interesting approach is obfuscation: 892 00:37:45,549 --> 00:37:46,980 injecting noise into the system. 893 00:37:46,980 --> 00:37:49,190 So there's an interesting browser extension 894 00:37:49,190 --> 00:37:51,720 called AdNauseam, that's for Firefox, 895 00:37:51,720 --> 00:37:54,579 which clicks on every single ad you're served, 896 00:37:54,579 --> 00:37:55,680 to inject noise. 897 00:37:55,680 --> 00:37:57,019 So that's, I think, an interesting approach 898 00:37:57,019 --> 00:38:00,170 that people haven't looked at too much. 899 00:38:00,170 --> 00:38:03,780 So in terms of policy, 900 00:38:03,780 --> 00:38:06,530 Facebook and Google, these internet giants, 901 00:38:06,530 --> 00:38:08,829 have billions of users, 902 00:38:08,829 --> 00:38:12,220 and sometimes they like to call themselves 903 00:38:12,220 --> 00:38:13,769 new public utilities, 904 00:38:13,769 --> 00:38:15,000 and if that's the case then 905 00:38:15,000 --> 00:38:17,549 it might be necessary to subject them 906 00:38:17,549 --> 00:38:20,539 to additional regulation.
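Coming back to the obfuscation idea for a moment: here is a toy illustration, not how AdNauseam is actually built, of why injecting random clicks blunts profiling. Mixing uniform random ad clicks into a user's real clicks flattens the observed interest distribution, which we can see by comparing its entropy before and after; the categories and counts are invented.

```python
# Toy illustration of obfuscation by noise injection: mixing uniform random
# ad clicks into a user's real clicks flattens the interest profile, so the
# observed distribution reveals less about the user. Numbers are invented.
import math
import random

categories = ["politics", "travel", "health", "finance", "sports"]

# Hypothetical "real" clicks, heavily concentrated on one interest.
real_clicks = ["health"] * 40 + ["politics"] * 5 + ["travel"] * 5

# Obfuscation: add one uniformly random click per real click.
random.seed(0)
noise_clicks = [random.choice(categories) for _ in real_clicks]

def entropy(clicks):
    """Shannon entropy (bits) of the click distribution over categories."""
    counts = {c: clicks.count(c) for c in set(clicks)}
    total = len(clicks)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

print(f"entropy without noise: {entropy(real_clicks):.2f} bits")
print(f"entropy with noise:    {entropy(real_clicks + noise_clicks):.2f} bits")
# Higher entropy means the observed profile is less informative
# about the user's true interests.
```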
907 00:38:20,539 --> 00:38:21,990 Another problem that's come up, 908 00:38:21,990 --> 00:38:23,539 for example with some of the studies 909 00:38:23,539 --> 00:38:24,900 that Facebook has done, 910 00:38:24,900 --> 00:38:29,039 is sometimes a lack of ethics review. 911 00:38:29,039 --> 00:38:31,059 So, for example, in academia, 912 00:38:31,059 --> 00:38:33,859 if you're gonna do research involving humans, 913 00:38:33,859 --> 00:38:35,390 there's an Institutional Review Board 914 00:38:35,390 --> 00:38:36,970 that you go to that verifies that 915 00:38:36,970 --> 00:38:39,140 you're doing things in an ethical manner. 916 00:38:39,140 --> 00:38:40,910 And some companies do have internal 917 00:38:40,910 --> 00:38:43,029 review processes like this, but it might 918 00:38:43,029 --> 00:38:45,119 be important to have an independent 919 00:38:45,119 --> 00:38:48,200 ethics board that does this sort of thing. 920 00:38:48,200 --> 00:38:50,849 And we *really* need 3rd-party auditing. 921 00:38:50,849 --> 00:38:54,519 So, for example, some companies 922 00:38:54,519 --> 00:38:56,220 don't want auditing to be done 923 00:38:56,220 --> 00:38:59,190 because of IP concerns, 924 00:38:59,190 --> 00:39:00,579 and if that's the concern 925 00:39:00,579 --> 00:39:03,180 maybe having a set of people 926 00:39:03,180 --> 00:39:05,680 that are not paid by the company 927 00:39:05,680 --> 00:39:07,200 to check how some of these systems 928 00:39:07,200 --> 00:39:08,640 are being implemented, 929 00:39:08,640 --> 00:39:11,240 could help give us confidence that 930 00:39:11,240 --> 00:39:16,979 things are being done in a reasonable way. 931 00:39:16,979 --> 00:39:20,269 So, in closing, 932 00:39:20,269 --> 00:39:23,180 algorithmic decision making is here, 933 00:39:23,180 --> 00:39:26,140 and it's barreling forward at a very fast rate, 934 00:39:26,140 --> 00:39:27,890 and we need to figure out what 935 00:39:27,890 --> 00:39:30,410 the guide rails should be, 936 00:39:30,410 --> 00:39:31,380 and how to install them 937 00:39:31,380 --> 00:39:33,119 to handle some of the potential threats. 938 00:39:33,119 --> 00:39:35,470 There's a huge amount of power here. 939 00:39:35,470 --> 00:39:37,910 We need more openness in these systems. 940 00:39:37,910 --> 00:39:39,589 And, right now, 941 00:39:39,589 --> 00:39:41,559 with the intelligent systems that do exist, 942 00:39:41,559 --> 00:39:43,920 we don't know what's occurring really, 943 00:39:43,920 --> 00:39:46,510 and we need to watch carefully 944 00:39:46,510 --> 00:39:49,099 where and how these systems are being used. 945 00:39:49,099 --> 00:39:50,690 And I think this community has 946 00:39:50,690 --> 00:39:53,940 an important role to play in this fight, 947 00:39:53,940 --> 00:39:55,730 to study what's being done, 948 00:39:55,730 --> 00:39:57,160 to show people what's being done, 949 00:39:57,160 --> 00:39:58,670 to raise the debate and advocate, 950 00:39:58,670 --> 00:40:01,200 and, where necessary, to resist. 951 00:40:01,200 --> 00:40:03,339 Thanks. 952 00:40:03,339 --> 00:40:13,129 *applause* 953 00:40:13,129 --> 00:40:17,519 Herald: So, let's have a question and answer. 954 00:40:17,519 --> 00:40:19,080 Microphone 2, please. 955 00:40:19,080 --> 00:40:20,199 Mic 2: Hi there. 956 00:40:20,199 --> 00:40:23,259 Thanks for the talk. 
957 00:40:23,259 --> 00:40:26,230 Since these pre-crime softwares also 958 00:40:26,230 --> 00:40:27,359 arrived here in Germany 959 00:40:27,359 --> 00:40:29,680 with the start of the so-called CopWatch system 960 00:40:29,680 --> 00:40:32,779 in southern Germany, and Bavaria and Nuremberg especially, 961 00:40:32,779 --> 00:40:35,420 where they try to predict burglary crime 962 00:40:35,420 --> 00:40:37,460 using that criminal record 963 00:40:37,460 --> 00:40:40,170 geographical analysis, like you explained, 964 00:40:40,170 --> 00:40:43,380 leads me to a 2-fold question: 965 00:40:43,380 --> 00:40:47,900 first, have you heard of any research 966 00:40:47,900 --> 00:40:49,760 that measures the effectiveness 967 00:40:49,760 --> 00:40:53,690 of such measures, at all? 968 00:40:53,690 --> 00:40:57,040 And, second: 969 00:40:57,040 --> 00:41:00,599 What do you think of the game theory 970 00:41:00,599 --> 00:41:02,690 if the thieves or the bad guys 971 00:41:02,690 --> 00:41:07,619 know the system, and when they game the system, 972 00:41:07,619 --> 00:41:09,980 they will probably win, 973 00:41:09,980 --> 00:41:11,640 since one police officer in an interview said 974 00:41:11,640 --> 00:41:14,019 this system is used to reduce 975 00:41:14,019 --> 00:41:16,460 the personal costs of policing, 976 00:41:16,460 --> 00:41:19,460 so they just send the guys where the red flags are, 977 00:41:19,460 --> 00:41:22,290 and the others take the day off. 978 00:41:22,290 --> 00:41:24,360 Dr. Helsby: Yup. 979 00:41:24,360 --> 00:41:27,150 Um, so, with respect to 980 00:41:27,150 --> 00:41:30,990 testing the effectiveness of predictive policing, 981 00:41:30,990 --> 00:41:31,990 the companies, 982 00:41:31,990 --> 00:41:33,910 some of them do randomized, controlled trials 983 00:41:33,910 --> 00:41:35,240 and claim a reduction in policing. 984 00:41:35,240 --> 00:41:38,349 The best independent study that I've seen 985 00:41:38,349 --> 00:41:40,680 is by the RAND Corporation 986 00:41:40,680 --> 00:41:43,120 that did a study in, I think, 987 00:41:43,120 --> 00:41:44,920 Shreveport, Louisiana, 988 00:41:44,920 --> 00:41:47,589 and in their report they claim 989 00:41:47,589 --> 00:41:50,190 that there was no statistically significant 990 00:41:50,190 --> 00:41:52,900 difference, they didn't find any reduction. 991 00:41:52,900 --> 00:41:54,099 And it *was* specifically looking at 992 00:41:54,099 --> 00:41:56,730 property crime, which I think you mentioned. 993 00:41:56,730 --> 00:41:59,480 So, I think right now there's sort of 994 00:41:59,480 --> 00:42:01,069 conflicting reports between 995 00:42:01,069 --> 00:42:06,180 the independent auditors and these company claims. 996 00:42:06,180 --> 00:42:09,289 So there definitely needs to be more study. 997 00:42:09,289 --> 00:42:12,240 And then, the 2nd thing...sorry, remind me what it was? 998 00:42:12,240 --> 00:42:15,189 Mic 2: What about the guys gaming the system? 999 00:42:15,189 --> 00:42:16,949 Dr. Helsby: Oh, yeah. 1000 00:42:16,949 --> 00:42:18,900 I think it's a legitimate concern. 1001 00:42:18,900 --> 00:42:22,480 Like, if all the outputs were just immediately public, 1002 00:42:22,480 --> 00:42:24,599 then, yes, everyone knows the location 1003 00:42:24,599 --> 00:42:26,549 of all police officers, 1004 00:42:26,549 --> 00:42:29,009 and I imagine that people would have 1005 00:42:29,009 --> 00:42:30,779 a problem with that. 1006 00:42:30,779 --> 00:42:32,679 Yup. 1007 00:42:32,679 --> 00:42:35,990 Herald: Microphone #4, please.
1008 00:42:35,990 --> 00:42:39,369 Mic 4: Yeah, this is not actually a question, 1009 00:42:39,369 --> 00:42:40,779 but just a comment. 1010 00:42:40,779 --> 00:42:42,970 I've enjoyed your talk very much, 1011 00:42:42,970 --> 00:42:47,789 in particular after watching 1012 00:42:47,789 --> 00:42:52,270 the talk in Hall 1 earlier in the afternoon. 1013 00:42:52,270 --> 00:42:55,730 The "Say Hi to Your New Boss", about 1014 00:42:55,730 --> 00:42:59,609 algorithms that are trained with big data, 1015 00:42:59,609 --> 00:43:02,390 and finally make decisions. 1016 00:43:02,390 --> 00:43:08,210 And I think these 2 talks are kind of complementary, 1017 00:43:08,210 --> 00:43:11,309 and if people are interested in the topic 1018 00:43:11,309 --> 00:43:14,710 they might want to check out the other talk 1019 00:43:14,710 --> 00:43:16,259 and watch it later, because these 1020 00:43:16,259 --> 00:43:17,319 fit very well together. 1021 00:43:17,319 --> 00:43:19,589 Dr. Helsby: Yeah, it was a great talk. 1022 00:43:19,589 --> 00:43:22,130 Herald: Microphone #2, please. 1023 00:43:22,130 --> 00:43:25,049 Mic 2: Um, yeah, you mentioned 1024 00:43:25,049 --> 00:43:27,319 the need to have some kind of 3rd-party auditing 1025 00:43:27,319 --> 00:43:30,900 or some kind of way to 1026 00:43:30,900 --> 00:43:31,930 peek into these algorithms 1027 00:43:31,930 --> 00:43:33,079 and to see what they're doing, 1028 00:43:33,079 --> 00:43:34,420 and to see if they're being fair. 1029 00:43:34,420 --> 00:43:36,199 Can you talk a little bit more about that? 1030 00:43:36,199 --> 00:43:38,059 Like, going forward, 1031 00:43:38,059 --> 00:43:40,690 some kind of regulatory structures 1032 00:43:40,690 --> 00:43:44,200 would probably have to emerge 1033 00:43:44,200 --> 00:43:47,200 to analyze and to look at 1034 00:43:47,200 --> 00:43:49,339 these black boxes that are just sort of 1035 00:43:49,339 --> 00:43:51,309 popping up everywhere and, you know, 1036 00:43:51,309 --> 00:43:52,939 controlling more and more of the things 1037 00:43:52,939 --> 00:43:56,150 in our lives, and important decisions. 1038 00:43:56,150 --> 00:43:58,539 So, just, what kind of discussions 1039 00:43:58,539 --> 00:43:59,460 are there for that? 1040 00:43:59,460 --> 00:44:01,809 And what kind of possibility is there for that? 1041 00:44:01,809 --> 00:44:04,900 And, I'm sure that companies would be 1042 00:44:04,900 --> 00:44:08,000 very, very resistant to 1043 00:44:08,000 --> 00:44:09,890 any kind of attempt to look into 1044 00:44:09,890 --> 00:44:13,890 algorithms, and to... 1045 00:44:13,890 --> 00:44:15,070 Dr. Helsby: Yeah, I mean, definitely 1046 00:44:15,070 --> 00:44:18,069 companies would be very resistant to 1047 00:44:18,069 --> 00:44:19,670 having people look into their algorithms. 1048 00:44:19,670 --> 00:44:22,190 So, if you wanna do a very rigorous 1049 00:44:22,190 --> 00:44:23,339 audit of what's going on 1050 00:44:23,339 --> 00:44:25,660 then it's probably necessary to have 1051 00:44:25,660 --> 00:44:26,589 a few people come in 1052 00:44:26,589 --> 00:44:28,900 and sign NDAs, and then 1053 00:44:28,900 --> 00:44:31,039 look through the systems. 1054 00:44:31,039 --> 00:44:33,140 So, that's one way to proceed. 
1055 00:44:33,140 --> 00:44:35,049 But, another way to proceed that-- 1056 00:44:35,049 --> 00:44:38,720 so, these academic researchers have done 1057 00:44:38,720 --> 00:44:40,009 a few experiments 1058 00:44:40,009 --> 00:44:42,809 and found some interesting things, 1059 00:44:42,809 --> 00:44:45,500 and that's sort of all the attempts at auditing 1060 00:44:45,500 --> 00:44:46,450 that we've seen: 1061 00:44:46,450 --> 00:44:48,490 there was 1 attempt in 2012 for the Obama campaign, 1062 00:44:48,490 --> 00:44:49,910 but there's really not been any 1063 00:44:49,910 --> 00:44:51,500 sort of systematic attempt-- 1064 00:44:51,500 --> 00:44:52,589 you know, like, in censorship 1065 00:44:52,589 --> 00:44:54,539 we see a systematic attempt to 1066 00:44:54,539 --> 00:44:56,779 do measurement as often as possible, 1067 00:44:56,779 --> 00:44:58,240 check what's going on, 1068 00:44:58,240 --> 00:44:59,339 and that itself, you know, 1069 00:44:59,339 --> 00:45:00,900 can act as an oversight mechanism. 1070 00:45:00,900 --> 00:45:01,880 But, right now, 1071 00:45:01,880 --> 00:45:03,900 I think many of these companies 1072 00:45:03,900 --> 00:45:05,259 realize no one is watching, 1073 00:45:05,259 --> 00:45:07,160 so there's no real push to have 1074 00:45:07,160 --> 00:45:10,440 people verify: are you being fair when you 1075 00:45:10,440 --> 00:45:11,539 implement this system? 1076 00:45:11,539 --> 00:45:12,969 Because no one's really checking. 1077 00:45:12,969 --> 00:45:13,980 Mic 2: Do you think that, 1078 00:45:13,980 --> 00:45:15,339 at some point, it would be like 1079 00:45:15,339 --> 00:45:19,059 an FDA or SEC, to give some American examples... 1080 00:45:19,059 --> 00:45:21,490 an actual government regulatory agency 1081 00:45:21,490 --> 00:45:24,960 that has the power and ability to 1082 00:45:24,960 --> 00:45:27,930 not just sort of look and try to 1083 00:45:27,930 --> 00:45:31,710 reverse engineer some of these algorithms, 1084 00:45:31,710 --> 00:45:33,920 but actually peek in there and make sure 1085 00:45:33,920 --> 00:45:36,420 that things are fair, because it seems like 1086 00:45:36,420 --> 00:45:38,240 there's just-- it's so important now 1087 00:45:38,240 --> 00:45:41,769 that, again, it could be the difference between 1088 00:45:41,769 --> 00:45:42,930 life and death, between 1089 00:45:42,930 --> 00:45:44,589 getting a job, not getting a job, 1090 00:45:44,589 --> 00:45:46,130 being pulled over, not being pulled over, 1091 00:45:46,130 --> 00:45:48,069 being racially profiled, not racially profiled, 1092 00:45:48,069 --> 00:45:49,410 things like that. Dr. Helsby: Right. 1093 00:45:49,410 --> 00:45:50,430 Mic 2: Is it moving in that direction? 1094 00:45:50,430 --> 00:45:52,249 Or is it way too early for it? 1095 00:45:52,249 --> 00:45:55,110 Dr. Helsby: I mean, so some people have... 1096 00:45:55,110 --> 00:45:56,859 someone has called for, like, 1097 00:45:56,859 --> 00:45:59,079 a Federal Search Commission, 1098 00:45:59,079 --> 00:46:00,930 or like a Federal Algorithms Commission, 1099 00:46:00,930 --> 00:46:03,200 that would do this sort of oversight work, 1100 00:46:03,200 --> 00:46:06,130 but it's in such early stages right now 1101 00:46:06,130 --> 00:46:09,970 that there's no real push for that. 1102 00:46:09,970 --> 00:46:13,330 But I think it's a good idea. 1103 00:46:13,330 --> 00:46:15,729 Herald: And again, #2 please. 1104 00:46:15,729 --> 00:46:17,059 Mic 2: Thank you again for your talk.
1105 00:46:17,059 --> 00:46:19,309 I was just curious if you can point 1106 00:46:19,309 --> 00:46:20,440 to any examples of 1107 00:46:20,440 --> 00:46:22,619 either current producers or consumers 1108 00:46:22,619 --> 00:46:24,029 of these algorithmic systems 1109 00:46:24,029 --> 00:46:26,390 who are actively and publicly trying 1110 00:46:26,390 --> 00:46:27,720 to do so in a responsible manner 1111 00:46:27,720 --> 00:46:29,720 by describing what they're trying to do 1112 00:46:29,720 --> 00:46:31,380 and how they're going about it? 1113 00:46:31,380 --> 00:46:37,210 Dr. Helsby: So, yeah, there are some companies, 1114 00:46:37,210 --> 00:46:39,000 for example, like DataKind, 1115 00:46:39,000 --> 00:46:42,710 that try to deploy algorithmic systems 1116 00:46:42,710 --> 00:46:44,640 in as responsible a way as possible, 1117 00:46:44,640 --> 00:46:47,250 for like public policy. 1118 00:46:47,250 --> 00:46:49,549 Like, I actually also implement systems 1119 00:46:49,549 --> 00:46:51,750 for public policy in a transparent way. 1120 00:46:51,750 --> 00:46:54,329 Like, all the code is in GitHub, etc. 1121 00:46:54,329 --> 00:47:00,020 And it is also the case, to give credit to 1122 00:47:00,020 --> 00:47:01,990 Google, and these giants, 1123 00:47:01,990 --> 00:47:06,109 that they're trying to implement transparency systems 1124 00:47:06,109 --> 00:47:08,170 that help you understand. 1125 00:47:08,170 --> 00:47:09,289 This has been done with respect to 1126 00:47:09,289 --> 00:47:12,329 how your data is being collected, 1127 00:47:12,329 --> 00:47:14,579 but for example if you go on Amazon.com 1128 00:47:14,579 --> 00:47:17,890 you can see a recommendation has been made, 1129 00:47:17,890 --> 00:47:19,420 and that is pretty transparent. 1130 00:47:19,420 --> 00:47:21,480 You can see "this item was recommended to me," 1131 00:47:21,480 --> 00:47:25,039 so you know that prediction is being used in this case, 1132 00:47:25,039 --> 00:47:27,089 and it will say why prediction is being used: 1133 00:47:27,089 --> 00:47:29,230 because you purchased some item. 1134 00:47:29,230 --> 00:47:30,380 And Google has a similar thing, 1135 00:47:30,380 --> 00:47:32,420 if you go to like Google Ad Settings, 1136 00:47:32,420 --> 00:47:35,249 you can even turn off personalization of ads 1137 00:47:35,249 --> 00:47:36,380 if you want, 1138 00:47:36,380 --> 00:47:38,119 and you can also see some of the inferences 1139 00:47:38,119 --> 00:47:39,400 that have been learned about you. 1140 00:47:39,400 --> 00:47:40,819 A subset of the inferences that have been 1141 00:47:40,819 --> 00:47:41,700 learned about you. 1142 00:47:41,700 --> 00:47:43,940 So, like, what interests... 1143 00:47:43,940 --> 00:47:47,869 Herald: A question from the internet, please? 1144 00:47:47,869 --> 00:47:50,930 Signal Angel: Yes, billetQ is asking 1145 00:47:50,930 --> 00:47:54,479 how do you avoid biases in machine learning? 1146 00:47:54,479 --> 00:47:57,380 I assume an analysis system, for example, 1147 00:47:57,380 --> 00:48:00,420 could be biased against women and minorities, 1148 00:48:00,420 --> 00:48:04,960 if used for hiring decisions based on known data. 1149 00:48:04,960 --> 00:48:06,499 Dr. Helsby: Yeah, so one thing is to 1150 00:48:06,499 --> 00:48:08,529 just explicitly check. 1151 00:48:08,529 --> 00:48:12,199 So, you can check to see how 1152 00:48:12,199 --> 00:48:14,309 positive outcomes are being distributed 1153 00:48:14,309 --> 00:48:16,779 among those protected classes.
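A minimal sketch of that kind of explicit check, with fabricated records standing in for a model's decisions and a protected attribute: compare the rate of positive outcomes across groups and look at the gap.

```python
# Toy sketch of an explicit fairness check: compare the rate of positive
# outcomes (e.g. "recommended for interview") across a protected attribute.
# The records are fabricated for illustration.
records = [
    {"group": "women", "positive": True},
    {"group": "women", "positive": False},
    {"group": "women", "positive": False},
    {"group": "men",   "positive": True},
    {"group": "men",   "positive": True},
    {"group": "men",   "positive": False},
]

def positive_rate(records, group):
    members = [r for r in records if r["group"] == group]
    return sum(r["positive"] for r in members) / len(members)

rates = {g: positive_rate(records, g) for g in {"women", "men"}}
for group, rate in sorted(rates.items()):
    print(f"{group}: positive outcome rate {rate:.2f}")

# A large gap between groups (here 0.33 vs 0.67) is a signal that the
# system, or the data it was trained on, deserves closer scrutiny.
print(f"disparity (max - min): {max(rates.values()) - min(rates.values()):.2f}")
```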
1154 00:48:16,779 --> 00:48:19,210 You could also incorporate these sorts of 1155 00:48:19,210 --> 00:48:21,440 fairness constraints in the function 1156 00:48:21,440 --> 00:48:24,069 that you optimize when you train the system, 1157 00:48:24,069 --> 00:48:25,950 and so, if you're interested in reading more 1158 00:48:25,950 --> 00:48:28,960 about this, the 2 papers-- 1159 00:48:28,960 --> 00:48:31,909 let me go to References-- 1160 00:48:31,909 --> 00:48:32,730 there's a good paper called 1161 00:48:32,730 --> 00:48:35,339 Fairness Through Awareness that describes 1162 00:48:35,339 --> 00:48:37,499 how to go about doing this, 1163 00:48:37,499 --> 00:48:39,579 so I recommend this person read that. 1164 00:48:39,579 --> 00:48:40,970 It's good. 1165 00:48:40,970 --> 00:48:43,400 Herald: Microphone 2, please. 1166 00:48:43,400 --> 00:48:45,400 Mic 2: Thanks again for your talk. 1167 00:48:45,400 --> 00:48:49,649 Umm, hello? 1168 00:48:49,649 --> 00:48:50,999 Okay. 1169 00:48:50,999 --> 00:48:52,960 Umm, I see of course a problem with 1170 00:48:52,960 --> 00:48:54,619 all the black boxes that you describe 1171 00:48:54,619 --> 00:48:57,069 with regards to the crime systems, 1172 00:48:57,069 --> 00:48:59,569 but when we look at the advertising systems 1173 00:48:59,569 --> 00:49:02,169 in many cases they are very networked. 1174 00:49:02,169 --> 00:49:04,160 There are many different systems collaborating 1175 00:49:04,160 --> 00:49:07,109 and exchanging data via open APIs: 1176 00:49:07,109 --> 00:49:08,720 RESTful APIs, and various 1177 00:49:08,720 --> 00:49:11,720 demand-side platforms and audience-exchange platforms, 1178 00:49:11,720 --> 00:49:12,539 and everything. 1179 00:49:12,539 --> 00:49:15,420 So, can that help to at least 1180 00:49:15,420 --> 00:49:22,160 increase awareness on where targeting, personalization 1181 00:49:22,160 --> 00:49:23,679 might be happening? 1182 00:49:23,679 --> 00:49:26,190 I mean, I'm looking at systems like 1183 00:49:26,190 --> 00:49:29,539 BuiltWith, that surface what kind of 1184 00:49:29,539 --> 00:49:31,380 JavaScript libraries are used elsewhere. 1185 00:49:31,380 --> 00:49:32,999 So, is that something that could help 1186 00:49:32,999 --> 00:49:35,670 at least to give a better awareness 1187 00:49:35,670 --> 00:49:38,690 and list all the points where 1188 00:49:38,690 --> 00:49:41,409 you might be targeted... 1189 00:49:41,409 --> 00:49:43,070 Dr. Helsby: So, like, with respect to 1190 00:49:43,070 --> 00:49:46,460 advertising, the fact that there is behind the scenes 1191 00:49:46,460 --> 00:49:48,450 this like complicated auction process 1192 00:49:48,450 --> 00:49:50,650 that's occurring, just makes things 1193 00:49:50,650 --> 00:49:51,819 a lot more complicated. 1194 00:49:51,819 --> 00:49:54,170 So, for example, I said briefly 1195 00:49:54,170 --> 00:49:57,269 that they found that there's this statistical difference 1196 00:49:57,269 --> 00:49:59,099 between how men and women are treated, 1197 00:49:59,099 --> 00:50:01,339 but it doesn't necessarily mean that 1198 00:50:01,339 --> 00:50:03,640 "Oh, the algorithm is definitely biased."
1199 00:50:03,640 --> 00:50:06,369 It could be because of this auction process, 1200 00:50:06,369 --> 00:50:10,569 it could be that women are considered 1201 00:50:10,569 --> 00:50:12,630 more valuable when it comes to advertising, 1202 00:50:12,630 --> 00:50:15,099 and so these executive ads are getting 1203 00:50:15,099 --> 00:50:17,160 outbid by some other ads, 1204 00:50:17,160 --> 00:50:18,890 and so there's a lot of potential 1205 00:50:18,890 --> 00:50:20,490 causes for that. 1206 00:50:20,490 --> 00:50:22,829 So, I think it just makes things a lot more complicated. 1207 00:50:22,829 --> 00:50:25,910 I don't know if it helps with the bias at all. 1208 00:50:25,910 --> 00:50:27,410 Mic 2: Well, the question was more 1209 00:50:27,410 --> 00:50:30,299 in the direction... can it help to surface 1210 00:50:30,299 --> 00:50:32,499 and make people aware of that fact? 1211 00:50:32,499 --> 00:50:34,930 I mean, I can talk to my kids probably, 1212 00:50:34,930 --> 00:50:36,259 and they will probably understand, 1213 00:50:36,259 --> 00:50:38,420 but I can't explain that to my grandma, 1214 00:50:38,420 --> 00:50:43,150 who's also, umm, looking at an iPad. 1215 00:50:43,150 --> 00:50:44,289 Dr. Helsby: So, the fact that 1216 00:50:44,289 --> 00:50:45,690 the systems are... 1217 00:50:45,690 --> 00:50:48,509 I don't know if I understand. 1218 00:50:48,509 --> 00:50:50,529 Mic 2: OK. I think that the main problem 1219 00:50:50,529 --> 00:50:53,710 is that we are behind the industry's efforts 1220 00:50:53,710 --> 00:50:57,179 at targeting us, and many people 1221 00:50:57,179 --> 00:51:00,579 do know, but a lot more people don't know, 1222 00:51:00,579 --> 00:51:03,160 and making them aware of the fact 1223 00:51:03,160 --> 00:51:07,269 that they are a target, in a way, 1224 00:51:07,269 --> 00:51:10,990 is something that can only be shown 1225 00:51:10,990 --> 00:51:14,779 by a 3rd party that has that data at its disposal, 1226 00:51:14,779 --> 00:51:16,339 and makes audits in a way-- 1227 00:51:16,339 --> 00:51:17,929 maybe in an automated way. 1228 00:51:17,929 --> 00:51:19,170 Dr. Helsby: Right. 1229 00:51:19,170 --> 00:51:21,410 Yeah, I think it certainly could help with advocacy 1230 00:51:21,410 --> 00:51:23,059 if that's the point, yeah. 1231 00:51:23,059 --> 00:51:26,079 Herald: Another question from the internet, please. 1232 00:51:26,079 --> 00:51:29,319 Signal Angel: Yes, on IRC they are asking 1233 00:51:29,319 --> 00:51:31,440 if we know that prediction in some cases 1234 00:51:31,440 --> 00:51:34,460 provides an influence that cannot be controlled. 1235 00:51:34,460 --> 00:51:38,480 So, r4v5 would like to know from you 1236 00:51:38,480 --> 00:51:41,519 if there are some cases or areas where 1237 00:51:41,519 --> 00:51:45,060 machine learning simply shouldn't go? 1238 00:51:45,060 --> 00:51:48,349 Dr. Helsby: Umm, so I think... 1239 00:51:48,349 --> 00:51:52,559 I mean, yes, I think that it is the case 1240 00:51:52,559 --> 00:51:54,650 that in some cases machine learning 1241 00:51:54,650 --> 00:51:56,180 might not be appropriate. 1242 00:51:56,180 --> 00:51:58,359 For example, if you use machine learning 1243 00:51:58,359 --> 00:52:00,970 to decide who should be searched. 1244 00:52:00,970 --> 00:52:02,619 I don't think it should be the case that 1245 00:52:02,619 --> 00:52:03,809 machine learning algorithms should 1246 00:52:03,809 --> 00:52:05,440 ever be used to determine 1247 00:52:05,440 --> 00:52:08,430 probable cause, or something like that.
1248 00:52:08,430 --> 00:52:12,339 So, if it's just one piece of evidence 1249 00:52:12,339 --> 00:52:13,299 that you consider, 1250 00:52:13,299 --> 00:52:14,990 and there's human oversight always, 1251 00:52:14,990 --> 00:52:18,519 *maybe* it's fine, but 1252 00:52:18,519 --> 00:52:20,839 we should be very suspicious and hesitant 1253 00:52:20,839 --> 00:52:22,119 in certain contexts where 1254 00:52:22,119 --> 00:52:24,529 the ramifications are very serious. 1255 00:52:24,529 --> 00:52:27,259 Like the No Fly List, and so on. 1256 00:52:27,259 --> 00:52:29,200 Herald: And #2 again. 1257 00:52:29,200 --> 00:52:30,809 Mic 2: A second question 1258 00:52:30,809 --> 00:52:33,509 that just occurred to me, if you don't mind. 1259 00:52:33,509 --> 00:52:35,339 Umm, until the advent of 1260 00:52:35,339 --> 00:52:36,559 algorithmic systems, 1261 00:52:36,559 --> 00:52:40,470 when there've been cases of serious harm 1262 00:52:40,470 --> 00:52:42,799 that's resulted to individuals or groups, 1263 00:52:42,799 --> 00:52:44,579 and it's been demonstrated that 1264 00:52:44,579 --> 00:52:46,029 it's occurred because of 1265 00:52:46,029 --> 00:52:49,400 an individual or a system of people 1266 00:52:49,400 --> 00:52:53,019 being systematically biased, then often 1267 00:52:53,019 --> 00:52:55,130 one of the actions that's taken is 1268 00:52:55,130 --> 00:52:56,869 pressure's applied, and then 1269 00:52:56,869 --> 00:52:59,660 people are required to change, 1270 00:52:59,660 --> 00:53:01,049 and hopefully be held responsible, 1271 00:53:01,049 --> 00:53:02,910 and then change the way that they do things 1272 00:53:02,910 --> 00:53:06,400 to try to remove bias from that system. 1273 00:53:06,400 --> 00:53:07,839 What's the current thinking about 1274 00:53:07,839 --> 00:53:10,299 how we can go about doing that 1275 00:53:10,299 --> 00:53:12,599 when the systems that are doing that 1276 00:53:12,599 --> 00:53:13,650 are algorithmic? 1277 00:53:13,650 --> 00:53:15,999 Is it just going to be human oversight, 1278 00:53:15,999 --> 00:53:16,910 and humans are gonna have to be 1279 00:53:16,910 --> 00:53:18,379 held responsible for the oversight? 1280 00:53:18,379 --> 00:53:20,890 Dr. Helsby: So, in terms of bias, 1281 00:53:20,890 --> 00:53:22,569 if we're concerned about bias towards 1282 00:53:22,569 --> 00:53:24,019 particular types of people, 1283 00:53:24,019 --> 00:53:25,710 that's something that we can optimize for. 1284 00:53:25,710 --> 00:53:28,839 So, we can train systems that are unbiased 1285 00:53:28,839 --> 00:53:30,019 in this way. 1286 00:53:30,019 --> 00:53:32,109 So that's one way to deal with it. 1287 00:53:32,109 --> 00:53:34,039 But there's always gonna be errors, 1288 00:53:34,039 --> 00:53:35,420 so that's sort of a separate issue 1289 00:53:35,420 --> 00:53:37,509 from the bias, and in the case 1290 00:53:37,509 --> 00:53:39,180 where there are errors, 1291 00:53:39,180 --> 00:53:40,539 there must be oversight. 1292 00:53:40,539 --> 00:53:45,079 So, one way that one could improve 1293 00:53:45,079 --> 00:53:46,410 the way that this is done 1294 00:53:46,410 --> 00:53:48,160 is by making sure that you're 1295 00:53:48,160 --> 00:53:50,799 keeping track of confidence of decisions. 1296 00:53:50,799 --> 00:53:54,039 So, if you have a low confidence prediction, 1297 00:53:54,039 --> 00:53:56,259 then maybe a human should come in and check things. 1298 00:53:56,259 --> 00:53:58,809 So, that might be one way to proceed.
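As a minimal sketch of that idea, assuming a model that reports a confidence score with each prediction (the threshold and cases below are invented): predictions under the threshold go to a human instead of triggering automatic action.

```python
# Toy sketch of confidence-based human oversight: predictions whose
# confidence falls below a threshold are routed to a human reviewer
# instead of being acted on automatically. Values are invented.
REVIEW_THRESHOLD = 0.90  # hypothetical policy: below this, a human decides

predictions = [
    {"case": "A", "label": "flag", "confidence": 0.97},
    {"case": "B", "label": "flag", "confidence": 0.62},
    {"case": "C", "label": "clear", "confidence": 0.88},
]

def route(prediction):
    if prediction["confidence"] >= REVIEW_THRESHOLD:
        return "automatic action"
    return "send to human reviewer"

for p in predictions:
    print(f"case {p['case']}: {p['label']} "
          f"(confidence {p['confidence']:.2f}) -> {route(p)}")
```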
1299 00:54:02,099 --> 00:54:03,990 Herald: So, there's no more question. 1300 00:54:03,990 --> 00:54:06,199 I close this talk now, 1301 00:54:06,199 --> 00:54:08,239 and thank you very much 1302 00:54:08,239 --> 00:54:09,410 and a big applause to 1303 00:54:09,410 --> 00:54:11,780 Jennifer Helsby! 1304 00:54:11,780 --> 00:54:16,310 *roaring applause*