The arrival of superhuman machine intelligence will, in my opinion, be the biggest event in human history. The world’s great powers are finally waking up to this fact, and the world’s largest corporations have known it for some time. But what they may not fully understand is that how AI evolves will determine whether this event is also our last.
The problem is not the science-fiction plot that preoccupies Hollywood and the media – the humanoid robot that spontaneously becomes conscious and decides to hate humans. Rather, it is the creation of machines that can draw on more information and look further into the future than humans ever can, exceeding our capacity for decision-making in the real world.
To better understand how and why this could lead to serious problems, we must first go back to the basic building blocks of most AI systems. The “standard model” in AI, borrowed from philosophical and economic notions of rational behavior, looks like this: “Machines are intelligent to the extent that their actions can be expected to achieve their objectives.”
Because machines, unlike humans, have no objectives of their own, we give them objectives to achieve. In other words, we build machines, feed objectives into them, and off they go. The more intelligent the machine, the more likely it is to achieve that objective. This model recurs throughout society, not just in AI: control engineers design autopilots to minimize deviations from level flight; statisticians design algorithms that reduce prediction errors; retailers choose store locations that will maximize shareholder return; and governments make policy choices to accelerate GDP growth, as China has done.

Unfortunately, this standard model is a big mistake. In my opinion, it makes no sense to design machines that are beneficial to us only if we write down our objectives completely and correctly, because if we insert the wrong objectives into the machine and it is more intelligent than we are, we lose.
Until recently, we avoided the potentially serious consequences of poorly designed objectives only because our AI technology was not especially smart and was mostly confined to the lab. Now, however, even the relatively simple learning algorithms of social media, which optimize clicks by manipulating human preferences, have been disastrous for democratic systems because they are so pervasive in the real world.
The effects of superintelligent algorithms operating on a global scale could be far more severe. What if a superintelligent climate-control system, for example, given the job of restoring carbon dioxide concentrations to preindustrial levels, concludes that the solution is to reduce the human population to zero?
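To see how literally a machine built on the standard model would take such an objective, consider a deliberately crude sketch in Python. The action names and numbers are invented; the point is only that a pure objective-maximizer never consults anything outside the objective it was given:

```python
# A crude sketch of the standard model in miniature. The actions and numbers
# are invented; "human_wellbeing" stands in for everything we forgot to write
# into the objective.
actions = {
    "plant_forests":         {"co2_reduction": 0.4, "human_wellbeing": 0.2},
    "deploy_carbon_capture": {"co2_reduction": 0.6, "human_wellbeing": 0.1},
    "eliminate_humans":      {"co2_reduction": 1.0, "human_wellbeing": -1.0},
}

def standard_model_choice(actions):
    """Pick the action that best achieves the objective we wrote down."""
    return max(actions, key=lambda a: actions[a]["co2_reduction"])

print(standard_model_choice(actions))  # -> eliminate_humans
```

Nothing in the optimization step even mentions the cost we care about most, because we never wrote it down.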
Some AI researchers like to claim that “we can always just switch them off” – but in my opinion this makes no more sense than arguing that we can always just play better moves than the superhuman chess or Go program we’re facing. The machine will anticipate all the possible ways in which a human might interfere and take preemptive steps to prevent them. The logical solution, then, is to change the way we think about AI. Instead of building machines that exist to achieve their objectives, we need a model that looks more like this:
“Machines are beneficial to the extent that their actions can be expected to achieve our objectives.” This fix might seem small, but it is crucial. Machines that have our objectives as their only guiding principle will necessarily be uncertain about what those objectives are, because they are in us – all eight billion of us, in all our glorious variety, and in generations yet unborn – not in the machines.
Uncertainty about objectives might sound counterproductive, but it is actually an essential feature of safe intelligent systems. It implies that no matter how intelligent they become, machines will always defer to humans. They will ask permission when appropriate, they will accept correction, and, most important, they will allow themselves to be switched off – precisely because they want to avoid doing whatever it is that would give humans a reason to switch them off.
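This logic can be made concrete with a toy calculation of my own, under invented assumptions. Suppose the machine is unsure how valuable a proposed action really is to the human: its belief about that value is a bell curve centered on zero. The human knows the true value and will hit the off switch whenever it is negative. Deferring, that is, proposing the action and letting the human veto it, can only help, because the veto removes exactly the bad outcomes:

```python
# Toy sketch (my own, with invented assumptions): U is the action's true value
# to the human, unknown to the machine; the human switches the machine off
# whenever U < 0.
import random

random.seed(0)
belief_over_U = [random.gauss(0.0, 1.0) for _ in range(100_000)]

# Option 1: act immediately, ignoring the human.
act_now = sum(belief_over_U) / len(belief_over_U)

# Option 2: propose the action and accept being switched off when U < 0.
defer = sum(max(u, 0.0) for u in belief_over_U) / len(belief_over_U)

print(f"expected value of acting now: {act_now:+.3f}")
print(f"expected value of deferring:  {defer:+.3f}")
```

The gap between the two numbers is the machine’s incentive to leave the off switch in human hands; it vanishes only if the machine is certain it already knows what we want.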
Once the focus shifts from building machines that are “intelligent” to ones that are “beneficial”, controlling them will become a far easier feat. Consider it the difference between nuclear power and nuclear explosions: a nuclear explosion is nuclear power in an uncontrolled form, and we greatly prefer the controlled form.
Of course, actually putting a model like this into practice requires a great deal of research. We need “minimally invasive” algorithms for decision making that prevent machines from messing with parts of the world whose value they are unsure about, as well as machines that learn more about our true, underlying preferences for how the future should unfold. Such machines will then face an age-old problem of moral philosophy: how to apportion benefits and costs among different individuals with conflicting desires.
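As a crude illustration of the first item on that agenda, learning about our preferences, a machine might maintain a belief over candidate objectives and update it as it watches the choices people make. The candidate objectives, observations, and probabilities below are all invented for the sake of the sketch:

```python
# Toy sketch with invented numbers: Bayesian updating of a belief about which
# objective the human actually holds, based on observed human choices.
belief = {"minimize_co2_only": 0.5, "minimize_co2_and_protect_people": 0.5}

# How likely each observation is under each candidate objective.
likelihood = {
    "human_approves_reforestation": {"minimize_co2_only": 0.6,
                                     "minimize_co2_and_protect_people": 0.9},
    "human_rejects_population_cut": {"minimize_co2_only": 0.1,
                                     "minimize_co2_and_protect_people": 0.99},
}

def update(belief, observation):
    """Bayes' rule: reweight each candidate by how well it explains what we saw."""
    posterior = {h: p * likelihood[observation][h] for h, p in belief.items()}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

for obs in ("human_approves_reforestation", "human_rejects_population_cut"):
    belief = update(belief, obs)

print(belief)  # probability mass shifts toward the objective that also protects people
```

A real system would have to do this over a vastly richer space of objectives and, as noted above, over billions of people with conflicting desires, which is where the moral philosophy comes in.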
All this could take a decade to complete – and even then, regulations will be required to ensure that provably safe systems are adopted while those that don’t conform are retired. This of course won’t be easy. But to me it’s clear that this model must be in place before the abilities of AI systems exceed those of humans in the areas that matter. If we manage to do that, the result will be a new relationship between humans and machines, one that I hope will enable us to navigate the next few decades successfully. If we fail, we may face a difficult choice: curtail AI research and forgo the enormous benefits that will flow from it, or risk losing control of our own future.
Some skeptics within the AI community believe they see a third option: continue with business as usual, because superintelligent machines will never arrive. But that’s as if a bus driver, with all of humanity as passengers, said, “Yes, I’m driving as fast as I can toward a cliff, but trust me, we’ll run out of gas before we get there!” I would rather not take the risk.