Teaching a Self-Driving A.I. To Make Human-Like Decisions

A powerful technique to deliver safe, human-like driving decisions

Oliver Cameron
Voyage


A Safe, Intelligent Self-Driving A.I.

The key to commercializing self-driving A.I. is to marry safety and intelligence. A safe self-driving A.I. can be validated: we know what it can and cannot handle. An intelligent self-driving A.I. gets better over time, learning to make increasingly complex driving decisions without getting tripped up. In essence, the combination of the two delivers a safe ride that feels smooth (i.e., not robotic) and gets you to your destination on time (i.e., doesn’t get stuck).

We’ve previously applied this methodology to our computer vision technology, with safety-focused perception algorithms like Sonic and intelligence-boosting research in Active Learning.

We are now excited to share our work applying this methodology to behavior planning (the module responsible for making key driving decisions). Recent versions of Commander—our self-driving A.I.—feature a new behavior planner that’s delivering human-like decision-making, but without other methods’ gargantuan data requirements. Fueling this method is a new form of fleet learning, with an investment in data quality, rather than quantity.

Read on to learn more about how we’ve achieved this.

Decisions, Decisions, Decisions

Given the countless permutations of scenarios a robotaxi can encounter on the road—often with noisy inputs from computer vision—making the right decision at the right time is one of the biggest challenges in self-driving technology. Making those decisions feel not only safe but natural (i.e., human-like) is key to the adoption of self-driving technology.

Here are some examples of a “decision” our behavior planner is tasked to make:

  • “Robotaxi, overtake a parked vehicle while reducing our speed.”
  • “Robotaxi, yield for the pedestrian at the crosswalk.”
  • “Robotaxi, enter an unprotected intersection after waiting patiently for the right-of-way.”
  • “Robotaxi, yield for an animal in our lane, then overtake it slowly if it persists in our lane.”
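
To make this concrete, here’s a minimal sketch in Python of how such decisions might be represented as commands. The maneuver names, fields, and values are our own illustration, not Voyage’s actual interfaces.

    from dataclasses import dataclass
    from enum import Enum, auto

    class Maneuver(Enum):
        OVERTAKE = auto()
        YIELD = auto()
        PROCEED = auto()

    @dataclass
    class Decision:
        maneuver: Maneuver    # what to do
        target: str           # who or what the maneuver concerns
        max_speed_mps: float  # speed cap while executing it

    # The example decisions above, expressed as data:
    decisions = [
        Decision(Maneuver.OVERTAKE, "parked_vehicle", max_speed_mps=4.0),
        Decision(Maneuver.YIELD, "pedestrian_at_crosswalk", max_speed_mps=0.0),
        Decision(Maneuver.PROCEED, "unprotected_intersection", max_speed_mps=3.0),
        Decision(Maneuver.YIELD, "animal_in_lane", max_speed_mps=1.0),
    ]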

To further demonstrate what a “decision” looks like, below are a series of imperfect decisions, where — although the situations were entirely safe — there’s room for improvement in our behavior planner’s decision-making. Imperfect, robotic decisions like these are exactly what we set out to solve with the new version of our behavior planner.

  • Our self-driving A.I. decided to switch lanes a little too abruptly.
  • Our self-driving A.I. should have finished the overtake sooner.
  • Our self-driving A.I. should have overtaken this oncoming bicyclist instead of yielding.

An Introduction to Decision-Making for Self-Driving Cars

Before we share our solution to making safe and human-like decisions, let’s first discuss the two extremes common in the field: one approach informed by rules, the other by data.

The classical approach is rule-based. Such an approach is codified as a series of rules, such as “if the car in front is pulled over, then try to overtake.” This approach is verifiable (an important requirement for a safety-critical system) and capable of handling simple cases (e.g., lane-keeping). However, rule-based approaches are challenged by noisy inputs and prone to getting stuck in unstructured situations that can’t be codified as rules.
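
As a toy illustration (our own sketch, not Voyage’s code), a rule-based planner boils down to hand-written predicates mapped directly to maneuvers:

    def rule_based_decision(scene):
        """scene: perception outputs, e.g. {"lead_vehicle_pulled_over": True}."""
        if scene.get("pedestrian_in_path"):
            return "YIELD"
        if scene.get("lead_vehicle_pulled_over"):
            return "OVERTAKE"
        return "LANE_KEEP"

    # Verifiable and predictable, but brittle: one noisy perception flag flips
    # the decision, and any scene no rule anticipated falls through to a default.
    print(rule_based_decision({"lead_vehicle_pulled_over": True}))  # OVERTAKE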

The modern approach is to utilize machine-learned models. Instead of manually writing rules for each new scenario you stumble upon, a machine-learned model, trained on large datasets of previous driving data, is asked to make the decision. The model taps into a history of driving events to make the most human-like decision in the present. This approach is less susceptible to noisy inputs, but comes with its own challenges, namely the predictability and verifiability of such a dynamic system.
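
For contrast, here’s a minimal sketch of the data-driven extreme, assuming scikit-learn and toy stand-ins for what would really be a large corpus of logged driving:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Each row describes a past scene (e.g., gap to obstacle in meters, its
    # speed, pedestrian present); each label is the maneuver a human took.
    X_history = np.array([[12.0, 0.0, 1.0], [30.0, 8.0, 0.0], [4.0, 0.0, 1.0]])
    y_history = np.array(["OVERTAKE", "LANE_KEEP", "YIELD"])

    model = RandomForestClassifier(random_state=0).fit(X_history, y_history)

    def learned_decision(scene_features):
        # Generalizes beyond hand-written rules, but the mapping from input
        # to decision is opaque, which is what makes verification hard.
        return model.predict([scene_features])[0]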

A Verifiable and Human-Like Way to Make Decisions

Voyage has pioneered a new form of decision-making that combines the verifiability and reliability of the classical approach with the intelligence of the modern approach. The result is a technique we call High-Quality Decision Making.

High-Quality Decision Making is fueled by two models, one optimization-based (i.e., reliable) and one machine-learned (i.e., intelligent), each with different responsibilities. The optimization-based model ensures our vehicle always adheres to the rules of the road (e.g., never running a stop line or getting too close to pedestrians), while the machine-learned model, trained on rich historical driving data, taps into its vast history of experience to select the most human-like decision from a refined list of safe options.

Combining these models, optimization-based and machine-learned, in this way results in deterministic decisions (crucial for a measurable and validated safety case) while delivering smooth, human-like decision-making. What’s more, our decisions only improve over time with the addition of rich data.
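
To make the division of labor concrete, here is a minimal sketch in Python. Everything in it is our own illustration: the candidate fields, the safety predicates, and the stand-in scorer, which in the real system would be the machine-learned model trained on fleet data.

    from dataclasses import dataclass

    @dataclass
    class Candidate:
        name: str
        crosses_stop_line: bool
        min_pedestrian_gap_m: float

    def is_safe(cand, stop_line_active):
        # Deterministic road-rule checks: these alone bound what the vehicle
        # can do, regardless of what the learned model prefers.
        if cand.crosses_stop_line and stop_line_active:
            return False
        if cand.min_pedestrian_gap_m < 1.0:  # hypothetical pedestrian buffer
            return False
        return True

    def human_likeness(cand):
        # Stand-in for the machine-learned scorer trained on rich fleet data.
        return {"overtake_slow": 0.9, "yield_and_wait": 0.4}.get(cand.name, 0.0)

    def decide(candidates, stop_line_active):
        safe = [c for c in candidates if is_safe(c, stop_line_active)]
        return max(safe, key=human_likeness)

    candidates = [
        Candidate("yield_and_wait", crosses_stop_line=False, min_pedestrian_gap_m=5.0),
        Candidate("overtake_slow", crosses_stop_line=False, min_pedestrian_gap_m=1.5),
        Candidate("overtake_fast", crosses_stop_line=False, min_pedestrian_gap_m=0.6),
    ]
    print(decide(candidates, stop_line_active=False).name)  # -> overtake_slow

Because the learned scorer only ever ranks options that have already passed the deterministic checks, the worst it can do is pick an awkward safe option, never an unsafe one.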

Learning Human-Like Decisions With Our Fleet

The machine-learned model behind High-Quality Decision Making is what ultimately delivers human-like decision-making capabilities. To fuel the development and advancement of this model, we’ve rethought how we learn from our fleet, requiring substantially less data than other approaches to be effective.

While many others seek to create vast datasets collected with large fleets, we have focused on creating a curated set of high-quality and “rich” driving data to train our model. This rich data—updated daily by our fleet—is then put to work to continuously train the machine-learned model within High-Quality Decision Making.

What is “rich” data?

To best demonstrate, let’s walk through a real-world example where our new behavior planner found itself unable to navigate a situation—a roundabout with parked vehicles and a pedestrian. We’ll then show you how—by adding rich data to our dataset—we taught our machine-learned model how to handle this situation and others like it.

A safe but overly cautious yield to the pedestrian next to the vehicle.

In this situation, both the optimization-based and machine-learned models within our behavior planner determined that the right course of action was to yield to the static pedestrian in the roundabout. This was safe, but not something a rider in our robotaxi would be pleased to wait through. A human driver would instead pass them on the right, albeit slower than usual.

In High-Quality Decision Making, we teach our self-driving A.I. to navigate this situation (and, importantly, others that resemble it) by adding “rich” data to our dataset. This rich data has two components, illustrated in the sketch below:

  1. The recorded driving data of the event.
  2. Explicit instructions on what our self-driving A.I. did well (an affirmation) or where it needed to improve (a correction) in that event.
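
As a minimal sketch (the field names and log reference are hypothetical, our own illustration), a single rich training record pairing these two components might look like this:

    from dataclasses import dataclass
    from typing import Literal

    @dataclass
    class RichExample:
        log_id: str                                 # the recorded driving data
        kind: Literal["affirmation", "correction"]  # which kind of feedback
        instruction: str                            # explicit, human-readable guidance

    example = RichExample(
        log_id="roundabout-0042",  # hypothetical reference to the event's log
        kind="correction",
        instruction="Pass the parked vehicles and the pedestrian on the right, "
                    "slower than usual, instead of yielding indefinitely.",
    )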

To produce the instruction for this event, a member of our data team took a look at the recorded driving data and explicitly codified the actions they would have taken if they were in the driver’s seat. In this case, the annotator says we should have passed the vehicles and the pedestrian on the right, and that we should have done so slower than usual.

What’s special here, and what makes this data rich, is the explicit instruction. Other machine learning approaches (e.g., end-to-end) may feed recorded driving events with only subtle signals (e.g., driver take-overs) into their model, without also feeding explicit detail on what was right or wrong in the event. It is then up to the machine-learned model itself to infer exactly what the self-driving A.I. did right or wrong, the theory being that with enough data, it will figure this out by itself. Our approach explicitly feeds our machine-learned model detailed instructions, interpretable by both machine and human, on exactly how a human would have handled the situation. With this smaller, richer data, and by bounding what the machine-learned model is tasked to do, we have achieved great results.

After training the machine-learned model on this newly added rich data, our self-driving A.I. can now navigate the scenario. In the clip below, the yellow ghost car that gets stuck is running the behavior planner from before re-training.

After training, our robotaxi navigates the situation with ease.

It’s important to note that what’s exciting is not that this situation is handled correctly after adding the explicit instruction (that is a form of overfitting, after all), but how well the machine-learned model generalizes from a relatively small dataset to other situations.

With thousands of pieces of rich data like the example above now fueling the machine-learned model in High-Quality Decision Making, we regularly observe our self-driving A.I. making human-like decisions. These decisions are rarely wow-worthy on their own; it’s the sum of many subtle driving interactions, like nudging around a pedestrian or slowing down appropriately, that makes for a human-like ride.

Here are a few examples of those subtle decisions made by High-Quality Decision Making.

  • Our self-driving A.I. slows down before nudging right to give the pedestrian more room.
  • Our self-driving A.I. yields for the crossing pedestrian, but doesn’t wait for them to exit the road before making the unprotected turn.
  • Our self-driving A.I. slows down to understand if the overtake can be made without interfering with the pedestrians.
  • Our self-driving A.I. makes a cautious overtake with multiple pedestrians around us.
  • Our self-driving A.I. overtakes pedestrians cautiously on a curve.
  • Our self-driving A.I. slows down and then drives between two pedestrians.

High-Quality Decision Making is Live

We are tremendously excited that our new behavior planner is live on the road today, fueled by thousands of rich affirmations and corrections of prior driving decisions. High-Quality Decision Making will make for improved rides, while also enabling faster adaptation of our self-driving A.I. to new geographies. Congratulations to our incredibly talented Autonomy team for delivering such an impactful technology, integral to delivering safe, human-like rides to our customers.

Voyage Commander in action!

