Robust machine learning models are essential for safety-sensitive, real-world applications such as autonomous vehicles and robots. Training them with supervised learning, however, requires very large, pre-labeled datasets, which are expensive and time-consuming to create and often become an innovation bottleneck for researchers.
At RE•WORK’s Deep Learning Summit in San Francisco in February, Tesla Lead Machine Learning Scientist for Autopilot Simulation Landon Smith, and Toyota Senior Manager for Machine Learning Research Adrien Gaidon discussed their work moving beyond supervised learning to create more robust models for autonomous vehicles.
“Deep nets require manually labeling millions of examples and building a dataset can take months and cost millions of dollars. Consequently, overcoming the supervision bottleneck and drastically lowering the dataset costs for large-scale machine learning is one of our main goals,” says Gaidon.
Creating Robust Machine Learning Models
For companies like Toyota and Tesla, it is critical to produce products that are safe to use in the real world. This objective also carries over to their cutting-edge machine learning models.
“At the Toyota Research Institute (TRI) we want first and foremost to [train] robust machine learning models,” Gaidon says. “Why? It's because we want them to be safe to deploy in the real world on safety-critical physical platforms like cars or home robots.”
He continues: “Robust models are defined by design as interpolating smoothly within the training distribution. Recent research has shown that to achieve this objective you not only need a lot of data, but you also need large, in fact, over-parameterized models.”
To overcome the bottleneck and achieve the goal of effectively scaling up machine learning initiatives, Gaidon’s team is researching techniques designed to side-step the need for very large, pre-labeled datasets.
One of these techniques is focused on self-supervised algorithms inspired by a process called ‘analysis by synthesis’.
“[Analysis by synthesis] consists of bottom-up proposals, derived from the raw data itself, with top-down validation in the form of self-supervised signals derived from our prior knowledge about the problem,” Gaidon says. “Which prior knowledge and how to incorporate it into the learning process are amongst the most important research questions we're exploring.”
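To make the idea concrete, here is a minimal, deliberately toy sketch of the loop Gaidon describes — bottom-up proposals from raw data, validated top-down by how well they re-synthesize that data. This is not TRI's actual method; the linear "world model" and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "raw data": noisy observations of an unknown line, with no labels.
true_slope = 2.0
x = rng.uniform(-1.0, 1.0, size=200)
y = true_slope * x + rng.normal(scale=0.1, size=200)

def synthesize(slope):
    """Top-down synthesis: render observations from a hypothesized model."""
    return slope * x

def self_supervised_loss(slope):
    """Self-supervised signal derived from the raw data itself: how poorly
    does the synthesis reproduce what we actually observed?"""
    return float(np.mean((synthesize(slope) - y) ** 2))

# Bottom-up proposals derived directly from the data (per-point slope
# estimates, skipping points too close to the origin)...
mask = np.abs(x) > 0.2
proposals = (y[mask] / x[mask])[:50]

# ...scored purely by how well they re-synthesize the observations.
best = min(proposals, key=self_supervised_loss)
```

No label ever enters the loop: the "supervision" is the mismatch between the synthesized and the observed data, and the prior knowledge is the assumed model family.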
And while Gaidon’s team has achieved impressive results that strongly validate self-supervised machine learning, he warns about the potential limitations, especially when applied to the real world.
“The elephant in the room when it comes to self-supervision is that by avoiding costly manual labeling we have also lost something important – our oversight on the data – and we're risking all kinds of nightmarish negative consequences that we are sadly all too familiar with already in computer vision.”
Unlocking the Power of Simulation
For the creators of self-driving cars, cameras are a powerful tool to generate data to feed machine learning algorithms and, ultimately, to control the behavior of their vehicles.
At Tesla, they are harnessing the power of the data supplied by the cameras in every one of their vehicles to inform the ‘world creator’ – a machine learning pipeline designed to recreate the entire world in simulation.
“Autopilot gets a great amount of value from what we call the fleet, which is the millions of Tesla vehicles driving all over the world, each equipped with a full self-driving (FSD) computer,” Smith says.
And while the Tesla fleet produces a huge amount of real-world data, some of the most significant challenges for self-driving capabilities come from very rare or unusual real-world circumstances. These kinds of scenarios are significantly underrepresented in real-world data but self-driving vehicles must be able to deal with them safely.
“Scenarios like pedestrians and dogs running down the highway are very rare in the real world. And they're infrequently encountered even by our fleet of millions of cars, but that doesn't mean it can't happen [and it] doesn't mean that Autopilot doesn't need to be able to handle it,” Smith says. “And in simulation, we can produce tens of thousands of these clips overnight.”
By using simulation, Smith’s team can abstract different potential situations and use their knowledge of the problem to apply labels of unlimited complexity.
“In the case of hundreds of pedestrians in a crosswalk, this might be very tedious for a human and difficult for an auto-labeling network, but in simulation with perfect access to the ground truth, we can trivially produce these labels,” Smith says.
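The point Smith makes is that in simulation, labels fall out of the simulator's own state for free. A hedged sketch of what that might look like — the class names, fields, and `auto_label` helper are hypothetical, not Tesla's pipeline:

```python
from dataclasses import dataclass

@dataclass
class SimPedestrian:
    # Ground-truth state the simulator already knows exactly.
    x: float       # center position (arbitrary units)
    y: float
    width: float
    height: float

def auto_label(scene):
    """Emit bounding-box labels directly from simulator state.

    No human annotation and no auto-labeling network is needed: the
    simulator has perfect access to the ground truth, so even a scene
    with hundreds of pedestrians is labeled trivially and exactly.
    """
    return [
        {"class": "pedestrian",
         "bbox": (p.x - p.width / 2, p.y - p.height / 2, p.width, p.height)}
        for p in scene
    ]

scene = [SimPedestrian(10.0, 5.0, 0.6, 1.8),
         SimPedestrian(12.5, 5.2, 0.5, 1.7)]
labels = auto_label(scene)
```

Because the labels are computed rather than annotated, their cost is constant no matter how crowded or complex the scene becomes.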
However, simulated environments are not without their drawbacks — not least the difficulty of representing emergencies, erratic human behavior, and the limitations of simulated physics.
Quantifying and closing the gap between the unlikely and the impossible in the simulated environment is one of the core objectives of Smith’s team.
“Simulation fails when it's less expressive than the real world, or when it presents content that could never occur in reality,” Smith explains. “And this conceptual bad simulator not only represents a narrow version of reality, but also a misaligned one which may comprise completely unrealistic scene layouts, unrealistic lighting, or maybe pedestrians running at the speeds of cars.”
Despite the technical challenges of working with simulated data, there are many benefits when it comes to training machine learning models.
“From a training perspective, this allows the simulation to act as a multiplier for real-world data. We can take a single interesting real clip and procedurally generate variations upon it,” Smith says.
He concludes: “We can alter the weather, the vehicle colors, the kinematics, and more. This means that one clip might be turned into hundreds of clips automatically, which is obviously very useful for training, but also validation.”
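The "simulation as a multiplier" idea Smith describes can be sketched as a simple procedural generator. This is only an illustrative toy, assuming a clip is a flat dict of parameters; the field names and value ranges are invented, not Tesla's representation.

```python
import random

def generate_variations(base_clip, n, seed=0):
    """Procedurally turn one interesting real clip into n simulated variants."""
    rng = random.Random(seed)  # seeded for reproducible clip sets
    weathers = ["clear", "rain", "fog", "snow"]
    colors = ["red", "blue", "white", "black", "silver"]
    variations = []
    for _ in range(n):
        clip = dict(base_clip)
        clip["weather"] = rng.choice(weathers)
        clip["vehicle_color"] = rng.choice(colors)
        # Perturb the kinematics: scale the lead agent's speed by +/-20%.
        clip["lead_speed_mps"] = base_clip["lead_speed_mps"] * rng.uniform(0.8, 1.2)
        variations.append(clip)
    return variations

# One rare real-world clip becomes a hundred training/validation clips.
base = {"scenario": "dog_on_highway", "weather": "clear",
        "vehicle_color": "white", "lead_speed_mps": 25.0}
clips = generate_variations(base, n=100)
```

Each variant keeps the rare scenario that made the original clip interesting while varying the conditions a model must be robust to.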
We'll be back in San Francisco for Deep Learning Summit 2023 on February 15-16, at the Hotel Kabuki.