OpenAI, the firm best known for its research in artificial intelligence and machine learning, showcased a robotic hand solving a Rubik's Cube today.
The robotic hand, called Dactyl, was trained with a pair of neural networks using a combination of reinforcement learning and a new technique called Automatic Domain Randomization (ADR). The reinforcement learning algorithm behind Dactyl has previously proven its mettle against human opponents in Dota 2.
But the new technique, ADR, was instrumental in teaching the robotic hand to solve the venerable puzzle by generating increasingly difficult training scenarios for Dactyl. Automatic Domain Randomization works as follows:
ADR starts with a single, nonrandomized environment, wherein a neural network learns to solve Rubik’s Cube. As the neural network gets better at the task and reaches a performance threshold, the amount of domain randomization is increased automatically. This makes the task harder, since the neural network must now learn to generalize to more randomized environments. The network keeps learning until it again exceeds the performance threshold, when more randomization kicks in, and the process is repeated.
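To make that feedback loop concrete, below is a minimal, self-contained Python sketch of the expand-on-success cycle the passage describes. It swaps a toy skill score in for an actual policy and physics simulator; the threshold, step size, and all names are illustrative assumptions, not OpenAI's implementation.

```python
import random

# Toy stand-ins for ADR's moving parts: "skill" improves with training,
# while a wider randomization range makes any single episode harder.
# All names and numbers here are illustrative assumptions.
THRESHOLD = 0.8   # success rate that triggers an expansion
STEP = 0.1        # how much the randomization range grows per expansion

def rollout(skill: float, randomization: float) -> bool:
    """Run one simulated episode: sample an environment difficulty from
    the current randomization range and succeed if skill covers it."""
    difficulty = random.uniform(0.0, randomization)
    return skill > difficulty

randomization = 0.0   # start with a single, nonrandomized environment
skill = 0.0
for step in range(5_000):
    skill += 0.001    # stand-in for the policy improving as it trains
    successes = sum(rollout(skill, randomization) for _ in range(100))
    if successes / 100 >= THRESHOLD:
        # The policy has mastered the current distribution, so widen it,
        # forcing it to generalize to more randomized environments.
        randomization += STEP
        print(f"step {step}: randomization widened to {randomization:.1f}")
```

The point of the loop is that the curriculum is generated automatically: no one hand-tunes how hard the simulation should be, because the policy's own performance decides when the environment distribution widens.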
ADR is a key advancement because, according to the team, generating progressively difficult training simulations “frees us from having an accurate model of the real world, and enables the transfer of neural networks learned in simulation to be applied to the real world.” Eventually, after putting the neural networks through their paces in a plethora of simulated situations, Dactyl was able to adroitly adapt to varying physical scenarios, including having its fingers tied together, wearing a rubber glove, and being nudged with a pen.
A point worth mentioning here is that while OpenAI showed off Dactyl's ability to rotate a cube last year, the level of dexterity and manipulation required to solve the Rubik's Cube cannot be discounted. Remarking on the feat, Dmitry Berenson, a specialist in machine manipulation at the University of Michigan, said:
“This is a really hard problem. The kind of manipulation required to rotate the Rubik’s cube’s parts is actually much harder than to rotate a cube.”
Meanwhile, Leslie Kaelbling, a roboticist and professor at MIT, said:
“I was kind of amazed. It’s not a thing I would have imagined that they could have made to work.”
The researchers at OpenAI believe the results provide strong evidence that general-purpose robots capable of adjusting to varying conditions can be built using the same techniques in the future. Marcin Andrychowicz of OpenAI said: “I think this approach [reinforcement learning] is the approach to widespread adoption of robotics.”
At the same time, there are skeptics who are not entirely convinced that reinforcement learning is the way to go for such robots. Indeed, Berenson pointed toward more traditional methods when addressing the topic: “There will be some learning processes—probably reinforcement learning—at the end of the day. But I think that those actually should come much later.”
The skepticism is rooted in an open question about reinforcement learning: whether it can adapt to a multitude of tasks as opposed to just mastering a single one. All in all, it is hard to tell at this point which route the future of robotics will take. Until then, we can savor the fascinating sight of Dactyl solving Rubik's Cubes.
For more information, you can read the blog post or study the team's paper here.