Why is real world object recognition so challenging?
As humans, we are able to understand and perceive the real world instinctively with our senses. When we think about the future of AI, we expect it to be able to recognise objects in much the same way we can: in real time. Understanding these objects involves a variety of factors, such as registering their physical locations as well as the different relationships between them. Once machines can understand objects in this context, we will be able to overcome a multitude of problems we face every day, from assisting blind people with navigation through headphones, to identifying something that we may be unable to put into a search engine. Imagine, for example, being out in the countryside and coming across a plant that you can't identify: your smartphone could, in theory, scan the object and deliver an accurate identification.
This level of understanding, however, goes much deeper than 3D reconstruction and camera pose estimation, and to date it has relied heavily on supervised training, which is time-consuming, expensive, and limited.
OpenAI are currently working to overcome this by using photo-realistic simulations and computer graphics to provide the data needed to build a better understanding of the real world. At the London edition of the Global Deep Learning Summit Series this September 21 & 22 we will hear from Ankur Handa, research scientist at OpenAI, who will discuss understanding the real world with synthetic data in further detail. I spoke to Ankur ahead of his talk to find out more about his work.
Can you give me an overview of your work at OpenAI?
I work with the robotics team, looking mainly into making robots learn visuomotor skills to perform a variety of tasks, ranging from picking up a block to learning to assemble things. We, as humans, perform such tasks unconsciously on a daily basis, but ironically the seemingly trivial skills required to perform them are very hard to reverse engineer and for robots to acquire.
What started your work in AI and deep learning?
It was during my undergrad at IIIT-Hyderabad that I took on a project to build a robot that could navigate using vision, and I was lucky to be given the opportunity to build the robot from scratch, hardware and all. I then obtained a PhD in robot vision at Imperial College and later did my post-doctoral research in machine learning, particularly deep learning, at Cambridge University. I guess it all happened quite organically from my undergrad; that's where it all started. I did SLAM (simultaneous localisation and mapping) for robot vision in my PhD and worked on deep learning during my post-doc.
What challenges are you currently facing in your work, specifically with object recognition?
Robotics requires lots of different things to come together: hardware is very expensive and there isn't much labelled data. Moreover, the test environments are interactive, which means that if you haven't seen something in your training data it can quickly lead to catastrophic failures. Robots that need to work in household environments will have humans around, so the design of algorithms must factor in such considerations carefully to avoid any destructive effects.
Learning from human demonstrations is what we are looking at: we collect datasets where a human provides some sample demonstrations of how a task should be executed. The number of demonstrations is limited, so the challenge is to learn from limited data. Simulations, on the other hand, can provide virtually infinite data and variation, but transferring what is learned to the real world poses an additional challenge. Whilst we have seen huge progress in computer vision with learning-based approaches in the past few years, robotics hasn't seen that trickle down as much; e.g. tasks that require some dexterity are just not possible yet with a purely data-driven, learning-based approach. So I think a 'horses for courses' approach still works better, i.e. a combination of learning and traditional robotics methods.
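To make the learning-from-demonstrations setup Ankur describes a little more concrete, here is a minimal behavioural-cloning sketch in Python using PyTorch: a small policy network is fit to (state, action) pairs recorded from a handful of demonstrations. The network, data shapes, and hyperparameters are illustrative assumptions for this sketch, not OpenAI's actual pipeline.

```python
# Minimal behavioural-cloning sketch: fit a policy to (state, action) pairs
# recorded from a few human demonstrations. All shapes and hyperparameters
# below are illustrative assumptions, not a description of OpenAI's setup.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

STATE_DIM, ACTION_DIM = 32, 7  # e.g. proprioceptive state in, joint commands out

# A small demonstration dataset: limited data is the core difficulty.
demo_states = torch.randn(500, STATE_DIM)    # stand-in for recorded robot states
demo_actions = torch.randn(500, ACTION_DIM)  # stand-in for the demonstrator's actions
loader = DataLoader(TensorDataset(demo_states, demo_actions),
                    batch_size=64, shuffle=True)

# Simple MLP policy mapping states to actions.
policy = nn.Sequential(
    nn.Linear(STATE_DIM, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, ACTION_DIM),
)
optimiser = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(50):
    for states, actions in loader:
        loss = nn.functional.mse_loss(policy(states), actions)  # imitate the demonstrator
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
```

In practice the states would come from real sensors or a simulator; simulation makes it easy to generate far more variation than the few hundred real samples assumed above, which is exactly where the sim-to-real transfer challenge comes in.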
How do you see AI progressing in the next 5 years, and what other industries do you see it positively disrupting?
I am 100% right only 50% of the time when it comes to making predictions, so take them with a pinch of salt. For me, anything beyond two years is always hard to predict; e.g. I didn't think we'd see super-human performance on ImageNet within only three years of the first paper on CNNs that came out of Hinton's group at Toronto. I am particularly excited by the progress in robotics, so I do hope we make significant strides forward there, particularly in the generalisation of skills across different tasks and in dexterity. Overall, I hope healthcare, the public sector and government are affected positively, as these are the places where important decisions are most often made, and it would be ideal to have technology that can provide more insight for making an informed decision than a human can alone. It is one thing to write a good paper that remains within the confines of academic circles, and another when the technology transfers to a wider audience; that takes time, but I'm cautiously optimistic.
Hear more from OpenAI at the Deep Learning Summit in London, September 21 & 22.
Other confirmed speakers include Irina Higgins & Jörg Bornschein, Research Scientists, DeepMind; Shubho Sengupta, AI Researcher, Facebook AI Research (FAIR); Ed Newton-Rex, Founder & CEO, Jukedeck; Christopher Bonnett, Senior Machine Learning Researcher, alpha-i; and Andrew Tulloch, Research Engineer, Facebook.