With virtual assistants and robots becoming more intelligent every day, it’s easy to forget what sets us apart: emotions, reason, our ability to make mistakes, love and care.

Whilst robots can display these characteristics, they are learning rather than feeling. Industrial robots build our cars and laptops, rehabilitation robots help people walk again, machine teaching assistants answer questions, and machines compose music. But they’re still not human.

In many cases, humans find their interactions with machines more enjoyable the more human they appear. For example, if you are conversing with a chatbot, its ability to process your language and colloquialisms and reply accordingly will make the conversation feel more personal and satisfying. Natural language processing (NLP) technologies have improved these bots drastically, and some virtual assistants can now converse at a near-human level. NLP has had a huge impact on home assistants such as Alexa and Cortana: want to know the weather? All you need to do is ask, in whatever sentence structure you see fit.

But what about other forms of interaction?

Imagine if these assistants could see. How much time would it save you in those manic moments when you need to get out the door if you could shout ‘Alexa, where are my keys?’ and have it direct you, or even better, guide you to the misplaced object that’s actually sitting in plain sight.

In order to do this, we need robots to be able to see in the same way that humans can. It’s all very well getting a computer to identify objects in an image, but if we want machines to recognise emotions, interpret specific scenarios, and distinguish between similar objects in a frame, they will need a further layer of image processing.
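To make that baseline concrete, here is a minimal sketch of plain object recognition with a pretrained classifier, assuming PyTorch and torchvision are installed and using a hypothetical image file `kitchen.jpg`. It assigns a single label to the whole image; recognising emotions or reasoning about a scene needs richer models layered on top of this.

```python
# A minimal sketch of the baseline step described above: plain object
# recognition with a pretrained ImageNet classifier. 'kitchen.jpg' is a
# hypothetical input image.
import torch
from torchvision import models
from PIL import Image

# Load a pretrained ResNet-18 and its matching preprocessing pipeline.
weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights).eval()
preprocess = weights.transforms()

# Preprocess the image and add a batch dimension.
image = preprocess(Image.open("kitchen.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    probs = model(image).softmax(dim=1)

top_prob, top_class = probs.max(dim=1)
print(weights.meta["categories"][top_class.item()], float(top_prob))
```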

At the Deep Learning Summit in Montreal, Sanja Fidler, Assistant Professor at the University of Toronto, will explore how perceptual machines can see, converse and reason. She will cover computer vision and language and discuss her research in human-centric computation. In order to teach a machine to learn and infer from complex images and videos, we first need to understand how humans do this.

‘How can we understand long and complex data sources, say like movies, and make robots understand it as we do?’


Sanja also spoke about her research at the Deep Learning Summit in Boston earlier this year, where she explained how she began her work in deep learning back in 2012: with Geoffrey Hinton pushing the boundaries of the field, she was pulled into imaging and vision.

Geoffrey will also be presenting at the Summit in Montreal this October, where he will be joined by Yann LeCun and Yoshua Bengio. Together, the trio make up the Panel of Pioneers.


Sanja will present her most recent findings on this topic at the Deep Learning Summit in Montreal this October, and you can register now for Early Bird discounted passes.

Previously, for a robot to analyse and understand the real world, several separate systems were required to handle vision and language independently; now, by treating the problem as one large composite function, a joint framework can do the same work much more efficiently.
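As a rough illustration of what such a joint framework can look like, here is a minimal sketch, assuming PyTorch, that fuses an image encoder and a question encoder into a single model which scores candidate answers. The architecture and all dimensions are illustrative assumptions, not Sanja’s actual model.

```python
# A minimal sketch of a joint vision-and-language model: one composite
# function that takes an image and a tokenised question and returns answer
# scores, instead of two independent pipelines.
import torch
import torch.nn as nn

class JointVQAModel(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=128, hidden_dim=256, num_answers=100):
        super().__init__()
        # Image branch: a toy CNN mapping a 3x64x64 image to a feature vector.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, hidden_dim),
        )
        # Language branch: embed the question tokens and run an LSTM.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Joint head: fuse both modalities and score candidate answers.
        self.classifier = nn.Linear(hidden_dim, num_answers)

    def forward(self, image, question_tokens):
        img_feat = self.image_encoder(image)                    # (B, hidden_dim)
        _, (h_n, _) = self.lstm(self.embedding(question_tokens))
        txt_feat = h_n[-1]                                      # (B, hidden_dim)
        fused = img_feat * txt_feat                             # element-wise fusion
        return self.classifier(fused)                           # answer logits

# Usage: one fake image and one tokenised question.
model = JointVQAModel()
logits = model(torch.randn(1, 3, 64, 64), torch.randint(0, 1000, (1, 8)))
print(logits.shape)  # torch.Size([1, 100])
```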

After her presentation in Boston, RE•WORK caught up with Sanja to hear about her current work. You can watch our interview here.

Sanja explained how she is pushing to teach robots to ‘make inferences and basically be able to understand any questions we might have just as humans do’. We want cameras to see like humans do, and then process the information accordingly.

Can't make it to Montreal?
We have a full calendar of events scheduled for 2018, including the Deep Learning Summit in San Francisco, and we are currently running a Summer Special Promotion offering 25% off all Summits in 2018 (excluding dinners).
View our full calendar of events, and register now to join global AI leaders.