Preview Mode Links will not work in preview mode

Machine learning and artificial intelligence are dramatically changing the way businesses operate and people live. The TWIML AI Podcast brings the top minds and ideas from the world of ML and AI to a broad and influential community of ML/AI researchers, data scientists, engineers and tech-savvy business and IT leaders.

Hosted by Sam Charrington, a sought after industry analyst, speaker, commentator and thought leader.

Mar 18, 2020

Today we’re joined by Stefan Lee, assistant professor at the school of electrical engineering and computer science at Oregon State University. Stefan, who we sat down with at NeurIPS this past winter, is focused on the development of agents that can perceive their environment and communicate their understanding with humans in order to coordinate their actions to achieve mutual goals. 

In our conversation, we focus on his paper ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, a model for learning joint representations of image content and natural language. We talk through the development and training process for this model, the adaptation of the training process to incorporate additional visual information to BERT models, where this research leads from the perspective of integration between visual and language tasks and finally, we discuss the importance of visual grounding.

Check out the complete show notes page at