Limited Correspondence in Visual Representation between the Human Brain and Convolutional Neural Networks

21 October 2021 

Yaoda Xu, Psychology Department, Yale University

Convolutional neural networks (CNNs) have recently been used to model human vision due to their high object-classification performance and their general correspondence to human visual processing regions. Here I compared the responses of 14 different CNNs with human fMRI responses to natural and artificial object images. By examining the representational structures, the coding of different types of visual features, and the formation of transformation-tolerant visual object representations in these two types of visual processing systems, I found some similarities but also substantial differences between the two. These results indicate that there likely exist fundamental differences in how the human brain and CNNs represent visual objects.

Short Bio: Yaoda Xu received her Ph.D. from MIT and has held postdoctoral and faculty positions at Harvard and Yale. She is presently a senior research scientist at Yale. Using fMRI measurements of the human brain, she has documented the involvement of the human posterior parietal cortex in supporting adaptive and online visual information processing critical to attention and visual working memory. She is also interested in the nature of visual representation in the human occipito-temporal cortex. Her most recent work evaluates CNN modeling as a viable scientific method for understanding human vision and asks whether there are fundamental differences in visual processing between the brain and CNNs that would limit the use of CNN modeling as a shortcut to understanding human vision.