About Fundamentals of Vision

Michael Herzog, Professor of Psychophysics at the Brain Mind Institute (BMI) at EPFL in Lausanne, Switzerland

Thursday 24 June 2021 | 13:00 BST


Vision is usually explained by feedforward models, in which visual features are analyzed hierarchically, starting with simple but fine-grained feature analysis (V1). Higher visual areas pool information from lower ones to detect increasingly complex features, losing information in the process. Results of psychophysical studies are typically explained within this framework, and convolutional networks are thought to be good models of this processing. For example, in crowding, human perception of a target is hindered by nearby elements because, it is proposed, the responses of neurons coding for nearby elements are pooled. A clear-cut prediction is that adding further elements can only impair performance more. However, as I will show, this prediction is strongly and systematically contradicted by many psychophysical experiments, and similar results are found in paradigms other than crowding. It seems we need to rethink the fundamentals of the standard models of vision. I will present EEG and fMRI studies that further support this conclusion.