Archive Talks

Irfan Essa

Georgia Institute of Technology

September 10, 2015

Title: Data-Driven Methods for Video Analysis and Enhancement

Abstract:

In this talk, I will start by describing the pervasiveness of image and video content, and how such content is growing with the ubiquity of cameras. I will use this to motivate the need for better tools for the analysis and enhancement of video content. I will then cover some of our earlier work on temporal modeling of video, lead up to our current work, and describe two main projects: (1) our approach to video stabilization, currently implemented and running on YouTube, and its extensions; and (2) a robust and scalable method for video segmentation.

I will describe, in some detail, our video stabilization method, which generates stabilized videos and is in wide use. Our method allows for video stabilization beyond the conventional filtering that only suppresses high-frequency jitter. It also supports the removal of rolling shutter distortions, which are common in modern CMOS cameras that capture the frame one scan line at a time, resulting in non-rigid image distortions such as shear and wobble. Our method does not rely on a priori knowledge and works on video from any camera or on legacy footage. I will showcase examples of this approach and also discuss how this method was launched and is running on YouTube, with millions of users.

Then I will describe an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. This hierarchical approach generates high-quality segmentations, and we demonstrate the use of this segmentation as users interact with the video, enabling efficient annotation of objects within the video. I will also show some recent work on how this segmentation and annotation can be used for dynamic scene understanding.

I will then follow up with some recent work on image and video analysis in the mobile domain. I will also make some observations about the ubiquity of imaging and video in general and the need for better tools for video analysis.

http://prof.irfanessa.com/

Sergi Rocamora

INRIA, Grenoble

September 8, 2015

Title: Bayesian Image-Based Rendering and Application to Stereoscopic Cinema and 3DTV

Abstract:

Optics with long focal lengths have been used extensively for shooting 2D cinema and television, either to get virtually closer to the scene or to produce an aesthetic effect through the deformation of perspective. However, in 3D cinema or television, the use of long focal lengths either creates a "cardboard effect" or causes visual divergence. To overcome this problem, state-of-the-art methods use disparity mapping techniques, a generalization of view interpolation, to generate new stereoscopic pairs from the two image sequences. We propose to use more than two cameras to resolve the remaining issues in disparity mapping methods.

In the first part of the talk, we briefly review the causes of visual fatigue and visual discomfort when viewing a stereoscopic film. We model the depth perception from stereopsis of a 3D scene shot with two cameras and projected in a movie theater or on a 3DTV. We mathematically characterize the resulting 3D distortion and derive the mathematical constraints associated with the causes of visual fatigue and discomfort. We illustrate these 3D distortions with a new interactive software tool, "The Virtual Projection Room".

In order to generate the desired stereoscopic images, we propose to use image-based rendering. These techniques usually proceed in two stages: first, the input images are warped into the target view, and then the warped images are blended together. The warps are usually computed with the help of a geometric proxy (either implicit or explicit). Image blending has been extensively addressed in the literature, and a few heuristics have proven to achieve very good performance. Yet combining these heuristics is not straightforward and requires manual adjustment of many parameters. We present a new Bayesian approach to the problem of novel view synthesis, based on a generative model that takes into account the uncertainty of the image warps in the image formation model. The Bayesian formalism allows us to deduce the energy of the generative model and to compute the desired images as the maximum a posteriori estimate. The method outperforms state-of-the-art image-based rendering techniques on challenging datasets. Moreover, the energy equations provide a formalization of the heuristics widely used in image-based rendering techniques. The proposed generative model also addresses the problem of super-resolution, allowing images to be rendered at a higher resolution than the input images.

In the last part of the presentation, we apply the new rendering technique to the case of the stereoscopic zoom.

http://sergipujades.free.fr

Darren Cosker

University of Bath

September 2, 2015

Title: Applying Computer Vision and Graphics Research in Visual Effects and Entertainment

http://www.cs.bath.ac.uk/~dpc/

Bojan Pepik

Max Planck Institute for Informatics

September 1, 2015

Title: Towards Richer Object Representations for Object Class Detection in Real World Images

https://www.mpi-inf.mpg.de/departments/computer-vision-and-multimodal-computing/people/bojan-pepik/

Luca del Pero

University of Edinburgh

August 26, 2015

Title: Articulated motion discovery using pairs of trajectories

http://vision.sista.arizona.edu/delpero/

Garrett Stanley

Department of Biomedical Engineering Georgia Institute of Technology and Emory University

July 10, 2015

Title: Reading and Writing the Neural Code: Challenges in Neuroengineering

http://stanley.gatech.edu/

Trevor Darrell

UC Berkeley

July 3, 2015

Title: Perceptual representation learning across diverse modalities and domains

http://www.eecs.berkeley.edu/~trevor/

Hans-Peter Seidel

MPI-Informatik Saarbruecken

May 18, 2015

Title: 3D Image Analysis and Synthesis -- The World inside the Computer

http://people.mpi-inf.mpg.de/~hpseidel/

Andrea Vedaldi

Oxford University

May 4, 2015

Title: Learning and understanding visual representations

http://www.robots.ox.ac.uk/~vedaldi/

Cristian Sminchisescu

Lund University

March 24, 2015

Title: From Perceptual Evidence to Large-Scale Visual Recognition Models

Abstract:

Recent progress in computer-based visual recognition relies heavily on machine learning methods trained using large-scale annotated datasets. While such data has made advances in model design and evaluation possible, it does not necessarily provide insights or constraints into those intermediate levels of computation, or deep structure, perceived as ultimately necessary in order to design reliable computer vision systems. This is noticeable in the accuracy of state-of-the-art systems trained with such annotations, which still lag behind human performance in similar tasks. Nor does the existing data make it immediately possible to exploit insights from a working system - the human eye - to derive potentially better features, models or algorithms. In this talk I will present a mix of perceptual and computational insights resulting from the analysis of large-scale human eye movement and 3D body motion capture datasets, collected in the context of visual recognition tasks (Human3.6M available at http://vision.imar.ro/human3.6m/, and Actions in the Eye available at http://vision.imar.ro/eyetracking/). I will show that attention models (fixation detectors, scan-path estimators, weakly supervised object detector response functions and search strategies) can be learned from human eye movement data, and can produce state-of-the-art results when used in end-to-end automatic visual recognition systems. I will also describe recent work in large-scale human pose estimation, showing the feasibility of pixel-level body part labeling from RGB, and promising 2D and 3D human pose estimation results in monocular images. In this context, I will discuss recent, perhaps surprising, quantitative perceptual experiments revealing that humans may not be significantly better than computers at perceiving 3D articulated poses from monocular images. Such findings may challenge established definitions of computer vision 'tasks' and their expected levels of performance.

http://www.maths.lth.se/people/math-csu/