Paper discussion blog for COMP 150DL at Tufts University
Friday, April 21, 2017
Tues April 25: Learning to Generate Chairs
Learning to Generate Chairs, Tables and Cars with Convolutional Networks. Alexey Dosovitskiy, Jost Tobias Springenberg, Maxim Tatarchenko, Thomas Brox. CVPR 2015.
Wednesday, April 19, 2017
Thurs April 20: Image-to-Image Translation
Image-to-Image Translation with Conditional Adversarial Networks Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros
Sunday, April 16, 2017
Tues. April 18: Hierarchical Question-Image Co-Attention for Visual Question Answering
Hierarchical Question-Image Co-Attention for Visual Question Answering Jiasen Lu, Jianwei Yang, Dhruv Batra , Devi Parikh, NIPS 2016.
Wednesday, April 12, 2017
Thurs. April 13: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Alec Radford, Luke Metz, Soumith Chintala. 2015.
Sunday, April 9, 2017
Tues. April 11: Using very deep autoencoders for content-based image retrieval.
Using very deep autoencoders for content-based image retrieval. Krizhevsky, Alex, and Geoffrey E. Hinton. ESANN. 2011.
Saturday, March 25, 2017
Tues. March 28: Understanding Deep Image Representations by Inverting Them
Understanding Deep Image Representations by Inverting Them. Aravindh Mahendran, Andrea Vedaldi. CVPR 2015.
Wednesday, March 15, 2017
Thurs, March 16:Deep Visual-Semantic Alignments for Generating Image Descriptions
Deep Visual-Semantic Alignments for Generating Image Descriptions. Andrej Karpathy and Li Fei-Fei. CVPR 2015.
Thursday, March 9, 2017
Tues, March 14: CNNs for Semantic Segmentation
Fully Convolutional Networks for Semantic Segmentation. Jonathan Long, Evan Shelhamer, Trevor Darrell. CVPR 2015.
Wednesday, March 8, 2017
Thurs, March 9: Single Shot Multi-box Detector
SSD: Single shot multibox detector. Liu, Wei, et al. ECCV 2016.
Sunday, March 5, 2017
Tues, March 9: You Only Look Once
You Only Look Once: Unified, Real-Time Object Detection Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi. arXiv 2016.
Sunday, February 19, 2017
Tues Feb 21: AlexNet
ImageNet Classification with Deep Convolutional Neural Networks. Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton. NIPS 2012.
Tuesday, February 7, 2017
Tues, Feb 14: Batch Normalization
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift Sergey Ioffe, Christian Szegedy, arXiv, 2015.
Please comment with your summaries and questions.
Please comment with your summaries and questions.
Monday, January 9, 2017
Example discussion -- Rich Intrinsic Image Decomposition of Outdoor Scenes from Multiple Views
I've left this discussion of a paper from Spring 2013 as an example. This paper is not on the syllabus for this year. Notice that both example responses contain a Summary of the paper and several Discussion questions that the student would like to ask in class.Rich Intrinsic Image Decomposition of Outdoor Scenes from Multiple Views. Pierre-Yves Laffont, Adrien Bousseau, George Drettakis. TVCG 2013.
Project page.
Discussion:
How consistent do the images used in these multi-image techniques have to be for them to work correctly? Do the objects need to be static? Does the photographer need to be more or less revolving around some scene center where the camera is pointed? Or is it robust enough to handle more general movements? Can the reverse be used where the camera remains in a fixed position but rotates, creating a panorama that is stitched together (and decomposing it into intrinsic images)?
Discussion:
In section 6 it is mentioned that the sun visibility estimation algorithm assumes that the scene is composed of a sparse set of reflectances shared by multiple points. Does this mean that the algorithm might not be ideal for scenes that have a more diverse set of reflectances (though I guess only indoor scenes tend to have a richer set of reflectances, which is not the focus of this paper anyway)? Also, is there a parameter that can be tuned to adjust the desired degree of sparsity?
Project page.
Example Student Summary #1
This paper presents a new method for decomposing outdoor scenes into intrinsic images. This paper differs from previous papers by further decomposing the illumination component into components for illumination from the sun, the sky, and other scene objects (indirect illumination). Using multiple images of the scene, a sparse 3D point cloud is constructed. The reflectance and sun illumination for each point is learned using mean shift iterations which optimize the energies of regions of influence over candidate reflectance curves of the points. The sky and indirect illumination components are estimated by sending rays out from the 3D points and seeing which rays hit the sky and other objects (and contributing radiance to those illumination components.) The algorithm is effective for estimating the intrinsic images of rich outdoor scenes and allowing for their manipulation while keeping the scene consistent. It is limited by the need for a reasonably accurate estimation of the direction to the sun and its need for a reflective sphere (the sky) to capture an environment map.Discussion:
How consistent do the images used in these multi-image techniques have to be for them to work correctly? Do the objects need to be static? Does the photographer need to be more or less revolving around some scene center where the camera is pointed? Or is it robust enough to handle more general movements? Can the reverse be used where the camera remains in a fixed position but rotates, creating a panorama that is stitched together (and decomposing it into intrinsic images)?
Example Student Summary #2
This paper presents a pipeline for decomposing intrinsic images using multiple photos of the same scene captured at the same time, and the pipeline is able to decompose the illumination layer of the image further into sun, sky and indirect lighting layers. As inputs to the pipeline, the user needs to capture and provide a set of LDR photos from different viewpoints, two HDR images of the front and side of the reflective sphere, and HDR images of the viewpoints that need to be decomposed. The pipeline starts by generating a point cloud reconstruction of the scene, a approximate geometric proxy o the scene, the direction and radiance of the sun, and a HDR environment map containing the sky and distant indirect radiance. The geometric proxy is then used to compute sky illumination, indirect illumination, and approximate sun visibility for each point, but a more refined estimation of sun visibility needs to be computed by forming curves of candidate reflectances in the color space and finding their intersections, and this estimation algorithm is a key contribution of this work. After having illuminations for the initial set of points, the illuminations are propagated to the entire scene using a method similar to Bousseau et al.'s method for propagating user specified constraints, and finally the three illumination layers are separated using two successive matting procedures.Discussion:
In section 6 it is mentioned that the sun visibility estimation algorithm assumes that the scene is composed of a sparse set of reflectances shared by multiple points. Does this mean that the algorithm might not be ideal for scenes that have a more diverse set of reflectances (though I guess only indoor scenes tend to have a richer set of reflectances, which is not the focus of this paper anyway)? Also, is there a parameter that can be tuned to adjust the desired degree of sparsity?
Subscribe to:
Posts (Atom)