Alireza Fathi
I am a research scientist / TLM at Google DeepMind. Before joining Google, I spent a couple of great years at Apple working on 3D computer vision. Before that, I was a Postdoctoral Fellow in Fei-Fei Li's lab at Stanford. I received my Ph.D. from the Georgia Institute of Technology and my B.Sc. from Sharif University of Technology.
My areas of interest:
Multi-Modal Large Language Models
Generative Models
Neural Rendering
Egocentric Vision
3D Scene Understanding
Serving as an area chair for ICCV 2025, CVPR 2025, NeurIPS 2024, ECCV 2024, 3DV 2024, CVPR 2024, CVPR 2023, ECCV 2022, CVPR 2022.
Visual Lexicon: Rich Image Features in Language Space
XuDong Wang, Xingyi Zhou, Alireza Fathi, Trevor Darrell, Cordelia Schmid
Language-Guided Image Tokenization for Generation
Kaiwen Zha, Lijun Yu, Alireza Fathi, David Ross, Cordelia Schmid, Dina Katabi, Xiuye Gu
SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code
Ziniu Hu, Ahmet Iscen, Aashi Jain, Thomas Kipf, Yisong Yue, David Ross, Cordelia Schmid, Alireza Fathi
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach
Mathilde Caron, Alireza Fathi, Cordelia Schmid, Ahmet Iscen
A Generative Approach for Wikipedia-Scale Visual Entity Recognition
Mathilde Caron, Ahmet Iscen, Alireza Fathi, Cordelia Schmid
Retrieval-Enhanced Contrastive Vision-Text Models
Ahmet Iscen, Mathilde Caron, Alireza Fathi, Cordelia Schmid
AVIS: Autonomous Visual Information Seeking with Large Language Models
Ziniu Hu, Ahmet Iscen, Chen Sun, Kai-Wei Chang, Yizhou Sun, David Ross, Cordelia Schmid, Alireza Fathi
Ziniu Hu, Ahmet Iscen, Chen Sun, Zirui Wang, Kai-Wei Chang, Yizhou Sun, Cordelia Schmid, David Ross, Alireza Fathi
Improving Image Recognition by Retrieving from Web-Scale Image-Text Data
Ahmet Iscen, Alireza Fathi, Cordelia Schmid
A Memory Transformer Network for Incremental Learning
Ahmet Iscen, Tom Bird, Mathilde Caron, Alireza Fathi, Cordelia Schmid
im2nerf: Image to Neural Radiance Field in the Wild
Lu Mi, Abhijit Kundu, David Ross, Frank Dellaert, Noah Snavely, Alireza Fathi
Pre-Tram: Self-supervised Pre-training via Connecting Trajectory and Map
Chenfeng Xu, Tian Li, Chen Tang, Lingfeng Sun, Kurt Keutzer, Masayoshi Tomizuka, Alireza Fathi, Wei Zhan
Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation
Abhijit Kundu, Kyle Genova, Xiaoqi Yin, Alireza Fathi, Caroline Pantofaru, Leonidas Guibas, Andrea Tagliasacchi, Frank Dellaert, Thomas Funkhouser
Object-Centric Neural Scene Rendering
Michelle Guo, Alireza Fathi, Jiajun Wu, Thomas Funkhouser
An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds
Rui Huang, Wanyue Zhang, Thomas Funkhouser, Abhijit Kundu, Caroline Pantofaru, David A Ross, Alireza Fathi
ECCV, 2020 PDF
Virtual Multi-view Fusion for 3D Semantic Segmentation
Abhijit Kundu, Xiaoqi Yin, Alireza Fathi, David A Ross, Brian E Brewington, Thomas Funkhouser, Caroline Pantofaru
ECCV, 2020 PDF
Pillar-based Object Detection for Autonomous Driving
Yue Wang, Alireza Fathi, Abhijit Kundu, David Ross, Caroline Pantofaru, Tom Funkhouser, Justin Solomon
ECCV, 2020 PDF
DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes
Mahyar Najibi, Guangda Lai, Abhijit Kundu, Zhichao Lu, Vivek Rathod, Tom Funkhouser, Caroline Pantofaru, David Ross, Larry S. Davis, Alireza Fathi
CVPR, 2020 PDF
3D-MPA: Multi Proposal Aggregation for 3D Semantic Instance Segmentation
Francis Engelmann, Martin Bokeloh, Alireza Fathi, Bastian Leibe, Matthias Nießner
CVPR, 2020 PDF
Floors are Flat: Leveraging Semantics for Real-Time Surface Normal Prediction
Steven Hickson, Karthik Raveendran, Alireza Fathi, Kevin Murphy, Irfan Essa
arXiv:1906.06792, 2019 PDF
Tracking emerges by colorizing video
Carl Vondrick, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, Kevin Murphy
Instance embedding transfer to unsupervised video object segmentation
Siyang Li, Bryan Seybold, Alexey Vorobyov, Alireza Fathi, Qin Huang, C.-C. Jay Kuo
Zbigniew Wojna, Vittorio Ferrari, Sergio Guadarrama, Nathan Silberman, Liang-Chieh Chen, Alireza Fathi, Jasper Uijlings
Semantic instance segmentation via deep metric learning
Alireza Fathi, Zbigniew Wojna, Vivek Rathod, Peng Wang, Hyun Oh Song, Sergio Guadarrama, Kevin Murphy
arXiv:1703.10277, 2017 PDF
Speed/accuracy trade-offs for modern convolutional object detectors
Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Kevin Murphy
CVPR 2017. Winner of the COCO Object Detection Challenge in 2016. PDF
Reasoning about Object Affordances in a Knowledge Base Representation
Yuke Zhu, Alireza Fathi, Li Fei-Fei
VideoSET: Video Summary Evaluation through Text
Serena Yeung, Alireza Fathi, Li Fei-Fei
arXiv:1406.5824 [cs.CV] PDF Project Page
Learning Descriptive Models of Objects and Activities from Egocentric Video
Alireza Fathi
Ph.D. Thesis, Georgia Institute of Technology PDF
Learning to Recognize Daily Actions using Gaze
Alireza Fathi, Yin Li, James M. Rehg
ECCV 2012 PDF, Project Page
Detecting Eye Contact using Wearable Eye-Tracking Glasses
Zhefan Ye, Yin Li, Alireza Fathi, Yi Han, Agata Rozga, Gregory D. Abowd, James M. Rehg
2nd Workshop on Pervasive Eye Tracking and Mobile Eye-based Interaction (in conjunction with UbiComp), 2012 PDF
Social Interactions: A First-Person Perspective
Alireza Fathi, Jessica K. Hodgins, James M. Rehg
Combining Self Training and Active Learning for Video Segmentation
Alireza Fathi, Maria Florina Balcan, Xiaofeng Ren, James M. Rehg
Learning to Recognize Objects in Egocentric Activities
Alireza Fathi, Xiaofeng Ren, James M. Rehg
Voice Synthesis using the Generalized Pressure-Controlled Valve
Tamara Smyth, Alireza Fathi
International Computer Music Conference (ICMC), 2008 PDF
A Standard Workflow for Illumination-Invariant Image Extraction
Mark S. Drew, Muntaseer Salahuddin, Alireza Fathi
15th Color and Imaging Conference, 2007 PDF