Bio
Education
Experience
Publications
Contact
Publications
Type
Date
2025
2024
2023
2022
SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device
Abstract We have witnessed the unprecedented success of diffusion-based video generation over the past year. Recently proposed models from the community have wielded the power to generate cinematic and high-resolution videos with smooth motions from arbitrary input prompts.
Yushu Wu
,
Zhixing Zhang
,
Yanyu Li
,
Yanwu Xu
,
Anil Kag
,
Yang Sui
,
Huseyin Coskun
,
Ke Ma
,
Aleksei Lebedev
,
Ju Hu
,
Dimitris Metaxas
,
Yanzhi Wang
,
Sergey Tulyakov
,
Jian Ren
SF-V: Single Forward Video Generation Model
Abstract Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational costs.
Zhixing Zhang
,
Yanyu Li
,
Yushu Wu
,
Yanwu Xu
,
Anil Kag
,
Ivan Skorokhodov
,
Willi Menapace
,
Aliaksandr Siarohin
,
Junli Cao
,
Dimitris Metaxas
,
Sergey Tulyakov
,
Jian Ren
AVID: Any-Length Video Inpainting with Diffusion Model
Abstract Recent advances in diffusion models have successfully enabled text-guided image inpainting. While it seems straightforward to extend such editing capability into video domain, there has been fewer works regarding text-guided video inpainting.
Zhixing Zhang
,
Bichen Wu
,
Xiaoyan Wang
,
Yaqiao Luo
,
Luxin Zhang
,
Yinan Zhao
,
Peter Vajda
,
Dimitris Metaxas
,
Licheng Yu
OmniLabel: A Challenging Benchmark for Language-Based Object Detection
Abstract Language-based object detection is a promising direction towards building a natural interface to describe objects in images that goes far beyond plain category names. While recent methods show great progress in that direction, proper evaluation is lacking.
Samuel Schulter
,
Vijay Kumar.B.G
,
Yumin Suh
,
Konstantinos M. Dafnis
,
Zhixing Zhang
,
Shiyu Zhao
,
Dimitris Metaxas
SINE: Single Image Editing with Text-to-Image Diffusion Models
Abstract Recent works on diffusion models have demonstrated a strong capability for conditioning image generation, e.g., text-guided image synthesis. Such success inspires many efforts trying to use large-scale pre-trained diffusion models for tackling a challenging problem–real image editing.
Zhixing Zhang
,
Ligong Han
,
Arnab Ghosh
,
Dimitris Metaxas
,
Jian Ren
Exploiting Unlabeled Data with Vision and Language Models for Object Detection
Abstract Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets. However, it is prohibitively costly to acquire annotations for thousands of categories at a large scale.
Shiyu Zhao
,
Zhixing Zhang
,
Samuel Schulter
,
Long Zhao
,
Vijay Kumar.B.G
,
Anastasis Stathopoulos
,
Manmohan Chandraker
,
Dimitris Metaxas
Global Matching with Overlapping Attention for Optical Flow Estimation
Abstract Optical flow estimation is a fundamental task in computer vision. Recent direct-regression methods using deep neural networks achieve remarkable performance improvement. However, they do not explicitly capture long-term motion correspondences and thus cannot handle large motions effectively.
Shiyu Zhao
,
Long Zhao
,
Zhixing Zhang
,
Enyu Zhou
,
Dimitris Metaxas
Cite
×