Ekta Prashnani

Senior Research Scientist · NVIDIA Research

I work on algorithms for verifying the trustworthiness of AI-generated content, as well as on multimodal machine learning and computational genomics.

Recent highlights include avatar fingerprinting for the authorized use of synthetic avatars, phase-based deepfake detection, and collaborative work on synthetic-video detection and computational genomics. During my Ph.D. at UC Santa Barbara, I worked on data-driven methods for evaluating the perceptual quality and authenticity of visual media.

Updates

  • Two papers at NeurIPS 2025 with NVIDIA colleagues and academic collaborators: Seeing What Matters (generalizable AI-video detection) and Unmasking Puppeteers (defending against puppeteering attacks in AI videoconferencing).
  • Preprint “Fluctuation structure predicts genome-wide perturbation outcomes” (CIPHER) on Research Square, with the Goyal Lab at Northwestern University (code).
  • Avatar Fingerprinting accepted at ECCV 2024 (video). Also check out the NVFAIR benchmark dataset here.
  • PhaseForensics (a phase-based, generalizable deepfake detector) accepted to IEEE Transactions on Image Processing.
  • Avatar Fingerprinting, the pioneering work on verifying authorized use of synthetic talking-head videos, is now available on arXiv.
  • Joined NVIDIA Research (Human Performance and Experience) as a Research Scientist.
  • Graduated with my Ph.D. in Electrical & Computer Engineering at UC Santa Barbara, where my research focus was on computer vision for media quality and authenticity. Dissertation: Data-driven Methods for Evaluating the Perceptual Quality and Authenticity of Visual Media.

Publications

Seeing What Matters: Generalizable AI-generated Video Detection with Forensic-Oriented Augmentation

Advances in Neural Information Processing Systems (NeurIPS), 2025

Riccardo Corvi, Davide Cozzolino, Ekta Prashnani, Shalini De Mello, Koki Nagano, Luisa Verdoliva

This work improves the generalization of synthetic video detectors by training them to focus on intrinsic low-level artifacts shared across generative models, rather than model-specific semantic flaws. Using a forensic-oriented augmentation strategy based on wavelet decomposition, the method achieves strong cross-model detection performance even when trained on videos from only a single generator.
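
To make the augmentation idea concrete, here is a minimal sketch of one way a wavelet-based forensic augmentation could look, using PyWavelets. The subband-splicing helper and its parameters are illustrative assumptions, not the paper's exact recipe:

    import numpy as np
    import pywt  # PyWavelets

    def splice_high_frequencies(img_keep, img_donor, wavelet="haar"):
        # Keep the low-frequency (semantic) content of img_keep but splice in
        # the high-frequency detail subbands of img_donor, steering a detector
        # toward low-level generator traces instead of semantic flaws.
        approx, _ = pywt.dwt2(img_keep, wavelet)
        _, details = pywt.dwt2(img_donor, wavelet)
        return pywt.idwt2((approx, details), wavelet)

    real_frame = np.random.rand(64, 64)
    fake_frame = np.random.rand(64, 64)
    augmented = splice_high_frequencies(real_frame, fake_frame)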

Teaser: Seeing What Matters

Unmasking Puppeteers: Leveraging Biometric Leakage to Disarm Impersonation in AI-based Videoconferencing

Advances in Neural Information Processing Systems (NeurIPS), 2025

Danial Samadi Vahdati, Tai Duc Nguyen, Ekta Prashnani, Koki Nagano, David Luebke, Orazio Gallo, Matthew Stamm

This work addresses real-time puppeteering attacks in AI-based talking-head videoconferencing by detecting identity swaps directly from the transmitted pose-expression latent, without relying on reconstructed video. It introduces a pose-conditioned contrastive encoder that isolates persistent biometric cues from the latent, enabling accurate, real-time detection and strong generalization across models and out-of-distribution scenarios.
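
A minimal PyTorch sketch of the general pattern follows; the latent and pose dimensions are invented, and the real encoder and training details differ:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LatentIdentityEncoder(nn.Module):
        # Maps a transmitted pose-expression latent, conditioned on an
        # explicit pose code, to a normalized identity embedding.
        def __init__(self, latent_dim=128, pose_dim=6, embed_dim=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(latent_dim + pose_dim, 256), nn.ReLU(),
                nn.Linear(256, embed_dim))

        def forward(self, latent, pose):
            z = self.net(torch.cat([latent, pose], dim=-1))
            return F.normalize(z, dim=-1)

    def info_nce(anchors, positives, temperature=0.1):
        # Contrastive objective: embeddings of the same person attract,
        # all other pairings in the batch repel.
        logits = anchors @ positives.t() / temperature
        return F.cross_entropy(logits, torch.arange(len(anchors)))

At call time, an identity swap would surface as a large embedding distance between the live latent stream and the claimed user's enrolled signature.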

Teaser: Unmasking Puppeteers

Fluctuation structure predicts genome-wide perturbation outcomes

Research Square (preprint), 2025

Benjamin Kuznets-Speck, Leon Schwartz, Hanxiao Sun, Madeline E. Melzer, Nitu Kumari, Benjamin Haley, Ekta Prashnani, Suriyanarayanan Vaikuntanathan, Yogesh Goyal

CIPHER predicts transcriptome-wide effects of genetic perturbations in single cells by using gene co-fluctuation patterns in unperturbed cells, showing that the baseline covariance structure carries substantial information about how cells respond to perturbation. Across large-scale datasets, it accurately recovers single and double perturbation responses, outperforms standard differential-expression methods, and offers an interpretable, theory-grounded framework for functional genomics.
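
The core statistical intuition can be sketched in a few lines of NumPy. This simplified, linear-response-style estimator (reading a row of the baseline covariance) stands in for CIPHER's actual formulation:

    import numpy as np

    def predict_response(expr, target_gene):
        # expr: cells x genes matrix from *unperturbed* cells.
        # Linear-response-style prediction: the transcriptome-wide effect of
        # knocking down one gene is read off from that gene's row of the
        # baseline gene-gene covariance.
        centered = expr - expr.mean(axis=0)
        cov = centered.T @ centered / (expr.shape[0] - 1)
        return -cov[target_gene] / cov[target_gene, target_gene]

    expr = np.random.rand(500, 100)     # 500 unperturbed cells, 100 genes
    predicted_shift = predict_response(expr, target_gene=7)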

Teaser: CIPHER

Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos

European Conference on Computer Vision (ECCV), 2024

Ekta Prashnani, Koki Nagano, Shalini De Mello, David Luebke, Orazio Gallo

This work introduces avatar fingerprinting, a method for verifying whether a synthetic talking avatar is using someone’s identity without consent by identifying the person driving its expressions rather than the facial appearance being shown. It also presents the large-scale NVFAIR dataset for the task and demonstrates strong performance, including generalization to previously unseen avatar generation models.
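
In spirit, the verification step resembles speaker verification applied to facial motion. A toy sketch with invented feature dimensions (e.g., landmark trajectories), not the paper's architecture:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MotionSignatureEncoder(nn.Module):
        # Embeds a sequence of expression features (e.g., 68 2-D landmarks
        # per frame) so the embedding reflects *who drives* the avatar,
        # not the rendered face.
        def __init__(self, feat_dim=68 * 2, embed_dim=128):
            super().__init__()
            self.gru = nn.GRU(feat_dim, embed_dim, batch_first=True)

        def forward(self, seq):          # seq: (batch, time, feat_dim)
            _, h = self.gru(seq)
            return F.normalize(h[-1], dim=-1)

    def same_driver(encoder, seq_a, seq_b, threshold=0.7):
        # Verify by cosine similarity against an enrolled motion signature.
        return (encoder(seq_a) * encoder(seq_b)).sum(-1) > threshold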

Teaser: Avatar Fingerprinting

Generalizable Deepfake Detection with Phase-Based Motion Analysis

IEEE Transactions on Image Processing, 2024

Ekta Prashnani, Michael Goebel, B. S. Manjunath

PhaseForensics detects DeepFake videos by modeling facial temporal dynamics through a phase-based motion representation, which is more robust than conventional pixel- or landmark-based temporal features. By leveraging temporal phase variations in band-pass facial components, it achieves stronger cross-dataset generalization, improved robustness to distortions and adversarial attacks, and state-of-the-art performance on challenging benchmarks like CelebDFv2.
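
As a rough illustration of the representation, a single band of a complex filter bank already yields the kind of temporal phase signal involved. This Gabor-based stand-in for one level of a complex steerable pyramid is only a sketch:

    import numpy as np
    from scipy.signal import fftconvolve

    def gabor_kernel(size=15, freq=0.25, theta=0.0, sigma=3.0):
        # Complex Gabor filter: a stand-in for one band/orientation of a
        # complex steerable pyramid.
        ax = np.arange(size) - size // 2
        xx, yy = np.meshgrid(ax, ax)
        xr = xx * np.cos(theta) + yy * np.sin(theta)
        envelope = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
        return envelope * np.exp(1j * 2 * np.pi * freq * xr)

    def temporal_phase_deltas(frames, kernel):
        # Per-pixel phase change between consecutive frames within one band:
        # the motion cue a phase-based detector feeds to its classifier.
        responses = np.stack(
            [fftconvolve(f, kernel, mode="same") for f in frames])
        dphi = np.diff(np.angle(responses), axis=0)
        return np.angle(np.exp(1j * dphi))   # wrap to (-pi, pi]

    clip = np.random.rand(8, 64, 64)         # toy face-crop sequence
    motion_features = temporal_phase_deltas(clip, gabor_kernel())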

Teaser: PhaseForensics

LOCL: Learning Object-Attribute Composition using Localization

British Machine Vision Conference (BMVC), 2022

Satish Kumar, ASM Iftekhar, Ekta Prashnani, B. S. Manjunath

LOCL tackles compositional zero-shot learning in realistic, cluttered scenes by using a modular, weakly supervised approach to localize both objects and their attributes before classifying their composition. This localization-driven design substantially improves generalization to unseen object-attribute pairings and boosts performance over prior methods on challenging datasets.
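
A compact way to picture localize-then-classify is attention pooling over region proposals before separate object and attribute heads. The dimensions and heads here are invented for illustration:

    import torch
    import torch.nn as nn

    class ComposeAfterLocalize(nn.Module):
        # Score region proposals for object-ness and attribute-ness, pool
        # the most relevant regions, then classify each factor separately.
        def __init__(self, feat_dim=512, n_obj=100, n_attr=50):
            super().__init__()
            self.obj_score = nn.Linear(feat_dim, 1)
            self.attr_score = nn.Linear(feat_dim, 1)
            self.obj_head = nn.Linear(feat_dim, n_obj)
            self.attr_head = nn.Linear(feat_dim, n_attr)

        def forward(self, regions):      # regions: (batch, n_regions, feat_dim)
            w_obj = torch.softmax(self.obj_score(regions), dim=1)
            w_attr = torch.softmax(self.attr_score(regions), dim=1)
            obj_feat = (w_obj * regions).sum(1)
            attr_feat = (w_attr * regions).sum(1)
            return self.obj_head(obj_feat), self.attr_head(attr_feat)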

Teaser: LOCL

Noise-Aware Video Saliency Prediction

British Machine Vision Conference (BMVC), 2021

Ekta Prashnani, Orazio Gallo, Joohwan Kim, Josef Spjut, Pradeep Sen, Iuri Frosio

This work improves video saliency prediction by introducing a noise-aware training framework that accounts for frame-specific uncertainty in gaze-derived saliency maps, helping prevent overfitting to noisy supervision. It is particularly effective when observer data is limited, and is supported by a new video game saliency dataset with rich temporal structure and multiple gaze attractors per frame.
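
One common way to realize such noise awareness is a heteroscedastic-style loss that down-weights frames with unreliable gaze maps. A hedged sketch, not the paper's exact objective:

    import torch

    def noise_aware_loss(pred, target, log_noise):
        # pred, target: per-frame saliency maps of shape (batch, H, W),
        # each normalized to sum to 1; log_noise: learned per-frame scalar.
        # Frames whose gaze-derived maps are estimated to be noisy get a
        # smaller weight; the +log_noise term penalizes the trivial fix of
        # declaring every frame noisy.
        eps = 1e-8
        kl = (target * ((target + eps).log() - (pred + eps).log())
              ).sum(dim=(-2, -1))
        return (torch.exp(-log_noise) * kl + log_noise).mean()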

Overview: noise-aware video saliency

PieAPP: Perceptual Image-Error Assessment through Pairwise Preference

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018

Ekta Prashnani*, Herbert (Hong) Cai*, Yasamin Mostofi, Pradeep Sen

* Joint first authors

PieAPP is a learning-based perceptual image quality metric that predicts visual differences in a way that closely matches human judgment, without requiring humans to assign explicit error scores. It is trained instead on large-scale pairwise human preferences between distorted images and significantly outperforms prior methods, while also generalizing well to unseen distortions.
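
The pairwise supervision reduces to a logistic (Bradley-Terry style) model on the difference of predicted error scores; a minimal sketch of that loss:

    import torch
    import torch.nn.functional as F

    def pairwise_preference_loss(err_a, err_b, prob_prefer_a):
        # err_a, err_b: predicted perceptual-error scores for two distorted
        # versions of the same reference; prob_prefer_a: fraction of human
        # raters who preferred A. No absolute error labels are needed:
        # only score *differences* are supervised.
        pred = torch.sigmoid(err_b - err_a)  # larger error on B => prefer A
        return F.binary_cross_entropy(pred, prob_prefer_a)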

Teaser: PieAPP

A Phase-Based Approach for Animating Images Using Video Examples

Computer Graphics Forum, August 2016, Volume 36, Issue 6

Ekta Prashnani, Maneli Noorkami, Daniel Vaquero, Pradeep Sen

This work introduces a phase-based method for animating still images with subtle stochastic motion, such as rippling water or swaying trees, by transferring motion patterns from example videos of similar scenes. By using phase variations in a complex steerable pyramid rather than optical flow, the approach produces more robust and visually effective animations with fewer artifacts.
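
To convey the mechanism, here is a toy global-FFT version of phase transfer; the actual method applies this per band and orientation of a complex steerable pyramid rather than to the whole spectrum:

    import numpy as np

    def animate_with_example_phase(still, example_frames):
        # Copy the per-frequency temporal phase changes of an example video
        # onto a still image's Fourier coefficients, yielding subtle motion
        # without any optical-flow estimation.
        F_still = np.fft.fft2(still)
        F_ex = np.fft.fft2(example_frames, axes=(-2, -1))
        dphase = np.angle(F_ex[1:]) - np.angle(F_ex[:-1])
        out, F_cur = [], F_still
        for d in dphase:
            F_cur = F_cur * np.exp(1j * d)   # advance phase by example delta
            out.append(np.real(np.fft.ifft2(F_cur)))
        return np.stack(out)

    animated = animate_with_example_phase(
        np.random.rand(64, 64), np.random.rand(8, 64, 64))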

Teaser: Phase-based image animation