FLAG-4D: Flow Guided Local-global Dual-deformation Model for 4D Reconstruction

Abstract

We introduce FLAG-4D, a framework for synthesizing novel views of dynamic scenes by reconstructing how 3D Gaussian primitives evolve through space and time. Existing methods typically rely on a single multilayer perceptron (MLP) to model temporal deformations and often struggle to capture complex point motions and fine-grained dynamic details consistently over time, especially from sparse input views.

FLAG-4D overcomes this by employing a dual-deformation network that warps a canonical set of 3D Gaussians over time into new positions and anisotropic shapes. The network consists of an Instantaneous Deformation Network (IDN) for modeling fine-grained, local deformations and a Global Motion Network (GMN) for capturing long-range dynamics; the two are refined via mutual learning.
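To make the idea of warping a canonical Gaussian concrete, the sketch below applies time-dependent offsets to the position and per-axis log-scale of a single primitive. It is an illustration under stated assumptions, not the FLAG-4D implementation: the `Gaussian` class, the `warp` helper, and the hand-written `toy_deformation` function (which stands in for a learned deformation network) are all hypothetical names, and rotation and opacity are omitted for brevity.

```python
from dataclasses import dataclass
import math


@dataclass
class Gaussian:
    # Centre of the primitive in canonical space.
    position: tuple
    # Per-axis log scale; unequal entries give an anisotropic shape.
    log_scale: tuple


def toy_deformation(g, t):
    """Hypothetical stand-in for a learned deformation network.

    Returns additive offsets (d_position, d_log_scale) for time t.
    A real model would predict these from g and a time embedding.
    """
    d_pos = (0.1 * math.sin(2 * math.pi * t), 0.0, 0.05 * t)
    d_log_scale = (0.2 * t, 0.0, -0.2 * t)
    return d_pos, d_log_scale


def warp(g, t, deform=toy_deformation):
    """Warp a canonical Gaussian to time t by adding predicted offsets."""
    d_pos, d_log_scale = deform(g, t)
    return Gaussian(
        position=tuple(p + d for p, d in zip(g.position, d_pos)),
        log_scale=tuple(s + d for s, d in zip(g.log_scale, d_log_scale)),
    )


canonical = Gaussian(position=(1.0, 2.0, 3.0), log_scale=(0.0, 0.0, 0.0))
warped = warp(canonical, t=1.0)
```

In this toy setup the offsets vanish at t=0, so the canonical Gaussian is recovered there, and the opposite-signed log-scale offsets stretch one axis while shrinking another as t grows.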

To ensure these deformations are both accurate and temporally smooth, FLAG-4D incorporates dense motion features from a pretrained optical flow backbone. We fuse these motion cues from adjacent timeframes and use a deformation-guided attention mechanism to align this flow information with the current state of each evolving 3D Gaussian. Extensive experiments demonstrate that FLAG-4D achieves higher-fidelity and more temporally coherent reconstructions with finer detail preservation than state-of-the-art methods.

Framework

Figure 2: FLAG-4D Methodology: Our dual-deformation framework for 4D reconstruction. Top: The overall pipeline: A monocular video sequence is used to generate an initial SfM point cloud, from which a canonical set of 3D Gaussians at t=0 is derived. The Dual-Deformation Network consists of an Instantaneous Deformation Network (IDN) and a Global Motion Network (GMN), which are trained synergistically via Mutual Learning. Bottom Left: The IDN processes the canonical Gaussians and a window of future-oriented time embeddings to produce a hypothesized local deformation. Bottom Right: The GMN integrates this local deformation hypothesis (as Query) with fused optical flow embeddings (as Key/Value) via a cross-attention mechanism, producing the final globally consistent deformation.
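The cross-attention step described in the caption can be sketched as plain scaled dot-product attention, with per-Gaussian deformation features acting as queries and flow embeddings acting as keys and values. This is a minimal sketch under assumptions: it is single-head, has no learned projection matrices, and the function names and feature dimensions are hypothetical rather than taken from the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention on nested lists.

    queries: N x d  (assumed: one local deformation feature per Gaussian)
    keys:    M x d  (assumed: fused optical flow embeddings)
    values:  M x dv (assumed: the same embeddings, or a projection of them)
    Returns an N x dv list: one flow-informed feature per query.
    """
    d = len(keys[0])
    dv = len(values[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dv)])
    return out


queries = [[10.0, 0.0], [0.0, 0.0]]
keys = [[10.0, 0.0], [0.0, 10.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
attended = cross_attention(queries, keys, values)
```

Here the first query aligns strongly with the first key, so its output is dominated by the first value vector, while the all-zero second query attends uniformly and receives the average of the values.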

Qualitative Comparisons

Scene: 3D Printer (panels: Ours, SC-GS, DeformableGS, D-MiSo)

Scene: Chicken (panels: Ours, SC-GS, DeformableGS, D-MiSo)

NeRF-DS Dataset

Figure 4: Qualitative comparisons on NeRF-DS dataset. Our method captures finer details and preserves complex structures better than baselines like SC-GS, D-MiSo, and 4DGS.

Table 1: Quantitative comparison on NeRF-DS (Yan, Li, and Lee 2023) dataset. Mean performance across all scenes.

Ablation Studies

Detailed ablation studies demonstrate the effectiveness of our dual-deformation network and flow-guided attention mechanism.

BibTeX

@article{tan2026flag4d,
  author    = {Tan, Guan Yuan and Vu, Ngoc Tuan and Pal, Arghya and Rajanala, Sailaja and Phan, Raphael CW and Srinivas, Mettu and Ting, Chee-Ming},
  title     = {FLAG-4D: Flow Guided Local-global Dual-deformation Model for 4D Reconstruction},
  journal   = {AAAI},
  year      = {2026},
}
