Zhuo Zheng
Postdoctoral Scholar, Computer Science
Bio
My research interests are Earth Vision and AI4Earth, especially multi-modal and multi-temporal remote sensing image analysis and its real-world applications.
First-author representative works:
- Our Change family: ChangeStar (single-temporal learning, ICCV 2021), ChangeMask (many-to-many architecture, ISPRS P&RS 2022), ChangeOS (one-to-many architecture, RSE 2021), Changen (generative change modeling, ICCV 2023)
- Geospatial object segmentation: FarSeg (CVPR 2020) and FarSeg++ (TPAMI 2023), LoveDA dataset (NeurIPS Datasets and Benchmark 2021)
- Missing-modality all-weather mapping: Deep Multisensor Learning (first work on this topic, ISPRS P&RS 2021)
- Hyperspectral image classification: FPGA (first fully end-to-end patch-free method for HSI, TGRS 2020)
All Publications
-
Changen2: Multi-Temporal Remote Sensing Generative Change Foundation Model
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
2025; 47 (2): 725-741
Abstract
Our understanding of the temporal dynamics of the Earth's surface has been significantly advanced by deep vision models, which often require a massive number of labeled multi-temporal images for training. However, collecting, preprocessing, and annotating multi-temporal remote sensing images at scale is non-trivial since it is expensive and knowledge-intensive. In this paper, we present scalable multi-temporal change data generators based on generative models, which are cheap and automatic, alleviating these data problems. Our main idea is to simulate a stochastic change process over time. We describe the stochastic change process as a probabilistic graphical model, namely the generative probabilistic change model (GPCM), which factorizes the complex simulation problem into two more tractable sub-problems, i.e., condition-level change event simulation and image-level semantic change synthesis. To solve these two problems, we present Changen2, a GPCM implemented with a resolution-scalable diffusion transformer that can generate time series of remote sensing images and the corresponding semantic and change labels from labeled and even unlabeled single-temporal images. Changen2 is a "generative change foundation model" that can be trained at scale via self-supervision and is capable of producing change supervisory signals from unlabeled single-temporal images. Unlike existing "foundation models", our generative change foundation model synthesizes change data to train task-specific foundation models for change detection. The resulting model possesses inherent zero-shot change detection capabilities and excellent transferability. Comprehensive experiments suggest Changen2 has superior spatiotemporal scalability in data generation; e.g., a Changen2 model trained on 256²-pixel single-temporal images can yield time series of any length and resolutions of 1,024² pixels. Changen2 pre-trained models exhibit superior zero-shot performance (narrowing the performance gap to 3% on LEVIR-CD and approximately 10% on both S2Looking and SECOND, compared to the fully supervised counterparts) and transferability across multiple types of change tasks, including ordinary and off-nadir building change, land-use/land-cover change, and disaster assessment. The model and datasets are available at https://github.com/Z-Zheng/pytorch-change-models.
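The GPCM factorization described above (condition-level change event simulation followed by image-level semantic change synthesis) can be illustrated with a minimal NumPy sketch. Everything below is an illustrative assumption, not the released implementation: the real Changen2 renders images with a resolution-scalable diffusion transformer (see https://github.com/Z-Zheng/pytorch-change-models), whereas this toy version only darkens pixels conditioned on the new label map.

```python
# Toy sketch of the two-stage GPCM factorization: edit the label map first,
# then synthesize the next image conditioned on the edited label map.
import numpy as np

def simulate_change_events(semantic_mask, p_remove=0.3, rng=None):
    """Condition-level step: stochastically edit the label map (e.g., remove objects)."""
    rng = rng or np.random.default_rng()
    next_mask = semantic_mask.copy()
    for obj_id in np.unique(semantic_mask):
        if obj_id != 0 and rng.random() < p_remove:   # 0 = background
            next_mask[next_mask == obj_id] = 0        # simulate an object disappearing
    return next_mask

def synthesize_image(image_t, next_mask):
    """Image-level step: toy stand-in for the diffusion-transformer renderer."""
    image_t1 = image_t.copy()
    image_t1[next_mask == 0] *= 0.5                   # toy rendering conditioned on the new label map
    return image_t1

# One simulated time step from a single labeled image (image_t, mask_t);
# the change label comes for free by comparing the two label maps.
rng = np.random.default_rng(0)
mask_t = np.zeros((64, 64), dtype=np.int32)
mask_t[10:20, 10:20] = 1                              # a toy "building" instance
image_t = rng.random((64, 64, 3))
mask_t1 = simulate_change_events(mask_t, rng=rng)
image_t1 = synthesize_image(image_t, mask_t1)
change_label = mask_t != mask_t1                      # binary change supervision
```

The point of the factorization, as the abstract notes, is that the change label falls out of the two label maps, which is what lets single-temporal images provide change supervision.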
View details for DOI 10.1109/TPAMI.2024.3475824
View details for Web of Science ID 001395340500042
View details for PubMedID 39388323
-
Single-Temporal Supervised Learning for Universal Remote Sensing Change Detection
INTERNATIONAL JOURNAL OF COMPUTER VISION
2024
View details for DOI 10.1007/s11263-024-02141-4
View details for Web of Science ID 001250359600001
-
FarSeg++: Foreground-Aware Relation Network for Geospatial Object Segmentation in High Spatial Resolution Remote Sensing Imagery
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
2023; PP
Abstract
Geospatial object segmentation, a fundamental Earth vision task, always suffers from scale variation, the larger intra-class variance of the background, and foreground-background imbalance in high spatial resolution (HSR) remote sensing imagery. Generic semantic segmentation methods mainly focus on the scale variation in natural scenarios. However, the other two problems are insufficiently considered in large-area Earth observation scenarios. In this paper, we propose a foreground-aware relation network (FarSeg++) from the perspectives of relation-based, optimization-based, and objectness-based foreground modeling, alleviating the latter two problems. From the perspective of relations, the foreground-scene relation module improves the discrimination of the foreground features via the foreground-correlated contexts associated with the object-scene relation. From the perspective of optimization, foreground-aware optimization is proposed to focus on foreground examples and hard examples of the background during training to achieve a balanced optimization. Furthermore, from the perspective of objectness, a foreground-aware decoder is proposed to improve the objectness representation, alleviating the objectness prediction problem that is the main bottleneck revealed by an empirical upper-bound analysis. We also introduce a new large-scale high-resolution urban vehicle segmentation dataset to verify the effectiveness of the proposed method and to push the development of objectness prediction further forward. The experimental results suggest that FarSeg++ is superior to state-of-the-art generic semantic segmentation methods and achieves a better trade-off between speed and accuracy. The code and model are available at: https://github.com/Z-Zheng/FarSeg.
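A minimal PyTorch sketch of the kind of foreground-aware optimization the abstract describes: foreground pixels keep full weight, while easy background pixels are down-weighted so that hard background examples dominate the loss. The weighting function, gamma value, and function name are illustrative assumptions rather than the paper's exact formulation.

```python
# Foreground-aware pixel loss sketch: full weight on foreground, focal-style
# down-weighting of easy background pixels. Illustrative assumption only.
import torch
import torch.nn.functional as F

def foreground_aware_loss(logits: torch.Tensor, target: torch.Tensor,
                          gamma: float = 2.0, background_index: int = 0) -> torch.Tensor:
    """logits: (N, C, H, W); target: (N, H, W) with class indices."""
    ce = F.cross_entropy(logits, target, reduction="none")          # per-pixel cross-entropy
    prob = torch.softmax(logits, dim=1)
    p_true = prob.gather(1, target.unsqueeze(1)).squeeze(1)         # probability of the true class
    weight = torch.ones_like(ce)
    bg = target == background_index
    weight[bg] = (1.0 - p_true[bg].detach()) ** gamma               # easy background -> small weight
    return (weight * ce).sum() / weight.sum().clamp_min(1e-6)

# Usage on dummy data
logits = torch.randn(2, 5, 64, 64, requires_grad=True)
target = torch.randint(0, 5, (2, 64, 64))
loss = foreground_aware_loss(logits, target)
loss.backward()
```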
View details for DOI 10.1109/TPAMI.2023.3296757
View details for PubMedID 37467086
-
Scalable Multi-Temporal Remote Sensing Change Data Generation via Simulating Stochastic Change Process
IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV). 2023: 21761-21770
View details for DOI 10.1109/ICCV51070.2023.01994
View details for Web of Science ID 001169500506036
-
ChangeMask: Deep multi-task encoder-transformer-decoder architecture for semantic change detection
ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING
2022; 183: 228-239
View details for DOI 10.1016/j.isprsjprs.2021.10.015
View details for Web of Science ID 000782582900002
-
Building damage assessment for rapid disaster response with a deep object-based semantic change detection framework: From natural disasters to man-made disasters
REMOTE SENSING OF ENVIRONMENT
2021; 265
View details for DOI 10.1016/j.rse.2021.112636
View details for Web of Science ID 000697024400003
-
Deep multisensor learning for missing-modality all-weather mapping
ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING
2021; 174: 254-264
View details for DOI 10.1016/j.isprsjprs.2020.12.009
View details for Web of Science ID 000640987800017
-
Change is Everywhere: Single-Temporal Supervised Object Change Detection in Remote Sensing Imagery
IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV). 2021: 15173-15182
View details for DOI 10.1109/ICCV48922.2021.01491
View details for Web of Science ID 000798743205035
-
FPGA: Fast Patch-Free Global Learning Framework for Fully End-to-End Hyperspectral Image Classification
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
2020; 58 (8): 5612-5626
View details for DOI 10.1109/TGRS.2020.2967821
View details for Web of Science ID 000552371900031
-
HyNet: Hyper-scale object detection network framework for multiple spatial resolution remote sensing imagery
ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING
2020; 166: 1-14
View details for DOI 10.1016/j.isprsjprs.2020.04.019
View details for Web of Science ID 000551268300001
-
Foreground-Aware Relation Network for Geospatial Object Segmentation in High Spatial Resolution Remote Sensing Imagery
IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR). 2020: 4095-4104
View details for DOI 10.1109/CVPR42600.2020.00415
View details for Web of Science ID 000620679504037
-
Remote sensing intelligent interpretation brain: Real-time intelligent understanding of the Earth
PNAS NEXUS
2025; 4 (6): pgaf182
Abstract
The large-scale understanding of nature and human activities in real time cannot be separated from Earth observation. Existing monitoring techniques, however, rely primarily on offline processing, with software and hardware separated across the collection, processing, and transmission stages. This limits the capability and timeliness of responses to emergency tasks such as disaster relief and nighttime rescue. Our brain can process real-time information across different scales and modalities through perception, cognition, transmission, and decision-making to take informed actions rapidly. Such intelligent ability inspires us to establish a novel remote sensing intelligent interpretation brain (RSI2_Brain), which combines multimodal data processing with on-the-fly network transmission and communication to demonstrate a new understanding of the Earth. The RSI2_Brain supports online acquisition, real-time processing, and transmission under low computational power and communication-blocking constraints. It therefore has practical utility and wide applicability in extremely harsh conditions, providing automatic, all-day, online response.
View details for DOI 10.1093/pnasnexus/pgaf182
View details for PubMedID 40501452
View details for PubMedCentralID PMC12152476
-
Learning Temporal Consistency for High Spatial Resolution Remote Sensing Imagery Semantic Change Detection
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
2025; 63
View details for DOI 10.1109/TGRS.2025.3561021
View details for Web of Science ID 001488488600002
-
Towards transferable building damage assessment via unsupervised single-temporal change adaptation
REMOTE SENSING OF ENVIRONMENT
2024; 315
View details for DOI 10.1016/j.rse.2024.114416
View details for Web of Science ID 001329591400001
-
Unifying remote sensing change detection via deep probabilistic change models: From principles, models to applications
ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING
2024; 215: 239-255
View details for DOI 10.1016/j.isprsjprs.2024.07.001
View details for Web of Science ID 001273221200001
-
Global road extraction using a pseudo-label guided framework: from benchmark dataset to cross-region semi-supervised learning
GEO-SPATIAL INFORMATION SCIENCE
2024
View details for DOI 10.1080/10095020.2024.2362760
View details for Web of Science ID 001251618600001
-
EarthVQANet: Multi-task visual question answering for remote sensing image understanding
ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING
2024; 212: 422-439
View details for DOI 10.1016/j.isprsjprs.2024.05.001
View details for Web of Science ID 001243433300001
-
LoveNAS: Towards multi-scene land-cover mapping via hierarchical searching adaptive network
ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING
2024; 209: 265-278
View details for DOI 10.1016/j.isprsjprs.2024.01.011
View details for Web of Science ID 001200052300001
-
MAPCHANGE: ENHANCING SEMANTIC CHANGE DETECTION WITH TEMPORAL-INVARIANT HISTORICAL MAPS BASED ON DEEP TRIPLET NETWORK
IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS). 2024: 7653-7656
View details for DOI 10.1109/IGARSS53475.2024.10640532
View details for Web of Science ID 001415226902111
-
LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images
IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR). 2024: 22563-22573
View details for DOI 10.1109/CVPR52733.2024.02129
View details for Web of Science ID 001342515505086
-
EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering
AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE. 2024: 5481-5489
View details for Web of Science ID 001239936300043
-
Adaptive Self-Supporting Prototype Learning for Remote Sensing Few-Shot Semantic Segmentation
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
2024; 62
View details for DOI 10.1109/TGRS.2024.3435086
View details for Web of Science ID 001289662900034
-
Temporal-agnostic change region proposal for semantic change detection
ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING
2023; 204: 306-320
View details for DOI 10.1016/j.isprsjprs.2023.06.017
View details for Web of Science ID 001086073500001
-
Explicable Fine-Grained Aircraft Recognition Via Deep Part Parsing Prior Framework for High-Resolution Remote Sensing Imagery
IEEE TRANSACTIONS ON CYBERNETICS
2023; PP
Abstract
Aircraft recognition is crucial in both civil and military fields, and high-spatial-resolution remote sensing has emerged as a practical approach. However, existing data-driven methods fail to locate discriminative regions for effective feature extraction due to limited training data, leading to poor recognition performance. To address this issue, we propose a knowledge-driven deep learning method called the explicable aircraft recognition framework based on a part parsing prior (APPEAR). APPEAR explicitly models the aircraft's rigid structure as a pixel-level part parsing prior, dividing it into five parts: 1) the nose; 2) the left wing; 3) the right wing; 4) the fuselage; and 5) the tail. This fine-grained prior provides reliable part locations to delineate aircraft architecture and imposes spatial constraints among the parts, effectively reducing the search space for model optimization and identifying subtle inter-class differences. A knowledge-driven aircraft part attention (KAPA) module uses this prior to achieve a geometry-invariant representation for identifying discriminative features. Part features are generated by part indexing in a specific order and sequentially embedded into a compact space to obtain a fixed-length representation for each part, invariant to aircraft orientation and scale. The part attention module then takes the embedded part features, adaptively reweights their importance to identify discriminative parts, and aggregates them for recognition. The proposed APPEAR framework is evaluated on two aircraft recognition datasets and achieves superior performance. Moreover, experiments with few-shot learning methods demonstrate the robustness of our framework across different tasks. Ablation analysis illustrates that the fuselage and wings of the aircraft are the most effective parts for recognition.
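A minimal PyTorch sketch of part-wise pooling followed by attention-based aggregation, in the spirit of the KAPA module summarized above; the layer sizes, masked average pooling, and module name are illustrative assumptions, not the published implementation.

```python
# Part-wise pooling + attention aggregation sketch: pool features inside each
# part mask (fixed part order), embed to a compact space, reweight, aggregate.
import torch
import torch.nn as nn

class PartAttentionSketch(nn.Module):
    def __init__(self, in_dim: int = 256, embed_dim: int = 128):
        super().__init__()
        self.embed = nn.Linear(in_dim, embed_dim)      # compact, fixed-length part embedding
        self.score = nn.Linear(embed_dim, 1)           # attention score per part

    def forward(self, feat: torch.Tensor, part_mask: torch.Tensor) -> torch.Tensor:
        # feat: (N, C, H, W); part_mask: (N, P, H, W) binary/soft masks in a fixed part order
        masked = feat.unsqueeze(1) * part_mask.unsqueeze(2)            # (N, P, C, H, W)
        area = part_mask.sum(dim=(2, 3)).clamp_min(1e-6)               # (N, P)
        part_feat = masked.sum(dim=(3, 4)) / area.unsqueeze(-1)        # masked average pooling
        emb = torch.relu(self.embed(part_feat))                        # (N, P, E)
        attn = torch.softmax(self.score(emb).squeeze(-1), dim=1)       # (N, P) part importance
        return (attn.unsqueeze(-1) * emb).sum(dim=1)                   # aggregated representation (N, E)

# Usage on dummy data: five parts (nose, left wing, right wing, fuselage, tail)
feat = torch.randn(2, 256, 32, 32)
mask = torch.rand(2, 5, 32, 32)
rep = PartAttentionSketch()(feat, mask)   # -> (2, 128)
```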
View details for DOI 10.1109/TCYB.2023.3293033
View details for PubMedID 37552595
-
Large-scale agricultural greenhouse extraction for remote sensing imagery based on layout attention network: A case study of China
ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING
2023; 200: 73-88
View details for DOI 10.1016/j.isprsjprs.2023.04.020
View details for Web of Science ID 001007013800001
-
Large-scale deep learning based binary and semantic change detection in ultra high resolution remote sensing imagery: From benchmark datasets to urban application
ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING
2022; 193: 164-186
View details for DOI 10.1016/j.isprsjprs.2022.08.012
View details for Web of Science ID 000876743000004
-
Cross-sensor domain adaptation for high spatial resolution urban land-cover mapping: From airborne to spaceborne imagery
REMOTE SENSING OF ENVIRONMENT
2022; 277
View details for DOI 10.1016/j.rse.2022.113058
View details for Web of Science ID 000804945700001
-
GRE AND BEYOND: A GLOBAL ROAD EXTRACTION DATASET
IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS). 2022: 3035-3038
View details for DOI 10.1109/IGARSS46834.2022.9883915
View details for Web of Science ID 000920916603057
-
A Supervised Progressive Growing Generative Adversarial Network for Remote Sensing Image Scene Classification
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
2022; 60
View details for DOI 10.1109/TGRS.2022.3151405
View details for Web of Science ID 000773300900029
-
Cascaded Multi-Task Road Extraction Network for Road Surface, Centerline, and Edge Extraction
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
2022; 60
View details for DOI 10.1109/TGRS.2022.3165817
View details for Web of Science ID 000794217400005
-
National-scale greenhouse mapping for high spatial resolution remote sensing imagery using a dense object dual-task deep learning framework: A case study of China
ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING
2021; 181: 279-294
View details for DOI 10.1016/j.isprsjprs.2021.08.024
View details for Web of Science ID 000704454800009
-
Cross-domain road detection based on global-local adversarial learning framework from very high resolution satellite imagery
ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING
2021; 180: 296-312
View details for DOI 10.1016/j.isprsjprs.2021.08.018
View details for Web of Science ID 000697167200020
-
FactSeg: Foreground Activation-Driven Small Object Semantic Segmentation in Large-Scale Remote Sensing Imagery
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
2022; 60
View details for DOI 10.1109/TGRS.2021.3097148
View details for Web of Science ID 000732756800001
-
Urban road mapping based on an end-to-end road vectorization mapping network framework
ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING
2021; 178: 345-365
View details for DOI 10.1016/j.isprsjprs.2021.05.016
View details for Web of Science ID 000686383900001
-
A Spectral-Spatial-Dependent Global Learning Framework for Insufficient and Imbalanced Hyperspectral Image Classification
IEEE TRANSACTIONS ON CYBERNETICS
2022; 52 (11): 11709-11723
Abstract
Deep learning techniques have been widely applied to hyperspectral image (HSI) classification and have achieved great success. However, deep neural network models have large parameter spaces and require large amounts of labeled data. Deep learning methods for HSI classification usually follow a patch-wise learning framework. Recently, a fast patch-free global learning (FPGA) architecture was proposed for HSI classification that exploits global spatial context information. However, FPGA has difficulty extracting the most discriminative features when the sample data are imbalanced. In this article, a spectral-spatial-dependent global learning (SSDGL) framework based on global convolutional long short-term memory (GCL) and a global joint attention mechanism (GJAM) is proposed for insufficient and imbalanced HSI classification. In SSDGL, a hierarchically balanced (H-B) sampling strategy and a weighted softmax loss are proposed to address the imbalanced sample problem. To effectively distinguish the similar spectral characteristics of land cover types, the GCL module is introduced to extract the long short-term dependency of spectral features. To learn the most discriminative feature representations, the GJAM module is proposed to extract attention areas. The experimental results obtained with three public HSI datasets show that SSDGL performs strongly on insufficient and imbalanced sample problems and is superior to other state-of-the-art methods.
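A minimal PyTorch sketch of a class-frequency-weighted softmax loss of the kind the abstract describes for imbalanced HSI classification; the inverse-frequency weighting and function name are illustrative assumptions, not the paper's exact formulation (which pairs the loss with hierarchically balanced sampling).

```python
# Weighted softmax loss sketch: rarer classes receive larger weights so that
# imbalanced label maps do not let the majority class dominate training.
import torch
import torch.nn.functional as F

def weighted_softmax_loss(logits: torch.Tensor, labels: torch.Tensor,
                          num_classes: int, ignore_index: int = -1) -> torch.Tensor:
    """logits: (N, C, H, W); labels: (N, H, W) with class indices or ignore_index."""
    valid = labels[labels != ignore_index]
    counts = torch.bincount(valid, minlength=num_classes).float().clamp_min(1.0)
    weights = counts.sum() / (num_classes * counts)      # inverse-frequency weights, mean ~1
    return F.cross_entropy(logits, labels, weight=weights, ignore_index=ignore_index)

# Usage on a dummy, heavily imbalanced label map
logits = torch.randn(1, 9, 64, 64)
labels = torch.zeros(1, 64, 64, dtype=torch.long)        # class 0 dominates
labels[0, :4, :4] = 3                                    # a rare class
loss = weighted_softmax_loss(logits, labels, num_classes=9)
```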
View details for DOI 10.1109/TCYB.2021.3070577
View details for Web of Science ID 000732244900001
View details for PubMedID 34033562
-
GAMSNet: Globally aware road detection network with multi-scale residual learning
ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING
2021; 175: 340-352
View details for DOI 10.1016/j.isprsjprs.2021.03.008
View details for Web of Science ID 000644695700024
-
RSNet: The Search for Remote Sensing Deep Neural Networks in Recognition Tasks
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
2021; 59 (3): 2520-2534
View details for DOI 10.1109/TGRS.2020.3001401
View details for Web of Science ID 000622319000049
-
COLOR: Cycling, Offline Learning, and Online Representation Framework for Airport and Airplane Detection Using GF-2 Satellite Images
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
2020; 58 (12): 8438-8449
View details for DOI 10.1109/TGRS.2020.2987907
View details for Web of Science ID 000594389800015
-
Edge-Reinforced Convolutional Neural Network for Road Detection in Very-High-Resolution Remote Sensing Imagery
PHOTOGRAMMETRIC ENGINEERING AND REMOTE SENSING
2020; 86 (3): 153-160
View details for DOI 10.14358/PERS.86.3.153
View details for Web of Science ID 000516713700003
-
A NOVEL GLOBAL-AWARE DEEP NETWORK FOR ROAD DETECTION OF VERY HIGH RESOLUTION REMOTE SENSING IMAGERY
IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS). 2020: 2579-2582
View details for DOI 10.1109/IGARSS39084.2020.9323155
View details for Web of Science ID 000664335302151
-
Multi-Scale and Multi-Task Deep Learning Framework for Automatic Road Extraction
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
2019; 57 (11): 9362-9377
View details for DOI 10.1109/TGRS.2019.2926397
View details for Web of Science ID 000496155200076
-
S3NET: TOWARDS REAL-TIME HYPERSPECTRAL IMAGERY CLASSIFICATION
IEEE. 2019: 3293-3296
View details for Web of Science ID 000519270603077
-
POP-NET: ENCODER-DUAL DECODER FOR SEMANTIC SEGMENTATION AND SINGLE-VIEW HEIGHT ESTIMATION
IEEE. 2019: 4963-4966
View details for Web of Science ID 000519270604212
-
Deep Salient Feature Based Anti-Noise Transfer Network for Scene Classification of Remote Sensing Imagery
REMOTE SENSING
2018; 10 (3)
View details for DOI 10.3390/rs10030410
View details for Web of Science ID 000428280100056
-
COLOR: CYCLING OFFLINE LEARNING AND ONLINE REPRESENTING FOR REMOTE SENSING DATAFLOW
IEEE. 2018: 4093-4096
View details for Web of Science ID 000451039804016
-
Multi-channel Pose-aware Convolution Neural Networks for Multi-view Facial Expression Recognition
IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE & GESTURE RECOGNITION (FG). 2018: 458-465
View details for DOI 10.1109/FG.2018.00074
View details for Web of Science ID 000454996700064
ORCID: https://orcid.org/0000-0003-1811-6725