Bio


My research interests are Earth Vision and AI4Earth, especially multi-modal and multi-temporal remote sensing image analysis and their real-world applications.

First-author representative works:
- Our Change family: ChangeStar (single-temporal learning, ICCV 2021), ChangeMask (many-to-many architecture, ISPRS P&RS 2022), ChangeOS (one-to-many architecture, RSE 2021), Changen (generative change modeling, ICCV 2023)
- Geospatial object segmentation: FarSeg (CVPR 2020) and FarSeg++ (TPAMI 2023), LoveDA dataset (NeurIPS Datasets and Benchmark 2021)
- Missing-modality all-weather mapping: Deep Multisensor Learning (first work on this topic, ISPRS P&RS 2021)
- Hyperspectral image classification: FPGA (first fully end-to-end patch-free method for HSI, TGRS 2020)

Stanford Advisors


All Publications


  • Changen2: Multi-Temporal Remote Sensing Generative Change Foundation Model IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE Zheng, Z., Ermon, S., Kim, D., Zhang, L., Zhong, Y. 2025; 47 (2): 725-741

    Abstract

    Our understanding of the temporal dynamics of the Earth's surface has been significantly advanced by deep vision models, which often require a massive amount of labeled multi-temporal images for training. However, collecting, preprocessing, and annotating multi-temporal remote sensing images at scale is non-trivial since it is expensive and knowledge-intensive. In this paper, we present scalable multi-temporal change data generators based on generative models, which are cheap and automatic, alleviating these data problems. Our main idea is to simulate a stochastic change process over time. We describe the stochastic change process as a probabilistic graphical model, namely the generative probabilistic change model (GPCM), which factorizes the complex simulation problem into two more tractable sub-problems, i.e., condition-level change event simulation and image-level semantic change synthesis. To solve these two problems, we present Changen2, a GPCM implemented with a resolution-scalable diffusion transformer which can generate time series of remote sensing images and corresponding semantic and change labels from labeled and even unlabeled single-temporal images. Changen2 is a "generative change foundation model" that can be trained at scale via self-supervision, and is capable of producing change supervisory signals from unlabeled single-temporal images. Unlike existing "foundation models", our generative change foundation model synthesizes change data to train task-specific foundation models for change detection. The resulting model possesses inherent zero-shot change detection capabilities and excellent transferability. Comprehensive experiments suggest Changen2 has superior spatiotemporal scalability in data generation, e.g., a Changen2 model trained on 256² pixel single-temporal images can yield time series of any length and resolutions of 1,024² pixels. Changen2 pre-trained models exhibit superior zero-shot performance (narrowing the performance gap to 3% on LEVIR-CD and approximately 10% on both S2Looking and SECOND, compared to fully supervised counterparts) and transferability across multiple types of change tasks, including ordinary and off-nadir building change, land-use/land-cover change, and disaster assessment. The model and datasets are available at https://github.com/Z-Zheng/pytorch-change-models.
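The two-step GPCM factorization described in the abstract (condition-level change event simulation, then image-level synthesis) can be illustrated with a toy sketch. Everything below is a hypothetical stand-in, assuming a binary label map and a trivial renderer; it is not Changen2's actual implementation, which uses a diffusion transformer for the synthesis step.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_change_events(labels, p_change=0.1):
    """Condition-level step: stochastically flip semantic labels
    (toy stand-in for change event simulation, p(y2 | y1))."""
    mask = rng.random(labels.shape) < p_change
    new_labels = labels.copy()
    new_labels[mask] = 1 - new_labels[mask]  # toy: two classes, 0/1
    return new_labels, mask

def synthesize_image(labels, noise=0.05):
    """Image-level step: render a toy 'image' from the label map
    (the paper uses a diffusion transformer here, p(x2 | y2))."""
    return labels.astype(float) + noise * rng.standard_normal(labels.shape)

# Single-temporal input: one label map at time t1
y1 = (rng.random((8, 8)) < 0.3).astype(int)
y2, change_mask = simulate_change_events(y1)   # condition-level simulation
x2 = synthesize_image(y2)                      # image-level synthesis
# The pair (x2, y2) plus change_mask acts as free change supervision.
```

The point of the factorization is that each sub-problem is tractable on its own: label-space edits are cheap to simulate, and the image generator only ever needs to render a single time step conditioned on a label map.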

    DOI: 10.1109/TPAMI.2024.3475824 · Web of Science ID: 001395340500042 · PubMedID: 39388323

  • Single-Temporal Supervised Learning for Universal Remote Sensing Change Detection INTERNATIONAL JOURNAL OF COMPUTER VISION Zheng, Z., Zhong, Y., Ma, A., Zhang, L. 2024
  • FarSeg++: Foreground-Aware Relation Network for Geospatial Object Segmentation in High Spatial Resolution Remote Sensing Imagery IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE Zheng, Z., Zhong, Y., Wang, J., Ma, A., Zhang, L. 2023; PP

    Abstract

    Geospatial object segmentation, a fundamental Earth vision task, always suffers from scale variation, the larger intra-class variance of background, and foreground-background imbalance in high spatial resolution (HSR) remote sensing imagery. Generic semantic segmentation methods mainly focus on the scale variation in natural scenarios. However, the other two problems are insufficiently considered in large area Earth observation scenarios. In this paper, we propose a foreground-aware relation network (FarSeg++) from the perspectives of relation-based, optimization-based, and objectness-based foreground modeling, alleviating the latter two problems. From the perspective of the relations, the foreground-scene relation module improves the discrimination of the foreground features via the foreground-correlated contexts associated with the object-scene relation. From the perspective of optimization, foreground-aware optimization is proposed to focus on foreground examples and hard examples of the background during training to achieve a balanced optimization. In addition, from the perspective of objectness, a foreground-aware decoder is proposed to improve the objectness representation, alleviating the objectness prediction problem that is the main bottleneck revealed by an empirical upper bound analysis. We also introduce a new large-scale high-resolution urban vehicle segmentation dataset to verify the effectiveness of the proposed method and push the development of objectness prediction further forward. The experimental results suggest that FarSeg++ is superior to the state-of-the-art generic semantic segmentation methods and can achieve a better trade-off between speed and accuracy. The code and model are available at: https://github.com/Z-Zheng/FarSeg.
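As a rough illustration of foreground-aware optimization (keeping full weight on foreground pixels while letting hard background examples dominate the background term), here is a minimal sketch of a focal-style weighted loss. The weighting scheme is an assumption for illustration, not FarSeg++'s exact formulation:

```python
import numpy as np

def foreground_aware_bce(probs, targets, gamma=2.0, eps=1e-7):
    """Schematic foreground-aware binary cross-entropy: foreground pixels
    (target == 1) keep full weight, while background pixels are weighted by
    the predicted foreground probability raised to gamma, so easy background
    is down-weighted and hard (confidently wrong) background dominates."""
    probs = np.clip(probs, eps, 1 - eps)
    bce = -(targets * np.log(probs) + (1 - targets) * np.log(1 - probs))
    # background weight grows with the (wrong) foreground confidence
    weights = np.where(targets == 1, 1.0, probs ** gamma)
    return float((weights * bce).mean())
```

With this weighting, a background pixel predicted at 0.01 foreground probability contributes almost nothing, while one predicted at 0.9 contributes nearly its full cross-entropy, which is the balancing behavior the abstract describes.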

    DOI: 10.1109/TPAMI.2023.3296757 · PubMedID: 37467086

  • ChangeMask: Deep multi-task encoder-transformer-decoder architecture for semantic change detection ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING Zheng, Z., Zhong, Y., Tian, S., Ma, A., Zhang, L. 2022; 183: 228-239
  • Building damage assessment for rapid disaster response with a deep object-based semantic change detection framework: From natural disasters to man-made disasters REMOTE SENSING OF ENVIRONMENT Zheng, Z., Zhong, Y., Wang, J., Ma, A., Zhang, L. 2021; 265
  • Deep multisensor learning for missing-modality all-weather mapping ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING Zheng, Z., Ma, A., Zhang, L., Zhong, Y. 2021; 174: 254-264
  • FPGA: Fast Patch-Free Global Learning Framework for Fully End-to-End Hyperspectral Image Classification IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING Zheng, Z., Zhong, Y., Ma, A., Zhang, L. 2020; 58 (8): 5612-5626
  • HyNet: Hyper-scale object detection network framework for multiple spatial resolution remote sensing imagery ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING Zheng, Z., Zhong, Y., Ma, A., Han, X., Zhao, J., Liu, Y., Zhang, L. 2020; 166: 1-14
  • Remote sensing intelligent interpretation brain: Real-time intelligent understanding of the Earth PNAS NEXUS Wan, Y., Zhang, C., Ma, A., Chen, Z., Sun, C., Wang, J., Zheng, Z., Bao, F., Zhang, L., Zhong, Y. 2025; 4 (6): pgaf182

    Abstract

    Real-time, large-scale understanding of nature and human activities cannot be separated from Earth observation. Existing monitoring techniques, however, rely primarily on offline processing, with software and hardware separated across the collection, processing, and transmission stages. This limits the ability and timeliness in responding to emergency tasks such as disaster relief and nighttime rescue. The human brain can process real-time information across different scales and modalities through perception, cognition, transmission, and decision-making, enabling rapid, informed action. This intelligent ability inspires us to establish a novel remote sensing intelligent interpretation brain (RSI2_Brain) that combines multimodal data processing with on-the-fly network transmission and communication to demonstrate a new way of understanding the Earth. RSI2_Brain performs online acquisition and real-time processing and transmission under low-computational-power and communication-blocking constraints. It therefore has practical utility and wide applicability in extremely harsh conditions, providing automatic, all-day, online response.

    DOI: 10.1093/pnasnexus/pgaf182 · PubMedID: 40501452 · PubMedCentralID: PMC12152476

  • Learning Temporal Consistency for High Spatial Resolution Remote Sensing Imagery Semantic Change Detection IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING Tian, S., Ma, A., Zheng, Z., Tan, X., Zhong, Y. 2025; 63
  • Towards transferable building damage assessment via unsupervised single-temporal change adaptation REMOTE SENSING OF ENVIRONMENT Zheng, Z., Zhong, Y., Zhang, L., Burke, M., Lobell, D. B., Ermon, S. 2024; 315
  • Unifying remote sensing change detection via deep probabilistic change models: From principles, models to applications ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING Zheng, Z., Zhong, Y., Zhao, J., Ma, A., Zhang, L. 2024; 215: 239-255
  • Global road extraction using a pseudo-label guided framework: from benchmark dataset to cross-region semi-supervised learning GEO-SPATIAL INFORMATION SCIENCE Lu, X., Zhong, Y., Zheng, Z., Wang, J., Chen, D., Su, Y. 2024
  • EarthVQANet: Multi-task visual question answering for remote sensing image understanding ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING Wang, J., Ma, A., Chen, Z., Zheng, Z., Wan, Y., Zhang, L., Zhong, Y. 2024; 212: 422-439
  • LoveNAS: Towards multi-scene land-cover mapping via hierarchical searching adaptive network ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING Wang, J., Zhong, Y., Ma, A., Zheng, Z., Wan, Y., Zhang, L. 2024; 209: 265-278
  • MAPCHANGE: ENHANCING SEMANTIC CHANGE DETECTION WITH TEMPORAL-INVARIANT HISTORICAL MAPS BASED ON DEEP TRIPLET NETWORK Liu, Y., Shi, S., Zheng, Z., Wang, J., Tian, S., Zhong, Y. IEEE. 2024: 7653-7656
  • LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images Zhang, J., Fang, I., Wu, H., Kaushik, A., Rodriguez, A., Zhao, H., Zhang, J., Zheng, Z., Iovita, R., Feng, C. IEEE COMPUTER SOC. 2024: 22563-22573
  • EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering Wang, J., Zheng, Z., Chen, Z., Ma, A., Zhong, Y., Dy, J., Natarajan, S., Wooldridge, M. ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE. 2024: 5481-5489
  • Adaptive Self-Supporting Prototype Learning for Remote Sensing Few-Shot Semantic Segmentation IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING Shen, W., Ma, A., Wang, J., Zheng, Z., Zhong, Y. 2024; 62
  • Temporal-agnostic change region proposal for semantic change detection ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING Tian, S., Tan, X., Ma, A., Zheng, Z., Zhang, L., Zhong, Y. 2023; 204: 306-320
  • Explicable Fine-Grained Aircraft Recognition Via Deep Part Parsing Prior Framework for High-Resolution Remote Sensing Imagery IEEE TRANSACTIONS ON CYBERNETICS Chen, D., Zhong, Y., Ma, A., Zheng, Z., Zhang, L. 2023; PP

    Abstract

    Aircraft recognition is crucial in both civil and military fields, and high-spatial resolution remote sensing has emerged as a practical approach. However, existing data-driven methods fail to locate discriminative regions for effective feature extraction due to limited training data, leading to poor recognition performance. To address this issue, we propose a knowledge-driven deep learning method called the explicable aircraft recognition framework based on a part parsing prior (APPEAR). APPEAR explicitly models the aircraft's rigid structure as a pixel-level part parsing prior, dividing it into five parts: 1) the nose; 2) left wing; 3) right wing; 4) fuselage; and 5) tail. This fine-grained prior provides reliable part locations to delineate aircraft architecture and imposes spatial constraints among the parts, effectively reducing the search space for model optimization and identifying subtle interclass differences. A knowledge-driven aircraft part attention (KAPA) module uses this prior to achieve a geometry-invariant representation for identifying discriminative features. Part features are generated by part indexing in a specific order and sequentially embedded into a compact space to obtain a fixed-length representation for each part, invariant to aircraft orientation and scale. The part attention module then takes the embedded part features, adaptively reweights their importance to identify discriminative parts, and aggregates them for recognition. The proposed APPEAR framework is evaluated on two aircraft recognition datasets and achieves superior performance. Moreover, experiments with few-shot learning methods demonstrate the robustness of our framework in different tasks. Ablation analysis illustrates that the fuselage and wings of the aircraft are the most effective parts for recognition.
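The fixed-order part indexing behind KAPA can be sketched as follows. The five part names come from the abstract; the plain average pooling is a simplifying assumption (the actual module additionally learns attention weights per part), so treat this as an illustration of why a fixed part order yields a fixed-length, orientation- and scale-invariant descriptor:

```python
import numpy as np

# Fixed part order from the abstract; iterating in this order makes the
# concatenated descriptor independent of aircraft orientation and scale.
PART_ORDER = ["nose", "left_wing", "right_wing", "fuselage", "tail"]

def part_pooled_descriptor(features, part_masks):
    """Average-pool a (C, H, W) feature map inside each boolean part mask,
    concatenating parts in PART_ORDER to get a C * 5 length descriptor.
    Missing parts (empty masks) contribute a zero vector."""
    c = features.shape[0]
    chunks = []
    for name in PART_ORDER:
        mask = part_masks[name]
        if mask.sum() == 0:
            chunks.append(np.zeros(c))
        else:
            chunks.append(features[:, mask].mean(axis=1))
    return np.concatenate(chunks)
```

Because the descriptor slots are indexed by part identity rather than image position, two aircraft at different rotations or scales map to comparable vectors as long as their parts are parsed correctly.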

    DOI: 10.1109/TCYB.2023.3293033 · PubMedID: 37552595

  • Large-scale agricultural greenhouse extraction for remote sensing imagery based on layout attention network: A case study of China ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING Chen, D., Ma, A., Zheng, Z., Zhong, Y. 2023; 200: 73-88
  • Large-scale deep learning based binary and semantic change detection in ultra high resolution remote sensing imagery: From benchmark datasets to urban application ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING Tian, S., Zhong, Y., Zheng, Z., Ma, A., Tan, X., Zhang, L. 2022; 193: 164-186
  • Cross-sensor domain adaptation for high spatial resolution urban land-cover mapping: From airborne to spaceborne imagery REMOTE SENSING OF ENVIRONMENT Wang, J., Ma, A., Zhong, Y., Zheng, Z., Zhang, L. 2022; 277
  • GRE AND BEYOND: A GLOBAL ROAD EXTRACTION DATASET Lu, X., Zhong, Y., Zheng, Z., Chen, D. IEEE. 2022: 3035-3038
  • A Supervised Progressive Growing Generative Adversarial Network for Remote Sensing Image Scene Classification IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING Ma, A., Yu, N., Zheng, Z., Zhong, Y., Zhang, L. 2022; 60
  • Cascaded Multi-Task Road Extraction Network for Road Surface, Centerline, and Edge Extraction IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING Lu, X., Zhong, Y., Zheng, Z., Chen, D., Su, Y., Ma, A., Zhang, L. 2022; 60
  • National-scale greenhouse mapping for high spatial resolution remote sensing imagery using a dense object dual-task deep learning framework: A case study of China ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING Ma, A., Chen, D., Zhong, Y., Zheng, Z., Zhang, L. 2021; 181: 279-294
  • Cross-domain road detection based on global-local adversarial learning framework from very high resolution satellite imagery ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING Lu, X., Zhong, Y., Zheng, Z., Wang, J. 2021; 180: 296-312
  • FactSeg: Foreground Activation-Driven Small Object Semantic Segmentation in Large-Scale Remote Sensing Imagery IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING Ma, A., Wang, J., Zhong, Y., Zheng, Z. 2022; 60
  • Urban road mapping based on an end-to-end road vectorization mapping network framework ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING Chen, D., Zhong, Y., Zheng, Z., Ma, A., Lu, X. 2021; 178: 345-365
  • A Spectral-Spatial-Dependent Global Learning Framework for Insufficient and Imbalanced Hyperspectral Image Classification IEEE TRANSACTIONS ON CYBERNETICS Zhu, Q., Deng, W., Zheng, Z., Zhong, Y., Guan, Q., Lin, W., Zhang, L., Li, D. 2022; 52 (11): 11709-11723

    Abstract

    Deep learning techniques have been widely applied to hyperspectral image (HSI) classification and have achieved great success. However, the deep neural network model has a large parameter space and requires a large number of labeled data. Deep learning methods for HSI classification usually follow a patchwise learning framework. Recently, a fast patch-free global learning (FPGA) architecture was proposed for HSI classification according to global spatial context information. However, FPGA has difficulty in extracting the most discriminative features when the sample data are imbalanced. In this article, a spectral-spatial-dependent global learning (SSDGL) framework based on the global convolutional long short-term memory (GCL) and global joint attention mechanism (GJAM) is proposed for insufficient and imbalanced HSI classification. In SSDGL, the hierarchically balanced (H-B) sampling strategy and the weighted softmax loss are proposed to address the imbalanced sample problem. To effectively distinguish similar spectral characteristics of land cover types, the GCL module is introduced to extract the long short-term dependency of spectral features. To learn the most discriminative feature representations, the GJAM module is proposed to extract attention areas. The experimental results obtained with three public HSI datasets show that the SSDGL has powerful performance in insufficient and imbalanced sample problems and is superior to other state-of-the-art methods.
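A class-weighted softmax cross-entropy of the general kind the abstract's "weighted softmax loss" describes can be sketched as follows; the inverse-frequency weights are an illustrative assumption, and the paper's exact weighting (paired with H-B sampling) may differ:

```python
import numpy as np

def weighted_softmax_loss(logits, labels, eps=1e-7):
    """Class-weighted cross-entropy for imbalanced classification.
    logits: (N, K) raw scores; labels: (N,) integer class ids.
    Weights are inverse class frequencies normalized so that a perfectly
    balanced dataset gives every class weight 1."""
    n, k = logits.shape
    counts = np.bincount(labels, minlength=k).astype(float)
    weights = np.where(counts > 0, counts.sum() / (k * np.maximum(counts, 1.0)), 0.0)
    # numerically stable softmax
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    ce = -np.log(np.clip(probs[np.arange(n), labels], eps, None))
    return float((weights[labels] * ce).mean())
```

Under this scheme, errors on rare land-cover classes are amplified relative to errors on abundant ones, which is the imbalance-countering behavior the abstract attributes to the weighted softmax loss.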

    DOI: 10.1109/TCYB.2021.3070577 · Web of Science ID: 000732244900001 · PubMedID: 34033562

  • GAMSNet: Globally aware road detection network with multi-scale residual learning ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING Lu, X., Zhong, Y., Zheng, Z., Zhang, L. 2021; 175: 340-352
  • RSNet: The Search for Remote Sensing Deep Neural Networks in Recognition Tasks IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING Wang, J., Zhong, Y., Zheng, Z., Ma, A., Zhang, L. 2021; 59 (3): 2520-2534
  • COLOR: Cycling, Offline Learning, and Online Representation Framework for Airport and Airplane Detection Using GF-2 Satellite Images IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING Zhong, Y., Zheng, Z., Ma, A., Lu, X., Zhang, L. 2020; 58 (12): 8438-8449
  • Edge-Reinforced Convolutional Neural Network for Road Detection in Very-High-Resolution Remote Sensing Imagery PHOTOGRAMMETRIC ENGINEERING AND REMOTE SENSING Lu, X., Zhong, Y., Zheng, Z., Zhao, J., Zhang, L. 2020; 86 (3): 153-160
  • A NOVEL GLOBAL-AWARE DEEP NETWORK FOR ROAD DETECTION OF VERY HIGH RESOLUTION REMOTE SENSING IMAGERY Lu, X., Zhong, Y., Zheng, Z. IEEE. 2020: 2579-2582
  • Multi-Scale and Multi-Task Deep Learning Framework for Automatic Road Extraction IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING Lu, X., Zhong, Y., Zheng, Z., Liu, Y., Zhao, J., Ma, A., Yang, J. 2019; 57 (11): 9362-9377
  • S3NET: TOWARDS REAL-TIME HYPERSPECTRAL IMAGERY CLASSIFICATION Zheng, Z., Zhong, Y. IEEE. 2019: 3293-3296
  • POP-NET: ENCODER-DUAL DECODER FOR SEMANTIC SEGMENTATION AND SINGLE-VIEW HEIGHT ESTIMATION Zheng, Z., Zhong, Y., Wang, J. IEEE. 2019: 4963-4966
  • Deep Salient Feature Based Anti-Noise Transfer Network for Scene Classification of Remote Sensing Imagery REMOTE SENSING Gong, X., Xie, Z., Liu, Y., Shi, X., Zheng, Z. 2018; 10 (3)

    DOI: 10.3390/rs10030410 · Web of Science ID: 000428280100056

  • COLOR: CYCLING OFFLINE LEARNING AND ONLINE REPRESENTING FOR REMOTE SENSING DATAFLOW Zheng, Z., Zhong, Y. IEEE. 2018: 4093-4096
  • Multi-channel Pose-aware Convolution Neural Networks for Multi-view Facial Expression Recognition Liu, Y., Zeng, J., Shan, S., Zheng, Z. IEEE. 2018: 458-465