Category: 4. Zelia’s Research
Token-Based Global-Local Framework for Building Change Detection
ENG: The paper “Token-Based Global-Local Framework for Building Change Detection” by Bianca-Cerasela-Zelia Blaga, published in the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS), introduces RADNet, a Region-Aware Dual-path Network for building change detection in high-resolution remote sensing imagery. The study addresses the limitations of existing deep learning models that typically rely on uniform patch-based tokenization and global feature fusion, which often fail to preserve building-level semantics and temporal consistency. To overcome these challenges, the proposed architecture integrates an Instance-Consistent Tokenizer (ICT), which generates semantically aligned object-level tokens, and a Global-Local Attention Encoder (GLAE), which captures both contextual dependencies and geometric details.

Improving VQA Counting Accuracy for Post-Flood Damage Assessment
ENG: Floods cause massive destruction every year, making quick and accurate damage assessment critical for recovery efforts. Visual Question Answering (VQA) systems, which analyze images and answer related questions, can help streamline this process. However, many current systems struggle with tasks like counting flooded buildings, which is important for planning responses. To address this, researchers from the Technical University of Cluj-Napoca developed a new VQA system that combines advanced text and image analysis tools to improve accuracy in post-flood assessments.

Enhancing Semantic Segmentation of Remote Sensing Images with Transformer-Based Attention Mechanisms
ENG: In a recent paper published in the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Zelia Blaga and Sergiu Nedevschi introduced SwinFAN (Swin-based Focal Axial attention Network), a transformer-based framework designed to advance semantic segmentation of remote sensing images. The model uses a Swin transformer as its encoder, combined with two new components: the Guided Focal-Axial (GFA) attention module and the Attention-based Feature Refinement Head (AFRH). The GFA module enhances the model’s ability to process both local and global contextual information, making it particularly effective in complex urban environments captured by drones.
