Research Outputs | WoS | Scopus | TR-Dizin | PubMed

Permanent URI for this community: https://hdl.handle.net/20.500.14719/1741


Search Results

Now showing 1 - 10 of 89
  • Publication
    Stereoscopic video deblurring transformer
    (NATURE PORTFOLIO, 2024) Imani, Hassan; Islam, Md Baharul; Junayed, Masum Shah; Ahad, Md Atiqur Rahman; Bahcesehir University; State University System of Florida; Florida Gulf Coast University; University of Connecticut; University of East London
    Stereoscopic cameras, such as those in mobile phones and various recent intelligent systems, are becoming increasingly common. Multiple variables can impact stereo video quality, e.g., blur distortion due to camera or object movement. Monocular image/video deblurring is a mature research field, while there is limited research on deblurring stereoscopic content. This paper introduces a new Transformer-based stereo video deblurring framework with two crucial new parts: a self-attention layer and a feed-forward layer that capture and align the correlations among video frames. The traditional fully connected (FC) self-attention layer fails to exploit data locality effectively, as it depends on linear layers for computing attention maps. The Vision Transformer has the same limitation, as it takes image patches as inputs to model global spatial information. 3D convolutional neural networks (3D CNNs) process successive frames to correct motion blur in the stereo video. Besides, our method uses information from the other stereo viewpoint to assist deblurring. The parallax attention module (PAM) is significantly improved to combine stereo and cross-view information for better deblurring. An extensive ablation study on two publicly available stereo video datasets validates that our method deblurs stereo videos efficiently. Experimental results demonstrate that our approach outperforms state-of-the-art image and video deblurring techniques by a large margin.
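    The core idea behind parallax attention is that each position in one view attends to all horizontal positions in the other view along the epipolar line. Below is a minimal PyTorch sketch of such a cross-view attention layer; the layer sizes, 1x1 projections, and residual fusion are illustrative assumptions, not the authors' improved PAM.

```python
import math

import torch
import torch.nn as nn

class ParallaxAttention(nn.Module):
    """Cross-view attention along the width (epipolar) axis.

    Each left-view position attends to every horizontal position in the
    right view. Sizes and the residual fusion are assumptions for
    illustration; the paper's improved PAM differs in detail.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
        # left, right: (B, C, H, W) feature maps from the two views
        b, c, h, w = left.shape
        q = self.query(left).permute(0, 2, 3, 1).reshape(b * h, w, c)   # (B*H, W, C)
        k = self.key(right).permute(0, 2, 1, 3).reshape(b * h, c, w)    # (B*H, C, W)
        v = self.value(right).permute(0, 2, 3, 1).reshape(b * h, w, c)  # (B*H, W, C)
        attn = torch.softmax(q @ k / math.sqrt(c), dim=-1)              # (B*H, W, W)
        fused = (attn @ v).reshape(b, h, w, c).permute(0, 3, 1, 2)      # (B, C, H, W)
        return left + fused  # residual fusion of cross-view information

left = torch.randn(1, 32, 16, 24)
right = torch.randn(1, 32, 16, 24)
out = ParallaxAttention(32)(left, right)  # (1, 32, 16, 24)
```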
  • Publication
    ARVA: An Augmented Reality-Based Visual Aid for Mobility Enhancement Through Real-Time Video Stream Transformation
    (IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2024) Sadeghzadeh, Arezoo; Islam, Md Baharul; Uddin, Md Nur; Aydin, Tarkan; Bahcesehir University; State University System of Florida; Florida Gulf Coast University; Daffodil International University
    Visual field loss (VFL) is a persistent visual impairment characterized by limited vision spots (scotomas) within the normal visual field, significantly impacting daily activities for affected individuals. Current Virtual Reality (VR) and Augmented Reality (AR)-based visual aids suffer from low video quality, content loss, high levels of contradiction, and limited mobility assessment. To address these issues, we propose an innovative vision aid utilizing an AR headset and integrating advanced video processing techniques to elevate the visual perception of individuals with moderate to severe VFL to levels comparable to those with unimpaired vision. Our approach introduces a pioneering optimal video remapping function tailored to the characteristics of the AR glasses. This function strategically maps the content of live video captures to the largest intact region of the visual field map, preserving quality while minimizing blurriness and content distortion. To evaluate the performance of our proposed method, a comprehensive empirical user study is conducted, including object counting and multi-tasking walking track tests, involving 15 subjects with artificially induced scotomas in their normal visual fields. The proposed vision aid achieves a 41.56% enhancement (from 57.31% to 98.87%) in the mean value of the average object recognition rates for all subjects in the object counting test. In the walking track test, the average mean scores for obstacle avoidance, detected signs, recognized signs, and grasped objects are significantly enhanced after applying the remapping function, with improvements of 7.56% (91.10% to 98.66%), 51.81% (44.85% to 96.66%), 49.31% (43.18% to 92.49%), and 77.77% (13.33% to 91.10%), respectively. Statistical analysis of the data before and after applying the remapping function demonstrates the promising performance of our method in enhancing visual awareness and mobility for individuals with VFL.
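    The remapping idea can be sketched simply: locate the largest intact region of a binary visual-field mask and shrink the live frame into it. The axis-aligned bounding-box heuristic and function name below are assumptions, not the paper's optimal remapping function, which is additionally tailored to the AR glasses' optics.

```python
import cv2
import numpy as np

def remap_frame(frame: np.ndarray, vf_mask: np.ndarray) -> np.ndarray:
    """Shrink a live camera frame into the intact region of the visual field.

    frame:   (H, W, 3) BGR frame from the headset camera.
    vf_mask: (H, W) binary map, 1 = intact vision, 0 = scotoma.
    The bounding box of intact pixels is a simplification; the paper
    optimizes the mapping for quality and minimal distortion.
    """
    ys, xs = np.nonzero(vf_mask)
    x0, x1 = xs.min(), xs.max()
    y0, y1 = ys.min(), ys.max()
    target = (int(x1 - x0 + 1), int(y1 - y0 + 1))  # (width, height) for cv2
    resized = cv2.resize(frame, target, interpolation=cv2.INTER_AREA)
    out = np.zeros_like(frame)
    out[y0:y1 + 1, x0:x1 + 1] = resized  # place content inside intact region
    return out
```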
  • Publication
    An efficient end-to-end deep neural network for interstitial lung disease recognition and classification
    (Tubitak Scientific & Technological Research Council Turkey, 2022) Junayed, Masum Shah; Jeny, Afsana Ahsan; Islam, Md Baharul; Ahmed, Ikhtiar; Shah, Afm Shahen; Bahcesehir University; Daffodil International University; Dortmund University of Technology; Yildiz Technical University
    The automated classification of Interstitial Lung Diseases (ILDs) is essential for assisting clinicians during the diagnosis process. Detecting and classifying ILD patterns is a challenging problem. This paper introduces an end-to-end deep convolutional neural network (CNN) for classifying ILD patterns. The proposed model comprises four convolutional layers with different kernel sizes and the Rectified Linear Unit (ReLU) activation function, followed by batch normalization and max-pooling with a size equal to the final feature map size, as well as four dense layers. We used the Adam optimizer to minimize categorical cross-entropy. A dataset consisting of 21328 image patches from 128 CT scans with five classes is used to train and assess the proposed model. A comparative study with five-fold cross-validation on the same dataset showed that the presented model outperformed pre-trained CNNs. For ILD pattern classification, the proposed approach achieved an accuracy of 99.09% and an average F-score of 97.9%, outperforming three pre-trained CNNs. These outcomes show that the proposed model achieves state-of-the-art precision, recall, F-score, and accuracy.
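    The described topology is concrete enough to sketch in PyTorch: four conv layers with varying kernel sizes, ReLU and batch normalization, a final max-pool that spans the whole feature map, then four dense layers, trained with Adam on categorical cross-entropy. Channel counts and kernel sizes here are assumptions; the paper's exact values may differ.

```python
import torch
import torch.nn as nn

class ILDNet(nn.Module):
    """Sketch of the described CNN; exact widths/kernels are assumptions."""
    def __init__(self, num_classes: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=7, padding=3), nn.ReLU(), nn.BatchNorm2d(32),
            nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.BatchNorm2d(64),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.BatchNorm2d(128),
            nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(), nn.BatchNorm2d(256),
            nn.AdaptiveMaxPool2d(1),  # pool size equals the final feature-map size
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, num_classes),  # logits; loss below applies softmax
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = ILDNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()  # categorical cross-entropy, as in the paper
logits = model(torch.randn(2, 1, 128, 128))  # 128 x 128 CT patches
```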
  • Publication
    Assistive Visual Tool: Enhancing Safe Navigation with Video Remapping in AR Headsets
    (SPRINGER INTERNATIONAL PUBLISHING AG, 2025) Sadeghzadeh, Arezoo; Islam, Md Baharul; Uddin, Md Nur; Aydin, Tarkan; DelBue, A; Canton, C; Pont-Tuset, J; Tommasi, T; Bahcesehir University; State University System of Florida; Florida Gulf Coast University
    Visual Field Loss (VFL) is characterized by blind spots, or scotomas, that have a detrimental impact on individuals' fundamental movement activities. Addressing the challenges (e.g., low video quality, content loss, high levels of contradiction, and limited mobility assessment) faced by existing Extended Reality (XR) systems as vision aids, we introduce a groundbreaking method that enriches real-time navigation using Augmented Reality (AR) glasses. Our novel vision aid employs advanced video processing techniques to enhance visual perception in individuals with moderate to severe VFL, bridging the gap to healthy vision. A unique optimal video remapping function, tailored to the characteristics of our selected AR glasses, dynamically maps live video content to the largest intact region of the Visual Field (VF) map. Our method preserves video quality, minimizing blurriness and distortion. Through a comprehensive empirical user study involving 29 subjects with artificially induced scotomas, statistical analyses of object counting and multi-tasking walking track tests demonstrate the promising performance of our method in enhancing visual awareness and navigation capability in real time.
  • Publication
    MLMSign: Multi-lingual multi-modal illumination-invariant sign language recognition
    (ELSEVIER, 2024) Sadeghzadeh, Arezoo; Shah, A. F. M. Shahen; Islam, Md Baharul; Bahcesehir University; Yildiz Technical University; State University System of Florida; Florida Gulf Coast University
    Sign language (SL) serves as a visual communication tool of great significance for deaf people to interact with others and facilitate their daily life. The wide variety of SLs and the lack of interpretation knowledge necessitate developing automated sign language recognition (SLR) systems to attenuate the communication gap between the deaf and hearing communities. Despite numerous advanced static SLR systems, they are not practical and favorable enough for real-life scenarios once assessed simultaneously from different critical aspects: accuracy in dealing with high intra- and slight inter-class variations, robustness, computational complexity, and generalization ability. To this end, we propose a novel multi-lingual multi-modal SLR system, namely MLMSign, by taking full advantage of hand-crafted features and deep learning models to enhance the performance and robustness of the system against illumination changes while minimizing computational cost. The RGB sign images and 2D visualizations of their hand-crafted features, i.e., Histogram of Oriented Gradients (HOG) features and the a* channel of the L*a*b* color space, are employed as three input modalities to train a novel Convolutional Neural Network (CNN). The number of layers, filters, kernel size, learning rate, and optimization technique are carefully selected through an extensive parametric study to minimize the computational cost without compromising accuracy. The system's performance and robustness are significantly enhanced by jointly deploying the models of these three modalities through ensemble learning. The impact of each modality is optimized based on its impact coefficient, determined by grid search. In addition to the comprehensive quantitative assessment, the capabilities of our proposed model and the effectiveness of ensembling over three modalities are evaluated qualitatively using the Grad-CAM visualization model. Experimental results on test data with additional illumination changes verify the high robustness of our system in dealing with overexposed and underexposed lighting conditions. Achieving a high accuracy (>99.33%) on six benchmark datasets (i.e., Massey, Static ASL, NUS II, TSL Fingerspelling, BdSL36v1, and PSL) demonstrates that our system notably outperforms recent state-of-the-art approaches with a minimum number of parameters and high generalization ability over complex datasets. Its promising performance for four different sign languages makes it a feasible system for multi-lingual applications.
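    The three input modalities are straightforward to derive from a single RGB sign image; a hedged sketch using scikit-image follows. HOG parameters are illustrative, not the paper's tuned values, and the weighted ensemble at the end mirrors the impact-coefficient idea without claiming the authors' coefficients.

```python
import numpy as np
from skimage import color, feature

def modalities(rgb: np.ndarray):
    """Build the three described input modalities from one RGB sign image:
    the raw RGB image, a HOG visualization, and the a* channel of L*a*b*.
    HOG parameters below are assumptions, not the paper's exact settings.
    """
    gray = color.rgb2gray(rgb)
    _, hog_image = feature.hog(
        gray, orientations=9, pixels_per_cell=(8, 8),
        cells_per_block=(2, 2), visualize=True)
    a_star = color.rgb2lab(rgb)[..., 1]  # chroma channel, robust to illumination
    return rgb, hog_image, a_star

# Ensemble by impact coefficients (the paper finds them via grid search):
# p = w_rgb * p_rgb + w_hog * p_hog + w_a * p_a, with the weights summing to 1.
```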
  • Publication
    CataractNet: An Automated Cataract Detection System Using Deep Learning for Fundus Images
    (IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2021) Junayed, Masum Shah; Islam, Md Baharul; Sadeghzadeh, Arezoo; Rahman, Saimunur; Daffodil International University; Bahcesehir University; Commonwealth Scientific & Industrial Research Organisation (CSIRO); CSIRO Data61
    Cataract is one of the most common eye disorders, causing vision distortion. Accurate and timely detection of cataracts is the best way to control the risk and avoid blindness. Recently, artificial intelligence-based cataract detection systems have received research attention. In this paper, a novel deep neural network, namely CataractNet, is proposed for automatic cataract detection in fundus images. The loss and activation functions are tuned to train the network with small kernels and fewer training parameters and layers. Thus, the computational cost and average running time of CataractNet are significantly reduced compared to other pre-trained Convolutional Neural Network (CNN) models. The proposed network is optimized with the Adam optimizer. A total of 1130 cataract and non-cataract fundus images are collected and, to avoid over-fitting, augmented to 4746 images before model training. Experimental results prove that the proposed method outperforms state-of-the-art cataract detection approaches with an average accuracy of 99.13%.
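    Expanding roughly 1130 images toward 4746 training samples is typically done with geometric and photometric augmentation. The torchvision pipeline below is a plausible sketch only; the abstract does not specify which transforms CataractNet used, so every choice here is an assumption.

```python
import torchvision.transforms as T

# Hypothetical augmentation pipeline for fundus images; the specific
# transforms and parameters are assumptions, not CataractNet's recipe.
augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomRotation(degrees=15),
    T.ColorJitter(brightness=0.2, contrast=0.2),
    T.RandomResizedCrop(size=224, scale=(0.8, 1.0)),
    T.ToTensor(),  # expects a PIL image; yields a (3, 224, 224) float tensor
])
```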
  • Publication
    SkNet: A Convolutional Neural Networks Based Classification Approach for Skin Cancer Classes
    (IEEE, 2020) Jeny, Afsana Ahsan; Sakib, Abu Noman Md; Junayed, Masum Shah; Lima, Khadija Akter; Ahmed, Ikhtiar; Islam, Md Baharul; Daffodil International University; Khulna University of Engineering & Technology (KUET); Bahcesehir University
    Skin cancer is one of the most common types of cancer, and a solution to this globally recognized health problem is much needed. Machine learning techniques have brought revolutionary changes to the field of biomedical research. Previously, detecting skin cancers took a significant amount of time and effort. In recent years, much work has been done with deep learning, making the process considerably faster and more accurate. In this paper, we propose a novel Convolutional Neural Network (CNN)-based approach that can classify four different types of skin cancer. Our model, SkNet, consists of 19 convolution layers. The proposed model achieves an accuracy of 95.26% on a dataset of 4800 images, exceeding the previous best of 80.52% reported on 1000 images.
  • Publication
    Spatial-Temporal Coherence in Extreme Video Retargeting for Consumer Screening Devices
    (IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2025) Imani, Hassan; Islam, Md Baharul; Bahcesehir University; State University System of Florida; Florida Gulf Coast University
    The accessibility of diverse display devices with varying aspect ratios has drawn much research attention to video retargeting. Inconsistent video retargeting can significantly degrade a video's spatial and temporal quality, particularly in extreme retargeting cases. Since there are no perfectly annotated datasets for video retargeting, deep learning-based techniques are rarely utilized. This paper proposes a method that learns to retarget videos by detecting salient areas and shifting them to the appropriate location. First, we segment the salient objects using a unified Transformer model. Using convolutional layers and a shifting strategy, we shift and warp objects to the appropriate size and location in the frame. We use 1D convolution to move the salient items in the scene. Additionally, we employ a frame interpolation technique to preserve temporal information. To train the network, we feed the retargeted frames to a variational auto-encoder that maps them back to the input frames. Furthermore, we design perceptual and wavelet-based loss functions to train our model. Thus, the network is trained in an unsupervised manner. Extensive qualitative and quantitative experiments on the DAVIS dataset show the superiority of the proposed method over existing image- and video-based methods.
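    One way to picture the 1D-convolution shifting strategy: a 1D conv over per-column saliency predicts a horizontal offset for each image column, and columns are then resampled by that offset. The toy module below is a sketch under that interpretation; names, kernel size, and the nearest-neighbor resampling are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ColumnShifter(nn.Module):
    """Toy sketch: predict per-column horizontal shifts from saliency and
    resample columns accordingly. Sizes and resampling are assumptions.
    """
    def __init__(self):
        super().__init__()
        self.offset = nn.Conv1d(1, 1, kernel_size=9, padding=4)

    def forward(self, frame: torch.Tensor, saliency: torch.Tensor) -> torch.Tensor:
        # frame: (B, C, H, W); saliency: (B, 1, H, W) from the segmenter
        col_saliency = saliency.mean(dim=2)            # (B, 1, W) column profile
        shifts = self.offset(col_saliency).squeeze(1)  # (B, W) learned offsets
        b, c, h, w = frame.shape
        idx = torch.arange(w, device=frame.device).float()
        src = (idx + shifts).round().long().clamp(0, w - 1)   # source columns
        src = src[:, None, None, :].expand(b, c, h, w)
        return torch.gather(frame, dim=3, index=src)   # move columns

out = ColumnShifter()(torch.randn(1, 3, 64, 96), torch.rand(1, 1, 64, 96))
```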
  • Publication
    Advancing Retinal Image Segmentation: A Denoising Diffusion Probabilistic Model Perspective
    (IEEE COMPUTER SOC, 2024) Alimanov, Alnur; Islam, Md Baharul; Bahcesehir University; State University System of Florida; Florida Gulf Coast University
    Retinal images and vessel trees play a crucial role in aiding ophthalmologists to identify and diagnose various illnesses related to the eyes, blood vessels, and brain. However, manual retinal image segmentation is a laborious and highly skilled procedure, posing challenges in terms of both difficulty and time consumption. This study proposes a novel approach to retinal image segmentation, leveraging the Denoising Diffusion Probabilistic Model (DDPM) for precise performance. To the best of our knowledge, DDPM is being applied in this domain for the first time. Our approach incorporates a novel constraint to prevent the DDPM from generating vessel structures that are not present in the original retinal images during segmentation. Additionally, our model is not limited to the original DDPM size of 64 x 64 pixels; instead, we train it to effectively segment images sized 256 x 256 pixels. This is a significant advancement, since the original DDPM works exclusively with 64 x 64 images and is primarily designed for generating random image samples. In our work, we address both limitations with a novel, efficient approach for accurate retinal image segmentation. A comprehensive evaluation of our methodology includes both quantitative and qualitative assessments. Our proposed method demonstrates competitive performance compared to state-of-the-art techniques, as indicated by both qualitative and quantitative scores. The source code of our method can be accessed at https://github.com/AAleka/DDPM-segmentation.
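    For context, the forward process every DDPM builds on noises a clean sample x_0 according to a beta schedule: q(x_t | x_0) = N(sqrt(a_bar_t) x_0, (1 - a_bar_t) I). The sketch below shows this standard machinery only; it is not the authors' segmentation-specific constraint, and the schedule values are the common defaults.

```python
import torch

# Standard DDPM forward diffusion (not the paper's novel constraint).
T_steps = 1000
betas = torch.linspace(1e-4, 0.02, T_steps)        # common linear schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)     # cumulative product a_bar_t

def q_sample(x0: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) x_0, (1 - a_bar_t) I)."""
    a_bar = alphas_bar[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(x0)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

x0 = torch.randn(2, 1, 256, 256)             # vessel maps at 256 x 256
x_t = q_sample(x0, torch.tensor([10, 500]))  # two different noise levels
```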
  • Publication
    Improving Image Compression With Adjacent Attention and Refinement Block
    (IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2023) Jeny, Afsana Ahsan; Islam, Md Baharul; Junayed, Masum Shah; Das, Debashish; Daffodil International University; Bahcesehir University; Birmingham City University
    Recently, learned image compression algorithms have shown remarkable performance compared to classic hand-crafted image codecs. Despite their considerable achievements, their fundamental disadvantage is that they are not optimized for retaining local redundancies, particularly non-repetitive patterns, which has a detrimental influence on reconstruction quality. This paper introduces an efficient autoencoder-style image compression method containing three novel blocks, i.e., an adjacent attention block, a Gaussian merge block, and a decoded image refinement block, to improve overall compression performance. The adjacent attention block allocates the additional bits required to capture spatial correlations (both vertical and horizontal) and effectively removes worthless information. The Gaussian merge block assists rate-distortion optimization, while the decoded image refinement block corrects defects in low-resolution reconstructed images. A comprehensive ablation study analyzes and evaluates the qualitative and quantitative capabilities of the proposed model. Experimental results on two publicly available datasets reveal that our method outperforms state-of-the-art methods on the KODAK dataset (by around 4 dB and 5 dB) and the CLIC dataset (by about 4 dB and 3 dB) in terms of PSNR and MS-SSIM.
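    Learned codecs of this family are trained on the standard rate-distortion objective L = R + lambda * D, where R estimates the bitrate of the latents from an entropy model and D measures reconstruction error. The sketch below shows that generic objective; the random likelihoods stand in for a real entropy model output (the paper's Gaussian merge block serves that role), and the lambda value is arbitrary.

```python
import torch
import torch.nn.functional as F

def rate_distortion_loss(x, x_hat, likelihoods, lam=0.01):
    """Generic R + lambda*D objective for learned compression.

    likelihoods: per-element probabilities of the quantized latents,
    as produced by an entropy model (stand-in here, not the paper's).
    """
    num_pixels = x.shape[0] * x.shape[2] * x.shape[3]
    rate = -torch.log2(likelihoods).sum() / num_pixels  # bits per pixel
    distortion = F.mse_loss(x_hat, x)                   # MSE drives PSNR
    return rate + lam * distortion

x = torch.rand(1, 3, 64, 64)
x_hat = x + 0.05 * torch.randn_like(x)                    # toy reconstruction
likelihoods = torch.rand(1, 192, 4, 4).clamp(min=1e-9)    # stand-in entropy model
loss = rate_distortion_loss(x, x_hat, likelihoods)
```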