Research Outputs | WoS | Scopus | TR-Dizin | PubMed

Permanent URI for this community: https://hdl.handle.net/20.500.14719/1741

Search Results

Now showing 1 - 10 of 16
  • Publication
    An efficient end-to-end deep neural network for interstitial lung disease recognition and classification
    (Tubitak Scientific & Technological Research Council Turkey, 2022) Junayed, Masum Shah; Jeny, Afsana Ahsan; Islam, Md Baharul; Ahmed, Ikhtiar; Shah, Afm Shahen; Bahcesehir University; Daffodil International University; Dortmund University of Technology; Yildiz Technical University
    The automated Interstitial Lung Diseases (ILDs) classification technique is essential for assisting clinicians during the diagnosis process. Detecting and classifying ILD patterns is a challenging problem. This paper introduces an end-to-end deep convolutional neural network (CNN) for classifying ILD patterns. The proposed model comprises four convolutional layers with different kernel sizes and the Rectified Linear Unit (ReLU) activation function, each followed by batch normalization, max-pooling with a size equal to the final feature map size, as well as four dense layers. We used the ADAM optimizer to minimize categorical cross-entropy. A dataset consisting of 21328 image patches from 128 CT scans with five classes is used to train and assess the proposed model under five-fold cross-validation. For ILD pattern classification, the proposed approach achieved an accuracy of 99.09% and an average F score of 97.9%, outperforming three pre-trained CNNs on the same dataset. These outcomes show that the proposed model is competitive with the state-of-the-art in precision, recall, F score, and accuracy.
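The four-convolution / four-dense architecture described above can be sketched in PyTorch. This is a minimal illustration: the specific kernel sizes, channel widths, and the 32x32 patch size are assumptions for demonstration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ILDNet(nn.Module):
    """Sketch of a four-conv / four-dense ILD classifier (dims assumed)."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            # Four conv layers with different kernel sizes, ReLU, then BN
            nn.Conv2d(1, 32, kernel_size=7, padding=3), nn.ReLU(), nn.BatchNorm2d(32),
            nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.BatchNorm2d(64),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.BatchNorm2d(128),
            nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(), nn.BatchNorm2d(256),
            # Max-pool with size equal to the final feature map size
            nn.AdaptiveMaxPool2d(1),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, num_classes),  # five ILD classes
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = ILDNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()  # categorical cross-entropy

logits = model(torch.randn(4, 1, 32, 32))        # batch of 4 CT patches
loss = criterion(logits, torch.tensor([0, 1, 2, 3]))
```

A training loop would repeatedly call `optimizer.zero_grad()`, `loss.backward()`, and `optimizer.step()` over mini-batches of patches.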
  • Publication
    Assistive Visual Tool: Enhancing Safe Navigation with Video Remapping in AR Headsets
    (SPRINGER INTERNATIONAL PUBLISHING AG, 2025) Sadeghzadeh, Arezoo; Islam, Md Baharul; Uddin, Md Nur; Aydin, Tarkan; DelBue, A; Canton, C; Pont-Tuset, J; Tommasi, T; Bahcesehir University; State University System of Florida; Florida Gulf Coast University
    Visual Field Loss (VFL) is characterized by blind spots or scotomas that pose a detrimental impact on individuals' fundamental movement activities. Addressing the challenges (e.g., low video quality, content loss, high levels of contradiction, and limited mobility assessment) faced by existing Extended Reality (XR) systems as vision aids, we introduce a groundbreaking method that enriches real-time navigation using Augmented Reality (AR) glasses. Our novel vision aid employs advanced video processing techniques to enhance visual perception in individuals with moderate to severe VFL, bridging the gap to healthy vision. A unique optimal video remapping function, tailored to the characteristics of our selected AR glasses, dynamically maps live video content to the largest intact region of the Visual Field (VF) map. Our method preserves video quality, minimizing blurriness and distortion. Through a comprehensive empirical user study involving 29 subjects with artificially induced scotomas, statistical analyses of object counting and multi-tasking walking track tests demonstrate the promising performance of our method in enhancing visual awareness and navigation capability in real time.
  • Publication
    MLMSign: Multi-lingual multi-modal illumination-invariant sign language recognition
    (ELSEVIER, 2024) Sadeghzadeh, Arezoo; Shah, A. F. M. Shahen; Islam, Md Baharul; Bahcesehir University; Yildiz Technical University; State University System of Florida; Florida Gulf Coast University
    Sign language (SL) serves as a visual communication tool bearing great significance for deaf people to interact with others and facilitate their daily life. Wide varieties of SLs and the lack of interpretation knowledge necessitate developing automated sign language recognition (SLR) systems to attenuate the communication gap between the deaf and hearing communities. Despite numerous advanced static SLR systems, they are not practical and favorable enough for real-life scenarios once assessed simultaneously from different critical aspects: accuracy in dealing with high intra- and slight inter-class variations, robustness, computational complexity, and generalization ability. To this end, we propose a novel multi-lingual multi-modal SLR system, namely MLMSign, by taking full advantage of hand-crafted features and deep learning models to enhance the performance and the robustness of the system against illumination changes while minimizing computational cost. The RGB sign images and 2D visualizations of their hand-crafted features, i.e., Histogram of Oriented Gradients (HOG) features and the a* channel of the L*a*b* color space, are employed as three input modalities to train a novel Convolutional Neural Network (CNN). The number of layers, filters, kernel size, learning rate, and optimization technique are carefully selected through an extensive parametric study to minimize the computational cost without compromising accuracy. The system's performance and robustness are significantly enhanced by jointly deploying the models of these three modalities through ensemble learning. The impact of each modality is optimized based on its impact coefficient determined by grid search. In addition to the comprehensive quantitative assessment, the capabilities of our proposed model and the effectiveness of ensembling over three modalities are evaluated qualitatively using the Grad-CAM visualization model.
Experimental results on the test data with additional illumination changes verify the high robustness of our system in dealing with overexposed and underexposed lighting conditions. Achieving a high accuracy (>99.33%) on six benchmark datasets (i.e., Massey, Static ASL, NUS II, TSL Fingerspelling, BdSL36v1, and PSL) demonstrates that our system notably outperforms recent state-of-the-art approaches with a minimum number of parameters and high generalization ability over complex datasets. Its promising performance on four different sign languages makes it a feasible system for multi-lingual applications.
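The ensemble step described above, soft-voting over the three modality models with impact coefficients chosen by grid search, can be sketched as follows. The modality probabilities here are random stand-ins and the coefficient grid is an assumption; the paper's actual models and search space are not reproduced.

```python
import numpy as np
from itertools import product

# Hypothetical validation-set probabilities from three modality models
# (RGB, HOG image, a* channel); random stand-ins, not the paper's outputs.
rng = np.random.default_rng(0)
n, k = 20, 5
y_val = rng.integers(0, k, n)
probs = []
for _ in range(3):
    p = rng.random((n, k))
    p[np.arange(n), y_val] += 0.5          # weakly informative model
    probs.append(p / p.sum(axis=1, keepdims=True))

def ensemble_predict(probs, weights):
    """Weighted soft-voting over per-modality class probabilities."""
    w = np.asarray(weights, dtype=float)
    fused = sum(wi * p for wi, p in zip(w / w.sum(), probs))
    return fused.argmax(axis=1)

# Grid search for the impact coefficient of each modality.
best_acc, best_w = -1.0, None
for w in product((0.25, 0.5, 0.75, 1.0), repeat=3):
    acc = (ensemble_predict(probs, w) == y_val).mean()
    if acc > best_acc:
        best_acc, best_w = acc, w
```

In practice the coefficients would be selected on a held-out validation split and then frozen for test-time inference.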
  • Publication
    Advancing Retinal Image Segmentation: A Denoising Diffusion Probabilistic Model Perspective
    (IEEE COMPUTER SOC, 2024) Alimanov, Alnur; Islam, Md Baharul; Bahcesehir University; State University System of Florida; Florida Gulf Coast University
    Retinal images and vessel trees play a crucial role in aiding ophthalmologists to identify and diagnose various illnesses related to the eyes, blood vessels, and brain. However, manual retinal image segmentation is a laborious and highly skilled procedure, posing challenges in terms of both difficulty and time consumption. This study proposes a novel approach to retinal image segmentation, leveraging the Denoising Diffusion Probabilistic Model (DDPM) for precise performance. To the best of our knowledge, this is the first time DDPM has been applied in this domain. Our approach incorporates a novel constraint to prevent DDPM from generating vessel structures that are not present in the original retinal images during the segmentation process. Additionally, our model is not limited to the original DDPM size of 64 x 64 pixels. Instead, we train it to effectively segment images sized 256 x 256 pixels. This is a significant advancement since the original DDPM works exclusively with 64 x 64 image sizes and is primarily designed for generating random image samples. In our work, we address both limitations with a novel, efficient approach for accurate retinal image segmentation. A comprehensive evaluation of our methodology includes both quantitative and qualitative assessments. Our proposed method demonstrates competitive performance compared to state-of-the-art techniques, as indicated by both qualitative and quantitative scores. The source code of our method can be accessed at https://github.com/AAleka/DDPM-segmentation.
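For context, the DDPM machinery underlying this kind of approach starts from the standard forward (noising) process. The sketch below shows that generic step at the 256 x 256 resolution mentioned above; it is textbook DDPM, not the paper's constrained segmentation pipeline.

```python
import torch

def q_sample(x0, t, betas):
    """DDPM forward (noising) process:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps.
    Generic DDPM machinery, not the paper's exact segmentation model."""
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t]
    eps = torch.randn_like(x0)
    return alpha_bar.sqrt() * x0 + (1.0 - alpha_bar).sqrt() * eps, eps

betas = torch.linspace(1e-4, 0.02, 1000)     # standard linear schedule
x0 = torch.randn(1, 1, 256, 256)             # stand-in for a retinal image
xt, eps = q_sample(x0, torch.tensor(500), betas)
```

A segmentation DDPM then trains a U-Net-style denoiser to predict `eps` from `xt` (conditioned on the input image) and reverses the process at inference time to produce a vessel map.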
  • Publication
    ASD-EVNet: An Ensemble Vision Network based on Facial Expression for Autism Spectrum Disorder Recognition
    (IEEE, 2023) Jaby, Assil; Islam, Md Baharul; Ahad, Md Atiqur Rahman; Bahcesehir University; University of East London
    Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that affects individuals' social interaction, communication, and behavior. Early diagnosis and intervention are critical for the well-being and development of children with ASD. Available methods for diagnosing ASD are unreliable (or have limited accuracy) or require significant time and resources. We aim to enhance the precision of ASD diagnosis by utilizing facial expressions, a readily accessible and less time-consuming approach. This paper presents the ASD Ensemble Vision Network (ASD-EVNet) for recognizing ASD based on facial expressions. The model utilizes three Vision Transformer (ViT) architectures, pre-trained on ImageNet-21K and fine-tuned on the ASD dataset. We also develop an extensive facial-expression-based ASD dataset for children (FADC). The ensemble learning model was then created by combining the predictions of the three ViT models and feeding them to a classifier. Our experiments demonstrate that the proposed ensemble learning model achieves state-of-the-art results in detecting ASD based on facial expressions.
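The stacking step described above, combining the three ViTs' predictions and feeding them to a final classifier, can be sketched like this. The ViT outputs below are random stand-ins (real ones would come from the fine-tuned models), and logistic regression is one plausible choice of final classifier, not necessarily the paper's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n, classes = 60, 2                      # ASD vs. typically developing
y = rng.integers(0, classes, n)
vit_probs = []
for _ in range(3):                      # stand-ins for three fine-tuned ViTs
    p = rng.random((n, classes))
    p[np.arange(n), y] += 1.0           # each model favors the true label
    vit_probs.append(p)

X = np.hstack(vit_probs)                # (n, 3 * classes) stacked predictions
meta = LogisticRegression().fit(X, y)   # final classifier over the ensemble
acc = meta.score(X, y)
```

In a real pipeline the meta-classifier is fit on held-out predictions to avoid the base models leaking their training labels.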
  • Publication
    Machine Vision-Based Expert System for Automated Skin Cancer Detection
    (SPRINGER INTERNATIONAL PUBLISHING AG, 2022) Junayed, Masum Shah; Jeny, Afsana Ahsan; Rada, Lavdie; Islam, Md Baharul; BritoLoeza, C; MartinGonzalez, A; CastanedaZeman, V; Safi, A; Bahcesehir University; Daffodil International University
    Skin cancer is the most frequently occurring kind of cancer, accounting for about one-third of all cases. Automatic early detection without expert intervention for a visual inspection would be of great help for society. Image processing and machine learning methods have significantly contributed to medical and biomedical research, resulting in fast and exact inspection in different problems. One such problem is accurate cancer detection and classification. In this study, we introduce an expert system based on image processing and machine learning for skin cancer detection and classification. The proposed approach consists of three significant steps: pre-processing, feature extraction, and classification. The pre-processing step uses grayscale conversion, a Gaussian filter, segmentation, and morphological operations to better represent skin lesion images. We employ two feature extractors, i.e., the ABCD scoring method (asymmetry, border, color, diameter) and the gray level co-occurrence matrix (GLCM), to extract cancer-affected areas. Finally, five different machine learning classifiers, i.e., logistic regression (LR), decision tree (DT), k-nearest neighbors (KNN), support vector machine (SVM), and random forest (RF), are used to detect and classify skin cancer. Experimental results show that random forest outperforms all the other classifiers, achieving an accuracy of 97.62% and an Area Under the Curve (AUC) of 0.97, which is state-of-the-art on the open-source PH2 dataset.
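The GLCM-features-plus-random-forest stage can be sketched as follows. The GLCM here is a deliberately simplified horizontal-offset version with three texture statistics, the lesion patches are synthetic, and the ABCD features are omitted, so this is only an illustration of the pipeline shape, not the paper's implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def glcm_features(img, levels=8):
    """Horizontal-offset-1 GLCM with contrast, homogeneity, and energy.
    A simplified stand-in for the paper's full GLCM + ABCD feature set."""
    q = (img * (levels - 1)).astype(int)          # quantize gray levels
    glcm = np.zeros((levels, levels))
    for i, j in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[i, j] += 1
    glcm /= glcm.sum()
    r, c = np.indices(glcm.shape)
    contrast = ((r - c) ** 2 * glcm).sum()
    homogeneity = (glcm / (1.0 + np.abs(r - c))).sum()
    energy = (glcm ** 2).sum()
    return [contrast, homogeneity, energy]

rng = np.random.default_rng(1)
imgs, labels = [], []
for k in range(40):                               # synthetic lesion patches
    patch = rng.random((16, 16))
    if k % 2:                                     # class 1: smoother texture
        patch = (patch + np.roll(patch, 1, axis=1)) / 2
    imgs.append(patch)
    labels.append(k % 2)

X = np.array([glcm_features(im) for im in imgs])
rf = RandomForestClassifier(random_state=0).fit(X, labels)
acc = rf.score(X, labels)
```

The full system would insert the grayscale conversion, Gaussian filtering, segmentation, and morphology steps before feature extraction, and evaluate the five classifiers on a held-out split.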
  • Publication
    WNet: A dual-encoded multi-human parsing network
    (WILEY, 2024) Hosen, Md Imran; Aydin, Tarkan; Islam, Md Baharul; Bahcesehir University
    In recent years, multi-human parsing has become a focal point in research, yet prevailing methods often rely on intermediate stages and lack pixel-level analysis. Moreover, their high computational demands limit real-world efficiency. To address these challenges and enable real-time performance, a low-latency end-to-end network, WNet, is proposed. This approach leverages a vision transformer and a convolutional neural network in a dual-encoded network, featuring a lightweight Transformer-based vision encoder and a convolution encoder based on Darknet. This combination adeptly captures long-range dependencies and spatial relationships. Incorporating a fuse block enables the seamless merging of features from the encoders, while residual connections in the decoder design amplify information flow. Experimental validation on the crowd instance-level human parsing and look into person datasets showcases WNet's effectiveness, achieving high-speed multi-human parsing at 26.7 frames per second. Ablation studies further underscore WNet's capabilities, emphasizing its efficiency and accuracy in complex multi-human parsing tasks.
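The fuse block that merges the two encoder branches can be sketched minimally, for example as concatenation followed by a 1x1 convolution. The channel sizes and the exact fusion operator are assumptions; WNet's actual block may differ.

```python
import torch
import torch.nn as nn

class FuseBlock(nn.Module):
    """Merges features from two encoder branches (e.g., a transformer
    branch and a Darknet-style convolutional branch) by concatenation
    plus a 1x1 projection. A generic sketch, not WNet's exact design."""
    def __init__(self, c1=64, c2=64, out=64):
        super().__init__()
        self.proj = nn.Conv2d(c1 + c2, out, kernel_size=1)
        self.act = nn.ReLU()

    def forward(self, a, b):
        # a, b: same-resolution feature maps from the two encoders
        return self.act(self.proj(torch.cat([a, b], dim=1)))

fuse = FuseBlock()
merged = fuse(torch.randn(1, 64, 16, 16), torch.randn(1, 64, 16, 16))
```

The decoder would then upsample `merged` back to input resolution, with residual connections carrying earlier features forward.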
  • Publication
    Deep Covariance Feature and CNN-based End-to-End Masked Face Recognition
    (IEEE, 2021) Junayed, Masum Shah; Sadeghzadeh, Arezoo; Islam, Md Baharul; Struc, V; Ivanovska, M; Bahcesehir University
    With the emergence of the global epidemic of COVID-19, face recognition systems have attracted much attention as contactless identity verification methods. However, covering a considerable part of the face with a mask poses severe challenges for conventional face recognition systems. This paper proposes an automated Masked Face Recognition (MFR) system based on the combination of a mask occlusion discarding technique and a deep-learning model. Initially, a pre-processing step is carried out in which the images pass through three filters. Then, a Convolutional Neural Network (CNN) model is proposed to extract features from unoccluded regions of the faces (i.e., eyes and forehead). These feature maps are employed to obtain covariance-based features. Two extra layers, i.e., Bitmap and Eigenvalue, are designed to reduce the dimension of and concatenate these covariance feature matrices. The deep covariance features are quantized into codebooks combined based on the Bag-of-Features (BoF) paradigm. Finally, a global histogram is created from these codebooks and utilized for training an SVM classifier. The proposed method, trained and evaluated on the Real-World-Masked-Face-Recognition-Dataset (RMFRD) and the Simulated-Masked-Face-Recognition-Dataset (SMFRD), achieves accuracies of 95.07% and 92.32%, respectively, showing its competitive performance compared to the state-of-the-art. Experimental results prove that our system has high robustness against noisy data and illumination variations.
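The covariance-feature / Bag-of-Features / SVM chain can be sketched roughly as below. The feature maps are random stand-ins for CNN outputs on the eye/forehead region, one descriptor is taken per image, and the Bitmap/Eigenvalue layers are not reproduced, so this only illustrates the quantize-then-classify structure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def patch_covariance(fmap):
    """Covariance of per-pixel CNN features over a patch, flattened.
    A simplified stand-in for the paper's covariance feature layers."""
    c = fmap.shape[0]
    X = fmap.reshape(c, -1).T            # pixels x channels
    return np.cov(X, rowvar=False).ravel()

rng = np.random.default_rng(0)
feats, y = [], []
for k in range(30):                      # stand-in eye/forehead feature maps
    f = rng.random((4, 8, 8))
    if k % 2:
        f[0] *= 3.0                      # identity-dependent statistics
    feats.append(patch_covariance(f))
    y.append(k % 2)
feats = np.array(feats)

# Bag-of-Features: quantize descriptors into a codebook, histogram the
# codeword assignments, then train an SVM on the histograms.
codebook = KMeans(n_clusters=4, n_init=10, random_state=0).fit(feats)
hist = np.eye(4)[codebook.labels_]       # one descriptor per image -> one-hot
svm = SVC().fit(hist, y)
acc = svm.score(hist, y)
```

With several descriptors per face, the one-hot rows would become genuine codeword histograms, which is where the BoF representation gains its robustness.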
  • Publication
    AN EFFICIENT END-TO-END IMAGE COMPRESSION TRANSFORMER
    (IEEE, 2022) Jeny, Afsana Ahsan; Junayed, Masum Shah; Islam, Md Baharul; Bahcesehir University
    Image and video compression have received significant research attention and expanded their applications. Existing entropy estimation-based methods combine a hyperprior with local context, limiting their efficacy. This paper introduces an efficient end-to-end transformer-based image compression model, which generates a global receptive field to tackle long-range correlation issues. A hyper encoder-decoder-based transformer block employs a multi-head spatial reduction self-attention (MHSRSA) layer to minimize the computational cost of the self-attention layer and enable rapid learning of multi-scale and high-resolution features. A Causal Global Anticipation Module (CGAM) is designed to construct highly informative adjacent contexts utilizing channel-wise linkages and to identify global reference points in the latent space for end-to-end rate-distortion optimization (RDO). Experimental results on the KODAK dataset demonstrate the effectiveness and competitive performance of the proposed model.
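The spatial-reduction idea behind an MHSRSA-style layer, downsampling keys and values before attention so the quadratic cost shrinks by the reduction ratio squared, can be sketched generically. The dimensions, reduction ratio, and use of `nn.MultiheadAttention` are assumptions; the paper's layer is not reproduced here.

```python
import torch
import torch.nn as nn

class SpatialReductionAttention(nn.Module):
    """Self-attention whose keys/values are spatially downsampled first,
    cutting cost from O(N^2) to O(N^2 / r^2). A generic sketch of the
    spatial-reduction idea; all dimensions here are assumptions."""
    def __init__(self, dim=32, heads=4, r=2):
        super().__init__()
        self.sr = nn.Conv2d(dim, dim, kernel_size=r, stride=r)  # shrink H, W
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                            # x: (B, C, H, W)
        B, C, H, W = x.shape
        q = x.flatten(2).transpose(1, 2)             # (B, H*W, C) queries
        kv = self.sr(x).flatten(2).transpose(1, 2)   # (B, H*W/r^2, C)
        out, _ = self.attn(q, kv, kv)
        return out.transpose(1, 2).reshape(B, C, H, W)

sra = SpatialReductionAttention()
attended = sra(torch.randn(2, 32, 8, 8))
```

Queries keep full resolution so the output retains the input's spatial shape; only the attended-over context is reduced.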
  • Publication
    DeepPyNet: A Deep Feature Pyramid Network for Optical Flow Estimation
    (IEEE, 2021) Jeny, Afsana Ahsan; Islam, Md Baharul; Aydin, Tarkan; Cree, MJ; Bahcesehir University
    Recent advances in optical flow prediction have been made possible by using feature pyramids and iterative refinement. However, downsampling in feature pyramids may cause foreground items to merge with the background, and iterative processing can introduce errors in optical flow estimation. In particular, the motion of narrow and tiny objects can become invisible in the flow scene. We introduce a novel method called DeepPyNet for optical flow estimation that includes a feature extractor, a multi-channel cost volume, and a flow decoder. In this method, we propose a deep recurrent feature pyramid-based network for end-to-end optical flow estimation. The feature extraction from each pixel of the feature map keeps essential information without modifying the feature receptive field. Then, a multi-scale 4D correlation volume is built from the visual similarity of each pair of pixels. Finally, we utilize the multi-scale correlation volumes to continuously update the flow field through an iterative recurrent method. Experimental results demonstrate that DeepPyNet significantly reduces flow errors and provides state-of-the-art performance on various datasets. Moreover, DeepPyNet is less complex and uses only 6.1M parameters, 81% and 35% fewer than the popular FlowNet and PWC-Net+, respectively.
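The all-pairs 4D correlation volume mentioned above, one similarity score for every pixel pair between two feature maps, can be sketched in a few lines. This is the generic single-scale construction (the multi-scale version pools this volume at several resolutions); the scaling by the square root of the channel count is an assumption.

```python
import torch

def correlation_volume(f1, f2):
    """All-pairs visual-similarity volume between two feature maps.
    f1, f2: (B, C, H, W) -> (B, H, W, H, W) dot-product correlations.
    Single-scale sketch of the 4D volume used in flow estimation."""
    B, C, H, W = f1.shape
    a = f1.reshape(B, C, H * W)
    b = f2.reshape(B, C, H * W)
    corr = torch.einsum('bci,bcj->bij', a, b) / C ** 0.5
    return corr.reshape(B, H, W, H, W)

f1 = torch.randn(2, 8, 6, 6)   # features of frame 1
f2 = torch.randn(2, 8, 6, 6)   # features of frame 2
vol = correlation_volume(f1, f2)
```

A recurrent update block would then repeatedly look up local windows of this volume around the current flow estimate and refine the flow field.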