Research Outputs | WoS | Scopus | TR-Dizin | PubMed
Permanent URI for this community: https://hdl.handle.net/20.500.14719/1741
Search Results
16 results
Publication Metadata only Assistive Visual Tool: Enhancing Safe Navigation with Video Remapping in AR Headsets (SPRINGER INTERNATIONAL PUBLISHING AG, 2025) Sadeghzadeh, Arezoo; Islam, Md Baharul; Uddin, Md Nur; Aydin, Tarkan; DelBue, A; Canton, C; Pont-Tuset, J; Tommasi, T; Bahcesehir University; State University System of Florida; Florida Gulf Coast University
Visual Field Loss (VFL) is characterized by blind spots or scotomas that have a detrimental impact on individuals' fundamental movement activities. Addressing the challenges (e.g., low video quality, content loss, high levels of contradiction, and limited mobility assessment) faced by existing Extended Reality (XR) systems used as vision aids, we introduce a groundbreaking method that enriches real-time navigation using Augmented Reality (AR) glasses. Our novel vision aid employs advanced video processing techniques to enhance visual perception in individuals with moderate to severe VFL, bridging the gap to healthy vision. A unique optimal video remapping function, tailored to the characteristics of our selected AR glasses, dynamically maps live video content to the largest intact region of the Visual Field (VF) map. Our method preserves video quality, minimizing blurriness and distortion.
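The core remapping step this abstract describes — squeezing the live frame into the largest intact region of the VF map — can be sketched as follows. This is a minimal illustration that assumes a rectangular intact region and uses nearest-neighbour resampling; the paper's optimal, quality-preserving remapping function is not detailed in this listing, so every name and choice below is illustrative only.

```python
import numpy as np

def remap_frame(frame, vf_map):
    """Shrink a video frame into the bounding box of the intact
    (non-scotoma) region of a binary visual-field map.

    frame:  H x W array (the live video content)
    vf_map: H x W binary array, 1 = intact vision, 0 = scotoma
    Returns an H x W array with the frame remapped into the intact box.
    """
    rows = np.flatnonzero(vf_map.any(axis=1))
    cols = np.flatnonzero(vf_map.any(axis=0))
    top, bottom = rows[0], rows[-1] + 1
    left, right = cols[0], cols[-1] + 1
    h, w = bottom - top, right - left
    H, W = frame.shape
    # Nearest-neighbour index mapping (a stand-in for the paper's
    # quality-preserving optimal remapping function).
    ys = np.arange(h) * H // h
    xs = np.arange(w) * W // w
    out = np.zeros_like(frame)
    out[top:bottom, left:right] = frame[np.ix_(ys, xs)]
    return out
```

In a real pipeline this mapping would run per frame on the AR glasses' video feed; the scotoma area (zeros here) would simply not be driven with content.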
Through a comprehensive empirical user study involving 29 subjects with artificially induced scotomas, statistical analyses of object-counting and multi-tasking walking-track tests demonstrate the promising performance of our method in enhancing visual awareness and navigation capability in real time.

Publication Metadata only SkNet: A Convolutional Neural Networks Based Classification Approach for Skin Cancer Classes (IEEE, 2020) Jeny, Afsana Ahsan; Sakib, Abu Noman Md; Junayed, Masum Shah; Lima, Khadija Akter; Ahmed, Ikhtiar; Islam, Md Baharul; Daffodil International University; Khulna University of Engineering & Technology (KUET); Bahcesehir University; Daffodil International University
Skin cancer is one of the most common types of cancer, and a solution for this globally recognized health problem is much needed. Machine learning techniques have brought revolutionary changes to biomedical research. Previously, detecting skin cancers took a significant amount of time and effort. In recent years, many works using deep learning have made the process much faster and more accurate. In this paper, we propose a novel Convolutional Neural Network (CNN)-based approach that can classify four different types of skin cancer. Our model, SkNet, consists of 19 convolution layers. In previous works, the highest accuracy reported on 1,000 images was 80.52%.
Our proposed model exceeds that performance, achieving an accuracy of 95.26% on a dataset of 4,800 images, the highest reported accuracy.

Publication Metadata only Advancing Retinal Image Segmentation: A Denoising Diffusion Probabilistic Model Perspective (IEEE COMPUTER SOC, 2024) Alimanov, Alnur; Islam, Md Baharul; Bahcesehir University; State University System of Florida; Florida Gulf Coast University
Retinal images and vessel trees play a crucial role in helping ophthalmologists identify and diagnose various illnesses related to the eyes, blood vessels, and brain. However, manual retinal image segmentation is a laborious and highly skilled procedure that is both difficult and time-consuming. This study proposes a novel approach to retinal image segmentation, leveraging the Denoising Diffusion Probabilistic Model (DDPM) for precise performance. To the best of our knowledge, this is the first time DDPM has been applied in this domain. Our approach incorporates a novel constraint to prevent the DDPM from generating vessel structures that are not present in the original retinal images during segmentation. Additionally, our model is not limited to the original DDPM size of 64 x 64 pixels; instead, we train it to effectively segment images of 256 x 256 pixels. This is a significant advancement, since the original DDPM works exclusively with 64 x 64 images and is primarily designed for generating random image samples. In our work, we address both limitations with a novel, efficient approach for accurate retinal image segmentation. A comprehensive evaluation of our methodology includes both quantitative and qualitative assessments. Our proposed method demonstrates competitive performance compared to state-of-the-art techniques, as indicated by both qualitative and quantitative scores.
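The abstract above does not spell out how the vessel-hallucination constraint is formulated, so the following is only a plausible stand-in, not the authors' method: it suppresses predicted vessel probability wherever the source retinal image shows no local contrast, which could in principle be applied at each denoising step.

```python
import numpy as np

def constrain_step(x_pred, source, thresh=0.1):
    """Illustrative constraint: zero out predicted vessel probability
    wherever the source retinal image shows no local contrast.

    x_pred: H x W predicted vessel map in [0, 1] at some denoising step
    source: H x W grayscale retinal image in [0, 1]
    The real constraint is not specified in the abstract; this is a
    simplified stand-in for exposition only.
    """
    # Local contrast: absolute deviation from a crude 3x3 box mean.
    pad = np.pad(source, 1, mode="edge")
    h, w = source.shape
    box = sum(pad[i:i + h, j:j + w]
              for i in range(3) for j in range(3)) / 9.0
    support = np.abs(source - box) > thresh
    return np.where(support, x_pred, 0.0)
```

On a perfectly flat source image this zeroes every prediction, which is the intended behaviour: no image evidence, no vessel.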
The source code of our method can be accessed at https://github.com/AAleka/DDPM-segmentation.

Publication Metadata only ASD-EVNet: An Ensemble Vision Network based on Facial Expression for Autism Spectrum Disorder Recognition (IEEE, 2023) Jaby, Assil; Islam, Md Baharul; Ahad, Md Atiqur Rahman; Bahcesehir University; University of East London
Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that affects individuals' social interaction, communication, and behavior. Early diagnosis and intervention are critical for the well-being and development of children with ASD. Available methods for diagnosing ASD are either unreliable (or of limited accuracy) or require significant time and resources. We aim to enhance the precision of ASD diagnosis by utilizing facial expressions, a readily accessible and less time-consuming approach. This paper presents the ASD Ensemble Vision Network (ASD-EVNet) for recognizing ASD based on facial expressions. The model utilizes three Vision Transformer (ViT) architectures, pre-trained on ImageNet-21K and fine-tuned on the ASD dataset. We also develop an extensive facial expression-based ASD dataset for children (FADC). The ensemble learning model is created by combining the predictions of the three ViT models and feeding them to a classifier. Our experiments demonstrate that the proposed ensemble learning model outperforms existing approaches and achieves state-of-the-art results in detecting ASD based on facial expressions.

Publication Metadata only Machine Vision-Based Expert System for Automated Skin Cancer Detection (SPRINGER INTERNATIONAL PUBLISHING AG, 2022) Junayed, Masum Shah; Jeny, Afsana Ahsan; Rada, Lavdie; Islam, Md Baharul; BritoLoeza, C; MartinGonzalez, A; CastanedaZeman, V; Safi, A; Bahcesehir University; Daffodil International University
Skin cancer is the most frequently occurring kind of cancer, accounting for about one-third of all cases.
Automatic early detection, without requiring an expert's visual inspection, would be of great help to society. Image processing and machine learning methods have significantly contributed to medical and biomedical research, enabling fast and accurate inspection across a range of problems, one of which is accurate cancer detection and classification. In this study, we introduce an expert system based on image processing and machine learning for skin cancer detection and classification. The proposed approach consists of three significant steps: pre-processing, feature extraction, and classification. The pre-processing step uses grayscale conversion, a Gaussian filter, segmentation, and morphological operations to better represent skin lesion images. We employ two feature extractors, the ABCD scoring method (asymmetry, border, color, diameter) and the gray-level co-occurrence matrix (GLCM), to characterize cancer-affected areas. Finally, five different machine learning classifiers, logistic regression (LR), decision tree (DT), k-nearest neighbors (KNN), support vector machine (SVM), and random forest (RF), are used to detect and classify skin cancer. Experimental results show that random forest exceeds all other classifiers, achieving an accuracy of 97.62% and an Area Under the Curve (AUC) of 0.97, which is state-of-the-art on the open-source PH2 dataset.

Publication Metadata only HiMODE: A Hybrid Monocular Omnidirectional Depth Estimation Model (IEEE, 2022) Junayed, Masum Shah; Sadeghzadeh, Arezoo; Islam, Md Baharul; Wong, Lai-Kuan; Aydin, Tarkan; Bahcesehir University; Multimedia University
Monocular omnidirectional depth estimation is receiving considerable research attention due to its broad applications in sensing 360-degree surroundings. Existing approaches in this field suffer from limitations in recovering small object details and data lost during ground-truth depth map acquisition.
In this paper, a novel monocular omnidirectional depth estimation model, HiMODE, is proposed based on a hybrid CNN+Transformer (encoder-decoder) architecture whose modules are efficiently designed to mitigate distortion and computational cost without performance degradation. First, we design a feature pyramid network based on the HNet block to extract high-resolution features near the edges. The performance is further improved by a self- and cross-attention layer and spatial/temporal patches in the Transformer encoder and decoder, respectively. In addition, a spatial residual block is employed to reduce the number of parameters. By jointly passing the deep features extracted from an input image at each backbone block, along with the raw depth maps predicted by the Transformer encoder-decoder, through a context adjustment layer, our model can produce depth maps with better visual quality than the ground truth. Comprehensive ablation studies demonstrate the significance of each individual module. Extensive experiments conducted on three datasets, Stanford3D, Matterport3D, and SunCG, demonstrate that HiMODE can achieve state-of-the-art performance for 360-degree monocular depth estimation. Complete project code and supplementary materials are available at https://github.com/himode5008/HiMODE.

Publication Metadata only A New Dataset and Transformer for Stereoscopic Video Super-Resolution (IEEE, 2022) Imani, Hassan; Islam, Md Baharul; Wong, Lai-Kuan; Bahcesehir University; Multimedia University
Stereo video super-resolution (SVSR) aims to enhance the spatial resolution of low-resolution video by reconstructing the high-resolution video. The key challenges in SVSR are preserving stereo-consistency and temporal-consistency, without which viewers may experience 3D fatigue. There are several notable works on stereoscopic image super-resolution, but little research on stereo video super-resolution.
In this paper, we propose a novel Transformer-based model for SVSR, namely Trans-SVSR. Trans-SVSR comprises two key novel components: a spatio-temporal convolutional self-attention layer and an optical flow-based feed-forward layer that discovers the correlation across different video frames and aligns the features. The parallax attention mechanism (PAM), which uses cross-view information to account for significant disparities, is used to fuse the stereo views. Due to the lack of a benchmark dataset suitable for the SVSR task, we collected a new stereoscopic video dataset, SVSR-Set, containing 71 full high-definition (HD) stereo videos captured with a professional stereo camera. Extensive experiments on the collected dataset, along with two other datasets, demonstrate that Trans-SVSR achieves competitive performance compared to state-of-the-art methods. Project code and additional results are available at https://github.com/H-deep/Trans-SVSR/.

Publication Metadata only BinoVFAR: An Efficient Binocular Visual Field Assessment Method using Augmented Reality Glasses (ASSOC COMPUTING MACHINERY, 2021) Islam, Md Baharul; Sadeghzadeh, Arezoo; Bahcesehir University
Virtual Reality (VR)-based Visual Field Assessment (VFA) methods completely isolate users from the real world, which results in nausea, eye strain, and a lack of concentration and patience during the time-consuming test. In this paper, we present BinoVFAR, a robust binocular visual field assessment method based on novel Augmented Reality (AR) glasses that can find the VF of both eyes simultaneously. In this method, 60 stimuli arranged in 6 rows and 10 columns appear at random on a white background on the display of the AR glasses. Each stimulus is displayed for 2 seconds while its intensity continuously changes from light gray to black.
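The stimulus presentation just described (60 grid cells shown in random order, each ramping from light gray to black over 2 seconds) can be sketched as a schedule generator. The step count, gray levels, and seeding below are illustrative assumptions, not values from the paper.

```python
import random

def stimulus_schedule(rows=6, cols=10, duration_s=2.0, steps=8, seed=0):
    """Sketch of the BinoVFAR stimulus presentation: every cell of the
    6 x 10 grid is shown once, in random order, and while displayed its
    intensity ramps from light gray (0.8) to black (0.0) over 2 seconds.
    Step count and gray levels are assumptions for illustration.
    """
    rng = random.Random(seed)
    order = [(r, c) for r in range(rows) for c in range(cols)]
    rng.shuffle(order)                       # random presentation order
    ramp = [0.8 * (1 - k / (steps - 1)) for k in range(steps)]
    dt = duration_s / steps                  # seconds per intensity step
    return [(pos, ramp, dt) for pos in order]
```

Each tuple gives a grid position, the intensity ramp to play at that position, and the per-step duration; the subject's clicker responses against this schedule fill the 6 x 10 response matrix.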
Wearing the AR glasses and focusing on the central fixation point, users are asked to press a clicker upon seeing a stimulus. The visible stimuli's intensities and positions are recorded in a 6 x 10 matrix based on the users' responses. Bi-cubic interpolation is then applied to compute the binocular visual field map (as a 600 x 1000 matrix). A set of experiments (with an average accuracy of 99.93%), including repeatability and reproducibility tests (with an average intra-class correlation coefficient (ICC) of 99.72%), is conducted to evaluate the BinoVFAR method.

Publication Metadata only Deep Covariance Feature and CNN-based End-to-End Masked Face Recognition (IEEE, 2021) Junayed, Masum Shah; Sadeghzadeh, Arezoo; Islam, Md Baharul; Struc, V; Ivanovska, M; Bahcesehir University
With the emergence of the global COVID-19 epidemic, face recognition systems have received much attention as contactless identity verification methods. However, masks covering a considerable part of the face pose severe challenges for conventional face recognition systems. This paper proposes an automated Masked Face Recognition (MFR) system based on the combination of a mask-occlusion discarding technique and a deep learning model. Initially, a pre-processing step is carried out in which the images pass through three filters. Then, a Convolutional Neural Network (CNN) model is proposed to extract features from the unoccluded regions of the faces (i.e., eyes and forehead). These feature maps are used to obtain covariance-based features. Two extra layers, Bitmap and Eigenvalue, are designed to reduce the dimension of these covariance feature matrices and concatenate them. The deep covariance features are quantized into codebooks combined under the Bag-of-Features (BoF) paradigm. Finally, a global histogram is created from these codebooks and used to train an SVM classifier.
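The covariance-feature and Bag-of-Features steps in the pipeline above can be sketched in simplified form. This is only a stand-in: the paper's Bitmap/Eigenvalue layers and codebook construction are learned components, whereas here the covariance descriptor is reduced to its eigenvalue spectrum and quantized against a fixed, hypothetical codebook.

```python
import numpy as np

def covariance_descriptor(feature_maps):
    """Covariance of C feature maps (C x H x W) -> C-dim eigenvalue vector,
    a simplified stand-in for the paper's Bitmap/Eigenvalue layers."""
    C = feature_maps.shape[0]
    X = feature_maps.reshape(C, -1)        # each row: one channel's responses
    cov = np.cov(X)                        # C x C covariance matrix
    return np.sort(np.linalg.eigvalsh(cov))[::-1]  # descending eigenvalues

def bof_histogram(descriptors, codebook):
    """Quantize descriptors to their nearest codeword and build the
    normalized global histogram fed to the SVM (Bag-of-Features)."""
    hist = np.zeros(len(codebook))
    for d in descriptors:
        idx = np.argmin(np.linalg.norm(codebook - d, axis=1))
        hist[idx] += 1
    return hist / hist.sum()
```

In the full system, descriptors would come from many local regions of the unoccluded face, and the resulting histogram is the fixed-length vector an SVM can train on.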
The proposed method is trained and evaluated on the Real-World Masked Face Recognition Dataset (RMFRD) and the Simulated Masked Face Recognition Dataset (SMFRD), achieving accuracies of 95.07% and 92.32%, respectively, showing competitive performance compared to the state-of-the-art. Experimental results show that our system is highly robust against noisy data and illumination variations.

Publication Metadata only AN EFFICIENT END-TO-END IMAGE COMPRESSION TRANSFORMER (IEEE, 2022) Jeny, Afsana Ahsan; Junayed, Masum Shah; Islam, Md Baharul; Bahcesehir University
Image and video compression have received significant research attention and expanded their applications. Existing entropy-estimation-based methods combine a hyperprior with local context, which limits their efficacy. This paper introduces an efficient end-to-end Transformer-based image compression model that generates a global receptive field to tackle long-range correlation issues. A hyper encoder-decoder-based Transformer block employs a multi-head spatial-reduction self-attention (MHSRSA) layer to minimize the computational cost of the self-attention layer and enable rapid learning of multi-scale and high-resolution features. A Causal Global Anticipation Module (CGAM) is designed to construct highly informative adjacent contexts using channel-wise linkages and to identify global reference points in the latent space for end-to-end rate-distortion optimization (RDO). Experimental results on the KODAK dataset demonstrate the effectiveness and competitive performance of the proposed model.
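The cost-saving idea behind spatial-reduction self-attention, as named in the last abstract, can be illustrated with a single-head sketch: keys and values are spatially pooled by a factor r before attention, so each token attends to N/r² tokens instead of N. Learned projections and the multi-head structure of MHSRSA are omitted here; this is a generic illustration of the technique, not the paper's layer.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_reduction_attention(x, h, w, r=2):
    """Single-head sketch of spatial-reduction self-attention: keys and
    values are average-pooled by factor r in both spatial dimensions,
    cutting the attention cost from O(N^2) to O(N * N / r^2).
    Q/K/V projection weights are left out (identity) for brevity.

    x: (N, d) tokens with N = h * w, laid out row-major over the grid
    """
    n, d = x.shape
    # Average-pool the token grid by r along height and width.
    g = x.reshape(h // r, r, w // r, r, d).mean(axis=(1, 3)).reshape(-1, d)
    attn = softmax(x @ g.T / np.sqrt(d))   # (N, N / r^2) attention map
    return attn @ g                        # attended output, (N, d)
```

With r=2 on a 4x4 token grid, each of the 16 tokens attends to only 4 pooled tokens, which is the quadratic-to-subquadratic saving the abstract refers to.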
