Research Outputs | WoS | Scopus | TR-Dizin | PubMed

Permanent URI for this community: https://hdl.handle.net/20.500.14719/1741


Search Results

Now showing 1 - 10 of 23
  • Publication
    ARVA: An Augmented Reality-Based Visual Aid for Mobility Enhancement Through Real-Time Video Stream Transformation
    (IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2024) Sadeghzadeh, Arezoo; Islam, Md Baharul; Uddin, Md Nur; Aydin, Tarkan; Bahcesehir University; State University System of Florida; Florida Gulf Coast University; Daffodil International University
    Visual field loss (VFL) is a persistent visual impairment characterized by limited vision spots (scotomas) within the normal visual field, significantly impacting daily activities for affected individuals. Current Virtual Reality (VR)- and Augmented Reality (AR)-based visual aids suffer from low video quality, content loss, high levels of contradiction, and limited mobility assessment. To address these issues, we propose an innovative vision aid that utilizes an AR headset and integrates advanced video processing techniques to elevate the visual perception of individuals with moderate to severe VFL to levels comparable to those with unimpaired vision. Our approach introduces a pioneering optimal video remapping function tailored to the characteristics of the AR glasses. This function strategically maps the content of live video captures to the largest intact region of the visual field map, preserving quality while minimizing blurriness and content distortion. To evaluate the performance of the proposed method, a comprehensive empirical user study is conducted, comprising object-counting and multi-tasking walking-track tests with 15 subjects who have artificially induced scotomas in their normal visual fields. The proposed vision aid achieves a 41.56-percentage-point enhancement (from 57.31% to 98.87%) in the mean of the average object recognition rates across all subjects in the object-counting test. In the walking-track test, the average mean scores for obstacle avoidance, detected signs, recognized signs, and grasped objects are significantly enhanced after applying the remapping function, with gains of 7.56 (91.10% to 98.66%), 51.81 (44.85% to 96.66%), 49.31 (43.18% to 92.49%), and 77.77 (13.33% to 91.10%) percentage points, respectively. Statistical analysis of the data before and after applying the remapping function demonstrates the promising performance of our method in enhancing visual awareness and mobility for individuals with VFL.
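    The core remapping idea admits a compact illustration. The following Python sketch is not the authors' implementation: it assumes a binary visual-field mask (1 = intact, 0 = scotoma) and simply rescales each live frame into the bounding box of the largest intact connected region using bicubic interpolation; the paper's optimal remapping function is additionally tailored to the AR glasses' characteristics.

    ```python
    import cv2
    import numpy as np

    def remap_frame(frame: np.ndarray, vf_mask: np.ndarray) -> np.ndarray:
        """Rescale `frame` into the largest intact region of a binary VF mask.

        vf_mask: uint8 array with the same height/width as `frame`,
        1 where vision is intact and 0 inside scotomas (an assumption).
        """
        # Label the connected intact regions and pick the largest one.
        _, _, stats, _ = cv2.connectedComponentsWithStats(vf_mask, connectivity=8)
        largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))  # label 0 is background
        x, y, w, h = stats[largest, :4]  # bounding box of that region
        # Fit the live capture into the box; bicubic limits added blurriness.
        out = np.zeros_like(frame)
        out[y:y + h, x:x + w] = cv2.resize(frame, (w, h), interpolation=cv2.INTER_CUBIC)
        return out
    ```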
  • Publication
    Assistive Visual Tool: Enhancing Safe Navigation with Video Remapping in AR Headsets
    (SPRINGER INTERNATIONAL PUBLISHING AG, 2025) Sadeghzadeh, Arezoo; Islam, Md Baharul; Uddin, Md Nur; Aydin, Tarkan; DelBue, A; Canton, C; Pont-Tuset, J; Tommasi, T; Bahcesehir University; State University System of Florida; Florida Gulf Coast University
    Visual Field Loss (VFL) is characterized by blind spots (scotomas) that have a detrimental impact on individuals' fundamental movement activities. Addressing the challenges (e.g., low video quality, content loss, high levels of contradiction, and limited mobility assessment) faced by existing Extended Reality (XR) vision-aid systems, we introduce a groundbreaking method that enriches real-time navigation using Augmented Reality (AR) glasses. Our novel vision aid employs advanced video processing techniques to enhance visual perception in individuals with moderate to severe VFL, bridging the gap to healthy vision. A unique optimal video remapping function, tailored to the characteristics of our selected AR glasses, dynamically maps live video content to the largest intact region of the Visual Field (VF) map. Our method preserves video quality, minimizing blurriness and distortion. Through a comprehensive empirical user study involving 29 subjects with artificially induced scotomas, statistical analyses of object-counting and multi-tasking walking-track tests demonstrate the promising performance of our method in enhancing visual awareness and navigation capability in real time.
  • Publication
    MLMSign: Multi-lingual multi-modal illumination-invariant sign language recognition
    (ELSEVIER, 2024) Sadeghzadeh, Arezoo; Shah, A. F. M. Shahen; Islam, Md Baharul; Bahcesehir University; Yildiz Technical University; State University System of Florida; Florida Gulf Coast University
    Sign language (SL) serves as a visual communication tool of great significance for deaf people to interact with others and manage their daily lives. The wide variety of SLs and the lack of interpretation knowledge necessitate developing automated sign language recognition (SLR) systems to narrow the communication gap between the deaf and hearing communities. Despite numerous advanced static SLR systems, they are not practical and favorable enough for real-life scenarios once assessed simultaneously from several critical aspects: accuracy in dealing with high intra- and slight inter-class variations, robustness, computational complexity, and generalization ability. To this end, we propose a novel multi-lingual multi-modal SLR system, namely MLMSign, that takes full advantage of both hand-crafted features and deep learning models to enhance performance and robustness against illumination changes while minimizing computational cost. The RGB sign images and 2D visualizations of their hand-crafted features, i.e., Histogram of Oriented Gradients (HOG) features and the a* channel of the L*a*b* color space, are employed as three input modalities to train a novel Convolutional Neural Network (CNN). The number of layers, filters, kernel size, learning rate, and optimization technique are carefully selected through an extensive parametric study to minimize the computational cost without compromising accuracy. The system's performance and robustness are significantly enhanced by jointly deploying the models of these three modalities through ensemble learning, with each modality's contribution weighted by an impact coefficient determined via grid search. In addition to the comprehensive quantitative assessment, the capabilities of our proposed model and the effectiveness of ensembling over three modalities are evaluated qualitatively using the Grad-CAM visualization model. Experimental results on test data with additional illumination changes verify the high robustness of our system under overexposed and underexposed lighting conditions. Achieving high accuracy (>99.33%) on six benchmark datasets (i.e., Massey, Static ASL, NUS II, TSL Fingerspelling, BdSL36v1, and PSL) demonstrates that our system notably outperforms recent state-of-the-art approaches with a minimal number of parameters and high generalization ability over complex datasets. Its promising performance on four different sign languages makes it a feasible system for multi-lingual applications.
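    The ensembling step can be sketched independently of the CNN itself. Below is a minimal, hypothetical version of weighted soft voting with grid-searched impact coefficients; the function names, the constraint that the weights sum to one, and the 0.1 grid step are assumptions, not the paper's settings.

    ```python
    import itertools
    import numpy as np

    def ensemble_probs(p_rgb, p_hog, p_a, weights):
        """Weighted soft vote over three (N, C) class-probability arrays."""
        w1, w2, w3 = weights
        return w1 * p_rgb + w2 * p_hog + w3 * p_a

    def grid_search_weights(val_probs, val_labels, step=0.1):
        """Pick the impact coefficients that maximize validation accuracy."""
        best_w, best_acc = (1 / 3, 1 / 3, 1 / 3), 0.0
        grid = np.round(np.arange(0.0, 1.0 + step, step), 6)
        for w1, w2 in itertools.product(grid, grid):
            w3 = 1.0 - w1 - w2
            if w3 < 0.0:
                continue  # assume coefficients are constrained to sum to one
            preds = ensemble_probs(*val_probs, (w1, w2, w3)).argmax(axis=1)
            acc = float((preds == val_labels).mean())
            if acc > best_acc:
                best_w, best_acc = (w1, w2, w3), acc
        return best_w, best_acc
    ```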
  • Publication
    CataractNet: An Automated Cataract Detection System Using Deep Learning for Fundus Images
    (IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2021) Junayed, Masum Shah; Islam, Md Baharul; Sadeghzadeh, Arezoo; Rahman, Saimunur; Daffodil International University; Bahcesehir University; Commonwealth Scientific & Industrial Research Organisation (CSIRO); CSIRO Data61
    Cataract is one of the most common eye disorders, causing vision distortion. Accurate and timely detection of cataracts is the best way to control the risk and avoid blindness. Recently, artificial intelligence-based cataract detection systems have received research attention. In this paper, a novel deep neural network, namely CataractNet, is proposed for automatic cataract detection in fundus images. The loss and activation functions are tuned to train the network with small kernels, fewer training parameters, and fewer layers. Thus, the computational cost and average running time of CataractNet are significantly reduced compared to other pre-trained Convolutional Neural Network (CNN) models. The proposed network is optimized with the Adam optimizer. A total of 1130 cataract and non-cataract fundus images are collected and, to avoid over-fitting, augmented to 4746 images before model training. Experimental results prove that the proposed method outperforms state-of-the-art cataract detection approaches with an average accuracy of 99.13%.
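    As a rough illustration of the augmentation step (the abstract does not specify which transforms were used), a sketch like the following expands each fundus image with simple geometric variants; the horizontal flip and the ±15-degree rotations are purely illustrative assumptions.

    ```python
    import cv2
    import numpy as np

    def augment(image: np.ndarray) -> list[np.ndarray]:
        """Return simple geometric variants of one fundus image."""
        h, w = image.shape[:2]
        variants = [cv2.flip(image, 1)]  # horizontal mirror
        for angle in (-15, 15):  # hypothetical rotation angles
            M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
            variants.append(cv2.warpAffine(image, M, (w, h)))
        return variants
    ```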
  • Publication
    ScarNet: Development and Validation of a Novel Deep CNN Model for Acne Scar Classification With a New Dataset
    (IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2022) Junayed, Masum Shah; Islam, Md Baharul; Jeny, Afsana Ahsan; Sadeghzadeh, Arezoo; Biswas, Topu; Shah, A. F. M. Shahen; Daffodil International University; Bahcesehir University; Multimedia University; Yildiz Technical University
    Acne scarring occurs in 95% of people with acne vulgaris, due to collagen loss or gain as the body heals skin damage caused by acne inflammation. Accurate classification of acne scars is a vital factor in providing a timely, effective treatment protocol. Dermatologists mainly recognize the type of acne scar manually through visual inspection, which is time- and energy-consuming and subject to intra- and inter-reader variability. In this paper, a novel automated acne scar classification system is proposed based on a deep Convolutional Neural Network (CNN) model. First, a dataset of 250 images from five different classes is collected and labeled by four well-experienced dermatologists. The pre-processed input images are fed into our proposed model, namely ScarNet, for deep feature map extraction. The optimizer, loss function, activation functions, filter and kernel sizes, regularization methods, and batch size of the proposed architecture are tuned so that classification performance is maximized while the computational cost is minimized. Experimental results demonstrate the feasibility of the proposed method, with accuracy, specificity, and kappa score of 92.53%, 95.38%, and 76.7%, respectively.
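    The kappa score reported above is presumably Cohen's kappa, which corrects raw agreement for chance; the gap between 92.53% accuracy and 76.7% kappa is what that correction captures in a five-class task. A minimal computation from a confusion matrix:

    ```python
    import numpy as np

    def cohens_kappa(confusion: np.ndarray) -> float:
        """Chance-corrected agreement from a (C, C) confusion matrix."""
        total = confusion.sum()
        p_o = np.trace(confusion) / total  # observed agreement (accuracy)
        # Expected agreement if predictions and labels were independent.
        p_e = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / total ** 2
        return float((p_o - p_e) / (1 - p_e))
    ```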
  • Publication
    DEEP LEARNING-BASED VECTOR MOSQUITOES CLASSIFICATION FOR PREVENTING INFECTIOUS DISEASES TRANSMISSION
    (INT SOC STEREOLOGY, 2020) Asgari, Misagh; Sadeghzadeh, Arezoo; Islam, Md Baharul; Rada, Lavdie; Bozeman, James; Bahcesehir University; Daffodil International University
    Healthcare systems worldwide are burdened by mosquitoes transmitting dangerous diseases. Conventional mosquito surveillance methods to alleviate these diseases rely on expert entomologists' manual examination of morphological characteristics, which is time-consuming and unscalable. The scarcity of professional experts creates a pressing need for inexpensive, accurate automated alternatives for mosquito classification. This paper proposes an end-to-end deep Convolutional Neural Network (CNN) for mosquito species classification that takes advantage of both dropout layers and transfer learning to enhance accuracy. Dropout layers randomly disable neurons of the neural network, mitigating co-adaptation and overfitting. Transfer learning efficiently applies the features extracted from one dataset to others. Furthermore, a Region of Interest (ROI) visualization component is adopted to gain insight into the model's learning. The generalization ability and feasibility of the proposed model are validated on four publicly available mosquito datasets. Experimental results on these datasets, with accuracies of 98.82%, 98.92%, 94.66%, and 98.40%, demonstrate the superiority of our proposed system over recent state-of-the-art approaches. The effects of the number of dropout layers, their positions in the network, and their rates are all investigated through ablation studies. Visualizing the model's attention confirms that it learns useful mosquito features from the insects' legs and thorax, leading to reliable predictions.
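    The dropout mechanism the abstract leans on is easy to state precisely. This is a generic inverted-dropout sketch, not the paper's network: neurons are disabled with probability `rate` during training, and the survivors are rescaled so the expected activation matches inference behavior.

    ```python
    import numpy as np

    def dropout(x: np.ndarray, rate: float, training: bool, rng=None) -> np.ndarray:
        """Inverted dropout: zero neurons with probability `rate` while training."""
        if not training or rate == 0.0:
            return x  # dropout is a no-op at inference time
        rng = rng or np.random.default_rng()
        keep = 1.0 - rate
        mask = rng.random(x.shape) < keep      # randomly disable neurons
        return x * mask / keep                 # rescale so E[output] is unchanged
    ```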
  • Publication
    HiMODE: A Hybrid Monocular Omnidirectional Depth Estimation Model
    (IEEE, 2022) Junayed, Masum Shah; Sadeghzadeh, Arezoo; Islam, Md Baharul; Wong, Lai-Kuan; Aydin, Tarkan; Bahcesehir University; Multimedia University
    Monocular omnidirectional depth estimation is receiving considerable research attention due to its broad applications for sensing 360-degree surroundings. Existing approaches in this field suffer from limitations in recovering small object details and data lost during ground-truth depth map acquisition. In this paper, a novel monocular omnidirectional depth estimation model, namely HiMODE, is proposed based on a hybrid CNN+Transformer (encoder-decoder) architecture whose modules are efficiently designed to mitigate distortion and computational cost without performance degradation. First, we design a feature pyramid network based on the HNet block to extract high-resolution features near the edges. The performance is further improved by self- and cross-attention layers and spatial/temporal patches in the Transformer encoder and decoder, respectively. In addition, a spatial residual block is employed to reduce the number of parameters. By jointly passing the deep features extracted from an input image at each backbone block, along with the raw depth maps predicted by the Transformer encoder-decoder, through a context adjustment layer, our model can produce depth maps with better visual quality than the ground truth. Comprehensive ablation studies demonstrate the significance of each individual module. Extensive experiments conducted on three datasets, Stanford3D, Matterport3D, and SunCG, demonstrate that HiMODE achieves state-of-the-art performance for 360-degree monocular depth estimation. Complete project code and supplementary materials are available at https://github.com/himode5008/HiMODE.
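    The role of the context adjustment layer (fusing backbone features with the Transformer's raw depth prediction) can be sketched roughly as below. This is a guess at the structure, not HiMODE's actual layer: the channel counts, the residual formulation, and the two-conv fusion block are all assumptions, and the backbone features are assumed already upsampled to the depth map's resolution.

    ```python
    import torch
    import torch.nn as nn

    class ContextAdjustment(nn.Module):
        """Hypothetical fusion of backbone features with a raw depth map."""

        def __init__(self, feat_ch: int = 64):
            super().__init__()
            self.fuse = nn.Sequential(
                nn.Conv2d(feat_ch + 1, 32, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(32, 1, kernel_size=3, padding=1),
            )

        def forward(self, feats: torch.Tensor, raw_depth: torch.Tensor) -> torch.Tensor:
            # Residual refinement: the fused correction is added to the raw depth.
            return raw_depth + self.fuse(torch.cat([feats, raw_depth], dim=1))
    ```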
  • Publication
    BinoVFAR: An Efficient Binocular Visual Field Assessment Method using Augmented Reality Glasses
    (ASSOC COMPUTING MACHINERY, 2021) Islam, Md Baharul; Sadeghzadeh, Arezoo; Bahcesehir University
    Virtual Reality (VR)-based Visual Field Assessment (VFA) methods completely isolate users from the real world, resulting in nausea, eye strain, and a lack of concentration and patience during the time-consuming test. In this paper, a robust binocular visual field assessment method based on novel Augmented Reality (AR) glasses, namely BinoVFAR, is presented, which can simultaneously find the VF of both eyes. In this method, 60 stimuli in an arrangement of 6 rows and 10 columns randomly appear on a white background on the display of the AR glasses. Each stimulus is displayed for 2 seconds while its intensity continuously changes from light gray to black. Wearing the AR glasses and focusing on the central fixation point, users are asked to press a clicker whenever they see a stimulus. The intensities and positions of the seen stimuli are recorded in a 6 x 10 matrix based on the users' responses. Bi-cubic interpolation is then applied to compute the binocular visual field map (as a 600 x 1000 matrix). A set of experiments (with an average accuracy of 99.93%), including repeatability and reproducibility tests (with an average intra-class correlation coefficient (ICC) of 99.72%), is conducted to evaluate the BinoVFAR method.
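    The 6 x 10 to 600 x 1000 upsampling is a direct bicubic interpolation and can be reproduced in a couple of lines; the random responses below are a placeholder for the recorded clicker data.

    ```python
    import cv2
    import numpy as np

    # Placeholder for the 6 x 10 matrix of recorded stimulus responses.
    responses = np.random.rand(6, 10).astype(np.float32)

    # Bicubic interpolation to the dense 600 x 1000 binocular VF map.
    vf_map = cv2.resize(responses, (1000, 600), interpolation=cv2.INTER_CUBIC)
    assert vf_map.shape == (600, 1000)
    ```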
  • Publication
    Deep Covariance Feature and CNN-based End-to-End Masked Face Recognition
    (IEEE, 2021) Junayed, Masum Shah; Sadeghzadeh, Arezoo; Islam, Md Baharul; Struc, V; Ivanovska, M; Bahcesehir University
    With the emergence of the global COVID-19 epidemic, face recognition systems have attracted much attention as contactless identity verification methods. However, masks covering a considerable part of the face pose severe challenges for conventional face recognition systems. This paper proposes an automated Masked Face Recognition (MFR) system based on the combination of a mask-occlusion discarding technique and a deep learning model. Initially, a pre-processing step is carried out in which the images pass through three filters. Then, a Convolutional Neural Network (CNN) model is proposed to extract features from the unoccluded regions of the faces (i.e., eyes and forehead). These feature maps are employed to obtain covariance-based features. Two extra layers, i.e., Bitmap and Eigenvalue, are designed to reduce the dimension of, and concatenate, these covariance feature matrices. The deep covariance features are quantized into codebooks, which are combined following the Bag-of-Features (BoF) paradigm. Finally, a global histogram is created from these codebooks and used to train an SVM classifier. The proposed method is trained and evaluated on the Real-World Masked Face Recognition Dataset (RMFRD) and the Simulated Masked Face Recognition Dataset (SMFRD), achieving accuracies of 95.07% and 92.32%, respectively, and showing competitive performance compared to the state-of-the-art. Experimental results prove that our system is highly robust against noisy data and illumination variations.
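    The covariance-feature step has a standard region-covariance form, sketched here under the assumption that a (C, H, W) feature map is treated as H*W observations of C-dimensional vectors; the paper's Bitmap and Eigenvalue layers then compress matrices like this one.

    ```python
    import numpy as np

    def covariance_descriptor(feat: np.ndarray) -> np.ndarray:
        """(C, H, W) CNN feature map -> (C, C) covariance matrix."""
        obs = feat.reshape(feat.shape[0], -1)        # C x (H*W) observations
        obs = obs - obs.mean(axis=1, keepdims=True)  # center each channel
        return obs @ obs.T / (obs.shape[1] - 1)      # second-order statistics

    # The Eigenvalue layer plausibly reduces dimension via something like
    # np.linalg.eigvalsh(cov): C eigenvalues instead of C*C entries (assumption).
    ```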
  • Publication
    An Effective Multi-Camera Dataset and Hybrid Feature Matcher for Real-Time Video Stitching
    (IEEE, 2021) Hosen, Md Imran; Islam, Md Baharul; Sadeghzadeh, Arezoo; Cree, MJ; Bahcesehir University
    Multi-camera video stitching combines several videos captured by different cameras into a single video with a wide Field of View (FOV). In this paper, a novel dataset is developed for video stitching, consisting of 30 video sets captured by four static cameras in various environmental scenarios. A new video stitching method is then proposed based on a hybrid matcher for stitching four videos with over 200 degrees of FOV. The keypoints and descriptors are obtained by the Scale-Invariant Feature Transform (SIFT) and Root-SIFT, respectively. These keypoint descriptors are matched by applying a hybrid matcher, a combination of the Brute-Force (BF) and Fast Library for Approximate Nearest Neighbors (FLANN) matchers. After geometric verification and elimination of outlier matching points, a one-time homography is estimated using Random Sample Consensus (RANSAC). The proposed method is implemented and evaluated in different indoor/outdoor video settings. Experimental results demonstrate the capability, high accuracy, and robustness of the proposed method.
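    The matching pipeline maps cleanly onto standard OpenCV calls. The sketch below is illustrative, not the authors' code: the Lowe ratio threshold, the FLANN parameters, and the use of FLANN alone (rather than the paper's BF+FLANN hybrid) are assumptions.

    ```python
    import cv2
    import numpy as np

    def root_sift(desc: np.ndarray) -> np.ndarray:
        """Root-SIFT: L1-normalize SIFT descriptors, then take the square root."""
        desc = desc / (np.abs(desc).sum(axis=1, keepdims=True) + 1e-7)
        return np.sqrt(desc)

    def estimate_homography(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
        sift = cv2.SIFT_create()
        kp_a, d_a = sift.detectAndCompute(img_a, None)
        kp_b, d_b = sift.detectAndCompute(img_b, None)
        d_a, d_b = root_sift(d_a), root_sift(d_b)
        # KD-tree FLANN matcher; parameters here are common defaults, not the paper's.
        flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 50})
        matches = flann.knnMatch(d_a, d_b, k=2)
        good = [m for m, n in matches if m.distance < 0.7 * n.distance]  # Lowe ratio test
        src = np.float32([kp_a[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst = np.float32([kp_b[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
        # One-time homography via RANSAC (needs at least 4 good matches).
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        return H
    ```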