Research Outputs | WoS | Scopus | TR-Dizin | PubMed
Permanent URI for this community: https://hdl.handle.net/20.500.14719/1741
Search Results (11 results)
Publication (metadata only)
Assistive Visual Tool: Enhancing Safe Navigation with Video Remapping in AR Headsets (SPRINGER INTERNATIONAL PUBLISHING AG, 2025)
Sadeghzadeh, Arezoo; Islam, Md Baharul; Uddin, Md Nur; Aydin, Tarkan; DelBue, A; Canton, C; Pont-Tuset, J; Tommasi, T; Bahcesehir University; State University System of Florida; Florida Gulf Coast University
Visual Field Loss (VFL) is characterized by blind spots, or scotomas, that have a detrimental impact on individuals' fundamental movement activities. To address the challenges faced by existing Extended Reality (XR) systems used as vision aids (e.g., low video quality, content loss, high levels of contradiction, and limited mobility assessment), we introduce a novel method that improves real-time navigation using Augmented Reality (AR) glasses. Our vision aid employs advanced video processing techniques to enhance visual perception in individuals with moderate to severe VFL, bridging the gap to healthy vision. An optimal video remapping function, tailored to the characteristics of the selected AR glasses, dynamically maps live video content to the largest intact region of the Visual Field (VF) map. Our method preserves video quality, minimizing blurriness and distortion. In a comprehensive empirical user study involving 29 subjects with artificially induced scotomas, statistical analyses of object counting and multi-tasking walking track tests demonstrate the promising performance of our method in enhancing visual awareness and navigation capability in real time.

Publication (metadata only)
Advancing Retinal Image Segmentation: A Denoising Diffusion Probabilistic Model Perspective (IEEE COMPUTER SOC, 2024)
Alimanov, Alnur; Islam, Md Baharul; Bahcesehir University; State University System of Florida; Florida Gulf Coast University
Retinal images and vessel trees play a crucial role in helping ophthalmologists identify and diagnose various illnesses related to the eyes, blood vessels, and brain.
However, manual retinal image segmentation is a laborious procedure that demands considerable skill and time. This study proposes a novel approach to retinal image segmentation that leverages the Denoising Diffusion Probabilistic Model (DDPM) for precise segmentation. To the best of our knowledge, this is the first application of DDPM in this domain. Our approach incorporates a novel constraint that prevents DDPM from generating vessel structures that are not present in the original retinal images during segmentation. In addition, our model is not limited to the original DDPM size of 64 x 64 pixels; instead, we train it to segment images of 256 x 256 pixels effectively. This is a significant advancement, since the original DDPM works exclusively with 64 x 64 images and is primarily designed to generate random image samples. Our work addresses both limitations with a novel, efficient approach to accurate retinal image segmentation. The methodology is evaluated comprehensively, both quantitatively and qualitatively, and the proposed method achieves competitive performance compared with state-of-the-art techniques on both kinds of scores. The source code is available at https://github.com/AAleka/DDPM-segmentation.

Publication (metadata only)
ASD-EVNet: An Ensemble Vision Network based on Facial Expression for Autism Spectrum Disorder Recognition (IEEE, 2023)
Jaby, Assil; Islam, Md Baharul; Ahad, Md Atiqur Rahman; Bahcesehir University; University of East London
Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that affects individuals' social interaction, communication, and behavior. Early diagnosis and intervention are critical for the well-being and development of children with ASD. Existing methods for diagnosing ASD are either unreliable (i.e., of limited accuracy) or require significant time and resources.
We aim to improve the precision of ASD diagnosis by using facial expressions, a readily accessible and less time-consuming approach. This paper presents the ASD Ensemble Vision Network (ASD-EVNet) for recognizing ASD from facial expressions. The model uses three Vision Transformer (ViT) architectures, pre-trained on ImageNet-21K and fine-tuned on the ASD dataset. We also develop an extensive facial expression-based ASD dataset for children (FADC). The ensemble learning model combines the predictions of the three ViT models and feeds them to a classifier. Our experiments demonstrate that the proposed ensemble learning model outperforms existing approaches, achieving state-of-the-art results in detecting ASD from facial expressions.

Publication (metadata only)
Machine Vision-Based Expert System for Automated Skin Cancer Detection (SPRINGER INTERNATIONAL PUBLISHING AG, 2022)
Junayed, Masum Shah; Jeny, Afsana Ahsan; Rada, Lavdie; Islam, Md Baharul; BritoLoeza, C; MartinGonzalez, A; CastanedaZeman, V; Safi, A; Bahcesehir University; Daffodil International University
Skin cancer is the most frequently occurring kind of cancer, accounting for about one-third of all cases. Automatic early detection that does not require visual inspection by an expert would greatly benefit society. Image processing and machine learning methods have contributed significantly to medical and biomedical research, enabling fast and accurate inspection across many problems. One such problem is accurate cancer detection and classification. In this study, we introduce an expert system based on image processing and machine learning for skin cancer detection and classification. The proposed approach consists of three main steps: pre-processing, feature extraction, and classification. The pre-processing step applies grayscale conversion, Gaussian filtering, segmentation, and morphological operations to obtain a better representation of skin lesion images.
We employ two feature extractors, the ABCD scoring method (asymmetry, border, color, diameter) and the gray-level co-occurrence matrix (GLCM), to extract features from cancer-affected areas. Finally, five machine learning classifiers, namely logistic regression (LR), decision tree (DT), k-nearest neighbors (KNN), support vector machine (SVM), and random forest (RF), are used to detect and classify skin cancer. Experimental results show that random forest outperforms all other classifiers, achieving an accuracy of 97.62% and an Area Under the Curve (AUC) of 0.97, which is state-of-the-art on the open-source PH2 dataset used in the experiments.

Publication (metadata only)
Deep Covariance Feature and CNN-based End-to-End Masked Face Recognition (IEEE, 2021)
Junayed, Masum Shah; Sadeghzadeh, Arezoo; Islam, Md Baharul; Struc, V; Ivanovska, M; Bahcesehir University
With the emergence of the global COVID-19 pandemic, face recognition systems have attracted considerable attention as contactless identity verification methods. However, masks cover a considerable part of the face, which poses severe challenges for conventional face recognition systems. This paper proposes an automated Masked Face Recognition (MFR) system that combines a mask occlusion discarding technique with a deep learning model. First, in a pre-processing step, the images are passed through three filters. Then, a Convolutional Neural Network (CNN) model is proposed to extract features from the unoccluded regions of the face (i.e., eyes and forehead). These feature maps are used to obtain covariance-based features. Two extra layers, Bitmap and Eigenvalue, are designed to reduce the dimension of these covariance feature matrices and concatenate them. The deep covariance features are quantized into codebooks following the Bag-of-Features (BoF) paradigm. Finally, a global histogram is built from these codebooks and used to train an SVM classifier.
Trained and evaluated on the Real-World-Masked-Face-Recognition-Dataset (RMFRD) and the Simulated-Masked-Face-Recognition-Dataset (SMFRD), the proposed method achieves accuracies of 95.07% and 92.32%, respectively, showing competitive performance compared with the state of the art. Experimental results show that the system is highly robust to noisy data and illumination variations.

Publication (metadata only)
AN EFFICIENT END-TO-END IMAGE COMPRESSION TRANSFORMER (IEEE, 2022)
Jeny, Afsana Ahsan; Junayed, Masum Shah; Islam, Md Baharul; Bahcesehir University
Image and video compression have received significant research attention, and their applications have expanded. Existing entropy estimation-based methods combine a hyperprior with local context, which limits their efficacy. This paper introduces an efficient end-to-end transformer-based image compression model that provides a global receptive field to capture long-range correlations. A hyper encoder-decoder-based transformer block employs a multi-head spatial reduction self-attention (MHSRSA) layer to reduce the computational cost of self-attention and enable rapid learning of multi-scale and high-resolution features. A Casual Global Anticipation Module (CGAM) is designed to construct highly informative adjacent contexts using channel-wise linkages and to identify global reference points in the latent space for end-to-end rate-distortion optimization (RDO). Experimental results on the KODAK dataset demonstrate the effectiveness and competitive performance of the model.

Publication (metadata only)
DeepPyNet: A Deep Feature Pyramid Network for Optical Flow Estimation (IEEE, 2021)
Jeny, Afsana Ahsan; Islam, Md Baharul; Aydin, Tarkan; Cree, MJ; Bahcesehir University
Recent advances in optical flow prediction have been made possible by feature pyramids and iterative refinement.
However, downsampling in feature pyramids may cause foreground objects to merge with the background, and iterative processing can then produce incorrect optical flow estimates. In particular, the motion of narrow and tiny objects can become invisible in the estimated flow field. We introduce DeepPyNet, a novel optical flow estimation method consisting of a feature extractor, a multi-channel cost volume, and a flow decoder. In this method, we propose a deep recurrent feature pyramid-based network for end-to-end optical flow estimation. Features are extracted from each pixel of the feature map, preserving essential information without modifying the feature receptive field. Then, a multi-scale 4D correlation volume is built from the visual similarity of each pair of pixels. Finally, the multi-scale correlation volumes are used to update the flow field iteratively through a recurrent method. Experimental results demonstrate that DeepPyNet significantly reduces flow errors and achieves state-of-the-art performance on various datasets. Moreover, DeepPyNet is less complex, using only 6.1M parameters, 81% and 35% fewer than the popular FlowNet and PWC-Net+, respectively.

Publication (metadata only)
Towards Stereoscopic Video Deblurring Using Deep Convolutional Networks (SPRINGER INTERNATIONAL PUBLISHING AG, 2021)
Imani, Hassan; Islam, Md Baharul; Bebis, G; Athitsos, V; Yan, T; Lau, M; Li, F; Shi, C; Yuan, X; Mousas, C; Bruder, G; Bahcesehir University
Stereoscopic cameras are now common in daily life, for example in new smartphones and other emerging technologies. The quality of stereo video can be degraded by various factors (e.g., blur artifacts caused by camera or object motion). Several methods have been proposed for monocular deblurring, but only a few address stereo content deblurring. This paper presents a novel stereoscopic video deblurring model that considers consecutive left and right video frames.
To compensate for motion in stereoscopic video, we feed the previous and next consecutive frames to 3D CNNs, which helps further deblurring. Our model also uses information from the other stereoscopic view to aid deblurring. Specifically, to deblur the stereo frames, the model takes the left and right stereoscopic frames and several neighboring left and right frames as inputs. Then, after compensating for the transformation between consecutive frames, a 3D Convolutional Neural Network (CNN) is applied to the left and right batches of frames to extract their features. This network consists of modified 3D U-Nets. To aggregate the left and right features, a modified Parallax Attention Module (PAM) fuses them and creates the output deblurred frames. Experimental results on the recently proposed Stereo Blur dataset show that the proposed method can effectively deblur blurry stereoscopic videos.

Publication (metadata only)
PoseTED: A Novel Regression-Based Technique for Recognizing Multiple Pose Instances (SPRINGER INTERNATIONAL PUBLISHING AG, 2021)
Jeny, Afsana Ahsan; Junayed, Masum Shah; Islam, Md Baharul; Bebis, G; Athitsos, V; Yan, T; Lau, M; Li, F; Shi, C; Yuan, X; Mousas, C; Bruder, G; Bahcesehir University
Pose estimation for multiple people can be viewed as a hierarchical set prediction problem. Algorithms must appropriately classify all persons according to their body parts. Pose estimation methods fall into two categories: (1) heatmap-based and (2) regression-based. Heatmap-based techniques rely on various heuristic designs and are not end-to-end trainable, whereas regression-based methods involve fewer intermediate non-differentiable stages. This paper presents PoseTED, a novel regression-based multi-instance human pose recognition network.
It uses the well-known YOLOv4 object detector for person detection and a spatial transformer network (STN) as a cropping filter. Next, a CNN-based backbone extracts deep features, and an encoder-decoder transformer with positional encoding is applied for keypoint detection, addressing the heuristic design problem of earlier techniques and improving overall performance. A prediction-based feed-forward network (FFN) predicts the locations of multiple keypoints as a group and outputs the body components. Experiments on two public datasets, COCO and MPII, yield an average precision (AP) of 73.7% on COCO val, 72.7% on COCO test-dev, and 89.7% on MPII. These results are comparable to those of state-of-the-art methods.

Publication (metadata only)
An Effective Multi-Camera Dataset and Hybrid Feature Matcher for Real-Time Video Stitching (IEEE, 2021)
Hosen, Md Imran; Islam, Md Baharul; Sadeghzadeh, Arezoo; Cree, MJ; Bahcesehir University
Multi-camera video stitching combines several videos captured by different cameras into a single video with a wide Field-of-View (FOV). In this paper, a novel video stitching dataset is developed, consisting of 30 video sets captured by four static cameras in various environmental scenarios. Then, a new video stitching method based on a hybrid matcher is proposed for stitching four videos with an FOV of over 200 degrees. Keypoints and descriptors are obtained using the scale-invariant feature transform (SIFT) and Root-SIFT, respectively. These keypoint descriptors are then matched using a hybrid matcher that combines Brute Force (BF) and Fast Library for Approximate Nearest Neighbors (FLANN) matchers. After geometric verification and removal of outlier matches, a one-time homography is estimated using Random Sample Consensus (RANSAC).
The proposed method is implemented and evaluated in different indoor/outdoor video settings. Experimental results demonstrate the capability, high accuracy, and robustness of the proposed method.
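The descriptor and matching steps in the video stitching abstract above can be illustrated with a short sketch. It is not the authors' implementation: the abstract does not say how the BF and FLANN matchers are combined, so this only shows the standard RootSIFT transform (L1-normalize a SIFT descriptor, then take the element-wise square root) followed by a brute-force nearest-neighbour search with a ratio test. The function names, the 0.75 ratio threshold, and the toy 4-dimensional descriptors (real SIFT descriptors have 128 dimensions) are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Illustrative sketch only: hypothetical helper names, not the authors' hybrid BF+FLANN matcher.


def root_sift(desc: Sequence[float], eps: float = 1e-12) -> List[float]:
    """RootSIFT: L1-normalise a (non-negative) SIFT descriptor, then take the element-wise square root."""
    l1 = sum(abs(x) for x in desc) + eps  # eps guards against an all-zero descriptor
    return [math.sqrt(abs(x) / l1) for x in desc]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_ratio_match(
    query: Sequence[Sequence[float]],
    train: Sequence[Sequence[float]],
    ratio: float = 0.75,  # assumed threshold for the ratio test
) -> List[Tuple[int, int]]:
    """Compare every query descriptor with every train descriptor (brute force) and keep
    (query_index, train_index) pairs whose nearest distance is clearly smaller than the
    second-nearest distance."""
    matches: List[Tuple[int, int]] = []
    if len(train) < 2:  # the ratio test needs at least two candidates
        return matches
    for qi, q in enumerate(query):
        # Sort all candidates by distance; ties are broken by train index.
        dists = sorted((euclidean(q, t), ti) for ti, t in enumerate(train))
        (d1, best), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches


if __name__ == "__main__":
    train = [root_sift(d) for d in ([4, 0, 0, 0], [0, 4, 0, 0], [0, 0, 4, 0])]
    query = [root_sift(d) for d in ([3, 0, 0, 1], [1, 1, 1, 1])]
    # The first query is clearly closest to train[0]; the second is (almost) equidistant
    # from all train descriptors, so the ratio test rejects it.
    print(brute_force_ratio_match(query, train))  # [(0, 0)]
```

Because RootSIFT vectors have unit L2 norm, Euclidean distance between them amounts to comparing the original descriptors with the Hellinger kernel, which is why the transform is often used as a drop-in replacement for SIFT descriptors. In a full pipeline of the kind the abstract describes, the surviving matches would then go through geometric verification and RANSAC homography estimation.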
