Research Outputs | WoS | Scopus | TR-Dizin | PubMed
Permanent URI for this community: https://hdl.handle.net/20.500.14719/1741
8 results
Search Results
Assistive Visual Tool: Enhancing Safe Navigation with Video Remapping in AR Headsets (Springer International Publishing AG, 2025)
Contributors: Sadeghzadeh, Arezoo; Islam, Md Baharul; Uddin, Md Nur; Aydin, Tarkan; Del Bue, A; Canton, C; Pont-Tuset, J; Tommasi, T
Affiliations: Bahcesehir University; State University System of Florida; Florida Gulf Coast University
Visual Field Loss (VFL) is characterized by blind spots, or scotomas, that have a detrimental impact on individuals' fundamental movement activities. To address the challenges faced by existing Extended Reality (XR) systems used as vision aids (e.g., low video quality, content loss, high levels of contradiction, and limited mobility assessment), we introduce a method that improves real-time navigation using Augmented Reality (AR) glasses. Our novel vision aid employs advanced video processing techniques to enhance visual perception in individuals with moderate to severe VFL, bridging the gap to healthy vision. A unique optimal video remapping function, tailored to the characteristics of our selected AR glasses, dynamically maps live video content to the largest intact region of the Visual Field (VF) map. Our method preserves video quality, minimizing blurriness and distortion.
In a comprehensive empirical user study involving 29 subjects with artificially induced scotomas, statistical analyses of object-counting and multitasking walking-track tests demonstrate the promising performance of our method in enhancing visual awareness and navigation capability in real time.

SkNet: A Convolutional Neural Networks Based Classification Approach for Skin Cancer Classes (IEEE, 2020)
Contributors: Jeny, Afsana Ahsan; Sakib, Abu Noman Md; Junayed, Masum Shah; Lima, Khadija Akter; Ahmed, Ikhtiar; Islam, Md Baharul
Affiliations: Daffodil International University; Khulna University of Engineering & Technology (KUET); Bahcesehir University
Skin cancer is one of the most common types of cancer, and a solution to this globally recognized health problem is urgently needed. Machine learning techniques have brought revolutionary changes to biomedical research. Previously, detecting skin cancer took a significant amount of time and effort. In recent years, much work based on deep learning has made the process considerably faster and more accurate. In this paper, we propose a novel Convolutional Neural Network (CNN)-based approach that classifies four different types of skin cancer. Our model, SkNet, consists of 19 convolutional layers. In previous work, the highest accuracy achieved on 1000 images was 80.52%.
Our proposed model surpasses that result, achieving an accuracy of 95.26% on a dataset of 4800 images, which is the highest accuracy obtained.

Machine Vision-Based Expert System for Automated Skin Cancer Detection (Springer International Publishing AG, 2022)
Contributors: Junayed, Masum Shah; Jeny, Afsana Ahsan; Rada, Lavdie; Islam, Md Baharul; Brito-Loeza, C; Martin-Gonzalez, A; Castaneda-Zeman, V; Safi, A
Affiliations: Bahcesehir University; Daffodil International University
Skin cancer is the most frequently occurring kind of cancer, accounting for about one-third of all cases. Automatic early detection, without the need for visual inspection by an expert, would be of great help to society. Image processing and machine learning methods have contributed significantly to medical and biomedical research, enabling fast and accurate inspection across many problems. One such problem is accurate cancer detection and classification. In this study, we introduce an expert system based on image processing and machine learning for skin cancer detection and classification. The proposed approach consists of three main steps: pre-processing, feature extraction, and classification. The pre-processing step uses grayscale conversion, a Gaussian filter, segmentation, and morphological operations to better represent skin lesion images. We employ two feature extractors, the ABCD scoring method (asymmetry, border, color, diameter) and the gray level co-occurrence matrix (GLCM), to extract cancer-affected areas. Finally, five machine learning classifiers, namely logistic regression (LR), decision tree (DT), k-nearest neighbors (KNN), support vector machine (SVM), and random forest (RF), are used to detect and classify skin cancer.
Experimental results show that random forest outperforms all other classifiers, achieving an accuracy of 97.62% and an Area Under the Curve (AUC) of 0.97, which is state-of-the-art on the open-source PH2 dataset used in the experiments.

HiMODE: A Hybrid Monocular Omnidirectional Depth Estimation Model (IEEE, 2022)
Contributors: Junayed, Masum Shah; Sadeghzadeh, Arezoo; Islam, Md Baharul; Wong, Lai-Kuan; Aydin, Tarkan
Affiliations: Bahcesehir University; Multimedia University
Monocular omnidirectional depth estimation is receiving considerable research attention due to its broad applications in sensing 360-degree surroundings. Existing approaches in this field struggle to recover small-object details and the data lost during ground-truth depth map acquisition. In this paper, we propose HiMODE, a novel monocular omnidirectional depth estimation model based on a hybrid CNN+Transformer (encoder-decoder) architecture whose modules are efficiently designed to mitigate distortion and computational cost without degrading performance. First, we design a feature pyramid network based on the HNet block to extract high-resolution features near the edges. Performance is further improved by a self- and cross-attention layer and by spatial/temporal patches in the Transformer encoder and decoder, respectively. In addition, a spatial residual block is employed to reduce the number of parameters. By jointly passing the deep features extracted from an input image at each backbone block, along with the raw depth maps predicted by the Transformer encoder-decoder, through a context adjustment layer, our model produces depth maps with better visual quality than the ground truth. Comprehensive ablation studies demonstrate the significance of each individual module. Extensive experiments on three datasets, Stanford3D, Matterport3D, and SunCG, show that HiMODE achieves state-of-the-art performance for 360-degree monocular depth estimation.
Complete project code and supplementary materials are available at https://github.com/himode5008/HiMODE.

A New Dataset and Transformer for Stereoscopic Video Super-Resolution (IEEE, 2022)
Contributors: Imani, Hassan; Islam, Md Baharul; Wong, Lai-Kuan
Affiliations: Bahcesehir University; Multimedia University
Stereo video super-resolution (SVSR) aims to enhance the spatial resolution of low-resolution video by reconstructing high-resolution video. The key challenges in SVSR are preserving stereo consistency and temporal consistency, without which viewers may experience 3D fatigue. There are several notable works on stereoscopic image super-resolution, but there is little research on stereo video super-resolution. In this paper, we propose a novel Transformer-based model for SVSR, namely Trans-SVSR. Trans-SVSR comprises two key novel components: a spatio-temporal convolutional self-attention layer and an optical flow-based feed-forward layer that discovers the correlation across different video frames and aligns the features. The stereo views are fused with the parallax attention mechanism (PAM), which uses cross-view information to account for large disparities. Due to the lack of a benchmark dataset suitable for the SVSR task, we collected a new stereoscopic video dataset, SVSR-Set, containing 71 full high-definition (HD) stereo videos captured using a professional stereo camera. Extensive experiments on the collected dataset, along with two other datasets, demonstrate that Trans-SVSR achieves competitive performance compared to state-of-the-art methods.
Project code and additional results are available at https://github.com/H-deep/Trans-SVSR/.

BinoVFAR: An Efficient Binocular Visual Field Assessment Method using Augmented Reality Glasses (Association for Computing Machinery, 2021)
Contributors: Islam, Md Baharul; Sadeghzadeh, Arezoo
Affiliations: Bahcesehir University
Virtual Reality (VR)-based Visual Field Assessment (VFA) methods completely isolate users from the real world, which results in nausea, eye strain, and a lack of concentration and patience during the time-consuming test. In this paper, we present BinoVFAR, a robust binocular visual field assessment method based on novel Augmented Reality (AR) glasses that can find the visual field (VF) of both eyes simultaneously. In this method, 60 stimuli arranged in 6 rows and 10 columns appear randomly on a white background on the display of the AR glasses. Each stimulus is displayed for 2 seconds, during which its intensity changes continuously from light gray to black. Wearing the AR glasses and focusing on the central fixation point, users are asked to press a clicker when they see a stimulus. Based on the users' responses, the intensities and positions of the seen stimuli are recorded in a 6 × 10 matrix. Bicubic interpolation is then applied to compute the binocular visual field map as a 600 × 1000 matrix. A set of experiments with an average accuracy of 99.93%, including repeatability and reproducibility tests with an average intra-class correlation coefficient (ICC) of 99.72%, is conducted to evaluate BinoVFAR.

Towards Stereoscopic Video Deblurring Using Deep Convolutional Networks (Springer International Publishing AG, 2021)
Contributors: Imani, Hassan; Islam, Md Baharul; Bebis, G; Athitsos, V; Yan, T; Lau, M; Li, F; Shi, C; Yuan, X; Mousas, C; Bruder, G
Affiliations: Bahcesehir University
Stereoscopic cameras are now commonly used in daily life, for example in new smartphones and other emerging technologies.
The quality of stereo video can be degraded by various factors (e.g., blur artifacts due to camera or object motion). To address this issue, several methods have been proposed for monocular deblurring, but only a limited number of works address stereo content deblurring. This paper presents a novel stereoscopic video deblurring model that considers consecutive left and right video frames. To compensate for motion in stereoscopic video, we feed the previous and next consecutive frames to 3D CNNs, which helps further deblurring. Our model also uses information from the other stereo view to aid deblurring. Specifically, to deblur the stereo frames, our model takes the left and right stereoscopic frames and some neighboring left and right frames as inputs. Then, after compensating for the transformation between consecutive frames, a 3D Convolutional Neural Network (CNN) is applied to the left and right batches of frames to extract their features. The model consists of modified 3D U-Net networks. To aggregate the left and right features, a modified Parallax Attention Module (PAM) fuses them and creates the deblurred output frames. Experimental results on the recently proposed Stereo Blur dataset show that the proposed method can effectively deblur blurry stereoscopic videos.

PoseTED: A Novel Regression-Based Technique for Recognizing Multiple Pose Instances (Springer International Publishing AG, 2021)
Contributors: Jeny, Afsana Ahsan; Junayed, Masum Shah; Islam, Md Baharul; Bebis, G; Athitsos, V; Yan, T; Lau, M; Li, F; Shi, C; Yuan, X; Mousas, C; Bruder, G
Affiliations: Bahcesehir University
Multi-person pose estimation can be viewed as a hierarchical set prediction problem: algorithms must appropriately classify all persons according to their body parts. Pose estimation methods fall into two categories: (1) heatmap-based and (2) regression-based.
Heatmap-based techniques are sensitive to various heuristic designs and are not end-to-end trainable, whereas regression-based methods involve fewer intermediate non-differentiable stages. This paper presents PoseTED, a novel regression-based multi-instance human pose recognition network. It uses the well-known YOLOv4 object detector for person detection and a spatial transformer network (STN) as a cropping filter. A CNN-based backbone then extracts deep features, and positional encoding with an encoder-decoder transformer is applied for keypoint detection, addressing the heuristic-design problem of earlier techniques and improving overall performance. A prediction-based feed-forward network (FFN) predicts several keypoint locations as a group and outputs the body components. Experiments on two public datasets, COCO and MPII, yield an average precision (AP) of 73.7% on COCO val, 72.7% on COCO test-dev, and 89.7% on MPII. These results are comparable to those of state-of-the-art methods.
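The Assistive Visual Tool entry above maps live video onto "the largest intact region" of a visual field (VF) map. Its optimal remapping function is tailored to specific AR glasses, and the abstract does not describe how it works. The sketch below is only a rough illustration under stated assumptions, not the authors' method. It assumes the VF map has already been thresholded into a binary grid where 1 means an intact cell. It also assumes the target region is the largest axis-aligned all-ones rectangle, which it finds with the standard histogram-and-monotonic-stack method in O(rows × cols) time. The video frame is then scaled with its aspect ratio preserved and centred inside that rectangle. All names (`largest_intact_rectangle`, `fit_frame`, `Rect`) are hypothetical.

```python
from typing import List, Tuple

# Hypothetical type: (top, left, height, width) measured in VF-grid cells.
Rect = Tuple[int, int, int, int]


def largest_intact_rectangle(vf: List[List[int]]) -> Rect:
    """Largest axis-aligned rectangle of 1s (assumed 'intact' cells) in a binary grid."""
    cols = len(vf[0])
    heights = [0] * cols  # consecutive intact cells ending at the current row
    best: Rect = (0, 0, 0, 0)
    best_area = 0
    for r, row in enumerate(vf):
        for c in range(cols):
            heights[c] = heights[c] + 1 if row[c] else 0
        # Largest rectangle under the histogram `heights`, via a monotonic stack.
        stack: List[Tuple[int, int]] = []  # (start column, bar height), heights increasing
        for c in range(cols + 1):
            h = heights[c] if c < cols else 0  # zero-height sentinel flushes the stack
            start = c
            while stack and stack[-1][1] >= h:
                s, sh = stack.pop()
                if sh * (c - s) > best_area:
                    best_area = sh * (c - s)
                    best = (r - sh + 1, s, sh, c - s)
                start = s  # the new, lower bar can extend left to where the popped bar began
            stack.append((start, h))
    return best


def fit_frame(rect: Rect, frame_h: int, frame_w: int) -> Rect:
    """Uniformly scale a frame_h x frame_w frame to fit inside rect, centred (assumed policy)."""
    top, left, h, w = rect
    scale = min(h / frame_h, w / frame_w)
    new_h, new_w = int(frame_h * scale), int(frame_w * scale)
    return top + (h - new_h) // 2, left + (w - new_w) // 2, new_h, new_w
```

For example, in the grid `[[1, 1, 0, 1], [1, 1, 0, 1], [1, 1, 1, 1]]` the two left columns form the largest intact block, so the function returns `(0, 0, 3, 2)`. A rectangle is used because a video frame is rectangular. A real system would still need to convert grid cells into display pixels for the particular headset.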
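The Machine Vision-Based Expert System entry above uses a gray level co-occurrence matrix (GLCM) as one of its two feature extractors. The abstract does not state the pixel offsets, number of gray levels, or GLCM statistics used, so the following is a generic sketch under assumed choices. It uses a single (row, column) offset and optional symmetric counting, and it computes three common Haralick-style statistics: contrast, angular second moment (ASM), and homogeneity. Pixel pairs whose offset falls outside the image are skipped. The names `glcm` and `glcm_features` are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def glcm(image: List[List[int]], levels: int, offset: Tuple[int, int] = (0, 1),
         symmetric: bool = False) -> Matrix:
    """Normalized gray level co-occurrence matrix for one (row, col) offset.

    Assumes `image` is already quantized to integers in [0, levels).
    """
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    counts = [[0.0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:  # skip pairs that fall off the image
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1  # also count the reversed pair
    total = sum(sum(row) for row in counts)
    # Normalize so entries are joint probabilities p(i, j).
    return [[v / total for v in row] for row in counts] if total else counts


def glcm_features(p: Matrix) -> Dict[str, float]:
    """Three common texture statistics of a normalized GLCM (assumed feature set)."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        "asm": sum(p[i][j] ** 2 for i, j in cells),  # angular second moment
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells),
    }
```

A lesion image would typically be converted to grayscale and quantized to a few levels first. Feature vectors from several offsets can then be concatenated and passed to a classifier such as the random forest mentioned in the abstract.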
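The BinoVFAR entry above records clicker responses in a 6 × 10 matrix and applies bicubic interpolation to obtain a 600 × 1000 visual field map. The abstract does not specify which bicubic variant is used. The plain-Python sketch below assumes the Keys cubic-convolution kernel with a = -0.5, pixel-centre alignment, and clamping at the grid border, so it illustrates the upsampling step rather than reproducing BinoVFAR's implementation. The names `keys_kernel`, `_taps`, and `bicubic_upsample` are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def keys_kernel(x: float, a: float = -0.5) -> float:
    """Keys cubic-convolution weight for a sample at distance x (assumed kernel)."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0


def _taps(i: int, n_in: int, n_out: int) -> List[Tuple[int, float]]:
    """Source indices and weights for output index i along one axis."""
    # Pixel-centre alignment (an assumption): output centre i maps to this source coordinate.
    src = (i + 0.5) * n_in / n_out - 0.5
    base = math.floor(src)
    # Four neighbours around src; indices outside the grid are clamped to the border.
    return [(min(max(base + k, 0), n_in - 1), keys_kernel(src - (base + k)))
            for k in (-1, 0, 1, 2)]


def bicubic_upsample(grid: Grid, out_rows: int, out_cols: int) -> Grid:
    """Resample a coarse grid (e.g. 6 x 10 responses) to out_rows x out_cols."""
    in_rows, in_cols = len(grid), len(grid[0])
    row_taps = [_taps(r, in_rows, out_rows) for r in range(out_rows)]
    col_taps = [_taps(c, in_cols, out_cols) for c in range(out_cols)]
    out: Grid = []
    for rt in row_taps:
        out_row = []
        for ct in col_taps:
            # Separable kernel: each of the 16 neighbours is weighted by w_row * w_col.
            out_row.append(sum(wr * wc * grid[ri][ci]
                               for ri, wr in rt for ci, wc in ct))
        out.append(out_row)
    return out
```

Calling `bicubic_upsample(responses, 600, 1000)` on a 6 × 10 `responses` grid gives a map with the dimensions described in the abstract. The Keys kernel's weights sum to 1, so a uniform grid stays uniform. Its negative lobes, however, can push interpolated values slightly outside the input range, so clipping to the valid intensity range may be needed.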
