Publication: An improved method of locality-sensitive hashing for scalable instance matching
| dc.contributor.author | Aydar, Mehmet | |
| dc.contributor.author | Ayvaz, Serkan | |
| dc.contributor.institution | Aydar, Mehmet, Department of Computer Science, Kent State University, Kent, United States | |
| dc.contributor.institution | Ayvaz, Serkan, Department of Software Engineering, Bahçeşehir Üniversitesi, Istanbul, Turkey | |
| dc.date.accessioned | 2025-10-05T16:01:21Z | |
| dc.date.issued | 2019 | |
| dc.description.abstract | In this study, we propose a scalable approach for automatically identifying similar candidate instance pairs in very large datasets. Efficient candidate pair generation is essential to many computational problems that involve calculating instance similarities. Calculating similarities of instances with a large number of properties, and efficiently matching a large number of similar instances in a scalable way, are two significant bottlenecks of candidate instance pair generation. In our approach, we use the locality-sensitive hashing (LSH) technique to greatly improve the scalability of candidate instance pair generation. Given a candidate similarity threshold, our algorithm automatically determines the optimal number of hash functions per LSH band. We also evaluate the scalability of our approach and its effectiveness on the instance matching task using very large real-world datasets. © 2021 Elsevier B.V., All rights reserved. | |
| dc.identifier.doi | 10.1007/s10115-018-1199-5 | |
| dc.identifier.endpage | 294 | |
| dc.identifier.issn | 0219-3116 | |
| dc.identifier.issn | 0219-1377 | |
| dc.identifier.issue | 2 | |
| dc.identifier.scopus | 2-s2.0-85046024215 | |
| dc.identifier.startpage | 275 | |
| dc.identifier.uri | https://doi.org/10.1007/s10115-018-1199-5 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.14719/11220 | |
| dc.identifier.volume | 58 | |
| dc.language.iso | en | |
| dc.publisher | Springer London | |
| dc.relation.source | Knowledge and Information Systems | |
| dc.subject.authorkeywords | Candidate Pairs Generation | |
| dc.subject.authorkeywords | Instance Matching | |
| dc.subject.authorkeywords | Instance Similarity | |
| dc.subject.authorkeywords | Locality-sensitive Hashing | |
| dc.subject.authorkeywords | Scalability | |
| dc.subject.authorkeywords | Hash Functions | |
| dc.subject.authorkeywords | Calculating Similarities | |
| dc.subject.authorkeywords | Computational Problem | |
| dc.subject.authorkeywords | Scalable Approach | |
| dc.subject.authorkeywords | Similarity Threshold | |
| dc.subject.authorkeywords | Large Dataset | |
| dc.subject.indexkeywords | Hash functions | |
| dc.subject.indexkeywords | Scalability | |
| dc.subject.indexkeywords | Calculating similarities | |
| dc.subject.indexkeywords | Candidate Pairs Generation | |
| dc.subject.indexkeywords | Computational problem | |
| dc.subject.indexkeywords | Instance matching | |
| dc.subject.indexkeywords | Instance Similarity | |
| dc.subject.indexkeywords | Locality sensitive hashing | |
| dc.subject.indexkeywords | Scalable approach | |
| dc.subject.indexkeywords | Similarity threshold | |
| dc.subject.indexkeywords | Large dataset | |
| dc.title | An improved method of locality-sensitive hashing for scalable instance matching | |
| dc.type | Article | |
| dcterms.references | Achichi, Manel, Results of the Ontology Alignment Evaluation Initiative 2016, CEUR Workshop Proceedings, 1766, pp. 73-129, (2016), Aumueller, David, Schema and ontology matching with COMA++, Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 906-908, (2005), Workshop on Intelligent Exploration of Semantic Data (IESD 2015), co-located with ISWC 2015, (2015), Ayvaz, Serkan, Building Summary Graphs of RDF Data in Semantic Web, Proceedings - IEEE Computer Society's International Computer Software and Applications Conference, 2, pp. 686-691, (2015), Berlin, Jacob, Database schema matching using machine learning with feature selection, Lecture Notes in Computer Science, 2348, pp. 452-466, (2002), Bilenko, Mikhail, Adaptive name matching in information integration, IEEE Intelligent Systems, 18, 5, pp. 16-23, (2003), Bilke, Alexander, Schema matching using duplicates, Proceedings - International Conference on Data Engineering, pp. 69-80, (2005), Bizer, Christian, Linked data - The story so far, International Journal on Semantic Web and Information Systems, 5, 3, pp. 1-22, (2009), Broder, Andrei Z., On the resemblance and containment of documents, pp. 21-29, (1997), Castano, Silvana, Instance matching for ontology population, pp. 121-132, (2008) | |
| dspace.entity.type | Publication | |
| local.indexed.at | Scopus | |
| person.identifier.scopus-author-id | 57063196900 | |
| person.identifier.scopus-author-id | 56676074300 | |
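
The abstract describes generating candidate instance pairs with LSH and choosing the number of hash functions per band from a similarity threshold. The Python block below is a minimal sketch of that general idea, not the authors' algorithm. It assumes MinHash signatures over sets of property tokens (i.e. Jaccard similarity), and it picks the rows per band `r` using the standard approximation that the LSH threshold is about (1/b)^(1/r) for `b` bands. The helper names `choose_bands`, `make_hash_params`, `minhash_signature` and `candidate_pairs`, the `key:value` token format, and the default of 100 hash functions are all hypothetical.

```python
import hashlib
import random
from collections import defaultdict
from itertools import combinations

_PRIME = (1 << 61) - 1  # Mersenne prime modulus for the linear hash family


def choose_bands(num_hashes, threshold):
    """Hypothetical rule: pick rows per band r (b = num_hashes // r bands) so that
    the approximate LSH threshold (1/b) ** (1/r) is closest to `threshold`."""
    best = None
    for r in range(1, num_hashes + 1):
        b = num_hashes // r
        err = abs((1.0 / b) ** (1.0 / r) - threshold)
        if best is None or err < best[0]:
            best = (err, b, r)
    return best[1], best[2]  # (bands, rows per band)


def _token_id(token):
    # Stable 64-bit integer per token (built-in hash() is salted per process).
    return int.from_bytes(hashlib.sha1(token.encode("utf-8")).digest()[:8], "big")


def make_hash_params(num_hashes, seed=0):
    # One (a, c) pair per hash function h(x) = (a*x + c) mod _PRIME.
    rng = random.Random(seed)
    return [(rng.randrange(1, _PRIME), rng.randrange(0, _PRIME)) for _ in range(num_hashes)]


def minhash_signature(tokens, params):
    # Assumes a non-empty token set; min() raises ValueError otherwise.
    ids = [_token_id(t) for t in set(tokens)]
    return [min((a * x + c) % _PRIME for x in ids) for a, c in params]


def candidate_pairs(records, threshold, num_hashes=100, seed=0):
    """records: dict mapping an instance id to its set of property tokens."""
    b, r = choose_bands(num_hashes, threshold)
    params = make_hash_params(b * r, seed)  # leftover hashes (num_hashes % r) are unused
    sigs = {rid: minhash_signature(toks, params) for rid, toks in records.items()}
    pairs = set()
    for band in range(b):
        buckets = defaultdict(list)
        for rid, sig in sigs.items():
            # Instances whose r signature rows in this band match share a bucket.
            buckets[tuple(sig[band * r:(band + 1) * r])].append(rid)
        for ids in buckets.values():
            pairs.update(combinations(sorted(ids), 2))
    return pairs


records = {
    "a": {"title:lsh", "year:2019", "venue:kais", "author:aydar"},
    "b": {"title:lsh", "year:2019", "venue:kais", "author:aydar"},
    "c": {"title:graphs", "year:2015", "venue:compsac", "author:ayvaz"},
}
print(choose_bands(100, 0.8))                   # (10, 10)
print(candidate_pairs(records, threshold=0.8))  # {('a', 'b')}
```

With 100 hash functions and a threshold of 0.8, this rule settles on 10 bands of 10 rows, since (1/10)^(1/10) is roughly 0.794. Only instances that collide in at least one band become candidates, so the subsequent exact similarity computation runs on a pruned set of pairs rather than on all pairs.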
