Status | Published |
The Comparative Analysis of Smith-Waterman Algorithm with Jaro-Winkler Algorithm for the Detection of Duplicate Health Related Records | |
Israel Edem Agbehadji1; Hongji Yang2; Simon Fong3; Richard Millham1 | |
2018-09-17 | |
Conference Name | 2018 INTERNATIONAL CONFERENCE ON ADVANCES IN BIG DATA, COMPUTING AND DATA COMMUNICATION SYSTEMS (ICABCD) |
Source Publication | 2018 International Conference on Advances in Big Data, Computing and Data Communication Systems (icABCD) |
Conference Date | 6-7 Aug. 2018 |
Conference Place | Durban, South Africa |
Publication Place | 345 E 47TH ST, NEW YORK, NY 10017 USA |
Publisher | IEEE |
Abstract | Duplicate detection is the process of identifying pairs of words that refer to the same real-world object. Generally, words consist of letters that have a syntactic representation. In most cases, words such as names are misspelt during data entry, which creates duplicate data and, if left unresolved, can lead to data inconsistency. Fundamental algorithms applied in the design of duplicate detection systems include the Smith-Waterman and Jaro-Winkler algorithms. The study compares and analyses the application of the Smith-Waterman algorithm and the Jaro-Winkler algorithm to finding duplicate words in large datasets such as health datasets. The basis for comparison is how accurately these algorithms detect duplicate words in a large health dataset. The contribution of this paper is the use of the transitive and symmetry properties with both the Smith-Waterman and Jaro-Winkler algorithms when a large dataset is involved in the duplicate detection process. |
Keyword | Smith-Waterman Algorithm; Jaro-Winkler Algorithm; Duplicate Detection; Similarity Measure; Tokenization |
DOI | 10.1109/ICABCD.2018.8465458 |
Indexed By | CPCI-S |
Language | English |
WOS Research Area | Computer Science ; Engineering ; Telecommunications |
WOS Subject | Computer Science, Theory & Methods ; Engineering, Electrical & Electronic ; Telecommunications |
WOS ID | WOS:000446104500068 |
The Source to Article | WOS |
Scopus ID | 2-s2.0-85054617689 |
Document Type | Conference paper |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Affiliation | 1. ICT and Society Research Group, Department of Information Technology, Durban University of Technology, Durban, South Africa; 2. Department of Computer and Information Science, Bath Spa University, Bath, UK; 3. Department of Computer and Information Science, University of Macau, Taipa, Macau SAR |
Recommended Citation GB/T 7714 | Israel Edem Agbehadji,Hongji Yang,Simon Fong,et al. The Comparative Analysis of Smith-Waterman Algorithm with Jaro-Winkler Algorithm for the Detection of Duplicate Health Related Records[C], 345 E 47TH ST, NEW YORK, NY 10017 USA:IEEE, 2018. |
APA | Israel Edem Agbehadji, Hongji Yang, Simon Fong, & Richard Millham (2018). The Comparative Analysis of Smith-Waterman Algorithm with Jaro-Winkler Algorithm for the Detection of Duplicate Health Related Records. 2018 International Conference on Advances in Big Data, Computing and Data Communication Systems (icABCD). |
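
The abstract names two string-similarity measures without showing them. The following is a minimal sketch of textbook versions of both, not the authors' implementation: the function names, the scoring values (match=2, mismatch=-1, gap=-1), the Winkler prefix weight (0.1, capped at 4 characters), and the normalization of the Smith-Waterman score to the range 0 to 1 are all assumptions chosen for illustration.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (assumed p=0.1, cap 4)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def smith_waterman(s1: str, s2: str, match: int = 2,
                   mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score with a linear gap penalty (assumed scores)."""
    prev = [0] * (len(s2) + 1)
    best = 0
    for a in s1:
        curr = [0]
        for j, b in enumerate(s2, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def sw_similarity(s1: str, s2: str, match: int = 2) -> float:
    """Assumed normalization: score divided by the best score the shorter string allows."""
    if not s1 or not s2:
        return 0.0
    return smith_waterman(s1, s2, match=match) / (match * min(len(s1), len(s2)))


if __name__ == "__main__":
    # Hypothetical name pair of the kind produced by a data-entry misspelling.
    a, b = "jonathan", "jonathon"
    print("Jaro-Winkler:", round(jaro_winkler(a, b), 4))
    print("Smith-Waterman (normalized):", round(sw_similarity(a, b), 4))
```

In a duplicate detection setting, a pair of values would typically be flagged as a candidate duplicate when its score exceeds a chosen threshold; the threshold itself is a tuning decision that this record does not specify.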