Summary

Top 9 papers analyzed

The Jaro-Winkler distance measures the similarity between two strings and, by extension, between documents. It is often used in both plagiarism detection and record linkage. For example, the Smith-Waterman and Jaro-Winkler algorithms have been compared for the detection of duplicate health-related records, showing that Jaro-Winkler can accurately identify and match health records. The Jaro-Winkler distance has also been used in feature-extraction-based deep indexing for image retrieval and in an extended Fellegi-Sunter probabilistic record linkage method, which allows the accurate linking of patient records.

To find the distance between two strings using Jaro-Winkler, the lengths of the strings, the number of matching characters, and the relative positions of the matching characters must be taken into consideration. A formula combines these parameters into a single number that represents the distance between the two strings, which can then be used to measure the similarity between documents.
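The calculation described above can be sketched in Python. This is a minimal illustration of the standard textbook definition, not the implementation used in any of the papers listed here. The function names, the prefix scaling factor of 0.1, and the prefix cap of 4 characters are conventional defaults chosen for this sketch.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if they are equal and lie
    # within this many positions of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order are
    # counted in pairs as transpositions.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (
        matches / len1
        + matches / len2
        + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted for strings that share a common prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


def jaro_winkler_distance(s1: str, s2: str) -> float:
    """Distance form: 0.0 for identical strings, 1.0 for no similarity."""
    return 1.0 - jaro_winkler(s1, s2)
```

For example, `jaro_winkler("MARTHA", "MARHTA")` gives roughly 0.961: all six characters match, one pair is transposed, and the shared prefix "MAR" raises the Jaro score of about 0.944.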

For instance, it is quite easy to search through a directory of records if the name of a record is known in the exact form in which it was recorded; however, if there is a difference, for example as a result of incorrect spelling, the search becomes a challenge [1]. Usually, institutions either assign staff to manually identify the duplicate records that have to be merged or unmerged [2], or periodically review reports of potential duplicates generated by automated systems so as to resolve any mismatched records. The health sector is one of the sectors that has seen widespread use of electronic information exchange [2]. As the use of different health systems and the electronic exchange of patient data increase, accurately identifying and matching health records has been recognized as a major challenge for the industry and for many organizations [2]. The study discusses duplicate detection algorithms and compares the accuracy of two widely used algorithms, Smith-Waterman and Jaro-Winkler.
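To make the misspelling problem concrete, the following is a minimal sketch of a Smith-Waterman local alignment score applied to two name strings. It is not the study's implementation: the function name, the scoring values (match +2, mismatch -1, gap -1), and the linear gap penalty are assumptions chosen for illustration.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            # A cell never drops below 0, which lets an alignment
            # restart anywhere (the "local" part of the algorithm).
            curr[j] = max(0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

With these assumed scores, an exact match such as `smith_waterman_score("abc", "abc")` yields 6, while a misspelled pair still scores highly because the algorithm aligns the parts that agree and pays only a small penalty for the extra or missing character.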

Published By:

IE Agbehadji, H Yang, S Fong… - … on Advances in Big …, 2018 - ieeexplore.ieee.org

Cited By:

10

Jaro-Winkler is a method that calculates the distance between strings and uses it to measure their similarity. This study compares the plagiarism detection results of the Jaro-Winkler distance method with those of the Doc2Vec method.

Published By:

SC Cahyono - IOP Conference Series: Materials Science and …, 2019 - iopscience.iop.org

Cited By:

14

Thus, efficient and effective measures for comparing the labels of resources are central to facilitating the discovery of links between datasets on the Web of Data as well as their integration and fusion. We evaluate our approach on several datasets derived from DBpedia 3.9 and LinkedGeoData and containing up to 10^6 strings and show that it scales linearly with the size of the data for large thresholds.

Published By:

K Dreßler, AC Ngonga Ngomo - Semantic Web, 2017 - content.iospress.com

Cited By:

34

I. Introduction Correctly linking patients' records is essential in care delivery and epidemiological research. Because of privacy protection legislation, the use of a unique patient identifier, such as a social security number, to link patient data is not allowed in many countries [1]. To make the linkage feasible, we can compare a range of available identifier fields (e.g. first name, last name, birthdate and sex) in record pairs among different databases, in order to make a decision according to their agreement or disagreement. Unfortunately, these identifiers are sometimes subject to typographical errors [2]–[4]. Thus, we need an efficient record linkage method with a strong theoretical background to link patients' records.
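The field-by-field agreement decision described above can be sketched with the classical Fellegi-Sunter match weight. This is the basic binary agree/disagree form, not the extended method proposed in the paper. The function name and the per-field m- and u-probabilities below are hypothetical values chosen for illustration.

```python
import math

# Hypothetical probabilities per identifier field:
#   m = P(field agrees | records belong to the same patient)
#   u = P(field agrees | records belong to different patients)
FIELD_PROBS = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "birthdate": (0.98, 0.001),
    "sex": (0.99, 0.5),
}


def match_weight(rec_a: dict, rec_b: dict, probs: dict = FIELD_PROBS) -> float:
    """Sum of log2 likelihood ratios over all compared fields."""
    total = 0.0
    for field, (m, u) in probs.items():
        if rec_a.get(field) == rec_b.get(field):
            total += math.log2(m / u)              # agreement adds weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts
    return total


a = {"first_name": "anna", "last_name": "martin",
     "birthdate": "1980-02-01", "sex": "F"}
typo = dict(a, first_name="ana")  # one typographical error
other = {"first_name": "paul", "last_name": "durand",
         "birthdate": "1975-07-12", "sex": "M"}

w_same = match_weight(a, a)
w_typo = match_weight(a, typo)
w_other = match_weight(a, other)
```

A pair is typically declared a link when its weight exceeds an upper threshold and a non-link when it falls below a lower one. Exact comparison penalizes the typo in `first_name` as a full disagreement, which is the weakness that approximate comparators such as Jaro-Winkler are meant to soften.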

Published By:

X Li, A Guttmann, S Cipière, L Maigne… - … on Biomedical and …, 2014 - ieeexplore.ieee.org

Cited By:

21

Published By:

BM Kumar, BS Ainapure, SP Singh… - The Computer …, 2022 - academic.oup.com

Cited By:

1

An article published on one of the news websites shows that plagiarism cases have been found among academics. Plagiarism is the act of misusing someone's work by quoting part or all of it without including an exact and clear source [16]. One way to reduce plagiarism is to detect the similarity of text in a document.

Published By:

K Manaf, SW Pitara, B Subaeki… - 2019 IEEE 13th …, 2019 - ieeexplore.ieee.org

Cited By:

8

Digital environments for human learning have evolved considerably thanks to the remarkable progress of information technologies.

Published By:

H Gueddah, A Yousfi… - 2015 IEEE/ACS 12th …, 2015 - ieeexplore.ieee.org

Cited By:

13

One of the known methods for the keyword spotting (KWS) problem is phone lattice search (PLS). In this method, the accuracy and speed of the lattice search are the most important aspects. In this paper, we propose some approaches to improve the false alarm rate and the search speed.

Published By:

M Rajabzadeh, S Tabibian, A Akbari… - The 16th CSI …, 2012 - ieeexplore.ieee.org

Cited By:

20

I. Introduction The ease of internet access has led to the rapid development of social media and to the conversion of conventional news media into online news media. One of the social media platforms often used by the public to express opinions is Twitter [1]. Meanwhile, in terms of online news media, according to a survey conducted by Alexa [2], Liputan6.com and Detik.com are the online media sites with the highest traffic numbers in Indonesia.

Published By:

V Nurcahyawati, Z Mustaffa - 2020 Emerging Technology in …, 2020 - ieeexplore.ieee.org

Cited By:

2