FuzzyWuzzy Python library

shardul Kulkarni
Feb 25, 2021


You can compare data in Python using different libraries and methods:

1) Regex: Python methods and functions
2) Simple compare: Python methods and functions
3) Difflib Python library: FuzzyWuzzy Python library

Today we will discuss the FuzzyWuzzy Python library.

It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.
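To make the idea of Levenshtein distance concrete, here is a minimal sketch in plain Python. It counts the single-character insertions, deletions, and substitutions needed to turn one string into another. The function name levenshtein() is hypothetical and this is not FuzzyWuzzy's own code; the library reports a 0 to 100 similarity score rather than this raw distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (free when the characters match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A smaller distance means the strings are more alike; identical strings have a distance of 0.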

Requirements

Python 2.7 or higher
fuzzywuzzy library
pymysql library (not used by the examples below)

Installation

Using pip via PyPI

pip install fuzzywuzzy

Usage

from fuzzywuzzy import fuzz
from fuzzywuzzy import process

FuzzyWuzzy Types

  1. Simple Ratio
  2. Partial Ratio
  3. Token Sort Ratio
  4. Token Set Ratio
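Simple Ratio, the first entry in the list above, compares the two complete strings and returns a score from 0 to 100. As a rough sketch of that idea, the snippet below uses difflib.SequenceMatcher from the standard library, which FuzzyWuzzy falls back to when python-Levenshtein is not installed. The name simple_ratio() is hypothetical and not part of FuzzyWuzzy.

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Similarity of two full strings as an integer from 0 to 100."""
    # SequenceMatcher.ratio() is 2 * M / T, where M is the number of matching
    # characters and T is the combined length of both strings.
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


print(simple_ratio("fuzzy wuzzy", "fuzzy wuzzy"))  # 100
print(simple_ratio("abcd", "abce"))                # 75
```

In the second call three of the four characters match, so the score is 2 * 3 / 8 = 0.75, or 75.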

Partial Ratio

The partial_ratio() function allows us to perform substring matching. It takes the shorter of the two strings and compares it with substrings of the longer string that have the same length, then returns the best score.
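The substring matching described above can be sketched as a sliding window. This is a simplified illustration built on difflib from the standard library; the name partial_ratio_sketch() is hypothetical, and FuzzyWuzzy itself picks candidate windows more selectively, so scores can differ slightly from fuzz.partial_ratio().

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(s1: str, s2: str) -> int:
    """Best 0-100 score of the shorter string against same-length windows of the longer."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    if not shorter:
        return 0  # assumption for this sketch: an empty string matches nothing
    window = len(shorter)
    best = 0.0
    # Slide a window of the shorter string's length across the longer string.
    for start in range(len(longer) - window + 1):
        candidate = longer[start:start + window]
        score = SequenceMatcher(None, shorter, candidate).ratio()
        best = max(best, score)
    return int(round(100 * best))


print(partial_ratio_sketch("wuzzy", "fuzzy wuzzy was a bear"))  # 100
```

Because "wuzzy" appears verbatim inside the longer string, one window matches it exactly and the score is 100.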

Token Sort Ratio

FuzzyWuzzy also has token functions that tokenize the strings, change capitals to lowercase, and remove punctuation. The token_sort_ratio() function sorts the tokens of each string alphabetically, joins them back together, and then compares the results, so word order no longer affects the score.
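The tokenize, sort, and join steps can be sketched as follows. The helper names normalize_tokens() and token_sort_ratio_sketch() are hypothetical, and the ASCII-only regular expression is an assumption made for this sketch; FuzzyWuzzy's own preprocessing may treat non-ASCII characters differently.

```python
import re
from difflib import SequenceMatcher


def normalize_tokens(text: str) -> list:
    """Lowercase the text, replace punctuation with spaces, and split into tokens."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return cleaned.split()


def token_sort_ratio_sketch(s1: str, s2: str) -> int:
    """Compare two strings after sorting their tokens alphabetically."""
    sorted1 = " ".join(sorted(normalize_tokens(s1)))
    sorted2 = " ".join(sorted(normalize_tokens(s2)))
    return int(round(100 * SequenceMatcher(None, sorted1, sorted2).ratio()))


print(normalize_tokens("Example: wuzzy fuzzy"))  # ['example', 'wuzzy', 'fuzzy']
print(token_sort_ratio_sketch("fuzzy wuzzy Example!", "Example: wuzzy fuzzy"))  # 100
```

Both inputs reduce to "example fuzzy wuzzy" after sorting, which is why the score is 100 even though the word order and punctuation differ.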

Token Set Ratio

The token_set_ratio() function is similar to the token_sort_ratio() function above, except it separates the tokens common to both strings from the remaining tokens before calculating the fuzz.ratio() between the resulting strings, and it returns the best of those scores. This function is most helpful when the strings being compared differ significantly in length.
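A minimal sketch of the token set idea, again using difflib from the standard library. The names unique_tokens(), ratio(), and token_set_ratio_sketch() are hypothetical, and the ASCII-only tokenizer is an assumption of this sketch rather than FuzzyWuzzy's exact preprocessing.

```python
import re
from difflib import SequenceMatcher


def unique_tokens(text: str) -> set:
    """Lowercase the text, drop punctuation, and return its unique tokens."""
    return set(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def ratio(a: str, b: str) -> int:
    """0-100 similarity of two strings."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def token_set_ratio_sketch(s1: str, s2: str) -> int:
    t1, t2 = unique_tokens(s1), unique_tokens(s2)
    common = " ".join(sorted(t1 & t2))  # tokens found in both strings
    only1 = " ".join(sorted(t1 - t2))   # tokens found only in s1
    only2 = " ".join(sorted(t2 - t1))   # tokens found only in s2
    combined1 = (common + " " + only1).strip()
    combined2 = (common + " " + only2).strip()
    # Score the shared part against each combined string, and the two
    # combined strings against each other; keep the best result.
    return max(
        ratio(common, combined1),
        ratio(common, combined2),
        ratio(combined1, combined2),
    )


print(token_set_ratio_sketch("fuzzy wuzzy was a bear", "fuzzy was a bear"))  # 100
```

Every token of the second string also appears in the first, so the shared part equals the second string's sorted tokens and that comparison scores 100.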

Example

# Only the fuzz module is needed for the ratio functions used below
from fuzzywuzzy import fuzz

# Simple Ratio Start
# The two strings differ only in the trailing punctuation
strFirst = 'This is a fuzzywuzzy Example by Shardul !'
strSecond = 'This is a fuzzywuzzy Example by Shardul.'

ratio = fuzz.ratio(strFirst, strSecond)
print('String Compare Percentage using Ratio is : ' + str(ratio))
# Output : "String Compare Percentage using Ratio is : 96"
# Simple Ratio End

# Partial Ratio Start
# Same strings as above; the shorter one is matched against the longer one
strFirst = 'This is a fuzzywuzzy Example by Shardul !'
strSecond = 'This is a fuzzywuzzy Example by Shardul.'

ratio = fuzz.partial_ratio(strFirst, strSecond)
print('String Compare Percentage using Partial Ratio is : ' + str(ratio))
# Output : "String Compare Percentage using Partial Ratio is : 98"
# Partial Ratio End

# Token Sort Ratio Start
# Same words in a different order ("fuzzy wuzzy" vs "wuzzy fuzzy")
strFirst = 'This is a fuzzy wuzzy Example by Shardul !'
strSecond = 'This is a wuzzy fuzzy Example by Shardul.'

ratio = fuzz.token_sort_ratio(strFirst, strSecond)
print('String Compare Percentage using Token Sort Ratio is : ' + str(ratio))
# Output : "String Compare Percentage using Token Sort Ratio is : 100"
# Token Sort Ratio End

# Token Set Ratio Start
# The second string is missing one word ("wuzzy") from the first
strFirst = 'This is a fuzzy wuzzy Example by Shardul !'
strSecond = 'This is a fuzzy Example by Shardul.'

ratio = fuzz.token_set_ratio(strFirst, strSecond)
print('String Compare Percentage using Token Set Ratio is : ' + str(ratio))
# Output : "String Compare Percentage using Token Set Ratio is : 100"
# Token Set Ratio End

Reference

https://towardsdatascience.com/string-matching-with-fuzzywuzzy-e982c61f8a84
https://www.geeksforgeeks.org/fuzzywuzzy-python-library/
https://pypi.org/project/fuzzywuzzy/

Assignment for You

Compare data and store the records with a 90% match in another table.

Written by shardul Kulkarni

I have more than 12.5 years of experience in the IT domain, with good hands-on experience in PHP, Python, MySQL, etc.
