sanityze.spotters

Module Contents

Classes

Spotter

The Spotter interface, to be implemented by subclasses

CreditCardSpotter

The Credit Card Spotter Subclass

EmailSpotter

The Email Spotter Subclass

class sanityze.spotters.Spotter(uid: str, hashSpotted=False)

The Spotter interface, to be implemented by subclasses

uid

uid of the spotter

Type:

str

hashSpotted

Whether to hash the spotted sensitive information (True) or replace it with a default value (False). Defaults to False.

Type:

bool, optional

getSpotterUID()

Return the spotter uid.

isHashSpotted()

Return whether hashSpotted is True or False.

process(text)

Process the text according to hashSpotted: if it is True, replace the spotted information with a hash; otherwise, replace it with a default value.

Examples

Spotter is meant to be instantiated through a subclass, so no examples are given for the parent class.
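To make the subclass contract concrete, here is a minimal sketch of an abstract base exposing the same three methods, assuming `process` is the only method a subclass has to supply. `SpotterSketch` and `DigitSpotter` are hypothetical names, not part of sanityze, and the real `Spotter` may not use `abc` at all.

```python
import re
from abc import ABC, abstractmethod


class SpotterSketch(ABC):
    """Hypothetical stand-in for sanityze.spotters.Spotter (not the library's code)."""

    def __init__(self, uid: str, hashSpotted: bool = False) -> None:
        self.uid = uid
        self.hashSpotted = hashSpotted

    def getSpotterUID(self) -> str:
        return self.uid

    def isHashSpotted(self) -> bool:
        return self.hashSpotted

    @abstractmethod
    def process(self, text: str) -> str:
        """Subclasses decide what to spot and how to replace it."""


class DigitSpotter(SpotterSketch):
    """Hypothetical subclass: replaces every run of digits with its uid.

    For brevity it ignores hashSpotted; a real spotter would branch on it.
    """

    def process(self, text: str) -> str:
        return re.sub(r"\d+", self.getSpotterUID(), text)


spotter = DigitSpotter("DIGITS")
print(spotter.getSpotterUID())           # DIGITS
print(spotter.isHashSpotted())           # False
print(spotter.process("call 555 1234"))  # call DIGITS DIGITS
```

Because `process` is abstract, calling `SpotterSketch("X")` directly raises `TypeError`, which matches the advice above to work through a subclass.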

getSpotterUID() → str

Get the spotter uid.

Returns:

self.uid – the spotter uid

Return type:

str

Examples

>>> sub_spotter.getSpotterUID()
"<sub class spotter UID>"

isHashSpotted() → bool

Get the value of hashSpotted.

Returns:

self.hashSpotted – the truth value of hashSpotted

Return type:

bool

Examples

>>> sub_spotter.isHashSpotted()
True
process(text: str) → str

Process the given text. If hashSpotted is True, replace the spotted text with a hash; otherwise, replace it with a default value.

Parameters:

text (str) – The text to scan and modify

Returns:

new_text – the text with spotted values replaced by a hash or a default value

Return type:

str

Examples

>>> import pandas as pd
>>> df = pd.DataFrame(data={'product_name': ['laptop', 'printer foo@gaga.com', 'tablet', 'desk 5555 5555 5555 4444', 'chair'],
...                         'price': [1200, 150, 300, 450, 200]})
>>> c = Cleanser()
>>> c.clean(df, verbose=False)
               product_name  price
0                    laptop   1200
1        printer EMAILADDRS    150
2                    tablet    300
3  desk 5555 5555 5555 4444    450
4                     chair    200
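The example above goes through `Cleanser`, which is not documented in this section. Conceptually, cleaning a table amounts to passing each string cell through one or more spotters' `process` methods while leaving numeric cells alone. The sketch below assumes exactly that behavior, uses plain Python lists instead of pandas, and relies on hypothetical helpers (`clean_rows`, `replace_emails`); it is not how `Cleanser` is necessarily implemented.

```python
import re
from typing import Callable, List

Row = List[object]


def clean_rows(rows: List[Row], processors: List[Callable[[str], str]]) -> List[Row]:
    """Hypothetical sketch: pass every string cell through each processor in turn.

    Non-string cells (such as the numeric prices) are returned unchanged,
    and the input rows are not modified.
    """
    cleaned = []
    for row in rows:
        new_row = []
        for cell in row:
            if isinstance(cell, str):
                for process in processors:
                    cell = process(cell)
            new_row.append(cell)
        cleaned.append(new_row)
    return cleaned


def replace_emails(text: str) -> str:
    # Toy replacer standing in for an email spotter with hashSpotted=False.
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "EMAILADDRS", text)


rows = [["laptop", 1200], ["printer foo@gaga.com", 150], ["desk 5555 5555 5555 4444", 450]]
for row in clean_rows(rows, [replace_emails]):
    print(row)
# ['laptop', 1200]
# ['printer EMAILADDRS', 150]
# ['desk 5555 5555 5555 4444', 450]
```

Only an email replacer is passed here, so the card number in the third row is left as-is; adding a credit card processor to the list would replace it too.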

class sanityze.spotters.CreditCardSpotter(uid: str, hashSpotted=False)

Bases: Spotter

The Credit Card Spotter Subclass

uid

uid of the spotter, “CREDITCARD”

Type:

str

hashSpotted

Whether to hash the spotted sensitive information (True) or replace it with a default value (False). Defaults to False.

Type:

bool, optional

getSpotterUID()

Return the spotter uid, “CREDITCARD”.

isHashSpotted()

Return whether hashSpotted is True or False.

process(text)

Process the text according to hashSpotted: if it is True, replace the spotted credit card number with a hash; otherwise, replace it with a default value.

Examples

>>> CreditCardSpotter("CREDITCARDS",True)
<sanityze.spotters.CreditCardSpotter object at 0x000001207F7B5880>
getSpotterUID() str[source]

Getting the credit card spotter uid

Returns:

“CREDITCARD” – a fixed str value for CreditCardSpotter

Return type:

str

Examples

>>> cc = CreditCardSpotter("CREDITCARDS",True)
>>> cc.getSpotterUID()
CREDITCARD
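In this example the uid passed to the constructor (`"CREDITCARDS"`) does not change what `getSpotterUID` returns. One way a subclass could behave like that is to override `getSpotterUID` with a class constant, as in the hypothetical sketch below; `BaseSpotter` and `FixedUIDCreditCardSpotter` are illustrative names, and sanityze may implement this differently.

```python
class BaseSpotter:
    """Hypothetical minimal base: stores whatever uid the caller passes."""

    def __init__(self, uid: str, hashSpotted: bool = False) -> None:
        self.uid = uid
        self.hashSpotted = hashSpotted

    def getSpotterUID(self) -> str:
        return self.uid


class FixedUIDCreditCardSpotter(BaseSpotter):
    """Hypothetical subclass whose reported uid is a constant, ignoring the argument."""

    UID = "CREDITCARD"

    def getSpotterUID(self) -> str:
        return self.UID


cc = FixedUIDCreditCardSpotter("CREDITCARDS", True)
print(cc.getSpotterUID())  # CREDITCARD
print(cc.uid)              # CREDITCARDS (stored, but not what getSpotterUID reports)
```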

process(text: str) → str

Process the given text. If hashSpotted is True, replace the spotted credit card number with a hash; otherwise, replace it with a default value.

Parameters:

text (str) – The text to scan and modify

Returns:

new_text – the text with credit card number replaced by a hash or the default string value

Return type:

str

Examples

>>> cc = CreditCardSpotter("CREDITCARDS", False)
>>> cc.process("4556129404313766")
'CREDITCARD'
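How the credit card number is spotted is not documented here. The sketch below assumes a common approach: a regular expression for 13 to 19 digits (optionally separated by spaces or dashes) followed by a Luhn checksum to cut false positives, with SHA-256 as the assumed hash when hashing is requested. The names `CARD_PATTERN`, `luhn_valid`, and `process_credit_cards` are hypothetical and this is not sanityze's implementation.

```python
import hashlib
import re

# Assumed pattern: 13-19 digits, optionally separated by single spaces or dashes.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def process_credit_cards(text: str, hash_spotted: bool = False) -> str:
    """Hypothetical stand-in for CreditCardSpotter.process (not sanityze's code)."""

    def _sub(match):
        digits = re.sub(r"[ -]", "", match.group(0))
        if not luhn_valid(digits):
            return match.group(0)  # leave digit runs that are not card numbers untouched
        if hash_spotted:
            return hashlib.sha256(digits.encode("utf-8")).hexdigest()
        return "CREDITCARD"

    return CARD_PATTERN.sub(_sub, text)


print(process_credit_cards("4556129404313766"))          # CREDITCARD
print(process_credit_cards("desk 5555 5555 5555 4444"))  # desk CREDITCARD
print(process_credit_cards("order 1234567890123"))       # order 1234567890123
```

The Luhn check is what lets the last call leave the 13-digit order number alone; a regex alone would replace any long run of digits.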

class sanityze.spotters.EmailSpotter(uid: str, hashSpotted=False)

Bases: Spotter

The Email Spotter Subclass

uid

uid of the spotter, “EMAILADDRS”

Type:

str

hashSpotted

Whether to hash the spotted sensitive information (True) or replace it with a default value (False). Defaults to False.

Type:

bool, optional

getSpotterUID()

Return the spotter uid, “EMAILADDRS”.

isHashSpotted()

Return whether hashSpotted is True or False.

process(text)

Process the text according to hashSpotted: if it is True, replace the spotted email address with a hash; otherwise, replace it with a default value.

getSpotterUID() → str

Get the email spotter uid.

Returns:

“EMAILADDRS” – a fixed str value for EmailSpotter

Return type:

str

Examples

>>> ee = EmailSpotter("EMAILS", False)
>>> ee.getSpotterUID()
'EMAILADDRS'

process(text: str) → str

Process the given text. If hashSpotted is True, replace the spotted email address with a hash; otherwise, replace it with a default value.

Parameters:

text (str) – The text to scan and modify

Returns:

new_text – the text with email replaced by a hash or the default string value

Return type:

str

Examples

>>> ee = EmailSpotter("EMAILS", False)
>>> ee.process("abcd1234@gmail.com")
'EMAILADDRS'
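The email pattern sanityze uses is not shown in this reference. As a rough sketch, assuming a pragmatic regular expression (not a full RFC 5322 parser) and SHA-256 as the hash when hashSpotted is True, the replace-or-hash behavior could look like this; `EMAIL_PATTERN` and `process_emails` are hypothetical names.

```python
import hashlib
import re

# Assumed pattern: local part, "@", domain labels, and a top-level domain of 2+ letters.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def process_emails(text: str, hash_spotted: bool = False) -> str:
    """Hypothetical stand-in for EmailSpotter.process (not sanityze's code)."""

    def _sub(match):
        if hash_spotted:
            return hashlib.sha256(match.group(0).encode("utf-8")).hexdigest()
        return "EMAILADDRS"

    return EMAIL_PATTERN.sub(_sub, text)


print(process_emails("abcd1234@gmail.com"))                 # EMAILADDRS
print(process_emails("contact foo@gaga.com or bar@x.org"))  # contact EMAILADDRS or EMAILADDRS
```

Hashing keeps equal addresses mapped to equal tokens, so rows can still be grouped or joined on the hashed column, whereas the fixed "EMAILADDRS" replacement discards that link entirely.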