text_sensitivity.perturbation

Apply perturbations to one or multiple (tokenized) strings.

Submodules:

text_sensitivity.perturbation.base module

Apply perturbations to TextInstances and/or strings, generating one or many new instances.

class text_sensitivity.perturbation.base.OneToManyPerturbation(perturbation_function)

Bases: Perturbation

Apply a perturbation function to a single TextInstance, getting multiple results per instance.

Parameters: perturbation_function (Callable) – Perturbation function to apply, including attribute label of original instance and the resulting instances. Should return None if no perturbation has been applied.

classmethod from_dictionary(dictionary, label_from, label_to, n=10, tokenizer=<function word_tokenizer>, detokenizer=<function word_detokenizer>)

Construct a OneToManyPerturbation from a dictionary.

Example

Replace the word ‘good’ (positive) with ‘bad’, ‘mediocre’ or ‘terrible’ (negative), generating up to 5 perturbed instances for each original instance. The default tokenizer/detokenizer assumes word-level tokens:

>>> replacements = {'good': ['bad', 'mediocre', 'terrible']}
>>> OneToManyPerturbation.from_dictionary(replacements,
...                                       n=5,
...                                       label_from='positive',
...                                       label_to='negative')

Parameters

dictionary (Dict[str, List[str]]) – Lookup dictionary to map tokens (e.g. words, characters).
label_from (LT) – Attribute label of original instance (left-hand side of dictionary).
label_to (LT) – Attribute label of perturbed instance (right-hand side of dictionary).
n (int, optional) – Number of instances to generate. Defaults to 10.
tokenizer (Callable, optional) – Function to tokenize instance data (e.g. words, characters). Defaults to default_tokenizer.
detokenizer (Callable, optional) – Function to detokenize tokens into instance data. Defaults to default_detokenizer.
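
The idea behind a one-to-many dictionary replacement can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation: the helper name one_to_many_replace, the whitespace tokenizer, the seed argument and the choice to replace every matching token are all assumptions made for the example.

```python
import random
from typing import Dict, List, Optional


def one_to_many_replace(text: str,
                        dictionary: Dict[str, List[str]],
                        n: int = 10,
                        seed: int = 0) -> Optional[List[str]]:
    """Return up to n distinct variants of text, or None if no token matches."""
    rng = random.Random(seed)
    tokens = text.split()  # assumption: whitespace split stands in for a word tokenizer
    if not any(token in dictionary for token in tokens):
        return None  # mirrors the documented "None if no perturbation has been applied"
    variants: List[str] = []
    # Bounded number of attempts so the loop always terminates,
    # even when fewer than n distinct variants exist.
    for _ in range(n * 20):
        candidate = ' '.join(rng.choice(dictionary[t]) if t in dictionary else t
                             for t in tokens)
        if candidate not in variants:
            variants.append(candidate)
        if len(variants) == n:
            break
    return variants


replacements = {'good': ['bad', 'mediocre', 'terrible']}
variants = one_to_many_replace('a good movie', replacements, n=5)
unchanged = one_to_many_replace('a fine movie', replacements, n=5)
```

Because only three replacements exist for ‘good’, fewer than n=5 variants come back here, which matches the reading of n as an upper bound.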

classmethod from_function(function, label_from='original', label_to='perturbed', n=10, perform_once=False)

Construct a OneToManyPerturbation from a perturbation applied to a string.

Parameters

function (Callable[[str], Optional[Union[str, Sequence[str]]]]) – Function to apply to each string. Return None if no change was applied.
label_from (LT, optional) – Attribute label of original instance. Defaults to ‘original’.
label_to (LT, optional) – Attribute label of perturbed instance. Defaults to ‘perturbed’.
n (int, optional) – Number of instances to generate. Defaults to 10.
perform_once (bool, optional) – If the n parameter is already handled when the function is constructed, apply the function only once. Defaults to False.
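
A function matching the documented type Callable[[str], Optional[Union[str, Sequence[str]]]] might look like the sketch below. The helper drop_one_word is hypothetical and exists only to show the shape of the contract: it returns a sequence of strings, or None when it cannot change its input.

```python
from typing import List, Optional


def drop_one_word(text: str) -> Optional[List[str]]:
    """Return every variant of text with exactly one word removed."""
    words = text.split()
    if len(words) < 2:
        return None  # nothing sensible to remove: signal "no change"
    return [' '.join(words[:i] + words[i + 1:]) for i in range(len(words))]


variants = drop_one_word('a b c')
no_change = drop_one_word('solo')
```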

classmethod from_nlpaug(augmenter, label_from='original', label_to='perturbed', n=10, **augment_kwargs)

Construct a OneToManyPerturbation from a nlpaug Augmenter.

Example

Add n=5 versions of keyboard typing mistakes to lowercase characters in a sentence using nlpaug.augmenter.char.KeyboardAug():

>>> import nlpaug.augmenter.char as nac
>>> augmenter = nac.KeyboardAug(include_upper_case=False,
...                             include_special_char=False)
>>> OneToManyPerturbation.from_nlpaug(augmenter, n=5, label_from='no_typos', label_to='typos')

Parameters

augmenter (Augmenter) – Class with .augment() function applying a perturbation to a string.
label_from (LT, optional) – Attribute label of original instance. Defaults to ‘original’.
label_to (LT, optional) – Attribute label of perturbed instance. Defaults to ‘perturbed’.
n (int, optional) – Number of instances to generate. Defaults to 10.
**augment_kwargs – Optional arguments passed to .augment() function.

perturb(instance)

Apply a perturbation function to a single TextInstance, getting multiple results per instance.

Parameters

instance (TextInstance) – Instance to apply the perturbation to.

Returns

None if no perturbation has been applied. Otherwise a sequence of perturbed TextInstances, with attribute labels for the original and perturbed instances.

Return type

Optional[Sequence[Tuple[TextInstance, Sequence[Tuple[KT, LT]]]]]

class text_sensitivity.perturbation.base.OneToOnePerturbation(perturbation_function)

Bases: Perturbation

Apply a perturbation function to a single TextInstance, getting a single result per instance.

Parameters: perturbation_function (Callable) – Perturbation function to apply, including attribute label of original instance and the resulting instance. Should return None if no perturbation has been applied.

classmethod from_dictionary(dictionary, label_from, label_to, tokenizer=<function word_tokenizer>, detokenizer=<function word_detokenizer>)

Construct a OneToOnePerturbation from a dictionary.

Example

Replace the word ‘a’ or ‘an’ (indefinite article) with ‘the’ (definite article) in each instance. The default tokenizer/detokenizer assumes word-level tokens:

>>> replacements = {'a': 'the',
...                 'an': 'the'}
>>> OneToOnePerturbation.from_dictionary(replacements,
...                                      label_from='indefinite',
...                                      label_to='definite')

Replace the character ‘.’ with ‘!’ (character-level replacement):

>>> from text_explainability import character_tokenizer, character_detokenizer
>>> OneToOnePerturbation.from_dictionary({'.': '!'},
...                                      label_from='not_excited',
...                                      label_to='excited',
...                                      tokenizer=character_tokenizer,
...                                      detokenizer=character_detokenizer)

Parameters

dictionary (Dict[str, str]) – Lookup dictionary to map tokens (e.g. words, characters).
label_from (LT) – Attribute label of original instance (left-hand side of dictionary).
label_to (LT) – Attribute label of perturbed instance (right-hand side of dictionary).
tokenizer (Callable, optional) – Function to tokenize instance data (e.g. words, characters). Defaults to default_tokenizer.
detokenizer (Callable, optional) – Function to detokenize tokens into instance data. Defaults to default_detokenizer.
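
The role of the tokenizer/detokenizer pair in a dictionary replacement can be sketched without the library. Everything below is an assumption made for illustration: one_to_one_replace is a hypothetical helper, str.split and list stand in for word-level and character-level tokenizers, and every matching token is replaced.

```python
from typing import Callable, Dict, List, Optional


def one_to_one_replace(text: str,
                       dictionary: Dict[str, str],
                       tokenizer: Callable[[str], List[str]] = str.split,
                       detokenizer: Callable[[List[str]], str] = ' '.join) -> Optional[str]:
    """Replace tokens found in dictionary; return None if nothing changed."""
    tokens = tokenizer(text)
    replaced = [dictionary.get(token, token) for token in tokens]
    if replaced == tokens:
        return None  # no token matched, so no perturbation was applied
    return detokenizer(replaced)


# Word-level replacement (indefinite to definite article).
word_level = one_to_one_replace('this is an example', {'a': 'the', 'an': 'the'})
# Character-level replacement: tokenize into characters, join without separator.
char_level = one_to_one_replace('Great.', {'.': '!'}, tokenizer=list, detokenizer=''.join)
no_match = one_to_one_replace('no match', {'a': 'the'})
```

Swapping the tokenizer changes what counts as a token, which is why the same dictionary lookup can act on words in one call and on single characters in the other.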

classmethod from_list(mapping_list, label_from='original', label_to='perturbed', tokenizer=<function word_tokenizer>, detokenizer=<function word_detokenizer>)

Construct a OneToOnePerturbation from a list.

A function is constructed that aims to map any value in the list to any other value in the list.

Example

For example, if the list [‘Amsterdam’, ‘Rotterdam’, ‘Utrecht’] is provided, it aims to map ‘Amsterdam’ to ‘Rotterdam’ or ‘Utrecht’, ‘Rotterdam’ to ‘Amsterdam’ or ‘Utrecht’, and ‘Utrecht’ to ‘Rotterdam’ or ‘Amsterdam’. If none of these is possible, it returns None.

>>> map_list = ['Amsterdam', 'Rotterdam', 'Utrecht']
>>> OneToOnePerturbation.from_list(map_list)

Parameters

mapping_list (List[str]) – Lookup list of tokens (e.g. words, characters).
label_from (LT, optional) – Attribute label of original instance (non-replaced). Defaults to ‘original’.
label_to (LT, optional) – Attribute label of perturbed instance (replaced). Defaults to ‘perturbed’.
tokenizer (Callable, optional) – Function to tokenize instance data (e.g. words, characters). Defaults to default_tokenizer.
detokenizer (Callable, optional) – Function to detokenize tokens into instance data. Defaults to default_detokenizer.
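
The list-based mapping described above can be sketched as follows. This is a simplified stand-in rather than the library's code: map_within_list, the whitespace tokenizer and the seed argument are assumptions for the example.

```python
import random
from typing import List, Optional


def map_within_list(text: str, mapping_list: List[str], seed: int = 0) -> Optional[str]:
    """Map each token that occurs in mapping_list to a different value from the list."""
    rng = random.Random(seed)
    tokens = text.split()  # assumption: whitespace split as word tokenizer
    changed = False
    for i, token in enumerate(tokens):
        if token in mapping_list:
            # Any other value in the list is a valid replacement.
            alternatives = [value for value in mapping_list if value != token]
            if alternatives:
                tokens[i] = rng.choice(alternatives)
                changed = True
    return ' '.join(tokens) if changed else None


map_list = ['Amsterdam', 'Rotterdam', 'Utrecht']
mapped = map_within_list('I live in Amsterdam', map_list)
not_mapped = map_within_list('I live in Paris', map_list)
```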

classmethod from_nlpaug(augmenter, label_from='original', label_to='perturbed', **augment_kwargs)

Construct a OneToOnePerturbation from a nlpaug Augmenter.

Example

Add random spaces to words in a sentence using nlpaug.augmenter.word.SplitAug():

>>> import nlpaug.augmenter.word as naw
>>> OneToOnePerturbation.from_nlpaug(naw.SplitAug(), label_to='with_extra_space')

Or add keyboard typing mistakes to lowercase characters in a sentence using nlpaug.augmenter.char.KeyboardAug():

>>> import nlpaug.augmenter.char as nac
>>> augmenter = nac.KeyboardAug(include_upper_case=False,
...                             include_special_char=False,
...                             include_numeric=False)
>>> OneToOnePerturbation.from_nlpaug(augmenter, label_from='no_typos', label_to='typos')

Parameters

augmenter (Augmenter) – Class with .augment() function applying a perturbation to a string.
label_from (LT, optional) – Attribute label of original instance. Defaults to ‘original’.
label_to (LT, optional) – Attribute label of perturbed instance. Defaults to ‘perturbed’.
**augment_kwargs – Optional arguments passed to .augment() function.

classmethod from_string(prefix=None, suffix=None, replacement=None, label_from='original', label_to='perturbed', connector=' ', connector_before=None, connector_after=None)

Construct a OneToOnePerturbation from a string (replacement, prefix and/or suffix).

Provides the ability to replace each instance string with a new one, add a prefix to each instance string and/or add a suffix to each instance string. At least one of prefix, suffix or replacement should be a string to apply the replacement.

Example

Add a random unrelated string ‘Dit is ongerelateerd.’ to each instance (as prefix), where you expect that predictions will not change:

>>> OneToOnePerturbation.from_string(prefix='Dit is ongerelateerd.', label_to='with_prefix')

Or add a negative string ‘Dit is negatief!’ to each instance (as suffix on the next line), where you expect that instances will have the same label or become more negative:

>>> OneToOnePerturbation.from_string(suffix='Dit is negatief!',
...                                  connector_after='\n',
...                                  label_to='more_negative')

Or replace all instances with ‘UNKWRDZ’:

>>> OneToOnePerturbation.from_string(replacement='UNKWRDZ')

Raises

ValueError – At least one of prefix, suffix and replacement should be provided.

Parameters

label_from (LT, optional) – Attribute label of original instance. Defaults to ‘original’.
label_to (LT, optional) – Attribute label of perturbed instance. Defaults to ‘perturbed’.
prefix (Optional[str], optional) – Text to add before instance.data. Defaults to None.
suffix (Optional[str], optional) – Text to add after instance.data. Defaults to None.
replacement (Optional[str], optional) – Text to replace instance.data with. Defaults to None.
connector (str) – General connector between prefix, instance.data and suffix. Defaults to ‘ ‘.
connector_before (Optional[str], optional) – Overrides connector between prefix and instance.data, if it is None connector is used. Defaults to None.
connector_after (Optional[str], optional) – Overrides connector between instance.data and suffix, if it is None connector is used. Defaults to None.
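
How prefix, suffix, replacement and the three connector arguments interact can be sketched as a plain string function. The helper apply_string is hypothetical, and the order of operations (replacement first, then prefix and suffix) is an assumption; only the connector fallback and the ValueError follow directly from the parameter descriptions above.

```python
from typing import Optional


def apply_string(data: str,
                 prefix: Optional[str] = None,
                 suffix: Optional[str] = None,
                 replacement: Optional[str] = None,
                 connector: str = ' ',
                 connector_before: Optional[str] = None,
                 connector_after: Optional[str] = None) -> str:
    """Combine prefix, data (or its replacement) and suffix into one string."""
    if prefix is None and suffix is None and replacement is None:
        raise ValueError('At least one of prefix, suffix and replacement should be provided.')
    # The specific connectors fall back to the general connector when None.
    before = connector if connector_before is None else connector_before
    after = connector if connector_after is None else connector_after
    text = data if replacement is None else replacement
    if prefix is not None:
        text = prefix + before + text
    if suffix is not None:
        text = text + after + suffix
    return text


with_prefix = apply_string('Goed boek.', prefix='Dit is ongerelateerd.')
with_suffix = apply_string('Goed boek.', suffix='Dit is negatief!', connector_after='\n')
replaced = apply_string('Goed boek.', replacement='UNKWRDZ')
```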

classmethod from_tuples(tuples, label_from, label_to, tokenizer=<function word_tokenizer>, detokenizer=<function word_detokenizer>)

Construct a OneToOnePerturbation from tuples.

A function is constructed that first aims to perform the mapping from the tokens on the left-hand side (LHS) to the right-hand side (RHS), and if this has no result it aims to perform the mapping from the tokens on the RHS to the LHS.

Example

For example, if [(‘he’, ‘she’)] with label_from=’male’ and label_to=’female’ is provided it first checks whether the tokenized instance contains the word ‘he’ (and if so applies the perturbation and returns), and otherwise aims to map ‘she’ to ‘he’. If neither is possible, it returns None.

>>> tuples = [('he', 'she'),
...           ('his', 'her')]
>>> OneToOnePerturbation.from_tuples(tuples, label_from='male', label_to='female')

Parameters

tuples (List[Tuple[str, str]]) – Lookup tuples to map tokens (e.g. words, characters).
label_from (LT) – Attribute label of original instance (left-hand side of tuples).
label_to (LT) – Attribute label of perturbed instance (right-hand side of tuples).
tokenizer (Callable, optional) – Function to tokenize instance data (e.g. words, characters). Defaults to default_tokenizer.
detokenizer (Callable, optional) – Function to detokenize tokens into instance data. Defaults to default_detokenizer.
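
The two-directional lookup described above (LHS to RHS first, then RHS to LHS) can be sketched as below. The helper map_tuples is hypothetical, it tokenizes on whitespace, and swapping the two labels when the reverse direction is used is an assumption rather than documented behavior.

```python
from typing import List, Optional, Tuple


def map_tuples(text: str,
               tuples: List[Tuple[str, str]],
               label_from: str,
               label_to: str) -> Optional[Tuple[str, str, str]]:
    """Try LHS -> RHS first, then RHS -> LHS; return None if neither applies."""
    tokens = text.split()
    forward = dict(tuples)
    backward = {rhs: lhs for lhs, rhs in tuples}
    attempts = ((forward, (label_from, label_to)),
                (backward, (label_to, label_from)))  # assumption: labels swap in reverse
    for mapping, labels in attempts:
        if any(token in mapping for token in tokens):
            perturbed = ' '.join(mapping.get(token, token) for token in tokens)
            return (perturbed,) + labels
    return None


pairs = [('he', 'she'), ('his', 'her')]
forward_result = map_tuples('he lost his keys', pairs, 'male', 'female')
backward_result = map_tuples('she left', pairs, 'male', 'female')
no_result = map_tuples('they left', pairs, 'male', 'female')
```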

perturb(instance)

Apply a perturbation function to a single TextInstance, getting a single result per instance.

Parameters

instance (TextInstance) – Instance to apply the perturbation to.

Returns

None if no perturbation has been applied. Otherwise a sequence of perturbed TextInstances, with attribute labels for the original and perturbed instances.

Return type

Optional[Sequence[Tuple[TextInstance, Sequence[Tuple[KT, LT]]]]]

class text_sensitivity.perturbation.base.Perturbation(perturbation_function)

Bases: Readable

Apply a perturbation function to a single TextInstance.

Parameters: perturbation_function (Callable) – Perturbation function to apply, including attribute label of original instance and resulting instance(s). Should return None if no perturbation has been applied.

classmethod from_dict(*args, **kwargs): Alias for Perturbation.from_dictionary().

classmethod from_dictionary(*args, **kwargs)

Construct a Perturbation from a dictionary.

Return type: Perturbation

classmethod from_function(function, label_from='original', label_to='perturbed')

Construct a Perturbation from a perturbation applied to a string.

Example

Make each sentence uppercase:

>>> OneToOnePerturbation.from_function(str.upper, 'not_upper', 'upper')

Parameters

function (Callable[[str], Optional[Union[str, Sequence[str]]]]) – Function to apply to each string. Return None if no change was applied.
label_from (LT, optional) – Attribute label of original instance. Defaults to ‘original’.
label_to (LT, optional) – Attribute label of perturbed instance. Defaults to ‘perturbed’.
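
The parameter description asks the function to return None if no change was applied, which str.upper on its own never does. A small wrapper that honors that contract could look like the sketch below; upper_if_changed is a hypothetical name used only for illustration.

```python
from typing import Optional


def upper_if_changed(text: str) -> Optional[str]:
    """Uppercase text, returning None when the text is already uppercase."""
    upper = text.upper()
    return None if upper == text else upper


changed = upper_if_changed('make me loud')
already_upper = upper_if_changed('ALREADY LOUD')
```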

classmethod from_str(*args, **kwargs): Alias for Perturbation.from_string().

classmethod from_string(*args, **kwargs): Construct a Perturbation from a string.

perturb(instance)

Apply perturbation to a single TextInstance.

Parameters: instance (TextInstance) – Instance to apply the perturbation to.

text_sensitivity.perturbation.base.as_list(x)

Ensure an element x is a list.

Return type: list

text_sensitivity.perturbation.base.format_identifier(instance, key): Format identifier of child.

text_sensitivity.perturbation.base.one_to_many_dictionary_mapping(instance, dictionary, label_from, label_to, n, tokenizer, detokenizer)

Create one-to-many replacement for a TextInstance.

Parameters

instance (TextInstance) – Instance to create mapping for.
dictionary (Dict[str, List[str]]) – Options for each token.
label_from (LT) – Label of original element.
label_to (LT) – Label of element with replacements applied.
n (int) – Number of replacements for each instance.
tokenizer (Callable[[str], List[str]]) – Tokenize string into sequence of tokens.
detokenizer (Callable[[List[str]], str]) – Detokenize sequence of tokens to string.

Yields

Optional[List[Tuple[str, LT, LT]]] – None if no change applied, or list of tuples containing detokenized instance, original label and replaced label.

Return type

Optional[List[Tuple[str, TypeVar(LT), TypeVar(LT)]]]

text_sensitivity.perturbation.base.one_to_one_dictionary_mapping(instance, dictionary, label_from, label_to, tokenizer, detokenizer)

Create one-to-one replacement for a TextInstance.

Parameters

instance (TextInstance) – Instance to create mapping for.
dictionary (Dict[str, List[str]]) – Options for each token.
label_from (LT) – Label of original element.
label_to (LT) – Label of element with replacements applied.
tokenizer (Callable[[str], List[str]]) – Tokenize string into sequence of tokens.
detokenizer (Callable[[List[str]], str]) – Detokenize sequence of tokens to string.

Yields

Optional[Tuple[str, LT, LT]] – None if no change applied, or tuple containing detokenized instance, original label and replaced label.

Return type

Optional[Tuple[str, TypeVar(LT), TypeVar(LT)]]

text_sensitivity.perturbation.base.oneway_dictionary_mapping(instance, dictionary, label_from, label_to, n, tokenizer, detokenizer)

Create corresponding replacements for tokens in a TextInstance.

Parameters

instance (TextInstance) – Instance to create mapping for.
dictionary (Dict[str, List[str]]) – Options for each token.
label_from (LT) – Label of original element.
label_to (LT) – Label of element with replacements applied.
n (int) – Number of replacements to pick.
tokenizer (Callable[[str], List[str]]) – Tokenize string into sequence of tokens.
detokenizer (Callable[[List[str]], str]) – Detokenize sequence of tokens to string.

Yields

Iterator[Optional[Tuple[str, LT, LT]]] – Detokenized instance, original label and replaced label.

Return type

Iterator[Optional[Tuple[str, TypeVar(LT), TypeVar(LT)]]]

text_sensitivity.perturbation.characters module

Create character-level perturbations (text_sensitivity.perturbation.base.Perturbation).

text_sensitivity.perturbation.characters.add_typos(n=1, **kwargs)

Create a Perturbation object that adds keyboard typos within words.

Parameters

n (int, optional) – Number of perturbed instances required. Defaults to 1.
**kwargs – See nac.KeyboardAug for optional constructor arguments.

Returns

Object able to apply perturbations on strings or TextInstances.

Return type

Perturbation

text_sensitivity.perturbation.characters.delete_random(n=1, **kwargs)

Create a Perturbation object with random character deletions in words.

Parameters

n (int, optional) – Number of perturbed instances required. Defaults to 1.
**kwargs – See nac.RandomCharAug for optional constructor arguments (uses action=’delete’ by default).

Returns

Object able to apply perturbations on strings or TextInstances.

Return type

Perturbation
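
As a rough illustration of what a random character deletion within words means, the sketch below removes one character from each word of at least two characters. It does not use nlpaug and makes no claim about how nac.RandomCharAug chooses characters; delete_one_char_per_word and its seed argument are assumptions.

```python
import random


def delete_one_char_per_word(text: str, seed: int = 0) -> str:
    """Delete one randomly chosen character from every word of length >= 2."""
    rng = random.Random(seed)
    perturbed = []
    for word in text.split(' '):
        if len(word) >= 2:
            position = rng.randrange(len(word))
            word = word[:position] + word[position + 1:]
        perturbed.append(word)
    return ' '.join(perturbed)


deleted = delete_one_char_per_word('random deletions')
```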

text_sensitivity.perturbation.characters.random_case_swap(n=1)

Create a Perturbation object that randomly swaps the case of characters (lower to upper or vice versa).

Parameters: n (int, optional) – Number of perturbed instances required. Defaults to 1.
Returns: Object able to apply perturbations on strings or TextInstances.
Return type: Perturbation
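
A random case swap on a plain string can be sketched as follows. The helper swap_case_randomly, the per-character probability and the seed are assumptions for the example; the library's own selection of characters may differ.

```python
import random


def swap_case_randomly(text: str, probability: float = 0.5, seed: int = 0) -> str:
    """Swap the case of each character independently with the given probability."""
    rng = random.Random(seed)
    return ''.join(char.swapcase() if rng.random() < probability else char
                   for char in text)


original = 'Hello World'
all_swapped = swap_case_randomly(original, probability=1.0)
none_swapped = swap_case_randomly(original, probability=0.0)
some_swapped = swap_case_randomly(original)
```

Replacing char.swapcase() with char.upper() or char.lower() gives the corresponding sketch for random uppercasing or lowercasing.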

text_sensitivity.perturbation.characters.random_lower(n=1)

Create a Perturbation object that randomly swaps characters to lowercase.

Parameters: n (int, optional) – Number of perturbed instances required. Defaults to 1.
Returns: Object able to apply perturbations on strings or TextInstances.
Return type: Perturbation

text_sensitivity.perturbation.characters.random_spaces(n=1, **kwargs)

Create a Perturbation object that adds random spaces within words (splits them up).

Parameters

n (int, optional) – Number of perturbed instances required. Defaults to 1.
**kwargs – See naw.SplitAug for optional constructor arguments.

Returns

Object able to apply perturbations on strings or TextInstances.

Return type

Perturbation
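
Splitting words by inserting a space can be sketched without nlpaug as below. The helper split_words_randomly, the min_length threshold and the seed are assumptions; naw.SplitAug has its own rules for which words to split.

```python
import random


def split_words_randomly(text: str, min_length: int = 4, seed: int = 0) -> str:
    """Insert one space at a random inner position of every sufficiently long word."""
    rng = random.Random(seed)
    perturbed = []
    for word in text.split(' '):
        if len(word) >= min_length:
            # Positions 1..len(word)-1 keep both halves non-empty.
            position = rng.randrange(1, len(word))
            word = word[:position] + ' ' + word[position:]
        perturbed.append(word)
    return ' '.join(perturbed)


source_text = 'random spaces here'
spaced = split_words_randomly(source_text)
```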

text_sensitivity.perturbation.characters.random_upper(n=1)

Create a Perturbation object that randomly swaps characters to uppercase.

Parameters: n (int, optional) – Number of perturbed instances required. Defaults to 1.
Returns: Object able to apply perturbations on strings or TextInstances.
Return type: Perturbation

text_sensitivity.perturbation.characters.swap_random(n=1, **kwargs)

Create a Perturbation object that randomly swaps characters within words.

Parameters

n (int, optional) – Number of perturbed instances required. Defaults to 1.
**kwargs – See nac.RandomCharAug for optional constructor arguments (uses action=’swap’ by default).

Returns

Object able to apply perturbations on strings or TextInstances.

Return type

Perturbation
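
A character swap within words can be illustrated by exchanging two adjacent characters per word. This sketch is independent of nlpaug; swap_adjacent_in_words and its seed are assumptions, and nac.RandomCharAug may select and swap characters differently.

```python
import random


def swap_adjacent_in_words(text: str, seed: int = 0) -> str:
    """Swap one random pair of adjacent characters in every word of length >= 2."""
    rng = random.Random(seed)
    words = text.split(' ')
    for i, word in enumerate(words):
        if len(word) >= 2:
            position = rng.randrange(len(word) - 1)
            chars = list(word)
            chars[position], chars[position + 1] = chars[position + 1], chars[position]
            words[i] = ''.join(chars)
    return ' '.join(words)


swap_source = 'swap these characters'
swapped = swap_adjacent_in_words(swap_source)
```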

text_sensitivity.perturbation.sentences module

Create sentence-level perturbations (text_sensitivity.perturbation.base.Perturbation).

text_sensitivity.perturbation.sentences.repeat_k_times(k=10, connector=' ')

Repeat a string k times.

Parameters

k (int, optional) – Number of times to repeat a string. Defaults to 10.
connector (Optional[str], optional) – Connector between adjacent repeats. Defaults to ‘ ‘.

Returns

Object able to apply perturbations on strings or TextInstances.

Return type

Perturbation
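
On a plain string, repeating k times with a connector amounts to a join. The sketch below is an assumption-laden illustration: repeat is a hypothetical helper, and treating a None connector as an empty string is a guess based on the Optional[str] type.

```python
from typing import Optional


def repeat(text: str, k: int = 10, connector: Optional[str] = ' ') -> str:
    """Repeat text k times, placing connector between adjacent repeats."""
    return (connector or '').join([text] * k)


spaced_repeat = repeat('ok', k=3)
joined_repeat = repeat('ab', k=2, connector=None)
```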

text_sensitivity.perturbation.sentences.to_lower()

Make all characters in a string lowercase.

Returns: Object able to apply perturbations on strings or TextInstances.
Return type: Perturbation

text_sensitivity.perturbation.sentences.to_upper()

Make all characters in a string uppercase.

Returns: Object able to apply perturbations on strings or TextInstances.
Return type: Perturbation

text_sensitivity.perturbation.words module

Create word-level perturbations (text_sensitivity.perturbation.base.Perturbation).