Sensitivity testing (fairness & robustness) for text machine learning models
Note
Extension of text_explainability
Uses the generic architecture of text_explainability
to also include tests of safety (how safe the model is in production, i.e. which types of inputs it can handle), robustness (how well the model generalizes in production, e.g. stability when adding typos, or the effect of adding random unrelated data) and fairness (whether equal individuals are treated equally by the model, e.g. subgroup fairness on sex and nationality).
© Marcel Robeer, 2021
Quick tour
Safety: test if your model is able to handle different data types.
from text_sensitivity import RandomAscii, RandomEmojis, combine_generators
# Generate 10 strings with random ASCII characters
RandomAscii().generate_list(n=10)
# Generate 5 strings with random ASCII characters and emojis
combine_generators(RandomAscii(), RandomEmojis()).generate_list(n=5)
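To illustrate what such a safety check amounts to, here is a minimal stdlib-only sketch (not the library's implementation; the names `random_ascii_strings` and `handles_safely` are hypothetical): generate random ASCII strings and verify that a model's predict function accepts them without raising.

```python
import random
import string

def random_ascii_strings(n, length=10, seed=0):
    """Generate n strings of random printable ASCII characters."""
    rng = random.Random(seed)
    return ["".join(rng.choices(string.printable.strip(), k=length))
            for _ in range(n)]

def handles_safely(predict, inputs):
    """A model is 'safe' for these inputs if predicting never raises."""
    try:
        for text in inputs:
            predict(text)
        return True
    except Exception:
        return False

samples = random_ascii_strings(n=10)
print(handles_safely(lambda text: len(text) % 2, samples))  # toy 'model'
```

The library's generators follow the same pattern, but also cover emojis, whitespace and other character classes out of the box.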
Robustness: test whether your model performs equally well for different entities …
from text_sensitivity import RandomAddress, RandomEmail
# Random address of your current locale (default = 'nl')
RandomAddress(sep=', ').generate_list(n=5)
# Random e-mail addresses in Spanish ('es') and Portuguese ('pt'), including the country each e-mail is from
RandomEmail(languages=['es', 'pt']).generate_list(n=10, attributes=True)
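Conceptually, `attributes=True` returns each generated value together with its metadata (such as the language it was drawn for), so that performance can later be broken down per attribute. A toy stdlib sketch of that idea (the data and the name `generate_with_attributes` are made up for illustration, not part of the library):

```python
import random

# Hypothetical sample pool, keyed by language
NAMES = {"es": ["maria@example.es", "jose@example.es"],
         "pt": ["joao@example.pt", "ana@example.pt"]}

def generate_with_attributes(languages, n, seed=0):
    """Yield (value, attributes) pairs, mirroring attributes=True."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        lang = rng.choice(languages)
        samples.append((rng.choice(NAMES[lang]), {"language": lang}))
    return samples

for value, attrs in generate_with_attributes(["es", "pt"], n=4):
    print(value, attrs)
```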
… and if it is robust under simple perturbations.
from text_sensitivity import compare_accuracy
from text_sensitivity.perturbation import to_upper, add_typos
# Is model accuracy equal when we change all sentences to uppercase?
compare_accuracy(env, model, to_upper)
# Is model accuracy equal when we add typos in words?
compare_accuracy(env, model, add_typos)
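The underlying idea can be sketched with the standard library alone (this is an illustrative reimplementation under simplified assumptions, not the library's `compare_accuracy`; the toy model, data and `add_typos` below are made up): apply a perturbation to every input and compare accuracy before and after.

```python
import random
import string

def add_typos(text, rate=0.1, seed=0):
    """Randomly replace a fraction of letters to simulate typos."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars)):
        if chars[i].isalpha() and rng.random() < rate:
            chars[i] = rng.choice(string.ascii_lowercase)
    return "".join(chars)

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

def compare_accuracy(model, data, perturbation):
    """Accuracy on original inputs vs. perturbed inputs."""
    perturbed = [(perturbation(x), y) for x, y in data]
    return accuracy(model, data), accuracy(model, perturbed)

# Toy model: predict label 1 iff the text mentions 'good'
model = lambda text: int("good" in text.lower())
data = [("a good movie", 1), ("BAD plot", 0), ("really good", 1)]
print(compare_accuracy(model, data, str.upper))
print(compare_accuracy(model, data, add_typos))
```

A large gap between the two accuracies signals that the model is sensitive to the perturbation.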
Fairness: see if performance is equal among subgroups.
from text_sensitivity import RandomName
# Generate random Dutch ('nl') and Russian ('ru') names, both 'male' and 'female' (+ return attributes)
RandomName(languages=['nl', 'ru'], sex=['male', 'female']).generate_list(n=10, attributes=True)
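The fairness check boils down to computing accuracy per subgroup and comparing the results. A minimal stdlib sketch of that computation (the function `subgroup_accuracy`, the toy model and the data are hypothetical, not the library's API):

```python
from collections import defaultdict

def subgroup_accuracy(model, data):
    """Accuracy per subgroup; data holds (text, label, group) triples."""
    correct, total = defaultdict(int), defaultdict(int)
    for text, label, group in data:
        total[group] += 1
        correct[group] += int(model(text) == label)
    return {g: correct[g] / total[g] for g in total}

# Toy model: predict label 1 iff the text mentions 'great'
model = lambda text: int("great" in text)
data = [("great service", 1, "male"),
        ("great food", 1, "female"),
        ("terrible", 0, "female"),
        ("okay", 1, "male")]
print(subgroup_accuracy(model, data))
```

Unequal per-group scores (as in this toy example) indicate that the model treats the subgroups differently.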
Using text_sensitivity
- Installation
Installation guide, for installing directly via pip or from the git repository.
- Example Usage
An extended usage example.
- text_sensitivity API reference
A reference to all classes and functions included in the text_sensitivity package.
Development
- text_sensitivity @ GIT
The git repository includes the open-source code and the most recent development version.
- Changelog
Changes for each version are recorded in the changelog.
- Contributing
Contributors to the open-source project and contribution guidelines.
Citation
@misc{text_sensitivity,
title = {Python package text\_sensitivity},
author = {Marcel Robeer},
howpublished = {\url{https://git.science.uu.nl/m.j.robeer/text_sensitivity}},
year = {2021}
}
Credits
Edward Ma. NLP Augmentation. 2019.
Daniele Faraglia and other contributors. Faker. 2012.
Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin and Sameer Singh. Beyond Accuracy: Behavioral Testing of NLP models with CheckList. Association for Computational Linguistics (ACL). 2020.