Sensitivity testing (fairness & robustness) for text machine learning models
Note
Extension of text_explainability
Uses the generic architecture of text_explainability
to also include tests of safety (how safe the model is in production, i.e. which types of inputs it can handle), robustness (how well the model generalizes in production, e.g. stability when adding typos, or the effect of adding random unrelated data) and fairness (whether equal individuals are treated equally by the model, e.g. subgroup fairness on sex and nationality).
© Marcel Robeer, 2021
Quick tour
Safety: test if your model is able to handle different data types.
from text_sensitivity import RandomAscii, RandomEmojis, combine_generators
# Generate 10 strings with random ASCII characters
RandomAscii().generate_list(n=10)
# Generate 5 strings with random ASCII characters and emojis
combine_generators(RandomAscii(), RandomEmojis()).generate_list(n=5)
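To illustrate what such a safety check amounts to, here is a minimal stdlib-only sketch (not the library's implementation; the names `random_ascii_strings` and `handles_safely` are hypothetical): generate random ASCII strings and verify that a model's predict function accepts them without raising.

```python
import random
import string

def random_ascii_strings(n, length=10, seed=0):
    """Generate n strings of random printable ASCII characters."""
    rng = random.Random(seed)
    return ["".join(rng.choices(string.printable.strip(), k=length))
            for _ in range(n)]

def handles_safely(predict, inputs):
    """A model is 'safe' for these inputs if predicting never raises."""
    try:
        for text in inputs:
            predict(text)
        return True
    except Exception:
        return False

samples = random_ascii_strings(n=10)
print(handles_safely(lambda text: len(text) % 2, samples))  # toy 'model'
```

The library's generators follow the same pattern, but also cover emojis, whitespace and other character classes out of the box.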
Robustness: test whether your model performs equally well for different entities …
from text_sensitivity import RandomAddress, RandomEmail
# Random address of your current locale (default = 'nl')
RandomAddress(sep=', ').generate_list(n=5)
# Random e-mail addresses in Spanish ('es') and Portuguese ('pt'), including the country each e-mail is from
RandomEmail(languages=['es', 'pt']).generate_list(n=10, attributes=True)
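Conceptually, `attributes=True` returns each generated value together with its metadata (such as the language it was drawn for), so that performance can later be broken down per attribute. A toy stdlib sketch of that idea (the data and the name `generate_with_attributes` are made up for illustration, not part of the library):

```python
import random

# Hypothetical sample pool, keyed by language
NAMES = {"es": ["maria@example.es", "jose@example.es"],
         "pt": ["joao@example.pt", "ana@example.pt"]}

def generate_with_attributes(languages, n, seed=0):
    """Yield (value, attributes) pairs, mirroring attributes=True."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        lang = rng.choice(languages)
        samples.append((rng.choice(NAMES[lang]), {"language": lang}))
    return samples

for value, attrs in generate_with_attributes(["es", "pt"], n=4):
    print(value, attrs)
```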
… and if it is robust under simple perturbations.
from text_sensitivity import compare_accuracy
from text_sensitivity.perturbation import to_upper, add_typos
# Is model accuracy equal when we change all sentences to uppercase?
compare_accuracy(env, model, to_upper)
# Is model accuracy equal when we add typos in words?
compare_accuracy(env, model, add_typos)
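The underlying idea can be sketched with the standard library alone (this is an illustrative reimplementation under simplified assumptions, not the library's `compare_accuracy`; the toy model, data and `add_typos` below are made up): apply a perturbation to every input and compare accuracy before and after.

```python
import random
import string

def add_typos(text, rate=0.1, seed=0):
    """Randomly replace a fraction of letters to simulate typos."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars)):
        if chars[i].isalpha() and rng.random() < rate:
            chars[i] = rng.choice(string.ascii_lowercase)
    return "".join(chars)

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

def compare_accuracy(model, data, perturbation):
    """Accuracy on original inputs vs. perturbed inputs."""
    perturbed = [(perturbation(x), y) for x, y in data]
    return accuracy(model, data), accuracy(model, perturbed)

# Toy model: predict label 1 iff the text mentions 'good'
model = lambda text: int("good" in text.lower())
data = [("a good movie", 1), ("BAD plot", 0), ("really good", 1)]
print(compare_accuracy(model, data, str.upper))
print(compare_accuracy(model, data, add_typos))
```

A large gap between the two accuracies signals that the model is sensitive to the perturbation.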
Fairness: see if performance is equal among subgroups.
from text_sensitivity import RandomName
# Generate random Dutch ('nl') and Russian ('ru') names, both 'male' and 'female' (+ return attributes)
RandomName(languages=['nl', 'ru'], sex=['male', 'female']).generate_list(n=10, attributes=True)
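The fairness check boils down to computing accuracy per subgroup and comparing the results. A minimal stdlib sketch of that computation (the function `subgroup_accuracy`, the toy model and the data are hypothetical, not the library's API):

```python
from collections import defaultdict

def subgroup_accuracy(model, data):
    """Accuracy per subgroup; data holds (text, label, group) triples."""
    correct, total = defaultdict(int), defaultdict(int)
    for text, label, group in data:
        total[group] += 1
        correct[group] += int(model(text) == label)
    return {g: correct[g] / total[g] for g in total}

# Toy model: predict label 1 iff the text mentions 'great'
model = lambda text: int("great" in text)
data = [("great service", 1, "male"),
        ("great food", 1, "female"),
        ("terrible", 0, "female"),
        ("okay", 1, "male")]
print(subgroup_accuracy(model, data))
```

Unequal per-group scores (as in this toy example) indicate that the model treats the subgroups differently.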
Using text_sensitivity
- Installation
Installation guide, for installing directly via pip or from the git repository.
- Example Usage
An extended usage example.
- text_sensitivity API reference
A reference to all classes and functions included in the text_sensitivity package.
Development
- text_sensitivity @ GIT
The git repository includes the open-source code and the most recent development version.
- Changelog
Changes for each version are recorded in the changelog.
- Contributing
Contributors to the open-source project and contribution guidelines.
Citation
@misc{text_sensitivity,
title = {Python package text\_sensitivity},
author = {Marcel Robeer},
howpublished = {\url{https://git.science.uu.nl/m.j.robeer/text_sensitivity}},
year = {2021}
}
Credits
Edward Ma. NLP Augmentation. 2019.
Daniele Faraglia and other contributors. Faker. 2012.
Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin and Sameer Singh. Beyond Accuracy: Behavioral Testing of NLP models with CheckList. Association for Computational Linguistics (ACL). 2020.