text_sensitivity.data

All randomly generated data, and data for lookups (e.g. word lists).

Submodules:

text_sensitivity.data.generate module

Generate data from a pattern, e.g. ‘{He|She} lives in {city}.’

text_sensitivity.data.generate.default_patterns()

Overview of all default patterns.

Return type: List[str]

text_sensitivity.data.generate.from_pattern(pattern, n=3, seed=0, **kwargs)

Generate data from a pattern.

Examples

Generate a list [‘This is his house’, ‘This was his house’, ‘This is his car’, ‘This was his car’, …]:

>>> from_pattern('This {is|was} his {house|car|boat}')

Generate a list [‘His home town is Eindhoven.’, ‘Her home town is Eindhoven.’, ‘His home town is Meerssen.’, …]. By default uses RandomCity() to generate the city name.

>>> from_pattern('{His|Her} home town is {city}.')

Override the ‘city’ default with your own list [‘Amsterdam’, ‘Rotterdam’, ‘Utrecht’]:

>>> from_pattern('{His|Her} home town is {city}.', city=['Amsterdam', 'Rotterdam', 'Utrecht'])

Apply lowercase to the first argument and uppercase to the last, getting [‘Vandaag, donderdag, heeft Sanne COLIN gebeld!’, …, ‘Vandaag, maandag, heeft Nora SEPP gebeld!’, …] for five random elements of each:

>>> from_pattern('Vandaag, {lower:day_of_week}, heeft {first_name} {upper:first_name} gebeld!', n=5)

Parameters

pattern (str) – String containing pattern.
n (int, optional) – Number of elements to generate for each placeholder whose generator is random. Defaults to 3.
seed (int, optional) – Seed for reproducibility. Defaults to 0.

Returns

Generated instances and corresponding labels.

Return type

Tuple[TextInstanceProvider, Dict[LT, MemoryLabelProvider]]
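The `lower:` and `upper:` prefixes in the last example can be read as case transforms applied to whatever value is substituted for a placeholder. The following is a minimal sketch of that idea in plain Python, not the library's implementation: `fill`, `MODIFIERS` and `PLACEHOLDER` are hypothetical names, and values are passed in directly rather than drawn from a random generator.

```python
import re

# Hypothetical mapping from modifier prefix to a case transform.
MODIFIERS = {"lower": str.lower, "upper": str.upper}

# Matches {name} or {modifier:name}.
PLACEHOLDER = re.compile(r"\{(?:(\w+):)?(\w+)\}")


def fill(pattern: str, **values: str) -> str:
    """Substitute named placeholders, applying an optional case modifier."""
    def replace(match: re.Match) -> str:
        modifier, name = match.group(1), match.group(2)
        value = values[name]
        # No modifier (or an unknown one) leaves the value unchanged.
        transform = MODIFIERS.get(modifier, lambda s: s)
        return transform(value)
    return PLACEHOLDER.sub(replace, pattern)


result = fill("Vandaag, {lower:day_of_week}, heeft {upper:first_name} gebeld!",
              day_of_week="Donderdag", first_name="Sanne")
print(result)  # Vandaag, donderdag, heeft SANNE gebeld!
```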

text_sensitivity.data.generate.options_from_brackets(string, n=3, seed=0, **kwargs)

Generate options from string.

Example

Generate a list of sentences, filling in number from the given values:

>>> options_from_brackets('I have {number} houses!', number=[5, 10, 100, 5000])

Parameters

string (str) – String with curly braces.
n (int, optional) – Number of elements to generate for each option. Defaults to 3.
seed (int, optional) – Seed for reproducibility. Defaults to 0.

Returns

Strings with elements generated.

Return type

List[str]
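Expanding `{a|b}` style alternatives amounts to taking the Cartesian product of the options in each brace group. A minimal sketch in plain Python, assuming only `|`-separated groups and no named placeholders; `expand_alternatives` is a hypothetical helper, not the library's function, and the order of the results follows `itertools.product`, which may differ from the order the library produces.

```python
import itertools
import re

# Matches a brace group that contains at least one '|', e.g. {is|was}.
ALTERNATIVES = re.compile(r"\{([^{}]*\|[^{}]*)\}")


def expand_alternatives(pattern: str) -> list:
    """Return every string obtained by choosing one option per brace group."""
    # With one capture group, re.split puts the captured group contents
    # at the odd indices and the literal text at the even indices.
    parts = ALTERNATIVES.split(pattern)
    choices = [
        part.split("|") if index % 2 else [part]
        for index, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]


sentences = expand_alternatives("This {is|was} his {house|car|boat}")
print(len(sentences))  # 6
print(sentences[0])    # This is his house
```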

text_sensitivity.data.wordlist module

Select data from a list of words, optionally with a probability to choose each element.

class text_sensitivity.data.wordlist.WordList(wordlist, main_column=None, attribute_column=None, seed=0)

Bases: Readable, SeedMixin, CaseMixin

Capture data in wordlist.

Parameters

wordlist (pd.DataFrame) – Dataframe containing a column with data (e.g. city names).
main_column (Optional[Label], optional) – Column containing data. Defaults to None.
attribute_column (Optional[Label], optional) – Column containing attributes. If None, defaults to the main column.
seed (int, optional) – Seed for reproducibility. Defaults to 0.

filter(column, values)

Filter the wordlist, keeping only the rows whose entry in column is one of values.

Parameters

column (Label) – Column to filter.
values (Union[Label, List[Label]]) – Value or list of values to keep.

Returns

Self.

Return type

WordList
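Because filter() returns the object itself, calls can be chained. The sketch below illustrates that pattern with a hypothetical stand-in class, `MiniWordList`, which stores rows as plain dictionaries; it assumes that filtering keeps the matching rows and modifies the object in place, which this section does not state explicitly.

```python
from typing import Dict, List, Union


class MiniWordList:
    """Hypothetical stand-in for a word list stored as rows of dicts."""

    def __init__(self, rows: List[Dict[str, str]]):
        self.rows = list(rows)

    def filter(self, column: str, values: Union[str, List[str]]) -> "MiniWordList":
        # Accept a single value or a list of values.
        allowed = values if isinstance(values, list) else [values]
        # Keep only rows whose entry in `column` is one of the allowed values.
        self.rows = [row for row in self.rows if row[column] in allowed]
        return self  # returning self allows chained calls


wl = MiniWordList([
    {"pronoun": "he", "gender": "male"},
    {"pronoun": "she", "gender": "female"},
    {"pronoun": "they", "gender": "neuter"},
])
kept = [row["pronoun"] for row in wl.filter("gender", ["male", "female"]).rows]
print(kept)  # ['he', 'she']
```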

classmethod from_csv(filename, main_column=None, attribute_column=None, seed=0, *args, **kwargs)

Create a WordList from a CSV file.

Parameters

filename (str) – Filename.
main_column (Optional[Label], optional) – Data column. Defaults to None.
attribute_column (Optional[Label], optional) – Attribute column. Defaults to None.
seed (int, optional) – Seed for reproducibility. Defaults to 0.
**kwargs – Optional arguments passed to pandas.read_csv().

Returns

A WordList instance.

Return type

WordList

classmethod from_dict(*args, **kwargs): Alias for WordList.from_dictionary().

classmethod from_dictionary(wordlist, key_name='key', value_name='value', value_as_main=False, seed=0)

Create a WordList from a dictionary.

Example

Create list of pronouns with genders:

>>> wl = WordList.from_dictionary({'he': 'male', 'she': 'female', 'they': 'neuter'},
...                               key_name='pronoun',
...                               value_name='gender')

Parameters

wordlist (Dict) – Dictionary of elements and corresponding attribute.
key_name (Label, optional) – Name of keys. Defaults to ‘key’.
value_name (Label, optional) – Name of values. Defaults to ‘value’.
value_as_main (bool, optional) – Whether data is in the key column (False) or value column (True). Defaults to False.
seed (int, optional) – Seed for reproducibility. Defaults to 0.

Returns

A WordList instance.

Return type

WordList
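The role of key_name, value_name and value_as_main can be pictured as turning the dictionary into a two-column table and then choosing which column holds the words. A minimal sketch in plain Python; `dict_to_rows` is a hypothetical helper and the row-of-dicts layout is an assumption, not the library's internal representation.

```python
from typing import Dict, List, Tuple


def dict_to_rows(wordlist: Dict[str, str], key_name: str = "key",
                 value_name: str = "value",
                 value_as_main: bool = False) -> Tuple[List[Dict[str, str]], str]:
    """Turn a dict into rows with two named columns and pick the main column."""
    rows = [{key_name: k, value_name: v} for k, v in wordlist.items()]
    # The main column holds the words; the other column holds their attribute.
    main_column = value_name if value_as_main else key_name
    return rows, main_column


rows, main = dict_to_rows({"he": "male", "she": "female"},
                          key_name="pronoun", value_name="gender")
print(main)     # pronoun
print(rows[0])  # {'pronoun': 'he', 'gender': 'male'}
```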

classmethod from_excel(filename, main_column=None, attribute_column=None, seed=0, *args, **kwargs)

Create a WordList from an Excel (.xls or .xlsx) file.

Parameters

filename (str) – Filename.
main_column (Optional[Label], optional) – Data column. Defaults to None.
attribute_column (Optional[Label], optional) – Attribute column. Defaults to None.
seed (int, optional) – Seed for reproducibility. Defaults to 0.
**kwargs – Optional arguments passed to pandas.read_excel().

Returns

A WordList instance.

Return type

WordList

classmethod from_file(filename, main_column=None, attribute_column=None, seed=0, *args, **kwargs)

Create a WordList from a file.

The file type is inferred based on the file extension.

Parameters

filename (str) – Filename.
main_column (Optional[Label], optional) – Data column. Defaults to None.
attribute_column (Optional[Label], optional) – Attribute column. Defaults to None.
seed (int, optional) – Seed for reproducibility. Defaults to 0.
**kwargs – Optional arguments passed to pandas reader.

Returns

A WordList instance.

Return type

WordList
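Inferring the file type from the extension can be sketched as a lookup from suffix to reader. The mapping below is an assumption built from the readers named in this section (pandas.read_csv(), read_excel(), read_json() and read_pickle()); `READERS` and `reader_for` are hypothetical names, and the set of extensions the library actually accepts may differ.

```python
from pathlib import Path

# Assumed mapping from file extension to the pandas reader named in this section.
READERS = {
    ".csv": "read_csv",
    ".xls": "read_excel",
    ".xlsx": "read_excel",
    ".json": "read_json",
    ".pkl": "read_pickle",
}


def reader_for(filename: str) -> str:
    """Return the name of the pandas reader assumed to handle this file."""
    extension = Path(filename).suffix.lower()
    if extension not in READERS:
        raise ValueError(f"Unsupported file extension: {extension!r}")
    return READERS[extension]


print(reader_for("cities.csv"))  # read_csv
print(reader_for("Names.XLSX"))  # read_excel
```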

classmethod from_json(filename, main_column=None, attribute_column=None, seed=0, *args, **kwargs)

Create a WordList from a JSON file.

Parameters

filename (str) – Filename.
main_column (Optional[Label], optional) – Data column. Defaults to None.
attribute_column (Optional[Label], optional) – Attribute column. Defaults to None.
seed (int, optional) – Seed for reproducibility. Defaults to 0.
**kwargs – Optional arguments passed to pandas.read_json().

Returns

A WordList instance.

Return type

WordList

classmethod from_list(wordlist, name='words', seed=0)

Create a WordList from a list of strings.

Example

Create list of city names and pick one random element:

>>> wl = WordList.from_list(['Amsterdam', 'Rotterdam', 'Utrecht'], name='city')
>>> wl.generate_list(n=1)

Parameters

wordlist (List[str]) – List of strings.
name (Label, optional) – Name of attribute. Defaults to ‘words’.
seed (int, optional) – Seed for reproducibility. Defaults to 0.

Returns

A WordList instance.

Return type

WordList

classmethod from_pickle(filename, main_column=None, attribute_column=None, seed=0, *args, **kwargs)

Create a WordList from a pickle (.pkl) file.

Parameters

filename (str) – Filename.
main_column (Optional[Label], optional) – Data column. Defaults to None.
attribute_column (Optional[Label], optional) – Attribute column. Defaults to None.
seed (int, optional) – Seed for reproducibility. Defaults to 0.
**kwargs – Optional arguments passed to pandas.read_pickle().

Returns

A WordList instance.

Return type

WordList

generate_list(n=None, attributes=False, likelihood_column=None)

Generate a random list of n elements.

Parameters

n (Optional[int], optional) – Number of elements to generate. Defaults to None.
attributes (bool, optional) – Include attributes or not. Defaults to False.
likelihood_column (Optional[Label], optional) – Attribute to determine likelihood on. Defaults to None.

Returns

Wordlist elements (up to n).

Return type

List[str]
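Seeded, likelihood-weighted selection of up to n elements can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's algorithm: `sample_words` is a hypothetical function, the weights stand in for the values of a likelihood column, and drawing without replacement is assumed because the return value is described as "up to n" elements.

```python
import random
from typing import List, Optional, Sequence


def sample_words(words: Sequence[str], n: Optional[int] = None,
                 weights: Optional[Sequence[float]] = None,
                 seed: int = 0) -> List[str]:
    """Draw up to n distinct words, optionally favoring higher weights."""
    rng = random.Random(seed)  # private generator, so the seed fixes the result
    pool = list(words)
    pool_weights = list(weights) if weights is not None else [1.0] * len(pool)
    count = len(pool) if n is None else min(n, len(pool))
    picked = []
    for _ in range(count):
        # Weighted draw without replacement: pick one index, then remove it.
        index = rng.choices(range(len(pool)), weights=pool_weights, k=1)[0]
        picked.append(pool.pop(index))
        pool_weights.pop(index)
    return picked


cities = ["Amsterdam", "Rotterdam", "Utrecht"]
first = sample_words(cities, n=2, weights=[5, 3, 1], seed=0)
second = sample_words(cities, n=2, weights=[5, 3, 1], seed=0)
print(first == second)  # True: the same seed gives the same selection
```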

get(sort_by=None, attributes=False, **sort_kwargs)

Get all elements in wordlist.

Parameters

sort_by (Optional[Label], optional) – Label to sort on (e.g. frequency). Defaults to None.
attributes (bool, optional) – Include attributes or not. Defaults to False.

Returns

Wordlist elements.

Return type

List[str]
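Sorting by an attribute before returning the words can be sketched as follows. `get_words` is a hypothetical helper working on rows of dictionaries, the frequency values are made up for illustration, and forwarding the extra keyword arguments to `sorted()` is an assumption about how sort_kwargs might be used.

```python
from typing import Dict, List, Optional


def get_words(rows: List[Dict[str, object]], main_column: str,
              sort_by: Optional[str] = None, **sort_kwargs) -> List[object]:
    """Return the main-column values, optionally ordered by another column."""
    if sort_by is not None:
        # sort_kwargs is forwarded to sorted(), e.g. reverse=True.
        rows = sorted(rows, key=lambda row: row[sort_by], **sort_kwargs)
    return [row[main_column] for row in rows]


rows = [
    {"city": "Utrecht", "frequency": 2},
    {"city": "Amsterdam", "frequency": 9},
    {"city": "Rotterdam", "frequency": 5},
]
print(get_words(rows, "city", sort_by="frequency", reverse=True))
# ['Amsterdam', 'Rotterdam', 'Utrecht']
```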

reset(): Reset wordlist.

class text_sensitivity.data.wordlist.WordListGetterMixin

Bases: object

filter(*args, **kwargs): Wrapper of WordList.filter().

generate_list(*args, **kwargs): Wrapper of WordList.generate_list().

get(*args, **kwargs): Get item in wordlist.

reset(): Wrapper of WordList.reset().