text_sensitivity.data
Randomly generated data and data for lookups (e.g. word lists).
Subpackages:
- text_sensitivity.data.lists
- text_sensitivity.data.random
- text_sensitivity.data.random.entity module
- text_sensitivity.data.random.string module
Submodules:
text_sensitivity.data.generate module
Generate data from a pattern, e.g. ‘{He|She} lives in {city}.’
- text_sensitivity.data.generate.default_patterns()
Overview of all default patterns.
- Return type
List[str]
- text_sensitivity.data.generate.from_pattern(pattern, n=3, seed=0, **kwargs)
Generate data from a pattern.
Examples
Generate a list [‘This is his house’, ‘This was his house’, ‘This is his car’, ‘This was his car’, …]:
>>> from_pattern('This {is|was} his {house|car|boat}')
Generate a list [‘His home town is Eindhoven.’, ‘Her home town is Eindhoven.’, ‘His home town is Meerssen.’, …]. By default uses RandomCity() to generate the city name.
>>> from_pattern('{His|Her} home town is {city}.')
Override the ‘city’ default with your own list [‘Amsterdam’, ‘Rotterdam’, ‘Utrecht’]:
>>> from_pattern('{His|Her} home town is {city}.', city=['Amsterdam', 'Rotterdam', 'Utrecht'])
Apply lowercase to the first placeholder and uppercase to the last, getting [‘Vandaag, donderdag, heeft Sanne COLIN gebeld!’, …, ‘Vandaag, maandag, heeft Nora SEPP gebeld!’, …] with five random values for each placeholder:
>>> from_pattern('Vandaag, {lower:day_of_week}, heeft {first_name} {upper:first_name} gebeld!', n=5)
- Parameters
pattern (str) – String containing pattern.
n (int, optional) – Number of values to generate for each placeholder whose generator is random. Defaults to 3.
seed (int, optional) – Seed for reproducibility. Defaults to 0.
- Returns
Generated instances and corresponding labels.
- Return type
Tuple[TextInstanceProvider, Dict[LT, MemoryLabelProvider]]
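The {a|b} alternatives in a pattern can be read as a Cartesian product over the listed options. The following is a minimal sketch of that idea using only the standard library. The helper name expand_alternatives is hypothetical and is not part of text_sensitivity; it returns plain strings rather than the instance and label providers that from_pattern returns, and the order of its results may differ from the library's.

```python
import itertools
import re

# Hypothetical helper for illustration; not part of text_sensitivity.
# Matches a {...} group that contains at least one '|' separator.
_ALTERNATIVES = re.compile(r"\{([^{}]*\|[^{}]*)\}")


def expand_alternatives(pattern: str) -> list[str]:
    """Expand every {a|b|...} group into all combinations of its options."""
    # With one capture group, re.split alternates literal text and group contents.
    parts = _ALTERNATIVES.split(pattern)
    literals = parts[0::2]
    choices = [group.split("|") for group in parts[1::2]]

    results = []
    for combo in itertools.product(*choices):
        pieces = [literals[0]]
        for choice, literal in zip(combo, literals[1:]):
            pieces.append(choice)
            pieces.append(literal)
        results.append("".join(pieces))
    return results


sentences = expand_alternatives("This {is|was} his {house|car|boat}")
print(len(sentences))  # 2 * 3 = 6 combinations
```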
- text_sensitivity.data.generate.options_from_brackets(string, n=3, seed=0, **kwargs)
Generate options from string.
Example
Generate sentences with different numbers of houses:
>>> options_from_brackets('I have {number} houses!', number=[5, 10, 100, 5000])
- Parameters
string (str) – String with curly braces.
n (int, optional) – Number of elements to generate for each option. Defaults to 3.
seed (int, optional) – Seed for reproducibility. Defaults to 0.
- Returns
Strings with elements generated.
- Return type
List[str]
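To illustrate how named placeholders and the lower:/upper: modifiers might be filled from keyword arguments, here is a small standard-library sketch. The helper fill_placeholders is hypothetical and not part of text_sensitivity. It assumes every placeholder name is supplied as a keyword list, it ignores n and seed, and it treats each occurrence of a placeholder independently.

```python
import itertools
import re

# Hypothetical helper for illustration; not part of text_sensitivity.
# Matches {name}, {lower:name} and {upper:name}.
_PLACEHOLDER = re.compile(r"\{(?:(lower|upper):)?(\w+)\}")


def fill_placeholders(string: str, **options) -> list[str]:
    """Fill each placeholder with every value supplied for its name."""
    fields = _PLACEHOLDER.findall(string)  # (modifier, name) pairs, left to right
    # Assumption: a missing name is an error here (raises KeyError).
    value_lists = [options[name] for _, name in fields]

    results = []
    for combo in itertools.product(*value_lists):
        values = iter(combo)

        def substitute(match: re.Match) -> str:
            text = str(next(values))
            modifier = match.group(1)
            if modifier == "lower":
                return text.lower()
            if modifier == "upper":
                return text.upper()
            return text

        results.append(_PLACEHOLDER.sub(substitute, string))
    return results


houses = fill_placeholders("I have {number} houses!", number=[5, 10, 100, 5000])
print(houses)
# ['I have 5 houses!', 'I have 10 houses!', 'I have 100 houses!', 'I have 5000 houses!']
```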
text_sensitivity.data.wordlist module
Select data from a list of words, optionally with a probability to choose each element.
- class text_sensitivity.data.wordlist.WordList(wordlist, main_column=None, attribute_column=None, seed=0)
Bases: Readable, SeedMixin, CaseMixin
Capture data in wordlist.
- Parameters
wordlist (pd.DataFrame) – Dataframe containing a column with data (e.g. city names).
main_column (Optional[Label], optional) – Column containing data. Defaults to None.
attribute_column (Optional[Label], optional) – Column containing attributes. If None defaults to the main column.
seed (int, optional) – Seed for reproducibility. Defaults to 0.
- filter(column, values)
Filter the wordlist to the rows whose value in column is in values.
- Parameters
column (Label) – Column to filter.
values (Union[Label, List[Label]]) – Value or list of values to match.
- Returns
Self.
- Return type
WordList
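The filtering described above can be sketched without the library by treating a wordlist as a list of rows. This is only an illustration: filter_rows is a hypothetical helper, and it assumes that filtering keeps the rows whose column value matches and that a single value is treated like a one-element list.

```python
# Hypothetical helper for illustration; not part of text_sensitivity.
def filter_rows(rows, column, values):
    """Keep the rows whose value in `column` is one of `values`."""
    # Assumption: a single label is accepted as shorthand for [label].
    if isinstance(values, (str, int, float)):
        values = [values]
    allowed = set(values)
    return [row for row in rows if row[column] in allowed]


rows = [
    {"pronoun": "he", "gender": "male"},
    {"pronoun": "she", "gender": "female"},
    {"pronoun": "they", "gender": "neuter"},
]
binary = filter_rows(rows, "gender", ["male", "female"])
print([row["pronoun"] for row in binary])  # ['he', 'she']
```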
- classmethod from_csv(filename, main_column=None, attribute_column=None, seed=0, *args, **kwargs)
Create a WordList from a CSV file.
- Parameters
filename (str) – Filename.
main_column (Optional[Label], optional) – Data column. Defaults to None.
attribute_column (Optional[Label], optional) – Attribute column. Defaults to None.
seed (int, optional) – Seed for reproducibility. Defaults to 0.
**kwargs – Optional arguments passed to pandas.read_csv().
- Returns
WordList class.
- Return type
WordList
- classmethod from_dict(*args, **kwargs)
Alias for WordList.from_dictionary().
- classmethod from_dictionary(wordlist, key_name='key', value_name='value', value_as_main=False, seed=0)
Create a WordList from a dictionary.
Example
Create list of pronouns with genders:
>>> wl = WordList.from_dictionary({'he': 'male', 'she': 'female', 'they': 'neuter'},
...                               key_name='pronoun',
...                               value_name='gender')
- Parameters
wordlist (Dict) – Dictionary of elements and corresponding attribute.
key_name (Label, optional) – Name of keys. Defaults to ‘key’.
value_name (Label, optional) – Name of values. Defaults to ‘value’.
value_as_main (bool, optional) – Whether data is in the key column (False) or value column (True). Defaults to False.
seed (int, optional) – Seed for reproducibility. Defaults to 0.
- Returns
WordList class.
- Return type
WordList
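The effect of key_name, value_name and value_as_main can be pictured as building a two-column table and choosing which column holds the data. The sketch below is a standard-library illustration under that assumption. dictionary_to_rows is a hypothetical helper, not the library's implementation, and it returns plain rows plus column names instead of a WordList.

```python
# Hypothetical helper for illustration; not part of text_sensitivity.
def dictionary_to_rows(wordlist, key_name="key", value_name="value", value_as_main=False):
    """Turn a dictionary into two-column rows and pick the main column."""
    rows = [{key_name: key, value_name: value} for key, value in wordlist.items()]
    # value_as_main=False: data is in the key column; True: data is in the value column.
    main_column = value_name if value_as_main else key_name
    attribute_column = key_name if value_as_main else value_name
    return rows, main_column, attribute_column


pronoun_rows, main, attribute = dictionary_to_rows(
    {"he": "male", "she": "female", "they": "neuter"},
    key_name="pronoun",
    value_name="gender",
)
print(main, attribute)  # pronoun gender
```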
- classmethod from_excel(filename, main_column=None, attribute_column=None, seed=0, *args, **kwargs)
Create a WordList from an Excel (.xls or .xlsx) file.
- Parameters
filename (str) – Filename.
main_column (Optional[Label], optional) – Data column. Defaults to None.
attribute_column (Optional[Label], optional) – Attribute column. Defaults to None.
seed (int, optional) – Seed for reproducibility. Defaults to 0.
**kwargs – Optional arguments passed to pandas.read_excel().
- Returns
WordList class.
- Return type
WordList
- classmethod from_file(filename, main_column=None, attribute_column=None, seed=0, *args, **kwargs)
Create a WordList from a file.
The file type is inferred based on the file extension.
- Parameters
filename (str) – Filename.
main_column (Optional[Label], optional) – Data column. Defaults to None.
attribute_column (Optional[Label], optional) – Attribute column. Defaults to None.
seed (int, optional) – Seed for reproducibility. Defaults to 0.
**kwargs – Optional arguments passed to pandas reader.
- Returns
WordList class.
- Return type
WordList
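Inferring the file type from the extension amounts to a lookup from suffix to reader. The sketch below shows one way this could look. The mapping and the helper infer_reader are assumptions made for illustration; the extensions the library actually recognises may differ. It returns the name of the pandas reader rather than calling it, so pandas is not required to run it.

```python
from pathlib import Path

# Assumed mapping from file extension to pandas reader name (illustrative only).
READERS = {
    ".csv": "read_csv",
    ".xls": "read_excel",
    ".xlsx": "read_excel",
    ".json": "read_json",
    ".pkl": "read_pickle",
}


def infer_reader(filename: str) -> str:
    """Return the name of the pandas reader matching the file extension."""
    suffix = Path(filename).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {suffix!r}") from None


print(infer_reader("cities.csv"))  # read_csv
print(infer_reader("names.xlsx"))  # read_excel
```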
- classmethod from_json(filename, main_column=None, attribute_column=None, seed=0, *args, **kwargs)
Create a WordList from a JSON file.
- Parameters
filename (str) – Filename.
main_column (Optional[Label], optional) – Data column. Defaults to None.
attribute_column (Optional[Label], optional) – Attribute column. Defaults to None.
seed (int, optional) – Seed for reproducibility. Defaults to 0.
**kwargs – Optional arguments passed to pandas.read_json().
- Returns
WordList class.
- Return type
WordList
- classmethod from_list(wordlist, name='words', seed=0)
Create a WordList from a list of strings.
Example
Create list of city names and pick one random element:
>>> wl = WordList.from_list(['Amsterdam', 'Rotterdam', 'Utrecht'], name='city')
>>> wl.generate_list(n=1)
- Parameters
wordlist (List[str]) – List of strings.
name (Label, optional) – Name of attribute. Defaults to ‘words’.
seed (int, optional) – Seed for reproducibility. Defaults to 0.
- Returns
WordList class.
- Return type
WordList
- classmethod from_pickle(filename, main_column=None, attribute_column=None, seed=0, *args, **kwargs)
Create a WordList from a pickle (.pkl) file.
- Parameters
filename (str) – Filename.
main_column (Optional[Label], optional) – Data column. Defaults to None.
attribute_column (Optional[Label], optional) – Attribute column. Defaults to None.
seed (int, optional) – Seed for reproducibility. Defaults to 0.
**kwargs – Optional arguments passed to pandas.read_pickle().
- Returns
WordList class.
- Return type
WordList
- generate_list(n=None, attributes=False, likelihood_column=None)
Generate a random list of n elements.
- Parameters
n (Optional[int], optional) – Number of elements to generate. Defaults to None.
attributes (bool, optional) – Include attributes or not. Defaults to False.
likelihood_column (Optional[Label], optional) – Attribute to determine likelihood on. Defaults to None.
- Returns
Wordlist elements (up to n).
- Return type
List[str]
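Seeded selection with optional likelihoods can be sketched with the standard random module. The helper sample_words below is hypothetical and makes several assumptions that the documentation does not confirm: elements are drawn without replacement, n=None means all elements, and the likelihoods are passed in directly as weights instead of being read from a column.

```python
import random


# Hypothetical helper for illustration; not part of text_sensitivity.
def sample_words(words, n=None, weights=None, seed=0):
    """Draw up to n distinct words, optionally weighted, reproducibly."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    pool = list(words)
    pool_weights = list(weights) if weights is not None else [1.0] * len(pool)
    count = len(pool) if n is None else min(n, len(pool))

    picked = []
    for _ in range(count):
        # Weighted draw of one index, then remove it so it cannot repeat.
        index = rng.choices(range(len(pool)), weights=pool_weights, k=1)[0]
        picked.append(pool.pop(index))
        pool_weights.pop(index)
    return picked


cities = ["Amsterdam", "Rotterdam", "Utrecht"]
first = sample_words(cities, n=2, seed=0)
second = sample_words(cities, n=2, seed=0)
print(first == second)  # True: the same seed gives the same selection
```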
- get(sort_by=None, attributes=False, **sort_kwargs)
Get all elements in wordlist.
- Parameters
sort_by (Optional[Label], optional) – Label to sort on (e.g. frequency). Defaults to None.
attributes (bool, optional) – Include attributes or not. Defaults to False.
- Returns
Wordlist elements.
- Return type
List[str]
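As a rough picture of sort_by and attributes, the sketch below sorts rows on one column and returns either the main column only or the full rows. get_words is a hypothetical helper and the frequency values are made up for the example. It forwards sort_kwargs to Python's sorted, which is an assumption; the library may pass them to a different sorting routine with different keyword names.

```python
# Hypothetical helper for illustration; not part of text_sensitivity.
def get_words(rows, main_column, sort_by=None, attributes=False, **sort_kwargs):
    """Return all words, optionally sorted and optionally with attributes."""
    if sort_by is not None:
        # Assumption: extra keyword arguments (e.g. reverse=True) go to sorted().
        rows = sorted(rows, key=lambda row: row[sort_by], **sort_kwargs)
    if attributes:
        return [dict(row) for row in rows]
    return [row[main_column] for row in rows]


# Made-up frequencies, for illustration only.
rows = [
    {"city": "Amsterdam", "frequency": 3},
    {"city": "Rotterdam", "frequency": 1},
    {"city": "Utrecht", "frequency": 2},
]
by_frequency = get_words(rows, "city", sort_by="frequency", reverse=True)
print(by_frequency)  # ['Amsterdam', 'Utrecht', 'Rotterdam']
```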
- reset()
Reset wordlist.