text_sensitivity.data.random

Generate random data for robustness and sensitivity testing.

Submodules:

text_sensitivity.data.random.entity module

Generation of random entities (e.g. names, telephone numbers) for given languages.

class text_sensitivity.data.random.entity.CityByPopulationMixin

Bases: Readable

add_likelihood_to_cities()

Add likelihood to cities, based on population.

static cities_by_population(cities, country_code)

Add population scores to each city in a country.

Parameters
  • cities (List[str]) – Current list of cities. If no replacement is found, this list is returned unchanged.

  • country_code (str) – Two-letter country code (e.g. ‘nl’).
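The idea of weighting cities by population can be sketched with the standard library alone. This is not the library's implementation: the helper name `population_likelihoods` and the population figures are hypothetical and serve only to illustrate how population counts might be turned into sampling likelihoods.

```python
import random


def population_likelihoods(populations):
    """Turn {city: population} into {city: likelihood}; likelihoods sum to 1."""
    total = sum(populations.values())
    return {city: count / total for city, count in populations.items()}


# Hypothetical, rounded population figures; not taken from the library's data.
populations = {'Amsterdam': 900_000, 'Rotterdam': 650_000, 'Utrecht': 350_000}
likelihoods = population_likelihoods(populations)

# Draw cities in proportion to their likelihood, with a fixed seed for reproducibility.
rng = random.Random(0)
sample = rng.choices(list(likelihoods), weights=list(likelihoods.values()), k=5)
```

With weights like these, larger cities appear more often in the generated data than smaller ones, which is presumably the effect that likelihood_based_on_city_population=True aims for.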

class text_sensitivity.data.random.entity.RandomAddress(languages=<Proxy at 0x7fb578cebb40 wrapping 'nl' at 0x7fb5a0dea5b0 with factory <function lazy.<locals>.<lambda>>>, likelihood_based_on_city_population=True, sep='\n', seed=0)

Bases: RandomEntity, CityByPopulationMixin

Generate random addresses in (a) given language(s).

Parameters
  • languages (Union[str, List[str]]) –

  • likelihood_based_on_city_population (bool) –

  • sep (str) –

  • seed (int) –

class text_sensitivity.data.random.entity.RandomCity(languages=<Proxy at 0x7fb578cebdc0 wrapping 'nl' at 0x7fb5a0dea5b0 with factory <function lazy.<locals>.<lambda>>>, likelihood_based_on_city_population=True, seed=0)

Bases: RandomEntity, CityByPopulationMixin

Generate random cities in (a) given language(s).

Parameters
  • languages (Union[str, List[str]]) –

  • likelihood_based_on_city_population (bool) –

  • seed (int) –

class text_sensitivity.data.random.entity.RandomCountry(languages=<Proxy at 0x7fb578cee040 wrapping 'nl' at 0x7fb5a0dea5b0 with factory <function lazy.<locals>.<lambda>>>, seed=0)

Bases: RandomEntity

Generate random countries for (a) given language(s).

Parameters
  • languages (Union[str, List[str]]) –

  • seed (int) –

class text_sensitivity.data.random.entity.RandomCryptoCurrency(seed=0)

Bases: RandomEntity

Generate random cryptocurrency names.

Parameters

seed (int) –

class text_sensitivity.data.random.entity.RandomCurrencySymbol(seed=0)

Bases: RandomEntity

Generate random currency symbols.

Parameters

seed (int) –

class text_sensitivity.data.random.entity.RandomDay(seed=0)

Bases: RandomEntity

Generate random day of the month.

Parameters

seed (int) –

class text_sensitivity.data.random.entity.RandomDayOfWeek(languages=<Proxy at 0x7fb578ceefc0 wrapping 'nl' at 0x7fb5a0dea5b0 with factory <function lazy.<locals>.<lambda>>>, seed=0)

Bases: RandomEntity

Generate random day of week in (a) given language(s).

Parameters
  • languages (Union[str, List[str]]) –

  • seed (int) –

class text_sensitivity.data.random.entity.RandomEmail(languages=<Proxy at 0x7fb578cee8c0 wrapping 'nl' at 0x7fb5a0dea5b0 with factory <function lazy.<locals>.<lambda>>>, seed=0)

Bases: RandomEntity

Generate random e-mail addresses for (a) given language(s).

Parameters
  • languages (Union[str, List[str]]) –

  • seed (int) –

class text_sensitivity.data.random.entity.RandomEntity(languages=<Proxy at 0x7fb5790b4c40 wrapping 'nl' at 0x7fb5a0dea5b0 with factory <function lazy.<locals>.<lambda>>>, providers=['person'], fn_name='name', attribute='fn', attribute_rename=None, sep='\n', seed=0)

Bases: Readable, SeedMixin, CaseMixin

Base class to generate entity data for (a) given language(s).

Example

Generate 10 random English names using the faker package:

>>> RandomEntity(languages='en', providers=['person'], fn_name='name').generate_list(n=10)

Parameters
  • languages (Union[str, List[str]], optional) – Languages to generate data from. Defaults to your current locale (see get_locale()).

  • providers (List[str], optional) – Providers from faker used in generation. Defaults to [‘person’].

  • fn_name (Union[str, List[str]], optional) – Function name(s) to call for each generator. Defaults to ‘name’.

  • attribute (str, optional) – Name of additional attribute (other than language). Defaults to ‘fn’.

  • attribute_rename (Optional[Callable[[str], str]], optional) – Rename function for attribute value. Defaults to None.

  • sep (str, optional) – Separator to replace the ‘\n’ character with. Defaults to ‘\n’.

  • seed (int, optional) – Seed for reproducibility. Defaults to 0.

generate(n, attributes=False)

Generate n instances of random data.

Parameters
  • n (int) – Number of instances to generate.

  • attributes (bool, optional) – Include attributes (language, which function was used, etc.) or not. Defaults to False.

Returns

Provider containing generated instances (if attributes = False). Tuple[TextInstanceProvider, Dict[str, MemoryLabelProvider]]: Provider and corresponding attribute labels (if attributes = True).

Return type

TextInstanceProvider

generate_list(n, attributes=False)

Generate n instances of random data and return as list.

Parameters
  • n (int) – Number of instances to generate.

  • attributes (bool, optional) – Include attributes (language, which function was used, etc.) or not. Defaults to False.

Returns

Generated instances (if attributes = False). Tuple[List[str], Dict[str, str]]: Generated instances and corresponding attributes (if attributes = True).

Return type

List[str]

class text_sensitivity.data.random.entity.RandomFirstName(languages=<Proxy at 0x7fb578cee480 wrapping 'nl' at 0x7fb5a0dea5b0 with factory <function lazy.<locals>.<lambda>>>, sex=['male', 'female'], seed=0)

Bases: RandomEntity

Generate random first names for (a) given language(s).

Parameters
  • languages (Union[str, List[str]]) –

  • sex (List[str]) –

  • seed (int) –

class text_sensitivity.data.random.entity.RandomLastName(languages=<Proxy at 0x7fb578cee6c0 wrapping 'nl' at 0x7fb5a0dea5b0 with factory <function lazy.<locals>.<lambda>>>, seed=0)

Bases: RandomEntity

Generate random last names for (a) given language(s).

Parameters
  • languages (Union[str, List[str]]) –

  • seed (int) –

class text_sensitivity.data.random.entity.RandomLicensePlate(seed=0)

Bases: RandomEntity

Generate random license plates for a given country.

Parameters

seed (int) –

class text_sensitivity.data.random.entity.RandomMonth(languages=<Proxy at 0x7fb578ceed40 wrapping 'nl' at 0x7fb5a0dea5b0 with factory <function lazy.<locals>.<lambda>>>, seed=0)

Bases: RandomEntity

Generate random month name in (a) given language(s).

Parameters
  • languages (Union[str, List[str]]) –

  • seed (int) –

class text_sensitivity.data.random.entity.RandomName(languages=<Proxy at 0x7fb578cee240 wrapping 'nl' at 0x7fb5a0dea5b0 with factory <function lazy.<locals>.<lambda>>>, sex=['male', 'female'], seed=0)

Bases: RandomEntity

Generate random full names for (a) given language(s).

Parameters
  • languages (Union[str, List[str]]) –

  • sex (List[str]) –

  • seed (int) –

class text_sensitivity.data.random.entity.RandomPhoneNumber(languages=<Proxy at 0x7fb578ceeac0 wrapping 'nl' at 0x7fb5a0dea5b0 with factory <function lazy.<locals>.<lambda>>>, seed=0)

Bases: RandomEntity

Generate random phone numbers for (a) given language(s) / country.

Parameters
  • languages (Union[str, List[str]]) –

  • seed (int) –

class text_sensitivity.data.random.entity.RandomPriceTag(languages=<Proxy at 0x7fb578cf3200 wrapping 'nl' at 0x7fb5a0dea5b0 with factory <function lazy.<locals>.<lambda>>>, seed=0)

Bases: RandomEntity

Generate random price tags in the currency of (a) given language(s).

Parameters
  • languages (Union[str, List[str]]) –

  • seed (int) –

class text_sensitivity.data.random.entity.RandomYear(seed=0)

Bases: RandomEntity

Generate random year.

Parameters

seed (int) –

text_sensitivity.data.random.string module

Generate random strings from characters/strings.

class text_sensitivity.data.random.string.RandomAscii(seed=0)

Bases: RandomString

Generate random ASCII characters.

Parameters

seed (int) –

class text_sensitivity.data.random.string.RandomCyrillic(languages='ru', upper=True, lower=True, seed=0)

Bases: RandomString

Generate strings containing random Cyrillic characters.

Can generate text in Bulgarian (‘bg’), Macedonian (‘mk’), Russian (‘ru’), Serbian (‘sr’), Ukrainian (‘uk’), and all combinations thereof.

Parameters
  • languages (Union[List[str], str], optional) – Cyrillic languages to select. Defaults to ‘ru’.

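  • upper (bool, optional) – Whether to include uppercase characters. Defaults to True.

  • lower (bool, optional) – Whether to include lowercase characters. Defaults to True.

  • seed (int, optional) – Seed for reproducibility. Defaults to 0.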

Raises
  • ValueError – Either upper or lower should be True.

  • ValueError – One of the selected languages is unknown.
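The upper/lower flags and the first ValueError above can be illustrated with a small standalone sketch. It covers the Russian alphabet only, and the names `RUSSIAN_LOWER` and `cyrillic_options` are hypothetical; how the library itself stores and selects alphabets is not shown in this reference.

```python
import random

# Russian alphabet only; the other documented languages would need their own alphabets.
RUSSIAN_LOWER = 'абвгдеёжзийклмнопрстуфхцчшщъыьэюя'


def cyrillic_options(upper=True, lower=True):
    """Return the characters to sample from, following the documented upper/lower flags."""
    if not (upper or lower):
        raise ValueError('Either upper or lower should be True.')
    chars = ''
    if upper:
        chars += RUSSIAN_LOWER.upper()
    if lower:
        chars += RUSSIAN_LOWER
    return chars


# Sample an 8-character lowercase string with a fixed seed.
rng = random.Random(0)
options = cyrillic_options(upper=False, lower=True)
word = ''.join(rng.choice(options) for _ in range(8))
```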

class text_sensitivity.data.random.string.RandomDigits(seed=0)

Bases: RandomString

Generate strings containing random digits.

Parameters

seed (int) –

class text_sensitivity.data.random.string.RandomEmojis(seed=0, base=True, dingbats=True, flags=True, components=True)

Bases: RandomString

Generate strings containing a subset of random unicode emojis.

Parameters
  • seed (int, optional) – Seed for reproducibility. Defaults to 0.

  • base (bool, optional) – Include base emojis (e.g. smiley face). Defaults to True.

  • dingbats (bool, optional) – Include dingbat emojis. Defaults to True.

  • flags (bool, optional) – Include flag emojis. Defaults to True.

  • components (bool, optional) – Include emoji components (e.g. skin color modifier or country flags). Defaults to True.

Raises

ValueError – At least one of base, dingbats, flags should be True.

class text_sensitivity.data.random.string.RandomLower(seed=0)

Bases: RandomString

Generate random ASCII lowercase characters.

Parameters

seed (int) –

class text_sensitivity.data.random.string.RandomPunctuation(seed=0)

Bases: RandomString

Generate strings containing random punctuation characters.

Parameters

seed (int) –

class text_sensitivity.data.random.string.RandomSpaces(seed=0)

Bases: RandomString

Generate strings with a random number of spaces.

Parameters

seed (int) –

class text_sensitivity.data.random.string.RandomString(seed=0, options='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c')

Bases: Readable, SeedMixin

Base class for random data (string) generation.

Parameters
  • seed (int, optional) – Seed for reproducibility. Defaults to 0.

  • options (Union[str, List[str]], optional) – Characters or strings to generate data from. Defaults to string.printable.

generate(n, min_length=0, max_length=100)

Generate n instances of random strings.

Example

Create a TextInstanceProvider containing n=10 strings of random characters from ‘12345xXyY!?’ with lengths between 3 and 10:

>>> RandomString(seed=0, options='12345xXyY!?').generate(n=10, min_length=3, max_length=10)

Parameters
  • n (int) – Number of instances to generate.

  • min_length (int, optional) – Minimum length of random instance. Defaults to 0.

  • max_length (int, optional) – Maximum length of random instance. Defaults to 100.

Raises

ValueError – min_length should be smaller than max_length.

Returns

Provider containing generated instances.

Return type

TextInstanceProvider

generate_list(n, min_length=0, max_length=100)

Generate n instances of random strings and return as list.

Example

Generate a list of n=10 random strings of characters from u’ABCabc\U0001F600’ with lengths between 10 and 50:

>>> RandomString(seed=0, options=u'ABCabc\U0001F600').generate_list(n=10, min_length=10, max_length=50)

Parameters
  • n (int) – Number of instances to generate.

  • min_length (int, optional) – Minimum length of random instance. Defaults to 0.

  • max_length (int, optional) – Maximum length of random instance. Defaults to 100.

Raises

ValueError – min_length should be smaller than max_length.

Returns

List containing generated instances.

Return type

List[str]
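The documented behaviour (n strings, lengths bounded by min_length and max_length, reproducible through seed) can be approximated with the standard library. This is a minimal sketch, not the library's code: the function `random_strings` is hypothetical, and treating both length bounds as inclusive and allowing equal bounds are assumptions.

```python
import random


def random_strings(options, n, min_length=0, max_length=100, seed=0):
    """Return n strings drawn from options, with lengths in [min_length, max_length]."""
    if min_length > max_length:
        raise ValueError('min_length should be smaller than max_length.')
    rng = random.Random(seed)  # Local generator, so the global random state is untouched.
    result = []
    for _ in range(n):
        length = rng.randint(min_length, max_length)  # Both bounds inclusive (assumption).
        result.append(''.join(rng.choice(options) for _ in range(length)))
    return result


strings = random_strings('12345xXyY!?', n=10, min_length=3, max_length=10)
```

Calling the function again with the same seed returns the same list, which is the property that makes seeded generators useful for repeatable robustness tests.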

class text_sensitivity.data.random.string.RandomUpper(seed=0)

Bases: RandomString

Generate random ASCII uppercase characters.

Parameters

seed (int) –

class text_sensitivity.data.random.string.RandomWhitespace(seed=0)

Bases: RandomString

Generate strings with a random number of whitespace characters.

Parameters

seed (int) –

text_sensitivity.data.random.string.combine_generators(*generators, seed=None)

Combine multiple random string generators into one.

Parameters
  • *generators – Generators to combine.

  • seed (Optional[int]) – Seed value for new generator. If None picks a random seed from the generators. Defaults to None.

Example

Make a generator that generates random punctuation, emojis and ASCII characters:

>>> new_generator = combine_generators(RandomPunctuation(), RandomEmojis(), RandomAscii())
Returns

Generator with all generator options combined.

Return type

RandomString
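One way to picture combining generators is to merge their character options and pick a seed. The sketch below uses only the standard library; the helpers `combine_options` and `pick_seed` are hypothetical, and whether the library removes duplicate characters when merging is an assumption.

```python
import random
import string


def combine_options(*option_sets):
    """Merge several character sets into one string, dropping duplicates but keeping order."""
    return ''.join(dict.fromkeys(''.join(option_sets)))


def pick_seed(seeds, seed=None):
    """Use the given seed, or fall back to one of the generators' seeds when it is None."""
    return seed if seed is not None else random.choice(seeds)


# '!?abc' overlaps with the other two sets, so it adds no new characters.
combined = combine_options(string.punctuation, string.ascii_lowercase, '!?abc')
seed = pick_seed([0, 1, 2])
```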
