Creating Mock Data¶
If you have a data model but no data, it is useful to be able to generate some data for the model. There are several use cases for this:
- In the design phase, you want to quickly try out your current data model iteration with some data.
- You need data for testing the data model.
- You want to load test the data model.
pygen comes with a MockGenerator. Its API is described in the reference documentation; this page is a practical guide to using the module.
Generate Default Mock Data¶
from cognite.pygen.utils import MockGenerator, load_cognite_client_from_toml
client = load_cognite_client_from_toml()
In this example, we will use the Windmill data model.
Let's instantiate the MockGenerator. We do this by passing the data model along with the instance space we want to use for the generated data. In addition, we need an instantiated CogniteClient to fetch the data model.
generator = MockGenerator.from_data_model(
("power-models", "Windmill", "1"), instance_space="sp_sandbox", client=client, seed=42
)
generator
The MockGenerator has one method, generate_mock_data, which generates the mock data using the default settings.
mock_data = generator.generate_mock_data()
Inspect Generated Mock Data¶
mock_data
| | resource | count |
|---|---|---|
| 0 | node | 55 |
| 1 | edge | 28 |
| 2 | timeseries | 136 |
| 3 | sequence | 0 |
| 4 | file | 0 |
mock_data.nodes.to_pandas().head()
| | instance_type | space | external_id | sources |
|---|---|---|---|---|
| 0 | node | sp_sandbox | blade_92349 | [{'properties': {'name': 'KriXref', 'is_damage... |
| 1 | node | sp_sandbox | blade_9116 | [{'properties': {'name': 'evblAbk', 'is_damage... |
| 2 | node | sp_sandbox | blade_6006 | [{'properties': {'name': 'HbolMJU', 'is_damage... |
| 3 | node | sp_sandbox | blade_86673 | [{'properties': {'name': None, 'is_damaged': T... |
| 4 | node | sp_sandbox | blade_29871 | [{'properties': {'name': 'HClEQaP', 'is_damage... |
The mock data has a few convenience methods that make it easy to work with. We can deploy it, clean it up again, and dump it as YAML.
mock_data.deploy(client)
Created 55 nodes and 28 edges
Created/Updated 136 timeseries
mock_data.clean(client, delete_space=True)
Deleted 55 nodes and 28 edges
Deleted 136 timeseries
Deleted space sp_sandbox
Amount of Mock Data¶
You can also control how the data is generated by providing one or more configs.
The easiest option is to provide a default config that is used for all views, but you can also provide one config per view.
from cognite.pygen.utils.mock_generator import DataType, GeneratorFunction, IDGeneratorFunction, ViewMockConfig
If we want to generate more nodes and edges, we can override the default settings when calling generate_mock_data.
views = (
client.data_modeling.data_models.retrieve(("power-models", "Windmill", "1"), inline_views=True)
.latest_version()
.views
)
new_generator = MockGenerator(views, instance_space="sp_sandbox")
more_data = new_generator.generate_mock_data(node_count=100, max_edge_per_type=3, null_values=0.1)
more_data
| | resource | count |
|---|---|---|
| 0 | node | 1100 |
| 1 | edge | 466 |
| 2 | timeseries | 3060 |
| 3 | sequence | 0 |
| 4 | file | 0 |
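The node totals in the two summaries are consistent with every view receiving `node_count` nodes. The following back-of-the-envelope check is only a sketch: the number of views (11) and the default `node_count` (5) are assumptions inferred from the counts shown on this page, not values read from the data model or from pygen.

```python
# Assumption: inferred from the summaries (55 / 5 and 1100 / 100), not read from the model.
view_count = 11

default_node_count = 5   # assumed default, consistent with 55 nodes in the first run
custom_node_count = 100  # the value passed to generate_mock_data above

default_total = view_count * default_node_count
custom_total = view_count * custom_node_count

print(default_total)  # 55
print(custom_total)   # 1100
```

The edge and timeseries counts depend on the random draws and on the model itself, so they do not follow such a simple rule.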
Customized Random Generation¶
We can also control how the random data is generated.
We have two interfaces:
- Generation of mock data
- Generation of node IDs
DataType
typing.Union[int, float, bool, str, dict, NoneType]
print(GeneratorFunction.__doc__)
Interface for a function that generates mock data.

Examples:

    >>> def my_data_generator(count: int) -> list[str]:
    ...     return [
    ...         "".join(random.choices(string.ascii_lowercase + string.ascii_uppercase, k=7))
    ...         for _ in range(count)
    ...     ]
    >>> my_data_generator(5)
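To make the interface concrete, here is a minimal sketch of a function with the same `count -> list` shape as the docstring example. The factory `make_windfarm_generator` and the list of names are hypothetical and exist only for illustration. The function keeps its own seeded random number generator, so its output is reproducible regardless of the seed given to the MockGenerator.

```python
import random


def make_windfarm_generator(seed: int):
    """Return a function with the `count -> list` shape of a data generator."""
    rng = random.Random(seed)  # private RNG, the global random state is untouched
    names = ["North Ridge", "Harbor Bank", "Silver Fjord", "East Shoal"]  # made-up values

    def generate(count: int) -> list[str]:
        # One randomly chosen name per requested value
        return [rng.choice(names) for _ in range(count)]

    return generate


windfarm_generator = make_windfarm_generator(seed=42)
sample = windfarm_generator(5)
print(sample)
```

A function like `windfarm_generator` could presumably be used wherever this page passes a lambda that takes `count`, for example as a value in the `properties` mapping of a ViewMockConfig.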
print(IDGeneratorFunction.__doc__)
Interface for a function that generates mock data.

Examples:

    >>> def my_id_generator(view_id: dm.ViewId, count: int) -> list[str]:
    ...     return [f"{view_id.external_id.casefold()}_{no}" for no in range(count)]
    >>> my_id_generator(dm.ViewId("my_space", "MyView", "v1"), 5)
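As a sketch of the ID interface, the function below produces zero-padded sequential IDs instead of random ones. To keep the example free of dependencies, it uses a small stand-in `ViewId` dataclass rather than `dm.ViewId` from cognite-sdk. Both the stand-in and the name `sequential_id_generator` are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewId:
    """Stand-in for dm.ViewId (assumption, so the sketch runs without cognite-sdk)."""

    space: str
    external_id: str
    version: str


def sequential_id_generator(view_id: ViewId, count: int) -> list[str]:
    # Zero-padded sequential IDs such as "blade_0001" sort naturally
    prefix = view_id.external_id.casefold()
    return [f"{prefix}_{no:04d}" for no in range(1, count + 1)]


ids = sequential_id_generator(ViewId("power-models", "Blade", "1"), 3)
print(ids)  # ['blade_0001', 'blade_0002', 'blade_0003']
```

Check the reference documentation for the config field that accepts an ID generator before wiring a function like this into a ViewMockConfig.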
In the data model, there is a Blade view.
blade = next(v for v in views if v.external_id == "Blade")
blade.dump()["properties"].keys()
dict_keys(['name', 'is_damaged', 'sensor_positions'])
We see that this view has a property is_damaged, and we want to replace the default random generation for this property. We set it so that the blade is never damaged:
blade_config = ViewMockConfig(properties={"is_damaged": lambda count: [False] * count})
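A constant value is the simplest override. As a variation, the sketch below marks roughly one blade in ten as damaged. The function name `rarely_damaged` and the `damage_rate` parameter are hypothetical, and the sketch assumes that pygen calls the generator with `count` as its only argument, as it does with the lambda above.

```python
import random

rng = random.Random(42)  # own seed, independent of the MockGenerator seed


def rarely_damaged(count: int, damage_rate: float = 0.1) -> list[bool]:
    """Return `count` booleans of which roughly `damage_rate` are True."""
    return [rng.random() < damage_rate for _ in range(count)]


flags = rarely_damaged(1000)
print(sum(flags), "of", len(flags), "blades are damaged")
```

Under that assumption, it would be passed in the same way as the lambda: `ViewMockConfig(properties={"is_damaged": rarely_damaged})`.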
In addition, we want all properties of type Text to be a random name.
from faker import Faker
from cognite.client import data_modeling as dm
# Note that since we are using an external source for the randomness, we have to set the seed ourselves
Faker.seed(42)
faker = Faker()
default_config = ViewMockConfig(
# Note that this setting will not apply to the Blade view because we are passing a custom config to it
property_types={dm.Text: lambda count: [faker.unique.name() for _ in range(count)]}
)
custom_generator = MockGenerator(
views, "sp_sandbox", view_configs={blade.as_id(): blade_config}, default_config=default_config, seed=7
)
customized_mock_data = custom_generator.generate_mock_data()
customized_mock_data
| | resource | count |
|---|---|---|
| 0 | node | 55 |
| 1 | edge | 30 |
| 2 | timeseries | 136 |
| 3 | sequence | 0 |
| 4 | file | 0 |
blade_data = next(view_data for view_data in customized_mock_data if view_data.view_id == blade.as_id())
blade_data.node.dump()
[{'instanceType': 'node', 'space': 'sp_sandbox', 'externalId': 'blade_73972', 'sources': [{'properties': {'name': None, 'is_damaged': False}, 'source': {'space': 'power-models', 'externalId': 'Blade', 'version': '1', 'type': 'view'}}]}, {'instanceType': 'node', 'space': 'sp_sandbox', 'externalId': 'blade_7812', 'sources': [{'properties': {'name': 'cSphgqQ', 'is_damaged': False}, 'source': {'space': 'power-models', 'externalId': 'Blade', 'version': '1', 'type': 'view'}}]}, {'instanceType': 'node', 'space': 'sp_sandbox', 'externalId': 'blade_81134', 'sources': [{'properties': {'name': 'glGXEuY', 'is_damaged': None}, 'source': {'space': 'power-models', 'externalId': 'Blade', 'version': '1', 'type': 'view'}}]}, {'instanceType': 'node', 'space': 'sp_sandbox', 'externalId': 'blade_26995', 'sources': [{'properties': {'name': 'qhHdBtd', 'is_damaged': False}, 'source': {'space': 'power-models', 'externalId': 'Blade', 'version': '1', 'type': 'view'}}]}, {'instanceType': 'node', 'space': 'sp_sandbox', 'externalId': 'blade_65066', 'sources': [{'properties': {'name': 'AbwdewQ', 'is_damaged': False}, 'source': {'space': 'power-models', 'externalId': 'Blade', 'version': '1', 'type': 'view'}}]}]
We see that is_damaged is set to False whenever it is not null; null values can still occur, as for blade_81134. Note also that the new default function for text is not applied here, as the Blade view has its own config in which we did not override the Text generator.
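The behavior seen in the Blade output can be pictured as a simple lookup: a view with its own config uses that config alone, and only views without one fall back to the default. The sketch below uses plain dictionaries as stand-ins for ViewMockConfig objects. It mirrors the observed behavior and is not pygen's actual implementation.

```python
# Plain dictionaries stand in for ViewMockConfig objects (illustration only).
default_config = {"Text": "random names"}
view_configs = {"Blade": {"is_damaged": "always False"}}


def config_for(view_name: str) -> dict:
    # A view-specific config replaces the default entirely; the two are not merged
    return view_configs.get(view_name, default_config)


print(config_for("Blade"))     # {'is_damaged': 'always False'}
print(config_for("Windmill"))  # {'Text': 'random names'}
```

To get random names for the Blade view as well, the Text generator would presumably have to be repeated in the Blade config itself.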
windmill = next(v for v in views if v.external_id == "Windmill")
windmill_data = next(view_data for view_data in customized_mock_data if view_data.view_id == windmill.as_id())
windmill_data.node.dump()
[{'instanceType': 'node', 'space': 'sp_sandbox', 'externalId': 'windmill_34438', 'sources': [{'properties': {'name': 'Matthew Foster', 'windfarm': 'Zachary Hicks', 'capacity': -6475.645430192593, 'rotor': {'space': 'sp_sandbox', 'externalId': 'rotor_34702'}, 'nacelle': {'space': 'sp_sandbox', 'externalId': 'nacelle_94781'}}, 'source': {'space': 'power-models', 'externalId': 'Windmill', 'version': '1', 'type': 'view'}}]}, {'instanceType': 'node', 'space': 'sp_sandbox', 'externalId': 'windmill_36953', 'sources': [{'properties': {'name': None, 'windfarm': 'Anthony Rodriguez', 'capacity': -5360.862663609285}, 'source': {'space': 'power-models', 'externalId': 'Windmill', 'version': '1', 'type': 'view'}}]}, {'instanceType': 'node', 'space': 'sp_sandbox', 'externalId': 'windmill_536', 'sources': [{'properties': {'name': 'Stephanie Ross', 'windfarm': None, 'capacity': None, 'rotor': {'space': 'sp_sandbox', 'externalId': 'rotor_98261'}}, 'source': {'space': 'power-models', 'externalId': 'Windmill', 'version': '1', 'type': 'view'}}]}, {'instanceType': 'node', 'space': 'sp_sandbox', 'externalId': 'windmill_19094', 'sources': [{'properties': {'name': 'Judy Baker', 'windfarm': 'Rebecca Henderson', 'capacity': 9154.624079279823, 'nacelle': {'space': 'sp_sandbox', 'externalId': 'nacelle_45125'}}, 'source': {'space': 'power-models', 'externalId': 'Windmill', 'version': '1', 'type': 'view'}}]}, {'instanceType': 'node', 'space': 'sp_sandbox', 'externalId': 'windmill_54912', 'sources': [{'properties': {'name': 'Justin Baker', 'windfarm': 'James Ferrell', 'capacity': -6981.581884177821, 'rotor': {'space': 'sp_sandbox', 'externalId': 'rotor_44909'}, 'nacelle': {'space': 'sp_sandbox', 'externalId': 'nacelle_90770'}}, 'source': {'space': 'power-models', 'externalId': 'Windmill', 'version': '1', 'type': 'view'}}]}]
For the Windmill view, we see that the text properties name and windfarm have been set by our custom Text generator.