Data Migration¶
Prerequisites
- Access to a CDF Project.
 - Know how to use a terminal, so you can run 
pygenfrom the command line to generate the SDK. - Knowledge of your the data and data models.
 
Introduction to Problem¶
In the development of a solution, it is common that you move from one data model to another. Typically, the physicaly storage of data, the containers, stays the same, or are extended, and only the views of a data model change. In this case, there is no need for a data migration and all you do is update the solution code for the new model.
If you, however, need to change how the data is stored in the containers, you need to move the data from one model to the other. A common to do this is to use CDF Transformations. An alternative
ways is to use pygen for data migration.
Advantage of CDF Transformations¶
- Can handle large volume of data, millions of instances.
 - You can write the transformations in the UI using SQL.
 
Advantages of using pygen¶
- Edges are automatically created.
 - Can handle medium volume of data, hundreds of thousands of instances.
 - Support doing very custom migration (what ever you can do in Python code).
 
Why would you chose pygen over CDF Transformations? If you are more comfortable with Python than writing SQL, you have data models with a of one-to-many edges, the number of nodes and edges (instances) is in the order of hundreds of thousands.
Use Case¶
As we see in the illustrations above (click to enlarge), we have many more edges in the destination than the source model,
whiche means this is a good use case for using pygen for the data migration.
In this guide, we will not do the entire migration, but instead focus on the APM_Template to Template migration.
Generating SDKs¶
For demo purposes we will generate the SDKs in this notebook, however, depending on your use case it might be useful to use the pygen CLI to generate the SDKs.
Migration scrips are often a one of, meaning you are not expected to regenerat the SDK as the model changes, thus if you generate it locally, you can do changes to the generated code as you are not expecting this to change later. This can be useful if you want to do some customization to the generated code.
Note that we have set up a config.toml with credentials to connect to CDF.
from cognite.pygen import generate_sdk_notebook, load_cognite_client_from_toml
client = load_cognite_client_from_toml("config.toml")
source = generate_sdk_notebook(("APM_AppData_4", "APM_AppData_4", "7"), client)
Successfully retrieved data model(s) ('APM_AppData_4', 'APM_AppData_4', '7')
Writing SDK to C:\Users\ANDERS~1\AppData\Local\Temp\pygen
Done!
Added C:\Users\ANDERS~1\AppData\Local\Temp\pygen to sys.path to enable import
Imported apm_app_data_4.client
destination = generate_sdk_notebook(("IntegrationTestsImmutable", "ApmAppData", "v3"), client)
Successfully retrieved data model(s) ('IntegrationTestsImmutable', 'ApmAppData', 'v3')
Writing SDK to C:\Users\ANDERS~1\AppData\Local\Temp\pygen
Done!
C:\Users\ANDERS~1\AppData\Local\Temp\pygen already in sys.path
Imported apm_app_data.client
Retrieving Data¶
It is common to store the nodes and edges, i.e., instances, in a differen space than the data model. To only get instances from an instance space, sourceSpace,
we filter for it. In addition, it important that we retrieve all edges so we can connect the templates with the templates_items.
templates = source.apm_template.list(space="sourceSpace", retrieve_edges=True, limit=-1)
template_items = source.apm_template_item.list(space="sourceSpace", limit=-1)
len(templates), len(template_items)
(50, 105)
Converting the Data¶
We do the conversion in in the following steps
- We show some of the data of the source templates. This is usefult to have as a reference when writing the transformation code
 - We put the 
APMTemplateItems into a dictionary,template_items_by_external_idso we can easily reference it. - We include a few other dictionaries, 
visiblity_by_item_external_id,schedules_by_item_external_id,measurements_by_external_id, andvisiblity_by_external_id. This is to show how we can include data from different sources. - We iterate through all the 
APMTemplatesand do the transformation for eachtemplate_item(s)s andtemplate 
# Importing the new destination data classes
from apm_app_data.data_classes import TemplateItemWrite, TemplateWrite, TemplateWriteList
templates.to_pandas().head(5)
| external_id | assigned_to | created_by_external_id | is_archived | root_location_external_id | status | title | updated_by_external_id | template_items | space | version | last_updated_time | created_time | deleted_time | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 108c5106-f638-4d9c-8ef2-4b470ed94495 | None | 80d32fb4-bcbc-44ae-abed-fee53b148d82 | False | 31d9f834-a1ab-452a-b2a5-875f0e9f384a | OnHold | good | f1e7d13d-3fd2-42e4-bebc-68ad2aa905eb | [bef8179e-9e0c-44a8-8577-cf6e5ce0bd8f, 479ef47... | sourceSpace | 1 | 2023-11-05 10:53:03.351000+00:00 | 2023-11-05 10:53:03.351000+00:00 | None | 
| 1 | 32f92434-e5fb-49c1-b4c4-1a454ed699ea | None | 6dbb225f-5059-4ff2-9543-7ab400e235e9 | False | dac4077b-61d6-445c-a4f4-f6f3247125ab | OnHold | three | df561fd7-dad6-484e-95f7-ae967fb00ed3 | [f6ce45c7-b5a6-4a3b-b549-91e4a05580a7, bbbdda6... | sourceSpace | 1 | 2023-11-05 10:53:03.351000+00:00 | 2023-11-05 10:53:03.351000+00:00 | None | 
| 2 | 6b072855-8408-4c21-b1a3-5caf4bd22204 | None | 863a0cd1-f55e-4529-ac50-34090596a279 | True | 1b582c43-bc07-4ab7-959e-cf7e434711a0 | Completed | also | 37d17cd1-2bbe-416a-b7b1-111e44c64b36 | [8b3670e5-a6e4-4636-8066-ee54c0a52582, 29989d8... | sourceSpace | 1 | 2023-11-05 10:53:03.351000+00:00 | 2023-11-05 10:53:03.351000+00:00 | None | 
| 3 | 95c51139-438b-4fd8-a5ce-b80dbcd77db4 | None | ac5c7d3f-564b-44d2-bdc6-6eaf23b075ce | True | 1b582c43-bc07-4ab7-959e-cf7e434711a0 | Completed | likely | caba29e9-fa98-438d-b5b7-6d43677764d3 | [cc2adb6c-34e8-4258-8634-3213c789854c, ffede5f... | sourceSpace | 1 | 2023-11-05 10:53:03.351000+00:00 | 2023-11-05 10:53:03.351000+00:00 | None | 
| 4 | 9544b2c9-9abe-4a9b-b106-3db9510363c2 | None | b3fd00cb-8b06-4365-aac5-b0fcf6f8044c | True | 41ad2ff1-88b6-4c45-9bfd-a76d0c0ba9c6 | Cancelled | thing | ac5c7d3f-564b-44d2-bdc6-6eaf23b075ce | [502250e9-237b-4c87-85a3-ee100601bdba, 1881a55... | sourceSpace | 1 | 2023-11-05 10:53:03.351000+00:00 | 2023-11-05 10:53:03.351000+00:00 | None | 
template_items.to_pandas().sample(5)
| external_id | asset_external_id | created_by_external_id | labels | order | template_external_id | title | updated_by_external_id | space | version | last_updated_time | created_time | deleted_time | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 68 | 79bd1733-afa5-4b6c-9d2f-e8458f042e7e | d810d514-7fbf-4e2d-a8f8-cfc4770b4539 | 09e50d8e-9fe8-4340-916a-349e722eaf73 | [#f495e3, #7fe0a9, #dd6388] | 9134 | 78bbec25-03a3-4f4d-bfb5-7e047717d27c | car | None | sourceSpace | 1 | 2023-11-05 10:53:03.351000+00:00 | 2023-11-05 10:53:03.351000+00:00 | None | 
| 83 | 156fc529-b365-4341-8d7b-c87cb155b50c | 72786d8a-bbb4-4e83-8cf0-bca2fc3ed796 | 8a717d87-09c0-4b1c-abdb-721e1ff5dfa5 | [] | 4868 | bb479412-b9bf-44c6-a067-58df6ba233c3 | decision | None | sourceSpace | 1 | 2023-11-05 10:53:03.351000+00:00 | 2023-11-05 10:53:03.351000+00:00 | None | 
| 79 | 11ec48cc-b2d2-43dd-9f43-60f3f8b3b1b4 | d810d514-7fbf-4e2d-a8f8-cfc4770b4539 | bb5f75ed-e6ce-4269-ad5c-3ef4da6598b0 | [#dd6388] | 5920 | acd07034-830d-420f-b163-080145aa7844 | hard | None | sourceSpace | 1 | 2023-11-05 10:53:03.351000+00:00 | 2023-11-05 10:53:03.351000+00:00 | None | 
| 0 | bef8179e-9e0c-44a8-8577-cf6e5ce0bd8f | 4dc95bc8-49e8-4ee8-a5f5-c7aa47f139de | 3caa43df-2fc3-4bba-b267-6e57c2ab826a | [#f495e3, #8516fc] | 4071 | 108c5106-f638-4d9c-8ef2-4b470ed94495 | half | None | sourceSpace | 1 | 2023-11-05 10:53:03.351000+00:00 | 2023-11-05 10:53:03.351000+00:00 | None | 
| 2 | a92cfb02-09cd-4b47-b870-a7aded8aed07 | 72786d8a-bbb4-4e83-8cf0-bca2fc3ed796 | 2b0f4e82-f515-42ee-9545-03e08799c7c7 | [#7fe0a9, #dd6388] | 5920 | 108c5106-f638-4d9c-8ef2-4b470ed94495 | behind | None | sourceSpace | 1 | 2023-11-05 10:53:03.351000+00:00 | 2023-11-05 10:53:03.351000+00:00 | None | 
template_item_by_external_id = {item.external_id: item for item in template_items}
# The visiblity, schedules, and measurements are changes to the new template,
# and thus the source of these needs to come from elsewhere. The point of including
# these data structers is to show how
visiblity_by_item_external_id: dict[str, str] = {}
schedules_by_item_external_id: dict[str, list[str]] = {}
measurements_by_external_id: dict[str, list[str]] = {}
visiblity_by_external_id: dict[str, str] = {}
destination_templates = TemplateWriteList([])
for template in templates:
    destination_items = []
    for item_external_id in template.template_items or []:
        item = template_item_by_external_id[item_external_id]
        destination_item = TemplateItemWrite(
            external_id=item.external_id,
            space="destinationSpace",
            title=item.title,
            labels=item.labels,
            visibility=visiblity_by_item_external_id.get(item.external_id),
            created_by=item.created_by_external_id,
            updated_by=item.updated_by_external_id,
            is_archived=template.is_archived,
            order=item.order,
            asset=item.asset_external_id,
            schedules=schedules_by_item_external_id.get(item.external_id),
            measurements=measurements_by_external_id.get(item.external_id),
        )
        destination_items.append(destination_item)
    destination_template = TemplateWrite(
        external_id=template.external_id,
        space="destinationSpace",
        title=template.title,
        visibility=visiblity_by_external_id.get(template.external_id),
        created_by=template.created_by_external_id,
        updated_by=template.updated_by_external_id,
        is_archived=template.is_archived,
        status=template.status,
        labels=list({label for item in destination_items for label in item.labels}),
        root_location=template.root_location_external_id,
        assigned_to=template.assigned_to,
        template_items=destination_items,
    )
    destination_templates.append(destination_template)
How pygen Creates Edges¶
pygen create edges by looking at the relationships in the generated data classes.
For example, in the current example we have the TemplateItem node which have the fields schedules and measurements. When writing to CDF pygen iterates through all the TemplateItems and check these fields, and if there is a string it is assumed to be the external_id of the end node, and pygen creates an edge to it. If it is an object that object is assumed to be the end node and pygen creates an edge to it.
We can inspect what pygen is doing by calling the .to_instances_write() method and check the results.
destination_templates[0]
| value | |
|---|---|
| space | destinationSpace | 
| external_id | 108c5106-f638-4d9c-8ef2-4b470ed94495 | 
| existing_version | None | 
| title | good | 
| description | None | 
| labels | [#8516fc, #7fe0a9, #dd6388, #f4d89f, #f495e3] | 
| visibility | None | 
| created_by | 80d32fb4-bcbc-44ae-abed-fee53b148d82 | 
| updated_by | f1e7d13d-3fd2-42e4-bebc-68ad2aa905eb | 
| is_archived | False | 
| status | OnHold | 
| root_location | 31d9f834-a1ab-452a-b2a5-875f0e9f384a | 
| assigned_to | None | 
| template_items | [{'space': 'destinationSpace', 'external_id': ... | 
len(destination_templates[0].template_items)
4
instances = destination_templates[0].to_instances_write()
instances.nodes
| instance_type | space | external_id | sources | |
|---|---|---|---|---|
| 0 | node | destinationSpace | 108c5106-f638-4d9c-8ef2-4b470ed94495 | [{'properties': {'title': 'good', 'labels': ['... | 
| 1 | node | destinationSpace | bef8179e-9e0c-44a8-8577-cf6e5ce0bd8f | [{'properties': {'title': 'half', 'labels': ['... | 
| 2 | node | destinationSpace | 479ef472-a2c1-4f03-b38d-7e29e4f90227 | [{'properties': {'title': 'agreement', 'labels... | 
| 3 | node | destinationSpace | a92cfb02-09cd-4b47-b870-a7aded8aed07 | [{'properties': {'title': 'behind', 'labels': ... | 
| 4 | node | destinationSpace | 54e8a840-624d-41cb-91a3-1b978e292565 | [{'properties': {'title': 'wall', 'labels': ['... | 
instances.edges
| instance_type | space | external_id | type | start_node | end_node | |
|---|---|---|---|---|---|---|
| 0 | edge | destinationSpace | 108c5106-f638-4d9c-8ef2-4b470ed94495:bef8179e-... | {'space': 'IntegrationTestsImmutable', 'extern... | {'space': 'destinationSpace', 'external_id': '... | {'space': 'destinationSpace', 'external_id': '... | 
| 1 | edge | destinationSpace | 108c5106-f638-4d9c-8ef2-4b470ed94495:479ef472-... | {'space': 'IntegrationTestsImmutable', 'extern... | {'space': 'destinationSpace', 'external_id': '... | {'space': 'destinationSpace', 'external_id': '... | 
| 2 | edge | destinationSpace | 108c5106-f638-4d9c-8ef2-4b470ed94495:a92cfb02-... | {'space': 'IntegrationTestsImmutable', 'extern... | {'space': 'destinationSpace', 'external_id': '... | {'space': 'destinationSpace', 'external_id': '... | 
| 3 | edge | destinationSpace | 108c5106-f638-4d9c-8ef2-4b470ed94495:54e8a840-... | {'space': 'IntegrationTestsImmutable', 'extern... | {'space': 'destinationSpace', 'external_id': '... | {'space': 'destinationSpace', 'external_id': '... | 
We can see here that for one Template item with 4 TemplateItems pygen creates one node for each =5, and an edge connecting each of the TemplateItem to Template =4.
all_instances = destination_templates.to_instances_write()
By doing the .to_instances_write call on the all the destination Templates we can see how
many nodes and edges we are expecting to create.
len(all_instances.nodes), len(all_instances.edges)
(155, 105)
Note that for the use case we are using in this example one of the changes to the new model is that a lot of the relationships are made explicit. For example, in the old template we have root_location_external_id, while now we have direct link root_location. Due to the way pygen automatically creates edges this becomes an easier migration as if pygen knows that there should be an edge and have a string it is assumed to be an external_id.
Writing Data¶
We are no ready to write the new Template data to the new model.
We use the destination client and call .upsert.
results = destination.upsert(destination_templates)
len(results.nodes), len(results.edges)
(155, 105)
Note that it is not necessary to do a call to destination.template_items.apply(...) as the above call iterates recursively through all linked objects and finds the nodes and edges.
We can verify that the data has been uploaded correctly by calling the .list() method on the destination client.
created = destination.template.list(space="destinationSpace", limit=-1)
len(created)
50
created.to_pandas().head()
| external_id | title | description | labels | visibility | created_by | updated_by | is_archived | status | root_location | assigned_to | template_items | space | version | last_updated_time | created_time | deleted_time | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 108c5106-f638-4d9c-8ef2-4b470ed94495 | good | None | [#8516fc, #7fe0a9, #dd6388, #f4d89f, #f495e3] | None | 80d32fb4-bcbc-44ae-abed-fee53b148d82 | f1e7d13d-3fd2-42e4-bebc-68ad2aa905eb | False | OnHold | 31d9f834-a1ab-452a-b2a5-875f0e9f384a | None | [bef8179e-9e0c-44a8-8577-cf6e5ce0bd8f, 479ef47... | destinationSpace | 1 | 2023-11-05 14:57:29.950000+00:00 | 2023-11-05 14:57:29.950000+00:00 | None | 
| 1 | 32f92434-e5fb-49c1-b4c4-1a454ed699ea | three | None | [#8516fc, #7fe0a9, #dd6388, #f4d89f, #f495e3] | None | 6dbb225f-5059-4ff2-9543-7ab400e235e9 | df561fd7-dad6-484e-95f7-ae967fb00ed3 | False | OnHold | dac4077b-61d6-445c-a4f4-f6f3247125ab | None | [f6ce45c7-b5a6-4a3b-b549-91e4a05580a7, bbbdda6... | destinationSpace | 1 | 2023-11-05 14:57:29.950000+00:00 | 2023-11-05 14:57:29.950000+00:00 | None | 
| 2 | 6b072855-8408-4c21-b1a3-5caf4bd22204 | also | None | [#8516fc, #dd6388, #7fe0a9] | None | 863a0cd1-f55e-4529-ac50-34090596a279 | 37d17cd1-2bbe-416a-b7b1-111e44c64b36 | True | Completed | 1b582c43-bc07-4ab7-959e-cf7e434711a0 | None | [8b3670e5-a6e4-4636-8066-ee54c0a52582, 29989d8... | destinationSpace | 1 | 2023-11-05 14:57:29.950000+00:00 | 2023-11-05 14:57:29.950000+00:00 | None | 
| 3 | 95c51139-438b-4fd8-a5ce-b80dbcd77db4 | likely | None | [#8516fc, #f4d89f, #7fe0a9] | None | ac5c7d3f-564b-44d2-bdc6-6eaf23b075ce | caba29e9-fa98-438d-b5b7-6d43677764d3 | True | Completed | 1b582c43-bc07-4ab7-959e-cf7e434711a0 | None | [cc2adb6c-34e8-4258-8634-3213c789854c, ffede5f... | destinationSpace | 1 | 2023-11-05 14:57:29.950000+00:00 | 2023-11-05 14:57:29.950000+00:00 | None | 
| 4 | 9544b2c9-9abe-4a9b-b106-3db9510363c2 | thing | None | [#dd6388, #7fe0a9] | None | b3fd00cb-8b06-4365-aac5-b0fcf6f8044c | ac5c7d3f-564b-44d2-bdc6-6eaf23b075ce | True | Cancelled | 41ad2ff1-88b6-4c45-9bfd-a76d0c0ba9c6 | None | [502250e9-237b-4c87-85a3-ee100601bdba, 1881a55... | destinationSpace | 1 | 2023-11-05 14:57:29.950000+00:00 | 2023-11-05 14:57:29.950000+00:00 | None | 

