Selecting / Querying¶
We assume that you have generated a SDK for the WindTurbine
model and have a client ready to go.
Selecting/Querying is a different way of retrieving data than doing .list()
, .retrieve()
, search()
in that it supports retrieving nested data structures, including filtering anywhere on the retrieved data structure. SDKs generated by pygen
have two ways of selecting/querying which we will refer to as Python and GraphQL qurying. In this section, we will go through both of them,
and list limitations as well as when to use one over the other.
from wind_turbine import WindTurbineClient
pygen = WindTurbineClient.from_toml("config.toml")
Comparison¶
Python Based Querying/Selecting¶
Advantages
- Discovarability through the IDE through autocompletion.
- The returning data classes have required/optional set as in the data model, which are useful for static type checking and IDE auto complete.
- Regular filtering has same syntax as querying along any edge, direct relation, or reverse direct relation.
Limitations
- You can only query along one chain of connections. For example, if we start from
WindTurbine
above we can either go to blades or metmast, not both.
GraphQL Based Querying¶
Advantages
- Flexiblity in querying. You can retrieve anything you can write up as a graphql query.
Limitations
- Difficult to write the querying, typically you would use the CDF UI to create the query.
- The returning data class has all properties as optional and is thus less structured.
- Can become very verbose when you want to retrieve a lot of properties.
Summary¶
GraphQL based querying is more flexible, but gives you less structure on the returning data classes. In addition, the query will have to be created outside of your IDE which requires context switching.
Python Based Querying¶
Examples¶
Query: All turbines with nacelle:
result = pygen.wind_turbine.select().nacelle.list_full()
result
space | external_id | capacity | name | blades | datasheets | nacelle | rotor | windfarm | data_record | |
---|---|---|---|---|---|---|---|---|---|---|
0 | sp_wind | hornsea_1_mill_3 | 7.0 | hornsea_1_mill_3 | [hornsea_1_mill_3_blade_A, hornsea_1_mill_3_bl... | [windmill_schematics] | {'space': 'sp_wind', 'external_id': 'hornsea_1... | hornsea_1_mill_3_rotor | Hornsea 1 | {'version': 8, 'last_updated_time': 2024-12-17... |
1 | sp_wind | hornsea_1_mill_2 | 7.0 | hornsea_1_mill_2 | [hornsea_1_mill_2_blade_B, hornsea_1_mill_2_bl... | [windmill_schematics] | {'space': 'sp_wind', 'external_id': 'hornsea_1... | hornsea_1_mill_2_rotor | Hornsea 1 | {'version': 8, 'last_updated_time': 2024-12-17... |
2 | sp_wind | hornsea_1_mill_1 | 7.0 | hornsea_1_mill_1 | [hornsea_1_mill_1_blade_A, hornsea_1_mill_1_bl... | [windmill_schematics] | {'space': 'sp_wind', 'external_id': 'hornsea_1... | hornsea_1_mill_1_rotor | Hornsea 1 | {'version': 8, 'last_updated_time': 2024-12-17... |
3 | sp_wind | hornsea_1_mill_4 | 7.0 | hornsea_1_mill_4 | [hornsea_1_mill_4_blade_C, hornsea_1_mill_4_bl... | [windmill_schematics] | {'space': 'sp_wind', 'external_id': 'hornsea_1... | hornsea_1_mill_4_rotor | Hornsea 1 | {'version': 8, 'last_updated_time': 2024-12-17... |
4 | sp_wind | hornsea_1_mill_5 | 7.0 | hornsea_1_mill_5 | [hornsea_1_mill_5_blade_B, hornsea_1_mill_5_bl... | [windmill_schematics] | {'space': 'sp_wind', 'external_id': 'hornsea_1... | hornsea_1_mill_5_rotor | Hornsea 1 | {'version': 8, 'last_updated_time': 2024-12-17... |
Query: Get all blades for the windturbine named "hornsea_1_mill_1"
result = pygen.wind_turbine.select().name.equals("hornsea_1_mill_1").blades.list_blade()
result
space | external_id | is_damaged | name | data_record | |
---|---|---|---|---|---|
0 | sp_wind | hornsea_1_mill_1_blade_A | True | A | {'version': 1, 'last_updated_time': 2024-11-16... |
1 | sp_wind | hornsea_1_mill_1_blade_B | False | B | {'version': 1, 'last_updated_time': 2024-11-16... |
2 | sp_wind | hornsea_1_mill_1_blade_C | False | C | {'version': 1, 'last_updated_time': 2024-11-16... |
Query: Get all blades for the windmill with external ID "hornsea_1_mill_3" or "hornsea_1_mill_4" with a damaged blade
result = (
pygen.wind_turbine.select()
.external_id.in_(["hornsea_1_mill_3", "hornsea_1_mill_4"])
.blades.is_damaged.equals(True)
.list_blade()
)
result
space | external_id | is_damaged | name | data_record | |
---|---|---|---|---|---|
0 | sp_wind | hornsea_1_mill_4_blade_C | True | C | {'version': 1, 'last_updated_time': 2024-11-16... |
Same query as above but return the turbines
result = (
pygen.wind_turbine.select()
.external_id.in_(["hornsea_1_mill_3", "hornsea_1_mill_4"])
.blades.is_damaged.equals(True)
.list_full()
)
result
space | external_id | capacity | name | blades | datasheets | nacelle | rotor | windfarm | data_record | |
---|---|---|---|---|---|---|---|---|---|---|
0 | sp_wind | hornsea_1_mill_4 | 7.0 | hornsea_1_mill_4 | [{'space': 'sp_wind', 'external_id': 'hornsea_... | [windmill_schematics] | hornsea_1_mill_4_nacelle | hornsea_1_mill_4_rotor | Hornsea 1 | {'version': 8, 'last_updated_time': 2024-12-17... |
We can also inspect the query we are doing:
pygen.wind_turbine.query().nacelle.gearbox
Query
Call .list_full() to return a list of Windturbine and .list_gearbox() to return a list of Gearbox.
In the query above, we go from WindTurbine
node through the nacelle
direct relation to the Nacelle
node, and continue through the gearbox
direct relation to the Gearbox
node.
Autocomplete¶
Warning Notebook IDE
The autocomplete shown in the screenshots below works best in a jupyter notebook IDE. For example, VS Code notebook/PyCharm notebooks/Jupyter Lite (CDF Notebook) you might not get the same autocomplete when writing .
+ tab.
This style of querying is initiated by calling the .select()
method on the place we want to start the query. Then, all properties that pygen supports filtering on are available as an attribute on the returning object of query. In addition, all types of connections (edges, direct relations, and reverse direct relations) can be traversed. Illustrated in the screenshot below. To get the list of possible options write .
and press tab.
We can select a properties, and depending on the type of property, we will get the available filters up. For example, name
is a string property which will make equals
, in
and prefix
filter available.
Ones we have input the filterin values we are back at the source node and can continue to filter on properties of this node or traverse do the next one. In the example below, we traverse to the nacelle:
To make the query more readable it is recommended that you use the paranthesis syntax. This makes the query more readable.
Finally, finish with either .list_full()
or .list_<type>()
to return the all nodes and edges in the query, or only the last node.
result = (
pygen.wind_turbine.select()
.capacity.range(6.0, 8.0)
.name.prefix("hornsea")
.blades.is_damaged.equals(True)
.list_blade()
)
result
space | external_id | is_damaged | name | data_record | |
---|---|---|---|---|---|
0 | sp_wind | hornsea_1_mill_1_blade_A | True | A | {'version': 1, 'last_updated_time': 2024-11-16... |
1 | sp_wind | hornsea_1_mill_4_blade_C | True | C | {'version': 1, 'last_updated_time': 2024-11-16... |
2 | sp_wind | hornsea_1_mill_2_blade_B | True | B | {'version': 1, 'last_updated_time': 2024-11-16... |
Sorting¶
You can also sort the results returned by the .select()
method. This is done by the .sort_ascending()
and .sort_descending()
property you are sorting.
Query: Get all damaged blades connected to turbines with capacity between 6.0 and 8.0 and a names that starts with "hornsea". Sort the blades by name
result = (
pygen.wind_turbine.select()
.capacity.range(6.0, 8.0)
.name.prefix("hornsea")
.blades.is_damaged.equals(True)
.name.sort_ascending()
.list_blade()
)
result
space | external_id | is_damaged | name | data_record | |
---|---|---|---|---|---|
0 | sp_wind | hornsea_1_mill_1_blade_A | True | A | {'version': 1, 'last_updated_time': 2024-11-16... |
1 | sp_wind | hornsea_1_mill_2_blade_B | True | B | {'version': 1, 'last_updated_time': 2024-11-16... |
2 | sp_wind | hornsea_1_mill_4_blade_C | True | C | {'version': 1, 'last_updated_time': 2024-11-16... |
In addition, for properties of type TimeStamp
and Date
you can use .latest()
and .earliest()
. This is a shorthand for calling .sort_descending()
/.sort_decending()
+ setting limit=1
.
Query: Get the latest uploaded datasheet for the "hornsea_1_mill_1" turbine
result = pygen.wind_turbine.select().name.equals("hornsea_1_mill_1").datasheets.uploaded_time.latest().list_data_sheet()
result
space | external_id | is_uploaded | mime_type | name | uploaded_time | data_record | |
---|---|---|---|---|---|---|---|
0 | sp_wind | windmill_schematics | True | application/pdf | windmill_schematics.pdf | 2024-11-16 14:19:28.484000+00:00 | {'version': 2, 'last_updated_time': 2024-11-16... |
Direct Relations¶
We can both traverse and filter on connections that are modeled as a direct relations. pygen
distinguish these by appending _filter
to the attribute when filtering and nothing when traversing.
Query: Get all wind turbines with nacelles.
result = pygen.wind_turbine.select().nacelle.list_full(limit=5)
result
space | external_id | capacity | name | blades | datasheets | nacelle | rotor | windfarm | data_record | |
---|---|---|---|---|---|---|---|---|---|---|
0 | sp_wind | hornsea_1_mill_3 | 7.0 | hornsea_1_mill_3 | [hornsea_1_mill_3_blade_A, hornsea_1_mill_3_bl... | [windmill_schematics] | {'space': 'sp_wind', 'external_id': 'hornsea_1... | hornsea_1_mill_3_rotor | Hornsea 1 | {'version': 8, 'last_updated_time': 2024-12-17... |
1 | sp_wind | hornsea_1_mill_2 | 7.0 | hornsea_1_mill_2 | [hornsea_1_mill_2_blade_B, hornsea_1_mill_2_bl... | [windmill_schematics] | {'space': 'sp_wind', 'external_id': 'hornsea_1... | hornsea_1_mill_2_rotor | Hornsea 1 | {'version': 8, 'last_updated_time': 2024-12-17... |
2 | sp_wind | hornsea_1_mill_1 | 7.0 | hornsea_1_mill_1 | [hornsea_1_mill_1_blade_A, hornsea_1_mill_1_bl... | [windmill_schematics] | {'space': 'sp_wind', 'external_id': 'hornsea_1... | hornsea_1_mill_1_rotor | Hornsea 1 | {'version': 8, 'last_updated_time': 2024-12-17... |
3 | sp_wind | hornsea_1_mill_4 | 7.0 | hornsea_1_mill_4 | [hornsea_1_mill_4_blade_C, hornsea_1_mill_4_bl... | [windmill_schematics] | {'space': 'sp_wind', 'external_id': 'hornsea_1... | hornsea_1_mill_4_rotor | Hornsea 1 | {'version': 8, 'last_updated_time': 2024-12-17... |
4 | sp_wind | hornsea_1_mill_5 | 7.0 | hornsea_1_mill_5 | [hornsea_1_mill_5_blade_B, hornsea_1_mill_5_bl... | [windmill_schematics] | {'space': 'sp_wind', 'external_id': 'hornsea_1... | hornsea_1_mill_5_rotor | Hornsea 1 | {'version': 8, 'last_updated_time': 2024-12-17... |
result[0].nacelle
value | |
---|---|
space | sp_wind |
external_id | hornsea_1_mill_3_nacelle |
data_record | {'version': 1, 'last_updated_time': 2024-11-16... |
node_type | None |
acc_from_back_side_x | V52-WindTurbine.Acc1N |
acc_from_back_side_y | V52-WindTurbine.Acc2N |
acc_from_back_side_z | V52-WindTurbine.Acc3N |
gearbox | hornsea_1_mill_3_nacelle_gearbox |
generator | hornsea_1_mill_3_nacelle_generator |
high_speed_shaft | hornsea_1_mill_3_nacelle_highspeedshaft |
main_shaft | hornsea_1_mill_3_nacelle_mainshaft |
power_inverter | hornsea_1_mill_3_nacelle_powerinverter |
wind_turbine | None |
yaw_direction | V52-WindTurbine.yaw |
yaw_error | V52-WindTurbine.YawErr |
Query: Get the wind turibne with the nacelle hornsea_1_mill_3_nacelle
result = pygen.wind_turbine.select().nacelle_filter.equals("hornsea_1_mill_3_nacelle").list_full(limit=5)
result
space | external_id | capacity | name | blades | datasheets | nacelle | rotor | windfarm | data_record | |
---|---|---|---|---|---|---|---|---|---|---|
0 | sp_wind | hornsea_1_mill_3 | 7.0 | hornsea_1_mill_3 | [hornsea_1_mill_3_blade_A, hornsea_1_mill_3_bl... | [windmill_schematics] | hornsea_1_mill_3_nacelle | hornsea_1_mill_3_rotor | Hornsea 1 | {'version': 8, 'last_updated_time': 2024-12-17... |
GraphQL based Querying¶
When querying with GraphQL we must include __typename
of the top level items as this is used by pygen
to understand how to pase the object.
The querying method is available on the top level client as this is not particular to any of the data types in your data model
my_query = """{
listWindTurbine(first:1){
items{
__typename
name
capacity
nacelle{
externalId
}
rotor{
externalId
}
blades{
items{
name
is_damaged
}
}
}
}
}"""
result = pygen.graphql_query(my_query)
result
blades | capacity | nacelle | name | rotor | |
---|---|---|---|---|---|
0 | [{'space': None, 'external_id': None, 'data_re... | 7.0 | {'space': None, 'external_id': 'hornsea_1_mill... | hornsea_1_mill_3 | {'space': None, 'external_id': 'hornsea_1_mill... |
result[0].model_dump(exclude_none=True)
{'blades': [{'is_damaged': False, 'name': 'A'}, {'is_damaged': False, 'name': 'B'}, {'is_damaged': False, 'name': 'C'}], 'capacity': 7.0, 'nacelle': {'external_id': 'hornsea_1_mill_3_nacelle'}, 'name': 'hornsea_1_mill_3', 'rotor': {'external_id': 'hornsea_1_mill_3_rotor'}}
turbine = result[0]
turbine.nacelle.external_id
'hornsea_1_mill_3_nacelle'
Pitfalls¶
If you forget to include __typename
on the top level item pygen
will raise a Runtime
error
my_invalid_query = """
{
listWindTurbine(first:1){
items{
name
}
}
}
"""
try:
pygen.graphql_query(my_invalid_query)
except RuntimeError as e:
print(e)
Missing '__typename' in GraphQL response. Cannot determine the type of the response.
Data Classes¶
When you call .list()
, .retrieve()
and .search()
, pygen
returns ther read format of data classes. This read format matches the type/view required/optional properties.
When you do the graphql_query
above all properties are optional as pygen
cannot know which objects you included in your query, thus pygen
uses a special GraphQL format of the
data class it is returning
my_query = """{
listWindTurbine(first:1){
items{
name
__typename
}
}
}"""
result = pygen.graphql_query(my_query)
type(result)
wind_turbine.data_classes._core.base.GraphQLList
type(result[0])
wind_turbine.data_classes._wind_turbine.WindTurbineGraphQL
This data class can be converted to a regular write or read format by calling the as_write
and as_read
call.
Warning If you have not included all required properties in the your GraphQL query, pygen
will raise an ValueError
when you do this call.
try:
result[0].as_read()
except ValueError as e:
print(e)
2 validation errors for WindTurbine externalId Field required [type=missing, input_value={'name': 'hornsea_1_mill_3'}, input_type=dict] For further information visit https://errors.pydantic.dev/2.9/v/missing data_record Field required [type=missing, input_value={'name': 'hornsea_1_mill_3'}, input_type=dict] For further information visit https://errors.pydantic.dev/2.9/v/missing
my_query = """{
listWindTurbine(first:1){
items{
name
space
externalId
createdTime
lastUpdatedTime
__typename
}
}
}"""
result = pygen.graphql_query(my_query)
windmill_read = result[0].as_read()
windmill_read
value | |
---|---|
space | sp_wind |
external_id | hornsea_1_mill_3 |
data_record | {'version': 0, 'last_updated_time': 2024-12-17... |
node_type | None |
capacity | None |
description | None |
name | hornsea_1_mill_3 |
blades | None |
datasheets | None |
metmast | None |
nacelle | None |
power_curve | None |
rotor | None |
windfarm | None |
type(windmill_read)
wind_turbine.data_classes._wind_turbine.WindTurbine
windmill_write = result[0].as_write()
windmill_write
value | |
---|---|
space | sp_wind |
external_id | hornsea_1_mill_3 |
data_record | {'existing_version': 0} |
node_type | None |
capacity | None |
description | None |
name | hornsea_1_mill_3 |
blades | None |
datasheets | None |
metmast | None |
nacelle | None |
power_curve | None |
rotor | None |
windfarm | None |
type(windmill_write)
wind_turbine.data_classes._wind_turbine.WindTurbineWrite
Paging¶
If we include a pageInfo
in our query this will be available directly on the result returned from the .graphql_query
method
my_query = """
{
listWindTurbine{
items{
__typename
name
}
pageInfo{
hasNextPage
hasPreviousPage
startCursor
endCursor
}
}
}"""
result = pygen.graphql_query(my_query)
result.page_info.has_next_page
True
result.page_info.end_cursor
'Z0FBQUFBQm5jcVdlTGd2ZmRFdVNrLW1PUnFvaEl0RV8xaGU4cU95d2o1SWFQdWxFZVVyeTZ4OEhKR09kNjIzMTh4b1BYU09jZHI1bkhRN05rMDkzUWhubnFkRVhVNjB0bUZkd0JFQ3BzS0l6ZVBuRUdIQXRGX0tza2tTVVhlWkV6cUMxVGdyX2t1aEk4S1Bvd0Y0aVJnUDk0NU41d2M1R3pFV2ZnelpOS1J6RDkxTWRiUTY2SHB2Y0ZEc05MVElLbk9WQjNaUERtaE1DRlY4akhWSmhZWmJHZzhVaEFLdm5rQjFHbU9vNFRFQm5BUG90ZXNlNHhPQm5HUEZzajJZSmhJNjFpVFYzcEpkVQ=='
Datapoints¶
You can use the getDatapoints
to include datapoints for a referenced timeseries.
my_query = """{
listMetmast(first: 1){
items{
__typename
temperature{
externalId
name
getDataPoints(granularity: "1d", aggregates: SUM, first: 100){
items{
timestamp
sum
}
}
}
}
}
}"""
result = pygen.graphql_query(my_query)
metmast = result[0]
metmast.temperature.data.to_pandas()
utsira_station_temperature|sum | |
---|---|
2020-12-31 | -9.844667 |
2021-01-01 | -254.494219 |
2021-01-02 | -495.438664 |
2021-01-03 | -588.549775 |
2021-01-04 | -602.327553 |
... | ... |
2021-04-05 | 87.561336 |
2021-04-06 | 235.783558 |
2021-04-07 | 295.394670 |
2021-04-08 | 167.116892 |
2021-04-09 | -0.383108 |
100 rows × 1 columns
You can use the getLatestDataPoint
to include the latest datapoint for a referenced timeseries.
my_query = """{
listMetmast(first: 1){
items{
__typename
temperature{
externalId
name
getLatestDataPoint{
items{
timestamp
value
}
}
}
}
}
}"""
result = pygen.graphql_query(my_query)
metmast = result[0]
metmast.temperature.latest
value | |
---|---|
2024-12-30 13:00:00 | -0.122444 |
Next section: Creating and Deleting