Data ingestion
This page only contains our recommendations. If you want to do things in a slightly different way and would like feedback on your approach, please contact the Backstage team.
When ingesting data from third-party systems that do not provide webhooks and pushing it to Backstage, we recommend the following workflow:
- Grab the data to ingest from the third-party system
- Split it into chunks of a reasonable size (for example, 50 assets at a time)
- For each chunk:
  - Request a collection from Backstage, searching on the external IDs of your assets (the third-party IDs)
  - Convert each new asset (one that is not in the Backstage collection) into a Backstage asset
  - For each existing asset, check whether its checksum has changed
    - If the checksum differs, convert the updated data into a Backstage asset
  - Do a single bulk call to Backstage with all creations and updates
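The example below calls a chunk helper that the pseudo code leaves undefined. A minimal sketch of such a helper, assuming the source data is a plain array of records (the name chunk and its signature are our own, hypothetical choices), could look like this:

```javascript
// Hypothetical helper: split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    // slice() clamps the end index, so the last batch may be shorter than `size`.
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Usage: 120 records become batches of 50, 50 and 20.
const records = Array.from({ length: 120 }, (_, i) => ({ id: `ext-${i}` }));
console.log(chunk(records, 50).map(batch => batch.length)); // [ 50, 50, 20 ]
```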
Example
Below is some sample code showing how this could be implemented.
This is only pseudo code and will not work as-is (for example, the chunk, checksum and convert helpers are not defined); it is here to outline what a solution could look like. It also does not take into account any other data that might need to be ingested separately, such as editions/streams, genres or crew.
const data = await fetch(...);

// Chunk into batches of at most 50 assets
const batches = chunk(data, 50);

for (const batch of batches) {
  // Grab the IDs used by the external system
  const ids = batch.map(item => item.id);
  const query = JSON.stringify({ external_id: { type: 'array', operation: 'or', value: ids } });

  const backstageResponse = await fetch('https://dapi.backstage-api.com/media/search', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-service-id': '',
      'consumer-key': ''
    },
    body: JSON.stringify({
      query
    })
  });
  const backstageItems = await backstageResponse.json();

  // Split into new and existing items
  const newItems = batch.filter(item => !backstageItems.items.find(element => element.external_id === item.id));
  const existingItems = batch.filter(item => {
    const backstageItem = backstageItems.items.find(element => element.external_id === item.id);
    // If no Backstage item is found, it's a new item
    if (!backstageItem) {
      return false;
    }
    // Only keep existing items whose original data checksum has changed
    return backstageItem.checksum !== checksum(item);
  });

  /*
   * Convert items into Backstage items.
   *
   * The convert function needs to handle all the property conversion logic.
   */
  const newItemsConverted = newItems.map(item => convert(item));
  const existingItemsConverted = existingItems.map(item => convert(item));

  // One entry per Backstage type, each with its own bulk endpoint
  const backstageTypes = [
    { type: 'movie', endpoint: 'movies' },
    { type: 'series', endpoint: 'series' }
  ];
  for (const bsType of backstageTypes) {
    await fetch(`https://dapi.backstage-api.com/${bsType.endpoint}`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-service-id': '',
        'consumer-key': ''
      },
      body: JSON.stringify({
        create: newItemsConverted.filter(item => item.type === bsType.type),
        update: existingItemsConverted.filter(item => item.type === bsType.type)
      })
    });
  }
}
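In the loop above, every Backstage type gets its own bulk request, even when there is nothing to create or update for that type. A small, hypothetical helper can build the per-endpoint payloads up front and drop empty ones; skipping empty requests is our own design choice, not something the API requires. Like the example, it assumes converted items carry a type field and that movies and series go to the movies and series endpoints.

```javascript
// Assumption (taken from the example above): converted items have a `type`
// field, and each type maps to its own bulk endpoint.
const BACKSTAGE_TYPES = [
  { type: 'movie', endpoint: 'movies' },
  { type: 'series', endpoint: 'series' },
];

// Hypothetical helper: group creations/updates per endpoint and drop
// endpoints that would receive an empty request.
function buildBulkRequests(created, updated, types = BACKSTAGE_TYPES) {
  return types
    .map(({ type, endpoint }) => ({
      endpoint,
      body: {
        create: created.filter(item => item.type === type),
        update: updated.filter(item => item.type === type),
      },
    }))
    .filter(({ body }) => body.create.length > 0 || body.update.length > 0);
}

// Usage: only movies changed, so only the movies endpoint gets a request.
const requests = buildBulkRequests(
  [{ type: 'movie', title: 'A' }],
  [{ type: 'movie', title: 'B' }],
);
console.log(requests.map(r => r.endpoint)); // [ 'movies' ]
```

Each entry can then be sent with the same fetch call as in the example, using `https://dapi.backstage-api.com/${endpoint}` as the URL and JSON.stringify(body) as the request body.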
External IDs
All asset metadata has an external_id field, which should be populated with the third-party system's ID. You can then use this external ID to retrieve a collection of matching assets and determine whether they are already in Backstage (and, if so, whether they need to be updated).
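Once the collection for a batch has been retrieved, each source record can be sorted into new, changed or unchanged. The sketch below is a hypothetical helper that indexes the Backstage items by external_id once, instead of scanning the array for every record. It assumes source records have an id, the returned Backstage items have external_id and checksum fields, and that you pass in the same checksum function you used when creating the assets.

```javascript
// Hypothetical helper: sort source records into new, changed and unchanged,
// based on the Backstage items found for their external IDs.
// Assumes source records have `id`; Backstage items have `external_id` and `checksum`.
function partitionByExternalId(sourceItems, backstageItems, checksumOf) {
  // Index once so each lookup is a Map hit instead of an array scan.
  const byExternalId = new Map(backstageItems.map(item => [item.external_id, item]));
  const toCreate = [];
  const toUpdate = [];
  const unchanged = [];
  for (const item of sourceItems) {
    const existing = byExternalId.get(item.id);
    if (!existing) {
      toCreate.push(item);
    } else if (existing.checksum !== checksumOf(item)) {
      toUpdate.push(item);
    } else {
      unchanged.push(item);
    }
  }
  return { toCreate, toUpdate, unchanged };
}

// Usage with a trivial stand-in checksum (replace it with a real hash).
const source = [{ id: 'a', title: 'A' }, { id: 'b', title: 'B2' }, { id: 'c', title: 'C' }];
const found = [{ external_id: 'b', checksum: 'B' }, { external_id: 'c', checksum: 'C' }];
const result = partitionByExternalId(source, found, item => item.title);
console.log(result.toCreate.length, result.toUpdate.length, result.unchanged.length); // 1 1 1
```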
Collection
You can construct a query to search on external IDs like this:
{
  "external_id": {
    "type": "array",
    "operation": "or",
    "value": ["ID", "ID"]
  }
}
Then pass it to the POST /media/search endpoint as an escaped string in the query property of the request body.
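A sketch of building that request body, following the example above: the filter is serialized to a JSON string first and placed in the query property, so it ends up escaped inside the outer JSON body. The function name is hypothetical.

```javascript
// Hypothetical helper: build the POST /media/search body for a list of
// third-party IDs. The filter is stringified first, so it is sent as an
// escaped string inside the outer JSON body.
function buildExternalIdSearchBody(externalIds) {
  const filter = {
    external_id: {
      type: 'array',
      operation: 'or',
      value: externalIds,
    },
  };
  return JSON.stringify({ query: JSON.stringify(filter) });
}

console.log(buildExternalIdSearchBody(['ID1', 'ID2']));
// {"query":"{\"external_id\":{\"type\":\"array\",\"operation\":\"or\",\"value\":[\"ID1\",\"ID2\"]}}"}
```

The resulting string can be used as the body of the fetch call to https://dapi.backstage-api.com/media/search, with the same headers as in the example.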
Checksums
When creating assets, you can provide a checksum property on the asset, which can later be used to check whether the original data was modified.
During your ingest process, you can create a hash from the original asset data (or from a different set of consistent data, if you prefer) and later use it to quickly determine whether the asset contents have changed. Depending on that check, you can either skip the asset from the third-party system or update it in Backstage.
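One way to compute such a checksum in Node.js is to hash a canonical JSON serialization of the source record with SHA-256 from the built-in crypto module. Sorting object keys first keeps the hash stable if the third-party system returns the same fields in a different order. SHA-256 and key-sorted JSON are our assumptions, not a Backstage requirement; any deterministic serialization and hash will do, as long as every ingest run uses the same one. The sketch assumes records are plain JSON data.

```javascript
const { createHash } = require('node:crypto');

// Serialize a value to JSON with object keys sorted recursively, so that
// key order in the source data does not affect the result.
// Assumes plain JSON data; objects with toJSON (e.g. Date) use that output.
function canonicalJson(value) {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalJson).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    if (typeof value.toJSON === 'function') {
      return canonicalJson(value.toJSON());
    }
    const keys = Object.keys(value).sort();
    return `{${keys.map(key => `${JSON.stringify(key)}:${canonicalJson(value[key])}`).join(',')}}`;
  }
  return JSON.stringify(value);
}

// Hypothetical checksum helper: SHA-256 hex digest of the canonical JSON.
function checksum(record) {
  return createHash('sha256').update(canonicalJson(record)).digest('hex');
}

// Same fields in a different order produce the same checksum.
const a = checksum({ id: '42', title: 'Example', year: 2020 });
const b = checksum({ year: 2020, title: 'Example', id: '42' });
console.log(a === b); // true
```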