CreatePipeline Embeddable
Currently, it's not possible to create pipelines via the Ingestro API. To allow users, whether internal teams or your customers, to create pipelines, you need to integrate the CreatePipeline component. The component is easy to integrate into your existing application and provides an intuitive workflow:
- Select the pipeline's input and output connectors, as well as its target data model (TDM)
- Configure how the input data should be transformed based on the selected TDM
- Set an execution schedule and an error threshold (optional)
Depending on your use case, for example whether your customers or your internal customer success team will go through the flow, you can configure the component in different ways. You can use the embeddable as-is, with a linked pipeline template, and/or by injecting parts of the configuration at the component level. If specific components are predefined in the template or within the component itself, they will not be shown in the flow.
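For example, a minimal as-is embed, with no template and no injected configuration, could look like the sketch below. The import path "@ingestro/react" is an assumption; use the package name from your Ingestro setup.
import { CreatePipeline } from "@ingestro/react"; // hypothetical package name

export function CreatePipelinePage() {
  return (
    <CreatePipeline
      accessToken="ACCESS_TOKEN" // the access token from Step 3
      onPipelineCreate={({ data }) => {
        // data: the pipeline object after creation
        console.log("Pipeline created:", data);
      }}
    />
  );
}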
You can optionally execute the pipeline immediately after it is created by setting settings.runPipelineOnCreation to true. When this setting is enabled, users can review and manually adjust individual entries during the pipeline creation process. These manual changes will then be applied to the pipeline's initial execution.
The CreatePipeline component supports a streamlined upload flow when all required pipeline components are preconfigured — including the name, manual input connector, output connector, target data model, and error threshold. If the input connector has node.type == MANUAL, the component immediately displays a file upload step as the entry point to pipeline creation. Users can upload their file first, then proceed to header selection and data transformation. This mirrors the importer-like experience, providing a simpler, more intuitive setup flow tailored for manual data ingestion.
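A sketch of such a fully preconfigured setup is shown below. The IDs are placeholders for a manual input connector, an output connector, and a TDM in your workspace:
<CreatePipeline
  accessToken="ACCESS_TOKEN"
  configuration={{
    name: "Manual product upload",
    inputConnectorId: "MANUAL_INPUT_CONNECTOR_ID", // a connector with node.type == MANUAL
    outputConnectorId: "OUTPUT_CONNECTOR_ID",
    tdmId: "TDM_ID",
    errorConfig: {
      errorThreshold: 10, // executions with fewer than 10% erroneous cells succeed
    },
  }}
/>
Because all required components are predefined, the user starts directly at the file upload step.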
A pipeline template is the blueprint of a pipeline. You can predefine certain components so that the user going through the "Create Pipeline" flow doesn't have to configure them manually.
Input connectors always trigger executions for all pipelines they are connected to.
Preview Mode for Large Datasets
The CreatePipeline component can automatically activate preview mode to optimize performance and provide a faster configuration experience when working with large datasets. Preview mode allows users to configure transformations on a sample of the data before applying them to the entire dataset.
How Preview Mode Works
Preview mode is automatically activated when the number of input rows exceeds the sampleSize setting (default: 10,000 rows). Set sampleSize to null to disable preview mode entirely.
When active and runPipelineOnCreation is set to false:
- Sample Data Throughout: The entire pipeline creation flow uses only a sample of the data (limited to the number of rows specified by sampleSize).
- Info Banner: An info banner is displayed at the top of the header selection and data transformation steps, indicating that you're working with a sample of the data.
- Configuring Transformations: Users can set up mappings and transformations using the sample data, ensuring a responsive and efficient workflow.
When active and runPipelineOnCreation is set to true:
- Initial Data Transformation: Only the initial data transformation step uses a sample of the data (limited to the number of rows specified by sampleSize).
- Info Banner: An info banner is displayed at the top of the data transformation step, indicating that you're working with a sample of the data.
- Configuring Transformations: Users can set up mappings and transformations using the sample data, ensuring a responsive and efficient workflow.
- Applying to All Rows: After configuring transformations, users can apply the changes to the entire dataset. The transformations are then processed across all input rows.
- Continuing the Flow: Once transformations are applied to all rows, the flow continues normally with the complete dataset, and preview mode is no longer active for subsequent steps.
Preview mode is designed to improve performance when working with large files while maintaining the same level of control over data transformation and mapping.
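For instance, to raise the threshold at which preview mode activates to 50,000 rows, or to disable it entirely, adjust sampleSize (a sketch; other props omitted):
<CreatePipeline
  accessToken="ACCESS_TOKEN"
  settings={{
    runPipelineOnCreation: true, // only the initial transformation step uses the sample
    sampleSize: 50000, // preview mode activates above 50,000 input rows
    // sampleSize: null, // disables preview mode entirely
  }}
/>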
Configure the component based on your specific use case
Add the code snippet below and insert the component on the page where you want it to appear:
Fields
accessToken
Add the access token you received in Step 3
templateId
Add the ID of the template you want to use when a pipeline is created
configuration
The configuration determines whether certain settings or components, such as connectors, the target data model, or the schedule config, are already set for the pipeline, so users going through the flow won't have to set them themselves
developerMode
Set developer mode to true to test pipelines in your testing environment; pipeline executions in developer mode are free of charge. Set it to false for production use. Note that pipelines executed in developer mode will output only 100 rows
name
The name of the pipeline
tdmId
The ID of the target data model that should be used for the created pipeline. If this is set, the user won't be able to select another target data model
inputConnectorId
The ID of the input connector that should be used for the created pipeline. If this is set, the user won't be able to select another input connector
outputConnectorId
The ID of the output connector that should be used for the created pipeline. If this is set, the user won't be able to select another output connector
errorConfig
Defines how the pipeline should handle errors that might occur during pipeline execution
errorThreshold
A float between 0 and 100, representing the allowed percentage of erroneous cells during a pipeline execution. For example, if it is set to 10, it means that pipeline executions with less than 10% erroneous cells will be considered successful and will not fail
scheduleConfig
Defines when the pipeline is executed for the first and last time, as well as the interval at which it is executed
frequency
Sets how often the pipeline is executed. It works together with interval; for example, if frequency is set to HOURLY and interval is set to 2, the pipeline is executed every 2 hours. Possible values:
- HOURLY
- DAILY
- WEEKLY
- MONTHLY
interval
Sets the interval based on the frequency at which the pipeline is executed. For example, if interval is set to 2 and frequency is set to HOURLY, the pipeline is executed every 2 hours. The next execution cannot be scheduled further into the future than 1 year from the set start date and time
startsOn
The date and time when the pipeline is first executed, provided as a timestamp in UTC (e.g. 2024-09-02T13:26:13.642Z). The date and time cannot be in the past
endsOn
The date and time when the pipeline is last executed, provided as a timestamp in UTC (e.g. 2024-09-02T13:26:13.642Z). This date and time cannot be earlier than the start date and time
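As a concrete sketch, the following scheduleConfig executes the pipeline every 2 hours for one month, passing Date values as in the reference snippet below (the timestamps are illustrative):
scheduleConfig: {
  frequency: "HOURLY",
  interval: 2, // executed every 2 hours
  startsOn: new Date("2024-09-02T13:26:13.642Z"), // must not be in the past
  endsOn: new Date("2024-10-02T13:26:13.642Z"), // must not be earlier than startsOn
}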
settings
i18nOverrides
Allows you to override each text element in the interface
language
Defines the language of the embeddable (currently only English ("en") is supported)
modal
Defines whether the component is shown inline (false) or within a modal view (true)
allowTdmCreation
Defines whether the "Create target data model" button is shown in the TDM selection dropdown
allowInputConnectorCreation
Defines whether the "Create connector" button is shown in the input connector selection dropdown
allowOutputConnectorCreation
Defines whether the "Create connector" button is shown in the output connector selection dropdown
runPipelineOnCreation
Defines whether the pipeline is executed after it is created.
sampleSize
Defines the maximum number of rows to use during the data transformation preview phase when working with large datasets. Default: 10000 (integer or null). When the number of input rows exceeds this value, preview mode is automatically activated, allowing users to configure transformations on a sample before applying them to the entire dataset. Set to null to disable preview mode entirely.
onPipelineCreate
Runs after the user has confirmed the final step of the flow to create a pipeline
onClose
Runs when the user attempts to exit the "Create Pipeline" flow by clicking "Cancel" or closing the modal using the "X" button
onConnectorCreate
Runs when the user clicks on "Create connector" when selecting an input or output connector
onTdmCreate
Runs when the user clicks on "Create target data model" when selecting a target data model
onExecutionView
Runs when the user clicks on the "View" or "Fix" button of the execution that was created after the new pipeline was created. When defined, each execution element shows a "View" or "Fix" button that triggers this hook when clicked.
onSuccessContinue
Runs when the user clicks the "Continue" button on the success screen after successfully creating a pipeline.
<CreatePipeline
  accessToken="ACCESS_TOKEN"
  templateId="TEMPLATE_ID"
  configuration={{
    developerMode: boolean (default: false),
    name: string,
    tdmId: string,
    outputConnectorId: string,
    inputConnectorId: string,
    errorConfig: {
      errorThreshold: number,
    },
    scheduleConfig: {
      frequency: "HOURLY" | "DAILY" | "WEEKLY" | "MONTHLY",
      interval: number,
      startsOn: Date,
      endsOn: Date,
    }
  }}
  settings={{
    i18nOverrides: {},
    language: "en",
    modal: boolean (default: true),
    allowTdmCreation: boolean (default: false),
    allowInputConnectorCreation: boolean (default: false),
    allowOutputConnectorCreation: boolean (default: false),
    runPipelineOnCreation: boolean (default: false),
    sampleSize: number | null (default: 10000)
  }}
  onPipelineCreate={({ data }) => {
    // runs after the user has confirmed the final step of the flow to create a pipeline
    // data: pipeline object after creation
  }}
  onClose={() => {
    // runs when the creation workflow is closed via the "Cancel" button or the "X" button
  }}
  onConnectorCreate={({ reload, connectorType }) => {
    // runs when the user clicks "Create connector" when selecting an input or output connector
    // reload: call this function to refetch the connectors
    // connectorType: "input" or "output"
  }}
  onTdmCreate={({ reload }) => {
    // runs when the user clicks "Create target data model" when selecting a target data model
    // reload: call this function to refetch the TDMs
  }}
  onExecutionView={({ data }) => {
    // runs when the user selects an execution from the list of triggered pipelines
    // data: object of the selected execution
  }}
  onSuccessContinue={({ data }) => {
    // runs when the user clicks the "Continue" button on the success screen after pipeline creation
    // data: object of the created pipeline
  }}
/>
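For reference, here is a filled-in sketch with concrete values and hooks. The import path, the IDs, and the connector-creation handler are placeholders for your own setup:
import { CreatePipeline } from "@ingestro/react"; // hypothetical package name

export function PipelineCreator() {
  return (
    <CreatePipeline
      accessToken="ACCESS_TOKEN"
      templateId="TEMPLATE_ID"
      configuration={{
        developerMode: true, // free test executions; output limited to 100 rows
        name: "Orders import",
        errorConfig: {
          errorThreshold: 10, // executions with 10% or more erroneous cells fail
        },
        scheduleConfig: {
          frequency: "DAILY",
          interval: 1, // once per day
          startsOn: new Date("2024-09-02T13:26:13.642Z"),
          endsOn: new Date("2025-09-02T13:26:13.642Z"),
        },
      }}
      settings={{
        modal: true,
        allowInputConnectorCreation: true,
        runPipelineOnCreation: false,
      }}
      onPipelineCreate={({ data }) => {
        console.log("Pipeline created:", data);
      }}
      onConnectorCreate={({ reload, connectorType }) => {
        // Open your own connector-creation UI here; once the new
        // connector exists, call reload() to refetch the dropdown.
        console.log("User wants to create an", connectorType, "connector");
        reload();
      }}
      onClose={() => {
        console.log("Creation flow was closed");
      }}
    />
  );
}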