Pipeline API
A pipeline is the combination of all essential components, such as the input and output connectors and the target data model. It enables you to automate importing data into your target system.
Do you also want to create pipelines within your system? Currently, you cannot create pipelines via our API. If you want to create pipelines, you can do the following:
- use our CreatePipeline embeddable component by checking our guide
- use the Ingestro User Platform
Use this base URL and append the corresponding endpoint:
Base URL
api-gateway.ingestro.com/dp/api/v1/
Update
Endpoint
PUT /pipeline/{id}
Payload
Attributes
name
The name of the pipeline
configuration
Defines the specific setup of your pipeline
input_connectors
The list of all input connectors used for this pipeline. Currently, we only support one input connector per pipeline. Find out more about connectors here
output_connectors
The list of all output connectors used for this pipeline. Currently, we only support one output connector per pipeline. Find out more about connectors here
mapping_config
Defines how the input columns are mapped to the target data model columns and how their values are transformed to meet the requirements of the target data model
mode
Defines whether Ingestro AI is used to map input columns that haven’t been mapped yet to the output columns during future executions:
- DEFAULT: Ingestro AI is applied to unmapped input columns
- EXACT: Only already mapped columns are used
mappings
The list of all target data model columns with their mapped input columns and applied transformations
source_columns
The columns from the input data mapped to the target_column
target_column
An output column from the given target data model
transformations
The transformations applied to map the input columns to the output column in the correct format
name
The name of the applied transformation
type
The type of transformation applied:
- HYPER_FORMULA
- OPTION_MAPPING
function
The code or formula of the transformation, provided as a string
prompt
The prompt used to generate the transformation
tdm
The ID of the set target data model
error_config
Defines how the pipeline should handle errors that might occur during pipeline execution
error_threshold
A float between 0 and 100, representing the allowed percentage of erroneous cells during a pipeline execution. For example, if it is set to 10, it means that pipeline executions with less than 10% erroneous cells will be considered successful and will not fail.
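The threshold comparison above can be sketched as a small check. The strict less-than follows the wording of this page; the helper itself and its name are illustrative, not part of the API:

```python
def execution_passes(erroneous_cells: int, total_cells: int, error_threshold: float) -> bool:
    """Return True if the execution stays under the configured threshold.

    Mirrors the documented rule: a run with *less than* `error_threshold`
    percent erroneous cells is considered successful.
    """
    error_rate = 100.0 * erroneous_cells / total_cells
    return error_rate < error_threshold

# With error_threshold = 10: a run with 9% erroneous cells passes,
# a run with exactly 10% does not.
```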
schedule_config
Defines when the pipeline is executed for the first and last time, as well as the interval at which it is executed
frequency
Sets how often the pipeline is executed. It is intertwined with interval. For example, if frequency is set to HOURLY and interval is set to 2, the pipeline is executed every 2 hours:
- HOURLY
- DAILY
- WEEKLY
- MONTHLY
interval
Sets the interval based on the frequency at which the pipeline is executed. For example, if interval is set to 2 and frequency is set to HOURLY, the pipeline is executed every 2 hours. The next execution cannot be scheduled further into the future than 1 year from the set start date and time
starts_on
The date and time when the pipeline is first executed, provided as a timestamp in UTC (e.g. 2024-09-02T13:26:13.642Z). The date and time cannot be in the past
ends_on
The date and time when the pipeline is last executed, provided as a timestamp in UTC (e.g. 2024-09-02T13:26:13.642Z). This date and time cannot be earlier than the start date and time
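How frequency, interval, starts_on, and ends_on combine can be sketched as follows. This is an illustrative helper, not the scheduler itself; MONTHLY is omitted because it requires calendar arithmetic rather than a fixed step:

```python
from datetime import datetime, timedelta

# Fixed step per frequency (MONTHLY omitted; it needs calendar math).
STEP = {
    "HOURLY": timedelta(hours=1),
    "DAILY": timedelta(days=1),
    "WEEKLY": timedelta(weeks=1),
}

def executions(starts_on: datetime, ends_on: datetime, frequency: str, interval: int):
    """Yield the UTC timestamps at which the pipeline would run."""
    step = STEP[frequency] * interval
    current = starts_on
    while current <= ends_on:
        yield current
        current += step

# frequency=HOURLY, interval=2 -> one run every 2 hours
runs = list(executions(datetime(2024, 9, 2, 13, 0), datetime(2024, 9, 2, 19, 0), "HOURLY", 2))
# 13:00, 15:00, 17:00, 19:00
```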
header_config
Defines how the header row is determined
type
Specifies whether Ingestro's header detection is applied or if the set row_index is used to determine the header row:
- SMART: Ingestro's header detection is used to define the header row
- STATIC: The row at the specified row_index is used as the header row
row_index
The index of the row that should be used as the header row if type is set to STATIC
sheet_config
Defines which sheet to process when working with multi-sheet file types (XLS, XLSX, XML). This configuration is only applicable for multi-sheet files and is ignored for single-sheet file types (CSV, TSV, JSON).
When to use:
- Your input connector uses XLS, XLSX, or XML files with multiple sheets
- You want to specify which sheet contains the data to process
- You need consistent sheet selection across multiple executions
Behavior:
- For single-sheet file types (CSV, TSV, JSON): sheet_config is ignored
- For multi-sheet file types with only one sheet: sheet_config is optional (defaults to the first sheet)
- For multi-sheet files with multiple sheets: sheet_config determines which sheet to use
selection_type
Defines the strategy for selecting which sheet to process:
- INDEX: Select sheet by its position (0-indexed). Use this when you always want to process the sheet at a specific position (e.g., always the 2nd sheet)
- NAME: Select sheet by its name. Use this when you always want to process a sheet with a specific name (e.g., always the "CRM" sheet)
sheet_index
The 0-based index of the sheet to process. Required when selection_type is INDEX.
Examples:
- 0 = First sheet
- 1 = Second sheet
- 2 = Third sheet
Important: The backend uses 0-indexing. The first sheet is at index 0, not 1.
sheet_name
The name of the sheet to process. Required when selection_type is NAME.
Examples:
- "Sheet1" = Process the sheet named "Sheet1"
- "CRM" = Process the sheet named "CRM"
- "Data" = Process the sheet named "Data"
Important: Sheet names are case-sensitive. The execution will fail if a sheet with the specified name is not found in the file.
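The two selection strategies can be sketched as small payload builders. The helper names are hypothetical; only the sheet_config shapes come from this page:

```python
def sheet_config_by_name(name: str) -> dict:
    """NAME selection: sheet names are case-sensitive."""
    return {"selection_type": "NAME", "sheet_name": name}

def sheet_config_by_index(index: int) -> dict:
    """INDEX selection: 0-based, so the second sheet is index 1."""
    if index < 0:
        raise ValueError("sheet_index must be >= 0")
    return {"selection_type": "INDEX", "sheet_index": index}
```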
developer_mode
Defines if the pipeline is executed in developer mode (true) or not (false). Use the developer mode to test pipelines in your testing environment. Pipeline executions in developer mode are free of charge. Deactivate it for production use. Please note that pipelines executed in developer mode will only output 100 rows
active
Indicates whether the pipeline is set to active (true) or inactive (false) after creation. When a pipeline is active, it can be executed either by triggering the execution manually or on the set schedule. An inactive pipeline cannot be executed in any way
Payload
{
"name": "string",
"configuration": {
"input_connectors": [
"string"
],
"output_connectors": [
"string"
],
"mapping_config": {
"mode": "DEFAULT",
"mappings": [
{
"source_columns": [
"string"
],
"target_column": "string",
"transformations": [
{
"name": "string",
"type": "HYPER_FORMULA",
"function": "string",
"prompt": "string"
}
]
}
]
},
"tdm": "string",
"error_config": {
"error_threshold": 0
},
"schedule_config": {
"frequency": "HOURLY",
"interval": 0,
"starts_on": "2024-09-02T13:26:13.642Z",
"ends_on": "2024-09-02T13:26:13.642Z"
},
"header_config": {
"type": "SMART",
"row_index": 0
},
"sheet_config": {
"selection_type": "NAME",
"sheet_name": "CRM"
},
"developer_mode": false
},
"active": false
}
Response
Attributes
id
The ID of the pipeline
name
The name of the pipeline
active
Indicates whether the pipeline is set to active (true) or inactive (false) after creation. When a pipeline is active, it can be executed either by triggering the execution manually or on the set schedule. An inactive pipeline cannot be executed in any way
draft
Shows if the pipeline is in draft (true) or not (false). A pipeline in draft cannot be executed in any way
configuration
Defines the specific setup of your pipeline
input_connectors
The list of all input connectors used for this pipeline. Currently, we only support one input connector per pipeline. Find out more about connectors here
output_connectors
The list of all output connectors used for this pipeline. Currently, we only support one output connector per pipeline. Find out more about connectors here
mapping_config
Defines how the input columns are mapped to the target data model columns and how their values are transformed to meet the requirements of the target data model
mode
Defines whether Ingestro AI is used to map input columns that haven’t been mapped yet to the output columns during future executions:
- DEFAULT: Ingestro AI is applied to unmapped input columns
- EXACT: Only already mapped columns are used
mappings
The list of all target data model columns with their mapped input columns and applied transformations
source_columns
The columns from the input data mapped to the target_column
target_column
An output column from the given target data model
transformations
The transformations applied to map the input columns to the output column in the correct format
name
The name of the applied transformation
type
The type of transformation applied:
- HYPER_FORMULA
- OPTION_MAPPING
function
The code or formula of the transformation, provided as a string
prompt
The prompt used to generate the transformation
tdm
The ID of the set target data model
error_config
Defines how the pipeline should handle errors that might occur during pipeline execution
error_threshold
A float between 0 and 100, representing the allowed percentage of erroneous cells during a pipeline execution. For example, if it is set to 10, it means that pipeline executions with less than 10% erroneous cells will be considered successful and will not fail
schedule_config
Defines when the pipeline is executed for the first and last time, as well as the interval at which it is executed
frequency
Sets how often the pipeline is executed. It is intertwined with interval. For example, if frequency is set to HOURLY and interval is set to 2, the pipeline is executed every 2 hours:
- HOURLY
- DAILY
- WEEKLY
- MONTHLY
interval
Sets the interval based on the frequency at which the pipeline is executed. For example, if interval is set to 2 and frequency is set to HOURLY, the pipeline is executed every 2 hours. The next execution cannot be scheduled further into the future than 1 year from the set start date and time
starts_on
The date and time when the pipeline is first executed, provided as a timestamp in UTC (e.g. 2024-09-02T13:26:13.642Z). The date and time cannot be in the past
ends_on
The date and time when the pipeline is last executed, provided as a timestamp in UTC (e.g. 2024-09-02T13:26:13.642Z). This date and time cannot be earlier than the start date and time
header_config
Defines how the header row is determined
type
Specifies whether Ingestro's header detection is applied or if the set row_index is used to determine the header row:
- SMART: Ingestro's header detection is used to define the header row
- STATIC: The row at the specified row_index is used as the header row
row_index
The index of the row that should be used as the header row if type is set to STATIC
developer_mode
Defines if the pipeline is executed in developer mode (true) or not (false). Use the developer mode to test pipelines in your testing environment. Pipeline executions in developer mode are free of charge. Deactivate it for production use. Please note that pipelines executed in developer mode will only output 100 rows
created_at
The date and time when the pipeline was first created
created_by
Information about who created the pipeline
id
The ID of the user or sub-organization who created the pipeline
name
The name of the user or sub-organization who created the pipeline
identifier
The identifier of the user or sub-organization who created the pipeline
type
Defines the type of user who created the pipeline:
USER: A user of your organizationSUB_ORG: A sub-organization that is part of your organization
updated_at
The date and time when the pipeline was last updated
updated_by
Information about who last updated the pipeline
id
The ID of the user or sub-organization who last updated the pipeline
name
The name of the user or sub-organization who last updated the pipeline
identifier
The identifier of the user or sub-organization who last updated the pipeline
type
Defines the type of user who last updated the pipeline:
USER: A user of your organizationSUB_ORG: A sub-organization that is part of your organization
Response
{
"data": {
"id": "string",
"name": "string",
"active": true,
"draft": true,
"configuration": {
"input_connectors": [
"string"
],
"output_connectors": [
"string"
],
"mapping_config": {
"mode": "string",
"mappings": [
{
"source_columns": [
"string"
],
"target_column": "string",
"transformations": [
{
"name": "string",
"type": "HYPER_FORMULA",
"function": "string",
"prompt": "string"
}
]
}
]
},
"tdm": "string",
"error_config": {
"error_threshold": 0
},
"schedule_config": {
"frequency": "HOURLY",
"interval": 0,
"starts_on": "2024-08-28T15:18:27.477Z",
"ends_on": "2024-08-28T15:18:27.477Z"
},
"header_config": {
"type": "SMART",
"row_index": 0
},
"sheet_config": {
"selection_type": "NAME",
"sheet_name": "CRM"
},
"configuration_type": "PIPELINE",
"developer_mode": true
},
"created_at": "2022-03-07 12:48:28.653",
"created_by": {
"id": "string",
"name": "string",
"identifier": "string",
"type": "USER"
},
"updated_at": "2022-03-07 12:48:28.653",
"updated_by": {
"id": "string",
"name": "string",
"identifier": "string",
"type": "USER"
}
}
}
Example
curl -X 'PUT' 'https://api-gateway.ingestro.com/dp/api/v1/pipeline/${pipelineId}' \
-H 'accept: application/json' \
-H 'Authorization: Bearer ACCESS_TOKEN' \
-H 'Content-Type: application/json' \
-d '{
"name": "NEW NAME",
"configuration": {
"input_connectors": [
"INPUT_CONNECTOR_ID"
],
"output_connectors": [
"OUTPUT_CONNECTOR_ID"
],
"error_config": {
"error_threshold": 10
},
"schedule_config": {
"frequency": "WEEKLY",
"interval": 3,
"starts_on": "2025-03-17T18:21:47.332Z"
}
}
}'
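The same update can be issued from Python. This sketch builds the request with the standard library and assumes only the endpoint and headers shown in the curl call above; PIPELINE_ID and ACCESS_TOKEN are placeholders you must replace:

```python
import json
from urllib import request

payload = {
    "name": "NEW NAME",
    "configuration": {
        "input_connectors": ["INPUT_CONNECTOR_ID"],
        "output_connectors": ["OUTPUT_CONNECTOR_ID"],
        "error_config": {"error_threshold": 10},
        "schedule_config": {
            "frequency": "WEEKLY",
            "interval": 3,
            "starts_on": "2025-03-17T18:21:47.332Z",
        },
    },
}

req = request.Request(
    "https://api-gateway.ingestro.com/dp/api/v1/pipeline/PIPELINE_ID",
    data=json.dumps(payload).encode(),
    method="PUT",
    headers={
        "accept": "application/json",
        "Authorization": "Bearer ACCESS_TOKEN",
        "Content-Type": "application/json",
    },
)
# response = request.urlopen(req)  # uncomment with a real pipeline ID and token
```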
Sheet Selection Examples
When working with Excel (XLS, XLSX) or XML files that contain multiple sheets, use the sheet_config property to specify which sheet to process. You can select sheets either by their position (INDEX) or by their name (NAME). For single-sheet file types like CSV, TSV, or JSON, the sheet_config property is not needed and will be ignored if provided.
Example 1: Select Sheet by Position (INDEX)
This example shows how to always process the second sheet (index 1) in a multi-sheet Excel file.
curl -X 'PUT' 'https://api-gateway.ingestro.com/dp/api/v1/pipeline/{id}' \
-H 'accept: application/json' \
-H 'Authorization: Bearer ACCESS_TOKEN' \
-H 'Content-Type: application/json' \
-d '{
"name": "CRM Data Import",
"configuration": {
"input_connectors": ["connector_id"],
"output_connectors": ["connector_id"],
"tdm": "tdm_id",
"header_config": {
"type": "STATIC",
"row_index": 0
},
"sheet_config": {
"selection_type": "INDEX",
"sheet_index": 1
}
}
}'
Use case: When your data is always in the second sheet, regardless of what the sheet is named. This is useful when sheet names might change but the position remains constant.
Example 2: Select Sheet by Name (NAME)
This example shows how to always process the sheet named "CRM" in a multi-sheet Excel file.
curl -X 'PUT' 'https://api-gateway.ingestro.com/dp/api/v1/pipeline/{id}' \
-H 'accept: application/json' \
-H 'Authorization: Bearer ACCESS_TOKEN' \
-H 'Content-Type: application/json' \
-d '{
"name": "CRM Data Import",
"configuration": {
"input_connectors": ["connector_id"],
"output_connectors": ["connector_id"],
"tdm": "tdm_id",
"header_config": {
"type": "STATIC",
"row_index": 0
},
"sheet_config": {
"selection_type": "NAME",
"sheet_name": "CRM"
}
}
}'
Use case: When your data is always in a sheet with a specific name (e.g., "CRM", "Data", "Sales"). This is useful when the sheet name is consistent but its position might change.
Example 3: Pipeline Without Sheet Selection (Single-Sheet File)
For CSV, TSV, or JSON files, you don't need to specify sheet_config:
curl -X 'PUT' 'https://api-gateway.ingestro.com/dp/api/v1/pipeline/{id}' \
-H 'accept: application/json' \
-H 'Authorization: Bearer ACCESS_TOKEN' \
-H 'Content-Type: application/json' \
-d '{
"name": "CSV Data Import",
"configuration": {
"input_connectors": ["connector_id"],
"output_connectors": ["connector_id"],
"tdm": "tdm_id",
"header_config": {
"type": "STATIC",
"row_index": 0
}
}
}'
Important Notes About Sheet Selection:
- File Type Support:
  - Multi-sheet file types: XLS, XLSX, XML
  - Single-sheet file types: CSV, TSV, JSON
- Selection Type Guidelines:
  - Use INDEX when the sheet position is consistent but names may vary
  - Use NAME when sheet names are consistent but positions may vary
- Indexing:
  - Sheet indices are 0-based (first sheet = 0, second sheet = 1, etc.)
  - When displaying to users, convert to 1-based (1st, 2nd, 3rd, etc.)
- Error Handling:
  - If selection_type is INDEX and the specified index doesn't exist, the execution will fail
  - If selection_type is NAME and the specified sheet name doesn't exist, the execution will fail
  - Sheet names are case-sensitive
- Default Behavior:
  - If sheet_config is not provided for multi-sheet files, the first sheet (index 0) is used
  - For single-sheet files, sheet_config is ignored even if provided
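The 0-based to 1-based conversion for display can be sketched as follows; this helper is illustrative and not part of the API:

```python
def display_position(sheet_index: int) -> str:
    """Convert the API's 0-based sheet_index to a 1-based label for users."""
    n = sheet_index + 1
    # 11th, 12th, 13th take "th"; otherwise the last digit decides.
    suffix = {1: "st", 2: "nd", 3: "rd"}.get(n if n < 20 else n % 10, "th")
    return f"{n}{suffix} sheet"
```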
Read (by ID)
Endpoint
GET /pipeline/{id}
Response
Attributes
id
The ID of the pipeline
name
The name of the pipeline
active
Indicates whether the pipeline is set to active (true) or inactive (false) after creation. When a pipeline is active, it can be executed either by triggering the execution manually or on the set schedule. An inactive pipeline cannot be executed in any way
draft
Shows if the pipeline is in draft (true) or not (false). A pipeline in draft cannot be executed in any way.
configuration
Defines the specific setup of your pipeline
input_connectors
The list of all input connectors used for this pipeline. Currently, we only support one input connector per pipeline. Find out more about connectors here
output_connectors
The list of all output connectors used for this pipeline. Currently, we only support one output connector per pipeline. Find out more about connectors here
mapping_config
Defines how the input columns are mapped to the target data model columns and how their values are transformed to meet the requirements of the target data model
mode
Defines whether Ingestro AI is used to map input columns that haven’t been mapped yet to the output columns during future executions:
- DEFAULT: Ingestro AI is applied to unmapped input columns
- EXACT: Only already mapped columns are used
mappings
The list of all target data model columns with their mapped input columns and applied transformations
source_columns
The columns from the input data mapped to the target_column
target_column
An output column from the given target data model
transformations
The transformations applied to map the input columns to the output column in the correct format
name
The name of the applied transformation
type
The type of transformation applied:
- HYPER_FORMULA
- OPTION_MAPPING
function
The code or formula of the transformation, provided as a string
prompt
The prompt used to generate the transformation
tdm
The ID of the set target data model
error_config
Defines how the pipeline should handle errors that might occur during pipeline execution
error_threshold
A float between 0 and 100, representing the allowed percentage of erroneous cells during a pipeline execution. For example, if it is set to 10, it means that pipeline executions with less than 10% erroneous cells will be considered successful and will not fail
schedule_config
Defines when the pipeline is executed for the first and last time, as well as the interval at which it is executed
frequency
Sets how often the pipeline is executed. It is intertwined with interval. For example, if frequency is set to HOURLY and interval is set to 2, the pipeline is executed every 2 hours:
- HOURLY
- DAILY
- WEEKLY
- MONTHLY
interval
Sets the interval based on the frequency at which the pipeline is executed. For example, if interval is set to 2 and frequency is set to HOURLY, the pipeline is executed every 2 hours. The next execution cannot be scheduled further into the future than 1 year from the set start date and time
starts_on
The date and time when the pipeline is first executed, provided as a timestamp in UTC (e.g. 2024-09-02T13:26:13.642Z). The date and time cannot be in the past
ends_on
The date and time when the pipeline is last executed, provided as a timestamp in UTC (e.g. 2024-09-02T13:26:13.642Z). This date and time cannot be earlier than the start date and time
header_config
Defines how the header row is determined
type
Specifies whether Ingestro's header detection is applied or if the set row_index is used to determine the header row:
- SMART: Ingestro's header detection is used to define the header row
- STATIC: The row at the specified row_index is used as the header row
row_index
The index of the row that should be used as the header row if type is set to STATIC
developer_mode
Defines if the pipeline is executed in developer mode (true) or not (false). Use the developer mode to test pipelines in your testing environment. Pipeline executions in developer mode are free of charge. Deactivate it for production use. Please note that pipelines executed in developer mode will only output 100 rows.
created_at
The date and time when the pipeline was first created
created_by
Information about who created the pipeline
id
The ID of the user or sub-organization who created the pipeline
name
The name of the user or sub-organization who created the pipeline
identifier
The identifier of the user or sub-organization who created the pipeline
type
Defines the type of user who created the pipeline:
USER: A user of your organizationSUB_ORG: A sub-organization that is part of your organization
updated_at
The date and time when the pipeline was last updated
updated_by
Information about who last updated the pipeline
id
The ID of the user or sub-organization who last updated the pipeline
name
The name of the user or sub-organization who last updated the pipeline
identifier
The identifier of the user or sub-organization who last updated the pipeline
type
Defines the type of user who last updated the pipeline:
USER: A user of your organizationSUB_ORG: A sub-organization that is part of your organization
Response
{
"data": {
"id": "string",
"name": "string",
"active": true,
"draft": false,
"configuration": {
"developer_mode": true,
"input_connectors": [
"string"
],
"output_connectors": [
"string"
],
"tdm": "string",
"header_config": {
"type": "SMART",
"row_index": 0
},
"sheet_config": {
"selection_type": "NAME",
"sheet_name": "CRM"
},
"mapping_config": {
"mode": "DEFAULT",
"mappings": [
{
"source_columns": [
"string"
],
"target_column": "string",
"transformations": [
{
"name": "string",
"type": "HYPER_FORMULA",
"function": "string",
"prompt": "string"
}
]
}
]
},
"error_config": {
"error_threshold": 0
}
},
"created_at": "2022-03-07 12:48:28.653",
"created_by": {
"id": "string",
"name": "string",
"identifier": "string",
"type": "USER"
},
"updated_at": "2022-03-07 12:48:28.653",
"updated_by": {
"id": "string",
"name": "string",
"identifier": "string",
"type": "USER"
}
}
}
Example
curl -X 'GET' 'https://api-gateway.ingestro.com/dp/api/v1/pipeline/${pipelineId}' \
-H 'accept: application/json' \
-H 'Authorization: Bearer ACCESS_TOKEN'
Read (all)
To further refine the response, you can use query parameters such as sort, filters, pagination, and options. See a more detailed explanation here.
Endpoint
GET /pipeline/
Response
Attributes
id
The ID of the pipeline
name
The name of the pipeline
active
Indicates whether the pipeline is set to active (true) or inactive (false) after creation. When a pipeline is active, it can be executed either by triggering the execution manually or on the set schedule. An inactive pipeline cannot be executed in any way
draft
Shows if the pipeline is in draft (true) or not (false). A pipeline in draft cannot be executed in any way
created_at
The date and time when the pipeline was first created
created_by
Information about who created the pipeline
id
The ID of the user or sub-organization who created the pipeline
name
The name of the user or sub-organization who created the pipeline
identifier
The identifier of the user or sub-organization who created the pipeline
type
Defines the type of user who created the pipeline:
USER: A user of your organizationSUB_ORG: A sub-organization that is part of your organization
updated_at
The date and time when the pipeline was last updated
updated_by
Information about who last updated the pipeline
id
The ID of the user or sub-organization who last updated the pipeline
name
The name of the user or sub-organization who last updated the pipeline
identifier
The identifier of the user or sub-organization who last updated the pipeline
type
Defines the type of user who last updated the pipeline:
USER: A user of your organizationSUB_ORG: A sub-organization that is part of your organization
pagination
An object containing metadata about the result
total
The number of entries in the data array
offset
The offset set in the request parameters
limit
The limit set in the request parameters
Response
{
"data": [
{
"id": "string",
"name": "test",
"active": true,
"draft": false,
"created_at": "2022-03-07 12:48:28.653",
"created_by": {
"id": "string",
"name": "string",
"identifier": "string",
"type": "USER"
},
"updated_at": "2022-03-07 12:48:28.653",
"updated_by": {
"id": "string",
"name": "string",
"identifier": "string",
"type": "USER"
}
}
],
"pagination": {
"total": 0,
"offset": 0,
"limit": 0
}
}
Example
curl -X 'GET' 'https://api-gateway.ingestro.com/dp/api/v1/pipeline' \
-H 'accept: application/json' \
-H 'Authorization: Bearer ACCESS_TOKEN'
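To walk the full collection, advance offset by limit until a page comes back short. This sketch uses a pluggable fetch function so it is testable without the network; the helper and its name are illustrative, and only the response shape comes from this page:

```python
def iter_pipelines(fetch_page, limit=50):
    """Yield every pipeline by paging through GET /pipeline/.

    `fetch_page(offset, limit)` stands in for the HTTP call and must return
    the documented {"data": [...], "pagination": {...}} shape.
    """
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page["data"]
        if len(page["data"]) < limit:
            break  # a short page means we have reached the end
        offset += limit
```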
Delete
Endpoint
DELETE /pipeline/{id}
Response
Attributes
message
Message confirming the deletion of the pipeline or providing an error message
Response
{
"data": {
"message": "string"
}
}
Example
curl -X 'DELETE' 'https://api-gateway.ingestro.com/dp/api/v1/pipeline/${pipelineId}' \
-H 'accept: application/json' \
-H 'Authorization: Bearer ACCESS_TOKEN'