Pipeline API
A pipeline is the combination of all essential components, such as the input and output connectors and the target data model. It enables you to automate importing data into your target system.
Do you also want to create pipelines within your system? Currently, you cannot create pipelines via our API. If you want to create pipelines, you can do the following:
- use our CreatePipeline embeddable component by checking our guide
- use the Ingestro User Platform
Use this base URL and append the corresponding endpoint:
Base URL
api-gateway.ingestro.com/dp/api/v1/
Update
Endpoint
PUT /pipeline/{id}
Payload
Attributes
name
The name of the pipeline
configuration
Defines the specific setup of your pipeline
input_connectors
The list of all input connectors used for this pipeline. Currently, we only support one input connector per pipeline. Find out more about connectors here
output_connectors
The list of all output connectors used for this pipeline. Currently, we only support one output connector per pipeline. Find out more about connectors here
mapping_config
Defines how the input columns are mapped to the target data model columns and how their values are transformed to meet the requirements of the target data model
mode
Defines whether Ingestro AI is used to map input columns that haven’t been mapped yet to the output columns during future executions:
- DEFAULT: Ingestro AI is applied to unmapped input columns
- EXACT: Only already mapped columns are used
mappings
The list of all target data model columns with their mapped input columns and applied transformations
source_columns
The columns from the input data mapped to the target_column
target_column
An output column from the given target data model
transformations
The transformations applied to map the input columns to the output column in the correct format
name
The name of the applied transformation
type
The type of transformation applied:
- HYPER_FORMULA
- OPTION_MAPPING
function
The code or formula of the transformation, provided as a string
prompt
The prompt used to generate the transformation
tdm
The ID of the set target data model
error_config
Defines how the pipeline should handle errors that might occur during pipeline execution
error_threshold
A float between 0 and 100, representing the allowed percentage of erroneous cells during a pipeline execution. For example, if it is set to 10, it means that pipeline executions with less than 10% erroneous cells will be considered successful and will not fail.
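The threshold comparison above can be sketched as a small check. The strict less-than follows the wording of this page; the helper itself and its name are illustrative, not part of the API:

```python
def execution_passes(erroneous_cells: int, total_cells: int, error_threshold: float) -> bool:
    """Return True if the execution stays under the configured threshold.

    Mirrors the documented rule: a run with *less than* `error_threshold`
    percent erroneous cells is considered successful.
    """
    error_rate = 100.0 * erroneous_cells / total_cells
    return error_rate < error_threshold

# With error_threshold = 10: a run with 9% erroneous cells passes,
# a run with exactly 10% does not.
```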
schedule_config
Defines when the pipeline is executed for the first and last time, as well as the interval at which it is executed
frequency
Sets how often the pipeline is executed. It is intertwined with interval. For example, if frequency is set to HOURLY and interval is set to 2, the pipeline is executed every 2 hours:
- HOURLY
- DAILY
- WEEKLY
- MONTHLY
interval
Sets the interval based on the frequency at which the pipeline is executed. For example, if interval is set to 2 and frequency is set to HOURLY, the pipeline is executed every 2 hours. The next execution cannot be scheduled further into the future than 1 year from the set start date and time
starts_on
The date and time when the pipeline is first executed, provided as a timestamp in UTC (e.g. 2024-09-02T13:26:13.642Z). The date and time cannot be in the past
ends_on
The date and time when the pipeline is last executed, provided as a timestamp in UTC (e.g. 2024-09-02T13:26:13.642Z). This date and time cannot be earlier than the start date and time
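How frequency, interval, starts_on, and ends_on combine can be sketched as follows. This is an illustrative helper, not the scheduler itself; MONTHLY is omitted because it requires calendar arithmetic rather than a fixed step:

```python
from datetime import datetime, timedelta

# Fixed step per frequency (MONTHLY omitted; it needs calendar math).
STEP = {
    "HOURLY": timedelta(hours=1),
    "DAILY": timedelta(days=1),
    "WEEKLY": timedelta(weeks=1),
}

def executions(starts_on: datetime, ends_on: datetime, frequency: str, interval: int):
    """Yield the UTC timestamps at which the pipeline would run."""
    step = STEP[frequency] * interval
    current = starts_on
    while current <= ends_on:
        yield current
        current += step

# frequency=HOURLY, interval=2 -> one run every 2 hours
runs = list(executions(datetime(2024, 9, 2, 13, 0), datetime(2024, 9, 2, 19, 0), "HOURLY", 2))
# 13:00, 15:00, 17:00, 19:00
```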
header_config
Defines how the header row is determined
type
Specifies whether Ingestro's header detection is applied or if the set row_index is used to determine the header row:
- SMART: Ingestro's header detection is used to define the header row
- STATIC: The row at the specified row_index is used as the header row
row_index
The index of the row that should be used as the header row if type is set to STATIC
sheet_config
Defines which sheet to process when working with multi-sheet file types (XLS, XLSX, XML). This configuration is only applicable for multi-sheet files and is ignored for single-sheet file types (CSV, TSV, JSON).
When to use:
- Your input connector uses XLS, XLSX, or XML files with multiple sheets
- You want to specify which sheet contains the data to process
- You need consistent sheet selection across multiple executions
Behavior:
- For single-sheet file types (CSV, TSV, JSON): sheet_config is ignored
- For multi-sheet file types with only one sheet: sheet_config is optional (defaults to the first sheet)
- For multi-sheet files with multiple sheets: sheet_config determines which sheet to use
selection_type
Defines the strategy for selecting which sheet to process:
- INDEX: Select sheet by its position (0-indexed). Use this when you always want to process the sheet at a specific position (e.g., always the 2nd sheet)
- NAME: Select sheet by its name. Use this when you always want to process a sheet with a specific name (e.g., always the "CRM" sheet)
sheet_index
The 0-based index of the sheet to process. Required when selection_type is INDEX.
Examples:
- 0 = First sheet
- 1 = Second sheet
- 2 = Third sheet
Important: The backend uses 0-indexing. The first sheet is at index 0, not 1.
sheet_name
The name of the sheet to process. Required when selection_type is NAME.
Examples:
- "Sheet1" = Process the sheet named "Sheet1"
- "CRM" = Process the sheet named "CRM"
- "Data" = Process the sheet named "Data"
Important: Sheet names are case-sensitive. The execution will fail if a sheet with the specified name is not found in the file.
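The two selection strategies can be sketched as small payload builders. The helper names are hypothetical; only the sheet_config shapes come from this page:

```python
def sheet_config_by_name(name: str) -> dict:
    """NAME selection: sheet names are case-sensitive."""
    return {"selection_type": "NAME", "sheet_name": name}

def sheet_config_by_index(index: int) -> dict:
    """INDEX selection: 0-based, so the second sheet is index 1."""
    if index < 0:
        raise ValueError("sheet_index must be >= 0")
    return {"selection_type": "INDEX", "sheet_index": index}
```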
developer_mode
Defines if the pipeline is executed in developer mode (true) or not (false). Use the developer mode to test pipelines in your testing environment. Pipeline executions in developer mode are free of charge. Deactivate it for production use. Please note that pipelines executed in developer mode will only output 100 rows
active
Indicates whether the pipeline is set to active (true) or inactive (false) after creation. When a pipeline is active, it can be executed either by triggering the execution manually or on the set schedule. An inactive pipeline cannot be executed in any way
Payload
{
"name": "string",
"configuration": {
"input_connectors": [
"string"
],
"output_connectors": [
"string"
],
"mapping_config": {
"mode": "DEFAULT",
"mappings": [
{
"source_columns": [
"string"
],
"target_column": "string",
"transformations": [
{
"name": "string",
"type": "HYPER_FORMULA",
"function": "string",
"prompt": "string"
}
]
}
]
},
"tdm": "string",
"error_config": {
"error_threshold": 0
},
"schedule_config": {
"frequency": "HOURLY",
"interval": 0,
"starts_on": "2024-09-02T13:26:13.642Z",
"ends_on": "2024-09-02T13:26:13.642Z"
},
"header_config": {
"type": "SMART",
"row_index": 0
},
"sheet_config": {
"selection_type": "NAME",
"sheet_name": "CRM"
},
"developer_mode": false
},
"active": false
}
Response
Attributes
id
The ID of the pipeline
name
The name of the pipeline
active
Indicates whether the pipeline is set to active (true) or inactive (false) after creation. When a pipeline is active, it can be executed either by triggering the execution manually or on the set schedule. An inactive pipeline cannot be executed in any way
draft
Shows if the pipeline is in draft (true) or not (false). A pipeline in draft cannot be executed in any way
configuration
Defines the specific setup of your pipeline
input_connectors
The list of all input connectors used for this pipeline. Currently, we only support one input connector per pipeline. Find out more about connectors here
output_connectors
The list of all output connectors used for this pipeline. Currently, we only support one output connector per pipeline. Find out more about connectors here
mapping_config
Defines how the input columns are mapped to the target data model columns and how their values are transformed to meet the requirements of the target data model
mode
Defines whether Ingestro AI is used to map input columns that haven’t been mapped yet to the output columns during future executions:
- DEFAULT: Ingestro AI is applied to unmapped input columns
- EXACT: Only already mapped columns are used
mappings
The list of all target data model columns with their mapped input columns and applied transformations
source_columns
The columns from the input data mapped to the target_column
target_column
An output column from the given target data model
transformations
The transformations applied to map the input columns to the output column in the correct format
name
The name of the applied transformation
type
The type of transformation applied:
- HYPER_FORMULA
- OPTION_MAPPING
function
The code or formula of the transformation, provided as a string
prompt
The prompt used to generate the transformation
tdm
The ID of the set target data model
error_config
Defines how the pipeline should handle errors that might occur during pipeline execution
error_threshold
A float between 0 and 100, representing the allowed percentage of erroneous cells during a pipeline execution. For example, if it is set to 10, it means that pipeline executions with less than 10% erroneous cells will be considered successful and will not fail
schedule_config
Defines when the pipeline is executed for the first and last time, as well as the interval at which it is executed
frequency
Sets how often the pipeline is executed. It is intertwined with interval. For example, if frequency is set to HOURLY and interval is set to 2, the pipeline is executed every 2 hours:
- HOURLY
- DAILY
- WEEKLY
- MONTHLY
interval
Sets the interval based on the frequency at which the pipeline is executed. For example, if interval is set to 2 and frequency is set to HOURLY, the pipeline is executed every 2 hours. The next execution cannot be scheduled further into the future than 1 year from the set start date and time
starts_on
The date and time when the pipeline is first executed, provided as a timestamp in UTC (e.g. 2024-09-02T13:26:13.642Z). The date and time cannot be in the past
ends_on
The date and time when the pipeline is last executed, provided as a timestamp in UTC (e.g. 2024-09-02T13:26:13.642Z). This date and time cannot be earlier than the start date and time
header_config
Defines how the header row is determined
type
Specifies whether Ingestro's header detection is applied or if the set row_index is used to determine the header row:
- SMART: Ingestro's header detection is used to define the header row
- STATIC: The row at the specified row_index is used as the header row
row_index
The index of the row that should be used as the header row if type is set to STATIC
developer_mode
Defines if the pipeline is executed in developer mode (true) or not (false). Use the developer mode to test pipelines in your testing environment. Pipeline executions in developer mode are free of charge. Deactivate it for production use. Please note that pipelines executed in developer mode will only output 100 rows
created_at
The date and time when the pipeline was first created
created_by
Information about who created the pipeline
id
The ID of the user or sub-organization who created the pipeline
name
The name of the user or sub-organization who created the pipeline
identifier
The identifier of the user or sub-organization who created the pipeline
type
Defines the type of user who created the pipeline:
USER: A user of your organizationSUB_ORG: A sub-organization that is part of your organization
updated_at
The date and time when the pipeline was last updated
updated_by
Information about who last updated the pipeline
id
The ID of the user or sub-organization who last updated the pipeline
name
The name of the user or sub-organization who last updated the pipeline
identifier
The identifier of the user or sub-organization who last updated the pipeline
type
Defines the type of user who last updated the pipeline:
USER: A user of your organizationSUB_ORG: A sub-organization that is part of your organization
Response
{
"data": {
"id": "string",
"name": "string",
"active": true,
"draft": true,
"configuration": {
"input_connectors": [
"string"
],
"output_connectors": [
"string"
],
"mapping_config": {
"mode": "string",
"mappings": [
{
"source_columns": [
"string"
],
"target_column": "string",
"transformations": [
{
"name": "string",
"type": "HYPER_FORMULA",
"function": "string",
"prompt": "string"
}
]
}
]
},
"tdm": "string",
"error_config": {
"error_threshold": 0
},
"schedule_config": {
"frequency": "HOURLY",
"interval": 0,
"starts_on": "2024-08-28T15:18:27.477Z",
"ends_on": "2024-08-28T15:18:27.477Z"
},
"header_config": {
"type": "SMART",
"row_index": 0
},
"sheet_config": {
"selection_type": "NAME",
"sheet_name": "CRM"
},
"configuration_type": "PIPELINE",
"developer_mode": true
},
"created_at": "2022-03-07 12:48:28.653",
"created_by": {
"id": "string",
"name": "string",
"identifier": "string",
"type": "USER"
},
"updated_at": "2022-03-07 12:48:28.653",
"updated_by": {
"id": "string",
"name": "string",
"identifier": "string",
"type": "USER"
}
}
}
Example
curl -X 'PUT' 'https://api-gateway.ingestro.com/dp/api/v1/pipeline/${pipelineId}' \
-H 'accept: application/json' \
-H 'Authorization: Bearer ACCESS_TOKEN' \
-H 'Content-Type: application/json' \
-d '{
"name": "NEW NAME",
"configuration": {
"input_connectors": [
"INPUT_CONNECTOR_ID"
],
"output_connectors": [
"OUTPUT_CONNECTOR_ID"
],
"error_config": {
"error_threshold": 10
},
"schedule_config": {
"frequency": "WEEKLY",
"interval": 3,
"starts_on": "2025-03-17T18:21:47.332Z"
}
}
}'
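The same update can be issued from Python. This sketch builds the request with the standard library and assumes only the endpoint and headers shown in the curl call above; PIPELINE_ID and ACCESS_TOKEN are placeholders you must replace:

```python
import json
from urllib import request

payload = {
    "name": "NEW NAME",
    "configuration": {
        "input_connectors": ["INPUT_CONNECTOR_ID"],
        "output_connectors": ["OUTPUT_CONNECTOR_ID"],
        "error_config": {"error_threshold": 10},
        "schedule_config": {
            "frequency": "WEEKLY",
            "interval": 3,
            "starts_on": "2025-03-17T18:21:47.332Z",
        },
    },
}

req = request.Request(
    "https://api-gateway.ingestro.com/dp/api/v1/pipeline/PIPELINE_ID",
    data=json.dumps(payload).encode(),
    method="PUT",
    headers={
        "accept": "application/json",
        "Authorization": "Bearer ACCESS_TOKEN",
        "Content-Type": "application/json",
    },
)
# response = request.urlopen(req)  # uncomment with a real pipeline ID and token
```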
Sheet Selection Examples
When working with Excel (XLS, XLSX) or XML files that contain multiple sheets, use the sheet_config property to specify which sheet to process. You can select sheets either by their position (INDEX) or by their name (NAME). For single-sheet file types like CSV, TSV, or JSON, the sheet_config property is not needed and will be ignored if provided.
Example 1: Select Sheet by Position (INDEX)
This example shows how to always process the second sheet (index 1) in a multi-sheet Excel file.
curl -X 'PUT' 'https://api-gateway.ingestro.com/dp/api/v1/pipeline/{id}' \
-H 'accept: application/json' \
-H 'Authorization: Bearer ACCESS_TOKEN' \
-H 'Content-Type: application/json' \
-d '{
"name": "CRM Data Import",
"configuration": {
"input_connectors": ["connector_id"],
"output_connectors": ["connector_id"],
"tdm": "tdm_id",
"header_config": {
"type": "STATIC",
"row_index": 0
},
"sheet_config": {
"selection_type": "INDEX",
"sheet_index": 1
}
}
}'
Use case: When your data is always in the second sheet, regardless of what the sheet is named. This is useful when sheet names might change but the position remains constant.
Example 2: Select Sheet by Name (NAME)
This example shows how to always process the sheet named "CRM" in a multi-sheet Excel file.
curl -X 'PUT' 'https://api-gateway.ingestro.com/dp/api/v1/pipeline/{id}' \
-H 'accept: application/json' \
-H 'Authorization: Bearer ACCESS_TOKEN' \
-H 'Content-Type: application/json' \
-d '{
"name": "CRM Data Import",
"configuration": {
"input_connectors": ["connector_id"],
"output_connectors": ["connector_id"],
"tdm": "tdm_id",
"header_config": {
"type": "STATIC",
"row_index": 0
},
"sheet_config": {
"selection_type": "NAME",
"sheet_name": "CRM"
}
}
}'
Use case: When your data is always in a sheet with a specific name (e.g., "CRM", "Data", "Sales"). This is useful when the sheet name is consistent but its position might change.
Example 3: Pipeline Without Sheet Selection (Single-Sheet File)
For CSV, TSV, or JSON files, you don't need to specify sheet_config:
curl -X 'PUT' 'https://api-gateway.ingestro.com/dp/api/v1/pipeline/{id}' \
-H 'accept: application/json' \
-H 'Authorization: Bearer ACCESS_TOKEN' \
-H 'Content-Type: application/json' \
-d '{
"name": "CSV Data Import",
"configuration": {
"input_connectors": ["connector_id"],
"output_connectors": ["connector_id"],
"tdm": "tdm_id",
"header_config": {
"type": "STATIC",
"row_index": 0
}
}
}'
Important Notes About Sheet Selection:
- File Type Support:
  - Multi-sheet file types: XLS, XLSX, XML
  - Single-sheet file types: CSV, TSV, JSON
- Selection Type Guidelines:
  - Use INDEX when the sheet position is consistent but names may vary
  - Use NAME when sheet names are consistent but positions may vary
- Indexing:
  - Sheet indices are 0-based (first sheet = 0, second sheet = 1, etc.)
  - When displaying to users, convert to 1-based (1st, 2nd, 3rd, etc.)
- Error Handling:
  - If selection_type is INDEX and the specified index doesn't exist, the execution will fail
  - If selection_type is NAME and the specified sheet name doesn't exist, the execution will fail
  - Sheet names are case-sensitive
- Default Behavior:
  - If sheet_config is not provided for multi-sheet files, the first sheet (index 0) is used
  - For single-sheet files, sheet_config is ignored even if provided
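The 0-based to 1-based conversion for display can be sketched as follows; this helper is illustrative and not part of the API:

```python
def display_position(sheet_index: int) -> str:
    """Convert the API's 0-based sheet_index to a 1-based label for users."""
    n = sheet_index + 1
    # 11th, 12th, 13th take "th"; otherwise the last digit decides.
    suffix = {1: "st", 2: "nd", 3: "rd"}.get(n if n < 20 else n % 10, "th")
    return f"{n}{suffix} sheet"
```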
Read (by ID)
Endpoint
GET /pipeline/{id}
Response
Attributes
id
The ID of the pipeline
name
The name of the pipeline
active
Indicates whether the pipeline is set to active (true) or inactive (false) after creation. When a pipeline is active, it can be executed either by triggering the execution manually or on the set schedule. An inactive pipeline cannot be executed in any way
draft
Shows if the pipeline is in draft (true) or not (false). A pipeline in draft cannot be executed in any way.
configuration
Defines the specific setup of your pipeline
input_connectors
The list of all input connectors used for this pipeline. Currently, we only support one input connector per pipeline. Find out more about connectors here
output_connectors
The list of all output connectors used for this pipeline. Currently, we only support one output connector per pipeline. Find out more about connectors here
mapping_config
Defines how the input columns are mapped to the target data model columns and how their values are transformed to meet the requirements of the target data model
mode
Defines whether Ingestro AI is used to map input columns that haven’t been mapped yet to the output columns during future executions:
- DEFAULT: Ingestro AI is applied to unmapped input columns
- EXACT: Only already mapped columns are used
mappings
The list of all target data model columns with their mapped input columns and applied transformations
source_columns
The columns from the input data mapped to the target_column
target_column
An output column from the given target data model
transformations
The transformations applied to map the input columns to the output column in the correct format
name
The name of the applied transformation
type
The type of transformation applied:
- HYPER_FORMULA
- OPTION_MAPPING
function
The code or formula of the transformation, provided as a string
prompt
The prompt used to generate the transformation
tdm
The ID of the set target data model
error_config
Defines how the pipeline should handle errors that might occur during pipeline execution
error_threshold
A float between 0 and 100, representing the allowed percentage of erroneous cells during a pipeline execution. For example, if it is set to 10, it means that pipeline executions with less than 10% erroneous cells will be considered successful and will not fail
schedule_config
Defines when the pipeline is executed for the first and last time, as well as the interval at which it is executed
frequency
Sets how often the pipeline is executed. It is intertwined with interval. For example, if frequency is set to HOURLY and interval is set to 2, the pipeline is executed every 2 hours:
- HOURLY
- DAILY
- WEEKLY
- MONTHLY
interval
Sets the interval based on the frequency at which the pipeline is executed. For example, if interval is set to 2 and frequency is set to HOURLY, the pipeline is executed every 2 hours. The next execution cannot be scheduled further into the future than 1 year from the set start date and time
starts_on
The date and time when the pipeline is first executed, provided as a timestamp in UTC (e.g. 2024-09-02T13:26:13.642Z). The date and time cannot be in the past
ends_on
The date and time when the pipeline is last executed, provided as a timestamp in UTC (e.g. 2024-09-02T13:26:13.642Z). This date and time cannot be earlier than the start date and time
header_config
Defines how the header row is determined
type
Specifies whether Ingestro's header detection is applied or if the set row_index is used to determine the header row:
- SMART: Ingestro's header detection is used to define the header row
- STATIC: The row at the specified row_index is used as the header row
row_index
The index of the row that should be used as the header row if type is set to STATIC
developer_mode
Defines if the pipeline is executed in developer mode (true) or not (false). Use the developer mode to test pipelines in your testing environment. Pipeline executions in developer mode are free of charge. Deactivate it for production use. Please note that pipelines executed in developer mode will only output 100 rows.
created_at
The date and time when the pipeline was first created
created_by
Information about who created the pipeline
id
The ID of the user or sub-organization who created the pipeline
name
The name of the user or sub-organization who created the pipeline
identifier
The identifier of the user or sub-organization who created the pipeline
type
Defines the type of user who created the pipeline:
USER: A user of your organizationSUB_ORG: A sub-organization that is part of your organization
updated_at
The date and time when the pipeline was last updated
updated_by
Information about who last updated the pipeline
id
The ID of the user or sub-organization who last updated the pipeline
name
The name of the user or sub-organization who last updated the pipeline
identifier
The identifier of the user or sub-organization who last updated the pipeline
type
Defines the type of user who last updated the pipeline:
USER: A user of your organizationSUB_ORG: A sub-organization that is part of your organization
Response
{
"data": {
"id": "string",
"name": "string",
"active": true,
"draft": false,
"configuration": {
"developer_mode": true,
"input_connectors": [
"string"
],
"output_connectors": [
"string"
],
"tdm": "string",
"header_config": {
"type": "SMART",
"row_index": 0
},
"sheet_config": {
"selection_type": "NAME",
"sheet_name": "CRM"
},
"mapping_config": {
"mode": "DEFAULT",
"mappings": [
{
"source_columns": [
"string"
],
"target_column": "string",
"transformations": [
{
"name": "string",
"type": "HYPER_FORMULA",
"function": "string",
"prompt": "string"
}
]
}
]
},
"error_config": {
"error_threshold": 0
}
},
"created_at": "2022-03-07 12:48:28.653",
"created_by": {
"id": "string",
"name": "string",
"identifier": "string",
"type": "USER"
},
"updated_at": "2022-03-07 12:48:28.653",
"updated_by": {
"id": "string",
"name": "string",
"identifier": "string",
"type": "USER"
}
}
}
Example
curl -X 'GET' 'https://api-gateway.ingestro.com/dp/api/v1/pipeline/${pipelineId}' \
-H 'accept: application/json' \
-H 'Authorization: Bearer ACCESS_TOKEN'
Read (all)
To further refine the response, you can use query parameters such as sort, filters, pagination, and options. See a more detailed explanation here.
Endpoint
GET /pipeline/
Response
Attributes
id
The ID of the pipeline
name
The name of the pipeline
active
Indicates whether the pipeline is set to active (true) or inactive (false) after creation. When a pipeline is active, it can be executed either by triggering the execution manually or on the set schedule. An inactive pipeline cannot be executed in any way
draft
Shows if the pipeline is in draft (true) or not (false). A pipeline in draft cannot be executed in any way
created_at
The date and time when the pipeline was first created
created_by
Information about who created the pipeline
id
The ID of the user or sub-organization who created the pipeline
name
The name of the user or sub-organization who created the pipeline
identifier
The identifier of the user or sub-organization who created the pipeline
type
Defines the type of user who created the pipeline:
USER: A user of your organizationSUB_ORG: A sub-organization that is part of your organization
updated_at
The date and time when the pipeline was last updated
updated_by
Information about who last updated the pipeline
id
The ID of the user or sub-organization who last updated the pipeline
name
The name of the user or sub-organization who last updated the pipeline
identifier
The identifier of the user or sub-organization who last updated the pipeline
type
Defines the type of user who last updated the pipeline:
USER: A user of your organizationSUB_ORG: A sub-organization that is part of your organization
pagination
An object containing metadata about the result
total
The number of entries in the data array
offset
The offset set in the request parameters
limit
The limit set in the request parameters
Response
{
"data": [
{
"id": "string",
"name": "test",
"active": true,
"draft": false,
"created_at": "2022-03-07 12:48:28.653",
"created_by": {
"id": "string",
"name": "string",
"identifier": "string",
"type": "USER"
},
"updated_at": "2022-03-07 12:48:28.653",
"updated_by": {
"id": "string",
"name": "string",
"identifier": "string",
"type": "USER"
}
}
],
"pagination": {
"total": 0,
"offset": 0,
"limit": 0
}
}
Example
curl -X 'GET' 'https://api-gateway.ingestro.com/dp/api/v1/pipeline' \
-H 'accept: application/json' \
-H 'Authorization: Bearer ACCESS_TOKEN'
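To walk the full collection, advance offset by limit until a page comes back short. This sketch uses a pluggable fetch function so it is testable without the network; the helper and its name are illustrative, and only the response shape comes from this page:

```python
def iter_pipelines(fetch_page, limit=50):
    """Yield every pipeline by paging through GET /pipeline/.

    `fetch_page(offset, limit)` stands in for the HTTP call and must return
    the documented {"data": [...], "pagination": {...}} shape.
    """
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page["data"]
        if len(page["data"]) < limit:
            break  # a short page means we have reached the end
        offset += limit
```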
Delete
Endpoint
DELETE /pipeline/{id}
Response
Attributes
message
Message confirming the deletion of the pipeline or providing an error message
Response
{
"data": {
"message": "string"
}
}
Example
curl -X 'DELETE' 'https://api-gateway.ingestro.com/dp/api/v1/pipeline/${pipelineId}' \
-H 'accept: application/json' \
-H 'Authorization: Bearer ACCESS_TOKEN'