Python Processor

Version: 3.1
Agent restriction: None
Processing type: Row by row & Bulk
Multi-input step: Not Supported

The Python Processor lets you run Python scripts on the CPython runtime (version 3.11). Although the Python ecosystem is very large, for security reasons your code can import only a limited yet powerful set of modules. The permitted standard-library modules appear under "STD" in the list of allowed imports below.

We provide the following 3rd party modules:

pandas (2.0.3) Powerful data structures for data analysis, time series, and statistics
numpy (1.25.2) Fundamental package for array computing in Python
PyYAML (6.0.1) For parsing & building YAML content
openai (0.28.0) Client library for the OpenAI API
deepdiff (6.3.1) Deep difference and search of any Python object/data
python-jose[cryptography] (3.3.0) JOSE implementation in Python
passlib (1.7.4) Comprehensive password hashing framework supporting over 30 schemes
httpx (0.25.0) Fully featured HTTP client library
matplotlib (3.8.2) Comprehensive library for creating static and animated visualizations in Python

Allowed imports are:

(
    #
    # STD
    "string",
    "math",
    "itertools",
    "random",
    "warnings",
    "base64",
    "io",
    "json",
    "xml",
    "ssl",
    "time",
    "datetime",
    #
    # 3RD PARTY
    "yaml",
    "httpx",
    "pandas",
    "numpy",
    "deepdiff",
    "passlib.hash",
    "jose",
    "jose.backends",
    "jose.constants",
    "jose.utils",
    "openai",
    "matplotlib",
    "matplotlib.pyplot",
)

Imports are restricted to the modules listed above. Anything else, such as the os module or the sys module, cannot be imported. The following example shows both allowed and disallowed imports:

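import string                       # allowed
import pandas as pd                 # allowed
from json import dumps              # allowed
from xml import etree               # allowed: xml is listed
import os                           # not allowed
import sys                          # not allowed
from xml.etree import ElementTree   # not allowed: xml.etree is not listed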

As the example shows, the xml module can be imported, but the xml.etree module cannot, because xml.etree is not in the list of allowed imports. In general, if module a is listed, you can import a but not a.b; a.b can be imported only if a.b itself is listed. The restricted Python Processor also does not allow type hints. For example:

number = 1        # works
number: int = 1   # fails in the Python Processor

The line with the type hint fails with a syntax error in the Python Processor, although it would work in the standard Python runtime. Remember to remove all type hints from your code before running it in the Python Processor.
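The exact-match import rule described above can be illustrated with a short sketch. The `is_import_allowed` helper and the shortened `ALLOWED_IMPORTS` tuple are hypothetical names written only to show the rule; this is not the platform's actual implementation.

```python
# Hypothetical excerpt of the allowed-imports list, for illustration only.
ALLOWED_IMPORTS = ("json", "xml", "jose", "jose.utils")


def is_import_allowed(module_name, allowed=ALLOWED_IMPORTS):
    """Return True only when the dotted module name is listed verbatim."""
    # Exact match: listing "xml" does not implicitly allow "xml.etree".
    return module_name in allowed


# Check a few module names against the hypothetical list.
checks = {
    name: is_import_allowed(name)
    for name in ("xml", "xml.etree", "jose.utils", "os")
}
print(checks)
```

Under this reading, a submodule is usable only when its full dotted name appears in the list, which matches how jose.utils is listed separately from jose.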

In addition, we also provide the following APIs:

sleep(seconds: float) -> Coroutine

INPUT_DATA

DATA_CHECKPOINT

"""Connector logger API"""
log.trace(message: str) -> None
log.debug(message: str) -> None
log.info(message: str) -> None
log.warn(message: str) -> None
log.error(message: str) -> None
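A minimal sketch of how these APIs might be combined, wrapped so it can be tried outside the platform. Everything defined before `statement` is a local stand-in and an assumption: `ListLogger` is a hypothetical replacement for the connector logger, `sleep` is assumed to behave like `asyncio.sleep`, and the shape of `INPUT_DATA` (a list of dicts) is assumed rather than documented here. Inside the processor these names are predefined, and asyncio itself is not on the allowed-imports list.

```python
import asyncio  # local testing only; not an allowed import in the processor


class ListLogger:
    """Hypothetical stand-in for the connector logger; collects messages."""

    def __init__(self):
        self.messages = []

    def _record(self, level, message):
        self.messages.append((level, message))

    def trace(self, message):
        self._record("trace", message)

    def debug(self, message):
        self._record("debug", message)

    def info(self, message):
        self._record("info", message)

    def warn(self, message):
        self._record("warn", message)

    def error(self, message):
        self._record("error", message)


# Stand-ins for names the processor predefines (all assumptions).
log = ListLogger()
sleep = asyncio.sleep                  # assumed to behave like asyncio.sleep
INPUT_DATA = [{"id": 1}, {"id": 2}]    # assumed shape: list of dicts


async def statement():
    # The lines below mimic what a Statement body could look like.
    log.info("Received {} input rows".format(len(INPUT_DATA)))
    await sleep(0.01)  # sleep returns a coroutine, so it must be awaited
    output = [{"id": row["id"]} for row in INPUT_DATA]
    log.debug("Returning {} rows".format(len(output)))
    return output


result = asyncio.run(statement())
print(result)
```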

Possible uses of the Multi-input step include synchronizing branches and accepting multiple inputs within integration processes. With this feature, you can streamline complex integration tasks and achieve better scalability, parallel processing, and system responsiveness. For more information, refer to the article dedicated to the Multi-input step.

Configuration

Python statement configuration

Statement

Python statement to be executed by the Python processor service. The connector expects the statement to produce a list of objects whose structure matches the output schema. The Statement forms the body of an async function, and the value it returns is parsed and returned as the output of the connector itself. The last line of the Statement must therefore be:

return items_in_list

Example

users = [
    { "name": "Alice" },
    { "name": "Bob" },
    { "name": "Charlie" },
]
return users
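Because the Statement is the body of an async function, it can be tried locally by placing it inside one. The sketch below does that with a hypothetical wrapper named `statement`; the sample JSON and the assumption that the output schema contains only a "name" field are illustrative. Only the lines between the two marker comments would go into the Statement field.

```python
import asyncio  # local testing only; not an allowed import in the processor
import json


async def statement():
    # --- Statement body starts here ---
    raw = '[{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]'
    people = json.loads(raw)
    # Keep only the field defined by the assumed output schema ("name").
    items_in_list = [{"name": person["name"]} for person in people]
    return items_in_list
    # --- Statement body ends here ---


result = asyncio.run(statement())
print(result)
```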

Note

Because the statement is async, you must ensure that every coroutine is awaited. Code that is not awaited may not complete before your statement finishes.
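The effect of a missing await can be seen in a small local sketch. The coroutine `mark_done` and both wrapper functions are hypothetical and use asyncio directly so the example runs outside the platform; asyncio itself is not on the allowed-imports list.

```python
import asyncio  # local testing only; not an allowed import in the processor


async def mark_done(events):
    # Hypothetical coroutine standing in for any awaitable work.
    await asyncio.sleep(0)
    events.append("done")
    return "ok"


async def statement_without_await(events):
    # Calling the coroutine function only creates a coroutine object;
    # its body does not run because it is never awaited.
    pending = mark_done(events)
    pending.close()  # discard it explicitly to avoid a "never awaited" warning
    return [{"status": "not run"}]


async def statement_with_await(events):
    # Awaiting runs the coroutine to completion before continuing.
    status = await mark_done(events)
    return [{"status": status}]


events_without = []
result_without = asyncio.run(statement_without_await(events_without))

events_with = []
result_with = asyncio.run(statement_with_await(events_with))

print(events_without, events_with)
```

In the first variant the side effect never happens, which is the situation the note above warns about.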

Data checkpoint column

The data checkpoint column is a column (field) from which the platform takes the value of the last row after each executed task run and stores it as a data checkpoint. You can use this value in the Python statement to control which data is processed in the next run; it is available through the predefined variable DATA_CHECKPOINT. A typical use is processing data in cycles, where each cycle handles only a subset of the data set because of its total size. If you use, for example, the record ID as the data checkpoint column, the platform stores the last processed ID from the subset after each cycle. If your statement compares the data checkpoint value against the IDs of the records in the data set, only records that have not yet been processed are considered in the next task run.
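The cycle-by-cycle processing described above might look like the following sketch. The helper `select_unprocessed`, the "id" field name, the batch size, and the assumption that the checkpoint is empty (None) on the first run are all illustrative. In the processor the platform stores the checkpoint; here that step is simulated by reading the last returned row.

```python
def select_unprocessed(records, checkpoint, batch_size):
    """Return the next batch of records with an ID above the checkpoint."""
    if checkpoint is None:
        # Assumed first run: no checkpoint stored yet, so nothing is skipped.
        pending = list(records)
    else:
        pending = [record for record in records if record["id"] > checkpoint]
    # Sort so the last returned row carries the highest processed ID,
    # since the checkpoint is taken from the last row.
    pending.sort(key=lambda record: record["id"])
    return pending[:batch_size]


records = [{"id": i} for i in (3, 1, 2, 5, 4)]

# First cycle: no checkpoint yet.
first_run = select_unprocessed(records, None, 2)
# Simulate the platform storing the last row's value as the checkpoint.
checkpoint_after_first = first_run[-1]["id"]

# Second cycle: only records beyond the stored checkpoint are considered.
second_run = select_unprocessed(records, checkpoint_after_first, 2)

print(first_run, second_run)
```

Sorting before slicing matters in this sketch: if the last row of a batch did not hold the highest ID, the stored checkpoint could cause unprocessed records to be skipped in the next run.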

Input & Output Schema

Input

Data schema is optional

The connector does not expect a specific schema. The required data structure can be achieved by correct configuration. Although the selected connector doesn't require a schema generally, the individual integration task step may need to match the output data structure of the preceding task step and use a data schema selected from the repository or create a new input schema.

Output

Data schema is mandatory

The connector requires a mandatory input or output data schema, which the user must either select from the existing data schema repository or create as a new one. The connector will fail without structured data.

Examples

Examples of Python Processor can be found in Python Processor examples.

Release notes

3.1.2

Fixed additional input data processing.
Multi input feature included - more info here

3.0.2

Fixed processing sensitive errors

3.0.0

First release