The Python Processor allows you to run Python scripts using the CPython runtime (version 3.11). Although the Python ecosystem is very large, for security reasons your code can only access a limited yet powerful set of imported modules.
We provide the following 3rd party modules:
- pandas (2.0.3): Powerful data structures for data analysis, time series, and statistics
- numpy (1.25.2): Fundamental package for array computing in Python
- PyYAML (6.0.1): For parsing & building YAML content
- openai (0.28.0): Client library for the OpenAI API
- deepdiff (6.3.1): Deep difference and search of any Python object/data
- python-jose[cryptography] (3.3.0): JOSE implementation in Python
- passlib (1.7.4): Comprehensive password hashing framework supporting over 30 schemes
- httpx (0.25.0): Fully featured HTTP client library
- matplotlib (3.8.2): Comprehensive library for creating static and animated visualizations in Python
Allowed imports are:
(
#
# STD
"string",
"math",
"itertools",
"random",
"warnings",
"base64",
"io",
"json",
"xml",
"ssl",
"time",
"datetime",
#
# 3RD PARTY
"yaml",
"httpx",
"pandas",
"numpy",
"deepdiff",
"passlib.hash",
"jose",
"jose.backends",
"jose.constants",
"jose.utils",
"openai",
"matplotlib",
"matplotlib.pyplot",
)
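For instance, a minimal statement body built only from modules in the allowed list might look like this (the base64 payload below is purely illustrative):

```python
import base64
import json

# Decode an illustrative base64-encoded JSON payload using only
# modules from the allowed list (base64, json).
payload = base64.b64decode("eyJuYW1lIjogIkFsaWNlIn0=")
users = [json.loads(payload)]
# Inside the processor, the statement would end with:
# return users
```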
As noted above, imports are restricted. For example, you cannot import the os module or the sys module; only the modules listed above can be imported. Find examples of allowed and disallowed imports below:
# Allowed:
import string
import pandas as pd
from json import dumps
from xml import etree

# Not allowed:
import os
import sys
from xml.etree import ElementTree
As you can see, we can import the xml module, but we cannot import the xml.etree module, because xml.etree is not listed in the allowed imports. In general, if module a is listed in the allowed imports, you can import a but not a.b; if module a.b is listed in the allowed imports, then you can import a.b.
Also, this restricted Python Processor does not allow type hints. For example:
number = 1
number: int = 1
The code shown above with the type hint would fail with a syntax error, although it would work in the standard Python runtime. When using our Python Processor, please remember to remove all type hints from your code.
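For example, a statement written with type hints can be adapted for the processor simply by dropping the annotations (the function and variable names below are illustrative):

```python
# Standard Python version (would fail in the processor):
# def total(prices: list) -> float:
#     result: float = 0.0
#     ...

# Processor-compatible version: same logic, no annotations.
def total(prices):
    result = 0.0
    for price in prices:
        result = result + price
    return result

grand_total = total([9.99, 5.01])
```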
In addition, we also provide the following APIs:

sleep(seconds: float) -> Coroutine
INPUT_DATA
DATA_CHECKPOINT

Connector logger API:
log.trace(message: str) -> None
log.debug(message: str) -> None
log.info(message: str) -> None
log.warn(message: str) -> None
log.error(message: str) -> None
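A statement might combine these APIs as sketched below. Note that log and sleep are provided by the runtime inside the processor; they are stubbed here only so the sketch can run outside it, and the statement body itself is illustrative:

```python
import asyncio

# Stubs standing in for the runtime-provided APIs; inside the
# processor, log and sleep already exist and must not be defined.
class _Log:
    def __init__(self):
        self.lines = []
    def info(self, message):
        self.lines.append(message)

log = _Log()

async def sleep(seconds):
    await asyncio.sleep(seconds)

async def statement():
    # Body of the statement as it would appear in the processor.
    log.info("starting")
    await sleep(0.01)  # remember to await every coroutine
    log.info("done")
    return [{"status": "ok"}]

result = asyncio.run(statement())
```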
Possible uses of the Multi input step include facilitating branch synchronization and enabling the acceptance of multiple inputs within integration processes. By incorporating this feature, you can streamline complex integration tasks and achieve better scalability, parallel processing, and system responsiveness. For more information, refer to the dedicated Multi-Input Step article.
Configuration
Python statement configuration
Statement
Python statement to be executed using the Python processor service.
The output of the connector is expected to be a list of objects that match the output schema in structure. The defined Statement represents the body of an async function whose output is parsed and returned as the output of the connector itself. The last line of the Statement must therefore be a return statement, for example:
return items_in_list
Example
users = [
{ "name": "Alice" },
{ "name": "Bob" },
{ "name": "Charlie" },
]
return users
Note
As the statement is async, you must ensure that all coroutines are awaited. Non-awaited coroutines may not complete before your statement code finishes.
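As a sketch, awaiting several coroutines before returning might look like this (the fetch coroutine is illustrative; in a real statement it could be an httpx request awaited the same way):

```python
import asyncio

async def fetch(record_id):
    # Illustrative coroutine standing in for real async work,
    # e.g. an awaited httpx request.
    await asyncio.sleep(0.01)
    return {"id": record_id}

async def statement():
    # Await each coroutine; calling fetch(1) without await would
    # leave the work unfinished when the statement returns.
    results = []
    for record_id in (1, 2, 3):
        results.append(await fetch(record_id))
    return results

# asyncio.run is only for running this sketch locally; inside the
# processor the statement body is already executed as async code,
# and asyncio is not in the allowed-import list there.
items = asyncio.run(statement())
```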
Data checkpoint column
The data checkpoint column is a column (field) from which the platform takes the last row value after each executed task run and stores it as a Data checkpoint.
The data checkpoint value can be used in Python statements to control which data should be processed in the next run. You can refer to the value using the predefined variable DATA_CHECKPOINT.
Example of use: processing data in cycles, where each cycle processes only a subset of the entire set due to its total size. If you use, for example, a record ID as the data checkpoint column, the platform will store after each cycle the last processed ID from the data subset handled by the task run. If your statement evaluates the data checkpoint value against the IDs of the records in the data set, you can ensure that only unprocessed records are considered in the next task run.
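A statement using the checkpoint might filter records as sketched below. DATA_CHECKPOINT and INPUT_DATA are provided by the runtime; both are stubbed here with illustrative values only so the sketch can run outside the processor:

```python
# Stubs for the runtime-provided variables (illustrative values);
# inside the processor these already exist and must not be defined.
DATA_CHECKPOINT = "2"
INPUT_DATA = [
    {"id": "1", "name": "Alice"},
    {"id": "2", "name": "Bob"},
    {"id": "3", "name": "Charlie"},
    {"id": "4", "name": "Dana"},
]

# Keep only records that come after the last processed ID. With
# "id" as the data checkpoint column, the platform would store the
# last returned id after this run (here "4") for the next cycle.
last_id = int(DATA_CHECKPOINT or 0)
pending = [record for record in INPUT_DATA if int(record["id"]) > last_id]
# Inside the processor, the statement would end with:
# return pending
```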
Input & Output Schema
Input
Data schema is optional
The connector does not expect a specific schema; the required data structure can be achieved by correct configuration. Although the connector does not require a schema in general, the individual integration task step may need to match the output data structure of the preceding task step, either using a data schema selected from the repository or creating a new input schema.
Output
Data schema is mandatory
The connector requires a mandatory output data schema, which must be selected by the user from the existing data schema repository, or a new one must be created. The connector will fail without structured data.
Examples
Examples of Python Processor can be found in Python Processor examples.
Release notes
3.2.0
- Added support for Local variables.
3.1.2
- Fixed additional input data processing.
- Multi input feature included - more info here
3.0.2
- Fixed processing of sensitive errors
3.0.0
- First release