Python
Operators in the Python category
Home > User-defined Functions > Python
Operators
Total: 5 operators
1 - 1-out Python UDF
User-defined source operator written as a Python script (no input, one output)
| Property | Requirement | Type | Default | Description |
|---|---|---|---|---|
| Python script | ✓ | Code (python) | See template below | Input your code here |
| Worker count | ✓ | Integer | 1 | Specify how many parallel workers to launch |
| Columns | | List | - | The columns of the source |
| ↳ Attribute Name | ✓ | String | - | |
| ↳ Attribute Type | ✓ | string, integer, long, double, boolean, timestamp, binary, large_binary | - | |
Default Code Template
Python script
# from pytexera import *
# class GenerateOperator(UDFSourceOperator):
#
#     @overrides
#
#     def produce(self) -> Iterator[Union[TupleLike, TableLike, None]]:
#         yield
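
As an illustration of the template above, here is a minimal sketch of a source operator that emits three rows. The `UDFSourceOperator` base class and the `overrides` decorator are defined locally as stand-ins so the snippet runs outside Texera; inside Texera they would come from `from pytexera import *`. The column names `id` and `label` are hypothetical, and yielding plain dicts assumes that a dict is accepted as `TupleLike`.

```python
from typing import Any, Dict, Iterator, Optional


def overrides(method):
    """Stand-in for the @overrides decorator (performs no checking here)."""
    return method


class UDFSourceOperator:
    """Stand-in for the pytexera base class so the sketch runs on its own."""

    def produce(self) -> Iterator[Optional[Dict[str, Any]]]:
        raise NotImplementedError


class GenerateOperator(UDFSourceOperator):

    @overrides
    def produce(self) -> Iterator[Optional[Dict[str, Any]]]:
        # Each yielded dict is one output row. Its keys are assumed to match
        # the Attribute Names declared under "Columns" (hypothetical: id, label).
        for i in range(3):
            yield {"id": i, "label": f"row-{i}"}


rows = list(GenerateOperator().produce())
print(rows)
```

In this sketch the "Columns" property would declare `id` as integer and `label` as string, so that the declared schema agrees with what `produce` yields.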
Output Ports
2 - 2-in Python UDF
User-defined function operator in Python script
| Property | Requirement | Type | Default | Description |
|---|---|---|---|---|
| Python script | ✓ | Code (python) | See template below | Input your code here |
| Worker count | ✓ | Integer | 1 | Specify how many parallel workers to launch |
| Retain input columns | ✓ | Boolean | true | Whether to keep the original input columns in the output |
| Extra output column(s) | | List | - | Names of the new output columns that the UDF produces, if any |
| ↳ Attribute Name | ✓ | String | - | |
| ↳ Attribute Type | ✓ | string, integer, long, double, boolean, timestamp, binary, large_binary | - | |
Default Code Template
Python script
# Choose from the following templates:
#
# from pytexera import *
#
# class ProcessTupleOperator(UDFOperatorV2):
#
#     @overrides
#     def process_tuple(self, tuple_: Tuple, port: int) -> Iterator[Optional[TupleLike]]:
#         yield tuple_
#
# class ProcessBatchOperator(UDFBatchOperator):
#     BATCH_SIZE = 10 # must be a positive integer
#
#     @overrides
#     def process_batch(self, batch: Batch, port: int) -> Iterator[Optional[BatchLike]]:
#         yield batch
#
# class ProcessTableOperator(UDFTableOperator):
#
#     @overrides
#     def process_table(self, table: Table, port: int) -> Iterator[Optional[TableLike]]:
#         yield table
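
The `port` argument is what distinguishes the two inputs of this operator. The following sketch shows one way a tuple-level UDF might use it: a small symmetric hash join that pairs rows from both inputs on a shared key. It rests on several assumptions: the ports are numbered 0 and 1, rows behave like dicts, and the key column `id` is hypothetical. `UDFOperatorV2` is a local stand-in so the snippet runs outside Texera.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Optional

Row = Dict[str, Any]  # stand-in for pytexera's Tuple / TupleLike


class UDFOperatorV2:
    """Stand-in for the pytexera base class so the sketch runs on its own."""


class ProcessTupleOperator(UDFOperatorV2):

    def __init__(self) -> None:
        # One buffer per input port, keyed by the (hypothetical) "id" column.
        self.seen: List[Dict[Any, List[Row]]] = [defaultdict(list), defaultdict(list)]

    def process_tuple(self, tuple_: Row, port: int) -> Iterator[Optional[Row]]:
        key = tuple_["id"]
        self.seen[port][key].append(tuple_)
        # Pair the new row with every matching row already seen on the other port.
        for other in self.seen[1 - port][key]:
            left, right = (tuple_, other) if port == 0 else (other, tuple_)
            yield {**left, **right}


# Hypothetical arrival order: (row, port) pairs interleaved across both inputs.
op = ProcessTupleOperator()
arrivals = [
    ({"id": 1, "name": "a"}, 0),
    ({"id": 1, "score": 10}, 1),
    ({"id": 2, "score": 20}, 1),
    ({"id": 2, "name": "b"}, 0),
]
joined = [out for row, port in arrivals for out in op.process_tuple(row, port)]
print(joined)
```

Buffering both sides means the result does not depend on which input delivers its rows first, at the cost of keeping every row in memory.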
Output Ports
3 - Python Lambda Function
Add a new column or modify an existing one with a Python expression, without writing a full UDF
| Property | Requirement | Type | Default | Description |
|---|---|---|---|---|
| Add/Modify column(s) | | List | - | |
| ↳ Attribute Name | ✓ | String | - | |
| ↳ Expression | ✓ | String | - | |
| ↳ Attribute Type | ✓ | string, integer, long, double, boolean, timestamp, binary, large_binary | - | |
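
To make the three properties above concrete, the sketch below imitates how an (Attribute Name, Expression, Attribute Type) entry could be applied to one row. This page does not document the expression syntax, so the sketch assumes that an expression is an ordinary Python expression that refers to the current row as `tuple_`. The column names, the `CASTS` table, and the `apply_columns` helper are all hypothetical and are not part of Texera.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Hypothetical entries: (Attribute Name, Expression, Attribute Type).
columns: List[Tuple[str, str, str]] = [
    ("total", 'tuple_["price"] * tuple_["qty"]', "double"),  # adds a column
    ("qty", 'tuple_["qty"] + 1', "integer"),                 # modifies a column
]

# Assumed mapping from a few of the listed attribute types to Python casts.
CASTS = {"string": str, "integer": int, "long": int, "double": float, "boolean": bool}


def apply_columns(row: Row) -> Row:
    out = dict(row)
    for name, expression, attr_type in columns:
        # eval keeps the sketch short. Every expression sees the original row;
        # whether Texera applies entries sequentially is not stated on this page.
        value = eval(expression, {"__builtins__": {}, "tuple_": row})
        out[name] = CASTS[attr_type](value)
    return out


result = apply_columns({"price": 2.5, "qty": 4})
print(result)
```

An entry whose Attribute Name already exists overwrites that column, while a new name appends a column; the declared Attribute Type fixes the type of the result.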
Output Ports
4 - Python Table Reducer
Reduce an entire table to a single tuple
| Property | Requirement | Type | Default | Description |
|---|---|---|---|---|
| Output columns | | List | - | |
| ↳ Attribute Name | ✓ | String | - | |
| ↳ Expression | ✓ | String | - | |
| ↳ Attribute Type | ✓ | string, integer, long, double, boolean, timestamp, binary, large_binary | - | |
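
The idea of reducing a table to one tuple can be sketched in plain Python. Each output column pairs a name with an expression that is evaluated once against the whole input, and the results together form a single output row. The table's actual type in Texera is not given on this page, so a list of dicts stands in for it; the variable name `table`, the column names, and the `reduce_table` helper are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Hypothetical entries: (Attribute Name, Expression, Attribute Type).
output_columns: List[Tuple[str, str, str]] = [
    ("row_count", "len(table)", "integer"),
    ("max_price", 'max(r["price"] for r in table)', "double"),
]

# Assumed mapping from a few of the listed attribute types to Python casts.
CASTS = {"string": str, "integer": int, "long": int, "double": float, "boolean": bool}


def reduce_table(table: List[Row]) -> Row:
    # Only a handful of builtins are exposed to the expressions in this sketch.
    scope = {"__builtins__": {}, "len": len, "max": max, "min": min, "sum": sum,
             "table": table}
    return {
        name: CASTS[attr_type](eval(expression, scope))
        for name, expression, attr_type in output_columns
    }


summary = reduce_table([{"price": 2.5}, {"price": 7}, {"price": 4.0}])
print(summary)
```

However many rows arrive, the output is exactly one row with one value per entry in "Output columns".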
Output Ports
5 - Python UDF
User-defined function operator in Python script
| Property | Requirement | Type | Default | Description |
|---|---|---|---|---|
| Python script | ✓ | Code (python) | See template below | Input your code here |
| Worker count | ✓ | Integer | 1 | Specify how many parallel workers to launch |
| Retain input columns | ✓ | Boolean | true | Whether to keep the original input columns in the output |
| Extra output column(s) | | List | - | Names of the new output columns that the UDF produces, if any |
| ↳ Attribute Name | ✓ | String | - | |
| ↳ Attribute Type | ✓ | string, integer, long, double, boolean, timestamp, binary, large_binary | - | |
Default Code Template
Python script
# Choose from the following templates:
#
# from pytexera import *
#
# class ProcessTupleOperator(UDFOperatorV2):
#
#     @overrides
#     def process_tuple(self, tuple_: Tuple, port: int) -> Iterator[Optional[TupleLike]]:
#         yield tuple_
#
# class ProcessBatchOperator(UDFBatchOperator):
#     BATCH_SIZE = 10 # must be a positive integer
#
#     @overrides
#     def process_batch(self, batch: Batch, port: int) -> Iterator[Optional[BatchLike]]:
#         yield batch
#
# class ProcessTableOperator(UDFTableOperator):
#
#     @overrides
#     def process_table(self, table: Table, port: int) -> Iterator[Optional[TableLike]]:
#         yield table
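
The batch template sits between the tuple and table templates in granularity: the operator sees `BATCH_SIZE` rows at a time. The sketch below illustrates that with a stand-in base class whose `run` method is a hypothetical driver, not part of pytexera. It assumes that rows behave like dicts, that a batch behaves like a list of rows, and that the final batch may be shorter than `BATCH_SIZE`. The columns `value` and `batch_mean` are invented for the example.

```python
from typing import Any, Dict, Iterator, List, Optional

Row = Dict[str, Any]
Batch = List[Row]  # stand-in; the real Batch type comes from pytexera


class UDFBatchOperator:
    """Stand-in base class so the sketch runs on its own."""
    BATCH_SIZE = 10

    def process_batch(self, batch: Batch, port: int) -> Iterator[Optional[Batch]]:
        raise NotImplementedError

    def run(self, rows: List[Row], port: int = 0) -> List[Row]:
        # Hypothetical driver: slice the input into batches of BATCH_SIZE rows
        # and collect whatever process_batch yields for each slice.
        out: List[Row] = []
        for start in range(0, len(rows), self.BATCH_SIZE):
            batch = rows[start:start + self.BATCH_SIZE]
            for result in self.process_batch(batch, port):
                if result is not None:
                    out.extend(result)
        return out


class ProcessBatchOperator(UDFBatchOperator):
    BATCH_SIZE = 2  # must be a positive integer

    def process_batch(self, batch: Batch, port: int) -> Iterator[Optional[Batch]]:
        # Keep the input columns and add one extra column, "batch_mean".
        mean = sum(r["value"] for r in batch) / len(batch)
        yield [{**r, "batch_mean": mean} for r in batch]


rows = [{"value": v} for v in (1, 3, 10, 20, 7)]
result = ProcessBatchOperator().run(rows)
print(result)
```

In terms of the properties above, this sketch corresponds to "Retain input columns" set to true with `batch_mean` declared as an extra output column of type double.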
Output Ports