Operators

Complete reference for all Texera operators organized by category

Operator Categories

1 - Data Input

Operators in the Data Input category

Operators

| Operator | Description |
| --- | --- |
| Arrow File Scan | Scan data from an Arrow file |
| CSV File Scan | Scan data from a CSV file |
| CSVOld File Scan | Scan data from a CSVOld file |
| File Lister | Select a dataset version and output one filename tuple per file |
| File Scan | Scan data from a file |
| File Scan From Input | Scan data from file paths provided by input tuples |
| JSONL File Scan | Scan data from a JSONL file |
| Text Input | Source data from manually inputted text |

Total: 8 operators

1.1 - Arrow File Scan

Scan data from an Arrow file

Input Properties

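| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| File | | String | - | |
| Limit | | Integer | - | Max output count |
| Offset | | Integer | - | Starting point of output |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |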

1.2 - CSV File Scan

Scan data from a CSV file

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| File | | String | - | |
| File Encoding | | UTF_8, UTF_16, US_ASCII | UTF_8 | Decoding charset to use on input |
| Limit | | Integer | - | Max output count |
| Offset | | Integer | - | Starting point of output |
| Delimiter | | String | , | Delimiter to separate each line into fields |
| Header | | Boolean | true | Whether the CSV file contains a header line |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |
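
As a rough illustration of how Delimiter, Header, Offset, and Limit might interact, the Python sketch below parses CSV text with the standard csv module. It is not Texera's implementation: the function name scan_csv, the generated column-N names for headerless input, and the assumption that Offset skips data rows before Limit caps the output are all illustrative.

```python
import csv
import io
from itertools import islice


def scan_csv(text, delimiter=",", header=True, offset=0, limit=None):
    """Parse CSV text into a list of dict rows (illustrative only)."""
    rows = iter(csv.reader(io.StringIO(text), delimiter=delimiter))
    # With header=True the first line supplies the column names.
    columns = next(rows, []) if header else []
    stop = None if limit is None else offset + limit
    out = []
    # Assumption: offset skips data rows first, then limit caps the output.
    for row in islice(rows, offset, stop):
        # Hypothetical fallback names when no header line is available.
        names = columns or [f"column-{i + 1}" for i in range(len(row))]
        out.append(dict(zip(names, row)))
    return out


sample = "id;name\n1;ada\n2;bob\n3;cy\n"
print(scan_csv(sample, delimiter=";", offset=1, limit=1))
```

With the sample above, an offset of 1 and a limit of 1 select only the second data row.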

1.3 - CSVOld File Scan

Scan data from a CSVOld file

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| File | | String | - | |
| File Encoding | | UTF_8, UTF_16, US_ASCII | UTF_8 | Decoding charset to use on input |
| Limit | | Integer | - | Max output count |
| Offset | | Integer | - | Starting point of output |
| Delimiter | | String | , | Delimiter to separate each line into fields |
| Header | | Boolean | true | Whether the CSV file contains a header line |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

1.4 - File Lister

Select a dataset version and output one filename tuple per file

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Dataset | | String | - | |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

1.5 - File Scan

Scan data from a file

Input Properties

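| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| File | | String | - | |
| Encoding | | UTF_8, UTF_16, US_ASCII | UTF_8 | |
| Extract | | Boolean | false | |
| ↳ Include Filename | | Boolean | false | |
| Attribute Type | | string, single string, integer, long, double, boolean, timestamp, binary, large binary | string | |
| Attribute Name | | String | line | |
| Limit | | Integer | - | |
| Offset | | Integer | - | |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |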

1.6 - File Scan From Input

Scan data from file paths provided by input tuples

Input Properties

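| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Encoding | | UTF_8, UTF_16, US_ASCII | UTF_8 | |
| Extract | | Boolean | false | |
| Include Filename | | Boolean | false | |
| Attribute Type | | string, single string, integer, long, double, boolean, timestamp, binary, large binary | string | |
| Attribute Name | | String | line | |
| Limit | | Integer | - | |
| Offset | | Integer | - | |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |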

1.7 - JSONL File Scan

Scan data from a JSONL file

Input Properties

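| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| File | | String | - | |
| File Encoding | | UTF_8, UTF_16, US_ASCII | UTF_8 | Decoding charset to use on input |
| Limit | | Integer | - | Max output count |
| Offset | | Integer | - | Starting point of output |
| Flatten | | Boolean | false | Flatten nested objects and arrays |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |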
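
To make the Flatten option more concrete, here is a minimal Python sketch of one way nested objects and arrays can be flattened into a single-level record. The function name flatten, the dot-separated key names, and the zero-based array indices are assumptions made for illustration; the naming scheme Texera actually produces may differ.

```python
import json


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into one single-level dict."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        # Assumption: array elements are keyed by their zero-based index.
        items = ((str(i), v) for i, v in enumerate(value))
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        # Assumption: nested names are joined with a dot.
        name = f"{prefix}.{key}" if prefix else key
        flat.update(flatten(child, name))
    return flat


line = '{"id": 7, "user": {"name": "ada", "tags": ["x", "y"]}}'
print(flatten(json.loads(line)))
```

Each JSONL line would be parsed and flattened independently, so one input line still yields one output record.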

1.8 - Text Input

Source data from manually inputted text

Input Properties

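| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Text | | String | - | |
| Attribute Type | | string, single string, integer, long, double, boolean, timestamp, binary, large binary | string | |
| Attribute Name | | String | line | |
| Limit | | Integer | - | |
| Offset | | Integer | - | |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |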

2 - Database Connector

Operators in the Database Connector category

Operators

| Operator | Description |
| --- | --- |
| AsterixDB Source | Read data from an AsterixDB instance |
| MySQL Source | Read data from a MySQL instance |
| PostgreSQL Source | Read data from a PostgreSQL instance |

Total: 3 operators

2.1 - AsterixDB Source

Read data from an AsterixDB instance

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Host | | String | - | |
| Port | | String | default | A port number or 'default' |
| Database | | String | - | |
| Table Name | | String | - | |
| Limit | | Long | - | Max output count |
| Offset | | Long | - | Starting point of output |
| Keyword Search? | | Boolean | false | |
| ↳ Keyword Search Column | | String | - | |
| ↳ Keywords to Search | | String | - | "['hello', 'world'], {'mode':'any'}" OR "['hello', 'world'], {'mode':'all'}" |
| Progressive? | | Boolean | false | |
| ↳ Batch by Column | | String | - | |
| ↳ Min | | String | auto | |
| ↳ Max | | String | auto | |
| ↳ Batch by Interval | | Long | 1000000000 | |
| Geo Search? | | Boolean | false | |
| ↳ Geo Search By Columns | | List | - | Column(s) to check if any of them is in the bounding box below |
| ↳ Geo Search Bounding Box | | List | - | At least 2 entries should be provided to form a bounding box. Format of each entry: long, lat |
| Regex Search? | | Boolean | false | |
| ↳ Regex Search By Column | | String | - | |
| ↳ Regex to Search | | String | - | |
| Filter Condition? | | Boolean | false | |
| ↳ Predicates | | List | - | Multiple predicates in OR |
|   ↳ Attribute | | String | - | |
|   ↳ Condition | | =, >, >=, <, <=, !=, is null, is not null | - | |
|   ↳ Value | | String | - | |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

2.2 - MySQL Source

Read data from a MySQL instance

Input Properties

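| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Host | | String | - | |
| Port | | String | default | A port number or 'default' |
| Database | | String | - | |
| Table Name | | String | - | |
| Username | | String | - | |
| Password | | String | - | |
| Limit | | Long | - | Max output count |
| Offset | | Long | - | Starting point of output |
| Keyword Search? | | Boolean | false | |
| ↳ Keyword Search Column | | String | - | |
| ↳ Keywords to Search | | String | - | |
| Progressive? | | Boolean | false | |
| ↳ Batch by Column | | String | - | |
| ↳ Min | | String | auto | |
| ↳ Max | | String | auto | |
| ↳ Batch by Interval | | Long | 1000000000 | |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |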

2.3 - PostgreSQL Source

Read data from a PostgreSQL instance

Input Properties

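| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Host | | String | - | |
| Port | | String | default | A port number or 'default' |
| Database | | String | - | |
| Table Name | | String | - | |
| Username | | String | - | |
| Password | | String | - | |
| Limit | | Long | - | Max output count |
| Offset | | Long | - | Starting point of output |
| Keyword Search? | | Boolean | false | |
| ↳ Keyword Search Column | | String | - | |
| ↳ Keywords to Search | | String | - | E.g. 'sore & throat' for AND; 'sore', 'throat' for OR. See the official PostgreSQL documentation for details |
| Progressive? | | Boolean | false | |
| ↳ Batch by Column | | String | - | |
| ↳ Min | | String | auto | |
| ↳ Max | | String | auto | |
| ↳ Batch by Interval | | Long | 1000000000 | |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |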

3 - Search

Operators in the Search category

Operators

| Operator | Description |
| --- | --- |
| Dictionary matcher | Matches tuples if they appear in a given dictionary |
| Keyword Search | Search for keyword(s) in a string column |
| Regular Expression | Search for a regular expression in a string column |
| Substring Search | Search for substring(s) in a string column |

Total: 4 operators

3.1 - Dictionary matcher

Matches tuples if they appear in a given dictionary

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Dictionary | | String | - | Dictionary values separated by a comma |
| Attribute | | String | - | Column name to match |
| Result Attribute | | String | matched | Column name of the matching result |
| Matching Type | | Scan, Substring, Conjunction | - | |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

3.2 - Keyword Search

Search for keyword(s) in a string column

Input Properties

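| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| attribute | | String | - | Column to search keyword on |
| keywords | | String | - | Keywords |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |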

3.3 - Regular Expression

Search for a regular expression in a string column

Input Properties

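| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Case Insensitive | | Boolean | false | Whether the regex match is case insensitive |
| Attribute | | String | - | Column to search regex on |
| Regex | | String | - | Regular expression |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |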

3.4 - Substring Search

Search for substring(s) in a string column

Input Properties

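| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| attribute | | String | - | Column to search substring on |
| Substring | | String | - | Substring |
| Case Sensitive | | Boolean | false | Whether the substring match is case sensitive |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |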

4 - Data Cleaning

Operators in the Data Cleaning category

Subcategories

Operators

| Operator | Description |
| --- | --- |
| Distinct | Remove duplicate tuples |
| Filter | Performs a filter operation using OR between multiple predicates |
| Limit | Limit the number of output rows |
| Projection | Keep or drop the selected columns |
| Type Casting | Cast between types |

Total: 5 operators

4.1 - Join

Operators in the Join category

Operators

| Operator | Description |
| --- | --- |
| Cartesian Product | Append fields together to get the cartesian product of two inputs |
| Hash Join | Join two inputs |
| Interval Join | Join two inputs with left table join key in the range of [right table join key, right table join key + constant value] |

Total: 3 operators

4.1.1 - Cartesian Product

Append fields together to get the cartesian product of two inputs

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

4.1.2 - Hash Join

Join two inputs

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Left Input Attribute | | String | - | Attribute to be joined on the Left Input |
| Right Input Attribute | | String | - | Attribute to be joined on the Right Input |
| Join Type | | inner, left outer, right outer, full outer | inner | Select the join type to execute |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |
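
The four Join Type values can be illustrated with a small in-memory hash join. The sketch below is a generic build-and-probe join written for illustration, not Texera's operator code: the function name hash_join, the choice of the right input as the build side, the use of None for missing values, and the assumption that the two inputs have distinct column names are all hypothetical.

```python
from collections import defaultdict


def hash_join(left, right, left_attr, right_attr, join_type="inner"):
    """Join two lists of dict rows where left[left_attr] == right[right_attr]."""
    # Build phase: index the right input by its join attribute.
    index = defaultdict(list)
    for pos, row in enumerate(right):
        index[row[right_attr]].append(pos)

    left_cols = list(left[0]) if left else []
    right_cols = list(right[0]) if right else []
    matched_right = set()
    out = []

    # Probe phase: look up every left row in the index.
    for l_row in left:
        positions = index.get(l_row[left_attr], [])
        for pos in positions:
            matched_right.add(pos)
            out.append({**l_row, **right[pos]})
        if not positions and join_type in ("left outer", "full outer"):
            # Unmatched left row: pad the right-hand columns with None.
            out.append({**l_row, **dict.fromkeys(right_cols)})

    if join_type in ("right outer", "full outer"):
        for pos, r_row in enumerate(right):
            if pos not in matched_right:
                # Unmatched right row: pad the left-hand columns with None.
                out.append({**dict.fromkeys(left_cols), **r_row})
    return out


users = [{"uid": 1, "name": "ada"}, {"uid": 2, "name": "bob"}]
orders = [{"buyer": 1, "item": "pen"}, {"buyer": 3, "item": "ink"}]
print(hash_join(users, orders, "uid", "buyer", "full outer"))
```

In this example the inner join keeps only the uid 1 row, while the outer variants additionally emit the unmatched rows from the left input, the right input, or both.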

4.1.3 - Interval Join

Join two inputs with left table join key in the range of [right table join key, right table join key + constant value]

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Interval Constant | | Long | 10 | Left attribute in (right, right + constant) |
| Include Left Bound | | Boolean | true | Include condition left attribute = right attribute |
| Include Right Bound | | Boolean | true | Include condition left attribute = right attribute + constant |
| Time interval type | | TimeIntervalType | day | Year, Month, Day, Hour, Minute or Second |
| Left Input attr | | String (integer, long, double, timestamp) | - | Choose one attribute in the left table |
| Right Input attr | | String | - | Choose one attribute in the right table |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |
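
The range condition and the two bound flags can be sketched for numeric join keys as follows. This is a nested-loop illustration of the semantics described above, not the operator's actual algorithm; the function name interval_join and its parameter names are hypothetical, and timestamp keys with a Time interval type are left out.

```python
def interval_join(left, right, left_attr, right_attr, constant=10,
                  include_left_bound=True, include_right_bound=True):
    """Pair rows where the left key lies between right key and right key + constant."""
    out = []
    for l_row in left:
        value = l_row[left_attr]
        for r_row in right:
            low = r_row[right_attr]
            high = low + constant
            # Each flag decides whether its end of the range is closed or open.
            above = value >= low if include_left_bound else value > low
            below = value <= high if include_right_bound else value < high
            if above and below:
                out.append({**l_row, **r_row})
    return out


events = [{"t": 5}, {"t": 15}, {"t": 16}]
windows = [{"start": 5}]
print(interval_join(events, windows, "t", "start", constant=10))
```

With both bounds included the range is [5, 15], so the rows with t = 5 and t = 15 match; turning off one flag drops the row that sits exactly on that bound.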

4.2 - Set

Operators in the Set category

Operators

| Operator | Description |
| --- | --- |
| Difference | Find the set difference of two inputs |
| Intersect | Take the intersection of two inputs |
| SymmetricDifference | Find the symmetric difference (the set of elements which are in either of the sets, but not in their intersection) of two inputs |
| Union | Unions the output rows from multiple input operators |

Total: 4 operators

4.2.1 - Difference

Find the set difference of two inputs

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

4.2.2 - Intersect

Take the intersection of two inputs

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

4.2.3 - SymmetricDifference

Find the symmetric difference (the set of elements which are in either of the sets, but not in their intersection) of two inputs

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |
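
The definition in the description translates directly into set operations. The short Python sketch below models each tuple as a Python tuple and each input as a set; the function name symmetric_difference is chosen here for illustration and says nothing about how the operator is implemented internally.

```python
def symmetric_difference(left_rows, right_rows):
    """Rows that appear in exactly one of the two inputs."""
    left, right = set(left_rows), set(right_rows)
    # Everything in either input, minus what both inputs share.
    return (left | right) - (left & right)


a = [(1, "x"), (2, "y")]
b = [(2, "y"), (3, "z")]
print(sorted(symmetric_difference(a, b)))  # [(1, 'x'), (3, 'z')]
```

The same result can be obtained with Python's built-in `^` operator on sets, or by taking the union of the two one-sided differences.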

4.2.4 - Union

Unions the output rows from multiple input operators

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

4.3 - Aggregate

Operators in the Aggregate category

Operators

| Operator | Description |
| --- | --- |
| Aggregate | Calculate different types of aggregation values |

Total: 1 operator

4.3.1 - Aggregate

Calculate different types of aggregation values

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Aggregations | | List | - | Multiple aggregation functions (min: 1, aggregations cannot be empty) |
| ↳ Aggregate Func | | sum, count, average, min, max, concat | - | Sum, count, average, min, max, or concat |
| ↳ Attribute | | String | - | Column to aggregate |
| ↳ Result Attribute | | String | - | Column name of the aggregation result |
| Group By Keys | | List | - | Group by columns |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |
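
A minimal Python sketch of grouped aggregation may help show how the Aggregations list and Group By Keys fit together. It is an illustration under stated assumptions rather than the operator's implementation: the names aggregate and AGG_FUNCS are hypothetical, concat is assumed to join values with a comma, and null handling is ignored.

```python
from collections import defaultdict

# One callable per Aggregate Func value; each receives the group's values.
AGG_FUNCS = {
    "sum": sum,
    "count": len,
    "average": lambda xs: sum(xs) / len(xs),
    "min": min,
    "max": max,
    "concat": lambda xs: ",".join(str(x) for x in xs),  # separator is an assumption
}


def aggregate(rows, aggregations, group_by_keys=()):
    """aggregations: list of (func, attribute, result_attribute) triples."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in group_by_keys)].append(row)
    out = []
    for key, members in groups.items():
        result = dict(zip(group_by_keys, key))
        for func, attribute, result_attribute in aggregations:
            values = [m[attribute] for m in members]
            result[result_attribute] = AGG_FUNCS[func](values)
        out.append(result)
    return out


readings = [
    {"city": "A", "temp": 10},
    {"city": "A", "temp": 20},
    {"city": "B", "temp": 5},
]
print(aggregate(readings, [("average", "temp", "avg_temp"), ("count", "temp", "n")], ["city"]))
```

Each output row carries the group-by columns followed by one column per aggregation, named by its Result Attribute.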

4.4 - Sort

Operators in the Sort category

Operators

| Operator | Description |
| --- | --- |
| Sort | Sort based on the columns and sorting methods |
| Sort Partitions | Sort Partitions |
| Stable Merge Sort | Stable per-partition sort with multi-key ordering (incremental stack of sorted buckets) |

Total: 3 operators

4.4.1 - Sort

Sort based on the columns and sorting methods

Input Properties

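| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Attributes | | List | - | Column to perform sorting on |
| ↳ Attribute | | String | - | Attribute name to sort by |
| ↳ Sort Preference | | ASC, DESC | - | Sort preference (ASC or DESC) |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |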

4.4.2 - Sort Partitions

Sort Partitions

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Attribute | | String (integer, long, double) | - | Attribute to sort (must be numerical) |
| Attribute Domain Min | | Long | 0 | Minimum value of the domain of the attribute |
| Attribute Domain Max | | Long | 0 | Maximum value of the domain of the attribute |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

4.4.3 - Stable Merge Sort

Stable per-partition sort with multi-key ordering (incremental stack of sorted buckets)

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Sort Keys | | List | - | List of attributes to sort by with ordering preferences |
| ↳ Attribute | | String | - | Attribute name to sort by |
| ↳ Sort Preference | | ASC, DESC | - | Sort preference (ASC or DESC) |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |
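
To illustrate what a stable multi-key ordering means for the Sort Keys list, the Python sketch below relies on the stability of the built-in list sort and applies the keys from least to most significant. It shows the ordering semantics only and does not reproduce the operator's incremental merge of sorted buckets; the function name multi_key_sort is hypothetical.

```python
def multi_key_sort(rows, sort_keys):
    """sort_keys: list of (attribute, "ASC" or "DESC"), most significant first."""
    ordered = list(rows)  # leave the caller's list untouched
    # Sorting by the least significant key first works because each pass
    # is stable: ties keep the order established by the earlier passes.
    for attribute, preference in reversed(sort_keys):
        ordered.sort(key=lambda r: r[attribute], reverse=(preference == "DESC"))
    return ordered


staff = [
    {"id": 1, "dept": "b", "pay": 1},
    {"id": 2, "dept": "a", "pay": 2},
    {"id": 3, "dept": "a", "pay": 3},
    {"id": 4, "dept": "a", "pay": 3},
]
result = multi_key_sort(staff, [("dept", "ASC"), ("pay", "DESC")])
print([r["id"] for r in result])  # [3, 4, 2, 1]
```

Rows 3 and 4 tie on both keys, so a stable sort keeps them in their original input order.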

4.5 - Distinct

Remove duplicate tuples

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

4.6 - Filter

Performs a filter operation using OR between multiple predicates

Input Properties

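| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Predicates | | List | - | Multiple predicates in OR |
| ↳ Attribute | | String | - | |
| ↳ Condition | | =, >, >=, <, <=, !=, is null, is not null | - | |
| ↳ Value | | String | - | |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |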
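
Since the predicates are combined with OR, a row passes the filter as soon as any single predicate holds. The Python sketch below illustrates that rule with the conditions listed in the table. The names CONDITIONS, matches, and filter_rows are hypothetical, predicate values are assumed to be already typed rather than parsed from strings, and treating a null field as failing every comparison is an assumption of this sketch.

```python
import operator

# Comparison conditions from the table, mapped to Python operators.
CONDITIONS = {
    "=": operator.eq, ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le, "!=": operator.ne,
}


def matches(row, predicate):
    """Evaluate one (attribute, condition, value) predicate against a row."""
    attribute, condition, value = predicate
    field = row.get(attribute)
    if condition == "is null":
        return field is None
    if condition == "is not null":
        return field is not None
    # Assumption: a null field never satisfies a comparison.
    return field is not None and CONDITIONS[condition](field, value)


def filter_rows(rows, predicates):
    """Keep rows that satisfy at least one predicate (logical OR)."""
    return [row for row in rows if any(matches(row, p) for p in predicates)]


people = [
    {"age": 17, "city": None},
    {"age": 30, "city": "LA"},
    {"age": 70, "city": "NY"},
]
print(filter_rows(people, [("age", "<", 18), ("age", ">=", 65)]))
```

An AND between conditions is not expressible in a single predicate list under this rule; chaining two filters one after the other would be one way to get it.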

4.7 - Limit

Limit the number of output rows

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Limit | | Integer | 0 | The max number of output rows |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

4.8 - Projection

Keep or drop the selected columns

Input Properties

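| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Drop Option | | Boolean | false | Check to drop the selected attributes |
| Attributes | | List | - | |
| ↳ Attribute | | String | - | Attribute name in the schema |
| ↳ Alias | | String | - | Renamed attribute name |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |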

4.9 - Type Casting

Cast between types

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| TypeCasting Units | | List | - | Multiple type castings |
| ↳ Attribute | | String | - | Attribute for type casting |
| ↳ Cast type | | string, integer, long, double, boolean, timestamp, binary, large_binary | - | Result type after type casting |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5 - Machine Learning

Operators in the Machine Learning category

Subcategories

5.1 - Sklearn

Operators in the Sklearn category

Subcategories

Operators

| Operator | Description |
| --- | --- |
| Adaptive Boosting | Sklearn Adaptive Boosting Operator |
| Bagging | Sklearn Bagging Operator |
| Bernoulli Naive Bayes | Sklearn Bernoulli Naive Bayes Operator |
| Complement Naive Bayes | Sklearn Complement Naive Bayes Operator |
| Decision Tree | Sklearn Decision Tree Operator |
| Dummy Classifier | Sklearn Dummy Classifier Operator |
| Extra Tree | Sklearn Extra Tree Operator |
| Extra Trees | Sklearn Extra Trees Operator |
| Gaussian Naive Bayes | Sklearn Gaussian Naive Bayes Operator |
| Gradient Boosting | Sklearn Gradient Boosting Operator |
| K-nearest Neighbors | Sklearn K-nearest Neighbors Operator |
| Linear Regression | Sklearn Linear Regression Operator |
| Linear Support Vector Machine | Sklearn Linear Support Vector Machine Operator |
| Logistic Regression | Sklearn Logistic Regression Operator |
| Logistic Regression Cross Validation | Sklearn Logistic Regression Cross Validation Operator |
| Multi-layer Perceptron | Sklearn Multi-layer Perceptron Operator |
| Multinomial Naive Bayes | Sklearn Multinomial Naive Bayes Operator |
| Nearest Centroid | Sklearn Nearest Centroid Operator |
| Passive Aggressive | Sklearn Passive Aggressive Operator |
| Linear Perceptron | Sklearn Linear Perceptron Operator |
| Sklearn Prediction | Sklearn Prediction Operator |
| Probability Calibration | Sklearn Probability Calibration Operator |
| Random Forest | Sklearn Random Forest Operator |
| Ridge Regression | Sklearn Ridge Regression Operator |
| Ridge Regression Cross Validation | Sklearn Ridge Regression Cross Validation Operator |
| Stochastic Gradient Descent | Sklearn Stochastic Gradient Descent Operator |
| Support Vector Machine | Sklearn Support Vector Machine Operator |
| Sklearn Testing | Generates scorers for a Sklearn model |

Total: 28 operators

5.1.1 - Sklearn Training

Operators in the Sklearn Training category

Operators

| Operator | Description |
| --- | --- |
| Training: Adaptive Boosting | Sklearn Training: Adaptive Boosting Operator |
| Training: Bagging Training | Sklearn Training: Bagging Training Operator |
| Training: Bernoulli Naive Bayes | Sklearn Training: Bernoulli Naive Bayes Operator |
| Training: Complement Naive Bayes | Sklearn Training: Complement Naive Bayes Operator |
| Training: Decision Tree | Sklearn Training: Decision Tree Operator |
| Training: Dummy Classifier | Sklearn Training: Dummy Classifier Operator |
| Training: Extra Tree | Sklearn Training: Extra Tree Operator |
| Training: Extra Trees | Sklearn Training: Extra Trees Operator |
| Training: Gaussian Naive Bayes | Sklearn Training: Gaussian Naive Bayes Operator |
| Training: Gradient Boosting | Sklearn Training: Gradient Boosting Operator |
| Training: K-nearest Neighbors | Sklearn Training: K-nearest Neighbors Operator |
| Training: Linear Regression | Sklearn Training: Linear Regression Operator |
| Training: Linear Support Vector Machine | Sklearn Training: Linear Support Vector Machine Operator |
| Training: Logistic Regression | Sklearn Training: Logistic Regression Operator |
| Training: Logistic Regression Cross Validation | Sklearn Training: Logistic Regression Cross Validation Operator |
| Training: Multi-layer Perceptron | Sklearn Training: Multi-layer Perceptron Operator |
| Training: Multinomial Naive Bayes | Sklearn Training: Multinomial Naive Bayes Operator |
| Training: Nearest Centroid | Sklearn Training: Nearest Centroid Operator |
| Training: Passive Aggressive | Sklearn Training: Passive Aggressive Operator |
| Training: Linear Perceptron | Sklearn Training: Linear Perceptron Operator |
| Training: Probability Calibration | Sklearn Training: Probability Calibration Operator |
| Training: Random Forest | Sklearn Training: Random Forest Operator |
| Training: Ridge Regression | Sklearn Training: Ridge Regression Operator |
| Training: Ridge Regression Cross Validation | Sklearn Training: Ridge Regression Cross Validation Operator |
| Training: Stochastic Gradient Descent | Sklearn Training: Stochastic Gradient Descent Operator |
| Training: Support Vector Machine | Sklearn Training: Support Vector Machine Operator |

Total: 26 operators

5.1.1.1 - Training: Adaptive Boosting

Sklearn Training: Adaptive Boosting Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |
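
The Count Vectorizer and Tfidf Transformer options appear on every training operator in this category, so a small pure-Python sketch of the two steps may be useful. It is not the code these operators run: the names count_matrix and tfidf are hypothetical and tokenizing on whitespace is a simplification. The weighting follows the smoothed formula documented for scikit-learn's TfidfTransformer defaults, idf = ln((1 + n) / (1 + df)) + 1, followed by L2 normalization of each row.

```python
import math
from collections import Counter


def count_matrix(documents):
    """Return (vocabulary, rows) where rows[i][j] counts term j in document i."""
    # Simplification: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


def tfidf(rows):
    """Weight counts by smoothed inverse document frequency, then L2-normalize."""
    n_docs = len(rows)
    n_terms = len(rows[0]) if rows else 0
    # Document frequency: in how many documents each term occurs.
    df = [sum(1 for row in rows if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    weighted = []
    for row in rows:
        vec = [count * w for count, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        weighted.append([v / norm for v in vec])
    return weighted


vocabulary, counts = count_matrix(["red apple", "red car red"])
print(vocabulary)  # ['apple', 'car', 'red']
print(counts)      # [[1, 0, 1], [0, 1, 2]]
print(tfidf(counts))
```

Terms that occur in fewer documents receive a larger weight, which is why "apple" outweighs "red" in the first document even though both occur once there.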

5.1.1.2 - Training: Bagging Training

Sklearn Training: Bagging Training Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5.1.1.3 - Training: Bernoulli Naive Bayes

Sklearn Training: Bernoulli Naive Bayes Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5.1.1.4 - Training: Complement Naive Bayes

Sklearn Training: Complement Naive Bayes Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5.1.1.5 - Training: Decision Tree

Sklearn Training: Decision Tree Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5.1.1.6 - Training: Dummy Classifier

Sklearn Training: Dummy Classifier Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5.1.1.7 - Training: Extra Tree

Sklearn Training: Extra Tree Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5.1.1.8 - Training: Extra Trees

Sklearn Training: Extra Trees Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5.1.1.9 - Training: Gaussian Naive Bayes

Sklearn Training: Gaussian Naive Bayes Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5.1.1.10 - Training: Gradient Boosting

Sklearn Training: Gradient Boosting Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5.1.1.11 - Training: K-nearest Neighbors

Sklearn Training: K-nearest Neighbors Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5.1.1.12 - Training: Linear Perceptron

Sklearn Training: Linear Perceptron Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5.1.1.13 - Training: Linear Regression

Sklearn Training: Linear Regression Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5.1.1.14 - Training: Linear Support Vector Machine

Sklearn Training: Linear Support Vector Machine Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5.1.1.15 - Training: Logistic Regression

Sklearn Training: Logistic Regression Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5.1.1.16 - Training: Logistic Regression Cross Validation

Sklearn Training: Logistic Regression Cross Validation Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5.1.1.17 - Training: Multi-layer Perceptron

Sklearn Training: Multi-layer Perceptron Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5.1.1.18 - Training: Multinomial Naive Bayes

Sklearn Training: Multinomial Naive Bayes Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5.1.1.19 - Training: Nearest Centroid

Sklearn Training: Nearest Centroid Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5.1.1.20 - Training: Passive Aggressive

Sklearn Training: Passive Aggressive Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5.1.1.21 - Training: Probability Calibration

Sklearn Training: Probability Calibration Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5.1.1.22 - Training: Random Forest

Sklearn Training: Random Forest Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5.1.1.23 - Training: Ridge Regression

Sklearn Training: Ridge Regression Operator

Input Properties

| Property | Requirement | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Target Attribute | | String | - | Attribute in your dataset corresponding to target |
| Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts |
| ↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize |
| ↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation |

Output Ports

| Port | Mode |
| --- | --- |
| 0 | Set Snapshot |

5.1.1.24 - Training: Ridge Regression Cross Validation

Sklearn Training: Ridge Regression Cross Validation Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.1.25 - Training: Stochastic Gradient Descent

Sklearn Training: Stochastic Gradient Descent Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.1.26 - Training: Support Vector Machine

Sklearn Training: Support Vector Machine Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.2 - Adaptive Boosting

Sklearn Adaptive Boosting Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.3 - Bagging

Sklearn Bagging Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.4 - Bernoulli Naive Bayes

Sklearn Bernoulli Naive Bayes Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.5 - Complement Naive Bayes

Sklearn Complement Naive Bayes Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.6 - Decision Tree

Sklearn Decision Tree Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.7 - Dummy Classifier

Sklearn Dummy Classifier Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.8 - Extra Tree

Sklearn Extra Tree Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.9 - Extra Trees

Sklearn Extra Trees Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.10 - Gaussian Naive Bayes

Sklearn Gaussian Naive Bayes Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.11 - Gradient Boosting

Sklearn Gradient Boosting Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.12 - K-nearest Neighbors

Sklearn K-nearest Neighbors Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.13 - Linear Perceptron

Sklearn Linear Perceptron Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.14 - Linear Regression

Sklearn Linear Regression Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Degree | | Integer | 1 | Degree of polynomial function

Output Ports

PortMode
0Set Snapshot
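
As a rough illustration of what the Degree property means, the sketch below expands a single input feature into polynomial terms up to the chosen degree and fits the coefficients by least squares. How Texera applies Degree internally is not documented here; the function names, the single-feature setup, and the normal-equations solver are all assumptions made for the example.

```python
def polynomial_features(x, degree):
    """Expand a scalar x into [1, x, x**2, ..., x**degree]."""
    return [x ** power for power in range(degree + 1)]


def solve(a, b):
    """Solve the square system a @ coef = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_polynomial(xs, ys, degree=1):
    """Least-squares fit via the normal equations (X^T X) coef = X^T y."""
    design = [polynomial_features(x, degree) for x in xs]
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
coef = fit_polynomial(xs, ys, degree=2)  # approximately [1, 2, 3]
```

With the default degree of 1 this reduces to an ordinary straight-line fit; a higher degree lets the same linear solver capture curved relationships.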

5.1.15 - Linear Support Vector Machine

Sklearn Linear Support Vector Machine Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.16 - Logistic Regression

Sklearn Logistic Regression Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.17 - Logistic Regression Cross Validation

Sklearn Logistic Regression Cross Validation Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.18 - Multi-layer Perceptron

Sklearn Multi-layer Perceptron Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.19 - Multinomial Naive Bayes

Sklearn Multinomial Naive Bayes Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.20 - Nearest Centroid

Sklearn Nearest Centroid Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.21 - Passive Aggressive

Sklearn Passive Aggressive Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.22 - Probability Calibration

Sklearn Probability Calibration Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.23 - Random Forest

Sklearn Random Forest Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.24 - Ridge Regression

Sklearn Ridge Regression Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.25 - Ridge Regression Cross Validation

Sklearn Ridge Regression Cross Validation Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.26 - Sklearn Prediction

Sklearn Prediction Operator

Input Properties

Property | Requirement | Type | Default | Description
Model Attribute | | String | model | Attribute corresponding to ML model
Output Attribute Name | | String | prediction | Attribute name of the prediction result
Ground Truth Attribute Name To Ignore | | String | - | Attribute name of the ground truth

Output Ports

PortMode
0Set Snapshot

5.1.27 - Sklearn Testing

Generates scorers for a Sklearn model

Input Properties

Property | Requirement | Type | Default | Description
Regression | | Boolean | false | Choose to solve a regression task
Model Attribute | | String | model | Attribute corresponding to ML model
Target Attribute | | String | - | Attribute in your dataset corresponding to target

Output Ports

PortMode
0Set Snapshot

5.1.28 - Stochastic Gradient Descent

Sklearn Stochastic Gradient Descent Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.1.29 - Support Vector Machine

Sklearn Support Vector Machine Operator

Input Properties

Property | Requirement | Type | Default | Description
Target Attribute | | String | - | Attribute in your dataset corresponding to target
Count Vectorizer | | Boolean | false | Convert a collection of text documents to a matrix of token counts
↳ Text Attribute | | String | - | Attribute in your dataset with text to vectorize
↳ Tfidf Transformer | | Boolean | false | Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

PortMode
0Set Snapshot

5.2 - Advanced Sklearn

Operators in the Advanced Sklearn category

Operators

Operator | Description
KNN Classifier | Sklearn KNN Classifier Operator
KNN Regressor | Sklearn KNN Regressor Operator
SVM Classifier | Sklearn SVM Classifier Operator
SVM Regressor | Sklearn SVM Regressor Operator

Total: 4 operators

5.2.1 - KNN Classifier

Sklearn KNN Classifier Operator

Input Properties

Property | Requirement | Type | Default | Description
Parameter Setting | | SklearnAdvancedKNNParameters | - |
Ground Truth Attribute Column | | String | - | Ground truth attribute column
Selected Features | | List | - | Features used to train the model

Output Ports

PortMode
0Set Snapshot

5.2.2 - KNN Regressor

Sklearn KNN Regressor Operator

Input Properties

Property | Requirement | Type | Default | Description
Parameter Setting | | SklearnAdvancedKNNParameters | - |
Ground Truth Attribute Column | | String | - | Ground truth attribute column
Selected Features | | List | - | Features used to train the model

Output Ports

PortMode
0Set Snapshot

5.2.3 - SVM Classifier

Sklearn SVM Classifier Operator

Input Properties

Property | Requirement | Type | Default | Description
Parameter Setting | | SklearnAdvancedSVCParameters | - |
Ground Truth Attribute Column | | String | - | Ground truth attribute column
Selected Features | | List | - | Features used to train the model

Output Ports

PortMode
0Set Snapshot

5.2.4 - SVM Regressor

Sklearn SVM Regressor Operator

Input Properties

Property | Requirement | Type | Default | Description
Parameter Setting | | SklearnAdvancedSVRParameters | - |
Ground Truth Attribute Column | | String | - | Ground truth attribute column
Selected Features | | List | - | Features used to train the model

Output Ports

PortMode
0Set Snapshot

5.3 - Hugging Face

Operators in the Hugging Face category

Operators

Operator | Description
Hugging Face Iris Logistic Regression | Predict whether an iris is an Iris-setosa using a pre-trained logistic regression model
Hugging Face Sentiment Analysis | Analyzing Sentiments with a Twitter-Based Model from Hugging Face
Hugging Face Spam Detection | Spam Detection by SMS Spam Detection Model from Hugging Face
Hugging Face Text Summarization | Summarize the given text content with a mini2bert pre-trained model from Hugging Face

Total: 4 operators

5.3.1 - Hugging Face Iris Logistic Regression

Predict whether an iris is an Iris-setosa using a pre-trained logistic regression model

Input Properties

Property | Requirement | Type | Default | Description
Petal Length Cm Attribute | | String | - | Attribute in your dataset corresponding to PetalLengthCm
Petal Width Cm Attribute | | String | - | Attribute in your dataset corresponding to PetalWidthCm
Prediction Class Name | | String | Species_prediction | Output attribute name for the predicted class of species
Prediction Probability Name | | String | Species_probability | Output attribute name for the prediction’s probability of being an Iris-setosa

Output Ports

PortMode
0Set Snapshot

5.3.2 - Hugging Face Sentiment Analysis

Analyzing Sentiments with a Twitter-Based Model from Hugging Face

Input Properties

Property | Requirement | Type | Default | Description
Attribute | | String | - | Column to perform sentiment analysis on
Positive Result Attribute | | String | huggingface_sentiment_positive | Column name of the sentiment analysis result (positive)
Neutral Result Attribute | | String | huggingface_sentiment_neutral | Column name of the sentiment analysis result (neutral)
Negative Result Attribute | | String | huggingface_sentiment_negative | Column name of the sentiment analysis result (negative)

Output Ports

PortMode
0Set Snapshot

5.3.3 - Hugging Face Spam Detection

Spam Detection by SMS Spam Detection Model from Hugging Face

Input Properties

Property | Requirement | Type | Default | Description
Attribute | | String | - | Column to perform spam detection on
Spam Result Attribute | | String | is_spam | Column name of the spam flag (whether the text is spam or not)
Score Result Attribute | | String | score | Column name of the classification probability

Output Ports

PortMode
0Set Snapshot

5.3.4 - Hugging Face Text Summarization

Summarize the given text content with a mini2bert pre-trained model from Hugging Face

Input Properties

Property | Requirement | Type | Default | Description
Attribute | | String | - | Attribute to perform text summarization on
Result Attribute Name | | String | summary | Attribute name of the text summary result

Output Ports

PortMode
0Set Snapshot

5.4 - Machine Learning General

Operators in the Machine Learning General category

Operators

Operator | Description
Machine Learning Scorer | Scorer for machine learning models

Total: 1 operator

5.4.1 - Machine Learning Scorer

Scorer for machine learning models

Input Properties

Property | Requirement | Type | Default | Description
Regression | | Boolean | false | Choose to solve a regression task
↳ Scorer Functions | | List | - | Select classification task metrics
↳ Scorer Functions | | List | - | Select regression task metrics
Actual Value | | String | - | Specify the label attribute
Predicted Value | | String | - | Specify the attribute generated by the model

Output Ports

PortMode
0Set Snapshot
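
The Regression flag switches between two families of metrics computed from the Actual Value and Predicted Value attributes. The sketch below shows the idea with a hypothetical `score` function; the specific metrics (accuracy for classification, MSE and MAE for regression) are assumptions chosen for illustration, not the operator's actual list of Scorer Functions.

```python
def score(actual, predicted, regression=False):
    """Compare labels with model output and return a dict of metric values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one row is required")
    n = len(actual)
    if regression:
        # Regression: measure how far the predictions are from the labels.
        errors = [a - p for a, p in zip(actual, predicted)]
        return {
            "mse": sum(e * e for e in errors) / n,
            "mae": sum(abs(e) for e in errors) / n,
        }
    # Classification: count exact label matches.
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return {"accuracy": correct / n}


classification = score(["spam", "ham", "spam", "ham"], ["spam", "ham", "ham", "ham"])
regression = score([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], regression=True)
```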

6 - Utilities

Operators in the Utilities category

Operators

Operator | Description
Random K Sampling | Random sampling with given percentage
Reservoir Sampling | Reservoir Sampling with k items being kept randomly
Split | Split data into two different ports
Unnest String | Unnest the string values in the column separated by a delimiter to multiple values

Total: 4 operators

6.1 - Random K Sampling

Random sampling with given percentage

Input Properties

Property | Requirement | Type | Default | Description
Random K Sample Percentage | | Integer | 0 | Random k sampling with given percentage

Output Ports

PortMode
0Set Snapshot
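
One simple way to sample by percentage is to keep each tuple independently with probability percentage/100, as sketched below. Whether Texera samples per tuple like this or draws an exact count is not stated here, so treat the approach, the function name, and the `seed` parameter as assumptions.

```python
import random


def random_percentage_sample(rows, percentage, seed=None):
    """Keep each row independently with probability percentage / 100."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    rng = random.Random(seed)
    threshold = percentage / 100
    # rng.random() is in [0, 1), so 0 keeps nothing and 100 keeps everything.
    return [row for row in rows if rng.random() < threshold]


rows = list(range(1000))
sample = random_percentage_sample(rows, percentage=20, seed=42)
```

With this approach the output size is only approximately 20% of the input; it varies from run to run unless a seed is fixed.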

6.2 - Reservoir Sampling

Reservoir Sampling with k items being kept randomly

Input Properties

Property | Requirement | Type | Default | Description
Number Of Item Sampled In Reservoir Sampling | | Integer | 0 | Reservoir sampling with k items being kept randomly

Output Ports

PortMode
0Set Snapshot
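
Reservoir sampling keeps a uniform random sample of k items from a stream whose length is not known in advance. The sketch below uses the classic "Algorithm R" formulation; whether Texera uses exactly this variant is an assumption, and the function name and `seed` parameter are hypothetical.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return k items chosen uniformly at random from an iterable of any length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(100), k=5, seed=7)
```

Because only the reservoir is held in memory, the input can be consumed one tuple at a time, which suits a streaming operator.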

6.3 - Split

Split data into two different ports

Input Properties

Property | Requirement | Type | Default | Description
Split Percentage | | Integer | 80 | Percentage of data going to the upper port
Auto-Generate Seed | | Boolean | true | Shuffle the data based on a random seed
↳ Seed | | Integer | 1 | An int for reproducible output across multiple runs

Output Ports

PortMode
0Set Snapshot
1Set Snapshot
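
The interaction between Split Percentage, Auto-Generate Seed, and Seed can be sketched as follows. This is a hypothetical model, not Texera's implementation: it assumes each tuple is routed independently to the upper port (port 0) with the given probability, and that a fixed seed is used only when auto-generation is turned off.

```python
import random


def split(rows, split_percentage=80, auto_generate_seed=True, seed=1):
    """Route each row to the upper or lower output; returns (upper, lower)."""
    # A fixed seed makes the routing reproducible across runs.
    rng = random.Random(None if auto_generate_seed else seed)
    upper, lower = [], []
    for row in rows:
        if rng.random() * 100 < split_percentage:
            upper.append(row)
        else:
            lower.append(row)
    return upper, lower


rows = list(range(1000))
train, test = split(rows, split_percentage=80, auto_generate_seed=False, seed=1)
```

A typical use is an 80/20 train/test split: disabling Auto-Generate Seed and fixing Seed yields the same partition every time the workflow runs.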

6.4 - Unnest String

Unnest the string values in the column separated by a delimiter to multiple values

Input Properties

Property | Requirement | Type | Default | Description
Delimiter | | String | , | String that separates the data
Attribute | | String | - | Column of the string to unnest
Result Attribute | | String | unnestResult | Column name of the unnest result

Output Ports

PortMode
0Set Snapshot
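
As an illustration of unnesting, the sketch below turns one input row into one output row per delimited value, copying the other columns unchanged. Rows are modeled as plain dictionaries and the function name is hypothetical; details such as whitespace trimming or handling of empty values are not specified here, so this version leaves each piece exactly as split.

```python
def unnest_string(rows, attribute, delimiter=",", result_attribute="unnestResult"):
    """Yield one copy of each row per delimited value in row[attribute]."""
    for row in rows:
        for part in str(row[attribute]).split(delimiter):
            # Keep all original columns and add the single unnested value.
            yield {**row, result_attribute: part}


rows = [
    {"id": 1, "tags": "red,green"},
    {"id": 2, "tags": "blue"},
]
unnested = list(unnest_string(rows, attribute="tags"))
```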

7 - External API

Operators in the External API category

Operators

Operator | Description
Reddit Search | Search for recent posts with PRAW, the Python wrapper for the Reddit API
Twitter Full Archive Search API | Retrieve data from Twitter Full Archive Search API
Twitter Search API | Retrieve data from Twitter Search API
URL Fetcher | Fetch the content of a single URL

Total: 4 operators

7.1 - Reddit Search

Search for recent posts with PRAW, the Python wrapper for the Reddit API

Input Properties

Property | Requirement | Type | Default | Description
Client Id | | String | - | Client ID used to access the Reddit API
Client Secret | | String | - | Client secret used to access the Reddit API
Query | | String | - | Search query
Limit | | Integer | 100 | Up to 1000
Sorting | | none, controversial, gilded, hot, new, rising, top | none | The sorting method (hot, new, etc.)

Output Ports

PortMode
0Set Snapshot

7.2 - Twitter Full Archive Search API

Retrieve data from Twitter Full Archive Search API

Input Properties

Property | Requirement | Type | Default | Description
API Key | | String | - |
API Secret Key | | String | - |
Stop Upon Rate Limit | | Boolean | false | Stop when hitting rate limit?
Search Query | | String | - | Up to 1024 characters (limited by Twitter)
From Datetime | | String | 2021-04-01T00:00:00Z | ISO 8601 format
To Datetime | | String | 2021-05-01T00:00:00Z | ISO 8601 format
Limit | | Integer | 100 | Maximum number of tweets to retrieve

Output Ports

PortMode
0Set Snapshot
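
Before running a search it can help to check the inputs against the constraints listed above: a query of at most 1024 characters and ISO 8601 datetimes in the `YYYY-MM-DDTHH:MM:SSZ` form shown in the defaults. The helper below is a hypothetical pre-check written with the standard library only; it does not call the Twitter API, and the rule that the start must precede the end is an assumption.

```python
from datetime import datetime, timezone

MAX_QUERY_LENGTH = 1024  # limit stated in the Search Query description


def parse_utc(value):
    """Parse a timestamp such as 2021-04-01T00:00:00Z into an aware datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def validate_search(query, from_datetime, to_datetime):
    """Return (start, end) as datetimes, or raise ValueError on bad input."""
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError("search query exceeds 1024 characters")
    start, end = parse_utc(from_datetime), parse_utc(to_datetime)
    if start >= end:
        raise ValueError("From Datetime must be earlier than To Datetime")
    return start, end


start, end = validate_search("texera", "2021-04-01T00:00:00Z", "2021-05-01T00:00:00Z")
```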

7.3 - Twitter Search API

Retrieve data from Twitter Search API

Input Properties

Property | Requirement | Type | Default | Description
API Key | | String | - |
API Secret Key | | String | - |
Stop Upon Rate Limit | | Boolean | false | Stop when hitting rate limit?
Search Query | | String | - | Up to 1024 characters (limited by Twitter)
Limit | | Integer | 100 | Maximum number of tweets to retrieve

Output Ports

PortMode
0Set Snapshot

7.4 - URL Fetcher

Fetch the content of a single URL

Input Properties

Property | Requirement | Type | Default | Description
URL | | String | - | Only accepts standard URL format
Decoding | | UTF-8, RAW BYTES | - | The decoding method for the URL content

Output Ports

PortMode
0Set Snapshot
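
The two Decoding choices differ in what the operator hands downstream: decoded text or the untouched bytes. A minimal sketch of that distinction, using only the Python standard library, might look like the following; the function names are hypothetical and this is not the operator's actual code.

```python
from urllib.request import urlopen


def decode_content(raw, decoding="UTF-8"):
    """Return text for UTF-8, or the unchanged bytes for RAW BYTES."""
    if decoding == "UTF-8":
        return raw.decode("utf-8")
    if decoding == "RAW BYTES":
        return raw
    raise ValueError(f"unsupported decoding: {decoding}")


def fetch(url, decoding="UTF-8", timeout=10):
    """Fetch a single URL and decode its body as requested."""
    with urlopen(url, timeout=timeout) as response:
        return decode_content(response.read(), decoding)


text = decode_content("héllo".encode("utf-8"), "UTF-8")
blob = decode_content(b"\xff\xfe\x00", "RAW BYTES")
```

RAW BYTES is the safer choice for binary content such as images, where UTF-8 decoding would fail.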

8 - User-defined Functions

Operators in the User-defined Functions category

8.1 - Python

Operators in the Python category

Operators

Operator | Description
2-in Python UDF | User-defined function operator in Python script
Python Lambda Function | Modify or add a new column with more ease
Python Table Reducer | Reduce Table to Tuple
1-out Python UDF | User-defined function operator in Python script
Python UDF | User-defined function operator in Python script

Total: 5 operators

8.1.1 - 1-out Python UDF

User-defined function operator in Python script

Input Properties

Property | Requirement | Type | Default | Description
Python script | | Code (python) | See template below | Input your code here
Worker count | | Integer | 1 | Specify how many parallel workers to launch
Columns | | List | - | The columns of the source
↳ Attribute Name | | String | - |
↳ Attribute Type | | string, integer, long, double, boolean, timestamp, binary, large_binary | - |

Default Code Template

Python script

# from pytexera import *
# class GenerateOperator(UDFSourceOperator):
# 
#     @overrides
#     
#     def produce(self) -> Iterator[Union[TupleLike, TableLike, None]]:
#         yield

Output Ports

PortMode
0Set Snapshot

8.1.2 - 2-in Python UDF

User-defined function operator in Python script

Input Properties

Property | Requirement | Type | Default | Description
Python script | | Code (python) | See template below | Input your code here
Worker count | | Integer | 1 | Specify how many parallel workers to launch
Retain input columns | | Boolean | true | Keep the original input columns?
Extra output column(s) | | List | - | Name of the newly added output columns that the UDF will produce, if any
↳ Attribute Name | | String | - |
↳ Attribute Type | | string, integer, long, double, boolean, timestamp, binary, large_binary | - |

Default Code Template

Python script

# Choose from the following templates:
# 
# from pytexera import *
# 
# class ProcessTupleOperator(UDFOperatorV2):
#     
#     @overrides
#     def process_tuple(self, tuple_: Tuple, port: int) -> Iterator[Optional[TupleLike]]:
#         yield tuple_
# 
# class ProcessBatchOperator(UDFBatchOperator):
#     BATCH_SIZE = 10 # must be a positive integer
# 
#     @overrides
#     def process_batch(self, batch: Batch, port: int) -> Iterator[Optional[BatchLike]]:
#         yield batch
# 
# class ProcessTableOperator(UDFTableOperator):
# 
#     @overrides
#     def process_table(self, table: Table, port: int) -> Iterator[Optional[TableLike]]:
#         yield table

Output Ports

PortMode
0Set Snapshot

8.1.3 - Python Lambda Function

Modify or add a new column with more ease

Input Properties

Property | Requirement | Type | Default | Description
Add/Modify column(s) | | List | - |
↳ Attribute Name | | String | - |
↳ Expression | | String | - |
↳ Attribute Type | | string, integer, long, double, boolean, timestamp, binary, large_binary | - |

Output Ports

PortMode
0Set Snapshot

8.1.4 - Python Table Reducer

Reduce Table to Tuple

Input Properties

Property | Requirement | Type | Default | Description
Output columns | | List | - |
↳ Attribute Name | | String | - |
↳ Expression | | String | - |
↳ Attribute Type | | string, integer, long, double, boolean, timestamp, binary, large_binary | - |

Output Ports

PortMode
0Set Snapshot

8.1.5 - Python UDF

User-defined function operator in Python script

Input Properties

Property | Requirement | Type | Default | Description
Python script | | Code (python) | See template below | Input your code here
Worker count | | Integer | 1 | Specify how many parallel workers to launch
Retain input columns | | Boolean | true | Keep the original input columns?
Extra output column(s) | | List | - | Name of the newly added output columns that the UDF will produce, if any
↳ Attribute Name | | String | - |
↳ Attribute Type | | string, integer, long, double, boolean, timestamp, binary, large_binary | - |

Default Code Template

Python script

# Choose from the following templates:
# 
# from pytexera import *
# 
# class ProcessTupleOperator(UDFOperatorV2):
#     
#     @overrides
#     def process_tuple(self, tuple_: Tuple, port: int) -> Iterator[Optional[TupleLike]]:
#         yield tuple_
# 
# class ProcessBatchOperator(UDFBatchOperator):
#     BATCH_SIZE = 10 # must be a positive integer
# 
#     @overrides
#     def process_batch(self, batch: Batch, port: int) -> Iterator[Optional[BatchLike]]:
#         yield batch
# 
# class ProcessTableOperator(UDFTableOperator):
# 
#     @overrides
#     def process_table(self, table: Table, port: int) -> Iterator[Optional[TableLike]]:
#         yield table

Output Ports

PortMode
0Set Snapshot

8.2 - Java

Operators in the Java category

Operators

Operator | Description
Java UDF | User-defined function operator in Java script

Total: 1 operator

8.2.1 - Java UDF

User-defined function operator in Java script

Input Properties

Property | Requirement | Type | Default | Description
Java UDF script | | Code (java) | See template below | Input your code here
Worker count | | Integer | 1 | Specify how many parallel workers to launch
Retain input columns | | Boolean | true | Keep the original input columns?
Extra output column(s) | | List | - | Name of the newly added output columns that the UDF will produce, if any
↳ Attribute Name | | String | - |
↳ Attribute Type | | string, integer, long, double, boolean, timestamp, binary, large_binary | - |

Default Code Template

Java UDF script

import org.apache.texera.amber.operator.map.MapOpExec;
import org.apache.texera.amber.core.tuple.Tuple;
import org.apache.texera.amber.core.tuple.TupleLike;
import scala.Function1;
import java.io.Serializable;

public class JavaUDFOpExec extends MapOpExec {
    public JavaUDFOpExec () {
        this.setMapFunc((Function1<Tuple, TupleLike> & Serializable) this::processTuple);
    }
    
    public TupleLike processTuple(Tuple tuple) {
        return tuple;
    }
}

Output Ports

PortMode
0Set Snapshot

8.3 - R

Operators in the R category

Operators

Operator | Description
R UDF | User-defined function operator in R script
1-out R UDF | User-defined function operator in R script

Total: 2 operators

8.3.1 - 1-out R UDF

User-defined function operator in R script

Input Properties

Property | Requirement | Type | Default | Description
R Source UDF Script | | Code (r) | See template below | Input your code here
Worker count | | Integer | 1 | Specify how many parallel workers to launch
Use Tuple API? | | Boolean | false | Check this box to use Tuple API, leave unchecked to use Table API
Columns | | List | - | The columns of the source
↳ Attribute Name | | String | - |
↳ Attribute Type | | string, integer, long, double, boolean, timestamp, binary, large_binary | - |

Default Code Template

R Source UDF Script

# If using Table API:
# function() { 
#   return (data.frame(Column_Here = "Value_Here")) 
# }

# If using Tuple API:
# library(coro)
# coro::generator(function() {
#   yield (list(text= "hello world!"))
# })

Output Ports

Port | Mode
0 | Set Snapshot

8.3.2 - R UDF

User-defined function operator in R script

Input Properties

Property | Requirement | Type | Default | Description
R UDF Script | | Code (r) | See template below | Input your code here
Worker count | | Integer | 1 | Specify how many parallel workers to launch
Use Tuple API? | | Boolean | false | Check this box to use Tuple API, leave unchecked to use Table API
Retain input columns | | Boolean | true | Keep the original input columns?
Extra output column(s) | | List | - | Name of the newly added output columns that the UDF will produce, if any
↳ Attribute Name | | String | - |
↳ Attribute Type | | string, integer, long, double, boolean, timestamp, binary, large_binary | - |

Default Code Template

R UDF Script

# If using Table API:
# function(table, port) { 
#   return (table) 
# }

# If using Tuple API:
# library(coro)
# coro::generator(function(tuple, port) {
#   yield (tuple)
# })

Output Ports

Port | Mode
0 | Set Snapshot

9 - Visualization

Operators in the Visualization category

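Subcategories

Subcategory | Operators
Basic | 16
Statistical | 8
Scientific | 14
Financial | 5
Media | 4
Advanced | 2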

Operators

Operator | Description
Nested Table | Visualize Data in a Depth Two Nested Table

Total: 1 operator

9.1 - Basic

Operators in the Basic category

Operators

Operator | Description
Bar Chart | Visualize data in a Bar Chart
Bubble Chart | A 3D Scatter Plot; Bubbles are graphed using x and y labels, and their sizes determined by a z-value.
Dot Plot | Visualize data using a dot plot
Dumbbell Plot | Visualize data in a Dumbbell Plot. A dumbbell plot (also known as a lollipop chart) is typically used to compare two distinct values or time points for the same entity.
Figure Factory Table | Visualize data in a figure factory table
Filled Area Plot | Visualize data in a filled area plot
Gantt Chart | A Gantt chart is a type of bar chart that illustrates a project schedule. The chart lists the tasks to be performed on the vertical axis, and time intervals on the horizontal axis. The width of the horizontal bars in the graph shows the duration of each activity.
Hierarchy Chart | Visualize data in a hierarchy
Icicle Chart | Visualize hierarchical data from root to leaves
Line Chart | View the result in a line chart
Pie Chart | Visualize data in a Pie Chart
Range Slider | Visualize data in a Range Slider
Sankey Diagram | Visualize data using a Sankey diagram
Scatter Plot | View the result in a scatterplot
Tables Plot | Visualize data in a table chart.
Time Series Plot | Visualize trends and patterns over time.

Total: 16 operators

9.1.1 - Bar Chart

Visualize data in a Bar Chart

Input Properties

Property | Requirement | Type | Default | Description
Fields | | String | - | Visualize categorical data in a Bar Chart
Category Column | | String | No Selection | Optional - Select a column to Color Code the Categories
Horizontal Orientation | | Boolean | false | Orientation Style
Pattern | | String | - | Add texture to the chart based on an attribute
Value Column | | String (integer, long, double) | - | The value associated with each category

Output Ports

Port | Mode
0 | Single Snapshot

9.1.2 - Bubble Chart

A 3D Scatter Plot; Bubbles are graphed using x and y labels, and their sizes determined by a z-value.

Input Properties

Property | Requirement | Type | Default | Description
X-Column | | String | - | Data column for the x-axis
Y-Column | | String | - | Data column for the y-axis
Z-Column | | String | - | Data column to determine bubble size
Enable Color | | Boolean | false | Colors bubbles using a data column
Color-Column | | String | - | Picks data column to color bubbles with if color is enabled

Output Ports

Port | Mode
0 | Single Snapshot

9.1.3 - Dot Plot

Visualize data using a dot plot

Input Properties

Property | Requirement | Type | Default | Description
Count Attribute | | String | - | The attribute for the counting of the dot plot

Output Ports

Port | Mode
0 | Single Snapshot

9.1.4 - Dumbbell Plot

Visualize data in a Dumbbell Plot. A dumbbell plot (also known as a lollipop chart) is typically used to compare two distinct values or time points for the same entity.

Input Properties

Property | Requirement | Type | Default | Description
Category Column Name | | String | - | The name of the category column
Dumbbell Start Value | | String | - | The start point value of each dumbbell
Dumbbell End Value | | String | - | The end value of each dumbbell
Measurement Column Name | | String (integer, long, double) | - | The name of the measurement column
Compared Column Name | | String | - | The column name that is being compared
Dots | | List | - |
↳ Dot Column Value | | String (integer, long, double) | - | Value for dot axis
Show Legends? | | Boolean | false | Whether to show legends in the graph

Output Ports

Port | Mode
0 | Single Snapshot

9.1.5 - Figure Factory Table

Visualize data in a figure factory table

Input Properties

Property | Requirement | Type | Default | Description
Font Size | | Double | 12 | Font size of the Figure Factory Table
Font Color (Hex Code) | | String | #000000 | Font color of the Figure Factory Table
Row Height | | Double | 30 | Row height of the Figure Factory Table
Add Attribute | | List | [1 items] | List of columns to include in the figure factory table
↳ Attribute Name | | String | - |

Output Ports

Port | Mode
0 | Single Snapshot

9.1.6 - Filled Area Plot

Visualize data in a filled area plot

Input Properties

Property | Requirement | Type | Default | Description
X-axis Attribute | | String | - | The attribute for your x-axis
Y-axis Attribute | | String | - | The attribute for your y-axis
Line Group | | String | - | The attribute for group of each line
Color | | String | - | Choose an attribute to color the plot
Split Plot by Line Group | | Boolean | false | Do you want to split the graph
Pattern | | String | - | Add texture to the chart based on an attribute

Output Ports

Port | Mode
0 | Single Snapshot

9.1.7 - Gantt Chart

A Gantt chart is a type of bar chart that illustrates a project schedule. The chart lists the tasks to be performed on the vertical axis, and time intervals on the horizontal axis. The width of the horizontal bars in the graph shows the duration of each activity.

Input Properties

Property | Requirement | Type | Default | Description
Pattern | | String | - | Add texture to the chart based on an attribute
Start Datetime Column | | String (timestamp) | - | The start timestamp of the task
Finish Datetime Column | | String (timestamp) | - | The end timestamp of the task
Task Column | | String | - | The name of the task
Color Column | | String | - | Column to color tasks

Output Ports

Port | Mode
0 | Single Snapshot

9.1.8 - Hierarchy Chart

Visualize data in a hierarchy

Input Properties

Property | Requirement | Type | Default | Description
Chart Type | | treemap, sunburst | - | Treemap or Sunburst
Hierarchy Path | | List | - | Hierarchy of attributes from a higher-level category to lower-level category
↳ Attribute Name | | String | - |
Value Column | | String (integer, long, double) | - | The value associated with the size of each sector in the chart

Output Ports

Port | Mode
0 | Single Snapshot

9.1.9 - Icicle Chart

Visualize hierarchical data from root to leaves

Input Properties

Property | Requirement | Type | Default | Description
Hierarchy Path | | List | - | Hierarchy of attributes from a root (higher-level category) to leaves (lower-level category)
↳ Attribute Name | | String | - |
Value Column | | String (integer, long, double) | - | The value associated with the size of each sector in the chart

Output Ports

Port | Mode
0 | Single Snapshot

9.1.10 - Line Chart

View the result in a line chart

Input Properties

Property | Requirement | Type | Default | Description
Y Label | | String | Y Axis | The label for y axis
X Label | | String | X Axis | The label for x axis
Lines | | List | - |
↳ Y Value | | String | - | Value for y axis
↳ X Value | | String | - | Value for x axis
↳ Line Mode | | line, dots, line with dots | line with dots |
↳ Line Name | | String | - |
↳ Line Color | | String | - | Must be a valid CSS color or hex color string

Output Ports

Port | Mode
0 | Single Snapshot

9.1.11 - Pie Chart

Visualize data in a Pie Chart

Input Properties

Property | Requirement | Type | Default | Description
Value Column | | String (integer, long, double) | - | The value associated with slice of pie
Name Column | | String | - | The name of the slice of pie

Output Ports

Port | Mode
0 | Single Snapshot

9.1.12 - Range Slider

Visualize data in a Range Slider

Input Properties

Property | Requirement | Type | Default | Description
Y-axis | | String | - | The name of the column to represent y-axis
X-axis | | String | - | The name of the column to represent the x-axis
Handle Duplicates | | Nothing, Mean, Sum | NOTHING | How to handle duplicate values in y-axis

Output Ports

Port | Mode
0 | Single Snapshot

9.1.13 - Sankey Diagram

Visualize data using a Sankey diagram

Input Properties

Property | Requirement | Type | Default | Description
Source Attribute | | String | - | The source node of the Sankey diagram
Target Attribute | | String | - | The target node of the Sankey diagram
Value Attribute | | String | - | The value/volume of the flow between source and target

Output Ports

Port | Mode
0 | Single Snapshot

9.1.14 - Scatter Plot

View the result in a scatterplot

Input Properties

Property | Requirement | Type | Default | Description
X-Column | | String (integer, double) | - | X Column
Y-Column | | String (integer, double) | - | Y Column
Alpha Value | | Double | 1.0 | Alpha (opacity) value from 0.0 (transparent) to 1.0 (opaque)
Color-Column | | String | - | Dots will be assigned different colors based on their values of this column
log scale X | | Boolean | false | Values in X-column are log-scaled
log scale Y | | Boolean | false | Values in Y-column are log-scaled
Hover column | | String | - | Column value to display when a dot is hovered over

Output Ports

Port | Mode
0 | Single Snapshot

9.1.15 - Tables Plot

Visualize data in a table chart.

Input Properties

Property | Requirement | Type | Default | Description
Add Attribute | | List | - | List of columns to include in the table chart
↳ Attribute Name | | String | - |

Output Ports

Port | Mode
0 | Single Snapshot

9.1.16 - Time Series Plot

Visualize trends and patterns over time.

Input Properties

Property | Requirement | Type | Default | Description
Time Column | | String | - | The column containing time/date values (e.g., Date, Timestamp)
Value Column | | String | - | The numerical column to plot on the Y-axis (e.g., Sales, Temperature)
Category Column | | String | No Selection | Optional - A categorical column to create separate lines
Facet Column | | String | No Selection | Optional - A column to create separate subplots
Plot Type | | String | line | Select the type of time series plot (line, area)
Show Range Slider | | Boolean | false | Display a range slider at the bottom of the plot

Output Ports

Port | Mode
0 | Single Snapshot

9.2 - Statistical

Operators in the Statistical category

Operators

Operator | Description
Box/Violin Plot | Visualize data using either a Box Plot or a Violin Plot. Box plots are drawn as a box with a vertical line down the middle which marks the median value, and have horizontal lines attached to each side (known as “whiskers”). Violin plots provide more detail by showing a smoothed density curve on each side, and also include a box plot inside for comparison.
Continuous Error Bands | Visualize error or uncertainty along a continuous line
Empirical Cumulative Distribution Plot | Visualize the empirical cumulative distribution of a numeric column.
Histogram | Visualize data in a Histogram Chart
Histogram2D | Displays a bivariate histogram as a density heatmap
Scatter Matrix Chart | Visualize datasets in a Scatter Matrix
Strip Chart | Visualize distribution of data points as a strip plot
Tree Plot | Visualize hierarchical data as a top-down, interactive, auto-sizing tree

Total: 8 operators

9.2.1 - Box/Violin Plot

Visualize data using either a Box Plot or a Violin Plot. Box plots are drawn as a box with a vertical line down the middle which marks the median value, and have horizontal lines attached to each side (known as “whiskers”). Violin plots provide more detail by showing a smoothed density curve on each side, and also include a box plot inside for comparison.

Input Properties

Property | Requirement | Type | Default | Description
Value Column | | String (integer, long, double) | - | Data column for box plot
Quartile Method | | linear, inclusive, exclusive | linear |
Horizontal Orientation | | Boolean | false | Orientation style
Violin Plot | | Boolean | false | Check this box to overlay a violin plot on the box plot; otherwise, show only the box plot

Output Ports

Port | Mode
0 | Single Snapshot

9.2.2 - Continuous Error Bands

Visualize error or uncertainty along a continuous line

Input Properties

Property | Requirement | Type | Default | Description
X Label | | String | X Axis | Label used for x axis
Y Label | | String | Y Axis | Label used for y axis
Bands | | List | - |
↳ Y-Axis Upper Bound | | String | - | Represents upper bound error of y-values
↳ Y-Axis Lower Bound | | String | - | Represents lower bound error of y-values
↳ Fill Color | | String | - | Must be a valid CSS color or hex color string
↳ Y Value | | String | - | Value for y axis
↳ X Value | | String | - | Value for x axis
↳ Line Mode | | line, dots, line with dots | line with dots |
↳ Line Name | | String | - |
↳ Line Color | | String | - | Must be a valid CSS color or hex color string

Output Ports

Port | Mode
0 | Single Snapshot

9.2.3 - Empirical Cumulative Distribution Plot

Visualize the empirical cumulative distribution of a numeric column.

Input Properties

Property | Requirement | Type | Default | Description
Value Column | | String (integer, long, double) | - | Numeric column used to compute the empirical cumulative distribution
Color Column | | String | - | Optional column for coloring ECDF lines by group
Separate By Column | | String | - | Optional column for splitting ECDF plots into subplots
Y Axis Mode | | String | probability | Display cumulative probability, raw count, or cumulative sum
CDF Mode | | String | standard | ‘standard’ shows P(X ≤ x), ‘reversed’ shows P(X ≥ x), ‘complementary’ shows 1 - P(X ≤ x)
Orientation | | String | vertical | Plot ECDF vertically or horizontally
Show Markers | | Boolean | false | Display sample markers on the ECDF line
Marginal Plot | | String | none | Optional marginal plot to display alongside the ECDF

Output Ports

Port | Mode
0 | Single Snapshot
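
The three CDF Mode options differ only in which tail of the sample they accumulate. The following is a minimal sketch over a plain Python list, assuming every sample carries equal weight. The function name `ecdf_value` is made up for this example and is not part of the operator.

```python
def ecdf_value(samples, x, mode="standard"):
    """Empirical probability at x for the given CDF mode."""
    n = len(samples)
    if n == 0:
        raise ValueError("samples must not be empty")
    at_most = sum(1 for s in samples if s <= x) / n   # P(X <= x)
    if mode == "standard":
        return at_most
    if mode == "reversed":
        return sum(1 for s in samples if s >= x) / n  # P(X >= x)
    if mode == "complementary":
        return 1 - at_most                            # 1 - P(X <= x)
    raise ValueError(f"unknown mode: {mode}")
```

For the samples `[1, 2, 2, 4]` and `x = 2`, this returns 0.75 for 'standard', 0.75 for 'reversed', and 0.25 for 'complementary'. The last two differ because 'reversed' counts samples equal to `x` and 'complementary' does not.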

9.2.4 - Histogram

Visualize data in a Histogram Chart

Input Properties

Property | Requirement | Type | Default | Description
Color Column | | String | - | Column for differentiating data by its value
SeparateBy Column | | String | - | Column for separating histogram chart by its value
Distribution Type | | String | - | Distribution type (rug, box, violin)
Pattern | | String | - | Add texture to the chart based on an attribute
Value Column | | String | - | Column for counting values

Output Ports

Port | Mode
0 | Single Snapshot

9.2.5 - Histogram2D

Displays a bivariate histogram as a density heatmap

Input Properties

Property | Requirement | Type | Default | Description
X Column | | String | - | Numeric column for the X axis bins
Y Column | | String | - | Numeric column for the Y axis bins
X Bins | | Integer | 10 | Number of bins along the X axis (Default: 10)
Y Bins | | Integer | 10 | Number of bins along the Y axis (Default: 10)
Normalization | | density, probability, percent | density | Type of histogram normalization

Output Ports

Port | Mode
0 | Single Snapshot

9.2.6 - Scatter Matrix Chart

Visualize datasets in a Scatter Matrix

Input Properties

Property | Requirement | Type | Default | Description
Selected Attributes | | List | - | The axes of each scatter plot in the matrix
Color Column | | String | - | Column to color points

Output Ports

Port | Mode
0 | Single Snapshot

9.2.7 - Strip Chart

Visualize distribution of data points as a strip plot

Input Properties

Property | Requirement | Type | Default | Description
X-Axis Column | | String | - | Column containing numeric values for the x-axis
Y-Axis Column | | String | - | Column containing categorical values for the y-axis
Color By | | String | - | Optional - Color points by category
Facet Column | | String | - | Optional - Create separate subplots for each category

Output Ports

Port | Mode
0 | Single Snapshot

9.2.8 - Tree Plot

Visualize hierarchical data as a top-down, interactive, auto-sizing tree

Input Properties

Property | Requirement | Type | Default | Description
Edge List Column | | String | - | Column with [parent, child] pairs

Output Ports

Port | Mode
0 | Single Snapshot
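
An edge list of [parent, child] pairs is enough to recover the hierarchy. The sketch below shows one way to turn such pairs into a nested structure. It assumes each pair is already a two-element Python list and that the edges form a single tree. `build_tree` is an illustrative helper, not the operator's implementation, and the encoding the operator expects inside the column is not specified here.

```python
from collections import defaultdict

def build_tree(edges):
    """Build a nested {node: {child: {...}}} dict from [parent, child] pairs."""
    children = defaultdict(list)
    child_nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        child_nodes.add(child)
    # The root is the only parent that never appears as a child.
    roots = [node for node in children if node not in child_nodes]
    if len(roots) != 1:
        raise ValueError("expected exactly one root")

    def expand(node):
        # Leaves have no entry in `children`, so they expand to an empty dict.
        return {child: expand(child) for child in children.get(node, [])}

    return {roots[0]: expand(roots[0])}
```

For example, `build_tree([["A", "B"], ["A", "C"], ["B", "D"]])` returns `{"A": {"B": {"D": {}}, "C": {}}}`.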

9.3 - Scientific

Operators in the Scientific category

Operators

Operator | Description
Carpet Plot | Visualize data in a Carpet Plot
Contour Plot | Displays terrain or gradient variations in a Contour Plot
Dendrogram | Visualize data in a Dendrogram
Heatmap | Visualize data in a HeatMap Chart
Network Graph | Visualize data in a network graph
Parallel Coordinates Plot | Visualize multivariate data using parallel coordinate axes
Polar Chart | Displays data points in a polar scatter plot
Quiver Plot | Visualize vector data in a Quiver Plot
Radar Chart | Visualize data in a Radar Chart
Radar Plot | View the result in a radar plot.
Ternary Contour | Shows how a measured value changes across all mixtures of three components that sum to a constant
Ternary Plot | Points are graphed on a Ternary Plot using 3 specified data fields
Volcano Plot | Displays statistical significance versus effect size
Wind Rose Chart | Displays wind distribution using a polar bar chart

Total: 14 operators

9.3.1 - Carpet Plot

Visualize data in a Carpet Plot

Input Properties

Property | Requirement | Type | Default | Description
First Parameter Axis Column | | String | - | Column representing the first parameter axis (a)
Second Parameter Axis Column | | String | - | Column representing the second parameter axis (b)
Value Column | | String | - | Column representing the value at each (a, b) coordinate

Output Ports

Port | Mode
0 | Single Snapshot

9.3.2 - Contour Plot

Displays terrain or gradient variations in a Contour Plot

Input Properties

Property | Requirement | Type | Default | Description
Grid Size | | String | 10 | Grid resolution of the final image
Connect Gaps | | Boolean | true | Automatically fill in the missing parts
x | | String | - | The column name of X-axis
y | | String | - | The column name of Y-axis
z | | String | - | The column name of color bar
Coloring Method | | heatmap, lines, none | heatmap |

Output Ports

Port | Mode
0 | Single Snapshot

9.3.3 - Dendrogram

Visualize data in a Dendrogram

Input Properties

Property | Requirement | Type | Default | Description
Color Threshold | | String | - | Value at which separation of clusters will be made
Value X Column | | String | - | The x values of points in dendrogram
Value Y Column | | String | - | The y values of points in dendrogram
Labels | | String | - | The label of points in dendrogram

Output Ports

Port | Mode
0 | Single Snapshot

9.3.4 - Heatmap

Visualize data in a HeatMap Chart

Input Properties

Property | Requirement | Type | Default | Description
Value X Column | | String | - | The values along the x-axis
Value Y Column | | String | - | The values along the y-axis
Values | | String | - | The values of the heatmap

Output Ports

Port | Mode
0 | Single Snapshot

9.3.5 - Network Graph

Visualize data in a network graph

Input Properties

Property | Requirement | Type | Default | Description
Source Column | | String | - | Source node for edge in graph
Destination Column | | String | - | Destination node for edge in graph
Title | | String | Network Graph |

Output Ports

Port | Mode
0 | Single Snapshot

9.3.6 - Parallel Coordinates Plot

Visualize multivariate data using parallel coordinate axes

Input Properties

Property | Requirement | Type | Default | Description
Dimensions | | List | - | List of numeric columns to visualize as parallel axes (at least one dimension is required)
Color Column | | String | - | Column used to color or group the lines

Output Ports

Port | Mode
0 | Single Snapshot

9.3.7 - Polar Chart

Displays data points in a polar scatter plot

Input Properties

Property | Requirement | Type | Default | Description
r | | String | - | The column name for radial values (must be numeric)
theta | | String | - | The column name for angular values (must be numeric)

Output Ports

Port | Mode
0 | Single Snapshot

9.3.8 - Quiver Plot

Visualize vector data in a Quiver Plot

Input Properties

Property | Requirement | Type | Default | Description
x | | String | - | Column for the x-coordinate of the starting point
y | | String | - | Column for the y-coordinate of the starting point
u | | String | - | Column for the vector component in the x-direction
v | | String | - | Column for the vector component in the y-direction

Output Ports

Port | Mode
0 | Single Snapshot

9.3.9 - Radar Chart

Visualize data in a Radar Chart

Input Properties

Property | Requirement | Type | Default | Description
Name Column | | String | - | Column containing entity names for each radar
Value Columns | | List | - | Columns containing numeric values for radar chart axes
Fill Opacity | | Double | 0.5 | Opacity value for radar chart fill from 0.0 (transparent) to 1.0 (opaque)

Output Ports

Port | Mode
0 | Single Snapshot

9.3.10 - Radar Plot

View the result in a radar plot.

Input Properties

Property | Requirement | Type | Default | Description
Axes | | List | - | Numeric columns to use as radar axes
Trace Name Column | | String | No Selection | Optional - Select a column to use for naming each radar trace
Trace Color Column | | String | No Selection | Optional - Select a column to use for coloring each radar trace (note: if there are too many traces with distinct coloring values, colors may repeat)
Line Pattern | | solid, dash, dot | solid | Pattern of the lines connecting points on the radar plot
Max Normalize | | Boolean | true | Normalize radar plot values by scaling them relative to the maximum value on their respective axes
Fill Trace | | Boolean | true | Fill the area within each radar trace
Show Point Markers | | Boolean | true | Display point markers on the radar plot
Show Legend | | Boolean | true | Display the legend (note: without the legend, you are unable to selectively hide or show traces in the plot)

Output Ports

Port | Mode
0 | Single Snapshot
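
Max Normalize can be read as dividing each axis by its own maximum, so that axes with different units share one radial scale. The following is a rough sketch of that reading. It assumes non-negative numeric values and leaves an all-zero axis unchanged. `max_normalize` and the sample column names are hypothetical, and the operator's handling of edge cases is not documented here.

```python
def max_normalize(rows, axes):
    """Scale each axis column by its maximum value across all rows."""
    maxima = {axis: max(row[axis] for row in rows) for axis in axes}
    return [
        {
            # Skip the division when the axis maximum is zero.
            axis: (row[axis] / maxima[axis] if maxima[axis] else row[axis])
            for axis in axes
        }
        for row in rows
    ]
```

With rows `{"speed": 50, "range": 200}` and `{"speed": 100, "range": 400}`, both axes of the first row become 0.5 and both axes of the second become 1.0.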

9.3.11 - Ternary Contour

Shows how a measured value changes across all mixtures of three components that sum to a constant

Input Properties

Property | Requirement | Type | Default | Description
Variable 1 | | String | - | First variable data field
Variable 2 | | String | - | Second variable data field
Variable 3 | | String | - | Third variable data field
Measured Value | | String | - | Measured value data field

Output Ports

Port | Mode
0 | Single Snapshot

9.3.12 - Ternary Plot

Points are graphed on a Ternary Plot using 3 specified data fields

Input Properties

Property | Requirement | Type | Default | Description
Variable 1 | | String | - | First variable data field
Variable 2 | | String | - | Second variable data field
Variable 3 | | String | - | Third variable data field
Categorize by Color | | Boolean | false | Optionally color points using a data field
Color Data Field | | String | - | Specify the data field to color

Output Ports

Port | Mode
0 | Single Snapshot

9.3.13 - Volcano Plot

Displays statistical significance versus effect size

Input Properties

Property | Requirement | Type | Default | Description
Effect Size (log2 Fold Change) | | String | - | Select the column representing the effect size or magnitude of change between two experimental groups. This value is typically a log2 fold change and is used for the x-axis of the volcano plot
P-Value Column | | String | - | Select the column representing the p-value associated with the statistical test for each feature. This value is transformed using -log10(p-value) and plotted on the y-axis to indicate statistical significance

Output Ports

Port | Mode
0 | Single Snapshot
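
The y-axis transform described above is a one-line computation. Below is a minimal sketch of how one feature maps to plot coordinates, assuming the effect size is already a log2 fold change. `volcano_point` is a hypothetical helper name, and rejecting p-values outside (0, 1] is a choice made for this example rather than documented operator behavior.

```python
import math

def volcano_point(log2_fold_change, p_value):
    """Map one feature to (x, y) volcano-plot coordinates."""
    if not 0 < p_value <= 1:
        raise ValueError("p-value must be in (0, 1]")
    # Smaller p-values give larger y values, so significant features plot higher.
    return log2_fold_change, -math.log10(p_value)
```

A feature with a p-value of 0.01 lands at a height of about 2 on the y-axis, and one with a p-value of 0.001 at about 3.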

9.3.14 - Wind Rose Chart

Displays wind distribution using a polar bar chart

Input Properties

Property | Requirement | Type | Default | Description
Radial Values (r) | | String | - | Numeric values representing magnitude (e.g., frequency)
Angular Values (θ) | | String | - | Direction or angle categories (e.g., N, NE, E)
Color Group | | String | - | Optional grouping column (e.g., wind strength)

Output Ports

Port | Mode
0 | Single Snapshot

9.4 - Financial

Operators in the Financial category

Operators

Operator | Description
Bullet Chart | Visualize data using a Bullet Chart that shows a primary quantitative bar and delta indicator. Optional elements such as qualitative ranges (steps) and a performance threshold are displayed only when provided.
Candlestick Chart | Visualize data in a Candlestick Chart
Funnel Plot | Visualize data in a Funnel Plot
Gauge Chart | Visualize a single value with a radial gauge chart, showing progress towards a goal with optional steps, threshold, and delta.
Waterfall Chart | Visualize data as a waterfall chart

Total: 5 operators

9.4.1 - Bullet Chart

Visualize data using a Bullet Chart that shows a primary quantitative bar and delta indicator. Optional elements such as qualitative ranges (steps) and a performance threshold are displayed only when provided.

Input Properties

Property | Requirement | Type | Default | Description
Value | | String | - | The actual value to display on the bullet chart
Delta Reference | | String | - | The reference value for the delta indicator. e.g., 100
Threshold Value | | String | - | The performance threshold value. e.g., 100
Steps | | List | [] | Optional: Each step includes a start and end value e.g., 0, 100
↳ Start | | String | - |
↳ End | | String | - |

Output Ports

Port | Mode
0 | Single Snapshot

9.4.2 - Candlestick Chart

Visualize data in a Candlestick Chart

Input Properties

Property | Requirement | Type | Default | Description
Date Column | | String | - | The date of the candlestick
Opening Price Column | | String | - | The opening price of the candlestick
Highest Price Column | | String | - | The highest price of the candlestick
Lowest Price Column | | String | - | The lowest price of the candlestick
Closing Price Column | | String | - | The closing price of the candlestick

Output Ports

Port | Mode
0 | Single Snapshot

9.4.3 - Funnel Plot

Visualize data in a Funnel Plot

Input Properties

Property | Requirement | Type | Default | Description
X Column | | String | - | Data column for the x-axis
Y Column | | String | - | Data column for the y-axis
Color Column | | String | - | Column to categorically colorize funnel sections

Output Ports

Port | Mode
0 | Single Snapshot

9.4.4 - Gauge Chart

Visualize a single value with a radial gauge chart, showing progress towards a goal with optional steps, threshold, and delta.

Input Properties

Property | Requirement | Type | Default | Description
Gauge Value | | String | - | The primary value displayed on the gauge chart
Delta | | String | - | The baseline value used to calculate the delta from the gauge value
Threshold Value | | String | - | Defines a boundary or target value shown on the gauge chart
Steps | | List | - | List of step ranges for the gauge
↳ Start | | String | - |
↳ End | | String | - |

Output Ports

Port | Mode
0 | Single Snapshot

9.4.5 - Waterfall Chart

Visualize data as a waterfall chart

Input Properties

Property | Requirement | Type | Default | Description
X Axis Values | | String | - | The column representing categories or stages
Y Axis Values | | String | - | The column representing numeric values for each stage

Output Ports

Port | Mode
0 | Single Snapshot

9.5 - Media

Operators in the Media category

Operators

Operator | Description
HTML Visualizer | Render the result of HTML content
Image Visualizer | Visualize image content
URL Visualizer | Render the content of URL
Word Cloud | Generate word cloud for texts

Total: 4 operators

9.5.1 - HTML Visualizer

Render the result of HTML content

Input Properties

Property | Requirement | Type | Default | Description
HTML content | | String | - |

Output Ports

Port | Mode
0 | Single Snapshot

9.5.2 - Image Visualizer

Visualize image content

Input Properties

Property | Requirement | Type | Default | Description
image content column | | String | - | The Binary data of the Image

Output Ports

Port | Mode
0 | Single Snapshot

9.5.3 - URL Visualizer

Render the content of URL

Input Properties

Property | Requirement | Type | Default | Description
URL content | | String | - |

Output Ports

Port | Mode
0 | Single Snapshot

9.5.4 - Word Cloud

Generate word cloud for texts

Input Properties

Property | Requirement | Type | Default | Description
Text column | | String | - |
Number of most frequent words | | Integer | 100 |

Output Ports

Port | Mode
0 | Single Snapshot

9.6 - Advanced

Operators in the Advanced category

Operators

Operator | Description
Choropleth Map | Visualize data using a Choropleth Map that uses shades of colors to show differences in properties or quantities between regions
Scatter3D Chart | Visualize data in a Scatter3D Plot

Total: 2 operators

9.6.1 - Choropleth Map

Visualize data using a Choropleth Map that uses shades of colors to show differences in properties or quantities between regions

Input Properties

Property | Requirement | Type | Default | Description
Locations Column | | String | - | Column used to describe location. Currently only supports countries and needs to be three-letter ISO country code
Color Column | | String (integer, long, double) | - | Column used to determine intensity of color of the region

Output Ports

Port | Mode
0 | Single Snapshot

9.6.2 - Scatter3D Chart

Visualize data in a Scatter3D Plot

Input Properties

Property | Requirement | Type | Default | Description
X Column | | String | - | Data column for the x-axis
Y Column | | String | - | Data column for the y-axis
Z Column | | String | - | Data column for the z-axis

Output Ports

Port | Mode
0 | Single Snapshot

9.7 - Nested Table

Visualize Data in a Depth Two Nested Table

Input Properties

Property | Requirement | Type | Default | Description
Add Attribute | | List | - | List of columns to include in the nested table chart and their subgroup
↳ Attribute group | | String | - |
↳ Original attribute Name | | String | - |
↳ New Attribute Name | | String | - |

Output Ports

Port | Mode
0 | Single Snapshot

10 - Control Block

Operators in the Control Block category

Operators

Operator | Description
If | If
Sleep | Sleep n seconds between each tuple

Total: 2 operators

10.1 - If

If

Input Properties

Property | Requirement | Type | Default | Description
Condition State | | String | - | Name of the state variable to evaluate

Output Ports

Port | Mode
0 | Set Snapshot
1 | Set Snapshot

10.2 - Sleep

Sleep n seconds between each tuple

Input Properties

Property | Requirement | Type | Default | Description
Sleep Time (seconds) | | Integer | 0 |

Output Ports

Port | Mode
0 | Set Snapshot
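
Conceptually, the operator passes tuples through unchanged and pauses between them. The generator below is a minimal sketch of that behavior, assuming the pause happens after each tuple is emitted. Whether the operator sleeps before or after a tuple is not stated above, and `sleep_between` is a made-up name.

```python
import time

def sleep_between(tuples, seconds):
    """Yield each tuple unchanged, pausing `seconds` after each one."""
    for item in tuples:
        yield item
        time.sleep(seconds)
```

With the default Sleep Time of 0, the stream passes through without any added delay.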

11 - Output Port Modes

Reference for operator output port modes

Texera operators emit data through output ports. Each port advertises a mode that describes how downstream operators should interpret the stream of tuples it produces.

Set Snapshot

The port re-emits the complete result set on each update. Downstream operators always see the full materialized result.

Delta Updates

The port emits an incremental delta of the result set on each update. Downstream operators apply the delta on top of prior state instead of receiving a re-materialized snapshot.

Single Snapshot

The port emits exactly one snapshot for the entire execution (not per update). This mode is used for visualization operators whose output may exceed the memory limit, making repeated full-snapshot emission impractical.
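
The three modes can be contrasted with a small consumer-side sketch. It is illustrative only: the `apply_update` helper, the mode strings, and the `("+", row)` / `("-", row)` delta encoding are assumptions made for this example, not Texera's API or wire format.

```python
from collections import Counter

def apply_update(mode, state, update):
    """Return the downstream view of the result after one port update."""
    if mode == "set_snapshot":
        # Each update is the full result set, so it replaces prior state.
        return Counter(update)
    if mode == "delta_updates":
        # Each update lists (sign, row) changes applied on top of prior state.
        new_state = Counter(state)
        for sign, row in update:
            new_state[row] += 1 if sign == "+" else -1
        # Unary plus drops rows whose count fell to zero or below.
        return +new_state
    if mode == "single_snapshot":
        # Only one snapshot is expected for the whole execution.
        if state:
            raise ValueError("single-snapshot port already emitted its result")
        return Counter(update)
    raise ValueError(f"unknown mode: {mode}")
```

In this sketch a Set Snapshot consumer can discard everything it held before each update, while a Delta Updates consumer must keep its prior state in order to interpret the next update.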

12 - Parameter Reference

Complete reference for machine learning operator parameters

Available Parameter Sets

Parameter Set | Used By | Operators
SklearnAdvancedKNN | 2 | KNN Classifier, KNN Regressor
SklearnAdvancedSVC | 1 | SVM Classifier
SklearnAdvancedSVR | 1 | SVM Regressor

12.1 - SklearnAdvancedKNN Parameters

Hyperparameters accepted by SklearnAdvancedKNN

Used By

This parameter set is used by the following operators: KNN Classifier, KNN Regressor

Parameters

Parameter | Type
n_neighbors | int
p | int
weights | str
algorithm | str
leaf_size | int
metric | int
metric_params | str

12.2 - SklearnAdvancedSVC Parameters

Hyperparameters accepted by SklearnAdvancedSVC

Used By

This parameter set is used by the following operators: SVM Classifier

Parameters

Parameter | Type
C | float
kernel | str
gamma | float
degree | int
coef0 | float
tol | float
probability | bool, parsed with (lambda value: value.lower() == "true")
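
The Type column appears to list the callable used to convert each text value, which would explain why a boolean parameter such as `probability` is shown as a lambda rather than `bool` (in Python, `bool("false")` is `True`). Below is a minimal sketch of that conversion, assuming parameters arrive as strings. The `SVC_PARSERS` dictionary and `parse_params` helper are illustrative names, not Texera code.

```python
# Parser per parameter, mirroring the Type column of the table above.
SVC_PARSERS = {
    "C": float,
    "kernel": str,
    "gamma": float,
    "degree": int,
    "coef0": float,
    "tol": float,
    "probability": (lambda value: value.lower() == "true"),
}

def parse_params(raw, parsers):
    """Convert string-valued parameters using the matching parser."""
    return {name: parsers[name](text) for name, text in raw.items()}
```

For example, `parse_params({"C": "1.5", "degree": "3", "probability": "True"}, SVC_PARSERS)` returns `{"C": 1.5, "degree": 3, "probability": True}`. Any text other than "true" (in any letter case) parses as `False`.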

12.3 - SklearnAdvancedSVR Parameters

Hyperparameters accepted by SklearnAdvancedSVR

Used By

This parameter set is used by the following operators: SVM Regressor

Parameters

Parameter | Type
C | float
kernel | str
gamma | float
degree | int
coef0 | float
tol | float
shrinking | bool, parsed with (lambda value: value.lower() == "true")
verbose | bool, parsed with (lambda value: value.lower() == "true")
epsilon | float
cache_size | int
max_iter | int
Apache Texera is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the ASF. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
Apache Texera, Texera, Apache, the Apache logo, and the Apache Texera project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.