Creating Pipeline Stages
You can create stages with the following syntax:
All pipeline stages have a predefined precondition function that returns True for dataframes to which the stage can be applied. By default, a pipeline stage raises an exception if a DataFrame that does not meet its precondition is piped through it. This behaviour can be set per stage by passing a bool as the exraise argument in the constructor call. If exraise is set to False, the input DataFrame is instead returned unchanged:
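The precondition and exraise behaviour described above can be sketched in plain Python. This is only an illustration of the semantics, not the library's implementation: ToyColDrop, FailedPrecondition, and the dict of column lists standing in for a DataFrame are all hypothetical names chosen for this sketch.

```python
class FailedPrecondition(Exception):
    """Hypothetical error raised when a stage's precondition fails."""


class ToyColDrop:
    """Hypothetical stage that drops one column from a dict-of-lists 'frame'."""

    def __init__(self, column, exraise=True):
        self.column = column
        self.exraise = exraise  # default: raise when the precondition fails

    def _precondition(self, frame):
        # The stage is applicable only if the column is present.
        return self.column in frame

    def apply(self, frame):
        if not self._precondition(frame):
            if self.exraise:
                raise FailedPrecondition(f"missing column {self.column!r}")
            return frame  # exraise=False: return the input without change
        return {k: v for k, v in frame.items() if k != self.column}


frame = {"Age": [31, 45]}  # no "Name" column, so the precondition fails

lenient = ToyColDrop("Name", exraise=False)
unchanged = lenient.apply(frame)  # the very same object comes back

strict = ToyColDrop("Name")  # exraise defaults to True
try:
    strict.apply(frame)
    raised = False
except FailedPrecondition:
    raised = True
```

In this sketch the lenient stage hands back the input object itself, while the strict stage refuses to run on a frame it cannot handle.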
Applying Pipeline Stages
You can apply a pipeline stage to a DataFrame using its apply method:
Pipeline stages are also callables, making the following syntax equivalent:
The initialized exception behaviour of a pipeline stage can be overridden on a per-application basis:
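As a rough sketch of the two points above (a stage is callable, and the exception behaviour chosen at construction can be overridden for a single application), consider the following toy class. ToyStage, its use of ValueError, and the dict-of-lists stand-in for a DataFrame are assumptions made for illustration; the real stages may be implemented quite differently.

```python
class ToyStage:
    """Hypothetical stage: drops a column from a dict-of-lists 'frame'."""

    def __init__(self, column, exraise=True):
        self.column = column
        self.exraise = exraise  # behaviour chosen at construction time

    def apply(self, frame, exraise=None):
        # A per-call value, when given, takes priority over the constructor's.
        if exraise is None:
            exraise = self.exraise
        if self.column not in frame:
            if exraise:
                raise ValueError(f"missing column {self.column!r}")
            return frame
        return {k: v for k, v in frame.items() if k != self.column}

    # Calling the stage is the same as calling apply.
    __call__ = apply


drop_name = ToyStage("Name", exraise=False)

people = {"Name": ["Ann", "Bob"], "Age": [31, 45]}
via_apply = drop_name.apply(people)
via_call = drop_name(people)  # equivalent to the line above

no_name = {"Age": [31, 45]}
try:
    drop_name(no_name, exraise=True)  # override the lenient default once
    overridden = False
except ValueError:
    overridden = True
```

The override applies to that one call only: the stage keeps the exraise value it was constructed with.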
Additionally, to have an explanation message print after the precondition is checked but before the application of the pipeline stage, pass
All pipeline stages also adhere to the scikit-learn transformer API, and so have fit_transform and transform methods; these behave exactly like apply, and accept the input dataframe as parameter X. For the same reason, pipeline stages also have a fit method, which applies them but returns the input dataframe unchanged.
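For a stage with nothing to fit, the relationship between these methods might look like the sketch below. ToyUpper is a hypothetical class written only to mirror the description above, and it uses a dict of column lists as a stand-in for a DataFrame.

```python
class ToyUpper:
    """Hypothetical non-fittable stage: upper-cases a 'Name' column."""

    def apply(self, frame):
        out = dict(frame)  # shallow copy, so the input is left untouched
        out["Name"] = [name.upper() for name in frame["Name"]]
        return out

    def transform(self, X, y=None):
        # Behaves exactly like apply; y is accepted but ignored.
        return self.apply(X)

    def fit_transform(self, X, y=None):
        # Nothing to fit, so this is also just apply.
        return self.apply(X)

    def fit(self, X, y=None):
        self.apply(X)  # the stage is applied, but its result is discarded
        return X       # the input comes back unchanged


stage = ToyUpper()
frame = {"Name": ["ann", "bob"]}

transformed = stage.transform(X=frame)
fitted_input = stage.fit(frame)
```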
Fittable Pipeline Stages
Some pipeline stages can be fitted, meaning that some transformation parameters are set the first time a dataframe is piped through the stage, while later applications of the stage use these now-set parameters without changing them; the scikit-learn-dependent Encode stage is a good example.
For this type of stage, the first call to apply will both fit the stage and transform the input dataframe, while subsequent calls to apply will transform input dataframes according to the already-fitted transformation parameters.
Additionally, for fittable stages the scikit-learn transformer API methods behave as expected:
fit sets the transformation parameters of the stage but returns the input dataframe unchanged.
fit_transform both sets the transformation parameters of the stage and returns the input dataframe after transformation.
transform transforms input dataframes according to already-fitted transformation parameters; if the stage is not fitted, an exception is raised.
For non-fittable pipeline stages, apply, fit_transform and transform are all equivalent. In all cases, the y parameter of these methods is ignored.
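The fit-once, transform-afterwards behaviour of a fittable stage can be sketched with a toy label encoder. ToyEncode and NotFittedError are hypothetical names, the integer mapping is a simplification chosen for this sketch, and a dict of column lists stands in for a DataFrame; none of this is taken from the library's own code.

```python
class NotFittedError(Exception):
    """Hypothetical error for transforming with an unfitted stage."""


class ToyEncode:
    """Hypothetical fittable stage: maps a column's values to integers."""

    def __init__(self, column):
        self.column = column
        self.mapping = None  # transformation parameters, set when fitted

    def fit_transform(self, X, y=None):
        # Set the parameters from X, then transform X with them.
        values = sorted(set(X[self.column]))
        self.mapping = {value: code for code, value in enumerate(values)}
        return self.transform(X)

    def fit(self, X, y=None):
        self.fit_transform(X)  # parameters are set, the result is discarded
        return X

    def transform(self, X, y=None):
        if self.mapping is None:
            raise NotFittedError("stage has not been fitted")
        out = dict(X)
        out[self.column] = [self.mapping[value] for value in X[self.column]]
        return out

    def apply(self, X):
        # The first call fits and transforms; later calls only transform.
        if self.mapping is None:
            return self.fit_transform(X)
        return self.transform(X)


encode = ToyEncode("Color")
train = {"Color": ["red", "blue", "red"]}
later = {"Color": ["blue", "blue"]}

first = encode.apply(train)   # fits: blue -> 0, red -> 1
second = encode.apply(later)  # reuses the fitted mapping unchanged
```

Note that this toy encoder would fail with a KeyError on a value it did not see while fitting; handling unseen values is left out to keep the sketch short.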