Condition Objects
You can find an introduction to Conditions in our Getting Started section.
Fittable conditions for pdpipe.
Classes
UnfittedConditionError
Condition
Bases: object
A fittable condition that returns a boolean value from a dataframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
func | callable | A callable that, given an input pandas.DataFrame object, returns a boolean value. | required |
fittable | bool | If set to True, this condition becomes fittable. As described under transform, a fitted condition returns the result that was determined when it was fitted. | False |
error_message | str | A string that describes the error when the condition fails. | None |
Examples:
>>> import numpy as np; import pdpipe as pdp;
>>> cond = pdp.cond.Condition(lambda X: 'a' in X.columns)
>>> cond
<pdpipe.Condition: By function>
>>> col_drop = pdp.ColDrop(['lbl'], prec=cond)
Source code in pdpipe/cond.py
Functions
fit_transform(X)
Fit this condition and return the result.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | pandas.DataFrame | The input dataframe on which the condition is checked. | required |
Returns:
Type | Description |
---|---|
bool | Either True or False. |
fit(X)
Fit this condition on the input dataframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | pandas.DataFrame | The input dataframe on which the condition is checked. | required |
transform(X)
Return the result of this condition.
If this Condition is fittable, it returns the result that was determined when it was fitted, and raises an exception if it has not been fitted.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | pandas.DataFrame | The input dataframe on which the condition is checked. | required |
Returns:
Type | Description |
---|---|
bool | Either True or False. |
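The fit/transform contract described above can be illustrated with a small plain-Python stand-in. This is only a sketch of the documented behaviour, not pdpipe's implementation: `SketchCondition` and `UnfittedSketchError` are hypothetical names, the "dataframe" is a plain dict of columns, and re-evaluating a non-fittable condition on every call is an assumption.

```python
class UnfittedSketchError(Exception):
    """Raised when a fittable sketch condition is used before fitting."""


class SketchCondition:
    """Hypothetical stand-in for the fit/transform contract described above."""

    def __init__(self, func, fittable=False):
        self._func = func
        self._fittable = fittable
        self._is_fitted = False
        self._fitted_result = None

    def fit_transform(self, X):
        # Evaluate the function on X, remember the outcome, and return it.
        self._fitted_result = bool(self._func(X))
        self._is_fitted = True
        return self._fitted_result

    def fit(self, X):
        # Fitting only stores the result; nothing is returned in this sketch.
        self.fit_transform(X)

    def transform(self, X):
        if not self._fittable:
            # Assumption: a non-fittable condition is evaluated on each input.
            return bool(self._func(X))
        if not self._is_fitted:
            raise UnfittedSketchError("fit the condition before transforming")
        # A fitted condition returns the result determined at fit time.
        return self._fitted_result


has_a = SketchCondition(lambda cols: "a" in cols, fittable=True)
fit_result = has_a.fit_transform({"a": [1, 2]})
later_result = has_a.transform({"b": [3, 4]})
print(fit_result, later_result)
```

Both printed values are the same because the fitted condition reports what it saw at fit time, even though the second input no longer has a column named `a`.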
PerColumnCondition
Bases: Condition
Check whether the columns of input dataframes satisfy a condition set.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
conditions | callable or list-like | The condition, or set of conditions, that columns of input dataframes must satisfy. Conditions are callables that accept a column of the dataframe (a pandas.Series, as in the examples) and return a boolean value. | required |
conditions_reduce | str | How condition satisfaction results are reduced per-column, in case of multiple conditions. 'all' requires a column to satisfy all conditions, while 'any' requires at least one condition to be satisfied. | 'all' |
columns_reduce | str | How condition satisfaction results are reduced among multiple columns. 'all' requires all columns of input dataframes to satisfy the given condition (in the case of multiple conditions, behaviour is determined by the conditions_reduce parameter), while 'any' requires at least one column to satisfy it. | 'all' |
**kwargs | | Additionally accepts all keyword arguments of the constructor of Condition. See the documentation of Condition for details. | {} |
Examples:
>>> import pandas as pd; import pdpipe as pdp; import numpy as np;
>>> X = pd.DataFrame(
... [[8,'a',5],[5,'b',7]], [1,2], ['num', 'chr', 'nur'])
>>> cond = pdp.cond.PerColumnCondition(
... conditions=lambda x: x.dtype == np.int64,
... )
>>> cond
<pdpipe.Condition: Dataframes with all columns satisfying all conditions: anonymous condition>
>>> cond(X)
False
>>> cond = pdp.cond.PerColumnCondition(
... conditions=lambda x: x.dtype == np.int64,
... columns_reduce='any',
... )
>>> cond(X)
True
>>> cond = pdp.cond.PerColumnCondition(
... conditions=[
... lambda x: x.dtype == np.int64,
... lambda x: x.dtype == object,
... ],
... )
>>> cond(X)
False
>>> cond = pdp.cond.PerColumnCondition(
... conditions=[
... lambda x: x.dtype == np.int64,
... lambda x: x.dtype == object,
... ],
... conditions_reduce='any',
... )
>>> cond(X)
True
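The two reduction parameters are applied in sequence: conditions are first reduced within each column, then the per-column results are reduced across columns. The following plain-Python sketch mirrors the four examples above under stated assumptions: `per_column_check`, `is_int`, and `is_str` are hypothetical helpers, and columns are plain lists in a dict rather than pandas.Series objects.

```python
def per_column_check(columns, conditions,
                     conditions_reduce="all", columns_reduce="all"):
    """Reduce conditions within each column, then across all columns."""
    reducers = {"all": all, "any": any}
    reduce_conditions = reducers[conditions_reduce]
    reduce_columns = reducers[columns_reduce]
    if callable(conditions):
        # A single condition is treated as a one-element list.
        conditions = [conditions]
    per_column = [
        reduce_conditions(cond(values) for cond in conditions)
        for values in columns.values()
    ]
    return reduce_columns(per_column)


def is_int(values):
    return all(isinstance(v, int) for v in values)


def is_str(values):
    return all(isinstance(v, str) for v in values)


columns = {"num": [8, 5], "chr": ["a", "b"], "nur": [5, 7]}

all_int = per_column_check(columns, is_int)
any_int = per_column_check(columns, is_int, columns_reduce="any")
int_and_str = per_column_check(columns, [is_int, is_str])
int_or_str = per_column_check(columns, [is_int, is_str],
                              conditions_reduce="any")
print(all_int, any_int, int_and_str, int_or_str)
```

No column can be both integer and string, so the third check fails for every column, while the fourth succeeds because each column satisfies at least one of the two conditions.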
HasAllColumns
Bases: Condition
Check whether input dataframes contain a list of columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
labels | single label or list-like | Column labels to check for. | required |
**kwargs | | Additionally accepts all keyword arguments of the constructor of Condition. See the documentation of Condition for details. | {} |
Examples:
>>> import pandas as pd; import pdpipe as pdp;
>>> X = pd.DataFrame(
... [[8,'a',5],[5,'b',7]], [1,2], ['num', 'chr', 'nur'])
>>> cond = pdp.cond.HasAllColumns('num')
>>> cond
<pdpipe.Condition: Has all columns in num>
>>> cond(X)
True
>>> cond = pdp.cond.HasAllColumns(['num', 'chr'])
>>> cond(X)
True
>>> cond = pdp.cond.HasAllColumns(['num', 'gar'])
>>> cond(X)
False
ColumnsFromList
Bases: PerColumnCondition
Check whether input dataframes contain columns from a list.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
labels | single label or list-like | Column labels to check for. | required |
columns_reduce | str | How condition satisfaction results are reduced among multiple columns. 'all' requires all columns of input dataframes to satisfy the given condition, while 'any' requires at least one column to satisfy it. | 'all' |
**kwargs | | Additionally accepts all keyword arguments of the constructor of Condition. See the documentation of Condition for details. | {} |
Examples:
>>> import pandas as pd; import pdpipe as pdp;
>>> X = pd.DataFrame(
... [[8,'a',5],[5,'b',7]], [1,2], ['num', 'chr', 'nur'])
>>> cond = pdp.cond.ColumnsFromList('num')
>>> cond
<pdpipe.Condition: Dataframes with all columns satisfying all conditions: Series with labels in num>
>>> cond(X)
False
>>> cond = pdp.cond.ColumnsFromList(['num', 'chr', 'nur'])
>>> cond(X)
True
>>> cond = pdp.cond.ColumnsFromList(
... ['num', 'gar'], columns_reduce='any')
>>> cond(X)
True
HasNoColumn
Bases: Condition
Check whether input dataframes contain no column from a list.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
labels | single label or list-like | Column labels to check for. | required |
**kwargs | | Additionally accepts all keyword arguments of the constructor of Condition. See the documentation of Condition for details. | {} |
Examples:
>>> import pandas as pd; import pdpipe as pdp;
>>> X = pd.DataFrame(
... [[8,'a',5],[5,'b',7]], [1,2], ['num', 'chr', 'nur'])
>>> cond = pdp.cond.HasNoColumn('num')
>>> cond
<pdpipe.Condition: Has no column in num>
>>> cond(X)
False
>>> cond = pdp.cond.HasNoColumn(['num', 'gar'])
>>> cond(X)
False
>>> cond = pdp.cond.HasNoColumn(['ph', 'gar'])
>>> cond(X)
True
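HasAllColumns, ColumnsFromList, and HasNoColumn differ only in which set relationship they test between the given labels and the dataframe's columns. The sketch below restates the documented examples with plain Python sets. The helper names are hypothetical, and treating only a string as a "single label" is a simplifying assumption.

```python
def as_label_set(labels):
    """Wrap a single string label in a set; convert list-likes to a set."""
    if isinstance(labels, str):
        return {labels}
    return set(labels)


def has_all_columns(columns, labels):
    # Every requested label must be present among the columns.
    return as_label_set(labels) <= set(columns)


def columns_from_list(columns, labels, columns_reduce="all"):
    # Each column is tested for membership in the list of labels.
    allowed = as_label_set(labels)
    membership = [column in allowed for column in columns]
    return all(membership) if columns_reduce == "all" else any(membership)


def has_no_column(columns, labels):
    # None of the requested labels may be present among the columns.
    return as_label_set(labels).isdisjoint(columns)


columns = ["num", "chr", "nur"]

results = {
    "has_all(num, gar)": has_all_columns(columns, ["num", "gar"]),
    "from_list(num)": columns_from_list(columns, "num"),
    "from_list(num, gar; any)": columns_from_list(
        columns, ["num", "gar"], columns_reduce="any"),
    "has_no(num, gar)": has_no_column(columns, ["num", "gar"]),
    "has_no(ph, gar)": has_no_column(columns, ["ph", "gar"]),
}
for name, value in results.items():
    print(name, value)
```

Note the direction of the test: HasAllColumns asks whether the labels are contained in the columns, whereas ColumnsFromList with 'all' asks whether the columns are contained in the labels.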
HasAtMostMissingValues
Bases: Condition
Check whether a dataframe has no more than a given number (or ratio) of missing values across all columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n_missing | int or float | If int, then interpreted as the maximal allowed number of missing values in input dataframes. If float, interpreted as the maximal allowed ratio of missing values in input dataframes. | required |
**kwargs | | Additionally accepts all keyword arguments of the constructor of Condition. See the documentation of Condition for details. | {} |
Examples:
>>> import pandas as pd; import pdpipe as pdp;
>>> X = pd.DataFrame(
... [[None,'a',5],[5,None,7]], [1,2], ['num', 'chr', 'nur'])
>>> cond = pdp.cond.HasAtMostMissingValues(1)
>>> cond
<pdpipe.Condition: Has at most 1 missing values>
>>> cond(X)
False
>>> cond = pdp.cond.HasAtMostMissingValues(2)
>>> cond(X)
True
>>> cond = pdp.cond.HasAtMostMissingValues(0.4)
>>> cond(X)
True
>>> cond = pdp.cond.HasAtMostMissingValues(0.2)
>>> cond(X)
False
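The int-versus-float interpretation of n_missing can be sketched in plain Python. The assumption here, which is consistent with the examples above (2 missing values out of 6 cells, about 0.33), is that the ratio is taken over all cells of the dataframe. `has_at_most_missing` is a hypothetical helper operating on a list of rows, with None standing in for a missing value.

```python
def has_at_most_missing(rows, n_missing):
    """Compare the count (int) or ratio (float) of missing cells to a limit."""
    cells = [value for row in rows for value in row]
    missing = sum(value is None for value in cells)
    if isinstance(n_missing, int):
        # Integer limit: compare the absolute number of missing cells.
        return missing <= n_missing
    # Float limit: compare the share of missing cells among all cells.
    # (An empty input would divide by zero; this sketch does not handle it.)
    return missing / len(cells) <= n_missing


rows = [[None, "a", 5], [5, None, 7]]

by_count = [has_at_most_missing(rows, 1), has_at_most_missing(rows, 2)]
by_ratio = [has_at_most_missing(rows, 0.4), has_at_most_missing(rows, 0.2)]
print(by_count, by_ratio)
```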
HasNoMissingValues
Bases: HasAtMostMissingValues
Check whether input dataframes have no missing values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
**kwargs | | Accepts all keyword arguments of the constructor of Condition. See the documentation of Condition for details. | {} |
Examples:
>>> import pandas as pd; import pdpipe as pdp;
>>> X = pd.DataFrame(
... [[None,'a',5],[5,'b',7]], [1,2], ['num', 'chr', 'nur'])
>>> cond = pdp.cond.HasNoMissingValues()
>>> cond
<pdpipe.Condition: Has no missing values>
>>> cond(X)
False
AlwaysTrue
Bases: Condition
A condition letting all dataframes through, always returning True.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
**kwargs | | Accepts all keyword arguments of the constructor of Condition. See the documentation of Condition for details. | {} |
Examples:
>>> import pandas as pd; import pdpipe as pdp;
>>> X = pd.DataFrame(
... [[8,'a',5],[5,'b',7]], [1,2], ['num', 'chr', 'nur'])
>>> cond = pdp.cond.AlwaysTrue()
>>> cond
<pdpipe.Condition: AlwaysTrue>
>>> cond(X)
True
HasAtMostNQualifyingColumns
Bases: Condition
Check whether a dataframe has at most N columns satisfying a qualifier.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n | int | The maximal number of columns that should satisfy the qualifier. | required |
qualifier | callable | A function that takes a pandas.DataFrame and returns the labels of the subset of qualifying columns. See the pdp.cq module. | required |
**kwargs | | Additionally accepts all keyword arguments of the constructor of Condition. See the documentation of Condition for details. | {} |
Examples:
>>> import pandas as pd; import pdpipe as pdp;
>>> X = pd.DataFrame(
... [[8,'a',5],[5,'b',7]], [1,2], ['num', 'chr', 'nur'])
>>> cond = pdp.cond.HasAtMostNQualifyingColumns(
... n=2, qualifier=pdp.cq.StartsWith('n'))
>>> cond
<pdpipe.Condition: Has at most 2 columns qualifying <ColumnQualifier: Columns starting with n>>
>>> cond(X)
True
>>> cond = pdp.cond.HasAtMostNQualifyingColumns(
... n=1, qualifier=pdp.cq.StartsWith('n'))
>>> cond(X)
False
HasAtLeastNQualifyingColumns
Bases: Condition
Check whether a dataframe has at least N columns satisfying a qualifier.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n | int | The minimal number of columns that should satisfy the qualifier. | required |
qualifier | callable | A function that takes a pandas.DataFrame and returns the labels of the subset of qualifying columns. See the pdp.cq module. | required |
**kwargs | | Additionally accepts all keyword arguments of the constructor of Condition. See the documentation of Condition for details. | {} |
Examples:
>>> import pandas as pd; import pdpipe as pdp;
>>> X = pd.DataFrame(
... [[8,'a',5],[5,'b',7]], [1,2], ['num', 'chr', 'nur'])
>>> cond = pdp.cond.HasAtLeastNQualifyingColumns(
... n=2, qualifier=pdp.cq.StartsWith('n'))
>>> cond
<pdpipe.Condition: Has at least 2 columns qualifying <ColumnQualifier: Columns starting with n>>
>>> cond(X)
True
>>> cond = pdp.cond.HasAtLeastNQualifyingColumns(
... n=3, qualifier=pdp.cq.StartsWith('n'))
>>> cond(X)
False
HasNoQualifyingColumns
Bases: HasAtMostNQualifyingColumns
Check whether a dataframe has no columns satisfying a qualifier.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
qualifier | callable | A function that takes a pandas.DataFrame and returns the labels of the subset of qualifying columns. See the pdp.cq module. | required |
**kwargs | | Additionally accepts all keyword arguments of the constructor of Condition. See the documentation of Condition for details. | {} |
Examples:
>>> import pandas as pd; import pdpipe as pdp;
>>> X = pd.DataFrame(
... [[8,'a',5],[5,'b',7]], [1,2], ['num', 'chr', 'nur'])
>>> cond = pdp.cond.HasNoQualifyingColumns(
... qualifier=pdp.cq.StartsWith('n'))
>>> cond
<pdpipe.Condition: Has no columns qualifying <ColumnQualifier: Columns starting with n>>
>>> cond(X)
False
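The three qualifier-based conditions all come down to counting the labels a qualifier returns and comparing that count to a bound. The plain-Python sketch below reproduces the documented examples under stated assumptions: `starts_with` is a hypothetical stand-in for a column qualifier that works on a list of labels rather than a dataframe, and expressing "no qualifying columns" as "at most 0" is an inference from the class hierarchy, not a confirmed implementation detail.

```python
def starts_with(prefix):
    """Build a hypothetical qualifier selecting labels with a given prefix."""
    def qualifier(columns):
        return [label for label in columns if label.startswith(prefix)]
    return qualifier


def has_at_most_n(columns, n, qualifier):
    return len(qualifier(columns)) <= n


def has_at_least_n(columns, n, qualifier):
    return len(qualifier(columns)) >= n


def has_no_qualifying(columns, qualifier):
    # Assumption: "no qualifying columns" is the n=0 case of "at most n".
    return has_at_most_n(columns, 0, qualifier)


columns = ["num", "chr", "nur"]
starts_with_n = starts_with("n")  # qualifies 'num' and 'nur'

at_most = [has_at_most_n(columns, 2, starts_with_n),
           has_at_most_n(columns, 1, starts_with_n)]
at_least = [has_at_least_n(columns, 2, starts_with_n),
            has_at_least_n(columns, 3, starts_with_n)]
none_qualify = has_no_qualifying(columns, starts_with_n)
print(at_most, at_least, none_qualify)
```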