Column Qualifiers

You can find an introduction to Column Qualifiers in our Getting Started section.

An Introduction to Column Qualifiers

Column qualifiers for pdpipe.



Bases: Exception

Raised when a transform is attempted with an unfitted column qualifier.

Source code in pdpipe/
class UnfittedColumnQualifierError(Exception):
    """Raised when a transform is attempted with an unfitted column qualifier."""


Bases: object

A fittable qualifier that returns column labels from an input dataframe.


Name Type Description Default
func callable

A callable that, given an input pandas.DataFrame object, returns a list of labels of a subset of the columns of the input dataframe.

fittable bool, default True

If set to false, this qualifier becomes unfittable, and func is called on every call to transform. True by default.

subset bool, default False

If set to True, fitted qualifiers return the subset of fitted columns found in the input dataframe during transform, in the order they appeared when fitted (NOT in the order they appear in the input dataframe of the transform). False by default, which means fitted qualifiers return the FULL list of fitted columns on transform, ignoring the input dataframe completely. When combined with most pipeline stages, this means the stage will fail on its precondition when asked to transform a dataframe that is missing some of the fitted columns.
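The difference between the two `subset` modes can be sketched without pdpipe. The snippet below is a minimal illustration, assuming plain lists of column labels stand in for dataframes; `resolve` is a hypothetical helper and not part of the library.

```python
# Minimal sketch of the `subset` semantics described above.
# `resolve` is a hypothetical helper, not a pdpipe function; plain lists of
# labels stand in for the columns of the fit-time and transform-time frames.

def resolve(fitted_columns, input_columns, subset=False):
    """Return the labels a fitted qualifier would yield on transform."""
    if subset:
        # Keep fit-time order, drop labels absent from the new input.
        return [c for c in fitted_columns if c in input_columns]
    # Default: the full fitted list, regardless of the new input.
    return list(fitted_columns)


fitted = ["a", "b", "c"]      # order seen when the qualifier was fitted
new_input = ["c", "x", "a"]   # columns of a later dataframe

full = resolve(fitted, new_input)
partial = resolve(fitted, new_input, subset=True)
print(full)      # ['a', 'b', 'c']
print(partial)   # ['a', 'c']
```

With the default mode, a stage that requires column 'b' would then fail its precondition on the later dataframe, while the subset mode quietly narrows the selection.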



>>> import numpy as np; import pdpipe as pdp;
>>> cq = pdp.cq.ColumnQualifier(lambda df: [
...    l for l, s in df.items()
...    if s.dtype == np.int64 and l in ['a', 'b', 5]
... ])
>>> cq
<ColumnQualifier: Qualify columns by function>
>>> col_drop = pdp.ColDrop(columns=cq)
Source code in pdpipe/
class ColumnQualifier(object):
    A fittable qualifier that returns column labels from an input dataframe.

    func : callable
        A callable that given an input pandas.DataFrame objects returns a list
        of labels of a subset of the columns of the input dataframe.
    fittable : bool, default True
        If set to false, this qualifier becomes unfittable, and `func` is
        called on every call to transform. True by default.
    subset : bool, default False
        If set to true, fitted qualifiers return the subset of fitted columns
        found in input dataframes during transform, in the order they appeared
        when fitted (NOT in the order they appear in the input dataframe of the
        transform). False by default, which means fitted qualifiers return the
        FULL list of fitted columns, ignoring input dataframes completely on
        transforms. When combined with most pipeline stages, this means the
        stage will fail on its precondition if trying to transform with it a
        dataframe that is missing some values in the fitted qualifier.

    >>> import numpy as np; import pdpipe as pdp;
    >>> cq = pdp.cq.ColumnQualifier(lambda df: [
    ...    l for l, s in df.items()
    ...    if s.dtype == np.int64 and l in ['a', 'b', 5]
    ... ])
    >>> cq
    <ColumnQualifier: Qualify columns by function>
    >>> col_drop = pdp.ColDrop(columns=cq)

    def __init__(self, func, fittable=None, subset=None):
        if fittable is None:
            fittable = True
        self._cqfunc = func
        self.__doc__ = func.__doc__
        self._fittable = fittable
        self._subset = subset

    def __call__(self, X: pandas.DataFrame) -> List[object]:
        """Return column labels of qualified columns from an input dataframe.

        X : pandas.DataFrame
            The input dataframe, from which columns are selected.

        list of objects
            A list of labels of the qualified columns for the input dataframe.
        """
        try:
            return self.transform(X)
        except UnfittedColumnQualifierError:
            return self.fit_transform(X)

    def fit_transform(self, X: pandas.DataFrame) -> List[object]:
        """Fit this qualifier and return the labels of the qualifying columns.

        X : pandas.DataFrame
            The input dataframe, from which columns are selected.

        list of objects
            A list of labels of the qualified columns for the input dataframe.
        """
        self._columns = self._cqfunc(X)
        return self._columns

    def fit(self, X: pandas.DataFrame) -> None:
        Fit this qualifier on the input dataframe.

        X : pandas.DataFrame
            The input dataframe, from which columns are selected.

    def transform(self, X: pandas.DataFrame) -> List[object]:
        """Apply the qualifier and return the labels of the qualifying columns.

        If this ColumnQualifier is fittable, it will return the list of column
        labels that was determined when fitted (or the subset of it that can
        be found in the input dataframe). It will throw an exception if it
        is fittable but has not been fitted yet.

        X : pandas.DataFrame
            The input dataframe, from which columns are selected.

        list of objects
            A list of labels of the qualified columns for the input dataframe.
        """
        if not self._fittable:
            return self._cqfunc(X)
        try:
            if self._subset:
                return [x for x in self._columns if x in X.columns]
            return self._columns
        except AttributeError:
            raise UnfittedColumnQualifierError

    def __repr__(self):
        fstr = ""
        if self._cqfunc.__doc__:  # pragma: no cover
            fstr = f" - {self._cqfunc.__doc__}"
        return f"<ColumnQualifier: Qualify columns by function{fstr}>"

    # --- overriding boolean operators ---

    def _x_inorderof_y(x, y):
        return [i for i in y if i in x]

    class _AndQualifierFunc(object):
        """A pickle-able AND qualifier class."""

        def __init__(self, first, second):
            self.first = first
            self.second = second

        def __call__(self, X):
            return ColumnQualifier._x_inorderof_y(

    def __and__(self, other):
            res_func = ColumnQualifier._AndQualifierFunc(
            res_func.__doc__ = (
                f"{self._cqfunc.__doc__ or 'Anonymous qualifier 1'} AND "
                f"{other._cqfunc.__doc__ or 'Anonymous qualifier 2'}"
            return ColumnQualifier(func=res_func)
        except AttributeError:
            return NotImplemented

    class _XorQualifierFunc(object):
        """A pickle-able XOR qualifier class."""

        def __init__(self, first, second):
            self.first = first
            self.second = second

        def __call__(self, X):
            return ColumnQualifier._x_inorderof_y(

    def __xor__(self, other):
            res_func = ColumnQualifier._XorQualifierFunc(
            res_func.__doc__ = (
                f"{self._cqfunc.__doc__ or 'Anonymous qualifier 1'} XOR "
                f"{other._cqfunc.__doc__ or 'Anonymous qualifier 2'}"
            return ColumnQualifier(func=res_func)
        except AttributeError:
            return NotImplemented

    class _OrQualifierFunc(object):
        """A pickle-able OR qualifier class."""

        def __init__(self, first, second):
            self.first = first
            self.second = second

        def __call__(self, X):
            return ColumnQualifier._x_inorderof_y(

    def __or__(self, other):
            res_func = ColumnQualifier._OrQualifierFunc(
            res_func.__doc__ = (
                f"{self._cqfunc.__doc__ or 'Anonymous qualifier 1'} OR "
                f"{other._cqfunc.__doc__ or 'Anonymous qualifier 2'}"
            return ColumnQualifier(func=res_func)
        except AttributeError:
            return NotImplemented

    class _SubQualifierFunc(object):
        """A pickle-able SUB qualifier class."""

        def __init__(self, first, second):
            self.first = first
            self.second = second

        def __call__(self, X):
            return ColumnQualifier._x_inorderof_y(

    def __sub__(self, other):
            res_func = ColumnQualifier._SubQualifierFunc(
            res_func.__doc__ = (
                f"{self._cqfunc.__doc__ or 'Anonymous qualifier 1'} NOT IN "
                f"{other._cqfunc.__doc__ or 'Anonymous qualifier 2'}"
            return ColumnQualifier(func=res_func)
        except AttributeError:
            return NotImplemented

    class _NotQualifierFunc(object):
        """A pickle-able NOT qualifier class."""

        def __init__(self, cq):
            self.cq = cq

        def __call__(self, X):
            return ColumnQualifier._x_inorderof_y(

    def __invert__(self):
        res_func = ColumnQualifier._NotQualifierFunc(cq=self._cqfunc)
        res_func.__doc__ = f"NOT {self._cqfunc.__doc__ or 'Anonymous qualifier'}"
        return ColumnQualifier(func=res_func)
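
The operator overloads in the listing above let qualifiers be combined with `&`, `|`, `^`, `-` and `~`. The bodies of the combining functions are truncated in this listing, so the following is only a sketch of one plausible reading: set operations on the two label lists, with the result ordered as in the input columns through a helper equivalent to `_x_inorderof_y`. Plain functions over lists of labels stand in for qualifiers, and every name here is hypothetical.

```python
# Sketch of combining two column-selecting functions with set logic.
# Assumption: results follow the column order of the input, as the
# `_x_inorderof_y` helper in the listing suggests. Names are illustrative.

def in_order_of(x, y):
    """Return the items of y that also appear in x, in y's order."""
    return [i for i in y if i in x]


def q_and(f, g):
    return lambda cols: in_order_of(set(f(cols)) & set(g(cols)), cols)


def q_or(f, g):
    return lambda cols: in_order_of(set(f(cols)) | set(g(cols)), cols)


def q_xor(f, g):
    return lambda cols: in_order_of(set(f(cols)) ^ set(g(cols)), cols)


def q_sub(f, g):
    return lambda cols: in_order_of(set(f(cols)) - set(g(cols)), cols)


def q_not(f):
    return lambda cols: in_order_of(set(cols) - set(f(cols)), cols)


def starts_with_nu(cols):
    return [c for c in cols if c.startswith("nu")]


def ends_with_r(cols):
    return [c for c in cols if c.endswith("r")]


columns = ["num", "chr", "nur"]
both = q_and(starts_with_nu, ends_with_r)(columns)
either = q_or(starts_with_nu, ends_with_r)(columns)
exactly_one = q_xor(starts_with_nu, ends_with_r)(columns)
only_first = q_sub(starts_with_nu, ends_with_r)(columns)
neither = q_not(q_or(starts_with_nu, ends_with_r))(columns)
print(both, either, exactly_one, only_first, neither)
```

Each combined function is itself a plain callable, which is why the listing wraps the result in a new `ColumnQualifier`.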



Fit this qualifier and return the labels of the qualifying columns.


Name Type Description Default
X pandas.DataFrame

The input dataframe, from which columns are selected.



Type Description
list of objects

A list of labels of the qualified columns for the input dataframe.

Source code in pdpipe/
def fit_transform(self, X: pandas.DataFrame) -> List[object]:
    """Fit this qualifier and return the labels of the qualifying columns.

    X : pandas.DataFrame
        The input dataframe, from which columns are selected.

    list of objects
        A list of labels of the qualified columns for the input dataframe.
    """
    self._columns = self._cqfunc(X)
    return self._columns

Fit this qualifier on the input dataframe.


Name Type Description Default
X pandas.DataFrame

The input dataframe, from which columns are selected.

Source code in pdpipe/
def fit(self, X: pandas.DataFrame) -> None:
    Fit this qualifier on the input dataframe.

    X : pandas.DataFrame
        The input dataframe, from which columns are selected.

Apply the qualifier and return the labels of the qualifying columns.

If this ColumnQualifier is fittable, it returns the list of column labels that was determined when it was fitted (or, with subset=True, the part of that list found in the input dataframe), and raises UnfittedColumnQualifierError if it has not been fitted yet. If it is not fittable, func is called on the input dataframe on every call.
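
The life cycle described here can be illustrated with a stripped-down stand-in. This is a sketch that mirrors the documented behaviour only; `SketchQualifier`, `Frame` and `UnfittedError` are hypothetical names, not pdpipe classes, and the `subset` option is left out for brevity.

```python
# A stripped-down stand-in for the fit/transform life cycle.
# All classes here are illustrative and independent of pdpipe.

class UnfittedError(Exception):
    """Stand-in for UnfittedColumnQualifierError."""


class Frame:
    """Tiny dataframe stand-in exposing only `.columns`."""

    def __init__(self, columns):
        self.columns = list(columns)


class SketchQualifier:
    def __init__(self, func, fittable=True):
        self._func = func
        self._fittable = fittable

    def fit_transform(self, X):
        self._columns = self._func(X)   # remember the fit-time result
        return self._columns

    def transform(self, X):
        if not self._fittable:
            return self._func(X)        # recomputed on every call
        try:
            return self._columns
        except AttributeError:          # `_columns` was never set
            raise UnfittedError

    def __call__(self, X):
        try:
            return self.transform(X)
        except UnfittedError:
            return self.fit_transform(X)


q = SketchQualifier(lambda X: list(X.columns))
try:
    q.transform(Frame(["a", "b"]))
    raised = False
except UnfittedError:
    raised = True

first = q(Frame(["a", "b"]))    # unfitted, so the call fits first
second = q(Frame(["b", "c"]))   # fitted, so the stored labels come back
unfittable = SketchQualifier(lambda X: list(X.columns), fittable=False)
third = unfittable(Frame(["b", "c"]))
print(raised, first, second, third)
```

Calling the qualifier directly is therefore the forgiving entry point, while `transform` is strict about fitting.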


Name Type Description Default
X pandas.DataFrame

The input dataframe, from which columns are selected.



Type Description
list of objects

A list of labels of the qualified columns for the input dataframe.

Source code in pdpipe/
def transform(self, X: pandas.DataFrame) -> List[object]:
    """Apply the qualifier and return the labels of the qualifying columns.

    If this ColumnQualifier is fittable, it will return the list of column
    labels that was determined when fitted (or the subset of it that can
    be found in the input dataframe). It will throw an exception if it
    is fittable but has not been fitted yet.

    X : pandas.DataFrame
        The input dataframe, from which columns are selected.

    list of objects
        A list of labels of the qualified columns for the input dataframe.
    """
    if not self._fittable:
        return self._cqfunc(X)
    try:
        if self._subset:
            return [x for x in self._columns if x in X.columns]
        return self._columns
    except AttributeError:
        raise UnfittedColumnQualifierError


Bases: ColumnQualifier

Select all columns in input dataframes.


Name Type Description Default

Accepts all keyword arguments of the constructor of ColumnQualifier. See the documentation of ColumnQualifier for details.



>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame([[8,1],[5,2]], [1,2], ['a', 'b'])
>>> cq = pdp.cq.AllColumns()
>>> cq
<ColumnQualifier: Qualify all columns>
>>> cq(df)
['a', 'b']
>>> df2 = pd.DataFrame([[8,1],[5,2]], [1,2], ['b', 'c'])
>>> cq(df2)
['a', 'b']
>>> cq = pdp.cq.AllColumns(fittable=False)
>>> cq(df)
['a', 'b']
>>> cq(df2)
['b', 'c']
>>> cq = pdp.cq.AllColumns(subset=True)
>>> cq(df)
['a', 'b']
>>> cq(df2)
['b']
Source code in pdpipe/
class AllColumns(ColumnQualifier):
    Select all columns in input dataframes.

        Accepts all keyword arguments of the constructor of
        ColumnQualifier. See the documentation of `ColumnQualifier` for
        details.

    >>> import pandas as pd; import pdpipe as pdp;
    >>> df = pd.DataFrame([[8,1],[5,2]], [1,2], ['a', 'b'])
    >>> cq = pdp.cq.AllColumns()
    >>> cq
    <ColumnQualifier: Qualify all columns>
    >>> cq(df)
    ['a', 'b']
    >>> df2 = pd.DataFrame([[8,1],[5,2]], [1,2], ['b', 'c'])
    >>> cq(df2)
    ['a', 'b']
    >>> cq = pdp.cq.AllColumns(fittable=False)
    >>> cq(df)
    ['a', 'b']
    >>> cq(df2)
    ['b', 'c']
    >>> cq = pdp.cq.AllColumns(subset=True)
    >>> cq(df)
    ['a', 'b']
    >>> cq(df2)
    ['b']

    class _SelectAllColumns(object):
        def __call__(self, X):
            return list(X.columns)

    def __init__(self, **kwargs):
        kwargs["func"] = AllColumns._SelectAllColumns()

    def __repr__(self):
        return "<ColumnQualifier: Qualify all columns>"


Bases: ColumnQualifier

A fittable column qualifier based on a per-column condition.


Name Type Description Default
cond callable

A callable that given an input pandas.Series object returns a boolean value.

safe bool, default False

If set to True, every call to the given condition cond is wrapped in a way that interprets every raised exception as a returned False value. This is useful when generating qualifiers based on conditions that assume a specific datatype for the checked column.


Additionally accepts all keyword arguments of the constructor of ColumnQualifier. See the documentation of ColumnQualifier for details.
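
The effect of `safe` can be sketched in a few lines. This is an illustration under simplifying assumptions: plain lists stand in for pandas.Series objects, a dict stands in for the dataframe, and `make_safe` is a hypothetical helper rather than a pdpipe function.

```python
# Sketch of the `safe` wrapping idea: any exception raised by the condition
# is treated as a False result. Names and data are illustrative only.

def make_safe(cond):
    def safe_cond(series):
        try:
            return cond(series)
        except Exception:
            return False
    return safe_cond


def cond(values):
    # Raises TypeError when the column holds strings.
    return sum(values) > 3


columns = {"age": [1, 4], "count": [2, 1], "grade": ["A", "C"]}

safe = make_safe(cond)
qualified = [label for label, values in columns.items() if safe(values)]
print(qualified)   # ['age']
```

Without the wrapper, evaluating the condition on the string column would raise and abort the whole selection.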



>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame(
...    [[1, 2, 'A'],[4, 1, 'C']], [1,2], ['age', 'count', 'grade'])
>>> cq = pdp.cq.ByColumnCondition(lambda s: s.sum() > 3, safe=True)
>>> cq(df)
['age']
Source code in pdpipe/
class ByColumnCondition(ColumnQualifier):
    A fittable column qualifier based on a per-column condition.

    cond : callable
        A callable that given an input pandas.Series object returns a boolean
        value.
    safe : bool, default False
        If set to True, every call to the given condition `cond` is wrapped in
        a way that interprets every raised exception as a returned False value.
        This is useful when generating qualifiers based on conditions that
        assume a specific datatype for the checked column.
        Additionally accepts all keyword arguments of the constructor of
        ColumnQualifier. See the documentation of `ColumnQualifier` for
        details.
    >>> import pandas as pd; import pdpipe as pdp;
    >>> df = pd.DataFrame(
    ...    [[1, 2, 'A'],[4, 1, 'C']], [1,2], ['age', 'count', 'grade'])
    >>> cq = pdp.cq.ByColumnCondition(lambda s: s.sum() > 3, safe=True)
    >>> cq(df)
    ['age']

    class _SafeCond(object):
        def __init__(self, cond):
            self.cond = cond

        def __call__(self, series):
            try:
                return self.cond(series)
            except Exception:
                return False

    class _ColumnConditionChecker(object):
        def __init__(self, cond):
            self.cond = cond

        def __call__(self, X):
            return list([lbl for lbl, series in X.iteritems() if self.cond(series)])

    def __init__(self, cond, safe=False, **kwargs):
        self._cond = cond
        if safe:
            self._cond = ByColumnCondition._SafeCond(cond)
        kwargs["func"] = ByColumnCondition._ColumnConditionChecker(self._cond)


Bases: ColumnQualifier

Select all columns with the given label or labels.


Name Type Description Default
labels single label or list-like

Column labels which qualify.


Additionally accepts all keyword arguments of the constructor of ColumnQualifier. See the documentation of ColumnQualifier for details.
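
How a single label and a list of labels are treated alike can be sketched as follows. The check mirrors the one in the source listing for this class, but `normalize_labels` and `by_labels` are hypothetical helpers and a plain list stands in for the dataframe's columns.

```python
# Sketch of label normalisation and selection; not pdpipe code.

def normalize_labels(labels):
    """Wrap a single label (including a string) in a list."""
    if isinstance(labels, str) or not hasattr(labels, "__iter__"):
        return [labels]
    return labels


def by_labels(labels, columns):
    wanted = normalize_labels(labels)
    # Iterate over the input columns, so the result follows their order
    # and labels missing from the input are silently skipped.
    return [c for c in columns if c in wanted]


columns = ["num", "chr", "nur"]
single = by_labels("num", columns)
reordered = by_labels(["nur", "chr"], columns)
missing = by_labels(["num", "foo"], columns)
numeric_label = by_labels(5, ["a", 5])
print(single, reordered, missing, numeric_label)
```

Note that a string is wrapped rather than iterated, so 'num' is one label and not three characters.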



>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame(
...    [[8,'a',5],[5,'b',7]], [1,2], ['num', 'chr', 'nur'])
>>> cq = pdp.cq.ByLabels('num')
>>> cq(df)
['num']
>>> cq = pdp.cq.ByLabels(['chr', 'nur'])
>>> cq(df)
['chr', 'nur']
>>> cq = pdp.cq.ByLabels(['num', 'foo'])
>>> cq(df)
['num']
Source code in pdpipe/
class ByLabels(ColumnQualifier):
    Select all columns with the given label or labels.

    labels : single label or list-like
        Column labels which qualify.
        Additionally accepts all keyword arguments of the constructor of
        ColumnQualifier. See the documentation of `ColumnQualifier` for
        details.

    >>> import pandas as pd; import pdpipe as pdp;
    >>> df = pd.DataFrame(
    ...    [[8,'a',5],[5,'b',7]], [1,2], ['num', 'chr', 'nur'])
    >>> cq = pdp.cq.ByLabels('num')
    >>> cq(df)
    ['num']
    >>> cq = pdp.cq.ByLabels(['chr', 'nur'])
    >>> cq(df)
    ['chr', 'nur']
    >>> cq = pdp.cq.ByLabels(['num', 'foo'])
    >>> cq(df)
    ['num']

    class _LabelsQualifierFunc(object):
        def __init__(self, labels):
            self.labels = labels

        def __call__(self, X):
            return [lbl for lbl in X.columns if lbl in self.labels]

    def __init__(self, labels, **kwargs):
        if isinstance(labels, str) or not hasattr(labels, "__iter__"):
            labels = [labels]
        self._labels = labels
        self._labels_str = _list_str(self._labels)
        cqfunc = ByLabels._LabelsQualifierFunc(self._labels)
        cqfunc.__doc__ = f"Columns with labels in {self._labels_str}"
        self.__doc__ = cqfunc.__doc__
        kwargs["func"] = cqfunc

    def __repr__(self):
        return f"<ColumnQualifier: By labels in {self._labels_str}>"


Bases: ColumnQualifier

Select all columns that start with the given string.


Name Type Description Default
prefix str

The prefix which qualifies columns.


Additionally accepts all keyword arguments of the constructor of ColumnQualifier. See the documentation of ColumnQualifier for details.
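
Column labels are not always strings, so a prefix test has to tolerate labels without a `startswith` method. The sketch below shows that idea on a plain list of labels; `safe_startswith` is an illustrative helper modelled on the one in the source listing, not a public pdpipe function.

```python
# Sketch of a prefix match that ignores non-string labels.

def safe_startswith(label, prefix):
    try:
        return label.startswith(prefix)
    except AttributeError:   # e.g. integer labels have no startswith
        return False


columns = ["num", 5, "chr", "nur"]
matched = [c for c in columns if safe_startswith(c, "nu")]
print(matched)   # ['num', 'nur']
```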



>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame(
...    [[8,'a',5],[5,'b',7]], [1,2], ['num', 'chr', 'nur'])
>>> cq = pdp.cq.StartsWith('nu')
>>> cq
<ColumnQualifier: Columns starting with nu>
>>> cq(df)
['num', 'nur']
Source code in pdpipe/
class StartsWith(ColumnQualifier):
    Select all columns that start with the given string.

    prefix : str
        The prefix which qualifies columns.
        Additionally accepts all keyword arguments of the constructor of
        ColumnQualifier. See the documentation of `ColumnQualifier` for
        details.

    >>> import pandas as pd; import pdpipe as pdp;
    >>> df = pd.DataFrame(
    ...    [[8,'a',5],[5,'b',7]], [1,2], ['num', 'chr', 'nur'])
    >>> cq = pdp.cq.StartsWith('nu')
    >>> cq
    <ColumnQualifier: Columns starting with nu>
    >>> cq(df)
    ['num', 'nur']

    def _safe_startwith(string, prefix):
        try:
            return string.startswith(prefix)
        except AttributeError:
            return False

    class _StartsWithFunc(object):
        def __init__(self, prefix):
            self.prefix = prefix

        def __call__(self, X):
            return [
                lbl for lbl in X.columns if StartsWith._safe_startwith(lbl, self.prefix)
            ]

    def __init__(self, prefix, **kwargs):
        self._prefix = prefix
        cqfunc = StartsWith._StartsWithFunc(prefix)
        cqfunc.__doc__ = f"Columns that start with {self._prefix}"
        self.__doc__ = cqfunc.__doc__
        kwargs["func"] = cqfunc

    def __repr__(self):
        return f"<ColumnQualifier: Columns starting with {self._prefix}>"


Bases: ColumnQualifier

Select all columns that are of the given dtype or dtypes.

Use dtypes=np.number to qualify all numeric columns.


Name Type Description Default
dtypes object or list of objects

The dtype or dtypes which qualify columns. Supports all valid arguments to the include parameter of pandas.DataFrame.select_dtypes().


Additionally accepts all keyword arguments of the constructor of ColumnQualifier. See the documentation of ColumnQualifier for details.



>>> import pandas as pd; import pdpipe as pdp; import numpy as np;
>>> df = pd.DataFrame(
...    [[8.2,'a',5],[5.1,'b',7]], [1,2], ['ph', 'grade', 'age'])
>>> cq = pdp.cq.OfDtypes(np.number)
>>> cq(df)
['ph', 'age']
>>> cq = pdp.cq.OfDtypes([np.number, object])
>>> cq(df)
['ph', 'grade', 'age']
>>> cq = pdp.cq.OfDtypes(np.int64)
>>> cq
<ColumnQualifier: With dtypes in <class 'numpy.int64'>>
>>> cq(df)
['age']
Source code in pdpipe/
class OfDtypes(ColumnQualifier):
    Select all columns that are of the given dtype or dtypes.

    Use `dtypes=np.number` to qualify all numeric columns.

    dtypes : object or list of objects
        The dtype or dtypes which qualify columns. Supports all valid arguments
        to the `include` parameter of pandas.DataFrame.select_dtypes().
        Additionally accepts all keyword arguments of the constructor of
        ColumnQualifier. See the documentation of `ColumnQualifier` for
        details.

    >>> import pandas as pd; import pdpipe as pdp; import numpy as np;
    >>> df = pd.DataFrame(
    ...    [[8.2,'a',5],[5.1,'b',7]], [1,2], ['ph', 'grade', 'age'])
    >>> cq = pdp.cq.OfDtypes(np.number)
    >>> cq(df)
    ['ph', 'age']
    >>> cq = pdp.cq.OfDtypes([np.number, object])
    >>> cq(df)
    ['ph', 'grade', 'age']
    >>> cq = pdp.cq.OfDtypes(np.int64)
    >>> cq
    <ColumnQualifier: With dtypes in <class 'numpy.int64'>>
    >>> cq(df)
    ['age']

    class _OfDtypeFunc(object):
        def __init__(self, dtypes):
            self.dtypes = dtypes

        def __call__(self, X):
            return list(X.select_dtypes(include=self.dtypes).columns)

    def __init__(self, dtypes, **kwargs):
        self._dtypes = dtypes
        self._dtypes_str = _list_str(self._dtypes)
        cqfunc = OfDtypes._OfDtypeFunc(dtypes)
        cqfunc.__doc__ = f"Columns of dtypes {self._dtypes_str}"
        self.__doc__ = cqfunc.__doc__
        kwargs["func"] = cqfunc

    def __repr__(self):
        return f"<ColumnQualifier: With dtypes in {self._dtypes_str}>"


Bases: OfDtypes

Select all columns that are of a numeric dtype.


Name Type Description Default

Additionally accepts all keyword arguments of the constructor of ColumnQualifier. See the documentation of ColumnQualifier for details.



>>> import pandas as pd; import pdpipe as pdp; import numpy as np;
>>> df = pd.DataFrame(
...    [[8.2,'a',5],[5.1,'b',7]], [1,2], ['ph', 'grade', 'age'])
>>> cq = pdp.cq.OfNumericDtypes()
>>> cq
<ColumnQualifier: With dtypes in <class 'numpy.number'>>
>>> cq(df)
['ph', 'age']
Source code in pdpipe/
class OfNumericDtypes(OfDtypes):
    Select all columns that are of a numeric dtype.

        Additionally accepts all keyword arguments of the constructor of
        ColumnQualifier. See the documentation of `ColumnQualifier` for
        details.

    >>> import pandas as pd; import pdpipe as pdp; import numpy as np;
    >>> df = pd.DataFrame(
    ...    [[8.2,'a',5],[5.1,'b',7]], [1,2], ['ph', 'grade', 'age'])
    >>> cq = pdp.cq.OfNumericDtypes()
    >>> cq
    <ColumnQualifier: With dtypes in <class 'numpy.number'>>
    >>> cq(df)
    ['ph', 'age']

    def __init__(self, **kwargs):
        kwargs["dtypes"] = np.number


Bases: ColumnQualifier

Select all columns with no more than n_missing missing values.


Name Type Description Default
n_missing int

The maximum number of missing values with which columns can still qualify.


Additionally accepts all keyword arguments of the constructor of ColumnQualifier. See the documentation of ColumnQualifier for details.



>>> import pandas as pd; import pdpipe as pdp; import numpy as np;
>>> df = pd.DataFrame(
...    [[None, 1, 2],[None, None, 5]], [1,2], ['ph', 'grade', 'age'])
>>> cq = pdp.cq.WithAtMostMissingValues(1)
>>> cq
<ColumnQualifier: With at most 1 missing values>

>>> cq(df)
['grade', 'age']
Source code in pdpipe/
class WithAtMostMissingValues(ColumnQualifier):
    Select all columns with no more than n_missing missing values.

    n_missing : int
        The maximum number of missing values with which columns can still
        qualify.
        Additionally accepts all keyword arguments of the constructor of
        ColumnQualifier. See the documentation of `ColumnQualifier` for
        details.

    >>> import pandas as pd; import pdpipe as pdp; import numpy as np;
    >>> df = pd.DataFrame(
    ...    [[None, 1, 2],[None, None, 5]], [1,2], ['ph', 'grade', 'age'])
    >>> cq = pdp.cq.WithAtMostMissingValues(1)
    >>> cq
    <ColumnQualifier: With at most 1 missing values>

    >>> cq(df)
    ['grade', 'age']

    class _AtMostFunc(object):
        def __init__(self, n_missing):
            self._n_missing = n_missing

        def __call__(self, X):
            return list(X.columns[X.isna().sum() <= self._n_missing])

    def __init__(self, n_missing, **kwargs):
        self._n_missing = n_missing
        cqfunc = WithAtMostMissingValues._AtMostFunc(n_missing)
        cqfunc.__doc__ = f"Columns with at most {self._n_missing} missing values"
        self.__doc__ = cqfunc.__doc__
        kwargs["func"] = cqfunc

    def __repr__(self):
        return f"<ColumnQualifier: " f"With at most {self._n_missing} missing values>"


Bases: WithAtMostMissingValues

Select all columns with no missing values.


Name Type Description Default

Accepts all keyword arguments of the constructor of ColumnQualifier. See the documentation of ColumnQualifier for details.



>>> import pandas as pd; import pdpipe as pdp; import numpy as np;
>>> df = pd.DataFrame(
...    [[None, 1, 2],[None, None, 5]], [1,2], ['ph', 'grade', 'age'])
>>> cq = pdp.cq.WithoutMissingValues()
>>> cq
<ColumnQualifier: Without missing values>
>>> cq(df)
['age']
Source code in pdpipe/
class WithoutMissingValues(WithAtMostMissingValues):
    Select all columns with no missing values.

        Accepts all keyword arguments of the constructor of ColumnQualifier.
        See the documentation of `ColumnQualifier` for details.

    >>> import pandas as pd; import pdpipe as pdp; import numpy as np;
    >>> df = pd.DataFrame(
    ...    [[None, 1, 2],[None, None, 5]], [1,2], ['ph', 'grade', 'age'])
    >>> cq = pdp.cq.WithoutMissingValues()
    >>> cq
    <ColumnQualifier: Without missing values>
    >>> cq(df)
    ['age']

    def __init__(self, **kwargs):
        kwargs["n_missing"] = 0

    def __repr__(self):
        return "<ColumnQualifier: Without missing values>"


Bases: ColumnQualifier

Select all columns whose rate of missing values is at most the given rate.


Name Type Description Default
rate float, between 0 and 1

The maximum rate of missing values with which columns can still qualify.


Additionally accepts all keyword arguments of the constructor of ColumnQualifier. See the documentation of ColumnQualifier for details.
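
The rate in question is the number of missing values in a column divided by the number of rows, a fraction between 0 and 1. The sketch below computes it for a small table held as a dict of lists, with None marking a missing value; `missing_rates` is a hypothetical helper, and the at-least variant is simply the same comparison flipped.

```python
# Sketch of the missing-value rate and the two threshold comparisons.
# A dict of lists stands in for a dataframe; None marks a missing value.

def missing_rates(table):
    return {
        label: sum(v is None for v in values) / len(values)
        for label, values in table.items()
    }


table = {"ph": [None, None], "grade": [1, None], "age": [2, 5]}
rates = missing_rates(table)

at_most = [label for label, r in rates.items() if r <= 0.6]
at_least = [label for label, r in rates.items() if r >= 0.6]
print(rates)      # {'ph': 1.0, 'grade': 0.5, 'age': 0.0}
print(at_most)    # ['grade', 'age']
print(at_least)   # ['ph']
```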



>>> import pandas as pd; import pdpipe as pdp; import numpy as np;
>>> df = pd.DataFrame(
...    [[None, 1, 2],[None, None, 5]], [1,2], ['ph', 'grade', 'age'])
>>> cq = pdp.cq.WithAtMostMissingValueRate(0.6)
>>> cq
<ColumnQualifier: With at most 0.6 missing value rate>
>>> cq(df)
['grade', 'age']
Source code in pdpipe/
class WithAtMostMissingValueRate(ColumnQualifier):
    Select all columns whose rate of missing values is at most the given rate.

    rate : float, between 0 and 1
        The maximum rate of missing values with which columns can still
        qualify.
        Additionally accepts all keyword arguments of the constructor of
        ColumnQualifier. See the documentation of `ColumnQualifier` for
        details.

    >>> import pandas as pd; import pdpipe as pdp; import numpy as np;
    >>> df = pd.DataFrame(
    ...    [[None, 1, 2],[None, None, 5]], [1,2], ['ph', 'grade', 'age'])
    >>> cq = pdp.cq.WithAtMostMissingValueRate(0.6)
    >>> cq
    <ColumnQualifier: With at most 0.6 missing value rate>
    >>> cq(df)
    ['grade', 'age']

    class _AtMostRateFunc(object):
        def __init__(self, rate):
            self._rate = rate

        def __call__(self, X):
            return list(X.columns[(X.isna().sum() / len(X)) <= self._rate])

    def __init__(self, rate, **kwargs):
        self._rate = rate
        cqfunc = WithAtMostMissingValueRate._AtMostRateFunc(rate)
        cqfunc.__doc__ = f"Columns with at most {self._rate} missing value rate"
        self.__doc__ = cqfunc.__doc__
        kwargs["func"] = cqfunc

    def __repr__(self):
        return f"<ColumnQualifier: " f"With at most {self._rate} missing value rate>"


Bases: ColumnQualifier

Select all columns whose rate of missing values is at least the given rate.


Name Type Description Default
rate float, between 0 and 1

The minimum rate of missing values with which columns can still qualify.


Additionally accepts all keyword arguments of the constructor of ColumnQualifier. See the documentation of ColumnQualifier for details.



>>> import pandas as pd; import pdpipe as pdp; import numpy as np;
>>> df = pd.DataFrame(
...    [[None, 1, 2],[None, None, 5]], [1,2], ['ph', 'grade', 'age'])
>>> cq = pdp.cq.WithAtLeastMissingValueRate(0.6)
>>> cq
<ColumnQualifier: With at least 0.6 missing value rate>
>>> cq(df)
['ph']
Source code in pdpipe/
class WithAtLeastMissingValueRate(ColumnQualifier):
    Select all columns whose rate of missing values is at least the given rate.

    rate : float, between 0 and 1
        The minimum rate of missing values with which columns can still
        qualify.
        Additionally accepts all keyword arguments of the constructor of
        ColumnQualifier. See the documentation of `ColumnQualifier` for
        details.

    >>> import pandas as pd; import pdpipe as pdp; import numpy as np;
    >>> df = pd.DataFrame(
    ...    [[None, 1, 2],[None, None, 5]], [1,2], ['ph', 'grade', 'age'])
    >>> cq = pdp.cq.WithAtLeastMissingValueRate(0.6)
    >>> cq
    <ColumnQualifier: With at least 0.6 missing value rate>
    >>> cq(df)
    ['ph']

    class _AtLeastRateFunc(object):
        def __init__(self, rate):
            self._rate = rate

        def __call__(self, X):
            return list(X.columns[(X.isna().sum() / len(X)) >= self._rate])

    def __init__(self, rate, **kwargs):
        self._rate = rate
        cqfunc = WithAtLeastMissingValueRate._AtLeastRateFunc(rate)
        cqfunc.__doc__ = f"Columns with at least {self._rate} missing value rate"
        self.__doc__ = cqfunc.__doc__
        kwargs["func"] = cqfunc

    def __repr__(self):
        return f"<ColumnQualifier: " f"With at least {self._rate} missing value rate>"



Return True for objects that are fittable ColumnQualifier objects.


Name Type Description Default
obj object

The object to examine.



Type Description
bool

True if the given object is an instance of ColumnQualifier and fittable, False otherwise.

Source code in pdpipe/
def is_fittable_column_qualifier(obj: object) -> bool:
    Return True for objects that are fittable ColumnQualifier objects.

    obj : object
        The object to examine.

        True if the given object is an instance of ColumnQualifier and
        fittable, False otherwise.
    return isinstance(obj, ColumnQualifier) and obj._fittable


Convert the given columns parameter to an equivalent column qualifier.


Name Type Description Default
columns single label, list-like or callable

The label, or an iterable of labels, of columns. Alternatively, this parameter can be assigned a callable returning an iterable of labels from an input pandas.DataFrame. See pdpipe.cq.



Type Description
ColumnQualifier

The equivalent ColumnQualifier object.
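
The three accepted forms of the columns parameter can be illustrated with a small dispatch function. This is a sketch, not the library's implementation: `to_selector` is a hypothetical name, it returns a plain function over a list of labels instead of a ColumnQualifier, and it assumes callables are used as-is and re-evaluated on every input, which matches the `fittable=False` wrapping shown in this function's source listing.

```python
# Sketch of dispatching on a `columns` argument: callable, single label,
# or iterable of labels. Not pdpipe code; names are illustrative.

def to_selector(columns):
    if callable(columns):
        return columns   # used as-is, re-evaluated per input
    if isinstance(columns, str) or not hasattr(columns, "__iter__"):
        columns = [columns]
    wanted = list(columns)
    return lambda cols: [c for c in cols if c in wanted]


available = ["nu", "bu", "zu"]
from_label = to_selector("nu")(available)
from_list = to_selector(["nu", "bu"])(available)
from_callable = to_selector(lambda cols: [c for c in cols if c != "nu"])(available)
print(from_label, from_list, from_callable)
```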


>>> import pdpipe as pdp;
>>> pdp.cq.columns_to_qualifier('nu')
<ColumnQualifier: By labels in nu>
>>> pdp.cq.columns_to_qualifier(['nu', 'bu'])
<ColumnQualifier: By labels in nu, bu>
>>> pdp.cq.columns_to_qualifier(lambda df: [l for l in df.columns])
<ColumnQualifier: Qualify columns by function>
Source code in pdpipe/
def columns_to_qualifier(columns) -> ColumnQualifier:
    Convert the given columns parameter to an equivalent column qualifier.

    columns : single label, list-like or callable
        The label, or an iterable of labels, of columns. Alternatively,
        this parameter can be assigned a callable returning an iterable of
        labels from an input pandas.DataFrame. See `pdpipe.cq`.

        The equivalent ColumnQualifier object.

    >>> import pdpipe as pdp;
    >>> pdp.cq.columns_to_qualifier('nu')
    <ColumnQualifier: By labels in nu>
    >>> pdp.cq.columns_to_qualifier(['nu', 'bu'])
    <ColumnQualifier: By labels in nu, bu>
    >>> pdp.cq.columns_to_qualifier(lambda df: [l for l in df.columns])
    <ColumnQualifier: Qualify columns by function>
    if callable(columns):
        if isinstance(columns, ColumnQualifier):
            return columns
        return ColumnQualifier(columns, fittable=False)
    return ByLabels(columns)

Last update: 2022-01-23