Skip to content

Pierce's criterion

Run Peirce's criterion to reject data.

Implementation after Ross (2003) using calculation method for table from Wikipedia. Note that the table that Ross (2003) presents is for R, which is the square root of what x**2 means in Gould (1855). Also, the first value of Ross (2003) for three observations, one doubtful value seems to off by a little bit. The rest of the table agrees well.

To import:

from rttools imoprt peirce

peirce_criterion(n_tot, n, m=1)

Peirce's criterion

Returns the threshold error deviation for outlier identification using Peirce's criterion based on Gould's methodology. This routine is heavily copied from Wikipedia

Parameters:

Name Type Description Default
n_tot int

Total number of observations.

required
n int

Number of outliers to be removed.

required
m int

Number of model unknowns, defaults to 1.

1

Returns:

Type Description
float

Error threshold R (Ross, 2003) / Square root of x**2 (Gould, 1955)

Source code in src/rttools/peirce.py
def peirce_criterion(n_tot: int, n: int, m: int = 1) -> float:
    """Peirce's criterion

    Returns the threshold error deviation for outlier identification
    using Peirce's criterion based on Gould's methodology.
    This routine is heavily copied from Wikipedia

    :param n_tot: Total number of observations.
    :param n: Number of outliers to be removed.
    :param m: Number of model unknowns, defaults to 1.

    :return: Error threshold `R` (Ross, 2003) / Square root of `x**2` (Gould, 1955)
    """
    # Check number of observations:
    if n_tot > 1:
        # Calculate Q (Nth root of Gould's equation B):
        q_cap = (n ** (n / n_tot) * (n_tot - n) ** ((n_tot - n) / n_tot)) / n_tot

        # Initialize R values (as floats)
        r_new = 1.0
        r_old = 0.0
        #
        # Start iteration to converge on R:
        while abs(r_new - r_old) > (n_tot * 2.0e-16):
            # Calculate Lamda
            # (1/(N-n)th root of Gould's equation A'):
            ldiv = r_new**n
            if ldiv == 0:
                ldiv = 1.0e-6
            lambda_g = ((q_cap**n_tot) / (ldiv)) ** (1.0 / (n_tot - n))
            # Calculate x-squared (Gould's equation C):
            x2 = 1.0 + (n_tot - m - n) / n * (1.0 - lambda_g**2.0)
            # If x2 goes negative, return 0:
            if x2 < 0:
                x2 = 0.0
                r_old = r_new
            else:
                # Use x-squared to update R (Gould's equation D):
                r_old = r_new
                r_new = np.exp((x2 - 1) / 2.0) * special.erfc(
                    np.sqrt(x2) / np.sqrt(2.0)
                )
    else:
        x2 = 0.0
    return np.sqrt(x2)

reject_outliers(data, m=1)

Applies Peirce's criterion to reject outliers.

Algorithm implmeneted as given by Ross (2003).

Parameters:

Name Type Description Default
data ndarray

All data points.

required
m int

Number of model unknowns, defaults to 1.

1

Returns:

Type Description
Tuple[float, float, ndarray]

New average and standard deviation, Array with the outliers.

Source code in src/rttools/peirce.py
def reject_outliers(data: np.ndarray, m: int = 1) -> Tuple[float, float, np.ndarray]:
    """Applies Peirce's criterion to reject outliers.

    Algorithm implmeneted as given by Ross (2003).

    :param data: All data points.
    :param m: Number of model unknowns, defaults to 1.

    :return: New average and standard deviation, Array with the outliers.
    """
    data = np.array(data)  # just making sure it's a numpy array

    avg = np.average(data)
    std = np.std(data)
    n_tot = len(data)

    outliers = []
    diffs = np.abs(data - avg)

    for it in range(len(data)):  # check for every data point if it should be rejected
        max_diff = diffs.max()
        max_ind = diffs.argmax()

        rejection_limit = peirce_criterion(n_tot, it + 1, m)
        if max_diff > rejection_limit * std:
            outliers.append(data[max_ind])
            # delete max from diffs and data
            data = np.delete(data, max_ind)
            diffs = np.delete(diffs, max_ind)
        else:
            break  # we are done rejecting

    avg_new = float(np.average(data))
    std_new = float(np.std(data))

    return avg_new, std_new, np.array(outliers)