Local Correlation Integral (LOCI) is a density based approach for outlier analysis. It is local in nature, i.e., uses only nearby data points in terms of distance to compute density of a point. In this algorithm we have one tunable parameter - $latex \delta&s=2 $. Personally, I believe that we need to tune $latex k&s=2 $ also according to data distribution. LOCI works with following steps

  1.  Compute density, $latex M(X,\epsilon)&s=2 $ of data point $latex X&s=2 $ as the number of neighbours within distance $latex \epsilon&s=2 $. Here, density is known as counting neighbourhood of data point $latex X&s=2 $
    $latex
    M(X,\epsilon) = COUNT_{(Y:dist(X,Y) \leq \epsilon; Y \in datapoints )} Y&s=2$
  2.  Compute average density, $latex AM(X,\epsilon,\delta)&s=2 $ of data point $latex   X&s=2 $ as the MEAN(density of neighbours of $latex X&s=2$ within distance, $latex   \delta&s=2$). Here, $latex \delta&s=2$ is known as sampling neighbourhood of $latex X&s=2$
    $latex AM(X,\epsilon,\delta) = MEAN_{(Y:dist(X,Y) \leq \delta)} M(Y,\epsilon)&s=2$

    The value of $latex \epsilon&s=2$ is always set to be half of $latex \delta&s=2$ in order to enable fast approximation. Therefore, we need to tune $latex  \delta&s=2$ for accuracy without touching $latex \epsilon&s=2$

  3.  Compute Multi-Granularity Deviation Factor (MDEF) at distance, $latex  \delta&s=2$ as                                                                                                                                                                    $latex {MDEF}(X,\epsilon,\delta) = \frac{AM(X,\epsilon,\delta) - M(X,\epsilon)}{AM(X,\epsilon,\delta)}&s=3 $ 

    This factor shows the deviation of $latex  M&s=2$ from $latex  AM&s=2$ for $latex  X&s=2$. Since this computation only considers local/neighbour points, therefore LOCI is referred as local in nature. The larger the value of MDEF, the greater is the outlier score. We use multiple values of $latex  \delta&s=2$ to compute MDEF. Mostly we start with a radius containing 20 points to a maximum of radius spanning most of data.

  4. In this step, the deviation of $latex  M&s=2$ from $latex  AM&s=2$ is converted into binary label, i.e., whether $latex  X&s=2$ is outlier or not. For this, we use $latex  \sigma(X,\epsilon,\delta)&s=2$ metric as
    $latex \begin{aligned}{\sigma}(X,\epsilon,\delta) = \frac{STD_{(Y:dist(X,Y) \leq \delta)}M(Y,\epsilon)}{AM(X,\epsilon,\delta)}&s=2 \end{aligned}$
    Here, STD refers to standard deviation.
  5.  A data point,$latex  X&s=2$ is declared as an outlier if its MDEF value is greater than $latex  k. \sigma(X,\epsilon,\delta)&s=2 $, where $latex  k&s=2$ is chosen to be 3.

Reference:

  1.  I have understood this algorithm from book: Outlier Analysis by Charu Aggarwal