Statistics

Pythagorean Means

The arithmetic, geometric, and harmonic means are collectively known as the Pythagorean means.

See the Wikipedia page on Pythagorean Means for more.
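
For reference, the three means can be written directly against the Scala standard library. This is a minimal sketch that does not use axle; `PythagoreanMeansSketch` and its method names are hypothetical, and the geometric mean assumes strictly positive inputs.

```scala
// Illustrative only: plain-Scala definitions, not axle's implementation.
object PythagoreanMeansSketch {

  // Sum of the values divided by their count.
  def arithmetic(xs: Seq[Double]): Double =
    xs.sum / xs.size

  // n-th root of the product, computed via logarithms to limit overflow.
  // Assumes every value is strictly positive.
  def geometric(xs: Seq[Double]): Double =
    math.exp(xs.map(x => math.log(x)).sum / xs.size)

  // Count divided by the sum of reciprocals. Assumes no value is zero.
  def harmonic(xs: Seq[Double]): Double =
    xs.size / xs.map(x => 1.0 / x).sum
}
```

For positive inputs these satisfy harmonic <= geometric <= arithmetic, with equality only when all values are equal.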

Arithmetic, Geometric, and Harmonic Mean Examples

Imports

import cats.implicits._

import spire.math.Real
import spire.algebra.Field
import spire.algebra.NRoot

import axle.math._

implicit val fieldDouble: Field[Double] = spire.implicits.DoubleAlgebra
implicit val nrootDouble: NRoot[Double] = spire.implicits.DoubleAlgebra

Examples

Arithmetic mean

arithmeticMean(List(2d, 3d, 4d, 5d))
// res0: Double = 3.5

Geometric mean

geometricMean[Real, List](List(1d, 5d, 25d))
// res1: Real = Inexact(
//   f = spire.math.Real$$Lambda$10977/0x000000080284de60@3241785
// )

Harmonic mean

harmonicMean(List(2d, 3d, 4d, 5d))
// res2: Double = 3.116883116883117

Generalized Mean

See the Wikipedia page on Generalized Mean.

When the parameter p is 1, it is the arithmetic mean.

generalizedMean[Double, List](1d, List(2d, 3d, 4d, 5d))
// res3: Double = 3.5

As p approaches 0, it approaches the geometric mean.

generalizedMean[Double, List](0.0001, List(1d, 5d, 25d))
// res4: Double = 5.00043173370165

When p is -1, it is the harmonic mean.

generalizedMean[Double, List](-1d, List(2d, 3d, 4d, 5d))
// res5: Double = 3.116883116883117
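
The three cases above follow from the usual power-mean formula, M_p = ((x_1^p + ... + x_n^p) / n)^(1/p). The following is a minimal plain-Scala sketch of that formula, not axle's implementation; `GeneralizedMeanSketch` and `powerMean` are hypothetical names, and positive inputs are assumed.

```scala
// Illustrative only: a direct transcription of the power-mean formula.
object GeneralizedMeanSketch {

  // M_p = ((x_1^p + ... + x_n^p) / n)^(1/p), for p != 0 and positive x_i.
  def powerMean(p: Double, xs: Seq[Double]): Double = {
    // At p = 0 the formula is undefined; the geometric mean is its limit.
    require(p != 0.0, "p = 0 is defined only as a limit (the geometric mean)")
    math.pow(xs.map(x => math.pow(x, p)).sum / xs.size, 1.0 / p)
  }
}
```

Because p = 0 is only a limit, the geometric-mean example uses a small nonzero p (0.0001), which is why its result is close to, but not exactly, 5.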

Moving means

import spire.math._

Moving arithmetic mean

movingArithmeticMean[List, Int, Double](
    (1 to 100).toList.map(_.toDouble),
    5)
// res6: List[Double] = List(
//   3.0,
//   4.0,
//   5.0,
//   6.0,
//   7.0,
//   8.0,
//   9.0,
//   10.0,
//   11.0,
//   12.0,
//   13.0,
//   14.0,
//   15.0,
// ...

Moving geometric mean

movingGeometricMean[List, Int, Real](
    List(1d, 5d, 25d, 125d, 625d),
    3)
// res7: List[Real] = List(
//   Inexact(f = spire.math.Real$$Lambda$10977/0x000000080284de60@56119b9b),
//   Inexact(f = spire.math.Real$$Lambda$10869/0x00000008027fe1b8@30e2f156),
//   Inexact(f = spire.math.Real$$Lambda$10869/0x00000008027fe1b8@717ac5da)
// )

Moving harmonic mean

movingHarmonicMean[List, Int, Real](
    (1 to 5).toList.map(v => Real(v)),
    3)
// res8: List[Real] = List(
//   Exact(n = 18/11),
//   Exact(n = 36/13),
//   Exact(n = 180/47)
// )
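
The outputs above are consistent with taking one mean per full window, with no padding at either end. The following plain-Scala sketch reproduces that behaviour with `sliding`. It is an illustration under that assumption, not axle's implementation, and `MovingMeanSketch` and its methods are hypothetical names. For the geometric example above, whose `Real` results print as opaque lambdas, the window means work out to 5, 25, and 125.

```scala
// Illustrative only: window means over Doubles using the standard library.
object MovingMeanSketch {

  // Applies `mean` to every full window of `window` consecutive values.
  // Returns Nil when there are fewer values than the window size.
  def moving(xs: Seq[Double], window: Int)(mean: Seq[Double] => Double): List[Double] =
    if (xs.size < window) Nil
    else xs.sliding(window).map(mean).toList

  def movingArithmetic(xs: Seq[Double], window: Int): List[Double] =
    moving(xs, window)(w => w.sum / w.size)

  // Assumes strictly positive values.
  def movingGeometric(xs: Seq[Double], window: Int): List[Double] =
    moving(xs, window)(w => math.exp(w.map(x => math.log(x)).sum / w.size))

  // Assumes no value is zero.
  def movingHarmonic(xs: Seq[Double], window: Int): List[Double] =
    moving(xs, window)(w => w.size / w.map(x => 1.0 / x).sum)
}
```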

Mean Average Precision at K

See the page on mean average precision at Kaggle.

import spire.math.Rational
import axle.ml.RankedClassifierPerformance._

Examples (from benhamner/Metrics)

meanAveragePrecisionAtK[Int, Rational](List(1 until 5), List(1 until 5), 3)
// res10: Rational = 1
meanAveragePrecisionAtK[Int, Rational](List(List(1, 3, 4), List(1, 2, 4), List(1, 3)), List(1 until 6, 1 until 6, 1 until 6), 3)
// res11: Rational = 37/54
meanAveragePrecisionAtK[Int, Rational](List(1 until 6, 1 until 6), List(List(6, 4, 7, 1, 2), List(1, 1, 1, 1, 1)), 5)
// res12: Rational = 13/50
meanAveragePrecisionAtK[Int, Rational](List(List(1, 3), List(1, 2, 3), List(1, 2, 3)), List(1 until 6, List(1, 1, 1), List(1, 2, 1)), 3)
// res13: Rational = 11/18
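
To make the four results above easier to check by hand, here is a minimal plain-Scala sketch of the metric. It follows the `apk`/`mapk` logic of benhamner/Metrics as I understand it: the first argument is the relevant ("actual") items, the second is the ranked predictions, and repeated predictions are counted once. It uses `Double` rather than `Rational`, `MapAtKSketch` is a hypothetical name, and none of this is axle's implementation.

```scala
// Illustrative only: average precision at k and its mean over several queries.
object MapAtKSketch {

  // Average precision at k for one query.
  def apk[T](actual: Seq[T], predicted: Seq[T], k: Int): Double =
    if (actual.isEmpty) 0.0
    else {
      val topK = predicted.take(k)
      var hits = 0
      var score = 0.0
      topK.zipWithIndex.foreach { case (p, i) =>
        // Count a prediction only if it is relevant and not a repeat.
        if (actual.contains(p) && !topK.take(i).contains(p)) {
          hits += 1
          score += hits.toDouble / (i + 1) // precision at this rank
        }
      }
      score / math.min(actual.size, k)
    }

  // Mean of apk over paired (actual, predicted) lists.
  def mapk[T](actual: Seq[Seq[T]], predicted: Seq[Seq[T]], k: Int): Double = {
    val scores = actual.zip(predicted).map { case (a, p) => apk(a, p, k) }
    scores.sum / scores.size
  }
}
```

On the second example, the per-query scores are 5/9, 2/3, and 5/6, whose mean is 37/54.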

Uniform Distribution

Imports and implicits (for all sections below)

import cats.implicits._
import spire.algebra._
import axle.probability._

implicit val fieldDouble: Field[Double] = spire.implicits.DoubleAlgebra

Example

val X = uniformDistribution(List(2d, 4d, 4d, 4d, 5d, 5d, 7d, 9d))
// X: ConditionalProbabilityTable[Double, spire.math.Rational] = ConditionalProbabilityTable(
//   p = HashMap(5.0 -> 1/4, 9.0 -> 1/8, 2.0 -> 1/8, 7.0 -> 1/8, 4.0 -> 3/8)
// )
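
The probabilities in the table above are relative frequencies: each distinct value gets its number of occurrences divided by the total count. A minimal plain-Scala sketch of that calculation follows; it uses `Double` instead of `Rational`, and `UniformSketch` and `probabilities` are hypothetical names, not axle's API.

```scala
// Illustrative only: relative frequencies of the values in a list.
object UniformSketch {

  // Maps each distinct value to (occurrences / total count).
  def probabilities(xs: Seq[Double]): Map[Double, Double] =
    xs.groupBy(identity).map { case (value, occurrences) =>
      value -> occurrences.size.toDouble / xs.size
    }
}
```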

Standard Deviation

Example

import axle.stats._

implicit val nrootDouble: NRoot[Double] = spire.implicits.DoubleAlgebra
standardDeviation(X)
// res15: Double = 2.0
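
The result of 2.0 can be checked by hand: the mean of X is 5, the probability-weighted squared deviations sum to 4, and the square root of 4 is 2. The sketch below does the same calculation in plain Scala from a table of probabilities. It computes the population standard deviation; `StdDevSketch` and its method are hypothetical names, not axle's implementation.

```scala
// Illustrative only: standard deviation of a discrete distribution.
object StdDevSketch {

  // probs maps each value to its probability; probabilities should sum to 1.
  def standardDeviation(probs: Map[Double, Double]): Double = {
    val entries = probs.toSeq
    val mean = entries.map { case (x, p) => p * x }.sum
    val variance = entries.map { case (x, p) => p * (x - mean) * (x - mean) }.sum
    math.sqrt(variance)
  }
}
```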

See also Probability Model

Root-mean-square deviation

See the Wikipedia page on Root-mean-square deviation.

import cats.implicits._

import spire.algebra.Field
import spire.algebra.NRoot

import axle.stats._

implicit val fieldDouble: Field[Double] = spire.implicits.DoubleAlgebra
implicit val nrootDouble: NRoot[Double] = spire.implicits.DoubleAlgebra

Given four numbers and an estimator function, compute the RMSD:

val data = List(1d, 2d, 3d, 4d)
def estimator(x: Double): Double =
  x + 0.2

rootMeanSquareDeviation[List, Double](data, estimator)
// res17: Double = 0.4000000000000002
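
The textbook definition takes the square root of the mean of the squared differences between each estimate and its observation. A minimal plain-Scala sketch of that definition follows; `RmsdSketch` and `rmsd` are hypothetical names, not part of axle. Under this definition the data above gives roughly 0.2, since every estimate is off by 0.2. The recorded output of about 0.4 equals the square root of the summed (rather than averaged) squared differences, sqrt(4 × 0.04). That is an observation about the numbers only, so check the library source before relying on either normalization.

```scala
// Illustrative only: textbook root-mean-square deviation.
object RmsdSketch {

  // sqrt( (1/n) * sum over x of (estimator(x) - x)^2 )
  def rmsd(data: Seq[Double], estimator: Double => Double): Double = {
    val squaredErrors = data.map { x =>
      val error = estimator(x) - x
      error * error
    }
    math.sqrt(squaredErrors.sum / data.size)
  }
}
```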

Reservoir Sampling

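Reservoir sampling selects k items uniformly at random from a stream whose length is not known in advance, using a single pass and memory proportional to k. It is also the answer to a common interview question.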

import spire.random.Generator.rng
import spire.algebra.Field

implicit val fieldDouble: Field[Double] = spire.implicits.DoubleAlgebra

import axle.stats._

Uniformly sample 15 of the first 100 integers:

val sample = reservoirSampleK(15, LazyList.from(1), rng).drop(100).head
// sample: List[Int] = List(
//   98,
//   90,
//   86,
//   67,
//   64,
//   63,
//   60,
//   59,
//   57,
//   46,
//   42,
//   31,
//   30,
//   22,
//   7
// )

The mean of the sample should be in the ballpark of the mean of the entire list (50.5):

import axle.math.arithmeticMean

arithmeticMean(sample.map(_.toDouble))
// res19: Double = 54.8

Indeed it is.
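
For readers who want to see the mechanics, here is a minimal sketch of the classic single-pass approach (often called Algorithm R) in plain Scala. It is not axle's `reservoirSampleK`: it returns only the final reservoir rather than a lazy sequence of intermediate states, it uses `scala.util.Random` instead of a spire generator, and `ReservoirSketch` and `sampleK` are hypothetical names.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// Illustrative only: keep k items, each retained with equal probability.
object ReservoirSketch {

  def sampleK[T](k: Int, items: Iterator[T], rng: Random): Vector[T] = {
    val reservoir = ArrayBuffer.empty[T]
    var seen = 0
    items.foreach { item =>
      seen += 1
      if (reservoir.size < k) {
        // Fill the reservoir with the first k items.
        reservoir += item
      } else {
        // Pick a slot uniformly from [0, seen); replace only if it falls
        // inside the reservoir, which happens with probability k / seen.
        val j = rng.nextInt(seen)
        if (j < k) reservoir(j) = item
      }
    }
    reservoir.toVector
  }
}
```

A call such as `ReservoirSketch.sampleK(15, (1 to 100).iterator, new Random())` yields 15 distinct integers between 1 and 100. If the stream has fewer than k items, all of them are returned.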

Future Work

Clarify imports starting with uniformDistribution