Regression Analysis

Linear Regression

axle.ml.LinearRegression makes use of axle.algebra.LinearAlgebra.

See the Wikipedia page on Linear Regression.

Example: Home Prices

case class RealtyListing(size: Double, bedrooms: Int, floors: Int, age: Int, price: Double)

val listings = List(
  RealtyListing(2104, 5, 1, 45, 460d),
  RealtyListing(1416, 3, 2, 40, 232d),
  RealtyListing(1534, 3, 2, 30, 315d),
  RealtyListing(852, 2, 1, 36, 178d))

Create a price estimator using linear regression.

import cats.implicits._
import spire.algebra.Rng
import spire.algebra.NRoot
import axle.jblas._

implicit val rngDouble: Rng[Double] = spire.implicits.DoubleAlgebra
implicit val nrootDouble: NRoot[Double] = spire.implicits.DoubleAlgebra
implicit val laJblasDouble = axle.jblas.linearAlgebraDoubleMatrix[Double]
implicit val rngInt: Rng[Int] = spire.implicits.IntAlgebra

import axle.ml.LinearRegression

val priceEstimator = LinearRegression(
  listings,
  numFeatures = 4,
  featureExtractor = (rl: RealtyListing) => (rl.size :: rl.bedrooms.toDouble :: rl.floors.toDouble :: rl.age.toDouble :: Nil),
  objectiveExtractor = (rl: RealtyListing) => rl.price,
  α = 0.1,
  iterations = 100)

Use the estimator to predict the price of a listing. Only the four features are used for prediction, so the price field is set to 0.

priceEstimator(RealtyListing(1416, 3, 2, 40, 0d))
// res0: Double = 288.60017635814035
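The technique behind this estimator, min-max feature scaling followed by batch gradient descent on mean squared error, can be sketched in plain Scala with no axle dependency. All names below (`GradientDescentSketch`, `predict`, `scaleColumns`, `fit`) are illustrative, not axle's actual API, and the details of axle's internal scaling may differ.

```scala
// Illustrative batch gradient descent for linear regression (not axle's code).
object GradientDescentSketch {

  // Hypothesis: theta(0) is the intercept, theta(i + 1) weights feature i.
  def predict(theta: Array[Double], x: Array[Double]): Double =
    theta(0) + x.indices.map(i => theta(i + 1) * x(i)).sum

  // Min-max scale each column so one learning rate suits features with
  // very different ranges (e.g. size in the thousands vs. bedroom counts).
  def scaleColumns(xs: List[Array[Double]]): List[Array[Double]] = {
    val n = xs.head.length
    val mins = Array.tabulate(n)(j => xs.map(_(j)).min)
    val maxs = Array.tabulate(n)(j => xs.map(_(j)).max)
    xs.map(x => Array.tabulate(n)(j =>
      if (maxs(j) == mins(j)) 0d else (x(j) - mins(j)) / (maxs(j) - mins(j))))
  }

  // Batch gradient descent: each step moves theta against the average
  // gradient of the squared error over all examples.
  def fit(xs: List[Array[Double]], ys: List[Double],
          alpha: Double, iterations: Int): Array[Double] = {
    val m = xs.length.toDouble
    val theta = Array.fill(xs.head.length + 1)(0d)
    for (_ <- 1 to iterations) {
      val grad = Array.fill(theta.length)(0d)
      for ((x, y) <- xs.zip(ys)) {
        val err = predict(theta, x) - y
        grad(0) += err
        for (i <- x.indices) grad(i + 1) += err * x(i)
      }
      for (i <- theta.indices) theta(i) -= alpha * grad(i) / m
    }
    theta
  }
}
```

On the listings above one would scale the four feature columns, fit against the prices, and predict on a scaled feature vector, mirroring what `LinearRegression` does in one call.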

Create a plot of the error during training.

import axle.visualize._
import axle.algebra.Plottable._

val errorPlot = Plot(
  () => List(("error" -> priceEstimator.errTree)),
  connect = true,
  drawKey = true,
  colorOf = (label: String) => Color.black,
  title = Some("Linear Regression Error"),
  xAxis = Some(0d),
  xAxisLabel = Some("step"),
  yAxis = Some(0d),
  yAxisLabel = Some("error"))

Create the SVG

import axle.web._
import cats.effect._

errorPlot.svg[IO]("docwork/images/lrerror.svg").unsafeRunSync()

(Figure: linear regression training error, rendered to docwork/images/lrerror.svg)

Logistic Regression

WARNING: implementation is incorrect

axle.ml.LogisticRegression makes use of axle.algebra.LinearAlgebra.

See the Wikipedia page on Logistic Regression.

Example: Test Pass Probability

Predict Test Pass Probability as a Function of Hours Studied

case class Student(hoursStudied: Double, testPassed: Boolean)

val data = List(
  Student(0.50, false),
  Student(0.75, false),
  Student(1.00, false),
  Student(1.25, false),
  Student(1.50, false),
  Student(1.75, false),
  Student(1.75, true),
  Student(2.00, false),
  Student(2.25, true),
  Student(2.50, false),
  Student(2.75, true),
  Student(3.00, false),
  Student(3.25, true),
  Student(3.50, false),
  Student(4.00, true),
  Student(4.25, true),
  Student(4.50, true),
  Student(4.75, true),
  Student(5.00, true),
  Student(5.50, true)
)

Create a test pass probability function using logistic regression.

import spire.algebra.Rng
import spire.algebra.NRoot
import axle.jblas._

implicit val rngDouble: Rng[Double] = spire.implicits.DoubleAlgebra
// rngDouble: Rng[Double] = spire.std.DoubleAlgebra@3825d648
implicit val nrootDouble: NRoot[Double] = spire.implicits.DoubleAlgebra
// nrootDouble: NRoot[Double] = spire.std.DoubleAlgebra@3825d648
implicit val laJblasDouble = axle.jblas.linearAlgebraDoubleMatrix[Double]
// laJblasDouble: axle.algebra.LinearAlgebra[org.jblas.DoubleMatrix, Int, Int, Double] = axle.jblas.package$$anon$5@469c0c69
implicit val rngInt: Rng[Int] = spire.implicits.IntAlgebra
// rngInt: Rng[Int] = spire.std.IntAlgebra@5a0d8dc7

import axle.ml.LogisticRegression

val featureExtractor = (s: Student) => (s.hoursStudied :: Nil)
// featureExtractor: Student => List[Double] = <function1>

val objectiveExtractor = (s: Student) => s.testPassed
// objectiveExtractor: Student => Boolean = <function1>

val pTestPass = LogisticRegression(
  data,
  1,
  featureExtractor,
  objectiveExtractor,
  0.1,
  10)
// pTestPass: LogisticRegression[Student, org.jblas.DoubleMatrix] = LogisticRegression(
//   examples = List(
//     Student(hoursStudied = 0.5, testPassed = false),
//     Student(hoursStudied = 0.75, testPassed = false),
//     Student(hoursStudied = 1.0, testPassed = false),
//     Student(hoursStudied = 1.25, testPassed = false),
//     Student(hoursStudied = 1.5, testPassed = false),
//     Student(hoursStudied = 1.75, testPassed = false),
//     Student(hoursStudied = 1.75, testPassed = true),
//     Student(hoursStudied = 2.0, testPassed = false),
//     Student(hoursStudied = 2.25, testPassed = true),
//     Student(hoursStudied = 2.5, testPassed = false),
//     Student(hoursStudied = 2.75, testPassed = true),
//     Student(hoursStudied = 3.0, testPassed = false),
//     Student(hoursStudied = 3.25, testPassed = true),
//     Student(hoursStudied = 3.5, testPassed = false),
//     Student(hoursStudied = 4.0, testPassed = true),
//     Student(hoursStudied = 4.25, testPassed = true),
//     Student(hoursStudied = 4.5, testPassed = true),
//     Student(hoursStudied = 4.75, testPassed = true),
//     Student(hoursStudied = 5.0, testPassed = true),
//     Student(hoursStudied = 5.5, testPassed = true)
//   ),
//   numFeatures = 1,
//   featureExtractor = <function1>,
//   objectiveExtractor = <function1>,
//   α = 0.1,
//   numIterations = 10
// )

Use the estimator to compute the probability of passing the test after two hours of study.

pTestPass(2d :: Nil)

(Note: the implementation is incorrect, so the result is elided until the error is fixed.)
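Since the library implementation is flagged as incorrect, the intended technique can be sketched in plain Scala: a sigmoid hypothesis trained by batch gradient descent on the log-loss. The names below (`LogisticSketch`, `sigmoid`, `predict`, `fit`) are illustrative only, not axle's API.

```scala
// Illustrative logistic regression via batch gradient descent (not axle's code).
object LogisticSketch {

  def sigmoid(z: Double): Double = 1d / (1d + math.exp(-z))

  // P(label = true | x); theta(0) is the intercept.
  def predict(theta: Array[Double], x: List[Double]): Double =
    sigmoid(theta(0) + x.zipWithIndex.map { case (xi, i) => theta(i + 1) * xi }.sum)

  // The log-loss gradient has the same shape as linear regression's:
  // (prediction - label) times the feature value, averaged over examples.
  def fit(data: List[(List[Double], Boolean)],
          alpha: Double, iterations: Int): Array[Double] = {
    val m = data.length.toDouble
    val theta = Array.fill(data.head._1.length + 1)(0d)
    for (_ <- 1 to iterations) {
      val grad = Array.fill(theta.length)(0d)
      for ((x, label) <- data) {
        val err = predict(theta, x) - (if (label) 1d else 0d)
        grad(0) += err
        for ((xi, i) <- x.zipWithIndex) grad(i + 1) += err * xi
      }
      for (i <- theta.indices) theta(i) -= alpha * grad(i) / m
    }
    theta
  }
}
```

Trained on the student data above, `predict` yields a pass probability that increases with hours studied, which is the behavior the axle estimator should exhibit once fixed.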

Future Work

Fix Logistic Regression