correlationCoefficient()

Computes Pearson’s correlation coefficient of two variables given by their column names firstColumn and secondColumn, which is a measure of the linear dependence between both variables. It assumes that both variables follow a normal distribution; otherwise, a rank correlation measure should be calculated instead. Its value lies in the interval [-1, 1], with a value of -1 implying total negative linear correlation, a value of 0 no linear correlation and a value of 1 total positive linear correlation. This method ignores any missing values and reports their number along with the correlation coefficient in a common object.
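Formula:
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$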
Parameters:
(string) firstColumn (required)
(string) secondColumn (required)
Scales of measure:
interval, metric
Usage:
var bodyMeasurements = [
	{ weight: 63, height: 1.65 },
	{ weight: 64, height: 1.67 },
	{ weight: 74, height: 1.80 },
	{ weight: 79, height: 1.82 },
	{ weight: 82, height: 1.86 },
	{ weight: 66, height: 1.70 },
	{ weight: 91, height: 1.83 },
	{ weight: 72, height: 1.76 },
	{ weight: 85, height: 1.89 },
	{ weight: 68, height: 1.68 }
];

var bodyVars = {
	weight: 'metric',
	height: 'metric'
};

var stats = new Statistics(bodyMeasurements, bodyVars);
var r = stats.correlationCoefficient('weight', 'height');
Returns:
{
	correlationCoefficient: 0.91258…,
	missings: 0
}
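To make the definition concrete, the following is a minimal sketch of how Pearson’s coefficient can be computed by hand for the data above. The helper pearsonSketch() is hypothetical and not part of the library; it only assumes that a missing value is any entry that is not a finite number.

```javascript
// Illustrative helper, not part of the library's API.
function pearsonSketch(rows, firstColumn, secondColumn) {
	// Drop rows in which either value is missing or not a finite number.
	var valid = rows.filter(function (row) {
		return Number.isFinite(row[firstColumn]) && Number.isFinite(row[secondColumn]);
	});
	var n = valid.length;

	var meanX = 0, meanY = 0;
	valid.forEach(function (row) {
		meanX += row[firstColumn] / n;
		meanY += row[secondColumn] / n;
	});

	// Sums of products and squares of the deviations from the means.
	var sxy = 0, sxx = 0, syy = 0;
	valid.forEach(function (row) {
		var dx = row[firstColumn] - meanX;
		var dy = row[secondColumn] - meanY;
		sxy += dx * dy;
		sxx += dx * dx;
		syy += dy * dy;
	});

	return {
		correlationCoefficient: sxy / Math.sqrt(sxx * syy),
		missings: rows.length - n
	};
}

var bodyMeasurements = [
	{ weight: 63, height: 1.65 }, { weight: 64, height: 1.67 },
	{ weight: 74, height: 1.80 }, { weight: 79, height: 1.82 },
	{ weight: 82, height: 1.86 }, { weight: 66, height: 1.70 },
	{ weight: 91, height: 1.83 }, { weight: 72, height: 1.76 },
	{ weight: 85, height: 1.89 }, { weight: 68, height: 1.68 }
];

var pearsonResult = pearsonSketch(bodyMeasurements, 'weight', 'height');
console.log(pearsonResult.correlationCoefficient.toFixed(5)); // 0.91258
```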

covariance()

Computes the covariance of two variables given by their column names firstColumn and secondColumn. If corrected is set to true, the corrected sample covariance is calculated. This is the suitable choice if the covariance is to be estimated for the whole population, while the uncorrected sample covariance is often the better choice for a merely descriptive measure. A positive value indicates a concordant linear relationship, while a negative value indicates a discordant linear relationship between the two variables. A value of zero suggests no linear relationship between the two variables. Note that this measure does not account for non-linear relationships and that the size of the covariance does not indicate the strength of the relationship. To evaluate this strength, consider calculating the correlation coefficient.
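Formulas:
Sample covariance
$$s_{xy} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
Corrected sample covariance
$$s_{xy} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$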
Parameters:
(string) firstColumn (required)
(string) secondColumn (required)
(boolean) corrected (optional, default: true)
Scales of measure:
interval, metric
Usage:
var covariance = stats.covariance('weight', 'height');
// same data as in example for the correlationCoefficient() method
Returns:
{
	covariance: 0.75177…,
	missings: 0
}
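The difference between the corrected and the uncorrected variant is only the divisor. The sketch below illustrates this with a hypothetical helper covarianceSketch() that works on plain arrays; it is not the library’s implementation and skips missing-value handling.

```javascript
// Illustrative helper, not part of the library's API.
function covarianceSketch(xs, ys, corrected) {
	var n = xs.length;
	var meanX = xs.reduce(function (sum, x) { return sum + x; }, 0) / n;
	var meanY = ys.reduce(function (sum, y) { return sum + y; }, 0) / n;

	// Sum of the products of the deviations from the means.
	var sumOfProducts = 0;
	for (var i = 0; i < n; i++) {
		sumOfProducts += (xs[i] - meanX) * (ys[i] - meanY);
	}

	// The corrected variant divides by n - 1, the uncorrected one by n.
	return sumOfProducts / (corrected ? n - 1 : n);
}

var weights = [63, 64, 74, 79, 82, 66, 91, 72, 85, 68];
var heights = [1.65, 1.67, 1.80, 1.82, 1.86, 1.70, 1.83, 1.76, 1.89, 1.68];

var correctedCovariance = covarianceSketch(weights, heights, true);    // approx. 0.75178
var uncorrectedCovariance = covarianceSketch(weights, heights, false); // approx. 0.6766
```

With the default corrected: true, the value matches the result shown under Returns.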

fisherTransformation()

Applies the Fisher transformation to a given correlation coefficient r. In some cases, it can also be used for Spearman’s rank correlation coefficient. Since the transformation is defined as the inverse hyperbolic tangent of r, this method is merely a wrapper for Math.atanh(r), extending it with input validation and offering a polyfill.
Formula:
$$z = \operatorname{artanh}(r) = \frac{1}{2} \ln\left(\frac{1 + r}{1 - r}\right)$$
Parameters:
(integer or float) coeff (required)
Usage:
var transform = stats.fisherTransformation(0.3);
Returns:
0.30951…
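A wrapper of this kind could look roughly like the sketch below. The function name fisherSketch() and the exact validation rules (a number within [-1, 1]) are assumptions made for illustration; the library’s own checks may differ.

```javascript
// Illustrative helper, not part of the library's API.
function fisherSketch(r) {
	// Assumed validation: r has to be a number within [-1, 1].
	if (typeof r !== 'number' || Number.isNaN(r) || r < -1 || r > 1) {
		throw new RangeError('r must be a number in the interval [-1, 1]');
	}
	// Prefer the native implementation, fall back to the logarithmic form.
	if (typeof Math.atanh === 'function') {
		return Math.atanh(r);
	}
	return 0.5 * Math.log((1 + r) / (1 - r));
}

var fisherValue = fisherSketch(0.3); // approx. 0.30952
```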

goodmanKruskalsGamma()

Computes Goodman and Kruskal’s Gamma for two variables given by firstColumn and secondColumn. It is a measure of rank correlation between two variables and thus similar to Kendall’s Tau. However, it is generally a poorer measure since it does not account for ties (it is equal to Kendall’s Tau a if no ties are present in the ranked data). Goodman and Kruskal’s Gamma is defined over the range [-1, 1], with a value of -1 implying total disagreement, a value of 0 total independence and a value of 1 total agreement between the two variables. Returns an object containing the value for gamma, the approximated t statistic and both one- and two-sided significance levels under the assumption of a Student’s t-distributed null hypothesis.
Parameters:
(string) firstColumn (required)
(string) secondColumn (required)
Scales of measure:
ordinal, interval, metric
Usage:
var testData = [
	{ satisfaction: 0, affordability: 0 },
	{ satisfaction: 1, affordability: 2 },
	{ satisfaction: 9, affordability: 8 },
	{ satisfaction: 2, affordability: 3 },
	{ satisfaction: 6, affordability: 1 },
	{ satisfaction: 4, affordability: 5 },
	{ satisfaction: 3, affordability: 4 },
	{ satisfaction: 5, affordability: 6 },
	{ satisfaction: 7, affordability: 9 },
	{ satisfaction: 8, affordability: 7 }
];

var testVars = { satisfaction: 'ordinal', affordability: 'ordinal' };

var stats = new Statistics(testData, testVars);
var gamma = stats.goodmanKruskalsGamma('satisfaction', 'affordability');
Returns:
{
	gamma: 0.68888…,
	tStatistic: 2.01603…,
	pOneTailed: 0.03927…,
	pTwoTailed: 0.07854…,
	missings: 0
}
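Gamma is commonly defined as (C - D) / (C + D), where C and D are the numbers of concordant and discordant pairs of observations. The sketch below counts these pairs for the example data; gammaSketch() is a hypothetical helper and does not compute the t statistic or the significance levels.

```javascript
// Illustrative helper, not part of the library's API.
function gammaSketch(rows, firstColumn, secondColumn) {
	var concordant = 0, discordant = 0;
	for (var i = 0; i < rows.length; i++) {
		for (var j = i + 1; j < rows.length; j++) {
			var direction =
				Math.sign(rows[i][firstColumn] - rows[j][firstColumn]) *
				Math.sign(rows[i][secondColumn] - rows[j][secondColumn]);
			if (direction > 0) concordant++;
			else if (direction < 0) discordant++;
			// direction === 0: the pair is tied in at least one variable and is ignored.
		}
	}
	return {
		concordant: concordant,
		discordant: discordant,
		gamma: (concordant - discordant) / (concordant + discordant)
	};
}

var gammaData = [
	{ satisfaction: 0, affordability: 0 }, { satisfaction: 1, affordability: 2 },
	{ satisfaction: 9, affordability: 8 }, { satisfaction: 2, affordability: 3 },
	{ satisfaction: 6, affordability: 1 }, { satisfaction: 4, affordability: 5 },
	{ satisfaction: 3, affordability: 4 }, { satisfaction: 5, affordability: 6 },
	{ satisfaction: 7, affordability: 9 }, { satisfaction: 8, affordability: 7 }
];

var gammaResult = gammaSketch(gammaData, 'satisfaction', 'affordability');
// 38 concordant and 7 discordant pairs, so gamma = 31 / 45, approx. 0.68889
```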

kendallsTau()

Computes the Kendall rank correlation coefficient (in short, Kendall’s tau) for two variables given by firstColumn and secondColumn. It is similar to Spearman’s rho; however, it does not evaluate the rank differences but the pairwise comparisons of ranks. It is usually smaller than Spearman’s rho, but is a more robust measure if the variables are not normally distributed, if their scales are very different or if the sample size is small.

This method returns an object with three different parameters, Tau a, Tau b and Tau c, all of which are defined over the range [-1, 1], with a value of -1 implying total disagreement, a value of 0 total independence and a value of 1 total agreement between the two variables.
  • Tau a: only defined if no ties are present. In that case, the value for Tau a, the z statistic and the one- and two-sided probabilities for statistical independence under the assumption of a normally distributed null hypothesis are returned.
  • Tau b: more robust than Tau a as it accounts for tied values. If no ties are present, its value and all subsequent measures are equal to those of Tau a. Its value, the z statistic and the one- and two-sided probabilities for statistical independence under the assumption of a normally distributed null hypothesis are returned.
  • Tau c: similar to Tau b but more suitable for rectangular cases, in which Tau b rarely reaches its extreme values.
Parameters:
(string) firstColumn (required)
(string) secondColumn (required)
Scales of measure:
ordinal, interval, metric
Usage:
var testData = … // same data as in the example for the goodmanKruskalsGamma() method
var testVars = … // same data as in the example for the goodmanKruskalsGamma() method

var stats = new Statistics(testData, testVars);
var kendall = stats.kendallsTau('satisfaction', 'affordability');
Returns:
{
	a: {
		tauA: 0.68888…,
		z: 2.77272…,
		pOneSided: 0.00278,
		pTwoSided: 0.00556
	},
	b: {
		tauB: 0.68888…,
		z: 2.77272…,
		pOneSided: 0.00278,
		pTwoSided: 0.00556
	},
	c: {
		tauC: 1.24
	},
	missings: 0
}
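The relation between Tau a and Tau b can be illustrated with a short sketch based on the textbook definitions. kendallSketch() is a hypothetical helper working on plain arrays. It uses the common normal approximation for the z statistic of Tau a and omits Tau c and the p values, so it should not be read as the library’s implementation.

```javascript
// Illustrative helper, not part of the library's API.
function kendallSketch(xs, ys) {
	var n = xs.length;
	var concordant = 0, discordant = 0, tiedX = 0, tiedY = 0;
	for (var i = 0; i < n; i++) {
		for (var j = i + 1; j < n; j++) {
			var dx = Math.sign(xs[i] - xs[j]);
			var dy = Math.sign(ys[i] - ys[j]);
			if (dx === 0 && dy === 0) continue; // tied in both variables
			if (dx === 0) tiedX++;              // tied in x only
			else if (dy === 0) tiedY++;         // tied in y only
			else if (dx === dy) concordant++;
			else discordant++;
		}
	}
	var s = concordant - discordant;
	return {
		// Tau a divides by the number of all pairs; it is only meaningful without ties.
		tauA: s / (n * (n - 1) / 2),
		// Tau b divides by the geometric mean of the pairs not tied in y and not tied in x.
		tauB: s / Math.sqrt((concordant + discordant + tiedX) * (concordant + discordant + tiedY)),
		// Normal approximation of the test statistic for Tau a.
		z: 3 * s / Math.sqrt(n * (n - 1) * (2 * n + 5) / 2)
	};
}

var satisfactionValues = [0, 1, 9, 2, 6, 4, 3, 5, 7, 8];
var affordabilityValues = [0, 2, 8, 3, 1, 5, 4, 6, 9, 7];

var kendallResult = kendallSketch(satisfactionValues, affordabilityValues);
// Without ties, tauA and tauB coincide (approx. 0.68889) and z is approx. 2.77272.
```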

spearmansRho()

Computes Spearman’s rank correlation coefficient (in short, Spearman’s rho) for two variables given by firstColumn and secondColumn. It is similar to Pearson’s correlation coefficient; however, it does not compare the values of the variables but their ranks. It assesses monotonic relationships between two variables and does not distinguish between linear and non-linear relationships, whereas the correlation coefficient assesses only linear relationships. Therefore, Spearman’s rho is a more general parameter and also suitable for variables on an ordinal scale. However, this method is flawed when many ties are present (i.e. non-unique values for either variable). Setting adjustForTies to true alleviates the problem. Spearman’s rho is defined over the range [-1, 1], with a value of -1 implying total disagreement, a value of 0 total independence and a value of 1 total agreement between the two variables. Besides computing rho, this method also returns test statistics and p values both for the assumption of a normally distributed and a Student’s t-distributed null hypothesis.
Parameters:
(string) firstColumn (required)
(string) secondColumn (required)
(boolean) adjustForTies (optional, default: false)
Scales of measure:
ordinal, interval, metric
Usage:
var testData = … // same data as in the example for the assignRanks() method 
var testVars = … // same data as in the example for the assignRanks() method
var stats = new Statistics(testData, testVars);
var dependence = stats.spearmansRho('age', 'iq');
Returns:
{
	rho: -0.76969…,
	significanceNormal: {
		pOneSided: 0.0044,
		pTwoSided: 0.0088,
		zScore: -2.62010…
	},
	significanceStudent: {
		degreesOfFreedom: 8,
		pOneSided: 0.00461…,
		pTwoSided: 0.00922…,
		tStatistic: -3.41008…
	},
	missings: 0
}
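Since Spearman’s rho compares ranks instead of values, it can be sketched as Pearson’s coefficient applied to the ranks of both variables. The helpers below are hypothetical; tied values receive the average of their ranks, which is one common convention and not necessarily what adjustForTies does. The data is taken from the example for the goodmanKruskalsGamma() method, as the assignRanks() example is not reproduced here.

```javascript
// Illustrative helpers, not part of the library's API.
function averageRanks(values) {
	var order = values
		.map(function (value, index) { return { value: value, index: index }; })
		.sort(function (p, q) { return p.value - q.value; });
	var ranks = new Array(values.length);
	var i = 0;
	while (i < order.length) {
		// Find the end of the run of equal values starting at position i.
		var j = i;
		while (j + 1 < order.length && order[j + 1].value === order[i].value) j++;
		// All members of the run share the mean of the 1-based positions i+1 .. j+1.
		var rank = (i + j) / 2 + 1;
		for (var k = i; k <= j; k++) ranks[order[k].index] = rank;
		i = j + 1;
	}
	return ranks;
}

function pearsonOfArrays(xs, ys) {
	var n = xs.length;
	var meanX = xs.reduce(function (sum, x) { return sum + x; }, 0) / n;
	var meanY = ys.reduce(function (sum, y) { return sum + y; }, 0) / n;
	var sxy = 0, sxx = 0, syy = 0;
	for (var i = 0; i < n; i++) {
		sxy += (xs[i] - meanX) * (ys[i] - meanY);
		sxx += (xs[i] - meanX) * (xs[i] - meanX);
		syy += (ys[i] - meanY) * (ys[i] - meanY);
	}
	return sxy / Math.sqrt(sxx * syy);
}

function spearmanSketch(xs, ys) {
	return pearsonOfArrays(averageRanks(xs), averageRanks(ys));
}

var rhoSketch = spearmanSketch(
	[0, 1, 9, 2, 6, 4, 3, 5, 7, 8], // satisfaction
	[0, 2, 8, 3, 1, 5, 4, 6, 9, 7]  // affordability
);
// approx. 0.78182, identical to 1 - 6 * 36 / (10 * 99) as there are no ties
```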

linearRegression()

Performs simple linear regression on two variables given by firstColumn and secondColumn. This method assumes that their correlation — if there is any — is linear and calculates several characterising measures:
  • correlationCoefficient: This empirical measure corresponds to Pearson’s correlation coefficient. It lies in the interval [-1, 1], with a value of -1 implying total negative linear correlation, a value of 0 no linear correlation and a value of 1 total positive linear correlation.
  • coefficientOfDetermination: This measure, denoted R², describes how much of the variance in the data can be explained by the computed regression model. It falls into the interval [0, 1], where a value of 0 implies no linear correlation and a value of 1 implies perfect linear correlation.
  • coefficientOfDeterminationCorrected: The coefficient of determination tends to get larger with a larger number of independent variables. In simple linear regression, this is rarely a problem; however, the corrected coefficient yields a more conservative result than the standard measure.
  • regressionFirst and regressionSecond: A regression model describes a regression line following the linear equation y = β₁ + β₂x with the regression coefficients β₁ (intercept) and β₂ (slope). As this linear equation assumes a mapping of the value x onto y, thus being directional (i.e. "y is dependent on x"), the opposite mapping of y onto x (i.e. "x is dependent on y") needs to be considered as well. This method returns values for beta1 and beta2 both for the mapping of firstColumn onto secondColumn as regressionFirst and for the opposite mapping as regressionSecond.
  • phi: Both regression lines given by regressionFirst and regressionSecond form an angle, which is another measure of the regression system’s correlation, with values for phi closer to 0° indicating a better correlation than values closer to 90°.
Parameters:
(string) firstColumn (required)
(string) secondColumn (required)
Scales of measure:
interval, metric
Usage:
var testData = [
	{ price: 2.00, sold: 0 },
	{ price: 1.90, sold: 2 },
	{ price: 1.80, sold: 1 },
	{ price: 1.70, sold: 2 },
	{ price: 1.60, sold: 3 },
	{ price: 1.50, sold: 6 },
	{ price: 1.40, sold: 4 },
	{ price: 1.30, sold: 9 },
	{ price: 1.20, sold: 8 },
	{ price: 1.10, sold: 10 },
	{ price: 1.00, sold: 14 }
];

var testVars = {
	price: 'metric',
	sold: 'metric'
};

var stats = new Statistics(testData, testVars);
var regression = stats.linearRegression('price', 'sold');
Returns:
{
	coefficientOfDetermination: 0.88990…,
	coefficientOfDeterminationCorrected: 0.87767…,
	correlationCoefficient: -0.94334…,
	phi: 19.37826…,
	regressionFirst: {
		beta1: 24.18181…,
		beta2: -12.54545…
	},
	regressionSecond: {
		beta1: 1.88046…,
		beta2: -0.07093…
	}
}
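The coefficients above can be reproduced with ordinary least squares. The sketch below uses a hypothetical helper regressionSketch() on plain arrays; it assumes the usual adjustment (n - 1) / (n - 2) for the corrected coefficient of determination with one independent variable and leaves out phi.

```javascript
// Illustrative helper, not part of the library's API.
function regressionSketch(xs, ys) {
	var n = xs.length;
	var meanX = xs.reduce(function (sum, x) { return sum + x; }, 0) / n;
	var meanY = ys.reduce(function (sum, y) { return sum + y; }, 0) / n;

	var sxy = 0, sxx = 0, syy = 0;
	for (var i = 0; i < n; i++) {
		sxy += (xs[i] - meanX) * (ys[i] - meanY);
		sxx += (xs[i] - meanX) * (xs[i] - meanX);
		syy += (ys[i] - meanY) * (ys[i] - meanY);
	}

	var slope = sxy / sxx;
	var rSquared = (sxy * sxy) / (sxx * syy);
	return {
		beta1: meanY - slope * meanX, // intercept
		beta2: slope,
		coefficientOfDetermination: rSquared,
		// Assumed adjustment for a single independent variable.
		coefficientOfDeterminationCorrected: 1 - (1 - rSquared) * (n - 1) / (n - 2)
	};
}

var prices = [2.00, 1.90, 1.80, 1.70, 1.60, 1.50, 1.40, 1.30, 1.20, 1.10, 1.00];
var sold = [0, 2, 1, 2, 3, 6, 4, 9, 8, 10, 14];

var firstSketch = regressionSketch(prices, sold);  // "sold is dependent on price"
var secondSketch = regressionSketch(sold, prices); // "price is dependent on sold"
```

Swapping the arguments yields the opposite mapping, which corresponds to regressionSecond in the returned object.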