LM_diagnostics

Attributes

style_talk

Classes

LinearRegDiagnostic

Diagnostic plots to identify potential problems in a linear regression fit.

Module Contents

LM_diagnostics.style_talk = 'seaborn-talk'
class LM_diagnostics.LinearRegDiagnostic(results: Type[statsmodels.regression.linear_model.RegressionResultsWrapper])

Diagnostic plots to identify potential problems in a linear regression fit. Mainly,

  1. non-linearity of data

  2. Correlation of error terms

  3. non-constant variance

  4. outliers

  5. high-leverage points

  6. collinearity

Authors:

Prajwal Kafle (p33ajkafle@gmail.com, where 3 = r). Does not come with any sort of warranty. Please test the code on your end before using.

Matt Spinelli (m3spinelli@gmail.com, where 3 = r)

  1. Fixed incorrect annotation of the top most extreme residuals in the Residuals vs Fitted and, especially, the Normal Q-Q plots.

  2. Changed Residuals vs Leverage plot to more closely match the y-axis range shown in the equivalent plot in the R package ggfortify.

  3. Added horizontal line at y=0 in Residuals vs Leverage plot to match the plots in R package ggfortify and base R.

  4. Added option for placing a vertical guideline on the Residuals vs Leverage plot using the rule of thumb of h = 2p/n to denote high leverage (high_leverage_threshold=True).

  5. Added two more ways to compute the Cook’s Distance (D) threshold:

     * 'baseR': D > 1 and D > 0.5 (default)
     * 'convention': D > 4/n
     * 'dof': D > 4 / (n - k - 1)

  6. Fixed class name to conform to Pascal casing convention.

  7. Fixed Residuals vs Leverage legend to work with loc='best'.
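The threshold rules listed above can be written out as plain arithmetic. The following is a minimal sketch, not the class's own code: the function names `cooks_thresholds` and `high_leverage_cutoff` are hypothetical, and it assumes that n is the number of observations, k the number of predictors excluding the intercept, and p the number of fitted parameters (the source does not define these symbols).

```python
def cooks_thresholds(n, k, method="baseR"):
    """Return the Cook's distance cutoff(s) for the named rule.

    Assumption: n = number of observations, k = number of predictors
    (intercept excluded). Hypothetical helper, not part of LM_diagnostics.
    """
    if method == "baseR":
        return [0.5, 1.0]          # fixed cutoffs D > 0.5 and D > 1
    if method == "convention":
        return [4 / n]             # D > 4/n
    if method == "dof":
        return [4 / (n - k - 1)]   # D > 4 / (n - k - 1)
    raise ValueError(f"unknown method: {method!r}")


def high_leverage_cutoff(n, p):
    """Rule-of-thumb leverage cutoff h = 2p/n (p assumed to be the parameter count)."""
    return 2 * p / n


# Example: 50 observations, 3 predictors plus an intercept (p = 4).
for rule in ("baseR", "convention", "dof"):
    print(rule, cooks_thresholds(50, 3, rule))
print("leverage cutoff", high_leverage_cutoff(50, 4))
```

Note that the 'convention' and 'dof' cutoffs shrink as n grows, so on large samples they flag far more points than the fixed 'baseR' cutoffs.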

results
y_true
y_predict
xvar
xvar_names
residual
residual_norm
leverage
cooks_distance
nparams
nresids
__call__(plot_context='seaborn-v0_8-paper', **kwargs)
residual_plot(ax=None)

Residual vs Fitted Plot

Graphical tool to identify non-linearity. A roughly horizontal red line suggests that the residuals have no remaining non-linear trend, which is consistent with a linear relationship.

qq_plot(ax=None)

Standardized Residual vs Theoretical Quantile plot

Used to visually check whether the residuals are normally distributed. Points lying close to the diagonal line suggest that they are.
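To make the Q-Q comparison concrete, here is a minimal sketch of how the point coordinates of such a plot can be built with the standard library only. It is not the class's implementation: the function names are hypothetical, and the plotting positions (i - 0.5) / n are an assumption, one of several common conventions.

```python
from statistics import NormalDist


def theoretical_quantiles(n):
    """Standard normal quantiles at plotting positions (i - 0.5) / n.

    The plotting-position formula is an assumption; other conventions exist.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def qq_pairs(standardized_residuals):
    """Pair each sorted residual with its theoretical normal quantile."""
    ordered = sorted(standardized_residuals)
    return list(zip(theoretical_quantiles(len(ordered)), ordered))


# Hypothetical standardized residuals, for illustration only.
for theo, obs in qq_pairs([0.3, -1.2, 0.9, -0.1, 1.8]):
    print(f"{theo:+.3f}  {obs:+.3f}")
```

If the residuals are approximately normal, the second column tracks the first, which is what appears as points hugging the diagonal.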

scale_location_plot(ax=None)

Sqrt(|Standardized Residual|) vs Fitted values plot

Used to check homoscedasticity of the residuals. A roughly horizontal line suggests that the variance is constant.
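The quantity on the y-axis of this plot can be computed by hand for a one-predictor model. The sketch below is an illustration under stated assumptions, not the class's code: `simple_ols_diagnostics` is a hypothetical name, the model is restricted to an intercept plus a single predictor, and residuals are standardized internally as e / (s * sqrt(1 - h)).

```python
import math


def simple_ols_diagnostics(x, y):
    """Fit y = a + b*x by least squares and return scale-location inputs.

    Hypothetical helper for a single predictor; returns
    (fitted, standardized residuals, leverage, sqrt(|standardized residuals|)).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    fitted = [intercept + slope * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]

    p = 2  # intercept and slope
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - p))  # residual std. error
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    std_resid = [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, leverage)]
    scale_loc = [math.sqrt(abs(r)) for r in std_resid]
    return fitted, std_resid, leverage, scale_loc


# Made-up data, for illustration only.
fitted, std_resid, leverage, scale_loc = simple_ols_diagnostics(
    [1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.9, 5.3]
)
for f, v in zip(fitted, scale_loc):
    print(f"{f:.3f}  {v:.3f}")
```

Plotting the second column against the first gives the scale-location points; a trend in their spread hints at non-constant variance.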

leverage_plot(ax=None, high_leverage_threshold=False, cooks_threshold='baseR')

Residual vs Leverage plot

Points falling outside the Cook’s distance curves are considered influential observations, i.e. observations that can sway the fit. Ideally, no points lie outside the curves.

vif_table()

VIF table

VIF, the variance inflation factor, is a measure of multicollinearity. VIF > 5 for a variable indicates that it is highly collinear with the other input variables.
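For intuition, the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the remaining ones. The sketch below covers only the special case of exactly two predictors, where that R² equals the squared Pearson correlation between them. The function names are hypothetical and this is not how `vif_table()` is implemented.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


# Made-up predictors with correlation 0.8, for illustration only.
print(vif_two_predictors([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

A correlation of about 0.89 between the two predictors is already enough to push the VIF to 5 in this two-variable case.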

get_influence_ids(n_i=10)

Get the top n_i (10 by default) most influential residuals

__cooks_dist_line(factor)

Helper function for plotting Cook’s distance curves
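A Cook's distance contour on a Residuals vs Leverage plot follows from the identity D = (r² / p) · h / (1 − h), with r the standardized residual, h the leverage and p the number of parameters. Solving for r gives r = sqrt(D · p · (1 − h) / h). The sketch below traces that curve. It is an illustration only: the function name, the starting leverage of 0.001 and the 50 sample points are assumptions, not details confirmed by this page.

```python
import math


def cooks_distance_curve(factor, nparams, max_leverage, num=50, start=0.001):
    """Return (leverage, residual) points where Cook's distance equals `factor`.

    Hypothetical helper; `start` and `num` are arbitrary choices.
    Requires start < max_leverage < 1.
    """
    step = (max_leverage - start) / (num - 1)
    xs = [start + i * step for i in range(num)]
    ys = [math.sqrt(factor * nparams * (1 - h) / h) for h in xs]
    return xs, ys


# Upper contour for D = 0.5 with 3 parameters; the lower contour is its mirror image.
xs, ys = cooks_distance_curve(0.5, 3, 0.4)
print(xs[-1], ys[-1])
```

Points whose standardized residual lies beyond the curve at their leverage have a Cook's distance larger than `factor`.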

__qq_top_resid(quantiles, top_residual_indices)

Helper generator function yielding the index and coordinates