LM_diagnostics
Attributes
- style_talk
Classes
- LinearRegDiagnostic: Diagnostic plots to identify potential problems in a linear regression fit.
Module Contents
- LM_diagnostics.style_talk = 'seaborn-talk'
- class LM_diagnostics.LinearRegDiagnostic(results: Type[statsmodels.regression.linear_model.RegressionResultsWrapper])
Diagnostic plots to identify potential problems in a linear regression fit, mainly:
* non-linearity of the data
* correlation of error terms
* non-constant variance
* outliers
* high-leverage points
* collinearity
- Authors:
Prajwal Kafle (p33ajkafle@gmail.com, where 3 = r). Does not come with any sort of warranty. Please test the code on your end before using it.
Matt Spinelli (m3spinelli@gmail.com, where 3 = r):
(1) Fixed incorrect annotation of the most extreme residuals in the Residuals vs Fitted and, especially, the Normal Q-Q plots.
(2) Changed the Residuals vs Leverage plot to more closely match the y-axis range shown in the equivalent plot in the R package ggfortify.
(3) Added a horizontal line at y = 0 in the Residuals vs Leverage plot to match the plots in the R package ggfortify and base R.
(4) Added an option for placing a vertical guideline on the Residuals vs Leverage plot, using the rule of thumb h = 2p/n to denote high leverage (high_leverage_threshold=True).
(5) Added two more ways to compute the Cook’s distance (D) threshold:
* 'baseR': D > 1 and D > 0.5 (default)
* 'convention': D > 4/n
* 'dof': D > 4 / (n - k - 1)
(6) Fixed the class name to conform to the Pascal casing convention.
(7) Fixed the Residuals vs Leverage legend to work with loc='best'.
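The threshold options in item (5) and the leverage rule in item (4) reduce to simple arithmetic. The sketch below is illustrative only: `cooks_cutoffs` and `leverage_cutoff` are hypothetical helpers, not part of this module, and it assumes n is the number of observations, k the number of predictors excluding the intercept, and p the number of fitted parameters.

```python
def cooks_cutoffs(method: str, n: int, k: int) -> list[float]:
    """Cook's distance cut-off(s) for each cooks_threshold option (hypothetical helper).

    n: number of observations; k: number of predictors, excluding the intercept.
    """
    if method == "baseR":
        return [0.5, 1.0]            # base R draws contours at D = 0.5 and D = 1
    if method == "convention":
        return [4 / n]               # rule of thumb D > 4/n
    if method == "dof":
        return [4 / (n - k - 1)]     # degrees-of-freedom-adjusted variant
    raise ValueError(f"unknown cooks_threshold: {method!r}")


def leverage_cutoff(p: int, n: int) -> float:
    """Rule of thumb: leverage h > 2p/n counts as high; p = number of fitted parameters."""
    return 2 * p / n


# Example: 100 observations, 3 predictors plus an intercept (so p = 4).
print(cooks_cutoffs("convention", 100, 3))   # [0.04]
print(cooks_cutoffs("dof", 100, 3))          # 4 / 96
print(leverage_cutoff(4, 100))               # 0.08
```

With few parameters relative to n, the 'convention' and 'dof' cut-offs are nearly identical; they diverge only when k is a sizeable fraction of n.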
- results
- y_true
- y_predict
- xvar
- xvar_names
- residual
- residual_norm
- leverage
- cooks_distance
- nparams
- nresids
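The attributes above are listed without descriptions. As a rough guide, the NumPy-only sketch below computes the textbook OLS quantities that names such as `leverage`, `residual_norm` and `cooks_distance` usually refer to. It is an assumption that the class stores exactly these definitions; with statsmodels, the same quantities are available from `results.get_influence()` as `hat_matrix_diag`, `resid_studentized_internal` and `cooks_distance[0]`. The function `ols_diagnostics` is hypothetical.

```python
import numpy as np


def ols_diagnostics(X: np.ndarray, y: np.ndarray) -> dict:
    """Textbook OLS diagnostics (hypothetical helper).

    X is the design matrix *including* an intercept column; y is the response.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS coefficients
    fitted = X @ beta                               # cf. y_predict
    resid = y - fitted                              # cf. residual
    # Leverage = diagonal of the hat matrix H = X (X'X)^{-1} X'.
    # With the reduced QR factorization X = QR, H = Q Q', so h_i = sum_j Q_ij^2.
    Q, _ = np.linalg.qr(X)
    leverage = np.sum(Q**2, axis=1)
    sigma2 = resid @ resid / (n - p)                # residual variance estimate
    # Internally studentized residuals: e_i / (sigma * sqrt(1 - h_i))
    resid_norm = resid / np.sqrt(sigma2 * (1 - leverage))
    # Cook's distance: D_i = r_i^2 / p * h_i / (1 - h_i)
    cooks = resid_norm**2 / p * leverage / (1 - leverage)
    return {"y_predict": fitted, "residual": resid, "leverage": leverage,
            "residual_norm": resid_norm, "cooks_distance": cooks,
            "nparams": p, "nresids": n}


rng = np.random.default_rng(0)
x = rng.normal(size=50)
X = np.column_stack([np.ones_like(x), x])           # intercept + one predictor
y = 1.0 + 2.0 * x + rng.normal(size=50)
diag = ols_diagnostics(X, y)
print(diag["leverage"].sum())                       # trace(H) = p = 2, up to rounding
```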
- __call__(plot_context='seaborn-v0_8-paper', **kwargs)
- residual_plot(ax=None)
Residual vs Fitted plot
Graphical tool to identify non-linearity. A roughly horizontal red line around zero indicates that the residuals show no systematic pattern, i.e. a linear model is adequate.
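To see why a curved residual trend signals non-linearity, the sketch below fits a straight line with NumPy and averages the residuals in bins along the fitted values, a crude stand-in for the smoothed red line. That the class draws a smoother such as lowess is an assumption; `binned_residual_means` is a hypothetical helper.

```python
import numpy as np


def binned_residual_means(x: np.ndarray, y: np.ndarray, n_bins: int = 3) -> list[float]:
    """Fit y = a + b*x by OLS, then average residuals within bins of fitted values."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    order = np.argsort(fitted)                      # walk along the fitted-value axis
    return [float(chunk.mean()) for chunk in np.array_split(resid[order], n_bins)]


x = np.linspace(0, 3, 61)
print(binned_residual_means(x, 2 * x + 1))   # all close to 0: the line fits
print(binned_residual_means(x, x**2))        # positive, negative, positive: a U shape
```

A flat profile near zero corresponds to the "roughly horizontal" line; the U shape from the quadratic data is the pattern that points to a missing non-linear term.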
- qq_plot(ax=None)
Standardized Residual vs Theoretical Quantile plot
Used to visually check whether the residuals are normally distributed. Points lying close to the diagonal line suggest that they are.
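The coordinates of a normal Q-Q plot can be computed directly: sort the standardized residuals and pair them with standard normal quantiles. This sketch uses the plotting positions (i - 0.5)/n, one of several common conventions (statsmodels may use a different one), and the standard library's `statistics.NormalDist`; `qq_points` is a hypothetical helper.

```python
import numpy as np
from statistics import NormalDist


def qq_points(std_resid: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (theoretical quantiles, sorted standardized residuals) for a normal Q-Q plot."""
    n = len(std_resid)
    # Plotting positions (i - 0.5) / n for i = 1..n
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    return theoretical, np.sort(std_resid)


rng = np.random.default_rng(0)
theoretical, observed = qq_points(rng.normal(size=200))
# For normally distributed residuals the points hug the diagonal.
print(np.corrcoef(theoretical, observed)[0, 1])
```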
- scale_location_plot(ax=None)
Sqrt(|Standardized Residual|) vs Fitted values plot
Used to check homoscedasticity (constant variance) of the residuals. A roughly horizontal line, with points spread evenly around it, suggests that the variance is constant.
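The scale-location plot shows sqrt(|r|) against the fitted values. As a rough numeric companion (not something the class is documented to compute), the hypothetical `spread_trend` below correlates the two: a value near 0 is consistent with constant variance, while a clearly positive value indicates spread growing with the fitted values. Simulated noise stands in for standardized residuals here.

```python
import numpy as np


def scale_location_values(std_resid: np.ndarray) -> np.ndarray:
    """y-values of a scale-location plot: sqrt(|standardized residual|)."""
    return np.sqrt(np.abs(std_resid))


def spread_trend(fitted: np.ndarray, std_resid: np.ndarray) -> float:
    """Correlation between fitted values and sqrt(|r|); near 0 suggests constant variance."""
    return float(np.corrcoef(fitted, scale_location_values(std_resid))[0, 1])


rng = np.random.default_rng(42)
fitted = np.linspace(1, 10, 2000)
homo = rng.normal(size=fitted.size)              # constant spread
hetero = rng.normal(size=fitted.size) * fitted   # spread grows with the fitted value
print(spread_trend(fitted, homo), spread_trend(fitted, hetero))
```

Taking the square root of the absolute value compresses large residuals, which makes a trend in spread easier to see than on the raw residual scale.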
- leverage_plot(ax=None, high_leverage_threshold=False, cooks_threshold='baseR')
Residual vs Leverage plot
Points falling outside the Cook’s distance curves are considered influential observations, i.e. observations that can sway the fit. Ideally, no points fall outside the curves.
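The Cook’s distance curves follow from the definition D = r^2/p * h/(1 - h), where r is the internally studentized residual, h the leverage and p the number of parameters. Solving D = c for r gives r = ±sqrt(c * p * (1 - h)/h), so a point lies outside the curve for c exactly when its D exceeds c. The sketch below checks this numerically; `cooks_contour` and `cooks_distance_from` are hypothetical helpers, and whether the private `__cooks_dist_line` uses this exact formula is an assumption.

```python
import numpy as np


def cooks_contour(h: np.ndarray, c: float, p: int) -> np.ndarray:
    """Positive branch r(h) of the curve Cook's D = c on a residuals-vs-leverage plot."""
    return np.sqrt(c * p * (1 - h) / h)


def cooks_distance_from(r: np.ndarray, h: np.ndarray, p: int) -> np.ndarray:
    """Cook's distance from internally studentized residuals r and leverages h."""
    return r**2 / p * h / (1 - h)


h = np.linspace(0.01, 0.5, 50)
upper = cooks_contour(h, c=0.5, p=3)
# Points exactly on the curve have D == 0.5; points farther from zero have D > 0.5.
print(cooks_distance_from(upper, h, 3)[:3])
```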
- vif_table()
VIF table
VIF, the variance inflation factor, is a measure of multicollinearity. By a common rule of thumb, VIF > 5 for a variable indicates that it is highly collinear with the other input variables.
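VIF for predictor j is 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The NumPy sketch below (a hypothetical `vif` helper that adds an intercept to each auxiliary regression) shows a near-duplicate column pushing VIF well past 5. statsmodels offers `variance_inflation_factor(exog, exog_idx)` in `statsmodels.stats.outliers_influence`; it regresses on the remaining columns of `exog` as given, so include a constant column there to get the centered definition used here.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2) for each column of X (predictors only, no intercept column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        # Auxiliary regression: column j on an intercept plus the other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        tss = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1 - (resid @ resid) / tss
        out[j] = 1 / (1 - r2)
    return out


rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.05 * rng.normal(size=500)     # nearly a copy of a
print(vif(np.column_stack([a, b, c])))  # a and c large, b close to 1
```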
- get_influence_ids(n_i=10)
Get the top n_i (default 10) influential residuals.
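How `get_influence_ids` ranks observations is not documented here. Assuming influence is ranked by Cook’s distance (an assumption, not confirmed by this page), selecting the top n_i indices is a descending sort; `top_influential` is a hypothetical helper.

```python
import numpy as np


def top_influential(cooks_distance: np.ndarray, n_i: int = 10) -> np.ndarray:
    """Indices of the n_i largest Cook's distances, most influential first."""
    # Sorting the negated values gives descending order; "stable" keeps ties in index order.
    order = np.argsort(-cooks_distance, kind="stable")
    return order[:n_i]


d = np.array([0.1, 0.9, 0.3, 0.7])
print(top_influential(d, n_i=2))   # [1 3]
```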
- __cooks_dist_line(factor)
Helper function for plotting Cook’s distance curves
- __qq_top_resid(quantiles, top_residual_indices)
Helper generator function yielding the index and coordinates of each top residual, for annotating the Q-Q plot.
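Annotating the most extreme residuals on a Q-Q plot requires mapping each observation’s index to its rank among the sorted residuals: the x-coordinate is the theoretical quantile at that rank, not at the observation’s original position. The generator below is a hypothetical sketch of that idea. It does not reproduce the private `__qq_top_resid`, whose exact inputs and outputs are not documented here, and it assumes (i - 0.5)/n plotting positions.

```python
import numpy as np
from statistics import NormalDist
from typing import Iterable, Iterator


def qq_top_resid(std_resid: np.ndarray,
                 top_residual_indices: Iterable[int]) -> Iterator[tuple[int, float, float]]:
    """Yield (observation index, theoretical quantile, residual) for Q-Q annotations."""
    n = len(std_resid)
    order = np.argsort(std_resid, kind="stable")   # rank -> observation index
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(n)                     # observation index -> rank
    probs = (np.arange(1, n + 1) - 0.5) / n
    quantiles = np.array([NormalDist().inv_cdf(p) for p in probs])
    for i in top_residual_indices:
        # x: quantile at this residual's rank; y: the residual itself
        yield int(i), float(quantiles[rank[i]]), float(std_resid[i])


r = np.array([0.2, -2.5, 0.1, 3.0, -0.3])
for idx, x, y in qq_top_resid(r, [3, 1]):
    print(idx, round(x, 3), y)
```

Looking up the quantile by rank rather than by original index is what keeps each label attached to the right point once the residuals are sorted.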