LM_diagnostics
==============

.. py:module:: LM_diagnostics


Attributes
----------

.. autoapisummary::

   LM_diagnostics.style_talk


Classes
-------

.. autoapisummary::

   LM_diagnostics.LinearRegDiagnostic


Module Contents
---------------

.. py:data:: style_talk
   :value: 'seaborn-talk'


.. py:class:: LinearRegDiagnostic(results: Type[statsmodels.regression.linear_model.RegressionResultsWrapper])

   Diagnostic plots to identify potential problems in a linear regression fit.
   Mainly:

   a. non-linearity of data
   b. correlation of error terms
   c. non-constant variance
   d. outliers
   e. high-leverage points
   f. collinearity

   Authors:

   Prajwal Kafle (p33ajkafle@gmail.com, where 3 = r)
      Does not come with any sort of warranty.
      Please test the code on your end before using.

   Matt Spinelli (m3spinelli@gmail.com, where 3 = r)
      (1) Fixed incorrect annotation of the topmost extreme residuals in
          the Residuals vs Fitted and, especially, the Normal Q-Q plots.
      (2) Changed the Residuals vs Leverage plot to match more closely the
          y-axis range shown in the equivalent plot in the R package ggfortify.
      (3) Added a horizontal line at y=0 in the Residuals vs Leverage plot to
          match the plots in the R package ggfortify and base R.
      (4) Added an option for placing a vertical guideline on the Residuals
          vs Leverage plot, using the rule of thumb h = 2p/n to denote
          high leverage (``high_leverage_threshold=True``).
      (5) Added two more ways to compute the Cook's Distance (D) threshold:

          * 'baseR': D > 1 and D > 0.5 (default)
          * 'convention': D > 4/n
          * 'dof': D > 4 / (n - k - 1)

      (6) Fixed the class name to conform to the Pascal casing convention.
      (7) Fixed the Residuals vs Leverage legend to work with ``loc='best'``.


   .. py:attribute:: results

   .. py:attribute:: y_true

   .. py:attribute:: y_predict

   .. py:attribute:: xvar

   .. py:attribute:: xvar_names

   .. py:attribute:: residual

   .. py:attribute:: residual_norm

   .. py:attribute:: leverage

   .. py:attribute:: cooks_distance

   .. py:attribute:: nparams

   .. py:attribute:: nresids


   .. py:method:: __call__(plot_context='seaborn-v0_8-paper', **kwargs)


   .. py:method:: residual_plot(ax=None)

      Residual vs Fitted Plot

      Graphical tool to identify non-linearity.
      A (roughly) horizontal red line indicates that the fit captures a
      linear pattern, with no systematic trend left in the residuals.


   .. py:method:: qq_plot(ax=None)

      Standardized Residual vs Theoretical Quantile plot

      Used to visually check whether the residuals are normally distributed.
      Points lying along the diagonal line suggest that they are.


   .. py:method:: scale_location_plot(ax=None)

      Sqrt(Standardized Residual) vs Fitted values plot

      Used to check homoscedasticity of the residuals.
      A horizontal line suggests that the variance is constant.


   .. py:method:: leverage_plot(ax=None, high_leverage_threshold=False, cooks_threshold='baseR')

      Residual vs Leverage plot

      Points falling outside the Cook's distance curves are considered
      observations that can sway the fit, i.e. influential points.
      Ideally, no points fall outside the curves.


   .. py:method:: vif_table()

      VIF table

      VIF, the variance inflation factor, is a measure of multicollinearity.
      VIF > 5 for a variable indicates that it is highly collinear with the
      other input variables.


   .. py:method:: get_influence_ids(n_i=10)

      Get the top ``n_i`` influential residuals (10 by default)


   .. py:method:: __cooks_dist_line(factor)

      Helper function for plotting Cook's distance curves


   .. py:method:: __qq_top_resid(quantiles, top_residual_indices)

      Helper generator function yielding the index and coordinates
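

The threshold rules named in the class docstring (the three ``cooks_threshold`` options and the ``h = 2p/n`` leverage guideline) can be illustrated with a minimal, dependency-free sketch. The function names below are hypothetical and are not part of ``LM_diagnostics``; the sketch also assumes that ``n`` is the number of observations, ``k`` the number of predictors, and ``p`` the number of fitted parameters, which the docstring does not spell out.

```python
def cooks_distance_thresholds(n, k, rule="baseR"):
    """Return the Cook's distance cut-off(s) for a named rule.

    Hypothetical helper mirroring the rules listed in the docstring.
    n: number of observations (assumed meaning)
    k: number of predictors (assumed meaning)
    """
    if rule == "baseR":
        # Two fixed reference levels: D > 0.5 and D > 1
        return (0.5, 1.0)
    if rule == "convention":
        # Sample-size based rule: D > 4/n
        return (4.0 / n,)
    if rule == "dof":
        # Degrees-of-freedom based rule: D > 4 / (n - k - 1)
        return (4.0 / (n - k - 1),)
    raise ValueError(f"unknown rule: {rule!r}")


def high_leverage_cutoff(n, p):
    """Rule-of-thumb leverage guideline h = 2p/n."""
    return 2.0 * p / n


def flag_influential(cooks_d, n, k, rule="baseR"):
    """Return indices whose Cook's distance exceeds the lowest cut-off."""
    cutoff = min(cooks_distance_thresholds(n, k, rule))
    return [i for i, d in enumerate(cooks_d) if d > cutoff]


if __name__ == "__main__":
    distances = [0.01, 0.6, 1.2, 0.09]  # made-up values for illustration
    print(flag_influential(distances, n=50, k=3, rule="baseR"))
    print(flag_influential(distances, n=50, k=3, rule="convention"))
    print(high_leverage_cutoff(n=50, p=4))
```

Under these assumptions, the 'convention' and 'dof' rules tighten as the sample grows, while the 'baseR' levels stay fixed, so the same set of distances can be flagged differently depending on the rule chosen.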