What Is An Influential Point

Observation that would cause a large change if deleted

In Anscombe's quartet the two datasets on the bottom both contain influential points. All four sets are identical when examined using simple summary statistics, but vary considerably when graphed. If one point is removed, the line would look very different.

In statistics, an influential observation is an observation for a statistical calculation whose deletion from the dataset would noticeably change the result of the calculation.[1] In particular, in regression analysis an influential observation is one whose deletion has a large effect on the parameter estimates.[2]
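The definition can be illustrated by refitting a line with and without one observation. The following is a minimal sketch in plain Python, using the x and y values of Anscombe's third dataset; the helper names `fit_line` and `fit_without` are illustrative choices, not part of any library.

```python
# Anscombe's third dataset: one observation (x = 13) lies far above the trend.
xs = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ys = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_without(xs, ys, i):
    """Refit after deleting observation i (0-based index)."""
    return fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])


full = fit_line(xs, ys)
dropped = fit_without(xs, ys, 2)  # delete the point (13, 12.74)
print("all points:    intercept=%.3f slope=%.3f" % full)
print("without point: intercept=%.3f slope=%.3f" % dropped)
```

Deleting this single point moves the slope from roughly 0.50 to roughly 0.35 and the intercept from roughly 3.0 to roughly 4.0, which is the kind of noticeable change the definition describes.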

Assessment

Various methods have been proposed for measuring influence.[3][4] Assume an estimated regression $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $\mathbf{y}$ is an n×1 column vector for the response variable, $\mathbf{X}$ is the n×k design matrix of explanatory variables (including a constant), $\mathbf{e}$ is the n×1 residual vector, and $\mathbf{b}$ is a k×1 vector of estimates of some population parameter $\mathbf{\beta} \in \mathbb{R}^{k}$. Also define $\mathbf{H} \equiv \mathbf{X}\left(\mathbf{X}^{\mathsf{T}}\mathbf{X}\right)^{-1}\mathbf{X}^{\mathsf{T}}$, the projection matrix of $\mathbf{X}$. Then we have the following measures of influence:

  1. $\text{DFBETA}_{i} \equiv \mathbf{b} - \mathbf{b}_{(-i)} = \frac{\left(\mathbf{X}^{\mathsf{T}}\mathbf{X}\right)^{-1}\mathbf{x}_{i}^{\mathsf{T}} e_{i}}{1 - h_{i\cdot}}$, where $\mathbf{b}_{(-i)}$ denotes the coefficients estimated with the i-th row $\mathbf{x}_{i}$ of $\mathbf{X}$ deleted, and $h_{i\cdot} = \mathbf{x}_{i}\left(\mathbf{X}^{\mathsf{T}}\mathbf{X}\right)^{-1}\mathbf{x}_{i}^{\mathsf{T}}$ denotes the i-th diagonal element of $\mathbf{H}$. Thus DFBETA measures the difference in each parameter estimate with and without the influential point. There is a DFBETA for each variable and each observation (if there are N observations and k variables there are N·k DFBETAs).[5] The table shows DFBETAs for the third dataset from Anscombe's quartet (bottom left chart in the figure):
        x        y    intercept      slope
     10.0     7.46       -0.005     -0.044
      8.0     6.77       -0.037      0.019
     13.0    12.74     -357.910    525.268
      9.0     7.11       -0.033      0
     11.0     7.81        0.049     -0.117
     14.0     8.84        0.490     -0.667
      6.0     6.08        0.027     -0.021
      4.0     5.39        0.241     -0.209
     12.0     8.15        0.137     -0.231
      7.0     6.42       -0.020      0.013
      5.0     5.73        0.105     -0.087
  2. DFFITS: difference in fits
  3. Cook's D measures the effect of removing a data point on all the parameters combined.[2]
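The closed form for DFBETA can be checked against a brute-force refit. The sketch below does this in plain Python for a line with an intercept (k = 2) on Anscombe's third dataset. Two things here are assumptions beyond the text above: Cook's D is computed with the standard textbook expression $D_i = e_i^2 h_i / \left(k s^2 (1 - h_i)^2\right)$, and the change in the fitted value is reported unscaled (DFFITS proper divides it by an estimated standard error). All function names are illustrative. Because the differences are left unscaled, they are not expected to reproduce the table, whose entries appear to be on a standardized scale.

```python
# Anscombe's third dataset (x and y columns of the table above).
X_VALUES = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y_VALUES = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def xtx_inverse(xs):
    """Inverse of X^T X for the design matrix whose rows are (1, x)."""
    n, sx, sxx = len(xs), sum(xs), sum(x * x for x in xs)
    det = n * sxx - sx * sx
    return [[sxx / det, -sx / det], [-sx / det, n / det]]


def ols(xs, ys):
    """Return b = (intercept, slope) = (X^T X)^{-1} X^T y."""
    inv = xtx_inverse(xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    return (inv[0][0] * sy + inv[0][1] * sxy,
            inv[1][0] * sy + inv[1][1] * sxy)


def influence_measures(xs, ys):
    """Per-observation leverage, DFBETA, raw change in fit, and Cook's D."""
    n, k = len(xs), 2
    inv = xtx_inverse(xs)
    b0, b1 = ols(xs, ys)
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance estimate
    rows = []
    for x, e in zip(xs, resid):
        # v = (X^T X)^{-1} x_i^T for the row x_i = (1, x)
        v = (inv[0][0] + inv[0][1] * x, inv[1][0] + inv[1][1] * x)
        h = v[0] + v[1] * x  # leverage h_i = x_i (X^T X)^{-1} x_i^T
        rows.append({
            "h": h,
            "dfbeta": (v[0] * e / (1 - h), v[1] * e / (1 - h)),
            "dffit": h * e / (1 - h),  # unscaled change in the fitted value
            "cooks_d": e * e * h / (k * s2 * (1 - h) ** 2),
        })
    return rows


rows = influence_measures(X_VALUES, Y_VALUES)

# Brute-force check for observation 2, the point (13, 12.74).
full = ols(X_VALUES, Y_VALUES)
loo = ols(X_VALUES[:2] + X_VALUES[3:], Y_VALUES[:2] + Y_VALUES[3:])
print("closed form:", rows[2]["dfbeta"])
print("refit:      ", (full[0] - loo[0], full[1] - loo[1]))
print("largest Cook's D at index",
      max(range(len(rows)), key=lambda i: rows[i]["cooks_d"]))
```

The closed form avoids refitting the model n times: one fit supplies the residuals and leverages from which every deletion statistic follows.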

Outliers, leverage and influence

An outlier may be defined as a data point that differs significantly from other observations.[6][7] A high-leverage point is an observation made at extreme values of the independent variables.[8] Both types of atypical observations will force the regression line to be close to the point.[2] In Anscombe's quartet, the bottom right image has a point with high leverage and the bottom left image has an outlying point.
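The distinction can be made concrete by computing leverages directly. The sketch below, in plain Python, assumes a line with an intercept, for which the leverage reduces to $h_i = 1/n + (x_i - \bar{x})^2 / S_{xx}$. The x values of the fourth dataset are the standard published ones and are not listed in this article; the function name `leverages` is illustrative.

```python
def leverages(xs):
    """Leverage h_i = 1/n + (x_i - mean)^2 / Sxx for a line with intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1 / n + (x - mean_x) ** 2 / sxx for x in xs]


# x values of Anscombe's third and fourth datasets (standard published values;
# the fourth set is an assumption in the sense that it is not given above).
x_third = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x_fourth = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

h_third = leverages(x_third)
h_fourth = leverages(x_fourth)
print("third set, leverage of the outlying point (x = 13): %.3f" % h_third[2])
print("fourth set, leverage of the point at x = 19:        %.3f" % h_fourth[7])
```

In the fourth dataset the lone observation at x = 19 has leverage 1 (up to rounding), so the fitted line is forced through it. In the third dataset the outlying observation at x = 13 has only moderate leverage (about 0.24); it is unusual in its response value rather than in its x value.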

See also

  • Influence function (statistics)
  • Outlier
  • Leverage
    • Partial leverage
  • Regression analysis
  • Cook's distance § Detecting highly influential observations
  • Anomaly detection

References

  1. ^ Burt, James E.; Barber, Gerald M.; Rigby, David L. (2009), Elementary Statistics for Geographers, Guilford Press, p. 513, ISBN 9781572304840.
  2. ^ a b c Everitt, Brian (1998). The Cambridge Dictionary of Statistics. Cambridge, UK; New York: Cambridge University Press. ISBN 0-521-59346-8.
  3. ^ Winner, Larry (March 25, 2002). "Influence Statistics, Outliers, and Collinearity Diagnostics".
  4. ^ Belsley, David A.; Kuh, Edwin; Welsch, Roy E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley Series in Probability and Mathematical Statistics. New York: John Wiley & Sons. pp. 11–16. ISBN 0-471-05856-4.
  5. ^ "Outliers and DFBETA" (PDF). Archived (PDF) from the original on May 11, 2013.
  6. ^ Grubbs, F. E. (February 1969). "Procedures for detecting outlying observations in samples". Technometrics. 11 (1): 1–21. doi:10.1080/00401706.1969.10490657. An outlying observation, or "outlier," is one that appears to deviate markedly from other members of the sample in which it occurs.
  7. ^ Maddala, G. S. (1992). "Outliers". Introduction to Econometrics (2nd ed.). New York: MacMillan. p. 89. ISBN 978-0-02-374545-4. An outlier is an observation that is far removed from the rest of the observations.
  8. ^ Everitt, B. S. (2002). Cambridge Dictionary of Statistics. Cambridge University Press. ISBN 0-521-81099-X.

Further reading

  • Dehon, Catherine; Gassner, Marjorie; Verardi, Vincenzo (2009). "Beware of 'Good' Outliers and Overoptimistic Conclusions". Oxford Bulletin of Economics and Statistics. 71 (3): 437–452. doi:10.1111/j.1468-0084.2009.00543.x.
  • Kennedy, Peter (2003). "Robust Estimation". A Guide to Econometrics (Fifth ed.). Cambridge: The MIT Press. pp. 372–388. ISBN 0-262-61183-X.

Source: https://en.wikipedia.org/wiki/Influential_observation

Posted by: rossarishe.blogspot.com
