{"id":290,"date":"2012-02-13T20:22:53","date_gmt":"2012-02-13T20:22:53","guid":{"rendered":"http:\/\/rulesofreason.wordpress.com\/?p=290"},"modified":"2012-02-13T20:22:53","modified_gmt":"2012-02-13T20:22:53","slug":"weighted-variance-and-weighted-coefficient-of-variation","status":"publish","type":"post","link":"http:\/\/re-design.dimiter.eu\/?p=290","title":{"rendered":"Weighted variance and weighted coefficient of variation"},"content":{"rendered":"<p>Often we want to compare the <em>variability<\/em>\u00a0of a variable in different contexts &#8211; say, the variability of unemployment in different countries over time, or the variability of height in two populations, etc. The most commonly used measures of variability are the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Variance\" target=\"_blank\">variance<\/a>\u00a0and the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Standard_deviation\" target=\"_blank\">standard deviation<\/a> (which is just the square root of the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Variance\" target=\"_blank\">variance<\/a>). However, for some types of data, these measures are not entirely appropriate. For example, when data is generated by a <a href=\"http:\/\/en.wikipedia.org\/wiki\/Poisson_distribution\" target=\"_blank\">Poisson process<\/a> (e.g. when you have counts of rare events), the mean equals the variance <strong>by definition<\/strong>.\u00a0Clearly, comparing the variability\u00a0of two\u00a0Poisson distributions\u00a0using the <em>variance<\/em> or the <em>standard deviation<\/em> would not work if the means of these populations differ. A common and easy fix is to use the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Coefficient_of_variation\" target=\"_blank\">coefficient of variation<\/a> instead, which is simply the standard deviation divided by the mean. So far, so good.<\/p>\n<p>Things get tricky, however, when we want to calculate the <em>weighted coefficient of variation<\/em>. 
The <em>weighted mean<\/em> is\u00a0just the mean but <a href=\"http:\/\/en.wikipedia.org\/wiki\/Weighted_mean\" target=\"_blank\">some data points contribute more than others<\/a>. For example, the mean of 0.4 and 0.8 is 0.6. If we assign the weights 0.9 to the first observation [0.4] and 0.1 to the second [0.8], the weighted mean is (0.9*0.4+0.1*0.8)\/1, which equals 0.44. You would guess that we can compute the weighted variance by analogy, and you would be wrong.<\/p>\n<p>For example, the sample variance of {0.4,0.8} is given by [Wikipedia]:<a href=\"https:\/\/i1.wp.com\/re-design.dimiter.eu\/wp-content\/uploads\/2012\/02\/sample-variance.png\"><img data-attachment-id=\"298\" data-permalink=\"http:\/\/re-design.dimiter.eu\/?attachment_id=298\" data-orig-file=\"https:\/\/i1.wp.com\/re-design.dimiter.eu\/wp-content\/uploads\/2012\/02\/sample-variance.png?fit=228%2C122\" data-orig-size=\"228,122\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"sample variance\" data-image-description=\"\" data-medium-file=\"https:\/\/i1.wp.com\/re-design.dimiter.eu\/wp-content\/uploads\/2012\/02\/sample-variance.png?fit=228%2C122\" data-large-file=\"https:\/\/i1.wp.com\/re-design.dimiter.eu\/wp-content\/uploads\/2012\/02\/sample-variance.png?fit=228%2C122\" loading=\"lazy\" class=\"alignnone size-full wp-image-298\" title=\"sample variance\" src=\"https:\/\/i1.wp.com\/re-design.dimiter.eu\/wp-content\/uploads\/2012\/02\/sample-variance.png?resize=228%2C122\" alt=\"\" width=\"228\" height=\"122\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p>or in our example 
<strong>(<\/strong>(0.4-0.6)^2+(0.8-0.6)^2<strong>)<\/strong> <strong>\/<\/strong> <strong>(<\/strong>2-1<strong>)<\/strong> which equals 0.08. But the <em>weighted sample variance<\/em> <strong>cannot<\/strong> be computed by simply adding the weights to the above formula <strong>(0.9*<\/strong>(0.4-0.6)^2+<strong>0.1*<\/strong>(0.8-0.6)^2<strong>)<\/strong> <strong>\/<\/strong> <strong>(<\/strong>2-1<strong>). <\/strong>The formula for the weighted variance is different [<a href=\"http:\/\/en.wikipedia.org\/wiki\/Weighted_mean\" target=\"_blank\">Wikipedia<\/a>]:<br \/>\n<a href=\"https:\/\/i1.wp.com\/re-design.dimiter.eu\/wp-content\/uploads\/2012\/02\/weighted-sample-variance1.png\"><img data-attachment-id=\"300\" data-permalink=\"http:\/\/re-design.dimiter.eu\/?attachment_id=300\" data-orig-file=\"https:\/\/i1.wp.com\/re-design.dimiter.eu\/wp-content\/uploads\/2012\/02\/weighted-sample-variance1.png?fit=265%2C51\" data-orig-size=\"265,51\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"weighted sample variance\" data-image-description=\"\" data-medium-file=\"https:\/\/i1.wp.com\/re-design.dimiter.eu\/wp-content\/uploads\/2012\/02\/weighted-sample-variance1.png?fit=265%2C51\" data-large-file=\"https:\/\/i1.wp.com\/re-design.dimiter.eu\/wp-content\/uploads\/2012\/02\/weighted-sample-variance1.png?fit=265%2C51\" loading=\"lazy\" class=\"alignnone size-full wp-image-300\" title=\"weighted sample variance\" src=\"https:\/\/i1.wp.com\/re-design.dimiter.eu\/wp-content\/uploads\/2012\/02\/weighted-sample-variance1.png?resize=265%2C51\" alt=\"\" width=\"265\" 
height=\"51\" data-recalc-dims=\"1\" \/><\/a><br \/>\nwhere V1 is the sum of the weights and V2 is the sum of squared weights:<img loading=\"lazy\" title=\"v2\" src=\"https:\/\/i0.wp.com\/re-design.dimiter.eu\/wp-content\/uploads\/2012\/02\/v21.png?resize=98%2C48\" alt=\"\" width=\"98\" height=\"48\" data-recalc-dims=\"1\" \/>.<br \/>\nThe next steps are straightforward: the <em>weighted standard deviation<\/em> is the square root of the above, and the <em>weighted coefficient of variation<\/em> is the weighted standard deviation divided by the weighted mean.<\/p>\n<p>Although there is nothing new here, I thought it&#8217;s a good idea to put it together because it appears to be causing some confusion. For example, in the latest issue of <a href=\"http:\/\/eup.sagepub.com\/\" target=\"_blank\">European Union Politics<\/a> you can find the <a href=\"http:\/\/eup.sagepub.com\/content\/13\/1\/70.abstract\" target=\"_blank\">article<\/a> &#8216;Measuring common standards and equal responsibility-sharing in EU asylum outcome data&#8217; by a team of scientists from LSE. On page 74, you can read that:<\/p>\n<blockquote><p>The weighted variance <em>[of the set p={0.38, 0.42} with weights W={0.50,0.50}] <\/em>equals 0.5(0.38-0.40)^2+0.5(0.42-0.40)^2 = 0.0004.<\/p><\/blockquote>\n<p>As explained above, this is not generally correct unless the biased (population) rather than the unbiased (sample) weighted variance is meant. When calculated properly, the weighted variance turns out to be 0.0008. 
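These calculations are easy to check numerically. Below is a minimal sketch in Python (not the R function linked in this post; the names weighted_mean, weighted_var, and weighted_cv are my own), implementing the unbiased weighted sample variance with V1 the sum of the weights and V2 the sum of the squared weights:

```python
import math

def weighted_mean(x, w):
    # Weighted mean: sum(w_i * x_i) / V1, with V1 = sum of the weights.
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

def weighted_var(x, w):
    # Unbiased weighted sample variance:
    #   V1 / (V1^2 - V2) * sum(w_i * (x_i - weighted mean)^2),
    # where V1 = sum of the weights and V2 = sum of the squared weights.
    v1 = sum(w)
    v2 = sum(wi ** 2 for wi in w)
    mu = weighted_mean(x, w)
    return v1 / (v1 ** 2 - v2) * sum(wi * (xi - mu) ** 2 for wi, xi in zip(w, x))

def weighted_cv(x, w):
    # Weighted coefficient of variation: weighted SD divided by weighted mean.
    return math.sqrt(weighted_var(x, w)) / weighted_mean(x, w)

# The EUP example: p = {0.38, 0.42} with weights W = {0.50, 0.50}.
print(weighted_var([0.38, 0.42], [0.5, 0.5]))  # approximately 0.0008, not 0.0004
```

Note that with unit weights the formula reduces to the ordinary unbiased sample variance, e.g. weighted_var([0.4, 0.8], [1, 1]) gives 0.08.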
<a href=\"https:\/\/stat.ethz.ch\/pipermail\/r-help\/2008-July\/168762.html\" target=\"_blank\">Here<\/a> you can find the function Gavin Simpson has provided for calculating the weighted variance in R, and you can try it for yourself.<\/p>\n<p><em>P.S.<\/em> To be clear, the weighted variance issue is not central to the argument of the article cited above, but it is significant, as the authors discuss at length the methodology for estimating variability in data and introduce the so-called Coffey-Feingold-Broomberg measure of variability, which they deem more appropriate for proportions.<\/p>\n<p><em>P.P.S.<\/em> On the internet, there is yet more confusion: for example, <a href=\"http:\/\/www.itl.nist.gov\/div898\/software\/dataplot\/refman2\/ch2\/weightsd.pdf\" target=\"_blank\">this document<\/a> (which pops high in the Google results) has yet another formula, shown in a slightly different form <a href=\"http:\/\/stats.stackexchange.com\/questions\/6534\/how-do-i-calculate-a-weighted-standard-deviation-in-excel\" target=\"_blank\">here<\/a> as well.<\/p>\n<p><em>Disclaimer.<\/em> I have a forthcoming <a href=\"http:\/\/www.dimiter.eu\/articles\/europeanization%20of%20asylum.pdf\" target=\"_blank\">paper<\/a> on the same topic (asylum policy) as the EUP article mentioned above.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Often we want to compare the variability\u00a0of a variable in different contexts &#8211; say, the variability of unemployment in different countries over time, or the variability of height in two populations, etc. The most commonly used measures of variability are the variance\u00a0and the standard deviation (which is just the square root of the variance). However, for some types of data, these measures are not entirely appropriate. For example, when data is generated by a Poisson process (e.g. 
when you have counts of rare events), the mean equals the variance by definition.\u00a0Clearly, comparing the variability\u00a0of two\u00a0Poisson distributions\u00a0using the variance or the standard deviation would not work if the means of these populations differ. A common and easy fix is to use the coefficient of variation instead, which is simply the standard deviation divided by the mean. So far, so good. Things get tricky, however, when we want to calculate the weighted coefficient of variation. The weighted mean is\u00a0just the mean but some data points contribute more than others. For example, the mean of 0.4 and 0.8 is 0.6. If we assign the weights 0.9 to the first observation [0.4] and 0.1 to the second [0.8], the weighted mean is (0.9*0.4+0.1*0.8)\/1, which equals 0.44. You would guess that we can compute the weighted variance by analogy, and you would be wrong. For example, the sample variance of {0.4,0.8} is given by [Wikipedia]: or in our example ((0.4-0.6)^2+(0.8-0.6)^2) \/ (2-1) which equals 0.08. 
But, the weighted sample variance cannot be computed by&#8230;<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"http:\/\/re-design.dimiter.eu\/?p=290\">Continue reading<span class=\"screen-reader-text\">Weighted variance and weighted coefficient of variation<\/span><\/a><\/div>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false},"categories":[26],"tags":[83,135,231,405,676,688,689,690],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p7g3hj-4G","jetpack-related-posts":[{"id":1013,"url":"http:\/\/re-design.dimiter.eu\/?p=1013","url_meta":{"origin":290,"position":0},"title":"The political geography of human development","date":"November 12, 2018","format":false,"excerpt":"The research I did for the previous post on the inadequacy of the widely-used term 'Global South' led me to some surprising results about the political geography of development. Although the relationship between latitude and human development is not linear, distance from the equator turned out to have a rather\u2026","rel":"","context":"In &quot;Data visualization&quot;","img":{"alt_text":"","src":"https:\/\/i2.wp.com\/re-design.dimiter.eu\/wp-content\/uploads\/2018\/11\/f3_hdi_eq.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":354,"url":"http:\/\/re-design.dimiter.eu\/?p=354","url_meta":{"origin":290,"position":1},"title":"Compiling government positions from the Manifesto Project data with R","date":"March 12, 2012","format":false,"excerpt":"****N.B. I have updated the functions in February 2019 to makes use of the latest Manifesto data. 
See for details here.*** The Manifesto Project (former Manifesto Research Group, Comparative Manifestos Project) has assembled a database of 'quantitative content analyses of parties\u2019 election programs from more than 50 countries covering all\u2026","rel":"","context":"In &quot;Policy making&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":336,"url":"http:\/\/re-design.dimiter.eu\/?p=336","url_meta":{"origin":290,"position":2},"title":"Explanation and the quest for 'significant' relationships. Part II","date":"February 22, 2012","format":false,"excerpt":"In Part I I argue that the search and discovery of statistically significant relationships does not amount to explanation and is often misplaced in the social sciences because the variables which are purported to have\u00a0effects\u00a0on the outcome cannot be manipulated. Just to make sure that my message is not misinterpreted\u2026","rel":"","context":"In &quot;Causality&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":227,"url":"http:\/\/re-design.dimiter.eu\/?p=227","url_meta":{"origin":290,"position":3},"title":"Is this a common (ecological) fallacy?","date":"December 20, 2011","format":false,"excerpt":"You have data on two levels\u00a0(individuals and countries) for\u00a0an outcome variable (e.g. 'trust') and a predictor (e.g. 'wealth'). Suppose that the\u00a0pooled and within-country individual-level correlations between the two variables\u00a0are strongly positive\u00a0but the between-country (country-level) correlation is zero. 
You build a regression with individual-level 'trust' as the dependent variable and individual-level\u2026","rel":"","context":"In &quot;Multi-level models&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":260,"url":"http:\/\/re-design.dimiter.eu\/?p=260","url_meta":{"origin":290,"position":4},"title":"When 'just looking' beats regression","date":"January 30, 2012","format":false,"excerpt":"In a draft paper\u00a0currently under review I argue that the institutionalization of\u00a0a common EU\u00a0asylum policy has not led to a race to the bottom with respect to asylum applications, refugee status grants, and some other indicators. The graph below traces the number of asylum applications lodged in 29 European countries\u2026","rel":"","context":"In &quot;Time series analysis&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/re-design.dimiter.eu\/wp-content\/uploads\/2012\/01\/asylumapplications5.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":526,"url":"http:\/\/re-design.dimiter.eu\/?p=526","url_meta":{"origin":290,"position":5},"title":"Correlation does not imply causation. Then what does it imply?","date":"October 9, 2012","format":false,"excerpt":"'Correlation does not imply causation' is an adage students\u00a0from all social sciences are made to recite from a very\u00a0early age. What is less often systematically discussed is what\u00a0could be actually going on so that two\u00a0phenomena are correlated but not\u00a0causally related. 
Let's try to make a list: 1) The correlation might\u2026","rel":"","context":"In &quot;Causality&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"http:\/\/re-design.dimiter.eu\/index.php?rest_route=\/wp\/v2\/posts\/290"}],"collection":[{"href":"http:\/\/re-design.dimiter.eu\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/re-design.dimiter.eu\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/re-design.dimiter.eu\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/re-design.dimiter.eu\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=290"}],"version-history":[{"count":0,"href":"http:\/\/re-design.dimiter.eu\/index.php?rest_route=\/wp\/v2\/posts\/290\/revisions"}],"wp:attachment":[{"href":"http:\/\/re-design.dimiter.eu\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=290"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/re-design.dimiter.eu\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=290"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/re-design.dimiter.eu\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=290"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}