<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Noel Schutt &#187; statistics</title>
	<atom:link href="http://schutt.org/blog/category/statistics/feed/" rel="self" type="application/rss+xml" />
	<link>http://schutt.org/blog</link>
	<description></description>
	<lastBuildDate>Fri, 03 Feb 2012 15:07:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>AM Statistics</title>
		<link>http://schutt.org/blog/2011/09/am-statistics/</link>
		<comments>http://schutt.org/blog/2011/09/am-statistics/#comments</comments>
		<pubDate>Sat, 24 Sep 2011 12:15:48 +0000</pubDate>
		<dc:creator>Noel</dc:creator>
				<category><![CDATA[statistics]]></category>
		<category><![CDATA[AM radio]]></category>
		<category><![CDATA[car game]]></category>
		<category><![CDATA[fallacies]]></category>
		<category><![CDATA[logic]]></category>
		<category><![CDATA[road trip game]]></category>
		<category><![CDATA[talk show]]></category>

		<guid isPermaLink="false">http://schutt.org/blog/?p=1497</guid>
		<description><![CDATA[While sitting in the car waiting to give my sister a ride back from a volleyball game, I came up with a new road trip game. There was nothing interesting on the radio stations I usually listen to, so I switched to AM and started scanning. Unsurprisingly, there wasn&#8217;t anything interesting on. I heard a [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://schutt.org/blog/wp-content/uploads/2011/09/am.png" alt="" title="Amplitude Modulated signal" width="250" height="150" class="alignright size-full wp-image-1499" /></p>
<p>While sitting in the car waiting to give my sister a ride back from a volleyball game, I came up with a new road trip game. There was nothing interesting on the radio stations I usually listen to, so I switched to AM and started scanning. Unsurprisingly, there wasn&#8217;t anything interesting on. I heard a few questionable Bible lessons, some ballgames, the usual political shows, a French broadcast I could almost understand, bad music, and lots of advertisements. I decided to listen to one of the political shows for a while. After a few minutes I got tired of the false &#8216;facts&#8217; and incoherent reasoning, and switched to another talk station. It didn&#8217;t take long to hear misleading statistics, so I tried again. I made it around the dial three times before the team made it out of the locker room.</p>
<p>This gave me the idea for a new car game: <strong>AM Statistics</strong> or <strong>Check your arithmetic</strong>.</p>
<h2 id="the-rules">The rules</h2>
<ol>
<li>Pick an AM talk radio station</li>
<li>Listen until you hear an obvious mathematical or statistical error</li>
<li>Tune to the next talk radio station</li>
<li>Repeat</li>
</ol>
<p>If you are good, you won&#8217;t have to stay on any one station for long. Try to be the first one to spot an error; or, to play alone, see which program requires the fastest station switching.</p>
<h3 id="example-errors">Example errors</h3>
<ul>
<li>You&#8217;ll constantly notice <em>missing numbers</em>: a partial statistic is cited to support a conclusion, but the conclusion would have to change if one more sentence of clarification and detail were added.</li>
<li>You&#8217;ll hear the mistake of <em>semiattachment</em>. This means using numbers that appear to be related to the claim and to be good measures of it, but aren&#8217;t. Knowing or noticing the proper definitions helps in finding this error, but it&#8217;s often obvious even from the detail included.</li>
</ul>
<p>If you have read <a href="/writing/#huff1954"><em>How to Lie With Statistics</em></a>, you&#8217;ll have a hard time not finding these errors. These are usually pretty easy to spot even without much subject familiarity. If you know the subject or statistics, there will be plenty more.</p>
<h2 id="expanded-rules">Expanded rules</h2>
<p>To make the game go even faster, you can expand the rules beyond errors in math and stats to include logical errors. You&#8217;ll probably have to keep your hand on the dial if you are playing <strong>Find the Fallacy</strong>.</p>
<h3 id="example-errors-1">Example errors</h3>
<ul>
<li>It isn&#8217;t hard to find <em>self-contradictory statements</em>. If you make it past a commercial break, you&#8217;ll likely hear the argument from the previous segment applied to a different topic to reach the opposite conclusion.</li>
<li>If you are paying attention, <em>equivocation</em> will constantly jump out at you. This is when a term is used to mean different things in the same argument. You&#8217;ll often hear equivocation used in a single sentence.</li>
<li>Even if you aren&#8217;t good at finding the other errors, it&#8217;s hard to miss the constant <em>ad hominem</em> attacks.</li>
</ul>
<p>It shouldn&#8217;t take too long to make it around the dial. You could even expand the rules to include any factual errors. If you have passed a history or science class, this will likely be too easy. This game works even better if you have the misfortune of watching cable news.</p>
<hr />
<p>Paul Krugman provides a <a href="http://www.nytimes.com/2011/09/23/opinion/krugman-the-social-contract.html">good example</a> of using misleading statistics in his latest column.</p>
]]></content:encoded>
			<wfw:commentRss>http://schutt.org/blog/2011/09/am-statistics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>National Drivers Test</title>
		<link>http://schutt.org/blog/2010/09/national-drivers-test/</link>
		<comments>http://schutt.org/blog/2010/09/national-drivers-test/#comments</comments>
		<pubDate>Tue, 07 Sep 2010 01:57:55 +0000</pubDate>
		<dc:creator>Noel</dc:creator>
				<category><![CDATA[cars]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[dangerous drivers]]></category>
		<category><![CDATA[driver's education]]></category>
		<category><![CDATA[driver's license]]></category>
		<category><![CDATA[driving]]></category>

		<guid isPermaLink="false">http://schutt.org/blog/?p=929</guid>
		<description><![CDATA[To continue on my latest post&#8230; GMAC Insurance released their 2010 National Drivers Test results. The findings are interesting: If taken today, 18.4 percent of drivers on the road – amounting to roughly 38 million licensed Americans – would not pass a written drivers test exam. The national average score was 76.2 percent; a score [...]]]></description>
			<content:encoded><![CDATA[<p>To continue on my <a href="http://schutt.org/blog/2010/09/americas-best-drivers/">latest post</a>&#8230;</p>
<p>GMAC Insurance released their <a href="http://www.nationaldriverstest.com/national-drivers-test/research-executive-summary.php">2010 National Drivers Test results</a>. The findings are interesting:</p>
<blockquote><ul>
<li>If taken today, 18.4 percent of drivers on the road – amounting to roughly 38 million licensed Americans – would not pass a written drivers test exam.
</li>
<li>The national average score was 76.2 percent; a score below 70 percent is considered failing.
</li>
<li>Average test scores in 2010 continue to show a slight trending downward, from 76.6 percent in 2009 to 76.2 percent this year and a drop of almost 2 percent from the national average in 2008 (78.1 percent).
</li>
<li>With Age Comes Wisdom: The older the driver, the higher the test score. Males over 45 earned the highest average score.
</li>
<li>Factoring in margin for error, the average test score was significantly higher among males than females (78.1 percent male versus 74.4 percent female). Females also had a higher failure rate than males (24 percent female versus 18.1 percent male).
</li>
</ul>
</blockquote>
<p>I&#8217;m not particularly surprised at the results. It can be hard to remember which of several similar answers is correct, which partially accounts for the low score; but this isn&#8217;t an excuse on something this important. The way I see it, the results are evidence that nearly one fifth of the licensed drivers in the country should have their licenses suspended until they re-pass their licensing requirements. Even this is optimistic. A survey like this can&#8217;t distinguish between book knowledge and its application. How many people can pass the test but never follow the rules afterward? This could be examined by including both the scores on the driver&#8217;s written tests and state accident data in the analysis.</p>
<p>Even in the wildly optimistic case that all drivers are able to re-pass the driver&#8217;s licensing tests after driving for a few years, it still wouldn&#8217;t be enough to ensure safe roads. The driver&#8217;s tests currently cover only the absolute minimum knowledge and skill required to drive a car on public roads; safe driving requires much more skill and awareness. A good step toward this level of driving is to take the <a href="http://www.abateofindiana.org/education/general_info.html" title="American Bikers Aimed Towards Education">ABATE</a> or <a href="http://www.msf-usa.org/" title="Motorcycle Safety Foundation">MSF</a> motorcycle safety course. I think that passing one of these courses should be a prerequisite to applying for a car learner&#8217;s permit. This would make the roads much safer.</p>
<p>The <a href="http://www.nationaldriverstest.com/national-drivers-test/ndt-test.php">GMAC test is online</a> so you can try it yourself. Then you should read my page on <a href="http://schutt.org/velo/driving/">safe driving around bicycles</a>.</p>
<hr />
<p>Source: <a href="http://www.nationaldriverstest.com/">2010 GMAC Insurance National Drivers Test</a> via <a href="http://autos.aol.com/article/study-driving-test/">AOL Autos</a> via <a href="http://content.usatoday.com/communities/driveon/post/2010/09/one-in-five-motorists-now-would-fail-dmv-driving-test-/1">USA Today</a>. Note that this was an online survey.</p>
]]></content:encoded>
			<wfw:commentRss>http://schutt.org/blog/2010/09/national-drivers-test/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>America&#8217;s Best Drivers</title>
		<link>http://schutt.org/blog/2010/09/americas-best-drivers/</link>
		<comments>http://schutt.org/blog/2010/09/americas-best-drivers/#comments</comments>
		<pubDate>Sat, 04 Sep 2010 14:19:31 +0000</pubDate>
		<dc:creator>Noel</dc:creator>
				<category><![CDATA[cars]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[drivers]]></category>
		<category><![CDATA[Fort Wayne]]></category>

		<guid isPermaLink="false">http://schutt.org/blog/?p=925</guid>
		<description><![CDATA[This is surprising: according to the latest &#8216;Allstate America&#8217;s Best Drivers Report,&#8217; Fort Wayne ranks eleventh for longest time between accidents among drivers in the two hundred largest cities in the country. Even more surprising, 11th is down from 6th last year. I never would have guessed, but apparently drivers in Fort Wayne are involved [...]]]></description>
			<content:encoded><![CDATA[<p>This is surprising: according to the latest &lsquo;<a href="http://www.allstatenewsroom.com/releases/4654-sixth-annual-allstate-america">Allstate America&#8217;s Best Drivers Report</a>,&rsquo; Fort Wayne ranks eleventh for longest time between accidents among drivers in the two hundred largest cities in the country. Even more surprising, 11<sup>th</sup> is down from 6<sup>th</sup> last year.</p>
<p>I never would have guessed, but apparently drivers in Fort Wayne are involved in accidents 16.3% less often than the average driver in the 200 most populous cities in the USA. Just think how much safer we could be if Indiana had slightly less easy licensing tests.</p>
<hr />
<p>Study found through &lsquo;<a href="http://www.journalgazette.net/article/20100903/LOCAL/309039971/1002/LOCAL">For safe driving, city ranks No. 11</a>&rsquo; in the Journal Gazette.</p>
]]></content:encoded>
			<wfw:commentRss>http://schutt.org/blog/2010/09/americas-best-drivers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Rare?</title>
		<link>http://schutt.org/blog/2010/03/abortion/</link>
		<comments>http://schutt.org/blog/2010/03/abortion/#comments</comments>
		<pubDate>Wed, 31 Mar 2010 11:06:32 +0000</pubDate>
		<dc:creator>Noel</dc:creator>
				<category><![CDATA[politics]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[abortion]]></category>

		<guid isPermaLink="false">http://schutt.org/blog/?p=668</guid>
		<description><![CDATA[If you&#8217;ve paid attention to national political news at any point in the last 18 years, I&#8217;m sure you&#8217;ve heard variations on the saying &#8216;safe and legal, but rare,&#8217; when discussing abortion. Beyond the fact that a procedure where fewer than half of the patients survive can never be considered &#8216;safe,&#8217; how is &#8216;rare&#8217; defined? [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;ve paid attention to national political news at any point in the last 18 years, I&#8217;m sure you&#8217;ve heard variations on the saying &lsquo;<a href="http://www.presidency.ucsb.edu/ws/index.php?pid=46219&#038;st=safe%20and%20legal,%20but%20rare">safe and legal, but rare</a>,&rsquo; when discussing abortion. Beyond the fact that a procedure where fewer than half of the patients survive can never be considered &lsquo;safe,&rsquo; how is &lsquo;rare&rsquo; defined? From the folks who use the &lsquo;<a href="http://www.presidency.ucsb.edu/ws/index.php?pid=47104&#038;st=legal,%20safe,%20and%20rare">legal, safe, and rare</a>&rsquo; statement, you&#8217;d get the impression that abortion is already fairly infrequent, and that they&#8217;d just like to reduce the numbers a little further. I figured they understated the frequency, but I never examined the data myself. Then I saw a t-shirt with the statement &lsquo;1/4 of my generation is missing&rsquo;:</p>
<p><a href="http://schutt.org/blog/wp-content/uploads/2010/03/quarter-missing-tshirt.jpg"><img src="http://schutt.org/blog/wp-content/uploads/2010/03/quarter-missing-tshirt.jpg" alt="" title="1/4 of my generation is missing tshirt" width="320" height="117" class="alignnone size-full wp-image-688" /></a></p>
<p>This is much higher than I expected, so I decided to check the numbers myself. I found the data on the CDC website and plotted it along with moving averages to define &lsquo;generation&rsquo;:</p>
<p><a href="http://schutt.org/blog/wp-content/uploads/2010/03/abor-frac_1970-2005.png"><img src="http://schutt.org/blog/wp-content/uploads/2010/03/abor-frac_1970-2005.png" alt="" title="USA Abortion Fraction 1970-2005" width="340" height="378" class="alignnone size-full wp-image-670" /></a></p>
<p>Huh. That&#8217;s much higher than I expected. I&#8217;d figured it would be around a tenth that rate, and at most up to half that rate. These higher than expected numbers mean that one-quarter <em>is</em> a reasonable estimate. Depending on how &lsquo;my generation&rsquo; is defined, the one-quarter figure may be a little high, but it is at minimum one-fifth. Under any reasonable definition, there is no way I&#8217;d consider 1/5 to be &lsquo;rare.&rsquo; It&#8217;s a stretch to call one-tenth rare. I&#8217;d hesitate to call 1/36&#8212;the <a href="http://mathworld.wolfram.com/Dice.html">probability of rolling snake eyes</a>&#8212;rare. I&#8217;d consider a general definition of rare to start around 1/500, approximately the <a href="http://mathworld.wolfram.com/Poker.html">probability of drawing a flush</a> in poker. For diseases, the NIH uses a <a href="http://rarediseases.info.nih.gov/RareDiseaseList.aspx">definition</a> somewhere around 1/1500. Even if the abortion rate in the USA is lowered to one sixth the current level, it still wouldn&#8217;t fit in the loosest of these definitions of rare. This means that, even for a generous definition, we have a long way to go before abortion could be considered &lsquo;rare.&rsquo; It&#8217;s worth noting that the pro-abortion politicians mostly stopped using the &lsquo;rare&rsquo; statement a couple years ago, showing that they were probably never serious about it.</p>
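<p>The two reference probabilities above can be checked with a few lines of Python. This is a minimal sketch and not part of the original analysis. It assumes two fair six-sided dice and a standard 52-card deck, and it counts five-card flushes that are not straight flushes, which is an assumption about how the linked figure is defined.</p>

```python
from fractions import Fraction
from math import comb

# Snake eyes: both fair dice show a one.
snake_eyes = Fraction(1, 6) ** 2

# Flush: five cards of one suit, excluding the 10 straight flushes per suit
# (assumed definition; including them changes the result only slightly).
flush_hands = 4 * (comb(13, 5) - 10)
flush = Fraction(flush_hands, comb(52, 5))

print("snake eyes:", snake_eyes)
print("flush: about 1 in", round(1 / flush))
```

<p>Under these assumptions the flush works out to a little less than 1/500, consistent with the approximation used above.</p>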
<hr />
<p>Note: Data are from the US Centers for Disease Control and Prevention. The <a href="http://www.cdc.gov/mmwr/">CDC MMWR</a> abortion reports are released at the end of November with a three year lag. &lsquo;Fetal loss&rsquo; other than induced abortion is excluded from the data I used in the plots. The numbers include both surgical and medical (non-surgical) abortions. The data are given as the abortion rate per 1000 live births; I converted it to a fraction to clarify the plots. The full report is worth a look. Since it is split between two tables in the report, here is a plot of when during a pregnancy abortions are performed:</p>
<p><a href="http://schutt.org/blog/wp-content/uploads/2010/03/gestation.png"><img src="http://schutt.org/blog/wp-content/uploads/2010/03/gestation.png" alt="" title="USA Percent of abortions by weeks of gestation" width="340" height="318" class="alignnone size-full wp-image-669" /></a></p>
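<p>The conversion from a rate per 1000 live births to a fraction, mentioned in the note above, could look something like the following. This is only a sketch: the post does not say which denominator was used, so the function below assumes the fraction is taken out of abortions plus live births. The function name and the input value are hypothetical and are not CDC data.</p>

```python
def rate_to_fraction(per_1000_live_births):
    """Convert a count per 1000 live births into a fraction of
    (abortions + live births). The denominator is an assumption."""
    return per_1000_live_births / (1000 + per_1000_live_births)

# Hypothetical input chosen for round numbers, not a value from the report.
print(rate_to_fraction(250))
```

<p>With this choice of denominator, a rate of 250 per 1000 live births corresponds to one-fifth.</p>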
<p>Sources:</p>
<ul>
<li>CDC Morbidity and Mortality Weekly Report, Surveillance Summaries, <a href="http://www.cdc.gov/mmwr/PDF/ss/ss5713.pdf">Abortion Surveillance — United States, 2005</a>, November 28, 2008 / Vol. 57 / No. SS-13</li>
<li>CDC Morbidity and Mortality Weekly Report, Surveillance Summaries, <a href="http://www.cdc.gov/mmwr/PDF/ss/ss5808.pdf">Abortion Surveillance — United States, 2006</a>, November 27, 2009 / Vol. 58 / No. SS-8</li>
<li><a href="http://www.democrats.org/pdfs/2004platform.pdf">The 2004 Democratic National Platform for America</a>. Includes &lsquo;Abortion should be safe, legal, and rare.&rsquo;</li>
<li><a href="http://www.democrats.org/page/-/pdf/dem-platform.pdf">The 2008 Democratic National Platform</a>. Uses &lsquo;safe and legal abortion, regardless of ability to pay.&rsquo;</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://schutt.org/blog/2010/03/abortion/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Term Limits</title>
		<link>http://schutt.org/blog/2010/03/term-limits/</link>
		<comments>http://schutt.org/blog/2010/03/term-limits/#comments</comments>
		<pubDate>Wed, 24 Mar 2010 00:14:29 +0000</pubDate>
		<dc:creator>Noel</dc:creator>
				<category><![CDATA[politics]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[term limits]]></category>

		<guid isPermaLink="false">http://schutt.org/blog/?p=626</guid>
		<description><![CDATA[It&#8217;s amazing how often a little data can overthrow conventional wisdom. Today&#8217;s example is term limits. I had long thought that most elected offices should have strict term limits to solve the problem of the same politicians staying in office as a career, losing touch with their constituents. Thinking about it a bit more a [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s amazing how often a little data can overthrow conventional wisdom. Today&#8217;s example is term limits. I had long thought that most elected offices should have strict term limits to solve the problem of the same politicians staying in office as a career, losing touch with their constituents. Thinking about it a bit more a couple years ago convinced me that we shouldn&#8217;t have term limits, but should have a maximum number of consecutive terms in a particular office. Some more thinking led me to realize that term limits can create other problems. Through this whole process I thought that congress was mostly full of the same old people who have been in office most of my life. A <a href="http://www.fivethirtyeight.com/2010/03/throw-all-bums-out-bad-idea.html">post</a> on a blog I occasionally read brought the subject back to mind, and this time I checked the data. I found and plotted the time served by all sitting representatives:</p>
<p><a href="http://schutt.org/blog/wp-content/uploads/2010/03/house.png"><img src="http://schutt.org/blog/wp-content/uploads/2010/03/house.png" alt="" title="Number of Representatives by Time in US House" width="289" height="340" class="size-full wp-image-628" /></a></p>
<p>This doesn&#8217;t match the distribution I expected. After looking at the House, I examined the results for the Senate:</p>
<p><a href="http://schutt.org/blog/wp-content/uploads/2010/03/senate.png"><img src="http://schutt.org/blog/wp-content/uploads/2010/03/senate.png" alt="" title="Number of Senators by Years in US Senate" width="304" height="338" class="size-full wp-image-629" /></a></p>
<p>This looks a little broader than the House, but still not quite as biased toward long times in office as I had expected. To be a little more thorough, I calculated basic summary statistics:</p>
<table>
<caption>Years served in US Congress</caption>
<thead>
<tr>
<th></th>
<th>House</th>
<th>Senate</th>
</tr>
</thead>
<tbody>
<tr>
<th align="right">Mean</th>
<td>11.05</td>
<td>12.79</td>
</tr>
<tr>
<th align="right">Median</th>
<td>9.00</td>
<td>11.00</td>
</tr>
<tr>
<th align="right">Standard deviation</th>
<td>9.07</td>
<td>10.90</td>
</tr>
</tbody>
</table>
<p>It turns out that careerism isn&#8217;t nearly as bad as I thought. Of course, the summary stats don&#8217;t quite give the true answer, because they don&#8217;t take into account that this is the time served <em>so far</em> by those currently sitting in congress. What I really want to see is the total time served by all those now in congress, after they have retired or lost an election. Basically, I&#8217;d expect the time served so far to come out slightly lower than that true average.</p>
<p>A <a href="http://www.cato.org/pubs/journal/cj14n3-2.html">little reading</a> shows that the average reelection rate is about 84% (for 1982 to 1994; I don&#8217;t have more recent data). Using this number with some simple statistics gives a theoretical mean time in the House almost identical to the average of those currently serving. However, the measured variance is significantly larger than the expected variance.</p>
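<p>One way to get a theoretical mean from the reelection rate is a simple geometric model. The sketch below is an assumption about what such a calculation might look like, not the calculation actually used for this post. It treats every two-year House term as ending in reelection with a fixed probability of 0.84, and it ignores retirements, deaths, and changes in the rate over time. The function and variable names are made up for illustration.</p>

```python
REELECTION_RATE = 0.84  # average House reelection rate for 1982 to 1994
TERM_YEARS = 2          # length of one House term

def mean_reelections(p):
    """Mean number of successful reelections before the first loss,
    for a geometric model with per-term reelection probability p."""
    return p / (1 - p)

# Years in terms won by reelection (excludes the first term).
reelected_years = TERM_YEARS * mean_reelections(REELECTION_RATE)
# Total years once the first term is counted as well.
total_years = reelected_years + TERM_YEARS

print(round(reelected_years, 2), round(total_years, 2))
```

<p>Under these assumptions the model gives roughly 10.5 years from reelected terms and 12.5 years in total, which brackets the measured House mean of 11.05 years.</p>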
<p>Now that I have a more accurate view of the makeup of congress, I can look at the reasons behind my previous misconception. The first reason is the number of times I have heard politicians and others talk about the need for term limits. The exaggerated view of careerism is reinforced by the fact that much of the congressional leadership has been in office for years. Since they are the ranking members, they appear to be disproportionately represented in national news stories. The fact that Indiana has only had <a href="http://en.wikipedia.org/wiki/United_States_congressional_delegations_from_Indiana#United_States_Senate">three senators</a> since I have lived here, and my House district has been <strike>mis</strike>represented by <a href="http://sourcewatch.org/index.php?title=Mark_Souder#Term-limit_pledge">Mark Souder</a> since 1995, contribute to the appearance of a permanent congress.</p>
<p>This leads to the question of what should be done. The problem clearly isn&#8217;t as bad as I originally expected, but is it really a problem? A year or two ago, I realized that overly strict term limits for the Senate would undermine the purpose of having a <a href="http://en.wikipedia.org/wiki/Bicameralism">bicameral legislature</a>. But what about the House? There is <a href="http://dx.doi.org/10.3162/036298010790821978">evidence</a> that strict term limits can create new problems. For now, I think that term limits in the Senate are unimportant, and I am undecided about the House. I&#8217;ll stick to my general rule of voting against incumbents that aren&#8217;t significantly better than their opponent. One final point to remember:</p>
<blockquote><p>The founding fathers, by the way, did give us the best system of term limits there is. If you don’t like what they do, vote ’em out.<br />&#8211;<a href="http://jackshow.blogs.com/jack/2010/03/essay-the-failuer-of-term-limits-3910.html">Jack Lessenberry</a>
</p></blockquote>
<hr />
<p>Suggested reading:</p>
<ul>
<li><a href="http://www.fivethirtyeight.com/2010/03/throw-all-bums-out-bad-idea.html">Throw All The Bums Out? Bad Idea</a> by Tom Schaller on FiveThirtyEight</li>
<li><a href="http://www.cato.org/pubs/journal/cj14n3-2.html">The Entrenching of Incumbency: Reelections in the U.S. House of Representatives, 1790-1994</a> by Stephen C. Erickson in the Cato Journal. Being a Cato Institute publication, this talks about time in office tending to corrupt, but ignores other sources of corruption. Erickson writes about politicians bowing to special interests to be reelected, but ignores the role of special interests in getting them elected in the first place.</li>
<li><a href="http://dx.doi.org/10.3162/036298010790821978">Legislators and Administrators: Complex Relationships Complicated by Term Limits</a> by Sarbaugh-Thompson et al. Legislative Studies Quarterly isn&#8217;t open access, but a number of stories <a href="http://jackshow.blogs.com/jack/2010/03/essay-the-failuer-of-term-limits-3910.html">report</a> on this paper.</li>
<li><a href="http://www.nytimes.com/1990/01/05/opinion/what-permanent-congress.html">What &#8216;Permanent Congress&#8217;?</a> by Mickey Edwards (R-OK) in the NYTimes. It is interesting how opinions shift according to which party has a majority.</li>
<li><a href="http://books.google.com/books?id=NSFntwPRYmUC">Reelection rates of incumbents</a> by David C. Huckabee. This is where I found the reelection rate.</li>
</ul>
<p>Future problems:</p>
<ul>
<li>Look at the age of representatives during their first term. A large portion of the drop off could be from retirement.</li>
<li>Look at the careers of legislators before and after their time in congress.</li>
<li>Look at how being chosen for appointed offices (e.g. the Cabinet) changes the numbers.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://schutt.org/blog/2010/03/term-limits/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Olympics</title>
		<link>http://schutt.org/blog/2010/03/olympics/</link>
		<comments>http://schutt.org/blog/2010/03/olympics/#comments</comments>
		<pubDate>Tue, 02 Mar 2010 01:06:01 +0000</pubDate>
		<dc:creator>Noel</dc:creator>
				<category><![CDATA[statistics]]></category>
		<category><![CDATA[2010 Olympics]]></category>
		<category><![CDATA[Vancouver]]></category>
		<category><![CDATA[Whistler]]></category>

		<guid isPermaLink="false">http://schutt.org/blog/?p=570</guid>
		<description><![CDATA[I enjoyed watching the Whistler/Vancouver Olympics. As anyone who knows me would guess, I was excited to watch the Nordic events, since the Olympics are the only time they are on broadcast TV in the USA. This was a good year for Team USA, with the first Nordic Combined medals. And it&#8217;s fun to watch [...]]]></description>
			<content:encoded><![CDATA[<p>I enjoyed watching the Whistler/Vancouver Olympics. As anyone who knows me would guess, I was excited to watch the Nordic events, since the Olympics are the only time they are on broadcast TV in the USA. This was a good year for Team USA, with the first Nordic Combined medals. And it&#8217;s fun to watch some of the speed skating, Alpine skiing, and hockey. NBC&#8217;s coverage of the Olympics seemed better than usual, but still has plenty of room for improvement. They did an OK job of showing the Nordic sports, but still wasted a lot of time on things like figure skating warmups and personal interest stories.</p>
<p>With the TV and newspapers constantly referencing medal counts, I thought it would be interesting to come up with a good measure of how well each country performed. The problem with medal counts is that they give an advantage to large countries that enter a large number of athletes. I am much more impressed when a small or poor country enters a couple athletes who perform better than expected than I am by professional athletes from larger countries. One way to create a combined score would be to give 3 points for gold, 2 for silver, 1 for bronze, then add the points up by country and divide by the number of athletes from that country. A better way would be something like:</p>
<p><a href="http://schutt.org/blog/wp-content/uploads/2010/03/olyeqn.png"><img src="http://schutt.org/blog/wp-content/uploads/2010/03/olyeqn.png" alt="\frac{1}{c_a}\sum_{s}{\frac{1}{a}\sum_{a}{\frac{n - a_p + 1}{n}}}" title="\frac{1}{c_a}\sum_{s}{\frac{1}{a}\sum_{a}{\frac{n - a_p + 1}{n}}}" width="181" height="45" class="alignnone size-full wp-image-574" /></a></p>
<p>Where:<br />
n == number of athletes in an event<br />
a<sub>p</sub> == the finishing rank of an athlete (1 = gold)<br />
a == the number of athletes a nation entered in an event<br />
s == all events<br />
c<sub>a</sub> == the number of events a nation entered</p>
<p>This will give an average score for each country, compensating for some countries entering more athletes. While watching a couple events, I started scraping the results from the Vancouver 2010 website. This took longer than expected, so I never got around to doing the calculations. Since I haven&#8217;t seen the results posted as nice clean CSV files anywhere, I&#8217;ll post them for others to use. I ran out of time before including the results for curling and hockey.</p>
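<p>The scoring formula above could be computed along these lines. This is a minimal sketch and was never run against the posted results. The data layout (a list of field-size and finishing-rank pairs per nation) and the function names are assumptions, and the example numbers are invented.</p>

```python
from statistics import mean

def event_score(field_size, ranks):
    """Average of (n - a_p + 1) / n over one nation's athletes in one event."""
    return mean((field_size - rank + 1) / field_size for rank in ranks)

def nation_score(events):
    """Average the per-event scores over every event the nation entered.

    `events` is a list of (field_size, ranks) pairs, one pair per event.
    """
    return mean(event_score(n, ranks) for n, ranks in events)

# Invented results, not real Olympic data:
# a 10-athlete event with finishers ranked 1st and 4th,
# and a 20-athlete event with one finisher ranked 10th.
example = [(10, [1, 4]), (20, [10])]
print(round(nation_score(example), 3))
```

<p>A first-place finish always scores 1 and a last-place finish scores 1/n, so a nation&#8217;s score stays between 0 and 1 no matter how many athletes it enters.</p>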
<p>Now, if only the Olympics would go back to being amateurs only&#8230;</p>
<p>Files:</p>
<ul>
<li><a href="http://schutt.org/files/2010-olympics.zip" title="2010 Winter Olympics results spreadsheets (CSV/TSV files)">2010 Winter Olympics results spreadsheets</a> (does not include hockey and curling)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://schutt.org/blog/2010/03/olympics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Playing up the numbers</title>
		<link>http://schutt.org/blog/2010/01/playing-up-the-numbers/</link>
		<comments>http://schutt.org/blog/2010/01/playing-up-the-numbers/#comments</comments>
		<pubDate>Sat, 30 Jan 2010 18:39:58 +0000</pubDate>
		<dc:creator>Noel</dc:creator>
				<category><![CDATA[politics]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[abortion]]></category>
		<category><![CDATA[Guttmacher Institute]]></category>
		<category><![CDATA[USA Today]]></category>

		<guid isPermaLink="false">http://schutt.org/blog/?p=519</guid>
		<description><![CDATA[This week I read a USA Today story (from the first screen of Tuesday&#8217;s homepage) that made a couple common mistakes. The newspaper&#8217;s mistake is basing a story on a press release from an advocacy group, instead of doing an independent story based on the study itself. As frequently happens, the press release commits a [...]]]></description>
			<content:encoded><![CDATA[<p>This week I read a USA Today <a href="http://www.usatoday.com/news/health/2010-01-26-1Ateenpregnancy26_ST_N.htm" title="USA Today">story</a> (from the first screen of Tuesday&#8217;s homepage) that made a couple common mistakes. The newspaper&#8217;s mistake is basing a story on a <a href="http://www.guttmacher.org/media/nr/2010/01/26/index.html" title="Guttmacher press release">press release</a> from an advocacy group, instead of doing an independent story based on the <a href="http://www.guttmacher.org/pubs/USTPtrends.pdf" title="Guttmacher study">study itself</a>. As frequently happens, the press release commits a common error that is not in the study itself.</p>
<p><a href="http://schutt.org/blog/wp-content/uploads/2010/01/abor.png"><img src="http://schutt.org/blog/wp-content/uploads/2010/01/abor.png" alt="Abortion rates for American teen girls" title="Abortion rates for American teen girls" width="398" height="417" class="aligncenter size-full wp-image-520" /></a></p>
<p>The story is based on a press release from the Guttmacher Institute, an abortion advocacy group, originally a division of the Planned Parenthood Federation of America. I have no reason to doubt the published numbers, but the press release makes the mistake of over-interpreting the data to agree with predetermined conclusions. The annual change in the numbers from 2005 to 2006 isn&#8217;t large enough to draw a conclusion, yet the press release attributes the change to policies the institute opposes. This is what <a href="http://schutt.org/writing/reviews/huff-how_to_lie_with_statistics.pdf" title="How to Lie With Statistics by Darrell Huff">Darrell Huff</a> would call playing up numbers, and the wording could be considered cherry-picking. This is exactly the same kind of mistake I see on every slightly cool day during the summer when someone (often in the news) claims that it disproves anthropogenic global warming. While it is possible that the Guttmacher Institute&#8217;s conclusion is correct, the evidence is not yet strong enough to draw that conclusion. The Guttmacher Institute&#8217;s press release presents an explanation for 1995 through 2006, leaving out an explanation of the data from 1986 to 1995. This is a problem because there is a larger unexplained peak in 1988 and the decline begins in 1989, not 1995. At present, without stronger evidence, an equally plausible explanation is that the 2005 to 2006 change merely represents the expected annual fluctuations around a steady state, and that the slowing of the decline is simply due to approaching that steady state. Supporting their conclusion will require a longer trend and an explanation of the prior changes, not just an assumption that the change is due to policies they oppose. It is important to remember to actually look at data and to know that the world is more complex than advocacy groups pretend.</p>
<p><a href="http://schutt.org/blog/wp-content/uploads/2010/01/aborchange.png"><img src="http://schutt.org/blog/wp-content/uploads/2010/01/aborchange.png" alt="Change in abortion rates for American teen girls" title="Change in abortion rates for American teen girls" width="398" height="337" class="aligncenter size-full wp-image-521" /></a></p>
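<p>The steady-state explanation can be illustrated with a small simulation. This is only a sketch on synthetic numbers: the function names, the starting rate, the steady-state level, the decay rate, and the noise level are all made-up assumptions, not values from the Guttmacher study. It generates a rate that decays toward a steady state with random year-to-year noise, then counts how many single-year increases appear anyway.</p>

```python
import random


def simulate_rates(start, steady, decay, noise_sd, years, seed):
    """Synthetic yearly rates: exponential approach to `steady` plus Gaussian noise.

    Every argument is a hypothetical illustration value, not real data.
    """
    rng = random.Random(seed)
    rates = []
    for year in range(years):
        # Noise-free trend: the gap to the steady state shrinks by `decay` each year.
        trend = steady + (start - steady) * (1 - decay) ** year
        rates.append(trend + rng.gauss(0, noise_sd))
    return rates


def count_upticks(rates):
    """Number of years in which the rate rose relative to the previous year."""
    return sum(1 for prev, cur in zip(rates, rates[1:]) if cur > prev)


if __name__ == "__main__":
    rates = simulate_rates(start=40.0, steady=20.0, decay=0.2,
                           noise_sd=1.0, years=20, seed=1)
    print(count_upticks(rates), "single-year increases in",
          len(rates) - 1, "year-over-year changes")
```

<p>With the noise set to zero the synthetic series never rises. With noise added, isolated one-year increases can show up even though the underlying trend is still a smooth decline, which is why a single year-over-year change is weak evidence by itself.</p>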
<p><span class="update">Update 2010-02-02:</span> A <a href="http://archpedi.ama-assn.org/cgi/content/short/164/2/152" title="Efficacy of a Theory-Based Abstinence-Only Intervention Over 24 Months">new study</a> published in the Archives of Pediatrics &amp; Adolescent Medicine provides evidence that the Guttmacher Institute&#8217;s attribution of the changes in the abortion rate is likely incorrect. Unlike the Guttmacher Institute&#8217;s conclusion, this study is published in a peer-reviewed journal. (Found through a <a href="http://www.washingtonpost.com/wp-dyn/content/article/2010/02/01/AR2010020102628.html" title="WaPo: Abstinence-only programs might work, study says">story</a> in the Washington Post.)</p>
]]></content:encoded>
			<wfw:commentRss>http://schutt.org/blog/2010/01/playing-up-the-numbers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Zipf&#8217;s Law</title>
		<link>http://schutt.org/blog/2009/10/zipf-law/</link>
		<comments>http://schutt.org/blog/2009/10/zipf-law/#comments</comments>
		<pubDate>Sun, 04 Oct 2009 16:31:54 +0000</pubDate>
		<dc:creator>Noel</dc:creator>
				<category><![CDATA[statistics]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mandelbrot]]></category>
		<category><![CDATA[Zipf]]></category>

		<guid isPermaLink="false">http://schutt.org/blog/?p=388</guid>
		<description><![CDATA[I&#8217;ve run across the interesting Zipfian distribution several times recently. Zipf&#8217;s law states that for many things, particularly words, the frequency of an item is inversely proportional to its rank when items are sorted by frequency. So, for example, the most common word is used twice as often as the second most common word, three times as often as [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve run across the interesting <a href="http://mathworld.wolfram.com/ZipfDistribution.html">Zipfian distribution</a> several times recently. <a href="http://mathworld.wolfram.com/ZipfsLaw.html">Zipf&#8217;s law</a> states that for many things, particularly words, the frequency of an item is inversely proportional to its rank when items are sorted by frequency. So, for example, the most common word is used twice as often as the second most common word, three times as often as the third most common word, and so on. This means that if you plot rank vs. count on a log-log plot, you will see a roughly straight line with a slope near &#8722;1.</p>
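<p>As a rough illustration of the rank-and-count idea, here is a minimal Python sketch. The function names and the toy sentence are my own assumptions for the example, not the code or texts behind the charts in this post. It ranks the words of a text by count and, separately, lists the counts an exact inverse-rank law would predict.</p>

```python
from collections import Counter
import re


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common()
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts, start=1)]


def ideal_zipf_counts(top_count, n_ranks):
    """Counts an exact inverse-rank law predicts: top_count / rank."""
    return [top_count / rank for rank in range(1, n_ranks + 1)]


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"  # toy text, not a real corpus
    for rank, word, count in rank_frequency(sample):
        print(rank, word, count)
    # If the top word appeared 600 times, an exact law would give:
    print(ideal_zipf_counts(600, 4))  # [600.0, 300.0, 200.0, 150.0]
```

<p>Real texts only approximate the ideal counts, so the interesting part is how close the measured counts come to them.</p>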
<p>I decided to try it on several of the most common religious texts. The key requirement was that I could download them from Project Gutenberg, one of my favorite websites. Another thing these texts have in common is that they are rarely studied or understood by many of their professing followers. I also tried several other texts, but left them off the chart for simplicity.</p>
<p><a href="http://schutt.org/blog/wp-content/uploads/2009/10/zipf21.png"><img src="http://schutt.org/blog/wp-content/uploads/2009/10/zipf21.png" alt="Zipf's Law for common religious texts." title="Zipf's Law for common religious texts." width="346" height="349" class="aligncenter size-full wp-image-491" /></a></p>
<p>This rule also works well for problems outside of linguistics, such as city populations and website visits. I plotted the number of hits to different pages on my website and the sizes of cities in the United States, and found fairly good fits.</p>
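<p>One simple way to judge such a fit is to estimate the slope of the line on the log-log plot. The sketch below is an assumption about how that could be done with ordinary least squares in plain Python; it is not the method used for the charts here, and the function name is hypothetical. It works on any list of positive counts, whether words, page hits, or city populations.</p>

```python
import math


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank).

    `counts` is a list of at least two positive numbers. It is sorted
    largest first so that rank 1 is the biggest item.
    """
    ordered = sorted(counts, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(count) for count in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic counts that follow an exact inverse-rank law.
    exact = [1000 / rank for rank in range(1, 51)]
    print(round(loglog_slope(exact), 6))
```

<p>A slope close to &#8722;1 is what an inverse-rank law predicts. A least-squares line on log-log axes is only a quick check, though, and gives no formal measure of how good the fit is.</p>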
<p><a href="http://schutt.org/blog/wp-content/uploads/2009/10/cities2.png"><img src="http://schutt.org/blog/wp-content/uploads/2009/10/cities2.png" alt="Zipf's law for US cities" title="Zipf's law for US cities" width="345" height="349" class="aligncenter size-full wp-image-492" /></a></p>
<p>Zipf&#8217;s law and its later <a href="http://en.wikipedia.org/wiki/Zipf–Mandelbrot_law">extensions</a> are common and useful enough that there is an R package, zipfR, for dealing with them. The package developers, Stefan Evert and Marco Baroni, posted a good <a href="http://zipfr.r-forge.r-project.org//#counting_words">introduction</a> to Zipf&#8217;s law and its uses on the zipfR project page. Their presentation materials are also a good example of slide design.</p>
]]></content:encoded>
			<wfw:commentRss>http://schutt.org/blog/2009/10/zipf-law/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>One-hundred and ten</title>
		<link>http://schutt.org/blog/2009/07/110/</link>
		<comments>http://schutt.org/blog/2009/07/110/#comments</comments>
		<pubDate>Wed, 15 Jul 2009 17:35:57 +0000</pubDate>
		<dc:creator>Noel</dc:creator>
				<category><![CDATA[statistics]]></category>
		<category><![CDATA[All-Star]]></category>
		<category><![CDATA[baseball]]></category>
		<category><![CDATA[doping]]></category>

		<guid isPermaLink="false">http://schutt.org/blog/?p=275</guid>
		<description><![CDATA[I had a flash of inspiration this morning. A story about the All-Star game came on the radio and got me thinking. I&#8217;ve always been annoyed when people, often athletes, say &#8220;I gave it one-hundred and ten percent,&#8221; or something along those lines. Now, there are plenty of times when percentages over 100 make sense, [...]]]></description>
			<content:encoded><![CDATA[<p>I had a flash of inspiration this morning. A <a href="http://www.npr.org/templates/story/story.php?storyId=106615999" title="Frank Deford on NPR">story</a> about the <a href="http://www.nytimes.com/2009/07/15/sports/baseball/15allstar.html" title="2009 MLB All-Star game">All-Star game</a> came on the radio and got me thinking.</p>
<p>I&#8217;ve always been annoyed when people, often athletes, say &ldquo;I gave it one-hundred and ten percent,&rdquo; or something along those lines. Now, there are plenty of times when percentages over 100 make sense, but&#8212;by definition&#8212;your best <em>is</em> 100%. Unless . . . That&#8217;s it! The athletes <em>are</em> giving one-hundred and ten percent of their natural ability. They are admitting to <a href="http://www.wada-ama.org/en/" title="WADA">doping</a>. Their apparent abuse of basic math is really a veiled admission of using performance-enhancing drugs. The extra ten percent is the drugs!</p>
]]></content:encoded>
			<wfw:commentRss>http://schutt.org/blog/2009/07/110/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Obesity</title>
		<link>http://schutt.org/blog/2009/07/obesity/</link>
		<comments>http://schutt.org/blog/2009/07/obesity/#comments</comments>
		<pubDate>Thu, 02 Jul 2009 12:30:30 +0000</pubDate>
		<dc:creator>Noel</dc:creator>
				<category><![CDATA[statistics]]></category>
		<category><![CDATA[Indiana]]></category>
		<category><![CDATA[obesity]]></category>
		<category><![CDATA[overweight]]></category>
		<category><![CDATA[USA]]></category>

		<guid isPermaLink="false">http://schutt.org/blog/?p=232</guid>
		<description><![CDATA[One of today&#8217;s headlines is &#8220;Hoosier obesity rate flat at 27.4%.&#8221; The article cites an RWJF/Trust for America&#8217;s Health report, which included this map and an interesting interactive map. The original data are from the CDC National Center for Health Statistics. Sources: The Journal Gazette: Hoosier obesity rate flat at 27.4% Trust for America&#8217;s Health: [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.rwjf.org/pr/interactive.jsp?id=37" title="interactive map of American obesity"><img src="http://schutt.org/blog/wp-content/uploads/2009/07/obesity-1991-2008.png" alt="Obesity 1991 to 2008" title="Obesity 1991 to 2008" width="438" height="241" class="aligncenter size-full wp-image-233" /></a></p>
<p>One of today&#8217;s headlines is &ldquo;<a href="http://www.journalgazette.net/article/20090702/LOCAL/307029948" title="Hoosier obesity rate flat at 27.4%">Hoosier obesity rate flat at 27.4%</a>.&rdquo; The article cites an RWJF/Trust for America&#8217;s Health report, which included this map and an interesting <a href="http://www.rwjf.org/pr/interactive.jsp?id=37">interactive map</a>. The original data are from the CDC National Center for Health Statistics.</p>
<p>Sources:</p>
<ul>
<li>The Journal Gazette: <a href="http://www.journalgazette.net/article/20090702/LOCAL/307029948" title="Hoosier obesity rate flat at 27.4%">Hoosier obesity rate flat at 27.4%</a>
</li>
<li>Trust for America&#8217;s Health: <a href="http://healthyamericans.org/reports/obesity2009/" title="F as in Fat 2009">F as in Fat 2009</a>
</li>
<li>RWJF: <a href="http://www.rwjf.org/childhoodobesity/product.jsp?id=45050" title="F as in Fat 2009">F as in Fat 2009</a>
</li>
<li>CDC: <a href="http://www.cdc.gov/nchs/fastats/overwt.htm" title="CDC">Overweight Prevalence</a>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://schutt.org/blog/2009/07/obesity/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

