<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Noel Schutt &#187; statistics</title>
	<atom:link href="http://schutt.org/blog/category/statistics/feed/" rel="self" type="application/rss+xml" />
	<link>http://schutt.org/blog</link>
	<description></description>
	<lastBuildDate>Wed, 21 Jul 2010 16:15:07 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Rare?</title>
		<link>http://schutt.org/blog/2010/03/abortion/</link>
		<comments>http://schutt.org/blog/2010/03/abortion/#comments</comments>
		<pubDate>Wed, 31 Mar 2010 11:06:32 +0000</pubDate>
		<dc:creator>Noel</dc:creator>
				<category><![CDATA[Politics]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[abortion]]></category>

		<guid isPermaLink="false">http://schutt.org/blog/?p=668</guid>
		<description><![CDATA[If you&#8217;ve paid attention to national political news at any point in the last 18 years, I&#8217;m sure you&#8217;ve heard variations on the saying &#8216;safe and legal, but rare,&#8217; when discussing abortion. Beyond the fact that a procedure where fewer than half of the patients survive can never be considered &#8216;safe,&#8217; how is &#8216;rare&#8217; defined? [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;ve paid attention to national political news at any point in the last 18 years, I&#8217;m sure you&#8217;ve heard variations on the saying &lsquo;<a href="http://www.presidency.ucsb.edu/ws/index.php?pid=46219&#038;st=safe%20and%20legal,%20but%20rare">safe and legal, but rare</a>,&rsquo; when discussing abortion. Beyond the fact that a procedure where fewer than half of the patients survive can never be considered &lsquo;safe,&rsquo; how is &lsquo;rare&rsquo; defined? From the folks who use the &lsquo;<a href="http://www.presidency.ucsb.edu/ws/index.php?pid=47104&#038;st=legal,%20safe,%20and%20rare">legal, safe, and rare</a>&rsquo; statement, you&#8217;d get the impression that abortion is already fairly infrequent, and that they&#8217;d just like to reduce the numbers a little further. I figured they understated the frequency, but I never examined the data myself. Then I saw a t-shirt with the statement &lsquo;1/4 of my generation is missing&rsquo;:</p>
<p><a href="http://schutt.org/blog/wp-content/uploads/2010/03/quarter-missing-tshirt.jpg"><img src="http://schutt.org/blog/wp-content/uploads/2010/03/quarter-missing-tshirt.jpg" alt="" title="1/4 of my generation is missing tshirt" width="320" height="117" class="alignnone size-full wp-image-688" /></a></p>
<p>That seemed far too high, so I decided to check the numbers myself. I found the data on the CDC website and plotted them along with moving averages, which serve as a rough definition of &lsquo;generation&rsquo;:</p>
<p><a href="http://schutt.org/blog/wp-content/uploads/2010/03/abor-frac_1970-2005.png"><img src="http://schutt.org/blog/wp-content/uploads/2010/03/abor-frac_1970-2005.png" alt="" title="USA Abortion Fraction 1970-2005" width="340" height="378" class="alignnone size-full wp-image-670" /></a></p>
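<p>The post doesn&#8217;t say which window length or averaging scheme the &lsquo;generation&rsquo; curves use. Here is a minimal sketch that assumes a trailing moving average over annual values, with the window length in years (a free parameter here) standing in for a generation. The function name and sample values are made up for illustration:</p>

```python
def trailing_moving_average(values, window):
    """Average of each run of `window` consecutive values.

    Returns one value per complete window, so the result has
    len(values) - window + 1 items.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    # Running sum: add the newest value and drop the oldest at each step.
    total = sum(values[:window])
    averages = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        averages.append(total / window)
    return averages


# Made-up annual fractions, NOT the CDC data.
annual = [0.20, 0.22, 0.24, 0.26, 0.28]
print(trailing_moving_average(annual, window=3))
```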
<p>Huh. That&#8217;s much higher than I expected. I&#8217;d figured it would be around a tenth that rate, and at most half that rate. These higher-than-expected numbers mean that one-quarter <em>is</em> a reasonable estimate. Depending on how &lsquo;my generation&rsquo; is defined, the one-quarter figure may be a little high, but it is at least one-fifth. Under any reasonable definition, there is no way I&#8217;d consider 1/5 to be &lsquo;rare.&rsquo; It&#8217;s a stretch to call even one-tenth rare. I&#8217;d hesitate to call 1/36, the <a href="http://mathworld.wolfram.com/Dice.html">probability of rolling snake eyes</a>, rare. I&#8217;d consider a general definition of rare to start around 1/500, approximately the <a href="http://mathworld.wolfram.com/Poker.html">probability of drawing a flush</a> in poker. For diseases, the NIH uses a <a href="http://rarediseases.info.nih.gov/RareDiseaseList.aspx">definition</a> of roughly 1/1500. Even if the US abortion rate fell to one-sixth of its current level (about 1/30), it still wouldn&#8217;t meet the loosest of these definitions of rare. This means that, even under a generous definition, we have a long way to go before abortion could be considered &lsquo;rare.&rsquo; It&#8217;s worth noting that pro-abortion politicians mostly stopped using the &lsquo;rare&rsquo; statement a couple of years ago, showing that they were probably never serious about it.</p>
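<p>The benchmark probabilities above are easy to check. The sketch below uses exact fractions. The five-card flush count excludes straight flushes, which matches the usual poker-hand tables. The 1/1500 NIH figure is the rough value quoted above, not an official formula:</p>

```python
from fractions import Fraction
from math import comb

# Snake eyes: both of two fair dice show a one.
snake_eyes = Fraction(1, 6) * Fraction(1, 6)

# Five-card flush: pick a suit, pick 5 of its 13 ranks, and exclude the
# 10 rank sequences per suit that would make a straight flush.
flush_hands = 4 * (comb(13, 5) - 10)
flush = Fraction(flush_hands, comb(52, 5))

# Rough NIH rare-disease threshold quoted in the post.
nih_rare = Fraction(1, 1500)

current = Fraction(1, 5)   # the post's lower estimate
reduced = current / 6      # one-sixth of the current level

for name, p in [("snake eyes", snake_eyes), ("flush", flush),
                ("NIH rare disease", nih_rare), ("reduced rate", reduced)]:
    print(f"{name}: about 1 in {round(1 / p)}")

# Even a six-fold reduction is still more common than snake eyes.
print(reduced > snake_eyes)
```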
<hr />
<p>Note: Data are from the US Centers for Disease Control and Prevention. The <a href="http://www.cdc.gov/mmwr/">CDC MMWR</a> abortion reports are released at the end of November with a three-year lag. &lsquo;Fetal loss&rsquo; other than induced abortion is excluded from the data I used in the plots. The numbers include both surgical and medical (non-surgical) abortions. The CDC reports the data as abortions per 1,000 live births; I converted them to fractions to make the plots clearer. The full report is worth a look. Since the gestational-age data are split between two tables in the report, here is a plot of when during pregnancy abortions are performed:</p>
<p><a href="http://schutt.org/blog/wp-content/uploads/2010/03/gestation.png"><img src="http://schutt.org/blog/wp-content/uploads/2010/03/gestation.png" alt="" title="USA Percent of abortions by weeks of gestation" width="340" height="318" class="alignnone size-full wp-image-669" /></a></p>
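<p>The post doesn&#8217;t spell out the conversion. This sketch assumes one natural reading: a ratio of <em>r</em> abortions per 1,000 live births means <em>r</em> of every 1,000 + <em>r</em> pregnancies (fetal losses excluded) ended in induced abortion. The input value below is made up:</p>

```python
def ratio_to_fraction(per_1000_live_births):
    """Convert an abortion ratio to a fraction of pregnancies.

    Assumes the denominator is live births plus induced abortions,
    with other fetal losses excluded.
    """
    r = per_1000_live_births
    return r / (1000 + r)


# A made-up ratio of 250 per 1,000 live births is 250 out of 1,250.
print(ratio_to_fraction(250))  # 0.2
```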
<p>Sources:</p>
<ul>
<li>CDC Morbidity and Mortality Weekly Report, Surveillance Summaries, <a href="http://www.cdc.gov/mmwr/PDF/ss/ss5713.pdf">Abortion Surveillance — United States, 2005</a>, November 28, 2008 / Vol. 57 / No. SS-13</li>
<li>CDC Morbidity and Mortality Weekly Report, Surveillance Summaries, <a href="http://www.cdc.gov/mmwr/PDF/ss/ss5808.pdf">Abortion Surveillance — United States, 2006</a>, November 27, 2009 / Vol. 58 / No. SS-8</li>
<li><a href="http://www.democrats.org/pdfs/2004platform.pdf">The 2004 Democratic National Platform for America</a>. Includes &lsquo;Abortion should be safe, legal, and rare.&rsquo;</li>
<li><a href="http://www.democrats.org/page/-/pdf/dem-platform.pdf">The 2008 Democratic National Platform</a>. Uses &lsquo;safe and legal abortion, regardless of ability to pay.&rsquo;</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://schutt.org/blog/2010/03/abortion/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Term Limits</title>
		<link>http://schutt.org/blog/2010/03/term-limits/</link>
		<comments>http://schutt.org/blog/2010/03/term-limits/#comments</comments>
		<pubDate>Wed, 24 Mar 2010 00:14:29 +0000</pubDate>
		<dc:creator>Noel</dc:creator>
				<category><![CDATA[Politics]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[term limits]]></category>

		<guid isPermaLink="false">http://schutt.org/blog/?p=626</guid>
		<description><![CDATA[It&#8217;s amazing how often a little data can overthrow conventional wisdom. Today&#8217;s example is term limits. I had long thought that most elected offices should have strict term limits to solve the problem of the same politicians staying in office as a career, losing touch with their constituents. Thinking about it a bit more a [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s amazing how often a little data can overthrow conventional wisdom. Today&#8217;s example is term limits. I had long thought that most elected offices should have strict term limits to solve the problem of the same politicians staying in office as a career, losing touch with their constituents. Thinking about it a bit more a couple of years ago convinced me that we shouldn&#8217;t have lifetime term limits, but should instead have a maximum number of consecutive terms in a particular office. Some more thinking led me to realize that term limits can create other problems. Throughout this whole process, I thought that Congress was mostly full of the same old people who have been in office most of my life. A <a href="http://www.fivethirtyeight.com/2010/03/throw-all-bums-out-bad-idea.html">post</a> on a blog I occasionally read brought the subject back to mind, and this time I checked the data. I found and plotted the time served by all sitting representatives:</p>
<p><a href="http://schutt.org/blog/wp-content/uploads/2010/03/house.png"><img src="http://schutt.org/blog/wp-content/uploads/2010/03/house.png" alt="" title="Number of Representatives by Time in US House" width="289" height="340" class="size-full wp-image-628" /></a></p>
<p>This doesn&#8217;t match the distribution I expected. After looking at the House, I examined the results for the Senate:</p>
<p><a href="http://schutt.org/blog/wp-content/uploads/2010/03/senate.png"><img src="http://schutt.org/blog/wp-content/uploads/2010/03/senate.png" alt="" title="Number of Senators by Years in US Senate" width="304" height="338" class="size-full wp-image-629" /></a></p>
<p>This looks a little broader than the House, but still not quite as biased toward long times in office as I had expected. To be a little more thorough, I calculated basic summary statistics:</p>
<table>
<caption>Years served in US Congress</caption>
<thead>
<tr>
<th></th>
<th>House</th>
<th>Senate</th>
</tr>
</thead>
<tbody>
<tr>
<th align="right">Mean</th>
<td>11.05</td>
<td>12.79</td>
</tr>
<tr>
<th align="right">Median</th>
<td>9.00</td>
<td>11.00</td>
</tr>
<tr>
<th align="right">Standard deviation</th>
<td>9.07</td>
<td>10.90</td>
</tr>
</tbody>
</table>
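<p>The summary statistics in the table can be computed with Python&#8217;s standard statistics module. The post doesn&#8217;t say whether its standard deviation is the sample or population version, so this sketch assumes the sample version. The list of years is made up, not the actual roster data:</p>

```python
import statistics


def summarize(years_served):
    """Mean, median, and standard deviation of years in office."""
    return {
        "mean": statistics.mean(years_served),
        "median": statistics.median(years_served),
        # Sample standard deviation (n - 1 denominator); use
        # statistics.pstdev for the population version instead.
        "stdev": statistics.stdev(years_served),
    }


# Made-up values for illustration, not actual House or Senate data.
print(summarize([1, 3, 3, 9, 14]))
```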
<p>It turns out that careerism isn&#8217;t nearly as bad as I thought. Of course, the summary statistics don&#8217;t give the full answer, because they measure only the time served <em>so far</em> by those currently sitting in Congress. What I really want to see is the total time served by everyone now in Congress, once they have retired or lost an election. Because current members will keep serving for a while, I&#8217;d expect the time-served-so-far figures to come out slightly lower than that eventual average.</p>
<p>A <a href="http://www.cato.org/pubs/journal/cj14n3-2.html">little reading</a> shows that the average reelection rate is about 84% (for 1982 to 1994; I don&#8217;t have more recent data). Using this number with some simple statistics gives a theoretical mean time in the House almost identical to the average of those currently serving. However, the measured variance is significantly larger than the expected variance.</p>
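<p>The post doesn&#8217;t show its &lsquo;simple statistics&rsquo;. One simple model, offered here as an assumption and not necessarily the calculation used above, treats the 84% rate as the chance that any sitting member returns for another two-year term, independently each time, with the House in a steady state. Under that model, the number of members in their <em>k</em>-th term is proportional to <em>p</em><sup><em>k</em>&#8722;1</sup>. This gives a geometric distribution with a mean of <em>p</em>/(1&#8722;<em>p</em>) completed earlier terms:</p>

```python
TERM_YEARS = 2  # length of a House term


def expected_prior_terms(p):
    """Mean number of completed earlier terms for a sitting member.

    Assumes each member returns for another term with the same,
    independent probability p and that the chamber is in a steady state,
    so earlier terms follow a geometric distribution with mean p / (1 - p).
    """
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return p / (1 - p)


p = 0.84  # average reelection rate, 1982 to 1994
print(round(TERM_YEARS * expected_prior_terms(p), 2))  # 10.5
```

<p>To compare this with the table, you would add the time already served in the current term. The model also lumps retirements together with election losses.</p>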
<p>Now that I have a more accurate view of the makeup of Congress, I can look at the reasons behind my previous misconception. The first reason is the number of times I have heard politicians and others talk about the need for term limits. The exaggerated view of careerism is reinforced by the fact that much of the congressional leadership has been in office for years. Because they hold the ranking positions, these long-serving members seem to be disproportionately represented in national news stories. Closer to home, Indiana has had only <a href="http://en.wikipedia.org/wiki/United_States_congressional_delegations_from_Indiana#United_States_Senate">three senators</a> since I have lived here, and my House district has been <strike>mis</strike>represented by <a href="http://sourcewatch.org/index.php?title=Mark_Souder#Term-limit_pledge">Mark Souder</a> since 1995; both contribute to the appearance of a permanent Congress.</p>
<p>This leads to the question of what should be done. The problem clearly isn&#8217;t as bad as I originally expected, but is it really a problem? A year or two ago, I realized that overly strict term limits for the Senate would undermine the purpose of having a <a href="http://en.wikipedia.org/wiki/Bicameralism">bicameral legislature</a>. But what about the House? There is <a href="http://dx.doi.org/10.3162/036298010790821978">evidence</a> that strict term limits can create new problems. For now, I think that term limits in the Senate are unimportant, and I am undecided about the House. I&#8217;ll stick to my general rule of voting against incumbents who aren&#8217;t significantly better than their opponents. One final point to remember:</p>
<blockquote><p>The founding fathers, by the way, did give us the best system of term limits there is. If you don’t like what they do, vote ’em out.<br />&#8211;<a href="http://jackshow.blogs.com/jack/2010/03/essay-the-failuer-of-term-limits-3910.html">Jack Lessenberry</a>
</p></blockquote>
<hr />
<p>Suggested reading:</p>
<ul>
<li><a href="http://www.fivethirtyeight.com/2010/03/throw-all-bums-out-bad-idea.html">Throw All The Bums Out? Bad Idea</a> by Tom Schaller on FiveThirtyEight</li>
<li><a href="http://www.cato.org/pubs/journal/cj14n3-2.html">The Entrenching of Incumbency: Reelections in the U.S. House of Representatives, 1790-1994</a> by Stephen C. Erickson in the Cato Journal. As one might expect from a Cato Institute publication, it treats time in office as a corrupting influence but ignores other sources of corruption. Erickson writes about politicians bowing to special interests to be reelected, but ignores the role of special interests in getting them elected in the first place.</li>
<li><a href="http://dx.doi.org/10.3162/036298010790821978">Legislators and Administrators: Complex Relationships Complicated by Term Limits</a> by Sarbaugh-Thompson et al. Legislative Studies Quarterly isn&#8217;t open access, but a number of stories <a href="http://jackshow.blogs.com/jack/2010/03/essay-the-failuer-of-term-limits-3910.html">report</a> on this paper.</li>
<li><a href="http://www.nytimes.com/1990/01/05/opinion/what-permanent-congress.html">What &#8216;Permanent Congress&#8217;?</a> by Mickey Edwards (R-OK) in the New York Times. It is interesting how opinions shift according to which party has a majority.</li>
<li><a href="http://books.google.com/books?id=NSFntwPRYmUC">Reelection rates of incumbents</a> by David C. Huckabee. This is where I found the reelection rate.</li>
</ul>
<p>Future problems:</p>
<ul>
<li>Look at the age of representatives during their first term. A large portion of the drop-off could be from retirement.</li>
<li>Look at the careers of legislators before and after their time in Congress.</li>
<li>Look at how being chosen for appointed offices (e.g. the Cabinet) changes the numbers.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://schutt.org/blog/2010/03/term-limits/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Olympics</title>
		<link>http://schutt.org/blog/2010/03/olympics/</link>
		<comments>http://schutt.org/blog/2010/03/olympics/#comments</comments>
		<pubDate>Tue, 02 Mar 2010 01:06:01 +0000</pubDate>
		<dc:creator>Noel</dc:creator>
				<category><![CDATA[statistics]]></category>
		<category><![CDATA[2010 Olympics]]></category>
		<category><![CDATA[Vancouver]]></category>
		<category><![CDATA[Whistler]]></category>

		<guid isPermaLink="false">http://schutt.org/blog/?p=570</guid>
		<description><![CDATA[I enjoyed watching the Whistler/Vancouver Olympics. As anyone who knows me would guess, I was excited to watch the Nordic events, since the Olympics are the only time they are on broadcast TV in the USA. This was a good year for Team USA, with the first Nordic Combined medals. And it&#8217;s fun to watch [...]]]></description>
			<content:encoded><![CDATA[<p>I enjoyed watching the Whistler/Vancouver Olympics. As anyone who knows me would guess, I was excited to watch the Nordic events, since the Olympics are the only time they are on broadcast TV in the USA. This was a good year for Team USA, with its first-ever Nordic Combined medals. And it&#8217;s fun to watch some of the speed skating, Alpine skiing, and hockey. NBC&#8217;s coverage of the Olympics seemed better than usual, but still had plenty of room for improvement. They did an OK job of showing the Nordic sports, but still wasted a lot of time on things like figure skating warmups and personal interest stories.</p>
<p>With the TV and newspapers constantly referencing medal counts, I thought it would be interesting to come up with a good measure of how well each country performed. The problem with medal counts is that they give an advantage to large countries that enter a large number of athletes. I am much more impressed when a small or poor country enters a couple of athletes who perform better than expected than I am by professional athletes from larger countries. One way to create a combined score would be to give 3 points for gold, 2 for silver, and 1 for bronze, then add up the points by country and divide by the number of athletes from that country. A better way would be something like:</p>
<p><a href="http://schutt.org/blog/wp-content/uploads/2010/03/olyeqn.png"><img src="http://schutt.org/blog/wp-content/uploads/2010/03/olyeqn.png" alt="\frac{1}{c_a}\sum_{s}{\frac{1}{a}\sum_{a}{\frac{n - a_p + 1}{n}}}" title="\frac{1}{c_a}\sum_{s}{\frac{1}{a}\sum_{a}{\frac{n - a_p + 1}{n}}}" width="181" height="45" class="alignnone size-full wp-image-574" /></a></p>
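<p>Where:<br />
<em>n</em> = the number of athletes in an event<br />
<em>a<sub>p</sub></em> = the finishing rank of an athlete (1 = gold)<br />
<em>a</em> = the number of athletes a nation entered in an event (the inner sum runs over those athletes)<br />
<em>s</em> = the events (the outer sum runs over the events a nation entered)<br />
<em>c<sub>a</sub></em> = the number of events a nation entered</p>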
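<p>Here is a minimal sketch of both scoring schemes, using exact fractions to avoid rounding noise. It assumes the results arrive as one (event, nation, rank, field size) record per athlete and ignores ties. For the simple 3/2/1 score, it counts each event entry as one athlete. The function names and the results are made up:</p>

```python
from collections import defaultdict
from fractions import Fraction


def nation_scores(results):
    """Score each nation with the averaged-rank formula.

    `results` is a list of (event, nation, rank, n) tuples, one per
    athlete, where rank 1 is gold and n is the field size in that event.
    Each athlete earns (n - rank + 1) / n. Scores are averaged over a
    nation's athletes in each event, then over the events it entered.
    """
    per_event = defaultdict(list)  # (nation, event) -> athlete scores
    for event, nation, rank, n in results:
        per_event[(nation, event)].append(Fraction(n - rank + 1, n))

    event_means = defaultdict(list)  # nation -> one mean per event
    for (nation, _event), scores in per_event.items():
        event_means[nation].append(sum(scores) / len(scores))

    # Average over the c_a events each nation entered.
    return {nation: sum(means) / len(means)
            for nation, means in event_means.items()}


def medal_points_per_athlete(results):
    """The simpler 3/2/1 medal points divided by event entries."""
    points, entries = defaultdict(int), defaultdict(int)
    for _event, nation, rank, _n in results:
        entries[nation] += 1
        points[nation] += {1: 3, 2: 2, 3: 1}.get(rank, 0)
    return {nation: Fraction(points[nation], entries[nation])
            for nation in entries}


# Made-up results for two hypothetical nations, X and Y.
results = [
    ("A", "X", 1, 10), ("A", "X", 4, 10), ("A", "Y", 2, 10),
    ("B", "X", 5, 5), ("B", "Y", 1, 5),
]
print(nation_scores(results))
print(medal_points_per_athlete(results))
```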
<p>This gives an average score for each country, compensating for some countries entering more athletes. While watching a couple of events, I started scraping the results from the Vancouver 2010 website. This took longer than expected, so I never got around to doing the calculations. Since I haven&#8217;t seen the results posted as clean CSV files anywhere, I&#8217;ll post them for others to use. I ran out of time before including the results for curling and hockey.</p>
<p>Now, if only the Olympics would go back to being amateur-only&#8230;</p>
<p>Files:</p>
<ul>
<li><a href="http://schutt.org/files/2010-olympics.zip" title="2010 Winter Olympics results spreadsheets (CSV/TSV files)">2010 Winter Olympics results spreadsheets</a> (does not include hockey and curling)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://schutt.org/blog/2010/03/olympics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Playing up the numbers</title>
		<link>http://schutt.org/blog/2010/01/playing-up-the-numbers/</link>
		<comments>http://schutt.org/blog/2010/01/playing-up-the-numbers/#comments</comments>
		<pubDate>Sat, 30 Jan 2010 18:39:58 +0000</pubDate>
		<dc:creator>Noel</dc:creator>
				<category><![CDATA[Politics]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[abortion]]></category>
		<category><![CDATA[Guttmacher Institute]]></category>
		<category><![CDATA[USA Today]]></category>

		<guid isPermaLink="false">http://schutt.org/blog/?p=519</guid>
		<description><![CDATA[This week I read a USA Today story (from the first screen of Tuesday&#8217;s homepage) that made a couple common mistakes. The newspaper&#8217;s mistake is basing a story on a press release from an advocacy group, instead of doing an independent story based on the study itself. As frequently happens, the press release commits a [...]]]></description>
			<content:encoded><![CDATA[<p>This week I read a USA Today <a href="http://www.usatoday.com/news/health/2010-01-26-1Ateenpregnancy26_ST_N.htm" title="USA Today">story</a> (from the first screen of Tuesday&#8217;s homepage) that made a couple of common mistakes. The newspaper&#8217;s mistake is basing a story on a <a href="http://www.guttmacher.org/media/nr/2010/01/26/index.html" title="Guttmacher press release">press release</a> from an advocacy group, instead of doing an independent story based on the <a href="http://www.guttmacher.org/pubs/USTPtrends.pdf" title="Guttmacher study">study itself</a>. As frequently happens, the press release commits a common error that is not in the study itself.</p>
<p><a href="http://schutt.org/blog/wp-content/uploads/2010/01/abor.png"><img src="http://schutt.org/blog/wp-content/uploads/2010/01/abor.png" alt="Abortion rates for American teen girls" title="Abortion rates for American teen girls" width="398" height="417" class="aligncenter size-full wp-image-520" /></a></p>
<p>The story is based on a press release from the Guttmacher Institute, an abortion advocacy group, originally a division of the Planned Parenthood Federation of America. I have no reason to doubt the published numbers, but the press release makes the mistake of over-interpreting the data to agree with predetermined conclusions. The annual change in the numbers from 2005 to 2006 isn&#8217;t large enough to draw a conclusion, yet the press release attributes the change to policies they oppose. This is what <a href="http://schutt.org/writing/reviews/huff-how_to_lie_with_statistics.pdf" title="How to Lie With Statistics by Darrell Huff">Darrell Huff</a> would call playing up numbers, and the wording could be considered cherry-picking. This is exactly the same kind of mistake I see every slightly cool summer day, when someone (often in the news) claims that it disproves anthropogenic global warming. While it is possible that the Guttmacher Institute&#8217;s conclusion is correct, the evidence is not yet strong enough to support it. The Guttmacher Institute&#8217;s press release presents an explanation for 1995 through 2006, leaving out an explanation of the data from 1986 to 1995. This is a problem because there is a larger unexplained peak in 1988, and the decline begins in 1989, not 1995. Without stronger evidence, an equally plausible explanation is that the 2005 to 2006 change merely represents the expected annual fluctuations around a steady state, and that the slowing of the decline is simply due to approaching that steady state. Supporting their conclusion will require a longer trend and an explanation of the earlier changes, not just an assumption that the change is due to policies they oppose. It is important to actually look at the data and to remember that the world is more complex than advocacy groups pretend.</p>
<p><a href="http://schutt.org/blog/wp-content/uploads/2010/01/aborchange.png"><img src="http://schutt.org/blog/wp-content/uploads/2010/01/aborchange.png" alt="Change in abortion rates for American teen girls" title="Change in abortion rates for American teen girls" width="398" height="337" class="aligncenter size-full wp-image-521" /></a></p>
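<p>One rough way to ask whether the 2005 to 2006 change is &lsquo;large enough&rsquo; is to compare it with the spread of earlier year-over-year changes. The sketch below does that, assuming the earlier changes are a fair sample of ordinary fluctuation. It ignores trend and autocorrelation, so it is a sanity check rather than a significance test. The function name is hypothetical and the series is made up, not the Guttmacher data:</p>

```python
import statistics


def latest_change_z(series):
    """Compare the latest year-over-year change with earlier ones.

    `series` holds annual values in date order. Returns
    (latest change - mean earlier change) / stdev of earlier changes,
    so values near 0 mean the latest change looks like ordinary noise.
    """
    changes = [b - a for a, b in zip(series, series[1:])]
    earlier, latest = changes[:-1], changes[-1]
    if len(earlier) < 2:
        raise ValueError("need at least two earlier changes")
    spread = statistics.stdev(earlier)
    if spread == 0:
        raise ValueError("earlier changes have no spread")
    return (latest - statistics.mean(earlier)) / spread


# Made-up annual rates, NOT the Guttmacher or CDC numbers.
rates = [10, 9, 11, 10, 12, 11]
print(round(latest_change_z(rates), 2))
```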
<p><span class="update">Update 2010-02-02:</span> A <a href="http://archpedi.ama-assn.org/cgi/content/short/164/2/152" title="Efficacy of a Theory-Based Abstinence-Only Intervention Over 24 Months">new study</a> published in the Archives of Pediatrics and Adolescent Medicine provides evidence that the Guttmacher Institute&#8217;s attribution of the changes in the abortion rate is likely incorrect. Unlike the Guttmacher Institute&#8217;s conclusion, this study was published in a peer-reviewed journal. (Found through a <a href="http://www.washingtonpost.com/wp-dyn/content/article/2010/02/01/AR2010020102628.html" title="WaPo: Abstinence-only programs might work, study says">story</a> in the Washington Post.)</p>
]]></content:encoded>
			<wfw:commentRss>http://schutt.org/blog/2010/01/playing-up-the-numbers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Zipf&#8217;s Law</title>
		<link>http://schutt.org/blog/2009/10/zipf-law/</link>
		<comments>http://schutt.org/blog/2009/10/zipf-law/#comments</comments>
		<pubDate>Sun, 04 Oct 2009 16:31:54 +0000</pubDate>
		<dc:creator>Noel</dc:creator>
				<category><![CDATA[statistics]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mandelbrot]]></category>
		<category><![CDATA[Zipf]]></category>

		<guid isPermaLink="false">http://schutt.org/blog/?p=388</guid>
		<description><![CDATA[I&#8217;ve run across the interesting Zipfian distribution several times recently. Zipf&#8217;s law states that for many things, particularly words, the frequency is inversely proportional to the rank of the frequency. So, for example, the most common word is used twice as often as the second most common word and three times as often as [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve run across the interesting <a href="http://mathworld.wolfram.com/ZipfDistribution.html">Zipfian distribution</a> several times recently. <a href="http://mathworld.wolfram.com/ZipfsLaw.html">Zipf&#8217;s law</a> states that for many things, particularly words, the frequency is inversely proportional to the rank of the frequency. So, for example, the most common word is used about twice as often as the second most common word and three times as often as the third most common word, and so on. This means that if you plot rank vs. count on a log-log plot, you will see a straight line with a slope of about -1.</p>
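<p>Here is a minimal sketch of the rank-frequency check: count the words, sort the counts, and fit a least-squares line to log(count) against log(rank). The tokenizer is a crude assumption that keeps only lowercase letters and apostrophes, so Project Gutenberg headers and curly quotes would need extra cleanup. The function names are made up:</p>

```python
import math
import re
from collections import Counter


def rank_counts(text):
    """Word counts sorted from most to least common (rank 1 first)."""
    words = re.findall(r"[a-z']+", text.lower())
    return [count for _word, count in Counter(words).most_common()]


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank).

    Zipf's law predicts a slope close to -1.
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# An ideal Zipf sequence (count proportional to 1/rank) as a check.
print(loglog_slope([1000 / r for r in range(1, 101)]))  # close to -1
```

<p>Ordinary least squares on the log-log points gives the long tail of rare words a lot of weight, so treat this as a quick check rather than a careful fit.</p>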
<p>I decided to try it on several of the most common religious texts. The key requirement was that I could download them from Project Gutenberg, one of my favorite websites. Another thing they have in common is that they are rarely studied or understood by many of their professing followers. I also tried several other texts, but left them off the chart for simplicity.</p>
<p><a href="http://schutt.org/blog/wp-content/uploads/2009/10/zipf21.png"><img src="http://schutt.org/blog/wp-content/uploads/2009/10/zipf21.png" alt="Zipf's Law for common religious texts." title="Zipf's Law for common religious texts." width="346" height="349" class="aligncenter size-full wp-image-491" /></a></p>
<p>This rule also works well for problems outside of linguistics, such as city populations and website visits. I plotted the frequency of hits to different pages on my website and the populations of US cities, and found fairly good fits for both.</p>
<p><a href="http://schutt.org/blog/wp-content/uploads/2009/10/cities2.png"><img src="http://schutt.org/blog/wp-content/uploads/2009/10/cities2.png" alt="Zipf's law for US cities" title="Zipf's law for US cities" width="345" height="349" class="aligncenter size-full wp-image-492" /></a></p>
<p>Zipf&#8217;s Law and its later <a href="http://en.wikipedia.org/wiki/Zipf–Mandelbrot_law">extensions</a> are common and useful enough that there is an R package, zipfR, for working with them. The package developers, Stefan Evert and Marco Baroni, posted a good <a href="http://zipfr.r-forge.r-project.org//#counting_words">introduction</a> to Zipf&#8217;s law and its uses on the zipfR project page. Their presentation materials are also a good example of slide design.</p>
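<p>For reference, here are the standard forms of Zipf&#8217;s law and the Zipf&#8211;Mandelbrot extension. The symbols are chosen here: <em>f</em> is frequency, <em>r</em> is rank, <em>C</em> is a normalizing constant, <em>s</em> is the exponent, and <em>q</em> is Mandelbrot&#8217;s rank offset, which flattens the curve for the top few ranks:</p>

```latex
\begin{aligned}
f(r) &= \frac{C}{r^{s}}, \qquad s \approx 1 && \text{(Zipf)} \\
\log f(r) &= \log C - s \log r && \text{(a straight line on log-log axes)} \\
f(r) &= \frac{C}{(r + q)^{s}} && \text{(Zipf--Mandelbrot)}
\end{aligned}
```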
]]></content:encoded>
			<wfw:commentRss>http://schutt.org/blog/2009/10/zipf-law/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>One-hundred and ten</title>
		<link>http://schutt.org/blog/2009/07/110/</link>
		<comments>http://schutt.org/blog/2009/07/110/#comments</comments>
		<pubDate>Wed, 15 Jul 2009 17:35:57 +0000</pubDate>
		<dc:creator>Noel</dc:creator>
				<category><![CDATA[statistics]]></category>
		<category><![CDATA[All-Star]]></category>
		<category><![CDATA[baseball]]></category>
		<category><![CDATA[doping]]></category>

		<guid isPermaLink="false">http://schutt.org/blog/?p=275</guid>
		<description><![CDATA[I had a flash of inspiration this morning. A story about the All-Star game came on the radio and got me thinking. I&#8217;ve always been annoyed when people, often athletes, say &#8220;I gave it one-hundred and ten percent,&#8221; or something along those lines. Now, there are plenty of times when percentages over 100 make sense, [...]]]></description>
			<content:encoded><![CDATA[<p>I had a flash of inspiration this morning. A <a href="http://www.npr.org/templates/story/story.php?storyId=106615999" title="Frank Deford on NPR">story</a> about the <a href="http://www.nytimes.com/2009/07/15/sports/baseball/15allstar.html" title="2009 MLB All-Star game">All-Star game</a> came on the radio and got me thinking.</p>
<p>I&#8217;ve always been annoyed when people, often athletes, say &ldquo;I gave it one-hundred and ten percent,&rdquo; or something along those lines. Now, there are plenty of times when percentages over 100 make sense, but, by definition, your best <em>is</em> 100%. Unless . . . That&#8217;s it! The athletes <em>are</em> giving one-hundred and ten percent of their natural ability. They are admitting to <a href="http://www.wada-ama.org/en/" title="WADA">doping</a>. Their apparent abuse of basic math is really a veiled admission of using performance-enhancing drugs. The extra ten percent is the drugs!</p>
]]></content:encoded>
			<wfw:commentRss>http://schutt.org/blog/2009/07/110/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Obesity</title>
		<link>http://schutt.org/blog/2009/07/obesity/</link>
		<comments>http://schutt.org/blog/2009/07/obesity/#comments</comments>
		<pubDate>Thu, 02 Jul 2009 12:30:30 +0000</pubDate>
		<dc:creator>Noel</dc:creator>
				<category><![CDATA[statistics]]></category>
		<category><![CDATA[Indiana]]></category>
		<category><![CDATA[obesity]]></category>
		<category><![CDATA[overweight]]></category>
		<category><![CDATA[USA]]></category>

		<guid isPermaLink="false">http://schutt.org/blog/?p=232</guid>
		<description><![CDATA[One of today&#8217;s headlines is &#8220;Hoosier obesity rate flat at 27.4%.&#8221; The article cites a RWJF/Trust for America&#8217;s Health report, which included this map and an interesting interactive map. The original data are from the CDC National Center for Health Statistics. Sources: The Journal Gazette: Hoosier obesity rate flat at 27.4% Trust for America&#8217;s Health: [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.rwjf.org/pr/interactive.jsp?id=37" title="interactive map of American obesity"><img src="http://schutt.org/blog/wp-content/uploads/2009/07/obesity-1991-2008.png" alt="Obesity 1991 to 2008" title="Obesity 1991 to 2008" width="438" height="241" class="aligncenter size-full wp-image-233" /></a></p>
<p>One of today&#8217;s headlines is &ldquo;<a href="http://www.journalgazette.net/article/20090702/LOCAL/307029948" title="Hoosier obesity rate flat at 27.4%">Hoosier obesity rate flat at 27.4%</a>.&rdquo; The article cites a RWJF/Trust for America&#8217;s Health report, which included this map and an interesting <a href="http://www.rwjf.org/pr/interactive.jsp?id=37">interactive map</a>. The original data are from the CDC National Center for Health Statistics.</p>
<p>Sources:</p>
<ul>
<li>The Journal Gazette: <a href="http://www.journalgazette.net/article/20090702/LOCAL/307029948" title="Hoosier obesity rate flat at 27.4%">Hoosier obesity rate flat at 27.4%</a>
</li>
<li>Trust for America&#8217;s Health: <a href="http://healthyamericans.org/reports/obesity2009/" title="F as in Fat 2009">F as in Fat 2009</a>
</li>
<li>RWJF: <a href="http://www.rwjf.org/childhoodobesity/product.jsp?id=45050" title="F as in Fat 2009">F as in Fat 2009</a>
</li>
<li>CDC: <a href="http://www.cdc.gov/nchs/fastats/overwt.htm" title="CDC">Overweight Prevalence</a>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://schutt.org/blog/2009/07/obesity/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
