
Section 3.5 Error Analysis

Subsection 3.5.1 Introduction to Error Analysis

Fake news is one type of “misinformation” or “disinformation.” However, much of the “news” we encounter can be simply misleading. Some people or news outlets convey misleading quantitative information unconsciously; some do it deliberately. Researchers often present numerical results, and conclusions based on those results, without also putting forth enough effort to calculate and report the uncertainties and inaccuracies in those results. Perhaps more insidious, many professionals construct quantitative information to serve a predisposition, a political goal, or a business objective. What is an individual reader or consumer of this information to do?

One thing you can do is apply something called uncertainty and error analysis. This can be done informally as you read through a piece, but error analysis is also a rigorous mathematical discipline in its own right.

Let’s first consider a real-world situation in which uncertainty proved to be catastrophic. In 1986, the NASA Space Shuttle Challenger, carrying the first civilian passenger, exploded shortly after takeoff. A post-mortem, or after-event, analysis was conducted, and the NASA team struggled to explain to the public what had happened, until Richard Feynman, a well-known physicist, dropped a rubber O-ring (a component used to seal off an area) into a beaker of ice water and showed how the rubber became inflexible. It turned out that the failure of this O-ring was the root cause of the explosion.

The ability of that O-ring to hold up during the near-freezing temperatures at launch was an uncertainty which had propagated through to the launch event. It may be that the NASA engineering team neglected to think through all of the components and how they might fare under adverse conditions. It may have been that the launch team made a fatal assumption “in the moment” that the O-ring would remain stable enough. Whatever the deeper explanation, uncertainty surrounding that O-ring’s behavior at those temperatures led to a catastrophe. Uncertainty analysis, which is also more broadly categorized as risk assessment, is designed to flag such issues, especially for enormously complex engineering systems like a spaceship.

Now let’s turn to error analysis as a discipline. Unless it is a highly theoretical treatment, most quantitative and numerical analysis begins with measurements of some kind. Temperature is a measurement. So is an answer to a question in an opinion poll. All measurements have error. Unfortunately, that error propagates through the analysis, and through the event the analysis supports, and can even get magnified.

Here is a simple example of how error gets magnified, one that also illustrates rigor: Suppose I want to calculate the volume of a large cubic concrete tank. That’s pretty easy from a math perspective. The volume of a cube is the length of one side cubed, or \(V = L^3\text{,}\) where \(V\) is volume and \(L\) is length. So I just have to measure the length of a side. I take a tape measure and measure the side. Let’s say I measure it at 36 inches. There is an inherent inaccuracy to the tape measure; let’s say it is \(\pm 1/8\) in. Oh, and did I make sure that the tape measure was completely straight and rigid from one side to the other? Maybe not. Maybe there’s a 1-inch error because of that. Or maybe I stretched my arms so far that I really only “eyeballed” the length.

Let’s assume there’s a total of 2 in. of error in my measurement. I calculate the volume as \(36*36*36 = 46656 \text{ in}^3\text{.}\) Now, to report the error in this numerical result, I also have to concede that the length could be anywhere between 34 and 38 inches. The volumes based on those two numbers are 39,304 and 54,872 cubic inches. Suddenly, that’s a pretty big spread!
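
To make this concrete, here is a short Python sketch (using only the numbers from the example above) that evaluates the cube-volume formula at the low end, the measured value, and the high end of the length:

    # Propagate the measurement error through the cube-volume formula V = L^3.
    measured_length = 36  # inches
    error = 2             # total estimated error in the length measurement, inches

    for label, L in [("low", measured_length - error),
                     ("measured", measured_length),
                     ("high", measured_length + error)]:
        print(f"{label}: L = {L} in, volume = {L**3:,} in^3")

Running this prints the three volumes from the paragraph above: 39,304, 46,656, and 54,872 cubic inches.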

Whether the error is significant or not depends on what you do with the number. The error could have a financial impact if you are buying the materials to construct such a concrete tank. The error could have little or no impact if you just need an estimate for something else.

The critical point is that the error in the original measurement has now been propagated and magnified in the final numerical result.

Now suppose I want to calculate the volume of the cylindrical fiberglass water reclamation tank I am staring at in my backyard as I write this. The volume of a cylindrical object is the area of the base times the height, or \(V = \pi r^2 h\text{,}\) since the area of a circle is \(\pi r^2\text{.}\) In this case, I have to make two measurements, height and diameter (the diameter is twice the radius, represented as \(r\) in the equation). I can measure the height with a tape measure, although that tank has been sitting there for 15 years, so I’m not sure how much of the bottom is under the gravel surrounding it. I can “back-calculate” the radius by measuring the circumference with the tape measure.

You can probably see where this is going: two measurements, each with its own sources of error, propagated and magnified into the final numerical result.
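
Here is a minimal sketch of the same idea for the cylinder, assuming hypothetical measurements: a height of \(48\pm 2\) inches and a circumference of \(150\pm 2\) inches (these numbers are illustrative, not from the text). Pairing the low ends and the high ends of both measurements gives worst-case bounds on the volume:

    import math

    # Cylinder volume V = pi * r^2 * h, with the radius back-calculated
    # from the measured circumference: C = 2 * pi * r, so r = C / (2 * pi).
    height = (46, 50)           # hypothetical height interval, inches
    circumference = (148, 152)  # hypothetical circumference interval, inches

    def volume(C, h):
        r = C / (2 * math.pi)
        return math.pi * r**2 * h

    low = volume(circumference[0], height[0])
    high = volume(circumference[1], height[1])
    print(f"volume between {low:,.0f} and {high:,.0f} in^3")

Two modest 2-inch measurement errors widen the volume estimate by more than 10,000 cubic inches.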

Now consider some numerical results with grave consequences. The forecasts for global climate change and its consequences involve some of the most sophisticated measurements and computer modeling humanity has ever devised. While it might seem overwhelming to consider the error in each of the measurements and how they are propagated, you can take comfort in the fact that thousands of researchers around the world are arriving at similar conclusions. Only a small percentage of scientists deny that human-induced climate change is advancing dangerously.

But here’s how politics comes into play. Climate policy makers, and climate deniers, use a parameter called the “social cost of carbon” to “monetize” the effects of climate change. This is a very complex analysis. But realize that the Environmental Protection Agency under the Obama administration used a figure of around $50 per ton of carbon, while Trump administration officials revised the analysis and came up with a figure of around $7 per ton. How could such an important figure be so different? The answer is that the analysis involves several key assumptions. Change the assumptions and you can drastically change the result.

Many news reports containing quantitative results will give, at best, a cursory view of the error in those results. Journalists don’t have an infinite amount of column inches to devote to the story. Researchers often are limited by the number of pages they can publish for a journal article. But realize that every one of these numerical results has sources of error, often significant or even huge, and many of them will not be exposed. It is up to the reader to think through them.

So, one root cause of misinformation or misleading information is the error in the original measurements. Sadly, much of what we are presented as “information” is based on measurements which are not physical measurements, as in the examples above, but are data arrived at in various ways. If I want to understand public sentiment about gun violence, for example, I can commission an opinion poll. There are rigorous procedures for taking polls and then doing statistical analysis on the results. But statistical error is very different from, say, the biases inherent in the person formulating the questions for the poll.

Sadly, much news we consume is not even intended to pursue an “objective truth” (even as a direction), but instead is constructed to support a position. Consulting firms, government departments, policy shops, NGOs, and others spend or raise millions of dollars to generate reports which are consumed by elected officials, and breathlessly reported by journalists, but that often amount to a string of assumptions wrapped around cherry-picked data presented in dazzling, colorful graphs (see next section).

Our message here is that fake news often has quantitative and numerical components to it, but you don’t need to know math or be a “quant type” to think a bit more deeply about those results. You do need to be a healthy skeptic, however, and simple “qualitative tools,” like error analysis, will help you.

In closing this section, it is important not to throw your hands up and say “all information is tainted, there is no truth.” “Facts” and “truth” are asymptotic, which means simply that we can get closer and closer even if there is always some doubt or “error.” Error and uncertainty, however, are compensated for as you consider the following “path” towards real knowledge and away from fake news:

  • Coincidence or randomness – one thing that happens (“I was assaulted in the park so this must be a high-crime neighborhood”) or two things that happen around the same time (“my friend was assaulted in the same park the same week”) may constitute coincidence or randomness, but not real information. Something similar can be said for a researcher or policy expert who reports results from an analysis or experiment that has not been repeated or validated by others.

  • Correlation – an association between two things (e.g., gun sales and violent crime in a specific region) may show a weak or strong statistical correlation, or a significant association, which would have to be corroborated to constitute information you can rely on.

  • Causation – something caused by another thing (human industrial and consumption activity and global average temperature rises, as opposed to natural causes of those temperature rises) is a much higher bar to scale. This requires many independent and repeatable studies, perhaps coming at the problem from different directions.

  • Convergence – Specialists and experts collaborate and review each other’s work and begin to converge on a common “theory of the case” or explanation for why something is occurring.

  • Consensus – experts can gather in a room and nod their heads in agreement, but so what? Consensus among decision makers and their constituencies is then necessary for any action to be taken based on the information and knowledge.

Error and uncertainty in the information are progressively and incrementally reduced to low or insignificant levels as the analysis proceeds along the path of the “five C’s.”

Any one of these steps alone is insufficient. After all, the townspeople of Trent, Italy, achieved consensus around the fake news of the murder of Simon by the Jews in their community, and acted on the fake news. They did not withhold judgment until the “analysis” came out.

Subsection 3.5.2 Error Propagation

Let's look at an example of how errors can propagate through a calculation, and how that can produce very different results.

In May 2024, the Congressional Budget Office (CBO), a non-partisan group which analyzes the cost of potential legislation, posted an analysis of a tax plan presented by the Republican Party [3.10.1.30]. The analysis said that the tax plan, which called for tax cuts on wealthier taxpayers and corporations, would cost $4.6 trillion (\(10^{12}\)) dollars over the next 10 years.

The mathematics that the CBO uses to make these calculations is pretty far beyond where we are in this class, but we'll focus on something else very interesting in this report. The CBO previously had estimated that the tax plan would cost $3.5 trillion over 10 years - an estimate that was off by $1.1 trillion, which is a huge amount. How could their estimate have been off by so much? And more importantly, which of these estimates is correct? A group which is in favor of the tax plan might want to use the smaller number to make the point that the tax cuts would not cost the government as much, while a group which is opposed could use the higher number to argue that their opposition is justified.

When we make estimates of something, there will always be some expected error - if it were perfect, it wouldn't be an estimate! For example, imagine I asked you to estimate how much money you would spend on food this week - the chances of you getting the answer exactly correct are pretty small. When making mathematical estimates, mathematicians usually prefer to use a range of values - for example, a mathematician might say "I'll spend $50-$150 on food this week." This is much more likely to be a correct statement than saying "I will spend exactly $100 on food this week," since the correct value only needs to be in the interval!

If we give our estimate as an interval, half of the width of that interval is called the margin of error. In our example above, we would say that our estimate of food costs is $100, with a margin of error of \(\frac{150-50}{2} = 50\text{.}\) We can also write the interval as \(100\pm 50\text{.}\)

When we use estimates like this, the errors from multiple estimates can combine, a process called error propagation. For example, we know that \(10*20 = 200\text{.}\) However, assume that we had an estimate of \(a = 10\pm 5\) times an estimate of \(b = 20\pm 3\text{.}\) Our values for \(a\) range from 5 to 15, and our values for \(b\) range from 17 to 23. That means our product \(a*b\) could be as small as \(5*17 = 85\) and as large as \(15*23 = 345\text{.}\) When we multiply an estimate with a margin of error of \(\pm 5\) times an estimate with a margin of error of \(\pm 3\text{,}\) we get a margin of error of \(\frac{345-85}{2} = 130\text{!}\) This error propagation explains why we can get such huge variation in estimates, especially when our numbers are large - even when our initial estimates are very good.
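
Here is a quick Python sketch that reproduces this arithmetic: form all four endpoint products, take the smallest and largest, and halve the width of that interval to get the margin of error.

    # Error propagation through a product of two interval estimates.
    a_center, a_margin = 10, 5   # a = 10 +/- 5
    b_center, b_margin = 20, 3   # b = 20 +/- 3

    # The product of two intervals is bounded by the products of their
    # endpoints, so check all four combinations.
    products = [(a_center + sa * a_margin) * (b_center + sb * b_margin)
                for sa in (-1, 1) for sb in (-1, 1)]
    low, high = min(products), max(products)
    margin = (high - low) / 2
    print(f"product between {low} and {high}, margin of error {margin}")
    # prints: product between 85 and 345, margin of error 130.0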

We'll look at a very simplified version of this problem to illustrate how such large errors can arise.

Activity 3.5.1.

Let's imagine a situation where everyone pays 10% tax on their income. A city wants to estimate the tax that their citizens will pay. They estimate that there are between 2.5 and 2.7 million people in the city, and that the average taxable income of those people is between $47,000 and $49,000.

(a)

Use the low estimates of the population and average income to calculate the lower end of the estimated tax collected.

Solution.

We multiply the low end of the estimate for the population (2,500,000) times the low end of the estimate for the income ($47,000) times 10% (0.1).

\begin{equation*} 2,500,000*47,000*0.1 = 1.175*10^{10} \end{equation*}

Our low estimate for the tax collected is 11.75 billion dollars.

(b)

Use the high estimates of the population and average income to calculate the higher end of the estimated tax collected.

Solution.

We multiply the high end of the estimate for the population (2,700,000) times the high end of the estimate for the income ($49,000) times 10% (0.1).

\begin{equation*} 2,700,000*49,000*0.1 = 1.323*10^{10} \end{equation*}

The high estimate for the tax collected is 13.23 billion dollars.

With error propagation, the more quantities we combine in our estimate, the greater the margin of error in our final calculation.

Activity 3.5.2.

Consider a city now where, rather than everyone paying 10% tax, people pay between 8% and 12%. Repeat the calculations from Activity 3.5.1 with this variable tax rate. What are the low and high ends for your estimates now?

Solution.

The low estimate is

\begin{equation*} 2,500,000*47,000*0.08 = 9.4*10^9 \end{equation*}

and the high estimate is

\begin{equation*} 2,700,000*49,000*0.12 = 1.5876*10^{10} \end{equation*}

The estimated tax is now between $9.4 billion and approximately $15.9 billion.
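
To see the compounding in one place, here is a short Python sketch (using only the interval estimates from the two activities) that combines all three intervals - population, average income, and tax rate - and reports the resulting margin of error:

    # Combine three interval estimates: population, average income, tax rate.
    population = (2_500_000, 2_700_000)  # people
    income = (47_000, 49_000)            # average taxable income, dollars
    tax_rate = (0.08, 0.12)              # tax rate as a fraction

    # All quantities are positive, so the extremes of the product occur
    # at the endpoints of the intervals.
    low = population[0] * income[0] * tax_rate[0]
    high = population[1] * income[1] * tax_rate[1]
    margin = (high - low) / 2
    print(f"tax between ${low/1e9:.2f} billion and ${high/1e9:.2f} billion")
    print(f"margin of error: ${margin/1e9:.2f} billion")

Even though each individual estimate looks reasonably tight, the combined margin of error is about $3.24 billion - roughly a quarter of the midpoint estimate.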