Date: 2020-11-25. © Washington Examiner

Executive Summary

An update in Michigan listed as of 6:31AM Eastern Time on November 4th, 2020, which shows 141,258 votes for Joe Biden and 5,968 votes for Donald Trump.

An update in Wisconsin listed as of 3:42AM Central Time on November 4th, 2020, which shows 143,379 votes for Joe Biden and 25,163 votes for Donald Trump.

A vote update in Georgia listed as of 1:34AM Eastern Time on November 4th, 2020, which shows 136,155 votes for Joe Biden and 29,115 votes for Donald Trump.

An update in Michigan listed as of 3:50AM Eastern Time on November 4th, 2020, which shows 54,497 votes for Joe Biden and 4,718 votes for Donald Trump.

Background

The Concept, the Intuition, and the Measurement

Measuring This Relationship Between the Candidate's Margin and Their Ratio

The difference between the number of votes for Biden and the number of votes for Trump — the "margin"

The logarithm[4] of the ratio between the number of votes for Biden and the number of votes for Trump — the "log-ratio"
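A minimal sketch of the two measures in Python, applied to the four updates quoted in the Executive Summary. The function names `margin` and `log_ratio` are illustrative and are not taken from the report's own code; the positivity check mirrors the exclusions described in the footnotes (zero denominators and negative counts). Note that subtracting the displayed Wisconsin counts gives 118,216, whereas the report cites 118,215, computed from unrounded values.

```python
import math

# Vote updates quoted in the Executive Summary: (label, Biden votes, Trump votes).
UPDATES = [
    ("Michigan, 6:31AM ET", 141_258, 5_968),
    ("Wisconsin, 3:42AM CT", 143_379, 25_163),
    ("Georgia, 1:34AM ET", 136_155, 29_115),
    ("Michigan, 3:50AM ET", 54_497, 4_718),
]


def margin(biden: int, trump: int) -> int:
    """Biden - Trump vote difference for one update."""
    return biden - trump


def log_ratio(biden: int, trump: int) -> float:
    """Natural log of the Biden:Trump ratio for one update.

    Updates with a zero or negative count for either candidate are rejected,
    because their ratio is undefined or ambiguous (-5000/20 vs 5000/-20).
    """
    if biden <= 0 or trump <= 0:
        raise ValueError("log-ratio requires positive counts for both candidates")
    return math.log(biden / trump)


if __name__ == "__main__":
    for label, b, t in UPDATES:
        print(f"{label}: margin={margin(b, t):,}  ratio={b / t:.2f}  log-ratio={log_ratio(b, t):.3f}")
```

A positive log-ratio favors Biden and a negative one favors Trump, with equal magnitudes for mirror-image splits, which is why it pairs naturally with a signed margin.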

Ratios exhibit an important property: the farther ahead a candidate is, the harder it is to move the next 1 percent ahead. They reflect the relative difficulty of each marginal vote as the pool of remaining votes shrinks. As a candidate approaches 0% or 100% of the vote, the ratio of that candidate's votes to the other candidate's votes converges to zero or to infinity, respectively, and it does so at very different rates. Ratios also allow us to spot a potential sign of fraud: unusually low ratios between the losing (major) candidate and other, less well-known candidates. Because those who watch and participate in elections tend not to think in these terms, if there is fraud, its perpetrators are much less likely to have covered their tracks in this respect. A tin-pot-dictator-style election where the favored candidate gets 99% of the vote is obviously suspect, but less attention is often paid to details like whether the ratio between the most popular losing candidate and long-shot third-party candidates actually makes sense[5]. Looking at metrics which are less common in practical use will be tremendously helpful here, as we will see.
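To make the first property concrete, the sketch below assumes a simple two-candidate split (third-party votes ignored) and asks how much the ratio must rise for the leading candidate to gain one more percentage point of the two-way vote. The helper names are hypothetical. Near an even split, a one-point gain barely moves the ratio; going from 98% to 99% takes the ratio from 49 to 99. Taking the logarithm makes the picture symmetric: a 90/10 split and a 10/90 split have log-ratios of equal size and opposite sign.

```python
import math


def ratio_from_share(share: float) -> float:
    """Ratio of candidate A's votes to candidate B's, given A's share of the
    two-candidate vote (0 < share < 1)."""
    if not 0.0 < share < 1.0:
        raise ValueError("share must be strictly between 0 and 1")
    return share / (1.0 - share)


def ratio_increase(share: float, step: float = 0.01) -> float:
    """Increase in the ratio when A's two-way share grows by `step`."""
    return ratio_from_share(share + step) - ratio_from_share(share)


if __name__ == "__main__":
    for s in (0.50, 0.75, 0.90, 0.98):
        print(f"{s:.0%} -> {s + 0.01:.0%}: ratio rises by {ratio_increase(s):.3f}")
    # The log-ratio is antisymmetric: log(r(p)) == -log(r(1 - p)).
    print(math.log(ratio_from_share(0.9)), math.log(ratio_from_share(0.1)))
```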

A Look at Michigan

The update with the next largest margin was an update with merely 7,776 votes, while this update had over 7 times as many votes and broke more heavily for Biden.

In particular, it calls into serious question the veracity of this vote update, and is perhaps some of the strongest direct evidence of fraud in this entire report.

A Look at Wisconsin

A Look at Georgia

A Short Survey of Other States

Consolidating, Comparing, and Measuring

Quantifying the Extremity

The graph is presented in two dimensions, but it is really three-dimensional, with the density of points acting as the third dimension. It is visibly much denser in the center and has what appear to be something like two normal distributions, and the farther you move from the origin along a positive-sloping line through the origin, the lower the density you can expect. The outer "edges" of the graph, in the top-right and bottom-left quadrants, closely resemble the shape of the curve y = 1 / x.
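The footnotes describe the ingredients for measuring this: margins and log-ratios are each standardized into z-scores, and points with identical products of the two lie on curves of the form y = k / x. A minimal sketch of a co-extremity score built that way follows. It assumes the score is simply the product of the two z-scores; the function names and toy data are hypothetical. It uses the population standard deviation (the default in scikit-learn's `StandardScaler`); R's `scale()` divides by the sample standard deviation instead, which multiplies every product by the same constant and leaves the ranking unchanged.

```python
import math
import statistics


def zscores(values: list[float]) -> list[float]:
    """Center on the mean and scale by the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def co_extremity(updates: list[tuple[int, int]]) -> list[float]:
    """Product of the z-scored margin and z-scored log-ratio for each update.

    `updates` holds (biden, trump) pairs with both counts positive. Large
    positive scores fall in the top-right or bottom-left quadrant, where both
    measures are extreme in the same direction.
    """
    margins = [b - t for b, t in updates]
    log_ratios = [math.log(b / t) for b, t in updates]
    return [zm * zr for zm, zr in zip(zscores(margins), zscores(log_ratios))]


def most_co_extreme(updates: list[tuple[int, int]], k: int) -> list[int]:
    """Indices of the k updates with the largest co-extremity scores."""
    scores = co_extremity(updates)
    return sorted(range(len(updates)), key=scores.__getitem__, reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical toy data: four ordinary updates and one large, lopsided one.
    toy = [(1_000, 900), (1_100, 1_000), (950, 1_000), (1_200, 1_150), (50_000, 2_000)]
    print(most_co_extreme(toy, 1))
```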

Predicting More Typical Results and Assessing Their Implications

This raises the obvious question: what might these vote updates look like if they were less extreme?
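One ingredient the footnotes describe for this step is undoing standardization: multiply a z-score by the standard deviation, then add the mean. The sketch below shows that inversion together with one way, assumed here rather than taken from the report, to turn a (margin, log-ratio) pair back into vote counts. With margin M = B - T and ratio r = B / T = e^L, the counts are T = M / (r - 1) and B = rT, which requires M and L to share a sign and r ≠ 1. The function names are hypothetical, and in practice the means and standard deviations would come from the full data set.

```python
import math


def unstandardize(z: float, mean: float, std: float) -> float:
    """Invert z = (x - mean) / std."""
    return z * std + mean


def votes_from_margin_and_log_ratio(margin: float, log_ratio: float) -> tuple[float, float]:
    """Recover (biden, trump) from M = B - T and L = ln(B / T).

    Solves B = r * T with r = e^L, so T = M / (r - 1). A positive solution
    needs M and L to be nonzero and to have the same sign.
    """
    if margin * log_ratio <= 0:
        raise ValueError("margin and log-ratio must be nonzero and share a sign")
    r = math.exp(log_ratio)
    trump = margin / (r - 1.0)
    return r * trump, trump


if __name__ == "__main__":
    # Round trip on the Michigan 3:50AM update quoted earlier.
    print(votes_from_margin_and_log_ratio(49_779, math.log(54_497 / 4_718)))
```

The round trip on the Michigan 3:50AM update (margin 49,779, ratio about 11.55) recovers approximately 54,497 and 4,718; a less extreme hypothetical update would be obtained by choosing smaller z-scores, unstandardizing them, and passing the results to the same function.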

At the very least, it is possible to definitively say that Joe Biden's victory in all three of these states relied on four of the seven most co-extreme vote updates in the entire data set of 8,954 vote updates.
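The footnotes note that the report's rough probabilities assume sampling with replacement, which slightly understates the improbability compared with sampling without replacement. The sketch below compares the two calculations for the chance that at least k of the top n updates come from a group of m updates out of N. N = 8,954 and the "four of the seven" framing come from the text; the group size m = 1,000 is a placeholder, since the number of updates contributed by Michigan, Wisconsin, and Georgia is not given in this section. The binomial tail models drawing with replacement and the hypergeometric tail models drawing without replacement.

```python
from math import comb


def tail_with_replacement(N: int, m: int, n: int, k: int) -> float:
    """P(at least k of n draws land in a group of size m), with replacement (binomial)."""
    p = m / N
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def tail_without_replacement(N: int, m: int, n: int, k: int) -> float:
    """Same event, drawing without replacement (hypergeometric)."""
    return sum(comb(m, i) * comb(N - m, n - i) for i in range(k, n + 1)) / comb(N, n)


if __name__ == "__main__":
    N, n, k = 8_954, 7, 4  # from the text: 4 of the 7 most co-extreme of 8,954 updates
    m = 1_000              # hypothetical group size, NOT a figure from the report
    print(tail_with_replacement(N, m, n, k), tail_without_replacement(N, m, n, k))
```

For these placeholder values the without-replacement tail comes out slightly smaller, consistent with the footnote's remark that the with-replacement simplification understates the improbability.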

Important Considerations

If it cannot be shown that the ballots counted during these spikes were qualitatively different from all other vote updates in Michigan, then the results are likely too extreme along multiple dimensions to be accepted at face value.

Conclusion

Footnotes:

It is stored as a zip of a folder containing everything needed to deterministically reconstruct the entirety of this report. The SHA256 hash of this file is fc1d9e17fc831e288609099e290f4d0152918f6365e7a602f7bd37dbe5347546.

The time-series data provided by the New York Times contains what appear to be precise vote tallies along with vote proportions which are truncated after three decimal places. This introduces some imprecision, which becomes more meaningful as the vote total grows. The implications of this and various mechanisms for estimating the true vote total are discussed in the data gathering and processing appendix of this report.

There are several updates in this data set where the implied vote update suggests a loss of votes for one or both candidates. A more detailed discussion of that can be found in the appendix on the data collection.

To compute ratios, we must exclude updates where the denominator is zero, as well as updates with a negative value for either candidate, because a negative value makes the ratio meaningless (e.g. -5000/20 is indistinguishable from 5000/-20, even though an update where candidate A loses 5,000 votes while candidate B gains 20 is fundamentally different from one where the converse is true).

We use the natural logarithm, but any logarithm base would have the desired properties here.

This is especially important because a good way to push the margin in a precinct without running up a high percentage is to inflate the votes for long-shot candidates while depressing the votes for the most likely alternative. See, e.g., this paper. An archived version is provided here, should it become unavailable for any reason.

Restricted to updates with a non-negative other vote.

Values computed from the Michigan Secretary of State's website here.

The skeptical reader will likely have immediately noticed that 143,379 - 25,163 = 118,216, not 118,215. This is an artifact of how the NYT data set presents these updates, i.e. as vote proportions truncated after 3 decimal places.
The implications of this and various ways of minimizing the error it introduces are discussed in the Data-Gathering and Processing appendix to this report.

Many technology websites which discuss machine learning and training models mention standardization as an important data pre-processing technique. As a reflection of its popularity, major software libraries designed for data science and machine learning tend to support it out of the box, such as the highly popular scikit-learn library for the Python programming language (see here). The R language, designed for data processing and analysis tasks, even provides it in its base installation.

Ideally we would be able to do this more granularly, e.g. at the county level.

This appears to be the only publicly available Presidential race time-series data online which covers election night. Others, such as this county-level time-series file, exist, but do not begin until the morning of November 4th. Thus, while they are more precise (and therefore valuable for investigations of, e.g., digit frequency), they are of far less use for systematic analysis of patterns that are not sensitive to rounding errors.

Restricted to updates where the vote count for both candidates was positive.

This one was also both a very lopsided and a large update, going 54,497 for Biden to 4,718 for Trump, for a Biden - Trump margin of 49,779 and a Biden:Trump ratio of about 11.55:1. In clear contrast to what this distribution predicts, it ranked second-largest in both margin and ratio.

See footnote 10.

The next-largest update in Wisconsin in terms of Biden - Trump margin arrived at 8:26pm CST on November 3rd and went 53,016 to 13,517, for a margin of 39,499, roughly one third of the 118,215 margin of the 3:42am update.
That update, which arrived at 12:36am CST on November 4th, went 3,037 for Biden to 495 for Trump, a ratio of 6.14:1 but with a margin of only 2,543 (see footnotes 10 and 17 regarding rounding errors). This is the kind of data point the distribution leads us to expect: only one of the two values is very large in magnitude.

These probabilities are rough not only because they are rounded but also because they are calculated assuming sampling with replacement. We are, however, sampling without replacement: a given vote update can occupy only one position among the top 10 most co-extreme vote updates. Making this simplifying assumption slightly understates the actual improbability of these states being so well-represented in the top four, seven, and ten most co-extreme vote updates.

Generally speaking, this means the set of points for which some function produces the same value. In this context, it means the set of points with identical products, i.e. curves of the form y = k / x.

That is, it is rare for a vote update to have both an extreme ratio favoring one candidate and an extreme margin between the vote numbers.

This, like all other values presented here, is rounded to three decimal places.

z-scores are used here only to center and scale data so that distributions with values of different magnitudes can be compared. We make no assumptions about the normality of any of those distributions, and z-scores are never used as a hypothesis test statistic at any point in this report.

Since standardization subtracts the mean and divides by the standard deviation, reversing it means multiplying by the standard deviation and then adding the mean.

See above.

See above.

107,143 - 21,250 = 85,893, not 85,892. Numbers reported here are the results of computations performed on unrounded values.

See footnote 22.

See above.

Which, to the author's knowledge, are original.
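The footnotes attribute the off-by-one discrepancies (118,216 vs 118,215 and 85,893 vs 85,892) to the New York Times feed reporting vote proportions truncated after three decimal places. The sketch below assumes a candidate's count is reconstructed as total × share and shows why that error grows with the vote total: truncation can shave up to 0.001 off a share, so the implied count can be short by up to total / 1,000 votes, and an update computed as the difference of two snapshots can be off by up to the sum of both snapshots' errors. The helper names and sample values are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN


def truncate_share(share: str, places: int = 3) -> Decimal:
    """Truncate (not round) a vote share to `places` decimal places.

    The share is passed as a string so Decimal sees the exact digits
    rather than a binary floating-point approximation.
    """
    quantum = Decimal(1).scaleb(-places)  # e.g. 0.001 for places=3
    return Decimal(share).quantize(quantum, rounding=ROUND_DOWN)


def implied_votes(total: int, share: Decimal) -> Decimal:
    """Candidate votes implied by a precise total and a (possibly truncated) share."""
    return total * share


def max_truncation_error(total: int, places: int = 3) -> Decimal:
    """Worst-case shortfall in implied votes caused by truncating the share."""
    return total * Decimal(1).scaleb(-places)


if __name__ == "__main__":
    total = 5_000_000                   # hypothetical cumulative vote total
    true_share = "0.5128"               # hypothetical exact share
    shown = truncate_share(true_share)  # what a truncated feed would display
    print(shown, implied_votes(total, Decimal(true_share)) - implied_votes(total, shown))
    print(max_truncation_error(total))
```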