Package 'elec' reference manual

Title:	Collection of Functions for Statistical Election Audits
Description:	This is a bizarre collection of functions written to do various sorts of statistical election audits. There are also functions to generate simulated voting data, and simulated "truth" so as to do simulations to check characteristics of these methods.
Authors:	Luke Miratrix
Maintainer:	Luke Miratrix <[email protected]>
License:	GPL (>= 2)
Version:	0.1.2.2
Built:	2025-03-04 05:02:33 UTC
Source:	https://github.com/lmiratrix/elec

Statistical Election Audits Package

Description

This is a collection of functions written to do various sorts of statistical election audits. There are also functions to generate simulated voting data, and simulated “truth” so as to do simulations to check characteristics of these methods. The package includes two data sets consisting of actual reported voting results for races held November, 2008, in California. It also includes actual audit data for one of these races.

Package:	elec
Type:	Package
Version:	0.1
Date:	2009-01-14
License:	GPL (>= 2)
LazyLoad:	yes

There are three general audit styles implemented in this package. For each style there are two main computational tasks provided: estimate the needed sample size and expected workload, and calculate $P$-values for a given audit result. The three methods are CAST (see CAST.calc.sample and CAST.audit), the Trinomial Bound (see tri.calc.sample and trinomial.audit), and the Kaplan-Markov (KM) Bound (see KM.calc.sample and KM.audit).

The examples primarily use a data set included in the package, santa.cruz and santa.cruz.audit, which holds the ballot counts for a Santa Cruz, CA race that we audited using these methods. See trinomial.bound for how these data were analyzed. The yolo data set holds precinct level counts for a race in Yolo county.

There are also many functions allowing for construction of new audit methods and simulations. This includes methods that generate fake race data that can be used for computational simulations to assess the efficacy of different auditing approaches (see, e.g., make.sample and make.truth).

The package grew out of an earlier, disorganized package that implemented general routines for election auditing. Pieces of this package are used by the aforementioned cleaner methods, but all the individual functions are still there for specific uses, such as making different tests. Start with stark.test, which has an index of these pieces in its “see also” section.

If you find yourself confused, please contact the maintainer, L. Miratrix, for help. This will help improve the clarity of the package a great deal.

Author(s)

Luke W. Miratrix

Maintainer: Luke W. Miratrix <[email protected]>

References

CAST and KM were developed by Philip B. Stark. The Trinomial bound was developed by Luke W. Miratrix and Philip B. Stark.

For general papers on election auditing see the list at http://www.stat.berkeley.edu/~stark/Vote/index.htm.

In particular, for the trinomial bound, see Luke W. Miratrix and Philip B. Stark. (2009) Election Audits using a Trinomial Bound (in press).

For the KM bound see Stark, P.B., 2009. Risk-limiting post-election audits: P-values from common probability inequalities.

For an overview of the races and the methods, see Joseph Lorenzo Hall, Philip B. Stark, Luke W. Miratrix, Elaine Ginnold, Freddie Oakley, Tom Stanionis, and Gail Pellerin. (2009) Implementing Risk-Limiting Audits in California.

Audit Plans for CAST and Trinomial Methods

Description

An audit.plan is returned by CAST.calc.sample, containing details of how to audit for a desired level of confidence. It has a print method for pretty output.

The audit.plan.tri, similarly, is an object that holds information about conducting a PPEB election audit, in particular an audit that will use the trinomial bound to analyze resultant audit data. It is what is returned by the tri.calc.sample method.

Theoretically, auditors will use the plan and go out and generate actual audit data. (You can fake it with simulations–see make.truth.) The audit data should be stored in a new data frame with new vote totals, or overstatements, for the candidates in the audited precincts. To convert from totals to overstatements, use audit.totals.to.OS. You can store that in an elec.data object under “audit”, or keep it separate.

Usage

is.audit.plan(x)

## S3 method for class 'audit.plan'
print(x, ...)

is.audit.plan.tri(x)

## S3 method for class 'audit.plan.tri'
print(x, ...)

Arguments

`x`	object to check
`...`	No extra options passed.
`audit.plan`	to print.
`audit.plan.tri`	to print.

Value

is.audit.plan: TRUE if object is an audit.plan object.

print: No return value; prints results.

is.audit.plan.tri: TRUE if object is an audit.plan.tri object.

print: No return value; prints results.

Author(s)

Luke W. Miratrix

Converting total vote counts to Over Statements

Description

This utility function takes a collection of total votes from an audit and subtracts them from the originally reported totals to give overstatement errors (i.e., how many more votes than actual a candidate was reported to have). That is, the overstatement error is REPORTED - ACTUAL.
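The REPORTED - ACTUAL convention can be illustrated with a minimal base-R sketch. It does not call the package; the data frames, precinct ids, and candidate columns `A` and `B` are hypothetical and chosen only for illustration.

```r
## Reported totals for three precincts (hypothetical data).
reported <- data.frame(PID = c("p1", "p2", "p3"),
                       A = c(120, 95, 210),
                       B = c(100, 90, 150),
                       stringsAsFactors = FALSE)

## Hand-count totals for the two audited precincts (hypothetical data).
audited <- data.frame(PID = c("p3", "p1"),
                      A = c(208, 120),
                      B = c(150, 103),
                      stringsAsFactors = FALSE)

## Line up reported rows with audited rows by precinct id.
idx <- match(audited$PID, reported$PID)
cands <- c("A", "B")

## Overstatement = REPORTED - ACTUAL, per candidate and precinct.
overstatement <- data.frame(PID = audited$PID,
                            reported[idx, cands] - audited[, cands],
                            stringsAsFactors = FALSE)
print(overstatement)
```

A positive entry means the candidate was reported to have more votes than the audit found; a negative entry is an understatement.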

Usage

audit.totals.to.OS(Z, audit)

Arguments

`Z`	Elec.data object holding the originally reported results
`audit`	A data.frame with one column per candidate that holds the totals from the audit. Each row corresponds to a precinct. Object needs a PID column with precinct ids that match the ones in Z.

Details

Make sure the audit's PID column is a character vector and not a factor. If needed, convert via audit$PID = as.character(audit$PID).

Value

A new data.frame with overstatement errors.

Author(s)

Luke W. Miratrix

Examples



## Generate a fake race, a fake audit, and then compute overstatements
Z = make.sample(0.08, 150, per.winner=0.4, R=2.01)
Z
Zb = make.ok.truth(Z, num.off=150, amount.off=5)
Zb
aud = Zb$V[ sample(1:Zb$N, 10), ]
aud
audit.totals.to.OS(Z, aud )


Functions that Compute Error Levels Given Audit Data

Description

Calculate the error amounts for all precincts in Z that were audited from the audit data, given as overstatement errors for all candidates.

compute.audit.errors uses the calc functions and the weight functions in a 1-2 combination.

calc.pairwise.e_p() is often used with err.override in simulation studies to see what effect a fixed vote impact would have on taints for the trinomial bound.

Usage

calc.overstatement.e_p(Z)

calc.pairwise.e_p(Z, audit = NULL, err.override = NULL)

Arguments

`Z`	elec.data object
`audit`	The audit object, if Z does not contain one or if an audit object other than the one in Z should be used. Used by the simulation functions, in conjunction with err.override, to generate errors for some fixed amount of error.
`err.override`	Assume a base rate of this amount of error everywhere, ignoring audit data. If non-null, use this as the found error in votes rather than the actual errors found in the audit.

Value

compute.audit.errors returns a new audit table from Z with two new columns, err and err.weighted, corresponding to the errors found in each audited precinct before and after the weight function has been applied to them.

calc.overstatement.e_p: Vector (of length of audited precincts) of found errors by precinct.

Note

Z must have an audit component, or one must be passed, for this function to make sense! Remember that audit objects have overstatements, NOT total votes for candidates. With err.override being set this is less relevant as the actual votes are usually ignored.

Author(s)

Luke W. Miratrix

Given audit data, compute P-values for a CAST audit

Description

Given audit data from a CAST audit, compute P-values and related quantities.

Usage

CAST.audit(Z, audit = NULL, plan = NULL, ...)

Arguments

`Z`	elec.data object (voter matrix)
`audit`	A data.matrix holding the audit data, if the Z object does not have one, or if it is desirable to override it. If Z has an audit object and audit is also non-null, this parameter is used and the one in Z is ignored.
`plan`	An audit.plan object that the audit was conducted under.
`...`	Passed to CAST.calc.sample if plan is null and needs to be regenerated.

Calculate Optimal CAST plan

Description

With CAST, it is sometimes advantageous to set aside small precincts and assume they are entirely in error so as to reduce the total number of precincts in the pool that we sample from. This trade-off can increase the power of the audit or, in other terms, allow us to sample fewer precincts as the chance of nabbing the large, dangerous ones is larger.

Usage

CAST.calc.opt.cut(Z, beta = 0.9, stages = 2, t = 3, plot = FALSE, ...)

Arguments

`Z`	The elec.data object
`beta`	1-`beta` is the risk of the audit failing to notice the need to go to a full manual count if it should.
`stages`	Number of stages in the audit.
`t`	The allowed vote swing that is not considered a material error.
`plot`	TRUE/FALSE. Plot the trade-off curve.
`...`	Extra arguments to the plot command.

Details

Of all cuts that produce the smallest n, it returns the smallest cut (since sometimes multiple cut-offs lead to the same sample size).

This function also plots the trade-off of sample size for a specific cut, if the plot flag is TRUE.

This function iteratively passes increasing values of small.cut to CAST.calc.sample and examines the resulting n.
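The tie-breaking rule described above (among all cuts giving the smallest n, return the smallest cut) can be sketched in base R. The trade-off values below are made up for illustration; in practice they would come from repeated planning calls, and nothing here reflects the package's internal code.

```r
## Hypothetical trade-off table: sample size n needed at each candidate cut.
cuts     <- c(0, 10, 20, 30, 40, 50)
n.needed <- c(42, 38, 35, 35, 35, 39)

## Keep every cut that attains the minimum n, then take the smallest of them.
tied.cuts <- cuts[n.needed == min(n.needed)]
result <- list(cut = min(tied.cuts), n = min(n.needed))
print(result)
```

Here cuts of 20, 30, and 40 all give n = 35, so the sketch reports the cut of 20.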

Value

Returns a list.

`cut`	Size of the optimal cut. All precincts with an error smaller than or equal to cut would not be audited, and instead be assumed to be in full error.
`n`	Corresponding needed sample size given that cut.
`q`	The number of tainted precincts that would be needed to throw the election, beyond the ones set aside due to being smaller than `cut`.

Author(s)

Luke W. Miratrix

Examples



        ## Find optimal cut for determining which small precincts
        ## we would set aside and not audit in Santa Cruz
        data(santa.cruz)
        Z = elec.data( santa.cruz, C.names=c("leopold","danner") )

        CAST.calc.opt.cut( Z, beta=0.75, stages=1, t=5, plot=TRUE )


Construct a sample for auditing using CAST

Description

Collection of functions for planning and evaluating results of a CAST election audit. CAST is a system devised by Dr. Philip B. Stark, UC Berkeley Department of Statistics.

CAST.calc.sample determines what size SRS sample should be drawn to have a reasonable chance of certification if the election does not have substantial error. It returns an audit.plan. CAST.sample takes the audit.plan and draws a sample to audit. CAST.audit takes audit data (presumably from the audit of the sample drawn in previous step) and analyzes it.

Make an audit.plan given reported results for an election. It gives back what to do for a single stage. If stages is > 1, then it adjusts beta appropriately.

Usage

CAST.calc.sample(
  Z,
  beta = 0.9,
  stages = 1,
  t = 3,
  as.taint = FALSE,
  small.cut = NULL,
  strata = NULL,
  drop = NULL,
  method = c("select", "binomial", "hypergeometric"),
  calc.e.max = TRUE,
  bound.function = maximumMarginBound
)

Arguments

`Z`	elec.data object (voter matrix)
`beta`	the confidence level desired - overall chance of correctly escalating a bad election to full recount
`stages`	number of auditing stages. Each stage will have the same confidence level, determined by a function of beta. A value of 1 is a single-stage audit.
`t`	The maximum amount of error, in votes, expected. Threshold error for escalation – if >= 1 then number of votes, otherwise fraction of margin.
`as.taint`	Boolean value. TRUE means interpret $t$ as a taint in $[0,1]$ by batch (so the threshold error will be batch-specific). FALSE means interpret $t$ as a proportion of the margin or as number of votes (as described above).
`small.cut`	Cut-off in votes–any precincts with potential error smaller than this value will not be audited and be assumed to be worst case error.
`strata`	Name of the stratification column of Z. Not needed if audit plan also being passed in case of CAST.sample. NULL means a single stratum.
`drop`	Vector of precincts to drop for whatever reasons (such as they are already known). This is a vector of TRUE/FALSE.
`method`	Method of calculation.
`calc.e.max`	Should the e.max be taken as given, or recalculated?
`bound.function`	What function should be used to calculate worst-case potential error of precincts.

Author(s)

Luke W. Miratrix

References

Philip B. Stark. CAST: Canvass Audits by Sampling and Testing. University of California at Berkeley Department of Statistics, 2009. URL: http://statistics.berkeley.edu/~stark/Preprints/cast09.pdf. Also see http://www.stat.berkeley.edu/~stark/Vote/index.htm for other relevant information.

Examples


## Make an example cartoon race (from Stark paper)
Z = make.cartoon()

## What should we do?
samp.info = CAST.calc.sample( Z )
samp.info

## Draw a sample.
samp = CAST.sample( Z, samp.info$ns )
samp

## Analyze what a CAST audit of santa cruz would entail
data(santa.cruz)
Z = elec.data( santa.cruz, C.names=c("leopold","danner") )
CAST.calc.sample( Z, beta=0.75, stages=1, t=5, small.cut=60)

Sample from the various strata according to the schedule set by 'ns'. Ignore all precincts that are known (i.e., have been previously audited).

Description

Sample from the various strata according to the schedule set by 'ns'. Ignore all precincts that are known (i.e., have been previously audited).

Usage

CAST.sample(
  Z,
  ns,
  strata = NULL,
  seed = NULL,
  print.trail = FALSE,
  known = "known"
)

Arguments

`Z`	elec.data object (voter matrix)
`ns`	EITHER an audit.plan or a vector of sample sizes for the strata. Names must correspond to the names of the strata. If ns is an audit plan, then the strata variable should not be passed as well.
`strata`	Name of the stratification column of Z. Not needed if audit plan also being passed in case of CAST.sample. NULL means a single stratum.
`seed`	Seed to use–for reproducibility.
`print.trail`	Print out diagnostics.
`known`	The column of known precincts that should thus not be selected. Similar to "drop", above.

Value

List of precincts to be audited.

Examples

Z = make.cartoon()
samp.info = CAST.calc.sample( Z )
samp.info
samp = CAST.sample( Z, samp.info )


Calculate the measured error in each of the audited precincts.

Description

Calculate the measured error in each of the audited precincts.

Usage

compute.audit.errors(
  Z,
  audit = NULL,
  calc.e_p = calc.pairwise.e_p,
  w_p = weight.function("no.weight"),
  bound.col = "tot.votes",
  err.override = NULL
)

Arguments

`Z`	Elec.data object holding the originally reported results
`audit`	A data.frame with one column per candidate that holds the totals from the audit. Each row corresponds to a precinct. Object needs a PID column with precinct ids that match the ones in Z.
`calc.e_p`	Calculate e_p or take as given.
`w_p`	The weight function to use to reweight the errors of precincts.
`bound.col`	This is the vector (in audit) containing the maximum number of votes possible in the various precincts.
`err.override`	If non-null, use this as the found error in votes rather than the actual errors found in the audit.

Value

Original audit table from Z with two new columns, err and err.weighted, corresponding to the errors found in each audited precinct before and after the weight function has been applied to them.

compute.stark.t

Description

Compute the test statistic for election audits, essentially the largest error found in the audit, as measured by the passed functions and methods.

Usage

compute.stark.t(
  Z,
  bound.col,
  calc.e_p = calc.pairwise.e_p,
  w_p = weight.function("no.weight"),
  err.override = NULL,
  return.revised.audit = FALSE
)

Arguments

`Z`	If it already has an audit table with err and err.weighted then it will use those errors, otherwise it will compute them with compute.stark.err
`bound.col`	This is the vector containing the maximum number of votes possible in the various precincts.
`calc.e_p`	Function to compute e_p. Default is calc.pairwise.e_p.
`w_p`	The weight function to be applied to the precinct error.
`err.override`	If non-null, use this as the found error in votes rather than the actual errors found in the audit.
`return.revised.audit`	Return the updated audit frame with the error and weighted errors calculated.

Details

This is an older method that other methods sometimes use; it is probably best ignored unless you have a good reason to use it.

Value

The test statistic, i.e. the maximum found error in the audit sample, as computed by calc.e_p and weighted by w_p.

Author(s)

Luke W. Miratrix

countVotes

Description

Given an elec.data object, count the votes as reported and determine winner(s) and loser(s).

Usage

countVotes(Z)

Arguments

`Z`	the elec.data object.

Value

Updated 'Z' matrix with the total votes as components inside it.

Author(s)

Luke W. Miratrix

Examples


  Z = make.cartoon()
  ## Take away 20 percent of C1's votes.
  Z$V$C1 = Z$V$C1 * 0.8
  ## Count again to find winner.
  Z = countVotes(Z)
  Z


do.audit

Description

Given a list of precincts to audit, the truth (as an elec.data object), and the original votes (also as an elec.data object), do a simulated CAST audit and return the audit frame as a result.

Usage

do.audit(Z, truth, audit.names, ns = NULL)

Arguments

`Z`	elec.data object
`truth`	another elec.data object–this one's vote counts are considered "true"
`audit.names`	name of precincts to audit. Correspond to rownames of the Z and truth elec.data objects.
`ns`	List of sample sizes for strata. If this is passed, this method will randomly select the precincts to audit. In this case audit.names should be set to NULL.

Details

Given the reported vote table, Z, and the actual truth (simulated) (a Z matrix with same precincts), and a list of precincts to audit, do the audit. If audit.names is null and the ns is not null, it will sample from precincts via CAST.sample automatically.

Value

Overstatements for each candidate for each precinct.

Author(s)

Luke W. Miratrix

Examples


Z = make.cartoon(n=200)
truth = make.truth.opt.bad(Z, t=0, bound="WPM")
samp.info=CAST.calc.sample(Z, beta=0.75, stages=1, t=5 )
audit.names = CAST.sample( Z, samp.info )
do.audit( Z, truth, audit.names )



core election audit data structure

Description

Makes an object (often called a ‘Z’ object in this documentation) that holds all the vote totals, etc., as well as some precomputed information such as vote margins between candidates, the theoretical winners, and so on.

Usage

elec.data(
  V,
  C.names = names(V)[2:length(V)],
  f = 1,
  audit = NULL,
  pool = TRUE,
  tot.votes.col = "tot.votes",
  PID.col = "PID"
)

## S3 method for class 'elec.data'
print(x, n = 4, ...)

Arguments

`V`	Voter matrix OR 2-element list with Voter Matrix followed by Candidate names
`C.names`	List of candidate names. Also names of columns in V
`f`	Number of winners
`audit`	The audit data—must have columns that match C.names. Columns are overstatements of votes found for those candidates.
`pool`	Combine small candidates into single pseudo-candidates to increase power
`tot.votes.col`	Name of column that has the total votes for the precincts.
`PID.col`	Name of column that identifies unique PIDs for precincts.
`x`	For print() and is.elec.data(). An elec.data object
`n`	Number to print
`...`	The collection of arguments that are passed directly to elec.data, or (in the case of print), unused.

Details

elec.data does some cleaning and renaming of the passed data structure. In particular it will rename the tot.votes column to "tot.votes" if it is not that name already.

Value

An “elec.data” data structure. Note: a PID (precinct ID) column with generated unique PIDs is added if none is provided, and the PID column is renamed to PID. Also, rownames are always PIDs (so indexing by PID works).

print: No return value; prints results.

Author(s)

Luke W. Miratrix

Examples


data(santa.cruz)
elec.data( santa.cruz, C.names=c("danner","leopold") )


find.q

Description

Find q, the minimum number of precincts with w_p's greater than the given t.stat that can hold an entire election shift in them.

Usage

find.q(
  V,
  t.stat,
  bound.col,
  M,
  threshold = 1,
  w_p = weight.function("no.weight"),
  drop = NULL
)

Arguments

`V`	The data.frame of votes, usually the V component of an elec.data object.
`t.stat`	The worst error found in the audit (weighted, etc.)
`bound.col`	The name of the column in V to be used for the passed size (max number of votes, total votes, incl undervotes, etc.) to the error function.
`M`	The margin to close. Usually 1 for proportional. Can be less if error from other sources is assumed.
`threshold`	The total amount of error to pack in the set of tainted precincts
`w_p`	The weight function for errors.
`drop`	Drop precincts with this column having a "true" value–they are previously audited or otherwise known, and thus can't hold error. Can also pass a logical T/F vector of the length of nrow(V)

Details

This number is behind the SRS methods such as CAST. If we know how many precincts, at minimum, would have to hold substantial error in order to have the reported outcome be wrong, we can compute the chance of finding at least one such precinct given an SRS draw of size n.

Find the number of precincts that need to have "large taint" in order to flip the election. This is, essentially, finding a collection of precincts such that the max error (e.max) plus the background error (the w_p-inverse of the t.stat) for the rest of the precincts is greater than the margin (or 1 if done by proportions).

Value

integer, number of badly tainted precincts needed to hold 'threshold' error

Author(s)

Luke W. Miratrix

find.stark.SRS.p

Description

Find the p-value for a given q, n, and N. Helper function for a simple hypergeometric calculation–see reports.

Usage

find.stark.SRS.p(N, n, q)

Arguments

`N`	total number of precincts
`n`	total number of audited precincts (must be less than N)
`q`	min number of precincts that could hold taint to flip election

Value

Chance that 1 or more of the q 'bad' things will be seen in a size n SRS draw from the N sized bucket.
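The hypergeometric calculation behind this helper can be sketched with base R's `dhyper`. The function name `srs.miss.prob` is hypothetical and is not part of the package. The sketch computes both the chance that a simple random sample of n precincts misses all q bad ones and its complement, without claiming which of the two the package function returns.

```r
## Chance that an SRS of n precincts (out of N) contains none of the q bad ones.
## Equivalent to choose(N - q, n) / choose(N, n).
srs.miss.prob <- function(N, n, q) {
  stopifnot(n <= N, q <= N)
  dhyper(0, m = q, n = N - q, k = n)
}

p.miss <- srs.miss.prob(N = 10, n = 3, q = 2)
p.hit  <- 1 - p.miss   # chance of seeing at least one bad precinct
print(c(miss = p.miss, hit = p.hit))
```

With N = 10, n = 3, and q = 2, the miss probability equals choose(8, 3) / choose(10, 3) = 56/120.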

Author(s)

Luke W. Miratrix

find.stratification

Description

Find how audit covered the strata for a given table of votes and audits.

Usage

find.stratification(D, aud, strat.col)

Arguments

`D`	Table of votes
`aud`	Table of audit data
`strat.col`	The column to use that identifies the stratification levels

Value

Table of strata. For each stratum (row) the table has the name of the stratum, the number of precincts in the stratum, the number of audited precincts and percent of precincts audited.

Author(s)

Luke W. Miratrix

Fraction of votes bound

Description

WPM. The maximum error of the unit is a fixed percentage of the total votes cast in the unit. Typically the 20% WPM is used–meaning a swing of 40% is the largest error possible as 20% of the votes go from the winner to the loser.

Usage

fractionOfVotesBound(Z, frac = 0.4)

Arguments

`Z`	The elec.data object.
`frac`	Fraction of total votes that could be a winner overstatement/loser understatement. So if the worst-case is a 20% flip then enter 0.4
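As a rough illustration of the bound described above, the maximum error per precinct is the chosen fraction of that precinct's total votes. This base-R sketch uses a hypothetical vector of precinct totals and does not call the package.

```r
## Hypothetical total votes cast in three precincts.
tot.votes <- c(p1 = 250, p2 = 400, p3 = 1000)

## A 20% flip from winner to loser is a 40% swing, so frac = 0.4.
frac <- 0.4

## Worst-case error per precinct, in votes.
e.max <- frac * tot.votes
print(e.max)
```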

Check if object is elec.data object

Description

Check if object is elec.data object

Usage

is.elec.data(x)

Arguments

`x`	object to test.

Value

is.elec.data: TRUE if object is an elec.data object.

KM Audit Calculator

Description

Do a KM audit given a specified list of audited batches for a specified election.

Usage

KM.audit(
  data,
  U,
  Z,
  alpha = 0.25,
  plot = FALSE,
  debug = FALSE,
  return.Ps = FALSE,
  truncate.Ps = TRUE
)

Arguments

`data`	Data frame holding audit data with taint and tot.votes as two columns.
`U`	Maximum total error bound (sum of e.max for all batches in race).
`Z`	elec.data object for the race—the original reported results.
`alpha`	Risk.
`plot`	Plot the audit?
`debug`	Print debugging info
`return.Ps`	Return the stepwise P-values
`truncate.Ps`	Return the stepwise P-values only up to the audit stop point.

Details

This will do a single-stage KM audit as a consequence of doing the stepwise version (since the single-stage is the same as the stepwise up to the number of batches audited).

WARNING: This function is not fully debugged!
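For orientation, the stepwise calculation mentioned above can be sketched in base R. This is a sketch under stated assumptions, not the package's KM.audit code: it assumes the product form of the Kaplan-Markov P-value from Stark's paper, with each audited batch contributing a factor (1 - 1/U) / (1 - taint), and it assumes U is expressed in units of the margin. The function name `km.stepwise.p` is hypothetical.

```r
## Stepwise Kaplan-Markov P-values for a sequence of audited batches.
## taints: observed taint of each audited batch, in draw order (each < 1).
## U:      total error bound in units of the margin (assumed > 1).
km.stepwise.p <- function(taints, U) {
  stopifnot(U > 1, all(taints < 1))
  factors <- (1 - 1 / U) / (1 - taints)  # one multiplicative factor per draw
  running <- cumprod(factors)            # product after each draw
  pmin(1, cummin(running))               # smallest product so far, capped at 1
}

## Three clean batches (taint 0) with U = 5.
p.steps <- km.stepwise.p(taints = c(0, 0, 0), U = 5)
print(p.steps)
```

Under these assumptions, the last entry is the single-stage P-value for the batches audited so far, and the audit could stop at the first step where the value falls to or below the risk limit alpha.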

Value

List of various things, including final p-value.

Author(s)

Miratrix

References

Stark, Miratrix

Calculate sample size for KM-audit.

Description

Calculate the size of a sample needed to certify a correct election if a KM audit is planned.

Usage

KM.calc.sample(Z, beta = 0.75, taint = 0, bound = c("e.plus", "WPM", "passed"))

Arguments

`Z`	elec.data object
`beta`	Desired level of confidence. This is 1-risk, where risk is the maximum chance of not going to a full recount if the results are wrong. Note that in Stark's papers, the value of interest is typically risk, denoted $\alpha$.
`taint`	Assumed taint. Taint is assumed to be the taint for all batches (very conservative). If taint=0 then we produce a good baseline.
`bound`	Type of bound on the maximum error one could find in a batch.
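To give a feel for how beta and taint drive the sample size, here is a base-R sketch. It assumes the product form of the KM P-value, in which each audited batch multiplies the P-value by (1 - 1/U) / (1 - taint), with U the total error bound in units of the margin. The function `km.sample.size` and its `U` argument are hypothetical and are not the package's KM.calc.sample.

```r
## Smallest n such that ((1 - 1/U) / (1 - taint))^n <= alpha = 1 - beta.
km.sample.size <- function(U, beta = 0.75, taint = 0) {
  alpha <- 1 - beta
  step <- (1 - 1 / U) / (1 - taint)   # per-draw factor under the assumed taint
  stopifnot(U > 1, taint < 1, step < 1)
  ceiling(log(alpha) / log(step))
}

n.draws <- km.sample.size(U = 5, beta = 0.75, taint = 0)
print(n.draws)
```

With U = 5 and no taint, each draw multiplies the P-value by 0.8, so seven draws are needed to get below alpha = 0.25 (0.8^6 is about 0.262, 0.8^7 is about 0.210).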

Value

An audit.plan.KM object.

Author(s)

Based on the KM audit by Stark.

Examples


  data(santa.cruz)
  Z = elec.data( santa.cruz, C.names=c("danner","leopold") )
  KM.calc.sample( Z, beta=0.75, taint=0 )


Make a fake audit given specified error for simulations

Description

Functions that make fake audits given a specified error mechanism and an elec.data object holding reported outcomes.

Usage

make.audit.from.Z(Z, N = 400, ...)

make.audit(
  Z = NULL,
  method = c("tweak", "opt.bad", "opt.bad.WPM", "opt.bad.packed", "opt.bad.packed.WPM",
    "ok", "no error"),
  p_d = 0.2,
  swing = 20,
  max.taint = 1,
  print.race = FALSE,
  ...
)

Arguments

`Z`	elec.data object. For make.audit.from.Z, this is the large election, holding precincts with size, votes, etc., that get sampled to make an election of a requested number of batches.
`N`	The desired size of the new election.
`...`	other arguments to the method functions
`method`	the method of error generation. if "tweak" (the default), then add random amounts of swing to some precincts, and call that the "truth". The other methods generate the truth according to various metrics.
`p_d`	percent chance of error in precinct (for ok method)
`swing`	vote swing if batch has error (for ok method)
`max.taint`	maximum taint allowed in batch
`print.race`	print info on race to command line?

Details

make.audit makes the election results that can be sampled from with the simulator. It generates the true taint and sampling weights of all precincts in the race. The taint is in column 'taint' and the sampling weights are in column 'e.max'.

make.audit.from.Z: Given the structure of some large election, make a small election by sampling batches (with replacement) from the full list. This first samples N precincts (and gets the totals from them) and then builds the 'truth' as normal using the make.audit() method. Note that different calls will produce different margins depending on the precincts selected.

WARNING: It is conceivable that the winner will flip due to the sampling, if the sample has too many batches for the loser.

Value

Data frame with precinct information for the race. NOTE- The reported vote totals are just that, reported.

Author(s)

Miratrix

Make the cartoon example from the CAST paper as a voter data matrix.

Description

This makes the sample scenario described in P. B. Stark's CAST paper.

Usage

make.cartoon(n = 400, vote.dist = c(125, 113, 13), stratify = TRUE)

Arguments

`n`	Size of sample.
`vote.dist`	reported votes for C1, C2, and C3 in order for all precincts.
`stratify`	Should the sample be stratified?

make.truth.opt.bad

Description

Generate a “truth” that is optimally bad in the sense that the margin in error is packed into as few precincts as possible.

Usage

make.opt.packed.bad(
  Z,
  max.taint = 1,
  max.taint.good = max.taint,
  WPM = FALSE,
  add.good = 0,
  add.random = FALSE
)

Arguments

`Z`	elec.data object to make bad truth for.
`max.taint`	max taint for any batch
`max.taint.good`	max taint in good direction for any batch
`WPM`	Use WPM bound on error.
`add.good`	add this amount of margin in good error (i.e. for the winner)
`add.random`	add a random tweak to error

Details

Make an audit data.frame with the error being exactly one margin, packed into a small number of precincts (with some potential for bounding the amount of error per precinct).

Warning: error is not necessarily achievable as the discrete nature of whole votes is disregarded.
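
The packing idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper pack.error and the e.max values are hypothetical, and the sketch simply fills the batches with the largest error bounds first until the total error reaches one margin.

```r
# Hypothetical per-batch error bounds, as fractions of the overall margin.
e.max <- c(0.10, 0.40, 0.25, 0.05, 0.30, 0.20)
max.taint <- 1   # largest allowed taint in any batch

# Illustrative helper (not part of elec): fill the batches with the largest
# bounds first until the total error equals the target (one margin).
pack.error <- function(e.max, max.taint = 1, target = 1) {
  ord <- order(e.max, decreasing = TRUE)
  taint <- numeric(length(e.max))
  remaining <- target
  for (i in ord) {
    if (remaining <= 1e-12) break                # target reached
    err <- min(max.taint * e.max[i], remaining)  # error placed in batch i
    taint[i] <- err / e.max[i]
    remaining <- remaining - err
  }
  data.frame(e.max = e.max, taint = taint)
}

packed <- pack.error(e.max, max.taint)
packed
sum(packed$taint * packed$e.max)  # total error, in units of the margin
sum(packed$taint > 0)             # number of batches carrying error
```

As in the warning above, this sketch works with fractional error and ignores the discrete nature of whole votes.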

Value

Return the vote matrix (a data.frame) with tot.votes, e.max, and taint computed (NOT the elec data object).

Making fake truth for elections

Description

Make a random truth that agrees with the reported outcome, but has random error scattered throughout.

Usage

make.random.truth(
  Z,
  p_d = 0.1,
  swing = 10,
  uniform = TRUE,
  seed = NULL,
  PID = "PID"
)

Arguments

`Z`	elec.data object. The original reported results.
`p_d`	chance a batch has error
`swing`	max amount of error in votes.
`uniform`	if TRUE, then error is from 1 to swing. If FALSE, then error is swing.
`seed`	random seed to ease replication
`PID`	which column has batch IDs.

Details

Given reported results (Z), make a new data.frame which is the truth (that can be 'audited' by looking at relevant precincts).

This is the generic small error generation used in trinomial paper and elsewhere as a baseline "normal" mode of operations.
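
The error model described by p_d, swing, and uniform can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the package's code: the batch margins are made up, and the direction of the error (overstating the winner) is an assumption.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical reported margins (winner minus loser votes) for 20 batches.
reported.margin <- rep(50, 20)
p_d   <- 0.1   # chance a batch has error
swing <- 10    # maximum error, in votes

n.batch   <- length(reported.margin)
has.error <- runif(n.batch) < p_d
# uniform = TRUE case: error size drawn from 1..swing.
# (With uniform = FALSE the size would simply be swing.)
size  <- sample.int(swing, n.batch, replace = TRUE)
error <- ifelse(has.error, size, 0)

# Assumption: errors overstate the winner, so the true margin is smaller.
true.margin <- reported.margin - error
data.frame(reported.margin, error, true.margin)
```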

Value

An elec.data object holding the 'truth'.

Generate fake election results for simulation studies

Description

These methods are for SIMULATION STUDIES. These functions will build a sample, i.e. simulated, record of votes given certain parameters.

Usage

make.sample(
  M,
  N,
  strata = 1,
  per.winner = NULL,
  worst.e.max = NULL,
  R = NULL,
  tot.votes = 1e+05
)

Arguments

`M`	The margin desired between the winner and loser (as a percent).
`N`	Number of precincts desired.
`strata`	Number of strata desired.
`per.winner`	The percent of votes the winner should receive.
`worst.e.max`	The worst e.max possible for any precinct.
`R`	The "dispersion", a measure of how unequal in size precincts should be. R needs to be greater than 0. NULL indicates equal size. For R between 0 and 1, the precincts are distributed 'linearly', i.e., the size of precinct i is proportional to i. At 2, the smallest precinct will be near 0 and the largest twice the average votes per precinct. After 2, the precincts are distributed in a more curved fashion so that the smaller precincts do not go negative.
`tot.votes`	The total votes desired.

Value

An elec.data object meeting the desired specifications.

Author(s)

Luke W. Miratrix

References

See http://www.stat.berkeley.edu/~stark/Vote/index.htm for relevant information.

Examples


Z = make.sample(0.08, 150, per.winner=0.4)
Z

Z2 = make.sample(0.08, 150, per.winner=0.4, R=2.2)
Z2

## Note how they have different precinct sizes.

summary(Z$V$tot.votes)
summary(Z2$V$tot.votes)



Make sample from vote totals (for simulations)

Description

Given a vector of precinct totals and the total votes for the winner and the loser, make a plausible precinct-by-precinct vote count that works. Note: the margins of the precincts will all be the same as the margin of the overall race.

Usage

make.sample.from.totals(vote.W, vote.L, totals)

Arguments

`vote.W`	Total votes for winner.
`vote.L`	Total votes for loser.
`totals`	Vector of total votes for precincts.
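
A base-R sketch of the idea in the Description: every precinct gets the same winner and loser shares as the race as a whole. The numbers are made up and the rounding step is an assumption; the package's own allocation may differ.

```r
vote.W <- 6000                    # total votes for the winner
vote.L <- 4000                    # total votes for the loser
totals <- c(2500, 4000, 3500)     # hypothetical ballots cast per precinct

# Give every precinct the same winner and loser shares as the whole race.
share.W <- vote.W / sum(totals)
share.L <- vote.L / sum(totals)
sketch <- data.frame(
  tot.votes = totals,
  winner    = round(totals * share.W),
  loser     = round(totals * share.L)
)
sketch
(sketch$winner - sketch$loser) / sketch$tot.votes  # per-precinct margin
```

With less convenient numbers, rounding can make the precinct counts sum to slightly more or less than vote.W and vote.L.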

Make baseline truth for simulations

Description

For simulations. These methods, given an elec.data object, make a “truth” (i.e., a different vote count) that meets the same precinct and tot.votes structure, but has potentially different results and outcomes.

make.truth.opt.bad makes the “optimally worst truth”, where the error needed to flip the winner and runner-up is packed into as few precincts as possible.

make.ok.truth makes the truth have the same outcome as the reported one, but with some errors here and there.

Warning: if bound is WPM, this error is made by simply adding the max amount of error to the first loser's total, so that total votes may in this case exceed the total votes of the precinct. This could potentially cause trouble. Be careful!

make.truth.ex.bad makes the bad truth described in Stark's paper (assuming fixed precinct size).

Usage

make.truth.ex.bad(Z)

make.truth.opt.bad(Z, strata = "strata", bound = c("margin", "WPM"), t = 0)

make.truth.opt.bad.strat(Z, strata = "strata", t = 3, shuffle.strata = FALSE)

make.ok.truth(Z, num.off = 8, amount.off = 5)

Arguments

`Z`	The elec.data to build from.
`strata`	name of column holding strata, if any.
`bound`	What sort of maximum error can be held in a precinct.
`t`	an allowed background level of error for all precincts
`shuffle.strata`	Should the error be randomly put in the strata?
`num.off`	Number of precincts that should have small errors. Direction of errors split 50-50 positive and negative.
`amount.off`	Size of the small errors that should be imposed.

Value

Another elec.data matrix with the same candidates and total ballot counts as the passed frame, but with different candidate totals and by-precinct votes. Can be used to test the power or actual confidence of the various auditing procedures.

WARNING: make.ok.truth randomly adds votes and can thus sometimes exceed the allowed ballot count for a precinct by small amounts.

WARNING: If the desired bound is WPM, the error in make.truth.opt.bad is made by simply adding the maximum allowed amount of error in votes to the first loser's total, so that total votes may in this case exceed the total votes of the precinct. This could potentially cause trouble. Be careful!

WARNING: make.truth.ex.bad and make.truth.opt.bad.strat only work in conjunction with the make.cartoon method.

Author(s)

Luke W. Miratrix

Examples


## First make a fake election.
Z = make.sample(0.08, 150, per.winner=0.4, R=2.2)
Z

## Now make a fake truth, which has a lot of small errors:
Zb = make.ok.truth(Z, num.off=150, amount.off=5)
Zb

## Finally, make the hardest to detect (via SRS) ``wrong'' election:
Zw = make.truth.opt.bad( Z, t=4 )
Zw 

Marin Measure B Reported Results

Description

These are the reported vote totals from the 2009 election in Marin, CA for Measure B.

Note the vote totals for the VBM strata are made up. The batches are the “Decks”, which could not be individually tallied with ease. The work-around was complex. See the references, below.

Format

A data frame with 544 observations on the following 5 variables.

PID: Batch ID
strata: There are two levels, ST-IB and ST-VBM, for in-precinct and Vote-by-Mail.
tot.votes: total ballots cast in the batch.
Yes: Number recorded for Yes
No: Number recorded for No

Source

Marin, CA 2009 reported election results.

References

See J. L. Hall, L. W. Miratrix, P. B. Stark, M. Briones, E. Ginnold, F. Oakley, M. Peaden, G. Pellerin, T. Stanionis, and T. Webber. Implementing risk-limiting audits in California. USENIX EVT/WOTE, in press, July 2009.

Examples


data(marin)
marin = elec.data( marin, C.names=c("Yes","No") )

# Hand fixing error bound due to unknown
# vote totals in the VBM decks
marin$V$e.max = maximumMarginBound(marin)
sum( marin$V$e.max )   # 7.128
vbm = marin$V$strata=="ST-VBM"
marin$V[ vbm, "e.max" ] = 2 * marin$V[ vbm, "tot.votes" ] / marin$margin

sum( marin$V$e.max )   # 9.782


Election Audit Error Bound Functions

Description

This is one of the various bounding functions used to bound the maximum amount of error one could see in a single audit unit.

maximumMarginBound returns the maximum margin reduction for each precinct. It computes the margin reduction between every pair of winner and loser, scales each by that pair's total margin to get a proportion, and takes the maximum of these proportions (usually the pair of the last winner and the closest loser).

Usage

maximumMarginBound(Z, votes = NULL)

Arguments

`Z`	The elec.data object.
`votes`	The data.frame to compute the maximumMarginBounds for. If NULL, will return all bounds for all precincts in Z.

Value

Vector (of length of precincts) of maximum possible error for each precinct.
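
For a race with one winner and one loser, the bound described above reduces to a single pair. The base-R sketch below assumes the form used in the risk-limiting audit literature, (ballots + winner votes - loser votes) / margin; whether the package computes exactly this is an assumption, and the vote counts are made up.

```r
# Hypothetical reported results for four precincts in a two-candidate race.
V <- data.frame(
  tot.votes = c(500, 300, 800, 400),
  winner    = c(260, 150, 450, 190),
  loser     = c(200, 140, 300, 200)
)
margin <- sum(V$winner) - sum(V$loser)   # overall margin, in votes

# Worst case for one precinct: every ballot was truly a vote for the loser,
# so the margin could shrink by tot.votes + winner - loser votes.
e.max <- (V$tot.votes + V$winner - V$loser) / margin
e.max
sum(e.max)   # total error bound, in units of the margin
```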

Author(s)

Luke W. Miratrix

KM Audit Sample Size Calc

Description

Calc KM Optimal Sample Size

Usage

opt.sample.size(Z, beta = 0.25)

Arguments

`Z`	elec.data object
`beta`	risk

Details

This is how many steps would be needed if no error was found with each step. Obviously a bit idealistic, but still useful.

Value

Single number of batches to sample.
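
The "no error found" calculation can be sketched in base R. The sketch assumes the Kaplan-Markov P-value shrinks by a factor of (1 - 1/U) for each draw with zero taint, where U is the total error bound (the sum of e.max, in units of the margin). The value of U is made up, and the package's own computation may differ in details.

```r
# Hypothetical total error bound: the sum of e.max over all batches,
# expressed in units of the margin.
U    <- 7.5
beta <- 0.25   # risk

# With zero taint in every draw, each draw multiplies the P-value by
# (1 - 1/U). Find the smallest n with (1 - 1/U)^n <= beta.
n <- ceiling(log(beta) / log(1 - 1 / U))
n
(1 - 1 / U)^n   # P-value after n clean draws
```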

Pretty print KM audit plan

Description

Pretty print KM audit plan

Usage

## S3 method for class 'audit.plan.KM'
print(x, ...)

Arguments

`x`	An audit.plan.KM object, such as one returned by KM.calc.sample.
`...`	ignored

Santa Cruz Election Data

Description

santa.cruz and santa.cruz.audit hold data from a Santa Cruz County, CA, contest held in November, 2008, for County Supervisor in the 1st District. The competitive candidates were John Leopold and Betty Danner. According to the semi-official results provided to us by the Santa Cruz County Clerk's office, Leopold won with votes on 45% of the 26,655 ballots. Danner received the votes on 37% of the ballots. The remaining ballots were undervoted, overvoted, or had votes for minor candidates.

santa.cruz holds the semi-official results for the race. santa.cruz.audit holds the audit totals for the random sample of precincts selected for the audit. Note that the santa.cruz.audit vote counts are larger for some precincts due to the missing provisional ballot counts in the semi-official results.

Format

A data frame with 152 observations on the following 5 variables.

PID: Precinct IDs (unique) for all precincts involved in race
r: Total number of registered voters in the precinct.
tot.votes: Total number of ballots cast in the precinct.
leopold: Total number of ballots marked for John Leopold.
danner: Total number of ballots marked for Betty Danner.

Source

Santa Cruz County, CA, Clerk Gail Pellerin, and their staff.

Examples


data(santa.cruz)
elec.data( santa.cruz, C.names=c("danner","leopold") )

Santa Cruz Election Data

Description

santa.cruz.audit holds the audit totals for the random sample of precincts selected for the audit. Note that the santa.cruz.audit vote counts are larger for some precincts due to the missing provisional ballot counts in the semi-official results.

Format

A data frame with 16 observations on the following 4 variables.

PID: Precinct IDs (unique) for all precincts involved in race
leopold: Total number of ballots marked for John Leopold.
danner: Total number of ballots marked for Betty Danner.
count: The number of times precinct was sampled in the PPEB sample taken.

Source

Santa Cruz County, CA, Clerk Gail Pellerin, and their staff, whom we thank for their generous cooperation and the considerable time and effort they spent counting ballots by hand in order to collect these data.

Examples


data(santa.cruz.audit)
data(santa.cruz)
santa.cruz = elec.data(santa.cruz, C.names=c("leopold","danner"))
trinomial.audit( santa.cruz, santa.cruz.audit )

Simulate CAST audits to assess performance

Description

Simulate a race (using the make.cartoon method) and run a CAST audit on that simulation. CAST is a system devised by Dr. Philip B. Stark, UC Berkeley Department of Statistics.

Usage

sim.race(
  n = 800,
  beta = 0.75,
  stages = 2,
  truth.maker = make.truth.opt.bad,
  print.trail = FALSE
)

Arguments

`n`	Desired sample size.
`beta`	the confidence level desired
`stages`	number of auditing stages. Each stage will have the same confidence level, determined by a function of beta.
`truth.maker`	Function to generate "truth"
`print.trail`	Print out diagnostics.

Value

A vector of 3 numbers. The first is the stage reached. The second is the total number of precincts audited. The third is 0 if the audit failed to certify (i.e. found large error in the final stage), and 1 if the audit certified the election (did not find large error in the final stage).

Author(s)

Luke W. Miratrix

References

See http://www.stat.berkeley.edu/~stark/Vote/index.htm for relevant information.

Examples


     ## See how many times the CAST method fails to catch a wrong
     ##  election in 20 trials.
     replicate( 20, sim.race( beta=0.75, stages=2, truth.maker=make.truth.opt.bad) )

     ## Now see how much work the CAST method does for typical elections.
     replicate( 20, sim.race( beta=0.75, stages=2, truth.maker=make.ok.truth) )

simulate KM audits

Description

This takes an election and a truth and conducts a KM audit.

Usage

simulateIt(
  data,
  M = 50,
  alpha = 0.25,
  plot = FALSE,
  debug = FALSE,
  return.Ps = FALSE,
  truncate.Ps = TRUE
)

Arguments

`data`	a data frame, one row per batch, with: tot.votes, e.max, taint
`M`	the maximum number of samples to draw before automatically escalating to a full recount.
`alpha`	level of risk.
`plot`	plot a chart?
`debug`	debug diag printed?
`return.Ps`	Return the sequence of p-values all the way up to N.
`truncate.Ps`	Return Ps only up to where audit stopped.

Details

Given a list of all precincts and their true taints and their sampling weights (in data, a data.frame), do a sequential audit at the specified alpha.

Value

`stopPt`	number of draws drawn
`n`	number of unique precincts audited
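
A sequential audit of this kind can be sketched in base R. This is an illustration under stated assumptions, not the package's code: the batches and taints are made up, batches are drawn with probability proportional to e.max, and the P-value is assumed to follow the Kaplan-Markov product form.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical batches: sampling weights (e.max) and true taints.
data <- data.frame(
  e.max = c(0.9, 1.2, 0.6, 1.5, 0.8, 1.0),
  taint = c(0.0, 0.05, 0.0, 0.0, 0.1, 0.0)
)
M     <- 50     # maximum number of draws
alpha <- 0.25   # level of risk

U <- sum(data$e.max)   # total error bound, in units of the margin
# Draw batches with replacement, with probability proportional to e.max.
draws <- sample(nrow(data), M, replace = TRUE, prob = data$e.max)
# Kaplan-Markov P-value after each draw (assumed form of the statistic).
Ps <- cumprod((1 - 1 / U) / (1 - data$taint[draws]))

stopPt <- which(Ps < alpha)[1]          # NA if the audit never stops
if (is.na(stopPt)) stopPt <- M          # would escalate to a full recount
n <- length(unique(draws[seq_len(stopPt)]))
c(stopPt = stopPt, n = n)
```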

Workhorse driver for stark.test

Description

These main methods conduct the test of the election audit and return a p-value and other related information on that test.

Usage

stark.test.Z(
  Z,
  calc.e_p = calc.pairwise.e_p,
  w_p = weight.function("no.weight"),
  max_err = maximumMarginBound,
  bound.col = Z$tot.votes.col,
  strat.col = NULL,
  drop = NULL,
  strat.method = NULL,
  err.override = NULL,
  n = NULL,
  t = NULL,
  q = NULL
)

stark.test(
  votes,
  audits,
  C.names = NULL,
  f = 1,
  pool = TRUE,
  pairwise = FALSE,
  ...
)

Arguments

`Z`	The object holding all the voting information. See below for details.
`calc.e_p`	The function used to calculate maximum error bounds
`w_p`	The function used to calculate weights of error (a list of two functions)
`max_err`	Function to compute max error bounds for each precinct
`bound.col`	Name (or column index) of column in the vote matrix corresponding to maximum number of votes allowed in precinct.
`strat.col`	Name of column that determines how to stratify. If NULL, will not stratify.
`drop`	Either a vector of TRUE/FALSE or a name of a column in Z$V of T/F values. Precincts identified by drop will be dropped from calculations.
`strat.method`	Not currently implemented.
`err.override`	If non-null, use this as the found error in votes rather than the actual errors found in the audit.
`n`	Elements of the test statistic. Can pass to avoid computation if those values are already known (e.g., for a simulation)
`t`	Elements of the test statistic. Can pass to avoid computation if those values are already known (e.g., for a simulation)
`q`	Elements of the test statistic. Can pass to avoid computation if those values are already known (e.g., for a simulation)
`votes`	data.frame of votes. Each row is precinct.
`audits`	data.frame of audits. Each row is precinct. Table reports overstatement by candidate.
`C.names`	Names of candidates (and names of the corresponding columns in the votes and audits tables). If NULL, will derive from columns 2 onward of votes.
`f`	The number of winners
`pool`	If TRUE, combine small candidates into single pseudo-candidates to increase power
`pairwise`	if TRUE then do a pairwise test for all pairs and return highest p-value
`...`	Extra arguments passed directly to the work-horse method stark.test.Z

Details

It is an older method. Most likely CAST.audit or trinomial.audit should be used instead.

stark.test() will do the entire test. It is basically a driver function that sets up the 'Z' matrix and passes it on to stark.test.Z.

The Z object, in particular, has:

Z$V: The table of reported votes
Z$audit: The table of audits as differences from recorded votes

Value

Return an htest object with pvalue, some relevant statistics, and the Z object used (possibly constructed) that produced those results.

Author(s)

Luke W. Miratrix

Examples


## pretending that santa cruz audit was a SRS audit (which it was not)
data(santa.cruz)
Z = elec.data(santa.cruz, C.names=c("leopold","danner"))
data(santa.cruz.audit)
## do some work to get the audit totals to overstatements
rownames(santa.cruz.audit) = santa.cruz.audit$PID
Z$audit = audit.totals.to.OS(Z, santa.cruz.audit)
Z$audit
stark.test.Z(Z)


tri.audit.sim

Description

This is a SIMULATION FUNCTION, and is not used for actual auditing of elections.

Usage

tri.audit.sim(
  Z,
  n,
  p_d = 0.1,
  swing = 5,
  return.type = c("statistics", "taints", "precinct"),
  seed = NULL,
  PID = "PID",
  ...
)

Arguments

`Z`	elec.data object.
`n`	Sample size to draw.
`p_d`	The probability of a precinct having an error.
`swing`	The size of the error, in votes.
`return.type`	What kind of results to return: "statistics","taints", or "precinct"
`seed`	Random seed to use.
`PID`	Column name of column holding unique precinct IDs
`...`	Extra arguments passed to tri.sample

Details

Given a matrix of votes, calculate the weights for all precincts and then draw a sample (using tri.sample). Then, assuming that p_d percent of the precincts (at random) have error, and the errors are due to vote miscounts of size 'swing', conduct a simulated “audit”, returning the found discrepancies.

Value

List of taints found in such a circumstance OR precincts selected with relevant attributes (including simulated errors, if asked) OR the number of non-zero taints and the size of largest taint.

Author(s)

Luke W. Miratrix

Examples



  data(santa.cruz)
  Z = elec.data(santa.cruz, C.names=c("leopold","danner"))
  Z$V$e.max = maximumMarginBound( Z )
  ## Sample from fake truth, see how many errors we get.
  tri.audit.sim( Z, 10,  p_d=0.25, swing=10, return.type="precinct" )

  ## what does distribution look like?
  res = replicate( 200, tri.audit.sim( Z, 10,  p_d=0.25, swing=10 ) )
  apply(res,1, summary) 
  hist( res[2,], main="Distribution of maximum size taint" )

Calculate needed sample size for election auditing using the Trinomial Bound

Description

Calculate an estimated sample size for a trinomial bound audit that would have a specified power (the chance to certify, assuming a given estimate of the rate of small errors) and a specified maximum risk of erroneously certifying if the actual election outcome is wrong.

Usage

tri.calc.sample(
  Z,
  beta = 0.75,
  guess.N = 20,
  p_d = 0.1,
  swing = 5,
  power = 0.9,
  bound = c("e.plus", "WPM", "passed")
)

Arguments

`Z`	elec.data object
`beta`	1-beta is the acceptable risk of failing to notice that a full manual count is needed given an election with an actual outcome different from the semi-official outcome.
`guess.N`	The guessed needed sample size.
`p_d`	For the alternate: estimate of the proportion of precincts that have error.
`swing`	For the alternate: estimate of the max size of an error in votes, given that error exists.
`power`	The desired power of the test against the specified alternate defined by p_d and swing.
`bound`	e.plus, WPM, or use the passed, previously computed, e.max values in the Z object.

Value

An audit.plan.tri object. This is an object that holds information on how many samples are needed in the audit, the maximum amount of potential overstatement in the election, and a few other things.

References

See Luke W. Miratrix and Philip B. Stark. (2009) Election Audits using a Trinomial Bound. http://www.stat.berkeley.edu/~stark

Examples


data(santa.cruz)
Z = elec.data( santa.cruz, C.names=c("danner","leopold") )
tri.calc.sample( Z, beta=0.75, guess.N = 10, p_d = 0.05,
               swing=10, power=0.9, bound="e.plus" )

Sample from List of Precincts PPEB

Description

tri.sample selects a sample of precincts PPEB. Namely, samples n times, with replacement, from the precincts proportional to the weights of the precincts.

Usage

tri.sample(
  Z,
  n,
  seed = NULL,
  print.trail = FALSE,
  simplify = TRUE,
  return.precincts = TRUE,
  PID = "PID",
  known = "known"
)

Arguments

`Z`	elec.data object
`n`	Either an audit.plan.tri object (that contains n) or an integer which is the size of the sample
`seed`	Seed to use.
`print.trail`	Print diagnostics and info on the selection process.
`simplify`	If TRUE, return a data frame of unique precincts sampled, with counts of how many times they were sampled. Otherwise return repeatedly sampled precincts separately.
`return.precincts`	Return the precincts, or just the precinct IDs
`PID`	The name of the column in Z$V holding unique precinct IDs
`known`	Name of column in Z$V of TRUE/FALSE, where TRUE are precincts that are considered “known”, and thus should not be sampled for whatever reason.

Details

The weights, if passed, are in the “e.max” column of Z$V.

Value

a sample of precincts.
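
The sampling scheme described above (n draws, with replacement, proportional to the precinct weights) can be sketched with base R's sample(). The precinct table here is made up, and the tabulation of repeated draws is only one plausible reading of the simplify option, not the package's code.

```r
set.seed(541227)  # fixed seed so the sketch is reproducible

# Hypothetical precincts with their error bounds (sampling weights).
V <- data.frame(
  PID   = c("P01", "P02", "P03", "P04", "P05"),
  e.max = c(0.5, 1.5, 1.0, 0.25, 0.75),
  stringsAsFactors = FALSE
)
n <- 8   # number of draws

# Draw n times with replacement, proportional to e.max.
picks <- sample(V$PID, n, replace = TRUE, prob = V$e.max)

# Simplified form: one row per unique precinct, with a count of draws.
counts <- as.data.frame(table(PID = picks), stringsAsFactors = FALSE)
names(counts)[2] <- "count"
samp <- merge(V, counts, by = "PID")
samp
```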

Author(s)

Luke W. Miratrix

Examples


data(santa.cruz)
Z = elec.data( santa.cruz, C.names=c("danner","leopold") )
samp = tri.calc.sample( Z, beta=0.75, guess.N = 10, p_d = 0.05,
               swing=10, power=0.9, bound="e.plus" )
tri.sample( Z, samp, seed=541227 )

Utility function for tri.sample

Description

A utility function returning the total number of unique precincts and ballots given a sample.

Usage

tri.sample.stats(samp)

Arguments

`samp`	A sample, such as one returned from tri.sample

Value

the total number of unique precincts and ballots given a sample.

Conduct trinomial audit

Description

trinomial.audit converts the audited total counts for candidates to overstatements and taints. trinomial.bound calculates the trinomial bound given the size of an audit sample, the number of non-zero errors, and the size of the small-error threshold. It can also plot a contour of the distribution space, bounds, and alpha lines.

Usage

trinomial.audit(Z, audit)

Arguments

`Z`	An elec.data object that is the race being audited.
`audit`	A data.frame with a column for each candidate and a row for each audited precinct, holding the audit totals for each candidate. An additional column, `count`, holds the number of times that precinct was sampled (since sampling was done by replacement).
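
The conversion from audit totals to overstatements and taints can be sketched in base R for a two-candidate race. All numbers are made up, and the definitions used (overstatement as the reported winner-loser gap minus the audited gap; taint as the error in margin units divided by e.max) are assumptions about the general approach rather than the package's exact code.

```r
# Hypothetical reported and audited counts for three sampled precincts.
reported <- data.frame(winner = c(260, 150, 450), loser = c(200, 140, 300))
audited  <- data.frame(winner = c(258, 150, 451), loser = c(201, 140, 300))
margin <- 210                    # overall reported margin, in votes (made up)
e.max  <- c(2.67, 1.48, 4.52)    # per-precinct error bounds (made up)

# Overstatement of the margin: reported minus audited winner-loser gap.
overstatement <- (reported$winner - reported$loser) -
                 (audited$winner - audited$loser)
# Taint: the error found, as a fraction of the largest possible error.
taint <- (overstatement / margin) / e.max
data.frame(overstatement, taint)
```

A negative overstatement (an understatement) yields a negative taint; only positive taints count against the reported outcome.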

Details

Right now the p-value is computed in a clumsy, bad way. A grid of points over (0, xlim) X (0, ylim) is generated corresponding to values of p0 and pd, and for each point the mean of that distribution and the chance of generating an outcome as extreme as k is calculated. Then the set of points with an outcome close to alpha is extracted, and the corresponding bound is optimized over this subset. Not the best way to do things.

Auditing with the Trinomial Bound: trinomial.bound and trinomial.audit

Description

This method makes a contour plot of the optimization problem.

Usage

trinomial.bound(
  n = 11,
  k = 2,
  d = 40,
  e.max = 100,
  xlim = c(0.4, 1),
  ylim = c(0, 0.55),
  alpha.lvls = c(10),
  zero.threshold = 0.3,
  tick.lines = NULL,
  alpha.lwd = 2,
  bold.first = FALSE,
  plot = TRUE,
  p.value.bound = NULL,
  grid.resolution = 300,
  ...
)

Arguments

`n`	Size of the sample (not precincts, but samples which could potentially be multiple samples of the same precinct).
`k`	The number of positive taints found in sample.
`d`	The maximum size of a small taint. This is the threshold for being in the middle bin of the trinomial. All taints larger than d would be in the largest error bin.
`e.max`	The size of the largest error bin. Typically 100 (for percent) or 1.
`xlim`	Range of possible values of p0 worth considering
`ylim`	Range of possible values of pd worth considering
`alpha.lvls`	List of alphas for which bounds should be calculated. The first is the one that will be returned. The others will be graphed.
`zero.threshold`	Since the method calculates on a numerical grid, what difference between alpha and the calculated probability should be considered no difference.
`tick.lines`	A list of bounds. For these bound levels, add tick-lines (more faint lines) to graph
`alpha.lwd`	Line width for alpha line.
`bold.first`	TRUE/FALSE. Should first alpha line be in bold.
`plot`	Should a plot be generated.
`p.value.bound`	What is the bound (1/U) that would correspond to the entire margin. Finding the alpha corresponding to this bound is a method for finding the p-value for the trinomial bound test.
`grid.resolution`	How many divisions of the grid should there be? More gives greater accuracy in the resulting p-values and bounds.
`...`	Extra arguments passed to the plot command.

Details

Note: alphas are multiplied by 100 to get in percents.
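
The roles of n, k, d, and e.max can be illustrated by binning a set of observed taints into the three trinomial categories. The taints below are made up, and treating each taint as rounded up to the top of its bin is an assumption about how the bound stays conservative.

```r
# Hypothetical taints from an audit sample, in percent.
taints <- c(0, 0, 12, 0, 55, 0, 3, 0, 0, 0, 0)
d     <- 40    # largest taint still counted as "small"
e.max <- 100   # value assigned to the largest error bin

# Three bins: no positive taint, small taint (at most d), large taint.
bin <- cut(taints, breaks = c(-Inf, 0, d, Inf),
           labels = c("zero", "small", "large"))
table(bin)

n <- length(taints)    # sample size
k <- sum(taints > 0)   # number of positive taints
c(n = n, k = k)

# Conservative view: each taint rounded up to the top of its bin.
rounded <- c(0, d, e.max)[as.integer(bin)]
mean(rounded)
```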

Value

List with characteristics of the audit and the final results.

`n`	Size of sample.
`k`	Number of non-zero taints.
`d`	Threshold for what a small taint is.
`e.max`	The worst-case taint.
`max`	The upper confidence bound for the passed alpha-level.
`p`	A length three vector. The distribution (p0, pd, p1) that achieves the worst case.
`p.value`	The p.value for the test, if a specific worst-case bound 1/U was passed via p.value.bound.

References

See Luke W. Miratrix and Philip B. Stark. (2009) Election Audits using a Trinomial Bound. https://www.stat.berkeley.edu/~stark/Vote/index.htm

Examples



# The reported poll data: make an elec.data object for processing
data(santa.cruz)
Z = elec.data(santa.cruz, C.names=c("leopold","danner"))
Z

# Make a plan
plan = tri.calc.sample( Z, beta=0.75, guess.N = 10, p_d = 0.05,
               swing=10, power=0.9, bound="e.plus" )

# Conduct the audit
data(santa.cruz.audit)
res = trinomial.audit( Z, santa.cruz.audit )
res

# Compute the bound.  Everything is scaled by 100 (i.e. to percents) for easier numbers. 
trinomial.bound(n=res$n, k = res$k, d=100*plan$d, e.max=100, p.value.bound=100/plan$T,
           xlim=c(0.75,1), ylim=c(0.0,0.25),
           alpha.lvls=c(25), asp=1,
           main="Auditing Santa Cruz with Trinomial Bound" )

Looking at fake “truths” for election simulations

Description

This prints out total error in a fake truth for an election, and some other info.

Usage

truth.looker(data)

Arguments

`data`	The data.frame returned from such things as make.audit

Details

Utility function for debugging and understanding stuff.

Look at a specific "truth" and print out what total error, etc. is.

Value

None. Just does printout.

weight functions

Description

This function produces weight functions to reweight found audit miscounts.

Usage

weight.function(
  name = c("no.weight", "weight", "weight.and.slop", "margin.weight", "taint")
)

Arguments

`name`	name of function desired

Details

The available functions are: no weighting; weighting by size of precinct; weighting by size after a slop of 2 votes has been taken off; weighting for pairwise margin tests; and the taint weight function, which takes the maximum error in precincts and gives the ratio of actual error to maximum error.

Value

A list of two functions, the second being the inverse of the first. All the functions take three parameters, x, b_m, and M, which are the things to weight, the bound on votes (or maximum error in precincts), and the (smallest) margin, respectively.
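
As a rough illustration of the pairing described above, the sketch below builds a taint-style weight function and its inverse in base R. It is not the package's source: the helper name make.taint.pair is hypothetical, the numbers are made up, and only the three-parameter signature (x, b_m, M) and the "ratio of actual error to maximum error" behaviour are taken from the text.

```r
# Hypothetical helper (not part of 'elec'): returns a list of two functions,
# a taint-style weight and its inverse, each taking x, b_m and M.
make.taint.pair <- function() {
  list(
    # weight: express the error x as a fraction of the maximum error b_m
    function(x, b_m, M) x / b_m,
    # inverse: turn a taint back into an error on the original scale
    function(t, b_m, M) t * b_m
  )
}

w <- make.taint.pair()

x   <- c(0, 2, 5)       # toy errors found in three batches (made up)
b_m <- c(40, 50, 100)   # toy maximum possible error per batch (made up)
M   <- 120              # toy margin; unused by this taint sketch

taints <- w[[1]](x, b_m, M)     # weighted (tainted) errors
back   <- w[[2]](taints, b_m, M) # inverse recovers the original errors
print(taints)
print(back)
```

With the package loaded, the corresponding pair would be requested by name, for example weight.function("taint"), following the Usage above.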

Author(s)

Luke W. Miratrix

Yolo County, CA Election Data

Description

These data are for Measure W in Yolo County, CA, November 2008. The file includes precinct-level reports.

In the actual audit, 6 precincts were selected (see example) and audited by hand-to-eye count by a group of 4 people cross-checking each other. One of the 6 batches had underreported the "yes" votes by 1, and one had overreported the "yes" votes by 1. There were no other errors.

Format

A data frame with 114 observations on the following 8 variables.

PID: Unique identifier for the batches of ballots
Pct: The precinct id of the batch
how: Vote by mail (VBM) or walk-in (PCT)
b: Number of votes cast in that unit
under: Number of undervotes (ballots not voted).
over: Number of overvotes (where someone marked both yes and no).
y: Reported number of valid ballots marked yes.
n: Reported number of valid ballots marked no.
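
To make the column layout concrete, the following base-R sketch builds a tiny data frame with the same eight columns and totals the reported votes. The three rows, including the PID labels, are invented for illustration and are not taken from the yolo data; the real data set is loaded with data(yolo).

```r
# Toy batches with the same columns as described above (all values made up).
toy <- data.frame(
  PID   = c("A-PCT", "A-VBM", "B-PCT"),
  Pct   = c("A", "A", "B"),
  how   = c("PCT", "VBM", "PCT"),
  b     = c(100, 60, 80),
  under = c(3, 1, 2),
  over  = c(1, 0, 0),
  y     = c(60, 35, 30),
  n     = c(36, 24, 48),
  stringsAsFactors = FALSE
)

# Reported totals for each tally column across all batches
totals <- colSums(toy[, c("y", "n", "under", "over")])
print(totals)

# Reported margin of "yes" over "no", in votes
margin <- unname(totals["y"] - totals["n"])
print(margin)
```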

Source

Yolo County, CA. Special thanks to Freddie Oakley and Tom Stanionis.

References

See Stark et al. for papers using this data to illustrate risk-limiting audits of election data.

Examples


# Make an elec.data object out of precinct-level results
data(yolo)
yolo = elec.data( yolo, C.names=c("y","n","under","over"), tot.votes.col="b" ) 

# Look at different sample sizes and cuts for setting aside
# small precincts
CAST.calc.opt.cut( yolo, beta=0.75, stages=1, t=5, plot=TRUE )

print( yolo )

# Get details of the audit plan -- expected work, etc.
ap <- CAST.calc.sample( yolo, beta=0.75, stages=1, t=5, small.cut=5 )
print( ap )

# Draw a sample (seed not used for actual audit)
CAST.sample(yolo, ap, seed=12345678)


