Section 7.7 Parametric Linear Models

To help OAFC decide on the best places for mobile health clinics within Walworth County, we will need to create a model of medical need. How exactly we do this will depend on many different factors, and there is no single right way to do it. One way of approaching the problem is to start by making some simplifying assumptions for our model. For the moment, we might make the following assumptions:

We will place three mobile clinics in three separate municipalities around the county. At the moment, we will not worry about exactly where the clinics are placed within each municipality.
For each municipality, we will compute a value between \(0\) and \(1\) which represents the medical need within that municipality as a percentage. This value will take into account several different variables which are strategically chosen, and whose values come from some trusted data set.
For each variable, we will use percentages of residents or households rather than population. This will ensure that our solution does not simply target the most populous areas.
We will choose to place our clinics in the three municipalities having the three highest measures of need.

What variables are important in determining medical need? To simplify things, let's start by considering two of them: the percentage of working individuals who do not have a car (which we call \(x\)), and the percentage of individuals who self-identify as Hispanic or Latino (which we call \(y\)). The choice of variable \(x\) is justified by the fact that OAFC is particularly interested in using its mobile clinics to serve individuals who cannot make it to the existing clinic in Elkhorn. It stands to reason that individuals without a car would struggle to get to the clinic in an area with limited public transportation. On the other hand, the choice of variable \(y\) is partially justified by the systemic issues mentioned in Section 7.3. Indeed, individuals in this category are more likely to be noncitizens or undocumented than those in other minority categories, and data shows that these populations are more likely to be uninsured [7.12.1.129]. Finally, the data below shows that this category is, by far, the largest minority category in the county.

Race/ethnic category	Percentage
White alone	83.70%
Black or African American alone	1%
American Indian and Alaska Native alone	0.10%
Asian alone	0.70%
Native Hawaiian and Other Pacific Islander alone	0%
Some other race alone	0.60%
Two or more races	2.30%
Hispanic or Latino	11.60%

Figure 7.7.1. Race/ethnic categories in Walworth County, WI [7.12.1.139].

How can we combine these two variables to create a single model of medical need? The simplest way to do so is by choosing a weight \(w\) between \(0\) and \(1\) and creating a new variable \(z\) defined as follows:

\begin{equation*} z=wx+(1-w)y. \end{equation*}

The number \(w\) represents the fraction of the significance we place on the variable \(x\text{.}\) When \(w=0\text{,}\) all weight goes into variable \(y\text{,}\) and when \(w=1\text{,}\) all weight goes into variable \(x\text{.}\) Equal weight is placed on the two variables when \(w=\frac{1}{2}\text{.}\) The model is flexible in that we can allow the client to choose a value of \(w\) based on their own judgement of the relative importance of the variables. We will default to \(w=\frac{1}{2}\) for simplicity. The table below shows the values of \(x\text{,}\) \(y\) and \(z\) for each municipality in the county when \(w=\frac{1}{2}\text{.}\)

Municipality	Percentage of working households with no vehicle available as decimal \((x)\)	Percentage of individuals identifying as Hispanic/Latino as decimal \((y)\)	\(z=wx+(1-w)y\) \((w=\frac{1}{2})\)
Bloomfield village	0.0179	0.2026	0.1102
Bloomfield town	0	0.2805	0.1402
Darien village	0	0.1552	0.0776
Darien town	0.0033	0.1803	0.0918
Delavan city	0.0275	0.234	0.1308
Delavan town	0.0403	0.0914	0.0658
East Troy village	0.0202	0.0171	0.0186
East Troy town	0.0171	0.0369	0.027
Elkhorn city	0.0106	0.1531	0.0819
Fontana-on-Geneva Lake village	0.025	0.0152	0.0201
Geneva town	0.0121	0.1366	0.0743
Genoa City village	0.0054	0.1336	0.0695
Lafayette town	0.0042	0.0193	0.0117
La Grange town	0.0131	0.0455	0.0293
Lake Geneva city	0	0.0964	0.0482
Linn town	0.0046	0.0591	0.0319
Lyons town	0.0203	0.0234	0.0219
Mukwonago village	0	0	0
Richmond town	0	0.0708	0.0354
Sharon village	0.0092	0.3139	0.1615
Sharon town	0.0046	0.0663	0.0354
Spring Prairie town	0	0.0125	0.0062
Sugar Creek town	0	0.0178	0.0089
Troy town	0.0132	0.0282	0.0207
Walworth village	0	0.2457	0.1229
Walworth town	0	0.1169	0.0585
Whitewater city	0.027	0.1067	0.0668
Whitewater town	0.065	0.0979	0.0814
Williams Bay village	0.0259	0.1083	0.0671

Figure 7.7.2. Vehicle availability and Hispanic/Latino self-identification by municipality [7.12.1.141], [7.12.1.140].

According to our model, the three municipalities of greatest need are those with the three largest values of \(z\) in the table: Sharon village, Bloomfield town, and Delavan city.
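The computation of \(z\) can be sketched in a few lines of Python. This is only an illustration, not the method used to produce the table: the names `data`, `need`, and `rank` are assumptions introduced here, and only five rows of Figure 7.7.2 are copied in to keep the example short.

```python
# Values of (x, y) copied from Figure 7.7.2 for five municipalities.
# The names data, need, and rank are illustrative, not from the text.
data = {
    "Sharon village": (0.0092, 0.3139),
    "Bloomfield town": (0.0, 0.2805),
    "Delavan city": (0.0275, 0.234),
    "Delavan town": (0.0403, 0.0914),
    "Whitewater town": (0.065, 0.0979),
}


def need(x, y, w=0.5):
    """Return z = w*x + (1 - w)*y for a weight w between 0 and 1."""
    if not 0 <= w <= 1:
        raise ValueError("w must lie between 0 and 1")
    return w * x + (1 - w) * y


def rank(data, w=0.5):
    """Return municipality names sorted from greatest to least need z."""
    return sorted(data, key=lambda name: need(*data[name], w), reverse=True)


# Compare the ordering when all weight is on y, split equally, or all on x.
for w in (0.0, 0.5, 1.0):
    print(f"w = {w}: {rank(data, w)}")
```

Changing `w` reorders these five municipalities, which is the flexibility the model offers the client: with all weight on \(x\), Whitewater town ranks first, while with equal weights Sharon village does.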

But, if you look closely, you may notice an interesting imbalance in this data: the values of the variable \(y\) are on average much larger than the values of the variable \(x\text{,}\) and this may be throwing off the weighting. We can fix this by normalizing \(x\) and \(y\) so that each column sums to \(1\text{.}\) To do this, we divide each value of a variable by the sum of all of that variable's values. We call these new variables \(\bar{x}\) and \(\bar{y}\text{.}\) Using \(\bar{x}\) and \(\bar{y}\) ensures that the variables will have equal weight when \(w=\frac{1}{2}\text{.}\) The table below shows the values of \(\bar{x}\text{,}\) \(\bar{y}\text{,}\) and \(\bar{z}=w\bar{x}+(1-w)\bar{y}\) for \(w=\frac{1}{2}\text{.}\) This model also has the nice property that the measures of need in the final column \((\bar{z})\) sum up to \(1\) no matter what the value of \(w\) is, so we can think of them as decimal percentages.
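The claim that the normalized measures of need sum to \(1\) for every \(w\) can be checked directly. As notation introduced only for this check, write \(\bar{x}_i\text{,}\) \(\bar{y}_i\text{,}\) and \(\bar{z}_i\) for the values in municipality \(i\text{.}\) Since each normalized column sums to \(1\text{,}\) we get

\begin{equation*} \sum_{i}\bar{z}_i=\sum_{i}\bigl(w\bar{x}_i+(1-w)\bar{y}_i\bigr)=w\sum_{i}\bar{x}_i+(1-w)\sum_{i}\bar{y}_i=w\cdot 1+(1-w)\cdot 1=1. \end{equation*}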

Municipality	Normalized percentage of working households with no vehicle available as decimal \((\bar{x})\)	Normalized percentage of individuals identifying as Hispanic/Latino as decimal \((\bar{y})\)	\(\bar{z}=w\bar{x}+(1-w)\bar{y}\) \((w=\frac{1}{2})\)
Bloomfield village	0.0489	0.0661	0.0575
Bloomfield town	0	0.0915	0.0458
Darien village	0	0.0506	0.0253
Darien town	0.0089	0.0588	0.0339
Delavan city	0.0752	0.0763	0.0757
Delavan town	0.1099	0.0298	0.0699
East Troy village	0.055	0.0056	0.0303
East Troy town	0.0468	0.0121	0.0294
Elkhorn city	0.0289	0.05	0.0394
Fontana-on-Geneva Lake village	0.0682	0.005	0.0366
Geneva town	0.0329	0.0446	0.0387
Genoa City village	0.0149	0.0436	0.0292
Lafayette town	0.0114	0.0063	0.0088
La Grange town	0.0357	0.0148	0.0253
Lake Geneva city	0	0.0314	0.0157
Linn town	0.0127	0.0193	0.016
Lyons town	0.0555	0.0076	0.0316
Mukwonago village	0	0	0
Richmond town	0	0.0231	0.0115
Sharon village	0.025	0.1024	0.0637
Sharon town	0.0125	0.0216	0.0171
Spring Prairie town	0	0.0041	0.002
Sugar Creek town	0	0.0058	0.0029
Troy town	0.0361	0.0092	0.0227
Walworth village	0	0.0802	0.0401
Walworth town	0	0.0381	0.0191
Whitewater city	0.0736	0.0348	0.0542
Whitewater town	0.1773	0.032	0.1046
Williams Bay village	0.0707	0.0353	0.053

Figure 7.7.3. Normalized vehicle availability and Hispanic/Latino self-identification by municipality.

Now we can advise OAFC to place mobile clinics in the municipalities of Delavan city, Delavan town, and Whitewater town, assuming they want to weight the variables \(x\) and \(y\) equally!
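The normalization step and the resulting recommendation can be reproduced with a short script. This is a minimal sketch, assuming the rounded values printed in Figure 7.7.2 as input; the names `rows`, `normalize`, and `top_three` are illustrative and do not come from the text.

```python
# (municipality, x, y) rows copied from Figure 7.7.2.
rows = [
    ("Bloomfield village", 0.0179, 0.2026),
    ("Bloomfield town", 0.0, 0.2805),
    ("Darien village", 0.0, 0.1552),
    ("Darien town", 0.0033, 0.1803),
    ("Delavan city", 0.0275, 0.234),
    ("Delavan town", 0.0403, 0.0914),
    ("East Troy village", 0.0202, 0.0171),
    ("East Troy town", 0.0171, 0.0369),
    ("Elkhorn city", 0.0106, 0.1531),
    ("Fontana-on-Geneva Lake village", 0.025, 0.0152),
    ("Geneva town", 0.0121, 0.1366),
    ("Genoa City village", 0.0054, 0.1336),
    ("Lafayette town", 0.0042, 0.0193),
    ("La Grange town", 0.0131, 0.0455),
    ("Lake Geneva city", 0.0, 0.0964),
    ("Linn town", 0.0046, 0.0591),
    ("Lyons town", 0.0203, 0.0234),
    ("Mukwonago village", 0.0, 0.0),
    ("Richmond town", 0.0, 0.0708),
    ("Sharon village", 0.0092, 0.3139),
    ("Sharon town", 0.0046, 0.0663),
    ("Spring Prairie town", 0.0, 0.0125),
    ("Sugar Creek town", 0.0, 0.0178),
    ("Troy town", 0.0132, 0.0282),
    ("Walworth village", 0.0, 0.2457),
    ("Walworth town", 0.0, 0.1169),
    ("Whitewater city", 0.027, 0.1067),
    ("Whitewater town", 0.065, 0.0979),
    ("Williams Bay village", 0.0259, 0.1083),
]


def normalize(values):
    """Divide each value by the column total so the column sums to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize a column whose values sum to 0")
    return [v / total for v in values]


w = 0.5  # equal weight on both variables
names = [name for name, _, _ in rows]
x_bar = normalize([x for _, x, _ in rows])
y_bar = normalize([y for _, _, y in rows])
z_bar = [w * xb + (1 - w) * yb for xb, yb in zip(x_bar, y_bar)]

# Sort municipalities from greatest to least normalized need.
ranked = sorted(zip(names, z_bar), key=lambda pair: pair[1], reverse=True)
top_three = [name for name, _ in ranked[:3]]
print(top_three)
print(round(sum(z_bar), 6))  # the normalized measures of need sum to 1
```

Rerunning the script with a different value of `w` shows how the recommendation shifts as the client changes the relative importance of the two variables.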

The model we have just created is called a parametric linear model with two variables. When we add more variables, each variable gets its own weight, and the weights should all add up to \(1\text{.}\)

Parametric linear model.

Suppose that \(x_1,x_2,x_3,\ldots,x_n\) are quantitative variables associated to some dataset. A parametric linear model is an equation

\begin{equation*} z=w_1x_1+w_2x_2+w_3x_3+\cdots+w_nx_n. \end{equation*}

The numbers \(w_1,w_2,w_3,\ldots,w_n\) are called weights or parameters. In practice, we usually assume or arrange that the weights are positive and sum to \(1\text{,}\) and that for each \(i\text{,}\) the values of the variable \(x_i\) over all points in the dataset sum to \(1\) as well. In this case we say the model is normalized.
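The definition extends the two-variable computation to any number of variables. The sketch below is one possible way to arrange a normalized model, assuming each variable is given as a list with one value per data point. The function name `normalized_model` and the sample numbers are hypothetical and are not taken from any data set in this section.

```python
def normalized_model(columns, weights):
    """Return z = w_1*x_1 + ... + w_n*x_n after normalizing each x_i.

    columns: list of n lists, one per variable, each with one value per data point.
    weights: list of n positive weights that sum to 1.
    """
    if len(columns) != len(weights):
        raise ValueError("need exactly one weight per variable")
    if any(w <= 0 for w in weights) or abs(sum(weights) - 1) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    n_points = len(columns[0])
    if any(len(col) != n_points for col in columns):
        raise ValueError("every variable needs one value per data point")

    # Normalize each variable so that its values sum to 1 over the dataset.
    normalized = []
    for col in columns:
        total = sum(col)
        if total == 0:
            raise ValueError("cannot normalize a variable whose values sum to 0")
        normalized.append([v / total for v in col])

    # Combine the normalized variables point by point using the weights.
    return [
        sum(w * col[j] for w, col in zip(weights, normalized))
        for j in range(n_points)
    ]


# Hypothetical example: three variables measured at four data points.
columns = [
    [2, 0, 1, 1],
    [10, 30, 40, 20],
    [0.5, 0.1, 0.3, 0.1],
]
weights = [0.5, 0.3, 0.2]
z = normalized_model(columns, weights)
print(z)
```

Because every normalized variable sums to \(1\) and the weights sum to \(1\text{,}\) the values of \(z\) returned here also sum to \(1\text{,}\) just as in the two-variable case.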