Section 7.7 Parametric Linear Models
In order to help OAFC decide on the best place for mobile health clinics within Walworth County, we will need to create a model of medical need. How exactly we do this will depend on many different factors, and there is no single right way to do it. One way of approaching the problem is to start by making some simplifying assumptions for our model. For the moment, we might make the following assumptions:
We will place three mobile clinics in three separate municipalities around the county. At the moment, we will not worry about exactly where the clinics are placed within each municipality.
For each municipality, we will compute a value between \(0\) and \(1\) which represents the medical need within that municipality as a percentage. This value will take into account several different variables which are strategically chosen, and whose values come from some trusted data set.
For each variable, we will use percentages of residents or households rather than population. This will ensure that our solution does not simply target the most populous areas.
We will choose to place our clinics in the three municipalities having the three highest measures of need.
What variables are important in determining medical need? To simplify things, let's start by considering two of them: the percentage of working individuals who do not have a car (which we call \(x\)), and the percentage of individuals who self-identify as Hispanic or Latino (which we call \(y\)). The choice of variable \(x\) is justified by the fact that OAFC is particularly interested in using its mobile clinics to serve individuals who cannot make it to the existing clinic in Elkhorn. It stands to reason that individuals without a car would struggle to get to the clinic in an area with limited public transportation. On the other hand, the choice of variable \(y\) is partially justified by the systemic issues mentioned in Section 7.3. Indeed, individuals in this category are more likely to be noncitizens or undocumented than those in other minority categories, and data shows that these populations are more likely to be uninsured [7.12.1.129]. Finally, the data below shows that this category is, by far, the largest minority category in the county.
Race/ethnic category | Percentage |
---|---|
White alone | 83.70% |
Black or African American alone | 1% |
American Indian and Alaska Native alone | 0.10% |
Asian alone | 0.70% |
Native Hawaiian and Other Pacific Islander alone | 0% |
Some other race alone | 0.60% |
Two or more races | 2.30% |
Hispanic or Latino | 11.60% |
How can we combine these two variables to create a single model of medical need? The simplest way to do so is by choosing a weight \(w\) between \(0\) and \(1\) and creating a new variable \(z\) defined as follows:
\begin{equation*}
z = wx + (1-w)y\text{.}
\end{equation*}
The number \(w\) represents the fraction of the significance we place on the variable \(x\text{.}\) When \(w=0\text{,}\) all weight goes into variable \(y\text{,}\) and when \(w=1\text{,}\) all weight goes into variable \(x\text{.}\) Equal weight is placed on the two variables when \(w=\frac{1}{2}\text{.}\) The model is flexible in that we can allow the client to choose a value of \(w\) based on their own judgement of the relative importance of the variables. We will default to \(w=\frac{1}{2}\) for simplicity. The table below shows the values of \(x\text{,}\) \(y\) and \(z\) for each municipality in the county when \(w=\frac{1}{2}\text{.}\)
Municipality | Percentage of working households with no vehicle available as decimal \((x)\) | Percentage of individuals identifying as Hispanic/Latino as decimal \((y)\) | \(z=wx+(1-w)y\) \((w=\frac{1}{2})\) |
---|---|---|---|
Bloomfield village | 0.0179 | 0.2026 | 0.1102 |
Bloomfield town | 0 | 0.2805 | 0.1402 |
Darien village | 0 | 0.1552 | 0.0776 |
Darien town | 0.0033 | 0.1803 | 0.0918 |
Delavan city | 0.0275 | 0.234 | 0.1308 |
Delavan town | 0.0403 | 0.0914 | 0.0658 |
East Troy village | 0.0202 | 0.0171 | 0.0186 |
East Troy town | 0.0171 | 0.0369 | 0.027 |
Elkhorn city | 0.0106 | 0.1531 | 0.0819 |
Fontana-on-Geneva Lake village | 0.025 | 0.0152 | 0.0201 |
Geneva town | 0.0121 | 0.1366 | 0.0743 |
Genoa City village | 0.0054 | 0.1336 | 0.0695 |
Lafayette town | 0.0042 | 0.0193 | 0.0117 |
La Grange town | 0.0131 | 0.0455 | 0.0293 |
Lake Geneva city | 0 | 0.0964 | 0.0482 |
Linn town | 0.0046 | 0.0591 | 0.0319 |
Lyons town | 0.0203 | 0.0234 | 0.0219 |
Mukwonago village | 0 | 0 | 0 |
Richmond town | 0 | 0.0708 | 0.0354 |
Sharon village | 0.0092 | 0.3139 | 0.1615 |
Sharon town | 0.0046 | 0.0663 | 0.0354 |
Spring Prairie town | 0 | 0.0125 | 0.0062 |
Sugar Creek town | 0 | 0.0178 | 0.0089 |
Troy town | 0.0132 | 0.0282 | 0.0207 |
Walworth village | 0 | 0.2457 | 0.1229 |
Walworth town | 0 | 0.1169 | 0.0585 |
Whitewater city | 0.027 | 0.1067 | 0.0668 |
Whitewater town | 0.065 | 0.0979 | 0.0814 |
Williams Bay village | 0.0259 | 0.1083 | 0.0671 |
The three largest values of \(z\) in the table, which belong to Sharon village, Bloomfield town, and Delavan city, identify the three municipalities of greatest need according to our model.
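As a rough illustration of how the weight \(w\) enters the computation, the sketch below evaluates \(z=wx+(1-w)y\) in Python. The function name `need` and the list `rows` are hypothetical names chosen for this example, and `rows` holds only a few rows copied from the table, so the ranking it prints is a ranking among those rows only.

```python
def need(x, y, w=0.5):
    """Weighted need score z = w*x + (1 - w)*y, with 0 <= w <= 1."""
    if not 0 <= w <= 1:
        raise ValueError("w must lie between 0 and 1")
    return w * x + (1 - w) * y


# A few rows copied from the table above: (municipality, x, y).
# The remaining municipalities are omitted to keep the example short.
rows = [
    ("Bloomfield town", 0.0, 0.2805),
    ("Delavan city", 0.0275, 0.234),
    ("Sharon village", 0.0092, 0.3139),
    ("Walworth village", 0.0, 0.2457),
    ("Whitewater town", 0.065, 0.0979),
]

# How the weight shifts the score for one municipality (Delavan city):
# w = 0 returns y, w = 1 returns x, and w = 1/2 returns their average.
for w in (0, 0.5, 1):
    print(w, need(0.0275, 0.234, w))

# Rank the sample rows by z at the default w = 1/2.
ranked = sorted(rows, key=lambda row: need(row[1], row[2]), reverse=True)
print([name for name, _, _ in ranked[:3]])
```

Because `w` is an ordinary argument, a client who values one variable more than the other can rerun the same computation with a different weight.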
But if you look closely, you may notice an interesting imbalance in this data: the values of the variable \(y\) are on average much larger than the values of the variable \(x\text{,}\) and this may be throwing off the weighting. We can fix this by normalizing \(x\) and \(y\) so that each column sums to \(1\text{.}\) To do this, we divide each value of a variable by the sum of all of that variable's values. We call these new variables \(\bar{x}\) and \(\bar{y}\text{.}\) Using \(\bar{x}\) and \(\bar{y}\) ensures that the variables will have equal weight when \(w=\frac{1}{2}\text{.}\) The table below shows the values of \(\bar{x}\text{,}\) \(\bar{y}\text{,}\) and \(\bar{z}=w\bar{x}+(1-w)\bar{y}\) for \(w=\frac{1}{2}\text{.}\) This model also has the nice property that the measures of need in the final column \((\bar{z})\) sum up to \(1\) no matter what the value of \(w\) is, so we can think of them as decimal percentages.
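To see why the normalized measures of need always sum to \(1\text{,}\) here is a short check. The subscript \(i\text{,}\) which runs over the municipalities, is notation introduced only for this computation; the check uses nothing beyond the fact that the \(\bar{x}_i\) and the \(\bar{y}_i\) each sum to \(1\text{.}\)

```latex
\begin{align*}
\sum_i \bar{z}_i &= \sum_i \bigl(w\bar{x}_i + (1-w)\bar{y}_i\bigr) \\
&= w\sum_i \bar{x}_i + (1-w)\sum_i \bar{y}_i \\
&= w\cdot 1 + (1-w)\cdot 1 = 1.
\end{align*}
```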
Municipality | Normalized percentage of working households with no vehicle available as decimal \((\bar{x})\) | Normalized percentage of individuals identifying as Hispanic/Latino as decimal \((\bar{y})\) | \(\bar{z}=w\bar{x}+(1-w)\bar{y}\) \((w=\frac{1}{2})\) |
---|---|---|---|
Bloomfield village | 0.0489 | 0.0661 | 0.0575 |
Bloomfield town | 0 | 0.0915 | 0.0458 |
Darien village | 0 | 0.0506 | 0.0253 |
Darien town | 0.0089 | 0.0588 | 0.0339 |
Delavan city | 0.0752 | 0.0763 | 0.0757 |
Delavan town | 0.1099 | 0.0298 | 0.0699 |
East Troy village | 0.055 | 0.0056 | 0.0303 |
East Troy town | 0.0468 | 0.0121 | 0.0294 |
Elkhorn city | 0.0289 | 0.05 | 0.0394 |
Fontana-on-Geneva Lake village | 0.0682 | 0.005 | 0.0366 |
Geneva town | 0.0329 | 0.0446 | 0.0387 |
Genoa City village | 0.0149 | 0.0436 | 0.0292 |
Lafayette town | 0.0114 | 0.0063 | 0.0088 |
La Grange town | 0.0357 | 0.0148 | 0.0253 |
Lake Geneva city | 0 | 0.0314 | 0.0157 |
Linn town | 0.0127 | 0.0193 | 0.016 |
Lyons town | 0.0555 | 0.0076 | 0.0316 |
Mukwonago village | 0 | 0 | 0 |
Richmond town | 0 | 0.0231 | 0.0115 |
Sharon village | 0.025 | 0.1024 | 0.0637 |
Sharon town | 0.0125 | 0.0216 | 0.0171 |
Spring Prairie town | 0 | 0.0041 | 0.002 |
Sugar Creek town | 0 | 0.0058 | 0.0029 |
Troy town | 0.0361 | 0.0092 | 0.0227 |
Walworth village | 0 | 0.0802 | 0.0401 |
Walworth town | 0 | 0.0381 | 0.0191 |
Whitewater city | 0.0736 | 0.0348 | 0.0542 |
Whitewater town | 0.1773 | 0.032 | 0.1046 |
Williams Bay village | 0.0707 | 0.0353 | 0.053 |
Now we can advise OAFC to place mobile clinics in the municipalities of Delavan city, Delavan town, and Whitewater town, assuming they want to weight the variables \(x\) and \(y\) equally!
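The normalized computation can be sketched in a few lines of Python. The names `DATA`, `normalize`, `normalized_need`, and `top_three` are hypothetical, chosen for this example; the numbers are the \(x\) and \(y\) columns of the first table. Because the sketch starts from those rounded values, its results may differ from the normalized table in the last decimal place.

```python
# (municipality, x, y) copied from the first table in this section.
DATA = [
    ("Bloomfield village", 0.0179, 0.2026),
    ("Bloomfield town", 0.0, 0.2805),
    ("Darien village", 0.0, 0.1552),
    ("Darien town", 0.0033, 0.1803),
    ("Delavan city", 0.0275, 0.234),
    ("Delavan town", 0.0403, 0.0914),
    ("East Troy village", 0.0202, 0.0171),
    ("East Troy town", 0.0171, 0.0369),
    ("Elkhorn city", 0.0106, 0.1531),
    ("Fontana-on-Geneva Lake village", 0.025, 0.0152),
    ("Geneva town", 0.0121, 0.1366),
    ("Genoa City village", 0.0054, 0.1336),
    ("Lafayette town", 0.0042, 0.0193),
    ("La Grange town", 0.0131, 0.0455),
    ("Lake Geneva city", 0.0, 0.0964),
    ("Linn town", 0.0046, 0.0591),
    ("Lyons town", 0.0203, 0.0234),
    ("Mukwonago village", 0.0, 0.0),
    ("Richmond town", 0.0, 0.0708),
    ("Sharon village", 0.0092, 0.3139),
    ("Sharon town", 0.0046, 0.0663),
    ("Spring Prairie town", 0.0, 0.0125),
    ("Sugar Creek town", 0.0, 0.0178),
    ("Troy town", 0.0132, 0.0282),
    ("Walworth village", 0.0, 0.2457),
    ("Walworth town", 0.0, 0.1169),
    ("Whitewater city", 0.027, 0.1067),
    ("Whitewater town", 0.065, 0.0979),
    ("Williams Bay village", 0.0259, 0.1083),
]


def normalize(values):
    """Divide each entry by the column total so the column sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def normalized_need(data, w=0.5):
    """Return {municipality: w*x_bar + (1 - w)*y_bar}."""
    names = [row[0] for row in data]
    x_bar = normalize([row[1] for row in data])
    y_bar = normalize([row[2] for row in data])
    return {
        name: w * xb + (1 - w) * yb
        for name, xb, yb in zip(names, x_bar, y_bar)
    }


def top_three(scores):
    """Names of the three municipalities with the largest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:3]


scores = normalized_need(DATA, w=0.5)
print(top_three(scores))  # ['Whitewater town', 'Delavan city', 'Delavan town']
print(abs(sum(scores.values()) - 1) < 1e-9)  # True: the scores sum to 1
```

Passing a different `w` re-ranks the municipalities. For example, `normalized_need(DATA, w=0)` ranks by \(\bar{y}\) alone, so the recommendation depends on how the client chooses to weight the two variables.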
The model we have just created is called a parametric linear model with two variables. The only difference when we add additional variables and weights is that the weights should all add up to \(1\text{.}\)
Parametric linear model.
Suppose that \(x_1,x_2,x_3,\ldots,x_n\) are quantitative variables associated to some dataset. A parametric linear model is an equation of the form
\begin{equation*}
z = w_1x_1+w_2x_2+w_3x_3+\cdots+w_nx_n\text{.}
\end{equation*}
The numbers \(w_1,w_2,w_3,\ldots,w_n\) are called weights or parameters. In practice, we usually assume or arrange that the weights are nonnegative and sum to \(1\text{,}\) and that for each \(i\text{,}\) the values of the variable \(x_i\) over all points in the dataset sum to \(1\) as well. In this case we say the model is normalized.
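A minimal sketch of a normalized model with any number of variables might look like the following. The function names are hypothetical, and the three columns `x1`, `x2`, `x3` are toy numbers invented purely for illustration; they do not come from the Walworth County data. The weights are checked to be nonnegative, so a weight of \(0\) is allowed, as in the two-variable case with \(w=0\text{.}\)

```python
def normalize(column):
    """Divide each entry by the column total so the column sums to 1."""
    total = sum(column)
    if total == 0:
        raise ValueError("cannot normalize a column whose entries sum to 0")
    return [value / total for value in column]


def parametric_linear_model(columns, weights):
    """Return z = w_1*x_1 + ... + w_n*x_n for every data point.

    columns: list of n equal-length lists, one per variable x_i.
    weights: list of n nonnegative numbers that sum to 1.
    Each column is normalized before the weights are applied.
    """
    if len(columns) != len(weights):
        raise ValueError("need exactly one weight per variable")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    normalized = [normalize(col) for col in columns]
    n_points = len(normalized[0])
    return [
        sum(w * col[i] for w, col in zip(weights, normalized))
        for i in range(n_points)
    ]


# Toy numbers invented for illustration only (not from any data set):
# three variables measured at four data points.
x1 = [0.02, 0.00, 0.04, 0.01]
x2 = [0.20, 0.28, 0.09, 0.15]
x3 = [0.10, 0.05, 0.30, 0.05]
weights = [0.5, 0.25, 0.25]

z = parametric_linear_model([x1, x2, x3], weights)
print([round(value, 4) for value in z])
print(abs(sum(z) - 1) < 1e-9)  # True: a normalized model's scores sum to 1
```

Normalizing inside the function keeps the caller from forgetting that step. The cost is that a column whose values sum to \(0\) has to be rejected, since it cannot be rescaled.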