Kid Friendly Cities
Statistical Analysis
 

Analysis Part I: Variance in Final Rank Explained by Data Categories

A stepwise multiple linear regression procedure was used to assess the relative contribution of each data category (Population, Health, Education, etc.) to variability in the final rank of cities.

More specifically, we were interested in each coefficient of partial determination, which represents the proportion of the variability in the dependent variable (final city rank) explained by an independent variable (a data category rank) after all other independent variables have explained as much of the variability as possible.
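The coefficient of partial determination described above can be illustrated with a minimal Python sketch. It assumes ordinary least squares, and it uses invented ranks for eight hypothetical cities; the helper names `sse` and `partial_r2` are our own and are not taken from the study. The coefficient is computed as the share of the residual variability left by the other predictors that the added predictor removes.

```python
def sse(columns, y):
    """Residual sum of squares from least squares of y on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    M = [[sum(X[i][r] * X[i][s] for i in range(n)) for s in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gaussian elimination with partial pivoting.
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, p):
            f = M[r][k] / M[k][k]
            for s in range(k, p + 1):
                M[r][s] -= f * M[k][s]
    # Back substitution for the coefficients.
    b = [0.0] * p
    for k in reversed(range(p)):
        b[k] = (M[k][p] - sum(M[k][s] * b[s] for s in range(k + 1, p))) / M[k][k]
    return sum((y[i] - sum(X[i][j] * b[j] for j in range(p))) ** 2 for i in range(n))


def partial_r2(y, target, others):
    """Share of the variability left unexplained by `others` that `target` explains."""
    reduced = sse(others, y)           # model without the target predictor
    full = sse(others + [target], y)   # model with the target predictor added
    return (reduced - full) / reduced


# Hypothetical category ranks and final ranks (not the study's data).
health = [1, 2, 3, 4, 5, 6, 7, 8]
education = [2, 1, 4, 3, 6, 5, 8, 7]
final_rank = [1, 3, 2, 4, 6, 5, 8, 7]

education_share = partial_r2(final_rank, education, [health])
print(f"Partial R^2 for education given health: {education_share:.3f}")
```

With more categories, the same function would be called once per category, each time passing all remaining categories in `others`.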

The following pie charts display the results of the analysis. Each group of cities was analyzed separately.

Note: Only those categories that explained at least 2% of the remaining variability were retained.

In the Major cities, health and education were the largest contributors, trailed by public safety, community life, economics, and environment.

Education and health were also the largest contributors to the final rank among the Independent cities. The next largest contributors were community life and public safety.

A very different picture resulted from the analysis of ranks among the Component cities. Nearly half the variability in final city rank was explained by the economics category. Education, community life, health, and public safety trailed as contributors.

Analysis Part II: Variance in Final Score Explained by Individual Indicators

During the second phase of analysis, we examined the importance of the individual indicators (unemployment rate, for example) in explaining the variability in final city scores.

We first conducted an exploratory data analysis using the raw data for each indicator variable, noting extreme outliers and other evidence of non-normality. In some instances, a natural log transformation corrected marked skewness.
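The effect of a log transformation on skewness can be seen in a small Python sketch. The indicator values below are invented for illustration, and the moment-based `skewness` helper is one common definition, assumed here rather than taken from the study.

```python
import math


def skewness(values):
    """Moment-based sample skewness: third central moment over variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed indicator values (not the study's data).
raw = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
logged = [math.log(v) for v in raw]  # natural log transformation

print("skewness before:", skewness(raw))
print("skewness after: ", skewness(logged))
```

Because these invented values double at each step, their logs are evenly spaced and the skewness after transformation is essentially zero. Real indicators would rarely be corrected this cleanly.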

Data for the four education indicators were available only at the state level, so multiple cities within one state shared the same value for each of these indicators. This presents a statistical dilemma because observations from the same state are perfectly correlated with one another with respect to the education variables. We therefore used two different regression procedures to arrive at our final models. In the first procedure, we fit a separate stepwise linear model for each category (Health, for example), regressing final city score on the set of indicator variables in that category. The variables that remained statistically significant in these separate models were then combined into a single stepwise model to determine which would remain significant.

The same set of variables tested in this combined stepwise procedure was also tested using a second regression procedure, a random components model that acknowledges the perfect correlation of the education variables within a state. This allowed us to validate the statistical significance of the variables that remained after running the combined stepwise models.
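To see why state-level indicators call for this kind of check, consider the following Python sketch. It does not fit a random components model. As a simpler stand-in, it compares a naive city-level regression with a regression on state means, which treats each state as a single observation. All data are simulated, and the state count, effect sizes, and helper name `slope_and_se` are assumptions made for illustration.

```python
import math
import random


def slope_and_se(x, y):
    """Simple linear regression of y on x: return (slope, standard error of slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, math.sqrt(rss / (n - 2) / sxx)


rng = random.Random(0)
n_states, cities_per_state = 10, 8  # hypothetical design

# One indicator value and one shared random component per state.
state_x = [rng.gauss(0, 1) for _ in range(n_states)]
state_effect = [rng.gauss(0, 2) for _ in range(n_states)]

x, y, state = [], [], []
for s in range(n_states):
    for _ in range(cities_per_state):
        x.append(state_x[s])  # every city in a state shares the indicator value
        y.append(1.0 * state_x[s] + state_effect[s] + rng.gauss(0, 0.5))
        state.append(s)

# Naive fit: treats all 80 cities as independent observations.
naive_slope, naive_se = slope_and_se(x, y)

# State-level fit: one mean score per state, so n equals the number of states.
mean_y = [sum(yi for yi, si in zip(y, state) if si == s) / cities_per_state
          for s in range(n_states)]
state_slope, state_se = slope_and_se(state_x, mean_y)

print("naive slope and SE:      ", naive_slope, naive_se)
print("state-level slope and SE:", state_slope, state_se)
```

In this balanced simulation both fits give the same slope, but the naive standard error is smaller because it counts cities in the same state as independent evidence. An understated variance of this kind can make a variable look statistically significant when it is not.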

The largest contributor to the final score among the Major cities was violent crime (violent crimes per 1,000 people). The next two largest contributors were children's library program attendance and average ACT score. The final two contributors were average elementary class size and percent low birthweight births. However, the random components model revealed that the final stepwise model had underestimated the variances for average elementary class size and percent low birthweight births; these two variables are therefore not statistically significant.

The final stepwise model for Independent cities revealed that violent crime was again the largest contributor to the final score of a city. This was followed by average secondary class size, percent low birthweight births, unemployment rate, and children's library program attendance. All of these variables remained statistically significant in the random components model.

Unemployment rate was, by far, the largest contributor to the final scores of Component cities. Average SAT score, population change, and average elementary class size trailed as contributors. The final two contributors were the percent of births to teens and the number of Title X-funded clinics. All of these variables remained statistically significant in the random components model.

Analysis Part III: Final Rank vs. Health Improvement in the Major Cities

We were also interested in the relationship between a city's health improvement rank and its final rank. Data from the Major cities were graphed, with a city's health improvement rank on the X axis and its final rank on the Y axis. The 45-degree line represents a one-to-one correspondence between the two ranks.

Among the Major cities, only two cities have the same health improvement rank and final rank: Chicago at rank 16 and Miami at rank 20. All cities located above the 45-degree line (Atlanta; Washington, DC; Detroit; Tampa; Pittsburgh; and Cleveland) have a health improvement rank that is better (numerically lower) than their final rank. In other words, despite finishing with a low overall rank, they show tremendous improvements over the last eight years relative to the other cities in the study. The 17 cities located below the 45-degree line have a health improvement rank that is worse (numerically higher) than their final rank.
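The comparison against the 45-degree line amounts to a simple rule, sketched below in Python. Only the Chicago and Miami ranks come from the text. "City A" and "City B" and their ranks are hypothetical placeholders, and the function name `position` is our own.

```python
def position(health_rank, final_rank):
    """Locate a city relative to the 45-degree line (X = health rank, Y = final rank)."""
    if final_rank > health_rank:
        return "above"  # health improvement rank is better than final rank
    if final_rank < health_rank:
        return "below"  # health improvement rank is worse than final rank
    return "on"         # the two ranks are equal


# (health improvement rank, final rank)
cities = {
    "Chicago": (16, 16),  # from the text
    "Miami": (20, 20),    # from the text
    "City A": (3, 22),    # hypothetical
    "City B": (18, 5),    # hypothetical
}

for name, (health, final) in cities.items():
    print(f"{name}: {position(health, final)} the line")
```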

Compiled with assistance from Rebecca Y. Stallings. Ms. Stallings holds a Master of Health Science degree in Biostatistics from Johns Hopkins University, where she has worked on dozens of research studies. She is a freelance consultant and Biostatistics instructor at Morgan State University.

 
