Statistics: What You Need to Know

Introduction

Often, when people begin a statistics course, they worry about doing advanced mathematics, or their math phobias kick in. It is important to understand that statistics, as addressed in this course, is not a math course at all. The only math you will do is addition, subtraction, multiplication, and division. With today’s software, you generally don’t even have to do that much, since Excel is set up to do basic statistics for you. The key elements for the student in this course are to understand the various types of statistics, what their requirements are, what they do, and how you can use and interpret the results. Referring back to the basic components of a valid research study, which statistic a researcher uses depends on several things:

- The research question itself
- The sample size
- The type of data you have collected
- The type of statistic called for by the design

All quantitative studies require a data set. Qualitative studies may use a data set or may use observations with no numerical data at all. For the purposes of the next modules, our focus will be on quantitative studies.

Types of Statistics

There are several types of statistics available to the researcher. Descriptive statistics provide a basic description of the data set. This includes the measures of central tendency: means, medians, and modes, and the measures of dispersion, including variances and standard deviations. Descriptive statistics also include the sample size, or “N”, and the frequency with which each data point occurs in the data set.
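As a minimal sketch of the descriptive statistics listed above, the following uses Python's standard `statistics` module rather than Excel. The `scores` list is made-up illustration data, not taken from any study in this course.

```python
import statistics
from collections import Counter

# Hypothetical data set: ten made-up survey scores (illustration only).
scores = [82, 90, 75, 90, 68, 88, 90, 75, 95, 77]

n = len(scores)                          # sample size, "N"
mean = statistics.mean(scores)           # arithmetic average
median = statistics.median(scores)       # middle value of the sorted data
mode = statistics.mode(scores)           # most frequently occurring value
variance = statistics.variance(scores)   # sample variance (divides by N - 1)
std_dev = statistics.stdev(scores)       # sample standard deviation
frequencies = Counter(scores)            # how often each value occurs

print(f"N = {n}, mean = {mean}, median = {median}, mode = {mode}")
print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
print("frequencies:", dict(frequencies))
```

Note that `statistics.variance` and `statistics.stdev` treat the data as a sample; `statistics.pvariance` and `statistics.pstdev` are the population versions.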

Inferential statistics allow the researcher to make predictions, estimations, and generalizations about the data set, the sample, and the population from which the sample was drawn. They allow you to draw inferences, generalizations, and possibilities regarding the relationship between the independent variable and the dependent variable to indicate how those inferences answer the research question. Researchers can make predictions and estimations about how the results will fit the overall population. Statistics can also be described in terms of the types of data they can analyze. Non-parametric statistics can be used with nominal or ordinal data, while parametric statistics can be used with interval and ratio data types.

Types of Data

There are four types of data that a researcher may collect.

Nominal Data Sets

A nominal data set classifies data into categories that are all of equal weight and value. Examples of categories that are equal to each other include gender (male, female), state of birth (Arizona, Wyoming, etc.), and membership in a group (yes, no). Each of these categories is equivalent to the others, without value judgments.

Ordinal Data Sets

Ordinal data sets also have data classified into categories, but these categories have some form of order or ranking attached, often reflecting a value or perceived value. Examples include rankings such as poor, fair, good, and excellent, or very satisfied to very dissatisfied. While the categories may be rank ordered, the intervals between them are not equal. The difference between poor and fair is not necessarily the same as the difference between good and excellent, for example.

Interval Data Sets

Interval data sets have equal intervals between the units of measure, although they lack a true zero. For example, test scores of 50 and 60 have the same interval between them as test scores of 70 and 80. Body temperature in degrees shows the same difference between 97.5 and 98.5 as between 99 and 100. However, zero on that temperature scale does not mean the absence of temperature, so it is not a true zero. IQ scores have the same interval between 75 and 100 as they do between 100 and 125.

Ratio Data Sets

Ratio data sets have equal intervals and a true zero on the scale. Examples include temperature measured in kelvins, where the interval between 20 and 25 is the same as the interval between 30 and 35, and zero marks the complete absence of thermal energy. (Temperatures in Celsius or Fahrenheit, by contrast, are interval data: their zero points are set by convention, so 30 degrees is not twice as warm as 15 degrees.) Another example is the level of a certain drug in the bloodstream. There is a true zero, and a drug level of 100 mcg/cc is exactly double a level of 50 mcg/cc.

A Review of Data Sets:

Nominal − Data sorted into categories with equal weight and value

Ordinal − Data sorted into categories with rank ordering

Interval − Data with equal intervals between all data points, but no true zero

Ratio − Data with equal intervals between all data points, plus a true zero on the scale

Knowing what data type is being used in a statistical analysis is important because not every data type can be used with every statistic. As noted above in Types of Statistics, nominal and ordinal data can be analyzed with non-parametric statistics, while interval and ratio data can be analyzed with parametric statistics. As a rule of thumb, you should use the most powerful statistic your data set will allow, in order to extract as much information as possible.
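The rule of thumb above can be sketched as a small lookup. The names `LEVELS` and `statistic_family` are hypothetical helpers introduced only for illustration; they are not part of any statistics package.

```python
# Levels of measurement, ordered from least to most informative.
LEVELS = ("nominal", "ordinal", "interval", "ratio")


def statistic_family(level: str) -> str:
    """Return the family of statistics suggested for a level of measurement.

    Hypothetical helper: it encodes only the rule of thumb from the text
    (nominal/ordinal -> non-parametric, interval/ratio -> parametric).
    """
    level = level.lower()
    if level not in LEVELS:
        raise ValueError(f"unknown level of measurement: {level!r}")
    if level in ("nominal", "ordinal"):
        return "non-parametric"
    return "parametric"


for level in LEVELS:
    print(f"{level:>8}: {statistic_family(level)}")
```

This captures only the first screening step; sample size and the research question still have to be considered before a specific statistic is chosen.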

Choosing the Right Statistics

In order to select the appropriate statistic for a research study, start with the basics.

What Does the Research Question Ask?

If you are comparing one data set to another or one group to another, use a method of description to see if they are alike. If you are asking if one group is different from another in a meaningful way, use methods of inference. For example, let’s say you have two samples of staff intention to stay on their jobs. In the first sample, the staff has received a raise. In the second, they have not. The samples are described as follows:

Sample 1 − mean score of 95, standard deviation of ±4, so scores within one standard deviation of the mean run from 91 to 99 (meaning a likelihood they will stay)

Sample 2 − mean score of 80, standard deviation of ±16, so scores within one standard deviation of the mean run from 64 to 96 (meaning a lower, and wider ranging, likelihood of staying)

If you are asked just to compare the two data samples, you use the data to say that the mean score in Sample 1 shows higher intent to stay on the job than the mean score in Sample 2. Also, the standard deviation shows that there is more consistency in the scores in Sample 1 (smaller deviation) than in Sample 2 (larger deviation). These are methods of description from which you can draw some conclusions. You can answer the question: what do the data sets look like? However, if you are asked to determine whether Sample 1 is *significantly* higher than Sample 2 in intention to stay on the job, you would need to use a statistic such as a t-test or ANOVA (analysis of variance) to determine whether the difference is *statistically significant*. If you were asked to determine whether it was likely that *the raise* contributed to the difference in intention to stay on the job, you would also need an inferential statistic, such as a t-test or ANOVA, that enables you to draw that inference. So, if the research question asks what a data set *looks like*, you can use descriptive statistics. If you want to know what can be *inferred* from the data about the variables, you must use inferential statistics as well.
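To make the inferential step concrete, here is a rough sketch that computes a t statistic from the summary figures of the two samples above. Two things are assumptions and not part of the example: the sample sizes (20 staff in each group, since the text gives none) and the choice of Welch's version of the t-test, which does not require the two groups to have equal variances. The function name `welch_t` is hypothetical.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    v1 = sd1 ** 2 / n1   # squared standard error of the first mean
    v2 = sd2 ** 2 / n2   # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# ASSUMPTION: 20 staff per sample; the means and standard deviations
# are the ones given for Sample 1 and Sample 2.
t_stat, df = welch_t(95, 4, 20, 80, 16, 20)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

With these assumed sample sizes the statistic comes out near 4.1 on about 21 degrees of freedom. To decide whether that is statistically significant, you would compare it with a t-distribution table or let statistical software report the p-value; different sample sizes would change the result.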

What is the Sample Size?

Some statistics, such as t-tests, are designed to be used with small sample sizes, generally fewer than 30. Others, such as ANOVAs, fit better with larger sample sizes. For many statistics, sample size is not a critical factor, but it is helpful to know whether the sample is large enough to warrant the statistics used.

What Type of Data is Used?

Again, nominal and ordinal data can be analyzed with non-parametric statistics, while interval and ratio data can be analyzed with parametric statistics. The data type that matters most is that of the dependent variable. If you have interval or ratio data in the dependent variable, you can use parametric statistics for your analysis.

In order to help you review studies in terms of their selection of statistics, refer to the following chart:

Using Statistical Assessment

| Statistic | Data Type | Sample Size | Research Question |
|---|---|---|---|
| Mean and Standard Deviation | Interval or ratio | >2 | What does the data set look like? |
| Frequency Distribution | Nominal through ratio | >2 | What does the data set look like? |
| Chi-Square | Nominal or ordinal | >4 | Is the distribution of one categorical variable related to, or independent of, another variable? |
| Correlation Coefficient | Interval or ratio | >4 | What is the relationship between two variables? |
| t-Test | Interval or ratio | <30 | Is there a significant difference between two groups? (The groups may be intrinsic or experimental.) |
| ANOVA (Analysis of Variance) | Interval or ratio | >30 | Is there a significant difference between 2 or more groups? |
| Regression Analysis | Interval or ratio | >30 | How much of the change in the dependent variable is predicted by the independent variable or variables? |

Statistical Significance

The concept of statistical significance is a part of inferential statistics. When you are looking for differences or changes in data sets as a result of the impact of an independent variable, you need to be able to determine if the changes are significant or resulting from random chance. Statistical significance answers the question: What is the probability that the results seen in this study are due to the effects of the independent variable on the dependent variable rather than random chance?

Significance is tied to Type I error, which is the possibility of rejecting the null hypothesis when it is actually true, that is, concluding that there is an effect when the result arose by chance. The acceptable probability of a Type I error is depicted in statistics by the Greek letter alpha, or “α.” A significant result is often reported as **p<0.05**. The level at which alpha is set tells you that the probability of the finding being attributable to chance alone is less than (<) 0.05, or 5%. Other common settings for alpha include 0.01 (1%) or even 0.001 (0.1%). Most studies require an alpha of 0.05 or smaller in order for the results to be deemed significant.
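The decision rule itself is a simple comparison, sketched below. The function name `is_significant` and the example p-values are hypothetical; alpha is written as a proportion, so 0.05 corresponds to 5%.

```python
def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    """Return True when the p-value falls below the chosen alpha level."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    return p_value < alpha


# Hypothetical p-values, checked against three common alpha settings.
for p in (0.03, 0.008, 0.20):
    for alpha in (0.05, 0.01, 0.001):
        verdict = "significant" if is_significant(p, alpha) else "not significant"
        print(f"p = {p}, alpha = {alpha}: {verdict}")
```

A result with p = 0.03 counts as significant at an alpha of 0.05 but not at 0.01, which is why a study should state its alpha level before the data are analyzed.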

Referring to the research study by Coyne et al. (Coyne, Richards, Short, Shultz, and Singh, 2009) on hospital cost and efficiency and the effects of hospital size and ownership, the research question asked whether size and ownership type made a difference in the efficiency and cost results of hospitals in the State of Washington. The sample size was 100 hospitals. The dependent variables, efficiency and cost measures, used numbers with a true zero (cost, occupancy, revenue, etc.), so the data would be classified as ratio. The independent variables were hospital size and ownership type, which could be classified as ordinal for size (small, medium, large) and nominal for ownership type (not for profit; government owned). The statistic selected was an ANOVA (good choice), and the statistical significance was set at an alpha of 0.05. Read the results on page 168, Table 3, to determine which results were significant, or less than 0.05. Note that bed size made a significant difference in two of the efficiency ratios and two of the cost ratios. Also note that ownership type affected one of the efficiency ratios and two of the cost ratios. The interaction of the two independent variables affected two of the efficiency ratios.

Conclusion

Picking the correct statistic to answer the research question, fit the sample size, and fit the data types is an essential element of a useful and valid research study. When these are correctly done, the results and their statistical significance can give you a better sense of elements you want to translate into evidence-based practice based on solid research.

References

“Hospital Cost and Efficiency: Do Hospital Size and Ownership Really Matter?” by Coyne, Richards, Short, Shultz, and Singh, from the *Journal of Healthcare Management* (2009), located in the GCU eLibrary at http://library.gcu.edu:2048/login?url=http://proquest.umi.com.library.gcu.edu:2048/pqdweb?did=1752526591&sid=1&Fmt=4&clientId=48377&RQT=309&VName=PQD