The most important step in choosing the appropriate statistical procedure is to know what the variables of your study are. What are the independent and dependent variables of your study? How is each variable measured? Once you have a firm grasp of your variables, you can more easily choose the statistical procedure that will best answer your study's questions.

An Example

Five students are asked to design a study that will assess the relationship between using the Wii Fit and weight loss in a group of 150 overweight pre-teens during a month-long period. The weight of the participants is taken at the beginning and at the end of the study. The students come up with five different proposals:

  1. Student A proposes to randomly assign half of the group to the Wii intervention. This group will be instructed to do Wii Fit Aerobics training for 30 minutes five times a week. The other half would serve as the control and will not be told to do anything. Student A then wants to determine whether using the Wii Fit would lead to weight loss. Accordingly, she suggests that the participants be classified into one of two groups: no weight loss and weight loss.
  2. Student B also proposes to randomly assign half of the group to the Wii intervention. This group will be instructed to do Wii Fit Aerobics training for 30 minutes five times a week. The other half would serve as the control and will not be told to do anything. Student B then wants to determine whether using the Wii Fit would lead to weight loss. Thus, he suggests that weight loss should be defined in terms of the difference in weight prior to and immediately after the study.
  3. Student C proposes to randomly assign 30 pre-teens to each of four types of Wii interventions: Aerobics, Yoga, Strength Training, and Balance Games. These four groups will be instructed to do the specific Wii Fit activity for 30 minutes five times a week. The remaining 30 pre-teens would serve as the control and will not be told to do anything. Student C then wants to determine whether using the Wii Fit would lead to weight loss. Thus, she suggests that weight loss should be defined in terms of the difference in weight prior to and immediately after the study.
  4. Student D proposes to simply ask the 150 overweight pre-teens to record the number of minutes per day they spend using the Wii Fit. He then suggests that the participants be classified into one of two groups: no weight loss and weight loss.
  5. Student E also proposes to simply ask the 150 overweight pre-teens to record the number of minutes per day they spend using the Wii Fit. She then suggests that weight loss should be defined in terms of the difference in weight prior to and immediately after the study.

First Point: Variables of the Study

What are the variables of the study? In all five proposals, Wii Fit use is the independent variable and weight loss is the dependent variable.

Second Point: Definition or Measurement of the Variables

As the example above shows, there are several ways to define or measure the independent and dependent variables of a study. Two main questions need to be considered:

  1. Is the independent variable measured categorically or continuously?
  2. Is the dependent variable measured categorically or continuously?

Student A. Student A defined Wii Fit use in terms of using the Wii Fit or not using the Wii Fit. Accordingly, Student A defined Wii Fit use in terms of categories. She defined weight loss in terms of no weight loss or weight loss. Thus, her definition of weight loss was categorical.

Student B. Student B also defined Wii Fit use in terms of using the Wii Fit or not using the Wii Fit. Thus, he defined Wii Fit use in terms of categories. But Student B defined weight loss in terms of the difference between weight prior to the study and weight immediately after the study. Weight loss, therefore, was defined continuously.

Student C. Student C defined Wii Fit use in terms of the type of Wii Fit activity. Thus, she defined Wii Fit use in terms of five categories (the four activity types plus the control group). She also defined weight loss in terms of the difference between weight prior to the study and weight immediately after the study. Weight loss, therefore, was defined continuously.

Student D. Student D defined Wii Fit use in terms of the number of minutes per day spent using the Wii Fit. As such, Wii Fit use was defined continuously. He defined weight loss in terms of no weight loss or weight loss. Thus, his definition of weight loss was categorical.

Student E. Student E defined Wii Fit use in terms of the number of minutes per day spent using the Wii Fit. As such, Wii Fit use was defined continuously. Student E also defined weight loss in terms of the difference between weight prior to the study and weight immediately after the study. Weight loss, therefore, was defined continuously.
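
The two ways of defining weight loss used in the proposals can be sketched in a few lines of Python. This is only an illustration: the function names, the unit (kilograms), and the sample weights are hypothetical and do not come from any of the students' proposals.

```python
def weight_change(weight_before: float, weight_after: float) -> float:
    """Continuous definition: weight lost over the study (positive means loss)."""
    return weight_before - weight_after


def lost_weight(weight_before: float, weight_after: float) -> bool:
    """Categorical definition: True for 'weight loss', False for 'no weight loss'."""
    return weight_change(weight_before, weight_after) > 0


# Hypothetical participant: 60.0 kg at the start, 58.5 kg at the end.
print(weight_change(60.0, 58.5))  # continuous measure, as in proposals B, C, and E
print(lost_weight(60.0, 58.5))    # categorical measure, as in proposals A and D
```

The same pair of weights yields either a number or a category, which is why the definition has to be settled before a procedure is chosen.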

Third Point: Choosing the Appropriate Statistical Procedure

Given that independent and dependent variables can be classified as categorical or continuous, the grid below can be used to classify the more common statistical procedures.

   
                                          Dependent Variable
                            Categorical              Continuous

Independent   Categorical   Cross-tabulation         t-test
Variable                    Logistic regression      One-way ANOVA

              Continuous    Logistic regression      Correlation
                            Ordinal regression       Linear regression
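
The grid can also be read as a simple lookup from the two measurement types to the candidate procedures. The sketch below encodes it that way; the dictionary and the function name suggest_procedures are hypothetical names chosen for illustration.

```python
# Candidate procedures keyed by (independent variable type, dependent variable type),
# copied from the grid in the text.
PROCEDURES = {
    ("categorical", "categorical"): ["cross-tabulation", "logistic regression"],
    ("categorical", "continuous"): ["t-test", "one-way ANOVA"],
    ("continuous", "categorical"): ["logistic regression", "ordinal regression"],
    ("continuous", "continuous"): ["correlation", "linear regression"],
}


def suggest_procedures(independent: str, dependent: str) -> list:
    """Return the grid cell for the given measurement types."""
    key = (independent.lower(), dependent.lower())
    if key not in PROCEDURES:
        raise ValueError("each type must be 'categorical' or 'continuous'")
    return PROCEDURES[key]


# Student B: categorical independent variable, continuous dependent variable.
print(suggest_procedures("categorical", "continuous"))
```

The lookup only narrows the choice to a cell of the grid; picking within the cell still depends on details such as the number of categories.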

Student A. Student A could thus choose to perform either a cross-tabulation analysis or a logistic regression procedure. These tests are useful when the independent and dependent variables are measured categorically.
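
As a rough illustration of the cross-tabulation route, the sketch below builds a 2x2 table and computes a Pearson chi-square statistic by hand. The counts are made up for the example (they are not results from any study), and the helper name chi_square_statistic is hypothetical; in practice a statistics package would normally do this calculation and report a p-value.

```python
from collections import Counter

# Made-up data: 75 pre-teens per group with invented outcomes.
groups = ["wii"] * 75 + ["control"] * 75
outcomes = ["loss"] * 45 + ["no loss"] * 30 + ["loss"] * 30 + ["no loss"] * 45

# Cross-tabulation: count participants in each (group, outcome) cell.
table = Counter(zip(groups, outcomes))


def chi_square_statistic(table, rows, cols):
    """Pearson chi-square: sum of (observed - expected)^2 / expected over all cells."""
    n = sum(table[(r, c)] for r in rows for c in cols)
    row_totals = {r: sum(table[(r, c)] for c in cols) for r in rows}
    col_totals = {c: sum(table[(r, c)] for r in rows) for c in cols}
    statistic = 0.0
    for r in rows:
        for c in cols:
            expected = row_totals[r] * col_totals[c] / n
            statistic += (table[(r, c)] - expected) ** 2 / expected
    return statistic


chi2 = chi_square_statistic(table, ["wii", "control"], ["loss", "no loss"])
print(table)
print(chi2)  # compare with the chi-square critical value for 1 degree of freedom
```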

Student B. Student B would need to conduct an independent-samples t-test since his independent variable would be defined in terms of categories and his dependent variable would be measured continuously. An independent-samples t-test is used only when the independent variable has two categories.
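
The calculation behind a two-group t-test can be sketched as follows. This version assumes equal variances in the two groups (the pooled-variance form), uses invented weight-loss values, and introduces the hypothetical helper pooled_t_statistic; it returns only the t statistic and degrees of freedom, not a p-value.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Two-sample t statistic with pooled variance, plus its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / standard_error
    return t, df


# Invented weight-loss values (before minus after) for a handful of participants.
wii_group = [2.0, 3.0, 4.0, 5.0, 6.0]
control_group = [1.0, 2.0, 3.0, 4.0, 5.0]

t, df = pooled_t_statistic(wii_group, control_group)
print(t, df)
```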

Student C. Student C would need to conduct a one-way ANOVA since her independent variable would be defined in terms of categories and her dependent variable would be measured continuously. One-way ANOVAs are used when the independent variable has three or more categories.
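
A one-way ANOVA compares the variation between group means with the variation within groups. The sketch below computes the F statistic by hand for five groups; the data are invented, with only three values per group to keep the arithmetic visible, and the function name one_way_anova_f is hypothetical.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per category."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.mean(all_values)
    # Between-group sum of squares: group size times squared distance of each group mean.
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    # Within-group sum of squares: squared distance of each value from its group mean.
    ss_within = sum(
        (value - statistics.mean(group)) ** 2 for group in groups for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Invented weight-loss values for the four activities and the control group.
aerobics = [3.0, 4.0, 5.0]
yoga = [1.0, 2.0, 3.0]
strength = [2.0, 3.0, 4.0]
balance = [1.0, 2.0, 3.0]
control = [0.0, 1.0, 2.0]

f, df_between, df_within = one_way_anova_f([aerobics, yoga, strength, balance, control])
print(f, df_between, df_within)
```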

Student D. Student D would use a logistic regression procedure to analyze his data since his independent variable would be measured continuously and his dependent variable would be measured categorically. If Student D defined his dependent variable in terms of three or more categories that could be ranked (e.g., weight gain, no change in weight, weight loss), then he would use an ordinal regression procedure.
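
To make the idea of logistic regression concrete, here is a minimal sketch that fits a one-predictor logistic model by plain gradient ascent. It is not how a statistics package fits the model, the data are invented, and the names fit_logistic and predict_probability, the learning rate, and the step count are all assumptions made for the example.

```python
import math
import statistics


def fit_logistic(x, y, learning_rate=0.5, steps=2000):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * z))) by gradient ascent,
    where z is x standardized to mean 0 and standard deviation 1."""
    mean, sd = statistics.mean(x), statistics.pstdev(x)
    z = [(xi - mean) / sd for xi in x]
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for zi, yi in zip(z, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * zi)))
            grad0 += yi - p          # gradient of the log-likelihood for b0
            grad1 += (yi - p) * zi   # gradient of the log-likelihood for b1
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1, mean, sd


def predict_probability(minutes, model):
    """Predicted probability of weight loss for a given number of minutes per day."""
    b0, b1, mean, sd = model
    z = (minutes - mean) / sd
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * z)))


# Invented data: minutes per day on the Wii Fit and whether weight was lost (1) or not (0).
minutes_per_day = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
lost = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

model = fit_logistic(minutes_per_day, lost)
print(predict_probability(10, model), predict_probability(40, model))
```

The output of the model is a probability of falling in the weight-loss category, which is what makes it suitable for a categorical dependent variable.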

Student E. Student E could choose to perform either a Pearson correlation procedure or a linear regression procedure since both of her variables would be defined continuously. Usually, a correlation test is conducted when there is only one independent variable and one dependent variable. If Student E wanted to study the relationship between several independent variables (e.g., number of hours spent sleeping, number of calories consumed per day) and weight loss, then she would use a linear regression procedure.
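
For the case of one independent and one dependent variable, both the Pearson correlation and the least-squares regression line can be computed from the same sums. The sketch below does so by hand on invented data; the function name correlation_and_line is hypothetical, and a regression with several independent variables would need a statistics package rather than this one-predictor formula.

```python
import math
import statistics


def correlation_and_line(x, y):
    """Return (r, slope, intercept) for one predictor x and one outcome y."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    r = s_xy / math.sqrt(s_xx * s_yy)    # Pearson correlation coefficient
    slope = s_xy / s_xx                  # least-squares slope
    intercept = mean_y - slope * mean_x  # least-squares intercept
    return r, slope, intercept


# Invented data: minutes per day on the Wii Fit and weight lost over the month.
minutes_per_day = [10, 20, 30, 40, 50]
weight_lost = [1.0, 1.5, 2.5, 3.0, 4.5]

r, slope, intercept = correlation_and_line(minutes_per_day, weight_lost)
print(r, slope, intercept)
```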