## What are possible sources of bias/issues with how the data were selected?

Instructions: In this paper I want you to estimate the mean of some population. You can use either data you collect yourself or data you find on the internet. Your paper should include three sections: an introduction, where you introduce the question you are trying to answer and why you are interested in it; a methods section, where you discuss how the data were collected; and a results section, where you analyze the data. Below is a more detailed description of what should be included in each section.

I recommend collecting or finding data on something that interests you. You should also consider how easy it will be to find or collect the data you want. You only have about a week to complete this paper, so you want a way to get data quickly and easily. Sports are a good source of data. People generally collect a lot of data about sports stars and sporting events. It can also be easy to find metrics on your own sports performance (depending on the metric and the sport). For example, you could try to estimate your average tennis serve speed, or the average serve speed of a professional player. Another easy way to collect data would be to do a convenience survey of your classmates or other students at the school. Remember that the variable you are measuring should be quantitative, so that you can actually compute its mean.

The data you used, as well as the calculations you performed, should be in an Excel file you turn in with your paper. Clearly label the information contained in each cell and, as much as possible, reference cells containing numbers instead of typing values into functions. This will help me see your process.

There is no hard page limit for the paper, but I expect that if you touch on all of the points listed below, your write-up will be around 3 pages long. Remember that your goal is to demonstrate to me what you have learned in the course so far.

A couple of final remarks: Please have a rough draft of your paper prepared by Friday, July 8th, and bring a paper copy to class. Also, when writing your paper next week, you can assume that your sample standard deviation is the same as the population standard deviation. On Thursday next week, we will have a method for computing confidence intervals without this assumption, and it will be an easy edit to change your method.
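
As a rough sketch of what that assumption allows (the notation here is illustrative and may differ from what we use in class): with sample mean $\bar{x}$, sample size $n$, and the sample standard deviation $s$ standing in for the population standard deviation $\sigma$, a confidence interval for the population mean takes the form

$$\bar{x} \pm z^{*}\,\frac{s}{\sqrt{n}}$$

where $z^{*}$ is the standard normal critical value for the chosen confidence level: roughly 1.645 for 90%, 1.960 for 95%, and 2.576 for 99%.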

Introduction:

– What are you interested in investigating? Why?

– What question are you trying to answer?

Methods:

How were the data collected (by you or others)?

– If it was a sample, was it an SRS, convenience sample, etc?

– If it was an experiment, was it a randomized comparative experiment? Blinded? Matched pairs?

– What are possible sources of bias/issues with how the data were selected?

– How might this affect your results?

– I do not care if your data are collected with a good method or not, as long as you clearly recognize whether or not it is a bad method.

Results:

– Start by examining your data.

– Give the five number summary.

– Are the values normally distributed? Skewed?

– Include a histogram.

– Are there any outliers?

– Interpret.

– Describe the sampling distribution.

– Compute confidence intervals at various confidence levels.

– Interpret.

– Assuming your sample mean and standard deviation accurately estimate the mean and standard deviation for the population, compute the probability of a data point falling in various ranges. What are the top/bottom xth percentiles of the distribution?

– Interpret.

– Include any other visualizations you deem appropriate.

– What are possible sources of inaccuracies/error?
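
If you want to sanity-check your Excel numbers for the Results checklist, the calculations can be sketched in a few lines of Python. This is only a minimal sketch, not a required part of the assignment. The serve-speed values are invented for illustration, the function names are hypothetical, and the quartile convention used here may differ slightly from the one in your textbook or spreadsheet. The confidence intervals treat the sample standard deviation as if it were the population standard deviation, and the probability and percentile calculations assume a normal model built from the sample mean and standard deviation.

```python
from math import sqrt
from statistics import NormalDist, mean, quantiles, stdev

# Hypothetical sample: serve speeds in mph (made-up values, for illustration only).
data = [98, 102, 105, 107, 110, 111, 113, 115, 118, 121]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max). Quartile conventions vary by tool."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


def z_interval(values, level):
    """Confidence interval for the mean, treating the sample SD as the population SD."""
    z_star = NormalDist().inv_cdf(0.5 + level / 2)  # critical value for this level
    margin = z_star * stdev(values) / sqrt(len(values))
    return mean(values) - margin, mean(values) + margin


def prob_between(values, low, high):
    """P(low < X < high) under a normal model with the sample mean and SD."""
    model = NormalDist(mean(values), stdev(values))
    return model.cdf(high) - model.cdf(low)


def percentile_cutoff(values, p):
    """Value below which a fraction p of the normal model falls."""
    return NormalDist(mean(values), stdev(values)).inv_cdf(p)


print("Five-number summary:", five_number_summary(data))
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence interval:", z_interval(data, level))
print("P(100 < X < 120):", prob_between(data, 100, 120))
print("Bottom 10th percentile cutoff:", percentile_cutoff(data, 0.10))
print("Top 10th percentile cutoff:", percentile_cutoff(data, 0.90))
```

Notice that the interval widens as the confidence level rises, which is worth commenting on when you interpret your own intervals. A histogram is left out of the sketch because it needs a plotting tool; Excel is fine for that.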

Points Break-Down:

Introduction: 10

Methods: 20

Results: 40

Excel Sheet: 20

Visualizations: 5

Writing Style: 5

Visualizations:

– Are the axes at a good scale?

– Are they clearly labeled?

– Are they easy to read?

Writing Style: