The correlation and regression project is designed to:
(1.) Meet one of the learning objectives of the VCCS (Virginia Community College System) standards for:
(a.) MTH 155: Statistical Reasoning
(Presents elementary statistical methods and concepts including visual data presentation, descriptive statistics, probability, estimation, hypothesis testing, correlation, and linear regression.)
(b.) MTH 245: Statistics I
(Presents an overview of statistics, including descriptive statistics, elementary probability, probability distributions, estimation, hypothesis testing, correlation, and linear regression.)
(2.) Draw a scatter diagram of two variables using technology.
(3.) Calculate the Pearson linear correlation coefficient between two variables using formulas.
(4.) Determine the Pearson linear correlation coefficient between two variables using technology.
The technologies that we shall use are:
Texas Instruments (TI)-series calculator
Pearson Statcrunch
*Statistical Package for the Social Sciences (SPSS): Free use for VCCS students as virtual computing software
(Please see Virtual Computing Lab for SPSS)
*Google Spreadsheets: Free
*Microsoft Excel: Free for students
*Optional software
(5.) Calculate the linear regression equation between two linearly correlated variables using formulas.
(6.) Determine the least-squares regression line between two linearly correlated variables using technology.
The technologies that we shall use are:
Texas Instruments (TI)-series calculator
Pearson Statcrunch
*Statistical Package for the Social Sciences (SPSS): Free use for VCCS students as virtual computing software
(Please see Virtual Computing Lab for SPSS)
*Google Spreadsheets: Free
*Microsoft Excel: Free for students
*Optional software
(7.) Meet the QM (Quality Matters) and USDOE (United States Department of Education) requirements for distance
education as regards the provision of RSI (Regular and Substantive Interaction).
Federal Register: Distance Education and Innovation
St. John's University: New Federal Requirements for Distance Education: Regular and Substantive
Interaction (RSI)
Student – Content Interaction: Very high
Student – Student Interaction: Flexible
Student – Faculty Interaction: Very High
(1.) Data (Dataset): What dataset(s) should you use?
There are five approved ways to get the dataset(s).
Use any one of them or a combination of them.
(a.) Textbook (eBook) Datasets
You may use any of the applicable datasets from your eBook.
(b.) MyLab Math (MLM) Datasets
You may use any of the applicable datasets from your MLM assignments.
(c.) Datasets from the U.S. Government website: United States Government's Open Data: Datasets
You may use any of the applicable open datasets from the U.S. government.
(d.) RStudio Datasets:
You may use any of the applicable built-in datasets from RStudio.
(e.) Data Collection from my Students. (This option is for onsite (traditional/in-class) students only.)
Please see me in the Office during Student Engagement hours so we can discuss the data collection methods and other requirements.
Please Note:
(I.) Any other dataset besides the ones mentioned should be pre-approved by me.
(II.) If you cannot find any dataset, please contact me via my school email (not via the Canvas messaging system).
(2.) Type of Linear Relationship
The relationship between any two variables may be:
(a.) Positive Linear Correlation
(b.) Negative Linear Correlation
(c.) No Linear Correlation.
Applications of at least two of the three types of linear relationship are required.
This implies that a minimum of two applications is required for the project.
Please review the Example Guides for all three types of linear relationships.
Each application should be a word problem.
The predictor variable and the response variable should be real-world applied concepts.
(3.) Scatter Diagram
Using technology, draw a scatter diagram for each application.
Interpret the scatter diagram.
(4.) Linear Correlation Coefficient
Calculate the linear correlation coefficient using formulas.
Determine the linear correlation coefficient using technology.
Interpret the linear correlation coefficient.
(5.) Linear Regression
Interpret the variables in the linear regression equation.
(I.) Positive Linear Correlation and Negative Linear Correlation:
(a.) Write the least-squares regression line.
(b.) Predict the value of the response variable, given an applicable value of the predictor variable.
(II.) No Linear Correlation:
Because the linear regression equation is used for predictions only if the linear correlation coefficient indicates a linear correlation between the two variables,
determine the best value for the response variable given an applicable value of the predictor variable.
NOTE: Do not use any dataset that has a perfect positive linear correlation (r = 1) or a perfect negative linear correlation (r = −1).
This is because it defeats the purpose of the project.
The project is not designed for you to use the Slope-Intercept form of the equation of a straight line, $y = mx + b$.
It is designed for you to use the Least-Squares regression line (which is also written in slope-intercept form).
(6.) Sample Size
A sample size of at least 5 (n ≥ 5) is required for each application.
(7.) This is an individual project.
You may collaborate with one another. However, it is not a group project.
No two students should use the same dataset and same variable(s) because there are many datasets available for you.
Be that as it may, I am here to help you. You have my number. Feel free to text me anytime. We can arrange Zoom sessions so that I can share my screen and walk you through any questions you have. Please do this as soon as possible. Keep the due dates in mind.
(a.) Please submit the two datasets (names of the datasets including the sources) that you intend to work on, in the Projects: Datasets forum in the Canvas course. I shall review and respond.
(b.) Once I give you the approval, please send your draft to me via email (if you prefer my review to be seen by you alone) or submit your draft in the Projects: Drafts forum in the Canvas course (if you do not mind your colleagues reading my review).
I shall review and respond.
(c.) When everything is fine (after you make changes as applicable based on my feedback), please submit your work in the appropriate area (Assignments: Projects page) of the Canvas course.
Only projects submitted in the appropriate area of the Canvas course are graded.
Draft projects are not graded. In other words, projects submitted via email and/or in the Projects: Drafts forum are not graded because they are drafts.
Submitting drafts is not required, but it is highly recommended because I want to give you the opportunity to do your project very well and earn an excellent grade on it.
If your professor gives you an opportunity to submit a draft, please use that opportunity.
(8.) As a student, you have free access to the Microsoft Office suite of apps.
(a.) Please download the desktop apps of Microsoft Office on your desktop/laptop (Windows and/or Mac only).
Do not use a Chromebook. Do not use a tablet/iPad. Do not use a smartphone.
Do not use the web app/SharePoint access of Microsoft Office.
(Please contact the IT/Tech Support for assistance if you do not know how to download the desktop app.)
In that regard, the project is to be typed using the desktop version/app of Microsoft Office Word only.
(b.) The Microsoft Office Word project file should be named: firstName-lastName-project
Use only hyphens between your first name and your last name, and between your last name and the word project.
No spaces.
(c.) For all English terms/work (entire project): use Times New Roman; font size of 14; line spacing of 1.5.
Further, please make sure you have appropriate spacing between each heading and/or section as applicable.
Your work should be well-formatted and visually appealing.
(d.) For all Math terms/work: symbols, variables, numbers, formulas, expressions, equations and fractions among others,
the Math Equation Editor is required.
(i.) The font is set to Cambria Math by default (set it to that font if it is not); font size of 14, and
align accordingly (preferably left-aligned).
(ii.) To ensure appropriate spacing between your Math work, use a line spacing of 2.0.
Alternatively, you may use line spacing of 1.5 but insert a space after each equation as applicable.
Your work should be well-formatted, organized, well-spaced (not compact), and visually appealing.
(e.) Include page numbers. You may place them at the top of the pages or at the bottom of the pages, but not both.
(9.) All work must be shown.
Please write each formula before you use it.
If you use any variables, please define your variables accordingly in the context of your application.
(10.) All work must be turned in by the final due date to receive credit.
Please note the due dates listed in the course syllabus for the submission of the draft and the actual project.
In the course syllabus, we have the:
(a.) Initial due date for the Project Draft: Please turn in your draft.
(b.) Initial due date for the Project: If your draft is not ready for submission, keep working with me. Make changes
based on my feedback and keep working with me until I give you the green light to turn it in.
If you prefer not to turn in a draft, please review all the resources provided for you and do your project well and
submit.
(c.) Final due date for the Project Draft: This is necessary if you want written feedback on your draft.
After this date, written feedback will not be provided for your draft. However, verbal feedback will still be provided
during Office Hours/Student Engagement Hours/Live Sessions.
(d.) Final due date for the Project: All work must be turned in by this date to receive credit.
After this date, no work may be accepted.
Name: | Your name |
Date: | The date |
Instructor: | Samuel Chukwuemeka |
Project: |
(Please choose any two) (1.) Positive Linear Correlation: Scatter Diagram, Correlation Coefficient, Linear Regression (2.) Negative Linear Correlation: Scatter Diagram, Correlation Coefficient, Linear Regression (3.) No Linear Correlation: Scatter Diagram, Correlation Coefficient, Best Prediction |
1st Dataset: (Please write the name of the dataset and describe it.) | 1st Source: (Please write your source) |
2nd Dataset: (Please write the name of the dataset and describe it) | 2nd Source: (Please write your source) |
Objectives: |
(Please write specific objectives) (1.) (2.) (3.) (4.) (5.) |
Formulas: |
These are the Symbols and Formulas. You do not need to write all the formulas. Write only the formulas that you will use. Define any variable in the formula in the context of your application. |
Technology: |
(1.) Texas Instruments (TI) calculator (Set up the TI calculator) (2.) Google Spreadsheets (3.) Microsoft Excel (4.) Pearson Statcrunch |
Table: | Critical Values for Pearson's Linear Correlation Coefficient (based on degrees of freedom) |
References: |
Please cite your sources accordingly. Indicate the citation format. |
Symbol | Meaning |
---|---|
$X$ | dataset $X$ |
$x$ | $x$-values |
$\Sigma$ | summation (pronounced as uppercase Sigma) |
$\Sigma x$ | summation of the $x$-values |
$(\Sigma x)^2$ | square of the summation of the $x$-values |
$\Sigma x^2$ | summation of the squares of the $x$-values |
$\bar{x}$ | sample mean of the $x$-values |
$s$ | sample standard deviation |
$s_{x}$ | sample standard deviation of the $x$-values |
$z_{x}$ | $z$ score of an individual sample value $x$ |
$Y$ | dataset $Y$ |
$y$ | $y$-values |
$\Sigma y$ | summation of the $y$-values |
$(\Sigma y)^2$ | square of the summation of the $y$-values |
$\Sigma y^2$ | summation of the squares of the $y$-values |
$\bar{y}$ | sample mean of the $y$-values |
$s_{y}$ | sample standard deviation of the $y$-values |
$z_{y}$ | $z$ score of an individual sample value $y$ |
$\Sigma xy$ | sum of the products of the $x$-values and the corresponding $y$-values |
$n$ | sample size |
$r$ | Pearson's correlation coefficient |
$r^2$ | Coefficient of determination |
$\hat{y}$ | predicted values of $y$ |
$b_0$ | $y$-intercept of the least-squares regression line |
$b_1$ | slope of the least-squares regression line |
$b_2$ | slope of the least-squares regression line (For multiple linear regression) |
α | significance level |
df | degrees of freedom |
$
(1.)\;\; \bar{x} = \dfrac{\Sigma x}{n} \\[5ex]
(2.)\;\; \bar{y} = \dfrac{\Sigma y}{n} \\[5ex]
(3.)\;\; z_{x} = \dfrac{x - \bar{x}}{s_{x}} \\[5ex]
(4.)\;\; z_{y} = \dfrac{y - \bar{y}}{s_{y}} \\[5ex]
$
First Formula for Standard Deviation
$
(5.)\;\; s_{x} = \sqrt{\dfrac{\Sigma(x - \bar{x})^2}{n - 1}} \\[7ex]
(6.)\;\; s_{y} = \sqrt{\dfrac{\Sigma(y - \bar{y})^2}{n - 1}} \\[7ex]
$
Second Formula for Standard Deviation
$
(7.)\;\; s_{x} = \sqrt{\dfrac{n(\Sigma x^2) - (\Sigma x)^2}{n(n - 1)}} \\[7ex]
(8.)\;\; s_{y} = \sqrt{\dfrac{n(\Sigma y^2) - (\Sigma y)^2}{n(n - 1)}} \\[7ex]
$
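The two standard deviation formulas above are algebraically equivalent, so they should give the same result on any sample. The following is a minimal Python sketch of that check, not a required part of the project. The function names are illustrative only, and the data are the six $x$-values used in the worked table on this page.

```python
import math

def stdev_first_formula(values):
    """Formula (5): square root of the sum of squared deviations over n - 1."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def stdev_second_formula(values):
    """Formula (7): uses only n, the sum of x, and the sum of x squared."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(v * v for v in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

# The six x-values from the worked table on this page.
x_values = [12.1, 18.4, 20.7, 16.8, 15.2, 10.4]

s_first = stdev_first_formula(x_values)
s_second = stdev_second_formula(x_values)
print(round(s_first, 6), round(s_second, 6))
```

Both calls should agree with the value of $s_x$ shown in the worked table, up to rounding.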
First Formula for Pearson Correlation Coefficient
$
(9.)\;\; r = \dfrac{\Sigma\left(\dfrac{x - \bar{x}}{s_x}\right)\left(\dfrac{y - \bar{y}}{s_y}\right)}{n - 1} \\[7ex]
(10.)\;\; r = \dfrac{\Sigma\left(z_x\right)\left(z_y\right)}{n - 1} \\[5ex]
$
Second Formula for Pearson Correlation Coefficient
$
(11.)\;\; r = \dfrac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{\sqrt{n(\Sigma x^2) - (\Sigma x)^2} * \sqrt{n(\Sigma y^2) - (\Sigma y)^2}} \\[7ex]
$
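As a cross-check, the first formula (z-scores) and the second formula (raw sums) can be computed side by side. The sketch below is one possible way to do this in Python. The function names are made up for illustration, and the data are the six $(x, y)$ pairs from the worked table on this page.

```python
import math

def pearson_r_zscores(xs, ys):
    """Formula (9)/(10): sum of the products of z-scores, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    products = [((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

def pearson_r_sums(xs, ys):
    """Formula (11): built from n and the sums of x, y, xy, x^2 and y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# The six (x, y) pairs from the worked table on this page.
x_values = [12.1, 18.4, 20.7, 16.8, 15.2, 10.4]
y_values = [17.13, 23.75, 26.86, 19.83, 23.31, 9.15]

r_first = pearson_r_zscores(x_values, y_values)
r_second = pearson_r_sums(x_values, y_values)
print(round(r_first, 6), round(r_second, 6))
```

If the two results differ by more than rounding error, one of the intermediate sums was probably entered incorrectly.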
Slope of the Least-Squares Regression Line
$
(12.)\;\; b_1 = r * \dfrac{s_y}{s_x} \\[7ex]
$
Y-intercept of the Least-Squares Regression Line
$
(13.)\;\; b_0 = \bar{y} - b_1 * \bar{x} \\[3ex]
$
Least-Squares Regression Line (also called the Line of Best Fit or the Linear Regression Equation)
$
(14.)\;\; \hat{y} = b_{1}x + b_0 \\[3ex]
$
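Formulas (12) to (14) can be chained to go from raw data to a prediction. The sketch below shows one way to do that in Python. The function names and the small five-point dataset are hypothetical, chosen only so that the arithmetic is easy to verify by hand.

```python
import math

def least_squares_line(xs, ys):
    """Return (b0, b1) using b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    r = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    b1 = r * s_y / s_x            # formula (12): slope
    b0 = mean_y - b1 * mean_x     # formula (13): y-intercept
    return b0, b1

def predict(b0, b1, x):
    """Formula (14): predicted value of y for a given x."""
    return b1 * x + b0

# Hypothetical illustrative data (not from any course dataset).
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(x_values, y_values)
y_hat = predict(b0, b1, 3.5)      # 3.5 lies inside the observed range of x
print(round(b1, 4), round(b0, 4), round(y_hat, 4))
```

For this hypothetical data the slope works out to 0.6 and the intercept to 2.2, so the prediction at $x = 3.5$ is 4.3. Predictions are generally only sensible for values of the predictor variable within the range of the observed data.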
Unless stated otherwise:
(1.) Use α = 5% = 0.05 (Proportion in TWO Tails)
(2.) df = n − 2
(3.) n = df + 2
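The decision of whether to predict with the regression line can be sketched as follows. This is an illustration under stated assumptions, not a prescribed method: the function names are hypothetical, the critical value must be looked up by you in the Critical Values table for your own df, and the fallback of using $\bar{y}$ when there is no linear correlation follows a common textbook convention (for example, Triola). The value 0.811 used below is the commonly tabulated critical value for df = 4 at α = 0.05 (two tails); please confirm it against the table before relying on it.

```python
def degrees_of_freedom(n):
    """df = n - 2 for Pearson's linear correlation coefficient."""
    return n - 2

def is_linearly_correlated(r, critical_value):
    """Linear correlation is supported when |r| exceeds the table value."""
    return abs(r) > critical_value

def best_predicted_y(r, critical_value, b0, b1, mean_y, x):
    """Use the regression line only if the correlation is significant;
    otherwise fall back to the sample mean of y (assumed convention)."""
    if is_linearly_correlated(r, critical_value):
        return b1 * x + b0
    return mean_y

# r and n from the worked example on this page; 0.811 is an assumed
# table lookup for df = 4 at alpha = 0.05 (two tails).
df = degrees_of_freedom(6)
CRITICAL_R = 0.811
significant = is_linearly_correlated(0.9075648026, CRITICAL_R)
print(df, significant)
```

Because the critical value is passed in as an argument, the same functions work for any sample size once you have read the appropriate row of the table.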
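The first thing we need to do on the TI calculator is to turn Diagnostic On, so that the calculator reports $r$ and $r^2$ along with the regression results.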
$x$ | $x - \bar{x}$ | $(x - \bar{x})^2$ | $y$ | $y - \bar{y}$ | $(y - \bar{y})^2$ |
---|---|---|---|---|---|
12.1 | −3.5 | 12.25 | 17.13 | −2.875 | 8.265625 |
18.4 | 2.8 | 7.84 | 23.75 | 3.745 | 14.025025 |
20.7 | 5.1 | 26.01 | 26.86 | 6.855 | 46.991025 |
16.8 | 1.2 | 1.44 | 19.83 | −0.175 | 0.030625 |
15.2 | −0.4 | 0.16 | 23.31 | 3.305 | 10.923025 |
10.4 | −5.2 | 27.04 | 9.15 | −10.855 | 117.831025 |
$ \Sigma (x - \bar{x})^2 = 74.74 \\[3ex] n - 1 = 6 - 1 = 5 \\[3ex] s_{x} = \sqrt{\dfrac{\Sigma(x - \bar{x})^2}{n - 1}} \\[5ex] s_x = \sqrt{\dfrac{74.74}{5}} \\[5ex] s_x = \sqrt{14.948} \\[3ex] s_x = 3.866264347 $ | $ \Sigma (y - \bar{y})^2 = 198.06635 \\[3ex] n - 1 = 6 - 1 = 5 \\[3ex] s_{y} = \sqrt{\dfrac{\Sigma(y - \bar{y})^2}{n - 1}} \\[5ex] s_y = \sqrt{\dfrac{198.06635}{5}} \\[5ex] s_y = \sqrt{39.61327} \\[3ex] s_y = 6.293907371 $ |
$x - \bar{x}$ | $\dfrac{x - \bar{x}}{s_x}$ | $y - \bar{y}$ | $\dfrac{y - \bar{y}}{s_y}$ | $\left(\dfrac{x - \bar{x}}{s_x}\right)\left(\dfrac{y - \bar{y}}{s_y}\right)$ |
---|---|---|---|---|
−3.5 | −0.905266605 | −2.875 | −0.456790961 | 0.4135176030 |
2.8 | 0.724213284 | 3.745 | 0.595019878 | 0.430921300 |
5.1 | 1.31910276 | 6.855 | 1.08914853 | 1.43669884 |
1.2 | 0.310377121 | −0.175 | −0.02780466 | −0.00862993 |
−0.4 | −0.10345904 | 3.305 | 0.525111001 | −0.05432748 |
−5.2 | −1.3449675 | −10.855 | −1.7246837 | 2.31964368 |
$ \Sigma\left(\dfrac{x - \bar{x}}{s_x}\right)\left(\dfrac{y - \bar{y}}{s_y}\right) = 4.537824013 \\[7ex] n - 1 = 5 \\[5ex] r = \dfrac{\Sigma\left(\dfrac{x - \bar{x}}{s_x}\right)\left(\dfrac{y - \bar{y}}{s_y}\right)}{n - 1} \\[7ex] r = \dfrac{4.537824013}{5} \\[5ex] r = 0.9075648026 $ |
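The same example can be carried one step further to the least-squares regression line by applying formulas (12), (13) and (14). The lines below are a sketch of that continuation, not part of the original worked example. The means $\bar{x} = 15.6$ and $\bar{y} = 20.005$ are recovered from the deviation columns of the first table, and the intercept is computed with the unrounded slope.

$
\bar{x} = 15.6 \\[3ex]
\bar{y} = 20.005 \\[3ex]
b_1 = r * \dfrac{s_y}{s_x} \\[5ex]
b_1 = 0.9075648026 * \dfrac{6.293907371}{3.866264347} \\[5ex]
b_1 \approx 1.477428 \\[3ex]
b_0 = \bar{y} - b_1 * \bar{x} \\[3ex]
b_0 \approx 20.005 - 23.047883 \\[3ex]
b_0 \approx -3.042883 \\[3ex]
\hat{y} \approx 1.4774x - 3.0429
$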
Dataset:
Source:
Description:
Does the ... depend on the ..., or does the ... depend on the ...?
Independent Variable:
Dependent Variable:
Scatter Diagram using Pearson Statcrunch
Dataset:
Source:
Description:
Does the ... depend on the ..., or does the ... depend on the ...?
Independent Variable:
Dependent Variable:
Scatter Diagram using Pearson Statcrunch
Chukwuemeka, Samuel Dominic (2023). R and RStudio Statistics Software.
Retrieved from https://statistical-science.appspot.com/
Sullivan, M., & Barnett, R. (2013). Statistics: Informed decisions using data with an introduction to mathematics of finance
(2nd custom ed.). Boston: Pearson Learning Solutions.
Triola, M. F. (2015). Elementary Statistics using the TI-83/84 Plus Calculator
(5th ed.). Boston: Pearson.
Triola, M. F. (2022). Elementary Statistics (14th ed.). Hoboken: Pearson.
Critical Values for Pearson’s Correlation Coefficient. (n.d.). http://commres.net/wiki/_media/correlationtable.pdf