
Correlation

 
Lecture Goals

 
You should be able to:

    - read and construct scatter plots and the line of best fit
    - identify types of relationships
    - calculate the correlation coefficient
    - calculate the coefficient of determination
    - understand the limits of correlation
Reading

Chapter 6
 

Introduction
 

Interest in the correspondence or relationship between 2 variables

Examples:

Similarity and attraction
 
SAT scores and collegiate GPA

  Smoking and longevity

Education level and salary

Study time and grades

 
Relationship between study time and exam scores???

 
Student     Study time (hrs)     Exam score (%)

    A               8                   88
    B               4                   75
    C               7                   85
    D               5                   78
    E              10                   94
    F              12                  100
    G               6                   82
    H               3                   71
 
 

Scatter Plots

Treat each student's pair of scores as an (x, y) coordinate and plot that point

Student     Study time (hrs)     Exam score (%)     Coordinate

    A               8                   88              (8,88)
    B               4                   75              (4,75)
    C               7                   85              (7,85)

     


 
(graph: scatter plot of study time vs. exam score for students A to H)

 

Line of Best Fit

The straight line that best captures the overall pattern of the plotted points
 

(graph)
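The notes do not say what makes a line "best." A common criterion is ordinary least squares: choose the line that minimizes the sum of squared vertical distances from the points to the line. A minimal Python sketch under that assumption, using the study-time data above (the function name `least_squares_line` is illustrative, not from the course):

```python
from statistics import mean


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    # SP: sum of products of deviations; SSx: sum of squared x deviations
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    slope = sp / ss_x
    intercept = my - slope * mx  # the line passes through (mean x, mean y)
    return slope, intercept


# Study time (hours) and exam score (%) for students A to H
study = [8, 4, 7, 5, 10, 12, 6, 3]
score = [88, 75, 85, 78, 94, 100, 82, 71]

b, a = least_squares_line(study, score)
print(f"score = {a:.2f} + {b:.2f} * study time")  # score = 62.18 + 3.19 * study time
```

The positive slope matches the bottom-left to top-right pattern of the scatter plot.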

 

2 Primary Characteristics

1) Slope direction
 

Positive Slope

          From bottom left to top right
 
 

    (graph)

= Positive Correlation
= Positive Relationship
= Direct Correlation

As one variable goes up, the other goes up

As one variable goes down, the other goes down

 

    (graph)

 

Negative Slope

 

From top left to bottom right

 

(graph)

= Negative Correlation
= Negative Relationship
= Inverse Correlation

As one variable goes up, the other goes down

As one variable goes down, the other goes up

 

(graph)
 

Line of Best Fit
 

2 Primary Characteristics

 
1) Slope direction
 

2) Quality of fit

 

The best-fitting line is not necessarily a good fit (e.g., buying shoes: the best pair in the store may still not fit well)
 
 

Perfect fit = points form a perfect straight line

Good fit = points generally clustered around the line of best fit

Poor fit = points not clustered around the line of best fit
Perfect Fit

          Shows that the relationship between the 2 variables is perfectly systematic

    (graph)

Good Fit

          Shows that the relationship between the 2 variables is somewhat systematic

    (graph)

Poor Fit

          Shows that the relationship between the 2 variables is not very systematic

    (graph)

Really Poor Fit

          Shows that the relationship between the 2 variables is not systematic at all

    (graph)

Correlation Coefficient

    = descriptive statistic that measures the amount of relationship between 2 variables
    = tells the same story as the line of best fit
    = single number that ranges from -1 to +1
    = r (Pearson's r; the Greek letter ρ, rho, usually denotes the population correlation)

Sign of the number = direction of the slope of the line of best fit

= indicates the type of relationship (positive or negative)

When r is between 0 and +1 = positive slope

(graph)

When r is between -1 and 0 = negative slope

(graph)

Magnitude of the number

= how tightly the points cluster around the line of best fit
= how systematic the relationship is
= how strong the relationship is
 

 
********************************************************************

When |r| = 1.00 = perfect fit

    Perfect relationship between 2 vars.
    (graph)

***************

When |r| = .75 to .99 = high fit

    Strong relationship between 2 vars.
    (graph)

***************

When |r| = .30 to .75 = moderate fit

    Moderate relationship between 2 vars.
    (graph)

***************

When |r| = .01 to .30 = poor fit

    Weak relationship between 2 vars.
    (graph)

***************

When r = 0 = no fit

    No (linear) relationship between 2 vars.
    (graph)
********************************************************************
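The bands above can be written as a small labeling function. This is a minimal Python sketch; the name `describe_r` is hypothetical, and because the notes' ranges share their endpoints (.30 and .75), assigning each endpoint to the stronger band is an assumption.

```python
import math


def describe_r(r):
    """Return a verbal label for a correlation coefficient r.

    Cutoffs follow the bands in these notes; assigning the shared
    endpoints (.30 and .75) to the stronger band is an assumption.
    """
    if abs(r) > 1 + 1e-9:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # magnitude gives strength; the sign only gives direction
    if math.isclose(size, 1.0):
        strength = "perfect"
    elif size >= 0.75:
        strength = "high"
    elif size >= 0.30:
        strength = "moderate"
    elif size > 0:
        strength = "weak"
    else:
        return "no relationship"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} relationship"


print(describe_r(0.80))   # high positive relationship
print(describe_r(-0.25))  # weak negative relationship
print(describe_r(0.0))    # no relationship
```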
 

Magnitude and sign are independent (a negative r can be just as strong as a positive r)

(graph)
 

                         Positive     Negative
Perfect Correlation:      +1.00        -1.00
High Correlation:          +.80         -.80
Moderate Correlation:      +.50         -.50
Low Correlation:           +.25         -.25
No Correlation:              0            0
 
 

Calculating the Corr. Coefficient
 
 

Looking for systematic trends:
 

    Both variables increase together

    Both variables decrease together

    One increases while other decreases

    One decreases while other increases

 
Let's start by ordering the scores (by study time)

Student     Study time (hrs)     Exam score (%)

    H               3                   71
    B               4                   75
    D               5                   78
    G               6                   82
    C               7                   85
    A               8                   88
    E              10                   94
    F              12                  100

 
Same idea with # drinks, ordering the scores by # drinks

Student     # Drinks     Exam score (%)

    H            3              100
    B            4               94
    D            5               88
    G            6               85
    C            7               82
    A            8               78
    E           10               75
    F           12               71

   

Apples & Oranges Problem!!

 

Can't do math across sets of numbers measured in different units

 

Examples:

# drinks (pints) vs. score (%)

 

# hours studied vs. score (%)

 

 
Solution = convert all numbers to z-scores
NOW all scores are in the same units (distance from their mean, measured in standard deviations)

 

Remember - the z-score transformation does NOT
change the shape of the distribution of scores

 

(formula)
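The formula image on this slide did not survive. The standard z-score definition is z = (X - mean) / SD. A minimal Python sketch, assuming the sample standard deviation (n - 1 in the denominator): that choice reproduces the z-scores ±1.34, ±0.80, ±0.27 used in the tables below for six equally spaced scores, but the course's exact convention is not stated here. The helper name `z_scores` is illustrative.

```python
from statistics import mean, stdev


def z_scores(values):
    """Convert raw scores to z-scores: distance from the mean in SD units.

    Uses the sample standard deviation (n - 1); this is an assumption
    about which SD the course uses.
    """
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


study = [3, 4, 5, 6, 7, 8, 10, 12]          # hours
score = [71, 75, 78, 82, 85, 88, 94, 100]   # percent

z_study = z_scores(study)
z_score = z_scores(score)

# Both lists are now unit-free, with mean 0 and SD 1 (up to rounding)
print(round(mean(z_study), 6), round(stdev(z_study), 6))

# Equally spaced raw scores stay equally spaced as z-scores (same shape)
print([round(z, 2) for z in z_scores([1, 2, 3, 4, 5, 6])])  # [-1.34, -0.8, -0.27, 0.27, 0.8, 1.34]
```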

 

 

When Both Variables Increase Together

Student     Study time (zx)     Exam score (zy)     Sign of zx·zy

    1             -1.34              -1.34                +
    2             -0.80              -0.80                +
    3             -0.27              -0.27                +
    4             +0.27              +0.27                +
    5             +0.80              +0.80                +
    6             +1.34              +1.34                +

Sum of products = +, so Correlation Coefficient = +

 

When Both Variables Decrease Together

Student     Study time (zx)     Exam score (zy)     Sign of zx·zy

    1             +1.34              +1.34                +
    2             +0.80              +0.80                +
    3             +0.27              +0.27                +
    4             -0.27              -0.27                +
    5             -0.80              -0.80                +
    6             -1.34              -1.34                +

Sum of products = +, so Correlation Coefficient = +

 

Student     Study time (zx)     Exam score (zy)     zx·zy

    1             -1.34              -1.34          1.7956
    2             -0.80              -0.80           .64
    3             -0.27              -0.27           .0729
    4             +0.27              +0.27           .0729
    5             +0.80              +0.80           .64
    6             +1.34              +1.34          1.7956

 

(formula)
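The formula on this slide did not survive. One standard definitional form that matches the sample-SD z-scores in the tables above is r = Σ(zx·zy) / (n - 1); textbooks that use population SDs divide by N instead, which gives the same r. A minimal sketch assuming the n - 1 version; the raw scores below are hypothetical (only their z-scores appear in the notes), and the function names are illustrative.

```python
from statistics import mean, stdev


def z_scores(values):
    """z-scores using the sample standard deviation (n - 1 denominator)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pearson_r(xs, ys):
    """Definitional formula: r = sum(zx * zy) / (n - 1)."""
    products = [zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))]
    return sum(products) / (len(xs) - 1)


# Hypothetical raw scores for students 1-6, in different units;
# both sets give z-scores of -1.34, -0.80, -0.27, +0.27, +0.80, +1.34.
study_time = [1, 2, 3, 4, 5, 6]
exam_score = [70, 75, 80, 85, 90, 95]

print(round(pearson_r(study_time, exam_score), 4))  # 1.0
```

Using the rounded products from the table instead, the sum is 2 × (1.7956 + .64 + .0729) = 5.017, and 5.017 / 5 ≈ 1.003; the small excess over 1 is rounding error.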

 
 
 

When One Increases While the Other Decreases

Student     Study time (zx)     Exam score (zy)     Sign of zx·zy

    1             +1.34              -1.34                -
    2             +0.80              -0.80                -
    3             +0.27              -0.27                -
    4             -0.27              +0.27                -
    5             -0.80              +0.80                -
    6             -1.34              +1.34                -

Sum of products = -, so Correlation Coefficient = -

 

Student     Study time (zx)     Exam score (zy)     zx·zy

    1             +1.34              -1.34         -1.7956
    2             +0.80              -0.80          -.64
    3             +0.27              -0.27          -.0729
    4             -0.27              +0.27          -.0729
    5             -0.80              +0.80          -.64
    6             -1.34              +1.34         -1.7956

 
(formula)
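Applying the same z-score definition to the two ordered 8-student tables from earlier shows the sign of r tracking the direction of the relationship. A minimal sketch under the same sample-SD assumption; the function name is illustrative.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """r = sum(zx * zy) / (n - 1), with sample-SD z-scores."""
    mx, sx = mean(xs), stdev(xs)
    my, sy = mean(ys), stdev(ys)
    products = (((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


# Ordered tables from earlier in these notes
study = [3, 4, 5, 6, 7, 8, 10, 12]
drinks = [3, 4, 5, 6, 7, 8, 10, 12]
score_by_study = [71, 75, 78, 82, 85, 88, 94, 100]
score_by_drinks = [100, 94, 88, 85, 82, 78, 75, 71]

print(round(pearson_r(study, score_by_study), 3))    # 0.999  (high positive)
print(round(pearson_r(drinks, score_by_drinks), 3))  # -0.969 (high negative)
```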

 

Computational Formula

 

(formula)
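The computational formula on this slide is also missing. A widely used raw-score form, algebraically equal to the z-score definition, is r = [NΣXY - (ΣX)(ΣY)] / √{[NΣX² - (ΣX)²][NΣY² - (ΣY)²]}; whether this is the exact form the course uses is an assumption. A minimal Python sketch (the function name is illustrative):

```python
import math


def pearson_r_computational(xs, ys):
    """Raw-score formula: needs only running sums, no means or z-scores."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Study time (hours) and exam score (%) for students A to H
study = [8, 4, 7, 5, 10, 12, 6, 3]
score = [88, 75, 85, 78, 94, 100, 82, 71]

print(round(pearson_r_computational(study, score), 3))  # 0.999
```

For these data the sums give a numerator of 8(4834) - (55)(673) = 1657, matching the z-score method's result.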

 

Try it

 

 

 

 

 

 

Given a high positive relationship between # drinks and # errors...
 

Q - Can we use this relationship to predict # errors if we know # drinks?

A - Maybe...

 

 

Correlation and Variability

 

Variability
 

    = change in variable

    = change in # errors

 

To what extent did change in X account for change in Y??

To what extent did change in # drinks account for change in # errors?
 

r²

= coefficient of determination

= proportion of the variability in Y accounted for by X

= e.g., (.73)² ≈ .53, so 53% of the change in # errors is accounted for by # drinks, and 47% of the variance is NOT accounted for by # drinks
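The arithmetic for r² is short enough to check directly. A tiny Python sketch using r = .73 from the drinks/errors example (the function name is illustrative):

```python
def coefficient_of_determination(r):
    """r squared: proportion of the variability in Y accounted for by X."""
    return r ** 2


r = 0.73  # drinks vs. errors, from the example above
explained = coefficient_of_determination(r)
print(f"explained:     {explained:.0%}")      # 53%
print(f"not explained: {1 - explained:.0%}")  # 47%
```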

 

Cautions

 

Line of best fit and curvilinear relationships

  (graph)

 
r = for linear relationships only!! (a strong curved relationship can still produce an r near 0)
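A quick way to see why r is for linear relationships only: in the hypothetical U-shaped data below, y is completely determined by x, yet r comes out to exactly 0 because the rising and falling halves cancel. A minimal Python sketch (the data and function name are illustrative):

```python
import math


def pearson_r(xs, ys):
    """Pearson r via deviation sums: SP / sqrt(SSx * SSy)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical curvilinear data: y = x squared (a perfect, but curved, relationship)
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

print(pearson_r(x, y))  # 0.0
```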

   

If a change in one variable causes a change in the other, the 2 variables MUST be related

 

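Correlation does not necessarily mean causation

A correlation between X and Y could be because:

1) the relationship is spurious (a coincidence)

2) X causes Y

3) Y causes X

4) a third variable causes both X and Y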

   
 
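Restricted range (r computed over only a narrow range of scores can understate the relationship seen over the full range)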
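A minimal Python sketch of restricted range, using hypothetical data with a clear upward trend plus some scatter: keeping only the middle of the X range lowers r even though the underlying relationship is unchanged. The data, the 4-to-7 cutoff, and the function name are all illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson r via deviation sums: SP / sqrt(SSx * SSy)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical data: a clear upward trend with some scatter
x = list(range(1, 11))
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
full = pearson_r(x, y)

# Keep only the cases with x between 4 and 7 (a restricted range)
pairs = [(xi, yi) for xi, yi in zip(x, y) if 4 <= xi <= 7]
restricted = pearson_r([xi for xi, _ in pairs], [yi for _, yi in pairs])

print(round(full, 2), round(restricted, 2))  # 0.94 0.87
```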
 
Homework:

 
Chapter 6 Problems 2-4, 6-8, 11, 14, 15

Information contained on this page does not represent the lecture verbatim.
These notes are not a substitute for class attendance.



Copyright 1998.
Questions?  Email: info@yournotes.com