Data Science for Psychology ([ds4psy](https://bookdown.org/hneth/ds4psy/1.6-basics:advanced-issues.html))

I am following the steps found at ds4psy.

This section covers importing data from files, factors and vectors, random sampling, lists and flow control.

We will be importing data from an article in the Journal of Clinical Psychology by Rosalind J. Woodworth, Angela O’Brien-Malone, Mark R. Diamond, and Benjamin Schüz.

A paper by Seligman et al. showed that positive psychology interventions (PPIs) delivered through the web could increase happiness and decrease depressive symptoms relative to a placebo control. Woodworth et al. re-evaluated this claim by measuring the long-term effectiveness of different PPIs.

In the steps below, we import the data in two ways. The first reads the CSV file directly from the web address where it is stored; the second loads the same data from the ds4psy R package.

# step 1: import a CSV file with 6 columns (id, intervention, sex, age, educ, income). Use 'read_csv' since the file is comma-separated ('read_csv2' would expect ';' as the separator).

p_info <- readr::read_csv(file = "http://rpository.com/ds4psy/data/posPsy_participants.csv")

# see the first 6 rows of the tibble

head(p_info)
## # A tibble: 6 × 6
##      id intervention   sex   age  educ income
##   <dbl>        <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1     1            4     2    35     5      3
## 2     2            1     1    59     1      1
## 3     3            4     1    51     4      3
## 4     4            3     1    50     5      2
## 5     5            2     2    58     5      2
## 6     6            1     1    31     5      1
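The step-1 comment hinges on the field separator: `read_csv()` expects commas, while `read_csv2()` expects semicolons (and decimal commas). A minimal sketch of the same distinction, swapping in base R's `read.csv()` and `read.csv2()` for the readr functions and using two small hypothetical inline strings so it runs without network access:

```r
# Hypothetical inline data: the same two rows, once comma- and once semicolon-separated
csv_comma <- "id,age\n1,35\n2,59"
csv_semi  <- "id;age\n1;35\n2;59"

d_comma <- read.csv(text = csv_comma)   # read.csv() splits fields on ","
d_semi  <- read.csv2(text = csv_semi)   # read.csv2() splits fields on ";" (decimal mark ",")

identical(d_comma, d_semi)              # TRUE: both give the same 2 x 2 data frame

# Using the wrong reader leaves each line unsplit in a single column
d_wrong <- read.csv(text = csv_semi)
ncol(d_wrong)                           # 1
```

If a file read with the wrong function comes back with a single oddly named column, the separator is the first thing to check.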

Data from an R package

We can also retrieve the data from an R package, as we’ve done before.

# install.packages("ds4psy")  # installs the 'ds4psy' package
library(ds4psy)               # loads the 'ds4psy' package

p_info_2 <- posPsy_p_info
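A dataset shipped with a package can also be used without attaching the package, via `pkg::name`, or copied into a chosen environment with `data()`. A sketch assuming only base R: it uses the built-in `ToothGrowth` data from the `datasets` package as a stand-in, and touches `ds4psy::posPsy_p_info` only if ds4psy happens to be installed.

```r
# Option 1: refer to the dataset directly with pkg::name (no library() call needed)
tg_1 <- datasets::ToothGrowth

# Option 2: load the dataset into a fresh environment with data()
data_env <- new.env()
data("ToothGrowth", package = "datasets", envir = data_env)
tg_2 <- data_env$ToothGrowth

identical(tg_1, tg_2)   # both routes should yield the same data frame

# The same pattern for the posPsy data, skipped if ds4psy is not installed
if (requireNamespace("ds4psy", quietly = TRUE)) {
  p_info_pkg <- ds4psy::posPsy_p_info
  print(dim(p_info_pkg))
}
```

The `::` route avoids putting every object of the package on the search path, which keeps it clear where `posPsy_p_info` came from.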

Because both objects should contain the same data, obtained from two different sources, we compare them:

all.equal(p_info, p_info_2)
## [1] "Attributes: < Names: 2 string mismatches >"                                  
## [2] "Attributes: < Length mismatch: comparison on first 3 components >"           
## [3] "Attributes: < Component 2: target is externalptr, current is numeric >"      
## [4] "Attributes: < Component 3: Modes: numeric, list >"                           
## [5] "Attributes: < Component 3: Lengths: 295, 3 >"                                
## [6] "Attributes: < Component 3: names for current but not for target >"           
## [7] "Attributes: < Component 3: Attributes: < target is NULL, current is list > >"
## [8] "Attributes: < Component 3: target is numeric, current is col_spec >"
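All of these messages start with `Attributes:`, i.e., they concern metadata that `readr::read_csv()` attaches to its result (such as the `spec` and `problems` attributes that `str()` reveals further down), not the data values themselves. `all.equal()` accepts `check.attributes = FALSE` to compare values only; for the real objects that would be `all.equal(p_info, p_info_2, check.attributes = FALSE)`, whose result is not shown here. A minimal sketch on two hypothetical toy data frames, where one copy gets an extra attribute to mimic readr's metadata:

```r
# Two hypothetical data frames with identical values (toy data, not the posPsy data)
df_a <- data.frame(id = 1:3, age = c(35, 59, 51))
df_b <- data.frame(id = 1:3, age = c(35, 59, 51))

# Mimic readr's extra metadata by attaching an arbitrary attribute to one copy
attr(df_a, "spec") <- "hypothetical column specification"

all.equal(df_a, df_b)                              # character vector describing the attribute mismatch
all.equal(df_a, df_b, check.attributes = FALSE)    # TRUE: the values agree
```

Because `all.equal()` returns either `TRUE` or a character vector, wrap it in `isTRUE()` when the result is used in an `if` condition.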

Checking a dataset

This allows us to get familiar with data frames, tables, and tibbles.

dim(p_info)              # 295 rows, 6 columns
## [1] 295   6
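`dim()` returns the numbers of rows and columns as one vector; `nrow()` and `ncol()` return them separately, which is handy inside further computations. A small sketch on a hypothetical toy data frame standing in for `p_info`:

```r
# Hypothetical toy data frame (4 rows, 3 columns)
toy <- data.frame(id = 1:4, age = c(35, 59, 51, 50), sex = c(2, 1, 1, 1))

dim(toy)                   # 4 3  (rows, columns)
nrow(toy)                  # 4
ncol(toy)                  # 3
dim(toy)[1] == nrow(toy)   # TRUE: the first element of dim() is the row count
```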
p_info                   # prints the tibble: its size, column types, and the first 10 rows
## # A tibble: 295 × 6
##       id intervention   sex   age  educ income
##    <dbl>        <dbl> <dbl> <dbl> <dbl>  <dbl>
##  1     1            4     2    35     5      3
##  2     2            1     1    59     1      1
##  3     3            4     1    51     4      3
##  4     4            3     1    50     5      2
##  5     5            2     2    58     5      2
##  6     6            1     1    31     5      1
##  7     7            3     1    44     5      2
##  8     8            2     1    57     4      2
##  9     9            1     1    36     4      3
## 10    10            2     1    45     4      3
## # ℹ 285 more rows
str(p_info)              # shows the structure of an R object
## spc_tbl_ [295 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ id          : num [1:295] 1 2 3 4 5 6 7 8 9 10 ...
##  $ intervention: num [1:295] 4 1 4 3 2 1 3 2 1 2 ...
##  $ sex         : num [1:295] 2 1 1 1 2 1 1 1 1 1 ...
##  $ age         : num [1:295] 35 59 51 50 58 31 44 57 36 45 ...
##  $ educ        : num [1:295] 5 1 4 5 5 5 5 4 4 4 ...
##  $ income      : num [1:295] 3 1 3 2 2 1 2 2 3 3 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   id = col_double(),
##   ..   intervention = col_double(),
##   ..   sex = col_double(),
##   ..   age = col_double(),
##   ..   educ = col_double(),
##   ..   income = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
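`str()` lists every column as `num`, although `intervention`, `sex`, `educ`, and `income` hold numeric codes for categories. Since this section also covers factors, here is a minimal sketch of recoding such a coded variable as a factor. It uses the first eight `intervention` values from the `str()` output; no level labels are assigned, because the meaning of each code is not given here.

```r
# First eight intervention codes, as listed in the str() output
intervention_codes <- c(4, 1, 4, 3, 2, 1, 3, 2)

intervention_f <- factor(intervention_codes)   # levels are the sorted unique codes
str(intervention_f)      # Factor w/ 4 levels "1","2","3","4": 4 1 4 3 2 1 3 2
levels(intervention_f)   # "1" "2" "3" "4"
table(intervention_f)    # counts per level: each code appears twice in these eight values
```

Treating the codes as a factor makes R handle them as categories (e.g., in `table()` or model formulas) rather than as quantities that could be averaged.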
tibble::glimpse(p_info)  # shows the types and initial values of all variables (columns)
## Rows: 295
## Columns: 6
## $ id           <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, …
## $ intervention <dbl> 4, 1, 4, 3, 2, 1, 3, 2, 1, 2, 2, 2, 4, 4, 4, 4, 3, 2, 1, 3, 3, 4, 1, 3,…
## $ sex          <dbl> 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ age          <dbl> 35, 59, 51, 50, 58, 31, 44, 57, 36, 45, 56, 46, 34, 41, 27, 31, 44, 40,…
## $ educ         <dbl> 5, 1, 4, 5, 5, 5, 5, 4, 4, 4, 5, 4, 5, 1, 2, 1, 4, 5, 3, 4, 4, 4, 4, 4,…
## $ income       <dbl> 3, 1, 3, 2, 2, 1, 2, 2, 3, 3, 1, 3, 3, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 3,…
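`glimpse()` comes from the tibble package. With only base R at hand, `vapply()` combined with `class()` lists each column's type, and `summary()` adds a per-column numeric overview. A sketch on a toy data frame built from the first five rows shown above (four of the six columns):

```r
# Toy copy of the first five rows of p_info (id, intervention, sex, age)
toy <- data.frame(
  id           = c(1, 2, 3, 4, 5),
  intervention = c(4, 1, 4, 3, 2),
  sex          = c(2, 1, 1, 1, 2),
  age          = c(35, 59, 51, 50, 58)
)

vapply(toy, class, character(1))   # type of each column: all "numeric"
summary(toy)                       # min, quartiles, mean, and max per column
colMeans(toy)                      # column means; mean age of these five rows is 50.6
```

Note that `summary()` happily reports means for coded columns such as `intervention` and `sex`, which is another reason to convert them to factors.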

Learning a data set

  1. Obtain a description of the variables and their values.