run_analysis.R README

ken r.
Sept. 21, 2015

##Background In December, 2012 scientists of the University of Genova's Smartlab Research Laboratory conducted a series of experiments designed to measure human movement via smartphone technology. Utilizing the smartphone's built-in accelerometer and gyroscope, the scientists were able to measure six activities for each of 30 volunteers. These experiments culminated in a data set the scientists labeled "Human Activity Recognition Using Smartphones (HARUS) Data Set Version 1.0".

##Code Description When executed, run_analysis.r proceeds to subset the original HARUS data set, creating a new, much smaller data set of mean and standard deviation activity measurements for each subject. Each variable in the new data set will be labeled with descriptive activity names. Using this new data set as input, the script creates a second, independent data set containing the average of each variable for each activity, for each subject. The output of this script will be analysis_output.txt; a tidy, wide dataset:

180 rows x 84 variables. There are 30 subjects. Each subject is tied to 6 rows of activity measurements - one per activity. For each row, each measurement contains an average of mean or standard deviation values, of which there are 82.

##Data Processing

###Raw Data Collection An overview of the HARUS Smartlab data set documentation can be downloaded from:

http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones

The actual data set and descriptive files can be downloaded from:

https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip

###Notes on Original HARUS Data Set The HARUS data set consists of a number of text files. These files contain subject activity measurements (up to 6 for each subject), subject identifiers, features, and activity lables. Subjects were randomly assigned to either a "TESTING" or "TRAINING" group; there are subject data files for each group.

The "TRAINING" subject data is contained in:

X_train.txt: Activity measurments for the "TRAINING" subjects.
y_train.txt: "TRAINING" subject identifiers.

The "TESTING" subject data is contained in:

X_test.txt: Activity measurments for the "TESTING" subjects.
y_test.txt: "TESTING" subject ID.

For the six activities hundreds of "features" were measured:

features.txt: Lists 561 features.
features_info.txt: Shows information about the variables used on the feature vector.
activity_labels.txt: List of the activities measured for each subject.

Note: The HUARS data set includes "Inertial" files for both the "TESTING" and "TRAINING" groups. These files are NOT utilized for this project.

###Record Content When the original files are appropriately combined, a single file of 10,299 results. According to the original HUARS README, each record therein contains:

Triaxial acceleration from the accelerometer (total acceleration) and the estimated body acceleration

Triaxial Angular velocity from the gyroscope

A 561-feature vector with time and frequency domain variables

It's activity label

An identifier of the subject who carried out the experiment

##Steps to Execute run_analysis.R

###Installation

Note: "R" and/or "RStudio" must be installed on the target workstation.

To create a tidy data file from the original HARUS data set:

Clone the "GCD_analysis" Github repo to a new folder on your workstation. The repo is located at: https://github.com/kricklin/gcd_analysis
Launch the "RStudio" application, or "R" from the command line
Set the "working directory" to the new folder that contains the repo
Run the "run_analysis.R" file.

###Code Processing Overview The code in "analysis.R" performs the following steps:

####Part I - Set Up

Load required R packages - data.table, dplyr, and stringr
Download and expand the "Dataset.zip" archive into a "./data" sub-directory

####Part II - Process Test Data - 2947 Rows

Load the Test Measurements into a data frame
Load the Test Subject IDs into a separate data frame
Add the Subject IDs to the Test Measurements in a new column
Load the Test Activities into a separate vector
Add the Activities to the Test Measurements in a new column

####Part III - Process Training Data - 7352 Rows "Process Test Data" steps 1-5 are repeated for Training Data

####Part IV - Merge Subject Data

Merge the mutated Test and Training Data Frames
Move "subject_id" and "activity" columns to first two columns
Using the "features.txt" file assign Column Names
Using the "activity_labels.txt" file to change the activity column values from numeric values (1-6) to text factors

The merged data results in a data frame of 10,299 Rows x 563 Columns.

####Part V - Subset and Summarize

Subset all variables with labels containing "std()" or "mean"

Notes: Only feature names containing "std()" or "mean" are included in the subset. However, seven features with names beginning with "angle" and containing "mean" are excluded. The result is 82 features/values per observation.

To improve meaning and reability, modified variable names as follows:

Original	New
begins with "f"	begins with "f_"
begins with "t"	begins with "t_"
"-"	"_"
"BodyBody"	"body_"
"Body"	"body_"
"Gravity"	"gravity_"
"Acc"	"accel_"
"Gyro"	"gyro_"
"JerkMag"	"jerk_mag"
UPPER CASE	lower case

Create a new, independent tidy data set with the average of each variable for each activity and each subject

###Resulting Data Set Notes

####Dimensions 180 observations/rows of 82 variables/columns

####Data Summary

Each row represents all measurements for a single activity for a single subject.
Each column represents an average value for a specific measurement

##Original HUARS Data Set License Use of this dataset in publications must be acknowledged by referencing the following publication [1]

[1] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. Human Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine. International Workshop of Ambient Assisted Living (IWAAL 2012). Vitoria-Gasteiz, Spain. Dec 2012

This dataset is distributed AS-IS and no responsibility implied or explicit can be addressed to the authors or their institutions for its use or misuse. Any commercial use is prohibited.

Jorge L. Reyes-Ortiz, Alessandro Ghio, Luca Oneto, Davide Anguita. November 2012.

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
UCI HAR Dataset/test		UCI HAR Dataset/test
CodeBook.Rmd		CodeBook.Rmd
CodeBook.html		CodeBook.html
CodeBook.md		CodeBook.md
README.Rmd		README.Rmd
README.html		README.html
README.md		README.md
analysis_output.txt		analysis_output.txt
huars_tidy_subset.txt		huars_tidy_subset.txt
run_analysis.R		run_analysis.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Uh oh!

Repository files navigation

run_analysis.R README

About

Uh oh!

Releases

Packages

Languages

Uh oh!

Uh oh!

kricklin/GetCleanData

Folders and files

Latest commit

History

Repository files navigation

run_analysis.R README

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages