diff --git a/.gitignore b/.gitignore index 9263a1ed..d17e5544 100644 --- a/.gitignore +++ b/.gitignore @@ -1,2 +1,4 @@ _site .DS_Store +.Rhistory +.Rproj.user diff --git a/README.md b/README.md index 5b9c3141..83b0568d 100644 --- a/README.md +++ b/README.md @@ -4,11 +4,21 @@ Since the beginning of the Data Science Specialization we've noticed the unbelie ## Contributing -If you've created a web page, video, sideshow, or any other kind of media you think should be shared through this directory you should: +If you've created a web page, video, slideshow, or any other kind of media you think should be shared through this directory you should: 1. Fork this repository. 2. Add a link to your content on the appropriate course page. 3. Commit your changes. 4. Submit a pull request. -We've created a [sample pull request](https://github.com/DataScienceSpecialization/DataScienceSpecialization.github.io/pull/1) to show you what we would like to see in a pull request. If we think your creation is well made, informative, and adds something new to this repository of content then we'll merge your request and add you to our list of contributors. If you happen to notice any inaccuracies or idiosyncrasies on this site or in this site's content, please let us know by opening an issue. \ No newline at end of file +We've created a [sample pull request](https://github.com/DataScienceSpecialization/DataScienceSpecialization.github.io/pull/1) to show you what we would like to see in a pull request. If we think your creation is well made, informative, and adds something new to this repository of content then we'll merge your request and add you to our list of contributors. If you happen to notice any inaccuracies or idiosyncrasies on this site or in this site's content, please let us know by opening an issue. 
+ +**If you are not the author of the content you are submitting** you are welcome to add your link to the [Curated Knowledge](http://datasciencespecialization.github.io/curated/) page. We've created this page specifically so that you can share data science resources that you've found useful. + +**Otherwise if you *are* the author of the content you're submitting** you should ask yourself the following questions: + +1. Does my contribution teach? +2. Does the content of my contribution clearly address topics in the Data Science Specialization? +3. Could my contribution be seamlessly integrated into the canonical course materials? + +If you're on the fence about any of these, err on the side of sending a pull request! diff --git a/about.md b/about.md index 8c98963a..37ecc9da 100644 --- a/about.md +++ b/about.md @@ -6,7 +6,7 @@ permalink: /about/ The [Data Science Specialization](https://www.coursera.org/specialization/jhudatascience/1) is a 9 courses series on Data Science. Every class in the series runs every month, and the course material is availible on [GitHub](https://github.com/DataScienceSpecialization/courses). 
-### The JHU Data Science Lab Team: +### The JHU Data Science Lab: - [Brian Caffo](http://www.bcaffo.com/) - [Jeff Leek](http://jtleek.com/) @@ -16,5 +16,13 @@ The [Data Science Specialization](https://www.coursera.org/specialization/jhudat ### Community Contributors: -- Kevin Markham +- [Kevin Markham](http://www.dataschool.io/) - Derek Franks +- David Hood +- [Leonard Greski](https://github.com/lgreski) +- Michael Sachs +- Allan Inocêncio de Souza Costa +- [stepds](https://github.com/stepds) +- Bastiaan Quast +- [Xing Su](http://sux13.github.io/DataScienceSpCourseNotes/) +- [Edmund julian Ofilada](https://github.com/DocOfi) diff --git a/capstone.md b/capstone.md new file mode 100644 index 00000000..6285e422 --- /dev/null +++ b/capstone.md @@ -0,0 +1,14 @@ +--- +title: "Capstone" +permalink: /capstone/ +layout: page +--- +## Reference Material + +- [Speech and Language Processing, 3rd Edition](https://web.stanford.edu/~jurafsky/slp3/) Working version of the Jurafsky et al. book on natural language processing, whose content on n-grams is helpful for the capstone. + +## Course Project + +- [n-gram Computations and Computer Capacity](http://bit.ly/2couvxh) Explains the amount of memory required to convert the text files for the course project into n-grams, using the quanteda package. +- [Capstone Strategy](http://bit.ly/2rGcgc6) Describes a general strategy to get through the Capstone: use the simplest approaches possible. +- [Choosing a Text Analysis Package](http://bit.ly/2qagsPa) Reviews pros and cons of various R packages used for natural language processing, in the context of requirements for the Capstone project. 
diff --git a/curated.md b/curated.md new file mode 100644 index 00000000..8c806fd8 --- /dev/null +++ b/curated.md @@ -0,0 +1,86 @@ +--- +layout: page +title: Curated Pages +permalink: /curated/ +--- + +### Analytics + +- [Huge Trello Board Collection of Data Science Resources](https://trello.com/b/rbpEfMld/data-science) +- [Diving Into Data Science Flipboard](https://flipboard.com/@thiakx/diving-into-data-science-5823ectuy) +- [OLAP Operation in R](http://architects.dzone.com/articles/olap-operation-r) +- [Journal of Statistical Software: Tidy data](http://www.jstatsoft.org/v59/i10/paper) +- [Verzani: simpleR – Using R for Introductory Statistics](http://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf) +- [Data Visualization packages](http://www.datavis.ca/R/) +- [Visualization hints: plotting numeric data by groups](http://www.r-bloggers.com/visualization-series-insight-from-cleveland-and-tufte-on-plotting-numeric-data-by-groups/) +- [Matrix rotation for image and contour plots in R](http://blog.snap.uaf.edu/2012/06/08/matrix-rotation-for-image-and-contour-plots-in-r/) +- [Fig Data: 11 Tips on How to Handle Big Data in R (and 1 Bad Pun)](http://theodi.org/blog/fig-data-11-tips-how-handle-big-data-r-and-1-bad-pun) +- [Data from 538](https://github.com/fivethirtyeight/data) +- [Getting started with python notebook](https://medium.com/@adhira_deo/the-environment-for-building-machine-learning-models-a1552116b355) + +### Command Line + +- [explainshell.com - match command-line arguments to their help text](http://explainshell.com/) +- [The Command Line Crash Course - Quick course in using the command line](http://cli.learncodethehardway.org/book/) +- [Mastering the command line, in one page](https://github.com/jlevy/the-art-of-command-line/blob/master/README.md) + +### R + +- [Try R](http://tryr.codeschool.com/) +- [The R Book by Michael J. Crawley](https://archive.org/details/TheRBook/) +- [Univ. of Calif. 
Riverside R Programming](http://manuals.bioinformatics.ucr.edu/home/programming-in-r#TOC-R-Basics) +- [G. Sanchez - Strings in R](http://gastonsanchez.com/Handling_and_Processing_Strings_in_R.pdf) +- [The Lubridate Package](http://www.jstatsoft.org/v40/i03/paper) +- [Google Developers R Programming Video Lectures](http://www.r-bloggers.com/google-developers-r-programming-video-lectures/) +- [awesome R](https://github.com/qinwf/awesome-R) - A curated list of awesome R frameworks, packages and software. +- [awesome machine learning](https://github.com/josephmisiti/awesome-machine-learning#r) - A curated list of awesome Machine Learning frameworks, libraries and software. +- [Google's R Style Guide](https://google-styleguide.googlecode.com/svn/trunk/Rguide.xml) +- [Tufte-style HTML in rmarkdown](http://sachsmc.github.io/tufterhandout/) +- [Creating an R Package](http://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/) +- [R Packages (Hadley online book)](http://r-pkgs.had.co.nz/) - How to write your own R packages. 
+- [Beautiful ggplot2 Cheatsheet](http://zevross.com/blog/2014/08/04/beautiful-plotting-in-r-a-ggplot2-cheatsheet-3/) +- [Intro to Graphics](http://bcb.dfci.harvard.edu/~aedin/courses/Bioconductor/2.Plotting.pdf) +- [data.table cheat sheet](https://s3.amazonaws.com/assets.datacamp.com/img/blog/data+table+cheat+sheet.pdf) +- [Exploratory Data Analysis with data.table](http://varianceexplained.org/RData/lessons/lesson4/) +- [Fast summary statistics in R with data.table](http://blog.yhathq.com/posts/fast-summary-statistics-with-data-dot-table.html) +- [R online in r-fiddle.org](http://www.r-fiddle.org/) + +### Probability and Statistics + +- [Probability and Statistics Cookbook](http://matthias.vallentin.net/probability-and-statistics-cookbook/) + +### GitHub + +- [Official Git Tutorial](http://git-scm.com/docs/gittutorial) +- [Git - Simple Guide](http://rogerdudler.github.io/git-guide/) +- [Git Immersion - A guided tour through the fundamentals of Git](http://gitimmersion.com/) +- [GitHub - Dealing with Multiple Accounts](http://hmkcode.com/git-tutorial/how-to-deal-with-multiple-github-accounts-on-one-computer/) +- [Try Git](https://try.github.io/levels/1/challenges/1) +- [Learn Git Branching: Interactive Game](http://pcottle.github.com/learnGitBranching/) +- [Atlassian Git Tutorials - Branches](https://www.atlassian.com/git/tutorials/using-branches/) + +### Reproducible Research +- [Markdown live demo](http://markdown-here.com/livedemo.html) +- [Boosting Slides by Ron Meir](https://github.com/Aratinga/Misc/blob/master/BoostingTutorial.pdf) +- [Reproducible Research website](http://reproducibleresearch.net/) + +### Machine Learning +- [UC Irvine Machine Learning Data Repository](http://archive.ics.uci.edu/ml/) + +### Textbooks +- [OpenIntro textbook](https://www.openintro.org/stat/textbook.php) +- [Statlect - The digital textbook on probability and statistics](http://www.statlect.com/) +- [An Introduction to Statistical Learning with Applications in 
R](http://www-bcf.usc.edu/~gareth/ISL/) [[PDF, 4th printing]](http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Fourth%20Printing.pdf) +- [The Elements of Statistical Learning: Data Mining, Inference, and Prediction](http://statweb.stanford.edu/~tibs/ElemStatLearn/) [[PDF, 10th ed]](http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf) + +### Further Reading + +- [Data Elixir - Free weekly newsletter of the best data-related resources and inspirations from around the web.](http://dataelixir.com/?referred=true) +- [Linkedin - Top 10 Big Data and Analytics References](https://www.linkedin.com/pulse/article/20140810194033-111366377-top-10-big-data-and-analytics-references) +- [Linkedin - Let's Get Nerdy: Data Analytics for Business Leaders Explained](https://www.linkedin.com/pulse/article/20140918162814-111366377-let-s-get-nerdy-data-analytics-for-business-leaders-explained) +- [Data Science Central : a great repository of news and resources for data science practitioners.](http://www.datasciencecentral.com) +- [Data Science Ontology - A visualized overview of Data Science concepts and tools](http://datascienceontology.com/) + +### Data Science Groups, Meetups, and Networking + +- [LinkedIn Data Science Specialisation Group](https://www.linkedin.com/groups/Coursera-Specialization-Data-Science-7495000?home=&gid=7495000&trk=anet_ug_hm&goback=%2Egmp_7495000) diff --git a/ddp.md b/ddp.md index af3ed219..0af67104 100644 --- a/ddp.md +++ b/ddp.md @@ -4,3 +4,22 @@ title: Developing Data Products permalink: /ddp/ --- +- [Slidify to Github walkthrough](http://rpubs.com/thoughtfulbloke/25103) +- [ggvis and rmarkdown slides with interactive plots](http://qua.st/ggvis-shiny-html5-slides) + +## Shiny +- Choropleth of PBS WARN Distribution of Wireless Emergency Alerts + - [Code for Shiny App](https://github.com/amsilvr/shiny_choropleth) + - [App running on shinyapps.io](https://silverman.shinyapps.io/warn_wea/) +- [Shiny app to simulate 401K growth with interactive 
plots](http://www.mephistosoftware.com/shiny/401k_simulator/) +- [Shiny Video Tutorials Playlist on Youtube](http://www.youtube.com/playlist?list=PL6wLL_RojB5xNOhe2OTSd-DPkMLVY9DfB) +- [Tutorial on writing Shiny simulation apps](https://github.com/homerhanumat/shinyTutorials) +- [Dockerize a Shiny App](http://www.rmining.net/2015/04/30/dockerizing-a-shiny-app/) +- [Git pushing Shiny Apps with Docker/Dokku](http://www.rmining.net/2015/05/11/git-pushing-shiny-apps-with-docker-dokku/) +- [Share your Shiny Apps with Docker and Kitematic](http://www.rmining.net/2015/08/10/share-your-shiny-apps-with-docker-and-kitematic/) +- [Shinyapps.io: Configuring Application Timeout](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/dataProd-shinyTimeoutConfig.md) +- [Plotting Natural Disasters](http://www.rpubs.com/DocOfi/367052) + +## Comprehensive Notes + +- Complete notes for [Developing Data Products](http://sux13.github.io/DataScienceSpCourseNotes/) diff --git a/eda.md b/eda.md index 9133e18d..1f56ac70 100644 --- a/eda.md +++ b/eda.md @@ -4,3 +4,13 @@ title: Exploratory Data Analysis permalink: /eda/ --- +- [Creating a Kite Graph](http://rpubs.com/thoughtfulbloke/kitegraph) +- [Analyzing Top/Green500 Supercomputer Technology Trends](http://github.com/ww44ss/Exascalar-Analysis-) +- [Emissions Choropleth Maps](https://github.com/BillSeliger/ExData_Plotting2) +- [Data Analysis using Twitter API and Python](http://blog.impiyush.com/2015/03/data-analysis-using-twitter-api-and.html) +- [Exploratory Data Analysis using Flexdashboard](http://rpubs.com/DocOfi/350830) +- [Plotting using Metricsgraphics](http://www.rpubs.com/DocOfi/352947) + +## Comprehensive Notes + +- Complete notes for [Exploratory Data Analysis](http://sux13.github.io/DataScienceSpCourseNotes/) diff --git a/getclean.md b/getclean.md index 4b1ae3a3..deeccc56 100644 --- a/getclean.md +++ b/getclean.md @@ -4,3 +4,24 @@ title: Getting and Cleaning Data permalink: /getclean/ --- +- [Subsetting example 
walkthrough](http://rpubs.com/thoughtfulbloke/subset) +- [Apples to Oranges Data Organisation Challenge](https://github.com/thoughtfulbloke/faoexample) +- [dplyr introductory tutorial](https://www.youtube.com/watch?v=jWjqLW-u3hc) and [R Markdown document](http://rpubs.com/justmarkham/dplyr-tutorial): A 39-minute video tutorial that covers the five basic dplyr "verbs" and a dozen other dplyr functions. dplyr is an [update](http://blog.rstudio.org/2014/01/17/introducing-dplyr/) to the plyr package, useful for subsetting, sorting, summarizing, and merging data using a more intuitive syntax than plyr or base R. +- [dplyr "going deeper" tutorial](https://www.youtube.com/watch?v=2mh1PqfsXVI) and [R Markdown document](http://rpubs.com/justmarkham/dplyr-tutorial-part-2): A 37-minute video tutorial that covers the new functionality in dplyr versions 0.3 and 0.4. +- [Downloading files general advice](http://rpubs.com/thoughtfulbloke/downloadtips) +- [Codebook sample](https://gist.github.com/kirstenfrank/218c36a1938055d0f4e4) +- [Second Codebook sample](https://gist.github.com/kirstenfrank/699abe3e16fd1dc36e5d) +- [Query string (and other fields-within-fields) unrolling](http://rpubs.com/schnee/32988) +- [Pre-processing Excel files before loading them into R](https://github.com/alkashef/cleaningexceldata) +- [Codebook template that can be used in the Getting and Cleaning Data project](https://gist.github.com/JorisSchut/dbc1fc0402f28cad9b41) +- ["Real world" example - reading American Community Survey 2000 PUMS Data:](https://github.com/lgreski/acsexample) Demonstrates how to extract records of a given type from a data file containing multiple record types, and how to use an Excel-based code book to specify arguments for reading a fixed-width file. 
+- [18 Months of CTA advice](https://thoughtfulbloke.wordpress.com/2015/08/31/hello-world) +- [Common Problems: Quiz 1 - Missing Java Runtime](http://bit.ly/2jjtyXM) Explains how to solve the problem of a missing Java Runtime for the question that requires students to process a Microsoft Excel spreadsheet. +- [Strategy for Reading Files & APIs / Quiz 2](http://bit.ly/2e4L5oF) +- [Common Problems: Quiz 2 - sqldf() driver fails to connect](http://bit.ly/2kD2KTY) +- [Tutorial: Downloading Files](http://bit.ly/2iP2suj) Illustrates various ways of downloading files, including binary and text files. +- [Creating dataframes from xml data](https://www.dropbox.com/s/7bbzzp4bwsmfl5y/CreatingDataframesfrom%20XmlFiles.odt?dl=0) + +## Comprehensive Notes + +- Complete notes for [Getting and Cleaning Data](http://sux13.github.io/DataScienceSpCourseNotes/) diff --git a/index.md b/index.md index 2801a3c4..761f3e41 100644 --- a/index.md +++ b/index.md @@ -4,7 +4,7 @@ layout: page ## Table of Contents -This is site is meant to serve as a directory for the amazing content the +This site is meant to serve as a directory for the amazing content the community has created around the Data Science Specialization. If you are interested in contributing [click here](https://github.com/DataScienceSpecialization/DataScienceSpecialization.github.io#contributing). @@ -17,4 +17,7 @@ interested in contributing [click here](https://github.com/DataScienceSpecializa 7. [Regression Models](/regmod/) 8. [Practical Machine Learning](/pml/) 9. [Developing Data Products](/ddp/) -10. [Other Resources](/other/) +10. 
[Capstone](/capstone/) + +- [Other Resources](/other/) +- [Curated Pages](/curated/) diff --git a/other.md b/other.md index 981d6786..ddb49135 100644 --- a/other.md +++ b/other.md @@ -4,3 +4,30 @@ title: Other Resources permalink: /other/ --- +## Configuring R and RStudio (Linux) + +- [Installing xlsx and XML packages on Debian Wheezy](http://allanino.me/blog/programming/installing-some-r-packages/) +- [Rscript to customize R environment](http://bit.ly/r-customize-script) - Installs packages used in the specialization. +- [Installing Some Basic R Packages in Ubuntu; Ibrahim El Merehbi](http://elmerehbi.wordpress.com/2014/09/09/installing-some-basic-r-packages-in-ubuntu) +- [Using Projects in RStudio](https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects) +- [Using Version Control with RStudio](https://support.rstudio.com/hc/en-us/articles/200532077-Version-Control-with-Git-and-SVN) +- [Using R behind HTTP/HTTPS Proxy](https://support.rstudio.com/hc/en-us/articles/200488488-Configuring-R-to-Use-an-HTTP-or-HTTPS-Proxy) + +### Ignoring R & RStudio files +- [gitignore template for R](https://github.com/github/gitignore/blob/master/R.gitignore) (source:[gitignore](https://github.com/github/gitignore)) +- [Github Help - Using Git / Ignoring files](https://help.github.com/articles/ignoring-files/) + +## Troubleshooting +- [Windows batch file to work around RStudio startup issues](https://github.com/stepds/contrib-DataScienceSpecialization/blob/master/README.md) + +## Pre-built virtual machines for R development. +- [Here's a pre-built lightweight Linux machine with R and RStudio already installed](https://github.com/queirozfcom/r-box). You just need to install [vagrant](https://www.vagrantup.com/downloads.html), download (or clone) the github repository and you'll get a clean ubuntu machine with the tools you'll need for the assignments. 
+ +- [Data Science Toolbox](http://datasciencetoolbox.org/) - A virtual environment that allows you to start doing data science in a matter of minutes. + +- [Virtual machine with RStudio server and github setup](https://github.com/tboloo/vagrant-rstudio) - A VirtualBox, Vagrant & chef-solo managed virtual machine which provides RStudio server with git & github setup + +## Deploying and sharing Shiny Apps with Docker +- [Dockerize a Shiny App](http://www.rmining.net/2015/04/30/dockerizing-a-shiny-app/) +- [Git pushing Shiny Apps with Docker/Dokku](http://www.rmining.net/2015/05/11/git-pushing-shiny-apps-with-docker-dokku/) +- [Share your Shiny Apps with Docker and Kitematic](http://www.rmining.net/2015/08/10/share-your-shiny-apps-with-docker-and-kitematic/) diff --git a/pml.md b/pml.md index f5285643..1054002d 100644 --- a/pml.md +++ b/pml.md @@ -4,3 +4,34 @@ title: Practical Machine Learning permalink: /pml/ --- +## Model Evaluation + +- [Simple Guide to Confusion Matrix Terminology (sensitivity, specificity, etc.)](http://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/) +- ROC curves and Area Under the Curve explained: [video tutorial](http://youtu.be/OAl6eAyP-yo), [companion blog post](http://www.dataschool.io/roc-curves-and-auc-explained/) (with video transcript and screenshots) + +## Supplementary Videos + +- [What is machine learning, and how does it work?](https://www.youtube.com/watch?v=elojMnjn4kk): A high-level overview of machine learning in a 10-minute video +- [Video lectures from "An Introduction to Statistical Learning"](http://www.dataschool.io/15-hours-of-expert-machine-learning-videos/): Videos for Chapters 4, 5, 6, 8, and 10 can help to deepen your understanding of the topics presented in this course. 
+ +## Machine Learning Competitions + +- [Participating in Kaggle's Allstate Purchase Prediction Challenge](http://www.dataschool.io/kaggle-allstate-purchase-prediction-challenge/): Description of what it's like to compete in a Kaggle competition, including links to a project paper, R code, presentation slides, and a presentation video. + +## Choosing a Machine Learning Model + +- [Comparing Supervised Learning Algorithms](http://www.dataschool.io/comparing-supervised-learning-algorithms/): Comparing 8 common supervised learning algorithms (for regression and classification) on 13 different dimensions. + +## Content Related to the Lectures + +- Complete notes for [Practical Machine Learning](http://sux13.github.io/DataScienceSpCourseNotes/) +- [Week 4: Combining Predictors -- Math Explained](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/pml-combiningPredictorsBinomial.md) + +## Configuring Github Pages with RStudio for PML Project + +- Step by step instructions to [Configure Github Pages with RStudio](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/pml-ghPagesSetup.md) to support the PML course project. + +## Improving Runtime Performance of Caret + +- Step by step instructions to [implement parallel processing in caret::train()](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/pml-randomForestPerformance.md) on a random forest model, along with runtime performance analysis for a variety of laptops, ranging from an Intel Atom-based tablet to a quad-core i7 processor. + diff --git a/regmod.md b/regmod.md index c72eefd1..1445c83d 100644 --- a/regmod.md +++ b/regmod.md @@ -4,3 +4,10 @@ title: Regression Models permalink: /regmod/ --- +## Supplementary Videos + +- [Video lectures from "An Introduction to Statistical Learning"](http://www.dataschool.io/15-hours-of-expert-machine-learning-videos/): Videos for Chapter 3 can help to deepen your understanding of regression. 
+ +## Comprehensive Notes + +- Complete notes for [Regression Models](http://sux13.github.io/DataScienceSpCourseNotes/) diff --git a/repres.md b/repres.md index 11519f8f..cba776f9 100644 --- a/repres.md +++ b/repres.md @@ -4,3 +4,13 @@ title: Reproducible Research permalink: /repres/ --- +- [Turning a RPubs document into a Github website walkthrough](https://github.com/thoughtfulbloke/appleorange) +- [Introduction to knitr with rmarkdown](https://sachsmc.github.io/knit-git-markr-guide/knitr/knit.html) +- [Trends and severity of Data Breaches](http://rpubs.com/ww44ss/29389) +- [Benefit-cost analysis of a park user fee](https://rstudio-pubs-static.s3.amazonaws.com/72135_dc45211d976842c2a9a8c8b5f2472ff0.html) +- [Data Lake Integrity](http://rpubs.com/rshane/81297) +- [ProjectTemplate in RStudio with Git](http://padamson.github.io/r/rstudio/projecttemplate/git/2016/01/17/projecttemplate-in-rstudio-with-git.html) + +## Comprehensive Notes + +- Complete notes for [Reproducible Research](http://sux13.github.io/DataScienceSpCourseNotes/) diff --git a/rprog.md b/rprog.md index 430688f2..47df54d1 100644 --- a/rprog.md +++ b/rprog.md @@ -1,9 +1,57 @@ --- -layout: page -title: R Programming +title: "R Programming" permalink: /rprog/ +layout: page --- +## Getting Started +- [Resources for R Programming](http://bit.ly/2dhZ8Dy) +- [References for R Programming](http://bit.ly/2b8AxhF) +- [Data Science Specialization Value Proposition](http://bit.ly/2j3EcCn) +- [R Onboarding for SAS Users](http://bit.ly/2dr7yum) + ## Programming Assignments -- [Tutorial for those struggling with Programming Assignment 1](https://github.com/derekfranks/practice_assignment) +- [Strategy for Coding the Programming Assignments](http://bit.ly/2ddFh9A) +- [Tutorial for those struggling with Programming Assignment 1](https://github.com/derekfranks/practice_assignment) +- [Breaking Down pollutantmean](http://bit.ly/2cHyiCl) +- [Assignment 1: A More Elegant Solution](http://bit.ly/2kwBBlK) +- [A SAS Version 
of pollutantmean?](http://bit.ly/2d3DR4e) +- [Tutorial for those struggling with Programming Assignment 2](https://github.com/DanieleP/PA2-clarifying_instructions) +- [Tutorial for those struggling with Programming Assignment 3](https://github.com/DanieleP/PA3-tutorial) +- [PA1-test: `testthat`, Unit Tests for Programming Assignment 1](https://github.com/cbryant1000/pa1test) +- [PA3-test: `testthat`, Unit Tests for Programming Assignment 3](https://github.com/cbryant1000/pa3test) +- [Alternative submit script for Programming Assignment 1 that makes submitting more convenient by allowing selection of multiple parts plus prompting if user wants to submit another part before exiting](https://github.com/rchampoux/coursera/blob/master/rprog-scripts-submitscript1.R) +- [Grading the SHA-1 Hash Code](http://bit.ly/2iUWoB6) +- [Assignment 2: Demystifying makeVector](http://bit.ly/2bTXXfq) +- [Assignment 2: makeCacheMatrix as an Object](http://bit.ly/2byUe4e) + + +## R Language + +- [Some notes on the R Language](http://lopezrj.github.io) +- [A Data Frame is Also a List](http://bit.ly/2fmMRAp) +- [S Objects, R Objects, and Lexical Scoping](http://bit.ly/2dtOSXi) +- [Common R Mistakes: Overwriting Functions with Data Objects](http://bit.ly/2i3gmoA) +- [Forms of the Extract Operator](http://bit.ly/2bzLYTL) +- [Functions to Sort Data Frames](http://bit.ly/2dxItzw) +- [Creative Use of R: Downloading Course Lectures](http://bit.ly/2bGlI7R) Article illustrating how to use R to automate the download of lectures from *Data Science Specialization* courses, such as *R Programming*. Techniques used in this article are helpful to make research reproducible, as required for courses like *Getting and Cleaning Data* and *Reproducible Research*. +- [Lexical Scoping and Statistical Computing](http://bit.ly/2cmqAPy) Article by Robert Gentleman and Ross Ihaka at the University of Auckland describing how lexical scoping works, and why it is valuable in statistical computing. 
+- [Data Science Job Report 2017: R Passes SAS, But Python Leaves Them Both Behind](http://bit.ly/2oCHulX) Bob Muenchen's take on the job market for various data science languages. + + + +## R language cheatsheet + +- [R cheatsheet covering all lectures](https://github.com/startupjing/Tech_Notes/blob/master/R/R_language.md) + +## R and Commercial Statistics Packages + +- [R Onboarding for SAS Users](http://bit.ly/2dr7yum) Provides an overview and links to a variety of resources to help people with SAS experience make the transition to R +- [Commercial Statistics Packages: An Historical Perspective](http://bit.ly/2fPj2qN) +- [Why is R More Difficult than SAS?](http://bit.ly/2erxk3A) +- [Thinking in R versus Thinking in SAS](http://bit.ly/2cH3u8x) + +## Comprehensive Notes + +- Complete notes for [R Programming](http://sux13.github.io/DataScienceSpCourseNotes/) diff --git a/statinf.md b/statinf.md index fb8017ab..19592a27 100644 --- a/statinf.md +++ b/statinf.md @@ -4,3 +4,19 @@ title: Statistical Inference permalink: /statinf/ --- +- [Why degrees of freedom decrease for sample variance](https://github.com/Manu58/bias/blob/master/bias.pdf) +- [CONCEPTS: Calculating Area for a Point on the Normal Curve](http://bit.ly/2hw5AMF) Reviews the mathematics that explain why one cannot calculate the exact probability for a specific value within a distribution for a continuous variable, and illustrates how to calculate a quantile for a point on the curve. 
+- [Analysis of exponential distribution of births data set from the CDC](https://gist.github.com/ProgramErgoSum/5316008387746fcd84de) +- [Exponential Distribution / Central Limit Theorem - Assignment Checklist](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/statinf-expDistChecklist.md) +- [ToothGrowth Analysis - Assignment Checklist](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/ToothGrowthChecklist.md) +- [Exploratory Data Analysis in ToothGrowth Assignment](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/edaInToothGrowthAnalysis.md), explaining the exploratory data analysis requirement for students who have not taken the *Exploratory Data Analysis* course prior to taking *Statistical Inference*. +- [Using MathJax with Discussion Forums, R Markdown, and Github Pages](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/mathjaxWithGithubMarkdown.md) +- [Kable Tables with Data Frames](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/kableDataFrameTable.md) illustrates how to display a custom table in a `knitr()` document by creating a data frame to contain the information to be rendered with `kable()`. 
+- [Interactive Confidence Interval Visualization](https://github.com/amcadie/interactive_CI) +- [Installing MiKTeX on Windows 10 / Generate a PDF from knitr](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/statinf-generatePDF.md) +- [Power calculations: optimal sample size](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/statinf-optimalSampleSize.md) +- [Permutation Tests Explained](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/statinf-permutationTests.md) + +## Comprehensive Notes + +- Complete notes for [Statistical Inference](http://sux13.github.io/DataScienceSpCourseNotes/) diff --git a/toolbox.md b/toolbox.md index de57dac6..3c2dfc68 100644 --- a/toolbox.md +++ b/toolbox.md @@ -6,6 +6,21 @@ permalink: /toolbox/ ## Command Line +- [Working with files in Bash](http://edgarsh.es/ins/working-with-files-in-bash/) + ## Git/GitHub -- [Git & GitHub Video Playlist](https://www.youtube.com/playlist?list=PL5-da3qGB5IBLMp7LtN8Nc3Efd4hJq0kD) \ No newline at end of file +- [Git & GitHub Video Playlist](https://www.youtube.com/playlist?list=PL5-da3qGB5IBLMp7LtN8Nc3Efd4hJq0kD) (also available for [download](https://drive.google.com/folderview?id=0BxRfg0msVmAoRlZFQjJ3T3VTOUE&usp=sharing) as mp4 files) +- [A Beginner's Quick Reference Guide for Git Commands](http://www.dataschool.io/git-quick-reference-for-beginners/) +- [Understanding the Relationship Between Git and GitHub](http://www.dataschool.io/github-is-just-dropbox-for-git/) +- [Simple Guide to GitHub Forks](http://www.dataschool.io/simple-guide-to-forks-in-github-and-git/) +- [Github Repo Tutorial How to fork a repo, download it to your local drive and commit changes ](https://www.youtube.com/watch?v=MY94AIplcaU) +- [Configuring RStudio to work with Git / Github - Mac OSX](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/configureRStudioGitOSXVersion.md) +- [Configuring RStudio to work with Git / Github - 
Windows](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/configureRStudioGitWindowsVersion.md) + +## Comprehensive Notes + +- Complete notes for [The Data Scientist's Toolbox](http://sux13.github.io/DataScienceSpCourseNotes/) + +## Miscellaneous +- [Using Editor Modes in Coursera Discussion Forum Posts](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/usingMarkdownInForumPosts.md)