-* `dt[, .N by =x]` = creates a table to count observations by the value of x
+* `dt[, .N, by =x]` = creates a table to count observations by the value of x
 * **keys** (quickly filter/subset)
-* *example*: `dt <- data.table(x = rep(c("a", "b", "c"), each 100), y = rnorm(300))` = generates data table
+* *example*: `dt <- data.table(x = rep(c("a", "b", "c"), each = 100), y = rnorm(300))` = generates data table
 * `setkey(dt, x)` = set the key to the x column
 * `dt['a']` = returns a data frame, where x = 'a' (effectively filter)
 * **joins** (merging tables)
@@ -187,9 +187,9 @@ $\pagebreak$
 * `setkey(dt1, x); setkey(dt2, x)` = sets the keys for both data tables to be column x
 * `merge(dt1, dt2)` = returns a table, combine the two tables using column x, filtering to only the values that match up between common elements the two x columns (i.e. 'a') and the data is merged together
 * **fast reading of files**
-* *example*: `big_df <- data.frame(norm(1e6), norm(1e6))` = generates data table
+* *example*: `big_df <- data.frame(rnorm(1e6), rnorm(1e6))` = generates data table
-* `write.table(big.df, file=file, row.names=FALSE, col.names = TRUE, sep = "\t". quote = FALSE)` = writes the generated data from big.df to the empty temp file
+* `write.table(big_df, file=file, row.names=FALSE, col.names = TRUE, sep = "\t", quote = FALSE)` = writes the generated data from big.df to the empty temp file
 * `fread(file)` = read file and load data = much faster than `read.table()`
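For anyone checking the corrected `data.table` lines, here is a minimal sketch that runs them end to end. It assumes the `data.table` package is installed. The seed, the small `dt1`/`dt2` tables, and the reduced row count for `big_df` are illustrative choices, not taken from the notes.

```r
library(data.table)

set.seed(1)  # assumed seed, only so the run is reproducible

# Three groups of 100 rows each, as in the corrected example line
dt <- data.table(x = rep(c("a", "b", "c"), each = 100), y = rnorm(300))

# Count observations per value of x (note the comma before `by`)
counts <- dt[, .N, by = x]

# Set a key on x, then subset by key value
setkey(dt, x)
only_a <- dt["a"]

# Keyed join: only x values present in both tables survive the merge
# (dt1 and dt2 are hypothetical tables made up for this sketch)
dt1 <- data.table(x = c("a", "a", "b", "dt1"), y = 1:4)
dt2 <- data.table(x = c("a", "b", "dt2"), z = 5:7)
setkey(dt1, x); setkey(dt2, x)
merged <- merge(dt1, dt2)

# Round trip through a temporary tab-separated file
# (1e4 rows instead of 1e6 to keep the sketch quick)
big_df <- data.frame(x = rnorm(1e4), y = rnorm(1e4))
file <- tempfile()
write.table(big_df, file = file, row.names = FALSE, col.names = TRUE,
            sep = "\t", quote = FALSE)
big_dt <- fread(file)
unlink(file)  # remove the temp file
```

With the pre-fix lines (`each 100`, `.N by`, `norm(1e6)`, `big.df`, `"\t". quote`) the same script stops with a parse error or a missing-object error, which is what the commit addresses.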
@@ -202,7 +202,7 @@ $\pagebreak$
 * free/widely used open sources database software, widely used for Internet base applications
 * each row = record
 * data are structured in databases $\rightarrow$ series tables (dataset) $\rightarrow$ fields (columns in dataset)
-* `dbConnect(MySQL(), user = "genome", db = "hg19", host = "genome-mysql.cse.ucsc.edu)` = open a connection to the database
+* `dbConnect(MySQL(), user = "genome", db = "hg19", host = "genome-mysql.cse.ucsc.edu")` = open a connection to the database
 * `db = "hg19"` = select specific database
 * `MySQL()` can be replaced with other arguments to use other data structures
 * `dbGetQuery(db, "show databases;")` = return the result from the specified SQL query executed through the connection
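The connection lines in this hunk can be exercised with a small wrapper like the one below. This is a sketch only: `list_databases` is a hypothetical helper name, it assumes the `DBI` and `RMySQL` packages are installed, and it assumes the host named in the notes is still reachable with the `genome` user.

```r
# Hypothetical helper, not part of the notes: open a connection,
# run one query, and always close the connection afterwards.
list_databases <- function(host = "genome-mysql.cse.ucsc.edu",
                           user = "genome") {
  # Host and user are the values quoted in the notes; availability is assumed
  con <- DBI::dbConnect(RMySQL::MySQL(), user = user, host = host)
  # Close the connection even if the query fails
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  DBI::dbGetQuery(con, "show databases;")
}
```

Calling `list_databases()` would return a data frame of database names if the server accepts the connection. Nothing is contacted until the function is called.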
@@ -473,7 +473,7 @@ $\pagebreak$
 ## Subsetting and Sorting
 * **subsetting**
 * `x <- data.frame("var1" = sample(1:5), "var2" = sample(6:10), "var3" = (11:15))` = initiates a data frame with three names columns
-* `x <- x[sample(1:5)` = this scrambles the rows
+* `x <- x[sample(1:5),]` = this scrambles the rows
 * `x$var2[c(2,3)] = NA` = setting the 2nd and 3rd element of the second column to NA
 * `x[1:2, "var2"]` = subsetting the first two row of the the second column
 * `x[(x$var1 <= 3 | x$var3 > 15), ]` = return all rows of x where the first column is less than or equal to three or where the third column is bigger than 15
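A quick way to confirm the corrected row-scramble line is to run the subsetting steps from this hunk in order. The sketch below uses base R only; the seed is an assumption added for reproducibility and the result names (`first_two`, `either`) are illustrative.

```r
set.seed(13435)  # assumed seed, not specified in the notes

# Data frame with three named columns
x <- data.frame("var1" = sample(1:5), "var2" = sample(6:10), "var3" = 11:15)

# Scramble the rows; the trailing comma selects rows and keeps all columns
x <- x[sample(1:5), ]

# Set the 2nd and 3rd elements of var2 to NA
x$var2[c(2, 3)] <- NA

# First two rows of the var2 column (the second of these is now NA)
first_two <- x[1:2, "var2"]

# Rows where var1 <= 3 OR var3 > 15; var3 tops out at 15 here,
# so only the var1 condition selects anything
either <- x[(x$var1 <= 3 | x$var3 > 15), ]
```

Without the closing `,]` the pre-fix line `x <- x[sample(1:5)` does not parse, so the rest of the section cannot run.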