Closed
37 commits
9f52490
Progress commit renaming Array
nealrichardson Sep 4, 2019
1f6d154
Replace array() with Array()
nealrichardson Sep 4, 2019
8edf085
Remove more backticks
nealrichardson Sep 4, 2019
9fbecda
A few more backticks
nealrichardson Sep 4, 2019
3f1cd71
Object
nealrichardson Sep 5, 2019
bbf0799
Buffer
nealrichardson Sep 5, 2019
3b4b492
ChunkedArray
nealrichardson Sep 5, 2019
4075897
compression
nealrichardson Sep 5, 2019
12031ad
Backfill some methods
nealrichardson Sep 5, 2019
1711d3e
CastOptions
nealrichardson Sep 5, 2019
fbebf27
io
nealrichardson Sep 5, 2019
9bd708f
csv
nealrichardson Sep 5, 2019
55607a6
json
nealrichardson Sep 5, 2019
0e7877b
Drop ::ipc::
nealrichardson Sep 5, 2019
365fedc
feather
nealrichardson Sep 5, 2019
702a0b1
Message
nealrichardson Sep 5, 2019
730313e
One more find/replace, esp. RecordBatch*
nealrichardson Sep 5, 2019
b694511
Remove defunct Column class
nealrichardson Sep 5, 2019
2d1b738
Replace table() with Table()
nealrichardson Sep 5, 2019
71cac57
Clean up Rd file names, experiment with documenting constructors, and…
nealrichardson Sep 5, 2019
85a8d36
Start vignette draft explaining the class and naming conventions
nealrichardson Sep 6, 2019
3e4cfe7
Clean up parquet classes and document the R6
nealrichardson Sep 6, 2019
96873e1
Factor out make_readable_file
nealrichardson Sep 6, 2019
5fd49ef
Fix check failures
nealrichardson Sep 6, 2019
495abf6
Fill in documentation and standardize file naming
nealrichardson Sep 6, 2019
e6b75f4
Consolidate and document reader/writer classes; also fix ARROW-6449
nealrichardson Sep 6, 2019
8683f10
Add content to vignette from blog post
nealrichardson Sep 6, 2019
0150d99
Rename Field.R to field.R
nealrichardson Sep 6, 2019
924edd1
Rename List.R to list.R
nealrichardson Sep 6, 2019
358290b
Rename Schema.R to schema.R
nealrichardson Sep 6, 2019
8bd52d7
Rename Struct.R to struct.R
nealrichardson Sep 6, 2019
35f00f5
Rename Table.R to table.R
nealrichardson Sep 6, 2019
adf1cf9
File renaming (not case-sensitive)
nealrichardson Sep 10, 2019
caf3265
PR feedback from romain
nealrichardson Sep 10, 2019
01084ce
Factor out assert_is()
nealrichardson Sep 10, 2019
22c9d04
More doc cleaning
nealrichardson Sep 10, 2019
3c6f85b
:rat:
nealrichardson Sep 10, 2019
Start vignette draft explaining the class and naming conventions
nealrichardson committed Sep 10, 2019
commit 85a8d3631127f7fa681b3c34dc8f5e1299c39351
21 changes: 21 additions & 0 deletions r/vignettes/arrow.Rmd
@@ -0,0 +1,21 @@
---
title: "Using the Arrow C++ Library in R"
description: "This document describes the low-level interface to the Apache Arrow C++ library in R and reviews the patterns and conventions of the R package."
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Using the Arrow C++ Library in R}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

The Apache Arrow C++ library provides rich, powerful features for working with columnar data. The `arrow` R package provides both a low-level interface to the C++ library and some higher-level, R-flavored tools for working with it. This vignette provides an overview of how the pieces fit together, and it describes the conventions that the classes and methods follow in R.

# Class structure

C++ is an object-oriented language, so the core logic of the Arrow library is encapsulated in classes and methods. In the R package, these classes are implemented as `R6` reference classes, most of which are exported from the namespace.

In order to match the C++ naming conventions, the `R6` classes are in TitleCase, e.g. `RecordBatch`. This makes it easy to look up the relevant C++ implementations in the [code](https://github.com/apache/arrow/tree/master/cpp) or [documentation](https://arrow.apache.org/docs/cpp/). To simplify things in R, the C++ library namespaces are generally dropped or flattened; that is, where the C++ library has `arrow::io::FileOutputStream`, it is just `FileOutputStream` in the R package. One exception is for the file readers, where the namespace is necessary to disambiguate. So `arrow::csv::TableReader` becomes `CsvTableReader`, and `arrow::json::TableReader` becomes `JsonTableReader`.
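As a minimal sketch of the flattened naming, assuming the `arrow` package is installed and that these classes are exported as described above, the C++ class `arrow::io::FileOutputStream` is reached in R without any namespace prefix, while the two table readers carry their format in the class name. The temporary file path is only for illustration.

```r
library(arrow)

# arrow::io::FileOutputStream in C++ is simply FileOutputStream in R
out <- FileOutputStream$create(tempfile())
class(out)
out$close()

# The file readers keep a prefix so that the two TableReader classes
# (arrow::csv::TableReader and arrow::json::TableReader) stay distinct
class(CsvTableReader)
class(JsonTableReader)
```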

Some of these classes are not meant to be instantiated directly; they may be base classes or other kinds of helpers. For those that you can create, use the `$create()` method to instantiate an object. For example, `rb <- RecordBatch$create(int = 1:10, dbl = as.numeric(1:10))` will create a `RecordBatch`. The factory methods that an R user is most likely to encounter also have a `snake_case` alias, which should feel more familiar to contemporary R users. So `record_batch(int = 1:10, dbl = as.numeric(1:10))` does the same thing as the `RecordBatch$create()` call above.
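The two spellings can be compared side by side. This is a small sketch, assuming the `arrow` package is installed; the `$num_rows`, `$num_columns`, and `$Equals()` members used to inspect the result are taken from the package's `RecordBatch` class rather than from the text above.

```r
library(arrow)

# R6 factory method, mirroring the C++ class name
rb <- RecordBatch$create(int = 1:10, dbl = as.numeric(1:10))

# snake_case alias that builds the same kind of object
rb2 <- record_batch(int = 1:10, dbl = as.numeric(1:10))

rb$num_rows      # number of rows in the batch
rb$num_columns   # number of columns in the batch
rb$Equals(rb2)   # compare the two batches
```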

The typical user of the `arrow` R package may never deal directly with the `R6` objects. We provide more R-friendly wrapper functions as a higher-level interface to the C++ library. An R user can call `read_parquet()` without knowing or caring that they're instantiating a `ParquetFileReader` object and calling the `$ReadFile()` method on it. The classes are there and available to the advanced programmer who wants fine-grained control over how the C++ library is used.
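To make the two levels concrete, here is a rough sketch that reads the same file both ways. It assumes a recent release of the `arrow` package, in which the reader's method for loading a whole file is named `$ReadTable()`; the method name may differ in the version this draft describes. The temporary file and the small data frame are made up for the example.

```r
library(arrow)

# Write a tiny Parquet file to read back (illustrative data only)
path <- tempfile(fileext = ".parquet")
write_parquet(data.frame(x = 1:3, y = c("a", "b", "c")), path)

# High-level interface: one call, returns a data frame
df <- read_parquet(path)

# Low-level interface: create the reader object, then ask it for a Table
# (assumes the $ReadTable() method of recent arrow releases)
reader <- ParquetFileReader$create(path)
tab <- reader$ReadTable()

nrow(df)
tab$num_rows
```

The low-level route takes more steps, but it leaves the reader object available for the finer-grained control mentioned above.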