address comments
cloud-fan committed Jan 3, 2018
commit c82fc5b160c9ef302be499f6d14ec1f5e6695196
@@ -31,15 +31,15 @@
*
* ColumnVector supports all the data types including nested types. To handle nested types,
* ColumnVector can have children and is a tree structure. For struct type, it stores the actual
Review thread on the line above:
Member: nit: child -> children
Contributor Author: it's already children
Member: Sorry for my mistake.

- * data of each field in the corresponding child ColumnVector, and only store null information in
+ * data of each field in the corresponding child ColumnVector, and only stores null information in
* the parent ColumnVector. For array type, it stores the actual array elements in the child
- * ColumnVector, and store null information, array offsets and lengths in the parent ColumnVector.
+ * ColumnVector, and stores null information, array offsets and lengths in the parent ColumnVector.
*
* ColumnVector is expected to be reused during the entire data loading process, to avoid allocating
* memory again and again.
*
- * ColumnVector is meant to maximize CPU efficiency but not to minimize storage footprint,
- * implementations should prefer computing efficiency over storage efficiency when design the
+ * ColumnVector is meant to maximize CPU efficiency but not to minimize storage footprint.
+ * Implementations should prefer computing efficiency over storage efficiency when design the
* format. Since it is expected to reuse the ColumnVector instance while loading data, the storage
* footprint is negligible.
*/
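
The array layout described in the comment above (elements in a child vector; null flags, offsets, and lengths in the parent) can be pictured with a small sketch. This is not Spark's `ColumnVector` API: the class `IntArrayColumnSketch` and all of its methods are hypothetical names chosen for illustration, and it handles only arrays of `int`.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for an array-typed column vector.
// It is NOT Spark's ColumnVector; every name here is an assumption.
final class IntArrayColumnSketch {
  // "Parent" state: one entry per row.
  private final boolean[] isNull; // null information
  private final int[] offsets;    // start index of the row's elements in the child
  private final int[] lengths;    // number of elements in the row's array

  // "Child" state: the elements of all arrays, stored back to back.
  private int[] child;
  private int childSize = 0;
  private int numRows = 0;

  IntArrayColumnSketch(int capacity) {
    isNull = new boolean[capacity];
    offsets = new int[capacity];
    lengths = new int[capacity];
    child = new int[Math.max(capacity, 1)];
  }

  // Appends a null array: only the parent is touched.
  void appendNull() {
    isNull[numRows] = true;
    offsets[numRows] = childSize;
    lengths[numRows] = 0;
    numRows++;
  }

  // Appends a non-null array: elements go to the child, bookkeeping to the parent.
  void append(int... values) {
    if (childSize + values.length > child.length) {
      child = Arrays.copyOf(child, Math.max(child.length * 2, childSize + values.length));
    }
    System.arraycopy(values, 0, child, childSize, values.length);
    isNull[numRows] = false;
    offsets[numRows] = childSize;
    lengths[numRows] = values.length;
    childSize += values.length;
    numRows++;
  }

  boolean isNullAt(int row) {
    return isNull[row];
  }

  // Copies the row's elements out of the child using the parent's offset and length.
  int[] getArray(int row) {
    return Arrays.copyOfRange(child, offsets[row], offsets[row] + lengths[row]);
  }

  // Makes the instance reusable for the next batch without reallocating its buffers.
  void reset() {
    numRows = 0;
    childSize = 0;
  }
}
```

After `append(1, 2, 3)`, `appendNull()`, and `append(4)`, the child holds `1, 2, 3, 4` while the parent records offsets `0, 3, 3` and lengths `3, 0, 1`. Calling `reset()` keeps the allocated buffers, which mirrors the reuse expectation stated in the comment; the sketch does no bounds checking beyond what Java arrays provide.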
@@ -23,9 +23,9 @@
import org.apache.spark.sql.types.StructType;

/**
- * This class is a wrapper of multiple ColumnVectors and represents a logical table-like data
- * structure. It provides a row-view of this batch so that Spark can access the data row by row.
- * Instance of it is meant to be reused during the entire data loading process.
+ * This class wraps multiple ColumnVectors as a row-wise table. It provides a row view of this
+ * batch so that Spark can access the data row by row. Instance of it is meant to be reused during
+ * the entire data loading process.
*/
public final class ColumnarBatch {
public static final int DEFAULT_BATCH_SIZE = 4 * 1024;
@@ -79,7 +79,7 @@ public void remove() {
}

/**
- * Sets the number of rows that are valid in this batch.
+ * Sets the number of rows in this batch.
*/
public void setNumRows(int numRows) {
assert(numRows <= this.capacity);
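
To illustrate what a row view over column-oriented storage might look like, here is a minimal sketch. It is not the real `ColumnarBatch`: `BatchSketch`, `putInt`, `getRow`, and `Row` are hypothetical names, columns are plain `int[]` arrays, and the capacity check throws an exception rather than using `assert` as the diff does. Only `DEFAULT_BATCH_SIZE = 4 * 1024` and the `numRows <= capacity` condition are taken from the source.

```java
// Hypothetical sketch of a batch that wraps several columns and exposes a row view.
// It is NOT Spark's ColumnarBatch; the names below are assumptions for illustration.
final class BatchSketch {
  static final int DEFAULT_BATCH_SIZE = 4 * 1024; // value taken from the diff

  private final int[][] columns; // columns[ordinal][rowId]
  private final int capacity;
  private int numRows = 0;

  BatchSketch(int numColumns, int capacity) {
    this.columns = new int[numColumns][capacity];
    this.capacity = capacity;
  }

  // Writes one value into column storage.
  void putInt(int ordinal, int rowId, int value) {
    columns[ordinal][rowId] = value;
  }

  // Sets the number of rows in this batch; it may not exceed the capacity.
  void setNumRows(int numRows) {
    if (numRows < 0 || numRows > capacity) {
      throw new IllegalArgumentException("numRows out of range: " + numRows);
    }
    this.numRows = numRows;
  }

  int numRows() {
    return numRows;
  }

  // Returns a lightweight view; no column data is copied.
  Row getRow(int rowId) {
    if (rowId < 0 || rowId >= numRows) {
      throw new IndexOutOfBoundsException("rowId: " + rowId);
    }
    return new Row(rowId);
  }

  // A row is just a row index plus access to the enclosing batch's columns.
  final class Row {
    private final int rowId;

    private Row(int rowId) {
      this.rowId = rowId;
    }

    int getInt(int ordinal) {
      return columns[ordinal][rowId];
    }
  }
}
```

Because a `Row` reads through to the column arrays, the same `BatchSketch` instance can be refilled and have `setNumRows` called again for the next batch, which is the reuse pattern the class comment describes.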