
Conversation


@saucam saucam commented Nov 12, 2014

Since the parquet library has been updated, we no longer need to filter the records it returns for null records, as the library now skips those itself:

From parquet-hadoop/src/main/java/parquet/hadoop/InternalParquetRecordReader.java:

public boolean nextKeyValue() throws IOException, InterruptedException {
  boolean recordFound = false;
  while (!recordFound) {
    // no more records left
    if (current >= total) {
      return false;
    }
    try {
      checkRead();
      currentValue = recordReader.read();
      current++;
      if (recordReader.shouldSkipCurrentRecord()) {
        // this record is being filtered via the filter2 package
        if (DEBUG) LOG.debug("skipping record");
        continue;
      }
      if (currentValue == null) {
        // only happens with FilteredRecordReader at end of block
        current = totalCountLoadedSoFar;
        if (DEBUG) LOG.debug("filtered record reader reached end of block");
        continue;
      }

      recordFound = true;
      if (DEBUG) LOG.debug("read value: " + currentValue);
    } catch (RuntimeException e) {
      throw new ParquetDecodingException(
          format("Can not read value at %d in block %d in file %s", current, currentBlock, file), e);
    }
  }
  return true;
}
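To illustrate why the downstream null filter is now dead code, here is a small self-contained Scala sketch. The object, data, and helper names below are purely illustrative (not Spark or parquet APIs); it only demonstrates that filtering nulls inside the reader and filtering them afterwards yield the same records:

// Illustrative sketch: once the reader skips nulls itself (mirroring the
// currentValue == null branch in the Java loop above), a downstream
// filter(_ != null) no longer changes the result.
object NullFilterSketch {
  // Records an older reader could surface, including nulls at block ends.
  val rawRecords: Seq[String] = Seq("a", "b", null, "c", null)

  // Old pipeline: read everything, then defensively drop nulls afterwards.
  def oldPipeline(records: Seq[String]): Seq[String] =
    records.filter(_ != null)

  // Updated reader contract: nulls are skipped inside the library.
  def updatedReader(records: Seq[String]): Iterator[String] =
    records.iterator.filter(_ != null)

  def main(args: Array[String]): Unit = {
    // Both paths yield the same records, so the extra filter can be removed.
    assert(oldPipeline(rawRecords) == updatedReader(rawRecords).toSeq)
    println(updatedReader(rawRecords).mkString(", ")) // prints: a, b, c
  }
}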

@AmplabJenkins

Can one of the admins verify this patch?

@liancheng
Contributor

test this please

@SparkQA

SparkQA commented Nov 14, 2014

Test build #23370 has started for PR 3229 at commit 8909ae9.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 14, 2014

Test build #23370 has finished for PR 3229 at commit 8909ae9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23370/

@marmbrus
Contributor

Thanks! Merged to master and 1.2.

asfgit pushed a commit that referenced this pull request Nov 15, 2014
…from parquet library


Author: Yash Datta <[email protected]>

Closes #3229 from saucam/remove_filter and squashes the following commits:

8909ae9 [Yash Datta] SPARK-4365: Remove unnecessary filter call on records returned from parquet library

(cherry picked from commit 63ca3af)
Signed-off-by: Michael Armbrust <[email protected]>
@asfgit asfgit closed this in 63ca3af Nov 15, 2014
@saucam
Author

saucam commented Nov 15, 2014

Thanks everyone!
