Merged
Changes from 1 commit
46 commits
851b6a9
SPARK-5217 Spark UI should report pending stages during job execution…
ScrapCodes Jan 19, 2015
3453d57
[SPARK-3288] All fields in TaskMetrics should be private and use gett…
Jan 19, 2015
4a4f9cc
[SPARK-5088] Use spark-class for running executors directly
jongyoul Jan 19, 2015
1ac1c1d
MAINTENANCE: Automated closing of pull requests.
pwendell Jan 19, 2015
4432568
[SPARK-5282][mllib]: RowMatrix easily gets int overflow in the memory…
hhbyyh Jan 19, 2015
cd5da42
[SPARK-5284][SQL] Insert into Hive throws NPE when a inner complex ty…
yhuai Jan 19, 2015
2604bc3
[SPARK-5286][SQL] Fail to drop an invalid table when using the data s…
yhuai Jan 19, 2015
74de94e
[SPARK-4504][Examples] fix run-example failure if multiple assembly j…
gvramana Jan 19, 2015
e69fb8c
[SPARK-5214][Core] Add EventLoop and change DAGScheduler to an EventLoop
zsxwing Jan 20, 2015
306ff18
SPARK-5270 [CORE] Provide isEmpty() function in RDD API
srowen Jan 20, 2015
debc031
[SQL][minor] Add a log4j file for catalyst test.
rxin Jan 20, 2015
4afad9c
[SPARK-4803] [streaming] Remove duplicate RegisterReceiver message
ilayaperumalg Jan 20, 2015
9d9294a
[SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException
jongyoul Jan 20, 2015
8140802
[SQL][Minor] Refactors deeply nested FP style code in BooleanSimplifi…
liancheng Jan 20, 2015
c93a57f
SPARK-4660: Use correct class loader in JavaSerializer (copy of PR #3…
jacek-lewandowski Jan 20, 2015
769aced
[SPARK-5329][WebUI] UIWorkloadGenerator should stop SparkContext.
sarutak Jan 20, 2015
23e2554
SPARK-5019 [MLlib] - GaussianMixtureModel exposes instances of Multiv…
tgaloppo Jan 20, 2015
bc20a52
[SPARK-5287][SQL] Add defaultSizeOf to every data type.
yhuai Jan 20, 2015
d181c2a
[SPARK-5323][SQL] Remove Row's Seq inheritance.
rxin Jan 20, 2015
2f82c84
[SPARK-5186] [MLLIB] Vector.equals and Vector.hashCode are very inef…
hhbyyh Jan 20, 2015
9a151ce
[SPARK-5294][WebUI] Hide tables in AllStagePages for "Active Stages, …
sarutak Jan 21, 2015
bad6c57
[SPARK-5275] [Streaming] include python source code
Jan 21, 2015
ec5b0f2
[HOTFIX] Update pom.xml to pull MapR's Hadoop version 2.4.1.
rkannan82 Jan 21, 2015
424d8c6
[SPARK-5297][Streaming] Fix Java file stream type erasure problem
jerryshao Jan 21, 2015
8c06a5f
[SPARK-5336][YARN]spark.executor.cores must not be less than spark.ta…
WangTaoTheTonic Jan 21, 2015
2eeada3
SPARK-1714. Take advantage of AMRMClient APIs to simplify logic in Ya…
sryza Jan 21, 2015
aa1e22b
[MLlib] [SPARK-5301] Missing conversions and operations on IndexedRow…
Jan 21, 2015
7450a99
[SPARK-4749] [mllib]: Allow initializing KMeans clusters using a seed
str-janus Jan 21, 2015
3ee3ab5
[SPARK-5064][GraphX] Add numEdges upperbound validation for R-MAT gra…
Jan 21, 2015
812d367
[SPARK-5244] [SQL] add coalesce() in sql parser
adrian-wang Jan 21, 2015
8361078
[SPARK-5009] [SQL] Long keyword support in SQL Parsers
chenghao-intel Jan 21, 2015
b328ac6
Revert "[SPARK-5244] [SQL] add coalesce() in sql parser"
JoshRosen Jan 21, 2015
ba19689
[SQL] [Minor] Remove deprecated parquet tests
liancheng Jan 21, 2015
3be2a88
[SPARK-4984][CORE][WEBUI] Adding a pop-up containing the full job des…
scwf Jan 21, 2015
9bad062
[SPARK-5355] make SparkConf thread-safe
Jan 22, 2015
27bccc5
[SPARK-5202] [SQL] Add hql variable substitution support
chenghao-intel Jan 22, 2015
ca7910d
[SPARK-3424][MLLIB] cache point distances during k-means|| init
mengxr Jan 22, 2015
fcb3e18
[SPARK-5317]Set BoostingStrategy.defaultParams With Enumeration Algo.…
Peishen-Jia Jan 22, 2015
3027f06
[SPARK-5147][Streaming] Delete the received data WAL log periodically
tdas Jan 22, 2015
246111d
[SPARK-5365][MLlib] Refactor KMeans to reduce redundant data
viirya Jan 22, 2015
820ce03
SPARK-5370. [YARN] Remove some unnecessary synchronization in YarnAll…
sryza Jan 22, 2015
3c3fa63
[SPARK-5233][Streaming] Fix error replaying of WAL introduced bug
jerryshao Jan 23, 2015
e0f7fb7
[SPARK-5315][Streaming] Fix reduceByWindow Java API not work bug
jerryshao Jan 23, 2015
ea74365
[SPARK-3541][MLLIB] New ALS implementation with improved storage
mengxr Jan 23, 2015
cef1f09
[SPARK-5063] More helpful error messages for several invalid operations
JoshRosen Jan 24, 2015
e224dbb
[SPARK-5351][GraphX] Do not use Partitioner.defaultPartitioner as a p…
maropu Jan 24, 2015
[SPARK-5323][SQL] Remove Row's Seq inheritance.
Author: Reynold Xin <rxin@databricks.com>

Closes apache#4115 from rxin/row-seq and squashes the following commits:

e33abd8 [Reynold Xin] Fixed compilation error.
cceb650 [Reynold Xin] Python test fixes, and removal of WrapDynamic.
0334a52 [Reynold Xin] mkString.
9cdeb7d [Reynold Xin] Hive tests.
15681c2 [Reynold Xin] Fix more test cases.
ea9023a [Reynold Xin] Fixed a catalyst test.
c5e2cb5 [Reynold Xin] Minor patch up.
b9cab7c [Reynold Xin] [SPARK-5323][SQL] Remove Row's Seq inheritance.
rxin authored and marmbrus committed Jan 20, 2015
commit d181c2a1fc40746947b97799b12e7dd8c213fa9c
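What this change means for callers (not part of the commit — a minimal sketch, assuming the Row API exactly as it appears in the diff below; the example values are invented):

```scala
import org.apache.spark.sql.Row

val row = Row(1, "a", null)

// Before this commit, Row extended Seq[Any], so collection methods were
// available directly on a row (row.zip(...), row.filter(...), ...).
// Afterwards, callers go through the explicit Seq view instead:
val values: Seq[Any] = row.toSeq

// Pattern matching keeps working, because Row.unapplySeq now returns
// Some(row.toSeq) rather than Some(row):
row match {
  case Row(i: Int, s: String, _) => println(s"$i, $s")
  case _                         => println("no match")
}

// Collection-style operations move to the Seq view:
val nonNull = row.toSeq.filter(_ != null)
```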
75 changes: 71 additions & 4 deletions sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala
@@ -17,6 +17,8 @@
 
 package org.apache.spark.sql
 
+import scala.util.hashing.MurmurHash3
+
 import org.apache.spark.sql.catalyst.expressions.GenericRow
 
 
@@ -32,7 +34,7 @@ object Row {
    * }
    * }}}
    */
-  def unapplySeq(row: Row): Some[Seq[Any]] = Some(row)
+  def unapplySeq(row: Row): Some[Seq[Any]] = Some(row.toSeq)
 
   /**
    * This method can be used to construct a [[Row]] with the given values.
@@ -43,6 +45,16 @@
    * This method can be used to construct a [[Row]] from a [[Seq]] of values.
    */
   def fromSeq(values: Seq[Any]): Row = new GenericRow(values.toArray)
+
+  def fromTuple(tuple: Product): Row = fromSeq(tuple.productIterator.toSeq)
+
+  /**
+   * Merge multiple rows into a single row, one after another.
+   */
+  def merge(rows: Row*): Row = {
+    // TODO: Improve the performance of this if used in performance critical part.
+    new GenericRow(rows.flatMap(_.toSeq).toArray)
+  }
 }
 
 
@@ -103,7 +115,13 @@ object Row {
  *
  * @group row
  */
-trait Row extends Seq[Any] with Serializable {
+trait Row extends Serializable {
+  /** Number of elements in the Row. */
+  def size: Int = length
+
+  /** Number of elements in the Row. */
+  def length: Int
+
   /**
    * Returns the value at position i. If the value is null, null is returned. The following
    * is a mapping between Spark SQL types and return types:
@@ -291,12 +309,61 @@ trait Row extends Seq[Any] with Serializable {
 
   /** Returns true if there are any NULL values in this row. */
   def anyNull: Boolean = {
-    val l = length
+    val len = length
     var i = 0
-    while (i < l) {
+    while (i < len) {
       if (isNullAt(i)) { return true }
       i += 1
     }
     false
   }
+
+  override def equals(that: Any): Boolean = that match {
+    case null => false
+    case that: Row =>
+      if (this.length != that.length) {
+        return false
+      }
+      var i = 0
+      val len = this.length
+      while (i < len) {
+        if (apply(i) != that.apply(i)) {
+          return false
+        }
+        i += 1
+      }
+      true
+    case _ => false
+  }
+
+  override def hashCode: Int = {
+    // Using Scala's Seq hash code implementation.
+    var n = 0
+    var h = MurmurHash3.seqSeed
+    val len = length
+    while (n < len) {
+      h = MurmurHash3.mix(h, apply(n).##)
+      n += 1
+    }
+    MurmurHash3.finalizeHash(h, n)
+  }
+
+  /* ---------------------- utility methods for Scala ---------------------- */
+
+  /**
+   * Returns a Scala Seq representing the row. Elements are placed in the same order in the Seq.
+   */
+  def toSeq: Seq[Any]
+
+  /** Displays all elements of this sequence in a string (without a separator). */
+  def mkString: String = toSeq.mkString
+
+  /** Displays all elements of this sequence in a string using a separator string. */
+  def mkString(sep: String): String = toSeq.mkString(sep)
+
+  /**
+   * Displays all elements of this traversable or iterator in a string using
+   * start, end, and separator strings.
+   */
+  def mkString(start: String, sep: String, end: String): String = toSeq.mkString(start, sep, end)
 }
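A quick usage sketch of the helpers added above (fromTuple, merge, and the mkString variants) — illustrative only, with invented values:

```scala
import org.apache.spark.sql.Row

val a = Row.fromTuple((1, "x"))      // builds Row(1, "x") via productIterator
val b = Row.fromSeq(Seq(2.0, true))  // builds Row(2.0, true)

// merge concatenates the fields of its arguments, one row after another.
val merged = Row.merge(a, b)         // Row(1, "x", 2.0, true)
assert(merged.length == 4)

// equals is element-wise, and hashCode mixes elements starting from
// MurmurHash3.seqSeed, mirroring Scala's Seq hashing even though Row
// no longer extends Seq.
assert(Row(1, "x") == Row.fromTuple((1, "x")))

println(merged.mkString("[", ",", "]"))  // prints [1,x,2.0,true]
```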
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
@@ -84,8 +84,9 @@ trait ScalaReflection {
   }
 
   def convertRowToScala(r: Row, schema: StructType): Row = {
+    // TODO: This is very slow!!!
     new GenericRow(
-      r.zip(schema.fields.map(_.dataType))
+      r.toSeq.zip(schema.fields.map(_.dataType))
         .map(r_dt => convertToScala(r_dt._1, r_dt._2)).toArray)
   }
 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
@@ -272,9 +272,6 @@ package object dsl {
     def sfilter[T1](arg1: Symbol)(udf: (T1) => Boolean) =
       Filter(ScalaUdf(udf, BooleanType, Seq(UnresolvedAttribute(arg1.name))), logicalPlan)
 
-    def sfilter(dynamicUdf: (DynamicRow) => Boolean) =
-      Filter(ScalaUdf(dynamicUdf, BooleanType, Seq(WrapDynamic(logicalPlan.output))), logicalPlan)
-
     def sample(
       fraction: Double,
       withReplacement: Boolean = true,
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
@@ -407,7 +407,8 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
     val casts = from.fields.zip(to.fields).map {
       case (fromField, toField) => cast(fromField.dataType, toField.dataType)
     }
-    buildCast[Row](_, row => Row(row.zip(casts).map {
+    // TODO: This is very slow!
+    buildCast[Row](_, row => Row(row.toSeq.zip(casts).map {
       case (v, cast) => if (v == null) null else cast(v)
     }: _*))
   }