Skip to content

Conversation

@falaki
Copy link
Contributor

@falaki falaki commented Dec 12, 2017

What changes were proposed in this pull request?

dev/lint-r.R file installs uses devtools to install jimhester/lintr package in the default site library location which is /usr/local/lib/R/site-library. This is not recommended and can fail because we are running this script as jenkins while that directory is owned by root.

This patch changes lint-r.R to install the linter package in a local directory.

How was this patch tested?

Existing linter should pass.

cc @JoshRosen

@JoshRosen
Copy link
Contributor

Because LOCAL_LIB_LOC seems to default to $SPARK_HOME/R/lib, will installing to this folder create the potential for lintr to be installed into a lib folder which gets packaged up in binary builds? Just want to double-check to make sure this won't mess up any packaging scripts which might be interacting with that directory. (I'm not super familiar with how R packaging works, so just wanted to double-check and be safe).

@falaki
Copy link
Contributor Author

falaki commented Dec 12, 2017

@JoshRosen the SparkR package is built from spark/R/pkg directory. $SPARK_HOME/R/lib is just a place where SparkR is installed (after it has been built) and at runtime loaded from.

dev/lint-r.R Outdated

# Installs lintr from Github in a local directory.
# NOTE: The CRAN's version is too old to adapt to our rules.
if ("lintr" %in% row.names(installed.packages()) == FALSE) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@falaki, actually, seems it's not installing lintr in Jenkins:

Error in library(lintr, lib.loc = LOCAL_LIB_LOC) : 
  there is no package called 'lintr'
Execution halted

I think it's because there's lintr in Jenkins already and "lintr" %in% row.names(installed.packages()) returns TRUE.

So, I think lintr is already not synced to jimhester/lintr@5431140 and I had some discussion, for example, here - https://issues.apache.org/jira/browse/SPARK-22063

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @shivaram, @felixcheung and @shaneknapp as well for your information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find. I think in that case we should just install it anyway and do not rely on the existing installation. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, that sounds ok to me; however, I remember @shivaram had a different idea(?) (did I remember this correctly .. ?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd rather not have the build installing system-level packages, unless it's in it's own distinct environment... the R ecosystem on our jenkins is pretty fragile as it is, so let me think about how i want to go about addressing this. updating the workers' packages isn't that big of a deal, i would just like it to be reproducable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an existing Rcpp installed on our jenkins machines.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - so it means that the existing Rcpp was used to build the existing lintr right ? And if we are not changing lintr versions then the same Rcpp should still work ? Sorry if I'm missing something here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see two possibilities. Unfortunately, I cannot get access to machines to verify:

  1. There is already an older version so ("lintr" %in% row.names(installed.packages()) returns 'TRUE`
  2. When installing lintr in a new directory some transient dependency check behaves different and we end up rejecting existing Rcpp version.

Why do you think re-installing Rcpp in a non-system directory is an issue? This is under $SPARK_HOME and will be wiped with every new test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing I'm trying to make sure is that the lint-r version we are using is not changing because of this change. If we need a new Rcpp to install this lint-r then I'm worried something has changed ?

Also it seems fragile to expect Rcpp to be there at that repo with this version ? The github-based install lint-r seems more stable to me.

One more final question is what is xmlparsedata used for ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does export R_LIBS_USER=~/Rlib work for installing Rcpp?

devtools install_version can fetch a specific version of lintr I think
https://www.rdocumentation.org/packages/devtools/versions/1.13.3/topics/install_version

@SparkQA
Copy link

SparkQA commented Dec 12, 2017

Test build #84808 has finished for PR 19959 at commit 6892725.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 13, 2017

Test build #84813 has finished for PR 19959 at commit 4355fae.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Just for a refreshing reminder about #19959 (comment).

One problem was how to deal with other branches too in our single environment. If this is simply upgraded in workers, it might cause style checking failures in other branches, if I understood correctly. Here, #19290, I fixed the styles in a batch caught by jimhester/lintr@5431140 so it's fine in the master but might be broken in other branches if lintr is upgraded naively in workers.

One suggestion was virtual env in R by @shivaram here.

Another quick suggestion was to install it by controlling this by Jenkins environment variable, which I don't even like but at least works - #19290 (comment)

@SparkQA
Copy link

SparkQA commented Dec 13, 2017

Test build #84804 has finished for PR 19959 at commit 2376b0b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@falaki
Copy link
Contributor Author

falaki commented Dec 14, 2017

@shivaram are there other concerns with this patch?

@jiangxb1987
Copy link
Contributor

ping @shivaram

@HyukjinKwon
Copy link
Member

actually ping @falaki for #19959 (comment) if I understood correctly.

@SparkQA
Copy link

SparkQA commented May 30, 2018

Test build #91311 has finished for PR 19959 at commit 4355fae.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 8, 2019

Test build #104400 has finished for PR 19959 at commit 4355fae.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Copy link
Member

I guess we should close this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants