@sten1ee
Contributor

@sten1ee sten1ee commented Nov 1, 2019

Description:

file:sync(collection, targetPath, dateTime?) is now extended to accept another optional argument:
file:sync(collection, targetPath, dateTime?, prune?)
When prune is not specified, the function behaves as before.
When it is specified and true, sync deletes obsolete files and directories, i.e. any file or directory that does not correspond to a document or collection currently in the DB.
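
For illustration only, here is a minimal Java sketch of the pruning rule described above. It is not the code from this PR: the class name `PruneSketch`, the `expected` set of relative paths, and the `demo()` helper are all hypothetical, and the sketch assumes the caller lists the directories (collections) to keep as well as the files (documents).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class PruneSketch {

    /**
     * Deletes everything under targetDir whose '/'-separated path relative to
     * targetDir is not listed in `expected`. `expected` must name the
     * directories (collections) as well as the files (documents) to keep.
     * Returns the relative paths that were deleted.
     */
    static Set<String> prune(final Path targetDir, final Set<String> expected) throws IOException {
        final List<Path> obsolete;
        try (Stream<Path> walk = Files.walk(targetDir)) {
            obsolete = walk
                    .filter(p -> !p.equals(targetDir))
                    .filter(p -> !expected.contains(relative(targetDir, p)))
                    // reverse order visits children before their parent directory
                    .sorted(Comparator.reverseOrder())
                    .collect(Collectors.toList());
        }
        final Set<String> deleted = new TreeSet<>();
        for (final Path p : obsolete) {
            Files.delete(p);
            deleted.add(relative(targetDir, p));
        }
        return deleted;
    }

    private static String relative(final Path root, final Path p) {
        return root.relativize(p).toString().replace('\\', '/');
    }

    /** Builds a small tree in a temp directory, prunes it, and reports what was deleted. */
    static Set<String> demo() {
        try {
            final Path dir = Files.createTempDirectory("prune-sketch");
            Files.createDirectories(dir.resolve("keep"));
            Files.createDirectories(dir.resolve("stale"));
            Files.write(dir.resolve("keep").resolve("a.xml"), "<a/>".getBytes());
            Files.write(dir.resolve("keep").resolve("old.xml"), "<old/>".getBytes());
            Files.write(dir.resolve("stale").resolve("b.xml"), "<b/>".getBytes());
            // only the "keep" collection and its document a.xml still exist in the DB
            return prune(dir, Set.of("keep", "keep/a.xml"));
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(final String[] args) {
        System.out.println(demo());
    }
}
```

In the demo, `keep/old.xml` and the whole `stale` directory are removed, while `keep/a.xml` survives.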

Reference:

There is no GitHub issue corresponding to this PR

Type of tests:

There are no tests for the new feature.
I could not find the tests for file:sync anyway, so point me to those, pls. I will build on top of them.

@sten1ee
Contributor Author

sten1ee commented Nov 1, 2019

@adamretter Could you please 'assign' a person from the eXist team, who is going to answer my questions and help me get this PR accepted!

As a beginning I see these issues:

  1. Appveyor has failed the Win build (due to timeout), someone has to restart it or ... remove the Win build altogether
  2. On the invocation site the new functionality will look like:
file:sync(
    concat($common:data-path, '/', $collection),
    concat('/', $repo-path, '/', $collection),
    (),
    true()
)

which is far from informative. Are there 'keyword arguments' in xquery, so that the last arg may be supplied in the form:

         prune: true

"Optional: only resources modified after the given date/time will be synchronized.")
"Optional: only resources modified after the given date/time will be synchronized."),
new FunctionParameterSequenceType("prune", Type.BOOLEAN,
Cardinality.ZERO_OR_ONE,
Contributor

The Cardinality for this should be EXACTLY_ONE. It doesn't really make sense for this to be optional as it is a boolean value.

Contributor Author

@sten1ee sten1ee Nov 4, 2019

I've made it "optional" with the single purpose that the API remains backward compatible.
Old style invocations of file:sync(...) will still work.
If the prune param is made non-optional, then:

  1. Old-style invocations of that function will be rendered broken (i.e. will have to be revised)
  2. The previous (3rd) param of the function will also need to become non-optional, because (to the best of my understanding) you can't have optional param, followed by non-optional, right?

Contributor

That's not backwards compatible. There is no polymorphism in XQuery. You have to keep the original function signature, and then define additional function signatures with additional arities.

Contributor Author

Wait, wait, wait.
Then (in order not to break existing code) this extended 'prune' functionality has to come together with a new function, e.g. file:sync_ex(...) or file:sync_prune(...) or something.
Is it not so?

Contributor

@adamretter adamretter Nov 4, 2019

You don't need a new named function as they are on different arities. But you have to design two signatures, the original signature, and the new signature which has an additional argument. The new signature should then be added to the FileModule.java
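
To make the arity point concrete, here is a small Java sketch of name-plus-arity resolution. It deliberately does not use eXist's real signature classes: `AritySketch`, its registry, and the returned strings are hypothetical stand-ins that only show why `file:sync#3` and `file:sync#4` can coexist without breaking existing callers.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class AritySketch {

    // Hypothetical registry: a call is resolved by function name and argument
    // count, so "file:sync#3" and "file:sync#4" are two distinct entries.
    private static final Map<String, Function<List<Object>, String>> REGISTRY = new HashMap<>();

    static {
        // original signature, left untouched so existing callers keep working
        REGISTRY.put("file:sync#3", args -> "sync without prune");
        // additional signature that carries the extra prune argument
        REGISTRY.put("file:sync#4", args ->
                Boolean.TRUE.equals(args.get(3)) ? "sync with prune" : "sync without prune");
    }

    /** Looks the function up by name and arity, then applies it to the arguments. */
    static String call(final String name, final List<Object> args) {
        final String key = name + "#" + args.size();
        final Function<List<Object>, String> fn = REGISTRY.get(key);
        if (fn == null) {
            throw new IllegalArgumentException("No such function: " + key);
        }
        return fn.apply(args);
    }
}
```

A three-argument call still lands on the old entry, a four-argument call lands on the new one, and any other argument count is rejected.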

}
for (final Iterator<DocumentImpl> i = collection.iterator(context.getBroker()); i.hasNext(); ) {
DocumentImpl doc = i.next();
if (startDate == null || doc.getMetadata().getLastModified() > startDate.getTime()) {
Contributor

You should READ_LOCK the document before reading from it

Contributor Author

(I am assuming you are referring to the collection when you say the document):
As far as I can see, 6 lines up (on line 152) the collection is locked and opened:
try(final Collection collection = context.getBroker().openCollection(collectionPath, LockMode.READ_LOCK))

Contributor

You are calling doc.getMetadata().getLastModified() which is on the document, so you should read-lock the document.

Contributor Author

No, I am not.
Once again -- this is old code, I have not modified it.
So you see - I am trying to get a feature PR through and then it turns out the code as it was before my PR was not exactly conformant.
I don't know what is the proper way to LOCK a doc in this case (and in general).
Could you suggest the relevant change to the code and I will take it from there.
(You know I've seen GitLab gives the reviewer a way to suggest changes in the code, there should be something equivalent here)

Contributor

try (final ManagedLock lock = broker.getBrokerPool().getLockManager().acquireDocumentLock(docUri, LockMode.READ_LOCK)) { ... }
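
The shape of that suggestion (acquire a read lock in a try-with-resources header, read the metadata inside the block, release on exit) can be sketched with plain JDK classes. Everything below is an assumption made for illustration: `ManagedReadLock` and `Doc` are hypothetical stand-ins built on `ReentrantReadWriteLock`, not eXist's `ManagedLock` or `DocumentImpl`.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class ManagedReadLockSketch {

    /** AutoCloseable wrapper, so the read lock is released when the try block exits. */
    static final class ManagedReadLock implements AutoCloseable {
        private final ReentrantReadWriteLock lock;

        ManagedReadLock(final ReentrantReadWriteLock lock) {
            this.lock = lock;
            lock.readLock().lock();
        }

        @Override
        public void close() {
            lock.readLock().unlock();
        }
    }

    /** Hypothetical document: a last-modified timestamp guarded by its own lock. */
    static final class Doc {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        final long lastModified;

        Doc(final long lastModified) {
            this.lastModified = lastModified;
        }
    }

    /** Reads the timestamp only while the document's read lock is held. */
    static boolean modifiedSince(final Doc doc, final Long startMillis) {
        try (ManagedReadLock ignored = new ManagedReadLock(doc.lock)) {
            return startMillis == null || doc.lastModified > startMillis;
        }
    }
}
```

The comparison mirrors the `startDate == null || lastModified > startDate` check in the diff; the lock is released even if the body throws.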

Contributor Author

Thanks.
You've missed one missing final - line 162: DocumentImpl doc = i.next(); ;-)

Contributor Author

Alternatively -- is it not a better idea (for the sake of 'Atomicity')
to read_lock the whole collection tree that is being sync-ed?

@adamretter
Contributor

@sten1ee You can find tests in exist/extensions/modules/file/src/test. You can implement your sync tests in either Java or XQuery.

Contributor

@adamretter adamretter left a comment

small changes needed.

@sten1ee
Contributor Author

sten1ee commented Nov 4, 2019

@adamretter : you did not answer my question:
Are there 'keyword arguments' in xquery, so that the last arg may be supplied in the form:
prune: true

And I also have this one:
So, an optional param to an XQuery function means that that param may be 'null', but this does not exempt callers from having to explicitly specify 'null' as an argument for this param?

@adamretter
Contributor

Are there 'keyword arguments' in xquery, so that the last arg may be supplied in the form:
prune: true

No.

So, an optional param to an XQuery function means that that param may be 'null', but this does not exempt callers from having to explicitly specify 'null' as an argument for this param?

There is no null in XQuery; the closest thing is likely an "empty sequence". So even if the cardinality of the parameter is specified as ZERO_OR_ONE, you still have to specify a value when calling, even if it is the empty sequence.
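
As a rough illustration of that answer, a ZERO_OR_ONE argument can be modelled in Java as a list holding no item or exactly one. The class and method names are hypothetical, and treating the empty sequence as "do not prune" is an assumption of this sketch, not something the thread settles.

```java
import java.util.List;

final class EmptySequenceSketch {

    /**
     * Models a ZERO_OR_ONE boolean argument. The caller must always pass the
     * argument; what may be empty is the sequence itself, never the slot.
     */
    static boolean pruneRequested(final List<Boolean> prune) {
        if (prune.size() > 1) {
            throw new IllegalArgumentException("Expected at most one item, got " + prune.size());
        }
        // assumption: an empty sequence means "behave as before", i.e. no pruning
        return !prune.isEmpty() && prune.get(0);
    }
}
```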

@sten1ee
Contributor Author

sten1ee commented Nov 5, 2019

@adamretter in the Sync signature:

    new FunctionParameterSequenceType("collection", Type.STRING,
    Cardinality.EXACTLY_ONE, "The collection to sync."),
    new FunctionParameterSequenceType("targetPath", Type.ITEM, 
    Cardinality.EXACTLY_ONE, "The full path or URI to the directory"),

targetPath is declared as Type.ITEM (as opposed to Type.STRING) but is later treated as if it is Type.STRING:

	final String collectionPath = args[0].getStringValue();

	final String target = args[1].getStringValue();
	Path targetDir = FileModuleHelper.getFile(target);

Why is that?

@sten1ee
Contributor Author

sten1ee commented Nov 5, 2019

I need a (xquery) hand here, in order to craft the test for this sync/4 (sync with prune)
The only sensible XQuery test I can see (so far) is:

declare
    %test:pending("need to mechanism to setup a temporary file to work with")
    %test:assertEquals("data", "true", "true", "true")
function serialization:overwrite() {

    let $node-set := text {"data"}
    let $path := system:get-exist-home() || "/test.txt"
    let $parameters := ()
    let $append := true()
    let $remove := file:delete($path)
    let $ser1 := 	file:serialize($node-set, $path, (), false())
    let $ser2 := 	file:serialize($node-set, $path, (), false())
    let $read := file:read($path)
    let $remove := file:delete($path)
    return ($read, $ser1, $ser2, $remove)
};

I am extremely short in XQuery skills, that's why I need help with the following test scenario:

  1. create a collection C containing two docs: A and B
  2. sync(C)
  3. check both files are in target dir with expected content
  4. update contents of doc A
  5. sync(C ... mode='prune')
  6. check both files are in target dir with expected content
  7. remove/delete doc B from collection C
  8. sync(C ... mode='no_prune')
  9. check both files are in target dir with expected content
  10. sync(C ... mode='prune')
  11. check only file A is in target dir with expected content
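
The eleven steps above can be walked through in a self-contained Java simulation. This is only a sketch of the expected behaviour, not an XQSuite test and not the real file:sync: the in-memory map standing in for collection C, the `sync` stand-in, and the file names `A.xml` and `B.xml` are all assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class SyncScenarioSketch {

    /** Stand-in for file:sync: writes every document; with prune, deletes files that have no document. */
    static void sync(final Map<String, String> collection, final Path dir, final boolean prune) throws IOException {
        for (final Map.Entry<String, String> doc : collection.entrySet()) {
            Files.write(dir.resolve(doc.getKey()), doc.getValue().getBytes(StandardCharsets.UTF_8));
        }
        if (prune) {
            for (final String name : snapshot(dir).keySet()) {
                if (!collection.containsKey(name)) {
                    Files.delete(dir.resolve(name));
                }
            }
        }
    }

    /** Returns file name -> content for every file directly inside dir. */
    static Map<String, String> snapshot(final Path dir) throws IOException {
        final List<Path> files;
        try (Stream<Path> list = Files.list(dir)) {
            files = list.collect(Collectors.toList());
        }
        final Map<String, String> result = new TreeMap<>();
        for (final Path p : files) {
            result.put(p.getFileName().toString(), new String(Files.readAllBytes(p), StandardCharsets.UTF_8));
        }
        return result;
    }

    /** Runs steps 1 to 11 and returns the target directory's state after each of the four syncs. */
    static List<Map<String, String>> runScenario() {
        try {
            final Path dir = Files.createTempDirectory("sync-scenario");
            final Map<String, String> c = new TreeMap<>();
            final List<Map<String, String>> states = new ArrayList<>();
            c.put("A.xml", "<a>1</a>");      // step 1: collection C with docs A and B
            c.put("B.xml", "<b/>");
            sync(c, dir, false);             // step 2
            states.add(snapshot(dir));       // step 3: both files expected
            c.put("A.xml", "<a>2</a>");      // step 4: update doc A
            sync(c, dir, true);              // step 5: prune
            states.add(snapshot(dir));       // step 6: both files, A updated
            c.remove("B.xml");               // step 7: delete doc B
            sync(c, dir, false);             // step 8: no prune
            states.add(snapshot(dir));       // step 9: B is still on disk
            sync(c, dir, true);              // step 10: prune
            states.add(snapshot(dir));       // step 11: only A remains
            return states;
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each entry of the returned list is the check that steps 3, 6, 9 and 11 ask for; a real test would make the same assertions against the directory written by file:sync.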

Help needed!

@adamretter
Contributor

targetPath is declared as Type.ITEM (as opposed to Type.STRING) but is later treated as if it is Type.STRING:

It should likely have been declared as Type.STRING; however, we can't change it easily now, and the getStringValue will likely give the expected result anyway.

@adamretter
Contributor

@duncdrum @joewiz any chance either of you could help @sten1ee with some XQuery for his test above?

@duncdrum
Contributor

@joewiz can I pass creating an XQSuite test on to you? I'm swamped with more pressing issues at the moment. If so, can you assign yourself the ticket?

@duncdrum duncdrum added the needs XQSuite test XQSuite test required to reproduce label Nov 11, 2019
Contributor

@duncdrum duncdrum left a comment

Code looks good to me. thank you @sten1ee
We are traditionally hesitant with on-disc delete operations, is there anything preventing me to prune C:\>

@sten1ee
Contributor Author

sten1ee commented Nov 13, 2019

We are traditionally hesitant with on-disc delete operations
Is there anything preventing me to prune C:\>

You have a point here.
Security-wise I haven't changed anything in the existing code (the one that implemented file:sync but without option to prune).
I think exploit-wise overwriting is even more dangerous than deleting - imagine if file:sync gives you a chance to replace a system file.
OTOH disaster-wise deleting seems to be more dangerous.

Now back to the code:

  • on the eXist side: You can do that (ask to delete/overwrite files in C:\>)
  • on the OS side: hopefully, eXist is not running as a user, who is granted the right to do that?

^^^^^
@duncdrum (that's to draw your attention)

@line-o
Member

line-o commented Nov 16, 2019

related issue #2627

@sten1ee
Contributor Author

sten1ee commented Nov 18, 2019

@line-o, are you willing to help this PR get through by authoring the required TC?
Please, see my post from Nov 5th (the ones starting with 'I need a (xquery) hand here')
Perhaps the TC may be simplified -- in general it has to demonstrate that the functionality is absent before that PR and present with it.

@sten1ee
Contributor Author

sten1ee commented Nov 18, 2019

@duncdrum, perhaps you haven't been notified about my reply to your question about wiping out C:\> with 'prune'.
Please check my comment from Nov 13th and tell me what we should do about it: is relying on OS user rights enough of a 'protection' against this disaster, or, if it is not, what else do you suggest?

@line-o
Member

line-o commented Nov 18, 2019

@sten1ee First of all, for me, just adding a prune argument is not enough. I need the ability to exclude files so that .git and other files are left untouched.

@line-o
Member

line-o commented Nov 18, 2019

@duncdrum file:sync is able to modify a folder on the file system based on OS permissions. For production systems this should always be a user with limited rights. Write access includes the ability to remove files. Therefore, I see no issue in removing files when syncing to the filesystem.

@joewiz
Member

joewiz commented Nov 18, 2019

@line-o's suggestion about a mechanism for excluding file and folder paths and patterns makes a lot of sense, particularly given that we don't always store .git folders or other build artifacts in the database.

@sten1ee Would you be interested in adding this feature to this PR, or in a follow-on PR? I imagine this would take the form of a new $excludes as xs:string* parameter that would take 0-or-more strings containing the filenames or patterns. For a function that implements filename/patterns, see http://exist-db.org/exist/apps/fundocs/index.html?q=xmldb:store-files-from-pattern.
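
One possible shape for such an exclusion check, sketched in Java with the JDK's glob matcher. The matching rule used here (a path is excluded when any single name element matches any pattern) is an assumption of this sketch, as are the class and method names; the thread does not specify the pattern semantics.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;

final class ExcludesSketch {

    /** Compiles glob patterns such as ".git" or "*.png" into matchers. */
    static List<PathMatcher> compile(final List<String> patterns) {
        return patterns.stream()
                .map(p -> FileSystems.getDefault().getPathMatcher("glob:" + p))
                .collect(Collectors.toList());
    }

    /** True if any single name element of the relative path matches any pattern. */
    static boolean isExcluded(final Path relativePath, final List<PathMatcher> excludes) {
        for (final Path element : relativePath) {
            for (final PathMatcher matcher : excludes) {
                if (matcher.matches(element)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With the patterns `.git` and `*.png`, this rule leaves `.git/config` and `img/logo.png` untouched by a prune, while `data/a.xml` stays eligible.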

I would be happy to help craft an XQSuite test.

@sten1ee
Contributor Author

sten1ee commented Nov 20, 2019

@joewiz I think I can do that; it shouldn't be that much work. I only need to coordinate this with (my) management, as after all it is 84000 that's paying me.

Apart from that, it'd be very helpful for me if you went ahead and crafted some XQSuite tests for the already submitted code. Why? I am almost sure there will be things to fix in it, problems that I failed to spot without a test.
If you craft the test keeping in mind the 'mechanism for excluding file and folder paths and patterns' that is pending -- it will be quite easy to extend the test to cover this later.

I prefer we get all this done in a single PR, but it's not that important in the end.

@sten1ee
Contributor Author

sten1ee commented Dec 12, 2019

@joewiz
I am more or less done (this includes the exclusion patterns functionality).
When will you be able to craft some TCs ?

@sten1ee
Contributor Author

sten1ee commented Jan 15, 2020

@joewiz, are you still willing to contribute the XQSuite test, so that we can move fwd with that feature?

@joewiz
Member

joewiz commented Jan 15, 2020

@sten1ee Sorry for the delay, I haven't forgotten. Just working on some hard deadlines and will hope to be able to supply some tests soon. In the meantime, Duncan smartly added a skeleton XQSuite test to the issue template, so if that provides the scaffolding you needed to add a test, by all means. But if that's not sufficient, I will still follow through on creating tests as soon as I can.

@adamretter adamretter added the awaiting-response requires additional information from submitter label Mar 11, 2020
@line-o line-o added the in progress Issue is actively being worked upon label Apr 15, 2020
@joewiz joewiz self-assigned this Jul 7, 2020
@line-o
Member

line-o commented Sep 21, 2020

Just a heads up: @sten1ee your PR was discussed in today's Community Meeting. We are still committed to getting it merged, but none of us has the time to prepare a test suite at the moment. As a result, this feature will not make it into the next release, which is around the corner.

@adamretter
Contributor

@sten1ee Of course, if you want to send a test-suite yourself, then we can get it merged ASAP.

@joewiz
Member

joewiz commented Jan 12, 2021

With #3704, I now have a technique for creating and accessing temporary directories on the filesystem, which is needed for writing tests of file:sync(). We could use the same technique for creating the xqsuite test for this PR. However, I see two challenges:

  1. Some conflicts are present in the PR now and need to be resolved.
  2. In light of #3704 ([BUG] file:sync doesn't respect conf.xml serializer; hard codes serialization parameters), it's clear to me that file:sync() needs a $parameters argument, mirroring file:serialize(), in order to allow developers to specify serialization parameters when syncing a collection to disk. See http://exist-db.org/exist/apps/fundocs/view.html?uri=http://exist-db.org/xquery/file#serialize.3. Should this $parameters argument come before or after the $prune and $excludes arguments? Or, alternatively, should we consolidate $parameters, $prune, etc. into a single $options as map(*)* parameter, which has become quite standard in XQuery 3.1? It would look like this:
file:sync(
    $source-collection, 
    $destination-directory, 
    map { 
        "dateTime": xs:dateTime("2021-01-12T11:47:08.12-05:00"),
        "prune": true(),
        "excludes": array { "*.png", "*.jpg" },
        "indent": false(),
        "expand-xincludes": false(),
        "process-xsl-pi": false(),
        "encoding": "UTF-8",
        "omit-xml-declaration": true()
    }
)

Before we merge this PR, we'd ideally discuss the direction of parameter handling and make any necessary adjustments. Thoughts?
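
On the implementation side, a consolidated options map would need to be unpacked into typed settings with defaults. A minimal Java sketch of that step follows. The key names are taken from the example above; the defaults, the class name, and the choice to treat every remaining key as a serialization parameter are assumptions of this sketch.

```java
import java.time.OffsetDateTime;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SyncOptionsSketch {

    final OffsetDateTime since;               // null: no date filter
    final boolean prune;                      // default: false
    final List<String> excludes;              // default: no exclusions
    final Map<String, Object> serialization;  // every key not consumed above

    private SyncOptionsSketch(final OffsetDateTime since, final boolean prune,
                              final List<String> excludes, final Map<String, Object> serialization) {
        this.since = since;
        this.prune = prune;
        this.excludes = excludes;
        this.serialization = serialization;
    }

    /** Splits the options map into sync settings and leftover serialization parameters. */
    static SyncOptionsSketch from(final Map<String, Object> options) {
        final Map<String, Object> rest = new LinkedHashMap<>(options);
        final Object since = rest.remove("dateTime");
        final Object prune = rest.remove("prune");
        final Object excludes = rest.remove("excludes");

        final List<String> excludeList = new ArrayList<>();
        if (excludes != null) {
            for (final Object pattern : (List<?>) excludes) {
                excludeList.add(String.valueOf(pattern));
            }
        }
        return new SyncOptionsSketch((OffsetDateTime) since, Boolean.TRUE.equals(prune), excludeList, rest);
    }
}
```

One consequence of this design is that a caller who passes only `"prune": true()` gets the defaults for everything else, so new options can be added later without a new arity.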

@joewiz
Member

joewiz commented Jan 12, 2021

Also, just to clarify for the purposes of writing tests, I seem to recall that file:sync is one-way - from database to disk - and will not pull changes from disk back to database. Could someone confirm if I'm right?

@joewiz
Member

joewiz commented Jan 12, 2021

(Also on the topic of how file:sync serializes, see #2394.)

@line-o line-o added this to the eXist-5.4.0 milestone Jul 1, 2021
@line-o
Member

line-o commented Nov 9, 2021

I am in favour of an options parameter instead of separate $prune and $excludes for file:sync.

@line-o line-o mentioned this pull request Nov 10, 2021
@line-o
Member

line-o commented Nov 11, 2021

@joewiz Yes, you are right: file:sync only syncs to disk, not from it.
Re: your proposed change to the function signature:

file:sync(
    $source-collection, 
    $destination-directory, 
    map { "dateTime": xs:dateTime("2021-01-12T11:47:08.12-05:00"), ... }
)

I would love to change file:sync#3 to have the options map as the third parameter. But this then means that we have an almost breaking change to the API.
We could allow either an xs:dateTime or a map(*) as the third parameter to file:sync#3, but this is ODD.
While I would really like to do that, maybe creating a separate file:sync#4 with an explicit optional xs:dateTime as the third parameter and the options map as the fourth is the safer option.

file:sync(
    $source-collection, 
    $destination-directory,
    xs:dateTime("2021-01-12T11:47:08.12-05:00"), 
    map { "prune": true() , ... }
)

@joewiz
Member

joewiz commented Nov 11, 2021

@line-o Great. And I actually like your idea of the hybrid 3rd parameter. We could allow (xs:dateTime | map(*)) at first and immediately deprecate xs:dateTime, flagging it as to-be-deleted in the next major version of eXist.
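
A hybrid third parameter of this kind amounts to a type dispatch at the start of the function. The Java sketch below normalises either form into one options map; it is an illustration under stated assumptions (hypothetical class name, `null` standing in for the empty sequence, `OffsetDateTime` standing in for xs:dateTime) and is not the code from the follow-up PR.

```java
import java.time.OffsetDateTime;
import java.util.LinkedHashMap;
import java.util.Map;

final class HybridThirdArgSketch {

    /**
     * Accepts the legacy date/time value or the newer options map and always
     * returns an options map, so the rest of the function has one code path.
     */
    static Map<String, Object> normalize(final Object third) {
        final Map<String, Object> options = new LinkedHashMap<>();
        if (third == null) {
            // stand-in for the empty sequence: no filter, no options
            return options;
        }
        if (third instanceof OffsetDateTime) {
            // legacy (to-be-deprecated) form: a bare date/time becomes the "dateTime" option
            options.put("dateTime", third);
            return options;
        }
        if (third instanceof Map) {
            for (final Map.Entry<?, ?> entry : ((Map<?, ?>) third).entrySet()) {
                options.put(String.valueOf(entry.getKey()), entry.getValue());
            }
            return options;
        }
        throw new IllegalArgumentException(
                "Third argument must be a date/time or an options map, got " + third.getClass().getName());
    }
}
```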

@line-o
Member

line-o commented Nov 13, 2021

@joewiz The first working version with the 3rd parameter accepting either an xs:dateTime or an options map is now in the follow-up PR #4081
I invite @joewiz and @sten1ee to discuss progress there and would propose to close this PR as development has taken a different route.

@dizzzz dizzzz closed this in #4081 Jan 24, 2022