Skip to content

Conversation

@wolfgangmm
Copy link
Member

We are trying to apply more index-related optimizations at compile-time to improve speed and keep the code simple. The downside of this is that queries may need to be recompiled when the index configuration changes.

For now just clear the xquery cache. The performance loss is minimal compared to the benefits of optimized queries.

…eed to refresh compiled queries if a collection configuration has been added or removed: clear query cache when collection configuration gets invalidated.
@adamretter
Copy link
Contributor

Can you explain why the queries have to be compiled? I would not have thought they were coupled to the Collection configuration?

@wolfgangmm
Copy link
Member Author

@shabanovd: ok, we'll keep the public methods for backwards compatibility.

@adamretter: optimizing expressions like =, >, <, contains() and the like requires information about the availability of indexes. I want to move away from runtime optimizations and do more compile-time optimizations, leading to cleaner code generation. The new range index thus optimizes expressions and chooses an execution path at compile-time. The downside is that those decisions may have to be revised when index configuration changes. Given the speed benefits of the new optimizations, the extra cost is ignorable.

@joewiz
Copy link
Member

joewiz commented Feb 6, 2014

@wolfgangmm Are queries compiled when stored and updated, or when first executed? And does this "compile-time optimization" have any ramifications for where .xq files are stored relative to the data collections that are indexed?

@wolfgangmm
Copy link
Member Author

@joewiz Normally queries are compiled when first executed. Some interfaces may compile them ahead of time though.

The storage location doesn't matter. Since 2.0 we assume that a matching index should be used if it is defined and it is the users' responsibility to make sure all relevant collections are indexed. Before 2.0 an index was only used if it was defined on the entire context set, leading to some confusion among users because they did not get the performance they expected without knowing why. The rule: "if there's an index, always use it" seems easier to understand.

shabanovd added a commit that referenced this pull request Feb 6, 2014
Clear query cache when collection config is invalidated
@shabanovd shabanovd merged commit 2a4f66d into eXist-db:develop Feb 6, 2014
@adamretter
Copy link
Contributor

@wolfgangmm But changing the Collection Configuration only changes the index definitions. It does not mean that those indexes are present at all. The user would have also needed to do a re-index. So I think rather you have to consult the available indexes, not the defined indexes. It seems to me that you are not actually solving the problem with this fix, rather you are compounding it? For example, users often define indexes without re-indexing, with your commit they will now get invalid query results.
Or... what am I misunderstanding?

Also if you make this change here, you or I will also have to make the change to the query compilation in RESTXQ to do the same thing.

@wolfgangmm
Copy link
Member Author

@adamretter I indeed first thought about invalidating the cache upon reindex, but then the index would not be used if you store a new collection.xconf for an empty collection and start uploading files.

Well, the configuration features are not completely fixed yet. We will have to see how users cope with the new index. Right now I'd just like to get a release candidate out to get more feedback (before Prague). We can then refine the configuration based on that.

I'm aware that the same change needs to be done for RESTXQ. Because the changes may not be final, I'd like to keep it simple.

@adamretter
Copy link
Contributor

Okay so the same fixes HAVE to go into RESTXQ as well for the release candidate as well. Many people are using it now, it is not okay to just break it. What needs to be done?

@wolfgangmm
Copy link
Member Author

@adamretter I understand that clearing the pre-compiled queries in RESTXQ might not be ideal. I'm struggling to figure out an alternative approach which would rewrite queries on the fly. This leads to more complex code though and is not as elegant as I wished it would be. I'll see how far I get today and report back later.

@shabanovd
Copy link
Member

@wolfgangmm as option we can add UUID to configuration (generate at time of load) and note that id at query to know what cache & optimize for ... and in preuse time check id changed or not and reset that query if yes. (just idea)

@adamretter
Copy link
Contributor

@shabanovd We do not need a UUID, we have the last modified date for that purpose.

@wolfgangmm Clearing compiled queries in RESTXQ could be possible, but I am not sure how you would trigger this.

@shabanovd
Copy link
Member

@adamretter why you dont use "central" xquery caching for RESTXQ?

@wolfgangmm
Copy link
Member Author

The issues discussed above were addressed by pull request #146. No further action required.

@adamretter
Copy link
Contributor

@shabanovd The caching query compiler in RESTXQ is much more intelligent than the "central" one, for example - it also correctly handles the compilation of modules, which the "central" one does not. It was always my intention to replace the "central" one with what I consider to be the more advanced and flexible one that is in RESTXQ.

@shabanovd
Copy link
Member

@adamretter, better to fix main one and use it all around. I do remember that this (handles the compilation of modules) was "disable" because of bug in xquery engine...

@adamretter
Copy link
Contributor

@shabanovd Yes I agree, but I propose fixing the main one by replacing it ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants