Skip to content

Conversation

@PVince81
Copy link
Contributor

Description

Skip null groups in group manager, because data inconsistency could cause some backends (LDAP?) to return null groups.

Related Issue

Motivation and Context

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@felixboehm @jvillafanez @butonic

@mention-bot
Copy link

@PVince81, thanks for your PR! By analyzing the history of the files in this pull request, we identified @DeepDiver1975, @bartv2 and @macjohnny to be potential reviewers.

@PVince81
Copy link
Contributor Author

Steps to reproduce:

Problem can be reproduced with memcache/redis by manually causing a cache inconsistency by meddling with the keys:

  1. Setup LDAP
  2. Setup OC to use Redis as memcache
  3. Create a user "zombie300"
  4. Add "zombie300" in two groups "Box100" and "Box105" (I used "memberUid" mode)
  5. flushall in redis to clear all keys, just in case
  6. PROPFIND as zombie300 on the root as this user: curl -X PROPFIND -u zombie300:zombie300 http://localhost/owncloud/remote.php/webdav/ (this is important to not cause web UI polling to repopulate the cache while testing)
  7. Compute key name on CLI: php -r 'print(md5("groupExists" . "Box105"));' => f083920ef775eb16207405c1c4dc57e9
  8. In Redis CLI, find the: keys *f083920ef775eb16207405c1c4dc57e9* => "6a3a92343f0dc7ff3b548564acfcbb91/LDAP-user_ldap--f083920ef775eb16207405c1c4dc57e9" (you will get a different result depending on OC instance id)
  9. Go to LDAP and delete the group "Box105"
  10. In Redis CLI, delete the key from above: del 6a3a92343f0dc7ff3b548564acfcbb91/LDAP-user_ldap--f083920ef775eb16207405c1c4dc57e9
  11. Redo the PROPFIND

Note: the cache key can also exist if the group does not exist, its contents will be the value "false" in JSON format.

Before the fix:

{"reqId":"HU2rhjW37re7hoEcU0vx","remoteAddr":"127.0.0.1","app":"PHP","message":"Call to a member function getGID() on null at \/srv\/www\/htdocs\/owncloud\/apps\/dav\/lib\/Connector\/Sabre\/Principal.php#152","level":3,"time":"2016-12-23T10:53:02+00:00","method":"PROPFIND","url":"\/owncloud\/remote.php\/webdav\/","user":"5adb9728-5d44-1036-95d9-3f869d76dba2"}

After the fix, the warning from this PR is logged:

{"reqId":"eSDUQp4TuoeKtPiYBbS7","remoteAddr":"127.0.0.1","app":"core","message":"Warning: user \"5adb9728-5d44-1036-95d9-3f869d76dba2\" belongs to deleted group: \"Box104\"","level":0,"time":"2016-12-23T10:56:03+00:00","method":"PROPFIND","url":"\/owncloud\/remote.php\/webdav\/","user":"5adb9728-5d44-1036-95d9-3f869d76dba2"}

@PVince81
Copy link
Contributor Author

So I'd say this PR is mostly about hardening against such inconsistencies until we figure out why they are happening in the first place. Here's a starting point for LDAP's crappy caching: owncloud/user_ldap#30

Please review @DeepDiver1975 @jvillafanez @PhilippSchaffrath @felixboehm @butonic

if (!is_null($aGroup)) {
$groups[$groupId] = $aGroup;
} else {
\OC::$server->getLogger()->debug('Warning: user "' . $uid . '" belongs to deleted group: "' . $groupId . '"', array('app' => 'core'));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would not be better instead debug a higher log level? I would suggest 2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debug log level makes sense, with this code it is a designed functionality to handle users in deleted groups.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change makes oc gracefully handle null groups. So no warning necessary IMO. But starting the debug log line with 'Warning' is weird. Either the admin should do something to fix this or he shouldn't. If he should we need to tell him whit to do to make the warning go away.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This situation should never happen. The purpose of the warning is to make it possible to find out about such scenarios where the LDAP server (or our LDAP impl) returns inconsistent results.

I'm fine removing the "Warning: " piece of the string.

Copy link
Contributor

@felixboehm felixboehm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@felixboehm
Copy link
Contributor

Please add test for deleted groups.

@cdamken
Copy link
Contributor

cdamken commented Dec 27, 2016

Please add test for deleted groups.

@SergioBertolinSG Please help here

@PVince81
Copy link
Contributor Author

PVince81 commented Jan 9, 2017

The automated test will need to interact directly with memcache to be able to delete the key in question there... good luck

@PVince81
Copy link
Contributor Author

PVince81 commented Jan 12, 2017

Also skip groups in "search" function that is used by the users page and others: 400baf3

Additionally I searched for usages of "groupManager->get(" and found that some parts of the sharing code doesn't properly check for group existence, so I hardened that as well in 8e7e5a9

  • TODO: add unit tests for group manager changes
  • TODO: add unit tests for sharing code changes
  • TODO: integration tests for deleted group cases with existing shares

@PVince81
Copy link
Contributor Author

Just had a quick test with local DB users and shares: when deleting a group in the users admin page, it automatically deletes all matching shares.

But I think in the case of LDAP the group deletion detection happens with a delay, so it makes sense to harden the sharing code to avoid issues when bumping into shares pointing to deleted groups.
I hope we can find a way to automatically test this with LDAP @davitol

@PVince81
Copy link
Contributor Author

Apparently for LDAP without caching, deleting the group in LDAP has a direct effect in OC and any affected shares disappear in the file listing. However they keep the oc_share entries. This makes the hardening above even more important.
Added test cases here owncloud/user_ldap#35 (comment)

@PVince81
Copy link
Contributor Author

Ewww, with redis memcaching enabled (and on this branch) if I delete a group in LDAP, it is still cached. That's ok. The shared mount still appears but the OCS Share API doesn't return the share anymore. So possibly it's reading from a different cache ?!

@PVince81
Copy link
Contributor Author

I debugged into it and the OCS Share API uses getUserGroups which still returns the deleted group as it was cached. Then later it calls inGroup() which itself has its own cache. But that cache was not populated yet, so it grabbed the value directly from LDAP after the deletion, so it returns false.
So this is the cache discrepancy I described in owncloud/user_ldap#30.

The only thing we can do here is to be prepared to receive inconsistent results. This is what this PR is about.

Ok, then let's continue with the unit tests...

@PVince81
Copy link
Contributor Author

integration tests for deleted group cases with existing shares

@SergioBertolinSG do we have integration tests for shares related to deleted groups with the database backend ? (LDAP cases will have to be done in the LDAP app as stated above)

@PVince81
Copy link
Contributor Author

@jvillafanez @butonic @PhilippSchaffrath I need a new review, more code was added.

@SergioBertolinSG
Copy link
Contributor

@SergioBertolinSG do we have integration tests for shares related to deleted groups with the database backend ? (LDAP cases will have to be done in the LDAP app as stated above)

No, I don't think so. We have one about orphaned shares, this case is be something similar?


// such issue can usually happen when dealing with
// null groups which usually appear with group backend
// caching inconsistencies
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather log something. It will be easier to debug possible errors or to know what is being ignored.

$share->method('getSharedWith')->willReturn('group2');
$this->assertFalse($this->invokePrivate($this->ocs, 'canAccessShare', [$share]));

$this->groupManager->method('get')->with('group2')->willReturn($group);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some adjustments are needed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, forgot to use "groupnull"

->with($path)
->willReturn([$share2]);

$this->invokePrivate($this->manager, 'userCreateChecks', [$share]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no assertions in this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No assertion possible here. This is similar to other tests.
Basically we run the method and no exception is thrown.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking sebastianbergmann/phpunit-documentation#171 (comment)

A $this->assertNull($this->invokePrivate(....)) might be enough in this case. It's a kind of dummy check that fills the requirements.

@PVince81
Copy link
Contributor Author

No, I don't think so. We have one about orphaned shares, this case is be something similar?

@SergioBertolinSG similar, but not the same. This is about shares that point to deleted groups. Apparently only reproducible with LDAP + caching so far. Would still be good to have tests that confirm that after deleting a group, the matching shares do not appear neither in the PROPFIND nor in the "shared with you" section (OCS Share API query).

@PVince81
Copy link
Contributor Author

@jvillafanez adjusted here c1eea58

Vincent Petry added 8 commits January 17, 2017 09:17
In some cases, data is inconsistent in the oc_share table due to legacy
data. The mount provider might attempt to make it consistent but if the
target group does not exist any more it cannot work. In such case we
simply ignore the exception as it is not critical. Keeping the
exception would break user accounts as they would be unable to use
their filesystem.
@PVince81 PVince81 force-pushed the stable9.1-skipnullgroups branch from c1eea58 to 56ca62a Compare January 17, 2017 08:17
@PVince81
Copy link
Contributor Author

@jvillafanez added assertNull() as requested.

@jvillafanez
Copy link
Member

👍

@PVince81 PVince81 merged commit fe939ed into stable9.1 Jan 17, 2017
@PVince81 PVince81 deleted the stable9.1-skipnullgroups branch January 17, 2017 16:03
PVince81 pushed a commit that referenced this pull request Jan 17, 2017
* Skip null groups in group manager

* Also skip null groups in group manager's search function

* Add more group null checks in sharing code

* Add unit tests for null group safety in group manager

* Add unit tests for sharing code null group checks

* Added tests for null groups handling in sharing code

* Ignore moveShare optional repair in mount provider

In some cases, data is inconsistent in the oc_share table due to legacy
data. The mount provider might attempt to make it consistent but if the
target group does not exist any more it cannot work. In such case we
simply ignore the exception as it is not critical. Keeping the
exception would break user accounts as they would be unable to use
their filesystem.

* Adjust null group handing + tests
@PVince81
Copy link
Contributor Author

Forward port to master: #26956

PVince81 pushed a commit that referenced this pull request Jan 19, 2017
* Skip null groups in group manager (#26871)

* Skip null groups in group manager

* Also skip null groups in group manager's search function

* Add more group null checks in sharing code

* Add unit tests for null group safety in group manager

* Add unit tests for sharing code null group checks

* Added tests for null groups handling in sharing code

* Ignore moveShare optional repair in mount provider

In some cases, data is inconsistent in the oc_share table due to legacy
data. The mount provider might attempt to make it consistent but if the
target group does not exist any more it cannot work. In such case we
simply ignore the exception as it is not critical. Keeping the
exception would break user accounts as they would be unable to use
their filesystem.

* Adjust null group handing + tests

* Fix new group manager tests
@lock
Copy link

lock bot commented Aug 4, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Aug 4, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants