@SpacyRicochet SpacyRicochet commented Oct 4, 2018

Checklist

Description

* Remove the top code snippet, as per instructions.
* Remove use of the deprecated `characters` property.
* Add a section about another Bloom filter approach, using only a single hashing function with Swift 4.2's `Hasher`.
* Add links to some more documentation and a blog post implementing the Bloom filter in this way.
* Add my name as updater.
@SpacyRicochet SpacyRicochet changed the title [WIP] [Swift 4.2] Bloom Filter [Swift 4.2] Bloom Filter Oct 4, 2018
@SpacyRicochet
Contributor Author

References #748.

*Written for Swift Algorithm Club by Jamil Dhanani. Edited by Matthijs Hollemans.*
## Another approach

Another approach to create different hashes of an element for use in the Bloom filter, is to use the same hash function for every iteration, but combine it with different random numbers. This can help, because finding good hashing functions is hard, but combining them is equally non-trivial.
Member

This part is unclear to me.

My understanding of hash functions is that they should be deterministic -- "hello world!" should hash to the same value during the lifetime of a single program.

If you combine the result of the hash function with different random numbers, wouldn't that just be akin to producing random numbers?

Contributor Author

For the Bloom filter, the actual hash isn't important, as long as it is consistent and a proper hash. So if you guarantee that you acquire the hash for the same object in the same way during the lifetime of the Bloom filter, you're fine.

This approach would create random numbers when the Bloom filter is initialised and use them consistently as modifiers on an object's hash, effectively creating a new hash function per random number. So the hash would still be the same for object A during the lifetime of that Bloom filter.
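
To make that concrete, here is a minimal sketch of the idea, assuming Swift 4.2's `Hasher`. The type `SeededBloomFilter` and its member names are hypothetical, chosen for illustration, and are not taken from the README or the linked blog post:

```swift
// Hypothetical sketch: names are illustrative, not the README's implementation.
struct SeededBloomFilter<Element: Hashable> {
    private var bits: [Bool]
    // Random numbers drawn once at initialisation and reused for every element.
    private let seeds: [Int]

    init(size: Int, hashCount: Int) {
        precondition(size > 0 && hashCount > 0)
        bits = Array(repeating: false, count: size)
        seeds = (0..<hashCount).map { _ in Int.random(in: 0..<Int.max) }
    }

    // One index per seed: the same element and seed always give the same index
    // for the lifetime of this filter.
    private func indices(for element: Element) -> [Int] {
        return seeds.map { seed -> Int in
            var hasher = Hasher()
            hasher.combine(element)
            hasher.combine(seed)
            // finalize() may return a negative value; map it to a valid index.
            let hash = hasher.finalize()
            return Int(UInt(bitPattern: hash) % UInt(bits.count))
        }
    }

    mutating func insert(_ element: Element) {
        for index in indices(for: element) {
            bits[index] = true
        }
    }

    // false means "definitely not inserted"; true means "possibly inserted".
    func mightContain(_ element: Element) -> Bool {
        return indices(for: element).allSatisfy { bits[$0] }
    }
}
```

Because the seeds are fixed once the filter exists, an inserted element always maps back to the same bits, so a lookup for it cannot produce a false negative.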

Make sense? The linked blog post about it is probably clearer.

Member

Ah, I see what you're getting at. I read this as "every time you use the hash function, combine it with a random number."

I propose we change this paragraph to the following:

> In the previous section, you learnt how using multiple different hash functions can help reduce the chance of collisions in the Bloom filter. However, good hash functions are difficult to design. A simple alternative to multiple hash functions is to use a set of random numbers.
>
> As an example, let's say a Bloom filter wants to hash each element 15 times during insertion. Instead of using 15 different hash functions, you can rely on just one hash function. The hash value can then be offset by 15 different values to form the indices for flipping. This Bloom filter would initialize a set of 15 random numbers ahead of time and use these values during each insertion.
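
As a rough illustration of the offset idea in that paragraph, the index computation could look like the sketch below. The function `bitIndices(hash:offsets:size:)` is a hypothetical helper, and adding the offset before taking the remainder is an assumption about what "offset" means here:

```swift
// Hypothetical helper: derives one bit index per pre-generated offset.
func bitIndices(hash: Int, offsets: [Int], size: Int) -> [Int] {
    precondition(size > 0)
    return offsets.map { offset -> Int in
        // &+ wraps on overflow instead of trapping.
        let shifted = UInt(bitPattern: hash &+ offset)
        return Int(shifted % UInt(size))
    }
}

// The 15 offsets are generated once and reused for every insertion and lookup.
let offsets = (0..<15).map { _ in Int.random(in: 0..<Int.max) }
let indices = bitIndices(hash: "hello world!".hashValue, offsets: offsets, size: 1024)
```

One caveat with purely additive offsets: two elements whose hashes agree modulo the array size would generally share all 15 indices, so feeding each random number into the hasher together with the element is likely to spread the bits more evenly.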

Let me know if that sounds fine with you. If this is okay, I'll go ahead and make the change to merge it in.

Contributor Author

Updated the README.

@kelvinlauKL
Member

Thanks @SpacyRicochet!

@kelvinlauKL kelvinlauKL merged commit 0e4bd5f into kodecocodes:master Nov 11, 2018