Merged
Add note about Hasher in README.
* Adds a section about another Bloom filter approach, using only a single hashing function with Swift 4.2's `Hasher`.
* Adds links to some more documentation and a blog post implementing the Bloom filter in this way.
* Adds my name as updater.
SpacyRicochet committed Oct 4, 2018
commit 849309c5eb5d2ef29918940536cc1b011b665b3c
27 changes: 26 additions & 1 deletion Bloom Filter/README.markdown
@@ -100,4 +100,29 @@ public func query(_ value: T) -> Bool {

If you're coming from another imperative language, you might notice the unusual syntax in the `exists` assignment. Swift makes use of functional paradigms when they make code more concise and readable, and in this case `reduce` is a much more concise way to check whether all the required bits are `true` than a `for` loop.

*Written for Swift Algorithm Club by Jamil Dhanani. Edited by Matthijs Hollemans.*
## Another approach

Another way to create different hashes of an element for use in the Bloom filter is to use the same hash function for every iteration, but combine it with different random numbers. This can help because finding good hashing functions is hard, although combining hashes well is equally non-trivial.
Member

This part is unclear to me.

My understanding of hash functions is that they should be deterministic -- "hello world!" should hash to the same value during the lifetime of a single program.

If you combine the result of the hash function with different random numbers, wouldn't that just be akin to producing random numbers?

Contributor Author

For the Bloom filter, the actual hash isn't important, as long as it is consistent and a proper hash. So if you guarantee that you acquire the hash for the same object in the same way during the lifetime of the Bloom filter, you're fine.

This approach would create random numbers at the initialisation of the Bloom filter, and use them consistently as modifiers on an object's hash, effectively creating a new hash function per random number. So the hash would still be the same for object A during the lifetime of that Bloom filter.

Make sense? The linked blog post about it is probably clearer.

Member

Ah, I see what you're getting at. I read this as "every time you use the hash function, combine it with a random number."

I propose we change this paragraph to the following:

In the previous section, you learnt about how using multiple different hash functions can help reduce the chance of collisions in the bloom filter. However, good hash functions are difficult to design. A simple alternative to multiple hash functions is to use a set of random numbers.

As an example, let's say a bloom filter wants to hash each element 15 times during insertion. Instead of using 15 different hash functions, you can rely on just one hash function. The hash value can then be offset by 15 different values to form the indices for flipping. This bloom filter would initialize a set of 15 random numbers ahead of time and use these values during each insertion.

Let me know if that sounds fine with you. If this is okay, I'll go ahead and make the change to merge it in.

Contributor Author

Updated the README.


```
hash("Hello world!") >> hash(987654321) // would flip bit 8
hash("Hello world!") >> hash(123456789) // would flip bit 2
```

Since Swift 4.2, the standard library includes `Hasher`, which is designed to reduce multiple hashes to a single hash in an efficient manner. This makes combining the hashes trivial.

```
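private func computeHashes(_ value: T) -> [Int] {
  return randomSeeds.map { seed in
    // `combine` is mutating, so the hasher must be declared with `var`.
    var hasher = Hasher()
    hasher.combine(seed)
    hasher.combine(value)
    let hashValue = hasher.finalize()
    // The remainder may be negative; `abs` maps it to a valid index.
    return abs(hashValue % array.count)
  }
}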
```
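
For context, here is a minimal, self-contained sketch of how a filter built around this idea might look. The names `SeededBloomFilter`, `seeds`, `bits` and `indices(for:)` are illustrative assumptions and are not taken from the README's implementation; only `Hasher`, `Int.random(in:)` and `allSatisfy` come from the Swift 4.2 standard library.

```swift
// Illustrative sketch only: type and property names are assumptions,
// not the README's actual implementation.
struct SeededBloomFilter<T: Hashable> {
    private var bits: [Bool]
    private let seeds: [Int]

    init(size: Int, hashCount: Int) {
        precondition(size > 0 && hashCount > 0)
        bits = Array(repeating: false, count: size)
        // The seeds are drawn once, so an element always maps to the
        // same indices for the lifetime of this filter.
        seeds = (0..<hashCount).map { _ in Int.random(in: Int.min...Int.max) }
    }

    // One index per seed: each seed acts like a separate hash function.
    private func indices(for value: T) -> [Int] {
        return seeds.map { seed in
            var hasher = Hasher()
            hasher.combine(seed)
            hasher.combine(value)
            return abs(hasher.finalize() % bits.count)
        }
    }

    mutating func insert(_ value: T) {
        for index in indices(for: value) {
            bits[index] = true
        }
    }

    // `false` means definitely absent; `true` means possibly present.
    func query(_ value: T) -> Bool {
        return indices(for: value).allSatisfy { bits[$0] }
    }
}

var filter = SeededBloomFilter<String>(size: 1024, hashCount: 3)
filter.insert("Hello world!")
print(filter.query("Hello world!"))  // true: an inserted element is always found
```

Because `Hasher` is seeded randomly per process, the computed indices are only stable within a single run, so a filter like this should not be persisted and reloaded.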

If you want to learn more about this approach, you can read the [Hasher documentation](https://developer.apple.com/documentation/swift/hasher) or Soroush Khanlou's [Swift 4.2 Bloom filter](http://khanlou.com/2018/09/bloom-filters/) implementation.

*Written for Swift Algorithm Club by Jamil Dhanani. Edited by Matthijs Hollemans. Updated by Bruno Scheele.*