Skip to content
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 3 additions & 6 deletions Bloom Filter/BloomFilter.playground/Contents.swift
Original file line number Diff line number Diff line change
@@ -1,8 +1,5 @@
//: Playground - noun: a place where people can play
// last checked with Xcode 9.0b4
#if swift(>=4.0)
print("Hello, Swift 4!")
#endif

public class BloomFilter<T> {
fileprivate var array: [Bool]
private var hashFunctions: [(T) -> Int]
Expand Down Expand Up @@ -54,15 +51,15 @@ public class BloomFilter<T> {

func djb2(x: String) -> Int {
var hash = 5381
for char in x.characters {
for char in x {
hash = ((hash << 5) &+ hash) &+ char.hashValue
}
return Int(hash)
}

func sdbm(x: String) -> Int {
var hash = 0
for char in x.characters {
for char in x {
hash = char.hashValue &+ (hash << 6) &+ (hash << 16) &- hash
}
return Int(hash)
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>IDEDidComputeMac32BitWarning</key>
<true/>
</dict>
</plist>
6 changes: 0 additions & 6 deletions Bloom Filter/BloomFilter.playground/timeline.xctimeline

This file was deleted.

27 changes: 26 additions & 1 deletion Bloom Filter/README.markdown
Original file line number Diff line number Diff line change
Expand Up @@ -100,4 +100,29 @@ public func query(_ value: T) -> Bool {

If you're coming from another imperative language, you might notice the unusual syntax in the `exists` assignment. Swift makes use of functional paradigms when it makes code more consise and readable, and in this case `reduce` is a much more consise way to check if all the required bits are `true` than a `for` loop.

*Written for Swift Algorithm Club by Jamil Dhanani. Edited by Matthijs Hollemans.*
## Another approach

Another approach to create different hashes of an element for use in the Bloom filter, is to use the same hash function for every iteration, but combine it with different random numbers. This can help, because finding good hashing functions is hard, but combining them is equally non-trivial.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is unclear for me.

My understanding of hash functions is that it should be deterministic -- "hello world!" should hash to the same value during the lifetime of a single program.

If you combine the result of the hash function with different random numbers, wouldn't that just be akin to producing random numbers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the Bloom filter, the actual hash isn't important, as long as it is consistent and a proper hash. So if you guarantee that you acquire the hash for the same object in the same way during the lifetime of the Bloom filter, you're fine.

This approach would create random numbers at the initialisation of the Bloom filter, and use them consistently as modifiers on an object's hash, effectively creating a new hash function per random number. So the hash would still be same for object A during the lifetime of that Bloom filter.

Make sense? The linked blog post about it is probably clearer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what you're getting at. I read this as "everything time you use the hash function, combine it with a random number."

I propose we change this paragraph to the following:

In the previous section, you learnt about how using multiple different hash functions can help reduce the chance of collisions in the bloom filter. However, good hash functions are difficult to design. A simple alternative to multiple hash functions is to use a set of random numbers.

As an example, let's say a bloom filter wants to hash each element 15 times during insertion. Instead of using 15 different hash functions, you can rely on just one hash function. The hash value can then be offset by 15 different values to form the indices for flipping. This bloom filter would initialize a set of 15 random numbers ahead of time and use these values during each insertion.

Let me know if that sounds fine with you. If this is okay, I'll go ahead and make the change to merge it in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the README.


```
hash("Hello world!") >> hash(987654321) // would flip bit 8
hash("Hello world!") >> hash(123456789) // would flip bit 2
```

Since Swift 4.2, `Hasher` is now included in the Standard library, which is designed to reduce multiple hashes to a single hash in an efficient manner. This makes combining the hashes trivial.

```
private func computeHashes(_ value: T) -> [Int] {
return randomSeeds.map() { seed in
let hasher = Hasher()
hasher.combine(seed)
hasher.combine(value)
let hashValue = hasher.finalize()
return abs(hashValue % array.count)
}
}
```

If you want to learn more about this approach, you can read about the [Hasher documentation](https://developer.apple.com/documentation/swift/hasher) or Soroush Khanlou's [Swift 4.2 Bloom filter](http://khanlou.com/2018/09/bloom-filters/) implementation.

*Written for Swift Algorithm Club by Jamil Dhanani. Edited by Matthijs Hollemans. Updated by Bruno Scheele.*
4 changes: 2 additions & 2 deletions Bloom Filter/Tests/BloomFilterTests.swift
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ import XCTest
func djb2(_ x: String) -> Int {
var hash = 5381

for char in x.characters {
for char in x {
hash = ((hash << 5) &+ hash) &+ char.hashValue
}

Expand All @@ -16,7 +16,7 @@ func djb2(_ x: String) -> Int {
func sdbm(_ x: String) -> Int {
var hash = 0

for char in x.characters {
for char in x {
hash = char.hashValue &+ (hash << 6) &+ (hash << 16) &- hash
}

Expand Down
24 changes: 20 additions & 4 deletions Bloom Filter/Tests/Tests.xcodeproj/project.pbxproj
Original file line number Diff line number Diff line change
Expand Up @@ -83,12 +83,12 @@
isa = PBXProject;
attributes = {
LastSwiftUpdateCheck = 0720;
LastUpgradeCheck = 0800;
LastUpgradeCheck = 1000;
ORGANIZATIONNAME = "Swift Algorithm Club";
TargetAttributes = {
7B2BBC7F1C779D720067B71D = {
CreatedOnToolsVersion = 7.2;
LastSwiftMigration = 0800;
LastSwiftMigration = 1000;
};
};
};
Expand Down Expand Up @@ -141,14 +141,22 @@
CLANG_CXX_LIBRARY = "libc++";
CLANG_ENABLE_MODULES = YES;
CLANG_ENABLE_OBJC_ARC = YES;
CLANG_WARN_BLOCK_CAPTURE_AUTORELEASING = YES;
CLANG_WARN_BOOL_CONVERSION = YES;
CLANG_WARN_COMMA = YES;
CLANG_WARN_CONSTANT_CONVERSION = YES;
CLANG_WARN_DEPRECATED_OBJC_IMPLEMENTATIONS = YES;
CLANG_WARN_DIRECT_OBJC_ISA_USAGE = YES_ERROR;
CLANG_WARN_EMPTY_BODY = YES;
CLANG_WARN_ENUM_CONVERSION = YES;
CLANG_WARN_INFINITE_RECURSION = YES;
CLANG_WARN_INT_CONVERSION = YES;
CLANG_WARN_NON_LITERAL_NULL_CONVERSION = YES;
CLANG_WARN_OBJC_IMPLICIT_RETAIN_SELF = YES;
CLANG_WARN_OBJC_LITERAL_CONVERSION = YES;
CLANG_WARN_OBJC_ROOT_CLASS = YES_ERROR;
CLANG_WARN_RANGE_LOOP_ANALYSIS = YES;
CLANG_WARN_STRICT_PROTOTYPES = YES;
CLANG_WARN_SUSPICIOUS_MOVE = YES;
CLANG_WARN_UNREACHABLE_CODE = YES;
CLANG_WARN__DUPLICATE_METHOD_MATCH = YES;
Expand Down Expand Up @@ -188,14 +196,22 @@
CLANG_CXX_LIBRARY = "libc++";
CLANG_ENABLE_MODULES = YES;
CLANG_ENABLE_OBJC_ARC = YES;
CLANG_WARN_BLOCK_CAPTURE_AUTORELEASING = YES;
CLANG_WARN_BOOL_CONVERSION = YES;
CLANG_WARN_COMMA = YES;
CLANG_WARN_CONSTANT_CONVERSION = YES;
CLANG_WARN_DEPRECATED_OBJC_IMPLEMENTATIONS = YES;
CLANG_WARN_DIRECT_OBJC_ISA_USAGE = YES_ERROR;
CLANG_WARN_EMPTY_BODY = YES;
CLANG_WARN_ENUM_CONVERSION = YES;
CLANG_WARN_INFINITE_RECURSION = YES;
CLANG_WARN_INT_CONVERSION = YES;
CLANG_WARN_NON_LITERAL_NULL_CONVERSION = YES;
CLANG_WARN_OBJC_IMPLICIT_RETAIN_SELF = YES;
CLANG_WARN_OBJC_LITERAL_CONVERSION = YES;
CLANG_WARN_OBJC_ROOT_CLASS = YES_ERROR;
CLANG_WARN_RANGE_LOOP_ANALYSIS = YES;
CLANG_WARN_STRICT_PROTOTYPES = YES;
CLANG_WARN_SUSPICIOUS_MOVE = YES;
CLANG_WARN_UNREACHABLE_CODE = YES;
CLANG_WARN__DUPLICATE_METHOD_MATCH = YES;
Expand Down Expand Up @@ -228,7 +244,7 @@
LD_RUNPATH_SEARCH_PATHS = "$(inherited) @executable_path/../Frameworks @loader_path/../Frameworks";
PRODUCT_BUNDLE_IDENTIFIER = swift.algorithm.club.Tests;
PRODUCT_NAME = "$(TARGET_NAME)";
SWIFT_VERSION = 4.0;
SWIFT_VERSION = 4.2;
};
name = Debug;
};
Expand All @@ -240,7 +256,7 @@
LD_RUNPATH_SEARCH_PATHS = "$(inherited) @executable_path/../Frameworks @loader_path/../Frameworks";
PRODUCT_BUNDLE_IDENTIFIER = swift.algorithm.club.Tests;
PRODUCT_NAME = "$(TARGET_NAME)";
SWIFT_VERSION = 4.0;
SWIFT_VERSION = 4.2;
};
name = Release;
};
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>IDEDidComputeMac32BitWarning</key>
<true/>
</dict>
</plist>
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
<?xml version="1.0" encoding="UTF-8"?>
<Scheme
LastUpgradeVersion = "0800"
LastUpgradeVersion = "1000"
version = "1.3">
<BuildAction
parallelizeBuildables = "YES"
Expand Down