
Investigate python hash libraries #20674

@damccorm

stats.ApproximateUnique has an optional mmh3 dependency [1] (mmh3 is roughly 9x faster than md5), but if that repository is problematic for users, we should look into alternatives.

Other options: sklearn.utils.murmurhash3_32

  [1] https://github.com/hajimes/mmh3, https://pypi.org/project/mmh3/2.0/
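
For reference, a minimal sketch of the optional-dependency pattern described above: use mmh3 when it is installed, otherwise fall back to md5 from the standard library. This is not the actual ApproximateUnique code; the helper name `hash_to_uint32` and the choice to truncate both hashes to 32 bits are assumptions made for illustration.

```python
import hashlib

try:
    import mmh3  # optional third-party dependency; may not be installed
except ImportError:
    mmh3 = None


def hash_to_uint32(value: bytes) -> int:
    """Hypothetical helper: hash `value` to an unsigned 32-bit integer."""
    if mmh3 is not None:
        # mmh3.hash returns a signed 32-bit int by default; mask it to unsigned.
        return mmh3.hash(value) & 0xFFFFFFFF
    # Fallback: take the first 4 bytes of the md5 digest (slower, but stdlib-only).
    return int.from_bytes(hashlib.md5(value).digest()[:4], "big")


if __name__ == "__main__":
    backend = "mmh3" if mmh3 is not None else "md5"
    print(backend, hash_to_uint32(b"beam"))
```

The two backends produce different values for the same input, so results hashed with one backend are not comparable with results hashed with the other. Any replacement library (for example sklearn.utils.murmurhash3_32) would need the same consideration.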

 

cc: [~tvalentyn]

Imported from Jira BEAM-10920. Original Jira may contain additional context.
Reported by: monicadsong.
