This repository was archived by the owner on Jul 4, 2023. It is now read-only.
Python 3.5 Support, Sampler Pipelining, Finer Control of Random State, New Corporate Sponsor
Major Updates
- Updated my README emoji game to be more ambiguous while maintaining a fun and heartwarming vibe. 🐕
- Support for Python 3.5
- Extensive rewrite of the README to focus on new users and building an NLP pipeline.
- Support for PyTorch 1.2
- Added `torchnlp.random` for finer-grained control of random state, building on PyTorch's `fork_rng`. This module controls the random state of `torch`, `numpy`, and `random`.
```python
import random

import numpy
import torch

from torchnlp.random import fork_rng

with fork_rng(seed=123):  # Ensure determinism
    print('Random:', random.randint(1, 2**31))
    print('Numpy:', numpy.random.randint(1, 2**31))
    print('Torch:', int(torch.randint(1, 2**31, (1,))))
```

- Refactored `torchnlp.samplers` to enable pipelining. For example:
```python
from torchnlp.samplers import DeterministicSampler
from torchnlp.samplers import BalancedSampler

data = ['a', 'b', 'c'] + ['c'] * 100
sampler = BalancedSampler(data, num_samples=3)
sampler = DeterministicSampler(sampler, random_seed=12)
print([data[i] for i in sampler])  # ['c', 'b', 'a']
```

- Added `torchnlp.samplers.balanced_sampler` for balanced sampling, extending PyTorch's `WeightedRandomSampler`.
- Added `torchnlp.samplers.deterministic_sampler` for deterministic sampling based on `torchnlp.random`.
- Added `torchnlp.samplers.distributed_batch_sampler` for distributed batch sampling.
- Added `torchnlp.samplers.oom_batch_sampler` to sample large batches first in order to force an out-of-memory error early.
- Added `torchnlp.utils.lengths_to_mask` to help create masks from a batch of sequences.
- Added `torchnlp.utils.get_total_parameters` to measure the number of parameters in a model.
- Added `torchnlp.utils.get_tensors` to measure the size of an object in number of tensor elements. This is useful for dynamic batch sizing and for `torchnlp.samplers.oom_batch_sampler`.
```python
import torch

from torchnlp.utils import get_tensors

random_object_ = tuple([{'t': torch.tensor([1, 2])}, torch.tensor([2, 3])])
tensors = get_tensors(random_object_)
assert len(tensors) == 2
```

- Added a corporate sponsor to the library: https://wellsaidlabs.com/
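The sampler-pipelining idea above can be sketched in plain Python without the library. This is a minimal illustration, not the library's implementation: `SeededSampler` and `ShuffleSampler` are hypothetical names, and only Python's built-in `random` state is handled (the real `DeterministicSampler` also covers `torch` and `numpy`).

```python
import random

class ShuffleSampler:
    """Hypothetical inner sampler: yields dataset indices in random order."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        indices = list(range(len(self.data)))
        random.shuffle(indices)
        return iter(indices)

    def __len__(self):
        return len(self.data)

class SeededSampler:
    """Hypothetical wrapper showing the pipelining pattern: wrap any
    iterable sampler and fix the random state while iterating over it."""

    def __init__(self, sampler, seed):
        self.sampler = sampler
        self.seed = seed

    def __iter__(self):
        state = random.getstate()  # Save the caller's random state
        random.seed(self.seed)     # Make the wrapped sampler deterministic
        try:
            yield from self.sampler
        finally:
            random.setstate(state)  # Restore the caller's random state

    def __len__(self):
        return len(self.sampler)

data = ['a', 'b', 'c', 'd']
sampler = ShuffleSampler(data)
sampler = SeededSampler(sampler, seed=12)  # Pipeline: wrap one sampler in another
first = [data[i] for i in sampler]
second = [data[i] for i in sampler]
assert first == second  # Same seed, so the same order on every pass
```

Because each wrapper only consumes an iterable and exposes `__iter__`/`__len__`, samplers compose freely, which is the point of the refactor.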
Minor Updates
- Fixed the `snli` example. (#84)
- Updated `.gitignore` to support Python virtual environments. (#84)
- Removed the `requests` and `pandas` dependencies. Only two dependencies remain, which is useful for production environments. (#84)
- Added `LazyLoader` to reduce dependency requirements. (4e84780)
- Removed the unused `torchnlp.datasets.Dataset` class in favor of basic Python dictionary lists and `pandas`. (#84)
- Support for downloading `tar.gz` files and unpacking them faster. (eb61fee)
- Renamed `itos` and `stoi` to `index_to_token` and `token_to_index`, respectively. (#84)
- Fixed `batch_encode`, `batch_decode`, and `enforce_reversible` for `torchnlp.encoders.text`. (#69)
- Fixed `FastText` vector downloads. (#72)
- Fixed documentation for `LockedDropout`. (#73)
- Fixed a bug in `weight_drop`. (#76)
- `stack_and_pad_tensors` now returns a named tuple for readability. (#84)
- Added `torchnlp.utils.split_list` in favor of `torchnlp.utils.resplit_datasets`. This is enabled by the modularity of `torchnlp.random`. (#84)
- Deprecated `torchnlp.utils.datasets_iterator` in favor of Python's `itertools.chain`. (#84)
- Deprecated `torchnlp.utils.shuffle` in favor of `torchnlp.random`. (#84)
- Support for encoding larger datasets, following the fix for this issue. (#85)
- Added `torchnlp.samplers.repeat_sampler`, following up on this issue: pytorch/pytorch#15849
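For the deprecated `datasets_iterator`, the stdlib replacement is straightforward. A minimal sketch, with hypothetical in-memory `train`/`dev` lists standing in for dataset objects:

```python
from itertools import chain

# Hypothetical datasets: plain lists of example dictionaries.
train = [{'text': 'a'}, {'text': 'b'}]
dev = [{'text': 'c'}]

# itertools.chain lazily iterates over each dataset in turn.
rows = list(chain(train, dev))
assert [row['text'] for row in rows] == ['a', 'b', 'c']
```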