Skip to content

Conversation

@danieldk
Copy link
Owner

Transitions is a wrapper of Numberer that has several additional
properties that are useful for transition tables.

  • It insures that the identifier 0 for unknown transitions.
  • It returns the correct length of the transition table, that includes
    the special identifier 0.
  • The table can be both fresh and frozen. A fresh table automatically
    adds transitions that are not known. A frozen table returns the
    special identifier 0 when a transition is now known.

For future consideration: provide a thaw method as well?

Copy link
Collaborator

@twuebi twuebi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the solution with the two variants to avoid the indirection through the box at inference time. I am wondering though why it is a specific struct for transitions. In features::lookup there exist similar structs that could be merged with this implementation. If I didn't miss something Transitions is already generic over the same traits Numberer constrains and does not limit itself to structs that implement the Transition trait.

For future consideration: provide a thaw method as well?

I'd say as long as this struct is used for Transitions only it is probably unproblematic to use one-way freeze

/// the table becomes immutable. Lookups of transitions that are not in the
/// table will result in a special index (`0`).
#[derive(Deserialize, Eq, PartialEq)]
pub enum Transitions<T>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TransitionLookupTable might be a more descriptive name, with Transitions::xxx I'd expect transitions and not a lookup-table. (TransitionTable is unfortunately ambiguous too)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll do a big rename tomorrow. Thanks for the review!

@danieldk
Copy link
Owner Author

danieldk commented Oct 25, 2018

I am wondering though why it is a specific struct for transitions. In features::lookup there exist similar structs that could be merged with this implementation.

I agree. I considered doing that. However, table lookups from lookup currently use a different procedure for serialization (that I am not happy with anymore). They can be merged, but that entails rewriting the deserialization/serialization code (stored_table) first. And that is kind of a detour from the goal of these sequence of PRs to move training from Python to Rust. But it's definitely a very valid point. Numberer may not even be necessary anymore if every use of it uses 0 as a special value and starts counting at 1.

I can create an issue for merging Transitions/LookupTable/MutableLookupTable, so that we don't forget.

TransitionLookup is a wrapper of Numberer that has several additional
properties that are useful for transition tables.

- It insures that the identifier 0 for unknown transitions.
- It returns the correct length of the transition table, that includes
  the special identifier 0.
- The table can be both fresh and frozen. A fresh table automatically
  adds transitions that are not known. A frozen table returns the
  special identifier 0 when a transition is now known.

For future consideration: provide a thaw method as well?
@danieldk
Copy link
Owner Author

Changes:

  • I renamed Transitions to TransitionLookup.
  • I have implemented PartialEq then using derive, since we want fresh/frozen lookups with the same transitions to compare equal.
  • Added a unit test with a serialization/deserialization roundtrip.

Copy link
Collaborator

@twuebi twuebi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comment

/// If the the transition is not in the table, it is added for a
/// fresh table. Frozen tables will return the special identifier
/// `0` in such cases.
pub fn lookup(&self, t: T) -> usize {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ownership of t is only required in case of a fresh lookup table where t is added to the numberer. Again, when dealing with Transition Enums probably not really a problem. If we should generalize the struct later on to e.g. number strings we might consider adding to_owned or clone to the trait bounds so that we can pass in a &T and only convert it when necessary.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we should generalize the struct later on to e.g. number strings we might consider adding to_owned or clone to the trait bounds so that we can pass in a &T and only convert it when necessary.

I agree that this would be nice. IIRC I decided use moves in Numberer::add to avoid two lookups when the value is absent and HashMaps entry method requires an owned value. But since in transition and feature lookups, the lookups succeed in the vast majority of cases, doing an get (with a reference) + insert when absent would be more economical. There was an RFC to make entry work with borrowed values, but it seems to be stranded:

rust-lang/rfcs#1769

I'll add this as a todo item to the 'one lookup to rule them all'-issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also consider moving the numberer / lookup into a separate crate.

For the lemmatizer, I had to implement the same functionality and dealt with similar issues. If there'd be a crate we might spare some people a lot of unnecessary worries and work.

@danieldk danieldk merged commit 37e5d28 into master Oct 26, 2018
@danieldk danieldk deleted the transitions branch October 26, 2018 09:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants