Conversation

@Swoorup
Contributor

@Swoorup Swoorup commented Dec 6, 2020

As requested by @dsyme on twitter.

This adds an RFC for erased union types. First time here writing an RFC so might not have fully thought things through.

The suggestion: fsharp/fslang-suggestions#538 probably needs to be approved first. If everything is fine, happy to continue with the actual implementation.

Rendered

@Swoorup Swoorup changed the title Add RFC for erased union types Add RFC-1092: Support for erased union types Dec 6, 2020
@cartermp cartermp requested a review from dsyme December 9, 2020 01:58

Supporting erased union types in the language allows us to move more type information with the usual advantages this brings:

* They serve as an alternative to function overloading.
Member

Note that functions can't be overloaded in F#. I think you mean that they add a kind of "function overloading" that you can't accomplish today?

Contributor

@isaacabraham Jan 19, 2021

Or to method overloading. This is actually a common use case we see more and more these days - creating a static class with static members for overloading.
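
As a point of reference, the static-class pattern mentioned above might look like the following sketch in today's F#. `Point`, `Location`, and `Geo` are hypothetical names chosen for illustration; they do not come from the RFC.

```fsharp
// Hypothetical types, for illustration only.
type Point = { X: float; Y: float }
type Location = { Lat: float; Lon: float }

// Let-bound functions cannot be overloaded in F#, so the overloads
// are written as static members on a type instead.
type Geo =
    static member Describe(p: Point) : string =
        sprintf "point (%f, %f)" p.X p.Y
    static member Describe(l: Location) : string =
        sprintf "location (%f, %f)" l.Lat l.Lon

let point : Point = { X = 1.0; Y = 2.0 }
let location : Location = { Lat = 3.0; Lon = 4.0 }

// The compiler picks the overload from the static argument type.
let describedPoint = Geo.Describe(point)
let describedLocation = Geo.Describe(location)
```

Each overload here has its own body, which is the duplication that a single function over `(Point | Location)` is meant to avoid.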


* They serve as an alternative to function overloading.
* They obey subtyping rules.
* They allow representing subset of protocols as a type without needing to resort to the lowest common denominator like `obj`.
Member

DUs allow this today, too. What is the benefit of erased/ad-hoc unions as opposed to existing ones for this point? Perhaps this is worth expounding upon in an example later on.

Contributor

Am I allowed to add examples? I didn't want to step on anyone's toes.

Contributor

Either add a comment with your examples, or send a pull request to the pull request branch :)
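
For comparison with the erased form, here is a sketch of how the RFC's worker messages could be modelled with ordinary tagged unions today. The case names (`WorkerRun`, `ManagerRun`, and so on) are assumptions made for this sketch, not names from the RFC.

```fsharp
type RunWork = RunWork of args: string
type RequestProgressUpdate = RequestProgressUpdate of workId: int
type SubscribeProgressUpdate = SubscribeProgressUpdate of receiver: string

// Tagged unions: every member type needs an explicit wrapper case, and
// RunWork has to be wrapped separately for each union it belongs to.
type WorkerMessage =
    | WorkerRun of RunWork
    | WorkerProgress of RequestProgressUpdate

type WorkManagerMessage =
    | ManagerRun of RunWork
    | ManagerSubscribe of SubscribeProgressUpdate

let processWorkerMessage (msg: WorkerMessage) =
    match msg with
    | WorkerRun (RunWork args) -> sprintf "run %s" args
    | WorkerProgress (RequestProgressUpdate workId) -> sprintf "progress %d" workId

// The same RunWork value needs a different wrapper per destination.
let run = RunWork "build"
let forWorker = WorkerRun run
let forManager = ManagerRun run
```

The tagged version is fully checked, but a value shared between two unions must be re-wrapped at each boundary. That re-wrapping is the part an erased union would remove.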

* They serve as an alternative to function overloading.
* They obey subtyping rules.
* They allow representing subset of protocols as a type without needing to resort to the lowest common denominator like `obj`.
* Types are actually enforced, so mistakes can be caught early.
Member

Same comment re: DUs

* They obey subtyping rules.
* They allow representing subset of protocols as a type without needing to resort to the lowest common denominator like `obj`.
* Types are actually enforced, so mistakes can be caught early.
* They allow representing more than one type
Member

Same comment re: DUs

* They allow representing subset of protocols as a type without needing to resort to the lowest common denominator like `obj`.
* Types are actually enforced, so mistakes can be caught early.
* They allow representing more than one type
* Because they are enforced, type information is less likely to become outdated or miss edge-cases.
Member

Same comment re: DUs

Comment on lines +29 to +56
```fsharp
let distance(x: (Point|Location), y: (Point|Location)) = ...
```

```fsharp
type RunWork = RunWork of args: string
type RequestProgressUpdate = RequestProgressUpdate of workId: int
type SubscribeProgressUpdate = SubscribeProgressUpdate of receiver: string
type WorkerMessage = (RunWork | RequestProgressUpdate)
type WorkManagerMessage = (RunWork | SubscribeProgressUpdate)

let processWorkerMessage (msg: WorkerMessage) =
    match msg with
    | :? RunWork as m -> ...
    | :? RequestProgressUpdate as m -> ...
```

```fsharp
type Username = Username of string
type Password = Password of string
type UserOrPass = (Password | Username) // UserOrPass is a type alias

// `getUserOrPass` is inferred to `unit -> UserOrPass`
let getUserOrPass () = if (true) then name :> UserOrPass else password :> UserOrPass

// `getUserOrPass2` binding is inferred to `unit -> (UserOrPass | Error)`
let getUserOrPass2 () = if isErr() then err :> (UserOrPass | Error) else getUserOrPass() :> _
```
Member

These examples are fine, but they don't appear to build on any of the motivation. I'd recommend just having a simple code example here that demonstrates "what the feature is all about" and then dive into more examples in appropriate sub-sections.
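
One way to see what the feature is about is to write the `distance` example without it. The sketch below is what today's F# allows when a parameter may be either of two unrelated types: the parameter is typed `obj` and checked at run time. The record shapes and the planar distance formula are assumptions made for illustration.

```fsharp
// Hypothetical record shapes; the RFC does not define these types.
type Point = { X: float; Y: float }
type Location = { Lat: float; Lon: float }

// Without erased unions the parameter type falls back to obj, so the
// compiler cannot reject unrelated arguments; they only fail at run time.
let toXY (value: obj) =
    match value with
    | :? Point as p -> p.X, p.Y
    | :? Location as l -> l.Lon, l.Lat
    | _ -> invalidArg "value" "expected a Point or a Location"

// Treats both inputs as planar coordinates, purely to keep the sketch short.
let distance (a: obj, b: obj) =
    let (x1, y1) = toXY a
    let (x2, y2) = toXY b
    sqrt ((x2 - x1) ** 2.0 + (y2 - y1) ** 2.0)

let origin : Point = { X = 0.0; Y = 0.0 }
let place : Location = { Lat = 4.0; Lon = 3.0 }
let d = distance (box origin, box place)
```

With the proposed `(Point | Location)` annotation, the fallback branch and the explicit `box` calls would presumably become unnecessary, and passing a `string` would be a compile-time error rather than an exception.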


Alternatively we could just warn when such constructs are used.

* Initial implementation should not allow using static or generic type arguments in erased unions?
Member

It should definitely support generics in the first version

Contributor Author

@Swoorup Dec 9, 2020

@cartermp Things like

let a: list<(string|int)> = ...

would work.

But I am not sure how a generic parameter in a union can even be supported. As the union is erased, the wrapping type cannot be known when creating the IL for generics. My knowledge in this area is limited. Am I missing something? Scala 3 does allow it, however. Would love some input on how this can be done.

If generics were allowed, would you expect to write it in the form of

let formUnion<'a, 'b> (test: 'a): ('a|'b) = test :> _

or

let formUnion<'a, 'b when 'a :> ('a |'b) and 'b :> ('a | 'b)> (test: 'a) = test :> _

Contributor

@kerams Dec 9, 2020

I suppose the most obvious solution is using the closest common ancestor as the generic type behind the scenes. That would often be obj, causing boxing for any value types among the constituent types. On the other hand, given this type

// wrapping type is System.Object
type IntOrString = (int|string)

ints that you pass around this way will need to be boxed as well, regardless of any generics.

However, if you were to use a proper union to represent IntOrString, you'd still need to allocate... Perhaps boxing should be seen as an unavoidable fact?

Contributor Author

That would mean forgoing ValueType as the erased type in situations where all cases are structs. But F# appears to box and unbox anyway when a struct is upcast to ValueType.

Contributor

Yeah, it does https://sharplab.io/#v2:DYLgZgzgPsCmAuACAhiRBlAnhesC2AdAGrLACusAKpgA6wDaAuogLyL1SICMA3IgEwREURkA
It makes sense when you think about it -- individual value types can have various sizes, how would you pack them in an array?
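
The boxing behaviour described here can be observed directly in current F#. The following is a small sketch; the binding names are arbitrary.

```fsharp
open System

// Upcasting a struct to System.ValueType (or to obj) boxes it: the result is
// a reference to a heap copy, which is why value types of different sizes
// can sit side by side in one array.
let values : ValueType[] =
    [| (1 :> ValueType)
       (2.5 :> ValueType)
       (DateTime.MinValue :> ValueType) |]

// Getting the original value back needs a checked downcast, which unboxes.
let first = values.[0] :?> int

// Each element still remembers its concrete runtime type.
let runtimeTypes = values |> Array.map (fun v -> v.GetType().Name)
```

So choosing `ValueType` over `obj` as the erased type for an all-struct union would not by itself avoid the allocation.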

Contributor Author

That makes sense. The only situation I can imagine where this adds a bit of overhead is when the union reduces to a single type, either by joining a type with itself or by joining a subtype with its supertype. For example: (int|int), or (IA | I) where I :> IA. Those are only special cases, though.

Contributor

@dsyme Jan 12, 2021

@cartermp I'm inclined to say that this feature should not support naked generic type variables, e.g. (int | 'T). This is just not needed to get the utility of this feature and brings in a huge range of issues which we can't solve (e.g. what happens on substitution for T). If you want naked generic type variables, use a tagged union.

I believe it should support generic type variables as arguments to a nominal type, e.g. (int | 'T list) but no overlap, so not (int | 'T list | 'U list) , the type never gets "smaller" on instantiation
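
The substitution problem can be made concrete with the tagged alternative. In the sketch below, `IntOr` is a hypothetical type introduced for illustration: instantiating `'T` as `int` stays unambiguous because the case tag, not the runtime type, selects the branch. An erased `(int | 'T)` would have no such tag to fall back on.

```fsharp
// Hypothetical tagged union with a naked type variable in one case.
type IntOr<'T> =
    | Int of int
    | Other of 'T

let describe (value: IntOr<'T>) =
    match value with
    | Int i -> sprintf "int %d" i
    | Other _ -> "other"

// With 'T = int both cases carry an int, yet they remain distinguishable.
let a : IntOr<int> = Int 1
let b : IntOr<int> = Other 1
```

Under erasure, `(int | 'T)` with `'T = int` would collapse to `(int | int)`, so the type would get "smaller" on instantiation, which is the situation the comment rules out.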

Swoorup and others added 3 commits December 9, 2020 16:03
Provided an example which shows how operator overloads can be collapsed to a single method
│ C │ │ D │
└───┘ └───┘

type (A|B|I) // equal to type definition for I, since I is supertype of A and B
Contributor

I think there's a subtle distinction that is worth calling out here. Let's assume I has 3 subtypes:

      ┌───┐
      │ I │
      └─┬─┘
   ┌────┴─┬──────┐
 ┌─┴─┐  ┌─┴─┐  ┌─┴─┐
 │ A │  │ B │  │ C │
 └───┘  └───┘  └───┘

type (A|B|I) // still equal to I, since I is supertype of A, B, and C
type (A|B) // erased to I but not equal to it because this excludes C
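
The distinction shows up in pattern matching. A sketch in today's F#, modelling the hierarchy above with a marker interface (the choice of an interface rather than a base class is an assumption):

```fsharp
type I = interface end

type A() =
    interface I
type B() =
    interface I
type C() =
    interface I

// A value typed as I may be a C, so a match on A and B alone is incomplete
// and needs a fallback branch.
let describe (value: I) =
    match value with
    | :? A -> "A"
    | :? B -> "B"
    | _ -> "neither A nor B"
```

A value of the proposed type `(A|B)` would have the same runtime representation as `I`, but the compiler could presumably treat the first two branches as exhaustive, since `C` is excluded by the type.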

/// example
let shape = Circle(1.0) :> (Circle | Square) // erased type IShape
let what = shape.What // error
let what = (shape :> IShape).What // ok
Contributor

I think it would be good to clarify what "common members" means in this context. For instance..

  1. Arbitrary members declared on the union'd types
type Foo() =
  member __.Prop = "Foo"
type Bar() =
  member __.Prop = "Bar"

let baz : (Foo|Bar) = ...

baz.Prop // not valid

// you need to do this...
match baz with
| :? Foo as f -> f.Prop
| :? Bar as b -> b.Prop
  2. Members declared on common interfaces (as in the existing sample)
// .. snip ..
let what = shape.What // error
let what = (shape :> IShape).What // ok

I think this is the behavior we have today with interfaces on all types, so I don't think you'd expect unions to behave any differently.

  3. Members declared on a common base class
type Base() =
  abstract Prop : string
  default __.Prop = "Base"
type Foo() =
  inherit Base()
  override __.Prop = "Foo"
type Bar() =
  inherit Base()
  override __.Prop = "Bar"

let baz : (Foo|Bar) = ...

baz.Prop // valid
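
The first and third cases can be compared in current F# by giving the value either the common base type or `obj`. This is only a sketch of today's behaviour; `viaBase` and `viaTypeTest` are hypothetical helper names.

```fsharp
type Base() =
    abstract Prop : string
    default __.Prop = "Base"

type Foo() =
    inherit Base()
    override __.Prop = "Foo"

type Bar() =
    inherit Base()
    override __.Prop = "Bar"

// Case 3: the member lives on the common base class, so it is
// available directly on a value typed as Base.
let viaBase (value: Base) = value.Prop

// Case 1: with no common supertype other than obj, each concrete
// type has to be tested before its member can be used.
let viaTypeTest (value: obj) =
    match value with
    | :? Foo as f -> f.Prop
    | :? Bar as b -> b.Prop
    | _ -> invalidArg "value" "expected a Foo or a Bar"
```

If `(Foo|Bar)` erases to `Base`, allowing `baz.Prop` directly would match what `viaBase` already does.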

[erasedtype]: #erased-type

The IL wrapping type for `(A | B)` is the _smallest intersection type_ of base
types of `A` and `B`. For example:
Contributor

It would be good to mention how various APIs would be compiled and exposed to other languages. I assume most elements (e.g. properties, fields, return types) would just be given the erased type. For function arguments, there could be a couple options..

  1. The argument types might simply be type erased with some F# metadata indicating the actual union type. The compiler should probably emit some check at the beginning of the method body to throw if the passed argument is not actually included in the union'd type (e.g. when called from C#)

  2. They might be exposed as a group of overloads, as I mentioned at the bottom of this comment. I'd personally advocate for this approach, and it also jibes with the stated goal that erased unions serve as an alternative to function overloading.
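
The two options can be approximated by hand in today's F#. This shows the shape the compiled API might take, not what the compiler would actually emit; `Api`, `DescribeErased`, and `Describe` are hypothetical names.

```fsharp
type Api =
    // Option 1: a single erased parameter plus a runtime guard, so that a
    // caller from another language gets an exception for unsupported types.
    static member DescribeErased(value: obj) : string =
        match value with
        | :? int as i -> sprintf "int %d" i
        | :? string as s -> sprintf "string %s" s
        | _ -> raise (System.ArgumentException("expected an int or a string", "value"))

    // Option 2: one overload per union case, each forwarding to the shared
    // body, so other languages only see the statically valid signatures.
    static member Describe(value: int) : string = Api.DescribeErased(box value)
    static member Describe(value: string) : string = Api.DescribeErased(box value)

let fromInt = Api.Describe(1)
let fromString = Api.Describe("a")
```

Option 2 keeps the check at compile time for every caller, at the cost of one emitted method per case (and per combination of cases when several parameters are unions).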


abstract member (*) (a: (float|Decision), b:Decision) : LinearExpression =
match a with
| :> float as f -> // float action
Contributor

why not :? instead of :>

Contributor

No reason. That is the more correct thing to use in this situation. I was referencing some other code that I had done which relied on casting. I would prefer :?.
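
For readers less familiar with the two operators, the difference in current F# is sketched below. `classify` is a hypothetical helper.

```fsharp
// :> is an upcast expression, checked at compile time.
let boxed = 1.5 :> obj

// :? is a type-test pattern, checked at run time; "as" binds the
// value at the tested type when the test succeeds.
let classify (value: obj) =
    match value with
    | :? float as f -> sprintf "float %f" f
    | _ -> "not a float"

let classified = classify boxed
```

Since matching on a union value is a question about its runtime type, `:?` is the operator that fits the pattern position.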

Contributor

@matthewcrews Jan 12, 2021

The big thing for me is providing a clean solution for operator, method, and function overloading. I believe it is a cleaner and more sustainable approach. This would make it easier to provide an API that is as easy to use as Python/R for the data science use cases. I'm guessing it applies in other domains as well.

The maintenance of libraries with large numbers of operator-overloads becomes simpler because the behavior is defined in one place.

# Detailed design
[design]: #detailed-design
Contributor

In this section you should first describe the syntax added (for types and expressions and patterns).

Then describe what happens when these are checked, e.g. what makes these valid

  • are type variables allowed ('T | int)
  • would (int | int) give a warning

@dsyme
Contributor

dsyme commented Jan 19, 2021

I'm going to merge this to get a linkable RFC then we can address the feedback above in a separate PR

7 participants