253 changes: 253 additions & 0 deletions RFCs/FS-1092-Erased-Union-Types.md
@@ -0,0 +1,253 @@
# F# RFC FS-1092 - Erased Union Types

This RFC covers the detailed proposal for the suggestion [Erased type-tagged anonymous union types](https://github.com/fsharp/fslang-suggestions/issues/538).

* [ ] Approved in principle
* [x] [Suggestion](https://github.com/fsharp/fslang-suggestions/issues/538)
* [ ] Details: TBD
* [ ] Implementation: [Preliminary Prototype](https://github.com/dotnet/fsharp/pull/10566)


# Summary
[summary]: #summary

Add erased union types as a feature to F#. Erased Union types provide some of the benefits of structural ("duck") typing, within the confines of a nominative type system.

# Motivation
[motivation]: #motivation

Supporting erased union types in the language lets us encode more information in types, with the usual advantages this brings:

* They serve as an alternative to function overloading.
Member

Note that functions can't be overloaded in F#. I think you mean that they add a kind of "function overloading" that you can't accomplish today?

Contributor

@isaacabraham isaacabraham Jan 19, 2021

Or to method overloading. This is actually a common use case we see more and more these days - creating a static class with static members for overloading.

* They obey subtyping rules.
* They allow representing a subset of protocols as a type without resorting to the lowest common denominator, such as `obj`.
Member

DUs allow this today, too. What is the benefit of erased/ad-hoc unions as opposed to existing ones for this point? Perhaps this is worth expounding upon in an example later on.

Contributor

Am I allowed to add examples? I didn't want to step on anyone's toes.

Contributor

Either add a comment with your examples, or send a pull request to the pull request branch :)

* Types are actually enforced, so mistakes can be caught early.
Member

Same comment re: DUs

* They allow representing more than one type
Member

Same comment re: DUs

* Because they are enforced, type information is less likely to become outdated or miss edge-cases.
Member

Same comment re: DUs

* Types are checked during inheritance, enforcing the Liskov Substitution Principle.

```fsharp
let distance(x: (Point|Location), y: (Point|Location)) = ...
```

```fsharp
type RunWork = RunWork of args: string
type RequestProgressUpdate = RequestProgressUpdate of workId: int
type SubscribeProgressUpdate = SubscribeProgressUpdate of receiver: string
type WorkerMessage = (RunWork | RequestProgressUpdate)
type WorkManagerMessage = (RunWork | SubscribeProgressUpdate)

let processWorkerMessage (msg: WorkerMessage) =
    match msg with
    | :? RunWork as m -> ...
    | :? RequestProgressUpdate as m -> ...
```

```fsharp
type Username = Username of string
type Password = Password of string
type UserOrPass = (Password | Username) // UserOrPass is a type alias

// `getUserOrPass` is inferred to `unit -> UserOrPass`
let getUserOrPass () = if (true) then name :> UserOrPass else password :> UserOrPass

// `getUserOrPass2` binding is inferred to `unit -> (UserOrPass | Error)`
let getUserOrPass2 () = if isErr() then err :> (UserOrPass | Error) else getUserOrPass() :> _
```
Comment on lines +29 to +56
Member

These examples are fine, but they don't appear to build on any of the motivation. I'd recommend just having a simple code example here that demonstrates "what the feature is all about" and then dive into more examples in appropriate sub-sections.


The definition of operators for types becomes simpler.

```fsharp
type Decision =
    // Fields

    abstract member (*) (a: float, b: Decision) : LinearExpression =
        // member body
    abstract member (*) (a: Decision, b: Decision) : LinearExpression =
        // member body
```

Becomes

```fsharp
type Decision =
    // Fields

    abstract member (*) (a: (float|Decision), b:Decision) : LinearExpression =
        match a with
        | :> float as f -> // float action
Contributor

why not :? instead of :>

Contributor

No reason. That is the more correct thing to use in this situation. I was referencing some other code that I had done which relied on casting. I would prefer :?.

Contributor

@matthewcrews matthewcrews Jan 12, 2021

The big thing for me is providing a clean solution for operator, method, and function overloading. I believe it is a cleaner and more sustainable approach. This would make it easier to provide an API that is as easy to use as Python/R for the data science use cases. I'm guessing it applies in other domains as well.

        | :> Decision as d -> // Decision action
```

The maintenance of libraries with large numbers of operator-overloads becomes simpler because the behavior is defined in one place.
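
For comparison, the overload-based approach can be written in current F# roughly as follows. This is only a sketch: the record shapes of `Decision` and `LinearExpression` and the `Terms` field are assumptions made for illustration, not definitions from the RFC.

```fsharp
// Hypothetical stand-ins for the types named in the RFC example.
type LinearExpression = { Terms: string list }

type Decision =
    { Name: string }
    // One static member per operand combination, as required today.
    static member (*) (a: float, b: Decision) : LinearExpression =
        { Terms = [ sprintf "%g*%s" a b.Name ] }
    static member (*) (a: Decision, b: Decision) : LinearExpression =
        { Terms = [ sprintf "%s*%s" a.Name b.Name ] }

let x = { Name = "x" }
let y = { Name = "y" }

let scaled = 2.0 * x   // resolves to the (float, Decision) overload
let product = x * y    // resolves to the (Decision, Decision) overload
```

Each additional operand type adds another member per operator, which is the maintenance cost the erased-union version aims to reduce.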

# Detailed design
[design]: #detailed-design
Contributor

In this section you should first describe the syntax added (for types and expressions and patterns).

Then describe what happens when these are checked, e.g. what makes these valid

  • are type variables allowed ('T | int)
  • would (int | int) give a warning


## Subtyping rules
[subtyping]: #subtyping-rules

* Erased unions are commutative and associative:

```fsharp
(A | B) =:= (B | A)
(A | (B | C)) =:= (( A | B ) | C)
```

*`=:=` denotes type equality: the two types are interchangeable in all contexts.*

* If `A :> C` and `B :> C`, then `(A | B) :> C`, where `T :> U` means that `T` is a subtype of `U`.
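
The last rule mirrors what nominal subtyping already provides in current F#. A minimal sketch, using a hypothetical hierarchy (`Shape`, `Circle`, `Square` are illustrative names, not part of the RFC), where the common supertype has to be spelled out by hand because `(Circle | Square)` cannot be written today:

```fsharp
// Hypothetical hierarchy: Circle :> Shape and Square :> Shape.
[<AbstractClass>]
type Shape() =
    abstract Name: string

type Circle() =
    inherit Shape()
    override _.Name = "Circle"

type Square() =
    inherit Shape()
    override _.Name = "Square"

// Under the proposal the parameter could be typed (Circle | Square),
// which would itself be a subtype of Shape. Today the closest
// expressible type is the common supertype.
let describe (s: Shape) = s.Name

let names = [ describe (Circle() :> Shape); describe (Square() :> Shape) ]
```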

### Hierarchies in Types
[hierarchy]: #hierarchies-in-types

When all cases in the union are disjoint, every case must be exhaustively checked during pattern matching.
However, when one case is a supertype of another, the supertype is chosen and the derived cases are discarded.

For example:
`I` is the base class from which classes `A` and `B` derive. `C` and `D` in turn derive from `B`.

```fsharp
   ┌───┐
   │ I │
   └─┬─┘
  ┌──┴───┐
┌─┴─┐  ┌─┴─┐
│ A │  │ B │
└───┘  └─┬─┘
      ┌──┴───┐
    ┌─┴─┐  ┌─┴─┐
    │ C │  │ D │
    └───┘  └───┘

type (A|B|I) // equal to type definition for I, since I is supertype of A and B
Contributor

I think there's a subtle distinction that is worth calling out here. Let's assume I has 3 subtypes:

      ┌───┐
      │ I │
      └─┬─┘
   ┌────┴─┬──────┐
 ┌─┴─┐  ┌─┴─┐  ┌─┴─┐
 │ A │  │ B │  │ C │
 └───┘  └───┘  └───┘

type (A|B|I) // still equal to I, since I is supertype of A, B, and C
type (A|B) // erased to I but not equal to it because this excludes C

type (A|B|C) // equal to type (A|B), since B is supertype of C
type (A|C) // disjoint as A and C both inherit from I but do not have relationship between each other.
```

## Type inference
[inference]: #type-inference

Erased union types are only inferred when stated explicitly: at least one of the types in an expression must already be the erased union type.

For example, the following is invalid:

```fsharp
let intOrString = if true then 1 else "Hello" // invalid
```

However the following is valid:

```fsharp
// inferred to (int|string)
let intOrString = if true then 1 :> (int|string) else "Hello" :> _
```

This follows the existing rules on where explicit upcasting is required, including cases where the type information is otherwise available. The latter might change depending on the outcome of [fslang-suggestion#849](https://github.com/fsharp/fslang-suggestions/issues/849).
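
The same requirement already exists in current F# when the target type is `obj`. The following sketch (the function name `intOrString` and its `flag` parameter are illustrative) compiles today and shows the explicit upcast that the erased-union form would mirror:

```fsharp
// Without the upcasts the two branches have different types (int and
// string) and the expression is rejected, just as in the invalid
// example above.
let intOrString (flag: bool) =
    if flag then 1 :> obj else "Hello" :> _

let fromTrue = intOrString true    // a boxed int
let fromFalse = intOrString false  // a string
```

The proposal keeps this shape and only replaces `obj` with the more precise `(int|string)`.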

## Exhaustivity checking
[exhaustivity]: #exhaustivity-checking

If the selector of a pattern match is an erased union type, the match is considered exhaustive if all parts of the erased union are covered. No fallback case is needed.

```fsharp
let prettyPrint (x: (int8|int16|int64|string)) =
    match x with
    | :? (int8|int16|int64) as y -> prettyPrintNumber y
    | :? string as y -> prettyPrintNumber y
```

The above corresponds to the following in current F#:

```fsharp
let prettyPrint (x: obj) =
    match x with
    | :? int8 | :? int16 | :? int64 as y -> prettyPrintNumber y
    | :? string as y -> prettyPrintNumber y
```

Similarly the following would also be considered exhaustive:

```fsharp
let prettyPrint (x: (int8|int16|int64|string)) =
    match x with
    | :? System.ValueType as y -> prettyPrintNumber y // int8, int16 and int64 are subtypes of ValueType
    | :? string as y -> prettyPrintNumber y
```
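
For contrast, here is a sketch of what the `ValueType` variant needs in current F#, where the selector can only be typed as `obj`. The helper bodies and the name `prettyPrintString` are assumptions added to make the example self-contained. Because `obj` is open-ended, a fallback case has to be written by hand, and nothing stops a caller from passing an unrelated struct such as `DateTime`:

```fsharp
// Hypothetical helpers; the RFC does not define them.
let prettyPrintNumber (n: System.ValueType) = "number " + n.ToString()
let prettyPrintString (s: string) = "string " + s

let prettyPrint (x: obj) =
    match x with
    | :? System.ValueType as y -> prettyPrintNumber y
    | :? string as y -> prettyPrintString y
    // Needed today: the compiler cannot know that only
    // int8, int16, int64 or string are passed in.
    | _ -> invalidArg "x" "expected int8, int16, int64 or string"

let a = prettyPrint (box 42L)
let b = prettyPrint (box "hi")
```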

## Erased Type
[erasedtype]: #erased-type

The IL wrapping type for `(A | B)` is the _smallest intersection type_ of base
types of `A` and `B`. For example:
Contributor

It would be good to mention how various APIs would be compiled and exposed to other languages. I assume most elements (e.g. properties, fields, return types) would just be given the erased type. For function arguments, there could be a couple options..

  1. The argument types might simply be type erased with some F# metadata indicating the actual union type. The compiler should probably emit some check at the beginning of the method body to throw if the passed argument is not actually included in the union'd type (e.g. when called from C#)

  2. They might be exposed as a group of overloads, as I mentioned at the bottom of this comment. I'd personally advocate for this approach, and it also jibes with the stated goal that erased unions serve as an alternative to function overloading.


```fsharp
// wrapping type is System.Object
type IntOrString = (int|string)
// wrapping type is System.ValueType
type Int8OrInt16OrFloat = (int8|int16|float)
type I = interface end
type A = inherit I
type B = inherit I
// I is the wrapping type
type AorB = (A|B)

type I2 = interface end
type C = inherit I inherit I2
type D = inherit I inherit I2
// Either I or I2 could be the wrapping type. The compiler would choose I2 since it is the earliest ancestor
type CorD = (C|D)
```
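
One way to picture the class-based part of this rule is the reflection sketch below. It is not the prototype's algorithm: `commonBase` and `wrappingType` are hypothetical helpers, and they only walk base classes, so the interface cases (`AorB`, `CorD`) are out of scope here.

```fsharp
open System

/// Walks up the base-class chain of `a` until a type assignable from `b`
/// is found. Falls back to obj when the chain runs out (e.g. interfaces).
let rec commonBase (a: Type) (b: Type) : Type =
    if isNull a then typeof<obj>
    elif a.IsAssignableFrom b then a
    else commonBase a.BaseType b

/// Nearest common base class of all listed types (list must be non-empty).
let wrappingType (types: Type list) : Type = List.reduce commonBase types

let w1 = wrappingType [ typeof<int>; typeof<string> ]
let w2 = wrappingType [ typeof<int8>; typeof<int16>; typeof<float> ]
```

Under these assumptions `w1` is `System.Object` and `w2` is `System.ValueType`, matching the first two comments in the block above.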

# Drawbacks
[drawbacks]: #drawbacks

TBD

# Alternatives
[alternatives]: #alternatives

TBD

# Unresolved questions
[unresolved]: #unresolved-questions

* Should the initial implementation disallow units of measure (UoM) in erased unions when the underlying primitive is already part of the union?

```fsharp
type [<Measure>] userid
type UserId = int<userid>
type IntOrUserId = (int|UserId)
```

Alternatively we could just warn when such constructs are used.
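
The concern comes from units of measure being erased at compile time, so a runtime type test cannot tell `int<userid>` from `int`. A small sketch in current F# (the value names are illustrative):

```fsharp
[<Measure>] type userid

let plain = 7
let tagged = 7<userid>

// Both values box to System.Int32, so the unit is not visible at runtime.
let sameRuntimeType = (box plain).GetType() = (box tagged).GetType()

// A type test against int succeeds for the unit-annotated value as well.
let taggedIsInt = (box tagged) :? int
```

A match on `(int|UserId)` would therefore have no way to select the `UserId` case separately.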

* Should the initial implementation disallow static or generic type arguments in erased unions?
Member

It should definitely support generics in the first version

Contributor Author

@Swoorup Swoorup Dec 9, 2020

@cartermp Things like

let a: list<(string|int)> = ...

would work.

But I am not sure how generics parameter in union can even be supported? As it is erased, the wrapping type cannot be known when creating IL for generics? My knowledge in this area is limited. Am I missing something? Scala 3 does allow it however. Would love some input on how this can be done.

If generics were allowed, would you expect to write in form of

let formUnion<'a, 'b> (test: 'a): ('a|'b) = test :> _

or

let formUnion<'a, 'b when 'a :> ('a |'b) and 'b :> ('a | 'b)> (test: 'a) = test :> _

Contributor

@kerams kerams Dec 9, 2020

I suppose the most obvious solution is using the closest common ancestor as the generic type behind the scenes. That would often be obj, causing boxing for any value types among the constituent types. On the other hand, given this type

// wrapping type is System.Object
type IntOrString = (int|string)

ints that you pass around this way will need to be boxed as well, regardless of any generics.

However, if you were to use a proper union to represent IntOrString, you'd still need to allocate... Perhaps boxing should be seen as an unavoidable fact?

Contributor Author

That would forgo the case where the erased type is ValueType, i.e. when all cases are structs. But F# appears to box and unbox anyway when a struct is upcast to ValueType.

Contributor

Yeah, it does https://sharplab.io/#v2:DYLgZgzgPsCmAuACAhiRBlAnhesC2AdAGrLACusAKpgA6wDaAuogLyL1SICMA3IgEwREURkA
It makes sense when you think about it -- individual value types can have various sizes, how would you pack them in an array?

Contributor Author

That makes sense. Only situation I imagine where this might be bit overhead is when reduction of joins of same type or join of subtype to supertype leads to same type. For example: (int|int) or (IA | I) where I :> IA. But union case are only special cases though

Contributor

@dsyme dsyme Jan 12, 2021

@cartermp I'm inclined to say that this feature should not support naked generic type variables, e.g. (int | 'T). This is just not needed to get the utility of this feature and brings in a huge range of issues which we can't solve (e.g. what happens on substitution for T). If you want naked generic type variables you use a tagged union

I believe it should support generic type variables as arguments to a nominal type, e.g. (int | 'T list) but no overlap, so not (int | 'T list | 'U list) , the type never gets "smaller" on instantiation


```fsharp
type StringOr<'a> = ('a | string)
```

* Should the initial implementation require upcasting before common members of an erased union can be accessed?

```fsharp
type IShape =
    abstract member What: string

type Circle =
    | Circle of r: float
    interface IShape with
        member _.What = "Circle"

type Square =
    | Square of l: float
    interface IShape with
        member _.What = "Square"

/// example
let shape = Circle(1.0) :> (Circle | Square) // erased type IShape
let what = shape.What // error
let what = (shape :> IShape).What // ok
Contributor

I think it would be good to clarify what "common members" means in this context. For instance..

  1. Arbitrary members declared on the union'd types
type Foo() =
  member __.Prop = "Foo"
type Bar() =
  member __.Prop = "Bar"

let baz : (Foo|Bar) = ...

baz.Prop // not valid

// you need to do this...
match baz with
| :? Foo as f -> f.Prop
| :? Bar as b -> b.Prop
  2. Members declared on common interfaces (as in the existing sample)
// .. snip ..
let what = shape.What // error
let what = (shape :> IShape).What // ok

I think this is the behavior we have today with interfaces on all types, so I don't think you'd expect unions to behave any differently.

  3. Members declared on a common base class
type Base() =
  abstract Prop : string
  default __.Prop = "Base"
type Foo() =
  inherit Base()
  override __.Prop = "Foo"
type Bar() =
  inherit Base()
  override __.Prop = "Bar"

let baz : (Foo|Bar) = ...

baz.Prop // valid

```
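
The interface case matches how current F# already treats interface implementations, which are explicit. A minimal sketch, independent of erased unions (here `Circle` is redefined as a class with a hypothetical `Radius` member purely for illustration):

```fsharp
type IShape =
    abstract What: string

type Circle(r: float) =
    member _.Radius = r
    interface IShape with
        member _.What = "Circle"

let c = Circle 1.0

// let w = c.What           // does not compile: What is only reachable via IShape
let w = (c :> IShape).What  // ok after an explicit upcast
```

Requiring the upcast on `(Circle | Square)` would therefore be consistent with how a plain `Circle` value behaves today.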

* Should the exhaustiveness check for instance-test (`:?`) clauses also be implemented in normal circumstances? https://github.com/dotnet/fsharp/issues/10615