Enforce JVM UTF-8 string limits in the backend#25300
Enforce JVM UTF-8 string limits in the backend#25300SolalPirelli wants to merge 4 commits intoscala:mainfrom
Conversation
| def genConstant(const: Constant): Unit = { | ||
| (const.tag/*: @switch*/) match { | ||
| private def genConstant(const: Constant, pos: SrcPos): Unit = { | ||
| (const.tag: @switch) match { |
There was a problem hiding this comment.
I don't know why that @switch was commented out but it seems to work fine so might as well
| val mirrorMethodName = m.javaSimpleName | ||
| if !BCodeUtils.checkConstantStringLength(jgensig, mirrorMethodName, mdesc) then | ||
| report.error("Mirror method signature is too long for the JVM", m.srcPos) | ||
| return |
There was a problem hiding this comment.
I don't like having to duplicate this code for mirror generation, IMHO the backend should move towards a model where we first decide which classes and methods we're going to create, including JVM-specific synthetic stuff, and then emit these, but that's for later
|
|
||
| object BCodeUtils { | ||
| val MAX_BYTES_PER_UTF8_CONSTANT = 65535 | ||
| /** Checks that the given signature if present, or the concatenation of the given name and descriptor, do not exceed the JVM's UTF-8 text size limits. */ |
There was a problem hiding this comment.
I'm not very happy about having to implement length detection for Java's subset of UTF-8, but we can't reuse ASM's because it's private and catching exceptions to test their message would be both ugly and too late to provide a position in the error...
| |$showErrors | ||
| |${expected.mkString("Unfulfilled expectations:\n", "\n", "")} | ||
| |${unexpected.mkString("Unexpected errors:\n", "\n", "")} | ||
| |""".stripMargin.trim.linesIterator.mkString("\n", "\n", "") |
There was a problem hiding this comment.
Not directly related but it annoyed me when I put an // error on the wrong line not to have information on what went wrong. This is copied and adapted from the case just above.
There was a problem hiding this comment.
I have added that code several times! I just saw it again in this refreshed PR from 2022.
#15624
There was a problem hiding this comment.
good to know I'm not the only one... that PR added it for the 2nd case though (expectedErrors == 0), so this is not even a conflict :)
| import dotty.tools.dotc.core.Contexts.* | ||
| import dotty.tools.dotc.core.Decorators.em | ||
|
|
||
| import scala.collection.JavaConverters.asScalaIteratorConverter |
There was a problem hiding this comment.
Oops, I missed this file while self-reviewing, thanks for pointing it out
| object BCodeUtils { | ||
| val MAX_BYTES_PER_UTF8_CONSTANT = 65535 | ||
| /** Checks that the given signature if present, or the concatenation of the given name and descriptor, do not exceed the JVM's UTF-8 text size limits. */ | ||
| def checkConstantStringLength(sig: String | Null, name: String, desc: String = ""): Boolean = { |
There was a problem hiding this comment.
I think this method is a bit cryptic, having multiple methods (for signature, name+desc) would make things more clear. Eg, it's weird that we're passing a non-null signature and also a name at some callsites, but then the name is ignored.
There was a problem hiding this comment.
I simplified it
| if c <= 0x7F then 1 | ||
| else if c <= 0x7FF then 2 | ||
| else if Character.isHighSurrogate(c) || Character.isLowSurrogate(c) then 2 | ||
| else 3 |
There was a problem hiding this comment.
can you provide a reference (in a code comment) to where this is specified?
also the comment above says "can't be more than 4 bytes per char" but here we never have more than 3.
There was a problem hiding this comment.
I added details in a comment at the top, which also led me to realize I had forgotten to handle the "null char" case because Java uses "modified UTF-8" which is... weird.
And indeed it's actually 3 bytes per char, I had mixed up char and codepoint in my mind when writing that.
1c9fd03 to
f9abc6d
Compare
Fixes #15850
Fixes #24597
Subsumes #19622
Enforce UTF-8 limits at bytecode generation time.