Enforce JVM UTF-8 string limits in the backend by SolalPirelli · Pull Request #25300 · scala/scala3

SolalPirelli · 2026-02-18T08:02:48Z

Fixes #15850
Fixes #24597
Subsumes #19622

Enforce UTF-8 limits at bytecode generation time.

SolalPirelli · 2026-02-18T08:03:21Z

compiler/src/dotty/tools/backend/jvm/BCodeBodyBuilder.scala

-    def genConstant(const: Constant): Unit = {
-      (const.tag/*: @switch*/) match {
+    private def genConstant(const: Constant, pos: SrcPos): Unit = {
+      (const.tag: @switch) match {


I don't know why that @switch was commented out, but it seems to work fine, so we might as well restore it.
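For readers unfamiliar with the annotation: `scala.annotation.switch` asks the compiler to check that a match on a primitive value compiles to a JVM `tableswitch` or `lookupswitch`, and to report a problem if it cannot. A minimal sketch follows; `describeTag` and its tag values are hypothetical and are not taken from the compiler's `Constant` tags.

```scala
import scala.annotation.switch

// Hypothetical example: the tag values and names are made up for illustration.
def describeTag(tag: Int): String =
  // The ascription `(tag: @switch)` requests a check that this match
  // is compiled to a jump table rather than a chain of comparisons.
  (tag: @switch) match
    case 1 => "unit"
    case 2 => "boolean"
    case 3 => "byte"
    case _ => "other"
```

All cases are literal constants, which is what allows the match to be emitted as a switch instruction.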

SolalPirelli · 2026-02-18T08:04:18Z

compiler/src/dotty/tools/backend/jvm/BCodeHelpers.scala

      val mirrorMethodName = m.javaSimpleName
+      if !BCodeUtils.checkConstantStringLength(jgensig, mirrorMethodName, mdesc) then
+        report.error("Mirror method signature is too long for the JVM", m.srcPos)
+        return


I don't like having to duplicate this code for mirror generation. IMHO the backend should move towards a model where we first decide which classes and methods we're going to create, including JVM-specific synthetic ones, and only then emit them, but that's for later.

SolalPirelli · 2026-02-18T08:05:26Z

compiler/src/dotty/tools/backend/jvm/BCodeUtils.scala


 object BCodeUtils {
+  val MAX_BYTES_PER_UTF8_CONSTANT = 65535
+  /** Checks that the given signature if present, or the concatenation of the given name and descriptor, do not exceed the JVM's UTF-8 text size limits. */


I'm not very happy about having to implement length detection for Java's variant of UTF-8, but we can't reuse ASM's because it's private, and catching exceptions to test their message would be both ugly and too late to provide a position in the error...

SolalPirelli · 2026-02-18T08:06:23Z

compiler/test/dotty/tools/vulpix/ParallelTesting.scala

+              |$showErrors
+              |${expected.mkString("Unfulfilled expectations:\n", "\n", "")}
+              |${unexpected.mkString("Unexpected errors:\n", "\n", "")}
+              |""".stripMargin.trim.linesIterator.mkString("\n", "\n", "")


Not directly related, but it annoyed me that when I put a // error marker on the wrong line, I got no information on what went wrong. This is copied and adapted from the case just above.
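As a rough illustration of what the added lines produce, here is a standalone sketch of the same string-building pattern. The function name `mismatchReport` and its parameter types are assumptions made for this example; only the interpolated body mirrors the diff above.

```scala
// Hypothetical helper: the name and parameter types are assumptions,
// the body follows the pattern shown in the diff.
def mismatchReport(
    showErrors: String,
    expected: List[String],
    unexpected: List[String]
): String =
  s"""|$showErrors
      |${expected.mkString("Unfulfilled expectations:\n", "\n", "")}
      |${unexpected.mkString("Unexpected errors:\n", "\n", "")}
      |""".stripMargin.trim.linesIterator.mkString("\n", "\n", "")
```

The three-argument `mkString` prepends its first argument as a header, so each section starts with its title on its own line. The final `linesIterator.mkString("\n", "\n", "")` rejoins the lines with plain newlines and adds one leading newline.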

I have added that code several times! I just saw it again in this refreshed PR from 2022.
#15624

good to know I'm not the only one... that PR added it for the 2nd case though (expectedErrors == 0), so this is not even a conflict :)

lrytz · 2026-02-18T10:23:40Z

compiler/src/dotty/tools/backend/jvm/PostProcessor.scala

 import dotty.tools.dotc.core.Contexts.*
 import dotty.tools.dotc.core.Decorators.em
+
+import scala.collection.JavaConverters.asScalaIteratorConverter


accidental?

Oops, I missed this file while self-reviewing, thanks for pointing it out

lrytz · 2026-02-18T10:57:50Z

compiler/src/dotty/tools/backend/jvm/BCodeUtils.scala

 object BCodeUtils {
+  val MAX_BYTES_PER_UTF8_CONSTANT = 65535
+  /** Checks that the given signature if present, or the concatenation of the given name and descriptor, do not exceed the JVM's UTF-8 text size limits. */
+  def checkConstantStringLength(sig: String | Null, name: String, desc: String = ""): Boolean = {


I think this method is a bit cryptic; having multiple methods (one for the signature, one for name+desc) would make things clearer. E.g., it's weird that we're passing a non-null signature and also a name at some call sites, but then the name is ignored.

I simplified it
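The simplified version is not quoted in this thread. Purely as an illustration of the reviewer's suggestion, a split into two methods could look like the sketch below. The object name, both method names, and the private helper are hypothetical; the name+descriptor check follows the "concatenation" wording of the doc comment quoted above, and the byte counting assumes the JVM's modified UTF-8 rules.

```scala
// Hypothetical sketch, not the code from the PR.
object Utf8LimitChecks:
  val MaxBytesPerUtf8Constant = 65535

  // Byte length in modified UTF-8: 1 byte for U+0001..U+007F,
  // 2 bytes for U+0000 and U+0080..U+07FF, 3 bytes for every other char.
  private def modifiedUtf8Length(s: String): Long =
    s.foldLeft(0L) { (n, c) =>
      n + (if c >= 1 && c <= 0x7F then 1 else if c <= 0x7FF then 2 else 3)
    }

  /** True if the signature is absent or fits in one UTF-8 constant. */
  def checkSignatureLength(sig: String | Null): Boolean =
    sig match
      case null      => true
      case s: String => modifiedUtf8Length(s) <= MaxBytesPerUtf8Constant

  /** True if name and descriptor together fit within the limit. */
  def checkNameAndDescriptorLength(name: String, desc: String): Boolean =
    modifiedUtf8Length(name) + modifiedUtf8Length(desc) <= MaxBytesPerUtf8Constant
```

With this shape, a call site that has a signature never needs to pass a name that would be ignored.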

lrytz · 2026-02-18T11:00:43Z

compiler/src/dotty/tools/backend/jvm/BCodeUtils.scala

+          if c <= 0x7F then 1
+          else if c <= 0x7FF then 2
+          else if Character.isHighSurrogate(c) || Character.isLowSurrogate(c) then 2
+          else 3


Can you provide a reference (in a code comment) to where this is specified?

Also, the comment above says "can't be more than 4 bytes per char", but here we never have more than 3.

I added details in a comment at the top, which also led me to realize I had forgotten to handle the "null char" case because Java uses "modified UTF-8" which is... weird.

And indeed it's actually 3 bytes per char, I had mixed up char and codepoint in my mind when writing that.
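For reference, the encoding rules under discussion are those of the JVM specification's CONSTANT_Utf8_info structure (JVMS §4.4.7). The sketch below is not the code from this PR; the object and method names are made up for illustration. It counts bytes per UTF-16 char under those rules: the null char takes 2 bytes, and each surrogate is encoded separately in 3 bytes, so a supplementary code point takes 6 bytes rather than the 4 of standard UTF-8.

```scala
// Illustrative sketch of modified UTF-8 length counting (JVMS §4.4.7).
// All names here are hypothetical.
object ModifiedUtf8Sketch:
  // The length field of a CONSTANT_Utf8 entry is an unsigned 16-bit value.
  val MaxBytesPerUtf8Constant = 65535

  /** Bytes needed to encode one UTF-16 char in modified UTF-8. */
  def bytesForChar(c: Char): Int =
    if c >= 1 && c <= 0x7F then 1 // U+0001..U+007F
    else if c <= 0x7FF then 2     // U+0000 and U+0080..U+07FF
    else 3                        // everything else, including surrogates

  /** Total encoded length; a Long so that very long strings cannot overflow. */
  def modifiedUtf8Length(s: String): Long =
    var total = 0L
    var i = 0
    while i < s.length do
      total += bytesForChar(s.charAt(i))
      i += 1
    total

  def fitsInUtf8Constant(s: String): Boolean =
    modifiedUtf8Length(s) <= MaxBytesPerUtf8Constant
```

Because the count is per char and never exceeds 3, a string of up to 21845 chars always fits, while a longer one may or may not depending on its contents.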

SolalPirelli marked this pull request as ready for review February 18, 2026 08:53

SolalPirelli requested a review from lrytz February 18, 2026 08:53

SolalPirelli assigned lrytz Feb 18, 2026

lrytz reviewed Feb 18, 2026

SolalPirelli added 4 commits February 19, 2026 11:14

Hamza's tests (b48c2f5)
Fix, extra test for other issue (9a73fd6)
Revert unnecessary diff (95c4581)
PR feedback (f9abc6d)

SolalPirelli force-pushed the solal/jvm-string-max branch from 1c9fd03 to f9abc6d on February 19, 2026 12:11

SolalPirelli requested a review from lrytz February 19, 2026 12:13

SolalPirelli wants to merge 4 commits into scala:main from dotty-staging:solal/jvm-string-max

3 participants
