fix(awk): honor POSIX statement continuation across newlines (#206)
The awk lexer emitted a NEWLINE token unconditionally when it saw `\n`,
even when the previous token was one of the continuation-allowing tokens
POSIX awk specifies (`,`, `{`, `&&`, `||`, `?`, `:`, `do`, `else`, `if`,
`while`). Common multi-line idioms like

    printf "%s=%d\n",
      $1, $2

(comma at end-of-line followed by indented args on the next line) parsed
as two separate statements with a stray NEWLINE in the middle, surfacing
as "Unexpected token: NEWLINE" — even though gawk, mawk, and the BSD
one-true-awk all accept this form.

The lexer already tracks `lastTokenType` as an instance property, so the
fix is a small change in `nextToken`: when the next character is `\n`
and `lastTokenType` is in `CONTINUES_ACROSS_NEWLINE`, swallow the
newline and recurse to the next real token instead of emitting one.

Adds eight regression tests covering `,` (in printf and function-call
arguments), `&&`, `||`, `{`, `else`, and the ternary `?`/`:`, plus the
TSV → SQL INSERT printf idiom that motivated the fix.
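
The suppression described above can be sketched in isolation. This is a minimal illustration, not the project's lexer: `ToyLexer`, `Kind`, `CONTINUES`, and `tokenize` are hypothetical names, the token set is cut down to four continuation tokens, and string literals containing spaces are not handled.

```typescript
// Hypothetical, simplified token kinds; the real TokenType enum is larger.
type Kind =
  | "COMMA" | "LBRACE" | "RBRACE" | "AND" | "OR"
  | "NEWLINE" | "WORD" | "EOF";

// Assumed subset of the continuation-allowing tokens, for illustration only.
const CONTINUES: ReadonlySet<Kind> = new Set<Kind>([
  "COMMA", "LBRACE", "AND", "OR",
]);

class ToyLexer {
  private readonly input: string;
  private pos = 0;
  private last: Kind | null = null; // plays the role of lastTokenType

  constructor(input: string) {
    this.input = input;
  }

  // Record the emitted kind so the next newline can consult it.
  private emit(kind: Kind): Kind {
    this.last = kind;
    return kind;
  }

  next(): Kind {
    // Skip horizontal whitespace only; newlines are significant.
    while (this.input.charAt(this.pos) === " " || this.input.charAt(this.pos) === "\t") {
      this.pos++;
    }
    if (this.pos >= this.input.length) return this.emit("EOF");

    const ch = this.input.charAt(this.pos);
    if (ch === "\n") {
      this.pos++;
      // Swallow the newline after a continuation token and recurse.
      // `last` is left unchanged, so following blank lines are swallowed too.
      if (this.last !== null && CONTINUES.has(this.last)) return this.next();
      return this.emit("NEWLINE");
    }
    if (ch === ",") { this.pos++; return this.emit("COMMA"); }
    if (ch === "{") { this.pos++; return this.emit("LBRACE"); }
    if (ch === "}") { this.pos++; return this.emit("RBRACE"); }
    if (this.input.startsWith("&&", this.pos)) { this.pos += 2; return this.emit("AND"); }
    if (this.input.startsWith("||", this.pos)) { this.pos += 2; return this.emit("OR"); }

    // Anything else is lumped into an opaque WORD.
    const start = this.pos;
    while (this.pos < this.input.length && !" \t\n,{}&|".includes(this.input.charAt(this.pos))) {
      this.pos++;
    }
    if (this.pos === start) this.pos++; // lone '&' or '|': always make progress
    return this.emit("WORD");
  }
}

// Collect token kinds up to (not including) EOF.
function tokenize(src: string): Kind[] {
  const lexer = new ToyLexer(src);
  const out: Kind[] = [];
  for (let k = lexer.next(); k !== "EOF"; k = lexer.next()) out.push(k);
  return out;
}

console.log(tokenize('printf "%s=%d",\n  $1, $2\n').join(" "));
// prints "WORD WORD COMMA WORD COMMA WORD NEWLINE"
```

Only the newline after the trailing comma disappears; the one that ends the statement is still emitted. Because the last-token state is not updated when a newline is swallowed, a run of blank lines after a comma collapses the same way, at the cost of one recursive call per swallowed newline.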
subsetpark authored and cramforce committed Apr 29, 2026
commit f34750d7ae6df2190d6ab57dede1fd6509f40c94
11 changes: 11 additions & 0 deletions .changeset/awk-comma-continuation.md
@@ -0,0 +1,11 @@
---
"just-bash": patch
---

Fix awk lexer to honor POSIX statement continuation across newlines after `,`,
`{`, `&&`, `||`, `?`, `:`, `do`, `else`, `if`, and `while`. Previously, a
multi-line idiom like `printf "%s=%d\n", \n $1, $2` (comma at end-of-line
followed by indented args on the next line) failed with `Unexpected token:
NEWLINE` because the lexer emitted a NEWLINE token unconditionally. The
lexer now suppresses the NEWLINE when it immediately follows one of the
continuation-allowing tokens, matching POSIX awk.
99 changes: 99 additions & 0 deletions packages/just-bash/src/commands/awk/awk.parsing.test.ts
@@ -40,6 +40,105 @@ describe("awk parsing", () => {
});
});

// POSIX awk specifies that a newline immediately following a comma (or
// `{`, `&&`, `||`, `?`, `:`, or the keywords `do`/`else`/`if`/`while`)
// does not terminate a statement. Without this, common multi-line
// idioms like `printf "...", \n $1, $2` fail with
// `Unexpected token: NEWLINE`. The lexer suppresses the NEWLINE
// token after these continuation-allowing tokens.
describe("statement continuation across newlines", () => {
it("continues across a newline after a comma in printf args", async () => {
const env = new Bash();
const result = await env.exec(`echo "" | awk 'BEGIN {
printf "%s=%d\\n",
"answer", 42
}'`);
expect(result.exitCode).toBe(0);
expect(result.stdout).toBe("answer=42\n");
});

it("continues across a newline after a comma in function-call args", async () => {
const env = new Bash();
const result = await env.exec(`echo "abc" | awk '{
print substr($0,
1,
2)
}'`);
expect(result.exitCode).toBe(0);
expect(result.stdout).toBe("ab\n");
});

it("continues across a newline after &&", async () => {
const env = new Bash();
const result = await env.exec(`echo "" | awk 'BEGIN {
if (1 == 1 &&
2 == 2) print "ok"
}'`);
expect(result.exitCode).toBe(0);
expect(result.stdout).toBe("ok\n");
});

it("continues across a newline after ||", async () => {
const env = new Bash();
const result = await env.exec(`echo "" | awk 'BEGIN {
if (0 ||
1) print "ok"
}'`);
expect(result.exitCode).toBe(0);
expect(result.stdout).toBe("ok\n");
});

it("continues across a newline after { (block opening)", async () => {
const env = new Bash();
const result = await env.exec(`echo "" | awk 'BEGIN {
x = 1
print x
}'`);
expect(result.exitCode).toBe(0);
expect(result.stdout).toBe("1\n");
});

it("continues across a newline after else", async () => {
const env = new Bash();
const result = await env.exec(`echo "" | awk 'BEGIN {
if (0) print "no"
else
print "yes"
}'`);
expect(result.exitCode).toBe(0);
expect(result.stdout).toBe("yes\n");
});

it("continues across a newline after a ternary ? operator", async () => {
const env = new Bash();
const result = await env.exec(`echo "" | awk 'BEGIN {
print 1 ?
"yes" :
"no"
}'`);
expect(result.exitCode).toBe(0);
expect(result.stdout).toBe("yes\n");
});

// Regression: TSV → SQL INSERT generation idiom — the most common
// shape that motivated this fix.
it("handles TSV-to-SQL printf idiom with comma-newline continuation", async () => {
const env = new Bash({
files: { "/in.tsv": "vendor\tamount\nAcme\t100\nGlobex\t200\n" },
cwd: "/",
});
const result = await env.exec(`awk -F'\\t' 'NR > 1 {
printf "INSERT INTO t VALUES ('"'"'%s'"'"', %d);\\n",
$1, $2
}' /in.tsv`);
expect(result.exitCode).toBe(0);
expect(result.stdout).toBe(
"INSERT INTO t VALUES ('Acme', 100);\n" +
"INSERT INTO t VALUES ('Globex', 200);\n",
);
});
});

describe("string parsing", () => {
it("should handle escaped quotes in string", async () => {
const env = new Bash();
38 changes: 37 additions & 1 deletion packages/just-bash/src/commands/awk/lexer.ts
@@ -136,6 +136,28 @@ function expandPosixClasses(pattern: string): string {
.replace(/\[\[:cntrl:\]\]/g, "[\\x00-\\x1f\\x7f]");
}

/**
* Tokens after which a newline does not terminate a statement. Mirrors POSIX
* awk's grammar — the newline is whitespace when it immediately follows any
* of these. Without this, multi-line idioms like
* printf "%s=%d\n",
* $1, $2
* (comma at EOL, args on the next indented line) trip with
* "Unexpected token: NEWLINE".
*/
const CONTINUES_ACROSS_NEWLINE: ReadonlySet<TokenType> = new Set<TokenType>([
TokenType.COMMA,
TokenType.LBRACE,
TokenType.AND,
TokenType.OR,
TokenType.QUESTION,
TokenType.COLON,
TokenType.DO,
TokenType.ELSE,
TokenType.IF,
TokenType.WHILE,
]);

export class AwkLexer {
private input: string;
private pos = 0;
@@ -214,9 +236,23 @@ export class AwkLexer {
const startColumn = this.column;
const ch = this.peek();

// Newline
// Newline. POSIX awk specifies that a statement continues across a
// newline that immediately follows certain tokens — most notably `,`,
// but also `{`, `&&`, `||`, `?`, `:`, and the keywords `do`, `else`,
// `if`, `while`. Without this, a perfectly POSIX-compliant program
// like `printf "%s=%d\n", $1, $2` written across multiple lines
// (a comma at end-of-line followed by indented args) parses as two
// statements with a NEWLINE in the middle, surfacing as
// `Unexpected token: NEWLINE`. We swallow the newline and recurse to
// the next real token when it follows a continuation-allowing token.
if (ch === "\n") {
this.advance();
if (
this.lastTokenType !== null &&
CONTINUES_ACROSS_NEWLINE.has(this.lastTokenType)
) {
return this.nextToken();
}
return {
type: TokenType.NEWLINE,
value: "\n",