Josh Goldberg

TypeScript Contribution Diary: Allowing Code in Constructors Before super() (Technical Overview)

Mar 7, 2022 · 25 minute read

A more technical description of allowing derived classes with properties to include code before the `super()` call, as long as that code doesn't touch `this`.

This contribution diary post is much longer than normal because its subject matter is deeper. It also assumes you’ve read through previous entries and/or are already familiar with how JavaScript compilers and type checkers work. If that’s not the case, no worries! Read through a previous entry such as TypeScript Contribution Diary: Improved Syntax Error for Enum Member Colons and Andrew Branch’s Debugging the TypeScript Codebase.

My previous TypeScript Contribution Diary posts were structured as stories explaining the timeline of how those changes made it in. This entry’s pull request had 159 comments over three years — far too many for that format. I’ll instead give a high-level overview of the backing issue’s context, the pull request’s strategy, and general code changes.

Project Scope

There ended up being two areas of source code I had to change: the type checker and the transformers behind code emit.

I’ll give a high-level overview for each. I’d strongly recommend referring back to the pull request in your local editor to understand the flow of code.

✨ #29374: Allowed non-this, non-super code before super call in derived classes with property initializers ✨

Let’s dig in! 🎂


Updating the Type Checker

Most use cases for including non-this, non-super code in the constructor of a derived class are fairly small. The ones I’d seen in the wild were generally about logging and/or creating a temporary variable to be passed as an argument to the super() call. I also didn’t want to spend a great deal of time to handle complicated logical cases.
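As a hedged illustration of those use cases, here is a minimal sketch of a constructor the change is meant to permit. The LabeledBase and LabeledDerived classes and their members are invented for this example, and it assumes a TypeScript version that includes the pull request (4.6 or later, to my understanding).

```typescript
class LabeledBase {
	label: string;

	constructor(label: string) {
		this.label = label;
	}
}

class LabeledDerived extends LabeledBase {
	// This initializer is what previously forced super() to be the first statement
	count = 0;

	constructor(rawName: string) {
		// Neither statement touches `this` or `super`, so both may precede super()
		const label = `derived:${rawName.trim()}`;
		console.log("Creating", label);

		super(label);
	}
}

const labeledDerived = new LabeledDerived("  example ");
```

The temporary label variable and the log line are exactly the kind of small, this-free code described above.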

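Thus, I thought it’d be best to tweak TypeScript’s type system logic without overhauling it. Instead of requiring the super() call be the first expression in the constructor, I would make two requirements: the super() call must be a root-level statement in the constructor body, and no statement before it may immediately reference super or this.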

You can see the changes in the pull request’s src/compiler/checker.ts file view. These next two blog post sections will give a high-level overview of them.

Checking for a Root Level super()

src/compiler/checker.ts#34739

TypeScript’s type checker already found the first super() call in a constructor using a call to an existing findFirstSuperCall function:

const superCall = findFirstSuperCall(node.body!);

That function returns the first node that matches isSuperCall, skipping any function boundary and recursively searching through all other child nodes:

function findFirstSuperCall(node: Node): SuperCall | undefined {
	return isSuperCall(node)
		? node
		: isFunctionLike(node)
		? undefined
		: forEachChild(node, findFirstSuperCall);
}

I fortunately didn’t need to change findFirstSuperCall for my changes.
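To make the “skip function boundaries, search everything else” behavior concrete, here is a toy model of the same search. SearchNode and findFirstSuperCallToy are hypothetical names over a made-up AST shape; this is a sketch of the idea, not TypeScript’s real Node type or forEachChild.

```typescript
// Toy AST for illustration only; TypeScript's real Node type is far richer
interface SearchNode {
	kind: "block" | "if" | "function" | "superCall" | "other";
	children?: SearchNode[];
}

function findFirstSuperCallToy(node: SearchNode): SearchNode | undefined {
	if (node.kind === "superCall") {
		return node;
	}
	// Function boundaries hide their contents from the search
	if (node.kind === "function") {
		return undefined;
	}
	for (const child of node.children ?? []) {
		const found = findFirstSuperCallToy(child);
		if (found) {
			return found;
		}
	}
	return undefined;
}

// constructor() { other; if (...) { super(); } }
const nestedSuperCall: SearchNode = { kind: "superCall" };
const bodyWithNestedCall: SearchNode = {
	kind: "block",
	children: [{ kind: "other" }, { kind: "if", children: [nestedSuperCall] }],
};

// constructor() { function inner() { super(); } }
const bodyWithHiddenCall: SearchNode = {
	kind: "block",
	children: [{ kind: "function", children: [{ kind: "superCall" }] }],
};
```

In this sketch the call nested in the if is found, while the one inside the function is not.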

I used the existing superCall variable for a check to make sure it was root level with a new superCallIsRootLevelInConstructor function:

if (!superCallIsRootLevelInConstructor(superCall, node.body!)) {
	error(
		superCall,
		Diagnostics.A_super_call_must_be_a_root_level_statement_within_a_constructor_of_a_derived_class_that_contains_initialized_properties_parameter_properties_or_private_identifiers,
	);
}

superCallIsRootLevelInConstructor checks whether a super() call expression’s parent expression statement is in the body of a constructor:

function superCallIsRootLevelInConstructor(superCall: Node, body: Block) {
	const superCallParent = walkUpParenthesizedExpressions(superCall.parent);
	return (
		isExpressionStatement(superCallParent) && superCallParent.parent === body
	);
}

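To recap TypeScript’s AST behavior around call statements: a call expression is the node for the call itself, while an expression statement is the statement node that wraps an expression.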

I find it easier to remember the distinction by recalling that statements may optionally have a semicolon. In codebases that include semicolons, expression statements contain a child such as a binary expression or call expression plus one character for a semicolon:

super();
|------| <- expression statement
|-----| <- call expression
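The root-level check can be sketched on a toy tree with parent pointers. ParentedNode and isRootLevelSuperCallToy are invented for this illustration and only mimic the shape of the real helper under that assumption.

```typescript
// Toy nodes with parent pointers; illustration only, not TypeScript's real AST
interface ParentedNode {
	kind: "block" | "if" | "expressionStatement" | "parenthesized" | "superCall";
	parent?: ParentedNode;
}

function isRootLevelSuperCallToy(superCall: ParentedNode, body: ParentedNode): boolean {
	// Skip wrappers such as the parentheses in `(super());`
	let parent = superCall.parent;
	while (parent?.kind === "parenthesized") {
		parent = parent.parent;
	}
	return parent?.kind === "expressionStatement" && parent.parent === body;
}

const constructorBody: ParentedNode = { kind: "block" };

// constructor() { (super()); }
const rootStatement: ParentedNode = { kind: "expressionStatement", parent: constructorBody };
const parentheses: ParentedNode = { kind: "parenthesized", parent: rootStatement };
const rootSuperCall: ParentedNode = { kind: "superCall", parent: parentheses };

// constructor() { if (condition) { super(); } }
const ifStatement: ParentedNode = { kind: "if", parent: constructorBody };
const ifBlock: ParentedNode = { kind: "block", parent: ifStatement };
const nestedStatement: ParentedNode = { kind: "expressionStatement", parent: ifBlock };
const nestedSuper: ParentedNode = { kind: "superCall", parent: nestedStatement };
```

The first call counts as root level because its expression statement sits directly in the constructor body; the second does not because its statement belongs to the if block.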

Checking Constructor Statement Order

src/compiler/checker.ts#34754

Next up was making sure nothing in the constructor accessed super or this before the super() call. I did that with a for loop over the statements in the constructor. For each statement:

  1. If the statement is an expression statement that contains a super() call, mark that we found it and break the loop
  2. If the statement is a “prologue directive”, continue
  3. If the statement “immediately” references super or this, break the loop

let superCallStatement: ExpressionStatement | undefined;

for (const statement of node.body!.statements) {
	if (
		isExpressionStatement(statement) &&
		isSuperCall(skipOuterExpressions(statement.expression))
	) {
		superCallStatement = statement;
		break;
	}

	if (
		!isPrologueDirective(statement) &&
		nodeImmediatelyReferencesSuperOrThis(statement)
	) {
		break;
	}
}

After the loop, if we hadn’t found the super() call, issue a type error with an amusingly long error message for failing to find it.

if (superCallStatement === undefined) {
	error(
		node,
		Diagnostics.A_super_call_must_be_the_first_statement_in_the_constructor_to_refer_to_super_or_this_when_a_derived_class_contains_initialized_properties_parameter_properties_or_private_identifiers,
	);
}

“A super call must be the first statement in the constructor to refer to super or this when a derived class contains initialized properties, parameter properties, or private identifiers.”
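The loop’s outcome can be simulated without a real AST. In this hedged sketch, StatementSummary and findReachableSuperCall are hypothetical stand-ins: each statement is pre-labelled with the answers the real checker would compute from nodes.

```typescript
// Simplified statement summaries; the real checker inspects AST nodes instead
interface StatementSummary {
	text: string;
	isSuperCall?: boolean;
	isPrologueDirective?: boolean;
	immediatelyReferencesSuperOrThis?: boolean;
}

// Returns the super() statement if it is reached before any use of super or this
function findReachableSuperCall(statements: StatementSummary[]): StatementSummary | undefined {
	for (const statement of statements) {
		if (statement.isSuperCall) {
			return statement;
		}
		if (!statement.isPrologueDirective && statement.immediatelyReferencesSuperOrThis) {
			return undefined;
		}
	}
	return undefined;
}

const allowedBody: StatementSummary[] = [
	{ text: '"use strict";', isPrologueDirective: true },
	{ text: 'console.log("creating");' },
	{ text: "super();", isSuperCall: true },
];

const rejectedBody: StatementSummary[] = [
	{ text: "this.setup();", immediatelyReferencesSuperOrThis: true },
	{ text: "super();", isSuperCall: true },
];
```

An undefined result corresponds to the case where the long diagnostic above would be reported.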

Prologue Directives

I had never heard of this term before this pull request. It refers to string literals used as statements, such as "use asm"; and "use strict";. They are by nature allowed to come before any code in a constructor.

In retrospect, I don’t recall why I added a special case for them to the function. Ah well.

Edit 4/13/2022: The ECMAScript Spec refers to them as “Directive Prologues”. Whoops.

Immediately Referencing super or this

By “immediately” I mean a node accesses super or this in code that is known to execute immediately, such as children of expressions and blocks. Another way of putting that is ignoring any code that won’t be immediately executed, such as a function or property declaration. There are a lot of edge cases in there! For example, a class extends clause immediately evaluates the base class being extended, but initial values for properties in any class aren’t evaluated at runtime until the constructor for their class is called.

class Base {}
class Derived extends Base {
	constructor() {
		// class Middle { ... } executes immediately for Inside to extend it...
		class Inside extends class Middle {
			// ...while this property is created later, per-instance
			woweeMiddle = this;
		} {
			// ...while this property is created later, per-instance
			woweeInside = this;
		}

		super();

		new Inside();
	}
}

I wrote a nodeImmediatelyReferencesSuperOrThis helper function that, similar to findFirstSuperCall, recursively checks children of a node. It stops searching when it encounters a node that creates a new class scope or delays execution of its contents, such as a function or class property.

function nodeImmediatelyReferencesSuperOrThis(node: Node): boolean {
	if (
		node.kind === SyntaxKind.SuperKeyword ||
		node.kind === SyntaxKind.ThisKeyword
	) {
		return true;
	}

	if (isThisContainerOrFunctionBlock(node)) {
		return false;
	}

	return !!forEachChild(node, nodeImmediatelyReferencesSuperOrThis);
}

/**
 * @returns Whether the node creates a new 'this' scope for its children.
 */
export function isThisContainerOrFunctionBlock(node: Node): boolean {
	switch (node.kind) {
		// Arrow functions use the same scope, but may do
		// so in a "delayed" manner
		// For example, `const getThis = () => this` may be
		// before a super() call in a derived constructor
		case SyntaxKind.ArrowFunction:
		case SyntaxKind.FunctionDeclaration:
		case SyntaxKind.FunctionExpression:
		case SyntaxKind.PropertyDeclaration:
			return true;
		case SyntaxKind.Block:
			switch (node.parent.kind) {
				case SyntaxKind.Constructor:
				case SyntaxKind.MethodDeclaration:
				case SyntaxKind.GetAccessor:
				case SyntaxKind.SetAccessor:
					// Object properties can have computed names;
					// only method-like bodies start a new scope
					return true;
				default:
					return false;
			}
		default:
			return false;
	}
}
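The arrow function case from that comment can be tried directly. This is a small sketch with invented class names (ScopeBase, ScopeDerived), assuming a TypeScript version that includes this change.

```typescript
class ScopeBase {}

class ScopeDerived extends ScopeBase {
	value = 1;
	sameInstance: boolean;

	constructor() {
		// `this` sits inside an arrow function, so nothing reads it yet
		const getThis = () => this;

		super();

		// Only now, after super(), does the arrow function body run
		this.sameInstance = getThis() === this;
	}
}

const scoped = new ScopeDerived();
```

Calling getThis() before super() would still fail at runtime; the checker only reasons about where this appears, not when the arrow function is invoked.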

With these approximate type checker changes, the type checker allows for code before the super() call as long as it doesn’t immediately reference super or this. The type checker was sufficiently updated for my changes. Hooray!

That leaves us with making TypeScript’s code emit properly transform JavaScript for these new constructor variants.

Updating Transformers

TypeScript’s code emit converts input TypeScript syntax to output JavaScript syntax by passing each input AST through a series of transformers. You can see the impacted transformers in the pull request under src/compiler/transformers. They’re coordinated by a getScriptTransformers function in src/compiler/transformer.ts.

The transformers relevant to this pull request are, in order:

  1. transformTypeScript: Removes type system specific syntax, leaving pure glorious JavaScript.
  2. transformClassFields: Massages class fields such as class properties and parameter properties into their JavaScript equivalents.
  3. transformES....: For each language version recognized by TypeScript, a transformer of the next language version’s name transforms it.
    • These start at ESNext, then decrease sequentially from the newest known language version down to the configured output target language version.
    • For example, if the configured output language version is "es2019", then as of TypeScript 4.6 the transformers to be run would be: transformESNext, transformES2021, and transformES2020.
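The selection rule in that list can be approximated with a toy function. The knownTargets list below is a hypothetical, shortened set of versions, and transformersForTarget is an invented helper; neither is how getScriptTransformers is actually written.

```typescript
// Hypothetical, shortened list of targets, oldest first; not TypeScript's real data
const knownTargets = ["es2019", "es2020", "es2021", "esnext"] as const;
type KnownTarget = (typeof knownTargets)[number];

// Lists the version transformers that would run, newest first
function transformersForTarget(target: KnownTarget): string[] {
	const targetIndex = knownTargets.indexOf(target);
	return knownTargets
		.slice(targetIndex + 1) // every version newer than the output target
		.reverse() // run from newest down to oldest
		.map((version) => `transform${version.replace("es", "ES").replace("next", "Next")}`);
}

const transformersForES2019 = transformersForTarget("es2019");
```

With "es2019" as the target, the sketch yields transformESNext, transformES2021, and transformES2020, in that order, matching the example above.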

Transformers generally recursively crawl through the nodes in the file’s AST, applying transformations to specific node types as they find them. These next three blog post sections will give a high-level overview of each of the changed transformers.

transformTypeScript

src/compiler/transformers/ts.ts

transformTypeScript includes a transformConstructorBody function that turns any parameter properties into assignments within the constructor.

For example, this TypeScript class:

class HasParameterProperty {
	constructor(public property: number) {
		console.log("Hello, world!");
	}
}

…would become this JavaScript class (or the equivalent with Object.defineProperty if useDefineForClassFields is enabled):

class HasParameterProperty {
	constructor(property) {
		this.property = property;
		console.log("Hello, world!");
	}
}

transformTypeScript previously assumed it could add both prologue directives and the initial super call all at once when transforming a constructor with nothing between them. It did so with a function named addPrologueDirectivesAndInitialSuperCall that returned the index of the first statement after them.

I replaced that function with code that computed two important variables:

  1. indexAfterLastPrologueStatement: After copying any prologue statements, the index of the node just after them
  2. superStatementIndex: Index of the first found super() call after prologue statements, or -1 if not found

const indexAfterLastPrologueStatement = factory.copyPrologue(
	body.statements,
	statements,
	/*ensureUseStrict*/ false,
	visitor,
);

const superStatementIndex = findSuperStatementIndex(
	body.statements,
	indexAfterLastPrologueStatement,
);

function findSuperStatementIndex(
	statements: NodeArray<Statement>,
	indexAfterLastPrologueStatement: number,
) {
	for (let i = indexAfterLastPrologueStatement; i < statements.length; i += 1) {
		const statement = statements[i];

		if (getSuperCallFromStatement(statement)) {
			return i;
		}
	}

	return -1;
}

Using those two variables, the code now builds the transformed constructor’s body in this order:

  1. If superStatementIndex was found, first visit existing statements up to and including it
  2. Visit any parameter properties and map them into nodes:
    • If superStatementIndex was found, place those parameter properties immediately after it
    • If superStatementIndex wasn’t found, place the parameter properties first in the constructor
  3. Add any remaining statements from the body, skipping the superStatementIndex index if it was found

// If there was a super call, visit existing statements up to and including it
if (superStatementIndex >= 0) {
	addRange(
		statements,
		visitNodes(
			body.statements,
			visitor,
			isStatement,
			indexAfterLastPrologueStatement,
			superStatementIndex + 1 - indexAfterLastPrologueStatement,
		),
	);
}

// Transform parameters into property assignments. Transforms this:
//
//  constructor (public x, public y) {
//  }
//
// Into this:
//
//  constructor (x, y) {
//      this.x = x;
//      this.y = y;
//  }
//
const parameterPropertyAssignments = mapDefined(
	parametersWithPropertyAssignments,
	transformParameterWithPropertyAssignment,
);

// If there is a super() call, the parameter properties go immediately after it
if (superStatementIndex >= 0) {
	addRange(statements, parameterPropertyAssignments);
}
// Since there was no super() call, parameter properties are the first statements in the constructor
else {
	statements = addRange(parameterPropertyAssignments, statements);
}

// Add remaining statements from the body, skipping the super() call if it was found
addRange(
	statements,
	visitNodes(body.statements, visitor, isStatement, superStatementIndex + 1),
);
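The resulting order is easier to see with plain strings standing in for statement nodes. This is a simplified sketch: orderConstructorStatements is a hypothetical function, and keeping prologue directives ahead of the assignments when there is no super() call is an assumption of the sketch rather than a claim about the real transformer.

```typescript
// Statements are plain strings here; the real transformer works on AST nodes
function orderConstructorStatements(
	original: string[],
	indexAfterLastPrologueStatement: number,
	superStatementIndex: number,
	parameterPropertyAssignments: string[],
): string[] {
	const prologue = original.slice(0, indexAfterLastPrologueStatement);

	if (superStatementIndex < 0) {
		// No super() call: assignments lead the body (kept after directives in this sketch)
		return [
			...prologue,
			...parameterPropertyAssignments,
			...original.slice(indexAfterLastPrologueStatement),
		];
	}

	return [
		...prologue,
		// Everything up to and including the super() call
		...original.slice(indexAfterLastPrologueStatement, superStatementIndex + 1),
		// Parameter properties go immediately after super()
		...parameterPropertyAssignments,
		// Then whatever followed the super() call
		...original.slice(superStatementIndex + 1),
	];
}

const reordered = orderConstructorStatements(
	['"use strict";', "const label = id.toString();", "super(label);", "console.log(this.id);"],
	1,
	2,
	["this.id = id;"],
);
```

Here the this.id assignment lands between super(label) and the console.log call, which is the placement the three steps describe.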

transformClassFields

src/compiler/transformers/classFields.ts

transformClassFields also contains a transformConstructorBody function. This time it’s used to turn class properties into assignments within the constructor.

For example, this TypeScript class:

class HasClassProperty {
	property = 1;
	constructor() {
		console.log("Hello, world!");
	}
}

…would become this JavaScript class (or the equivalent with Object.defineProperty if useDefineForClassFields is enabled):

class HasClassProperty {
	constructor() {
		this.property = 1;
		console.log("Hello, world!");
	}
}

This transformConstructorBody also inserts a “synthetic” super(...arguments) if the class is a derived one with a property initializer and without its own constructor.

For example, this TypeScript class:

class HasJustClassProperty extends Base {
	property = 1;
}

…needs to create its own constructor and super(...arguments) in order to hold the mapped property in its output JavaScript:

class HasJustClassProperty extends Base {
	constructor() {
		super(...arguments);
		this.property = 1;
	}
}

In order to account for code being emitted before any class properties and any constructor, the logic is roughly:

  1. Map any prologue directives and explicit super() call into the new constructor
  2. If there was a super() call, splice any statements preceding it after the prologue statements and before the super() call
  3. Later, depending on whether a super() call was found:
    • If it was, add parameter properties immediately after it
    • If it wasn’t but a synthetic super(...arguments) was added, add those parameter properties just after it
    • If neither is the case, add those parameter properties to the top of the constructor

Ordering is tricky!
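The runtime order the emit has to preserve can be observed with a small experiment. The names below (initializationLog, logged, OrderBase, OrderDerived) are invented for this sketch, which assumes a TypeScript version that includes this change.

```typescript
const initializationLog: string[] = [];

// Records a step and passes the value through, so initializers can be traced
function logged<T>(step: string, value: T): T {
	initializationLog.push(step);
	return value;
}

class OrderBase {
	constructor() {
		initializationLog.push("base constructor");
	}
}

class OrderDerived extends OrderBase {
	property = logged("property initializer", 1);

	constructor() {
		initializationLog.push("before super()");
		super();
		initializationLog.push("after super()");
	}
}

new OrderDerived();
// initializationLog is now:
// ["before super()", "base constructor", "property initializer", "after super()"]
```

The property initializer runs right after super() returns and before the rest of the constructor, which is why the transformed assignment has to be placed immediately after the super() call.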

I also excluded parameter properties from being moved into the constructor when useDefineForClassFields is enabled, as those properties are then handled elsewhere. I don’t remember where else they’re handled but I do remember that when I didn’t filter them out, they appeared twice in the output JavaScript.

I’ve omitted code snippets from this transformer’s explanation for brevity.

transformES2015

src/compiler/transformers/es2015.ts

The ES2015-to-ES5 transformer is the largest of TypeScript’s transformers and contains more lines of code than all the other ECMAScript transformers combined. I suggested in #47573: Remove older emit support over time that TypeScript no longer target ECMAScript versions older than what any realistically used runtime environment needs… but until dropping pre-ES2020 happens some years in the future (🙏), ES2015 classes still need to be transformed into function prototype equivalents in TypeScript’s compiled output JavaScript.

This TypeScript class:

class HasPropertyAndLog {
	message = "world";

	constructor() {
		console.log("Hello", this.message);
	}
}

…becomes roughly this output JavaScript:

var HasPropertyAndLog = /** @class */ (function () {
	function HasPropertyAndLog() {
		this.message = "world";
		console.log("Hello", this.message);
	}
	return HasPropertyAndLog;
})();

transformES2015’s transformConstructorBody keeps track of two arrays of statement nodes: prologue and statements.

My change started off by adding three pieces of logic:

  1. Capture any previously existing prologue directives in an existingPrologue array
  2. Find the super() call, storing it in superCall and its statement index in superStatementIndex
    • This is done with a new findSuperCallAndStatementIndex that loops through constructor body statements after those in existingPrologue
  3. Create a postSuperStatementsStart variable to determine where post-super(...) nodes are meant to be placed:
    • If a super() call wasn’t found, place them just after existingPrologue
    • If a super() call was found, place them just after superStatementIndex

transformConstructorBody is then able to use that information to create constructor body statements:

  1. If the super() call wasn’t synthesized, copy prologue statements into prologue
  2. Create a superCallExpression variable to store a new super() call, if a previous one exists:
    • If the existing super() is synthesized, replace it with the ES5 equivalent: var _this = _super !== null && _super.apply(this, arguments) || this;
    • If the existing super() wasn’t synthesized, store the result of visiting it
  3. Add any default property value assignments and constructor rest parameter to the end of prologue
  4. Add any remaining statements from the constructor to statements
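For a feel of the target shape, here is a hand-written ES5-style approximation of a derived constructor with code before an explicit super() call. It is not actual compiler output: LegacyBase and LegacyDerived are invented names, the prototype wiring stands in for the helper code the compiler emits, and the casts to any exist only to keep the sketch type-checkable.

```typescript
// Hand-written approximation for illustration; not real compiler output
function LegacyBase(this: any, label: string) {
	this.label = label;
}

function LegacyDerived(this: any, rawName: string) {
	// Code that preceded super() stays ahead of the rewritten call
	var label = "derived:" + rawName;
	// Stand-in for super(label): call the base function, then capture `this`
	var _this = (LegacyBase as any).call(this, label) || this;
	// Property initializers follow the captured super call
	_this.count = 0;
	return _this;
}

// Minimal prototype wiring in place of the compiler's inheritance helper
LegacyDerived.prototype = Object.create(LegacyBase.prototype);
LegacyDerived.prototype.constructor = LegacyDerived;

const legacyInstance = new (LegacyDerived as any)("a");
```

The point of the sketch is the ordering: pre-super statements, then the _this capture, then property assignments, then everything else.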

The logic for where to place that superCallExpression node changes based on a few potential cases commented in src/compiler/transformers/es2015.ts#1056:

I’ve omitted code snippets from this transformer’s explanation for brevity.

I know that was a big wall of text, but if you read through the contents of transformConstructorBody and use its comments as reference, I think it can be reasoned through. The transformer code has to include a few extra function calls to properly massage this scoping and source maps from ES2015+ classes to ES5 functions here and there.

Bewildered at that high-level walkthrough? Me too! Please upvote #47573: Remove older emit support over time to make it more likely we’ll no longer need to support ES5 eventually! 💖

Final Thanks

I’d like to extend a sincere heartfelt thanks to the several developers who reviewed the pull request over the years. In order of review: