Overall Structure and Layout

When you write a typical C application for a modern operating system, you do not need to pay very much heed to the exact layout of system memory nor spell out a great many details to the linker about how your program needs to be loaded. Decades of abstraction layers insulate you and provide a highly standardized, sanitized environment for your program's running process.

The Game Boy Advance is a bare-metal environment. There is no operating system, or the game is its own operating system, whichever viewpoint one prefers. While the idealized memory abstraction presented to an OS-managed application is a single uniform space from 0x00000000 to 0xffffffff, the real address space of the hardware is broken into pieces with different functions, separated by large gaps that aren't attached to anything. The GBA memory space includes the read-only BIOS (16KB), fast working RAM (32KB), slow working RAM (256KB), memory belonging directly to the screen (98KB), space reserved for the cartridge's ROM (32MB), and optionally either battery-backed SRAM (64KB) or flash (64KB, bankswitchable) for save storage. The working (that is, "normal") RAM is split into fast and slow regions not because the designers thought you would like slower RAM, but to balance the device's cost against performance. Variables that the game updates constantly should be squeezed into fast RAM, with the rest on the slow boat.
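To make the gaps concrete, here is a minimal sketch of that address space as a lookup table. The base addresses are the standard GBA ones; the names MemRegion, kRegions and region_for are made up for this illustration and do not appear in the decomp. The 98KB of screen memory is listed as its three parts (1KB palette, 96KB video RAM, 1KB sprite attributes), and the I/O registers and the hardware's mirrored copies of each region are left out to keep it short.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per region of the GBA address space (illustrative names). */
struct MemRegion {
    const char *name;
    uint32_t start; /* first address of the region */
    uint32_t size;  /* size in bytes */
};

static const struct MemRegion kRegions[] = {
    { "BIOS (read-only)",        0x00000000u, 16u * 1024u },
    { "slow work RAM (EWRAM)",   0x02000000u, 256u * 1024u },
    { "fast work RAM (IWRAM)",   0x03000000u, 32u * 1024u },
    { "palette RAM",             0x05000000u, 1u * 1024u },
    { "video RAM",               0x06000000u, 96u * 1024u },
    { "sprite attributes (OAM)", 0x07000000u, 1u * 1024u },
    { "cartridge ROM",           0x08000000u, 32u * 1024u * 1024u },
    { "cartridge save memory",   0x0E000000u, 64u * 1024u },
};

/* Returns the name of the region containing addr, or NULL if the address
   falls into one of the gaps between the regions in this table. */
const char *region_for(uint32_t addr)
{
    for (size_t i = 0; i < sizeof(kRegions) / sizeof(kRegions[0]); i++) {
        /* Unsigned subtraction wraps for addr < start, so one comparison
           checks both ends of the range. */
        if (addr - kRegions[i].start < kRegions[i].size)
            return kRegions[i].name;
    }
    return NULL;
}
```

Looking up an address copied from a debugger this way shows at a glance whether a pointer refers to fast RAM, slow RAM, or the cartridge.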

Note

A game cartridge can contain a "mapper" or "bankswitch" capability which connects and disconnects different chips on the cartridge board to the console's memory space dynamically. This allows the game to have more data than will fit by breaking it down into chunks that are never loaded at the same time. However, this was much more common in 8-bit and 16-bit games since 32-bit games have a much larger address space available even on systems (including the GBA) which only wire up part of it. You may be interested in my high-level overview of how mappers are used on the NES.

It is therefore necessary to divide Emerald up into sections (the read-only code and data, the fast variables, the slow variables, and the save file) and explicitly instruct the linker on what goes where in the final ROM. This is accomplished through linker scripts. It is fortunately not necessary to understand every detail of linker script syntax for our purposes. The scripts are found in ld_script.txt and the files it includes by name.

Notice that there are two versions of the linker script in the repository: one which makes sure everything is placed in exactly the same spot it ended up in the original ROM (ld_script.txt), and one for modern compilers which will find any new modules you've added and not be too particular about the exact order (ld_script_modern.txt). The latter is useful if you are making substantial changes to the game. We are examining the exact-reproduction version so that what we see in the script will correlate exactly to what we see in the game's memory in a live debugger. In general, the ordering of different pieces of data within the same linking section is due to the order they were added to the code base or the alphabetical order of the original source filenames, and not because they must be in that specific order. The naming conventions used in linker scripts are extremely archaic and non-obvious: .text means executable code, .data means variables that start out with a specific value, and .bss means variables with no explicit initial value, which C treats as starting at zero. .rodata is read-only data which is, at least, actually what it sounds like.
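As a rough illustration of those four names, the sketch below shows where a typical GCC-style toolchain would usually place each kind of definition. The identifiers are invented for the example and are not taken from Emerald, and exact placement can vary with compiler version and flags.

```c
#include <stdint.h>

/* .rodata: constant data. On the GBA this can stay in cartridge ROM,
   since it is never written to. */
const uint8_t kStarterLevels[3] = { 5, 5, 5 };

/* .data: a variable with a specific starting value. The value has to be
   stored in the image and be present in RAM before the code runs. */
uint32_t gStepCounter = 100;

/* .bss: a variable with no explicit starting value. Only its size needs
   to be recorded; C guarantees it starts out as zero. */
static uint32_t sFrameCounter;

/* .text: executable code. */
uint32_t advance_frame(void)
{
    sFrameCounter++;
    return sFrameCounter + gStepCounter + kStarterLevels[0];
}
```

Running a tool such as objdump or nm on the compiled object file is one way to check which section each symbol actually landed in.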

Memory Map Summary

rom_header_gf.c

static const struct GFRomHeader sGFRomHeader = {
    .version = GAME_VERSION,
    .language = GAME_LANGUAGE,
    .gameName = "pokemon emerald version",
    .monFrontPics = gMonFrontPicTable,
    .monBackPics = gMonBackPicTable,
    .monNormalPalettes = gMonPaletteTable,
    .monShinyPalettes = gMonShinyPaletteTable,
    .monIcons = gMonIconTable,
    .monIconPaletteIds = gMonIconPaletteIndices,
    .monIconPalettes = gMonIconPaletteTable,
    .monSpeciesNames = gSpeciesNames,
    .moveNames = gMoveNames,
    .decorations = gDecorations,
    .flagsOffset = offsetof(struct SaveBlock1, flags),
    .varsOffset = offsetof(struct SaveBlock1, vars),
    .pokedexOffset = offsetof(struct SaveBlock2, pokedex),
    .seen1Offset = offsetof(struct SaveBlock1, seen1),
    .seen2Offset = offsetof(struct SaveBlock1, seen2),
    .pokedexVar = VAR_NATIONAL_DEX - VARS_START,
    .pokedexFlag = FLAG_RECEIVED_POKEDEX_FROM_BIRCH,
    .mysteryEventFlag = FLAG_SYS_MYSTERY_EVENT_ENABLE,
    .pokedexCount = NATIONAL_DEX_COUNT,
    .playerNameLength = PLAYER_NAME_LENGTH,
    .trainerNameLength = TRAINER_NAME_LENGTH,
    .pokemonNameLength1 = POKEMON_NAME_LENGTH,
    .pokemonNameLength2 = POKEMON_NAME_LENGTH,
    // Two of the below 12s are likely move/ability name length, given their presence in this header
    .unk5 = 12,
    .unk6 = 12,
    .unk7 = 6,
    .unk8 = 12,
    .unk9 = 6,
    .unk10 = 16,
    .unk11 = 18,
    .unk12 = 12,
    .unk13 = 15,
    .unk14 = 11,
    .unk15 = 1,
    .unk16 = 8,
    .unk17 = 12,
    .saveBlock2Size = sizeof(struct SaveBlock2),
    .saveBlock1Size = sizeof(struct SaveBlock1),
    .partyCountOffset = offsetof(struct SaveBlock1, playerPartyCount),
    .partyOffset = offsetof(struct SaveBlock1, playerParty),
    .warpFlagsOffset = offsetof(struct SaveBlock2, specialSaveWarpFlags),
    .trainerIdOffset = offsetof(struct SaveBlock2, playerTrainerId),
    .playerNameOffset = offsetof(struct SaveBlock2, playerName),
    .playerGenderOffset = offsetof(struct SaveBlock2, playerGender),
    .frontierStatusOffset = offsetof(struct SaveBlock2, frontier.challengeStatus),
    .frontierStatusOffset2 = offsetof(struct SaveBlock2, frontier.challengeStatus),
    .externalEventFlagsOffset = offsetof(struct SaveBlock1, externalEventFlags),
    .externalEventDataOffset = offsetof(struct SaveBlock1, externalEventData),
    .unk18 = 0x00000000,
    .speciesInfo = gSpeciesInfo,
    .abilityNames = gAbilityNames,
    .abilityDescriptions = gAbilityDescriptionPointers,
    .items = gItems,
    .moves = gBattleMoves,
    .ballGfx = gBallSpriteSheets,
    .ballPalettes = gBallSpritePalettes,
    .gcnLinkFlagsOffset = offsetof(struct SaveBlock2, gcnLinkFlags),
    .gameClearFlag = FLAG_SYS_GAME_CLEAR,
    .ribbonFlag = FLAG_SYS_RIBBON_GET,
    .bagCountItems = BAG_ITEMS_COUNT,
    .bagCountKeyItems = BAG_KEYITEMS_COUNT,
    .bagCountPokeballs = BAG_POKEBALLS_COUNT,
    .bagCountTMHMs = BAG_TMHM_COUNT,
    .bagCountBerries = BAG_BERRIES_COUNT,
    .pcItemsCount = PC_ITEMS_COUNT,
    .pcItemsOffset = offsetof(struct SaveBlock1, pcItems),
    .giftRibbonsOffset = offsetof(struct SaveBlock1, giftRibbons),
    .enigmaBerryOffset = offsetof(struct SaveBlock1, enigmaBerry),
    .enigmaBerrySize = sizeof(struct EnigmaBerry),
    .moveDescriptions = NULL,
    .unk20 = 0x00000000, // 0xFFFFFFFF in FRLG
};

It is very curious that the game name is in all lower-case.

The variables marked as "unknown" in the decomp have the following names in the original code:

// unknowns 5-17
WAZA_NAME_SIZE,
ITEM_NAME_SIZE,
SEED_NAME_SIZE,
SPEABI_NAME_SIZE,
ZOKUSEI_NAME_SIZE,
MAPNAME_WIDTH,
MAPNAME_MAX,
TRTYPE_NAME_SIZE,
GOODS_NAME_SIZE,
ZUKAN_TYPE_SIZE,
EOM_SIZE,
BTL_TR_NAME_SIZE,
KAIWA_WORK_SIZE,
// unknown 18
0
// unknown 20
static  const u8 IndexNull[ 0x100 - sizeof(POKEMON_ROM_HEADER)] = {}; 

Unknown 18 will take its secrets to the end of time, it seems. It's in a set of interoperability flags, but no name or explanation is given. Unknown 20 is not actually part of the structure in the original source code, but simply follows after it. It was intended as padding to reserve a full 256 bytes, but its existence is pointless as the structure is 344 bytes. The ancient compiler the original team was using must have decided that an array with a negative length (256 - 344) actually has a length of one rather than throwing an error, which would have brought it to their attention as no longer needed.
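For comparison, here is a small sketch of what the same length expression does in standard C on a modern host. This says nothing certain about the original compiler's behavior; it only shows why a current compiler would object. FakeRomHeader, naive_padding and guarded_padding are hypothetical names, and the 344 is simply the size quoted above.

```c
#include <stddef.h>

/* Stand-in for the real header: any struct larger than 0x100 bytes will
   do. The name and contents are hypothetical. */
struct FakeRomHeader {
    unsigned char bytes[344];
};

/* The padding length as the original declaration computes it. sizeof
   yields the unsigned type size_t, so on mainstream platforms 0x100 is
   converted to size_t and the subtraction wraps around to an enormous
   positive number rather than producing -88. */
size_t naive_padding(void)
{
    return 0x100 - sizeof(struct FakeRomHeader);
}

/* A guarded version: pad up to 0x100 bytes, or not at all if the header
   has already outgrown the reserved space. */
size_t guarded_padding(size_t header_size)
{
    return header_size < 0x100 ? 0x100 - header_size : 0;
}
```

A modern compiler asked to declare an array of naive_padding() bytes would reject it as too large, which is exactly the kind of error that would have flagged the padding as obsolete.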

Source Directory Layout

Most of the C source files that make up the bulk of the engine are in src/, but a few live outside it (gflib/), and assembly source files (.s) are scattered across several folders. asm/ contains macro definitions for the scripting engine; data/ contains an enormous amount of assorted high-level game data such as event scripts, in-game text, tilesets and maps; graphics/ contains visuals other than tilesets; include/ is C headers; sound/ includes songs and sound effects; tools/ contains utilities used in the build process that are not part of the game itself. In tools/ you will find the programs that convert the easily-editable forms of images and music stored in the repo to the formats expected by the game, as well as gbafix, which patches the final checksum into the header.
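The header patch itself is simple enough to sketch. The GBA cartridge header stores a one-byte "complement check" at offset 0xBD, computed over bytes 0xA0 through 0xBC. The function names below are invented for this illustration and are not taken from the gbafix source, which also handles other header fields.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    HEADER_CHECK_START  = 0xA0, /* first byte covered by the check */
    HEADER_CHECK_END    = 0xBC, /* last byte covered by the check */
    HEADER_CHECK_OFFSET = 0xBD  /* where the result is stored */
};

/* Computes the header complement check: subtract every covered byte from
   zero, subtract 0x19, and keep the low eight bits. */
uint8_t gba_header_complement(const uint8_t *rom)
{
    uint8_t sum = 0;
    for (size_t i = HEADER_CHECK_START; i <= HEADER_CHECK_END; i++)
        sum = (uint8_t)(sum - rom[i]);
    return (uint8_t)(sum - 0x19);
}

/* Writes the computed value into the header in place. The buffer must be
   at least 0xBE bytes long. */
void patch_header_complement(uint8_t *rom)
{
    rom[HEADER_CHECK_OFFSET] = gba_header_complement(rom);
}
```

Because the covered range includes the game title and game code, any change to those fields requires the byte to be recomputed, which is why the patch runs as the last build step.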

If you add new data resources to the game, you will likely need to wade into src/data/ and find the right header to patch to actually incorporate it into the ROM by name.

Continue on to Init and Main