diff --git a/index.html b/index.html
index 58505386..d2915783 100644
--- a/index.html
+++ b/index.html
@@ -1115,146 +1115,147 @@ pre{ white-space:pre }
  • 19.2. gem5 run benchmark +
  • +
  • 19.3. gem5 system parameters + +
  • +
  • 19.4. gem5 kernel command line parameters
  • +
  • 19.5. gem5 GDB step debug + +
  • +
  • 19.6. gem5 checkpoint + +
  • +
  • 19.7. Pass extra options to gem5
  • +
  • 19.8. m5ops + +
  • +
  • 19.9. gem5 arm Linux kernel patches + +
  • +
  • 19.10. m5out directory + +
  • +
  • 19.11. m5term
  • +
  • 19.12. gem5 Python scripts without rebuild
  • +
  • 19.13. gem5 fs_bigLITTLE
  • +
  • 19.14. gem5 in-tree tests + +
  • +
  • 19.15. gem5 simulate() limit reached
  • +
  • 19.16. gem5 build options + +
  • +
  • 19.17. gem5 CPU types +
  • -
  • 19.3. gem5 kernel command line parameters
  • -
  • 19.4. gem5 GDB step debug -
  • -
  • 19.5. gem5 checkpoint - -
  • -
  • 19.6. Pass extra options to gem5
  • -
  • 19.7. m5ops - -
  • -
  • 19.8. gem5 arm Linux kernel patches - -
  • -
  • 19.9. m5out directory - -
  • -
  • 19.10. m5term
  • -
  • 19.11. gem5 Python scripts without rebuild
  • -
  • 19.12. gem5 fs_bigLITTLE
  • -
  • 19.13. gem5 in-tree tests - -
  • -
  • 19.14. gem5 simulate() limit reached
  • -
  • 19.15. gem5 build options - -
  • -
  • 19.16. gem5 CPU types - -
  • -
  • 19.17. gem5 ARM platforms
  • -
  • 19.18. gem5 upstream images
  • -
  • 19.19. gem5 bootloaders
  • -
  • 19.20. gem5 CommMonitor
  • +
  • 19.18. gem5 ARM platforms
  • +
  • 19.19. gem5 upstream images
  • +
  • 19.20. gem5 bootloaders
  • 19.21. gem5 internals
  • -
  • 19.21.4.4. gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs +
  • 19.21.4.4. gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs
  • +
  • 19.21.4.5. gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs and Ruby
  • +
  • 19.21.4.6. gem5 event queue MinorCPU syscall emulation freestanding example analysis
  • -
  • 19.21.4.5. gem5 event queue MinorCPU syscall emulation freestanding example analysis +
  • 19.21.4.7. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis -
  • -
  • 19.21.4.6. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis -
  • @@ -4325,7 +4323,7 @@ echo "$(./getvar --arch aarch64 --emulator gem5 image)"
    -

    see also: Section 19.17, “gem5 ARM platforms”.

    +

    see also: Section 19.18, “gem5 ARM platforms”.

    This generates yet new separate images with new magic constants:

    @@ -5692,7 +5690,7 @@ sched_getcpu = 0
    -

    The number of cores is modified as explained at: Section 19.2.2.1, “Number of cores”

    +

    The number of cores is modified as explained at: Section 19.3.1, “Number of cores”

    taskset from the util-linux package sets the initial core affinity of a program:

    @@ -10632,7 +10630,7 @@ CONFIG_IKCONFIG_PROC=y
    @@ -14198,6 +14212,25 @@ virt_to_phys(&static_var) = 0x40002308

    In this section we will play with them.

    +

    The following files contain examples to access that data and test it out:

    +
    +
    + +
    +

    First get a virtual address to play with:

    @@ -14250,9 +14283,6 @@ pid 110
    -

    Source: userland/linux/virt_to_phys_user.c

    -
    -

    Now we can verify that linux/virt_to_phys_user.out gave the correct physical address in the following ways:

    @@ -19044,7 +19074,7 @@ cat out/gem5-bench-dhrystone.txt
    -

    but the problem is that this method does not allow to easily run a different script without running the boot again. The ./gem5.sh script works around that by using m5 readfile as explained further at: Section 19.5.3, “gem5 checkpoint restore and run a different script”.

    +

    but the problem is that this method does not allow you to easily run a different script without booting again. The ./gem5.sh script works around that by using m5 readfile, as explained further at: Section 19.6.3, “gem5 checkpoint restore and run a different script”.

    Now you can play a fun little game with your friends:

    @@ -19113,16 +19143,17 @@ cat out/gem5-bench-dhrystone.txt

    Those problems should be insignificant if the benchmark runs for long enough however.

    -
    -

    19.2.2. gem5 system parameters

    +
    +
    +

    19.3. gem5 system parameters

    Besides optimizing a program for a given CPU setup, chip developers can also do the inverse, and optimize the chip for a given benchmark!

    The rabbit hole is likely deep, but let’s scratch a bit of the surface.

    -
    -
    19.2.2.1. Number of cores
    +
    +

    19.3.1. Number of cores

    ./run --arch arm --cpus 2 --emulator gem5
    @@ -19168,8 +19199,8 @@ getconf _NPROCESSORS_CONF
    -
    -
    19.2.2.1.1. QEMU user mode multithreading
    +
    +
    19.3.1.1. QEMU user mode multithreading

    User mode simulation QEMU v4.0.0 always shows the number of cores of the host, presumably because the thread switching uses host threads directly which would make that harder to implement.

    @@ -19205,8 +19236,8 @@ ps Haux | grep qemu | wc

    At 369a47fc6e5c2f4a7f911c1c058b6088f8824463 + 1 QEMU appears to spawn 3 host threads plus one for every new guest thread created. Remember that userland/posix/pthread_count.c spawns N + 1 total threads if you count the main thread.

    -
    - -
    -
    19.2.2.3. gem5 DRAM model
    +
    +

    19.3.3. gem5 DRAM model

    Some info at: TimingSimpleCPU analysis #1 but highly TODO :-)

    -
    -
    19.2.2.3.1. gem5 memory latency
    +
    +
    19.3.3.1. gem5 memory latency

    TODO These look promising:

    @@ -19508,8 +19539,11 @@ instructions 91738770
    -
    -
    19.2.2.3.2. Memory size
    +
    +
    19.3.3.2. Memory size
    +
    +

    Can be set across emulators with:

    +
    ./run --memory 512M
    @@ -19609,9 +19643,112 @@ get_avphys_pages() * sysconf(_SC_PAGESIZE) = 0x1D178000

    AV means available and gives the free memory: https://stackoverflow.com/questions/14386856/c-check-available-ram/57659190#57659190
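
    As a rough cross-check of the value above, here is a minimal Python sketch of the same computation. It assumes a Linux guest or host where os.sysconf exposes SC_AVPHYS_PAGES and SC_PAGE_SIZE; the helper names available_bytes and to_mib are hypothetical and not part of this repository.

```python
import os


def available_bytes():
    """Free physical memory in bytes: available pages times page size."""
    return os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def to_mib(num_bytes):
    """Convert a byte count to MiB."""
    return num_bytes / (1024 * 1024)


# The sample value quoted above, converted to MiB.
print(f"0x1D178000 = {to_mib(0x1D178000):.1f} MiB")

# Only query the running system if it exposes the glibc-specific name.
if "SC_AVPHYS_PAGES" in os.sysconf_names:
    print(f"available now: {to_mib(available_bytes()):.1f} MiB")
```

    The sample value comes out a little below 512 MiB, which is what one would expect for a guest started with --memory 512M once the kernel has taken its share.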

    -
    -
    19.2.2.4. gem5 disk and network latency
    +
    19.3.3.3. gem5 DRAM setup
    +
    +

    This can be explored pretty well from gem5 config.ini.

    +
    +
    +

    se.py just has a single DDR3_1600_8x8 DRAM with size given as Memory size and physical address starting at 0.

    +
    +
    +

    fs.py also has that DDR3_1600_8x8 DRAM, but can have more memory types. Notably, aarch64 has the following memory map, as shown in RealView.py VExpress_GEM5_Base:

    +
    +
    +
    +
    0x00000000-0x03ffffff: (  0     -  64 MiB) Boot memory (CS0)
    +0x04000000-0x07ffffff: ( 64 MiB - 128 MiB) Reserved
    +0x08000000-0x0bffffff: (128 MiB - 192 MiB) NOR FLASH0 (CS0 alias)
    +0x0c000000-0x0fffffff: (192 MiB - 256 MiB) NOR FLASH1 (Off-chip, CS4)
    +0x80000000-XxXXXXXXXX: (  2 GiB -        ) DRAM
    +
    +
    +
    +

    We place the entry point of our baremetal executables right at the start of DRAM with our Baremetal linker script.

    +
    +
    +

    This can be seen indirectly with:

    +
    +
    +
    +
    ./getvar --arch aarch64 --emulator gem5 entry_address
    +
    +
    +
    +

    which gives 0x80000000 in decimal, or more directly with some gem5 tracing:

    +
    +
    +
    +
    ./run \
    +  --arch aarch64 \
    +  --baremetal baremetal/arch/aarch64/no_bootloader/exit.S \
    +  --emulator gem5 \
    +  --trace ExecAll,-ExecSymbol \
    +  --trace-stdout \
    +;
    +
    +
    +
    +

    and we see that the first instruction runs at 0x80000000:

    +
    +
    +
    +
          0: system.cpu: A0 T0 : 0x80000000
    +
    +
    +
    +

    TODO: what are the boot memory and NOR FLASH used for?
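
    The memory map and the trace line above can be tied together with a minimal Python sketch. The region table is copied from the map quoted above, with the DRAM end left open because the map does not give it. The regular expression only covers the part of the ExecAll line shown above, and the names REGIONS, region_of and parse_exec_line are hypothetical.

```python
import re

# (start, end, name); end=None means the upper bound is not given in the map.
REGIONS = [
    (0x00000000, 0x03FFFFFF, "Boot memory (CS0)"),
    (0x04000000, 0x07FFFFFF, "Reserved"),
    (0x08000000, 0x0BFFFFFF, "NOR FLASH0 (CS0 alias)"),
    (0x0C000000, 0x0FFFFFFF, "NOR FLASH1 (Off-chip, CS4)"),
    (0x80000000, None, "DRAM"),
]

# Matches "<tick>: <object>: A<n> T<n> : <hex pc>" at the start of a line.
EXEC_RE = re.compile(r"^\s*(\d+): (\S+): A\d+ T\d+ : (0x[0-9a-fA-F]+)")


def region_of(addr):
    """Return the name of the region containing addr, or None if unmapped."""
    for start, end, name in REGIONS:
        if addr >= start and (end is None or addr <= end):
            return name
    return None


def parse_exec_line(line):
    """Return (tick, object name, pc) for a trace line, or None if no match."""
    match = EXEC_RE.match(line)
    if match is None:
        return None
    tick, obj, pc = match.groups()
    return int(tick), obj, int(pc, 16)


tick, obj, pc = parse_exec_line("      0: system.cpu: A0 T0 : 0x80000000")
print(tick, obj, hex(pc), region_of(pc))

# getvar prints the entry address in decimal; this is the same number.
print(2147483648 == 0x80000000)
```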

    +
    +
    +
    +
    +

    19.3.4. gem5 CommMonitor

    +
    +

    You can place this SimObject in between two ports to get extra statistics about the packets that are going through.

    +
    +
    +

    It only works on timing requests, and does not seem to dump any memory values, only add extra statistics.

    +
    +
    +

    For example, the patch patches/manual/gem5-commmonitor-se.patch hacks a CommMonitor between the CPU and the L1 cache on top of gem5 1c3662c9557c85f0d25490dc4fbde3f8ab0cb350:

    +
    +
    +
    +
    patch -d "$(./getvar gem5_source_dir)" -p 1 < patches/manual/gem5-commmonitor-se.patch
    +
    +
    +
    +

    That patch was done largely by copying what fs.py --memcheck does with a MemChecker object.

    +
    +
    +

    You can then run with:

    +
    +
    +
    +
    ./run \
    +  --arch aarch64 \
    +  --emulator gem5 \
    +  --userland userland/arch/aarch64/freestanding/linux/hello.S \
    +  -- \
    +  --caches \
    +  --cpu-type TimingSimpleCPU \
    +;
    +
    +
    +
    +

    and now we have some new extra histogram statistics such as:

    +
    +
    +
    +
    system.cpu.dcache_mon.readBurstLengthHist::samples            1
    +
    +
    +
    +

    One neat thing about this is that it is agnostic to the memory object type, so you don’t have to recode those statistics for every new type of object that operates on memory packets.
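
    To pull only the monitor statistics out of m5out/stats.txt, a minimal Python sketch follows. It assumes that each statistic line has the form name, whitespace, value, optionally followed by a # comment, and that monitor object names contain _mon. as in the sample above. The function name monitor_stats is hypothetical.

```python
def monitor_stats(stats_text, marker="_mon."):
    """Return {stat name: value string} for lines whose name contains marker."""
    found = {}
    for line in stats_text.splitlines():
        # Drop the trailing "# description" comment, if any, then tokenize.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and marker in fields[0]:
            found[fields[0]] = fields[1]
    return found


sample = "system.cpu.dcache_mon.readBurstLengthHist::samples            1\n"
print(monitor_stats(sample))
```

    Values are kept as strings on purpose, since the sketch makes no claim about which statistics are integers and which are not.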

    +
    +
    +
    +

    19.3.5. gem5 disk and network latency

    TODO These look promising:

    @@ -19625,8 +19762,8 @@ get_avphys_pages() * sysconf(_SC_PAGESIZE) = 0x1D178000

    and also: gem5-dist: https://publish.illinois.edu/icsl-pdgem5/

    -
    -
    19.2.2.5. gem5 clock frequency
    +
    +

    19.3.6. gem5 clock frequency

    As of gem5 872cb227fdc0b4d60acc7840889d567a6936b6e1, the clock defaults to 2GHz for fs.py:

    @@ -19710,9 +19847,8 @@ hello
    -
    -

    19.3. gem5 kernel command line parameters

    +

    19.4. gem5 kernel command line parameters

    Analogous to QEMU:

    @@ -19745,9 +19881,9 @@ hello
    -

    19.4. gem5 GDB step debug

    +

    19.5. gem5 GDB step debug

    -

    19.4.1. gem5 GDB step debug kernel

    +

    19.5.1. gem5 GDB step debug kernel

    Analogous to QEMU, on the first shell:

    @@ -19780,7 +19916,7 @@ hello
    -

    19.4.2. gem5 GDB step debug userland process

    +

    19.5.2. gem5 GDB step debug userland process

    We are unable to use gdbserver because of networking as mentioned at: Section 14.3.1.3, “gem5 host to guest networking”

    @@ -19815,7 +19951,7 @@ hello
    -

    19.4.3. gem5 GDB step debug secondary cores

    +

    19.5.3. gem5 GDB step debug secondary cores

    gem5’s secondary core GDB setup is a hack and spawns one gdbserver for each core in separate ports, e.g. 7000, 7001, etc.
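
    The port layout can be sketched in Python as follows. The base port 7000 comes from the example above; that ports are always strictly consecutive per core is an assumption, and the helper name gdb_port is hypothetical.

```python
def gdb_port(core, base_port=7000):
    """Port of the gdbserver for a given core, assuming consecutive ports."""
    if core < 0:
        raise ValueError("core index must be non-negative")
    return base_port + core


# Print the GDB command that would attach to each core of a 2 core system.
for core in range(2):
    print(f"core {core}: target remote localhost:{gdb_port(core)}")
```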

    @@ -19836,7 +19972,7 @@ hello
    -

    19.5. gem5 checkpoint

    +

    19.6. gem5 checkpoint

    Analogous to QEMU’s Snapshot, but better since it can be started from inside the guest, so we can easily checkpoint after a specific guest event, e.g. just before init is done.

    @@ -19924,7 +20060,7 @@ m5 checkpoint

    since boot has already happened, and the parameters are already in the RAM of the snapshot.

    -

    19.5.1. gem5 checkpoint userland minimal example

    +

    19.6.1. gem5 checkpoint userland minimal example

    In order to debug checkpoint restore bugs, this minimal setup using userland/freestanding/gem5_checkpoint.S can be handy:

    @@ -19978,7 +20114,7 @@ Exiting @ tick 84500 because m5_exit instruction encountered
    -

    19.5.2. gem5 checkpoint internals

    +

    19.6.2. gem5 checkpoint internals

    A quick way to get a gem5 syscall emulation mode or full system checkpoint to observe is:

    @@ -20029,7 +20165,7 @@ prvEvalTick=0
    -

    19.5.3. gem5 checkpoint restore and run a different script

    +

    19.6.3. gem5 checkpoint restore and run a different script

    You want to automate running several tests from a single pristine post-boot state.

    @@ -20177,7 +20313,7 @@ expect eof
    -

    19.5.4. gem5 restore checkpoint with a different CPU

    +

    19.6.4. gem5 restore checkpoint with a different CPU

    gem5 can switch to a different CPU model when restoring a checkpoint.

    @@ -20292,7 +20428,7 @@ cat "$(./getvar --arch aarch64 --emulator gem5 trace_txt_file)"
    -
    19.5.4.1. gem5 fast forward
    +
    19.6.4.1. gem5 fast forward

    Besides switching CPUs after a checkpoint restore, fs.py also has the --fast-forward option to automatically run the script from the start on a less detailed CPU, and switch to a more detailed CPU at a given tick.

    @@ -20418,7 +20554,7 @@ FullO3CPU: Ticking main, FullO3CPU.
    -

    19.5.5. gem5 checkpoint upgrader

    +

    19.6.5. gem5 checkpoint upgrader

    The in-tree util/cpt_upgrader.py is a tool to upgrade checkpoints taken from an older version of gem5 to be compatible with the newest version, so you can update gem5 without having to re-run the simulation that generated the checkpoints.

    @@ -20454,7 +20590,7 @@ version_tags=arm-ccregs arm-contextidr-el2 arm-gem5-gic-ext ...
    -

    19.6. Pass extra options to gem5

    +

    19.7. Pass extra options to gem5

    Remember that in the gem5 command line, we can either pass options to the script being run as in:

    @@ -20511,7 +20647,7 @@ version_tags=arm-ccregs arm-contextidr-el2 arm-gem5-gic-ext ...
    -

    19.7. m5ops

    +

    19.8. m5ops

    m5ops are magic instructions which lead gem5 to do magic things, like quitting or dumping stats.

    @@ -20551,7 +20687,7 @@ version_tags=arm-ccregs arm-contextidr-el2 arm-gem5-gic-ext ...
    -

    19.7.1. gem5 m5 executable

    +

    19.8.1. gem5 m5 executable

    m5 is a guest command line utility that is installed and run on the guest, that serves as a CLI front-end for the m5ops

    @@ -20581,7 +20717,7 @@ version_tags=arm-ccregs arm-contextidr-el2 arm-gem5-gic-ext ...

    This can be a good test of m5ops since it executes very quickly.

    -
    19.7.1.1. m5 exit
    +
    19.8.1.1. m5 exit

    End the simulation.

    @@ -20590,13 +20726,13 @@ version_tags=arm-ccregs arm-contextidr-el2 arm-gem5-gic-ext ...
    -
    19.7.1.2. m5 dumpstats
    +
    19.8.1.2. m5 dumpstats

    Makes gem5 dump one more statistics entry to the gem5 m5out/stats.txt file.
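
    One way to check that an extra entry was really dumped is to count the entries in stats.txt. The minimal Python sketch below assumes that every entry starts with a header line containing the text Begin Simulation Statistics; the function name count_stat_dumps is hypothetical.

```python
def count_stat_dumps(stats_text):
    """Count statistics entries by counting their assumed header lines."""
    return sum(
        1
        for line in stats_text.splitlines()
        if "Begin Simulation Statistics" in line
    )


# Two made-up entries, shaped like the assumed stats.txt layout.
sample = (
    "---------- Begin Simulation Statistics ----------\n"
    "sim_ticks 1000\n"
    "---------- End Simulation Statistics   ----------\n"
    "---------- Begin Simulation Statistics ----------\n"
    "sim_ticks 2000\n"
    "---------- End Simulation Statistics   ----------\n"
)
print(count_stat_dumps(sample))
```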

    -
    19.7.1.3. m5 fail
    +
    19.8.1.3. m5 fail

    End the simulation with a failure exit event:

    @@ -20635,7 +20771,7 @@ version_tags=arm-ccregs arm-contextidr-el2 arm-gem5-gic-ext ...
    -
    19.7.1.4. m5 writefile
    +
    19.8.1.4. m5 writefile

    Send a guest file to the host. 9P is a more advanced alternative.

    @@ -20666,7 +20802,7 @@ m5 writefile myfileguest myfilehost
    -
    19.7.1.5. m5 readfile
    +
    19.8.1.5. m5 readfile

    Read a host file pointed to by the fs.py --script option to stdout.

    @@ -20694,7 +20830,7 @@ m5 writefile myfileguest myfilehost
    -
    19.7.1.6. m5 initparam
    +
    19.8.1.6. m5 initparam

    Ermm, just another m5 readfile that only takes integers and only from CLI options? Is this software so redundant?

    @@ -20720,7 +20856,7 @@ m5 writefile myfileguest myfilehost
    -
    19.7.1.7. m5 execfile
    +
    19.8.1.7. m5 execfile

    Trivial combination of m5 readfile + execute the script.

    @@ -20755,7 +20891,7 @@ m5 execfile
    -

    19.7.2. m5ops instructions

    +

    19.8.2. m5ops instructions

    gem5 allocates some magic instructions on unused instruction encodings for convenient guest instrumentation.

    @@ -20842,7 +20978,7 @@ m5 execfile
    -
    19.7.2.1. m5ops instructions interface
    +
    19.8.2.1. m5ops instructions interface

    Let’s study how the gem5 m5 executable uses them:

    @@ -20956,7 +21092,7 @@ m5_fail(ints[1], ints[0]);
    -
    19.7.2.2. m5op annotations
    +
    19.8.2.2. m5op annotations

    include/gem5/asm/generic/m5ops.h also describes some annotation instructions.

    @@ -20967,7 +21103,7 @@ m5_fail(ints[1], ints[0]);
    -

    19.8. gem5 arm Linux kernel patches

    +

    19.9. gem5 arm Linux kernel patches

    https://gem5.googlesource.com/arm/linux/ contains ARM Linux kernel forks with a few gem5-specific Linux kernel patches, created by ARM Holdings on top of a few upstream kernel releases.

    @@ -21047,7 +21183,7 @@ git -C "$(./getvar linux_source_dir)" checkout -

    drm: Add component-aware simple encoder allows you to see images through VNC, see: Section 13.3, “gem5 graphic mode”

  • -

    gem5: Add support for gem5’s extended GIC mode adds support for more than 8 cores, see: Section 19.2.2.1.2, “gem5 ARM full system with more than 8 cores”

    +

    gem5: Add support for gem5’s extended GIC mode adds support for more than 8 cores, see: Section 19.3.1.2, “gem5 ARM full system with more than 8 cores”

  • @@ -21055,7 +21191,7 @@ git -C "$(./getvar linux_source_dir)" checkout -

    Tested on 649d06d6758cefd080d04dc47fd6a5a26a620874 + 1.

    -

    19.8.1. gem5 arm Linux kernel patches boot speedup

    +

    19.9.1. gem5 arm Linux kernel patches boot speedup

    We have observed that with the kernel patches, boot is 2x faster, falling from 1m40s to 50s.

    @@ -21073,7 +21209,7 @@ git -C "$(./getvar linux_source_dir)" checkout -
    -

    19.9. m5out directory

    +

    19.10. m5out directory

    When you run gem5, it generates an m5out directory at:

    @@ -21089,7 +21225,7 @@ git -C "$(./getvar linux_source_dir)" checkout -

    The files in that directory contain some very important information about the run, and you should become familiar with every one of them.

    -

    19.9.1. gem5 m5out/system.terminal file

    +

    19.10.1. gem5 m5out/system.terminal file

    Contains UART output, both from the Linux kernel or from the baremetal system.

    @@ -21098,7 +21234,7 @@ git -C "$(./getvar linux_source_dir)" checkout -
    -

    19.9.2. gem5 m5out/system.workload.dmesg file

    +

    19.10.2. gem5 m5out/system.workload.dmesg file

    This file used to be called just m5out/system.dmesg, but the name was changed after the workload refactorings of March 2020.

    @@ -21172,7 +21308,7 @@ index f296d89be757..3e79916322c2 100644
    -

    19.9.3. gem5 m5out/stats.txt file

    +

    19.10.3. gem5 m5out/stats.txt file

    This file contains important statistics about the run:

    @@ -21271,7 +21407,7 @@ system.cpu.dtb.inst_hits

    and after that the file size went down to 21KB.

    -
    19.9.3.1. gem5 HDF5 statistics
    +
    19.10.3.1. gem5 HDF5 statistics

    We can make gem5 dump statistics in the HDF5 format by adding the magic h5:// prefix to the file name as in:

    @@ -21321,7 +21457,7 @@ system.cpu.dtb.inst_hits
    -
    19.9.3.2. gem5 only dump selected stats
    +
    19.10.3.2. gem5 only dump selected stats

    TODO

    @@ -21333,7 +21469,7 @@ system.cpu.dtb.inst_hits
    -
    19.9.3.3. gem5 stats internals
    +
    19.10.3.3. gem5 stats internals

    This describes the internals of the gem5 m5out/stats.txt file.

    @@ -21407,7 +21543,7 @@ Text::end()
    -

    19.9.4. gem5 config.ini

    +

    19.10.4. gem5 config.ini

    The m5out/config.ini file contains a very good high level description of the system:

    @@ -21480,7 +21616,7 @@ clock=500

    Modifying the config.ini file manually does nothing since it gets overwritten every time.
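
    Since the file is only useful for reading, a small script can help to query it. The minimal Python sketch below uses the standard configparser module on a made-up excerpt. The section name system.clk_domain and its contents are illustrative rather than copied from a real run, and the frequency conversion assumes the default gem5 tick of 1 ps.

```python
import configparser


def read_config(text):
    """Parse config.ini style text without interpolation or key lowercasing."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the original case of the keys
    parser.read_string(text)
    return parser


# Made-up excerpt; a real config.ini has many more sections.
sample = """\
[system.clk_domain]
type=SrcClockDomain
clock=500
"""

cfg = read_config(sample)
period_ticks = int(cfg["system.clk_domain"]["clock"])

# Assumption: 1 tick = 1 ps, so a 500 tick period corresponds to 2 GHz.
freq_ghz = 1e12 / period_ticks / 1e9
print(period_ticks, freq_ghz)
```

    To run it on a real file, read the text from m5out/config.ini and pass it to read_config instead of the sample string.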

    -
    19.9.4.1. gem5 config.dot
    +
    19.10.4.1. gem5 config.dot

    The m5out/config.dot file contains a graphviz .dot file that provides a simplified graphical view of a subset of the gem5 config.ini.

    @@ -21561,7 +21697,7 @@ xdg-open "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.dot.svg"
    -

    19.10. m5term

    +

    19.11. m5term

    We use the m5term in-tree executable to connect to the terminal instead of a direct telnet.

    @@ -21586,7 +21722,7 @@ xdg-open "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.dot.svg"
    -

    19.11. gem5 Python scripts without rebuild

    +

    19.12. gem5 Python scripts without rebuild

    We have made a crazy setup that allows you to just cd into submodules/gem5, and edit Python scripts directly there.

    @@ -21620,7 +21756,7 @@ xdg-open "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.dot.svg"
    -

    19.12. gem5 fs_bigLITTLE

    +

    19.13. gem5 fs_bigLITTLE

    By default, we use the configs/example/fs.py script.

    @@ -21669,7 +21805,7 @@ xdg-open "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.dot.svg"
    -

    19.13. gem5 in-tree tests

    +

    19.14. gem5 in-tree tests

    https://stackoverflow.com/questions/52279971/how-to-run-the-gem5-unit-tests

    @@ -21680,7 +21816,7 @@ xdg-open "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.dot.svg"

    But can the people from the project be convinced of that?

    -

    19.13.1. gem5 unit tests

    +

    19.14.1. gem5 unit tests

    These are just very small GTest tests that test a single class in isolation, they don’t run any executables.

    @@ -21735,7 +21871,7 @@ xdg-open "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.dot.svg"
    -

    19.13.2. gem5 regression tests

    +

    19.14.2. gem5 regression tests

    This section is about running the gem5 in-tree tests.

    @@ -21784,7 +21920,7 @@ xdg-open "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.dot.svg"
    -

    19.14. gem5 simulate() limit reached

    +

    19.15. gem5 simulate() limit reached

    This error happens when the following instruction limits are reached:

    @@ -21920,12 +22056,12 @@ Exiting @ tick 18446744073709551615 because simulate() limit reached
    -

    19.15. gem5 build options

    +

    19.16. gem5 build options

    In order to use different build options, you might also want to use gem5 build variants to keep the build outputs separate from one another.

    -

    19.15.1. gem5 debug build

    +

    19.16.1. gem5 debug build

    How to use it in LKMC: Section 18.8, “Debug the emulator”.

    @@ -21937,7 +22073,7 @@ Exiting @ tick 18446744073709551615 because simulate() limit reached
    -

    19.15.2. gem5 fast build

    +

    19.16.2. gem5 fast build

    ./build-gem5 --gem5-build-type fast
    @@ -21961,7 +22097,7 @@ Exiting @ tick 18446744073709551615 because simulate() limit reached
    -

    19.15.3. gem5 prof and perf builds

    +

    19.16.3. gem5 prof and perf builds

    Profiling builds as of 3cea7d9ce49bda49c50e756339ff1287fd55df77 both use: -g -O3 and disable asserts and logging like the gem5 fast build and:

    @@ -21989,7 +22125,7 @@ gprof "$(./getvar --arch aarch64 gem5_executable)" > tmp.gprof
    -

    19.15.4. gem5 clang build

    +

    19.16.4. gem5 clang build

    TODO test properly, benchmark vs GCC.

    @@ -22002,7 +22138,7 @@ gprof "$(./getvar --arch aarch64 gem5_executable)" > tmp.gprof
    -

    19.15.5. gem5 sanitation build

    +

    19.16.5. gem5 sanitation build

    If gem5 appears to have a C++ undefined behaviour bug, which is often very difficult to track down, you can try to build it with the following extra SCons options:

    @@ -22076,7 +22212,7 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:
    -

    19.15.6. gem5 Ruby build

    +

    19.16.6. gem5 Ruby build

    gem5 has two types of memory system:

    @@ -22212,7 +22348,7 @@ cat "$(./getvar --arch aarch64 --emulator gem5 trace_txt_file)"

    Tested in gem5 d7d9bc240615625141cd6feddbadd392457e49eb.

    -
    19.15.6.1. gem5 Ruby MI_example protocol
    +
    19.16.6.1. gem5 Ruby MI_example protocol

    This is the simplest of all protocols, and therefore the first one you should study to learn how Ruby works.

    @@ -22248,7 +22384,7 @@ cat "$(./getvar --arch aarch64 --emulator gem5 trace_txt_file)"
    -
    19.15.6.2. gem5 crossbar interconnect
    +
    19.16.6.2. gem5 crossbar interconnect

    Crossbar or XBar in the code, is the default CPU interconnect that gets used by fs.py if --ruby is not given.

    @@ -22296,7 +22432,7 @@ class SystemXBar(CoherentXBar):
    -

    19.15.7. gem5 Python 3 build

    +

    19.16.7. gem5 Python 3 build

    Python 3 support was mostly added in 2019 Q3 at around a347a1a68b8a6e370334be3a1d2d66675891e0f1 but remained buggy for some time afterwards.

    @@ -22314,7 +22450,7 @@ class SystemXBar(CoherentXBar):
    -

    19.16. gem5 CPU types

    +

    19.17. gem5 CPU types

    gem5 has a few in tree CPU models for different purposes.

    @@ -22397,9 +22533,9 @@ class SystemXBar(CoherentXBar):

    From this we see that there are basically only 4 C++ CPU models in gem5: Atomic, Timing, Minor and O3. All others are basically parametrizations of those base types.

    -

    19.16.1. List of gem5 CPU types

    +

    19.17.1. List of gem5 CPU types

    -
    19.16.1.1. gem5 BaseSimpleCPU
    +
    19.17.1.1. gem5 BaseSimpleCPU

    Simple abstract CPU without a pipeline.

    @@ -22420,7 +22556,7 @@ class SystemXBar(CoherentXBar):
    -
    19.16.1.1.1. gem5 AtomicSimpleCPU
    +
    19.17.1.1.1. gem5 AtomicSimpleCPU

    AtomicSimpleCPU: the default one. Memory accesses happen instantaneously. The fastest simulation except for KVM, but not realistic at all.

    @@ -22429,7 +22565,7 @@ class SystemXBar(CoherentXBar):
    -
    19.16.1.1.2. gem5 TimingSimpleCPU
    +
    19.17.1.1.2. gem5 TimingSimpleCPU

    TimingSimpleCPU: memory accesses are realistic, but the CPU has no pipeline. The simulation is faster than detailed models, but slower than AtomicSimpleCPU.

    @@ -22445,7 +22581,7 @@ class SystemXBar(CoherentXBar):
    -
    19.16.1.2. gem5 MinorCPU
    +
    19.17.1.2. gem5 MinorCPU

    Generic in-order superscalar core.

    @@ -22511,7 +22647,7 @@ class SystemXBar(CoherentXBar):
    -
    19.16.1.3. gem5 DerivO3CPU
    +
    19.17.1.3. gem5 DerivO3CPU

    Generic out-of-order core. "O3" stands for "Out Of Order"!

    @@ -22571,7 +22707,7 @@ wbWidth=8
    -
    19.16.1.3.1. gem5 DerivO3CPU pipeline stages
    +
    19.17.1.3.1. gem5 DerivO3CPU pipeline stages
    -
    19.16.1.3.2. gem5 util/o3-pipeview.py O3 pipeline viewer
    +
    19.17.1.3.2. gem5 util/o3-pipeview.py O3 pipeline viewer

    Mentioned at: http://www.m5sim.org/Visualization

    @@ -22624,7 +22760,7 @@ less o3pipeview.tmp.log
    -
    19.16.1.3.3. gem5 Konata O3 pipeline viewer
    +
    19.17.1.3.3. gem5 Konata O3 pipeline viewer

    https://github.com/shioyadan/Konata

    @@ -22644,7 +22780,7 @@ less o3pipeview.tmp.log
    -

    19.16.2. gem5 ARM RSK

    +

    19.17.2. gem5 ARM RSK

    https://github.com/arm-university/arm-gem5-rsk/blob/aa3b51b175a0f3b6e75c9c856092ae0c8f2a7cdc/gem5_rsk.pdf

    @@ -22654,7 +22790,7 @@ less o3pipeview.tmp.log
    -

    19.17. gem5 ARM platforms

    +

    19.18. gem5 ARM platforms

    The gem5 platform is selectable with the --machine option, which is named after the analogous QEMU -machine option, and which sets the --machine-type.

    @@ -22682,7 +22818,7 @@ less o3pipeview.tmp.log
    -

    19.18. gem5 upstream images

    +

    19.19. gem5 upstream images

    Present at:

    @@ -22736,7 +22872,7 @@ cd ..
    -

    19.19. gem5 bootloaders

    +

    19.20. gem5 bootloaders

    Certain ISAs like ARM have bootloaders that are automatically run before the main image to set up basic system state.

    @@ -22763,49 +22899,6 @@ cd ..
    -

    19.20. gem5 CommMonitor

    -
    -

    You can place this SimObject in between two ports to get extra statistics about the packets that are going through.

    -
    -
    -

    It only works on timing CPUs, and does not seem to dump any memory values, only add extra statistics.

    -
    -
    -

    For example, the patch patches/manual/gem5-commmonitor-se.patch hack a CommMonitor between the CPU and the L1 cache on top of gem5 1c3662c9557c85f0d25490dc4fbde3f8ab0cb350:

    -
    -
    -
    -
    patch -d "$(./getvar gem5_source_dir)" -p 1 < patches/manual/gem5-commmonitor-se.patch
    -
    -
    -
    -

    which you can run with:

    -
    -
    -
    -
    ./run \
    -  --arch aarch64 \
    -  --emulator gem5 \
    -  --userland userland/arch/aarch64/freestanding/linux/hello.S \
    -  -- \
    -  --caches \
    -  --cpu-type TimingSimpleCPU \
    -;
    -
    -
    -
    -

    and now we have some new extra histogram statistics such as:

    -
    -
    -
    -
    system.cpu.dcache_mon.readBurstLengthHist::samples            1
    -
    -
    -
    -

    One neat thing about this is that it is agnostic to the memory object type, so you don’t have to recode those statistics for every new type of object that operates on memory packets.

    -
    -
    -

    19.21. gem5 internals

    Internals under other sections:

    @@ -25493,8 +25586,7 @@ type=SetAssociative

    CPU0 already has that cache line (0x880) in its cache at state E of MOESI, so it snoops and moves to S. We can look up the logs to see exactly where CPU0 had previously read that address:

    -
    table: 1, dirty: 0
    -59135500: Cache: system.cpu0.icache: Block addr 0x880 (ns) moving from state 0 to state: 7 (E) valid: 1 writable: 1 readable: 1 dirty: 0 | tag: 0 set: 0x22 way: 0
    +
    59135500: Cache: system.cpu0.icache: Block addr 0x880 (ns) moving from state 0 to state: 7 (E) valid: 1 writable: 1 readable: 1 dirty: 0 | tag: 0 set: 0x22 way: 0
     59135500: CoherentXBar: system.membus: recvAtomicBackdoor: src system.membus.slave[1] packet WritebackClean [8880:88bf]
     59135500: CoherentXBar: system.membus: recvAtomicBackdoor: src system.membus.slave[1] packet WritebackClean [8880:88bf] SF size: 0 lat: 1
     59135500: DRAM: system.mem_ctrls: recvAtomic: WritebackClean 0x8880
    @@ -25627,10 +25719,11 @@ type=SetAssociative

    and so on, they just keep fighting over that address and changing one another’s state.

    - +
    +
    19.21.4.5. gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs and Ruby
    -

    Now let’s do the exact same we did for gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs, but with Ruby rather than the classic system.

    +

    Now let’s do exactly what we did for gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs, but with Ruby rather than the classic memory system, and with TimingSimpleCPU, since AtomicSimpleCPU does not work with Ruby.

    Since we have fully understood coherency in that previous example, it should now be easier to understand what is going on with Ruby:

    @@ -25645,7 +25738,7 @@ type=SetAssociative
      --trace FmtFlag,DRAM,ExecAll,Ruby \
      --userland userland/c/atomic.c \
      -- \
    -  --cpu-type AtomicSimpleCPU \
    +  --cpu-type TimingSimpleCPU \
      --ruby \
    ;
    @@ -25669,9 +25762,8 @@ non-atomic 19

    TODO

    -
    -
    19.21.4.5. gem5 event queue MinorCPU syscall emulation freestanding example analysis
    +
    19.21.4.6. gem5 event queue MinorCPU syscall emulation freestanding example analysis

    The events for the Atomic CPU were pretty simple: basically just ticks.

    @@ -25841,14 +25933,14 @@ non-atomic 19
    -
    19.21.4.5.1. gem5 event queue MinorCPU syscall emulation freestanding example analysis: hazard
    +
    19.21.4.6.1. gem5 event queue MinorCPU syscall emulation freestanding example analysis: hazard

    TODO like gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard but with the hazard.

    -
    19.21.4.6. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis
    +
    19.21.4.7. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis

    Like gem5 event queue MinorCPU syscall emulation freestanding example analysis but even more complex, since this time it is for the gem5 DerivO3CPU!

    @@ -25876,7 +25968,7 @@ non-atomic 19

    This section and children are tested at LKMC 144a552cf926ea630ef9eadbb22b79fe2468c456.

    -
    19.21.4.6.1. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazardless
    +
    19.21.4.7.1. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazardless

    Let’s have a look at the arguably simplest example userland/arch/aarch64/freestanding/linux/hazardless.S.

    @@ -26115,7 +26207,7 @@ non-atomic 19
    -
    19.21.4.6.2. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard
    +
    19.21.4.7.2. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard

    Now let’s do the same as in gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazardless but with a hazard: userland/arch/aarch64/freestanding/linux/hazard.S.

    @@ -26159,7 +26251,7 @@ non-atomic 19
    -
    19.21.4.6.3. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard4
    +
    19.21.4.7.3. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard4

    Like gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard but a hazard of depth 4: userland/arch/aarch64/freestanding/linux/hazard.S.

    @@ -26200,7 +26292,7 @@ non-atomic 19
    -
    19.21.4.6.4. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall
    +
    19.21.4.7.4. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall

    Like gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard but now with an LDR stall: userland/arch/aarch64/freestanding/linux/stall.S.

    @@ -26251,7 +26343,7 @@ non-atomic 19
    -
    19.21.4.6.5. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall-gain
    +
    19.21.4.7.5. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall-gain

    Like gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall but now with an LDR stall: userland/arch/aarch64/freestanding/linux/stall-gain.S.

    @@ -26338,7 +26430,7 @@ non-atomic 19
    -
    19.21.4.6.6. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall-hazard4
    +
    19.21.4.7.6. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall-hazard4

    Like gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall-gain but now with some dependencies after the LDR: userland/arch/aarch64/freestanding/linux/stall-hazard4.S.

    @@ -26405,7 +26497,7 @@ non-atomic 19
    -
    19.21.4.6.7. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: speculative
    +
    19.21.4.7.7. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: speculative

    Now let’s try to see some Speculative execution in action with userland/arch/aarch64/freestanding/linux/speculative.S.

    @@ -29120,7 +29212,7 @@ TODO benchmark: would gem5 suffer a considerable disk read performance hit due t

    libguestfs: https://serverfault.com/questions/246835/convert-directory-to-qemu-kvm-virtual-disk-image/916697#916697, in particular vfs-minimum-size

  • -

    use methods described at: Section 19.5.3, “gem5 checkpoint restore and run a different script” instead of putting builds on the root filesystem

    +

    use methods described at: Section 19.6.3, “gem5 checkpoint restore and run a different script” instead of putting builds on the root filesystem

  • @@ -31683,7 +31775,7 @@ xdg-open bst_vs_heap_vs_hashmap_gem5.tmp.png

    The cache sizes were chosen to match the host 2017 Lenovo ThinkPad P51 to improve the comparison. Ideally we should also use the same standard library.

    -

    Note that this will take a long time, and will produce a humongous ~40Gb stats file as explained at: Section 19.9.3.2, “gem5 only dump selected stats”

    +

    Note that this will take a long time, and will produce a humongous ~40Gb stats file as explained at: Section 19.10.3.2, “gem5 only dump selected stats”

    Sources: