2024-04-10 - Milo
Install cross compiler
To build our programs for ARM, we need to set up a cross compiler for AArch64. This is actually not too bad.
First download the binaries from ARM:
wget https://developer.arm.com/-/media/Files/downloads/gnu-a/10.3-2021.07/binrel/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu.tar.xz
Then extract them:
tar xvf gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu.tar.xz
Then we move them to an easier path:
mv gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu aarch
Now we can use the cross compiler toolchain to compile our programs.
Build programs
We need to use both the cross compiler and the regular (native) compiler to build our programs.
For both architectures, we need to make sure our binaries are static by using the -static command line option. This is required for gem5 to work.
For AArch64 we need to decide what optimization level we want from gcc. At any level from -O1 up, we can turn on auto-vectorization with -ftree-vectorize (GCC ignores most optimization flags at -O0).
For x86 we need to include an additional flag, -msse (or -msse2, not sure which).
We can get more info from the compiler by passing in the -fopt-info-vec-{missed,all} flag.
For more info on the vectorizer, go to vectorizer; for more info on getting output, go here.
2024-04-12 - Milo
ARM NEON
The official ARM NEON page has a good overview of the NEON architecture for vector instructions. It can be found here. The key takeaway is that vector instructions act on vector registers, which are either 128 bits or 64 bits wide.
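To make the register view concrete, here is a small sketch that uses the GCC/Clang vector extension rather than NEON intrinsics, so it also compiles on x86. The type name f32x4 and the function mul_add4 are hypothetical, chosen only for illustration. On AArch64 a 16-byte vector like this should map to a single 128-bit NEON register holding four 32-bit float lanes.

```c
#include <assert.h> /* static_assert (C11) */

/* GCC/Clang vector extension: four 32-bit floats packed into 16 bytes.
 * f32x4 is a hypothetical name, not something from the repo. */
typedef float f32x4 __attribute__((vector_size(16)));

static_assert(sizeof(f32x4) == 16, "f32x4 must be 128 bits wide");

/* Computes a * b + c on all four lanes with lane-wise vector arithmetic. */
f32x4 mul_add4(f32x4 a, f32x4 b, f32x4 c)
{
    return a * b + c;
}
```

Individual lanes can be read back with array-style indexing, for example r[0] for the first lane of a result r.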
2024-04-13 - Milo
First workload
I added a basic image processing task to the repo.
The file img_gray.c will convert an RGB image into grayscale by computing the luminance of each pixel.
This should be heavily vectorizable, so it should see a large improvement from vector instructions.
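A minimal sketch of what a per-pixel luminance loop might look like; this is not the actual contents of img_gray.c. It assumes packed 8-bit RGB input and the Rec. 601 luma weights (0.299, 0.587, 0.114) scaled to integers, and the function name rgb_to_gray and its signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts n_pixels packed RGB triples into one gray byte per pixel.
 * Assumed weights: Rec. 601 luma, scaled by 256 (77 + 150 + 29 = 256),
 * so the division becomes a shift. Adding 128 rounds to nearest. */
void rgb_to_gray(const uint8_t *restrict rgb, uint8_t *restrict gray,
                 size_t n_pixels)
{
    assert(rgb != NULL && gray != NULL);
    for (size_t i = 0; i < n_pixels; i++) {
        uint32_t r = rgb[3 * i];
        uint32_t g = rgb[3 * i + 1];
        uint32_t b = rgb[3 * i + 2];
        gray[i] = (uint8_t)((77 * r + 150 * g + 29 * b + 128) >> 8);
    }
}
```

Every iteration is independent of the others, which is what makes a loop of this shape a candidate for auto-vectorization.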
Build Script
Currently writing a build script to build vectorized and non-vectorized objects for x86 and ARM automatically.
Auto-vectorization
We need to decide what level of optimization we want to use with gcc. For example, with -O1 gcc will not vectorize the grayscale conversion in img_gray.c, but with -O2 it will. One thought: comparing the speedup between the vectorized and non-vectorized builds should give us an idea of the impact of the vector instructions.
Basic CPU Configuration
I copied the given class configuration for the CPU microarchitectural details. It uses an Intel Skylake style cache hierarchy. We should think about whether the actual details matter.
Additionally, the simple.py script takes two CLI arguments that select the binary and the ISA. The first is the binary we want to run, and the second is x86 for x86 or arm for ARM.
2024-04-15 - Milo
Simulation Script
Created a script run_sim.sh which will run 4 simulations based on the naming that we use in the build script.
It takes one argument and runs x86 and ARM sims for both the vectorized and non-vectorized builds.
It saves the results in the format: sim-name-arch-[n]vec.
External Benchmarks
Found an old, freely available benchmark suite: Livermore Loops. It looks like a good choice for some numerical algorithms. Some of the loops can be vectorized and others cannot. The C source code can be found here.
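To illustrate the difference, here is a sketch of two kernels in the form they take in the commonly distributed C version of the suite. They are written from memory, so they should be checked against the actual source, and the function names and signatures are hypothetical. Kernel 1 has independent iterations, while kernel 5 carries a dependence from one iteration to the next.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel 1 (hydro fragment): each x[k] depends only on inputs, so
 * iterations can run in any order. z must hold at least n + 11 values. */
void hydro_fragment(double *restrict x, const double *restrict y,
                    const double *restrict z, size_t n,
                    double q, double r, double t)
{
    assert(x != NULL && y != NULL && z != NULL);
    for (size_t k = 0; k < n; k++)
        x[k] = q + y[k] * (r * z[k + 10] + t * z[k + 11]);
}

/* Kernel 5 (tri-diagonal elimination): x[i] needs the x[i - 1] computed
 * by the previous iteration, so the loop has to run in order. */
void tridiag_elimination(double *x, const double *y, const double *z,
                         size_t n)
{
    assert(x != NULL && y != NULL && z != NULL);
    for (size_t i = 1; i < n; i++)
        x[i] = z[i] * (y[i] - x[i - 1]);
}
```

Building both with -fopt-info-vec-missed should show the compiler reporting why it gives up on the second loop.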
2024-04-16 - Davis
Second workload
I created a vector addition file called vecAddProj.c and added it to the repo. My thought is that since it simply adds two vectors, there should be a significant speedup between runs with and without vectors. This should serve as a proof of concept.
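A sketch of the kind of loop this workload boils down to; it is not the actual vecAddProj.c. The name vec_add and the float element type are assumptions. The restrict qualifiers tell the compiler that the arrays do not overlap, which removes the need for runtime overlap checks before vectorizing.

```c
#include <assert.h>
#include <stddef.h>

/* Element-wise c = a + b over n floats. No iteration reads a value
 * written by another, so the loop can be vectorized as is. */
void vec_add(const float *restrict a, const float *restrict b,
             float *restrict c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

With 128-bit vector registers and 32-bit floats, an ideal vectorized version handles four elements per instruction, which is why a clear speedup is a reasonable expectation here.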
After Milo added the header and template, I used them to create files for all the Livermore loops. I then created a directory to store a .tar.gz file with all the loops in it.
2024-04-16 - Milo
Livermore Loops
I created a header that should set up the main variables for each benchmark. I also added a template, hydro.c, that we can use to add the other loops.
2024-04-17 - Milo
GCC Version
I found a gcc 11.3 version of the ARM toolchain. 11.4 was a bug-fix release, so the performance should be comparable. The link to download can be found here.
LLoops
I removed all the prints from the Livermore loops and moved the C sources up.
Added a build script that builds all the programs in a dir for all configurations. Had to add "-lm" to the build script to link the math library needed for some of the programs. This doesn't work, so I tried linking libm, but I'm not sure whether that worked either.
Scripts
Added build_l.sh, verify.sh, and clean.sh. These scripts deal with building and cleaning the binaries for the Livermore loops.
Out
Added the initial results. These SHOULD NOT BE TRUSTED. We need to go through them and see what is going on first.