Milo Craun
5504-research

Repository



2024-04-10 - Milo

Install cross compiler
To run our programs on ARM we need to set up a cross
compiler for AArch64. This is actually not too bad.
First download the binaries from ARM:
wget https://developer.arm.com/-/media/Files/downloads/gnu-a/10.3-2021.07/binrel/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu.tar.xz
Then extract them:
tar xvf gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu.tar.xz
Then move them to a shorter path:
mv gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu aarch
Now we can use the cross compiler toolchain to compile our programs.

Build programs
We need to use both the cross compiler and the regular compiler to build our programs.
For both architectures, we need to make sure our binaries are statically linked by passing the -static
command line option. gem5 needs this to run them.
For AArch64 we need to decide what optimization level we want from gcc.
Regardless of optimization level we can turn on auto-vectorization with
-ftree-vectorize
For x86 we may need an additional flag, -msse or -msse2 (not sure which yet; on 64-bit x86 targets gcc already enables SSE2 by default).
We can get more info from the compiler by passing in the -fopt-info-vec-{missed,all} flag.
For more info on the vectorizer, see the GCC vectorizer page, and for more info on getting this output, see the GCC documentation for -fopt-info.
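As a small illustration of what these flags report, the sketch below pairs a loop that gcc can usually vectorize with one that carries a dependency between iterations. The file name vec_demo.c, the function names, and the build commands in the comment are assumptions for illustration, not files or commands from the repo.

```c
/*
 * vec_demo.c (hypothetical file name)
 *
 * Possible commands to see the vectorizer report, assuming the
 * toolchain was moved to ./aarch as described above:
 *
 *   aarch/bin/aarch64-none-linux-gnu-gcc -O2 -ftree-vectorize -fopt-info-vec-all -c vec_demo.c
 *   gcc -O2 -ftree-vectorize -fopt-info-vec-missed -c vec_demo.c
 *
 * -static only matters when linking the final binary, so it is
 * left out of these compile-only commands.
 */
#include <stddef.h>

/* Independent iterations: a typical candidate for auto-vectorization. */
void saxpy(float a, const float *restrict x, float *restrict y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Each iteration reads the result of the previous one, so the loop
 * carries a dependency and is not expected to vectorize as written. */
void prefix_sum(float *v, size_t n)
{
    for (size_t i = 1; i < n; i++)
        v[i] = v[i] + v[i - 1];
}
```

With -fopt-info-vec-missed, the second loop is the kind we would expect to show up in the report, though the exact messages depend on the gcc version.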

2024-04-12 - Milo

ARM NEON
The official ARM NEON page has a good overview of the NEON architecture for vector instructions.
It can be found here.
The key takeaway is that vector instructions operate on vector registers, which can be used as either 128-bit or 64-bit registers.

2024-04-13 - Milo

First workload
I added a basic image processing task to the repo.
The file img_gray.c will convert an RGB image into grayscale
by computing the luminance of each pixel.
This should be heavily vectorizable, so we expect a large speedup from the vector instructions.
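For reference, a per-pixel luminance loop of this kind might look like the sketch below. This is not the code in img_gray.c: the function name, the interleaved 8-bit RGB layout, and the fixed-point weights are all assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Convert interleaved 8-bit RGB pixels to 8-bit grayscale.
 * The weights 77, 150, 29 (sum 256) are a fixed-point approximation of
 * the common luma weights 0.299, 0.587, 0.114. They are an assumption
 * here and may differ from what img_gray.c uses.
 */
void rgb_to_gray(const uint8_t *restrict rgb, uint8_t *restrict gray,
                 size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels; i++) {
        uint32_t r = rgb[3 * i + 0];
        uint32_t g = rgb[3 * i + 1];
        uint32_t b = rgb[3 * i + 2];
        /* Weighted sum scaled by 256, then shifted back down. */
        gray[i] = (uint8_t)((77u * r + 150u * g + 29u * b) >> 8);
    }
}
```

Every pixel is computed independently of the others, which is why a loop like this is a natural fit for vector instructions.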

Build Script
Currently writing a build script to build vectorized and non-vectorized objects for
x86 and ARM automatically.

Auto-vectorization
We need to decide what level of optimization we want to do with gcc.
For example, at -O1 gcc does not vectorize the grayscale
conversion in img_gray.c, but at -O2 it does.
One thought is that comparing the vectorized and non-vectorized builds, and looking at the speedup between them, might give us an idea of the impact of the vector instructions.
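One way to make this comparison concrete is to compute the ratio of the two simulated run times for each ISA and then compare the ratios. A minimal sketch, where the function name and the choice of time unit (for example simulated ticks) are assumptions:

```c
/*
 * Speedup of the vectorized build over the non-vectorized build.
 * Both inputs are simulated run times in the same unit.
 * Returns 0.0 if the vectorized time is not positive.
 */
double vec_speedup(double novec_time, double vec_time)
{
    if (vec_time <= 0.0)
        return 0.0;
    return novec_time / vec_time;
}
```

A value above 1.0 means the vectorized build ran faster. Computing this once for x86 and once for ARM gives two numbers that can be compared directly.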

Basic CPU Configuration
I copied the given class configuration for the CPU microarchitectural details.
It uses an Intel Skylake style cache hierarchy.
We should think about whether the actual details matter.
Additionally, we can select the binary to run and the ISA
through CLI arguments to the simple.py script.
The first argument is the binary we want to run, and the second is the ISA: x86 or arm.

2024-04-15 - Milo

Simulation Script
Created a script run_sim.sh which will run 4 simulations based on the
naming that we use in the build script.
It takes one argument and runs x86 and ARM simulations for both the vectorized and non-vectorized builds.
It saves the results in the format: sim-name-arch-[n]vec.

External Benchmarks
Found an old and freely available benchmark suite, Livermore Loops.
It looks like a good choice for some numerical algorithms.
Some of the loops can be vectorized and others cannot.
The source code in C can be found here.

2024-04-16 - Davis

Second workload
I created a vector addition file called vecAddProj.c and added it to the repo. My thought is that
since it simply adds two vectors, there should be a significant speedup
between runs without and with vector instructions. This should serve as a proof of concept.
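The core of such a workload is a single element-wise loop. The sketch below is not the contents of vecAddProj.c; the function name, the float element type, and the use of restrict are assumptions for illustration.

```c
#include <stddef.h>

/*
 * Element-wise addition c[i] = a[i] + b[i].
 * restrict tells the compiler the arrays do not overlap, which removes
 * one common obstacle to auto-vectorization.
 */
void vec_add(const float *restrict a, const float *restrict b,
             float *restrict c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

Because each iteration does one load pair, one add, and one store with no dependency between iterations, any difference between the vectorized and non-vectorized runs should come almost entirely from the vector instructions.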

2024-04-16 - Milo

Livermore Loops
I created a header that should set up the main variables for each benchmark.
Added a template hydro.c that we can use to add the other loops.
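For context, the hydro fragment is the first Livermore kernel, and its loop body is a single expression over three arrays. The sketch below shows that kernel as a standalone function. The function name and parameter list are assumptions, since the layout of our header and of hydro.c is not described here.

```c
#include <stddef.h>

/*
 * Livermore kernel 1 (hydro fragment):
 *   x[k] = q + y[k] * (r * z[k + 10] + t * z[k + 11])
 * z must hold at least n + 11 elements.
 */
void hydro_kernel(double *restrict x, const double *restrict y,
                  const double *restrict z,
                  double q, double r, double t, size_t n)
{
    for (size_t k = 0; k < n; k++)
        x[k] = q + y[k] * (r * z[k + 10] + t * z[k + 11]);
}
```

The iterations are independent of each other, so this loop should be among the ones that can be vectorized.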