
2024-04-10 - Milo

Install cross compiler

To run our programs on ARM, we need to set up a cross compiler for AArch64. This is actually not too bad.

First download the binaries from ARM: wget https://developer.arm.com/-/media/Files/downloads/gnu-a/10.3-2021.07/binrel/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu.tar.xz

Then extract them: tar xvf gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu.tar.xz

Then move them to an easier path: mv gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu aarch

Now we can use the cross compiler toolchain to compile our programs.

Build programs

We need both the cross compiler and the regular (native) compiler to build our programs. For both architectures, we need to make sure our binaries are statically linked by using the -static command line option. gem5 needs this to run them.

For AArch64 we need to decide what optimization level we want from gcc. Regardless of optimization level, we can turn on auto-vectorization with -ftree-vectorize.

For x86 we may need an additional flag, -msse or -msse2 (not sure which). On x86-64, gcc enables SSE2 by default, so the flag should only matter for 32-bit builds.

We can get more info from the compiler by passing in the -fopt-info-vec-{missed,all} flag.
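
As a quick way to try these flags, here is a minimal sketch of a loop that the vectorizer can usually handle. The function name saxpy and the idea of putting it in its own file are made up for illustration; neither is part of the repo.

```c
#include <stddef.h>

/* Hypothetical test kernel (not part of the repo): y[i] = a * x[i] + y[i].
 * `restrict` promises the compiler that x and y do not overlap, which
 * removes one common reason for the vectorizer to give up or to add a
 * runtime alias check. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

Compiling this with something like gcc -O2 -ftree-vectorize -fopt-info-vec-all -c saxpy.c should print a note for each loop saying whether it was vectorized. Swapping in -fopt-info-vec-missed should print only the loops that were not vectorized.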

For more info on the vectorizer, see vectorizer; for more info on getting compiler output, see here.

2024-04-12 - Milo

ARM NEON

The official ARM NEON page has a good overview of the NEON architecture for vector instructions. It can be found here. The key takeaway is that vector instructions act on vector registers (128-bit or 64-bit).
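
To make the register width concrete, here is a small sketch using the NEON intrinsics from arm_neon.h. A 128-bit register holds four 32-bit floats, so each loop iteration handles four elements. The helper name add_f32 is hypothetical, and the scalar fallback is only there so the same file also builds with the native x86 compiler.

```c
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not from the repo): out[i] = a[i] + b[i]. */
void add_f32(size_t n, const float *a, const float *b, float *out)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* One 128-bit Q register holds four 32-bit floats. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load 4 floats */
        float32x4_t vb = vld1q_f32(b + i);      /* load 4 floats */
        vst1q_f32(out + i, vaddq_f32(va, vb));  /* add lanes, store 4 floats */
    }
#endif
    /* Scalar tail; this is also the whole loop when NEON is unavailable. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

This is what we hope the auto-vectorizer generates for us, so writing intrinsics by hand should only be needed as a reference point.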

2024-04-13 - Milo

First workload

I added a basic image processing task to the repo. The file img_gray.c converts an RGB image to grayscale by computing the luminance of each pixel. Each pixel is computed independently, so the loop should be heavily vectorizable and should see a large speedup from vector instructions.
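
For reference, a per-pixel luminance loop could look roughly like this. This is a sketch, not the contents of img_gray.c: the function name rgb_to_gray, the interleaved 8-bit RGB layout, and the choice of Rec. 601 weights in fixed point are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch only: the real img_gray.c may use a different layout or weights.
 * Assumes interleaved 8-bit RGB input (R, G, B, R, G, B, ...).
 * The weights 77, 150, 29 approximate the Rec. 601 luma coefficients
 * (0.299, 0.587, 0.114) scaled by 256, and they sum to exactly 256. */
void rgb_to_gray(const uint8_t *restrict rgb, uint8_t *restrict gray,
                 size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels; i++) {
        uint32_t r = rgb[3 * i + 0];
        uint32_t g = rgb[3 * i + 1];
        uint32_t b = rgb[3 * i + 2];
        /* Weighted sum; +128 rounds to nearest before dividing by 256. */
        gray[i] = (uint8_t)((77u * r + 150u * g + 29u * b + 128u) >> 8);
    }
}
```

Each output pixel depends only on its own three input bytes, which is why this kind of loop is a good candidate for auto-vectorization.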

Build Script

Currently writing a build script to build vectorized and non-vectorized objects for x86 and ARM automatically.

Auto-vectorization

We need to decide what optimization level to use with gcc. For example, with -O1, gcc does not vectorize the grayscale conversion in img_gray.c, but with -O2, it does. One thought: if we compare the run time of the vectorized and non-vectorized builds at the same optimization level, the speedup should give us an idea of the impact of the vector instructions.
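
One way to do that comparison is sketched below, assuming we build the same source twice (once with -ftree-vectorize and once with -fno-tree-vectorize) and time the kernel in each binary. The names time_kernel, speedup, example_kernel and work_buf are hypothetical, and inside gem5 the simulator's own statistics are probably a better measure than clock().

```c
#include <time.h>

/* Hypothetical stand-in for the real workload. The buffer has external
 * linkage so the compiler cannot drop the stores as unobservable. */
float work_buf[1024];

void example_kernel(void)
{
    for (int i = 0; i < 1024; i++) {
        work_buf[i] = work_buf[i] * 0.5f + 1.0f;
    }
}

/* CPU seconds spent in `reps` back-to-back calls of `kernel`. */
double time_kernel(void (*kernel)(void), int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        kernel();
    }
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Speedup of the vectorized build over the non-vectorized build. */
double speedup(double t_scalar, double t_vector)
{
    return t_scalar / t_vector;
}
```

Each binary would report its own time_kernel result, and the two numbers can then be combined with speedup. A value above 1 means the vectorized build was faster.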