Lab 7 — C programming

Template for Lab 7 with instructions for how to begin. More experience with C programming, and with using continuous integration via GitHub Actions.

Pointers and debugging SEGFAULT

A value, such as a number or the contents of a struct, can have one variable that it is assigned to, and it can have any number of pointers referring to it.

      int x = 100; /* create a int variable. this holds a `value` */
int *xp; /* create a variable that can point to an int. this `points` to a value */
xp = &x; /* make `xp` point to `x` by getting `x`'s memory address with the & operator */
*xp = 5; /* dereference `xp` with * and set its contents to 5. this sets x = 5. */

/* Unlike in Python, assigning one struct to another creates a copy, for the
 * reason given above */
struct Data d1;
d1.value = 20;
struct Data d2 = d1;

/* Modifying d2 does not change d1 */
d2.value = 50;
assert(d1.value != 50);

/* A pointer refers to the struct in memory, and can modify it by dereferencing,
   just like how `xp` modifies `x`. Use the -> operator to modify a struct ptr */
struct Data *dptr = &d1;
dptr->value = 50;
assert(d1.value == 50);

A pointer can be a single object, or it can be an array of objects.

      /* malloc gives us a memory address that is big enough to hold 3 integers.
 * the argument to malloc is the bytes of memory to alloc, which we can
 * calculate as 3 times the size of an int */
int *x = malloc(3 * sizeof int);
x[0] = 5;
x[1] = 6;
x[2] = 7;

/* whenever we allocate memory we should try to free it: */
free(x);
x = NULL; /* the address given to x by malloc is no longer valid so set it to NULL */

/* strings have always been just arrays of `char` */
char *string = "Hello, world!"
/* string[0] == 'H' */

A very common use-case for pointers is passing a struct to a function that we may want the function to modify. The default behavior in C when a struct is passed as a parameter is to copy the struct. This means if we change the struct inside a function, the struct outside the function won't have been modified. If we pass a pointer to the struct instead, then the function will modify the struct itself. That's why I pass a pointer to Array in the insert_array function for assignment 4. The Data struct can be passed as a value because it isn't modified by the function.

      void insert_array(struct Array *array, struct Data data) {
        if (array == NULL) {
            /* some error message about the array being uninitialized */
            return;
        }
        if (array->length == 64) {
            /* handle the case where we've overfilled the array */
        } else {
            int index = array->length;
            array->array[index] = data;
            array->length += 1;
        }
    }

Using an address sanitizer

We can catch many pointer errors in our programs if we use an address sanitizer. This is a library we build into our program to perform some checks every time we access memory, either through a pointer directly, or through an array index, which is really just accessing a pointer at an offset under the hood. For example, we might have a pointer that we don't actually make point to anything. C will put a random value into it, and when we try to access it, that random value will likely not point to anything in our program so the OS will kill it and report a SEGFAULT (Segmentation fault).

The OS makes sure any memory accesses our program makes actually are to addresses allocated to the program. If we try to access any memory whatsoever, the OS kills our program to protect the integrity of the system. Otherwise you could have processes accidentally changing the memory of other processes and causing all sorts of errors. So, for example:

/* create an integer pointer and try to dereference it */
int *x;
*x = 500; /* `x` doesn't point to anything in the program so SEGFAULT! */

We create an integer pointer and then immediately dereference it and set its contents to 500. Because the memory address in the pointer is random, it doesn't point into our process so the OS kills it with a SEGFAULT. We could also directly set the memory address of a pointer like:

      
/* a pointer is just a number, so it's valid to cast a number to a pointer right? */
*(char*) 0 = '\0'; /* SEGFAULT! */

And the same thing will most likely happen. Both of these examples will just cause the program to crash with no useful debugging information. We can give ourselves more to work with if we compile our code with the address sanitizer that comes with gcc and clang (on macOS) like so:

      gcc program.c -fsanitize=address -g -o program

Here I use the -fsanitize=address compiler flag to link the address sanitizer, and the -g flag to build debugging information into my executable. When compile and run the code above with the address sanitizer, rather than a SEGFAULT message, we get a detailed report about where the buggy pointer dereference occurred.


AddressSanitizer:DEADLYSIGNAL
=================================================================
==28220==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000107520f19 bp 0x7ff7b89e2800 sp 0x7ff7b89e27e0 T0)
==28220==The signal is caused by a WRITE memory access.
==28220==Hint: address points to the zero page.
    #0 0x107520f19 in main example.c:5
    #1 0x7ff802122417 in start+0x767 (dyld:x86_64+0xfffffffffff6e417) (BuildId: be14380c5e1836409fbe851beb014bba32000000200000000100000000070d00)

==28220==Register values:
rax = 0x0000000000000000  rbx = 0x0000000107520ee0  rcx = 0x00007ff7b89e2b50  rdx = 0x00007ff7b89e2ac0  
rdi = 0x0000000000000001  rsi = 0x00007ff7b89e2ab0  rbp = 0x00007ff7b89e2800  rsp = 0x00007ff7b89e27e0  
 r8 = 0x000000010751d3c0   r9 = 0x0000000000000000  r10 = 0x0000000000004000  r11 = 0x0000000000040000  
r12 = 0x00007ff7b89e29c8  r13 = 0x00007ff7b89e2a40  r14 = 0x00007ff7b89e2a00  r15 = 0x00007ff7b89e2890  
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV ex.c:5 in main
==28220==ABORTING

Here the useful part is the line telling us that the dereference occurred in example.c at line 5. It's usually enough to go back to that line and then trace where the pointer came from and what values it possibly could have to be able to diagnose the issue. In this case it's obvious because we're initializing the pointer to zero and then dereferencing it, but in real code it's most likely the pointer was never set to a value and then passed to a function or into a struct.

CMake and GitHub Action setup

The following is documentation for how the tests are set up with CMake and ran with GitHub Actions.

The CMakeLists.txt at the root of the repository creates a library target named libassignment of all source files not containing the main function. This way any functions other than main are available to the executable made for running the unit tests in the tests/ folder by linking to libassignment. It also adds tests/ as a subdirectory so that when CMake is run, it will look in that directory for more build instructions.

cmake_minimum_required(VERSION 3.13)
project(EECS348-Lab-7)

if (CMAKE_VERSION VERSION_GREATER_EQUAL "3.24.0")
    cmake_policy(SET CMP0135 NEW)
endif()

file(GLOB src *.c *.cpp)
list(FILTER src EXCLUDE REGEX ".*main\\.c(c|pp|xx)?")
add_subdirectory(tests)
add_library(assignment ${src})

The CMakeLists.txt in tests/ downloads the googletest repository and compiles it. It then compiles an executable with unit tests and links to libassignment to build your functions into the tests. The file mostly follows the example in https://google.github.io/googletest/quickstart-cmake.html.

include(FetchContent)
FetchContent_Declare(
  googletest
  # branch 1.16.x
  URL https://github.com/google/googletest/archive/e88cb95b92acbdce9b058dd894a68e1281b38495.zip
)
FetchContent_MakeAvailable(googletest)

enable_testing()
add_executable(tests tests.cpp)
target_include_directories(tests PRIVATE ..)
target_link_libraries(tests assignment GTest::gtest_main)
include(GoogleTest)
gtest_discover_tests(tests)

The workflow is found in .github/workflows/gtest.yml. It is triggered by any pushes to the main branch of your repository. Under jobs is the test job. It defines Ubuntu Linux as the environment to be ran in, and lists the steps to building and running googletest. Under the job named Build googletest are cmake instructions to execute the CMakeLists.txt files. Under the job named Run googletest is an export command to set the environment variable GTEST_COLOR to on, and then to change directory into the build directory and run the tests.

  name: Test
on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build googletest
        run: |
          cmake -S . -B build -DBUILD_GMOCK=OFF
          cmake --build build
      - name: Run googletest
        run: |
          export GTEST_COLOR=1
          cd build/tests && ctest --output-on-failure

An example of a failing job run can be found at https://github.com/hrhwilliams/EECS348_Lab-7-template/actions/runs/13823097362/job/38672730358. There, if you look at the Build googletest step you can see a log reporting why the build failed. If you look at lines 36-38, and 40-43, you can see the build tool reports undefined references to all of the functions the test file wants to call, because none of them are defined yet in any of the .c files in the template.