4
\$\begingroup\$

I wrote a generic dynamic array using the preprocessor. My use case for it is a compiler I'm writing that uses dynamic arrays a lot, and my current void pointer implementation is becoming a bit hard to use as I have to remember the type each array stores when using it. I'm wondering if this is a good readable implementation and interface.

I'm aware that using an existing library would most likely be better, but as this is for learning purposes, I want to implement all the data structures I use.

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

#define ARRAY_INITIAL_CAPACITY 8

#define Array(type) type *

// [capacity][used][data]...
#define initArray(array) do { \
    size_t *data = malloc(2 * sizeof(size_t) + ARRAY_INITIAL_CAPACITY * sizeof(*(array))); \
    data[0] =  ARRAY_INITIAL_CAPACITY; \
    data[1] = 0; \
    (array) = (typeof(*(array)) *)&data[2]; \
} while(0)

#define freeArray(array) free((size_t *)(array)-2)

#define arrayCapacity(array) ((array) ? *((size_t *)(array)-2) : 0lu)
#define arrayUsed(array) ((array) ? *((size_t *)(array)-1) : 0lu)

#define arrayEmpty(array) (arrayUsed(array) == 0)

#define arrayPush(array, value) do { \
    if(arrayUsed(array) + 1lu >= arrayCapacity(array)) { \
        size_t *data = ((size_t *)(array)-2); \
        size_t newCap = arrayCapacity(array) * 2 + 2 * sizeof(size_t); \
        data = realloc(data, newCap); \
        (array) = (typeof(*(array)) *)&data[2]; \
    } \
    size_t *used = ((size_t *)(array) - 1); \
    (array)[(*used)++] = (value); \
} while(0)

#define arrayPop(array) (assert(arrayUsed(array) > 0), (array)[--*((size_t *)(array) - 1)])
#define arrayGet(array, idx) (assert((idx) < arrayUsed(array)), (array)[(idx)])

// callback type has to be void (*)(T, U)
// where T is the type of the items in the array,
// and U is the type of the user_data argument.
#define arrayMap(array, callback, user_data) do { \
    for(size_t i = 0; i < arrayUsed(array); ++i) {\
        callback(array[i], (user_data)); \
    } \
} while(0)

#define arrayCopy(dest, src) do { \
    for(size_t i = 0; i < arrayUsed(src); ++i) { \
        arrayPush((dest), (src)[i]); \
    } \
} while(0)

Example usage:

static void callback(int item, void *user_data) {
    printf("%d\n", item);
}

int main(void) {
    Array(int) nums = NULL;
    initArray(nums);

    arrayPush(nums, 1);
    arrayPush(nums, 2);

    printf("capacity: %lu, used: %lu\n", arrayCapacity(nums), arrayUsed(nums));

    // Iterating
    for(int i = 0; i < arrayUsed(nums); ++i) {
        printf("%d\n", arrayGet(nums, i));
    }

    // Iterating in reverse
    for(int i = arrayUsed(nums)-1; i >= 0; --i) {
        printf("%d\n", nums[i]);
    }

    Array(int) nums2 = NULL;
    initArray(nums2);
    arrayCopy(nums2, nums);

    while(!arrayEmpty(nums)) {
        printf("%d\n", arrayPop(nums));
    }

    arrayMap(nums2, callback, NULL);

    freeArray(nums2);
    freeArray(nums);
    return 0;
}
New contributor
Itai Nelken is a new contributor to this site. Take care in asking for clarification, commenting, and answering. Check out our Code of Conduct.
\$\endgroup\$

2 Answers 2

1
\$\begingroup\$

General Observations

I think this is a great learning project, and asking for a review is a great idea.

The keyword typeof is not part of the current C Programming Standard and that means the code isn't portable, it can only be compiled by compilers that have provided the extension.

Generally hiding memory allocation is a bad idea, it makes the code very hard to debug and maintain.

Expecting a function such as callback() to be defined in a macro can be problematic.

While using macros in C can sometimes greatly reduce the amount of code written, it is very difficult to maintain code written this way, writing, debugging and maintaining such code is very difficult and needs to be heavily documented.

Alternate Implementation

Possibly a better way to implement this is to have a struct that contains the current capacity, the current size of the array in use and a void pointer that points to the data. Allocating the array would be a 2 step process, first allocate the struct and then allocate the data portion. Have a header file that defines the struct and function prototypes that operate on the struct. Have a C file that contains the functions that operate on the struct.

Test for Possible Memory Allocation Errors

In modern high-level languages such as C++, memory allocation errors throw an exception that the programmer can catch. This is not the case in the C programming language. While it is rare in modern computers because there is so much memory, memory allocation can fail, especially if the code is working in a limited memory application such as embedded control systems. In the C programming language when memory allocation fails, the functions malloc(), calloc() and realloc() return NULL. Referencing any memory address through a NULL pointer results in undefined behavior (UB).

Possible unknown behavior in this case can be a memory page error (in Unix this would be call Segmentation Violation), corrupted data in the program and in very old computers it could even cause the computer to reboot (corruption of the stack pointer).

To prevent this undefined behavior a best practice is to always follow the memory allocation statement with a test that the pointer that was returned is not NULL.

\$\endgroup\$
3
  • \$\begingroup\$ In the compiler I'm writing (which is the use case for this array), I have wrapper functions for all the memory allocation functions that check for NULL. here for simplicity I replaced them with the regular libc malloc, calloc, and realloc. \$\endgroup\$ 6 hours ago
  • \$\begingroup\$ Your alternate implementation is pretty much what I'm currently using. I was wondering if the benefits this macro implementation are enough to use it over even though it is harder to debug and maintain. \$\endgroup\$ 6 hours ago
  • \$\begingroup\$ @ItaiNelken If you use the macros to call your struct functions it might be okay, otherwise the maintenance problems outweigh the benefits of using the macros. Please keep in mind when posting questions we want to review the actual code. \$\endgroup\$
    – pacmaninbw
    6 hours ago
1
\$\begingroup\$

Mis-matched specifier

#define arrayCapacity(array) ((array) ? *((size_t *)(array)-2) : 0lu) is a problem as the return type will be the wider of size_t and unsigned long. Also applies to arrayUsed().

printf("capacity: %lu, used: %lu\n", arrayCapacity(nums), arrayUsed(nums)); works well when the return type is unsigned long but risks UB otherwise.

Instead, return a specific type object and use a matching specifer.

#define arrayCapacity(array) ((array) ? *((size_t *)(array)-2) : (size_t)0)

printf("capacity: %zu, used: %zu\n", arrayCapacity(nums), arrayUsed(nums));`

Superfluous l

The l in arrayUsed(array) + 1lu >= arrayCapacity(array) serves no purpose. Suggest a cleaner 1u.

Mixed sign-ness and widths

for(int i = arrayUsed(nums)-1; i >= 0; --i) { incurs problems with extreme values. Stick to one type.

for(size_t i = arrayUsed(nums); i > 0;) {
    i--;
    printf("%d\n", nums[i]);
}

Bug

This is subtle.

(array) = (typeof(*(array)) *)&data[2]; may lead to UB.

OP's code assumes that the array element's can directly follow Capacity and Used - without padding. This is not certainly true as the array type may have an alignment greater than 2*sizeof(size_t). As this condition is rare, perhaps use a static_assert to test as re-working code to code is tricky.

Magic 8 ball?

Why 8 in #define ARRAY_INITIAL_CAPACITY 8? A naked 8 deserve explanation. 1 or 0 makes more sense.

Perhaps allow caller to override?

#ifndef ARRAY_INITIAL_CAPACITY
#define ARRAY_INITIAL_CAPACITY 1
#endif

Namespace

Rather than init..., free... items, use a consistent prefix array... like all the others.

// #define initArray(array)
#define arrayInit(array)

Codes uses array..., ARRAY..., ...Array and an unnamed file to hold it all. "array" is a very common word in coding.

Suggest a more unique name like dyarray for all to reduce collisions. Use dyarray... consistently to reduce scattering of namespace usage.


A clean way to handle the alignment issue mentioned above is to have a union before the array instead of 2 size_t. (or see align_as)

typedef union {
  struct {
    size_t used;
    size_t capacity;
  } s;
  array dummy;
} array_header;

Allocate with sizeof(array_header) + sizeof(array)*n. array_header will have any needed padding to put it in front of an array of array.

\$\endgroup\$
4
  • \$\begingroup\$ I don't understand the alignment issue. I can't think of any possible type that has padding before the data (structs/unions don't have padding before the first member as far as I know). \$\endgroup\$ 4 hours ago
  • \$\begingroup\$ @ItaiNelken Extreme example: Consider array as long double * where *array may have an alignment requirement of 16 and size_t is size 4. Then (array) = (typeof(*(array)) *)&data[2]; fails to form a valid pointer for array. It really comes down to how portable you want your code. \$\endgroup\$ 4 hours ago
  • \$\begingroup\$ I just tried the array with long double * as the type and it works fine. \$\endgroup\$ 3 hours ago
  • \$\begingroup\$ @Itai Nelken, Yes long double * works on your unnamed compiler/machine, but C does not specify that the alignment needs of long double are not more that 2 size_t. It can fail other places. Again, it really comes down to how portable you want your code. Do you want to code to spec or just your target compiler/machine? \$\endgroup\$ 29 mins ago

Your Answer

Itai Nelken is a new contributor. Be nice, and check out our Code of Conduct.

By clicking “Post Your Answer”, you agree to our terms of service, privacy policy and cookie policy

Not the answer you're looking for? Browse other questions tagged or ask your own question.