Cedro

Cedro is a C language extension that works as a pre-processor with eight features:

  1. The backstitch macro x@ f(), g(y); → f(x); g(x, y); (related work).
  2. Deferred resource release auto ... or defer ... (related work).
  3. Break out of nested loops break label; (related work).
  4. Notation for array slices array[start..end] (related work).
  5. Block macros #define { ... #define }.
  6. Loop macros #foreach { ... #foreach } (related work).
  7. Binary inclusion #embed "..." (related work).
  8. Better number literals (12'34 | 12_34 → 1234, 0b1010 → 0xA).

To activate it, the source file must contain this line: #pragma Cedro 1.0
Otherwise, the file is copied directly to the output.

This line can contain certain options, for instance #pragma Cedro 1.0 defer to activate the use of defer instead of auto.

The source code (Apache 2.0 license) can be found at the library.
To compile it, see Compile.
cedro uses only standard C library functions; cedrocc and cedro-new require POSIX.

    Usage: cedro [options] <file.c>…
           cedro new <name> # Runs: cedro-new <name>
      To read from stdin, put - instead of <file.c>.
      The result goes to stdout, can be compiled without intermediate file:
        cedro file.c | cc -x c - -o file
      It is what the cedrocc program does:
        cedrocc -o file file.c
      With cedrocc, the following options are the defaults:
        --insert-line-directives

      Code gets modified only after the line `#pragma Cedro 1.0`,
      which can include certain options: `#pragma Cedro 1.0 defer`

      --apply-macros     Applies the macros: backstitch, defer, etc. (default)
      --no-apply-macros  Does not apply the macros.
      --escape-ucn       Escape non-ASCII in identifiers as UCN.
      --no-escape-ucn    Does not escape non-ASCII in identifiers. (default)
      --discard-comments    Discards the comments.
      --discard-space       Discards all whitespace.
      --no-discard-comments Does not discard comments. (default)
      --no-discard-space    Does not discard whitespace. (default)
      --insert-line-directives    Insert `#line` directives.
      --no-insert-line-directives Does not insert `#line` directives. (default)

      --embed-as-string=<limit> Use string literals instead of bytes
          for files smaller than <limit>. Default value: 0

      --c99  Produces source code for C99. (default)
             Removes the digit separators (“'” | “_”),
             converts binary literals into hexadecimal (“0b1010” → “0xA”),
             and expands `#embed`.
             It is not a translator between different C versions.
      --c89  Produces source code for C89/C90. For now it is the same as --c99.
      --c11  Produces source code for C11. For now it is the same as --c99.
      --c17  Produces source code for C17. For now it is the same as --c99.
      --c23  Produces source code for C23. For instance,
             maintains the digit separators (“'” | “_” → “'”),
             the binary literals (“0b1010” → “0b1010”),
             and does not expand the `#embed` directives.

      --print-markers    Prints the markers.
      --no-print-markers Does not print the markers. (default)

      --benchmark        Run a performance benchmark.
      --validate=ref.c   Compares the input to the given “ref.c” file.
          Does not apply any macros: to compare the result of running Cedro
          on a file, pipe its output through this option, for instance:
          `cedro file.c | cedro - --validate=ref.c`
      --version          Show version: 1.0
                         The corresponding “pragma” is: `#pragma Cedro 1.0`
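The literal conversions listed under --c99 can be pictured with a small sketch. This is not Cedro's own code: `normalize_literal` is a hypothetical helper written only for illustration, it handles a single literal at a time, and it assumes that a binary literal fits in an `unsigned long long`.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Cedro: rewrites one number literal
 * the way the --c99 description reads: digit separators (' or _) are
 * dropped, and 0b… binary literals become hexadecimal.
 * Returns 0 on success, -1 if a buffer is too small. */
int normalize_literal(const char* in, char* out, size_t out_size)
{
    char digits[128];
    size_t n = 0;

    /* Step 1: copy everything except the separators. */
    for (const char* p = in; *p; ++p) {
        if (*p == '\'' || *p == '_') continue;
        if (n + 1 >= sizeof(digits)) return -1;
        digits[n++] = *p;
    }
    digits[n] = '\0';

    /* Step 2: a binary literal is re-written in hexadecimal. */
    if (n > 2 && digits[0] == '0' && (digits[1] == 'b' || digits[1] == 'B')) {
        unsigned long long value = strtoull(digits + 2, NULL, 2);
        int written = snprintf(out, out_size, "0x%llX", value);
        return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
    }

    /* Any other literal is copied without its separators. */
    if (n + 1 > out_size) return -1;
    memcpy(out, digits, n + 1);
    return 0;
}
```

Called with "12'34" or "12_34" it produces "1234", and with "0b1010" it produces "0xA", matching the examples in the option description.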

The option --escape-ucn encodes Unicode® characters outside of the ASCII range, when they appear as part of an identifier, as C99 universal character names (“C99 standard”, page 65, “6.4.3 Universal character names”), which can be useful for older compilers without UTF-8 support such as GCC before version 10.
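As an illustration of the notation, here is a minimal sketch that formats one Unicode code point as a universal character name. The function `format_ucn` is hypothetical and not taken from Cedro, and the use of upper-case hexadecimal digits is an assumption; the standard accepts either case.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the universal character name for one
 * Unicode code point, \uXXXX up to U+FFFF and \UXXXXXXXX above that.
 * Returns 0 on success, -1 if the output buffer is too small. */
int format_ucn(unsigned long code_point, char* out, size_t out_size)
{
    int written = code_point <= 0xFFFFul
        ? snprintf(out, out_size, "\\u%04lX", code_point)
        : snprintf(out, out_size, "\\U%08lX", code_point);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

With this encoding, an identifier such as `año` would be written `a\u00F1o`, since ñ is U+00F1.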

For API documentation, see doc/api/index.html after running make doc, which requires Doxygen to be installed.

Compile

The simplest way is cc -o bin/cedro src/cedro.c, but it’s more convenient to use make:

    $ make help
    Available targets:
    release: optimized build, assertions disabled, NDEBUG defined.
      → bin/cedro*
    debug:   debugging build, assertions enabled.
      → bin/cedro*-debug
    static:  same as release, only statically linked.
      → bin/cedro*-static
    doc:     build documentation with Doxygen https://www.doxygen.org
      → doc/api/index.html
    test:    build both release and debug, then run test suite.
    check:   apply several static analysis tools:
      sparse:   https://sparse.docs.kernel.org/
      valgrind: https://valgrind.org/
      gcc -fanalyzer:
      https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html
    clean:   remove bin/ directory, and clean also inside doc/.

cedrocc

The second executable, cedrocc, allows using Cedro as if it were part of the C compiler.

    Usage: cedrocc [options] <file.c> [<file2.o>…]
      Runs Cedro on the first file name that ends with “.c”,
      and compiles the result with “cc -x c - -x none” plus the other arguments.
        cedrocc -o file file.c
        cedro file.c | cc -x c - -o file
      The options get passed as is to the compiler, except for those that
      start with “--cedro:…” that correspond to cedro options,
      for instance “--cedro:escape-ucn” is like “cedro --escape-ucn”.
      The following option is the default:
        --cedro:insert-line-directives

      Some GCC options activate Cedro options automatically:
        --std=c90|c89|iso9899:1990|iso9899:199409 → --cedro:c90
        --std=c99|c9x|iso9899:1999|iso9899:199x   → --cedro:c99
        --std=c11|c1x|iso9899:2011                → --cedro:c11
        --std=c17|c18|iso9899:2017|iso9899:2018   → --cedro:c17
        --std=c2x                                 → --cedro:c23

      In addition, for each `#include`, if it finds the file it reads it and
      if it finds `#pragma Cedro 1.0` processes it and inserts the result
      in place of the `#include`.

      You can specify the compiler, e.g. `gcc`:
        CEDRO_CC='gcc -x c - -x none' cedrocc …
      For debugging, this writes the code that would be piped into `cc`,
      into `stdout` instead:
        CEDRO_CC='' cedrocc …

If you get an error message like “embedding a directive within macro arguments is not portable” (GCC) or “embedding a directive within macro arguments has undefined behavior” (clang), it means that you’re using Cedro with --insert-line-directives inside the parameters for a macro. You can either expand the code given to the macro manually, or avoid --insert-line-directives by replacing cedrocc -o file file.c with cedro file.c | cc -x c - -o file.

cedrocc does another thing in addition to cedro file.c | cc -x c - -o file: for each #include, if it finds the file (-I ...) it looks inside for #pragma Cedro 1.0 and if found, processes the file and inserts the result in place of the #include. The reason is being able to compile in one go programs that use Cedro in several files, instead of having to transform each of them into temporary files to be compiled later.

cedro-new

There is a third executable, cedro-new, that produces a program draft in a similar way to cargo new in Rust. cedro new … will actually run cedro-new …. The content is produced from the template in the template/ directory, which gets embedded in the cedro-new executable when it is compiled.

    Usage: cedro-new [options] <name>
      Creates a directory named <name>/ with the template.
      -h, --help        Shows this message.
      -i, --interactive Asks for the names for command and project.
                        Otherwise, they will be derived from the directory name.

When producing the draft, certain patterns get replaced in Makefile, README.md, and all the files under src/:

The template includes several project drafts, that can be activated in the generated Makefile:

Backstitch macro: @

Threads a value through a sequence of function calls, as first parameter for each of them.

It is an explicit version of what other programming languages do to implement member functions, and the result is a usual pattern in C libraries.

Note: the @ symbol is not recognized when written as \u0040, but it gets converted to @ in the output. This serves to escape it when chaining Cedro with another pre-processor that uses it.

    object @ f(a), g(b);              →  f(object, a); g(object, b);
    &object @ f(a), g(b);             →  f(&object, a); g(&object, b);
    object.field @ f(a), g(b);        →  f(object.field, a); g(object.field, b);
    int x = (object @ f(a), g(b));    →  int x = (f(object, a), g(object, b));

The last case uses the C comma operator; it is the same as

    f(object, a); int x = g(object, b);

Prefixes and suffixes for the function names:

    object @prefix_... f(a), g(b);    →  prefix_f(object, a); prefix_g(object, b);
    object @..._suffix f(a), g(b);    →  f_suffix(object, a); g_suffix(object, b);

Cedro source:

    graphics_context @nvg...
        BeginPath(),
        Rect(100,100, 120,30),
        Circle(120,120, 5),
        PathWinding(NVG_HOLE),
        FillColor(nvgRGBA(255,192,0,255)),
        Fill();

Generated C:

    nvgBeginPath(graphics_context);
    nvgRect(graphics_context, 100,100, 120,30);
    nvgCircle(graphics_context, 120,120, 5);
    nvgPathWinding(graphics_context, NVG_HOLE);
    nvgFillColor(graphics_context, nvgRGBA(255,192,0,255));
    nvgFill(graphics_context);
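To make the expansion concrete, the following sketch shows a tiny buffer type together with the plain C that a backstitch line would be expected to produce, following the prefix rule shown above. The `Buffer` type and the `buffer_…` functions are invented for this illustration; the Cedro source appears only in the comment so that the block compiles with any C99 compiler.

```c
#include <stddef.h>

/* Hypothetical types and functions, only for illustration. */
typedef struct { char text[32]; size_t len; } Buffer;

void buffer_clear(Buffer* b)
{
    b->len = 0;
    b->text[0] = '\0';
}

void buffer_put(Buffer* b, char c)
{
    if (b->len + 1 < sizeof(b->text)) {
        b->text[b->len++] = c;
        b->text[b->len] = '\0';
    }
}

/* With Cedro, the body could be written as:
 *     b @buffer_... clear(), put('o'), put('k');
 * The statements below are the plain C it would expand to:
 * the prefix is added to each function name and `b` is inserted
 * as the first argument of each call. */
void write_ok(Buffer* b)
{
    buffer_clear(b);
    buffer_put(b, 'o');
    buffer_put(b, 'k');
}
```

After `write_ok(&buffer)`, `buffer.text` holds the string "ok".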

For each comma-separated segment, if it starts with any of the tokens [, ++, --, ., ->, =, +=, -=, *=, /=, %=, <<=, >>=, &=, ^=, |=, or there is nothing that looks like a function call, the insertion point is the beginning of the segment:

    number_array @ [3]=44, [2]=11;        →  number_array[3]=44; number_array[2]=11;
    *number_array++ @ = 1, = 2;           →  *number_array++ = 1; *number_array++ = 2;
    figure_center_point @ .x=44, .y=11;   →  figure_center_point.x=44; figure_center_point.y=11;
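A short sketch of two of these cases as plain C, with the Cedro form kept in comments. The names `Point`, `make_center`, and `fill_pair` are made up for the example. It also shows that the text to the left of the @ is repeated verbatim in each segment, so a side effect such as `++` happens once per segment.

```c
/* Hypothetical type, only for illustration. */
typedef struct { int x, y; } Point;

/* Cedro:  center @ .x = 44, .y = 11;
 * Each segment starts with `.`, so `center` is inserted at the
 * beginning of the segment. */
Point make_center(void)
{
    Point center = { 0, 0 };
    center.x = 44;
    center.y = 11;
    return center;
}

/* Cedro:  *cursor++ @ = 1, = 2;
 * The expression `*cursor++` is repeated once per segment, so the
 * cursor advances twice. */
void fill_pair(int* cursor)
{
    *cursor++ = 1;
    *cursor++ = 2;
}
```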

Complex expressions can be used as prefixes by putting them to the left of the @ and leaving the ellipsis without prefix or suffix:

Cedro source:

    // ngx_http_sqlite_module.c#L802
    (*chain->last)->buf->@ ...
        pos = u_str,
        last = u_str + ns.len,
        memory = 1;

Generated C:

    // ngx_http_sqlite_module.c#L802
    (*chain->last)->buf->pos = u_str;
    (*chain->last)->buf->last = u_str + ns.len;
    (*chain->last)->buf->memory = 1;

The object part can be left empty, which is useful for things like adding prefixes or suffixes to enumerations:

Cedro source:

    typedef enum {
      @TOKEN_... SPACE, WORD, NUMBER
    } TokenType;

Generated C:

    typedef enum {
      TOKEN_SPACE, TOKEN_WORD, TOKEN_NUMBER
    } TokenType;

Cedro source:

    // http://docs.libuv.org/en/v1.x/guide/threads.html#core-thread-operations
    // `hare` and `tortoise` are functions.
    int main() {
      int tracklen = 10;
      @uv_thread_...
          t hare_id,
          t tortoise_id,
          create(&hare_id, hare, &tracklen),
          create(&tortoise_id, tortoise, &tracklen),
          join(&hare_id),
          join(&tortoise_id);
      return 0;
    }

Generated C:

    // http://docs.libuv.org/en/v1.x/guide/threads.html#core-thread-operations
    // `hare` and `tortoise` are functions.
    int main() {
      int tracklen = 10;
      uv_thread_t hare_id;
      uv_thread_t tortoise_id;
      uv_thread_create(&hare_id, hare, &tracklen);
      uv_thread_create(&tortoise_id, tortoise, &tracklen);
      uv_thread_join(&hare_id);
      uv_thread_join(&tortoise_id);
      return 0;
    }

Inside a parameter list:

    function(a, @prefix_... b, c)   →  function(a, prefix_b, prefix_c)

The segments part can also be left empty to add either a prefix or a suffix to an identifier:

    Next(reader) @xmlTextReader...;       →  xmlTextReaderNext(reader);
    get(&vector, index) @..._Byte_vec;    →  get_Byte_vec(&vector, index);
    function(a, b @prefix_..., c)         →  function(a, prefix_b, c)

It is a left-associative operator:

    object @ f(a) @ g(b);                    →  g(f(object, a), b);
    x @ one() @ two() @ three() @ four();    →  four(three(two(one(x))));
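The difference between chaining with @ and listing segments with commas can be checked with plain C. In this sketch the functions `add` and `times` are invented for the example, and the two wrapper functions contain the C that the Cedro forms in the comments would be expected to expand to, according to the rules above.

```c
/* Hypothetical functions, only for illustration. */
int add(int value, int n)   { return value + n; }
int times(int value, int n) { return value * n; }

/* Cedro:  x @ add(1) @ times(10)
 * Left-associative: the result of the first call becomes the first
 * argument of the second. */
int chained(int x)
{
    return times(add(x, 1), 10);
}

/* Cedro:  (x @ add(1), times(10))
 * The same value `x` goes to both calls; with the comma operator,
 * only the result of the last call is kept. */
int sequenced(int x)
{
    return (add(x, 1), times(x, 10));
}
```

For `x = 2`, `chained` computes (2 + 1) × 10 = 30 while `sequenced` computes 2 × 10 = 20.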

Looking for prior implementations of this idea I’ve found magma (2014), where it is called doto. It is a macro for the cmacro pre-processor, which has the inconvenience of needing the Common Lisp SBCL compiler in addition to the C compiler.

Clojure also has a macro called doto which works in a similar manner, for instance to do f₁(x); f₂(x); f₃(x);:

    Magma     doto   doto macro         doto(x) { f₁(); f₂(); f₃(); }
    Clojure   doto   doto macro         (doto x f₁ f₂ f₃)
    Cedro     @      backstitch macro   x @ f₁(), f₂(), f₃()

Functional languages often have a similar operator without the ability to thread a same value through several functions. For instance, the equivalent of f₃(f₂(f₁(x))):

    Shell     |    pipe operator                  echo x | f₁ | f₂ | f₃
    Haskell   &    reverse application operator   x & f₁ & f₂ & f₃
    OCaml     |>   reverse application operator   x |> f₁ |> f₂ |> f₃
    Elixir    |>   pipe operator                  x |> f₁ |> f₂ |> f₃
    Clojure   ->   threading macro                (-> x f₁ f₂ f₃)
    Cedro     @    backstitch macro               x @ f₁() @ f₂() @ f₃()

Ada 2005 introduced a feature called prefixed-view notation that is more similar to C++, as the exact function being called cannot be determined without knowing which methods are implemented for the object type.

Deferred resource release

Moves the clean-up code for a variable to the end of its scope including the exit points break, continue, goto, return.

In C, resources must be released back to the system explicitly once they are no longer needed, which usually happens quite far from the place where they were allocated. As time passes and changes accumulate in the program, it’s easy to forget to release them in all cases, or to attempt to release a resource twice.

Other programming languages have mechanisms for automatic resource release: C++ for instance, uses functions called destructors that get run implicitly when exiting a variable’s scope.

The programming language Go introduced an explicit notation called “defer” that fits the style of C better. The first difference is that in Go, all releases happen when exiting the function, while with Cedro the releases happen when exiting each block, like the destructors in C++ do.
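The block-level behaviour can be sketched in plain C. The code below is a hand-written equivalent of what the Cedro source in the comment would be expected to expand to, not actual Cedro output; `release` and the counter `releases` are invented here so that the number of release calls can be observed.

```c
#include <stdlib.h>

static int releases = 0;  /* counts calls, only for the demonstration */

static void release(void* p)
{
    free(p);
    ++releases;
}

/* Cedro source (sketch):
 *     for (int i = 0; i < n; ++i) {
 *         char* block = malloc(16);
 *         if (!block) return -1;
 *         auto release(block);
 *         if (i == skip) continue;
 *         block[0] = (char)i;
 *     }
 * The release call is placed at every exit from the loop body,
 * so it runs once per iteration instead of once at function exit. */
int run_iterations(int n, int skip)
{
    releases = 0;
    for (int i = 0; i < n; ++i) {
        char* block = malloc(16);
        if (!block) return -1;
        if (i == skip) { release(block); continue; }
        block[0] = (char)i;
        release(block);
    }
    return releases;
}
```

With `n = 3`, three blocks are allocated and three are released before the loop ends, whichever iteration takes the `continue` path.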

There are more differences, such as for instance that in Go it can be used to modify the return value of the function, and that Cedro does not even try to handle longjmp(), exit(), thrd_exit() etc. because it could only apply the deferred actions in the current function, not in any function that called this one. See “A defer mechanism for C” (published academic paper as PDF in the SAC’21 conference) for a compiler-level implementation that does handle longjmp() and stack unwinding.

In Cedro, the release function is marked with the C keyword auto. Standard C code before C23 does not need this keyword because it is the default storage class, and where it does appear it can be replaced with signed, which has the same effect.

It is also possible to use defer instead of auto adding the keyword “defer” to the “pragma”: #pragma Cedro 1.0 defer.

With auto:

    #pragma Cedro 1.0
    …
      char* text = malloc(count + 1);
      if (!text) return ENOMEM;
      auto free(text);
    …
      if (file_name) {
        FILE* file = fopen(file_name, "w");
        if (!file) return errno;
        auto fclose(file);
        fwrite(text, sizeof(char), count, file);
    …

With defer:

    #pragma Cedro 1.0 defer
    …
      char* text = malloc(count + 1);
      if (!text) return ENOMEM;
      defer free(text);
    …
      if (file_name) {
        FILE* file = fopen(file_name, "w");
        if (!file) return errno;
        defer fclose(file);
        fwrite(text, sizeof(char), count, file);
    …

In this example, there is a text store and a file that must be released back to the system:

    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>
    #pragma Cedro 1.0

    int repeat_letter(char letter, size_t count,
                      char* file_name)
    {
      char* text = malloc(count + 1);
      if (!text) return ENOMEM;
      auto free(text);

      for (size_t i = 0; i < count; ++i) {
        text[i] = letter;
      }
      text[count] = 0;

      if (file_name) {
        FILE* file = fopen(file_name, "w");
        if (!file) return errno;
        auto fclose(file);
        fwrite(text, sizeof(char), count, file);
        fputc('\n', file);
      }

      printf("Repeated %lu times: %s\n",
             count, text);

      return 0;
    }

    int main(void)
    {
      return repeat_letter('A', 6, "aaaaaa.txt");
    }

Generated C:

    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>

    int repeat_letter(char letter, size_t count,
                      char* file_name)
    {
      char* text = malloc(count + 1);
      if (!text) return ENOMEM;

      for (size_t i = 0; i < count; ++i) {
        text[i] = letter;
      }
      text[count] = 0;

      if (file_name) {
        FILE* file = fopen(file_name, "w");
        if (!file) {
          free(text);
          return errno;
        }
        fwrite(text, sizeof(char), count, file);
        fputc('\n', file);
        fclose(file);
      }

      printf("Repeated %lu times: %s\n",
             count, text);

      free(text);
      return 0;
    }

    int main(void)
    {
      return repeat_letter('A', 6, "aaaaaa.txt");
    }

Compiling it with GCC or clang, first running the compiler explicitly and then using cedrocc:

    $ cedro repeat.c | cc -o repeat -x c -
    $ ./repeat
    Repeated 6 times: AAAAAA
    $ cat aaaaaa.txt
    AAAAAA
    $ valgrind --leak-check=yes ./repeat
    …
    ==8795== HEAP SUMMARY:
    ==8795==     in use at exit: 0 bytes in 0 blocks
    ==8795==   total heap usage: 4 allocs, 4 frees, 5,599 bytes allocated
    ==8795==
    ==8795== All heap blocks were freed -- no leaks are possible

    $ cedrocc -o repeat repeat.c
    $ ./repeat
    Repeated 6 times: AAAAAA
    $ cat aaaaaa.txt
    AAAAAA
    $ valgrind --leak-check=yes ./repeat
    …
    ==8795== HEAP SUMMARY:
    ==8795==     in use at exit: 0 bytes in 0 blocks
    ==8795==   total heap usage: 4 allocs, 4 frees, 5,599 bytes allocated
    ==8795==
    ==8795== All heap blocks were freed -- no leaks are possible

In this example adapted from “Proposal for C2x, WG14 n2542, Defer Mechanism for C” p. 40, the released resources are spin locks. The difference, of course, is that in this case the spin_unlock() calls do not run after the panic.

Cedro source:

    /* Adapted from example in n2542.pdf#40 */
    #pragma Cedro 1.0

    int f1(void) {
      puts("g called");
      if (bad1()) { return 1; }
      spin_lock(&lock1);
      auto spin_unlock(&lock1);
      if (bad2()) { return 1; }
      spin_lock(&lock2);
      auto spin_unlock(&lock2);
      if (bad()) { return 1; }

      /* Access data protected by the spinlock then force a panic */
      completed += 1;
      unforced(completed);

      return 0;
    }

Generated C:

    /* Adapted from example in n2542.pdf#40 */

    int f1(void) {
      puts("g called");
      if (bad1()) { return 1; }
      spin_lock(&lock1);
      if (bad2()) { spin_unlock(&lock1); return 1; }
      spin_lock(&lock2);
      if (bad()) { spin_unlock(&lock2); spin_unlock(&lock1); return 1; }

      /* Access data protected by the spinlock then force a panic */
      completed += 1;
      unforced(completed);

      spin_unlock(&lock2);
      spin_unlock(&lock1);
      return 0;
    }

Andrew Kelley compared resource management between C and his Zig programming language in a 2019 presentation titled “The Road to Zig 1.0” at 29:21s, and here I’ve re-created his C example using Cedro to produce the function as he showed it, except that Cedro does not know that the final for loop never ends so it adds unnecessary resource release code after it.

Cedro source:

    // Example retrofitted from C example by Andrew Kelley:
    // https://www.youtube.com/watch?v=Gv2I7qTux7g&t=1761s
    #pragma Cedro 1.0

    int main(int argc, char **argv) {
      struct SoundIo *soundio = soundio_create();
      if (!soundio) {
        fprintf(stderr, "out of memory\n");
        return 1;
      }
      auto soundio_destroy(soundio);

      int err;
      if ((err = soundio_connect(soundio))) {
        fprintf(stderr, "unable to connect: %s\n", soundio_strerror(err));
        return 1;
      }

      soundio_flush_events(soundio);

      int default_output_index = soundio_default_output_device_index(soundio);
      if (default_output_index < 0) {
        fprintf(stderr, "No output device\n");
        return 1;
      }

      struct SoundIoDevice *device =
          soundio_get_output_device(soundio, default_output_index);
      if (!device) {
        fprintf(stderr, "out of memory\n");
        return 1;
      }
      auto soundio_device_unref(device);

      struct SoundIoOutStream *outstream = soundio_outstream_create(device);
      if (!outstream) {
        fprintf(stderr, "out of memory\n");
        return 1;
      }
      auto soundio_outstream_destroy(outstream);

      outstream->format = SoundIoFormatFloat32NE;
      outstream->write_callback = write_callback;

      if ((err = soundio_outstream_open(outstream))) {
        fprintf(stderr, "unable to open device: %s" "\n", soundio_strerror(err));
        return 1;
      }

      if ((err = soundio_outstream_start(outstream))) {
        fprintf(stderr, "unable to start device: %s\n", soundio_strerror(err));
        return 1;
      }

      for (;;) soundio_wait_events(soundio);
    }

Generated C:

    // Example retrofitted from C example by Andrew Kelley:
    // https://www.youtube.com/watch?v=Gv2I7qTux7g&t=1761s

    int main(int argc, char **argv) {
      struct SoundIo *soundio = soundio_create();
      if (!soundio) {
        fprintf(stderr, "out of memory\n");
        return 1;
      }

      int err;
      if ((err = soundio_connect(soundio))) {
        fprintf(stderr, "unable to connect: %s\n", soundio_strerror(err));
        soundio_destroy(soundio);
        return 1;
      }

      soundio_flush_events(soundio);

      int default_output_index = soundio_default_output_device_index(soundio);
      if (default_output_index < 0) {
        fprintf(stderr, "No output device\n");
        soundio_destroy(soundio);
        return 1;
      }

      struct SoundIoDevice *device =
          soundio_get_output_device(soundio, default_output_index);
      if (!device) {
        fprintf(stderr, "out of memory\n");
        soundio_destroy(soundio);
        return 1;
      }

      struct SoundIoOutStream *outstream = soundio_outstream_create(device);
      if (!outstream) {
        fprintf(stderr, "out of memory\n");
        soundio_device_unref(device);
        soundio_destroy(soundio);
        return 1;
      }

      outstream->format = SoundIoFormatFloat32NE;
      outstream->write_callback = write_callback;

      if ((err = soundio_outstream_open(outstream))) {
        fprintf(stderr, "unable to open device: %s" "\n", soundio_strerror(err));
        soundio_outstream_destroy(outstream);
        soundio_device_unref(device);
        soundio_destroy(soundio);
        return 1;
      }

      if ((err = soundio_outstream_start(outstream))) {
        fprintf(stderr, "unable to start device: %s\n", soundio_strerror(err));
        soundio_outstream_destroy(outstream);
        soundio_device_unref(device);
        soundio_destroy(soundio);
        return 1;
      }

      for (;;) soundio_wait_events(soundio);

      soundio_outstream_destroy(outstream);
      soundio_device_unref(device);
      soundio_destroy(soundio);
    }

However, his Zig example had the unfair advantage of returning error values instead of printing error messages which takes more space. The following is a C function that matches the Zig version more closely:

Cedro source:

    // Example retrofitted from Zig example by Andrew Kelley:
    // https://www.youtube.com/watch?v=Gv2I7qTux7g&t=1761s
    #pragma Cedro 1.0

    int main(int argc, char **argv) {
      struct SoundIo *soundio = soundio_create();
      if (!soundio) { return SoundIoErrorNoMem; }
      auto soundio_destroy(soundio);

      int err;
      if ((err = soundio_connect(soundio))) return err;

      soundio_flush_events(soundio);

      const int default_output_index =
          soundio_default_output_device_index(soundio);
      if (default_output_index < 0) return SoundIoErrorNoSuchDevice;

      // The pointers are constant, not the objects they point to:
      // the stream is modified below through `outstream`.
      struct SoundIoDevice *const device =
          soundio_get_output_device(soundio, default_output_index);
      if (!device) return SoundIoErrorNoMem;
      auto soundio_device_unref(device);

      struct SoundIoOutStream *const outstream =
          soundio_outstream_create(device);
      if (!outstream) return SoundIoErrorNoMem;
      auto soundio_outstream_destroy(outstream);

      outstream->format = SoundIoFormatFloat32NE;
      outstream->write_callback = write_callback;

      if ((err = soundio_outstream_open(outstream))) return err;

      if ((err = soundio_outstream_start(outstream))) return err;

      while (true) soundio_wait_events(soundio);
    }

Generated C:

    // Example retrofitted from Zig example by Andrew Kelley:
    // https://www.youtube.com/watch?v=Gv2I7qTux7g&t=1761s

    int main(int argc, char **argv) {
      struct SoundIo *soundio = soundio_create();
      if (!soundio) { return SoundIoErrorNoMem; }

      int err;
      if ((err = soundio_connect(soundio))) {
        soundio_destroy(soundio);
        return err;
      }

      soundio_flush_events(soundio);

      const int default_output_index =
          soundio_default_output_device_index(soundio);
      if (default_output_index < 0) {
        soundio_destroy(soundio);
        return SoundIoErrorNoSuchDevice;
      }

      // The pointers are constant, not the objects they point to:
      // the stream is modified below through `outstream`.
      struct SoundIoDevice *const device =
          soundio_get_output_device(soundio, default_output_index);
      if (!device) {
        soundio_destroy(soundio);
        return SoundIoErrorNoMem;
      }

      struct SoundIoOutStream *const outstream =
          soundio_outstream_create(device);
      if (!outstream) {
        soundio_device_unref(device);
        soundio_destroy(soundio);
        return SoundIoErrorNoMem;
      }

      outstream->format = SoundIoFormatFloat32NE;
      outstream->write_callback = write_callback;

      if ((err = soundio_outstream_open(outstream))) {
        soundio_outstream_destroy(outstream);
        soundio_device_unref(device);
        soundio_destroy(soundio);
        return err;
      }

      if ((err = soundio_outstream_start(outstream))) {
        soundio_outstream_destroy(outstream);
        soundio_device_unref(device);
        soundio_destroy(soundio);
        return err;
      }

      while (true) soundio_wait_events(soundio);

      soundio_outstream_destroy(outstream);
      soundio_device_unref(device);
      soundio_destroy(soundio);
    }

The Cedro version is much closer, but his point still stands because the plain C version needs a lot of repeated code and is more fragile. And of course Zig has many other great features.

Apart from the already mentioned “A defer mechanism for C”, there are macros that use a for loop as for (allocation and initialization; condition; release) { actions } [1] or other techniques [2].

[1] “P99 Scope-bound resource management with for-statements” from the same author (2010), “Would it be possible to create a scoped_lock implementation in C?” (2016), “C compatible scoped locks” (2021), “Modern C and What We Can Learn From It - Luca Sas [ ACCU 2021 ] 00:17:18” (2021)
[2] “Would it be possible to create a scoped_lock implementation in C?” (2016), “libdefer: Go-style defer for C” (2016), “A Defer statement for C” (2020), “Go-like defer for C that works with most optimization flag combinations under GCC/Clang” (2021)
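The for-loop technique from [1] can be sketched as follows. The macro `WITH_BUFFER` and the function `measure_copy` are hypothetical names made up for this illustration and are not taken from any of the cited libraries.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical macro, for illustration only. It follows the shape
 *   for (allocation and initialization; condition; release) { actions }
 * The helper pointer name##_live makes the body run at most once,
 * and not at all if the allocation failed. */
#define WITH_BUFFER(name, size)                              \
    for (char *name = malloc(size), *name##_live = name;     \
         name##_live;                                        \
         free(name), name##_live = NULL)

size_t measure_copy(const char* text)
{
    size_t length = 0;
    WITH_BUFFER(copy, strlen(text) + 1) {
        strcpy(copy, text);
        length = strlen(copy);
        /* A `break` or `return` here would skip the free():
         * that is the weak point of this technique. */
    }
    return length;
}
```

The release expression runs only when the body finishes normally, which is why this approach cannot cover the exit points that Cedro handles (break, continue, goto, return).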

Compilers like GCC and clang have non-standard features to do this like the __cleanup__ variable attribute.
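A minimal sketch of that attribute, which compiles only with GCC, clang, or compatible compilers. The names `free_text`, `freed_count`, and `cleanup_demo` are invented for the example; the counter exists only to make the clean-up observable.

```c
#include <stdlib.h>

static int freed_count = 0;   /* only to observe the clean-up */

/* The clean-up function receives a pointer to the variable,
 * not the variable's value. */
static void free_text(char** text_p)
{
    free(*text_p);
    ++freed_count;
}

int cleanup_demo(void)
{
    freed_count = 0;
    {
        char* text __attribute__((__cleanup__(free_text))) = malloc(8);
        if (text) text[0] = '\0';
    }   /* free_text(&text) runs here, when `text` goes out of scope. */
    return freed_count;
}
```

Note that the clean-up must be a function with exactly this kind of signature, which is the limitation discussed next.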

Cedro does not have the limitation of the deferred code having to be a function: it can be a code block, with or without conditionals, which makes it possible, for instance, to emulate Zig’s errdefer by performing different actions in case of error:

Cedro source:

    char* allocate_block(size_t n, char** err_p)
    {
      char* result = malloc(n);
      auto if (*err_p) {
        free(result);
        result = NULL;
      }

      if (n > 10) {
        *err_p = "n is too big";
      }

      return result;
    }

Generated C:

    char* allocate_block(size_t n, char** err_p)
    {
      char* result = malloc(n);

      if (n > 10) {
        *err_p = "n is too big";
      }

      if (*err_p) {
        free(result);
        result = NULL;
      }
      return result;
    }

Break out of nested loops

Converts break label; or continue label; into goto label;. In C it’s only possible to break out of one loop at a time when using break, which is also a problem when the interruption comes from a switch block.

    #include <stdio.h>
    #include <stdlib.h>
    #pragma Cedro 1.0

    int main(int argc, char* argv[])
    {
      int x = 0, y = 0;
      for (x = 0; next_x: x < 100; ++x) {
        for (y = 0; y < 100; ++y) {
          switch (x + y) {
            case 157:
              break found_number_decomposition;
            case 11:
              x = 37;
              fprintf(stderr, "Jump from x=11 to x=%d\n", x);
              --x;
              continue next_x;
          }
        }
      }
      found_number_decomposition:

      if (x < 100 || y < 100) {
        fprintf(stderr, "Found %d = %d + %d\n", x + y, x, y);
      }

      return 0;
    }

Generated C:

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char* argv[])
    {
      int x = 0, y = 0;
      for (x = 0; x < 100; ++x) {
        for (y = 0; y < 100; ++y) {
          switch (x + y) {
            case 157:
              goto found_number_decomposition;
            case 11:
              x = 37;
              fprintf(stderr, "Jump from x=11 to x=%d\n", x);
              --x;
              goto next_x;
          }
        }
        next_x:;
      }
      found_number_decomposition:

      if (x < 100 || y < 100) {
        fprintf(stderr, "Found %d = %d + %d\n", x + y, x, y);
      }

      return 0;
    }

The difference between break …, continue …, and goto … is in the restrictions:

It is part of the deferred resource release macro:

    #include <stdio.h>
    #include <stdlib.h>
    #pragma Cedro 1.0

    int main(int argc, char* argv[])
    {
      int x = 0, y = 0;
      char *level1 = malloc(1);
      auto free(level1);
      for (x = 0; next_x: x < 100; ++x) {
        char *level2 = malloc(2);
        auto free(level2);
        for (y = 0; y < 100; ++y) {
          char *level3 = malloc(3);
          auto free(level3);
          switch (x + y) {
            case 157:
              break found_number_decomposition;
            case 11:
              x = 37;
              fprintf(stderr, "Jump from x=11 to x=%d\n", x);
              continue next_x;
          }
        }
      }
      found_number_decomposition:

      if (x < 100 || y < 100) {
        fprintf(stderr, "Found %d = %d + %d\n", x + y, x, y);
      }

      return 0;
    }

Generated C:

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char* argv[])
    {
      int x = 0, y = 0;
      char *level1 = malloc(1);
      for (x = 0; x < 100; ++x) {
        char *level2 = malloc(2);
        for (y = 0; y < 100; ++y) {
          char *level3 = malloc(3);
          switch (x + y) {
            case 157:
              free(level3);
              free(level2);
              goto found_number_decomposition;
            case 11:
              x = 37;
              fprintf(stderr, "Jump from x=11 to x=%d\n", x);
              free(level3);
              goto next_x;
          }
          free(level3);
        }
        next_x:;
        free(level2);
      }
      found_number_decomposition:

      if (x < 100 || y < 100) {
        fprintf(stderr, "Found %d = %d + %d\n", x + y, x, y);
      }

      free(level1);
      return 0;
    }

With goto in general, it cannot be guaranteed that the resources will be released correctly, but the restrictions that apply to break … and continue … make it work.

    $ bin/cedrocc test/defer-label-break.c -std=c99 -pedantic-errors -Wall -fanalyzer -o /tmp/find-number-decomposition
    $ valgrind --leak-check=yes /tmp/find-number-decomposition
    …
    Jump from x=11 to x=37
    Found 157 = 58 + 99
    ==1077==
    ==1077== HEAP SUMMARY:
    ==1077==     in use at exit: 0 bytes in 0 blocks
    ==1077==   total heap usage: 2,236 allocs, 2,236 frees, 6,683 bytes allocated
    ==1077==
    ==1077== All heap blocks were freed -- no leaks are possible
    ==1077==
    ==1077== For lists of detected and suppressed errors, rerun with: -s
    ==1077== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

The BLISS 11 programming language was the first that introduced labels for its leave keyword (analogous to C’s break) around 1974, and later other languages like Java, Javascript, and Go did the same with continue and break.

Notation for array slices

Converts array[start..end] into &array[start], &array[end]. The array/pointer value array can be just an identifier or in general an expression which, as it will be evaluated twice, must not have any side effects, just like standard C preprocessor macros.

We can use it to improve the vector example from the loop macro section:

    append(&words, animals[0..2]);   →  append(&words, &animals[0], &animals[2]);
    append(&words, plants[0..3]);    →  append(&words, &plants[0], &plants[3]);
    append(&words, animals[2..4]);   →  append(&words, &animals[2], &animals[4]);

The end of the slice can have a positive sign to indicate that it is a relative position to the start of the slice: array[start..+end] becomes &array[start], &array[start+end]. In this case, the advice about double execution of side effects applies to start in addition to array.

    append(&words, animals[0..+2]);  →  append(&words, &animals[0], &animals[0+2]);
    append(&words, plants[0..3]);    →  append(&words, &plants[0], &plants[3]);
    append(&words, animals[2..+2]);  →  append(&words, &animals[2], &animals[2+2]);
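A function that receives a slice simply takes a pair of pointers. The sketch below uses an invented `sum_range` function and a sample array to show the plain C that both forms of the notation would be expected to expand to; the Cedro forms appear only in the comments. It assumes that the end pointer is exclusive, as in the other examples in this section.

```c
/* Hypothetical function: sums the half-open range [start, end). */
int sum_range(const int* start, const int* end)
{
    int total = 0;
    while (start < end) total += *start++;
    return total;
}

static const int numbers[] = { 10, 20, 30, 40, 50 };

/* Cedro:  sum_range(numbers[1..4])
 * Absolute end: elements 1, 2, and 3. */
int sum_absolute(void)
{
    return sum_range(&numbers[1], &numbers[4]);
}

/* Cedro:  sum_range(numbers[1..+3])
 * Relative end: three elements starting at index 1. The start
 * expression appears twice in the result. */
int sum_relative(void)
{
    return sum_range(&numbers[1], &numbers[1+3]);
}
```

Both functions add 20 + 30 + 40. Because `numbers` and, in the relative form, the start index are written out twice, an expression like `i++` in either position would be evaluated twice.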

If the array expression is composed of more than one token, it will be wrapped in parentheses to make sure the result is correct.

    append(&words, (uint8_t*)animals[0..+2]);  →  append(&words, &((uint8_t*)animals)[0], &((uint8_t*)animals)[0+2]);
    append(&words, (uint8_t*)plants[0..3]);    →  append(&words, &((uint8_t*)plants)[0], &((uint8_t*)plants)[3]);
    append(&words, (uint8_t*)animals[2..+2]);  →  append(&words, &((uint8_t*)animals)[2], &((uint8_t*)animals)[2+2]);

It can be used in initializers, in which case the braces can be omitted:

    #include <stdio.h>
    #pragma Cedro 1.0

    typedef struct {
      const char* a;
      const char* b;
    } char_slice_t;

    const char* text = "uno dos tres";

    /** Extract "dos" from `text`. */
    int main(void)
    {
      char_slice_t slice = text[4..+3];
      const char* cursor;
      for (cursor = slice.a; cursor != slice.b; ++cursor) {
        putc(*cursor, stderr);
      }
      putc('\n', stderr);
    }

Generated C:

    #include <stdio.h>

    typedef struct {
      const char* a;
      const char* b;
    } char_slice_t;

    const char* text = "uno dos tres";

    /** Extract "dos" from `text`. */
    int main(void)
    {
      char_slice_t slice = { &text[4], &text[4+3] };
      const char* cursor;
      for (cursor = slice.a; cursor != slice.b; ++cursor) {
        putc(*cursor, stderr);
      }
      putc('\n', stderr);
    }

This complete example shows a scrolling text marquee:

    #ifdef _WIN32
    #include <windows.h>
    // See https://stackoverflow.com/a/3930716/
    int sleep_ms(int ms) { Sleep(ms); return 0; }
    #else
    #define _XOPEN_SOURCE 500
    #include <unistd.h>
    // Deprecated but simple:
    int sleep_ms(int ms) { return usleep(ms*1000); }
    #endif

    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    #pragma Cedro 1.0

    typedef char utf8_char[5]; // As C string.

    void
    print_slice(const utf8_char* start, const utf8_char* end)
    {
      while (start < end) fputs(*start++, stdout);
    }

    int
    main(int argc, char* argv[])
    {
      size_t display_size = 8;

      if (argc < 2) {
        fprintf(stderr, "Usage: marquee <text>\n");
        exit(1);
      }

      const char separator[] = " *** ";
      size_t len_bytes = strlen(argv[1]);
      size_t len_separator = strlen(separator);
      char* m = malloc(len_bytes + len_separator);
      auto free(m);
      memcpy(m, argv[1], len_bytes);
      memcpy(m + len_bytes, separator, len_separator);
      len_bytes += len_separator;

      // Extract each character encoded as UTF-8,
      // which needs up to 4 bytes + 1 terminator.
      utf8_char* message = malloc(sizeof(utf8_char) * len_bytes);
      auto free(message);
      size_t len = 0;
      for (size_t end = 0; end < len_bytes;) {
        const char b = m[end];
        size_t u;
        if      (0xF0 == (b & 0xF8)) u = 4;
        else if (0xE0 == (b & 0xF0)) u = 3;
        else if (0xC0 == (b & 0xE0)) u = 2;
        else                         u = 1;
        if (end + u > len_bytes) break;
        memcpy(&message[len], &m[end], u);
        // Terminate the character just stored, at index `len`.
        message[len][u] = '\0';
        end += u;
        ++len;
      }

      if (len < 2) {
        fprintf(stderr, "message text is too short.\n");
        exit(2);
      } else if (len < display_size) {
        display_size = len - 1;
      }

      for (;;) {
        for (int i = 0; i < len; ++i) {
          int rest = i + display_size > len? i + display_size - len: 0;
          int visible = display_size - rest;
          print_slice(message[i .. +visible]);
          print_slice(message[0 .. rest]);
          putc('\r', stdout);
          fflush(stdout);
          sleep_ms(300);
        }
      }

      return 0;
    }



#ifdef _WIN32 #include <windows.h> // See https://stackoverflow.com/a/3930716/ int sleep_ms(int ms) { Sleep(ms); return 0; } #else #define _XOPEN_SOURCE 500 #include <unistd.h> // Deprecated but simple: int sleep_ms(int ms) { return usleep(ms*1000); } #endif #include <stdlib.h> #include <stdio.h> #include <string.h> typedef char utf8_char[5]; // As C string. void print_slice(const utf8_char* start, const utf8_char* end) { while (start < end) fputs(*start++, stdout); } int main(int argc, char* argv[]) { size_t display_size = 8; if (argc < 2) { fprintf(stderr, "Usage: marquee <text>\n", display_size); exit(1); } const char separator[] = " *** "; size_t len_bytes = strlen(argv[1]); size_t len_separator = strlen(separator); char* m = malloc(len_bytes + len_separator); memcpy(m, argv[1], len_bytes); memcpy(m + len_bytes, separator, len_separator); len_bytes += len_separator; // Extract each character encoded as UTF-8, // which needs up to 4 bytes + 1 terminator. utf8_char* message = malloc(sizeof(utf8_char) * len_bytes); size_t len = 0; for (size_t end = 0; end < len_bytes;) { const char b = m[end]; size_t u; if (0xF0 == (b & 0xF8)) u = 4; else if (0xE0 == (b & 0xF0)) u = 3; else if (0xC0 == (b & 0xE0)) u = 2; else u = 1; if (end + u > len_bytes) break; memcpy(&message[len], &m[end], u); message[end][u] = '\0'; end += u; ++len; } if (len < 2) { fprintf(stderr, "message text is too short.\n"); exit(2); } else if (len < display_size) { display_size = len - 1; } for (;;) { for (int i = 0; i < len; ++i) { int rest = i + display_size > len? i + display_size - len: 0; int visible = display_size - rest; print_slice(&message[i], &message[i+visible]); print_slice(&message[0], &message[rest]); putc('\r', stdout); fflush(stdout); sleep_ms(300); } } free(message); free(m); return 0; }

The notation [a..b] for array slices was first defined in Algol 68 where it was an alternative to the primary notation [a:b], and both have been adopted by other languages since then. The [a..b] form is used in Ada, Perl, D, and Rust, for example.

Block macros: #block-macros

Formats a multi-line macro into a single line.

C macros must be written on a single line, but sometimes you need to split them across several pseudo-lines, and maintaining all the newline escapes \ gets tedious and error-prone.

By adding braces { or } right after #define we can have Cedro do that for us:

#define { macro(A, B, C)
/// Version of f() for type A.
f_##A(B, C) /// Without semicolon “;” at the end.
#define }

int main(void)
{
  int x = 1, y = 2;
  macro(int, x, y);
  // → f_int(x, y);
}

#define macro(A, B, C) \
/** Version of f() for type A. */ \
f_##A(B, C) /** Without semicolon “;” at the end. */ \
/* End #define */

int main(void)
{
  int x = 1, y = 2;
  macro(int, x, y);
  // → f_int(x, y);
}

In cases like this (“function-like macros”), since there is no semicolon after f_##A(B, C) tools such as source code editors (e.g. Emacs) indent the code incorrectly.

The solution is to keep the semicolon there for the editor's benefit, and to also write one after the closing directive, as #define };, which directs Cedro to remove it from the definition.

#define { macro(A, B, C)
/// Version of f() for type A.
f_##A(B, C); /// Semicolon “;” removed by Cedro.
#define };

int main(void)
{
  int x = 1, y = 2;
  macro(int, x, y);
  // → f_int(x, y);
}

#define macro(A, B, C) \
/** Version of f() for type A. */ \
f_##A(B, C) /** Semicolon “;” removed by Cedro. */ \
/* End #define */

int main(void)
{
  int x = 1, y = 2;
  macro(int, x, y);
  // → f_int(x, y);
}

Preprocessor directives are not allowed inside macros, so you cannot use #if, #include, etc.

Note: the directive must start exactly with #define { or #define }, with exactly one space between #define and the brace { or }.

Loop macros: #loop-macros

Repeat the lines between #foreach { ... and #foreach } replacing the given variables. These lines may contain macro definitions (#define) but the loop variables will not be expanded inside them.

As in #define, the ## serves to join tokens so that if, for example, T is float, Vec_##T produces Vec_float.

Inside the loop, any operator following a # (e.g. #,) gets omitted in the last iteration.

typedef enum {
#foreach { V {SPACE, NUMBER, \
              KEYWORD, IDENTIFIER, OPERATOR}
  T_##V#,
#foreach }
} TokenType;

typedef enum {
  T_SPACE,
  T_NUMBER,
  T_KEYWORD,
  T_IDENTIFIER,
  T_OPERATOR
} TokenType;

If a variable has the prefix #, the result is a string with its contents: if T is float, char* name = #T; produces char* name = "float";.

Loops can be nested, and the list of values can come from a variable defined in an outer loop.

#foreach { VALUES {{SPACE, NUMBER, \
                    KEYWORD, IDENTIFIER, OPERATOR}}
typedef enum {
#foreach { V VALUES
  T_##V#,
#foreach }
} TokenType;

const char* const TokenType_STRING[] = {
#foreach { V VALUES
  #V#,
#foreach }
};
#foreach }

typedef enum {
  T_SPACE,
  T_NUMBER,
  T_KEYWORD,
  T_IDENTIFIER,
  T_OPERATOR
} TokenType;

const char* const TokenType_STRING[] = {
  "SPACE",
  "NUMBER",
  "KEYWORD",
  "IDENTIFIER",
  "OPERATOR"
};

It is possible to iterate over several variables in parallel using tuples of variables and values, which must have the same number of elements:

#foreach { {TYPE, PREFIX, VALUES} { \
  { TokenType, T_, {SPACE, NUMBER, \
                    KEYWORD, IDENTIFIER, OPERATOR} }, \
  { PinConfig, M_, {INPUT, OUTPUT} } \
}
typedef enum {
#foreach { V VALUES
  PREFIX##V#,
#foreach }
} TYPE;

const char* const TYPE##_STRING[] = {
#foreach { V VALUES
  #V#,
#foreach }
};
#foreach }

typedef enum {
  T_SPACE,
  T_NUMBER,
  T_KEYWORD,
  T_IDENTIFIER,
  T_OPERATOR
} TokenType;

const char* const TokenType_STRING[] = {
  "SPACE",
  "NUMBER",
  "KEYWORD",
  "IDENTIFIER",
  "OPERATOR"
};

typedef enum {
  M_INPUT,
  M_OUTPUT
} PinConfig;

const char* const PinConfig_STRING[] = {
  "INPUT",
  "OUTPUT"
};

This example iterates over a list of fields to build a struct and the corresponding print_...() function.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#pragma Cedro 1.0

void print_double(double n, FILE* out)
{
  fprintf(out, "%f", n);
}

typedef uint32_t Colour_ARGB;
void print_Colour_ARGB(Colour_ARGB c, FILE* out)
{
  if ((c & 0xFF000000) == 0xFF000000) {
    fprintf(out, "#%.6X", c & 0x00FFFFFF);
  } else {
    fprintf(out, "#%.8X", c);
  }
}

#foreach { FIELDS {{ \
  { double,      x,      /** X position. */ }, \
  { double,      y,      /** Y position. */ }, \
  { Colour_ARGB, colour, /** Colour 32 bit. */ } \
}}
typedef struct Point {
#foreach { {TYPE, NAME, COMMENT} FIELDS
  TYPE NAME; COMMENT
#foreach }
} Point;

void print_Point(Point point, FILE* out)
{
#foreach { {TYPE, NAME, COMMENT} FIELDS
  print_##TYPE(point.NAME, out); putc('\n', out);
#foreach }
}
#foreach }

int main(void)
{
  Point point = { .x = 12.3, .y = 4.56, .colour = 0xFFa3f193 };
  print_Point(point, stderr);
}

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

void print_double(double n, FILE* out)
{
  fprintf(out, "%f", n);
}

typedef uint32_t Colour_ARGB;
void print_Colour_ARGB(Colour_ARGB c, FILE* out)
{
  if ((c & 0xFF000000) == 0xFF000000) {
    fprintf(out, "#%.6X", c & 0x00FFFFFF);
  } else {
    fprintf(out, "#%.8X", c);
  }
}

typedef struct Point {
  double x; /** X position. */
  double y; /** Y position. */
  Colour_ARGB colour; /** Colour 32 bit. */
} Point;

void print_Point(Point point, FILE* out)
{
  print_double(point.x, out); putc('\n', out);
  print_double(point.y, out); putc('\n', out);
  print_Colour_ARGB(point.colour, out); putc('\n', out);
}

int main(void)
{
  Point point = { .x = 12.3, .y = 4.56, .colour = 0xFFa3f193 };
  print_Point(point, stderr);
}

This complete example defines variants for the types float, str, and cstr of an array/vector of variable length with a function called append_Vec_##T() for each one. It then uses C11’s _Generic to define a pseudo-polymorphic append() function.

#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h> // For memcpy().

typedef struct str { uint8_t* start; uint8_t* end; } str;
typedef char* cstr; // Type names must be single words.

#pragma Cedro 1.0

#foreach { TYPES_LIST {{float, str, cstr}}
#foreach { T TYPES_LIST
/** Vector (resizeable array) type. */
typedef struct {
  T* _;
  size_t len;
  size_t capacity;
} Vec_##T;

/** Append a slice to a vector of elements of this type. */
bool
append_Vec_##T(Vec_##T *v, const T *start, const T *end)
{
  const size_t to_add = (size_t)(end - start);
  if (v->len + to_add > v->capacity) {
    const size_t new_capacity = v->len + (to_add < v->len? v->len: to_add);
    T * const new_start = realloc(v->_, new_capacity * sizeof(T));
    if (!new_start) return false;
    v->_ = new_start;
    v->capacity = new_capacity;
  }
  memcpy(v->_ + v->len, start, to_add * sizeof(T));
  v->len += to_add;
  return true;
}
#foreach }

#foreach { DEFINE {#define} // Avoid joining lines.
DEFINE append(VEC, START, END) _Generic((VEC), \
#foreach { T TYPES_LIST
  Vec_##T*: append_Vec_##T#, \
#foreach }
  )(VEC, START, END)
#foreach }
#foreach }

#include <stdio.h>
int main(void)
{
  Vec_cstr words = {0};
  cstr animals[] = { "horse", "cat", "chicken", "dog" };
  cstr plants [] = { "radish", "wheat", "tomato" };
  append(&words, &animals[0], &animals[2]);
  append(&words, &plants[0], &plants[3]);
  append(&words, &animals[2], &animals[4]);
  for (cstr *w = words._, *end = words._ + words.len; w != end; ++w) {
    fprintf(stderr, "Word: \"%s\"\n", *w);
  }
  return 0;
}

#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h> // For memcpy().

typedef struct str { uint8_t* start; uint8_t* end; } str;
typedef char* cstr; // Type names must be single words.

/** Vector (resizeable array) type. */
typedef struct {
  float* _;
  size_t len;
  size_t capacity;
} Vec_float;

/** Append a slice to a vector of elements of this type. */
bool
append_Vec_float(Vec_float *v, const float *start, const float *end)
{
  const size_t to_add = (size_t)(end - start);
  if (v->len + to_add > v->capacity) {
    const size_t new_capacity = v->len + (to_add < v->len? v->len: to_add);
    float * const new_start = realloc(v->_, new_capacity * sizeof(float));
    if (!new_start) return false;
    v->_ = new_start;
    v->capacity = new_capacity;
  }
  memcpy(v->_ + v->len, start, to_add * sizeof(float));
  v->len += to_add;
  return true;
}

/** Vector (resizeable array) type. */
typedef struct {
  str* _;
  size_t len;
  size_t capacity;
} Vec_str;

/** Append a slice to a vector of elements of this type. */
bool
append_Vec_str(Vec_str *v, const str *start, const str *end)
{
  const size_t to_add = (size_t)(end - start);
  if (v->len + to_add > v->capacity) {
    const size_t new_capacity = v->len + (to_add < v->len? v->len: to_add);
    str * const new_start = realloc(v->_, new_capacity * sizeof(str));
    if (!new_start) return false;
    v->_ = new_start;
    v->capacity = new_capacity;
  }
  memcpy(v->_ + v->len, start, to_add * sizeof(str));
  v->len += to_add;
  return true;
}

/** Vector (resizeable array) type. */
typedef struct {
  cstr* _;
  size_t len;
  size_t capacity;
} Vec_cstr;

/** Append a slice to a vector of elements of this type. */
bool
append_Vec_cstr(Vec_cstr *v, const cstr *start, const cstr *end)
{
  const size_t to_add = (size_t)(end - start);
  if (v->len + to_add > v->capacity) {
    const size_t new_capacity = v->len + (to_add < v->len? v->len: to_add);
    cstr * const new_start = realloc(v->_, new_capacity * sizeof(cstr));
    if (!new_start) return false;
    v->_ = new_start;
    v->capacity = new_capacity;
  }
  memcpy(v->_ + v->len, start, to_add * sizeof(cstr));
  v->len += to_add;
  return true;
}

#define append(VEC, START, END) _Generic((VEC), \
  Vec_float*: append_Vec_float, \
  Vec_str*: append_Vec_str, \
  Vec_cstr*: append_Vec_cstr \
  )(VEC, START, END)

#include <stdio.h>
int main(void)
{
  Vec_cstr words = {0};
  cstr animals[] = { "horse", "cat", "chicken", "dog" };
  cstr plants [] = { "radish", "wheat", "tomato" };
  append(&words, &animals[0], &animals[2]);
  append(&words, &plants[0], &plants[3]);
  append(&words, &animals[2], &animals[4]);
  for (cstr *w = words._, *end = words._ + words.len; w != end; ++w) {
    fprintf(stderr, "Word: \"%s\"\n", *w);
  }
  return 0;
}

This can be done without Cedro with a technique called “X Macros”.

The X macro technique was used extensively in the operating system and utilities for the DECsystem-10 as early as 1968, and probably dates back further to PDP-1 and TX-0 programmers at MIT.

Randy Meyers in «The New C: X Macros», Dr.Dobb’s 2001-05-01
Cedro

typedef enum {
#foreach { V { \
    SPACE, NUMBER, \
    KEYWORD, IDENTIFIER, OPERATOR \
  }
  T_##V#,
#foreach }
} TokenType;

X macro

#define LIST_OF_VARIABLES \
  X(SPACE), X(NUMBER), \
  X(KEYWORD), X(IDENTIFIER), X(OPERATOR)

typedef enum {
#define X(V) T_##V
  LIST_OF_VARIABLES
#undef X
} TokenType;

#undef LIST_OF_VARIABLES

Cedro

#foreach { VALUES {{ \
    SPACE, NUMBER, \
    KEYWORD, IDENTIFIER, OPERATOR \
  }}
typedef enum {
#foreach { V VALUES
  T_##V#,
#foreach }
} TokenType;

const char* const TokenType_STRING[] = {
#foreach { V VALUES
  #V#,
#foreach }
};
#foreach }

X macro

#define LIST_OF_VARIABLES \
  X(SPACE), X(NUMBER), \
  X(KEYWORD), X(IDENTIFIER), X(OPERATOR)

typedef enum {
#define X(V) T_##V
  LIST_OF_VARIABLES
#undef X
} TokenType;

const char* const TokenType_STRING[] = {
#define X(V) #V
  LIST_OF_VARIABLES
#undef X
};

#undef LIST_OF_VARIABLES

Binary inclusion: #binary-include

Inserts a file as a byte array, or as a literal string as described below.

The name of the file is relative to the including C file.

#include <stdint.h>
#pragma Cedro 1.0

const uint8_t image[] = {
#embed "../images/cedro-32x32.png"
};

#include <stdint.h>

const uint8_t image[] = { /* cedro-32x32.png */
0x89,0x50,0x4E,0x47,0x0D,0x0A,0x1A,…
0x00,0x00,0x00,0x20,0x00,0x00,0x00,…
⋮
};

The form #include {...} was the original one in Cedro and is kept for now for backwards compatibility. The form #embed "..." follows the C23 standard (“N3017 #embed - a scannable, tooling-friendly binary resource inclusion mechanism”), except that it only accepts paths delimited by quotes (""), not by chevrons (<>).

Note: when using #include { there must be exactly one space between #include and the brace {.

#include <stdint.h>
#pragma Cedro 1.0

const uint8_t image
#include {../images/cedro-32x32.png}
;

This form is obsolete, and will be removed.

#include <stdint.h>

const uint8_t image
[1559] = { /* cedro-32x32.png */
0x89,0x50,0x4E,0x47,0x0D,0x0A,0x1A,…
0x00,0x00,0x00,0x20,0x00,0x00,0x00,…
⋮
};

#embed is more flexible because it allows adding bytes before or after the inserted file, and combining several files.

#pragma Cedro 1.0
const char const fragment_shader[] = {
#embed "shader.frag.glsl"
, 0x00 // Zero-terminated string.
};

const char const fragment_shader[] = { /* shader.frag.glsl */
0x23,0x76,0x65,0x72,0x73,0x69,0x6F,0x6E,…
0x65,0x63,0x69,0x73,0x69,0x6F,0x6E,0x20,…
⋮
, 0x00 // Zero-terminated string.
};
#pragma Cedro 1.0
const char const two_lines[] = {
#embed "text-line-1.txt"
, '\n',
#embed "text-line-2.txt"
, 0x00
};

const char const two_lines[] = {
0x46,0x69,0x72,0x73,0x74,0x20,0x6C,0x69,0x6E,0x65,0x2E
, '\n',
0x53,0x65,0x63,0x6F,0x6E,0x64,0x20,0x6C,0x69,0x6E,0x65,0x2E
, 0x00
};

Embed as string

Instead of inserting byte literals one by one, they can be put all at once in a literal string with the option --embed-as-string=<limit>, although this variant has certain limitations depending on the C compiler that will be used with the result.

For instance, the Microsoft C compiler has a limit of 2048 bytes for each string literal, with a maximum of 65535 after concatenating the strings that appear together. (See “Maximum String Length”)

The ANSI/ISO C standard requires all compilers to accept at least 509 bytes in total after concatenation in C89, and 4095 in C99/C11/C17, but other compilers such as GCC and clang allow much bigger strings.

That’s why it’s necessary to specify a limit for the size: if the file goes over that limit, the result will be an array of bytes instead of a string.

In an informal test with /usr/bin/time -v, taking the values of the fastest run with an 8 MB file, on an IBM Power9 CPU with the files on a RAM disk, the code generation took 26% less time than when using hexadecimal bytes, and the compilation with GCC was 28 times faster, using 10 times less memory. With clang the compilation was 72 times faster than with bytes, using 7 times less memory.

                           Generate code      Compile with GCC 11     Compile with clang 12
cedro                      0.19 s   1.80 MB   27.66 s   1328.13 MB    30.44 s   984.59 MB
cedro --embed-as-string=…  0.14 s   1.63 MB    0.99 s    125.32 MB     0.42 s   144.95 MB
bin2c                      0.03 s   1.37 MB    0.90 s    108.70 MB     0.52 s   145.73 MB

Compared with bin2c, the code generation took about five times longer, and the compilation times are very similar (±100 ms; bin2c’s result compiles faster with GCC, Cedro’s with clang) because the format is almost the same.

#pragma Cedro 1.0
const char const two_lines[] = {
#embed "text-line-1.txt"
, '\n',
#embed "text-line-2.txt"
, 0x00
};

--embed-as-string=30

const char const two_lines[25] = /* text-line-1.txt */
"First line.""\n"
/* text-line-2.txt */
"Second line.";

Note how Cedro notices that the last byte is zero and removes it: since there is a string just before it, the compiler will add the zero terminator automatically.

#pragma Cedro 1.0
const char const fragment_shader[] = {
#embed "shader.frag.glsl"
, 0x00 // Zero-terminated string.
};

--embed-as-string=170

const char const fragment_shader[164] = /* shader.frag.glsl */
"#version 140\n"
"\n"
"precision highp float; // needed only for version 1.30\n"
"\n"
"in vec3 ex_Color;\n"
"out vec4 out_Color;\n"
"\n"
"void main(void)\n"
"{\n"
"\tout_Color = vec4(ex_Color,1.0);\n"
"}\n";

#include <stdint.h>
#pragma Cedro 1.0

const uint8_t image[] = {
#embed "../images/cedro-32x32.png"
};

--embed-as-string=1600

#include <stdint.h>

const uint8_t image[1559] = /* cedro-32x32.png */
"\211PNG\r\n"
"\032\n"
"\000\000\000\rIHDR\000\000\000 \000\000\000 \b\002…"
⋮
…";

#include <stdint.h>
#pragma Cedro 1.0

const uint8_t image
#include {../images/cedro-32x32.png}
;

This form is obsolete, and will be removed.

--embed-as-string=1600

#include <stdint.h>

const uint8_t image
[1559] = /* cedro-32x32.png */
"\211PNG\r\n"
"\032\n"
"\000\000\000\rIHDR\000\000\000 \000\000\000 \b\002…"
⋮
…";

Directly inserting the code in the program is very convenient, but it slows down compilation. The way to reduce the problem, apart from using --embed-as-string=<limit>, is to compile this part separately, as can be seen in template/Makefile.nanovg.mk or in this example (another way would be to use precompiled headers):

assets.c

#include <stdint.h>
#include <stdlib.h>
#pragma Cedro 1.0

const uint8_t image[] = {
#embed "../images/cedro-32x32.png"
};
const size_t sizeof_image = sizeof(image);

main.c

#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>

extern const uint8_t image[];
extern const size_t sizeof_image;

int main(void)
{
  unsigned int sum = 0;
  for (size_t i = 0; i < sizeof_image; ++i) {
    sum += image[i];
  }
  fprintf(stderr, "The sum (modulo UINT_MAX=%u) of the image bytes is %u.\n",
          UINT_MAX, sum);
}

cedrocc -c -o assets.o assets.c -std=c99
cedrocc -c -o main.o main.c -std=c99
cc -o program main.o assets.o

This feature is an old idea with several implementations, for instance xxd (as xxd -i, man page), which I used many years ago and which has had this feature since 1994.

More recently, the include_bytes!() macro has been very useful to me in my Rust programs.

I got the idea of producing string literals instead of byte arrays from this comment:

the way that you’ve lowered them is absolutely the worst case for compiler performance. The compiler needs to create a literal constant object for every byte and an array object wrapping them. If you lower them as C string literals with escapes, you will generate code that compiles much faster. For example, the cedro-32x32.png example lowered as “\x89\x50\x4E\x47\x0D\x0A\x1A…” will be faster and use less memory in the C compiler.

David Chisnall in Lobsters, 2021-08-12

I did not realize that, you are right of course! I know there are limits to the size of string literals, but maybe that does not apply if you split them. I’ll have to check that out.

EDIT: I’ve just found out that bin2c (which I knew existed but haven’t used) does work in the way you describe, with strings instead of byte arrays: https://github.com/adobe/bin2c#comparison-to-other-tools It does mention the string literal size limit. I suspect you know, but for others reading this: the C standard defines some sizes that all compilers must support as a minimum, and one of them is the string literal maximum size. Compilers are free to allow bigger tokens when parsing.

I’m concerned that it would be a problem, because as I hinted above my use case includes compiling on old platforms with outdated C compilers (sometimes for fun, others because my customers need that) so it is important that cedro does not fail any more than strictly necessary when running on unusual machines.

Thinking about it, I could use strings when under the length limit, but those would be the cases where the performance difference would be small. I’ll keep things like this for now, but thanks to you I’ll take these aspects into account. EDIT END.

Alberto González Palomo in Lobsters, 2021-08-12

An example of this method is Adobe’s bin2c, published in 2020 (not the same as the bin2c of 2012, which produces byte literals like xxd). Although I haven’t looked at its source code, Cedro follows the specification in its documentation except for the line endings, where Cedro splits the string literal in the same way as is usually done by hand. Cedro also limits the size of each individual string to 500 bytes in the hope that it works on old compilers.


Number literals: #number-literals

Allows using digit separators (' or _) and binary literals (0b…).

Starting from C23, the apostrophe separator and the binary literals are standard. If your compiler accepts C23, you can use the option --c23 to leave them as they are.

I prefer the underscore, but the C23 committee could not use it because of compatibility with C++, which could not use it because it conflicts with user-defined literal suffixes, which already use the underscore:

The syntax of digit separators is ripe for bikeshed discussions about what character to use as the separator token. The most common suggestions from the community are to use underscore (_), a single quote ('), or whitespace as these seem like they would not produce any lexical ambiguities. However, two of these suggestions turn out to have usability issues in practice.

[...] Use of an underscore character is also problematic, but only for C++. C++ has the ability for users to define custom literal suffixes [WG21 N2765], and these suffixes are required to be named with a legal C++ identifier, which includes the underscore character. Using an underscore as a digit separator then becomes ambiguous for a construct like 0x1234_a (does the _a require a lookup for a user-defined literal suffix or is it part of the hexadecimal constant?).

Aaron Ballman in N2606 “Digit Separators”.

Several compilers, for instance GCC starting from v4.3, allow binary literals as extensions to the C language.

123'45'67     →  1234567
123_45_67     →  1234567
123'45_67     →  1234567
0b10100110    →  0xA6
0b_1010_0110  →  0xA6
0b'1010_0110  →  0xA6