Cedro is a C language extension that works as a pre-processor with eight features:
1. The backstitch @ operator: x@ f(), g(y); → f(x); g(x, y);
2. Deferred resource release: auto ... or defer ...
3. Break out of nested loops: break label;
4. Array slices: array[start..end]
5. Block macros: #define { ... #define }
6. Loop macros: #foreach { ... #foreach }
7. Binary inclusion: #embed "..."
8. Better number literals: 12'34 | 12_34 → 1234, 0b1010 → 0xA
To activate it, the source file must contain this line:
#pragma Cedro 1.0
Otherwise, the file is copied directly to the output.
This line can contain certain options, for instance
#pragma Cedro 1.0 defer
to activate the use of defer instead of auto.
The source code (Apache 2.0 license) can be found at the library.
To compile it, see Compile. cedro only uses the standard C functions, while cedrocc and cedro-new require POSIX.
Usage: cedro [options] <file.c>…
cedro new <name> # Runs: cedro-new <name>
To read from stdin, put - instead of <file.c>.
The result goes to stdout, so it can be compiled without an intermediate file:
cedro file.c | cc -x c - -o file
That is what the cedrocc program does:
cedrocc -o file file.c
With cedrocc, the following options are the defaults:
--insert-line-directives
Code gets modified only after the line `#pragma Cedro 1.0`,
which can include certain options: `#pragma Cedro 1.0 defer`
--apply-macros Applies the macros: backstitch, defer, etc. (default)
--no-apply-macros Does not apply the macros.
--escape-ucn Escape non-ASCII in identifiers as UCN.
--no-escape-ucn Does not escape non-ASCII in identifiers. (default)
--discard-comments Discards the comments.
--discard-space Discards all whitespace.
--no-discard-comments Does not discard comments. (default)
--no-discard-space Does not discard whitespace. (default)
--insert-line-directives Insert `#line` directives.
--no-insert-line-directives Does not insert `#line` directives. (default)
--embed-as-string=<limit> Use string literals instead of bytes
for files smaller than <limit>.
Default value: 0
--c99 Produces source code for C99. (default)
Removes the digit separators (“'” | “_”),
converts binary literals into hexadecimal (“0b1010” → “0xA”),
and expands `#embed`.
It is not a translator between different C versions.
--c89 Produces source code for C89/C90. For now it is the same as --c99.
--c11 Produces source code for C11. For now it is the same as --c99.
--c17 Produces source code for C17. For now it is the same as --c99.
--c23 Produces source code for C23.
For instance, maintains the digit separators (“'” | “_” → “'”),
the binary literals (“0b1010” → “0b1010”),
and does not expand the `#embed` directives.
--print-markers Prints the markers.
--no-print-markers Does not print the markers. (default)
--benchmark Run a performance benchmark.
--validate=ref.c Compares the input to the given “ref.c” file.
Does not apply any macros: to compare the result of running Cedro
on a file, pipe its output through this option, for instance:
`cedro file.c | cedro - --validate=ref.c`
--version Show version: 1.0
The corresponding “pragma” is: `#pragma Cedro 1.0`
The option --escape-ucn encodes Unicode® characters outside of the ASCII range, when they appear as part of an identifier, as C99 universal character names (“C99 standard”, page 65, “6.4.3 Universal character names”), which can be useful for older compilers that do not accept UTF-8 in identifiers, such as GCC before version 10.
For API documentation, run make doc, which requires Doxygen to be installed.
Compile
The simplest way is cc -o bin/cedro src/cedro.c, but it’s more convenient to use make:
$ make help
Available targets:
release: optimized build, assertions disabled, NDEBUG defined.
→ bin/cedro*
debug: debugging build, assertions enabled.
→ bin/cedro*-debug
static: same as release, only statically linked.
→ bin/cedro*-static
doc: build documentation with Doxygen https://www.doxygen.org
→ doc/api/index.html
test: build both release and debug, then run test suite.
check: apply several static analysis tools:
sparse: https://sparse.docs.kernel.org/
valgrind: https://valgrind.org/
gcc -fanalyzer:
https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html
clean: remove bin/ directory, and clean also inside doc/.
cedrocc
The second executable, cedrocc, allows using Cedro as if it were part of the C compiler.
Usage: cedrocc [options] <file.c> [<file2.o>…]
Runs Cedro on the first file name that ends with “.c”,
and compiles the result with “cc -x c - -x none” plus the other arguments.
cedrocc -o file file.c
cedro file.c | cc -x c - -o file
The options get passed as is to the compiler, except for those that
start with “--cedro:…” that correspond to cedro options,
for instance “--cedro:escape-ucn” is like “cedro --escape-ucn”.
The following option is the default:
--cedro:insert-line-directives
Some GCC options activate Cedro options automatically:
--std=c90|c89|iso9899:1990|iso8999:199409 → --cedro:c90
--std=c99|c9x|iso9899:1999|iso9899:199x → --cedro:c99
--std=c11|c1x|iso9899:2011 → --cedro:c11
--std=c17|c18|iso9899:2017|iso9899:2018 → --cedro:c17
--std=c2x → --cedro:c23
In addition, for each `#include`, if it finds the file it reads it and
if it finds `#pragma Cedro 1.0` processes it and inserts the result
in place of the `#include`.
You can specify the compiler, e.g. `gcc`:
CEDRO_CC='gcc -x c - -x none' cedrocc …
For debugging, this writes the code that would be piped into `cc`,
into `stdout` instead:
CEDRO_CC='' cedrocc …
If you get an error message like “embedding a directive within macro arguments is not portable” (GCC) or “embedding a directive within macro arguments has undefined behavior” (clang), it means that you’re using Cedro with --insert-line-directives on code that is inside the arguments of a macro call, so the inserted #line directives end up inside those arguments. You can either expand the code given to the macro manually, or avoid --insert-line-directives by replacing cedrocc -o file file.c with cedro file.c | cc -x c - -o file.
cedrocc does another thing in addition to cedro file.c | cc -x c - -o file: for each #include, if it finds the file (-I ...), it looks inside for #pragma Cedro 1.0 and, if found, processes the file and inserts the result in place of the #include. The reason is to be able to compile programs that use Cedro in several files in one go, instead of having to transform each of them into temporary files to be compiled later.
cedro-new
There is a third executable, cedro-new, that produces a program draft in a similar way to cargo new in Rust. Running cedro new … actually runs cedro-new …, and the content is produced from the template that is built into the cedro-new executable when it is compiled.
Usage: cedro-new [options] <name>
Creates a directory named <name>/ with the template.
-h, --help Shows this message.
-i, --interactive Asks for the names for command and project.
Otherwise, they will be derived from the directory name.
When producing the draft, certain patterns get replaced:
- {#year}: the current year.
- {#Author}: name and email address reported by git config user.name and … user.email if they are available, and if not “Your Name Here <[email protected]>”.
- {#Template}: the project name, e.g. “Cedro”.
- {#template}: the program name, e.g. “cedro”.
- {#TEMPLATE}: the program name in uppercase, e.g. “CEDRO”.
The template includes several project drafts that can be activated in the generated …:
- Command line (“CLI”) tool that identifies the type of each argument, and uses a btree to count how many times each one is repeated. This gets built if the …
- Graphical application with nanovg. Downloads (with curl) and automatically compiles nanovg, GLFW, and GLEW:
  include Makefile.nanovg.mk
  MAIN=src/main.nanovg.c
- HTTP/1.1 server using libuv. Downloads (with curl) and automatically compiles libuv:
  include Makefile.libuv.mk
  MAIN=src/main.libuv.c
Threads a value through a sequence of function calls, as the first parameter of each of them.
It is an explicit version of what other programming languages do to implement member functions, and the result follows a common pattern in C libraries.
Note: the @ symbol is not recognized when written as \u0040, but it gets converted to @ in the output. This serves to escape it when chaining Cedro with another pre-processor that uses it.
object @ f(a), g(b); |
f(object, a);
g(object, b); |
&object @ f(a), g(b); |
f(&object, a);
g(&object, b); |
object.field @ f(a), g(b); |
f(object.field, a);
g(object.field, b); |
int x = (object @ f(a), g(b)); |
int x = (f(object, a), g(object, b));
This uses the C comma operator; it’s the same as f(object, a); int x = g(object, b); |
object @prefix_... f(a), g(b); |
prefix_f(object, a);
prefix_g(object, b); |
object @..._suffix f(a), g(b); |
f_suffix(object, a);
g_suffix(object, b); |
graphics_context @nvg...
BeginPath(),
Rect(100,100, 120,30),
Circle(120,120, 5),
PathWinding(NVG_HOLE),
FillColor(nvgRGBA(255,192,0,255)),
Fill();
|
nvgBeginPath(graphics_context);
nvgRect(graphics_context, 100,100, 120,30);
nvgCircle(graphics_context, 120,120, 5);
nvgPathWinding(graphics_context, NVG_HOLE);
nvgFillColor(graphics_context, nvgRGBA(255,192,0,255));
nvgFill(graphics_context);
|
For each comma-separated segment, if it starts with any of the tokens “[”, “++”, “--”, “.”, “->”, “=”, “+=”, “-=”, “*=”, “/=”, “%=”, “<<=”, “>>=”, “&=”, “^=”, “|=”, or if there is nothing that looks like a function call, the insertion point is the beginning of the segment:
number_array @ [3]=44, [2]=11; |
number_array[3]=44; number_array[2]=11; |
*number_array++ @ = 1, = 2; |
*number_array++ = 1; *number_array++ = 2; |
figure_center_point @ .x=44, .y=11; |
figure_center_point.x=44; figure_center_point.y=11; |
Complex expressions can be used as prefixes by putting them to the left of the @ and leaving the ellipsis without prefix or suffix:
// ngx_http_sqlite_module.c#L802
(*chain->last)->buf->@ ...
pos = u_str,
last = u_str + ns.len,
memory = 1; |
// ngx_http_sqlite_module.c#L802
(*chain->last)->buf->pos = u_str;
(*chain->last)->buf->last = u_str + ns.len;
(*chain->last)->buf->memory = 1; |
The object part can be left empty, which is useful for things like adding prefixes or suffixes to enumerations:
typedef enum {
@TOKEN_... SPACE, WORD, NUMBER
} TokenType;
|
typedef enum {
TOKEN_SPACE, TOKEN_WORD, TOKEN_NUMBER
} TokenType;
|
// http://docs.libuv.org/en/v1.x/guide/threads.html#core-thread-operations
// `hare` and `tortoise` are functions.
int main() {
int tracklen = 10;
@uv_thread_...
t hare_id,
t tortoise_id,
create(&hare_id, hare, &tracklen),
create(&tortoise_id, tortoise, &tracklen),
join(&hare_id),
join(&tortoise_id);
return 0;
}
|
// http://docs.libuv.org/en/v1.x/guide/threads.html#core-thread-operations
// `hare` and `tortoise` are functions.
int main() {
int tracklen = 10;
uv_thread_t hare_id;
uv_thread_t tortoise_id;
uv_thread_create(&hare_id, hare, &tracklen);
uv_thread_create(&tortoise_id, tortoise, &tracklen);
uv_thread_join(&hare_id);
uv_thread_join(&tortoise_id);
return 0;
}
|
function(a, @prefix_... b, c) |
function(a, prefix_b, prefix_c) |
The segments part can also be left empty to add either a prefix or a suffix to an identifier:
Next(reader) @xmlTextReader...; |
xmlTextReaderNext(reader); |
get(&vector, index) @..._Byte_vec; |
get_Byte_vec(&vector, index); |
function(a, b @prefix_..., c) |
function(a, prefix_b, c) |
It is a left-associative operator:
object @ f(a) @ g(b); |
g(f(object, a), b); |
x @ one() @ two() @ three() @ four(); |
four(three(two(one(x)))); |
Looking for prior implementations of this idea, I’ve found magma (2014), where it is called doto. It is a macro for the cmacro pre-processor, which has the drawback of needing the Common Lisp SBCL compiler in addition to the C compiler.
Clojure also has a macro called doto which works in a similar manner. For instance, the equivalent of f₁(x); f₂(x); f₃(x);:
Language/tool | Operator | Name | Example |
---|---|---|---|
Magma | doto | doto macro | doto(x) { f₁(); f₂(); f₃(); } |
Clojure | doto | doto macro | (doto x f₁ f₂ f₃) |
Cedro | @ | backstitch macro | x @ f₁(), f₂(), f₃() |
Functional languages often have a similar operator, without the ability to thread the same value through several functions. For instance, the equivalent of f₃(f₂(f₁(x))):
Language | Operator | Name | Example |
---|---|---|---|
Shell | \| | pipe operator | echo x \| f₁ \| f₂ \| f₃ |
Haskell | & | reverse application operator | x & f₁ & f₂ & f₃ |
OCaml | \|> | reverse application operator | x \|> f₁ \|> f₂ \|> f₃ |
Elixir | \|> | pipe operator | x \|> f₁ \|> f₂ \|> f₃ |
Clojure | -> | threading macro | (-> x f₁ f₂ f₃) |
Cedro | @ | backstitch macro | x @ f₁() @ f₂() @ f₃() |
Ada 2005 introduced a feature called prefixed-view notation, which is closer to C++ because the exact function being called cannot be determined without knowing which methods are implemented for the object’s type.
Moves the clean-up code for a variable to the end of its scope, including the exit points break, continue, goto, and return.
In C, resources must be released back to the system explicitly once they are no longer needed, which usually happens quite far from the place where they were allocated. As time passes and changes accumulate in the program, it’s easy to forget to release them in some cases, or to release a resource twice.
Other programming languages have mechanisms for automatic resource release: C++, for instance, uses functions called destructors that get run implicitly when exiting a variable’s scope.
The programming language Go introduced an explicit notation called “defer” that fits the style of C better. The first difference is that in Go, all releases happen when exiting the function, while with Cedro the releases happen when exiting each block, as C++ destructors do.
There are more differences: for instance, in Go it can be used to modify the return value of the function, and Cedro does not even try to handle longjmp(), exit(), thrd_exit(), etc., because it could only apply the deferred actions in the current function, not in any function that called this one. See “A defer mechanism for C” (academic paper published as PDF at the SAC’21 conference) for a compiler-level implementation that does handle longjmp() and stack unwinding.
In Cedro, the release function is marked with the C keyword auto, which is not needed in standard C code before C23 because it is the default, and can be replaced with signed as it has the same effect.
It is also possible to use defer instead of auto by adding the keyword “defer” to the “pragma”:
#pragma Cedro 1.0 defer
#pragma Cedro 1.0
…
char* text = malloc(count + 1);
if (!text) return ENOMEM;
auto free(text);
…
if (file_name) {
FILE* file = fopen(file_name, "w");
if (!file) return errno;
auto fclose(file);
fwrite(text, sizeof(char), count, file);
… |
#pragma Cedro 1.0 defer
…
char* text = malloc(count + 1);
if (!text) return ENOMEM;
defer free(text);
…
if (file_name) {
FILE* file = fopen(file_name, "w");
if (!file) return errno;
defer fclose(file);
fwrite(text, sizeof(char), count, file);
… |
In this example, the text buffer and the file must be released back to the system:
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#pragma Cedro 1.0
int repeat_letter(char letter, size_t count,
char* file_name)
{
char* text = malloc(count + 1);
if (!text) return ENOMEM;
auto free(text);
for (size_t i = 0; i < count; ++i) {
text[i] = letter;
}
text[count] = 0;
if (file_name) {
FILE* file = fopen(file_name, "w");
if (!file) return errno;
auto fclose(file);
fwrite(text, sizeof(char), count, file);
fputc('\n', file);
}
printf("Repeated %zu times: %s\n",
count, text);
return 0;
}
int main(void)
{
return repeat_letter('A', 6, "aaaaaa.txt");
} |
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
int repeat_letter(char letter, size_t count,
char* file_name)
{
char* text = malloc(count + 1);
if (!text) return ENOMEM;
for (size_t i = 0; i < count; ++i) {
text[i] = letter;
}
text[count] = 0;
if (file_name) {
FILE* file = fopen(file_name, "w");
if (!file) {
free(text);
return errno;
}
fwrite(text, sizeof(char), count, file);
fputc('\n', file);
fclose(file);
}
printf("Repeated %zu times: %s\n",
count, text);
free(text);
return 0;
}
int main(void)
{
return repeat_letter('A', 6, "aaaaaa.txt");
} |
Compiling it with GCC or clang, on the left by running the compiler explicitly, and on the right using cedrocc:
$ cedro repeat.c | cc -o repeat -x c -
$ ./repeat
Repeated 6 times: AAAAAA
$ cat aaaaaa.txt
AAAAAA
$ valgrind --leak-check=yes ./repeat
…
==8795== HEAP SUMMARY:
==8795== in use at exit: 0 bytes in 0 blocks
==8795== total heap usage: 4 allocs, 4 frees,
5,599 bytes allocated
==8795==
==8795== All heap blocks were freed -- no leaks are possible
|
$ cedrocc -o repeat repeat.c
$ ./repeat
Repeated 6 times: AAAAAA
$ cat aaaaaa.txt
AAAAAA
$ valgrind --leak-check=yes ./repeat
…
==8795== HEAP SUMMARY:
==8795== in use at exit: 0 bytes in 0 blocks
==8795== total heap usage: 4 allocs, 4 frees,
5,599 bytes allocated
==8795==
==8795== All heap blocks were freed -- no leaks are possible
|
In this example adapted from “Proposal for C2x, WG14 n2542, Defer Mechanism for C” p. 40, the released resources are spin locks (the difference, of course, is that in this case the spin_unlock() calls do not run after the panic):
/* Adapted from example in n2542.pdf#40 */
#pragma Cedro 1.0
int f1(void) {
puts("g called");
if (bad1()) { return 1; }
spin_lock(&lock1);
auto spin_unlock(&lock1);
if (bad2()) { return 1; }
spin_lock(&lock2);
auto spin_unlock(&lock2);
if (bad()) { return 1; }
/* Access data protected by the spinlock then force a panic */
completed += 1;
unforced(completed);
return 0;
}
|
/* Adapted from example in n2542.pdf#40 */
int f1(void) {
puts("g called");
if (bad1()) { return 1; }
spin_lock(&lock1);
if (bad2()) { spin_unlock(&lock1); return 1; }
spin_lock(&lock2);
if (bad()) { spin_unlock(&lock2); spin_unlock(&lock1); return 1; }
/* Access data protected by the spinlock then force a panic */
completed += 1;
unforced(completed);
spin_unlock(&lock2);
spin_unlock(&lock1);
return 0;
}
|
Andrew Kelley compared resource management between C and his Zig programming language in a 2019 presentation titled “The Road to Zig 1.0” at 29:21, and here I’ve re-created his C example using Cedro to produce the function as he showed it, except that Cedro does not know that the final for loop never ends, so it adds unnecessary resource release code after it.
// Example retrofitted from C example by Andrew Kelley:
// https://www.youtube.com/watch?v=Gv2I7qTux7g&t=1761s
#pragma Cedro 1.0
int main(int argc, char **argv) {
struct SoundIo *soundio = soundio_create();
if (!soundio) {
fprintf(stderr, "out of memory\n");
return 1;
}
auto soundio_destroy(soundio);
int err;
if ((err = soundio_connect(soundio))) {
fprintf(stderr, "unable to connect: %s\n", soundio_strerror(err));
return 1;
}
soundio_flush_events(soundio);
int default_output_index = soundio_default_output_device_index(soundio);
if (default_output_index < 0) {
fprintf(stderr, "No output device\n");
return 1;
}
struct SoundIoDevice *device = soundio_get_output_device(soundio, default_output_index);
if (!device) {
fprintf(stderr, "out of memory\n");
return 1;
}
auto soundio_device_unref(device);
struct SoundIoOutStream *outstream = soundio_outstream_create(device);
if (!outstream) {
fprintf(stderr, "out of memory\n");
return 1;
}
auto soundio_outstream_destroy(outstream);
outstream->format = SoundIoFormatFloat32NE;
outstream->write_callback = write_callback;
if ((err = soundio_outstream_open(outstream))) {
fprintf(stderr, "unable to open device: %s" "\n", soundio_strerror(err));
return 1;
}
if ((err = soundio_outstream_start(outstream))) {
fprintf(stderr, "unable to start device: %s\n", soundio_strerror(err));
return 1;
}
for (;;) soundio_wait_events(soundio);
}
|
// Example retrofitted from C example by Andrew Kelley:
// https://www.youtube.com/watch?v=Gv2I7qTux7g&t=1761s
int main(int argc, char **argv) {
struct SoundIo *soundio = soundio_create();
if (!soundio) {
fprintf(stderr, "out of memory\n");
return 1;
}
int err;
if ((err = soundio_connect(soundio))) {
fprintf(stderr, "unable to connect: %s\n", soundio_strerror(err));
soundio_destroy(soundio);
return 1;
}
soundio_flush_events(soundio);
int default_output_index = soundio_default_output_device_index(soundio);
if (default_output_index < 0) {
fprintf(stderr, "No output device\n");
soundio_destroy(soundio);
return 1;
}
struct SoundIoDevice *device = soundio_get_output_device(soundio, default_output_index);
if (!device) {
fprintf(stderr, "out of memory\n");
soundio_destroy(soundio);
return 1;
}
struct SoundIoOutStream *outstream = soundio_outstream_create(device);
if (!outstream) {
fprintf(stderr, "out of memory\n");
soundio_device_unref(device);
soundio_destroy(soundio);
return 1;
}
outstream->format = SoundIoFormatFloat32NE;
outstream->write_callback = write_callback;
if ((err = soundio_outstream_open(outstream))) {
fprintf(stderr, "unable to open device: %s" "\n", soundio_strerror(err));
soundio_outstream_destroy(outstream);
soundio_device_unref(device);
soundio_destroy(soundio);
return 1;
}
if ((err = soundio_outstream_start(outstream))) {
fprintf(stderr, "unable to start device: %s\n", soundio_strerror(err));
soundio_outstream_destroy(outstream);
soundio_device_unref(device);
soundio_destroy(soundio);
return 1;
}
for (;;) soundio_wait_events(soundio);
soundio_outstream_destroy(outstream);
soundio_device_unref(device);
soundio_destroy(soundio);
}
|
However, his Zig example had the unfair advantage of returning error values instead of printing error messages, which take more space. The following is a C function that matches the Zig version more closely:
// Example retrofitted from Zig example by Andrew Kelley:
// https://www.youtube.com/watch?v=Gv2I7qTux7g&t=1761s
#pragma Cedro 1.0
int main(int argc, char **argv) {
struct SoundIo *soundio = soundio_create();
if (!soundio) { return SoundIoErrorNoMem; }
auto soundio_destroy(soundio);
int err;
if ((err = soundio_connect(soundio))) return err;
soundio_flush_events(soundio);
const int default_output_index = soundio_default_output_device_index(soundio);
if (default_output_index < 0) return SoundIoErrorNoSuchDevice;
struct SoundIoDevice *const device = soundio_get_output_device(soundio, default_output_index);
if (!device) return SoundIoErrorNoMem;
auto soundio_device_unref(device);
struct SoundIoOutStream *const outstream = soundio_outstream_create(device);
if (!outstream) return SoundIoErrorNoMem;
auto soundio_outstream_destroy(outstream);
outstream->format = SoundIoFormatFloat32NE;
outstream->write_callback = write_callback;
if ((err = soundio_outstream_open(outstream))) return err;
if ((err = soundio_outstream_start(outstream))) return err;
while (true) soundio_wait_events(soundio);
}
|
// Example retrofitted from Zig example by Andrew Kelley:
// https://www.youtube.com/watch?v=Gv2I7qTux7g&t=1761s
int main(int argc, char **argv) {
struct SoundIo *soundio = soundio_create();
if (!soundio) { return SoundIoErrorNoMem; }
int err;
if ((err = soundio_connect(soundio))) {
soundio_destroy(soundio);
return err;
}
soundio_flush_events(soundio);
const int default_output_index = soundio_default_output_device_index(soundio);
if (default_output_index < 0) {
soundio_destroy(soundio);
return SoundIoErrorNoSuchDevice;
}
struct SoundIoDevice *const device = soundio_get_output_device(soundio, default_output_index);
if (!device) {
soundio_destroy(soundio);
return SoundIoErrorNoMem;
}
struct SoundIoOutStream *const outstream = soundio_outstream_create(device);
if (!outstream) {
soundio_device_unref(device);
soundio_destroy(soundio);
return SoundIoErrorNoMem;
}
outstream->format = SoundIoFormatFloat32NE;
outstream->write_callback = write_callback;
if ((err = soundio_outstream_open(outstream))) {
soundio_outstream_destroy(outstream);
soundio_device_unref(device);
soundio_destroy(soundio);
return err;
}
if ((err = soundio_outstream_start(outstream))) {
soundio_outstream_destroy(outstream);
soundio_device_unref(device);
soundio_destroy(soundio);
return err;
}
while (true) soundio_wait_events(soundio);
soundio_outstream_destroy(outstream);
soundio_device_unref(device);
soundio_destroy(soundio);
}
|
The Cedro version is much closer, but his point still stands because the plain C version needs a lot of repeated code and is more fragile. And of course Zig has many other great features.
Apart from the already mentioned “A defer mechanism for C”, there are macros that use a for loop as for (allocation and initialization; condition; release) { actions } [1] or other techniques [2].
Compilers like GCC and clang have non-standard features to do this, such as the __cleanup__ variable attribute.
Cedro does not have the limitation of the deferred code having to be a function: it can be a code block, with or without conditionals, which allows, for instance, emulating Zig’s errdefer by performing different actions in case of error:
char* allocate_block(size_t n, char** err_p)
{
char* result = malloc(n);
auto if (*err_p) {
free(result);
result = NULL;
}
if (n > 10) {
*err_p = "n is too big";
}
return result;
}
|
char* allocate_block(size_t n, char** err_p)
{
char* result = malloc(n);
if (n > 10) {
*err_p = "n is too big";
}
if (*err_p) {
free(result);
result = NULL;
}
return result;
}
|
Converts break label; or continue label; into goto label;. In C, break only exits one loop at a time, which is also a problem when the interruption comes from inside a switch block, because there break only exits the switch.
#include <stdio.h>
#include <stdlib.h>
#pragma Cedro 1.0
int main(int argc, char* argv[])
{
int x = 0, y = 0;
for (x = 0; next_x: x < 100; ++x) {
for (y = 0; y < 100; ++y) {
switch (x + y) {
case 157:
break found_number_decomposition;
case 11:
x = 37;
fprintf(stderr, "Jump from x=11 to x=%d\n", x);
--x;
continue next_x;
}
}
} found_number_decomposition:
if (x < 100 || y < 100) {
fprintf(stderr, "Found %d = %d + %d\n",
x + y, x, y);
}
return 0;
}
|
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[])
{
int x = 0, y = 0;
for (x = 0; x < 100; ++x) {
for (y = 0; y < 100; ++y) {
switch (x + y) {
case 157:
goto found_number_decomposition;
case 11:
x = 37;
fprintf(stderr, "Jump from x=11 to x=%d\n", x);
--x;
goto next_x;
}
}
next_x:;
} found_number_decomposition:
if (x < 100 || y < 100) {
fprintf(stderr, "Found %d = %d + %d\n",
x + y, x, y);
}
return 0;
}
|
The difference between break …, continue …, and goto … is in the restrictions:
- break label only allows forward jumps, and the label must go right after the end of the loop.
- continue label only allows backward jumps, and the label must go in the loop condition expression: for (i = 0; label: i < 10; ++i), while (label: i < 10).
- goto label has no restrictions.
It is part of the deferred resource release macro, so these jumps also run the pending clean-up code:
#include <stdio.h>
#include <stdlib.h>
#pragma Cedro 1.0
int main(int argc, char* argv[])
{
int x = 0, y = 0;
char *level1 = malloc(1);
auto free(level1);
for (x = 0; next_x: x < 100; ++x) {
char *level2 = malloc(2);
auto free(level2);
for (y = 0; y < 100; ++y) {
char *level3 = malloc(3);
auto free(level3);
switch (x + y) {
case 157:
break found_number_decomposition;
case 11:
x = 37;
fprintf(stderr, "Jump from x=11 to x=%d\n", x);
continue next_x;
}
}
} found_number_decomposition:
if (x < 100 || y < 100) {
fprintf(stderr, "Found %d = %d + %d\n",
x + y, x, y);
}
return 0;
}
|
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[])
{
int x = 0, y = 0;
char *level1 = malloc(1);
for (x = 0; x < 100; ++x) {
char *level2 = malloc(2);
for (y = 0; y < 100; ++y) {
char *level3 = malloc(3);
switch (x + y) {
case 157:
free(level3);
free(level2);
goto found_number_decomposition;
case 11:
x = 37;
fprintf(stderr, "Jump from x=11 to x=%d\n", x);
free(level3);
goto next_x;
}
free(level3);
}
next_x:;
free(level2);
} found_number_decomposition:
if (x < 100 || y < 100) {
fprintf(stderr, "Found %d = %d + %d\n",
x + y, x, y);
}
free(level1);
return 0;
}
|
With goto, it can’t be guaranteed in general that the resources will be released correctly, but with the restrictions of break … and continue … it does work.
$ bin/cedrocc test/defer-label-break.c -std=c99 -pedantic-errors -Wall -fanalyzer -o /tmp/find-number-decomposition
$ valgrind --leak-check=yes /tmp/find-number-decomposition
…
Jump from x=11 to x=37
Found 157 = 58 + 99
==1077==
==1077== HEAP SUMMARY:
==1077== in use at exit: 0 bytes in 0 blocks
==1077== total heap usage: 2,236 allocs, 2,236 frees, 6,683 bytes allocated
==1077==
==1077== All heap blocks were freed -- no leaks are possible
==1077==
==1077== For lists of detected and suppressed errors, rerun with: -s
==1077== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) |
The BLISS 11 programming language was the first to introduce labels for its leave keyword (analogous to C’s break), around 1974, and later other languages such as Java, JavaScript, and Go did the same with continue and break.
Converts array[start..end] into &array[start], &array[end]. The array/pointer value array can be just an identifier or, in general, an expression which, as it will be evaluated twice, must not have any side effects, as with arguments of standard C preprocessor macros that use them twice.
We can use it to improve the vector example from the loop macro section:
append(&words, animals[0..2]);
append(&words, plants[0..3]);
append(&words, animals[2..4]);
|
append(&words, &animals[0], &animals[2]);
append(&words, &plants[0], &plants[3]);
append(&words, &animals[2], &animals[4]);
|
The end of the slice can have a positive sign to indicate that it is a position relative to the start of the slice: array[start..+end] becomes &array[start], &array[start+end]. In this case, the advice about side effects being executed twice applies to start in addition to array.
append(&words, animals[0..+2]);
append(&words, plants[0..3]);
append(&words, animals[2..+2]);
|
append(&words, &animals[0], &animals[0+2]);
append(&words, &plants[0], &plants[3]);
append(&words, &animals[2], &animals[2+2]);
|
If the array expression is composed of more than one token, it gets wrapped in parentheses so that operator precedence is preserved:
append(&words, (uint8_t*)animals[0..+2]);
append(&words, (uint8_t*)plants[0..3]);
append(&words, (uint8_t*)animals[2..+2]);
|
append(&words, &((uint8_t*)animals)[0], &((uint8_t*)animals)[0+2]);
append(&words, &((uint8_t*)plants)[0], &((uint8_t*)plants)[3]);
append(&words, &((uint8_t*)animals)[2], &((uint8_t*)animals)[2+2]);
|
It can be used in initializers, in which case the braces can be omitted:
#include <stdio.h>
#pragma Cedro 1.0
typedef struct {
const char* a;
const char* b;
} char_slice_t;
const char* text = "uno dos tres";
/** Extract "dos" from `text`. */
int main(void)
{
char_slice_t slice = text[4..+3];
const char* cursor;
for (cursor = slice.a; cursor != slice.b; ++cursor) {
putc(*cursor, stderr);
}
putc('\n', stderr);
}
|
#include <stdio.h>
typedef struct {
const char* a;
const char* b;
} char_slice_t;
const char* text = "uno dos tres";
/** Extract "dos" from `text`. */
int main(void)
{
char_slice_t slice = { &text[4], &text[4+3] };
const char* cursor;
for (cursor = slice.a; cursor != slice.b; ++cursor) {
putc(*cursor, stderr);
}
putc('\n', stderr);
}
|
This complete example shows a scrolling text marquee:
#ifdef _WIN32
#include <windows.h>
// See https://stackoverflow.com/a/3930716/
int sleep_ms(int ms) { Sleep(ms); return 0; }
#else
#define _XOPEN_SOURCE 500
#include <unistd.h> // Deprecated but simple:
int sleep_ms(int ms) { return usleep(ms*1000); }
#endif
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#pragma Cedro 1.0
typedef char utf8_char[5]; // As C string.
void
print_slice(const utf8_char* start,
const utf8_char* end)
{
while (start < end) fputs(*start++, stdout);
}
int
main(int argc, char* argv[])
{
size_t display_size = 8;
if (argc < 2) {
fprintf(stderr, "Usage: marquee <text>\n");
exit(1);
}
const char separator[] = " *** ";
size_t len_bytes = strlen(argv[1]);
size_t len_separator = strlen(separator);
char* m = malloc(len_bytes + len_separator);
auto free(m);
memcpy(m, argv[1], len_bytes);
memcpy(m + len_bytes, separator, len_separator);
len_bytes += len_separator;
// Extract each character encoded as UTF-8,
// which needs up to 4 bytes + 1 terminator.
utf8_char* message =
malloc(sizeof(utf8_char) * len_bytes);
auto free(message);
size_t len = 0;
for (size_t end = 0; end < len_bytes;) {
const char b = m[end];
size_t u;
if (0xF0 == (b & 0xF8)) u = 4;
else if (0xE0 == (b & 0xF0)) u = 3;
else if (0xC0 == (b & 0xE0)) u = 2;
else u = 1;
if (end + u > len_bytes) break;
memcpy(&message[len], &m[end], u);
message[len][u] = '\0'; // Terminate the character just copied.
end += u;
++len;
}
if (len < 2) {
fprintf(stderr, "message text is too short.\n");
exit(2);
} else if (len < display_size) {
display_size = len - 1;
}
for (;;) {
for (int i = 0; i < len; ++i) {
int rest = i + display_size > len?
i + display_size - len: 0;
int visible = display_size - rest;
print_slice(message[i .. +visible]);
print_slice(message[0 .. rest]);
putc('\r', stdout);
fflush(stdout);
sleep_ms(300);
}
}
return 0;
}
| #ifdef _WIN32
#include <windows.h>
// See https://stackoverflow.com/a/3930716/
int sleep_ms(int ms) { Sleep(ms); return 0; }
#else
#define _XOPEN_SOURCE 500
#include <unistd.h> // Deprecated but simple:
int sleep_ms(int ms) { return usleep(ms*1000); }
#endif
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
typedef char utf8_char[5]; // As C string.
void
print_slice(const utf8_char* start,
const utf8_char* end)
{
while (start < end) fputs(*start++, stdout);
}
int
main(int argc, char* argv[])
{
size_t display_size = 8;
if (argc < 2) {
fprintf(stderr, "Usage: marquee <text>\n");
exit(1);
}
const char separator[] = " *** ";
size_t len_bytes = strlen(argv[1]);
size_t len_separator = strlen(separator);
char* m = malloc(len_bytes + len_separator);
memcpy(m, argv[1], len_bytes);
memcpy(m + len_bytes, separator, len_separator);
len_bytes += len_separator;
// Extract each character encoded as UTF-8,
// which needs up to 4 bytes + 1 terminator.
utf8_char* message =
malloc(sizeof(utf8_char) * len_bytes);
size_t len = 0;
for (size_t end = 0; end < len_bytes;) {
const char b = m[end];
size_t u;
if (0xF0 == (b & 0xF8)) u = 4;
else if (0xE0 == (b & 0xF0)) u = 3;
else if (0xC0 == (b & 0xE0)) u = 2;
else u = 1;
if (end + u > len_bytes) break;
memcpy(&message[len], &m[end], u);
message[len][u] = '\0'; // Terminate the character just copied.
end += u;
++len;
}
if (len < 2) {
fprintf(stderr, "message text is too short.\n");
exit(2);
} else if (len < display_size) {
display_size = len - 1;
}
for (;;) {
for (int i = 0; i < len; ++i) {
int rest = i + display_size > len?
i + display_size - len: 0;
int visible = display_size - rest;
print_slice(&message[i], &message[i+visible]);
print_slice(&message[0], &message[rest]);
putc('\r', stdout);
fflush(stdout);
sleep_ms(300);
}
}
free(message);
free(m);
return 0;
}
|
The notation [a..b] for array slices was first defined in Algol 68, where it was an alternative to the primary notation [a:b]. Other languages have adopted both forms since then. The [a..b] form is used in Ada, Perl, D, and Rust, for example.
Formats a multi-line macro into a single line.
C macros must be written on a single line. Sometimes you need to split them into several pseudo-lines, and maintaining all the newline escapes (\) is tedious and error-prone. If you add a brace, { or }, right after #define, Cedro does that for you:
#define { macro(A, B, C)
/// Version of f() for type A.
f_##A(B, C) /// Without semicolon “;” at the end.
#define }
int main(void) {
int x = 1, y = 2;
macro(int, x, y);
// → f_int(x, y);
}
|
#define macro(A, B, C) \
/** Version of f() for type A. */ \
f_##A(B, C) /** Without semicolon “;” at the end. */ \
/* End #define */
int main(void) {
int x = 1, y = 2;
macro(int, x, y);
// → f_int(x, y);
}
|
In cases like this ("function-like macros"), there is no semicolon after f_##A(B, C), so tools such as source code editors (e.g. Emacs) indent the code incorrectly. The solution is to keep the semicolon there for the editor and also add one after #define }, writing #define };. This tells Cedro to remove the semicolon from the definition.
#define { macro(A, B, C)
/// Version of f() for type A.
f_##A(B, C); /// Semicolon “;” removed by Cedro.
#define };
int main(void) {
int x = 1, y = 2;
macro(int, x, y);
// → f_int(x, y);
}
|
#define macro(A, B, C) \
/** Version of f() for type A. */ \
f_##A(B, C) /** Semicolon “;” removed by Cedro. */ \
/* End #define */
int main(void) {
int x = 1, y = 2;
macro(int, x, y);
// → f_int(x, y);
}
|
Preprocessor directives are not allowed inside macros, so you cannot use #if, #include, etc.
Note: the directive must start exactly with #define { or #define }, with exactly one space between #define and the brace.
Repeats the lines between #foreach { ... and #foreach }, replacing the given variables. These lines may contain macro definitions (#define), but the loop variables will not be expanded inside them.
As in #define, ## joins tokens: if, for example, T is float, Vec_##T produces Vec_float.
Inside the loop, any operator following a # (e.g. #,) is omitted in the last iteration.
typedef enum {
#foreach { V {SPACE, NUMBER, \
KEYWORD, IDENTIFIER, OPERATOR}
T_##V#,
#foreach }
} TokenType;
|
typedef enum {
T_SPACE,
T_NUMBER,
T_KEYWORD,
T_IDENTIFIER,
T_OPERATOR
} TokenType;
|
If a variable has the prefix #, the result is a string with its contents. For example, if T is float, char* name = #T; produces char* name = "float";
Loops can be nested, and the list of values can come from a variable defined in an outer loop.
#foreach { VALUES {{SPACE, NUMBER, \
KEYWORD, IDENTIFIER, OPERATOR}}
typedef enum {
#foreach { V VALUES
T_##V#,
#foreach }
} TokenType;
const char* const TokenType_STRING[] = {
#foreach { V VALUES
#V#,
#foreach }
};
#foreach }
|
typedef enum {
T_SPACE,
T_NUMBER,
T_KEYWORD,
T_IDENTIFIER,
T_OPERATOR
} TokenType;
const char* const TokenType_STRING[] = {
"SPACE",
"NUMBER",
"KEYWORD",
"IDENTIFIER",
"OPERATOR"
};
|
You can iterate over several variables in parallel using tuples of variables and values. The tuples must have the same number of elements:
#foreach { {TYPE, PREFIX, VALUES} { \
{ TokenType, T_, {SPACE, NUMBER, \
KEYWORD, IDENTIFIER, OPERATOR} }, \
{ PinConfig, M_, {INPUT, OUTPUT} } \
}
typedef enum {
#foreach { V VALUES
PREFIX##V#,
#foreach }
} TYPE;
const char* const TYPE##_STRING[] = {
#foreach { V VALUES
#V#,
#foreach }
};
#foreach }
|
typedef enum {
T_SPACE,
T_NUMBER,
T_KEYWORD,
T_IDENTIFIER,
T_OPERATOR
} TokenType;
const char* const TokenType_STRING[] = {
"SPACE",
"NUMBER",
"KEYWORD",
"IDENTIFIER",
"OPERATOR"
};
typedef enum {
M_INPUT,
M_OUTPUT
} PinConfig;
const char* const PinConfig_STRING[] = {
"INPUT",
"OUTPUT"
};
|
This example iterates over a list of fields to build a struct and the corresponding print_...() function.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#pragma Cedro 1.0
void print_double(double n, FILE* out)
{
fprintf(out, "%f", n);
}
typedef uint32_t Colour_ARGB;
void print_Colour_ARGB(Colour_ARGB c, FILE* out)
{
if ((c & 0xFF000000) == 0xFF000000) {
fprintf(out, "#%.6X", c & 0x00FFFFFF);
} else {
fprintf(out, "#%.8X", c);
}
}
#foreach { FIELDS {{ \
{ double, x, /** X position. */ }, \
{ double, y, /** Y position. */ }, \
{ Colour_ARGB, colour, /** Colour 32 bit. */ } \
}}
typedef struct Point {
#foreach { {TYPE, NAME, COMMENT} FIELDS
TYPE NAME; COMMENT
#foreach }
} Point;
void print_Point(Point point, FILE* out)
{
#foreach { {TYPE, NAME, COMMENT} FIELDS
print_##TYPE(point.NAME, out); putc('\n', out);
#foreach }
}
#foreach }
int main(void)
{
Point point = { .x = 12.3, .y = 4.56, .colour = 0xFFa3f193 };
print_Point(point, stderr);
}
| #include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
void print_double(double n, FILE* out)
{
fprintf(out, "%f", n);
}
typedef uint32_t Colour_ARGB;
void print_Colour_ARGB(Colour_ARGB c, FILE* out)
{
if ((c & 0xFF000000) == 0xFF000000) {
fprintf(out, "#%.6X", c & 0x00FFFFFF);
} else {
fprintf(out, "#%.8X", c);
}
}
typedef struct Point {
double x; /** X position. */
double y; /** Y position. */
Colour_ARGB colour; /** Colour 32 bit. */
} Point;
void print_Point(Point point, FILE* out)
{
print_double(point.x, out); putc('\n', out);
print_double(point.y, out); putc('\n', out);
print_Colour_ARGB(point.colour, out); putc('\n', out);
}
int main(void)
{
Point point = { .x = 12.3, .y = 4.56, .colour = 0xFFa3f193 };
print_Point(point, stderr);
}
|
This complete example defines variants of a variable-length array/vector for the types float, str, and cstr, each with its own append_Vec_##T() function. It then uses C11's _Generic to define a pseudo-polymorphic append() function.
#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h> // For memcpy().
typedef struct str { uint8_t* start; uint8_t* end; } str;
typedef char* cstr; // Type names must be single words.
#pragma Cedro 1.0
#foreach { TYPES_LIST {{float, str, cstr}}
#foreach { T TYPES_LIST
/** Vector (resizeable array) type. */
typedef struct {
T* _;
size_t len;
size_t capacity;
} Vec_##T;
/** Append a slice to a vector of elements of this type. */
bool
append_Vec_##T(Vec_##T *v, const T *start, const T *end)
{
const size_t to_add = (size_t)(end - start);
if (v->len + to_add > v->capacity) {
const size_t new_capacity = v->len +
(to_add < v->len? v->len: to_add);
T * const new_start =
realloc(v->_, new_capacity * sizeof(T));
if (!new_start) return false;
v->_ = new_start;
v->capacity = new_capacity;
}
memcpy(v->_ + v->len, start, to_add * sizeof(T));
v->len += to_add;
return true;
}
#foreach }
#foreach { DEFINE {#define} // Avoid joining lines.
DEFINE append(VEC, START, END) _Generic((VEC), \
#foreach { T TYPES_LIST
Vec_##T*: append_Vec_##T#, \
#foreach }
)(VEC, START, END)
#foreach }
#foreach }
#include <stdio.h>
int main(void)
{
Vec_cstr words = {0};
cstr animals[] = { "horse", "cat", "chicken", "dog" };
cstr plants [] = { "radish", "wheat", "tomato" };
append(&words, &animals[0], &animals[2]);
append(&words, &plants[0], &plants[3]);
append(&words, &animals[2], &animals[4]);
for (cstr *w = words._, *end = words._ + words.len;
w != end; ++w) {
fprintf(stderr, "Word: \"%s\"\n", *w);
}
return 0;
}
|
#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h> // For memcpy().
typedef struct str { uint8_t* start; uint8_t* end; } str;
typedef char* cstr; // Type names must be single words.
/** Vector (resizeable array) type. */
typedef struct {
float* _;
size_t len;
size_t capacity;
} Vec_float;
/** Append a slice to a vector of elements of this type. */
bool
append_Vec_float(Vec_float *v, const float *start, const float *end)
{
const size_t to_add = (size_t)(end - start);
if (v->len + to_add > v->capacity) {
const size_t new_capacity = v->len +
(to_add < v->len? v->len: to_add);
float * const new_start =
realloc(v->_, new_capacity * sizeof(float));
if (!new_start) return false;
v->_ = new_start;
v->capacity = new_capacity;
}
memcpy(v->_ + v->len, start, to_add * sizeof(float));
v->len += to_add;
return true;
}
/** Vector (resizeable array) type. */
typedef struct {
str* _;
size_t len;
size_t capacity;
} Vec_str;
/** Append a slice to a vector of elements of this type. */
bool
append_Vec_str(Vec_str *v, const str *start, const str *end)
{
const size_t to_add = (size_t)(end - start);
if (v->len + to_add > v->capacity) {
const size_t new_capacity = v->len +
(to_add < v->len? v->len: to_add);
str * const new_start =
realloc(v->_, new_capacity * sizeof(str));
if (!new_start) return false;
v->_ = new_start;
v->capacity = new_capacity;
}
memcpy(v->_ + v->len, start, to_add * sizeof(str));
v->len += to_add;
return true;
}
/** Vector (resizeable array) type. */
typedef struct {
cstr* _;
size_t len;
size_t capacity;
} Vec_cstr;
/** Append a slice to a vector of elements of this type. */
bool
append_Vec_cstr(Vec_cstr *v, const cstr *start, const cstr *end)
{
const size_t to_add = (size_t)(end - start);
if (v->len + to_add > v->capacity) {
const size_t new_capacity = v->len +
(to_add < v->len? v->len: to_add);
cstr * const new_start =
realloc(v->_, new_capacity * sizeof(cstr));
if (!new_start) return false;
v->_ = new_start;
v->capacity = new_capacity;
}
memcpy(v->_ + v->len, start, to_add * sizeof(cstr));
v->len += to_add;
return true;
}
#define append(VEC, START, END) _Generic((VEC), \
Vec_float*: append_Vec_float, \
Vec_str*: append_Vec_str, \
Vec_cstr*: append_Vec_cstr \
)(VEC, START, END)
#include <stdio.h>
int main(void)
{
Vec_cstr words = {0};
cstr animals[] = { "horse", "cat", "chicken", "dog" };
cstr plants [] = { "radish", "wheat", "tomato" };
append(&words, &animals[0], &animals[2]);
append(&words, &plants[0], &plants[3]);
append(&words, &animals[2], &animals[4]);
for (cstr *w = words._, *end = words._ + words.len;
w != end; ++w) {
fprintf(stderr, "Word: \"%s\"\n", *w);
}
return 0;
}
|
This can be done without Cedro with a technique called “X Macros”.
The X macro technique was used extensively in the operating system and utilities for the DECsystem-10 as early as 1968, and probably dates back further to PDP-1 and TX-0 programmers at MIT.
Randy Meyers in «The New C: X Macros», Dr. Dobb's, 2001-05-01
Cedro | X macro |
---|---|
typedef enum {
#foreach { V { \
SPACE, NUMBER, \
KEYWORD, IDENTIFIER, OPERATOR \
}
T_##V#,
#foreach }
} TokenType;
|
#define LIST_OF_VARIABLES \
X(SPACE), X(NUMBER), \
X(KEYWORD), X(IDENTIFIER), X(OPERATOR)
typedef enum {
#define X(V) T_##V
LIST_OF_VARIABLES
#undef X
} TokenType;
#undef LIST_OF_VARIABLES
|
#foreach { VALUES {{ \
SPACE, NUMBER, \
KEYWORD, IDENTIFIER, OPERATOR \
}}
typedef enum {
#foreach { V VALUES
T_##V#,
#foreach }
} TokenType;
const char* const TokenType_STRING[] = {
#foreach { V VALUES
#V#,
#foreach }
};
#foreach }
|
#define LIST_OF_VARIABLES \
X(SPACE), X(NUMBER), \
X(KEYWORD), X(IDENTIFIER), X(OPERATOR)
typedef enum {
#define X(V) T_##V
LIST_OF_VARIABLES
#undef X
} TokenType;
const char* const TokenType_STRING[] = {
#define X(V) #V
LIST_OF_VARIABLES
#undef X
};
#undef LIST_OF_VARIABLES
|
Inserts a file as a byte array, or as a literal string as described below.
The name of the file is relative to the including C file.
#include <stdint.h>
#pragma Cedro 1.0
const uint8_t image[] = {
#embed "../images/cedro-32x32.png"
};
|
#include <stdint.h>
const uint8_t image[] = {
/* cedro-32x32.png */
0x89,0x50,0x4E,0x47,0x0D,0x0A,0x1A,…
0x00,0x00,0x00,0x20,0x00,0x00,0x00,…
⋮
};
|
The older #include {...} form produces a similar result:
#include <stdint.h>
#pragma Cedro 1.0
const uint8_t image
#include {../images/cedro-32x32.png}
;
This form is obsolete, and will be removed.
|
#include <stdint.h>
const uint8_t image
[1559] = { /* cedro-32x32.png */
0x89,0x50,0x4E,0x47,0x0D,0x0A,0x1A,…
0x00,0x00,0x00,0x20,0x00,0x00,0x00,…
⋮
};
|
#embed is more flexible because it allows adding bytes before or after the inserted file and combining several files.
#pragma Cedro 1.0
const char fragment_shader[] = {
#embed "shader.frag.glsl"
, 0x00 // Zero-terminated string.
};
|
const char fragment_shader[] = {
/* shader.frag.glsl */
0x23,0x76,0x65,0x72,0x73,0x69,0x6F,0x6E,…
0x65,0x63,0x69,0x73,0x69,0x6F,0x6E,0x20,…
⋮
, 0x00 // Zero-terminated string.
};
|
#pragma Cedro 1.0
const char two_lines[] = {
#embed "text-line-1.txt"
, '\n',
#embed "text-line-2.txt"
, 0x00
};
|
const char two_lines[] = {
0x46,0x69,0x72,0x73,0x74,0x20,0x6C,0x69,0x6E,0x65,0x2E
, '\n',
0x53,0x65,0x63,0x6F,0x6E,0x64,0x20,0x6C,0x69,0x6E,0x65,0x2E
, 0x00
};
|
Instead of inserting byte literals one by one, you can put them all at once in a string literal with the option --embed-as-string=<limit>. This variant has certain limitations, depending on the C compiler that will be used with the result.
For instance, the Microsoft C compiler has a limit of 2048 bytes for each string literal, with a maximum of 65535 bytes after concatenating adjacent strings (see "Maximum String Length").
The ANSI/ISO C standard only requires compilers to accept string literals of at least 509 characters after concatenation in C89, and 4095 in C99/C11/C17. Compilers such as GCC and clang allow much bigger strings.
That’s why it’s necessary to specify a limit for the size: if the file goes over that limit, the result will be an array of bytes instead of a string.
In an informal test measured with /usr/bin/time -v, I took the fastest run values with an 8 MB file on an IBM Power9 CPU, with the files on a RAM disk. Code generation took 26% less time than when using hexadecimal bytes. Compilation with GCC was 28 times faster and used 10 times less memory. With clang, compilation was 72 times faster than with bytes and used 7 times less memory.
Command | Generate code: time | Generate code: memory | GCC 11: time | GCC 11: memory | clang 12: time | clang 12: memory |
---|---|---|---|---|---|---|
cedro | 0.19 s | 1.80 MB | 27.66 s | 1328.13 MB | 30.44 s | 984.59 MB |
cedro --embed-as-string=… | 0.14 s | 1.63 MB | 0.99 s | 125.32 MB | 0.42 s | 144.95 MB |
bin2c | 0.03 s | 1.37 MB | 0.90 s | 108.70 MB | 0.52 s | 145.73 MB |
Compared with bin2c, code generation took about five times longer. Compilation is very similar because the format is almost the same: the difference is within ±100 ms, with bin2c's result compiling faster on GCC and Cedro's on clang.
#pragma Cedro 1.0
const char two_lines[] = {
#embed "text-line-1.txt"
, '\n',
#embed "text-line-2.txt"
, 0x00
};
|
--embed-as-string=30
const char two_lines[25] =
/* text-line-1.txt */
"First line.""\n"
/* text-line-2.txt */
"Second line.";
|
Note how Cedro notices that the last byte is zero and removes it. The data now ends in a string literal, and the compiler adds the zero terminator automatically. |
|
#pragma Cedro 1.0
const char fragment_shader[] = {
#embed "shader.frag.glsl"
, 0x00 // Zero-terminated string.
};
|
--embed-as-string=170
const char fragment_shader[164] =
/* shader.frag.glsl */
"#version 140\n"
"\n"
"precision highp float; // needed only for version 1.30\n"
"\n"
"in vec3 ex_Color;\n"
"out vec4 out_Color;\n"
"\n"
"void main(void)\n"
"{\n"
"\tout_Color = vec4(ex_Color,1.0);\n"
"}\n";
|
#include <stdint.h>
#pragma Cedro 1.0
const uint8_t image[] = {
#embed "../images/cedro-32x32.png"
};
|
--embed-as-string=1600
#include <stdint.h>
const uint8_t image[1559] = /* cedro-32x32.png */
"\211PNG\r\n"
"\032\n"
"\000\000\000\rIHDR\000\000\000 \000\000\000 \b\002…"
⋮
…";
|
#include <stdint.h>
#pragma Cedro 1.0
const uint8_t image
#include {../images/cedro-32x32.png}
;
This form is obsolete, and will be removed.
|
--embed-as-string=1600
#include <stdint.h>
const uint8_t image
[1559] = /* cedro-32x32.png */
"\211PNG\r\n"
"\032\n"
"\000\000\000\rIHDR\000\000\000 \000\000\000 \b\002…"
⋮
…";
|
Embedding a file directly in the program is very convenient, but it slows down compilation. Apart from using --embed-as-string=<limit>, you can reduce the problem by compiling that part separately, as in this example:
// assets.c
#include <stdint.h>
#include <stdlib.h>
#pragma Cedro 1.0
const uint8_t image[] = {
#embed "../images/cedro-32x32.png"
};
const size_t sizeof_image = sizeof(image); |
// main.c
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>
extern const uint8_t image[];
extern const size_t sizeof_image;
int main(void)
{
unsigned int sum = 0;
for (size_t i = 0; i < sizeof_image; ++i) {
sum += image[i];
}
fprintf(stderr,
"The sum (modulo UINT_MAX+1, where UINT_MAX=%u) of the image bytes is %u.\n",
UINT_MAX, sum);
} |
cedrocc -c -o assets.o assets.c -std=c99
cedrocc -c -o main.o main.c -std=c99
cc -o program main.o assets.o |
This feature is an old idea, and there are several implementations. One is xxd with its -i option (see the xxd man page), which I used many years ago and which has offered it since 1994. More recently, the include_bytes!() macro has been very useful to me in my Rust programs.
I got the idea of producing string literals instead of byte arrays from this comment:
the way that you’ve lowered them is absolutely the worst case for compiler performance. The compiler needs to create a literal constant object for every byte and an array object wrapping them. If you lower them as C string literals with escapes, you will generate code that compiles much faster. For example, the cedro-32x32.png example lowered as “\x89\x50\x4E\x47\0D\x0A\x1A…” will be faster and use less memory in the C compiler.
David Chisnall in Lobsters, 2021-08-12
I did not realize that, you are right of course! I know there are limits to the size of string literals, but maybe that does not apply if you split them. I’ll have to check that out.
EDIT: I’ve just found out that bin2c (which I knew existed but haven’t used) does work in the way you describe, with strings instead of byte arrays: https://github.com/adobe/bin2c#comparison-to-other-tools It does mention the string literal size limit. I suspect you know, but for others reading this: the C standard defines some sizes that all compilers must support as a minimum, and one of them is the string literal maximum size. Compilers are free to allow bigger tokens when parsing.
I’m concerned that it would be a problem, because as I hinted above my use case includes compiling on old platforms with outdated C compilers (sometimes for fun, others because my customers need that) so it is important that cedro does not fail any more than strictly necessary when running on unusual machines.
Thinking about it, I could use strings when under the length limit, but those would be the cases where the performance difference would be small. I’ll keep things like this for now, but thanks to you I’ll take these aspects into account. EDIT END.
Alberto González Palomo in Lobsters, 2021-08-12
An example of this method is Adobe's bin2c, published in 2020. It is not the same as the bin2c of 2012, which produces byte literals like xxd. I haven't looked at its source code, but Cedro follows the specification in its documentation except for line endings: Cedro splits the string literal after each newline, as is usually done by hand. It also limits each individual string to 500 bytes, in the hope of working with old compilers.
Allows using digit separators (' or _) and binary literals (0b…).
Starting from C23, the apostrophe separator and binary literals are standard. If your compiler accepts C23, you can use the option --c23 to leave them as they are.
I prefer the underscore, but the C23 committee could not use it because of compatibility with C++. C++ could not use it either, because it conflicts with user-defined literal suffixes, which already use the underscore:
The syntax of digit separators is ripe for bikeshed discussions about what character to use as the separator token. The most common suggestions from the community are to use underscore (_), a single quote ('), or whitespace as these seem like they would not produce any lexical ambiguities. However, two of these suggestions turn out to have usability issues in practice.[...] Use of an underscore character is also problematic, but only for C++. C++ has the ability for users to define custom literal suffixes [WG21 N2765], and these suffixes are required to be named with a legal C++ identifier, which includes the underscore character. Using an underscore as a digit separator then becomes ambiguous for a construct like 0x1234_a (does the _a require a lookup for a user-defined literal suffix or is it part of the hexadecimal constant?).
Aaron Ballman in N2606 "Digit Separators"
Several compilers, for instance GCC starting from v4.3, allow binary literals as extensions to the C language.
Cedro | C |
---|---|
123'45'67 | 1234567 |
123_45_67 | 1234567 |
123'45_67 | 1234567 |
0b10100110 | 0xA6 |
0b_1010_0110 | 0xA6 |
0b'1010_0110 | 0xA6 |