CoffeeCat - programming language & source-to-source compiler

Status
Not open for further replies.

Camel

Well-known member
Hello all,

CoffeeCat is a personal project of mine that I feel is ready to be shown off a little. It is a programming language and a source-to-source compiler designed for rapid development of high-integrity embedded software. Basically, high-level source files get compiled into C++, which then gets compiled into an executable.

Github page

CoffeeCat gives you the performance and reliability of a C++ program with the simplicity of a scripted language. It does not have issues with heap fragmentation or memory corruption like micropython and it is immune to buffer overruns that are sometimes seen in C code (I'm looking at you, sprintf() ). It has been designed from the ground up with safety, integrity and performance as well as the constraints of embedded systems in mind.

The syntax could be described as pythonesque/C#ey. There are some examples here.

This project targets the STM32F103 chip and uses the ninjaskit HAL and CoffeeCat - github page

It is fast to develop with, much easier to maintain than C/C++, runs as fast as C/C++ (because it compiles into C++), and is safer than micropython (and badly written C/C++). :)

Sooo . . . yeah. I'm basically after feedback and stuff. Does anyone know if this has been done before? Would you use it in your projects? If not, why?

Cheers!
 
[..]This project targets the STM32F103 chip and uses the ...
[..]Would you use it in your projects? If not, why?[..]

No, because we use Teensy here, which uses a different chip :)
You posted in the wrong Forum.
 
Aah, ok. Perhaps I didn't word that sentence very well. The STM32 project is purely there as an example of the language and what it can do. There's absolutely no reason you couldn't use it with a Teensy; all you'd have to do is port ETK and you're good to go. It uses ETK under the hood for a lot of things, like arrays, lists, strings and memory pools. I think a Teensy port would be easy - I've got a few Teensy 3.2s lying around so maybe I'll do it myself.
Blinky
Code:
void setup():
    pinMode(LED_BUILTIN, OUTPUT)

void loop():
    digitalWrite(LED_BUILTIN, HIGH)
    delay(1000)
    digitalWrite(LED_BUILTIN, LOW)
    delay(1000)

Arduino de-bounce example
Code:
global int buttonPin = 2
global int ledPin = 13

global int ledState = HIGH
global int buttonState
global int lastButtonState = LOW

global uint32 lastDebounceTime = 0
global int32 debounceDelay = 50

void setup():
    pinMode(buttonPin, INPUT)
    pinMode(ledPin, OUTPUT)
    digitalWrite(ledPin, ledState)

void loop():
    int reading = digitalRead(buttonPin)

    if (reading != lastButtonState):
        lastDebounceTime = millis()

    if ((millis() - lastDebounceTime) > debounceDelay):
        if (reading != buttonState):
            buttonState = reading

            if (buttonState == HIGH):
                ledState = !ledState

    digitalWrite(ledPin, ledState)

    lastButtonState = reading

Admittedly, these examples don't show off much of the language, but at least you can see what it's like.
 
Sorry, this doesn't appear to be a step forward. From a quick glance at your posts and the github readme, it looks like CoffeeCat is merely syntactic sugar on a subset of C++. This approach might make things slightly easier for rank beginners, but any bugs in the language will trip them up and they'll need to unlearn it in order to move on.

If you want a scripting language on a microcontroller there are already plenty to choose from. If you want better safety, you'll need to dig much deeper into ideas, tools and languages that improve correctness. Notation is trivial. Correctness proofs are not.
 
You are right, it's all about syntactic sugar, rapid development & safety. I'm hoping that at some point there will be no bugs in the language, and that the ccat compiler will be able to pick up on ALL errors instead of leaving it to g++ as it does now. For now, I'm happy to work on just getting the language right & let g++ pick up the slack.

I've tried both Squirrel and micropython and they are both great fun. But they have some pretty major downsides. The first is execution speed: they are slow & non-deterministic. The second is that they tend to use heap memory in a frivolous manner. This means they are prone to issues such as heap fragmentation and corruption, and did I mention slowness & non-determinism?

The whole idea with this language is to provide an alternative to the current options that is more concise than C++. C++ is great, but it really throws the kitchen sink at you and the training wheels are off the whole time. It's easy to write bad code with C++. CoffeeCat is simple and concise, and gets its functionality from wrapping 'safe' C++ classes in syntactic sugar. It's hard to write inefficient or bad code with CoffeeCat.

Arrays, lists and strings are all bounds checked so buffer overruns are eliminated. Objects are passed as const references by default so you cannot accidentally deep copy objects. Everything is placed on the stack by default, but if you need to use memory dynamically you can easily use either a memory pool or the heap (memory pools are preferred because they are faster & safer).
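As a rough illustration of what bounds-checked, stack-resident containers passed by const reference look like on the C++ side, here is a minimal sketch. `CheckedArray`, `sum` and `demo` are hypothetical names made up for this post, not ETK's actual classes, and the policy of silently rejecting a bad index is an assumption; the real library may well handle it differently.

```cpp
#include <cstddef>

// Hypothetical fixed-size array with checked access; not ETK's real API.
template <typename T, std::size_t N>
class CheckedArray {
public:
    // Stores the value and returns true only if the index is in range.
    constexpr bool set(std::size_t i, const T& value) {
        if (i >= N) return false;   // out-of-range writes are rejected
        data_[i] = value;
        return true;
    }

    // Returns the element, or the fallback if the index is out of range.
    constexpr T get(std::size_t i, const T& fallback) const {
        return (i < N) ? data_[i] : fallback;
    }

    constexpr std::size_t size() const { return N; }

private:
    T data_[N] = {};   // storage lives inside the object, so no heap is used
};

// Taking the container by const reference avoids an accidental deep copy.
template <typename T, std::size_t N>
constexpr T sum(const CheckedArray<T, N>& a) {
    T total = T();
    for (std::size_t i = 0; i < N; ++i) total += a.get(i, T());
    return total;
}

// Small usage example that can be evaluated at compile time.
constexpr int demo() {
    CheckedArray<int, 4> a;
    a.set(0, 10);
    a.set(3, 5);
    a.set(4, 99);   // index 4 is out of range, so this write is ignored
    return sum(a);
}
```

The out-of-range write in `demo()` never touches memory outside the array, which is the property being claimed for the language's containers.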

Can I ask what you mean by correctness proofs?
 
You have a tremendous amount of work to do if you're serious about getting any substantial number of people to adopt your language. Posting like this on forums, especially when you only support other off-topic hardware, will do little good.

If you're really serious about this, if you really do want to put in the many years of sustained effort any new language takes to gain adoption, I would suggest you first start with a couple free workshops, perhaps at hackerspaces. You will learn much as you try to teach it to a few people face-to-face, when you see where they get things and where they are confused. It'll also give you a reason to (hopefully) write up a simple step-by-step tutorial. Those early experiences can really help you to create a more credible website, with examples and photos of success stories, and with more emphasis on your language's practical benefits, rather than its technical design.

But beware the natural human social convention for polite in-person dialogue. I'd advise you *not* to ask for direct feedback such as "do you like it" or "will you use it". Instead, collect contact info and follow up later. If they're regulars at a hackerspace, that's pretty simple because you'll see them again. You'll know if they liked it by whether they go on to actually make use of it for any other project.

I can't emphasize enough the human, not technical, nature of this effort. It's relatively easy to create a language, even a pretty good one in a technical sense. So much more is needed to refine it, to create compelling materials & documentation, to gain initial adopters, and to grow towards a community and eventually become well established. Others have done this, so it's certainly not impossible. But you have a tremendous amount of work to do, and quite frankly, you're probably not helping your cause by putting it out to forums like this, before you've done the early work of teaching small groups in person and refining based on those experiences. (Yes, to be a bit blunt, the current lack of refinement shows you haven't done anything like this yet)

Good luck on this very long journey. Oh, and somewhere along the line, port it to Teensy. :)
 
Thank you for the feedback Paul.

I had hoped that if I posted here there would be at least some appreciation of the technical benefits, but I see that you're quite right about the human side. This is a personal project done for fun and I'm not interested in working hard to get others to adopt it - that just doesn't feel right. I did actually hope that a few people would be interested enough to try it out, even in its present unpolished state (the examples compile fine on a PC), but that's OK. It's on github and now you know of its existence at least. :p
 
No offense, but your code is one performance WTF after another. The generated code for your ETK is horrible. E.g. Rope:
Code:
Rope::Rope(char* buf, uint32 maxlen, const char* c)
{
    str = buf;
    pos = 0;
    N = maxlen;
    for(uint32 i = 0; i < c_strlen(c, maxlen); i++)
        append(c[i]);
    Rope::terminate();
}

void Rope::append(char c)
{
    if(pos < N-1)
    {
        str[pos++] = c;
        str[pos] = '\0';
    }
}

void test(char* test_string) {
    const uint32 max_len = 128;
    char buffer[max_len];
    Rope rope(buffer, max_len, "hello world");
}

"append()" and, even worse, "c_strlen()" are performed on each and every loop iteration in the constructor, and "append()" rewrites the terminator every time. Neither GCC nor Clang is able to optimize that code. For the test function I even gave them a compile-time constant source string and buffer length.

You can check the generated code with the Compiler Explorer.
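One possible way around the constructor problem above is to compute the source length once and write the terminator once. This is only a sketch using hypothetical free functions rather than the real Rope members: `bounded_strlen` stands in for ETK's `c_strlen` (whose exact behaviour is assumed here), and `copy_bounded` and `demo_truncates` are names invented for the example.

```cpp
#include <cstdint>

// Stand-in for c_strlen(): length of c, scanning at most maxlen characters.
constexpr std::uint32_t bounded_strlen(const char* c, std::uint32_t maxlen) {
    std::uint32_t n = 0;
    while (n < maxlen && c[n] != '\0') ++n;
    return n;
}

// Copies src into buf, whose capacity maxlen includes the terminator.
// The length is computed once and the terminator is written once.
constexpr std::uint32_t copy_bounded(char* buf, std::uint32_t maxlen, const char* src) {
    if (maxlen == 0) return 0;                        // no room even for '\0'
    const std::uint32_t len = bounded_strlen(src, maxlen - 1);
    for (std::uint32_t i = 0; i < len; ++i) buf[i] = src[i];
    buf[len] = '\0';
    return len;                                       // characters copied
}

// Usage example: an 11-character source is truncated to fit a 6-byte buffer.
constexpr bool demo_truncates() {
    char buf[6] = {};
    return copy_bounded(buf, 6, "hello world") == 5
        && buf[4] == 'o'
        && buf[5] == '\0';
}
```

With the length held in a local, the loop bound is a plain integer, so the compiler no longer has to prove that the string is unchanged by the writes inside the loop.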

Since we just had some discussions about the broken Teensy float to string functions, I looked at yours.
Code:
int main() {
    const uint32 max_len = 128;
    char buffer[max_len];
    Rope rope(buffer, max_len, "Float->Rope is broken: ");
    {
        float test_val = 100000;
        rope.append(test_val, 7);
        std::cout << "Correct value: " << test_val << "     " << rope.c_str() << std::endl;
    }
    {
        rope.clear();
        auto test_val = std::numeric_limits<int32_t>::max()/2;
        rope << "Int->Rope is also buggy: " << test_val;
        std::cout << "Correct value: " << test_val << "     "  << rope.c_str() << std::endl;
    }
}
$ ./a.exe
Correct value: 100000 Float->Rope is broken: -214.7483648
Correct value: 1073741823 Int->Rope is also buggy: 73741823

There is undefined behavior everywhere in your code. Doing math in C++ without incurring undefined behavior is unfortunately rocket science.
Code:
void Rope::append(int32 j, uint32 npad)
{
    if(j < 0)
    {
        append('-');
        j *= -1;
    }
    append((uint32)j, npad);
}
This will overflow for INT32_MIN (which is undefined behavior). "Rope::append(float j, uint8 precision)" has a cast float->int64 which has undefined behavior if it overflows the int64.
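A common way to avoid the INT32_MIN overflow is to take the magnitude in unsigned arithmetic, where wrap-around is well defined. The helper below is a minimal sketch; `magnitude` is a hypothetical name, not part of ETK. The signed overload could then append '-' for negative input and hand `magnitude(j)` to the unsigned overload.

```cpp
#include <cstdint>

// Magnitude of a signed 32-bit value, computed without signed overflow.
// Converting to uint32_t is well defined (modulo 2^32), and unsigned
// subtraction wraps, so INT32_MIN maps to 2147483648.
constexpr std::uint32_t magnitude(std::int32_t j) {
    return (j < 0)
        ? static_cast<std::uint32_t>(0) - static_cast<std::uint32_t>(j)
        : static_cast<std::uint32_t>(j);
}
```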

Memory corruption:
Code:
int main() {
    const uint32 max_len = 128;
    char buffer1[max_len]; char buffer2[max_len];
    strcpy(buffer1, "              I'm after rope 1. I will corrupt buffer2.");
    strcpy(buffer2, "          I'm after rope 2. I will get corrupted.");
    Rope rope1(buffer1, 3);
    Rope rope2(buffer2, 10);
    std::cout << "buffer after rope2: " << rope2.c_str()+10 << std::endl;
    rope1.sub_string(rope2, 4, 60);
    std::cout << "buffer after rope2: " << rope2.c_str()+10 << std::endl;
}
$ ./a.exe
buffer after rope2: I'm after rope 2. I will get corrupted.
buffer after rope2: I'm after rope 1. I will corrupt buffer2.
 
Thanks for the review. It's all a work in progress . . . I think you're the first person other than myself to inspect that code so closely (no one else has flagged any issues, anyway), so I really do appreciate it. Please feel free to open issues on github.

I was able to reproduce the int->rope problem, but float->rope I get "Correct value: 100000 Float->Rope is broken: 99999.9995904". Are you able to offer any insight into this?
 
I was able to reproduce the int->rope problem, but float->rope I get "Correct value: 100000 Float->Rope is broken: 99999.9995904". Are you able to offer any insight into this?
I used your typedefs; the typedef for int64_t was long, which is 32 bits on my test system. To see the overflow on your system, just increase precision to 10 (mul will overflow) or print a larger float value like 1000000000000.0, and "static_cast<int64>(roundf(j*mul))" will overflow.
 
Ok, so I think I've fixed the issues that you found. I'm still a little shocked that the compiler doesn't optimise the rope constructor, but oh well, that was an easy fix.

ETK doesn't do scientific notation yet, so I've worked around the float->rope issue by determining whether an overflow will occur. If it will, then the number is too large to be displayed and you get the text "ovr" instead. Kind of like "nan" and "inf". It's simply a limitation of the library for now.
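For anyone curious what such an overflow check can look like, here is a minimal sketch. It is not the actual ETK change; `fits_in_int64` and `scale_to_int64` are hypothetical names, and it assumes the value is rounded and scaled as a float before the cast, as in the expression quoted earlier in the thread.

```cpp
#include <cmath>
#include <cstdint>

// True if x can be converted to int64_t without undefined behaviour.
// 2^63 is exactly representable as a float, whereas INT64_MAX is not,
// so the upper bound is exclusive. NaN fails both comparisons.
constexpr bool fits_in_int64(float x) {
    return x >= -9223372036854775808.0f && x < 9223372036854775808.0f;
}

// Scales and rounds value; stores the result and returns true only if
// the conversion is safe. On false, the caller can print "ovr" instead.
inline bool scale_to_int64(float value, float mul, std::int64_t& out) {
    const float scaled = std::round(value * mul);
    if (!fits_in_int64(scaled)) return false;
    out = static_cast<std::int64_t>(scaled);
    return true;
}
```

The check has to happen on the floating-point value before the cast, because once the out-of-range conversion has been performed the behaviour is already undefined.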
 
Can I ask what you mean by correctness proofs?

You might start by reading about Software System Safety and Formal Verification.

The history of programming language design is littered with projects intended to make programming safer and easier for beginners. The vast majority of these attempts fail and often for common reasons. At the heart of the matter is the mistaken belief that a simplified subset of an established language would be an improvement despite any unintended consequences. The unintended consequences are very much worthy of study.

What happens as language users become more sophisticated and take on more complex problems? What are the costs associated with talent, maintenance, training, porting, certification, etc. etc.?

I'm not arguing that no one should ever try to design a new programming language. I am arguing that successful contribution is unlikely without extensive experience in the pitfalls and without a sufficiently experienced team.

As a learning experience, designing and implementing toy languages is great. Just keep in mind most folks need tools that solve their problems and experimental tools usually don't fit the bill.
 
Before I actually started my bare metal stuff on Teensy I was pondering the idea to use Rust on it. I skipped on that in the end because of the implied porting and runtime/size overhead, but that would be the way for me to go if I ever decided to use something closer to a scripting language on an embedded system. Another option is (e)Lua. Downside from your point of view is that these languages are more or less fully developed, i.e. less to learn in that regard. If the core idea is to create "immediate value" and less about the language creation step, porting Rust or (e)Lua is probably more useful in the long run.

Btw, heap fragmentation: I've checked into the malloc()/free() stuff which you get with the Arduino software on Teensy. There are some tricks included to lessen the chance for fragmentation, in return you can crash Teensy if you overallocate the memory (which is probably more of a configuration fault, haven't checked in detail).

sprintf() shouldn't be used in the first place, even on desktop machines, which is why you have snprintf(). Admittedly I dunno if that exists in the Arduino context, although it is pretty much a must in memory limited environments.
 
sprintf() shouldn't be used in the first place, even on desktop machines, which is why you have snprintf(). Admittedly I dunno if that exists in the Arduino context, although it is pretty much a must in memory limited environments.
Both the AVR and ARM toolchains have snprintf.
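A minimal sketch of the usual snprintf pattern, including truncation detection through the return value. `format_reading` is a hypothetical helper written for this post, and the "label=value" format is just an example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Formats "<label>=<value>" into buf and returns true only if it all fit.
// snprintf writes at most size bytes and always terminates when size > 0.
bool format_reading(char* buf, std::size_t size, const char* label, int value) {
    assert(buf != nullptr && size > 0);
    const int needed = std::snprintf(buf, size, "%s=%d", label, value);
    // needed is the length the full string requires, excluding '\0';
    // a negative value signals an encoding error.
    return needed >= 0 && static_cast<std::size_t>(needed) < size;
}
```

When the function returns false the buffer still holds a terminated, truncated string, so nothing past the end of `buf` is ever written.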

Properly used, there is nothing wrong with String either. If you reserve a sufficiently large initial buffer and only use concat / append, there is only a single allocation.
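The reserve-then-append pattern can be sketched on a desktop with std::string as a stand-in for Arduino's String (which offers reserve() and concat() for the same purpose). `build_without_realloc` is a hypothetical name, and the 64-byte reservation is an arbitrary choice for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds a short message after reserving the buffer once. Returns true if
// the capacity never changed while appending, i.e. nothing was reallocated.
bool build_without_realloc(std::string& out) {
    out.clear();
    out.reserve(64);                        // one up-front reservation
    assert(out.capacity() >= 64);
    const std::size_t cap = out.capacity();
    out += "temp=";
    out += "21";
    out += " C";
    return out.capacity() == cap;           // unchanged: appends fit in place
}
```

As long as the final length stays within the reserved size, every append lands in the existing buffer.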
 