"char" implementation different on Teensy and Arduino

Jimbo

Member
I discovered a difference in "char" implementation between Teensy and Arduino.

With :
Code:
void setup() {
  Serial.begin(115200);
}
 
int counter = 0;
void loop() {
  char foo = 3;
  
  delay(3000); // time to click on serial monitor window icon...
  while (counter++ < 7) Serial.println(foo--, DEC);
}

Teensy 3.0 output is:
3
2
1
0
255
254
253

Arduino Nano V3.0 output is:
3
2
1
0
-1
-2
-3

At http://arduino.cc/en/Reference/Char, we can read "The char datatype is a signed type, meaning that it encodes numbers from -128 to 127. For an unsigned, one-byte (8 bit) data type, use the byte data type."

Is this intentional on Teensy, is it a bug, or is it an error on my part? Of course, I worked around the problem by using "signed char" instead of "char"...


Thank you for the wonderful job you've done on Teensy.
 
And what does ANSI C say about the type of char? (Or does it punt and leave it to the implementation)
 
And what does ANSI C say about the type of char? (Or does it punt and leave it to the implementation)
Signedness is up to the implementation.

If you care about char being signed/unsigned, using int8_t/uint8_t instead is usually the right thing...
 
I've added this to my to-do list. Teensy's Print class really should give the same output as Arduino's.

I looked at the code briefly, but it wasn't obvious why this is happening. Realistically, it may be months until I really dig into this. But it's on my written bug list, so I will not forget.
 
The default for ARM is generally unsigned char. GCC has options to force signedness: '-fsigned-char' and '-funsigned-char'.
 
Signedness is up to the implementation.

If you care about char being signed/unsigned, using int8_t/uint8_t instead is usually the right thing...

<warning, probably boring language lawyering below>
Note, technically it is now an ISO standard and not ANSI (ANSI is the American standards body, ISO is the worldwide standards body; ANSI released the original C standard in 1989, and ISO released the worldwide standard in 1990; there have been two revisions since then, C99 in 1999 and C11 in 2011).

Standards arcana aside, the C standards require that the printable characters in the C character set be positive, but other characters may be negative. It originally comes from the fact that C was developed by Americans (primarily Dennis Ritchie), and at the time of its creation there weren't 8-bit character sets, just 7-bit character sets where some of the characters were replaced by other characters in different countries. Given that '{', '}', '[', ']', '\' are characters commonly replaced in these other 7-bit character sets, programming C in those locales must have been interesting. Due to politics and the way world standards bodies are organized, the C standard added trigraphs (beginning with '??') to deal with these problematic characters, and then C++ decided trigraphs were too ugly and added digraphs. I must admit I have never seen trigraphs or digraphs in the wild, and in the latest GCC trigraphs are turned off by default; you have to use one of the standards flags to enable them.

Anyway, the original C machines (PDP-7, then PDP-11, followed by the VAX and now x86) all tended to favor signed byte operations, so plain char became signed in those implementations. However, some machines default to unsigned operations instead of signed, and those opt for making plain char unsigned (as do ABI writers concerned about working with the 8-bit character sets that the majority of the world uses). In order to give programmers some way to deal with the issue, the original ANSI C committee (which I was on) allowed 'unsigned' as a modifier to 'char', and also added a new keyword 'signed' for use with plain chars and bitfields. I recall that during the standardization process Dennis Ritchie said one of the things he might have done differently, with the benefit of hindsight, was to make chars unsigned by default (the precedence of && and || was another, but by the time they realized it there were tens of machines running C, and it was thought too painful to change).

Speaking of signed defaults, when I was working on the Motorola 88000 GCC port many years ago, the ABI for the 88k demanded that chars and bitfields default to unsigned, and it took 1-2 years of arguing with Richard Stallman to allow the 88k port to actually implement the ABI in question by default, rather than using the GCC default of signed chars/bitfields.

I suspect ISO 8859-1 is the most common 8-bit set incorporating the US 7-bit ASCII character set, with the upper characters representing characters found in other Western European languages (ISO 8859-15 revises ISO 8859-1 and adds the Euro symbol), but there are others. Here is the Wikipedia entry in case somebody is curious: http://en.wikipedia.org/wiki/ISO/IEC_8859
 
I've added this to my to-do list. Teensy's Print class really should give the same output as Arduino's.

But the Print class is doing exactly the same thing; it isn't an issue in the Print class.
The difference (issue) is happening above that, in the actual sketch.
The code in the sketch treats the char differently, so the argument promotion of the "char" handles the sign bit differently.
The C standard allows the default signedness of "char" to be chosen by the implementor.
I think that is a bit dumb, but hey, that's what they decided.

It looks like AVR and ARM didn't pick the same default char behavior.
To me, the simple answer is: if you want the ARM-based Teensy 3 to have the same default char behavior as AVR, then you have the ability to "make it so".

Use the -fsigned-char command-line option for Teensy 3 compiles.

There are some other things that I consider "bugs" in the AVR toolchain with respect to "char" handling during argument promotion.
There are certain loop conditions where the compiler will mis-optimize the promotion when collapsing loops
and not properly clear the upper byte of the 16-bit int created during function-call promotion.
The char behavior also changes if the char variable is declared as an automatic vs. a static.
The AVR C guys claim the usage of the char in the example I provided was in an undefined area;
however, my read is that it was in an implementation-defined area.
I argued that undefined and implementation-defined are not the same, and that the char promotion to called functions
should be handled the same regardless of surrounding code, declaration, or optimization level used;
i.e., the implementation can pick how it wants to do it, but then it must be consistent.
I lost the debate.
Keep in mind that this same behavior can affect the results you see in code like the original example provided above.
The issue comes down to overflow/underflow/rollover, and for signed types the behavior during those is undefined according to the standard.
The AVR C guys will say: would you have preferred that the program crash when foo underflowed?
So just be careful that if you ever depend on overflow/underflow, it is not guaranteed to work,
especially on AVR if you pass the variable as an argument to a function,
depending on the declaration and signedness of the variable.

Anyway, as mentioned earlier, chars suck if used as 8-bit integers, since they are special and really aren't generic 8-bit values.
Even if chars default to signed, or -fsigned-char is in effect,
"char" is still a distinct type from "signed char" - and this is required by the C standard.
The best thing is to NEVER use a char for anything but real characters.
Use uint8_t, int8_t, int, etc. for your loops, or even better consider using the C99 types
uint_fast8_t or int_fast8_t for values that do not have to be exactly 8 bits.
This allows the compiler to optimize as needed, using only 8 bits if that helps,
since the optimal width can differ depending on the processor.

BTW,
I sympathize with you on trying to convince Stallman to make/accept changes to gcc.
I argued with him for quite some time back in the late '80s about relaxing structure member alignments
(structure packing).
He would have nothing to do with it. I ended up having to create a custom version of the
m68k compiler to allow it. History has shown that he was on the wrong side of that one.


--- bill
 