Something wrong with Adding a String to itself!

defragster

Senior Member+
Why would this HALT the Teensy 3.1:
onestring += onestring;

I had a longer example that worked for some repetitions { growing the string to over 80KB } - however in going for a shorter sample once I Printed the string as shown below to clarify this example it HALTS in the FIRST concatenation before printing : " HELLO WORLD !!!!!"

Code:
String onestring = "1234567890";

void setup() {
  while (!Serial && (millis ()  <= 3000));

  Serial.print( "\n\n");
[B]  Serial.print( onestring );     [/B]
  Serial.println( "<<< A");
  Serial.print( "1 += 1");
[B]  onestring += onestring;[/B]
  Serial.print( "  HELLO WORLD !!!!!" );
}

void loop() {
}

Here is the Full debug output:
1234567890<<< A
1 += 1
 
Not sure, May take a look for the fun of it. Could be a += operator, did you localize to adding itself was the issue vs creating another string and doing:
onestring += twostring...

Or could generate simple code, that does something like:
Code:
void strCat(char *strOut, char *strIn) {
    // find the end of the output str
    while (*strOut) strOut++;
    // now add string to end
    while (*strOut++ = *strIn++) ;
}
Now image calling this like:
Code:
char MyString[50] = "1234567890";
strCat(MyString, MyString);
In this case the code will be chasing it's own table, and write over memory...

I would assume that this would be different with strings, where it probably keeps a string length pointer, and would use this to control how many bytes it copies, but could still end up doing the copy of data like I showed above, or maybe it increments the length counter on each byte it transfers and has a psuedo code like:
Code:
    for (i=0; i < inString.length; i++) {
        OutString[OutString.length] = inString[i];
        OutString.length++
}
So again if the generated code works sort of like above, the i will never reach the length...
 
For the fun, I compiled your code, then used objdump to look at what was generated and I believe it looks like:
Code:
00001578 <_ZN6String6appendEPKcj>:
    1578:	b570      	push	{r4, r5, r6, lr}
    157a:	4604      	mov	r4, r0
    157c:	460d      	mov	r5, r1
    157e:	6883      	ldr	r3, [r0, #8]
    1580:	b15a      	cbz	r2, 159a <_ZN6String6appendEPKcj+0x22>
    1582:	18d6      	adds	r6, r2, r3
    1584:	4631      	mov	r1, r6
    1586:	f7ff ffad 	bl	14e4 <_ZN6String7reserveEj>
    158a:	b130      	cbz	r0, 159a <_ZN6String6appendEPKcj+0x22>
    158c:	6820      	ldr	r0, [r4, #0]
    158e:	68a3      	ldr	r3, [r4, #8]
    1590:	4418      	add	r0, r3
    1592:	4629      	mov	r1, r5
    1594:	f003 fd38 	bl	5008 <strcpy>
    1598:	60a6      	str	r6, [r4, #8]
    159a:	4620      	mov	r0, r4
    159c:	bd70      	pop	{r4, r5, r6, pc}
    159e:	bf00      	nop

000015a0 <_ZN6String6appendERKS_>:
    15a0:	b508      	push	{r3, lr}
    15a2:	460b      	mov	r3, r1
    15a4:	6809      	ldr	r1, [r1, #0]
    15a6:	689a      	ldr	r2, [r3, #8]
    15a8:	f7ff ffe6 	bl	1578 <_ZN6String6appendEPKcj>
    15ac:	bd08      	pop	{r3, pc}
    15ae:	bf00      	nop

Note: I have not done a lot of ARM assembly language... Probably someday should find good reference manual.
But I believe the code works like:
Current length = 10
New length = <current Length> + <current length (from source pointer> = 20
Maybe reallocate the destination string : ZN6String7reserveEj
Then calls strcpy, with pointers to current end of string (trailing \0 ), and pointer to start of string.
Then it will chase it's tail like I mentioned in first one...

Edit: Should mention you can find the sources to the string class in the Teensy Cores3 folder. If you look at WString.h/cpp, you will see:
Code:
String & String::append(const String &s)
{
	return append(s.buffer, s.len);
}

String & String::append(const char *cstr, unsigned int length)
{
	unsigned int newlen = len + length;
	if (length == 0 || !reserve(newlen)) return *this;
	strcpy(buffer + len, cstr);
	len = newlen;
	return *this;
}
Which shows the issue I mentioned. You could also run into another issue, that is suppose, in the append function, that you need to reserve new space, it will reallocate the string, which may move the string to a new memory location, but you will then try to copy in from the old memory location which is probably free space on the heap.
 
Last edited:
Not sure, May take a look for the fun of it. Could be a += operator, did you localize to adding itself was the issue vs creating another string and doing:
onestring += twostring...

Yes, that is why the name onestring was chosen, there was a twostring. I had a great sample with iterations and steps as noted taking the size of onestring up to over 80KB and working - then a second pass.

Then it all boiled down to the simple sample and that code went away.

Nice you got to the sources - I didn't get that far at that hour.

There was another post noting he had to append something else to get it to work - I only ran into it trying to grow memory use.
 
So perhaps using the slightly safer strncpy could fix the issue, in the String concatenation code:

Code:
String & String::append(const char *cstr, unsigned int length)
{
	unsigned int newlen = len + length;
	if (length == 0 || !reserve(newlen)) return *this;
	strncpy(buffer + len, cstr, length);
	len = newlen;
	return *this;
}
 
Actually I would use memcpy as you know the lengths... BUT: this still may not full solve it as again resever() may move the string elsewhere and as such you may be copying in garbage.... Or if the heap allocator can mark memory as not accessible, could fault...
 
So perhaps using the slightly safer strncpy could fix the issue, in the String concatenation code:
strncpy can't be used for overlapping source/destination (the \0 end of string overlaps).
Actually I would use memcpy as you know the lengths... BUT: this still may not full solve it as again resever() may move the string elsewhere and as such you may be copying in garbage...
Yes, there needs to additional logic to handle that. A quick fix would be (untested):
Code:
String & String::append(const String &s) {
    if(!reserve(len+s.len)) return *this;
    return append(s.buffer, s.len);
}
[COLOR="#D3D3D3"]
String & String::append(const char *cstr, unsigned int length) {
    unsigned int newlen = len + length;
    if (length == 0 || !reserve(newlen)) return *this;
    memcpy(buffer + len, cstr, length+1);
    len = newlen;
    return *this;
}[/COLOR]

Edit:
"String & String::append(const char *cstr, unsigned int length)" is buggy. Fixed version:
Code:
String & String::append(const char *cstr, unsigned int length) {
    unsigned int newlen = len + length;
    if (length == 0 || !reserve(newlen)) return *this;
    memcpy(buffer + len, cstr, length);
    buffer[newlen] = 0;
    len = newlen;
    return *this;
}
 
Last edited:
This case works where realloc() in changeBuffer() does the right thing when reserve() determines more space is needed as below.

The problem is the NAKED:: "1string += 1string;" - as in the other post I referred to even adding a "" to the situation works.

Here is an updated sample - I reserve 200 chars then die when 10 is added to 21 chars. If the failing "+=" is removed the loop() code works fine.

Code:
String onestring = "1234567890";
String twostring = "abcdefghij";
char sibuff[300];

void setup() {
  while (!Serial && (millis ()  <= 3000));
  onestring.reserve(200);
  Serial.print( "\n\n");
  Serial.print( onestring );     Serial.println( "<<< A");
  Serial.print( "1 += 1");
  onestring += "" + onestring;
  Serial.print( "  HELLO WORLD !!!!!" );
  Serial.print( "\n\n");
  Serial.print( onestring );     Serial.println( "<<< B");
  Serial.print( "1 += 1");
  onestring += onestring; // Comment this out to complete setup
  Serial.print( "  HELLO WORLD !!!!!" );
}

void loop() {
  if ( onestring.length() < 300 ) {
    Serial.print( onestring );     Serial.println( "<<< ");
    Serial.println( "1 += 1");
    //    onestring += onestring; // this is DEATH
    onestring += "" + onestring; // THIS WORKS
    Serial.print( "onestring.length()=" );
    Serial.println( onestring.length() );
  }

}

Resulting output shows death on the original problem case - even with space and even after an initial operation works.
1234567890<<< A
1 += 1 HELLO WORLD !!!!!

12345678901234567890<<< B
1 += 1

This CORES code for String re-alloc works!::
Code:
unsigned char String::changeBuffer(unsigned int maxStrLen)
{
	char *newbuffer = (char *)realloc(buffer, maxStrLen + 1);
	if (newbuffer) {
		buffer = newbuffer;
		capacity = maxStrLen;
		return 1;
	}
	return 0;
}

void* realloc (void* ptr, size_t size);

Reallocate memory block

Changes the size of the memory block pointed to by ptr.

The function may move the memory block to a new location (whose address is returned by the function).

The content of the memory block is preserved up to the lesser of the new and old sizes, even if the block is moved to a new location. If the new size is larger, the value of the newly allocated portion is indeterminate.

It can safely grow the string if it leaves setup() and the "" is part of the equation.
1234567890<<< A
1 += 1 HELLO WORLD !!!!!

12345678901234567890<<< B
1 += 1 HELLO WORLD !!!!!12345678901234567890<<<
1 += 1
onestring.length()=40
1234567890123456789012345678901234567890<<<
1 += 1
onestring.length()=80
12345678901234567890123456789012345678901234567890123456789012345678901234567890<<<
1 += 1
onestring.length()=160
1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890<<<
1 += 1
onestring.length()=320
 
Finally had a look at the code - Kurt was right about memcpy.

ERROR :: BUT this BREAKS using the sketch above in loop with :: onestring += "" + onestring; // THIS WORKS

This edit to WString.cpp fixes my sketch:
Code:
String & String::append(const char *cstr, unsigned int length)
{
	unsigned int newlen = len + length;
	if (length == 0 || !reserve(newlen)) return *this;
	if ( buffer == cstr ) {
		memcpy(buffer + len, cstr, length);
		}
	else
		strcpy(buffer + len, cstr);
	len = newlen;
	return *this;
}

Code:
String onestring = "1234567890";
String twostring = "abcdefghij";
char sibuff[300];

void setup() {
  while (!Serial && (millis ()  <= 3000));
  onestring.reserve(200);
  Serial.print( "\n\n");
  Serial.print( onestring );     Serial.println( "<<< A");
  Serial.println( "1 += 1");
  onestring += "" + onestring;
  Serial.print( "  HELLO WORLD !!!!!" );
  Serial.print( "\n\n");
  Serial.print( onestring );     Serial.println( "<<< B");
  Serial.println( "1 += 1");
  onestring += onestring; // WORKS WITH memcpy edit to WString.cpp
  Serial.print( onestring );     Serial.println( "<<< C");
  Serial.println( "  HELLO WORLD !!!!!" );
  onestring += twostring;
  Serial.print( onestring );     Serial.println( "<<< D");
}

void loop() {
  if ( onestring.length() < 300 ) {
    Serial.print( onestring );     Serial.println( "<<< ");
    Serial.println( "1 += 1");
    onestring += "" + onestring; // THIS WORKS
    Serial.print( "onestring.length()=" );
    Serial.println( onestring.length() );
  }
  else if ( onestring.length() < 4000 ) {
    Serial.print( onestring );     Serial.println( "<<< ");
    Serial.println( "1 += 1");
    onestring += onestring; // WORKS WITH memcpy edit to WString.cpp
    Serial.print( "onestring.length()=" );
    Serial.println( onestring.length() );
  }
}
 
Last edited:
Note: On memcpy, not sure if you should use length or length+1 (i.e. not sure if it includes the trailing null character)

Also not sure if you need test for buffer == cstr? You can probably always do memcpy...

But if the string get's reallocated you are probably still copying from potentially bogus memory.

Note: for the most part I Avoid using heap code on smaller processors.

But years ago when I used to use it and run into memory issues, there used to be debug versions of the malloc libraries. I believe some of them did things like when you did a free of memory, it would write out some pattern into the freed memory, such that if you use it, it might be obvious. Also When you allocated memory, it might allocate slightly larger and put some signature at the end and maybe a few bytes before allocated, that it verified were still those bytes when freed or realloced or...

Likewise on some processors with segments, it would allocate the memory such that you would get access faults if you went outside of allocated memory or used memory that was previously freed.

Not sure if we have such library setup here.

Again here may not be much of an issue as you only have one thing allocating memory and as such it can probably just extend the previous allocation. But if things get more complex and you have things like maybe interrupt handler that allocates or frees memory, then maybe run into issues, but than again the memory allocater may not be thread or interrupt safe...
 
@defragster:
Your change is not enough. If reserve/realloc allocates a new buffer, your code may fail. cstr will point to an invalid buffer when the memcpy is performed. Look at post #7.
 
I think there might be several places in the String library that expects the data to be zero terminated, that can be an issue when using memcopy, unless taking care to insure the zero termination.

SORRY was a bit quick to answer defragster before seeing Kurts analysis
 
Last edited:
I think there might be several places in the String library that expects the data to be zero terminated, that can be an issue when using memcopy, unless taking care to insure the zero termination.
Yes there are. That's why I'm copying length+1 chars in post #7 to ensure 0-termination.
 
@defragster:
Your change is not enough. If reserve/realloc allocates a new buffer, your code may fail. cstr will point to an invalid buffer when the memcpy is performed. Look at post #7.

This update should handle that case of buffer being moved.
The NEW problem (noted in #9) is what used to work does not work - per post #12 - below I added a NULL to the internal cstr[] as that would not have been there.

Somebody needs to sanity check this as I've never seen the internal representation of String before.

Below is my super Kludge Hacked test case that now runs for me.

Code:
String & String::append(const char *cstr, unsigned int length)
{
	unsigned int newlen = len + length;
	bool self = false;
	if ( buffer == cstr ) self = true;
	if (length == 0 || !reserve(newlen)) return *this;
	if ( self ) {
		memcpy(buffer + len, buffer, length);
		buffer[newlen] = 0;
		}
	else
		strcpy(buffer + len, cstr);
	len = newlen;
	return *this;
}

Code:
String onestring = "1234567890";
String twostring = "abcdefghij";
String trestring = "ABCDEFGHIJ";
char sibuff[300];

void setup() {
  setup1();
  loop1();
  loop1();
  loop1();
  loop1();
  loop1();
  loop1();
  loop1();
  loop1();
  loop1();
  setup2();
}

void loop() {
  loop2();
}
void setup1() {
  while (!Serial && (millis ()  <= 3000));
  onestring.reserve(200);
  Serial.print( "\n\n");
  Serial.print( onestring );     Serial.println( "<<< A");
  Serial.println( "1 += 1");
  onestring += "" + onestring;
  Serial.print( "  HELLO WORLD !!!!!" );
  Serial.print( "\n\n");
  Serial.print( onestring );     Serial.println( "<<< B");
  Serial.println( "1 += 1");
  onestring += onestring; // Comment this out to complete setup
  Serial.print( "  HELLO WORLD !!!!!" );
}

void loop1() {
  if ( onestring.length() < 3000 ) {
    Serial.print( onestring );     Serial.println( "<<< ");
    Serial.println( "1 += 1MNMNMNMN");
    onestring += onestring; // this is DEATH
    Serial.println( "1 += 1MNMNMNMN xyz");
    onestring += "" + onestring; // THIS WORKS
    Serial.print( "onestring.length()=" );
    Serial.println( onestring.length() );
  }

}

void setup2() {
  while (!Serial && (millis ()  <= 3000));
  twostring.reserve(200);
  Serial.print( "\n\n");
  Serial.print( twostring );     Serial.println( "<<< Aaa");
  Serial.println( "1 += 1");
  twostring += "" + twostring;
  Serial.print( "  HELLO WORLD 222 !!!!!" );
  Serial.print( "\n\n");
  Serial.print( twostring );     Serial.println( "<<< Bbb");
  Serial.println( "1 += 1");
  twostring += twostring; // WORKS WITH memcpy edit to WString.cpp
  Serial.print( twostring );     Serial.println( "<<< Cbb");
  Serial.println( "  HELLO WORLD  222 !!!!!" );
  twostring += trestring;
  Serial.print( twostring );     Serial.println( "<<< Ddd");
}

void loop2() {
  if ( twostring.length() < 300 ) {
    Serial.print( twostring );     Serial.println( "<<< 2222");
    Serial.println( "1 += 1");
    twostring += "" + twostring; // THIS WORKS
    Serial.print( "twostring.length()=" );
    Serial.println( twostring.length() );
  }
  else if ( twostring.length() < 4000 ) {
    Serial.print( twostring );     Serial.println( "<<< 2222");
    Serial.println( "1 += 1");
    twostring += twostring; // WORKS WITH memcpy edit to WString.cpp
    Serial.print( "twostring.length()=" );
    Serial.println( twostring.length() );
  }
}
 
Last edited:
Yes there are. That's why I'm copying length+1 chars in post #7 to ensure 0-termination.

This will fail when the buffer is moved [and the two pointers are the same], and also there will be no NULL to copy - that is the first character over written on memcpy()
 
Last edited:
memcpy of length+1 chars wont help, cause the 0 termination has already been overwritten when the last char is copied, so it must be manually added (or by strncpy :) ) like defragster tries, but he places it at : buffer[length] = 0; when it actually should be at buffer[newlen] = 0;
 
8) - Cant believe how much of a life this one has become :D

I think if it were me and it were important, I would maybe rework some of it. That is from earlier posting the current code is:
Code:
String & String::append(const String &s)
{
	return append(s.buffer, s.len);
}

String & String::append(const char *cstr, unsigned int length)
{
	unsigned int newlen = len + length;
	if (length == 0 || !reserve(newlen)) return *this;
	strcpy(buffer + len, cstr);
	len = newlen;
	return *this;
}
So if we are worried about the two strings being the same case then maybe
Code:
String & String::append(const String &s)
{
	unsigned int length = s.len;
	unsigned int newlen = len + length;

	if (length == 0 || !reserve(newlen)) return *this;
	memcpy(buffer + len, s.buffer, length);
	buffer[newlen] = 0;   // First character copied if same will overwrite the null terminator...
	len = newlen;
	return *this;
}

Now if you wish for robust append, with cstr, length, then your above code that detects same string may work in that one case, but
would not work if for example, I called it with something like:

onestring.append(onestring.c_str()+1, onestring.length -1);

But again if important could detect this as see if the cstr >=buffer && cstr < (buffer+len)...

But are these issues worth fixing?
 
Last edited:
memcpy of length+1 chars wont help, cause the 0 termination has already been overwritten when the last char is copied, so it must be manually added (or by strncpy :) ) like defragster tries, but he places it at : buffer[length] = 0; when it actually should be at buffer[newlen] = 0;

Good Eye! I was just seeing if anyone was still paying attention - I'm in the middle of too many things ... none of which should be *this - but *this makes me feel better . . . ;)

Updated above and :
Code:
String & String::append(const char *cstr, unsigned int length)
{
	unsigned int newlen = len + length;
	bool self = false;
	if ( buffer == cstr ) self = true;
	if (length == 0 || !reserve(newlen)) return *this;
	if ( self ) {
		memcpy(buffer + len, buffer, length);
		buffer[newlen] = 0;
		}
	else
		strcpy(buffer + len, cstr);
	len = newlen;
	return *this;
}
 
8) - Cant believe how much of a life this one has become :D

I think if it were me and it were important, I would maybe rework some of it. ... But are these issues worth fixing?

I think p# 18 could be a good and safe improvement and should resolve some easy issues?

Strings are VOLATILE and self same copy is a recursively risky operation - I don't see SubString() copies as worth changing for?
 
This will fail when the buffer is moved [and the two pointers are the same]
There is the reserve in "String & String::append(const String &s)" that ensures the buffer doesn't move. (I don't fix the case there "append(const char *cstr, unsigned int length)" is called directly.)
and also there will be no NULL to copy - that is the first character over written on memcpy()
Right.
memcpy of length+1 chars wont help, cause the 0 termination has already been overwritten when the last char is copied, so it must be manually added (or by strncpy :) )
strncpy can't add the termination. Either you use length+1, which is undefined behavior (overlapping source/destination buffers) or you use length in which case strncpy won't terminate the string.
 
Yep - updated #17 to do the manual null termination, for the two string case.

for the case of: onestring.append(onestring.c_str() + 1);
Could extend your #18 method something like:

Code:
String & String::append(const char *cstr, unsigned int length)
{
	unsigned int newlen = len + length;
	bool self = false;
	unsigned int buffer_offset; 
	if ( (cstr >= buffer) && (cstr < (buffer+len) )  {
		self = true;
		buffer_offset = (unsigned int)(cstr-buffer);
	}
	if (length == 0 || !reserve(newlen)) return *this;
	if ( self ) {
		memcpy(buffer + len, buffer+buffer_offset, length);
		buffer[newlen] = 0;
		}
	else
		strcpy(buffer + len, cstr);
	len = newlen;
	return *this;
}
But again not sure it is worth it.
 
Strings have a bad reputation - but can have their place so they shouldn't be evil. And I saw dead stop behavior - no warning and no feedback - ICK!!!!

KurtE - that seems simple enough to add more robustness. Attacking it 3 minutes at a time sure wasn't efficient ( in post #22 ) - but it was new code and something fun to do.

tni - the reserve - AFAIK - is set like in my sketch to an expected max to pre-allocate. If that is not done - or is not big enough in the end the String will move as needed.
 
tni - the reserve - AFAIK - is set like in my sketch to an expected max to pre-allocate. If that is not done - or is not big enough in the end the String will move as needed.
Right, your new version looks fine and doesn't require the reserve (the old one in post #9 did).
 
Awesome,
Just curious, not as a newbe, but a dumbe. How do you go about implementing the fix. Is it just in the Teensy String implementation or the Arduino libraries. Do you have to submit it to them and hope they implement it or what?

thanks
 
@Paul - does this look good? Probably applied to 'teensy' tree as well - but less memory probably keeps sane folks from using Strings?

@bicycleguy- This code is in the Teensy3 Tree:: "...\hardware\teensy\avr\cores\teensy3\WString.cpp" there is a similar file in ..\Teensy.
For the PJRC distribution they are installed and managed for use on an independent GitHub repository, and Paul reviews and takes pull requests to consider releasing.
Paul knows where they came from and anything getting back to the Arduino folks goes through them. Paul works hard to improve their stuff when he fixes it - not sure where this falls in all that.

@tni - I see what you mean 'require' - Correct - it only worked before when it didn't get moved. Glad it is working in the end - and I got to help - but distracted getting four computers back to owners.

KurtE - I see you didn't compile - missing a paren. I have just a CLONE so I can't do a PULL request?
I re-re-hacked my test case and substring did okay on small strings and growing on from 480 length to 15608 length in steps noted below code - took four iterations, looks good.

at:: "@@ -248,8 +248,19 @@ String & String::append(const String &s)"
Code:
String & String::append(const char *cstr, unsigned int length)
{
	unsigned int newlen = len + length;
	bool self = false;
	unsigned int buffer_offset; 
	if ( (cstr >= buffer) && (cstr < (buffer+len) ) ) {
		self = true;
		buffer_offset = (unsigned int)(cstr-buffer);
	}
	if (length == 0 || !reserve(newlen)) return *this;
	if ( self ) {
		memcpy(buffer + len, buffer+buffer_offset, length);
		buffer[newlen] = 0;
		}
	else
		strcpy(buffer + len, cstr);
	len = newlen;
	return *this;
}

// this sample goes 480 length to 15608 length in steps noted below code - took four iterations, looks good.
Code:
  else if ( twostring.length() < 10000 ) {
    Serial.print( twostring );     Serial.println( "<<< SS0");
    twostring += twostring.substring( 0, 5 );
    Serial.print( twostring );     Serial.println( "<<< SS1");
    twostring += twostring.substring( 5, 9 );
    Serial.print( twostring );     Serial.println( "<<< SS2");
    twostring += twostring.substring( 10, twostring.length() );
    Serial.print( twostring );     Serial.println( "<<< SS3");
  }
 
Back
Top