code hanging

Status
Not open for further replies.

ParDHarD

Member
Hi all,
I have a problem that I hope someone has come across and can give me something to check for:
I have a function (a) that calls function (b) that calls function (c) - which has a serial.write() call.
(c) then returns to (b), but (b) never returns to (a).
I have debug statements right at the end of (b) which work, but the debug statements in (a) right after the call to (b) never get executed.
What would cause a simple function call to not return?
There is nothing special about the code - it just calls another function.
The chip is a Teensy 3.2, and approximately 8K of memory is available in all calls in the surrounding area.
If I comment out a File.open(), then all code continues as expected - with the exception that the file isn't open and no data gets written - so it's not a viable solution.
I can't really narrow down the code any more because it's part of some 3 chip inter-communication logic and the code is already minimal.
As an overview of what's going on, I'm sending a file from the PC, to an LC chip, which then relays it to the 3.2 chip - which is attached to the destination Micro-SD card.
The only other file in the code is for a debug file so I can try to track down why this isn't working - so there shouldn't be a file limit reached or anything.
I haven't found any writes to an out of range subscript that might corrupt memory.
Any ideas what else I can check for?
Thanks!
 
Now people will think there is a solution. If you don't have any ideas, then please do not respond with rules and regulations. My source code consists of several DLL libraries and approximately 100,000 lines of code. I'm sure you really don't want to go through all that code. I'm just asking for ideas, such as what might corrupt my stack, or why else a function would simply not return.
 
"I can't really narrow down the code any more because it's part of some 3 chip inter-communication logic and the code is already minimal."

Without seeing the code, since you're incapable of debugging it yourself, how are we supposed to do it blindly? It doesn't matter if it's 1 million lines, the experts here who know it will spot it, probably even without testing the code.
 
The PC is writing to Chip1, which is an LC chip. The LC then writes to the 3.2. The 3.2 has an SD card, so I'm debugging it by writing to a file on the SD card. The PC program that is controlling the chips has roughly 100,000 lines of code (and no DLLs on the Teensy, to our friends in Germany).
Here is some minimal code for the 3 functions in question. "openfilecreate" calls "sendresults", but "sendresults" never returns to "openfilecreate":

Code:
   bool openfilecreate(int ref)
   {
	   int len = msgin.readint(3);
	   char *path = msgin.readstring(len);
	   char *to = msgin.readfrom();
	   bool ok = false;
	   if (file.isOpen())
	   {
#ifdef stat
		   if (statlog.isOpen()) { sprintf(cmsg, "%s already open\r\n", path); statlog.write(cmsg, strlen(cmsg)); statlog.flush(); }
#endif
	   }
	   else
	   {
#ifdef stat
		   if (statlog.isOpen()) { sprintf(cmsg, "file open (%s) = %s\r\n", path, ok ? "ok" : "fail"); statlog.write(cmsg, strlen(cmsg)); statlog.flush(); }
#endif
		   ok = file.open(path, O_WRITE | O_CREAT | O_TRUNC);
#ifdef stat
		   if (statlog.isOpen()) { sprintf(cmsg, "file open (%s) = %s\r\n", path, ok ? "ok" : "fail"); statlog.write(cmsg, strlen(cmsg)); statlog.flush(); }
#endif
		   written = ok ? 0 : -1;
	   }
	   msgin.clear();

#ifdef stat
	   //int mem = FreeRam();
	   //if (statlog.isOpen()) { sprintf(cmsg, "freemem=%d\r\n", mem); statlog.write(cmsg, strlen(cmsg)); statlog.flush(); }
	   if (statlog.isOpen()) { sprintf(cmsg, ">>send results\r\n"); statlog.write(cmsg, strlen(cmsg)); statlog.flush(); }
#endif
	   bool ret = sendresults(ok, to, ref); // <-- this never returns
	   if (ret) //<-- this was to help make sure the returned value was used, in case the compiler tried to optimize the call out of existence - and does not execute
	   {
		   ok = !ok;
		   ok = !ok;
	   }
#ifdef stat
	   if (statlog.isOpen()) { sprintf(cmsg, "<<send results\r\n"); statlog.write(cmsg, strlen(cmsg)); statlog.flush(); }//<--this never gets called
#endif
	   return ok;
   }

	bool sendresults(bool result, char *to, int ref)//-->
	{
		msgout.addheader(MSG_RESULTS, UNITID, to, ref);
		msgout.addstring(result ? "1" : "0");
		msgout.finalize();
#ifdef stat
		if (statlog.isOpen()) { sprintf(cmsg, ">>msgout.send\r\n"); statlog.write(cmsg, strlen(cmsg)); statlog.flush(); }
#endif
		bool ret = msgout.send();
#ifdef stat
		if (statlog.isOpen()) { sprintf(cmsg, "<<msgout.send\r\n"); statlog.write(cmsg, strlen(cmsg)); statlog.flush(); } //<-- this executes
#endif
		return ret;
	}

	char buf[MAXMSGSIZE];//maxmsgsize currently set to 512, len=length of message - which = 18 characters (in this case)
	bool send()
	{
#ifdef stat
		if(aaa) if (statlog.isOpen()) { sprintf(cmsg, ">>serial.write(%s), len=%d, strlen=%d\r\n",buf,len,strlen(buf)); statlog.write(cmsg, strlen(cmsg)); statlog.flush(); }
#endif
		serial->write((const uint8_t *)buf, len);
#ifdef stat
		if (statlog.isOpen()) { sprintf(cmsg, "<<serial.write(%s), len=%d, strlen=%d\r\n",buf,len,strlen(buf)); statlog.write(cmsg, strlen(cmsg)); statlog.flush(); } //<-- this executes
#endif
		return true;
	}
 
I assume that 'stat' is defined, so all the #ifdef stat blocks are always compiled in.

It's not simply a function calling another function...

There are a couple of calls to some unknown msgin and msgout objects. What are they?
 
Here is the source for the chips

The attached zip file contains the files for the 2 chips.
I have the hh/hh.ino on the LC chip and the laser/laser.ino on the 3.2 chip.
On the 3.2 chip is a MicroSD attachment.
The PC is connected to the LC chip's USB connector.
The LC and 3.2 chips are connected via their RX1/TX1 pins.
The PC is sending the following commands to the LC chip - which in turn relays them to the 3.2 chip.
The 3.2 chip then maintains a file structure on the SD card.
Here are the commands being sent from the PC to the LC chip up to the point that it hangs: (the PC code is way too much to include)

##06018HOLS001001/
##04018HOLS002/ACD
##05021HOLS003004/ACD
##06021HOLS004004/ACD
##07027HOLS005010layout.xml


// layout:
// 0....+....1....+....2....+....3
// ##IDLenFrToRefmmmm
// | | |..| | |..|__ message (offset=14, len=eol=Len-14)
// | | |..| | |_____ reference id (responses match request) (offset=11, len=3)
// | | |..| |_______ to (offset=9, len=2)
// | | |..|_________ from (offset=7, len=2)
// | | |____________ Length (whole msg len) (offset=4, len=3)
// | |______________ message id (request number) (offset=2, len=2)
// |________________ validator (offset=0, len=2)
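
To make the layout above concrete, here is a rough sketch of how the fixed-width header could be split into fields. It is not the code from MsgClass.cpp in the zip; `ParsedMsg`, `fixedInt`, and `parseMsg` are hypothetical names, and the sketch assumes the Length field always equals the total number of characters in the message, as it does in the sample commands.

```cpp
#include <cstddef>
#include <string>

// Hypothetical holder for the fields named in the layout comment.
struct ParsedMsg {
    int id = 0;        // message id, offset 2, 2 digits
    int length = 0;    // whole message length, offset 4, 3 digits
    std::string from;  // offset 7, 2 characters
    std::string to;    // offset 9, 2 characters
    int ref = 0;       // reference id, offset 11, 3 digits
    std::string body;  // message, offset 14 to end of message
};

// Reads `digits` decimal characters starting at `pos`.
// Returns -1 if any of them is not a digit. The caller guarantees the range is in bounds.
static int fixedInt(const std::string &s, std::size_t pos, std::size_t digits) {
    int value = 0;
    for (std::size_t i = 0; i < digits; ++i) {
        char c = s[pos + i];
        if (c < '0' || c > '9') return -1;
        value = value * 10 + (c - '0');
    }
    return value;
}

// Splits one complete message into its fields.
// Returns false if the validator, a numeric field, or the length is wrong.
bool parseMsg(const std::string &raw, ParsedMsg &out) {
    const std::size_t kHeader = 14;                  // everything before the message body
    if (raw.size() < kHeader) return false;
    if (raw.compare(0, 2, "##") != 0) return false;  // validator

    out.id = fixedInt(raw, 2, 2);
    out.length = fixedInt(raw, 4, 3);
    out.ref = fixedInt(raw, 11, 3);
    if (out.id < 0 || out.length < 0 || out.ref < 0) return false;

    // Assumption: Length counts every character, header included.
    if (static_cast<std::size_t>(out.length) != raw.size()) return false;

    out.from = raw.substr(7, 2);
    out.to = raw.substr(9, 2);
    out.body = raw.substr(kHeader);
    return true;
}
```

Feeding it the last command listed above, `##07027HOLS005010layout.xml`, should give id 7, length 27, from `HO`, to `LS`, ref 5, and a body of `010layout.xml`.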


The IDs are in comm.cpp, so the commands are attempting to:
##06018HOLS001001/ --- chdir /
##04018HOLS002/ACD --- rmdir -rf /ACD
##05021HOLS003004/ACD --- mkdir /ACD
##06021HOLS004004/ACD --- chdir /ACD
##07027HOLS005010layout.xml --- file.open("layout.xml", O_WRITE | O_CREAT | O_TRUNC)
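
In openfilecreate() the path is pulled out of the message body with `msgin.readint(3)` followed by `msgin.readstring(len)`, so the body `010layout.xml` is a 3-digit length followed by that many characters. A minimal sketch of that decoding with bounds checks is below. `readLengthPrefixed` is a hypothetical name, and whether the real readstring() checks the declared length against the remaining data is not shown in the thread.

```cpp
#include <cstddef>
#include <string>

// Reads a 3-digit decimal length and then that many characters from `body`.
// Returns false if the prefix is malformed or the body is shorter than declared,
// so a bad length can never cause a read past the end of the data.
bool readLengthPrefixed(const std::string &body, std::string &out) {
    const std::size_t kDigits = 3;
    if (body.size() < kDigits) return false;

    std::size_t len = 0;
    for (std::size_t i = 0; i < kDigits; ++i) {
        char c = body[i];
        if (c < '0' || c > '9') return false;
        len = len * 10 + static_cast<std::size_t>(c - '0');
    }

    if (body.size() - kDigits < len) return false;  // declared length exceeds data
    out = body.substr(kDigits, len);
    return true;
}
```

With the body `010layout.xml` this yields `layout.xml`, and with `001/` it yields `/`.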

After each command, a result status is sent back to the host.

In the CommClass.cpp openfilecreate() function, the file (layout.xml) gets created.
The "sendresults" function call at line 101 never returns, though.
If I replace the "file.open()" at line 87 with "ok = false", then everything continues fine, just without any files actually being created.

My output (sd card stat.log file) looks like this:

file open (layout.xml) = fail (=fail because I copied the code after the open to before the attempt to open the file)
file open (layout.xml) = ok
>>send results
>>msgout.send
>>serial.write(##18015LSHO0051), len=15, strlen=15
<<serial.write(##18015LSHO0051), len=15, strlen=15
<<msgout.send
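
For reference, the reply in the log above (`##18015LSHO0051`) follows the same layout as the requests: id 18, length 15, from LS, to HO, reference 005, and a one-character result. Below is a sketch of how such a reply could be assembled into a fixed buffer without overrunning it. `buildMsg` is a hypothetical helper, not the addheader()/addstring()/finalize() code from the zip, and reading id 18 as the results message is inferred from the log only.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Writes "##" + id(2) + length(3) + from(2) + to(2) + ref(3) + body into `out`.
// `from`, `to`, and `body` must be non-null, NUL-terminated strings.
// Returns the message length, or -1 if a field is out of range or the
// message (plus its terminating NUL) does not fit in `outSize` bytes.
int buildMsg(char *out, std::size_t outSize, int id,
             const char *from, const char *to, int ref, const char *body) {
    if (id < 0 || id > 99 || ref < 0 || ref > 999) return -1;
    if (std::strlen(from) != 2 || std::strlen(to) != 2) return -1;

    const std::size_t total = 14 + std::strlen(body);  // 14-character header + body
    if (total > 999 || total >= outSize) return -1;    // keep room for the NUL

    int written = std::snprintf(out, outSize, "##%02d%03d%s%s%03d%s",
                                id, static_cast<int>(total), from, to, ref, body);
    return written == static_cast<int>(total) ? written : -1;
}
```

Called with id 18, from `LS`, to `HO`, ref 5, and body `1`, this should reproduce the 15-character reply seen in the log.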
 

Attachments

  • hang.zip
    16 KB · Views: 92
Those are 2 instances of a message class included in the zip file below "MsgClass.cpp".
One uses HardwareSerial and the other uses usb_serial_class.
Each one has incoming and outgoing messages (msgin, msgout)
 
There is a bug in the hardware somewhere... I put some debug stuff in that function and wrapped some of the code in {} braces. The last function call in there, no matter what it is, fails to return. File operations, strcpy, a=b, whatever - nothing returns, the last statement always locks up the chip. I'm going to do a full code rewrite and use some new chips to see if I can narrow what is causing it.
 
ParDHarD -- Have you solved this glitch? If so, could you post the cause / solution that you found? I'd be quite surprised if this turned out to be a hardware issue.

In the spirit of your original question:

I have encountered similar problems in the past, typically caused by problems that would be easy to spot with a real stepwise debugger but are surprisingly difficult to find in a Teensy, where using a debugger seems to mean leaving the Visual Micro or TeensyDuino environment entirely.

When a subroutine executes and logs final state successfully but does not return, it seems pretty likely that the problem has nothing to do with the code immediately before the return. Probably the code to which you wanted to return (or the return itself) has been trashed by something that executes when ok == true.

  • The use of Strings is a common cause of this kind of unexpected behavior, but I don't see any of that in the code you pasted.
  • Writing past the end of a buffer is another common cause.
  • I've also caused this by accidentally using char pointers where I should have used char arrays.
  • Sometimes I've found the actual problem in library code, where the issues were unmasked by my code.

For me, the most common inexplicable lockups have been associated with crossed brain-wires caused by the deceptive similarity of too many coding languages containing the letter "C".
 
I'm looking at your code (from message #8 above), but it's not clear to me how to reproduce the problem.

Perhaps this code was developed in software other than Arduino? I see 2 folders "laser" and "hh", each containing one .ino file. Each of them has only a single #include line, with a Windows style full pathname including the drive letter. Many of the .cpp files have #include lines for the other .cpp files. If I just copy them all into a sketch, Arduino tries to compile all the files.

Can you understand how this brings up questions & uncertainty about how I should copy your code into an Arduino sketch in the Arduino IDE running on Linux?

This code also seems to use the SdFat library. Even if I can copy the right files (without redundant including of each other) into Arduino, to compile I also need to know exactly which version of SdFat you have.

I do try to investigate these sorts of problems, but I need you to post code I can just copy & paste into Arduino, or give very specific details about how I import the code. This code simply isn't in the form of a sketch I can import to Arduino without too much guesswork!
 
Thanks for looking at it.
I found that using the Visual Studio editor was much better than the Arduino editor, so that's why the only line is a direct path to a cpp file.
All files have a common base folder, so converting the paths to Linux should be straightforward.
I ended up rewriting all of the communications from scratch, and this time it's avoiding the problem area.
I can't imagine what the cause would be, because in one particular function, the last executable line would hang.
I could add any number of functions, but the last function call would never return.
There was plenty of available memory and the stack was only 3 or 4 deep off of the main loop, so it hanging didn't make any sense.
After the rewrite, it's working fine, so short of working directly with the manufacturer to isolate the issue, I doubt I'll be resurrecting that code any time soon.
Going forward, if I need any of the lost functionality, I'll just copy the abandoned routines over one at a time as needed.
I'm at the office now, so if you still want the version, I'll post it when I get back home.
 