The better half of my past week can be summarized best by this oh-so-descriptive-error-message:
Right: a message I spent a long time on, trying to find out what was happening – and what caused it. Multiple days – so let me try to spare you the pain should you ever encounter this error.
(tip: if you don’t care about the story, just skip to the conclusion ;-)).
We are rebuilding our product for Business Central – and are almost finished. In fact, we have spent about 500 days building it – and since the recent release of Wave 2, we are fully in the process of upgrading it – because obviously, since it’s all extensions (we have a collection of 12 dependent extensions), that should be easy. (think again – Wave 2 came with a lot of breaking changes … but that’s for another blogpost ;-)).
Our DevOps builds had been acting strange for a while – just not “very” strange .. In fact: when a build failed with a strange error (yep, the one above), we would just retry, and if it was ok, we wouldn’t care.
That was a mistake.
Since our move to Wave 2 .. the majority of the builds of just 1 of the 12 apps failed – and (which never happened before) even the publish from VSCode failed with the same error message:
Insufficient stack to continue executing the program safely. This can happen from having too many functions on the call stack or function on the stack using too much stack space.
We are developing with a team of about 9 developers – so people started NOT being able to build an environment, or compile and publish anymore. Sometimes.
Yes indeed: sometimes. I had situations where I thought I had a fix, and after 10 builds or publishes – it started to fail again.
And in case you might wonder – the event log didn’t show anything either. Not a single thing. Except for the error above.
What didn’t help
I started to look at the latest commits we did. But those were mainly due to the upgrade – stuff we HAD to do because of the breaking changes Microsoft introduced in Wave 2.
Since it failed at the “publish” step, one might think we had an install codeunit that freaked out. Well, we have quite a few install-codeunits (whenever it makes sense for a certain module in that app) .. I disabled all of them – I even disabled the upgrade-codeunits. To no avail.
Next, I started to look at the more complex modules in our app, and started to remove them .. Since one of the bigger modules had a huge job during install of the app – AND it publishes and raises events quite heavily, I was quite sure it was that module that caused the pain. To test it, I removed that folder from VSCode, made the code compile .. and .. things started to work again. But only briefly. Moments later, it was clear in DevOps that certain builds started to fail because of the exact same error. From victory .. back to the drawing board ;-).
Another thing we tried was playing with the memory on our build agents and docker hosts. Again, to no avail .. that absolutely didn’t help one single byte.
And I tried so much more .. really. I was so desperate that I started to take away code from our app (which we have been building for over 6 months with about 9 developers (not fulltime, don’t worry ;-))). It’s a whole lot of code – and I don’t know if you ever tried to take away code and make the remaining code work again .. it takes time :-/. A lot!
What did help
It took so much time, I was desperately seeking help .. and from pure frustration, I turned to Twitter. I know .. not the best way to get help .. but afterwards, I was quite glad I did ;-).
You can find the entire thread here:
First of all: thanks so much to all the people for their suggestions. There were things I hadn’t tried yet. There were some references to articles I hadn’t found yet. All these things gave me new inspiration (and hope) .. which was invaluable! Translation files, recursive functions, event log, dependencies, remove all code, force sync, version numbers, …
Until phenno mentioned this:
Exactly the same error message, with a big xmlport. It first pointed me in the wrong direction (recursive functions / xmlport) ..
But then one of our developers reminded me that, months back, we had also added a big object: a 1.2 MB codeunit, auto-generating all Business Central icons as data in a table, to be able to use them as icons in business logic. Initially I didn’t think it would ever have an effect on the stability of the app (in this case – the inability to publish it) .. we wrote the damn thing more than 4 months back, for crying out loud :-/ and the code was very simple – nothing recursive, no loops, very straightforward. Just a hellofalot of code ;-). But .. it doesn’t hurt to try what happens when I remove the code .. so I tried .. and it works now! Victory!
Conclusion
The size of a file (or object) does matter. If you have the error above – it makes sense to list your biggest files, and see if you can make them smaller by splitting the objects into multiple objects (if possible ..).
In our case, it was one huge object in one file. And I don’t know what exactly the problem was: the size of the file, or the size of the object. There is a difference. If I had wanted to keep the functionality, I might have had to split the object into multiple codeunits, and on top of that I might have had to split them into multiple files (which – in my honest opinion – is best practice anyway ..).
Also, I have the feeling that Wave 2 is a bit more sensitive to these kinds of situations .. I don’t know. It’s just – we had this file for quite a while already, and it’s only with the upgrade to Wave 2 that it started to be a problem.
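If you want a quick way to list your biggest files, something like the sketch below could do it. It is not what we used, just a minimal Python example under a few assumptions: your AL source files end in ".al", and the function name largest_files and its parameters are made up for this illustration. It only looks at file size on disk, so it won’t tell you whether the file or the object inside it is the real culprit.

```python
from pathlib import Path


def largest_files(root, pattern="*.al", top=10):
    """Return the `top` largest files under `root` matching `pattern`,
    as (size_in_bytes, path) tuples, biggest first.

    Hypothetical helper for illustration; "*.al" assumes AL source files.
    """
    # Walk the folder tree recursively and keep regular files only.
    files = [p for p in Path(root).rglob(pattern) if p.is_file()]
    sized = [(p.stat().st_size, p) for p in files]
    # Sort on size only, biggest first.
    sized.sort(key=lambda item: item[0], reverse=True)
    return sized[:top]


if __name__ == "__main__":
    import sys

    # Folder to scan: first command-line argument, else the current folder.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for size, path in largest_files(root):
        print(f"{size / 1024:10.1f} KB  {path}")
```

Run it from the root of your app folder (or pass the folder as the first argument), and the usual suspects should show up at the top of the list.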
In any case – I hope I won’t wake up tomorrow, concluding the error is back and all the above was just one pile of crap. Wish me luck ;-).