Random thoughts on code bloat

Status
Not open for further replies.

EK701

Well-known member
While updating an app on my iPhone today, I noted that the update was 40MB in size. For some reason, this got me thinking about my early days in computers, starting in about 1979 or so. Of course, like most kids of that era, I used the Apple II, TRS-80, Atari 400/800, etc. Then in about 1981, I took a summer computer class at a local university and got an account on the VAX 11/780 running BSD 4.1 (I think), and man was that cool! I got to program on a "mainframe"! The VAX 11/780 had a 5MHz processor, 16MB of RAM, and eight 28MB RK07 disk drives! Plus I had access to the ARPANet! I was in geek heaven. Oh, and don't forget "rogue"!

I also took a class at another local university and got access to their VAX/VMS system (I don't remember the details), but I never really cared for VMS. I remember using a Teletype portable terminal with a 300 baud acoustic coupler modem that my mom scrounged from work (she worked for AT&T) to get into the VMS system from home.

Then came the 1/1/83 ARPANet "flag day" when the entire ARPANet switched from NCP to TCP/IP. Usenet access and mail came about around this time frame as well. The university with the VAX 11/780 got it upgraded to an 11/785 (7.5MHz CPU!) and got a 56k ARPANet/Internet connection. Anonymous FTP was blazing fast now!

Fast forward to mid-1984. My mom was working for AT&T Computer Systems and she brought home a just-introduced AT&T 3B2/300 minicomputer. It had a whopping 2MB of RAM and a 10MB hard disk, plus the requisite 5 1/4" floppy disk (360k capacity if I remember right), a 4410 monochrome terminal, and most importantly a 300/1200 baud modem. The modem wasn't auto-dial, though. You had to dial the number, then when the remote end answered, you pushed a button on the modem and hung up the handset. I was on top of the geek world with this setup. I even had a Usenet email (remember bang path email addresses?) and news feed.

In 1985, while still in high school, I got a job with a local database application company as a junior system administrator. We had all kinds of computers that I had full access to - AT&T 3B series, VAXes, one of the first Sequent systems, HP machines and more. And we installed our first Ethernet network using "thicknet" coax and vampire taps. 2400 baud auto-dial modems were the norm. We got an AT&T 8086 IBM PC clone at home with a 2400 baud modem. Somewhere before that, we got an IBM PCjr, too. Now I had color graphics! Oh, and the Mac made its debut. Man I really wanted one, but man were they expensive (especially since we got AT&T stuff for free.)

Around 1987, I got a part-time job with AT&T Computer Systems as a system administrator at their local office. I remember getting in a 3B2/600 with dual 600MB hard drives and a 120MB QIC tape drive. I thought to myself, man, what are we going to do with all that disk space?! Usenet was really rocking at this point and a "full feed" consumed about 10MB per day, if I remember right. Around 1990, I remember seeing a confidential Intel x86 processor development timeline with a projected x86 processor that ran at 256MHz, and thinking to myself, how are they going to shield that thing so it doesn't make an RF mess?

Computers and Internet progressed and I finally left the IT field in 2002. By that time, I was working for a large international telecommunications company in the network engineering department where we were deploying a global IP network. OC-3 circuits were common and we were starting to deploy OC-12 circuits in the continental US and Europe. GigE had just hit the scene, too.

Anyway, back to my original point: how did we get to the point where an iPhone app update is 40MB in size? Back in 1984, the entire OS, plus applications, only took up about 6MB on that 3B2/300. Hell, the Apple II, TRS-80, etc. ran off 180k and 360k 5 1/4" floppy disks. Of course, the applications today are much more sophisticated, but I remember running Lotus 1-2-3 on an Apple III using dual 360k floppy disks. Is the size of applications today due to the development environments used? Poor programming because of the abundant RAM and disk available? Or are the apps just so much more sophisticated that the code base is several orders of magnitude larger? Or a combination of factors? I haven't done any software development (until I recently started playing with the Arduino and Teensy) for over 15 years and I sure have forgotten a lot, but I'm sure I'm also out of touch with today's software development environment.
 
 
One of the factors on iOS is that apps often target multiple devices (iPhone and iPad). Just for graphical UI assets, developers need to provide different sizes for each device (non-retina, retina, iPad-sized, etc.), which can add up pretty quickly. They also tend to include both 32-bit and 64-bit code in the same bundle, in what's called a "fat binary".

I figured there was something like that going on. However, it seems a bit lazy - why force a download of data you don't need? In the controlled iOS environment, I would think the App Store would know exactly what device is running the app, so a binary built specifically for that device could be downloaded. In the more open and diverse Android arena, I can see fat binaries making more sense.

Thanks a lot for that story, looks like I am as old as TCP/IP :)

It was a fun trip down memory lane!
 
I started similarly. My first computer used an RCA 1802 COSMAC CPU, which I wire-wrapped on perfboard, and it had a staggering 256 BYTES of memory. Programs were entered via bit switches on the front panel. Next I wire-wrapped an S-100 bus backplane and CPU board using a Z-80, this time with a purchased 5K static memory board and a wire-wrapped EPROM board for the BIOS, which I wrote myself (I had a cross assembler on a PDP-8 at work). My first storage device was a cassette tape. I was using an old teletype that I'd picked up as surplus for I/O, until I scraped together the money for a Processor Technology VDM-1 video board.

Many upgrades followed.

I've done mostly embedded systems, with limited RAM and ROM, so when writing a program, you sometimes had to squeeze every bit as hard as you could. Nowadays, every flag seems to take at least a byte. Let's also remember that the CPUs of yesteryear were all 8-bit or smaller (Intel 4004), so many instructions were a single byte. With today's 32- and 64-bit processors, a single instruction typically takes 4 bytes or more, not to mention the address and operand values.
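The byte-per-flag point is easy to demonstrate. Here's a minimal sketch (the flag names are made up, and the layouts are illustrative rather than from any real firmware) comparing eight flags stored one per byte with the same flags packed one per bit, the way memory-starved embedded code stores them:

```python
# Eight boolean flags, two ways: one whole byte per flag vs. one bit per flag.
import ctypes

FLAG_NAMES = ["power", "door", "fan", "heat", "timer", "error", "beep", "debug"]

class LooseFlags(ctypes.Structure):
    # One byte per flag -- the "every flag takes at least a byte" case.
    _fields_ = [(name, ctypes.c_uint8) for name in FLAG_NAMES]

class PackedFlags(ctypes.Structure):
    # One bit per flag, squeezed into a single byte, embedded-style.
    _fields_ = [(name, ctypes.c_uint8, 1) for name in FLAG_NAMES]

print(ctypes.sizeof(LooseFlags))   # 8 bytes
print(ctypes.sizeof(PackedFlags))  # 1 byte
```

The packed layout costs a few extra mask-and-shift instructions on every access, which is exactly the trade you made when ROM and RAM were measured in kilobytes.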
 
In my day job, I develop EDA software. Code bloat has many sources, some of which tend to get overlooked by folks on the receiving end. Quality assurance costs are up there: each variation or option added potentially doubles the QA cost. Meanwhile, with each generation of the software, support for more platforms is added. Even though older platforms may be officially disavowed, it's often not because all the old code has been removed. Rather, it's just too expensive to support the old stuff, so it's disabled with a check. Actually removing all the code can be a ton of risky work. So all sorts of legacy code and data accumulate from version to version.

And of course, for the majority of commercial applications, libraries are essential. Here again, code for many unnecessary things gets included. On a large enough project, you're likely to find redundant libraries - multiple XML parsers, for example, because one 3rd-party library picked one and a different 3rd-party library picked another. Every four to six years, there's a crisis about how large things have gotten and a few engineers are tasked with reducing the bloat, but this sort of effort is undertaken only when absolutely unavoidable. New features sell. Reduced transmission/storage requirements, not so much.
 
Code Bloat: enabled by over-use of OOP and too much abstraction. Also makes tracing an error down the call stack a lost cause, hence "Error, Try Later".

Do we blame it on the RAM industry for making it so cheap? Or Microsoft for needing gigabytes so that impossibly large programming teams can stir their soup?
 
Oh, and don't forget "rogue"!

Yes, I spent many hours working with that application :)

Anyway, back to my original point, how did we get to the point where an iPhone app update is 40MB in size?

That is a million-dollar question, one that I have often pondered. A simple answer is "because we can", or, as a corollary of Parkinson's Law, "code expands to fill the space available". It takes on a real practical angle when I have to persuade my boss to fit the 64K chip instead of the 32K one. That was a struggle. I tried to explain that while the prototype might fit in 32K, people keep adding new requirements and I have to write more code. Of course, I have now nearly filled the 64K chip, and I am arguing for the 128K chip :)

What I have noticed, though, is that adding more features tends to have a non-linear effect on code size, as does having teams of programmers or using libraries. A single programmer can hold the whole problem in his head and perfectly optimise the code - I was always impressed that Transport Tycoon was coded entirely by Chris Sawyer, with another guy doing graphics and sound, and came on 2 or 3 floppies; I guess a lot of that was graphics.

Consider that the venerable rogue ran using 7-bit ASCII, while today's apps support a wide variety of languages, so instead of strlen/strcpy you need functions that handle multi-byte character sets. And then there are languages which display right-to-left, complex glyphs, and languages where the same character is displayed differently depending on whether it is at the start, middle, or end of a word - it all requires a lot more code.
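A quick sketch of the strlen point (the sample strings are mine): in 7-bit ASCII, byte count and character count agree, so strlen-style byte counting works. Mix in accented Latin, right-to-left Arabic, and CJK, and the two diverge, so any code assuming one byte per character breaks:

```python
# rogue's world: 7-bit ASCII, one byte per character.
ascii_name = "rogue"
assert len(ascii_name) == len(ascii_name.encode("utf-8")) == 5

# Accented Latin + Arabic + CJK, i.e. "café سلام 世界" (written with
# escapes here so the right-to-left text doesn't scramble the source).
mixed = "caf\u00e9 \u0633\u0644\u0627\u0645 \u4e16\u754c"
chars = len(mixed)                    # characters (code points)
octets = len(mixed.encode("utf-8"))   # bytes on the wire / on disk
print(chars, octets)                  # 12 characters, 21 bytes
```

And that's before any of the shaping logic for contextual Arabic forms or bidirectional layout, which is pure code on top of the data.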

There is a great quote "If I had more time, I would have written a shorter letter", and I think the same applies to software. Timescale and features are generally given priority, and using an off the shelf library is often a quick and easy way forward.

It's interesting to contrast with hardware: where hardware gets faster, smaller, and cheaper with more features, software goes in the opposite direction. No one seems to have discovered any techniques which can be applied to software to provide the same benefits hardware gets from shrinking feature size; instead we are stuck with low-level coding, not much changed from 1981. Object orientation has perhaps enabled more complex features to be added, but hasn't done anything for the other attributes.

Anyway, I don't have an answer, but my embedded coding technique is quite different to how I write PC software. I rarely even think about how much space the PC software takes, unless it becomes obvious.
 
A very good question.

Largely a combo of 24- or 32-bit graphics and off-the-shelf libraries offering 10 times the required functionality.

Interestingly my brother writes console games for a living (extremely popular ones) and I understand that they are in a perpetual state of rewriting the game engines from the ground up; they, like arduino enthusiasts, need to squeeze every last ounce of performance from the kit.

They rarely re-use old code- by the time they've fixed all the bugs in an engine it's out of date :D

But when it comes to apps on windows or mobile platforms people are lazy....

It's nice actually - I had a 64k program limit in Turbo Pascal 5.5 back in about 1988, and now that I'm using the Teensy I have the same limit again :)
 
@Pensive - code and RAM in 64k? My first project in '85, after school, was in interpreted BASIC with 64k for code, data, and comments combined - I had to remove comments to finish the project (and I got dinged for that at review time). It logged vial weights of penicillin, running the PC interface and a remote watertight dumb color terminal in the production area with bar graphs, while I talked to the machine weighing the product and running the process - all from a lowly 8-bit PC.

My first HDD unit was 5MB and it was great no longer booting from dual 360K floppies.

Floppies grew in capacity - and they grew in quantity (dozens) shipped - then CDs, then multiple CDs, then DVDs, then multiple DVDs. Now you don't get disks at all and have to download everything - repeatedly, for multiple computers. The three downloaded updates to get from factory Win 8.0 to today's 8.1 were 3+GB, 800+MB, and 500+MB.

I got my new 3GB-RAM phone and went cheap, adding only 64GB of flash instead of 128GB to supplement the system's 32GB.
 
Don't forget that high-level programming will involve more libraries. If you want small code size you need to program in assembly, which is not feasible for much beyond microwave ovens.
So the cost of easy programming and compatibility is bigger programs.
 
Even microwave ovens have some pretty big chips in them to handle the displays, cooking programs, and so on. Heck, I even found an ATmega running my vacuum cleaner.

Assembly likely only finds application these days in situations where the execution of the code has to be 100% unambiguous, very fast, and/or very reliable. Instead of hoping that the compiler won't bork the thing, you will upload exactly what you expect to, every time.

A famous example of the above was Arctic Fox on the C64 (6502) vs. the Amiga (68000). The C64 gameplay was smoother, faster, etc. Guess which version had been written in assembler and which in a higher-level language? Similarly, I expect that many of the eye-candy games written today spend a lot of time tuning very specific parameters, sub-routines, etc. in the GPU arena to get the highest FPS possible for a given resolution, anti-aliasing, etc.
 
Code bloat and excessive abstraction via layered APIs are a reason we often see
"Error 1258321. Abort, Retry, Fail"

when the app can't trace which pancake in the tall stack was rotten.
 
I think it's largely because of abstraction. Modern programs are bigger, but they do a lot more. Sure back in the 80s, we could fit a word processor in 64K, but it only worked on one specific model of one specific machine, and probably only supported English, a couple different fonts (or even just one), very basic formatting, not a lot of features, etc. Whereas today's word processors support every language ever, a zillion fonts, advanced formatting, headers, tables, columns, custom page layouts, any size paper, extremely large files, embedded graphics and charts, spelling and grammar checking, cloud syncing, live collaboration across the net, revision control, etc, while running on a wide variety of machines and OSes. The software hasn't simply got 10 times bigger; it's gained 10 times as many features.

Of course, a lot of that abstraction is made possible by using higher-level tools. We don't write much in assembly anymore. We use things like JavaScript and Python, which are a lot more flexible, but also take up a lot more space and more processing power. (The program is distributed in text format and requires an interpreter, libraries, etc.) It's likely that your program is running through several layers of interpreters and compilers. We've traded efficiency for flexibility and convenience.

There is definitely some "bloat" as well. We allocate 8K of memory for a string buffer "just in case" and then only use a few hundred bytes. We have fancy graphics and gradients. We embed long, detailed messages and instructions instead of making them part of a separate manual. We bundle default config files just in case the external one gets lost, default and example files so that the program "just works" instead of relying on the user to provide their own, etc.
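That buffer habit is easy to picture. A toy sketch (the 8K figure echoes the example above; the payload string is made up) comparing a just-in-case allocation with one sized to the actual data:

```python
# The "8K just in case" pattern vs. sizing the buffer to the data.
import sys

payload = b"only a few hundred bytes of actual message text"

oversized = bytearray(8192)          # allocate 8K up front...
oversized[:len(payload)] = payload   # ...then use a tiny fraction of it

right_sized = bytearray(payload)     # allocate what the data needs

# The oversized buffer carries ~8K of mostly-zero padding around.
print(sys.getsizeof(oversized), sys.getsizeof(right_sized))
```

One such buffer is noise; thousands of them, in every layer of every library, is how "just in case" turns into bloat.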
 