The Anatomy Of XLSX Files
I work with webshops, so I handle .XLSX files a lot, and I convert them to .CSV before importing or exporting. CSV is the OS-independent format, the big standard. It's also the much bigger file: an 821KB XLSX becomes 87MB as a CSV! Given that difference, even though the two supposedly contain the same data, I thought I'd dig into the file format a bit, using this particular file as my example, to see how it really works and what it really contains.
I used to think XLSX files used some smart algorithms or formulas to store data more efficiently, like finding similar instances and grouping them together as one entity, or skipping delimiter symbols altogether and using spaces or line breaks instead (which I assume take less space). As it turns out, it's nothing like that. An XLSX file is just a ZIP archive with heavy compression, and if you change the file extension to .zip and unpack it, you can see exactly what it contains. That's not just the worksheet itself (in XML format, and apparently only 8MB big), but a lot of other resources too, even the image file for the script icon! Some of it feels very unnecessary to me, but the file size does decrease drastically with compression.
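If you'd rather not rename anything, the same peek inside can be done from a script. This is only a minimal sketch, assuming the XLSX opens as a regular ZIP archive as described above; the function name `list_archive_members` and the file name `products.xlsx` are made up for illustration and are not from my actual setup.

```python
import zipfile


def list_archive_members(path_or_file):
    """Return (name, compressed_bytes, uncompressed_bytes) for each file in the archive."""
    with zipfile.ZipFile(path_or_file) as archive:
        return [
            (info.filename, info.compress_size, info.file_size)
            for info in archive.infolist()
        ]


# Hypothetical usage, with a made-up file name:
#
# for name, packed, unpacked in list_archive_members("products.xlsx"):
#     print(f"{unpacked:>12,} bytes unpacked  {packed:>12,} bytes packed  {name}")
```

Comparing the two size columns shows how much of the small file size comes from compression alone, file by file.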
Looking at the files, the workbook I'm currently working with (it has only one worksheet) contains these: