A Place for Every Page and Every Post in Its Place ⁂ starbreaker.org

Author's Notes

The following is intended to be part of the 32bit Cafe "Back to School" Code Jam.
I use 'directory' instead of 'folder' because I grew up on DOS and UNIX, operating systems whose main interface was a command line. It's rather like the way my wife talks about putting groceries in 'the boot' instead of 'the trunk' because she's an Aussie. Commonwealth English and US English aren't quite the same, and terminology applicable to command-line interfaces may differ from that used in graphical user interfaces (GUIs).
I do web development at my day job, but I am not a teacher. While I am trying to provide a reasonably comprehensive explanation of how directory structures work when building websites, I may unintentionally skip over some fundamentals because they are so familiar to me as to seem unworthy of mention. If you trip over an unmentioned fundamental, that's my fault, not yours.

I recently received an email from a visitor to one of my other websites. They have a dream of self-hosting their own website and writing a simple static site generator, and while they've seen many directory structures for generating a static site, they have seen far fewer for a website once it's been generated. In particular, they asked about how to structure a blog that only gets a few posts each year, where each post might have one image, no images, or (rarely) many images. They wanted to know if each post should have its own directory, even if it has no assets.

The sensible thing would have been to answer them immediately and tell them that they can indeed do that, but it had occurred to me that an explanation of how URLs on the Web are shaped by directory structures might be of use to a wider audience while also giving my correspondent information they can use to make informed design decisions.

This post is an initial, rough attempt at explaining how to structure a static website for deployment whether it's hand-coded, generated with tools you created yourself, or generated with popular tools like Jekyll and Hugo. Where appropriate I will refer to a tool's documentation. I may get things wrong or leave things out (see the note above). Suggestions and corrections are welcome; you may email me if you have any questions.

The Short Short Version

You can have a separate directory for each post if you want.

The Short Version

The computer generally does not care about your website's directory structure.

You can have a directory for each post if you want to, but this is not necessary. You can also put all of your posts in a single /blog directory. You can even put all of your blog posts and pages in the same directory as your homepage, if you want, along with all of your images, stylesheets, and JavaScript code.

However, once you publish your website, your directory structure becomes part of each page's URL, or "uniform resource locator", which is the address at which that page can be located on the internet. Incidentally, URL is often used interchangeably with URI, or "uniform resource identifier". It is important to remember that if you change the directory structure afterward, you change your pages' and posts' URLs, and cool URLs should not change. If a page's URL changes, then anybody who links to that page must deal with a broken link on their website, and this can be annoying.

The Long Version

This is where things may get a bit complicated. There may be information and concepts here with which you might already be familiar, but to make this post as useful as possible to as many people as possible I prefer not to assume too much knowledge on the reader's part.

Why Do URLs Have Slashes?

Let's consider the following URL as an example: https://32bit.cafe/sitemap/index.php. You've seen addresses like these before, but might not have given much thought to their structure. A URL can have several parts, as shown below:

the protocol used to access the URL, in this case https://: hypertext transfer protocol (HTTP) secured with Transport Layer Security (TLS), the successor to Secure Sockets Layer (SSL)
the domain: 32bit (combined with the top-level domain, this gives the full domain name, 32bit.cafe)
the top-level domain (TLD): .cafe
the root directory, which is the first single slash (/) in the URL
a subdirectory: sitemap/, which ends with a slash.
the file containing the sitemap: index.php

As suggested above, each single slash in https://32bit.cafe/sitemap/index.php is a directory separator. Every website has a root directory, which is the primary directory that contains all subdirectories. A website's root directory is generally mapped to its domain, in this case, 32bit.cafe/.
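If you'd like to poke at this yourself, here is a minimal sketch using Python's standard library to pull the example URL apart. It is only an illustration of the parts listed above; the variable names are my own.

```python
from urllib.parse import urlsplit

url = "https://32bit.cafe/sitemap/index.php"
parts = urlsplit(url)

print(parts.scheme)    # the protocol: https
print(parts.hostname)  # the domain name: 32bit.cafe
print(parts.path)      # everything from the root directory on: /sitemap/index.php

# Splitting the path on "/" exposes the directory structure.
# The leading empty string is what sits before the first slash,
# which is the root directory.
segments = parts.path.split("/")
print(segments)        # ['', 'sitemap', 'index.php']
```

Note that Python does not separate the top-level domain from the rest of the domain name; as far as the URL is concerned, 32bit.cafe is a single host.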

Do I Need to Include the Filename?

In the example above, I used an explicit URL: https://32bit.cafe/sitemap/index.php. However, the following address would lead to the same page: https://32bit.cafe/sitemap/.

The latter is what's called a "clean URL", and has been popular for a long time. Many people think that such URLs are easier for people to read, and better for SEO (or, manipulating Google), since they leave out repetitive and irrelevant implementation details.

How do clean URLs work? The answer lies in the software that serves websites, HTTP daemons like Apache and Nginx.

concerning daemons

What are daemons?

A 'daemon' (pronounced as 'demon') is a bit of software that lurks in the background on a computer system and acts in response to conditions set in its configuration rather than in response to an operator's input. The term dates back to MIT's Project MAC in the 1960s and was later embraced by UNIX systems such as the Berkeley Software Distribution (BSD), a variation on classic AT&T UNIX developed at the University of California, Berkeley. It's the reason FreeBSD's mascot is a cute little devil with a pitchfork wearing Chuck Taylors.

httpd, an alternative name for the Apache webserver, is just one sort of daemon one might find on a UNIX-like system. Others include daemons to transfer email like smtpd, launchd on macOS, and sshd, the Secure SHell daemon.

Microsoft Windows, being boring and corporate, refers to daemons as 'services'. Apple's macOS seems to use 'daemons' and 'services' interchangeably. I prefer 'daemon' because I'm a UNIX fan and a metalhead, and having daemons in my computer bound to my service makes me feel like a sorcerer of sorts.

Commonly used web server software can be configured to return a particular file by default if the requested URL ends with a directory instead of a file. This file is usually index.html on static sites and index.php on blogs built with content management systems (CMS) like WordPress. Other somewhat less common extensions include the following:

.shtml for web pages that use Server Side Includes (SSI)
.htm for static HTML pages created on DOS and Windows
.asp for web applications built using Microsoft's (long obsolete) Active Server Pages
.aspx for web applications built using ASP.NET
.jsp for web applications built with Jakarta Server Pages

You can even configure a web server to use index.txt as the default page, but that would just mean the server is sending a plain text file with no structural markup or interactive elements. Nor does your default file for a given directory have to start with 'index'. One could, for example, set the default to homepage.html. Furthermore, one could also configure one's web server to list every file in the directory instead of providing a default page, though this is often considered a security risk.
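To make the idea of a default file less abstract, here is a rough sketch in Python of the decision a web server makes when a URL ends with a directory. This is not how Apache or Nginx are actually implemented. The function name resolve, the INDEX_FILES tuple, and the pretend site are all made up for illustration.

```python
# Hypothetical list of default file names, tried in order.
INDEX_FILES = ("index.html", "index.php")

def resolve(request_path, existing_files, index_files=INDEX_FILES):
    """Return the file a server might send for request_path, or None."""
    if not request_path.endswith("/"):
        # The URL names a file explicitly, so no default is needed.
        return request_path if request_path in existing_files else None
    # The URL ends with a directory, so try each default file in turn.
    for name in index_files:
        candidate = request_path + name
        if candidate in existing_files:
            return candidate
    return None  # a real server would answer with a 404 or a file listing

# A pretend website, written as a set of file paths.
site = {"/index.html", "/sitemap/index.php", "/about.html"}

print(resolve("/sitemap/", site))    # /sitemap/index.php
print(resolve("/about.html", site))  # /about.html
print(resolve("/missing/", site))    # None
```

Changing the default to homepage.html, as mentioned above, would amount to putting a different name in INDEX_FILES.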

It bears mentioning, however, that if you want to have full control over your web server's configuration you must either have control over the physical server, be using a virtual private server (VPS), or be on a hosting provider that uses Apache and lets their users create .htaccess files. I am not aware of an equivalent to .htaccess for Nginx, a popular alternative to the Apache web server. Nor am I aware of an equivalent for Microsoft's IIS (Internet Information Services), but I'm equally unaware of anybody using that for a personal website.

Short version: if you're on Neocities or a similar free host, you won't have access to these settings.

So, how should I lay out a site that I've generated?

If you're building your website by hand, you can do it any way you like. For example, you might decide to party like it's 1996.

/index.html
/styles.css
/cancer.js
/rss.xml
/images/
/about.html
/contact.html
/blog/index.html
/blog/iron-maiden-rules-ok.html
/blog/killer-klownz-from-outer-space.html

a small blog without clean URLs

You could also modernize a little, like so.

/index.html
/assets/styles.css
/assets/cancer.js
/assets/images/
/feed/index.xml
/about/index.html
/contact/index.html
/blog/index.html
/blog/iron-maiden-rules-ok/index.html
/blog/killer-klownz-from-outer-space/index.html

a small blog with clean URLs and assets tucked into its own directory
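If you're writing your own generator, the relationship between the files in the second layout and the addresses visitors see can be sketched in a few lines of Python. This assumes the server treats index.html as the default file; the function name public_url and the example.com domain are placeholders of my own.

```python
def public_url(file_path, base="https://example.com"):
    """Turn a file path inside the site root into the URL a visitor would use."""
    if file_path.endswith("/index.html"):
        # Drop the default file name but keep the trailing slash.
        file_path = file_path[: -len("index.html")]
    return base + file_path

print(public_url("/blog/iron-maiden-rules-ok/index.html"))
# https://example.com/blog/iron-maiden-rules-ok/
print(public_url("/assets/styles.css"))
# https://example.com/assets/styles.css
print(public_url("/index.html"))
# https://example.com/
```

Only files named index.html get shortened; everything else, like stylesheets and feeds, keeps its file name in the URL.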

Either of the examples above will work for a small blog whose operator doesn't post often. However, a website with a blog that gets several entries per week, or even per day, might become unmanageable quickly with such a simple structure.

To keep things tidy, many blogs — especially those built with WordPress or with a static site generator (SSG) like Jekyll or Hugo — will incorporate each post's creation date into its URL. Here's an example.

/index.html
/assets/styles/main.css
/assets/scripts/main.js
/assets/images/
/feed/index.xml
/about/index.html
/contact/index.html
/blog/index.html
/blog/2024/07/31/iron-maiden-rules-ok/index.html
/blog/2024/08/11/killer-klownz-from-outer-space/index.html

a small blog whose entry URLs contain creation dates in YYYY/MM/DD format

A directory structure like the one shown above does two things: it creates URLs that tell visitors how old a post is, and it ensures that a given directory doesn't end up with too many posts in it. The format for blog post URLs is as follows:

All blog posts go in /blog/.
Within /blog/ are nested subdirectories containing the year, month, and day of a given blog post's creation date.
Inside the day directory is a directory for the blog post's "slug", which is the post's title with spaces converted to hyphens and other characters removed. Sometimes extraneous parts of speech like articles and conjunctions get removed as well.
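The format described above can be sketched in Python as follows. This is one simple way to do it, not the way Jekyll, Hugo, or WordPress do it. The function names slugify and post_path are my own, and this version only keeps unaccented letters and digits, so titles in other scripts would need more care.

```python
import re
from datetime import date

def slugify(title):
    """Lowercase the title, drop punctuation, and turn spaces into hyphens."""
    slug = title.lower()
    slug = re.sub(r"[^a-z0-9\s-]", "", slug)  # remove other characters
    slug = re.sub(r"[\s-]+", "-", slug)       # runs of spaces become one hyphen
    return slug.strip("-")

def post_path(title, created):
    """Build /blog/YYYY/MM/DD/slug/index.html from a title and creation date."""
    return f"/blog/{created.strftime('%Y/%m/%d')}/{slugify(title)}/index.html"

print(post_path("Iron Maiden Rules, OK?", date(2024, 7, 31)))
# /blog/2024/07/31/iron-maiden-rules-ok/index.html
```

This sketch does not strip articles or conjunctions; if you want that, you would filter a list of such words out of the title before joining it with hyphens.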

While I'm assuming that all images would go in /assets/images/, this is not a requirement; you can place all images for a blog post in the same directory as its index.html file. You can even have separate stylesheets and JavaScript files for each blog post and page by placing these assets in the same directory, though there's nothing stopping you from placing as many stylesheets and JavaScript files as you like in their respective directories and linking to them as needed.

What about SEO?

I'm not the person to ask about SEO. I've been told that using clean URLs that incorporate a post's keywords and indicate the date it was created can help, but I've never seen any evidence of this. Furthermore, it appears that Google no longer gives much consideration to personal websites, and does not send traffic your way if it can instead mine your website to train its AI and keep people on search engine result pages where it can show them ads.

I would, therefore, advise against giving too much consideration to SEO. This is especially the case if your website is a hobby rather than a side hustle.

So, What Should I Do?

Try to think ahead and consider the maintenance challenges your future self might face. But don't let your future self rule you. The person you are right now is the person designing your website. You should, above all else, do what you want to do.

Most of what you'll read about "best practices" will prove irrelevant to you. Best practices are for commercial and corporate websites. You are not obligated to build your website as if it were intended for commercial or corporate use. If you're not getting paid to do this for somebody else, you might as well have some fun.
