Branches, Segments, Leaves

On the surface, a typical Forust site will be one index page, a handful of category pages, and many content pages.  Think of these as the trunk, your main limbs, and then all the big and little branches hung off them.  Underneath, each of these page types is simply a “branch”: a list of “segments”, with “leaves” attached either directly to the branch or to one of its segments.  If you’re familiar with JavaScript, think of Array objects, which have subscripts and a length (segments) but are also objects and can therefore have name/value properties (leaves).
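To make the Array analogy concrete, here is a minimal sketch of the model in Python.  The class name `Branch` and its field names are made up for illustration; the post does not define a concrete type.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    """Hypothetical model of a branch: ordered segments plus named leaves."""
    url: str
    # Segments are the numbered blocks of content (the "subscripts").
    segments: list[str] = field(default_factory=list)
    # Leaves are name/value extras hung off the branch (the "properties").
    leaves: dict[str, str] = field(default_factory=dict)

    def __len__(self) -> int:
        return len(self.segments)

    def __getitem__(self, index: int) -> str:
        return self.segments[index]


page = Branch("category/foo.html")
page.segments.append("<p>First block of content</p>")
page.segments.append("<p>Second block of content</p>")
page.leaves["comments.xml"] = "<comments/>"
```

Like a JavaScript Array, `page` can be indexed and measured (`page[0]`, `len(page)`) while still carrying named extras in `page.leaves`.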

On S3 a branch is a first-class object whose URL exactly matches the page being requested, with a couple of exceptions.  First, the index page URL is set in the site configuration and used automatically when no path is given; it defaults to “/index.html”.  That is to say, when you go to http://www.example.com/ (path is “/”), the configuration specifies the site index branch (http://www.example.com/index.html).  The other exception is categories, which are stored under a similarly named default (index.html) but are publicly represented without it.  For example, http://www.example.com/category/index.html is the actual page, but it is visited at http://www.example.com/category/ (if the former is used, a 301 redirect is issued).  Continuing this, if a user enters the category name without a trailing slash (http://www.example.com/category), then the slash and default name are appended and that page is looked up.
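The lookup rules above can be sketched as a small function.  This is only an illustration under assumptions: `resolve` and `DEFAULT_NAME` are hypothetical names, a plain set stands in for "which branch objects exist on S3", and a slash-less category is served directly rather than redirected, which the post does not spell out.  How a direct request for “/index.html” is treated is also not specified; here it is simply served.

```python
DEFAULT_NAME = "index.html"  # the default named in the post; configurable per site


def resolve(path: str, branches: set[str]) -> tuple[int, str]:
    """Return (HTTP status, target path) for a requested path."""
    if path == "/":
        # No path given: use the configured site index.
        return (200, "/" + DEFAULT_NAME)
    if path.endswith("/" + DEFAULT_NAME) and path != "/" + DEFAULT_NAME:
        # A category requested with its default name: redirect to the public URL.
        return (301, path[: -len(DEFAULT_NAME)])
    if path.endswith("/"):
        # Public category URL: the actual page carries the default name.
        target = path + DEFAULT_NAME
        return (200, target) if target in branches else (404, path)
    if path in branches:
        return (200, path)
    # No trailing slash: append slash + default name and look for a category.
    target = path + "/" + DEFAULT_NAME
    if target in branches:
        return (200, target)
    return (404, path)


site = {"/index.html", "/category/index.html", "/category/foo.html"}
```

With this `site`, `resolve("/category/index.html", site)` yields a 301 to “/category/”, while “/category/” and “/category” both resolve to the stored “/category/index.html”.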

Branches are never deleted, because they serve the URL permanence commandment.

All pages must end with a known extension or “/index.html” to represent a folder default.

Segments are the meat of a page and are represented as the branch’s URL followed by a number in square brackets.  Oh I know, it’s just too obvious isn’t it?  For example, imagine a branch called “category/foo.html” which has two segments: “category/foo.html[0]” and “category/foo.html[1]”.  Each segment is a block of content, whereas the branch object itself only holds information about that branch.  This is a bit of a change from my last post on URLs: it means that newly user-created pages are actually just segments (not first-class branches) attached to a pre-created branch object[1].

Finally, leaves are extra data that are not necessarily permanent and are S3 objects with a name based on the branch or segment, followed by a dollar sign and identifier.  For example, “category/foo.html$comments.xml” is a leaf called “comments.xml” attached to the “category/foo.html” branch.  You can see how I plan to attach comments!
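As a rough sketch of this naming scheme, the helper below splits an S3 object name into its branch, segment number, and leaf identifier.  `parse_key` and `KEY_PATTERN` are hypothetical names, and the pattern assumes branch URLs never contain '[', ']' or '$' themselves, which the post implies but does not state.

```python
import re
from typing import Optional

# branch URL, then an optional "[n]" segment index, then an optional "$name" leaf
KEY_PATTERN = re.compile(
    r"^(?P<branch>[^\[\]$]+)(?:\[(?P<segment>\d+)\])?(?:\$(?P<leaf>.+))?$"
)


def parse_key(key: str) -> tuple[str, Optional[int], Optional[str]]:
    """Split an object name into (branch URL, segment index, leaf name)."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"not a branch, segment, or leaf name: {key!r}")
    segment = match.group("segment")
    return (
        match.group("branch"),
        int(segment) if segment is not None else None,
        match.group("leaf"),
    )
```

For instance, `parse_key("category/foo.html$comments.xml")` gives the branch “category/foo.html”, no segment, and the leaf “comments.xml”; a leaf on a segment would look like “category/foo.html[0]$comments.xml”.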

Rendering a page will follow these baby steps:

  1. GET the branch.
  2. If it is a pointer then issue 301 redirect.
  3. HEAD displayed segments (pagination may require only some segments being loaded).
  4. GET leaves specific to any server-side features (comments not initially part of this).
  5. Construct XHTML, remove hyperlinking on self-referencing URLs.
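The steps above can be sketched against an in-memory stand-in for S3.  Everything here is an assumption made for illustration: the `render` function, the dict-based store, the "pointer" field (the branch and pointer formats are still undefined), and the regular expression used to unlink self-references.  The sketch also reads segment content directly where the steps call for HEAD requests, and it returns an XHTML fragment rather than a full page.

```python
import re


def render(url: str, store: dict, first: int = 0, count: int = 10, leaf_names=()):
    """Return (status, body, leaves) for a branch URL."""
    branch = store[url]                          # 1. GET the branch
    if "pointer" in branch:                      # 2. pointer: issue a 301 redirect
        return (301, branch["pointer"], {})
    blocks = []
    for n in range(first, first + count):        # 3. only the displayed segments
        key = f"{url}[{n}]"
        if key not in store:
            break
        blocks.append(store[key])
    leaves = {}
    for name in leaf_names:                      # 4. leaves for server-side features
        key = f"{url}${name}"
        if key in store:
            leaves[name] = store[key]
    # 5. Build the markup and strip links that point back at this page.
    self_link = re.compile(rf'<a href="/?{re.escape(url)}">(.*?)</a>', re.S)
    body = self_link.sub(r"\1", "\n".join(blocks))
    return (200, body, leaves)


store = {
    "category/foo.html": {},
    "category/foo.html[0]": '<p><a href="category/foo.html">Here</a> or <a href="category/bar.html">there</a>.</p>',
    "category/foo.html[1]": "<p>Second</p>",
    "category/old.html": {"pointer": "category/foo.html"},
}
```

Calling `render("category/old.html", store)` produces the redirect, and `render("category/foo.html", store, first=1, count=1)` shows how pagination limits which segments are touched.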

Renaming a page will follow these steps:

  1. Create new branch object with “pending rename” status (contains old URL). [2]
  2. Change old branch object to be a pointer to the new branch.
  3. Move all segments and leaves (anything with the same base URL followed by '[' or '$').
  4. Update new branch object to remove "pending rename" status.

Pages in the middle of being renamed cannot be modified at all.  The renaming process can be interrupted, which is why a "pending rename" status is added at the start and removed when finished.  If a page is loaded and its "pending rename" status is found to be old, the status is renewed (given a fresh timestamp) and the rename is started once again.  This repeats until all segments and leaves are moved.
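Here is a minimal sketch of the rename steps and the restart rule, again using a dict in place of S3.  The names `rename`, `resume_if_stale`, `STALE_AFTER`, and the "pending_rename", "pending_since", and "pointer" fields are all hypothetical; the post does not say how old a status must be before it counts as stale, so the 60 seconds here is an arbitrary placeholder.

```python
import time
from typing import Optional

STALE_AFTER = 60.0  # seconds; arbitrary placeholder for "old"


def rename(store: dict, old: str, new: str, now: Optional[float] = None) -> None:
    """Run (or re-run) the four rename steps; safe to repeat after an interruption."""
    now = time.time() if now is None else now
    if new in store and "pending_rename" in store[new]:
        store[new]["pending_since"] = now        # restart: renew the timestamp
    else:
        # 1. New branch with "pending rename" status, remembering the old URL.
        store[new] = {**store[old], "pending_rename": old, "pending_since": now}
    store[old] = {"pointer": new}                # 2. old branch becomes a pointer
    moving = [k for k in store if k.startswith(old + "[") or k.startswith(old + "$")]
    for key in moving:                           # 3. move segments and leaves
        store[new + key[len(old):]] = store.pop(key)
    store[new].pop("pending_rename")             # 4. clear the status
    store[new].pop("pending_since")


def resume_if_stale(store: dict, url: str, now: Optional[float] = None) -> bool:
    """On page load, restart a rename whose pending status has gone stale."""
    now = time.time() if now is None else now
    branch = store[url]
    if "pending_rename" not in branch:
        return False
    if now - branch["pending_since"] < STALE_AFTER:
        return False
    rename(store, branch["pending_rename"], url, now)
    return True
```

Because step 3 only moves whatever is still under the old base URL, running `rename` a second time after a crash simply picks up the leftovers.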

In conclusion, Forust is taking shape and its shape is made up of branches with segments and leaves.  With this strange analogy I can formulate a common class for interacting with it and back it locally with files or just data in memory for testing.  Yes yes, I said no automated testing, but that doesn't mean it won't help ad hoc testing!

[1] Abandoned branches (no segments) must be periodically pruned.

[2] The branch and pointer XML formats must be defined.  Branches must support statuses and possibly: text/url of parent, previous sibling, and next sibling.  Branch XML must not need to be updated each time a segment/leaf is added.

One Response to “Branches, Segments, Leaves”

  1. Neil Obremski Says:

    Thinking more on the “XML” aspect, I can’t come up with enough stuff (for branches and pointers), nor can I figure out why it would be the object’s data and not just in a header. Therefore, amending this post, I believe branches will be segments of their own (simplifies new-page creation) and extra metadata will be used to indicate: parent (p), sibling (s), redirect (r), and old URL (o) values. The presence of an “old URL” indicates a pending rename is in progress.
