Leafs and Limbs

May 27, 2009

Leaves may die and be reborn, but limbs persist until destroyed.  As they relate to Forust, limbs are page objects and leaves represent their data.  Up until now I’ve written about two different routes for saving leaves, but I may have it figured out.  I’m adopting the TSV idea, but some objects keyed on the page’s URL will also be included, and the data itself will be spread across multiple objects.

First, why not just use a database, especially SimpleDB which is already based on S3?  The answer is difficult for me to formulate, but it comes down to the gut feeling of that data being fluid and impermanent.  I am not opposed to using a database for transient information and analytics, but I will not use one for Forust storage.  Chalk it up to crankiness if you like, I have certainly tired of arguing with myself over it, but I’m moving on.

Now, there are a couple priorities behind what I’m about to explain: minimize S3 transactions and never destroy.  Obviously the latter is relative, as certain junk and trademark/copyright material must be burned, but in general I aim to never lose data.

Why not use straight-up TSV for the page’s leaves and segments?  The answer lies in the abstraction: S3 objects are atomic, distributed, and decidedly not files as we would normally treat them.  If I could simply append to them, then there’d be no issue.  Consider if two users try to add a comment on the same page at exactly the same time; the internet is great for producing these types of fringe scenarios.  Both operations to add the comments will succeed, but one is going to get “stomped”.  Not good.

Why not use separate S3 objects then?  Pages will move and be given new URL’s while the old URL must be modified to forward there.  With S3 objects based on a page’s URL, they’re going to get lost now and then when trying to perform this rename.  I came up with some ways around this, but it’s easy to imagine it getting out of sync and either losing data or duplicating it.

Alright, so how can I possibly do this?  Well, I’m going to combine these ideas using some extra (albeit more complicated) server logic.  Let’s start at the beginning and walk through a page’s lifetime.

Initially, when a user creates a new page the content and the limb are one.  The name of the file they’re uploading will be embedded in the URL they’re given:

/2009/05/27/141500-image.jpg.html

The above is both the public URL and the S3 URL; there is no need to create a separate “head” object.  When a leaf is added to the limb, say a comment, a new S3 object is created:

/2009/05/27/141500-image.jpg.html[timestamp]$tsv

If a new content segment is added, the same type of action occurs, but the URL it ends up at ends with “$seg”:

/2009/05/27/141500-image.jpg.html[timestamp]$seg

Segments use the “f0” meta tag whereas TSV objects use something else (but not Content-Type, since a user can cause that to be set which would be dangerous).

The use of a timestamp is for ordering purposes and to prevent stomping (not perfect, but I’ll further solve it when I get there — probably use the PHP session ID too).

All of these objects are loaded in order to build a map of the page.  If a segment is not referenced in any TSV then its mere existence causes it to be in the map.  The TSV files are formatted similarly to how I imagined earlier today, except the first column is a timestamp.  The two types of data rows (implied by existence and in a TSV) are sorted by timestamp prior to interpretation to ensure new ones override old ones.
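Here is a rough sketch of the first half of that map-building step, in Python for brevity (Forust itself is PHP).  It assumes the key shape shown above, limb URL plus “[timestamp]” plus “$tsv” or “$seg”, and it assumes the timestamp is a plain integer, which the post does not actually settle.  The names parse_key and implied_rows are made up for illustration.

```python
import re

# Assumed key shape: <limb-url>[<timestamp>]$<kind>, where kind is "tsv" or "seg".
# Integer timestamps are an assumption; the real format is still undecided.
KEY_RE = re.compile(r"^(?P<limb>.+)\[(?P<ts>\d+)\]\$(?P<kind>tsv|seg)$")


def parse_key(key):
    """Split an S3 key into (limb, timestamp, kind); None means it is the limb itself."""
    m = KEY_RE.match(key)
    if m is None:
        return None
    return m.group("limb"), int(m.group("ts")), m.group("kind")


def implied_rows(keys):
    """Every $seg object is in the map simply by existing; return them oldest first."""
    rows = []
    for key in keys:
        parsed = parse_key(key)
        if parsed and parsed[2] == "seg":
            rows.append((parsed[1], key))
    return sorted(rows)
```

Rows read out of the $tsv objects would be merged into the same list before the final sort, so that a newer row can override an older one regardless of which kind of object it came from.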

Fine, you say, but what happens when the page is categorized or renamed?  If it’s given a new URL then it breaks the URL-relationship of all those tsv’s and seg’s.

Yup, totally, and that’s okay!  Let me explain …

Renaming a page under this system involves two new objects: a tsv under the old structure indicating the page is relocated and a new page object that is a tsv rather than a seg.  The new page object will have all the previous TSV data rolled up into it including the rename directive and all the segments that had linkage implied by their paths.  The segments thus retain their old URL’s, but are linked to the limb via explicit TSV rows rather than their paths.

There are issues here, but none of them are related to losing data.  Given that there will be unnecessary / stray S3 objects after a rename, a CRON job will have the responsibility of cleaning that crap up.  Over time all these fragmented objects will cause page loading performance to drop considerably, and it will be the job of said scheduled task(s) to fix it.  I hate to use this term, but it’s kinda like defragmenting.  Yes, it sounds awful, but I think it can work out well.

Data duplication could be a problem, so one rule I’m stating now is that duplicate TSV rows (implied or not) are completely ignored.  The only reason a row will be duplicated is if there is some sort of synchronization issue, because the timestamp column lets two pieces of equivalent content that were added separately coexist as distinct rows.  Why would you want that?  Well, hypothetically if each comment is simply its text then you wouldn’t be able to say just “lol” more than once.  God forbid!
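The duplicate rule is simple enough to sketch.  This is Python rather than PHP, and it assumes each row has already been parsed into a tuple whose first element is the timestamp; the function name and the row layout are only for illustration.

```python
def drop_duplicate_rows(rows):
    """Sort rows by timestamp and keep only the first copy of each identical row."""
    seen = set()
    unique = []
    for row in sorted(rows, key=lambda r: r[0]):
        if row in seen:
            continue  # an exact repeat can only come from a sync hiccup
        seen.add(row)
        unique.append(row)
    return unique
```

Two “lol” comments with different timestamps are different tuples, so both survive; only a byte-for-byte repeat of the same row is dropped.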

Caching is an obvious advantage but also a necessity.  Note that S3 objects are never modified by this system, so it’s perfectly safe to cache them indefinitely.  The removal of data is actually done by adding new objects which indicate what data to cancel out.  Legally there are problems with this approach, and I’ll need some way to purge cached items that must be permanently deleted, but I’ll get to that.


TSV Revenge

May 27, 2009

It’s the epiphany toilet I’m sure, but it struck me that my old friend TSV could store the page object data.  I used it to great success on multiple internal projects (my friend Kelsie will remember “SWEET” and “Ferret”) as well as my failed FanSiter.  In every case it’s been simple (split on tabs and linefeeds!),  portable (open directly in Excel, import into SQL, no problem!), and extensible (add another column!).  I don’t know why I didn’t think to use it before.

Originally I planned to store leaves as separate S3 objects and reference them based on filename.  Yesterday I decided the page object itself should be a list of all its leaves and segments.  Now today I believe in merging these two things into a single TSV.  That is: leaf names and values will be stored on the page object as rows in a TSV.  The initial format will be something like this:

  • Column 0: S3 URL to segment or page.
  • Column 1: Name.
  • Column 2+: Value(s).

A blank URL indicates a page-level leaf (e.g. referencing the entire page) whereas the presence of a URL means the leaf is specific to a segment.  Segment URL’s never change because they are never exposed publicly, so they are perfect for keys.  Segments themselves are identified by the presence of any leaves or just the URL by itself.  Links to pages are easily identified by the URL ending with “.html” and the leaves can then represent cached title/summary/etc.  For myself and anyone who remembers FanSiter, this is nearly exactly the same format … it was a good one!

When a page is renamed, the old page object can simply add a row indicating the new URL along with a leaf named “redirect” or something like that.

Multiple leaves of the same name are placed into an array, but generally the newest of something is used.  Later jobs can prune off old information (old page titles/descriptions) or leave it there for historical purposes.
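To make the format concrete, here is a small Python sketch of reading such a TSV (the real thing would be PHP).  It assumes rows appear oldest first, so “newest” means the last one read; that ordering, and the names parse_page_tsv and newest, are my assumptions for the example.

```python
from collections import defaultdict


def parse_page_tsv(text):
    """Return {url: {name: [values, ...]}}; the url '' holds page-level leaves."""
    page = defaultdict(lambda: defaultdict(list))
    for line in text.split("\n"):
        if not line:
            continue
        cols = line.split("\t")
        url = cols[0]
        if len(cols) == 1:
            page[url]  # a bare URL still registers the segment
            continue
        name, values = cols[1], cols[2:]
        page[url][name].append(values)  # same-named leaves pile up in a list
    return page


def newest(page, url, name):
    """Later rows win, so the newest value is the last one appended."""
    entries = page.get(url, {}).get(name)
    return entries[-1] if entries else None
```

Old titles and descriptions stay in the list until some later job prunes them, which matches the keep-it-for-history option.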


Stage Fright

May 27, 2009

Where do we go now?  Yesterday saw possibly my last in-office work day for a programming contract which preceded a long walk home to clear my mind.  It took longer than I had hoped to get back into the spirit of Forust, but I was determined to stamp down the stage list … and I did.

  1. Pages
  2. Comments
  3. User State
  4. Moderation
  5. Authorship
  6. Rating
  7. Videos
  8. Statistics
  9. Scheduled Tasks
  10. Anti-Spam

The order of these may change.  Indeed, some of them are a bit of a chicken and egg scenario, but I think this is a fair assessment.  Also, Statistics and Scheduled Tasks are semi-vague, but then anything beyond the first three weeks (or so) of work is going to be.  If you have a moment, we’re going to walk through them in a bit more depth.

1. Pages

Firstly we need to be able to create, add to, and view pages.  A couple of details here: creating the S3 direct-upload form, handling the S3 upload-success-redirect to create the page object, base code structure for accepting a command (treating the website as a service) and rendering output, downloading attachments from S3 and proxying them via a simple command (foo.html?a=number/name).  Text is limited to whatever can fit in the “f0” meta tag and attachments can only be images up to 5 MB in size.

2. Comments

What would content be these days without recourse to compliment or denounce it?  Comments will be the first leaf feature, so this exercises not only another form of input and page data, but also underlying structure.

3. User State

How do you login and keep session data?  My plan is to rely initially on either Gdata (Google) or Open ID and track with standard PHP sessions.  Anonymous users, with cookies enabled, will get a long-lived session for temporary authorship.  On the site-front this means a form for logging in/out and a “public terminal” / “remember me” checkbox.

4. Moderation

Gotta put a stop to anon-power-users at this point with the introduction of moderator status, a tool for managing who is a moderator (for admins), and also admins themselves.  Moderators are stored in a PHP-writable data file whereas Administrators are specified in the site configuration (a private PHP file).  Tools available to moderators: flagging/hiding pages or comments (reason must be given, from an enum), categorizing new pages, and listing uncategorized pages.  Administrators can do all that as well as permanently remove content (but not the URL) and give a page a new URL anytime.  Note that neither have permission to change content, which brings us to …

5. Authorship

If you wrote it, then it’s yours and your responsibility.  Even anonymous users (assuming they have cookies enabled) retain authorship of their articles for a time.  This stage adds an “author” leaf to pages which gives that person the ability to control follow-up contributions, lock out SE’s (so they can display it on their own site without competing), specify a forward link which is displayed but not automatically used, mark a page as public/private (latter is never categorized, not shown in unmoderated pages list), and remove/edit content (change text, re-upload attachment).  Edits are not allowed after comments are made against the current segments, otherwise they lose their context.

6. Rating

Sure there’s a lot of content, but is it good content?  I have a hard time with ratings, because I feel in general they are done poorly and end up skewing the perception of quality.  Amazon is one of the few sites which does ratings “ok”, but I just don’t believe in star ratings.  The idea here will be more about customization: you can up vote / star / favorite items you really like.  Alternatively you can “delete” (down vote) ones you don’t.  The data will be collated later into something more useful, but in the interim it will help you ignore things you don’t want to see/read.  Also, I want the ability for visitors to suggest a page’s category (especially on new pages).

7. Videos

You may recall from the first stage that only images are supported, but video is a huge reason for me to put this site together.  In this stage I’ll increase the upload limit, the supported formats, write the player scripts (WMP, QuickTime, Flash), and come up with a way to specify embedded videos (a la YouTube).

8. Statistics

I’ll have to pull out my rusty MySQL at this point to store every tiny action that occurs.  The logs provide some information for general page views, but I’d like to track more navigation to determine “human views”, popularity, and quality (do the visitors scroll through the entire page, etc.).  This stage is very ethereal, and I might end up replacing it with something more concrete… it seemed like a good idea, but now I’m wondering if better site structure / design should take precedence.

9. Scheduled Tasks

Things are bound to get out of sync, so I’ll need to write a CRON job for scanning the pile of objects: site map, list orphans, invalid objects, corrupt page object, find errors in web server logs, and generate report(s) for these things.

10. Anti-Spam

All this hippy shit is fine and dandy, but the world isn’t as clean as that.  At this final stage in initial development, I’ll explore methods of turning the tide on bots, scammers, phishermen, etc.  The primary idea behind this is to allow the flow, but not to give it as much credit or consideration.  Out of site, out of mind!

Today I’ll update the spec with this information and nail down the data formats, for which I came up with some great ideas yesterday.


Branches, Segments, Leaves

May 25, 2009

On the surface, a typical Forust [site] will be one index page, a handful of category pages, and many content pages.  Think of these as the trunk, your main limbs, and then all the big and little branches hung off them.  Underneath each of these types will simply be a “branch” that is a list of “segments” with “leaves” attached directly to the branch or one of its segments.  If you’re familiar with JavaScript, think of Array objects which have subscripts and a length (segments), but are also objects and can therefore have name/value properties (leaves).

On S3 a branch is a first-class object where the URL exactly matches the particular page being requested, with a couple of exceptions.  First, the index page URL is decided in the site configuration and used automatically when no path is given; it defaults to “/index.html”.  That is to say, when you go to http://www.example.com/ (path is “/”), the configuration there will specify the site index branch (http://www.example.com/index.html).  The other exception is categories, which are stored under a similarly named default (index.html) but are publicly represented without it.  For example, http://www.example.com/category/index.html is the actual page, but it can be visited at http://www.example.com/category/ (if the former is used, a 301 redirect is issued).  Continuing this, if a user enters the category name without a trailing slash (http://www.example.com/category) then the slash+default name is added and searched for.

Branches are never deleted, because they serve the URL permanence commandment.

All pages must end with a known extension or “/index.html” to represent a folder default.
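The lookup rules above boil down to a few string checks.  Below is a Python sketch of how a request path might map to a branch key; the function name, the return shape, and the single-entry extension list are assumptions, and whether a request for the site index itself should redirect to “/” is left alone because the rules above don’t say.

```python
KNOWN_EXTENSIONS = (".html",)  # assumption: only .html is named so far
FOLDER_DEFAULT = "index.html"


def resolve(path, site_index="/index.html"):
    """Return ("serve", branch_key) or ("redirect", public_url) for a request path."""
    if path == "/":
        return ("serve", site_index)
    if path.endswith("/" + FOLDER_DEFAULT) and path != site_index:
        # The folder default is never shown publicly: 301 to the bare folder.
        return ("redirect", path[: -len(FOLDER_DEFAULT)])
    if path.endswith("/"):
        return ("serve", path + FOLDER_DEFAULT)
    if path.endswith(KNOWN_EXTENSIONS):
        return ("serve", path)
    # No trailing slash and no known extension: try it as a folder.
    return ("serve", path + "/" + FOLDER_DEFAULT)
```

So “/category”, “/category/” and “/category/index.html” all end up at the same branch, with only the last one costing a redirect.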

Segments are the meat of a page and are represented as the branch’s URL followed by a number in square brackets.  Oh I know, it’s just too obvious isn’t it?  For example, imagine a branch called “category/foo.html” which has two segments: “category/foo.html[0]” and “category/foo.html[1]”.  Each segment is a block of content whereas the branch object is actually information about that branch.  This is a bit of a change from my last post on URL’s; it will mean that newly user-created pages are actually just segments (not first-class branches) attached to a pre-created branch object[1].

Finally, leaves are extra data that are not necessarily permanent and are S3 objects with a name based on the branch or segment, followed by a dollar sign and identifier.  For example, “category/foo.html$comments.xml” is a leaf called “comments.xml” attached to the “category/foo.html” branch.  You can see how I plan to attach comments!
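Telling the three kinds of object apart from the key alone might look like this in Python.  It assumes “$” and “[” never show up inside a branch URL, which would have to be part of the naming constraints; classify is a made-up name.

```python
import re

SEGMENT_RE = re.compile(r"^(?P<branch>.+)\[(?P<index>\d+)\]$")


def classify(key):
    """Return ("leaf", owner, name), ("segment", branch, index) or ("branch", key)."""
    owner, dollar, leaf_name = key.rpartition("$")
    if dollar:
        # The owner may be a branch or a segment; either way it is the part before "$".
        return ("leaf", owner, leaf_name)
    m = SEGMENT_RE.match(key)
    if m:
        return ("segment", m.group("branch"), int(m.group("index")))
    return ("branch", key)
```

For example, classify("category/foo.html$comments.xml") gives back the leaf name “comments.xml” and its owner “category/foo.html”.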

Rendering a page will follow these baby steps:

  1. GET the branch.
  2. If it is a pointer then issue 301 redirect.
  3. HEAD displayed segments (pagination may require only some segments being loaded).
  4. GET leaves specific to any server-side features (comments not initially part of this).
  5. Construct XHTML, remove hyperlinking on self-referencing URL’s.

Renaming a page will follow these steps:

  1. Create new branch object with “pending rename” status (contains old URL). [2]
  2. Change old branch object to be a pointer to the new branch.
  3. Move all segments and leaves (anything with same base URL followed by ‘[’ or ‘$’).
  4. Update new branch object to remove "pending rename" status.

Pages in the middle of being renamed cannot be modified at all.  It's possible the renaming process can be interrupted which is why a "pending rename" status is added and then removed when finished.  If "pending rename" is found when a page is loaded and it is old, then it is renewed (given a fresh timestamp) and started once again.  This process continues until all segments and leaves are moved.
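Here is a Python sketch of the four rename steps written so that running it a second time after an interruption simply finishes the job.  The store is a plain dict standing in for S3, the field names are invented, and the renewal of a stale “pending rename” timestamp is left out to keep it short.

```python
def rename(store, old, new):
    """Move a branch from old to new; safe to call again after an interruption."""
    # Step 1: create the new branch, marked pending and remembering the old URL.
    if new not in store:
        store[new] = dict(store[old], status="pending rename", old=old)
    # Step 2: the old branch becomes a pointer to the new one.
    store[old] = {"pointer": new}
    # Step 3: move everything hanging off the old base URL ("[" or "$" suffixes).
    for key in [k for k in store if k.startswith((old + "[", old + "$"))]:
        store[new + key[len(old):]] = store.pop(key)
    # Step 4: only now is the rename complete.
    store[new].pop("status", None)
    store[new].pop("old", None)
```

Because step 3 only looks for keys still sitting under the old URL, a restart picks up whatever was not moved the first time around.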

In conclusion, Forust is taking shape and its shape is made up of branches with segments and leaves.  With this strange analogy I can formulate a common class for interacting with it and back it locally with files or just data in memory for testing.  Yes yes, I said no automated testing, but that doesn't mean it won't help ad hoc testing!

[1] Abandoned branches (no segments) must be periodically pruned.

[2] The branch and pointer XML formats must be defined.  Branches must support statuses and possibly: text/url of parent, previous sibling, and next sibling.  Branch XML must not need to be updated each time a segment/leaf is added.


Tiny Workstation Win

May 23, 2009

One fact I forgot to mention yesterday: my workstation is/was my EEE 901 netbook! Somehow I’ve achieved enough proficiency to write some code on it in Ubuntu! My pride went pretty high when I actually checked in my first version of s3arch.php.  It doesn’t yet set the public-acl, do any error checking, or calculate MD5’s but it is functional beyond that.  Here are some tech notes about the experience:

  • The position of the arrow keys and right-shift on the EEE layout is ridiculous and still caused me some mishaps.  I’m considering VIM even more, given that the arrow keys can’t (I think) move the cursor when you go to edit something.
  • Copy and pasting from Firefox 3.0.1.0 into Gedit 2.26.1 did something to the PHP, introduced some characters which caused my script to fail with weird errors.  I literally spent around a half hour tracking down a problem that didn’t exist (e.g. typing the code out myself worked).  So if you see “Unexpected ‘{‘” after cut&paste, consider deleting all that and typing it yourself.  I should have suspected that when the resulting paste removed all line breaks.
  • PHP CLI arguments are not automatically parsed into an associative array like $_GET or $_POST, instead you get the boring ol’ C-style $argv and $argc.  Also, you must specify a double-dash on the command-line before writing script-specific arguments, otherwise PHP tries to interpret them.
  • The S3 class I’m using uses CURL which does not come with PHP CLI by default.  I actually had gotten PHP CLI by installing PEAR, so I didn’t know where or what I had (thankfully I knew enough to run “$ sudo updatedb” and “$ locate php”).  Then I couldn’t figure out how to get php-curl on Linux, and got confused by the pecl command which didn’t work for me.  I tried installing the entire PHP CLI through Synaptic Package Manager but it didn’t get me anywhere new.  Finally, some forum I found indicated I just needed to use a standard package install “$ sudo apt-get install php-curl”.  Duh.
  • Gedit is okay but the word jumping is really different from Windows (notepad2 / textpad / visual studio) and that throws me off.  I never realized how much of a ctrl+left/right junky I am!

So today …

  • Install S3 Fox to remove the garbage I uploaded and check on my results.
  • Do file MD5 and set public-acl.
  • Print out timestamp of existing object if skipping.
  • Add error checking … start by having any error cause the whole thing to stop.

Later …

  • Back port source to ObremSDK where it belongs.
  • Add phpDoc comments to source, that way the license shows up properly in Google Code.
  • What style of comments does Google parse for JavaScript licenses, same as phpDoc?

Alright, let’s get to work!


S3Arch now in PHP!

May 22, 2009

Pain and weird feelings of an RSI type in my arms prevented my last writing of S3Arch, but that excuse no longer holds, and now I’m going to write it in PHP.  Right now.  This post will serve as a light spec for it as well as another shameful reminder if I don’t do it.  The gist …

  • Launched from command-line, e.g. “php -f s3arch.php”.
  • CLI args: key, secret key, bucket, destination path.
  • public-acl set for anonymous HTTP GET access.
  • MD5 calculated for added defense against corruption.
  • Objects not created if they exist on the server.
  • Only files, not directories, are PUT as objects.
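The non-network half of that list can be sketched without touching S3 at all.  This is Python rather than the PHP the tool will be written in; plan_uploads and content_md5 are hypothetical names, and existing_keys stands for whatever a bucket listing returns.  S3 takes the checksum in the Content-MD5 header as the base64 of the raw digest, which is what the helper produces.

```python
import base64
import hashlib
import os


def content_md5(path):
    """Base64 of the binary MD5 digest, the form used for the Content-MD5 header."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def plan_uploads(source_dir, dest_prefix, existing_keys):
    """List (local_path, key, md5) for every file not already in the bucket."""
    plan = []
    for root, _dirs, files in os.walk(source_dir):
        for name in sorted(files):  # os.walk yields files only here, never directories
            local = os.path.join(root, name)
            rel = os.path.relpath(local, source_dir).replace(os.sep, "/")
            key = dest_prefix + rel
            if key in existing_keys:
                continue  # objects are not created if they already exist
            plan.append((local, key, content_md5(local)))
    return plan
```

The actual PUT with the public-read ACL would then loop over the plan using whichever S3 class ends up doing the job.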

The source path is based on the current directory; I might change that to a parameter later but it works well for now.  Some things to figure out:

  • Reading arguments in a PHP script; probably a super global, I’m guessing $ARGS
  • Creating MD5 for a file; does the S3 class do this for me?
  • Use old S3 class for doing the job
  • Determining if object already exists: obviously the PUT will fail, but prior to that I need to do a list on everything in the bucket to reference.
  • XML parsing, based on above factor.  Should I just use strpos() for the time being?

Alright, let’s get started!  One last note: I’m going to forgo any kind of consistent commenting so I can just get through this!


Forust URL Organization

May 22, 2009

Maintaining a consistent hierarchy is a huge pain, so I’ve decided to forgo it altogether.  Forust pages will have naming constraints, but they will not depend on a “parent URL”.  This also frees sections from having their status/type derived from the path.  All that said, the front-end will typically try to maintain pages within a parent URL, but it is not a requirement.

For example, the section “news/” may contain pages “news/today.html”, “2009-05-22/”, or even “2009/05/22/headline.html”.  Another reason for this is definitely to allow migration from other CMS or hierarchies.

Part of my inspiration comes from *nix where directories are simply files that are of a specific type.  In this way, a file can be linked from multiple places but its data is in only one place.  The data in Forust includes the URL, meaning a page will only have one final URL.

The world isn’t perfect, people change their minds, and things need to move.  “She wants to move … she sexy!“  And when you’re first writing something, it’s preferable to have it consistently saved instead of tossed into the void if you lose your internet connection.  Plus Forust promises the extra sugary topping of giving your brand-new page (that you wrote without logging in) a URL of its own.  That URL is generated for you and isn’t exactly pretty.  As things get better organized, it makes sense to update URL’s to match.

Would you prefer “www.forust.com/29050285902850982590820938.html” or “www.forust.com/funny/joke.html”?  The latter is just going to look better anyplace it shows up, but you don’t want to break the former either.  HTTP deals with this just fine using a 301 redirect (that is the “permanently moved” status code; 302 is the temporary one).  URL’s always remain, but the object data may be changed to a forwarding link.  Periodic tasks can update section data to use the latest content links, but all the old links will continue to work.

Let’s face it, broken links blow big hairy chunks of shit.  Even more annoying are those “This page has moved” notices which load all their crappy navigation before shoving you onto the next destination (that may or may not be the end).  I aim to make Forust output only the most minimal content for a 30X redirect, something most people will never see anyway.  An additional remedy is having it look forward (only if the objects are cached) and pass you straight to the end of the list.  Again, a scheduled task will periodically refresh forwarding links so they point to the latest thing.
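The look-forward idea is just a loop over cached forwarding links.  A Python sketch, where forwards is a dict of cached old-URL to new-URL entries and the hop limit is an arbitrary safety net I picked for the example:

```python
def final_url(url, forwards, max_hops=10):
    """Follow cached forwarding links to the end of the chain."""
    seen = {url}
    for _ in range(max_hops):
        target = forwards.get(url)
        if target is None or target in seen:
            return url  # end of the chain, or a loop we refuse to follow
        seen.add(target)
        url = target
    return url
```

If an object in the middle of the chain is not cached, the loop stops there and the browser just follows one more ordinary redirect.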

Hardening can be good; let that sink in a moment.  Wait, what are you thinking about?  I mean link hardening.  Anyway, after some point it becomes annoying to have links change (even if maintaining the new location).  For instance, the more a link appears as content and not just in the HREF attribute, the more it moves beyond the threshold of temporary and into the realm of permanence.  I’m a big believer in maintaining presence permanence as much as possible.  Thus, once a page has been moderated and categorized, it is petrified.  Yeah, more forest analogies … like that?

Administrators can trump petrified pages, but moderators cannot.

Maybe I should use “rooted” … categorized is the term I’ll use in public documentation at any rate.

Hey, come back to me now, it’s time for review.

  • New pages are given an auto-generated URL.
  • Sections are objects which contain lists of URL’s; not path/hierarchy-dependent.
  • Moderators give pages their final URL’s by categorizing.
  • Only Administrators can change URL’s after categorization.

There, that wasn’t so bad was it?  It isn’t until a later stage that I plan to tackle spammers and phishermen (Come to Papa Moon), but it’s worth mentioning a little here since I talk about creating pages without logging in.  The possibility for malicious and spammy pages is more than highly likely, it will be an immediate predicament.  Fortunately, there are aspects of spamming that can help me prevent it …

  • Uncategorized pages are view-throttled; only so many people can view them.  Get your page moderated and that goes away; small groups and families will probably not reach this limit.
  • Uncategorized pages have no links because no section links to them.
  • Current content rules disallow ANY links, but eventually (when that is allowed) new pages will print out JS-based proxy links that warn the user before visiting a link.
  • Users can flag any page as inappropriate, including new ones.  A new page with a high ratio of flags to views causes its content to be invisible when viewed unless the user clicks a button stating they understand the content is questionable.
  • Heuristics will auto-flag pages.
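The flag-ratio rule could be as small as the Python sketch below.  The threshold values are placeholders I made up for the example, not decided numbers, and treating “new” as “uncategorized” is also an assumption.

```python
def content_hidden(flags, views, categorized, max_ratio=0.2, min_views=10):
    """Hide an uncategorized page once its flag-to-view ratio gets too high."""
    if categorized or views < min_views:
        return False  # moderated pages are exempt; tiny samples prove nothing
    return flags / views > max_ratio
```

The minimum view count keeps one grumpy visitor from hiding a page that only three people have seen.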

I’m sure it will be a constant battle and learning experience, but a founding ideal of Forust is removing the need to login to create web pages.


Up Early Crash Early

May 22, 2009

I woke up around 2:00am this morning and crawled out of bed a half hour or so afterwards.  It’s not that I couldn’t sleep, it’s that I already slept.  I hit the hay around 7:30 so I guess going conscious around 2:00 was okay, because I’m still feeling alright (JINX!).

Anyway, I was actually very productive today, but not on Forust at all.  Laura and I established our assignments the last time we met up for the Pookie featurette, and I did mine today.  Additionally, thinking in the realm of audio/video, I wrangled FFmpeg to my will and got my willpower nearly destroyed by the terrible audio/video editors available through Ubuntu’s add/remove software.  Eventually I threw up my hands in a huff and rebooted into Windows 7 wherein I crammed my crusty copy of Vegas 6 and went to work.

The ideal time was 3:30 from a video over twelve minutes long, so it seemed a bit too pie in the sky.  It continued to look that way after I kept making incremental chops to the end parts until I skipped over a couple blobs of noisy discussion (high points had already been mentioned or would be mentioned afterwards) and simply removed them.  It still sounds like one contiguous discussion, but they arrive at the eurekas much quicker.  And it ended up being a mere 3:13 … hmm, I thought it was under 3 so that’s good.

Anyway, I’ll continue on S3 tomorrow, I doubt I’ll have time tonight what with sleep looming large on my horizon.  It was good to take a break, because after reviewing the Zend framework I became highly doubtful of PEAR and its pure CLI (no do-it-from-PHP features unless you count extracting everything yourself and constructing your own source directories).  Zend appears to be a giant collection of classes which tend to work with one another and has a lot of momentum.

Reading Brain Rules today reaffirmed the need to grab attention right away and re-grab within ten minutes.  Part of doing this is to start off with a big “gist” statement and then follow with the details.  PEAR’s documents feel like the reverse of this which may be why I couldn’t “get it”.  I ended up figuring out PEAR reading other people’s comments on using it which is a bit ridiculous.


FFmpeg on Ubuntu

May 21, 2009

I’m using Jaunty (Ubuntu 9.04) and wanted to figure out how to copy the audio track from a video into its own file, using FFmpeg.  First you’ll really want “libavcodec-unstripped” which gives you access to all of the A/V Codec Library’s capabilities (even restricted ones).  Theoretically you need the appropriate licenses, but I feel justified in using it considering how much A/V software I’ve bought (at one point I actually bought WinDVD for XP, it wasn’t pre-bundled).

$ sudo apt-get install libavcodec-unstripped-52

Next you can examine the content of your target video by passing it as input and not providing any output.  Little FFmpeg will complain it has your juice but no cup, but it still ponies up the information we want.

$ ffmpeg -i example.avi

Finally, if the audio format is already MP3 then you can copy that stream directly into a separate file.  This is fast and doesn’t degrade the quality at all:

$ ffmpeg -i example.avi -acodec copy example-audio.mp3

If the stream isn’t MP3 and that’s the format you want, use the LAME encoder to change it:

$ ffmpeg -i example.avi -acodec libmp3lame example-audio.mp3

It will detect the file format based on your output name’s extension (“mp3”).


Wednesday Webbing

May 21, 2009

I updated OwebPHP with the latest PHP and MySQL binaries.  I haven’t yet found a way to interact with PEAR from within PHP itself, but I’ll check out the manual tonight.  It appears to be more of a server-configured thing, something you need administrative access to do.  To do it manually would mean constructing the PEAR include structure yourself … I think.  I’ve read a bunch of the manual, but still not clear how the whole thing fits together.  It has not clicked for me.

Thursday ToDo’s:

  • Start s3arch.php, put it in the forust folder.
  • Get PEAR S3 module working … somehow
  • Use localpwd.php to load/save the AWS secret key; put the other values in a config.php.
  • Unzip PHP and put php.exe on the path for running s3arch.php
  • Follow PEAR coding standards.