Picture Maximums (PicMax)

November 4, 2009

Having a bit of inspiration after a productive day at the office, I decided to implement this small feature that’s been on my mind for weeks: restricting maximum picture dimensions and/or size during the preload process.  There are two big reasons: 1.) FTP push and 2.) I have a lot of super-high-quality image sources.  The Grace Park site is the first to take advantage of this, getting a maximum width of 400 (about 25% of the width of some of the sources!).

PicMax is a variable specified in the site text at the site level, e.g. at the very top.  I wanted it to follow this syntax:

picmax=WIDTHxHEIGHT,SIZE

I also wanted each one of those components to be optional.  You can specify all, one, some, or none and achieve the effect you’re looking for.  The aforementioned fansite simply specifies “400” which means “400 maximum width”.  However, what if you just want height?  Easy: “x400” (400 maximum height).  You see the ‘x’ and ‘,’ are delimiters that need to be present in order to specify what’s after them.  So to cap all pictures at ten thousand bytes, you’d say “,10000”.  Maximum width and size but no height: “400,10000”.  How do I achieve such magic?

if (preg_match('/^([0-9]+)?x?([0-9]+)?,?([0-9]+)?$/i', $picmax, $m))

Damn I love regular expressions!  Now, to retain your sanity you can specify a zero in whatever spot you don’t want filled in and the code will operate the same.  When I first approached the parsing I wondered how I’d structure the pattern, but it turns out just making everything optional solved it.  Crazy!
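Here is a minimal sketch of how that pattern could be turned into usable values. The function name parsePicMax and the returned array keys are made up for illustration (they are not FanSiter's actual code); the only part taken from above is the regular expression, plus the rule that a zero or a missing part means "no limit".

```php
<?php
// Hypothetical helper: parse "WIDTHxHEIGHT,SIZE" where every part is optional.
// Returns null when the string does not fit the syntax at all.
function parsePicMax(string $picmax): ?array
{
    if (!preg_match('/^([0-9]+)?x?([0-9]+)?,?([0-9]+)?$/i', trim($picmax), $m)) {
        return null;
    }
    // Unmatched groups come back as '' (middle) or are missing (trailing);
    // both cases collapse to 0, which is treated as "no limit".
    return [
        'width'  => (int) ($m[1] ?? 0),
        'height' => (int) ($m[2] ?? 0),
        'size'   => (int) ($m[3] ?? 0),
    ];
}

foreach (['400', 'x400', ',10000', '400,10000', '400x0,0'] as $example) {
    echo $example, ' => ', json_encode(parsePicMax($example)), "\n";
}
```

Because everything in the pattern is optional, an empty string also parses (to all zeros), which conveniently means "no restrictions".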

When pictures are resized for scale, their maximum file size is capped at their previous file size.  This prevents the unintentional effect of shrinking one and having it grow on disk (the opposite of the desired effect).  Usually resizing dimensions is enough to squish the byte size, but it’s difficult to know what to set the JPG quality value to and re-saving any lossy format is an easy way towards bloat.
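The arithmetic behind those two rules can be sketched roughly like this. Both function names are hypothetical, and preserving the aspect ratio and never enlarging a picture are my assumptions here rather than anything spelled out above; the actual image re-encoding is left out entirely.

```php
<?php
// Hypothetical helper: scale ($w, $h) down to fit optional limits (0 = no limit).
// Assumes the aspect ratio is preserved and pictures are never enlarged.
function fitWithin(int $w, int $h, int $maxW = 0, int $maxH = 0): array
{
    $scale = 1.0;
    if ($maxW > 0 && $w > $maxW) {
        $scale = min($scale, $maxW / $w);
    }
    if ($maxH > 0 && $h > $maxH) {
        $scale = min($scale, $maxH / $h);
    }
    return [max(1, (int) round($w * $scale)), max(1, (int) round($h * $scale))];
}

// The size rule from the text: a resized picture may never exceed its
// previous file size, whatever the configured byte cap says.
function effectiveByteCap(int $originalBytes, int $maxBytes = 0): int
{
    return $maxBytes > 0 ? min($maxBytes, $originalBytes) : $originalBytes;
}

echo json_encode(fitWithin(1600, 1200, 400)), "\n";
echo effectiveByteCap(50000, 10000), "\n";
```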


Delicious but Bitter

October 16, 2009

A simple desire to add a link along with a description led me down a completely unexpected path.  However, I did achieve the result I was looking for …

Finally, descriptions on the links page ...

Initially, I thought most of the code was there, I just had to do some switcheroos.  The “lnk” object started as a bit of a bastard and I used the HTML content (“alt” variable) for the link text.  My plan was to allow setting of a “title” variable and then use that as the link text if it existed.  Any kind of weird automagic decision like this in code is not really a great idea, and after futzing with it in various places I realized the mess would have to be lived with or there would be worse consequences.  Rather, I allowed for a “description” variable, and that satisfied me.  In my testing, I noticed some of the sites weren’t appearing in the Plat and that got me wondering about the whole site count in general.

It turns out that the call I was using in Zend’s Delicious object to getRecentPosts() can only get up to 100 as per the API.  Since I had looked up the raw protocol answer myself, I decided it was high time to just call it myself using CurlHttpRequest.  Rather than copying and pasting the URL, I typed it in: bad mistake.  I mistyped the ol’ delicious host “del.icio.us” as “deli.cio.us”.  Even though I know that to be wrong, given how much I’ve typed it in over the years, I still didn’t spot the problem.  This led me on a merry chase for a long while.

Programmers like to argue, but I bet we can agree that the worst challenges are not the hard or complicated ones.  They’re the “this should not be happening, it doesn’t make sense” ones.  If you’ve ever heard one of us say “that’s impossible” then you’ve witnessed the reaction to it.  In my case, it was possible, and came down to a stupid typo which is probably the source of most of these and probably the driving reason behind constants and #define’s.

Within PHP cURL, the user name and password are set as a single string with curl_setopt(…, CURLOPT_USERPWD, …).  I know this and initially when I was testing with hard-coded values, I called it properly.  After refactoring into a new Delicious class, however, I kept getting 401 errors.  I just about tore my hair out wondering if the ordering was suspect (blaming cURL).  Of course, I was calling my new function with two parameters instead of one.  Stupid, stupid, stupid.  So I updated the function to allow both styles, because I know I’ll make that mistake again later; I figure I’d formalize it now.
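Allowing both calling styles can be as small as the sketch below. The name formatUserPwd is hypothetical and not the method from my Delicious class; the only hard fact it relies on is that CURLOPT_USERPWD expects a single "user:password" string.

```php
<?php
// Hypothetical helper: accept either one "user:password" string
// or a separate user name and password, and return the single string
// that CURLOPT_USERPWD expects.
function formatUserPwd(string $userOrPair, ?string $password = null): string
{
    return $password === null ? $userOrPair : $userOrPair . ':' . $password;
}

// Usage with a cURL handle (left as a comment so the sketch runs without the extension):
// curl_setopt($ch, CURLOPT_USERPWD, formatUserPwd('someuser', 'somepassword'));

echo formatUserPwd('someuser:somepassword'), "\n";
echo formatUserPwd('someuser', 'somepassword'), "\n";
```

One caveat with this approach: a user name that itself contains a colon would be ambiguous in the one-argument style, so the two-argument form is the safer one to call.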

Next issue encountered: Delicious XML.  You can put line breaks in the description of a link which is placed into an attribute called “extended” in the XML API and which Zend calls “notes”.  Oh and the title is in a “description” attribute (legacy?).  Anyway, XML allows line breaks in attribute values but it also normalizes them (read: converts them to spaces).  I was relying on the ability to have breaks and now I wasn’t getting any.  For some reason Zend’s object is able to parse these out which makes me think they’re using their own XML parser (maybe SimpleXml allows this?) with the non-standard behavior of leaving line breaks alone.  I pondered my options for a long while before figuring out a nifty regular expression to re-insert line breaks just where I expect them.  This, I decided, was better than suddenly losing line breaks later when Yahoo normalizes their data internally and also writing special XML parsing to grab them.
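The normalization itself is easy to reproduce. This is only a demonstration of the XML rule with SimpleXML, assuming the feed delivers literal newlines inside the attribute; the element and attribute names mimic the Delicious format, the sample values are invented, and my actual repair regex is not shown.

```php
<?php
// Literal newline inside the attribute value: a conforming parser
// normalizes it to a space (XML attribute-value normalization).
$literal = simplexml_load_string("<post extended=\"line one\nline two\"/>");

// Newline written as a character reference: it is kept as a real newline.
$escaped = simplexml_load_string('<post extended="line one&#10;line two"/>');

$literalValue = (string) $literal['extended'];
$escapedValue = (string) $escaped['extended'];

echo json_encode($literalValue), "\n";
echo json_encode($escapedValue), "\n";
```

So whether the breaks survive depends on how the producer serialized them, which is exactly why I didn’t want to depend on the parser.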

Whew!


FanSiter Business Cards

October 8, 2009

I let myself go completely overboard on a rant against eHow today.  It doesn’t matter whether it was justified or not (the few comments I got on Facebook and over IM let me know I was being hateful and whiny), because it felt good.  And there is a relief in having cut my account off permanently and removed all my articles: they can no longer make money off of abusing me and my work.  Argh, see I’m talking about it again!  Let’s just cut to this, shall we?

Throwing the camera a cheesy self-shot for our new business cards.

The cards were created at VistaPrint, which has the most arduous upsell passageway I’ve seen for a reputable company.  That is, I used them based on reputation alone, and got two sets of 250 (one for me and one for our writer Stephanie) for twenty bucks each.  Thus a total of $40, then another $15 for shipping with a moderate “rush” directive (versus like $9 without it).

They look great and the watermark fan is so faintly visible that you have to really search for it.  I love it!


Delicious Database

October 3, 2009

All the data sources for FanSiter need only specify title, body, and tags.  In the case of Google Sites the URL itself is the only tagging (e.g. the site slug).  Otherwise both Blogger and Delicious fit nicely as inputs to the system, but my process for the latter was precarious and untested.  I haven’t checked in the code for this yet, but now you can specify pieces of pages using Delicious and it will either create the page if it doesn’t exist or add to it if it does.  Take a look at this bookmark for Kristen Wiig:

Delicious bookmark specifies data to be embedded on a particular page

This adds a picture from someone’s blog to the glam.html page.  I should note here that the image is downloaded and served from the fansite so as not to hotlink and it is linked back to the source page.

Fansite results for embedding

Using Delicious as a database is nothing new, but here it becomes an easy way to add things I find while browsing.  It’s by no means the primary method of adding content, but works surprisingly well for supplemental stuff.  Also, you may note I mostly mark the bookmarks private simply to stay on the good side of their T.O.S.


Colbie Caillat on FanSiter

October 1, 2009

After a marathon session of textual transcription and tiny tweaks in FanSiter’s capabilities, my Colbie Caillat Fansite is now fully migrated!

Colbie.org running on FanSiter (shown in Opera 10)

There is goofiness here and there with regard to the data and the template is very crusty, but on the whole I’m happy with the result.  At first I set out to redesign the underlying HTML but then realized the folly of that after procrastinating for so long.  Once I simply copied and pasted the old HTML into a new FSLT (frickin’ awesome), things progressed quickly.  I only started migrating yesterday evening, and half-assed at that, and now I’m done.  Now I can cross that off my list!

Another aspect holding me back from doing this was the forums.  I was going to simply scrap them, but there are some people who enjoy them.  Thus I added “fallback” functionality, something like a “missing files host”.  So if you happen upon an old URL based on the forums structure, it will simply forward you along to http://colbie.freeforums.org/.

I’d write more, but I’m exhausted and going to go reward myself with coffee.


Consuming Feeds

September 23, 2009

Today I forced myself to add another feed consumption feature: populating a page’s contents. In particular I had found a District 9 press kit photostream which made me want to be able to incorporate all the entries as a single page. The challenge is how to chop them up into discrete page objects. I tried several approaches and ended up, for now, just letting my generic HTML parser (developed for the Blogger Firehose) attack the entry contents node:

Sharlto Copley's site has a page based on a Flickr photostream feed

One of my attempts flattened the contents HTML into text, used it as a description, and then looked for enclosure links to use as images. That worked great except Flickr’s enclosure images are the originals and in this case they were huge! Seriously, at over three megabytes a pop, I had to decide whether to shrink images after downloading them or just grab the smaller images. I hastily chose the latter (abandoning the enclosures), but I’d like to point out this doesn’t really affect the feed source (e.g. Flickr), because the picture caching system will only ever download a URL once until manually cleared.

The template Sharlto’s site uses now also dynamically generates the categories on the left whereas previously it was fixed to just “Pictures” and “Videos”. That’s why the news is in the middle, because they’re alphabetically sorted. I’m not sure it’s worthwhile to fix that, but it irks me (generated content should go at the bottom, right?). I had intended to embed the news in his index page, which is possible with the new feed system, but it looks like ass (Google’s News RSS uses fixed HTML and it’s against their T.O.S. to modify it at all). Another option would be to have a feature which embeds a category listing into the body of a page … something I’m curious to try, but it is actually a large architectural addition. I’m trying to keep the count of object types to an absolute minimum.

There are also two other small modifications to his site; can you tell? First, menu items hide their overflow, which was necessary for those arbitrary news titles. Secondly, only 5 items in any category are shown and then underneath is a more link which takes you to that particular category’s page. That one has been on my to-do for a long time and obviously wasn’t that difficult. Having it means I can add stuff more readily and not worry about the page expanding into an unmanageable pile of links.
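The "5 items plus a more link" rule boils down to something like the following sketch. The function name limitWithMore, the array shape, and the sample data are all hypothetical, and showing the more link only when a category overflows is an assumption on my part.

```php
<?php
// Hypothetical helper: keep the first $limit items of a category and
// report where the "more" link should point (null if nothing was cut off).
function limitWithMore(array $items, string $categoryUrl, int $limit = 5): array
{
    return [
        'items' => array_slice($items, 0, $limit),
        'more'  => count($items) > $limit ? $categoryUrl : null,
    ];
}

// Made-up sample data for illustration.
$news = ['Story 1', 'Story 2', 'Story 3', 'Story 4', 'Story 5', 'Story 6', 'Story 7'];
$menu = limitWithMore($news, 'news.html');

echo implode(', ', $menu['items']), "\n";
echo $menu['more'] === null ? 'no more link' : 'more: ' . $menu['more'], "\n";
```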


FanSiter supports Fart.Go

September 21, 2009

I created Fart.Go in July 2007 as part of an SEO experiment to develop a niche website. Initially it started as poopy.info, which is why the logo ALT is still “I See Poopy”, but I had trouble getting Google to index it. The problem probably had something to do with early 404’s and 500’s, but I blamed the TLD and went to register a new domain. I wanted something that included the word fart and ended up reversing my favorite (Go Farts) resulting in fartgo.com. It’s an awkward name, but near-impossible to misspell, and being different is more a boon for the obscure than a bomb in my opinion.

By all counts it is a successful website. I spent a weekend collecting 50 pieces of content and organizing them in a very specific structure. Its initial host was Blogger, but the blog format just doesn’t suit a small, “tight” collection of categorized pages. Thus a JavaScript to run on WSH was born to generate the HTML which I then manually uploaded (some URL’s, you may notice, still use a blogspot structure of YYYY/MM/name.html). At the time I thought the process was painless enough, but later it became clear just how wrong I was based on how infrequently I updated the thing.

Fart.Go simmered in its own stinky stench with nary a helping hand to waft it into greater spaces for nearly two years. It accrued a meager few updates, literally two or three. It floated to the top of Google for certain phrases and traffic steadily climbed to about 200 unique visitors a day. And finally it made about a buck a day on AdSense. I never implemented even a fraction of the fantastical features planned and yet mere age, stability, and URL permanence solidified its success. Imagine if I could make it better, or is meddling bound to make it worse?

We shall see! In the first week of August, over a month and a half ago, I pondered the ability for FanSiter to support Fart.Go on its platform. I reasoned they shared enough common capability to warrant a bit of effort to implement any additional code necessary. This is back when FanSiter2’s spaghetti shop ran the show and FanSiter3 existed locally in mostly-untested, fragmented form. Thus only last week with the advent of the Blogger Firehose did I set out to complete this task.

Fart.Go (via FanSiter platform) in Firefox 3.X on an EEE 901 netbook.

Talk about underestimating requirements! Tons of programming and manual conversion later, it’s finally online. The only thing that kept me going was my own stubbornness and that feeling of “I’ve already come this far …”. Before I talk about some of the new engine capabilities, let’s look at a couple of the new things about the site output itself:

  • FavIcon is transparent: Whatever I used to create the original ICO from my GIF did a very poor job and I shrugged my shoulders at the result, despite shuddering every time I saw the white boxed poop in the browser. This time I did some simple GIMP edits to remove the more conspicuous anti-aliasing artifacts and then ran convert in Ubuntu (wonderful utility BTW). Voila! It turned out so well that I also used it for list items!
  • Newest content list actually works. Previously it generated the list of 4 or so and then popped off drafts. Thus you’d only see maybe one item since I had a lot of drafts taking up space. That list also shows their publish date, which is hidden elsewhere, to indicate freshness. Yes, that is a deliberate decision to encourage updates!
  • Next post arrow based on date. I don’t remember what the » link went to before, but now it goes to the next, older item by date. Thus when you read the newest page, you can click that to go to one older, and on and on regardless of what categories those pages are in.

Internally I expanded the site-text parsing capabilities to encompass near-arbitrary HTML. FanSiter3 divides page items into discrete objects that then have a type: picture, video, paragraph of text, link, etc. Generally this is done by splitting by line and parsing each of those individually. Now there is a simple buffer between lines of HTML content which allows tags to span multiple lines whereas before these would be repaired individually.

On the rendering side there are now type and block wrappers allowing specific tags surrounding individual page objects and collections of those of the same type. Fart.Go’s category pages require this since it uses an HTML list (<ul>). They also have different title structures since the site name and category are displayed at the top (more like a blog) and the page title is not connected. By the same token, it uses different tags for each category page and their URI’s are /category/ not /category.html. Plus there’s the complication of directory redirects (/games => /games/). Whew! Liberal formatting capabilities and use of vsprintf() are now customization options for templates and sites.
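To give a feel for the wrapper idea, here is a rough sketch using vsprintf(). The function name renderBlock, the two format strings, and the sample links are invented for illustration; the real FanSiter templates may carry more arguments and options than this.

```php
<?php
// Hypothetical renderer: $typeWrapper formats each page object,
// $blockWrapper surrounds the whole run of same-typed objects.
function renderBlock(array $items, string $typeWrapper, string $blockWrapper): string
{
    $inner = '';
    foreach ($items as $args) {
        // Each item is the argument list for the per-object format string.
        $inner .= vsprintf($typeWrapper, array_map('htmlspecialchars', $args));
    }
    return vsprintf($blockWrapper, [$inner]);
}

$links = [['/games/', 'Games'], ['/jokes/', 'Jokes']];
$html  = renderBlock($links, '<li><a href="%s">%s</a></li>', '<ul>%s</ul>');

echo $html, "\n";
```

Since the wrappers are plain format strings, a template or site can swap an HTML list for paragraphs or table rows without touching the rendering code.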

Finally, because I’m tiring myself writing this, I introduced the Blogger Firehose which allows one to utilize the UI and layout of a blog to write and post content for multiple sites. Fart.Go needed this because it’s already pushing the limits of how much I can stuff into a single Google Sites page reliably (Firefox on my netbook can barely handle all those DIV’s and BR’s) and I want to be able to recruit a writer or writers without overloading them with unnecessary internals knowledge.

It works reasonably well by parsing the actual HTML output and configuring various settings to discourage people from visiting the raw blogspot and/or (more importantly) linking to it. One issue I’m still working out is the PHP DOM creating empty tags from ones which need to be self-closed (e.g. <img></img>). Pictures and videos are parsed out assuming they meet certain conditions and relevant page objects are created. Other HTML becomes text paragraph objects and tags can be used to set variables like category. Oh yeah, and the title can specify the URI, which has the useful side effect of letting multiple posts add to the same page!

It’s all very cool, but I need to get back to expanding and testing it. Next up is some more Delicious parsing for objects/pages and working with the media namespace in RSS for photostreams.


Fart.Go goes to shit

September 21, 2009

The infamous Fart.Go is now fully on the FanSiter platform. Porting forced me to introduce a bevy of features including the super-awesome, heart-attack-inducing “Attempt to delete all files every 5 minutes”. Yikes!

Yesterday I pushed a metric shit ton of new code live containing the aforementioned bug. The first mysterious moment occurred when testing the new Blogger Firehose (more on that later): blogg3.php suddenly couldn’t be found and everything began burping 404 errors. Now, I discounted the possibility of programming fault and chalked it up to a user error, because dragging from an FTP folder in Nautilus (Ubuntu) will remove it from the server (and mess up the local permissions). I just assumed I had accidentally dragged the wrong direction between windows.

After successfully completing a smoke test, I ate some lunch and relaxed to some ambient “space music”. Following that, in a paranoid check (the only thing that saves me from my own idiocy), I encountered the same problem! Weirdly, and this didn’t tip me off to the seriousness, it gave me the 404 even moments after re-uploading the file. Fuzzy memories prevent me from forming a proper excuse as to why I didn’t go on a surgical journey through the entire source and logs. Stupidity and exhaustion, I suppose (it took hours across multiple days to snip, paste, and restructure the site data into the new CMS format).

One of our cats, Princess Kitty, woke me up with incessant yowling this morning before 6. I had gone to bed past 1. I was not amused. After getting up *sniff*, and pulling myself together, I decided on a whim to check the site. 404! WTF! WTH! Only 15 minutes until my bus arrived and I logged onto FTP to find all but a couple PHP files completely gone.

Oh shit oh shit oh shit oh shit … I’ve been hacked! No, that’s silly, why would they delete some and why wouldn’t they just leave my stuff alone while simultaneously hosting their own stuff? Yeah, not hacked, definitely programmer error.

I spent the bus ride jotting notes on where to look, what functions to scrutinize, and how to scour the logs. After getting set up at Tully’s, however, it turned out to be very simple. One of the very first emails I pulled up from my CRON script (MediaTemple automatically emails you CRON output, so my scripts only output if there are errors) showed it trying to delete literally everything off the system. Just above that …

Warning: Missing argument 1 for DelTree(), called in /nfs/c03/h03/mnt/55261/domains/fansiter.com/html/templates/template1.php on line 13 and defined in /nfs/c03/h03/mnt/55261/domains/fansiter.com/inc/fileio.php on line 4

Notice: Undefined variable: dir in /nfs/c03/h03/mnt/55261/domains/fansiter.com/inc/fileio.php on line 7

Notice: Undefined variable: dir in /nfs/c03/h03/mnt/55261/domains/fansiter.com/inc/fileio.php on line 8

Urgh! Okay, this is totally my fault: the call was being made without a directory being specified in one of the templates. However, what kills me is that PHP issues a warning and continues. It reminds me of VB’s ol’ ON ERROR RESUME NEXT bullshit.

I added two things to the function, a pattern that is starting to become common in all my PHP methods (and here’s my recommendation to others working on this haphazard platform): an isset($dir) check for the directory parameter, and an immediate return on the first unlink() failure unless a $force parameter is set to true. Brittle is better, especially when you’re too lazy to properly test.
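A guarded delete along those lines might look like the sketch below. This is not the actual DelTree() from fileio.php; the name safeDelTree, the recursion, and the choice to remove the directory itself at the end are assumptions made for the example. The two safeguards are the ones described above.

```php
<?php
// Hypothetical guarded recursive delete.
// Safeguard 1: refuse to do anything without an explicit, existing directory.
// Safeguard 2: stop at the first failed delete unless $force is true.
function safeDelTree(?string $dir = null, bool $force = false): bool
{
    if (!isset($dir) || $dir === '' || !is_dir($dir)) {
        return false;
    }
    $ok = true;
    foreach (scandir($dir) as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue;
        }
        $path = $dir . DIRECTORY_SEPARATOR . $entry;
        $removed = (is_dir($path) && !is_link($path))
            ? safeDelTree($path, $force)
            : @unlink($path);
        if (!$removed) {
            if (!$force) {
                return false; // brittle on purpose: bail out immediately
            }
            $ok = false;      // forced mode: remember the failure, keep going
        }
    }
    return @rmdir($dir) && $ok;
}

// A call with no argument is now a harmless no-op instead of a disaster.
var_dump(safeDelTree());
```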


FanSiter Landing Pages

September 12, 2009

It’s been a slow week for improvements, but I finally knocked out a feature I’m calling Landing Pages which also happens to utilize the first RSS feed code.

Good Eats site demonstrates a Landing Page using a new FSLT: "The Green House"

These are something like parking pages, e.g. “AdSense for Domains”, because they are place-holders of sites-to-be.  That is, they represent a serious intent to flesh out content.  Currently, the FanSiter blog is linking to third-levels based on the post and prior to this feature those links went to 404 pages.  Now a user is given a generated site relevant to the thing they clicked on.

Most of the content comes from Google and Bing through RSS feeds.  These feeds become category pages which allow all templates to function normally as long as they can handle arbitrary categories (as opposed to the fixed “Photos” and “Videos” of some of the originals).  The next feature for RSS feeds will be integrating them on an existing page via an RSS “object”.  One of my ideas is attaching a particular picture page to a Flickr photostream feed.  Then I can show the first N images ending with a link to the entire photostream (and each pic will be linked back to its Flickr source page of course).

Oh, and it’s important to note that there are no ads on FanSiter landing pages.  This is very much intentional.  If the content is basically a glorified mash-up (ugh, using that word stings) from existing web services, then it is certainly against someone’s TOS to use ads to monetize it.  Plus, I think parking pages are one of the great internet scourges, and I honestly want to provide some utility with these beyond maintaining URL permanence.

That said, I do plan on adding an Amazon product block under the body text. ;)

The FanMake3 component which generates the web output is starting to get a wee bit messy, because of its manipulation in FSLT.  I am thinking about a bit of refactoring before adding too many more templates.  I’d like to standardize certain code bits across the board like chopping / abbreviating links, sanitizing titles, etc.  Currently it’s all a big spaghetti pile and the functionality is nice, but it’s hard to keep track of the complexity.

Of course, the web (and life) is messy, so maybe it deserves to be left alone.


Feed me FanSiter

September 11, 2009

I’ve been poring over Google’s AJAX API documentation for more than an hour and have come to the conclusion that their neutered RSS feeds are easier to deal with (though I do like the little news widget, and I will probably end up using that in place of ads on un-scrubbed fansites). My issue is all of the weird functions and controls just for listing a set of links based on a query.

I realize their results are essentially proprietary, and they provide a JSON API, but I just want to suck it up ahead of time on PHP and regenerate periodically. Also, this will go on ad-less pages (e.g. landing sites which represent the intent of a fansite before it exists). I really want these to be useful to visitors so my links to them aren’t tainted.

It must be noted that their Blog and News search services provide RSS feeds, and those I plan to use. Bing’s web search is mostly excellent, so I’ll mash those three things together. Maybe I can tap Yahoo for images? Then that leaves Video. In any case, I realized that I need RSS integration first and foremost because it enables me to create (and I shudder calling it this) “mashup” landing pages which provide the best of the titans’ information.

What I plan to do for the landing pages is have them set certain categories to RSS feeds. Like “News by Google”, “Images from Yahoo”, etc. These then get listed as normal in a template, but lead the user offsite. Those category pages themselves will actually be visit-able and have further descriptions of the links. The main page itself will simply explain the fact that this is a landing page (e.g. rather than a “Biography”) that is automatically generated. The landing pages will slowly be replaced, of course, but in the meantime this should fill the gaps nicely.

Finally, this allows me to keep those nifty next/previous footer links which let crawlers find all the fansites (albeit slowly). Alright, so it’s RSS time. Looks like the Zend Framework provides a decent API for working with feeds.