Feed Me

November 14, 2009

Wow!  I am typing this into the “Visual” tab on WordPress in Firefox 3.5 and dang, it is noticeably faster.  To clarify, I installed Ubuntu 9.10 (Karmic Koala) on my Eee PC 901 and thus have gone from Firefox 3.0.x to 3.5.x.  Browsing seemed snappier, but now I know it’s orders of magnitude better based on how this is performing.  Previously I could practically watch the characters appear; now they feel like they’re there before I finish hitting the keys.  This is awesome!

I’m also trying Adblock Plus instead of NoScript to see how well it fares.  NoScript is great, don’t get me wrong, but sometimes it takes me a moment to notice I’m missing something on a web page.  Some sites require JS for form submissions and things and you won’t know until clicking a button does nothing.  So I’m hoping ABP will use up fewer of my brain cells.

It’s the end of another week, and this one kinda fell off the edge at my contracting job.  That is, I started to run out of things that needed doing and didn’t feel like wasting my time and their money by sitting in a cubicle sopping up their cash.  Instead I mostly worked half-days and played Dragon Age or read The Gathering Storm in the evenings.  No programming, but I have been adding posts to Bikini BOP here and there (always a fun activity).

What I’ve been meaning to do, as a smallish activity, is rip out the last of the PHP code I have that requires Zend Framework, specifically the Feed classes.  In true perfectionist fashion my biggest stumbling block has been what to name the class and its file.  Dumb, right?  Well, I started with XmlFeed (xmlfeed.php) but that sounded too generic and you wouldn’t necessarily know what it does at a glance, so I’ve switched to Rss (rss.php).  Too obvious?  Oh well, I tried.  It will read Atom feeds to an extent as well, but it will basically boil them down to the same components as RSS.

Another quirk: since I’m writing this for myself, I found myself designing all the property names (these will be array keys) as 4 characters.  What’s that called?  Not OCD, well not really.  My mind was more in a puzzle-solving and creative mood: how can I make these mostly apparent but keep them to 4 characters?  It’s a little crazy too, I admit, but here’s what I came up with:

  • name: <title>  This one isn’t great, but it still makes sense.
  • uuid: <guid>  I could have gone with the latter, but other parts of my code (GData class) had already used UUID and I thought I’d keep it the same.  In fact, I copied all these except “from” from their usage in my GData class.
  • body: <description>  Following a sort of “email message” motif, huh?
  • date: <pubDate>  I’m working exclusively with published dates so all others are ignored.
  • href: <link>  Specifically the HYPERTEXT reference (e.g. link to a web page).  For Atom feeds this will find that HTML link which for my purposes is what I always want.
  • tags: <category>  Nothing much to say here except that I drop all the fluff around categories and just keep their names.
  • from: <author>  This is my favourite one!  I didn’t want to use “auth” for fear it would be confused with “authorization” or “authentication” (as I commonly use that abbreviation).  Since a name and/or email address goes in this field, it continues that email motif I mentioned earlier.
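Here is a rough sketch of how that mapping might look for a single RSS 2.0 item, using SimpleXML.  This is not the actual Rss class: rss_item_to_array() and the sample feed are made up for illustration, and Atom handling is left out entirely.

```php
<?php
// Sketch only: map one RSS 2.0 <item> onto the four-character keys above.
// rss_item_to_array() is a hypothetical name, not the real Rss class.
function rss_item_to_array(SimpleXMLElement $item)
{
    $tags = array();
    foreach ($item->category as $category) {
        $tags[] = trim((string) $category);   // keep only the category name
    }
    return array(
        'name' => (string) $item->title,
        'uuid' => (string) $item->guid,
        'body' => (string) $item->description,
        'date' => strtotime((string) $item->pubDate), // Unix timestamp, false if unparsable
        'href' => (string) $item->link,
        'tags' => $tags,
        'from' => (string) $item->author,
    );
}

// Made-up sample feed, just enough to exercise every key.
$xml = <<<XML
<rss version="2.0"><channel><title>Example</title>
<item>
  <title>Feed Me</title>
  <guid>http://example.com/?p=1</guid>
  <description>Hello</description>
  <pubDate>Sat, 14 Nov 2009 12:00:00 +0000</pubDate>
  <link>http://example.com/feed-me</link>
  <category>php</category><category>rss</category>
  <author>neil@example.com (Neil)</author>
</item>
</channel></rss>
XML;

$feed = simplexml_load_string($xml);
foreach ($feed->channel->item as $item) {
    print_r(rss_item_to_array($item));
}
```

Keeping “tags” as a plain array of names matches the “drop all the fluff” idea; everything else is cast to a string so callers never touch SimpleXML objects.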

So as soon as I have the urge, I’ll be plopping this into ObremSDK and getting FanSiter running on this netbook where I don’t have (and didn’t want to download) ZF.


CSS Insert Content

October 31, 2009

On the FanSiter blog I wanted to take advantage of the footer area which, for WordPress and Google Sites alike, is completely off limits to hosted customers.  Luckily for users, and probably a pain for admins, CSS allows us to generate and replace content (generated content via :before and :after dates back to CSS 2.1; CSS3 builds on it).

A disclaimer before we begin: don’t abuse this for spammy purposes and don’t rely on it for indexing within search engines like Google (generated content probably isn’t indexed and won’t be for a long time).  This is a purely cosmetic modification to your pages.  For the purposes of SEO that is fantastic, because it also keeps redundant content from taking up space, showing up in search engines, etc.  It is relatively easy for service administrators to detect and I have no doubt a ban is imminent if you try to use this to avoid their normal content filters / rules.  WordPress, for example, already automatically removes HTML tags so you can’t use it for links, images, etc.; just text.

.footer_content:before {
    content: "Created by Neil C. Obremski • ";
}

That’s all I had to add to my custom CSS in WordPress. What this says is: go inside any tag using the “footer_content” class and insert the content string at the beginning. Let’s see what that does …

CSS inserted text

CSS3 content doesn't show up in Firebug

It works pretty dang well and it shows up in Chrome (and therefore Safari I’m sure), IE8, and Firefox 3.5; i.e. the modern trifecta.  I didn’t try in older browsers, but if it doesn’t show up then it’s not a big deal.  I wanted to start with something rather mundane and actually I tried linking my name to my website, which is when I found out that WP strips out HTML tags.  It’s possible I could find some way around that, but I don’t want to break their trust (and TOS) and lose my blog.

And as noted in the caption, the CSS-generated content does not show up in Firebug, which can provide a bit of mystery to web developers when they’re trying to track down a bug.  This made me think it might be a funny practical joke to play on a designer: use it to insert some rogue content and watch while they freak out trying to figure out where it’s coming from.  In order to hide from file searching functionality, you could use Unicode escaping. :)
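For the curious, here is one way the Unicode escaping could be generated.  CSS accepts a backslash followed by up to six hex digits as a character escape, so a small helper can turn readable text into something a plain text search won’t match.  css_escape_text() is a name I made up for this sketch, and it assumes the input is UTF-8 and that the iconv extension is available.

```php
<?php
// Hypothetical helper: turn UTF-8 text into CSS escape sequences so that
// searching the stylesheet for the visible words finds nothing.
function css_escape_text($text)
{
    $out = '';
    // Split into whole UTF-8 characters (the "u" modifier makes PCRE UTF-8 aware).
    foreach (preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        // Get the code point by converting the character to UCS-4 big-endian.
        $code = unpack('N', iconv('UTF-8', 'UCS-4BE', $ch));
        // Always emit six hex digits so no separating space is needed.
        $out .= sprintf('\\%06X', $code[1]);
    }
    return $out;
}

// "\xE2\x80\xA2" is the UTF-8 encoding of the bullet character (U+2022).
echo '.footer_content:before { content:"'
   . css_escape_text("Hi \xE2\x80\xA2") . '"; }', "\n";
```

Padding every escape to six digits is a deliberate choice: shorter escapes need a trailing space whenever the next character could be read as a hex digit.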


Linux Rename using Modified Date

October 22, 2009

I know I’m going to forget how I did this again, especially if I lose the following script I wrote, so I’m posting it here.  Here’s the gist: I have a directory full of files and I want to copy them to a new location while also changing them to lower-case, using a different extension (MOD => mpg), and finally including the Last Modified date in the name (YYYYMMDD-hhmmss).

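#!/bin/bash
# Usage: movcp SOURCE_DIR DEST_DIR
for file in "$1"/*.MOD
do
   # Last-modified time as printed by stat, then reformatted for the file name
   modifdate=$(stat -c %y "$file")
   formatdate=$(date -d "$modifdate" "+%Y%m%d-%H%M%S")
   echo "cp -p -u $file $2/mov-$formatdate.mpg"
   # Quoted so that paths containing spaces still work
   cp -p -u "$file" "$2/mov-$formatdate.mpg"
done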

Note to myself: you called it movcp and put it in /usr/bin. Thus I go from this …

Raw video files from SD card for my Canon FS100

Using this …

Linux script copying files off Canon FS100 SD and adding timestamp to their name.

To this:

Canon FS100 files are just MPEG-2 without proper aspect ratio header set

Voila! Now I can archive these off to my ReadyNAS and online backup.


X-Hacker?

October 17, 2009

Ha, clever …

If you're reading this ...


Silence PHP’s Magic Quotes

October 12, 2009

Here’s a nifty snip of script you can prefix your PHP with to reverse the effects of the “magic” quotes. Thankfully, PHP 6 will no longer have these awful things, but in the meantime:

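// get_magic_quotes_gpc() no longer exists in PHP 8, hence the function_exists() guard.
if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc()) {
    // Assign by key rather than by reference so no stray reference to $s is left behind.
    // Array values (e.g. name[]=a&name[]=b) are handled one level deep.
    foreach ($_GET as $nm => $s)
        $_GET[$nm] = is_array($s) ? array_map('stripslashes', $s) : stripslashes($s);
    foreach ($_POST as $nm => $s)
        $_POST[$nm] = is_array($s) ? array_map('stripslashes', $s) : stripslashes($s);
}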

You may have seen the effects of magic quotes and not even known it: a script processing a form gives you text where every apostrophe and/or double-quote character is preceded by a backslash. The original idea, and who knows how the language designers got it approved (no QA?), was to prevent newbie programmers from inadvertently creating SQL injection security holes. What it ended up doing was opening up worse problems as noobs struggled to reverse the effects without quite knowing what they were doing.


Goodbye Zend, Hello cURL

September 29, 2009

cURL you know it’s true … ooh, ooh, ooh, I love you.

Well, we’ll see at any rate.  I spent a significant number of hours today delving into both the Mechanical Turk API (REST) and cURL.  Up to this point I’ve been relying on Zend_Http, but it’s been rocky.  cURL seems to be a solid, long-running library that has been accessible to PHP since the first 4.x series.  Zend_Http has a very clean, but limited API.  cURL is super arcane, but immensely more powerful.  Again, that’s the theory!

Much of what cURL does is automagic in the best and worst possible ways.  For example, to send POST parameters you can simply call:

curl_setopt($handle, CURLOPT_POSTFIELDS, array('name' => 'value'));

Fantastic!  However, that also sets the content type to multipart/form-data automatically and you can’t change it.  Likewise, if you did this:

curl_setopt($handle, CURLOPT_POSTFIELDS, 'name=value');

It’ll configure itself to send application/x-www-form-urlencoded. Yikes, what a weird way to choose.
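If you have an array but want the urlencoded flavor, one way out is to build the string yourself with http_build_query() and hand cURL that instead.  A minimal sketch: the URL and field names are placeholders, and the request itself is left commented out since there is no real endpoint here.

```php
<?php
// Sketch: choose the request encoding explicitly instead of letting cURL guess.
$fields = array('name' => 'value', 'note' => 'a b&c');

// Passing a string makes cURL send application/x-www-form-urlencoded;
// http_build_query() takes care of the escaping.
$body = http_build_query($fields, '', '&');
echo $body, "\n";

// Guarded so the sketch still runs where the cURL extension is not installed.
if (function_exists('curl_init')) {
    $handle = curl_init('http://example.com/form');   // placeholder URL
    curl_setopt($handle, CURLOPT_POST, true);
    curl_setopt($handle, CURLOPT_POSTFIELDS, $body);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    // $response = curl_exec($handle);   // not executed: no real endpoint
}
```

Pass the array itself only when you actually want multipart/form-data, such as for file uploads.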


That Darn Name

September 28, 2009

An hour ago at Starbucks I happily tweeted my first experience with a FINE Sharpie Pen on my Moleskine.  “Hmm,” I thought afterwards. “I wonder if loveatfirstwrite.com is taken?”  It’s only somewhat clever but is also rather long for a domain name, so there was a good chance it was free.  Moments ago I navigated to that web site and it exists … sort of.  It goes directly to a page selling the domain.  Doh!  Okay, so it’s only $97 and I’m still interested, so I dig further.

loveatfirstwrite.com redirects to ITDomainNames.net purchase page

I immediately opened three tabs to Archive.org, DNscoop, and StatBrain.  The site existed back in 2007 for a book called “Love @ First . Write”, has a PR1, and gets a couple dozen visitors a day.  That’s not terrible, but I’m in no rush so I checked the WHOIS to see when it would expire …

Domain Name: LOVEATFIRSTWRITE.COM
Registrar: THAT DARN NAME, INC.
Whois Server: whois.intrustdomains.com
Referral URL: http://www.intrustdomains.com
Name Server: NS1.INTRUSTDOMAINS.NET
Name Server: NS2.INTRUSTDOMAINS.NET
Status: ok
Updated Date: 28-sep-2009
Creation Date: 28-sep-2009
Expiration Date: 28-sep-2010

>>> Last update of whois database: Mon, 28 Sep 2009 22:35:51 UTC <<<

Wha-wha-what? It’s a bit fishy. Either there’s a massive coincidence here, entirely possible, or that registrar is using Twitter as a domain tasting source. Yes, it all sounds very conspiracy theory, but imagine they come up with domain names and then only purchase them if they once existed and have backlinks. My tweet would have been perfect to generate this name. Darn them!


Bing People Search

September 7, 2009

I think Bing is trying to be overly helpful and therefore misinterprets searches. For example, I was trying to remember the name of this person I met at PAX and could only conjure two pieces of information. Here are the results of my searches (tried Bing first):

Google vs Bing on name search

You can guess which result helped me.


Server-Side Blogger Proxy

August 29, 2009

A challenge in bringing on new people to work for me is giving them some kind of standard interface while still having a very flexible way of using their work.  Most specifically I am talking about, among other things, my FanSiter CMS and my old concept of WebFront.  What I’m toying with yet again is providing a single blog for writers to post on which is a data source utilized by the individual sites.

First, you’re probably wondering what the advantage of that is over traditional blogging.  Really it’s taking the nice, singular and flowing user interface of posting blog entries and then redirecting those to all the niche sites they’re related to.  A writer then opens their browser, pulls up their list of assignments, and goes to one place to write all of them.  These posts become pages in a more static-looking site structure so they can be organized into much tighter groups.

Small, individual websites have a different set of traits from normal blogs.  People can generally link to them without worrying about what might appear on them when the topic is much more specific.  Additionally, I’d hypothesize that most links will go to the home page rather than some deep, archived post.  And then I just like smaller websites over larger blogs; they can be branded in isolation and work together on a network without one bad chunk of content pulling the whole thing down.

Distribution is normally a curse of organization.  Logging into all those separate sites or even trying to manage things from one of the more complicated CMSes is a nightmare I never want to wade into.  Most software tends towards tons of features when I just want a solid place to put content and pipe it to the right places.

Anyway, my idea for this with Blogger is relatively simple.  As part of the preload process mentioned earlier today, the Blogger archives will be loaded and parsed into pages.  The presence of a slug tag determines where said post goes and that tag must always be first.  This prevents weird tag notation and helps keep things grouped properly in the blogging interface as well.  The blog itself is then mapped to a FanSiter domain so that a writer may verify their work by actually visiting the blog-configured URL.

BlogSpot hosting is fine, except it’s public, and one stray link-in there and you’ve got a bunch of people going to the monolithic blog instead of the individual sites.  Once search engines find that, they’re going to see all this duplicate content too, and so you’re in deep shit.  So I use a custom domain that I can control, and I just tested it to make sure this would work.  Blogger complains the DNS is set up incorrectly, but it still resolves when asking ghs.google.com directly using the appropriate Host value.  There is a fragility here that concerns me, but it can be overcome by switching back to BlogSpot and making the blog private.

To view private blogs I’d need extra code to authenticate programmatically before extracting the information from them.  You might be wondering why I wouldn’t just use the GData API which makes all this available through XML.  The problem there is that it only gets content from a specific user.  That might be okay if I have a single account my writers use, but with multiple writers (myself included for example), it starts to get wonky.  Then I’d be making multiple calls, storing user names/passwords, and merging all this XML.  So, no, I’m just going to hack-parse the HTML I get from HTTP GETs.

Finally, when preloading a site, the blog is hit up using a single query: “/search?label=slug“. This provides all the posts for that site in a single page and therefore a single HTTP GET.  Simple simple!  The somewhat more complicated part is parsing out the individual pages and their tags; time-consuming to code but relatively straightforward (unless the HTML template is changed!).
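To give a feel for the hack-parsing, here is a rough sketch using DOMDocument and XPath.  Big caveat: the markup below is a made-up stand-in and not an actual Blogger template, so the class name, the rel="tag" links, and parse_label_page() itself are all assumptions that would need adjusting against the real HTML.

```php
<?php
// Sketch only: the sample markup is invented; real Blogger templates differ,
// so every XPath query here is an assumption.
function parse_label_page($html)
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);            // suppress warnings from sloppy markup
    $xpath = new DOMXPath($doc);
    $posts = array();
    foreach ($xpath->query('//div[@class="post"]') as $post) {
        $link = $xpath->query('.//h3/a', $post)->item(0);
        if (!$link) {
            continue;                  // skip posts without a title link
        }
        $tags = array();
        foreach ($xpath->query('.//a[@rel="tag"]', $post) as $tag) {
            $tags[] = trim($tag->textContent);
        }
        $posts[] = array(
            'name' => trim($link->textContent),
            'href' => $link->getAttribute('href'),
            'slug' => isset($tags[0]) ? $tags[0] : null,  // the slug tag always comes first
            'tags' => array_slice($tags, 1),
        );
    }
    return $posts;
}

// Invented sample page with a single post.
$html = '<html><body>'
      . '<div class="post"><h3><a href="/2009/08/first-post.html">First Post</a></h3>'
      . '<a rel="tag" href="/search?label=mysite">mysite</a>'
      . '<a rel="tag" href="/search?label=news">news</a></div>'
      . '</body></html>';
print_r(parse_label_page($html));
```

Treating the first tag as the slug mirrors the “slug tag must always be first” rule above; everything after it is kept as ordinary tags.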

As long as these pages retain the same /year/month/day-title.html location then mapping them to the correct site is a cinch. However, I think those URLs look too “bloggish” and I would probably do something about that.

Hmm, alright, I’m done rambling for a while. This has given me more to think about than I care to at the moment.


PHP5 Output Buffering

August 25, 2009

I just spent way too much time debugging a problem with a custom output buffer capture function and an object in PHP. Hopefully this post helps others skip the same pain. Tell me if this is what you’re doing: trying to access a global object in your capture method and you’re calling ob_start() with either no second parameter or a zero. Thus the global object is always NULL, right?

By not giving ob_start() a buffer size, PHP will only call your capture method after the script has executed and global objects are garbage collected! Pass in a buffer size or don’t use an object instance in your capture method.
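Another option, sketched below, is to flush the buffer yourself before the script ends so the callback runs while the object is still alive.  Note this is a different technique from the buffer-size fix; Page and capture() are hypothetical names used only for illustration.

```php
<?php
// Sketch under assumptions: Page and capture() are hypothetical names.
class Page
{
    public $footer = '<!-- footer -->';
}

$page = new Page();

function capture($buffer)
{
    global $page;
    // Fall back to an empty footer if the object is already gone.
    $footer = is_object($page) ? $page->footer : '';
    return $buffer . $footer;
}

ob_start('capture');
echo 'Hello';
// Flush explicitly while $page is still alive, rather than leaving the
// final callback to PHP's end-of-script cleanup.
ob_end_flush();
```

The is_object() check is just a safety net, so that a late call degrades to plain output instead of an error.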