Week 6: Ratings

July 6, 2009

This week’s stage is about giving more control to general visitors, anonymous included.  Any user can now “Approve” or “Delete” any segment, which can be seen as voting up or down.  Unless the person voting is the author or a moderator, other users will not see the results of these actions, but they are saved.  Back in the planning stage I was going to save this data in a local cookie, but now all of it is being kept in S3.  All of it.  I won’t be displaying any raw numbers or using the data in any meaningful way yet, but down the road I intend to use it for democratic moderation along the lines of Digg, HN, etc.

So the work this week is minimal: open the “moderate” operation to all users and potentially rename it to something like “visibility” since it’s all about setting the ‘v’ (visibility) leaf value.  Interpreting the ‘v’ leaf needs to be a bit more advanced, taking into account the current user, author, and moderator values.  I’d like to expand the XHTML rendering to omit hidden segments entirely and have them loaded by jQuery if the user clicks a button to show them.
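A minimal sketch of that interpretation, in Python rather than Forust’s PHP.  The storage layout (a list of (uid, value, timestamp) votes), the function name, and the default of “visible” are all assumptions for illustration; only the rules come from these posts: a visitor’s own vote affects just their own view, and otherwise the most recent vote by the author or a moderator decides.

```python
def visible_value(votes, viewer, author, moderators, default="1"):
    """Pick the 'v' value ('1' shown, '0' hidden) that `viewer` should see.

    votes: list of (uid, value, timestamp) tuples (assumed layout).
    default: value used when nobody relevant has voted (an assumption).
    """
    # The viewer's own most recent vote decides what they see themselves.
    own = [v for v in votes if v[0] == viewer]
    if own:
        return max(own, key=lambda v: v[2])[1]
    # Otherwise only the author's and moderators' votes count; latest wins.
    privileged = [v for v in votes if v[0] == author or v[0] in moderators]
    if privileged:
        return max(privileged, key=lambda v: v[2])[1]
    return default
```

With this shape, an anonymous visitor who “deletes” a segment hides it only for themselves, while everyone else keeps seeing whatever the author or a moderator last decided.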

I may spend extra time improving comment functionality.  Specifically, an author or moderator must have the ability to approve/delete them.  Comments carry very little information since each one is an individual leaf in a leaf list (leaflet?), whereas segments can have multiple leaves.  I think democratic moderation of comments is the only way to go.  If you don’t want others hiding your comments, then you can author your own page; easy as that.  Anyway, moderation data on comments will most likely be saved as a single leaf per user on each segment, indicating which comments that user approved/deleted.

If a user deletes their own comment then it is set to blank in order to completely “retract” it.

For new pages, users should be able to suggest a title.  This is the title that is used for them and potentially displayed for moderators as a drop-down when performing the mod.categorize operation.

Finally, I’m debating whether to allow external links in the “link” operation, which is currently only available to moderators.  If users can add external links, I can see it being mightily abused and difficult to keep relevant or moderated, democratically or not.  The functionality would be extremely helpful for FanSiter, however.


Authorship and Editing

July 2, 2009

Today I completed the final feature for the authorship stage: editing.

AJAX-loaded edit form; text is initialized to previous value.

This brings me up to having done the following:

  • Saving UID on S3 objects.
  • Locking (all or just comments).
  • Moderation (approve/hide); there’s a bug here which gives the author too much power, but I can fix it later.
  • Editing.

As has been the case, the code and output are very rough, but they do what I set out to make them do.  Hurray!


Last Week This Week

June 30, 2009

Damn bladder, I just got to actual work today (at 3:23 PM) and now I have to pee … again.  I can blame this on two things: Panera having caffeine-free Diet Pepsi and my addiction to it.  Anyway, last week suffered due to the upcoming and now-past half-marathon.

Page (categorize) and segment (show/hide) moderation forms.

Basic moderation tools are in place, however, so I’m moving on to authorship.  I must note that I did not put any alerts in place for unmoderated content.  This means a moderator will only see that your content needs moderating if they happen to find it.  This decision was made while my mind was fuddled on Friday with thoughts of running for 13.1 miles, but the week is gone and so I’m moving on.

Authorship means retaining control of the pages you start and the content therein.  Contributors may add content, but only the author has the ability to edit and redirect.  The latter is a feature for authors who self-host their content (the Forust page redirects to their own site/domain), and it won’t be in this release.  Authors are niche operators of the page they started, giving them raised privileges.

First point: authors are page moderators, not site moderators.  They cannot move pages around on the site (rename, link, categorize, delete), but they can affect segments within the page.  They also have a unique ability to edit segment text prior to moderation or comments being added.  Let’s explore that first.

Rather than giving a time limit, a new segment’s text may be edited before it is commented on.  Also, if a site moderator gives the segment a thumbs up (‘v’ == ‘1’) then the text cannot be edited either.  This allows a page author to build the content from multiple segments and get it “pristine” before advertising its existence.  They can do this for any segment on the page, even ones added by other users.  Thus if you want control, you must start your own page.  If you want to keep a page author from editing a segment you added, just add a comment to it.
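The edit rule boils down to a single predicate.  Here is a small sketch in Python (not Forust’s PHP); the function name and arguments are hypothetical, and the moderator’s ‘v’ value is assumed to be passed in as a string or None.

```python
def can_edit_segment(user, page_author, comment_count, moderator_v):
    """Return True if `user` may still edit a segment's text.

    moderator_v: the site moderator's 'v' value for the segment,
                 or None if no site moderator has looked at it yet.
    """
    if user != page_author:
        return False              # only the page author edits text
    if comment_count > 0:
        return False              # a comment freezes the text (context)
    if moderator_v == "1":
        return False              # moderator approval freezes it too
    return True
```

Note that the check uses the page author, not the user who added the segment, which is why adding a comment is the only way for a contributor to pin their own text.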

Why prevent edits?  Simple: context.  If you comment on a block of text and then that text is changed, the context of the comment is lost.  There is also the possibility of malicious authors submitting good text and then changing it to bad text after a site moderator has approved it.  Neither situation is desirable.

Alright, the other (and much simpler) thing an author can do is choose whether a segment is visible or not.  This is the exact same form displayed to site moderators.  Anonymous visitors see a segment according to whichever visibility setting is most recent, whether it came from the author or a site moderator.

Note on visibility: it’s currently set as a CSS class on the segment <div>.  This means that hidden segments are not actually hidden since there’s no stylesheet.  Thus another task this week is a bit of jQuery magic to make these invisible unless the user asks to see them.  It doesn’t help against spam (e.g. search engines will still index the text), but it protects the user.  I’ll get to the spam issue in that stage.

Lastly, authors must be able to “lock” their pages once they get them the way they want.  This also prevents comments, unfortunately, but that’s a problem for later.  The lock functionality is already there, so this is a simple matter.


One by One

June 24, 2009

My initial plan for the first mod tool was to present a list of all the new pages, but that’s turning out to be more than I want to work on.  Now I’m thinking the mod tool will display the S3 object info plus the categorize form, and the ACTION URL will be the same tool, which will then pick up the next page to moderate.  That way a moderator does them one by one and each submit brings them to the next thing.  This process can certainly be streamlined, but later, when I figure out more about what works and what doesn’t.  I also need to move on to the next tool, which is moderating segments on accepted pages, a much harder tool to create.
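The one-by-one flow can be sketched as a single handler that both records the submitted decision and returns the next page to show.  This is Python rather than Forust’s PHP, and the in-memory queue and dictionary stand in for whatever S3 listing the real tool would use; all names are hypothetical.

```python
def moderate_next(queue, decisions, submitted=None):
    """Record an optional (page_key, category) submission, then
    return the next page key awaiting moderation (or None).

    queue: list of page keys still to be moderated (mutated in place).
    decisions: dict collecting page_key -> chosen category.
    """
    if submitted is not None:
        key, category = submitted
        if key in queue:          # ignore stale or repeated submits
            decisions[key] = category
            queue.remove(key)
    return queue[0] if queue else None
```

Because the form posts back to the same handler, the first request (no submission) and every later submit share one code path.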


Rename Complications

June 24, 2009

Giving a new URI to a page is a complex process mired in the asynchronous nature of S3, the goal of keeping every single URI ever approved for permanence, and now basing moderation tools around the initial structure.  The latter is the most recent addition to the already-long Bough:rename() method.

There is no move method for objects in S3, only copy and delete.  Thus there is now the possibility that the copy will complete and the delete won’t, leaving … two copies of the same object with two URIs.  I can combat this by adding an ETag check, but what happens when a user intentionally uploads two identical files?  Obviously, it’s not what they expect when only one shows up.

Code to handle failed moves means dealing with redundancy.  The old URI will have an ‘m’ (moved) leaf associated with it, and the object with an exact URI key match must exist at the new location; otherwise the move must have failed and needs to be retried.  It’s complicated, but being able to handle possible scenarios is important and I wanted to post this so I know what I was thinking when I come back to this self-correction months or years from now!
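To make the failure cases concrete, here is a sketch of the repair step in Python, with plain dictionaries standing in for S3 and for the ‘m’ leaves.  It is not the Bough:rename() code; the names are hypothetical, and it assumes the ‘m’ leaf is written before the copy starts so an interrupted move can always be detected.

```python
def repair_move(objects, moved, old_key):
    """Finish an interrupted copy+delete move.

    objects: dict of key -> object body (stand-in for the S3 bucket).
    moved:   dict of old key -> new key (stand-in for 'm' leaves).
    Returns True if anything had to be fixed.
    """
    new_key = moved.get(old_key)
    if new_key is None:
        return False                         # never moved: nothing to do
    fixed = False
    if new_key not in objects and old_key in objects:
        objects[new_key] = objects[old_key]  # copy never happened: redo it
        fixed = True
    if new_key in objects and old_key in objects:
        del objects[old_key]                 # delete never happened: redo it
        fixed = True
    return fixed
```

Running the repair twice is harmless, which matters because the retry itself can be interrupted.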


Zend OpenID 2

June 20, 2009

Zend’s tutorial on using their OpenID consumer convinced me to use their framework.  It looks like they based their demonstration on the SimpleOpenID class, yet a glance at the code and its tests tells you it’s built on much more thoughtful underpinnings.  Unfortunately … it doesn’t support OpenID 2!  Their page states that it has OpenID 2 support, but that doesn’t include YADIS or XRI!

The week has ended, however, and I switched to their codebase yesterday for logging in.  I left my own implementation buried within ForustOp_Auth as functions that are never called, mostly just for reference.  I filed a bug about OpenID 2 that I’ll revisit later.  Perhaps I’ll assist the Zend community at that point, who can tell?

I made some user-session decisions this week: you can log in, but the information is not yet necessary or even included in the TSV or object data.  Those things will come next week with Moderation, followed later by Authorship.


Cookies+Session Required

June 18, 2009

Forust is in read-only mode by default unless sessions are enabled.  Additionally, operations cannot be performed unless a prior request has verified that session values are persisted.  This isn’t an issue for web browsers since a GET on the page starts the session and a further POST will replay that cookie with its session ID.

Internally, Forust never calls session_start() and is coded to handle the case where $_SESSION is not available (e.g. session_start() was not called).  The session variable ‘upd’ (last updated timestamp) must be present for any operation to succeed; it is set automatically on $_SESSION at the end of Forust::operate().
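The ‘upd’ handshake can be sketched like this.  It is Python, not the actual Forust::operate(); the session is modeled as a dict (or None when sessions are unavailable), and treating a plain page view as a call with no operation is my assumption.

```python
import time

def operate(session, operation=None, now=None):
    """Run `operation` only if a prior request proved the session persists.

    session: dict standing in for $_SESSION, or None if unavailable.
    operation: zero-argument callable, or None for a plain page view.
    Returns True if the operation was performed.
    """
    performed = False
    # 'upd' can only be present if an earlier request set it and the
    # client replayed the session cookie.
    if session is not None and "upd" in session and operation is not None:
        operation()
        performed = True
    if session is not None:
        # Set at the end of every request, as described above.
        session["upd"] = time.time() if now is None else now
    return performed
```

A client that never sends the cookie back gets a fresh, empty session each time, so it stays in read-only mode.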

Even anonymous users must have a session and a user identifier in order to make changes.  The user identifier is generated automatically as “a[timestamp]” which allows for a very simple check for the “a[” prefix and “]” suffix to determine if they have authenticated or not.  It’s true that this means two anonymous users might end up with the same ID, but the consequences are not grave.  The user’s REMOTE_ADDR is not used out of respect for their desire for privacy/anonymity.
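For illustration, the ID scheme and its check might look like this in Python.  The function names are hypothetical, and using whole seconds for the timestamp is an assumption (the post doesn’t say which resolution Forust uses).

```python
import time

def new_anonymous_id(now=None):
    """Build an anonymous user ID of the form a[timestamp]."""
    ts = int(time.time() if now is None else now)
    return "a[%d]" % ts

def is_anonymous(uid):
    """An ID is anonymous if it has the a[ prefix and ] suffix."""
    return uid.startswith("a[") and uid.endswith("]")
```

Since authenticated IDs are OpenID URLs, they can never start with “a[”, so the prefix/suffix test is enough.  Two visitors arriving in the same second would share an ID, which is the collision mentioned above.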

Each operation during a session is added to a history list on $_SESSION and counters are incremented to show usage.  It’s possible that I will rate-limit based on the ratio of pageviews to operations, but then it would just encourage spammers to slam the server harder.  More importantly, each operation increments a running total that should at some point be carried in between sessions in order to give more weight to their actions.  Once a spammer is identified, their material will be easily removed, and they’re reset to zero when they start again with a new anonymous user.  Sorting by an author’s operation count allows moderators to serve the highest-quality contributors first.  But I’m getting ahead of myself!
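A rough sketch of that bookkeeping, in Python with a dict standing in for $_SESSION.  The key names (‘history’, ‘counts’, ‘total’) and both function names are made up for the example; how totals get carried between sessions is left out because I haven’t decided that yet.

```python
def record_operation(session, name):
    """Append to the session history and bump the usage counters."""
    session.setdefault("history", []).append(name)
    counts = session.setdefault("counts", {})
    counts[name] = counts.get(name, 0) + 1
    session["total"] = session.get("total", 0) + 1

def moderation_order(pending, totals):
    """Sort pending items so the most active authors are served first.

    pending: list of (author_uid, item) pairs.
    totals:  dict of author_uid -> running operation total.
    """
    return sorted(pending, key=lambda p: totals.get(p[0], 0), reverse=True)
```

A fresh anonymous ID has no entry in the totals, so a spammer who starts over lands at the back of the queue.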

Authentication state is stored on the session as well.  This includes the OpenID information such as claimed identifier, actual identifier (may be delegate), provider, association handle, etc.


Primitive OpenID1 Support

June 17, 2009

I created ForustOp_Auth for all authentication operations (“auth.login” and “auth.logout”) and just got through a successful test using my Blogger OpenID.

Sparse login form, I've entered my OpenID without the scheme.

Blogger's OpenID1 server asks for my approval.

And now I'm authenticated; I made the "create" form temporarily dependent on this for testing.

Authentication results are put into the query string which I'm tracing here.

Rather than use SimpleOpenID directly, I skimmed it to get the gist and wrote mine from scratch.  Just saying that induces a small shudder, but I wanted to fix some bugs (server URLs with query string parts), use PHP5-specific stuff, and avoid the hassle of mixing their GPLv3-licensed code with my BSD-licensed code.

That said, it has some terrible flaws and is nowhere near secure.  It’s enough to get me through the block I had.

You might note that the return_to URL has a parameter ‘p=1’ which has Forust operate on $_GET rather than $_POST.  OpenID servers never POST, unfortunately, so I extended my API a bit (honestly had planned to for testing anyway).

The OpenID server discovery code uses cURL (so https will work) and also the PHP5 DOM which is awesome, if a bit verbose.  It will handle improperly formed HTML and let you parse it like the XML DOM.  I only use it to enumerate all the link elements, but it allowed me to forgo the wacky RegExp patterns used in SimpleOpenID.
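The same idea, enumerating link elements with a forgiving parser instead of regular expressions, looks like this in Python.  This is not the ForustOp_Auth code: it uses Python’s standard html.parser in place of cURL plus the PHP5 DOM, and the class and function names are hypothetical.  The rel values openid.server and openid.delegate are the ones OpenID 1 discovery looks for.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values of <link> elements, keyed by rel token."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens; first match wins.
        for rel in (attr.get("rel") or "").split():
            self.links.setdefault(rel.lower(), href)

def discover(html):
    """Return (server_url, delegate_url); either may be None."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return (collector.links.get("openid.server"),
            collector.links.get("openid.delegate"))
```

The parser lowercases tag and attribute names and unescapes entities in attribute values, so a server URL with a query string such as ?x=1&amp;y=2 comes back with a plain ampersand.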

Things to be done:

  • Check the signature returned by the server.
  • Use the nonce, but where is it?  Is that only OpenID 2?
  • Check openid_mode (must be == “id_res”).
  • What happens if the same parameters are replayed elsewhere, couldn’t a snoop do that and be logged in with this identity?  This is what the nonce solves (albeit with a slight race condition), right?
  • I’m a bit worried about simply storing the user ID in the PHP session.  Again it seems like something a snoop could take and put in their own requests to operate as that user unless I put the whole site on HTTPS when logged in (ridiculous?).
  • Basic OpenID 2 support so I can use my Gmail account for testing.

There’s probably more, but these are enough questions to keep me busy.
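Two of those checks, the mode test and replay rejection, can be sketched as follows.  This is Python and purely illustrative: the function name and the in-memory set of seen nonces are assumptions, and signature verification is deliberately not shown.  The openid.response_nonce parameter belongs to OpenID 2.0, so a version 1 consumer would have to add its own nonce to the return_to URL instead.  (PHP turns the dots in parameter names into underscores, which is why the list above says openid_mode.)

```python
def accept_response(params, seen_nonces):
    """Accept a positive assertion once; reject wrong modes and replays.

    params: dict of query-string parameters from the return_to request.
    seen_nonces: set of nonces already used (would need persistent
                 storage in a real deployment).
    """
    if params.get("openid.mode") != "id_res":
        return False                  # cancel, error, or missing mode
    nonce = params.get("openid.response_nonce")
    if nonce is None or nonce in seen_nonces:
        return False                  # no nonce, or a replayed response
    seen_nonces.add(nonce)
    return True
```

A snoop replaying the same query string hits the second check, provided the legitimate request arrived first; that ordering is the race condition mentioned in the list.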


OpenID Continued

June 17, 2009

I read part of the OpenID version 2 specification today, which is essentially an overhaul or a parallel addition.  It can co-exist with version 1 (or 1.1) only because it seems to be completely different.  The super-simple library I was playing with only works with version 1.  Experimenting with my own accounts I found that Google supports only version 2, Blogger only version 1, WordPress only version 1, and I couldn’t get Yahoo to work at all.  Oye, what a mess.

Tomorrow I must start coding something; this is taking way too long.  So, I’m going to wrap the OpenID functionality in a few high-level calls, adapted from the SimpleOpenID class, and leave it at that.  It works with my Blogger OpenID and that’ll be fine for now.  I’ll figure out all that OAuth, XRI, Yadis, IRL, etc. etc. crap later.  Maybe by then the JanRain library will run in strict PHP 5.

On a related note, I tried the JavaScript-based openid-selector demo and found it to be lacking.  I’ll try it again later when I need to let in more users besides myself (actually next week I’ll need to test out different accounts for different privileges, so maybe sooner than I’d hoped).  Specifically, the Blogger functionality asks you for your “Blogger Username”, which is a bit confusing and may be different than your URL if yours is domain-hosted (csynapse goes to a redirect page that doesn’t contain the OpenID tag; whoops!).  The WordPress one doesn’t work because my user name isn’t the same as my blog since I deleted the original and created another (but user names never change).

Okay, enough complaining, what do I want?  I’d like for people to be able to enter their username and password, period.  If they click on a “registration” link it would tell them that the site supports any OpenID with a brief explanation of how to get one and whether or not they might already have one.  It sounds like OpenID 2 might support that kind of thing, but the specification is highly fragmented and difficult to grasp/code.  It seems to me that it’s trying to re-invent a lot of things like domain names which already provide a universal label.  Sure you may change email addresses, but once you buy a domain name, it’s fairly cheap to keep for life (10 bucks a year, I mean come on).


OpenID has Problems

June 15, 2009

On Saturday I struggled through the PHP OpenID library documentation, trying to write a minimal example using hard-coded values.  I ended up just running the “consumer” example on my localhost, and didn’t feel any better for it.  “Great, I can jump through a handful of hoops as a user and see that I am who I say I am.  Now what?”  Today I tried reproducing the same example on my netbook, but here I have turned on strict error reporting and it doesn’t work.  Turns out the portability of the library between PHP platforms also makes it invalid per PHP 5’s strict rules.  Nightmares of embedding Flash movies in [X]HTML suddenly resurface …