EwePlay: MP3 Sync

October 26, 2008

The main feature of my original EwePlay idea was sort of a personal radio whereby you selected/created your mood and then actively or passively rated songs as it chose them.  If you let a song play through then it would decide it’s “good enough”, and if you skipped then obviously it was wrong for that mood, at least that time.  An explicit thumbs up would mean it totally belongs in that mood category and an explicit thumbs down would of course mean the opposite.  In the middle it would just be a good random shuffle that intermittently included songs you skipped and more often songs you let play all the way through.  The memory of this would be lifetime-based so, unlike Pandora, it wouldn’t start up by playing those four or five songs you thumbed up every god damned time.
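Here is a rough sketch of how that rating-driven shuffle might work.  The event names, the score deltas, and the clamping range are all my own placeholder assumptions, not anything decided; the only parts taken from the idea above are that play-throughs nudge a song up, skips nudge it down without removing it, and a thumbs down takes it out of that mood.

```python
import random
from collections import defaultdict

# Hypothetical score adjustments; the numbers are placeholders.
DELTAS = {"thumbs_up": 3.0, "played_through": 1.0, "skipped": -1.0}


class MoodStation:
    """Lifetime per-mood scores that bias, but never fully dictate, the shuffle."""

    def __init__(self, songs, rng=None):
        self.songs = list(songs)
        self.scores = defaultdict(float)  # (mood, song) -> lifetime score
        self.banned = set()               # (mood, song) pairs thumbed down
        self.rng = rng or random.Random()

    def rate(self, mood, song, event):
        if event == "thumbs_down":
            self.banned.add((mood, song))
            return
        if event == "thumbs_up":
            self.banned.discard((mood, song))
        self.scores[(mood, song)] += DELTAS[event]

    def weight(self, mood, song):
        score = self.scores[(mood, song)]
        # Clamp so favourites cannot dominate and skipped songs still resurface.
        return max(0.1, min(1.0 + 0.25 * score, 3.0))

    def next_song(self, mood):
        candidates = [s for s in self.songs if (mood, s) not in self.banned]
        if not candidates:
            raise LookupError("every song is thumbed down for this mood")
        weights = [self.weight(mood, s) for s in candidates]
        return self.rng.choices(candidates, weights=weights, k=1)[0]
```

The upper clamp is what would keep it from opening with the same handful of thumbed-up songs every time: a favourite is at most three times as likely as an unrated song, never guaranteed.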

Something I was thinking about today is finally moving my music collection online.  Bandwidth and storage are cheap, especially when you don’t pay for them, and if you’re streaming your own music from a password-protected area then you’re not doing anything illegal (personal online storage, not public/published files).  What I’d want to access this is a simple application, aka the EwePlay client, which would synchronize the online data with some place I’ve designated on disk.  It wouldn’t be a full synchronization; rather, each time a song is played it would also be saved to disk (assuming it finishes downloading).  That way, once a computer is up to date, it isn’t chewing up much bandwidth and you can play the files offline.
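A minimal sketch of that play-then-keep behaviour, under some assumptions: `fetch` is a hypothetical callable that yields the song's bytes in chunks from the server, and the `.mp3` and `.part` naming is just for illustration.  Actual playback is left out; this only shows the "keep it once it finishes downloading" part.

```python
import os
import tempfile
from pathlib import Path


def play_and_cache(song_id, music_dir, fetch):
    """Return a local path for song_id, downloading it only if not cached.

    `fetch` is a hypothetical callable: fetch(song_id) -> iterable of bytes.
    """
    music_dir = Path(music_dir)
    music_dir.mkdir(parents=True, exist_ok=True)
    target = music_dir / f"{song_id}.mp3"
    if target.exists():
        return target  # already synced: no bandwidth used

    # Write to a temporary file so a half-finished download never
    # shows up in the music folder as a playable song.
    fd, tmp_name = tempfile.mkstemp(dir=music_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in fetch(song_id):
                tmp.write(chunk)
        os.replace(tmp_name, target)  # publish only after a complete download
    except BaseException:
        os.unlink(tmp_name)  # discard the partial file
        raise
    return target
```

Writing to a temporary name and renaming at the end is one way to honour the "assuming it finishes downloading" condition: an interrupted download leaves nothing behind.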

Rather than being stored in some wonky cache database, the songs would be actual files on disk with semi-friendly names based on unique identifying factors: normalized song name (decide what to do with non-file-name-friendly characters, spaces, etc.), the artist “category” (I say category, because you might use “NiN” instead of “Nine Inch Nails”, or “Zombie” instead of “White Zombie” / “Rob Zombie”, etc.), the original release date/year if known, and a hash of the song content itself (not the MP3 metadata).  These four things would be contained in an unchanging syntax that could sit anywhere in the filename, so the rest of the name could be whatever you wish at the time.  I know my thoughts on naming music have changed over time, and it’s annoying to constantly re-organize and weed out duplicates when something else could do it for you.
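To make the filename idea concrete, here is one possible sketch.  The `[ewe~...]` token syntax, the slug rules, and the 12-character SHA-1 prefix are all assumptions I made up for illustration.  The content hash skips a leading ID3v2 tag and a trailing ID3v1 tag so that retagging a file doesn't change its ID; other tag formats are not handled.

```python
import hashlib
import re
import unicodedata

# Hypothetical token syntax: [ewe~<title>~<artist>~<year>~<hash>]
TOKEN_RE = re.compile(
    r"\[ewe~[a-z0-9-]+~[a-z0-9-]+~(?:\d{4}|unknown)~[0-9a-f]{12}\]"
)


def slug(text):
    """Lowercase ASCII with hyphens, safe for filenames."""
    folded = unicodedata.normalize("NFKD", text)
    ascii_text = folded.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-") or "unknown"


def audio_bytes(data):
    """Strip a leading ID3v2 tag and a trailing ID3v1 tag, if present."""
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for byte in data[6:10]:  # 28-bit "synchsafe" size field
            size = (size << 7) | (byte & 0x7F)
        footer = 10 if data[5] & 0x10 else 0
        data = data[10 + size + footer:]
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        data = data[:-128]
    return data


def song_token(title, artist_category, year, data):
    digest = hashlib.sha1(audio_bytes(data)).hexdigest()[:12]
    year_part = f"{year:04d}" if year else "unknown"
    return f"[ewe~{slug(title)}~{slug(artist_category)}~{year_part}~{digest}]"


def find_token(filename):
    """Locate the ID anywhere in a filename, whatever else the name says."""
    match = TOKEN_RE.search(filename)
    return match.group(0) if match else None
```

With something like this, `find_token("My Faves - [ewe~closer~nin~1994~...] (live).mp3")` would pull the same ID out of the name no matter how the rest of it gets renamed, and two files with the same token are candidates for duplicate weeding.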

These unique song IDs would be stored in the filename and never the folder, so you could have it sync up your “Music” folder on XP and then use Windows Media Player, WinAmp, or whatever to play the songs.  Or you could use the Flash/ActiveX webpage if you’re online.  The client might eventually evolve to contain the online logic, but for the first phase all of it would be on the server.  This way your continued listening preferences would be constant across all machines, even guest machines like at the library or on a workstation at your job.


Mostly Unique Picture Filenames

October 26, 2008

At the broadest level you can divide pictures into two categories: those you take yourself and those you have downloaded from elsewhere.  Tack on a simple hash of the contents in some consistent manner and you can practically guarantee global uniqueness.

In the case of the former, the date is the only unique constant.  As a non-professional photographer, you can’t without extreme effort take two pictures at exactly the same time.  So while a collision is possible, you can compare the dates of two files (the value stored in the JPEG metadata, since file date values can get mangled) to determine whether they are the same picture.

For files downloaded from elsewhere on the internet which have been passed through some kind of editor and no longer retain their picture-taken-on metadata, you can only do a hash of the content and compare file sizes.  The problem comes about when the picture has been resized, so that both the content hash and the file size are different even though the visual is practically the same.  In most cases I do believe that people modify a picture file in order to resize it, and the next most common reason is to add their own mark to advertise their site (a tag at the bottom, a watermark, a logo, etc.).  There’s little you can do about the latter, but what if the hash was done on a pre-resized version of the picture?  Say you always resize to 425xN, where 425 is the width and N is the height scaled to match.  Then run the image through a filter to remove minor level changes and hash that content.  I wonder how effective it would be in finding duplicates, and then at that point how to programmatically tell which version is higher quality.
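A toy sketch of that resize-filter-hash idea, with plenty of assumptions: the image is taken to be already decoded into rows of 0-255 grayscale values (a real version would get these from an image library), the resize is plain nearest-neighbour, and the "filter" is just bucketing each value into a few coarse levels.  The function names and the choice of 8 levels are mine.

```python
import hashlib


def resize_nearest(pixels, new_width):
    """Nearest-neighbour resize of a grayscale grid, keeping aspect ratio."""
    old_h, old_w = len(pixels), len(pixels[0])
    new_height = max(1, round(old_h * new_width / old_w))
    return [
        [pixels[y * old_h // new_height][x * old_w // new_width]
         for x in range(new_width)]
        for y in range(new_height)
    ]


def normalized_hash(pixels, width=425, levels=8):
    """Hash a picture after resizing to a fixed width and coarsening levels."""
    small = resize_nearest(pixels, width)
    step = 256 // levels
    # Bucketing values is the stand-in for "remove minor level changes".
    quantized = bytes(value // step for row in small for value in row)
    header = f"{width}x{len(small)}:".encode()
    return hashlib.sha1(header + quantized).hexdigest()
```

An exact hash like this is brittle: any pixel sitting near a bucket boundary can flip and change the whole digest, so with real resized JPEGs I would expect it to miss a fair number of duplicates.  Comparing the coarse values by how many differ, instead of hashing them, would probably be more forgiving.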

You can’t always determine the quality level of a picture file based on its dimensions or file size.  People tend to explode both of these artificially, for whatever reason, and quite often I’ve seen pictures where the smaller of two duplicates is better looking.  I think there’s some typical automatic logic you can apply to get decent results.

  • Insanely high dimensions are indicative of an original because only complete assholes will blow up a tiny image into something like 2200×3600.  Checking the file size against a threshold of about half a megabyte will generally tell you whether that is the case.
  • If JPEG meta-data exists, you can compare its values to the picture itself.  For example, it may have an “Original Resolution” property which you can compare against the image dimensions.

I’m just thinking aloud as I ponder collecting assets for mini-content sites.