 

Posts Tagged ‘data’

How Hulu Should Use My Data

15 Jun

It’s always been a little strange to me that Hulu has profiles. I suppose they’re for people who interact via the comments on videos, but the profiles seem so bland and token. It’s as if someone remembered to add them right before shipping and then forgot about them.

Specifically, the part of Hulu profiles I’m curious about is the “Favorite TV shows” and “Favorite Movies” boxes.

Hulu: don’t you know what my favorite movies and TV shows are?

Hulu should tell me my favorites. Do I ostensibly love The Simpsons but never watch it? (Yes.) Do I watch re-runs of pseudo-crime dramas like Lie To Me whenever they’re available? (Yes.) Have I watched every vampire-related bit of video available? (No. Only Buffy.)

Hulu already does recommendations, but they are surprisingly easy to miss, and seem to be based on watching one particular show or another. They’re not terrible, but when Netflix is willing to spend a million dollars to get some improvement in its recommendation engine, it’s obvious that people are looking for a little more than Hulu is giving at the moment.

I would like to see Hulu become the Last.fm of TV and film: teach me about myself, and bring that data and recommendations front and center. Put in a “Recommended Channel” and just show me stuff you think I’ll like, based not on a single movie or show, but the whole package, and what other people with similar taste also like. Show me those other people, too, and let me watch a “Neighborhood Channel.”
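To make the "whole package" idea concrete, here is a rough sketch of neighbor-based recommendations. Everything in it is made up for illustration: the viewers, the watch counts, and the function names are hypothetical, and nothing here reflects how Hulu, Last.fm, or Netflix actually compute anything. It compares complete viewing histories with cosine similarity, picks the closest viewers as a "neighborhood," and suggests what they watch that I haven't.

```python
from math import sqrt

# Hypothetical data: viewer -> {show: episodes watched}.
HISTORY = {
    "me":  {"Lie To Me": 12, "Buffy": 30},
    "ann": {"Lie To Me": 10, "Buffy": 25, "Bones": 20},
    "bob": {"The Simpsons": 40, "Family Guy": 35},
    "cat": {"Buffy": 28, "Angel": 22, "Lie To Me": 3},
}


def cosine(a, b):
    """Cosine similarity between two sparse watch-count vectors."""
    dot = sum(a[show] * b[show] for show in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def neighbors(user, history, k=2):
    """Return the k viewers whose whole history looks most like the user's."""
    scored = [
        (cosine(history[user], watched), name)
        for name, watched in history.items()
        if name != user
    ]
    return sorted(scored, reverse=True)[:k]


def recommend(user, history, k=2):
    """Rank unseen shows by how much the user's neighbors watch them."""
    seen = {show for show, count in history[user].items() if count > 0}
    scores = {}
    for similarity, name in neighbors(user, history, k):
        for show, count in history[name].items():
            if show not in seen:
                # Weight each neighbor's viewing by how similar they are.
                scores[show] = scores.get(show, 0.0) + similarity * count
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    print("Neighborhood:", [name for _, name in neighbors("me", HISTORY)])
    print("Recommended:", recommend("me", HISTORY))
```

The same neighbor list could feed both a "Recommended Channel" and a "Neighborhood Channel." A real system would need far more data and care (normalizing for show length, handling new users, and so on), so treat this as a toy.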

Go a step further and tell me what other people in my physical area are watching. You ask for my ZIP code: use it for more than marketing stats. If all my neighbors or coworkers are watching a show, maybe it’s worth checking out.

If there are two things people like, it’s being told random facts about themselves, and being told what else to like. Hulu should leverage that.

 

 

Farewell, Facebook

06 May

On Monday, I deleted my Facebook account. A day before I hit the button, I posted a note letting people know where they could find me online if they wanted, and promising more of an explanation: here it is.

I’m a control freak. I run my own web servers, mail server, IRC server, CI server, and SVN server so that I have control. If I could afford the colocation, I’d run them on my own hardware. Hell, if I could afford the bandwidth, I’d install a rack in my closet.

But most importantly, I want control of the data. My data.

Facebook recently made two changes to their service that signal a significant and frightening shift in their position on data—specifically who owns and has control over data. They automatically linked interests to public pages, and they introduced “Social Plugins and Instant Personalization.”

Until now, even if I decided to be permissive with my data, I still felt like I was in control of them on Facebook. With the new “connections” feature, as the EFF says, “Facebook users now face a Hobson’s choice between the new Connections and no listed interests at all.” I no longer have the option to share my data with the subset of people I know: either I share them with everyone, in particular advertisers, or I don’t post data at all.

I mention advertisers because they are the most likely consumers of the vast quantities of aggregate data Facebook is creating with the new connections feature. Surely no individual will gain anything from knowing that several million people share their interest in Lady Gaga.

And until now, I had the ability to whitelist the applications with which I shared data. I routinely hit a wall as I browsed my friends’ activity, where I would be asked to choose between sharing my data with an application or not seeing its content. More often than not, I chose not to share, and lived without the content.

This makes three things about Instant Personalization onerous: the presumptive sharing with third parties; the shift to a blacklist, where I must specifically opt out; and the willingness to share data even if I have opted out in general.

  • Facebook has decided that Yelp, Microsoft’s Docs.com, and Pandora should have access to my data. I was not part of that decision.
  • If I opt out and turn off Instant Personalization, Facebook will still share my data with these third parties, if my friends choose to use their services. Again, I am not part of that decision.
  • In order to prevent Facebook from sharing my data with them, I have to manually block each application. That’s annoying but manageable when it’s just three applications. It doesn’t scale.

This is all scary. Facebook could not have made these changes if they honestly believed that I own my data and that they have access only with my permission. These changes indicate that Facebook believes they own my data, and will do with them what they please, unless I go out of my way to ask them not to.

I’ve always had mixed feelings about the protest groups that form on Facebook after every major change. Sure, Facebook staff are more likely to notice a Facebook group with 100,000 members than 100,000 individual blog posts, but in our socio-economic system, the real way to signal displeasure to a business is to stop using that business—the online equivalent of “voting with your wallet.”

So, like a few others, I’m taking my data and going home.

I’m willing to share my data with Facebook as long as I ultimately feel in control. It’s possible that I’ll come back to Facebook if they’re willing to not only fix these particular issues but also make it clear that I am ultimately in control of my own data. That doesn’t seem likely.

What do you think about Facebook, these changes, and your data? Let me know in the comments.

Facebook served as an aggregator of my activity online, and now all those aggregated feeds are alone and disparate again. I’m looking at turning jamessocol.com into a lifestream/aggregator to make up for it. I looked at Planet Venus but wasn’t thrilled with it. If you know of any cool software for that, let me know. Otherwise I’ll write something and play with things like Redis, Node.js, Tornado, and/or other neat stuff.

And yes, I know Tornado is from Facebook.

 

 

Bleach, HTML sanitizer and auto-linker

25 Feb

Bleach is a whitelist-based HTML sanitizer and auto-linker in Python, built on html5lib, for AMO and SUMO and released under the BSD license.

Bleach has two main functions: sanitizing HTML based on a whitelist of tags and attributes, and turning URLs into links. It uses html5lib for both.
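To show what those two functions mean in practice, here is a toy sketch of the whitelist idea using only Python's standard library. It is not Bleach's code or API: Bleach uses html5lib, while this uses `html.parser`, and the names `clean`, `linkify`, `ALLOWED_TAGS`, `ALLOWED_ATTRS`, and `SAFE_SCHEMES` are assumptions made up for the example. The linkifier here only handles plain text, not existing HTML.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for this sketch; not Bleach's actual defaults.
ALLOWED_TAGS = {"a", "b", "em", "strong"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
SAFE_SCHEMES = ("http://", "https://", "mailto:")
URL_RE = re.compile(r"https?://[^\s<>\"']+")


class WhitelistSanitizer(HTMLParser):
    """Keep whitelisted tags and attributes; escape everything else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _attr_ok(self, tag, name, value):
        if name not in ALLOWED_ATTRS.get(tag, set()):
            return False
        if name == "href":
            # Drop links with schemes such as javascript:.
            return (value or "").strip().lower().startswith(SAFE_SCHEMES)
        return True

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            # Emit the raw tag as escaped text so it renders harmlessly.
            self.out.append(escape(self.get_starttag_text(), quote=False))
            return
        kept = "".join(
            f' {name}="{escape(value or "")}"'
            for name, value in attrs
            if self._attr_ok(tag, name, value)
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")
        else:
            self.out.append(escape(f"</{tag}>", quote=False))

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def clean(html):
    """Sanitize an HTML fragment against the whitelist above."""
    parser = WhitelistSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


def linkify(text):
    """Escape plain text and wrap bare http(s) URLs in links."""
    parts, last = [], 0
    for match in URL_RE.finditer(text):
        parts.append(escape(text[last:match.start()], quote=False))
        url = escape(match.group(0))
        parts.append(f'<a href="{url}" rel="nofollow">{url}</a>')
        last = match.end()
    parts.append(escape(text[last:], quote=False))
    return "".join(parts)


if __name__ == "__main__":
    print(clean('<script>alert(1)</script><b onclick="x()">hi</b>'))
    print(linkify("see http://example.com now"))
```

The point of the whitelist approach is that anything not explicitly allowed is neutralized by default, so new or obscure tags and attributes can't slip through. Don't use this sketch on real user input; a hand-rolled sanitizer misses plenty of edge cases that a parser-based library is built to handle.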

For more information on using Bleach, see the README included in the source. For more info on how Bleach works, follow below the jump.

 