Wednesday, October 07, 2009

Stop! Drop that field!

For PyOhio registration, we used a nice service called eventbrite. It worked great, but I have one big problem with it: it collected way too much data from registrants. It got us the data we needed, but it also asked for home addresses, gender, job title, company... all data we had no legitimate need for and no plans to use, probably collected just because the fields are in the eventbrite form template. Entering it was a pointless nuisance for our attendees, and some may even have been put off by the length or intrusiveness of the registration form. (Dave Stanek, if you're reading this, let's see if we can change that for next year.)

We are so not the only offenders in this department. It's everywhere; it's endemic. At website after website, we're asked to provide information of no apparent relevance to the site's purpose. It's so easy to throw field after field into a data collection form; templates come with every conceivable field already in place; and - well, why not? Isn't more data better?

No. No, it's not. Excess data wastes people's time, clutters databases, obscures the important data, and increases the risk of data leakage. In face-to-face interactions, we always have the option of asking "Why do you need to know that?", or just giving people that funny look that tells them they're going out of bounds. On paper forms, we can leave fields blank. Automated forms with field validation cut off those safeguards and open the door to compulsive collection syndrome. The one defense people do have against intrusive electronic forms - lying - ruins data quality, and false data is much worse than no data at all.

We need an ethos of restraint in data collection, of always asking, "Why am I collecting this field?" Data collection needs to be seen not as a pure good, but as something with a cost to weigh against its benefit. Not collecting data is often the responsible choice, and we need to teach each other that.

6 comments:

benjaminws said...

Shoulda used that Django site dstanek and I whipped up :)

But seriously, I agree with you 110%!!

Unknown said...

Ha. Put two or three Pythonistas together and they compulsively start building a web application. If you're lucky, you can stop them short of actually building a new web framework.

Anonymous said...

Reminds me of a data mining enthusiast who, when shown some open-pit mining software for production tracking and geospatial modeling, replied, "You know, it would be really neat if you could click on a blasthole (on an onscreen map) and find out how many kids the driller has that are enrolled in the company's health plan." This is probably why those fields get put in as a default. (BTW, you're right, they don't need to be there.)

rgz said...

Amen.

Funny thing is, I was just trying to post that short five-character comment but couldn't. Google refused to sign me in, saying I need to enable cookies, which are enabled. No other site gives me problems; I can even log in from Google's search page. My guess is that Google is trying to coerce me into enabling third parties to read my cookies.

Also, while I was on my Google account page, I was reminded that although I asked for an App Engine account to toy around with, I haven't done so because they want my cellphone number. Why do they need it?

This data mining really puts me off.

Unknown said...

That's totally configurable on eventbrite -- you can do it through the event management interface. You can also add fields you do need (e.g., T-shirt size), and set which fields are visible to everyone, e.g., if you want to show who's attending and what companies they represent, or you want to hide that, or whatever.

boredandblogging said...

Skud is right. I think the absolute minimum information required by eventbrite is an email address.

If more information was required, someone at PyOhio set it up that way.