
I used the dataset with my students because it is small and does not require preprocessing such as dummy coding or handling missing values. Students also brought the racial issue to my attention, and it led to a bit of a discussion. We eventually decided to simply change the definition to "birds by town" and moved on.

Think of all the children's books that get rewritten. Read the new ones to your children and discuss the old ones when they are teenagers. I would have preferred that the sklearn contributors had done the same and simply revised the description instead of removing the dataset.

EDIT: changed "banning" to "removing" the dataset



Can you really call this "banning the dataset"? https://github.com/scikit-learn/scikit-learn/commit/8a86e219...


This is an impressively responsible way of handling the situation, and I'd recommend that others read it as well. It identifies the specific problem with the dataset which led to its removal from the library (with references!), tells the user how to retrieve it if they really need it, and suggests alternatives.


Does it make sense to revise the definition of a real-world data series to some arbitrary definition like "birds by town"?



