Archives For algorithms


The Internet has made the world a global village, one where it no longer matters where you live to be connected with people. It takes less time to share your thoughts with people who are with you digitally than with those you may run into physically. Location-based personalization aside, everyone around the world can read the same news, get the same results when they search on a topic, see the same updates on Facebook, and so on. What exactly is this doing to our diversity?

In his TED Talk on filter bubbles, Eli Pariser discusses how the Internet may be killing our diversity of opinion. The more a page gets viewed, the higher its rank gets; the higher its rank, the earlier it appears in search results; the earlier it appears, the more it gets viewed. This feedback loop can certainly be a diversity killer. It is more of an issue with social opinions and content: nobody wants to be that guy (or gal) who stands out with a controversial opinion. I do wonder how well Quora’s algorithms are able to surface under-viewed yet good content. The reality is that the more upvotes an answer gets, the more likely it is to keep getting upvotes in the future. Facebook and G+ are no exceptions. Our friends’ likes on a picture make us want to stop and look at it, and more often than not, we may end up liking it too.
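
The loop described above can be sketched as a small simulation. This is only an illustration under a simplifying assumption: each visitor picks a page with probability proportional to its current view count. That is not how any real search engine ranks results, and the function names and parameters are made up for this sketch.

```python
import random


def simulate_feedback_loop(num_pages=10, num_visits=10_000, seed=0):
    """Simulate visitors who favor pages that already have more views.

    Assumption for illustration: the chance of a page being picked is
    proportional to its current view count.
    """
    rng = random.Random(seed)
    views = [1] * num_pages  # every page starts out equal, with one view
    for _ in range(num_visits):
        # More views -> higher rank -> more likely to be seen and picked.
        chosen = rng.choices(range(num_pages), weights=views, k=1)[0]
        views[chosen] += 1
    return views


def top_share(views):
    """Fraction of all views held by the single most-viewed page."""
    return max(views) / sum(views)


views = simulate_feedback_loop()
print(sorted(views, reverse=True))
print(f"share of views held by the top page: {top_share(views):.0%}")
```

Even though every page starts out identical, early random luck gets amplified, so the final view counts typically end up far from even.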

Let’s look at the physical world here. This phenomenon was certainly always present, but it was localized. The Internet has taken a local phenomenon and made it global. Is this a problem? In more dimensions than we can imagine, this is generally a good thing. It has reconnected us with lost friends and has made the world a smaller place. But the culprit here seems to be the increasing consumption of content online. We used to have several sources of content: newspapers, magazines, television, etc. Increasingly, it is all converging online. Our ranking algorithm used to be word-of-mouth recommendation. A friend asked us to check something out; in the process, we found something else and asked someone else to check that out. There was scope for interesting discovery. We talked about opinions in smaller circles, where there was room to hold a different opinion without being the loner.

Now we are online and our opinions are too. When we say something, it is visible to a large audience, all at once (unless we have the extraordinary patience to compartmentalize our audience).

Are we slowly killing the power of having different points of view? If we are, that would also kill creativity and become a threat to innovation. Before that happens, our algorithms need a measure of what is interesting and good that is independent of likes, views, and votes, so that we can sometimes take the road less traveled.
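
One way to picture such a measure is a ranking that blends popularity with a separate, vote-independent score. The sketch below is purely hypothetical: the `quality` field stands in for whatever independent signal one might have (an editorial rating, say), and `quality_weight` is an invented knob, not something any of the sites mentioned here is known to use.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    votes: int
    quality: float  # hypothetical vote-independent score in [0, 1]


def blended_rank(items, quality_weight=0.5):
    """Order items by a mix of normalized votes and the independent score.

    quality_weight=0 ranks purely by votes; quality_weight=1 ignores votes.
    """
    max_votes = max((item.votes for item in items), default=0) or 1

    def score(item):
        popularity = item.votes / max_votes
        return (1 - quality_weight) * popularity + quality_weight * item.quality

    return sorted(items, key=score, reverse=True)


items = [
    Item("popular but shallow", votes=900, quality=0.2),
    Item("overlooked gem", votes=12, quality=0.9),
    Item("solid and well known", votes=400, quality=0.7),
]

for weight in (0.0, 0.8):
    print(weight, [item.title for item in blended_rank(items, weight)])
```

With the weight at zero, the most-voted item stays on top; as the weight rises, the overlooked item gets a chance to surface.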



For an article that deems it provocative to question big data, the GigaOm post on big data today has a pretty provocative title of its own. Provocative enough to have gotten me to read the article. But reading it reminded me of other articles I’ve read that were littered with buzzwords.

I see this regularly in tech articles: a set of loosely related jargon terms, all conflated, without making much sense. In this particular case, the conflation is among data, databases/data storage, algorithms/data analysis methods, and (predictive) models. To put actual data (e.g., the social media data that the article talks about), computational paradigms such as Hadoop, and models all at the same level, as if there were some comparison to draw among them, is glaringly ignorant. The writer claims that big data might be more about automation than insights, and even provides a link to another article (written by himself) that supports that theory!

Part of the problem here is of course that the term “big data” has been miserably overloaded. So much so that the “data” in that term is only incidental. It is almost like a discipline now. When we see articles like this one, it is time for a reset on the terminology. It is partly because of this type of hype that we often get customers and others thinking that machine learning is “magic”, a black box that takes a bunch of data and churns out magical insights.

To be fair, not all of that article is useless – it makes a couple of valid points, especially around knowing expectations before setting out on a big data project.  It is true that you will get very little out of an exploration of data without objectives.  Knowing what you are looking to understand from the data is key to getting somewhere with it.  

When it comes to terminology, the tech world hasn’t traditionally been strong. Terms like “big data” and “software defined X” have made it possible to turn anything into a topic related to the hottest buzzword in the industry. It would be good to see better-quality writing from famous professional blogs like GigaOm, but such things are par for the course!