Archives For Data

Disclaimer: All thoughts are my own and do not reflect the views of my employer in any way.


Telegram saw 8M downloads the day WhatsApp was acquired by Facebook. This is not new. When Parse was acquired by Facebook, the blogs rushed to write about why this was great for its rivals. StackMob accelerated its Parse migration pipeline and shipped it in just a weekend! There was outrage when Tumblr was acquired by Yahoo!


There are, of course, several reasons for such reactions, and they differ slightly in each case. Early adopters get so entrenched in their favorite platforms that there is a sense of ownership – when a drastic change occurs, it feels like their trust has been misplaced or that they have been betrayed. But beyond all this, there is another challenge, as we are seeing in the WhatsApp-Facebook situation – why should Facebook have all my data? This alone splits the market in terms of data ownership.

When Facebook published its recent upgrade to the Android app, it asked for permission to read SMS. As much as I like to be on top of the world of mobile apps, I said no to the upgrade on my primary phone. I had enough secondary devices on which I don’t use SMS to try out the new app! To this day, my Facebook app remains at v3.9 on my primary phone! The thought of Facebook reading my text messages just did not sit well with me. Of course, now I’m faced with the question of whether or not to keep using WhatsApp! (Just to remove any ambiguity, I fully plan on continuing to use WhatsApp, unless Facebook decides to mess with it like LinkedIn did with Pulse!)


Recently, a friend to whom I recommended SwiftKey said that he did not agree to SwiftKey learning from his Gmail – they had no business knowing the content of his emails! My argument that personalization can shave minutes off typing a single email did not manage to convince him.

So, what exactly is behind these strong feelings about who can or cannot read the various parts of our data? Mostly, just personal principles. For most people, when it comes down to it, as long as the data is “secure” and “private”, who reads it makes no practical difference, and they only stand to benefit from all the personalization it can enable. However, there are two problems – we don’t always believe the data is, in fact, “secure” and “private”, and we have our biases about which companies we love and trust.

But the knee-jerk reaction to these acquisitions tells a very interesting story. In reality, we are faced with this particular challenge:

Do I want to give more of my data to the bigger companies that can aggregate various types of data to learn all kinds of crazy things about me? Or, do I want to give my data to a small startup that has no resources to even consider implementing security correctly? 

This is a very difficult conundrum, particularly because “implementing security correctly” is a non-trivial task that most developers are quite bad at by default. When you are big, you have a responsibility to keep the data secure – way more so than we can imagine. When you are small, there is no real upside to spending the time on security. Thinking about it from an architectural perspective and getting the pieces right slows down development. All the security holes in Snapchat and other small apps are testimony to this. Security gaps happen even in big companies, where this is taken seriously and experts are hired to ensure correctness. One can imagine why it is more or less just “winged” in the smaller ones. This is not a reflection of anything in particular – when you are a startup, it is often just a lack of resources to focus on everything.

I don’t particularly have an answer to this conundrum. But, as a user, convenience trumps everything – which means that I will use an app from a small startup if it does the right things to make my life simpler. That said, I generally have fewer issues with giving up my privacy to the bigger companies – the value that personalization can bring is huge and I’m looking forward to it!


For an article that deems it provocative to question big data, the GigaOm post on big data today has a pretty provocative title of its own.  Provocative enough to have gotten me to read the article.  But, reading this reminded me of some other articles I’ve read that have been littered with buzzwords.  

I see this regularly in tech articles – a set of loosely related jargon terms that are all conflated, without actually making much sense. In this particular case, the conflation is among data, databases/data storage, algorithms/data analysis methods and (predictive) models. To put actual data (e.g., the social media data that the article talks about), data computational paradigms such as Hadoop, and models all at the same level, as if there were some comparison to draw among them, is glaringly ignorant. The writer claims that big data might be more about automation than insights – and even provides a link to another article (written by himself) that supports that theory!

Part of the problem here is of course that the term “big data” has been miserably overloaded. So much so that “data” in that term is only incidental. It is almost like a discipline now. When we see articles like this one, it is time for a reset on the terminology. It is partly because of this type of hype that we often get customers and others thinking that machine learning is “magic”, and that it involves a black box taking a bunch of data and churning out magical insights.

To be fair, not all of that article is useless – it makes a couple of valid points, especially around knowing expectations before setting out on a big data project.  It is true that you will get very little out of an exploration of data without objectives.  Knowing what you are looking to understand from the data is key to getting somewhere with it.  

When it comes to terminology, the tech world hasn’t traditionally been strong. Terms like “big data” and “software defined X” have made it possible to turn anything into a topic related to the hottest buzzword in the industry. It would be good to see better-quality writing from famous professional blogs like GigaOm – but such things are par for the course!