big data: a definition

People often complain, justifiably, that “big data” is a catchy phrase, not a real concept. And yes, it certainly is hot, but that doesn’t mean you can’t come up with a useful definition that can guide research. Here is my definition. Big data is data that has the following properties:

  • Size: The data is “large” compared with the data normally used in social science. Most surveys include only a few thousand respondents. The World Values Survey, probably the largest conventional data set used by social scientists, has about two hundred thousand people in it. “Big data” starts in the millions of observations.
  • Source: The data is generated through use of the Internet: email, social media, websites, etc.
  • Natural: It is generated through routine daily activity (e.g., email or Facebook likes). It is not, primarily, created in the artificial environment of a survey…

