The Data Science Terminology from a Personal Experience | Coursera Community

The Data Science Terminology from a Personal Experience

  • 29 July 2019
  • 3 replies

Hello everybody.

We can easily find many definitions of "Data Science" from different points of view. The aim of this topic is to gather insights from Data Scientists about how they understand the concept of DS, based on their personal experience in the field.

My question is here:
How can you explain the terminology of Data Science based on your practical experience and educational background? How did you get into Data Science for the first time? Is there any specific project that let you know that you are solving an issue through Data Science?

Looking forward to your fruitful answers. Thank you!

3 replies

Hi Mo!

I got into data science through a practical business route, working my way into a data analytics team, and from there I was able to develop my skills to a point where I was working with predictive analytics.

It's at this point that I consider I was doing 'data science' rather than data analysis, visualization or statistics: applying machine learning techniques to make predictions about the future, and really leveraging these over large amounts of data.

My first complete project in this area was making customer level predictions, in my case propensity to make a purchase in the next 7 days.

Looking forward to reading others' varied responses to your excellent question!

My background is in psychology (I have a doctoral degree) and business: first banking, then consulting. Most people think of psychology as being mainly a health profession (e.g. doing psychotherapy), but it is much more than that. Scientific research comprises a vast portion of the field. Psychometrics, in particular, is based heavily on statistics. Because humans are notoriously difficult to measure (in no small part because we humans are both 'the observer' and 'the observed'), research methods are highly sophisticated. So what I write below is heavily premised on these realities.

I have spent a career doing research that is methodologically and quantitatively sophisticated. My profession also requires me to adhere to a number of ethical standards not present in what is now known as data science.

I decided to pursue the professional data science certificate offered by IBM-Coursera because I wanted to learn newer tools, and because I hope that it will improve my career prospects.

Here are a few things I've learned (and a starter set of examples):

  • There is nothing about the procedures involved in 90% of DS that is new. What is new: the capacity to query databases in real time and to process large quantities of data, quickly and at vastly less cost. "Machine Learning," contrary to the material presented here, is not analogous to human learning. It consists largely of multivariate, inferential/predictive statistics, combined with what business schools call "DeSCi" (Decision Science), which is actually just a set of decision trees.
  • The 10% that is new is the addition of GIS data to the mix, and it's wonderful! I do hope that people using these data are aware that, from the standpoint of analysis, Zip codes are entirely meaningless. They exist for the sole purpose of facilitating the delivery of postal mail. The census block group, however, is a meaningful unit of analysis, and is vastly preferable. (Note: data is the plural of datum.)
  • The lexicon in DS is different. For instance, in most sciences, the levels of measurement are listed as nominal, ordinal, interval, and ratio. In DS, the terminology is different and not entirely overlapping (see: Wikipedia, Data Types). This was, and remains, somewhat confusing for me. One reason for this is, of course, that I was unfamiliar with this vocabulary when I began the certification track.
  • However, there is another reason that is, to my mind, more important: each level of measurement comes with a set of rules. For instance, you cannot perform analyses that depend on an interval level of measurement if your data are binary or nominal. There are also assumptions that need to be met, such as homoscedasticity and independence of samples (the train-test model is a case in point, since both analyses are run on one sample), and more. None of these were mentioned in the 9 courses I passed to attain this certification, and that worries me. Data Scientists seem to be unaware of the devastating effect that violating these assumptions can have on the accuracy of the results.
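To make the point about levels of measurement concrete, here is a minimal Python sketch. The `ALLOWED` table and the `summarize` helper are hypothetical names invented for this illustration, not part of any library or course, and the table is a simplified reading of the nominal/ordinal/interval/ratio rules. It only shows the idea that a statistic should be refused when the data's level of measurement does not support it.

```python
from statistics import mean, median, mode

# Hypothetical, simplified mapping: which summaries are meaningful
# at each level of measurement.
ALLOWED = {
    "nominal": {"mode"},
    "ordinal": {"mode", "median"},
    "interval": {"mode", "median", "mean"},
    "ratio": {"mode", "median", "mean"},
}

SUMMARIES = {"mode": mode, "median": median, "mean": mean}


def summarize(values, level, stat):
    """Compute `stat` only if it is meaningful for the given level."""
    if level not in ALLOWED:
        raise ValueError(f"unknown level of measurement: {level!r}")
    if stat not in ALLOWED[level]:
        raise ValueError(f"{stat!r} is not meaningful for {level} data")
    return SUMMARIES[stat](values)


# Zip codes look numeric, but they are nominal labels.
zip_codes = [10001, 10001, 94105]

most_common = summarize(zip_codes, "nominal", "mode")  # allowed

try:
    summarize(zip_codes, "nominal", "mean")  # an "average zip code" is refused
except ValueError as err:
    refusal = str(err)
```

The arithmetic itself would happily return an "average Zip code"; the guard exists because the number would mean nothing.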
Hope this is helpful!

Thank you @benpowis and @saraw1, I appreciate your insightful contributions.

My educational background is in Mathematics. I started my journey with DS after I realized that mathematical algorithms are the secret behind AI technologies, and I was curious to learn and passionate about building useful ML projects!

I'm interested to hear from others too!