Data Science versus Bioinformatics? What are the differences? | Coursera Community
  • 9 November 2018
  • 20 replies
  • 2174 views

Userlevel 6
Badge +12
Hello everyone,

I was inspired by @richk, who asked the question "What is Data Science?" here: https://coursera.community/data-science-8/what-is-data-science-106 and has not yet received an answer. 😉

I am in clinical science. I naturally collect data as part of my work and analyse it with various statistical methods and some complex mathematical modeling for predictive purposes. Some people have categorized what I do as "Bioinformatics", but according to others, that categorization is debatable. Since both Bioinformatics and Data Science have advanced tremendously in the last decade, I am no longer sure which is which. Could I be doing Data Science as well as Bioinformatics without knowing it? 🙂

It would be great to receive some opinions and comments from those of you who know these fields: about their differences, and about what qualifies as Data Science or as Bioinformatics.

I was not sure whether such a question has only one answer, so I am posting it as a discussion to keep it open for everyone to comment freely. Thank you in advance.

Userlevel 5
Badge +5
It sounds like they are very similar. But in Data Science, models are trained on pairings of target data (the thing you are trying to predict) and input data (data that has some relation to the target parameter). A model is generated by running training data through an algorithm, and that model is then used to make predictions. This differs from traditional mathematical models, which make predictions based on known relationships.
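To make the distinction concrete, here is a minimal Python sketch under illustrative assumptions: the "known relationship" is the Celsius-to-Fahrenheit formula, and the (input, target) pairs are synthetic values generated from that formula, not real measurements. The data-driven model is never told the formula; it recovers the relationship from the pairs by ordinary least squares. All function names are hypothetical.

```python
def known_model(celsius):
    # Traditional model: the relationship is known in advance.
    return 1.8 * celsius + 32


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Data-driven model: learn the relationship from (input, target) pairs.
inputs = [-10, 0, 10, 20, 30]
targets = [known_model(c) for c in inputs]  # synthetic targets, for illustration only
slope, intercept = fit_line(inputs, targets)


def learned_model(celsius):
    # Prediction from the fitted parameters, not from the known formula.
    return slope * celsius + intercept


print(f"learned: F = {slope:.2f} * C + {intercept:.2f}")
```

With real, noisy measurements the learned slope and intercept would only approximate the underlying relationship, which is why such models are checked on data they were not trained on.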

Are the complex mathematical models you refer to making predictions based on known relationships, or are they purely data-driven?
Userlevel 6
Badge +12
Thank you @Liz. The models are created from the data I collect and its statistical significance; I build the mathematical models for predictive purposes from that data. So would that mean they make data-driven predictions?
Userlevel 5
Badge +5
What process do you follow to create the models? Could you give an example?
Userlevel 6
Badge +12
@Liz I run the data through an algorithm and, at this stage, check the significance of the results for predictive purposes against known outcomes. If the models succeed on the known cases, they could be used to predict unknown outcomes in the future.
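The workflow described here (fit on cases with known outcomes, check performance, then apply to unknown cases) can be sketched as a simple hold-out check. This is a minimal illustration under stated assumptions: one measurement per case, a single cut-off classifier standing in for whatever algorithm is actually used, and synthetic values rather than clinical data. `fit_threshold` and `accuracy` are hypothetical names.

```python
def accuracy(cutoff, values, outcomes):
    """Fraction of cases where 'value >= cutoff' matches the known outcome (0 or 1)."""
    correct = sum((v >= cutoff) == bool(y) for v, y in zip(values, outcomes))
    return correct / len(values)


def fit_threshold(values, outcomes):
    """Pick the cut-off that best separates outcome 0 from outcome 1 on the training cases."""
    best_cutoff, best_acc = None, -1.0
    for cutoff in sorted(set(values)):
        acc = accuracy(cutoff, values, outcomes)
        if acc > best_acc:
            best_cutoff, best_acc = cutoff, acc
    return best_cutoff


# Synthetic cases with known outcomes, split into a training set and a held-out set.
train_values = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
train_outcomes = [0, 0, 0, 0, 1, 1, 1, 1]
test_values = [2.5, 5.5, 6.5, 8.5]
test_outcomes = [0, 1, 1, 1]

cutoff = fit_threshold(train_values, train_outcomes)
# Evaluate only on cases the model did not see during fitting.
held_out_acc = accuracy(cutoff, test_values, test_outcomes)
print(f"cut-off = {cutoff}, held-out accuracy = {held_out_acc:.2f}")
```

Only if the held-out accuracy is acceptable would the fitted cut-off be applied to new cases whose outcomes are unknown.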
Userlevel 5
Badge +5
That sounds exactly like Data Science to me.
Userlevel 6
Badge +12
Thank you so much @Liz. My mentors will be happy to hear that. 🙂
Userlevel 2
Badge +1
Good questions!

First, my personal definition of data science: it is work that combines three fields to equal degrees: math, software and science. Missing skills in any one of them leads to insufficient results. The material of data science is primarily quantities.

Data Science can have different specialisations, depending on the field you work in. You say you do clinical data science, and I would stick with that term, as it describes the focus of your work.

Bioinformatics focuses on genomes and proteins as sequences. The material of bioinformatics is primarily strings. Strings are data, too, yet working with them differs greatly from data science in other fields, such as the work you are doing.
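As a small illustration of sequences as string material, the sketch below computes the GC content of a DNA string and counts its overlapping k-mers, turning strings into numbers that ordinary data-science tools could then analyse. The sequence is an arbitrary example, and the function names are hypothetical.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of bases that are G or C (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_counts(seq, k):
    """Count every overlapping substring of length k."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


dna = "ATGCGCATAT"
print(gc_content(dna))
print(kmer_counts(dna, 2).most_common(3))
```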

It is related to many other fields, such as computer science, biology, biochemistry, medicine, statistics, math and engineering. It makes a huge difference whether you use the tools, code them, or even develop new and better data models and algorithms.
Userlevel 1
Badge
As with all the new terms arising in the computer science scene, these fields are closely interconnected. Each has a little piece of the others.

Bioinformatics: uses statistics, machine learning and OTHER informatics techniques.

Data science: used in fields like bioinformatics and biostatistics, and in other areas as well.

So you can perfectly well be doing both. Of course, besides coding, an important part of data science is presenting conclusions. If you are just finding statistical relationships and testing hypotheses, you are mostly doing biostatistics. If instead you are trying to extract other kinds of value from data, then yes, this is data science!

Anyway, there are no concrete boundaries. The frontiers are really diffuse.
Userlevel 2
Badge +1
Hello Javier,

It's true that both the plumber and the carpenter use hammers and saws. Does this make them the same profession? Are the frontiers diffuse?

The focus of the work is different. The material is different. Even the hammers and saws are different.

Doing clinical research is not doing bioinformatics. I stress the difference in the material (numbers vs strings) and the difference in the tools. Saying you are a data scientist is like saying "I am a craftsman in data."

Fortunately, the borders are still not strictly regulated, so we are free to switch fields. That makes it all the more difficult for an employer to find the craftsman with the required skills and experience.
Userlevel 2
Badge +1
If you would like to get an idea of Bioinformatics, I am currently setting up a page where you can run all the algorithms I have implemented.

https://elmarhinz.pythonanywhere.com/challenges/

In case it is password protected:

login: visitor
password: welcome
Userlevel 1
Badge
Bioinformatics is much more than handling sequences and biological structures (or biochemical structures, my current area of work) with computational methods (that is, the "Bioinformatics for Dummies" book). High-Throughput Analyses, for example, appear in the syllabus of any MSc in bioinformatics you can look up, and they are broadly ANOVA, that is, linear models, that is, statistics. They are numbers, not just strings.
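To illustrate that high-throughput analyses are largely numerical, here is a minimal pure-Python sketch of the one-way ANOVA F statistic computed separately for each gene, as one might do across many genes in an expression study. The gene names and values are hypothetical, no p-values or multiple-testing correction are computed, and in practice a library routine such as `scipy.stats.f_oneway` (which also returns a p-value) or a dedicated package would normally be used.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square over within-group mean square.

    Assumes at least two groups and some variation within groups.
    """
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical expression values for two genes in a control group and a treated group.
expression = {
    "GENE_A": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),
    "GENE_B": ([2.0, 3.0, 4.0], [2.5, 3.0, 3.5]),
}

# The same linear-model test is repeated for every gene.
f_stats = {gene: one_way_anova_f(groups) for gene, groups in expression.items()}
for gene, f in sorted(f_stats.items()):
    print(f"{gene}: F = {f:.2f}")
```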

I don't think plumbing and carpentry is a good analogy. And I reiterate that the boundaries between bioinformatics and data science are not strictly delimited, precisely because bioinformatics can use data science, and data science, as you said, is data craftsmanship that can be used in bioinformatics or in any unrelated area. Bioinformatics and biostatistics are subareas of biomedical data science, so you can be doing both at the same time.

https://csumb.edu/bd2k

But I also think that debating the definitions of these modern "buzzwords" is a little sterile. For example, the meaning of bioinformatics has changed since its conception; such meanings change as science and technology advance.

Well, all this is just my opinion. Many people probably have different things to say about it.
Userlevel 2
Badge +1
Sure, you have to deal with numbers in Bioinformatics, too. Yet the problems are quite different from those in clinical research. In High-Throughput Analyses, the large amount of data is the critical factor in important areas of Bioinformatics today.

Developing clever algorithms like BLAST is Bioinformatics as well, and definitely not work for dummies. Again, this is an algorithm that addresses the challenge of huge data sizes. Bringing complex algorithms down to linear time and space is challenging, and the huge amount of data imposes exactly such constraints.
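Part of BLAST's speed comes from heuristics such as looking up short "words" (seeds) before doing any expensive alignment. The sketch below shows only that seed-finding idea, under simplifying assumptions: exact word matches of a fixed length `w`, and no extension, scoring or statistics, so it is not BLAST itself. It indexes the query's words in a hash table and then scans the subject sequence once; the function names are hypothetical.

```python
from collections import defaultdict


def build_word_index(query, w):
    """Map every length-w word of the query to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    return index


def find_seeds(query, subject, w):
    """Return (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    index = build_word_index(query, w)
    hits = []
    # One pass over the subject; each window is a hash lookup, not a full comparison.
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], ()):
            hits.append((i, j))
    return hits


hits = find_seeds("ACGTAC", "TTACGTTACGTAC", 4)
print(hits)
```

For a fixed word length, the scan grows roughly linearly with the subject length plus the number of hits. Consecutive hits on the same diagonal (the same `subject_pos - query_pos`) are the kind of seeds a full tool would then try to extend into alignments.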

In my opinion, the large amount of data is not the critical aspect in clinical research. I think you often have rather little data, and the primary question is how far you can draw valid predictions from this limited data. Denise may know better.

So I think the example of plumbing and carpentry isn't that bad after all. Different goals require different approaches and a different usage of the tools.
Userlevel 1
Badge
"Bioinformatics for Dummies" is just the title of a book; it has nothing to do with the algorithms being for "dummies" 😉 (I know the inner workings of the BLAST algorithm). It is an old book, and I guess this is why it only covers sequences and structures.

High-throughput analysis is another subject of bioinformatics (closely related to biostatistics), which deals with microarrays and next-generation sequencing (NGS).

Of course, the amount of data is not the critical aspect in clinical research, nor in bioinformatics. Sequence Analysis and High-Throughput Analysis are just two topics of bioinformatics that you can find in the curriculum of a bioinformatics education program. One of them is about strings (the algorithms you have implemented), but the other has much to do with numbers. Both of them are part of bioinformatics; that is my point.
Userlevel 6
Badge +12
Thanks so much @Elmar and @Javier for this stimulating discussion on the nature of Bioinformatics and for the references. Over the years I have also heard different opinions on the definition of Bioinformatics, depending on whom I consult, which was part of my incentive to start this thread. 🙂

There was a season when some experts tried to restrict the definition of Bioinformatics mainly to genomics and proteomics as @Elmar highlighted, but I think its definition and application have been a lot broader from much earlier on, and still are nowadays, as @Javier emphasized. The website http://www.bioinformatics.org/wiki/bioinformatics confirms this with the statement: "Bioinformatics has been defined many different ways, since practitioners do not always agree upon the scope of its use within the biological and computer sciences, but it is always considered a combination of both sciences, along with other contributing disciplines."

Wikipedia seems to have good coverage of Bioinformatics, including everyone's definition, I think: https://en.wikipedia.org/wiki/Bioinformatics 🙂

The website https://learn.org/articles/What_is_Bioinformatics.html simply says: "Bioinformatics is the application of information technology to the study of living things, usually at the molecular level. Bioinformatics involves the use of computers to collect, organize and use biological information to answer questions in fields like evolutionary biology." That seems rather broad as well.

Another website appears to concur: https://www.usfhealthonline.com/resources/key-concepts/what-is-bioinformatics/

I think this article has really good coverage of the history of Bioinformatics: http://theconversation.com/explainer-what-is-bioinformatics-9911

What do you all think? Is the information at these links accurate? If you have more links to share or further comments, please feel free to post them. Thank you.
Userlevel 2
Badge +1

There was a season when some experts tried to restrict the definition of Bioinformatics mainly to genomics and proteomics as @Elmar highlighted


I think you can't define Bioinformatics if you exclude the genomics and proteomics part. It's the core; the rest is the extension. If it is your profession to apply the tools of data science to bioinformatics result sets, then it is data science. You don't need to know anything about genomics and proteomics algorithms to do so. Likewise, applying data science to clinical result sets does not make you a Physician, though you may do it as a Physician.
Userlevel 6
Badge +12
@Elmar, I am not sure whether you are referring to me specifically with "If it is your profession to apply the tools of data science to bioinformatics result sets, then it is data science." For me, the answer is "no." I was not trying to call myself as a "data scientist" either with this discussion, just to clarify in case of any misconceptions. Definition of my profession is very clear. I was merely inquiring on current points of view on these topics...
Userlevel 2
Badge +1
I was not trying to call myself as a "data scientist" either with this discussion, just to clarify in case of any misconceptions. Definition of my profession is very clear. I was merely inquiring on current points of view on these topics...

Then what would you call your profession, if data science is your main occupation? What do you write on your tax declaration?
Userlevel 6
Badge +12
Data science or bioinformatics are not my main occupation @Elmar , They are part of it. I’m a clinical scientist or a biomedical scientist.
Userlevel 2
Badge +1
Data science or bioinformatics are not my main occupation @Elmar , They are part of it. I’m a clinical scientist or a biomedical scientist.

Every classical scientist is also a data scientist, as there is hardly a scientific field without numbers. Scientists were already data scientists before the era of the computer. You are like the Physician who also does data science, or like the Bioinformatics scientist who also does data science.

Going back to my original definition of Data Scientist as a modern profession: it is a person at the center of math, software and science.

This person knows more about math and software than the classical scientist, or more about science and software than the classical mathematician, and so on. On the other hand, they do not need full insight into all three fields, just enough to do the job well.
Userlevel 6
Badge +12
Yes, that sounds accurate @Elmar . Thank you for the explanations. 🙂
