The difference between Data Scientists and Data Engineers is often confused. Many organizations use the terms interchangeably, which further blurs the roles and responsibilities of each profile.
If one looks from the perspective of the Data Value Chain (as shown below), it becomes clear that Data Engineers are the ones who serve data to Data Scientists on a platter.
The work from Data Acquisition through Data Ingestion, Curation, Data Wrangling and Data Manipulation/Preparation falls within a Data Engineer's domain. Data Scientists use the data prepared by Data Engineers to build models. In practice, Data Scientists often end up doing the data manipulation themselves, which is a waste of their time. Data Scientists are responsible for the Analysis and Operationalization of the data, applying techniques including Machine Learning, Deep Learning and Artificial Intelligence.
Data Engineers deploy the models built by Data Scientists into production.
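The division of labour along the value chain can be sketched in a few lines of Python. This is only an illustrative toy, not a real pipeline: the sample data, the function names (`ingest`, `prepare`, `build_model`) and the threshold "model" are all assumptions made for the example.

```python
import csv
import io
from statistics import mean

# Hypothetical raw extract, with the kind of mess a Data Engineer cleans up.
RAW_CSV = """customer_id,monthly_spend,churned
1, 120.5 ,0
2,,1
3,80.0,1
4,200.0,0
"""

def ingest(raw_text):
    """Data Engineer: acquire and ingest the raw records."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def prepare(rows):
    """Data Engineer: curate and wrangle (drop incomplete rows, cast types)."""
    clean = []
    for row in rows:
        spend = row["monthly_spend"].strip()
        if not spend:
            continue  # skip records with a missing value
        clean.append({"monthly_spend": float(spend),
                      "churned": int(row["churned"])})
    return clean

def build_model(prepared):
    """Data Scientist: a toy model, a threshold halfway between class means."""
    churn = [r["monthly_spend"] for r in prepared if r["churned"] == 1]
    stay = [r["monthly_spend"] for r in prepared if r["churned"] == 0]
    threshold = (mean(churn) + mean(stay)) / 2
    # Assumes churners spend less; predicts 1 (churn) below the threshold.
    return lambda spend: 1 if spend < threshold else 0

prepared = prepare(ingest(RAW_CSV))   # engineer's side of the chain
predict = build_model(prepared)       # scientist's side of the chain
```

In this sketch the Data Scientist's function only ever sees clean, typed records; everything before that point is the Data Engineer's responsibility.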
The skill set required for a Data Engineer is different from that of a Data Scientist. A Data Engineer needs Hadoop full-stack development skills along with familiarity with the various tools in the Hadoop ecosystem, whereas a Data Scientist mainly works with R or Python and should have a business mindset that enables them to solve business problems.
Therefore, my preference is to use the phrase 'Business Scientist' as opposed to 'Data Scientist' so that the distinction between a 'Data Engineer' and a 'Business Scientist' is conspicuous.
The term 'Data Scientist' has been around for almost a decade, whereas the term 'Data Engineer' is relatively new. There used to be roles such as ETL Developer, ETL Architect, Data Warehouse Developer, Data Architect, etc. Now all of these roles are combined into a common one known as 'Data Engineer'.