Exploring the Day-To-Day Tasks of a Data Scientist.

  • Writer: Tamia - Ashley
  • Jan 29, 2024
  • 2 min read

As a beginner in the realm of data science, facing a data science problem or project can be very intimidating. The question "Where do I start?" and the phrase "Make it make sense" always come to mind. So for our first entry into data science, let us dive into the tasks a data scientist performs in order to solve a problem.


The Day-to-Day Tasks of a Data Scientist:

1. Data Management

- Here we collect and retrieve the data we need, securely, efficiently, and cost-effectively.

- The data can be collected from many sources, including X (formerly Twitter), Flipkart, media, sensors, and websites.

2. Data Integration + Transformation

- Also known as the Extract, Transform and Load phase.

- The data is extracted (from flat files, data cubes, relational databases, etc.) and stored in a central repository.

- Data transformation is the process of changing the values, structure and format of the data.

- The transformed data is then loaded back into the data warehouse.
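
To make Extract, Transform and Load a little more concrete, here is a minimal sketch using only Python's standard library. Everything in it is an assumption made for illustration: the tiny CSV text stands in for a flat file, the `weather` table and its columns are made up, and an in-memory SQLite database plays the role of the data warehouse. A real pipeline would likely use dedicated tools.

```python
import csv
import io
import sqlite3

# Hypothetical raw data standing in for a flat file (not from a real source).
RAW_CSV = """city,temp_f
Kingston,86
Toronto,50
London,59
"""


def extract(text):
    """Extract: read rows from CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: change value and format (Fahrenheit text -> Celsius number)."""
    return [
        (row["city"].strip(), round((float(row["temp_f"]) - 32) * 5 / 9, 1))
        for row in rows
    ]


def load(records, connection):
    """Load: store the transformed records in a central table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS weather (city TEXT, temp_c REAL)"
    )
    connection.executemany("INSERT INTO weather VALUES (?, ?)", records)
    connection.commit()


# An in-memory database is used here as a stand-in for a data warehouse.
connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)

warehouse_rows = connection.execute(
    "SELECT city, temp_c FROM weather ORDER BY city"
).fetchall()
print(warehouse_rows)
```

Keeping the three steps as three small functions mirrors the phase names, which makes it easier to see where each part of the work happens.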

3. Data Visualization

- The process of creating graphical representations of data and information.

- This can be done in the form of charts, plots, maps, and even animations.

- Data Visualization is an important step as it conveys data more effectively.

4. Model Building

- This step is where we train a model on the data, using suitable machine learning algorithms to learn and analyze patterns in it.
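
If "training" sounds abstract, here is a minimal sketch of one of the simplest models there is: a straight line fitted with ordinary least squares. The study-hours data and the names `fit_line`, `hours` and `scores` are made up for illustration, and real projects would usually reach for a machine learning library rather than hand-written math.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)
print(f"score = {slope:.1f} * hours + {intercept:.1f}")
```

The "pattern" the model learns here is just two numbers, the slope and the intercept, which can then be used to predict a score for a new number of hours.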

5. Model Deployment

- Model deployment involves integrating the model into a production environment.

- The model is made available to other applications through APIs (Application Programming Interfaces), so its predictions can be used to make data-based decisions.
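
As a rough sketch of what "a model behind an API" can look like, the function below takes the JSON body of a request and returns a status code plus a JSON response. The `MODEL` values, the `hours` field and the `handle_request` name are all hypothetical, and in a real deployment a web framework or a managed service would sit in front of a function like this.

```python
import json

# Hypothetical "trained model": a slope and intercept assumed for illustration.
MODEL = {"slope": 2.0, "intercept": 50.0}


def predict(hours):
    """Apply the model to a single input value."""
    return MODEL["slope"] * hours + MODEL["intercept"]


def handle_request(body):
    """Handle the JSON body of a hypothetical prediction request.

    Returns a (status_code, json_text) pair, like a tiny API endpoint.
    """
    try:
        payload = json.loads(body)
        hours = float(payload["hours"])
    except (ValueError, KeyError, TypeError):
        # Bad JSON, a missing field, or a value that is not a number.
        return 400, json.dumps({"error": 'expected JSON like {"hours": 3}'})
    return 200, json.dumps({"predicted_score": predict(hours)})


status, response = handle_request('{"hours": 3}')
print(status, response)
```

The point of the API layer is that the caller never touches the model directly: it sends data in an agreed format and gets a prediction back.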

6. Model Monitoring + Assessment

- Model Monitoring tracks deployed models using tools like Fiddler.

- Model Assessment uses evaluation metrics such as the F1 Score, True Positive Rate, and the Sum of Squared Error to check the model's accuracy, fairness and robustness.
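
The three metrics named above are easier to remember once you have computed them by hand. Here is a minimal sketch in plain Python; the label and prediction lists are made-up examples, and the function names are my own rather than any library's.

```python
def confusion_counts(actual, predicted):
    """Count true positives, false positives and false negatives (labels 0/1)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, fn


def true_positive_rate(actual, predicted):
    """Share of actual positives that the model caught (also called recall)."""
    tp, _, fn = confusion_counts(actual, predicted)
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(actual, predicted):
    """Harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(actual, predicted)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def sum_of_squared_error(actual, predicted):
    """Total squared gap between true and predicted numbers (for regression)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


# Hypothetical classification results (1 = positive, 0 = negative).
actual_labels = [1, 1, 1, 0, 0, 1]
predicted_labels = [1, 0, 1, 0, 1, 1]

# Hypothetical regression results.
actual_values = [3.0, 5.0, 7.0]
predicted_values = [2.5, 5.0, 8.0]

print(true_positive_rate(actual_labels, predicted_labels))  # 0.75
print(f1_score(actual_labels, predicted_labels))  # 0.75
print(sum_of_squared_error(actual_values, predicted_values))  # 1.25
```

F1 Score and True Positive Rate apply to classification models, while the Sum of Squared Error applies to models that predict numbers.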



These tasks are supported by the following:

* Data Asset Management

  • This is a platform for organizing and managing the data.

  • It supports replication, backup and access rights management.

* Code Asset Management

  • A unified view where we manage an inventory of assets.

  • Developers use versioning to track and manage changes to a software project's code.

  • Collaboration allows diverse people to share and update the same project together.

  • Example: GitHub

* Development Environments

  • Integrated Development Environments (IDEs) provide a workspace and tools to work on source code.

  • IDEs like IBM Watson Studio provide testing and simulation tools to emulate the real world so we can see how the code will behave after it is deployed.

* Execution Environments

  • An execution environment has libraries to compile the code and system resources to execute and verify it.

  • Cloud-based execution environments are not tied to any specific hardware or software.

  • Example: IBM Watson Studio.





If you are feeling overwhelmed right now, I get it. It's a lot of words and a lot to digest. But these are the day-to-day tasks of a data scientist when going about solving a problem. Breaking down the work of a data scientist is an effective way to remember the steps we should take when presented with a new project.


Let's utilize ChatGPT to create a fun mnemonic for the acronym DDDMMM using a memorable phrase:

"Daring Dragons Dance Merrily, Making Marvels!"

Each word in the phrase corresponds to one of the elements in the DDDMMM acronym:

  • Daring: Data Management

  • Dragons: Data Integration + Transformation

  • Dance: Data Visualization

  • Merrily: Model Building

  • Making: Model Deployment

  • Marvels: Model Monitoring + Assessment

Now, you can visualize daring dragons dancing merrily, making marvels to recall the key components of DDDMMM!


And...

"Daring Coders Develop Excellent Marvels!"


Each word in the phrase corresponds to one of the elements in the DCDE acronym:

  • Daring: Data Asset Management

  • Coders: Code Asset Management

  • Develop: Development Environments

  • Excellent: Execution Environments

  • Marvels: Represents the capabilities of these tools, supporting tasks highlighted in the acronym DDDMMM

Now, you can imagine daring coders developing excellent marvels while utilizing the Data Asset Management, Code Asset Management, Development Environments, and Execution Environments provided by the DCDE platforms!


Next week we will dive into the Software Tools that we can use to support and perform these tasks.


Thanks for reading with me. Can't wait to see you again next week.

XOXO

-Your Ditsy Data Scientist

Tamia-Ashley

