How Long Does It Take to Learn Data Science Fundamentals? - KDnuggets (2024)

This article discusses 3 levels of data science learning and the amount of time that needs to go into each. From 6 months to 4 years, this write-up covers a number of skills and how long it takes to acquire them.

By Benjamin Obi Tayo, Ph.D., KDnuggets on March 9, 2022 in Data Science


The time required to learn the fundamentals of data science can be classified into 3 main levels. These are approximate values only. The time required to gain a certain level of competence depends on your background and how much time you are willing to invest in your data science studies. Typically, individuals with a background in an analytic discipline such as physics, mathematics, science, engineering, accounting, or computer science need less time than individuals whose backgrounds are not complementary to data science.


1. Level 1 (Basic)

At level one, a data science aspirant should be able to work with datasets generally presented in comma-separated values (CSV) file format. They should have competency in data basics, data visualization, and linear regression.

1.1 Data Basics


Be able to manipulate, clean, structure, scale, and engineer data. Be skilled in using the pandas and NumPy libraries, and have the following competencies:

  • Know how to import and export data stored in CSV file format
  • Be able to clean, wrangle, and organize data for further analysis or model building
  • Be able to deal with missing values in a dataset
  • Understand and be able to apply data imputation techniques such as mean or median imputation
  • Be able to handle categorical data
  • Know how to partition a dataset into training and testing sets
  • Be able to scale data using scaling techniques such as normalization and standardization
  • Be able to compress data via dimensionality reduction techniques such as principal component analysis (PCA)
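
The competencies above can be sketched end to end in a few lines. This is only an illustrative sketch: the tiny housing-style dataset, its column names, and the choice of median imputation, one-hot encoding, and two principal components are all assumptions made for the example, not part of the original article.

```python
import io

import pandas as pd
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical CSV content; in practice pd.read_csv would be given a file path.
CSV_TEXT = """area,rooms,city,price
50,2,A,150
80,3,B,240
,3,A,200
120,4,B,360
65,,A,190
95,3,B,285
70,2,A,205
110,4,B,330
"""

# Import the data from CSV.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Deal with missing values: median imputation for the numeric columns.
for col in ["area", "rooms"]:
    df[col] = df[col].fillna(df[col].median())

# Handle categorical data: one-hot encode the "city" column.
df = pd.get_dummies(df, columns=["city"], dtype=float)

# Partition into training and testing sets.
X = df.drop(columns="price")
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Standardize: fit the scaler on the training set only, then reuse it.
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)

# Compress the features with principal component analysis (PCA).
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train_std)
X_test_pca = pca.transform(X_test_std)

# Export the cleaned data back to CSV (returned as a string here).
csv_out = df.to_csv(index=False)
```

Fitting the scaler and PCA on the training set alone, and only transforming the test set, keeps information from the test data out of the preprocessing step.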

1.2 Data Visualization


Be able to use data visualization tools, including Python’s matplotlib and seaborn packages and R’s ggplot2 package. Should understand the essential components of good data visualization:

  • Data Component: An important first step in deciding how to visualize data is to know what type of data it is, e.g., categorical data, discrete data, continuous data, time-series data, etc.
  • Geometric Component: Here is where you decide what kind of visualization is suitable for your data, e.g., scatter plot, line graphs, bar plots, histograms, Q-Q plots, smooth densities, boxplots, pair plots, heatmaps, etc.
  • Mapping Component: Here, you need to decide what variable to use as your x-variable and what to use as your y-variable. This is important especially when your dataset is multi-dimensional with several features.
  • Scale Component: Here, you decide what kind of scales to use, e.g., linear scale, log scale, etc.
  • Labels Component: This includes things like axes labels, titles, legends, font size to use, etc.
  • Ethical Component: Here, you want to make sure your visualization tells the true story. You need to be aware of your actions when cleaning, summarizing, manipulating, and producing a data visualization and ensure you aren’t using your visualization to mislead or manipulate your audience.
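
As a rough illustration of how these components map onto code, the sketch below uses matplotlib on synthetic data. The data, the axis names, and the choice of a log-log scatter plot are assumptions made for the example only.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Data component: synthetic continuous data following a noisy power law.
rng = np.random.default_rng(0)
x = rng.uniform(1, 100, size=50)
y = 3.0 * x**1.5 * rng.lognormal(mean=0.0, sigma=0.1, size=50)

fig, ax = plt.subplots(figsize=(5, 4))

# Geometric and mapping components: a scatter plot of y against x.
ax.scatter(x, y, label="synthetic observations")

# Scale component: log scales suit data spanning several orders of magnitude.
ax.set_xscale("log")
ax.set_yscale("log")

# Labels component: axis labels, title, and legend.
ax.set_xlabel("x (arbitrary units)")
ax.set_ylabel("y (arbitrary units)")
ax.set_title("Synthetic power-law data on log-log axes")
ax.legend()

# Render to an in-memory PNG; fig.savefig also accepts a file path.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
png_bytes = buffer.getvalue()
```

The ethical component has no single line of code: it shows up in choices such as stating the log scale in the title so readers are not misled about the size of differences.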

1.3 Supervised Learning (Predicting Continuous Target Variables)


Be familiar with linear regression and other advanced regression methods. Be competent in using packages such as scikit-learn (Python) and caret (R) for linear regression model building. Have the following competencies:

  • Be able to perform simple regression analysis using NumPy or Pylab
  • Be able to perform multiple regression analysis with scikit-learn
  • Understand regularized regression methods such as Lasso, Ridge, and Elastic Net
  • Understand non-parametric regression methods such as K-nearest neighbors regression (KNR) and support vector regression (SVR)
  • Understand various metrics for evaluating a regression model such as MSE (mean squared error), MAE (mean absolute error), and R² score
  • Be able to compare different regression models
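
A minimal sketch of these regression skills, assuming synthetic data with a known linear relationship. The coefficients, the regularization strengths, and the number of neighbors are arbitrary values picked for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data: y depends linearly on the first two of three features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 + rng.normal(scale=0.1, size=200)

# Simple regression with NumPy: fit y against the first feature only.
slope, intercept = np.polyfit(X[:, 0], y, deg=1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Multiple, regularized, and non-parametric regression with scikit-learn.
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.01),
    "knn": KNeighborsRegressor(n_neighbors=5),
}

# Compare the models on the same held-out test set.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "mse": mean_squared_error(y_test, y_pred),
        "mae": mean_absolute_error(y_test, y_pred),
        "r2": r2_score(y_test, y_pred),
    }

for name, scores in results.items():
    print(f"{name:>6}: MSE={scores['mse']:.4f}  MAE={scores['mae']:.4f}  R2={scores['r2']:.4f}")
```

Evaluating every model on the same train/test split is what makes the comparison meaningful; on real data, cross-validation would give a more stable picture than a single split.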


2. Level 2 (Intermediate)

In addition to the skills and competencies in Level 1, a learner should have competencies in the following:

2.1 Supervised Learning (Predicting Discrete Target Variables)


Be familiar with binary classification algorithms such as:

  • Perceptron classifier
  • Logistic Regression classifier
  • Support Vector Machines (SVM)
  • Be able to solve nonlinear classification problems using kernel SVM
  • Decision tree classifier
  • K-nearest neighbors classifier
  • Naive Bayes classifier
  • Understand several metrics for assessing the quality of a classification algorithm, such as accuracy, precision, sensitivity (recall), specificity, F1 score, confusion matrix, and ROC curve
  • Be able to use scikit-learn for model building
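
The sketch below compares a linear classifier with a kernel SVM on a nonlinear problem and computes several of the metrics listed above. The two-moons dataset and the default hyperparameters are assumptions chosen to keep the example short.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A synthetic binary problem whose classes are not linearly separable.
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

classifiers = {
    "logistic_regression": LogisticRegression(),
    "rbf_svm": SVC(kernel="rbf", gamma="scale"),  # kernel SVM
}

reports = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    # For binary labels, ravel() yields the counts in this order.
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    reports[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),  # same as sensitivity
        "specificity": tn / (tn + fp),
        "f1": f1_score(y_test, y_pred),
        "sensitivity_from_counts": tp / (tp + fn),
    }

for name, report in reports.items():
    print(name, {k: round(float(v), 3) for k, v in report.items()})
```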

2.2 Model Evaluation and Hyperparameter Tuning

  • Be able to combine transformers and estimators in a pipeline
  • Be able to use k-fold cross-validation to assess model performance
  • Know how to debug classification algorithms with learning and validation curves
  • Be able to diagnose bias and variance problems with learning curves
  • Capable of addressing overfitting and underfitting with validation curves
  • Know how to fine-tune machine learning models via grid search
  • Understand how to tune hyperparameters via grid search
  • Be able to read and interpret a confusion matrix
  • Be able to plot and interpret a receiver operating characteristic (ROC) curve
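
One way these pieces fit together in scikit-learn is sketched below: a pipeline, k-fold cross-validation, a grid search, and ROC data on a held-out set. The breast cancer dataset bundled with scikit-learn, the logistic regression estimator, and the grid of `C` values are assumptions made for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import (
    GridSearchCV,
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Combine a transformer and an estimator in a single pipeline.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Assess performance with stratified 5-fold cross-validation.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(pipe, X_train, y_train, cv=cv)

# Tune the regularization strength C via grid search.
# make_pipeline names each step after its lowercased class name.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")
search.fit(X_train, y_train)

# ROC curve and area under it, using the refitted best model.
probs = search.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, probs)
auc = roc_auc_score(y_test, probs)

print("CV accuracy per fold:", cv_scores.round(3))
print("Best parameters:", search.best_params_)
print("Test ROC AUC:", round(float(auc), 3))
```

Because scaling lives inside the pipeline, it is refit on each training fold, so the cross-validation scores are not contaminated by the validation fold. The `fpr` and `tpr` arrays can be passed directly to a plotting call to draw the ROC curve.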

2.3 Combining Different Models for Ensemble Learning

  • Be able to use the ensemble method with different classifiers
  • Be able to combine different algorithms for classification
  • Know how to evaluate and tune the ensemble classifier
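
A minimal sketch of an ensemble built from different classifiers, assuming scikit-learn's `VotingClassifier` with soft voting on the bundled iris dataset. The choice of base classifiers and of the parameters being tuned is illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Combine three different algorithms; soft voting averages their
# predicted class probabilities.
ensemble = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ],
    voting="soft",
)

# Evaluate the ensemble with 5-fold cross-validation.
baseline_scores = cross_val_score(ensemble, X, y, cv=5)

# Tune members of the ensemble through the "<name>__<parameter>" syntax.
param_grid = {
    "tree__max_depth": [1, 2, 3, None],
    "knn__kneighborsclassifier__n_neighbors": [3, 5, 7],
}
search = GridSearchCV(ensemble, param_grid, cv=5)
search.fit(X, y)

print("Baseline CV accuracy:", round(float(baseline_scores.mean()), 3))
print("Best parameters:", search.best_params_)
print("Best CV accuracy:", round(float(search.best_score_), 3))
```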


3. Level 3 (Advanced)

Be able to work with advanced datasets such as text, images, voice, and videos. In addition to the Basic and Intermediate skills, should have the following competencies:

  • Clustering Algorithm (Unsupervised Learning)
  • K-means
  • Deep Learning
  • Neural Networks
  • Keras
  • TensorFlow
  • PyTorch
  • Theano
  • Cloud Systems (AWS, Azure)
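
As a small taste of the unsupervised side of this list, the sketch below runs K-means with scikit-learn on synthetic, well-separated blobs. The cluster centers, the number of clusters, and the use of the adjusted Rand index as a check are assumptions made for the example; real datasets rarely come with ground-truth labels to compare against.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic 2-D data drawn around three hypothetical cluster centers.
centers = [(-5, -5), (0, 5), (5, -5)]
X, true_labels = make_blobs(
    n_samples=300, centers=centers, cluster_std=0.6, random_state=0
)

# K-means needs the number of clusters up front; labels are never used to fit.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# Agreement between found clusters and the generating labels
# (1.0 means identical partitions, regardless of label numbering).
ari = adjusted_rand_score(true_labels, cluster_labels)

print("Cluster centers:\n", kmeans.cluster_centers_.round(2))
print("Adjusted Rand index:", round(float(ari), 3))
```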


In summary, we’ve discussed the 3 levels of data science. Level 1 competencies can be achieved within 6 to 12 months. Level 2 competencies can be achieved within 7 to 18 months. Level 3 competencies can be achieved within 18 to 48 months. It all depends on the amount of effort invested and the background of each individual.



Benjamin O. Tayo is a Physicist, Data Science Educator, and Writer, as well as the Owner of DataScienceHub. Previously, Benjamin was teaching Engineering and Physics at U. of Central Oklahoma, Grand Canyon U., and Pittsburgh State U.

