66 Days of Data!

What is #66daysofdata?

#66DaysOfData was started by Ken Jee, a well-known YouTuber. For 66 days, you learn something related to data science every day for a minimum of 5 minutes (5 minutes because not everyone can free up a long stretch of time every day) and then share it on social media. The Data Science and Artificial Intelligence (DSAI) Society of IIIT Dharwad took up this idea and started the initiative for the students of IIIT Dharwad. The society intends to make students aware of the importance of data science and ML, and to help them in their learning journey.

For more information, you can check out the video here.

In this article, I will be documenting my day-to-day journey of #66daysofdata, starting 19th February, 2022.

Day 1 - 19th February, 2022

  • Revised the various EDA techniques used for Data Visualization.

Day 2 - 20th February, 2022

  • Today I learnt about the Singular Value Decomposition (SVD) technique and how it can be used to reduce the dimensionality of data.
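
As a rough illustration of the idea, here is a minimal sketch of truncated SVD with NumPy. The data is made up (5 features that really live in 2 dimensions) and all variable names are hypothetical; it only shows the mechanics and is not a recipe for real datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 samples, 5 features, built from only 2 latent factors.
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing

# Centre the columns, then factor X_centered = U @ diag(S) @ Vt.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Keep the top-k singular directions to get a k-dimensional representation.
k = 2
X_reduced = U[:, :k] * S[:k]         # shape (100, 2)
X_restored = X_reduced @ Vt[:k, :]   # mapped back to shape (100, 5)

# How much information was lost by dropping the remaining directions.
reconstruction_error = np.linalg.norm(X_centered - X_restored)
print(X_reduced.shape, reconstruction_error)
```

Because the made-up data has rank 2, keeping two components loses almost nothing here. On real data the singular values in `S` usually decay gradually, and `k` becomes a trade-off.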

Day 3 - 21st February, 2022

Day 4 - 22nd February, 2022

  • Read an article on the “Curse of Dimensionality Reduction” and why it is important to have fewer, well-chosen features instead of a huge number of dimensions. The various methods for dealing with this problem when the number of dimensions is huge are:

    • Using Cosine Similarity instead of the usual Euclidean Distance

    • Principal Component Analysis (PCA)

    • Dimensionality Reduction using Kernel PCA

    • Dimensionality Reduction using Random Projection

    • Dimensionality Reduction using LDA

    • Ref: Curse of Dimensionality Reduction
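
Two of the ideas from the list above can be sketched in a few lines. This is a minimal example assuming NumPy and scikit-learn are installed. The vectors, the random 50-dimensional data, and the helper `cosine_similarity` are hypothetical and used only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; ignores their lengths."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


a = np.array([1.0, 2.0, 3.0])
b = 10 * a  # same direction, ten times longer

euclidean = np.linalg.norm(a - b)   # large, because the lengths differ
cosine = cosine_similarity(a, b)    # close to 1, because the directions match
print(euclidean, cosine)

# PCA: project hypothetical 50-dimensional data onto 5 principal components.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
pca = PCA(n_components=5)
X_small = pca.fit_transform(X)
print(X_small.shape)  # (200, 5)
```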

Day 5 - 23rd February, 2022

  • Learnt about the various feature selection methods used in ML.
    • Correlation Matrix
    • Univariate Selection
    • ExtraTreesClassifier Method
    • Ref: Feature Selection
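
The three methods above could look roughly like this in scikit-learn. This is a sketch under assumptions: it uses the library's built-in breast cancer dataset, picks the top 5 features arbitrarily, and reads the "correlation matrix" step as ranking features by their correlation with the target.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, f_classif

data = load_breast_cancer()
X, y = data.data, data.target
names = np.array(data.feature_names)

# 1. Correlation matrix of all features plus the target (last row/column).
corr = np.corrcoef(np.column_stack([X, y]), rowvar=False)
corr_with_target = np.abs(corr[:-1, -1])
top_by_corr = names[np.argsort(corr_with_target)[::-1][:5]]

# 2. Univariate selection: keep the 5 features with the highest ANOVA F-score.
selector = SelectKBest(score_func=f_classif, k=5).fit(X, y)
top_by_univariate = names[selector.get_support()]

# 3. Tree-based importance from an ExtraTreesClassifier.
forest = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
top_by_trees = names[np.argsort(forest.feature_importances_)[::-1][:5]]

print(top_by_corr, top_by_univariate, top_by_trees, sep="\n")
```

The three rankings do not have to agree. Comparing them is one way to see which features are consistently useful.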

Day 6 - 24th February, 2022

  • Learnt how to handle imbalanced data:
    • Under-sampling the majority class
    • Over-sampling the minority class by duplication
    • Over-sampling the minority class using the Synthetic Minority Oversampling Technique (SMOTE)
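
A minimal NumPy sketch of the three strategies above, on made-up data with a 90/10 class split. The function `smote_like` is a hypothetical, simplified stand-in for SMOTE (it interpolates between a minority point and one of its nearest minority neighbours). In practice a library such as imbalanced-learn would be the usual choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced data: 90 majority rows (label 0), 10 minority rows (label 1).
X = rng.normal(size=(100, 2))
y = np.array([0] * 90 + [1] * 10)
major_idx = np.flatnonzero(y == 0)
minor_idx = np.flatnonzero(y == 1)

# Under-sampling: keep a random subset of the majority class.
keep = rng.choice(major_idx, size=len(minor_idx), replace=False)
under_idx = np.concatenate([keep, minor_idx])
X_under, y_under = X[under_idx], y[under_idx]

# Over-sampling by duplication: draw minority rows with replacement.
dup = rng.choice(minor_idx, size=len(major_idx), replace=True)
over_idx = np.concatenate([major_idx, dup])
X_over, y_over = X[over_idx], y[over_idx]


def smote_like(X_min, n_new, k=3):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    new_rows = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dists = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(dists)[1:k + 1]  # skip the point itself
        j = rng.choice(neighbours)
        gap = rng.random()  # position along the segment between the two points
        new_rows.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(new_rows)


X_synth = smote_like(X[minor_idx], n_new=80)
X_smote = np.vstack([X, X_synth])
y_smote = np.concatenate([y, np.ones(80, dtype=int)])

print(np.bincount(y_under), np.bincount(y_over), np.bincount(y_smote))
```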

Day 7 - 25th February, 2022

  • Learnt about the various feature learning techniques and did hands-on coding with the Breast Cancer data.

Day 8 - 26th February, 2022

  • Revised statistics for Machine Learning

Day 9 - 27th February, 2022

Day 10 - 28th February, 2022

  • Revised Neural Networks from scratch.
    • More emphasis on various Loss Functions and Optimizers.
    • The math behind these really excites me and I remember them as a story.
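
To make the loss-and-optimizer relationship concrete, here is a small sketch with the simplest possible pair: a mean squared error loss minimised by plain gradient descent. It fits a single line rather than a full neural network, and the data, learning rate and step count are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data generated from y = 3x + 2 plus a little noise.
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 2.0 + rng.normal(scale=0.1, size=200)


def mse(w, b):
    """Mean squared error loss of the line w * x + b."""
    return np.mean((w * x + b - y) ** 2)


w, b = 0.0, 0.0
learning_rate = 0.1
initial_loss = mse(w, b)

for _ in range(500):
    error = w * x + b - y
    # Gradients of the MSE loss with respect to w and b.
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # Plain gradient descent update: step against the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mse(w, b)
print(w, b, initial_loss, final_loss)
```

Optimizers such as momentum or Adam change only the update step. The loss and its gradients stay the same.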

Day 11 - 1st March, 2022

Day 12 - 2nd March, 2022

Day 13 - 3rd March, 2022

  • Still revising basic Linear Algebra, but today I shifted towards advanced topics like SVD and PCA. Studied SVD and PCA thoroughly and I LOVE THE MATH.
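
One piece of that math is easy to check numerically: PCA is the SVD of the centred data matrix. The sketch below compares the two on made-up correlated data, assuming NumPy and scikit-learn. The variable names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical correlated data: 150 samples, 4 features.
X = rng.normal(size=(150, 4)) @ rng.normal(size=(4, 4))

# PCA through the SVD of the centred data matrix.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
variance_from_svd = S**2 / (X.shape[0] - 1)

# The same quantities from scikit-learn's PCA.
pca = PCA(n_components=4, svd_solver="full").fit(X)

# Squared singular values (scaled by n - 1) are the explained variances.
same_variance = np.allclose(variance_from_svd, pca.explained_variance_)
# Rows of Vt are the principal axes, up to a sign flip per axis.
same_axes = np.allclose(np.abs(Vt), np.abs(pca.components_), atol=1e-6)
print(same_variance, same_axes)
```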

Day 14 - 4th March, 2022

  • Studied more about ROC curves and used Keras Tuner for a classification problem.
  • Ref: AUC ROC Curves
  • THE BEST docs for Keras Tuner are the Keras docs themselves. Hats off to the writers. Check them here
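
For the ROC part, a minimal scikit-learn sketch is shown here. The dataset (scikit-learn's built-in breast cancer data), the logistic regression model and the 70/30 split are assumptions made for illustration, not the classification problem mentioned above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# ROC curves need scores, not hard labels: use the positive-class probability.
scores = model.predict_proba(X_test)[:, 1]

# Each threshold gives one (false positive rate, true positive rate) point.
fpr, tpr, thresholds = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)
print(f"AUC = {auc:.3f}")
```

Plotting `tpr` against `fpr` gives the ROC curve, and `auc` summarises it as a single number between 0 and 1.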

Day 15 - 5th March, 2022

  • Learnt about the various ensemble techniques in Machine Learning, mainly bagging, boosting and voting. I will do the hands-on coding later; for now I am just trying to get to know them and understand the intuition.
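
For intuition, bagging and boosting can be compared side by side in a few lines of scikit-learn. This is only a sketch: the built-in breast cancer dataset, the choice of `GradientBoostingClassifier` as the boosting example, and the hyperparameters are all assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Bagging: many trees trained independently on bootstrap samples, then combined.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
bagging.fit(X_train, y_train)

# Boosting: shallow trees added one after another, each fitted to the
# errors of the ensemble built so far.
boosting = GradientBoostingClassifier(n_estimators=100, random_state=0)
boosting.fit(X_train, y_train)

bagging_accuracy = bagging.score(X_test, y_test)
boosting_accuracy = boosting.score(X_test, y_test)
print(bagging_accuracy, boosting_accuracy)
```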

Day 16 - 6th March, 2022

  • Applied the Voting Ensemble classifier and also learnt about Stratified KFold Cross Validation.
    • For the Voting Classifier I prefer reading the sklearn docs. They provide a good overview of the various params, and their examples are self-sufficient and self-explanatory.
    • Happy with the results I got using the Voting Classifier for a project of mine which uses real data. I will talk about that project once it's complete. Using the Voting Classifier we got 93% accuracy, while individually the models gave no more than 90% accuracy. The models I used in the Voting Classifier are Logistic Regression, SVM and Decision Tree.
    • Ref: Sklearn Voting Classifier
    • Ref: Stratified KFold Cross Validation
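
A setup like the one described above might look as follows. This sketch combines the same three model types (Logistic Regression, SVM, Decision Tree) with hard voting and evaluates them using stratified 5-fold cross validation. The project data is not available here, so scikit-learn's built-in breast cancer dataset stands in for it, and the scaling step and all hyperparameters are assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

voting = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("svm", make_pipeline(StandardScaler(), SVC())),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",  # majority vote over the predicted labels
)

# Stratified folds keep the class ratio roughly the same in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(voting, X, y, cv=cv, scoring="accuracy")
print(scores, scores.mean())
```

The accuracy printed here belongs to the stand-in dataset and says nothing about the 93% figure from the project.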

Day 17 - 7th March, 2022

  • Did a quick recap of everything I learnt from Day 1 to Day 15.
    • It's important to revise your stuff regularly in the field of Data Science. I was supposed to revise on Day 16 but ended up having to learn a few things for my project.
    • No new learning today, but got a much better perspective on the previous topics after revising them.

Do check out the article every day if you wanna be a part of my journey for 66 days.
