Datascience in Towards Data Science on Medium,

4 Years of Data Science in 8 Minutes

10/24/2024 Jesus Santana

What I have learned in my 4+ year journey of studying data science

I have been studying data science for the past 4 years, during which time I have worked as a professional data scientist for over 3 years.

I started by studying physics and was not too certain about what I wanted to do after University. However, now I have a career I love and can’t see myself doing anything else!

In this article, I want to review my journey year by year and share my experiences, learnings, and failures that helped propel me as a data scientist.

I hope you can find value in my journey that will help you in yours!

Year 1

As some of you may know, my original plan was to do a Physics PhD with the ultimate goal of becoming a professional researcher!

For my master’s program, I had to do a research year from the last half of my third to the first of my fourth year. This is basically the whole of 2020.

However, during this year, I realised that Physics research was probably not for me. Everything moved so slowly, and I wanted an industry with frequent breakthroughs. It wasn’t quite like the quantum mechanics revolution of the early 20th century.

At that time, DeepMind dropped their AlphaGO documentary, and after immediately watching it, I became hooked.

I didn’t know exactly what to do; I just knew I wanted to study machine learning and AI. Naturally, this led me to look at the data scientist career path, and I was hell-bent on becoming one.

The rest of the year was purely studying and learning.

Luckily, my physics degree gave me all the necessary maths background, so I could go straight into learning the ML theory.

I took two courses:

Both were recommended to me by a former physics student from my university who then went on to become a data scientist.

I am proud of myself here because I didn’t waste time trying to find the “perfect course”; I just took the recommendation and started learning, which is the advice I give to all people.

These courses will teach all the fundamentals of machine learning, such as linear regression, logistic regression, decision trees, neural networks, support vector machines, CNNs, RNNs, and even a bit of reinforcement learning.

I also learned Python and SQL because the only programming language I knew at the time was FORTRAN! Yes, you heard me right! FORTRAN is the oldest general-purpose language, but it is great for number crunching and scientific simulations.

I took a few entry courses for both languages and built several data science projects to learn them. This is the exact process I recommend. Learn something and then immediately implement it and build something.

During all this studying, I also applied for jobs. I applied to over 300 roles, albeit not all were data science positions. I got a few offers here and there, but the one I accepted, I got about July.

So, this first year was full of learning, and my main takeaways are:

Don’t waste time finding the “best course.”
Learn, then immediately implement. Particularly when it comes to coding.
A volume approach to applying for jobs is a viable option.

Year 2

At this point, I started working as a graduate data scientist at an Insurance company in London.

This was the first time I had the opportunity to implement data science in a real-world setting, and I learned a lot!

Most of the algorithms I worked on were gradient-boosted trees using mainly the XGBoost, LightGBM and CatBoost libraries. These are still the gold standard today, particularly for tabular data, and cover the majority of ML use cases.

I also got acquainted with technologies like Snowflake, Data Bricks, git and Jenkins, which are all very important in the data ecosystem.

As a data scientist, I learned how to build pricing and fraud models and how to really drive business value. Building fancy ML models is cool, but your sole goal is to be a net positive for the business, and you can do that by building simple tools. I even learned a bit of NLP!

While working full-time, I also started this Medium blog to help me up-skill myself in data science. I studied in my spare time, and I wrote an article about anything I learned, which is an example of using the Feynman technique.

I basically spent the whole year learning statistics at an undergraduate level, well at least the first two years! I went through the statistics modules of my former University and taught myself nearly everything on that list.

I learned many probability distributions, generalised linear models, Bayesian statistics, Markov Chains, probability theory and a lot more. Nearly all of it is documented on this blog that you can still view today!

You can check out my posts on these topics below.

It helped that I was working at an Insurance company, as there were many Actuaries around who were experts in applied statistics to help me. An advanced fundamental understanding of statistics is invaluable in your career, and I recommend it to everyone.

My main takeaway was:

It’s more about the problem and how you frame it than the tools you use.
Continual learning is probably the only “secret” to becoming a good data scientist.
Always focus on business impact.
Gain a great grounding in statistics.

Year 3

After spending a year at the insurance company, I was offered a new opportunity at my current employer.

The main reason I switched was because my current employer is more of a tech company, so I would learn a lot about engineering and deployment of ML models.

This was my first time working on production code; most previous work was on isolated PoCs. Adjusting to writing high-quality Python code took a while and the learning curve was very steep. I had to learn things like testing, typing, linting, CI/CD, and working with AWS.

Fair to say I learned a lot, but this is what usually happens when you switch companies. So, if you are feeling like your skills are plateauing, maybe its time for a change either internally or externally?

Before, I would say I was quite a generalist, but in my new role, I was working in a team that specialised in time series forecasting and combinatorial optimisation problems.

Naturally, I developed skills in this and shockingly also blogged about them! I learned about ARIMA, harmonic regression, meta-heuristics and mathematical optimisation.

You can check out my posts on these topics below.

Over time and through self-learning, I became an expert in these domains, and I would say I know more than most data scientists in these areas.

It also showed me that to advance in your career, you must have some specialism to help you stand out against others. You can know many things, but having T-shaped skills is the ideal way to progress.

T-shaped skills diagram by author in Canva.

My main takeaways are:

Learn how to write production code and deploy your algorithms.
Have an idea or know what you want to specialise in.
Gain some awareness of software engineering principles and best practices.
Change companies if you feel like your skills are not growing.

Year 4

I began this year still at the same company, but I was now promoted to a mid-level data scientist!

As I built trust, I had more autonomy and control over what I could work on. This is really important; the more people trust you, the more opportunities you will get increasing the scope of your work.

If every bit of work you get is delivered on time and to a good standard, then you will gain trust. This applies to all stages of your career, regardless of your position. Naturally, this will lead to regular promotions and compensation increases.

Another thing that really helped me get promoted was being visible. I volunteered to present at every opportunity and shared my work within the company. Going the extra mile consistently is how you standout against your peers.

My soft skills vastly improved, which many aspiring data scientists underestimate. Hard skills get you in the door, but soft skills get you upstairs. Poor analogy, but you know what I mean!

Being able to drive influence within your team and broader business is how you move up the ranks. To do this, you must understand how to communicate with people and communicate your ideas persuasively. Soft skills are what allow you to do this.

They are more challenging to develop because some just come down to human nature, but if you can constantly invest in them over time, they will prove invaluable forever.

Over the year, I became more of specialist in forecasting time series and optimisation by up-skilling myself and carrying out more challenging projects.

I also learned more about machine learning engineering and started to independently deploy my algorithms into production. This skill is essential for data scientists, as you can generate value individually and own the whole end-to-end process.

My main takeaways are:

Be visible to help get promoted. You can do this by volunteering for presentations and sharing your work.
Develop some machine learning engineering skills.
Execute every task to a high standard to build trust.

Summary & Further Thoughts

It’s been a crazy four years, and I have grown a lot technically, professionally, and personally. Data science is such a wide field, and you have many opportunities to guide your career in any direction you want. My main advice is just to keep learning and never rest on your laurels. Like anything, the more you put in, the more you will get out!

Another Thing!

I have a free newsletter, Dishing the Data, where I share weekly tips and advice as a practising data scientist. Plus, when you subscribe, you will get my FREE data science resume and short PDF version of this AI roadmap!

Dishing The Data | Egor Howell | Substack

Connect With Me!

LinkedIn or Instagram.
My YouTube Channel to learn technical data science and machine learning concepts!
👉 Book a 1:1 mentoring call

4 Years of Data Science in 8 Minutes was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.

from Datascience in Towards Data Science on Medium https://ift.tt/iFPXclr
via IFTTT