Datascience in Towards Data Science on Medium

Becoming a Data Scientist: What I Would Do If I Had to Start Over

12/03/2024 Jesus Santana

Breaking into data science: The Good, the Bad, and the Python Bugs

Photo by Markus Spiske on Unsplash

Martin Luther King Jr. is famous for his speech, “I Have a Dream.” He delivered it at the Lincoln Memorial in Washington, D.C., on August 28, 1963, in front of approximately 250,000 people. It’s considered one of the most important speeches of the 20th century, and it played a crucial role in the civil rights movement for Black Americans.

During this speech, he said that he dreamed of a day when his four children would live in a nation where they would not be judged by the color of their skin but by the content of their character.

I also had a dream several years ago. It was not as glorious as Martin Luther King’s, nor did it reshape the course of history. I aspired to become a data scientist.

It wasn’t for the prestige or because it was trendy (and still is) but because I genuinely love working with data, solving complex problems, and leveraging insights to drive business results. Becoming a data scientist was where my unique skills and passions met. You know, that sweet spot that leads to a fulfilling career.

My journey wasn’t straightforward. I didn’t know where to start, nor did I know what to do next. I took various courses, many of which turned out to be unhelpful. I also read countless articles about data science. While becoming a data scientist requires hard work, I spent a lot of effort on things that ultimately weren’t necessary.

I wish someone had given me the guidance I’m about to share with you. That is the purpose of this article. The good news? Following these steps won’t guarantee a job as a data scientist, but it will significantly improve your chances… even without a PhD! I know several professionals who have excelled as data scientists without doctorates. Success in this field is mainly about persistence and practical experience.

Start Somewhere, Start Now

“The beginning is the most important part of any work.”

— Plato

Research shows that a toddler takes about 14,000 steps and experiences 100 falls per day over 2–3 months before mastering walking. Yet, they persist, never considering giving up.

In contrast, as adults, we often do the opposite. We tend to give up as soon as we encounter obstacles. Where an adult might see 100 failures, a baby sees 100 learning opportunities. The baby doesn’t overanalyze its failures or overcalculate the risks. It simply starts, tries, falls, and tries again!

Consider the story of Justin Kan, the co-founder of Twitch. His entrepreneurial journey didn’t start with a blockbuster success. It began with what he called a “shitty first startup” named Kiko, an online calendar app. Kiko was competing against giants like Google Calendar, but it was eventually sold on eBay for $258,100!

Next, he launched Justin.tv, a platform where he live-streamed his life 24/7. Justin.tv eventually became Twitch, a live-streaming platform focused on gaming. In 2014, Amazon acquired Twitch for $970 million!

As Justin Kan stated, “Don’t wait. Go build your first shitty startup now.”

This advice applies to your journey into data science as well. Start somewhere. Begin your learning process now. Even if your first attempt feels “shitty” and you’re unsure of where to start, it’s okay. You can build upon your initial efforts, and nothing prevents you from adjusting your direction as you progress. You need to start now and somewhere.

So… Where Do I Start?

The Cathedral of Beauvais in France was intended to be the tallest cathedral in the world during the 13th century. Its ambitious design pushed the limits of Gothic architecture. However, in 1284 part of the choir vault collapsed, a failure attributed to insufficient foundations and structural support. The cathedral remains unfinished to this day.

This serves as a strong analogy for your journey into data science. You may be tempted (we all are) to dive directly into the exciting parts, such as deep learning models, LLMs, or the latest machine learning frameworks. But like the Cathedral of Beauvais, your ambitious plan could fail without a solid foundation. Learning the basics first is crucial to ensure your knowledge is robust enough to support more advanced concepts.

Mathematics: Your Universal Language

Think of mathematics as the language of patterns. There is mathematics everywhere. And honestly, if you don’t like mathematics, perhaps a career in data science isn’t the right choice for you.

You don’t need to become a mathematician, but you do need to understand the following key concepts:

Linear algebra (matrices, vectors, etc.): Think of matrices and vectors as the language in which data communicates. Understanding these concepts allows you to manipulate data structures for machine learning algorithms.
Calculus (differentiation, integration, gradient, etc.): These concepts are essential for optimizing models. For example, gradients drive the training of neural networks.
Statistics (distributions, descriptive statistics, etc.): This is where you learn to interpret the stories data tells. Understanding concepts like distributions and descriptive statistics allows you to make informed decisions based on patterns in data.
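
To see how these three areas meet in practice, here is a minimal sketch that fits a straight line with gradient descent using only the Python standard library. The data points, the helper names (dot, mean, fit_line), and the learning rate are all assumptions chosen for illustration, not a recommended setup.

```python
# Fit y = w * x + b by gradient descent on the mean squared error.
# The data points are made up for illustration (generated from y = 2x + 1).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]


def dot(u, v):
    """Dot product of two equal-length vectors (linear algebra)."""
    return sum(a * b for a, b in zip(u, v))


def mean(values):
    """Arithmetic mean (descriptive statistics)."""
    return sum(values) / len(values)


def fit_line(xs, ys, learning_rate=0.05, steps=5000):
    """Hypothetical helper: returns the slope w and intercept b."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current model on every data point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error (calculus).
        grad_w = 2 * dot(errors, xs) / n
        grad_b = 2 * mean(errors)
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = fit_line(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

On this toy data the fit should land very close to a slope of 2 and an intercept of 1. In real projects a library would handle this for you, but writing it once by hand makes the role of each concept easier to see.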

Diving into Programming

With your mathematical foundation in place, programming will bring your ideas to life. While some argue for learning R for data science, Python stands out for its versatility and widespread use in the industry. Most people I know use Python, and it will be more than good enough for most use cases. Focus on:

Basic syntax and functions: understand how Python works at a fundamental level. It’s like learning an alphabet before writing stories.
Data structures: lists, dictionaries, tuples — know how to use them. It’s crucial for handling real-world data.
Control flow statements: master “if statements,” “for loops,” and “while loops.” These allow you to implement logic that can solve complex problems. With simple statements, you can accomplish much more than you think!
Object-oriented programming: understand the concept of classes, functions, and objects. This allows you to write efficient, reusable code. It also facilitates collaboration with others.
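
The four items above fit together in even a tiny program. The sketch below is one possible illustration: the SalesReport class, its method names, and the sales records are hypothetical and exist only to show a class, a dictionary, a list of tuples, a for loop, and an if statement working together.

```python
from collections import defaultdict


class SalesReport:
    """Groups (region, amount) records by region and summarizes them."""

    def __init__(self, records):
        # records is a list of (region, amount) tuples.
        self.records = records

    def totals_by_region(self):
        totals = defaultdict(float)  # dictionary: region -> running total
        for region, amount in self.records:  # for loop over a list of tuples
            if amount < 0:  # if statement: skip negative amounts
                continue
            totals[region] += amount
        return dict(totals)

    def top_region(self):
        totals = self.totals_by_region()
        # Return the region whose total is the largest.
        return max(totals, key=totals.get)


# Made-up records for illustration.
records = [("north", 120.0), ("south", 80.0), ("north", 40.0), ("south", -15.0)]
report = SalesReport(records)
print(report.totals_by_region())
print(report.top_region())
```

Wrapping the logic in a class is a design choice, not a requirement. A couple of plain functions would work just as well here, but the class shows how related data and behavior can travel together once a project grows.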

SQL: Your Database Language

Data is often stored in databases that you need to access and manipulate. SQL is your language to interact with this data.

Interacting with databases: Learn basic SQL commands to retrieve, update, and manage data.
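
You can practice these commands without installing a database server, because Python ships with the sqlite3 module. The following is a minimal sketch: the orders table, its columns, and the rows are invented for illustration.

```python
import sqlite3

# An in-memory database that disappears when the connection closes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

# Manage: insert a few made-up rows using parameter placeholders.
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0)],
)

# Update: correct the amount on bob's order.
conn.execute("UPDATE orders SET amount = 15.0 WHERE customer = 'bob'")

# Retrieve: total spend per customer, largest first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
print(rows)

conn.close()
```

The same SELECT, UPDATE, and GROUP BY statements carry over to most other relational databases with little or no change, which is why SQL is worth learning independently of any one tool.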

Machine Learning: Turning Data into Insights

Once you understand mathematics, programming, and data handling, you can move on to machine learning. Focus on:

Understanding algorithms: start by learning algorithms like linear regression, decision trees, and clustering methods. These are the basics for more complex models.
Supervised vs unsupervised learning: understand the difference between these two core types of machine learning. Supervised learning involves training models with labeled data, whereas unsupervised learning involves unlabeled data.
Model evaluation: Learn how to assess the performance of your models using metrics like F1 score for classification models, word error rate for speech recognition, or RMSE for regression and forecasting.
Feature engineering: It’s the art of transforming your raw data so your models can understand it. Often, this makes more of a difference than using a fancy algorithm.
Libraries and frameworks: Familiarize yourself with popular Python libraries for machine learning, such as scikit-learn, TensorFlow, and PyTorch.
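
To make the model evaluation item more concrete, here is a small sketch of two of the metrics mentioned, F1 score and RMSE, written from scratch. The labels, predictions, and values are made up for illustration. In practice you would likely use the tested implementations in a library such as scikit-learn rather than these hand-written functions.

```python
import math


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: treat the score as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up classification labels and predictions.
labels = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
print(f"F1:   {f1_score(labels, predicted):.3f}")

# Made-up numeric targets and predictions.
actual = [3.0, 5.0, 7.0]
estimated = [2.0, 5.0, 9.0]
print(f"RMSE: {rmse(actual, estimated):.3f}")
```

Which metric to report depends on the problem: F1 balances missed positives against false alarms, while RMSE penalizes large numeric errors more heavily than small ones.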

Remember, machine learning is not just about applying algorithms. It’s about understanding the problem you’re trying to solve and choosing the right approach.

Business Sense: Turning Technical Skill into Business Impact

Many people contact me about starting a career in data science. They typically have impressive qualifications, such as Ph.D.s and a strong background in mathematics. However, even with these impressive credentials, many struggle to break into the field. The reason? They lack business sense.

Technical skills are essential. However, here’s the truth: the best AI model is worth $0 if it doesn’t solve a business problem. I’ve seen brilliant data scientists fail because they built sophisticated models that no one used. The key? Learn to think like a business owner.

For instance:

Translating business problems: Instead of just building a predictive model, you should ask, “How does this model support decision-making within the business?”
Prioritizing impact: Focus on problems where data science can generate the most value rather than pursuing complex solutions that don’t solve a business problem.

Focus on the Essentials

Vilfredo Pareto was an Italian polymath who contributed to multiple fields, such as economics and sociology. One of the concepts he is known for is Pareto optimality. It describes a situation where resources are allocated so efficiently that no one can be made better off without making someone else worse off.

However, his most famous observation came while he was studying wealth distribution in Italy. He discovered that 20% of the population owned 80% of the land. He also noticed the same pattern in Prussia, England, France, and other countries.

This observation led to the formulation of what we know today as the Pareto Principle or the 80/20 rule. In other words, 20% of the causes are responsible for 80% of the effects.

For example, in business, it’s often observed that 80% of sales come from 20% of customers. In quality control, 80% of problems are often traced to 20% of causes. In the workplace, 20% of our tasks contribute to 80% of what we deliver. We tend to use about 20% of what we own 80% of the time. And the list goes on.

The same idea applies to your journey of becoming a data scientist. Instead of trying to master every possible topic, focus on taking just one course for each key area: mathematics for data science, Python, SQL, machine learning, and business analytics. That’s it. Focus on the core 20% of skills (or even less) that yield 80% of your results.

Remember, don’t get caught in the trap of “tutorial hell,” where you constantly consume new content but never deeply understand what you’re learning. Becoming a skilled data scientist is mostly about gaining experience, like any other job. It’s applying what you’ve learned to real-world projects.

When you don’t understand something, search for it, learn it, and then return to your project. Repeat this process to reinforce your knowledge and skills as much as required.

Create Your Own Work Experience

“Experience is the teacher of all things.”

— Julius Caesar

After completing the basic courses, enhance your skills by applying what you’ve learned to real-world projects.

Building expertise requires significant dedication and practice. A 1993 study by Ericsson, Krampe, and Tesch-Römer found that elite violinists had accumulated roughly 10,000 hours of deliberate practice by age 20, a finding later popularized as the “10,000-hour rule.” Elite performers, such as concert musicians and professional athletes, often dedicate around four hours of focused practice per day to perfect their skills.

The same principle applies to data science. Mastery doesn’t happen overnight. It requires consistent effort and experience. By dedicating time daily to apply what you’ve learned and solve real-world problems, you’re moving closer to becoming an expert in the field.

Ok… But How Do I Gain Experience?

It’s simpler than most people think. Yet, many get paralyzed trying to figure out the “perfect” starting point. As I said earlier, the most crucial step is to start now and somewhere. It’s okay to make mistakes and adapt your approach as you learn.

Your professional background isn’t a limitation, even if it’s not in data science. It’s quite the opposite. It’s an asset.

Every field, whether marketing, healthcare, finance, or law, has problems that can be solved with data. A marketer might analyze customer engagement patterns. Someone with a finance background might want to forecast the stock market.

I once coached someone with a background in finance who didn’t know where to start. I advised him to build an ARIMA model to forecast Canadian housing prices (ARIMA is a relatively simple model).

It was nothing groundbreaking, but it was real and relevant. Not only did it leverage his domain expertise and technical skills, but it also focused on a topic in high demand (Canadian housing prices).

If you’re still unsure, start with something you genuinely enjoy. This is the key. When you’re truly interested, you are far more likely to put in those 10,000 hours of practice we discussed earlier. You’re also more likely to approach challenges with determination and view setbacks as learning opportunities rather than reasons to quit.

It can be anything. If you’re an artist, you may use computer vision to analyze visual patterns or create generative art with neural networks. A healthcare worker may want to predict patient outcomes. Someone in environmental science might model climate change impacts using large datasets. The list goes on.

If possible, consider using Large Language Models (LLMs). It’s definitely not mandatory. However, LLMs have become popular recently, especially after ChatGPT’s launch in late 2022, and companies are rapidly adopting them. Working with them offers a fantastic opportunity to develop expertise in a cutting-edge field.

There are several frameworks to build an application using LLMs. One of them is LangChain. But again, LLMs should complement, not replace, your understanding of basic machine learning. If you find LLMs too complex, start with something simple.

Once you’ve built something, share it with the world. Write articles on Medium or publish your code on GitHub. It will showcase your work. Start with a basic model or project. Then, iteratively enhance it.

For example, you could start with a simple ARIMA model to forecast housing prices. Then, you could switch to a more sophisticated multivariate model (like a transformer-based time series model) that incorporates features such as interest rates, debt-to-income ratio, and unemployment rate. Finally, you could compare that model to your baseline.
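
The habit of comparing every new model against a baseline can be sketched in a few lines. Note the assumptions: this is not ARIMA and not a transformer. It uses two much simpler methods (a naive forecast and a drift forecast) as stand-ins, and the price series is synthetic. In a real project, the ARIMA model would typically come from a forecasting library and play the role of the baseline.

```python
import math

# Synthetic monthly price index, made up for illustration only.
series = [100.0, 102.0, 103.5, 105.0, 107.5, 109.0,
          111.0, 112.5, 115.0, 116.5, 119.0, 121.0]

# Hold out the last three points to evaluate the forecasts.
train, test = series[:9], series[9:]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))


def naive_forecast(history, horizon):
    """Baseline: repeat the last observed value."""
    return [history[-1]] * horizon


def drift_forecast(history, horizon):
    """Candidate: extend the average historical change per step."""
    step = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + step * (h + 1) for h in range(horizon)]


baseline = naive_forecast(train, len(test))
candidate = drift_forecast(train, len(test))

print(f"baseline RMSE:  {rmse(test, baseline):.2f}")
print(f"candidate RMSE: {rmse(test, candidate):.2f}")
```

On this made-up series the drift forecast has the lower error because the data trends steadily upward. The point is the workflow rather than the numbers: keep a held-out test set, score the baseline, and only adopt a more complex model when it clearly beats that score.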

As you incorporate additional features or refine your algorithms, update your GitHub repository and write follow-up articles on your progress. It demonstrates your skills and commitment to continuous learning. It’s one of the best (if not the best) ways to learn and showcase your capabilities.

Conclusion

Thank you for reading the article! Again, remember what Voltaire wisely said: “Perfect is the enemy of good.” Just start now and somewhere. You don’t need to wait for the perfect project or idea to take action. As you gain hands-on experience, it will become clearer what your next steps should be.

Becoming a Data Scientist: What I Would Do If I Had to Start Over was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
