One of the most important parts of your Data Science job application process is creating a project portfolio.
It is important to show your prospective employer that you already have most of the required skills under your belt. If your resume does not reflect the skills that the employer is looking for, chances are that it will land straight in the rejection pile.
One way to showcase these skills is through a project portfolio. Project portfolios are a testament to your work and make it easier for the hiring manager to make a case for you.
In this article, we highlight some of the important projects that showcase your capabilities as a data scientist.
1. Exploratory Data Analysis (EDA)
If you’re new to data science and are looking for entry-level data science jobs, EDA projects make a lot of sense.
They show that you’re comfortable with telling a story from raw data.
Moreover, the majority of data science is cleaning the data, creating pipelines, and deriving insights from features. EDA projects can help you showcase all of this.
We recommend picking up some messy raw data and getting your hands dirty.
Ideally, you’ll scrape your own data or pull it from an API.
Begin with cleaning the data using any tool of your choice (MS-Excel, SQL, Python, R). The choice of the tool does not really matter as long as you make sure that you’re very comfortable with it.
Define some questions that you’d like to answer and state assumptions. Derive some insights and visualize them effectively.
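The cleaning and question-answering steps above can be sketched as follows. This is a minimal illustration, assuming Python with pandas; the tiny `raw` table, its `city` and `price` columns, and the `clean` helper are all hypothetical stand-ins for data you would scrape or pull from an API yourself.

```python
import pandas as pd

# Hypothetical messy records, standing in for scraped or API data.
raw = pd.DataFrame({
    "city": [" Pune", "pune", "Delhi", "Delhi ", None, "Mumbai"],
    "price": ["1200", "1,500", "n/a", "900", "1100", "2000"],
})


def clean(df):
    """Return a tidy copy with normalized city names and numeric prices."""
    out = df.copy()
    # Normalize inconsistent whitespace and casing in the category column.
    out["city"] = out["city"].str.strip().str.title()
    # Remove thousands separators, then turn non-numeric entries into NaN.
    prices = out["price"].str.replace(",", "", regex=False)
    out["price"] = pd.to_numeric(prices, errors="coerce")
    # Assumption: rows missing either field are dropped rather than imputed.
    return out.dropna(subset=["city", "price"]).reset_index(drop=True)


tidy = clean(raw)

# Question: which city has the highest average price?
avg_price = tidy.groupby("city")["price"].mean().sort_values(ascending=False)
print(avg_price)
```

In a real project, you would state the drop-versus-impute assumption explicitly in your write-up and follow the summary table with a chart.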
2. Classification Problem
The second project is one where you create a classifier to predict a categorical outcome.
One of the most famous examples of this would be the Titanic dataset, where you try to predict whether a passenger survived.
Keep one thing in mind with such projects: almost everyone does them. Make sure that you personalize your problem statement so that it helps you stand out.
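Models such as logistic regression or random forest output a probability for each class, so you can report how confident the model is in the classification of each data point. Being able to discuss that confidence shows that you understand the business value of a prediction as well.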
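As a rough sketch of what reporting confidence per data point can look like, the example below fits a logistic regression with scikit-learn and reads out class probabilities. The data is synthetic (generated with `make_classification`), not the Titanic dataset, and the sample sizes and parameters are arbitrary assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data; a stand-in for a real dataset.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Probability of the positive class for each test point, not just a hard label.
proba = model.predict_proba(X_test)[:, 1]
accuracy = model.score(X_test, y_test)

print(f"accuracy: {accuracy:.2f}")
print("first five probabilities:", proba[:5].round(2))
```

Probabilities near 0.5 mark the cases where the model is least sure, which are often the most interesting ones to discuss in a portfolio write-up.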
3. Regression Problem
In these analyses, you try to predict a continuous outcome.
One common example is predicting the number of views that a certain YouTube video will get, or how much a demographic of customers will order from an e-commerce store.
Defining your evaluation criteria is important here, because how good a result looks depends on the metric you use to evaluate it. You may use R squared, root mean squared error (RMSE), or mean absolute error (MAE).
Explore different models and see how they perform against the metric that you’ve chosen.
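One way to compare models against a chosen metric is sketched below, assuming scikit-learn and synthetic data from `make_regression`. The two models, the `evaluate` helper, and the choice of RMSE and R squared are illustrative assumptions, not a recommendation for any particular dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic continuous target; a stand-in for something like video views.
X, y = make_regression(n_samples=400, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def evaluate(model):
    """Fit on the training split and score on the held-out split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    return {"rmse": rmse, "r2": float(r2_score(y_test, pred))}


candidates = {
    "linear": LinearRegression(),
    "forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
results = {name: evaluate(model) for name, model in candidates.items()}

for name, scores in results.items():
    print(f"{name}: RMSE={scores['rmse']:.1f}, R2={scores['r2']:.3f}")
```

Scoring every candidate on the same held-out split with the same metrics keeps the comparison fair.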
4. Clustering Problem
Clustering problems focus on unlabeled data.
Here, we use algorithms to find which data points are similar to each other and group them together.
Clustering can be combined with EDA or Classification and can help you see relationships amongst the features.
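A small sketch of grouping related data points is shown below, assuming scikit-learn's k-means on synthetic blobs. The number of clusters is set to 3 only because the synthetic data is generated with three centers; on real data you would have to choose and justify it, for example by comparing silhouette scores.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic points with no labels used; the true centers are discarded.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Scale features so that no single feature dominates the distance measure.
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Silhouette score ranges from -1 to 1; higher means better-separated clusters.
score = silhouette_score(X_scaled, labels)
print(f"silhouette score: {score:.2f}")
```

The resulting `labels` can be added back to your table as a new column, which is one way to combine clustering with EDA or classification.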
5. Advanced Techniques
The final project is where you should focus on some more advanced techniques and show off your specialized skills!
You may explore NLP, computer vision or deep neural nets.
These can take you out of your comfort zone but these types of analyses are a part of a normal data science toolkit.
We believe that specializing in one of these areas will make you far more desirable in the job market over the next few years.
One example of an NLP project could be building a chatbot, such as a bot for a Discord server.
These are some of the projects that can help you land your next data science role. Remember to showcase them on platforms such as GitHub and Kaggle.
Good luck on your data science journey!