Abhijit Annaldas

For the love of data and machines that can learn!


Building a Data Science Portfolio


Having a good portfolio is very important to an individual’s success. It brings opportunities and helps you get in touch with great people. New connections can lead to a lot of mutual learning: like-minded people, and those who have worked on similar problems, will get in touch. It’s a win-win for everyone.

I would recommend doing lots and lots of hands-on projects. At the beginner level, working on different kinds of projects, datasets, and problems helps maximize learning. At the intermediate or expert level, or when specializing, doing many different kinds of projects within that specialization helps strengthen skills.

Secondly, as you work and gain expertise, you will build your own arsenal of code snippets that you find yourself reusing often. Consider spinning them out into tools or libraries to give back to the community.

Once you feel comfortable, start competing in hackathons. There are several opportunities online for all levels of expertise. One of the most notable is Kaggle.com. Start working on the challenges there.

GitHub is no doubt a nerd’s portfolio! Consider pushing most of your work to GitHub. If you don’t want to push your work to a public GitHub repo, consider writing about it on your blog.

Some tips for presenting your repositories (portfolio) well…

  • Each project should have its own repository (needless to say, but I’ve seen people stuff code into the same repo under a blanket name).
  • Write a neat and short README for each repository that briefly explains the problem statement and the solution, preferably in a single page at most.
  • Apart from the code and the introductory README, document the solution approach in detail. The purpose is to show how the solution was built. It should include…
    • Problem statement
    • Info about dataset
    • Visualizations of data
    • Training, cross-validation, test, and prediction performance charts
    • Accuracy, other metrics, and results
    • Closing notes: challenges faced, possible enhancements, etc.
  • If the project uses a Jupyter notebook, the code and the detailed solution approach can be neatly presented together.
  • Last but not least, keep sharing your knowledge through a blog (as I’m doing!)
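
To make the write-up checklist above easier to follow consistently, here is a minimal sketch that generates an empty outline for a project's detailed write-up. The function name `build_writeup_skeleton`, the Markdown output format, and the sample project name are my own assumptions for illustration; only the section titles come from the list above. Adapt it to whatever format you prefer.

```python
# Section titles follow the checklist above. Everything else here
# (function name, Markdown layout, "TODO" placeholders) is a hypothetical choice.
SECTIONS = [
    "Problem statement",
    "Dataset",
    "Data visualizations",
    "Train, cross-validation, test and prediction performance",
    "Accuracy, metrics and results",
    "Closing notes",
]


def build_writeup_skeleton(project_name, summary, sections=SECTIONS):
    """Return a Markdown outline for a project's detailed write-up."""
    lines = [f"# {project_name}", "", summary.strip(), ""]
    for title in sections:
        # Each section starts as a placeholder to be filled in by hand.
        lines += [f"## {title}", "", "TODO", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder project name and summary, purely for demonstration.
    print(build_writeup_skeleton(
        "House price prediction",
        "Predict sale prices from listing features.",
    ))
```

Paste the output into a new file in the repository (or into the first cell of a notebook) and replace each TODO as the project progresses.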

All the best!

Abhijit Annaldas
avannaldas [at] hotmail [dot] com