This world has always been filled with wonders. And the best thing is that new fun things keep popping up. Some find the new species of bees incredible, while others find the development of carbon nanotubes quite fascinating. If you are on Team Data and find it the most fascinating of all, then let us explore how you can start your adventure in the realms of data science.
Here is the checklist:
Make up your Mind: First things first, ask yourself why you want to learn data science. Have a clear answer in mind, and if possible, write it down. Why? Because at some point in your learning or job hunting, things are going to get tough, and you will need a rock-solid reason to continue. Learning is fun, but hitting those complex headshots in “Call of Duty” will always be more tempting.
If you don’t have your reasons clear, we have two for you. First, it pays well. Like, really well compared to its counterparts. With a mean salary above $100K, it is worth betting on.
Second, all the cool kids are doing it. All the brilliant people have this itch to create new things. In particular, they want to turn proven theories into practical beauties. The same has happened with data science, leading to applications in different fields, changing old practices, and creating new areas. If you have that itch, start scratching that data science surface.
Third, there are no barriers here. You, me, our grandmoms, our exes: anybody interested is invited to this parade. Even people who say they have two points and write three are welcome.
Find a guide: Venturing into the unknown is troubling. Even for the bravest of all, it is very uncomfortable. Finding a good tutor is a time-tested technique for learning any new skill, be it driving or data science. We should all chart out the map of our learning very carefully. Give yourself adequate time to choose from the offerings. Take trials or whatever. But once entirely sure, we as learners should not second-guess our decision. Again, choose a good course and stay with it. Be accountable for your decision and learn with dedication. We also have a bouquet of courses to offer at InfosecTrain, but please keep in mind that you can chart your journey yourself. Find individual courses on skills that you deem valuable. To decide how to weigh and compare all the offerings out there, read on.
Learn the base skills required to roam that place: What skillset do you need? Who decides what the best tool is? The answer to both is simple: the industry. Julia as a language has advantages, but you will hardly ever come across someone suggesting you start with it. The reason: the industry has little demand for it yet. The people who pay us decide what we will work on, so we learn to do it. The industry could very well decide that we should also know Latin to be data scientists. And when that time comes, we will be paratus.
The basic skills are:
Programming language: Most of us know how to tighten a screw. We also know how challenging that task is without a screwdriver. A programming language is pretty analogous, except that it is not just a screwdriver. It is the complete toolkit. Yes, any language will empower you with the essential abilities to tackle data. The abilities are:
The first problem is that most of us are a bit afraid of learning a programming language.
Let me tell you a secret! A programming language is just English written differently. If you can write down the steps of a task on paper, you can write the code for it. Yes, to use a language efficiently, you need only three things:
With just those three things, you are golden. Learning to program can feel a bit tedious at first. But give it a week of your life, and you will be amazed at what you can do and how easy it was to do it.
The second problem is deciding on a language. For that, start with either Python or R. Which of the two? Whichever you like. The basics of all programming languages are much the same. For example, the way an IF statement is written can be a little different, but almost all languages have one. The same is true of R and Python. Learn one of them and start working. My personal favorite to start with is Python because it is also widely used in other areas like web development. Let me tell you another secret! Once you have learned a language, learning another is a piece of cake (butterscotch flavored)!
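To see how close code is to steps written on paper, here is a minimal Python sketch of an IF statement. The function name `describe_temperature` and the temperature cut-offs are made up purely for illustration.

```python
def describe_temperature(celsius):
    """Turn a temperature reading into a plain-English label."""
    # Read it aloud: "if it is below 10, call it cold; otherwise, if it is
    # below 25, call it pleasant; otherwise, call it hot."
    if celsius < 10:
        return "cold"
    elif celsius < 25:
        return "pleasant"
    else:
        return "hot"


# Invented sample readings, one label printed per reading.
readings = [4, 18, 31]
for reading in readings:
    print(reading, "degrees is", describe_temperature(reading))
```

R writes the same idea with slightly different punctuation, which is why moving between the two is less painful than it sounds.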
Mathematics: If any of you hate maths, welcome to the club. Most of us do, with all our heart. But the nice thing about data science is that you only need elementary school-level maths to get started. If you can write formulas in an Excel file, you are 70% set. As for the remaining 30%, it is like a new language. Take Spearman’s rank correlation coefficient. It is a big name for a number that tells you how consistently two things move together: when one increases, does the other tend to increase (or decrease) too? If you are bad at maths, just have a little courage, as most of what you will come across are big words for really simple ideas. Integral calculus is actually just a fancy way to find an area.
Again, have a little heart when you come across complex terms. It is almost never as difficult as it sounds.
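To show how small these big words really are, here is a rough Python sketch of both ideas. It is a simplified illustration, not a replacement for a statistics library: the Spearman function assumes no two values are tied, the area function simply adds up thin rectangles, and the helper names and sample numbers are invented.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest). Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman's rank correlation for tie-free data: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def area_under(f, start, end, steps=1000):
    """Approximate an integral by adding up thin rectangles (midpoint rule)."""
    width = (end - start) / steps
    return sum(f(start + (i + 0.5) * width) * width for i in range(steps))


# Invented data: scores rise every time hours rise, so the ranks match exactly.
hours_studied = [1, 2, 3, 4, 5]
exam_scores = [52, 60, 61, 75, 90]
print(spearman(hours_studied, exam_scores))  # prints 1.0

# Area under the line y = x from 0 to 4 is a triangle: (4 * 4) / 2 = 8.
print(area_under(lambda x: x, 0, 4))  # very close to 8
```

A value near 1 means the two lists rise together, a value near -1 means one falls as the other rises, and a value near 0 means there is no consistent pattern.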
Databases and Big Data Skills: Have you ever wondered what brought about this rush of data science and data-centric roles? Why did humanity not do this earlier? Because we couldn’t. Never before was it possible to store so much data, much less process it. What takes up a few kB of space as an Excel file on your system used to fill an entire box. But now, each of us is generating millions of kB of data. And Excel is not suitable for storing it. So we have databases.
They are just neat ways of storing data. They are specially designed to take up less space and to be fast at writing, reading, and processing data. What would take you hours to look up in Excel can be done in a few seconds in a database. And you can export the results to whatever file format you want, such as an Excel sheet or a CSV.
The beauty of databases is that they were designed to be easy to work with. And that is why their designers came up with ‘SQL’ (Structured Query Language). When I first heard the name, it sounded dreadful. But when I first read the lines, I couldn’t stop smiling. It is almost English-like, has just four types of operations, and is easy to learn. Of course, people complicate it by throwing around terms like DDL, DML, etc. But to be honest, you need to learn just six or so types of statements. Things like a SELECT statement, joining tables, and writing simple subqueries go a long way. You can slowly level up your skills in SQL by learning to write stored procedures and use window functions. But start small and have fun.
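Here is a small taste of how English-like SQL is, run from Python through its built-in `sqlite3` module so that nothing needs to be installed. The `customers` and `orders` tables and every row in them are invented for this sketch.

```python
import sqlite3

# An in-memory database: it disappears when the program ends.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben');
INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.0), (3, 2, 10.0);
""")

# 1. A SELECT statement with a filter: orders worth more than 15.
big_orders = conn.execute(
    "SELECT id, amount FROM orders WHERE amount > 15 ORDER BY id"
).fetchall()

# 2. A JOIN: total amount spent by each customer.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# 3. A simple subquery: orders larger than the average order.
above_average = conn.execute("""
    SELECT id FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY id
""").fetchall()

print(big_orders)     # [(1, 20.0), (2, 35.0)]
print(totals)         # [('Asha', 55.0), ('Ben', 10.0)]
print(above_average)  # [(2,)]
```

Read each query aloud and it nearly explains itself, which is the whole charm of SQL.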
What makes ‘The Big Show’ big? It is his sheer size. Similarly, ‘Big Data’ is like a very, very large database. The data is much larger (several terabytes or petabytes) and stored on specialized systems that use algorithms for (you guessed it right) reading, writing, and processing data. Think of it as a normal database that took steroids for many years and became really big and fast. What we are interested in is that much of it can still be queried with SQL or something very close to it. Yes, if you know SQL, you have a head start on Big Data as well.
There are other databases out there. But you don’t need them to start. Start small and build up only on useful things later.
Data Visualization: Every dataset (fancy name for a collection of data) has a story to tell. And your job as a data scientist is to listen to it and tell it to the world, especially to your client and boss. The best way to tell a story is with pictures and visual representations. And that is all there is to data visualization. You take the filtered data and create charts out of it, small maps out of it, or even a dashboard (a collection of charts, maps, and other things).
The most astonishing thing about creating beautiful visuals is that there are many ways to do it. You can do it using the programming language you learned. Or you can learn cool and in-demand software tools like Tableau and Microsoft Power BI. Visualization tools have also started expanding their capabilities by adding extensions for your favorite languages. What this means is that instead of dragging and dropping things on screen, you can write R code for it (and automate it later without telling your boss about it).
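As one example of the "do it in your programming language" route, here is a minimal Python sketch of a bar chart. It assumes the third-party `matplotlib` package is installed; the sales numbers and the output file name `monthly_sales.png` are invented for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # draw straight to a file instead of opening a window
import matplotlib.pyplot as plt

# Invented data: units sold in three months.
months = ["Jan", "Feb", "Mar"]
sales = [120, 150, 90]

fig, ax = plt.subplots()
bars = ax.bar(months, sales)  # one bar per month
ax.set_title("Monthly sales")
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")

# Save the chart as an image you can drop into a report or an email.
fig.savefig("monthly_sales.png")
```

Because the chart is produced by code, re-running the script on next month's data redraws it with no clicking at all.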
Machine Learning and Deep Learning: Finally, it is time to address the elephant in the room: Machine Learning. It is the ability to write a piece of code that allows a computer to learn from already existing data. After learning (or training), the machine can predict customer behavior, recognize cats in photos, or drive cars.
Exciting and scary at the same time. It fills us with hope, and with self-doubt about whether we can really write this type of code. This is the time to reveal another secret. The programming language you learned already has special packages, like scikit-learn for Python, to do this. Most of the work actually goes into working with data, cleaning it, and compiling it. We already have algorithms like Random Forest, which take only a few lines of code to generate fairly good Machine Learning models.
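To back up the "few lines of code" claim, here is a minimal Python sketch of a Random Forest. It assumes the third-party scikit-learn package is installed and uses the small iris flower dataset that ships with it. The split size, the number of trees, and the random seed are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Flower measurements (X) and the species each flower belongs to (y).
X, y = load_iris(return_X_y=True)

# Hold back a quarter of the flowers to check the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Training: the model learns from the flowers it is shown.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Checking: the share of held-back flowers it labels correctly.
accuracy = model.score(X_test, y_test)
print("Accuracy on unseen flowers:", accuracy)
```

The dataset here is already clean, which is exactly why the code is so short. On real projects, most of the effort goes into preparing the data before a line like `model.fit` ever runs.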
The field of Machine Learning branches into many others, and its most famous stream is Deep Learning. Deep Learning is a specialized form of Machine Learning where the logic behind the code is loosely modeled on how neurons (brain cells) work. No need to learn biology for this either.
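Though I took many words to describe it, the way to be a data scientist is not that hard. All we need to know is a few things: a programming language, some mathematics, databases and SQL, data visualization, and the basics of Machine Learning.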
It really isn’t that hard to start. When you meet big words, odd-looking equations, or odd-looking code, take a few deep breaths and look again. Most of the time, you’ll understand what is happening on that second look. Other times, you can Google for help.
Once you have the basics down, you can set out on your own. Do fun things: compete against the best on platforms like Kaggle, get a new job in the field, or just build small chatbots to converse with when you feel lonely.
And once you have something to call your own, share it with the world (unless your work is confidential). The appreciation builds confidence, the rejections teach lessons, and the conversations make connections.