Machine Learning Jobs for beginners/freshers : My Perspective from various Quora Answers

I will collect some of my answers here regarding fresher Machine Learning jobs in India, as I have some experience in this domain and hire in it quite often.

Q. Are data scientist jobs only a fake hype in India?

Originally answered here: https://qr.ae/pNcnZU

In my limited experience, I have been part of teams that worked on Data Science and Machine Learning applications analyzing terabytes of data and making millions of dollars, started an AI startup and raised money for it, and built a research group for the startup that has published many cool new techniques in Deep Learning. All this while staying in India and being an average middle-class Indian.

So, I think my life experience says good opportunities do exist in India for average Indians. You need to reach the right companies and the right positions, and sometimes make some compromises (and be a bit lucky!).

Q. How hard is data science and machine learning to get into for a non-coder?

Originally answered here : https://qr.ae/pNOTBM

Very. I would recommend learning programming first.

As of now, there is nothing useful in these fields you can do without programming.

Q. What should I do to get a job in Machine Learning in India ?

It’s a simple rule: work on an interesting project, get some results and put them on display (Google, Kaggle etc.). Most employers can gauge your interest when they look at your profile. An entry-level Machine Learning Engineer who can explain the Maths behind the algorithms they apply to some non-trivial problem is good enough for a lot of companies; they are not expecting seasoned veterans.

The following types of projects look very impressive:

  1. Taking a research/Kaggle competition and working on its dataset. The code to handle such problems is good proof of one’s capability. The other good way is contributing to open source ML. The more you know about what works behind your solution, the better.

  2. Taking a paper (or taking a set of papers by a research group) and trying to code the algorithm/replicate the results is even better than 1 (at least for us at ParallelDots).

  3. An academic/industrial publication is even better.

The following is what is not enough:

  1. Forking code for the Bicycle challenge or other such competitions (the Titanic one as well), running it and submitting that as previous work. There are too many tutorials for them and people won’t take you seriously.

  2. Completing Andrew Ng’s course.

  3. Unless you are applying to a very big enterprise, I would say avoid certifications and try working on personal projects. That is way more impressive.

  4. Again, please don’t just fork code on GitHub and expect employers to believe you have worked. It’s easy to see how much you have worked. :(

HTH.

Answer is here: https://www.quora.com/What-should-I-do-next-for-getting-a-job-in-Data-Science-and-Machine-Learning-in-India/answer/Muktabh-Mayank?srid=qie

Q. Is it not possible for a fresher to work as a Machine Learning Engineer ?

It is not “not possible” for a fresher to join as a machine learning engineer if he/she is very good at it. That is the wrong conclusion.

I have seen freshers taking the role of machine learning engineer at enterprises and even at my startup. The problem is the “he/she is very good at it” part. Freshers often (not always; I know freshers with CVs as good as a professional ML engineer’s) overestimate their CVs because they don’t know how the real world works. They have very high expectations due to that, and the real world is unable to meet them.

Answer is here: https://www.quora.com/Why-is-it-almost-impossible-for-a-fresher-to-join-as-a-ML-engineer-even-though-he-is-very-good-at-it/answer/Muktabh-Mayank?srid=qie

Q. What is the learning path after going through basic Machine Learning material ?

Does one do projects, go for a summer training in Hadoop in Delhi, read books ?

(Please see link to blog posts below to find all these books)

IMHO, projects are the best way to learn (as you said you want to become a factory data scientist and are not in a mood to pursue a higher degree). Taking different datasets for different types of problems, trying to come up with a data processing pipeline and then applying Machine Learning algorithms on top of it helped to clear many concepts for me. It’s way more fun than all theory. Look at UCI or Kaggle datasets to apply your skills.
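As a rough illustration of the “pipeline first, algorithm on top” workflow described above, here is a minimal sketch in plain Python. The toy dataset, the helper names (`impute_mean`, `standardize`, `fit_centroids`, `predict`) and the choice of a nearest-centroid classifier are all assumptions made for illustration; on a real UCI or Kaggle dataset you would more likely reach for Pandas and scikit-learn.

```python
import math
from statistics import mean, pstdev

# Toy rows: (height_cm, weight_kg, label). None marks a missing value.
# This data is made up purely for illustration.
ROWS = [
    (150.0, 50.0, "small"), (155.0, None, "small"), (152.0, 53.0, "small"),
    (180.0, 85.0, "large"), (185.0, 90.0, "large"), (None, 88.0, "large"),
]

def impute_mean(rows):
    """Replace missing feature values with the mean of their column."""
    n_features = len(rows[0]) - 1
    col_means = [
        mean(r[j] for r in rows if r[j] is not None) for j in range(n_features)
    ]
    return [
        tuple(col_means[j] if r[j] is None else r[j] for j in range(n_features))
        + (r[-1],)
        for r in rows
    ]

def standardize(rows):
    """Scale every feature to zero mean and unit variance."""
    n_features = len(rows[0]) - 1
    cols = [[r[j] for r in rows] for j in range(n_features)]
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]  # guard against zero spread
    return [
        tuple((r[j] - stats[j][0]) / stats[j][1] for j in range(n_features))
        + (r[-1],)
        for r in rows
    ]

def fit_centroids(rows):
    """Nearest-centroid 'model': one mean feature vector per label."""
    by_label = {}
    for *features, label in rows:
        by_label.setdefault(label, []).append(features)
    return {
        label: [mean(col) for col in zip(*vectors)]
        for label, vectors in by_label.items()
    }

def predict(centroids, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

# The pipeline: clean -> scale -> fit -> evaluate.
clean = standardize(impute_mean(ROWS))
model = fit_centroids(clean)
accuracy = sum(predict(model, r[:-1]) == r[-1] for r in clean) / len(clean)
print(f"training accuracy: {accuracy:.2f}")
```

The score here is measured on the same rows the model was fitted on, which is only acceptable for a toy; a real project should hold out a test set before trusting any number.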

1. From where should I do summer training in big data hadoop(Delhi)? (or in any other specific technology)

Don’t. Hadoop is just a framework to write programs. If, from the starting years of your career, you start concentrating on technologies (Hadoop / MySQL / Postgres / Java / Python / Scala and stuff like that), you will end up becoming a <specific technology> professional (say, a Hadoop professional). Technology trends come and go; unless the fundamentals behind them (which here is distributed computing) are clear, one will never be able to adapt to the changing market. As long as you know programming basics well, stuff like Hadoop will be easy to pick up when your job requires it. Trying to trigger a MapReduce job on your local system with the help of an online tutorial is a different thing, but taking a full-on course on Hadoop is neither required nor recommended.
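To make the “fundamentals over frameworks” point concrete, here is a minimal sketch of the map, shuffle and reduce idea in plain Python, run on a single machine. This is not Hadoop’s API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`) and the two sample documents are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as a framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all the values seen for one key."""
    return key, sum(values)

# Stand-ins for the file splits a real cluster would hand to separate workers.
documents = ["big data is big", "data beats algorithms"]

mapped = chain.from_iterable(map_phase(doc) for doc in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(counts)
```

Once this pattern is clear, a distributed framework is mostly a matter of where the mappers and reducers run and how the shuffle moves data between machines.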

There are some other points I would like to specify as well:

a) Hadoop is for handling very large amounts of data (terabyte+), which most places do not have. Unlike what the internet hype (and the plethora of Hadoop training institutes in India) wants you to believe, most companies still have small data which can easily fit on a Postgres cluster, if not on one server. Data analysis skill is required almost everywhere; Hadoop, at best, in a few hundred offices in India.

b) Apart from HDFS, most parts of the Hadoop ecosystem are already being replaced/renovated by the innovation leader companies. Google (the inventor) phased out MapReduce (see “The Elephant was a Trojan Horse: On the Death of Map-Reduce at Google”), projects like Mahout already seem to be moving away from Hadoop, and a lot of companies seem to be moving towards Apache Spark (“Lightning-Fast Cluster Computing”). By the time you become a professional, Spark might be the talk of the town, and in 10 more years even it might be replaced.

I would rather advise you to read books like MMDS, Foundations of Data Science, ISLR, Deep Learning etc. All of them are freely available on the internet. Save your money and time (read them now rather than after you graduate).

Edit : Adding the link to books (will rather link the blog post at ParallelDots we made collecting links to all these books, most of them freely available)

Free Machine Learning Books

Free Data Science Books

Free Deep Learning Resources

Must Follow Blogs

Answer is Here: https://www.quora.com/What-should-be-the-learning-path-after-going-through-basic-machine-learning-material/answer/Muktabh-Mayank?srid=qie

Q. Do companies hire data scientists with zero internship experience?

Yes. Doing an internship is not necessary to work in Data Science.

But most companies won’t hire Data Scientists with no experience of solving real world problems.

I would say any one of the following gives you equal (or more) weightage than an internship at a tech startup.

  1. A bronze medal or better at Kaggle,

  2. A research project under an academic that resulted in a publication at a decent venue,

  3. An internship doing a proper AI project that is to be deployed within the company (the very thing whose alternatives we are discussing),

  4. Contribution to an open source project,

  5. Some other way in which you have worked in a real world setting optimizing for different constraints.

The bigger companies working in Data Science, maybe like Fractal and Manthan, might be hiring absolute freshers too, but most companies in this space are small startups who cannot take the risk of hiring absolute freshers and training them.

Original Answer Link: https://qr.ae/pNnch5

Q. Is it true that maths or statistics isn’t required at all in data science while programming skills are required? My friend told me the same thing. He said that’s why a majority of data scientists are from BTech backgrounds, not from maths or stats.

Any respectable Engineering program (BTech) has a sizable number of Maths and Statistics courses. At BITS Pilani, I had coursework in Vector Calculus, Statistics, Linear Algebra, Differential Equations, Optimization and Operations Research (each one was a separate 1 semester long course) which every Engineering student had to take. On top of this there were Computer Science specific subjects (Data Mining, Machine Learning) which taught the Maths and Programming part both to everyone. The theory taught in these many courses is frankly enough to read most Data Science papers and understand concepts (You might need to read a couple of tutorials here and there).

So the hypothesis that Engineering students aren’t aware of Maths and Stats is a wrong one.

Now, about Data Scientists not needing Maths skills: Data Scientist as a job description is very broad, just like software development.

You can say that Software Engineer doesn’t need to know about Databases, and that will be true for many people: Software Engineers who write Operating Systems, Software Engineers who develop frontend applications, Software Engineers who write compilers and below par Software Engineers who work on web application backends.

There are similarly many job descriptions for Data Scientists. Some of them actually don’t require too much Maths and just programming, but most of them do require some basics of the above mentioned mathematical areas.

Original Answer Here: https://qr.ae/pNnctE

Q. My answer to “What are the mistakes people make when they start Machine Learning?”

Originally Answered here: https://qr.ae/TiNJou

Some mistakes according to me:

  1. Focusing too much on math in the initial stages. To train your first neural networks, you don’t need to know in detail how backprop works (backprop derivatives of all ops, for example), and you don’t need to understand the support vector derivation to train SVMs. Don’t start reading complicated mathematical resources in the beginning; it makes learning very slow. Touch on these topics when you are somewhat experienced at writing code and training algorithms.

  2. Focusing too little on math is similarly bad. Not knowing what the different parameters to a Conv layer in Keras signify is also suboptimal. A basic book on Machine Learning with theory is the best starting point.

  3. Practising less. This is what differentiates good from excellent practitioners. Most excellent practitioners can think of 100s of ideas around a dataset and can iterate quickly on them. That is how accuracy on a dataset goes up. Other practitioners waste time thinking about what the perfect method would be and spend hours coding their “one best” method, which mostly will not work in the end. You cannot know in advance which method will give the best accuracy; EDA and trial-and-error are the key. Remember while practising: “Getting average accuracy on multiple datasets << multiple rounds of iteration to get good accuracy on one dataset”.

  4. Not focusing on the basics: Numpy and Pandas. As I said earlier, you need to iterate over many ideas quickly rather than think of the “one true idea” that will work. It’s gritty, boring work. To make this quicker, a good command of Numpy and Pandas helps. Fewer Google searches == more code. Tensorflow/PyTorch have been purposefully written close to Numpy to make sure that Numpy users can iterate quickly.
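As an example of the “just enough math” meant in point 2, here is a small sketch of what the kernel size, stride and padding arguments of a convolution layer do to the output shape and to the number of trainable parameters. The helper names (`conv_output_size`, `conv2d_param_count`) are made up for illustration and are not part of Keras; the sketch assumes square kernels and no dilation.

```python
import math

def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Output length along one spatial dimension of a convolution."""
    if padding == "valid":   # no padding: the kernel must fit entirely inside
        return (input_size - kernel_size) // stride + 1
    if padding == "same":    # zero-padded so that stride 1 preserves the size
        return math.ceil(input_size / stride)
    raise ValueError(f"unknown padding mode: {padding!r}")

def conv2d_param_count(kernel_size, in_channels, filters, use_bias=True):
    """Trainable parameters of a 2D convolution with a square kernel."""
    weights = kernel_size * kernel_size * in_channels * filters
    return weights + (filters if use_bias else 0)

# A 28x28 single-channel image through a 3x3 convolution with 32 filters.
print(conv_output_size(28, 3))                   # 26
print(conv_output_size(28, 3, stride=2))         # 13
print(conv_output_size(28, 3, padding="same"))   # 28
print(conv2d_param_count(3, 1, 32))              # 320
```

Being able to predict these numbers before running the model is usually enough theory to debug shape errors and reason about model size.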

Q. What are some signs to recognize inexperienced Machine Learning engineers ?

Original Answer Here: https://qr.ae/pNncvm

In my view, most inexperienced Machine Learning people (including me, say 6–7 years back) focus more on the algorithm than on the data.

Newbies want to try out all 250 (dummy number) algorithms on the dataset they have got without doing EDA on the data itself.

Changing the algorithm will generally give, say, 2–3% (again a dummy number, to give an idea of the magnitude) gains in accuracy, while arranging / balancing / feature engineering / augmenting the data can give manifold accuracy gains.

Machine Learning is not yet a cool art where you summon a Charizard and it burns the competitor to the ground. It requires grit, getting dirty with the data and understanding what the algorithm is learning through many boring iterations.
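A minimal sketch of the “look at the data before the algorithm” habit described above: check the class balance, work out what a do-nothing baseline would score, and only then think about models. The labels, the 95/5 split and the helper names are all made up for illustration, and random oversampling is just one of several ways to balance a dataset.

```python
import random
from collections import Counter

# Made-up labels for an imbalanced binary problem (95 "ok", 5 "fraud").
labels = ["ok"] * 95 + ["fraud"] * 5

def class_balance(labels):
    """Fraction of the dataset that belongs to each class."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common class."""
    return max(Counter(labels).values()) / len(labels)

def oversample_minority(rows, label_of, seed=0):
    """Randomly duplicate rows of smaller classes until every class
    has as many rows as the largest one."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced

print(class_balance(labels))
print(majority_baseline_accuracy(labels))

balanced = oversample_minority(labels, label_of=lambda row: row)
print(Counter(balanced))
```

On data like this, a model reporting 95% accuracy has learned nothing beyond the majority class, which is exactly the kind of thing swapping algorithms will never reveal. In a real project the oversampling should be applied to the training split only, so that duplicated rows do not leak into evaluation.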

Q. I just hired an unqualified machine learning engineer, he know only some basic Tensorflow and have no idea about the maths and build more complex neural networks, should I fire him immediately or give him 2 months to improve?

Originally Answered Here: https://qr.ae/pNncrn

Why did you hire an “unqualified” person? Don’t you have a job description and an interview process? If there is such a huge gap between the hire and the requirement, I think it’s a problem with the company leadership and process. You should understand who you want to hire, and for what exact profile, before you even put the JD out.

What you should do next is dependent on what is expected of the employee in the long term.

Do you expect them to do something Math(y)? I don’t think they will be able to learn that in 2 months. OTOH, 99.9% of companies won’t need people who can write “new and complex” neural networks. It’s mostly about transforming the data well, using clever training methods and loss functions etc. That said, even to gain enough experience for this, they will need more than 2 months.

The general observation I have is that a lot of companies don’t have clear specs and requirements and they expect their Data Scientists to weave magic. Then they won’t hire senior people who can tell what is possible and what is not and put the entire pressure of expectations on cheap “fresher” employees, who generally have no real world experience (and many times will overestimate their capabilities, thus signing into something they cannot accomplish). This type of org structure is a house of cards. Hope this is not the case with your organization.

If the work is running different open source models and experiments, 2 months is a decent time to learn and apply. If your expectation lies in a domain that is well worked on and researched, 2 months should be a good time for a less experienced person to catch up. If the expected work is only this much, the hiring strategy is also not wrong, and this is the general capability of the talent available in the market. Groom them a bit for 2 months and everything should work out. If the expectation is any more open-ended, you have made a wrong hiring decision.

Q. Will the talent shortage in AI end soon?

Originally Answered Here: https://qr.ae/TSkAzo

There is no shortage of AI talent as of the end of 2019. Too many people in India know the basics of AI and can be trained to work. I am not sure if there is a way to see how many people apply for AI positions, but if there were, you could clearly see that way too many people apply for every AI/Data Science job (at least here in India). There was a time when just knowing some basics got you a job, but not anymore.

The problem has now shifted to retaining top talent, as too many people in the world are applying AI and working abroad is a big deal for most Indians. So the situation for most AI profiles is now a lot more like that of Software Development profiles.

There is a huge pool of talent and there is an arms race for retaining top talent.

Q. Will data science and machine learning get automated leading to lesser opportunities for data scientists by 2025 ?

This blog post is also an answer by me on Quora here.

Yes. (2025 is not the date I think it’s going to happen, but it’s inevitable and will happen in the near future.) They will be automated to a good extent. So will Software Developers, designers, manual workers, teachers, linguists, musicians, game developers, etc. There are already rudimentary projects like “Turning Design Mockups Into Code With Deep Learning”, which can turn a design mockup into HTML/CSS code; carpedm20/ENAS-pytorch, which can design neural networks without a Data Scientist; “Why AutoML Is Set To Become The Future Of Artificial Intelligence”; systems which can generate new characters for games; “Microsoft AI can translate Chinese to English just as accurately as humans”; [Baidu’s Deep Voice can clone speech with less than four seconds of training (Computing)](https://www.computing.co.uk/ctg/news/3028065/baidus-deep-voice-can-clone-speech-with-less-than-four-seconds-of-training); and multiple such projects.

Video Link : https://youtu.be/XOxxPcy5Gr4

AI will impact every job profile which exists as of now, Data Scientists being no exception, automating some or a lot of the work people spend their time on. For a long time, these will be <Man + Machine> systems rather than just a human working, before stuff is totally automated. So it’s not like everyone becomes redundant on day 1, but they will eventually.

Full automation of any field is going to take way longer than 2025 IMHO. That said, yes, far fewer people will be needed for the same task than today. Then what will people do, you ask? Newer, more complex tasks.

Automation is not a new phenomenon. Think about railway brakes a long time back:

Video Link Here: https://youtu.be/EEUkmP2nyxo

So much work was once needed just to stop the train. A lot less work is needed today to run/stop the train, and not just that: trains are slowly moving towards full automation, but right now they are in a <man + machine> stage. A lot of jobs will stay in this phase for some time before full automation kicks in. But unlike railways, which take generations to move from one stage of automation to another, AI is causing changes at a very high rate.

What is the effect on a general Data Scientist (or any white/blue collared worker for that matter):

  1. Adapt to AI. Automation has started, but AI-aided jobs will stay for a few more years than non-AI jobs. So while there may be no plain X jobs by 2025, X + AI jobs might be around till 2030.

  2. Things won’t be like earlier generations, where one skill learnt gets you a job for your entire life. One needs to be open to learning new skills and starting over in the middle of life.

  3. The average level of education needed will be high. Think of it: 50 years back, “High School” was all the education needed for most jobs. Now it’s somewhere between high school and graduation. Masters and research might look like the next frontier, but these degrees are too slow and broad. Coursera-like courses will become more important for picking up new skillsets. You might already see people doing that a lot.

  4. With more uncertainty in jobs, millennials will probably want to be less “spend-y” and more frugal. You can see this happening already (“6 reasons why more millennials aren’t buying homes”), and it will actually increase. AI is just a trend in a longer cycle of automation, and millennials are at a point in history where education and society followed old norms but automation has reached a point where jobs have become uncertain. Younger people will be smarter.

Q. Why are Kaggle Grandmasters in great demand?

Because they work really really hard on Applied Machine Learning (that’s what kaggle is) and thus have become really good at their job. If there is a hard Machine Learning problem with Billion dollars behind it, with a high probability, Kaggle grandmasters will be able to solve it better than an average joe.

Working consistently on something for years makes one a master of the art (see Outliers (book) on Wikipedia).

People really good at their job (say in top 10%) are really sought after in any field, not just Machine Learning. They drive the innovation, solve open ended problems and hence they get the rewards.

Q. How do I become a data scientist in 2020 in India by self-teaching?

I originally answered the question on Quora here: https://qr.ae/TmYc20

I don’t think the path to being Data Scientist in India is different from being a Data Scientist abroad. It is in fact 99% same if you live in a place with good internet connectivity and understand English.

You have to clarify to yourself the following:

  1. As a self-learner, determine if you like learning from books or like learning in virtual classrooms.

  2. Either way, you will have to dedicate around 10000 hours of hard work to learning and practical exercises. It looks simple, but most people lose out here. They do not put in enough effort.

  3. Point 2 above requires self-drive. It is not easy. A proxy for that is buying online courses / books. The money you put into them gives you (and probably your parents, who drive most people towards learning stuff) skin in the game, and you won’t want to lose out. If you aren’t that wealthy, you will have to push yourself.

  4. You have to invest in a good computer (costing around 60000 INR).

  5. You need to have good internet.

That is all. There are enough resources available for free to learn and make yourself a good data scientist.

  1. Learn Python programming. This is first and foremost. There are many free books and online courses to learn Python (sometimes Python for Data Science specifically) if you aren’t enrolling in a paid course. When you are doing a course or reading a book, don’t just read it; force yourself into using Python after you are done learning. If book exercises look boring to you, join a startup near you as a free intern and do some coding for them. Forcing yourself to write 1000 lines of code (made-up number) is very important.

  2. Make yourself comfortable with the Python ecosystem: Numpy, scikit-learn, Keras, PyTorch, Pandas. There are courses and free books available for these. Look at their ratings on Goodreads (for books) and at course stars. Best is to just do the famous courses: [Applied Data Science with Python Coursera](https://www.coursera.org/specializations/data-science-python) , http://deeplearning.ai and [Machine Learning Coursera](https://www.coursera.org/learn/machine-learning) . If you are looking for free books, check out some lists I curated on the ParallelDots blog: https://blog.paralleldots.com/data-science/50-must-read-free-books-for-every-data-science-enthusiast/ , https://blog.paralleldots.com/data-science/24-best-and-free-books-to-understand-machine-learning/ , https://blog.paralleldots.com/data-science/deep-learning/free-resources-deep-learning/ , https://blog.paralleldots.com/data-science/nlp/free-natural-language-processing-resources/ . Force yourself to write more and more code. Do problem sets from courses and exercises from books, work for free in a startup, enter Analytics Vidhya competitions or whatever suits you. Unless you practice after you learn, it’s not going to work.

  3. This much will make you employable by many companies. If you still want to push to learn more, you will have to start reading research literature. This is not entirely necessary, but if you still want to, you should get yourself associated with some decent university group (or a research startup) where novel problem statements are being solved. This step is really hard without talking to peers, but if you are really self-driven, Twitter is a good place to follow people and handles to learn about research while being outside the research world (see “Becoming an Independent Researcher and getting published in ICLR with spotlight”).

Q. What signs will tell you that your company is not taking data science seriously?

Originally answered here: https://qr.ae/pN28BW

Note: I am assuming that this question was asked by an applicant who wants to take a job as a Data Scientist and is looking for ways to judge a potential employer, that is, for signs that joining might be risky. There might be other ways to look at it, but this seems the most plausible source of the question to me.

Well, if your company has hired people as Data Scientists, there is definitely a vision to derive value out of data. Funds were allocated, someone actually made the effort to hire Data Scientists and (hopefully) a set of problems to solve was defined. If your company is not doing all 3 of these (allocating funds, hiring a data science team and defining problems to solve), you can safely assume it’s not going to implement Data Science in the near future.

However, something being in the vision is different from getting executed successfully. Sometimes, the company might be looking at Data Science, but there might be some antipatterns in execution that make life of Data Scientists hell and probability of failing high (Similar risks as being a Data Scientist in a company which doesn’t take Data Science seriously). Execution has its own challenges. Some other issues I see :

  1. Not hiring Data Scientists at senior positions and allocating Data Scientists to work under senior technology leaders. Although this might work sometimes, a senior traditional technology executive might take decisions that are not really the best to run a Data Science team. So for example a senior backend developer put in charge of Data Science team will often ask Data Scientists working under him to accommodate additional constraints in the data pipelines they implement. For example, they might be (for no reason) asked to work within the box of software architecture which their lead has designed, reducing their efficiency.

  2. The sales team not taking feasibility into account is another huge problem. Basically, the Data Science team should be involved not just later, in the development of the product/service being sold, but also earlier, while the sale is being made, to make sure something feasible is being sold. They can also suggest applications of data at sales time to make the pitch better. It is a very common pattern that the sales people of an AI company sell something that is not even feasible.

  3. The third is expecting Data Science teams to work without having/collecting enough data. This is often a result of 1 or 2: either data wasn’t collected because some other tech need was being optimized, or wrong expectations have been set with a client about the performance a small amount of data can deliver. Data Science is not magic; you need enough data to derive insights or train algorithms on it.

  4. A spineless leader for Data Science team. Well a spineless leader is always bad as they just pass the pressure downwards. People leave bosses not companies. A lot of the work of senior Data Science folks is expectation management and if they cannot do so, 1,2 and 3 mentioned will haunt the team’s work.

Q. After 5 years of experience in an irrelevant field, do I have to start as a fresher in the AI field?

Originally Answered here: https://www.quora.com/After-5-years-of-experience-in-an-irrelevant-field-do-I-have-to-start-as-a-fresher-in-the-AI-field/answer/Muktabh-Mayank

Let’s think of it in two different ways :

  1. Do you think that someone who has worked for 5 years in AI will be less or more equipped to solve problems using AI algorithms than someone with 5 years of experience in a different domain? What are the odds of a person not experienced in a domain being better than an experienced person? What are the odds of an experienced guitar player playing the cello better than an experienced cello player? There is a very small but non-zero probability, but no one will be willing to take a bet on such a small chance. Would you be willing? Mostly no. That is the point: an inexperienced person is a much less safe bet than an experienced person. People put less money on less safe bets and more money on safe bets. One doesn’t need to start as a fresher if one has no experience in a domain, but one will find it hard to find people who will trust them for the job in lieu of an experienced AI engineer. However, a field is almost always open for entry-level employees, so anyone can get started in a field anytime they want.

  2. Would you be willing to hire a person with 5 years of experience in the AI field, and none in the field you work in, as a senior employee in your field? If you are a backend developer with 5 years of experience, for example, the AI engineer will almost surely be a worse backend developer than you are. It’s only fair that they start working as a junior employee when they join your field, despite their experience. I hope you can imagine why the opposite will be true as well.

Q. If I have a BCA degree, how can I become a successful data scientist in India?

Originally Answered here: https://www.quora.com/If-I-have-a-BCA-degree-how-can-I-become-a-successful-data-scientist-in-India/answer/Muktabh-Mayank

This question has hidden context which is not directly visible to someone trying to answer it. I will have to break it down as two possible states you might be in while asking this question, read accordingly:

A. Your curriculum hasn’t got enough Data Science/Mathematics/Machine Learning theory and you want to cover these subjects:

It has now become very easy to access quality Data Science courses and lectures if you are self-driven. Join a course specialization on Coursera, edX, deeplearning.ai or fast.ai and push yourself to complete the course well, not just to pass tests. If you have some money, you can buy certifications; otherwise audit these courses for free. Read good Data Science and Machine Learning books (listed here: [50 Must-Read Free Books For Every Data Scientist in 2020 ParallelDots](https://blog.paralleldots.com/data-science/50-must-read-free-books-for-every-data-science-enthusiast/) and [24 Best (and Free) Books To Understand Machine Learning ParallelDots](https://blog.paralleldots.com/data-science/24-best-and-free-books-to-understand-machine-learning/)) and try solving exercises from these courses and books. Make sure you work on practical problems, not plain theory. Data Science is, and continues to be, a very empirical field, and practical knowledge is very important. These courses/books give you enough knowledge for any potential entry-level Data Science job.

B. You think you know Data Science well enough to work in a firm but BCA degree is not providing you enough credential to get a job:

This is the second possible scenario I can think of in which you might have asked the question. This is actually a hard scenario and can be thought of as a chicken-and-egg problem. “Just a BCA doesn’t get you a Data Science job” and “Data Science job experience is required to make your CV credentialed enough to make your degree unimportant”: this is a loop. There is just one way to break out of chicken-and-egg loops: put an asymmetric proposition on the table. Offer to work for free or for minimal money with universities/companies which can get you a good project and a credential on your CV. Look for CS professors working on AI in an IIT/NIT/IIIT near you and ask them if they have any internship projects. Invest 1 or 2 quarters, until you have a CV which is not limited by the scope of your degree, and you are then good to go for any job application.

Q. As a data scientist, I have been with my current company for 5 yrs with flat growth and limited opportunity. For a better career move, shall I accept a job with better salary or a company sponsored PhD from one of the India’s prestigious institutes?

Originally answered here : https://qr.ae/pNVwr5

This is opinion, not based on any data.

A PhD is as good as the topic and the advisor. Unless you determine how good these are, it’s very hard to evaluate the PhD offer. Look at how well known your supervisor is and what their expertise is to understand if the research is worth it.

In case this is a blind offer, that is, you will be told the subject of the PhD only after joining the program {I can imagine something like that happening in India}, I would not take the offer and would instead apply for an independent PhD abroad. PhDs are not costly anywhere, and a sponsored PhD means nothing IMHO!

Having a career where you are not offered challenging work is a big risk, as with every passing year you slip further behind what is expected of a professional with your experience. One year of experience repeated 5 times is not the same as 5 years of experience. Switch! Evaluate the PhD at hand, or switch to another job, whichever you feel is better.

Q. My company has been tasked with building a machine learning solution but the client can’t share real data due to privacy issues. They just expect to plug in their data and the system will work. How would you go about handling this?

Originally Answered here : https://qr.ae/pNzckr

If you are an employee {data scientist who needs to do this} of a company whose business people have taken up projects on these terms.. run away, run and don’t look back. Tomorrow, they will ask you to fetch a unicorn out of a Fedora, what will you do then ? LOL !

You know that it is not going to work ! Forget Machine Learning, you cannot even write simple rules on data without analyzing and viewing it.

If you are a person from the BD team asking this question, try to set the client’s expectations. Tell them about the No free lunch theorem and that no one (not just your team) can solve the problem without looking at data. If they cannot share data, ask whether your employees can work on the data on the client’s own systems, remotely or, if that is not possible, in person. You can bill your client extra for these facilities. After all, if you are high maintenance, you gotta pay more ! There is always a limit to what is feasible and what can be done.

Q. Do data scientists code a lot?

Originally answered here : https://qr.ae/pNZr2z

Yes.

The two most important parts of my job as a Data Scientist are to 1. Code and 2. Read.

Q. I have 2 years of experience working in a product organisation, but I am not liking work now. Is it a wise decision to leave my current job and go for an ML and AI diploma by IIIT Bangalore for a year?

Originally answered here : https://qr.ae/pNaYF2

And you are sure you would like your work after the diploma ? Or are you just investing a couple of Lakh rupees to get into your comfort zone (or not being 100% committed to your current job, thinking the best is yet to come) ? Many Indians do that with respect to education (“I have a Chemical Engineering coursework, what is the need to get involved, I will do an MBA later”, “MBA coursework is just bad, the real stuff is the job”, “my job is boring, real stuff is the cool job XYZ speaks about on LinkedIn”). I am saying so because many Indians, including me, often think this way, and to be honest it’s just escapism. I am not saying this is the case with you btw, just telling you how many of us (including me) are programmed to think.

Probably a better introspection (before taking the plunge) would be: “why don’t you like your current job?”, “what has changed since you liked it and accepted the offer letter?”, “is it the company environment you don’t like or the nature of the job itself?”. If you have developed an aversion to company practices, change the organization. If you think this is not the field for you, think about what exactly makes you dislike the field and whether the next field (ML/AI) is free of that flaw.

Most jobs look interesting and cool from the outside but the day to day work is as boring as others. ML has model/hyperparameter tuning slog overs before the final pitch hitting of publication/production. You will have to make a decision thinking about and knowing yourself and not just looking at fancy stuff on the internet / social media. People are trying to sell you their product/ their course/ their framework/ their ideology all the time. The entire point of marketing is that you feel good about the thing and buy it. You have to decide whether the need is actually there.

If you think that you will dig your heels into the field ML/AI and want to build a career there itself, sure, go for it. Just make sure you are not being nudged into it and you want the thing to happen genuinely.

Q. I’m 40 years old and currently a homemaker. I really, really want to be a data scientist. Is it too late? If not, where can I start?

Originally answered here : https://qr.ae/pN9Nka

It’s never too late. You can never be sure, but as long as you are determined and understand exactly what your skills and shortcomings are, anything is possible.

Start by learning programming. And then start some Data Science courses. Put a small amount of money into “get started” type of courses, say 360 Rupee high rated courses on Udemy so that you have some skin in the game.

Once you know the basics, the good thing with Data Science is that everything is available to the interested soul, open and free on the internet. The community is welcoming and there is little (if any) gatekeeping as of now.

There are women I know (and work with) who were non-programmers till very late in their careers, started late in the field, took care of kids along with learning, and then took a job and are doing really well. Women are strong and very good at multitasking and can make it along with all other responsibilities. So if one can do it, all of them can.

Q. How can I stand out as a machine learning/artificial intelligence engineer in India ?

Originally answered here : https://qr.ae/pNWgCp

Good Question.

I generally get questions asking for advice on “how to get a job”, which is very different from “standing out”. A compilation of my Quora answers on how to get one’s first data science job is here: Views On Fresher Data Science Jobs.

I am assuming that getting a job is not a problem for you, given you already have experience of Data Science work in a professional internship, and that you rather want to build a great work profile and reputation as a Data Scientist.

Basically, the aim is to get to the top, be known and reputed. Or as the authors of Freakonomics call it, you want to have a “tournament” like lifestyle in Data Science. Somewhat the type of life a beauty pageant winner or a drug dealer has: being well known, rising above the normal. (That’s the name of the chapter in the book describing this lifestyle)

While a Data Science “tournament” is not as cut throat (Kaggle enthusiasts might disagree! ) as, say, beauty competitions or thuggery, where there is no barrier to entry, it does involve taking substantial risks. In thuggery or beauty pageants, there is a risk of absolute ruin: you either become a beauty queen or a waitress. It’s not as stark a contrast in Data Science. Depending upon your financial or family situation, these risks might be too great and meh for you, and you should choose to run the tournament lifestyle only if you are sure you want to take such risks. There is a reason you might feel rich people are more well known: they often can take risks relatively poor people cannot.

So, what risks can you take to become really rich and famous :

  1. Take a hard Data Science project at work. Something that has a great probability of failure. You only learn if you work on hard projects (true for me at least). You run the risk of losing promotion to a guy who took an easy project and got a raise because he wrote sorted(important_for_business_list) while you were figuring out how to solve a hard problem.

  2. Start/Join an open source project (or Kaggle Competitions) as a contributor. Work your a** off after coming back from office, spoil your family life. If the project gets famous, you are famous too. Else no one knows you, you have your job and a family you neglected earlier to pamper !

  3. Join a startup, take some equity, get paid way less than your market salary to solve a Data Science problem much harder than you would have solved in your day to day job. If everything aligns and the startup is successful, you are both rich and famous, else you are back to your regular desk job.

  4. Join a Data Science PhD program and work hard for 4–6 years to discover something that impacts society. If your research changes the world, say like Ian Goodfellow’s or Chelsea Finn’s or Matei Zaharia’s did, you are super famous. If your research is meh, you are just one out of the millions getting higher education in Data Science.

The alternative is to get a cushion-y data science job and wait for your experience to compound over, say, 10–20 years. That is the life most people (and most Data Scientists) will live.

There is no sure shot way to get famous and stand out you see, you can just take risks and see how it goes ! It’s the red pill way or the blue pill way. The red pill way is harder, takes longer and might not always work. That said, the risks one needs to take to get on top in Data Science are much less than say gunfights with other gangsters.

Q. I am planning to acquire new skills. I am confused whether to go with Blockchain, AI/ML or Cloud platforms. I have aspirations of migrating to the UK. Which technology would help me do that?

Originally answered here : https://qr.ae/pNOT3z

Pick one randomly and start. You will eventually either like it or not like it and switch to another skill. Don’t let this decision about what to choose suck any of your time, one week/month of extra skill in a space you don’t like isn’t going to harm you. In your early 20s and mid 20s, the purpose of life is to explore and not stick, you can always stick to a field later when you have responsibilities. The answer to “Which is better skill out of X,Y and Z ?” is whichever skill you can get the best at.

If your aspiration is to migrate to the UK (which btw you should think about why you want to do), the right way is to figure out what jobs will be open in the UK for a non-citizen in the coming years. You can almost be sure that software skills are not the best idea, as most software jobs are in the US. Think of analyst, supply chain and other specializations that Brexit will create a need for.

Q. I am from a non-technical background. Is competitive programming required for data science freshers to get a job in any company?

Originally answered here : https://qr.ae/pNOTZD

Nope. Competitive Programming has no application in Data Science.

All that participating (and having a good score) in competitive programming means is that you can solve unseen problems with code. That’s a good thing. But if you don’t do competitive programming, it doesn’t mean you cannot solve problems.

Q. Is there a shortage of Data Scientists?

Originally answered here : https://qr.ae/pNbcZE

As of Oct 2020, No.

There are too many Data Scientists, maybe more than there are job opportunities. The education industry has been able to cash in on the trend, using 2010–2015 headlines like “Data Scientist: The Sexiest Job of the 21st Century” and “Building data science teams” to attract thousands and thousands of interested people and turn them into a massive Data Science workforce. These articles were not wrong: there was a time in the 2010s when Data Scientists were rare and people could do really well just on the epistemic advantage of knowing about the field. But now Data Science is just like any other field: a lot of demand for the top 1–10% and then “from/to everyone according to their ability”. So you don’t just need to know Data Science for success like in the past, you need to excel in it.

There was a creative disruption in the early-to-mid 2010s; it’s now commonplace. It was a very good time then to invest in a Data Science career just for the returns one got for being aware. Data Science is now a relatively safe but structured career field like Software Engineering and Product Management. The hype is now reaching a more general audience, so India at least has new engineering degrees focused on Data Science, whereas until now most people learned the skills through executive courses, and before those through MOOCs.

You might also want to read my answer here on why it’s more advantageous to place bets on a relatively emerging field than on something more commonplace (lakhs means multiples of 100,000 Indian Rupees; lakhs per month is basically a good salary in India) :

Muktabh Mayank’s answer to Why do people run towards latest technologies in the IT sector when people working on C and C++ are also earning lakhs per month?

Q. Why should an AI and ML engineer learn C/C++? Isn’t Python enough?

Originally answered here : https://qr.ae/pNk1Mr

For most practical purposes, you don’t need C/C++ to work in AI/ML.

There are some profiles where you need C/C++, such as writing/enhancing infrastructure and APIs like TensorFlow / PyTorch / scikit-learn / XGBoost and the like. But just using these frameworks doesn’t require any knowledge of C/C++.

Q. Can someone become a data scientist with poor or average programming skills?

Originally answered here : https://qr.ae/pNwSpy

It depends upon what type of Data Science role one is looking into. If you have not realized it, a very broad spectrum of jobs is advertised as “Data Scientist”. It’s like “Software Engineer”: different companies need different types of people for the role.

There are some profiles where you can make do without programming knowledge, for example by using the Microsoft Analytics stack, MS Excel or SQL. These are profiles which were typically called Business (or Functional) Analysts earlier, but have now been rebranded as “Data Scientist” roles. Even many “big data” analysts (or Data Scientists) work with SQL on Hadoop/Spark.
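To give a feel for what such a SQL-centred analyst role involves, here is a minimal sketch of a typical aggregate query. Python’s built-in sqlite3 module is used only as a convenient way to run the SQL; the `sales` table, its columns and the numbers are all made up for illustration, and in a real job the query would run against the company’s own database or a SQL layer on Hadoop/Spark.

```python
import sqlite3

# Hypothetical sales records; table and column names are invented for this sketch.
ROWS = [
    ("North", "2020-01", 1200.0),
    ("North", "2020-02", 1500.0),
    ("South", "2020-01", 800.0),
    ("South", "2020-02", 950.0),
    ("East", "2020-01", 400.0),
]


def revenue_by_region(rows):
    """Return (region, total, average) tuples, highest total first."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE sales (region TEXT, month TEXT, revenue REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        # The SQL below is the part an analyst would actually write.
        query = """
            SELECT region, SUM(revenue) AS total, AVG(revenue) AS average
            FROM sales
            GROUP BY region
            ORDER BY total DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for region, total, average in revenue_by_region(ROWS):
        print(region, total, average)
```

The only part that matters for the analyst profile is the query string; everything around it is scaffolding to make the example self-contained.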

If you are more on the statistical side of “Data Scientist” spectrum, you can probably manage with learning a subset of R. You don’t need to do too much programming here as long as you know about what methods you are applying.

If you are talking about Machine Learning Engineering or complex Data Analysis with code, despite the many no-code solutions now in place, being good at programming is still kind of mandatory. Maybe in the future you will not need to write programs, but not as of now.

Q. Is Python good enough for data science, or should we learn R as well for analysis?

Originally answered here: https://qr.ae/pNVKCT

If you are comfortable with Python, there is almost no need to learn R for Data Science. Literally everything you can do with R can be done in Python {the converse is not true, Python can do much more}.

There are just two exceptions where you would want to learn R along with Python :

  1. If your workplace demands it. If you join a company where everyone uses R, you are out of options.

  2. If you are working in a niche field where most people use R. In such a case all relevant libraries for your work will be written in R and working in Python will involve too much reinventing the wheel.
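As a small illustration of the claim that routine statistical work is well within Python’s reach, here is a minimal sketch using only the standard library. It computes summary statistics and a simple least-squares line, roughly what one might reach for `summary()` and `lm(y ~ x)` for in R. The function names and the sample data are hypothetical, and in practice most people would use libraries such as pandas or statsmodels rather than hand-rolling this.

```python
import statistics


def describe(values):
    """Basic summary statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Made-up data for illustration only.
    print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
    print(fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))
```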