big data Tag Archives - General Assembly Blog

Is a Data Science Career a Good Fit for You? Here’s What You Need to Know.

By

So you’re thinking of a career in data science, but you’re not sure if it’s the right fit for you.  Here is your data science guide, where we break down what data science is, day in the life of a data scientist, tips from GA’s data science alumni, career opportunities, and much more. 

WHAT IS DATA SCIENCE? 

According to Berkeley, data science is the ability to take data, understand it, extract value from it, visualize it, and communicate the findings. The term “data science” was coined in 2008 when companies realized the need for data professionals to analyze immense amounts of data. 

Continue reading

How to Get a Job in Data Science Fast

By

You want to get a data science job fast. Obviously, no one wants one to get a job slowly. But the time it takes to find a job is relative to you and your situation. When I was seeking my first data science job, I had normal just Kevin bills and things to budget for, plus a growing family who was hoping I’d get a job fast. This was different from some of my classmates, while others had their own versions of why they needed a job fast, too. I believe that when writing a how-to guide on getting a data science job quickly, we should really acknowledge that we’re talking about getting you, the reader, a job faster. Throughout this article, we’ll discuss how to get a job as a data scientist faster than you might otherwise, all things considered.

Getting a job faster is not an easy task in any industry, and getting a job faster as a data scientist has additional encumbrances. Some jobs, extremely well-paying jobs, require a nebulous skill set that most adults could acquire after several years in the professional working world. Data science is not one of those jobs. For all the talk about what a data scientist actually does, there’s a definite understanding that the set of skills necessary to successfully execute any version of the job are markedly technical, a bit esoteric, and specialized. This has pros and cons, which we’ll discuss. The community of people who aspire to join this field, as well as people already in the field, is fairly narrow which also has pros and cons.

Throughout this article, we’ll cover two main ways to speed up the time it takes to get a data science job: becoming aware of the wealth of opportunities, and increasing the likelihood that you could be considered employable.

Becoming Aware of the Wealth of Opportunities

Data science is a growing, in-demand field. See for yourself in Camm, Bowers, and Davenport’s article, “The Recession’s Impact on Analytics and Data Science” and “Why data scientist is the most promising job of 2019” by Alison DeNisco Rayome. It’s no secret however that these reports often only consider formal data science job board posts. You may have heard or already know that there exists a hidden job market. It stands to reason that if this hidden job market exists, there may also be a number of companies who have not identified their need for a data scientist yet, but likely need some portion of data science work. Here’s your action plan, assuming you already have the requisite skills to be a data scientist:

1. Find a company local to your region. This is easier if you know someone at that company, but if you don’t know anyone, just think through the industries that you’d like to build a career in. Search for several companies in those fields and consider a list of problems that might be faced by that organization, or even those industries at large.

2. Do some data work. Try to keep the scope of the project limited to something you could accomplish in one to two weekends. The idea here is not to create a thesis on some topic, but rather to add to your list of projects you can comfortably talk about in a future interview. This also does not have to be groundbreaking, bleeding edge work. Planning, setting up, and executing a hypothesis test for a company who is considering two discount rates for an upcoming sale will give you a ton more fodder for interviews over a half-baked computer vision model with no clear deliverable or impact on a business.

3. You have now done data science work. If you didn’t charge money for your services on the first run, shame on you. Charge more next time.

4. Repeat this process. The nice thing about these mini projects is that you can queue up your next potential projects while you execute the work for your current project at the same time.

Alternatively, you could consider jobs that are what I call the “yeah but there’s this thing…” type jobs. For example, let’s say you’re setting up a database for a non-profit and really that’s all they need. The thing is… it’s really your friend’s non-profit, all they need is their website to log some info into a database, and they can’t pay you. Of course you should not do things that compromise your morals or leave you feeling as though you’ve lowered your self worth in any way. Of course you’d help out your friend. Of course you would love some experience setting up a database, even if you don’t get to play with big data. Does that mean that you need to explain all of those in your next job interview? Of course not! Take the job and continue to interview for others. Do work as a data engineer. Almost everyone’s jobs have a “yeah but” element to them; it’s about whether the role will help increase your likelihood of being considered employable in the future.

Increasing the Likelihood That You Could Be Considered Employable

Thought experiment: a CTO comes to you with a vague list of Python libraries, deep learning frameworks, and several models which seem relevant to some problems your company is facing and tasks you with finding someone who can help solve those issues. Who would you turn to if you had to pick a partner in this scenario? I’ll give you a hint — you picked the person who satisfied three, maybe four criteria on what you and that team are capable of.

Recruiting in the real world is no different. Recruiters are mitigating their risk of hiring someone that won’t be able to perform the duties of the position. The way they execute is by figuring out the skills (usually indicated by demonstrated use of a particular library) necessary for the position, then finding the person who seems like they can execute on the highest number of the listed skills. In other words, a recruiter is looking to check a lot of boxes that limit the risk of you as a candidate. As a candidate, the mindset shift you need to come to terms with is that they want and need to hire someone. The recruiter is trying to find the lowest risk person, because the CTO likely has some sort of bearing on that recruiter’s position. You need to basically become the least risky hire, which makes you the best hire, amongst a pool of candidates.

There are several ways to check these boxes if you’re the recruiter. The first is obvious: find out where a group of people who successfully complete the functions of the job were trained, and then hire them. In data science, we see many candidates with training from a bootcamp, a master’s program, or PhDs. Does that mean that you need these degrees to successfully perform the function of the job? I’d argue no — it just means that people who are capable of attaining those relevant degrees are less risky to hire. Attending General Assembly is a fantastic way to show that you have acquired the relevant skills for the job.

Instead of having your resume alone speak to your skill, you can have someone in your network speak to your skills. Building a community of people who recognize your value in the field is incredibly powerful. While joining other pre-built networks is great, and opens doors to new opportunities, I’ve personally found that the communities I co-created are the strongest for me when it comes to finding a job as a data scientist. These have taken two forms: natural communities (making friends), and curated communities. Natural communities are your coworkers, friends, and fellow classmates. They become your community who can eventually speak up and advocate for you when you’re checking off those boxes. Curated communities might be a Meetup group that gathers once a month to talk about machine learning, or an email newsletter of interesting papers on Arxiv, or a Slack group you start with former classmates and data scientists you meet in the industry. In my opinion, the channel matters less, as long as your community is in a similar space as you.

Once you have the community, you can rely on them to pass things your way and you can do the same. Another benefit of General Assembly is its focus on turning thinkers into a community of creators. It’s almost guaranteed that someone in your cohort, or at a workshop or event has a similar interest as you. I’ve made contacts that passed alongside gig opportunities, and I’ve met my cofounder inside the walls of General Assembly! It’s all there, just waiting for you to act.

Regardless of what your job hunt looks like, it’s important to remember that it’s your job hunt. You might be looking for a side gig to last while you live nomadically, a job that’s a stepping stone, or a new career as a data scientist. You might approach the job hunt with a six-pack of post-graduate degrees; you might be switching from a dead end role or industry, or you might be trying out a machine learning bootcamp after finishing your PhD. Regardless of your unique situation, you’ll get a job in data science fast as long as you acknowledge where you’re currently at, and work ridiculously hard to move forward.

What is Data Science?

By

It’s been anointed “the sexiest job of the 21st century”, companies are rushing to invest billions of dollars into it, and it’s going to change the world — but what do people mean when they mention “data science”? There’s been a lot of hype about data science and deservedly so, but the excitement has helped obfuscate the fundamental identity of the field. Anyone looking to involve themselves in data science needs to understand what it actually is and is not.

In this article, we’ll lay out a deep definition of the field, complete descriptions of the data science workflow, and data science tasks used in the real world. We hope that any would-be entrants into this line of work will come away reading this article with a nuanced understanding of data science that can help them decide to enter and navigate this exciting line of work.

So What Actually is Data Science?

A quick definition of data science might be articulated as an interdisciplinary field that primarily uses statistics and computer programming to derive insights from and base decisions from a collection of information represented as numerical figures. The “science” part in data science is quite apt because data science very much follows a scientific process that involves formulating a hypothesis and using a specific toolset to confirm or dispel that hypothesis. At the end of the day, data science is about turning a problem into a question and a question into an answer and/or solution.

Tackling the meaning of data science also means interrogating the meaning of data. Data can be easily described as “information encoded as numbers” but that doesn’t tell us why it’s important. The value of data stems from the notion that data is a tangible manifestation of the intangible. Data provides solid support to aid our interpretations of the world. For example, a weather app can tell you it’s cold outside but telling you that the temperature is 38 degrees fahrenheit provides you with a stronger and specific understanding of the weather.

Data comes in two forms: qualitative and quantitative.

Qualitative data is categorical data that does not naturally come in the form of numbers, such as demographic labels that you can select on a census form to indicate gender, state, and ethnicity.

Quantitative data is numerical data that can be processed through mathematical functions; for example stock prices, sports stats, and biometric information.

Quantitative can be subdivided into smaller categories such as ordinal, discrete, and continuous.

Ordinal: A sort of qualitative and quantitative hybrid variable in which the values have a hierarchical ranking. Any sort of star rating system of reviews is a perfect example of this; we know that a four-star review is greater than a three-star review, but can’t say for sure that a four- star review is twice as good as a two-star review.

Discrete: These are countable and finite values that often appear in the form of integers. Examples include number of franchises owned by a company and number of votes cast in an election. It’s important to remember discrete variables have a finite range of numbers and can never be negative.

Continuous: Unlike discrete variables, continuous can appear in decimal form and have an infinite range of possibilities. Things like company profit, temperature, and weight can all be described as continuous. 

What Does Data Science Look Like?

Now that we’ve established a base understanding of data science, it’s time to delve into what data science actually looks like. To answer this question, we need to go over the data science workflow, which encapsulates what a data science project looks like from start to finish. We’ll touch on typical questions at the heart of data science projects and then examine an example data science workflow to see how data science was used to achieve success.

The Data Science Checklist

A good data science project is one that satisfies the following criteria:

Specificity: Derive a hypothesis and/or question that’s specific and to the point. Having a vague approach can often lead to a waste of time with no end product.

Attainability: Can your questions be answered? Do you have access to the required data? It’s easy to come up with an interesting question but if it can’t be answered then it has no value. The same goes for data, which is only useful if you can get your hands on it.

Measurability: Can what you’re applying data science to be quantified? Can the problem you’re addressing be represented in numerical form? Are there quantifiable benchmarks for success? 

As previously mentioned, a core aspect of data science is the process of deriving a question, especially one that is specific and achievable. Typical data science questions ask things like, does X predict Y and what are the distinct groups in our data? To get a sense of data science questions, let’s take a look at some business-world-appropriate ones:

  • What is the likelihood that a customer will buy this product?
  • Did we observe an increase in sales after implementing a new policy?
  • Is this a good or bad review?
  • How much demand will there be for my service tomorrow?
  • Is this the cheapest way to deliver our goods?
  • Is there a better way to segment our marketing strategies?
  • What groups of products are customers purchasing together?
  • Can we automate this simple yes/no decision?

All eight of these questions are excellent examples of how businesses use data science to advance themselves. Each question addresses a problem or issue in a way that can be answered using data science.

The Data Science Workflow

Once we’ve established our hypothesis and questions, we can now move onto what I like to call the data science workflow, a step-by-step description of a typical data science project process.

After asking a question, the next steps are:

  1. Get and Understand the Data. We obviously need to acquire data for our project, but sometimes that can be more difficult than expected if you need to scrape for it or if privacy issues are involved. Make sure you understand how the data was sampled and the population it represents. This will be crucial in the interpretation of your results.
  1. Data Cleaning and Exploration. The dirty secret of data science is that data is often quite dirty so you can expect to do significant cleaning which often involves constructing your variables in a way that makes your project doable. Get to know your data through exploratory data analysis. Establish a base understanding of the patterns in your dataset through charts and graphs.
  1. Modeling. This represents the main course of the data science process; it’s where you get to use the fancy powerful tools. In this part, you build a model that can help you answer a question such as can we predict future sales of a product from your dataset.
  1. Presentation. Now it’s time to present the results of your findings. Did you confirm or dispel your hypothesis? What are the answers to the questions you started off with? How do your results advance our understanding of the issue at hand? Articulate your project in a clear and concise manner that makes it digestible for your audience, which could be another team in your company or your company’s executives.

Data Science Workflow Example: Predicting Neonatal Infection

Now let’s parse out an example of how data science can affect meaningful real-world impact, taken from the book Big Data: A Revolution That Will Transform How We Live, Work, and Think.

We start with a problem: Children born prematurely are at high risk of developing infections, many of which are not detected until after a child is sick.

Then we turn that problem into a question: Can we detect patterns in the data that accurately predict infection before it occurs?

Next, we gather relevant data: variables such as heart rate, respiration rate, blood pressure, and more.

Then we decide on the appropriate tool: a machine learning model that uses past data to predict future outcomes.

Finally, what impact do our methods have? The model is able to predict the onset of infection before symptoms appear, thus allowing doctors to administer treatment earlier in the infection process and increasing the chances of survival for patients.

This is a fantastic example of data science in action because every step in the process has a clear and easily understandable function towards a beneficial outcome.

Data Science Tasks

Data scientists are basically Swiss Army knives, in that they possess a wide range of abilities — it’s why they’re so valuable. Let’s go over the specific tasks that data scientists typically perform on the job.

Data acquisition: For data scientists, this usually involves querying databases set up by their companies to provide easy access to reams of data. Data scientists frequently write SQL queries to retrieve data. Outside of querying databases, data scientists can use APIs or web scraping to acquire data.

Data cleaning: We touched on this before, but it can’t be emphasized enough that data cleaning will take up the vast majority of your time. Cleaning oftens means dealing with null values, dropping irrelevant variables, and feature engineering which means transforming data in a way so that it can be processed by a model.

Data visualization: Crafting and presenting visually appealing and understandable charts is a hugely valuable skill. Visualization has an uncanny ability to communicate important bits of information from a mass of data. Good data scientists will use data visualization to help themselves and their audiences better understand what’s going on.

Statistical analysis: Statistical tests are used to confirm and/or dispel a data scientist’s hypothesis. A t-test or chi-square are used to evaluate the existence of certain relationships. A/B testing is a popular use case of statistical analysis; if a team wants to know which of two website designs leads to more clicks, then an A/B test is the right solution.

Machine learning: This is where data scientists use models that make predictions based on past observations. If a bank wants to know which customers are likely to pay back loans, then they can use a machine learning model trained on past loans to answer that question.

Computer science: Data scientists need adequate computer programming skills because many of the tasks they undertake involve writing code. In addition, some data science roles require data scientists to function as software engineers because data scientists have to implement their methodologies into their company’s backend servers.

Communication: You can be a math and computer whiz, but if you can’t explain your work to a novice audience, your talents might as well be useless. A great data scientist can distill digestible insights from complex analyses for a non-technical audience, translating how a p-value or correlation score is relevant to a part of the company’s business. If your company is going to make a potentially costly or lucrative decision based on your data science work, then it’s incumbent on you to make sure they understand your process and results as much as possible.

Conclusion

We hope this article helped to demystify this exciting and increasingly important line of work. It’s pertinent to anyone who’s curious about data science — whether it’s a college student or an executive thinking about hiring a data science team — that they understand what this field is about and what it can and cannot do.

The Skills and Tools Every Data Scientist Must Master

By

women of color in tech

Photo by WOC in Tech.

“Data scientist” is one of today’s hottest jobs.

In fact, Glassdoor calls it the best job of 2017, with a median base salary of $110,000. This fact shouldn’t be big news. In 2011, McKinsey predicted there would be a shortage of 1.5 million managers and analysts “with the know-how to use the analysis of big data to make effective decisions.” Today, there are more than 38,000 data scientist positions listed on Glassdoor.com.

It makes perfect sense that this job is both new and popular, since every move you make online is actively creating data somewhere for something. Someone has to make sense of that data and discover trends in the data to see if the data is useful. That is the job of the data scientist. But how does the data scientist go about the job? Here are the three skills and three tools that every data scientist should master.

Continue reading

Announcing General Assembly’s New Data Science Immersive

By

DataImmersive_EmailArt_560x350_v1

Data science is “one of the hottest and best-paid professions in the U.S. More than ever, companies need analytical minds who can compile data, analyze it, and drive everything from marketing forecasts to product launches with compelling predictions. Their work drives the core strategies of modern business — so much so that, by 2018, data-related job openings will total 1.5 million. That’s why we’ve worked hard to develop classes, workshops, and courses to confront the data science skills gap. The latest addition to our proud family of data education is the new Data Science Immersive program.

Launching for the first time in San Francisco and Washington, D.C. on April 11, this full-time Immersive program will equip you with the tools and techniques you need to become a data pro in just 12 weeks.

Continue reading

3 Ways that Data Affects Mass Media

By

media-data-blog-picjumbo

If you thought the introduction of the commercial Internet changed mass media, take a look at what’s in front of you today. Behind the sites of your favorite newspapers and blogs (yes, even this one), publishers are using data to create better audience experiences. For anyone who has ever considered working with data as part of their career, there are now more opportunities than ever to bring media and data together. Here are some of the most important technologies to have on your radar.

Continue reading

How Can UX Design Make Sense of Big Data?

By

ux-data-blog-picjumbo

Big data is just what it sounds like; data so big that it’s not easily processed through conventional methods. However, once this large data set is eventually distilled down, user experience can play a huge role in making sense of the reports and leading the charge for user-centered solutions.

User experience (UX) is the bridge between big data analytics and the end user. The richness of big data being collected by all types of companies has unleashed a treasure trove of information for user experience designers. UX designers can create more robust solutions for users by analyzing these enormous data sets.

Continue reading

5 Companies Using Data for Social Impact

By

socialimpact-blog-picjumbo2

Can data improve the future of our humanity? You better believe it. “Big data” is more than just big businesses. Every day, social impact groups are finding new and creative ways to act upon the information that they’re generating. They’re using data to surface new information, uncover underserved communities, and track performance over time. Here are 5 very different organizations that are using data, in new and creative ways, to improve the lives of people around them:

Continue reading