The field of data science is growing with new capabilities and technologies. The 2010s saw a global data science movement: the Harvard Business Review even declared data scientist "the sexiest job of the 21st century," and we can't help but think the trend will continue well into the 2020s. As the field continues to grow rapidly into this decade, our Data Science team at Indellient has identified its favourite data science trends for 2020.
Focus on Model Interpretability
As machine learning models are applied across more and more industries, it becomes vital to understand what they predict and why. There are three main reasons why model interpretability matters.
- To continuously stabilize and improve model performance, data scientists should understand not only why the model performs well but also when it will fail.
- Business owners without a data science background gain confidence in a model's insights when its key driving factors can be presented to them.
- Understanding how the machine learning model works aids in directing future data collection.
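One widely used model-agnostic interpretability technique is permutation importance: shuffle one feature at a time and measure how much the model's accuracy drops. The sketch below is a minimal illustration with a toy dataset and a stand-in "model" (both hypothetical), not a production implementation.

```python
# Minimal sketch of permutation importance: a large accuracy drop after
# shuffling a feature means the model relies on that feature.
import random

random.seed(0)

# Toy dataset: the label depends on feature 0; feature 1 is pure noise.
X = [[random.random(), random.random()] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]

def model_predict(row):
    # Stand-in for a trained model: thresholds feature 0 only.
    return 1 if row[0] > 0.5 else 0

def accuracy(X, y):
    return sum(model_predict(r) == t for r, t in zip(X, y)) / len(y)

def permutation_importance(X, y, feature_idx):
    baseline = accuracy(X, y)
    shuffled_col = [row[feature_idx] for row in X]
    random.shuffle(shuffled_col)
    X_perm = [row[:feature_idx] + [v] + row[feature_idx + 1:]
              for row, v in zip(X, shuffled_col)]
    return baseline - accuracy(X_perm, y)  # large drop => important feature

imp_signal = permutation_importance(X, y, 0)
imp_noise = permutation_importance(X, y, 1)
print(imp_signal, imp_noise)  # the informative feature drops; the noise feature does not
```

Libraries such as scikit-learn ship a ready-made version of this idea, and the same "explain by perturbation" intuition underlies richer tools like SHAP.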
Availability of Data Science Workflow Frameworks
The most time-consuming aspects of daily data science work are model development, feature engineering, and hyper-parameter tuning to create the best-performing models. But how can we quickly deploy to production for validation, without depending on complex infrastructure requirements like compute resources, data warehousing, or a scheduler? Tech giants like Netflix developed internal frameworks for their in-house data scientists to improve collaboration, deploy multiple versions of libraries, inspect model metadata from various runs, and more; Netflix's framework has since been open-sourced.
We are also seeing exciting new commercial platforms such as Iguazio that improve time to market, performance and costs for model training and deployment.
Frameworks and platforms like these are crucial for shortening the turnaround time to put a model into production and test it. The faster we implement and test our models, the sooner businesses can benefit from them: a win-win!
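The core idea behind these workflow frameworks can be sketched in a few lines: each stage of a pipeline is declared as a named step, and the framework runs the steps in order while recording metadata from each run. The `Pipeline` class and step names below are hypothetical illustrations of the pattern, not any framework's real API.

```python
# Minimal sketch of a step-based data science workflow: register steps
# with a decorator, run them in order, and keep per-step run metadata.
import time

class Pipeline:
    def __init__(self, name):
        self.name = name
        self.steps = []
        self.metadata = []  # per-run records, akin to inspecting past runs

    def step(self, fn):
        # Decorator: register a pipeline stage in declaration order.
        self.steps.append(fn)
        return fn

    def run(self):
        artifacts = {}
        for fn in self.steps:
            start = time.time()
            fn(artifacts)  # each step reads/writes the shared artifacts
            self.metadata.append({"step": fn.__name__,
                                  "seconds": time.time() - start})
        return artifacts

flow = Pipeline("train-and-score")

@flow.step
def load_data(a):
    a["data"] = [1, 2, 3, 4]

@flow.step
def train(a):
    a["model_mean"] = sum(a["data"]) / len(a["data"])

@flow.step
def score(a):
    a["score"] = 1.0 if a["model_mean"] > 0 else 0.0

result = flow.run()
print(result["score"])
print([m["step"] for m in flow.metadata])
```

Real frameworks add the hard parts on top of this skeleton: scheduling, retries, versioned artifacts, and scaling individual steps out to the cloud.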
Data Privacy and Security
Data privacy and security have been general concerns in technology for several years. As billions of users surf the web every day and share their personal information, it becomes easier for malicious actors to breach systems and access sensitive data.
To minimize data breaches and fraud, governments worldwide are introducing regulations that make it tougher for tech companies to use customers' personal data. Laws like the California Consumer Privacy Act (CCPA) aim to prevent future data breaches and scandals like the Capital One incident.
This will have a major impact on the millions of personalized advertisements we see through Amazon, Facebook, Google, and others, and it has driven the emergence of a statistical technique called differential privacy: a practice that prevents individual users' information from being exposed by AI models while keeping the models' results accurate. Google introduced TensorFlow Privacy in March 2019 as part of its responsible AI initiative, allowing developers to train models on personal user data without leaking it. Facebook, Amazon, and several other tech giants are devoting significant research budgets to fair and responsible AI, and we expect additional government regulation and growing industry interest in differentially private AI.
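The basic building block of differential privacy is the Laplace mechanism: add calibrated noise to an aggregate statistic so that no individual record can be inferred, while the overall answer stays approximately accurate. The sketch below shows it for a private mean; the dataset, bounds, and epsilon value are illustrative assumptions, and real systems like TensorFlow Privacy apply the same principle inside model training.

```python
# Minimal sketch of the Laplace mechanism for a differentially private mean.
import math
import random

def laplace_noise(scale, rng):
    # Sample Laplace(0, scale) via an inverse-CDF transform of a uniform draw.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_mean(values, lower, upper, epsilon, rng):
    # Clamp each value so one record can shift the mean by at most
    # (upper - lower) / n -- the query's sensitivity.
    clamped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clamped) / len(clamped)
    sensitivity = (upper - lower) / len(clamped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(42)
ages = [23, 35, 41, 29, 52, 37, 44, 31, 26, 48]  # hypothetical records
noisy = private_mean(ages, lower=18, upper=90, epsilon=1.0, rng=rng)
print(noisy)  # close to, but not exactly, the true mean of 36.6
```

Smaller epsilon values give stronger privacy at the cost of noisier answers, which is exactly the accuracy-versus-privacy trade-off the section describes.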
The Rise of Serverless
From physical servers to virtual machines and the cloud, more and more companies are switching to hosting their servers and applications on AWS, Microsoft Azure, and IBM Cloud. However, running your own EC2 instances and building a whole architecture around them is not cheap, which is leading more and more organizations to use serverless offerings such as FaaS (Function as a Service) and BaaS (Backend as a Service) to save cost and resources. With these services, there is typically no need to manage firewalls, system maintenance, patching, security settings, or load balancers. Take FaaS as an example: a function can handle requests within milliseconds, invoking computation and allocating resources on demand. Organizations no longer need an architect focused solely on operations and code quality, and once the resources are released, a fee is charged only for the time actually used.
Additional advantages of serverless include:
- Quick, simple, and much less to worry about
- Highly elastic and responsive to different traffic volumes
- Unlike IaaS and PaaS, it reduces server idle time and so saves costs
- Server location and scheduling can be preset to further reduce latency while remaining robust
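To make the FaaS model concrete, here is a minimal handler sketch modeled on the AWS Lambda calling convention (a function receiving an event and a context): the platform invokes it once per request, scales it with traffic, and bills only for execution time. The event fields and the toy scoring rule are hypothetical.

```python
# Minimal sketch of a FaaS request handler in the Lambda style.
import json

def handler(event, context=None):
    # Parse the incoming request body and score it with a trivial rule;
    # in practice this might load and apply a trained model.
    body = json.loads(event.get("body", "{}"))
    amount = body.get("amount", 0)
    verdict = "review" if amount > 1000 else "approve"
    return {"statusCode": 200,
            "body": json.dumps({"verdict": verdict})}

# Local invocation with a sample event, as the platform would do per request:
response = handler({"body": json.dumps({"amount": 2500})})
print(response["body"])
```

Note that the function owns no server: scaling, routing, and patching are the platform's problem, which is the appeal described above.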
However, that's not to say there are no challenges. For example, it can be difficult to integrate many small functions and monitor their performance, and in general FaaS is not well suited to extremely complex functions.
There are commercial alternatives to setting up and maintaining open-source serverless services, such as the Iguazio Nuclio platform, which provides the performance benefits of serverless (and more) along with a comprehensive administration interface.
Before deciding to move to serverless, be sure to examine what components your application needs and how large it may grow. We hope serverless proves a greener, more cost-effective choice for you.
The End of the Unicorn Myth
When people first started talking about data science, there was a tendency to describe the data scientist as a modern-day hero with astounding capabilities in business analytics, data engineering, and mathematics: a "unicorn," as the popular term goes. While there are many talented individuals working in data science, they have skills, strengths, and weaknesses just like everyone else. Many projects were slowed or kept out of production as data scientists without experience in enterprise-scale cloud systems struggled with data pipelines and serverless architectures.
The modern advanced analytics group combines data scientists, data engineers, full-stack developers, and business analysts to provide the full range of skills an organization needs to really benefit from their data. Relegating unicorns back to the realm of myth will only help our profession build the kind of teamwork and organization needed to get “AI” applications out of the lab and into production.