HFS recently published a Top 10 report on data science (DS) and machine learning (ML) platforms, with insights for enterprise IT leaders to consider when selecting a platform. These platforms help data scientists understand patterns in data, and their algorithms can help enterprises understand the past, present, and future (to a degree of probability). The challenge is in making the right investments. In this POV, we highlight the key trends from the Top 10 report and the voice of the customer feedback we gathered to support them.
The hyperscalers topped the charts, and there were notable performances from niche players, too
Unsurprisingly, the hyperscalers’ platforms (AWS SageMaker, Microsoft Azure ML, and Google Cloud AI) took three of the top four spots in our Top 10. Also in the Top 10 are H2O.ai, DataRobot, and Databricks, disruptive startups that are making great strides to expand their enterprise footprint with innovative technology investments in this field. IBM, Domino, Dataiku, and RapidMiner rounded out the Top 10. As is customary in HFS Top 10 reports, the scoring criteria focused on execution, innovation, and voice of the customer (VOC). The inputs to this process included conversations with 30+ power users of DS and ML platforms (enterprise clients and solution providers) and our proprietary database of over 300 DS and ML engagements across industries and across the globe.
A solid data strategy is a rare and beautiful thing
Data, too often touted as the new oil, needs a lot of refining before any liquid gold emerges. Many enterprises lack a solid data strategy. The future is digitized, hyperconnected, and automated, and the need for products, services, and strategies that support internal and external data possibilities is growing. It’s going to be a journey getting there.
Data scientists spend too much precious and expensive time on data preparation and data wrangling. Any progress toward shortcutting the data preparation phase is valuable to data scientists (and those who pay their salaries). Dataiku was highly rated for data wrangling support.
One of the largest life sciences companies in the world has data quality among its top AI priorities, according to one of its IT advisors. He points out that the biggest pushes for the firm are “to develop more technology-minded staff internally, having a great data architecture and focus on data quality. Overall, the market needs to make AI adoption easier to use.”
Scalability, critical for big data loads, earned several platforms a mention: Microsoft Azure ML, H2O.ai, Google Cloud AI, DataRobot, and Databricks. Despite leading the Top 10, AWS SageMaker was thought to have room for improvement in this regard.
Integration matters at the platform level and for workflows
The most successful DS and ML platforms are those that integrate with other platforms and connect the workflows of data scientists as they develop models. Integration capabilities have improved, and H2O.ai excels here. However, there is still a need for investment in collaboration between users and workflows. Everyone using the platform, from data scientists to data engineers to citizen data scientists, needs to be able to communicate and collaborate effectively throughout the full life cycle of model development, deployment, and maintenance.
An ecosystem full of possibilities…
The broad DS and ML ecosystem is competitive, yet collaboration is integral to the community, even between competitors. Open-source platforms are popular; developers frequently cite them as favorites because of user-friendliness. However, enterprise leaders deciding which platform to use should make sure they understand the pros and cons of these open-source options and consider whether they are truly the best option for their organization. Investigate the possibility of accessing enterprise support options if necessary.
AWS, Microsoft Azure, Google Cloud, and IBM’s ecosystems were called out especially for the range of tools that could be used in harmony.
The right platform is the one that delivers the most impact on your data scientists’ productivity
For those looking to invest in a DS and ML platform, it’s not necessary to be exclusive with your selection. View the available platforms as a toolbox from which to select as your organization’s needs dictate. From our research, most data scientists, analysts, and developers have three or four favorite DS and ML platforms, on average.
Productivity improvements for data scientists and developers are key to bottom-line impact. DS and ML platforms can offer many time and effort reductions, such as platform-specific software development kits (SDKs) for Jupyter Notebooks and plug-ins for common integrated development environments (IDEs). And as there continue to be advancements with AutoML, platforms can reduce the effort and expertise required to train models, specifically by automating the steps it takes to train and optimize new models.
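To make the AutoML idea concrete, the sketch below shows the kind of step such tooling automates: trying several candidate model settings and keeping the one that scores best on held-out data. It is a minimal illustration under stated assumptions, not any vendor's implementation. The toy dataset, the nearest-neighbor model, and the names `knn_predict`, `auto_select_k`, and `make_toy_data` are all hypothetical and chosen only for this example; real platforms search far larger spaces of models, features, and parameters.

```python
import random
from collections import Counter


def knn_predict(train, point, k):
    """Classify `point` by majority vote among its k nearest training rows."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, valid, k):
    """Share of validation rows that the k-nearest-neighbor rule labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in valid)
    return hits / len(valid)


def auto_select_k(train, valid, candidates):
    """Score every candidate k on the validation set and return (best_k, best_accuracy)."""
    scored = [(accuracy(train, valid, k), k) for k in candidates]
    # Highest accuracy wins; ties go to the smaller (simpler) k.
    best_acc, best_k = max(scored, key=lambda s: (s[0], -s[1]))
    return best_k, best_acc


def make_toy_data(n, seed):
    """Hypothetical two-class data: two Gaussian clusters in two dimensions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.choice([0, 1])
        center = 0.0 if label == 0 else 3.0
        rows.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return rows


train = make_toy_data(80, seed=1)
valid = make_toy_data(40, seed=2)
best_k, best_acc = auto_select_k(train, valid, candidates=[1, 3, 5, 7, 9])
print(f"selected k={best_k}, validation accuracy={best_acc:.2f}")
```

The data scientist supplies the data and the list of candidates, and the loop handles the repetitive train-and-compare work. That is the effort reduction a small team gains, at a much larger scale, from AutoML features.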
A senior data scientist at one of the largest banks in North America shared their needs from an ML platform: “Our biggest AI priority is having a flexible platform that can support a wide variety of use cases, features auto ML to more quickly find insights, and be able to support unstructured data processing.” For many organizations, AutoML is becoming a practical way to gain speed with a small data science team.
The Bottom Line: Data science and machine learning platforms are the foundation for data-driven innovation within an enterprise. Invest in both boosting your data scientist team’s productivity and finding ways to create more self-serve analytics and ML capabilities to drive adoption.
Ultimately, this all comes down to how you can tie these technology investments to real business goals. For example, the VP of IT at a North American water utility in our study outlined a wide range of business outcomes that their organization is relying on its DS and ML platforms to deliver: “[We need these platforms] to improve processes and decision making … to drive down our cost of service without sacrificing reliability of service and improve customer engagement.”
To deliver on these goals, DS and ML platforms must do four things: keep pace with advancements in machine learning, deep learning, and computing infrastructure through open integration and flexibility; take a proactive approach to model governance; create an environment that better supports the workflows and hand-offs between data scientists, engineers, and developers; and, finally, provide the best possible avenues for non-technical business users to consume insights that influence actual business decision making and enterprise performance.