This article first appeared in the DataPlatformGeeks Magazine.
Author: Will McGrath.
Subscribe to get your copy.
Most organizations would no doubt love to have an AI “easy button.” While no such thing exists, there is a self-service alternative that makes it possible to build an AI-as-a-Service platform, even for organizations without deep pockets.
That option is called Red Hat OpenShift Data Science, a centralized “hub” that houses various open source tools for data management and analysis. Different groups within an organization, such as data engineers, data scientists, and DevOps teams, can use it to build their own AI platforms and manage analytics and data science workloads.
Let’s take a deeper look at the solution, including how it originated, how it works, who it helps, and what it could mean for the future of data analytics and AI/ML.
A Blueprint of Common Open Source AI Components
Red Hat OpenShift Data Science started life as an open source project called Open Data Hub. Red Hat software engineers developed Open Data Hub for data scientists who needed a new way to access large data sets. These engineers wanted to use existing open source tools to improve data sharing between DevOps teams and data scientists and to use compute resources more efficiently, since modeling large workloads can be exceptionally resource-intensive.
Over time the project grew into a reference architecture for assembling more than 20 common open source components into an AI-as-a-Service platform, including JupyterHub, Apache Spark, scikit-learn, TensorFlow, Kubeflow, Prometheus, and more. Users could pick and choose the tools and components they needed for specific tasks, tie them together in the ways that worked best for them, and operate them as they saw fit. Data scientists, for example, could install a component that met their specific needs, such as JupyterHub or Apache Spark. DevOps professionals, meanwhile, could install tools that supported their development work, helped them deploy ML workflows, and more.
An End-to-End AI/ML Platform
In 2021, several components of Open Data Hub were chosen to create a supported cloud service called Red Hat OpenShift Data Science. The service was made broadly available with a simple mandate: to use open source tools to provide an end-to-end AI/ML platform. Red Hat does not support every component in the Open Data Hub architecture, only those that are part of its product, but it does provide site reliability engineering (SRE) support for OpenShift Data Science and for the underlying OpenShift cloud service platform on which it is layered.
Red Hat OpenShift Data Science gives DevOps teams, data scientists, and data engineers access to a core set of tools, such as Jupyter notebooks, PyTorch, and TensorFlow, along with a range of commercial and open source tools for building their own AI platforms and training AI models. For example, DevOps professionals can select their own monitoring, model serving, or optimization tools, or use an integrated technology partner such as Seldon. Data engineers can choose from a variety of data management tools drawn from different projects, or use technology from a data grid partner such as Starburst. And data scientists can work with a range of tools for data analysis and for building, training, and testing models, which can reduce the time needed to deploy models and gain insights.
Improved Collaboration and Coordination Across Teams
Typically, data management and DevOps teams work independently of each other. A company might have data scientists extracting insights and data engineers managing data sets. They pass that information to developers, who then build intelligent applications for the company's business units. Unfortunately, this step-by-step, siloed process does not lend itself to simplicity or speed, especially when companies are dealing with highly complex, raw, unstructured data sets.
Red Hat OpenShift Data Science provides a common platform on which all of these teams can work. Where DevOps teams previously had to wait for data scientists to finish their research, teams can now work together and in parallel. They can share data more easily and use their respective tools to complete their portions of a project simultaneously. This can speed up the collection and sharing of data and actionable insights and accelerate the application development process.
The Future of Red Hat OpenShift Data Science
Red Hat OpenShift Data Science is backed by the innovation of the open source community and continues to expand as new features and components are added. As it evolves, users should have a broader menu of tools to choose from as they begin or continue their forays into data science and AI.
Best of all, they will be able to gain valuable insights more easily and on their own terms, using the tools they want, when they need them. They will be able to experience the freedom of open source and the power of AI, all from a single central location.