Empower Enterprise Users with Machine Learning
Data science projects are increasingly pervasive in enterprises both large and small. Such projects rely on not only data, data preparation, visualization, and machine learning, but also data management and the ability to deploy solutions quickly and easily. Additionally, such projects are not only the realm of data scientists, but include a broader range of personas, or user roles, to realize solutions to business problems. These roles include business and data analysts, database administrators and information technology professionals, application and dashboard developers, as well as executives who sponsor key data science initiatives.
In this blog, we look at the capabilities that are important to individuals in these user roles.
Let’s start with the capabilities data scientists rely on to achieve enterprise goals. Today, Python, R, and SQL are powerful and popular data science languages, enabling data scientists to explore, visualize, and prepare data, and develop machine learning models. With rich open source ecosystems from Python and R, data scientists benefit from the thousands of packages and libraries that aid productivity, avoid reinventing standard techniques, and make available state-of-the-art techniques.
However, two of the challenges facing data scientists are scalability and performance. In enterprise settings, data volumes and enterprise deployment requirements may conflict with the benefits of open source software. Typically, open source packages are not designed with enterprise-scale performance in mind and rely on data residing in memory, which limits scalability. Solutions are needed that combine the best of open source with the scalability and performance of enterprise-level tools.
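To make the in-memory limitation concrete, here is a minimal sketch of the alternative: computing statistics in a single pass over a stream, so that memory use stays constant no matter how many rows arrive. The data source `stream_values` is hypothetical and stands in for rows read from a file or database cursor. The one-pass update is Welford's algorithm, chosen here for illustration and not taken from any particular product.

```python
import math
from typing import Iterable, Iterator, Tuple


def stream_values(n: int) -> Iterator[float]:
    """Hypothetical data source: yields one value at a time, never a full list."""
    for i in range(n):
        yield float(i % 100)


def running_mean_variance(values: Iterable[float]) -> Tuple[int, float, float]:
    """One-pass mean and population variance (Welford's algorithm)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / count if count else math.nan
    return count, mean, variance


count, mean, variance = running_mean_variance(stream_values(100_000))
```

Only three numbers are held in memory at any time, which is the property that lets this style of computation scale where a load-everything-first approach cannot.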
Increasingly, data scientists are looking for tools that automate traditionally manual and repetitive machine learning tasks. These can range from data preparation and feature selection to algorithm selection and model tuning. Automation doesn’t mean replacing data scientists, but making them more productive and freeing up more time for solving additional business problems.
Collaboration is also an important capability as data scientists need to work with other enterprise users to realize data science solutions. Collaboration involves communication, but also environments that facilitate cooperative problem solving and provide easy and immediate access to work products like scripts, notebooks, models, and visualizations.
Since data is at the center of a data science solution, data scientists also require the ability to find, access, and integrate data across the enterprise. Platforms that provide data catalog functionality and ease of granting and managing data access can greatly benefit data science project outcomes.
Business and Data Analysts
Business and data analysts—once the key data analysis force in enterprises that relied mostly on deductive techniques involving spreadsheets, database queries, and business intelligence tools—are themselves expanding their analytical tool set with machine learning. One of the more recent developments enabling analysts is automated machine learning.
Automated machine learning is particularly valuable for analysts who may not yet have formally enhanced their skillset with machine learning methodologies and algorithms, but can apply their extensive domain knowledge to enterprise business problems. Automated machine learning—whether provided as a code-free user interface or programming interface—can deal with many of the machine learning algorithm-specific details, such as data preparation requirements, the algorithm(s) to be used, the predictors most suited to the algorithm, and how algorithm-specific parameters should be tuned to optimize predictive model performance.
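As a rough illustration of the algorithm selection and tuning that automated machine learning takes off the analyst's plate, the sketch below tries several candidate models on held-out data and keeps the best one. Everything in it is an assumption made for the example: the synthetic data, the two toy algorithms (a majority-class baseline and k-nearest neighbors), and the function names. Real automated machine learning tools search far larger spaces and also handle data preparation and feature selection.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Tuple

Row = Tuple[float, float]
Model = Callable[[Row], int]
Fitter = Callable[[List[Row], List[int]], Model]


def make_data(n: int, seed: int = 0) -> Tuple[List[Row], List[int]]:
    """Synthetic two-class data (hypothetical): label is 1 when x + y > 1."""
    rng = random.Random(seed)
    rows = [(rng.random(), rng.random()) for _ in range(n)]
    labels = [1 if x + y > 1.0 else 0 for x, y in rows]
    return rows, labels


def fit_majority(rows: List[Row], labels: List[int]) -> Model:
    """Baseline: always predict the most frequent training label."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda row: majority


def make_knn_fitter(k: int) -> Fitter:
    """Return a fitter for k-nearest neighbors with a fixed k (the tuned parameter)."""
    def fit(rows: List[Row], labels: List[int]) -> Model:
        def predict(row: Row) -> int:
            by_distance = sorted(
                range(len(rows)),
                key=lambda i: (rows[i][0] - row[0]) ** 2 + (rows[i][1] - row[1]) ** 2,
            )
            votes = Counter(labels[i] for i in by_distance[:k])
            return votes.most_common(1)[0][0]
        return predict
    return fit


def accuracy(model: Model, rows: List[Row], labels: List[int]) -> float:
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def auto_select(
    rows: List[Row], labels: List[int], candidates: Dict[str, Fitter]
) -> Tuple[str, Dict[str, float]]:
    """Fit every candidate on 70% of the data and score it on the remaining 30%."""
    split = len(rows) * 7 // 10
    train_rows, valid_rows = rows[:split], rows[split:]
    train_labels, valid_labels = labels[:split], labels[split:]
    scores = {
        name: accuracy(fit(train_rows, train_labels), valid_rows, valid_labels)
        for name, fit in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


rows, labels = make_data(300)
candidates: Dict[str, Fitter] = {
    "majority": fit_majority,
    "knn_k1": make_knn_fitter(1),
    "knn_k5": make_knn_fitter(5),
    "knn_k15": make_knn_fitter(15),
}
best_name, scores = auto_select(rows, labels, candidates)
print(best_name, scores)
```

The analyst supplies the data and the business question; the loop over `candidates` is the part that automation handles.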
Business and data analysts also benefit from broader collaboration, for example taking up scripts, prepared data, and models from data scientists to augment their work, or making their own results available to developers for inclusion in applications and dashboards.
DBAs and IT Professionals
Most enterprises today view data management as a key business function, normally relying on database administrators and information technology professionals to support the data, software, hardware—and now Cloud—needs of the enterprise. DBAs and IT professionals strive to maximize the value enterprises derive from their technology investment.
They must also provide other users, like data scientists, analysts, and developers, with the scalability and performance needed to address enterprise-scale business problems. The availability of integrated, streamlined, converged, and automated infrastructure not only facilitates data science projects and reduces enterprise costs and complexity, but also frees up DBA and IT resources to tackle projects often deferred due to time-consuming maintenance activities, e.g., software upgrades and patching, system backups, and recovery from failures.
Today, many DBAs are expanding their skills in the area of data science, taking advantage of machine learning integrated with database management systems, whether through SQL or integration with Python and R. Often, their extensive knowledge of database technology combined with data manipulation using SQL serves as an excellent foundation for contributing to enterprise data science projects.
In support of other enterprise users, DBAs and IT professionals need to manage access to data in databases and big data sources. Historically, and even in enterprises today, data access is provided ad hoc. Users ask DBAs to provide data extracts, often as flat files, and these requests can take several iterations to get the “right” data. This wastes both the data users’ time and that of the DBA. As data volumes have grown, such approaches simply don’t scale, either due to human resource limitations or due to the sheer volume of data involved. Tools that enable direct access to data, but with proper security and data life cycle controls, can greatly aid DBAs and IT professionals.
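One common way to offer direct access without handing out flat-file extracts is to expose a database view that contains only the columns users are allowed to see. The sketch below uses Python's built-in `sqlite3` module purely for illustration. The table, view, and column names are hypothetical, and SQLite has no user accounts or privileges, so the view only stands in for what an enterprise database would enforce with roles and grants.

```python
import sqlite3
from typing import Dict, List

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        region TEXT NOT NULL,
        spend REAL NOT NULL,
        national_id TEXT NOT NULL  -- sensitive column, not for analysts
    );
    INSERT INTO customers VALUES
        (1, 'EMEA', 120.0, 'X-001'),
        (2, 'EMEA', 80.0, 'X-002'),
        (3, 'APAC', 50.0, 'X-003');
    -- The view is the governed access point: it omits the sensitive column.
    CREATE VIEW customers_for_analysis AS
        SELECT id, region, spend FROM customers;
    """
)


def spend_by_region(connection: sqlite3.Connection) -> Dict[str, float]:
    """Aggregate inside the database instead of exporting rows to a file."""
    cursor = connection.execute(
        "SELECT region, SUM(spend) FROM customers_for_analysis "
        "GROUP BY region ORDER BY region"
    )
    return dict(cursor.fetchall())


def view_columns(connection: sqlite3.Connection) -> List[str]:
    """List the columns visible through the view."""
    cursor = connection.execute("SELECT * FROM customers_for_analysis LIMIT 0")
    return [column[0] for column in cursor.description]


print(spend_by_region(conn))
print(view_columns(conn))
```

Users query the view whenever they need fresh data, so there is no back-and-forth over extracts, and the DBA changes what is exposed in one place.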