<!DOCTYPE html>
<html class="writer-html5" lang="en">
<head>
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.18.1: http://docutils.sourceforge.net/" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Fast, Scalable and Easy Machine Learning With DAAL4PY — daal4py 2021.1 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/css/theme.css" />
<link rel="stylesheet" type="text/css" href="_static/style.css" />
<!--[if lt IE 9]>
<script src="_static/js/html5shiv.min.js"></script>
<![endif]-->
<script src="_static/jquery.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/sphinx_highlight.js"></script>
<script src="_static/js/theme.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Input Data" href="data.html" />
<link rel="prev" title="Contents" href="contents.html" />
<script defer type="text/javascript" src="https://www.intel.com/content/dam/www/global/wap/performance-config.js" ></script>
<script type="text/javascript">
// Configure TMS settings
var wapLocalCode = 'us-en'; // Dynamically set per localized site, see mapping table for values
var wapSection = "scikit-learn"; // WAP team will give you a unique section for your site
// Load TMS
if(document.location.href.includes("intel.github.io/scikit-learn-intelex")){
(function () {
var url = 'https://www.intel.com/content/dam/www/global/wap/tms-loader.js'; // WAP file URL
var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = url;
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s);
})();
}
</script>
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="contents.html" class="icon icon-home">
daal4py
</a>
<div class="version">
2021
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul class="current">
<li class="toctree-l1 current"><a class="current reference internal" href="#">About daal4py</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#designed-for-data-scientists-and-framework-designers">Designed for Data Scientists and Framework Designers</a></li>
<li class="toctree-l2"><a class="reference internal" href="#api-design-and-usage">API Design and usage</a></li>
<li class="toctree-l2"><a class="reference internal" href="#oneapi-and-gpu-support-in-daal4py">oneAPI and GPU support in daal4py</a></li>
<li class="toctree-l2"><a class="reference internal" href="#daal4py-s-design">Daal4py’s Design</a></li>
<li class="toctree-l2"><a class="reference internal" href="#built-for-performance">Built for Performance</a></li>
<li class="toctree-l2"><a class="reference internal" href="#getting-daal4py">Getting daal4py</a></li>
<li class="toctree-l2"><a class="reference internal" href="#overview">Overview</a></li>
<li class="toctree-l2"><a class="reference internal" href="#scikit-learn-api-and-patching">Scikit-Learn API and patching</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="data.html">Data</a></li>
<li class="toctree-l1"><a class="reference internal" href="model-builders.html">Model Builders</a></li>
<li class="toctree-l1"><a class="reference internal" href="algorithms.html">Supported Algorithms</a></li>
<li class="toctree-l1"><a class="reference internal" href="scaling.html">Distributed Mode</a></li>
<li class="toctree-l1"><a class="reference internal" href="streaming.html">Streaming Mode</a></li>
<li class="toctree-l1"><a class="reference internal" href="examples.html">Examples</a></li>
<li class="toctree-l1"><a class="reference internal" href="sklearn.html">Scikit-Learn API</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="contents.html">daal4py</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="contents.html" class="icon icon-home" aria-label="Home"></a></li>
<li class="breadcrumb-item active">Fast, Scalable and Easy Machine Learning With DAAL4PY</li>
<li class="wy-breadcrumbs-aside">
<a href="_sources/index.rst.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<section id="fast-scalable-and-easy-machine-learning-with-daal4py">
<span id="index"></span><h1>Fast, Scalable and Easy Machine Learning With DAAL4PY<a class="headerlink" href="#fast-scalable-and-easy-machine-learning-with-daal4py" title="Permalink to this heading"></a></h1>
<div class="admonition note" id="note">
<p class="admonition-title">Note</p>
<p>Scikit-learn patching functionality in daal4py was deprecated and moved to a separate package, <a class="reference external" href="https://github.com/intel/scikit-learn-intelex">Intel(R) Extension for Scikit-learn*</a>.
All future patches will be available only in Intel(R) Extension for Scikit-learn*. Use the scikit-learn-intelex package instead of daal4py for the scikit-learn acceleration.</p>
</div>
<p>Daal4py makes your Machine Learning algorithms in Python lightning fast and easy to use. It provides
highly configurable Machine Learning kernels, some of which support streaming input data and/or can
be easily and efficiently scaled out to clusters of workstations. Internally it uses Intel(R)
oneAPI Data Analytics Library to deliver the best performance.</p>
<section id="designed-for-data-scientists-and-framework-designers">
<h2>Designed for Data Scientists and Framework Designers<a class="headerlink" href="#designed-for-data-scientists-and-framework-designers" title="Permalink to this heading"></a></h2>
<p>daal4py was created to give data scientists the easiest way to use the powerful machine learning
building blocks of Intel(R) oneAPI Data Analytics Library in a high-productivity manner. A
simplified API gives high-level abstractions to the user with minimal boilerplate, allowing for
quick-to-write and easy-to-maintain code when working in Jupyter Notebooks. To scale out,
daal4py also provides distributed machine learning. Its streaming mode provides a flexible
mechanism for processing large amounts of data and/or non-contiguous input data.</p>
<p>For framework designers, daal4py is designed to sit underneath other frameworks from both an
API and feature perspective. Training and inference are split into separate classes,
allowing the model to be exported and serialized if desired. This design also gives the flexibility
to work directly with the model and associated primitives, allowing one to customize the behavior of
the model itself. The daal4py package can be built with customized algorithm loadouts, allowing for
a smaller footprint of dependencies when necessary.</p>
</section>
<section id="api-design-and-usage">
<h2>API Design and usage<a class="headerlink" href="#api-design-and-usage" title="Permalink to this heading"></a></h2>
<p>As an example of the type of API that would be used in a data science context,
the linear regression workflow is showcased below:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">daal4py</span> <span class="k">as</span> <span class="nn">d4p</span>
<span class="c1"># train, test, and target are Pandas dataframes</span>
<span class="n">d4p_lm</span> <span class="o">=</span> <span class="n">d4p</span><span class="o">.</span><span class="n">linear_regression_training</span><span class="p">(</span><span class="n">interceptFlag</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">lm_trained</span> <span class="o">=</span> <span class="n">d4p_lm</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span><span class="n">train</span><span class="p">,</span> <span class="n">target</span><span class="p">)</span>
<span class="n">lm_predictor_component</span> <span class="o">=</span> <span class="n">d4p</span><span class="o">.</span><span class="n">linear_regression_prediction</span><span class="p">()</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">lm_predictor_component</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span><span class="n">test</span><span class="p">,</span> <span class="n">lm_trained</span><span class="o">.</span><span class="n">model</span><span class="p">)</span>
</pre></div>
</div>
<p>In the example above, the model is divided into training and
prediction. This gives flexibility when writing custom grid searches and custom
functions that modify model behavior or use it as a parameter. Daal4py also
accepts NumPy arrays and pandas DataFrames directly instead of oneDAL
NumericTables, which allows for better integration with the pandas/NumPy/SciPy stack.</p>
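<p>To illustrate why the training/prediction split helps with custom grid searches, here is a minimal sketch.
<code class="docutils literal notranslate"><span class="pre">ToyRidgeTraining</span></code> and
<code class="docutils literal notranslate"><span class="pre">ToyRidgePrediction</span></code> are hypothetical pure-Python stand-ins, not daal4py classes.
They only mimic the <code class="docutils literal notranslate"><span class="pre">compute()</span></code> and
<code class="docutils literal notranslate"><span class="pre">model</span></code> pattern shown above, so the example runs without daal4py installed.
The helper <code class="docutils literal notranslate"><span class="pre">grid_search</span></code> and the data are likewise assumptions made for illustration.</p>

```python
from types import SimpleNamespace


class ToyRidgeTraining:
    """Hypothetical stand-in: compute() returns a result with a model attribute."""

    def __init__(self, ridge=0.0):
        self.ridge = ridge

    def compute(self, xs, ys):
        # Closed-form 1-D ridge regression without intercept.
        num = sum(x * y for x, y in zip(xs, ys))
        den = sum(x * x for x in xs) + self.ridge
        return SimpleNamespace(model=num / den)


class ToyRidgePrediction:
    """Hypothetical stand-in for a separate prediction object."""

    def compute(self, xs, model):
        return SimpleNamespace(prediction=[model * x for x in xs])


def grid_search(make_trainer, predictor, grid, train, valid):
    """Return (param, model, mse) with the lowest validation error."""
    results = []
    for param in grid:
        # Train once per parameter value, then reuse the same predictor.
        model = make_trainer(param).compute(*train).model
        pred = predictor.compute(valid[0], model).prediction
        mse = sum((p - y) ** 2 for p, y in zip(pred, valid[1])) / len(pred)
        results.append((param, model, mse))
    return min(results, key=lambda r: r[2])


train = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
valid = ([4.0], [8.0])
best_param, best_model, best_mse = grid_search(
    ToyRidgeTraining, ToyRidgePrediction(), [0.0, 1.0, 10.0], train, valid
)
print(best_param, best_model, best_mse)
```

<p>Because the search loop only relies on <code class="docutils literal notranslate"><span class="pre">compute()</span></code>,
<code class="docutils literal notranslate"><span class="pre">model</span></code> and
<code class="docutils literal notranslate"><span class="pre">prediction</span></code>, a factory for a daal4py training object and a daal4py prediction object
could presumably be passed in place of the stand-ins.</p>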
<p>Daal4py machine learning algorithms are constructed with a rich set of
parameters. Assuming we want to find the initial set of centroids for kmeans,
we first create an algorithm and configure it for 10 clusters using the ‘PlusPlus’ method:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">kmi</span> <span class="o">=</span> <span class="n">kmeans_init</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="s2">"plusPlusDense"</span><span class="p">)</span>
</pre></div>
</div>
<p>Assuming all our data is in a CSV file, we can now run the algorithm on it:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">result</span> <span class="o">=</span> <span class="n">kmi</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span><span class="s1">'data.csv'</span><span class="p">)</span>
</pre></div>
</div>
<p>Our result will hold the computed centroids in the ‘centroids’ attribute:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="n">result</span><span class="o">.</span><span class="n">centroids</span><span class="p">)</span>
</pre></div>
</div>
<p>The full example could look like this:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">daal4py</span> <span class="kn">import</span> <span class="n">kmeans_init</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">kmeans_init</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="s2">"plusPlusDense"</span><span class="p">)</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span><span class="s1">'data.csv'</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">result</span><span class="o">.</span><span class="n">centroids</span><span class="p">)</span>
</pre></div>
</div>
<p>One can even <a class="reference internal" href="scaling.html#distributed"><span class="std std-ref">run this on a cluster</span></a> by simply
initializing/finalizing the network and adding a keyword parameter:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">daal4py</span> <span class="kn">import</span> <span class="n">daalinit</span><span class="p">,</span> <span class="n">daalfini</span><span class="p">,</span> <span class="n">kmeans_init</span>
<span class="n">daalinit</span><span class="p">()</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">kmeans_init</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="s2">"plusPlusDense"</span><span class="p">,</span> <span class="n">distributed</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span><span class="n">my_file</span><span class="p">)</span>
<span class="n">daalfini</span><span class="p">()</span>
</pre></div>
</div>
<p>Last but not least, daal4py allows <a class="reference internal" href="streaming.html#streaming"><span class="std std-ref">getting input data from streams</span></a>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">daal4py</span> <span class="kn">import</span> <span class="n">svd</span>
<span class="n">algo</span> <span class="o">=</span> <span class="n">svd</span><span class="p">(</span><span class="n">streaming</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="k">for</span> <span class="nb">input</span> <span class="ow">in</span> <span class="n">stream_or_filelist</span><span class="p">:</span>
    <span class="n">algo</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span><span class="nb">input</span><span class="p">)</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">algo</span><span class="o">.</span><span class="n">finalize</span><span class="p">()</span>
</pre></div>
</div>
</section>
<section id="oneapi-and-gpu-support-in-daal4py">
<h2>oneAPI and GPU support in daal4py<a class="headerlink" href="#oneapi-and-gpu-support-in-daal4py" title="Permalink to this heading"></a></h2>
<p>daal4py oneAPI and GPU support is deprecated. Use <a class="reference external" href="https://intel.github.io/scikit-learn-intelex/latest/oneapi-gpu.html#">scikit-learn-intelex</a>
instead.</p>
</section>
<section id="daal4py-s-design">
<h2>Daal4py’s Design<a class="headerlink" href="#daal4py-s-design" title="Permalink to this heading"></a></h2>
<p>The design of daal4py combines several technologies to deliver Intel(R) oneAPI Data
Analytics Library performance to data scientists and framework designers in a flexible way. The
package uses Jinja templates to generate Cython-wrapped oneDAL C++ headers, with Cython as a bridge
between the generated oneDAL code and the Python layer. This design allows for quicker development
cycles and acts as a reference design for those looking to tailor their build of daal4py. Cython
also allows for good Python behavior, both for compatibility with different frameworks and for
pickling and serialization.</p>
</section>
<section id="built-for-performance">
<h2>Built for Performance<a class="headerlink" href="#built-for-performance" title="Permalink to this heading"></a></h2>
<p>Besides superior (i.e. close to native C++ Intel(R) oneAPI Data Analytics Library) performance on a
single node, the distribution mechanics of daal4py provide excellent strong and weak scaling. It
handles distributing a fixed input size across increasing cluster sizes (strong scaling: orange),
which addresses possible response time requirements. It also scales with growing input size (weak
scaling: yellow), which is needed if the data no longer fits into the memory of a single node.</p>
<figure class="align-default" id="id1">
<img alt="_images/d4p-linreg-scale.jpg" src="_images/d4p-linreg-scale.jpg" />
<figcaption>
<p><span class="caption-text">On a 32-node cluster (1280 cores) daal4py computed linear regression
of 2.15 TB of data in 1.18 seconds and 68.66 GB of data in less than
48 milliseconds.</span><a class="headerlink" href="#id1" title="Permalink to this image"></a></p>
</figcaption>
</figure>
<figure class="align-default" id="id2">
<img alt="_images/d4p-kmeans-scale.jpg" src="_images/d4p-kmeans-scale.jpg" />
<figcaption>
<p><span class="caption-text">On a 32-node cluster (1280 cores) daal4py computed K-Means (10
clusters) of 1.12 TB of data in 107.4 seconds and 35.76 GB of data
in 4.8 seconds.</span><a class="headerlink" href="#id2" title="Permalink to this image"></a></p>
</figcaption>
</figure>
<p>Configuration: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, EIST/Turbo on, 2
sockets, 20 cores per socket, 192 GB RAM, 16 nodes connected with InfiniBand,
Oracle Linux Server release 7.4, using 64-bit floating point numbers.</p>
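<p>The throughput implied by the figure captions can be worked out with simple arithmetic. The sketch below
only divides the quoted data sizes by the quoted run times. It assumes 1 TB = 1024 GB, and because the
68.66 GB linear regression run is reported as taking less than 48 milliseconds, the value computed for it is a lower bound.</p>

```python
# Numbers quoted in the figure captions: (label, data size in GB, seconds).
# Assumption: 1 TB is taken as 1024 GB.
RUNS = [
    ("linear regression, 2.15 TB", 2.15 * 1024, 1.18),
    ("linear regression, 68.66 GB", 68.66, 0.048),  # upper bound on time
    ("K-Means, 1.12 TB", 1.12 * 1024, 107.4),
    ("K-Means, 35.76 GB", 35.76, 4.8),
]


def throughput_gb_per_s(size_gb, seconds):
    """Aggregate throughput over the whole cluster in GB per second."""
    return size_gb / seconds


for label, size_gb, seconds in RUNS:
    print(f"{label}: {throughput_gb_per_s(size_gb, seconds):.1f} GB/s")
```

<p>Comparing the two linear regression rows, or the two K-Means rows, gives a rough sense of how throughput
changes between the smaller and the larger input on the same cluster.</p>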
</section>
<section id="getting-daal4py">
<h2>Getting daal4py<a class="headerlink" href="#getting-daal4py" title="Permalink to this heading"></a></h2>
<p>daal4py is available on the <a class="reference external" href="https://pypi.org/project/daal4py/">Python Package Index</a>,
on Anaconda Cloud in the <a class="reference external" href="https://anaconda.org/conda-forge/daal4py">conda-forge channel</a>,
and in the <a class="reference external" href="https://anaconda.org/intel/daal4py">Intel channel</a>.
Sources and build instructions are available in the
<a class="reference external" href="https://github.com/intel/scikit-learn-intelex/tree/main/daal4py">daal4py repository</a>.</p>
<p>The daal4py package is available via the same distribution channels and platforms as scikit-learn-intelex.
See the
<a class="reference external" href="https://intel.github.io/scikit-learn-intelex/latest/system-requirements.html">scikit-learn-intelex requirements</a>.</p>
<ul>
<li><p>Install from PyPI:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">pip</span> <span class="n">install</span> <span class="n">daal4py</span>
</pre></div>
</div>
</li>
<li><p>Install from Anaconda Cloud (conda-forge channel):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">conda</span> <span class="n">install</span> <span class="n">daal4py</span> <span class="o">-</span><span class="n">c</span> <span class="n">conda</span><span class="o">-</span><span class="n">forge</span>
</pre></div>
</div>
</li>
<li><p>Install using conda from the Intel repository:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">conda</span> <span class="n">install</span> <span class="n">daal4py</span> <span class="o">-</span><span class="n">c</span> <span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">software</span><span class="o">.</span><span class="n">repos</span><span class="o">.</span><span class="n">intel</span><span class="o">.</span><span class="n">com</span><span class="o">/</span><span class="n">python</span><span class="o">/</span><span class="n">conda</span><span class="o">/</span>
</pre></div>
</div>
</li>
</ul>
<p>We recommend using <strong>PyPI</strong>. If you are using Intel® Distribution for Python,
we recommend using <strong>conda from the Intel repository</strong>.
In other cases, use the <strong>Anaconda Cloud conda-forge channel</strong>.</p>
</section>
<section id="overview">
<h2>Overview<a class="headerlink" href="#overview" title="Permalink to this heading"></a></h2>
<p>All algorithms in daal4py work the same way:</p>
<ol class="arabic simple">
<li><p>Instantiate and parameterize</p></li>
<li><p>Run/compute on input data</p></li>
</ol>
<p>The tables below list the accepted arguments. Those with no default (None) are
required arguments. All other arguments have defaults, are optional, and can be
provided as keyword arguments (like <code class="docutils literal notranslate"><span class="pre">optarg=77</span></code>). Each algorithm returns a
class-like object with properties as its result.</p>
<p>For algorithms with training and prediction, simply extract the <code class="docutils literal notranslate"><span class="pre">model</span></code>
property from the result returned by the training and pass it in as the (second)
input argument.</p>
<p>Note that all input objects and the result/model properties are native types,
e.g. standard types (integer, float, NumPy arrays, pandas DataFrames,
…). Additionally, if you provide the name of a CSV file as an input argument,
daal4py will work on the entire file content.</p>
</section>
<section id="scikit-learn-api-and-patching">
<h2>Scikit-Learn API and patching<a class="headerlink" href="#scikit-learn-api-and-patching" title="Permalink to this heading"></a></h2>
<div class="admonition tip">
<p class="admonition-title">Tip</p>
<p>We recommend using
<a class="reference external" href="https://intel.github.io/scikit-learn-intelex/latest/what-is-patching.html">scikit-learn-intelex package patching</a> for scikit-learn patching.</p>
</div>
<p>daal4py exposes some oneDAL solvers using a scikit-learn compatible API.</p>
<p>daal4py can furthermore monkey-patch the <code class="docutils literal notranslate"><span class="pre">sklearn</span></code> package to use the oneDAL
solvers as drop-in replacements without any code change.</p>
<p>Please refer to the section on <a class="reference internal" href="sklearn.html#sklearn"><span class="std std-ref">scikit-learn API and patching</span></a>
for more details.</p>
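<p>Following the recommendation in the tip above, a minimal sketch of patching through scikit-learn-intelex could look like this.
It assumes the package (imported as <code class="docutils literal notranslate"><span class="pre">sklearnex</span></code>) may or may not be installed,
and the helper name <code class="docutils literal notranslate"><span class="pre">enable_acceleration</span></code> is hypothetical.</p>

```python
def enable_acceleration():
    """Patch scikit-learn when scikit-learn-intelex is installed.

    Returns True if patching was applied and False otherwise.
    """
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        # Package not available: fall back to stock scikit-learn.
        return False
    patch_sklearn()
    return True


accelerated = enable_acceleration()
print("scikit-learn patched:", accelerated)
# Import scikit-learn estimators only after this point so that the
# patched implementations are the ones picked up.
```

<p>Wrapping the call in a helper keeps the rest of the code unchanged whether or not the extension is present.</p>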
</section>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="contents.html" class="btn btn-neutral float-left" title="Contents" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="data.html" class="btn btn-neutral float-right" title="Input Data" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<p>© Copyright Intel.</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>