<!DOCTYPE html>
<!-- Generated by pkgdown: do not edit by hand --><html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta name="viewport" content="width=device-width, initial-scale=1.0"><title>Changelog • RcppCWB</title><!-- jquery --><script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.4.1/jquery.min.js" integrity="sha256-CSXorXvZcTkaix6Yvo6HppcZGetbYMGWSFlBw8HfCJo=" crossorigin="anonymous"></script><!-- Bootstrap --><link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.4.1/css/bootstrap.min.css" integrity="sha256-bZLfwXAP04zRMK2BjiO8iu9pf4FbLqX6zitd+tIvLhE=" crossorigin="anonymous"><script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.4.1/js/bootstrap.min.js" integrity="sha256-nuL8/2cJ5NDSSwnKD8VqreErSWHtnEP9E7AySL+1ev4=" crossorigin="anonymous"></script><!-- bootstrap-toc --><link rel="stylesheet" href="../bootstrap-toc.css"><script src="../bootstrap-toc.js"></script><!-- Font Awesome icons --><link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.12.1/css/all.min.css" integrity="sha256-mmgLkCYLUQbXn0B1SRqzHar6dCnv9oZFPEC1g1cwlkk=" crossorigin="anonymous"><link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.12.1/css/v4-shims.min.css" integrity="sha256-wZjR52fzng1pJHwx4aV2AO3yyTOXrcDW7jBpJtTwVxw=" crossorigin="anonymous"><!-- clipboard.js --><script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/2.0.6/clipboard.min.js" integrity="sha256-inc5kl9MA1hkeYUt+EC3BhlIgyp/2jDIyBLS6k3UxPI=" crossorigin="anonymous"></script><!-- headroom.js --><script src="https://cdnjs.cloudflare.com/ajax/libs/headroom/0.11.0/headroom.min.js" integrity="sha256-AsUX4SJE1+yuDu5+mAVzJbuYNPHj/WroHuZ8Ir/CkE0=" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/headroom/0.11.0/jQuery.headroom.min.js" 
integrity="sha256-ZX/yNShbjqsohH1k95liqY9Gd8uOiE1S4vZc+9KQ1K4=" crossorigin="anonymous"></script><!-- pkgdown --><link href="../pkgdown.css" rel="stylesheet"><script src="../pkgdown.js"></script><meta property="og:title" content="Changelog"><!-- mathjax --><script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js" integrity="sha256-nvJJv9wWKEm88qvoQl9ekL2J+k/RWIsaSScxxlsrv8k=" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/config/TeX-AMS-MML_HTMLorMML.js" integrity="sha256-84DKXVJXs0/F8OTMzX4UR909+jtl4G7SPypPavF+GfA=" crossorigin="anonymous"></script><!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]--></head><body data-spy="scroll" data-target="#toc">
<div class="container template-news">
<header><div class="navbar navbar-default navbar-fixed-top" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<span class="navbar-brand">
<a class="navbar-link" href="../index.html">RcppCWB</a>
<span class="version label label-default" data-toggle="tooltip" data-placement="bottom" title="">0.6.0</span>
</span>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav"><li>
<a href="../reference/index.html">Reference</a>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" data-bs-toggle="dropdown" aria-expanded="false">
Articles
<span class="caret"></span>
</a>
<ul class="dropdown-menu" role="menu"><li>
<a href="../articles/vignette.html">Writing performance code with RcppCWB</a>
</li>
</ul></li>
<li>
<a href="../news/index.html">Changelog</a>
</li>
</ul><ul class="nav navbar-nav navbar-right"><li>
<a href="https://github.com/PolMine/RcppCWB/" class="external-link">
<span class="fab fa-github fa-lg"></span>
</a>
</li>
</ul></div><!--/.nav-collapse -->
</div><!--/.container -->
</div><!--/.navbar -->
</header><div class="row">
<div class="col-md-9 contents">
<div class="page-header">
<h1 data-toc-skip>Changelog <small></small></h1>
<small>Source: <a href="https://github.com/PolMine/RcppCWB/blob/HEAD/NEWS.md" class="external-link"><code>NEWS.md</code></a></small>
</div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.5.5.9001" id="rcppcwb-0559001-9003">RcppCWB 0.5.5.9001-.9003<a class="anchor" aria-label="anchor" href="#rcppcwb-0559001-9003"></a></h2>
<ul><li>Rcpp wrappers for Corpus Library (CL) functions are exposed directly and<br>
can be used in C++ functions imported using <code><a href="https://rdrr.io/pkg/Rcpp/man/sourceCpp.html" class="external-link">Rcpp::sourceCpp()</a></code> or <code><a href="https://rdrr.io/pkg/Rcpp/man/cppFunction.html" class="external-link">Rcpp::cppFunction()</a></code>.</li>
<li>Dependency PCRE has been updated to PCRE2 <a href="https://github.com/PolMine/RcppCWB/issues/68" class="external-link">#68</a>.</li>
<li>The README suggested installing the development version of RcppCWB using the snippet <code>devtools::install_github("PolMine/RcppCWB")</code>. The missing <code>ref = "dev"</code> has been inserted.</li>
<li>
<code><a href="../reference/cwb_utils.html">cwb_encode()</a></code> crashed if the arguments <code>data_dir</code> and <code>vrt_dir</code> included a tilde. Tilde expansion is now applied to these arguments to avoid this <a href="https://github.com/PolMine/RcppCWB/issues/73" class="external-link">#73</a>.</li>
<li>A new vignette explains how to write C++ inline functions.</li>
</ul></div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.5.5" id="rcppcwb-055">RcppCWB 0.5.5<small>2023-01-25</small><a class="anchor" aria-label="anchor" href="#rcppcwb-055"></a></h2>
<ul><li>C++ code replaces <code><a href="https://rdrr.io/r/base/sprintf.html" class="external-link">sprintf()</a></code> with <code>snprintf()</code> to address a security issue.</li>
<li>Package now depends on Rcpp v1.0.10, which replaces one remaining <code><a href="https://rdrr.io/r/base/sprintf.html" class="external-link">sprintf()</a></code> <a href="https://github.com/PolMine/RcppCWB/issues/70" class="external-link">#70</a>.</li>
<li>
<code><a href="../reference/registry_info.html">corpus_properties()</a></code> and <code><a href="../reference/registry_info.html">corpus_property()</a></code> no longer crash if the corpus is not loaded or not present <a href="https://github.com/PolMine/RcppCWB/issues/69" class="external-link">#69</a>.</li>
<li>New function <code><a href="../reference/p_attr_default.html">p_attr_default()</a></code> to programmatically extract the default p-attribute <a href="https://github.com/PolMine/RcppCWB/issues/63" class="external-link">#63</a>.</li>
</ul></div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.5.4" id="rcppcwb-054">RcppCWB 0.5.4<small>2022-08-30</small><a class="anchor" aria-label="anchor" href="#rcppcwb-054"></a></h2>
<ul><li>Fixed the package configuration, which prevented the compiler from being used as intended for compiling the CWB C code <a href="https://github.com/PolMine/RcppCWB/issues/66" class="external-link">#66</a>.</li>
<li>Adding ‘-luuid’ to PKG_FLAGS in Makevars solves linker issue FOLDERID_ <a href="https://github.com/PolMine/RcppCWB/issues/67" class="external-link">#67</a>.</li>
<li>GitHub Actions now working for Windows <a href="https://github.com/PolMine/RcppCWB/issues/47" class="external-link">#47</a>.</li>
</ul></div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.5.3" id="rcppcwb-053">RcppCWB 0.5.3<small>2022-05-19</small><a class="anchor" aria-label="anchor" href="#rcppcwb-053"></a></h2>
<ul><li>Fixed a bug in the <code>region_matrix_corpus()</code> C++ code that would not show any context at all if s_attribute expansion transgressed start or end of corpus.</li>
<li>Fixed a bug in the <code>region_matrix_corpus()</code> C++ code that resulted from not considering that query matches may cover more than one struc of a structural attribute.</li>
<li>
<code><a href="../reference/registry_info.html">corpus_info_file()</a></code> does not crash if INFO is not defined in the registry file (<a href="https://github.com/PolMine/RcppCWB/issues/62" class="external-link">#62</a>).</li>
<li>Implicit processing of arguments <code>sAttribute</code> and <code>pAttribute</code> as <code>s_attribute</code> or <code>p_attribute</code> respectively is now accompanied by a warning that these arguments are deprecated.</li>
<li>The <code><a href="../reference/checks.html">check_corpus()</a></code> function distinguishes between whether a corpus is loaded in the CL and/or CQP context.</li>
<li>
<code><a href="../reference/cwb_utils.html">cwb_huffcode()</a></code> and <code><a href="../reference/cwb_utils.html">cwb_compress_rdx()</a></code> have argument <code>delete</code> to trigger deleting redundant files after compression (<a href="https://github.com/PolMine/RcppCWB/issues/60" class="external-link">#60</a>).</li>
<li>
<code>cqp_load_corpus</code> will internally convert the corpus ID to upper case, as required in the CQP context (<a href="https://github.com/PolMine/RcppCWB/issues/64" class="external-link">#64</a>).</li>
</ul></div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.5.2" id="rcppcwb-052">RcppCWB 0.5.2<small>2022-03-29</small><a class="anchor" aria-label="anchor" href="#rcppcwb-052"></a></h2>
<ul><li>The example for <code><a href="../reference/registry_info.html">corpus_data_dir()</a></code> did not work as intended without explicitly setting the <code>registry</code> argument. Fixed.</li>
<li>New functions <code><a href="../reference/registry_info.html">corpus_info_file()</a></code>, <code><a href="../reference/registry_info.html">corpus_full_name()</a></code>, <code><a href="../reference/registry_info.html">corpus_p_attributes()</a></code>, <code><a href="../reference/registry_info.html">corpus_s_attributes()</a></code>, <code><a href="../reference/registry_info.html">corpus_properties()</a></code> and <code><a href="../reference/registry_info.html">corpus_property()</a></code> to retrieve registry file data.</li>
<li>New function <code><a href="../reference/registry_info.html">corpus_registry_dir()</a></code>.</li>
<li>The path to the info file in the registry file of the REUTERS corpus was broken. Fixed.</li>
</ul></div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.5.1" id="rcppcwb-051">RcppCWB 0.5.1<small>2022-03-06</small><a class="anchor" aria-label="anchor" href="#rcppcwb-051"></a></h2>
<div class="section level3">
<h3 id="new-features-0-5-1">New Features<a class="anchor" aria-label="anchor" href="#new-features-0-5-1"></a></h3>
<ul><li>New auxiliary function <code><a href="../reference/cwb_charsets.html">cwb_charsets()</a></code> reports the charsets supported by CWB.</li>
<li>New functions <code><a href="../reference/cl_load_corpus.html">cl_load_corpus()</a></code> and <code><a href="../reference/cqp_initialize.html">cqp_load_corpus()</a></code> do what the function names suggest.</li>
<li>New function <code><a href="../reference/cl_list_corpora.html">cl_list_corpora()</a></code> complements existing function <code><a href="../reference/cqp_list_corpora.html">cqp_list_corpora()</a></code> for the CL context.</li>
<li>New arguments <code>skip_blank_lines</code>, <code>strip_whitespace</code> and <code>xml</code> of <code><a href="../reference/cwb_utils.html">cwb_encode()</a></code> expose configuration options, overcoming the previously hard-coded equivalent to the command-line option “-xsB” (<a href="https://github.com/PolMine/RcppCWB/issues/38" class="external-link">#38</a>).</li>
<li>Unexported functions <code>.cpos_to_id()</code>, <code>.cl_find_corpus()</code> and <code>.cl_new_attribute()</code> are an entry to passing around pointers, rather than re-creating objects whenever switching from R to C.</li>
<li>Functions <code>.s_attr()</code> and <code>.p_attr()</code> return pointers for an s- or p-attribute.</li>
<li>Functions <code>cl_*</code> are now available with pointer as input (e.g. <code><a href="../reference/cl_rework.html">cpos_to_id()</a></code>).</li>
<li>The CORPUS_REGISTRY environment variable is not set to the temporary registry, to avoid often confusing behavior and collisions when loading RcppCWB and polmineR at the same time (<a href="https://github.com/PolMine/RcppCWB/issues/13" class="external-link">#13</a>).</li>
<li>The <code><a href="../reference/cqp_query.html">cqp_drop_subcorpus()</a></code> function that has been disabled temporarily is usable again (<a href="https://github.com/PolMine/RcppCWB/issues/34" class="external-link">#34</a>).</li>
<li>
<code><a href="../reference/cqp_query.html">cqp_query()</a></code> is now able to process subcorpora.</li>
<li>
<code>RcppCWB:::.cqp_subcropus()</code> will construct a subcorpus from a region matrix.</li>
<li>The <code><a href="../reference/checks.html">check_corpus()</a></code> function does not reset the registry directory any more, but tries to load the checked corpus if it has not yet been loaded.</li>
<li>A new function <code><a href="../reference/xml.html">s_attr_relationship()</a></code> will detect whether two s-attributes are siblings, or in a descendent or ancestor relationship.</li>
<li>Functions <code><a href="../reference/cwb_utils.html">cwb_encode()</a></code>, <code><a href="../reference/cwb_utils.html">cwb_huffcode()</a></code>, <code><a href="../reference/cwb_utils.html">cwb_makeall()</a></code> and <code><a href="../reference/cwb_utils.html">cwb_compress_rdx()</a></code> now have an argument <code>quietly</code> to control the display of output messages. <code><a href="../reference/cwb_utils.html">cwb_encode()</a></code> has an argument <code>verbose</code> to control whether a counter of the number of tokens processed is displayed.</li>
</ul></div>
<div class="section level3">
<h3 id="minor-improvements-0-5-1">Minor improvements<a class="anchor" aria-label="anchor" href="#minor-improvements-0-5-1"></a></h3>
<ul><li>Difficulties of <code><a href="../reference/cwb_utils.html">cwb_encode()</a></code> to digest variations of path statements between macOS and Windows are addressed using a reliable normalization of paths with <code><a href="https://fs.r-lib.org/reference/path.html" class="external-link">fs::path()</a></code> (<a href="https://github.com/PolMine/RcppCWB/issues/48" class="external-link">#48</a>).</li>
<li>Argument <code>encoding</code> is checked for the validity of the encoding passed in (<a href="https://github.com/PolMine/RcppCWB/issues/34" class="external-link">#34</a>).</li>
<li>A patch introducing a sanity check avoids the ‘stringop-overflow’ compiler warning thrown by file cl/cdaccess.c on Windows (<a href="https://github.com/PolMine/RcppCWB/issues/45" class="external-link">#45</a>).</li>
<li>An update of Xcode command line developer tools includes flex 2.6.4 Apple(flex-34), and this is the version used now, resulting in extensive code changes in cl/lex.creg.c and cqp/lex.yy.c, yet without causing new errors or changing the functionality.</li>
<li>
<code><a href="../reference/checks.html">check_cpos()</a></code> issues a warning if argument <code>cpos</code> is <code>NULL</code> (<a href="https://github.com/PolMine/RcppCWB/issues/21" class="external-link">#21</a>).</li>
<li>Functions <code><a href="../reference/p_attributes.html">cl_cpos2id()</a></code>, <code><a href="../reference/s_attributes.html">cl_cpos2lbound()</a></code>, <code><a href="../reference/s_attributes.html">cl_cpos2rbound()</a></code>, <code><a href="../reference/p_attributes.html">cl_cpos2str()</a></code> and <code>cl_cpos2struc()</code> will return an empty, zero-length integer vector if argument <code>cpos</code> is <code>NULL</code> (<a href="https://github.com/PolMine/RcppCWB/issues/21" class="external-link">#21</a>).</li>
<li>Warnings issued by <code><a href="../reference/checks.html">check_corpus()</a></code> (used internally by many functions) resulted from slightly differing representations of otherwise identical paths. Using <code><a href="https://fs.r-lib.org/reference/path.html" class="external-link">fs::path()</a></code> internally for path normalization avoids misleading warning messages.</li>
<li>
<code><a href="../reference/cqp_initialize.html">cqp_get_registry()</a></code> will now return a <code><a href="https://fs.r-lib.org/reference/path.html" class="external-link">fs::path</a></code> object, as a safeguard for a consistent normalization of paths.</li>
<li>Function <code><a href="../reference/cl_delete_corpus.html">cl_delete_corpus()</a></code> will now (visibly) return a <code>logical</code> value.</li>
<li>The check for the availability of ncurses is omitted in the configure file and the editline subdirectory of src/cwb is included in .Rbuildignore to minimize the size of the tarball. The ncurses library is a dependency of editline, but editline is not built in the context of this package (<a href="https://github.com/PolMine/RcppCWB/issues/26" class="external-link">#26</a>).</li>
<li>
<code><a href="../reference/cqp_initialize.html">cqp_load_corpus()</a></code> will return <code>FALSE</code> if corpus has not been loaded successfully.</li>
<li>Disaggregated <code>wrappers.cpp</code> into <code>cl.cpp</code>, <code>cqp.cpp</code> and <code>utils.cpp</code>, so that the code is organized more coherently corresponding to the different logics.</li>
<li>Function <code>check_cqp_query()</code> renamed to <code><a href="../reference/checks.html">check_query()</a></code> to avoid a conflict with a function defined in the polmineR package.</li>
<li>
<code><a href="../reference/cqp_query.html">cqp_list_subcorpora()</a></code> returns a <code>character</code> vector. Previously, we just had obscure printed messages.</li>
<li>
<code><a href="../reference/s_attribute_decode.html">s_attribute_decode()</a></code> will not break if s-attribute has no values (<a href="https://github.com/PolMine/RcppCWB/issues/54" class="external-link">#54</a>).</li>
<li>Functions <code><a href="../reference/s_attributes.html">cl_struc2str()</a></code> and <code><a href="../reference/s_attributes.html">cl_struc2cpos()</a></code> may now include negative values; the vectors returned will have <code>NA</code> values at the respective positions. The check against negative values in <code>check_strucs</code> is dropped accordingly.</li>
</ul></div>
<div class="section level3">
<h3 id="bux-fixes-0-5-1">Bug fixes<a class="anchor" aria-label="anchor" href="#bux-fixes-0-5-1"></a></h3>
<ul><li>The <code><a href="../reference/cwb_utils.html">cwb_encode()</a></code> function did not declare structural attributes in the registry and mistakenly channeled output for the file to the terminal (<a href="https://github.com/PolMine/RcppCWB/issues/49" class="external-link">#49</a>). Fixed.</li>
<li>Re-running <code><a href="../reference/cwb_utils.html">cwb_encode()</a></code> did not reset global variables, which resulted in a set of errors. Solved. (<a href="https://github.com/PolMine/RcppCWB/issues/51" class="external-link">#51</a>)</li>
</ul></div>
</div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.5.0" id="rcppcwb-050">RcppCWB 0.5.0<small>2022-02-01</small><a class="anchor" aria-label="anchor" href="#rcppcwb-050"></a></h2>
<div class="section level3">
<h3 id="new-features-0-5-0">New Features<a class="anchor" aria-label="anchor" href="#new-features-0-5-0"></a></h3>
<ul><li>The CWB code is updated to v3.4.33 / r1690 (<a href="https://github.com/PolMine/RcppCWB/issues/29" class="external-link">#29</a>). Automated patches that have been developed are a safeguard that it will be painless in the future to align RcppCWB with upstream CWB development.</li>
<li>The C code in the files <code>cwb-huffcode.c</code>, <code>cwb-compress-rdx.c</code> and <code>cwb-makeall.c</code> was not in line with the CWB version of the rest of the code (v3.4.14 / SVN revision 1069) but rather v2.2.b99 or v3.0.0. All code changes up to v3.4.14 were reconstructed and implemented (<a href="https://github.com/PolMine/RcppCWB/issues/35" class="external-link">#35</a>). Note that <code>cwb-encode.c</code> was at CWB v3.4.14, as the encoding functionality was exposed at a later stage.</li>
<li>A new function <code><a href="../reference/cwb_version.html">cwb_version()</a></code> will report the version of the CWB source code.</li>
<li>The <code><a href="../reference/cwb_utils.html">cwb_encode()</a></code> function now has a previously missing argument <code>encoding</code> to state the encoding of the corpus to be indexed.</li>
<li>Reduced the number of example *.vrt-files to one to keep the package size below 5 MB.</li>
</ul></div>
<div class="section level3">
<h3 id="minor-improvements-0-5-0">Minor Improvements<a class="anchor" aria-label="anchor" href="#minor-improvements-0-5-0"></a></h3>
<ul><li>Encoding a corpus using <code><a href="../reference/cwb_utils.html">cwb_encode()</a></code> now assumes implicitly that input files are XML files and removes blank lines and leading and trailing whitespace. This is equivalent to the option “-xsB” of the command line utility <code>cwb-encode</code>.</li>
<li>The C++ code of <code><a href="../reference/cwb_utils.html">cwb_encode()</a></code> is now a patch of the <code>main()</code> function of <code>cwb-encode.c</code>, so that code in the *.cpp file can be limited to a slim wrapper, limiting the risk that the code in RcppCWB loses touch with CWB upstream development.</li>
<li>Header files <code>_eval.h</code>, <code>_globalvars.h</code> and <code>_cl.h</code> in the <code>./src</code> directory are autogenerated files now, not to be edited by hand.</li>
<li>The C++ code of the <code><a href="../reference/cqp_query.html">cqp_drop_subcorpus()</a></code> function is temporarily disabled to ensure that the package can be built (<a href="https://github.com/PolMine/RcppCWB/issues/34" class="external-link">#34</a>).</li>
</ul></div>
</div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.4.4" id="rcppcwb-044">RcppCWB 0.4.4<small>2021-12-12</small><a class="anchor" aria-label="anchor" href="#rcppcwb-044"></a></h2>
<ul><li>Fixed a mishandling of paths on Windows in <code><a href="../reference/checks.html">check_corpus()</a></code> that would trigger resetting the registry unintentionally and potentially falsely.</li>
<li>Rcpp v1.0.7, which resolves a compiler warning (unused variable) issued by earlier versions of Rcpp, is now required (<a href="https://github.com/PolMine/RcppCWB/issues/22" class="external-link">#22</a>).</li>
<li>In <code>use_tmp_dir()</code>, <code><a href="https://rdrr.io/r/base/normalizePath.html" class="external-link">normalizePath()</a></code> is applied on the <code><a href="https://rdrr.io/r/base/tempfile.html" class="external-link">tempdir()</a></code> result to avoid confusion with symbolic links on macOS.</li>
<li>New unit test for <code><a href="../reference/cwb_utils.html">cwb_encode()</a></code> (not yet run on Windows).</li>
<li>A C-level inconsistency in <code><a href="../reference/cqp_initialize.html">cqp_get_registry()</a></code> that would sometimes result in a wrong return value (i.e. registry path) has been fixed (<a href="https://github.com/PolMine/RcppCWB/issues/14" class="external-link">#14</a>).</li>
<li>To avoid an unintended behavior of <code><a href="../reference/cwb_utils.html">cwb_makeall()</a></code>, an internal check is performed whether the corpus has been loaded already and whether the home directory of the loaded corpus and the one defined in the registry file are identical (<a href="https://github.com/PolMine/RcppCWB/issues/31" class="external-link">#31</a>).</li>
<li>The link to the TXM project has been removed from the documentation to avoid the error ‘SSL certificate problem: unable to get local issuer certificate’ (<a href="https://github.com/PolMine/RcppCWB/issues/32" class="external-link">#32</a>).</li>
<li>The <code><a href="../reference/cl_delete_corpus.html">cl_delete_corpus()</a></code> function crashed when trying to delete a corpus that has not been loaded (<a href="https://github.com/PolMine/RcppCWB/issues/33" class="external-link">#33</a>). The function now aborts gracefully returning 0 when trying to delete a corpus that has not been loaded.</li>
<li>A new function <code><a href="../reference/corpus_is_loaded.html">corpus_is_loaded()</a></code> can be used to check whether a corpus is loaded.</li>
</ul></div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.4.3" id="rcppcwb-043">RcppCWB 0.4.3<small>2021-07-20</small><a class="anchor" aria-label="anchor" href="#rcppcwb-043"></a></h2>
<ul><li>Unused file ‘_options.h’ removed from src/cwb/cl/cqp</li>
<li>Targets ‘lex.creg.c’, ‘registry.tab.c’ and ‘registry.tab.h’ removed from cl/Makefile to avoid an unwanted call of flex which is not necessarily present (<a href="https://github.com/PolMine/RcppCWB/issues/30" class="external-link">#30</a>).</li>
</ul></div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.4.2" id="rcppcwb-042">RcppCWB 0.4.2<small>2021-07-13</small><a class="anchor" aria-label="anchor" href="#rcppcwb-042"></a></h2>
<ul><li>Windows builds will be linked with a fresh and fully reproducible cross-compilation of CWB static libraries, see the PolMine/libcl repository. The consolidation of the workflow to prepare cross-compiled static libraries is a preparatory step to enable UCRT builds on Windows.</li>
<li>The Range struct in the code for util functionality (encode and more, files utils.h, utils.cpp and _cwb_encode.c) has been renamed as SAttrEncoder to avoid a C++ One Definition Rule warning resulting from a struct with the same name in the CL context (<a href="https://github.com/PolMine/RcppCWB/issues/28" class="external-link">#28</a>).</li>
</ul></div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.4.1" id="rcppcwb-041">RcppCWB 0.4.1<a class="anchor" aria-label="anchor" href="#rcppcwb-041"></a></h2>
<ul><li>A shortcoming when passing in variables into the format string to construct the PKG_LIBS variable resulted in a faulty call of the linker on Solaris and a compilation error. Fixed (<a href="https://github.com/PolMine/RcppCWB/issues/25" class="external-link">#25</a>).</li>
<li>A hacky LDFLAG “-Wl,--allow-multiple-definition” on Solaris that is no longer necessary has been dropped.</li>
<li>Usage and evaluation of the pcretest utility is now in line with POSIX requirements, avoiding an error on Solaris. A statement on the availability of the tool provides information whether it is available at all (<a href="https://github.com/PolMine/RcppCWB/issues/24" class="external-link">#24</a>).</li>
<li>The message on the findability of ncurses is more telling now, avoiding a “mission critical”-style alarm when ncurses may be present but is not findable by pkg-config (<a href="https://github.com/PolMine/RcppCWB/issues/26" class="external-link">#26</a>).</li>
</ul></div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.4.0" id="rcppcwb-040">RcppCWB 0.4.0<small>2021-06-25</small><a class="anchor" aria-label="anchor" href="#rcppcwb-040"></a></h2>
<div class="section level3">
<h3 id="new-features-0-4-0">New Features<a class="anchor" aria-label="anchor" href="#new-features-0-4-0"></a></h3>
<ul><li>Encode XML (vrt file format) with new function <code><a href="../reference/cwb_utils.html">cwb_encode()</a></code> that exposes functionality of cwb-encode CWB utility.</li>
<li>Functions <code><a href="../reference/s_attributes.html">cl_cpos2lbound()</a></code> and <code><a href="../reference/s_attributes.html">cl_cpos2rbound()</a></code> will now accept an integer vector with length > 1 as argument <code>cpos</code> and return a vector with the same length. Useful to speed up iterated queries for left and right boundaries of regions (<a href="https://github.com/PolMine/RcppCWB/issues/19" class="external-link">#19</a>).</li>
<li>A new function <code><a href="../reference/cl_struc_values.html">cl_struc_values()</a></code> exposes the corresponding C function of the Corpus Library (CL). The previous implicit assumption that all structural attributes have values can thus be tested. Intended to work with annotations of sentences and paragraphs, i.e. common structural attributes that usually do not have values.</li>
<li>A new function <code><a href="../reference/registry_info.html">corpus_data_dir()</a></code> will derive the data directory from the internal C representation of a corpus.</li>
<li>New function <code><a href="../reference/s_attr_regions.html">s_attr_regions()</a></code> will derive regions defined by a structural attribute from the *.rng file. Fastest option for large corpora.</li>
<li>New functions <code><a href="../reference/xml.html">s_attr_is_sibling()</a></code> and <code><a href="../reference/xml.html">s_attr_is_descendent()</a></code> test the sibling/descendent relationship of structural attributes.</li>
</ul></div>
<div class="section level3">
<h3 id="minor-improvements-0-4-0">Minor Improvements<a class="anchor" aria-label="anchor" href="#minor-improvements-0-4-0"></a></h3>
<ul><li>Function <code><a href="../reference/checks.html">check_corpus()</a></code> now includes checks whether the registry provided (argument <code>registry</code>) is identical with the registry defined internally by CQP. The registry is reset if directories are not identical.</li>
<li>Minor adjustments of configure script for aarch64, adding -fPIC to CFLAGS so that this flag will be used when Linux default configuration is used as fallback.</li>
<li>The implementation of the <code><a href="../reference/s_attribute_decode.html">s_attribute_decode()</a></code> method was incomplete for method “Rcpp”. This alternative to the “pure R” approach is now implemented (<a href="https://github.com/PolMine/RcppCWB/issues/2" class="external-link">#2</a>).</li>
<li>The unused file ‘setpaths.R’ has been removed from the tools directory (<a href="https://github.com/PolMine/RcppCWB/issues/10" class="external-link">#10</a>).</li>
<li>The argument <code>method</code>, previously set to “wininet” in ./tools/winlibs.R, is now omitted to avoid the warning “the ‘wininet’ method is deprecated for http:// and https:// URLs” on Windows.</li>
<li>The configure script will print the libdirs derived using pcre-config and link against libintl on macOS by default.</li>
</ul></div>
</div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.3.2" id="rcppcwb-032">RcppCWB 0.3.2<small>2021-02-03</small><a class="anchor" aria-label="anchor" href="#rcppcwb-032"></a></h2>
<ul><li>If RcppCWB is compiled on macOS, the package configure script checks the architecture of the machine and ensures that (if glib-2.0 is not yet present) a version of glib-2.0 compiled for Apple Silicon/the M1 chip is loaded in case an arm64 architecture is detected.</li>
<li>The package configure script now uses <code>pcre-config</code> to locate header files of PCRE.</li>
<li>The configure script checks whether PCRE has been compiled with Unicode properties support. If not, a warning is issued that also explains the recommended solution: passing ‘--enable-unicode-properties’ to configure when building PCRE.</li>
</ul></div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.3.0" id="rcppcwb-030">RcppCWB 0.3.0<small>2020-07-08</small><a class="anchor" aria-label="anchor" href="#rcppcwb-030"></a></h2>
<ul><li>To avoid warnings when running R CMD check, the URL <a href="http://pcre.org" class="external-link uri">http://pcre.org</a> is used rather than <a href="https://pcre.org" class="external-link uri">https://pcre.org</a> in the DESCRIPTION and the README file.</li>
<li>The configure script no longer adds the ‘-fcommon’ flag to CFLAGS, which was a somewhat dirty workaround for multiple symbol definitions. The C code has been modified so that multiple symbol definitions no longer occur.</li>
<li>The macOS image used for tests on Travis CI is now ‘xcode9.4’.</li>
<li>On Solaris, the configure script would define the flag “-Wl,--allow-multiple-definition” to be passed to the linker. The rework of the CWB includes and the inclusion of the header file ‘env.h’ make it possible to drop this flag, which was defined in a confusing place anyway.</li>
<li>The compiler chosen by the user (in the Makeconf or a Makevars file) is now used on all operating systems.</li>
<li>If pkg-config is not present on macOS, a warning is issued; the user gets the advice to use the brew package manager to install pkg-config.</li>
<li>The configure script explicitly checks whether the dependencies ncurses, pcre and glib-2.0 are present. If not, an informative error message with installation instructions is displayed.</li>
<li>When unloading the package, the dynamic library RcppCWB.so is unloaded.</li>
<li>When loading the package, CQP is initialized by default (call <code><a href="../reference/cqp_initialize.html">cqp_initialize()</a></code>).</li>
</ul></div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.2.9" id="rcppcwb-029">RcppCWB 0.2.9<small>2020-06-25</small><a class="anchor" aria-label="anchor" href="#rcppcwb-029"></a></h2>
<ul><li>Starting with GCC 10, the compiler defaults to -fno-common, resulting in error messages during the linker stage, see <a href="https://gcc.gnu.org/gcc-10/changes.html" class="external-link">the change log of the GCC compiler</a>. The CWB code includes header files multiple times, causing multiple definitions. To address this issue, the -fcommon option is now used by default when compiling the CWB C files on 64-bit Linux systems.</li>
<li>On Linux systems, the hard-coded definition of the preferred C compiler in the CWB configuration scripts is replaced by what the CC variable defines (in ~/.R/Makevars or the Makeconf file, i.e. the result returned by R CMD config CC).</li>
<li>Remaining bashisms have been removed from the cleanup file. The shebang line of the cleanup and the configure file is now #!/bin/sh, to avoid any reliance on bash.</li>
</ul></div>
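<p>The linker errors mentioned for GCC 10 in the 0.2.9 notes arise when a variable is defined in a header that several source files include: with <code>-fno-common</code>, each inclusion yields a separate definition. A minimal C sketch of the usual portable pattern follows (a declaration in the header, exactly one definition in a source file). The names <code>cl_debug_level</code> and <code>bump_debug_level()</code> are hypothetical and are not taken from the CWB sources.</p>
<pre><code>/* Shared header: declare the variable, do not define it. */
extern int cl_debug_level;      /* hypothetical name */

/* Exactly one source file: the single definition. */
int cl_debug_level = 0;

/* Any translation unit that includes the header can use the variable. */
int bump_debug_level(void)      /* hypothetical helper */
{
    cl_debug_level = cl_debug_level + 1;
    return cl_debug_level;
}
</code></pre>
<p>With this layout the code should link with or without <code>-fcommon</code>, which is presumably the kind of change that allowed the flag to be dropped again in 0.3.0.</p>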
<div class="section level2">
<h2 class="page-header" data-toc-text="0.2.8" id="rcppcwb-028">RcppCWB 0.2.8<small>2019-02-21</small><a class="anchor" aria-label="anchor" href="#rcppcwb-028"></a></h2>
<ul><li>There have been (minor) modifications of the C code of the CWB so that compilation succeeds on Solaris.</li>
<li>The ‘-C’ flag in the CWB Makefiles has been replaced by ‘cd cl’ / ‘cd cqp’ to reduce dependence on GNU make. GNU make is still required, however, because of ‘include’ statements in the Makefiles.</li>
<li>Removed an action on ‘depend.mk’ from ‘cleanup’ script to avoid error messages that depend.mk is not present when Makefiles are first loaded.</li>
<li>Dummy depend.mk files will satisfy include statement in Makefiles when running ‘make clean’ (depend.mk files are created only when running depend.mk)</li>
<li>To create the index of the static archives (libcl, libcqp, libcwb), the call to ‘ranlib’ has been replaced by an equivalent ‘ar -s’ in the Makefiles, but commented out.</li>
<li>In the platform-specific config files of the CWB, the ‘-march’-option has been taken out, to safeguard portability.</li>
<li>To meet the requirements of the upcoming changes in the CRAN check process to use staged installs, the procedure to reset the paths in the test data within the package has been replaced throughout by using a temporary registry directory. The function <code><a href="../reference/tmp_registry.html">get_tmp_registry()</a></code> returns the location of this directory.</li>
</ul></div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.2.7" id="rcppcwb-027">RcppCWB 0.2.7<small>2019-01-09</small><a class="anchor" aria-label="anchor" href="#rcppcwb-027"></a></h2>
<ul><li>If glib-2.0 is not present on macOS, binaries of the static library and header files are downloaded from a GitHub repo. This prepares RcppCWB to pass the macOS checks on CRAN machines.</li>
<li>A slight modification of the C code will now prevent previous crashes resulting from a faulty CQP syntax. The solution will not yet be effective for Windows systems until we have recompiled the libcqp static library that is downloaded during the installation process.</li>
<li>A new C++-level function ‘check_corpus’ checks whether a given corpus is available and is used by the <code><a href="../reference/checks.html">check_corpus()</a></code>-function. Problems with the previous implementation, which relied on files in the registry directory to ensure the presence of a corpus, should no longer occur.</li>
<li>Calling the ‘find_readline.perl’ utility script is omitted on macOS, so previous warning messages when running the makefile do not show up any more.</li>
</ul></div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.2.6" id="rcppcwb-026">RcppCWB 0.2.6<small>2018-10-22</small><a class="anchor" aria-label="anchor" href="#rcppcwb-026"></a></h2>
<ul><li>Function <code><a href="../reference/cl_charset_name.html">cl_charset_name()</a></code> is exposed; it returns the charset of a corpus. This is faster than parsing the registry file again and again.</li>
<li>A new <code><a href="../reference/cl_delete_corpus.html">cl_delete_corpus()</a></code>-function can remove loaded corpora from memory.</li>
</ul></div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.2.5" id="rcppcwb-025">RcppCWB 0.2.5<small>2018-08-10</small><a class="anchor" aria-label="anchor" href="#rcppcwb-025"></a></h2>
<ul><li>In Makevars.win, libiconv is explicitly linked, to make RcppCWB compatible with new release of Rtools.</li>
<li>regex in check_s_attribute() for parsing registry file improved so that it does not produce an error if ‘# [attribute]’ follows after declaration of s_attribute</li>
</ul></div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.2.4" id="rcppcwb-024">RcppCWB 0.2.4<small>2018-06-15</small><a class="anchor" aria-label="anchor" href="#rcppcwb-024"></a></h2>
<ul><li>for Linux and macOS, CWB 3.4.14 included, so that UTF-8 support is realized</li>
<li>bug removed in check_cqp_query that would prevent special characters from working in CQP queries</li>
<li>check_strucs, check_cpos and check_id are checking for NAs now to avoid crashes</li>
<li>cwb command line tools cwb-makeall, cwb-huffcode and cwb-compress-rdx exposed as cwb_makeall, cwb_huffcode and cwb_compress_rdx</li>
</ul></div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.2.3" id="rcppcwb-023">RcppCWB 0.2.3<small>2018-05-13</small><a class="anchor" aria-label="anchor" href="#rcppcwb-023"></a></h2>
<ul><li>when loading the package, a check is performed to make sure that paths in the registry files point to the data files of the sample data (issues may occur when installing binaries)</li>
<li>auxiliary functions to check whether input to Rcpp-wrappers/C functions is valid are now exported and documented</li>
<li>more consistent validity checks of input to functions for structural attributes</li>
</ul></div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.2.2" id="rcppcwb-022">RcppCWB 0.2.2<small>2018-05-01</small><a class="anchor" aria-label="anchor" href="#rcppcwb-022"></a></h2>
<ul><li>Compiling RcppCWB on unix-like systems (macOS, Linux) will work now without the presence of glib (on Windows, the dependency persists).</li>
<li>The presence of the bison parser is not required any more. The package includes the C source generated by the bison parser along with the original input files.</li>
<li>Functionality to generate CWB-indexed corpora and to generate and manipulate the registry file describing a corpus has been moved to a new package ‘cwbtools’ (see <a href="https://www.github.com/PolMine/cwbtools" class="external-link uri">https://www.github.com/PolMine/cwbtools</a>) in order to maintain a clearly defined scope of RcppCWB to expose functionality of the C code of the CWB.</li>
<li>Minor intervention in function ‘valid_subcorpus_name’ to omit a -Wtautological-pointer-compare warning leading to a WARNING when checking package for R 3.5.0 with option --as-cran.</li>
</ul></div>
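<p>The <code>-Wtautological-pointer-compare</code> warning mentioned in the 0.2.2 notes is typically raised when the address of an array is compared with a null pointer, a test that is always true. The following C sketch illustrates the general pattern only; the struct and the function <code>has_name()</code> are hypothetical and do not reproduce the actual change in <code>valid_subcorpus_name</code>.</p>
<pre><code>/* Hypothetical struct: the name is stored in an embedded array. */
struct subcorpus {
    char name[64];
};

/* Before (sketch): "s.name != NULL" compares the address of the array
   with a null pointer. The address is never null, so the test is
   always true and the compiler warns about it. */

/* After (sketch): test the content of the array instead. */
int has_name(struct subcorpus s)
{
    return s.name[0] != '\0';
}
</code></pre>
<p>Checking the first character expresses what the original comparison most likely intended, namely whether a name has been set at all.</p>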
<div class="section level2">
<h2 class="page-header" data-toc-text="0.2.1" id="rcppcwb-021">RcppCWB 0.2.1<small>2018-04-21</small><a class="anchor" aria-label="anchor" href="#rcppcwb-021"></a></h2>
<ul><li>In previous versions, the drive of the working directory and of the registry/data directory had to be identical on Windows; this limitation no longer applies.</li>
<li>Some utility functions that were needed to check the identity of the drives of the working directory and the data have been removed.</li>
</ul></div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.2.0" id="rcppcwb-020">RcppCWB 0.2.0<a class="anchor" aria-label="anchor" href="#rcppcwb-020"></a></h2>
<ul><li>In addition to low-level functionality of the corpus library (CL), functions of the Corpus Query Processor (CQP) are exposed, building on C wrappers in the rcqp package;</li>
<li>The authors of the rcqp package (Bernard Desgraupes and Sylvain Loiseau) are mentioned as package authors and as authors of functions using CQP, as the code used to expose CQP functionality is a modified version of rcqp code;</li>
<li>Extended package description explaining the rationale for developing the RcppCWB package;</li>
<li>Documentation of functions has been rearranged, many examples have been included;</li>
<li>Renaming of exposed functions of corpus library from cwb_… to cl_…;</li>
<li>sanity checks in R wrappers for Rcpp functions.</li>
</ul></div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.1.7" id="rcppcwb-017">RcppCWB 0.1.7<small>2018-02-20</small><a class="anchor" aria-label="anchor" href="#rcppcwb-017"></a></h2>
<ul><li>CWB source code included in package to be GPL compliant</li>
<li>template to adjust HOME and INFO in registry file used (tools/setpaths.R)</li>
<li>using VignetteBuilder has been removed</li>
<li>definition of Rprintf in cwb/cl/macros.c</li>
</ul></div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.1.6" id="rcppcwb-016">RcppCWB 0.1.6<a class="anchor" aria-label="anchor" href="#rcppcwb-016"></a></h2>
<ul><li>now using configure/configure.win script in combination with setpaths.R</li>
</ul></div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.1.1" id="rcppcwb-011">RcppCWB 0.1.1<a class="anchor" aria-label="anchor" href="#rcppcwb-011"></a></h2>
<ul><li>vignette included that explains cross-compiling CWB for Windows</li>
<li>check in struc2str to ensure that structure has attributes</li>
</ul></div>
<div class="section level2">
<h2 class="page-header" data-toc-text="0.1.0" id="rcppcwb-010">RcppCWB 0.1.0<a class="anchor" aria-label="anchor" href="#rcppcwb-010"></a></h2>
<ul><li>Windows compatibility (potentially still limited)</li>
</ul></div>
</div>
<div class="col-md-3 hidden-xs hidden-sm" id="pkgdown-sidebar">
<nav id="toc" data-toggle="toc" class="sticky-top"><h2 data-toc-skip>Contents</h2>
</nav></div>
</div>
<footer><div class="copyright">
<p></p><p>Developed by Andreas Blaette, Bernard Desgraupes, Sylvain Loiseau.</p>
</div>
<div class="pkgdown">
<p></p><p>Site built with <a href="https://pkgdown.r-lib.org/" class="external-link">pkgdown</a> 2.0.6.</p>
</div>
</footer></div>
</body></html>