INTERNALS.md
1 curl internals
2 ==============
3 
4  - [Intro](#intro)
5  - [git](#git)
6  - [Portability](#Portability)
7  - [Windows vs Unix](#winvsunix)
8  - [Library](#Library)
9  - [`Curl_connect`](#Curl_connect)
10  - [`Curl_do`](#Curl_do)
11  - [`Curl_readwrite`](#Curl_readwrite)
12  - [`Curl_done`](#Curl_done)
13  - [`Curl_disconnect`](#Curl_disconnect)
14  - [HTTP(S)](#http)
15  - [FTP](#ftp)
16  - [Kerberos](#kerberos)
17  - [TELNET](#telnet)
18  - [FILE](#file)
19  - [SMB](#smb)
20  - [LDAP](#ldap)
21  - [E-mail](#email)
22  - [General](#general)
23  - [Persistent Connections](#persistent)
24  - [multi interface/non-blocking](#multi)
25  - [SSL libraries](#ssl)
26  - [Library Symbols](#symbols)
27  - [Return Codes and Informationals](#returncodes)
28  - [API/ABI](#abi)
29  - [Client](#client)
30  - [Memory Debugging](#memorydebug)
31  - [Test Suite](#test)
32  - [Asynchronous name resolves](#asyncdns)
33  - [c-ares](#cares)
34  - [`curl_off_t`](#curl_off_t)
35  - [curlx](#curlx)
36  - [Content Encoding](#contentencoding)
37  - [hostip.c explained](#hostip)
38  - [Track Down Memory Leaks](#memoryleak)
39  - [`multi_socket`](#multi_socket)
40  - [Structs in libcurl](#structs)
41 
42 <a name="intro"></a>
43 Intro
44 =====
45 
46  This project is split in two: the library and the client. The client part
47  uses the library, but the library is designed to allow other applications to
48  use it.
49 
50  The largest amount of code and complexity is in the library part.
51 
52 
53 <a name="git"></a>
54 git
55 ===
56 
57  All changes to the sources are committed to the git repository as soon as
58  they're somewhat verified to work. Changes shall be committed as independently
59  as possible so that individual changes can be easily spotted and tracked
60  afterwards.
61 
62  Tagging shall be used extensively, and by the time we release new archives we
63  should tag the sources with a name similar to the released version number.
64 
65 <a name="Portability"></a>
66 Portability
67 ===========
68 
69  We write curl and libcurl to compile with C89 compilers, on 32-bit and larger
70  machines. Most of libcurl assumes more or less POSIX compliance, but that's
71  not a requirement.
72 
73  We write libcurl to build and work with lots of third party tools, and we
74  want it to remain functional and buildable with these and later versions
75  (older versions may still work, but they are not what we work hard to maintain):
76 
77 Dependencies
78 ------------
79 
80  - OpenSSL 0.9.7
81  - GnuTLS 1.2
82  - zlib 1.1.4
83  - libssh2 0.16
84  - c-ares 1.6.0
85  - libidn 0.4.1
86  - cyassl 2.0.0
87  - openldap 2.0
88  - MIT Kerberos 1.2.4
89  - GSKit V5R3M0
90  - NSS 3.14.x
91  - axTLS 2.1.0
92  - PolarSSL 1.3.0
93  - Heimdal ?
94  - nghttp2 1.0.0
95 
96 Operating Systems
97 -----------------
98 
99  On systems where configure runs, we aim at working on them all - if they have
100  a suitable C compiler. On systems that don't run configure, we strive to keep
101  curl running correctly on:
102 
103  - Windows 98
104  - AS/400 V5R3M0
105  - Symbian 9.1
106  - Windows CE ?
107  - TPF ?
108 
109 Build tools
110 -----------
111 
112  When writing code (mostly for generating stuff included in release tarballs)
113  we use a few "build tools" and we make sure that we remain functional with
114  these versions:
115 
116  - GNU Libtool 1.4.2
117  - GNU Autoconf 2.57
118  - GNU Automake 1.7
119  - GNU M4 1.4
120  - perl 5.004
121  - roffit 0.5
122  - groff ? (any version that supports "groff -Tps -man [in] [out]")
123  - ps2pdf (gs) ?
124 
125 <a name="winvsunix"></a>
126 Windows vs Unix
127 ===============
128 
129  There are a few differences in how to program curl the Unix way compared to
130  the Windows way. Perhaps the four most notable details are:
131 
132  1. Different function names for socket operations.
133 
134  In curl, this is solved with defines and macros, so that the source looks
135  the same in all places except for the header file that defines them. The
136  macros in use are sclose(), sread() and swrite().
137 
138  2. Windows requires a couple of init calls for the socket stuff.
139 
140  That's taken care of by the `curl_global_init()` call, but if other libs
141  also do it etc there might be reasons for applications to alter that
142  behaviour.
143 
144  3. The file descriptors for network communication and file operations are
145  not as easily interchangeable as in Unix.
146 
147  We avoid this by not trying any funny tricks on file descriptors.
148 
149  4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus
150  destroying binary data, although you do want that conversion if it is
151  text coming through... (sigh)
152 
153  We set stdout to binary mode under Windows.
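
For point 1 above, the following is a minimal sketch of how such macros could be defined in one header. It is not a copy of curl's own header: the `sock_t` typedef and the `send_all()` helper are names made up for this illustration, and only the three macro names come from the text.

```c
#include <stddef.h>

#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET sock_t;
#define sread(s, buf, len)  recv((s), (char *)(buf), (int)(len), 0)
#define swrite(s, buf, len) send((s), (const char *)(buf), (int)(len), 0)
#define sclose(s)           closesocket(s)
#else
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
typedef int sock_t;
#define sread(s, buf, len)  recv((s), (buf), (len), 0)
#define swrite(s, buf, len) send((s), (buf), (len), 0)
#define sclose(s)           close(s)
#endif

/* Hypothetical helper: send the whole buffer, looping over short writes.
   Returns 0 on success and -1 on error. The body looks the same on every
   platform because only the macros above differ. */
int send_all(sock_t s, const char *buf, size_t len)
{
  while(len > 0) {
    long n = (long)swrite(s, buf, len);
    if(n <= 0)
      return -1;
    buf += n;
    len -= (size_t)n;
  }
  return 0;
}
```

Code that sticks to the three macros never needs to know which platform's socket calls sit underneath.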
154 
155  Inside the source code, we make an effort to avoid `#ifdef [Your OS]`. All
156  conditionals that deal with features *should* instead be in the format
157  `#ifdef HAVE_THAT_WEIRD_FUNCTION`. Since Windows can't run configure scripts,
158  we maintain a `curl_config-win32.h` file in lib directory that is supposed to
159  look exactly like a `curl_config.h` file would have looked like on a Windows
160  machine!
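
As a small illustration of the feature-based conditional style, here is a sketch that assumes a `HAVE_STRDUP` symbol supplied by configure or by a hand-maintained config header. Both that symbol and the `my_strdup` name are chosen for this example only.

```c
#include <stdlib.h>
#include <string.h>

/* HAVE_STRDUP stands in for a symbol that configure (or a hand-written
   config header) would define; it is an assumption made for this sketch. */
#ifdef HAVE_STRDUP
#define my_strdup(s) strdup(s)
#else
/* Fallback used on any platform that lacks strdup(), whatever its name. */
char *my_strdup(const char *s)
{
  size_t len = strlen(s) + 1;
  char *copy = malloc(len);
  if(copy)
    memcpy(copy, s, len);
  return copy;
}
#endif
```

The conditional asks whether the function exists, not which operating system is being built for, so a new platform only needs a correct config header.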
161 
162  Generally speaking: always remember that this will be compiled on dozens of
163  operating systems. Don't walk on the edge!
164 
165 <a name="Library"></a>
166 Library
167 =======
168 
169  (See [Structs in libcurl](#structs) for the separate section describing all
170  major internal structs and their purposes.)
171 
172  There are plenty of entry points to the library, namely each publicly defined
173  function that libcurl offers to applications. All of those functions are
174  rather small and easy-to-follow. All the ones prefixed with `curl_easy` are
175  put in the lib/easy.c file.
176 
177  `curl_global_init()` and `curl_global_cleanup()` should be called by the
178  application to initialize and clean up global stuff in the library. As of
179  today, it can handle the global SSL initing if SSL is enabled and it can init
180  the socket layer on Windows machines. libcurl itself has no "global" scope.
181 
182  All printf()-style functions use the supplied clones in lib/mprintf.c. This
183  makes sure we stay absolutely platform independent.
184 
185  [`curl_easy_init()`][2] allocates an internal struct and makes some
186  initializations. The returned handle does not reveal internals. This is the
187  `Curl_easy` struct which works as an "anchor" struct for all `curl_easy`
188  functions. All connections performed will get connect-specific data allocated
189  that should be used for things related to particular connections/requests.
190 
191  [`curl_easy_setopt()`][1] takes three arguments, where the option stuff must
192  be passed in pairs: the parameter-ID and the parameter-value. The list of
193  options is documented in the man page. This function mainly sets things in
194  the `Curl_easy` struct.
195 
196  `curl_easy_perform()` is just a wrapper function that makes use of the multi
197  API. It basically calls `curl_multi_init()`, `curl_multi_add_handle()`,
198  `curl_multi_wait()`, and `curl_multi_perform()` until the transfer is done
199  and then returns.
200 
201  Some of the most important key functions in url.c are called from multi.c
202  when certain key steps are to be made in the transfer operation.
203 
204 <a name="Curl_connect"></a>
205 Curl_connect()
206 --------------
207 
208  Analyzes the URL, separates it into its different components and connects to the
209  remote host. This may involve using a proxy and/or using SSL. The
210  `Curl_resolv()` function in lib/hostip.c is used for looking up host names
211  (it does then use the proper underlying method, which may vary between
212  platforms and builds).
213 
214  When `Curl_connect` is done, we are connected to the remote site. Then it
215  is time to tell the server to get a document/file. `Curl_do()` arranges
216  this.
217 
218  This function makes sure there's an allocated and initiated 'connectdata'
219  struct that is used for this particular connection only (although there may
220  be several requests performed on the same connect). A bunch of things are
221  inited/inherited from the `Curl_easy` struct.
222 
223 <a name="Curl_do"></a>
224 Curl_do()
225 ---------
226 
227  `Curl_do()` makes sure the proper protocol-specific function is called. The
228  functions are named after the protocols they handle.
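
The dispatch idea can be sketched with a table of function pointers. Everything below (the `result` enum, the `handler` struct, the function names) is hypothetical and only illustrates the pattern; it is not libcurl's actual handler structure.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; not libcurl's CURLcode. */
enum result { RES_OK = 0, RES_UNSUPPORTED_PROTOCOL = 1 };

struct handler {
  const char *scheme;                     /* e.g. "http" */
  enum result (*do_it)(const char *url);  /* protocol-specific DO function */
};

/* Records which protocol function ran last, for demonstration only. */
const char *last_called = "";

enum result http_do(const char *url) { (void)url; last_called = "http"; return RES_OK; }
enum result ftp_do(const char *url)  { (void)url; last_called = "ftp";  return RES_OK; }

const struct handler handlers[] = {
  { "http", http_do },
  { "ftp",  ftp_do  }
};

/* Generic DO: find the handler for the scheme and call its function. */
enum result generic_do(const char *scheme, const char *url)
{
  size_t i;
  for(i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++) {
    if(strcmp(handlers[i].scheme, scheme) == 0)
      return handlers[i].do_it(url);
  }
  return RES_UNSUPPORTED_PROTOCOL;
}
```

With this layout the generic code never needs a protocol-specific branch; adding a protocol means adding a table entry.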
229 
230  The protocol-specific functions of course deal with protocol-specific
231  negotiations and setup. They have access to the `Curl_sendf()` (from
232  lib/sendf.c) function to send printf-style formatted data to the remote
233  host and when they're ready to make the actual file transfer they call the
234  `Curl_setup_transfer()` function (in lib/transfer.c) to set up the transfer and
235  return.
236 
237  If this DO function fails and the connection is being re-used, libcurl will
238  then close this connection, set up a new connection and re-issue the DO
239  request on that. This is because there is no way to be perfectly sure that
240  we have discovered a dead connection before the DO function and thus we
241  might wrongly be re-using a connection that was closed by the remote peer.
242 
243  Some time during the DO function, the `Curl_setup_transfer()` function must
244  be called with some basic info about the upcoming transfer: what socket(s)
245  to read/write and the expected file transfer sizes (if known).
246 
247 <a name="Curl_readwrite"></a>
248 Curl_readwrite()
249 ----------------
250 
251  Called during the transfer of the actual protocol payload.
252 
253  During transfer, the progress functions in lib/progress.c are called at
254  frequent intervals (or at the user's choice, a specified callback might get
255  called). The speedcheck functions in lib/speedcheck.c are also used to
256  verify that the transfer is as fast as required.
257 
258 <a name="Curl_done"></a>
259 Curl_done()
260 -----------
261 
262  Called after a transfer is done. This function takes care of everything
263  that has to be done after a transfer. This function attempts to leave
264  matters in a state so that `Curl_do()` should be possible to call again on
265  the same connection (in a persistent connection case). It might also soon
266  be closed with `Curl_disconnect()`.
267 
268 <a name="Curl_disconnect"></a>
269 Curl_disconnect()
270 -----------------
271 
272  When doing normal connections and transfers, no one ever tries to close any
273  connections so this is not normally called when `curl_easy_perform()` is
274  used. This function is only used when we are certain that no more transfers
275  are going to be made on the connection. It can be also closed by force, or
276  it can be called to make sure that libcurl doesn't keep too many
277  connections alive at the same time.
278 
279  This function cleans up all resources that are associated with a single
280  connection.
281 
282 <a name="http"></a>
283 HTTP(S)
284 =======
285 
286  HTTP offers a lot and is the protocol in curl that uses the most lines of
287  code. There is a special file (lib/formdata.c) that offers all the multipart
288  post functions.
289 
290  Base64 functions for user+password stuff (and more) are in lib/base64.c and
291  all functions for parsing and sending cookies are found in lib/cookie.c.
292 
293  HTTPS uses in almost every case the same procedure as HTTP, with only two
294  exceptions: the connect procedure is different and the function used to read
295  or write from the socket is different, although the latter fact is hidden in
296  the source by the use of `Curl_read()` for reading and `Curl_write()` for
297  writing data to the remote server.
298 
299  `http_chunks.c` contains functions that understand HTTP 1.1 chunked transfer
300  encoding.
301 
302  An interesting detail with the HTTP(S) request is the `Curl_add_buffer()`
303  series of functions we use. They append data to one single buffer, and when
304  the building is finished the entire request is sent off in one single write. This is done to work around problems with flawed firewalls and lame servers.
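
The accumulate-then-send approach can be sketched with a small growable buffer. The `reqbuf` struct and its functions are hypothetical names for this illustration and are not the `Curl_add_buffer()` implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; not the struct libcurl uses. */
struct reqbuf {
  char *data;   /* NUL-terminated contents, or NULL when empty */
  size_t used;  /* bytes stored, excluding the terminator */
  size_t size;  /* bytes allocated */
};

/* Append len bytes, growing the allocation as needed.
   Returns 0 on success and -1 if out of memory. */
int reqbuf_add(struct reqbuf *b, const char *src, size_t len)
{
  if(b->used + len + 1 > b->size) {
    size_t newsize = (b->used + len + 1) * 2;
    char *p = realloc(b->data, newsize);
    if(!p)
      return -1;
    b->data = p;
    b->size = newsize;
  }
  memcpy(b->data + b->used, src, len);
  b->used += len;
  b->data[b->used] = '\0';
  return 0;
}

/* Convenience wrapper for NUL-terminated strings. */
int reqbuf_adds(struct reqbuf *b, const char *s)
{
  return reqbuf_add(b, s, strlen(s));
}
```

Once the request line and all headers have been appended, the caller hands `data` and `used` to a single write call, so the request is not split across several small writes.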
305 
306 <a name="ftp"></a>
307 FTP
308 ===
309 
310  The `Curl_if2ip()` function can be used for getting the IP number of a
311  specified network interface, and it resides in lib/if2ip.c.
312 
313  `Curl_ftpsendf()` is used for sending FTP commands to the remote server. It
314  was made a separate function to prevent us programmers from forgetting that
315  they must be CRLF terminated. They must also be sent in one single write() to
316  make firewalls and similar happy.
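
A sketch of the idea behind such a helper: format the command, append CRLF in the same buffer, and return the length so the caller can issue one write. `ftp_format_cmd()` is a made-up name, and the sketch uses C99 `vsnprintf()` for brevity where libcurl would use its own printf clones.

```c
#include <stdarg.h>
#include <stdio.h>

/* Format an FTP command into buf and append the mandatory CRLF.
   Returns the number of bytes to send, or -1 if buf is too small. */
int ftp_format_cmd(char *buf, size_t size, const char *fmt, ...)
{
  va_list ap;
  int n;

  va_start(ap, fmt);
  n = vsnprintf(buf, size, fmt, ap);
  va_end(ap);

  if(n < 0 || (size_t)n + 3 > size)
    return -1; /* no room for "\r\n" plus the terminating NUL */

  buf[n++] = '\r';
  buf[n++] = '\n';
  buf[n] = '\0';
  return n;
}
```

Because the terminator is added inside the helper, a caller cannot forget it, and the returned length covers the command and CRLF together.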
317 
318 <a name="kerberos"></a>
319 Kerberos
320 --------
321 
322  Kerberos support is mainly in lib/krb5.c and lib/security.c but also
323  `curl_sasl_sspi.c` and `curl_sasl_gssapi.c` for the email protocols and
324  `socks_gssapi.c` and `socks_sspi.c` for SOCKS5 proxy specifics.
325 
326 <a name="telnet"></a>
327 TELNET
328 ======
329 
330  Telnet is implemented in lib/telnet.c.
331 
332 <a name="file"></a>
333 FILE
334 ====
335 
336  The file:// protocol is dealt with in lib/file.c.
337 
338 <a name="smb"></a>
339 SMB
340 ===
341 
342  The smb:// protocol is dealt with in lib/smb.c.
343 
344 <a name="ldap"></a>
345 LDAP
346 ====
347 
348  Everything LDAP is in lib/ldap.c and lib/openldap.c.
349 
350 <a name="email"></a>
351 E-mail
352 ======
353 
354  The e-mail related source code is in lib/imap.c, lib/pop3.c and lib/smtp.c.
355 
356 <a name="general"></a>
357 General
358 =======
359 
360  URL encoding and decoding, called escaping and unescaping in the source code,
361  is found in lib/escape.c.
362 
363  While transferring data in Transfer() a few functions might get used.
364  `curl_getdate()` in lib/parsedate.c is for HTTP date comparisons (and more).
365 
366  lib/getenv.c offers `curl_getenv()` which is for reading environment
367  variables in a neat platform independent way. That's used in the client, but
368  also in lib/url.c when checking the proxy environment variables. Note that
369  contrary to the normal Unix getenv(), this returns an allocated buffer that
370  must be free()ed after use.
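
The ownership rule can be illustrated with a stand-in helper. `dup_getenv()` below is hypothetical and is not the `curl_getenv()` source; it only shows why the caller ends up owning the returned buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper mirroring the ownership rule described above:
   the caller receives its own copy and must free() it after use.
   Returns NULL if the variable is unset or memory runs out. */
char *dup_getenv(const char *name)
{
  const char *value = getenv(name);
  char *copy;
  size_t len;

  if(!value)
    return NULL;
  len = strlen(value) + 1;
  copy = malloc(len);
  if(copy)
    memcpy(copy, value, len);
  return copy;
}
```

A caller would typically fetch the value, use it, and then `free()` the result; unlike plain `getenv()`, the copy stays valid even if the environment changes later.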
371 
372  lib/netrc.c holds the .netrc parser.
373 
374  lib/timeval.c features replacement functions for systems that don't have
375  gettimeofday() and a few support functions for timeval conversions.
376 
377  A function named `curl_version()` that returns the full curl version string
378  is found in lib/version.c.
379 
380 <a name="persistent"></a>
381 Persistent Connections
382 ======================
383 
384  The persistent connection support in libcurl requires some considerations on
385  how to do things inside of the library.
386 
387  - The `Curl_easy` struct returned in the [`curl_easy_init()`][2] call
388  must never hold connection-oriented data. It is meant to hold the root data
389  as well as all the options etc that the library-user may choose.
390 
391  - The `Curl_easy` struct holds the "connection cache" (an array of
392  pointers to 'connectdata' structs).
393 
394  - This enables the 'curl handle' to be reused on subsequent transfers.
395 
396  - When libcurl is told to perform a transfer, it first checks for an already
397  existing connection in the cache that we can use. Otherwise it creates a
398  new one and adds that to the cache. If the cache is full already when a new
399  connection is added, it will first close the oldest unused one.
400 
401  - When the transfer operation is complete, the connection is left
402  open. Particular options may tell libcurl not to, and protocols may signal
403  closure on connections and then they won't be kept open, of course.
404 
405  - When `curl_easy_cleanup()` is called, we close all still opened connections,
406  unless of course the multi interface "owns" the connections.
407 
408  The curl handle must be re-used in order for the persistent connections to
409  work.
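
The lookup and eviction rules listed above can be sketched as follows. The structs, the tiny `CACHE_SIZE` and the function names are all invented for this illustration; the real connection cache is more involved.

```c
#include <stddef.h>
#include <string.h>

#define CACHE_SIZE 2  /* deliberately tiny for the sketch */

/* Hypothetical stand-in for a cached connection. */
struct conn {
  char host[64];
  int open;            /* non-zero if this slot holds a connection */
  int in_use;          /* non-zero while a transfer is using it */
  unsigned long stamp; /* value of the cache clock at last use */
};

struct conncache {
  struct conn slots[CACHE_SIZE];
  unsigned long clock; /* monotonically increasing use counter */
};

/* Return an idle connection to host, or create one. When the cache is
   full, the oldest unused connection is closed to make room. Returns NULL
   only if every slot is busy. *reused reports which case occurred. */
struct conn *cache_get(struct conncache *c, const char *host, int *reused)
{
  struct conn *victim = NULL;
  size_t i;

  for(i = 0; i < CACHE_SIZE; i++) {
    struct conn *k = &c->slots[i];
    if(k->open && !k->in_use && strcmp(k->host, host) == 0) {
      k->in_use = 1;
      k->stamp = ++c->clock;
      *reused = 1;
      return k;
    }
  }
  for(i = 0; i < CACHE_SIZE; i++) {
    struct conn *k = &c->slots[i];
    if(!k->open) {   /* free slot: take it right away */
      victim = k;
      break;
    }
    if(!k->in_use && (!victim || k->stamp < victim->stamp))
      victim = k;    /* oldest idle connection so far */
  }
  if(!victim)
    return NULL;

  /* a real implementation would close the victim's socket here */
  strncpy(victim->host, host, sizeof(victim->host) - 1);
  victim->host[sizeof(victim->host) - 1] = '\0';
  victim->open = 1;
  victim->in_use = 1;
  victim->stamp = ++c->clock;
  *reused = 0;
  return victim;
}

/* Mark a connection idle again once its transfer is complete. */
void cache_release(struct conn *k)
{
  k->in_use = 0;
}
```

The connection is left open on release, which is what makes the next transfer to the same host able to reuse it.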
410 
411 <a name="multi"></a>
412 multi interface/non-blocking
413 ============================
414 
415  The multi interface is a non-blocking interface to the library. To make that
416  interface work as well as possible, no low-level functions within libcurl
417  must be written to work in a blocking manner. (There are still a few spots
418  violating this rule.)
419 
420  One of the primary reasons we introduced c-ares support was to allow the name
421  resolve phase to be perfectly non-blocking as well.
422 
423  The FTP and the SFTP/SCP protocols are examples of how we adapt and adjust
424  the code to allow non-blocking operations even on multi-stage command-
425  response protocols. They are built around state machines that return when
426  they would otherwise block waiting for data. The DICT, LDAP and TELNET
427  protocols are crappy examples and they are subject to a rewrite in the future
428  to better fit the libcurl protocol family.
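
A state machine of this kind can be sketched as below. The states and function are hypothetical and far simpler than the real FTP code; the reply codes 220, 331 and 230 are the usual FTP greeting, "password required" and "logged in" responses.

```c
/* Hypothetical states for a command/response login sequence. */
enum state {
  ST_WAIT_GREETING,
  ST_WAIT_USER_REPLY,
  ST_WAIT_PASS_REPLY,
  ST_DONE
};

struct machine {
  enum state st;
  int commands_sent; /* how many commands have been issued so far */
};

/* Advance the machine as far as the available input allows.
   reply is the next response code from the server, or 0 if nothing has
   arrived yet. Returns 1 when finished, 0 when the caller should come
   back once the socket is readable, and -1 on an unexpected reply. */
int machine_step(struct machine *m, int reply)
{
  if(m->st == ST_DONE)
    return 1;
  if(reply == 0)
    return 0; /* would block: return instead of waiting */

  switch(m->st) {
  case ST_WAIT_GREETING:
    if(reply != 220)
      return -1;
    m->commands_sent++; /* this is where USER would be sent */
    m->st = ST_WAIT_USER_REPLY;
    return 0;
  case ST_WAIT_USER_REPLY:
    if(reply != 331)
      return -1;
    m->commands_sent++; /* this is where PASS would be sent */
    m->st = ST_WAIT_PASS_REPLY;
    return 0;
  case ST_WAIT_PASS_REPLY:
    if(reply != 230)
      return -1;
    m->st = ST_DONE;
    return 1;
  default:
    return -1;
  }
}
```

The function never waits: each call either consumes one reply or returns immediately, so the caller stays free to service other transfers in between.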
429 
430 <a name="ssl"></a>
431 SSL libraries
432 =============
433 
434  Originally libcurl supported SSLeay for SSL/TLS transports. That support was
435  then extended to its successor OpenSSL and has since been extended to several
436  other SSL/TLS libraries. We expect and hope to further extend the support
437  in future libcurl versions.
438 
439  To deal with this internally in the best way possible, we have a generic SSL
440  function API as provided by the vtls/vtls.[ch] system, and they are the only
441  SSL functions we must use from within libcurl. vtls is then crafted to use
442  the appropriate lower-level function calls to whatever SSL library that is in
443  use. For example vtls/openssl.[ch] for the OpenSSL library.
444 
445 <a name="symbols"></a>
446 Library Symbols
447 ===============
448 
449  All symbols used internally in libcurl must use a `Curl_` prefix if they're
450  used in more than a single file. Single-file symbols must be made static.
451  Public ("exported") symbols must use a `curl_` prefix. (There are exceptions,
452  but they are to be changed to follow this pattern in future versions.) Public
453  API functions are marked with `CURL_EXTERN` in the public header files so
454  that all others can be hidden on platforms where this is possible.
455 
456 <a name="returncodes"></a>
457 Return Codes and Informationals
458 ===============================
459 
460  I've made things simple. Almost every function in libcurl returns a CURLcode,
461  that must be `CURLE_OK` if everything is OK or otherwise a suitable error
462  code as the curl/curl.h include file defines. The very spot that detects an
463  error must use the `Curl_failf()` function to set the human-readable error
464  description.
465 
466  In aiding the user to understand what's happening and to debug curl usage, we
467  must supply a fair number of informational messages by using the
468  `Curl_infof()` function. Those messages are only displayed when the user
469  explicitly asks for them. They are best used when revealing information that
470  isn't otherwise obvious.
471 
472 <a name="abi"></a>
473 API/ABI
474 =======
475 
476  We make an effort to not export or show internals or how internals work, as
477  that makes it easier to keep a solid API/ABI over time. See docs/libcurl/ABI
478  for our promise to users.
479 
480 <a name="client"></a>
481 Client
482 ======
483 
484  main() resides in `src/tool_main.c`.
485 
486  `src/tool_hugehelp.c` is automatically generated by the mkhelp.pl perl script
487  to display the complete "manual" and the `src/tool_urlglob.c` file holds the
488  functions used for the URL-"globbing" support. Globbing in the sense that the
489  {} and [] expansion stuff is there.
490 
491  The client mostly sets up its 'config' struct properly, then
492  it calls the `curl_easy_*()` functions of the library and when it gets back
493  control after the `curl_easy_perform()` it cleans up the library, checks
494  status and exits.
495 
496  When the operation is done, the ourWriteOut() function in src/writeout.c may
497  be called to report about the operation. That function is using the
498  `curl_easy_getinfo()` function to extract useful information from the curl
499  session.
500 
501  It may loop and do all this several times if many URLs were specified on the
502  command line or config file.
503 
504 <a name="memorydebug"></a>
505 Memory Debugging
506 ================
507 
508  The file lib/memdebug.c contains debug-versions of a few functions. Functions
509  such as malloc, free, fopen, fclose, etc that somehow deal with resources
510  that might give us problems if we "leak" them. The functions in the memdebug
511  system do nothing fancy, they do their normal function and then log
512  information about what they just did. The logged data can then be analyzed
513  after a complete session.
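
The "do the normal job, then log it" approach can be sketched like this. The function names, the `memlog` and `dbg_live` variables and the log line layout are all assumptions made for the illustration; they are not the real memdebug interface or log format.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical debug wrappers: do the normal job, then log what happened. */
FILE *memlog;   /* point this at an open file to enable logging */
long dbg_live;  /* allocations not yet freed */

void *dbg_malloc(size_t size, int line, const char *source)
{
  void *p = malloc(size);
  if(p)
    dbg_live++;
  if(memlog)
    fprintf(memlog, "MEM %s:%d malloc(%lu) = %p\n",
            source, line, (unsigned long)size, p);
  return p;
}

void dbg_free(void *p, int line, const char *source)
{
  if(memlog)
    fprintf(memlog, "MEM %s:%d free(%p)\n", source, line, p);
  if(p)
    dbg_live--;
  free(p);
}

/* Redirect plain calls made after this point to the logging versions. */
#define malloc(size) dbg_malloc(size, __LINE__, __FILE__)
#define free(ptr)    dbg_free(ptr, __LINE__, __FILE__)
```

Code compiled after the two macros keeps calling `malloc()` and `free()` as usual, while every call leaves a line in the log that a script can later pair up.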
514 
515  memanalyze.pl is the perl script present in tests/ that analyzes a log file
516  generated by the memory tracking system. It detects if resources are
517  allocated but never freed and other kinds of errors related to resource
518  management.
519 
520  Internally, definition of preprocessor symbol DEBUGBUILD restricts code which
521  is only compiled for debug enabled builds. And symbol CURLDEBUG is used to
522  differentiate code which is _only_ used for memory tracking/debugging.
523 
524  Use -DCURLDEBUG when compiling to enable memory debugging; this is also
525  switched on by running configure with --enable-curldebug. Use -DDEBUGBUILD
526  when compiling to enable a debug build or run configure with --enable-debug.
527 
528  curl --version will list 'Debug' feature for debug enabled builds, and
529  will list 'TrackMemory' feature for curl debug memory tracking capable
530  builds. These features are independent and can be controlled when running
531  the configure script. When --enable-debug is given both features will be
532  enabled, unless some restriction prevents memory tracking from being used.
533 
534 <a name="test"></a>
535 Test Suite
536 ==========
537 
538  The test suite is placed in its own subdirectory directly off the root in the
539  curl archive tree, and it contains a bunch of scripts and a lot of test case
540  data.
541 
542  The main test script is runtests.pl that will invoke test servers like
543  httpserver.pl and ftpserver.pl before all the test cases are performed. The
544  test suite currently only runs on Unix-like platforms.
545 
546  You'll find a description of the test suite in the tests/README file, and the
547  format of the test case data files in the tests/FILEFORMAT file.
548 
549  The test suite automatically detects if curl was built with the memory
550  debugging enabled, and if it was, it will detect memory leaks, too.
551 
552 <a name="asyncdns"></a>
553 Asynchronous name resolves
554 ==========================
555 
556  libcurl can be built to do name resolves asynchronously, using either the
557  normal resolver in a threaded manner or by using c-ares.
558 
559 <a name="cares"></a>
560 [c-ares][3]
561 ------
562 
563 ### Build libcurl to use c-ares
564 
565 1. ./configure --enable-ares=/path/to/ares/install
566 2. make
567 
568 ### c-ares on win32
569 
570  First I compiled c-ares. I changed the default C runtime library to be the
571  single-threaded rather than the multi-threaded (this seems to be required to
572  prevent linking errors later on). Then I simply built the areslib project
573  (the other projects adig/ahost seem to fail under MSVC).
574 
575  Next was libcurl. I opened lib/config-win32.h and I added a:
576  `#define USE_ARES 1`
577 
578  Next thing I did was I added the path for the ares includes to the include
579  path, and the libares.lib to the libraries.
580 
581  Lastly, I also changed libcurl to be single-threaded rather than
582  multi-threaded, again this was to prevent some duplicate symbol errors. I'm
583  not sure why I needed to change everything to single-threaded, but when I
584  didn't I got redefinition errors for several CRT functions (malloc, stricmp,
585  etc.)
586 
587 <a name="curl_off_t"></a>
588 `curl_off_t`
589 ==========
590 
591  `curl_off_t` is a data type provided by the external libcurl include
592  headers. It is the type meant to be used for the [`curl_easy_setopt()`][1]
593  options that end with LARGE. The type is 64 bits wide on most modern
594  platforms.
595 
596 curlx
597 =====
598 
599  The libcurl source code offers a few functions by source only. They are not
600  part of the official libcurl API, but the source files might be useful for
601  others so apps can optionally compile/build with these sources to gain
602  additional functions.
603 
604  We provide them through a single header file for easy access for apps:
605  "curlx.h"
606 
607 `curlx_strtoofft()`
608 -------------------
609  A macro that converts a string containing a number to a `curl_off_t` number.
610  This might use the `curlx_strtoll()` function which is provided as source
611  code in strtoofft.c. Note that the function is only provided if no
612  strtoll() (or equivalent) function exists on your platform. If `curl_off_t`
613  is only a 32-bit number on your platform, this macro uses strtol().
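
The selection logic can be sketched with the preprocessor. `off_sketch_t`, `sketch_strtoofft` and `OFFT_IS_32BIT` are names invented for this example; the sketch also assumes a compiler that provides `long long` and `strtoll()`, and it omits the replacement function mentioned above.

```c
#include <limits.h>
#include <stdlib.h>

/* off_sketch_t stands in for curl_off_t. OFFT_IS_32BIT is a hypothetical
   configuration symbol that forces the narrow variant. */
#if defined(LLONG_MAX) && !defined(OFFT_IS_32BIT)
typedef long long off_sketch_t;
#define sketch_strtoofft(str, end, base) strtoll((str), (end), (base))
#else
typedef long off_sketch_t;
#define sketch_strtoofft(str, end, base) strtol((str), (end), (base))
#endif
```

Callers use the macro exactly like `strtol()`, including the end pointer, and get a result that fits the offset type chosen at build time.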
614 
615 `curlx_tvnow()`
616 ---------------
617  returns a struct timeval for the current time.
618 
619 `curlx_tvdiff()`
620 --------------
621  returns the difference between two timeval structs, in number of
622  milliseconds.
623 
624 `curlx_tvdiff_secs()`
625 ---------------------
626  returns the same as `curlx_tvdiff` but with full usec resolution (as a
627  double)
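
The two difference calculations can be sketched as follows, assuming a POSIX `struct timeval` from `<sys/time.h>`. The function names are chosen for the illustration and these are not the curlx sources.

```c
#include <sys/time.h>

/* Difference newer - older in whole milliseconds. */
long tv_diff_ms(struct timeval newer, struct timeval older)
{
  return (long)(newer.tv_sec - older.tv_sec) * 1000 +
         (long)(newer.tv_usec - older.tv_usec) / 1000;
}

/* The same difference in seconds, keeping microsecond resolution. */
double tv_diff_secs(struct timeval newer, struct timeval older)
{
  return (double)(newer.tv_sec - older.tv_sec) +
         (double)(newer.tv_usec - older.tv_usec) / 1000000.0;
}
```

The microsecond term may be negative when the newer timestamp has the smaller `tv_usec`; adding it to the seconds term still gives the right total.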
628 
629 Future
630 ------
631 
632  Several functions will be removed from the public `curl_` name space in a
633  future libcurl release. They will then only become available as `curlx_`
634  functions instead. To make the transition easier, we already today provide
635  these functions with the `curlx_` prefix to allow sources to be built
636  properly with the new function names. The concerned functions are:
637 
638  - `curlx_getenv`
639  - `curlx_strequal`
640  - `curlx_strnequal`
641  - `curlx_mvsnprintf`
642  - `curlx_msnprintf`
643  - `curlx_maprintf`
644  - `curlx_mvaprintf`
645  - `curlx_msprintf`
646  - `curlx_mprintf`
647  - `curlx_mfprintf`
648  - `curlx_mvsprintf`
649  - `curlx_mvprintf`
650  - `curlx_mvfprintf`
651 
652 <a name="contentencoding"></a>
653 Content Encoding
654 ================
655 
656 ## About content encodings
657 
658  [HTTP/1.1][4] specifies that a client may request that a server encode its
659  response. This is usually used to compress a response using one of a set of
660  commonly available compression techniques. These schemes are 'deflate' (the
661  zlib algorithm), 'gzip' and 'compress'. A client requests that the server
662  perform an encoding by including an Accept-Encoding header in the request
663  document. The value of the header should be one of the recognized tokens
664  'deflate', ... (there's a way to register new schemes/tokens, see sec 3.5 of
665  the spec). A server MAY honor the client's encoding request. When a response
666  is encoded, the server includes a Content-Encoding header in the
667  response. The value of the Content-Encoding header indicates which scheme was
668  used to encode the data.
669 
670  A client may tell a server that it can understand several different encoding
671  schemes. In this case the server may choose any one of those and use it to
672  encode the response (indicating which one using the Content-Encoding header).
673  It's also possible for a client to attach priorities to different schemes so
674  that the server knows which it prefers. See sec 14.3 of RFC 2616 for more
675  information on the Accept-Encoding header.
676 
677 ## Supported content encodings
678 
679  The 'deflate' and 'gzip' content encodings are supported by libcurl. Both
680  regular and chunked transfers work fine. The zlib library is required for
681  this feature.
682 
683 ## The libcurl interface
684 
685  To cause libcurl to request a content encoding use:
686 
687  [`curl_easy_setopt`][1](curl, [`CURLOPT_ACCEPT_ENCODING`][5], string)
688 
689  where string is the intended value of the Accept-Encoding header.
690 
691  Currently, libcurl only understands how to process responses that use the
692  "deflate" or "gzip" Content-Encoding, so the only values for
693  [`CURLOPT_ACCEPT_ENCODING`][5] that will work (besides "identity," which does
694  nothing) are "deflate" and "gzip". If a response is encoded using the
695  "compress" or other unsupported methods, libcurl will return an error indicating that the
696  response could not be decoded. If <string> is NULL no Accept-Encoding header
697  is generated. If <string> is a zero-length string, then an Accept-Encoding
698  header containing all supported encodings will be generated.
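
The three cases for the option value (NULL, empty string, explicit list) can be sketched as a small function. This is an illustration of the rule only: `accept_encoding_header()` and `ALL_SUPPORTED` are hypothetical, and the list assumes a build with zlib.

```c
#include <stdio.h>

/* Hypothetical list for a build with zlib; not read from libcurl. */
#define ALL_SUPPORTED "deflate, gzip"

/* Write the Accept-Encoding header line for the given option value.
   Returns the header length, 0 when no header is generated, or -1 if
   buf is too small. */
int accept_encoding_header(char *buf, size_t size, const char *value)
{
  int n;

  if(!value)
    return 0;               /* NULL: no header at all */
  if(value[0] == '\0')
    value = ALL_SUPPORTED;  /* "": advertise every supported encoding */

  n = snprintf(buf, size, "Accept-Encoding: %s\r\n", value);
  if(n < 0 || (size_t)n >= size)
    return -1;
  return n;
}
```

An application that simply wants compressed responses whenever possible would pass the empty string and leave the choice of encodings to the library.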
699 
700  The [`CURLOPT_ACCEPT_ENCODING`][5] must be set to any non-NULL value for
701  content to be automatically decoded. If it is not set and the server still
702  sends encoded content (despite not having been asked), the data is returned
703  in its raw form and the Content-Encoding type is not checked.
704 
705 ## The curl interface
706 
707  Use the [--compressed][6] option with curl to cause it to ask servers to
708  compress responses using any format supported by curl.
709 
710 <a name="hostip"></a>
711 hostip.c explained
712 ==================
713 
714  The main compile-time defines to keep in mind when reading the host*.c source
715  files are these:
716 
717 ## `CURLRES_IPV6`
718 
719  This host has getaddrinfo() and family, and thus we use that. The host may
720  not be able to resolve IPv6, but we don't really have to take that into
721  account. Hosts that aren't IPv6-enabled have `CURLRES_IPV4` defined.
722 
723 ## `CURLRES_ARES`
724 
725  is defined if libcurl is built to use c-ares for asynchronous name
726  resolves. This can be Windows or *nix.
727 
728 ## `CURLRES_THREADED`
729 
730  is defined if libcurl is built to use threading for asynchronous name
731  resolves. The name resolve will be done in a new thread, and the supported
732  asynch API will be the same as for ares-builds. This is the default under
733  (native) Windows.
734 
735  If either of the two previous is defined, `CURLRES_ASYNCH` is defined too. If
736  libcurl is not built to use an asynchronous resolver, `CURLRES_SYNCH` is
737  defined.
738 
739 ## host*.c sources
740 
741  The host*.c source files are split up like this:
742 
743  - hostip.c - method-independent resolver functions and utility functions
744  - hostasyn.c - functions for asynchronous name resolves
745  - hostsyn.c - functions for synchronous name resolves
746  - asyn-ares.c - functions for asynchronous name resolves using c-ares
747  - asyn-thread.c - functions for asynchronous name resolves using threads
748  - hostip4.c - IPv4 specific functions
749  - hostip6.c - IPv6 specific functions
750 
751  The hostip.h is the single united header file for all this. It defines the
752  `CURLRES_*` defines based on the config*.h and `curl_setup.h` defines.
753 
754 <a name="memoryleak"></a>
755 Track Down Memory Leaks
756 =======================
757 
758 ## Single-threaded
759 
760  Please note that this memory leak system is not adjusted to work in more
761  than one thread. If you want/need to use it in a multi-threaded app, please
762  adjust accordingly.
763 
764 
765 ## Build
766 
767  Rebuild libcurl with -DCURLDEBUG (usually, rerunning configure with
768  --enable-debug fixes this). 'make clean' first, then 'make' so that all
769  files are actually rebuilt properly. It will also make sense to build
770  libcurl with the debug option (usually -g to the compiler) so that debugging
771  it will be easier if you actually do find a leak in the library.
772 
773  This will create a library that has memory debugging enabled.
774 
775 ## Modify Your Application
776 
777  Add a line in your application code:
778 
779  `curl_memdebug("dump");`
780 
781  This will make the malloc debug system output a full trace of all resource
782  using functions to the given file name. Make sure you rebuild your program
783  and that you link with the same libcurl you built for this purpose as
784  described above.
785 
786 ## Run Your Application
787 
788  Run your program as usual. Watch the specified memory trace file grow.
789 
790  Make your program exit, using the proper libcurl cleanup functions, so
791  that all non-leaks are returned/freed properly.
792 
793 ## Analyze the Flow
794 
795  Use the tests/memanalyze.pl perl script to analyze the dump file:
796 
797  tests/memanalyze.pl dump
798 
799  This now outputs a report on which resources were allocated but never
800  freed. This report is very fine for posting to the list!
801 
802  If this doesn't produce any output, no leak was detected in libcurl. Then
803  the leak is most likely to be in your code.
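
 The general technique (trace every allocation and free to a file, then replay the file to find what was never freed) can be sketched as follows. This is only an illustration of the idea: the trace format, `trace_malloc()`, `trace_free()`, `count_leaks()` and `demo_leak_count()` are all hypothetical and are not the format or logic used by the libcurl memory debug system or by memanalyze.pl.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_LIVE 64 /* arbitrary limit for this sketch */

/* Hypothetical wrapper: allocate and log "A <address> <size>". */
static void *trace_malloc(FILE *trace, size_t size)
{
  void *p = malloc(size);
  if(p)
    fprintf(trace, "A %" PRIxPTR " %zu\n", (uintptr_t)p, size);
  return p;
}

/* Hypothetical wrapper: log "F <address>" and free. */
static void trace_free(FILE *trace, void *p)
{
  if(p)
    fprintf(trace, "F %" PRIxPTR "\n", (uintptr_t)p);
  free(p);
}

/* Replay the trace from the start. Returns the number of allocations that
   were never freed, or -1 on a malformed trace or too many live blocks. */
static int count_leaks(FILE *trace)
{
  uintptr_t live[MAX_LIVE];
  int nlive = 0;
  char op;
  uintptr_t addr;

  rewind(trace);
  while(fscanf(trace, " %c %" SCNxPTR, &op, &addr) == 2) {
    if(op == 'A') {
      size_t size;
      if(fscanf(trace, "%zu", &size) != 1 || nlive == MAX_LIVE)
        return -1;
      live[nlive++] = addr;
    }
    else {
      int i;
      for(i = 0; i < nlive; i++) {
        if(live[i] == addr) {
          live[i] = live[--nlive]; /* drop the matching allocation */
          break;
        }
      }
    }
  }
  return nlive;
}

/* Allocate two blocks, free one, and analyze the trace. */
static int demo_leak_count(void)
{
  FILE *trace = tmpfile();
  void *a, *b;
  int leaks;
  if(!trace)
    return -1;
  a = trace_malloc(trace, 32);
  b = trace_malloc(trace, 64);
  trace_free(trace, a);
  leaks = count_leaks(trace); /* 'b' is still outstanding at this point */
  free(b);
  fclose(trace);
  return leaks;
}
```

 Like the real system, this sketch is single-threaded only.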
804 
805 <a name="multi_socket"></a>
806 `multi_socket`
807 ==============
808 
809  Implementation of the `curl_multi_socket` API
810 
811  The main ideas of this API are simply:
812 
813  1 - The application can use whatever event system it likes as it gets info
814  from libcurl about what file descriptors libcurl waits for what action
815  on. (The previous API returns `fd_sets` which is very select()-centric).
816 
817  2 - When the application discovers action on a single socket, it calls
818  libcurl and informs that there was action on this particular socket and
819  libcurl can then act on that socket/transfer only and not care about
820  any other transfers. (The previous API always had to scan through all
821  the existing transfers.)
822 
823  The idea is that [`curl_multi_socket_action()`][7] calls a given callback
824  with information about what socket to wait for what action on, and the
825  callback only gets called if the status of that socket has changed.
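
 The "only on change" behavior can be modelled in a few lines. This is a simplified sketch, not libcurl code: the `demo_*` names, the callback signature and the fixed-size table are assumptions made for this example, and the real socket callback takes more arguments.

```c
#include <stddef.h>

#define DEMO_MAX_FD 16

enum demo_action { DEMO_NONE = 0, DEMO_IN = 1, DEMO_OUT = 2, DEMO_INOUT = 3 };

typedef void (*demo_socket_cb)(int fd, enum demo_action what, void *userp);

struct demo_watcher {
  enum demo_action current[DEMO_MAX_FD]; /* last action reported per fd */
  demo_socket_cb cb;                     /* application callback */
  void *userp;                           /* passed through to the callback */
};

/* Record what the library wants to wait for on 'fd' and tell the
   application only when that differs from what it was told last time.
   Returns 1 if the callback was invoked, 0 if nothing changed and -1 if
   the descriptor is out of range for this sketch. */
static int demo_want(struct demo_watcher *w, int fd, enum demo_action what)
{
  if(fd < 0 || fd >= DEMO_MAX_FD)
    return -1;
  if(w->current[fd] == what)
    return 0;
  w->current[fd] = what;
  if(w->cb)
    w->cb(fd, what, w->userp);
  return 1;
}

/* Example application callback: count how often it gets called. */
static void demo_count_cb(int fd, enum demo_action what, void *userp)
{
  (void)fd;
  (void)what;
  ++*(int *)userp;
}
```

 An application would typically use such a callback to add, modify or remove the descriptor in its own event system.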
826 
827  We also added a timer callback that makes libcurl call the application when
828  the timeout value changes, and you set that with [`curl_multi_setopt()`][9]
829  and the [`CURLMOPT_TIMERFUNCTION`][10] option. To get this to work,
830  internally there's an added struct to each easy handle in which we store
831  an "expire time" (if any). The structs are then "splay sorted" so that we
832  can add and remove times from the tree and yet somewhat swiftly
833  figure out both how long there is until the next nearest timer expires
834  and which timer (handle) we should take care of now. Of course, the upside
835  of all this is that we get a [`curl_multi_timeout()`][8] that should also
836  work with old-style applications that use [`curl_multi_perform()`][11].
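
 The two questions the timer structure has to answer (which handle expires next, and how long until then) can be sketched like this. Note the swapped-in technique: a plain linear scan stands in for the splay tree to keep the example short, and all `demo_*` names are hypothetical.

```c
#include <stddef.h>

struct demo_easy {
  long expire_ms; /* absolute expire time in ms, or -1 if no timer is set */
};

/* Return the handle with the nearest expire time, or NULL when no handle
   has a timer set. A linear scan replaces the splay tree in this sketch. */
static struct demo_easy *demo_next_timer(struct demo_easy *h, size_t n)
{
  struct demo_easy *best = NULL;
  size_t i;
  for(i = 0; i < n; i++) {
    if(h[i].expire_ms < 0)
      continue;
    if(!best || h[i].expire_ms < best->expire_ms)
      best = &h[i];
  }
  return best;
}

/* Milliseconds until the next timer fires: -1 if there is no timer at all,
   0 if the nearest timer has already expired. */
static long demo_timeout(struct demo_easy *h, size_t n, long now_ms)
{
  struct demo_easy *next = demo_next_timer(h, n);
  if(!next)
    return -1;
  return next->expire_ms > now_ms ? next->expire_ms - now_ms : 0;
}
```

 A tree keeps both operations cheap when there are many handles; the scan above costs time proportional to the number of handles on every call.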
837 
838  We created an internal "socket to easy handles" hash table that given
839  a socket (file descriptor) returns the easy handle that waits for action on
840  that socket. This hash is made using the already existing hash code
841  (previously only used for the DNS cache).
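
 A minimal sketch of such a socket-to-handle lookup, assuming a small chained hash table keyed on the descriptor value. The `demo_*` types and functions are hypothetical and do not reflect libcurl's generic hash code.

```c
#include <stddef.h>

#define DEMO_BUCKETS 8

struct demo_easy {
  int id;
};

struct demo_sockentry {
  int fd;                      /* the socket descriptor (key) */
  struct demo_easy *easy;      /* the handle waiting on that socket */
  struct demo_sockentry *next; /* chain of entries in the same bucket */
};

struct demo_sockhash {
  struct demo_sockentry *bucket[DEMO_BUCKETS];
};

static size_t demo_slot(int fd)
{
  return (size_t)(unsigned int)fd % DEMO_BUCKETS;
}

/* Insert a caller-owned entry at the head of its bucket chain. */
static void demo_sh_add(struct demo_sockhash *sh, struct demo_sockentry *e)
{
  size_t s = demo_slot(e->fd);
  e->next = sh->bucket[s];
  sh->bucket[s] = e;
}

/* Return the handle that waits for action on 'fd', or NULL if none does. */
static struct demo_easy *demo_sh_lookup(const struct demo_sockhash *sh,
                                        int fd)
{
  const struct demo_sockentry *e;
  for(e = sh->bucket[demo_slot(fd)]; e; e = e->next) {
    if(e->fd == fd)
      return e->easy;
  }
  return NULL;
}
```

 When the application reports activity on a descriptor, one lookup like this finds the transfer to act on without scanning all handles.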
842 
843  To make libcurl able to report plain sockets in the socket callback, we had
844  to re-organize the internals of the [`curl_multi_fdset()`][12] etc so that
845  the conversion from sockets to `fd_sets` for that function is only done in
846  the last step before the data is returned. I also had to extend c-ares to
847  get a function that can return plain sockets, as that library too returned
848  only `fd_sets` and that is no longer good enough. The changes done to c-ares
849  are available in c-ares 1.3.1 and later.
850 
851 <a name="structs"></a>
852 Structs in libcurl
853 ==================
854 
855 This section should cover 7.32.0 pretty accurately, but will make sense even
856 for older and later versions as things don't change drastically that often.
857 
858 ## Curl_easy
859 
860  The `Curl_easy` struct is the one returned to the outside in the external API
861  as a "CURL *". This is usually known as an easy handle in API documentation
862  and examples.
863 
864  Information and state that is related to the actual connection is in the
865  'connectdata' struct. When a transfer is about to be made, libcurl will
866  either create a new connection or re-use an existing one. The particular
867  connectdata that is used by this handle is pointed out by
868  `Curl_easy->easy_conn`.
869 
870  Data and information that regard this particular single transfer is put in
871  the SingleRequest sub-struct.
872 
873  When the `Curl_easy` struct is added to a multi handle, as it must be in
874  order to do any transfer, the ->multi member will point to the `Curl_multi`
875  struct it belongs to. The ->prev and ->next members will then be used by the
876  multi code to keep a linked list of `Curl_easy` structs that are added to
877  that same multi handle. libcurl always uses multi so ->multi *will* point to
878  a `Curl_multi` when a transfer is in progress.
879 
880  ->mstate is the multi state of this particular `Curl_easy`. When
881  `multi_runsingle()` is called, it will act on this handle according to which
882  state it is in. The mstate is also what tells which sockets to return for a
883  specific `Curl_easy` when [`curl_multi_fdset()`][12] is called etc.
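
 A state-driven step function of this kind can be sketched as follows. The state names, the transitions and the socket directions chosen per state are invented for this example and are much simpler than the real state machine in the multi code.

```c
enum demo_mstate {
  DEMO_INIT, DEMO_CONNECT, DEMO_DO, DEMO_PERFORM, DEMO_DONE, DEMO_COMPLETED
};

#define DEMO_WAIT_READ  1
#define DEMO_WAIT_WRITE 2

struct demo_easy {
  enum demo_mstate mstate; /* the multi state of this handle */
};

/* Act on the handle according to its current state and advance it one
   step. Returns 1 while the handle needs further calls, 0 when it has
   reached the completed state. */
static int demo_runsingle(struct demo_easy *data)
{
  switch(data->mstate) {
  case DEMO_INIT:      data->mstate = DEMO_CONNECT;   break;
  case DEMO_CONNECT:   data->mstate = DEMO_DO;        break;
  case DEMO_DO:        data->mstate = DEMO_PERFORM;   break;
  case DEMO_PERFORM:   data->mstate = DEMO_DONE;      break;
  case DEMO_DONE:      data->mstate = DEMO_COMPLETED; break;
  case DEMO_COMPLETED: break;
  }
  return data->mstate != DEMO_COMPLETED;
}

/* The state also decides what to wait for on the handle's socket. */
static int demo_getsock(const struct demo_easy *data)
{
  switch(data->mstate) {
  case DEMO_CONNECT: return DEMO_WAIT_WRITE; /* connect has completed */
  case DEMO_DO:      return DEMO_WAIT_WRITE; /* request can be sent */
  case DEMO_PERFORM: return DEMO_WAIT_READ;  /* response data arrives */
  default:           return 0;               /* nothing to wait for */
  }
}
```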
884 
885  The libcurl source code generally uses the name 'data' for the variable that
886  points to the `Curl_easy`.
887 
888  When doing multiplexed HTTP/2 transfers, each `Curl_easy` is associated with
889  an individual stream, sharing the same connectdata struct. Multiplexing
890  makes it even more important to keep things associated with the right thing!
891 
892 ## connectdata
893 
894  A general idea in libcurl is to keep connections around in a connection
895  "cache" after they have been used, in case they will be used again, and then
896  re-use an existing one instead of creating a new one, as re-use gives a
897  significant performance boost.
898 
899  Each 'connectdata' identifies a single physical connection to a server. If
900  the connection can't be kept alive, the connection will be closed after use
901  and then this struct can be removed from the cache and freed.
902 
903  Thus, the same `Curl_easy` can be used multiple times and each time select
904  another connectdata struct to use for the connection. Keep this in mind, as
905  it is then important to consider if options or choices are based on the
906  connection or the `Curl_easy`.
907 
908  Functions in libcurl will assume that connectdata->data points to the
909  `Curl_easy` that uses this connection (for the moment).
910 
911  As a special complexity, some protocols supported by libcurl require a
912  special disconnect procedure that is more than just shutting down the
913  socket. It can involve sending one or more commands to the server before
914  doing so. Since connections are kept in the connection cache after use, the
915  original `Curl_easy` may no longer be around when the time comes to shut down
916  a particular connection. For this purpose, libcurl holds a special dummy
917  `closure_handle` `Curl_easy` in the `Curl_multi` struct to use when needed.
918 
919  FTP uses two TCP connections for a typical transfer but it keeps both in
920  this single struct and thus can be considered a single connection for most
921  internal concerns.
922 
923  The libcurl source code generally uses the name 'conn' for the variable that
924  points to the connectdata.
925 
926 ## Curl_multi
927 
928  Internally, the easy interface is implemented as a wrapper around multi
929  interface functions. This makes everything multi interface.
930 
931  `Curl_multi` is the multi handle struct exposed as "CURLM *" in external
932  APIs.
933 
934  This struct holds a list of `Curl_easy` structs that have been added to this
935  handle with [`curl_multi_add_handle()`][13]. The start of the list is
936  `->easyp` and `->num_easy` is a counter of added `Curl_easy`s.
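
 The list bookkeeping described above can be modelled like this. The member names `easyp`, `num_easy`, `prev`, `next` and `multi` follow the text; everything else (the `demo_*` names, inserting at the head of the list, the return codes) is an assumption made for this sketch.

```c
#include <stddef.h>

struct demo_multi;

struct demo_easy {
  struct demo_easy *prev;
  struct demo_easy *next;
  struct demo_multi *multi; /* the multi handle this easy handle is in */
};

struct demo_multi {
  struct demo_easy *easyp; /* start of the list of added handles */
  int num_easy;            /* number of added handles */
};

/* Link 'd' into the list of 'm'. Returns -1 if it is already added. */
static int demo_add_handle(struct demo_multi *m, struct demo_easy *d)
{
  if(d->multi)
    return -1;
  d->prev = NULL;
  d->next = m->easyp;
  if(m->easyp)
    m->easyp->prev = d;
  m->easyp = d;
  d->multi = m;
  m->num_easy++;
  return 0;
}

/* Unlink 'd' from 'm'. Returns -1 if it does not belong to 'm'. */
static int demo_remove_handle(struct demo_multi *m, struct demo_easy *d)
{
  if(d->multi != m)
    return -1;
  if(d->prev)
    d->prev->next = d->next;
  else
    m->easyp = d->next;
  if(d->next)
    d->next->prev = d->prev;
  d->prev = d->next = NULL;
  d->multi = NULL;
  m->num_easy--;
  return 0;
}
```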
937 
938  `->msglist` is a linked list of messages to send back when
939  [`curl_multi_info_read()`][14] is called. Basically a node is added to that
940  list when an individual `Curl_easy`'s transfer has completed.
941 
942  `->hostcache` points to the name cache. It is a hash table for looking up
943  name to IP. The nodes have a limited life time in there and this cache is
944  meant to reduce the time for when the same name is wanted within a short
945  period of time.
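
 The limited lifetime of cache nodes can be illustrated with a small sketch. It uses a fixed array instead of a hash table to stay short, and the `demo_*` names, the sizes and the `max_age` field are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_ENTRIES 4

struct demo_dns_entry {
  char name[64];  /* host name; an empty string marks a free slot */
  char addr[46];  /* textual IP address */
  long stored_at; /* time the entry was stored, in seconds */
};

struct demo_hostcache {
  struct demo_dns_entry e[DEMO_ENTRIES];
  long max_age; /* entries older than this many seconds are ignored */
};

/* Return the cached address for 'name', or NULL if it is unknown or the
   entry has outlived its lifetime. */
static const char *demo_cache_get(const struct demo_hostcache *c,
                                  const char *name, long now)
{
  size_t i;
  for(i = 0; i < DEMO_ENTRIES; i++) {
    const struct demo_dns_entry *d = &c->e[i];
    if(d->name[0] && !strcmp(d->name, name) &&
       now - d->stored_at < c->max_age)
      return d->addr;
  }
  return NULL;
}

/* Store a name/address pair in the first free, same-name or expired slot.
   Returns 0 on success, -1 if the strings are too long or no slot is
   available. */
static int demo_cache_put(struct demo_hostcache *c, const char *name,
                          const char *addr, long now)
{
  size_t i;
  if(strlen(name) >= sizeof(c->e[0].name) ||
     strlen(addr) >= sizeof(c->e[0].addr))
    return -1;
  for(i = 0; i < DEMO_ENTRIES; i++) {
    struct demo_dns_entry *d = &c->e[i];
    if(!d->name[0] || !strcmp(d->name, name) ||
       now - d->stored_at >= c->max_age) {
      strcpy(d->name, name);
      strcpy(d->addr, addr);
      d->stored_at = now;
      return 0;
    }
  }
  return -1;
}
```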
946 
947  `->timetree` points to a tree of `Curl_easy`s, sorted by the remaining time
948  until it should be checked - normally some sort of timeout. Each `Curl_easy`
949  has one node in the tree.
950 
951  `->sockhash` is a hash table that allows fast lookup, by socket descriptor,
952  of which `Curl_easy` uses that descriptor. This is necessary for the
953  `multi_socket` API.
954 
955  `->conn_cache` points to the connection cache. It keeps track of all
956  connections that are kept after use. The cache has a maximum size.
957 
958  `->closure_handle` is described in the 'connectdata' section.
959 
960  The libcurl source code generally uses the name 'multi' for the variable that
961  points to the `Curl_multi` struct.
962 
963 ## Curl_handler
964 
965  Each unique protocol that is supported by libcurl needs to provide at least
966  one `Curl_handler` struct. It defines what the protocol is called and what
967  functions the main code should call to deal with protocol specific issues.
968  In general, there's a source file named [protocol].c in which there's a
969  "struct `Curl_handler` `Curl_handler_[protocol]`" declared. In url.c there's
970  then the main array with all individual `Curl_handler` structs pointed to
971  from a single array which is scanned through when a URL is given to libcurl
972  to work with.
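
 The "one struct per protocol, one array scanned by scheme" arrangement can be sketched as below. The struct is cut down to three members, the `demo_*` names and callback signature are hypothetical, and the real `Curl_handler` has many more fields.

```c
#include <ctype.h>
#include <stddef.h>

struct demo_handler {
  const char *scheme;  /* URL scheme name */
  int (*do_it)(void);  /* protocol specific "DO" function */
  long defport;        /* default port */
};

static int demo_http_do(void) { return 1; }
static int demo_ftp_do(void) { return 2; }

/* One handler struct per protocol, normally one per source file. */
static const struct demo_handler demo_handler_http = {
  "HTTP", demo_http_do, 80
};
static const struct demo_handler demo_handler_ftp = {
  "FTP", demo_ftp_do, 21
};

/* The single array that points to all individual handlers. */
static const struct demo_handler * const demo_protocols[] = {
  &demo_handler_http,
  &demo_handler_ftp,
  NULL
};

/* Case-insensitive comparison of two scheme names. */
static int demo_equal_nocase(const char *a, const char *b)
{
  while(*a && *b) {
    if(toupper((unsigned char)*a) != toupper((unsigned char)*b))
      return 0;
    a++;
    b++;
  }
  return *a == *b;
}

/* Scan the array for the handler that matches the scheme of a URL. */
static const struct demo_handler *demo_find_handler(const char *scheme)
{
  size_t i;
  for(i = 0; demo_protocols[i]; i++) {
    if(demo_equal_nocase(demo_protocols[i]->scheme, scheme))
      return demo_protocols[i];
  }
  return NULL;
}
```

 The generic code then only calls through the function pointers and never needs to know which protocol it is driving.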
973 
974  `->scheme` is the URL scheme name, usually spelled out in uppercase. That's
975  "HTTP" or "FTP" etc. SSL versions of the protocol need their own `Curl_handler` setup, so HTTPS is separate from HTTP.
976 
977  `->setup_connection` is called to allow the protocol code to allocate
978  protocol specific data that then gets associated with that `Curl_easy` for
979  the rest of this transfer. It gets freed again at the end of the transfer.
980  It will be called before the 'connectdata' for the transfer has been
981  selected/created. Most protocols will allocate their private
982  'struct [PROTOCOL]' here and assign `Curl_easy->req.protop` to point to it.
983 
984  `->connect_it` allows a protocol to do some specific actions after the TCP
985  connect is done, that can still be considered part of the connection phase.
986 
987  Some protocols will alter the `connectdata->recv[]` and
988  `connectdata->send[]` function pointers in this function.
989 
990  `->connecting` is similarly a function that keeps getting called as long as
991  the protocol considers itself still in the connecting phase.
992 
993  `->do_it` is the function called to issue the transfer request. What we call
994  the DO action internally. If the DO is not enough and things need to be kept
995  getting done for the entire DO sequence to complete, `->doing` is then
996  usually also provided. Each protocol that needs to do multiple commands or
997  similar for do/doing needs to implement its own state machine (see SCP,
998  SFTP, FTP). Some protocols (only FTP and only due to historical reasons) have
999  a separate piece of the DO state called `DO_MORE`.
1000 
1001  `->doing` keeps getting called while issuing the transfer request command(s).
1002 
1003  `->done` gets called when the transfer is complete and DONE. That's after the
1004  main data has been transferred.
1005 
1006  `->do_more` gets called during the `DO_MORE` state. The FTP protocol uses
1007  this state when setting up the second connection.
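
 The relationship between `->do_it` and `->doing` can be sketched as a tiny driver. The callback signatures, the `commands_left` counter and the blocking loop are simplifications assumed for this example; in libcurl the repeated calls are spread out over the multi state machine rather than made in a tight loop.

```c
struct demo_conn {
  int commands_left; /* commands still to issue before the DO phase ends */
};

/* Models ->do_it: issue the first command. Sets *done to 1 when the DO
   phase is already complete. Returns 0 on success. */
static int demo_do_it(struct demo_conn *c, int *done)
{
  c->commands_left--;
  *done = (c->commands_left <= 0);
  return 0;
}

/* Models ->doing: issue the next command of the DO sequence. */
static int demo_doing(struct demo_conn *c, int *done)
{
  c->commands_left--;
  *done = (c->commands_left <= 0);
  return 0;
}

/* Drive the DO phase: call do_it once and then doing until the protocol
   says it is done. Returns the number of calls made, or -1 on error. */
static int demo_run_do(struct demo_conn *c)
{
  int done = 0;
  int calls = 1;
  if(demo_do_it(c, &done))
    return -1;
  while(!done) {
    if(demo_doing(c, &done))
      return -1;
    calls++;
  }
  return calls;
}
```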
1008 
1009  `->proto_getsock`,
1010  `->doing_getsock`,
1011  `->domore_getsock` and
1012  `->perform_getsock`
1013  are functions that return socket information: which socket(s) to wait for
1014  which action(s) during the particular multi state.
1015 
1016  `->disconnect` is called immediately before the TCP connection is shut down.
1017 
1018  `->readwrite` gets called during transfer to allow the protocol to do extra
1019  reads/writes.
1020 
1021  `->defport` is the default TCP or UDP port this protocol uses.
1022 
1023  ->protocol is one or more bits in the `CURLPROTO_*` set. The SSL versions
1024  have their "base" protocol set and then the SSL variation. Like
1025  "HTTP|HTTPS".
1026 
1027  ->flags is a bitmask with additional information about the protocol that will
1028  make it get treated differently by the generic engine:
1029 
1030  - `PROTOPT_SSL` - will make it connect and negotiate SSL
1031 
1032  - `PROTOPT_DUAL` - this protocol uses two connections
1033 
1034  - `PROTOPT_CLOSEACTION` - this protocol has actions to do before closing the
1035  connection. This flag is no longer used by code, yet still set for a bunch
1036  of protocol handlers.
1037 
1038  - `PROTOPT_DIRLOCK` - "direction lock". The SSH protocols set this bit to
1039  limit which "direction" of socket actions that the main engine will
1040  concern itself with.
1041 
1042  - `PROTOPT_NONETWORK` - a protocol that doesn't use the network (read: `file:`)
1043 
1044  - `PROTOPT_NEEDSPWD` - this protocol needs a password and will use a default
1045  one unless one is provided
1046 
1047  - `PROTOPT_NOURLQUERY` - this protocol can't handle a query part on the URL
1048  (?foo=bar)
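
 How the generic engine might test these bits can be sketched as follows. The `DEMO_*` bit values are made up for this example and are not the values of the real `PROTOPT_*` or `CURLPROTO_*` defines; the helper functions are hypothetical as well.

```c
/* Hypothetical stand-ins for a few PROTOPT_* flag bits. */
#define DEMO_PROTOPT_SSL       (1u << 0)
#define DEMO_PROTOPT_DUAL      (1u << 1)
#define DEMO_PROTOPT_NONETWORK (1u << 2)

/* Hypothetical stand-ins for a few CURLPROTO_* protocol bits. */
#define DEMO_PROTO_HTTP  (1u << 0)
#define DEMO_PROTO_HTTPS (1u << 1)
#define DEMO_PROTO_FILE  (1u << 2)

struct demo_handler {
  const char *scheme;
  unsigned int protocol; /* one or more DEMO_PROTO_* bits */
  unsigned int flags;    /* bitmask of DEMO_PROTOPT_* bits */
};

/* The SSL version carries its "base" protocol bit plus its own. */
static const struct demo_handler demo_https = {
  "HTTPS", DEMO_PROTO_HTTP | DEMO_PROTO_HTTPS, DEMO_PROTOPT_SSL
};
static const struct demo_handler demo_file = {
  "FILE", DEMO_PROTO_FILE, DEMO_PROTOPT_NONETWORK
};

/* Does the engine have to negotiate SSL for this protocol? */
static int demo_needs_ssl(const struct demo_handler *h)
{
  return (h->flags & DEMO_PROTOPT_SSL) != 0;
}

/* Does this protocol use the network at all? */
static int demo_uses_network(const struct demo_handler *h)
{
  return (h->flags & DEMO_PROTOPT_NONETWORK) == 0;
}

/* Is the given protocol bit part of this handler's protocol set? */
static int demo_is_proto(const struct demo_handler *h, unsigned int bit)
{
  return (h->protocol & bit) != 0;
}
```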
1049 
1050 ## conncache
1051 
1052  This is a hash table with connections for later re-use. Each `Curl_easy` has a
1053  pointer to its connection cache. Each multi handle sets up a connection
1054  cache that all added `Curl_easy`s share by default.
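
 Looking up a re-usable connection can be sketched with a small chained hash table. The "host:port" key, the `in_use` flag, the hash function and all `demo_*` names are assumptions made for this example; the real cache uses more criteria when matching connections.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DEMO_BUCKETS 16

struct demo_conn {
  char key[80];           /* "host:port" */
  int in_use;             /* non-zero while a transfer uses it */
  struct demo_conn *next; /* chain of connections in the same bucket */
};

struct demo_conncache {
  struct demo_conn *bucket[DEMO_BUCKETS];
};

static size_t demo_hash(const char *s)
{
  size_t h = 5381;
  while(*s)
    h = h * 33 + (unsigned char)*s++;
  return h % DEMO_BUCKETS;
}

/* Build the cache key. Returns 0 on success, -1 if it does not fit. */
static int demo_key(char *buf, size_t len, const char *host, int port)
{
  int n = snprintf(buf, len, "%s:%d", host, port);
  return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Keep a connection around after use. The caller owns the memory. */
static void demo_cache_add(struct demo_conncache *cc, struct demo_conn *c)
{
  size_t s = demo_hash(c->key);
  c->next = cc->bucket[s];
  cc->bucket[s] = c;
}

/* Find an idle connection to host:port and mark it as in use. Returns
   NULL when the caller has to create a new connection instead. */
static struct demo_conn *demo_cache_find(struct demo_conncache *cc,
                                         const char *host, int port)
{
  char key[80];
  struct demo_conn *c;
  if(demo_key(key, sizeof(key), host, port))
    return NULL;
  for(c = cc->bucket[demo_hash(key)]; c; c = c->next) {
    if(!c->in_use && !strcmp(c->key, key)) {
      c->in_use = 1;
      return c;
    }
  }
  return NULL;
}
```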
1055 
1056 ## Curl_share
1057 
1058  The libcurl share API allocates a `Curl_share` struct, exposed to the
1059  external API as "CURLSH *".
1060 
1061  The idea is that the struct can have a set of its own versions of caches and
1062  pools and then by providing this struct in the `CURLOPT_SHARE` option, those
1063  specific `Curl_easy`s will use the caches/pools that this share handle
1064  holds.
1065 
1066  Then individual `Curl_easy` structs can be made to share specific things
1067  that they otherwise wouldn't, such as cookies.
1068 
1069  The `Curl_share` struct can currently hold cookies, DNS cache and the SSL
1070  session cache.
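
 The effect of a share handle on where data is stored can be modelled in a few lines, here for cookies only. The `demo_*` structs and functions are hypothetical and reduce a cookie store to a simple counter.

```c
#include <stddef.h>

struct demo_cookies {
  int count; /* number of cookies held */
};

struct demo_share {
  struct demo_cookies *cookies; /* NULL when cookies are not shared */
};

struct demo_easy {
  struct demo_cookies own;  /* this handle's private cookie store */
  struct demo_share *share; /* NULL when no share handle is set */
};

/* Pick the cookie store this handle should use: the shared one if a
   share handle with cookie sharing is set, otherwise its private one. */
static struct demo_cookies *demo_cookie_store(struct demo_easy *d)
{
  if(d->share && d->share->cookies)
    return d->share->cookies;
  return &d->own;
}

/* Add one cookie through the handle. */
static void demo_add_cookie(struct demo_easy *d)
{
  demo_cookie_store(d)->count++;
}
```

 Two handles that point to the same share see each other's cookies, while a handle without a share keeps its own.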
1071 
1072 ## CookieInfo
1073 
1074  This is the main cookie struct. It holds all known cookies and related
1075  information. Each `Curl_easy` has its own private CookieInfo even when
1076  it is added to a multi handle. Handles can be made to share cookies by using
1077  the share API.
1078 
1079 
1080 [1]: https://curl.haxx.se/libcurl/c/curl_easy_setopt.html
1081 [2]: https://curl.haxx.se/libcurl/c/curl_easy_init.html
1082 [3]: https://c-ares.haxx.se/
1083 [4]: https://tools.ietf.org/html/rfc7230 "RFC 7230"
1084 [5]: https://curl.haxx.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html
1085 [6]: https://curl.haxx.se/docs/manpage.html#--compressed
1086 [7]: https://curl.haxx.se/libcurl/c/curl_multi_socket_action.html
1087 [8]: https://curl.haxx.se/libcurl/c/curl_multi_timeout.html
1088 [9]: https://curl.haxx.se/libcurl/c/curl_multi_setopt.html
1089 [10]: https://curl.haxx.se/libcurl/c/CURLMOPT_TIMERFUNCTION.html
1090 [11]: https://curl.haxx.se/libcurl/c/curl_multi_perform.html
1091 [12]: https://curl.haxx.se/libcurl/c/curl_multi_fdset.html
1092 [13]: https://curl.haxx.se/libcurl/c/curl_multi_add_handle.html
1093 [14]: https://curl.haxx.se/libcurl/c/curl_multi_info_read.html


rc_tagdetect_client
Author(s): Monika Florek-Jasinska , Raphael Schaller
autogenerated on Sat Feb 13 2021 03:42:15