From 330303e07829dba90d84d1b384cce7490b83152a Mon Sep 17 00:00:00 2001 From: Vadim Zeitlin Date: Sat, 15 Apr 2023 20:34:24 +0200 Subject: [PATCH] Update wxString overview and documentation Avoid overlap between the two pages. Remove obsolete information. Document wxNO_IMPLICIT_WXSTRING_ENCODING. Don't mention wxUSE_STL any longer. --- docs/doxygen/overviews/string.h | 404 +++++-------------------------- docs/doxygen/overviews/unicode.h | 2 +- interface/wx/string.h | 118 ++++++--- 3 files changed, 149 insertions(+), 375 deletions(-) diff --git a/docs/doxygen/overviews/string.h b/docs/doxygen/overviews/string.h index 1d92ec1929..27e0670c30 100644 --- a/docs/doxygen/overviews/string.h +++ b/docs/doxygen/overviews/string.h @@ -11,30 +11,71 @@ @tableofcontents -wxString is a class which represents a Unicode string of arbitrary length and -containing arbitrary Unicode characters. +wxString is used for all strings in wxWidgets. This class is very similar to +the standard string class, and is implemented using it, but provides additional +compatibility functions to allow applications originally written for the much +older versions of wxWidgets to continue to work with the latest ones. -This class has all the standard operations you can expect to find in a string -class: dynamic memory management (string extends to accommodate new -characters), construction from other strings, compatibility with C strings and -wide character C strings, assignment operators, access to individual characters, string -concatenation and comparison, substring extraction, case conversion, trimming and -padding (with spaces), searching and replacing and both C-like @c printf (wxString::Printf) -and stream-like insertion functions as well as much more - see wxString for a -list of all functions. - -The wxString class has been completely rewritten for wxWidgets 3.0 but much work -has been done to make existing code using ANSI string literals work as it did -in previous versions. 
+When writing new code, you're encouraged to use wxString as if it were +`std::wstring` and use only functions compatible with the standard class. -@section overview_string_internal Internal wxString Encoding +@section overview_string_settings wxString Related Compilation Settings -Since wxWidgets 3.0 wxString may use any of @c UTF-16 (under Windows, using +The main build options affecting wxString are `wxUSE_UNICODE_WCHAR` and +`wxUSE_UNICODE_UTF8`, exactly one of which must be set to determine whether +fixed-width `wchar_t` or variable-width `char`-based strings are used +internally. Please see @ref overview_unicode_support_utf for more information +about this choice. + +The other options all affect the presence, or absence, of various implicit +conversions provided by this class. By default, wxString can be implicitly +created from `char*`, `wchar_t*`, `std::string` and `std::wstring` and can be +implicitly converted to `char*` or `wchar_t*`. This behaviour is convenient +and compatible with the previous wxWidgets versions, but is dangerous and may +result in unwanted conversions; please see @ref string_conv for how to disable +them. + + +@section overview_string_iterating Iterating over wxString + +It is possible to iterate over strings using indices, but the recommended way +to do it is to use @b iterators, either explicitly: + +@code +wxString s = "hello"; +wxString::const_iterator i; +for (i = s.begin(); i != s.end(); ++i) +{ + wxUniChar uni_ch = *i; + // do something with it +} +@endcode + +or, even simpler, implicitly, using a range-based for loop: +@code +wxString s = "hello"; +for ( auto c : s ) +{ + // do something with "c" +} +@endcode + +@note wxString iterators have unusual proxy-like semantics and can be used to + modify the string even when @e not using references, i.e. with just @c + auto, as in the example above. 
+ + +@section overview_string_internal wxString Internal Representation + +@note This section can be skipped at first reading and is provided solely for +informational purposes. + +As mentioned above, wxString may use any of @c UTF-16 (under Windows, using the native 16 bit @c wchar_t), @c UTF-32 (under Unix, using the native 32 bit @c wchar_t) or @c UTF-8 (under both Windows and Unix) to store its content. By default, @c wchar_t is used under all platforms, but wxWidgets can -be compiled with wxUSE_UNICODE_UTF8=1 to use UTF-8. +be compiled with wxUSE_UNICODE_UTF8=1 to use UTF-8 instead. For simplicity of implementation, wxString uses per code unit indexing instead of per code point indexing when using UTF-16, i.e. in the @@ -44,9 +85,10 @@ to be composed by 1 code unit, while this is really true only for characters in the @e BMP (Basic Multilingual Plane), as explained in more details in the @ref overview_unicode_encodings section. Thus when iterating over a UTF-16 string stored in a wxString under Windows, the user code has to take care of -surrogate pairs himself. (Note however that Windows itself has -built-in support for surrogate pairs in UTF-16, such as for drawing strings on -screen.) +surrogate pairs manually if it needs to handle them (note however that +Windows itself has built-in support for surrogate pairs in UTF-16, such as for +drawing strings on screen, so nothing special needs to be done when just +passing strings containing surrogates to wxWidgets functions). @remarks Note that while the behaviour of wxString when wxUSE_UNICODE_WCHAR==1 @@ -111,326 +153,4 @@ more processing for common operations like e.g. length calculation. Finally, note that the type used by wxString to store Unicode code units (@c wchar_t or @c char) is always @c typedef-ined to be ::wxStringCharType. 
- -@section overview_string_binary Using wxString to store binary data - -wxString can be used to store binary data (even if it contains @c NULs) using the -functions wxString::To8BitData and wxString::From8BitData. - -Beware that even if @c NUL character is allowed, in the current string implementation -some methods might not work correctly with them. - -Note however that other classes like wxMemoryBuffer are more suited to this task. -For handling binary data you may also want to look at the wxStreamBuffer, -wxMemoryOutputStream, wxMemoryInputStream classes. - - -@section overview_string_comparison Comparison to Other String Classes - -The advantages of using a special string class instead of working directly with -C strings are so obvious that there is a huge number of such classes available. -The most important advantage is the need to always remember to allocate/free -memory for C strings; working with fixed size buffers almost inevitably leads -to buffer overflows. At last, C++ has a standard string class (@c std::string). So -why the need for wxString? There are several advantages: - -@li Efficiency: Since wxWidgets 3.0 wxString uses @c std::string (in UTF8 - mode under Linux, Unix and macOS) or @c std::wstring (in UTF16 mode under Windows) - internally by default to store its contents. wxString will therefore inherit the - performance characteristics from @c std::string. -@li Compatibility: This class tries to combine almost full compatibility - with the old wxWidgets 1.xx wxString class, some reminiscence of MFC's - CString class and 90% of the functionality of @c std::string class. -@li Rich set of functions: Some of the functions present in wxString are - very useful but don't exist in most of other string classes: for example, - wxString::AfterFirst, wxString::BeforeLast, wxString::Printf. - Of course, all the standard string operations are supported as well. 
-@li wxString is Unicode friendly: it allows to easily convert to - and from ANSI and Unicode strings (see @ref overview_unicode - for more details) and maps to @c std::wstring transparently. -@li Used by wxWidgets: And, of course, this class is used everywhere - inside wxWidgets so there is no performance loss which would result from - conversions of objects of any other string class (including @c std::string) to - wxString internally by wxWidgets. - -However, there are several problems as well. The most important one is probably -that there are often several functions to do exactly the same thing: for -example, to get the length of the string either one of wxString::length(), -wxString::Len() or wxString::Length() may be used. The first function, as -almost all the other functions in lowercase, is @c std::string compatible. The -second one is the "native" wxString version and the last one is the wxWidgets -1.xx way. - -So which is better to use? The usage of the @c std::string compatible functions is -strongly advised! It will both make your code more familiar to other C++ -programmers (who are supposed to have knowledge of @c std::string but not of -wxString), let you reuse the same code in both wxWidgets and other programs (by -just typedefing wxString as @c std::string when used outside wxWidgets) and by -staying compatible with future versions of wxWidgets which will probably start -using @c std::string sooner or later too. - -In the situations where there is no corresponding @c std::string function, please -try to use the new wxString methods and not the old wxWidgets 1.xx variants -which are deprecated and may disappear in future versions. - - -@section overview_string_advice Advice About Using wxString - -@subsection overview_string_implicitconv Implicit conversions - -The default behaviour, which can't be changed to avoid breaking compatibility -with the existing code, is to provide implicit conversions of wxString to -C-style strings, i.e. 
const char* and/or const wchar_t*. As -explained below, these conversions are dangerous and it is @e strongly -recommended to predefine @c wxNO_UNSAFE_WXSTRING_CONV for all new projects -using wxWidgets to disable them. Notice that this preprocessor symbol is -different from the more usual @c wxUSE_XXX build options, as it only needs to -be defined when building the application and doesn't require rebuilding the -library (and so can be used with e.g. system-provided libraries from Linux -system packages). - -If you can't disable the implicit conversions, it is still advised to use -wxString::c_str() instead of relying on them to clearly indicate when the -conversion is done as implicit conversions may result in difficult to find -bugs. For example, some of the dangers of this implicit conversion may be seen -in the following code fragment: - -@code -// this function converts the input string to uppercase, -// output it to the screen and returns the result -const char *SayHELLO(const wxString& input) -{ - wxString output = input.Upper(); - printf("Hello, %s!\n", output); - return output; -} -@endcode - -There are two nasty bugs in these three lines. The first is in the call to the -@c printf() function. Although the implicit conversion to C strings is applied -automatically by the compiler in the case of - -@code -puts(output); -@endcode - -because the argument of @c puts() is known to be of the type -const char*, this is @b not done for @c printf() which is a function -with variable number of arguments (and whose arguments are of unknown types). -So this call may do any number of things (including displaying the correct -string on screen), although the most likely result is a program crash. -The solution is to use wxString::c_str(). Just replace this line with this: - -@code -printf("Hello, %s!\n", output.c_str()); -@endcode - -The second bug is that returning @c output doesn't work. 
The implicit cast is -used again, so the code compiles, but as it returns a pointer to a buffer -belonging to a local variable which is deleted as soon as the function exits, -its contents are completely arbitrary. The solution to this problem is also -easy, just make the function return wxString instead of a C string. - -This leads us to the following general advice: all functions taking string -arguments should take const wxString& (this makes assignment to the -strings inside the function faster) and all functions returning strings -should return wxString - this makes it safe to return local variables. - -Note that wxString uses by default the current locale encoding to convert any C string -literal to Unicode. The same is done for converting to and from @c std::string -and for the return value of c_str(). -For this conversion, the @a wxConvLibc class instance is used. -See wxCSConv and wxMBConv. - -It is also possible to disable any automatic conversions from C -strings to Unicode. This can be useful when the @a wxConvLibc encoding -is not appropriate for the current software and platform. The macro @c -wxNO_IMPLICIT_WXSTRING_ENCODING disables all implicit conversions, and -forces the code to explicitly indicate the encoding of all C strings. - -Finally note that encodings, either implicitly or explicitly selected, -may not be able to represent all the string's characters. The result -in this case is undefined: the string may be empty, or the -unrepresentable characters may be missing or wrong. 
- -@code -wxString s; -// s = "world"; does not compile with wxNO_IMPLICIT_WXSTRING_ENCODING -s = wxString::FromAscii("world"); // Always compiles -s = wxASCII_STR("world"); // shorthand for the above -s = wxString::FromUTF8("world"); // Always compiles -s = wxString("world", wxConvLibc); // Always compiles, explicit encoding -s = wxASCII_STR("Grüße"); // Always compiles but encoding fails - -const char *c; -// c = s.c_str(); does not compile with wxNO_IMPLICIT_WXSTRING_ENCODING -// c = s.mb_str(); does not compile with wxNO_IMPLICIT_WXSTRING_ENCODING -c = s.ToAscii(); // Always compiles, encoding may fail -c = s.ToUTF8(); // Always compiles, encoding never fails -c = s.utf8_str(); // Alias for the above -c = s.mb_str(wxConvLibc); // Always compiles, explicit encoding -@endcode - -@subsection overview_string_iterating Iterating wxString Characters - -As previously described, when wxUSE_UNICODE_UTF8==1, wxString internally -uses the variable-length UTF8 encoding. -Accessing a UTF-8 string by index can be very @b inefficient because -a single character is represented by a variable number of bytes so that -the entire string has to be parsed in order to find the character. -Since iterating over a string by index is a common programming technique and -was also possible and encouraged by wxString using the access operator[]() -wxString implements caching of the last used index so that iterating over -a string is a linear operation even in UTF-8 mode. 
- -It is nonetheless recommended to use @b iterators (instead of index based -access) like this: - -@code -wxString s = "hello"; -wxString::const_iterator i; -for (i = s.begin(); i != s.end(); ++i) -{ - wxUniChar uni_ch = *i; - // do something with it -} -@endcode - -or, even simpler, range for loop: -@code -wxString s = "hello"; -for ( auto c : s ) -{ - // do something with "c" -} -@endcode - -@note wxString iterators have unusual proxy-like semantics and can be used to - modify the string even when @e not using references, i.e. with just @c - auto, as in the example above. - - -@section overview_string_related String Related Functions and Classes - -As most programs use character strings, the standard C library provides quite -a few functions to work with them. Unfortunately, some of them have rather -counter-intuitive behaviour (like @c strncpy() which doesn't always terminate -the resulting string with a @NUL) and are in general not very safe (passing -@NULL to them will probably lead to program crash). Moreover, some very useful -functions are not standard at all. This is why in addition to all wxString -functions, there are also a few global string functions which try to correct -these problems: wxIsEmpty() verifies whether the string is empty (returning -@true for @NULL), wxStrlen() also handles @NULL correctly and returns -0 for them and wxStricmp() is just a platform-independent version of -case-insensitive string comparison function known either as @c stricmp() or -@c strcasecmp() on different platforms. - -The @ header also defines wxSnprintf() and wxVsnprintf() -functions which should be used instead of the inherently dangerous standard -@c sprintf() and which use @c snprintf() instead which does buffer size checks -whenever possible. Of course, you may also use wxString::Printf which is also -safe. - -There is another class which might be useful when working with wxString: -wxStringTokenizer. 
It is helpful when a string must be broken into tokens and -replaces the standard C library @c strtok() function. - -And the very last string-related class is wxArrayString: it is just a version -of the "template" dynamic array class which is specialized to work with -strings. Please note that this class is specially optimized (using its -knowledge of the internal structure of wxString) for storing strings and so it -is vastly better from a performance point of view than a wxObjectArray of -wxStrings. - - -@section overview_string_tuning Tuning wxString for Your Application - -@note This section is strictly about performance issues and is absolutely not -necessary to read for using wxString class. Please skip it unless you feel -familiar with profilers and relative tools. - -For the performance reasons wxString doesn't allocate exactly the amount of -memory needed for each string. Instead, it adds a small amount of space to each -allocated block which allows it to not reallocate memory (a relatively -expensive operation) too often as when, for example, a string is constructed by -subsequently adding one character at a time to it, as for example in: - -@code -// delete all vowels from the string -wxString DeleteAllVowels(const wxString& original) -{ - wxString vowels( "aeuioAEIOU" ); - wxString result; - wxString::const_iterator i; - for ( i = original.begin(); i != original.end(); ++i ) - { - if (vowels.Find( *i ) == wxNOT_FOUND) - result += *i; - } - - return result; -} -@endcode - -This is quite a common situation and not allocating extra memory at all would -lead to very bad performance in this case because there would be as many memory -(re)allocations as there are consonants in the original string. Allocating too -much extra memory would help to improve the speed in this situation, but due to -a great number of wxString objects typically used in a program would also -increase the memory consumption too much. 
- -The very best solution in precisely this case would be to use wxString::Alloc() -function to preallocate, for example, len bytes from the beginning - this will -lead to exactly one memory allocation being performed (because the result is at -most as long as the original string). - -However, using wxString::Alloc() is tedious and so wxString tries to do its -best. The default algorithm assumes that memory allocation is done in -granularity of at least 16 bytes (which is the case on almost all of -wide-spread platforms) and so nothing is lost if the amount of memory to -allocate is rounded up to the next multiple of 16. Like this, no memory is lost -and 15 iterations from 16 in the example above won't allocate memory but use -the already allocated pool. - -The default approach is quite conservative. Allocating more memory may bring -important performance benefits for programs using (relatively) few very long -strings. The amount of memory allocated is configured by the setting of -@c EXTRA_ALLOC in the file string.cpp during compilation (be sure to understand -why its default value is what it is before modifying it!). You may try setting -it to greater amount (say twice nLen) or to 0 (to see performance degradation -which will follow) and analyse the impact of it on your program. If you do it, -you will probably find it helpful to also define @c WXSTRING_STATISTICS symbol -which tells the wxString class to collect performance statistics and to show -them on stderr on program termination. This will show you the average length of -strings your program manipulates, their average initial length and also the -percent of times when memory wasn't reallocated when string concatenation was -done but the already preallocated memory was used (this value should be about -98% for the default allocation policy, if it is less than 90% you should -really consider fine tuning wxString for your application). 
- -It goes without saying that a profiler should be used to measure the precise -difference the change to @c EXTRA_ALLOC makes to your program. - - -@section overview_string_settings wxString Related Compilation Settings - -wxString always supports Unicode in wxWidgets 3.3 and later, but it may use -either UTF-8 or `wchar_t` (which, in turn, may use either UTF-16 or UTF-32) -internally. It uses the latter if @c wxUSE_UNICODE_WCHAR is set, which is the case by -default. You may want to set it to 0 and set @c wxUSE_UNICODE_UTF8 to 1 instead -to use UTF-8 internally. wxString still provides the same API in this case, but -using UTF-8 has performance implications as explained in @ref -overview_unicode_performance, so it probably shouldn't be enabled for legacy -code which might contain a lot of index-using loops. - -As mentioned in @ref overview_string_implicitconv, @c wxNO_UNSAFE_WXSTRING_CONV -should be defined by all code using this class to opt-in safer, but not -backwards-compatible, behaviour of @e not providing dangerous implicit -conversions to C-style strings. This option is convenient when using standard -build of the library as it doesn't require rebuilding it, but for custom builds -it is also possible to set @c wxUSE_UNSAFE_WXSTRING_CONV to 0 in order to -disable the implicit conversions for all applications using it. - -See also @ref page_wxusedef_important for a few other options affecting wxString. - */ diff --git a/docs/doxygen/overviews/unicode.h b/docs/doxygen/overviews/unicode.h index 2eb94c6bc7..8480948191 100644 --- a/docs/doxygen/overviews/unicode.h +++ b/docs/doxygen/overviews/unicode.h @@ -185,7 +185,7 @@ UTF-16 without support for surrogate characters) is used as @c wchar_t is 2 bytes on this platform. Under Unix systems, including macOS, UCS-4 (also known as UTF-32) is used by default, however it is also possible to build wxWidgets to use UTF-8 internally by passing @c \--enable-utf8 option to -configure. 
+configure or setting `wxUSE_UNICODE_UTF8` to 1 in `wx/setup.h`. The interface provided by wxString is the same independently of the format used internally. However different formats have specific advantages and diff --git a/interface/wx/string.h b/interface/wx/string.h index 1d8a2410fc..c487553b2a 100644 --- a/interface/wx/string.h +++ b/interface/wx/string.h @@ -33,9 +33,8 @@ wxString tries to be similar to both @c std::string and @c std::wstring and can mostly be used as either class. It provides practically all of the methods of these classes, which behave exactly the same as in the standard - C++, and so are not documented here (please see any standard library - documentation, for example http://en.cppreference.com/w/cpp/string for more - details). + C++, and so are not documented here (please see documentation at + https://en.cppreference.com/w/cpp/string/basic_string for this). In addition to these standard methods, wxString adds functions dealing with the conversions between different string encodings, described below, as @@ -57,23 +56,46 @@ - ASCII string guaranteed to contain only 7 bit characters using wxString::FromAscii(). - Narrow @c char* string in the current locale encoding using implicit - wxString::wxString(const char*) constructor. + wxString::wxString(const char*) constructor or using more explicit + wxString::wxString(const char*, const wxMBConv&) constructor passing it + wxConvLibc as the second argument. - Narrow @c char* string in UTF-8 encoding using wxString::FromUTF8(). - Narrow @c char* string in the given encoding using wxString::wxString(const char*, const wxMBConv&) constructor passing a wxCSConv corresponding to the encoding as the second argument. - - Standard @c std::string using implicit wxString::wxString(const - std::string&) constructor. Notice that this constructor supposes that - the string contains data in the current locale encoding, use FromUTF8() - or the constructor taking wxMBConv if this is not the case. 
- - Wide @c wchar_t* string using implicit wxString::wxString(const - wchar_t*) constructor. - - Standard @c std::wstring using implicit wxString::wxString(const - std::wstring&) constructor. + - Standard @c std::string using implicit + wxString::wxString(const std::string&) constructor. + Notice that this constructor supposes that the string contains data in + the current locale encoding, use FromUTF8() if the string contains + UTF-8-encoded data instead. + - Wide @c wchar_t* string using implicit + wxString::wxString(const wchar_t*) constructor. + - Standard @c std::wstring using implicit + wxString::wxString(const std::wstring&) constructor. Notice that many of the constructors are implicit, meaning that you don't even need to write them at all to pass the existing string to some - wxWidgets function taking a wxString. + wxWidgets function taking a wxString. This is convenient, but can also be + dangerous when constructing wxString from `char*` or `std::string` if it + doesn't have the expected encoding, as the resulting string will be empty + if the conversion from the current locale encoding fails. If you want to + disable all such conversions at compile-time, you may predefine + `wxNO_IMPLICIT_WXSTRING_ENCODING` when compiling the application code and + the corresponding conversions become inaccessible, i.e. + @code + wxString s; + // s = "world"; does not compile with wxNO_IMPLICIT_WXSTRING_ENCODING + s = wxString::FromAscii("world"); // Always compiles + s = wxASCII_STR("world"); // shorthand for the above + s = wxString::FromUTF8("world"); // Always compiles + s = wxString("world", wxConvLibc); // Always compiles, explicit encoding + s = wxASCII_STR("Grüße"); // Always compiles but s may be empty! + @endcode + + The only case in which such conversions are fully safe is when the library + is compiled with `wxUSE_UTF8_LOCALE_ONLY` option set to 1, as all the + strings are assumed to be in UTF-8 encoding then. 
+ Similarly, wxString can be converted to: - ASCII string using wxString::ToAscii(). This is a potentially @@ -89,19 +111,45 @@ of the returned string is specified with a wxMBConv object, so this conversion is potentially destructive as well. To ensure that there is no data loss, use @c wxConvUTF8 conversion or wxString::utf8_string(). - - Wide C string using wxString::wc_str(). + - Wide C string using implicit conversion or wxString::wc_str() + explicitly. - Standard @c std::wstring using wxString::ToStdWstring(). - @note If you built wxWidgets with @c wxUSE_STL set to 1, the implicit - conversions to both narrow and wide C strings are disabled and replaced - with implicit conversions to @c std::string and @c std::wstring. - Please notice that the conversions marked as "potentially destructive" - above can result in loss of data if their result is not checked, so you - need to verify that converting the contents of a non-empty Unicode string - to a non-UTF-8 multibyte encoding results in non-empty string. The simplest - and best way to ensure that the conversion never fails is to always use - UTF-8. + As above, defining `wxNO_IMPLICIT_WXSTRING_ENCODING` when compiling + application code prevents the implicit use of the current locale encoding + and disables implicit conversions to `char*` and `std::string` as well as + using mb_str() and ToStdString() without explicitly specifying the + encoding: + @code + const char *c; + // c = s.c_str(); does not compile with wxNO_IMPLICIT_WXSTRING_ENCODING + // c = s.mb_str(); does not compile with wxNO_IMPLICIT_WXSTRING_ENCODING + c = s.ToAscii(); // Always compiles, encoding may fail + c = s.ToUTF8(); // Always compiles, encoding never fails + c = s.utf8_str(); // Alias for the above + c = s.mb_str(wxConvLibc); // Always compiles, explicit encoding, but + // conversion may fail! 
+ @endcode + + However, if completely disabling conversions to narrow strings by defining + `wxNO_IMPLICIT_WXSTRING_ENCODING` is undesirable, it is also possible to + disable implicit conversions by predefining `wxNO_UNSAFE_WXSTRING_CONV` + instead, i.e. with this symbol defined implicit conversion to `const char*` + becomes unavailable -- but explicit conversions using c_str() and mb_str() + still work. + + Finally, please note that implicit conversion to both `const char*` and + `const wchar_t*` may be entirely disabled by setting the build option + `wxUSE_CHAR_CONV_IN_WXSTRING` to 0. Unlike with `wxNO_XXX` constants, this + option requires rebuilding the library after changing its value. + + + To summarize, the safest way to use wxString is to always define + `wxNO_IMPLICIT_WXSTRING_ENCODING` in the application compilation options to + disable all implicit uses of encoding and specify it explicitly, typically + by using utf8_str() or utf8_string() and FromUTF8() for conversions, for + every operation. @section string_gotchas Traps for the unwary @@ -245,13 +293,11 @@ @section string_performance Performance characteristics - wxString uses @c std::basic_string internally to store its content (unless - this is not supported by the compiler or disabled specifically when - building wxWidgets) and it therefore inherits many features from @c - std::basic_string. In particular, most modern implementations of @c - std::basic_string are thread-safe and don't use reference counting (making - copying large strings potentially expensive) and so wxString has the same - characteristics. + wxString uses @c std::basic_string internally to store its content and it + therefore inherits many features from the standard class. In particular, + most implementations of @c std::basic_string use small string optimization, + meaning that they avoid allocating heap memory for short strings, and this + is also true for wxString. 
By default, wxString uses @c std::basic_string specialized for the platform-dependent @c wchar_t type, meaning that it is not memory-efficient @@ -691,6 +737,8 @@ public: Converts the string to an 8-bit string in ISO-8859-1 encoding in the form of a wxCharBuffer. + @note It is not recommended to use wxString for storing binary data. + This is a convenience method useful when storing binary data in wxString. It should be used @em only for this purpose. It is only valid to call this method on strings created using From8BitData(). @@ -1765,8 +1813,14 @@ public: ///@{ /** - Converts given buffer of binary data from 8-bit string to wxString. In - Unicode build, the string is interpreted as being in ISO-8859-1 + Converts given buffer of binary data from 8-bit string to wxString. + + @note Using `std::vector` is both simpler and more efficient + than using wxString for storing binary data. See also more specialized + classes provided by wxWidgets for working with binary data, such as + wxMemoryBuffer and wxMemoryOutputStream and wxMemoryInputStream. + + In Unicode build, the string is interpreted as being in ISO-8859-1 encoding. The version without @e len parameter takes NUL-terminated data.
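The note added to From8BitData() recommends `std::vector` over wxString for binary data. A minimal sketch of that suggestion follows, using only the standard library. The `char` element type and both helper names (`MakeBinaryBlob`, `CountNulBytes`) are assumptions made for illustration and are not part of wxWidgets or of this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a wxWidgets function): copies "len" raw bytes,
// embedded NUL bytes included, into a vector that owns its own storage.
std::vector<char> MakeBinaryBlob(const char* data, std::size_t len)
{
    assert(data != nullptr || len == 0);
    if (len == 0)
        return std::vector<char>();
    return std::vector<char>(data, data + len);
}

// Hypothetical helper: counts the NUL bytes stored in the blob, showing that
// they are ordinary elements and do not terminate the data.
std::size_t CountNulBytes(const std::vector<char>& blob)
{
    std::size_t count = 0;
    for (char c : blob)
    {
        if (c == '\0')
            ++count;
    }
    return count;
}
```

With this approach the size is tracked by the container itself, so no encoding is involved and there is no need to round-trip through ISO-8859-1 as From8BitData() and To8BitData() do.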