View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000177 | Ecere SDK | ecere | public | 2009-05-03 05:55 | 2014-07-11 18:12 |
Reporter | jerome | Assigned To | |||
Priority | high | Severity | feature | Reproducibility | have not tried |
Status | new | Resolution | open | ||
Target Version | 0.45 Ginkakuji | ||||
Summary | 0000177: eC String Solution | ||||
Description | Design an eC String Solution. C strings are very counter-intuitive and require an in-depth understanding of the concepts of arrays and pointers. They make doing simple stuff very difficult for beginners. However, C has a great advantage with its strings in that they have very little overhead and can be held purely in stack memory, whereas heavier string objects cannot. Also, we want to avoid copying strings many times when the same temporary buffer could be used. There are many different usages for strings: string literals, strings that will remain the same, strings that will be modified or returned as an output, etc. A big problem with the use of fixed size buffers in C is the potential for buffer overflow and the security risks that ensue. We'd like to have a nice eC solution that has all of the goods and none of the bads (which is probably why this feature is still not implemented). Some of the things we want in this String class: - Compatibility to transparently pass them directly to C APIs - Storing the allocated size and byte count along with the string. One possibility is to store it at the head of the allocated buffer memory to avoid a referencing level and allow a string to be purely a stack object. - Simplification of memory management (possibly done along with local instances being auto-decref'ed/deleted) - Bundled String manipulation routines: possibly as methods of the class. | ||||
Tags | No tags attached. | ||||
|
Look at the BString library for C. It's pretty nice as far as C libraries go. http://bstring.sourceforge.net/ |
|
Initial String Design Notes (Ancient, Requires some serious review, not to be taken too literally)
[ ] String dataType
[ ] Solution for properties and text format strings
[ ] Strings

char * name;
name.ext:
   char tempString1[MAX_EXTENSION];
   (String_GetExtension(name, tempString1), tempString1)

if(name.ext == "this") :
   char tempString1[MAX_EXTENSION];
   if(Operator_String_Equal((String_GetExtension(name, tempString1), tempString1), "this"))
(Return type of String_GetExtension is char[MAX_EXTENSION])

char * yo = "yo";
char * bla;
bla = "This is " + yo + ", ok?";
delete bla;
* Type of Operator_String_Add is "unassigned string"
bla = Operator_String_Add_UnassignedString_String(Operator_String_Add_String_String("This is ", yo), ", ok?");
if(bla) free(bla);

char * bla = "This" + " is";
bla += " nice";
delete bla;
- Gets len of "This", gets len of " is"
- Allocates enough space
- Copies string1, strcat string2
- Gets len of " nice"
- reallocate, strcat
- frees

// Wrong:
char * bla = "This";
bla += " nice";
delete bla;

// Correct:
char * bla { "This" };
bla += " nice";
delete bla;

char hey[100];
char * yo = "hey" + " you";
hey = yo;
delete yo;

// Just copies the address:
char hey[100] = "hey you";
char * yo = hey;

char * a = "this";
char hey[100] = "hey " + a + "you";
make a string out of "hey " + a
reallocates, strcat "you"
strcpy to hey, frees it

[ ] Verify string reference / string copy |
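For reference, the allocate / copy / reallocate / strcat steps listed in the note can be written in plain C roughly as follows. This is only a sketch of what a generated Operator_String_Add might do; the helper names cstr_add and cstr_append are made up for illustration and are not part of eC or the Ecere runtime.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated a + b, or NULL on failure. */
char *cstr_add(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);   /* get both lengths */
    char *out = malloc(la + lb + 1);         /* allocate enough space */
    if (!out) return NULL;
    memcpy(out, a, la);                      /* copy string1 */
    memcpy(out + la, b, lb + 1);             /* append string2 with its terminator */
    return out;
}

/* Hypothetical helper: grows *dst so it holds *dst + b. Returns 0 on success. */
int cstr_append(char **dst, const char *b)
{
    size_t ld = strlen(*dst), lb = strlen(b);
    char *grown = realloc(*dst, ld + lb + 1); /* reallocate */
    if (!grown) return -1;                    /* *dst is still valid and unchanged */
    memcpy(grown + ld, b, lb + 1);            /* strcat equivalent */
    *dst = grown;
    return 0;
}
```

Usage mirrors the note: bla = cstr_add("This", " is"); cstr_append(&bla, " nice"); free(bla);. Note that cstr_append only works on heap memory, which is the reason the "Wrong" case above (appending to a pointer to a literal) has to be rejected or handled differently.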
|
Take a look at Joey's JString class |
|
Look at the usage of things like Array<char> s { minAllocSize = 1024 }; in the code, which provides a current solution. |
2009-05-06 02:46
|
|
|
Strings would have whatever required data header at the top of the buffer.
String[256] s { "Hello, Strings" };
String s = "Hello, Strings"; // This would simply make s point to the string literal
String s { "Hello, Strings" }; // This would allocate memory for a copy of the string and be freed when it goes out of scope
String[256] s { "Hello, Strings" }; // This would make the string fixed to 256, but functions would have range checking... (akin to char s[256]); (Object & String buffer allocated entirely on the stack)
String[256+] s { "Hello, Strings" }; // This would start size at 256, and then support more
String[256+64] s { "Hello, Strings" }; // This would start size at 256, and auto increase size by 64
String[] s { "Hello, Strings" }; // This would make the string fixed to required size
String[] s { "bla" }; // String object is entirely on the stack
String s; s = { "bla" }; // String survives outside its scope |
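A rough C picture of what the fixed String[256] case with range checking could compile down to is given below. The FixedStr type and the fixed_init / fixed_append names are assumptions made for this sketch only; truncating on overflow is one possible policy, failing outright would be another.

```c
#include <stddef.h>
#include <string.h>

#define FIXED_CAP 256          /* matches the String[256] example */

/* Whole object lives wherever it is declared, e.g. on the stack. */
typedef struct {
    size_t len;                /* bytes in use, not counting the terminator */
    char   buf[FIXED_CAP];     /* always kept NUL-terminated */
} FixedStr;

/* Appends as much of src as fits; returns the number of bytes appended. */
size_t fixed_append(FixedStr *s, const char *src)
{
    size_t room = FIXED_CAP - 1 - s->len;
    size_t n = strlen(src);
    if (n > room) n = room;    /* range check: never write past buf */
    memcpy(s->buf + s->len, src, n);
    s->len += n;
    s->buf[s->len] = '\0';
    return n;
}

void fixed_init(FixedStr *s, const char *src)
{
    s->len = 0;
    s->buf[0] = '\0';
    fixed_append(s, src);
}
```

Since buf is an ordinary NUL-terminated array, s.buf can be handed directly to any C API expecting a char *.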
|
Internationalization issue, e.g. replacement of Strings by internationalized versions will need to be taken into consideration |
|
Strings are objects; they have a small overhead of 4-6 variables (e.g. a char * pointing either somewhere else or to the start of the string a few bytes later?) which will always precede the character buffer.
String s; actually simply declares a pointer to such an object.
String s { }; allocates such an object on the stack with an empty string, which will be decref'ed when it goes out of scope (according to the upcoming improved eC auto reference counting).
As a way to allow a string to be fully on the stack, as with the world-renowned C performance style char myString[256];, String[256] s { }; will have a fixed size, yet automatically prevent buffer overflow.
String[] s { "Hello" }; is the same but gets the size from the string (akin to char myString[] = "Hello";).
Passing an eC string to a C API expecting a string simply passes the buffer.
Assigning a C string or a constant string literal will have a local String data structure to hold the extra data.
We can have the fancy notation for specifying auto increase size etc.: String[256+] String[+64] String[256+64]
We'll need to evaluate the possibility of storing the data a few bytes before, since reallocating would change the pointer. I think we actually only care about having it all in one place for fixed size strings. |
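To make the "header preceding the character buffer" idea concrete, here is a minimal C sketch. The StrHeader layout and the hstr_* names are invented for illustration; the actual header contents (the 4-6 variables mentioned above) are still undecided.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed header layout, stored immediately before the characters. */
typedef struct {
    size_t len;      /* byte count */
    size_t cap;      /* allocated size of data[] */
    char   data[];   /* C99 flexible array member: the C string itself */
} StrHeader;

/* Returns a pointer to the characters; the header sits just before them. */
char *hstr_new(const char *init)
{
    size_t len = strlen(init);
    StrHeader *h = malloc(sizeof *h + len + 1);
    if (!h) return NULL;
    h->len = len;
    h->cap = len + 1;
    memcpy(h->data, init, len + 1);
    return h->data;
}

/* Steps back from the character pointer to the header. */
static const StrHeader *hstr_header(const char *s)
{
    return (const StrHeader *)(const void *)(s - offsetof(StrHeader, data));
}

size_t hstr_len(const char *s)
{
    return hstr_header(s)->len;   /* O(1), no strlen scan */
}

void hstr_free(char *s)
{
    if (s) free((void *)(s - offsetof(StrHeader, data)));
}
```

The returned pointer is a plain char *, so it can be passed to C APIs unchanged. The catch raised in the note still applies: growing such a string with realloc may move the whole block, so every holder of the old character pointer would be left dangling.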
|
char [12412] foo; as syntactic sugar for: Array<char> 12412 foo { }; |
|
Latest Development:
- We'll likely have full ref counting to enable things like PrintString and row.name not to leak [no need, we only need anonymous instances incref/decref]
- We'll use a 'struct' type to store strings so as to avoid the extra reference level, but we'll want destructors support...
- We'll support 3 modes: pointer, stack, heap
These examples illustrate the 3 modes:
String foo = "Hello!"; // Pointer
String foo { "Hello" }; // Heap
String<30> foo = "Hello" or String<30> foo { "Hello" }; // Stack
String<> foo { "Hello" }; // Here equivalent to String<5>
For the pointer mode, attempting to modify it would either fail or convert to heap model.
String<30> will be like char foo[31] but will have awareness of the size to avoid overflows.
- Support String foo { minSize = 30, maxSize = 200, string = "Hello" }; for heap strings
- We'll support + and += as syntactic sugar for string manipulation
- Mostly unrelated, but as mentioned above char [] a; should be syntactic sugar for Array<char> a { };
- char [30] a; should be syntactic sugar for Array<char> a { minAllocSize = 30 };
- Consider supporting foo[2..5] as syntactic sugar for substr
- Consider supporting "Hello" - 2 to strip the last 2 characters, as "Hello" + 2 would strip the first 2 characters |
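The "pointer mode converts to heap on modification" option could look like this in C. This is a sketch under the assumption that conversion (rather than failure) is chosen; MStr and the mstr_* functions are illustrative names only, and the stack mode is left out for brevity.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { MSTR_POINTER, MSTR_HEAP } MStrMode;

typedef struct {
    char    *text;   /* borrowed in pointer mode, owned in heap mode */
    size_t   len;
    MStrMode mode;
} MStr;

/* Pointer mode: no allocation, just refer to the literal. */
MStr mstr_from_literal(const char *lit)
{
    MStr s = { (char *)lit, strlen(lit), MSTR_POINTER };
    return s;
}

/* Appending promotes a pointer-mode string to an owned heap copy first. */
int mstr_append(MStr *s, const char *more)
{
    size_t add = strlen(more);
    char *buf;
    if (s->mode == MSTR_POINTER) {
        buf = malloc(s->len + add + 1);
        if (!buf) return -1;
        memcpy(buf, s->text, s->len);   /* the literal itself is never written */
    } else {
        buf = realloc(s->text, s->len + add + 1);
        if (!buf) return -1;
    }
    memcpy(buf + s->len, more, add + 1);
    s->text = buf;
    s->len += add;
    s->mode = MSTR_HEAP;
    return 0;
}

/* Frees only what the string owns. */
void mstr_release(MStr *s)
{
    if (s->mode == MSTR_HEAP) free(s->text);
    s->text = NULL;
    s->len = 0;
    s->mode = MSTR_POINTER;
}
```

The mode tag is what lets the destructor tell an owned buffer from a borrowed literal, which is the same role allocType plays in the design above.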
|
Latest code snippet:

// TODO: Will want it to be on the stack but be ref counted and have constructors, destructors
public enum StringAllocType { pointer, stack, heap };

public class ZString
{
   char * _string;
   int len;
   StringAllocType allocType;
   int size;
   int minSize;
   int maxSize;

   ZString()
   {
      maxSize = MAXINT;
   }

   ~ZString()
   {
      if(allocType == heap)
         delete _string;
   }

   void copyString(char * value, int newLen)
   {
      // A pointer-mode string never owns its buffer: switch to heap before writing
      if(allocType == pointer)
      {
         size = 0;
         _string = null;
         allocType = heap;
      }
      if(allocType == heap)
      {
         int newSize = newLen ? newLen + 1 : 0;
         if(newSize != size)
         {
            if(newSize < minSize) newSize = minSize;
            else if(newSize > maxSize) newSize = maxSize;

            if(newSize && size)
               _string = renew _string char[newSize];
            else if(newSize)
               _string = new char[newSize];
            else
               delete _string;
            size = newSize;
         }
      }
      // Clamp to the available room; with no buffer at all the length is 0 (was size-1, i.e. -1)
      if(newLen + 1 > size)
         newLen = size ? size - 1 : 0;
      len = newLen;

      // Only copy when there is a buffer to copy into
      if(value && size)
      {
         memcpy(_string, value, newLen);
         _string[newLen] = 0;
      }
   }

public:
   char * OnGetString(char * tempString, void * fieldData, bool * needClass)
   {
      return _string;
   }

   bool OnGetDataFromString(char * string)
   {
      property::string = string;
      return true;
   }

   property char * string
   {
      set { copyString(value, value ? strlen(value) : 0); }
      get { return _string; }
   }

   property char *
   {
      get { return _string; }
      set
      {
         return
         {
            len = value ? strlen(value) : 0;
            _string = value;
            allocType = pointer;
         };
      }
   }

   void concat(ZString s)
   {
      if(s && allocType != pointer)
      {
         int addedLen = s.len;
         int newLen = len + addedLen;
         if(allocType == heap && newLen + 1 > size)
         {
            int newSize = newLen + 1;
            if(newSize > maxSize)
               newSize = maxSize;
            if(newSize > size)
            {
               // Allocate newSize (was newLen, which left no room for the terminator)
               _string = renew _string char[newSize];
               size = newSize;
            }
         }
         // Truncate what is added if the buffer could not grow enough
         if(newLen + 1 > size)
            addedLen = size - 1 - len;
         if(addedLen > 0)
         {
            memcpy(_string + len, s._string, addedLen);
            len += addedLen;
            _string[len] = 0;
         }
      }
   }

   void copy(ZString s)
   {
      copyString(s._string, s.len);
   }
}; |
|
Just realized that String can't be in auto storage if it has reference count if it's not always on the heap. Otherwise updating one reference will not update the ref count in other references... |
|
We'll probably want to support references so as to be able to do:
String b { "a" };
String & a = b; // To avoid copying b...
a += "b"; // To also modify b
It will simplify returning multiple values as well if we support references for basic C types.
Might also want:
a += 1 or a++ // To trim first char
a -= 1 or a-- // To trim last char
a[0] or *a // To obtain first char
unichar ch = a[x] // To obtain unichar at byte position x
for(ch : a) // To go through all chars
for(unichar ch : a) // To go through all chars with Unicode chars |
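For the unichar cases, reading a code point at a byte position means decoding UTF-8 (assuming, as elsewhere in eC, that strings are UTF-8). A minimal C sketch follows; utf8_next and utf8_count are hypothetical names, and the decoder deliberately skips overlong-form and surrogate checks to stay short.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes the code point starting at byte *pos of s and advances *pos past it.
   Returns U+FFFD for a malformed sequence. s must be NUL-terminated. */
uint32_t utf8_next(const char *s, size_t *pos)
{
    const unsigned char *p = (const unsigned char *)s + *pos;
    uint32_t cp;
    int extra, i;

    if (p[0] < 0x80)                { cp = p[0];        extra = 0; }
    else if ((p[0] & 0xE0) == 0xC0) { cp = p[0] & 0x1F; extra = 1; }
    else if ((p[0] & 0xF0) == 0xE0) { cp = p[0] & 0x0F; extra = 2; }
    else if ((p[0] & 0xF8) == 0xF0) { cp = p[0] & 0x07; extra = 3; }
    else { (*pos)++; return 0xFFFD; }            /* stray continuation byte */

    for (i = 1; i <= extra; i++) {
        if ((p[i] & 0xC0) != 0x80) {             /* also stops at the terminator */
            *pos += i;
            return 0xFFFD;
        }
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    *pos += (size_t)extra + 1;
    return cp;
}

/* Equivalent of for(unichar ch : a): visits every code point once. */
size_t utf8_count(const char *s)
{
    size_t pos = 0, n = 0;
    while (s[pos]) { utf8_next(s, &pos); n++; }
    return n;
}
```

This also shows why a[x] returning a unichar is defined in terms of a byte position: finding the n-th character would require a scan from the start of the string.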
|
Re: String a = b;
- Disallow this syntax, the = should be used with an anonymous instance, return value, or conversion property on the right side (a new String, { b } to copy an existing string)
-- Instead of disallowing, a = b could be equivalent to a = { b } in the other cases.
- a will not free itself, as it will normally be assumed to be uninitialized prior to doing this, unless it was declared as an instance.
- It must be possible to pass an instance so that it will get autofreed (0'ed or stack or literal, or it will leak): String a { }; a = 123.getString();
- This does not apply to String a = "Foo"; // This is fine
- a would have to be deleted if not an instance: String a; a = 123.getString(); delete a;
- It would be nice to have an easy way to copy a class instance, e.g. where: Point a { 1, 2 }; Point b = a; works for struct... For instances: Object a { 1, 2 }; Object b { :a }; could do the equivalent, still freeing at end of scope.
- It would be nice to have an instance finalizer which is run after setting the members. This would allow code that would otherwise be in the constructor to be moved to the finalizer, and thus run only once after the copy, as opposed to once in the constructor and again after the copy. The copy itself could also skip or manually call the constructor. |
|
void MyFunction(String a)
{
   a = "a" + "b"; // Conversion properties should act like anonymous instances in regard to ref counts/destruction. Thus "a" gets converted to a String, gets "b" concatenated, and the result goes to a without a copy.
   PrintLn("a" + "b"); // Here since there's no equal the string gets destructed on exit. A method wanting to keep a string passed to it -- see note below.
} |
|
We may want a special flag for anonymous instances / conversion properties strings which a function could take ownership of to avoid a useless extra copy e.g.:
Button btn { caption = "You entered: " + (String)amount.data };
property String caption
{
   set
   {
      // The copy method here would be smart enough to reuse the memory and flag the input value struct that it must not delete.
      caption = { value };
   }
} |
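One way to picture the ownership flag in C: the temporary carries a bit saying its buffer may be stolen, and the receiver clears that bit when it takes the memory. Everything here (TempStr, stealable, the temp_* functions) is a hypothetical illustration of the idea, not a proposed API.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *text;
    bool  stealable;   /* true for anonymous/temporary results that own text */
} TempStr;

/* A temporary result, e.g. of a concatenation: owns a fresh heap buffer. */
TempStr temp_make(const char *src)
{
    TempStr t = { NULL, true };
    size_t n = strlen(src) + 1;
    t.text = malloc(n);
    if (t.text) memcpy(t.text, src, n);
    return t;
}

/* A named string owned by someone else: must not be stolen. */
TempStr temp_borrow(char *existing)
{
    TempStr t = { existing, false };
    return t;
}

/* Receiver side: reuse the memory when allowed, otherwise copy. */
char *temp_take(TempStr *t)
{
    char *result = NULL;
    if (t->stealable) {
        result = t->text;       /* no copy */
        t->text = NULL;         /* flag the input so it must not delete */
        t->stealable = false;
    } else if (t->text) {
        size_t n = strlen(t->text) + 1;
        result = malloc(n);
        if (result) memcpy(result, t->text, n);
    }
    return result;
}

/* What would run when the temporary is destructed. */
void temp_release(TempStr *t)
{
    if (t->stealable) free(t->text);
    t->text = NULL;
    t->stealable = false;
}
```

In the caption example, the setter would call the equivalent of temp_take on value, so the concatenation result becomes the caption without a second allocation.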
|
String a { "Foo" }; String b { a }; We'll want the latter to go through a more efficient path than getting the char * string and creating a new heap string, since in this case b could be marked as a literal as well. |
|
- Support begins with (^), ends with ($), contains (~), exact (=), case insensitive variants: ?^ ?$ ?~ ?=
String fn = "test.jpg";
String<MAX_EXTENSION> ext = fn.ext;
if(ext ?= "jpg")

property String ext
{
   get
   {
      GetExtension(this, value);
      // or: return value % '.' + 1;
   }
} |
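For comparison, the plain C equivalents that these operators would be sugar for could be as simple as the following. The function names are made up for this sketch, and the case-insensitive comparison only handles ASCII via tolower.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* begins with (^) */
bool str_begins(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* ends with ($) */
bool str_ends(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return lx <= ls && memcmp(s + ls - lx, suffix, lx) == 0;
}

/* contains (~) */
bool str_contains(const char *s, const char *needle)
{
    return strstr(s, needle) != NULL;
}

/* case insensitive exact (?=), ASCII only */
bool str_equals_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    return *a == *b;   /* both must end at the same time */
}
```

With these, if(ext ?= "jpg") would read if(str_equals_nocase(ext, "jpg")).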
|
- Watch out for literals dying on module unload? |
|
Return struct in a parameter to allow having storage?
String GetExtension(String fn) { return fn % '.' + 1; }
becomes something like (very raw, ignore many mistakes/omissions):
void GetExtension(String output, String fn)
{
   char * tmp = RSearchChar(fn.string, '.');
   if(tmp) tmp++;
   output = { tmp };
}
----
More fun operators for search & trimming:
struct FileName : String
{
   property String ext { get { return this % '.' + 1; } }
   property FileName lastDir { get { return this % ['/','\\']; } }
   void StripLastDir() { this -%= ['/','\\']; }
   property FileName firstDir { get { return this -/ ['/','\\']; } }
   void StripFirstDir() { this = this / ['/','\\'] + 1; }
};
property FileName firstDir
{
   get
   {
      char * tmp = SearchChar(string, '/');
      if(tmp)
         output = substr(0, tmp - string);
      else
         output = this;
   }
} |
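The "return through a parameter so the caller provides storage" transformation is the familiar C output-buffer pattern. A sketch, with get_extension as an illustrative name; it looks only at the last '.' in the whole string, so a dot inside a directory name is not handled.

```c
#include <stddef.h>
#include <string.h>

/* Writes the extension of fn (text after the last '.') into out, which has
   room for outSize bytes. Truncates if needed. Returns out. */
char *get_extension(const char *fn, char *out, size_t outSize)
{
    const char *dot = strrchr(fn, '.');
    const char *ext = dot ? dot + 1 : "";   /* no dot: empty extension */
    size_t n = strlen(ext);

    if (outSize == 0) return out;           /* nowhere to write, not even a terminator */
    if (n > outSize - 1) n = outSize - 1;   /* never overflow the caller's storage */
    memcpy(out, ext, n);
    out[n] = '\0';
    return out;
}
```

Because the storage belongs to the caller, the result can live in a stack buffer (the String<MAX_EXTENSION> case) with no heap allocation at all.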
|
property String ext { get { return find('.') + 1; } }
property FileName firstDir
{
   get
   {
      String slash = find(['/', '\\']);
      if(slash)
         output = this[0..slash - this - 1];
      else
         return this;
   }
} |
|
Something like String<64*=2> , String<64*=2..16384> , String<64+=2>, String<64..128>, String<..1024>, String<1024> (This one is on the stack, like a char[1024]), String<*=1.5>
The default would be whatever makes sense.
Perhaps setting the string to a specific value would be an exact size, but as soon as you start concatenating it would trigger this logic.
"This " + "Bla" + "That"... This should probably be smart enough to allocate only once |
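The growth notation maps naturally onto a small policy record plus one function that computes the next capacity. The sketch below is an assumption about how String<64*=2..16384> and String<256+64>-style parameters could be represented; GrowthPolicy and grow_capacity are hypothetical names, and the factor is stored as a fraction so that *=1.5 needs no floating point.

```c
#include <stddef.h>

typedef struct {
    size_t initial;   /* starting capacity, e.g. 64 */
    size_t mulNum;    /* growth factor numerator, e.g. 2 (or 3 for *=1.5) */
    size_t mulDen;    /* growth factor denominator; 0 selects additive growth */
    size_t add;       /* additive step, used when mulDen == 0 */
    size_t max;       /* upper bound, e.g. 16384 */
} GrowthPolicy;

/* Returns the capacity to allocate so that `needed` bytes fit, following the
   policy. The result may be smaller than `needed` when max is reached, in
   which case the caller has to truncate. */
size_t grow_capacity(const GrowthPolicy *p, size_t current, size_t needed)
{
    size_t cap = current ? current : p->initial;
    if (cap == 0) cap = 1;
    while (cap < needed && cap < p->max) {
        size_t next = p->mulDen ? cap * p->mulNum / p->mulDen : cap + p->add;
        if (next <= cap) next = cap + 1;   /* guarantee progress */
        cap = next;
    }
    if (cap > p->max) cap = p->max;
    return cap;
}
```

For a chain like "This " + "Bla" + "That", summing the lengths first and calling grow_capacity once with the total is what would let the whole expression allocate only once.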
Date Modified | Username | Field | Change |
---|---|---|---|
2009-05-03 05:55 | jerome | New Issue | |
2009-05-06 02:20 | Scott | Note Added: 0000105 | |
2009-05-06 02:21 | jerome | Note Added: 0000106 | |
2009-05-06 02:21 | jerome | Note Added: 0000107 | |
2009-05-06 02:25 | jerome | Description Updated | |
2009-05-06 02:29 | jerome | Description Updated | |
2009-05-06 02:33 | jerome | Note Edited: 0000106 | |
2009-05-06 02:41 | jerome | Note Added: 0000108 | |
2009-05-06 02:43 | jerome | Note Edited: 0000108 | |
2009-05-06 02:46 | jerome | File Added: jstring.tar.bz2 | |
2009-05-06 03:21 | jerome | Note Added: 0000109 | |
2009-05-06 03:21 | jerome | Note Added: 0000110 | |
2009-05-06 03:25 | jerome | Note Edited: 0000109 | |
2009-05-06 03:28 | jerome | Note Edited: 0000109 | |
2009-05-06 03:44 | jerome | Note Added: 0000111 | |
2009-05-06 03:45 | jerome | Note Edited: 0000111 | |
2010-07-28 15:18 | jerome | Priority | normal => high |
2010-08-19 01:13 | jerome | Relationship added | related to 0000318 |
2012-03-08 16:52 | redj | Target Version | => 0.45 Ginkakuji |
2012-03-29 07:53 | redj | Category | => Ecere Runtime Library |
2012-03-29 07:53 | redj | Project | @1@ => Ecere SDK |
2013-04-25 09:24 | jerome | Target Version | 0.45 Ginkakuji => 0.44.4 Strings |
2013-05-02 03:15 | jerome | Note Added: 0000776 | |
2014-05-04 10:46 | jerome | Note Added: 0001271 | |
2014-05-04 10:46 | jerome | Relationship added | related to 0000513 |
2014-05-04 10:54 | jerome | Note Edited: 0001271 | |
2014-05-04 10:54 | jerome | Note Added: 0001272 | |
2014-05-04 11:02 | jerome | Note Edited: 0001271 | |
2014-05-08 05:15 | jerome | Note Added: 0001277 | |
2014-05-08 08:11 | jerome | Note Edited: 0001277 | |
2014-05-08 08:13 | jerome | Note Edited: 0001271 | |
2014-05-09 17:09 | jerome | Note Added: 0001281 | |
2014-05-09 17:25 | jerome | Note Added: 0001282 | |
2014-05-09 17:35 | jerome | Note Added: 0001283 | |
2014-05-09 17:55 | jerome | Note Added: 0001284 | |
2014-05-09 17:56 | jerome | Note Edited: 0001284 | |
2014-05-09 17:58 | jerome | Note Edited: 0001283 | |
2014-05-09 18:24 | jerome | Note Edited: 0001282 | |
2014-05-09 18:30 | jerome | Note Edited: 0001282 | |
2014-05-09 18:32 | jerome | Note Edited: 0001282 | |
2014-05-09 18:46 | jerome | Note Added: 0001285 | |
2014-05-09 18:48 | jerome | Note Edited: 0001282 | |
2014-05-09 18:49 | jerome | Note Edited: 0001282 | |
2014-05-09 19:04 | jerome | Note Added: 0001286 | |
2014-05-09 19:09 | jerome | Note Edited: 0001285 | |
2014-05-09 19:11 | jerome | Note Added: 0001287 | |
2014-05-09 19:17 | jerome | Note Edited: 0001284 | |
2014-05-09 19:20 | jerome | Note Edited: 0001282 | |
2014-05-09 19:35 | jerome | Note Edited: 0001286 | |
2014-05-09 20:21 | jerome | Note Added: 0001288 | |
2014-05-09 20:23 | jerome | Note Edited: 0001288 | |
2014-05-09 20:41 | jerome | Note Edited: 0001288 | |
2014-05-09 21:07 | jerome | Note Edited: 0001288 | |
2014-05-09 23:26 | jerome | Note Edited: 0001282 | |
2014-05-09 23:35 | jerome | Note Edited: 0001282 | |
2014-05-09 23:41 | jerome | Note Added: 0001289 | |
2014-05-10 17:30 | jerome | Note Added: 0001290 | |
2014-07-11 18:12 | jerome | Target Version | 0.44.30 Strings => 0.45 Ginkakuji |