View Issue Details

ID: 0000177
Project: Ecere SDK
Category: ecere
View Status: public
Last Update: 2014-07-11 18:12
Reporter: jerome
Assigned To:
Priority: high
Severity: feature
Reproducibility: have not tried
Status: new
Resolution: open
Target Version: 0.45 Ginkakuji
Summary: 0000177: eC String Solution
Description: Design an eC String Solution

C strings are very counter-intuitive and require an in-depth understanding of the concepts of arrays and pointers. They make doing simple stuff very difficult for beginners.

However, C strings have a great advantage in that they have very little overhead and can be held purely in stack memory, whereas heavier string objects cannot. Also, we want to avoid copying strings many times when the same temporary buffer could be used.

Strings are used in many different ways: string literals, strings that will remain the same, strings that will be modified or returned as an output, etc.

A big problem with the use of fixed-size buffers in C is the potential for buffer overflows and the security risks that ensue.

We'd like to have a nice eC solution that has all of the good and none of the bad (which is probably why this feature is still not implemented).


Some of the things we want in this String class:

- Compatibility to transparently pass them directly to C APIs
- Storing the allocated size and byte count along with the string. One possibility is to store them at the head of the allocated buffer to avoid an extra level of indirection and allow a string to be purely a stack object.
- Simplification of memory management (Possibly done along with local instances being auto-decref'ed/deleted)
- Bundled String manipulation routines, possibly as methods of the class.
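
The "size and byte count at the head of the buffer" idea from the list above can be sketched in plain C. This is only an illustration under assumptions: the names HStrHeader, hstr_new, hstr_header and hstr_free are hypothetical and not part of any Ecere API. The point is that the returned pointer is an ordinary null-terminated char * that a C API can consume directly, while the bookkeeping sits just before it.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored immediately before the character data. */
typedef struct
{
   size_t size;   /* bytes allocated for characters, including the terminator */
   size_t len;    /* bytes in use, excluding the terminator */
} HStrHeader;

/* Allocates header and characters in one block; returns a pointer to the characters. */
char * hstr_new(const char * text, size_t minSize)
{
   size_t len = strlen(text);
   size_t size = len + 1 > minSize ? len + 1 : minSize;
   HStrHeader * h = malloc(sizeof(HStrHeader) + size);
   if(!h) return NULL;
   h->size = size;
   h->len = len;
   memcpy(h + 1, text, len + 1);
   return (char *)(h + 1);
}

/* Steps back from the character pointer to reach the header. */
HStrHeader * hstr_header(char * s) { return (HStrHeader *)s - 1; }

/* Frees the whole block, starting from the header. */
void hstr_free(char * s) { if(s) free(hstr_header(s)); }
```

Usage would look like char * s = hstr_new("Hello", 16); puts(s); hstr_free(s);. One consequence of this layout is that growing the string with realloc can move the characters, so every holder of the pointer would need to be updated.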
Tags: No tags attached.

Relationships

related to 0000513 (new): More Consistent Reference Counting: Block Scope
related to 0000318 (closed, jerome): String shouldn't be a normal Class

Activities

Scott

2009-05-06 02:20

reporter   ~0000105

Look at the BString library for C. It's pretty nice as far as C libraries go. http://bstring.sourceforge.net/

jerome

2009-05-06 02:21

administrator   ~0000106

Last edited: 2009-05-06 02:33

Initial String Design Notes (Ancient, Requires some serious review, not to be taken too literally)

   [ ] String dataType
   [ ] Solution for properties and text format strings


[ ] Strings
   char * name;
   name.ext:
      char tempString1[MAX_EXTENSION];
      (String_GetExtension(name, tempString1), tempString1)

   if(name.ext == "this") :
      char tempString1[MAX_EXTENSION];
      if(Operator_String_Equal((String_GetExtension(name, tempString1), tempString1), "this"))

      (Return type of String_GetExtension is char[MAX_EXTENSION])

   char * yo = "yo";
   char * bla;
   bla = "This is " + yo + ", ok?";
   delete bla;

      * Type of Operator_String_Add is "unassigned string"
   bla = Operator_String_Add_UnassignedString_String(Operator_String_Add_String_String("This is ", yo), ", ok?");
   if(bla) free(bla);

   char * bla = "This" + " is";
   bla += " nice";
   delete bla;

      - Gets len of "This", gets len of " is"
      - Allocates enough space
      - Copies string1, strcat string2
      - Gets len of " nice"
      - reallocate, strcat
      - frees

   // Wrong:
   char * bla = "This";
   bla += " nice";
   delete bla;

   // Correct:
   char * bla { "This" };
   bla += " nice";
   delete bla;

   char hey[100];
   char * yo = "hey" + " you";
   hey = yo;
   delete yo;

   // Just copies the address:
   char hey[100] = "hey you";
   char * yo = hey;

   char * a = "this";
   char hey[100] = "hey " + a + "you";

      make a string out of "hey " + a
      reallocates, strcat "you"
      strcpy to hey, frees it
   [ ] Verify string reference / string copy
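
As a rough C illustration of the lowering described in this note, the two operators could behave as below. The function names str_add and str_add_unassigned are hypothetical stand-ins for Operator_String_Add_String_String and Operator_String_Add_UnassignedString_String; the assumption is that an "unassigned string" is a heap temporary the second operator is allowed to grow and take over.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Operator_String_Add_String_String:
   allocates a new buffer holding a followed by b. */
char * str_add(const char * a, const char * b)
{
   size_t la = strlen(a), lb = strlen(b);
   char * r = malloc(la + lb + 1);
   if(!r) return NULL;
   memcpy(r, a, la);
   memcpy(r + la, b, lb + 1);
   return r;
}

/* Hypothetical stand-in for Operator_String_Add_UnassignedString_String:
   takes ownership of the temporary and grows it in place. */
char * str_add_unassigned(char * tmp, const char * b)
{
   size_t lt, lb = strlen(b);
   char * r;
   if(!tmp) return NULL;
   lt = strlen(tmp);
   r = realloc(tmp, lt + lb + 1);
   if(!r) { free(tmp); return NULL; }
   memcpy(r + lt, b, lb + 1);
   return r;
}
```

With these, bla = "This is " + yo + ", ok?" would become bla = str_add_unassigned(str_add("This is ", yo), ", ok?"); followed eventually by free(bla);. Only one buffer is ever live, which matches the "reallocate, strcat" steps listed above.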

jerome

2009-05-06 02:21

administrator   ~0000107

Take a look at Joey's JString class

jerome

2009-05-06 02:41

administrator   ~0000108

Last edited: 2009-05-06 02:43

Look at the usage of things like Array<char> s { minAllocSize = 1024 }; in the code, which provides the current solution.

2009-05-06 02:46

 

jstring.tar.bz2 (3,148 bytes)

jerome

2009-05-06 03:21

administrator   ~0000109

Last edited: 2009-05-06 03:28

Strings would have whatever required data header at the top of the buffer.

String[256] s { "Hello, Strings" };

String s = "Hello, Strings"; // This would simply make s point to the string literal

String s { "Hello, Strings" }; // This would allocate memory for a copy of the string and be freed outside its scope

String[256] s { "Hello, Strings" }; // This would make the string fixed to 256, but functions would have range checking... (akin to char s[256]); (Object & String buffer Allocated entirely on the stack)

String[256+] s { "Hello, Strings" }; // This would start size at 256, and then support more

String[256+64] s { "Hello, Strings" }; // This would start size at 256, and auto increase size by 64

String[] s { "Hello, Strings" }; // This would make the string fixed to required size


String[] s { "bla" }; // String object is entirely on the stack

String s;
s = { "bla" }; // String survives outside its scope
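
For the fixed-size String[256] case, a range-checked stack string might look like the following C sketch. FixedString, fixed_init and fixed_append are hypothetical names, and truncating on overflow is an assumption; the note only says the functions would have range checking, so failing outright would be an equally valid policy.

```c
#include <stddef.h>
#include <string.h>

#define FIXED_CAP 256   /* mirrors String[256]; an assumption for this sketch */

typedef struct
{
   size_t len;              /* bytes in use, excluding the terminator */
   char text[FIXED_CAP];    /* always null-terminated */
} FixedString;

/* Appends as much of src as fits; returns the number of bytes actually copied. */
size_t fixed_append(FixedString * s, const char * src)
{
   size_t room = FIXED_CAP - 1 - s->len;
   size_t n = strlen(src);
   if(n > room) n = room;
   memcpy(s->text + s->len, src, n);
   s->len += n;
   s->text[s->len] = 0;
   return n;
}

/* Starts from an empty string, then appends src with the same bounds check. */
void fixed_init(FixedString * s, const char * src)
{
   s->len = 0;
   s->text[0] = 0;
   fixed_append(s, src);
}
```

The whole object, header and characters, lives in automatic storage, and s.text can still be handed to any C function expecting a char *.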

jerome

2009-05-06 03:21

administrator   ~0000110

Internationalization issue, e.g. replacement of Strings by internationalized versions will need to be taken into consideration

jerome

2009-05-06 03:44

administrator   ~0000111

Last edited: 2009-05-06 03:45

Strings are objects
they have a tiny little overhead
4-6 variables (e.g. a char * pointing either somewhere else or to the start of the string a few bytes later?)
which will always precede the character buffer
String s;
actually simply declares a pointer to such an object
String s { };
allocates such an object on the stack with an empty string
which will be decref'ed when it goes out of scope (according to the upcoming improved eC auto reference counting)
To allow a string to be fully on the stack, as with the world-renowned, performance-oriented C style char myString[256];
String[256] s { };
will be a fixed-size version that still automatically prevents buffer overflows.
String[] s { "Hello" }; is the same but getting the size from the string (akin to char myString[] = "Hello";)
Passing an eC string to a C API expecting a string simply passes the buffer
assigning a C string or a constant string literal will have a local String data structure to hold the extra data
and we can have the fancy notation for specifying auto increase size etc.
String[256+] String[+64] String[256+64]

we'll need to evaluate the possibility of storing the data a few bytes before, since reallocating would change the pointer
I think we actually only care about having it all in one place for fixed size strings

jerome

2013-05-02 03:15

administrator   ~0000776

char [12412] foo;

as syntactic sugar for:

Array<char> 12412 foo { };

jerome

2014-05-04 10:46

administrator   ~0001271

Last edited: 2014-05-08 08:13

Latest Development:

- We'll likely have full ref counting to enable things like PrintString and row.name not to leak [no need, we only need anonymous instances incref/decref]
- We'll use a 'struct' type to store strings so as to avoid the extra reference level, but we'll want destructors support...
- We'll support 3 modes: pointer, stack, heap
These examples illustrate the 3 modes:

String foo = "Hello!"; // Pointer
String foo { "Hello" }; // Heap
String<30> foo = "Hello";
or
String<30> foo { "Hello" }; // Stack
String<> foo { "Hello" }; // Here equivalent to String<5>

For the pointer mode, attempting to modify it would either fail or convert to heap model

String<30> will be like char foo[31], but will be aware of its size to avoid overflows.

- Support String foo { minSize = 30, maxSize = 200, string = "Hello" }; for heap strings
- We'll support + and += as syntactic sugar for string manipulation
- Mostly unrelated, but as mentioned above char [] a; should be syntactic sugar for Array<char> a { };
- char [30] a; should be syntactic sugar for Array<char> a { minAllocSize = 30 };
- Consider supporting foo[2..5] for syntactic sugar for substr
- Consider supporting "Hello" - 2 to strip the last 2 characters, as "Hello" + 2 would strip the first 2 characters
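
The substring and trimming sugar in the last two items could map onto C helpers like these. The names str_range, str_skip_first and str_drop_last are hypothetical, and treating foo[2..5] as inclusive at both ends is an assumption, since the note does not say.

```c
#include <stdlib.h>
#include <string.h>

/* Copy of bytes [from, to] of s, both ends inclusive (assumed meaning of foo[from..to]). */
char * str_range(const char * s, size_t from, size_t to)
{
   size_t len = strlen(s), n;
   char * r;
   if(from > len) from = len;
   if(to >= len) to = len ? len - 1 : 0;
   n = (len && to >= from) ? to - from + 1 : 0;
   r = malloc(n + 1);
   if(!r) return NULL;
   memcpy(r, s + from, n);
   r[n] = 0;
   return r;
}

/* "Hello" + n: skipping the first n bytes needs no copy, only a pointer offset. */
const char * str_skip_first(const char * s, size_t n)
{
   size_t len = strlen(s);
   return s + (n < len ? n : len);
}

/* "Hello" - n: dropping the last n bytes needs a copy (or an in-place terminator). */
char * str_drop_last(const char * s, size_t n)
{
   size_t len = strlen(s);
   size_t keep = n < len ? len - n : 0;
   char * r = malloc(keep + 1);
   if(!r) return NULL;
   memcpy(r, s, keep);
   r[keep] = 0;
   return r;
}
```

The asymmetry is worth noting for the design: + n is free because it is plain pointer arithmetic, whereas - n has to either copy or modify the buffer.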

jerome

2014-05-04 10:54

administrator   ~0001272

Latest code snippet:

// TODO: Will want it to be on the stack but be ref counted and have constructors, destructors
public enum StringAllocType { pointer, stack, heap };

public class ZString
{
   char * _string;
   int len;
   StringAllocType allocType;
   int size;
   int minSize;
   int maxSize;

   ZString()
   {
      maxSize = MAXINT;
   }

   ~ZString()
   {
      if(allocType == heap)
         delete _string;
   }

   void copyString(char * value, int newLen)
   {
      if(allocType == pointer)
      {
         size = 0;
         _string = null;
         allocType = heap;
      }
      if(allocType == heap)
      {
         int newSize = newLen ? newLen + 1 : 0;
         if(newSize != size)
         {
            if(newSize < minSize) newSize = minSize;
            else if(newSize > maxSize) newSize = maxSize;

            if(newSize && size)
               _string = renew _string char[newSize];
            else if(newSize)
               _string = new char[newSize];
            else
               delete _string;
            size = newSize;
         }
      }
      if(newLen + 1 > size) newLen = size ? size - 1 : 0;
      len = newLen;

      if(value && _string)
      {
         memcpy(_string, value, newLen);
         _string[newLen] = 0;
      }
   }

public:

   char * OnGetString(char * tempString, void * fieldData, bool * needClass)
   {
      return _string;
   }

   bool OnGetDataFromString(char * string)
   {
      property::string = string;
      return true;
   }

   property char * string
   {
      set { copyString(value, value ? strlen(value) : 0); }
      get { return _string; }
   }

   property char *
   {
      get { return _string; }
      set
      {
         return
         {
            len = value ? strlen(value) : 0;
            _string = value;
            allocType = pointer;
         };
      }
   }

   void concat(ZString s)
   {
      if(s && allocType != pointer)
      {
         int addedLen = s.len;
         int newLen = len + addedLen;
         if(allocType == heap && newLen + 1 > size)
         {
            int newSize = newLen + 1;
            if(newSize > maxSize)
               newSize = maxSize;
            if(newSize > size)
            {
               _string = renew _string char[newSize];
               size = newSize;
            }
         }
         if(newLen + 1 > size)
            addedLen = size - 1 - len;
         if(addedLen > 0)
         {
            memcpy(_string + len, s._string, addedLen);
            len += addedLen;
            _string[len] = 0;
         }
      }
   }

   void copy(ZString s)
   {
      copyString(s._string, s.len);
   }
};
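
The snippet above makes concat a no-op on pointer-mode strings. The alternative mentioned earlier, converting to the heap model on first modification, could look roughly like this in C. ModalString and the modal_* functions are hypothetical names used only for this sketch.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { MODE_POINTER, MODE_HEAP } Mode;

typedef struct
{
   char * text;
   size_t len;
   Mode mode;
} ModalString;

/* Pointer mode: borrows the caller's characters without copying. */
ModalString modal_from_literal(const char * literal)
{
   ModalString s;
   s.text = (char *)literal;   /* never written through while in pointer mode */
   s.len = strlen(literal);
   s.mode = MODE_POINTER;
   return s;
}

/* Appends src, first promoting a pointer-mode string to an owned heap copy.
   Returns 0 on allocation failure, leaving the string unchanged. */
int modal_append(ModalString * s, const char * src)
{
   size_t add = strlen(src);
   char * grown;
   if(s->mode == MODE_POINTER)
   {
      grown = malloc(s->len + add + 1);
      if(!grown) return 0;
      memcpy(grown, s->text, s->len);
   }
   else
   {
      grown = realloc(s->text, s->len + add + 1);
      if(!grown) return 0;
   }
   memcpy(grown + s->len, src, add + 1);
   s->text = grown;
   s->len += add;
   s->mode = MODE_HEAP;
   return 1;
}

/* Releases the buffer only if this string owns it. */
void modal_free(ModalString * s)
{
   if(s->mode == MODE_HEAP) free(s->text);
   s->text = NULL;
   s->len = 0;
   s->mode = MODE_POINTER;
}
```

The literal itself is never written to; the first append pays for one copy and every later append is an ordinary realloc.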

jerome

2014-05-08 05:15

administrator   ~0001277

Last edited: 2014-05-08 08:11

Just realized that a String carrying a reference count can't live in auto storage; it would always have to be on the heap.
Otherwise each reference holds its own copy of the count, and updating one reference will not update the ref count in the other references...
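
The problem can be reproduced in a few lines of C. In this sketch (all type and function names are hypothetical), InlineCounted keeps the count inside the struct, so copying the struct forks the count, while SharedCounted keeps it in a heap block that every handle points to.

```c
#include <stdlib.h>

/* Count stored by value: every struct copy carries its own, independent counter. */
typedef struct { const char * text; int refs; } InlineCounted;

/* Count stored in a shared heap block: all handles see the same counter. */
typedef struct { int refs; } Control;
typedef struct { const char * text; Control * ctl; } SharedCounted;

SharedCounted shared_new(const char * text)
{
   SharedCounted s;
   s.text = text;
   s.ctl = malloc(sizeof(Control));
   if(s.ctl) s.ctl->refs = 1;
   return s;
}

/* Returns a second handle to the same control block, bumping the shared count. */
SharedCounted shared_ref(SharedCounted s)
{
   if(s.ctl) s.ctl->refs++;
   return s;
}

/* Returns the remaining count; frees the control block when it reaches zero. */
int shared_unref(SharedCounted * s)
{
   int left;
   if(!s->ctl) return 0;
   left = --s->ctl->refs;
   if(!left) free(s->ctl);
   s->ctl = NULL;
   return left;
}
```

After InlineCounted b = a; b.refs++; the original a still reports a count of 1, which is exactly the inconsistency described above. The shared variant stays consistent, at the cost of a heap allocation.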

jerome

2014-05-09 17:09

administrator   ~0001281

We'll probably want to support references so as to be able to do:

String b { "a" };
String & a = b; // To avoid copying b...
a += "b"; // To also modify b

It will simplify returning multiple values as well if we support references for basic C types.

Might also want:

a += 1 or a++ // To trim first char
a -= 1 or a-- to trim last char

a[0] or *a to obtain first char
unichar ch = a[x] to obtain unichar at byte position x

for(ch : a) to go through all chars (bytes)
for(unichar ch : a) to go through all chars as Unicode characters
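
Assuming strings are stored as UTF-8, reading a unichar at a byte position and iterating by Unicode character could be sketched in C as follows. utf8_at and utf8_count are hypothetical names; the decoder rejects bad lead and continuation bytes but, to stay short, does not reject overlong encodings or surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes the UTF-8 sequence starting at s[pos]; stores its byte length in *advance.
   Returns 0xFFFD (replacement character) for malformed input, advancing one byte. */
uint32_t utf8_at(const char * s, size_t pos, size_t * advance)
{
   const unsigned char * p = (const unsigned char *)s + pos;
   uint32_t ch;
   size_t n, i;

   if(p[0] < 0x80)                { ch = p[0];        n = 1; }
   else if((p[0] & 0xE0) == 0xC0) { ch = p[0] & 0x1F; n = 2; }
   else if((p[0] & 0xF0) == 0xE0) { ch = p[0] & 0x0F; n = 3; }
   else if((p[0] & 0xF8) == 0xF0) { ch = p[0] & 0x07; n = 4; }
   else { *advance = 1; return 0xFFFD; }

   for(i = 1; i < n; i++)
   {
      /* A terminator or any non-continuation byte ends the sequence early. */
      if((p[i] & 0xC0) != 0x80) { *advance = 1; return 0xFFFD; }
      ch = (ch << 6) | (p[i] & 0x3F);
   }
   *advance = n;
   return ch;
}

/* Counts code points: the loop that for(unichar ch : a) would run. */
size_t utf8_count(const char * s)
{
   size_t pos = 0, count = 0, step;
   while(s[pos])
   {
      utf8_at(s, pos, &step);
      pos += step;
      count++;
   }
   return count;
}
```

Indexing by byte position keeps a[x] constant-time, while the unichar loop advances by however many bytes each character occupies.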

jerome

2014-05-09 17:25

administrator   ~0001282

Last edited: 2014-05-09 23:35

Re: String a = b;

- Disallow this syntax, the = should be used with an anonymous instance, return value, or conversion property on the right side (a new String, { b } to copy an existing string) -- Instead of disallowing, a = b could be equivalent to a = { b } in the other cases.
- a will not free itself, as it will normally be assumed to be uninitialized prior to doing this, unless it was declared as an instance.
- It must be possible to pass an instance so that it will get autofreed (0'ed or stack or literal, or it will leak):
String a { };
a = 123.getString();

- This does not apply to String a = "Foo"; // This is fine

- a would have to be deleted if not an instance

String a;
a = 123.getString();
delete a;

- It would be nice to have an easy way to copy a class instance, e.g. where:

Point a { 1, 2 };
Point b = a;

Works for struct... For instances:

Object a { 1, 2 };
Object b { :a };

could do the equivalent, still freeing at end of scope.

- It would be nice to have an instance finalizer which is run after setting the members. This would allow code that would otherwise be in the constructor to be moved to the finalizer, and thus run only once after the copy, as opposed to once in the constructor and again after the copy. The copy itself could also skip or manually call the constructor.

jerome

2014-05-09 17:35

administrator   ~0001283

Last edited: 2014-05-09 17:58

void MyFunction(String a)
{
   a = "a" + "b"; // Conversion properties should act like anonymous instances in regard to ref counts/destruction. Thus "a" gets converted to a String, gets "b" concatenated, and the result goes to a without a copy.

   PrintLn("a" + "b"); // Here since there's no equal the string gets destructed on exit. A method wanting to keep a string passed to it -- see note below.
}

jerome

2014-05-09 17:55

administrator   ~0001284

Last edited: 2014-05-09 19:17

We may want a special flag for anonymous instances / conversion properties strings which a function could take ownership of to avoid a useless extra copy e.g.:

Button btn { caption = "You entered: " + (String)amount.data };

property String caption
{
   set
   {
      // The copy method here would be smart enough to reuse the memory and flag the input value struct that it must not delete.
      caption = { value };
   }
}
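
The ownership flag could work along these lines in C. OwnedString, isTemporary and the owned_* functions are hypothetical; the sketch assumes the flag is set on anonymous instances and conversion results, and that a setter may steal the buffer of any value carrying it.

```c
#include <stdlib.h>
#include <string.h>

typedef struct
{
   char * text;
   int isTemporary;   /* hypothetical flag: set for anonymous instances / conversion results */
} OwnedString;

/* Builds a heap string flagged as a temporary, as "a" + "b" might. */
OwnedString owned_temp(const char * text)
{
   OwnedString s;
   size_t n = strlen(text) + 1;
   s.text = malloc(n);
   if(s.text) memcpy(s.text, text, n);
   s.isTemporary = 1;
   return s;
}

/* Setter: steals the buffer from a temporary, copies from anything else. */
void owned_assign(OwnedString * dst, OwnedString * src)
{
   if(dst == src) return;
   free(dst->text);
   if(src->isTemporary)
   {
      dst->text = src->text;   /* reuse the memory */
      src->text = NULL;        /* the source must no longer delete it */
   }
   else if(src->text)
   {
      size_t n = strlen(src->text) + 1;
      dst->text = malloc(n);
      if(dst->text) memcpy(dst->text, src->text, n);
   }
   else
      dst->text = NULL;
   dst->isTemporary = 0;
}

void owned_free(OwnedString * s) { free(s->text); s->text = NULL; }
```

Because the source is left with a null pointer, its normal end-of-scope destruction becomes harmless and the caption setter avoids the extra copy.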

jerome

2014-05-09 18:46

administrator   ~0001285

Last edited: 2014-05-09 19:09

String a { "Foo" };
String b { a };


We'll want the latter to go through a more efficient path than getting the char * string and creating a new heap string, since in this case b could be marked as a literal as well.

jerome

2014-05-09 19:04

administrator   ~0001286

Last edited: 2014-05-09 19:35

- Support begins with (^), ends with ($), contains (~), exact (=), case insensitive variants: ?^ ?$ ?~ ?=

String fn = "test.jpg";
String<MAX_EXTENSION> ext = fn.ext;
if(ext ?= "jpg")

property String ext
{
   get
   {
      GetExtension(this, value);

      // or:
      return value % '.' + 1;
   }
}
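
In C terms, the proposed comparison operators would presumably reduce to helpers like the ones below. The function names are hypothetical, and the case-insensitive variant only folds case through tolower, so it is ASCII-oriented in the default locale rather than Unicode-aware.

```c
#include <ctype.h>
#include <string.h>

/* s ^ prefix: begins with */
int str_begins_with(const char * s, const char * prefix)
{
   return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* s $ suffix: ends with */
int str_ends_with(const char * s, const char * suffix)
{
   size_t ls = strlen(s), lx = strlen(suffix);
   return lx <= ls && memcmp(s + ls - lx, suffix, lx) == 0;
}

/* s ~ needle: contains */
int str_contains(const char * s, const char * needle)
{
   return strstr(s, needle) != NULL;
}

/* a ?= b: exact match ignoring case */
int str_equals_nocase(const char * a, const char * b)
{
   while(*a && *b)
   {
      if(tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
      a++;
      b++;
   }
   return *a == *b;   /* both must end at the same point */
}
```

With a stored length, as the String design intends, the strlen calls disappear and the ends-with test becomes a single memcmp.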

jerome

2014-05-09 19:11

administrator   ~0001287

- Watch out for literals dying on module unload?

jerome

2014-05-09 20:21

administrator   ~0001288

Last edited: 2014-05-09 21:07

Return struct in a parameter to allow having storage?

String GetExtension(String fn)
{
   return fn % '.' + 1;
}

becomes something like (very raw, ignore many mistakes/omissions):

void GetExtension(String output, String fn)
{
   char * tmp = RSearchChar(fn.string, '.');
   if(tmp) tmp++;
   output = { tmp };
}
----

More fun operators for search & trimming:

struct FileName : String
{
   property String ext { get { return this % '.' + 1; } }
   property FileName lastDir { get { return this % ['/','\\']; } }
   void StripLastDir() { this -%= ['/','\\']; }
   property FileName firstDir { get { return this -/ ['/','\\']; } }
   void StripFirstDir() { this = this / ['/','\\'] + 1; }
};

property FileName firstDir
{
   get
   {
      char * tmp = SearchChar(string, '/');
      if(tmp)
         output = substr(0, tmp - string);
      else
         output = this;
   }
}
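
The "return the struct through a parameter" idea corresponds to the familiar C pattern of a caller-supplied output buffer. A minimal sketch, with hypothetical names get_extension, get_first_dir and copy_bounded, and with the buffer size passed explicitly where the eC String would carry it:

```c
#include <stddef.h>
#include <string.h>

/* Copies at most outSize - 1 bytes of src[0..n) into out and terminates it. */
static void copy_bounded(char * out, size_t outSize, const char * src, size_t n)
{
   if(!outSize) return;
   if(n > outSize - 1) n = outSize - 1;
   memcpy(out, src, n);
   out[n] = 0;
}

/* Text after the last '.', or an empty string when there is none. */
void get_extension(char * out, size_t outSize, const char * fn)
{
   const char * dot = strrchr(fn, '.');
   const char * ext = dot ? dot + 1 : "";
   copy_bounded(out, outSize, ext, strlen(ext));
}

/* Text before the first '/' or '\\', or the whole string when there is none. */
void get_first_dir(char * out, size_t outSize, const char * path)
{
   copy_bounded(out, outSize, path, strcspn(path, "/\\"));
}
```

The caller decides where the storage lives, for example char ext[8]; get_extension(ext, sizeof(ext), "test.jpg");, so nothing is allocated. This simple version does not guard against a '.' that appears in a directory name rather than the file name.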

jerome

2014-05-09 23:41

administrator   ~0001289

property String ext
{
   get
   {
      return find('.') + 1;
   }
}

property FileName firstDir
{
   get
   {
      String slash = find(['/', '\\']);
      if(slash)
         output = this[0..slash - this - 1];
      else
         return this;
   }
}

jerome

2014-05-10 17:30

administrator   ~0001290

Something like String<64*=2> , String<64*=2..16384> , String<64+=2>, String<64..128>, String<..1024>, String<1024> (This one is on the stack, like a char[1024]), String<*=1.5>
The default would be whatever makes sense
Perhaps setting the string to a specific value would be an exact size but as soon as you start concatenating it would trigger this logic
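
One way to read these size annotations is as a small growth policy consulted whenever a concatenation needs more room. The C sketch below is an interpretation, not a specification: GrowthPolicy, its fields and grow_size are hypothetical, String<64*=2..16384> is taken to mean "start at 64, double, never exceed 16384", and the fractional factor is expressed as a ratio so that *=1.5 becomes 3/2.

```c
#include <stddef.h>

/* Hypothetical description of a String<start *= factor .. max> style policy. */
typedef struct
{
   size_t start;                  /* initial allocation in bytes */
   size_t step;                   /* additive increment, used when no factor is given */
   size_t factorNum, factorDen;   /* multiplicative growth as a ratio, e.g. 3/2 for *=1.5 */
   size_t max;                    /* upper bound on the allocation; 0 means unbounded */
} GrowthPolicy;

/* Smallest size the policy reaches from 'current' that holds 'needed' bytes,
   or 0 if 'needed' exceeds the maximum. */
size_t grow_size(const GrowthPolicy * p, size_t current, size_t needed)
{
   size_t size = current ? current : p->start;
   if(p->max && needed > p->max) return 0;
   while(size < needed)
   {
      size_t next = (p->factorNum && p->factorDen)
         ? size * p->factorNum / p->factorDen
         : size + p->step;
      if(next <= size) next = needed;   /* guard against a policy that cannot grow */
      size = next;
   }
   if(p->max && size > p->max) size = p->max;
   return size;
}
```

For example, a policy of { 64, 0, 2, 1, 16384 } takes a 64-byte string that needs 100 bytes to 128, and { 256, 64, 0, 0, 0 } takes a 256-byte string that needs 300 bytes to 320.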

"This " + "Bla" + "That"... This should probably be smart enough to allocate only once

Issue History

Date Modified Username Field Change
2009-05-03 05:55 jerome New Issue
2009-05-06 02:20 Scott Note Added: 0000105
2009-05-06 02:21 jerome Note Added: 0000106
2009-05-06 02:21 jerome Note Added: 0000107
2009-05-06 02:25 jerome Description Updated
2009-05-06 02:29 jerome Description Updated
2009-05-06 02:33 jerome Note Edited: 0000106
2009-05-06 02:41 jerome Note Added: 0000108
2009-05-06 02:43 jerome Note Edited: 0000108
2009-05-06 02:46 jerome File Added: jstring.tar.bz2
2009-05-06 03:21 jerome Note Added: 0000109
2009-05-06 03:21 jerome Note Added: 0000110
2009-05-06 03:25 jerome Note Edited: 0000109
2009-05-06 03:28 jerome Note Edited: 0000109
2009-05-06 03:44 jerome Note Added: 0000111
2009-05-06 03:45 jerome Note Edited: 0000111
2010-07-28 15:18 jerome Priority normal => high
2010-08-19 01:13 jerome Relationship added related to 0000318
2012-03-08 16:52 redj Target Version => 0.45 Ginkakuji
2012-03-29 07:53 redj Category => Ecere Runtime Library
2012-03-29 07:53 redj Project @1@ => Ecere SDK
2013-04-25 09:24 jerome Target Version 0.45 Ginkakuji => 0.44.4 Strings
2013-05-02 03:15 jerome Note Added: 0000776
2014-05-04 10:46 jerome Note Added: 0001271
2014-05-04 10:46 jerome Relationship added related to 0000513
2014-05-04 10:54 jerome Note Edited: 0001271
2014-05-04 10:54 jerome Note Added: 0001272
2014-05-04 11:02 jerome Note Edited: 0001271
2014-05-08 05:15 jerome Note Added: 0001277
2014-05-08 08:11 jerome Note Edited: 0001277
2014-05-08 08:13 jerome Note Edited: 0001271
2014-05-09 17:09 jerome Note Added: 0001281
2014-05-09 17:25 jerome Note Added: 0001282
2014-05-09 17:35 jerome Note Added: 0001283
2014-05-09 17:55 jerome Note Added: 0001284
2014-05-09 17:56 jerome Note Edited: 0001284
2014-05-09 17:58 jerome Note Edited: 0001283
2014-05-09 18:24 jerome Note Edited: 0001282
2014-05-09 18:30 jerome Note Edited: 0001282
2014-05-09 18:32 jerome Note Edited: 0001282
2014-05-09 18:46 jerome Note Added: 0001285
2014-05-09 18:48 jerome Note Edited: 0001282
2014-05-09 18:49 jerome Note Edited: 0001282
2014-05-09 19:04 jerome Note Added: 0001286
2014-05-09 19:09 jerome Note Edited: 0001285
2014-05-09 19:11 jerome Note Added: 0001287
2014-05-09 19:17 jerome Note Edited: 0001284
2014-05-09 19:20 jerome Note Edited: 0001282
2014-05-09 19:35 jerome Note Edited: 0001286
2014-05-09 20:21 jerome Note Added: 0001288
2014-05-09 20:23 jerome Note Edited: 0001288
2014-05-09 20:41 jerome Note Edited: 0001288
2014-05-09 21:07 jerome Note Edited: 0001288
2014-05-09 23:26 jerome Note Edited: 0001282
2014-05-09 23:35 jerome Note Edited: 0001282
2014-05-09 23:41 jerome Note Added: 0001289
2014-05-10 17:30 jerome Note Added: 0001290
2014-07-11 18:12 jerome Target Version 0.44.30 Strings => 0.45 Ginkakuji