First, let me address why I'm not using the built-in string interning. My code is used in a utility that needs to free all of its memory once it's closed. Interned strings are not likely to be freed until the CLR is restarted. I can't create that situation, since my utility is used on servers that primarily run .NET-based services. (See this reference on CLR not releasing interned strings)
On to the question:
I have created a custom string pool in C#, implemented as an extension method in a static class that contains the actual "pool" in the form of a dictionary:
internal static Dictionary<string, string> StringPool = new Dictionary<string, string>();

internal static string Pool(this string value)
{
    if (!string.IsNullOrEmpty(value) && value.Length <= 200)
    {
        string poolRef;
        if (StringPool.TryGetValue(value, out poolRef))
            value = poolRef;
        else
            StringPool.Add(value, value);
    }
    return value;
}
To use it, I simply append .Pool() to any string in my project that I want to be pooled. The assumption I am making is that this approach stores two copies of each string. My assumption comes from the fact that I can recover the actual strings by iterating the Keys in the StringPool object and printing or viewing their contents.
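A quick way to test that assumption is to compare references rather than contents. The sketch below wraps the pool from above in a class (the names StringPoolExtensions and Demo are hypothetical, added only so the example compiles) and uses ReferenceEquals to check two things: whether pooling two equal strings yields one shared instance, and whether each Key and its Value are the same object. Since string is a reference type, the dictionary holds references in both slots, so this check shows whether the character data is actually duplicated.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper class; the original post does not name it.
internal static class StringPoolExtensions
{
    internal static readonly Dictionary<string, string> StringPool =
        new Dictionary<string, string>();

    internal static string Pool(this string value)
    {
        if (!string.IsNullOrEmpty(value) && value.Length <= 200)
        {
            string poolRef;
            if (StringPool.TryGetValue(value, out poolRef))
                value = poolRef;              // reuse the pooled instance
            else
                StringPool.Add(value, value); // same reference as key and value
        }
        return value;
    }
}

internal static class Demo
{
    private static void Main()
    {
        // Build the strings at run time so they are distinct objects,
        // not compiler-merged literals.
        string a = new string('x', 10).Pool();
        string b = new string('x', 10).Pool();

        // Both calls return the instance stored by the first call.
        Console.WriteLine(ReferenceEquals(a, b)); // True

        // Key and Value of each entry refer to the same string object.
        foreach (KeyValuePair<string, string> entry in StringPoolExtensions.StringPool)
            Console.WriteLine(ReferenceEquals(entry.Key, entry.Value)); // True
    }
}
```

Being able to read the strings back from Keys does not by itself indicate a second copy; what the dictionary adds per entry is an extra reference slot plus its own bookkeeping.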
The question I have is whether anyone knows of another indexed object type that I could use as a StringPool object such that the string data would only be stored once, as the Key, rather than twice, as both Key and Value. Additionally, I need the fast Key lookup feature that a Dictionary has.
Here's the pseudo code that I believe conveys what I'm after in practice:
internal static CoolClass<string> StringPool = new CoolClass<string>();

internal static string Pool(this string s)
{
    if (!string.IsNullOrEmpty(s) && s.Length <= 200)
    {
        string keyRef = null;
        // obtain reference to key in CoolClass if it already exists
        if (StringPool.TryGetKey(s, out keyRef))
            s = keyRef;
        else
            StringPool.AddKey(s);
    }
    return s;
}
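One candidate with the shape of this pseudo code is HashSet<string>, sketched below under the assumption that the target runtime provides HashSet<T>.TryGetValue (.NET Framework 4.7.2 or later, or .NET Core 2.0 or later); on older runtimes this will not compile. TryGetValue plays the role of TryGetKey and Add plays the role of AddKey. The class name HashSetStringPool, the Demo class, and the choice of StringComparer.Ordinal are assumptions made for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class name; stands in for the static class in the question.
internal static class HashSetStringPool
{
    // Each pooled string is stored once, as a set element.
    internal static readonly HashSet<string> StringPool =
        new HashSet<string>(StringComparer.Ordinal);

    internal static string Pool(this string s)
    {
        if (!string.IsNullOrEmpty(s) && s.Length <= 200)
        {
            string keyRef;
            // Obtain the instance already in the set, if an equal one exists.
            if (StringPool.TryGetValue(s, out keyRef))
                s = keyRef;
            else
                StringPool.Add(s);
        }
        return s;
    }
}

internal static class Demo
{
    private static void Main()
    {
        // Distinct run-time instances with equal contents.
        string first = new string('a', 8).Pool();
        string second = new string('a', 8).Pool();

        Console.WriteLine(ReferenceEquals(first, second));       // True
        Console.WriteLine(HashSetStringPool.StringPool.Count);   // 1
    }
}
```

Lookup stays hash-based, as with Dictionary, and calling StringPool.Clear() when the utility shuts down drops the set's references so the strings become eligible for garbage collection.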