Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory leak in lmdb.cc + improve performance while reading collections in resolveMultiMatches #2520

Open
wants to merge 4 commits into
base: v3/master
Choose a base branch
from

Conversation

ziollek
Copy link
Contributor

@ziollek ziollek commented Feb 24, 2021

What have been done ?

  1. Removed memory leak in lmdb.cc while passing data from MDB_val to VariableValue:
  • added new VariableValue constructors for rvalue strings
  • decreased memory footprint (just one data copying between lmdb and VariableValue thanks to above constructors)
  1. Optimized fetching collection in resolveMultiMatches:
  • at the beginning, the cursor is moved to a key equal to or greater than the input key (MDB_SET_RANGE)
  • values are fetched via cursor as long as the input key is equal to the key fetched from lmdb

ziollek added 3 commits Feb 24, 2021
…ableValue

- add new VariableValue constructors for rvalue strings
- decrease number of memory copies between lmdb and VariableValue
- move cursor to first key equal or greather than input key (MDB_SET_RANGE)
- break while if input key is not equal key fetched from lmdb
@zimmerle zimmerle self-requested a review Feb 26, 2021
@zimmerle
Copy link
Contributor

@zimmerle zimmerle commented Feb 26, 2021

Hi @ziollek, thanks for the contribution.

I am wondering how to fit the rvalue change on VariableValue of the 3.1-experimental development branch -

* Use cases for VariableValue creation:
*
* AnchoredSet - Use case A (eg. ARGS). - Collection + Key
* Anchored - Use case B (eg. REQUEST_URI). - Key
* Custom - Use case C (eg. WEBAPP_ID). - Key
* CustomSet
* Fixed - Use case D (eg. TX). - Collection + Key
* Dynamic - Use case E (eg. ENV). - Collection + Key
*

Here is the key + value constructor -

explicit VariableValue(const std::string *collection,
const std::string *value = nullptr)
: m_origin(),
m_value(),
m_valueHolder(new std::string(value != nullptr?*value:"")), // FIXME: do we really need a copy here?
m_key(nullptr),
m_keyHolder(nullptr),
m_collection(collection)
{
m_value = m_valueHolder.get();
};

Here is the collection + key + value -

/* Use case D.1. - ARGS */
VariableValue(const std::string *collection,
std::unique_ptr<std::string> key,
std::unique_ptr<std::string> value)
: m_origin(),
m_value(nullptr),
m_valueHolder(std::move(value)),
m_key(nullptr),
m_keyHolder(std::move(key)),
m_collection(collection)
{
m_value = m_valueHolder.get();
m_key = m_keyHolder.get();
};
/* Use case D.2. - RULE */
VariableValue(const std::string *collection,
const std::string *key,
std::unique_ptr<std::string> value)
: m_origin(),
m_value(nullptr),
m_valueHolder(std::move(value)),
m_key(key),
m_keyHolder(nullptr),
m_collection(collection)
{
m_value = m_valueHolder.get();
};
/* Use case D.3. - TX */
VariableValue(const std::string *collection,
const std::string *key,
const std::string *value)
: m_origin(),
m_value(value),
m_valueHolder(nullptr),
m_key(key),
m_keyHolder(nullptr),
m_collection(collection)
{ };
// FIXME: It maybe the case for VariableValue to use string_view for everything.
/* Use case D.4. - MATCHED_VARS */
VariableValue(const std::string *collection,
const std::string *key,
const bpstd::string_view *value)
: m_origin(),
m_value(),
m_valueHolder(std::unique_ptr<std::string>(new std::string(value->c_str()))),
m_key(key),
m_keyHolder(nullptr),
m_collection(collection)
{
m_value = m_valueHolder.get();
};
/* Use case E.1. - Env */
VariableValue(std::unique_ptr<std::string> value,
std::unique_ptr<std::string> key,
std::shared_ptr<std::string> collection)
: m_origin(),
m_value(nullptr),
m_valueHolder(std::move(value)),
m_key(nullptr),
m_keyHolder(std::move(key)),
m_collection(collection.get())
{
m_value = m_valueHolder.get();
m_key = m_keyHolder.get();
};

Giving that changes on VariableValue, we already have the LMDB in this shape -

while ((rc = mdb_cursor_get(cursor, &key, &data, MDB_NEXT)) == 0) {
l->insert(l->begin(), std::make_shared<VariableValue>(
&m_name,
new std::string(reinterpret_cast<char *>(key.mv_data),
key.mv_size),
new std::string(reinterpret_cast<char *>(data.mv_data),
data.mv_size)));
}
} else {
while ((rc = mdb_cursor_get(cursor, &key, &data, MDB_NEXT)) == 0) {
char *a = reinterpret_cast<char *>(key.mv_data);
if (strncmp(var.c_str(), a, keySize) == 0) {
l->insert(l->begin(), std::make_shared<VariableValue>(
&m_name,
new std::string(reinterpret_cast<char *>(key.mv_data),
key.mv_size),
new std::string(reinterpret_cast<char *>(data.mv_data),
data.mv_size)));
}

The benefit of MDB_SET_RANGE change is very clear. What do you think about the changes that we have on 3.1-experimental?

@zimmerle zimmerle self-assigned this Feb 26, 2021
@zimmerle zimmerle added the 3.x label Feb 26, 2021
@ziollek
Copy link
Contributor Author

@ziollek ziollek commented Feb 26, 2021

Hi @zimmerle - thanks for your comments.

I wasn't aware of VariableValue changes in 3.1-experimental. Despite of internal changes in variable_value.h i'm afraid, there is still a memory leak in lmdb.cc - because of using new std::string. In effect, below constructor is called:

/* Use case D.3. - TX */
VariableValue(const std::string *collection,
const std::string *key,
const std::string *value)
: m_origin(),
m_value(value),
m_valueHolder(nullptr),
m_key(key),
m_keyHolder(nullptr),
m_collection(collection)

Above constructor (unlike constructors quoted in your comment) don't 'move' allocated in lmdb memory - it just copies it.

Luckily it could be easily fixed by replacing
new std::string(...) to std::make_unique<std::string>(...)

@zimmerle
Copy link
Contributor

@zimmerle zimmerle commented Mar 19, 2021

Different methods from lmdb back-end are performing this string copy.

resolveFirst -

ret = std::unique_ptr<std::string>(new std::string(
reinterpret_cast<char *>(mdb_value_ret.mv_data),
mdb_value_ret.mv_size));

resolveSingleMatch -

std::string *a = new std::string(
reinterpret_cast<char *>(mdb_value_ret.mv_data),
mdb_value_ret.mv_size);
VariableValue *v = new VariableValue(&var, a);

resolveMultiMatches -

l->insert(l->begin(), new VariableValue(
&m_name,
new std::string(reinterpret_cast<char *>(key.mv_data),
key.mv_size),
new std::string(reinterpret_cast<char *>(data.mv_data),
data.mv_size)));

and

l->insert(l->begin(), new VariableValue(
&m_name,
new std::string(reinterpret_cast<char *>(key.mv_data),
key.mv_size),
new std::string(reinterpret_cast<char *>(data.mv_data),
data.mv_size)));

resolveRegularExpression -

VariableValue *v = new VariableValue(
new std::string(reinterpret_cast<char *>(key.mv_data),
key.mv_size),
new std::string(reinterpret_cast<char *>(data.mv_data),
data.mv_size));

They are using the constructors -

Method Constructor Leak on v3/master
resolveFirst
explicit VariableValue(const std::string *key,
const std::string *value = nullptr)
: m_collection(""),
m_key(*key),
m_keyWithCollection(*key),
m_value(value != nullptr?*value:"")
{ }
Likely
resolveSingleMatch
explicit VariableValue(const std::string *key,
const std::string *value = nullptr)
: m_collection(""),
m_key(*key),
m_keyWithCollection(*key),
m_value(value != nullptr?*value:"")
{ }
Likely
resolveMultiMatches (a)
VariableValue(const std::string *collection,
const std::string *key,
const std::string *value)
: m_collection(*collection),
m_key(*key),
m_keyWithCollection(*collection + ":" + *key),
m_value(*value)
{ }
likely
resolveMultiMatches (b)
VariableValue(const std::string *collection,
const std::string *key,
const std::string *value)
: m_collection(*collection),
m_key(*key),
m_keyWithCollection(*collection + ":" + *key),
m_value(*value)
{ }
Likely
resolveRegularExpression
explicit VariableValue(const std::string *key,
const std::string *value = nullptr)
: m_collection(""),
m_key(*key),
m_keyWithCollection(*key),
m_value(value != nullptr?*value:"")
{ }
Likely

The class VariableValue has four std::string members: m_collection, m_key, m_keyWithCollection, and m_value the attribution of their values happens on the constructor for VariableValue. m_value could be also set via setValue. All attributions are based on the std::string copy constructor.

Later the VariableValue is used at -

Transaction

std::vector<const VariableValue *> l;
m_variableRequestHeaders.resolve(&l);
for (auto &h : l) {
fullRequest = fullRequest + h->getKey() + ": " + h->getValue() + "\n";
delete h;
}

ModSecurity/src/transaction.cc

Lines 1533 to 1539 in 4127c1b

m_variableRequestHeaders.resolve(&l);
for (auto &h : l) {
size_t pos = strlen("REQUEST_HEADERS:");
audit_log << h->getKeyWithCollection().c_str() + pos << ": ";
audit_log << h->getValue().c_str() << std::endl;
delete h;
}

ModSecurity/src/transaction.cc

Lines 1571 to 1576 in 4127c1b

m_variableResponseHeaders.resolve(&l);
for (auto &h : l) {
audit_log << h->getKey().c_str() << ": ";
audit_log << h->getValue().c_str() << std::endl;
delete h;
}

ModSecurity/src/transaction.cc

Lines 1670 to 1674 in 4127c1b

m_variableRequestHeaders.resolve(&l);
for (auto &h : l) {
LOGFY_ADD(h->getKey().c_str(), h->getValue().c_str());
delete h;
}

ModSecurity/src/transaction.cc

Lines 1700 to 1704 in 4127c1b

m_variableResponseHeaders.resolve(&l);
for (auto &h : l) {
LOGFY_ADD(h->getKey().c_str(), h->getValue().c_str());
delete h;
}

RunTimeString

z->m_var->evaluate(t, rr, &l);
if (l.size() > 0) {
s.append(l[0]->getValue());
}

RuleWithOperator

var->evaluate(trans, this, &e);
for (const VariableValue *v : e) {
const std::string &value = v->getValue();
const std::string &key = v->getKeyWithCollection();
if (exclusion.contains(v) ||
std::find_if(trans->m_ruleRemoveTargetById.begin(),
trans->m_ruleRemoveTargetById.end(),
[&, v, this](std::pair<int, std::string> &m) -> bool {
return m.first == m_ruleId && m.second == v->getKeyWithCollection();
}) != trans->m_ruleRemoveTargetById.end()
) {
delete v;
v = NULL;
continue;
}
if (exclusion.contains(v) ||
std::find_if(trans->m_ruleRemoveTargetByTag.begin(),
trans->m_ruleRemoveTargetByTag.end(),
[&, v, trans, this](std::pair<std::string, std::string> &m) -> bool {
return containsTag(m.first, trans) && m.second == v->getKeyWithCollection();
}) != trans->m_ruleRemoveTargetByTag.end()
) {
delete v;
v = NULL;
continue;
}
TransformationResults values;
executeTransformations(trans, value, values);
for (const auto &valueTemp : values) {
bool ret;
std::string valueAfterTrans = std::move(*valueTemp.first);
ret = executeOperatorAt(trans, key, valueAfterTrans, ruleMessage);
if (ret == true) {
ruleMessage->m_match = m_operator->resolveMatchMessage(trans,
key, value);
for (auto &i : v->getOrigin()) {
ruleMessage->m_reference.append(i->toText());
}
ruleMessage->m_reference.append(*valueTemp.second);
updateMatchedVars(trans, key, valueAfterTrans);
executeActionsIndependentOfChainedRuleResult(trans,
&containsBlock, ruleMessage);
performLogging(trans, ruleMessage, false);
globalRet = true;
}
}
delete v;
v = NULL;
}
e.clear();
e.reserve(4);
}

As an alternative the inMemoryBackEnd back-end for collections does not duplicates/copy the strings.

Investigating the best approach to tackle this without too much modification on the v3/master code.

data.mv_size)));
}
string2val(var, &key);
rc = mdb_cursor_get(cursor, &key, &data, MDB_SET_RANGE);
Copy link

@sobczak-m sobczak-m Mar 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ziollek replace MDB_SET_RANGE to MDB_SET_KEY
MDB_SET_RANGE - getting first key greater than or equal to specified key http://www.lmdb.tech/doc/group__mdb.html#ga1206b2af8b95e7f6b0ef6b28708c9127
If you are looking for none exist samplekey you get key containing samplekey like samplekey2 (if such key exist)

Copy link
Contributor

@zimmerle zimmerle May 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

} else {
break;
}
} while ((rc = mdb_cursor_get(cursor, &key, &data, MDB_NEXT)) == 0);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace MDB_NEXT to MDB_NEXT_DUB to iterate only over current key

Copy link
Contributor

@zimmerle zimmerle May 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zimmerle zimmerle added this to Todo in v3.0.5 May 18, 2021
@zimmerle zimmerle moved this from Todo to WiP in v3.0.5 May 24, 2021
@zimmerle zimmerle removed this from WiP in v3.0.5 May 28, 2021
@zimmerle zimmerle added this to In progress in v3.1.0 via automation May 28, 2021
@zimmerle zimmerle added this to Todo in v3.0.5 May 28, 2021
@zimmerle
Copy link
Contributor

@zimmerle zimmerle commented May 28, 2021

I see a benefit of having less copies around, studying the feasibility to have this pull request partially merged on the upcoming 3.0.5.

On QA - https://github.com/SpiderLabs/ModSecurity/actions/runs/883266568

@zimmerle zimmerle removed this from Todo in v3.0.5 Jun 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.x
Projects
v3.1.0
  
In progress
Development

Successfully merging this pull request may close these issues.

None yet

4 participants