What is “Soft Coding”, really?

Question

In this article by Alex Papadimoulis, you can see this snippet:

private void attachSupplementalDocuments()
{
  if (stateCode == "AZ" || stateCode == "TX") {

    //SR008-04X/I are always required in these states
    attachDocument("SR008-04X");
    attachDocument("SR008-04XI");
  }

  if (ledgerAmnt >= 500000) {
    //Ledger of 500K or more requires AUTHLDG-1A
    attachDocument("AUTHLDG-1A");
  }

  if (coInsuredCount >= 5  && orgStatusCode != "CORP") {
    //Non-CORP orgs with 5 or more co-ins require AUTHCNS-1A
    attachDocument("AUTHCNS-1A");
  }
}

I really don't understand this article.

I quote:

If every business rule constant was stored in some configuration file, life would be much difficult for everyone maintaining the software: there’d be a lot of code files that shared one, big file (or, the converse, a whole lot of tiny configuration files); deploying changes to the business rules require not new code, but manually changing the configuration files; and debugging is that much more difficult.

This is an argument against having the "500000" constant integer in a configuration file, or the "AUTHCNS-1A" and other string constants.

How can this be a bad practice?

In this snippet, "500000" is not a number. It's not, for example, the same as:

int doubleMe(int a) { return a * 2;}

where 2, is a number that needs not be abstracted. Its use is obvious, and it does not represent something that may be reused later on.

On the contrary, "500000" is not simply a number. It's a significant value, one that represents the idea of a breakpoint in functionality. This number could be used in more than one place, but it's not the number that you're using; it's the idea of the limit/borderline, below which one rule applies, and above which another.

How is referring to it from a configuration file, or even a #define, const or whatever your language provides, worse than including its value? If later on the program, or some other programmer, also requires that borderline, so that the software makes another choice, you're screwed (because when it changes, nothing guarantees you that it will change in both files). That's clearly worse for debugging.

In addition, if tomorrow, the government demands "From 5/3/2050, you need to add AUTHLDG-122B instead of AUTHLDG-1A", this string constant is not a simple string constant. It's one that represents an idea; it's just the current value of that idea (which is "the thing that you add if the ledger is above 500k").

Let me clarify. I'm not saying that the article is wrong; I just don't get it; maybe it's not too well explained (at least for my thinking).

I do understand that replacing every possible string literal or numerical value with a constant, define, or configuration variable, is not only not necessary, but overcomplicates things, but this particular example does not seem to fall under this category. How do you know that you will not need it later on? Or someone else for that matter?

Play the puzzle: what would be a good name for those numbers? I think you'll find that either the name adds no value whatsoever, or it describes everything the code is already describing and often while adding ambiguity ("LedgerLimitForAuthDlg1A"?). I found the article brilliant exactly because of how relevant is this. I've maintained systems that have used both approaches, and I can tell you that business rules belong in code - it makes them much easier to track, maintain and understand. When you use configuration, you better make it count - it's much more expensive. — Luaan, yesterday
Configuration should be reserved for things that need to be configured. If the business rules aren't configurable in general, putting bits of it in configuration anyway buys you nothing. — biziclop, 20 hours ago
For suitably advanced languages, configuration takes form of actual subroutines and not strings. — Thorbjørn Ravn Andersen, 5 hours ago

Ben Cottrell · Accepted Answer · 2016-04-09 07:29:59Z

The author is warning against premature abstraction.

The line if (ledgerAmt > 500000) looks like the kind of business rule that you would expect to see for large complex business sytems whose requirements are incredibly complex yet precise and well-documented.

Typically those kinds of requirements are exceptional/edge cases rather than usefully reusable logic. Those requirements are typically owned and maintained by business analysts and subject matter experts, rather than by engineers

(Note that 'ownership' of requirements by Business Analysts/experts in these cases typically occurs where developers working in specialist fields don't have sufficient domain expertise; although I would still expect full communication/cooperation between developers and the domain experts to protect against ambiguous or poorly written requirements.)

When maintaining systems whose requirements are packed full of edge-cases and highly complex logic, there is usually no way to usefully abstract that logic or make it more maintainable; attempts to try building abstractions can easily backfire - not just resulting in wasted time, but also resulting in less maintainable code.

How is referring to it from a config file, or even a #define, const or whatever your language provides, worse than including its value? If later on the program, or some other programmer, also requires that borderline, so that the software makes another choice, you're screwed (because when it changes, nothing guarantees you that it will change in both files). That's clearly worse for debugging.

This kind of code tends to be guarded by the fact that the code itself probably has a one-to-one mapping to requirements; i.e. when a developer knows that the 500000 figure appears twice in the requirements, that developer also knows that it appears twice in the code.

Consider the other (equally likely) scenario where 500000 appears in multiple places in the requirements document, but the Subject Matter Experts decide to only change one of them; there you have an even worse risk that somebody changing the const value might not realise the 500000 is used to mean different things - so the developer changes it in the one and only place he/she finds it in the code, and ends up breaking something which they didn't realise they had changed.

This scenario happens a lot in bespoke legal/financial software (e.g. insurance quotation logic) - people who write such documents aren't engineers, and they have no problem copy+pasting entire chunks of the spec, modifying a few words/numbers, but leaving most of it the same.

In those scenarios, the best way to deal with copy-paste requirements is to write copy-paste code, and to make the code look as similar to the requirements (including hard-coding all the data) as possible.

The reality of such requirements is that they don't usually stay copy+paste for long, and the values sometimes change on a regular basis, but they often don't change in tandem, so trying to rationalise or abstract those requirements out or simplify them any way ends up creating more of a maintenance headache than just translating requirements verbatim into code.

A Domain Specific Language (DSL) can be a good way to make the code read more like the requirement document. — Ian, 2 days ago
Another advantage of a DSL is that also makes it harder to accidentally mix application, presentation, or persistence logic with the business rules. — Erik Eidt, yesterday
@Ian going from Ewan's answer, that's exactly what the article is arguing against. — OrangeDog, yesterday
Thinking that your application is special enough to warrant its own DSL is usually hubris. — brian_o, yesterday
Those requirements are typically owned and maintained by business analysts and subject matter experts, rather than by engineers which is not always a good idea. Sometimes the act of turning those requirements into code will reveal corner cases where the requirements are either not well defined or are defined in such a way that it goes against the business interest. If the business analysts and developers can co-operate on achieving a common goal, then a lot of problems can be avoided. — kasperd, yesterday

JacquesB · Answer 2 · 2016-04-09 09:45:10Z

The article has a good point. How can it be a bad practice to extract constants to a configuration file? It can be a bad practice if it complicates the code unnecessarily. Having a value directly in code is much simpler than having to read it from a configuration file, and the code as written is easy to follow.

In addition, tomorrow, the government goes "From 5/3/2050, you need to add AUTHLDG-122B instead of AUTHLDG-1A".

Yeah, then you change the code. The point of the article is that it is not more complicated to change code than changing a configuration file.

The approach described in the article does not scale if you get more complex logic, but the point is that you have to make a judgment call, and sometimes the simplest solution simply is the best.

How do you know that you will not need it later on? Or someone else for that matter?

This is the point of the YAGNI principle. Don't design for an unknown future which may turn out completely diffent, design for the present. You are correct that if the value 500000 is used several places in the program it should of course be extracted to a constant. But this is not the case in the code in question.

Softcoding is really a question of seperation of concerns. You softcode information which you know might change independently from the core application logic. You would never hardcode a connection string to a database, because you know it might change independently from the application logic and you will need to differentiate it for different environments. In a web app we like to separate business logic from html templates and style sheets, because they might change independently and even be changed by different people.

But in the case in the code sample, the hardcoded strings and numbers are an integral part of the application logic. It is conceivable that one file might change its name due to some policy change outside your control, but it is just as conceivable that we need to add a new if-branch checking for a different condition. Extracting the file names and numbers actually breaks cohesion in this case.

Often it is a lot more complicated to change code than a configuration file. You may need a developer and a build system / release cycle for the former, while the latter only requires changing a number in a box in a friendly config UI. — OrangeDog, yesterday
What happens if the legislature of one of those states decides to change that (purely arbitrary) 500,000 to 650,000? — jamesqf, yesterday
@OrangeDog Yeah, that's how it looks at first. But if you do things like this, the config UI is going to be anything but friendly, with hundreds of totally meaningless text-boxes asking you for who knows what. And now you have to build the UI, and document it. Mind you, that doesn't mean configuration is never a good way to go - there's cases where it's absolutely the right choice. But not in any of the examples in the article. And when was the last time a legislation changed just the number? The last time VAT rules changed here, we had to redo all the calculations anyway. — Luaan, yesterday
@OrangeDog: You are assuming, here, that the software's configuration provides you with the necessary hooks for the check you need to make. Note how in the OP each and every if is based on a different variable! If the variable you need is not accessible from the configuration, you need to modify the software anyway. — Matthieu M., yesterday
@OrangeDog so you are suggesting that there should be significant changes to the logic of a software application, without a dev/qa/release cycle and appropriate testing? — NPSF3000, yesterday

Peter Mortensen · Answer 3 · 2016-04-09 09:25:00Z

up vote 20 down vote

The article goes on to talk about 'Enterprise Rule Engine's which are probably a better example of what he is arguing against.

The logic is that you can generalize to the point at which your configuration becomes so complicated that it contains its own programming language.

For instance, the state code to document mapping in the example could be moved to a configuration file. But you would then need to express a complex relationship.

<statecode id="AZ">
    <document id="SR008-04X"/>
    <document id="SR008-04XI"/>
</statecode>

Maybe you would also put the ledger amount in?

<statecode id="ALL">
    <document id="AUTHLDG-1A" rule="ledgerAmt >= 50000"/>
</statecode>

Soon you find that you are programming in a new language you have invented and saving that code in configuration files which have no source or change control.

It should be noted that this article is from 2007 when this kind of thing was a common approach.

Nowadays we would probably solve the issue with dependency injection (DI). I.e., you would have a 'hard coded'

InvoiceRules_America2007 : InvoiceRules

which you would replace with a hard coded, or more configurable

InvoiceRules_America2008 : InvoiceRules

when the law or business requirements changed.

edited yesterday

Peter Mortensen

1,11821014

answered 2 days ago

Ewan

3,879320

4

Perhaps you should define "DI". And maybe explain a bit more. – Basil Bourque yesterday

5

Why would that file not be in the source control system? – JDługosz yesterday

@JDługosz What if the file is client/user-specific? What if that file is written and managed by that client/user? What if that config is not a file but a row in a database? All real possibilities that could prevent it from being in source control. – cbojar yesterday

2

If it's client specific, does the coded version have a huge mess of if statements to give different values for each client? That sounds like something that should be in a config file. Being in one kind of file or another, all else being equal, is not a reason to not control/track/backup the file. @ewan seems to be saying that a DSL's file can't be saved as part of the project for some reason, when even non-code assets like images and sound files and documentation certainly is. – JDługosz yesterday

2

You should really refactor the "50000" value out of your XML and put it in a separate config file, don't you think? ...and it's supposed to be 500000, by the way. – Wildcard yesterday

| show 4 more comments

Thanos Tintinidis · Answer 4 · 2016-04-08 10:22:45Z

up vote 11 down vote

On the contrary, "500000" is not simply a number. It's a significant value, one that represents the idea of a breakpoint in functionality. This number could be used in more than one places, but it's not the number that you're using, it's the idea of the limit/borderline, below which one rule applies, and above which another.

And that is expressed by having (and I could argue that even the comment is redundant):

 if (ledgerAmnt >= 500000) {
    //Ledger of 500K or more requires AUTHLDG-1A
    attachDocument("AUTHLDG-1A");
  }

This is just repeating what the code is doing:

LEDGER_AMOUNT_REQUIRING_AUTHLDG1A=500000
if (ledgerAmnt >= LEDGER_AMOUNT_REQUIRING_AUTHLDG1A) {
    //Ledger of 500K or more requires AUTHLDG-1A
    attachDocument("AUTHLDG-1A");
}

Note that the author assumes that the meaning of 500000 is tied to this rule; it is not a value that is or is likely to be reused elsewhere:

The one and only business rule change that this preceding Soft Coding could ever account for is a change in the ledger amount that required a form AUTHLDG-1A. Any other business rule change would require even more work – configuration, documentation, code, etc

The article's main point, in my view, is that sometimes a number is just a number: it has no extra meaning other that what's conveyed in the code and it's not likely to be used elsewhere. Therefore, awkwardly summarizing what the code is doing (now) in a variable name just for the sake of avoiding hard-coded values is unnecessary repetition at best.

answered 2 days ago

Thanos Tintinidis

52418

+1 for the last paragraph; very good summary. – Wildcard yesterday

If you introduced the constant LEDGER_AMOUNT_REQUIRING_AUTHLDG1A, you wouldn't write the comment into the code anymore. Comments are not maintained well by programmers. If the amount ever changed, the if condition and the comment would get out of sync. On the contrary, the constant LEDGER_AMOUNT_REQUIRING_AUTHLDG1A never gets out of sync with itself and it explains its purpose without the needless comment. – ZeroOne yesterday

1

@ZeroOne: Except that if the business rule changes to "Ledger of 500K or more requires AUTHLDG-1A and AUTHLDG-2B", it's very likely that the person who adds the attachDocument("AUTHLDG-2B"); line will fail to update the constant-name at the same time. In this case, I think the code is quite clear enough with neither a comment nor an explainer variable. (Though it might make sense to have a convention of indicating the appropriate section of the business requirements document via code comments. Under such a convention, a code comment that does that would be appropriate here.) – ruakh 14 hours ago

@ruakh, OK, then I'd refactor the constant to be called LEDGER_AMOUNT_REQUIRING_ADDITIONAL_DOCUMENTS (which I probably should've done in the first place). I'm also in the habit of putting the business requirement IDs into the Git commit messages, not into the source code. – ZeroOne 4 hours ago

@ZeroOne: But there may be yet other documents with different ledger amount thresholds. – ruakh 29 mins ago

add a comment |

Peter Mortensen · Answer 5 · 2016-04-09 09:25:05Z

The other answers are correct, and thoughtful. But here is my short-and-sweet answer.

  Rule/value          |      At Runtime, rule/value…
  appears in code:    |   …Is fixed          …Changes
----------------------|------------------------------------
                      |                 |
  Once                |   Hard-code     |   Externalize
                      |                 |   (soft-code)
                      |                 |
                      |------------------------------------
                      |                 |
  More than once      |   Soft-code     |   Externalize
                      |   (internal)    |   (soft-code)
                      |                 |
                      |------------------------------------

If the rules and special values appear in one place in the code, and do not change during runtime, then hard-code as shown in the question.

If the rules or special values appear in more than one place in the code, and do not change during runtime, then soft-code. Soft-coding for a rule might me defining a specific class/method or use the Builder pattern. For values, soft-coding can mean defining a single constant or enum for the value to be used across your code.

If the rules or special values may change during runtime, then you must externalize them. It is commonly done by updating values in a database. Or update values in memory manually by a user entering data. It is also done by storing values in a text file (XML, JSON, plain text, whatever) that is repeatedly scanned for file modification date-time change.

Dewi Morgan · Answer 6 · 2016-04-10 02:30:13Z

This is the trap we fall into when we use a toy problem and then pose only strawman solutions, when we are trying to illustrate a real issue.

In the example given, it makes not one whit of difference whether the values given are hardcoded as inline values, or defined as consts.

It is the surrounding code that would make the example a maintenance and coding horror. If there is no surrounding code, then the snippet is fine, at least in an environment of constant refactoring. In an environment where refactoring tends not to happen, the maintainers of that code are already dead, for reasons that will shortly become obvious.

See, if there is code surrounding it, then bad things clearly happen.

The first bad thing is that the value 50000 gets used for another value somewhere, say, the ledger amount over which the tax rate changes in some states... then when change happens, the maintainer has no way of knowing, when he finds those two instances of 50000 in the code, whether they mean the same 50k, or entirely unrelated 50ks. And should you also search for 49999 and 50001, in case someone used those as constants, too? This is not a call to plonk those variables in a config file of a separate service: but hardcoding them inline is clearly also wrong. Instead, they should be constants, defined and scoped within the class or file in which they are used. If the two instances of 50k use the same constant, then they likely represent the same legislative restriction; if not, they probably don't; and either way, they will have a name, which will be less opaque than an inline number.

The filenames are being passed to a function - attachDocument() - which accepts base filenames as string, without path or extension. The filenames are, essentially, foreign keys to some filesystem, or database, or wherever attachDocument() gets the files from. But the strings tell you nothing about this - how many files are there? What types of file are they? How do you know, when opening into a new market, whether you need to update this function? To what types of thing can they be attached? The maintainer is left entirely in the dark, and all he has is a string, which may appear multiple times in the code and mean different things every time it appears. In one place, "SR008-04X" is a cheat code. In another, it's a command to order four SR008 booster rockets. Here, it's a filename? Are these related? Someone just changed that function to mention another file, "CLIENT". Then you, poor maintainer, have been told that the "CLIENT" file needs to be renamed to "CUSTOMER". But the string "CLIENT" appears 937 times in the code... where do you even start looking?

The toy problem is that the values are all unusual and can be reasonably guaranteed to be unique in the code. Not "1" or "10" but "50,000". Not "client" or "report" but "SR008-04X".

The strawman is that the only other way to address the problem of impenetrably opaque constants is to hive them off into the config file of some unrelated service.

Together, you can use these two fallacies to prove any argument true.

Not a toy problem, not a strawman. This is something you'll see all the time in these kinds of business applications. There's no "opening into a new market", there's no reuse of the same number (after all, that would give it another meaning anyway) and in any case, the article says nothing against DRY - if there's two dependencies on the value, it will be either moved into a method or a constant. Show examples of how those constants (of config settings, it doesn't really matter) should be named, and where they should be stored in a way that's future proof and more clear than the code. — Luaan, 8 hours ago
The example doesn't break down because it's a toy problem. The surrounding code will always be horrible because the business rules that the software has to execute are horror. Attempts to side-step this fundamental challenge with rules engines and DSLs and whatnot is often programmer procrastination, because solving CS problems is more enjoyable than solving intricacies of tax forms. Attempts at achieving 'elegance' are often fool's errands because the ultimate task of the software is to model a complicated disaster. — whatsisname, 8 hours ago

asked	2 days ago
viewed	8538 times
active	today

current community

your communities

more stack exchange communities

What is “Soft Coding”, really?

6 Answers 6

Your Answer

Not the answer you're looking for? Browse other questions tagged design or ask your own question.

Linked

Hot Network Questions

current community

your communities

more stack exchange communities

What is “Soft Coding”, really?

6 Answers 6

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged design or ask your own question.

Linked

Related

Hot Network Questions