Forbidden Characters



  • I'm relatively new in this place working on a moderately large application that's been around for ~10 years. There is a place where the users enter a name for something. Up until this point, the naming was overly strict, allowing only letters, digits and hyphens. My job is to loosen the unnecessary strictness. I am tempted to allow any characters, but I want to make sure I know what I'm doing.

    The application uses a PostGres database. It is not an HTML application, it does not have a web interface.

    On security, there are times when I would use a belt and suspenders approach. Let's take the worst-case scenario, and assume that there may be places in the application where SQL is constructed in Java through concatenation. Are there any characters I should not allow the user to use?

    There is no central database, even a hack would not affect anything outside their location, but I would still prefer to avoid any potential security issues.

    Epilogue:

    The issues seem to be more subtle and complicated than I initially anticipated. I think I'm settling on a minor adaptation of the existing method. I'm going to continue to use a short allowable characters list, just adding the underscore and a single space. (The existing code already does a trim.)


  • Trolleybus Mechanic

    In your dev environment you could remove the restrictions. Run tests that you have control over. Let it be used for a while to see what other sorts of garbage get added.

    You'll be able to make a better final decision once you know what's currently going on.



  • Read up on SQL injection, and make sure your code is not vulnerable, then you can allow everything. Things may break for viewers of data that are doing it wrong however.



  • @mikehurley said in Forbidden Characters:

    In your dev environment you could remove the restrictions. Run tests that you have control over. Let it be used for a while to see what other sorts of garbage get added.

    You'll be able to make a better final decision once you know what's currently going on.

    I could do that. The problem is that it's a fairly large application and has a lot of functionality, and I don't know it all that well. What I'm trying to avoid is the possibility of is finding out six months from now, "I didn't know that this column was later used in that way, so I didn't do a test on it." Are there specific characters that are necessary for a SQL injection attack?



  • @Carnage said in Forbidden Characters:

    Read up on SQL injection, and make sure your code is not vulnerable, then you can allow everything. Things may break for viewers of data that are doing it wrong however.

    Yes, and that's the idea. It's a large application, not one I'm writing from scratch. "Belt and suspenders" means to assume the worst, that there is code hiding in there somewhere that was done wrong.



  • @tharpa said in Forbidden Characters:

    Are there specific characters that are necessary for a SQL injection attack?

    Any characters used to build an SQL query. And this info probably isn't that helpful to you.


  • Banned

    @tharpa ban parentheses, whitespaces and semicolons, allow everything else. If you can't audit the whole application and evaluate whether they do stupid shit anywhere, that sounds like the safest approach.



  • @Gąska said in Forbidden Characters:

    @tharpa ban parentheses, whitespaces and semicolons, allow everything else. If you can't audit the whole application and evaluate whether they do stupid shit anywhere, that sounds like the safest approach.

    Thanks. The users specifically asked in the ticket to allow spaces, though. What about single and double-quotes?


  • ♿ (Parody)

    @tharpa said in Forbidden Characters:

    Are there specific characters that are necessary for a SQL injection attack?

    The problem is if you use string concatenation to build a query and include user supplied data as part of the query. Typically something like :

    sql = "select * from people where name = '" + name + "';";

    So, when they type in whatever gets put into name they add a single quote and then some other junk to neuter the condition, returning stuff that they shouldn't. In really lax systems, possibly adding another query that inserts or deletes data. Basically, make sure you're using your DB's / framework's method of using parameters.

    The main downsides of weird characters will be the ease of searching (assuming that's a thing that happens) due to some users being unable to type emojis or obscure unicode and the sort order. Also stuff like lookalike letters (like some Cyrillic characters) that will confuse the bejeesus out of everyone. Non-printable weirdness that will be impossible to figure out until you scrutinize the data.

    Ugh.


  • ♿ (Parody)

    @tharpa said in Forbidden Characters:

    The users specifically asked in the ticket to allow spaces, though.

    Make sure you trim leading and trailing spaces!


  • Banned

    @tharpa spaces are problematic not because of SQL, but because the software might do some stupid shit and build space-separated lists of things out of database elements (most often it happens when you call shell scripts, but that's not the only possible source of problems). Quotes are interesting because if spaces break program, then quotes probably don't matter to it, but if the program handles spaces half-correctly (through quoting things in appropriate places), quotes can make it not work as intended (sometimes only when spaces and quotes are present at the same time).

    Basically - either you ensure (or blindly assume) that your entire programs handles spaces correctly everywhere, or you disallow spaces at all, since they're most likely to make things go wrong. Alternatively, maybe you can replace all spaces with underscores, and maybe reverse the process when displaying values in GUI (or not - it's just cosmetics at this point).


  • Trolleybus Mechanic

    @boomzilla said in Forbidden Characters:

    @tharpa said in Forbidden Characters:

    The users specifically asked in the ticket to allow spaces, though.

    Make sure you trim leading and trailing spaces!

    Then do this:

    while($name.contains("  "))
    {
       $name = $name.replace("  ", " ");
    }
    

  • Trolleybus Mechanic

    @tharpa said in Forbidden Characters:

    @Gąska said in Forbidden Characters:

    @tharpa ban parentheses, whitespaces and semicolons, allow everything else. If you can't audit the whole application and evaluate whether they do stupid shit anywhere, that sounds like the safest approach.

    Thanks. The users specifically asked in the ticket to allow spaces, though. What about single and double-quotes?

    Sounds like you need to defer this ticket for a bit and do some research on what the system is doing. Both how its parts interact and what sorts of operations it does with the database data.


  • Notification Spam Recipient

    @boomzilla said in Forbidden Characters:

    @tharpa said in Forbidden Characters:

    The users specifically asked in the ticket to allow spaces, though.

    Make sure you trim leading and trailing spaces!

    GAH! This a million times! Just say no to spaces. Everywhere I've worked with databases eventually had a problem with trailing spaces.



  • Is the application are they using prepared statements?



  • @sweaty_gammon said in Forbidden Characters:

    In the application are they using prepared statements?

    Generally, yes, they're using hibernate. But the problem is that, like many applications, it was developed over a number of years by different people. I think there are probably places that SQL is being concatenated directly. Thus the belt and suspenders approach.


  • kills Dumbledore

    If you can't be sure there's no SQL injection vulnerability the most dangerous character is he single quote. That can be used to escape the kind of concatenation that @boomzilla mentioned. Banning that would reduce the attack surface but there are other tricks people can use.


  • Trolleybus Mechanic

    @Jaloopa said in Forbidden Characters:

    If you can't be sure there's no SQL injection vulnerability the most dangerous character is he single quote. That can be used to escape the kind of concatenation that @boomzilla mentioned. Banning that would reduce the attack surface but there are other tricks people can use.

    Bob O'Malley says feck yer arse.


    Filed under: Every non-human character from every hack fantasy and sci-fi writer also wants a word...



  • @tharpa said in Forbidden Characters:

    @sweaty_gammon said in Forbidden Characters:

    In the application are they using prepared statements?

    Generally, yes, they're using hibernate. But the problem is that, like many applications, it was developed over a number of years by different people. I think there are probably places that SQL is being concatenated directly. Thus the belt and suspenders approach.

    Is auditing the code base out of question?



  • @sweaty_gammon said in Forbidden Characters:

    @tharpa said in Forbidden Characters:

    @sweaty_gammon said in Forbidden Characters:

    In the application are they using prepared statements?

    Generally, yes, they're using hibernate. But the problem is that, like many applications, it was developed over a number of years by different people. I think there are probably places that SQL is being concatenated directly. Thus the belt and suspenders approach.

    Is auditing the code base out of question?

    Depending on how formally you mean auditing, yes. I'm relatively new there and pretty much at the bottom of the pecking order.


  • kills Dumbledore

    @Lorne-Kates said in Forbidden Characters:

    @Jaloopa said in Forbidden Characters:

    If you can't be sure there's no SQL injection vulnerability the most dangerous character is he single quote. That can be used to escape the kind of concatenation that @boomzilla mentioned. Banning that would reduce the attack surface but there are other tricks people can use.

    Bob O'Malley says feck yer arse.


    Filed under: Every non-human character from every hack fantasy and sci-fi writer also wants a word...

    If you need to prevent SQL injection and don't have control over the data layer that's a price you have to pay. Unless you replace them with smart quotes or some other lookalike, but that could cause problems in other parts that do searching


  • Considered Harmful

    @tharpa said in Forbidden Characters:

    The issues seem to more subtle and complicated than I initially anticipated.

    you don't say



  • @Gribnit said in Forbidden Characters:

    @tharpa said in Forbidden Characters:

    The issues seem to more subtle and complicated than I initially anticipated.

    you don't say

    I think I'm perfectly qualified to be a manager.



  • MD5 encode any apostrophes in the input, that should work pretty well.


  • Fake News

    Is there ever any input that forms a file name at some point?

    We had fun when people used one of / \ ? % * : | " < > (the double quote is mostly a problem if you're doing file downloads and your HTTP code does not really know how to escape it in a Content-Disposition header).



  • @tharpa I would want to audit the code base and make sure everything is done through NHibernate or at least ADO.NET and parametrised queries. However you might not have the time to do that. Maybe try to find the most egregious examples and then see what the rest of your team / manager thinks?


  • Considered Harmful

    @sweaty_gammon People use NHibernate?



  • @pie_flavor said in Forbidden Characters:

    @sweaty_gammon People use NHibernate?

    People use EF 😃 so there seems to be plenty of people that have strange ideas.


  • Considered Harmful

    @sweaty_gammon Huh, apparently they do. 1M downloads. Still, that doesn't come close to EF's 53M.



  • @tharpa said in Forbidden Characters:

    @Carnage said in Forbidden Characters:

    Read up on SQL injection, and make sure your code is not vulnerable, then you can allow everything. Things may break for viewers of data that are doing it wrong however.

    Yes, and that's the idea. It's a large application, not one I'm writing from scratch. "Belt and suspenders" means to assume the worst, that there is code hiding in there somewhere that was done wrong.

    How many millions of lines of code are you looking at?
    Is it decently designed so you at least have a data access layer where all the databasey stuff lives, or is it spread all over the place?
    The really annoying thing is that you may have bespoke code using whatever character as a delimiter or flow control character somewhere that just slightly breaks things, so without looking through all of the code, getting a decent grasp of how it works and what it does, any additional character may break something. Spaces are particularly annoying, if the original designers knew that they shouldn't be in the in the input, since there may be things doing stringy things that has spaces as significant characters, such as splitting up concatenated strings by spaces and grabbing individual strings by index.
    The database bits is easy. The rest is the annoying stuff that comes back and bites you in a few months time. The best way to handle it is to get to know the code properly. Hopefully, it's fairly well designed.


  • Considered Harmful

    @Carnage said in Forbidden Characters:

    Hopefully, it's fairly well designed.

    y'know, it's considered good form to add a 🐠 or a 🚎 or something



  • @Gribnit said in Forbidden Characters:

    @Carnage said in Forbidden Characters:

    Hopefully, it's fairly well designed.

    y'know, it's considered good form to add a 🐠 or a 🚎 or something

    Hey, it could happen! If not by anything else, then by the law of large numbers a decently designed software project will randomly just happen.


  • Java Dev

    @JBert said in Forbidden Characters:

    your HTTP code does not really know how to escape [...] a [...] header

    Neither does the RFC.


  • 🚽 Regular

    @JBert said in Forbidden Characters:

    We had fun when people used one of / \ ? % * : | " < >

    Shout out to the tab character; line feeds and carriage returns; ampersand (when going through XML or HTML); and the frigging comma (when through CSV) (semicolons as well).


  • Notification Spam Recipient

    @Carnage said in Forbidden Characters:

    How many millions of lines of code are you looking at?

    Hmmm, just under 1, I think.

    Is it decently designed so you at least have a data access layer where all the databasey stuff lives, or is it spread all over the place?

    Somewhat decently, if you consider ODBC "decent".

    The really annoying thing is that you may have bespoke code using whatever character as a delimiter or flow control character somewhere that just slightly breaks things, so without looking through all of the code, getting a decent grasp of how it works and what it does, any additional character may break something. Spaces are particularly annoying, if the original designers knew that they shouldn't be in the in the input, since there may be things doing stringy things that has spaces as significant characters, such as splitting up concatenated strings by spaces and grabbing individual strings by index.

    Luckily all the db calls are in on long file! and non-strings are typically not treated as strings.

    The database bits is easy. The rest is the annoying stuff that comes back and bites you in a few months time. The best way to handle it is to get to know the code properly. Hopefully, it's fairly well designed.

    Yeah, just recently it was "discovered" that single apostrophes will break things. 'course I knew about it, but I haven't had time to redesign everything to actually support parameterizing all the things. Soon™ though.



  • @Tsaukpaetra said in Forbidden Characters:

    @Carnage said in Forbidden Characters:

    How many millions of lines of code are you looking at?

    Hmmm, just under 1, I think.

    Is it decently designed so you at least have a data access layer where all the databasey stuff lives, or is it spread all over the place?

    Somewhat decently, if you consider ODBC "decent".

    The really annoying thing is that you may have bespoke code using whatever character as a delimiter or flow control character somewhere that just slightly breaks things, so without looking through all of the code, getting a decent grasp of how it works and what it does, any additional character may break something. Spaces are particularly annoying, if the original designers knew that they shouldn't be in the in the input, since there may be things doing stringy things that has spaces as significant characters, such as splitting up concatenated strings by spaces and grabbing individual strings by index.

    Luckily all the db calls are in on long file! and non-strings are typically not treated as strings.

    The database bits is easy. The rest is the annoying stuff that comes back and bites you in a few months time. The best way to handle it is to get to know the code properly. Hopefully, it's fairly well designed.

    Yeah, just recently it was "discovered" that single apostrophes will break things. 'course I knew about it, but I haven't had time to redesign everything to actually support parameterizing all the things. Soon™ though.

    Hmm. Carnage was asking me the questions, but you answered. So the only conclusion I can draw is that you work here. And there aren't many people who know enough about the code to give those answers, so that narrows down the possibilities.



  • @pie_flavor said in Forbidden Characters:

    Still, that doesn't come close to EF's 53M.

    How many times can be contributed to build machines doing a Nuget Install?


  • Notification Spam Recipient

    @tharpa said in Forbidden Characters:

    @Tsaukpaetra said in Forbidden Characters:

    @Carnage said in Forbidden Characters:

    How many millions of lines of code are you looking at?

    Hmmm, just under 1, I think.

    Is it decently designed so you at least have a data access layer where all the databasey stuff lives, or is it spread all over the place?

    Somewhat decently, if you consider ODBC "decent".

    The really annoying thing is that you may have bespoke code using whatever character as a delimiter or flow control character somewhere that just slightly breaks things, so without looking through all of the code, getting a decent grasp of how it works and what it does, any additional character may break something. Spaces are particularly annoying, if the original designers knew that they shouldn't be in the in the input, since there may be things doing stringy things that has spaces as significant characters, such as splitting up concatenated strings by spaces and grabbing individual strings by index.

    Luckily all the db calls are in on long file! and non-strings are typically not treated as strings.

    The database bits is easy. The rest is the annoying stuff that comes back and bites you in a few months time. The best way to handle it is to get to know the code properly. Hopefully, it's fairly well designed.

    Yeah, just recently it was "discovered" that single apostrophes will break things. 'course I knew about it, but I haven't had time to redesign everything to actually support parameterizing all the things. Soon™ though.

    Hmm. Carnage was asking me the questions, but you answered. So the only conclusion I can draw is that you work here. And there aren't many people who know enough about the code to give those answers, so that narrows down the possibilities.

    *ducks head and whispers urgently into pinkie finger* "position compromised, request immediate evac!"


  • Considered Harmful

    @tharpa said in Forbidden Characters:

    @Tsaukpaetra said in Forbidden Characters:

    @Carnage said in Forbidden Characters:

    How many millions of lines of code are you looking at?

    Hmmm, just under 1, I think.

    Is it decently designed so you at least have a data access layer where all the databasey stuff lives, or is it spread all over the place?

    Somewhat decently, if you consider ODBC "decent".

    The really annoying thing is that you may have bespoke code using whatever character as a delimiter or flow control character somewhere that just slightly breaks things, so without looking through all of the code, getting a decent grasp of how it works and what it does, any additional character may break something. Spaces are particularly annoying, if the original designers knew that they shouldn't be in the in the input, since there may be things doing stringy things that has spaces as significant characters, such as splitting up concatenated strings by spaces and grabbing individual strings by index.

    Luckily all the db calls are in on long file! and non-strings are typically not treated as strings.

    The database bits is easy. The rest is the annoying stuff that comes back and bites you in a few months time. The best way to handle it is to get to know the code properly. Hopefully, it's fairly well designed.

    Yeah, just recently it was "discovered" that single apostrophes will break things. 'course I knew about it, but I haven't had time to redesign everything to actually support parameterizing all the things. Soon™ though.

    Hmm. Carnage was asking me the questions, but you answered. So the only conclusion I can draw is that you work here. And there aren't many people who know enough about the code to give those answers, so that narrows down the possibilities.

    he's you.



  • @Gribnit said in Forbidden Characters:

    he's you.

    Hmm. Sock Rock upvoted that. You might be onto something.


  • And then the murders began.

    @sweaty_gammon said in Forbidden Characters:

    @pie_flavor said in Forbidden Characters:

    Still, that doesn't come close to EF's 53M.

    How many times can be contributed to build machines doing a Nuget Install?

    Those build machines should be caching the package locally. So... assuming 1 build machine per 4 developers, 20%?


  • Notification Spam Recipient

    @tharpa said in Forbidden Characters:

    @Gribnit said in Forbidden Characters:

    he's you.

    Hmm. Sock Rock upvoted that. You might be onto something.

    Masturbation is a perfectly normal and expected behaviour.


  • Considered Harmful

    @Tsaukpaetra If you wrote it wrong the first time, you are the problem.


  • Notification Spam Recipient

    @pie_flavor said in Forbidden Characters:

    @Tsaukpaetra If you wrote it wrong the first time, you are the problem.

    😕 Wrote what wrong?


  • Considered Harmful

    @Tsaukpaetra The database code. Not using parameters.


  • Notification Spam Recipient

    @pie_flavor said in Forbidden Characters:

    @Tsaukpaetra The database code. Not using parameters.

    Ah. Yes. The one who wrote the database code is the problem.

    Lucky it wasn't me, eh?


  • BINNED

    @Tsaukpaetra But now you do know that the other person who wrote it wrong the first time is the problem. Problem solved!


  • Notification Spam Recipient

    @kazitor said in Forbidden Characters:

    @Tsaukpaetra But now you do know that the other person who wrote it wrong the first time is the problem. Problem solved!

    Yeah, and I still keep finding pieces of hims code all over the place...


  • Discourse touched me in a no-no place

    @tharpa said in Forbidden Characters:

    I think there are probably places that SQL is being concatenated directly.

    Then your job is to search for and destroy those places; it's a true part of addressing the ticket. Unless they're building column or table names, but those shouldn't ever be under the control of users and probably shouldn't ever contain anything outside a limited subset of ASCII. (The labels in the GUI aren't the same thing, of course.)



  • @tharpa said in Forbidden Characters:

    I think there are probably places that SQL is being concatenated directly.

    Like dkf said, this ticket includes cleaning up any and all SQL concatenation. And I'd extend that to moving to only parametrized queries everywhere, to prevent future issues.
    Also remember to mull over any and all data massaging code to verify that there is no usage of magic token characters anywhere.
    The time estimated for the ticket should reflect this, and also should be clearly stated why it'd take time. And there should be no other options given. Whenever you say anything about cutting corners, everyone will tell you to cut the corners, and if you could cut a little extra, that'd be great, yeah.


Log in to reply