Stanley Tan

Ruby Regular Expressions

A regular expression (often referred to as regex or regexp) is a way of specifying a sequence of characters that is used to match patterns in text.

In Ruby, regexes are first class citizens. They are built into the core of the language and, as such, integrate well with the rest of it. We’ll take a dive into the world of Ruby’s regular expressions to look at some of the lesser known or used features.

Plenty of people look at regex as though it is some sort of black magic. If you have never used or worked with regex before, it can look anything but regular. Although it may not look the part, it is easy to learn and very expressive. It is arguably the most useful text related skill any programmer can pick up.

If you’re just starting out with regular expressions, I suggest reading beginner’s guides to understand the basics. Once you know what’s going on, come back and see just how useful Ruby makes it.

Creating a Regexp object

Most Ruby programmers create a Regexp object by specifying a pattern between forward slash characters /.../. You can also create one by using the %r literal between any two punctuation marks %r{...} or by using the Regexp.new('...') constructor.

irb(main):001:0> /.../.class
=> Regexp
irb(main):002:0> %r{...}.class
=> Regexp
irb(main):003:0> %r|...|.class
=> Regexp
irb(main):004:0> %r(...).class
=> Regexp
irb(main):005:0> %r=...=.class
=> Regexp
irb(main):006:0> %r@...@.class
=> Regexp
irb(main):007:0> %r!...!.class
=> Regexp
irb(main):008:0> %r/.../.class
=> Regexp
irb(main):009:0> %r\...\.class
=> Regexp
irb(main):010:0> %r"...".class
=> Regexp
irb(main):011:0> Regexp.new('...').class
=> Regexp
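As a rough sketch of when each form is convenient (the patterns and the `path` string are made up for illustration): %r saves escaping when the pattern contains forward slashes, and Regexp.new is useful when options or parts of the pattern are only known at runtime.

```ruby
# The three forms build equal Regexp objects.
a = /\d+-\d+/
b = %r{\d+-\d+}
c = Regexp.new('\d+-\d+')   # single quotes keep the backslashes intact
puts a == b && b == c       # true

# %r avoids escaping when the pattern itself contains forward slashes.
path = "archives/2014/02"   # hypothetical input
puts path[%r{\d+/\d+}]      # 2014/02
puts path[/\d+\/\d+/]       # same match, with escaped slashes

# Options can be passed as the second argument to Regexp.new.
ci = Regexp.new('ruby', Regexp::IGNORECASE)
puts ci == /ruby/i          # true

# Regexp.escape makes runtime input safe to embed in a pattern.
needle = Regexp.new(Regexp.escape("1+1"))
puts "1+1=2"[needle]        # 1+1
```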

Regexp object as an argument

Regexes are most commonly used with methods of the String class, such as #=~, #match, #scan, #sub and #gsub. Did you know that the #[] method accepts a Regexp object too?

irb(main):001:0> names = "adam ben charles"
=> "adam ben charles"
irb(main):002:0> names[/a\w*/]
=> "adam"
irb(main):003:0> names[/b\w*/]
=> "ben"
irb(main):004:0> names[/c\w*/]
=> "charles"
irb(main):005:0> names[/a(\w*)/, 1] # We can use capture group references
=> "dam"
irb(main):006:0> names[/b(\w*)/, 1]
=> "en"
irb(main):007:0> names[/z(\w*)/, 1]
=> nil
irb(main):008:0> # ^ is a more concise expression compared to:
irb(main):009:0* names =~ /a(\w*)/ and $1
=> "dam"
irb(main):010:0> names =~ /b(\w*)/ and $1
=> "en"
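For comparison, here is a small sketch of the other String methods mentioned above working on the same string. The variable names are only for illustration.

```ruby
names = "adam ben charles"

# scan returns every match as an array.
all_names = names.scan(/\w+/)
p all_names                     # ["adam", "ben", "charles"]

# sub replaces the first match, gsub replaces all of them.
first_fixed = names.sub(/\w+/)  { |word| word.capitalize }
all_fixed   = names.gsub(/\w+/) { |word| word.capitalize }
p first_fixed                   # "Adam ben charles"
p all_fixed                     # "Adam Ben Charles"

# match returns a MatchData object, or nil when nothing matches.
found = names.match(/b(\w*)/)
p found[0]                      # "ben"
p found[1]                      # "en"
p found.pre_match               # "adam "

# #[]= accepts a Regexp as well and replaces the matched part in place.
copy = names.dup
copy[/ben/] = "bob"
p copy                          # "adam bob charles"
```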

Capture groups

We have seen earlier that we can use parentheses () to capture groups. These groups are assigned to the global variables $1, $2, ... in order of occurrence. The first group will be assigned to $1, the second to $2, and so on. It is also possible to refer to them within the regex itself using backreferences \1, \2, ....

irb(main):001:0> string = "how now brown cow"
=> "how now brown cow"
irb(main):002:0> string[/[hn](..)\s+[hn]\1/]
=> "how now"
irb(main):003:0> $1
=> "ow"
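A typical use of a backreference is spotting accidentally doubled words. The sentence and the `doubled` pattern below are a minimal sketch, not a complete proofreading tool.

```ruby
text = "this is is a test of the the parser"

# A whole word, some whitespace, then the same word again.
doubled = /\b(\w+)\s+\1\b/

# scan returns the captured group for every match.
repeats = text.scan(doubled)
p repeats                     # [["is"], ["the"]]

# In a replacement string, '\1' refers to the first capture group.
cleaned = text.gsub(doubled, '\1')
p cleaned                     # "this is a test of the parser"
```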

Capture groups can also be given a name to improve code clarity. A named group is defined with the (?<name>...) or (?'name'...) construct.

irb(main):001:0> pattern = /(?<day>\d{2})-(?<month>\d{2})-(?<year>\d{4})/
=> /(?<day>\d{2})-(?<month>\d{2})-(?<year>\d{4})/
irb(main):002:0> data = "24-02-2014"
=> "24-02-2014"
irb(main):003:0> data[pattern, :day]
=> "24"
irb(main):004:0> data[pattern, :month]
=> "02"
irb(main):005:0> data[pattern, :year]
=> "2014"
irb(main):006:0> if pattern =~ data
irb(main):007:1>   $~[:year]
irb(main):008:1> end
=> "2014"

When a literal regex is on the left-hand side of the =~ operator and the string is on the right, the named capture groups are assigned to local variables with corresponding names.

irb(main):009:0> /(?<day>\d{2})-(?<month>\d{2})-(?<year>\d{4})/ =~ data
=> 0
irb(main):010:0> day
=> "24"
irb(main):011:0> month
=> "02"
irb(main):012:0> year
=> "2014"
irb(main):013:0> if /(?<day>\d{2})-(?<month>\d{2})-(?<year>\d{4})/ =~ data
irb(main):014:1>   puts "Day: #{day}"
irb(main):015:1>   puts "Month: #{month}"
irb(main):016:1>   puts "Year: #{year}"
irb(main):017:1> end
Day: 24
Month: 02
Year: 2014
=> nil
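The local variable assignment only happens for a regex literal on the left-hand side. When the regex lives in a variable, or the string comes first, the groups are still available by name through the MatchData object. A short sketch, where the `century` group is made up for the example:

```ruby
pattern = /(?<day>\d{2})-(?<month>\d{2})-(?<year>\d{4})/
data = "24-02-2014"

# The regex is held in a variable here, so no local variables are created.
# The MatchData object still exposes every group by name.
m = pattern.match(data)
p m[:year]                    # "2014"
p m["month"]                  # "02"
p m.names                     # ["day", "month", "year"]
captures = m.named_captures   # a Hash keyed by group name
p captures["day"]             # "24"

# With the string on the left-hand side, no local variable is created either.
data =~ /(?<century>\d{2})\d{2}\z/
last_match = $~
century_is_local = binding.local_variable_defined?(:century)
p century_is_local            # false
p last_match[:century]        # "20"
```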

Named capture groups can also be backreferenced with \k<name>.

irb(main):001:0> pattern = /(?<day>\d{2})-\k<day>-(?<year>\d{4})/
=> /(?<day>\d{2})-\k<day>-(?<year>\d{4})/
irb(main):002:0> data = "01-02-2014\n02-02-2014\n03-02-2014\n04-02-2014\n"
=> "01-02-2014\n02-02-2014\n03-02-2014\n04-02-2014\n"
irb(main):003:0> data[pattern]
=> "02-02-2014"

Take note that numbered and named backreferences cannot be mixed: once a regex contains a named group, numbered backreferences such as \1 are rejected.
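A small sketch of what this restriction looks like in practice. Regexp.new is used for the failing pattern so that the error can be rescued at runtime. The group names are illustrative.

```ruby
# Unnamed groups are not captured once a named group is present.
m = /(\w)(?<second>\w)/.match("ab")
p m.captures                  # ["b"]

# A numbered backreference next to a named group is rejected.
error = nil
begin
  Regexp.new('(?<day>\d{2})-\1')
rescue RegexpError => e
  error = e
end
puts "rejected: #{error.class}" if error

# The named form of the same backreference compiles and matches.
ok = Regexp.new('(?<day>\d{2})-\k<day>')
p "02-02"[ok]                 # "02-02"
```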

Regex with inline comments

Complex expressions are hard to read. Thankfully, we can add a special option after the end delimiter to control how the patterns are matched. /.../x allows us to add whitespace and comments to the pattern for code clarity.

# Take a look at this example,

pattern = /\A&[#](0[0-7]+|[0-9]+|x[0-9a-fA-F]+);\Z/

# versus this.

pattern = / \A                 # Start of string
           &[#]                # Start of numeric character reference
           (
               0[0-7]+         # Octal form
             | [0-9]+          # Decimal form
             | x[0-9a-fA-F]+   # Hexadecimal form
           )
           ;                   # Trailing semicolon
           \Z                  # End of string
          /x                   # Option

It is a contrived example, yet it shows how much clearer a complex expression can become. Since whitespace in the pattern is ignored when the x option is active, use escapes such as \s or \p{Space} to match it.
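To see both points in action, here is a condensed version of the pattern above run against a few sample inputs, followed by a quick check of how a literal space behaves under x. The sample strings are only for illustration.

```ruby
pattern = / \A &[#]
            ( 0[0-7]+ | [0-9]+ | x[0-9a-fA-F]+ )
            ; \Z
          /x

decimal = "&#65;"[pattern, 1]
hex     = "&#x41;"[pattern, 1]
named   = "&amp;"[pattern]
p decimal                     # "65"
p hex                         # "x41"
p named                       # nil

# A literal space in the pattern is ignored under x,
# so /a b/x behaves like /ab/ and does not match "a b".
plain_space   = "a b" =~ /a b/x
escaped_space = "a b" =~ /a\sb/x
space_class   = "a b" =~ /a\p{Space}b/x
p plain_space                 # nil
p escaped_space               # 0
p space_class                 # 0
```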

We can also use (?#comment) to add comments to expressions without the x option. I find this approach to be less useful. It is not consistent with Ruby’s style and further complicates the expression.

# Inline comments without the 'x' option

pattern = /\A&[#](0[0-7]+(?#octal)|[0-9]+|x[0-9a-fA-F]+);\Z/

I found that these little tricks help me write more maintainable code. As always, be careful when using regular expressions. Many security issues in Ruby applications come from oversights made when writing them.
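One well-known example of such an oversight is anchoring. In Ruby, ^ and $ match at the start and end of every line, not of the whole string, so a validation written with them can be bypassed with a newline. The sketch below uses a made-up URL check to show the difference. It is not a complete URL validator.

```ruby
input = "javascript:alert(1)\nhttp://example.com"

# ^ and $ match at line boundaries, so the second line satisfies the check.
line_anchored = %r{^https?://\S+$}
p input.match?(line_anchored)                    # true

# \A and \z pin the pattern to the whole string.
string_anchored = %r{\Ahttps?://\S+\z}
p input.match?(string_anchored)                  # false
p "http://example.com".match?(string_anchored)   # true

# \Z still allows one trailing newline, \z does not.
p "http://example.com\n".match?(%r{\Ahttps?://\S+\Z})   # true
p "http://example.com\n".match?(string_anchored)        # false
```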

Everything mentioned here can be found in the Ruby documentation. Have a look at it as it covers everything that is implemented by Ruby.

Did you find anything cool you could do with regular expressions in Ruby? Let me know!

Reducing Perceived Risk

Blue Mountains

There are only a few reasons why someone would pay money for something. Most of the time it is to scratch an itch. They seek to exchange a value (cash) that they have, for a value that your product proposes (solving their problem).

Here’s where things get interesting. Before making a decision, you can expect customers to perform risk assessments on your product. I have noticed that the time spent on these assessments (otherwise known as product research) is related to their perceived risk of the product.

It could be a conscious or subconscious act but it happens all the same. Some purchases take a couple of seconds, while others can span months or even years.

Furthermore, each customer’s perception depends on their own circumstances, so the same product can look riskier to one buyer than to another.

Five factors contribute to the total perceived risk of software as a service products: functional, social, psychological, financial and time. Together, they shape the customer’s decision making process. We’ll take a look at each factor along with examples of techniques to reduce perceived risk.

  1. Functional. Will the product perform according to expectations? What features does it have?

    Provide a free trial or free tier. Customers should feel comfortable taking your product for a “test drive” before committing. If you can’t afford to provide a free trial, maybe consider doing a video demo or a screen cast that will portray the workflow of the product. This way, they would know that the product works before handing over their credit card details.

  2. Social. What would my boss or peers think when they find out?

    Getting social proof is a chicken and egg problem. People won’t use your product unless you have enough social proof, and you won’t have social proof until people use your product. The good news is early adopters are less risk averse to this factor.

    Testimonials and endorsements by well known people and businesses in your target market can help convince customers too. Social proof is all about the wisdom of the crowd. If enough people say it’s good, it is probably good, and that reduces perceived risk. Guest posting on other industry blogs and other outreach efforts will help with awareness.

  3. Psychological. Is this a business I want to support? Does it have the same values as I do? Will it disappear overnight?

    The way you communicate with customers reflects your values. That includes everything from the copy on your landing page to the retention emails you send out.

    Add photos and short bios of your entire team together with links to their email and social media accounts on the about page. It portrays availability and helps customers put a face to the business.

  4. Financial. Can I or my business afford this product? Is it priced higher than its competitors?

    Generally, it is hard to beat other products based on price alone. Base your pricing on the value that your ideal customer will receive. Paying $99 per month to solve a real business pain would not be an issue for many businesses.

  5. Time. How much effort do I have to put in to switch over to this new product?

    Always guide a new user with an on-boarding process. Explain features and point out important links in the product. The ideal on-boarding process would walk a user through the product all the way until the “aha moment”.

    You can also make it super easy for customers to switch to and from your product with one-click importing and exporting of data. Examples include blogging platforms such as WordPress or Tumblr.

By working through these factors, you can reduce perceived risk for your product or service, making it less risky for the customer to choose you.

I would love to hear and discuss techniques that work for you. If you need help reducing perceived risk on anything (from my point of view), feel free to reach out.

Photograph is of Blue Mountains, NSW, Australia. It’s about an hour or two west of Sydney.

Heartbleed


This is an apocalypse. The world is on fire. The sky is falling. Everything you hold sacred has now turned to dust.

The Heartbleed bug allows anyone on the Internet to read the memory of the systems protected by the vulnerable versions of the OpenSSL software. This compromises the secret keys used to identify the service providers and to encrypt the traffic, the names and passwords of the users and the actual content. This allows attackers to eavesdrop communications, steal data directly from the services and users and to impersonate services and users.


Earlier today, a bunch of security researchers unleashed CVE-2014-0160 (CODENAME: Heartbleed) into the world. It is a serious vulnerability in the popular OpenSSL cryptographic software library. This weakness allows stealing the information protected, under normal conditions, by the SSL/TLS encryption used to secure the Internet. SSL/TLS provides communication security and privacy over the Internet for applications such as web, email, instant messaging (IM) and some virtual private networks (VPNs). It powers about 66% of internet connected devices. I can guarantee that everyone who uses the internet would use it at some point in their day to day activities.

What’s frightening is that this bug has been around for more than 2 years. It is extremely likely that it has been exploited by multiple intelligence agencies and blackhats. Exploitation of this bug also leaves no traces of anything abnormal happening in the logs.

Note that both servers and clients are affected. This means that a malicious server could dump the secrets in your client’s memory without you knowing.

In short, the vulnerability disclosed allows an attacker to read the memory of the affected system. Memory is where an attacker would find passwords and private keys as well as other decrypted and sensitive information.

Don’t understand the severity of this problem? Imagine walking up to any stranger and saying “Hey, how’s it going?”. Immediately, he/she will share with you whatever was on their mind at that point in time. It could likely be a private thought, a secret they do not want anyone else to know about. You could keep asking as many times as you wanted and the stranger would tell you new things each time. On top of that, the stranger would not have a clue that it was happening.

As an end user, there’s not much you can do except turn the internet off and go for a walk until the bug is patched. Do not use any web services that are vulnerable as they may leak your username and password, or worse, credit card information. More importantly, do not visit sites that you do not trust.

Change your passwords and API keys only after web services fix the issue. Changing them prematurely could be riskier than leaving them unchanged, because the most recently used information is what gets leaked. For example, private keys are leaked on the first request after a restart.

If you own a server that runs OpenSSL, here’s a list of things you should do.

  1. Update OpenSSL. Your distribution most likely would have patched and tested the package.
  2. Recompile everything that is linked against the old version of OpenSSL. Packages such as Nginx and Ruby link against it, so you’ll have to recompile them.
  3. Reboot the server. Ensure everything is running on the patched version.
  4. Generate a new private key, Certificate Signing Request (CSR) and get a new certificate. Consider your old keys compromised and revoke them. Get a brand new set.
  5. Change any passwords you use on the servers. Passwords are kept in memory and could have been leaked.
  6. Generate and switch to a new secret if you’re using cookie based sessions in Sinatra or other web frameworks. Expire all active user sessions.
  7. Get your users to change their passwords. Passwords should be considered compromised as the server leaks memory and past traffic can be decrypted.
  8. Check your SSL configurations. Don’t support older protocols and broken SSL ciphers. Enable Perfect Forward Secrecy (PFS) and HTTP Strict Transport Security (HSTS). You can also choose to cache SSL sessions for improved performance.
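Two of the steps above can be roughed out in Ruby. The helper name `heartbleed_range?` is hypothetical. It assumes the affected releases are 1.0.1 through 1.0.1f (the fix shipped in 1.0.1g), and it ignores the 1.0.2 betas. Treat it as a heuristic only: distributions that backport the fix may keep reporting an old version string, so a positive result needs a second look at the package changelog.

```ruby
require 'securerandom'

# Heuristic: does the version string fall in the 1.0.1 to 1.0.1f range?
# Assumes a string shaped like "OpenSSL 1.0.1e 11 Feb 2013".
def heartbleed_range?(version_string)
  version_string.match?(/\b1\.0\.1[a-f]?\b/)
end

p heartbleed_range?("OpenSSL 1.0.1e 11 Feb 2013")   # true
p heartbleed_range?("OpenSSL 1.0.1g 7 Apr 2014")    # false
p heartbleed_range?("OpenSSL 0.9.8y 5 Feb 2013")    # false

# A fresh random session secret: 64 random bytes, hex encoded.
new_secret = SecureRandom.hex(64)
puts new_secret.length                              # 128
```

To check the version Ruby itself was built against, `require 'openssl'` and pass `OpenSSL::OPENSSL_VERSION` to the helper.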

Some PoCs have since been released.