Subscribe to Updates

Click here to subscribe to new posts by email. We use Google FeedBurner to send these notifications.

Regular Expressions 101 (Part 1)

Posted on: April 6th, 2007 by Craig Rairdin 1 Comment

Note: This corrects some missing pieces of information from the newsletter version of this article.

As users of PocketBible and MyBible you are aware that Laridian licenses books from Christian publishers and publishes them electronically. These include Bibles, commentaries, Bible dictionaries, devotionals and other Bible reference works that aren’t covered by those four general categories.

Since you’re getting this newsletter you are also familiar with our BookBuilder program which allows you to take original content (or content for private use) and turn it into a Laridian electronic book (.lbk). What you may or may not realize is that BookBuilder is the same program that we use internally to develop content for PocketBible and MyBible.

When we license content from Bible publishing houses we take what they give us as “electronic” files and turn it into what you purchase and install on your device. We get lots of different file formats from publishers: text files, pdf files, Quark files, Word documents, etc.

Since we get so many different file types there isn’t a specific program or procedure that we can use to automatically turn them into an .lbk. (Wouldn’t that be nice!?) The process is similar for each title, but not standardized.

What we work towards is to get every title into an html format and then we use TextPad to edit the html file. (An evaluation copy of TextPad is included with the BookBuilder product and also available at http://www.textpad.com/.)

One of the reasons we use TextPad is that it supports Search and Replacing using “Regular Expressions.” Regular Expressions (regexp or regexes) are a (very powerful) way of finding text using pattern matching. So, for instance, I can use regexps to find Bible references in a book file and insert tags around the references in a global manner.

Here’s an example:

To find Gen 2:7 or Romans 3:23 I would use the following regexp:

([A-Za-z]*)([0-9]*):([0-9]*)

The [A-Za-z] tells TextPad (or any other program that support regexp) to look for any letter, capital or lower case. The () around the [A-Za-z] is regexp way of telling the program to hold on to that character. The * tells the program that I’m looking for one or more character that matches and allows the program to keep it as a string. In our Bible reference example this would be Gen or Romans.

The next thing that you see in the regexp is a space. (Represented by ␢.) This is important as it helps to establish the pattern for which we are looking.

Understanding what the [A-Za-z] is doing makes it fairly straightforward to see what the next group is doing. [0-9] is looking for any number zero through nine. Again the () and * tell the program that we want to hang on to the string and it may be one or more character. In our Bible reference example this is the 2 or 3 indicating the chapter number.

The semicolon next again helps to establish the pattern. And the repeated ([0-9]*) is asking for the verse numbers. In our example the 7 and 23.

So, now what?

Now that you have the pattern established you can put this into the “Find what” field in TextPad. Making sure that the “Regular Expression” box is checked when you click “Find Next” you will step through your document finding each occurrence of a basic Bible reference.

The next step is to write your “Replace with” expression.

In the Laridian book we show that a Bible reference is a Bible reference using the following tag:

<pb_link format=”bcv | bc | cv | c | v”>…</pb_link>

So we would want the following tags for our examples:

<pb_link format=”bcv”>Gen 2:7</pb_link>

and

<pb_link format=”bcv”>Romans 3:23</pb_link>

Creating our “Replace with” expression is simple:

<pb_link format=”bcv”>1␢2:3</pb_link>

The only part of this expression that is regexp syntax are the 1 2 and 3. These indicate the three strings of () that we had collected in our “Find what” expression and indicate to the program where to place each string.

and. These indicate the three strings of that we had collected in our “Find what” expression and indicate to the program where to place each string.Two things to note:

1.) What I’ve just demonstrated here is done automatically in the VerseLinker program so it is rare that you would use this exact regexp in a Search and Replace. However, once you understand this you can use it to insert tags in commentaries that indicate where to place index and sync tags. (Your “Replace with” will be a different tag.) It also lays the ground work for some other powerful regexp that we’ll talk about in a later newsletter.

2.) This example will only find simple straightforward Bible references. You will need to restructure your “Find what” expression to find the following different kinds of Bible references: 1 Samuel 3:7-12; v. 1-3; Num 4; 3 John 8; Jude 3; etc. All of these can be handled with regexp, you just need to figure out how to structure your expressions.

This is a very simple example of what is a very powerful and potentially complex tool that we use to tag our books. Over the next few newsletters I’ll talk about more of the basics of regexp and maybe tackle some higher level expressions. In the course of tagging numerous books there really are only a handful of basic regexp components that are used regularly. I’ll try to cover these to give you what you need to tag just about anything.

Let me know if you have any specific questions or would like clarification on any of this.

One Response

  1. mcleod777 says:

    Thanks for the update. I’ve been trying the simple stuff but am having trouble coming up with an expression for the following different kinds of Bible references: 1 Samuel 3:7-12; v. 1-3; Num 4; 3 John 8; Jude 3

    Do you have time to help with these expressions?

Leave a Reply

©2014 Laridian