Analyzing Strings

References

  1. MSDN
  2. Roxen

Traditional Smalltalk dedicates a large number of methods to analyzing strings. The C runtime library, which Smalltalk MT makes extensive use of, offers a well-documented alternative for scanning strings: sscanf.

The C runtime library uses a 'format specification', which is described in the runtime library's documentation for the scanf family of functions. Basically, you define a format string with a % wherever you want to extract an item, and after the % you indicate how that item should be read.

For example, suppose you are reading a string and want to read up to 5 characters, a space, and then up to 8 more characters. Try this code in a workspace.

str := 'YHOO 11/4/02'.
stock := String new: 5.
date := String new: 8.

WINAPI sscanf: str basicAddress
        with: '%5s %8s'
        with: stock basicAddress
        with: date basicAddress

Here the format specification says: read up to 5 non-whitespace characters, skip the space, then read up to 8 non-whitespace characters. If you display the result of the sscanf, it should return 2, indicating that two items were matched. You can use this return value as a test to see if you matched all of the items you expected.

If you experiment with other strings, e.g. 'A 1/1/90' and 'EMLXA 12/30/01', you will see that sscanf correctly retrieves both items from each of them.
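
For comparison, here is a minimal sketch of the same scan written directly in C, since that is the function the Smalltalk MT call ends up in. The helper name parse_symbol_and_date is hypothetical, and the widths 5 and 8 are chosen so that the longest sample above ('EMLXA 12/30/01') fits. Note that in C each buffer needs one extra byte, because sscanf appends a terminating '\0' after the characters it stores.

```c
#include <stdio.h>

/* Hypothetical helper: splits "SYMBOL DATE" into two caller-supplied buffers.
   stock needs room for 5 characters plus '\0', date for 8 plus '\0'.
   Returns the number of items sscanf assigned (2 on a full match). */
int parse_symbol_and_date(const char *line, char stock[6], char date[9])
{
    stock[0] = '\0';
    date[0] = '\0';
    /* %5s : up to 5 non-whitespace characters
       ' ' : skips any run of whitespace (including none)
       %8s : up to 8 non-whitespace characters */
    return sscanf(line, "%5s %8s", stock, date);
}
```

Calling parse_symbol_and_date("YHOO 11/4/02", stock, date) fills both buffers and returns 2. A line that contains only a symbol returns 1, which is how the return value can be used to detect an incomplete match.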

But this is just the beginning, since sscanf can scan numbers as well.

Assume that you have a string with a stock symbol of up to 5 characters, a space, and the last trade price.

str := 'YHOO 15.06'.
stock := String new: 5.
price := LONG new.

WINAPI sscanf: str basicAddress
        with: '%5s %f'
        with: stock basicAddress
        with: price basicAddress.
Float value: price

Notice that the float stock price must be placed into a LONG. This is because sscanf stores a 32-bit float through the pointer it is given (not a 64-bit float as represented in Smalltalk MT), and a LONG provides the 32 bits of storage that %f needs. The last line converts the contents of that LONG into a 64-bit Smalltalk float.
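
The same 32-bit versus 64-bit distinction shows up when the scan is written in C. The following is a sketch only, with a hypothetical helper name (parse_symbol_and_price): %f stores into a float, which is then widened to a double, mirroring what the last line of the Smalltalk example does with the LONG.

```c
#include <stdio.h>

/* Hypothetical helper: parses "SYMBOL PRICE".
   Returns the number of items assigned; *price is set only on a full match. */
int parse_symbol_and_price(const char *line, char stock[6], double *price)
{
    float raw = 0.0f;   /* %f stores a 32-bit float through its pointer */
    int matched;

    stock[0] = '\0';
    matched = sscanf(line, "%5s %f", stock, &raw);
    if (matched == 2) {
        *price = (double)raw;   /* widen the 32-bit value to 64 bits */
    }
    return matched;
}
```

Because the value passes through a 32-bit float, the result for '15.06' is only accurate to about seven significant digits. In C, the %lf conversion stores directly into a double and avoids that intermediate step.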

Now one last, more powerful example. Assume that we have a string in which a variable-size stock name is immediately followed by a date, with no separator. We want to scan the string for the first numeric character and use that point as the split between the alphabetic stock name and the numeric date. Here is the code:

stock := String new: 5.
date := String new: 8.
str := 'YHOO11/2/02'.

WINAPI sscanf: str basicAddress
        with: '%[^0-9]%s'
        with: stock basicAddress
        with: date basicAddress.

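Here the '%[^0-9]' matches a run of characters that are not digits, which yields the stock name. The %s that follows then picks up the remainder of the string (up to the next whitespace) as the date. Because neither conversion specifies a width, the buffers must be large enough for the longest input you expect; adding widths, as in '%5[^0-9]%8s', limits how many characters are stored.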
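
A minimal C sketch of this split, with widths added so that neither buffer can be overrun. The helper name split_symbol_and_date and the widths 5 and 8 are assumptions made for illustration. The 0-9 range form inside the brackets is implementation-defined in ISO C, although common runtimes accept it.

```c
#include <stdio.h>

/* Hypothetical helper: splits "SYMBOLdate" at the first digit.
   Returns the number of items assigned (2 on a full match,
   0 if the line starts with a digit and so has no symbol). */
int split_symbol_and_date(const char *line, char stock[6], char date[9])
{
    stock[0] = '\0';
    date[0] = '\0';
    /* %5[^0-9] : 1 to 5 characters that are not digits
       %8s      : up to 8 non-whitespace characters */
    return sscanf(line, "%5[^0-9]%8s", stock, date);
}
```

With 'YHOO11/2/02' this yields 'YHOO' and '11/2/02'. A bracket conversion must match at least one character, so an input that begins with a digit returns 0 instead of an empty stock name.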

So consider using 'format specifications' whenever you need to analyze strings and numbers.