Recursivity: August 2007

Friday, August 31, 2007

A Google Query

What's the shortest string of lower-case letters and numbers (no spaces or other characters) you can type into google and not get any hits?

The shortest I've found so far, by non-systematic experimenting, is of length 5. I'd tell you what it is, but then it would be indexed by google and hence no longer work.

Oh, what the heck. I'll tell you anyway: zk4qj .

There. I've said it. And now. like Frankenstein, my lovely creation will soon be destroyed.

Monday, August 27, 2007

The Discovery Institute Lies Again

In this latest posting, the Discovery Institute shows once again why it has a well-deserved reputation for dishonesty.

The DI is touting William Dembski's No Free Lunch as "essential reading". But there is no mention that the book has received uniformly negative reviews from biologists and mathematicians, nor that the centerpiece calculation of the book, an estimate of a probability associated with the bacterial flagellum, is off by about 65 orders of magnitude. There is no mention that the definition of "complex specified information" is nonsensical and does not have the properties claimed for it. There is no mention that David Wolpert, co-discoverer of the "No Free Lunch" theorems mentioned in Dembski's book, criticized Dembski's argument as hopelessly imprecise.

But what else can you expect from the Dissembling Institute?

Thursday, August 23, 2007

The Noncommutative Frobenius Problem is Solved!

I rarely get the opportunity to talk about research problems on this blog, but this is an exception.

Consider the famous "Chicken McNuggets problem": if Chicken McNuggets are sold at McDonald's only in boxes of 6, 9, or 20 McNuggets, what's the largest number of McNuggets you can't buy at McDonald's? The answer happens to be my favorite number, 43. (Why it is my favorite is a story that will have to wait for another day.) Notice that you can buy any number of McNuggets greater than 43. For example,

44 = 4*6 + 1*20,

45 = 5*9,

46 = 1*6 + 2*20,

47 = 3*6 + 1*9 + 1*20,

48 = 8*6,

49 = 1*9 + 2*20,

and any number greater than 49 can be obtained by adding an appropriate multiple of 6 to these.

In general, you're given a set S of integers, and you want to know the largest number that cannot be expressed as a non-negative integer linear combination of the elements of S. This is called the Frobenius number because Frobenius is supposed to have mentioned it often during his lectures.

Of course, such an integer may not exist. For example, if S = {4, 6}, then you can only get multiples of 2 by forming an integer linear combination of 4 and 6. So you need to impose an important condition: the greatest common divisor of the elements of S must be 1. If this is the case, then you can express every sufficiently large integer as a non-negative integer linear combination of the elements of S.

Now, solving the classical Frobenius problem is easy in some cases. For example, here's a well-known formula for the case when S has only two elements, x and y, and here it is: xy - x - y. So, for example, the Frobenius number of 5 and 7 is 23. Every number larger than 23 can be expressed as a non-negative integer linear combination of 5 and 7. 24, for example, would be 2*5 + 2*7, and so forth -- but 23 itself cannot.

Greenberg wrote a paper in 1988, and Davison wrote another paper in 1994, in which they gave an efficient algorithm for three numbers.

Unfortunately, the general problem was proved NP-hard (under Turing reductions) by Ramirez Alfonsin in 1996. Roughly speaking, this means the problem is at least as hard as many classical problems for which we still have no efficient solution, such as the traveling salesman problem.

About 6 years ago, I suggested generalizing this problem from numbers to strings of symbols (sometimes called "words"). This kind of generalization is a typical activity in mathematics and theoretical computer science. You take a well-studied problem over one kind of domain, and see how the problem translates in another. The classical Frobenius problem dealt with positive integers, so we'll replace them with strings. Now S will be a set of strings.

Now we have to find the analogue of a non-negative integer linear combination. This is no problem: strings like to join up with other strings, in a process called concatenation, so our non-negative integer linear combinations will be replaced by all possible ways to concatenate the strings of S, using any of them any number of times (including 0 times). This process is sometimes called Kleene closure after the American logician Stephen Kleene, and we write it as S^*. For example, if S = {0, 01}, then S^* = { ε, 0, 00, 01, 000, 001, 010, ... }. That funny ε at the beginning is the usual mathematical representation for the empty string: the string with no letters at all in it. Notice that order matters in concatenation (houseboat isn't the same as boathouse), so we've gone from a commutative setting to a noncommutative one.

So S^* is an infinite set of strings. (By the way, in this particular example, where S = {0, 01}, the set S^* happens to have a nice description: it's the set of all strings that don't begin with a 1, and don't have two consecutive 1's appearing anywhere.)

OK, now we've generalized the "non-negative integer linear combination" part. How about the condition that the greatest common divisor (gcd) of the elements of S must be 1? Well, the point of that condition was just to ensure that every sufficiently large integer had a representation, so rather than worrying about how to define the greatest common divisor of two strings, let's replace the gcd condition with: every sufficiently long string must be in S^*. There's another way to say this: the complement of a set T is the set of all strings occurring in the universal set that don't occur in T. Here the universal set is just the set of all strings. So our condition is that the complement of S^* should be finite. In the arcane lingo of mathematicians, we can abbreviate this as: S^* is co-finite.

Now, are there any examples where S^* is co-finite? Yes. One example is S = {1, 00, 01,10, 000, 01010}. It shouldn't be too hard to convince yourself that every string except 0 and 010 can be written as concatenation of strings in this set. So in this case S^* is co-finite.

Finally, we need an analogue of the largest integer not representable. That's not hard --- it's just the length of the longest string not in S^*. In the previous example, where S = {1, 00, 01,10, 000, 01010}, the answer would be 3, since 010 is the longest string omitted from S^*. Let's call this the Frobenius number of the set of strings S.

Now it's not too hard to show that if the longest string in S is of length n, and S^* is co-finite, then the Frobenius number is bounded by a function which is exponential in n. For example, over an alphabet of two letters, you can show that the Frobenius number is < (2/3)*(4ⁿ - 1). (In case the exponent doesn't show up well in your browser, that's 4 to the n power, minus 1, times two-thirds.)

I figured out this generalization a while ago, and the upper bound a year or so ago. But I could not find any example where the Frobenius number was anywhere near as big as that upper bound. Indeed, for a long time, the worst-case example I knew was when you took S to be the set of all strings of length n, together with all the strings of length n+1. In this case, the Frobenius number is easily shown to be n² - n - 1.

I gave the problem to my Ph. D. student Zhi Xu, who only started a few months ago. We tried many different possibilities, but eventually we figured out that you should try sets S that contain almost all strings of some lengths, omitting only a few. And recently he made a big breakthrough: he was able to construct examples where the Frobenius number is exponentially large in n, the length of the longest string. Here's one of his examples: take S to be over a binary alphabet, that is, over {0,1}, and let it contain all the strings of length 3, all the strings of length 5, except that you remove the strings {00001, 01010, 10011}. Then S^* is co-finite, and the length of the longest string not in S^* is 25. For example, 0000101001100000001010011 can't be gotten by concatenating the strings of S.

So our understanding of Frobenius problem for strings is now greatly improved, in the sense that there is an exponential upper bound and a class of examples that provides an exponential lower bound. The bounds are not exactly tight, but they are much tighter than ever before. Zhi Xu has many other results on this problem, and you can see our paper on the subject here. We wrote it together with Jui-Yi Kao, another student of mine who also made some very nice contributions. I expect this paper will form the basis of Zhi Xu's Ph. D. thesis. Of course, there is much more to be done in this area.

I hope you enjoyed this tour of my recent research, and congratulations to Zhi Xu for a job well done!

Wednesday, August 22, 2007

How Stupid Do They Think We Are?

I see that the crafty and unctuous Sal Cordova is up to his old tricks again, this time truncating a quote of Darwin to make it appear that Darwin endorsed intelligent design. No doubt when called on this dishonesty, he'll call it "street theater".

David Warren: Blowhard of the Month, Again

David Warren, who I previously celebrated as Blowhard of the Month back in January 2006, is at it again.

Take a look at this recent column, which believe it or not, was actually published in the Ottawa Citizen.

From the very first line, Warren demonstrates ignorance (or dishonesty). He calls Paul Feyerabend a "scientist", when in fact, Feyerabend was a philosopher.

He then celebrates Freeman Dyson, a once-respectable mathematician and physicist who has become slightly, uh, eccentric in his dotage. (Dyson even wrote a friendly foreword to Elizabeth Lloyd Mayer's credulous Extraordinary Knowing.) Why does Warren like Dyson? I doubt Warren could coherently explain even one of Dyson's contributions. No, what Warren likes about Dyson is that Dyson is a global warming skeptic. Of course Dyson's opinion about global warming is about as valid as mine, considering that Dyson has no training in climatology and has not published any papers on the subject.

Later, Warren refers to "Darwinist cosmology". Considering that Darwin was a biologist, not a physicist or a cosmologist, I can only wonder what Warren thinks he is talking about.

Finally, in Warren's crowning moment, he completely misrepresents Michael Behe's testimony in the Dover case, claiming that "Behe demurred" when attorney Eric Rothschild said, "So you believe astrology is valid science." Of course, this is false, as a glance at the testimony will show. Rothschild never said what Warren claims. Here's the crucial exchange:

Q Under that same definition astrology is a scientific theory under your definition, correct?

A Under my definition, a scientific theory is a proposed explanation which focuses or points to physical, observable data and logical inferences. There are many things throughout the history of science which we now think to be incorrect which nonetheless would fit that -- which would fit that definition. Yes, astrology is in fact one, and so is the ether theory of the propagation of light, and many other -- many other theories as well.

Warren finishes by saying

How I wish the high priests of Darwinism could restrain themselves in similar ways. They can’t, because Darwinism is a religion, and moreover, a false religion, in fear of free inquiry.

Why is it that theists, when they want to insult science, they call it a religion?

Sunday, August 19, 2007

Return of the Vanity Scam

It seems the American Biographical Institute just won't give up.

They failed to entice me by selecting me as a "Great Mind of the 21st Century", so now they've upped the ante.

Now they're nominating me as "Man of the Year".

For only $195 (or $295 if I want it laminated on "Finland Birch Wood"), I can get a decree calling me "Man of the Year 2007", "based on his outstanding accomplishments to date and the noble example he has set for his peers and entire community."

Really, I wonder how vain you would have to be to fall for this transparent scam.

Oh. About this vain. Or this vain. Or this vain...

Tuesday, August 14, 2007

A New Word - "Outcheap" - and McCain's Unintentional Irony

On NPR this morning there were two items of linguistic interest.

The first was this interview with Joe White, the Detroit bureau chief for the Wall Street Journal. White discussed driving a plug-in electric car, and in the discussion he used a word I've never heard before:

"The reasons why we don't have all-electric cars running around - and this goes back more than a century - is that battery technology simply hasn't evolved to the point where we can outperform and outcheap gasoline and the internal combustion engine."

The OED doesn't list "outcheap" as a word, but it's clear what it means: "to be able to be produced more cheaply than an alternative". It reminds me of a word my friend Jean-Paul once invented, as a translation for the French defenestrer: to "outwindow". He was disappointed to learn that "defenestrate" already existed in English.

I found some earlier citations using Lexis. An article in Crain's Detroit Business, April 15, 1996, quotes Farmington Hills retailing consultant Frederick Marx as saying , ''Anybody can add volume giving it away, and that's what it's becoming: a contest of who can outcheap each other.''

The second item was this interview with presidential candidate John McCain.

"One of the things that drove Harry Truman's decision to integrate our military was how shocked and angered he was by the lynchings of two African-American World War II veterans. Truman was certainly no - you know, he used racial epitats in private conversation, etc., but he believed that his oath to protect the Constitution made him responsible to protect the rights of all the American people."

Here McCain used the nonexistent word "epitat". He really meant "epithet", but was probably mixing it up in his mind with "epitaph", and so out came this mélange of both words.

Ironically, McCain also said, "Let's have some straight talk. Harry Truman was not the greatest intellect that ever inhabited the White House, but he had the right instincts, and frankly, he had the humility and inspiration to recognize that you have to suborn your personal ambitions for the greater good."

Here McCain demonstrates that he doesn't know what the relatively uncommon word "suborn" means. According to the OED, it has several meanings, but the ones that are most familiar are

"to use illegal or underhanded means to induce someone else to do a misdeed"
"to induce someone to give false testimony" (as in "to suborn perjury")

McCain probably meant to say "subdue" or "subjugate". I like the irony that he was calling Truman intellectually deficient while, in the same sentence, he exhibited some ignorance himself.

Talk Radio Hosts Fired for Debunking Phony Racist Memo

From the Washington Post comes this sad story about two radio talk show hosts who have been fired for debunking a rumor, long passed around in the African-American community, that the Carter administration secretly plotted to undermine domestic black leaders.

Casey Lartigue Jr. and Eliot Morgan hosted "The Casey Lartigue Show", a weekly political talk show on XM satellite radio. They discussed this phony memo which exists in various versions on the Internet. (One tip-off that it's phony? Brzezinski's last name was spelled incorrectly.) The version I've linked to is on (no surprise) the website of Louis Farrakhan.

Lartigue and Morgan were fired when they exposed the memo as a fake. In doing so, they went against the XM's lead talk show host, Joe Madison, who still believes that the memo is real.

Talk radio: where telling the truth can get you fired. I guess that's why people like Rush Limbaugh still have jobs.

Monday, August 13, 2007

Combinatorics on Words, A Puzzle, and a Silly Research Proposal

One of the areas I work in is combinatorics on words. In this field, we are interested in words (strings of symbols) and their properties.

For example, a square is a nonempty word of the form xx, where x is a word. Some examples of squares in English include tartar and murmur. If a word contains no subword that is a square, we say it is squarefree. Notice that the word squarefree is not squarefree, as it contains the square ee.

A classical question, first discussed a hundred years ago by the Norwegian mathematician Axel Thue, is whether there exist infinite squarefree words over a finite alphabet. Over a two-letter alphabet, this is not possible, as any binary word of length 4 or more contains a square. (Proof: let the first letter be a. Then if the next letter is a, we have a square. So assume the next letter is b. If the third letter is b, we have a square again. So assume the third letter is a. Now, whatever letter is chosen as the fourth letter, we get a square.)

But is it possible over a three-letter alphabet? The answer is yes, as Thue showed.

Erdős considered a variation on this problem. Instead of trying to avoid squares, let's try to avoid abelian squares. An abelian square is a nonempty string of the form x x', where x' is a permutation of x. Some nice examples of abelian squares in English include reappear (the first half of the word reappears in the second half) and mesosome. Can you find the longest word in English that is an abelian square? Hint: it represents part of the body. If you're interested in this sort of wordplay, see my page on repetitions in natural languages, where the solution to the puzzle is given.

For many years it was not known whether one could construct an infinite word over a 4-letter alphabet that contains no abelian squares. But in 1992, Veikko Keränen showed that one could. His construction involves constructing a map that sends each letter in {a,b,c,d} to a string of length 85. Starting with a, one then simply iterates this map. For more information, visit Keränen's page.

Now we've covered combinatorics on words, and the puzzle. It's time for the third item in the title. Go visit this silly research proposal by a professor of civil engineering and engineering mechanics at Columbia University. The proposal is so poorly written that I can't really tell what he has in mind, but he seems to suggest representing strategies for development in third-world countries by strings of symbols that specify a temporal sequence of events. I really hope this is a joke, but if it is not a joke, I hope it doesn't get funded.

Recursivity