There were several requests on the next-to-last thread for definitions of specified complexity, so to redeem a promise and, I hope, eliminate the need to repeat the same definition ad infinitum, I’ll attempt to summarize the major points of Dembski’s paper on the subject here.
The context dependent measure of specified complexity is given by the following formula:
(pg 21)
This is the definition we are using in this course. M and N represent the replicational resources. In the general case, we may replace their product by 10^120 (an upper bound on the number of bit operations our universe could have performed throughout its history) to get
(pg 24)
Specificity is calculated by means of the following formula
(pg 18)
Aside from the omission of replicational resources (M and N), this formula is identical to the above formula for specified complexity. It is dependent on two components, φs(T) and P(T|H).
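Written out in the notation of this post, the three formulas can be sketched as below. The labels σ (specificity), χ̃ (context-dependent specified complexity) and χ (the version with M·N replaced by 10^120) are assumed here for convenience; the page numbers are the ones cited above.

```latex
\begin{align*}
\sigma        &= -\log_2\!\left[\varphi_S(T)\cdot P(T \mid H)\right]
               && \text{(specificity, pg 18)}\\
\tilde{\chi}  &= -\log_2\!\left[M \cdot N \cdot \varphi_S(T)\cdot P(T \mid H)\right]
               && \text{(context-dependent specified complexity, pg 21)}\\
\chi          &= -\log_2\!\left[10^{120}\cdot \varphi_S(T)\cdot P(T \mid H)\right]
               && \text{(with } M \cdot N \text{ replaced by } 10^{120}\text{, pg 24)}
\end{align*}
```

The only difference between σ and χ̃ is the factor M·N inside the logarithm, which is the omission of replicational resources described above.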
φs(T) is defined as a measure of the specificational resources, and is given by the cardinality of {U ∈ patterns(Ω) | φ’s(U) ≤ φ’s(T)} where patterns(Ω) is the collection of all patterns that identify events in Ω. φ’s(T) is the descriptive complexity of T (a measure of the simplest way for s to describe T).
(pg 21)
“S” represents the subject who determines & describes, with the languages available to him, the pattern.
To make this simple, consider the special case where the only language is the language used in programming register machines, and the patterns being described are bit strings. When analyzing a string of a given length, you determine first how long the program describing it must be, and then determine how many other bit strings of that length there are with descriptions at least as short. This second number is equal to φs(T).
In the general case we have many more languages and patterns far more complex than bit strings, but the same concept holds.
P(T|H) is the probability of pattern T given the relevant chance hypothesis H. P(T|H) ≠ 1 in cases where H, the chance hypothesis, is known, unless H is completely deterministic.
P(T|H) where T=heads and H is a fair coin toss is 1/2.
P(T|H) where T=(heads)(tails)(heads)(tails) and H is a series of independent fair coin tosses is 1/16.
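The two coin examples can be checked by brute force. This is a minimal sketch, assuming H is a sequence of independent fair tosses; the function name `prob_of_sequence` is made up for illustration.

```python
from fractions import Fraction
from itertools import product


def prob_of_sequence(target: str) -> Fraction:
    """P(T|H) for one exact sequence of independent fair coin tosses.

    Enumerates every equally likely outcome of len(target) tosses and
    counts how many of them match the target sequence.
    """
    outcomes = ["".join(p) for p in product("HT", repeat=len(target))]
    matching = sum(1 for outcome in outcomes if outcome == target)
    return Fraction(matching, len(outcomes))


print(prob_of_sequence("H"))     # 1/2
print(prob_of_sequence("HTHT"))  # 1/16
```

Enumeration only works for short sequences, but it makes the counting behind the 1/2 and 1/16 explicit.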
Finally, T is considered a specification if the measure of specified complexity is strictly greater than 1.
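To see how the pieces combine, here is a rough sketch of the threshold test. It assumes the measure has the form -log2[10^120 · φs(T) · P(T|H)], and it works with base-2 logarithms throughout so that very small probabilities do not underflow. The function names and the example values (500 fair coins, all heads, φs(T) = 2) are illustrative assumptions, not figures from the paper.

```python
import math

# log2 of the 10^120 bound that replaces M*N in the general case
LOG2_RESOURCES = 120 * math.log2(10)


def specified_complexity(log2_p: float, log2_phi: float,
                         log2_resources: float = LOG2_RESOURCES) -> float:
    """Return -log2(resources * phi_S(T) * P(T|H)), given each factor's log2."""
    return -(log2_resources + log2_phi + log2_p)


def is_specification(chi: float) -> bool:
    """T counts as a specification when the measure is strictly greater than 1."""
    return chi > 1


# Hypothetical example: 500 fair coins all landing heads, with phi_S(T) assumed to be 2.
chi = specified_complexity(log2_p=-500.0, log2_phi=1.0)
print(round(chi, 2), is_specification(chi))  # 100.37 True
```

With only 100 coins the same calculation gives a negative value, so under these assumptions the pattern would not count as a specification.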
Corrections from anyone comfortable with the paper are very welcome. In the meantime…
Michal wrote
Where is the complex specified information defined in a clear fixed unambiguous way, along with all the key sub-terms, Hannah?
Is there anything ambiguous about those definitions?
Thanks for that, Hannah.
I’m sure that others will be thrashing these terms out. But maybe class members and others might want to think about this in a somewhat different way.
The formula for specified complexity that is posted here is not unlike that that physical chemists might use to describe or calculate thermodynamic entropy. (I’m not the first or only one to notice this.) Recall (for the first time - LOL) that, in thermodynamics, entropy is a state variable. In other words, the absolute value of the entropy of a system is something that is independent of the pathway by which the system was attained.
My question for the class is this - is specified complexity a “state variable”? I think that this question makes one think about the issue, especially as it relates to the computation of things like specificational resources, in ways that we usually don’t see in discussions of CSI.
What do others think?
Comment by Art G — July 24, 2006 @ 4:58 pm
I’ve done pchem, but hadn’t thought of the resemblance.
No, because our calculations are dependent on the hypothetical path (through P(T|H)). Perhaps if one needed one could formulate a more general version, but I wonder if it might be too general to be actually useful.
Comment by Hannah — July 24, 2006 @ 5:06 pm
Thanks Hannah.
Okay, now using the definitions you provided, show me (1) each step in the calculation of the probability that ftsK protein in E. coli was designed and (2) please state your assumptions for each step and the reliability/standard error at each step.
Comment by Michal Hubl — July 24, 2006 @ 5:44 pm
I would have thought, after this, you’d be the one to owe me a favor? You seem to think it’s the other way round. ;)
It probably hasn’t occurred to you, but we actually don’t all have infinite amounts of time, and some of us have in fact our own studies and research. That would suggest there might be a limit as to how much you can reasonably ask of another person.
Comment by Hannah — July 24, 2006 @ 5:56 pm
It probably hasn’t occurred to you, but we actually don’t all have infinite amounts of time, and some of us have in fact our own studies and research. That would suggest there might be a limit as to how much you can reasonably ask of another person.
If you feel like you can’t meet my request in its entirety, then how about just showing us how you start out? Using your definitions, of course.
Maybe someone else who finds this stuff as straightforward as you make it sound can jump in and finish the job.
I mean all that’s at stake is the credibility of evolutionary biology and a shot at fame for you. I find it odd that you aren’t interested in following through. On the other hand, it seems like everyone is too busy to do these calculations, including ID promoters. That’s sort of an interesting data point.
Comment by Michal Hubl — July 24, 2006 @ 6:07 pm
Hannah
It probably hasn’t occurred to you, but we actually don’t all have infinite amounts of time, and some of us have in fact our own studies and research.
Thanks for the insult, by the way. As a matter of fact, Hannah, I believe the fact that scientists don’t have infinite time is one of the many reasons that most scientists ignore creationists and ID promoters.
And you already know this because it’s been explained to you here on this blog.
Comment by Michal Hubl — July 24, 2006 @ 6:10 pm
I admit, I can’t follow what the mathematical representations of specificity and complexity imply, but what of the common complaint that Dembski describes specification as function, and that he only defines complexity as something which cannot (yet) be explained by regularity and/or chance, such that he’s mathematically arguing that something functional whose origin we do not (yet) understand is designed.
That may or may not be an accurate description, but at the very least, CSI is a vague mathematical representation, with no tangible feature in biology as an example, correct?
IOW - has Dembski or anyone else actually found and empirically identified any features with these equations, or is it just an abstract assertion?
Comment by Dan — July 24, 2006 @ 6:22 pm
I’d like to point out what appears to be a glaring inconsistency in the definition of Phi_S(T). T refers to an element of Omega - the event space - as indicated by applying the probability measure P(.|H) to T. However, in the definition of Phi_S(T), the function Phi’(.) is applied both to T, which is a member of Omega, and to U, which is a member of Patterns(Omega). Even if Patterns(Omega) is a well-defined set - which is far from certain - Phi’(.), the descriptive complexity of a pattern or event, is not a well-defined function, since it is being applied to two distinct domains.
Dembski tries to get around the problem by proclaiming that “Alternatively, T can be conceived abstractly as a pattern that precisely identifies the event (target) T.” (p. 16). This definition, aside from being circular, is nonsense. The set Patterns(Omega) is a subset of the power set of BasicCommunicationSymbols(S), where S is the semiotic agent in question. (It should really be Patterns(Omega, S), to underscore the dependence on S). Omega is the physical event space. The same entity T cannot belong to both spaces at once.
I’m not saying that this definition of CSI is necessarily irredeemable, but right now it is completely broken. Hannah, care to try and fix it?
Comment by Leonid Meyerguz — July 24, 2006 @ 6:38 pm
I admit, I can’t follow what the mathematical representations of specificity and complexity imply
With respect to their application to natural (not engineered by humans) biological systems, nobody can … except for maybe Hannah and Sal and Bill Dembski. That’s why we’re waiting to see them apply this fantastic concept to a biological protein and prove that evolutionary biologists are wrong. Or something.
(checks watch)
Still waiting.
Comment by Michal Hubl — July 24, 2006 @ 6:38 pm
Silly SPAM filter can’t identify mathematical symbols. Hannah, I hope you get the chance to take a look at my last post once you fish it out of the SPAM filter. I think there are good reasons to attack Dembski’s definition of CSI on mathematical grounds alone, though it is the near-impossibility of applying the concept that constitutes CSI’s greatest weakness, IMO.
I’m on campus, so if Allen doesn’t mind, I might drop by the class when you guys are discussing Dembski’s work. Arguing on this forum may prove too time-consuming in the near future.
Comment by Leonid Meyerguz — July 24, 2006 @ 6:48 pm
Slight clarification: The various formulas entered are for bit-measures of specified complexity, not the CSI itself.
There is a difference between the measure and the actual information itself.
For example, I may say that the information in a file is 5000 bits, but that is shorthand for saying the MEASURE of information in the file is 5000 bits. The actual contents of the file are the information.
The formulas above are technically the measures of information. Dembski uses a common industry shorthand. But one should be aware of this nuance in discussion.
Comment by Salvador T. Cordova, IDEA GMU — July 24, 2006 @ 6:52 pm
Well, crackey! I had a not so long post ready to go, but didn’t type the anti-spam number and lost it down the memory hole!
So, here’s my question and comment:
First, thanks to Sal and other participants for hashing this out (sometimes a bit exhaustively, however necessary). As a layman in biology, genetics and math (you math geeks are sumpin’!), I’m learning a great deal and humbled by some of the intellects participating here.
I think Dan asks a good question: What is the practical application of Dembski’s CSI formula to biological systems?
Take E. coli for example - in light of Sal’s recent comments, what factors would one use in the formulae posted by Hannah to measure the CSI for this system? Would one only look at a subsystem, such as the flagellum, or the whole cell or both? Is the quantity of genetic base pairs relevant? How about the number of proteins needed to construct the flagellum? Or would we need the amino acid count?
Or, do we need a baseline to measure from (as in, an ancestral bacterium)?
Forgive me my great ignorance if these questions aren’t framed properly.
Comment by todd — July 24, 2006 @ 7:21 pm
There are several hypotheses H that can be proposed, actually an infinite number. CSI is offered with respect to each H, but if Dembski’s displacement theorem is correct (which is still debated), a simple distribution is no worse on average than blind watchmakers trying to find the optimal distribution by random chance. Thus the average CSI over all possible distributions will not be any less than the CSI under simple distributions like the uniform one. Thus one does not have to explore an infinite number of H’s to make a reasonable design inference….
I cannot possibly go into all the details. I can only give a crude and very crude approximation.
We can explore the design of a protein under a lock-and-key metaphor. That is, there is no a-priori reason why a certain protein must be picked out over another. However, if we have pairs or sets of proteins, we can characterize the likelihood they will be well-matched.
Given protein-A in a system, protein-B will be the only one that can fit the role. This is a simplification of course, but we must start somewhere.
Let us start with the purely stochastic uniform distribution case first, then move on to more sophisticated ones where H involves selection, not just chance.
For example, a 100-mer protein may have 20^100 possible configurations. It may require 25 monomers to characterize it, with the other 75 permitted to be variable. P(T|H) for hitting that protein is CRUDELY approximated by 20^25/ 20^100 = 1/ 20^75. That’s a crude approximation because I did not include the issue of synonymous codons, but that’s kind of how the basic calculation is done.
If one argues natural selection improves the odds over random chance, one must justify that in view of Dembski’s displacement theorem, which says evolutionary algorithms, without specific direction, perform no better than random chance on average. That is, the selective forces themselves have to be an engineered search strategy. One cannot merely assume statistically that there existed a selective force to improve the search….
Crude CSI measure in this case for a single trial is:
-log2( 1 / 20^75) = 324 bits
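The arithmetic on the line above can be reproduced in a few lines. This sketch takes the exponent 75 and the alphabet size 20 as given in the comment rather than as settled values, and the optional `log2_trials` parameter (a stand-in for log2 of M·N) is an assumption added for illustration.

```python
import math


def crude_bits(exponent: int, alphabet: int = 20, log2_trials: float = 0.0) -> float:
    """Return -log2(trials * alphabet**(-exponent)), computed with logarithms.

    With log2_trials = 0 this is the single-trial figure; a positive value
    stands in for log2(M*N) and lowers the result accordingly.
    """
    log2_p = -exponent * math.log2(alphabet)
    return -(log2_trials + log2_p)


print(round(crude_bits(75), 1))  # 324.1
```

Working with logarithms avoids forming 20^75 or its reciprocal directly, and it shows that each extra factor of two in the number of trials removes one bit from the total.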
For multiple trials one can put an MN that they think appropriate. Since this is not presumed an algorithmically compressible case, one can drop the Phi_s(T) term.
That’s a sketch. It is not complete, but that’s a start. We do not assert intelligence, we merely assert the confidence we can have that H is the correct hypothesis. The more bits, the higher the confidence with which we can reject the hypothesis H.
One does not need UPB if that’s not the level of confidence one is trying to assert. If one will reject a theory because it is likely to be 99.99% wrong, then there is an appropriate number of bits to reach that conclusion. UPB is for context-independence.
For chance hypothesis H, we are trying to show it leads to a contradiction of the form: “E implies not-E” with X% amount of confidence.
Given that, I rather prefer this example since it explores ALL possible probability measures P on any arbitrary H, both known and unknown:
Perfect Architectures Which Scream Design.
Comment by Salvador T. Cordova, IDEA GMU — July 24, 2006 @ 7:38 pm
Given protein-A in a system, protein-B will be the only one that can fit the role.
What role?
Why do you insist on abstraction, Sal? I can imagine why you might insist on abstraction — and particular abstractions at that — but maybe you can tell us why you refuse to show us how Dembski’s “theory” which you are so enamored with can be applied to a real world example.
Try it with ftsK. I picked it off the top of my head, by the way. I can give you another if ftsK frightens you for some reason.
Comment by Michal Hubl — July 24, 2006 @ 7:49 pm
Allen, Hannah,
I object to students being interrogated like this. The questions should not be directed at Hannah.
This weblog is for students to learn, IMHO, not for outsiders to demand they be taught by Cornell students or for Cornell students to be subjected to demands to “fix” Dembski’s work.
I think this is highly rude, IMHO.
Salvador
Comment by Salvador T. Cordova, IDEA GMU — July 24, 2006 @ 7:51 pm
Actually, I agree with Sal.
Allen should put Mike’s question on the final examination for the course …
Comment by Don Baccus — July 24, 2006 @ 7:55 pm
Geez, I’m out of depth! Please let me know if I understand Sal’s post correctly: Simply put, one begins with what we know about the arrangement of a given system’s components, then determine the probability those components would assemble into a functional whole without purposeful arrangement?
Comment by todd — July 24, 2006 @ 7:59 pm
How about a simple example then instead of biology, and then we work our way up. :-)
Salvador
Comment by Salvador T. Cordova, IDEA GMU — July 24, 2006 @ 8:06 pm
OK, for the sake of argument, let’s just say everyone agreed that overwhelming evidence indicated that the bacterial flagellum evolved via natural processes. Here’s my question: in this case, would the bacterial flagellum exhibit specified complexity?
Answer “yes” or “no”, and support your answer…
Comment by nmatzke — July 24, 2006 @ 8:09 pm
I’m afraid that’s incorrect because T is a subset of Omega, thus it is not an element of Omega since T contains elements of Omega.
I illustrated that the calculation can be done here
If it is countably finite, patterns Omega is the Power Set of Omega, thus it is well defined for the issues under consideration.
That is not correct, he is merely saying certain T can be a singleton set within Omega. That has nothing to do with trying to get around the problem of defining Phi.
I already hinted how to calculate Phi_S(T). It is merely the cardinality of the set T. Sometimes the set T is hard to characterize. Fine, choose another T where Phi_S(T) is tractable. If the T under consideration is too hard to characterize with a Phi, then don’t use it. Better to make a false negative than a false positive.
I don’t think that’s correct. Dembski did not say that, he said
Thus he did not say T belonged to both, even though Phi(T) is a positive integer calculable through semiotic agents.
I think you need to fix your critique first. Finally, I don’t think it’s appropriate to make students studying a theory defenders of it as well.
Comment by Salvador T. Cordova, IDEA GMU — July 24, 2006 @ 8:34 pm
Sal,
I was going on Dembski’s prison term coin toss scenario in the paper.
1. We need the playing field.
2. We need the equipment
3. We need the game permutations
Right?
Comment by todd — July 24, 2006 @ 8:36 pm
Sal (and Hannah):
I actually agree with Sal here, and I’m sorry if my comment #7 comes across as too pushy. Hannah, please don’t take my comment above as a “demand” that you fix Dembski’s definition; rather, I am primarily curious what you, as a math major, think about my observations above. I am generally quite impressed with the quality of your posts, and your genuine efforts to convey clear and persuasive arguments: hence my interest in your PoV. However, please don’t feel like I’m trying to give you a homework assignment; if anything, most undergrads here at Cornell have plenty of those as it is.
Comment by Leonid Meyerguz — July 24, 2006 @ 8:36 pm
Let me suggest this thread would not be a good place for peer review of Dembski’s math. We can proceed with the discussion without delving into the higher recesses of it, especially the math that is already well accepted in the discipline of statistics.
However, before I completely dismiss the topic let me illustrate how a semiotic agent can calculate Phi_S for certain T’s. I apologize for the extreme brevity. Again let’s work with 500 fair coins.
1. “take first coin and repeat” There are two members that conform, thus Phi_S(T_take_first_coin_and_repeat) = 2 = Phi_S(T_all_heads) = Phi_S(T_all_tails)
H H H H H
T T T T T
2. “take first two coins and repeat” There are four members that conform,
Phi_S(T_take_first_two_coins_and_repeat) = 4 = Phi_S(T_all_heads) = Phi_S(T_all_tails) = Phi_S(T_HT_repeat) = Phi_S(T_TH_repeat)
H T H T H T
T H T H T H
H H H H H
T T T T T
etc.
Notice T_all_heads can be a member of either a T with cardinality 2 or a T with cardinality 4. Under Dembski’s definition, one will choose the lower cardinality.
Also, it did not matter that I used english to help me judge whether the members of set T belong together as they are describable by the same sentence.
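The counting in the two cases above can be sketched in code. This is only an illustration of the "take the first k coins and repeat" descriptions on a short 6-coin string, not Dembski's general definition of Phi_S(T); the function names are invented for the example.

```python
from itertools import product


def conforming_strings(n: int, k: int) -> set:
    """All n-coin strings produced by repeating their first k coins."""
    result = set()
    for prefix in product("HT", repeat=k):
        unit = "".join(prefix)
        # Repeat the unit past length n, then cut back to exactly n coins.
        result.add((unit * (n // k + 1))[:n])
    return result


def smallest_conforming_count(target: str) -> int:
    """Size of the smallest 'repeat the first k coins' set containing target."""
    n = len(target)
    for k in range(1, n + 1):
        members = conforming_strings(n, k)
        if target in members:
            return len(members)
    raise ValueError("unreachable: k = n always contains the target")


print(len(conforming_strings(6, 1)))        # 2
print(sorted(conforming_strings(6, 2)))     # ['HHHHHH', 'HTHTHT', 'THTHTH', 'TTTTTT']
print(smallest_conforming_count("HHHHHH"))  # 2
print(smallest_conforming_count("HTHTHT"))  # 4
```

All heads appears in both the k = 1 and the k = 2 sets, and the search returns the smaller count, which mirrors the "choose the lower cardinality" remark above.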
Ok, any further discussion, perhaps, we take it elsewhere or just wait till the end of the weblog. Otherwise we’ll lose the readers.
Salvador
Comment by Salvador T. Cordova, IDEA GMU — July 24, 2006 @ 8:58 pm
Dan–
A mathematical formulation, yes, but nothing vague about it.
Michal–
Oh, it wasn’t meant to be an insult, or at least, not an insulting one. Just a reminder that there is such a thing as an unreasonable request, even if the request is only made to an annoying student who spends too much time arguing :). The calculation of CSI in E. coli sounds like a fun research project. Unfortunately, I haven’t any time to take away from my other research project. So this one will have to wait.
Leonid–
No problem, you weren’t too pushy. And I was grateful you didn’t demand a doublespaced manuscript by midnight tonight (absolute deadline), because I want a bit of time to think about the set issue.
We went through Dembski’s work last week, and this week are on Johnson and all the random loose ends we haven’t yet covered. And after that is Ernst Mayr and the EB perspective on teleology. Of course, maybe we could call a special meeting to hash out CSI….
Comment by Hannah — July 24, 2006 @ 9:13 pm
Close enough. :-)
It is highly suited to the origin of life question (i.e. what are the chances it happened naturally). As far as the bacterial flagellum goes, we can analyze the plausibility of various evolutionary scenarios. The higher the numbers that are generated by the above formulas, the less likely we are to accept that a particular evolutionary pathway was the one.
But where IDers believe Dembski’s methods will eventually prevail is in the detection of linguistic constructs within biology, not so much asking “could this evolve via Darwinian evolution”? Most IDers are well past believing that. The more interesting question is detecting undiscovered designs in biology. Dembski’s methods are highly optimized to detect such linguistic constructs.
The question of evolution is only a fraction of what ID attempts to investigate.
Also in many instances the I is irrelevant to applying the majority of ID theory, but the D is very important. The above formula deals with D.
Salvador
Comment by Salvador T. Cordova, IDEA GMU — July 24, 2006 @ 9:18 pm
Salvador writes:
Right you are. I should have written “subset of the power set of Omega”, or “element of the power set of Omega”.
You’re doing no such thing. You are showing us how to compute P(T|H) for a series of coin tosses. This is trivial. You are not giving us a clue as to how to compute Phi_S(T).
Aside from a slight quibble - namely, every finite set is countable - this actually makes sense. Specifically, if we define Patterns(Omega) as the power set of Omega - call it PS(Omega) - then Phi’(T) is a well-defined function. It still doesn’t give us any means to compute except by vague analogy to compressability, but it’s a start.
No, no, no! I am amazed that you, the man who has all of Dembski’s books, would make such a claim. The cardinality of T - call it card(T) - is what lets us compute P(T|H) under the uniform null hypothesis H. Namely, card(T)=P(T|H)*card(Omega). Phi_S(T) is a different beast entirely. It is, to use your definition of Patterns(Omega) above, the number of subsets of Omega whose “descriptive complexity” (the Phi’(.) value) is no greater than that of T. If I were a churlish knave, I would suggest you stop misrepresenting Dembski and go read his latest paper. ;)
However, you did address my one concern - the inconsistency in the definition of Phi’(.) Of course, the fact that Dembski never explicitly defines Patterns(Omega) doesn’t do wonders for the clarity of the paper.
Comment by Leonid Meyerguz — July 24, 2006 @ 9:35 pm
Once again, about state variables and the like. Trying to link “the math” with some biology. (Just trying, mind you.)
I’d agree with you, Hannah, that specified complexity is not a state variable. Accepting this, perhaps you can reflect on the usual ID theorist’s approach (an example of which Sal gives in this thread) for estimating the specified complexity in a protein:
“For example, a 100-mer protein may have 20^100 possible configurations. It may require 25 monomers to characterize it, with the other 75 permitted to be variable. P(T|H) for hitting that protein is CRUDELY approximated by 20^25/ 20^100 = 1/ 20^75. That’s a crude approximation because I did not include the issue of synonymous codons, but that’s kind of how the basic calculation is done.”
Dembski uses the same approach, as does Meyer and most (if not all) other IDists. I think that this method is not consistent with the (correct) notion that specified complexity is not a state variable.
(Sal makes a different, basic but oft-repeated error in his calculation. But that’s for another discussion.)
Comment by Art G — July 24, 2006 @ 9:37 pm
Sal,
I meant to thank you in the last post for giving a cogent definition of Patterns(Omega). Somehow, that didn’t make it into the final version of my post. Anyhow, thanks.
Also, since that didn’t make it into my final post, I’d like to point out that the biggest problem in the definition of Phi_S(T) is now the Phi’(.) functions. Depending on how you define Phi’(.), these quantities are either not computable, highly subjective, or can be chosen so as to be very low for any T under consideration. (Meaning that every improbable event would exhibit CSI by definition.) While it may be possible to bound these quantities in the case of neatly constructed toy examples, it is probably impossible to compute them in the real world.
Comment by Leonid Meyerguz — July 24, 2006 @ 9:46 pm
You’re welcome.
I still don’t think that is quite right. T is a subset of Omega. T is a member of the powerset of Omega.
I was not very clear in my earlier link. My apologies. You may want to look at the example above then where I make explicit Phi_S(T) calculations.
There is no reason one needs a generalized method to cover every algorithmically compressible situation under the sun. If Phi_S(T) is intractable, then one would simply decide to characterize T as a pre-specification versus a specification, or simply drop T altogether.
My point was showing to the reader that Phi_S(T) does not have to be some super mysterious entity. If one is not comfortable with the conception, then one shouldn’t characterize T as a specification…
My language was inexact and thus led to confusion.
Phi_S(T) is the cardinality of the superset of T, call it T_superset, all of whose members have the same or less algorithmic complexity as T. For the readers’ benefit I gave a slight illustration of how that cardinality is calculated.
As I said, it’s only of interest if one is trying to identify every last designed algorithmically compressible string. One need not do that.
The issue is one does not have to compute it to make the formulas work. Domains where Phi_S(T) cannot be tractably computed imply we just don’t use T as a specification.
If it cannot be computed, then the domain of inquiry is beyond reach at that time, it does not negate the formulas for other domains of inquiry.
As I pointed out above, there is no absolute Phi_S, it is on a case by case basis, assuming T is even algorithmically compressible in the first place. If Phi_S is intractable, then treat T as a pre-specification, or simply admit that for a particular inquiry, CSI cannot be asserted even though it might possibly be there.
For more examples, let’s say we discover we have a pattern of 500 coins where the first 250 coins mirror the last 250 coins.
Call it T_single_target. T_single_target is a member of a superset of symmetric T’s, call it T_symmetric, where each member is a 500 coin string whose first 250 coins mirror the last 250 coins.
Phi_S(T_symmetric) = | T_symmetric | = 2^250
Phi_S(T_symmetric) P(T_single_target|H) = P(T_symmetric|H)
There are two ways to approach this. We can express context-dependent CSI crudely
A. -log2 ( Phi_S(T_symmetric) P(T_single_target|H) ) = 250 bits
or
B. -log2 ( P(T_symmetric|H) ) = 250 bits
A is the more proper approach, B would get the job done, get the same answer, but it takes a slightly unwholesome shortcut
Note: T_all_heads is a member of T_symmetric. For every element of T_symmetric one could use 2^250 as Phi_S(T) however in some cases that would be overkill as Phi_S(T_all_heads) = 2, but the formula would still work with Phi_S(T) = 2^250.
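Both routes can be checked by enumeration on a short string and then extended to 500 coins with logarithms. This is a rough sketch: it reads "mirroring" as the second half being the reverse of the first (a plain copy would give the same counts), and the function names are made up for the example.

```python
import math
from itertools import product


def is_mirrored(s: str) -> bool:
    """True when the second half of s is the first half reversed."""
    half = len(s) // 2
    return s[:half] == s[half:][::-1]


def approach_a(n: int) -> float:
    """-log2(Phi_S(T_symmetric) * P(T_single_target|H)), by enumeration."""
    strings = ["".join(t) for t in product("HT", repeat=n)]
    phi = sum(1 for s in strings if is_mirrored(s))  # |T_symmetric|
    p_single = 1 / len(strings)                      # one exact string
    return -math.log2(phi * p_single)


def approach_b(n: int) -> float:
    """-log2(P(T_symmetric|H)), by enumeration."""
    strings = ["".join(t) for t in product("HT", repeat=n)]
    symmetric = sum(1 for s in strings if is_mirrored(s))
    return -math.log2(symmetric / len(strings))


def closed_form_bits(n: int) -> int:
    """Same quantity via logs: -(log2(2**(n/2)) + log2(2**-n)) = n/2."""
    return -((n // 2) + (-n))


print(approach_a(10), approach_b(10))  # 5.0 5.0
print(closed_form_bits(500))           # 250
```

For 10 coins both approaches give 5 bits, and the closed form reproduces the 250 bits quoted for 500 coins, where enumeration would be out of reach.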
I’m sorry for delving into this more, but I wanted to reassure the readers that Dembski’s conceptions do not rise or fall on the ability to calculate Phi_S(T) for every T under the sun!
Comment by Salvador T. Cordova, IDEA GMU — July 24, 2006 @ 10:33 pm
The one problem I have with the CSI calculation is that there is no concrete method (that I’m aware of, anyway) for determining equality between a specification and an actual manifestation.
As a quick example, for Mount Rushmore, where we know that there was Complex Specified Information involved in its building, what sort of equality is required between the manifestation and the specification in order to register it as a match? This can have a very large impact on specificational complexity.
I think there is an intuitive comparison that is generally done (and is generally correct), but I think it should be a goal of ID to determine both how and why an object should be considered an implementation of its specification.
Comment by Jonathan Bartlett — July 24, 2006 @ 11:38 pm
In comment #10 Leonid Meyerguz wrote:
As Hannah pointed out above, we have finished with our discussion of Dembski (as far as this course is concerned), and are now moving on to a consideration of Phillip Johnson’s The Wedge of Truth. However, I think it would be very interesting to continue this discussion elsewhere, and suggest that the best venue might be at one of the weekly meetings of the Cornell IDEA Club. Hannah, what do you think? My impression from last week is that we have not exhausted this topic by any means….
Comment by Allen MacNeill — July 25, 2006 @ 12:16 am
Leonid:
That said, if you would still like to attend the seminar, feel free. The more, the merrier!
Comment by Allen MacNeill — July 25, 2006 @ 12:23 am
My claim that P(T|H)=1 where H is the hypothesis which actually matches how T arose was met with some justified opposition. Only when H is fully regular are we justified in arguing that P(T|H)=1. So we have established that for regularities, by definition, CSI cannot be generated. Sal and others raised the concept of chance but pure chance is also not really a likely explanation provided by science. In fact, a chance hypothesis of the order of 10^120 makes chance highly implausible or at least highly unsatisfying as an explanation. And since science when it comes to evolution does not rely on chance ‘explanations’ we cannot reject chance hypotheses as well. Note that in Dembski’s original formulation chance was excluded because it lacked specification.
So now we have left evolutionary hypothesis, what challenge does such a hypothesis face? Well, according to Dembski the hypothesis has to be causally specific and then science has to show that
The probability of each step has to be ‘reasonably large’. But a reasonably large probability would also have an unreasonably low Complexity.
We can understand this intuitively: A pathway which is so unlikely does not make for a very good scientific hypothesis and thus has to be rejected. In other words, any hypothesis which is sufficiently detailed and probable will by definition have little complexity and any hypothesis sufficiently improbable will have complexity but it will now be rejected because of its low probabilities.
Hence my claim that regularities are exempted by definition from generating complexity because P(T|H)=1 and that for regularity and chance hypotheses, P(T|H) has to be sufficiently large for the hypothesis to be sufficiently probable which also eliminates any hope for P(T|H) to generate any complexity. In fact, complexity as used by Dembski is merely a transformed probability. When probability is too small, it is complex but then any regularity and chance hypothesis will fail as it is too unlikely and when the probability is sufficiently large, it will fail because of too low complexity. In other words, the deck is fully stacked against ID.
Worse, for a given T, ID cannot even show it contains CSI given the hypothesis H that T is designed, since ID provides no specifics allowing one to calculate P(T|H) where H is the hypothesis that T is designed.
Which is why Dembski, when it comes to known hypotheses does not calculate P(T|H) where H is the design hypothesis but rather P(T|H) where H is the uniformly distributed probability function and thus P is easily turned into an unreasonably unlikely chance process which is now used as evidence that P must have been designed BUT no calculation of P(T|H) where H is the design hypothesis is provided.
If high P(T|H) where H is a regularity/chance hypothesis is needed to explain T then why is design decided based upon low P(T|H) where H most of the time is the uniformly distributed random distribution?
Of course, this conflation of terms leads to interesting problems since Dembski had to accept the concept of actual versus apparent CSI. In other words, there were cases which appeared to be designed (CSI) but in fact the CSI was generated/displaced by a natural process. So how does one distinguish between displaced and generated CSI? The Explanatory Filter does not tell us how.
So in short
1. CSI can only be generated by improbable hypotheses, which are then rejected as too improbable
2. It has yet to be shown that ID can generate CSI, so far only chance has been shown to be able to generate complexity.
3. Apparent and actual CSI or generated versus displaced CSI are introduced but no way is presented to differentiate between the two.
Hope this clarifies and extends my comments that P(T|H)=1 when H is the hypothesis which matches in sufficient causal specificity the actual pathway. Rather than being 1, P(T|H) has to be sufficiently large to be accepted as a valid hypothesis but such would also eliminate any CSI…
Comment by PvM — July 25, 2006 @ 12:38 am
In fact, my original comment was in response to a statement by Dembski (or another ID proponent) who argued that if science proposes a hypothesis which explains T sufficiently, P(T|H) has to be probable, destroying thus any CSI. But I cannot find the original quote.
Patience…
Comment by PvM — July 25, 2006 @ 12:39 am
Remember Dembski calculating the probability for the flagellar protein?
In fact Dembski shows that P(T|H) is so small that it has to be rejected as a valid hypothesis, although the probabilities involved show large amounts of CSI (read extremely low probabilities).
Miller et al observe
Dembski then offers his readers a calculation showing that the flagellum could not possibly have evolved. Significantly, he begins that calculation by linking his arguments to those of Behe, writing: “I want therefore in this section to show how irreducible complexity is a special case of specified complexity, and in particular I want to sketch how one calculates the relevant probabilities needed to eliminate chance and infer design for such systems” (Dembski 2002a, 289). Dembski then tells us that an irreducibly complex system, like the flagellum, is a “discrete combinatorial object.” What this means, as he explains, is that the probability of assembling such an object can be calculated by determining the probabilities that each of its components might have originated by chance, that they might have been localized to the same region of the cell, and that they would be assembled in precisely the right order. Dembski refers to these three probabilities as P_orig, P_local, and P_config, and he regards each of them as separate and independent (Dembski 2002a, 291).
This approach overlooks the fact that the last two probabilities are actually contained within the first. Localization and self-assembly of complex protein structures in prokaryotic cells are properties generally determined by signals built into the primary structures of the proteins themselves. The same is likely true for the amino acid sequences of the 30 or so protein components of the flagellum and the approximately 20 proteins involved in the flagellum’s assembly (McNab 1999; Yonekura et al 2000). Therefore, if one gets the sequences of all the proteins right, localization and assembly will take care of themselves.
To the ID enthusiast, however, this is a point of little concern. According to Dembski, evolution could still not construct the 30 proteins needed for the flagellum. His reason is that the probability of their assembly falls below what he terms the “universal probability bound.” According to Dembski, the probability bound is a sensible allowance for the fact that highly improbable events do occur from time to time in nature. To allow for such events, he agrees that given enough time, any event with a probability larger than 10^-150 might well take place. Therefore, if a sequence of events, such as a presumed evolutionary pathway, has a calculated probability less than 10^-150, we may conclude that the pathway is impossible. If the calculated probability is greater than 10^-150, it’s possible (even if unlikely).
When Dembski turns his attention to the chances of evolving the 30 proteins of the bacterial flagellum, he makes what he regards as a generous assumption. Guessing that each of the proteins of the flagellum have about 300 amino acids, one might calculate that the chances of getting just one such protein to assemble from “random” evolutionary processes would be 20^-300, since there are 20 amino acids specified by the genetic code. Dembski, however, concedes that proteins need not get the exact amino acid sequence right in order to be functional, so he cuts the odds to just 20^-30, which he tells his readers is “on the order of 10^-39” (Dembski 2002a, 301). Since the flagellum requires 30 such proteins, he explains that 30 such probabilities “will all need to be multiplied to form the origination probability” (Dembski 2002a, 301). That would give us an origination probability for the flagellum of 10^-1170, far below the universal probability bound. The flagellum couldn’t have evolved, and now we have the numbers to prove it. Right?
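The arithmetic in the quoted passage is easy to check. The sketch below is only an illustration, not Dembski's or Miller's own calculation; it works in log10 because the probabilities themselves underflow ordinary floats, and the names `log10_power` and `UPB_LOG10` are made up for this example.

```python
import math

def log10_power(base: float, exponent: float) -> float:
    """Return log10(base ** exponent) without forming the tiny number itself."""
    return exponent * math.log10(base)

# Universal probability bound as quoted in the passage: 10^-150
UPB_LOG10 = -150.0

# One protein at the relaxed odds of 20^-30
per_protein = log10_power(20, -30)

# Thirty proteins treated as independent: probabilities multiply, so logs add
origination = 30 * per_protein

print(f"per protein: 10^{per_protein:.2f}")
print(f"flagellum:   10^{origination:.1f}")
print("below the bound" if origination < UPB_LOG10 else "above the bound")
```

Rounding the per-protein figure to 10^-39 before multiplying gives the 10^-1170 in the passage; without rounding the exponent is closer to -1171. Either way the number only measures the uniform-chance assembly model, which is exactly the relevance question raised here.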
Or were the calculations without any relevance?
Comment by PvM — July 25, 2006 @ 1:05 am
Note: this is the corrected (i.e., unmangled) version of this post — Hannah
Salvador writes:
I am not sure I agree. Little of Dembski’s math has actually been
peer-reviewed, as he never publishes his work in relevant
mathematical journals. Many of his ideas are similar to those in
well-established computational theory, but are still new enough that
they should be scrutinized on their own. Therefore, I think this
forum is as good a place as any to discuss his work, especially where
interested mathematically inclined students might be concerned. The
worst thing that can happen is that we might all learn something.
Again, this is just plain wrong, and I strongly encourage you to
re-read what Dembski actually says. First, we need to compute
Phi’({HHHHH….H}), the “descriptive complexity” of the event T
described by the words “All Heads”. For instance, let’s take Phi’(T)
to be the number of English words needed to describe T (this is very
similar to what Dembski is doing on p. 18 of the paper when computing
the specificity inherent in the flagellum). Thus, Phi’(T) = 2. Now,
recall that Phi_S(T) = card({U in Patterns(Omega): Phi’(U) <=
Phi’(T)}), where Patterns(Omega)=PS(Omega), the power set of Omega.
Note that Phi_S(T) counts events in the event space Omega: that is,
elements U of PS(Omega) s.t. Phi’(U)<=Phi’(T)=2, in our example.
So, how many events U in the space Omega have Phi’(U)=2? Well,
besides “All heads”, there is “All tails”, {TTTTTT…..}. Then there
is “All Alternating”, {THTHTHTH…., HTHTHTHT…}. Then there is
“Head First”, a very large set consisting of {HTTTT…T, HTTTT…H,
… HHHHH….T, HHHHH…H}; the set contains 2^(n-1) elements, where n
is the number of coins. In short, there is exactly one event
corresponding to any description, though the event could be a very
large compound one. Now, assume we have w words in our language.
Then, only w^2 two-word descriptions are possible. Since there is at
most one event per description, there are at most w^2 events U in
PS(Omega) s.t. Phi’(U)<=2. Hence, Phi_S(T) <= w^2.
Thus, the specificity of T given the null hypothesis H for our
semiotic agent S, Spec_S(T|H), is given by:
Spec_S(T|H) = -log_2(Phi_S(T) * P(T|H)) >= -log_2(w^2 * 2^-n)
(Notice that the inequality flips direction because of the negated logarithm.)
Thus, if n=500 and w=1e5, Spec_S(T|H) >= 466.78 bits. Or
equivalently, an event of “descriptive complexity” no higher than T
has a probability of less than or equal to 1e10 * 2^-500 = 3.055e-141
of happening, from S’s limited perspective.
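For anyone who wants to reproduce the number, here is a minimal sketch of the bound worked out above. It assumes exactly what the example assumes: fair independent coins, so P(T|H) = 2^-n, and a vocabulary of w words with two-word descriptions, so Phi_S(T) <= w^2. The function name `specificity_lower_bound` is invented for this illustration.

```python
import math

def specificity_lower_bound(n_coins: int, vocabulary_size: int, words: int = 2) -> float:
    """Lower bound on -log2(Phi_S(T) * P(T|H)) for the 'all heads' pattern.

    Assumes Phi_S(T) <= vocabulary_size ** words and P(T|H) = 2 ** -n_coins.
    """
    log2_phi_upper = words * math.log2(vocabulary_size)  # log2 of w^2
    log2_prob = -n_coins                                 # log2 of 2^-n
    return -(log2_phi_upper + log2_prob)

bits = specificity_lower_bound(n_coins=500, vocabulary_size=10**5)
print(f"specificity >= {bits:.2f} bits")

# The same bound expressed as a probability, written as a power of ten
log10_probability = -bits * math.log10(2)
print(f"probability <= 10^{log10_probability:.3f}")
```

Working with logarithms keeps every quantity in ordinary floating-point range, and changing `vocabulary_size` shows how weakly the result depends on the size of the language.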
I hope the above clarifies an important point in Dembski’s work. I
think understanding Phi_S(.) is crucial before we can discuss the
mathematical and practical shortcomings of CSI. But hey, what do I
know - I don’t even own “The Design Inference”! ;)
Comment by Leonid Meyerguz — July 25, 2006 @ 2:34 am
Hannah or Allen:
My post #35 above is all mangled: I had some “less than” signs in my mathematical formulas, and the software mistook them for HTML tags, cutting out large portions of the text. I followed up with a post that removed the offending tags, but it was intercepted by the nefarious SPAM filter. Would you mind, when you get a minute, replacing #35 with the post in the SPAM filter? Thanks in advance.
Comment by Leonid Meyerguz — July 25, 2006 @ 2:45 am
Sal: If one argues natural selection improves the odds over random chance, it must justify that in view of Dembski’s displacement theorem which says evolutionary algorithms, without specific direction perform no better than random chance on average.
Which of course assumes that 1) Dembski’s displacement theorem makes sense (I argue it doesn’t) and that 2) random search (not to be confused with random chance) is hard. As I have shown, under the No Free Lunch theorem, random search is actually trivially effective.
So in other words, the displacement theorem does little to help resolve the issues here.
Other than that…
Sal: For chance hypothesis H, we are trying to show it lead to a contradiction of the form: “E implies not-E” with X% amount of confidence.
This could benefit from some clarification. In fact, I’d say it is mostly content free as presented here.
Comment by PvM — July 25, 2006 @ 2:47 am
Leonid, interesting posting.
Comment by PvM — July 25, 2006 @ 2:47 am
PvM:
Thank you, but as of right now, the posting is completely broken, with crucial terms undefined. The (hopefully) correct version is still languishing in Limbo that is the SPAM filter; hopefully, one of our heroic moderators will rescue it before long.
As for your own comment, #38, I am with you on everything you say, except, possibly, the part about random search. I read your Panda’s Thumb article, and totally agree with it from a technical standpoint. However, it doesn’t address the scenario, promoted by ID proponents, that the combinatorially large majority of, say, DNA sequences have no utility whatsoever, and only an exponentially small (in sequence length) minority of sequences can give rise to some function. If the ID assumption is correct - and while there is no convincing evidence to support it yet, I think it is plausible - and if this set of possibly functional sequences remained frozen in evolutionary time, then random search would indeed be woefully ineffective in locating such “targets”. So, I think the statement “random search is trivially effective” should be cautiously qualified.
Of course, evolution is not “random search” by any measure, and the “displacement theorem” is pure fluff when it comes to eliminating non-intelligent causes of bias in a search procedure (and, I think, when it comes to even defining what a search procedure is). So we don’t really need random searches to refute Dembski and company. :)
Comment by Leonid Meyerguz — July 25, 2006 @ 3:26 am
PvM says: “ID provides no specifics allowing one to calculate P(T|H) where H is the hypothesis that T is designed.”
This concerns me also. The usual method of hypothesis testing using likelihoods goes something like:
1) Observe some data D
2) Compute P(D|H0) for some sensible choice of null hypothesis H0
3) Compute P(D|Hi) for some range of competing hypotheses Hi
4) Compare the values obtained in 2) and 3) - bigger is better.
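The four steps above can be sketched with a toy coin example. Everything in it is an assumption chosen for illustration: the data are 500 heads in 500 tosses, the null hypothesis is a fair coin, and the competing hypotheses are simply coins with other biases. None of them is a design hypothesis; the point is only that step 3 needs an explicit probability model for each competitor.

```python
import math

def log_likelihood(heads: int, tosses: int, p_heads: float) -> float:
    """log P(D | H) for one specific sequence with `heads` heads, under a coin of bias p_heads."""
    tails = tosses - heads
    return heads * math.log(p_heads) + tails * math.log(1.0 - p_heads)

# 1) Observe some data D
heads, tosses = 500, 500

# 2) Likelihood under the null hypothesis H0 (fair coin)
null_ll = log_likelihood(heads, tosses, 0.5)

# 3) Likelihood under each competing hypothesis Hi (hypothetical biased coins)
competitors = {
    "bias 0.90": log_likelihood(heads, tosses, 0.90),
    "bias 0.99": log_likelihood(heads, tosses, 0.99),
}

# 4) Compare: bigger is better
best = max(competitors, key=competitors.get)
print(f"H0 (fair coin): log-likelihood = {null_ll:.1f}")
for name, ll in competitors.items():
    print(f"{name}: log-likelihood = {ll:.1f}, log ratio vs H0 = {ll - null_ll:.1f}")
print("best-supported competitor:", best)
```

Priors are ignored here, as in the outline; the comparison only says which hypothesis the data favour.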
So the obvious question is how to find P(D|H) when H is the design hypothesis. Without that, we seem to be having a barbeque but missing the beef, as it were.
At present CSI seems simply like a way to handwave around the hard work required in step 3, and jump straight from a small value under one hypothesis to a conclusion. Perhaps there’s something to it I’m missing, however. I’d be interested to hear more from the ID proponents.
Comment by MartinM — July 25, 2006 @ 6:59 am
Perhaps I should point out that I ignored the issue of priors in my post above. If all we’re interested in is whether or not a given piece of data favours one hypothesis over another, that’s an unnecessary diversion. The sketch I gave above shouldn’t be taken as complete, by any means.
Comment by MartinM — July 25, 2006 @ 7:28 am
Leonid–
I don’t see any post from you in the spam filter. Has it been released? Else maybe you could resubmit it, or email it to me (netid=hom4) and I’ll replace the old one.
We’d love to have you (and anyone else in the vicinity) join us for an IDEA discussion meeting on this topic. It’s not that much longer till the semester begins…
Comment by Hannah — July 25, 2006 @ 8:51 am
“random search (not to be confused with random chance) is hard. As I have shown, under the No Free Lunch theorem, random search is actually trivially effective.”
I must have missed this. Which post was that? Remember, it must be “trivially effective” in a large search space. The search space for a given 3 base-pairs to change in a 2 megabase genome is roughly 10^19. If I had to change 3 amino acids, it could be much more than that.
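The "roughly 10^19" figure can be checked under a few counting conventions. The comment does not say which convention it uses, so the three variants below (ordered positions, unordered positions, and unordered positions times the three alternative bases at each site) are assumptions made for illustration.

```python
import math

GENOME_LENGTH = 2_000_000     # 2 megabases
SITES = 3                     # number of base pairs that must change
ALTERNATIVES_PER_SITE = 3     # each base can change to one of 3 others

# Ordered choice of three positions
ordered_positions = GENOME_LENGTH ** SITES

# Unordered choice of three distinct positions
unordered_positions = math.comb(GENOME_LENGTH, SITES)

# Unordered positions, each with one of three possible substitutions
with_substitutions = unordered_positions * ALTERNATIVES_PER_SITE ** SITES

for label, count in [
    ("ordered positions", ordered_positions),
    ("unordered positions", unordered_positions),
    ("unordered positions with substitutions", with_substitutions),
]:
    print(f"{label}: about 10^{math.log10(count):.1f}")
```

All three land within an order of magnitude of 10^19, so the estimate holds up whichever way the triples are counted.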
Comment by Jonathan Bartlett — July 25, 2006 @ 8:54 am
Hannah(23),
What biological parameters, that we can actually test and observe, is it characterizing then? If
nothing, as I think is the case, then it certainly seems to be an abstraction to me…
Sal(24),
It is highly suited to the origin of life question
And anything else we don’t know much about. Is there any example of CSI not relying on
ignorance, was my earlier question in comment 7.
Jonathan Bartlett hits the nail on the head in comment 29, with the very reason why CSI is useless:
But anyway, PvM has been doing a great job reviewing Dembski’s math - and if I’m not mistaken, he has a great post on this very topic on The Panda’s Thumb somewhere that might be worth reading.
Comment by Dan — July 25, 2006 @ 11:12 am
Just an interesting tangential topic related to the probability of peptide (short chains of amino acids that make up proteins) formation. Here an inventive scientist somehow bypassed the slow, gradual, and improbable process. Is this kind of work factored into the equations Dembski uses when calculating chance?
http://www.firstscience.com/site/articles/genesis.asp
“Genesis by comets is a controversial idea, but it has received an important boost with the knowledge that a NASA supported experiment has revealed that complex molecules hitchhiking aboard a comet could have survived an impact with Earth.
“Our results suggest that the notion of organic compounds coming from outer space can’t be ruled out because of the severity of the impact event,” says Jennifer Blank, a geochemist at the University of California, Berkeley. Blank and colleagues simulated a comet collision by shooting a soda-can sized bullet into a metal target containing a teardrop of water mixed with amino acids - the building blocks of proteins.
Not only did a good fraction of the amino acids survive, but many polymerized into chains of two, three and four amino acids, so-called peptides. Peptides with longer chains are called polypeptides, while even longer ones are called proteins.
“The neat thing is that we got every possible combination of dipeptide, many tripeptides and some tetrapeptides,” said Blank. “We saw variations in the ratios of peptides produced depending on the conditions of temperature, pressure and duration of the impact. This is the beginning of a new field of science.”
Comment by Mike Hannigan — July 25, 2006 @ 11:28 am
In evaluating CSI, even the simplest ball park estimates could provide significant insight. e.g., see:
Ouzounis CA, Kunin V, Darzentas N, Goldovsky L. A minimal estimate for the gene content of the last universal common ancestor–exobiology from a terrestrial perspective. Res Microbiol. 2006 Jan-Feb;157(1):57-68. Epub 2005 Dec 19
Even an extremely optimistic estimate of 50% probability per gene family suggests probabilities less than 1/2^1000, which I think is a tad bit beyond Dembski’s Universal Probability Bound of 10^-120!
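A quick sketch of the comparison being made. The 50% figure is the commenter's; the count of 1000 gene families is an assumption read off from the 1/2^1000 above, and independence between families is assumed as well. The bound is taken as a probability of 10^-120, the figure this commenter uses later in the thread, alongside the 10^-150 quoted earlier from Miller.

```python
import math

GENE_FAMILIES = 1000     # assumed count, implied by the 1/2^1000 figure
P_PER_FAMILY = 0.5       # the "extremely optimistic" per-family probability

# Independent events multiply, so their log-probabilities add
log10_joint = GENE_FAMILIES * math.log10(P_PER_FAMILY)

for bound_exponent in (-120, -150):
    relation = "below" if log10_joint < bound_exponent else "above"
    print(f"joint probability 10^{log10_joint:.1f} is {relation} 10^{bound_exponent}")
```

The result depends entirely on the per-family probability and the independence assumption, which are the inputs under dispute.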
Good luck in trying to show abiogenesis to a minimum self reproducing cell within the UPB - AFTER which you get a chance to invoke the wondrous powers of natural selection.
Comment by David L. Hagen — July 25, 2006 @ 11:59 am
David,
Thank you for the opportunity to revisit the topic of creationist abiogenesis - that’s a topic I’ve been looking for an excuse to segue into for a few days now.
Please see my post on revisiting creationist abiogenesis for my response to your comments.
Comment by Dan — July 25, 2006 @ 12:33 pm
Hannah,
I just resubmitted post #35. It’s still not showing up. Please let me know if it’s stuck in the SPAM filter. Thanks again for your help.
Comment by Leonid Meyerguz — July 25, 2006 @ 12:36 pm
How did you compute this, and how did you decide it’s “optimistic”?
Comment by Don Baccus — July 25, 2006 @ 12:36 pm
Leonid–
It isn’t. I’m not sure what’s wrong here– what browser/operating system do you use?
Comment by Hannah — July 25, 2006 @ 12:47 pm
Regarding PvM’s notes on Dembski, Wolpert, NFL and co-evolution:
Dembski stated: Fitness among Competitive Agents: A Brief Note
Dembski, William A. (2002) No Free Lunch: Why Specified Complexity Cannot be Purchased Without Intelligence (Lanham, Md.: Rowman and Littlefield).
Dembski, William A. (2005) “Searching Large Spaces: Displacement and the No Free Lunch Regress”
In his “jello” article on Dembski, David Wolpert, author of the NFL theorems then stated:
Wolpert, D.H. Macready, W.G. Coevolutionary free lunches IEEE Trans. Evolutionary Computation, Dec. 2005 V 9, #6 pp 721- 735
“In contrast to the traditional optimization case where the NFL results hold, we show that in self-play there are free lunches: in coevolution some algorithms have better performance than other algorithms, averaged across all possible problems. However, in the typical coevolutionary scenarios encountered in biology, where there is no champion, the NFL theorems still hold.”
Dembski responded in Fitness among Competitive Agents: A Brief Note
I suspect PvM overestimates his claims on No Free Lunch (NFL)
Comment by David L. Hagen — July 25, 2006 @ 12:57 pm
All right, I’m not averse to going through it, but bear in mind I am doing my best to put the ideas in non-technical terms for the readers. You and I, given our backgrounds, are capable of going into such a level of formalism that it becomes mostly unreadable except to the initiated. I’m trying to put it in terms at least some people will understand. Thus if you find some of my descriptions simplistic, it is because I’m not trying to kill the readers with formalisms.
Leonid,
I’m afraid you are presuming I didn’t understand what Dembski said given that I tried to give a simplified account of how to calculate Phi_S(T), but I don’t think that is the case.
But if we have to go formal, so that we don’t talk past each other, we go formal. I will hope the readers will forgive the rigor, otherwise we’re not going to resolve the impasse….
Dembski gives the definition of Phi_S(T) :
First of all, Phi_S(T) is an integer: it is the cardinality of a subset of Omega, call it |T_superset|, where T is a subset of an appropriate T_superset
to clarify our notation
| T_superset | = CARD (T_superset)
The cardinality of a set is merely the number of elements in that set.
Again, I’m going back to our example of 500 fair coins. Consider the set T_all_heads; it has one member. I’ll represent it as such, with spacing for clarity only:
T_all_heads =
{
HH HH HH HH ……
};
likewise
T_all_tails =
{
TT TT TT TT……
};
T_all_tails_OR_all_heads =
{
HH HH HH HH ……,
TT TT TT TT……
};
CARD(T_all_heads) = 1
CARD(T_all_tails) = 1
CARD(T_all_tails_OR_all_heads) = 2
moving on
T_all_tails_OR_all_heads_OR_THrepeat_OR_HTrepeat =
{
HH HH HH HH ……,
TT TT TT TT……,
TH TH TH TH ……,
HT HT HT HT …..
}
CARD(T_all_tails_OR_all_heads_OR_THrepeat_OR_HTrepeat) = 4
however note
T_repeat_first_two_coins = T_all_tails_OR_all_heads_OR_THrepeat_OR_HTrepeat
Thus, the same decompression algorithm (independent of whatever language), will generate all members of T_all_tails_OR_all_heads_OR_THrepeat_OR_HTrepeat when only a fraction (2 coins) is provided as input.
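The set identities above can be checked by brute force on a small case. This sketch uses 8 coins instead of 500 so the whole space can be enumerated; the helper `repeat_pair` and the variable names are made up for the illustration and are not part of Dembski's notation.

```python
from itertools import product

N_COINS = 8  # small stand-in for the 500 coins in the example

def repeat_pair(pair: str, n: int) -> str:
    """Repeat a two-coin pattern until it fills n coins."""
    return (pair * (n // 2 + 1))[:n]

all_heads = {"H" * N_COINS}
all_tails = {"T" * N_COINS}
th_repeat = {repeat_pair("TH", N_COINS)}
ht_repeat = {repeat_pair("HT", N_COINS)}
union_of_four = all_heads | all_tails | th_repeat | ht_repeat

# "repeat_first_two_coins": every sequence that equals its first two coins repeated
omega = ("".join(seq) for seq in product("HT", repeat=N_COINS))
repeat_first_two = {s for s in omega if s == repeat_pair(s[:2], N_COINS)}

print(len(all_heads), len(all_tails), len(all_heads | all_tails), len(union_of_four))
print(repeat_first_two == union_of_four)
```

The four cardinalities come out as 1, 1, 2 and 4, and the single rule "repeat the first two coins" picks out exactly the four-element set, which is the point being made about one short description covering the whole superset.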
Thus “repeat_first_two_coins” objectively describes T_all_heads, which can be adequately characterized by Phi_S(T) = 4, even though 4 is overkill since Phi_S(T) = 2 is sufficient for T_all_heads. By extension, letting Phi_S(T) = 10^30 for T_all_heads is adequate but absolute overkill, i.e., one seeing 500 coins all heads will still infer design whether Phi_S(T) = 2 or Phi_S(T) = 10^30….
Two issues characterizing Phi_S(T)
1. number of elements of an appropriate T_superset
2. amount of info from the description of each element of T_superset needed for the semiotic agent to generate all elements of T_superset
#2 wasn’t explicitly addressed in Dembski’s paper, but as a rule, if half of the bits in a string define the other half unequivocally, then chance is probably not at work. Thus any semiotic description based on algorithmic compression/decompression that can operate with no more than half of the bit string will probably identify good T_supersets
I think, Leonid, you’re making this a million times harder than it needs to be. Further, CSI does not live or die on one’s inability to calculate Phi_S(T); being able to calculate Phi_S(T) is merely icing on the cake.
PS
I was a bit disconcerted to see fractional numbers in your calculation above, CARD(T) should always be an integer. Did I read your calculation correctly? I may have misunderstood.
Comment by Salvador T. Cordova, IDEA GMU — July 25, 2006 @ 1:12 pm
David Hagen: I suspect PvM overestimates his claims on No Free Lunch (NFL)
I suspect that more likely Dembski overestimated the relevance of NFL theorems to his claims.
Bartlett: I must have missed this. Which post was that? Remember, it must be “trivially effective” in a large search space. The search space for a given 3 base-pairs to change in a 2 megabase genome is roughly 10^19. If I had to change 3 amino acids, it could be much more than that.
In fact, random search’s efficiency under NFL does not depend on the size of the search space either.
In No Free Lunch Theorems and random search, I showed how Tom English showed that
Let’s first look at Dembski’s claim
This may suggest to the uninformed reader that thus evolutionary algorithms cannot really work…
Imagine their surprise when they find out that
Source: English T. (1999) “Some Information Theoretic Results On Evolutionary Optimization”, Proceedings of the 1999 Congress on Evolutionary Computation: CEC99, pp. 788-795
Comment by PvM — July 25, 2006 @ 1:20 pm
49 Don: If the “ball park” probability of 50% for obtaining a “gene family” by abiogenesis in a closed system of natural forces is not “extremely optimistic”, may I suggest trying 50% each per ball park of 300 codons or nucleotides in a typical gene or protein to nominally give 2^-300 per gene family. Then estimate the cumulative probability for 1000 gene families! I expect this new “wildly optimistic” ball park estimate is still a tad shy of the UPB of 10^-120. It appears to take amazing “faith” in evolution to ground one’s career on such “high” probabilities.
See Sal and Hannah’s discussions for formal math and CSI etc. If you want to delve into serious probabilities, may I suggest you read (Sir) Fred Hoyle, The Mathematics of Evolution, 1999 ISBN 0-9669934-0-3. As I recall, he suggests 10^-4000 for the probability of evolution.
After that, you could look at the probabilities of population genetics popularly reviewed by geneticist John C. Sanford in Genetic Entropy and the Mystery of the Genome, 2005 (with serious abstracts in the appendix and full references.) He concludes that “The Emperor has no clothes.”
If that still does not satisfy, may I recommend taking up betting in Vegas. I expect the Casinos would be absolutely delighted to take you up on your estimate of the odds of abiogenesis and evolution. They would probably even give graduate courses and diplomas in the College of Hard Knocks.
Comment by David L. Hagen — July 25, 2006 @ 1:24 pm
Leonid: I read your Panda’s Thumb article, and totally agree with it from a technical standpoint. However, it doesn’t address the scenario, promoted by ID proponents, that the combinatorially large majority of, say, DNA sequences have no utility whatsoever, and only an exponentially small (in sequence length) minority of sequences can give rise to some function. If the ID assumption is correct - and while there is no convincing evidence to support it yet, I think it is plausible - and if this set of possibly functional sequences remained frozen in evolutionary time, then random search would indeed be woefully ineffective in locating such “targets”. So, I think the statement “random search is trivially effective” should be cautiously qualified.
Which is why it is qualified “under the NFL assumption”. The posting on PT shows that contrary to Dembski’s suggestion, under the NFL theorem, random search is trivial. And I think Dembski must have realized this since in Searching Large Spaces, he changes from the NFL assumptions of averaged over all fitness functions, to a ‘needle in the haystack’ kind of search.
Leonid: Of course, evolution is not “random search” by any measure, and the “displacement theorem” is pure fluff when it comes to eliminating non-intelligent causes of bias in a search procedure (and, I think, when it comes to even defining what a search procedure is). So we don’t really need random searches to refute Dembski and company. :)
Of course we agree, but showing how under Dembski’s own assumptions, random search is actually quite efficient is just the icing on the cake.
ID’s hopes that sequence space is constrained to a small part may also be undermined by the following findings:
1. Gavrilets has shown that fitness landscapes become ‘Holey Landscapes’ when the dimensions increase. Such seems to be the case for DNA sequence space for instance.
2. In case of RNA it has already been shown that sequence space is ’scale free’ and that structures extend throughout sequence space and are connected via neutral networks that extend throughout sequence space.
It’s just an added ‘bonus’ that such scale free networks also explain robustness as well as evolvability, modularity and various other relevant aspects to evolution.
Combine this with coevolving fitness functions and the fact that evolvability (read neutrality) itself is selectable and one may start to understand how evolution has been so successful. Oh yes, scale free systems can be explained ‘trivially’ via the processes of gene duplication and preferential attachment. Imagine the surprise that these processes can in fact be observed in the genome…
Comment by PvM — July 25, 2006 @ 1:34 pm
David Hagen: As I recall, he suggests 10^-4000 for the probability of evolution.
As is so often the case with probability theory, it is trivial to make something improbable; it’s much harder to find the relevant hypothesis that makes evolution possible and plausible. Just note Dembski’s calculations about the probability of flagellar proteins forming the flagellum.
But it seems that ID may be slowly retreating to the probability arguments of its creationism roots while abandoning the unnecessary transformation of probability into ‘complexity’ or ‘information’, leading to much confusion with how these terms are used commonly in science.
Comment by PvM — July 25, 2006 @ 1:39 pm
PvM
Exactly. Dembski’s SC formula is a general hypothesis test. He tells us to use it to falsify hypotheses of “Darwinian and other material mechanisms”, but there’s no reason that it can’t be used to falsify any hypothesis, including one of design. Let’s do it.
Let E be the evolution of bacterial flagella, and T be “motor-driven propeller,” and H is the design hypothesis. The only designers we know of are human, and the probability of a human existing hundreds of millions of years ago, much less one with nanobiotechnology skills, is close to zero. So specified complexity obtains and the design hypothesis is falsified.
But shouldn’t we consider the possibility of an unknown non-human designer? No, says Dembski: Appealing to the unknown to undercut what we do know is never sound epistemological practice.
(If IDers object that my analysis is hardly rigorous, I reply that such is the nature of Dembski’s method.)
Comment by secondclass — July 25, 2006 @ 1:46 pm
Even the Needle in the haystack search may not be that hard as long as neutrality is present.
Tina Yu, Julian Miller Finding Needles in Haystacks is Not Hard with Neutrality (2002) European Conference on Genetic Programming
Abstract. We propose building neutral networks in needle-in-haystack fitness landscapes to assist an evolutionary algorithm to perform search. The experimental results on four different problems show that this approach improves the search success rates in most cases. In situations where neutral networks do not give performance improvement, no impairment occurs either.
We also tested a hypothesis proposed in our previous work. The results support the hypothesis: when the ratio of adaptive/neutral mutations during neutral walk is close to the ratio of adaptive/neutral mutations at the fitness improvement step, the evolutionary search has a high success rate. Moreover, the ratio magnitudes indicate that more neutral mutations (than adaptive mutations) are required for the algorithms to find a solution in this type of search space.
and
Vesselin K. Vassilev, Julian F. Miller The Advantages of Landscape Neutrality in Digital Circuit Evolution (2000)
I hope this shows that without specifics, one cannot make the general claim made by Dembski based on the NFL theorems that evolutionary search is hard/impossible.
And we agree that the ‘displacement argument’ is flawed as well.
Comment by PvM — July 25, 2006 @ 1:47 pm
Hannah,
This is very interesting. I’m using Firefox with Win XP Pro, and up until now, I’ve had no trouble posting. The only reason I can think of for my post getting lost is that I’ve started using HTML escape sequences (for the “less than” and “greater than” signs). Do you mind if I just email you the text, and you try to submit it yourself?
Comment by Leonid Meyerguz — July 25, 2006 @ 1:50 pm
A quick example to illustrate a problem with English language specifications:
Let’s apply the SC formula to the specification “Himalayas”. This word identifies a set of hundreds of mountains, each with distinct characteristics. Given our understanding of geological processes, the odds that these exact mountains would appear are minuscule. Thus P(T|H) is small, and so is phi_s(T) since T contains only one word. So, according to Dembski, the Himalayas are designed.
Comment by secondclass — July 25, 2006 @ 1:50 pm
Please do.
Comment by Hannah — July 25, 2006 @ 1:54 pm
Second Class: You make a good suggestion. Especially since ID claims that intelligent design can generate CSI. For this one has to show that indeed, CSI as calculated by Dembski’s formula shows that P(T|H) where H is the design hypothesis is both likely and can generate sufficient CSI. It’s insufficient to claim that since ID can explain the unlikely to become likely, ID can generate CSI; one needs to show the actual pathways and show that under such a hypothesis, P(T|H) meets the requirements.
Now we are faced with the following conundrum:
For design to generate CSI, P(T|H) has to be sufficiently small, but for design to generate a plausible hypothesis, P(T|H) has to be sufficiently large or it will lose to other hypotheses. It’s also not sufficient that P(T|H) is larger than other hypotheses without considering the null hypothesis that we have missed a relevant hypothesis. Dembski ends his specification paper with the following
There are several objections to Dembski’s claim
1. Unknown hypotheses indeed may carry no value, thus it is essential that the design hypothesis is specified.
2. Independent information about the designer(s) may not be necessary but this will render the design inference inherently unreliable.
3. Design always is a possibility, it’s just that when it comes to providing relevant hypotheses for design, things a posteriori fall apart.
Comment by PvM — July 25, 2006 @ 1:56 pm
David Hagen states, among other things in his personal attack on me:
Since I live, betting on process that led to my being here would be a bit like betting on the winner of the first superbowl…
As PvM implied above, it’s easy to pick numbers out of thin air that make things seem incredibly improbable.
That’s all you’re doing …
Comment by Don Baccus — July 25, 2006 @ 2:24 pm
Salvador writes:
Salvador, Hannah was kind to fix my post #35 above, so now it (hopefully) makes sense. I am afraid that you still appear to be seriously misunderstanding at least a part of Dembski’s argument: I strongly encourage you to read my post above and Dembski’s paper; especially his definition of Phi_S(T), the specificational resources, on page 17, and his example computing the specificity of the flagellum, on page 18.
Still, let us walk through the example you gave, and I’ll point out the specific errors you are making.
You are right that Phi_S(T) is an integer - I never suggested otherwise - but it is not the cardinality of a subset of Omega. Rather, it is the cardinality of a subset of PS(Omega) - that is, the number of subsets of Omega (events) matching a certain descriptive complexity in the reference frame of S. Again, I urge you to review Dembski’s definition of Phi_S(T), and recall - as you correctly pointed out - that Patterns(Omega) is just PS(Omega).
Just to give you an impression of the “magnitude” of your error, consider that for n fair coins, the size of the event space Omega is 2^n, so no subset of Omega can have cardinality above 2^n. However, the size of PS(Omega) is 2^(2^n) - the number of all possible distinct subsets of Omega. Phi_S(T) potentially ranges between 0 and 2^(2^n), though in Dembski’s examples it is always relatively small even when compared to 2^n.
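The gap between 2^n and 2^(2^n) can be checked by brute force for tiny n. What follows is only an illustrative sketch: the helper names `event_space` and `power_set` are invented here and appear nowhere in Dembski's paper.

```python
from itertools import chain, combinations, product

def event_space(n):
    """All outcomes of n coin flips, as strings such as 'HTH'."""
    return ["".join(flips) for flips in product("HT", repeat=n)]

def power_set(items):
    """Every subset of items; each subset is one event."""
    items = list(items)
    return list(chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)))

for n in (1, 2, 3):
    omega = event_space(n)
    events = power_set(omega)
    # Columns: n, Card(Omega) = 2^n, Card(PS(Omega)) = 2^(2^n)
    print(n, len(omega), len(events))
```

Already at n = 5 the power set has 2^32 members, so explicit enumeration stops being practical almost immediately.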
Your discussion of T_repeat_first_two_coins is absolutely correct, but you have to realize that the description “Repeat First Two Coins” corresponds only to a single event (subset of Omega) T that contains four members: T = {HHHH…, TTTT…, HTHT…, THTH…}. But for the purposes of computing Phi_S(T), we don’t care about the Card(T): rather we care about the number of events with the same descriptive complexity as T. Following the example I gave in post #35, the descriptive complexity of T, Phi’(T)=4 - four words used by our fixed agent S to describe T. Thus, given S has a vocabulary of w=1e5 words, and since each description corresponds to at most one event in Omega, there are less than w^5=1e25 events in Omega whose descriptive complexity is less than or equal to that of T. So, Phi_S(T) is less than 1e25, and the specificity of T can be bounded from below by -log_2(1e25 * 4 * 2^-500) = 414.952.
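The arithmetic in that bound can be reproduced in a few lines. This is a sketch under the post's own assumptions (a vocabulary of 1e5 words, 500 fair coins, a four-outcome event); the variable names are chosen here for illustration.

```python
import math

w = 10**5        # assumed vocabulary size of agent S, taken from the post
n_coins = 500    # number of fair coins, implied by the 2^-500 term
card_T = 4       # outcomes matching "Repeat First Two Coins"

phi_bound = w**5                           # loose bound used in the post
tight = sum(w**k for k in range(1, 5))     # descriptions of 1 to 4 words

# Work in log2 so that 2^-500 never has to be formed as a float.
log2_p = math.log2(card_T) - n_coins       # log2 of P(T|H) = 4 * 2^-500
specificity_lower_bound = -(math.log2(phi_bound) + log2_p)
print(round(specificity_lower_bound, 3))   # 414.952
```

The count `tight` is the exact number of descriptions of at most four words; it is far below `w**5`, so the figure quoted in the post is a conservative lower bound on the specificity.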
Not at all - that is the crux of the error you are making. The number of elements in the appropriate T_Superset is only needed to compute the P(T_superset | H), which Dembski argues allows us to rule out any subset of T_superset as a plausible chance event, assuming a sufficiently low value for both P(T_superset | H) and Phi_S(T_superset). The number of elements in T_superset is irrelevant for computing, or rather, bounding Phi_S(T) - that quantity depends only on the number of symbols in S’s communication system and the length of the description of T, Phi’(T). (Again, the function should be labeled Phi’(T,S), to underscore its dependence on S.) Dembski fails to make this explicit, but it is crucial to understanding his work.
I can’t really parse this, but I think what you are trying to describe here is Dembski’s “descriptive complexity” - roughly, length of a description - of T, represented by S’s function Phi’(T). If that is the case, you are correct.
I think, Salvador, I am making this at least as hard as it needs to be, given that Dembski’s CSI formula does live or die with Phi_S(T) - substitute a very high value if you don’t believe me - and given how vaguely it is defined, and how subjective it is at its core. Certainly you, despite your obvious support and admiration for Dembski’s work, have so far shown that you clearly misunderstand what he means by “specificational resources”. Surely you won’t claim that the notion of “specificational resources” is non-essential to calculating the value of specificity and the amount of CSI?
You misunderstood, though it is probably not your fault - until Hannah posted the correct version a while ago, my post was very badly mangled. But, no, the fractional values represent probabilities and specificities: Phi_S(T) is, of course, an integer.
Comment by Leonid Meyerguz — July 25, 2006 @ 3:14 pm
Off-topic: since some of our posts are beginning to look rather ugly, how many people would write in LaTeX if I could get LaTeX capabilities here?
Comment by Hannah — July 25, 2006 @ 3:21 pm
Hannah wrote:
Hmmm … You wouldn’t, by chance, be talking about me, would you? ;) But, yes, I think LaTeX is a great idea if other people would agree to it.
Comment by Leonid Meyerguz — July 25, 2006 @ 3:37 pm
[Raises hand]
Comment by secondclass — July 25, 2006 @ 3:44 pm
I’ll third that.
Comment by MartinM — July 25, 2006 @ 4:15 pm
Sal,
From my seat, reading you and Hannah arguing NFL math with Pim and Leonid is akin to watching Godzilla and Mothra battle Hedorah and Megalon!
You shouldn’t dumb down your formal arguments for readers such as myself. I don’t mind not completely grasping the formal terms, for the time being. Part of my job is making the technical plain for administrators, so I’ll just ask questions to try and get at some non-formal understanding. If you algorithmic geeks will indulge me, that is. I’ve got some questions, but have to run for now.
Comment by Todd — July 25, 2006 @ 4:21 pm
Time out please!
Is U an element of patterns(Omega), or is U a subset of patterns(Omega)? Until we resolve this, the discussion is going nowhere.
I read it as: U is an element of patterns(Omega) or, equivalently, U is an element of PowerSet(Omega), which implies U is a subset of Omega.
Notation for member (element) of a set can be found at Set theory.
Notation for subset
Thanks.
Salvador
Comment by Salvador T. Cordova, IDEA GMU — July 25, 2006 @ 4:26 pm
I’m still waiting for a valid finding in biology that these mathematical derivations are applicable to…
… I guess I’ll be waiting a long time…
Comment by Dan — July 25, 2006 @ 4:30 pm
Thus {U … | …} is a set within powerset(Omega).
Salvador
Comment by Salvador T. Cordova, IDEA GMU — July 25, 2006 @ 4:37 pm
Sal writes:
No problem: set theory can be quite confusing. U is an element of PS(Omega). On the other hand, the set {U in PS(Omega): U has some property P} is a subset of PS(Omega), which is what I was referring to in the text you bolded above. It is important to keep in mind that PS(Omega) is a set of sets (sometimes called a family of sets to avoid, or in my view, add to, confusion). Thus elements of PS(Omega) are sets, and subsets of PS(Omega) are sets of sets. So, the “cardinality of a subset of PS(Omega) with property P” means “number of elements (sets) in PS(Omega) with property P”, or alternatively “the number of subsets of Omega that have property P” (since PS(Omega) is the set of all possible subsets of Omega).
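For readers who find code easier to follow than set-builder notation, the distinction can be sketched for two coins. The property P used here (the event contains exactly two outcomes) is a made-up stand-in, not Dembski's descriptive-complexity condition.

```python
from itertools import combinations, product

omega = ["".join(f) for f in product("HT", repeat=2)]   # 4 outcomes

# PS(Omega): a collection whose elements are themselves sets of outcomes.
ps_omega = [frozenset(c)
            for k in range(len(omega) + 1)
            for c in combinations(omega, k)]

def has_property(event):
    """Hypothetical property P: the event contains exactly two outcomes."""
    return len(event) == 2

# {U in PS(Omega) : U has property P} is a subset of PS(Omega), a set of sets.
family = {u for u in ps_omega if has_property(u)}

print(len(ps_omega), len(family))   # 16 6
```

Each member of `family` is a subset of Omega, while `family` itself is a subset of PS(Omega); its cardinality is the kind of count that Phi_S(T) refers to.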
Hope this makes my earlier posts easier to parse. Or at least not any more confusing.
Comment by Leonid Meyerguz — July 25, 2006 @ 5:06 pm
Dan wrote:
A very, very long time. Don’t hold your breath. However, I think discussing Dembski’s mathematical formulations is useful because a) it can show how impractical such formulations are when we try to apply them to real-world phenomena, and b) it’s fun (and for certain definitions of useful, “fun”=“useful”). Besides, where better to discuss specified complexity than in a thread titled “Specified Complexity”?
Comment by Leonid Meyerguz — July 25, 2006 @ 5:13 pm
I suppose…… but my mind keeps coming back to if SC is irrelevant (because it’s not connected to anything real), then why bother with it?
Comment by Dan — July 25, 2006 @ 5:30 pm
Thank you. In the post previous to yours I indicated I read “{” as “|”. I should have realized that earlier….
I am accustomed to seeing something like A = {….}; the shorter form threw me off.
In light of that, my earlier calculation of Phi_S(T) needs some revision.
I deliberately chose special cases for Phi_S(T) so that objectivity could be put into the issue where there was some independence from the language choice.
I believe that detecting algorithmic compressibility given a number of inputs from a description of each element of T would permit characterizations of Phi_S(T) for algorithmically compressible cases.
I’m asserting a conjecture that all elements of A = {U element patterns Omega satisfying Phi_S’(U)
Comment by Salvador T. Cordova, IDEA GMU — July 25, 2006 @ 6:10 pm
My last post was cut off (the less than or equal to I think threw the weblog software off):
I’m asserting a conjecture that all elements of A = {U element patterns Omega satisfying Phi_S’(U) less than or equal to Phi_S’(T_repeat_first_two_coins) } less than or equal to Powerset (T_repeat_first_two_coins).
I think that is objectively defensible since all members of T_repeat_first_two_coins are essentially describable by the same semiotic description, independent of the language in use. Any objections?
I am thus revising my calculations:
not merely
CARD(T_repeat_first_two_coins) as I had earlier stated.
I believe algorithmic compression analysis is the most objective way to characterize Phi_S. Beyond that, characterization of Phi_S(T) will require further exploration. Hopefully it might be apparent how this principle might be generalized to less trivial cases.
Nevertheless, I do not think CSI lives or dies on the ability to characterize Phi_S(T). There is recourse to pre-specifications in case Phi_S(T) cannot be ascertained for generalized specifications.
Salvador
Comment by Salvador T. Cordova, IDEA GMU — July 25, 2006 @ 6:35 pm
It probably hasn’t occurred to you, but we actually don’t all have infinite amounts of time, and some of us have in fact our own studies and research. That would suggest there might be a limit as to how much you can reasonably ask of another person.
Please spread that meme to other ID proponents, such as William Dembski and Michael Behe. They have both made such requests.
Comment by ivy privy — July 25, 2006 @ 7:42 pm
I object to students being interrogated like this. The questions should not be directed at Hannah.
1) It was my understanding that folks from the IDEA club are co-presenting in the course, so that may be an inaccurate characterization of Hannah’s status, even if she is a student in the course (which I don’t know)
2) Hannah started this thread to defend the definition of CSI. That gives her a status above that of a mere onlooker or learner.
Comment by ivy privy — July 25, 2006 @ 7:48 pm
Salvador:
I probably won’t have time to address your latest post in detail tonight - though I will if I get the chance - but for now, I urge you to re-read Dembski’s work, and to consider the definition of Phi_S(T) specifically. Your latest proposal, while slightly closer to the truth, is still unworkable. In particular, you write:
This is still wrong. The most important point to understand is that “specificational resources”, Phi_S(T), cannot be defined as a function of the size of the set T. For example, using your definition above, I would define Phi_S(“Any Sequence”) = 2^Card(Omega) = 2^(2^n) - the largest possible value for Phi_S - even though I need an extremely simple pattern to describe this sequence. Furthermore, under your definition, all small sets T will have the same value of Phi_S, regardless of whether or not you could use a simple algorithm to generate their elements.
So, again, you cannot define Phi_S(T) in terms of Card(T). Rather, define it by counting other event descriptions (or, if you will, algorithmic representations) that are as short as the representation of T. If you think you can divorce the notion of Phi_S(T) from any form of a coding system, I suggest you review Dembski’s definition: you will be disappointed. Even if you wish to use algorithmic complexity, you will need to pick a coding system: namely, the reference programming language (or, more accurately, Universal Turing Machine). So, pick one - I suggest regular expressions for the fair coin case - define the function Phi’(T), and proceed from there. Count all the representations in your coding system that are no longer than Phi’(T). Voila! That’s how you obtain an upper bound on Phi_S(T).
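As a rough sketch of that counting recipe: once a coding system is fixed, the number of strings no longer than Phi’(T) caps the number of events that can be described that briefly. The six-symbol alphabet below is purely hypothetical (think H, T, parentheses, | and *), and `max_descriptions` is a name invented for this illustration.

```python
def max_descriptions(alphabet_size, max_length):
    """Count the strings of length 1..max_length over the alphabet.

    Each string describes at most one event, so with
    max_length = Phi'(T) this is an upper bound on Phi_S(T).
    """
    return sum(alphabet_size**k for k in range(1, max_length + 1))

# Hypothetical coding system with 6 symbols and Phi'(T) = 4.
print(max_descriptions(6, 4))   # 1554
```

Many of those strings will be malformed or will describe the same event, so the true Phi_S(T) can only be smaller than this count.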
I also recommend reading Shallit & Elsberry’s paper, particularly the first Appendix. They actually do give a workable definition of specified complexity in terms of Kolmogorov complexity. See my post #295 in the “Analogy, Induction, and Specious Arguments” thread for some clarifications.
Comment by Leonid Meyerguz — July 25, 2006 @ 8:43 pm
Leonid,
Thank you for reading my post and offering your suggestions. Very early on, prior to Elsberry and Shallit, I suggested algorithmic compression as one method of specification. Wigner in the 1960s as well….
But recall, I have insisted repeatedly that if Phi_S(T) is not tractable, one has recourse to pre-specification. Pre-specification is far less controversial. I have even proposed one architecture that is pre-specified, namely the Turing machine. We could also use human artifacts or common practice for solving functional problems in engineering as sources of specifications. The code/decode metaphor, the lock-key metaphor, etc. are so prevalent in engineering and computer science, I hardly think we need to worry over the scarcity of pre-specification to project onto biology.
For that matter they are doing linguistic pattern search on all sorts of things in biology. I don’t think chance hypotheses can account for certain linguistic structures. We’re already using pre-specification detection in biology without any of the metaphysical baggage….for example you are probably aware of these recent developments: Hidden codes within codes.
Now, I’m sure you have no reason to describe linguistic structures (like codes) as obeying a chance hypothesis?
I do not think Phi_S(T) as a concept is yet mature enough. It should be discussed and explored, and I do not think Dembski represented his internet musing as Gospel.
I can understand the confusion, but I deliberately chose descriptions which would include the most minimally complex patterns (All_heads and All_tails), thus if it did not include these, Phi_S(T) would be invalid according to the way I constructed Phi_S(T). I don’t know of any other way to even begin to construct it objectively.
The reason I chose high symmetry strings, is that even though other patterns are algorithmically compressible, one gets into difficult issues of how much of the compression algorithm was a post-dictive projection onto the data.
If this is giving everyone too much indigestion, there is no reason IDers must adopt Phi_S(T). It has been laid out as something to consider. But as I said, pre-specification comes with far fewer issues, and that is what is workable for now, and used unwittingly in practice already…
That is why I would rather focus on finding ANALOGIES to engineering, because ANALOGIES are sources of pre-specifications.
Comment by Salvador T. Cordova, IDEA GMU — July 25, 2006 @ 9:31 pm
Todd wrote in #12
There has been a lot of tooing and froing with coin tosses, why not try a real, simple biological example.
Bacterial TEM beta-lactamases (for example S. typhimurium TEM, NCBI Entrez accession number AAS18375, 286 aa long) can hydrolyse cefotaxime with high efficiency after 4 mutations: A42G, E104K, M182T, and G238S.
Calculate the CSI involved in going from TEM to cefotaximase.
Comment by Ian Musgrave — July 25, 2006 @ 11:29 pm
David L. Hagen wrote in #46 (after quoting a paper on the LUCA)
The Last Universal Common Ancestor is NOT the first living thing produced by abiogenesis. The LUCA is the last ancestor of archebacteria, eubacteria and eukaryotes. It occurs well after abiogenesis (and the RNA world, origin of self replication and the DNA-RNA transition), the appearance of the first things we would think of as cells and the exhaustion of prebiotic resources of complex organics.
Comment by Ian Musgrave — July 25, 2006 @ 11:37 pm
Thanks for the clarification Ian. However, I expect the complexity to be similar. It has to involve a genome, genomic duplication, cell structure replication, and sustainable conversion of abiotic energy to biotic energy such as ATP via ATPase (Whether from photosynthesis or hydrocarbon conversion.)
Comment by David L. Hagen — July 25, 2006 @ 11:46 pm
63 Don
Apologies for my remarks being taken as a personal attack. None was intended. The casino comment was to give an example of those who make real life calculations of probability and who ensure the odds are for them. Mark Ritchie, God in the Pits, describes other examples of Chicago Board of Trade commodity brokers who make their living by probabilities.
That appears to involve the metaphysical assumption of methodological naturalism and that abiogenesis happened. That is the subject of the debate of ID vs Evo origin theories.
I understand 50% to be “extremely optimistic” compared to a conservative guesstimate of the order of 2^-300 for the CSI of the function of a typical gene or protein. I expect the probability of Complex Specified Information in defining the function of each of the 1000 “genome families” to be a lot closer to 2^-300 than 50%. Thus my first “ballpark” estimate giving the order of magnitude estimates that are far beyond the Universal Probability Bound are still highly conservative compared to what actual CSI of the functions will be.
If you wish a more precise estimate of the probabilities of genome strings, may I refer you to Hubert P. Yockey, Information Theory, Evolution and the Origin of Life, 2005, Cambridge Press. Yockey cites Pasteur’s
Yockey states:
Like Dembski, Yockey (p 117) observes
I.e., Yockey is identifying a necessary biotic function that can be used to calculate biotic CSI necessary before probability calculations. In Section 6.4 Yockey
In section 7.1 Yockey calculates the probability of the current DNA code and observes
These two factors alone exceed the Universal Probability Bound. I leave it to you to decide how much beyond the UPB is worth estimating the probability of CSI for the function of the rest of the more than 900 genomic families required for the simplest self replicating cell.
In Section 4.1 p 29, Yockey calculates
I.e., as a rule of thumb, there are about 2.06 bits per amino acid or triplet codon. So a 300 amino acid sequence nominally involves about 2^(2.06*300) or 2^618 (or 618 bits), if my late night math doesn’t have too many errors. (Correction to previous post: I should have said “each per ball park of 300 codons or amino acids”.) May I recommend that you dig into Hoyle, Yockey and Sanford (rather than blindly accepting PvM’s superficial dismissals).
Would 1/2^1000 still not be “extremely optimistic” for the CSI of the functions of the 1000 gene families in a minimal self replicating cell?
Comment by David L. Hagen — July 25, 2006 @ 11:49 pm
Hannah,
My sincere apologies here. Bill Dembski’s paper had some things I have not had a chance to clarify with him yet, as his books have usually been my principal source of information.
Sometimes the discussions between him and me take months, as I try to be very sparing of his personal help….
The T1, T2, T3 on top of page 17 was a bit unfortunate….those were unfortunately NOT the same T’s in Omega. This is the first time I ever had serious issues decoding his notation. And as is apparent, some of the conclusions Leonid and I were getting from this were outrageous. I kept thinking to myself, Bill couldn’t possibly have meant this! Cardinalities of large Power Sets!
Turns out, he didn’t mean it the way Leonid and I were supposing…. :=)
However, I figured out one way to calculate a bound on Phi_S(T) indirectly as a general principle. The figure I get is 10^80, Bill’s was 10^30. Mine of course is far more conservative than Bill’s but should be informative nonetheless. My method is alternate to Bill’s however.
If we have specifications of, say, 500 bits each, how many of these specifications can we possibly write down, assuming we have a perfect language?
The universe has finite information storage capacity. That is, even if we humans could in principle use every atom in the universe to write down all our 500-bit specifications, we would run out of storage to do so. That would be on the order of 10^80.
Thus if one is determined to utilize a Phi_S(T), 10^80 is an outside number which bounds our specificational resources from a practical standpoint.
The 10^30 figure Bill uses requires more discussion, but the good news is we have a reasonable bound of 10^80 to start with.
Salvador
Comment by Salvador T. Cordova, IDEA GMU — July 25, 2006 @ 11:51 pm
Hannah
Haven’t used LaTeX. If you could, it would help to be able to insert an object such as PDF, JPG, GIF or TIFF, i.e., for those of us plebes using word processors.
PS *** Please restore Preview ***
Comment by David L. Hagen — July 26, 2006 @ 12:00 am
Sal: For that matter they are doing linguistic pattern search on all sorts of things in biology. I don’t think chance hypotheses can account for certain linguistic structures. We’re already using pre-specification detection in biology without any of the metaphysical baggage….for example you are probably aware of these recent developments: Hidden codes within codes.
Again, Sal’s claim could benefit from some supporting evidence. “Hidden codes” is a New York Times article on the exciting work done by Eran Segal et al., published in Nature. While some IDers have shown some enthusiasm since the NY Times article used the term ‘design’, on closer scrutiny there seems to be little relevance to ID or even to specification or pre-specification.
The work is however quite interesting as it shows how nucleosomes may have a real function in the genomes. In fact, this may help resolve somewhat of a problem in biology namely how transcription factors which recognize genes, which are typically short and would match the DNA in multiple positions, do their work. Nucleosomes cause parts of the DNA to be shielded from ‘access’.
Since nucleosomes are fully natural and ‘specified’ by laws of physics and chemistry, I doubt that there is much relevance to ID in these findings. Nevertheless, one should not be surprised if the news is met with some cheers from IDers. Once they get to read the actual paper, I wonder how much will be said…
Eran Segal’s homepage describes Segal’s work in more detail.
A Genomic Code for Nucleosome Positioning
Eran Segal, Yvonne Fondufe-Mittendorf, Lingyi Chen, AnnChristine Thastrom, Yair Field, Irene K. Moore, Ji-Ping Z. Wang, Jonathan Widom
Eukaryotic genomes are packaged into nucleosome particles, which occlude their DNA from interaction with most DNA binding proteins. Nucleosomes have higher affinity for particular DNA sequences, reflecting the sequences’ ability to sharply bend, as required by the nucleosome structure. However, whether these sequence preferences have a significant influence on nucleosome positions in vivo, and thus regulate the access of other proteins to DNA is not known. Here, we isolated nucleosome-bound sequences at high resolution from yeast, and used these sequences in a novel computational approach to construct and experimentally validate a nucleosome-DNA interaction model, and to predict the genome-wide organization of nucleosomes. Our results demonstrate that genomes encode an intrinsic nucleosome organization, and that this intrinsic organization itself can explain ~50% of the in vivo nucleosome positions. This nucleosome positioning code facilitates many specific chromosome functions, including transcription factor binding, transcription initiation, and even remodeling of the nucleosomes themselves.
Information theory and computational biology are excellent areas of science where science is making incredible progress in its understanding of evolution and its many mechanisms.
That Nucleosomes may have some relevance to DNA has long been speculated and these scientists basically provided compelling evidence in support of this thesis.
Nucleosomes are quite interesting: The proteins that build the scaffold of the nucleosome are called histones. They form a family of five major classes of histone proteins called H1 (H5), H2A, H2B, H3, and H4. The amino acid sequences of histones are highly conserved during evolution, indicating their critical function for the chromosome organization and control of gene expression, with the highest frequency of mutations found in H1 (H5). This histone type has a special function in the nucleosomal complex at the nucleosome surface.
Highly conserved proteins indeed.
I am not sure how Sal envisions how pre-specification plays a role in biology?
Comment by PvM — July 26, 2006 @ 12:07 am
Regarding Yockey’s “calculations”, and similar efforts: as was discussed in the midst of this informative ISCID thread, the information content of a functional polypeptide is not a linear (or any other sort of) function of polypeptide length. In other words, when Yockey estimates the “bits” of cyt C as he does, he uses a method that does not agree with experimental observation. I would recommend to David and others that a different metric be found.
This has a great impact on the application of the equations being discussed here to biological examples. And it is related (albeit rather indirectly) to the matter of state variables that I have mentioned once or twice.
Comment by Art G — July 26, 2006 @ 12:18 am
Leonid: However, it doesn’t address the scenario, promoted by ID proponents, that the combinatorially large majority of, say, DNA sequences have no utility whatsoever, and only an exponentially small (in sequence length) minority of sequences can give rise to some function. If the ID assumption is correct - and while there is no convincing evidence to support it yet, I think it is plausible - and if this set of possibly functional sequences remained frozen in evolutionary time, then random search would indeed be woefully ineffective in locating such “targets”. So, I think the statement “random search is trivially effective” should be cautiously qualified.
Hi Leonid. I checked out the web and found some of your fascinating work in this area of what protein space looks like.
I’d love to hear more from you on this topic of what protein space looks like as it seems quite relevant to ID.
I also appreciate your mathematical expertise which has helped significantly in explaining to Sal the concepts of Sets and how to apply this to Dembski’s paper.
It’s always tempting that when the math gets tricky, one returns to ‘analogies’ or ‘pre-specification’ which are far less a mathematical argument and as such contain some probability that these concepts are applied inconsistently.
To those who do not know Leonid Meyerguz, he is affiliated with Cornell and has written some papers on a very interesting topic of ‘the temperature of evolution’. Basically the argument is that “native sequences of more than 200 amino acids have roughly constant temperature”. These data are suggestive of two distinct hypotheses.
Other interesting work on the very applicable topic of proteins includes his paper on “The Evolutionary Capacity of Protein Structures”.
I am looking forward to hearing more about your ideas on these topics.
Comment by PvM — July 26, 2006 @ 12:20 am
David L. Hagen wrote in post #84
When your genome is self replicating RNA’s, which double as catalysts and genomes, then a whole lot of complexity disappears. Heck, even if you separate the genes and the replication machinery in an RNA world, there are whole suites of machinery (clamp loaders, helicases etc. etc.) that are completely irrelevant. When amino acids come free in the sea, you don’t need amino acid biosynthetic machinery, when amino acids physically complex with RNA, you don’t need specified loaders, when energy sources are based on inorganic pyrophosphate rather than ATP (as they are in some modern organisms in hydrothermal sites) you don’t need a suite of ATPases (but since something like 1 in 10^11 of all RNA ribozymes are ATP binding proteins, ribozymal ATP handling probably arose very early). When cell division is simple expansion and budding (as shown by David Deamer in several experimental models) then a whole raft of machinery is not needed. The first “living things” may have been something as simple as self-replicating RNA’s embedded in the surface of lipid films on rocks, not something we recognise as a cell.
The extensive Yockey quotations are irrelevant. Yockey is beating up a strawman. No one expects the first living organisms to have modern numbers of modern-looking complements of proteins assembled just by chance. Whether RNA world first, Protein world first or RibonucleoProtein worlds, all of these start off with small, simple systems (often 10 mer proteins in protein-first worlds) that develop complexity later. The exception is certain thermal protein scenarios, where large, higgledy-piggledy proteins with even handedness and weird branching are quite passable catalysts (while modern proteins are all L isomers, D-L copolymers are quite reasonable catalysts; there is no need for exclusive L-amino acids at the start of life), but again, in the thermal co-polymer worlds, no one expects cytochrome-C, just a protein that can do electron transfer. Cytochrome C is a much later production.
However, this is getting off the point. Let’s see a CSI calculation for a real, known, simple biological example, like TEM to cefotaximase.
Comment by Ian Musgrave — July 26, 2006 @ 12:43 am
David Hagen: May I recommend that you dig into Hoyle, Yockey and Sanford (rather than blindly accepting PvM’s superficial dismissals.)
By pointing out that Yockey’s ‘calculations’ seem to have little relevance to the topic of evolution? It’s trivial to inflate improbabilities to show that something is impossible; it’s the hard work of science that uncovers plausible ways to explain something.
Yockey’s and other creationist arguments from improbability have become somewhat commonplace, but one should not confuse them with evidence against the processes of chance.
In fact, I’d say they show similarly little hope that ID can explain them without further details as to the how, when etc. That’s the hard work one would expect from science.
As Korthof observes in his review of Yockey
One molecule of iso-1-cytochrome c can be formed spontaneously with a probability of 0.95 in 1.5 x 10^44 trials. He adds some further conditions that lower the probability and concludes that even if we believe that the building blocks are available, they do not spontaneously make proteins, at least not by chance
At least not by chance. But then again, who is arguing a pure chance pathway anyway?
What is even more relevant is that scientists have shown how processes of variation and chance can in fact create complex specified information (Schneider, Adami, Lenski) so we should not be too worried about these probabilities, as they can be overcome by adding the concept of selection for instance (not to mention necessities like constraints, history etc).
Of course science has to explain the Shannon information content in the genome; so far there seems to be little reason that it cannot.
Of course, only by arguing a pure chance scenario can one hope to ‘undermine’ evolutionary theory which does not even rely on pure chance alone.
Schneider points out that the concept of information is often misunderstood
For instance Yockey
Hubert Yockey (molecular biologist/information theorist) said (Thu, 26 Jan 1995 00:39:52 GMT)
‘Information’ is, of course, not the very opposite of randomness. Elitzur is using the word ‘information’ in the semantic sense as synonym for knowledge or meaning. Everyone knows that a random sequence, that is, one chosen without intersymbol restrictions or influence, carries the most information in the sense used by Shannon and in computer technology. …
or Dembski
William Dembski (creationist), in the book No Free Lunch (page not identified, please tell me if you find it!), imagined flipping a coin 1000 times to get 1000 bits of information. But since an unbiased coin could have two possibilities, the uncertainty before flipping is H_before = 1 bit. After flipping, one will see frequencies approaching 50%, so H_after = 1 bit. Therefore the information gained is H_before - H_after = 0 bits.
basically they are confusing Shannon entropy with Shannon information.
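To make the entropy-versus-information distinction concrete, here is a minimal Python sketch of the coin example quoted above. The function name `entropy_bits` and the variable names are illustrative assumptions, not anything taken from Schneider's text; the only thing it relies on is the standard Shannon entropy formula and the quoted definition of information as H_before - H_after.

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete distribution (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Fair coin before flipping: two equally likely outcomes.
h_before = entropy_bits([0.5, 0.5])

# Observed frequencies after many flips still approach 50/50,
# so the uncertainty per flip has not dropped.
h_after = entropy_bits([0.5, 0.5])

# Information gained is the decrease in uncertainty.
info_random = h_before - h_after

# Contrast (hypothetical): a process that always yields heads removes all uncertainty.
info_fixed = h_before - entropy_bits([1.0, 0.0])
```

Under these assumptions `info_random` is 0 bits per flip and `info_fixed` is 1 bit per flip, which is the sense in which entropy alone is not information.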
Is this btw the same Yockey who wrote the following in Response to FTE?
As I show in my book, Information Theory, Evolution and the Origin of Life (Cambridge
University Press, 2005), Darwin’s theory of evolution is one of the most well-established
theories in science. There is no need of an “Intelligent Designer” in evolution for the
following reasons:
Comment by PvM — July 26, 2006 @ 12:45 am
David Hagen: May I suggest you read (Sir) Fred Hoyle, The Mathematics of Evolution
That’s where he suggests that the Haldane dilemma is no real dilemma?
Korthof again
Hoyle’s efforts to investigate the famous ‘Haldane’s Dilemma’, also investigated by
Walter Remine, results in the surprising conclusion that “Haldane’s so-called cost principle is an illusion.” (p123). And since Remine uses Haldane’s dilemma as an important argument against neo-Darwinian evolution, Remine’s argument is an illusion. Furthermore Motoo Kimura ‘bases’ his Neutral theory of molecular evolution on a wrong result of Haldane’s calculation of the cost of substitution. According to Hoyle Kimura’s calculations of high costs only apply to a continuing declining environment. Maybe this is the most sensational chapter in the book. Hoyle’s worries about deleterious mutations in the human species proved prophetic. In January this year (1999) the geneticist James Crow (6) stated: “3 new deleterious mutations per person per generation. Why aren’t we extinct?”; “A way out is for mutations to be eliminated in bunches. This happens if selection operates such that individuals with the most mutations are preferentially
So much for the hopes by Sal and some other IDers that Haldane is a real problem for evolution.
Comment by PvM — July 26, 2006 @ 12:49 am
Salvador:
If I recall correctly, specificational resources has been a factor in Dembski’s design detection method since day one. It seems to me that to describe it as not yet mature enough is to contradict Dembski’s many assertions that SC is a reliable indicator of design.
As to prespecifications, Dembski uses the term inconsistently. In section 5 of the Specification paper, he describes prespecifications as patterns that were manifest prior to the event, and then remanifested in the event. Salvador seems to use the term this way. But in the addendum, as quoted by Salvador, Dembski says that “with prespecifications, there is one and only one target,” making Phi_S(T) = 1. In other words, it is not enough for an event to manifest one of many pre-existing patterns; you have to actually narrow your set of acceptable patterns down to one before the event occurs (or before you’re aware of it). As far as I can tell, Dembski’s and Salvador’s examples of prespecifications do not meet this requirement.
Comment by secondclass — July 26, 2006 @ 2:18 pm
Ian Musgrave wrote:
Dembski defines the “measure of complexity” of CSI as I = -log_2(#T/#Omega). In this case we have four mutations occurring. Assuming an average length of 300 a.a., and further assuming 20 different a.a. can be placed at each position, then I = -log_2(1/20^300), which is roughly 1400 bits for just one mutation. For 4 such mutations to have occurred, the total bits of information would be 4 x 1400 = 5600 bits. It appears to be very complexified information.
Comment by Lino D'Ischia — July 26, 2006 @ 2:37 pm
Lino D’Ischia,
That’s assuming quite a bit. What happens if you don’t make those assumptions???
Comment by Dan — July 26, 2006 @ 2:49 pm
Lino,
Sorry - I just realized that my previous comment was a knee-jerk reaction, and that the assumptions are reasonable, so please ignore the assumption bit.
Still though, your “1400 bits of specified information” does not appear to constrain the available repertoire of mutations; it appears to de-constrain the directions in which a protein can evolve, making that protein highly “evolvable.”
Comment by Dan — July 26, 2006 @ 2:54 pm
Salvador writes:
Brilliant, Sal, just brilliant. Except - one teeny problem! Recall that the definition of context-independent specified complexity is
Chi = -log_2(10^120 * Phi_S(T) * P(T|H)).
Now, you are bounding Phi_S(T) from above by a constant, so (taking care to flip the inequality due to the negative sign):
Chi >= -log_2(10^120 * 10^80 * P(T|H)) = -log_2(10^200 * P(T|H))
Suddenly, specified complexity depends on nothing but the probability of T under H. Dembski would weep - unless he is the one who suggested the idea to you in the first place, in which case he should weep.
For instance, let T be any singleton event corresponding to an outcome of 1000 flips of a fair coin. Then for any T, P(T|H)=2^-1000. Now, for any T, Chi(T) >= -log_2(10^200 * P(T|H)) = 335.614. Clearly, any sequence of random coin flips exhibits specified complexity, and therefore must be designed! I don’t know about everyone else, but I’ve just been struck with a revelation! Hannah, do you concur?
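The arithmetic behind the 335.614 figure can be checked with a short Python sketch. It works in logarithms to avoid underflow; the function and parameter names are illustrative assumptions, and the 10^80 cap on Phi_S(T) is the constant bound under discussion, not part of Dembski's definition.

```python
import math

def chi_lower_bound(log2_p, log10_resources=120, log10_phi_cap=80):
    """Lower bound on Chi = -log_2(10^120 * Phi_S(T) * P(T|H))
    when Phi_S(T) is bounded above by 10^log10_phi_cap.
    log2_p is log_2 of P(T|H)."""
    log2_constants = (log10_resources + log10_phi_cap) * math.log2(10)
    return -(log2_constants + log2_p)

# Any single outcome of 1000 fair coin flips has P(T|H) = 2^-1000.
coin_bound = chi_lower_bound(log2_p=-1000)
print(round(coin_bound, 3))  # 335.614
```

Because the bound is positive for every possible 1000-flip outcome, the capped formula cannot distinguish a patterned sequence from a random one.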
The important point here is that in order to utilize specified complexity, Phi_S(T) should be very high for random events - it cannot be subject to any arbitrary constant bound. One potentially viable interpretation of Phi_S(T) is the following: assume k is the minimum number of bits needed to encode a program that can enumerate the elements of T (in S’s programming language of choice); then Phi_S(T) = 2^k. Phi_S(T), to be even remotely useful, must measure the number of all possible representations at a particular descriptive complexity; it cannot be limited to only those representations that S can enumerate within a fixed period of time or store within a fixed amount of space.
Well, Sal, don’t keep us in suspense - what are the T’s? The way I see it, one of the few ways for Phi’(T) to be a valid function is if Patterns(Omega) is a set of events - that is, Patterns(Omega) is a subset of PS(Omega), or Patterns(Omega)=PS(Omega). I think you were right in the first place: otherwise, Dembski’s definition remains broken.
Comment by Leonid Meyerguz — July 26, 2006 @ 2:59 pm
Hmmm … the SPAM filter must really hate me. Anyhow, Sal, the last definition of Phi_S(T) you gave is horribly broken. You can either wait for my post to show up, or think about what you’ve done and bow your head in shame. ;)
Comment by Leonid Meyerguz — July 26, 2006 @ 3:02 pm
Lino writes:
No, Lino, that’s just the “I” in CSI, and only under the assumption of uniform probability over all outcomes in Omega. The “CS” part of “CSI” may well be impossible to compute. But your calculation of “I” - or rather, the underlying probability - is also incorrect, I am afraid.
Let’s forget about “I” for a second, and focus solely on the probabilities. If I understand you correctly, you are claiming that, under the assumption that all amino acids changes are equally probable, the probability that the “right” aa will be substituted at the “right” position is 20^-300. That is wrong.
Suppose, for simplicity, that all your assumptions are correct as the protein undergoes a mutation. So, the probability of a mutation hitting the “right” site is 1/300, and the probability of picking the “right” amino acid at that site is 1/20. Then, the probability that the “target” mutation occurs is 1/300*1/20 = 1/6000, not 20^-300. The latter probability corresponds to the event that all the “right” amino acids are in place along every position in the sequence, assuming that there is exactly one “right” amino acid at each site, and that all sequences are equally probable.
Notice that the above probability calculations assume uniform distributions on sequences, and do not take selection into account at all. That, in my view, is only slightly more realistic than basing probability calculations off the assumption that a ball is equally likely to fly in any direction once we release it from atop a building.
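To see how far apart the two readings are, here is a small Python sketch that converts both probabilities to bits. The variable names are illustrative assumptions; the inputs (300 sites, 20 amino acids, uniform distributions) are the simplifying assumptions already stated in the comments, not a biological model.

```python
import math

n_sites = 300        # assumed protein length
n_amino_acids = 20   # assumed choices per site

# Whole sequence exactly "right" at every site: P = 20^-300.
bits_whole_sequence = n_sites * math.log2(n_amino_acids)

# One target point mutation: right site (1/300) and right residue (1/20).
p_single_mutation = (1 / n_sites) * (1 / n_amino_acids)
bits_single_mutation = -math.log2(p_single_mutation)

print(round(bits_whole_sequence, 1))   # 1296.6
print(round(bits_single_mutation, 2))  # 12.55
```

On these assumptions the whole-sequence figure comes out near 1297 bits (somewhat below the “roughly 1400” quoted earlier), while a single target mutation is worth about 12.6 bits, so four independent target mutations would total roughly 50 bits rather than 5600.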
By the way, I saw your latest post in the “Analogies” thread. You raise some interesting questions, but it would take me quite a while to address them, and I doubt I will have the time today. In addition, it would help if I knew something of your background in computational and information theory. If you haven’t yet read the Wikipedia articles “Information Entropy” and “Kolmogorov complexity”, I would strongly recommend them: I think they are both very well written, and provide a good introduction to the relevant concepts. (Unless, of course, you’ve already read other books or technical articles on the subject, though I would strongly encourage you not to rely on Dembski alone in this regard.)
Comment by Leonid Meyerguz — July 26, 2006 @ 3:41 pm
Dan: “Still though, your “1400 bits of specified information” does not appear to constrain the available repertoire of mutations, it appears to de-constrain the directions in which a protein can evolve, making that protein highly “evolvable.””
Well, obviously the enzyme/protein has changed. We can call that ‘evolution’ if we want. But the best way of specifying it would seem to be just simply saying it has ‘changed’. (We know that much; the rest needs to be explored.)
I would say that this is a way of invalidating ID. The high level of complexity says, basically, that this ‘change’ in the protein COULD NOT have happened by ‘chance’ (a random, stochastic process) but had to be ‘directed’ in some sense. As microbiology continues to advance, this will likely be demonstrated, one way or the other. The only word of caution I would throw out here is that there is a tendency (this is an understatement on my part) for everything to be subsumed into the ‘prevailing paradigm’ (which in this case is Darwinism=RM+NS); so, I think it would be best to make up your mind beforehand as to what you will make of the discovery that bacterial ‘change’/‘evolution’ is directed in some fashion by the genome. Once this is discovered, it will immediately be ‘explained’ via the ‘prevailing paradigm.’ I suggest taking a position on how to personally interpret this discovery well before it happens (should it happen; if it doesn’t happen, then ID will, at the least, have to be revised) and is then “interpreted”.
Comment by Lino D'Ischia — July 26, 2006 @ 3:53 pm
Sal writes:
Actually, you are contradicting Dembski here. Dembski treats analogies as part of specification: he is relying on them to compute the “specificational resources” (again, see his flagellum example). Dembski clearly defines “prespecification” as patterns defined before the random event in question is observed (i.e. “before-the-event patterns”, p. 13). Once you find an “analogy to engineering” in a biological system, it can no longer be used as a part of any prespecification when computing the specified complexity inherent in that system.
Of course, the very notion that one can use vague analogies in place of rigorously defined functions in order to compute probabilities is sheer nonsense from a mathematical standpoint. But that’s just another reason why specified complexity is completely broken.
Comment by Leonid Meyerguz — July 26, 2006 @ 4:17 pm
Lino,
Evolution is change, by well-described mechanisms.
Put another way for the example of Newton and the apple… Well, obviously the apple fell. We can call that ‘gravity’ if we want.
Of course, however, what you’re really disputing is whether a telic process was involved, towards which you said:
Yet, molecular biology has found in recent decades that such protein changes do in fact occur, within the range of a large set of deconstrained possibilities, and no telic process has ever been associated with this natural range of available genetic changes.
That’s a big prediction, and sounds very similar to the type of prediction that creationists have been claiming for decades - that evidence for the Creator or Designer or whatever is just around the corner. I’m not going to hold my breath.
Indeed, that is a good question. And I can’t think of anything that we might reasonably find that would effectively refute the wealth of empirical evidence that’s been gathered in the past 150 years.
But let’s look at the flip-side of that question - is there anything that would make you question your belief in Intelligent Design, or convince you in the validity of Evolution as a theory? Even more specifically, is there anything that would convince Young-Earth Creationists (and no, I don’t know your stance on that, I’m speaking generally) that the Earth and life on it began much much more than 6000 years ago? Anything at all?
Comment by Dan — July 26, 2006 @ 4:25 pm
Salvador:
Actually 10^30 is the Phi_S of a random 100 bit sequence. Crank it up to 1000 bits and you get 10^300. As Lino explained, an upper limit on Phi_S doesn’t make sense.
Comment by secondclass — July 26, 2006 @ 4:48 pm
I meant Leonid, not Lino. Sorry.
Comment by secondclass — July 26, 2006 @ 4:48 pm
Indeed, that is a good question. And I can’t think of anything that we might reasonably find that would effectively refute the wealth of empirical evidence that’s been gathered in the past 150 years.
Actually Lino, I was thinking about it, and I don’t want to give the impression that I just endorsed “blind faith” in evolution, as IDers and Creationists are fond of mistakenly claiming.
What I’m saying is, for biologists to be convinced that Evolution is false, all of the research in biology over the past 150 years (actually, round that up to 200) would have to be erased, and the fossil, anatomical, molecular evidence that we are currently acquiring would have to be changed, fundamentally altering the world as we know it.
The evidence for evolution is just that extensive - any new data that is incongruent with the current body of knowledge would have too much explaining to do.
I mean, can you think of any evidence that would convince you that Gravity is an invalid theory?
Comment by Dan — July 26, 2006 @ 4:52 pm
Dan: “Yet, molecular biology has found in recent decades that such protein changes do in fact occur, within the range of a large set of deconstrained possibilities, and no telic process has ever been associated with this natural range of available genetic changes.”
I remember reading a paper recently that suggested that within the protein itself a.a.s were changing, but that the overall effect was a kind of over/under, and the changes were essentially ‘neutral’. Now that might involve simple (well, not really so simple) chemical changes perhaps involving quantum shifts and determined by quantum mechanical considerations. Here, however, since we’re seeing a definite cause and effect in the bacteria, given the probabilities, if it is demonstrated that no manner of ‘directedness’ is involved, well, I would consider that a major blow to ID.
As to YECs, their starting point is outside of science, which to my way of thinking means that they can never be persuaded by scientific arguments. But I also think that there are very thoughtful YECs who see some kind of congruence between certain scientific findings and their faith. These might be persuaded to let go of their YEC leanings. But, as we ‘speak’, I just read that they’ve found ‘open air caves’ in Australia that are 350 million years old. Based on considerations of plate tectonics, that seems fairly wild. I wonder if the YECs aren’t going to make some hay with that one. You see what I mean?
In the end, whether one is a YECer, an IDer, a Darwinister, an ETer, or whatever, the human heart and mind are such that we’ll, in the end, only be moved by the truth. So, I think things should rise and fall with the quality of the facts and arguments, and not all the side issues.
Comment by Lino D'Ischia — July 26, 2006 @ 5:05 pm
Dan: “I mean, can you think of any evidence that would convince you that Gravity is an invalid theory?”
Well, I don’t think anyone quibbles much about gravity here on earth, do you? The problem is out there yonder.
New Scientist magazine has an article about MOND, which, I believe, deals with an article that has just come out in Nature regarding a different way of looking at gravity and dark matter. So, gravity, as a fact here on earth, no one quibbles about. But gravity as a theory of what happens out there yonder is up for grabs a little.
The parallel with the “theory of evolution” is that many, if not most, IDers outright concede that RM+NS operates at a species level; that is, microevolution. The problem always comes when we’re considering so-called ‘macroevolution’, or change at higher taxonomic levels. You might not be too familiar with the ‘underbelly’ of Darwinian theory. Denton’s “Evolution: A Theory in Crisis” is a good way of getting familiar with some of the biochemical and taxonomic difficulties that Darwinism needs to overcome. The Ptolemaic paradigm existed for centuries. All kinds of mathematical work went into it. But it finally collapsed. Is that going to happen to Darwinism? I think it will. But you know, strange as it seems, I may be wrong.
Comment by Lino D'Ischia — July 26, 2006 @ 5:17 pm
PvM writes:
Pim, thank you very much for your kind words and your interest in our work - though I feel compelled to point out that while I set up and ran the computational experiments, most of the really interesting ideas in the paper you cite are courtesy of my collaborators. However, I do not think this is the appropriate forum to discuss the paper: if for no other reason than because the work is very much on-going at the moment, and if I’m going to write about it, I might as well do it as a part of my dissertation (which is slated to be completed “any day now”, as it has been for the last eight months). My apologies.
I will, however, say this: every time an IDer gives a calculation for the very low probability of proteins evolving, it brings a bemused smile to my lips. I’ve said it a few times before, but it bears repeating - my work, and the work of most scientists, would be so very much easier if only we were to embrace ID. :)
Right you are. The only way Dembski could conceive of computing the specificational resources associated with a real biological system (the flagellum) is by resorting to analogy. Yet, analogy has no place in formal mathematical definitions required to make his CSI calculations work. So there really doesn’t appear to be any viable way to apply Dembski’s ideas to real-world phenomena. And that’s even without taking into account the probabilities that are presently impossible to compute or estimate reliably, and will likely remain so in the future.
Comment by Leonid Meyerguz — July 26, 2006 @ 5:27 pm
Yes, neutral mutations can and do occur, but I don’t know what you mean by your reference to quantum mechanics - it sounds as if you learned about biology from a theoretical physicist. I don’t know how much the average theoretical physicist knows about biology, but I work with several applied physicists, and their knowledge of biology isn’t too extensive (no offense to them, they’re brilliant in their own fields). That, and I wouldn’t try learning quantum physics from me (a molecular biologist) - you wouldn’t learn very much. ;-)
I don’t know of the paper you’re referring to, but I can’t see that paper helping the YEC’s any - they’d have to accept that the world was indeed here 350mya before they can make anything out of such findings.
Huh??? Are you suggesting that there’s some factual basis for Creationism; or that scientists haven’t been logically considering evidence-based facts and arguments; or that we’re idly discussing irrelevant side issues here; or something else?
…sounds like mumbo-jumbo to me.
Comment by Dan — July 26, 2006 @ 5:27 pm
Dan: “Huh??? Are you suggesting that there’s some factual basis for Creationism; or that scientists haven’t been logically considering evidence-based facts and arguments; or that we’re idly discussing irrelevant side issues here; or something else?
…sounds like mumbo-jumbo to me.
“
Well, I considered not mentioning anything about it since I thought it might prove to be a distraction. Personally, I think I prefer a world that’s billions of years old rather than one created not too long ago–and I don’t know why this is. But I’ve run across some stuff on creationist sites that I’ve visited from time to time that does call out for explanations. Maybe they’ve been refuted–I don’t know. I’m not interested enough to find out. But again, we should go where the truth leads us; not just where we feel comfortable. That’s all I was saying. But let us not digress…..
Comment by Lino D'Ischia — July 26, 2006 @ 6:09 pm
Lino (109),
Yes, and no one is disputing that Gravity is real - but our understanding of Gravity and Relativity is changing all the time. Likewise, as you note, no one with any understanding of biology is questioning that Evolution is real, but there are debates on how it works.
I’ve read Denton. He’s an idiot. I recommend you read Carroll’s Endless Forms Most Beautiful, it completely knocks the feet out from under the notion that there’s any difference between speciation and phylogenic change, except the timescale.
And back to my earlier point we go: that the “collapse” of Darwinism is about as likely as the collapse of Gravity (or if you prefer, Relativity) in modern physics.
To your comments in 112: Ok, no digressing then. ;-)
Comment by Dan — July 26, 2006 @ 6:38 pm
Ian,
Thank you for participating. Let me offer a view heretical to most on the pro-ID and anti-ID side, but one I think is correct. But it will help answer your question for a CSI calculation for TEM or other proteins….
Regarding the objects you describe, I will show a method of how to demonstrate CSI in biology. Some readers may cringe at the way I calculate it in this post. Nobel Laureate Eugene Wigner in the 1960’s wrote an essay which inspired my approach to calculating CSI.
Recall that the property of design in Dembski’s definition is merely a statistical property. CSI can be calculated with respect to various chance hypotheses H, and in the absence of searching an infinite number of hypotheses H, a simple uniform distribution is offered as the initial H to judge the existence of CSI.
To illustrate the calculation, consider that we have human factories making computer chips. There are thousands upon thousands of identical computer chips, with identical architectures. Thus, an indirect statistical check is to simply detect that the patterns in each are identical. If they are identical, given that there is no law of nature forcing chips to conform to a particular pattern, we can say the artifact fits the definition of design. Saying it is designed makes no commitment to intelligent agency. Even biologists say something is designed without invoking ID.
One could possibly observe all the millions of transistors perfectly positioned in one chip, and then see that same identical arrangement in the other. Thus, one chip serves as a pre-specification of another.
However, such a CSI detection is overkill because it is so obvious from other external factors (like context), but the technique would statistically characterize these manufactured components as designed or evidencing CSI.
As Dembski wrote in The Design Inference:
From a standpoint of proximal causes, a mechanical process like a factory can make them. That does not negate the fact they are statistically characterized as designed. There is not any immediate aspect in the definition that would require one to do anything except give it the statistical label of design, or evidencing CSI. It speaks nothing to whether a machine or an intelligence was the proximal or ultimate cause for the artifact.
With that in mind, as a general principle, we can explore long biopolymers. A mix of bio-monomers will bond randomly (if at all) unless properly fabricated and placed in accordance with a pattern.
Consider a bio-polymer made of 400 amino acids (like a large protein). Since there are many copies of it, one may consider one to fit the definition of pre-specification.
The repeat of the pattern in numerous organisms allows it to be a pre-specification, thus Phi_S(T) may be dropped in the calculation. There are about 300 kinds of amino acids, but to be generous, I use the figure 20 to correspond to the 20 we find in most of life. Given there are 400 positions in this pre-specified protein, CSI with respect to uniform chance hypothesis for any of the 20 appearing at random would yield:
-log_2(10^120 * P(T|H)) = -log_2(10^120 / 20^400) = 1330 bits
With the context independent CSI. This speaks nothing to whether a machine or an intelligence was the proximal or ultimate cause for the artifact. In fact, we observe these proteins emerge on account of cellular machinery not direct intelligent intervention. But by definition, the output of the cellular machine could be regarded as CSI.
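For readers who want to check the 1330-bit figure, here is a minimal Python sketch of the context-independent formula Chi = -log_2(10^120 * Phi_S(T) * P(T|H)), evaluated in logarithms. The function and parameter names are illustrative assumptions; setting Phi_S(T) = 1 reflects the pre-specification argument made above, and the uniform 20-letter chance hypothesis is the one assumed in this comment.

```python
import math

def context_independent_chi(n_positions, alphabet_size=20,
                            log10_resources=120, log2_phi=0.0):
    """Chi = -log_2(10^120 * Phi_S(T) * P(T|H)) under a uniform chance
    hypothesis, with P(T|H) = alphabet_size^-n_positions.
    log2_phi = 0 corresponds to Phi_S(T) = 1 (a single prespecified target)."""
    log2_p = -n_positions * math.log2(alphabet_size)
    return -(log10_resources * math.log2(10) + log2_phi + log2_p)

protein_chi = context_independent_chi(n_positions=400)
print(round(protein_chi))  # 1330
```

The result depends entirely on the two inputs being contested in this thread: the uniform distribution used for P(T|H) and the choice to drop Phi_S(T).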
About 45 years ago, Nobel Laureate Eugene Wigner was so amazed at the replication capacity of cells that he proposed a Biotonic Law of nature. A new law of nature. What in fact he observed was not a law of nature but what is known as a cybernetic law. Cybernetic laws are the laws that a machine abides by in addition to the laws of physics. As long as the machine can function, it will operate under the likeness of a rule of law. For example, “machine X will do task Y” is a statement of a cybernetic law.
Regarding my example, the fact that the phrase “design” is pervasive in describing architectures within biology, Dembski’s conception is consistent with the common practice of calling a biological artifact designed. He merely formalized what is ordinary practice of labelling things designed.
This of course has created some disconcert:
Intelligent Design the Future: Rudy Raff to biologists: Watch your language, the kids are listening
Raff writes:
The practice of detecting CSI is already followed implicitly. Dembski only formalized what is common practice for labelling things designed.
Comment by Salvador T. Cordova, IDEA GMU — July 26, 2006 @ 6:41 pm
Ian,
Thank you for participating. Let me offer a view heretical to most on the pro-ID and anti-ID side, but one I think is correct. But it will help answer your question for a CSI calculation for TEM or other proteins….
Regarding the objects you describe, I will show a method of how to demonstrate CSI in biology. Some readers may cringe at the way I calculate it in this post. Nobel Laureate Eugene Wigner in the 1960’s wrote an essay which inspired my approach to calculating CSI.
Recall the property of design in Dembski’s definition is merely a statistical property, CSI can be calculated with respect to various chance hypothesis H, and in the absence of searching an infinite number of hypotheses H, a simple uniform distribution is offered as the initial H to judge the existence of CSI.
To illustrate the calculation, consider that we have human factories making computer chips. There are thousands upon thousands of identical computer chips, with identical architectures. Thus, an indirect statistical check is to simply detect the patterns in each are identical. If they are identical, given that there is no law of nature forcing chips to conform to a particular pattern, we can say the artifact fits the definition of design. Saying it is designed makes no commitment to intelligent agency. Even biologist say something is designed without invoking ID.
One could possibly observe all the millions of transistors perfectly positioned in one chip, and then see that same identical in the other. Thus, one chip serves as a pre-specification of another.
However, such a CSI detection is overkill because it is so obvious from other external factors (like context), but the technique would statistically characterize these manufactured components as designed or evidencing CSI.
As Dembski wrote in The Design Inference:
From a standpoint of proximal causes, a mechanical process like a factory can make them. That does not negate the fact they are statistically characterized as designed. There is not any immediate aspect in the definition that would require one to do anything except give it the statistical label of design, or evidencing CSI. It speaks nothing to whether a machine or an intelligence was the proximal or ultimate cause for the artifact.
With that in mind, as a general principle, we can explore long biopolymers. A mix of bio-monomers will bond randomly (if at all) unless properly fabricated and placed in accordance with a pattern.
Consider a bio-polymer made of 400 amino acids (like a large protein). Since there are many copies of it, one may consider one to fit the definition of pre-specification.
The repeat of the pattern in numerous organisms allows it to be a pre-specification, thus Phi_S(T) may be dropped in the calculation. There are about 300 kinds of amino acids, but to be generous, I use the figure 20 to correspond to the 20 we find in most of life. Given there are 400 positions in this pre-specified protein, CSI with respect to a uniform chance hypothesis for any of the 20 appearing at random would yield:
-log2(10^120 * P(T|H)) = -log2(10^120 * 1/20^400) = 1330 bits
This is the context-independent CSI. It speaks nothing to whether a machine or an intelligence was the proximal or ultimate cause for the artifact. In fact, we observe these proteins emerge on account of cellular machinery, not direct intelligent intervention. But by definition, the output of the cellular machine could be regarded as CSI.
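As a rough check on the arithmetic above, here is a minimal Python sketch. It assumes the uniform chance hypothesis stated in the comment (20 equiprobable amino acids at each of 400 positions), takes 10^120 as the replicational resources, and drops Phi_S(T) as the comment does. The function name and its parameters are illustrative and do not come from Dembski's paper.

```python
import math

def context_independent_csi(alphabet_size: int, length: int,
                            log10_resources: float = 120.0) -> float:
    """Return -log2(10^log10_resources * P(T|H)) in bits.

    Assumes P(T|H) = 1 / alphabet_size**length (uniform chance hypothesis).
    Works in logarithms so the tiny probability never underflows.
    """
    log2_resources = log10_resources * math.log2(10)
    log2_probability = -length * math.log2(alphabet_size)
    return -(log2_resources + log2_probability)

bits = context_independent_csi(alphabet_size=20, length=400)
print(f"{bits:.1f} bits")
```

Under these assumptions the result depends only on the alphabet size and the sequence length, which is the point several replies in the thread pick up on.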
About 45 years ago, Nobel Laureate Eugene Wigner was so amazed at the replication capacity of cells that he proposed a Biotonic Law of nature, a new law of nature. What in fact he observed was not a law of nature but what is known as a cybernetic law. Cybernetic laws are the laws that a machine abides by in addition to the laws of physics. As long as the machine can function, it will operate under the likeness of a rule of law. For example, “machine X will do task Y” is a statement of a cybernetic law.
Regarding my example: given that the word “design” is pervasive in describing architectures within biology, Dembski’s conception is consistent with the common practice of calling a biological artifact designed. He merely formalized the ordinary practice of labeling things designed.
Comment by Salvador T. Cordova, IDEA GMU — July 26, 2006 @ 6:45 pm
Salvador:
Sal, see comments 94 and 103.
Comment by secondclass — July 26, 2006 @ 6:57 pm
Sal - you do realize that entire argument (in 113) amounts to nothing more than the Design Analogy that Hannah wrote a post on last week, and I replied to, don’t you?
Further, as I reminded Lino in a similar comment upthread, when he mentioned similar calculations to a protein of about “1400 bits of information,” that allows said protein a starting point of 1400 possibly viable mutations. That’s a lot of options, which significantly deconstrains the possibilities for protein A to protein A’ given mutations, selection and time, making the 300 amino acid protein you mention downright “evolvable.”
IOW, you still have yet to demonstrate a viable example of CSI in biology that precludes non-telic processes.
Comment by Dan — July 26, 2006 @ 6:58 pm
Seems like I’m not the only one confronted with this problem.
Intelligent Design the Future: Rudy Raff to biologists: Watch your language, the kids are listening
Paul Nelson writes:
Rudy Raff writes:
The practice of detecting CSI and calling things designed is already followed implicitly in science, even by biologists. Dembski only formalized what is common practice for labelling things designed.
Of course, if biological complexity is formalized as “something not being the product of stochastic processes”, why should we expect anything like Darwinian Evolution to create it? What sort of math will describe the details of Darwinian evolution? Answer: none.
Naturalistic evolution properly follows the logical form: “E implies not-E”, which is self-contradictory.
I thought I did. And this relates to protein duplication, which I just calculated CSI for. Biological Turing Machines
Comment by Salvador T. Cordova, IDEA GMU — July 26, 2006 @ 7:20 pm
…2nd try, first got stuck in the spam filter…
Lino (109),
Yes, and no one is disputing that Gravity is real - but our understanding of Gravity and Relativity is changing all the time. Likewise, as you note, no one with any understanding of biology is questioning that Evolution is real, but there are debates on how it works.
I’ve read Denton. Not a good book, even when it was written, and by now it’s hopelessly outdated. I recommend you read Carroll’s Endless Forms Most Beautiful, it completely knocks the feet out from under the notion that there’s any difference between speciation and phylogenic change, except the timescale.
And back to my earlier point we go: that the “collapse” of Darwinism is about as likely as the collapse of Gravity (or if you prefer, Relativity) in modern physics.
To your comments in 112: Ok, no digressing then. ;-)
Comment by Dan — July 26, 2006 @ 7:24 pm
Phi_S(T) however deals with specifications that are not also pre-specifications. That distinction was not in his earlier writings….
However, pre-specifications are so abundant that this will mostly become a moot point. Although I thought the idea of Phi_S(T) an interesting idea in its own right.
What is an example of a pre-specification? Something not post-dictive, or something an engineer competent in practice would have created. If biologists are already calling things designed, there is surely a motivation for this, and it is the fact they recognize analogous technologies in the cell, in fact, superior technologies!
Denton pointed this out:
Paley’s watches abound in biology. There is no scarcity of pre-specifications. Even in evolutionary biology we have a term sympathetic to this, the idea of functional convergence.
Comment by Salvador T. Cordova, IDEA GMU — July 26, 2006 @ 7:29 pm
Dan:
Of course, according to Salvador, gravity is also designed. Under a uniform chance hypothesis, which Sal considers sufficient to detect design, it’s very improbable that objects would always fall down instead of up.
Comment by secondclass — July 26, 2006 @ 7:35 pm
Salvador:
Dembski seems to agree with you, but then he tells us that Phi_S(T) &>; 1 for T=”royal flush” and for T=”bidirectional rotary motor-driven propeller”. How do you explain that?
Comment by secondclass — July 26, 2006 @ 7:53 pm
Sorry, that should read: “Phi_S(T) > 1”
Comment by secondclass — July 26, 2006 @ 7:55 pm
Sal, also, how do you explain your quote from Dembski that “with prespecifications, there is one and only one target”? In all of your prespecification examples, there are many different patterns to which the events could have conformed rather than the pattern that you mentioned. How, then, is the pattern that you mentioned the only one target?
Comment by secondclass — July 26, 2006 @ 8:01 pm
Why thank you.
Phi_S(T) is ultimately bounded by physics, independent of Dembski’s considerations. Physics is something you might consider. :=)
Indeed that was my point as I am trying to show for the case of simple single concept, well-defined specifications, such as the “Champernowne sequence”, “all 1’s”, “all zero’s”…Assuming we can even map these simple concepts out to a specific bit pattern of n-length, we would still be constrained by physics to 10^80 such single concepts. But if one wants to be a stickler and look at combinational concepts like “1’s complement of Champernowne sequence”, then something like 10^(80*2) for 2 combined concepts…for N combined concepts 10^(80*N)
But why all the trouble? If a single concept is well defined, it is a pre-specification.
Phi_S(T) is bounded by 10^(80*N), where N is the number of concepts. Overkill to the max to satisfy your critics. There, Bill, hope you feel better.
Comment by Salvador T. Cordova, IDEA GMU — July 26, 2006 @ 8:56 pm
secondclass,
About 3 of my posts are in the spam queue. One deals with pre-specifications.
That means there is no equivocation in each prespecification’s meaning; for each pre-specification, there is only one target.
For example, the prespecification “all heads” maps to only one target for a given number of coins.
In contrast, the concept “ordered” maps to who knows how many patterns of coins.
Salvador
Comment by Salvador T. Cordova, IDEA GMU — July 26, 2006 @ 9:06 pm
Please wait for the unspam to come out. Thanks.
Salvador
Comment by Salvador T. Cordova, IDEA GMU — July 26, 2006 @ 9:07 pm
Sal writes:
Except we already know the null hypothesis of uniform distribution to be wrong, so your CSI calculation is pointless. You might as well compute the CSI inherent in a ball dropping to the ground 100 times when under the null hypothesis that it’ll just fly off into a random direction when you let it go.
So, according to your discussion above, we can say the following about CSI in a protein:
1) CSI is a one-to-one, onto function of protein sequence length. We might as well define CSI as just sequence length and be done with it.
2) CSI tells us nothing about the causal history, functional complexity, stability, foldability, or evolvability of the protein. It tells us nothing whatsoever except sequence length, which we already knew before we set out to compute CSI.
I can certainly see why this idea would be considered heretical in pro-ID circles. However, it is perfectly consistent with the anti-ID position that CSI can tell us nothing useful about biology, so you’ve committed no heresy as far as I’m concerned. ;)
Comment by Leonid Meyerguz — July 26, 2006 @ 9:13 pm
No it is not. That was one example of CSI from a strand, there may be more.
There are about 7 layers in the ISO/OSI model of the internet. That means 7 layers of specification for the same system. You’re being too hasty to presume that my example is the only specification dimension of CSI detection.
A bit-stream may evidence many layers of CSI, likewise in regards to bio-polymers, I only identified one, an easy one for the sake of the readers.
I offered a simple example in order to educate readers with known and uncontroversial facts such that there will be little to dispute. However, one shouldn’t presume that, as a general rule, CSI will be dealing with such small cases.
I already pointed out we are discovering linguistic structures in biology (CSI) for which we may not know the function. We are presuming the language there may signal functionality. Any time we find a design in biology corresponding to an engineering construct, that is a detection of CSI, and an advance for science.
Salvador
PS
Some of my other responses to you are still trapped in the spam queue.
Comment by Salvador T. Cordova, IDEA GMU — July 26, 2006 @ 9:41 pm
I wrote:
Salvador responded:
Be that as it may, you just gave an example where CSI is a trivial function of sequence length and is entirely meaningless. So, you essentially concede that CSI of 150 bits may well mean nothing at all, much less constitute a “design inference”, unless lots of additional context is provided. Thank you for making my case for me.
The only case where you can compute CSI is a small and entirely trivial one, and even you appear to agree that it doesn’t tell us anything at all. So how do we compute CSI for biological systems where it wouldn’t be meaningless?
Comment by Leonid Meyerguz — July 26, 2006 @ 10:37 pm
Salvador:
Sal, first of all, analogies don’t meet this requirement. Second, Phi_S(T) has nothing to do with the number of events that conform to T, so a pattern that’s unequivocal in the sense you describe will not result in a Phi_S of 1.
Comment by secondclass — July 26, 2006 @ 10:46 pm
On second thought, Sal, analogies can meet your requirement, but you still don’t get a Phi_S of 1.
Comment by secondclass — July 26, 2006 @ 10:56 pm
Salvador writes:
Sigh. It is becoming readily apparent that you really don’t understand Dembski’s paper. Phi_S(T) is the size of a combinatorially large set, equivalent to the number of all possible bit strings of length necessary to fully describe T. If S needs 10000 bits to describe T, Phi_S(T) can be upper bounded by 2^10000, not 10^80, or 10^160.
Didn’t you even read the rest of my post #100? Did you miss the part where I can show, that under your definition of Phi_S(T), any sequence of 1000 random coin tosses will exhibit specified complexity? Please go back and re-read it. If something is unclear, feel free to ask any questions.
The point is - and this is crucial - that you simply cannot bound Phi_S(T) by any arbitrary constant without completely breaking Dembski’s definition of specified complexity without pre-specification. Not that his definition works well even with unbounded Phi_S(T), but it’s a start.
But Phi_S(T) has nothing to do with pre-specifications: it is supposed to give us a way to compute specified complexity when pre-specification is absent. 10^80 would indeed be a good upper bound on the number of pre-specifications in S’s background knowledge, but that doesn’t help your argument in any way. Phi_S(T) is needed to compute Dembski’s specified complexity when the random event does not match any “pre-specification”, so you simply cannot bound Phi_S(T) the way you do. Again, re-read post #100, and go ask Dembski himself if you don’t believe me.
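To make the size gap in this comment concrete, here is a small Python sketch that compares the bit-string bound with the fixed constants under dispute. It assumes only what the comment states: a description of 10000 bits gives an upper bound of 2^10000. The helper name is made up for illustration.

```python
import math

def log10_of_bitstring_bound(description_bits: int) -> float:
    """Return log10(2**description_bits), the count of bit strings of that length."""
    return description_bits * math.log10(2)

bound_exponent = log10_of_bitstring_bound(10000)
print(f"2^10000 is roughly 10^{bound_exponent:.0f}")
# Compare against the fixed bounds 10^80 and 10^160 discussed in the thread.
print(bound_exponent > 80, bound_exponent > 160)
```

The exponent grows linearly with the description length, so no fixed power of ten can cap it for arbitrary descriptions.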
Comment by Leonid Meyerguz — July 26, 2006 @ 11:20 pm
secondclass writes:
You are correct, I think. However, in the case of pre-specifications, Phi_S can be bounded by a fixed constant (like Sal’s 10^80), unlike the general case of specifications. I’ll try to post more on that tomorrow.
Of course, the fun thing about using analogies to construct events T is that the set of things that someone, somewhere, under a certain light sees as, say, a “motor” will be very large indeed, so the low probabilities that IDers like to assert will evaporate.
Comment by Leonid Meyerguz — July 27, 2006 @ 1:52 am
Not if the bit string is associated with a single concept such as “all heads”, “Champernowne sequence”, or some other well-defined pre-specification etc.
All map to a small Phi_S(T) because they are exactly specified by a concept. A string that is Kolmogorov-Complex and not pre-specified does not have that luxury. Thus one wouldn’t even dream of using Phi_S(T) on that situation, as one doesn’t have a specification that’s really shorter in the human culture than the specification itself.
However, if I made a random toss of 1000 coins, recorded the outcome and labeled it PasswordX15, then that can serve as a single concept in the future from which other concepts can be merged with it. At that point passwordX15 has become a pre-specification, part of the human culture. Simple transformations of passwordX15 would be subject to the Phi_S(T) restrictions such as “1’s complement passwordX15”. In that case it would be, using my extreme formula
Phi_S(T) = 10^80^2 since :
Phi_S(PasswordX15) = 1
Phi_S(1’s complement) = 10^80^2
And this is exactly the kind of situation Bill Dembski was referring to:
He does not use my 10^80, but 10^5
The 10^30 value given would appear to allow about a 5 word description. A four word description would be “1’s complement Champernowne sequence”.
I did read it, and I believe I just addressed your concern.
Comment by Salvador T. Cordova, IDEA GMU — July 27, 2006 @ 2:47 am
Salvador:
Here you’re saying that “PasswordX15” is the only single concept in your repertoire. I find that hard to believe.
Comment by secondclass — July 27, 2006 @ 10:24 am
Dembski assigns pre-specifications a Phi_S(T) = 1, but would it make you feel better if for pre-specifications a Phi_S(pre-specification) = 10^80?
Is that unreasonable? I know 10^80 is far higher than Dembski’s 10^5, but well, if you can live with 10^80 as a pre-specification, I’ll try to work with it for the sake of discussion.
Salvador
Comment by Salvador T. Cordova, IDEA GMU — July 27, 2006 @ 10:45 am
Salvador, 10^80 sounds like a very generous upper bound for 1-level concepts.
I suppose “bidirectional rotary motor-driven propeller” is a 1-level concept if someone has considered that phrase before. And every phrase for which Google returns at least 1 hit. I think 10^80 is still generous.
Thanks. I think we’re on the same page.
Comment by secondclass — July 27, 2006 @ 11:34 am
secondclass,
I plan to post on an analogy which is accepted well in biology, namely, the Genetic Code.
We consider DNA “genes” to be an analogy to proteins. There is no a priori requirement that humans project onto genes and proteins the metaphor (analogy) of a coding/decoding system. We do this so naturally we don’t even step back and say, “is it even appropriate to project such specifications on to the system? After all it’s not a law of physics, yet we model it as a law.” Indeed we model it as a law, but it is a cybernetic law, not a physical law.
We effectively, from an operational standpoint, use design metaphors. Independent of any metaphysical baggage, this makes operational sense since the metaphor of code/decode is so operationally effective.
In projecting this metaphor onto biology, we have implicitly adopted the idea which was formalized by:
Chi = -log2[ M * N * Phi_S(T) * P(T|H) ]
Somehow, in practice, biologists have implicitly accepted that Chi was sufficiently high. Since the genetic code is pretty much beyond dispute, I think that will be a non-controversial example of using analogies as pre-specifications. This will also be interesting because there are non-Universal genetic codes in operation as well. The point is then to show why, when biologists in peer-reviewed papers say something is designed, it is a statistically justifiable claim, and that this statistical property need not necessarily come with any metaphysical baggage.
In other words, there is something special about machines or biology which separates them from ordinary matter. It is not that they are made of a different substance, but that they are organized in a special way that makes them very distinctive. CSI helps formalize that distinction.
Comment by Salvador T. Cordova, IDEA GMU — July 27, 2006 @ 12:14 pm
Sal is making a lot of handwaving assertions. I am familiar with Dembski making a similarly fallacious claim that the explanatory filter is how we infer design routinely.
In fact, in most if not all cases, rather than going through the steps outlined by Dembski, people are far more pragmatic and look for such items as means, motives, opportunities, eye-witnesses, hearsay, physical evidence, pathways and more.
That we use ‘analogical’ terms in biology, however, does not mean that analogies play the role Sal hopes to give to them. Analogies and metaphors are ways to communicate ideas but, unlike ID, analogies are at most the start. Take the genetic code for instance: science has unraveled much of the likely history of the genetic code, showing for instance how the early code may have involved regularities (law-like behavior) between the code and the amino acids it coded.
To argue that CSI helps formalize the obvious distinction is just wishful thinking. CSI is a concept so intractable that I doubt it has ever been applied to any non trivial examples. In fact, as I have argued, CSI is a concept which excludes regularities a priori as being able to generate CSI and by assigning both the requirement of high probability P(T|H) for H to be a plausible hypothesis and the requirement that P(T|H) has to be extremely small to satisfy specification, ID is basically arguing E and not E at the same time…
Rather than make outlandish claims, Sal should focus on providing a more reasoned argument as to why analogies should be seen as relevant to the concept of design.
When biologists use the term ‘designed’ it has a clear meaning which should not be conflated with the way ID uses design.
IDers, by quickly embracing biologists using the term design as evidence for their own concept of design are avoiding the limited usage of the terminology of design in science.
In fact, by embracing how biologists use the term, ID has not only embraced the weak argument of analogy but also apparent versus actual design.
If ID’s goal is that Intelligent Design remains a possibility then they need not go through so many hoops, ID is always a logical possibility, certainly a priori. But a posteriori, the lack of much any evidence, hypotheses etc and the existence of plausible scientific hypotheses means that Occam effectively kills off ID’s vacuity.
Could Sal explain, when it comes to the genetic code, how replicational and specificational resources play a role, or even how the concept of CSI plays a role? I doubt that most any scientist uses an approach which even ID proponents seem to consider unusable, and which by definition requires good scientific hypotheses to have low CSI. Looking through this thread one sees how much confusion the mathematics of the new CSI has generated, even amongst ID proponents, and yet we see the same ID proponents claim that CSI is a real and relevant formalization of how science works.
My suggestion to Sal: Take any biological system and show that it contains CSI. Why not take the bacterial flagellum? ID claims it contains CSI; can you share with us your calculations of how this was achieved, where P(T|H) is calculated given the latest relevant hypotheses of the flagellum? Or is ‘formalization’ just another term for ‘incalculable’? Does this mean that ID, in addition to providing no hypotheses as to how to explain a ‘designed’ system, also relies on an impractical tool to even reach the claim of ‘design’?
Does that not make ID scientifically irrelevant or vacuous?
Comment by PvM — July 27, 2006 @ 1:12 pm
Leonid excellently observes
By ignoring any scientific hypotheses, ID basically returns to the creationist assertion that life is too improbable to have arisen by pure chance.
In the meantime, science has done quite some work in unraveling the history of the genetic code, its evolution via RNA world to protein world. Science has shown how, for RNA, the scale-free nature, which can arise via simple processes, makes sequence space well connected, in the sense that via neutral mutations most any common structure is connected to other common structures. Leonid, who is involved in exploring similar aspects in proteins, also seems quite familiar with the scientific status when it comes to proteins.
Evolvability indeed is a beautiful concept in sciences which shows how in fact evolution can evolve itself.
In the meantime, ID shows all the evidence of stasis when it comes to making pure chance claims against evolution (and in favor of intelligent design). But in that case intelligent design is far different from how this term is interpreted. Intelligent Design becomes in fact a quite limited argument, susceptible to knowledge, based on gaps, making ID scientifically irrelevant. Although, calculating pure chance probabilities may be fun in mathematics…
Comment by PvM — July 27, 2006 @ 1:26 pm
Sal writes:
Actually, Phi_S(T) is intended to be used exactly in the situation when one does not have a single prespecification handy, but some combinations of available elementary concepts apply. That is what Dembski’s flagellum example is all about, as I will show below. Again, I encourage you to ask him, if you don’t believe me.
Now consider tossing the coin another 1000 times, and recording the other seemingly random sequence, and labeling it passwordX20. Now, suppose someone presents you with what they claim to be an outcome of 1000 coin tosses, but you notice something odd: the first 500 coin tosses correspond exactly to the first 500 tosses in passwordX15, and the second 500 correspond to the first 500 tosses in passwordX20. From these two “pre-specifications” (elementary concepts), you would form a description: “first half passwordX15 plus first half passwordX20”. Supposing each word corresponds to an elementary concept, the total length of the description is 7. Assuming you have 10^5 elementary concepts in your knowledge base, Phi_S(T) is upper bounded by 10^(5*7) = 10^35.
Now, on to Dembski’s flagellum example. He writes:
Note that 10^20 = (10^5)^4 = the number of descriptions consisting of at most four level-one concepts. If, instead of “bidirectional rotary motor-driven propeller”, Dembski chose to describe the flagellum as “assemblage [of] proteins vaguely resembling something [an] engineer might conceive”, then, with each non-bracketed word corresponding to an elementary concept, the number of concepts used to describe the flagellum is 8, and Phi_S(T) becomes bounded by (10^5)^8 = 10^40. In other words, Phi_S(T) can be arbitrarily large - as large as the number of elementary concepts in the knowledge base raised to the power of the length of the shortest description of the event in question. Obviously, the higher the Phi_S(T), the lower Dembski’s specified complexity.
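The counting rule used in this comment can be sketched in a few lines of Python. This is only an illustration under the comment's own assumptions: each whitespace-separated word is one elementary concept, bracketed words are ignored, and the knowledge base holds 10^5 elementary concepts. The function names are hypothetical.

```python
import re

def concept_count(description: str) -> int:
    """Count elementary concepts: one per word, skipping words in [brackets]."""
    without_brackets = re.sub(r"\[[^\]]*\]", " ", description)
    return len(without_brackets.split())

def phi_s_upper_bound(description: str, vocabulary_size: int = 10**5) -> int:
    """Upper bound on Phi_S(T): vocabulary size raised to the description length."""
    return vocabulary_size ** concept_count(description)

descriptions = [
    "first half passwordX15 plus first half passwordX20",
    "bidirectional rotary motor-driven propeller",
    "assemblage [of] proteins vaguely resembling something [an] engineer might conceive",
]
for text in descriptions:
    n = concept_count(text)
    print(f"{n} concepts -> bound 10^{5 * n}: {text}")
```

The bound is exponential in the number of words, so a wordier description of the same event raises Phi_S(T) and lowers the resulting specified complexity.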
Sorry, but I still don’t think you did. The fact of the matter is, with an artificially bounded Phi_S(T), every single improbable event exhibits CSI. You are claiming we can’t ever compute Phi_S(T) unless the event matches some a-priori defined description: i.e. a “prespecification”. I think Dembski would strongly disagree: in fact, his flagellum example is precisely an attempt to illustrate how to determine Phi_S(T) when no pre-specification exists.
Comment by Leonid Meyerguz — July 27, 2006 @ 2:28 pm
I read it, and I responded.
I am not claiming that, either. Consider that you’ve encountered two 500-coin strings which have identical patterns, neither of which you’ve ever seen before. As far as you can tell, the pattern is Kolmogorov Complex.
We can consider them specified because they are identical, and a stochastic process would be an inappropriate description to account for the 100% correlation between the two patterns even though you’ve never seen them in your life. Phi_S(T) = 1 for practical purposes because identity is so fundamental to science that one does not need any special Phi_S(T) for cases involving absolute identity.
If on the other hand, one pattern was the one’s complement of the other, Phi_S(T) would be some number because one string merely inverts the pattern of the other string, and under this simple concept, the two can be correlated.
Phi_S(T) in that case would be 10^5 under Dembski, and 10^80 under my extreme supposition. I pointed out a reasonable penalty for each component in such a “transformation” is 10^80^N where N is the number of elementary components to define the transformation.
To see an approximation of why that is, consider that one has a language X with a dictionary of cardinality 10^5. Under a uniform distribution, each symbol conveys -log2( 1 / 10^5 ) bits. With two words one conveys -log2( 1/10^5^2 ), and for N words -log2( 1/10^5^N ). Thus the more words one invokes, the more POST-DICTIVE one’s projection of a specification is. Phi_S(T) factors out amounts of the post-dictive information that may be prejudicing our inquiry.
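The per-word cost described above can be worked out directly. The sketch below reads 10^5^N as (10^5)^N, which is an interpretation of the comment's notation, and assumes every word in the dictionary is equally likely. The function name is illustrative.

```python
import math

def bits_for_words(word_count: int, dictionary_size: int = 10**5) -> float:
    """Return -log2(1 / dictionary_size**word_count) under a uniform distribution."""
    return word_count * math.log2(dictionary_size)

for n in (1, 2, 5):
    print(f"{n} word(s): {bits_for_words(n):.2f} bits")
```

Each additional word in a description therefore costs the same fixed number of bits, which is the penalty that Phi_S(T) subtracts from the specified complexity.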
There is also one important constraint: the description must also map to a T with sufficiently small P(T|H). Thus a word like “whatever” or “anything” would disqualify consideration of the target, and CSI would not be inferred in those cases either. Highly specific patterns like Champernowne tend to be pre-specifications anyway, so that tends to drop the need for a Phi_S(T) in the first place. Thus the assumption of uniform distribution is not overturned because a word like Champernowne might carry more information with it to construct a specification. In fact that is a highly desirable quality!
The number of words is the most objective measure we have available. But because a dictionary is ultimately circular in defining words by other words, one can only treat one symbol (a word) as ultimately informative as another…. Thus the number of words in the dictionary is reasonable for calculating Phi_S(T), but if that seems too generous, I pointed out 10^80^N is available given we’re using every atom in the universe as a “word” in our “universal dictionary”.
Thus I have found counter examples to your hypothesis, “with an artificially bounded Phi_S(T), every single improbable event exhibits CSI”.
The fact of the matter is that when biologists start arguing for non-random relationships between genes and proteins, they are making unwitting statistical claims. CSI can be used to formalize the descriptive fidelity of engineering metaphors being projected onto biological systems.
And in regard to your claim, take the 500 randomly tossed coins:
The only description then is 500 elementary concepts, with about one concept per coin. (i.e, coin #1 heads, coin #2 tails, ….etc.)
Phi_S(T) = 10^80^500
This number is so large it will yield a CHI
CHI_context_independent =
-log2( 10^120 * 10^80^500 * 1/2^500 )
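The expression above can be evaluated numerically. This sketch reads 10^80^500 as (10^80)^500, in line with the "10^80^N" penalty used earlier in the comment; that reading, and the helper name, are assumptions made for illustration.

```python
import math

LOG2_OF_10 = math.log2(10)

def chi_bits(log10_resources: float, log10_phi: float, log2_probability: float) -> float:
    """Return -log2(resources * Phi_S(T) * P(T|H)), computed entirely in logarithms."""
    return -((log10_resources + log10_phi) * LOG2_OF_10 + log2_probability)

# 500 fair coins: P(T|H) = 1/2^500, Phi_S(T) = (10^80)^500, resources = 10^120.
chi = chi_bits(log10_resources=120, log10_phi=80 * 500, log2_probability=-500)
print(f"{chi:.1f} bits")
```

With these inputs the value comes out strongly negative, so a random 500-coin pattern described coin by coin would not register as CSI under this penalty.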
Comment by Salvador T. Cordova, IDEA GMU — July 27, 2006 @ 3:59 pm
If I may offer this thought, Dembski has a background in Cryptography.
When one thinks one has broken a code, one has to ask oneself, “is my supposed breaking of the code an artifact of my own prejudices, what is the appropriate Phi_S(T) which will indicate to me I have not post-dictively projected something into my code breaking scheme.”
I have a strong feeling Dembski’s Phi_S(T) originated from issues within cryptography. The less conceptually rich the code breaking scheme relative to the message length being studied, the more confidence one had that the decryption hypothesis is correct ( that the CSI detected was using appropriate specifications).
Comment by Salvador T. Cordova, IDEA GMU — July 27, 2006 @ 4:05 pm
… only insofar as your posts are some of the most interesting on the thread, but I have to rewrite them to figure out what you’re talking about :).
I couldn’t get it to work here… blogsome is rather limited. But will you all see what you can do at Specified Complexity? — details here.
Comment by Hannah — July 27, 2006 @ 4:34 pm
Salvador writes:
Yes, you did. But only in this last post. Now, we finally understand one another (almost).
The following stochastic process will account quite nicely for the sequence in question:
1) Start with an empty string
2) Add a random character (either “H” or “T”) to the end of the string, each with equal probability.
3) With probability 1/500 duplicate the string.
4) Go to step 2.
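The four steps above can be simulated as follows. The process as stated never terminates, so this sketch adds a stopping rule (halt once the string reaches a target length); that rule, the function name, and the seed parameter are assumptions added for illustration.

```python
import random
from typing import Optional

def duplicating_coin_process(target_length: int,
                             duplicate_probability: float = 1 / 500,
                             seed: Optional[int] = None) -> str:
    """Simulate the append-then-maybe-duplicate process until target_length is reached.

    The result may be longer than target_length if a duplication happens
    on the final iteration.
    """
    rng = random.Random(seed)
    sequence = ""                                # step 1: start with an empty string
    while len(sequence) < target_length:         # assumed stopping rule
        sequence += rng.choice("HT")             # step 2: append H or T with equal probability
        if rng.random() < duplicate_probability:
            sequence += sequence                 # step 3: duplicate the whole string
    return sequence                              # step 4 is the loop itself

sample = duplicating_coin_process(target_length=1000, seed=0)
print(len(sample), sample[:40])
```

Whenever a duplication fires, the string consists of two identical halves at that moment, which is how a non-uniform stochastic process can produce matching blocks far more often than independent uniform tosses would.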
Let us not pretend that the only stochastic process under the sun is uniform random sampling, shall we? And while you are welcome to bring up the displacement theorem again, you probably shouldn’t, as you seem averse to actually discussing it.
You are very much correct here. It appears you and I are making progress. Think about it this way: when you witness the first pattern T, you automatically add its description to your knowledge base. Now, you have an additional level-one concept - a pre-specification, if you will - in your knowledge base. Say, then you witness “one’s-complement of T”. You can now construct the shortest description based on the contents of your knowledge base; this description is “Not T”. If the number of “level one concepts”, or pre-specifications, is 10^80, then Phi_S(T) will be 10^160. Are we on the same page?
Exactly! This is essentially what I’ve been trying to tell you all through the second half of this thread. My main goal was to point you to the correct understanding of what Dembski meant by Phi_S(T): namely, that it is not a function of the cardinality of T, and that it is not a fixed constant. After all, if you’re going to champion Dembski’s work, you might as well know what he’s talking about.
At least, that’s what it is supposed to do. But it only winds up trading off the prejudice of a post-dictive description of an event for the prejudice of a subjective observer giving that description.
Except you are no longer artificially bounding Phi_S(T) by any fixed constant: you are allowing it to vary exponentially with the description length (as well you should). That’s what I’ve been trying to tell you all along!
Sure, just like when physicists start talking about a relationship between charge and the direction of electrical force, they are making unwitting statistical claims, or when I talk about my computer turning on when I flip the switch to the “on” position, I’m making an unwitting statistical claim. Unwitting statistical claims all around!
And there you go - see what happens when you don’t just artificially bound Phi_S(T) by a fixed constant value? Congratulations, Sal - you finally understand Dembski. Now, perhaps, we can start talking about the near-impossibility of applying CSI to real-world biology in any meaningful way.
Comment by Leonid Meyerguz — July 27, 2006 @ 6:02 pm
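A rough sketch of the counting behind the 10^80 and 10^160 figures in the comment above. It assumes (this is an illustration, not something spelled out in the thread) that a description is an ordered sequence of concepts drawn from the knowledge base, and that Phi_S(T) is bounded by the number of such sequences no longer than the shortest description of T. The function name `count_descriptions` is hypothetical.

```python
def count_descriptions(vocabulary_size: int, max_length: int) -> int:
    """Count ordered concept sequences of length 1..max_length.

    Assumption: every sequence of concepts counts as a candidate
    description, so this is an upper bound, not an exact Phi_S(T).
    """
    return sum(vocabulary_size ** k for k in range(1, max_length + 1))


# Knowledge base of 10^80 level-one concepts, as in the comment above.
concepts = 10 ** 80

# A one-concept description such as "T".
one_concept = count_descriptions(concepts, 1)

# A two-concept description such as "Not T".
two_concepts = count_descriptions(concepts, 2)

# Order of magnitude (number of decimal digits minus one).
print(len(str(one_concept)) - 1)
print(len(str(two_concepts)) - 1)
```

Under these assumptions the bound grows exponentially with description length, which is the point being argued: no single fixed constant covers descriptions of every length.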
Hannah:
My hat’s off to you, madam! Usually, no amount of re-writing can help me figure out what I’m talking about. :)
The website looks great - very impressive job! - but so far there is one thing I can’t - register. I’ve tried both my gmail and my cornell email addresses, but haven’t received a password on either one. Or should I have waited for the approval from you after submitting the first registration request?
Touche.
Comment by Leonid Meyerguz — July 27, 2006 @ 6:20 pm
Yet another attempt to try and connect “the math” with some sort of biology (actually, biochemistry). And seeing where things go.
Speaking for the moment in terms of polypeptides: As far as I can tell, the “specificity” that is being discussed here is roughly (perhaps exactly) equal to the ratio of polypeptides with some specified function (say, binding to streptavidin) to all possible polypeptides (nominally, of lengths up to some arbitrary limit). That seems to me to be a good way to relate the matters of compressibility and probabilities to some quantity that biochemists can relate to (and better yet, measure).
Given this (and if there are quibbles, please offer them up), we can pretty quickly realize that, in the general sense (where all of the presumed replicational resources of the universe are available), the specified complexity of proteins is vanishingly small. This follows from the observations, derived from random combinatorial experiments, that the above-mentioned ratio is, generously, going to be about 10^-20. (This value is not really dependent on the length of the polypeptide in question; I’ve pointed to a discussion that deals with the experimental data, but I can add here that the way to understand this is to apply the concept of compressibility to protein function. It may not be perfect, or formally correct, but in the context of this discussion it’s a good way to understand the approximate length-independence of protein functionality.)
My questions for the crew here:
Leonid, is this a fair way to relate specificity to something like polypeptide function?
ID supporters - given the experimental results I allude to, is there a way to “rescue” Dembski’s derivation, at least when it comes to the hope for actually finding specified complexity in biology?
Anyone else - does any of this make sense? Or am I making things more confusing?
Comment by Art G — July 27, 2006 @ 7:45 pm
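To put a number on the arithmetic in Art G's comment, here is a minimal sketch of the context-independent formula from the original post, chi = -log2(10^120 * Phi_S(T) * P(T|H)). The function name and the choice Phi_S(T) = 1 are assumptions made for illustration; 1 is the smallest value Phi_S(T) can take, so any realistic value would push the result further down. P(T|H) = 10^-20 is the functional-polypeptide ratio quoted in the comment.

```python
import math


def specified_complexity(phi_s: float, p_t_given_h: float,
                         log10_resources: float = 120.0) -> float:
    """chi = -log2(10^120 * Phi_S(T) * P(T|H)), evaluated in log space.

    Working with log10 values avoids overflow and underflow for the
    very large and very small quantities involved.
    """
    log10_product = (log10_resources
                     + math.log10(phi_s)
                     + math.log10(p_t_given_h))
    return -log10_product * math.log2(10)


# Assumed inputs: Phi_S(T) = 1 (illustrative lower bound),
# P(T|H) = 1e-20 (ratio quoted in the comment above).
chi = specified_complexity(phi_s=1.0, p_t_given_h=1e-20)
print(round(chi, 2))
```

With these inputs the product inside the logarithm is 10^100, far greater than 1, so chi comes out strongly negative (roughly -332 bits).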
Looks like a very large negative number. What meaning does such a number have I wonder?
Comment by PvM — July 27, 2006 @ 11:42 pm
Guess that it means much of nothing?
Comment by PvM — July 30, 2006 @ 1:16 am
It means there is no CSI, and thus it is a counter example to Leonid’s claim that all things can become CSI.
My post was cut off because of the “less-than-or-equal” sign. The rest of my conclusion was truncated where I pointed out that Leonid’s thesis was incorrect.
Comment by Salvador T. Cordova, IDEA GMU — July 30, 2006 @ 2:57 am
I have more comments at Specified Complexity
Comment by Salvador T. Cordova, IDEA GMU — July 30, 2006 @ 6:27 am
Sal: It means there is no CSI, and thus it is a counter example to Leonid’s claim that all things can become CSI
You are making a logical error here. Showing a counter example requires one to show that your example covers all relevant H processes.
Comment by PvM — July 30, 2006 @ 7:20 am
In addition, I do not believe that you accurately represent Leonid’s argument.
Comment by PvM — July 30, 2006 @ 7:24 am
As to 155: Sal had claimed that “Thus if one is determined to utilize a Phi_S(T), 10^80 is an outside number which bounds our specificational resources from a practical standpoint.” When Leonid pointed out what this would mean, Sal recalculated with Phi_S(T) = 10^80^500
Comment by PvM — July 30, 2006 @ 7:30 am
In fact, when Sal replaced 10^80 with the correct Phi_S(T), Leonid responded as follows:
Showing that one cannot artificially state that 10^80 is a good estimate for Phi_s(T).
One has to do the hard work of actually calculating the appropriate phi_s(T).
Comment by PvM — July 30, 2006 @ 7:35 am
PvM–
It would be better if you could respond there, not here.
Comment by Hannah — July 30, 2006 @ 7:36 am
Ok, I was merely responding to Sal’s claims made on this thread. But if you believe that they belong elsewhere, fine with me too.
Comment by PvM — July 30, 2006 @ 7:43 am