Here is the Forgery-Statistics department!
   

The following article refutes the conclusions of Vaughn and Pillers Dobler, published several years ago in two academic publications (see Note #1).

Vaughn, a friend of mine since 2002, sent me a courtesy copy of the original article in May 2004; at the time it was beyond my comprehension level.

My criticisms herein are purely academic, confined to a particular subject, and do not diminish my admiration of his scholarship in general.

You can post comments at the related entry on LMLK Blogspot or LMLK Wordpress.

Science writings by G.M. Grena:
   
 
   

Analysis of Unprovenanced Artifacts Was Too Flawed to Be True

Are forgeries similar to authentic specimens or different?  How similar?  How different?

by G.M. GRENA
4000 YEARS OF WRITING HISTORY
POB 484
REDONDO BEACH CA  90277-0484
USA

 


      Nearly a decade has elapsed since Andrew G. Vaughn and Carolyn Pillers Dobler (hereafter "VD") authored an analysis of Hebrew seal designs in an effort to detect the probability of forgeries among the corpus of known specimens.  The results of their study appeared in two separate scholarly publications,^1 and until now have not been challenged in any formal way.^2  Maybe that's because most people assume the analysis is reliable without bothering to examine it in detail.  Such was the case for me when I initially read their publications.

 

^1) They completed their research prior to 2002 according to footnote #2 in "The Probability of Forgeries: Reflections on a Statistical Analysis" at the SBL Forum 3.3 (2005). The original research appeared a year later in "A Provenance Study of Hebrew Seals and Seal Impressions: A Statistical Analysis" in "I Will Speak the Riddles of Ancient Times" vol. 2, Archaeological and Historical Studies in Honor of Amihai Mazar on the Occasion of His Sixtieth Birthday edited by Aren M. Maeir and Pierre de Miroschedji (Winona Lake, Indiana: Eisenbrauns, 2006).

^2) BAR vol. 31 #3 (May/June 2005) contains an anonymous (presumably editor, Hershel Shanks) satirical article, "Major New Forgeries Uncovered by Professor GBSTMT" on pp. 49-50.

      Their stated purpose was "to look objectively at various traits ... to see if they could find statistically significant differences between seals and seal impressions from known and unknown provenance."^3  Said another way, "if seals and seal impressions from known and unknown provenances were all authentic, one should find similar features in both groups."^4

 

^3) VD 2005, first paragraph of "A Desire to Scientifically Test the TBTBT [sic] Rule" section. TGTBT (misspelled in the article heading) = Too good to be true.

^4) VD 2006, p. 758.

      Their conclusion: "They were amazed that such striking differences were found.  They were actually shocked that they found as much evidence for the forging of Hebrew bullae as for the forging of Hebrew seals."^5

 

^5) VD 2005, second paragraph of "A Desire to Scientifically Test the TBTBT [sic] Rule" section.

      As a collector of unprovenanced antiquities (including a bulla presumably impressed by a servant of a famous person named in the Bible), as well as an independent researcher who has studied provenanced antiquities from scientific excavations (jar handles bearing seal impressions^6), this report perplexed me.

 

^6) VD excluded these artifacts from their study since they found that the overwhelming majority of unprovenanced specimens matched provenanced ones (VD 2006, pp. 758-9).


Probable Problems

   

      The first and most glaring problem is that they arbitrarily equate "differences" with "evidence for the forging" of the seals.  Do observed differences (or even similarities) among groups necessarily imply forgery?  VD provided no basis for such a conclusion, but assumed it without even attempting to define a threshold between similar and different, between genuine and fake.

      Three personal seals excavated at Arad in southern Israel provide the classic example of significant, striking differences among provenanced specimens.  Though the same name appears on each, on one the name is spelled differently (letters omitted, even though that seal has the largest writing area), another is shaped differently (a greater height/width ratio), the three are made of different gemstones (agate, carnelian, black paste), and all three bear radically different inscription-divider designs (straight lines, fan-shaped lines, a lotus-bud icon).^7  Had they lacked provenance, would VD's method have been useful in determining the probability of their authenticity?

 

^7) Arad Inscriptions (in Hebrew) by Yohanan Aharoni (Jerusalem: The Bialik Institute and the Israel Exploration Society, 1975), p. 121.

      VD do not believe all unprovenanced artifacts are fake.  In one place they say their findings merely "suggest that there may be bullas and seals of unknown provenance that are not authentic."^8a  But in another they imply that they believe the majority of bullae are fake:  "Surely at least a few of the bullae of unknown provenance are authentic..."^8b  In some respects, such semantics resemble the Good Cop, Bad Cop tactic, with VD 2006 giving unprovenanced artifacts the benefit of the doubt, while VD 2005 incites a lynch mob!

 

^8a) VD 2006 (p. 770).

^8b) VD 2005 (seventh paragraph of "A Desire to Scientifically Test the TBTBT [sic] Rule" section).


Statistical Expectations

   

      Let's step away from VD for a moment to list all possible extreme scenarios we should expect to find for unprovenanced artifacts:

      A) If a skillful forgery artist used published specimens as templates to produce fake ones, we would expect the traits of these two groups to be very similar, and statistically indistinguishable.

   

      B) If the artist had no templates or strayed artistically, then significant differences should be observed for most, if not all, traits.^9

      C) If the majority of unprovenanced specimens are genuine (found legally or illegally at ancient sites, then kept in private collections or sold at antiquities markets), we would expect them to be similar to provenanced ones, but not necessarily indistinguishable via statistics unless we knew in advance all traits of ancient Hebrew seals (VD overlooked this crucial fact).

      You can see that the only scenario that would definitely produce statistical variations is the second one, B, where most traits would show obvious differences.  If the groups are similar, as in scenarios A and C, or if there were a balanced combination of any of the scenarios, it would be impossible to prove which specimens are genuine and which are fake using statistics alone (the sketch following note 9 below illustrates the point).

 

^9) This sounds like what VD announced in the 2005 conclusion I quoted above, but their 2006 conclusion used less hyperbole, drifting ambiguously with subjective terms such as "less well known or popular."  By "significant", I specifically mean >50% when comparing statistical percentages.  The most significant variation found by VD for any trait studied was a mere 52% (4% of unprov. seals with icons had Egyptian ankhs compared to 56% of known-provs. per untitled table in VD 2006 p. 764).  The next-most-significant difference was only 17%, and the average difference between all comparisons was a trivial 5%!
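      Here is a minimal simulation sketch in Python of the three scenarios.  It does not use VD's data or reproduce their method; the trait names, frequencies, and sample sizes are all hypothetical, and the chi-square comparison of trait proportions (via scipy) is merely one common way of asking whether two groups are "similar" or "different."  The point is simply that only scenario B tends to register as different, while scenarios A and C are statistically indistinguishable from each other.

# A minimal, hypothetical sketch (not VD's data or method): only scenario B
# tends to register as "different" in a trait-by-trait comparison.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)

# Hypothetical frequencies of a few binary traits among provenanced items
true_props = {"perforation": 0.80, "lamed": 0.55, "divider": 0.40, "icon": 0.35}
n_prov, n_unprov = 120, 200  # hypothetical sample sizes

def sample(props, n):
    """Count how many of n specimens exhibit each trait."""
    return {t: int(rng.binomial(n, p)) for t, p in props.items()}

scenarios = {
    "A: forger copies published templates": true_props,
    "B: forger improvises freely": {t: min(0.95, p + 0.35) for t, p in true_props.items()},
    "C: mostly genuine but unprovenanced": true_props,
}

# Note: with several traits, an occasional false positive under A or C is expected.
prov = sample(true_props, n_prov)
for name, props in scenarios.items():
    unprov = sample(props, n_unprov)
    print(name)
    for t in true_props:
        table = [[prov[t], n_prov - prov[t]],
                 [unprov[t], n_unprov - unprov[t]]]
        chi2, p, dof, _ = chi2_contingency(table)
        print(f"  {t:12s} p = {p:.3f} ->", "different" if p < 0.05 else "similar")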


Survey Says...

   

      So let's return now to see what VD actually found. They initially chose the following six features to analyze (relationship found between provenanced and unprovenanced specimens in parentheses^10):

 

^10) See untitled table summarizing the actual statistics in VD 2006, p. 760.

      1) Seal perforation^11 (similar)

 

^11) A hole drilled through the side of the seal allowing it to be attached to a carrying device such as a string or ring.

      2a) BN or BT^12 on seals (similar)
      2b) BN or BT on bullae (different)

 

^12) A common noun in Hebrew names, translated "son [of]" or "daughter [of]" between two other names.

      3a) Lamed preposition^13 on seals (different)
      3b) Lamed preposition on bullae (different)

 

^13) An introductory device designating ownership of the seal, translated "[seal] of" followed by a name.

      4a) Title^14 on seals (similar)
      4b) Title on bullae (similar)

 

^14) "Servant" is a common example, as on the aforementioned one I own.

      5a) Divider^15 presence on seals (similar)
      5b) Divider presence on bullae (similar)

 

^15) An artistic device that separates the inscription into distinct regions.

      6a) Icon^16 presence on seals (similar)
      6b) Icon presence on bullae (different)

      They discovered that #1, #2a, #4a, #4b, #5a, #5b, and #6a indicated no significant differences, while #2b, #3a, #3b, and #6b did.  So at the outset it would seem that the majority of features (7/11, or 64%) suggest that the provenanced and unprovenanced groups are mostly the same, which, as I showed above in scenario C, makes it impossible to distinguish forgeries from authentic specimens based on these criteria (i.e., without knowing in advance which criteria a fake item should exhibit).  A simple sketch of this kind of trait-by-trait comparison follows note 16 below.

 

^16) A figure or artwork distinct from letters.
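      For readers unfamiliar with how each "similar"/"different" verdict is reached, here is a minimal sketch of the kind of two-group comparison involved.  The counts are hypothetical, not VD's published tallies; a chi-square test on a 2x2 table (trait present/absent for provenanced vs. unprovenanced specimens) is a standard way of making such a comparison.  Notice that the verdict by itself says nothing about forgery, which is precisely the problem raised above.

# A minimal sketch with hypothetical counts (not VD's published tallies).
from scipy.stats import chi2_contingency

# Rows: provenanced, unprovenanced; columns: trait present, trait absent.
# Example trait: the lamed preposition on bullae.
table = [[30, 70],     # hypothetical provenanced bullae
         [110, 90]]    # hypothetical unprovenanced bullae

chi2, p, dof, expected = chi2_contingency(table)
verdict = "different" if p < 0.05 else "similar"
print(f"chi-square = {chi2:.2f}, p = {p:.3f} -> the groups look {verdict}")
# A "different" verdict only means the proportions differ; it says nothing
# about which group (if either) contains forgeries.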


More Surveys Needed?

   

      Unsatisfied with their initial results, they sought additional differences, delving into minutiae by subdividing feature #5 into three additional categories, #6 into another, and finishing with two more based on size:

   

      7a) Divider-line quantity^17 on seals
      7b) Divider-line quantity on bullae

 

^17) Provenanced specimens have 1, 2, or 3 lines (if any) per untitled tables in VD 2006, pp. 762-3.  Sometimes the lines are straight, sometimes they are curved, and sometimes they contain additional decoration, but VD did not parse these features.

      8a) Divider types^18 on seals
      8b) Divider types on bullae

 

^18) Lines, nothing, or something else per untitled table in VD 2006, p. 763.

      9a) Divider icons^19 on seals
      9b) Divider icons on bullae

 

^19) Mostly a lotus-bud divider per untitled table in VD 2006, p. 764, which is basically a circle (sometimes dotted) flanked by 2 ovals representing leaves.  Alternatively, sometimes a "ladder" pattern is used.

      10a) Icon types^20 on seals
      10b) Icon types on bullae

 

^20) Untitled table in VD 2006, p. 764 lists 16 figures such as a griffin, beetle, bird, etc.

      11a) Mean size of seals^21
      11b) Mean size of bullae^22

 

^21) Untitled table in VD 2006, p. 766.

^22) Untitled table in VD 2006, p. 765.

      12a) Size ratio^23 of seals
      12b) Size ratio of bullae

      VD did not actually discuss #7a, #8a, #9a, and #10b, presumably since they were seeking differences, not similarities, to support their foregone conclusion.  The difference for #7b was admittedly weak, and there was no difference for #12a.  So even with these additional criteria, only half (6/12 or 50%) support the thesis that differences indicate forgeries.  But since their proposition is arbitrary, you can just as easily assume that about half these criteria indicate authenticity, right?

 

^23) Height vs. width per untitled table in VD 2006, p. 769.


2nd Standard Needed?

   

      Apparently that conclusion was unacceptable to VD, so they switched gears for #11a and #11b, introducing a different standard (where forgeries would have similarities per my scenario A).  As it turns out, the mean size for each parameter of provenanced artifacts varies more than the mean size of unprovenanced ones!  Stated differently, the unprovenanced ones are well within the range defined by provenanced specimens.
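      The check being described here is simple.  The following sketch uses made-up measurements for a single parameter (bulla width), not VD's published figures; it merely asks whether the unprovenanced mean falls inside the spread already exhibited by the provenanced material.

# Hypothetical measurements in millimeters; not VD's published figures.
prov_widths   = [10.8, 11.5, 12.2, 13.0, 13.9]   # provenanced bullae
unprov_widths = [11.9, 12.1, 12.4, 12.6]         # unprovenanced bullae

prov_mean   = sum(prov_widths) / len(prov_widths)
unprov_mean = sum(unprov_widths) / len(unprov_widths)
lo, hi = min(prov_widths), max(prov_widths)

print(f"provenanced mean {prov_mean:.1f} mm (range {lo}-{hi} mm)")
print(f"unprovenanced mean {unprov_mean:.1f} mm is "
      f"{'inside' if lo <= unprov_mean <= hi else 'outside'} the provenanced range")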

   

      If VD were truly objective researchers, they probably would have acknowledged at the outset that they needed a standard by which they could proceed.  But they did not quit; they simply introduced this contradictory assumption^24 that they would find similarities:  "one would expect forged seals and seal impressions to be very similar to authentic examples in any case."^25

 

^24) As is known from any elementary study of logic, by allowing a single contradiction into your proposition, you become illogical and can seemingly prove anything.

^25) VD 2006, p. 760.

      These arbitrary, contradictory, subjective statements (that differences imply forgery, and that forgeries are similar to authentic specimens, therefore there are forgeries) seem out-of-bounds for a scholarly publication.  A scientific analysis would compare specific attributes of known forgeries to genuine specimens, whereas VD go on a circular witch hunt:  they believed some were fake even though they knew their method could not distinguish fake ones from authentic ones:  "...this statistical study cannot state with certainty which items are authentic and which are not..."^26  Not only does it fail to state this with certainty; it does not state at all which items are authentic and which are not!

 

^26) VD 2005, final paragraph.

      For each of the 12 traits VD chose to examine, an objective expectation should have been given as to whether large or small statistical variations would indicate forgery or authenticity.  For example, why would anyone expect #11 (mean size) to be more similar among forgeries, and #12 (size ratio) to be more similar among authentic specimens?  Yet in each case VD absurdly state that their findings are "what one would expect."^27

 

^27) VD 2005, first paragraph of the "Size of the Seals" section. Similar verbiage appears in VD 2006 on pp. 766 and 768.

      On the one hand, VD imagine the forger(s) from my scenario A making "an imitation that closely fit the mean size of known bullas and seals."^28  And yet the forger(s) from my scenario B must be at work proving VD's main thesis that unprovenanced artifacts exhibit "significant" differences!

 

^28) VD 2006, p. 766.


"Significant" Speculations

   

      So which is it?  Did conservative artisans study the provenanced artifacts down to the millimeter and mass-produce minor variations for gullible buyers?  Or did liberal artisans over-improvise and make many "significant" variations to entice repeat buyers?  Or did genuine, ancient artisans make a wider variety of designs than excavations have revealed?

      Many of these bullae are fragmentary, and unprovenanced specimens may supplement provenanced ones in the future if they are published and available when the time comes.^29  Likewise, if an excavated bulla were to match an unprovenanced seal, it would provide useful information about the ancient culture.  Vilifying those who collect and publish these valuable artifacts serves no useful purpose, especially when using arbitrary criteria.

 

^29) See "Tracking Down Shebnayahu, Servant of the King: How an Antiquities Market Find Solved a 42-Year-Old Excavation Puzzle" by Robert Deutsch in BAR vol. 35 #3 (May/June 2009), pp. 45-9, 67.  In "Epilogue: Methodological Musings From The Field" at the SBL Forum 3.3 (2005), Andrew G. Vaughn and Christopher A. Rollston offered a list of six guidelines when considering unprovenanced antiquities.  Their #4 guideline makes this same essential point.

      Their stated purpose was "to look objectively" at these traits, but what VD reported was a useless tautology:  if a trait differs, it proves there are forgeries, and if a trait is similar, it proves there are forgeries.

   

Epilogue:  Extra Examples of Erroneous Expectations

   

      Forgeries definitely exist on the antiquities market, but VD offered no method of determining which ones are, which aren't, how many there are, or how many there aren't.  They gave us a common example of cart-before-the-horse reasoning.  I'm sure it happens in many fields of research.  I didn't recognize it in 2005 because I had not yet realized how flawed scientific publications could be.  After studying the Creation/Evolution debate in recent years, I am now better able to spot logical fallacies.

   

      Evolution^30 is the worst possible subject to apply a scientific expectation to, because according to the standard definition preached by its adherents, it's an unguided, purposeless process.  Some organisms remain biologically similar for hundreds of millions of years, while others branch off and change from fish to amphibian to reptile to mammal or bird, while some even change back from land-dwelling to sea-dwelling!  Yet many intelligent evolutionists cannot resist the sinful urge to justify their irrational belief system, and occasionally fall into a fallacious trap.

 

^30) When I use "evolution" here, I'm referring to "macroevolution", the disputed large-scale transitions over long periods of time that cannot be observed or tested (such as the transition from fishes to birds), not "microevolution", the undisputed small-scale transitions regularly observed and tested (also known as "speciation" or "adaptation" such as white-fur bears in snowy environments vs. brown-fur bears in wooded environments).

      On the Understanding Evolution site, we see this example by Dr. Anna Thanukos (the site's Principal Editor with a Master's degree in Integrative Biology and Ph.D. in Science Education):

      "[Dr. Jennifer McElwain] catalogued which terrestrial plant lineages went extinct during this extinction event and found that large-leafed lineages were particularly likely to go extinct — exactly what one would expect to find if global warming had contributed to the mass extinction!"

      First it says the researcher determined which plants "went extinct" (via the fossil record), then it equivocates, shifting to plants that were merely "likely to go extinct".  Did all large-leaf plants go extinct?  Did any small-leaf plants go extinct?  How large?  How small?  Relative to what?

      Also notice that the statement merely says "contributed to", not "caused", lessening the significance of the entire argument!  Was it a major contribution or only a minor one?  How major?  Relative to what?  How many minor contributors were there?  Could there have been a cumulative effect?

        The charts that accompany the article are fictitious cartoons reflecting the author's wishful imagination.  Why not show charts populated with real data and real leaves instead of 2 stereotypes?  Such an ambiguous presentation contributes nothing to science, and misleads its readers.  Exactly what one would not expect to find on a prominent educational institution's website!

      Over at The BioLogos Foundation site, an article in a series by two prominent Biology professors appears with ten fallacious expectations.^31  For brevity's sake, I'll highlight two specimens, one from the first paragraph, and one from the next-to-last paragraph.

 

^31) "Evidences for Evolution: TINMAN and the Development of a Heart" by Darrel Falk and David Kerk.  Both authors hold PhDs; Falk is the organization's president; Kerk is a professor emeritus.

      "[I]f evolution is true, we would expect that the embryonic development of complex structures ... would proceed through a process of 'adding new features on top of old.'"

      At the beginning of the hypothetical process there would be no "old" features.  As I said, the standard theory of Evolution is that it is a random process; sometimes features are added, sometimes they're eliminated.  If new features were always added, then no plant or animal would have any unique feature except the latest one.  Said another way, nothing in the fossil record would have a unique feature; every feature would always have been passed on to its descendants.  But this is obviously not true, so claiming that the theory is demonstrated whether a feature is added or lost is a useless tautology.

      "If evolution is true, one would expect that frequently the same switch would be used down through the eons of time."

      Oh, so if Evolution is true, the switch is sometimes used ("frequently" implies that sometimes it is, sometimes it isn't), and on the other occasions, when it is not used, Evolution is still true.  Can you think of an occasion when this magical switch would falsify Evolution?

      This final example comes from an "Introduction to Evolution" lecture by Stephen T. Abedon, a Microbiology PhD with an impressive CV (there are several examples in his text, but I'm using the first one in para. 2a to illustrate my point):

      "Extant organisms look and behave in a manner which is exactly what one would expect given an evolutionary genesis."

      Would we really expect humans not to be able to reproduce with apes if they both had the same apelike ancestor?  Would we really expect a random process to prevent not just the mating of fish with reptiles with mammals with birds, but the simultaneous development of males and females within each of the numerous species?  Here I'm not arguing that it did not happen by this proposed process (i.e., Evolution, in all its random^32 glory), I'm merely asking if what we observe today is "exactly what one would expect"?

 

^32) In para. 1f, Dr. Abedon humorously defines it as having "a particularly big dollop of chance."

      Or is the expected evidence for Evolution simply too good to be true?

   
   
   

This page was created on June 12, 2011, & last updated on June 12, 2011.