Monday, May 27, 2024

Law in Failure: We Do Not Know Which Way the Jury Will Go

The claim is that the jury will review the facts, the defenses, and the law. The jury will weigh the burden of proof and whether it has been met. Those processes should make all verdicts absolutely predictable. The unpredictability of verdicts means that none of those processes is taking place. What is taking place is the application of the feelings of people from the street. They know nothing about the questions of the trial. If they like the defendant, the verdict will be not guilty.


Only about half of murders are prosecuted, roughly 12,000 a year. Verdicts decide very high-stakes outcomes. There is an average of about 35 exonerations per year in murder cases. The exoneration rate for murder cases is influenced by various factors, including the type of evidence used, the quality of legal representation, and the prevalence of official misconduct. For instance, a significant proportion of exonerations (55%) involved Black individuals, which may reflect systemic issues within the criminal justice system.

Because of the stakes in criminal cases, each process requires validation. None of the validation steps has ever been carried out for the Rules of Criminal Procedure. Studies of exoneration, showing an error rate of 1%, underestimate the rate of false guilty verdicts. Many innocent people accept a plea deal to avoid the risks and expenses of going to trial. The rate of false guilty verdicts is therefore far higher than 1%.

Reliability is required before validation. Reliability means repeatability, and repeatability is the essence of scientific conclusions.

Reliability Statistics:

  • Inter-rater reliability assesses the degree of agreement between two or more raters in their appraisals. For example, a person gets a stomach ache and different doctors all give the same diagnosis.[5]: 71 
  • Test-retest reliability assesses the degree to which test scores are consistent from one test administration to the next. Measurements are gathered from a single rater who uses the same methods or instruments and the same testing conditions.[4] This includes intra-rater reliability.
  • Inter-method reliability assesses the degree to which test scores are consistent when there is a variation in the methods or instruments used. This allows inter-rater reliability to be ruled out. When dealing with forms, it may be termed parallel-forms reliability.[6]
  • Internal consistency reliability assesses the consistency of results across items within a test.[6]
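To make the first of these concrete: inter-rater agreement is usually quantified with a statistic such as Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below uses made-up diagnoses from two hypothetical doctors; the data and names are illustrative, not from any real study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters beyond chance."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of cases where the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c]
                   for c in set(rater_a) | set(rater_b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical example: two doctors diagnosing the same ten patients.
doc1 = ["ulcer", "ulcer", "virus", "ulcer", "virus",
        "virus", "ulcer", "virus", "ulcer", "virus"]
doc2 = ["ulcer", "ulcer", "virus", "virus", "virus",
        "virus", "ulcer", "virus", "ulcer", "virus"]
print(round(cohens_kappa(doc1, doc2), 3))  # prints 0.8
```

A kappa near 1 means the raters agree far beyond chance; a kappa near 0 means their agreement is no better than guessing, which is the situation the post argues holds for juries.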


Once a high rate of reliability, say over 80%, is established for each of the above types, validity must be proven.

Validity (statistics)

From Wikipedia, the free encyclopedia

Validity is the main extent to which a concept, conclusion, or measurement is well-founded and likely corresponds accurately to the real world.[1][2] The word "valid" is derived from the Latin validus, meaning strong. The validity of a measurement tool (for example, a test in education) is the degree to which the tool measures what it claims to measure.[3] Validity is based on the strength of a collection of different types of evidence (e.g. face validity, construct validity, etc.) described in greater detail below.

In psychometrics, validity has a particular application known as test validity: "the degree to which evidence and theory support the interpretations of test scores" ("as entailed by proposed uses of tests").[4]

It is generally accepted that the concept of scientific validity addresses the nature of reality in terms of statistical measures and as such is an epistemological and philosophical issue as well as a question of measurement. The use of the term in logic is narrower, relating to the relationship between the premises and conclusion of an argument. In logic, validity refers to the property of an argument whereby if the premises are true then the truth of the conclusion follows by necessity. The conclusion of an argument is true if the argument is sound, which is to say if the argument is valid and its premises are true. By contrast, "scientific or statistical validity" is not a deductive claim that is necessarily truth preserving, but is an inductive claim that remains true or false in an undecided manner. This is why "scientific or statistical validity" is a claim that is qualified as being either strong or weak in its nature, it is never necessary nor certainly true. This has the effect of making claims of "scientific or statistical validity" open to interpretation as to what, in fact, the facts of the matter mean.

Validity is important because it can help determine what types of tests to use, and help to ensure researchers are using methods that are not only ethical and cost-effective, but also those that truly measure the ideas or constructs in question.

Test validity

Validity (accuracy)

Validity[5] of an assessment is the degree to which it measures what it is supposed to measure. This is not the same as reliability, which is the extent to which a measurement gives results that are very consistent. Within validity, the measurement does not always have to be similar, as it does in reliability. However, just because a measure is reliable, it is not necessarily valid. E.g. a scale that is 5 pounds off is reliable but not valid. A test cannot be valid unless it is reliable. Validity is also dependent on the measurement measuring what it was designed to measure, and not something else instead.[6] Validity (similar to reliability) is a relative concept; validity is not an all-or-nothing idea. There are many different types of validity.
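The scale example above can be shown numerically: a consistently miscalibrated scale has a small spread across readings (reliable) but a large bias from the true value (not valid). The readings below are invented for illustration.

```python
import statistics

# Hypothetical readings from a scale that is consistently 5 pounds heavy,
# weighing an object whose true weight is 150 pounds.
true_weight = 150.0
readings = [155.1, 154.9, 155.0, 155.2, 154.8]

spread = statistics.stdev(readings)             # small spread => reliable
bias = statistics.mean(readings) - true_weight  # large bias => not valid

print(f"spread={spread:.2f} lb, bias={bias:.2f} lb")
```

The spread is a fraction of a pound while the bias is five pounds: the instrument repeats itself faithfully, yet every repetition is wrong.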

Construct validity

Construct validity refers to the extent to which operationalizations of a construct (e.g., practical tests developed from a theory) measure a construct as defined by a theory. It subsumes all other types of validity. For example, the extent to which a test measures intelligence is a question of construct validity. A measure of intelligence presumes, among other things, that the measure is associated with things it should be associated with (convergent validity), not associated with things it should not be associated with (discriminant validity).[7]

Construct validity evidence involves the empirical and theoretical support for the interpretation of the construct. Such lines of evidence include statistical analyses of the internal structure of the test including the relationships between responses to different test items. They also include relationships between the test and measures of other constructs. As currently understood, construct validity is not distinct from the support for the substantive theory of the construct that the test is designed to measure. As such, experiments designed to reveal aspects of the causal role of the construct also contribute to constructing validity evidence.[7]

Content validity

Content validity is a non-statistical type of validity that involves "the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured" (Anastasi & Urbina, 1997 p. 114). For example, does an IQ questionnaire have items covering all areas of intelligence discussed in the scientific literature?

Content validity evidence involves the degree to which the content of the test matches a content domain associated with the construct. For example, a test of the ability to add two numbers should include a range of combinations of digits. A test with only one-digit numbers, or only even numbers, would not have good coverage of the content domain. Content-related evidence typically involves a subject matter expert (SME) evaluating test items against the test specifications. Experts should pay attention to any cultural differences. For example, when a driving assessment questionnaire is adopted from England (e.g., the DBQ), the experts should take into account right-hand driving in Britain. Some studies have found this to be critical for obtaining a valid questionnaire.[8] Before the final administration of a questionnaire, the researcher should check the validity of items against each of the constructs or variables and modify the measurement instrument accordingly, on the basis of the SMEs' opinions.
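The addition-test example can be sketched as a simple item generator: instead of sampling only one-digit or only even numbers, items are drawn across every combination of operand lengths so the whole content domain is covered. This is an illustrative sketch, not a method from the Wikipedia article.

```python
import random

random.seed(0)  # reproducible items for this illustration

def make_items(n_per_cell=2):
    """Generate addition items covering all operand-length combinations."""
    items = []
    for len_a in (1, 2, 3):        # 1-, 2-, and 3-digit first operands
        for len_b in (1, 2, 3):    # same for second operands
            for _ in range(n_per_cell):
                a = random.randint(10 ** (len_a - 1), 10 ** len_a - 1)
                b = random.randint(10 ** (len_b - 1), 10 ** len_b - 1)
                items.append((a, b, a + b))
    return items

items = make_items()
print(len(items))  # 9 cells x 2 items each = 18
```

Each of the nine length combinations contributes items, which is the "representative sample of the behavior domain" that a test with only one-digit sums would lack.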

A test has content validity built into it by careful selection of which items to include (Anastasi & Urbina, 1997). Items are chosen so that they comply with the test specification which is drawn up through a thorough examination of the subject domain. Foxcroft, Paterson, le Roux & Herbst (2004, p. 49)[9] note that by using a panel of experts to review the test specifications and the selection of items the content validity of a test can be improved. The experts will be able to review the items and comment on whether the items cover a representative sample of the behavior domain.

Face validity

Face validity is an estimate of whether a test appears to measure a certain criterion; it does not guarantee that the test actually measures phenomena in that domain. Measures may have high validity, but when the test does not appear to be measuring what it is, it has low face validity. Indeed, when a test is subject to faking (malingering), low face validity might make the test more valid. Considering one may get more honest answers with lower face validity, it is sometimes important to make it appear as though there is low face validity whilst administering the measures.

Face validity is very closely related to content validity. While content validity depends on a theoretical basis for assuming whether a test is assessing all domains of a certain criterion (e.g., does assessing addition skills yield a good measure of mathematical skills? To answer this, you have to know what different kinds of arithmetic skills mathematical skills include), face validity relates to whether a test appears to be a good measure or not. This judgment is made on the "face" of the test, so it can also be judged by the amateur.

Face validity is a starting point, but should never be assumed to be probably valid for any given purpose, as the "experts" have been wrong before. The Malleus Maleficarum (Hammer of Witches) had no support for its conclusions other than the self-imagined competence of two "experts" in "witchcraft detection", yet it was used as a "test" to condemn and burn at the stake tens of thousands of men and women as "witches".[10]

Criterion validity

Criterion validity evidence involves the correlation between the test and a criterion variable (or variables) taken as representative of the construct. In other words, it compares the test with other measures or outcomes (the criteria) already held to be valid. For example, employee selection tests are often validated against measures of job performance (the criterion), and IQ tests are often validated against measures of academic performance (the criterion).

If the test data and criterion data are collected at the same time, this is referred to as concurrent validity evidence. If the test data are collected first in order to predict criterion data collected at a later point in time, then this is referred to as predictive validity evidence.

Concurrent validity

Concurrent validity refers to the degree to which the operationalization correlates with other measures of the same construct that are measured at the same time. When the measure is compared to another measure of the same type, they will be related (or correlated). Returning to the selection test example, this would mean that the tests are administered to current employees and then correlated with their scores on performance reviews.

Predictive validity

Predictive validity refers to the degree to which the operationalization can predict (or correlate with) other measures of the same construct that are measured at some time in the future. Again, with the selection test example, this would mean that the tests are administered to applicants, all applicants are hired, their performance is reviewed at a later time, and then their scores on the two measures are correlated.

This is also when measurement predicts a relationship between what is measured and something else; predicting whether or not the other thing will happen in the future. High correlation between ex-ante predicted and ex-post actual outcomes is the strongest proof of validity.
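The correlation step in the selection-test example can be sketched directly: test scores gathered at hiring time are correlated with performance ratings gathered later. The Pearson correlation is the standard statistic here; the scores below are invented for illustration.

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: selection-test scores for six applicants at hiring
# time, and their performance ratings collected a year later.
test_scores = [62, 70, 75, 81, 88, 93]
performance = [3.1, 3.4, 3.2, 3.9, 4.2, 4.5]
print(round(pearson_r(test_scores, performance), 3))
```

A correlation near 1 between the ex-ante scores and the ex-post ratings is the predictive validity evidence described above; a correlation near 0 would mean the test predicts nothing.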

***********************************
If one wants a rating scale to be used in a class or in a personality test, one must go through these steps. If one wants to execute a guy, none of those steps are required, just the feelings of the jury. The lawyer profession is not just extremely stupid, uninformed, atavistic, and mired in 13th Century practices; it is quackery.

************************************

Juries: The advantage of a jury trial is the possibility of getting the Wisdom of the Crowd. The problems of the jury are fewer than those of the judge, an individual. The judge is a lawyer. He has the interests of the profession. His job depends on promoting litigation. The jury is rarely a real sampling of peers. If it were, there would be no excuses, no exclusions, no voir dire (selection by the lawyers). Only random sampling of the population would be permitted. The jury comes from a local culture. The jury is irritated by the interruption of their lives. They are irritated by the low pay. Jurors should be paid their usual day rate if they are going to lose their earnings during the trial. If there is a knowledgeable or passionate member, he may bully the rest. The rest just want to go home, and will give in to a member with certainty and a dominant personality.

************************************

Alternative: It is fair to say the current trial practices of judge and jury verdicts are 13th Century garbage practices. The adversarial system is copied from the disputation method of Scholasticism. That was cool in 1275 AD. Today, it is a ridiculous way for lawyers to double their income and add no value whatsoever. Scholasticism is a Catholic Church philosophy. The entire court trial violates the Establishment Clause. The court looks like a church. It has an altar on which the judge sits. It has a judge wearing clerical robes. It makes the congregants sit and stand several times, as in a Catholic church service. The modern trial is atavistic religious garbage with bad reliability and validity statistics. It not only violates the Establishment Clause by its origins in Catholic Church practices; it also violates the Procedural Due Process Clauses of the Fifth and Fourteenth Amendments, one of whose rights is the right to a fair hearing.

The remedy is to make judging a separate profession from lawyering. Start judge schools. They would admit mature, middle-aged people. Anyone who passed 1L would be disqualified, because the intellect and the ethics of that person have been destroyed. The main message of the judge education would be to apply the law, not to make the law. The judge would be allowed to investigate the question of the trial, being the smartest and most experienced person in the courtroom. Now, judges are totally muted, and will be removed if they investigate on their own. Judges should be liable for wrong decisions if they cause damages. They should carry insurance to cover their mistakes. If a judge makes many mistakes, his insurance will be cancelled, and he should not be on the bench.

