The impossible (pipe) dream—single-payer health reform


Led by presidential candidate Bernie Sanders, one-time supporters of ‘single-payer’ health reform are rekindling their romance with a health reform idea that was, is, and will remain a dream. Single-payer health reform is a dream because, as the old joke goes, ‘you can’t get there from here.’

Let’s be clear: opposing a proposal only because one believes it cannot be passed is usually a dodge. One should judge the merits. Strong leaders prove their skill by persuading people to embrace their visions. But single-payer is different. It is radical in a way that no legislation has ever been in the United States.

Not so, you may be thinking. Remember such transformative laws as the Social Security Act, Medicare, the Homestead Act, and the Interstate Highway Act. And, yes, remember the Affordable Care Act. Those and many other inspired legislative acts seemed revolutionary enough at the time. But none really was. None overturned entrenched and valued contractual and legislative arrangements. None reshuffled trillions—or in less inflated days, billions—of dollars devoted to the same general purpose as the new legislation. All either extended services previously available to only a few, or created wholly new arrangements.

To understand the difference between those past achievements and the idea of replacing current health insurance arrangements with a single-payer system, compare the Affordable Care Act with Sanders’ single-payer proposal.

Criticized by some for alleged radicalism, the ACA is actually stunningly incremental. Most of the ACA’s expanded coverage comes through extension of Medicaid, an existing public program that serves more than 60 million people. The rest comes through purchase of private insurance in “exchanges,” which embody the conservative ideal of a market that promotes competition among private vendors, or through regulations that extended the ability of adult offspring to remain covered under parental plans. The ACA minimally altered insurance coverage for the 170 million people covered through employment-based health insurance. The ACA added a few small benefits to Medicare but left it otherwise untouched. It left unaltered the tax breaks that support group insurance coverage for most working-age Americans and their families. It also left alone the military health programs serving 14 million people. Private nonprofit and for-profit hospitals, other vendors, and privately employed professionals continue to deliver most care.

In contrast, Senator Sanders’ plan, like the earlier proposal sponsored by Representative John Conyers (D-Michigan), which Sanders co-sponsored, would scrap all of those arrangements. Instead, people would simply go to the medical care provider of their choice, and bills would be paid from a national trust fund. That sounds simple and attractive, but it raises vexatious questions.

  • How much would it cost the federal government? Where would the money to cover the costs come from?
  • What would happen to the $700 billion that employers now spend on health insurance?
  • Where would the $600 billion a year in reductions in total health spending that Sanders says his plan would generate come from?
  • What would happen to special facilities for veterans and families of members of the armed services?

Sanders has answers for some of these questions, but not for others. Both the answers and non-answers show why single payer is unlike past major social legislation.

The answer to the question of how much single payer would cost the federal government is simple: $4.1 trillion a year, or $1.4 trillion more than the federal government now spends on programs that the Sanders plan would replace. The money would come from new taxes. Half the added revenue would come from doubling the payroll tax that employers now pay for Social Security. This tax approximates what employers now collectively spend on health insurance for their employees, but only collectively: many employers don’t provide health insurance at all. Some employers would face large tax increases. Others would reap windfall gains.

The cost question is particularly knotty, as Sanders assumes a 20 percent cut in spending averaged over ten years, even as roughly 30 million currently uninsured people would gain coverage. Those savings, even if actually realized, would start slowly, which means cuts of 30 percent or more by Year 10. Where would they come from? Savings from reduced red tape associated with individual insurance would cover a small fraction of this target. The major source would have to be fewer services or reduced prices. Who would determine which of the services that physicians regard as desirable, and that patients have come to expect, are no longer ‘needed’? How would those cuts be achieved without the massive hospital bankruptcies that, as columnist Ezra Klein has suggested, would follow such spending cuts? What would be the reaction to the prospect of drastic cuts in the salaries of health care personnel? Would we have a shortage of doctors and nurses? Would patients tolerate a reduction in services? If people thought that services under the Sanders plan were inadequate, would they be allowed to ‘top up’ with private insurance? If so, what happens to simplicity? If not, why not?
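Why a 20 percent average saving implies cuts of 30 percent or more by the final year is simple arithmetic. The short Python sketch below assumes, purely for illustration, that savings start near zero and rise in a straight line each year; `linear_ramp` is a hypothetical helper, and the plan’s actual phase-in could have a different shape.

```python
# Hypothetical illustration: assumes savings grow in a straight line,
# s_t = slope * t for years t = 1..10. The shape of the phase-in is an
# assumption for this sketch, not a schedule taken from the Sanders plan.

def linear_ramp(average_cut, years=10):
    """Return yearly cuts slope*t (t = 1..years) whose mean equals average_cut."""
    mean_t = (years + 1) / 2        # the mean of 1, 2, ..., years
    slope = average_cut / mean_t    # slope that yields the target average
    return [slope * t for t in range(1, years + 1)]


cuts = linear_ramp(0.20)
print(f"Year 1 cut:  {cuts[0]:.1%}")   # about 3.6%
print(f"Year 10 cut: {cuts[-1]:.1%}")  # about 36.4%
print(f"Average:     {sum(cuts) / len(cuts):.1%}")  # 20.0%
```

Under this assumed ramp, the Year 10 cut comes to roughly 36 percent. Even a straight-line ramp that already saves 10 percent in Year 1 must reach 30 percent in Year 10 to average 20 percent; any slower start pushes the final-year cut higher.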

Let me be clear: we know that high quality health care can be delivered at much lower cost than is the U.S. norm. We know because other countries do it. In fact, some of them have plans not unlike the one Senator Sanders is proposing. We know that single-payer mechanisms work in some countries. But those systems evolved over decades, based on gradual and incremental change from what existed before. That is the way that public policy is made in democracies. Radical change may occur after a catastrophic economic collapse or a major war. But in normal times, democracies do not tolerate radical discontinuity. If you doubt me, consider the tumult precipitated by the really quite conservative Affordable Care Act.


Editor's note: This piece originally appeared in Newsweek.

Publication: Newsweek

2016: The most important election since 1932


The 2016 presidential election confronts the U.S. electorate with political choices more fundamental than any since 1964 and possibly since 1932. That statement may strike some as hyperbolic, but the policy differences between the two major parties and the positions of candidates vying for their presidential nominations support this claim.

A victorious Republican candidate would take office backed by a Republican-controlled Congress, possibly with heightened majorities and with the means to deliver on campaign promises. On the other hand, the coattails of a successful Democratic candidate might bring more Democrats to Congress, but that president would almost certainly have to work with a Republican House and, quite possibly, a still-Republican Senate. The political wars would continue, but even a president engaged in continuous political trench warfare has the power to get a lot done.

Candidates always promise more than they can deliver and often deliver different policies from those they have promised. Every recent president has been buffeted by external events unanticipated when he took office. But this year, more than at any time in at least half a century, the two parties offer a choice, not an echo. Here is a partial and selective list of key issues to illustrate what is at stake.

Health care 

The Affordable Care Act, known as Obamacare or the ACA, passed both houses of Congress with not a single Republican vote. The five years since enactment of the ACA have not dampened Republican opposition.

The persistence and strength of opposition to the ACA are quite unlike post-enactment reactions to the Social Security Act of 1935 or the 1965 amendments that created Medicare. Both earlier programs were hotly debated and controversial. But a majority of both parties voted for the Social Security Act. A majority of House Republicans and a sizeable minority of Senate Republicans supported Medicare. In both cases, opponents not only became reconciled to the new laws but eventually participated in improving and extending them. Republican members of Congress overwhelmingly supported, and a Republican president endorsed, adding Disability Insurance to the Social Security Act. In 2003, a Republican president proposed and fought for the addition of a drug benefit to Medicare.

The current situation bears no resemblance to those earlier cases. Five years after enactment of Obamacare, every major candidate for the Republican presidential nomination has called for its repeal and replacement. So have the Republican Speaker of the House of Representatives and the Senate Majority Leader.

Just what 'repeal and replace' might look like under a GOP president remains unclear, as ACA critics have not agreed on an alternative. Some plans would do away with some of the elements of Obamacare and scale back others. Some proposals would repeal the mandate that people carry insurance, the bar on 'medical underwriting' (a once-routine practice under which insurers vary premiums based on expected use of medical care), or the requirement that insurers sell plans to all potential customers. Other proposals would retain tax credits to help make insurance affordable but reduce their size, or would end rules specifying what 'adequate' insurance plans must cover.

Repeal is hard to imagine if a Democrat wins the presidency in 2016. Even if repeal legislation could overcome a Senate filibuster, a Democratic president would likely veto it and an override would be improbable. 

But compromise through horse-trading, once routine, might become possible again. A Democratic president might agree to Republican-sponsored changes to the ACA, such as dropping the requirement that employers of 50 or more workers offer insurance to their employees, if Republicans agreed to changes in the ACA that supporters seek, such as the extension of tax credits to families now barred from them because one member has access to very costly employer-sponsored insurance.

In sum, the 2016 election will determine the future of the most far-reaching social insurance legislation in half a century.

Social Security

Social Security faces a projected long-term gap between what it takes in and what it is scheduled to pay out. Every major Republican candidate has called for cutting benefits below those promised under current law. None has suggested any increase in payroll tax rates. Each Democratic candidate has proposed raising both revenues and benefits. Within those broad outlines, the specific proposals differ.

Most Republican candidates would cut benefits across the board or selectively for high earners. For example, Senator Ted Cruz proposes to link benefits to prices rather than wages, a switch that would reduce Social Security benefits relative to current law by steadily larger amounts: an estimated 29 percent by 2065 and 46 percent by 2090. He would allow younger workers to shift payroll taxes to private accounts. Donald Trump has proposed no cuts in Social Security because, he says, proposing cuts is inconsistent with winning elections and because meeting current statutory commitments is 'honoring a deal.' Trump also favors letting people invest part of their payroll taxes in private securities. He has not explained how he would make up the funding gap that would result if current benefits are honored but revenues to support them are reduced. Senator Marco Rubio has endorsed general benefit cuts, but he has also proposed to increase the minimum benefit. Three Republican candidates have proposed ending payroll taxes for older workers, a step that would add to the projected funding gap.

Democratic candidates, in contrast, would raise benefits, either across the board or for selected groups such as caregivers or survivors. They would switch the price index used to adjust benefits for inflation to one that is tailored to the consumption patterns of the elderly and that analysts believe would raise benefits more rapidly than the index now in use. All would raise the ceiling on earnings subject to the payroll tax. Two would broaden the payroll tax base.

As these examples indicate, the two parties have quite different visions for Social Security. Major changes, such as those envisioned by some Republican candidates, are not easily realized, however. Before he became president, Ronald Reagan called in numerous speeches for restructuring Social Security. Those statements did not stop him from signing a 1983 law that restored financial balance, with few structural changes, to the very program against which he had inveighed. George W. Bush sought to partially privatize Social Security, to no avail. Now, however, Social Security faces a funding gap that must eventually be filled. The discipline of Trust Fund financing means that tax increases, benefit cuts, or some combination of the two are inescapable. Action may be delayed beyond the next presidency, as current projections indicate that the Social Security Trust Fund and current revenues can sustain scheduled benefits until the mid-2030s. But that is not what the candidates propose. Voters face a clear and stark choice between a Democratic president who would try to maintain or raise benefits and would increase payroll taxes to pay for them, and a Republican president who would seek to cut benefits, oppose tax increases, and might well try to partially privatize Social Security.

The Environment

On no other issue is the split between the two parties wider or the stakes in their disagreement higher than on measures to deal with global warming. Leading Republican candidates have denied that global warming is occurring (Trump), scorned evidence supporting the existence of global warming as bogus (Cruz), acknowledged that global warming is occurring but not because of human actions (Rubio, Carson), or admitted that it is occurring but dismissed it as not a pressing issue (Fiorina, Christie). Congressional Republicans oppose current Administration initiatives under the Clean Air Act to curb emission of greenhouse gases.

Democratic candidates uniformly agree that global warming is occurring and that it results from human activities. They support measures to lower greenhouse gas emissions by amounts similar to those that the Paris accords of December 2015 embraced as essential to curb the speed and ultimate extent of global warming.

Climate scientists and economists are nearly unanimous that unabated emissions of greenhouse gases pose serious risks of devastating and destabilizing outcomes: that climbing average temperatures could render some parts of the world uninhabitable, that rising sea levels will inundate coastal regions inhabited by tens of millions of people, and that storms, droughts, and other climatic events will be more frequent and more destructive. Immediate actions to curb emission of greenhouse gases can reduce these effects. But no actions can entirely avoid them, and delay is costly. Environmental economists also agree, with little partisan division, that the way to proceed is to harness market forces to reduce greenhouse gas emissions.

The division between the parties on global warming is not new. In 2009, the House of Representatives narrowly passed the American Clean Energy and Security Act. That bill would have capped and gradually lowered greenhouse gas emissions. Two hundred eleven Democrats but only eight Republicans voted for it. The Senate took no action, and the proposal died.

Now Republicans are opposing the Obama administration’s Clean Power Plan, a set of regulations under the Clean Air Act to lower emissions by power plants, which account for 40 percent of the carbon dioxide released into the atmosphere. The Clean Power Plan is a stop-gap measure. It applies only to power plants, not to other sources of emissions, and it is not nationally uniform. These shortcomings reflect the legislative authority on which the plan is based, the Clean Air Act. That law was designed to curb the local problem of air pollution, not the global damage from greenhouse gases. Environmental economists of both parties recognize that a tax or a cap on greenhouse gas emissions would be more effective and less costly than the current regulations, but superior alternatives are now politically unreachable.

Based on their statements, any of the current leading Republican candidates would back away from the recently negotiated Paris climate agreement, scuttle the Clean Power Plan, and resist any tax on greenhouse gas emissions. Any of the Democratic candidates would adhere to the Clean Power Plan and support the Paris climate agreement. One Democratic candidate has embraced a carbon tax. None has called for the extension of the Clean Power Plan to other emission sources, but such policies are consistent with their current statements.

The importance of global policy to curb greenhouse gas emissions is difficult to exaggerate. While the United States acting alone cannot entirely solve the problem, resolute action by the world’s largest economy and second-largest greenhouse gas emitter is essential, in concert with other nations, to forestall climate catastrophe.

The Courts

If the next president serves two terms, as six of the last nine presidents have done, four currently sitting justices will be over age 86 and one over age 90 by the time that presidency ends—provided that they have not died or resigned.

The political views of the president have always shaped presidential choices regarding judicial appointments. Because all of these appointments carry lifetime tenure, they influence events long after the president has left office. The political importance of these appointments has always been enormous, but it is even greater now than in the past. One reason is that the jurisprudence of sitting Supreme Court justices now lines up more closely than in the past with that of the party of the president who appointed them. Republican presidents appointed all sitting justices identified as conservative; Democratic presidents appointed all sitting justices identified as liberal. The influence of the president’s politics extends to other judicial appointments as well.

A second reason is that recent judicial decisions have reopened questions once regarded as settled. The decision in the first case dealing with the Affordable Care Act (ACA), NFIB v. Sebelius, is illustrative.

When the ACA was enacted, few observers doubted the power of the federal government to require people to carry health insurance. That power was based on a long line of decisions, dating back to the 1930s, under the Constitutional clause authorizing the federal government to regulate interstate commerce. In the 1930s, the Supreme Court rejected an older doctrine that had barred such regulations. The earlier doctrine dated from 1905, when the Court overturned a New York law that prohibited bakers from working more than 10 hours a day or 60 hours a week. The Court found in the 14th Amendment, which prohibits any state from depriving ‘any person of life, liberty, or property, without due process of law,’ a right to contract, previously invisible to jurists, which it said the New York law violated. In the early and mid-1930s, the Court used this doctrine to invalidate some New Deal legislation. Then the Court changed course and authorized a vast range of regulations under the Constitution’s Commerce Clause. It was on this line of cases that supporters of the ACA relied.

Nor did many observers doubt the power of Congress to require states to broaden Medicaid coverage as a condition for remaining in the Medicaid program and receiving federal matching grants to help them pay for required medical services.

To the surprise of most legal scholars, a 5-4 Supreme Court majority ruled in NFIB v. Sebelius that the Commerce Clause did not authorize the individual health insurance mandate. But it decided, also 5 to 4, that tax penalties could be imposed on those who fail to carry insurance. The tax saved the mandate. But the decision also raised questions about federal powers under the Commerce Clause. The Court also ruled that the Constitution barred the federal government from requiring states to expand Medicaid coverage as a condition for remaining in the program. This decision was odd, in that Congress could certainly, and constitutionally, have achieved the same objective by repealing the old Medicaid program and enacting a new Medicaid program, with the same rules as those contained in the ACA, that states would have been free to join or not.

NFIB v. Sebelius and other cases the Court has recently heard or soon will hear raise questions about what additional attempts to regulate interstate commerce might be ruled unconstitutional and about what limits the Court might impose on Congress’s power to require states to implement legislated rules as a condition of receiving federal financial aid. The Court has also heard, or soon will hear, a series of cases of fundamental importance regarding campaign financing, same-sex marriage, affirmative action, abortion rights, the death penalty, the delegation of powers to federal regulatory agencies, voting rights, and rules under which people can seek redress in the courts for violation of their rights.

Throughout U.S. history, the American people have granted nine appointed judges the power to decide whether the actions taken by elected legislators are or are not consistent with a constitution written more than two centuries ago. As a practical matter, the Court could not maintain this sway if it deviated too far from public opinion. But the boundaries within which the Court has substantially unfettered discretion are wide, and within those limits the Supreme Court can profoundly limit or redirect the scope of legislative authority. The Supreme Court’s switch in the 1930s from doctrines under which much of the New Deal was found to be unconstitutional to other doctrines under which it was constitutional illustrates the Court’s sensitivity to public opinion and the profound influence of its decisions.

The bottom line is that the next president will likely appoint enough Supreme Court justices and other judges to shape the character of the Supreme Court and of lower courts with ramifications both broad and enduring on important aspects of every person’s life.

***

The next president will preside over critical decisions relating to health care policy, Social Security, and environmental policy, and will shape the character of the Supreme Court for the next generation. Profound differences distinguish the two major parties on these and many other issues. A recent survey of members of the House of Representatives found that on a scale of ‘liberal to conservative’ the most conservative Democrat was more liberal than the least conservative Republican. Whatever their source, these divisions are real.  The examples cited here are sufficient to show that the 2016 election richly merits the overworked term 'watershed'—it will be the most consequential presidential election in a very long time.


Is the ACA in trouble?


Editor's Note: This post originally appeared in InsideSources. The author wishes to thank Kevin Lucia for helpful comments and suggestions.

UnitedHealth Group’s surprise announcement that it is considering whether to stop selling health insurance through the Affordable Care Act’s health exchanges in 2017, and that it is also pulling marketing and broker commissions in 2016, has health policy analysts scratching their heads. The announcement is particularly puzzling because just a month ago, United issued a bullish announcement that it was planning to expand to 11 additional individual markets, taking its total to 34.

United’s stated reason is that this business is unprofitable. That may be true, but it is odd that the largest health insurer in the nation would vacate a growing market without putting up a fight. Is United’s announcement seriously bad news for Obamacare, as many commentators have asserted? Is United seeking concessions in another area and using this announcement as a bargaining chip? Or is something else going on? The answer, I believe, is that the announcement, while a bit of all of these things, is less significant than many suppose.

To make sense of United’s actions, one has to understand certain peculiarities of United’s business model and some little-understood aspects of the Affordable Care Act.

  • Most of United’s business consists of group sales of insurance through employers who offer plans to their employees as a fringe benefit. United has chosen not to sell insurance aggressively to individuals in most places and, where it does, not to offer the lowest-premium plans. In some states, it does not sell to individuals at all.
  • In 49 states, insurers may sell plans either through the ACA health exchange or directly to customers outside the exchanges. The exceptions are Vermont and the District of Columbia in which individuals buying insurance must go through their exchanges. Thus, insurers may find that “good” risks—those with below-average use of health care—disproportionately buy directly, while the “poor” risks buy through the exchanges.
  • State regulators must review insurance premiums to ensure that they are reasonable and set other rules that insurers must follow. This process typically involves some negotiation. With varying skill and intensity, state insurance commissioners try to hold down prices. If they are too lax, buyers may be overcharged. If they are too aggressive, insurers may simply withdraw from the market, causing politically unpopular inconvenience. These negotiations go on separately in 50 states and the District of Columbia each and every year.
  • Finally, fewer people are now expected to buy insurance through the health exchanges than was expected a couple of years ago. ACA subsidies are modest for people with moderate incomes and the penalties for not carrying insurance have been small. Some people with modest incomes face high deductibles, high out-of-pocket costs, narrow networks of providers, or some mix of all three. As a result, some people who expected not to need much health care have chosen to ‘go bare’ and pay the modest penalties for not carrying insurance.

What seems to have happened (one can’t be sure, as the United announcement is Delphic) is that the company, which mostly delayed its participation in the individual exchanges until 2015, incurred substantial start-up costs, enrolled few customers, who turned out to be sicker than anticipated, and experienced more-than-anticipated attrition. Other insurers, including Blue Cross Blue Shield plans nationwide, which hold a dominant position in individual markets in many states, did well enough that Joseph Swedish, CEO of Anthem, Inc., one of the largest of the ‘Blues,’ announced that his company is firmly committed to the exchanges. But minor players in the individual market, such as United, may have concluded that the costs of developing that market are too high for the expected payoff.

In evaluating these diverse factors, one needs to recognize that the ACA, in general, and the health exchanges, in particular, have changed insurance markets in fundamental ways. Millions of people who were previously uninsured are now trying to understand the bewildering complexities of health insurance. Insurance companies have a lot to learn, too. The ACA now bars insurance companies from ‘underwriting,’ the practice of varying premiums based on the characteristics of individual customers, something at which they were quite expert. Under the ACA, insurance companies must sell insurance to all comers, however sick they may be, and may vary premiums only by age, tobacco use, family size, and geographic area. Now, companies must ‘manage’ risk, which is easier for a company with a large share of the individual market, as the Blues have in most states, than it is for a company like United with only a small share.

What this means is that United’s announcement is regrettable news for the states from which it may decide to withdraw, as its departure would reduce competition. United might also use the threat of departure to negotiate favorable terms with states and the Administration. And it means that federal regulators need to write regulations to discourage individual customers from practices that unfairly saddle insurers with risks, such as buying insurance outside the regular open-enrollment period through special enrollment periods designed for exceptional circumstances and then dropping coverage a few months later. But it would be a mistake to treat United’s announcement, presumably made for good and sufficient business reasons, as a portentous omen of an ACA crisis.

Publication: InsideSources

Can taxing the rich reduce inequality? You bet it can!


Two recently posted papers by Brookings colleagues purport to show that “even a large increase in the top marginal rate would barely reduce inequality.”[1]  This conclusion, based on one commonly used measure of inequality, is an incomplete and misleading answer to the question posed: would a stand-alone increase in the top income tax bracket materially reduce inequality?  More importantly, it is the wrong question to pose, as a stand-alone increase in the top bracket rate would be bad tax policy that would exacerbate tax avoidance incentives.  Sensible tax policy would package that change with at least one other tax modification, and such a package would have an even more striking effect on income inequality.  In brief:

    • A stand-alone increase in the top tax bracket would be bad tax policy, but it would meaningfully increase the degree to which the tax system reduces economic inequality. It would have this effect even though it would fall on just ½ of 1 percent of all taxpayers and barely half of their income.
    • Tax policy significantly reduces inequality.  But transfer payments and other spending reduce it far more.  In combination, taxes and public spending materially offset the inequality generated by market income.
    • The revenue from a well-crafted increase in taxes on upper-income Americans, dedicated to prudent expansions of public spending, would go far to counter the powerful forces that have made income inequality more extreme in the United States than in any other major developed economy.

[1] The quotation is from Peter R. Orszag, “Education and Taxes Can’t Reduce Inequality,” Bloomberg View, September 28, 2015 (at http://bv.ms/1KPJXtx). The two papers are William G. Gale, Melissa S. Kearney, and Peter R. Orszag, “Would a significant increase in the top income tax rate substantially alter income inequality?” September 28, 2015 (at http://brook.gs/1KK40IX) and “Raising the top tax rate would not do much to reduce overall income inequality–additional observations,” October 12, 2015 (at http://brook.gs/1WfXR2G). 


Why fewer jobless Americans are counting on disability


As government funding for disability insurance is expected to run out next year, Congress should re-evaluate the costs of the program.

Nine million people in America today are receiving Social Security Disability Insurance, double the number in 1995 and six times the number in 1970. With statistics like that, it’s hardly surprising to see some in Congress worry that more people will enroll in the program and that costs will continue to rise, especially since the disability insurance trust fund is expected to run out of money by the end of next year. If Congress does nothing, benefits would fall by 19% immediately following next year’s presidential election. So, Congress will likely do something. But what exactly should it do?

The disability insurance trust fund has nearly run out of money before. Each time, Congress has simply increased the share of the Social Security payroll tax that goes to disability insurance. This time, however, many members of Congress oppose such a shift unless it is linked to changes that curb eligibility and promote return to work. They fear that rolls will keep growing and costs will keep rising, but a report by a government panel concludes that disability insurance rolls have stopped rising and will likely shrink. The report, authored by a panel of the Social Security Advisory Board, is important because it finds that many of the factors that caused disability insurance rolls to rise, particularly during the Great Recession, have ended.

  • Baby-boomers, who added to the rolls as they reached the disability-prone middle age years, are aging out of disability benefits and into retirement benefits. 

  • The decades-long flood of women into the labor force increased the pool of people with the work histories needed to be eligible for disability insurance. But women’s labor force participation has fallen a bit from pre-Great Recession peaks and is not expected to rise materially again.

  • The Great Recession, which led many who lost jobs and couldn’t find work to apply for disability insurance, is over and applications are down. A recession as large as that of 2008 is improbable any time soon. 

  • Approval rates by administrative law judges, who for many years were suspected of being too ready to approve applications, have been falling. Whatever the cause, this stringency augurs a fall in the disability insurance rolls.

Nonetheless, the Disability Insurance program is not without serious flaws. At the front end, employers, who might help workers with emerging impairments remain on the job by providing therapy or training, have little incentive to do either. Employers often save money if workers leave and apply for benefits. Creating a financial incentive to encourage employers to help workers stay active is something both liberals and conservatives can and should embrace. Unfortunately, figuring out exactly how to do that remains elusive.

At the next stage, applicants who are initially denied benefits confront intolerable delays. They must wait an average of nearly two years to have their cases finally decided and many wait far longer. For the nearly 1 million people now in this situation, the effects can be devastating. As long as their applications are pending, applicants risk immediate rejection if they engage in ‘substantial gainful activity,’ which is defined as earning more than $1,090 in any month. This virtual bar on work brings a heightened risk of utter destitution. Work skills erode and the chance of ever reentering the workforce all but vanishes. Speeding eligibility determinations is vital, but just how to do so is also enormously controversial.

For workers judged eligible for benefits, numerous provisions intended to encourage work are not working. People have advanced ideas on how to help workers regain marketplace skills and to make it worthwhile for them to return to work. But evidence that they will work is scant.

The problems are clear enough. As noted, solutions are not. Analysts have come up with a large number of proposed changes in the program. Two task forces, one organized by The Bipartisan Policy Center and one by the Committee for a Responsible Federal Budget, have come up with lengthy menus of possible modifications to the current program. Many have theoretical appeal. None has been sufficiently tested to allow evidence-based predictions on how they would work in practice.

So, with the need to do something to sustain benefits and to do it fast, Congress confronts a program with many problems for which a wide range of untested solutions have been proposed. Studies and pilots of some of these ideas are essential and should accompany the transfer of payroll tax revenues necessary to prevent a sudden and unjustified cut in benefits for millions of impaired people who currently have little chance of returning to work. Implementing such a research program now will enable Congress to improve a program that is vital, but that is acknowledged to have serious problems.

And the good news, delivered by a group of analysts, is that enrollment will not grow fast enough to break the bank before such studies can be carried out.



Editor's Note: This post originally appeared in Fortune Magazine.


Publication: Fortune Magazine

The myth behind America’s deficit


Medicare Hospital Insurance and Social Security would not add to deficits because they can’t spend money they don’t have.

The dog days of August have given way to something much worse. Congress returned to session this week, and the rest of the year promises to be nightmarish. The House and Senate passed budget resolutions earlier this year calling for nearly $5 trillion in spending cuts by 2025. More than two-thirds of those cuts would come from programs that help people with low and moderate incomes. Health care spending would be halved. If Congress passes such cuts, the president will likely veto them. At best, another partisan budget war will ensue, after which the veto is sustained. At worst, the cuts become law.

The putative justification for these cuts is that the nation faces insupportable increases in public debt because of expanding budget deficits. Even if the projections were valid, it would be prudent to enact some tax increases in order to preserve needed public spending. But the projections of explosively growing debt are not valid. They are fantasy.

Wait! you say. The Congressional Budget Office has been telling us for years about the prospect of rising deficits and exploding debt. It repeated those warnings just two months ago. Private organizations of both the left and right agree with the CBO’s projections, in general if not in detail. How can any sane person deny that the nation faces a serious long-term budget deficit problem?

The answer is simple: The CBO and private organizations use a convention in preparing their projections that is at odds with established policy and law. If, instead, projections are based on actual current law, as they claim to be, the specter of an increasing debt burden vanishes. What is that convention? Why is it wrong? Why did CBO adopt it, and why have others kept it?

CBO’s budget projections cover the next 75 years. Its baseline projections claim to be based on current law and policy. (CBO also presents an ‘alternative scenario’ based on assumed changes in law and policy.) Within that period, Social Security (OASDI) and Medicare Hospital Insurance (HI) expenditures are certain to exceed revenues earmarked to pay for them. Both are financed through trust funds. Both funds have sizeable reserves (government securities) that can be used to cover shortfalls for a while. But when those reserves are exhausted, expenditures cannot exceed current revenues. Trust fund financing means that neither Social Security nor Medicare Hospital Insurance can run deficits. Nor can they add to the public debt.

Nonetheless, CBO and other organizations assume that Social Security and Medicare Hospital Insurance can and will spend money they don’t have and that current law bars them from spending.

One of the reasons why trust fund financing was used, first for Social Security and then for Medicare Hospital Insurance, was to create a framework that disciplined Congress to earmark sufficient revenues to pay for any benefits it might award. Successive presidents and Congresses, both Republican and Democratic, have repeatedly acted to prevent either program’s cumulative spending from exceeding cumulative revenues. In 1983, for example, faced with an impending trust fund shortfall, Congress cut benefits and raised taxes enough to turn prospective cash flow trust fund deficits into cash flow surpluses, and President Reagan signed the bill. In so doing, Congress and the president reaffirmed the discipline imposed by trust fund financing.

Trust fund accounting explains why people now are worrying about the adequacy of funding for Social Security and Medicare. They recognize that the trust funds will be depleted in a couple of decades. They understand that between now and then Congress must either raise earmarked taxes or fashion benefit cuts. If it does neither, benefits will be cut across the board. Either way, the deficits that CBO and other organizations have built into their budget projections will not materialize.
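The trust fund constraint described above can be made concrete with a toy projection. The sketch below is illustrative only: the function name `project_trust_fund` and every number in it are hypothetical, not CBO or trustees' figures. It assumes, as the argument here does, that benefits are paid in full while reserves last and, once reserves are exhausted, only up to current earmarked revenue.

```python
from dataclasses import dataclass


@dataclass
class YearResult:
    year: int
    scheduled: float
    paid: float
    reserves_end: float


def project_trust_fund(reserves, revenues, scheduled_benefits, start_year=1):
    """Toy trust fund projection (hypothetical numbers, not CBO's model).

    Each year the fund may spend only its remaining reserves plus that
    year's earmarked revenue. Scheduled benefits are paid in full while that
    is possible; otherwise payments are cut to what the fund can cover.
    The fund never borrows, so its balance never drops below zero.
    """
    results = []
    for offset, (revenue, scheduled) in enumerate(zip(revenues, scheduled_benefits)):
        available = reserves + revenue
        paid = min(scheduled, available)
        reserves = available - paid
        results.append(YearResult(start_year + offset, scheduled, paid, reserves))
    return results


# Hypothetical inputs: 100 in reserves, 50 of earmarked revenue a year,
# and 70 of scheduled benefits a year.
results = project_trust_fund(100, [50] * 8, [70] * 8)
for r in results:
    share = r.paid / r.scheduled
    print(f"year {r.year}: paid {r.paid} of {r.scheduled} ({share:.0%}), reserves {r.reserves_end}")
```

Because payments are capped at reserves plus revenue, the balance never goes negative, so in this model the program cannot add to the public debt; the adjustment appears instead as an across-the-board benefit cut (here, from 70 to 50 a year) once reserves run out.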

CBO’s inclusion in its projections of deficits that current law and established policy do not allow has enormous implications for projected debt, as the graph below shows.

If one excludes deficits in Social Security and Medicare Hospital Insurance that cannot occur under current law and established policy, the ratio of national debt to gross domestic product will fall, not rise as CBO budget projections indicate. In other words, the claim that drastic cuts in government spending are necessary to avoid calamitous budget deficits is bogus.

It might seem puzzling that CBO, an agency known for its professionalism and scrupulous avoidance of political bias, would adopt a convention so at odds with law and policy. The answer is straightforward: Congress makes it do so. Section 257 of the Balanced Budget and Emergency Deficit Control Act of 1985 requires CBO to assume that the trust funds can spend money even though legislation governing trust fund operations bars such expenditures. CBO is obeying the law.

No similar explanation exonerates the statement of the Committee for a Responsible Federal Budget, which on August 25, 2015 cited, with approval, the conclusion that ‘debt continues to grow unsustainably,’ or that of the Bipartisan Policy Center, which wrote on the same day that ‘America’s debt continues to grow on an unsustainable path.’ Both statements are wrong.

To be sure, the dire budget future anticipated in the CBO projections could materialize. Large deficits could result from an economic calamity or war. Congress could abandon the principle that Social Security and Medicare Hospital Insurance should be financed within trust funds. It could enact other fiscally rash policies. But such deficits do not flow from current law or reflect the trust fund discipline endorsed by both parties over the last 80 years. And it is current law and policy that are supposed to underlie budget projections. Slashing spending because a thirty-year-old law requires CBO to assume that Congress will do something it has shown no sign of doing (overturning decades of bipartisan prudence requiring that the major social insurance programs spend only money specifically earmarked for them, and not a penny more) would impose enormous hardship on vulnerable populations in the name of a fiscal fantasy.



Editor's Note: This post originally appeared in Fortune Magazine.


Publication: Fortune Magazine

King v. Burwell: Chalk one up for common sense


The Supreme Court today decided that Congress meant what it said when it enacted the Affordable Care Act (ACA). The ACA requires people in all 50 states to carry health insurance and provides tax credits to help them afford it. To have offered such credits only in the dozen states that set up their own exchanges would have been cruel and unsustainable because premiums for many people would have been unaffordable.

But the law said that such credits could be paid in exchanges ‘established by a state,’ which led some to claim that the credits could not be paid to people enrolled through the federally operated exchange. In his opinion, Chief Justice Roberts euphemistically calls that wording ‘inartful.’ Six Supreme Court justices decided that, read in its entirety, the law provides tax credits in every state, whether the state manages the exchange itself or lets the federal government do it.

That decision is unsurprising. More surprising is that the Court agreed to hear the case. When it did so, cases on the same issue were making their way through four federal circuits. In only one of the four circuits was there a standing decision, and it found that tax credits were available everywhere. It is customary for the Supreme Court to wait to take a case until action in lower courts is complete or two circuits have disagreed. In this situation, the justices, eyeing the electoral calendar, may have preferred to hear the case sooner rather than later to avoid confronting it in the middle of a presidential election.

Whatever the Court’s motives for taking the case, its willingness to hear it caused supporters of the Affordable Care Act enormous unease. Were the more conservative members of the Court poised to accept an interpretation of the law that ACA supporters found ridiculous but to which inartful legislative drafting gave a gloss of plausibility? Judicial demeanor at oral argument was not comforting. A 5-4 decision disallowing payment of tax credits seemed ominously plausible.

Future Challenges for the ACA

The Court’s 6-3 decision ended those fears. The existential threat to health reform from litigation is over. But efforts to undo the Affordable Care Act are not at an end. They will continue in the political sphere. And that is where they should be. ACA opponents know that there is little chance for them to roll back the Affordable Care Act in any fundamental way as long as a Democrat is in the White House. To dismantle the law, they must win the presidency in 2016.

But winning the presidency will not be enough. It would be mid-2017 before ACA opponents could draft and enact legislation to curb the Affordable Care Act and months more before it could take effect. To borrow a metaphor from the military, even if those opposed to the ACA win the presidency, they will have to deal with ‘facts on the ground.’

Well over 30 million Americans will be receiving health insurance under the Affordable Care Act. That will include people who can afford health insurance because of the tax credits the Supreme Court affirmed today. It will include millions more insured through Medicaid in the steadily growing number of states that have agreed to extend Medicaid coverage. It will include the young adult children covered under parental plans because the ACA requires this option.

Insurance companies will have millions more customers because of the ACA. Hospitals will fill more beds because previously uninsured people will be able to afford care, and hospitals will have fewer unpaid bills from uninsured patients whom they had to admit under previous law. Drug companies and device manufacturers will enjoy increased sales because of the ACA.

The elderly will have better drug coverage because the ACA is closing the notorious ‘donut hole,’ the range of drug expenditures that Medicare previously did not cover.

Those facts will discourage any frontal assault on the ACA, particularly if the rate of increase of health spending remains as well controlled as it has been for the past seven years.

Of course, differences between supporters and opponents of the ACA will not vanish. But those differences will not preclude constructive legislation. Beginning in 2017, the ACA gives states, alone or in groups, an opening to propose alternative means of achieving the goals of the Affordable Care Act. The law authorizes the administration to approve such waivers if they serve the goals of the law. The United States is large and diverse. Use of this authority may help defuse the bitter acrimony surrounding Obamacare, as my colleague, Stuart Butler, has suggested. At the same time, Obamacare supporters have their own list of changes that they believe would improve the law. At the top of the list is fixing the ‘family glitch,’ a drafting error that unintentionally deprives many families of access to the insurance exchanges and to tax credits that would make insurance affordable.

As Chief Justice Roberts wrote near the end of his opinion of the Court, “In a democracy, the power to make the law rests with those chosen by the people....Congress passed the Affordable Care Act to improve health insurance markets, not to destroy them.” The Supreme Court decision assuring that tax credits are available in all states spares the nation chaos and turmoil. It returns the debate about health care policy to the political arena where it belongs. In so doing, it brings a bit closer the time when the two parties may find it in their interest to sit down and deal with the twin realities of the Affordable Care Act: it is imperfect legislation that needs fixing, and it is decidedly here to stay.


Eurozone desperately needs a fiscal transfer mechanism to soften the effects of competitiveness imbalances


The eurozone has three problems: national debt obligations that cannot be met, medium-term imbalances in trade competitiveness, and long-term structural flaws.

The short-run problem requires more of the monetary easing that Germany has, with appalling shortsightedness, been resisting, and less of the near-term fiscal restraint that Germany has, with equally appalling shortsightedness, been seeking. To insist that Greece meet all of its near-term current debt service obligations makes about as much sense as did French and British insistence that Germany honor its reparations obligations after World War I. The latter could not be and were not honored. The former cannot and will not be honored either.

The medium-term problem is that, given a single currency, labor costs are too high in Greece and too low in Germany and some other northern European countries. Because adjustments in currency values cannot correct these imbalances, differences in the growth of wages must do the job: wage deflation and continued depression in Greece and other peripheral countries, wage inflation in Germany, or both. Wage deflation in the periphery is a recipe for intense and sustained misery. Wage inflation in Germany, however politically improbable it may now seem, is the better alternative.

The long-term problem is that the eurozone lacks the fiscal transfer mechanisms necessary to soften the effects of competitiveness imbalances while other forms of adjustment take effect. This lack places extraordinary demands on the willingness of individual nations to undertake internal policies to reduce such imbalances. Until such fiscal transfer mechanisms are created, crises such as the current one are bound to recur.

Present circumstances call for a combination of short-term expansionary policies that have to be led or accepted by the surplus nations, notably Germany, who will also have to recognize and accept that not all Greek debts will be paid or that debt service payments will not be made on time and at originally negotiated interest rates. The price for those concessions will be a current and credible commitment eventually to restore and maintain fiscal balance by the peripheral countries, notably Greece.


Publication: The International Economy

Strengthening Medicare for 2030


Event Information

June 5, 2015
9:00 AM - 1:00 PM EDT

Falk Auditorium
Brookings Institution
1775 Massachusetts Avenue, N.W.
Washington, DC 20036

In its 50th year, the Medicare program currently provides health insurance coverage for more than 49 million Americans and accounts for $600 billion in federal spending. With those numbers expected to rise as the baby boomer generation ages, many policy experts consider this impending expansion a major threat to the nation’s economic future and question how it might affect the quality and value of health care for Medicare beneficiaries.

On June 5, the Center for Health Policy at Brookings and the USC Leonard D. Schaeffer Center for Health Policy and Economics hosted a half-day forum on the future of Medicare. Instead of reflecting on historical accomplishments, the event looked ahead to 2030, a time when the youngest baby boomers will be Medicare-eligible, and explored the changing demographics, health care needs, medical technology costs, and financial resources available to beneficiaries. The panels focused on modernizing Medicare's infrastructure, benefit design, marketplace competition, and payment mechanisms. The event also included the release of five policy papers from featured panelists.

Please note that presentation slides from USC's Dana Goldman will not be available for download. For more information on findings from his presentation download the working paper available on this page or watch the event video.


Strengthening Medicare for 2030 - A working paper series


The addition of Medicare in 1965 completed a suite of federal programs designed to protect the wealth and health of people reaching older ages in the United States, a project that began with the Committee on Economic Security of 1934, whose work produced the Social Security Act of 1935. While few would deny Medicare’s important role in improving older and disabled Americans’ financial security and health, many worry about sustaining and strengthening Medicare to finance high-quality, affordable health care for coming generations.

In 1965, a 65-year-old man could expect to live another 13 years on average, and a 65-year-old woman another 16 years. Today, life expectancy at 65 is another 18 years for men and 20 years for women, an increase of four to five years.

In 2011, the first of 75-million-plus baby boomers became eligible for Medicare. And by 2029, when all of the baby boomers will be 65 or older, the U.S. Census Bureau predicts 20 percent of the U.S. population will be older than 65. Just by virtue of the sheer size of the aging population, Medicare spending growth will accelerate sharply in the coming years.


Estimated Medicare Spending, 2010-2030



Sources: Future Elderly Model (FEM), University of Southern California Leonard D. Schaeffer Center for Health Policy & Economics, U.S. Census Bureau projections, Medicare Current Beneficiary Survey and Centers for Medicare & Medicaid Services.

The half-day forum on the future of Medicare, hosted by the Center for Health Policy at Brookings and the USC Leonard D. Schaeffer Center for Health Policy and Economics, looked ahead to 2030, a year when the youngest baby boomers will be Medicare-eligible, to explore the changing demographics, health care needs, medical technology costs, and financial resources that will be available to beneficiaries. The working papers below address five critical components of Medicare reform, including modernizing Medicare's infrastructure, benefit design, marketplace competition, and payment mechanisms.

DISCUSSION PAPERS

  • Health and Health Care of Beneficiaries in 2030, Étienne Gaudette, Bryan Tysinger, Alwyn Cassil and Dana Goldman: This chartbook, prepared by the USC Schaeffer Center, aims to help policymakers understand how Medicare spending and beneficiary demographics will likely change over the next 15 years, so that they can better strengthen and sustain the program.
  • Trends in the Well-Being of Aged and their Prospects through 2030, Gary Burtless: This paper offers a survey of trends in old-age poverty, income, inequality, labor market activity, insurance coverage, and health status, and provides a brief discussion of whether the favorable trends of the past half century can continue in the next few decades.
  • The Transformation of Medicare, 2015 to 2030, Henry J. Aaron and Robert Reischauer: This paper discusses how Medicare can be made a better program and how it should look in the 2030s from the perspectives of beneficiaries, policymakers and administrators, and society at large.
  • Improving Provider Payment in Medicare, Paul Ginsburg and Gail Wilensky: This paper discusses the various alternative payment models currently being implemented in the private sector and elsewhere that could be employed in the Medicare program to preserve quality of care while reducing costs.

Publication: The Brookings Institution and the USC Schaeffer Center
     
 
 





Three cheers for logrolling: The demise of the Sustainable Growth Rate (SGR)


Editor's note: This post originally appeared in the New England Journal of Medicine's Perspective online series on April 22, 2015.

Congress has finally euthanized the sustainable growth rate formula (SGR). Enacted in 1997 and intended to hold down growth of Medicare spending on physician services, the formula initially worked more or less as intended. Then it began to call for progressively larger and more unrealistic fee cuts — nearly 30% in some years, 21% in 2015. Aware that such cuts would be devastating, Congress repeatedly postponed them, and most observers understood that such cuts would never be implemented. Still, many physicians fretted that the unthinkable might happen.

Now Congress has scrapped the SGR in a law that includes many other important provisions. In its place are still-embryonic but promising incentives that could do more to increase efficiency and control costs than the old, flawed formula ever could have. How did such a radical change occur? And why now?

The “how” was logrolling — the trading of votes by legislators in order to pass legislation of interest to each of them. Logrolling has become a dirty word, a much-reviled political practice. But the Medicare Access and CHIP (Children’s Health Insurance Program) Reauthorization Act (MACRA), negotiated by House leaders John Boehner (R-OH) and Nancy Pelosi (D-CA) and their staffs, is a reminder that old-time political horse trading has much to be said for it.

The answer to “why now?” can be found in the technicalities of budget scoring. Under the SGR, Medicare’s physician fees were tied through a complex formula to a target based on caseloads, practice costs, and the gross domestic product. When current spending on physician services exceeded the targets, the formula called for fee cuts to be applied prospectively. Fee cuts that were not implemented were carried forward and added to any future cuts the formula might generate. Because Congress repeatedly deferred cuts, a backlog developed. By 2012, this backlog combined with assumed rapid future growth in Medicare spending caused the Congressional Budget Office (CBO) to estimate the 10-year cost of repealing the SGR at a stunning $316 billion.
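The carry-forward rule explains why the backlog snowballed. The following is a deliberately simplified toy model, not the actual SGR computation used by CMS or CBO: it assumes, hypothetically, that the formula calls for a fresh 5% cut every year, that Congress defers every cut, and that deferred cuts compound multiplicatively with new ones. The function name `deferred_cut_backlog` is illustrative.

```python
def deferred_cut_backlog(annual_cuts):
    """Toy model of SGR carry-forward (hypothetical, not the statutory formula).

    Each year's formula cut is never applied; it is carried forward and
    compounded with the cuts from earlier years. Returns the total fee cut
    the formula would demand in each year.
    """
    fee_factor_owed = 1.0  # fraction of current fees the formula says should remain
    required = []
    for cut in annual_cuts:
        fee_factor_owed *= 1.0 - cut
        required.append(1.0 - fee_factor_owed)
    return required


# Hypothetical: a 5% cut called for in each of six years, all deferred.
required = deferred_cut_backlog([0.05] * 6)
for year, cut in enumerate(required, start=1):
    print(f"year {year}: required cut {cut:.1%}")
```

Under these assumptions, six years of deferral turn a 5% annual cut into a required cut of about 26.5%, the same order of magnitude as the near-30% cuts the formula eventually demanded.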

For many years, Congress looked the costs of repealing the SGR squarely in the eye — and blinked. The cost of a 1-year delay, as estimated by the CBO, was a tiny fraction of the cost of repeal. So Congress delayed — which is hardly surprising.

But then, something genuinely surprising did happen. The growth of overall health care spending slowed, causing the CBO to slash its estimates of the long-term cost of repealing the SGR. By 2015, the 10-year price of repeal had fallen to $136 billion. Even this number was a figment of budget accounting, since the chance that the fee cuts would ever have been imposed was minuscule. But the smaller number made possible the all-too-rare bipartisan collaboration that produced the legislation that President Barack Obama has just signed.

The core of the law is repeal of the SGR and abandonment of the 21% cut in Medicare physician fees it called for this year. In its place is a new method of paying physicians under Medicare. Some elements are specified in law; some are to be introduced later. The hard-wired elements include annual physician fee updates of 0.5% per year through 2019 and 0% from 2020 through 2025, along with a “merit-based incentive payment system” (MIPS) that will replace current incentive programs that terminate in 2018. The new program will assess performance in four categories: quality of care, resource use, meaningful use of electronic health records, and clinical practice improvement activities. Bonuses and penalties, ranging from +12% to –4% in 2020, and increasing to +27% to –9% for 2022 and later, will be triggered by performance scores in these four areas. The exact content of the MIPS will be specified in rules that the secretary of health and human services is to develop after consultation with physicians and other health care providers.
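The hard-wired payment path can be worked through with simple arithmetic. This sketch assumes, for illustration, that the 0.5% updates apply in each calendar year from 2015 through 2019 (the start year is an assumption), uses a hypothetical $100 base fee, and applies a MIPS adjustment clamped to the ranges reported above for 2020 and for 2022 and later. It does not model how MIPS scores are computed, since those rules are still to be written; the names `fee_index`, `mips_bounds`, and `adjusted_fee` are hypothetical.

```python
def fee_index(through_year, start_year=2015):
    """Cumulative fee index relative to the (assumed) start year.

    Assumes a 0.5% update in each year through 2019 and 0% from 2020
    through 2025, the schedule described in the text.
    """
    if through_year > 2025:
        raise ValueError("the schedule described only runs through 2025")
    index = 1.0
    for year in range(start_year, through_year + 1):
        update = 0.005 if year <= 2019 else 0.0
        index *= 1.0 + update
    return index


def mips_bounds(year):
    """MIPS penalty and bonus range (as fractions) reported in the text."""
    if year == 2020:
        return (-0.04, 0.12)
    if year >= 2022:
        return (-0.09, 0.27)
    raise ValueError(f"no MIPS range is stated for {year}")


def adjusted_fee(base_fee, year, mips_adjustment):
    """Apply the cumulative update, then a MIPS adjustment clamped to its range."""
    low, high = mips_bounds(year)
    clamped = max(low, min(high, mips_adjustment))
    return base_fee * fee_index(year) * (1.0 + clamped)


for adj in (-0.09, 0.0, 0.27):
    print(f"2022, MIPS {adj:+.0%}: ${adjusted_fee(100.0, 2022, adj):.2f}")
```

On a $100 base fee, the statutory updates add only about $2.53 in total, so by 2022 the MIPS adjustment, which can range from -9% to +27%, matters far more to a physician's payment than the built-in updates.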

Higher fees will be available to professionals who work in “alternative payment organizations” that typically will move away from fee-for-service payment, cover multiple services, show that they can limit the growth of spending, and use performance-based methods of compensation. These and other provisions will ramp up pressure on physicians and other providers to move from traditional individual or small-group fee-for-service practices into risk-based multi-specialty settings that are subject to management and oversight more intense than that to which most practitioners are yet accustomed.

Both parties wanted to bury the SGR. But MACRA contains other provisions, unrelated to the SGR, that appeal to discrete segments of each party. Democrats had been seeking a 4-year extension of CHIP, which serves 8 million children and pregnant women. They were running into stiff head winds from conservatives who wanted to scale back the program. MACRA extends CHIP with no cuts but does so for only 2 years.  It also includes a number of other provisions sought by Democrats: a 2-year extension of the Maternal, Infant, and Early Childhood Home Visiting program, plus permanent extensions of the Qualified Individual program, which pays Part B Medicare premiums for people with incomes just over the federal poverty thresholds, and transitional medical assistance, which preserves Medicaid eligibility for up to 1 year after a beneficiary gets a job.

The law also facilitates access to health benefits. MACRA extends for two years states’ authority to enroll applicants for health benefits on the basis of data on income, household size, and other factors gathered when people enroll in other programs such as the Supplemental Nutrition Assistance Program, the National School Lunch Program, Temporary Assistance to Needy Families (“welfare”), or Head Start. It also provides $7.2 billion over the next two years to support community health centers, extending funding established in the Affordable Care Act.

Elements of each party, concerned about budget deficits, wanted provisions to pay for the increased spending. They got some of what they wanted, but not enough to prevent some conservative Republicans in both the Senate and the House from opposing final passage. Many conservatives have long sought to increase the proportion of Medicare Part B costs that are covered by premiums. Most Medicare beneficiaries pay Part B premiums covering 25% of the program’s actuarial value. Relatively high-income beneficiaries pay premiums that cover 35, 50, 65, or 80% of that value, depending on their income. Starting in 2018, MACRA will raise the 50% and 65% premiums to 65% and 80%, respectively, affecting about 2% of Medicare beneficiaries. No single person with an income (in 2015 dollars) below $133,501 or couple with income below $267,001 would be affected initially. MACRA freezes these thresholds through 2019, after which they are indexed for inflation. Under previous law, the thresholds were to have been greatly increased in 2019, reducing the number of high-income Medicare beneficiaries to whom these higher premiums would have applied. (For reference, half of all Medicare beneficiaries currently have incomes below $26,000 a year.)
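The premium change amounts to moving two income tiers up one step each. The sketch below uses a hypothetical monthly actuarial value of $400 (not the actual 2015 figure) and leaves out the income thresholds, since only the lowest affected threshold is given above; the tier numbering is illustrative.

```python
# Share of Part B actuarial value covered by premiums, from the lowest
# to the highest income tier, before MACRA (shares as described in the text).
SHARES_BEFORE = [0.25, 0.35, 0.50, 0.65, 0.80]

# MACRA raises the 50% tier to 65% and the 65% tier to 80%.
RAISED = {0.50: 0.65, 0.65: 0.80}
SHARES_AFTER = [RAISED.get(share, share) for share in SHARES_BEFORE]

ACTUARIAL_VALUE = 400.0  # hypothetical monthly actuarial value, in dollars

for tier, (before, after) in enumerate(zip(SHARES_BEFORE, SHARES_AFTER), start=1):
    print(
        f"tier {tier}: {before:.0%} -> {after:.0%}, "
        f"premium ${ACTUARIAL_VALUE * before:.2f} -> ${ACTUARIAL_VALUE * after:.2f}"
    )
```

Only the former 50% and 65% tiers change; the 25%, 35%, and top 80% tiers are untouched.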

A second provision bars Medigap plans from covering the Part B deductible, which is now $147. By exposing more people to deductibles, this provision will cause some reduction in Part B spending. Everyone who buys such plans will see reduced premiums; some will face increased out-of-pocket costs. The financial effects either way will be small.

Inflexible adherence to principle contributes to the political gridlock that has plunged rates of public approval of Congress to subfreezing lows. MACRA is a reminder of the virtues of compromise and quiet negotiation. A small group of congressional leaders and their staffs crafted a law that gives something to most members of both parties. Today’s appalling norm of poisonously polarized politics makes this instance of political horse trading seem nothing short of miraculous.

Publication: NEJM

Government spending: yes, it really can cut the U.S. deficit


Hypocrisy is not scarce in the world of politics. But the current House and Senate budget resolutions set new lows. Each proposes to cut about $5 trillion from government spending over the next decade in pursuit of a balanced budget. Whatever one may think of ranking spending cuts above investment in the nation’s future at a time when the debt-to-GDP ratio is projected to be stable, you would think that deficit-reduction hawks wouldn’t cut spending that has been proven to lower the deficit.

Yes, there are expenditures that actually lower the deficit, typically by many dollars for each dollar spent. In this category are outlays on ‘program integrity’ to find and punish fraud, tax evasion, and plain old bureaucratic mistakes. You might suppose that those outlays would be spared. Guess again. Consider the following:

Medicare. Roughly 10% of Medicare’s $600 billion budget goes for what officials delicately call ‘improper payments,’ according to the 2014 financial report of the Department of Health and Human Services. Some are improper merely because providers ‘up-code’ legitimate services to boost their incomes. Some payments go for services that serve no valid purpose. And some go for phantom services that were never provided. Whatever the cause, approximately $60 billion of improper payments is not ‘chump change.’

Medicare tries to root out these improper payments, but it lacks sufficient staff to do the job. What it does spend on ‘program integrity’ yields an estimated $14.40 for each dollar spent, about $10 billion a year in total. That number counts only directly measurable savings, such as recoveries and claim denials. A full reckoning of savings would add in the hard-to-measure ‘policeman on the beat’ effect that discourages violations by would-be cheats.

Fat targets remain. A recent report from the Institute of Medicine presented findings that veritably scream ‘fraud.’ Per person spending on durable medical equipment and home health care is ten times higher in Miami-Dade County, Florida, than the national average. Such equipment and home health account for nearly three-quarters of the geographical variation in per person Medicare spending. Yet only 4% of current recoveries of improper payments come from audits of these two items, and little comes from the highest-spending locations.

Why doesn’t Medicare spend more and go after the remaining overpayments, you may wonder? The simple answer is that Congress gives Medicare too little money for administration. Direct overhead expenses of Medicare amount to only about 1.5% of program outlays (6% if one includes the internal administrative costs of private health plans that serve Medicare enrollees). Medicare doesn’t need to spend as much on administration as the average of 19% spent by private insurers because, for example, Medicare need not pay dividends to private shareholders or advertise.

But spending more on Medicare administration would both pay for itself—$2 for each added dollar spent, according to the conservative estimate in the President’s most recent budget—and improve the quality of care. With more staff, Medicare could stop more improper payments and reduce the use of approved therapies in unapproved ways that do no good and may cause harm.

Taxes. Compare two numbers: $540 billion and $468 billion. The first number is the amount of taxes owed but not paid. The second number is the projected federal budget deficit for 2015, according to the Congressional Budget Office.

Collecting all taxes legally owed but not paid is an impossibility. It just isn’t worth going after every violation. But current enforcement falls far short of practical limits. Each dollar spent on enforcement directly yields $4 to $6. Indirect savings are many times larger: the cop-on-the-beat effect again. So, in an era of ostentatious concern about budget deficits, you would expect fiscal fretting in Congress to lead to increased efforts to collect what the law says people owe in taxes.

Wrong again. Between 2010 and 2014, the IRS budget was cut in real terms by 20%. At the same time, the agency had to shoulder new tasks under health reform, as well as process an avalanche of applications for tax exemptions unleashed by the 2010 Supreme Court decision in the Citizens United case. With less money to spend and more to do, enforcement staff dropped by 15% and inflation-adjusted collections dropped 13%.

One should acknowledge that enforcement will not do away with most avoidance and evasion. Needlessly complex tax laws are the root cause of most tax underpayment. Tax reform would do even more than improved administration to increase the ratio of taxes paid to taxes due. But until that glorious day when Congress finds the wit and will to make the tax system simpler and fairer, it would behoove a nation trying to make ends meet to spend $2 billion to $3 billion more each year to directly collect $10 billion to $15 billion a year more of legally owed taxes and, almost certainly, raise far more than that by frightening borderline scofflaws.

Disability Insurance. Thirteen million people with disabling conditions who are judged incapable of engaging in substantial gainful activity received $161 billion in disability insurance benefits in 2013. If the disabling conditions improve enough that beneficiaries can return to work, benefits are supposed to be stopped. Such improvement is rare. But when administrators believe that there is some chance, the law requires them to check. They may ask beneficiaries to fill out a questionnaire or, in some cases, undergo a new medical exam at government expense. Each dollar spent in these ways generated an estimated $16 in savings in 2013.

Still, the Social Security Administration is so understaffed that it has a backlog of 1.3 million disability reviews. Current estimates indicate that spending a little over $1 billion a year more on such reviews over the next decade would save $43 billion. Rather than giving Social Security the staff and spending authority to work down this backlog and realize those savings, Congress has been cutting the agency’s administrative budget, and sequestration threatens further cuts.

Claiming that better administration will balance the budget would be wrong. But it would help. And it would stop some people from shirking their legal responsibilities and lighten the burdens of those who shoulder theirs. The failure of Congress to provide enough staff to run programs costing hundreds of billions of dollars a year as efficiently and honestly as possible is about as good a definition of criminal negligence as one can find.


The NAEP proficiency myth


On May 16, I got into a Twitter argument with Campbell Brown of The 74, an education website.  She released a video on Slate giving advice to the next president.  The video begins: “Without question, to me, the issue is education. Two out of three eighth graders in this country cannot read or do math at grade level.”  I study student achievement and was curious.  I know of no valid evidence to make the claim that two out of three eighth graders are below grade level in reading and math.  No evidence was cited in the video.  I asked Brown for the evidentiary basis of the assertion.  She cited the National Assessment of Educational Progress (NAEP).

NAEP does not report the percentage of students performing at grade level.  NAEP reports the percentage of students reaching a “proficient” level of performance.  Here’s the problem. That’s not grade level. 

In this post, I hope to convince readers of two things:

1.  Proficient on NAEP does not mean grade level performance.  It’s significantly above that.
2.  Using NAEP’s proficient level as a basis for education policy is a bad idea.

Before going any further, let’s look at some history.

NAEP history 

NAEP was launched nearly five decades ago.  The first NAEP test was given in science in 1969, followed by a reading test in 1971 and math in 1973.  For the first time, Americans were able to track the academic progress of the nation’s students.  That set of assessments, which periodically tests students 9, 13, and 17 years old and was last given in 2012, is now known as the Long Term Trend (LTT) NAEP. 

It was joined by another set of NAEP tests in the 1990s.  The Main NAEP assesses students by grade level (fourth, eighth, and twelfth) and, unlike the LTT, produces not only national but also state scores.  The two tests, LTT and main, continue on parallel tracks today, and they are often confounded by casual NAEP observers.  The main NAEP, which was last administered in 2015, is the test relevant to this post and will be the only one discussed hereafter.  The NAEP governing board was concerned that the conventional metric for reporting results (scale scores) was meaningless to the public, so achievement standards (also known as performance standards) were introduced.  The percentages of students scoring at advanced, proficient, basic, and below basic levels are reported each time the main NAEP is given.

Does NAEP proficient mean grade level? 

The National Center for Education Statistics (NCES) states emphatically, “Proficient is not synonymous with grade level performance.” The National Assessment Governing Board has a brochure with information on NAEP, including a section devoted to myths and facts.  There, you will find this:

Myth: The NAEP Proficient level is like being on grade level.

 

Fact: Proficient on NAEP means competency over challenging subject matter.  This is not the same thing as being “on grade level,” which refers to performance on local curriculum and standards. NAEP is a general assessment of knowledge and skills in a particular subject.

Equating NAEP proficiency with grade level is bogus.  Indeed, the validity of the achievement levels themselves is questionable.  They immediately came under fire in reviews by the U.S. Government Accountability Office, the National Academy of Sciences, and the National Academy of Education.[1]  The National Academy of Sciences report was particularly scathing, labeling NAEP’s achievement levels as “fundamentally flawed.”

Despite warnings of NAEP authorities and critical reviews from scholars, some commentators, typically from advocacy groups, continue to confound NAEP proficient with grade level.  Organizations that support school reform, such as Achieve Inc. and Students First, prominently misuse the term on their websites.  Achieve presses states to adopt cut points aligned with NAEP proficient as part of new Common Core-based accountability systems.  Achieve argues that this will inform parents whether children “can do grade level work.” No, it will not.  That claim is misleading.

How unrealistic is NAEP proficient? 

Shortly after No Child Left Behind (NCLB) was signed into law, Robert Linn, one of the most prominent psychometricians of the past several decades, called “the target of 100% proficient or above according to the NAEP standards more like wishful thinking than a realistic possibility.”  History is on the side of that argument.  When the first main NAEP in mathematics was given in 1990, only 13% of eighth graders scored proficient and 2% scored advanced.  Imagine using “proficient” as synonymous with grade level: 85% scored below grade level!

The 1990 national average in eighth grade scale scores was 263 (see Table 1).  In 2015, the average was 282, a gain of 19 scale score points.

Table 1.  Main NAEP Eighth Grade Math Scores, by Achievement Level, 1990-2015

Year | Scale Score Average | Below Basic (%) | Basic (%) | Proficient (%) | Advanced (%) | Proficient and Above (%)
2015 | 282 | 29 | 38 | 25 | 8 | 33
2009 | 283 | 27 | 39 | 26 | 8 | 34
2003 | 278 | 32 | 39 | 23 | 5 | 28
1996 | 270 | 39 | 38 | 20 | 4 | 24
1990 | 263 | 48 | 37 | 13 | 2 | 15

That’s an impressive gain.  Analysts who study NAEP often use 10 points on the NAEP scale as a back-of-the-envelope estimate of one year’s worth of learning.  Eighth graders have gained almost two years.  The percentage of students scoring below basic has dropped from 48% in 1990 to 29% in 2015.  The percentage of students scoring proficient or above has more than doubled, from 15% to 33%.  That’s not bad news; it’s good news.

But the cut point for NAEP proficient is 299.  By that standard, two-thirds of eighth graders are still falling short.  Even students in private schools, despite hailing from more socioeconomically advantaged homes and in some cases being selectively admitted by schools, fail miserably at attaining NAEP proficiency.  More than half (53 percent) are below proficient. 

Today’s eighth graders have made it about halfway to NAEP proficient in 25 years, but they still need to gain almost two more years of math learning (17 points) to reach that level.  And, don’t forget, that’s just the national average, so even when that lofty goal is achieved, half of the nation’s students will still fall short of proficient.  Advocates of the NAEP proficient standard want it to apply to all students.  That is ridiculous.  Another way to think about it: proficient for today’s eighth graders reflects approximately what the average twelfth grader knew in mathematics in 1990.  Someday the average eighth grader may be able to do that level of mathematics.  But it won’t be soon, and it won’t be every student.
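The arithmetic behind “almost two years” of gains and the remaining distance to proficient can be checked in a few lines. The sketch below is illustrative only: it uses the national averages from Table 1 and the 299 cut point cited above, and it assumes the rough rule of thumb that 10 scale points equal about one year of learning, which is a back-of-the-envelope convention rather than an official NAEP conversion.

```python
# Back-of-the-envelope NAEP arithmetic (illustrative sketch).
# Assumption: ~10 NAEP scale points is roughly one year of learning.
POINTS_PER_YEAR = 10

avg_1990 = 263        # 1990 national average, grade 8 math (Table 1)
avg_2015 = 282        # 2015 national average, grade 8 math (Table 1)
proficient_cut = 299  # NAEP proficient cut score, grade 8 math

gain = avg_2015 - avg_1990             # points gained 1990-2015
remaining = proficient_cut - avg_2015  # points still needed to reach proficient
# Fraction of the 1990 gap to proficient that has since been closed
share_of_distance = gain / (proficient_cut - avg_1990)

print(f"Gain 1990-2015: {gain} points (~{gain / POINTS_PER_YEAR:.1f} years)")
print(f"Still needed: {remaining} points (~{remaining / POINTS_PER_YEAR:.1f} years)")
print(f"Share of 1990 gap closed: {share_of_distance:.0%}")
```

Run as is, it reports a 19-point gain (about 1.9 years), 17 points still needed (about 1.7 years), and roughly 53% of the 1990 gap closed, consistent with “about halfway” in the text.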

In the 2007 Brown Center Report on American Education, I questioned whether NAEP proficient is a reasonable achievement standard.[2]  That year, a study by Gary Phillips of American Institutes for Research was published that projected the 2007 TIMSS scores on the NAEP scale.  Phillips posed the question: based on TIMSS, how many students in other countries would score proficient or better on NAEP?  The study’s methodology only produces approximations, but they are eye-popping.

Here are just a few countries:

Table 2.  Projected Percent NAEP Proficient, Eighth Grade Math

Country | Projected Percent NAEP Proficient
Singapore | 73
Hong Kong SAR | 66
Korea, Rep. of | 65
Chinese Taipei | 61
Japan | 57
Belgium (Flemish) | 40
United States | 26
Israel | 24
England | 22
Italy | 17
Norway | 9

Singapore was the top scoring nation on TIMSS that year, but even there, more than a quarter of students fail to reach NAEP proficient.  Japan is not usually considered a slouch on international math assessments, but 43% of its eighth graders fall short.  The U.S. looks weak, with only 26% of students proficient.  But England, Israel, and Italy are even weaker.  Norway, a wealthy nation with per capita GDP almost twice that of the U.S., can only get 9 out of 100 eighth graders to NAEP proficient.

Finland isn’t shown in the table because it didn’t participate in the 2007 TIMSS.  But it did in 2011, with Finland and the U.S. scoring about the same in eighth grade math.  Had Finland’s eighth graders taken NAEP in 2011, it’s a good bet that the proportion scoring below NAEP proficient would have been similar to that in the U.S.  And yet articles such as “Why Finland Has the Best Schools” appear regularly in the U.S. press.[3]

Why it matters

The National Center for Education Statistics warns that federal law requires that NAEP achievement levels be used on a trial basis until the Commissioner of Education Statistics determines that the achievement levels are “reasonable, valid, and informative to the public.”  As the NCES website states, “So far, no Commissioner has made such a determination, and the achievement levels remain in a trial status.  The achievement levels should continue to be interpreted and used with caution.”

Confounding NAEP proficient with grade-level is uninformed.  Designating NAEP proficient as the achievement benchmark for accountability systems is certainly not cautious use.  If high school students are required to meet NAEP proficient to graduate from high school, large numbers will fail. If middle and elementary school students are forced to repeat grades because they fall short of a standard anchored to NAEP proficient, vast numbers will repeat grades.    

On NAEP, students are asked the highest level math course they’ve taken.  On the 2015 twelfth grade NAEP, 19% of students said they either were taking or had taken calculus.  These are the nation’s best and the brightest, the crème de la crème of math students.  Only one in five students works their way that high up the hierarchy of American math courses.  If you are over 45 years old and reading this, the proportion who took calculus in high school is less than one out of ten.  In the graduating class of 1990, for instance, only 7% of students had taken calculus.[4]

Unsurprisingly, calculus students are also typically taught by the nation’s most knowledgeable math teachers.  The nation’s elite math students paired with the nation’s elite math teachers: if any group can prove NAEP proficient a reasonable goal and succeed in getting all students over the NAEP proficiency bar, this is the group. 

But they don’t.  A whopping 30% score below proficient on NAEP.  For black and Hispanic calculus students, the figures are staggering.  Two-thirds of black calculus students score below NAEP proficient.  For Hispanics, the figure is 52%.  The nation’s pre-calculus students also fare poorly (69% below proficient).  Then the success rate falls off a cliff.  In the class of 2015, more than nine out of ten students whose highest math course was trigonometry or Algebra II fail to meet the NAEP proficient standard.

Table 3.  2015 NAEP Twelfth Grade Math, Proficient by Highest Math Course Taken

Highest Math Course Taken | Percentage Below NAEP Proficient
Calculus | 30
Pre-calculus | 69
Trig/Algebra II | 92

Source: NAEP Data Explorer

These data defy reason; they also refute common sense.  For years, educators have urged students to take the toughest courses they can possibly take.  Taken at face value, the data in Table 3 rip the heart out of that advice.  These are the toughest courses, and yet huge numbers of the nation’s star students, by any standard aligned with NAEP proficient, would be told that they have failed.  Some parents, misled by the confounding of proficient with grade level, might even mistakenly believe that their kids don’t know grade level math.

Conclusion 

NAEP proficient is not synonymous with grade level.  NAEP officials urge that proficient not be interpreted as reflecting grade level work.  It is a standard set much higher than that.  Scholarly panels have reviewed the NAEP achievement standards and found them flawed.  The highest scoring nations of the world would appear to be mediocre or poor performers if judged by the NAEP proficient standard.  Even large numbers of U.S. calculus students fall short.

As states consider building benchmarks for student performance into accountability systems, they should not use NAEP proficient, or any standard aligned with NAEP proficient, as a benchmark.  It is an unreasonable expectation, one that ill serves America’s students, parents, and teachers, as well as the effort to improve America’s schools.


[1] Shepard, L. A., Glaser, R., Linn, R., & Bohrnstedt, G. (1993) Setting Performance Standards For Student Achievement: Background Studies. Report of the NAE Panel on the Evaluation of the NAEP Trial State Assessment: An Evaluation of the 1992 Achievement Levels. National Academy of Education. 

[2] Loveless, Tom.  The 2007 Brown Center Report, pages 10-13.

[3] William Doyle, “Why Finland Has The Best Schools,” Los Angeles Times, March 18, 2016.

[4] NCES, America’s High School Graduates: Results of the 2009 NAEP High School Transcript Study.  See Table 8, p. 49.

Image Source: © Brian Snyder / Reuters

Common Core’s major political challenges for the remainder of 2016


The 2016 Brown Center Report (BCR), which was published last week, presented a study of Common Core State Standards (CCSS).   In this post, I’d like to elaborate on a topic touched upon but deserving further attention: what to expect in Common Core’s immediate political future. I discuss four key challenges that CCSS will face between now and the end of the year.

Let’s set the stage for the discussion.  The BCR study produced two major findings.  First, several changes that CCSS promotes in curriculum and instruction appear to be taking place at the school level.  Second, states that adopted CCSS and have been implementing the standards have registered about the same gains and losses on NAEP as states that either adopted and rescinded CCSS or never adopted CCSS in the first place.  These are merely associations and cannot be interpreted as saying anything about CCSS’s causal impact.  Politically, that doesn’t really matter. The big story is that NAEP scores have been flat for six years, an unprecedented stagnation in national achievement that states have experienced regardless of their stance on CCSS.  Yes, it’s unfair, but CCSS is paying a political price for those disappointing NAEP scores.  No clear NAEP differences have emerged between CCSS adopters and non-adopters to reverse that political dynamic.

TIMSS and PISA scores in November-December

NAEP has two separate test programs.  The scores released in 2015 were for the main NAEP, which began in 1990.  The long-term trend (LTT) NAEP, a different test that was first given in 1969, has not been administered since 2012.  It was scheduled to be given in 2016, but was cancelled due to budgetary constraints.  It was next scheduled for 2020, but last fall officials cancelled that round of testing as well, meaning that the LTT NAEP won’t be given again until 2024.

With the LTT NAEP on hold, only two international assessments will soon offer estimates of U.S. achievement that, like the two NAEP tests, are based on scientific sampling:  PISA and TIMSS.  Both tests were administered in 2015, and the new scores will be released around the Thanksgiving-Christmas period of 2016.  If PISA and TIMSS confirm the stagnant trend in U.S. achievement, expect CCSS to take another political hit.  America’s performance on international tests engenders a lot of hand-wringing anyway, so the reaction to disappointing PISA or TIMSS scores may be even more pronounced than what the disappointing NAEP scores generated.

Is teacher support still declining?

Watch Education Next’s survey on Common Core (usually released in August/September) and pay close attention to teacher support for CCSS.  The trend line has been heading steadily south. In 2013, 76 percent of teachers said they supported CCSS and only 12 percent were opposed.  In 2014, teacher support fell to 43 percent and opposition grew to 37 percent.  In 2015, opponents outnumbered supporters for the first time, 50 percent to 37 percent.  Further erosion of teacher support will indicate that Common Core’s implementation is in trouble at the ground level.  Don’t forget: teachers are the final implementers of standards.

An effort by Common Core supporters to change NAEP

The 2015 NAEP math scores were disappointing.  Watch for an attempt by Common Core supporters to change the NAEP math tests.  Michael Cohen, President of Achieve, a prominent pro-CCSS organization, released a statement about the 2015 NAEP scores that included the following: “The National Assessment Governing Board, which oversees NAEP, should carefully review its frameworks and assessments in order to ensure that NAEP is in step with the leadership of the states. It appears that there is a mismatch between NAEP and all states' math standards, no matter if they are common standards or not.”

Reviewing and potentially revising the NAEP math framework is long overdue.  The last adoption was in 2004.  The argument for changing NAEP to place greater emphasis on number and operations, revisions that would bring NAEP into closer alignment with Common Core, also has merit.  I have a longstanding position on the NAEP math framework.  In 2001, I urged the National Assessment Governing Board (NAGB) to reject the draft 2004 framework because it was weak on number and operations, and especially weak on assessing student proficiency with whole numbers, fractions, decimals, and percentages.

Common Core’s math standards are right in line with my 2001 complaint.  Despite my sympathy for Common Core advocates’ position, a change in NAEP should not be made because of Common Core.  In that 2001 testimony, I urged NAGB to end the marriage of NAEP with the 1989 standards of the National Council of Teachers of Mathematics, the math reform document that had guided the main NAEP since its inception.  Reform movements come and go, I argued.  NAGB’s job is to keep NAEP rigorously neutral.  The assessment’s integrity depends upon it.  NAEP was originally intended to function as a measuring stick, not as a PR device for one reform or another.  If NAEP is changed it must be done very carefully and should be rooted in the mathematics children must learn.  The political consequences of it appearing that powerful groups in Washington, DC are changing “The Nation’s Report Card” in order for Common Core to look better will hurt both Common Core and NAEP.

Will Opt Out grow?

Watch the Opt Out movement.  In 2015, several organized groups of parents refused to allow their children to take Common Core tests.  In New York state alone, about 60,000 opted out in 2014, skyrocketing to 200,000 in 2015.  Common Core testing for 2016 begins now and goes through May.  It will be important to see whether Opt Out can expand to other states, grow in numbers, and branch out beyond middle- and upper-income neighborhoods.

Conclusion

Common Core is now several years into implementation.  Supporters have had a difficult time persuading skeptics that any positive results have occurred.  The best evidence has been mixed on that question.  CCSS advocates say it is too early to tell, and we’ll just have to wait to see the benefits.  That defense won’t work much longer.  Time is running out.  The political challenges that Common Core faces for the remainder of this year may determine whether it survives.

Image Source: Jim Young / Reuters

Brookings Live: Reading and math in the Common Core era


Event Information

March 28, 2016
4:00 PM - 4:30 PM EDT

Online Only
Live Webcast

And more from the Brown Center Report on American Education


The Common Core State Standards have been adopted as the reading and math standards in more than forty states, but are the frontline implementers—teachers and principals—enacting them? As part of the 2016 Brown Center Report on American Education, Tom Loveless examines the degree to which CCSS recommendations have penetrated schools and classrooms. He specifically looks at the impact the standards have had on the emphasis of non-fiction vs. fiction texts in reading, and on enrollment in advanced courses in mathematics.

On March 28, the Brown Center hosted an online discussion of Loveless's findings, moderated by the Urban Institute's Matthew Chingos.  In addition to the Common Core, Loveless and Chingos discussed the other sections of the three-part Brown Center Report, including a study of the relationship between ability group tracking in eighth grade and AP performance in high school.


How well are American students learning?


Tom Loveless, a nonresident senior fellow in Governance Studies, explains his latest research on measuring achievement of American students.

“The bottom line here: the implementation of the common core has appeared to have very little impact on student achievement,” Loveless says. In this episode, he discusses whether the Common Core is failing our students, whether AP achievement is indicative of student success, and the role of principals as instructional leaders.

Also in this episode: Get to know Constanze Stelzenmüller, the Robert Bosch Senior Fellow in the Center on the United States and Europe, during our "Coffee Break” segment. Also stay tuned to hear the final episode in our centenary series with current and past Brookings scholars.

Show Notes:

The Brown Center Report on American Education

Brookings Centenary Timeline


Subscribe to the Brookings Cafeteria on iTunes, listen in all the usual places, and send feedback email to BCP@Brookings.edu.


Reading and math in the Common Core era

Principals as instructional leaders: An international perspective

2016 Brown Center Report on American Education: How Well Are American Students Learning?

Tracking and Advanced Placement

Has Common Core influenced instruction?


The release of 2015 NAEP scores showed national achievement stalling out or falling in reading and mathematics.  The poor results triggered speculation about the effect of Common Core State Standards (CCSS), the controversial set of standards adopted by more than 40 states since 2010.  Critics of Common Core tended to blame the standards for the disappointing scores.  Its defenders said it was too early to assess CCSS’s impact and that implementation would take many years to unfold.  William J. Bushaw, executive director of the National Assessment Governing Board, cited “curricular uncertainty” as the culprit.  Secretary of Education Arne Duncan argued that new standards typically experience an “implementation dip” in the early days of teachers actually trying to implement them in classrooms.

In the rush to argue whether CCSS has positively or negatively affected American education, these speculations are vague as to how the standards boosted or depressed learning.  They don’t provide a description of the mechanisms, the connective tissue, linking standards to learning.  Bushaw and Duncan come the closest, arguing that the newness of CCSS has created curriculum confusion, but the explanation falls flat for a couple of reasons.  Curriculum in the three states that adopted the standards, rescinded them, then adopted something else should be extremely confused.  But the 2013-2015 NAEP changes for Indiana, Oklahoma, and South Carolina were a little bit better than the national figures, not worse.[i]  In addition, surveys of math teachers conducted in the first year or two after the standards were adopted found that:  a) most teachers liked them, and b) most teachers said they were already teaching in a manner consistent with CCSS.[ii]  They didn’t mention uncertainty.  Recent polls, however, show those positive sentiments eroding. Mr. Bushaw might be mistaking disenchantment for uncertainty.[iii] 

For teachers, the novelty of CCSS should be dissipating.  Common Core’s advocates placed great faith in professional development to implement the standards.  Well, there’s been a lot of it.  Over the past few years, millions of teacher-hours have been devoted to CCSS training.  Whether all that activity had a lasting impact is questionable.  Randomized controlled trials have been conducted of two large-scale professional development programs.  Interestingly, although they pre-date CCSS, both programs attempted to promote the kind of “instructional shifts” championed by CCSS advocates.  The studies found that if teacher behaviors change from such training (and that’s not a certainty), the changes fade after a year or two.  Indeed, that’s a pattern evident in many studies of educational change: a pop at the beginning, followed by fade out.

My own work analyzing NAEP scores in 2011 and 2013 led me to conclude that the early implementation of CCSS was producing small, positive changes in NAEP.[iv] I warned that those gains “may be as good as it gets” for CCSS.[v] Advocates of the standards hope that CCSS will eventually produce long-term positive effects as educators learn how to use them. That’s a reasonable hypothesis. But it should now be apparent that a counter-hypothesis has equal standing: any positive effect of adopting Common Core may have already occurred. To be precise, the proposition is this: any effects from adopting new standards and attempting to change curriculum and instruction to conform to those standards occur early and are small in magnitude. Policymakers still have a couple of arrows left in the implementation quiver, accountability being the most powerful. Accountability systems have essentially been put on hold as NCLB sputtered to an end and new CCSS tests appeared on the scene. So the CCSS story isn’t over. Both hypotheses remain plausible.

Reading Instruction in 4th and 8th Grades

Back to the mechanisms, the connective tissue binding standards to classrooms.  The 2015 Brown Center Report introduced one possible classroom effect that is showing up in NAEP data: the relative emphasis teachers place on fiction and nonfiction in reading instruction.  The ink was still drying on new Common Core textbooks when a heated debate broke out about CCSS’s recommendation that informational reading should receive greater attention in classrooms.[vi] 

Fiction has long dominated reading instruction.  That dominance appears to be waning.



After 2011, something seems to have happened.  I am more persuaded that Common Core influenced the recent shift towards nonfiction than I am that Common Core has significantly affected student achievement—for either good or ill.   But causality is difficult to confirm or to reject with NAEP data, and trustworthy efforts to do so require a more sophisticated analysis than presented here.

Four lessons from previous education reforms

Nevertheless, the figures above reinforce important lessons that have been learned from previous top-down reforms.  Let’s conclude with four:

1.  There seems to be evidence that CCSS is having an impact on the content of reading instruction, moving from the dominance of fiction over nonfiction to near parity in emphasis.  Unfortunately, as Mark Bauerlein and Sandra Stotsky have pointed out, there is scant evidence that such a shift improves children’s reading.[vii]

2.  Reading more nonfiction does not necessarily mean that students will be reading higher quality texts, even if the materials are aligned with CCSS.   The Core Knowledge Foundation and the Partnership for 21st Century Learning, both supporters of Common Core, have very different ideas on the texts schools should use with the CCSS.[viii] The two organizations advocate for curricula having almost nothing in common.

3.  When it comes to the study of implementing education reforms, analysts tend to focus on the formal channels of implementation and the standard tools of public administration—for example, intergovernmental hand-offs (federal to state to district to school), alignment of curriculum, assessment and other components of the reform, professional development, getting incentives right, and accountability mechanisms.  Analysts often ignore informal channels, and some of those avenues funnel directly into schools and classrooms.[ix]  Politics and the media are often overlooked.  Principals and teachers are aware of the politics swirling around K-12 school reform.  Many educators undoubtedly formed their own opinions on CCSS and the fiction vs. nonfiction debate before the standard managerial efforts touched them.

4.  Local educators whose jobs are related to curriculum almost certainly have ideas about what constitutes good curriculum. It’s part of the profession. Major top-down reforms such as CCSS provide local proponents with political cover to pursue curricular and instructional changes that may be politically unpopular in the local jurisdiction. CCSS handed anyone who believes nonfiction should have a more prominent role in the K-12 curriculum a lever for promoting his or her beliefs. I’ve previously called these the “dog whistles” of top-down curriculum reform, subtle signals that give local advocates license to promote unpopular positions on controversial issues.


[i] In the four subject-grade combinations assessed by NAEP (reading and math at 4th and 8th grades), IN, SC, and OK all exceeded national gains on at least three out of four tests from 2013-2015.  NAEP data can be analyzed using the NAEP Data Explorer: http://nces.ed.gov/nationsreportcard/naepdata/.

[ii] In a Michigan State survey of teachers conducted in 2011, 77 percent of teachers, after being presented with selected CCSS standards for their grade, thought they were the same as their state’s former standards.  http://education.msu.edu/epc/publications/documents/WP33ImplementingtheCommonCoreStandardsforMathematicsWhatWeknowaboutTeacherofMathematicsin41S.pdf

[iii] In the Education Next surveys, 76 percent of teachers supported Common Core in 2013 and 12 percent opposed.  In 2015, 40 percent supported and 50 percent opposed. http://educationnext.org/2015-ednext-poll-school-reform-opt-out-common-core-unions.

[iv] I used variation in state implementation of CCSS to assign the states to three groups and analyzed differences in the groups’ NAEP gains.

[v] http://www.brookings.edu/~/media/research/files/reports/2015/03/bcr/2015-brown-center-report_final.pdf

[vi] http://www.edweek.org/ew/articles/2012/11/14/12cc-nonfiction.h32.html?qs=common+core+fiction

[vii] Mark Bauerlein and Sandra Stotsky (2012). “How Common Core’s ELA Standards Place College Readiness at Risk.” A Pioneer Institute White Paper.

[viii] Compare the P21 Common Core Toolkit (http://www.p21.org/our-work/resources/for-educators/1005-p21-common-core-toolkit) with Core Knowledge ELA Sequence (http://www.coreknowledge.org/ccss).  It is hard to believe that they are talking about the same standards in references to CCSS.

[ix] I elaborate on this point in Chapter 8, “The Fate of Reform,” in The Tracking Wars: State Reform Meets School Policy (Brookings Institution Press, 1999).



Image Source: © Patrick Fallon / Reuters
      
 
 





No, the sky is not falling: Interpreting the latest SAT scores


Earlier this month, the College Board released SAT scores for the high school graduating class of 2015. Both math and reading scores declined from 2014, continuing a steady downward trend that has been in place for the past decade. Pundits of contrasting political stripes seized on the scores to bolster their political agendas. Michael Petrilli of the Fordham Foundation argued that falling SAT scores show that high schools need more reform, presumably the reforms his organization supports, in particular, charter schools and accountability.* For Carol Burris of the Network for Public Education, the declining scores were evidence of the failure of policies her organization opposes, namely, Common Core, No Child Left Behind, and accountability.

Petrilli and Burris are both misusing SAT scores. The SAT is not designed to measure national achievement; the score losses from 2014 were minuscule; and most of the declines are probably the result of demographic changes in the SAT population. Let’s examine each of these points in greater detail.

The SAT is not designed to measure national achievement

It never was. The SAT was originally meant to measure a student’s aptitude for college independent of that student’s exposure to a particular curriculum. The test’s founders believed that gauging aptitude, rather than achievement, would serve the cause of fairness. A bright student from a high school in rural Nebraska or the mountains of West Virginia, they held, should have the same shot at attending elite universities as a student from an Eastern prep school, despite not having been exposed to the great literature and higher mathematics taught at prep schools. The SAT would measure reasoning and analytical skills, not the mastery of any particular body of knowledge. Its scores would level the playing field in terms of curricular exposure while providing a reasonable estimate of an individual’s probability of success in college.

Even in this capacity, the scores never suffice alone. Colleges and universities, including such luminaries as Harvard and Stanford, use them to make admissions decisions only in combination with a lot of other information: grade point averages, curricular resumes, essays, reference letters, and extra-curricular activities, all of which constitute a student’s complete application.

Today’s SAT has moved towards being a content-oriented test, but not entirely. Next year, the College Board will introduce a revised SAT to more closely reflect high school curricula. Even then, SAT scores should not be used to make judgements about U.S. high school performance, whether it’s a single high school, a state’s high schools, or all of the high schools in the country. The SAT sample is self-selected. In 2015, it only included about one-half of the nation’s high school graduates: 1.7 million out of approximately 3.3 million total. And that’s about one-ninth of approximately 16 million high school students.  Generalizing SAT scores to these larger populations violates a basic rule of social science. The College Board issues a warning when it releases SAT scores: “Since the population of test takers is self-selected, using aggregate SAT scores to compare or evaluate teachers, schools, districts, states, or other educational units is not valid, and the College Board strongly discourages such uses.”  

TIME’s coverage of the SAT release included a statement by Andrew Ho of Harvard University, who succinctly makes the point: “I think SAT and ACT are tests with important purposes, but measuring overall national educational progress is not one of them.”

The score changes from 2014 were minuscule

SAT scores changed very little from 2014 to 2015. Reading scores dropped from 497 to 495. Math scores also fell two points, from 513 to 511. Both declines are equal to about 0.017 standard deviations (SD).[i] To illustrate how small these changes truly are, let’s examine a metric I have used previously in discussing test scores. The average American male is 5’10” in height with a SD of about 3 inches. A 0.017 SD change in height is equal to about 1/20 of an inch (0.051). Do you really think you’d notice a difference in the height of two men standing next to each other if they only differed by 1/20th of an inch? You wouldn’t. Similarly, the change in SAT scores from 2014 to 2015 is trivial.[ii]
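The conversion in the paragraph above is simple arithmetic: divide the point change by the base-year standard deviation, then multiply the resulting SD units by the standard deviation of a familiar quantity. A minimal Python sketch, using the 2014 SD of 115 from note [i] and the 3-inch height SD quoted in the text; the function names are illustrative only.

```python
def effect_size(point_change: float, base_sd: float) -> float:
    """Express a score change in standard-deviation units of the base year."""
    return point_change / base_sd


def height_equivalent(sd_units: float, height_sd_inches: float = 3.0) -> float:
    """Translate an SD-unit change into inches of adult male height (SD assumed 3 in)."""
    return sd_units * height_sd_inches


# 2014-2015: reading 497 -> 495 and math 513 -> 511, both with a 2014 SD of 115.
reading_sd = effect_size(495 - 497, 115)
math_sd = effect_size(511 - 513, 115)
print(f"reading: {reading_sd:.3f} SD, math: {math_sd:.3f} SD")
print(f"height analogy: {height_equivalent(abs(reading_sd)):.3f} inches")
```

Carrying the unrounded value through gives about 0.052 inches rather than the 0.051 obtained from the rounded 0.017; either way it is roughly 1/20 of an inch. The same function applied to the decade-long changes with the 2005 SDs from note [iii] (13/113 and 9/115) gives about 0.12 SD for reading and 0.08 SD for math.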

A more serious concern is the SAT trend over the past decade. Since 2005, reading scores are down 13 points, from 508 to 495, and math scores are down nine points, from 520 to 511. These are equivalent to declines of 0.12 SD for reading and 0.08 SD for math.[iii] Representing changes that have accumulated over a decade, these losses are still quite small. In the Washington Post, Michael Petrilli asked “why is education reform hitting a brick wall in high school?” He also stated that “you see this in all kinds of evidence.”

You do not see a decline in the best evidence, the National Assessment of Educational Progress (NAEP). Unlike the SAT, NAEP is designed to monitor national achievement. Its test scores are based on a random sampling design, meaning that the scores can be construed as representative of U.S. students. NAEP administers two different tests to high school age students: the long-term trend NAEP (LTT NAEP), given to 17-year-olds, and the main NAEP, given to twelfth graders.

Table 1 compares the past ten years’ change in SAT scores with changes in NAEP.[iv] The long-term trend NAEP was not administered in 2005 or 2015, so the closest years it was given are shown. The NAEP tests show high school students making small gains over the past decade. They do not confirm the losses on the SAT.

Table 1. Comparison of changes in SAT, Main NAEP (12th grade), and LTT NAEP (17-year-olds) scores. Changes expressed as SD units of base year.

           SAT           Main NAEP     LTT NAEP
           2005-2015     2005-2015     2004-2012
Reading    -0.12*        +0.05*        +0.09*
Math       -0.08*        +0.09*        +0.03

*p<.05

Petrilli raised another concern by examining cohort trends in NAEP scores. The trend for the 17-year-old cohort of 2012, for example, can be constructed by using the scores of 13-year-olds in 2008 and 9-year-olds in 2004. By tracking NAEP changes over time in this manner, one can get a rough idea of a particular cohort’s achievement as students grow older and proceed through the school system. Examining three cohorts, Fordham’s analysis shows that the gains between ages 13 and 17 are about half as large as those registered between ages nine and 13. Kids gain more on NAEP when they are younger than when they are older.
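The cohort construction described above reduces to a year offset: LTT NAEP tests ages 9, 13, and 17, so one birth cohort appears four years apart at each age. A minimal Python sketch under that assumption; the function names are hypothetical, and no NAEP scores are filled in here (a reader would supply them, for example from the NAEP Data Explorer).

```python
def cohort_years(year_at_17: int) -> dict[int, int]:
    """Map each LTT NAEP age (9, 13, 17) to the year one cohort was tested at that age."""
    # Ages 9, 13, and 17 are four years apart, so subtract the age gap from the final year.
    return {age: year_at_17 - (17 - age) for age in (9, 13, 17)}


def cohort_gains(scores_by_age: dict[int, float]) -> tuple[float, float]:
    """Return (gain from age 9 to 13, gain from age 13 to 17) for one cohort's scale scores."""
    early = scores_by_age[13] - scores_by_age[9]
    late = scores_by_age[17] - scores_by_age[13]
    return early, late


print(cohort_years(2012))  # {9: 2004, 13: 2008, 17: 2012}
```

Comparing the two values returned by `cohort_gains` across cohorts is the kind of comparison Fordham reports; note that it assumes a point on the scale means the same amount of learning at every age, one of the assumptions questioned below.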

There is nothing new here. NAEP scholars have been aware of this phenomenon for a long time. Fordham points to particular elements of education reform that it favors—charter schools, vouchers, and accountability—as the probable cause. It is true that those reforms are more likely to target elementary and middle schools than high schools. But the research literature on age discrepancies in NAEP gains (which is not cited in the Fordham analysis) renders doubtful the thesis that education policies are responsible for the phenomenon.[v]

One explanation that has been offered is that high school age students may not try as hard as they could on NAEP. A 1996 analysis of NAEP answer sheets found that 25-to-30 percent of twelfth graders displayed off-task test behaviors—doodling, leaving items blank—compared to 13 percent of eighth graders and six percent of fourth graders. A 2004 national commission on the twelfth grade NAEP recommended incentives (scholarships, certificates, letters of recognition from the President) to boost high school students’ motivation to do well on NAEP. Why would high school seniors or juniors take NAEP seriously when this low stakes test is taken in the midst of taking SAT or ACT tests for college admission, end of course exams that affect high school GPA, AP tests that can affect placement in college courses, state accountability tests that can lead to their schools being deemed a success or failure, and high school exit exams that must be passed to graduate?[vi]

Other possible explanations for the phenomenon are: 1) differences in the scales between the ages tested on LTT NAEP (in other words, a one-point gain on the scale between ages nine and 13 may not represent the same amount of learning as a one-point gain between ages 13 and 17); 2) different rates of participation in NAEP among elementary, middle, and high schools;[vii] and 3) social trends that affect all high school students, not just those in public schools. The third possibility can be explored by analyzing trends for students attending private schools. If Fordham had disaggregated the NAEP data by public and private schools (the scores of Catholic school students are available), it would have found that the pattern among private school students is similar—younger students gain more than older students on NAEP. That similarity casts doubt on the notion that policies governing public schools are responsible for the smaller gains among older students.[viii]

Changes in the SAT population

Writing in the Washington Post, Carol Burris addresses the question of whether demographic changes have influenced the decline in SAT scores. She concludes that they have not, and in particular, she concludes that the growing proportion of students receiving exam fee waivers has probably not affected scores. She bases that conclusion on an analysis of SAT participation disaggregated by level of family income. Burris notes that the percentage of SAT takers has been stable across income groups in recent years. That criterion is not trustworthy. About 39 percent of students in 2015 declined to provide information on family income. The 61 percent that answered the family income question are probably skewed against low-income students who are on fee waivers (the assumption being that they may feel uncomfortable answering a question about family income).[ix] Don’t forget that the SAT population as a whole is a self-selected sample. A self-selected subsample from a self-selected sample tells us even less than the original sample, which told us almost nothing.

The fee waiver share of SAT takers increased from 21 percent in 2011 to 25 percent in 2015. The simple fact that fee waivers serve low-income families, whose children tend to be lower-scoring SAT takers, is important, but not the whole story here. Students from disadvantaged families have always taken the SAT. But they paid for it themselves. If an additional increment of disadvantaged families take the SAT because they don’t have to pay for it, it is important to consider whether the new entrants to the pool of SAT test takers possess unmeasured characteristics that correlate with achievement—beyond the effect already attributed to socioeconomic status.

Robert Kelchen, an assistant professor of higher education at Seton Hall University, calculated the effect on national SAT scores of just three jurisdictions (Washington, DC, Delaware, and Idaho) adopting policies of mandatory SAT testing paid for by the state. He estimated that these policies explain about 21 percent of the nationwide decline in test scores between 2011 and 2015. He also notes that a more thorough analysis, incorporating fee waivers of other states and districts, would surely boost that figure. Fee waivers in two dozen Texas school districts, for example, are granted to all juniors and seniors in high school. And all students in those districts (including Dallas and Fort Worth) are required to take the SAT beginning in the junior year. Such universal testing policies can increase access and serve the cause of equity, but they will also, at least for a while, lead to a decline in SAT scores.

Here, I offer my own back-of-the-envelope calculation of the relationship of demographic changes with SAT scores. The College Board reports test scores and participation rates for nine racial and ethnic groups.[x] These data are preferable to family income because a) almost all students answer the race/ethnicity question (only four percent are non-responses versus 39 percent for family income), and b) it seems a safe assumption that students are more likely to know their race or ethnicity than their family’s income.

The question tackled in Table 2 is this: how much would the national SAT scores have changed from 2005 to 2015 if the scores of each racial/ethnic group stayed exactly the same as in 2005, but each group’s proportion of the total population were allowed to vary? In other words, the scores are fixed at the 2005 level for each group—no change. The SAT national scores are then recalculated using the 2015 proportions that each group represented in the national population.

Table 2. SAT Scores and Demographic Changes in the SAT Population (2005-2015). Changes in SAT scale points.

           Projected change based on    Actual    Projected change as
           change in proportions        change    percentage of actual change
Reading    -9                           -13       69%
Math       -7                           -9        78%

The data suggest that two-thirds to three-quarters of the SAT score decline from 2005 to 2015 is associated with demographic changes in the test-taking population. The analysis is admittedly crude. The relationships are correlational, not causal. The race/ethnicity categories are surely serving as proxies for a bundle of other characteristics affecting SAT scores, some unobserved and others (e.g., family income, parental education, language status, class rank) that are included in the SAT questionnaire but produce data difficult to interpret.
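The reweighting behind Table 2 can be sketched as follows: hold each group’s mean score at its 2005 level, recompute the national mean with the 2015 group shares, and compare the result with the 2005 national mean. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, shares are renormalized over whichever groups are included (how the author handled the “No Response” category is not stated), and the two-group data in the usage line are invented purely for illustration, not College Board figures.

```python
def weighted_mean(shares: dict[str, float], scores: dict[str, float]) -> float:
    """National mean as the share-weighted average of group means (shares renormalized)."""
    total = sum(shares[g] for g in scores)
    return sum(shares[g] * scores[g] for g in scores) / total


def projected_change(scores_base: dict[str, float],
                     shares_base: dict[str, float],
                     shares_new: dict[str, float]) -> float:
    """Change in the national mean if group scores stay fixed at base-year levels
    while each group's share of test takers moves to the new year's proportion."""
    return weighted_mean(shares_new, scores_base) - weighted_mean(shares_base, scores_base)


# Hypothetical two-group illustration: the lower-scoring group grows from 30% to 40%.
change = projected_change({"A": 550, "B": 450}, {"A": 0.7, "B": 0.3}, {"A": 0.6, "B": 0.4})
print(round(change, 6))  # -10.0

# Last column of Table 2: projected change as a percentage of the actual change.
print(round(100 * -9 / -13), round(100 * -7 / -9))  # 69 78
```

Because group scores are frozen, any projected change comes entirely from shifts in composition; that is exactly why the result is correlational and cannot separate the race/ethnicity categories from the other characteristics they proxy for.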

Conclusion

Using an annual decline in SAT scores to indict high schools is bogus. The SAT should not be used to measure national achievement. SAT changes from 2014-2015 are tiny. The downward trend over the past decade represents a larger decline in SAT scores, but one that is still small in magnitude and correlated with changes in the SAT test-taking population.

In contrast to SAT scores, NAEP scores, which are designed to monitor national achievement, report slight gains for 17-year-olds over the past ten years. It is true that LTT NAEP gains are larger among students from ages nine to 13 than from ages 13 to 17, but research has uncovered several plausible explanations for why that occurs. The public should exercise great caution in accepting the findings of test score analyses. Test scores are often misinterpreted to promote political agendas, and much of the alarmist rhetoric provoked by small declines in scores is unjustified.


* In fairness to Petrilli, he acknowledges in his post, “The SATs aren’t even the best gauge—not all students take them, and those who do are hardly representative.”


[i] The 2014 SD for both SAT reading and math was 115.

[ii] A substantively trivial change may nevertheless reach statistical significance with large samples.

[iii] The 2005 SDs were 113 for reading and 115 for math.

[iv] Throughout this post, SAT’s Critical Reading (formerly, the SAT-Verbal section) is referred to as “reading.” I only examine SAT reading and math scores to allow for comparisons to NAEP. Moreover, SAT’s writing section will be dropped in 2016.

[v] The larger gains by younger vs. older students on NAEP are explored in greater detail in the 2006 Brown Center Report, pp. 10-11.

[vi] If these influences have remained stable over time, they would not affect trends in NAEP. It is hard to believe, however, that high stakes tests carry the same importance today to high school students as they did in the past.

[vii] The 2004 blue-ribbon commission report on the twelfth grade NAEP reported that by 2002 participation rates had fallen to 55 percent. That compares to 76 percent at eighth grade and 80 percent at fourth grade. Participation rates refer to the originally drawn sample, before replacements are made. NAEP is conducted with two-stage sampling—schools first, then students within schools—meaning that the low participation rate is a product of both depressed school (82 percent) and student (77 percent) participation. See page 8 of: http://www.nagb.org/content/nagb/assets/documents/publications/12_gr_commission_rpt.pdf

[viii] Private school data are spotty on the LTT NAEP because of problems meeting reporting standards, but analyses identical to Fordham’s can be conducted on Catholic school students for the 2008 and 2012 cohorts of 17-year-olds.

[ix] The non-response rate in 2005 was 33 percent.

[x] The nine response categories are: American Indian or Alaska Native; Asian, Asian American, or Pacific Islander; Black or African American; Mexican or Mexican American; Puerto Rican; Other Hispanic, Latino, or Latin American; White; Other; and No Response.


      
 
 





CNN’s misleading story on homework


Last week, CNN ran a back-to-school story on homework with the headline, “Kids Have Three Times Too Much Homework, Study Finds; What’s the Cost?” Homework is an important topic, especially for parents, but unfortunately, CNN’s story misleads rather than informs. The headline suggests American parents should be alarmed because their kids have too much homework. Should they? No, CNN has ignored the best evidence on that question, which suggests the opposite. The story relies on the results of one recent study of homework—a study that is limited in what it can tell us, mostly because of its research design. But CNN even gets its main findings wrong. The study suggests most students have too little homework, not too much.

The Study

The study that piqued CNN’s interest was conducted during four months (two in the spring and two in the fall) in Providence, Rhode Island. About 1,200 parents completed a survey about their children’s homework while waiting in 27 pediatricians’ offices. Is the sample representative of all parents in the U.S.? Probably not. Certainly CNN should have been a bit leery of portraying the results of a survey conducted in a single American city—any city—as evidence applying to a broader audience. More importantly, viewers are never told of the study’s significant limitations: that the data come from a survey conducted in only one city—in pediatricians’ offices by a self-selected sample of respondents.

The survey’s sampling design is a huge problem. Because the sample is non-random, there is no way of knowing if the results can be extrapolated to a larger population—even to families in Providence itself. Close to a third of respondents chose to complete the survey in Spanish, yet students enrolled in English language programs make up only about 22 percent of the Providence district’s enrollment. About one-fourth (26 percent) of survey respondents reported having one child in the family. According to the 2010 Census, the proportion of families nationwide with one child is much higher, at 43 percent.[i] The survey is skewed towards large, Spanish-speaking families. Their experience with homework could be unique, especially if young children in these families are learning English for the first time at school.

The survey was completed by parents who probably had a sick child as they were waiting to see a pediatrician. That’s a stressful setting. The response rate to the survey is not reported, so we don’t know how many parents visiting those offices chose not to fill out the survey. If the typical pediatrician sees 100 unique patients per month, in a four-month span the survey may have been offered to more than ten thousand parents in the 27 offices. The survey respondents, then, would be a tiny slice, 10 to 15 percent, of those eligible to respond. We also don’t know the public-private school breakout of the respondents, or how many were sending their children to charter schools. It would be interesting to see how many parents willingly send their children to schools with a heavy homework load.
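The estimate in the paragraph above is a short calculation: offices times patients per month times months gives the eligible pool, and respondents divided by that pool gives an implied response rate. A minimal Python sketch; the 100 unique patients per month is the author’s hypothetical, and treating each unique patient as one eligible parent is an added simplifying assumption.

```python
def implied_response_rate(respondents: int, offices: int,
                          patients_per_month: int, months: int) -> tuple[int, float]:
    """Return (eligible parents, share who responded), assuming one parent per unique patient."""
    eligible = offices * patients_per_month * months
    return eligible, respondents / eligible


eligible, rate = implied_response_rate(1200, 27, 100, 4)
print(eligible, f"{rate:.1%}")  # 10800 11.1%
```

Raising or lowering the assumed patient volume moves the rate proportionally, which is why the text gives a range rather than a single figure.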

I wish the CNN team responsible for this story had run the data by some of CNN’s political pollsters. Alarm bells surely would have gone off. The hazards of accepting a self-selected, demographically-skewed survey sample as representative of the general population are well known. Modern political polling—and its reliance on random samples—grew from an infamous mishap in 1936. A popular national magazine, the Literary Digest, distributed 10 million post cards for its readers to return as “ballots” indicating who they would vote for in the 1936 race for president. More than two million post cards were returned! A week before the election, the magazine confidently predicted that Alf Landon, the Republican challenger from Kansas, would defeat Franklin Roosevelt, the Democratic incumbent, by a huge margin: 57 percent to 43 percent. In fact, when the real election was held, the opposite occurred: Roosevelt won more than 60% of the popular vote and defeated Landon in a landslide. Pollsters learned that self-selected samples should be viewed warily. The magazine’s readership was disproportionately Republican to begin with, and sometimes disgruntled subjects are more likely to respond to a survey, no matter the topic, than the satisfied.
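The last point in the paragraph above, that one camp may be more likely to respond than another, can be illustrated with a little arithmetic: when two groups return surveys at different rates, the returned sample over-represents the more eager group even if every card is counted honestly. A minimal Python sketch with purely hypothetical numbers (these are not estimates of the 1936 Literary Digest respondents), and it ignores the separate problem of a skewed mailing list.

```python
def respondent_share(pop_share_a: float, response_rate_a: float, response_rate_b: float) -> float:
    """Share of returned surveys from group A when groups A and B respond at different rates."""
    returned_a = pop_share_a * response_rate_a
    returned_b = (1 - pop_share_a) * response_rate_b
    return returned_a / (returned_a + returned_b)


# Hypothetical: 40% of voters favor the challenger, but they return 30% of their cards
# while the incumbent's supporters return only 20%.
print(f"{respondent_share(0.40, 0.30, 0.20):.0%}")  # 50%
```

In this made-up case a 60-40 race looks like a dead heat among respondents, without a single card being misread.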

Here’s a very simple question: In its next poll on the 2016 presidential race, would CNN report the results of a survey of self-selected respondents in 27 pediatricians’ offices in Providence, Rhode Island as representative of national sentiment? Of course not. Then, please, CNN, don’t do so with education topics.

The Providence Study’s Findings

Let’s set aside methodological concerns and turn to CNN’s characterization of the survey’s findings. Did the study really show that most kids have too much homework? No, the headline that “Kids Have Three Times Too Much Homework” is not even an accurate description of the study’s findings. CNN’s on air coverage extended the misinformation. The online video of the coverage is tagged “Study: Your Kids Are Doing Too Much Homework.” The first caption that viewers see is “Study Says Kids Getting Way Too Much Homework.” All of these statements are misleading.

In the published version of the Providence study, the researchers plotted the average amount of time spent on homework by students’ grade.[ii] They then compared those averages to a “10 minutes per-grade” guideline that serves as an indicator of the “right” amount of homework. I have attempted to replicate the data here in table form (they were originally reported in a line graph) to make that comparison easier.[iii]

Contrary to CNN’s reporting, the data suggest—based on the ten minute per-grade rule—that most kids in this study have too little homework, not too much. Beginning in fourth grade, the average time spent on homework falls short of the recommended amount—a gap of only four minutes in fourth grade that steadily widens in later grades.

A more accurate headline would have been, “Study Shows Kids in Nine out of 13 Grades Have Too Little Homework.” It appears high school students (grades 9-12) spend only about half the recommended time on homework. Two hours of nightly homework is recommended for 12th graders. They are, after all, only a year away from college. But according to the Providence survey, their homework load is less than an hour.

So how in the world did CNN come up with the headline “Kids Have Three Times Too Much Homework?” By focusing on grades K-3 and ignoring all other grades. Here’s the reporting:

The study, published Wednesday in The American Journal of Family Therapy, found students in the early elementary school years are getting significantly more homework than is recommended by education leaders, in some cases nearly three times as much homework as is recommended.

 

The standard, endorsed by the National Education Association and the National Parent-Teacher Association, is the so-called "10-minute rule"— 10 minutes per-grade level per-night. That translates into 10 minutes of homework in the first grade, 20 minutes in the second grade, all the way up to 120 minutes for senior year of high school. The NEA and the National PTA do not endorse homework for kindergarten.

 

In the study involving questionnaires filled out by more than 1,100 English and Spanish speaking parents of children in kindergarten through grade 12, researchers found children in the first grade had up to three times the homework load recommended by the NEA and the National PTA.

 

Parents reported first-graders were spending 28 minutes on homework each night versus the recommended 10 minutes. For second-graders, the homework time was nearly 29 minutes, as opposed to the 20 minutes recommended.

 

And kindergartners, their parents said, spent 25 minutes a night on after-school assignments, according to the study

 

CNN focused on the four grades, K-3, in which homework exceeds the ten-minute rule. They ignored more than two-thirds of the grades. Even with this focus, a more accurate headline would have been, “Study Suggests First Graders in Providence, RI Have Three Times Too Much Homework.”
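The comparison above rests on the “10-minute rule” as CNN describes it: 10 minutes per grade per night, no endorsed homework for kindergarten. A minimal Python sketch that applies that rule to the first- and second-grade averages quoted from CNN’s story; the function names are hypothetical, and grade 0 stands for kindergarten.

```python
from typing import Optional


def recommended_minutes(grade: int) -> Optional[int]:
    """Nightly homework minutes under the 10-minute rule (grade 0 = kindergarten)."""
    if grade == 0:
        return None  # the NEA and National PTA do not endorse homework for kindergarten
    return 10 * grade


def ratio_to_guideline(reported_minutes: float, grade: int) -> Optional[float]:
    """Reported nightly minutes divided by the recommended minutes for that grade."""
    rec = recommended_minutes(grade)
    return None if rec is None else reported_minutes / rec


for grade, reported in [(1, 28), (2, 29)]:
    print(grade, reported, recommended_minutes(grade), round(ratio_to_guideline(reported, grade), 2))
# 1 28 10 2.8
# 2 29 20 1.45
```

Only first grade approaches “three times” the guideline; at grade 12 the rule calls for 120 minutes, so a reported load of under an hour falls below half of it.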

Conclusion

Homework is a controversial topic. People hold differing points of view as to whether there is too much, too little, or just the right amount of homework. That makes it vitally important that the media give accurate information on the empirical dimensions of the debate. The amount of homework kids should have is subject to debate. But the amount of homework kids actually have is an empirical question. We can debate whether it’s too hot outside, but the actual temperature should be a matter of measurement, not debate. It is hard to imagine a rational debate on homework that proceeds without knowing how much time kids currently spend on it. Imagine someone beginning a debate by saying, “I am arguing that kids have too much [substitute “too little” here for the pro-homework side] homework but I must admit that I have no idea how much they currently have.”

Data from the National Assessment of Educational Progress (NAEP) provide the best evidence we have on the amount of homework that kids have. NAEP’s sampling design allows us to make inferences about national trends, and the Long-Term Trend (LTT) NAEP offers data on homework since 1984. The latest LTT NAEP results (2012) indicate that the vast majority of nine-year-olds (83 percent) have less than an hour of homework each night. There has been an apparent uptick in the homework load, however, as 35 percent reported no homework in 1984, and only 22 percent reported no homework in 2012. MetLife also periodically surveys a representative sample of students, parents, and teachers on the homework issue. In the 2007 results, a majority of parents (52 percent) of elementary grade students (grades 3-6 in the MetLife survey) estimated their children had 30 minutes or less of homework.

The MetLife survey found that parents have an overwhelmingly positive view of the amount of homework their children are assigned. Nine out of ten parents responded that homework offers the opportunity to talk and spend time with their children, and most do not see homework as interfering with family time or as a major source of familial stress. Minority parents, in particular, reported believing homework is beneficial for students’ success at school and in the future.[iv]

That said, just as there were indeed Alf Landon voters in 1936, there are indeed children for whom homework is a struggle. Some bring home more than they can finish in a reasonable amount of time. A complication for researchers of elementary age children is that the same students who have difficulty completing homework may have other challenges: difficulties with reading, low achievement, and poor grades in school.[v] Parents who question the value of homework often have a host of complaints about their child’s school. It is difficult for researchers to untangle all of these factors and determine, in the instances where there are tensions, whether homework is the real cause. To their credit, the researchers who conducted the Providence study are aware of these constraints and present a number of hypotheses warranting further study with a research design that supports causal inference. That’s the value of this research, not CNN’s misleading reporting of the findings.


[i] Calculated from data in Table 64, U.S. Census Bureau, Statistical Abstract of the United States: 2012, page 56. http://www.census.gov/compendia/statab/2012/tables/12s0064.pdf.

[ii] The mean sample size for each grade is reported as 7.7 percent (or 90 students).  Confidence intervals for each grade estimate are not reported.

[iii] The data in Table I are estimates (by sight) from a line graph incremented in five percentage point intervals.

[iv] MetLife, MetLife Survey of the American Teacher: The Homework Experience, November 13, 2007, p. 15.

[v] Among high school students, the bias probably leans in the opposite direction: high achievers load up on AP, IB, and other courses that assign more homework.


Implementing Common Core: The problem of instructional time


This is part two of my analysis of instruction and Common Core’s implementation. I dubbed the three-part examination of instruction “The Good, The Bad, and the Ugly.” Having discussed “the good” in part one, I now turn to “the bad.” One particular aspect of the Common Core math standards, the treatment of standard algorithms in whole number arithmetic, will lead some teachers to waste instructional time.

A Model of Time and Learning

In 1963, psychologist John B. Carroll published a short essay, “A Model of School Learning,” in Teachers College Record. Carroll proposed a parsimonious model of learning that expressed the degree of learning (or what today is commonly called achievement) as a function of the ratio of time spent on learning to the time needed to learn.
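Written compactly, the model is the following. The notation is a paraphrase of the verbal model, not Carroll's own symbols, and it leaves the function f unspecified.

```latex
\[
\text{degree of learning} \;=\; f\!\left(\frac{\text{time actually spent on learning}}{\text{time needed to learn}}\right)
\]
```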

The numerator, time spent learning, has also been given the term opportunity to learn.  The denominator, time needed to learn, is synonymous with student aptitude.  By expressing aptitude as time needed to learn, Carroll refreshingly broke through his era’s debate about the origins of intelligence (nature vs. nurture) and the vocabulary that labels students as having more or less intelligence. He also spoke directly to a primary challenge of teaching: how to effectively produce learning in classrooms populated by students needing vastly different amounts of time to learn the exact same content.[i] 

The source of that variation is largely irrelevant to the constraints placed on instructional decisions.  Teachers obviously have limited control over the denominator of the ratio (they must take kids as they are) and less than one might think over the numerator.  Teachers allot time to instruction only after educational authorities have decided the number of hours in the school day, the number of days in the school year, the number of minutes in class periods in middle and high schools, and the amount of time set aside for lunch, recess, passing periods, various pull-out programs, pep rallies, and the like.  There are also announcements over the PA system, stray dogs that may wander into the classroom, and other unscheduled encroachments on instructional time.

The model has had a profound influence on educational thought.  As of July 5, 2015, Google Scholar reported 2,931 citations of Carroll’s article.  Benjamin Bloom’s “mastery learning” was deeply influenced by Carroll.  It is predicated on the idea that optimal learning occurs when time spent on learning—rather than content—is allowed to vary, providing to each student the individual amount of time he or she needs to learn a common curriculum.  This is often referred to as “students working at their own pace,” and progress is measured by mastery of content rather than seat time. David C. Berliner’s 1990 discussion of time includes an analysis of mediating variables in the numerator of Carroll’s model, including the amount of time students are willing to spend on learning.  Carroll called this persistence, and Berliner links the construct to student engagement and time on task—topics of keen interest to researchers today.  Berliner notes that although both are typically described in terms of motivation, they can be measured empirically in increments of time.     

Most applications of Carroll’s model have been interested in what happens when insufficient time is provided for learning—in other words, when the numerator of the ratio is significantly less than the denominator.  When that happens, students don’t have an adequate opportunity to learn.  They need more time. 

As applied to Common Core and instruction, one should also be aware of problems that arise from the inefficient distribution of time.  Time is a limited resource that teachers deploy in the production of learning.  Below I discuss instances when the CCSS-M may lead to the numerator in Carroll’s model being significantly larger than the denominator—when teachers spend more time teaching a concept or skill than is necessary.  Because time is limited and fixed, wasted time on one topic will shorten the amount of time available to teach other topics.  Excessive instructional time may also negatively affect student engagement.  Students who have fully learned content that continues to be taught may become bored; they must endure instruction that they do not need.
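A toy calculation makes the argument about excess time concrete. This minimal sketch rests on two assumptions of our own. First, f caps the ratio at 1, meaning full mastery. Second, any time beyond the time needed counts as surplus. The function names and hour figures are hypothetical.

```python
def degree_of_learning(time_spent: float, time_needed: float) -> float:
    """Assumed form of Carroll's f: the ratio spent/needed, capped at 1.0 (full mastery)."""
    if time_needed <= 0:
        raise ValueError("time_needed must be positive")
    return min(1.0, time_spent / time_needed)


def surplus_time(time_spent: float, time_needed: float) -> float:
    """Instructional time spent beyond what the student needed (assumed to be wasted)."""
    return max(0.0, float(time_spent - time_needed))


# Hypothetical case 1: 10 hours allotted to a topic a student needs 6 hours to learn.
print(degree_of_learning(10, 6), surplus_time(10, 6))  # 1.0 4.0
# Hypothetical case 2: 4 hours allotted, 6 needed (insufficient opportunity to learn).
print(degree_of_learning(4, 6), surplus_time(4, 6))
```

In the first case the student reaches full mastery with 4 hours to spare, and in the argument above those hours are taken from other topics. In the second case the student stops at two-thirds of mastery. That is the insufficient-time case that most applications of the model have studied.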

Standard Algorithms and Alternative Strategies

Jason Zimba, one of the lead authors of the Common Core Math standards, and Barry Garelick, a critic of the standards, had a recent, interesting exchange about when standard algorithms are called for in the CCSS-M.  A standard algorithm is a series of steps designed to compute accurately and quickly.  In the U.S., students are typically taught the standard algorithms of addition, subtraction, multiplication, and division with whole numbers.  Most readers of this post will recognize the standard algorithm for addition.  It involves lining up two or more multi-digit numbers according to place-value, with one number written over the other, and adding the columns from right to left with “carrying” (or regrouping) as needed.
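To make the procedure concrete, here is a minimal Python sketch of column addition as just described. It works digit by digit on whole numbers. The name `standard_add` is ours and is used for illustration only.

```python
def standard_add(a: int, b: int, *more: int) -> int:
    """Add whole numbers with the standard (column) algorithm.

    Numbers are lined up by place value and columns are added right to left,
    carrying (regrouping) the tens of each column sum into the next column.
    """
    addends = (a, b, *more)
    if any(n < 0 for n in addends):
        raise ValueError("whole numbers only")
    # Digits of each addend, ones place first, so index i is the 10**i column.
    columns = [[int(d) for d in reversed(str(n))] for n in addends]
    width = max(len(digits) for digits in columns)

    result = []  # digits of the answer, ones place first
    carry = 0
    for place in range(width):
        column_sum = carry + sum(d[place] for d in columns if place < len(d))
        result.append(column_sum % 10)  # digit written under this column
        carry = column_sum // 10        # amount carried to the next column
    while carry:                        # leftover carry becomes new leading digit(s)
        result.append(carry % 10)
        carry //= 10
    return int("".join(str(d) for d in reversed(result)))


print(standard_add(347, 285))  # 632
```

The same two steps (write the ones digit, carry the rest) repeat in every column. A procedure that works for two-digit numbers therefore scales unchanged to longer numbers and to more than two addends. For example, standard_add(58, 67, 99) returns 224.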

The standard algorithm is the only algorithm required for students to learn, although others are mentioned beginning with the first grade standards. Curiously, though, CCSS-M doesn’t require students to know the standard algorithms for addition and subtraction until fourth grade. This opens the door for a lot of wasted time. Garelick questioned the wisdom of teaching several alternative strategies for addition. He asked whether, under the Common Core, only the standard algorithm could be taught, or at least whether it could be taught first. As he explains:

Delaying teaching of the standard algorithm until fourth grade and relying on place value “strategies” and drawings to add numbers is thought to provide students with the conceptual understanding of adding and subtracting multi-digit numbers. What happens, instead, is that the means to help learn, explain or memorize the procedure become a procedure unto itself and students are required to use inefficient cumbersome methods for two years. This is done in the belief that the alternative approaches confer understanding, so are superior to the standard algorithm. To teach the standard algorithm first would in reformers’ minds be rote learning. Reformers believe that by having students using strategies in lieu of the standard algorithm, students are still learning “skills” (albeit inefficient and confusing ones), and these skills support understanding of the standard algorithm. Students are left with a panoply of methods (praised as a good thing because students should have more than one way to solve problems), that confuse more than enlighten. 

 

Zimba responded that the standard algorithm could, indeed, be the only method taught because it meets a crucial test: reinforcing knowledge of place value and the properties of operations.  He goes on to say that other algorithms also may be taught that are consistent with the standards, but that the decision to do so is left in the hands of local educators and curriculum designers:

In short, the Common Core requires the standard algorithm; additional algorithms aren’t named, and they aren’t required…Standards can’t settle every disagreement—nor should they. As this discussion of just a single slice of the math curriculum illustrates, teachers and curriculum authors following the standards still may, and still must, make an enormous range of decisions.

 

Zimba defends delaying mastery of the standard algorithm until fourth grade, referring to it as a “culminating” standard that he would, if he were teaching, introduce in earlier grades.  Zimba illustrates the curricular progression he would employ in a table, showing that he would introduce the standard algorithm for addition late in first grade (with two-digit addends) and then extend the complexity of its use and provide practice towards fluency until reaching the culminating standard in fourth grade. Zimba would introduce the subtraction algorithm in second grade and similarly ramp up its complexity until fourth grade.

 

It is important to note that in CCSS-M the word “algorithm” appears for the first time (in plural form) in the third grade standards:

 

3.NBT.2  Fluently add and subtract within 1000 using strategies and algorithms based on place value, properties of operations, and/or the relationship between addition and subtraction.
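The standard does not define either “strategies” or “algorithms.” The sketch below illustrates one strategy based on place value, “partial sums,” a method often taught alongside the standard algorithm. Choosing this particular strategy is our assumption, since CCSS-M does not specify it, and the function names are hypothetical.

```python
def place_value_parts(n: int) -> list[int]:
    """Expand a whole number by place value, e.g. 347 -> [300, 40, 7]."""
    digits = str(n)
    return [int(d) * 10 ** (len(digits) - 1 - i) for i, d in enumerate(digits)]


def partial_sums(a: int, b: int) -> tuple[list[int], int]:
    """Add hundreds to hundreds, tens to tens, and ones to ones, then combine.

    Returns the partial sums (largest place first) and the total.
    """
    pa, pb = place_value_parts(a), place_value_parts(b)
    # Pad the shorter expansion with zeros on the left so the places line up.
    width = max(len(pa), len(pb))
    pa = [0] * (width - len(pa)) + pa
    pb = [0] * (width - len(pb)) + pb
    partials = [x + y for x, y in zip(pa, pb)]
    return partials, sum(partials)


print(partial_sums(347, 285))  # ([500, 120, 12], 632)
```

Unlike the standard algorithm, this method records each place's sum separately (500, 120, 12) instead of carrying, and the regrouping happens only when the partial sums are combined.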

 

The term “strategies and algorithms” is curious.  Zimba explains, “It is true that the word ‘algorithms’ here is plural, but that could be read as simply leaving more choice in the hands of the teacher about which algorithm(s) to teach—not as a requirement for each student to learn two or more general algorithms for each operation!” 

 

I have described before the “dog whistles” embedded in the Common Core, signals to educational progressives (in this case, math reformers) that, despite these being standards, the CCSS-M will allow them great latitude. Using the plural “algorithms” in this third grade standard and not specifying the standard algorithm until fourth grade is a perfect example of such a dog whistle.

 

Why All the Fuss about Standard Algorithms?

It appears that the Common Core authors wanted to reach a political compromise on standard algorithms. 

 

Standard algorithms were a key point of contention in the “Math Wars” of the 1990s.   The 1997 California Framework for Mathematics required that students know the standard algorithms for all four operations—addition, subtraction, multiplication, and division—by the end of fourth grade.[ii]  The 2000 Massachusetts Mathematics Curriculum Framework called for learning the standard algorithms for addition and subtraction by the end of second grade and for multiplication and division by the end of fourth grade.  These two frameworks were heavily influenced by mathematicians (from Stanford in California and Harvard in Massachusetts) and quickly became favorites of math traditionalists.  In both states’ frameworks, the standard algorithm requirements were in direct opposition to the reform-oriented frameworks that preceded them—in which standard algorithms were barely mentioned and alternative algorithms or “strategies” were encouraged. 

 

Now that the CCSS-M has replaced these two frameworks, the requirement for knowing the standard algorithms in California and Massachusetts slips by as much as two years; for division, it moves from fourth grade all the way to sixth grade. That’s what reformers get in the compromise. They are given a green light to continue teaching alternative algorithms, as long as the algorithms are consistent with teaching place value and properties of arithmetic. But the standard algorithm is the only one students are required to learn. And that exclusivity is intended to please the traditionalists.

 

I agree with Garelick that the compromise leads to problems.  In a 2013 Chalkboard post, I described a first grade math program in which parents were explicitly requested not to teach the standard algorithm for addition when helping their children at home.  The students were being taught how to represent addition with drawings that clustered objects into groups of ten.  The exercises were both time consuming and tedious.  When the parents met with the school principal to discuss the matter, the principal told them that the math program was following the Common Core by promoting deeper learning.  The parents withdrew their child from the school and enrolled him in private school.

 

The value of standard algorithms is that they are efficient and packed with mathematics.  Once students have mastered single-digit operations and the meaning of place value, the standard algorithms reveal to students that they can take procedures that they already know work well with one- and two-digit numbers, and by applying them over and over again, solve problems with large numbers.  Traditionalists and reformers have different goals.  Reformers believe exposure to several algorithms encourages flexible thinking and the ability to draw on multiple strategies for solving problems.  Traditionalists believe that a bigger problem than students learning too few algorithms is that too few students learn even one algorithm.

 

I have been a critic of the math reform movement since I taught in the 1980s. But some of their complaints have merit. All too often, instruction on standard algorithms has left out meaning. As Karen C. Fuson and Sybilla Beckmann point out, “an unfortunate dichotomy” emerged in math instruction: teachers taught “strategies” that implied understanding and “algorithms” that implied procedural steps that were to be memorized. Michael Battista’s research has provided many instances of students clinging to algorithms without understanding. He gives an example of a student who has not quite mastered the standard algorithm for addition and makes numerous errors on a worksheet. On one item, for example, the student forgets to carry and calculates that 19 + 6 = 15. In a post-worksheet interview, the student counts 6 units from 19 and arrives at 25. Despite the obvious discrepancy (25 is not 15, the student agrees), he declares that his answers on the worksheet must be correct because the algorithm he used “always works.”[iii]
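The student's error can be reproduced mechanically. In this sketch the buggy procedure is our reconstruction of “forgetting to carry,” and both function names are hypothetical. Dropping the carry gives 15, while counting on from 19 gives 25. Comparing the two results is the kind of sense-making check the student did not act on.

```python
def add_without_carrying(a: int, b: int) -> int:
    """Column addition with the carrying step omitted: keep only the ones digit of each column sum."""
    da, db = str(a)[::-1], str(b)[::-1]  # digits, ones place first
    digits = []
    for place in range(max(len(da), len(db))):
        x = int(da[place]) if place < len(da) else 0
        y = int(db[place]) if place < len(db) else 0
        digits.append((x + y) % 10)  # the carry, (x + y) // 10, is silently dropped
    return int("".join(str(d) for d in reversed(digits)))


def count_on(start: int, steps: int) -> int:
    """Count up one at a time from start, as the student did in the interview."""
    total = start
    for _ in range(steps):
        total += 1
    return total


buggy = add_without_carrying(19, 6)
checked = count_on(19, 6)
print(buggy, checked, buggy == checked)  # 15 25 False
```

When no column needs carrying (12 + 13, for instance), the buggy procedure happens to give the right answer. That may be why a student could come to believe it “always works.”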

 

Math reformers rightfully argue that blind faith in procedure has no place in a thinking mathematical classroom. Who can disagree with that?  Students should be able to evaluate the validity of answers, regardless of the procedures used, and propose alternative solutions.  Standard algorithms are tools to help them do that, but students must be able to apply them, not in a robotic way, but with understanding.

 

Conclusion

Let’s return to Carroll’s model of time and learning.  I conclude by making two points—one about curriculum and instruction, the other about implementation.

In the study of numbers, a coherent K-12 math curriculum, similar to that of the previous California and Massachusetts frameworks, can be sketched in a few short sentences. Addition with whole numbers (including the standard algorithm) is taught in first grade, subtraction in second grade, multiplication in third grade, and division in fourth grade. Thus, the study of whole number arithmetic is completed by the end of fourth grade. Grades five through seven focus on rational numbers (fractions, decimals, percentages), and grades eight through twelve study advanced mathematics. Proficiency is sought along three dimensions: 1) fluency with calculations, 2) conceptual understanding, and 3) ability to solve problems.

Placing the CCSS-M standard for knowing the standard algorithms of addition and subtraction in fourth grade delays this progression by two years. Placing the standard for the division algorithm in sixth grade continues the two-year delay. For many fourth graders, time spent working on addition and subtraction will be wasted time. They already have a firm understanding of addition and subtraction. The same is true for many sixth graders: time devoted to the division algorithm will be wasted time that should be devoted to the study of rational numbers. The numerator in Carroll’s instructional time model will be greater than the denominator, indicating the inefficient allocation of time to instruction.

As Jason Zimba points out, not everyone agrees on when the standard algorithms should be taught, the alternative algorithms that should be taught, the manner in which any algorithm should be taught, or the amount of instructional time that should be spent on computational procedures.  Such decisions are made by local educators.  Variation in these decisions will introduce variation in the implementation of the math standards.  It is true that standards, any standards, cannot control implementation, especially the twists and turns in how they are interpreted by educators and brought to life in classroom instruction.  But in this case, the standards themselves are responsible for the myriad approaches, many unproductive, that we are sure to see as schools teach various algorithms under the Common Core.


[i] Tracking, ability grouping, differentiated learning, programmed learning, individualized instruction, and personalized learning (including today’s flipped classrooms) are all attempts to solve the challenge of student heterogeneity.  

[ii] An earlier version of this post incorrectly stated that the California framework required that students know the standard algorithms for all four operations by the end of third grade. I regret the error.

[iii] Michael T. Battista (2001). “Research and Reform in Mathematics Education,” pp. 32-84 in The Great Curriculum Debate: How Should We Teach Reading and Math? (T. Loveless, ed., Brookings Institution Press).


Common Core and classroom instruction: The good, the bad, and the ugly


This post continues a series begun in 2014 on implementing the Common Core State Standards (CCSS).  The first installment introduced an analytical scheme investigating CCSS implementation along four dimensions:  curriculum, instruction, assessment, and accountability.  Three posts focused on curriculum.  This post turns to instruction.  Although the impact of CCSS on how teachers teach is discussed, the post is also concerned with the inverse relationship, how decisions that teachers make about instruction shape the implementation of CCSS.

A couple of points before we get started.  The previous posts on curriculum led readers from the upper levels of the educational system—federal and state policies—down to curricular decisions made “in the trenches”—in districts, schools, and classrooms.  Standards emanate from the top of the system and are produced by politicians, policymakers, and experts.  Curricular decisions are shared across education’s systemic levels.  Instruction, on the other hand, is dominated by practitioners.  The daily decisions that teachers make about how to teach under CCSS—and not the idealizations of instruction embraced by upper-level authorities—will ultimately determine what “CCSS instruction” really means.

I ended the last post on CCSS by describing how curriculum and instruction can be so closely intertwined that the boundary between them is blurred.  Sometimes stating a precise curricular objective dictates, or at least constrains, the range of instructional strategies that teachers may consider.  That post focused on English-Language Arts.  The current post focuses on mathematics in the elementary grades and describes examples of how CCSS will shape math instruction.  As a former elementary school teacher, I offer my own personal opinion on these effects.

The Good

Certain aspects of the Common Core, when implemented, are likely to have a positive impact on the instruction of mathematics. For example, Common Core stresses that students recognize fractions as numbers on a number line.  The emphasis begins in third grade:

CCSS.MATH.CONTENT.3.NF.A.2
Understand a fraction as a number on the number line; represent fractions on a number line diagram.

CCSS.MATH.CONTENT.3.NF.A.2.A
Represent a fraction 1/b on a number line diagram by defining the interval from 0 to 1 as the whole and partitioning it into b equal parts. Recognize that each part has size 1/b and that the endpoint of the part based at 0 locates the number 1/b on the number line.

CCSS.MATH.CONTENT.3.NF.A.2.B
Represent a fraction a/b on a number line diagram by marking off a lengths 1/b from 0. Recognize that the resulting interval has size a/b and that its endpoint locates the number a/b on the number line.
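The two standards describe a concrete construction, and exact rational arithmetic can mirror it. This minimal sketch uses Python's standard `fractions` module. The function names are illustrative only.

```python
from fractions import Fraction


def unit_fraction_point(b: int) -> Fraction:
    """Partition the interval from 0 to 1 into b equal parts; the part based at 0 ends at 1/b."""
    return Fraction(1, b)


def mark_off(a: int, b: int) -> Fraction:
    """Lay a copies of the length 1/b end to end, starting at 0, and return the endpoint."""
    position = Fraction(0)
    for _ in range(a):
        position += unit_fraction_point(b)
    return position


print(mark_off(3, 4))  # 3/4
print(mark_off(5, 4))  # 5/4
```

Marking off 4 lengths of 1/4 lands exactly on 1. The construction does not have to stop there: 5 lengths of 1/4 locate 5/4, so the number-line model extends naturally to fractions greater than 1.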


When I first read this section of the Common Core standards, I stood up and cheered. Berkeley mathematician Hung-Hsi Wu has been working with teachers for years to get them to understand the importance of using number lines in teaching fractions.[1] American textbooks rely heavily on part-whole representations to introduce fractions. Typically, students see pizzas, apples, and other objects (usually other foods or money) that are divided into equal parts. Such models are limited. They work okay with simple addition and subtraction. Common denominators present a bit of a challenge, but ½ pizza can also be shown to be 2/4, a half dollar equal to two quarters, and so on.

With multiplication and division, all the little tricks students learned with whole number arithmetic suddenly go haywire. Students are accustomed to the fact that multiplying two whole numbers yields a product that is larger than either number being multiplied: 4 × 5 = 20, and 20 is larger than both 4 and 5.[2] How in the world can ¼ × 1/5 = 1/20, a number much smaller than either ¼ or 1/5? The part-whole representation has convinced many students that fractions are not numbers. Instead, they are seen as strange expressions comprising two numbers with a small horizontal bar separating them.
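Exact fractions make the contrast explicit. In this short sketch, Python's `fractions.Fraction` serves only as a calculator.

```python
from fractions import Fraction

# Whole numbers: the product is larger than either factor.
whole = 4 * 5
print(whole, whole > 4 and whole > 5)  # 20 True

# Fractions between 0 and 1: the product is smaller than either factor.
a, b = Fraction(1, 4), Fraction(1, 5)
product = a * b
print(product, product < a and product < b)  # 1/20 True
```

Multiplying by a positive number less than 1 takes only part of the other factor. The product therefore lands closer to 0 on the number line than either factor does.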

I taught sixth grade but occasionally visited my colleagues’ classes in the lower grades.  I recall one exchange with second or third graders that went something like this:

“Give me a number between seven and nine.”  Giggles. 

“Eight!” they shouted. 

“Give me a number between two and three.”  Giggles.

“There isn’t one!” they shouted. 

“Really?” I’d ask and draw a number line.  After spending some time placing whole numbers on the number line, I’d observe,  “There’s a lot of space between two and three.  Is it just empty?” 

Silence.  Puzzled little faces.  Then a quiet voice.  “Two and a half?”

You have no idea how many children do not make the transition to understanding fractions as numbers and, because of stumbling at this crucial stage, spend the rest of their careers as students of mathematics convinced that fractions are an impenetrable mystery. And that’s not true only of students. California adopted a test for teachers in the 1980s, the California Basic Educational Skills Test (CBEST). Beginning in 1982, even teachers already in the classroom had to pass it. I made a nice after-school and summer income tutoring colleagues who didn’t know fractions from Fermat’s Last Theorem. To be fair, primary teachers, teaching kindergarten or grades 1-2, would not teach fractions as part of their math curriculum and probably hadn’t worked with a fraction in decades. So they are no different than non-literary types who think Hamlet is just a play about a young guy who can’t make up his mind, has a weird relationship with his mother, and winds up dying at the end.

Division is the most difficult operation to grasp for those arrested at the part-whole stage of understanding fractions.  A problem that Liping Ma posed to teachers is now legendary.[3]

She asked small groups of American and Chinese elementary teachers to divide 1 ¾ by ½ and to create a word problem that illustrates the calculation.  All 72 Chinese teachers gave the correct answer and 65 developed an appropriate word problem.  Only nine of the 23 American teachers solved the problem correctly.  A single American teacher was able to devise an appropriate word problem.  Granted, the American sample was not selected to be representative of American teachers as a whole, but the stark findings of the exercise did not shock anyone who has worked closely with elementary teachers in the U.S.  They are often weak at math.  Many of the teachers in Ma’s study had vague ideas of an “invert and multiply” rule but lacked a conceptual understanding of why it worked.

A linguistic convention exacerbates the difficulty. Students may cling to the mistaken notion that “dividing in half” means “dividing by one-half.” It does not. Dividing in half means dividing by two. The number line can help clear up such confusion. Consider a basic, whole-number division problem for which third graders will already know the answer: 8 divided by 2 equals 4. It is evident that a segment 2 units in length (measured from 0 to 2) fits into a segment 8 units in length (measured from 0 to 8) exactly 4 times. Modeling 12 divided by 2 and other basic facts with 2 as a divisor will convince students that whole number division works quite well on a number line.

Now consider the number ½ as a divisor. It will become clear to students that 8 divided by ½ equals 16, and they can illustrate that fact on a number line by showing that a segment ½ unit in length fits into a segment 8 units in length exactly 16 times, into a segment 12 units in length 24 times, and so on. Students will be relieved to discover that on a number line, division with fractions works the same as division with whole numbers.
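The measurement view of division (how many times does one segment fit into another?) can be simulated directly. This minimal sketch uses Python's `fractions` module. The function name `times_segment_fits` is hypothetical.

```python
from fractions import Fraction


def times_segment_fits(length: Fraction, segment: Fraction) -> int:
    """Lay segment end to end starting at 0 and count how many whole copies fit within length."""
    if segment <= 0:
        raise ValueError("segment must be positive")
    count, position = 0, Fraction(0)
    while position + segment <= length:
        position += segment
        count += 1
    return count


half = Fraction(1, 2)
print(times_segment_fits(Fraction(8), Fraction(2)))  # 4
print(times_segment_fits(Fraction(8), half))         # 16
print(times_segment_fits(Fraction(12), half))        # 24
```

The count equals the quotient only when the segment fits exactly, as it does in all three examples here. When a piece is left over, the quotient has a fractional part.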

Now, let’s return to Liping Ma’s problem: 1 ¾ divided by ½. This problem would not be presented in third grade, but it might be in fifth or sixth grade. Students who have been working with fractions on a number line for two or three years will have little trouble solving it. They will see that the problem simply asks them to divide a line segment of 1 ¾ units by a segment of ½ unit. The answer is 3 ½. Some students might estimate that the solution is between 3 and 4 because 1 ¾ lies between 1 ½ and 2, which on the number line are the points at which the ½ unit segment, laid end to end, falls exactly three and four times. Other students will have learned about reciprocals and that multiplication and division are inverse operations. They will immediately grasp that dividing by ½ is the same as multiplying by 2, and since 1 ¾ × 2 = 3 ½, that is the answer. Creating a word problem involving string or rope or some other linearly measured object is also surely within their grasp.
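Both routes to 3 ½ can be checked with exact fractions. This sketch first follows the number-line reasoning: it counts whole ½-unit segments, then expresses the leftover piece as a fraction of one segment. It then confirms the result with invert-and-multiply. The variable names are illustrative.

```python
from fractions import Fraction

dividend = 1 + Fraction(3, 4)  # 1 3/4
divisor = Fraction(1, 2)

# Number-line reasoning: lay 1/2-unit segments end to end from 0.
whole_segments = 0
position = Fraction(0)
while position + divisor <= dividend:
    position += divisor
    whole_segments += 1
leftover = dividend - position                  # part of 1 3/4 not yet covered
quotient = whole_segments + leftover / divisor  # leftover as a fraction of one segment

print(whole_segments, leftover, quotient)  # 3 1/4 7/2

# Invert and multiply gives the same answer: 1 3/4 x 2 = 3 1/2.
assert quotient == dividend * (1 / divisor) == dividend / divisor == Fraction(7, 2)
```

A matching word problem asks how many ½-meter pieces can be cut from 1 ¾ meters of rope. The answer is three whole pieces, plus a leftover ¼ meter that is half of a fourth piece.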

Conclusion

I applaud the CCSS for introducing number lines and fractions in third grade.  I believe it will instill in children an important idea: fractions are numbers.  That foundational understanding will aid them as they work with more abstract representations of fractions in later grades.   Fractions are a monumental barrier for kids who struggle with math, so the significance of this contribution should not be underestimated.

I mentioned above that instruction and curriculum are often intertwined.  I began this series of posts by defining curriculum as the “stuff” of learning—the content of what is taught in school, especially as embodied in the materials used in instruction.  Instruction refers to the “how” of teaching—how teachers organize, present, and explain those materials.  It’s each teacher’s repertoire of instructional strategies and techniques that differentiates one teacher from another even as they teach the same content.  Choosing to use a number line to teach fractions is obviously an instructional decision, but it also involves curriculum.  The number line is mathematical content, not just a teaching tool.

Guiding third grade teachers towards using a number line does not guarantee effective instruction. In fact, it is reasonable to expect variation in how teachers will implement the CCSS standards listed above. A small body of research exists to guide practice. One of the best resources for teachers to consult is a practice guide published by the What Works Clearinghouse: Developing Effective Fractions Instruction for Kindergarten Through Eighth Grade (see full disclosure below).[4] The use of number lines is the guide’s second recommendation, but the guide also states that the evidence supporting the effectiveness of number lines in teaching fractions is inferred from studies involving whole numbers and decimals. We need much more research on how and when number lines should be used in teaching fractions.

Professor Wu states the following, “The shift of emphasis from models of a fraction in the initial stage to an almost exclusive model of a fraction as a point on the number line can be done gradually and gracefully beginning somewhere in grade four. This shift is implicit in the Common Core Standards.”[5]  I agree, but the shift is also subtle.  CCSS standards include the use of other representations—fraction strips, fraction bars, rectangles (which are excellent for showing multiplication of two fractions) and other graphical means of modeling fractions.  Some teachers will manage the shift to number lines adroitly—and others will not.  As a consequence, the quality of implementation will vary from classroom to classroom based on the instructional decisions that teachers make.  

The current post has focused on what I believe to be a positive aspect of CCSS based on the implementation of the standards through instruction.  Future posts in the series—covering the “bad” and the “ugly”—will describe aspects of instruction on which I am less optimistic.



[1] See H. Wu (2014). “Teaching Fractions According to the Common Core Standards,” https://math.berkeley.edu/~wu/CCSS-Fractions_1.pdf. Also see "What's Sophisticated about Elementary Mathematics?" http://www.aft.org/sites/default/files/periodicals/wu_0.pdf

[2] Students learn that 0 and 1 are exceptions and have their own special rules in multiplication.

[3] Liping Ma, Knowing and Teaching Elementary Mathematics.

[4] The practice guide can be found at: http://ies.ed.gov/ncee/wwc/pdf/practice_guides/fractions_pg_093010.pdf I serve as a content expert in elementary mathematics for the What Works Clearinghouse.  I had nothing to do, however, with the publication cited.

[5] Wu, page 3.


Brookings Live: Girls, boys, and reading


Event Information

March 26, 2015
2:00 PM - 2:30 PM EDT

Online Only
Live Webcast




Girls outscore boys on practically every reading test given to a large population. And they have for a long time. A 1942 Iowa study found girls performing better than boys on tests of reading comprehension, vocabulary, and basic language skills, and girls have outscored boys on every reading test ever given by the National Assessment of Educational Progress (NAEP). This gap is not confined to the U.S. Reading tests administered as part of the Progress in International Reading Literacy Study (PIRLS) and the Program for International Student Assessment (PISA) reveal that the gender gap is a worldwide phenomenon.

On March 26, join Brown Center experts Tom Loveless and Matthew Chingos as they discuss the latest Brown Center Report on American Education, which examines this phenomenon. Hear what Loveless's analysis revealed about where the gender gap stands today and how it has trended over the past several decades, in the U.S. and around the world.

Tune in below or via Spreecast where you can submit questions. 


The gender gap in reading


This week marks the release of the 2015 Brown Center Report on American Education, the fourteenth issue of the series.  One of the three studies in the report, “Girls, Boys, and Reading,” examines the gender gap in reading.  Girls consistently outscore boys on reading assessments.  They have for a long time.  A 1942 study in Iowa discovered that girls were superior to boys on tests of reading comprehension, vocabulary, and basic language skills.[i]  Girls have outscored boys on the National Assessment of Educational Progress (NAEP) reading assessments since the first NAEP was administered in 1971. 

I hope you’ll read the full study—and the other studies in the report—but allow me to summarize the main findings of the gender gap study here.

Eight assessments generate valid estimates of U.S. national reading performance: the Main NAEP, given at three grades (fourth, eighth, and 12th grades); the NAEP Long Term Trend (NAEP-LTT), given at three ages (ages nine, 13, and 17); the Progress in International Reading Literacy Study (PIRLS), an international assessment given at fourth grade; and the Program for International Student Assessment (PISA), an international assessment given to 15-year-olds.  Females outscore males on the most recent administration of all eight tests.  And the gaps are statistically significant.  Expressed in standard deviation units, they range from 0.13 on the NAEP-LTT at age nine to 0.34 on the PISA at age 15.

The gaps are shrinking.  At age nine, the gap on the NAEP-LTT declined from 13 scale score points in 1971 to five points in 2012.  During the same time period, the gap at age 13 shrank from 11 points to eight points, and at age 17, from 12 points to eight points.  Of these 1971-2012 declines, only the one at age nine is statistically significant, but at ages 13 and 17, declines since the gaps peaked in the 1990s are also statistically significant.  At all three ages, gaps are shrinking because males have made larger gains on NAEP than females.  In 2012, seventeen-year-old females scored the same on the NAEP reading test as they did in 1971.  Otherwise, males and females of all ages registered gains on the NAEP reading test from 1971 to 2012, with males’ gains outpacing those of females.

The gap is worldwide.  On the 2012 PISA, 15-year-old females outperformed males in all sixty-five participating countries.  Surprisingly, Finland, a nation known for both equity and excellence because of its performance on PISA, had the widest gap.  Girls scored 556 and boys scored 494, producing an astonishing gap of 62 points (about 0.66 standard deviations, or more than one and a half years of schooling).  Finland also had one of the world’s largest gender gaps on the 2000 PISA, and since then it has widened.  Both girls’ and boys’ reading scores declined, but boys’ declined more (26 points vs. 16 points).  To put the 2012 scores in perspective, consider that the OECD average on the reading test is 496.  Finland’s strong showing on PISA is completely dependent on the superior performance of its young women.
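The Finland figures can be checked with a little arithmetic. In this sketch the scores and declines come from the text; the standard deviation of 94 points is an assumption backed out of the text's own conversion (62 points is about 0.66 SD), and the 2000 gap is derived from the reported declines rather than reported directly.

```python
# Finland, 2012 PISA reading, as quoted in the text.
GIRLS_2012 = 556
BOYS_2012 = 494
GIRLS_DECLINE = 16  # points lost by girls, 2000 to 2012
BOYS_DECLINE = 26   # points lost by boys, 2000 to 2012
# Assumption: SD is not given in the text; 94 is implied by 62 / 0.66.
ASSUMED_SD = 94


def gap_2012() -> int:
    """Female minus male score in 2012."""
    return GIRLS_2012 - BOYS_2012


def implied_gap_2000() -> int:
    """Back out the 2000 gap by adding each group's decline to its 2012 score."""
    return (GIRLS_2012 + GIRLS_DECLINE) - (BOYS_2012 + BOYS_DECLINE)


def in_sd_units(points: float, sd: float = ASSUMED_SD) -> float:
    """Express a score gap as a fraction of a standard deviation."""
    return points / sd


print(gap_2012())                         # 62
print(implied_gap_2000())                 # 52
print(gap_2012() - implied_gap_2000())    # 10: widening equals the difference in declines
print(round(in_sd_units(gap_2012()), 2))  # 0.66
```

The widening of the gap is simply the boys' decline minus the girls' decline (26 minus 16), which is why both groups can fall while the gap grows.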

The gap seems to disappear by adulthood.  Tests of adult reading ability show no U.S. gender gap in reading by 25 years of age.  Scores even tilt toward men in later years. 

The words “seems to disappear” are used on purpose.  One must be careful with cross-sectional data not to assume that differences across age groups indicate an age-based trend.  A recent Gallup poll, for example, asked several different age groups how optimistic they were about finding jobs as adults.  Optimism fell from 68% in grade five to 48% in grade 12.  The authors concluded that “optimism about future job pursuits declines over time.”  The data do not support that conclusion.  The data were collected at a single point in time and cannot speak to what optimism may have been before or after that point.  Perhaps today’s 12th graders were even more pessimistic several years ago when they were in fifth grade.  Perhaps the 12th-graders are old enough to remember when unemployment spiked during the Great Recession and the fifth-graders are not.   Perhaps 12th-graders are simply savvier about job prospects and the pitfalls of seeking employment, topics on which fifth-graders are basically clueless.
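The cross-sectional trap can be illustrated with a toy model. In the hypothetical setup below, each cohort's optimism never changes as it ages; the cohorts simply start at different levels. Only the grade 5 (68%) and grade 12 (48%) values echo the poll in the text; the grade 8 value and the flat-trajectory assumption are invented.

```python
# Hypothetical model: optimism is a fixed trait of each cohort (no age effect at all).
# Grade 5 (68%) and grade 12 (48%) echo the poll cited in the text; grade 8 is invented.
COHORT_LEVEL = {5: 68, 8: 61, 12: 48}  # grade in the survey year -> % optimistic


def one_time_survey() -> list[tuple[int, int]]:
    """A single cross-section: one (grade, % optimistic) reading per cohort, same year."""
    return sorted(COHORT_LEVEL.items())


def follow_cohort(grade_in_survey_year: int, years_later: int) -> list[int]:
    """Repeated measurements of ONE cohort; under this model its level never moves."""
    level = COHORT_LEVEL[grade_in_survey_year]
    return [level for _ in range(years_later + 1)]


print(one_time_survey())    # [(5, 68), (8, 61), (12, 48)]: looks like a decline with age
print(follow_cohort(5, 7))  # the grade 5 cohort, tracked to grade 12, stays at 68
```

The snapshot shows optimism falling from grade 5 to grade 12 even though, by construction, no student's optimism ever declines.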

At least with the data cited above, we can track measures of the same cohorts’ gender gap in reading over time.  By analyzing multiple cross-sections (data collected at several different points in time), we can look at real change.  Those cohorts of nine-year-olds in the 1970s, 1980s, and 1990s are today, respectively, in their 50s, 40s, and 30s.  Girls were better readers than boys when these cohorts were children, but as grown-ups, women are not appreciably better readers than men.

Care must be taken nevertheless in drawing firm conclusions.  So-called cohort effects can bias measurements.  I mentioned the Great Recession.  Experiencing great historical cataclysms, especially war or economic chaos, may bias a particular cohort’s responses to survey questions or even its performance on tests.  American generations who experienced the Great Depression, World War II, and the Vietnam War, and more recently the digital revolution, the Great Recession, and the Iraq War, lived through events that uniquely shape their outlook on many aspects of life.

What Should be Done?

The gender gap is large, worldwide, and persistent through the K-12 years. What should be done about it?  Maybe nothing.  As just noted, the gap seems to dissipate by adulthood.  Moreover, crafting an effective remedy for the gender gap is made more difficult because we do not know its cause with certainty. Enjoyment of reading is a good example of a proposed remedy.  Many commentators argue that schools should make a concerted effort to get boys to enjoy reading more.  Enjoyment of reading is statistically correlated with reading performance, and the hope is that making reading more enjoyable would get boys to read more, thereby raising reading skills.

It makes sense, but I’m skeptical.  The fact that better readers enjoy reading more than poor readers—and that the relationship stands up even after boatloads of covariates are poured into a regression equation—is unpersuasive evidence of causality.  As I stated earlier, PISA produces data collected at a single point in time.  It isn’t designed to test causal theories.  Reverse causality is a profound problem.  Getting kids to enjoy reading more may in fact boost reading ability.  But the causal relationship might be flowing in the opposite direction, with enhanced skill leading to enjoyment.   The correlation could simply be indicating that people enjoy activities that they’re good at—a relationship that probably exists in sports, music, and many human endeavors, including reading.
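Reverse causality can be illustrated with a small simulation. In the toy model below (an assumption made for illustration, not a model of PISA), reading skill drives enjoyment and enjoyment has no effect on skill, yet a single snapshot still shows a strong positive correlation. The 0.6 effect size and the noise level are arbitrary, and `pearson` is written out so the block has no dependencies.

```python
import random


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n: int = 5000, seed: int = 1) -> tuple[list[float], list[float]]:
    """Toy model in which SKILL causes ENJOYMENT and never the reverse.

    The 0.6 coefficient and unit-normal noise are arbitrary assumptions.
    """
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n)]
    enjoyment = [0.6 * s + rng.gauss(0, 1) for s in skill]
    return skill, enjoyment


skill, enjoyment = simulate()
# One snapshot yields a clear positive correlation (about 0.5 in expectation),
# even though enjoyment has no causal effect on skill in this model.
print(round(pearson(enjoyment, skill), 2))
```

Because the correlation coefficient is symmetric, the same number would appear if the arrow ran from enjoyment to skill; the snapshot alone cannot tell the two stories apart.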

A Key Policy Question

A key question for policymakers is whether boosting boys’ enjoyment of reading would help make boys better readers.  I investigate by analyzing national changes in PISA reading scores from 2000, when the test was first given, to 2012.  PISA creates an Index of Reading Enjoyment based on several responses to a student questionnaire.  Enjoyment of reading has increased among males in some countries and decreased in others.  Is there any relationship between changes in boys’ enjoyment and changes in PISA reading scores? 

There is not.  The correlation coefficient for the two phenomena is -0.01.  Nations such as Germany raised boys’ enjoyment of reading and increased their reading scores by about 10 points on the PISA scale.  France, on the other hand, also raised boys’ enjoyment of reading, but French males’ reading scores declined by 15 points.  Ireland increased how much boys enjoy reading by a little bit but the boys’ scores fell a whopping 37 points. Poland’s males actually enjoyed reading less in 2012 than in 2000, but their scores went up more than 14 points.  No relationship.
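The -0.01 figure is a Pearson correlation between two sets of country-level changes. A minimal sketch of that calculation follows; the four countries and their values are hypothetical toy numbers (not PISA results), chosen so that countries with rising enjoyment split between gains and losses, as Germany and France did.

```python
# Hypothetical toy values, NOT PISA data: (boys' enjoyment index, boys' reading score)
# in 2000 and in 2012 for four made-up countries.
WAVES = {
    "Country A": ((0.0, 500), (0.2, 510)),  # enjoyment up, score up
    "Country B": ((0.0, 500), (0.2, 490)),  # enjoyment up, score down
    "Country C": ((0.2, 500), (0.0, 510)),  # enjoyment down, score up
    "Country D": ((0.2, 500), (0.0, 490)),  # enjoyment down, score down
}


def changes(waves):
    """Per-country change (2012 minus 2000) in enjoyment and in score."""
    d_enjoy, d_score = [], []
    for (e2000, s2000), (e2012, s2012) in waves.values():
        d_enjoy.append(e2012 - e2000)
        d_score.append(s2012 - s2000)
    return d_enjoy, d_score


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


d_enjoy, d_score = changes(WAVES)
r = pearson(d_enjoy, d_score)
print(round(r, 2))  # 0.0: changes in enjoyment say nothing about changes in scores
```

Correlating changes rather than levels removes stable differences between countries, which is why this comparison is more informative than a single cross-section, though it still falls short of a causal test.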

Some Final Thoughts

How should policymakers proceed?  Large, cross-sectional assessments are good for measuring academic performance at one point in time.  They are useful for generating hypotheses based on observed relationships, but they are not designed to confirm or reject causality.  To do that, randomized controlled trials should be conducted of programs purporting to boost reading enjoyment.  Also, consider that it ultimately may not matter whether enjoying reading leads to more proficient readers.  Enjoyment of reading may be an end worthy of attainment irrespective of its relationship to achievement.  In that case, RCTs should carefully evaluate the impact of interventions on both enjoyment of reading and reading achievement, whether the two are related or not.  



[i] J.B. Stroud and E.F. Lindquist, “Sex differences in achievement in the elementary and secondary schools,” Journal of Educational Psychology, vol. 33(9) (Washington, D.C.: American Psychological Association, 1942), 657–667.


Measuring effects of the Common Core


Part II of the 2015 Brown Center Report on American Education

Over the next several years, policy analysts will evaluate the impact of the Common Core State Standards (CCSS) on U.S. education.  The task promises to be challenging.  The question most analysts will focus on is whether the CCSS is good or bad policy.  This section of the Brown Center Report (BCR) tackles a set of seemingly innocuous questions compared to the hot-button question of whether Common Core is wise or foolish.  The questions all have to do with when Common Core actually started, or more precisely, when the Common Core started having an effect on student learning.  And if it hasn’t yet had an effect, how will we know that CCSS has started to influence student achievement? 

The analysis below probes this issue empirically, hopefully persuading readers that deciding when a policy begins is elemental to evaluating its effects.  The question of a policy’s starting point is not always easy to answer.  Yet the answer has consequences.  You can’t figure out whether a policy worked or not unless you know when it began.[i] 

The analysis uses surveys of state implementation to model different CCSS starting points for states and produces a second early report card on how CCSS is doing.  The first report card, focusing on math, was presented in last year’s BCR.  The current study updates state implementation ratings that were presented in that report and extends the analysis to achievement in reading.  The goal is not only to estimate CCSS’s early impact, but also to lay out a fair approach for establishing when the Common Core’s impact began—and to do it now before data are generated that either critics or supporters can use to bolster their arguments.  The experience of No Child Left Behind (NCLB) illustrates this necessity.

Background

After the 2008 National Assessment of Educational Progress (NAEP) scores were released, former Secretary of Education Margaret Spellings claimed that the new scores showed “we are on the right track.”[ii] She pointed out that NAEP gains in the previous decade, 1999-2009, were much larger than in prior decades.  Mark Schneider of the American Institutes for Research (and a former Commissioner of the National Center for Education Statistics [NCES]) reached a different conclusion. He compared NAEP gains from 1996-2003 to 2003-2009 and declared NCLB’s impact disappointing: “The pre-NCLB gains were greater than the post-NCLB gains.”[iii]  It is important to highlight that Schneider used the 2003 NAEP scores as the starting point for assessing NCLB.  A report from FairTest on the tenth anniversary of NCLB used the same demarcation for pre- and post-NCLB time frames.[iv]  FairTest is an advocacy group critical of high-stakes testing (and harshly critical of NCLB), but if the 2003 starting point for NAEP is accepted, its conclusion is indisputable: “NAEP score improvement slowed or stopped in both reading and math after NCLB was implemented.” 

Choosing 2003 as NCLB’s starting date is intuitively appealing.  The law was introduced, debated, and passed by Congress in 2001.  President Bush signed NCLB into law on January 8, 2002.  It takes time to implement any law.  The 2003 NAEP is arguably the first chance that the assessment had to register NCLB’s effects. 

Selecting 2003 is consequential, however.  Some of the largest gains in NAEP’s history were registered between 2000 and 2003.  Once 2003 is established as a starting point (or baseline), pre-2003 gains become “pre-NCLB.”  But what if the 2003 NAEP scores were influenced by NCLB? Experiments evaluating the effects of new drugs collect baseline data from subjects before treatment, not after the treatment has begun.  Similarly, evaluating the effects of public policies requires that baseline data not be influenced by the policies under evaluation.   
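The baseline problem can be made concrete with a short sketch. The score series below is invented for illustration (it is not NAEP data), and `annual_gain` and `pre_post` are hypothetical helpers; the point is only that the same series supports opposite verdicts depending on which year is treated as the policy's start.

```python
# Invented score series for illustration only; these are NOT actual NAEP results.
SCORES = {1996: 200, 2000: 202, 2003: 211, 2005: 213, 2007: 215, 2009: 216}


def annual_gain(series: dict[int, float], start: int, end: int) -> float:
    """Average points gained per year between two assessment years."""
    return (series[end] - series[start]) / (end - start)


def pre_post(series: dict[int, float], baseline: int) -> tuple[float, float]:
    """Yearly gains before and after a chosen policy baseline year."""
    first, last = min(series), max(series)
    return annual_gain(series, first, baseline), annual_gain(series, baseline, last)


for baseline in (2000, 2003):
    pre, post = pre_post(SCORES, baseline)
    print(f"baseline {baseline}: pre {pre:.2f}/yr, post {post:.2f}/yr")
```

With 2003 as the baseline, the pre-period looks faster (1.57 vs. 0.83 points per year) and the policy looks disappointing; with 2000, the post-period looks faster (1.56 vs. 0.50). Nothing about the scores changed, only the assumed starting date.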

Avoiding such problems is particularly difficult when state or local policies are adopted nationally.  The federal effort to establish a speed limit of 55 miles per hour in the 1970s is a good example.  Several states already had speed limits of 55 mph or lower prior to the federal law’s enactment.  Moreover, a few states lowered speed limits in anticipation of the federal limit while the bill was debated in Congress.  On the day President Nixon signed the bill into law—January 2, 1974—the Associated Press reported that only 29 states would be required to lower speed limits.  Evaluating the effects of the 1974 law with national data but neglecting to adjust for what states were already doing would obviously yield tainted baseline data.

There are comparable reasons for questioning 2003 as a good baseline for evaluating NCLB’s effects.  The key components of NCLB’s accountability provisions—testing students, publicizing the results, and holding schools accountable for results—were already in place in nearly half the states.  In some states they had been in place for several years.  The 1999 iteration of Quality Counts, Education Week’s annual report on state-level efforts to improve public education, entitled Rewarding Results, Punishing Failure, was devoted to state accountability systems and the assessments underpinning them. Testing and accountability are especially important because they have drawn fire from critics of NCLB, a law that wasn’t passed until years later.

The Congressional debate of NCLB legislation took all of 2001, allowing states to pass anticipatory policies.  Derek Neal and Diane Whitmore Schanzenbach reported that “with the passage of NCLB lurking on the horizon,” Illinois placed hundreds of schools on a watch list and declared that future state testing would be high stakes.[v] In the summer and fall of 2002, with NCLB now the law of the land, state after state released lists of schools falling short of NCLB’s requirements.  Then the 2002-2003 school year began, during which the 2003 NAEP was administered.  Using 2003 as a NAEP baseline assumes that none of these activities—previous accountability systems, public lists of schools in need of improvement, anticipatory policy shifts—influenced achievement.  That is unlikely.[vi]

The Analysis

Unlike NCLB, there was no “pre-CCSS” state version of Common Core.  States vary in how quickly and aggressively they have implemented CCSS.  For the BCR analyses, two indexes were constructed to model CCSS implementation.  They are based on surveys of state education agencies and named for the two years that the surveys were conducted.  The 2011 survey reported the number of programs (e.g., professional development, new materials) on which states reported spending federal funds to implement CCSS.  Strong implementers spent money on more activities.  The 2011 index was used to investigate eighth grade math achievement in the 2014 BCR.  A new implementation index was created for this year’s study of reading achievement.  The 2013 index is based on a survey asking states when they planned to complete full implementation of CCSS in classrooms.  Strong states aimed for full implementation by 2012-2013 or earlier.      
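The 2013 index boils down to a simple classification rule. The sketch below assumes a school year is encoded by its spring year (so 2013 means 2012-2013) and treats adopters with a later or missing target date as medium implementers; those encodings and the state labels are illustrative assumptions, not details from the surveys.

```python
from typing import Optional

# Rule from the text: strong = full classroom implementation planned by 2012-2013 or earlier.
# Assumption: a school year is encoded by its spring year, so 2013 means 2012-2013.
STRONG_CUTOFF = 2013


def classify_2013(adopted: bool, planned_full_year: Optional[int]) -> str:
    """Return 'non-adopter', 'strong', or 'medium' for one state."""
    if not adopted:
        return "non-adopter"
    if planned_full_year is not None and planned_full_year <= STRONG_CUTOFF:
        return "strong"
    # Assumption: adopters planning later (or not reporting a year) count as medium.
    return "medium"


# Hypothetical survey responses: state -> (adopted CCSS?, planned full-implementation year)
survey = {
    "State A": (True, 2012),
    "State B": (True, 2015),
    "State C": (False, None),
}
index = {state: classify_2013(*answer) for state, answer in survey.items()}
print(index)  # {'State A': 'strong', 'State B': 'medium', 'State C': 'non-adopter'}
```

Because the rule depends on when the survey was taken, rerunning it on a later survey can move states between categories, which is what happened between the 2011 and 2013 indexes.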

Fourth grade NAEP reading scores serve as the achievement measure.  Why fourth grade and not eighth?  Reading instruction is a key activity of elementary classrooms but by eighth grade has all but disappeared.  What remains of “reading” as an independent subject, which has typically morphed into the study of literature, is subsumed under the English-Language Arts curriculum, a catchall term that also includes writing, vocabulary, listening, and public speaking.  Most students in fourth grade are in self-contained classes; they receive instruction in all subjects from one teacher.  The impact of CCSS on reading instruction—the recommendation that non-fiction take a larger role in reading materials is a good example—will be concentrated in the activities of a single teacher in elementary schools. The burden for meeting CCSS’s press for non-fiction, on the other hand, is expected to be shared by all middle and high school teachers.[vii] 

Results

Table 2-1 displays NAEP gains using the 2011 implementation index.  The four-year period between 2009 and 2013 is broken down into two parts: 2009-2011 and 2011-2013.  Nineteen states are categorized as “strong” implementers of CCSS on the 2011 index, and from 2009 to 2013, they outscored the four states that did not adopt CCSS by a little more than one scale score point (0.87 vs. -0.24 for a 1.11 difference).  The non-adopters are the logical control group for CCSS, but with only four states in that category (Alaska, Nebraska, Texas, and Virginia), the group is sensitive to big changes in one or two states.  Alaska and Texas both experienced a decline in fourth grade reading scores from 2009 to 2013.

The 1.11 point advantage in reading gains for strong CCSS implementers is similar to the 1.27 point advantage reported last year for eighth grade math.  Both are small.  The reading difference in favor of CCSS is equal to approximately 0.03 standard deviations of the 2009 baseline reading score.  Also note that the differences were greater in 2009-2011 than in 2011-2013 and that the “medium” implementers performed as well as or better than the strong implementers over the entire four-year period (gain of 0.99).

Table 2-2 displays calculations using the 2013 implementation index.  Twelve states are rated as strong CCSS implementers, seven fewer than on the 2011 index.[viii]  Data for the non-adopters are the same as in the previous table.  In 2009-2013, the strong implementers gained 1.27 NAEP points compared to -0.24 among the non-adopters, a difference of 1.51 points.  The thirty-four states rated as medium implementers gained 0.82.  The strong implementers on this index are states that reported full implementation of CCSS-ELA by 2013.  Their larger gain in 2011-2013 (1.08 points) distinguishes them from the strong implementers in the previous table.  The overall advantage of 1.51 points over non-adopters represents about 0.04 standard deviations of the 2009 NAEP reading score, not a difference with real world significance.  Taken together, the 2011 and 2013 indexes estimate that NAEP reading gains from 2009-2013 were one to one and one-half scale score points larger in the strong CCSS implementation states compared to the states that did not adopt CCSS.
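The advantages and standard-deviation conversions in the last two paragraphs follow from subtraction and division. The gains below are the ones reported above; the baseline standard deviation is not given in the text, so 37 points is an assumption inferred from the text's own ratio (1.11 points is about 0.03 SD).

```python
# Reported 2009-2013 NAEP fourth-grade reading gains (scale score points).
NON_ADOPTER_GAIN = -0.24
STRONG_GAIN = {"2011 index": 0.87, "2013 index": 1.27}
# Assumption: baseline SD inferred from 1.11 / 0.03; it is not stated in the text.
ASSUMED_SD = 37.0


def advantage_over_non_adopters(gain: float) -> float:
    """Strong implementers' gain minus the non-adopters' gain."""
    return gain - NON_ADOPTER_GAIN


def in_sd_units(points: float, sd: float = ASSUMED_SD) -> float:
    """Convert a scale-score difference to standard deviation units."""
    return points / sd


for index_name, gain in STRONG_GAIN.items():
    diff = advantage_over_non_adopters(gain)
    print(f"{index_name}: {diff:.2f} points, about {in_sd_units(diff):.2f} SD")
```

This prints 1.11 points (about 0.03 SD) for the 2011 index and 1.51 points (about 0.04 SD) for the 2013 index, matching the figures in the text under the assumed SD.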

Common Core and Reading Content

As noted above, the 2013 implementation index is based on when states scheduled full implementation of CCSS in classrooms.  Other than reading achievement, does the index seem to reflect changes in any other classroom variable believed to be related to CCSS implementation?  If the answer is “yes,” that would bolster confidence that the index is measuring changes related to CCSS implementation. 

Let’s examine the types of literature that students encounter during instruction.  Perhaps the most controversial recommendation in the CCSS-ELA standards is the call for teachers to shift the content of reading materials away from stories and other fictional forms of literature in favor of more non-fiction.  NAEP asks fourth grade teachers the extent to which they teach fiction and non-fiction over the course of the school year (see Figure 2-1). 

Historically, fiction dominates fourth grade reading instruction.  It still does.  The percentage of teachers reporting that they teach fiction to a “large extent” exceeded the percentage answering “large extent” for non-fiction by 23 points in 2009 and 25 points in 2011.  In 2013, the difference narrowed to only 15 percentage points, primarily because of non-fiction’s increased use.  Fiction still dominated in 2013, but not by as much as in 2009.

The differences reported in Figure 2-1 are national indicators of fiction’s declining prominence in fourth grade reading instruction.  What about the states?  We know that they were involved to varying degrees with the implementation of Common Core from 2009-2013.  Is there evidence that fiction’s prominence was more likely to weaken in states most aggressively pursuing CCSS implementation? 

Table 2-3 displays the data tackling that question.  Fourth grade teachers in strong implementation states decisively favored the use of fiction over non-fiction in 2009 and 2011.  But the prominence of fiction in those states experienced a large decline in 2013 (-12.4 percentage points).  The decline for the entire four-year period, 2009-2013, was larger in the strong implementation states (-10.8) than in the medium implementation (-7.5) or non-adoption states (-9.8).  

Conclusion

This section of the Brown Center Report analyzed NAEP data and two indexes of CCSS implementation, one based on data collected in 2011, the second from data collected in 2013.  NAEP scores for 2009-2013 were examined.  Fourth grade reading scores improved by 1.11 scale score points in states with strong implementation of CCSS compared to states that did not adopt CCSS.  A similar comparison in last year’s BCR found a 1.27 point difference on NAEP’s eighth grade math test, also in favor of states with strong implementation of CCSS.  These differences, although certainly encouraging to CCSS supporters, are quite small, amounting to (at most) 0.04 standard deviations (SD) on the NAEP scale.  A threshold of 0.20 SD—five times larger—is often invoked as the minimum size for a test score change to be regarded as noticeable.  The current study’s findings are also merely statistical associations and cannot be used to make causal claims.  Perhaps other factors are driving test score changes, unmeasured by NAEP or the other sources of data analyzed here. 

The analysis also found that fourth grade teachers in strong implementation states are more likely to be shifting reading instruction from fiction to non-fiction texts.  That trend should be monitored closely to see if it continues.  Other events to keep an eye on as the Common Core unfolds include the following:

1.  The 2015 NAEP scores, typically released in the late fall, will be important for the Common Core.  In most states, the first CCSS-aligned state tests will be given in the spring of 2015.  Based on the earlier experiences of Kentucky and New York, results are expected to be disappointing.  Common Core supporters can respond by explaining that assessments given for the first time often produce disappointing results.  They will also claim that the tests are more rigorous than previous state assessments.  But it will be difficult to explain stagnant or falling NAEP scores in an era when implementing CCSS commands so much attention.   

2.  Assessment will become an important implementation variable in 2015 and subsequent years.  For analysts, the strategy employed here, modeling different indicators based on information collected at different stages of implementation, should become even more useful.  Some states are planning to use Smarter Balanced Assessments, others are using the Partnership for Assessment of Readiness for College and Careers (PARCC), and still others are using their own homegrown tests.   To capture variation among the states on this important dimension of implementation, analysts will need to use indicators that are up-to-date.

3.  The politics of Common Core injects a dynamic element into implementation.  The status of implementation is constantly changing.  States may choose to suspend, to delay, or to abandon CCSS.  That will require analysts to regularly re-configure which states are considered “in” Common Core and which states are “out.”  To further complicate matters, states may be “in” some years and “out” in others.

A final word.  When the 2014 BCR was released, many CCSS supporters commented that it is too early to tell the effects of Common Core.  The point that states may need more time operating under CCSS to realize its full effects certainly has merit.  But that does not discount everything states have done so far—including professional development, purchasing new textbooks and other instructional materials, designing new assessments, buying and installing computer systems, and conducting hearings and public outreach—as part of implementing the standards.  Some states are in their fifth year of implementation.  It could be that states need more time, but innovations can also produce their biggest “pop” earlier in implementation rather than later.  Kentucky was one of the earliest states to adopt and implement CCSS.  That state’s NAEP fourth grade reading score declined in both 2009-2011 and 2011-2013.  The optimism of CCSS supporters is understandable, but a one and a half point NAEP gain might be as good as it gets for CCSS.



[i] These ideas were first introduced in a 2013 Brown Center Chalkboard post I authored, entitled, “When Does a Policy Start?”

[ii] Maria Glod, “Since NCLB, Math and Reading Scores Rise for Ages 9 and 13,” Washington Post, April 29, 2009.

[iii] Mark Schneider, “NAEP Math Results Hold Bad News for NCLB,” AEIdeas (Washington, D.C.: American Enterprise Institute, 2009).

[iv] Lisa Guisbond with Monty Neill and Bob Schaeffer, NCLB’s Lost Decade for Educational Progress: What Can We Learn from this Policy Failure? (Jamaica Plain, MA: FairTest, 2012).

[v] Derek Neal and Diane Schanzenbach, “Left Behind by Design: Proficiency Counts and Test-Based Accountability,” NBER Working Paper No. W13293 (Cambridge: National Bureau of Economic Research, 2007), 13.

[vi] Careful analysts of NCLB have allowed different states to have different starting dates: see Thomas Dee and Brian A. Jacob, “Evaluating NCLB,” Education Next 10, no. 3 (Summer 2010); Manyee Wong, Thomas D. Cook, and Peter M. Steiner, “No Child Left Behind: An Interim Evaluation of Its Effects on Learning Using Two Interrupted Time Series Each with Its Own Non-Equivalent Comparison Series,” Working Paper 09-11 (Evanston, IL: Northwestern University Institute for Policy Research, 2009).

[vii] Common Core State Standards Initiative. “English Language Arts Standards, Key Design Consideration.” Retrieved from: http://www.corestandards.org/ELA-Literacy/introduction/key-design-consideration/

[viii] Twelve states shifted downward from strong to medium and five states shifted upward from medium to strong, netting out to a seven state swing.


Student engagement


Part III of the 2015 Brown Center Report on American Education

Student engagement refers to the intensity with which students apply themselves to learning in school.  Traits such as motivation, enjoyment, and curiosity, characteristics that have interested researchers for a long time, have recently been joined by new terms such as “grit,” which now approaches cliché status.  International assessments collect data from students on characteristics related to engagement.  This study looks at data from the Program for International Student Assessment (PISA), an international test given to fifteen-year-olds.  In the U.S., most PISA students are in the fall of their sophomore year.  The high school years are a time when many observers worry that students lose interest in school.

Compared to their peers around the world, how do U.S. students appear on measures of engagement?  Are national indicators of engagement related to achievement?  This analysis concludes that American students are about average in terms of engagement.  Data reveal that several countries noted for their superior ranking on PISA (e.g., Korea, Japan, Finland, Poland, and the Netherlands) score below the U.S. on measures of student engagement.  Thus, the relationship of achievement to student engagement is not clear-cut, with some evidence pointing toward a weak positive relationship and other evidence indicating a modest negative relationship. 

The Unit of Analysis Matters

Education studies differ in units of analysis.  Some studies report data on individuals, with each student serving as an observation.  Studies of new reading or math programs, for example, usually report an average gain score or effect size representing the impact of the program on the average student.  Other studies report aggregated data, in which test scores or other measurements are averaged to yield a group score. Test scores of schools, districts, states, or countries are constructed this way.  These scores represent the performance of groups, with each group serving as a single observation, but they are really just data from individuals that have been aggregated to the group level.
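Aggregation can be sketched in a few lines. The student records below are hypothetical, and `aggregate` is an illustrative helper; it shows five individual observations collapsing into two group observations, each a simple average of its members.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical student-level records: (school, test score). Invented for illustration.
RECORDS = [
    ("School X", 480),
    ("School X", 520),
    ("School Y", 550),
    ("School Y", 590),
    ("School Y", 570),
]


def aggregate(records: list[tuple[str, float]]) -> dict[str, float]:
    """Average individual scores within each group; each group becomes one observation."""
    by_group = defaultdict(list)
    for group, score in records:
        by_group[group].append(score)
    return {group: mean(scores) for group, scores in by_group.items()}


school_scores = aggregate(RECORDS)
print(len(RECORDS), "individual observations ->", len(school_scores), "group observations")
print(school_scores)  # School X -> 500, School Y -> 570
```

Any within-school variation (the 40-point spread inside School X, for instance) disappears at the group level, which is why relationships found in aggregated data need not hold for individuals.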

Aggregated units are particularly useful for policy analysts.  Analysts are interested in how Fairfax County or the state of Virginia or the United States is doing.  Governmental bodies govern those jurisdictions and policymakers craft policy for all of the citizens within the political jurisdiction—not for an individual.  

The analytical unit is especially important when investigating topics like student engagement and their relationships with achievement.  Those relationships are inherently individual, focusing on the interaction of psychological characteristics.  They are also prone to reverse causality, meaning that the direction of cause and effect cannot readily be determined.  Consider self-esteem and academic achievement.  Determining which one is cause and which is effect has been debated for decades.  Students who are good readers enjoy books, feel pretty good about their reading abilities, and spend more time reading than other kids.  The possibility of reverse causality is one reason that beginning statistics students learn an important rule:  correlation is not causation.

Starting with the first international assessments in the 1960s, a curious pattern has emerged.  Data on students’ attitudes toward studying school subjects, when examined at the national level, often exhibit a relationship with achievement opposite to the one that would be expected.  The 2006 Brown Center Report (BCR) investigated the phenomenon in a study of “the happiness factor” in learning.[i]  Test scores of fourth graders in 25 countries and eighth graders in 46 countries were analyzed.  Students in countries with low math scores were more likely to report that they enjoyed math than students in high-scoring countries.  Correlation coefficients for the association of enjoyment and achievement were -0.67 at fourth grade and -0.75 at eighth grade.

Confidence in math performance was also inversely related to achievement.  Correlation coefficients for national achievement and the percentage of students responding affirmatively to the statement, “I usually do well in mathematics,” were -0.58 among fourth graders and -0.64 among eighth graders.  Nations with the most confident math students tend to perform poorly on math tests; nations with the least confident students do quite well.   

That is odd.  What’s going on?  A comparison of Singapore and the U.S. helps unravel the puzzle.  The data in Figure 3-1 are for eighth graders on the 2003 Trends in International Mathematics and Science Study (TIMSS).  U.S. students were very confident: 84% either agreed a lot or a little (39% + 45%) with the statement that they usually do well in mathematics.  In Singapore, the figure was 64% (46% + 18%).  With a score of 605, however, Singaporean students outscored the U.S. (504) on the TIMSS math test by 101 points, about one full standard deviation.

When within-country data are examined, the relationship runs in the expected direction.  In Singapore, highly confident students scored 642, about 90 points above the least-confident students (551).  In the U.S., the gap between the most- and least-confident students was also about 90 points, but at a much lower level on the TIMSS scale: 541 versus 448.  Note that the least-confident Singaporean eighth graders still outscore the most-confident Americans, 551 to 541.

The lesson is that the unit of analysis must be considered when examining data on students’ psychological characteristics and their relationship to achievement.  If presented with country-level associations, one should wonder what the within-country associations are.  And vice versa.  Let’s keep that caution in mind as we now turn to data on fifteen-year-olds’ intrinsic motivation and how nations scored on the 2012 PISA.
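The Singapore and U.S. pattern can be reproduced with a small, entirely hypothetical data set.  The sketch below computes the correlation between confidence and math score separately within each country, then correlates the country averages.  The country names, confidence ratings, and scores are invented for illustration and are not TIMSS or PISA data.  Under these assumed numbers, every within-country correlation is positive while the between-country correlation is negative.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical student records: (confidence rating on a 1-4 scale, math score).
# Invented for illustration only; these are not TIMSS or PISA data.
students = {
    "Country A": [(1, 550), (2, 580), (3, 610), (4, 640)],  # high scoring, less confident
    "Country B": [(2, 500), (3, 520), (3, 540), (4, 560)],
    "Country C": [(3, 450), (3, 470), (4, 490), (4, 510)],  # low scoring, more confident
}

# Student level: correlate confidence and score inside each country.
within = {
    country: pearson([c for c, _ in rows], [s for _, s in rows])
    for country, rows in students.items()
}

# Country level: correlate the country averages with each other.
country_conf = [mean(c for c, _ in rows) for rows in students.values()]
country_math = [mean(s for _, s in rows) for rows in students.values()]
between = pearson(country_conf, country_math)

for country, r in within.items():
    print(f"{country}: within-country r = {r:+.2f}")
print(f"Between countries (country means): r = {between:+.2f}")
```

Nothing about the individual-level relationship changes between the two calculations; only the unit of analysis does.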

Intrinsic Motivation

PISA’s index of intrinsic motivation to learn mathematics comprises responses to four items on the student questionnaire:  1) I enjoy reading about mathematics; 2) I look forward to my mathematics lessons; 3) I do mathematics because I enjoy it; and 4) I am interested in the things I learn in mathematics.  Figure 3-2 shows the percentage of students in OECD countries (34 of the most economically developed nations in the world) responding that they agree or strongly agree with the statements.  A little less than one-third (30.6%) of students responded favorably to reading about math, 35.5% responded favorably to looking forward to math lessons, 38.2% reported doing math because they enjoy it, and 52.9% said they were interested in the things they learn in math.  A ballpark estimate, then, is that one-third to one-half of students respond affirmatively to the individual components of PISA’s intrinsic motivation index.

Table 3-1 presents national scores on the 2012 index of intrinsic motivation to learn mathematics.  The index is scaled with an average of 0.00 and a standard deviation of 1.00.  Student index scores are averaged to produce a national score.  The scores of 39 nations are reported—29 OECD countries and 10 partner countries.[ii]  Indonesia appears to have the most intrinsically motivated students in the world (0.80), followed by Thailand (0.77), Mexico (0.67), and Tunisia (0.59).  It is striking that developing countries top the list.  Universal education at the elementary level is only a recent reality in these countries, and they are still struggling to deliver universally accessible high schools, especially in rural areas and especially to girls.  The students who sat for PISA may be an unusually motivated group.  They also may be deeply appreciative of having an opportunity that their parents never had.
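As a rough illustration of how a national index score is formed, the sketch below standardizes hypothetical student responses to a mean of 0 and a standard deviation of 1, then averages them by country.  It is a simplification under stated assumptions: PISA derives its indices with item response theory scaling and its own weighting rules, whereas this sketch simply counts how many of the four items each invented student agreed with and converts those counts to z-scores over the pooled sample.

```python
from statistics import mean, pstdev

# Hypothetical raw responses: for each invented student, the number of the four
# intrinsic-motivation items (0-4) answered "agree" or "strongly agree".
raw = {
    "Country A": [4, 3, 3, 2, 4],
    "Country B": [2, 1, 2, 3, 1],
    "Country C": [1, 0, 2, 1, 1],
}

# Pool every student and rescale to mean 0, SD 1. This plain z-score is a
# stand-in for PISA's IRT-based scaling, which works differently in detail.
pooled = [count for counts in raw.values() for count in counts]
mu, sigma = mean(pooled), pstdev(pooled)
z_scores = {c: [(count - mu) / sigma for count in counts] for c, counts in raw.items()}

# A national score is the average of that country's students' index values.
national = {c: mean(values) for c, values in z_scores.items()}

for country, score in sorted(national.items(), key=lambda item: item[1], reverse=True):
    print(f"{country}: {score:+.2f}")
```

Because the national figure is an average of student values, a country can sit well above zero even if many of its individual students fall below it.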

The U.S. scores about average (0.08) on the index, statistically about the same as New Zealand, Australia, Ireland, and Canada.  The bottom of the table is extremely interesting.  Among the countries with the least intrinsically motivated kids are some PISA high flyers.  Austria has the least motivated students (-0.35), but that is not statistically significantly different from the score for the Netherlands (-0.33).  What’s surprising is that Korea (-0.20), Finland (-0.22), Japan (-0.23), and Belgium (-0.24) score at the bottom of the intrinsic motivation index even though they historically do quite well on the PISA math test.

Enjoying Math and Looking Forward to Math Lessons

Let’s now dig a little deeper into the intrinsic motivation index.  Two components of the index are how students respond to “I do mathematics because I enjoy it” and “I look forward to my mathematics lessons.”  These sentiments are directly related to schooling.  Whether students enjoy math or look forward to math lessons is surely influenced by factors such as teachers and curriculum.  Table 3-2 ranks PISA countries by the percentage of students who “agree” or “strongly agree” with the questionnaire prompts.  The nations’ 2012 PISA math scores are also shown.  Indonesia scores at the top of both rankings, with 78.3% enjoying math and 72.3% looking forward to studying the subject.  However, Indonesia’s PISA math score of 375 is 119 points, or about 1.3 standard deviations, below the international mean of 494 (standard deviation of 92).  The tops of the tables are dominated primarily by low-performing countries, but not exclusively so.  Denmark is an average-performing nation that has high rankings on both sentiments.  Liechtenstein, Hong Kong-China, and Switzerland do well on the PISA math test and appear to have contented, positively oriented students.

Several nations of interest are shaded.  The bar across the middle of the tables, encompassing Australia and Germany, marks the median of the two lists, with 19 countries above and 19 below that position.  The United States registers above the median on looking forward to math lessons (45.4%) and a bit below the median on enjoyment (36.6%).  A similar proportion of students in Poland (a country recently celebrated in popular media and in Amanda Ripley’s book, The Smartest Kids in the World,[iii] for making great strides on PISA tests) enjoy math (36.1%), but only 21.3% of Polish kids look forward to their math lessons, very near the bottom of the list, which is anchored by the Netherlands at 19.8%.

Korea also appears in Ripley’s book.  It scores poorly on both items.  Only 30.7% of Korean students enjoy math, and less than that, 21.8%, look forward to studying the subject.  Korean education is depicted unflatteringly in Ripley’s book—as an academic pressure cooker lacking joy or purpose—so its standing here is not surprising.  But Finland is another matter.  It is portrayed as laid-back and student-centered, concerned with making students feel relaxed and engaged.  Yet, only 28.8% of Finnish students say that they study mathematics because they enjoy it (among the bottom four countries) and only 24.8% report that they look forward to math lessons (among the bottom seven countries).  Korea, the pressure cooker, and Finland, the laid-back paradise, look about the same on these dimensions.

Another country that is admired for its educational system, Japan, does not fare well on these measures.  Only 30.8% of students in Japan enjoy mathematics, despite the boisterous, enthusiastic classrooms that appear in Elizabeth Green’s recent book, Building a Better Teacher.[iv]  Japan does better on the percentage of students looking forward to their math lessons (33.7%), but still places far below the U.S.  Green’s book describes classrooms with younger students, but even so, surveys of Japanese fourth and eighth graders’ attitudes toward studying mathematics report results similar to those presented here.  American students say that they enjoy their math classes and studying math more than students in Finland, Japan, and Korea.

It is clear from Table 3-2 that at the national level, enjoying math is not positively related to math achievement.  Nor is looking forward to one’s math lessons.  The correlation coefficients reported in the last row of the table quantify the magnitude of the inverse relationships.  The -0.58 and -0.57 coefficients indicate a moderately negative association, meaning, in plain English, that countries with students who enjoy math or look forward to math lessons tend to score below average on the PISA math test.  And high-scoring nations tend to register below average on these measures of student engagement.  Country-level associations, however, should be augmented with student-level associations that are calculated within each country.

Within-Country Associations of Student Engagement with Math Performance

The 2012 PISA volume on student engagement does not present within-country correlation coefficients on intrinsic motivation or its components.  But it does offer within-country correlations of math achievement with three other characteristics relevant to student engagement.  Table 3-3 displays statistics for three measures: 1) whether students feel that they belong at school; 2) their attitudes toward school, an index composed of four factors;[v] and 3) whether they had arrived late for school in the two weeks prior to the PISA test.  These measures reflect an excellent mix of behaviors and dispositions.

The within-country correlations trend in the expected direction, but they are small in magnitude.  Correlation coefficients for math performance and a sense of belonging at school range from -0.02 to 0.18, meaning that even the country exhibiting the strongest relationship between achievement and a sense of belonging (Thailand, with a correlation coefficient of 0.18) isn’t registering a strong relationship at all.  The OECD average is 0.08, which is trivial.  The U.S. correlation coefficient, 0.07, is also trivial.  The relationship of achievement with attitudes toward school is slightly stronger (OECD average of 0.11), but still weak.

Of the three characteristics, arriving late for school shows the strongest correlation, an unsurprising inverse relationship of -0.14 in OECD countries and -0.20 in the U.S.  Students who tend to be tardy also tend to score lower on math tests.  But, again, the magnitude is surprisingly small.  The coefficients are statistically significant because of large sample sizes, but in a real-world “would I notice this if it were in my face?” sense, no: the correlation coefficients suggest not much of a relationship at all.
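The gap between statistical significance and practical importance can be made concrete.  The sketch below takes two of the U.S. correlations reported above (0.07 for belonging and -0.20 for arriving late), assumes a sample of 5,000 students (a hypothetical figure chosen for illustration, not the actual PISA sample size), and computes the usual t statistic for a correlation, an approximate p-value, and the share of score variance the correlation accounts for (r squared).

```python
import math


def significance(r, n):
    """For a Pearson correlation r on n observations, return the t statistic,
    an approximate two-sided p-value, and r squared (share of variance explained).

    The p-value uses a normal approximation to the t distribution, which is
    reasonable only for large n such as national survey samples.
    """
    t = r * math.sqrt((n - 2) / (1 - r * r))
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p, r * r


N = 5_000  # hypothetical sample size, chosen for illustration only

for label, r in [("sense of belonging (U.S.)", 0.07), ("arriving late (U.S.)", -0.20)]:
    t, p, r2 = significance(r, N)
    print(f"{label}: r = {r:+.2f}, t = {t:+.1f}, p = {p:.1e}, variance explained = {r2:.1%}")
```

With a sample that large, even r = 0.07 clears conventional significance thresholds, yet it accounts for less than 1 percent of the variation in scores; arriving late, the strongest of the three, accounts for about 4 percent.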

The PISA report presents within-country effect sizes for the intrinsic motivation index, calculating the achievement gains associated with a one-unit change in the index.  One of several interesting findings is that intrinsic motivation is more strongly associated with gains at the top of the achievement distribution, among students at the 90th percentile in math scores, than at the bottom of the distribution, among students at the 10th percentile.

The report summarizes the within-country effect sizes with this statement: “On average across OECD countries, a change of one unit in the index of intrinsic motivation to learn mathematics translates into a 19 score-point difference in mathematics performance.”[vi]  This sentence can easily be misinterpreted.  It means that within the participating countries, on average, students who differ by one unit on PISA’s 2012 intrinsic motivation index score about 19 points apart on the 2012 math test.  It does not mean that a country that gains one unit on the intrinsic motivation index can expect a 19-point score increase.[vii]

Let’s now see what that association looks like at the national level.

National Changes in Intrinsic Motivation, 2003-2012

PISA first reported national scores on the index of intrinsic motivation to learn mathematics in 2003.  Are gains that countries made on the index associated with gains on PISA’s math test?  Table 3-4 presents a score card on the question, reporting the changes that occurred in thirty-nine nations—in both the index and math scores—from 2003 to 2012.  Seventeen nations made statistically significant gains on the index; fourteen nations had gains that were, in a statistical sense, indistinguishable from zero—labeled “no change” in the table; and eight nations experienced statistically significant declines in index scores. 

The U.S. scored 0.00 in 2003 and 0.08 in 2012, notching a gain of 0.08 on the index (statistically significant).  Its PISA math score declined from 483 to 481, a decline of 2 scale score points (not statistically significant).

Table 3-4 makes it clear that national changes on PISA’s intrinsic motivation index are not positively associated with changes in math achievement.  The countries registering gains on the index averaged a decline of 3.7 points on PISA’s math assessment.  The countries that remained about the same on the index had math scores that also remained essentially unchanged (an average change of -0.09 points).  And the most striking finding: countries that declined on the index (average of -0.15) actually gained an average of 10.3 points on the PISA math scale.  Intrinsic motivation went down; math scores went up.  The correlation coefficient for the overall relationship, not shown in the table, is -0.30.
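The tally in Table 3-4 can be sketched in a few lines.  The example below uses six invented countries (not the 39 PISA nations) and treats an index change as significant when it exceeds 1.96 standard errors in either direction, a conventional two-sided 5 percent test assumed here as a simplification of PISA’s own procedures.  It then averages the math-score changes within each group and correlates the two sets of changes across all countries.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def classify(change, se, z=1.96):
    """Label an index change as a gain, decline, or no change (|change| <= z * se)."""
    if change > z * se:
        return "gain"
    if change < -z * se:
        return "decline"
    return "no change"


# Hypothetical countries: (index change 2003-2012, its standard error, math score change).
countries = {
    "Country A": (0.20, 0.04, -8),
    "Country B": (0.15, 0.05, -2),
    "Country C": (0.02, 0.04, 1),
    "Country D": (-0.03, 0.05, 3),
    "Country E": (-0.12, 0.04, 9),
    "Country F": (-0.18, 0.05, 14),
}

# Group the math-score changes by the direction of the index change.
groups = {}
for d_index, se, d_math in countries.values():
    groups.setdefault(classify(d_index, se), []).append(d_math)

for label in ("gain", "no change", "decline"):
    changes = groups[label]
    print(f"{label}: {len(changes)} countries, mean math change {mean(changes):+.1f}")

# Correlate the two sets of changes across every country.
r = pearson([v[0] for v in countries.values()], [v[2] for v in countries.values()])
print(f"Correlation of index change with math change: r = {r:+.2f}")
```

With real data the groups would be larger and the pattern noisier; the invented numbers are built to show the mechanics, not to reproduce the -0.30 reported above.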

Conclusion

The analysis above investigated student engagement.  International data from the 2012 PISA were examined on several dimensions of student engagement, focusing on a measure that PISA has employed since 2003, the index of intrinsic motivation to learn mathematics.  The U.S. scored near the middle of the distribution on the 2012 index.  PISA analysts calculated that, on average, a one-unit change in the index was associated with a 19-point difference in PISA math scores.  That is the average of within-country calculations, using student-level data that measure the association of intrinsic motivation with PISA score.  It represents an effect size of about 0.20 (19 points divided by the average within-country standard deviation of 92), a positive effect, but one that is generally considered small in magnitude.[viii]

The unit of analysis matters.  Between-country associations often differ from within-country associations.  The current study used a difference in difference approach that calculated the correlation coefficient for two variables at the national level: the change in the intrinsic motivation index from 2003 to 2012 and the change in PISA score over the same period.  That analysis produced a correlation coefficient of -0.30, a negative relationship that is also generally considered small in magnitude.

Neither approach can justify causal claims or rule out reverse causality: the possibility that high math achievement boosts intrinsic motivation to learn math, rather than (or even in addition to) high levels of motivation leading to greater learning.  Poor math achievement may cause intrinsic motivation to fall.  Taken together, the analyses lead to the conclusion that PISA provides, at best, weak evidence that raising student motivation is associated with achievement gains.  Boosting motivation may even produce declines in achievement.

Here’s the bottom line for what PISA data recommend to policymakers: programs designed to boost student engagement (perhaps a worthy pursuit even if unrelated to achievement) should be evaluated for their effects in small-scale experiments before being adopted broadly.  The international evidence does not justify wide-scale concern over current levels of student engagement in the U.S. or support the hypothesis that boosting student engagement would raise student performance nationally.

Let’s conclude by considering the advantages that national-level, difference in difference analyses provide that student-level analyses may overlook.

1. They depict policy interventions more accurately.  Policies are actions of a political unit affecting all of its members.  They do not simply affect the relationship of two characteristics within an individual’s psychology. Policymakers who ask the question, “What happens when a country boosts student engagement?” are asking about a country-level phenomenon.

2.  Direction of causality can run differently at the individual and group levels.  For example, we know that enjoying a school subject and achievement on tests of that subject are positively correlated at the individual level.  But they are not always correlated—and can in fact be negatively correlated—at the group level. 

3.  By using multiple years of panel data and calculating change over time, a difference in difference analysis controls for unobserved variable bias by “baking into the cake” those unobserved variables at the baseline.  The unobserved variables are assumed to remain stable over the time period of the analysis.  For the cultural factors that many analysts suspect influence between-nation test score differences, stability may be a safe assumption.  Difference in difference, then, would be superior to cross-sectional analyses in controlling for cultural influences that are omitted from other models.

4.  Testing artifacts from a cultural source can also be dampened.  Characteristics such as enjoyment are culturally defined, and the language employed to describe them is also culturally bounded.  Consider two of the questionnaire items examined above: whether kids “enjoy” math and how much they “look forward” to math lessons.  Cultural differences in responding to these prompts will be reflected in between-country averages at the baseline, and any subsequent changes will reflect fluctuations net of those initial differences.
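Points 3 and 4 can be illustrated with a toy model.  In the sketch below, each hypothetical country has a stable cultural factor `c` that raises test scores and also lowers how students describe their enjoyment of math, plus a true level of motivation that changes between 2003 and 2012.  The functions `reported_motivation` and `score` are invented for illustration, and the model deliberately builds in a positive effect of motivation so the contrast is visible; it says nothing about what PISA actually shows.  A cross-sectional correlation of the 2003 levels is dominated by `c` and comes out negative, while the correlation of changes recovers the built-in positive relationship, because `c` cancels when each country’s 2003 value is subtracted from its 2012 value.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical countries: (stable cultural factor c, true motivation 2003, true motivation 2012).
countries = {
    "Country A": (1.0, 0.0, 0.3),
    "Country B": (0.5, 0.2, 0.1),
    "Country C": (0.0, 0.1, 0.4),
    "Country D": (-0.5, 0.3, 0.2),
    "Country E": (-1.0, 0.2, 0.0),
}


def reported_motivation(c, m):
    """Survey answer: true motivation shifted down by the cultural response style c."""
    return m - c


def score(c, m):
    """Test score: c and true motivation both raise it (an assumed toy model)."""
    return 500 + 50 * c + 20 * m


# Cross-section, 2003 levels: c pushes reported motivation and scores in opposite directions.
levels_motivation = [reported_motivation(c, m03) for c, m03, _ in countries.values()]
levels_score = [score(c, m03) for c, m03, _ in countries.values()]
r_levels = pearson(levels_motivation, levels_score)

# Changes, 2012 minus 2003: c appears in both years and cancels out.
d_motivation = [
    reported_motivation(c, m12) - reported_motivation(c, m03)
    for c, m03, m12 in countries.values()
]
d_score = [score(c, m12) - score(c, m03) for c, m03, m12 in countries.values()]
r_changes = pearson(d_motivation, d_score)

print(f"2003 levels:       r = {r_levels:+.2f}")
print(f"2003-2012 changes: r = {r_changes:+.2f}")
```

The protection depends entirely on `c` staying fixed over the period; a cultural factor that shifted between 2003 and 2012 would not cancel.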



[i] Tom Loveless, “The Happiness Factor in Student Learning,” The 2006 Brown Center Report on American Education: How Well are American Students Learning? (Washington, D.C.: The Brookings Institution, 2006).

[ii] All countries with 2003 and 2012 data are included.

[iii] Amanda Ripley, The Smartest Kids in the World: And How They Got That Way (New York, NY: Simon & Schuster, 2013).

[iv] Elizabeth Green, Building a Better Teacher: How Teaching Works (and How to Teach It to Everyone) (New York, NY: W.W. Norton & Company, 2014).

[v] The attitude toward school index is based on responses to: 1) Trying hard at school will help me get a good job, 2) Trying hard at school will help me get into a good college, 3) I enjoy receiving good grades, 4) Trying hard at school is important.  See: OECD, PISA 2012 Database, Table III.2.5a.

[vi] OECD, PISA 2012 Results: Ready to Learn: Students’ Engagement, Drive and Self-Beliefs (Volume III) (Paris: PISA, OECD Publishing, 2013), 77.

[vii] PISA originally called the index of intrinsic motivation the index of interest and enjoyment in mathematics, first constructed in 2003.  The four questions comprising the index remain identical from 2003 to 2012, allowing for comparability.  Index values for 2003 scores were re-scaled based on 2012 scaling (mean of 0.00 and SD of 1.00), meaning that index values published in PISA reports prior to 2012 will not agree with those published after 2012 (including those analyzed here).  See: OECD, PISA 2012 Results: Ready to Learn: Students’ Engagement, Drive and Self-Beliefs (Volume III) (Paris: PISA, OECD Publishing, 2013), 54.

[viii] PISA math scores are scaled with a standard deviation of 100, but the average within-country standard deviation for OECD nations was 92 on the 2012 math test.


Girls, boys, and reading


Part I of the 2015 Brown Center Report on American Education

Girls score higher than boys on tests of reading ability.  They have for a long time.  This section of the Brown Center Report assesses where the gender gap stands today and examines trends over the past several decades.  The analysis also extends beyond the U.S. and shows that boys’ reading achievement lags that of girls in every country that participates in international reading assessments.  The international dimension, recognizing that the U.S. is not alone in this phenomenon, serves as a catalyst to discuss why the gender gap exists and whether it extends into adulthood.

Background

One of the earliest large-scale studies on gender differences in reading, conducted in Iowa in 1942, found that girls in both elementary and high schools were better than boys at reading comprehension.[i] The most recent results from reading tests of the National Assessment of Educational Progress (NAEP) show girls outscoring boys at every grade level and age examined.  Gender differences in reading are not confined to the United States.  Among younger children (age nine to ten, or about fourth grade), girls consistently outscore boys on international assessments, from a pioneering study of reading comprehension conducted in fifteen countries in the 1970s to the results of the Progress in International Reading Literacy Study (PIRLS) conducted in forty-nine nations and nine benchmarking entities in 2011.  The same is true for students in high school.  On the 2012 reading literacy test of the Program for International Student Assessment (PISA), worldwide gender gaps are evident between fifteen-year-old males and females.

As the 21st century dawned, the gender gap came under the scrutiny of reporters and pundits.  Author Christina Hoff Sommers added a political dimension to the gender gap and, some say, swept the topic into the culture wars raging at the time with her 2000 book The War Against Boys: How Misguided Feminism is Harming Our Young Men.[ii] Sommers argued that boys’ academic inferiority, and in particular their struggles with reading, stemmed from the feminist movement’s impact on schools and society.  In the second edition, published in 2013, she changed the subtitle to How Misguided Policies Are Harming Our Young Men.  Some of the sting is removed from the indictment of “misguided feminism.”  But not all of it.  Sommers singles out for criticism a 2008 report from the American Association of University Women.[iii] That report sought to debunk the notion that boys fared poorly in school compared to girls.  It left out a serious discussion of boys’ inferior performance on reading tests, as well as their lower grade point averages, greater rate of school suspension and expulsion, and lower rate of acceptance into college.

Journalist Richard Whitmire picked up the argument about the gender gap in 2010 with Why Boys Fail: Saving Our Sons from an Educational System That’s Leaving Them Behind.[iv] Whitmire sought to separate boys’ academic problems from the culture wars, noting that the gender gap in literacy is a worldwide phenomenon and appears even in countries where feminist movements are weak to nonexistent.  Whitmire offers several reasons for boys’ low reading scores, including poor reading instruction (particularly a lack of focus on phonics), and too few books appealing to boys’ interests.  He also dismisses several explanations that are in circulation, among them, video games, hip-hop culture, too much testing, and feminized classrooms.  As with Sommers’s book, Whitmire’s culprit can be found in the subtitle: the educational system.  Even if the educational system is not the original source of the problem, Whitmire argues, schools could be doing more to address it. 

In a 2006 monograph, education policy researcher Sara Mead took on the idea that American boys were being shortchanged by schools.  After reviewing achievement data from NAEP and other tests, Mead concluded that the real story of the gender gap wasn’t one of failure at all.  Boys and girls were both making solid academic progress, but in some cases, girls were making larger gains, misleading some commentators into concluding that boys were being left behind.  Mead concluded, “The current boy crisis hype and the debate around it are based more on hopes and fears than on evidence.”[v]

Explanations for the Gender Gap

The analysis below focuses on where the gender gap in reading stands today, not its causes.  Nevertheless, readers should keep in mind the three most prominent explanations for the gap.  They will be used to frame the concluding discussion.

Biological/Developmental:  Even before attending school, young boys show more problems in learning how to read than girls.  This explanation holds that the sexes are hard-wired differently for literacy.

School Practices: Boys are inferior to girls on several school measures (behavioral, social, and academic), and those discrepancies extend all the way through college.  This explanation holds that even if schools do not create the gap, they certainly don’t do what they could to ameliorate it.

Cultural Influences: Cultural influences steer boys toward non-literary activities (sports, music) and define literacy as a feminine characteristic.  This explanation holds that cultural cues and strong role models could help close the gap by portraying reading as a masculine activity.

The U.S. Gender Gap in Reading

Table 1-1 displays the most recent data from eight tests of U.S. students’ achievement.  The first group shows results from the National Assessment of Educational Progress Long-Term Trend (NAEP-LTT), given to students nine, 13, and 17 years of age.  The NAEP-LTT in reading was first administered in 1971.  The second group of results is from the NAEP Main Assessment, which began testing reading achievement in 1992.  It assesses students at three grade levels: fourth, eighth, and twelfth.  The last two tests are international assessments in which the U.S. participates: the Progress in International Reading Literacy Study (PIRLS), which began in 2001, and the Program for International Student Assessment (PISA), first given in 2000.  PIRLS tests fourth graders, and PISA tests 15-year-olds.  In the U.S., 71 percent of students who took PISA in the fall of 2012 were in tenth grade.

Two findings leap out.  First, the test score gaps between males and females are statistically significant on all eight assessments.  Because the sample sizes of the assessments are quite large, statistical significance does not necessarily mean that the gaps are of practical significance—or even noticeable if one observed several students reading together.  The tests also employ different scales.  The final column in the table expresses the gaps in standard deviation units, a measure that allows for comparing the different scores and estimating their practical meaningfulness.

The second finding is based on the standardized gaps (expressed in SDs).  On both NAEP tests, the gaps are narrower among elementary students and wider among middle and high school students.  That pattern also appears on international assessments.  The gap is twice as large on PISA as on PIRLS.[vi]  A popular explanation for the gender gap involves the different maturation rates of boys and girls.  That theory will be discussed in greater detail below, but at this point in the analysis, let’s simply note that the gender gap appears to grow until early adolescence—age 13 on the LTT-NAEP and grade eight on the NAEP Main.

Should these gaps be considered small or large?  Many analysts consider 10 scale score points on NAEP equal to about a year of learning.  In that light, gaps of five to 10 points (half a year to a full year of learning) appear substantial.  But compared to other test score gaps on NAEP, the gender gap is modest in size.  On the 2012 LTT-NAEP for nine-year-olds, the five-point gap between boys and girls is about one-half of the 10-point gap between students living in cities and those living in suburbs.[vii]  The gap between students who are eligible for free or reduced-price lunch and those who are not is 28 points; between black and white students, it is 23 points; and between English language learners (ELL) and non-ELL students, it is 34 points.

Table 1-1 only shows the size of the gender gap as gauged by assessments at single points in time.  For determining trends, let’s take a closer look at the LTT-NAEP, since it provides the longest-running record of the gender gap.  In Table 1-2, scores are displayed from tests administered since 1971 and given nearest to the starts and ends of decades.  Results from 2008 and 2012 are both shown to give readers a sense of recent fluctuations.  At all three ages, gender gaps were larger in 1971 than they are today.  The change at age nine is statistically significant, but the changes at age 13 (p = 0.10) and age 17 (p = 0.07) are not, although they are close.  Slight shrinkage occurred in the 1980s, but the gaps expanded again in the 1990s.  The gap at age 13 actually peaked at 15 scale score points in 1994 (not shown in the table), and the decline since then is statistically significant.  Similarly, the gap at age 17 peaked in 1996 at 15 scale score points, and the decline since then is also statistically significant.  More recently, the gap at age nine began to shrink again in 1999, the gap at age 13 in the 2000s, and the gap at age 17 in 2012.

Table 1-3 decomposes the change figures by male and female performance.  Sara Mead’s point, that the NAEP story is one of both sexes gaining rather than boys falling behind, is even truer today than when she made it in 2006.  When Mead’s analysis was published, the most recent LTT-NAEP data were from 2004.  Up until then, girls had made greater reading gains than boys.  But that situation has reversed.  Boys have now made larger gains over the history of LTT-NAEP, fueled by the gains that they registered from 2004 to 2012.  The score for 17-year-old females in 2012 (291) was identical to their score in 1971.

International Perspective

The United States is not alone in reading’s gender gap.  Its gap of 31 points is not even the largest (see Figure 1-1). On the 2012 PISA, all OECD countries exhibited a gender gap, with females outscoring males by 23 to 62 points on the PISA scale (standard deviation of 94).   On average in the OECD, girls outscored boys by 38 points (rounded to 515 for girls and 478 for boys).  The U.S. gap of 31 points is less than the OECD average.

Finland had the largest gender gap on the 2012 PISA, twice that of the U.S., with females outscoring males by an astonishing 62 points (0.66 SDs).  Finnish girls scored 556, and boys scored 494.  To put this gap in perspective, consider that Finland’s renowned superiority on the PISA reading test is completely dependent on Finnish girls.  The Finnish boys’ score of 494 is about the same as the international average of 496, and not much above the OECD average for males (478).  The reading performance of Finnish boys is not statistically significantly different from that of boys in the U.S. (482) or from the average U.S. student, both boys and girls (498).  Finnish superiority in reading exists only among females.
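Converting a point gap to standard deviation units is a single division.  The sketch below applies it to the 2012 PISA reading gaps cited in this section, using the PISA standard deviation of 94 given above for every country.  That is a simplification; country-specific standard deviations would give slightly different values.

```python
PISA_READING_SD = 94  # SD of the 2012 PISA reading scale as cited in the text

# Female-minus-male reading gaps on the 2012 PISA, in scale points, as reported above.
gaps = {
    "Finland": 62,
    "OECD average": 38,
    "United States": 31,
    "Japan": 24,
    "South Korea": 23,
}


def in_sd_units(points, sd=PISA_READING_SD):
    """Express a score gap in standard deviation units (a standardized gap)."""
    return points / sd


for country, points in gaps.items():
    print(f"{country}: {points} points = {in_sd_units(points):.2f} SD")
```

The Finnish result, 0.66 SD, matches the figure in the text.  The standardized gaps in Table 1-1 presumably rest on the same arithmetic, with each test’s own standard deviation in the denominator.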

There is a hint of a geographical pattern.  Northern European countries tend to have larger gender gaps in reading.  Finland, Sweden, Iceland, and Norway have four of the six largest gaps.  Denmark is the exception with a 31 point gap, below the OECD average.   And two Asian OECD members have small gender gaps.  Japan’s gap of 24 points and South Korea’s gap of 23 are ranked among the bottom four countries. The Nordic tendency toward large gender gaps in reading was noted in a 2002 analysis of the 2000 PISA results.[viii]  At that time, too, Denmark was the exception.  Because of the larger sample and persistence over time, the Nordic pattern warrants more confidence than the one in the two Asian countries.

Back to Finland.  That’s the headline story here, and it contains a lesson for cautiously interpreting international test scores.  Consider that the 62 point gender gap in Finland is only 14 points smaller than the U.S. black-white gap (76 points) and 21 points larger than the white-Hispanic gap (41 points) on the same test.  Finland’s gender gap illustrates the superficiality of much of the commentary on that country’s PISA performance.  A common procedure in policy analysis is to consider how policies differentially affect diverse social groups.  Think of all the commentators who cite Finland to promote particular policies, whether the policies address teacher recruitment, amount of homework, curriculum standards, the role of play in children’s learning, school accountability, or high stakes assessments.[ix]  Advocates pound the table while arguing that these policies are obviously beneficial.  “Just look at Finland,” they say.  Have you ever read a warning that even if those policies contribute to Finland’s high PISA scores—which the advocates assume but serious policy scholars know to be unproven—the policies also may be having a negative effect on the 50 percent of Finland’s school population that happens to be male?

Would Getting Boys to Enjoy Reading More Help Close the Gap?

One of the solutions put forth for improving boys’ reading scores is to make an effort to boost their enjoyment of reading.  That certainly makes sense, but past analyses of national reading and math performance have consistently, and counterintuitively, shown no relationship (or even an inverse one) with enjoyment of the two subjects.  PISA asks students how much they enjoy reading, so let’s now investigate whether fluctuations in PISA scores are at all correlated with how much 15-year-olds say they like to read.

The analysis below employs what is known as a “differences-in-differences” analytical strategy.  In both 2000 and 2009, PISA measured students’ reading ability and asked them several questions about how much they like to read.  An enjoyment index was created from the latter set of questions.[x]  Girls score much higher on this index than boys.  Many commentators believe that girls’ greater enjoyment of reading may be at the root of the gender gap in literacy.

When new international test scores are released, analysts are tempted to just look at variables exhibiting strong correlations with achievement (such as amount of time spent on homework), and embrace them as potential causes of high achievement. But cross-sectional correlations can be deceptive.  The direction of causality cannot be determined, whether it’s doing a lot of homework that leads to high achievement, or simply that good students tend to take classes that assign more homework.  Correlations in cross-sectional data are also vulnerable to unobserved factors that may influence achievement.  For example, if cultural predilections drive a country’s exemplary performance, their influence will be masked or spuriously assigned to other variables unless they are specifically modeled.[xi]  Class size, between-school tracking, and time spent on learning are all topics on which differences-in-differences has been fruitfully employed to analyze multiple cross-sections of international data.

Another benefit of differences-in-differences is that it measures statistical relationships longitudinally.  Table 1-4 investigates the question: Is the rise and fall of reading enjoyment correlated with changes in reading achievement?  Many believe that if boys liked reading more, their literacy test scores would surely increase.  Table 1-4 does not support that belief.  Data are available for 27 OECD countries, and they are ranked by how much they boosted males’ enjoyment of reading.  The index is standardized at the student level, with a mean of 0.00 and a standard deviation of 1.00.  For the 27 nations in Table 1-4, the mean national change in enjoyment is -0.02, with a standard deviation of 0.09. 

Germany did the best job of raising boys’ enjoyment of reading, with a gain of 0.12 on the index.  German males’ PISA scores also went up—a little more than 10 points (10.33).  France, on the other hand, raised males’ enjoyment of reading nearly as much as Germany (0.11), but French males’ PISA scores declined by 15.26 points.  A bit further down the column, Ireland managed to get boys to enjoy reading a little more (a gain of 0.05) but their reading performance fell a whopping 36.54 points.  Toward the bottom end of the list, Poland’s boys enjoyed reading less in 2009 than in 2000, a decline of 0.14 on the index, but over the same time span, their reading literacy scores increased by more than 14 points (14.29).  Among the countries in which the relationship goes in the expected direction is Finland.  Finnish males’ enjoyment of reading declined (-0.14) as did their PISA scores in reading literacy (-11.73).  Overall, the correlation coefficient for change in enjoyment and change in reading score is -0.01, indicating no relationship between the two.
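The change-on-change logic behind Table 1-4 can be illustrated with a short calculation. The sketch below is a hypothetical illustration, not the report’s own code: it computes a Pearson correlation between each country’s change in boys’ enjoyment of reading (2000 to 2009) and its change in boys’ PISA reading score, using only the seven countries whose figures are quoted in this section. Because it omits the other twenty countries in Table 1-4, it does not reproduce the reported coefficient of -0.01; it only shows how such a coefficient is computed from national changes rather than levels.

```python
import math

# Boys' changes from 2000 to 2009, as quoted in the text:
# (change in enjoyment index, change in PISA reading score).
# Illustrative subset only; Table 1-4 covers 27 OECD countries.
CHANGES = {
    "Germany": (0.12, 10.33),
    "France": (0.11, -15.26),
    "Ireland": (0.05, -36.54),
    "Poland": (-0.14, 14.29),
    "Finland": (-0.14, -11.73),
    "Canada": (0.02, -11.74),
    "Australia": (-0.04, -16.50),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Working with changes rather than levels nets out stable country-level
# factors (e.g., cultural predilections) that can distort a single
# cross-section; the question becomes whether countries that raised
# enjoyment also raised scores.
enjoyment_change = [e for e, _ in CHANGES.values()]
score_change = [s for _, s in CHANGES.values()]
r = pearson_r(enjoyment_change, score_change)

if __name__ == "__main__":
    print(f"r = {r:.2f} across {len(CHANGES)} countries")
```

For this seven-country subset the coefficient comes out weakly negative (roughly -0.21). With so few observations, that figure should not be read as evidence either way; the point is simply that a positive link between rising enjoyment and rising scores does not appear.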

Christina Hoff Sommers and Richard Whitmire have praised specific countries for first recognizing and then addressing the gender gap in reading.  Recently, Sommers urged the U.S. to “follow the example of the British, Canadians, and Australians.”[xii]  Whitmire described Australia as “years ahead of the U.S. in pioneering solutions” to the gender gap.  Let’s see how those countries appear in Table 1-4.  England does not have PISA data for the 2000 baseline year, but both Canada and Australia are included.  Canada raised boys’ enjoyment of reading a little bit (0.02) but Canadian males’ scores fell by about 12 points (-11.74).  Australia suffered a decline in boys’ enjoyment of reading (-0.04) and achievement (-16.50).  As promising as these countries’ efforts may have appeared a few years ago, so far at least, they have not borne fruit in raising boys’ reading performance on PISA.

Achievement gaps are tricky because it is possible for the test scores of the two groups being compared to both decline while the gap increases or, conversely, for scores of both to increase while the gap declines.  Table 1-4 only looks at males’ enjoyment of reading and its relationship to achievement.  A separate differences-in-differences analysis was conducted (but not displayed here) to see whether changes in the enjoyment gap—the difference between boys’ and girls’ enjoyment of reading—are related to changes in reading achievement.  They are not (correlation coefficient of 0.08).  National PISA data simply do not support the hypothesis that the superior reading performance of girls is related to the fact that girls enjoy reading more than boys. 

Discussion

Let’s summarize the main findings of the analysis above. Reading scores for girls exceed those for boys on eight recent assessments of U.S. reading achievement.  The gender gap is larger for middle and high school students than for students in elementary school.  The gap was apparent on the earliest NAEP tests in the 1970s and has shown some signs of narrowing in the past decade.  International tests reveal that the gender gap is worldwide.  Among OECD countries, it even appears among countries known for superior performance on PISA’s reading test.  Finland not only exhibited the largest gender gap in reading on the 2012 PISA; its gap had also widened since 2000.  A popular recommendation for boosting boys’ reading performance is finding ways for them to enjoy reading more.  That theory is not supported by PISA data.  Countries that succeeded in raising boys’ enjoyment of reading from 2000 to 2009 were no more likely to improve boys’ reading performance than countries where boys’ enjoyment of reading declined. 

The origins of the gender gap are hotly debated.  The universality of the gap certainly supports the argument that it originates in biological or developmental differences between the two sexes.  It is evident among students of different ages in data collected at different points in time.  It exists across the globe, in countries with different educational systems, different popular cultures, different child rearing practices, and different conceptions of gender roles.  Moreover, the greater prevalence of reading impairment among young boys—a ratio of two or three to one—suggests an endemic difficulty that exists before the influence of schools or culture can take hold.[xiii] 

But some of the data examined above also argue against the developmental explanation.  The gap has been shrinking on NAEP.  At age nine, it is less than half of what it was forty years ago.  Biology doesn’t change that fast.  Gender gaps in math and science, which were apparent in achievement data for a long time, have all but disappeared, especially once course taking is controlled.  The reading gap also seems to evaporate by adulthood.  On an international assessment of adults conducted in 2012, reading scores for men and women were statistically indistinguishable up to age 35—even in Finland and the United States.  After age 35, men had statistically significantly higher scores in reading, all the way to the oldest group, age 55 and older.  If the gender gap in literacy is indeed shaped by developmental factors, it may be important for our understanding of the phenomenon to scrutinize periods of the life cycle beyond the age of schooling.   

Another astonishing pattern emerged from the study of adult reading.  Participants were asked how often they read a book.  Of avid book readers (those who said they read a book once a week) in the youngest group (age 24 and younger), 59 percent were women and 41 percent were men.  By age 55, avid book readers were even more likely to be women, by a margin of 63 percent to 37 percent.  Two-thirds of respondents who said they never read books were men.  Women remained the more enthusiastic readers even as the test scores of men caught up with those of women and surpassed them.

A few years ago, Ian McEwan, the celebrated English novelist, decided to reduce the size of the library in his London townhouse.  He and his younger son selected thirty novels and took them to a local park.  They offered the books to passers-by.  Women were eager and grateful to take the books, McEwan reports.  Not a single man accepted.  The author’s conclusion? “When women stop reading, the novel will be dead.”[xiv] 

McEwan might be right, regardless of the origins of the gender gap in reading and the efforts to end it.



[i] J.B. Stroud and E.F. Lindquist, “Sex differences in achievement in the elementary and secondary schools,” Journal of Educational Psychology, vol. 33(9) (Washington, D.C.: American Psychological Association, 1942), 657-667.

[ii] Christina Hoff Sommers, The War Against Boys: How Misguided Feminism Is Harming Our Young Men (New York, NY: Simon & Schuster, 2000).

[iii] Christianne Corbett, Catherine Hill, and Andresse St. Rose, Where the Girls Are: The Facts About Gender Equity in Education (Washington, D.C.: American Association of University Women, 2008).

[iv] Richard Whitmire, Why Boys Fail: Saving Our Sons from an Educational System That’s Leaving Them Behind (New York, NY: AMACOM, 2010).

[v] Sara Mead, The Evidence Suggests Otherwise: The Truth About Boys and Girls (Washington, D.C.: Education Sector, 2006).

[vi] PIRLS and PISA assess different reading skills.  Performance on the two tests may not be comparable.

[vii] NAEP categories were aggregated to calculate the city/suburb difference.

[viii] OECD, Reading for Change: Performance and Engagement Across Countries (Paris: OECD, 2002), 125.

[ix] The best example of promoting Finnish education policies is Pasi Sahlberg’s  Finnish Lessons: What Can the World Learn from Educational Change in Finland? (New York: Teachers College Press, 2011).

[x] The 2009 endpoint was selected because 2012 data for the enjoyment index were not available on the NCES PISA data tool.

[xi] A formal name for the problem of reverse causality is endogeneity and for the problem of unobserved variables, omitted variable bias.

[xii] Christina Hoff Sommers, “The Boys at the Back,” New York Times, February 2, 2013;  Richard Whitmire, Why Boys Fail (New York: AMACOM, 2010), 153.

[xiii] J.L. Hawke, R.K. Olson, E.G. Willcutt, S.J. Wadsworth, & J.C. DeFries, “Gender ratios for reading difficulties,” Dyslexia 15(3), (Chichester, England: Wiley, 2009), 239–242.

[xiv] Daniel Zalewski, “The Background Hum: Ian McEwan’s art of unease,” The New Yorker, February 23, 2009. 


2015 Brown Center Report on American Education: How Well Are American Students Learning?


Editor's Note: The introduction to the 2015 Brown Center Report on American Education appears below. Use the Table of Contents to navigate through the report online, or download a PDF of the full report.

TABLE OF CONTENTS

Part I: Girls, Boys, and Reading

Part II: Measuring Effects of the Common Core

Part III: Student Engagement


INTRODUCTION

The 2015 Brown Center Report (BCR) represents the 14th edition of the series since the first issue was published in 2000.  It includes three studies.  Like all previous BCRs, the studies explore independent topics but share two characteristics: they are empirical and based on the best evidence available.  The studies in this edition are on the gender gap in reading, the impact of the Common Core State Standards for English Language Arts on reading achievement, and student engagement.

Part one examines the gender gap in reading.  Girls outscore boys on practically every reading test given to a large population.  And they have for a long time.  A 1942 Iowa study found girls performing better than boys on tests of reading comprehension, vocabulary, and basic language skills.  Girls have outscored boys at ages nine, 13, and 17 on every reading test ever given by the National Assessment of Educational Progress (NAEP), dating back to the first long-term trend test in 1971.  The gap is not confined to the U.S.  Reading tests administered as part of the Progress in International Reading Literacy Study (PIRLS) and the Program for International Student Assessment (PISA) reveal that the gender gap is a worldwide phenomenon.  In more than sixty countries participating in the two assessments, girls are better readers than boys. 

Perhaps the most surprising finding is that Finland, celebrated for its extraordinary performance on PISA for over a decade, can take pride in its high standing on the PISA reading test solely because of the performance of that nation’s young women.  With its 62 point gap, Finland has the largest gender gap of any PISA participant, with girls scoring 556 and boys scoring 494 points (the OECD average is 496, with a standard deviation of 94).   If Finland were only a nation of young men, its PISA ranking would be mediocre.

Part two is about reading achievement, too. More specifically, it’s about reading and the English Language Arts standards of the Common Core (CCSS-ELA).  It’s also about an important decision that policy analysts must make when evaluating public policies—the determination of when a policy begins. How can CCSS be properly evaluated? 

Two different indexes of CCSS-ELA implementation are presented, one based on 2011 data and the other on data collected in 2013.  In both years, state education officials were surveyed about their Common Core implementation efforts.  Because forty-six states originally signed on to the CCSS-ELA—and with at least forty still on track for full implementation by 2016—little variability exists among the states in terms of standards policy.  Of course, the four states that never adopted CCSS-ELA can serve as a small control group.  But variation is also found in how the states are implementing CCSS.  Some states are pursuing an array of activities and aiming for full implementation earlier rather than later.  Others have a narrow, targeted implementation strategy and are proceeding more slowly. 

The analysis investigates whether CCSS-ELA implementation is related to 2009-2013 gains on the fourth grade NAEP reading test.  The analysis cannot verify causal relationships between the two variables, only correlations.  States that have aggressively implemented CCSS-ELA (referred to as “strong” implementers in the study) show gains on the NAEP scale one to one and one-half points larger than those of non-adopters of the standards.  This association is similar in magnitude to an advantage found in a study of eighth grade math achievement in last year’s BCR.  Although positive, these effects are quite small.  When the 2015 NAEP results are released this winter, it will be important for the fate of the Common Core project to see if strong implementers of the CCSS-ELA can maintain their momentum.

Part three is on student engagement.  PISA tests fifteen-year-olds on three subjects—reading, math, and science—every three years.  It also collects a wealth of background information from students, including their attitudes toward school and learning.  When the 2012 PISA results were released, PISA analysts published an accompanying volume, Ready to Learn: Students’ Engagement, Drive, and Self-Beliefs, exploring topics related to student engagement.

Part three provides secondary analysis of several dimensions of engagement found in the PISA report.  Intrinsic motivation, the internal rewards that encourage students to learn, is an important component of student engagement.  National scores on PISA’s index of intrinsic motivation to learn mathematics are compared to national PISA math scores.  Surprisingly, the relationship is negative.  Countries with highly motivated kids tend to score lower on the math test; conversely, higher-scoring nations tend to have less-motivated kids. 

The same is true for responses to the statements, “I do mathematics because I enjoy it,” and “I look forward to my mathematics lessons.”  Countries with students who say that they enjoy math or look forward to their math lessons tend to score lower on the PISA math test compared to countries where students respond negatively to the statements.  These counterintuitive findings may be influenced by how terms such as “enjoy” and “looking forward” are interpreted in different cultures.  Within-country analyses address that problem.  The correlation coefficients for within-country, student-level associations of achievement and other components of engagement run in the anticipated direction: they are positive.  But they are also modest in size, with correlation coefficients of 0.20 or less. 

Policymakers are interested in questions requiring analysis of aggregated data—at the national level, that means between-country data.  When countries increase their students’ intrinsic motivation to learn math, is there a concomitant increase in PISA math scores?  Data from 2003 to 2012 are examined.  Seventeen countries managed to increase student motivation, but their PISA math scores fell an average of 3.7 scale score points.  Fourteen countries showed no change on the index of intrinsic motivation—and their PISA scores also evidenced little change.  Eight countries witnessed a decline in intrinsic motivation.  Inexplicably, their PISA math scores increased by an average of 10.3 scale score points.  Motivation down, achievement up.

Correlation is not causation.  Moreover, the absence of a positive correlation (or, in this case, the presence of a negative correlation) is not a refutation of a possible positive relationship.  The lesson here is not that policymakers should adopt the most effective way of stamping out student motivation.  The lesson is that the level of analysis matters when analyzing achievement data.  Policy reports must be read warily, especially those freely offering policy recommendations.  Beware of analyses that rely exclusively on within- or between-country test data without making any attempt to reconcile discrepancies at other levels of analysis.  Those analysts could be cherry-picking the data.  Also, consumers of education research should grant more credence to approaches modeling change over time (as in differences-in-differences models) than to cross-sectional analyses that only explore statistical relationships at a single point in time. 


High Achievers, Tracking, and the Common Core


A curriculum controversy is roiling schools in the San Francisco Bay Area.  In the past few months, parents in the San Mateo-Foster City School District, located just south of San Francisco International Airport, voiced concerns over changes to the middle school math program. The changes were brought about by the Common Core State Standards (CCSS).  Under previous policies, most eighth graders in the district took algebra I.  Some very sharp math students, who had already completed algebra I in seventh grade, took geometry in eighth grade. The new CCSS-aligned math program will reduce eighth grade enrollments in algebra I and eliminate geometry altogether as a middle school course. 

A little background information will clarify the controversy.  Eighth grade mathematics may be the single grade-subject combination most profoundly affected by the CCSS.  In California, the push for most students to complete algebra I by the end of eighth grade has been a centerpiece of state policy, as it has been in several states influenced by the “Algebra for All” movement that began in the 1990s.  Nationwide, in 1990, about 16 percent of all eighth graders reported that they were taking an algebra or geometry course.  By 2013, that share had tripled: nearly half of all eighth graders (48 percent) were taking algebra or geometry.[i]  When that percentage goes down, as it is sure to under the CCSS, what happens to high achieving math students?

The parents who are expressing the most concern have kids who excel at math.  One parent in San Mateo-Foster City told The San Mateo Daily Journal, “This is really holding the advanced kids back.”[ii]  The CCSS math standards recommend a single math course for seventh grade, integrating several math topics, followed by a similarly integrated math course in eighth grade.  Algebra I won’t be offered until ninth grade.  The San Mateo-Foster City School District decided to adopt a “three years into two” accelerated option.  This strategy is suggested on the Common Core website as an option that districts may consider for advanced students.  It combines the curriculum from grades seven through nine (including algebra I) into a two-year offering that students can take in seventh and eighth grades.[iii]  The district will also provide, at one school site, a sequence beginning in sixth grade that compacts four years of math into three.  Both accelerated options culminate in the completion of algebra I in eighth grade.

The San Mateo-Foster City School District is home to many well-educated, high-powered professionals who work in Silicon Valley.  They are unrelentingly liberal in their politics.  Equity is a value they hold dear.[iv]  They also know that completing at least one high school math course in middle school is essential for students who wish to take AP Calculus in their senior year of high school.  As CCSS is implemented across the nation, administrators in districts with demographic profiles similar to San Mateo-Foster City will face parents of mathematically precocious kids asking whether the “common” in Common Core mandates that all students take the same math course.  Many of those districts will respond to their constituents and provide accelerated pathways (“pathway” is CCSS jargon for course sequence). 

But other districts will not.  Data show that urban schools, schools with large numbers of black and Hispanic students, and schools located in impoverished neighborhoods are reluctant to differentiate curriculum.  It is unlikely that gifted math students in those districts will be offered an accelerated option under CCSS.  The reason why can be summed up in one word: tracking.

Tracking in eighth grade math means providing different courses to students based on their prior math achievement.  The term “tracking” has been stigmatized, coming under fire for being inequitable.  Historically, where tracking existed, black, Hispanic, and disadvantaged students were often underrepresented in high-level math classes; white, Asian, and middle-class students were often over-represented.  An anti-tracking movement gained a full head of steam in the 1980s.  Tracking reformers knew that persuading high schools to de-track was hopeless.  Consequently, tracking’s critics focused reform efforts on middle schools, urging that they group students heterogeneously with all students studying a common curriculum.  That approach took hold in urban districts, but not in the suburbs.

Now the Common Core and de-tracking are linked.  Providing an accelerated math track for high achievers has become a flashpoint throughout the San Francisco Bay Area.  An October 2014 article in The San Jose Mercury News named Palo Alto, Saratoga, Cupertino, Pleasanton, and Los Gatos as districts that have announced, in response to parent pressure, that they are maintaining an accelerated math track in middle schools.  These are high-achieving, suburban districts.  Los Gatos parents took to the internet with a petition drive when a rumor spread that advanced courses would end.  EdSource reports that 900 parents signed a petition opposing the move, and that board meetings on the issue were packed with opponents.  The accelerated track was kept.  Piedmont established a single track for everyone, but allowed parents to apply for an accelerated option.  About twenty-five percent did so.  The Mercury News story underscores the demographic pattern that is unfolding and asks whether CCSS “could cement a two-tier system, with accelerated math being the norm in wealthy areas and the exception elsewhere.”

What is CCSS’s real role here?  Does the Common Core take an explicit stand on tracking?  Not really.  But de-tracking advocates can interpret the “common” in Common Core as license to eliminate accelerated tracks for high achievers.  As a noted CCSS supporter (and tracking critic), William H. Schmidt, has stated, “By insisting on common content for all students at each grade level and in every community, the Common Core mathematics standards are in direct conflict with the concept of tracking.”[v]  Thus, tracking joins other controversial curricular ideas—e.g., integrated math courses instead of courses organized by content domains such as algebra and geometry; an emphasis on “deep,” conceptual mathematics over learning procedures and basic skills—as “dog whistles” embedded in the Common Core.  Controversial positions aren’t explicitly stated, but they can be heard by those who want to hear them.    

CCSS doesn’t have to take an outright stand on these debates in order to have an effect on policy.  For the practical questions that local grouping policies resolve (who takes what courses, and when they take them), CCSS wipes the slate clean.  There are plenty of people ready to write on that blank slate, particularly administrators frustrated by unsuccessful efforts to de-track in the past.

Suburban parents are mobilized in defense of accelerated options for advantaged students.  What about kids who are outstanding math students but also happen to be poor, black, or Hispanic?  What happens to them, especially if they attend schools in which the top institutional concern is meeting the needs of kids functioning several years below grade level?  I presented a paper on this question at a December 2014 conference held by the Fordham Institute in Washington, DC.  I proposed a pilot program of “tracking for equity.”  By that term, I mean offering black, Hispanic, and poor high achievers the same opportunity that the suburban districts in the Bay Area are offering.  High achieving middle school students in poor neighborhoods would be able to take three years of math in two years and proceed on a path toward AP Calculus as high school seniors.

It is true that tracking must be done carefully.  Tracking can be conducted unfairly and has been used unjustly in the past.  One of the worst consequences of earlier forms of tracking was that low-skilled students were tracked into dead end courses that did nothing to help them academically.  These low-skilled students were disproportionately from disadvantaged communities or communities of color.  That’s not a danger in the proposal I am making.  The default curriculum, the one every student would take if not taking the advanced track, would be the Common Core.  If that’s a dead end for low achievers, Common Core supporters need to start being more honest in how they are selling the CCSS.  Moreover, to ensure that the policy gets to the students for whom it is intended, I have proposed running the pilot program in schools predominantly populated by poor, black, or Hispanic students.  The pilot won’t promote segregation within schools because the sad reality is that participating schools are already segregated.

Since I presented the paper, I have privately received negative feedback from both Algebra for All advocates and Common Core supporters.  That’s disappointing.  Because of their animus toward tracking, some critics seem to support a severe policy swing from Algebra for All, which was pursued for equity, to Algebra for None, which will be pursued for equity.  It’s as if either everyone or no one should be allowed to take algebra in eighth grade.  The argument is that allowing only some eighth graders to enroll in algebra is elitist, even if the students in question are poor students of color who are prepared for the course and likely to benefit from taking it.

The controversy raises crucial questions about the Common Core.  What’s common in the Common Core?  Is it the curriculum?  And does that mean the same curriculum for all?  Will CCSS serve as a curricular floor, ensuring all students are exposed to a common body of knowledge and skills?  Or will it serve as a ceiling, limiting the progress of bright students so that their achievement looks more like that of their peers?  These questions will be answered differently in different communities, and as they are, the inequities that Common Core supporters think they’re addressing may surface again in a profound form.   



[i] Loveless, T. (2008). The 2008 Brown Center Report on American Education. Retrieved from http://www.brookings.edu/research/reports/2009/02/25-education-loveless. For San Mateo-Foster City’s sequence of math courses, see: page 10 of http://smfc-ca.schoolloop.com/file/1383373423032/1229222942231/1242346905166154769.pdf 

[ii] Swartz, A. (2014, November 22). “Parents worry over losing advanced math classes: San Mateo-Foster City Elementary School District revamps offerings because of Common Core.” San Mateo Daily Journal. Retrieved from http://www.smdailyjournal.com/articles/lnews/2014-11-22/parents-worry-over-losing-advanced-math-classes-san-mateo-foster-city-elementary-school-district-revamps-offerings-because-of-common-core/1776425133822.html

[iii] Swartz, A. (2014, December 26). “Changing Classes Concern for parents, teachers: Administrators say Common Core Standards Reason for Modifications.” San Mateo Daily Journal. Retrieved from http://www.smdailyjournal.com/articles/lnews/2014-12-26/changing-classes-concern-for-parents-teachers-administrators-say-common-core-standards-reason-for-modifications/1776425135624.html

[iv] In the 2014 election, Jerry Brown (D) took 75% of Foster City’s votes for governor.  In the 2012 presidential election, Barack Obama received 71% of the vote. http://www.city-data.com/city/Foster-City-California.html

[v] Schmidt, W. H., and Burroughs, N. A. (2012). “How the Common Core Boosts Quality and Equality.” Educational Leadership, December 2012/January 2013. Vol. 70, No. 4, pp. 54-58.


40 years later: America’s energy path and the road ahead

In a 1976 Foreign Affairs article, Amory Lovins offered a novel, and controversial, vision for America’s energy strategy. With U.S. security and energy independence threatened by oil market instability, Lovins urged policymakers to move away from fossil fuels and nuclear power and toward efficiency and renewable energy. This “soft energy path,” he argued, offered a myriad of clear…


The post-Paris clean energy landscape: Renewable energy in 2016 and beyond

Last year’s COP21 summit saw global economic powers and leading greenhouse gas emitters—including the United States, China, and India—commit to the most ambitious clean energy targets to date. Bolstered by sharp reductions in costs and supportive government policies, renewable power spread globally at its fastest-ever rate in 2015, accounting for more than half of the…


The presidential candidates’ views on energy and climate

This election cycle, what will separate Democrats from Republicans on energy policy and their approach to climate change? Republicans tend to be fairly strong supporters of the fossil fuel industry, and to various degrees deny that climate change is occurring. Democratic candidates emphasize the importance of further expanding the share of renewable energy at the…


India’s energy and climate policy: Can India meet the challenge of industrialization and climate change?

In Paris this past December, 195 nations came to a historic agreement to reduce carbon emissions and limit the devastating impacts of climate change. While it was indeed a triumphant event worthy of great praise, these nations now face the daunting task of achieving their intended climate goals. For many developing…


The halfway point of the U.S. Arctic Council chairmanship

On April 24, 2015, the United States assumed chairmanship of the Arctic Council for a two-year term. Over the course of the last year, the United States has outlined plans within three central priorities: improving economic and living conditions for Arctic communities; Arctic Ocean safety, security, and stewardship; and addressing the impacts of climate change.…


6 years from the BP Deepwater Horizon oil spill: What we’ve learned, and what we shouldn’t misunderstand

Six years ago today, the BP Deepwater Horizon oil spill occurred in the U.S. Gulf of Mexico with devastating effects on the local environment and on public perception of offshore oil and gas drilling. The blowout sent toxic fluids and gas shooting up the well, leading to an explosion on board the rig that killed…


When the champagne is finished: Why the post-Paris parade of climate euphoria is largely premature

The new international climate change agreement has received largely positive reviews, even though many years of hard work will be required to actually turn “Paris” into a success. As with all international agreements, the Paris agreement, too, will have to be tested and proven over time.


COP 21 at Paris: The issues, the actors, and the road ahead on climate change

At the end of the month, governments from nearly 200 nations will convene in Paris, France, for the 21st annual U.N. climate conference (COP21). Expectations are high for COP21 as leaders aim to achieve, for the first time in over 20 years, a legally binding and universal agreement on limiting global temperature increases. Ahead of this…


Does decarbonization mean de-coalification? Discussing carbon reduction policies

In September, the Energy Security and Climate Initiative (ESCI) at Brookings held the third meeting of its Coal Task Force (CTF), during which participants discussed the dynamics of three carbon policy instruments: performance standards, cap and trade, and a carbon tax. The dialogue revolved around lessons learned from implementing these policy mechanisms, especially as they…


Lessons from energy transitions in Germany and Japan

As the United Nations Conference on Climate Change in Paris approaches, countries around the world are looking for ways to lower carbon emissions. Germany and Japan are both undertaking dramatic transitions in their electricity sectors, moving away from nuclear energy and deploying more renewable power. Germany has set an ambitious goal of 80 to 95…


American workers’ safety net is broken. The COVID-19 crisis is a chance to fix it.

The COVID-19 pandemic is forcing some major adjustments to many aspects of our daily lives that will likely remain long after the crisis recedes: virtual learning, telework, and fewer hugs and handshakes, just to name a few. But in addition, let’s hope the crisis also drives a permanent overhaul of the nation’s woefully inadequate worker…


COVID-19 is turning the Midwest’s long legacy of segregation deadly

The COVID-19 pandemic is unmasking a lot of ugly economic and social truths across the Midwest, especially in my home state of Michigan. The appearance of a good economy in the Midwest following the Great Recession (which hit the region very hard) was a bit of an illusion. Prior to the arrival of the coronavirus,…


Most business incentives don’t work. Here’s how to fix them.

In 2017, the state of Wisconsin agreed to provide $4 billion in state and local tax incentives to the electronics manufacturing giant Foxconn. In return, the Taiwan-based company promised to build a new plant in the state to manufacture flat-screen television displays and to create 13,000 new jobs. It didn’t happen. Those 13,000…


How Promise programs can help former industrial communities

The nation is seeing accelerating gaps in economic opportunity and prosperity between more educated, tech-savvy, knowledge workers congregating in the nation’s “superstar” cities (and a few university-town hothouses) and residents of older industrial cities and the small towns of “flyover country.” These growing divides are shaping public discourse, as policymakers and thought leaders advance recipes…


What do Midwest working-class voters want and need?

If Donald Trump ends up facing off against Joe Biden in 2020, it will be portrayed as a fight for the hearts and souls of white working-class voters in Pennsylvania, Wisconsin, and my home state of Michigan. But what do these workers want and need? The President and his allies on the right offer a…


As the venture capital game gets bigger, the Midwest keeps missing out

Those working to accelerate economic growth in the Heartland must face some stark realities. The Great Lakes region continues to export wealth to coastal economies, even as investment leaders try to equalize growth between the coasts and the Heartland. The region sees only a tiny fraction of venture capital (VC) deals, despite producing one quarter…


The Impact of Domestic Drones on Privacy, Safety and National Security

Legal and technology experts hosted a policy discussion at Brookings on April 4, 2012, on how drones and forthcoming Federal Aviation Administration regulations on unmanned aerial vehicles will affect Americans’ privacy and safety and the country’s overall security. The event followed a new aviation bill, signed in February, which will open domestic skies to “unmanned aircraft…


Targeted Killing in U.S. Counterterrorism Strategy and Law

The following is part of the Series on Counterterrorism and American Statutory Law, a joint project of the Brookings Institution, the Georgetown University Law Center, and the Hoover Institution.

Introduction

It is a slight exaggeration to say that Barack Obama is the first president in American history to have run in part on a political…