
Three keys to reforming government: Lessons from repairing the VA


On June 20, I moderated a conversation on the future of the Department of Veterans Affairs with Secretary Robert McDonald. When he took office almost two years ago, Secretary McDonald inherited an organization in crisis: too many veterans faced shockingly long wait-times before they received care, VA officials had allegedly falsified records, and other allegations of mismanagement abounded.

Photo: Paul Morigi

Since he was sworn into office, Secretary McDonald has led the VA through a period of ambitious reform, anchored by the MyVA program. He and his team have embraced three core strategies that are securing meaningful change. They are important insights for all government leaders, and private sector ones as well.

1. Set bold goals

Secretary McDonald’s vision is for the VA to become the number one customer-service agency in the federal government. But he and his team know that words alone won’t make this happen. They developed twelve breakthrough priorities for 2016 that will directly improve service to veterans. These actionable short-term objectives support the VA’s longer term aim to deliver an exceptional experience for our veterans. By aiming high, and also drafting a concrete roadmap, the VA has put itself on a path to success.

2. Hybridize the best of public and private sectors

To accomplish their ambitious goal, VA leadership is applying the best practices of customer-service businesses around the nation. The Secretary and his colleagues are leveraging the goodwill, resources, and expertise of both the private and public sectors. To do that, the VA has brought together diverse groups of business leaders, medical professionals, government executives, and veteran advocates under its umbrella MyVA Advisory Committee. Following the examples set by private sector leaders in service provision and innovation, the VA is developing user-friendly mobile apps for veterans, modernizing its website, and seeking to make hiring practices faster, more competitive, and more efficient. And so that no good idea is left unheard, the VA has created a “shark tank” to capture and enact suggestions and recommendations for improvement from the folks who best understand daily VA operations: VA employees themselves.

3. Data, data, data

The benefits of data-driven decision making in government are well known. Led by Secretary McDonald, the VA has continued to embrace the use of data to inform its policies and improve its performance. Already a leader in the collection and publication of data, the VA has recently taken even greater strides in sharing information between its healthcare delivery agencies. In addition to collecting administrative and health-outcomes information, the VA is gathering data from veterans about what they think. Automated kiosks allow veterans to check in for appointments and to record their level of satisfaction with the services provided.

The results that the Secretary and his team have achieved speak for themselves:

  • 5 million more appointments completed last fiscal year over the previous fiscal year
  • 7 million additional hours of care for veterans in the last two years (based on an increase in the clinical workload of 11 percent over the last two years)
  • 97 percent of appointments completed within 30 days of the veteran’s preferred date; 86 percent within 7 days; 22 percent the same day
  • Average wait times of 5 days for primary care, 6 days for specialty care, and 2 days for mental health care
  • 90 percent of veterans say they are satisfied or completely satisfied with when they got their appointment (less than 3 percent said they were dissatisfied or completely dissatisfied).
  • The backlog for disability claims—once over 600,000 claims that were more than 125 days old—is down almost 90 percent.

Thanks to Secretary McDonald’s continued commitment to modernization, the VA has made significant progress. Problems, of course, remain at the VA and the Secretary has more work to do to ensure America honors the debt it owes its veterans, but the past two years of reform have moved the Department in the right direction. His strategies are instructive for managers of change everywhere.

Fred Dews and Andrew Kenealy contributed to this post.


The Iran deal, one year out: What Brookings experts are saying


How has the Joint Comprehensive Plan of Action (JCPOA)—signed between the P5+1 and Iran one year ago—played out in practice? Several Brookings scholars, many of whom participated prominently in debates last year surrounding official congressional review, offered their views.

Strobe Talbott, President, Brookings Institution:

At the one-year mark, it’s clear that the nuclear agreement between Iran and the major powers has substantially restricted Tehran’s ability to produce the fissile material necessary to build a bomb. That’s a net positive—for the United States and the broader region.

Robert Einhorn, Senior Fellow, Center for 21st Century Security and Intelligence and Senior Fellow, Arms Control and Non-Proliferation Initiative, Foreign Policy program:

One year after its conclusion, the JCPOA remains controversial in Tehran and Washington (as I describe in more detail here), with opponents unreconciled to the deal and determined to derail it. But opponents have had to scale back their criticism, in large part because the JCPOA, at least so far, has delivered on its principal goal—blocking Iran’s path to nuclear weapons for an extended period of time. Moreover, Iran’s positive compliance record has not given opponents much ammunition. The IAEA found Iran in compliance in its two quarterly reports issued in 2016.

But challenges to the smooth operation and even the longevity of the deal are already apparent.

A real threat to the JCPOA is that Iran will blame the slow recovery of its economy on U.S. failure to conscientiously fulfill its sanctions relief commitments and, using that as a pretext, will curtail or even end its own implementation of the deal. But international banks and businesses have been reluctant to engage Iran not because they have been discouraged by the United States but because they have their own business-related reasons to be cautious. Legislation proposed in Congress could also threaten the nuclear deal. 

For now, the administration is in a position to block new legislation that it believes would scuttle the deal. But developments outside the JCPOA, especially Iran’s regional behavior and its crackdown on dissent at home, could weaken support for the JCPOA within the United States and give proponents of deal-killing legislation a boost. 

A potential wildcard for the future of the JCPOA is the coming governing transitions in both Washington and Tehran. Hillary Clinton would maintain the deal but perhaps take a harder line than her predecessor. Donald Trump now says he will re-negotiate rather than scrap the deal, but a better deal will not prove negotiable. With President Hassan Rouhani up for re-election next year and the health of the Supreme Leader questionable, Iran’s future policy toward the JCPOA cannot be confidently predicted.

A final verdict on the JCPOA is many years away. But it is off to a promising start, as even some of its early critics now concede. Still, it is already clear that the path ahead will not always be smooth, the longevity of the deal cannot be taken for granted, and keeping it on track will require constant focus in Washington and other interested capitals. 

Suzanne Maloney, Deputy Director, Foreign Policy program and Senior Fellow, Center for Middle East Policy, Foreign Policy program:

The Joint Comprehensive Plan of Action has fulfilled neither the worst fears of its detractors nor the most soaring ambitions of its proponents. All of the concerns that have shaped U.S. policy toward Tehran for more than a generation—terrorism, human rights abuses, weapons of mass destruction, regional destabilization—remain as relevant, and as alarming, as they have ever been. Notably, much the same is true on the Iranian side; the manifold grievances that Tehran has harbored toward Washington since the 1979 revolution continue to smolder.

An important truth about the JCPOA, which has been wielded by both its defenders and its detractors in varying contexts, is that it was transactional, not transformational. As President Barack Obama repeatedly insisted, the accord addressed one specific problem, and in those narrow terms, it can be judged a relative success. The value of that relative success should not be underestimated; a nuclear-armed Iran would magnify risks in a turbulent region in a terrible way. 

But in the United States, in Iran, and across the Middle East, the agreement has always been viewed through a much broader lens—as a waystation toward Iranian-American rapprochement, as an instrument for addressing the vicious cycle of sectarian violence that threatens to consume the region, as a boost to the greater cause of moderation and democratization in Iran. And so the failure of the deal to catalyze greater cooperation from Iran on a range of other priorities—Syria, Yemen, Iraq, to name a few—or to jumpstart improvements in Iran’s domestic dynamics cannot be disregarded simply because it was not its original intent. 

For the “new normal” of regularized diplomatic contact between Washington and Tehran to yield dividends, the United States will need a serious strategy toward Tehran that transcends the JCPOA, building on the efficacy of the hard-won multilateral collaboration on the nuclear issue. Iranians, too, must begin to pivot the focus of their efforts away from endless litigation of the nuclear deal and toward a more constructive approach to addressing the deep challenges facing their country today. 

Bruce Riedel, Senior Fellow, Center for Middle East Policy and Center for 21st Century Security and Intelligence and Director, Intelligence Project, Foreign Policy program:

As I explain more fully here, one unintended but very important consequence of the Iran nuclear deal has been to aggravate and intensify Saudi Arabia's concerns about Iran's regional goals and intentions. This fueling of Saudi fears has in turn fanned sectarian tensions in the region to unprecedented levels, and the results are likely to haunt the region for years to come.

Riyadh's concerns about Iran have never been primarily focused on the nuclear danger. Rather, the key Saudi concern is that Iran seeks regional hegemony and uses terrorism and subversion to achieve it. The deal deliberately does not deal with this issue. In Saudi eyes, it actually makes the situation worse because lifting sanctions removed Iran's isolation as a rogue state and gives it more income. 

Washington has tried hard to reassure the Saudis, and President Obama has wisely sought to build confidence with King Salman and his young son. The Iran deal is a good one, and I've supported it from its inception. But it has had consequences that are dangerous and alarming. In the end, Riyadh and Tehran are the only players who can deescalate the situation—the Saudis show no sign of interest in that road. 

Norman Eisen, Visiting Fellow, Governance Studies:

The biggest disappointment of the post-deal year has been the failure of Congress to pass legislation complementing the JCPOA. There is a great deal that the legislative branch could do to support the pact. Above all, it could establish criteria putting teeth into U.S. enforcement of Preamble Section III, Iran's pledge never to seek nuclear weapons. Congress could and should make clear what the ramp to seeking nuclear weapons would look like, what the triggers would be for U.S. action, and what kinds of U.S. action would be on the table. If Iran knows that, it will modulate its behavior accordingly. If it does not, it will start to act out, and we have just kicked the can down the road. That delay is of course immensely valuable—but why not extend the road indefinitely? Congress can do that, and much more (e.g. by increasing funding for JCPOA oversight by the administration and the IAEA), with appropriate legislation.

Richard Nephew, Nonresident Senior Fellow, Center for 21st Century Security and Intelligence, Arms Control and Non-Proliferation Initiative, Foreign Policy program:

Over the past year, much effort has gone into ensuring that the Iran deal is fully implemented. To date, the P5+1 has—not surprisingly—gotten the better end of the bargain, with significant security benefits accruing to them and their partners in the Middle East once the International Atomic Energy Agency (IAEA) verified the required changes to Iran's nuclear program. Iran, for its part, has experienced a natural lag in its economic resurgence, held back by the collapse in oil prices in 2014, residual American and European sanctions, and reluctance among banks and businesses to re-engage.

But Iran's economy has stabilized and, if the deal holds for its full measure, the security benefits that the P5+1 and their partners have won may fall away while Iran's economy continues to grow. The most important challenge related to the deal for the next U.S. administration (and, presumably, the Rouhani administration in its second term) is therefore: how can it be taken forward, beyond the 10- to 15-year transition period? Iran will face internal pressure to expand its nuclear program, but it also will face pressure to refrain, both externally and internally, should other countries in the region seek to create their own matching nuclear capabilities.

The best next step for all sides is to negotiate a region-wide arrangement to manage nuclear programs, one that constrains all sides, though perhaps not equally. It must ensure, at a minimum, that nuclear developments in the region are predictable, understandable, and credibly civilian (something Bob Einhorn and I addressed in a recent report). The next White House will need to do the hard work of convincing countries in the region, and beyond, not to rest on the victory of the JCPOA. Rather, they must take it for what it is: another step towards a more stable and manageable region.

Tamara Wittes, Senior Fellow and Director, Center for Middle East Policy, Foreign Policy program:

This week, Washington is awash in events and policy papers taking stock of how the Iran nuclear deal has changed the Middle East in the past year. The narratives presented this week largely track the positions that the authors, speakers, or organizations articulated on the nuclear deal when it was first concluded last summer. Those who opposed the deal have marshaled evidence of how the deal has "emboldened" Iran's destabilizing behavior, while those who supported the deal cite evidence of "moderated" politics in the Islamic Republic. That polarized views on the deal last year produce polarized assessments of the deal's impact this year should surprise no one.

In fact, whichever side of the debate over the nuclear agreement’s worth it takes, much of the analysis out this week ascribes to the nuclear deal Iranian behavior and attitudes in the region that existed before the deal's conclusion and implementation. Iran has been a revisionist state, and a state sponsor of terrorism, since the 1979 Islamic Revolution. The Saudi-Iranian rivalry predates the revolution; Iran's backing of Houthi militias against Saudi Arabia and its allies in Yemen well predates the nuclear agreement. Most notably, the upheavals in the Arab world since 2011 have given Iran wider opportunities than perhaps ever before to exploit the cracks within Arab societies, and to use cash, militias, and other tools to advance its interests and expand its influence. Iran has exploited those opportunities skillfully in the last five years and, as I wrote last summer, was likely to continue to do so regardless of diplomatic success or failure in Vienna. To argue that the nuclear deal somehow created these problems, or could solve them, is ahistorical.

It is true that Iran's access to global markets might free even more cash for these endeavors, and that is a real issue worth tracking. But since severe sanctions did not prevent Iran from spending hundreds of millions of dollars to support and supply Hezbollah, or marshaling Islamic Revolutionary Guard Corps (IRGC) and militia fighters to sustain the faltering regime of Bashar Assad in Syria, it's not clear that additional cash will generate a meaningful difference in regional outcomes. Certainly, the nuclear deal's conclusion and implementation did not alter the trajectory of Iranian policy in Yemen, Iraq, Syria, or Lebanon to any noticeable degree. That means that, no matter what the merits or dangers of the JCPOA, the United States must still confront and work to resolve enduring challenges to regional stability, including Iran's revisionist behavior.

Kenneth M. Pollack, Senior Fellow, Center for Middle East Policy, Foreign Policy program: 

When the JCPOA was being debated last year, I felt that the terms of the deal were far less consequential than how the United States responded to Iranian regional behavior after a deal was signed. I see the events of the past 12 months as largely having borne that out. While both sides have accused the other of "cheating," the deal has so far largely held. However, as many of my colleagues have noted, the real frictions have arisen from the U.S. geostrategic response to the deal.

I continue to believe that signing the JCPOA was better than any of the realistic alternatives—though I also continue to believe that a better deal was possible, had the administration handled the negotiations differently. However, the administration’s regional approach since then has been problematic—with officials condemning Riyadh and excusing Tehran in circumstances where both were culpable and ignoring some major Iranian transgressions, for instance (and with President Obama gratuitously insulting the Saudis and other U.S. allies in interviews). 

America's traditional Sunni Arab allies (and to some extent Turkey and Israel) feared that the United States would use the JCPOA as an excuse either to further disengage from the region or to switch sides and join the Iranian coalition. Their reading of events has been that this is precisely what has happened, and it is causing the GCC states to act more aggressively.

I think our traditional allies would enthusiastically welcome a Hillary Clinton presidency. She would likely do all that she could to reassure them that she plans to be more engaged and more willing to commit American resources and energy to Middle Eastern problems. But those allies will eventually look for her to turn words into action. I cannot imagine a Hillary Clinton administration abrogating the JCPOA, imposing significant new economic sanctions on Iran, or otherwise acting in ways that it would fear could provoke Tehran to break the deal. Our allies may see that as Washington trying to remain on the fence, which will infuriate them. 

So there are some important strategic differences between the United States and its regional allies. The second anniversary of the JCPOA could therefore prove even more fraught for America and the Middle East than the first. 


       





Refugees: Why Seeking Asylum is Legal and Australia’s Policies are Not

      
 
 





Australia’s Asylum Bill is High-Handed and Cambodia Deal Just a Quick Fix

      
 
 





One Step Forward, Many Steps Back for Refugees

      
 
 





Australia’s Obligations Still Apply Despite High Court Win

      
 
 





Migration with dignity – climate change and Kiribati

      
 
 





Principles for Transparency and Public Participation in Redistricting

Scholars from the Brookings Institution and the American Enterprise Institute are collaborating to promote transparency in redistricting. In January 2010, an advisory board of experts and representatives of good government groups was convened in order to articulate principles for transparent redistricting and to identify barriers to the public and communities who wish to create redistricting…

      
 
 





Using Crowd-Sourced Mapping to Improve Representation and Detect Gerrymanders in Ohio

Analysis of dozens of publicly created redistricting plans shows that map-making technology can improve political representation and detect a gerrymander.  In 2012, President Obama won the vote in Ohio by three percentage points, while Republicans held a 13-to-5 majority in Ohio’s delegation to the U.S. House. After redistricting in 2013, Republicans held 12 of Ohio’s…

      
 
 





Terrorists and Detainees: Do We Need a New National Security Court?

In the wake of the 9/11 attacks and the capture of hundreds of suspected al Qaeda and Taliban fighters, we have been engaged in a national debate as to the proper standards and procedures for detaining “enemy combatants” and prosecuting them for war crimes. Dissatisfaction with the procedures established at Guantanamo for detention decisions and…

       





Targeted Killing in U.S. Counterterrorism Strategy and Law

The following is part of the Series on Counterterrorism and American Statutory Law, a joint project of the Brookings Institution, the Georgetown University Law Center, and the Hoover Institution. Introduction: It is a slight exaggeration to say that Barack Obama is the first president in American history to have run in part on a political…

       





The Impact of Domestic Drones on Privacy, Safety and National Security

Legal and technology experts hosted a policy discussion on how drones and forthcoming Federal Aviation Administration regulations on unmanned aerial vehicles will affect Americans’ privacy, safety and the country’s overall security on April 4, 2012 at Brookings. The event followed a new aviation bill, signed in February, which will open domestic skies to “unmanned aircraft…

       





COVID-19 is turning the Midwest’s long legacy of segregation deadly

The COVID-19 pandemic is unmasking a lot of ugly economic and social truths across the Midwest, especially in my home state of Michigan. The appearance of a good economy in the Midwest following the Great Recession (which hit the region very hard) was a bit of an illusion. Prior to the arrival of the coronavirus,…

       





American workers’ safety net is broken. The COVID-19 crisis is a chance to fix it.

The COVID-19 pandemic is forcing some major adjustments to many aspects of our daily lives that will likely remain long after the crisis recedes: virtual learning, telework, and fewer hugs and handshakes, just to name a few. But in addition, let’s hope the crisis also drives a permanent overhaul of the nation’s woefully inadequate worker…

       





Lessons from energy transitions in Germany and Japan

As the United Nations Conference on Climate Change in Paris approaches, countries around the world are looking for ways to lower carbon emissions. Germany and Japan are both undertaking dramatic transitions in their electricity sectors, moving away from nuclear energy and deploying more renewable power. Germany has set an ambitious goal of 80 to 95…

       





When the champagne is finished: Why the post-Paris parade of climate euphoria is largely premature

The new international climate change agreement has received largely positive reviews despite the fact that many years of hard work will be required to actually turn “Paris” into a success. As with all international agreements, the Paris agreement too will have to be tested and proven over time. The Eiffel Tower is engulfed in fog…

       





6 years from the BP Deepwater Horizon oil spill: What we’ve learned, and what we shouldn’t misunderstand

Six years ago today, the BP Deepwater Horizon oil spill occurred in the U.S. Gulf of Mexico with devastating effects on the local environment and on public perception of offshore oil and gas drilling. The blowout sent toxic fluids and gas shooting up the well, leading to an explosion on board the rig that killed…

       





The halfway point of the U.S. Arctic Council chairmanship

On April 24, 2015, the United States assumed chairmanship of the Arctic Council for a two-year term. Over the course of the last year, the United States has outlined plans within three central priorities: improving economic and living conditions for Arctic communities; Arctic Ocean safety, security, and stewardship; and addressing the impacts of climate change.…

       





India’s energy and climate policy: Can India meet the challenge of industrialization and climate change?

In Paris this past December, 195 nations came to a historic agreement to reduce carbon emissions and limit the devastating impacts of climate change. While it was indeed a triumphant event worthy of great praise, these nations are now faced with the daunting task of having to achieve their intended climate goals. For many developing…

       





The presidential candidates’ views on energy and climate

This election cycle, what will separate Democrats from Republicans on energy policy and their approach to climate change? Republicans tend to be fairly strong supporters of the fossil fuel industry, and to various degrees deny that climate change is occurring. Democratic candidates emphasize the importance of further expanding the share of renewable energy at the…

       





The post-Paris clean energy landscape: Renewable energy in 2016 and beyond

Last year’s COP21 summit saw global economic powers and leading greenhouse gas emitters—including the United States, China, and India—commit to the most ambitious clean energy targets to date. Bolstered by sharp reductions in costs and supportive government policies, renewable power spread globally at its fastest-ever rate in 2015, accounting for more than half of the…

       





40 years later: America’s energy path and the road ahead

In a 1976 Foreign Affairs article, Amory Lovins offered a novel—and controversial—vision for America’s energy strategy. With U.S. security and energy independence threatened by oil market instability, Lovins urged policymakers to move away from fossil fuels and nuclear and towards efficiency and renewable energy. This “soft energy path,” he argued, offered a myriad of clear…

       





Girls, boys, and reading


Part I of the 2015 Brown Center Report on American Education

Girls score higher than boys on tests of reading ability.  They have for a long time.  This section of the Brown Center Report assesses where the gender gap stands today and examines trends over the past several decades.  The analysis also extends beyond the U.S. and shows that boys’ reading achievement lags that of girls in every country in the world on international assessments.  The international dimension, recognizing that the U.S. is not alone in this phenomenon, serves as a catalyst to discuss why the gender gap exists and whether it extends into adulthood.

Background

One of the earliest large-scale studies on gender differences in reading, conducted in Iowa in 1942, found that girls in both elementary and high schools were better than boys at reading comprehension.[i] The most recent results from reading tests of the National Assessment of Educational Progress (NAEP) show girls outscoring boys at every grade level and age examined.  Gender differences in reading are not confined to the United States.  Among younger children (age nine to ten, or about fourth grade), girls consistently outscore boys on international assessments, from a pioneering study of reading comprehension conducted in fifteen countries in the 1970s, to the results of the Progress in International Reading Literacy Study (PIRLS) conducted in forty-nine nations and nine benchmarking entities in 2011.  The same is true for students in high school.  On the 2012 reading literacy test of the Program for International Student Assessment (PISA), worldwide gender gaps are evident between fifteen-year-old males and females.

As the 21st century dawned, the gender gap came under the scrutiny of reporters and pundits.  Author Christina Hoff Sommers added a political dimension to the gender gap and, some say, swept the topic into the culture wars raging at the time, with her 2000 book The War Against Boys: How Misguided Feminism is Harming Our Young Men.[ii] Sommers argued that boys’ academic inferiority, and in particular their struggles with reading, stemmed from the feminist movement’s impact on schools and society.  In the second edition, published in 2013, she changed the subtitle to How Misguided Policies Are Harming Our Young Men.  Some of the sting is removed from the indictment of “misguided feminism.”  But not all of it.  Sommers singles out for criticism a 2008 report from the American Association of University Women.[iii] That report sought to debunk the notion that boys fared poorly in school compared to girls.  It left out a serious discussion of boys’ inferior performance on reading tests, as well as their lower grade point averages, greater rate of school suspension and expulsion, and lower rate of acceptance into college.

Journalist Richard Whitmire picked up the argument about the gender gap in 2010 with Why Boys Fail: Saving Our Sons from an Educational System That’s Leaving Them Behind.[iv] Whitmire sought to separate boys’ academic problems from the culture wars, noting that the gender gap in literacy is a worldwide phenomenon and appears even in countries where feminist movements are weak to nonexistent.  Whitmire offers several reasons for boys’ low reading scores, including poor reading instruction (particularly a lack of focus on phonics), and too few books appealing to boys’ interests.  He also dismisses several explanations that are in circulation, among them, video games, hip-hop culture, too much testing, and feminized classrooms.  As with Sommers’s book, Whitmire’s culprit can be found in the subtitle: the educational system.  Even if the educational system is not the original source of the problem, Whitmire argues, schools could be doing more to address it. 

In a 2006 monograph, education policy researcher Sara Mead took on the idea that American boys were being shortchanged by schools.  After reviewing achievement data from NAEP and other tests, Mead concluded that the real story of the gender gap wasn’t one of failure at all.  Boys and girls were both making solid academic progress, but in some cases, girls were making larger gains, misleading some commentators into concluding that boys were being left behind.  Mead concluded, “The current boy crisis hype and the debate around it are based more on hopes and fears than on evidence.”[v]

Explanations for the Gender Gap

The analysis below focuses on where the gender gap in reading stands today, not its causes.  Nevertheless, readers should keep in mind the three most prominent explanations for the gap.  They will be used to frame the concluding discussion.

Biological/Developmental:  Even before attending school, young boys evidence more problems in learning how to read than girls.  This explanation holds that the sexes are hard-wired differently for literacy.

School Practices: Boys are inferior to girls on several school measures (behavioral, social, and academic), and those discrepancies extend all the way through college.  This explanation holds that even if schools do not create the gap, they certainly don’t do what they could to ameliorate it.

Cultural Influences: Cultural influences steer boys toward non-literary activities (sports, music) and define literacy as a feminine characteristic.  This explanation holds that cultural cues and strong role models could help close the gap by portraying reading as a masculine activity.

The U.S. Gender Gap in Reading

Table 1-1 displays the most recent data from eight national tests of U.S. achievement.  The first group shows results from the National Assessment of Educational Progress Long Term Trend (NAEP-LTT), given to students nine, 13, and 17 years of age.  The NAEP-LTT in reading was first administered in 1971.  The second group of results is from the NAEP Main Assessment, which began testing reading achievement in 1992.  It assesses at three different grade levels: fourth, eighth, and twelfth.   The last two tests are international assessments in which the U.S. participates, the Progress in International Reading Literacy Study (PIRLS), which began in 2001, and the Program for International Student Assessment (PISA), first given in 2000.  PIRLS tests fourth graders, and PISA tests 15-year-olds.  In the U.S., 71 percent of students who took PISA in the fall of 2012 were in tenth grade. 

Two findings leap out.  First, the test score gaps between males and females are statistically significant on all eight assessments.  Because the sample sizes of the assessments are quite large, statistical significance does not necessarily mean that the gaps are of practical significance—or even noticeable if one observed several students reading together.  The tests also employ different scales.  The final column in the table expresses the gaps in standard deviation units, a measure that allows for comparing the different scores and estimating their practical meaningfulness.

The second finding is based on the standardized gaps (expressed in SDs).  On both NAEP tests, the gaps are narrower among elementary students and wider among middle and high school students.  That pattern also appears on international assessments.  The gap is twice as large on PISA as on PIRLS.[vi]  A popular explanation for the gender gap involves the different maturation rates of boys and girls.  That theory will be discussed in greater detail below, but at this point in the analysis, let’s simply note that the gender gap appears to grow until early adolescence—age 13 on the LTT-NAEP and grade eight on the NAEP Main.

Should these gaps be considered small or large?  Many analysts consider 10 scale score points on NAEP equal to about a year of learning.  In that light, gaps of five to 10 points appear substantial.  But compared to other test score gaps on NAEP, the gender gap is modest in size.  On the 2012 LTT-NAEP for nine-year-olds, the five point gap between boys and girls is about one-half of the 10 point gap between students living in cities and those living in suburbs.[vii]  The gap between students who are eligible for free and reduced lunch and those who are not is 28 points; between black and white students, it is 23 points; and between English language learners (ELL) and non-ELL students, it is 34 points. 

Table 1-1 only shows the size of the gender gap as gauged by assessments at single points in time. For determining trends, let’s take a closer look at the LTT-NAEP, since it provides the longest running record of the gender gap. In Table 1-2, scores are displayed from tests administered since 1971 and given nearest to the starts and ends of decades. Results from 2008 and 2012 are both shown to provide readers an idea of recent fluctuations. At all three ages, gender gaps were larger in 1971 than they are today. The narrowing at age nine is statistically significant; the narrowing at age 13 (p=0.10) and at age 17 (p=0.07) falls just short of significance. Slight shrinkage occurred in the 1980s, but the gaps expanded again in the 1990s. The gap at age 13 actually peaked at 15 scale score points in 1994 (not shown in the table), and the decline since then is statistically significant. Similarly, the gap at age 17 peaked in 1996 at 15 scale score points, and the decline since then is also statistically significant. More recently, the gap at age nine began to shrink again in 1999, the gap at age 13 began shrinking in the 2000s, and the gap at age 17 in 2012.

Table 1-3 decomposes the change figures by male and female performance.  Sara Mead’s point, that the NAEP story is one of both sexes gaining rather than boys falling behind, is even truer today than when she made it in 2006.  When Mead’s analysis was published, the most recent LTT-NAEP data were from 2004.  Up until then, girls had made greater reading gains than boys.  But that situation has reversed.  Boys have now made larger gains over the history of LTT-NAEP, fueled by the gains that they registered from 2004 to 2012.  The score for 17-year-old females in 2012 (291) was identical to their score in 1971.

International Perspective

The United States is not alone in reading’s gender gap.  Its gap of 31 points is not even the largest (see Figure 1-1). On the 2012 PISA, all OECD countries exhibited a gender gap, with females outscoring males by 23 to 62 points on the PISA scale (standard deviation of 94).   On average in the OECD, girls outscored boys by 38 points (rounded to 515 for girls and 478 for boys).  The U.S. gap of 31 points is less than the OECD average.

Finland had the largest gender gap on the 2012 PISA, twice that of the U.S., with females outscoring males by an astonishing 62 points (0.66 SDs).  Finnish girls scored 556, and boys scored 494.  To put this gap in perspective, consider that Finland’s renowned superiority on PISA tests is completely dependent on Finnish girls.  Finland’s boys’ score of 494 is about the same as the international average of 496, and not much above the OECD average for males (478).  The reading performance of Finnish boys is not statistically significantly different from boys in the U.S. (482) or from the average U.S. student, both boys and girls (498). Finnish superiority in reading only exists among females.
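The conversion from scale-score points to standard deviation units used here is simply the gap divided by the scale's standard deviation. The short sketch below is illustrative only. It uses the 2012 PISA figures quoted in the text (a standard deviation of 94, Finland's 62 point gap, the 38 point OECD average gap, and the 31 point U.S. gap); the helper name `standardized_gap` is hypothetical and not part of any PISA data tool.

```python
# Convert a score gap in scale points to standard deviation (SD) units.
# All figures are the 2012 PISA reading values quoted in the text.

PISA_SD = 94  # standard deviation of the PISA reading scale, as cited above


def standardized_gap(gap_points: float, sd: float = PISA_SD) -> float:
    """Return the gap expressed in SD units (hypothetical helper)."""
    return gap_points / sd


gaps = {"Finland": 62, "OECD average": 38, "United States": 31}

for name, points in gaps.items():
    print(f"{name}: {points} points = {standardized_gap(points):.2f} SDs")
```

For Finland this works out to about 0.66 SDs, the figure given above, and the U.S. gap comes to roughly half of that.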

There is a hint of a geographical pattern.  Northern European countries tend to have larger gender gaps in reading.  Finland, Sweden, Iceland, and Norway have four of the six largest gaps.  Denmark is the exception with a 31 point gap, below the OECD average.   And two Asian OECD members have small gender gaps.  Japan’s gap of 24 points and South Korea’s gap of 23 are ranked among the bottom four countries. The Nordic tendency toward large gender gaps in reading was noted in a 2002 analysis of the 2000 PISA results.[viii]  At that time, too, Denmark was the exception.  Because of the larger sample and persistence over time, the Nordic pattern warrants more confidence than the one in the two Asian countries.

Back to Finland.  That’s the headline story here, and it contains a lesson for cautiously interpreting international test scores.  Consider that the 62 point gender gap in Finland is only 14 points smaller than the U.S. black-white gap (76 points) and 21 points larger than the white-Hispanic gap (41 points) on the same test.  Finland’s gender gap illustrates the superficiality of much of the commentary on that country’s PISA performance.  A common procedure in policy analysis is to consider how policies differentially affect diverse social groups.  Think of all the commentators who cite Finland to promote particular policies, whether the policies address teacher recruitment, amount of homework, curriculum standards, the role of play in children’s learning, school accountability, or high stakes assessments.[ix]  Advocates pound the table while arguing that these policies are obviously beneficial.  “Just look at Finland,” they say.  Have you ever read a warning that even if those policies contribute to Finland’s high PISA scores—which the advocates assume but serious policy scholars know to be unproven—the policies also may be having a negative effect on the 50 percent of Finland’s school population that happens to be male?

Would Getting Boys to Enjoy Reading More Help Close the Gap?

One of the solutions put forth for improving boys’ reading scores is to make an effort to boost their enjoyment of reading.  That certainly makes sense, but past scores of national reading and math performance have consistently, and counterintuitively, shown no relationship (or even an inverse one) with enjoyment of the two subjects.  PISA asks students how much they enjoy reading, so let’s now investigate whether fluctuations in PISA scores are at all correlated with how much 15-year-olds say they like to read.

The analysis below employs what is known as a “differences-in-differences” analytical strategy. The strategy compares changes over time within each country rather than levels across countries at a single point in time. In both 2000 and 2009, PISA measured students’ reading ability and asked them several questions about how much they like to read. An enjoyment index was created from the latter set of questions.[x] Girls score much higher on this index than boys. Many commentators believe that girls’ greater enjoyment of reading may be at the root of the gender gap in literacy.

When new international test scores are released, analysts are tempted to just look at variables exhibiting strong correlations with achievement (such as amount of time spent on homework), and embrace them as potential causes of high achievement. But cross-sectional correlations can be deceptive. The direction of causality cannot be determined: doing a lot of homework may lead to high achievement, or good students may simply tend to take classes that assign more homework. Correlations in cross-sectional data are also vulnerable to unobserved factors that may influence achievement. For example, if cultural predilections drive a country’s exemplary performance, their influence will be masked or spuriously assigned to other variables unless they are specifically modeled.[xi] Class size, between-school tracking, and time spent on learning are all topics on which differences-in-differences has been fruitfully employed to analyze multiple cross-sections of international data.

Another benefit of differences-in-differences is that it measures statistical relationships longitudinally. Table 1-4 investigates the question: Is the rise and fall of reading enjoyment correlated with changes in reading achievement? Many believe that if boys liked reading more, their literacy test scores would surely increase. Table 1-4 does not support that belief. Data are available for 27 OECD countries, and they are ranked by how much they boosted males’ enjoyment of reading. The index is set at the student-level with a mean of 0.00 and standard deviation of 1.00. For the 27 nations in Table 1-4, the mean national change in enjoyment is -0.02 with a standard deviation of 0.09.

Germany did the best job of raising boys’ enjoyment of reading, with a gain of 0.12 on the index.  German males’ PISA scores also went up—a little more than 10 points (10.33).  France, on the other hand, raised males’ enjoyment of reading nearly as much as Germany (0.11), but French males’ PISA scores declined by 15.26 points.  A bit further down the column, Ireland managed to get boys to enjoy reading a little more (a gain of 0.05) but their reading performance fell a whopping 36.54 points.  Toward the bottom end of the list, Poland’s boys enjoyed reading less in 2009 than in 2000, a decline of 0.14 on the index, but over the same time span, their reading literacy scores increased by more than 14 points (14.29).  Among the countries in which the relationship goes in the expected direction is Finland.  Finnish males’ enjoyment of reading declined (-0.14) as did their PISA scores in reading literacy (-11.73).  Overall, the correlation coefficient for change in enjoyment and change in reading score is -0.01, indicating no relationship between the two.
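To make the calculation concrete, here is a minimal sketch of how a correlation between change in enjoyment and change in score can be computed. It is an illustration under a strong limitation: it uses only the five countries whose figures are quoted in the paragraph above, not the 27 countries in Table 1-4, so its result should not be expected to match the reported coefficient of -0.01. The function `pearson_r` is a hypothetical helper written for this example.

```python
import math

# (change in boys' enjoyment index, change in boys' PISA reading score),
# 2000 to 2009, for the five countries quoted in the paragraph above.
# This is NOT the full set of 27 countries in Table 1-4.
changes = {
    "Germany": (0.12, 10.33),
    "France": (0.11, -15.26),
    "Ireland": (0.05, -36.54),
    "Poland": (-0.14, 14.29),
    "Finland": (-0.14, -11.73),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


enjoyment_change = [e for e, _ in changes.values()]
score_change = [s for _, s in changes.values()]

r = pearson_r(enjoyment_change, score_change)
print(f"r = {r:.2f} across {len(changes)} countries")
```

The design point is that both inputs are changes within a country over time, not levels in a single year, which is what distinguishes this from a cross-sectional correlation.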

Christina Hoff Sommers and Richard Whitmire have praised specific countries for first recognizing and then addressing the gender gap in reading.  Recently, Sommers urged the U.S. to “follow the example of the British, Canadians, and Australians.”[xii]  Whitmire described Australia as “years ahead of the U.S. in pioneering solutions” to the gender gap.  Let’s see how those countries appear in Table 1-4.  England does not have PISA data for the 2000 baseline year, but both Canada and Australia are included.  Canada raised boys’ enjoyment of reading a little bit (0.02) but Canadian males’ scores fell by about 12 points (-11.74).  Australia suffered a decline in boys’ enjoyment of reading (-0.04) and achievement (-16.50).  As promising as these countries’ efforts may have appeared a few years ago, so far at least, they have not borne fruit in raising boys’ reading performance on PISA.

Achievement gaps are tricky because it is possible for the test scores of the two groups being compared to both decline while the gap increases or, conversely, for scores of both to increase while the gap declines.  Table 1-4 only looks at males’ enjoyment of reading and its relationship to achievement.  A separate differences-in-differences analysis was conducted (but not displayed here) to see whether changes in the enjoyment gap—the difference between boys’ and girls’ enjoyment of reading—are related to changes in reading achievement.  They are not (correlation coefficient of 0.08).  National PISA data simply do not support the hypothesis that the superior reading performance of girls is related to the fact that girls enjoy reading more than boys. 
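The first point in the paragraph above, that a gap can widen while both groups decline or narrow while both groups improve, is easy to see with numbers. The scores in this sketch are hypothetical, invented purely for illustration, and are not PISA data; `gap_change` is likewise a made-up helper.

```python
# Hypothetical scores, invented purely for illustration (not PISA data).
# Each pair is (score at the first test, score at the second test).
scenarios = {
    "both decline, gap widens": {"girls": (520, 515), "boys": (490, 475)},
    "both rise, gap narrows": {"girls": (520, 525), "boys": (490, 505)},
}


def gap_change(girls, boys):
    """Return (gap before, gap after) given (before, after) score pairs."""
    return girls[0] - boys[0], girls[1] - boys[1]


for label, s in scenarios.items():
    before, after = gap_change(s["girls"], s["boys"])
    print(f"{label}: gap {before} -> {after}")
```

In the first scenario both groups lose ground yet the gap grows from 30 to 40 points; in the second both gain yet the gap shrinks from 30 to 20.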

Discussion

Let’s summarize the main findings of the analysis above. Reading scores for girls exceed those for boys on eight recent assessments of U.S. reading achievement.  The gender gap is larger for middle and high school students than for students in elementary school.  The gap was apparent on the earliest NAEP tests in the 1970s and has shown some signs of narrowing in the past decade.  International tests reveal that the gender gap is worldwide.  Among OECD countries, it even appears among countries known for superior performance on PISA’s reading test.  Finland not only exhibited the largest gender gap in reading on the 2012 PISA, the gap had widened since 2000.  A popular recommendation for boosting boys’ reading performance is finding ways for them to enjoy reading more.  That theory is not supported by PISA data.  Countries that succeeded in raising boys’ enjoyment of reading from 2000 to 2009 were no more likely to improve boys’ reading performance than countries where boys’ enjoyment of reading declined. 

The origins of the gender gap are hotly debated.  The universality of the gap certainly supports the argument that it originates in biological or developmental differences between the two sexes.  It is evident among students of different ages in data collected at different points in time.  It exists across the globe, in countries with different educational systems, different popular cultures, different child rearing practices, and different conceptions of gender roles.  Moreover, the greater prevalence of reading impairment among young boys—a ratio of two or three to one—suggests an endemic difficulty that exists before the influence of schools or culture can take hold.[xiii] 

But some of the data examined above also argue against the developmental explanation.  The gap has been shrinking on NAEP.  At age nine, it is less than half of what it was forty years ago.  Biology doesn’t change that fast.  Gender gaps in math and science, which were apparent in achievement data for a long time, have all but disappeared, especially once course taking is controlled.  The reading gap also seems to evaporate by adulthood.  On an international assessment of adults conducted in 2012, reading scores for men and women were statistically indistinguishable up to age 35—even in Finland and the United States.  After age 35, men had statistically significantly higher scores in reading, all the way to the oldest group, age 55 and older.  If the gender gap in literacy is indeed shaped by developmental factors, it may be important for our understanding of the phenomenon to scrutinize periods of the life cycle beyond the age of schooling.   

Another astonishing pattern emerged from the study of adult reading.  Participants were asked how often they read a book.  Of avid book readers (those who said they read a book once a week) in the youngest group (age 24 and younger), 59 percent were women and 41 percent were men.  By age 55, avid book readers were even more likely to be women, by a margin of 63 percent to 37 percent.  Two-thirds of respondents who said they never read books were men.  Women remained the more enthusiastic readers even as the test scores of men caught up with those of women and surpassed them.

A few years ago, Ian McEwan, the celebrated English novelist, decided to reduce the size of the library in his London townhouse.  He and his younger son selected thirty novels and took them to a local park.  They offered the books to passers-by.  Women were eager and grateful to take the books, McEwan reports.  Not a single man accepted.  The author’s conclusion? “When women stop reading, the novel will be dead.”[xiv] 

McEwan might be right, regardless of the origins of the gender gap in reading and the efforts to end it.



[i] J.B. Stroud and E.F. Lindquist, “Sex differences in achievement in the elementary and secondary schools,” Journal of Educational Psychology, vol. 33(9) (Washington, D.C.: American Psychological Association, 1942), 657-667.

[ii] Christina Hoff Sommers, The War Against Boys: How Misguided Feminism Is Harming Our Young Men (New York, NY: Simon & Schuster, 2000).

[iii] Christianne Corbett, Catherine Hill, and Andresse St. Rose, Where the Girls Are: The Facts About Gender Equity in Education (Washington, D.C.: American Association of University Women, 2008).

[iv] Richard Whitmire, Why Boys Fail: Saving Our Sons from an Educational System That’s Leaving Them Behind (New York, NY: AMACOM, 2010).

[v] Sara Mead, The Evidence Suggests Otherwise: The Truth About Boys and Girls (Washington, D.C.: Education Sector, 2006).

[vi] PIRLS and PISA assess different reading skills.  Performance on the two tests may not be comparable.

[vii] NAEP categories were aggregated to calculate the city/suburb difference.

[viii] OECD, Reading for Change: Performance and Engagement Across Countries (Paris: OECD, 2002), 125.

[ix] The best example of promoting Finnish education policies is Pasi Sahlberg’s  Finnish Lessons: What Can the World Learn from Educational Change in Finland? (New York: Teachers College Press, 2011).

[x] The 2009 endpoint was selected because 2012 data for the enjoyment index were not available on the NCES PISA data tool.

[xi] A formal name for the problem of reverse causality is endogeneity and for the problem of unobserved variables, omitted variable bias.

[xii] Christina Hoff Sommers, “The Boys at the Back,” New York Times, February 2, 2013;  Richard Whitmire, Why Boys Fail (New York: AMACOM, 2010), 153.

[xiii] J.L. Hawke, R.K. Olson, E.G. Willcutt, S.J. Wadsworth, & J.C. DeFries, “Gender ratios for reading difficulties,” Dyslexia 15(3), (Chichester, England: Wiley, 2009), 239–242.

[xiv] Daniel Zalewski, “The Background Hum: Ian McEwan’s art of unease,” The New Yorker, February 23, 2009. 


Brookings Live: Girls, boys, and reading


Event Information

March 26, 2015
2:00 PM - 2:30 PM EDT

Online Only
Live Webcast

And more from the Brown Center Report on American Education



Girls outscore boys on practically every reading test given to a large population. And they have for a long time. A 1942 Iowa study found girls performing better than boys on tests of reading comprehension, vocabulary, and basic language skills, and girls have outscored boys on every reading test ever given by the National Assessment of Educational Progress (NAEP). This gap is not confined to the U.S. Reading tests administered as part of the Progress in International Reading Literacy Study (PIRLS) and the Program for International Student Assessment (PISA) reveal that the gender gap is a worldwide phenomenon.

On March 26, join Brown Center experts Tom Loveless and Matthew Chingos as they discuss the latest Brown Center Report on American Education, which examines this phenomenon. Hear what Loveless's analysis revealed about where the gender gap stands today and how it's trended over the past several decades - in the U.S. and around the world.

Tune in below or via Spreecast where you can submit questions. 


Common Core and classroom instruction: The good, the bad, and the ugly


This post continues a series begun in 2014 on implementing the Common Core State Standards (CCSS).  The first installment introduced an analytical scheme investigating CCSS implementation along four dimensions:  curriculum, instruction, assessment, and accountability.  Three posts focused on curriculum.  This post turns to instruction.  Although the impact of CCSS on how teachers teach is discussed, the post is also concerned with the inverse relationship, how decisions that teachers make about instruction shape the implementation of CCSS.

A couple of points before we get started.  The previous posts on curriculum led readers from the upper levels of the educational system—federal and state policies—down to curricular decisions made “in the trenches”—in districts, schools, and classrooms.  Standards emanate from the top of the system and are produced by politicians, policymakers, and experts.  Curricular decisions are shared across education’s systemic levels.  Instruction, on the other hand, is dominated by practitioners.  The daily decisions that teachers make about how to teach under CCSS—and not the idealizations of instruction embraced by upper-level authorities—will ultimately determine what “CCSS instruction” really means.

I ended the last post on CCSS by describing how curriculum and instruction can be so closely intertwined that the boundary between them is blurred.  Sometimes stating a precise curricular objective dictates, or at least constrains, the range of instructional strategies that teachers may consider.  That post focused on English-Language Arts.  The current post focuses on mathematics in the elementary grades and describes examples of how CCSS will shape math instruction.  As a former elementary school teacher, I offer my own personal opinion on these effects.

The Good

Certain aspects of the Common Core, when implemented, are likely to have a positive impact on the instruction of mathematics. For example, Common Core stresses that students recognize fractions as numbers on a number line.  The emphasis begins in third grade:

CCSS.MATH.CONTENT.3.NF.A.2
Understand a fraction as a number on the number line; represent fractions on a number line diagram.

CCSS.MATH.CONTENT.3.NF.A.2.A
Represent a fraction 1/b on a number line diagram by defining the interval from 0 to 1 as the whole and partitioning it into b equal parts. Recognize that each part has size 1/b and that the endpoint of the part based at 0 locates the number 1/b on the number line.

CCSS.MATH.CONTENT.3.NF.A.2.B
Represent a fraction a/b on a number line diagram by marking off a lengths 1/b from 0. Recognize that the resulting interval has size a/b and that its endpoint locates the number a/b on the number line.


When I first read this section of the Common Core standards, I stood up and cheered. Berkeley mathematician Hung-Hsi Wu has been working with teachers for years to get them to understand the importance of using number lines in teaching fractions.[1] American textbooks rely heavily on part-whole representations to introduce fractions. Typically, students see pizzas, apples, and other objects (usually foods or money) that are divided up into equal parts. Such models are limited. They work okay with simple addition and subtraction. Common denominators present a bit of a challenge, but ½ of a pizza can also be shown to be 2/4, a half dollar equal to two quarters, and so on.

With multiplication and division, all the little tricks students learned with whole number arithmetic suddenly go haywire. Students are accustomed to the fact that multiplying two whole numbers yields a product that is larger than either number being multiplied: 4 × 5 = 20, and 20 is larger than both 4 and 5.[2] How in the world can ¼ × 1/5 = 1/20, a number much smaller than either ¼ or 1/5? The part-whole representation has convinced many students that fractions are not numbers. Instead, they are seen as strange expressions comprising two numbers with a small horizontal bar separating them.

I taught sixth grade but occasionally visited my colleagues’ classes in the lower grades.  I recall one exchange with second or third graders that went something like this:

“Give me a number between seven and nine.”  Giggles. 

“Eight!” they shouted. 

“Give me a number between two and three.”  Giggles.

“There isn’t one!” they shouted. 

“Really?” I’d ask and draw a number line.  After spending some time placing whole numbers on the number line, I’d observe,  “There’s a lot of space between two and three.  Is it just empty?” 

Silence.  Puzzled little faces.  Then a quiet voice.  “Two and a half?”

You have no idea how many children do not make the transition to understanding fractions as numbers and, because they stumble at this crucial stage, spend the rest of their careers as students of mathematics convinced that fractions are an impenetrable mystery. And that’s not true of just students. California adopted a test for teachers in the 1980s, the California Basic Educational Skills Test (CBEST). Beginning in 1982, even teachers already in the classroom had to pass it. I made a nice after-school and summer income tutoring colleagues who didn’t know fractions from Fermat’s Last Theorem. To be fair, primary teachers, teaching kindergarten or grades 1-2, would not teach fractions as part of their math curriculum and probably hadn’t worked with a fraction in decades. So they are no different from non-literary types who think Hamlet is just a play about a young guy who can’t make up his mind, has a weird relationship with his mother, and winds up dying at the end.

Division is the most difficult operation to grasp for those arrested at the part-whole stage of understanding fractions.  A problem that Liping Ma posed to teachers is now legendary.[3]

She asked small groups of American and Chinese elementary teachers to divide 1 ¾ by ½ and to create a word problem that illustrates the calculation.  All 72 Chinese teachers gave the correct answer and 65 developed an appropriate word problem.  Only nine of the 23 American teachers solved the problem correctly.  A single American teacher was able to devise an appropriate word problem.  Granted, the American sample was not selected to be representative of American teachers as a whole, but the stark findings of the exercise did not shock anyone who has worked closely with elementary teachers in the U.S.  They are often weak at math.  Many of the teachers in Ma’s study had vague ideas of an “invert and multiply” rule but lacked a conceptual understanding of why it worked.

A linguistic convention exacerbates the difficulty.  Students may cling to the mistaken notion that “dividing in half” means “dividing by one-half.”  It does not.  Dividing in half means dividing by two.  The number line can help clear up such confusion.  Consider a basic, whole-number division problem for which third graders will already know the answer:  8 divided by 2 equals 4.   It is evident that a segment 8 units in length (measured from 0 to 8) is divided by a segment 2 units in length (measured from 0 to 2) exactly 4 times.  Modeling 12 divided by 2 and other basic facts with 2 as a divisor will convince students that whole number division works quite well on a number line. 

Now consider the number ½ as a divisor.  It will become clear to students that 8 divided by ½ equals 16, and they can illustrate that fact on a number line by showing how a segment ½ units in length divides a segment 8 units in length exactly 16 times; it divides a segment 12 units in length 24 times; and so on.  Students will be relieved to discover that on a number line division with fractions works the same as division with whole numbers.

Now, let’s return to Liping Ma’s problem: 1 ¾ divided by ½. This problem would not be presented in third grade, but it might be in fifth or sixth grade. Students who have been working with fractions on a number line for two or three years will have little trouble solving it. They will see that the problem simply asks them to divide a line segment of 1 ¾ units by a segment of ½ unit. The answer is 3 ½. Some students might estimate that the solution is between 3 and 4 because 1 ¾ lies between 1 ½ and 2, which on the number line are the points at which the ½ unit segment, laid end to end, falls exactly three and four times. Other students will have learned about reciprocals and that multiplication and division are inverse operations. They will immediately grasp that dividing by ½ is the same as multiplying by 2, and since 1 ¾ × 2 = 3 ½, that is the answer. Creating a word problem involving string or rope or some other linearly measured object is also surely within their grasp.
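The number-line reasoning in the last few paragraphs can be sketched in a few lines of code. This is an illustration of the idea, not a classroom tool: the function `count_segments` is hypothetical, and it models division as laying a segment end to end along the line and counting how many times it fits, with any leftover piece expressed as a fraction of one more segment. It uses Python's standard `fractions` module so the arithmetic stays exact.

```python
from fractions import Fraction


def count_segments(length: Fraction, step: Fraction) -> Fraction:
    """How many segments of size `step` fit in `length` on a number line.

    Walks along the line one whole step at a time, then expresses any
    leftover piece as a fraction of one more step.
    """
    whole_steps = 0
    position = Fraction(0)
    while position + step <= length:
        position += step
        whole_steps += 1
    leftover = (length - position) / step  # part of one more step
    return whole_steps + leftover


half = Fraction(1, 2)

print(count_segments(Fraction(8), Fraction(2)))  # 8 divided by 2
print(count_segments(Fraction(8), half))         # 8 divided by 1/2
print(count_segments(Fraction(12), half))        # 12 divided by 1/2
print(count_segments(Fraction(7, 4), half))      # 1 3/4 divided by 1/2

# Dividing by 1/2 is the same as multiplying by its reciprocal, 2.
assert Fraction(7, 4) / half == Fraction(7, 4) * 2 == Fraction(7, 2)
```

For Liping Ma's problem, the ½ unit segment fits three whole times before reaching 1 ½, and the remaining ¼ is half of one more segment, so the function returns 7/2, that is, 3 ½.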

Conclusion

I applaud the CCSS for introducing number lines and fractions in third grade.  I believe it will instill in children an important idea: fractions are numbers.  That foundational understanding will aid them as they work with more abstract representations of fractions in later grades.   Fractions are a monumental barrier for kids who struggle with math, so the significance of this contribution should not be underestimated.

I mentioned above that instruction and curriculum are often intertwined.  I began this series of posts by defining curriculum as the “stuff” of learning—the content of what is taught in school, especially as embodied in the materials used in instruction.  Instruction refers to the “how” of teaching—how teachers organize, present, and explain those materials.  It’s each teacher’s repertoire of instructional strategies and techniques that differentiates one teacher from another even as they teach the same content.  Choosing to use a number line to teach fractions is obviously an instructional decision, but it also involves curriculum.  The number line is mathematical content, not just a teaching tool.

Guiding third grade teachers towards using a number line does not guarantee effective instruction. In fact, it is reasonable to expect variation in how teachers will implement the CCSS standards listed above. A small body of research exists to guide practice. One of the best resources for teachers to consult is a practice guide published by the What Works Clearinghouse: Developing Effective Fractions Instruction for Kindergarten Through Eighth Grade (see full disclosure below).[4] The guide’s second recommendation is the use of number lines, but it also states that the evidence supporting the effectiveness of number lines in teaching fractions is inferred from studies involving whole numbers and decimals. We need much more research on how and when number lines should be used in teaching fractions.

Professor Wu states the following, “The shift of emphasis from models of a fraction in the initial stage to an almost exclusive model of a fraction as a point on the number line can be done gradually and gracefully beginning somewhere in grade four. This shift is implicit in the Common Core Standards.”[5]  I agree, but the shift is also subtle.  CCSS standards include the use of other representations—fraction strips, fraction bars, rectangles (which are excellent for showing multiplication of two fractions) and other graphical means of modeling fractions.  Some teachers will manage the shift to number lines adroitly—and others will not.  As a consequence, the quality of implementation will vary from classroom to classroom based on the instructional decisions that teachers make.  

The current post has focused on what I believe to be a positive aspect of CCSS based on the implementation of the standards through instruction.  Future posts in the series—covering the “bad” and the “ugly”—will describe aspects of instruction on which I am less optimistic.



[1] See H. Wu (2014). “Teaching Fractions According to the Common Core Standards,” https://math.berkeley.edu/~wu/CCSS-Fractions_1.pdf. Also see "What's Sophisticated about Elementary Mathematics?" http://www.aft.org/sites/default/files/periodicals/wu_0.pdf

[2] Students learn that 0 and 1 are exceptions and have their own special rules in multiplication.

[3] Liping Ma, Knowing and Teaching Elementary Mathematics.

[4] Full disclosure: I serve as a content expert in elementary mathematics for the What Works Clearinghouse. I had nothing to do, however, with the publication cited. The practice guide can be found at: http://ies.ed.gov/ncee/wwc/pdf/practice_guides/fractions_pg_093010.pdf

[5] Wu, page 3.


CNN’s misleading story on homework


Last week, CNN ran a back-to-school story on homework with the headline, “Kids Have Three Times Too Much Homework, Study Finds; What’s the Cost?” Homework is an important topic, especially for parents, but unfortunately, CNN’s story misleads rather than informs. The headline suggests American parents should be alarmed because their kids have too much homework. Should they? No, CNN has ignored the best evidence on that question, which suggests the opposite. The story relies on the results of one recent study of homework—a study that is limited in what it can tell us, mostly because of its research design. But CNN even gets its main findings wrong. The study suggests most students have too little homework, not too much.

The Study

The study that piqued CNN’s interest was conducted during four months (two in the spring and two in the fall) in Providence, Rhode Island. About 1,200 parents completed a survey about their children’s homework while waiting in 27 pediatricians’ offices. Is the sample representative of all parents in the U.S.? Probably not. Certainly CNN should have been a bit leery of portraying the results of a survey conducted in a single American city—any city—as evidence applying to a broader audience. More importantly, viewers are never told of the study’s significant limitations: that the data come from a survey conducted in only one city—in pediatricians’ offices by a self-selected sample of respondents.

The survey’s sampling design is a huge problem. Because the sample is non-random there is no way of knowing if the results can be extrapolated to a larger population—even to families in Providence itself. Close to a third of respondents chose to complete the survey in Spanish. Enrollment in English Language programs in the Providence district comprises about 22 percent of students. About one-fourth (26 percent) of survey respondents reported having one child in the family. According to the 2010 Census, the proportion of families nationwide with one child is much higher, at 43 percent.[i] The survey is skewed towards large, Spanish-speaking families. Their experience with homework could be unique, especially if young children in these families are learning English for the first time at school.

The survey was completed by parents who probably had a sick child as they were waiting to see a pediatrician. That’s a stressful setting. The response rate to the survey is not reported, so we don’t know how many parents visiting those offices chose not to fill out the survey. If the typical pediatrician sees 100 unique patients per month, in a four-month span the survey may have been offered to more than ten thousand parents in the 27 offices. The survey respondents, then, would be a tiny slice, 10 to 15 percent, of those eligible to respond. We also don’t know the public-private school breakdown of the respondents, or how many were sending their children to charter schools. It would be interesting to see how many parents willingly send their children to schools with a heavy homework load.
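The rough estimate above can be written out explicitly. This is only a sketch of the arithmetic: it assumes one pediatrician per office (the study does not say how many worked in each), and the function and parameter names are invented for illustration.

```python
def estimated_response_rate(respondents, offices, patients_per_month, months,
                            pediatricians_per_office=1):
    """Return (eligible_parents, response_rate) under the stated assumptions."""
    # Assumption: every unique patient visit is a chance to offer the survey.
    eligible = offices * pediatricians_per_office * patients_per_month * months
    return eligible, respondents / eligible


eligible, rate = estimated_response_rate(
    respondents=1200,        # "about 1,200 parents"
    offices=27,
    patients_per_month=100,  # the hypothetical caseload used in the text
    months=4,
)
print(f"Eligible parents: {eligible:,}")       # 10,800
print(f"Estimated response rate: {rate:.1%}")  # 11.1%
```

If offices had more than one pediatrician, the eligible pool would be larger and the estimated response rate lower still.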

I wish the CNN team responsible for this story had run the data by some of CNN’s political pollsters. Alarm bells surely would have gone off. The hazards of accepting a self-selected, demographically skewed survey sample as representative of the general population are well known. Modern political polling—and its reliance on random samples—grew from an infamous mishap in 1936. A popular national magazine, the Literary Digest, distributed 10 million postcards for its readers to return as “ballots” indicating whom they would vote for in the 1936 race for president. More than two million postcards were returned! A week before the election, the magazine confidently predicted that Alf Landon, the Republican challenger from Kansas, would defeat Franklin Roosevelt, the Democratic incumbent, by a huge margin: 57 percent to 43 percent. In fact, when the real election was held, the opposite occurred: Roosevelt won more than 60 percent of the popular vote and defeated Landon in a landslide. Pollsters learned that self-selected samples should be viewed warily. The magazine’s readership was disproportionately Republican to begin with, and sometimes disgruntled subjects are more likely to respond to a survey, no matter the topic, than the satisfied.

Here’s a very simple question: In its next poll on the 2016 presidential race, would CNN report the results of a survey of self-selected respondents in 27 pediatricians’ offices in Providence, Rhode Island as representative of national sentiment? Of course not. Then, please, CNN, don’t do so with education topics.

The Providence Study’s Findings

Let’s set aside methodological concerns and turn to CNN’s characterization of the survey’s findings. Did the study really show that most kids have too much homework? No, the headline that “Kids Have Three Times Too Much Homework” is not even an accurate description of the study’s findings. CNN’s on-air coverage extended the misinformation. The online video of the coverage is tagged “Study: Your Kids Are Doing Too Much Homework.” The first caption that viewers see is “Study Says Kids Getting Way Too Much Homework.” All of these statements are misleading.

In the published version of the Providence study, the researchers plotted the average amount of time spent on homework by students’ grade.[ii] They then compared those averages to a “10 minutes per-grade” guideline that serves as an indicator of the “right” amount of homework. I have attempted to replicate the data here in table form (they were originally reported in a line graph) to make that comparison easier.[iii]

Contrary to CNN’s reporting, the data suggest—based on the ten minute per-grade rule—that most kids in this study have too little homework, not too much. Beginning in fourth grade, the average time spent on homework falls short of the recommended amount—a gap of only four minutes in fourth grade that steadily widens in later grades.

A more accurate headline would have been, “Study Shows Kids in Nine out of 13 Grades Have Too Little Homework.” It appears high school students (grades 9-12) spend only about half the recommended time on homework. Two hours of nightly homework is recommended for 12th graders. They are, after all, only a year away from college. But according to the Providence survey, their homework load is less than an hour.

So how in the world did CNN come up with the headline “Kids Have Three Times Too Much Homework”? By focusing on grades K-3 and ignoring all other grades. Here’s the reporting:

The study, published Wednesday in The American Journal of Family Therapy, found students in the early elementary school years are getting significantly more homework than is recommended by education leaders, in some cases nearly three times as much homework as is recommended.

The standard, endorsed by the National Education Association and the National Parent-Teacher Association, is the so-called "10-minute rule"— 10 minutes per-grade level per-night. That translates into 10 minutes of homework in the first grade, 20 minutes in the second grade, all the way up to 120 minutes for senior year of high school. The NEA and the National PTA do not endorse homework for kindergarten.

In the study involving questionnaires filled out by more than 1,100 English and Spanish speaking parents of children in kindergarten through grade 12, researchers found children in the first grade had up to three times the homework load recommended by the NEA and the National PTA.

Parents reported first-graders were spending 28 minutes on homework each night versus the recommended 10 minutes. For second-graders, the homework time was nearly 29 minutes, as opposed to the 20 minutes recommended.

And kindergartners, their parents said, spent 25 minutes a night on after-school assignments, according to the study.

CNN focused on the four grades, K-3, in which homework exceeds the ten-minute rule. They ignored more than two-thirds of the grades. Even with this focus, a more accurate headline would have been, “Study Suggests First Graders in Providence, RI Have Three Times Too Much Homework.”
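To make the comparison concrete, here is a minimal sketch of the ten-minute rule applied to the only grade-level figures quoted in the CNN story (kindergarten, first, and second grade). The function and variable names are my own, and the second-grade figure is rounded to 29 minutes from "nearly 29."

```python
GRADES = list(range(0, 13))  # 0 = kindergarten, 1-12 = first through twelfth grade


def recommended_minutes(grade):
    """Ten-minute rule: 10 minutes per grade level per night (none for kindergarten)."""
    return 10 * grade


# Parent-reported nightly minutes as quoted in the CNN story.
reported = {0: 25, 1: 28, 2: 29}

for grade, minutes in reported.items():
    rec = recommended_minutes(grade)
    if rec == 0:
        print(f"Grade {grade}: {minutes} min reported, no homework recommended")
    else:
        print(f"Grade {grade}: {minutes} min reported vs {rec} recommended "
              f"({minutes / rec:.2f}x)")

# Grades 4-12 are the ones left out of CNN's framing.
ignored = [g for g in GRADES if g >= 4]
ignored_share = len(ignored) / len(GRADES)
print(f"Grades ignored: {len(ignored)} of {len(GRADES)} ({ignored_share:.0%})")
```

Only first grade comes close to the "three times" figure (28 versus 10 minutes); second grade is under one and a half times the guideline.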

Conclusion

Homework is a controversial topic. People hold differing points of view as to whether there is too much, too little, or just the right amount of homework. That makes it vitally important that the media give accurate information on the empirical dimensions of the debate.  The amount of homework kids should have is subject to debate. But the amount of homework kids actually have is an empirical question. We can debate whether it’s too hot outside, but the actual temperature should be a matter of measurement, not debate. It’s impossible to think of a rational debate that can possibly ensue on the homework issue without knowing the empirical status quo with regard to time. Imagine someone beginning a debate by saying, “I am arguing that kids have too much [substitute “too little” here for the pro-homework side] homework but I must admit that I have no idea how much they currently have.”

Data from the National Assessment of Educational Progress (NAEP) provide the best evidence we have on the amount of homework that kids have. NAEP’s sampling design allows us to make inferences about national trends, and the Long-Term Trend (LTT) NAEP offers data on homework since 1984. The latest LTT NAEP results (2012) indicate that the vast majority of nine-year-olds (83 percent) have less than an hour of homework each night. There has been an apparent uptick in the homework load, however, as 35 percent reported no homework in 1984, and only 22 percent reported no homework in 2012. MET Life also periodically surveys a representative sample of students, parents, and teachers on the homework issue. In the 2007 results, a majority of parents (52 percent) of elementary grade students (grades 3-6 in the MET survey) estimated their children had 30 minutes or less of homework.

The MET Life survey found that parents have an overwhelmingly positive view of the amount of homework their children are assigned. Nine out of ten parents responded that homework offers the opportunity to talk and spend time with their children, and most do not see homework as interfering with family time or as a major source of familial stress. Minority parents, in particular, reported believing homework is beneficial for students’ success at school and in the future.[iv]

That said, just as there were indeed Alf Landon voters in 1936, there are indeed children for whom homework is a struggle. Some bring home more than they can finish in a reasonable amount of time. A complication for researchers of elementary-age children is that the same students who have difficulty completing homework may have other challenges—difficulties with reading, low achievement, and poor grades in school.[v] Parents who question the value of homework often have a host of complaints about their child’s school. It is difficult for researchers to untangle all of these factors and determine, in the instances where there are tensions, whether homework is the real cause. To their credit, the researchers who conducted the Providence study are aware of these constraints and present a number of hypotheses warranting further study with a research design supporting causal inference. That’s the value of this research, not CNN’s misleading reporting of the findings.


[i] Calculated from data in Table 64, U.S. Census Bureau, Statistical Abstract of the United States: 2012, page 56. http://www.census.gov/compendia/statab/2012/tables/12s0064.pdf.

[ii] The mean sample size for each grade is reported as 7.7 percent (or 90 students).  Confidence intervals for each grade estimate are not reported.

[iii] The data in Table I are estimates (by sight) from a line graph incremented in five percentage point intervals.

[iv] MetLife, MetLife Survey of the American Teacher: The Homework Experience, November 13, 2007, p. 15.

[v] Among high school students, the bias probably leans in the opposite direction: high achievers load up on AP, IB, and other courses that assign more homework.


No, the sky is not falling: Interpreting the latest SAT scores


Earlier this month, the College Board released SAT scores for the high school graduating class of 2015. Both math and reading scores declined from 2014, continuing a steady downward trend that has been in place for the past decade. Pundits of contrasting political stripes seized on the scores to bolster their political agendas. Michael Petrilli of the Fordham Foundation argued that falling SAT scores show that high schools need more reform, presumably those his organization supports, in particular, charter schools and accountability.* For Carol Burris of the Network for Public Education, the declining scores were evidence of the failure of policies her organization opposes, namely, Common Core, No Child Left Behind, and accountability.

Petrilli and Burris are both misusing SAT scores. The SAT is not designed to measure national achievement; the score losses from 2014 were minuscule; and most of the declines are probably the result of demographic changes in the SAT population. Let’s examine each of these points in greater detail.

The SAT is not designed to measure national achievement

It never was. The SAT was originally meant to measure a student’s aptitude for college independent of that student’s exposure to a particular curriculum. The test’s founders believed that gauging aptitude, rather than achievement, would serve the cause of fairness. A bright student from a high school in rural Nebraska or the mountains of West Virginia, they held, should have the same shot at attending elite universities as a student from an Eastern prep school, despite not having been exposed to the great literature and higher mathematics taught at prep schools. The SAT would measure reasoning and analytical skills, not the mastery of any particular body of knowledge. Its scores would level the playing field in terms of curricular exposure while providing a reasonable estimate of an individual’s probability of success in college.

Note that even in this capacity, the scores never suffice alone; they are only used to make admissions decisions by colleges and universities, including such luminaries as Harvard and Stanford, in combination with a lot of other information—grade point averages, curricular resumes, essays, reference letters, extra-curricular activities—all of which constitute a student’s complete application.

Today’s SAT has moved towards being a content-oriented test, but not entirely. Next year, the College Board will introduce a revised SAT to more closely reflect high school curricula. Even then, SAT scores should not be used to make judgements about U.S. high school performance, whether it’s a single high school, a state’s high schools, or all of the high schools in the country. The SAT sample is self-selected. In 2015, it only included about one-half of the nation’s high school graduates: 1.7 million out of approximately 3.3 million total. And that’s about one-ninth of approximately 16 million high school students.  Generalizing SAT scores to these larger populations violates a basic rule of social science. The College Board issues a warning when it releases SAT scores: “Since the population of test takers is self-selected, using aggregate SAT scores to compare or evaluate teachers, schools, districts, states, or other educational units is not valid, and the College Board strongly discourages such uses.”  

TIME’s coverage of the SAT release included a statement by Andrew Ho of Harvard University, who succinctly makes the point: “I think SAT and ACT are tests with important purposes, but measuring overall national educational progress is not one of them.”

The score changes from 2014 were minuscule

SAT scores changed very little from 2014 to 2015. Reading scores dropped from 497 to 495. Math scores also fell two points, from 513 to 511. Both declines are equal to about 0.017 standard deviations (SD).[i] To illustrate how small these changes truly are, let’s examine a metric I have used previously in discussing test scores. The average American male is 5’10” in height with a SD of about 3 inches. A 0.017 SD change in height is equal to about 1/20 of an inch (0.051). Do you really think you’d notice a difference in the height of two men standing next to each other if they only differed by 1/20th of an inch? You wouldn’t. Similarly, the change in SAT scores from 2014 to 2015 is trivial.[ii]
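The conversion to standard deviation units is simple division, sketched below with the scores from this paragraph and the SD of 115 from footnote [i]. The function name is made up for the example. The same function applies to any pair of scores and the base-year SD.

```python
def change_in_sd_units(old_score, new_score, sd):
    """Express a score change as a fraction of the base-year standard deviation."""
    return (new_score - old_score) / sd


SAT_SD_2014 = 115  # footnote [i]: SD for both reading and math

reading_change = change_in_sd_units(497, 495, SAT_SD_2014)
math_change = change_in_sd_units(513, 511, SAT_SD_2014)
print(f"Reading: {reading_change:+.3f} SD")  # -0.017
print(f"Math:    {math_change:+.3f} SD")     # -0.017

# Height analogy: the same fraction of a 3-inch SD.
HEIGHT_SD_INCHES = 3
height_equivalent = abs(reading_change) * HEIGHT_SD_INCHES
print(f"Equivalent height difference: about {height_equivalent:.2f} inch")  # 0.05
```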

A more serious concern is the SAT trend over the past decade. Since 2005, reading scores are down 13 points, from 508 to 495, and math scores are down nine points, from 520 to 511. These are equivalent to declines of 0.12 SD for reading and 0.08 SD for math.[iii] Representing changes that have accumulated over a decade, these losses are still quite small. In the Washington Post, Michael Petrilli asked “why is education reform hitting a brick wall in high school?” He also stated that “you see this in all kinds of evidence.”

You do not see a decline in the best evidence, the National Assessment of Educational Progress (NAEP). Contrary to the SAT, NAEP is designed to monitor national achievement. Its test scores are based on a random sampling design, meaning that the scores can be construed as representative of U.S. students. NAEP administers two different tests to high school age students, the long term trend (LTT NAEP), given to 17-year-olds, and the main NAEP, given to twelfth graders.

Table 1 compares the past ten years’ change in test scores of the SAT with changes in NAEP.[iv] The long term trend NAEP was not administered in 2005 or 2015, so the closest years it was given are shown. The NAEP tests show high school students making small gains over the past decade. They do not confirm the losses on the SAT.

Table 1. Comparison of changes in SAT, Main NAEP (12th grade), and LTT NAEP (17-year-olds) scores. Changes expressed as SD units of base year.

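|         | SAT (2005-2015) | Main NAEP (2005-2015) | LTT NAEP (2004-2012) |
|---------|-----------------|-----------------------|----------------------|
| Reading | -0.12*          | +0.05*                | +0.09*               |
| Math    | -0.08*          | +0.09*                | +0.03                |

*p<.05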

Petrilli raised another concern related to NAEP scores by examining cohort trends in NAEP scores. The trend for the 17-year-old cohort of 2012, for example, can be constructed by using the scores of 13-year-olds in 2008 and 9-year-olds in 2004. By tracking NAEP changes over time in this manner, one can get a rough idea of a particular cohort’s achievement as students grow older and proceed through the school system. Examining three cohorts, Fordham’s analysis shows that the gains between ages 13 and 17 are about half as large as those registered between ages nine and 13. Kids gain more on NAEP when they are younger than when they are older.
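The cohort construction described above can be sketched in a few lines. The scale scores in the example are HYPOTHETICAL, invented only so the code runs; they are not NAEP results, and the function names are my own.

```python
def cohort_path(year_at_17):
    """(year, age) pairs that follow one cohort through LTT NAEP at ages 9, 13, 17."""
    return [(year_at_17 - 8, 9), (year_at_17 - 4, 13), (year_at_17, 17)]


def cohort_gains(scores, year_at_17):
    """Return (gain from age 9 to 13, gain from age 13 to 17) for one cohort."""
    at_9, at_13, at_17 = (scores[key] for key in cohort_path(year_at_17))
    return at_13 - at_9, at_17 - at_13


# HYPOTHETICAL scale scores keyed by (year, age); not actual NAEP data.
example_scores = {(2004, 9): 200, (2008, 13): 240, (2012, 17): 260}

print(cohort_path(2012))  # [(2004, 9), (2008, 13), (2012, 17)]
early_gain, late_gain = cohort_gains(example_scores, 2012)
print(f"Ages 9 to 13: +{early_gain}; ages 13 to 17: +{late_gain}")
```

The made-up numbers were chosen so that the later gain is half the earlier one, which is the pattern the Fordham analysis reports.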

There is nothing new here. NAEP scholars have been aware of this phenomenon for a long time. Fordham points to particular elements of education reform that it favors—charter schools, vouchers, and accountability—as the probable cause. It is true that those reforms more likely target elementary and middle schools than high schools. But the research literature on age discrepancies in NAEP gains (which is not cited in the Fordham analysis) renders doubtful the thesis that education policies are responsible for the phenomenon.[v]

Whether high school age students try as hard as they could on NAEP has been pointed to as one explanation. A 1996 analysis of NAEP answer sheets found that 25 to 30 percent of twelfth graders displayed off-task test behaviors—doodling, leaving items blank—compared to 13 percent of eighth graders and six percent of fourth graders. A 2004 national commission on the twelfth grade NAEP recommended incentives (scholarships, certificates, letters of recognition from the President) to boost high school students’ motivation to do well on NAEP. Why would high school seniors or juniors take NAEP seriously when this low-stakes test is taken in the midst of taking SAT or ACT tests for college admission, end-of-course exams that affect high school GPA, AP tests that can affect placement in college courses, state accountability tests that can lead to their schools being deemed a success or failure, and high school exit exams that must be passed to graduate?[vi]

Other possible explanations for the phenomenon are: 1) differences in the scales between the ages tested on LTT NAEP (in other words, a one-point gain on the scale between ages nine and 13 may not represent the same amount of learning as a one-point gain between ages 13 and 17); 2) different rates of participation in NAEP among elementary, middle, and high schools;[vii] and 3) social trends that affect all high school students, not just those in public schools. The third possibility can be explored by analyzing trends for students attending private schools. If Fordham had disaggregated the NAEP data by public and private schools (the scores of Catholic school students are available), it would have found that the pattern among private school students is similar—younger students gain more than older students on NAEP. That similarity casts doubt on the notion that policies governing public schools are responsible for the smaller gains among older students.[viii]

Changes in the SAT population

Writing in the Washington Post, Carol Burris addresses the question of whether demographic changes have influenced the decline in SAT scores. She concludes that they have not, and in particular, she concludes that the growing proportion of students receiving exam fee waivers has probably not affected scores. She bases that conclusion on an analysis of SAT participation disaggregated by level of family income. Burris notes that the percentage of SAT takers has been stable across income groups in recent years. That criterion is not trustworthy. About 39 percent of students in 2015 declined to provide information on family income. The 61 percent that answered the family income question are probably skewed against low-income students who are on fee waivers (the assumption being that they may feel uncomfortable answering a question about family income).[ix] Don’t forget that the SAT population as a whole is a self-selected sample. A self-selected subsample from a self-selected sample tells us even less than the original sample, which told us almost nothing.

The fee waiver share of SAT takers increased from 21 percent in 2011 to 25 percent in 2015. The simple fact that fee waivers serve low-income families, whose children tend to be lower-scoring SAT takers, is important, but not the whole story here. Students from disadvantaged families have always taken the SAT. But they paid for it themselves. If an additional increment of disadvantaged families take the SAT because they don’t have to pay for it, it is important to consider whether the new entrants to the pool of SAT test takers possess unmeasured characteristics that correlate with achievement—beyond the effect already attributed to socioeconomic status.

Robert Kelchen, an assistant professor of higher education at Seton Hall University, calculated the effect on national SAT scores of just three jurisdictions (Washington, DC, Delaware, and Idaho) adopting policies of mandatory SAT testing paid for by the state. He estimated that these policies explain about 21 percent of the nationwide decline in test scores between 2011 and 2015. He also notes that a more thorough analysis, incorporating fee waivers of other states and districts, would surely boost that figure. Fee waivers in two dozen Texas school districts, for example, are granted to all juniors and seniors in high school. And all students in those districts (including Dallas and Fort Worth) are required to take the SAT beginning in the junior year. Such universal testing policies can increase access and serve the cause of equity, but they will also, at least for a while, lead to a decline in SAT scores.

Here, I offer my own back-of-the-envelope calculation of the relationship of demographic changes with SAT scores. The College Board reports test scores and participation rates for nine racial and ethnic groups.[x] These data are preferable to family income because a) almost all students answer the race/ethnicity question (only four percent are non-responses versus 39 percent for family income), and b) it seems a safe assumption that students are more likely to know their race or ethnicity compared to their family’s income.

The question tackled in Table 2 is this: how much would the national SAT scores have changed from 2005 to 2015 if the scores of each racial/ethnic group stayed exactly the same as in 2005, but each group’s proportion of the total population were allowed to vary? In other words, the scores are fixed at the 2005 level for each group—no change. The SAT national scores are then recalculated using the 2015 proportions that each group represented in the national population.

Table 2. SAT Scores and Demographic Changes in the SAT Population (2005-2015)

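|         | Projected Change Based on Change in Proportions | Actual Change | Projected Change as Percentage of Actual Change |
|---------|-------------------------------------------------|---------------|-------------------------------------------------|
| Reading | -9                                              | -13           | 69%                                             |
| Math    | -7                                              | -9            | 78%                                             |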

The data suggest that two-thirds to three-quarters of the SAT score decline from 2005 to 2015 is associated with demographic changes in the test-taking population. The analysis is admittedly crude. The relationships are correlational, not causal. The race/ethnicity categories are surely serving as proxies for a bundle of other characteristics affecting SAT scores, some unobserved and others (e.g., family income, parental education, language status, class rank) that are included in the SAT questionnaire but produce data difficult to interpret.
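For readers who want to see the mechanics, the calculation behind Table 2 amounts to a fixed-score reweighting. The sketch below uses a HYPOTHETICAL two-group example, not the College Board's nine categories or its actual scores, and the function names are mine. The last lines recompute the percentages in Table 2 from its own entries.

```python
def weighted_mean(scores, shares):
    """Overall mean when each group's score is weighted by its population share."""
    total = sum(shares.values())
    return sum(scores[group] * shares[group] for group in scores) / total


def projected_change(base_scores, base_shares, new_shares):
    """Change in the overall mean if group scores stay fixed and only shares shift."""
    return (weighted_mean(base_scores, new_shares)
            - weighted_mean(base_scores, base_shares))


# HYPOTHETICAL illustration: two groups, scores frozen at their base-year level.
scores_2005 = {"group_a": 530, "group_b": 450}
shares_2005 = {"group_a": 0.70, "group_b": 0.30}
shares_2015 = {"group_a": 0.60, "group_b": 0.40}

shift = projected_change(scores_2005, shares_2005, shares_2015)
print(f"Projected change from composition alone: {shift:+.1f} points")

# Projected change as a share of the actual change, using Table 2's entries.
reading_share = -9 / -13
math_share = -7 / -9
print(f"Reading: {reading_share:.0%}, Math: {math_share:.0%}")  # 69%, 78%
```

In the hypothetical case no group's score moves at all, yet the overall average falls by about eight points simply because the lower-scoring group makes up a larger share of test takers.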

Conclusion

Using an annual decline in SAT scores to indict high schools is bogus. The SAT should not be used to measure national achievement. SAT changes from 2014 to 2015 are tiny. The downward trend over the past decade represents a larger decline in SAT scores, but one that is still small in magnitude and correlated with changes in the SAT test-taking population.

In contrast to SAT scores, NAEP scores, which are designed to monitor national achievement, report slight gains for 17-year-olds over the past ten years. It is true that LTT NAEP gains are larger among students from ages nine to 13 than from ages 13 to 17, but research has uncovered several plausible explanations for why that occurs. The public should exercise great caution in accepting the findings of test score analyses. Test scores are often misinterpreted to promote political agendas, and much of the alarmist rhetoric provoked by small declines in scores is unjustified.


* In fairness to Petrilli, he acknowledges in his post, “The SATs aren’t even the best gauge—not all students take them, and those who do are hardly representative.”


[i] The 2014 SD for both SAT reading and math was 115.

[ii] A substantively trivial change may nevertheless reach statistical significance with large samples.

[iii] The 2005 SDs were 113 for reading and 115 for math.

[iv] Throughout this post, SAT’s Critical Reading (formerly, the SAT-Verbal section) is referred to as “reading.” I only examine SAT reading and math scores to allow for comparisons to NAEP. Moreover, SAT’s writing section will be dropped in 2016.

[v] The larger gains by younger vs. older students on NAEP are explored in greater detail in the 2006 Brown Center Report, pp. 10-11.

[vi] If these influences have remained stable over time, they would not affect trends in NAEP. It is hard to believe, however, that high stakes tests carry the same importance today to high school students as they did in the past.

[vii] The 2004 blue ribbon commission report on the twelfth grade NAEP reported that by 2002 participation rates had fallen to 55 percent. That compares to 76 percent at eighth grade and 80 percent at fourth grade. Participation rates refer to the originally drawn sample, before replacements are made. NAEP is conducted with two stage sampling—schools first, then students within schools—meaning that the low participation rate is a product of both depressed school (82 percent) and student (77 percent) participation. See page 8 of: http://www.nagb.org/content/nagb/assets/documents/publications/12_gr_commission_rpt.pdf

[viii] Private school data are spotty on the LTT NAEP because of problems meeting reporting standards, but analyses identical to Fordham’s can be conducted on Catholic school students for the 2008 and 2012 cohorts of 17-year-olds.

[ix] The non-response rate in 2005 was 33 percent.

[x] The nine response categories are: American Indian or Alaska Native; Asian, Asian American, or Pacific Islander; Black or African American; Mexican or Mexican American; Puerto Rican; Other Hispanic, Latino, or Latin American; White; Other; and No Response.


The NAEP proficiency myth


On May 16, I got into a Twitter argument with Campbell Brown of The 74, an education website.  She released a video on Slate giving advice to the next president.  The video begins: “Without question, to me, the issue is education. Two out of three eighth graders in this country cannot read or do math at grade level.”  I study student achievement and was curious.  I know of no valid evidence to make the claim that two out of three eighth graders are below grade level in reading and math.  No evidence was cited in the video.  I asked Brown for the evidentiary basis of the assertion.  She cited the National Assessment of Educational Progress (NAEP).

NAEP does not report the percentage of students performing at grade level.  NAEP reports the percentage of students reaching a “proficient” level of performance.  Here’s the problem. That’s not grade level. 

In this post, I hope to convince readers of two things:

1.  Proficient on NAEP does not mean grade level performance.  It’s significantly above that.
2.  Using NAEP’s proficient level as a basis for education policy is a bad idea.

Before going any further, let’s look at some history.

NAEP history 

NAEP was launched nearly five decades ago.  The first NAEP test was given in science in 1969, followed by a reading test in 1971 and math in 1973.  For the first time, Americans were able to track the academic progress of the nation’s students.  That set of assessments, which periodically tests students 9, 13, and 17 years old and was last given in 2012, is now known as the Long Term Trend (LTT) NAEP. 

It was joined by another set of NAEP tests in the 1990s.  The Main NAEP assesses students by grade level (fourth, eighth, and twelfth) and, unlike the LTT, produces not only national but also state scores.  The two tests, LTT and main, continue on parallel tracks today, and they are often confounded by casual NAEP observers.  The main NAEP, which was last administered in 2015, is the test relevant to this post and will be the only one discussed hereafter.  The NAEP governing board was concerned that the conventional metric for reporting results (scale scores) was meaningless to the public, so achievement standards (also known as performance standards) were introduced.  The percentages of students scoring at advanced, proficient, basic, and below basic levels are reported each time the main NAEP is given.

Does NAEP proficient mean grade level? 

The National Center for Education Statistics (NCES) states emphatically, “Proficient is not synonymous with grade level performance.” The National Assessment Governing Board has a brochure with information on NAEP, including a section devoted to myths and facts.  There, you will find this:

Myth: The NAEP Proficient level is like being on grade level.

Fact: Proficient on NAEP means competency over challenging subject matter.  This is not the same thing as being “on grade level,” which refers to performance on local curriculum and standards. NAEP is a general assessment of knowledge and skills in a particular subject.

Equating NAEP proficiency with grade level is bogus.  Indeed, the validity of the achievement levels themselves is questionable.  They immediately came under fire in reviews by the U.S. Government Accountability Office, the National Academy of Sciences, and the National Academy of Education.[1]  The National Academy of Sciences report was particularly scathing, labeling NAEP’s achievement levels as “fundamentally flawed.”

Despite warnings of NAEP authorities and critical reviews from scholars, some commentators, typically from advocacy groups, continue to confound NAEP proficient with grade level.  Organizations that support school reform, such as Achieve Inc. and Students First, prominently misuse the term on their websites.  Achieve presses states to adopt cut points aligned with NAEP proficient as part of new Common Core-based accountability systems.  Achieve argues that this will inform parents whether children “can do grade level work.” No, it will not.  That claim is misleading.

How unrealistic is NAEP proficient? 

Shortly after NCLB was signed into law, Robert Linn, one of the most prominent psychometricians of the past several decades, called “the target of 100% proficient or above according to the NAEP standards more like wishful thinking than a realistic possibility.”  History is on the side of that argument.  When the first main NAEP in mathematics was given in 1990, only 13% of eighth graders scored proficient and 2% scored advanced.  Imagine using “proficient” as synonymous with grade level: 85% scored below grade level!

The 1990 national average in eighth grade scale scores was 263 (see Table 1).  In 2015, the average was 282, a gain of 19 scale score points.

Table 1.  Main NAEP Eighth Grade Math Score, by achievement levels, 1990-2015

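| Year | Scale Score Average | Below Basic (%) | Basic (%) | Proficient (%) | Advanced (%) | Proficient and Above (%) |
|---|---|---|---|---|---|---|
| 2015 | 282 | 29 | 38 | 25 | 8 | 33 |
| 2009 | 283 | 27 | 39 | 26 | 8 | 34 |
| 2003 | 278 | 32 | 39 | 23 | 5 | 28 |
| 1996 | 270 | 39 | 38 | 20 | 4 | 24 |
| 1990 | 263 | 48 | 37 | 13 | 2 | 15 |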

That’s an impressive gain.  Analysts who study NAEP often use 10 points on the NAEP scale as a back of the envelope estimate of one year’s worth of learning.  Eighth graders have gained almost two years.  The percentage of students scoring below basic has dropped from 48%  in 1990 to 29% in 2015.  The percentage of students scoring proficient or above has more than doubled, from 15% to 33%.  That’s not bad news; it’s good news.

But the cut point for NAEP proficient is 299.  By that standard, two-thirds of eighth graders are still falling short.  Even students in private schools, despite hailing from more socioeconomically advantaged homes and in some cases being selectively admitted by schools, fail miserably at attaining NAEP proficiency.  More than half (53 percent) are below proficient. 

Today’s eighth graders have made it about half-way to NAEP proficient in 25 years, but they still need to gain almost two more years of math learning (17 points) to reach that level.  And, don’t forget, that’s just the national average, so even when that lofty goal is achieved, half of the nation’s students will still fall short of proficient.  Advocates of the NAEP proficient standard want it to be for all students.  That is ridiculous.  Another way to think about it: proficient for today’s eighth graders reflects approximately what the average twelfth grader knew in mathematics in 1990.   Someday the average eighth grader may be able to do that level of mathematics.  But it won’t be soon, and it won’t be every student.
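The arithmetic behind these estimates can be checked with a short sketch. It assumes the rule of thumb cited above (10 scale score points is roughly one year of learning) and uses the 1990 and 2015 averages from Table 1 along with the 299 cut point. The function and variable names are illustrative only.

```python
# Back-of-the-envelope NAEP arithmetic, using figures quoted in the text.
# Assumption: 10 scale score points ~ one year of learning (a rough rule of thumb).
POINTS_PER_YEAR = 10

AVG_1990 = 263          # eighth grade math average, 1990 (Table 1)
AVG_2015 = 282          # eighth grade math average, 2015 (Table 1)
PROFICIENT_CUT = 299    # cut point for NAEP proficient


def points_to_years(points):
    """Convert a scale score difference into approximate years of learning."""
    return points / POINTS_PER_YEAR


gain = AVG_2015 - AVG_1990                        # points gained since 1990
remaining = PROFICIENT_CUT - AVG_2015             # points still needed
share_covered = gain / (PROFICIENT_CUT - AVG_1990)

print(f"Gain since 1990: {gain} points, about {points_to_years(gain):.1f} years")
print(f"Still needed: {remaining} points, about {points_to_years(remaining):.1f} years")
print(f"Share of the 1990-to-proficient gap closed: {share_covered:.0%}")
```

Under that rule of thumb, the 19-point gain works out to just under two years of learning, the 17-point shortfall to a bit less than that, and a little over half of the original gap has been closed.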

In the 2007 Brown Center Report on American Education, I questioned whether NAEP proficient is a reasonable achievement standard.[2]  That year, a study by Gary Phillips of American Institutes for Research was published that projected the 2007 TIMSS scores on the NAEP scale.  Phillips posed the question: based on TIMSS, how many students in other countries would score proficient or better on NAEP?  The study’s methodology only produces approximations, but they are eye-popping.

Here are just a few countries:

Table 2.  Projected Percent NAEP Proficient, Eighth Grade Math

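| Country | Projected percent NAEP proficient |
|---|---|
| Singapore | 73 |
| Hong Kong SAR | 66 |
| Korea, Rep. of | 65 |
| Chinese Taipei | 61 |
| Japan | 57 |
| Belgium (Flemish) | 40 |
| United States | 26 |
| Israel | 24 |
| England | 22 |
| Italy | 17 |
| Norway | 9 |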

Singapore was the top scoring nation on TIMSS that year, but even there, more than a quarter of students fail to reach NAEP proficient.  Japan is not usually considered a slouch on international math assessments, but 43% of its eighth graders fall short.  The U.S. looks weak, with only 26% of students proficient.  But England, Israel, and Italy are even weaker.  Norway, a wealthy nation with per capita GDP almost twice that of the U.S., can only get 9 out of 100 eighth graders to NAEP proficient.

Finland isn’t shown in the table because it didn’t participate in the 2007 TIMSS.  But it did in 2011, with Finland and the U.S. scoring about the same in eighth grade math.  Had Finland’s eighth graders taken NAEP in 2011, it’s a good bet that the proportion scoring below NAEP proficient would have been similar to that in the U.S.  And yet articles such as “Why Finland Has the Best Schools” appear regularly in the U.S. press.[3]

Why it matters

The National Center for Education Statistics warns that federal law requires that NAEP achievement levels be used on a trial basis until the Commissioner of Education Statistics determines that the achievement levels are “reasonable, valid, and informative to the public.”  As the NCES website states, “So far, no Commissioner has made such a determination, and the achievement levels remain in a trial status.  The achievement levels should continue to be interpreted and used with caution.”

Confounding NAEP proficient with grade-level is uninformed.  Designating NAEP proficient as the achievement benchmark for accountability systems is certainly not cautious use.  If high school students are required to meet NAEP proficient to graduate from high school, large numbers will fail. If middle and elementary school students are forced to repeat grades because they fall short of a standard anchored to NAEP proficient, vast numbers will repeat grades.    

On NAEP, students are asked the highest level math course they’ve taken.  On the 2015 twelfth grade NAEP, 19% of students said they either were taking or had taken calculus.  These are the nation’s best and brightest, the crème de la crème of math students.  Only one in five students works their way that high up the hierarchy of American math courses.  If you are over 45 years old and reading this, the proportion of your peers who took calculus in high school is less than one out of ten.  In the graduating class of 1990, for instance, only 7% of students had taken calculus.[4]

Unsurprisingly, calculus students are also typically taught by the nation’s most knowledgeable math teachers.  The nation’s elite math students paired with the nation’s elite math teachers: if any group can prove NAEP proficient a reasonable goal and succeed in getting all students over the NAEP proficiency bar, this is the group. 

But they don’t.  A whopping 30% score below proficient on NAEP.  For black and Hispanic calculus students, the figures are staggering.  Two-thirds of black calculus students score below NAEP proficient.  For Hispanics, the figure is 52%.  The nation’s pre-calculus students also fare poorly (69% below proficient).  Then the success rate falls off a cliff.  In the class of 2015, more than nine out of ten students whose highest math course was Trigonometry or Algebra II fail to meet the NAEP proficient standard.

Table 3.  2015 NAEP Twelfth Grade Math, Proficient by Highest Math Course Taken

| Highest Math Course Taken | Percentage Below NAEP Proficient |
|---|---|
| Calculus | 30 |
| Pre-calculus | 69 |
| Trig/Algebra II | 92 |

Source: NAEP Data Explorer

These data defy reason; they also refute common sense.  For years, educators have urged students to take the toughest courses they can possibly take.  Taken at face value, the data in Table 3 rip the heart out of that advice.  These are the toughest courses, and yet huge numbers of the nation’s star students, by any standard aligned with NAEP proficient, would be told that they have failed.  Some parents, misled by the confounding of proficient with grade level, might even mistakenly believe that their kids don’t know grade level math.

Conclusion 

NAEP proficient is not synonymous with grade level.  NAEP officials urge that proficient not be interpreted as reflecting grade level work.  It is a standard set much higher than that.  Scholarly panels have reviewed the NAEP achievement standards and found them flawed.  The highest scoring nations of the world would appear to be mediocre or poor performers if judged by the NAEP proficient standard.  Even large numbers of U.S. calculus students fall short.

As states consider building benchmarks for student performance into accountability systems, they should not use NAEP proficient, or any standard aligned with NAEP proficient, as a benchmark.  It is an unreasonable expectation, one that ill serves America’s students, parents, and teachers, and the effort to improve America’s schools.


[1] Shepard, L. A., Glaser, R., Linn, R., & Bohrnstedt, G. (1993) Setting Performance Standards For Student Achievement: Background Studies. Report of the NAE Panel on the Evaluation of the NAEP Trial State Assessment: An Evaluation of the 1992 Achievement Levels. National Academy of Education. 

[2] Loveless, Tom.  The 2007 Brown Center Report, pages 10-13.

[3] William Doyle, “Why Finland Has The Best Schools,” Los Angeles Times, March 18, 2016.

[4] NCES, America’s High School Graduates: Results of the 2009 NAEP High School Transcript Study.  See Table 8, p. 49.


Government spending: yes, it really can cut the U.S. deficit


Hypocrisy is not scarce in the world of politics. But the current House and Senate budget resolutions set new lows. Each proposes to cut about $5 trillion from government spending over the next decade in pursuit of a balanced budget. Whatever one may think of ranking spending cuts above investment in the nation’s future when the debt-to-GDP ratio is projected to be stable, you would think that deficit-reduction hawks wouldn’t cut spending that has been proven to lower the deficit.

Yes, there are expenditures that actually lower the deficit, typically by many dollars for each dollar spent. In this category are outlays on ‘program integrity’ to find and punish fraud, tax evasion, and plain old bureaucratic mistakes. You might suppose that those outlays would be spared. Guess again. Consider the following:

Medicare. Roughly 10% of Medicare’s $600 billion budget goes for what officials delicately call ‘improper payments,’ according to the 2014 financial report of the Department of Health and Human Services. Some are improper merely because providers ‘up-code’ legitimate services to boost their incomes. Some payments go for services that serve no valid purpose. And some go for phantom services that were never provided. Whatever the cause, approximately $60 billion of improper payments is not ‘chump change.’

Medicare tries to root out these improper payments, but it lacks sufficient staff to do the job. What it does spend on ‘program integrity’ yields an estimated $14.40 for each dollar spent, about $10 billion a year in total. That number counts only directly measurable savings, such as recoveries and claim denials. A full reckoning of savings would add in the hard-to-measure ‘policeman on the beat’ effect that discourages violations by would-be cheats.
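To get a sense of scale, the figures above can be combined in a rough sketch. It assumes that the $14.40 return per dollar and the roughly $10 billion in total savings describe the same year of activity, so that spending is approximately savings divided by the return. That spending figure is an inference, not a number reported in the text, and the variable names are illustrative.

```python
# Implied scale of Medicare program integrity spending, from figures in the text.
# Assumption: the $14.40 return per dollar and the ~$10 billion total describe
# the same year of activity, so spending is roughly total savings / return.
IMPROPER_PAYMENTS_BN = 60.0    # approximate improper payments, $ billions a year
SAVINGS_BN = 10.0              # measured program integrity savings, $ billions a year
RETURN_PER_DOLLAR = 14.40      # measured savings per dollar spent

implied_spending_bn = SAVINGS_BN / RETURN_PER_DOLLAR
savings_share = SAVINGS_BN / IMPROPER_PAYMENTS_BN

print(f"Implied program integrity spending: ${implied_spending_bn:.2f} billion a year")
print(f"Measured savings relative to improper payments: {savings_share:.0%}")
```

On these assumptions, well under $1 billion a year of enforcement spending is set against roughly $60 billion of improper payments, and measured savings amount to about one-sixth of that total.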

Fat targets remain. A recent report from the Institute of Medicine presented findings that veritably scream ‘fraud.’ Per person spending on durable medical equipment and home health care is ten times higher in Miami-Dade County, Florida than the national average. Such equipment and home health care account for nearly three-quarters of the geographical variation in per person Medicare spending. Yet only 4% of current recoveries of improper payments come from audits of these two items, and little comes from the highest spending locations.

Why doesn’t Medicare spend more and go after the remaining overpayments, you may wonder? The simple answer is that Congress gives Medicare too little money for administration. Direct overhead expenses of Medicare amount to only about 1.5% of program outlays, or 6% if one includes the internal administrative costs of private health plans that serve Medicare enrollees. Medicare doesn’t need to spend as much on administration as the average of 19% spent by private insurers because, for example, Medicare need not pay dividends to private shareholders or advertise.

But spending more on Medicare administration would both pay for itself—$2 for each added dollar spent, according to the conservative estimate in the President’s most recent budget—and improve the quality of care. With more staff, Medicare could stop more improper payments and reduce the use of approved therapies in unapproved ways that do no good and may cause harm.

Taxes. Compare two numbers: $540 billion and $468 billion. The first number is the amount of taxes owed but not paid. The second number is the projected federal budget deficit for 2015, according to the Congressional Budget Office.

Collecting all taxes legally owed but not paid is an impossibility. It just isn’t worth going after every violation. But current enforcement falls far short of practical limits. Expenditures on enforcement directly yield $4 to $6 for each dollar spent. Indirect savings are many times larger: the cop-on-the-beat effect again. So, in an era of ostentatious concern about budget deficits, you would expect fiscal fretting in Congress to lead to increased efforts to collect what the law says people owe in taxes.

Wrong again. Between 2010 and 2014, the IRS budget was cut in real terms by 20%. At the same time, the agency had to shoulder new tasks under health reform, as well as process an avalanche of applications for tax exemptions unleashed by the 2010 Supreme Court decision in the Citizens United case. With less money to spend and more to do, enforcement staff dropped by 15% and inflation-adjusted collections dropped 13%.

One should acknowledge that enforcement will not do away with most avoidance and evasion. Needlessly complex tax laws are the root cause of most tax underpayment. Tax reform would do even more than improved administration to increase the ratio of taxes paid to taxes due. But until that glorious day when Congress finds the wit and will to make the tax system simpler and fairer, it would behoove a nation trying to make ends meet to spend $2 billion to $3 billion more each year to directly collect $10 billion to $15 billion a year more of legally owed taxes and, almost certainly, raise far more than that by frightening borderline scoff-laws.
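As a rough check on those figures, the sketch below multiplies the proposed added spending by the direct yield per enforcement dollar cited earlier ($4 to $6). The names are illustrative, and the calculation assumes that each added dollar yields as much as current spending does, which may be optimistic.

```python
# Rough check of the direct return on added IRS enforcement spending.
# Assumption: each added dollar yields the same $4 to $6 cited for current spending.
YIELD_PER_DOLLAR = (4.0, 6.0)      # direct dollars collected per dollar spent
ADDED_SPENDING_BN = (2.0, 3.0)     # proposed added spending, $ billions a year


def direct_collections(spending_bn, yield_per_dollar):
    """Direct collections, in $ billions, from a given level of added spending."""
    return spending_bn * yield_per_dollar


# Widest plausible range: low spending at low yield, high spending at high yield.
low = direct_collections(ADDED_SPENDING_BN[0], YIELD_PER_DOLLAR[0])
high = direct_collections(ADDED_SPENDING_BN[1], YIELD_PER_DOLLAR[1])

# Range at the midpoint yield.
midpoint_yield = sum(YIELD_PER_DOLLAR) / 2
mid_low = direct_collections(ADDED_SPENDING_BN[0], midpoint_yield)
mid_high = direct_collections(ADDED_SPENDING_BN[1], midpoint_yield)

print(f"Full range: ${low:.0f} billion to ${high:.0f} billion a year")
print(f"At ${midpoint_yield:.0f} per dollar: ${mid_low:.0f} billion to ${mid_high:.0f} billion a year")
```

The $10 billion to $15 billion range in the text corresponds to the midpoint yield of about $5 per dollar. None of this counts the indirect deterrent effect.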

Disability Insurance. Thirteen million people with disabling conditions who are judged incapable of engaging in substantial gainful activity received $161 billion in disability insurance in 2013. If the disabling conditions improve enough so that beneficiaries can return to work, benefits are supposed to be stopped. Such improvement is rare. But when administrators believe that there is some chance, the law requires them to check. They may ask beneficiaries to fill out a questionnaire or, in some cases, undergo a new medical exam at government expense. Each dollar spent in these ways generated an estimated $16 in savings in 2013.

Still, the Social Security Administration is so understaffed that it has a backlog of 1.3 million disability reviews. Current estimates indicate that spending a little over $1 billion a year more on such reviews over the next decade would save $43 billion. Rather than giving Social Security the staff and spending authority to work down this backlog and realize those savings, Congress has been cutting the agency’s administrative budget, and sequestration threatens further cuts.

Claiming that better administration will balance the budget would be wrong. But it would help. And it would stop some people from shirking their legal responsibilities and lighten the burdens of those who shoulder theirs. The failure of Congress to provide enough staff to run programs costing hundreds of billions of dollars a year as efficiently and honestly as possible is about as good a definition of criminal negligence as one can find.


Eurozone desperately needs a fiscal transfer mechanism to soften the effects of competitiveness imbalances


The eurozone has three problems: national debt obligations that cannot be met, medium-term imbalances in trade competitiveness, and long-term structural flaws.

The short-run problem requires more of the monetary easing that Germany has, with appalling shortsightedness, been resisting, and less of the near-term fiscal restraint that Germany has, with equally appalling shortsightedness, been seeking. To insist that Greece meet all of its near-term current debt service obligations makes about as much sense as did French and British insistence that Germany honor its reparations obligations after World War I. The latter could not be and were not honored. The former cannot and will not be honored either.

The medium-term problem is that, given a single currency, labor costs are too high in Greece and too low in Germany and some other northern European countries. Because adjustments in currency values cannot correct these imbalances, differences in growth of wages must do the job—either wage deflation and continued depression in Greece and other peripheral countries, wage inflation in Germany, or both. The former is a recipe for intense and sustained misery. The latter, however politically improbable it may now seem, is the better alternative.

The long-term problem is that the eurozone lacks the fiscal transfer mechanisms necessary to soften the effects of competitiveness imbalances while other forms of adjustment take effect. This lack places extraordinary demands on the willingness of individual nations to undertake internal policies to reduce such imbalances. Until such fiscal transfer mechanisms are created, crises such as the current one are bound to recur.

Present circumstances call for a combination of short-term expansionary policies that have to be led or accepted by the surplus nations, notably Germany, who will also have to recognize and accept that not all Greek debts will be paid or that debt service payments will not be made on time and at originally negotiated interest rates. The price for those concessions will be a current and credible commitment eventually to restore and maintain fiscal balance by the peripheral countries, notably Greece.


Publication: The International Economy

The myth behind America’s deficit


Medicare Hospital Insurance and Social Security would not add to deficits because they can’t spend money they don’t have.

The dog days of August have given way to something much worse. Congress returned to session this week, and the rest of the year promises to be nightmarish. The House and Senate passed budget resolutions earlier this year calling for nearly $5 trillion in spending cuts by 2025. More than two-thirds of those cuts would come from programs that help people with low and moderate incomes. Health care spending would be halved. If Congress passes such cuts, the president will likely veto them. At best, another partisan budget war will ensue after which the veto is sustained. At worst, the cuts become law.

The putative justification for these cuts is that the nation faces insupportable increases in public debt because of expanding budget deficits. Even if the projections were valid, it would be prudent to enact some tax increases in order to preserve needed public spending. But the projections of explosively growing debt are not valid. They are fantasy.

Wait! you say. The Congressional Budget Office has been telling us for years about the prospect of rising deficits and exploding debt. They repeated those warnings just two months ago. Private organizations of both the left and right agree with the CBO’s projections, in general if not in detail. How can any sane person deny that the nation faces a serious long-term budget deficit problem?

The answer is simple: The CBO and private organizations use a convention in preparing their projections that is at odds with established policy and law. If, instead, projections are based on actual current law, as they claim to be, the specter of an increasing debt burden vanishes. What is that convention? Why is it wrong? Why did CBO adopt it, and why have others kept it?

CBO’s budget projections cover the next 75 years. Its baseline projections claim to be based on current law and policy. (CBO also presents an ‘alternative scenario’ based on assumed changes in law and policy.) Within that period, Social Security (OASDI) and Medicare Hospital Insurance (HI) expenditures are certain to exceed revenues earmarked to pay for them. Both are financed through trust funds. Both funds have sizeable reserves, held as government securities, that can be used to cover shortfalls for a while. But when those reserves are exhausted, expenditures cannot exceed current revenues. Trust fund financing means that neither Social Security nor Medicare Hospital Insurance can run deficits. Nor can they add to the public debt.

Nonetheless, CBO and other organizations assume that Social Security and Medicare Hospital Insurance can and will spend money they don’t have and that current law bars them from spending.

One of the reasons why trust fund financing was used, first for Social Security and then for Medicare Hospital Insurance, was to create a framework that disciplined Congress to earmark sufficient revenues to pay for benefits it might award. Successive presidents and Congresses, both Republican and Democratic, have repeatedly acted to prevent either program’s cumulative spending from exceeding cumulative revenues. In 1983, for example, faced with an impending trust fund shortfall, Congress cut benefits and raised taxes enough to turn prospective cash flow trust fund deficits into cash flow surpluses. And President Reagan signed the bill. In so doing, they reaffirmed the discipline imposed by trust fund financing.

Trust fund accounting explains why people now are worrying about the adequacy of funding for Social Security and Medicare. They recognize that the trust funds will be depleted in a couple of decades. They understand that between now and then Congress must either raise earmarked taxes or fashion benefit cuts. If it doesn’t raise taxes, benefits will be cut across the board. Either way, the deficits that CBO and other organizations have built into their budget projections will not materialize.

The implications for projected debt of CBO’s inclusion in its projections of deficits that current law and established policy do not allow are enormous, as the graph below shows.

If one excludes deficits in Social Security and Medicare Hospital Insurance that cannot occur under current law and established policy, the ratio of national debt to gross domestic product will fall rather than rise as CBO’s budget projections indicate. In other words, the claim that drastic cuts in government spending are necessary to avoid calamitous budget deficits is bogus.

It might seem puzzling that CBO, an agency known for its professionalism and scrupulous avoidance of political bias, would adopt a convention so at odds with law and policy. The answer is straightforward: Congress makes it do so. Section 257 of the Balanced Budget and Emergency Deficit Control Act of 1985 requires CBO to assume that the trust funds can spend money although legislation governing trust fund operations bars such expenditures. CBO is obeying the law.

No similar explanation exonerates the statement of the Committee for a Responsible Federal Budget, which on August 25, 2015 cited, with approval, the conclusion that ‘debt continues to grow unsustainably,’ or that of the Bipartisan Policy Center, which wrote on the same day that ‘America’s debt continues to grow on an unsustainable path.’ Both statements are wrong.

To be sure, the dire budget future anticipated in the CBO projections could materialize. Large deficits could result from an economic calamity or war. Congress could abandon the principle that Social Security and Medicare Hospital Insurance should be financed within trust funds. It could enact other fiscally rash policies. But such deficits do not flow from current law or reflect the trust fund discipline endorsed by both parties over the last 80 years. And it is current law and policy that are supposed to underlie budget projections. A thirty-year-old law requires CBO to assume that Congress will do something it has shown no sign of doing: overturn decades of bipartisan prudence requiring that the major social insurance programs spend only money specifically earmarked for them, and not a penny more. Slashing spending on that basis would impose enormous hardship on vulnerable populations in the name of a fiscal fantasy.



Editor's Note: This post originally appeared in Fortune Magazine.

Publication: Fortune Magazine

Why fewer jobless Americans are counting on disability


As government funding for disability insurance is expected to run out next year, Congress should re-evaluate the costs of the program.

Nine million people in America today are receiving Social Security Disability Insurance, double the number in 1995 and six times the number in 1970. With statistics like that, it’s hardly surprising to see some in Congress worry that more will enroll in the program and costs will continue to rise, especially since government funding for disability insurance is expected to run out by the end of next year. If Congress does nothing, benefits would fall by 19% immediately following next year’s presidential election. So, Congress will likely do something. But what exactly should it do?

Disability insurance has nearly run out of money before. Each time, Congress has simply increased the share of the Social Security payroll tax that goes for disability insurance. This time, however, many members of Congress oppose such a shift unless it is linked to changes that curb eligibility and promote return to work. They fear that rolls will keep growing and costs will keep rising, but a report by a government panel concludes that disability insurance rolls have stopped rising and will likely shrink. The report, authored by a panel of the Social Security Advisory Board, is important because it finds that many of the factors that caused the disability insurance rolls to rise, particularly during the Great Recession, have ended.

  • Baby-boomers, who added to the rolls as they reached the disability-prone middle age years, are aging out of disability benefits and into retirement benefits. 

  • The decades-long flood of women into the labor force increased the pool of people with the work histories needed to be eligible for disability insurance. But women’s labor force participation has fallen a bit from pre-Great Recession peaks, and is not expected to rise materially again. 

  • The Great Recession, which led many who lost jobs and couldn’t find work to apply for disability insurance, is over and applications are down. A recession as large as that of 2008 is improbable any time soon. 

  • Approval rates by administrative law judges, who for many years were suspected of being too ready to approve applications, have been falling. Whatever the cause, this stringency augurs a fall in the disability insurance rolls.

Nonetheless, the Disability Insurance program is not without serious flaws. At the front end, employers, who might help workers with emerging impairments remain on the job by providing therapy or training, have little incentive to do either. Employers often save money if workers leave and apply for benefits. Creating a financial incentive to encourage employers to help workers stay active is something both liberals and conservatives can and should embrace. Unfortunately, figuring out exactly how to do that remains elusive.

At the next stage, applicants who are initially denied benefits confront intolerable delays. They must wait an average of nearly two years to have their cases finally decided and many wait far longer. For the nearly 1 million people now in this situation, the effects can be devastating. As long as their application is pending, applicants risk immediate rejection if they engage in ‘substantial gainful activity,’ which is defined as earning more than $1,090 in any month. This virtual bar on work brings a heightened risk of utter destitution. Work skills erode and the chance of ever reentering the workforce all but vanishes. Speeding eligibility determination is vital but just how to do so is also enormously controversial.

For workers judged eligible for benefits, numerous provisions intended to encourage work are not working. People have advanced ideas on how to help workers regain marketplace skills and to make it worthwhile for them to return to work. But evidence that they will work is scant.

The problems are clear enough. As noted, solutions are not. Analysts have come up with a large number of proposed changes in the program. Two task forces, one organized by The Bipartisan Policy Center and one by the Committee for a Responsible Federal Budget, have come up with lengthy menus of possible modifications to the current program. Many have theoretical appeal. None has been sufficiently tested to allow evidence-based predictions on how they would work in practice.

So, with the need to do something to sustain benefits and to do it fast, Congress confronts a program with many problems for which a wide range of untested solutions have been proposed. Studies and pilots of some of these ideas are essential and should accompany the transfer of payroll tax revenues necessary to prevent a sudden and unjustified cut in benefits for millions of impaired people who currently have little chance of returning to work. Implementing such a research program now will enable Congress to improve a program that is vital, but that is acknowledged to have serious problems.

And the good news, delivered by a group of analysts, is that rapid growth of enrollments will not break the bank before such studies can be carried out.



Editor's Note: This post originally appeared in Fortune Magazine.

Publication: Fortune Magazine

Can taxing the rich reduce inequality? You bet it can!


Two recently posted papers by Brookings colleagues purport to show that “even a large increase in the top marginal rate would barely reduce inequality.”[1]  This conclusion, based on one commonly used measure of inequality, is an incomplete and misleading answer to the question posed: would a stand-alone increase in the top income tax bracket materially reduce inequality?  More importantly, it is the wrong question to pose, as a stand-alone increase in the top bracket rate would be bad tax policy that would exacerbate tax avoidance incentives.  Sensible tax policy would package that change with at least one other tax modification, and such a package would have an even more striking effect on income inequality.  In brief:

    • A stand-alone increase in the top tax bracket would be bad tax policy, but it would meaningfully increase the degree to which the tax system reduces economic inequality.  It would have this effect even though it would fall on just ½ of 1 percent of all taxpayers and barely half of their income.
    • Tax policy significantly reduces inequality.  But transfer payments and other spending reduce it far more.  In combination, taxes and public spending materially offset the inequality generated by market income.
    • The revenue from a well-crafted increase in taxes on upper-income Americans, dedicated to a prudent expansion of public spending, would go far to counter the powerful forces that have made income inequality more extreme in the United States than in any other major developed economy.

[1] The quotation is from Peter R. Orszag, “Education and Taxes Can’t Reduce Inequality,” Bloomberg View, September 28, 2015 (at http://bv.ms/1KPJXtx). The two papers are William G. Gale, Melissa S. Kearney, and Peter R. Orszag, “Would a significant increase in the top income tax rate substantially alter income inequality?” September 28, 2015 (at http://brook.gs/1KK40IX) and “Raising the top tax rate would not do much to reduce overall income inequality–additional observations,” October 12, 2015 (at http://brook.gs/1WfXR2G). 


Image Source: © Jonathan Ernst / Reuters
     
 
 





The impossible (pipe) dream—single-payer health reform


Led by presidential candidate Bernie Sanders, one-time supporters of ‘single-payer’ health reform are rekindling their romance with a health reform idea that was, is, and will remain a dream.  Single-payer health reform is a dream because, as the old joke goes, ‘you can’t get there from here.’

Let’s be clear: opposing a proposal only because one believes it cannot be passed is usually a dodge. One should judge the merits. Strong leaders prove their skill by persuading people to embrace their visions. But single-payer is different. It is radical in a way that no legislation has ever been in the United States.

Not so, you may be thinking. Remember such transformative laws as the Social Security Act, Medicare, the Homestead Act, and the Interstate Highway Act. And, yes, remember the Affordable Care Act. Those and many other inspired legislative acts seemed revolutionary enough at the time. But none really was. None overturned entrenched and valued contractual and legislative arrangements. None reshuffled trillions—or in less inflated days, billions—of dollars devoted to the same general purpose as the new legislation. All either extended services previously available to only a few, or created wholly new arrangements.

To understand the difference between those past achievements and the idea of replacing current health insurance arrangements with a single-payer system, compare the Affordable Care Act with Sanders’ single-payer proposal.

Criticized by some for alleged radicalism, the ACA is actually stunningly incremental. Most of the ACA’s expanded coverage comes through extension of Medicaid, an existing public program that serves more than 60 million people. The rest comes through purchase of private insurance in “exchanges,” which embody the conservative ideal of a market that promotes competition among private vendors, or through regulations that extended the ability of adult offspring to remain covered under parental plans. The ACA minimally altered insurance coverage for the 170 million people covered through employment-based health insurance. The ACA added a few small benefits to Medicare but left it otherwise untouched. It left unaltered the tax breaks that support group insurance coverage for most working age Americans and their families. It also left alone the military health programs serving 14 million people. Private nonprofit and for-profit hospitals, other vendors, and privately employed professionals continue to deliver most care.

In contrast, Senator Sanders’ plan, like the earlier proposal sponsored by Representative John Conyers (D-Michigan) which Sanders co-sponsored, would scrap all of those arrangements. Instead, people would simply go to the medical care provider of their choice and bills would be paid from a national trust fund. That sounds simple and attractive, but it raises vexatious questions.

  • How much would it cost the federal government? Where would the money to cover the costs come from?
  • What would happen to the $700 billion that employers now spend on health insurance?
  • Where would the $600 billion a year in reductions in total health spending that Sanders says his plan would generate come from?
  • What would happen to special facilities for veterans and families of members of the armed services?

Sanders has answers for some of these questions, but not for others. Both the answers and non-answers show why single payer is unlike past major social legislation.

The answer to the question of how much single payer would cost the federal government is simple: $4.1 trillion a year, or $1.4 trillion more than the federal government now spends on programs that the Sanders plan would replace. The money would come from new taxes. Half the added revenue would come from doubling the payroll tax that employers now pay for Social Security. This tax approximates what employers now collectively spend on health insurance for their employees...if they provide health insurance. But many don’t. Some employers would face large tax increases. Others would reap windfall gains.
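
The figures in the paragraph above can be cross-checked with a few lines of arithmetic. This is only a rough sketch: it assumes that the added revenue equals the $1.4 trillion in added cost, and the variable names are illustrative rather than drawn from any official estimate.

```python
# Figures quoted in the text, in trillions of dollars per year.
single_payer_cost = 4.1    # projected federal cost under the Sanders plan
added_cost = 1.4           # increase over current federal spending on replaced programs
employer_spending = 0.7    # what employers now spend on health insurance ($700 billion)

# Implied current federal spending on the programs the plan would replace.
current_federal_spending = single_payer_cost - added_cost

# Assumption: added revenue equals added cost, and half of it comes from
# doubling the employer payroll tax.
payroll_tax_revenue = added_cost / 2

print(f"Implied current federal spending: ${current_federal_spending:.1f} trillion")
print(f"Revenue from doubled payroll tax: ${payroll_tax_revenue:.1f} trillion")
print(f"Employer health spending today:   ${employer_spending:.1f} trillion")
```

Under that assumption, the new payroll tax revenue matches, in aggregate, what employers now spend on health insurance. The aggregate match says nothing about individual employers, which is where the large tax increases and windfall gains arise.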

The cost question is particularly knotty, as Sanders assumes a 20 percent cut in spending averaged over ten years, even as roughly 30 million currently uninsured people would gain coverage. Those savings, even if actually realized, would start slowly, which means cuts of 30 percent or more by Year 10. Where would they come from? Savings from reduced red-tape associated with individual insurance would cover a small fraction of this target. The major source would have to be fewer services or reduced prices. Who would determine which of the services physicians regard as desirable -- and patients have come to expect -- are no longer ‘needed’? How would those cuts be achieved without the massive hospital bankruptcies that, as columnist Ezra Klein has suggested, would follow such spending cuts? What would be the reaction to the prospect of drastic cuts in salaries of health care personnel? Would we have a shortage of doctors and nurses? Would patients tolerate a reduction in services? If people thought that services under the Sanders plan were inadequate, would they be allowed to ‘top up’ with private insurance? If so, what happens to simplicity? If not, why not?
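
To see why a 20 percent average cut implies a much deeper cut by Year 10, consider a simple illustration. It assumes, purely for the sake of the example, that the cuts grow in equal steps each year; the actual phase-in path is not specified here, and the function name is hypothetical.

```python
def year_ten_cut(average_cut: float, years: int = 10) -> float:
    """Final-year cut if cuts grow linearly from year 1 and average `average_cut`."""
    # Linear ramp: the cut in year t is k * t. The mean of t over 1..years
    # is (years + 1) / 2, so k = average_cut / ((years + 1) / 2).
    k = average_cut / ((years + 1) / 2)
    return k * years

# Year-by-year path for a 20 percent average cut over ten years.
final_cut = year_ten_cut(0.20)
path = [final_cut / 10 * t for t in range(1, 11)]

print(f"Year 1 cut:  {path[0]:.1%}")
print(f"Year 10 cut: {path[-1]:.1%}")
print(f"Average cut: {sum(path) / len(path):.1%}")
```

On this assumed path the Year 10 cut comes to roughly 36 percent. A slower start would push the final-year figure higher still.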

Let me be clear: we know that high quality health care can be delivered at much lower cost than is the U.S. norm. We know because other countries do it. In fact, some of them have plans not unlike the one Senator Sanders is proposing. We know that single-payer mechanisms work in some countries. But those systems evolved over decades, based on gradual and incremental change from what existed before. That is the way that public policy is made in democracies. Radical change may occur after a catastrophic economic collapse or a major war. But in normal times, democracies do not tolerate radical discontinuity. If you doubt me, consider the tumult precipitated by the really quite conservative Affordable Care Act.


Editor's note: This piece originally appeared in Newsweek.


Publication: Newsweek
Image Source: © Jim Young / Reuters
      
 
 





What America’s retirees really deserve


Social Security faces a financial shortfall. If Congress does nothing about it, current projections indicate that benefits will be cut automatically by 21 percent in 2034. Congress could close the gap by raising revenues, lowering benefits, or doing some of both. If benefits seem generous, Congress is likely to lean toward benefit cuts more than revenue increases. If they seem stingy, then the reverse.

Given the split between the two parties on whether to cut benefits or to raise them, evidence on the adequacy of benefits is central to this key policy debate. Perceptions of adequacy will help determine whether Social Security continues to provide basic retirement income for workers with comparatively low earnings histories and a foundation of retirement income for most others, or becomes just a minimal safety-net backstop against extreme destitution.

Down-in-the-weeds disagreements among analysts often seem too arcane for anyone other than specialists. But sometimes they are too important to ignore. A current debate about the adequacy of Social Security benefits is an example.

The not-so-simple question is this: are Social Security benefits ‘generous’ or ‘stingy’? To answer this question, people long looked to the Office of the Social Security Actuary. For many years that office published estimates of something called the ‘replacement rate,’ that is, how high benefits paid to retirees and the disabled are relative to what they earned during their working years. A 2014 retiree with median earnings had average lifetime earnings of about $46,000. That worker qualified for a benefit at age 66 of about $19,000, a replacement rate of about 41%. Replacement rates vary with earnings. Dollar benefits rise with earnings, but they rise less than proportionately. As a result, replacement rates of low earners are higher than replacement rates of high earners.

As you might suppose, there are many ways in which to compute such ‘replacement rates.’ Because of analytical disputes on which method is best, the Social Security trustees in 2014 decided to stop including replacement rate estimates in their annual reports.

In December 2015, the Congressional Budget Office (CBO) offered what it considered a better measure of the generosity of Social Security. It estimated that replacement rates for middle income recipients were about 60%, dramatically higher than the 41% that the Social Security Trustees had estimated.

The gap between the estimates of CBO and those of Social Security is even larger than it seems. To see why, one needs to recognize that to sustain living standards retirees on average need only about 75% to 80% as much income as they did when working. Retirees need less income because they are spared some work-related expenses, such as transportation to and from work. Those are only averages, of course; some need more, some less.

If one believed the SSA actuaries, Social Security provides median earners barely more than half of what they need to be as well off as they were when working. Benefit cuts from that modest level would threaten the well-being of the majority of retirees who are entirely or mostly dependent on Social Security benefits, and especially of those with large medical expenses not covered by Medicare.

On the other hand, if one accepted CBO’s estimates, Social Security provides more than three-quarters of the retirement income target. Against that baseline, benefit cuts would still sting, but they would pose less of a threat, and not much of a threat at all for most retirees who have some income from private pensions or personal savings.
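
The comparison in the last few paragraphs is simple arithmetic, and it can be sketched as follows. The dollar amounts and percentages are the ones quoted above; the variable and function names are illustrative only.

```python
median_lifetime_earnings = 46_000   # 2014 retiree with median earnings
benefit_at_66 = 19_000              # that worker's benefit at age 66

# Replacement rate as the actuaries computed it: benefit relative to earnings.
ssa_rate = benefit_at_66 / median_lifetime_earnings
cbo_rate = 0.60                     # CBO's December 2015 estimate

# Retirees need roughly 75% to 80% of working income to sustain living standards.
target_low, target_high = 0.75, 0.80

def share_of_target(rate: float) -> tuple[float, float]:
    """Share of the income target met, against the 80% and the 75% targets."""
    return rate / target_high, rate / target_low

ssa_low, ssa_high = share_of_target(ssa_rate)
cbo_low, cbo_high = share_of_target(cbo_rate)

print(f"SSA replacement rate: {ssa_rate:.0%}")
print(f"SSA share of target:  {ssa_low:.0%} to {ssa_high:.0%}")
print(f"CBO share of target:  {cbo_low:.0%} to {cbo_high:.0%}")
```

On the actuaries’ figure, benefits meet a little over half of the target; on CBO’s figure, they meet three-quarters or more.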

When the CBO estimates came out, conservative commentators welcomed the findings and cited CBO’s well-established and well-earned reputation for objectivity. They correctly noted that many retirees have additional income from private pensions, 401ks, or other personal savings, and asserted that there was no general retirement income shortage. By inference, cutting benefits a bit to help close the long-term funding gap would be no big deal. Social Security advocates were put on the defensive, hard-pressed to challenge the estimates of the widely-respected Congressional Budget Office.

But earlier this year, CBO acknowledged that it had made mistakes in its December estimates and revised them. The new CBO estimate put the replacement rate for middle-level earners at around 42%, almost the same as the estimate of the Social Security actuaries, not the much higher level that had sent ripples through the policy community. One conservative analyst, Andrew Biggs, who had trumpeted the initial CBO finding in The Wall Street Journal, promptly and honorably retracted his article.

Two aspects of this green-eyeshade kerfuffle stand out. The first is that policy debates often depend on obscure technical analyses that are, in turn, remarkably sensitive to ‘black-box’ methods to which few or no outsiders have ready access. The second is that CBO burnished its reputation for honesty by owning up to its own mistakes — in this case, a whopping overestimate of a key number. Such candor is all too rare; it merits notice and praise.

But there is a broader lesson as well. Technical issues of comparable complexity surround numerous current political disputes. Is Bernie Sanders’ single-payer plan affordable? Will Marco Rubio’s tax plan cause deficits to balloon? To vote rationally, people must struggle to see through the rhetorical chaff that surrounds candidates’ favorite claims. There is, alas, no substitute for paying close attention to the data, even if they are ‘down in the weeds.’


Editor's note: This piece originally appeared in Fortune.


Publication: Fortune
Image Source: Ho New
      
 
 





How to fix the backlog of disability claims


The American people deserve to have a federal government that is both responsive and effective. That simply isn’t the case for more than 1 million people who are awaiting the adjudication of their applications for disability benefits from the Social Security Administration.

Washington can and must do better. This gridlock harms applicants either by depriving them of much-needed support or effectively barring them from work while their cases are resolved because having any significant earnings would immediately render them ineligible. This is unacceptable.

Within the next month, the Government Accountability Office, the nonpartisan congressional watchdog, will launch a study on the issue. More policymakers should follow GAO’s lead. A solution to this problem is long overdue. Here’s how the government can do it.

Congress does not need to look far for an example of how to reduce the SSA backlog. In 2013, the Veterans Administration cut its 600,000-case backlog by 84 percent and reduced waiting times by nearly two-thirds, all within two years. It’s an impressive result.

Why have federal officials dealt aggressively and effectively with that backlog, but not the one at SSA? One obvious answer is that the American people and their representatives recognize a debt to those who served in the armed forces. Allowing veterans to languish while a sluggish bureaucracy dithers is unconscionable. Public and congressional outrage helped light a fire under the bureaucracy. Administrators improved services the old-fashioned way — more staff time. VA employees had to work at least 20 hours overtime per month.

Things are a bit more complicated at SSA, unfortunately. Roughly three quarters of applicants for disability benefits have their cases decided within about nine months and, if denied, decide not to appeal. But those whose applications are denied are legally entitled to ask for a hearing before an administrative law judge — and that is where the real bottleneck begins.

There are too few ALJs to hear the cases. Even in the best of times, maintaining an adequate cadre of ALJs is difficult because normal attrition means that SSA has to hire at least 100 ALJs a year to stay even. When unemployment increases, however, so does the number of applications for disability benefits. After exhausting unemployment benefits, people who believe they are impaired often turn to the disability programs. So, when the Great Recession hit, SSA knew it had to hire many more ALJs. It tried to do so, but SSA cannot act without the help of the Office of Personnel Management, which must provide lists of qualified candidates before agencies can hire them. SSA employs 85 percent of all ALJs and for several years has paid OPM approximately $2 million annually to administer the requisite tests and interviews to establish a register of qualified candidates. Nonetheless, OPM has persistently refused to employ legally trained people to vet ALJ candidates or to update registers. And when SSA sought to ramp up ALJ hiring to cope with the recession challenge, OPM was slow to respond.

In 2009, for example, OPM promised to supply a new register containing names of ALJ candidates. Five years passed before it actually delivered the new list of names. For a time, the number of ALJs deciding cases actually fell. The situation got so bad that the president’s January 2015 budget created a work group headed by the Office of Management and Budget and the Administrative Conference of the United States to try to break the logjam. OPM promised a list for 2015, but insisted it could not change procedures. Not trusting OPM to mend its ways, Congress in October 2015 enacted legislation that explicitly required OPM to administer a new round of tests within the succeeding six months.

These stopgap measures are inadequate to the challenge. Both applicants and taxpayers deserve prompt adjudication of the merits of claims. The million-person backlog and the two-year average waits are bad enough. Many applicants wait far longer. Meanwhile, they are strongly discouraged from working, as anything more than minimal earnings will cause their applications automatically to be denied. Throughout this waiting period, applicants have no means of self-support. Any skills applicants retain atrophy.

The shortage of ALJs is not the only problem. The quality and consistency of adjudication by some ALJs has been called into question. For example, differences in approval rates are so large that differences among applicants cannot plausibly explain them. Some ALJs have processed so many cases that they could not possibly have applied proper standards. In recognition of both problems, SSA has increased oversight and beefed up training. The numbers have improved. But large and troubling variations in workloads and approval rates persist.

For now, political polarization blocks agreement on whether and how to modify eligibility rules and improve incentives to encourage work by those able to work. But there is bipartisan agreement that dragging out the application process benefits no one. While completely eliminating hearing delays is impossible, adequate administrative funding and more, better trained hearing officers would help reduce them. Even if OPM’s past record were better than it is, OPM is now a beleaguered agency, struggling to cope with the fallout from a security breach that jeopardizes the security of the nation and the privacy of millions of current and past federal employees and federal contractors. Mending this breach and establishing new procedures will — and should — be OPM’s top priority.

That’s why, for the sake of everyone concerned, responsibility for screening candidates for administrative law judge positions should be moved, at least temporarily, to another agency, such as the Administrative Conference of the United States. Shortening the period that applicants for disability benefits now spend waiting for a final answer is an achievable goal that can and should be addressed. Our nation’s disabled and its taxpayers deserve better.


Editor's note: This piece originally appeared in Politico.


Publication: Politico
      
 
 





Recent Social Security blogs—some corrections


Recently, Brookings has posted two articles commenting on proposals to raise the full retirement age for Social Security retirement benefits from 67 to 70. One revealed a fundamental misunderstanding of how the program actually works and what the effects of the policy change would be. The other proposes changes to the system that would subvert the fundamental purpose of Social Security in the name of ‘reforming’ it.

A number of Republican presidential candidates and others have proposed raising the full retirement age. In a recent blog, Robert Shapiro, a Democrat, opposed this move, a position I applaud. But he did so based on alleged effects the proposal would in fact not have, and on a misunderstanding of how the program actually works. In another blog, Stuart Butler, a conservative, noted correctly that increasing the full benefit age would ‘bolster the system’s finances,’ but misunderstood this proposal’s effects. He proposed instead to end Social Security as a universal pension based on past earnings and to replace it with income-related welfare for the elderly and disabled (which he calls insurance).

Let’s start with the misunderstandings common to both authors and to many others. Each writes as if raising the ‘full retirement age’ from 67 to 70 would fall more heavily on those with comparatively low incomes and short life expectancies. In fact, raising the ‘full retirement age’ would cut Social Security Old-Age Insurance benefits by the same proportion for rich and poor alike, and for people whose life expectancies are long or short. To see why, one needs to understand how Social Security works and what ‘raising the full retirement age’ means.

People may claim Social Security retirement benefits starting at age 62. If they wait, they get larger benefits—about 6-8 percent more for each year they delay claiming up to age 70. Those who don’t claim their benefits until age 70 qualify for benefits 77 percent higher than those with the same earnings history who claim at age 62. The increments approximately compensate the average person for waiting, so that the lifetime value of benefits is independent of the age at which they claim. Mechanically, the computation pivots on the benefit payable at the ‘full retirement age,’ now age 66, but set to increase to age 67 under current law. Raising the full retirement age still more, from 67 to 70, would mean that people age 70 would get the same benefit payable under current law at age 67. That is a benefit cut of 24 percent. Because the annual percentage adjustment for waiting to claim would be unchanged, people who claim benefits at any age, down to age 62, would also receive benefits reduced by 24 percent.
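
The 77 percent figure can be reproduced from the adjustment schedule. The sketch below assumes a full retirement age of 67 and the adjustment rates as I understand current law: a reduction of 5/9 of 1 percent per month for the first 36 months of early claiming, 5/12 of 1 percent for each additional month, and a credit of 2/3 of 1 percent per month of delay. Those parameters, and the function name, are assumptions for illustration and are not taken from the text above.

```python
from fractions import Fraction

FULL_RETIREMENT_AGE = 67  # assumed; the age scheduled under current law

def benefit_factor(claim_age: int, fra: int = FULL_RETIREMENT_AGE) -> Fraction:
    """Benefit at `claim_age` (62 to 70) as a fraction of the full-retirement-age benefit."""
    months = (claim_age - fra) * 12
    if months >= 0:
        # Delayed retirement credit: 2/3 of 1 percent per month (8 percent a year).
        return 1 + Fraction(2, 300) * months
    early = -months
    # Early reduction: 5/9 of 1 percent for each of the first 36 months,
    # then 5/12 of 1 percent for each additional month.
    first = min(early, 36)
    rest = early - first
    return 1 - Fraction(5, 900) * first - Fraction(5, 1200) * rest

at_62 = benefit_factor(62)
at_70 = benefit_factor(70)
gain = at_70 / at_62 - 1

print(f"Claim at 62: {float(at_62):.0%} of the full benefit")
print(f"Claim at 70: {float(at_70):.0%} of the full benefit")
print(f"Gain from waiting until 70: {float(gain):.0%}")
```

Exact fractions are used so that the monthly rates add up without rounding error.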

In plain English, ‘raising the full benefit age from 67 to 70’ is simply a 24 percent across-the-board cut in benefits for all new claimants, whatever their incomes and whatever their life-expectancies.

Thus, Robert Shapiro mistakenly writes that boosting the full-benefit age would ‘effectively nullify Social Security for millions of Americans’ with comparatively low life expectancies. It wouldn’t. Anyone who wanted to claim benefits at age 62 still could. Their benefits would be reduced. But so would benefits of people who retire at older ages.

Equally mistaken is Stuart Butler’s comment that increasing the full-benefit age from 67 to 70 would ‘cut total lifetime retirement benefits proportionately more for those on the bottom rungs of the income ladder.’ It wouldn’t. The cut would be proportionately the same for everyone, regardless of past earnings or life expectancy.

Both Shapiro and Butler, along with many others including my other colleagues Barry Bosworth and Gary Burtless, have noted correctly that life expectancies of high earners have risen considerably, while those of low earners have risen little or not at all. As a result, the lifetime value of Social Security Old-Age Insurance benefits has grown more for high- than for low-earners. That development has been at least partly offset by trends in Social Security Disability Insurance, which goes disproportionately to those with comparatively low earnings and life expectancies and which has been growing far faster than Old-Age Insurance, the largest component of Social Security.

But even if the lifetime value of all Social Security benefits has risen faster for high earners than for low earners, an across the board cut in benefits does nothing to offset that trend. In the name of lowering overall Social Security spending, it would cut benefits by the same proportion for those whose life expectancies have risen not at all because the life expectancy of others has risen. Such ‘evenhandedness’ calls to mind Anatole France’s comment that French law ‘in its majestic equality, ...forbids rich and poor alike to sleep under bridges, beg in streets, or steal loaves of bread.’

Faulty analyses, such as those of Shapiro and Butler, cannot conceal a genuine challenge to policy makers. Social Security does face a projected, long-term funding shortfall. Trends in life expectancies may well have made the system less progressive overall than it was in the past. What should be done?

For starters, one needs to recognize that for those in successive age cohorts who retire at any given age, rising life expectancy does not lower, but rather increases their need for Social Security retirement benefits because whatever personal savings they may have accumulated gets stretched more thinly to cover more retirement years.

For those who remain healthy, the best response to rising longevity may be to retire later. Later retirement means more time to save and fewer years to depend on savings. Here is where the wrong-headedness of Butler’s proposal, to phase down benefits for those with current incomes of $25,000 or more and eliminate them for those with incomes over $100,000, becomes apparent. The only source of income for full retirees is personal savings and, to an ever diminishing degree, employer-financed pensions. Converting Social Security from a program whose benefits are based on past earnings to one that is based on current income from savings would impose a tax-like penalty on such savings, just as would a direct tax on those savings. Conservatives and liberals alike should understand that taxing something is not the way to encourage it.

Still, working longer by definition lowers retirement income needs. That is why some analysts have proposed raising the age at which retirement benefits may first be claimed from age 62 to some later age. But this proposal, like across-the-board benefit cuts, falls alike on those who can work longer without undue hardship and on those in physically demanding jobs they can no longer perform, those whose abilities are reduced, and those who have low life expectancies. This group includes not only blue-collar workers, but also many white-collar employees, as indicated by a recent study of the Boston College Retirement Center. If entitlement to Social Security retirement benefits is delayed, it is incumbent on policymakers to link that change to other ‘backstop’ policies that protect those for whom continued work poses a serious burden. It is also incumbent on private employers to design ways to make workplaces friendlier to an aging workforce.

The challenge of adjusting Social Security in the face of unevenly distributed increases in longevity, growing income inequality, and the prospective shortfall in Social Security financing is real. The issues are difficult. But solutions are unlikely to emerge from confusion about the way Social Security operates and the actual effects of proposed changes to the program. And it will not be advanced by proposals that would bring to Social Security the failed Vietnam War strategy of destroying a village in order to save it.


Image Source: © Sam Mircovich / Reuters
      
 
 





Disability insurance: The Way Forward


Editor’s note: The remarks below were delivered to the Committee for a Responsible Federal Budget on release of their report on the SSDI Solutions Initiative.

I want to thank Marc Goldwein for inviting me to join you for today’s event. We all owe thanks to Jim McCrery and Earl Pomeroy for devoting themselves to the SSDI Solutions Initiative, to the staff of CRFB who backed them up, and most of all to the scholars and practitioners who wrote the many papers that comprise this effort. This is the sort of practical, problem-solving enterprise that this town needs more of. So, to all involved in this effort, ‘hats off’ and ‘please, don’t stop now.’

The challenge of improving how public policy helps people with disabilities seemed urgent last year. Depletion of the Social Security Disability Insurance trust fund loomed. Fears of exploding DI benefit rolls were widespread and intense.

Congress has now taken steps that delay projected depletion until 2022. Meticulous work by Jeffrey Liebman suggests that Disability Insurance rolls have peaked and will start falling. The Technical Panel appointed by the Social Security Advisory Board concurred in its 2015 report. With such ‘good’ news, it is all too easy to let attention drift to other seemingly more pressing items.

But trust fund depletion and growing beneficiary rolls are not the most important reasons why policymakers should be focusing on these programs.

The primary reason is that the design and administration of disability programs can be improved with benefit to taxpayers and to people with disabilities alike. And while 2022 seems a long time off, doing the research called for in the SSDI Solutions Initiative will take all of that time and more. So, it is time to get to work, not to relax.

Before going any further, I must make a disclaimer. I was invited to talk here as chair of the Social Security Advisory Board. Everything I am going to say from now on will reflect only my personal views, not those of the other members or staff of the SSAB except where the Board has spoken as a group. The same disclaimer applies to the trustees, officers, and other staff of the Brookings Institution. Blame me, not them.

Let me start with an analogy. We economists like indices. Years ago, the late Arthur Okun came up with an index to measure how much pain the economy was inflicting on people. It was a simple index, just the sum of inflation and the unemployment rate. Okun called it the ‘misery index.’

I suggest a ‘policy misery index’—a measure of the grief that a policy problem causes us. It is the sum of a problem’s importance and difficulty. Never mind that neither ‘importance’ nor ‘difficulty’ is quantifiable. Designing and administering interventions intended to improve the lives of people with disabilities has to be at or near the top of the policy misery index.

Those who have worked on disability know what I mean. Programs for people with disabilities are hugely important and miserably hard to design and administer well. That would be true even if legislators were writing afresh on a blank legislative sheet. That they must cope with a deeply entrenched program about which analysts disagree and on which many people depend makes the problems many times more challenging.

I’m going to run through some of the reasons why designing and administering benefits for people determined to be disabled is so difficult. Some may be obvious, even banal, to the highly informed group here today. And you will doubtless think of reasons I omit.

First, the concept of disability, in the sense of a diminished capacity to work, has no clear meaning, the SSA definition of disability notwithstanding. We can define impairments. Some are so severe that work or, indeed, any other form of self-support seems impossible. But even among those with severe impairments, some people work for pay, and some don’t.

That doesn’t mean that if someone with a given impairment works, everyone with that same impairment could work if they tried hard enough. It means that physical or mental impairments incompletely identify those for whom work is not a reasonable expectation. The possibility of work depends on the availability of jobs, of services to support work effort, and of a host of personal characteristics, including functional capacities, intelligence, and grit.

That is not how the current disability determination process works. It considers the availability of jobs in the national, not the local, economy. It ignores the availability of work supports or accommodations by potential employers.

Whatever eligibility criteria one may establish for benefits, some people who really can’t work, or can’t earn enough to support themselves, will be denied benefits. And some will be awarded benefits who could work.

Good program design helps keep those numbers down. Good administration helps at least as much as, and maybe more than, program design. But there is no way to reduce the number of improper awards and improper denials to zero.

Second, the causes of disability are many and varied. Again, this observation is obvious, almost banal. Genetic inheritance, accidents and injuries, wear and tear from hard physical labor, and normal aging all create different needs for assistance.

These facts mean that people deemed unable to work have different needs. They constitute distinct interest groups, each seeking support, but not necessarily of the same kind. These groups sometimes compete with each other for always-limited resources. And that competition means that the politics of disability benefits are, shall we say, interesting.

Third, the design of programs to help people deemed unable to work is important and difficult. Moral hazard is endemic. Providing needed support and services is an act of compassion and decency. The goal is to provide such support and services while preserving incentives to work and controlling costs borne by taxpayers.

But preserving work incentives is only part of the challenge. The capacity to work is continuous, not binary. Training and a wide and diverse range of services can help people perform activities of daily living and work.

Because resources are scarce, policy makers and administrators have to sort out who should get those services. Should it be those who are neediest? Those who are most likely to recover full capacities? Triage is inescapable. It is technically difficult. And it is always ethically fraught.

Designing disability benefit programs is hard. But administering them well is just as important and at least as difficult.

These statements may also be obvious to those who are here today. But recent legislation and administrative appropriations raise doubts about whether they are obvious to or accepted by some members of Congress.

Let’s start with program design. We can all agree, I think, that incentives matter. If benefits ceased at the first dollar earned, few who come on the rolls would ever try to work.

So, Congress, for many years, has allowed beneficiaries to earn any amount for a brief period and small amounts indefinitely without losing eligibility. Under current law, there is a benefit cliff. If—after a trial work period—beneficiaries earn even $1 more than what is called substantial gainful activity, $1,130 in 2016, their benefit checks stop. They retain eligibility for health coverage for a while even after they leave the rolls. And for an extended period they may regain cash and health benefits without delay if their earnings decline.

Members of Congress have long been interested in whether a more gradual phase-out of benefits as earnings rise might encourage work. Various aspects of the current Disability Insurance program reflect Congress’s desire to encourage work.

The so-called Benefit Offset National Demonstration, or BOND, was designed to test the effect on DI beneficiaries' labor supply of one formula: replacing the “cliff” with a gradual reduction in benefits, with $1 of benefit lost for each $2 of earnings above the Substantial Gainful Activity level.
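To make the contrast concrete, here is a minimal sketch of the two benefit schedules. It is a simplification, not SSA's actual computation: it ignores the trial work period, the grace period, and any annual accounting, and the $1,200 monthly benefit and the function names are hypothetical, chosen only for illustration.

```python
SGA_2016 = 1130  # monthly substantial gainful activity level cited in the text


def benefit_under_cliff(monthly_benefit, monthly_earnings, threshold=SGA_2016):
    """Full benefit at or below the threshold, nothing once earnings exceed it."""
    return monthly_benefit if monthly_earnings <= threshold else 0.0


def benefit_under_offset(monthly_benefit, monthly_earnings,
                         threshold=SGA_2016, rate=0.5):
    """Benefit falls by `rate` dollars for each dollar earned above the threshold."""
    excess = max(0.0, monthly_earnings - threshold)
    return max(0.0, monthly_benefit - rate * excess)


if __name__ == "__main__":
    hypothetical_benefit = 1200  # assumed monthly benefit, for illustration only
    for earnings in (1000, 1130, 1131, 2130):
        print(
            earnings,
            benefit_under_cliff(hypothetical_benefit, earnings),
            benefit_under_offset(hypothetical_benefit, earnings),
        )
```

Under the cliff, the $1 of earnings that crosses the threshold costs the whole check. Under the 2-for-1 offset, it costs 50 cents.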

Alas, there were problems with that demonstration. It tested only one offset scenario – one starting point and one rate. So, there could be no way of knowing whether a 2-for-1 offset was the best way to encourage work.

And then there was the uncomfortable fact that, at the time of the last evaluation, out of 79,440 study participants only 21 experienced the offset. So there was no way of telling much of anything, other than that few people had worked enough to experience the offset.

Nor was the cause of non-response obvious. It is not clear how many demonstration participants even understood what was on offer.

Unsurprisingly, members of Congress interested in promoting work among DI recipients asked SSA to revisit the issue. The 2015 DI legislation mandates a new demonstration, christened the Promoting Opportunity Demonstration, or POD. POD uses the same 2 for 1 offset rate that BOND did, but the offset starts at an earnings level at or below earnings of $810 a month in 2016—which is well below the earnings at which the BOND phase-out began.

Unfortunately, as Kathleen Romig has pointed out in an excellent paper for the Center on Budget and Policy Priorities, this demonstration is unlikely to yield useful results. Only a very few atypical DI beneficiaries are likely to find it in their interest to participate in the demonstration, fewer even than in the BOND. That is because the POD offset begins at lower earnings than the BOND offset did. In addition, participants in POD sacrifice the right under current law that permits people receiving disability benefits to earn any amount for 9 months of working without losing any benefits.

Furthermore, the 2015 law stipulated that no Disability Insurance beneficiary could be required to participate in the demonstration or, having agreed to participate, forced to remain in the demonstration. Thus, few people are likely to respond to the POD or to remain in it.

There is a small group to whom POD will be very attractive—those few DI recipients who retain a lot of earning capacity. The POD will allow them to retain DI coverage until their earnings are quite high. For example, a person receiving a $2,000 monthly benefit—well above the average, to be sure, but well below the maximum—would remain eligible for some benefits until his or her annual earnings exceeded $57,700. I don’t know about you, but I doubt that Congress would favorably consider permanent law of this sort.
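The arithmetic behind that figure can be sketched as follows. This assumes the offset starts at exactly $810 a month and that benefits fall by $1 for each $2 of earnings above that level; the function name is hypothetical, and the BOND comparison uses the $1,130 threshold cited earlier.

```python
def break_even_annual_earnings(monthly_benefit, offset_start, rate=0.5):
    """Annual earnings at which a benefit reduced by `rate` per dollar
    earned above `offset_start` is fully phased out."""
    monthly_break_even = offset_start + monthly_benefit / rate
    return 12 * monthly_break_even


if __name__ == "__main__":
    # A $2,000 monthly benefit, as in the example in the text.
    print("POD: ", break_even_annual_earnings(2000, 810))   # offset starts at $810
    print("BOND:", break_even_annual_earnings(2000, 1130))  # offset starts at SGA
```

With a $2,000 benefit, the POD benefit reaches zero at $4,810 a month, or $57,720 a year, which is the roughly $57,700 cited above.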

Not only would those participating be a thin and quite unrepresentative sample of DI beneficiaries in general, or even of those with some earning capacity, but selection bias resulting from the opportunity to opt out at any time would destroy the external validity of any statistical results.

Let me be clear. My comments on POD, the demonstration mandated in the 2015 legislation, are not meant to denigrate the need for, or the importance of, research on how to encourage work by DI recipients, especially those for whom financial independence is plausible. On the contrary, as I said at the outset, research is desperately needed on this issue, as well as many others. It is not yet too late to authorize a research design with a better chance of producing useful results.

But it will be too late soon. Fielding demonstrations takes time:

  • to solicit bids from contractors,
  • for contractors to formulate bids,
  • for government boards to select the best one,
  • for contractors to enroll participants,
  • for contractors to administer the demonstration,
  • and for analysts to process the data generated by the demonstrations.

That process will take all the time available between now and 2021 or 2022 when the DI trust fund will again demand attention. It will take a good deal more time than that to address the formidable and intriguing research agenda of the SSDI Solutions Initiative.

I should like to conclude with plugs for two initiatives to which the Social Security Advisory Board has been giving some attention.

It takes too long for disability insurance applicants to have their cases decided. Perhaps the whole determination process should be redesigned. One of the CRFB papers proposes just that. But until that happens, it is vital to shorten the unconscionable delays separating initial denials and reconsideration from hearings before administrative law judges to which applicants are legally entitled. Procedural reforms in the hearing process might help. More ALJs surely will.

The 2015 budget act requires the Office of Personnel Management to take steps that will help increase the number of ALJs hired. I believe that the new director, Beth Cobert, is committed to reforms. But it is very hard to change legal interpretations that have hampered hiring for years and the sluggish bureaucratic culture that fostered them.

So, the jury is out on whether OPM can deliver. In a recent op-ed in Politico, Lanhee Chen, a Republican member of the SSAB, and I jointly urged Congress to be ready, if OPM fails to deliver on more and better lists of ALJ candidates and streamlined procedures for their appointment, to move the ALJ examination authority to another federal organization, such as the Administrative Conference of the United States.

Lastly, there is a facet of income support policy that we on the SSAB all agree merits much more attention than it has received. Just last month, the SSAB released a paper entitled Representative Payees: A Call to Action. More than eight million beneficiaries have been deemed incapable of managing $77 billion in benefits that the Social Security Administration provided them in 2014.

We believe that serious concern is warranted about all aspects of the representative payee program—how this infringement of personal autonomy is found to be necessary, how payees are selected, and how payee performance is monitored.

Management of representative payees is a particular challenge for the Social Security Administration. Its primary job is to pay cash benefits in the right amount to the right person at the right time. SSA does that job at rock-bottom costs and with remarkable accuracy. It is handling rapidly rising workloads with budgets that have barely risen. SSA is neither designed nor staffed to provide social services. Yet determining the need for, selecting, and monitoring representative payees is a social service function.

As the Baby Boom ages, the number of people needing help in administering cash benefits from the Social Security Administration—and from other agencies such as the Veterans Administration—will grow. So will the number needing help in making informed choices under Medicare and Medicaid.

The SSAB is determined to look into this challenge and to make constructive suggestions. We are just beginning and invite others to join in studying what I have called “the most important problem the public has never heard of.”

Living with disabilities today is markedly different from what it was in 1956 when the Disability Insurance program began. Yet, the DI program has changed little. Beneficiaries and taxpayers are paying heavily for the failure of public policy to apply what has been learned over the past six decades about health, disability, function, and work.

I hope that SSA and Congress will use well the time until it next must legislate on Disability Insurance. The DI rolls are stabilizing. The economy has grown steadily since the Great Recession. Congress has reinstated demonstration authority. With adequate funding for research and testing, the SSA can rebuild its research capability. Along with the external research community, it can identify what works and help Congress improve the DI program for beneficiaries and taxpayers alike. The SSDI Solutions Initiative is a fine roadmap.

Publication: Committee for a Responsible Federal Budget

A tribute to longtime Brookings staff member Kathleen Elliott Yinug

Only days before her retirement at age 71, Kathleen Elliott Yinug succumbed to a recurrence of cancer, which had been in remission for fifteen years. Over a Brookings career spanning four decades, she not only assisted several members of the Brookings community, but also became their valued friend. A woman of intelligence and liberal values, she elicited, demanded, and merited the respect of all with whom she worked.

After college, she joined the Peace Corps and was sent to the island of Yap. There she met her husband to be and there her son, Falan, was born. The family returned to the United States so that her husband could attend law school. Kathleen came to work at Brookings, helping to support her husband's law school training. When he returned to Yap, Kathleen assumed all parental responsibility. Her son has grown into a man of character, a devoted husband and father of two daughters. He and his wife, Louise, with compassion and generosity, made their home Kathleen's refuge during her final illness. Over extended periods, she held second jobs to supplement her Brookings income.

Her personal warmth, openness, and personal integrity made her a natural confidante of senior fellows, staff assistants, and research assistants, alike. She demanded and received respect from all. Her judgment on those who did not meet her standards was blunt and final; on one occasion, she 'fired'—that is, flatly refused to work with—one senior staff member whose behavior and values she rightly deplored.

With retirement approaching, Kathleen bought a condominium in Maine, a place she had come to love after numerous visits with her long-time friend, Lois Rice. After additional visits, her affection for Maine residents and the community she had chosen deepened. She spoke with intense yearning for the post-retirement time when she could take up life in her new home. That she was denied that time is a cruel caprice of life and only deepens the sense of loss of those who knew and loved her.


Iraqi Shia leaders split over loyalty to Iran

       





Not just a typographical change: Why Brookings is capitalizing Black

Brookings is adopting a long-overdue policy to properly recognize the identity of Black Americans and other people of ethnic and indigenous descent in our research and writings. This update comes just as the 1619 Project is re-educating Americans about the foundational role that Black laborers played in making American capitalism and prosperity possible. Without Black…

       





Webinar: COVID-19 and the economy

With more than 1,000 deaths, 3 million and counting unemployed, and no definite end in sight, the coronavirus has upended nearly every aspect of American life. In the last two weeks, the Federal Reserve and Congress scrambled to pass policies to mitigate what will be a very deep recession. Americans across the country are asking—…

       





COVID-19 outbreak highlights critical gaps in school emergency preparedness

The COVID-19 epidemic sweeping the globe has affected millions of students, whose school closures have more often than not caught them, their teachers, and families by surprise. For some, it means missing class altogether, while others are trialing online learning—often facing difficulties with online connections, as well as motivational and psychosocial well-being challenges. These problems…

       





Poll shows American views on Muslims and the Middle East are deeply polarized

A recent public opinion survey conducted by Brookings non-resident senior fellow Shibley Telhami sparked headlines focused on its conclusion that American views of Muslims and Islam have become favorable. However, the survey offered another important finding that is particularly relevant in this political season: evidence that the cleavages between supporters of Hillary Clinton and Donald Trump, respectively, on Muslims, Islam, and the Israeli-Palestinian peace process are much deeper than on most other issues.

      
 
 





Youth unemployment in Egypt: A ticking time bomb

Earlier this week, a satirical Facebook post announced that the Egyptian Army engineers have developed an Egyptian dollar to combat the continued rise of the U.S. dollar. The new and improved $100 note features Egyptian President Abdel-Fattah el-Sissi’s photo instead of Benjamin Franklin’s. Another post shows a video of Karam, a simple man from upper Egypt, revealing his secret […]

      
 
 





A better way to counter violent extremism

      
 
 





Minding the gap: A multi-layered approach to tackling violent extremism

      
 
 





An agenda for reducing poverty and improving opportunity


SUMMARY:
With the U.S. poverty rate stuck at around 15 percent for years, it’s clear that something needs to change, and candidates need to focus on three pillars of economic advancement (education, work, and family) to increase economic mobility, according to Brookings Senior Fellow Isabel Sawhill and Senior Research Assistant Edward Rodrigue.

“Economic success requires people’s initiative, but it also requires us, as a society, to untangle the web of disadvantages that make following the sequence difficult for some Americans. There are no silver bullets. Government cannot do this alone. But government has a role to play in motivating individuals and facilitating their climb up the economic ladder,” they write.

The pillar of work is the most urgent, they assert, with every candidate needing to have concrete jobs proposals. Closing the jobs gap (the difference in work rates between lower and higher income households) has a huge effect on the number of people in poverty, even if the new workers hold low-wage jobs. Work connects people to mainstream institutions, helps them learn new skills, provides structure to their lives, and provides a sense of self-sufficiency and self-respect, while at the aggregate level, it is one of the most important engines of economic growth. Specifically, the authors advocate for making work pay (EITC), a second-earner deduction, childcare assistance and paid leave, and transitional job programs. On the education front, they suggest investment in children at all stages of life: home visiting, early childhood education, new efforts in the primary grades, new kinds of high schools, and fresh policies aimed at helping students from poor families attend and graduate from post-secondary institutions. And for the third prong, stable families, Sawhill and Rodrigue suggest changing social norms around the importance of responsible, two-person parenthood, as well as making the most effective forms of birth control (IUDs and implants) more widely available at no cost to women.

“Many of our proposals would not only improve the life prospects of less advantaged children; they would pay for themselves in higher taxes and less social spending. The candidates may have their own blend of responses, but we need to hear less rhetoric and more substantive proposals from all of them,” they conclude.


Campaign 2016: Ideas for reducing poverty and improving economic mobility


We can be sure that the 2016 presidential candidates, whoever they are, will be in favor of promoting opportunity and cutting poverty. The question is: how? In our contribution to a new volume published today, “Campaign 2016: Eight big issues the presidential candidates should address,” we show that people who clear three hurdles (graduating high school, working full-time, and delaying parenthood until they are in a stable, two-parent family) are very much more likely to climb to the middle class than fall into poverty.

But what specific policies would help people achieve these three benchmarks of success?  Our paper contains a number of ideas that candidates might want to adopt. Here are a few examples: 

1. To improve high school graduation rates, expand “Small Schools of Choice,” a program in New York City, which replaced large, existing schools with more numerous, smaller schools that had a theme or focus (like STEM or the arts). The program increased graduation rates by about 10 percentage points and also led to higher college enrollment with no increase in costs.

2. To support work, make the Child and Dependent Care Tax Credit (CDCTC) refundable and cap it at $100,000 in household income. Because the credit is currently non-refundable, low-income families receive little or no benefit, while those with incomes above $100,000 receive generous tax deductions. This proposal would make the program more equitable and facilitate low-income parents’ labor force participation, at no additional cost.

3. To strengthen families, make the most effective forms of birth control (IUDs and implants) more widely available at no cost to women, along with good counselling and a choice of all FDA-approved methods. Programs that have done this in selected cities and states have reduced unplanned pregnancies, saved money, and given women better ability to delay parenthood until they and their partners are ready to be parents. Delayed childbearing reduces poverty rates and leads to better prospects for the children in these families.

These are just a few examples of good ideas, based on the evidence, of what a candidate might want to propose and implement if elected. Additional ideas and analysis will be found in our longer paper on this topic.


The District’s proposed law shows the wrong way to provide paid leave


The issue of paid leave is heating up in 2016. At least two presidential candidates — Democrat Hillary Clinton and Republican Sen. Marco Rubio (Fla.) — have proposed new federal policies. Several states and large cities have begun providing paid leave to workers when they are ill or have to care for a newborn child or other family member.

This forward movement on paid-leave policy makes sense. The United States is the only advanced country without a paid-leave policy. While some private and public employers already provide paid leave to their workers, the workers least likely to get paid leave are low-wage and low-income workers who need it most. They also cannot afford to take unpaid leave, which the federal government mandates for larger companies.

Paid leave is good for the health and development of children; it supports work, enabling employees to remain attached to the labor force when they must take leave; and it can lower costly worker turnover for employers. Given the economic and social benefits it provides and given that the private market will not generate as much as needed, public policies should ensure that such leave is available to all.

But it is important to do so efficiently, so as not to burden employers with high costs that could lead them to substantially lower wages or create fewer jobs.

States and cities that require employers to provide paid sick days mandate just a small number, usually three to seven days. Family or temporary disability leaves that must be longer are usually financed through small increases in payroll taxes paid by workers and employers, rather than by employer mandates or general revenue.

Policy choices could limit costs while expanding benefits. For instance, states should limit eligibility to workers with experience, such as a year, and it might make sense to increase the benefit with years of accrued service to encourage labor force attachment. Some states provide four to six weeks of family leave, though somewhat larger amounts of time may be warranted, especially for the care of newborns, where three months seems reasonable.

Paid leave need not mean full replacement of existing wages. Replacing two-thirds of weekly earnings up to a set limit is reasonable. The caps and partial wage replacement give workers some incentive to limit their use of paid leave without imposing large financial burdens on those who need it most.

While many states and localities have made sensible choices in these areas, some have not. For instance, the D.C. Council has proposed paid-leave legislation for all but federal workers that violates virtually all of these rules. It would require up to 16 weeks of temporary disability leave and up to 16 weeks of paid family leave; almost all workers would be eligible for coverage, without major experience requirements; and the proposed law would require 100 percent replacement of wages up to $1,000 per week, and 50 percent coverage up to $3,000. It would be financed through a progressive payroll tax on employers only, which would increase to 1 percent for higher-paid employees.
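A rough sketch of the proposed replacement formula shows how generous it is. It rests on an assumed reading of the proposal: 100 percent of the first $1,000 of weekly wages plus 50 percent of wages between $1,000 and $3,000, with nothing replaced above $3,000. The function name is hypothetical.

```python
def dc_proposed_weekly_benefit(weekly_wage):
    """Weekly benefit under an assumed marginal reading of the D.C. proposal:
    100% of the first $1,000 of wages, plus 50% of wages from $1,000 to $3,000."""
    full_tier = min(weekly_wage, 1000)
    half_tier = max(0, min(weekly_wage, 3000) - 1000)
    return full_tier + 0.5 * half_tier


if __name__ == "__main__":
    for wage in (800, 1000, 2000, 3000, 4000):
        benefit = dc_proposed_weekly_benefit(wage)
        print(f"weekly wage ${wage}: benefit ${benefit:.0f} "
              f"({benefit / wage:.0%} of wages)")
```

On this reading, a worker earning $2,000 a week would receive $1,500, a 75 percent replacement rate, and the benefit would top out at $2,000 a week.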

Our analysis suggests that this level of leave would be badly underfunded by the proposed tax, perhaps by as much as two-thirds. Economists believe that payroll taxes on employers are mostly paid through lower worker wages, so the higher taxes needed to fully fund such generous leave would burden workers. The costly policy might cause employers to discriminate against women.

The disruptions and burdens of such lengthy leaves could cause employers to hire fewer workers or shift operations elsewhere over time. This is particularly true here, considering that the D.C. Council already has imposed costly burdens on employers, such as high minimum wages (rising to $11.50 per hour this year), paid sick leave (although smaller amounts than now proposed) and restrictions on screening candidates. The minimum wage in Arlington is $7.25 with no other mandates. Employers will be tempted to move operations across the river or to replace workers with technology wherever possible.

Cities, states and the federal government should provide paid sick and family leave for all workers. But it can and should be done in a fiscally responsible manner that does not place undue burdens on the workers themselves or on their employers.


Editor's note: this piece originally appeared in The Washington Post

Publication: The Washington Post