Note:

Public comment period closed on 10/31/18. We thank all of those who commented on this draft, and will review and respond to every comment submitted during the comment period. A final, revised version of this document will be available in mid-December.





A Grand-Challenges Based Research Agenda for Scholarly Communication and Information Science

Contributors* to this report, listed alphabetically: Micah Altman, Chris Bourg, Philip Cohen, G. Sayeed Choudhury, Charles Henry, Sue Kriegsman, Mary Minow, Daisy Selematsela, Anasuya Sengupta, Peter Suber, Ece Turnator, Suzanne Wallen, Trevor Owens, & David Weinberger.

The workshop and paper are supported by a grant from The Andrew W. Mellon Foundation. Thanks to the Program Committee for scoping and mapping the original framing for the conversations: Micah Altman, Christine Borgman, Chris Bourg, G. Sayeed Choudhury, Charles (Chuck) Henry, Abby Smith Rumsey, and Ethan Zuckerman; to the keynote speakers, whose remarks framed each meeting’s discussions: Kate Zwaard, Anasuya Sengupta, and Joi Ito; and to the external participants and library staff who participated in workshop discussion, listed below.

Abby Smith Rumsey, Alex Chassanoff, Alex Wade, Amy Brand, Anasuya Sengupta, Bethany Nowviskie, Brewster Kahle, Charles Henry, Christine Borgman, Chris Bourg, Clifford Lynch, Daisy Selematsela, David Rosenthal, David Weinberger, Deborah Fitzgerald, Donald Waters, Douglas Armato, Ethan Zuckerman, Heather Yager, Jennifer Hansen, Karrie Peterson, Kate Zwaard, Mary Minow, Melissa Hagemann, Micah Altman, Nancy McGovern, Palagummi Sainath, Patricia Hswe, Peter Suber, Phil Bourne, Philip Cohen, Roger Mark, Safiya Noble, Sayeed Choudhury, Sue Kriegsman, Suzanne Wallen, Trevor Owens.


*  Contributor statement. The authors describe contributions to this paper using a standard taxonomy.1 MA and CB provided the core formulation of the paper’s goals and aims, and MA and SK led the creation of the substantive topic outline. Writing for Sections 1-5 was led by (respectively) MA & SK; AS, CB, MA & SK; DW, MA, MM, GSC, & PS; DW, GSC, MA, MM, PC & PS; and CH, ET, & MA. SW led copyediting. All contributors provided review and commentary. MA and CB led in obtaining funding for this project, and served as PIs.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Table of Contents

 

1 INTRODUCTION
2 TOWARDS A MORE INCLUSIVE, EQUITABLE, AND SUSTAINABLE SCHOLARLY KNOWLEDGE ECOSYSTEM
3 RESEARCH LANDSCAPE
4 TARGETED RESEARCH QUESTIONS
5 INTEGRATING RESEARCH, PRACTICE, AND POLICY
APPENDICES


EXECUTIVE SUMMARY

Editors’ Note: The executive summary will be a 2-5 page standalone section, which can be formatted and distributed independently of the full report. It will include the recommendations in each section, along with appropriate context to understand them -- including significance, scope, central problems, and research areas.


1 INTRODUCTION

1.1 Preface -- Identifying Grand Challenges

A global and multidisciplinary community of stakeholders came together in March 2018 to identify, scope, and prioritize a common vision for specific grand research challenges related to information science and scholarly communications. The participants were both traditional domain researchers and those who are aiming to democratize scholarship. An explicit aim of the summit was to identify research needs related to barriers in the development of scalable, interoperating, socially beneficial, and equitable systems for scholarly information; and to explore the development of non-market approaches to governing the scholarly ecosystem.

To spur discussion and exploration, grand challenge provocations were suggested by participants and framed into one of three sections: scholarly discovery; digital curation and preservation; and open scholarship. A few people participated in all three segments, but most attended discussions around only a single topic. Each track had approximately 20–25 people from different parts of the world -- including the United States, European Union, South Africa, and India. Domain researchers brought perspectives from a range of scientific disciplines, while practitioners brought perspectives from different roles (drawn from commercial, non-profit, and governmental sectors).

During our discussions it quickly became clear that the grand challenges themselves cannot be neatly categorized into discovery, curation and preservation, and open scholarship, or even for that matter limited to information science or librarianship. Several cross-cutting themes emerged that should be at the center of future planning: a strong desire to include underrepresented voices and communities outside of mainstream publishing and academic institutions, a need to identify incentives that will motivate people to make changes in their own approaches and processes toward a more open and trusted framework, and a need to identify collaborators and partners from multiple disciplines in order to build strong programs.2

The discussions were full of energy, insights, and enthusiasm for inclusive participation, and generally concluded with a desire for a global call to action to spark changes that will enable more equitable and open scholarship. Some important and productive tensions surfaced in our discussions, particularly around the best paths forward on the challenges we identified. On many core topics, however, there was widespread agreement among participants -- especially on the urgent need to address the exclusion from participation in knowledge production and access of so many people around the globe, and the troubling over-representation in the scholarly record of white, male, English-language voices. Ultimately, everyone believed that we can and should do better in this space, that we have an obligation to do so, and that our communities can be catalysts for change.

1.2 Organization of this Report

While the spirit and intent of the workshop is present, this report is not intended to be a summary of the March 2018 workshop discussions. Instead, it draws attention to areas where a systematic research agenda and coordinated leadership have the potential to create a broad impact. In doing this, we seek to catalyze the advancement of knowledge management and scholarly communications globally and across disciplines by charting specific challenges -- and by identifying innovative, interdisciplinary, and collaborative research agendas to solve them.

In particular, this report describes a vision for a more inclusive, equitable, sustainable future for scholarship; characterizes the central technical, organizational, and institutional barriers to this future; describes the areas of research needed to advance this future; and identifies several targeted “grand challenge” research problems. These “grand challenges” are fundamental research problems with broad applications, whose solutions are potentially achievable within the next decade.

We conclude the report with recommendations for concrete actions to advance scholarship. We call on academics, funders, knowledge creators, knowledge stewards, and educators to embrace these grand challenges and to ignite changes in their own ecosystems, toward an information science and scholarly communications research agenda that is more open, sustainable, collaborative, and globally inclusive.

2 TOWARDS A MORE INCLUSIVE, EQUITABLE, AND SUSTAINABLE SCHOLARLY KNOWLEDGE ECOSYSTEM

2.1 Vision

Despite the democratizing promise of internet technologies,3 today’s scholarly communications and information sharing environments are plagued by exclusion, inequity, inefficiency, elitism, exorbitant costs, lack of interoperability or sustainability, commercial rather than public interests, opacity rather than transparency, hoarding rather than sharing, and myriad barriers at individual and institutional levels to access and participation. Despite, or perhaps because of, the range of perspectives represented, the summit participants agreed that our common vision was of a global information environment that ensures durable, open, equitable, meaningful global access to knowledge consumption and creation in its many forms.

Such a vision requires the centering of knowledge producing communities around the world into a global network of partnerships where we all work toward a more inclusive, equitable, trustworthy, and sustainable research and learning ecosystem. The vision is to create a powerful infrastructure to support local communities and organizations where people can create, share, evaluate, learn from, and interpret information on both small and large scales without barriers, or fear of lost knowledge, in order to support ongoing scholarship. Achieving this vision will require focusing not only on extant systems and processes of knowledge sharing and production, but also critically evaluating individual and institutional roles and interests that contribute to the current state.

The problems that plague our systems and prevent us from generating and utilizing wide open scholarship are fundamental and embedded in problems of social justice4 that derive not only from the consequences of unequal distribution of knowledge, but also from trust, safety, security, and ‘epistemic’5 injustice (unfairness stemming from the definition of what constitutes knowledge, who is assumed to be knowledgeable, and how knowledge is transmitted). This notion applies to individual people, different forms of knowledge communities and cultures, and the information, objects, and systems that support or challenge them. One could imagine that on one end of the spectrum trust, safety, and security includes how people feel with regard to job security or their role in a community, the reliability of data, unbiased and ethical algorithms, and stable networks. At the other end of the spectrum is data that disappear before they can be saved, networks that are intentionally tampered with to alter an information flow, algorithms that are opaque and therefore mistrusted, and in extreme cases there are people who are concerned for their own personal physical safety because of what they have learned or disseminated.

Solving these problems requires that scholarship be easier to discover, more durable, and access to it more open -- but this is not sufficient. We aim for a scholarly ecosystem that embeds the core values of inclusion, equity, trustworthiness, and sustainability -- in which people can broadly participate in both the creation and definition of scholarship.

Knowledge, how it is shared, and what other people do with it includes a wide continuum of possibilities for an inclusive, equitable, and sustainable ecosystem. We are looking at how our knowledge is learned, conveyed, interpreted, and utilized along the whole research spectrum in order to reach the inclusive, genuine, and reliable research world we can imagine.

2.2 Broadest Impacts

Over the last two hundred and fifty years, there have been unprecedented advancements in the human condition -- encompassing improvements in health, longevity, life-satisfaction, productivity, individual wealth, and the range of choices. These improvements have been enabled in large part by systematic investigations to produce generalized, shared, and durable knowledge -- also known as science and scholarship. (See Stephan 20126 for a discussion of the macro-economic impact of science).

Despite its deep and broad social benefits, science itself remains surprisingly constricted in a number of fundamental aspects:

  1. The benefits of science are unevenly distributed.7

  2. Access to scientific data and scholarly communication, as well as STEM learning materials, has until recently been limited almost exclusively to those inside research or university environments with the ability to pay, and fluency in English.8

  3. Participation in our collective knowledge is limited to a small minority -- the vast majority of research is conducted in elite university settings in developed countries.9

  4. Even in those countries, participation in science is heavily skewed by gender, race, class, and language -- which affect the construction and evaluation of scientific knowledge.10

  5. The evidence base is restricted -- subjects (people), behaviors, languages, forms of knowledge, even ways of knowing are restricted, and the evidence base in many fields is shifting to new sources.11

  6. The algorithms we use to interpret evidence embody unexamined bias.12

Inclusion of people in different communities in the creation, dissemination, and use of scholarship is not only ethically imperative, but can strengthen research and scholarship globally, and increase the impacts of scholarship on the world.

The potential for broader inclusion to increase impact is apparent when one examines recent advances in social science. In the last twenty years, it has become possible to observe large groups of people and their communications in detail and over continuous periods of time -- this has led to the creation of some of the largest publicly accessible collections of information about humans in history.13 And this has resulted in changes in the methods, evidence base, pace, and impact of many disciplines in the social sciences -- yielding new insights and challenging previous coarse categorizations of people and their characteristics.14

However, despite this vast broadening in the evidence base, our current sources of information about people are heavily skewed to online behavior of industrialized western populations. And the current systems of governance for that information raise questions of privacy, intellectual freedom, and agency -- creating new opportunities to manipulate people for both profit and power.15 The social sciences have much to gain from a globally inclusive system of evidence and knowledge; and society has much to gain from value-driven governance of such a system.

There are shifts in the evidence base of public health and medicine that parallel the shifts in social science, and offer analogous promise and perils.16 We have only started to tap increasing gains from “citizen-science”17 in the STEM fields. Inclusion is not just important but urgent -- there are big problems in the world (climate change, refugees, etc.), and reengineering the knowledge ecosystem will affect people’s lives and, quite literally, the fate of the world.

2.3 Recommendations for Broad Impact

In order to promote the broadest impacts of research in this area, in service to the vision of a more inclusive, equitable, and sustainable system of scholarship, we make the following recommendations:

●      Recommendation 2-A: We recommend that researchers consider the broadest possible impact of their work -- and how that work could be used to improve the inclusiveness and equity of the scholarly knowledge ecosystem.

●      Recommendation 2-B: We recommend that research funders include consideration of impact on the research ecosystem in their criteria for programs, and that they specifically identify the extent to which proposed work could increase equity and inclusion.

3 RESEARCH LANDSCAPE

3.1 Challenges, Threats, and Barriers

The information science and scholarly communication research community should aim for a future which offers people across the world true opportunities to discover, access, and create scholarly knowledge; in which people have agency over knowledge about them and in interactions with knowledge systems; and in which scientific evidence and scholarship are abundant, enduring, transparent, and trustworthy. As we work towards this future, we must ensure that the infrastructures, policies, collaborations and practices we adopt and support are informed by evidence and grounded in research-based decisions.

3.1.1 Challenges to Participation in the Research Community

Most of the current scholarly ecosystem contains information produced and controlled by a small part of the world’s population.18 Scholarly outputs are similarly limited. Most discoverable scholarship is in the form of refereed journals -- which are dominated by a small community of professionals and publishers. This information is rarely accessible to everyone, especially in resource-poor regions -- and access is itself insufficient to enable participation.19 As a consequence, the knowledge, practices, and traditions of many communities are not discoverable, accessible, or preserved.

The potential impact of broadening participation in the creation and dissemination of scientific knowledge is substantial.20 The substantial improvements in people’s lives over the last 200 years stem largely from broader collection of and access to knowledge, and the many discoveries that knowledge enables. Broadening collection and access to knowledge increasingly depends on the meaningful participation of content creators across the world.21

3.1.2 Restrictions on Forms of Knowledge

Current scholarly outputs are dominated by English-language journal articles,22 and the available scholarly evidence-base is dominated by quantitative data.23 Because of this, current scholarship captures only a small portion of the diverse forms of knowledge, and ways of knowing.24 In many communities across the globe, knowledge is based on oral traditions, qualitative and experiential data, and other forms of knowing rarely recognized, valued, or represented in the current scholarly record.

One challenge here is to imagine new forms of scholarship that fit new forms of research in order to add new dimensions and perspectives25 broader than the conventional journal article, monograph, and dataset.26

A second, related challenge is to work on ways to make these new genres for scholarship acceptable to research institutions, especially to hiring, promotion, and tenure committees.27 New researchers should have the freedom to explore and present their work in a broad scope of formats and genres that are not restricted to existing norms.

A third, related challenge is for institutions to provide the infrastructure to support the creation and preservation of these new genres of scholarship,28 or to pay for scholars to host them elsewhere. Scholars will not want to pour time into these works if they cannot find platforms to support them for the long term.

Many new forms for scholarship and mechanisms for recognizing them have emerged, at least as experiments.29 Many others have been proposed but not yet tried. Describing even the major ones would take more space than we have here. But we can point to some of the notable new properties that pioneering scholars are eager to try out. Some new genres are multimedia. Some integrate texts and data. Some are interactive. Some are dynamic, and offer regular or foreseeable updates. Some are designed to grow indefinitely and never reach a state that could be called finished. Some are collaborations by dozens or hundreds of people. Some might start as projects by one person, or one group, and later expand to accept contributions from the crowd. Some start as crowd-sourced projects. Some allow conventional attribution but some don’t. Some are so large that it’s not feasible to download them, but only to explore them in their online habitats. Some have APIs allowing them to integrate with other works, or other sources of information, creating hybrid or compound works of scholarship. Some are closer to living libraries or ecosystems than to individual works of scholarship.

As proposals for new genres become more numerous and more urgent, the research community will have to ask itself a series of hard questions. Which of these are worth trying? Which are worth encouraging and accommodating? Which are positively preferable to conventional genres, and for which purposes? How should we evaluate them (e.g. for hiring, promotion, and tenure), especially when they are hugely collaborative, too large to explore in full, or when they focus less on offering “an argument” for new conclusions than offering new ways to organize or validate knowledge? Should research institutions take a position on whether these should use certain open licenses, reside on open-source infrastructure, or become interoperable with certain other resources? We should expect this conversation to be ongoing and lengthy.

3.1.3 Threats to Integrity and Trust

Both technological advances and sustained democracy depend on the integrity of knowledge. However, formal scholarly knowledge generation is limited to small communities, and many members of the public mistrust science. Furthermore, it is increasingly difficult even for scholars to evaluate the weight of the evidence that should be given to claims made in scholarly communications.

Society is already wrestling with the challenges of ubiquitous fake information and disinformation -- even with respect to assertions that are simple and relatively straightforward to verify.30 Current problems are expanding as the scale of scholarly production grows, placing strain on the mechanisms we have for peer review and quality control -- which are slow, fallible, manipulable, and labor intensive.31 These include competing and overlapping systems of authority, including those run by corporate, state, and non-profit actors; increasing demands on the time of researchers asked to supervise and perform review and evaluation (with unclear reward systems); and threats from bad actors working at scales not previously possible (including states and automated systems).32

Much scientific data is not shared, and many knowledge products are ephemeral and can be erased, changed, and removed by politics, technological change, restrictive licenses, or neglect.33 Problems of access, integrity, and accountability all contribute to the problems of public mistrust and skepticism among government leaders.

3.1.4 Threats to the Durability of Knowledge

The durability of knowledge and scholarly products is essential to realizing the full range of scientific discoveries and works of scholarship, and to establishing the integrity of scholarly knowledge claims. Over the last decade, widespread shifts from tangible to digital media have created imminent threats to the durability of the scholarly record and scientific evidence base. Moreover, the digital traces of human behavior have expanded far more rapidly than our capacity to collect, study, and preserve them.

The importance of digital preservation in ensuring the durability of knowledge is aptly summarized in the National Agenda for Digital Stewardship: “Effective digital preservation is vital to maintaining the authentic public records necessary for understanding and evaluating government actions; the verifiable scientific evidence base for reproducing research, and building on prior knowledge; and the integrity of the nation's cultural heritage. Substantial work is needed to ensure that today's valuable digital content remains accessible, useful, and comprehensible in the future -- supporting a thriving economy, a robust democracy, and a rich cultural heritage.”34 This agenda and preceding work35 have drawn attention to the challenges of particular formats, and the need for preservation infrastructure, business models, and organizational coordination among memory institutions.

Durability is not simply a challenge for memory institutions, however. Sustainable trustworthy scholarship requires that durability be designed into the evolving lifecycle of information creation and use. While the values of openness, inclusion, and durability are complementary, changes in one part of the scholarly ecosystem focused exclusively on promoting another value -- such as the adoption of article-fee based open access -- have the potential to affect the infrastructure and incentives for durability.

Moreover, the lack of diversity in the scholarly ecosystem results in biases not only in what is produced and analyzed, but in what is preserved within the current scholarly ecosystem. We are losing, through neglect, much of the world’s stock of traditional, local, historical memory and tacit knowledge.36 We are in a race against time, losing in many parts of the world the knowledge that is being generated as well as the window of opportunity to implement solutions to global problems.

3.1.5 Threats to Individual Agency

Participants inside the scholarly ecosystem are challenged to understand the increasingly complex algorithms that they implicitly rely upon.37 Further, ubiquitous data collection that gathers information from broad areas of society into academic and commercial research increases the need to maintain privacy, safety and control over research information.38 As participation in scholarship is broadened, there will be a need to honor different community norms on access and use of information.

Algorithmic discovery and analysis, while enabling many scientific advances, have the potential to amplify existing biases and to introduce new, and potentially hidden, sources of unfairness. However, there is no consensus in research or practice over how to define or evaluate algorithmic transparency and fairness.

3.1.6 Barriers to a Scholarly Ecosystem that is Sustainable and Open

Open scholarship has been a goal for much of the scholarly community for 20-50 years. Public policy has driven requirements for open access to journal articles and for deposit of datasets. Multiple stakeholders39 have invested in repositories to capture scholarly products in digital libraries that are open to the world. However, open scholarship is still far from achieving the goals set long ago.40 While focus on journals and datasets has made some inroads in open access, other formats lag far behind. The worlds of music, ebooks, and video are tightly bound in a proprietary world, with licenses and digital rights management that are generally more restrictive than copyright law.41

Current structures, policies, systems, and norms do not incentivize the behaviors that will lead to the open scholarship future we imagine. As open access has progressed, the commercial publishing industry has challenged (and sometimes co-opted) open access through changes in business models and copyright law, acquisitions of smaller companies and players, and other actions.42 At multiple levels, incentives are badly misaligned to the larger goals of scholarship and learning.

3.2 Grand Challenge Research Areas

The overarching question these problems pose is how to create a global scholarly knowledge ecosystem that supports participation, ensures agency, transparency, trustworthiness, and integrity, and is legally, economically, institutionally, and socially sustainable and durable.

Reaching this future state requires exploring a broad set of interrelated anthropological, behavioral, computational, economic, legal, policy, organizational, sociological, and technological areas. The extent of these areas of research is illustrated by the following examples:

●      What are the most effective modalities for sharing knowledge across different regions and communities, and promoting mutual learning across community boundaries? How can skills in scholarly knowledge creation, curation, and preservation be shared and learned from different knowledge communities?43 What are the existing multiple models and traditions of preservation and curation from these broader communities including informal and unofficial stewards? How do these traditions and their trajectories relate to the affordances of digital materials and systems, and where is adaptation and refinement needed? How can these traditions and models be integrated to transform information science and formal library and archival practice?

●      What are the drivers for engagement and participation in scholarly knowledge creation, discovery and curation? What are the barriers to skill acquisition and transmission at the personal, organizational, disciplinary and ecosystem level? What interventions would lead to appropriate skills becoming pervasive? How do we address the need to be facilitative and supportive of skills development, while decolonializing power and control over methods, skills, and objects of curation?

●      What are forms of knowledge not represented in the current scholarly ecosystem? What approaches to describe, capture, and transmit tacit knowledge and other non-textual knowledge can be generalized and scaled? And how should the tacit knowledge that is the subject of scholarly study, or is integral to its practice be discovered, curated, and preserved?

●      What measures and algorithms are most effective for summarizing scholarly outputs at scale? What information architecture, semantic analysis, and computational infrastructure is needed to meaningfully link scholarly knowledge across sources and fields of study? How can both analysis and linkage be scaled to world knowledge, and adapted to its forms?

●      What parts of the scholarly knowledge ecosystem promote the values of transparency, individual agency, participation, accountability, and fairness? How can these values be reflected in the algorithms, information architecture, and technological systems supporting the scholarly knowledge ecosystem? What principles of design and governance would be effective for embedding these values?

●      How should the measures of use and utility of scholarly outputs be adapted for different communities of use, disciplines, theories, and cultures? What methods will improve our predictions of future value of collections of information, or enable the selection and construction of collections that are likely to be of value in the future?

●      What are the determinants of scholarly and public trust in scholarly knowledge claims? What content (e.g. workflows, data) and what characteristics of information architectures, organizations, cultures, and institutions promote trustworthiness and the ability to evaluate the strength of evidence in claims? How can the mechanisms for promoting trust and trustworthiness be adapted to scholarly contributions by non-professional communities, and applied to non-traditional forms of knowledge?

●      What legal mechanisms, scholarly cultures, organizational designs, and economic models could support enduring access to knowledge without relying on a stream of access fees? What are the barriers and incentives against enduring open access, and what interventions could be effective in shifting laws, organizations, behaviors and markets to a sustainable open equilibrium?

The list above provides a partial outline of research areas that will need to be addressed in order to overcome the major barriers to a better future for scholarly communication and information science. As the field progresses in exploring these areas, and attempting to address the barriers discussed, new areas are likely to be identified. Even within this initial list of research areas, the number of important research questions ripe for exploration is large and pressing.

3.3 Recommendations for Research Areas and Programs

Based on the characterization of the research landscape above, we make the following recommendations:

●      Recommendation 3-A: We recommend that funders consider developing future programs and requests for proposals to address the barriers above.

●      Recommendation 3-B: We recommend that researchers in information science and related fields strongly consider selecting problems within a grand-challenge research area as part of their research program.

●      Recommendation 3-C: We recommend that reviewers give particular weight to research proposals and discoveries that address these barriers or advance grand-challenge research.

●      Recommendation 3-D: We recommend that the participants in the existing ecosystems, including publishers and ecosystem builders, consider how the systems they build can reduce the barriers identified above.

●      Recommendation 3-E: We recommend that researchers and stakeholders actively seek out new voices and participants in the design and conduct of research, including those who can challenge currently accepted ways of conducting, communicating, and evaluating research.

4 TARGETED RESEARCH QUESTIONS

All of the research areas described above hold great promise for exploration. In this section, we discuss in detail four targeted individual research questions, drawn from these broad research areas. The aim is to provide a statement of the research question that can be understood by researchers and practitioners in multiple disciplines; suggest how progress toward a solution could be measured; explain how such progress could substantially help to address the problems above; and identify lines of research and practice that offer potential insights into a solution. We argue that each of these questions is potentially solvable in the next 7-10 years, and, if solved, will have substantial impact across multiple central problem areas.

Research and scholarship are embedded within and shaped by a broader ecosystem that comprises stakeholder organizations,44 social norms,45 laws,46  economic markets,47 and political institutions.48 This ecosystem as a whole affects how knowledge is produced, accessed, discovered, and preserved. None of the major challenges to equitable, trustworthy, inclusive, and durable scholarship (discussed in this report in section 3) can be fully resolved without an improved understanding of how to design institutional and normative ecosystems, and of what interventions are effective for moving us toward better ecosystems.

Research on the challenge of enduring, inclusive and open scholarship begins with an understanding of the problems exacerbated by its absence. These include weak trust in scholarly knowledge claims,49 which remain unverifiable or opaque across research communities and among wider publics when the processes and outcomes of research are not open, and when disparate access to research knowledge exacerbates social inequalities. The pursuit of openness in scholarship, however -- especially in access to published work -- may manifest as a treadmill of increasing expenses absorbed as user fees or publisher profits that fail to lead to systemic solutions.50 With resources devoted to these costs, investment in preservation with durable open access is threatened, even as the volume and complexity of material to be preserved in the scholarly record multiplies.51

Despite common recognition of this set of problems, effective incentives to drive key actors to develop and enact solutions to address them seem to be lacking.52 For example, scholarly societies often depend on revenue from journal subscription fees to fund various organizational and member goals and activities, thus creating a disincentive to adopting open models of dissemination that reduce or eliminate subscription revenue streams.53 Similarly, researchers and funders, as well as universities, have incentives to see their work appear in the most prestigious publications, regardless of their public accessibility, even as most scholars and scholarly institutions claim to  seek wider audiences for their research outputs.54

Organizational and technological innovations that promote open scholarship have the potential to create opportunities for broader engagement across research communities and broader publics,55 to allow the use of machine tools for analysis and dissemination of research outputs and materials,56 and to facilitate crowd-based methods of evaluation.57 However, such innovations also pose risks, including the empowerment of bad online actors58 at greater scale or velocity.

Research on open scholarship solutions is needed to assess the scale and breadth of access, the costs to actors and stakeholders at all levels, and the effects of openness on perceptions of trust and confidence in research and research organizations. This will require assessment of the costs and returns of open scholarship at a systemic level, rather than at the level of individual institutions or actors. We also need to assess how open scholarship can reduce barriers to research materials and knowledge, especially those set up by social and economic inequalities. In addition, research should address the permeability of open scholarship systems to researchers across multiple scientific fields, and whether -- and under what conditions -- open scholarship enhances interdisciplinary collaboration.

Please comment below with additional citations, references, links to projects, or other notes on research and development related to this area -- after reviewing Appendix 1 for a list of more detailed prompt questions. Please cite specific examples if possible.

●      Problem measurements

●      Solution impact

●      Suggested research approaches

4.2 Research Challenge: Measuring, Predicting, and Adapting to Use and Utility Across Scholarly Communities

In order to manage information we must value it: Systems and algorithms for discovery nominally aim to support users in finding information that is relevant to their needs -- information that is of value to them in their current context. Curation and preservation systems and strategies aim to deliver future (medium or long-term) value to specific communities of research or practice. Assumptions are embedded throughout the scholarly information ecosystem regarding what information is valuable (or will be), which communities will value it, and what forms of use and access will realize this value. Explicit models of research information value and uses are much less common.

Search and discovery increasingly elude expert (human) indexing, and rely on algorithms -- creators of search algorithms and discovery systems attempt to predict the value of specific information to a specific user at a specific time.59 These algorithms in turn rely heavily on signals of broad and current use (e.g. clicks, downloads, links), and are influenced by the monetary value that can be derived from such systems (such as sales of goods or ad-placement opportunities). Approaches based on these aggregate models of information value are unlikely ever to support systematic discovery of information of value to important but small communities of knowledge seekers. For example, current search systems will rarely uncover the most promising yet-unexamined archival material in the history of robotics; the most promising corpus for evaluating methods to detect gerrymandering; the most reliable software for estimating models in comparative phylogenetics; or other types of materials that are of high intellectual value but not profitable, or that are valuable to a community that is distinct but not large.

Researchers and curators often rely on professional judgement, and manual selection and assessment processes to decide what information to retain, how long to retain it, what effort to expend in making it accessible and understandable, and when that effort should be applied. These processes are often hyper-local and ad-hoc, based on the history of practice and the local values of the organization or community of practice making these decisions.60 Often these processes originate from a prior analog era, when all the information on which each organization relied had to be ‘held’ (formally acquired or created); and, in practice, it was possible to select and curate only information that was held.61 Many of the models of value underlying our current curation processes have not been updated or adapted to fit current realities.62 And the absence of explicit models of value makes it difficult to effectively adapt these processes to non-traditional forms of evidence (e.g. software, oral testimony); to new non-traditional communities of research and practice; or to new types of use (e.g. non-consumptive data mining).63

The development of formal models, methods, and empirical analysis that would lead to more rigorous, reliable, and systematic evaluation of the value of research information constitutes an important but challenging set of problems. Estimating the value of information is inherently difficult. Arrow’s information paradox states that ex-ante a buyer cannot assess the value of particular information – it can only be known ex-post, at which point the buyer has limited incentive to pay for it.64 Although assignment of intellectual property rights can address this issue to a limited extent, it is very challenging65 – and hence markets for information goods are generally thin. Furthermore, intellectual property rights notwithstanding, the non-consumptive nature and limited excludability inherent in information goods imply that any pure market solution will produce and distribute information at levels that are socially sub-optimal.66 Although data quality is sometimes seen as a proxy for value, no feasible universal quality measure exists – data quality measures are notoriously varied, discipline specific, contextual, and difficult to implement in practice.67

In the preservation of information, diversification of storage and representation is recognized as an essential strategy for ensuring future accessibility -- and there is a well-recognized taxonomy of risk sources that guides diversification strategy. We have no equivalent strategies to diversify across the risks to information value. In economics, methods such as revealed preference analysis and contingent valuation surveys68 are often used to measure the value of non-market goods – yet these methods have not been applied to valuing research data. Similarly, portfolio selection modeling69 is the primary tool used in finance to diversify across risky investments, but it has never been applied to the ‘investments’ in developing collections of information. Solutions in this area would yield models of information valuation that could be examined, challenged, and refined; and taxonomies of uses, communities, and threats that could be used for diversification strategies.
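As a purely illustrative sketch of what applying portfolio selection to collections might look like, the following Python fragment uses the textbook two-asset mean-variance calculation to split a curation budget between two collections. Everything in it is an assumption made for illustration: the `Collection` class, the function names, and the numeric values and correlation are hypothetical, and the report does not propose this (or any) specific model.

```python
from dataclasses import dataclass


@dataclass
class Collection:
    """A hypothetical collection treated as a risky 'investment'."""
    name: str
    expected_value: float  # assumed expected future value per unit of budget
    risk: float            # assumed standard deviation of that value


def portfolio_stats(a, b, weight_a, correlation):
    """Return (expected value, variance) when weight_a of the budget goes to a."""
    weight_b = 1.0 - weight_a
    expected = weight_a * a.expected_value + weight_b * b.expected_value
    variance = (
        (weight_a * a.risk) ** 2
        + (weight_b * b.risk) ** 2
        + 2.0 * weight_a * weight_b * correlation * a.risk * b.risk
    )
    return expected, variance


def min_variance_weight(a, b, correlation):
    """Closed-form minimum-variance share for a (two-asset case), clipped to [0, 1]."""
    covariance = correlation * a.risk * b.risk
    denominator = a.risk ** 2 + b.risk ** 2 - 2.0 * covariance
    if denominator == 0.0:
        return 0.5  # equal, perfectly correlated risks: every split has the same variance
    weight = (b.risk ** 2 - covariance) / denominator
    return min(1.0, max(0.0, weight))


if __name__ == "__main__":
    # Hypothetical inputs; real values would have to come from valuation research.
    archive = Collection("oral-history archive", expected_value=1.0, risk=0.30)
    datasets = Collection("survey datasets", expected_value=1.0, risk=0.20)
    share = min_variance_weight(archive, datasets, correlation=0.1)
    expected, variance = portfolio_stats(archive, datasets, share, correlation=0.1)
    print(f"share to {archive.name}: {share:.2f}, variance of value: {variance:.4f}")
```

With these assumed inputs, splitting the budget yields a lower variance in realized value than investing everything in either collection alone. The hard research problem remains the one described above: obtaining defensible estimates of expected value, risk, and correlation for collections of information.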

Please comment below with additional citations, references, links to projects, or other notes on research and development related to this area -- after reviewing Appendix 1 for a list of more detailed prompt questions. Please cite specific examples if possible.

●      Problem measurements

●      Solution impact

●      Suggested research approaches

4.3 Research Challenge: Designing and Governing Algorithms in the Scholarly Ecosystem to Support Accountability, Credibility, and Agency

Across the scholarly information ecosystem, automated algorithms play increasingly critical roles in discovery (e.g. relevance ranking; recommender systems);70 in information extraction and summarization (e.g. automated abstract generation, literature mining);71 and in the evaluation of scholars and scholarship (e.g. detection of plagiarism, image manipulation, or journal citation inflation; evaluation of collaboration impact; predicting productivity).72 Moreover, the rapid growth in the volume of evidence, number of publications, and scale of collaboration in research73 generates strong pressure to rely on such automated systems -- the growth of scientific knowledge relies on algorithms and algorithmic systems to support knowledge discovery, evaluation, and collaboration at scale.

As their ubiquity increases, algorithms in the scholarly ecosystem are growing increasingly complex and opaque: ranging from models that, while theoretically well-defined, remain difficult to estimate and interpret (e.g. use of latent Dirichlet allocation to extract science topics; use of network regression models to measure collaboration)74 to the nominally transparent but effectively inscrutable (e.g. use of open deep-learning for recommender systems)75 to algorithms that are opaque and ever-changing by design (e.g. Google’s systems for relevance ranking).76

The problems posed by the use of such complex algorithms are now becoming recognized in the wider public sphere. These problems include violations of human privacy or agency (e.g. recommender systems inadvertently revealing purchasing habits to others);77 biases and inequities in outcomes that result from algorithmic design choices (e.g. the poor performance of facial recognition algorithms for people of color);78 the potential for algorithmic systems to aggregate and amplify human biases (e.g. substantial explicit racialization of Google search ad placement resulting from the aggregation of implicit bias in click-through behavior);79 and the intentional adversarial manipulation of digital evidence80 and of machine learning algorithms to game evaluation or actively harm others (e.g. adversarial attacks on image detection).81

Addressing this interrelated set of problems requires advances in multiple fields and at multiple levels. Research on the design and evaluation of algorithms with respect to bias, fairness, and manipulability is generally in its early stages. Further, in the domain of scholarly information, we have yet to identify the necessary properties of algorithms that are required to protect individual agency, facilitate collaboration, facilitate the identification of new biases, prevent gaming, and preserve trustworthiness -- nor have we identified the fundamental constraints on and tradeoffs among these goals. For those few properties that have been identified as desirable -- such as individual information privacy -- we have limited understanding of how to successfully design and deploy algorithmic systems that satisfy these properties.82 And even for those algorithms that are commonly in use, we have little systematic empirical evidence on their quality, manipulability, and biases.

Please comment below with additional citations, references, links to projects, or other notes on research and development related to this area -- after reviewing Appendix 1 for a list of more detailed prompt questions. Please cite specific examples if possible.

●      Problem measurements

●      Solution impact

●      Suggested research approaches

4.4 Research Challenge: Integrating Oral and Tacit Knowledge into the Scholarly Ecosystem

Participation in the collective knowledge of science and scholarship is currently limited to a small minority (as discussed in section 2, above). In part this is because scholarly communication and reputation are primarily transmitted and promoted through publication of journal articles and books.

 

Most culture, and much knowledge about history, skills, and methods, is not written. Knowledge that derives from or pertains to indigenous, traditional, and local communities is often transmitted and preserved through oral histories and oral traditions. Even within our current system of science there is evidence that critical parts of the knowledge needed to conduct science (e.g. how to perform experimental bench methods)83 and to have successful careers as scientists are tacit -- resistant to transmission in textual form. Within science this knowledge is often transmitted orally and experientially through collaboration and mentoring relationships -- which can have a substantial impact on both the reliability of scientific results,84 and disparities in the diversity of the academy.85

Neither the methods nor the systems used to represent and manage the scholarly record are well-adapted to non-textual knowledge. The result is that most knowledge in tacit or oral form remains unexamined and invisible, and is not recognized, curated, or preserved within the scholarly community.

Integrating oral and tacit knowledge into the scholarly ecosystem raises not only methodological and technical challenges, but deep conceptual challenges as well.86 The scholarly conceptualization of information integrity will need to be expanded, along with the mechanisms and methods we use to manage authenticity, provenance, and versioning. Models of attribution, authority, and trust will need to be extended both to these forms of knowledge and to the communities that produce them. Further, the widespread dissemination of oral and tacit knowledge that is embodied in the behavior of individuals raises challenges for information agency -- and for the mechanisms we use to provide consent for and control access to information.

Please comment below with additional citations, references, links to projects, or other notes on research and development related to this area -- after reviewing Appendix 1 for a list of more detailed prompt questions. Please cite specific examples if possible.

●      Problem measurements

●      Solution impact

●      Suggested research approaches

5 INTEGRATING RESEARCH, PRACTICE, AND POLICY

5.1 The Need for Leadership to Coordinate Initiatives

Many of the opportunities for scholarship that are made possible by the rapidly advancing technologies have yet to be fully realized. There are several reasons for this: As discussed above, the social, legal, technical, and organizational systems for disseminating, discovering, reusing, and communicating scholarly information have not kept pace with the technologically induced changes in the scholarly ecosystem.

Left to the market, the economics of knowledge in digital form creates both network externalities and reputation effects that are increasingly exploited by rent-seeking monopolies.87 Avoiding this market disequilibrium requires that institutions coordinate to manage scholarly knowledge -- and this requires leadership. Some set of individual organizations must go beyond their local interests -- and invest effort and reputation into changes to the scholarly ecosystem that yield broad benefits.

At the same time, organizations should not act in isolation. Almost every institution now relies for its business, operations, and mission on large amounts of information that go beyond institutional boundaries. The amount of information is so great, and the risks so diverse, that no single organization can effectively ensure sustainable access to all the information it produces or needs.88 Moreover, many pools of digital information are valued by multiple institutions. Together, these facts imply that collaboration is essential -- institutional leaders must not only innovate, but coordinate.

5.2 Role of Libraries as Advocates and Collaborators

Research universities are among the most long-lived of human institutions. University libraries are widely trusted as the permanent stewards of the scholarly record and scientific evidence base within these institutions, and libraries have highly refined expertise and infrastructures for the organization and dissemination of knowledge. Further, the grand challenges identified above will likely be solved only through a cross-disciplinary approach, and libraries are by design interdisciplinary, and in practice trusted as honest brokers of knowledge. Finally, the values of libraries are deeply aligned with the values of knowledge communities -- libraries constitute themselves as being in service to these communities, in contrast with commercial entities, and even in contrast to the larger organizations within which research libraries are embedded.

Libraries should collaborate in the grand challenge research we have described in this paper. Further, libraries should act in other ways as direct agents of change and also as a voice to enlist other change-makers. Libraries can help to educate the communities that they serve about information ethics, agency, and risks.89 Libraries can collaborate to develop common open infrastructure.90 Libraries can help to make the norms and culture of scholarship more inclusive by documenting and disseminating the tacit knowledge that is part of the successful practice of scholarship -- much of which is inaccessible except through direct mentoring.91 As trusted brokers for information, they can advocate on behalf of the scholarly community both to the government and to commercial information providers and intermediaries.

5.3 Incorporating Values of Openness, Sustainability, and Equity into Scholarly Infrastructure and Practice

With respect to the practice of research, it is worth noting that many fields of scholarship, academic associations, professional groups, and societies have issued ethics statements involving integrity of the work, confidentiality of the individual, and being mindful of the direct or indirect impact that research or work outcomes may have on the lives of individuals, groups, or societies. Leaders at these professional and academic organizations have the power to align high-level “do no harm” principles with active and impactful policy implementations that set as a goal equitable, diverse, inclusive, and socially just outcomes. Universities often work under explicit policies and procedures, but defining and implementing such research outcomes requires systems in place that intentionally support the advancement of equitable and diverse societies worldwide. This remains an important challenge because it means saying no to certain funding sources, and adjusting relationships between wealthy and impactful research institutions and industries.

Much of the infrastructure for scholarship is neither owned nor designed by scholars, but has been developed by commercial entities for profit -- and is controlled by a few large companies.92 As the practice of research and publishing has accelerated, requiring more integration of information across the research lifecycle, this infrastructure has become increasingly complex, and increasingly dominated by a small number of commercial entities. Should similar ethical principles be applied to infrastructure as to practice? Does commercial dominance in infrastructure present risks to achieving the goal of open, sustainable, and equitable scholarship?

As an example of existing tensions, it has been broadly recognized that the profit-driven model of social-media companies such as Facebook and Twitter creates strong incentives to collect and monetize information about participants in these networks -- which is in tension with protecting information privacy.93 Similarly, the reliance of Google on advertisement revenue influences both what is indexed, and how relevance is operationalized.94 More generally, commercial entities have an incentive toward algorithmic opacity in order to protect their trade secrets and competitive advantages.95

The increasing prevalence of high-profile information breaches96 and the increasing ability to re-identify individuals and their characteristics based on aggregated or nominally ‘anonymous data’97 have led to increasingly widespread support for systems of information discovery and sharing that incorporate respect and protections for individual agency and information privacy into their core design. In some cases, values such as openness, sustainability, and equity can and should be incorporated deep into the infrastructure of new systems from the beginning. In other cases, research is needed to determine whether and how such values could be effectively expressed and enacted using existing infrastructure that was created for very different functions and with different value propositions than those animating the creation of systems explicitly designed to support open, equitable, sustainable scholarship.

We also must critically examine the unintended consequences and uses of policies, practices and infrastructures that have been explicitly developed in support of open scholarship. For example:

●      How has the discovery and hosting of open-access content on proprietary infrastructure (e.g. SSRN, bepress, Google Scholar) created or mitigated barriers to accessing that content?

●      What creates incentives for stakeholders to use open software, standards, and APIs -- particularly when hosting open access content?

●      What methods can be used to design and refine open infrastructure to support reuse, extension, and adaptation at the local level -- while being able to function at the continually growing scale of global research output?

Addressing these questions requires integrating research with practice and infrastructure development. Research is needed to guide the design of platforms that are consistent with our values; and platforms are needed that can be instrumented to evaluate these designs, and contribute to our understanding of where we are successfully promoting the objectives we seek. To be successful at a global scale, evaluation of practice should go beyond case-studies in its approach, and include replicable methods to support systematic inference, such as randomization and pre- and post-evaluation.

5.4 Funders, Catalysts and Coordinators

A number of organizations currently fund, coordinate, or catalyze advances in research, infrastructure, and practice that enable open, inclusive, and durable scholarship. The US federal agencies Institute of Museum and Library Services (IMLS) and the National Endowment for the Humanities (NEH); the European Research Council; and The Andrew W. Mellon and Alfred P. Sloan Foundations all have long track-records of supporting research, practice, and infrastructure in these areas.98 A number of other funders -- including Wellcome Trust, National Science Foundation, National Institutes of Health, Chan Zuckerberg Initiative, Gates Foundation, Helmsley Foundation, Open Society Foundations, and the Gordon and Betty Moore Foundation -- have supported more limited initiatives related to these areas and primarily centering on open and reproducible research. This good work notwithstanding, we argue that the problems and challenges described in this report merit recognition by the entire spectrum of funders engaged directly or indirectly in supporting research and scholarship.

Finally, success in advancing these areas will rely on organizations to coordinate collaborative approaches to research, practice, and infrastructure. This is difficult because coordination is often a public good -- providing more benefits to the research community as a whole than to the coordinating institution (indeed, many coordinating institutions invest more than they expect to receive directly). Despite this structural challenge, institutions like CLIR (along with DLF and the National Digital Stewardship Alliance (NDSA)), Research Data Alliance, SPARC and CODATA have been successful in coordinating standards development and educational initiatives;99 and organizations such as Duraspace, the Dataverse Community, Digital Preservation Network, and Center for Open Science100 have played vital roles in coordinating the development and support of the research infrastructure that underpins open scholarship.

Organizations such as the Coalition for Networked Information, Association of Research Libraries, and the National Academies (primarily through the Board on Research Data and Information) -- joined more recently by organizations such as Force11 and Sage Bionetworks -- have established themselves as catalysts for open scholarship. They play a vital role in disseminating information on initiatives and research, convening experts, and engaging in advocacy. Over the last decade organizations such as NDSA, The Long Now Foundation, and DPC101 have played a similar catalytic role for the issue of information durability. Only recently have organizations focused on equitable and inclusive knowledge, such as Whose Knowledge,102 been recognized in the scholarly community.

Progress towards a more open, equitable, trustworthy, and durable scholarly ecosystem requires that more institutions take catalyzing and coordinating roles in addressing the challenges and exploring the research areas described in section three. Further, existing organizations can help greatly by recognizing in their programs the interrelationship between openness, impact, trustworthiness, durability, and inclusivity in research and scholarship.

5.5 Recommendations for Integrating Research, Practice, and Policy

Summarizing the discussion of the connection across research, policy, and practice above, we make the following recommendations:

●      Recommendation 5-A: We recommend that individual research institutions take public responsibility for leading and coordinating inclusive efforts to address the barriers to a more equitable and inclusive system of scholarship.

●      Recommendation 5-B: We recommend that research libraries promote a vision of inclusive and equitable scholarship within their institutions; that they engage in work on legislation and public policy; and that they enlist others in the scholarly community as change-makers.

●      Recommendation 5-C: We recommend that those engaged in developing platforms and communities of practice actively seek new voices and participation in their design and use.

●      Recommendation 5-D: We recommend that those engaged in research, practice, and advocacy in the area of open and inclusive scholarship collaborate to develop platforms and interventions that can contribute to our understanding of what works. Evaluation of practice should go beyond case-studies in its approach, and include replicable methods to support systematic inference.

●      Recommendation 5-E: We recommend that stakeholders give priority to resourcing programs that rigorously integrate research and practice; and particularly to those programs that systematically contribute to the overall cumulative evidence base for inclusive, equitable, and credible scholarship.


APPENDICES

Appendix 1: Research Problem Prompts

Problem measurements

●      What characterizes a feasible solution to the problem – what conditions must any solution satisfy?

●      What characterizes a “good” solution – what conditions are sufficient and/or how would researchers across disciplines measure the quality of the solution?

●      How do we evaluate progress toward a solution – what empirical evidence would demonstrate progress, how can progress be quantified?

Solution impact

●      Who would care if a “good” solution (as characterized above) were reached, and what difference would it make?

●      In what ways would it advance multiple scientific fields – what other related problems would it solve and what new things would those fields be able to then accomplish?

●      What is the size of the potential economic impacts?

●      How will a solution to this problem affect individuals’ lives and society?

Current approaches

●      How is this problem (and the goals it addresses) approached today?

●      What are the gaps in current knowledge – are there necessary conditions that we do not understand how to satisfy?

●      What are the limits of current solutions – what criteria, and what expectations related to transparency, need to be improved to reach “good” solutions?

●      What types of new research discoveries and policy considerations are needed to achieve “good” solutions?

Research Approaches

●      Why do we think a good solution is achievable in the foreseeable future?

●      What are the new insights from theory; new empirical discoveries; new connections among disparate approaches; new methods; or new sources of data that suggest that solutions are coming within reach?

●      Are there previously unrecognized connections between this problem and successful solutions in other disciplines?

●      Are there previously unrecognized links to problems in other disciplines that could be applied to this problem?

●      Are there active initiatives or projects demonstrating progress in these areas?

●       What are potential next steps: On-ramps? Collaborations to seek? Stakeholders to engage?


 

Endnotes

1.  Liz Allen, Jo Scott, Amy Brand, Marjorie Hlava & Micah Altman, “Publishing: Credit Where Credit Is Due”, Nature 508 (2014): 312.

2.  Global Voices (http://globalvoices.org) is an initiative that is amplifying stories from underrepresented communities. For a discussion of this and related initiatives see Torres, Lars Hasselblad. "Citizen sourcing in the public interest." Knowledge Management for Development Journal 3, no. 1 (2007): 134-145.

3.  For thoughtful analyses of technological changes and of their potential to further revolutionize scholarship see: Atkins, D., 2003, Revolutionizing science and engineering through cyberinfrastructure: Report of the National Science Foundation blue-ribbon advisory panel on cyberinfrastructure. National Science Foundation; Berman, F. and H. Brady, 2005, Workshop on Cyberinfrastructure for the Social and Behavioral Sciences: Final Report, National Science Foundation.

4.  Britz, J.J. (2008). Making the global information society good: a social justice perspective on the ethical dimensions of the global information society. Journal of the American Society for Information Science and Technology, 59(7), 1171-1183. doi:10.1002/asi.20848

5.  Fricker, Miranda. Epistemic injustice: Power and the ethics of knowing. Oxford University Press, 2007.

6.  Stephan, Paula E., How economics shapes science. Vol. 1. (Cambridge, MA: Harvard University Press, 2012).

7.   Examples: Fausto-Sterling, Anne, Myths of gender: Biological theories about women and men. (Basic Books, 2008); Braun, L., Fausto-Sterling, A., Fullwiley, D., Hammonds, E.M., Nelson, A., Quivers, W., Reverby, S.M. and Shields, A.E., 2007. “Racial categories in medical practice: how useful are they?”. PLoS Medicine, 4(9), p.e271; Fausto-Sterling, A., 2000. Sexing the body: Gender politics and the construction of sexuality. (Basic Books; Revised edited ed. 2000).

8.  Willinsky, John. The access principle: The case for open access to research and scholarship. (Cambridge, MA.: MIT Press, 2006).

9.  Azar, B. “Are your findings ‘WEIRD’?” Monitor on Psychology 41, no. 5 (2010): 11.

10. Sugimoto, Cassidy R. “Global gender disparities in science,” Nature 504 (2013): 211-213 and supplement 1.  

11.  See for example Alvarez, R. Michael, ed. Computational social science. (Cambridge University Press, 2016); Altman, Micah, and Marguerite Avery. "Information wants someone else to pay for it: laws of information economics and scholarly publishing." Information Services & Use 35, no. 1-2 (2015): 57-70.

12.  Courtland, R. "Bias detectives: the researchers striving to make algorithms fair." Nature 558, no. 7710 (2018): 357.; Altman, Micah, Alexandra Wood, and Effy Vayena. "A harm-reduction framework for algorithmic fairness." IEEE Security & Privacy 16, no. 3 (2018): 34-45.

13.  King, Gary. "Restructuring the social sciences: reflections from Harvard's Institute for Quantitative Social Science." PS: Political Science & Politics 47, no. 1 (2014): 165-172.

14.  Lazer, David, Alex Sandy Pentland, Lada Adamic, Sinan Aral, Albert Laszlo Barabasi, Devon Brewer, Nicholas Christakis et al. "Life in the network: the coming age of computational social science." Science (New York, NY) 323, no. 5915 (2009): 721.

15.  See Hilbert, Martin. "How to Measure 'How Much Information'? Theoretical, Methodological, and Statistical Challenges for the Social Sciences: Introduction." International Journal of Communication 6 (2012): 1042-1055, summarizing a special issue of IJOC on this topic.

16. Raghupathi, Wullianallur, and Viju Raghupathi. "Big data analytics in healthcare: promise and potential." Health information science and systems 2, no. 1 (2014): 3; Vayena, Effy, Marcel Salathé, Lawrence C. Madoff, and John S. Brownstein. "Ethical challenges of big data in public health." PLoS computational biology 11, no. 2 (2015): e1003904.

17. Wiggins, Andrea, and Kevin Crowston. "From conservation to crowdsourcing: A typology of citizen science." In System Sciences (HICSS), 2011 44th Hawaii international conference on, pp. 1-10. IEEE, 2011; Levine, S.S. and Prietula, M.J., 2013. “Open collaboration for innovation: Principles and performance.” Organization Science, 25(5), pp.1414-1433; Majchrzak, Ann, and Arvind Malhotra. "Towards an information systems perspective and research agenda on crowdsourcing for innovation." The Journal of Strategic Information Systems 22, no. 4 (2013): 257-268.

18.  See: Graham, Mark, Scott A. Hale, and Monica Stephens. "Geographies of the World’s Knowledge." London: Convoco (2011); Graham, Mark, Bernie Hogan, Ralph K. Straumann, and Ahmed Medhat. "Uneven geographies of user-generated information: Patterns of increasing informational poverty." Annals of the Association of American Geographers 104, no. 4 (2014): 746-764.

19.  As an example, the difficulty of accessing research information through mobile phones, which are the primary channel for accessing information, is a barrier that must be addressed: Hosman, Laura, and Elizabeth Fife. "The use of mobile phones for development in Africa: Top-down-meets-bottom-up partnering." The Journal of Community Informatics 8, no. 3 (2012); and see for barriers to engagement in the same region: Ojanpera, Sanna, Mark Graham, Ralph K. Straumann, Stefano De Sabbata, and Matthew Zook. "Engagement in the knowledge economy: Regional patterns of content creation with a focus on Sub-Saharan Africa." (2017): 33-51.

20.  Ali-Khan SE, Jean A, MacDonald E and Gold ER. “Defining Success in Open Science”. MNI Open Res 2018, 2:2 (doi: 10.12688/mniopenres.12780.2)

21.  See for  example: European Commission on Open Science http://ec.europa.eu/research/openscience

22.  Lillis, Theresa, and Mary Jane Curry. Academic writing in a global context: The politics and practices of publishing in English. Routledge, 2013.

23.  On the importance of non-numeric information, the dearth of scientific archives providing this content, and the challenges of providing durable access, see: National Research Council. Frontiers in massive data analysis. National Academies Press, 2013; Hammersley, Martyn. "Qualitative data archiving: some reflections on its prospects and problems." Sociology 31, no. 1 (1997): 131-142; Elman, Colin, Diana Kapiszewski, and Lorena Vinuela. "Qualitative data archiving: Rewards and challenges." PS: Political Science & Politics 43, no. 1 (2010): 23-27; Mannheimer, Sara, Amy Pienta, Dessislava Kirilova, Colin Elman, and Amber Wutich. "Qualitative Data Sharing: Data Repositories and Academic Libraries as Key Partners in Addressing Challenges." American Behavioral Scientist (2018).

24.  See for examples of knowledge that is not readily reducible to text: Ahn, Sun Joo, Joshua Bostick, Elise Ogle, Kristine L. Nowak, Kara T. McGillicuddy, and Jeremy N. Bailenson. "Experiencing nature: Embodying animals in immersive virtual environments increases inclusion of nature in self and involvement with nature." Journal of Computer-Mediated Communication 21, no. 6 (2016): 399-419; Bailenson, Jeremy. Experience on Demand: What Virtual Reality Is, how it Works, and what it Can Do. WW Norton & Company, 2018; McPherson, T., 2018. Feminist in a Software Lab: Difference+ Design. Harvard University Press; Eric Dinmore 2015  “Collecting, Curating, and Presenting ‘3-11’ With Harvard’s Digital Archive of Japan’s 2011 Disaster” Verge: Studies in Global Asias, 1(2): 37-41

25.  (Citation needed)

26.  And it is worth noting that even traditional outputs, such as datasets, have lacked consistent practices for publication and citation:  Altman, Micah, Christine Borgman, Mercè Crosas, and Maryann Matone. "An introduction to the joint principles for data citation." Bulletin of the Association for Information Science and Technology 41, no. 3 (2015): 43-45.

27.  See for example Andersen, Deborah Lines. Digital Scholarship in the Tenure, Promotion and Review Process. Routledge, 2015.; Cheverie, Joan F., Jennifer Boettcher, and John Buschman. "Digital scholarship in the university tenure and promotion process: A report on the sixth scholarly communication symposium at Georgetown University Library." Journal of Scholarly Publishing 40, no. 3 (2009): 219-230. And Flanders, Julia. "The productive unease of 21st-century digital scholarship." Defining Digital Humanities: A Reader (2013): 205-218.

28.  (Citation needed)

29.  See (respectively) for example emerging mechanisms for recognition of software as a product of scholarship in the sciences; and a foundational introduction to new forms of digital scholarship in the humanities: Niemeyer, Kyle E., Arfon M. Smith, and Daniel S. Katz. "The challenge and promise of software citation for credit, identification, discovery, and reuse." Journal of Data and Information Quality (JDIQ) 7, no. 4 (2016): 16; Wardrip-Fruin, Noah, and Nick Montfort. "The New Media Reader, A User's Manual." MIT Press (2003).

30.  Allcott, Hunt, and Matthew Gentzkow. 2017. "Social Media and Fake News in the 2016 Election." Journal of Economic Perspectives, 31 (2): 211-36. Also see, for the verification of more complex information: Lewandowsky, Stephan, Ullrich KH Ecker, Colleen M. Seifert, Norbert Schwarz, and John Cook. "Misinformation and its correction: Continued influence and successful debiasing." Psychological Science in the Public Interest 13, no. 3 (2012): 106-131.

31.  Publons. 2018. “Global State of Peer Review.” Clarivate Analytics <https://publons.com/static/Publons-Global-State-Of-Peer-Review-2018.pdf>; Bornmann, Lutz, Rüdiger Mutz, and Hans-Dieter Daniel. 2010. “A Reliability-Generalization Study of Journal Peer Reviews: A Multilevel Meta-Analysis of Inter-Rater Reliability and Its Determinants.” PLOS ONE 5 (12): e14331. https://doi.org/10.1371/journal.pone.0014331; Greenberg, Steven A. "How citation distortions create unfounded authority: analysis of a citation network." BMJ 339 (2009): b2680; Franco, Annie, Neil Malhotra, and Gabor Simonovits. 2014. “Publication Bias in the Social Sciences: Unlocking the File Drawer.” Science 345 (6203): 1502–5. https://doi.org/10.1126/science.1255484

32. Allcott, H. and Gentzkow, M., 2017. Social media and fake news in the 2016 election. Journal of Economic Perspectives, 31(2), pp.211-36; Lazer, David MJ, Matthew A. Baum, Yochai Benkler, Adam J. Berinsky, Kelly M. Greenhill, Filippo Menczer, Miriam J. Metzger et al. "The science of fake news." Science 359, no. 6380 (2018): 1094-1096.

33.  See: Alsheikh-Ali, A.A., Qureshi, W., Al-Mallah, M.H. and Ioannidis, J.P., 2011. Public availability of published research data in high-impact journals. PloS one, 6(9), p.e24357; Vines, Timothy H., Arianne YK Albert, Rose L. Andrew, Florence Débarre, Dan G. Bock, Michelle T. Franklin, Kimberly J. Gilbert, Jean-Sébastien Moore, Sébastien Renaut, and Diana J. Rennison. "The availability of research data declines rapidly with article age." Current biology 24, no. 1 (2014): 94-97.

34.  Altman, et al., 2015, National Agenda for Digital Stewardship, National Digital Stewardship Alliance. <http://ndsa.org/national-agenda/>

35.  Blue Ribbon Task Force (BRTF). "Sustainable economics for a digital planet: Ensuring long-term access to digital information." Final Report of the Blue Ribbon Task Force (2010). OCLC.

36.  (Citation needed)

37. Diaz, Alejandro. "Through the Google goggles: Sociopolitical bias in search engine design." In Web search, pp. 11-34. Springer, Berlin, Heidelberg, 2008; Lazer, D., Kennedy, R., King, G. and Vespignani, A., 2014. The parable of Google Flu: traps in big data analysis. Science, 343(6176), pp.1203-1205

38.  See President's Council of Advisors on Science and Technology (PCAST), Big Data and Privacy: A Technological Perspective (White House, Washington, DC, 2014); Narayanan, Arvind, Joanna Huey, and Edward W. Felten. "A precautionary approach to big data privacy." In Data protection on the move, pp. 357-385. Springer, Dordrecht, 2016; Altman, Micah, Alexandra Wood, David R. O’Brien, and Urs Gasser. "Practical approaches to big data privacy over time." International Data Privacy Law (2018).

39.  (Citation needed)

40.  For an overview of the current OA landscape in the US, UK, and European Union, see: Dunn, Katherine, MIT Ad-Hoc Task Force on Open Access, 2018, “Open Access at MIT and Beyond” <https://open-access.mit.edu/>

41.  For legal and empirical analysis see (respectively) Samuelson, Pamela. "DRM {and, or, vs.} the law." Communications of the ACM 46, no. 4 (2003): 41-45; Urban, Jennifer M. and Karaganis, Joe and Schofield, Brianna, Notice and Takedown in Everyday Practice (March 22, 2017). UC Berkeley Public Law Research Paper No. 2755628. < https://ssrn.com/abstract=2755628 or http://dx.doi.org/10.2139/ssrn.2755628 >

42.  See for an analysis, Altman, Micah, and Marguerite Avery. "Information wants someone else to pay for it: laws of information economics and scholarly publishing." Information Services & Use 35, no. 1-2 (2015): 57-70; and for a recent market description: Rob Johnson, Anthony Watkinson, Michael Mabe. The STM report: An overview of scientific and scholarly journal publishing, 5th edition. (2018) International Association of Scientific, Technical and Medical Publishers.

43.  Examples of sharing knowledge across communities: Library of Congress, Labs: https://labs.loc.gov/; Artist in the Archive podcast: https://artistinthearchive.podbean.com/

44.  For a discussion of relevant organizational design factors at the macro- and micro- levels see (respectively): Ostrom, Elinor. Understanding institutional diversity. Princeton, NJ: Princeton University Press, 2005; Roberts, John. The modern firm: Organizational design for performance and growth. (Oxford University Press, 2007).

45.  See generally: Ostrom, E., 2000. Collective action and the evolution of social norms. Journal of economic perspectives, 14(3), pp.137-158. For a survey of norms related to research and reproducibility, see Borgman, C.L., 2010. Scholarship in the digital age: Information, infrastructure, and the Internet. MIT press.

46.  See Benkler, Yochai. The wealth of networks: How social production transforms markets and freedom. (Yale University Press, 2006.)

47.  See, Foray, Dominique. Economics of knowledge. (Cambridge, MA: MIT press, 2004); Altman, Micah, and Marguerite Avery. "Information wants someone else to pay for it: laws of information economics and scholarly publishing." Information Services & Use 35, no. 1-2 (2015): 57-70.

48.  For a survey of modern approaches to institutional analysis see: Peters, B. Guy. Institutional theory in political science: The new institutionalism. Bloomsbury Publishing USA, 2011.

49.  Funk, Carey. “Mixed Messages about Public Trust in Science.” Pew Research Center (December 8, 2017). http://www.pewinternet.org/2017/12/08/mixed-messages-about-public-trust-in-science/.

50.  Green, Toby. 2018. “We’re Still Failing to Deliver Open Access and Solve the Serials Crisis: To Succeed We Need a Digital Transformation of Scholarly Communication Using  Internet-Era Principles.” Zenodo. https://doi.org/10.5281/zenodo.1410000.

51.  Lavoie, Brian, Eric R. Childress, Ricky Erway, Ixchel M. Faniel, Constance Malpas, Jennifer Schaffner, and Titia van der Werf. “The Evolving Scholarly Record,” June (2014). https://www.oclc.org/research/publications/library/2014/oclcresearch-evolving-scholarly-record-2014-overview.html.

52.  National Academies of Sciences, Engineering, and Medicine (NASEM). 2018. Open Science by Design: Realizing a Vision for 21st Century Research. https://doi.org/10.17226/25116.

53.  (Citation needed)

54.  (Citation needed)

55.  (Citation needed)

56.  (Citation needed)

57.  (Citation needed)

58.  (Citation needed)

59.  (Citation needed)

60.  (Citation needed)

61.  (Citation needed)

62.  (Citation needed)

63. See, for example:  Zeng, Jiaan, Guangchen Ruan, Alexander Crowell, Atul Prakash, and Beth Plale. "Cloud computing data capsules for non-consumptive use of texts." In Proceedings of the 5th ACM workshop on Scientific cloud computing, pp. 9-16. ACM, 2014.

64.  Arrow, K.J., 1972. Economic welfare and the allocation of resources for invention. In Readings in Industrial Economics (pp. 219-236). Palgrave, London.

65.  Gans, Joshua S., and Scott Stern. "Is there a market for ideas?." Industrial and Corporate Change 19, no. 3 (2010): 805-837.

66.  Hess, Charlotte, and Elinor Ostrom. "A Framework for Analyzing the Knowledge Commons: a chapter from Understanding Knowledge as a Commons: from Theory to Practice." (2005).

67. R. Price, G. Shanks, 2005. “A Semiotic Information Quality Framework: development and comparative analysis”, Journal of Information Technology 20: 88-102; S.E. Madnick, R.Y. Wang, Y.W. Lee, H. Zhu, 2009. “Overview and Framework for Data and Information Quality Research”, ACM Journal of Data and Information Quality 1(2) 1-22; Altman, Micah. "Mitigating threats to data quality throughout the curation lifecycle." in G. Marciano, C. Lee, and H. Bowden, Curating For Quality: Ensuring Data Quality to Enable New Science. National Science Foundation, Arlington County, VA (2012): 1-119.

68.  See, for an introduction: John Loomis, "Valuing Environmental and Natural Resources: The Econometrics of Non-Market Valuation," American Journal of Agricultural Economics, Volume 87, Issue 2, 1 May 2005, Pages 529–530.

69.   Markowitz, Harry. "Portfolio selection." The journal of finance 7, no. 1 (1952): 77-91.

70.  See for a review of recommendation system algorithms: Park, Deuk Hee, Hyea Kyeong Kim, Il Young Choi, and Jae Kyeong Kim. "A literature review and classification of recommender systems research." Expert Systems with Applications 39, no. 11 (2012): 10059-10072.

71. Andronis, Christos, Anuj Sharma, Vassilis Virvilis, Spyros Deftereos, and Aris Persidis. "Literature mining, ontologies and information visualization for drug repurposing." Briefings in bioinformatics 12, no. 4 (2011): 357-368.

72. See for example Farid, Hany. "Image forgery detection." IEEE Signal processing magazine 26, no. 2 (2009): 16-25; Parrish, Debra, and Bridget Noonan. "Image manipulation as research misconduct." Science and Engineering Ethics 15, no. 2 (2009): 161-167; Engels, Steve, Vivek Lakshmanan, and Michelle Craig. "Plagiarism detection using feature-based neural networks." ACM SIGCSE Bulletin 39, no. 1 (2007): 34-38.

73. Adams, James D., Grant C. Black, J. Roger Clemmons, and Paula E. Stephan. "Scientific teams and institutional collaborations: Evidence from US universities, 1981–1999." Research policy 34, no. 3 (2005): 259-285; Altman, Micah, and Marguerite Avery. "Information wants someone else to pay for it: laws of information economics and scholarly publishing." Information Services & Use 35, no. 1-2 (2015): 57-70.

74.  Abbasi, Alireza, Jörn Altmann, and Liaquat Hossain. "Identifying the effects of co-authorship networks on the performance of scholars: A correlation and regression analysis of performance measures and social network analysis measures." Journal of Informetrics 5, no. 4 (2011): 594-607.

75.  See, for example: Wang, H., Wang, N. and Yeung, D.Y., 2015, August. Collaborative deep learning for recommender systems. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1235-1244). ACM.

76.  Diaz, Alejandro. "Through the Google goggles: Sociopolitical bias in search engine design." In Web search, pp. 11-34. Springer, Berlin, Heidelberg, 2008; Lazer, D., Kennedy, R., King, G. and Vespignani, A., 2014. The parable of Google Flu: traps in big data analysis. Science, 343(6176), pp.1203-1205.

77.  Ohm, Paul. "Broken promises of privacy: Responding to the surprising failure of anonymization." UCLA L. Rev. 57 (2009): 1701.

78.  See Phillips, P. Jonathon, Fang Jiang, Abhijit Narvekar, Julianne Ayyad, and Alice J. O'Toole. "An other-race effect for face recognition algorithms." ACM Transactions on Applied Perception (TAP) 8, no. 2 (2011): 14; and for evidence of severity and ubiquity of the problem see White, D., Dunn, J.D., Schmid, A.C. and Kemp, R.I., 2015. Error rates in users of automatic face recognition software. PLoS One, 10(10), p.e0139827; Garvie, Clare. The perpetual line-up: Unregulated police face recognition in America. Georgetown Law, Center on Privacy & Technology, 2016.

79.  See, for example: Sweeney, L., 2013. Discrimination in online ad delivery. Queue, 11(3), p.10; Sadler, Bess, and Chris Bourg. 2015. "Feminism and the future of library discovery." Code4Lib 10; Noble, Safiya Umoja. Algorithms of Oppression: How search engines reinforce racism. NYU Press, 2018. Eubanks, V., 2018. Automating inequality: How high-tech tools profile, police, and punish the poor. St. Martin's Press.

80.  For the difficulties of establishing authenticity of web-based information see: Aturban, M., Nelson, M.L. and Weigle, M.C., 2017. Difficulties of Timestamping Archived Web Pages. arXiv preprint arXiv:1712.03140; and for the surprising difficulties associated with the much simpler task of verifying that numerical data has been unaltered see: Altman, Micah. "A fingerprint method for scientific data verification." In Advances in Computer and Information Sciences and Engineering, pp. 311-316. Springer, Dordrecht, 2008.

81.  Moosavi-Dezfooli, Seyed-Mohsen, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. "Universal adversarial perturbations." 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017.

82.  For example, recommender systems have become ubiquitous in discovery but generally fail to protect information privacy. Both new algorithm development and careful analysis are necessary to develop recommendation algorithms that preserve information privacy, see: McSherry, Frank, and Ilya Mironov. "Differentially private recommender systems: Building privacy into the netflix prize contenders." In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 627-636. ACM, 2009. For a review of challenges in protecting privacy in algorithmic systems and tensions with transparency and open data requirements see Altman, Micah, Alexandra Wood, David R. O’Brien, and Urs Gasser. "Practical approaches to big data privacy over time." International Data Privacy Law (2018); and Altman, Micah, Alexandra Wood, David R. O'Brien, Salil Vadhan, and Urs Gasser. "Towards a modern approach to privacy-aware government data releases." Berkeley Tech. LJ 30 (2015): 1967.

83.  See for example Pasquali, Matias. "Video in science: Protocol videos: the implications for research and society." EMBO reports 8, no. 8 (2007): 712-716.;  and

84.  Lithgow, Gordon J., Monica Driscoll, and Patrick Phillips. "A long journey to reproducible results." Nature News 548.7668 (2017): 387.

85.  Moss-Racusin, Corinne A., John F. Dovidio, Victoria L. Brescoll, Mark J. Graham, and Jo Handelsman. "Science faculty’s subtle gender biases favor male students." Proceedings of the National Academy of Sciences 109, no. 41 (2012): 16474-16479.

86.  See for example an analysis of the theoretical and practical challenges of capturing oral history and ‘folkloric’ information generally: Owens, Trevor. The Theory and Craft of Digital Preservation, Johns Hopkins University Press (2018).

87.  See: Ostrom, E. and C. Hess, 2007, Understanding knowledge as a commons: From theory to practice. Massachusetts Institute of Technology Press.

88.  Altman, et al., 2015, National Agenda for Digital Stewardship, National Digital Stewardship Alliance. <http://ndsa.org/national-agenda/>

89.  Becker, S. Adams, Michele Cummins, A. Davis, A. Freeman, C. Hall Giesinger, V. Ananthanarayanan, K. Langley, and N. Wolfson. NMC horizon report: 2017 library edition. The New Media Consortium, 2017.

90.  See for reviews: Altman, Micah. "Open source software for Libraries: from {Greenstone} to the {Virtual Data Center} and beyond." IASSIST Quarterly 25 (2002) and Lesk, Michael. "A personal history of digital libraries." Library Hi Tech 30, no. 4 (2012): 592-603. Also see, for recent work, the Code4Lib journal archives at <https://journal.code4lib.org/issues>

91.  (Citation needed)

92.  See for an analysis, Altman, Micah, and Marguerite Avery. "Information wants someone else to pay for it: laws of information economics and scholarly publishing." Information Services & Use 35, no. 1-2 (2015): 57-70; and for a recent market description: Rob Johnson, Anthony Watkinson, Michael Mabe. The STM report: An overview of scientific and scholarly journal publishing, 5th edition. (2018) International Association of Scientific, Technical and Medical Publishers.

93.  (Citation needed)

94.  (Citation needed)

95.  (Citation needed)

96.  Ayyagari, Ramakrishna. "An exploratory analysis of data breaches from 2005-2011: Trends and insights." Journal of Information Privacy and Security 8, no. 2 (2012): 33-56.

97.  See Ohm, Paul. "Broken promises of privacy: Responding to the surprising failure of anonymization." UCLA L. Rev. 57 (2009): 1701; Altman, Micah, Alexandra Wood, David R. O’Brien, and Urs Gasser. "Practical approaches to big data privacy over time." International Data Privacy Law (2018).

98.  (Citation needed)

99.  More information about these organizations can be found on their web pages: https://www.clir.org/about/ ; https://ndsa.org/ ; https://diglib.org; https://www.rd-alliance.org/ ; https://sparcopen.org/ ; and  http://www.codata.org/

100.   More information about these organizations can be found on their web pages: https://duraspace.org/ ; https://dataverse.org/ ; https://www.dpn.org/ ; https://cos.io/

101.  See respectively ndsa.org, longnow.org, dpconline.org, and reports produced by these organizations, such as N. Beagrie, M. Jones. Digital Preservation Handbook. Digital Preservation Coalition, 2009; Brand, Stewart. The clock of the long now: Time and responsibility. Basic Books, 2008.

102.  See Graham, Mark, and Anasuya Sengupta. "We’re all connected now, so why is the internet so white and western?" The Guardian 5 (2017).