Can you archive my website?

Guidance on archiving University websites. 


Answer

We are developing our services to support the long-term preservation of digital records. This FAQ provides advice for members of the University responsible for websites that are reaching the end of their lifecycle, or for which an occasional snapshot of part or all of the site needs to be preserved.

What is web archiving? 

According to The National Archives, web archiving is 'the process of collecting websites and the information that they contain from the World Wide Web, and preserving these in an archive.' A website might be preserved because it contains information that is of long-term value that extends beyond the original purpose for which it was created. There may be a benefit to the University, researchers and the public in ongoing access to some web content. Examples might include webpages or websites containing institutional records, news articles, or research outputs that form part of the scholarly record.

Can my website be preserved in the University Archives? 

At present, archiving of web content in the University Archives is restricted to pages on the main corporate University of Leicester website, published in Sitecore and previous web platforms (including Plone and CWIS). Our current preservation tools enable us to capture individual webpages and collections of webpages, but do not support large-scale web archiving. We cannot yet capture snapshots of the entire University website, although we are working towards this. We are also unable to provide public access to the web content that we capture. Pages are collected for preservation in our digital preservation system and for future research access.

How can I archive a website from a research project? 

The Leicester Research Archive (LRA) is used to manage, share, store, and preserve institutional open access research outputs, or publications, and final research data outputs. It is not suitable for archiving websites or webpages, but can preserve and provide access to datasets underpinning a website. For example, if your research output includes a searchable database presented via a website, the research data can often be packaged and licensed for reuse via the LRA. This should be considered as part of the data management plan for your project.

Legacy websites containing scholarly information generated by research projects may be candidates for preservation by external web archiving services. This is recommended where the content from the site can no longer be maintained. For example, as project staff leave the University there may be a lack of resource to continue to update the website or migrate it onto new platforms as technology changes. 

Submitting a website to the UK Web Archive

The UK Web Archive is a collaboration between the UK Legal Deposit Libraries. It aims to collect all UK websites at least once a year, so your site may have been archived already. This is done through automated 'web crawls' retrieving websites identified as being published in the UK. Due to the scale of the task, many websites are missed. To recommend your website for preservation, complete the Save a UK website form.

Archived websites can be accessed in the reading rooms of the Legal Deposit Libraries. For your site to be publicly accessible through the online UK Web Archive, you will need to complete a permission form. There is usually a backlog of content to be made available, so this may take some time. To find out whether your website has been archived and how it can be accessed, use the Collection archive search page.

Internet Archive Wayback Machine

The Internet Archive, a US not-for-profit organisation, also harvests and provides online access to archived websites. To find out whether your pages or site have been archived, you can search via the Wayback Machine. Anyone can submit a webpage or collection of webpages to be captured via the Internet Archive's Save Page Now form.
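If you need to check many pages, the lookup described above can also be scripted. The following is a minimal sketch in Python, assuming the Internet Archive's public Wayback Machine availability endpoint (https://archive.org/wayback/available) and the JSON response shape it documents. The function names are hypothetical and chosen for illustration only; this is not a tool provided or supported by the University.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Internet Archive's public availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build the lookup URL for a page (hypothetical helper).

    timestamp is optional, in YYYYMMDD form, and asks for the
    snapshot closest to that date.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return the closest snapshot URL from a decoded response, or None.

    Assumes the response shape:
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    """
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check_archived(page_url, timestamp=None):
    """Query the endpoint over the network and return a snapshot URL or None."""
    lookup_url = build_availability_url(page_url, timestamp)
    with urlopen(lookup_url, timeout=30) as response:
        return closest_snapshot(json.load(response))
```

Calling check_archived("le.ac.uk") would return the address of the closest archived copy if one exists, or None if the page has not been captured. A result of None only means no snapshot was found by this service; it says nothing about whether the UK Web Archive holds a copy.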

Difficult-to-archive web content

Some web content cannot be easily captured by web archiving tools and services. Examples include:

  • Embedded content such as video and audio hosted on external streaming services. 
  • Database-driven websites. 
  • Content hosted on SharePoint or otherwise requiring a login to access.

Research data management planning considerations

Due to the complexities and challenges associated with preserving and providing access to archived websites, we recommend that researchers consider the long-term sustainability of any web resources as part of the data management plan for their project. It is recommended that websites that comprise a research output be submitted to the UK Web Archive at the end of the project and that any research data outputs be deposited with the Leicester Research Archive.

Contacts

For advice and support with archiving University of Leicester corporate webpages, contact specialcollections@le.ac.uk

For guidance on research data management and preserving datasets underpinning research publications, contact researchdata@le.ac.uk

 

  • Last Updated Jan 09, 2023
  • Answered By Simon Dixon
