Modern Web Archiving Technologies
https://doi.org/10.20913/1815-3186-2024-3-28-37
Abstract
The idea of web archiving, pioneered in 1996 as a way to preserve web content for future researchers, remains important in the 21st century. This is evident from the large number of web archives, the ongoing development of web archiving software and tools, growing awareness of initiatives to preserve internet resources, and legislative changes in some countries aimed at providing access to historical web content. The purpose of the study is to identify web archiving technologies that contribute to the preservation of web content at the global, national and local levels, as well as in the formation of a wide range of thematic collections. The study identifies trends in the development of web archives, approaches to structuring the web archive system so that work with archives can be organized more efficiently, and the stages and methods of web archiving that make it possible to complete the full preservation cycle: collecting, saving, providing access, distributing, and evaluating the results. It concludes that the further development of web archives should take into account the standards for collecting, preserving and providing long-term access to web content recommended by the International Internet Preservation Consortium, as well as modern web archiving tools (e.g. open-source software). These expand the capabilities and functionality of web archives as sources for finding open information, gaining new knowledge, restoring lost information, and verifying previously published data, which often has great cultural, scientific, educational, artistic and social significance.
About the Author
N. S. Redkina
Russian Federation
Natalya S. Redkina - Doctor of Pedagogical Sciences, Head of the Department of Scientific Research of Open Science.
15 Voskhod St., Novosibirsk, 630102
Review
For citations:
Redkina N.S. Modern Web Archiving Technologies. Bibliosphere. 2024;(3):28-37. (In Russ.) https://doi.org/10.20913/1815-3186-2024-3-28-37