Resources
R resources
- Downloading R and RStudio: homepage of the R Project.
Using R for Optical Character Recognition (OCR)
- Setting up a Google Storage bucket: Thomas Hegghammer’s walkthrough.
- How to use daiR to process big file batches: Erik Skare’s example.
Data visualization
- How to create a timeline with gg_vistime: Jacob Høigilt’s example. The example can be used with this .csv file. See also the tutorial for the gg_vistime package.
Creating maps
- How to create a world map, zoom in on a region, and add country names etc.: Jacob Høigilt’s example.
Python resources
Installing and running Python and Anaconda
- How to install Python and Anaconda, create environments and install packages, and execute Python scripts: Erik Skare’s quick guide.
Text preprocessing
We usually need to preprocess our text before analysis (tokenization, stopword removal, and lemmatization). Although R is one of our favourite tools for text mining and analysis, Python has several packages that are better suited to language-specific preprocessing:
- Norwegian text preprocessing with spaCy: Erik Skare’s example.
- Arabic text preprocessing with Camel Tools: Thomas Hegghammer’s example.
- Persian text preprocessing with Hazm: Erik Skare’s example.
- Turkish text preprocessing with Zemberek and Zeyrek: Erik Skare’s example.
- Chinese text preprocessing with Jieba: Erik Skare’s example.
- Japanese text preprocessing with Janome: Erik Skare’s example.
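The three preprocessing steps named above can be sketched in a few lines of plain Python. This is only a toy illustration, not the method used in any of the linked examples: the stopword set, the lemma lookup table, and the function names are all hypothetical and far too small for real work, where a language-specific package such as those listed above would supply them.

```python
import re

# Hypothetical, deliberately tiny resources for illustration only.
# Real projects would take these from a language-specific package.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "were", "was", "is", "are"}
LEMMAS = {"archives": "archive", "texts": "text", "digitized": "digitize"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword set."""
    return [t for t in tokens if t not in STOPWORDS]


def lemmatize(tokens):
    """Map each token to its base form if the lookup table knows it."""
    return [LEMMAS.get(t, t) for t in tokens]


def preprocess(text):
    """Run tokenization, stopword removal, and lemmatization in order."""
    return lemmatize(remove_stopwords(tokenize(text)))


print(preprocess("The texts in the archives were digitized."))
# prints ['text', 'archive', 'digitize']
```

A lookup table like this breaks down quickly for morphologically rich languages, which is one reason dedicated packages are worth the setup.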
Markdown resources
- Markdown with Zotero and Better BibTeX: Jacob Høigilt’s example.
- Writing markdown in Quarto: Jacob Høigilt’s markdown script.
- Modifying VSCodium through its settings.json file: Erik Skare’s example.
A quick guide for High Performance Computing
High Performance Computing (HPC) is becoming increasingly important as we process and analyze ever larger amounts of data and run more complex calculations on them. HPC uses clusters of powerful processors that work in parallel at extremely high speeds. The University of Oslo has its own HPC cluster called Fox.
- How to run Python and R scripts on Fox: Erik Skare’s quick guide.