Setting up, and working with, Python

Author

Erik Skare

Published

March 18, 2023

Digital Area Studies (DAS) has published several resources that require Python (primarily for text preprocessing). Although Python can be pretty straightforward, some of its features (such as creating environments) can be quite confusing if you are used to R and its structure. This quick guide will hopefully enable you to install Python and Anaconda, create environments, install packages, and execute Python scripts so you can preprocess whatever text corpora you work with.

What is Python and Anaconda?

Briefly explained, Python is a popular high-level programming language used for a wide range of applications, including scientific computing, data analysis, web development, and more. It has a simple and easy-to-learn syntax and is known for its readability and vast community support.

Anaconda, on the other hand, is a distribution of Python that includes the core Python language, commonly used packages for scientific computing, data analysis, and machine learning, as well as a package manager called Conda. This quick guide uses Anaconda to create environments, install packages, and run Python scripts because it provides a convenient way to install and manage Python packages and their dependencies.

The main differences between Python and Anaconda are as follows:

  1. Python is a programming language, while Anaconda is a distribution of Python that includes additional packages and tools for data science.

  2. Python can be downloaded and installed independently, while Anaconda is a complete platform that includes Python and other packages used in data science.

  3. Anaconda includes Conda, a package and environment manager, which can be used to create and manage isolated environments and install packages.

Installing Python

On a laptop from the University of Oslo

Installing Python and Anaconda takes only a few steps if you are working on a laptop from the University of Oslo, as both are provided through the Software Center. First, check if Python is already installed on your computer by running python --version in your command prompt (note that python -v with a lowercase v does not print the version; it starts Python in verbose mode). If Python is not found, follow the steps below:

  1. Open UiO’s Software Center and find Python (version per March 24, 2023, is Python 3.10.4). Install Python as you would with any other program from the Software Center.

  2. The same applies to Anaconda (which we will use below to create environments, install packages, and run Python scripts). Install Anaconda as you would with any other program from the Software Center (version per March 24, 2023, is Anaconda3, Inc 2022.05).

You should now have installed all necessary software for running Python scripts.

On your personal laptop

To install Python on your personal laptop:

  • First check whether Python is already installed by running python --version in your command prompt. If it is not, download it from Python’s homepage and install it as you would with any other software. Check that Python has been properly installed by running python --version again. Most Linux distributions come with Python pre-installed (the command may be python3 rather than python); recent versions of macOS may not.
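The version check can also be made from inside Python itself, which is handy later on when you want to confirm which interpreter a script is actually running on. The following is a minimal sketch using only the standard library; the function name describe_python is made up for this illustration.

```python
import sys


def describe_python() -> str:
    """Return the running interpreter's version as 'major.minor.micro'."""
    info = sys.version_info
    return f"{info.major}.{info.minor}.{info.micro}"


# Print the version and the location of the interpreter that runs this code.
print("Python version:", describe_python())
print("Interpreter path:", sys.executable)
```

If the interpreter path points somewhere unexpected, a different Python installation than the one you intended is being used.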

Installing Anaconda

Installing Anaconda on Windows or Mac

  • Download the Anaconda installer for Windows or MacOS and install it as you would with any other software.

Installing Anaconda on Linux OS

  1. Download the Anaconda installer for Linux. Alternatively, if you are looking for a specific version of Anaconda, you can find it in the Anaconda Archive and download it with the command curl -o anaconda.sh [https://url.to.your.preferred.version.of.anaconda], which saves the installer as anaconda.sh in the current folder. If curl is not yet installed, install it first (on Debian or Ubuntu, by running sudo apt install curl).

  2. (Recommended) Open your terminal and verify the installer’s data integrity with SHA-256. The SHA-256 hash is a sequence of numbers and letters computed from the file’s contents, which you can use to check that your downloaded copy is identical to the original. Run shasum -a 256 /path/to/installer/filename and compare the output with the hash published for that installer.

  3. Install Anaconda by running bash /path/to/installer/filename. Let’s say we downloaded Anaconda3-2020.05-Linux-x86_64.sh to the Downloads folder. We would then run bash ~/Downloads/Anaconda3-2020.05-Linux-x86_64.sh.

  4. You will receive the license agreement once you press Enter. Review and scroll through the agreement by pressing and holding Enter.

  5. Type yes to accept the agreement.

  6. You must now choose the installation path of Anaconda. (Recommended) Press Enter to accept the default location, or enter another file path to specify an alternate installation directory. If you accept the default installation directory, Anaconda will be installed in /home/USER/anaconda3. The installation will take a couple of minutes.

  7. You will be asked whether to initialize Anaconda Distribution by running conda init. Type yes.

  8. The installer should now finish and display Thank you for installing Anaconda3!.
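The integrity check in step 2 can also be reproduced in Python, which may help if shasum is not available on your machine. This is a minimal sketch using the standard hashlib module; the function names and the example path in the comment are assumptions made for illustration, and the expected hash must be copied from wherever the installer's hash is published.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so large installers do not fill up memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex) -> bool:
    """Compare a file's digest with a published hash, ignoring case and spaces."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Hypothetical usage (path and hash are placeholders, not real values):
# matches_published_hash("/home/USER/Downloads/anaconda.sh", "<published hash>")
```

If the function returns False, the download is incomplete or has been altered, and the installer should be downloaded again rather than run.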

Setting up an environment

In order to run a Python script, it is recommended that you first create an environment in which you install all necessary packages (we will in this quick guide assume that you want to use Python to preprocess Chinese corpora). Essentially, a Python environment is an isolated folder structure. There are two main reasons to work with environments:

  • Avoid system pollution: If you install packages to your operating system’s global Python, they may interfere with packages that the system itself relies on (not good). Moreover, if you subsequently update your operating system, then you might lose your packages as they could be overwritten in the process.

  • Avoid dependency conflicts: Software dependency is, briefly summarized, the relationship between software components where one component relies on the other to work properly. So if your program uses a library to query a database, then that program “depends” on that library and will not function as intended if the library (the dependency) becomes unavailable. The same applies to the Python packages that we wish to install. More often than not, they will require a number of dependencies to work properly. When we install packages in environments it is, partly, to avoid what is called “Dependency Hell”: a situation in which the dependencies of the various packages we have installed become difficult to manage, resulting in conflicts, errors, and other problems that make it hard to build, deploy, or maintain the software.
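To make the idea of a dependency more concrete, the sketch below asks Python which other packages an installed package declares that it needs. It relies on the standard importlib.metadata module (Python 3.8 or later); the function name declared_dependencies is invented for this example, and the output depends entirely on what is installed in the Python you run it with.

```python
from importlib.metadata import PackageNotFoundError, requires


def declared_dependencies(package: str) -> list:
    """Return the requirement strings a package declares, or [] if none."""
    try:
        # requires() returns None when a package declares no dependencies.
        return requires(package) or []
    except PackageNotFoundError:
        # The package is not installed in the current Python.
        return []


# pandas is used here only because it appears later in this guide.
for requirement in declared_dependencies("pandas"):
    print(requirement)
```

Each printed line is a package (often with a version constraint) that must be present for pandas to work, which is exactly the kind of relationship an environment keeps contained.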

Creating a virtual environment

  1. Open your terminal/command prompt.

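  2. Run conda create --name [preferred name of environment]. For example, conda create --name china. You can also request a Python version when creating the environment, for example conda create --name china python=3.10, so that the environment has its own interpreter from the start.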

  3. In order to work with the environment, you will need to activate it by running conda activate [name of environment]. In this case, conda activate china. To deactivate the environment later, simply run conda deactivate.

You may remember that an environment is an isolated folder structure. You can thus find all environments in the envs folder inside your Anaconda installation. If you accepted the default installation directory on Linux, that is ~/anaconda3/envs, and the file path to your china environment would hence be ~/anaconda3/envs/china.
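Because an environment is just a folder, Python can report which one it is running from. The following sketch prints the environment name that conda sets in the CONDA_DEFAULT_ENV variable on activation, together with the folder and interpreter in use; the function name active_environment is an assumption for illustration.

```python
import os
import sys
from pathlib import Path


def active_environment() -> dict:
    """Collect the environment name, its folder, and the interpreter path."""
    return {
        # Set by conda on activation; None when no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Root folder of the Python installation or environment in use.
        "prefix": Path(sys.prefix),
        # Full path of the interpreter executing this code.
        "interpreter": Path(sys.executable),
    }


for key, value in active_environment().items():
    print(f"{key}: {value}")
```

With the china environment active, the prefix should end in envs/china; if it points at the base installation instead, the environment was not activated in that terminal.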

Installing packages

  1. In order to install packages in your preferred environment, you will need to activate it first. Run conda activate china. You should see (base) [username]:~$ (on Linux OS) change into (china) [username]:~$.

  2. As you can see in the Chinese preprocess pipeline, we need the following packages: pandas, requests, jieba, and opencc. You can install them with two different commands: conda install [package name] and pip install [package name]. If a package is available from Anaconda.org, then you can simply run conda install. If not, then run pip install. In this case, pandas and requests are available from Anaconda, while jieba and opencc are not. So we will first run conda install pandas requests and then pip install jieba opencc (you can install several packages with one command by simply listing them). If you want to install a specific version of a package, then run conda install [package name]=[package version].

  3. To check what packages are installed in your environment, run conda list.
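As a complement to conda list, you can ask Python directly whether the packages can be imported from the active environment. This is a minimal sketch; it assumes that each package is imported under the same name it is installed under (pandas, requests, jieba, opencc), and the function name missing_packages is made up for this example.

```python
from importlib.util import find_spec

# Import names assumed to match the package names used in this guide.
REQUIRED = ("pandas", "requests", "jieba", "opencc")


def missing_packages(names=REQUIRED) -> list:
    """Return the names that cannot be found by the current interpreter."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Running this with the wrong environment active will typically list all four packages as missing, which is a quick way to spot a forgotten conda activate.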

Running a Python script

  1. This part assumes that you have installed Python and Anaconda, that you have created an environment, installed all required packages, and that your Python script and the text file you want to preprocess are located in the same folder.

  2. Activate your environment by running conda activate china.

  3. Navigate to the folder with the text file and the Python script by running cd /full/path/to/folder/with/python/script (“cd” stands for “change directory”). If done correctly, then you should see (china) [username]:~$ change into (china) [username]:~/full/path/to/folder/with/python/script$.

  4. Execute your Python script by running the command python [name of your python script].py. In this case, we will assume your Python script is called chinese_nlp.py and we will thus execute the script by running python chinese_nlp.py.
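The reason for navigating to the folder first is that a script which opens a file by its bare name looks for it in the current working directory, not in the folder where the script is stored. The sketch below shows how such a name is resolved; the file name corpus.txt and the function name resolve_from_cwd are hypothetical and not taken from the DAS scripts.

```python
from pathlib import Path


def resolve_from_cwd(filename: str) -> Path:
    """Return the absolute path that a relative file name refers to right now."""
    return Path.cwd() / filename


# corpus.txt is a placeholder for the text file you want to preprocess.
target = resolve_from_cwd("corpus.txt")
print(f"{target} exists: {target.exists()}")
```

If the printed path is not where your text file lives, change directory with cd before running the script, or pass the script a full path instead.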

That should, theoretically, be it!