Setting up, and working with, Python
Digital Area Studies (DAS) has published several resources that require Python (primarily for text preprocessing). Although Python can be fairly straightforward, some of its features (such as environments) can be confusing if you are used to R and its structure. This quick guide should enable you to install Python and Anaconda, create environments, install packages, and execute Python scripts so you can preprocess whatever text corpora you work with.
What are Python and Anaconda?
Briefly explained, Python is a popular high-level programming language used for a wide range of applications, including scientific computing, data analysis, web development, and more. It has a simple and easy-to-learn syntax and is known for its readability and vast community support.
Anaconda, on the other hand, is a distribution of Python that includes the core Python language, commonly used packages for scientific computing, data analysis, and machine learning, as well as a package manager called Conda. This quick guide uses Anaconda to create environments, install packages, and run Python scripts because it provides a convenient way to manage Python packages and their dependencies.
The main differences between Python and Anaconda are as follows:
- Python is a programming language, while Anaconda is a distribution of Python that includes additional packages and tools for data science.
- Python can be downloaded and installed independently, while Anaconda is a complete platform that includes Python and other packages used in data science.
- Anaconda includes Conda, a package and environment manager, which can be used to create and manage isolated environments and install packages.
Installing Python
On a laptop from the University of Oslo
Installing Python and Anaconda takes only a few steps if you are working on a laptop from the University of Oslo, as both are provided through the Software Center. First, check if Python is already installed on your computer by running python --version in your command prompt. If Python is not found, follow the steps below:
- Open UiO’s Software Center and find Python (version as of March 24, 2023: Python 3.10.4). Install Python as you would any other program from the Software Center.
- The same applies to Anaconda, which we will use below to create environments, install packages, and run Python scripts. Install Anaconda as you would any other program from the Software Center (version as of March 24, 2023: Anaconda3 2022.05).
You should now have installed all necessary software for running Python scripts.
On your personal laptop
To install Python on your personal laptop:
- If Python is not already installed (check by running python --version in your command prompt), download it from Python’s homepage and install it as you would any other software. Check that Python has been properly installed by running python --version again. Most Linux distributions come with Python pre-installed (the command may be python3 rather than python); recent versions of macOS may not, so check first.
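Once Python starts, you can also confirm from inside Python which version and which interpreter you are actually running. The following is a minimal sketch using only the standard library; the function name describe_interpreter is a hypothetical helper introduced here for illustration.

```python
import sys


def describe_interpreter():
    """Return the running Python version and the path to its executable."""
    # sys.version_info holds (major, minor, micro, ...) of this interpreter
    version = ".".join(str(part) for part in sys.version_info[:3])
    # sys.executable is the full path of the interpreter that runs this code
    return f"Python {version} at {sys.executable}"


print(describe_interpreter())
```

This is mostly useful later on, when several Python installations (system Python, Anaconda, individual environments) exist side by side and you want to know which one answered.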
Installing Anaconda
Installing Anaconda on Windows or Mac
- Download the Anaconda installer for Windows or MacOS and install it as you would with any other software.
Installing Anaconda on Linux OS
- Download the Anaconda installer for Linux. Alternatively, if you are looking for a specific version of Anaconda, you can find it in the Anaconda Archive and download it with the command curl -o anaconda.sh [https://url.to.your.preferred.version.of.anaconda]. If curl is not installed, install it first (on Debian or Ubuntu, by running sudo apt install curl).
- (Recommended) Open your terminal and verify the installer’s data integrity with SHA-256 (a hash, shown as a sequence of numbers and letters, that lets you check that your downloaded file is identical to the original) by running shasum -a 256 /path/to/the/installer/filename and comparing the output with the hash published for that installer.
- Install Anaconda by running bash ~/path/to/the/installer/filename. Let’s say we downloaded Anaconda3-2020.05-Linux-x86_64.sh to the Downloads folder. We would then run bash ~/Downloads/Anaconda3-2020.05-Linux-x86_64.sh.
- You will receive the license agreement once you press Enter. Review and scroll through the agreement by pressing and holding Enter.
- Type yes to accept the agreement.
- You must now choose the installation path of Anaconda. (Recommended) Press Enter to accept the default location, or enter another file path to specify an alternate installation directory. If you accept the default installation directory, Anaconda will be installed in /home/USER/anaconda3. The installation will take a couple of minutes.
- You will be prompted to initialize Anaconda Distribution by running conda init. Type yes.
- The installer should now finish and display Thank you for installing Anaconda3!
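The integrity check in the steps above can also be done in Python, which may be handy on systems without a SHA-256 command-line tool. This is a minimal sketch using the standard library's hashlib; the function names and the placeholder hash are assumptions for illustration, and the real hash has to be copied from the page where you downloaded the installer.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large installers do not have to fit in memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hash):
    """Compare a file's digest with the hash published for the download."""
    return sha256_of_file(path) == published_hash.strip().lower()


# Example usage (hypothetical values, left commented out on purpose):
# installer = "/home/USER/Downloads/Anaconda3-2020.05-Linux-x86_64.sh"
# print(matches_published_hash(installer, "<hash published for your installer>"))
```

If the comparison returns False, the download is incomplete or altered and should be fetched again before installing.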
Setting up an environment
In order to run a Python script, it is recommended that you first create an environment in which you install all necessary packages (we will in this quick guide assume that you want to use Python to preprocess Chinese corpora). Essentially, a Python environment is an isolated folder structure. There are two main reasons to work with environments:
- Avoid system pollution: If you install packages into your operating system’s global Python, they may interfere with system-relevant packages (not good). Moreover, if you subsequently update your operating system, you might lose your packages, as they could be overwritten in the process.
- Avoid dependency conflicts: Software dependency is, briefly summarized, the relationship between software components where one component relies on another to work properly. So if your program uses a library to query a database, then that program “depends” on that library and will not function as intended if the library (the dependency) becomes unavailable. The same applies to the Python packages that we wish to install. More often than not, they will require a number of dependencies to work properly. When we install packages in environments it is, partly, to avoid what is called “Dependency Hell”: a situation in which the dependencies of the various packages we have installed become difficult to manage, resulting in conflicts, errors, and other problems that make it hard to build, deploy, or maintain the software.
Creating a virtual environment
- Open your terminal/command prompt.
- Run conda create --name [preferred name of environment]. For example, conda create --name china.
- In order to work with the environment, you will need to activate it by running conda activate [name of environment]. In this case, conda activate china. To deactivate the environment later, simply run conda deactivate.
You may remember that an environment is an isolated folder structure. You can thus find all environments in the Anaconda folder. With the default installation location on Linux described above, this is ~/anaconda3/envs, and the file path to your china environment would hence be ~/anaconda3/envs/china.
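To see this isolation from inside Python, you can ask the interpreter which environment it belongs to. The sketch below assumes that Python is installed in the activated environment and relies on the CONDA_DEFAULT_ENV variable that conda sets on activation; the two function names are hypothetical helpers.

```python
import os
import sys
from pathlib import Path


def active_conda_env():
    """Return the name of the active conda environment, or None if unset."""
    # conda sets CONDA_DEFAULT_ENV when an environment is activated
    return os.environ.get("CONDA_DEFAULT_ENV")


def environment_folder():
    """Return the folder that holds the running interpreter's environment."""
    # sys.prefix points at the root folder of the current Python installation
    return Path(sys.prefix)


print("Active conda environment:", active_conda_env())
print("Environment folder:", environment_folder())
```

If the printed folder is not the one you expect, the terminal is probably still using another environment or the system-wide Python.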
Installing packages
- In order to install packages in your preferred environment, you will need to activate it first. Run conda activate china. You should see (base) [username]:~$ (on Linux OS) change into (china) [username]:~$.
- As you can see in the Chinese preprocess pipeline, we need the following packages: pandas, requests, jieba, and opencc. You can install them with two different commands: conda install [package name] and pip install [package name]. If a package is available from Anaconda.org, then you can simply run conda install. If not, then run pip install. In this case, pandas and requests are available from Anaconda, while jieba and opencc are not. So we will first run conda install pandas requests and then pip install jieba opencc (you can install several packages with one command by simply listing them). If you want to install a specific version of a package, then run conda install [package name]=[package version].
- To check what packages are installed in your environment, run conda list.
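As a complement to conda list, you can check from Python whether the packages can actually be imported in the active environment. This is a minimal sketch that assumes each package's import name equals its install name, which should hold for the four packages listed here; missing_packages is a hypothetical helper.

```python
from importlib.util import find_spec

# Packages named in this guide for the Chinese preprocessing pipeline
REQUIRED = ["pandas", "requests", "jieba", "opencc"]


def missing_packages(names):
    """Return the names that cannot be found by the running interpreter."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Running this with the wrong environment active is a quick way to notice the mistake, since the packages will be reported as not installed.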
Running a Python script
This part assumes that you have installed Python and Anaconda, that you have created an environment, installed all required packages, and that your Python script and the text file you want to preprocess are located in the same folder.
- Activate your environment by running conda activate china.
- Navigate to the folder with the text file and the Python script by running cd /full/path/to/folder/with/python/script (“cd” stands for “change directory”). If done correctly, then you should see (china) [username]:~$ change into (china) [username]:~/full/path/to/folder/with/python/script$.
- Execute your Python script by running the command python [name of your python script].py. In this case, we will assume your Python script is called chinese_nlp.py and we will thus execute the script by running python chinese_nlp.py.
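The reason for changing directory first is that a script usually looks for files relative to the folder the terminal is currently in. The sketch below illustrates this; it is not part of the DAS pipeline, and both the helper resolve_input and the file name corpus.txt are hypothetical.

```python
from pathlib import Path


def resolve_input(filename, script_file=None):
    """Return the full path of filename next to the script, or, when no
    script path is given, in the current working directory."""
    if script_file:
        # Folder that contains the script itself
        base = Path(script_file).resolve().parent
    else:
        # Folder the terminal was in when Python was started
        base = Path.cwd()
    return base / filename


# Relative to the current working directory (what a bare file name means)
print(resolve_input("corpus.txt"))
# Inside a script, resolve_input("corpus.txt", __file__) would instead
# point at the script's own folder, wherever the terminal happens to be.
```

When the script and the text file sit in the same folder and you have navigated there, both variants point at the same file.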
That should, theoretically, be it!