Every script we've written so far has been completely self-contained: each time we run one, we get exactly the same output. The power of programming lies in taking the same piece of code and applying it to different data to get different results. One common way to do this is to write a script which can analyse a data file. To do that we need to learn how to open files.
The simplest thing we can do with files is read a file in and print it to the screen. Make a new script called file.py and put the following in it:
with open("file.py") as f:
    for line in f:
        print(line, end="")
When you run it with:
python file.py
you will see (somewhat recursively) the contents of the file file.py itself.
There are a few new things here so let's go through them in turn. The first thing is to open the file. You open files using the open function. The part open("file.py") says to open the file file.py. This returns a file handle which is assigned to the variable f. If the file does not exist, or is not readable, then the script will exit with an error (have a try and see what the error looks like!). The use of a with statement means that when the code inside the with block has finished running the file will be closed automatically.
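To see the automatic closing in action, the following minimal sketch checks the file handle's closed attribute inside and after the with block. It assumes a throwaway file called example.txt (a hypothetical name, created in the first lines only so that the sketch runs on its own); opening a file with "w" to write to it is used here purely as setup.

```python
# Setup only: create a small file so the example can run on its own.
# "example.txt" is a hypothetical file name chosen for this sketch.
with open("example.txt", "w") as f:
    f.write("hello\n")

with open("example.txt") as f:
    inside = f.closed   # False: the file is still open inside the block

after = f.closed        # True: leaving the block closed the file

print(inside)
print(after)
```

The variable f still exists after the block, but the file it refers to has been closed, so it can no longer be read from.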
In the next line (for line in f:) we are looping over the lines of the file. This loop looks just like those we used when looping over lists a few chapters previously. When looping over a list you get each of the elements in turn but when looping over an open file you get each of the lines in turn. We assign the string containing the line from the file to the variable line.
Finally, we print the string line. Each line in the file already ends with a "new-line" character so when it is printed, it will print the new-line too. By default the print function will also add its own new-line so we disable that by using end="".
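One way to convince yourself that each line carries its own new-line is to look at it with the built-in repr function, which shows special characters explicitly. The sketch below is illustrative only: it assumes a hypothetical two-line file example.txt, created in the first lines purely as setup.

```python
# Setup only: "example.txt" is a hypothetical file used for this sketch.
with open("example.txt", "w") as f:
    f.write("first\nsecond\n")

with open("example.txt") as f:
    for line in f:
        print(repr(line))    # shows 'first\n' then 'second\n'
        print(line, end="")  # prints the text with no extra blank line
```

If you remove end="" from the second print call, a blank line should appear after each line of text, because both the string and print contribute a new-line.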
Simply reading the data and printing it isn't very useful. Let's take a first step towards some data analysis and pretend that the task we're trying to do is to read in data from the file and add 17 to each value.
with open("data.txt") as f:
    for line in f:
        new_number = line + 17  # Here is where we do our "data analysis"
        print(new_number, end="")
If you edit file.py to contain this code and run it you should see an error:
Traceback (most recent call last):
  File "file.py", line 3, in <module>
    new_number = line + 17
TypeError: can only concatenate str (not "int") to str
This is telling us that there is an error occurring when trying to add 17 to the data read in from the line in the file. The type of the error is TypeError which tells us the problem is likely due to incorrect data types (i.e. string, float, int, list etc.). The error message says can only concatenate str (not "int") to str which implies that the computer believes that we're trying to concatenate (join together) something with a string. The only two things involved in this operation are line and 17. We know that 17 is an integer so line must be a string!
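A quick way to confirm this reasoning is to ask Python for the type of each value using the built-in type function. This is a small sketch using a hand-written string that stands in for a line read from a file; the value "12\n" is made up for illustration.

```python
line = "12\n"          # stands in for a line read from a file (made-up value)

print(type(line))      # <class 'str'>
print(type(17))        # <class 'int'>

# Adding two strings joins them together, which is why the error
# message talks about concatenation.
joined = "12" + "17"
print(joined)          # 1217
```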
When reading from a file like this, everything it gives you will always be a string, even if the string only contains digits like "12". If we know that the file only contains integers then we can convert each number as it comes in using the int function. Also, since we're now printing integers, we no longer need the end="" tweak:
with open("data.txt") as f:
    for line in f:
        number = int(line)  # Here we do the type conversion
        new_number = number + 17  # Here is where we do our "data analysis"
        print(new_number)
Running this new script will now print out our "processed" data:
python file.py
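It may be surprising that int copes with the new-line at the end of each line. When converting a string, int ignores any whitespace (including new-lines) at the start and end, as this small sketch with made-up values shows.

```python
number = int("12\n")       # the trailing new-line is ignored
print(number)              # 12
print(number + 17)         # 29

padded = int("  40 \n")    # surrounding spaces are ignored too
print(padded)              # 40
```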
Edit file.py to multiply the data by 10 instead of adding 17.
[...] += like:
num = 3
num += 4
print(num)  # `num` will now be 7
[...] data.txt. Add an if statement to fix it.
[...] "sum", "count" and "mean".