Python for Programmers: StringIO
Welcome to the StringIO lesson!
Python has extensive support for working with files. The details match what we see in most programming languages: we can open files, call .read on them, .close them, etc.
In addition to regular files, Python comes with in-memory objects that look and act like files, called StringIO. Unlike regular files, a StringIO never interacts with the disk. Instead, a StringIO's data is stored entirely in memory.
To use StringIO, we need to import it from the io module. In the next example, we build an empty StringIO, .write some data to its internal buffer, then get that data with .getvalue().
>
from io import StringIO
buffer = StringIO()
buffer.write("contents")
buffer.getvalue()
Result:
'contents'
Writing appends to the buffer.
>
from io import StringIO
buffer = StringIO()
buffer.write("buffer ")
buffer.write("contents")
buffer.getvalue()
Result:
'buffer contents'
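Because each write picks up where the previous one ended, a StringIO can collect output piece by piece. The following is a minimal sketch; the words list and the variable names are made up for illustration. It also shows that .write returns the number of characters it wrote.

```python
from io import StringIO

# Hypothetical data, used only for this illustration.
words = ["alpha", "beta", "gamma"]

buffer = StringIO()
counts = []
for word in words:
    # .write returns the number of characters written.
    counts.append(buffer.write(word + "\n"))

result = buffer.getvalue()
print(counts)
print(repr(result))
```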
As its name implies, StringIO only works with strings. If we try to write any other type of data, StringIO raises a TypeError.
>
from io import StringIO
buffer = StringIO()
buffer.write("You have $")
buffer.write(20)
buffer.write(" in your account")
buffer.getvalue()
Result:
TypeError: string argument expected, got 'int'
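One way around the TypeError is to convert the value to a string before writing it. This is a minimal sketch that reuses the account message from the example above; the balance variable is introduced here purely for illustration.

```python
from io import StringIO

balance = 20  # hypothetical non-string value

buffer = StringIO()
buffer.write("You have $")
buffer.write(str(balance))  # convert explicitly before writing
buffer.write(" in your account")

message = buffer.getvalue()
print(message)
```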
When we read or write data in a StringIO, it remembers our position. You can think of the position like the cursor in a text editor: it normally sits just past the last character that we read or wrote.
The position starts at 0. If we write 4 characters, the position is now 4. If we write 4 more characters, the position is now 8. We can get the current stream position by calling the .tell method.
>
from io import StringIO
buffer = StringIO()
buffer.tell()
Result:
0
>
from io import StringIO
buffer = StringIO()
buffer.write("Hello")
buffer.tell()
Result:
5
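The position arithmetic described above can be checked directly. This sketch writes two four-character strings and records the position before and after each write; the positions list is a name chosen for this example.

```python
from io import StringIO

buffer = StringIO()
positions = [buffer.tell()]  # position before any writes

buffer.write("abcd")
positions.append(buffer.tell())  # after the first 4 characters

buffer.write("efgh")
positions.append(buffer.tell())  # after 4 more characters

print(positions)
```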
We can set the stream's position with .seek(n). Seeking only moves the stream position; it doesn't change the buffer's contents. Writes start from the stream position, so moving the position backward allows us to overwrite existing data in the buffer.
>
from io import StringIO
buffer = StringIO()
buffer.write("Hello Amir")
buffer.seek(6)
buffer.write("Betty")
buffer.getvalue()
Result:
'Hello Betty'
We can specify an initial value for the StringIO's buffer by passing it as a constructor argument.
>
from io import StringIO
buffer = StringIO("Hello")
buffer.getvalue()
Result:
'Hello'
However, the buffer's stream position always starts at 0. If we don't adjust the position, any writes to the buffer will overwrite the initial value. In the next example, the .write call overwrites 4 out of the 6 characters in the buffer.
>
from io import StringIO
buffer = StringIO("Hello ")
buffer.write("Amir")
buffer.getvalue()
Result:
'Amiro '
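To add to an initial value instead of overwriting it, one option is to move the position to the end of the buffer before writing. The sketch below passes io.SEEK_END as a second argument to .seek, which goes slightly beyond the .seek(n) form used in this lesson; the variable names are chosen for illustration.

```python
import io
from io import StringIO

buffer = StringIO("Hello ")
buffer.seek(0, io.SEEK_END)  # move to the end of the existing contents
end_position = buffer.tell()

buffer.write("Amir")  # this write now lands after the initial value
appended = buffer.getvalue()
print(end_position, repr(appended))
```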
StringIOs have a .read method. Calling .read without any arguments returns everything from the current position to the end of the buffer. (Remember that the position starts at 0.)
>
from io import StringIO
buffer = StringIO("Hello Amir")
buffer.read()
Result:
'Hello Amir'
At first, .read looks like it does the same thing as .getvalue. The difference is that .read starts at the current stream position and increments the position while reading, whereas .getvalue returns everything in the buffer without changing the stream position. In the next example, the first .read call reads the entire buffer, so the second .read call gets an empty string.
>
from io import StringIO
buffer = StringIO("Hello Amir")
first_read = buffer.read()
second_read = buffer.read()
(first_read, second_read)
Result:
('Hello Amir', '')
However, .getvalue always gives us the StringIO's full internal buffer, regardless of the current position.
>
from io import StringIO
buffer = StringIO("Hello Amir")
first_read = buffer.read()
buffer_contents = buffer.getvalue()
(first_read, buffer_contents)
Result:
('Hello Amir', 'Hello Amir')
We can request a certain number of characters from .read. For example, buffer.read(2) reads up to 2 characters, then returns that string. It also increments the stream position by 2.
>
from io import StringIO
buffer = StringIO("abcd")
first_read = buffer.read(2)
second_read = buffer.read(2)
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
first_read
Result:
'ab'
- Note: this code example reuses elements (variables, etc.) defined in earlier examples.
>
second_read
Result:
'cd'
If we ask for more characters than are left in the string, .read gives us the rest of the string.
>
from io import StringIO
buffer = StringIO("Keanu")
buffer.read(9000)
Result:
'Keanu'
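Sized reads combine naturally into a loop that consumes a buffer in fixed-size pieces. The sketch below is illustrative: the chunk size of 3 and the variable names are arbitrary choices. It relies on two behaviors shown above: a short final read returns whatever is left, and a read at the end of the buffer returns an empty string. It then rewinds with .seek(0) to read everything again.

```python
from io import StringIO

buffer = StringIO("abcdefg")

chunks = []
while True:
    chunk = buffer.read(3)  # ask for up to 3 characters
    if chunk == "":         # an empty string means we've reached the end
        break
    chunks.append(chunk)

buffer.seek(0)              # rewind to the start of the buffer
reread = buffer.read()
print(chunks, repr(reread))
```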
Passing None or a negative number to .read is like calling .read without arguments.
>
from io import StringIO
buffer = StringIO("abcd")
buffer.read(-1)
Result:
'abcd'
This is unusual. Normally, Python is more strict about function arguments than other dynamic languages, so we would expect it to raise an exception here. However, the .read(-1) call is allowed, even though the negative argument probably indicates a mistake on our part. In practice, it's best to avoid negative arguments to .read.
Now let's step back and ask an important question: what's the use for in-memory file-like objects like StringIO? Why not just use regular strings?
The answer is that StringIO excels in situations where some code requires a file, but we don't want to use an actual on-disk file. For example, suppose that we're writing some automated tests for a function that reads from files. We could have the tests write an actual file to disk, then read from that file.
Unfortunately, that has drawbacks. First, writing real files to disk is much slower than working with in-memory data. In a large test suite with thousands of tests, the slowdown can add up to significant delays. Second, files raise awkward questions like "what directory should the file be in?" and "will the file be deleted if the test runner terminates unexpectedly in the middle of a test?"
With StringIO, all of these problems disappear!
>
from io import StringIO
# This is the function that we'll test.
def file_mentions_ms_fluff(the_file):
    return "Ms. Fluff" in the_file.read()
# Test the function.
assert file_mentions_ms_fluff(StringIO("This file is about Ms. Fluff."))
assert not file_mentions_ms_fluff(StringIO("This file is about Keanu."))
Result:
The asserts didn't raise any exceptions, so we know that our tiny suite of two tests passed. We didn't have to create any actual files to test our function, even though our function is designed to work with real files.
Most of the StringIO methods discussed here also exist on regular files: .read, .write, .tell, and .seek. That's why StringIO is such a good stand-in for real files: it has the same methods, and they behave in the same way! The exception is .getvalue, which is specific to StringIO.
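The same idea works for code that writes to files. In the sketch below, write_greeting is a hypothetical function invented for this example; it only calls .write, so it accepts either a StringIO or a real file. The in-memory test inspects the output with .getvalue, and a temporary file (created with the standard tempfile module) shows the function behaving the same way on disk.

```python
import os
import tempfile
from io import StringIO

# Hypothetical function under test: writes a greeting to any file-like object.
def write_greeting(the_file, name):
    the_file.write("Hello " + name)

# Test with an in-memory buffer; .getvalue lets us inspect what was written.
buffer = StringIO()
write_greeting(buffer, "Amir")
in_memory = buffer.getvalue()

# The same function works unchanged on a real file.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "greeting.txt")
    with open(path, "w") as real_file:
        write_greeting(real_file, "Amir")
    with open(path) as real_file:
        on_disk = real_file.read()

assert in_memory == "Hello Amir"
assert on_disk == "Hello Amir"
```

Only the first half is needed in a test suite; the on-disk half is included to show that the two kinds of object are interchangeable here.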