Strathmore Data Week 2019

Posted on Wed 17 July 2019 by Matt Williams in teaching

As part of an ongoing collaboration between Strathmore University in Nairobi, Kenya and the Jean Golding Institute (JGI) at the University of Bristol, UK, a delegation from Bristol visited Strathmore for a week of teaching and collaboration meetings.

The teaching was loosely modelled on the Data Week that the JGI has put on at Bristol for the last few years. There, we usually run a full week of parallel software engineering, data science and programming workshops, along with talks. For Strathmore Data Week, we ran three days of teaching, from Tuesday 2019-07-02 to Thursday 2019-07-04.

Day one

Beginning Python

On the Tuesday we ran a three-hour morning session of Beginning Python. I've been running this course at Bristol several times a year for the last three years or so, and before that it was run by Christopher Woods for a few years. It covers the very basics of Python and is designed for people who have never programmed before. In the lead-up to Strathmore Data Week we didn't have a good handle on what audience to expect, so we decided to cover all the bases and offer everything from our most beginner course to our most advanced. We usually run these sessions with up to 30 students in the room, with a single lead teacher and a 'second' to help answer questions. I led all the Python teaching for the week, with help from Rachel Tunnicliffe from Bristol's Atmospheric Chemistry Research Group and Johannes Allotey, a PhD student in Physics, also from Bristol. All our sessions are self-paced, so students work through the material and exercises at their own speed, asking questions as they go. In this case, since it was a one-off in another country, we decided to allow a larger class size of 60 with two seconds to help out.

In Bristol we usually run this course in computer labs, connecting to our University's HPC cluster, which runs Linux. The course originally started as a guide to Python for people running HPC/HTC workloads and so has a strong terminal focus; for example, it uses nano as the editor and has the students run their Python scripts with "python script.py". We feel that this cuts to the point of the language without requiring too much additional tooling. I wanted to keep this style, so I opted to use JupyterLab as a micro-IDE, set up with a text editor pane on one side and a terminal pane on the other. We're considering alternative ways of teaching Python at Bristol in the future, but for now I personally feel that starting with plain Python scripts like this, rather than Jupyter Notebooks, is a good approach.

In the end we had a completely packed room with about 75 students! We had to keep finding extra chairs to fill the spare space as more students kept arriving; it seems word had got around. As the first session of the week, it required a lot of initial help with things like installing Anaconda Python, navigating JupyterLab and using the command line, all on top of learning Python itself, so it was a very busy session. Each of us had on average 25 students to cover, and there was barely a pause for the full three hours. However, despite being a lot of work, it was also very rewarding: the students were very engaged with the material, and it's always a good sign when they ask lots of questions, as it shows they are thinking about what they are doing.

Beginning Git

That afternoon we ran an introduction to Git course. Personally, I find Git really hard to teach to beginner programmers, as it solves a problem that most people in the room don't know they have. This was my first time leading this particular course, so I hadn't yet built up a good mental library of teaching methods; I expect it will go much more smoothly next time. The second issue with this session was getting Git installed on the students' laptops in the first place.

Since everyone was bringing their own laptops and I had limited communication with them beforehand, I had to find a way to give them all a consistent environment. Ahead of time, I had decided to have them use the standard Git for Windows, since that would most closely replicate the Linux environment we usually use. On the day, however, since everyone in the room already had Anaconda installed, I had them install Git through that so that we could use JupyterLab as our editor and command line. This ended up causing some problems: the standard Windows command line in JupyterLab did a poor job of displaying much of Git's output, and there were many problems setting up Notepad as Git's core.editor, due to line endings and similar issues. In hindsight, I think I should have just used Git for Windows's Bash environment, as the students could then have used nano as the editor, as we usually do, and had a more normal command-line experience.

That said, I'm thinking about reworking the Git course considerably to make some of the concepts more visual and to focus on the things that people really need to know at first.

Day two

Introduction to Data Analysis in Python

On Wednesday I ran two more Python courses. In the morning was Introduction to Data Analysis in Python. This was the first course I wrote myself when I started at Bristol, so it is the course I am most comfortable teaching. Again we used Anaconda for this session, but this time with traditional Jupyter Notebooks, because the course was written before JupyterLab existed and I wanted features such as inline plotting. This was perhaps a bit confusing for the students, as they had to learn yet another tool and way of interacting with Python. I considered using the notebook interface in JupyterLab, but at present it is not as featureful as the classic interface. Hopefully this will change in time so that I can do everything from within JupyterLab.

Intermediate Python

The afternoon session covered what we call Intermediate Python. The important parts of this course are dictionaries, functions and modules, which I work through from the perspective of making your code more reusable and shareable. The course is written as a direct follow-on from Beginning Python, so it still mostly uses text editors and command-line Python, but it also introduces IPython as a way of exploring Python features and APIs. In the future I'd like to remove IPython and just use Jupyter Notebooks instead, for consistency with the rest of the courses, keeping the number of different ways of using Python to a minimum.

Other sessions

Alongside the morning and afternoon sessions, we also ran training on working with geospatial data with R. This was provided by Natalie Thurlby, a data science specialist at the JGI.

In the evening there was a talk from Song Liu titled "Mysteries of Modern-day Statistical Machine Learning: Explained", which covered the mathematical underpinnings of some machine learning techniques and how they explain some of the techniques' strengths and shortcomings.

Day three

Applied data analysis in Python

The final day of Data Week started with me running my newest course, Applied Data Analysis in Python, which finally gets on to some of the data science and machine learning topics that the week of teaching was created for. By this point we noticed that the number of questions from the class was much reduced compared to the earlier beginner sessions. This seemed strange, since I'd expect more advanced sessions to be more difficult and therefore to prompt more questions, but most of the class seemed engrossed in the work.

Introduction to deep learning

The last session I ran was on deep learning. This was much more of a lecture, and after a long week I was worried that it might be too much information at once. However, the class was very engaged and the students asked lots of very good questions.

Other sessions

As well as the sessions I taught, there were other courses running in parallel which, unfortunately, I had to miss. Natalie Thurlby ran a number of workshops on geodata, and Johannes Allotey ran a hackathon on analysing medical data.

Overall it was a very rewarding week, and it was a great honour to be able to teach so many wonderful students. We are hoping to continue the collaboration with more teaching and research partnerships in the future.