I took the day off today to recover from the second round of the COVID vaccine and spent a little while implementing my own version of Conway’s Game of Life in Python. Here’s an example of a simulation involving a grid of 500 x 500 cells. When initializing the first generation, each cell had a 0.3 probability of starting as a living cell (the start-as-living probability). Increasing this value increases the number of living cells the grid is initialized with. I ran the simulation for 100 generations, and then again with a 0.1 start-as-living probability. I was curious how changing the probability of a cell starting the simulation as living would affect the game as it progressed.
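For anyone who wants to play with the start-as-living probability themselves, here is a minimal sketch of the idea in plain Python. It is not the code behind the simulations described in this post. The function names (`random_grid`, `step`, `living_fraction`), the `seed` parameter, and the choice to treat everything beyond the grid edge as dead are all assumptions made for illustration.

```python
import random


def random_grid(rows, cols, p_alive, seed=None):
    """Build the first generation: each cell is alive with probability p_alive."""
    rng = random.Random(seed)  # seed is a hypothetical convenience for repeatable runs
    return [[1 if rng.random() < p_alive else 0 for _ in range(cols)]
            for _ in range(rows)]


def count_neighbors(grid, r, c):
    """Count living cells among the 8 neighbors; off-grid cells count as dead."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                total += grid[rr][cc]
    return total


def step(grid):
    """Apply the standard rules once: birth on 3 neighbors, survival on 2 or 3."""
    new_grid = []
    for r, row in enumerate(grid):
        new_row = []
        for c, cell in enumerate(row):
            n = count_neighbors(grid, r, c)
            new_row.append(1 if n == 3 or (cell == 1 and n == 2) else 0)
        new_grid.append(new_row)
    return new_grid


def living_fraction(grid):
    """Fraction of cells that are currently alive."""
    cells = sum(len(row) for row in grid)
    return sum(sum(row) for row in grid) / cells


if __name__ == "__main__":
    # A small grid keeps the pure-Python loops quick; 500 x 500 works but is slow.
    for p in (0.1, 0.3):
        grid = random_grid(50, 50, p, seed=0)
        for _ in range(100):
            grid = step(grid)
        print(p, living_fraction(grid))
```

Comparing the living fraction at generation 100 for a few starting probabilities is a quick way to see how much the initial density matters.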
Posts with the tag programming:
Recently I was working on automating some legislation tracking tasks for the UC Davis Legislative Affairs Committee and noticed a potential bug in how the results of bill searches are displayed, so I wanted to note it. If you are interested in California legislation, you can go to the state website and search for bills by a number of parameters, including keywords, which brings up a number of bills. Clicking on one gives us the bill’s full text with our keyword highlighted. The bug occurs when we search for the names of HTML tags like div or span. When these terms are searched, the source HTML ends up being rendered on the actual page with the tags highlighted.
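I do not know how the state website implements its highlighting, so the following is only a sketch of the general problem, not a diagnosis. It assumes the text being highlighted is plain text and shows one way to wrap keyword matches in a tag without ever touching markup: split the raw text on the matches and escape every piece separately. The `highlight` function and the use of `<mark>` are hypothetical choices for illustration.

```python
import html
import re


def highlight(raw_text, keyword):
    """Escape raw_text for display and wrap each keyword match in <mark>.

    Matching happens on the raw text and each piece is escaped on its own,
    so a keyword such as "div" can never alter real markup, and a keyword
    such as "amp" can never match inside an escaped entity.
    """
    if not keyword:
        return html.escape(raw_text)
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    pieces = []
    last = 0
    for match in pattern.finditer(raw_text):
        pieces.append(html.escape(raw_text[last:match.start()]))
        pieces.append("<mark>" + html.escape(match.group(0)) + "</mark>")
        last = match.end()
    pieces.append(html.escape(raw_text[last:]))
    return "".join(pieces)


if __name__ == "__main__":
    print(highlight("<div>An act relating to water</div>", "div"))
```

A real bill page mixes markup that should render with text that should be searched, so a production fix would need to highlight only inside text nodes; this sketch covers just the plain-text case.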
Polo: an open-source graphical user interface for crystallization screening was just published in the Journal of Applied Crystallography. It is my first academic paper as well as my first first-author paper. Please check out the article at this link! Below are a few of the figures from the article; click one to be taken to its full description.
Recently I was asked by a friend if I knew about any databases that classified cannabis strains by the symptoms people tend to use them to relieve. I didn’t know of any, but I had heard about leafly.com, which catalogues user reviews of various cannabis strains and compiles data on their characteristics. I thought this could be a good place for them to start, so I started looking into what it would take to make a web scraper to pull down all the data Leafly has compiled on hundreds of cannabis strains. It turns out it didn’t take that much.
NBI Background The National Bridge Inventory (NBI) is a program of the Federal Highway Administration, an agency within the U.S. Department of Transportation. The NBI makes available records and statistics about all the bridges in the United States, including information about bridge location, integrity, inspection history and usage. Potential encoding discrepancy As a side project I have been working on creating a more exhaustive Python package for parsing NBI data. It is mainly focused on decoding the numerical representations present in the data files into the semantic meanings specified in the NBI documentation. I ran into errors when trying to decode the state code fields, which, based on the available documentation, use the coding table below.
Over the past couple of days I have been running some ligand docking simulations using Rosetta as part of my current rotation with the Cortopassi lab. One of these docking simulations involved fitting a small portion of the insulin receptor (IR) that the lab is interested in into a known binding region of the Shc1 protein. Any Rosetta docking simulation will require hundreds of repetitions, which generate a significant number of PDB files showing the final conformation of the protein and ligand at the end of a given simulation. While reading about the best way to aggregate and analyze these results, I spent a bit of time looking for ways to visualize everything Rosetta spits out.
Are you using the UC Davis FARM for molecular modeling and need to figure out how to set up GROMACS? Well hello, extremely small subset of the population! This is the guide for you. Note, this is only for a basic installation. For maximum performance refer to the GROMACS guide linked above. Getting started We will be working off the installation instructions on the GROMACS website but will modify a few steps to deal with the quirks of the FARM at the time of writing and the fact that you will not have sudo privileges. If you want to cut to the chase, you can run this script, which will run all the code in this guide in one go.
After finding the COG-UK data I was looking around for other interesting COVID-19 datasets to play around with and build my R plotting skills. User moritz.kraemer posted this article on early case descriptions, which included a lot of geospatial data that I was interested in taking a look at. A significant number of fields were devoted to hospitalization-related measurements, so I focused on that subject for the plot below. The dataset includes patients with and without hospitalization records, so first I filtered down to just those with records who also had location data. This subset of patients formed subplot A.
The COVID-19 Genomics UK Consortium has been collecting and sequencing thousands of COVID-19 genomes from patients in the UK and around the world. All of their data is publicly available. Here I played around with the phylogenetic tree they have created from global alignments of all the genomes they have sequenced. You can download the tree in Newick format from their data page, which also hosts sequences and the alignment files. Figures: Visualizing the COVID-19 phylogenetic tree by country of origin; Genome count by country (note this plot is log scale on the y-axis); 16 most prevalent UK COVID-19 lineages (density plots showing the number of genomes of the 16 most prevalent lineages detected by COG-UK).
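If you just want genome counts by country without loading the tree into a phylogenetics library, the leaf labels can be pulled straight out of the Newick text. This is a rough sketch, not the method used for the plots in this post. It assumes unquoted leaf labels and assumes each label starts with a country followed by a slash (for example `England/...`), which may not match the real file; `leaf_labels` and `count_by_country` are hypothetical helper names.

```python
import re
from collections import Counter


def leaf_labels(newick):
    """Return the leaf labels of a Newick string with unquoted labels.

    In Newick, a leaf label comes right after "(" or ",", while internal
    node labels come after ")" and branch lengths come after ":".
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


def count_by_country(newick):
    """Count leaves per country, assuming labels look like 'Country/id/year'."""
    return Counter(label.split("/")[0] for label in leaf_labels(newick))


if __name__ == "__main__":
    # Tiny made-up tree for illustration only, not real COG-UK data.
    tree = "((England/A1/2020:0.1,Wales/B2/2020:0.2):0.05,England/C3/2020:0.3);"
    for country, n in count_by_country(tree).most_common():
        print(country, n)
```

For anything beyond counting, such as walking the tree or handling quoted labels, a proper Newick parser is the safer choice.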