-
New job as Bioinformatician in Rare Disease
Today I started working as a Bioinformatician in the Department of Medicine at the University of Cambridge, working with PIs Chris Wallace and Ken Smith on the INTREPID project. We are exploring molecular data from patients with primary immunodeficiency (https://www.intrepidproject.info/).
-
Nextflow Summit 2022
In October 2022, I was lucky to be encouraged by my research team to attend the Nextflow Summit and Hackathon in Barcelona. I wanted to improve my skills at the Hackathon, and meet the peeps behind the Nextflow pipelines that are open-source and community-driven. This was the occasion to meet my colleague Chris Wyatt too. It was an incredibly valuable experience, both for the technical skills I gained and the connections I made with other professionals in the field.
The Nextflow Summit dinner included neat logos on walls and beer bottles
The Hackathon was particularly memorable for me. I was struck by the team spirit within the nf-core community and the way everyone worked together to build and improve pipelines. It was exciting to see the updates being made in real time and to be a part of the collaborative process. This felt very inclusive for all levels of skills. There were lots of socials, such as making our own tapas à la Great British Bake Off. And swag!! I was super happy to share the apron, stickers and other goodies at our lab meeting back in London.
I now feel more comfortable asking questions on Slack having met the core team. I also understand how pipelines can be built and tweaked. Chris and I have now implemented more Nextflow structures in our lab and university, including the config file for our HPC.
I would encourage everyone to join the next Hackathon. It is running at the end of March 2023 near you, and you can even take part in the workshop the week before. Try it for yourself, and meet the people who are making the magic.
The Summit was also a great opportunity for me. I was able to make new connections with industry professionals and learn about the ways in which Nextflow is being used in real-world applications, particularly in medical and rare disease research. It was inspiring to hear Solenne Correard and her team, who focused on including all patients, of all genetic backgrounds, in the research. It is fascinating to hear about the impact that these pipelines are having on improving patient care and advancing scientific discovery.
The Nextflow Summit and Hackathon was a fantastic learning experience and I am grateful for the opportunity to attend. I am excited to put what I learned into practice and to continue being a part of the nf-core community. I can already see the challenges for institutions and start-ups alike to maintain knowledge of existing pipelines.
-
Amazon Web Services workshop: creating and managing connected spaces
I attended an Amazon Web Services workshop organised by UCL and AWS’s joint venture Centre for Digital Innovation, in London a few weeks ago. We spent a day exploring the ins-and-outs of this cloud management system: creating a connected space (EC2 instance) where collaborators can share data (bucket), for example. I knew that many companies, and bioinformatics lecturers, use this modular, kind-of-pay-as-you-go admin system. This had piqued my interest, so I brought my laptop to IDEALondon to see what AWS could offer.
The workshop was intended for UCL researchers and innovators to explore the mission of the Centre for Digital Innovation, and to experience hands-on the technical aspects of AWS. I met other researchers, start-up directors and IT administrators; all of us knew that our current tools did not exactly meet our needs. Over a few hours, thanks to Bruno Silva and James Grant, I used both the point-and-click interface and the terminal to create a system from scratch (i.e. similar to creating a Docker container or a RaspberryPi), and learnt how to access data and decide who should be in control. The tutorial is here. I want to put this into practice to help share data across time zones without costing too much.
Overall it was a great day to learn more about better solutions for data sharing. I look forward to seeing impacts on productivity in young start-ups and academic research teams once they embrace these architectures, instead of current clunky, non-version-controlled, non-scalable systems. I do think that the next challenges for all will be to train people and maintain knowledge of efficient use of cloud management systems.
-
Pint of Science: come for a drink, stay for the science!
I was invited to present my research in a London pub - so much fun! This was an event organised by Pint of Science, encouraging scientists to share their projects with local pub-goers.
The Ivy House Pub hosted us for the event. Photo credit: Dr Cintia Oi
The mission of this international science festival is simple: share a drink and exchange science facts. Each local pub, café or communal space hosts an evening of talks for the local audience. I really like it because the event breaks down potential barriers between the academic world and the local community.
I was thrown in front of an unknown crowd, not unlike SoapboxScience. Within twenty minutes, I could explain my current research: how to use genomics to understand the evolution of social organisation. I decided to include a few stories from fieldwork - this is often what captures people’s attention. I also explored ideas around evolutionary times: this is what keeps me happy in my job, and I always break it down into small chunks of knowledge that are easy to reach for an interested teenager. I aimed to keep the slides simple, so that the audience could easily visualise data to derive insights.
This was a fantastic event with many questions from the keen audience. I thoroughly enjoyed myself and would recommend attending these events! Find your next local Pint of Science here for 22-24 May 2023.
-
New skills after networking in Barcelona's bioinformatics hotspot
Right at the start of my postdoc, I applied for the ESEB Godfrey Hewitt Mobility Award to expand my questions on the evolution of social organisation to larger datasets, namely the non-coding regions of genomes (lncRNA). I was successful in obtaining the €1,600 grant at the turn of the new year 2020. Needless to say, the next two years proved more chaotic than anticipated.
The view next to the building, with Mar and her group walking back from a lunch in Barceloneta
In March 2022, I travelled to Barcelona and spent the month collaborating with Head of Evolutionary Genomics Group Prof M. Mar Albà Soler at the GRIB (Research Programme on Biomedical Informatics in Barcelona Biomedical Research Park). I wanted to learn more about lncRNA and develop my Python skills. The warm welcome that I received from Mar and her research team was fantastic: I had access to a desk in the shared office, I attended weekly lab meetings and seminars, and I could ask dumb/direct technical and biological questions on my project. At the end of the month, my exploration of long non-coding regions had jump-started into solid analyses (that are still running), and I was proud to present results in a Jupyter notebook.
While I was there, Mar received awesome news: her ERC project was funded, and she is currently recruiting. If you are/know of a postdoc who likes exploring long non-coding areas AND playing beach volley after work, apply here.
-
Top Command Lines
I welcomed a new colleague this week on campus; this was awesome after several months of working from home! Cintia Oi is coming with fantastic ideas and questions to challenge some genomic data. I’m updating my “current top commands” to share more of what’s going on on my screen.
Writing reproducible science. In our lab, Chris Wyatt is the torchbearer of Nextflow and is encouraging us to use this version-controlled, open-source workflow pipeline. This allowed me to clean raw RNAseq data, to run QC steps, to map reads to a genome, and to obtain read counts ready for statistical analyses. The process was relatively straightforward: select the workflow on a platform (e.g. https://nf-co.re/rnaseq/1.3), download an image containing the recipe for the workflow on your cluster, set the experiment, and let it run in the background. It can be used for all sizes of data, so I could run it with all sorts of samples lying around in our lab. Enrich your science with reproducibility: https://nf-co.re/usage/nextflow (tutorials).
Explaining results visually. Once the data have been analysed, I like to plot them to assess whether the starting hypothesis stands the test. I can spend many happy hours building a pretty graph, usually using R ggplot2 and looking for inspiration on this gallery. The code is available for each graph, making my life easier. I pick colours that will convey the take-home message best, and that are colour-blind friendly (https://colorbrewer2.org/). Teach yourself some neat visualization tricks: https://datacarpentry.org/R-ecology-lesson/04-visualization-ggplot2.html
Backing up your work. Once everything is coded, from statistical tests to data visualisation, I make sure to secure all scripts and READMEs. This is important because I might not look at the data for a long time, so leaving my desk tidy is more efficient in the long term. It is also important because my laptop could cease to exist, and all efforts and outputs with it. And this is not cool! So I regularly practice git push, a simple command that copies the work I have committed on my laptop to a Github remote repo. Some colleagues practice the “plug the external hard drive and copy files” system. This is good too. Anything is better than just leaving your script on your laptop. Practice it here: https://swcarpentry.github.io/git-novice/
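Here is a minimal sketch of that backup habit, not my exact setup. Everything in it is made up for illustration: the file names, the commit message, the author identity, and above all the remote, which is a local bare repository standing in for a Github repo so that the example runs without an account.

```shell
#!/bin/sh
set -eu

# Scratch area so the sketch never touches a real project.
# It is left in place afterwards so you can look inside.
workdir="$(mktemp -d)"
remote="$workdir/remote.git"     # stand-in for the Github remote (assumption)
project="$workdir/project"       # stand-in for the folder on my laptop

git init --quiet --bare "$remote"
git init --quiet "$project"
git -C "$project" config user.name "Example Author"        # hypothetical identity
git -C "$project" config user.email "author@example.org"   # hypothetical identity

# One script and one README, as placeholders for real work.
printf 'echo "count reads"\n' > "$project/count_reads.sh"
printf 'Scripts for the analysis.\n' > "$project/README.md"

# Record the current state, then send it to the remote.
git -C "$project" add count_reads.sh README.md
git -C "$project" commit --quiet -m "Add read-count script and README"
git -C "$project" remote add origin "$remote"
git -C "$project" push --quiet origin HEAD

# The remote now holds the same commit as the laptop.
git --git-dir="$remote" log --all --oneline
```

With a real Github repo, the only change would be the address given to git remote add. Note that only committed files travel with the push: anything not yet added and committed stays on the laptop.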
I am also using Python scikit-learn, Docker, qsub. I’ll talk about these another time.
-
A year working from home
2020-2021 in London has been great for reducing my commuting time to two seconds, from my bedroom to my living room. Challenges were many, however, especially when establishing and nurturing intercontinental collaborations. Katie (postdoc at Iowa State University) and I have been working on each side of the pond, untangling the association between molecular processes and evolution of sociality. Here are a few tips that worked for us.
Busy working in our working pods, thankfully connected with Internet
Collaboration thrives on continuous communication. We use email for important topics, and we first used Slack for keeping track of actions. We then moved on to a shared Google folder, which gave us more flexibility to share, change and organise.
We found that recurrent video calls were important to keep us on track, celebrating small milestones and big personal wins. We discuss findings, troubleshoot issues, and organise logistics (e.g. sampling and sequencing). As we slowly realised that I would not be able to go to Iowa as planned due to the pandemic, we started to co-work on Zoom in my late afternoons and Katie’s early mornings: we meet at a specific time, we state our aims, and work on mute until one of us needs to ask a question. As we work on the same datasets with slightly different tools, this coworking window allows us to get immediate fixes on small issues.
We quickly started a common Github directory, where we push our code and datasets. This allows a great parallelisation of work, as long as we do not forget to pull first (more info here). Github projects have a great kanban organisation tab, where I can create task lists and update them from TO DO to IN PROGRESS. With a 5-hour difference between us, this card system is efficient to quickly communicate the analysis progress.
Emulating real-life interactions is not straightforward, and I am sure we’ll look back on this time with bittersweet curiosity.
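To show why pulling first matters, here is a small sketch with two working copies sharing one remote. It is an illustration under assumptions, not our actual repository: the remote is a local bare repository standing in for Github, and the file names, commit messages and identity are invented. I use git pull --rebase here so that my new commit is replayed on top of my collaborator's; a plain merge would work too.

```shell
#!/bin/sh
set -eu

# Hypothetical identity so commits work on a machine without git configured.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.org"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.org"

shared="$(mktemp -d)"
remote="$shared/remote.git"   # bare repo standing in for the Github remote
mine="$shared/mine"           # my working copy
theirs="$shared/theirs"       # my collaborator's working copy

git init --quiet --bare "$remote"

# I start the project and push the first script.
git init --quiet "$mine"
git -C "$mine" remote add origin "$remote"
echo "# differential expression" > "$mine/deseq.R"
git -C "$mine" add deseq.R
git -C "$mine" commit --quiet -m "Add expression script"
branch="$(git -C "$mine" symbolic-ref --short HEAD)"
git -C "$mine" push --quiet origin "$branch"

# My collaborator clones the project, adds a dataset, and pushes.
git clone --quiet "$remote" "$theirs"
echo "sample,count" > "$theirs/counts.csv"
git -C "$theirs" add counts.csv
git -C "$theirs" commit --quiet -m "Add count table"
git -C "$theirs" push --quiet origin "$branch"

# Meanwhile I commit new work; my copy is now behind the remote,
# so a push at this point would be rejected.
echo "# plots" > "$mine/plots.R"
git -C "$mine" add plots.R
git -C "$mine" commit --quiet -m "Add plotting script"

# Pull first (replaying my commit on top of theirs), then push.
git -C "$mine" pull --quiet --rebase origin "$branch"
git -C "$mine" push --quiet origin "$branch"

# My history now contains both contributions.
git -C "$mine" log --format=%s
```

Because the two of us touched different files, the pull goes through without conflicts. When both edit the same lines, git stops and asks for the conflict to be resolved by hand before the push.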
-
Current top commands
I recently got in touch with a future colleague who was keen on learning more about genomics and that got me thinking about commands I frequently use these days. I’ve previously touched upon the usefulness of github. So here are more snapshots of my bioinformatics workbench.
February 2021: Spring is coming in still-in-lockdown London
Project organisation. Genomic datasets are huge and my work space does not allow for many of these files of several megabytes or gigabytes. I cannot keep multiple copies of each dataset in my projects, because my work space would be full. Instead, I give each project access to the archived copy via a soft-linked version of the dataset (e.g. ln -s "$PWD/archive/my-sequences.fasta" project1/input/my-sequences.fasta; the target is given as a full path because a relative target is resolved from the folder that holds the link). The syntax of my command line will be as expected (e.g. head project1/input/my-sequences.fasta), the tools will use the dataset as expected, and my work space remains clutter-free. Try it by typing man ln in your terminal.
Well-planned experiments. Some genomic experiments can take several hours, which makes me more aware of good time management. One good practice that I learnt from wet lab is to set the experiment in a lab room before lunch (say, a PCR), walk away from that room for a well-earned break, and come back in the afternoon to check the result. I do the same with the in silico experiments. From my work space, I enter my project space where datasets and tools are at the ready. I “create a lab room” and start the experiment, check for a minute that everything is running as expected, and then “walk away from the room” without the experiment stopping. I can “re-enter the room” as I wish to check the workflow, and once the results are in, I can “remove the room”. Make your own lab rooms with screen or tmux.
Speed things up with parallel. One of the common tasks on my work bench is to map sequenced reads to a genome assembly. It is the process of matching the sequences of nucleotides between the two sets of data: many small sequences that came out of a sequencer (reads) and a small number of long sequences that came out of an assembling algorithm (genome). This can take several hours, even days! I recently had a hundred samples to map to the assembly. To speed up the process, I used a tool that uses multiple computers at once, with the same output as if I were to run the tasks sequentially, one sample at a time. Learn more about GNU parallel.
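The idea of running several samples at once can be sketched as follows. This is an illustration under assumptions, not my mapping command: I use xargs -P, which ships with most systems, as a stand-in for GNU parallel, and it spreads jobs over the cores of one machine only. The four tiny samples and the map_sample.sh helper are hypothetical; the helper merely counts lines where a real workflow would call a read mapper.

```shell
#!/bin/sh
set -eu

# Scratch area, left in place afterwards so you can look inside.
workdir="$(mktemp -d)"
mkdir -p "$workdir/reads" "$workdir/mapped"

# Four tiny stand-in samples; real inputs would be files of sequenced reads.
for sample in s1 s2 s3 s4; do
  printf 'ACGT\nTTGA\n' > "$workdir/reads/$sample.txt"
done

# Hypothetical placeholder for the mapping step: it only counts lines.
cat > "$workdir/map_sample.sh" <<'EOF'
#!/bin/sh
# Usage: map_sample.sh OUTPUT_DIR INPUT_FILE
name="$(basename "$2" .txt)"
wc -l < "$2" | tr -d ' ' > "$1/$name.count"
EOF

# Run up to two samples at a time. Each job writes its own output file,
# so the results are the same as with a one-by-one loop.
find "$workdir/reads" -name '*.txt' -print0 |
  xargs -0 -n 1 -P 2 sh "$workdir/map_sample.sh" "$workdir/mapped"

ls "$workdir/mapped"
```

With GNU parallel installed, the last step would look along the lines of parallel -j 2 sh map_sample.sh mapped {} ::: reads/*.txt, where -j sets the number of simultaneous jobs. Keeping one output file per sample is what makes the parallel run safe: no two jobs ever write to the same place.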
I am also using R ggplot, Nextflow, Git push. I’ll talk about these another time.
-
Updated CV
Carrying on with my previous post, I have now updated my CV. Research has been tremendously eye-opening, thanks to the fantastic collaboration with Katie Geist, my “postdoc twin” from Amy Toth’s lab. More to come from this very soon! I have been equally busy with mentoring students, organising a conference and helping the UCL GEE postdoc community by acting as postdoc representative.
-
Happy New Year
I wish I could say what happened in 2020 stays in 2020. But there are plenty of good things that happened and one should celebrate the little wins and the big wows.
2020 was my first full year as a postdoc: I worked with great colleagues and students, and I learnt a lot after joining the Sumner lab. I will also remember obtaining the Godfrey Hewitt mobility grant from ESEB, and organising the NWE IUSSI online conference. I loved reading cool science from friends (about ants and turtles) and from the research community at large (SARS-CoV-2 vaccine and hope to treat Batten disease).
Vespa crabro queen by Giacomo (Instagram: @kelp_art)
In 2021, I look forward to getting more work published, to learning even more from our scientific community, and to tackling some brain-hurting challenges, be they technical or theoretical. I also look forward to some recurrent conferences, such as PopGroup, as well as some new ways of networking and navigating my way in the postdoc world. Reach out to me if you “see” me attending the same conference as you.
-
Organising a conference
Each year social insect researchers gather around to share their findings, sometimes at the NHM in London, sometimes somewhere else in the UK. This year, and thanks to the internet, the Winter Meeting will be happening in each isolated/confined office. I have been organising the logistics of a three-day conference of the North-Western European Section of the International Union for the Study of Social Insects.
Our beautiful flyer designed by Federico López-Osorio
I have been attending this Winter Meeting since 2015, so I thought it was time to give back to the community. With the help of a fantastic organising committee (Liz Bates, Anindita Brahma, Valentin Lecheval, Gabriel Luis Hernandez, Benjamin Hanson, Federico López-Osorio), we put together a line-up that is diverse in scientific themes. This attracted more than 100 registrations from scientists from many countries in Europe and further afield. We will have a healthy combination of Zoom talks and flash-talks for posters, and a Slack forum dedicated to all social insect chatter.
-
Sampling hornets
I have been celebrating one year of postdoc-ing in Seirian Sumner’s lab with a fieldtrip to observe and collect European hornets (Vespa crabro) with PhD students Lewis and Owen, MSci student Iona, and hornet expert John.
October 2020 in London park and lab
The nest was busy with workers and (at least) one male going in and out of the tree hole. At this time of the year, the reproductive queen has often died and the workers are left without a nursery job (read more here). The workers go out to forage, and it is becoming more and more difficult to find sugar and protein in the urban park in October.
We soon settled into a chain of sampling tasks. John catches the worker in the net, Iona keeps it still on dry ice, Lewis and Owen secure the sample into liquid nitrogen. Meanwhile I try to keep the public away from the sampling site by answering queries (“are those bees?”, “Are these native?”) and encouraging scientific curiosity (“did you know that some rare species live in hornet nests? Think about the biodiversity”).
At the end of the session, we got all samples secured. We packed our kits and left some strawberry jam for the late workers: a bit of sugar and protein goes a long way for these soon-to-die workers.
Back in the lab, I was also able to handle a live male hornet (they do not sting), which was amazing as I had only seen specimens on a pin at the NHM.
-
Using Github
“Why should I use Github?”, a friend recently asked. “I write code for my job and share scripts with colleagues. I think Github could be useful, but I feel a bit intimidated by the technical bits.” I thought about it and replied with a few tips that I learnt on the way. This is a summary.
Github essentials: you can fork a project and work on a new branch
Science relies on reproducible research, with carefully laid-out hypotheses, optimised protocols, and rigorous reports that are peer-reviewed. Science is also about collaboration: “hey, did you manage to use the new cluster? how did you do that?”. For these two obvious reasons, sharing R code or a bash command is quite useful if it involves some version-controlled, shareable system. This is what UCL recommends when publishing an article, and what I try to do with my colleagues.
Getting used to Github was not so easy at first, because it was quite alien to what I knew. I first used it to update a website: each learning step was often loaded with a small mistake that would inevitably crash the website. I frequently spent time to fix these bugs, with the immediate gratification of seeing the website working, and the long-term benefit of understanding git. Nowadays I find Github useful for my own personal website (read more here to make your own website).
I try to commit and push regularly in all my projects. This allows me to have a somewhat secure back-up of scripts if the end of times happens. I always talked about this in hypothetical terms, until I was deprived of HPC access for the whole month of May. I easily found my scripts on the remote repository and could play with ggplot. I have also stored some script templates that I can copy and paste easily off the website remote repository.
I currently use Github to understand the tools that I use, by looking at the source code or the comments of the issues. Some users might have had the same error message as me, and it is great to find a fix in the closed issue posts. I also keep an eye on a few of my bioinformatic crushes (you can follow and star projects, organisations or users): what they are currently using might be the next big thing that everyone wants to use.
Lastly, there is this beautiful data visualisation on the front page of each user, showing moments of intense use of Github with little green tiles. It is nice to reflect on the past to see the big pushes. I know that it motivates some to fill in the blank too!
-
Teamwork with Undergraduate student Jadesada Schneider
This summer I collaborated with Jadesada Schneider, a first year undergraduate student who was keen to learn about wasp evolution and bioinformatics. He reached out to the Sumner lab with an email to present himself and his interests. We then met him for a friendly chat (prior to the pandemic situation), and agreed to research together the evolution of sociality in wasps, with the aim of publishing together a scientific paper.
Collaboration: process of two or more people working together to complete a task (Wikipedia)
Planning a summer internship during COVID19
We first created a mentoring contract, in which both Jadesada and I wrote our expectations and our goals, inspired by Pleuni Pennings’s blog post. Because the internship was set during lockdown, we communicated via video calls and shared cloud-based documents. We both wanted to reproduce as much as possible the environment of an academic research project, so that Jadesada got to know the academic world as closely as possible. So we thought about: reading articles and talking about the results, attending online seminars together (exchanging our thoughts after the talk), actively participating in the Sumner lab meetings, organising expert drop-ins (video calls to discuss result slides), registering and attending a computer cluster workshop. The cherry on top would be Jadesada presenting his findings during a UCL GEE departmental talk.
Great collaborative work
We decided to catch up twice a week with a video call where we’d share our findings and debug our code. This worked really well for efficient teamwork without loading our long days with many other video calls. We also set up a message chat (UCL uses Microsoft Teams), so we could easily communicate the rest of the week.
I could help fix things as they came up (e.g. issues while installing tools on the cluster, or searching Github issues for a solution). I constantly encouraged the “thinking together”, mentioning as well the rubber duck effect and introducing typical research ideas, such as correlation is not causation. We both learnt from each other: for instance, I showed Jadesada how to get RStudio to wrap the code at 80 characters, and Jadesada conducted a side experiment with really cool findings that I did not foresee.
We focused our two-month collaboration on Evolution and Bioinformatics, with the example of the evolution of sociality in wasps. Jadesada started his internship with a vague idea of what a command line was. By the end of the internship, he had used the university cluster with success, including organising a bioinformatics project, installing tools (by no means a small feat!!!), transforming genomics datasets into statistical findings with RMarkdown, and writing well-documented, reproducible scripts under version control on Github. He had also tackled many ideas coming from social evolution theory, and discussed many articles with growing academic expertise.
We also talked about communicating science. First, finding good sources of articles helps with thinking about the project, such as using Google Scholar to find a researcher’s list of publications, or increasing the depth of a literature review with ConnectedPapers, or even sending an email to the author to request a PDF of the article. Second, we talked about the bizarre system of publishing peer-reviewed articles and alternative platforms (biorxiv and PeerCommunity). Finally, I explained my previous experiences attending in-person conferences, to which Jadesada rightly commented that conferences are like music festivals, with long days of fun and A-list keynote speakers.
Take home message
Our research project is still ongoing, so watch this space for news of a co-authored manuscript and a Github public repository. Jadesada is presenting the project to the UCL GEE seminar community, and will soon start the second year of his undergraduate degree.
I really enjoyed our collaboration! I was able to share many tips that I learnt, with a small contribution in making the academic world more accessible by describing my day-to-day schedule and talking about science and its issues. Mentoring also kept me engaged throughout the summer of COVID19. I hope by sharing these notes to encourage more collaboration across academia.
-
Letters to a Pre-Scientist
I am a first-generation university graduate, a female bioinformatician, initially labelled immigrant worker; and sometimes I find my scientific status difficult to carry in the elitist world of academia. On the other hand, I have the privilege of my white skin. For me, science thrives on diversity and creativity. During my PhD, I quickly understood that research was fruitful and enjoyable if conducted in a team of scientists with complementary skills and from various backgrounds. These thoughts were also fed by conversations that I had in our PhD office, on our East London campus and across different institutions as part of my DTP training. I was also reading a lot, including thought-provoking books about inequality in academia (Bhopal, 2018; Saini, 2019). I wanted to act for a more inclusive and diverse science world. And this is when I heard about Letters to a Pre-Scientist.
Snail mail essentials
Why I volunteered for Letters to a Pre-Scientist
LPS is a wonderful organisation that matches young people in US low-income communities with STEM scientists, enabling a conversation about science through four rounds of hand-written letters. I love snail mail! For some of the Pre-Scientists, this would be their first ever snail mail. The concept of desacralizing the status of scientists by conversing about life and career is amazing. The more we know about each other (culture, community, etc), the more networks we create, the better science becomes. If academia were more honest and willing to address what is broken in the system, and if it were more humanised and visible as a career option, science would be better (Catherine Okuboyejo, 2020).
How I found it
The short but essential training was eye-opening. Given the current pool of Pre-Scientists (8-12 year-olds from US schools), I learnt about the school system there, the inequalities that they are facing, and the differences in STEM curriculum compared to Europe. Matching was fun! They matched my interests (within and outside science) with my penpal, while focusing on matching every student in the class. In my case, we matched on liking animals, cooking and Netflix. I was excited to get the letters: once I got the email saying that the teacher had sent the letter, I was eagerly waiting to receive it from the US. The stories from my penpal were always fascinating and I learnt a lot from them. I think my penpal learnt quite a lot as well.
LPS recruit each summer; check it out here: https://www.prescientist.org/