Upload date: 2023-12-26 20:11:22
Uploader: sh-1993
Description: A short story about the transition from wet lab biologist to biological data analyst


I have a background as a wet lab biologist, but at some point I found myself drawn to statistical tasks, which turned out to be an interesting and exciting experience. I decided to delve deeper into data analysis and bioinformatics. I know that many of my wet lab colleagues may also find learning math and statistics challenging, so I'd like to share my journey and the resources I have used, or still use, as a bioinformatician.

#### Technical Skills: R, Python, AWS, Bash, CWL, Nextflow

## Education
- Ph.D., Molecular Biology | The State Research Institute of Genetics and Selection of Industrial Microorganisms of the National Research Center (_February 2013_)
- Specialist, Microbiology | Moscow State University (_June 2010_)

## Work Experience
**Postdoc @ Universitätsklinikum Schleswig-Holstein (UKSH), Kiel (_November 2023 - Present_)**
- Development of a bioinformatics platform service based on Nextflow pipelines (Nextflow, Jira, Globe)

**Bioinformatician @ BostonGene, Yerevan (_July 2023 - October 2023_)**
- Development of quality control tools (Python, AWS, CWL)
- Analysis of QC parameters for production (Python, AWS)

**Postdoc @ Deutsches Krebsforschungszentrum (DKFZ), Heidelberg (_June 2022 - July 2023_)**
- Tryptophan metabolism and NAD formation: designing a pipeline for bulk RNA-seq analysis, analysis of single-cell RNA-seq, data integration from multiple platforms (GO, UniProt, ENCODE), proteome analysis (MetaboAnalyst), graph analysis (igraph)
- Literature mining: implementation of NLP methods (gepi, rentrez, easyPubMed)

**Senior research fellow @ Federal Research and Clinical Center (_February 2016 - February 2022_)**
- Development and implementation of NGS methods and guidelines for cancer diagnostics: designing and testing PCR kits for clinical use
- Participation in EMQN's EQA: implementation of European quality assessment in clinical centers

## Information resources that I used

### Practical courses
- EMBO Workshop. Integrating genomics and biophysics to comprehend functional genetic variation ([])
  This event was highly theoretical: it offered excellent networking opportunities but lacked hands-on sessions.
- Course at EMBL-EBI. Cancer genomics ([])
  This course stands out for its comprehensive content and numerous hands-on sessions run by leading scientists in the field. I still find it useful to review the presentations and scripts provided during the course.
- MGNGS School of Data Analysis in Medical Genetics (in Russian) ([])
  During this course we used Galaxy and finished with a timed exam: we received a raw FASTQ file and a case description, and the task was to produce a report identifying a nucleotide variant and its pathogenicity class.
- Bioinformatics Institute. Biostatistics and analysis of medical data (in Russian) ([]). NGS intensive (in Russian) ([])
  I took both courses. The biostatistics course was well balanced and coherent. The NGS intensive, however, felt somewhat cluttered, as if the authors had tried to include as much information as possible, and the result was overloaded.
- Machine Learning summer school. Bumblekite ([])
  This course is ideal for beginners in machine learning and offers practical sessions on various fields of ML application. It lacks a deep dive into theory, but what sets it apart is that the organizers invite many top specialists in the field. These experts shared their life stories, which inspired me and helped me clarify my career path.

### Education platforms
- Stepik (in Russian) ([])
  This platform offers numerous courses for DevOps engineers and data scientists. What I appreciate about it is that it includes genuinely challenging exercises, and solving them gives you a deep understanding of the material.
  However, this can also be a drawback, especially during busy periods at work when you may not want to invest a lot of time in acquiring a new skill. The courses I have completed on this platform include:
  - Python programming ([])
  - Introduction to Linux ([])
  - The basics of statistics. Part 1 ([])
  - The basics of statistics. Part 2 ([])
  - The basics of statistics. Part 3 ([])
  - Nanopore sequencing ([])
  - Git technology ([])
  - Introduction to NGS ([])
  - Docker for beginners ([])
  - Data analysis in R ([])
- DataCamp ([])
  This platform is primarily designed for data scientists and data analysts. The exercises are relatively straightforward, for example filling in a gap with the correct command, which lets students quickly immerse themselves in the programming environment. The courses I have completed on this platform include:
  - Intermediate R
  - Writing efficient R code
  - Object-Oriented Programming with S3 and R6 in R
  - Introduction to Python
  - Intermediate Python
  - Introduction to Tidyverse
  - Data Communication Concepts
  - Intermediate Importing Data in R
  - Introduction to Regression in R
  - Python Data Science Toolbox (Part 1)
  - Python Data Science Toolbox (Part 2)
  - Supervised Learning with scikit-learn
  - Unsupervised Learning in Python
  - Linear Classifiers in Python
- Rosalind ([])
  This platform is an excellent portal for hands-on learning of Python and of the principles of bioinformatics analysis. It covers various topics, including genome assembly, genome rearrangements, phylogeny, probability, string algorithms, and more.
- Coursera ([])
  A globally recognized platform that offers diverse lectures and hands-on exercises from top universities worldwide.
  The courses I have completed on this platform include:
  - Mathematical Biostatistics Boot Camp 1 ([])
  - Algorithms on Strings ([])
  - Finding Hidden Messages in DNA (Bioinformatics I) ([])
  - Biology Meets Programming: Bioinformatics for Beginners ([])
  - R Programming ([])

### Telegram channels
Telegram channels are an excellent way to stay informed about the latest news in the field. I also occasionally ask for expert advice on these channels and usually get the help I need. I follow these channels, all of which are in Russian:
- []
- []
- []
- []
- []
- []
- []
- []

### Slack channels
Similarly, there are many situations where you need to discuss difficulties, and a messaging platform like this is a great place to do so. I follow these channels:
- []
- []
- []

### YouTube channels
Here is my curated list of useful YouTube channels:
- [] - (in Russian) The Bioinformatics Institute channel with lectures
- [] - Data Science tutorials
- [] - Tips and tricks from a data scientist
- [] - (in Russian) The author writes in C++ but shares a lot of useful information about programming processes
- [] - Bytesize talks and conference videos from the nf-core community
- [] - Good old statistics in simple words

## Books
- R for Data Science by Hadley Wickham - []
- Python for Data Analysis by Wes McKinney - []