Any tips to help a scientist become a better programmer? - eviltoast

Hey there!

I’m a chemical physicist who has been using Python (as well as MATLAB and R) for a lot of different tasks over the last ~10 years, mostly for data analysis but also to automate certain tasks. I am almost completely self-taught, and although I have gotten help and tips from professors while completing my degrees, I have never really been taught best practices for coding.

I have some friends who work as developers but have an academic background similar to mine, and through them I have become painfully aware of how bad my code is. When I write code, it simply needs to do the thing, conventions be damned. I do try to read up on the “right” way to do things, but the holes in my knowledge become apparent pretty quickly.

For example, I have never written a class, and I wouldn’t know why I should or where to start (something to do with the init method, right?). I mostly just write functions and scripts that perform the tasks I need, plus some work in Jupyter notebooks from time to time. I only recently got started with Git and uploading my projects to GitHub, just as a way to teach myself the workflow.

So, I would like to learn to be better. Can anyone recommend good resources for learning programming that are aimed at people who already know a language? It’d be nice to find a guide that assumes you already know more than a beginner. Any help would be appreciated.

  • Turun@feddit.de
    8 months ago

    I use classes to group data together. E.g.

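    import dataclasses

    import matplotlib.pyplot as plt
    import numpy

    @dataclasses.dataclass
    class Measurement:
        temperature: int
        voltage: numpy.ndarray
        current: numpy.ndarray
        another_parameter: bool

        def resistance(self) -> float:
            # compute the resistance from self.voltage and self.current here
            ...

    # parse_measurements() stands for whatever loads your data into Measurement objects
    measurements = parse_measurements()
    # keep only the measurements where another_parameter is set
    measurements = [m for m in measurements if m.another_parameter]
    plt.plot(
        [m.temperature for m in measurements],
        [m.resistance() for m in measurements],
    )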
    

    This is much nicer to handle than three separate lists for temperature, voltage and current, plus a fourth list of resistances and a fifth for another_parameter. The difference grows once each measurement has more parameters and you need to group measurements by them.
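
To make the grouping point concrete, here is a minimal, self-contained sketch. Several things in it are assumptions rather than part of the snippet above: plain Python lists stand in for the numpy arrays so it runs with only the standard library, resistance() is filled in as the least-squares slope of voltage against current (which only makes sense for an ohmic sample), the data are made up, and group_by_flag is a hypothetical helper name.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Measurement:
    temperature: int
    voltage: list[float]   # assumption: plain lists instead of numpy arrays
    current: list[float]
    another_parameter: bool

    def resistance(self) -> float:
        # Assumption: ohmic behaviour, so R is the slope of a
        # straight-line least-squares fit of voltage against current.
        i_mean = fmean(self.current)
        v_mean = fmean(self.voltage)
        numerator = sum(
            (i - i_mean) * (v - v_mean)
            for i, v in zip(self.current, self.voltage)
        )
        denominator = sum((i - i_mean) ** 2 for i in self.current)
        return numerator / denominator


def group_by_flag(measurements):
    """Hypothetical helper: group measurements by another_parameter."""
    groups = defaultdict(list)
    for m in measurements:
        groups[m.another_parameter].append(m)
    return dict(groups)


# Made-up data: three current sweeps with different slopes (resistances).
current = [0.0, 0.25, 0.5, 0.75, 1.0]
measurements = [
    Measurement(300, [2.0 * i for i in current], current, True),
    Measurement(350, [3.0 * i for i in current], current, False),
    Measurement(400, [4.0 * i for i in current], current, True),
]

groups = group_by_flag(measurements)
temperatures = [m.temperature for m in groups[True]]
resistances = [m.resistance() for m in groups[True]]
```

Because every Measurement carries all of its own fields, filtering or grouping never has to keep several parallel lists in sync by index. The two resulting lists can be passed straight to plt.plot as in the snippet above.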