Any tips to help a scientist become a better programmer? - eviltoast

Hey there!

I’m a chemical physicist who has been using python (as well as matlab and R) for a lot of different tasks over the last ~10 years, mostly for data analysis but also to automate certain tasks. I am almost completely self-taught, and though I have gotten help and tips from professors throughout the completion of my degrees, I have never really been educated in best practices when it comes to coding.

I have some friends who work as developers but have a similar academic background as I do, and through them I have become painfully aware of how bad my code is. When I write code, it simply needs to do the thing, conventions be damned. I do try to read up on the “right” way to do things, but the holes in my knowledge become pretty apparent pretty quickly.

For example, I have never written a class and I wouldn’t know why or where to start (something to do with the init method, right?). I mostly just write functions and scripts that perform the tasks that I need, plus some work with jupyter notebooks from time to time. I only recently got started with git and uploading my projects to github, just as a way to try to teach myself the workflow.

So, I would like to learn to be better. Can anyone recommend good resources for learning programming, but perhaps that are aimed at people who already know a language? It’d be nice to find a guide that assumes you already know more than a beginner. Any help would be appreciated.

  • Diplomjodler@feddit.de
    link
    fedilink
    arrow-up
    14
    arrow-down
    2
    ·
    8 months ago

    Forget everything you hear about OOP and just view it as a way to improve code readability. Just rewrite something convoluted with a class and you’ll se what they’re good for. Once you’ve got over the mental blockade, it’ll all make more sense.

    • WolfLink@lemmy.ml
      link
      fedilink
      arrow-up
      5
      ·
      8 months ago

      To add to this, there are kinda two main use cases for OOP. One is simply organizing your code by having a bunch of operations that could be performed on the same data be expressed as an object with different functions you could apply.

      The other use case is when you have two different data types where it makes sense to perform the same operation but with slight differences in behavior.

      For example, if you have a “real number” data type and a “complex number” data type, you could write classes for these data types that support basic arithmetic operations defined by a “numeric” superclass, and then write a matrix class that works for either data type automatically.

      • ALostInquirer@lemm.ee
        link
        fedilink
        arrow-up
        2
        ·
        8 months ago

        One is simply organizing your code by having a bunch of operations that could be performed on the same data be expressed as an object with different functions you could apply.

        Not OP, but also interested in wrapping my head around OOP and I still struggle with this in a few different respects. If what I’m writing isn’t a full program, but more like a few functions to process data, is there still a use case for writing it in an OOP style? Say I’m doing what you describe, operating on the same data with different functions, if written properly couldn’t a program do this even without a class structure to it? 🤔

        Perhaps it’s inelegant and terrible in the long term, but if it serves a brief purpose, is it more in the case of long term use that it reveals its greater utility?

        • WolfLink@lemmy.ml
          link
          fedilink
          arrow-up
          2
          ·
          edit-2
          8 months ago

          Say I’m doing what you describe, operating on the same data with different functions, if written properly couldn’t a program do this even without a class structure to it? 🤔

          Yeah thats kinda where the first object oriented programming came from. In C (which doesn’t have classes) you define a struct (an arrangement of data in memory, kinda like a named tuple in Python), and then you write functions to manipulate those structs.

          For example, multiplying two complex vectors might look like:

          ComplexVectorMultiply(myVectorA, myVectorB, &myOutputVector, length);

          Programmers decided it would be a lot more readable if you could write code that looked like:

          myOutputVector = myVectorA.multiply(myVectorB);

          Or even just;

          myOutputVector = myVectorA * myVectorB;

          (This last iteration is an example of “operator overloading”).

          So yes, you can work entirely without classes, and that’s kinda how classes work under the hood. Fundamentally object oriented programming is just an organizational tool to help you write more readable and more concise code.

          • ALostInquirer@lemm.ee
            link
            fedilink
            arrow-up
            1
            ·
            8 months ago

            Thanks for elaborating! I’m pretty sure I’ve written some variations of the first form you mention in my learning projects, or broken them up in some other ways to ease myself into it, which is why I was asking as I did.

        • Turun@feddit.de
          link
          fedilink
          arrow-up
          2
          ·
          edit-2
          8 months ago

          I use classes to group data together. E.g.

          @dataclass.dataclass
          class Measurement:
              temperature: int
              voltage: numpy.ndarray
              current: numpy.ndarray
              another_parameter: bool
              
              def resistance(self) -> float:
                  ...
          
          measurements = parse_measurements()
          measurements = [m for m in measurements if m.another_parameter]
          plt.plot(
              [m.temperature for m in measurements], 
              [m.resistance() for m in measurements]
          )
          

          This is much nicer to handle than three different lists of temperature, voltage and current. And then a fourth list of resistances. And another list for another_parameter. Especially if you have more parameters to each measurement and need to group measurements by these parameters.