But the code did run before I tried to thread it, and each call gets a separate matchinput, so I'm not sure where the issue of synchronicity comes in yet. Do I need four different subroutines, one for each thread?
The code you originally posted isn't what you ended up with after adding the threading and the bug fixes. So I don't know what you mean by "the code did run before I tried to thread it", since you haven't posted any code that runs without threading. Perhaps you mean some version from before you started making threaded modifications, but if so, you haven't posted it, so I have no basis for comparing before and after threading. The only context you've posted is after adding threads.
If you have already run the current code without threading, you need to post that code. If you haven't run the current code without threading, you should do that test now, before attempting to proceed.
I suggested that you run the current match() function without threading. I mean the one that takes a single struct input*, not the one that you originally posted, which has no possibility of working as given.
You must do this unthreaded, and look very carefully at how you are passing back the output value(s). The code you've posted so far for passing back output values is a disaster. It doesn't make sense, and it is quite certainly wrong as last posted, as I previously pointed out (the code highlighted in green).
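As a concrete picture of that unthreaded test, here is a minimal sketch. The real struct input was never posted, so its fields (data, len, target, count) and the counting logic in match() are hypothetical placeholders. The point is only the shape: match() writes its output into the struct the caller owns, and the caller reads it back after the call returns, with no threads involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input/output record; the real struct input was never
   posted, so these fields are placeholders. */
struct input {
    const int *data;   /* values to search (input) */
    size_t len;        /* number of values (input) */
    int target;        /* value to look for (input) */
    size_t count;      /* number of matches found (output) */
};

/* Same signature a pthread start routine needs, but called directly here. */
static void *match(void *arg)
{
    struct input *in = arg;
    assert(in != NULL);
    in->count = 0;                      /* output lives in the caller's struct */
    for (size_t i = 0; i < in->len; i++)
        if (in->data[i] == in->target)
            in->count++;
    return NULL;
}

/* Unthreaded check: call match() once per input, in order, then read
   each result straight out of the struct the caller owns. */
size_t run_unthreaded(struct input *inputs, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        match(&inputs[i]);
        total += inputs[i].count;
    }
    return total;
}
```

If every inputs[i].count comes out right this way, the output-passing logic is sound, and only then is it worth adding threads.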
You don't need four different subroutines.
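One start routine is enough; what has to differ between threads is the argument each one receives. A minimal sketch, assuming a hypothetical struct input with a count output field and a hypothetical run_threaded() driver (neither comes from the code posted so far): all four threads run the same match(), each gets a pointer to its own element of an array, and the main thread joins every thread before reading any count.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4

struct input {              /* hypothetical placeholder record */
    const int *data;
    size_t len;
    int target;
    size_t count;           /* output, written only by the owning thread */
};

/* One start routine shared by every thread. Each thread gets a pointer
   to its own struct input, so no two threads write the same memory. */
static void *match(void *arg)
{
    struct input *in = arg;
    assert(in != NULL);
    in->count = 0;
    for (size_t i = 0; i < in->len; i++)
        if (in->data[i] == in->target)
            in->count++;
    return NULL;
}

/* Splits data into NTHREADS slices, runs match() on each slice in its
   own thread, and returns the combined count (or (size_t)-1 on error). */
size_t run_threaded(const int *data, size_t len, int target)
{
    pthread_t tid[NTHREADS];
    struct input inputs[NTHREADS];  /* one record per thread, owned by caller */
    size_t chunk = len / NTHREADS;
    int started = 0;

    for (int t = 0; t < NTHREADS; t++) {
        size_t start = (size_t)t * chunk;
        inputs[t].data = data + start;
        /* the last thread takes any leftover elements */
        inputs[t].len = (t == NTHREADS - 1) ? len - start : chunk;
        inputs[t].target = target;
        inputs[t].count = 0;
        if (pthread_create(&tid[t], NULL, match, &inputs[t]) != 0)
            break;
        started++;
    }

    /* Wait for every started thread before reading any result;
       inputs[] must stay alive until all of them have been joined. */
    size_t total = 0;
    for (int t = 0; t < started; t++) {
        pthread_join(tid[t], NULL);
        total += inputs[t].count;
    }
    return started == NTHREADS ? total : (size_t)-1;
}
```

The design choice that matters is that the array of struct input belongs to the main thread and outlives the workers; the function itself is identical for all four.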
I've already told you where to focus your attention: on the input and output parameters. In particular, you need to take a much closer look at output parameters, i.e. the calculated result that each thread is producing, and where it's storing that result, and how the memory is being created for storing that result.
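There are two common, safe places for a thread's result: a slot the caller allocates and keeps alive (such as an output field in each thread's own struct), or memory the thread allocates on the heap and hands back through pthread_join. The sketch below shows the second option; struct result, count_in_thread(), and the counting logic are hypothetical. What a thread must never do is return the address of one of its own local variables, because that storage disappears when the thread exits.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

struct input {          /* hypothetical: inputs only in this variant */
    const int *data;
    size_t len;
    int target;
};

struct result {         /* hypothetical output record */
    size_t count;
};

/* WRONG (do not do this): returning &local from a start routine.
   The local lives on the thread's stack, which is gone once the thread
   exits, so the pointer pthread_join hands back would dangle. */

/* The thread allocates its result on the heap and returns the pointer;
   ownership passes to whoever joins the thread. */
static void *match(void *arg)
{
    const struct input *in = arg;
    assert(in != NULL);
    struct result *res = malloc(sizeof *res);
    if (res == NULL)
        return NULL;                    /* joiner treats NULL as failure */
    res->count = 0;
    for (size_t i = 0; i < in->len; i++)
        if (in->data[i] == in->target)
            res->count++;
    return res;
}

/* Runs one thread and collects its heap-allocated result.
   Returns the count, or (size_t)-1 on any failure. */
size_t count_in_thread(const int *data, size_t len, int target)
{
    struct input in = { data, len, target };  /* must outlive the thread */
    pthread_t tid;
    void *ret = NULL;

    if (pthread_create(&tid, NULL, match, &in) != 0)
        return (size_t)-1;
    pthread_join(tid, &ret);            /* ret is what match() returned */
    if (ret == NULL)
        return (size_t)-1;

    size_t count = ((struct result *)ret)->count;
    free(ret);                          /* the joiner owns and frees it */
    return count;
}
```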
I truly have no more specific recommendation, because I can't tell exactly what your output values are supposed to be. I can't figure out exactly what you want, so I can't tell you how to achieve it.
You don't have any comments for 'sd'; the code involving sd is a disaster, and it's unclear what you intend to happen because you haven't explained what the output result (or results) from each thread (or from the combined threads) is supposed to be. In short: no context.
We haven't even gotten to the point of determining how the threads tell the main thread that they've finished their computation. That's going to be another necessary element if you're going to get the results out.
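The simplest way for the main thread to learn that a worker has finished is pthread_join, which blocks until that thread returns. If you'd rather have the workers report completion themselves, a shared counter guarded by a mutex and a condition variable is a common pattern. A minimal sketch follows; struct done_tracker, run_and_wait(), and the struct input fields are hypothetical. The threads are still joined at the end so the tracker isn't destroyed while they might be using it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4

/* Hypothetical shared completion tracker. */
struct done_tracker {
    pthread_mutex_t lock;
    pthread_cond_t all_done;
    int finished;               /* threads that have reported so far */
};

struct input {                  /* hypothetical per-thread record */
    const int *data;
    size_t len;
    int target;
    size_t count;               /* output */
    struct done_tracker *done;  /* where to report completion */
};

static void *match(void *arg)
{
    struct input *in = arg;
    assert(in != NULL);
    size_t count = 0;
    for (size_t i = 0; i < in->len; i++)
        if (in->data[i] == in->target)
            count++;

    pthread_mutex_lock(&in->done->lock);
    in->count = count;          /* publish the result under the lock */
    in->done->finished++;
    pthread_cond_signal(&in->done->all_done);
    pthread_mutex_unlock(&in->done->lock);
    return NULL;
}

/* Returns the combined count, or (size_t)-1 if a thread failed to start. */
size_t run_and_wait(const int *data, size_t len, int target)
{
    struct done_tracker done;
    pthread_mutex_init(&done.lock, NULL);
    pthread_cond_init(&done.all_done, NULL);
    done.finished = 0;

    struct input inputs[NTHREADS];
    pthread_t tid[NTHREADS];
    size_t chunk = len / NTHREADS;
    int started = 0;
    for (int t = 0; t < NTHREADS; t++) {
        size_t start = (size_t)t * chunk;
        inputs[t] = (struct input){ data + start,
            t == NTHREADS - 1 ? len - start : chunk, target, 0, &done };
        if (pthread_create(&tid[t], NULL, match, &inputs[t]) != 0)
            break;
        started++;
    }

    /* Sleep until every started thread has reported in; the while loop
       guards against spurious wakeups. */
    pthread_mutex_lock(&done.lock);
    while (done.finished < started)
        pthread_cond_wait(&done.all_done, &done.lock);
    size_t total = 0;
    for (int t = 0; t < started; t++)
        total += inputs[t].count;
    pthread_mutex_unlock(&done.lock);

    /* Workers may still be returning; join before destroying the tracker. */
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    pthread_cond_destroy(&done.all_done);
    pthread_mutex_destroy(&done.lock);
    return started == NTHREADS ? total : (size_t)-1;
}
```

For a fixed set of four workers, plain pthread_join on each one is usually all you need; the counter-and-condition-variable version only pays off when the main thread wants a single "all finished" signal from the workers themselves.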