Basically, you analyze it, then you measure it.
If threads contend for any shared or limited resource, then the overhead of resolving that contention may be high. This includes coordination, such as exclusive access, as well as cooperation, such as sharing a network interface.
If threads operate on independent resources, and only coordinate briefly at the start and end of the task, then contention may be low. This is the ideal, most scalable case.
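To make the contrast concrete, here's a minimal Python sketch (hypothetical counters, not any specific workload) timing the same amount of work done under one shared lock versus independently per thread with a merge at the end. Note that in CPython the GIL already serializes bytecode execution, but the per-increment lock acquisition still adds measurable overhead to the contended version:

```python
import threading
import time

N_THREADS = 4
ITERATIONS = 100_000

# Contended: every thread increments one shared counter under one lock.
lock = threading.Lock()
shared = [0]

def contended():
    for _ in range(ITERATIONS):
        with lock:
            shared[0] += 1

# Independent: each thread accumulates locally; results merge only at the end.
results = [0] * N_THREADS

def independent(i):
    total = 0
    for _ in range(ITERATIONS):
        total += 1
    results[i] = total

def time_threads(make_threads):
    threads = make_threads()
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

t_contended = time_threads(
    lambda: [threading.Thread(target=contended) for _ in range(N_THREADS)])
t_independent = time_threads(
    lambda: [threading.Thread(target=independent, args=(i,))
             for i in range(N_THREADS)])

# Both strategies do the same total work...
print(shared[0], sum(results))
# ...but the contended version pays lock overhead on every increment.
print(f"contended: {t_contended:.3f}s  independent: {t_independent:.3f}s")
```

Both versions compute the same total; only the cost of getting there differs, which is exactly the overhead the paragraph above is talking about.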
If the task is a mixed sequence of contended and independent work, then analysis and prediction are even harder.
In general, you make some predictions based on the type of task, the time the task itself takes, and the time-varying nature of contention resolution. There's a whole area of Computer Science that deals with contention resolution, timing optimization, and doing work within limits or restrictions. With predictions made, and the code written and tested, you measure the actual code running the actual task and see how close your predictions were.
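One classic prediction tool from that area is Amdahl's law, which caps the achievable speedup by the serial (i.e. contended) fraction of the task. A minimal sketch, where the 10% serial fraction is purely an illustrative assumption:

```python
def amdahl_speedup(serial_fraction, n_threads):
    # Amdahl's law: the serial fraction runs at full cost no matter
    # how many threads you add; only the rest divides by n_threads.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_threads)

# If 10% of the task is serial/contended, 21 threads buys at most:
print(round(amdahl_speedup(0.10, 21), 2))  # → 7.0
```

That's the prediction half; measuring the real code then tells you how honest your estimate of the serial fraction was.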
Detailed analysis and measurement may be too costly. In that case you just code things so they don't do anything too stupid, test under a range of loads, and hope for the best. Optimal isn't needed if Good Enough is present.
Without knowing exactly what the task is and what it's actually doing, there's no way to tell if 21 threads is a good number.
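If you do want an answer for your specific task, the cheapest route is to sweep thread counts and let the numbers decide. A rough sketch, where the sleep-plus-arithmetic task is a purely hypothetical stand-in for whatever your real task is:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def task():
    # Hypothetical stand-in: a short I/O-style wait plus a little CPU work.
    time.sleep(0.005)
    return sum(range(1000))

def measure(n_threads, n_tasks=40):
    # Time n_tasks runs of the task through a pool of n_threads workers.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(lambda _: task(), range(n_tasks)))
    return time.perf_counter() - start

for n in (1, 2, 4, 8, 16, 21):
    print(f"{n:2d} threads: {measure(n):.3f}s")
```

For an I/O-bound task like this stand-in, the curve flattens once threads cover the wait time; for a CPU-bound or lock-heavy task it can flatten, or even turn upward, much earlier. Where your curve bends is the only trustworthy answer to "is 21 a good number?".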