You have fallen into the simplest of traps.
You have written the Hello World program in OpenGL, and you now want to create something along the lines of iTunes. That is the wrong way of going about it.
Just to handle objects so that they move around each other correctly, and so that your 'camera' can fly past, up to, and rotate around them, you will need to create data structures for all objects that hold local, world and camera co-ordinates. Luckily for you, OpenGL has a lot of built-in routines that will automatically convert between the different view spaces for you. After that, you will be able to fly a camera past a static scene, but the camera will fly right through the objects. To make the camera bounce or just stop, you will need to add either bounding spheres or bounding boxes to your object models. Then, every time you move the camera, your code will need to test every object against every other object and against the camera to make sure no collisions occurred, covering the sphere-sphere, sphere-box, and box-box cases. After getting that working, you are going to need a physics routine that can use this collision data, together with the information about how fast each object was moving, to simulate how objects react in these situations.
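To give a feel for the collision part, here is a rough sketch in C++ of the three tests (sphere-sphere, sphere-box, box-box) plus a camera that simply stops when it would hit something. Everything in it is my own illustration and not part of OpenGL: the names Vec3, Sphere, Box, intersects and tryMoveCamera are made up, the boxes are assumed to be axis-aligned, and the camera is assumed to be wrapped in a small bounding sphere.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical minimal types; a real engine would use its own math library.
struct Vec3   { float x, y, z; };
struct Sphere { Vec3 center; float radius; };
struct Box    { Vec3 min, max; };   // axis-aligned bounding box (assumption)

inline float clampf(float v, float lo, float hi) {
    return std::max(lo, std::min(v, hi));
}

inline float distanceSquared(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Sphere-sphere: overlap when the centres are no further apart than the
// sum of the radii. Squared distances avoid a square root.
bool intersects(const Sphere& a, const Sphere& b) {
    const float r = a.radius + b.radius;
    return distanceSquared(a.center, b.center) <= r * r;
}

// Sphere-box: find the point of the box closest to the sphere centre,
// then check whether that point lies within the sphere.
bool intersects(const Sphere& s, const Box& b) {
    const Vec3 closest{ clampf(s.center.x, b.min.x, b.max.x),
                        clampf(s.center.y, b.min.y, b.max.y),
                        clampf(s.center.z, b.min.z, b.max.z) };
    return distanceSquared(s.center, closest) <= s.radius * s.radius;
}

// Box-box: two axis-aligned boxes overlap only if they overlap on all three axes.
bool intersects(const Box& a, const Box& b) {
    return a.min.x <= b.max.x && a.max.x >= b.min.x &&
           a.min.y <= b.max.y && a.max.y >= b.min.y &&
           a.min.z <= b.max.z && a.max.z >= b.min.z;
}

// Move the camera by delta unless its bounding sphere would then touch an
// object. Returns true if the move happened, false if it was blocked.
bool tryMoveCamera(Sphere& camera, const Vec3& delta,
                   const std::vector<Sphere>& spheres,
                   const std::vector<Box>& boxes) {
    const Sphere moved{ { camera.center.x + delta.x,
                          camera.center.y + delta.y,
                          camera.center.z + delta.z },
                        camera.radius };
    for (const Sphere& s : spheres)
        if (intersects(moved, s)) return false;
    for (const Box& b : boxes)
        if (intersects(moved, b)) return false;
    camera = moved;
    return true;
}
```

You would call tryMoveCamera once per frame with the movement you want to make. This only covers the 'just stop' case: testing every object against every other object costs on the order of n squared checks per frame, and bouncing needs the physics step on top of it.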
All in all, you have a monster amount of work ahead of you. That is why Torque is such an attractive option for developers, along with the Ogre engine: all of the above is already written.
Say you did all of the above. All you would have written is a very, and I mean very, basic engine. You wouldn't have any scripting or any AI; you would still need to write those.
I am not trying to put you off; plenty of people have done what you are attempting. If you have, say, 6 months of free time that you can dedicate to a project, then you will more than likely get very far with this. If you can't guarantee the time required for such an endeavour, you might end up feeling very frustrated with the whole process.