Should Spotlight overtake Finder?

leman · May 3, 2016

Shirasaki said:
For bundle stuff, xxx.app in Mac OS X is what you want. Basically it is a folder. But Apple encapsulates it as a package and opens it as one. A read/write disk image is similar to your bundle concept too.

Bundles are very useful, but all they do is give additional semantics to a traditional hierarchical container. They don't solve the problem I was referring to (which is about multiple simultaneous ways of organising files).

Shirasaki said:
But current computers are optimised for this hierarchical way of organising files. Teaching a computer to think and store information the way a human does still has a long way to go.

Why would you think that? BTW, what I was describing is miles away from making computers think like humans. Non-hierarchical data storage has existed for decades. Look, for example, at relational databases.
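
As a rough illustration of the relational point, here is a minimal sketch using Python's built-in sqlite3 module. The table names, file names and categories are all invented for the example; it only shows how a join table lets one file belong to several groups at once, with no single parent.

```python
import sqlite3

# Hypothetical schema and sample rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files      (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, label TEXT);
    -- The join table lets one file belong to any number of categories.
    CREATE TABLE membership (file_id INTEGER, category_id INTEGER);
""")
conn.executemany("INSERT INTO files VALUES (?, ?)",
                 [(1, "interview01.wav"), (2, "interview01.txt"), (3, "notes.md")])
conn.executemany("INSERT INTO categories VALUES (?, ?)",
                 [(1, "session-01"), (2, "restricted"), (3, "text")])
conn.executemany("INSERT INTO membership VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1), (2, 3), (3, 3)])


def files_in(label):
    """Return the names of all files that belong to the given category."""
    rows = conn.execute(
        """SELECT f.name FROM files f
           JOIN membership m ON m.file_id = f.id
           JOIN categories c ON c.id = m.category_id
           WHERE c.label = ? ORDER BY f.name""", (label,))
    return [name for (name,) in rows]
```

Here interview01.wav shows up under both "session-01" and "restricted", which is exactly the kind of membership a single folder tree cannot express without copies or links.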

Shirasaki said:
But what if you look at an individual element? How would you find the exact location of that element? We need clues to find it. And most of the time, multiple clues may be able to pinpoint that element. We treat that element as the "root", then all clues lie in a certain hierarchy level, or a "tree".

There is no doubt that hierarchy is a very powerful abstraction. Ultimately, it deals with groups and relations between the groups. The particular model that hierarchical FS adopt is that of meronymy. However, meronymy (the part-whole relationship) is not sufficient to represent many very relevant situations. We don't need to go far: the problem is well known in any area where organisation of and quick access to data are needed.

Let us, for example, look at how a library organises books (and I am talking here about proper, big libraries, not a generic small-town library). Imagine you have a very large number of books that you want to organise. If we are limited to tree structures only, we have a problem. We could, for instance, sort the books by the author's name (e.g. have a folder hierarchy of name prefixes, ending in folders containing works by the same author), by year, or both. However, some books have multiple authors. How do you handle those? And what if you also want to organise books by general topic or by keywords? The way libraries dealt with this was to introduce indexes: basically cabinets with cards, where each cabinet was organised around a different piece of metadata and the cards pointed to the actual book. The books themselves are usually not sorted by any obvious criterion, but rather identified by their location in storage (e.g. row/shelf/position). Library indexes are a radical departure from a standard hierarchy. Instead, you have multiple hierarchies, which are populated by the same items. You find things by consulting the hierarchy that is relevant to your needs or by cross-referencing different hierarchies. Of course, nowadays nobody really uses paper indexes in libraries anymore; instead they use a (relational) database.
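
The card cabinets can be sketched in a few lines of Python. This is only a toy model under my own assumptions: the shelf positions, titles, authors and topics are made up, and each "cabinet" is just a dictionary from a metadata value to the set of shelf positions that carry it.

```python
from collections import defaultdict

# Hypothetical catalogue keyed by shelf position (row/shelf/position).
books = {
    "R3/S2/P14": {"authors": ["Adams", "Baker"], "topics": ["maths"]},
    "R1/S7/P02": {"authors": ["Baker"], "topics": ["linguistics", "methods"]},
    "R9/S1/P30": {"authors": ["Clark"], "topics": ["linguistics"]},
}


def build_index(field):
    """Build one 'cabinet': map each value of `field` to shelf positions."""
    index = defaultdict(set)
    for position, record in books.items():
        for value in record[field]:
            index[value].add(position)
    return index


by_author = build_index("authors")
by_topic = build_index("topics")

# Cross-referencing two cabinets is a set intersection.
baker_linguistics = by_author["Baker"] & by_topic["linguistics"]
```

The book with two authors simply gets a card under each name, and the books themselves never move: only the indexes know about authors and topics.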

Now, modern OSes have introduced many features to deal with this inefficiency of the hierarchical FS. We have Spotlight, which basically creates a 'shadow' FS by maintaining a database of indexes, very similar to a library index. We also have tags, which allow us to introduce groups that cross the hierarchy. We also have custom add-on indexes, such as the ones employed by apps like iTunes or Photos. However, all this is still quite rudimentary and does not remedy the underlying problem: the fact that files are artificially forced into a hierarchy.

I work with large data collections every day. Hierarchies might be sufficient for very simple data collections, but they are not nearly good enough for real-world data organisation (as the library indexes illustrate). Take my previous example with different files referring to the same resource. The real problem is not just that it's quite easy to damage the integrity of the collection, but also that the different files have different usage policies. Video and audio data are subject to a privacy policy and only accessible to a handful of people. Other files can be accessed more liberally. The backup policy also differs between types of data. Different files can be in different processing states, and so on. Right now, we deal with this using an in-house data management system that links together archives in different locations as well as a versioned repository, and automatically creates a bunch of hard links to provide different views onto the data. It's unwieldy, complex, and difficult to maintain. Now, if we had a FS that allowed us to simultaneously group files in different hierarchies, most of that would not be necessary.
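
To give an idea of the hard-link trick, here is a minimal sketch. It is not our actual system; the directory layout and file name are hypothetical. It stores one file in an archive folder and exposes it through two parallel "view" hierarchies using os.link.

```python
import os
import tempfile


def build_views(root):
    """Create one archived file and expose it under two 'view' directories."""
    archive = os.path.join(root, "archive")
    by_session = os.path.join(root, "views", "by-session", "s01")
    by_policy = os.path.join(root, "views", "by-policy", "restricted")
    for directory in (archive, by_session, by_policy):
        os.makedirs(directory)

    original = os.path.join(archive, "rec_0001.wav")
    with open(original, "wb") as handle:
        handle.write(b"audio")

    views = [os.path.join(by_session, "rec_0001.wav"),
             os.path.join(by_policy, "rec_0001.wav")]
    for view in views:
        os.link(original, view)  # hard link: another name for the same data
    return original, views


with tempfile.TemporaryDirectory() as root:
    original, views = build_views(root)
    # All three names should resolve to the same underlying file.
    same_file = all(os.path.samefile(original, v) for v in views)
    link_count = os.stat(original).st_nlink
```

Note that hard links only work within a single volume, so archives in different locations still need something on top, which is part of why the real thing gets complicated.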

Another example: my dissertation consists of a hierarchy of files, organised by chapter. Within every chapter are MultiMarkdown files as well as LaTeX files with figures and so on. I have a fairly complex build script that processes these files, assembles them and runs the typesetting routine. The problem, however, is that different files need to be processed quite differently. Having multiple hierarchies (chapters, figure/text/code) or even anonymous containers (same section etc.) would simplify the handling of the data dramatically and at the same time be closer to how the data is organised conceptually.
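
A build script can fake the second hierarchy by deriving it from the paths, roughly as in the sketch below. The file list and the extension-to-kind mapping are assumptions made up for the example, not my real project layout.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical file list; the extension-to-kind mapping is an assumption.
paths = ["ch1/intro.md", "ch1/fig1.tex", "ch2/method.md", "ch2/plot.py", "ch2/fig2.tex"]
KIND = {".md": "text", ".tex": "figure", ".py": "code"}

by_chapter = defaultdict(list)  # the hierarchy the file system already gives us
by_kind = defaultdict(list)     # the parallel grouping the build needs
for raw in paths:
    path = PurePosixPath(raw)
    by_chapter[path.parts[0]].append(raw)
    by_kind[KIND[path.suffix]].append(raw)
```

The point is that the "kind" grouping has to be reconstructed by convention on every run, because the file system itself can only store the chapter one.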

Shirasaki said:
But I believe the current hierarchically designed filesystem is the best we can get. And the problem you mention is already solved.

Hierarchical FS are 'good enough'. They are powerful while remaining relatively simple (and even then, implementing a FS is a task of insane complexity). We have different tools to deal with the shortcomings of hierarchies, but I am fairly confident that the hierarchical abstraction will be abandoned sooner or later. As you may have guessed, I personally find it quite limiting. I can work with it, but it's neither accurate nor convenient.

Shirasaki · May 3, 2016

First, thank you for your lengthy reply.
It seems that you have a very good understanding of how to manage data, which I don't.
Could you elaborate on a few points from your latest post so that everyone here has a better chance of following what you are saying?
First, of course, is this bundle thing. You want something like a bundle, but capable of "multiple simultaneous ways of organising files".
Then, I am aware of relational databases, although I don't use them much and don't understand their under-the-hood structure. But you say what you describe is miles away from making computers think like humans. This is mind-blowing.
Following on from the previous point, you mention that many organisations require quick data access, and that this demand becomes harder to satisfy with the current hierarchical FS. Could you say a little more about the "relevant situations"?
And on how libraries organise books: a library uses indexes, numbers and certain hierarchies because if it managed books by "title, author, publisher" alone, it would be very hard to write a proper index that lets readers quickly find a book. You may argue a library could use the first letter of the book title or the author name. However, books are often written by multiple authors, so this is not that practical. Therefore, every book in the library has a unique number, while multiple copies of the same title share that number. That way, a reader can quickly look up the index number and find the book himself/herself. OK. This is just another way to do things.

And how humans organise large amounts of data, where chunks of data may have connections between them, is largely related to how humans memorise things. We also need to understand how humans think in order to design a filesystem or, more generally, a data management system that mimics how humans do it. I believe hierarchy is one such way, and we actively use it every day.

In your company, file collections are categorised into different groups based on policies. I can think of such a policy as a "branch" of a "tree" for an individual file. We first know "I want a file which can only be viewed/edited by certain people". This is the starting point of the branch. Then I either find the people responsible for such files and trace back to the exact file I want, or use the system to find files complying with the policy and search among them. Both are also hierarchical methods of finding files. It seems that humans not only use a hierarchical approach to find things, they also use relations to help memorise them. But relations can also be abstracted into a special hierarchy.

BTW, I am thrilled that I can have a chance to discuss a topic like this.

leman · May 4, 2016

Shirasaki said:
Following on from the previous point, you mention that many organisations require quick data access, and that this demand becomes harder to satisfy with the current hierarchical FS. Could you say a little more about the "relevant situations"?

I think the examples I gave should give some basic idea about what I mean. Generally, I am talking about situations where there are good reasons for organising data in accordance with multiple simultaneous structures.

Shirasaki said:
And on how libraries organise books: a library uses indexes, numbers and certain hierarchies because if it managed books by "title, author, publisher" alone, it would be very hard to write a proper index that lets readers quickly find a book. You may argue a library could use the first letter of the book title or the author name. However, books are often written by multiple authors, so this is not that practical. Therefore, every book in the library has a unique number, while multiple copies of the same title share that number. That way, a reader can quickly look up the index number and find the book himself/herself. OK. This is just another way to do things.

Yep, that's what I meant. The reader can consult the relevant index (author, topic, etc.) to get the ID of the book and then either fetch the book themselves or order it (as is usually the case with larger libraries).

Shirasaki said:
And how humans organise large amounts of data, where chunks of data may have connections between them, is largely related to how humans memorise things. We also need to understand how humans think in order to design a filesystem or, more generally, a data management system that mimics how humans do it. I believe hierarchy is one such way, and we actively use it every day... Both are also hierarchical methods of finding files. It seems that humans not only use a hierarchical approach to find things, they also use relations to help memorise them. But relations can also be abstracted into a special hierarchy.

There is no doubt about that! I'd even say that it's not just hierarchies but categorisation in general (of which hierarchy is a particular case). I am certainly not advocating abandoning hierarchies in FS, just making them more flexible, that is, allowing one to maintain multiple parallel hierarchies.

Again, OS X and Finder already have some features of this sort, such as tags and All My Files. They are also very fast. This is achieved by having an additional database that maintains indexes which are used to accelerate search queries.

throAU · May 4, 2016

Shirasaki said:
For bundle stuff, xxx.app in Mac OS X is what you want. Basically it is a folder. But Apple encapsulates it as a package and opens it as one. A read/write disk image is similar to your bundle concept too. We create an image, then put everything related together.
About the inner constraints or, more generally, relationships: the current filesystem may not be able to provide this feature. But current computers are optimised for this hierarchical way of organising files. Teaching a computer to think and store information the way a human does still has a long way to go.

And about your graph. We see this graph "as a graph", so we know the entire picture of it. But what if you look at an individual element? How would you find the exact location of that element? We need clues to find it. And most of the time, multiple clues may be able to pinpoint that element. We treat that element as the "root", then all clues lie in a certain hierarchy level, or a "tree".

I am neither a professional computer scientist nor a neuroscientist. But I believe the current hierarchically designed filesystem is the best we can get. And the problem you mention is already solved.

My two cents.

OS X (both the filesystem, and Finder) already supports file tags. You don't need to rely purely on the filesystem hierarchy. Just tag your files with relevant tags and you can search by tag.

Where the file is on the filesystem is not the only way of organising stuff any more.

So it's already a little better than your last statement suggests.

jpn · May 4, 2016

Related: lately I have been using the All My Files heading in the left sidebar. It's way more useful than Command + Space, given its ability to re-order the list by time, size, date created or modified, Kind, or whatever. That helps when I have created a lot of files and just let them spread across multiple folders, which is how it happens sometimes when you output a lot of files over a couple of days.

Partron22 · May 4, 2016

leman said:
OS X aliases continue working even if the file was moved. Try it out.

Try moving it to another drive. That does not always work.

Shirasaki · May 4, 2016

leman said:
There is no doubt about that! I'd even say that it's not just hierarchies but categorisation in general (of which hierarchy is a particular case). I am certainly not advocating abandoning hierarchies in FS, just making them more flexible, that is, allowing one to maintain multiple parallel hierarchies.

Again, OS X and Finder already have some features of this sort, such as tags and All My Files. They are also very fast. This is achieved by having an additional database that maintains indexes which are used to accelerate search queries.

So, if you want multiple parallel hierarchies, just create multiple branches and manage them. But you say this is not what you want. OK. I am stuck at hierarchy stuff and I am lost.

I stop here. You win.

leman · May 4, 2016

It's not about winning or losing.

Through discussing these things we get to contemplate new possibilities. That's at least what I find to be the most valuable thing.

Jess13 · May 4, 2016

The better option would be to have Apple expand the functionality of both Finder and Spotlight, without removing or crippling either. More options, functionality and enhancements, not less. Personally, I would like to have a Start Menu-like functionality added to Finder in the Dock. So when I click the Finder icon I can have it open a mouse-through set of panels to navigate to certain folder areas in Finder instead of having to first open a new Finder window then navigate by clicking my way through. Mouse-through panels with hover functionality not having to tap or click, from inside Finder in the Dock, please. Thanks, Apple.

This is functionality-limiting

mildocjr · May 4, 2016

I'd like to keep the Finder as is; every other operating system out there uses the same type of structure: Windows, Linux, and Unix. Launchpad was a gimmick, that's it. I too put my Applications folder in my Dock just for easy access, but if I forget the name of an app, I use Spotlight (cmd + space) as a quick, easy shortcut to get me where I need to be.

Mac OS is also marketed more towards professional artists, musicians, journalists, and filmmakers. I know that industry requirements have exceeded some of the current-generation Mac capabilities, but that's for another thread. However, these professionals need Finder and a hierarchical structure so they can organize their work, not to mention gain access to otherwise hidden areas such as the ~/Library folder. (If you don't know what it is, browse it but don't touch it; read a guide before you do.)

As was mentioned previously, if you are looking for an OS that never projects a file system at you, look at an iPad with a keyboard. I mention the iPad because Android still lets you have some access to a file system. The idea behind a file system was simple to begin with: it was the best computer representation of a filing cabinet filled with folders and files.

leman said:
OS X aliases continue working even if the file was moved. Try it out. Aliases bypass the hierarchical filesystem and reference the FS storage directly. I have no idea about the details though.

The details are fairly simple, although an alias is not quite a soft link (or symbolic link). A symbolic link stores only the path to its target, similar to an Internet or network shortcut on Windows, so it breaks when the target is moved or renamed. As far as I know, an alias stores the path, e.g. /Users/Username/Documents/MyEssay.pages, together with a file-system identifier for the file. It's not really saying the file is on the disk at sector 10234124; when you open an alias, the system uses the stored path and the identifier to locate the file, even after it has been moved within the same volume. The identifier only means something on the original volume, which would explain why moving the file to another drive does not always work.

This is different from a hard link, which is not like a Windows shortcut at all. A hard link stores no path: it is simply a second directory entry for the same file data, so it keeps working when the original name is moved or deleted, but it can only live on the same volume as the file.
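
If you want to check how the two kinds of link behave when the target is moved, a small experiment like the one below should do it on a POSIX system. It is only a sketch: the file names are made up, and it does not reproduce Finder aliases, which are a separate mechanism.

```python
import os
import tempfile


def links_after_move(root):
    """Move a file and report whether each kind of link still opens it."""
    target = os.path.join(root, "MyEssay.txt")
    with open(target, "w") as handle:
        handle.write("draft")

    soft = os.path.join(root, "soft-link")
    hard = os.path.join(root, "hard-link")
    os.symlink(target, soft)  # stores the path of the target
    os.link(target, hard)     # second directory entry for the same data

    moved_dir = os.path.join(root, "moved")
    os.mkdir(moved_dir)
    os.rename(target, os.path.join(moved_dir, "MyEssay.txt"))

    # os.path.exists follows symbolic links, so a dangling link reports False.
    return {"symlink": os.path.exists(soft), "hardlink": os.path.exists(hard)}


with tempfile.TemporaryDirectory() as root:
    result = links_after_move(root)
```

After the move, the symbolic link dangles because its stored path no longer exists, while the hard link still reaches the data.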
