Organizing files to prevent duplication (but still automating backup)


In an ideal world, we’d have a “Google File System” right now, with journaling and even versioning built in and no need for folders; we’d just label files. I believe this is still far off.

My question here is: how to get as close to that world as possible, today?

Also, can we do it on a Mac and keep Dropbox OK with it?

Let me give an example… I was applying to many universities, and the applications had many files in common, including my own ID documents, most of which I already keep in a Dropbox folder. I wanted to group them in one folder per application, since each application also needs different files.

The common way to do this today is to copy the files. This is quite bad, because it generates many duplicates. If I want to update one of them, I have to update every copy, supposing I remember where each copy is. Copying files makes sense when you do want them to differ, that is, when you want to make a new version. Not just to organize them in folders.

For that, there are hardlinks and related things such as aliases. Both are problematic: hardlinks break too easily, and aliases are too big, 2 MB for a file that is actually 100 KB. Also, when copying either onto a pen drive, for instance, the “link” gets copied instead of the actual file data, and then it breaks.
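
To make the difference concrete, here is a minimal Python sketch. The file names are made up, and it assumes a file system that supports both hard links and symlinks. It shows that a hard link is just a second name for the same data, and that it silently stops tracking the original when a program saves by writing a new file and renaming it over the old one, which is one common way such links "break".

```python
import os
import tempfile

def link_demo(workdir):
    """Create a file, a hard link and a symlink to it, then replace the file."""
    original = os.path.join(workdir, "passport.pdf")    # hypothetical name
    hard = os.path.join(workdir, "hard_passport.pdf")
    soft = os.path.join(workdir, "soft_passport.pdf")

    with open(original, "wb") as f:
        f.write(b"x" * 1000)

    os.link(original, hard)      # second directory entry for the same data
    os.symlink(original, soft)   # small pointer file that stores the path

    result = {
        "same_inode": os.path.samefile(original, hard),
        "link_count": os.stat(original).st_nlink,
        "symlink_is_link": os.path.islink(soft),
    }

    # Simulate a "safe save": write a new file, then rename it over the old one.
    tmp = original + ".tmp"
    with open(tmp, "wb") as f:
        f.write(b"y" * 1000)
    os.replace(tmp, original)

    # The hard link still points at the old data, not at the new file.
    result["still_linked_after_save"] = os.path.samefile(original, hard)
    return result

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(link_demo(d))
```

The symlink keeps following the path after the save, but it breaks instead when the original is moved or renamed.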

Aliases also don’t work with Dropbox, which is my usual way to back up my most important files (I also have CrashPlan and Time Machine), while hardlinks eat into my account’s space quota.

I actually think there’s just no answer to this. But whatever happens, collectively, will be a good enough answer to me! :wink:


This isn’t Stack Exchange’s programming site; I don’t think you have to worry about that here. Ironically, I often get a lot of helpful advice from those “closed” questions, especially when there are no discussions in another, “more appropriate” forum.

So back to your question: you’re looking for some sort of de-duplication service that can be transferred across multiple operating systems, storage media, and online services. I’m going to go out on a limb here and say that it doesn’t exist - at least not something that doesn’t require some user interaction.

What I would do in your case is move the dupes to a shared folder, then create a file in each school’s folder called “Shared files.txt”. In that file, I’d list all the files needed for my application packet for that school.

It’s also possible that some compression utilities may de-duplicate files. I haven’t tested this, but you could try compressing 1 folder containing the same large image files. Then do it with 2 folders, then 3… see if the archive grows at a consistent rate, or if it just gets minutely bigger with each added folder. (I just tried this on Windows, using the built-in file zipper. No joy.)
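
The experiment above can be sketched in Python with the standard `zipfile` module. This is only an illustration: `zip_size` is a made-up helper, and random bytes stand in for image files that are already compressed. The zip format compresses each entry on its own, so identical files in different folders are not expected to share space.

```python
import io
import os
import zipfile

def zip_size(payload, copies):
    """Return the size in bytes of a zip holding `copies` identical files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for i in range(copies):
            # Same content, stored under a different folder each time.
            zf.writestr(f"folder{i}/photo.jpg", payload)
    return len(buf.getvalue())

if __name__ == "__main__":
    payload = os.urandom(200_000)  # stand-in for an incompressible image
    for copies in (1, 2, 3):
        print(copies, zip_size(payload, copies))
```

If the printed sizes grow by roughly the same amount for every extra copy, the archiver is not de-duplicating.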

If you’re a programmer, this may also be a good time to write your own tool to archive the files and create your own de-duplicated storage container… but you may find that the overhead of writing a tool for this far outweighs the savings you’re gaining. You may be better off just investing in a good Zip utility.
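
For a sense of how small such a tool could start, here is a rough Python sketch of a content-addressed store. It is a sketch under assumptions, not a finished archiver: `file_digest` and `add_folder` are hypothetical names, files are identified by their SHA-256 hash, and the store directory is assumed to live outside the folders being added. Each unique file is copied into the store once, and every folder gets a manifest mapping its file names to hashes.

```python
import hashlib
import os
import shutil

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def add_folder(store_dir, folder):
    """Copy each unique file under `folder` into `store_dir` once.

    Returns a manifest: {path relative to folder: digest}.
    """
    os.makedirs(store_dir, exist_ok=True)
    manifest = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            digest = file_digest(path)
            blob = os.path.join(store_dir, digest)
            if not os.path.exists(blob):   # content already stored? skip the copy
                shutil.copyfile(path, blob)
            manifest[os.path.relpath(path, folder)] = digest
    return manifest
```

Restoring a folder would mean walking its manifest and copying each blob back out under its original name, which is left out here.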


Haha, you caught me! I actually posted the very same question on Super User and just pasted it here… Forgot to remove that part! :stuck_out_tongue:

Not looking for a service. Just a solution or ideas on how to better organize my files when I kinda feel the need to duplicate / hardlink them.

The TXT file is not a bad idea; I hadn’t even thought of it. But it wouldn’t be any better than hardlinks. The TXT “link” (which really means manually looking up the name) would still break if the name or location changes, and it’s even harder to notice when it does break. Even while it’s not broken, it’s worse than a hardlink, because I still have to follow each “link” manually.
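
Noticing the breakage, at least, could be automated. Here is a minimal Python sketch, assuming the text file lists one file name per line and that all shared files sit in a single folder; `missing_entries` is a made-up helper name.

```python
import os

def missing_entries(manifest_path, shared_dir):
    """Return the names listed in the manifest that are not in shared_dir."""
    missing = []
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            name = line.strip()
            # Skip blank lines; report names with no matching file.
            if name and not os.path.exists(os.path.join(shared_dir, name)):
                missing.append(name)
    return missing
```

Running it over each application folder’s list would flag renamed or moved files, though it still would not open them for me.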

A compression utility would actually be a neat idea if it worked. I just tried it with regular zip, also with no luck. But even if it did work, there’s a drawback: I couldn’t use it when the folders are too far apart. Which zip utility do you recommend other than 7-Zip? I couldn’t find a new version of 7z for Mac, though. All 3 listed on the official site are outdated!

And I’d rather stick with my relatively small issue of redundant wasted bits than try to write a program to solve it at this point in my life. I was just hoping someone with different priorities might have already figured out a good way around it. If there is such a solution, I feel I’m still quite far from it.


So, my answer so far is: 123 views. If 12 of those actually read it, then we’ve got some random internet numbers from geeks who come here saying “yeah, sorry, no answer indeed”. :stuck_out_tongue:


When will Google release their file system in some way we can install on any desktop?! :anguished:

Don’t bring me that Hadoop bullcrap! I just want something like Dropbox to store files without duplication, with tagging instead of folders and a search that actually works! :stuck_out_tongue_winking_eye:


There really isn’t anything like that. OS X / iCloud have tagging support, but it’s clearly not cross-platform, and iCloud Drive doesn’t really exist yet. I’m not certain whether the tags work over Dropbox or not, although I vaguely think they do.

And I have no idea how to deduplicate.


Actually, the closest thing we’ve got right now is Google Drive, I’d say! :wink:

At least they got the searching part right, which is the most important one.

But yeah, I’ve never heard of a good deduplication system either! :frowning:


Actually, maybe figuring out how to reduce alias file size could help a lot!

And now that Dropbox has 1 TB, maybe they will also resolve this issue for us soon… :smiley: (although Dropbox search is still quite poor :frowning:)