Thursday, February 18, 2016

Backups, Kickstarter Ideas, and ReFS

Warning: Nothing in this blog post relates to photography.  I do this from time to time.

In this Issue:

  • ReFS on Windows 10
  • How Not to Back Things Up
  • 4 Kickstarter Ideas that Solve Major Societal Problems:
    • Uber for Seniors
    • E-book Authoring Tools
    • Encryption That Makes Everyone Happy
    • Class-Action Suit against Credit Card Companies

ReFS on Windows 10

If there was ever a compelling reason to upgrade to Windows 10, it would be the optional ReFS file system, which was designed to combat the bitrot problem described in my previous blog posts.  After doing the research for those posts, I had committed to buying a NAS4Free server (running ZFS) and attaching it to my home network to solve the problem.  It was cheap and it was available now.

Of course that didn't happen.  I didn't have the budget to acquire more hardware and I certainly didn't have the boatload of time required to learn and configure everything to my needs.  "I'll just do the easiest thing - wait for Microsoft's ReFS effort to mature and use it when it becomes available."  Based upon everything I read, Microsoft was rolling out ReFS slowly, at the same rate as the new wave of driverless cars, because pushing it out before it establishes a solid track record can be dangerous.  And so, even though ReFS is baked into Windows 10, it's not officially available to consumer users.

And even if it were, right now it can officially only be used in conjunction with an ambitious storage virtualization layer called "Storage Spaces".  With Storage Spaces, you can attach a collection of any old hard drives you have lying around, and it appears to you as one giant disk.  If you have mirroring configured, you can even swap out dead drives without any problem.  It's actually pretty cool and potentially very useful.

But I don't need any of that.  All I wanted to do was have my many multi-terabyte USB hard drives refresh and renew themselves so my growing archive would not disintegrate all the while "chkdsk" showed no problems.

Imagine my surprise when I learned that ReFS could indeed be enabled!  All you have to do is add a registry key, format the drive, then remove the registry key and reboot.  And BitLocker even works on it!

Want to learn the details of enabling ReFS on your copy of Windows 10?  http://www.makeuseof.com/tag/try-resilient-file-system-windows/  (Mac users: see previous posts about bitrot which talk about options for OSX.)

So I upgraded to Windows 10 and I tried it.  Then I went back to Windows 7.

What went wrong?  Under Windows 10, on my old hardware, USB transfer speeds slowed to a crawl; from 80-100 Mbps sustained to a measly 1-2 Mbps.  I guess some drivers never got updated.  Going back to Win7 fixed everything.



That was a waste of 2 days.

I'm not sure how I would have tested ReFS anyway.  It's like evaluating the effectiveness of vitamin supplements.  Still, I know the need is there and my data set seems to grow exponentially, so I see this in my future.

(Scholarly note: For power users there's a lot to like about Windows 10... the Storage Spaces disk virtualizer, better handling of high-density displays, virtual desktops (something Unix has had since the '90s), the God Mode control panel, Game DVR (helps me make instructional videos), a better command prompt and PowerShell... too bad data transfer speed took a big hit; otherwise I would have been very happy with the upgrade.)

How Not to Back Things Up

Since we're in full-on geek mode, let's continue and talk about backups.  If you're presiding over terabytes of information, you'd think one of those high-falutin' fault-tolerant RAID arrays would be for you.  Believing that is like believing your pictures will improve if only you had a more expensive lens.

Those devices may offer a level of disaster protection, but they are not a replacement for backups.  Don't believe me?  Let's say some sort of ransomware gets into your system and locks up all your data.  (Which is what happened to these folks.)  If all you had was a RAID array, your duplicates would be held hostage as well.

RAID is great if you're in a production environment and you can't tolerate any downtime.  (You'll still need to back things up regularly, though.)  In my mind the extra reliability is nullified by the increased expense and the increased complexity that can become an additional failure mode.  And when solutions like Drobo which use proprietary file systems fail, it's much more difficult to recover from it - you have to invest in another one of their systems to read your disks.  Here are two stories from people who should have known better:

Scott Kelby: http://scottkelby.com/2012/im-done-with-drobo/

David Kilpatrick had a similar problem with Drobo, which eventually got resolved (the article is in print only, so I can't point to it).

(If only these folks had asked me first...)

Yes, my websites have a high availability requirement, but that burden is on my web hosting company (globalhost.com).  So if my desktop or laptop computers fail for an hour or for a day, I'm not out of business.  And so I can save myself a tremendous amount of money and complexity by designing a backup system that meets my needs.  There's a very good chance that your needs are pretty close to mine, which is why I'm sharing these here.

Before you can design any kind of a system that is going to protect your data, it makes sense to identify the risks you're trying to protect your data from.  Here is my list of threats I'm concerned about:
  • Fire or theft
  • Hardware failure
  • Malware that somehow gets onto my system and starts deleting data.  
  • Ransomware (same thing as malware, really)
  • Accidental deletion or mangling of a file by me.
That's it.  The other "soft" criterion is that I should be able to recover from a hardware failure in about a day.

The solution I use is cheaper than any commercially available backup solution out there.  I purchased 3 sets of USB 3.0 external drives, and use them in the following ways:

1) The first set of external disks is permanently attached to my desktop, and acts as my main working drives.

2) The second set is my backup set, and is always OFF unless I’m making my nightly backups (that way if there’s a computer virus it can’t infect a drive that’s powered off.)  If there’s a fire and I have to grab something quick, I grab these drives and my laptop and scram.  I'll lose no more than a day’s work this way.

3) The third set acts as my off-site backups.  One of the drives is kept by a friend, and once a week it is swapped out for set 2 above.  If the unthinkable happens (fire/theft/both), the most I’ve lost is a week’s worth of data.  (And even that's not a problem - see step 4).

4) For projects where even losing a day's worth of work is unacceptable, I've started using Dropbox as my working drive, and include it in my nightly backup scheme.  No data loss ever!

I've had every type of hardware failure happen to me - one time happening in the middle of a backup! - and have never lost any data using the above scheme.  It has served me well and didn't represent a huge capital outlay.
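
For the code-inclined, the nightly copy in step 2 could be scripted along these lines.  This is a minimal Python sketch under assumptions, not the tool actually used here: the function names `sha256_of` and `backup_tree` are made up for illustration, and "changed" is assumed to mean a different size or modification time.  It copies only new or changed files and re-reads every copy to confirm that it matches the original.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_tree(source_root, backup_root):
    """Copy new or changed files from source_root to backup_root.

    A file is treated as changed if it is missing from the backup, or if
    its size or modification time differs (assumption: that is good
    enough for a nightly run). Each copy is read back and compared with
    the source hash. Returns the copied paths, relative to source_root.
    """
    source_root, backup_root = Path(source_root), Path(backup_root)
    copied = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        dst = backup_root / rel
        if dst.exists():
            s, d = src.stat(), dst.stat()
            # Allow 2 seconds of slack for file systems with coarse timestamps.
            if s.st_size == d.st_size and abs(s.st_mtime - d.st_mtime) < 2:
                continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also carries over the modification time
        if sha256_of(src) != sha256_of(dst):
            raise IOError(f"verification failed for {rel}")
        copied.append(rel)
    return copied
```

Calling `backup_tree("D:/work", "E:/backup")` (hypothetical paths) would return the list of files it copied.  The sketch never deletes anything from the backup, so a file removed by accident from the working drive is still there on the backup set.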

(Scholarly note: Online backups to the cloud won't work for me, since my data set is so large it would take me months to recover all the data by downloading.)
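
To put rough numbers on that, here is a back-of-the-envelope calculation in Python.  The 10 TB archive and the 20 Mbit/s sustained download rate are purely illustrative assumptions, not measurements of my setup, and `download_days` is a made-up helper name.

```python
def download_days(size_terabytes, megabits_per_second):
    """Days needed to download size_terabytes at a sustained line rate."""
    bits = size_terabytes * 1e12 * 8              # decimal terabytes to bits
    seconds = bits / (megabits_per_second * 1e6)  # line rate in bits per second
    return seconds / 86_400                       # seconds in a day


# Illustrative figures only: 10 TB at a sustained 20 Mbit/s.
print(f"{download_days(10, 20):.1f} days")
```

Under those assumptions the download alone takes a little over 46 days of uninterrupted transfer, and real-world throughput is usually lower than the advertised line rate.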


4 Kickstarter Ideas

My entrepreneurial days are over (truth be told, they ended here); but my ability to see legitimate needs and business opportunities will never go away.  And so I'm giving away four ideas that the world really needs.  If someone wants to go ahead and run with any of them, be my guest.  I will have done my part in helping to make the world a better place.  (And if you want to kick back 0.5%, I wouldn't mind that either. :-) )

1) 'Uber' for Seniors

The smart clamshell phone; configured just for Uber and Lyft, plus one-button dialing for family members.  Simple enough, right?

My dad is 88 years old and, god bless him, still teaches Systems Engineering and Constraint Theory as a professor at the University of Southern California.  But his driving skills aren't what they were, and the family would feel more comfortable if he were no longer making the commute.  Uber seemed like an ideal option for him, and so I got him a hybrid clamshell phone / smartphone (to retain the user interface of the clamshell phone he's used to), and configured it to just have the icons he'll need.

I'll spare you the details of all the things that went wrong in getting him to operate a smartphone; suffice it to say the experiment didn't go very well.  As one ages, even an iPhone is too complex to use.

Adding to the problem is that the Uber user interface is pathetic.  It requires several button presses, and if your finger wanders a little bit the car will pick you up across town instead of where you are.  Plus there's that "surge pricing" which I prefer not to patronize.  Lyft, which in theory requires only two button presses, presented other problems as the GPS feature thought we were across town.  Later, when the car arrived but couldn't find us, we were unable to respond back to the automatic "I'm here!" text.  (How useless is that?)

Watching this whole thing unfold was frustrating.  I knew how simple the process SHOULD be, and this is far from it.  This must have been how Steve Jobs felt when he had in his mind the iPhone but he had to pitch the ROKR instead.

WHAT THE WORLD NEEDS is an "Uber for Seniors" - a one-button phone with a Come-get-me icon.  You press the physical button and a car appears.  That's it.  (Oh, there might be a status screen saying "your white Ford Escape will be there in 2 minutes" but no fine motor skills or concepts of modes should be required.)  I can see this being a secondary button to the Life Alert pendant - one button for transportation, one button for "I've fallen and I can't get up".

I believe the market for this is huge.  All it takes is someone to implement it properly.

2) .epub and .mobi authoring tools

I've written about this before.  The problem still isn't solved.  E-readers such as the Kindle were primarily designed for text-only books, and if you want to offer a complex-layout title with figures and tables you have to be a programmer.  Yes, there are tools such as Sigil and Calibre but they only take you 90% of the way there.  The last 10% on a 600-page book is an exercise in tedium and an utter waste of consciousness.  Currently, publishing houses outsource this to India to have it polished by hand at considerable expense and time.

So if some giant, resource-rich organization were to tackle this difficult-to-solve problem, the world would be a better place for content creators.  Microsoft is actually well-positioned to do this, as starting with Word 2007 the entire internal document structure is represented via XML, which is like a higher-order HTML. In other words, they're almost there.

(Want proof about the internal structure?  Rename a .docx file to .zip and then open it - you'll be able to see and browse the interior document structure.)
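
The same peek inside can be done without renaming anything, since a .docx file is an ordinary ZIP archive.  Here is a minimal Python sketch using the standard `zipfile` module; the helper names are made up for illustration, and it assumes the body text lives in `word/document.xml`, which is where Word normally puts it.

```python
import zipfile


def list_docx_parts(docx_path):
    """Return the names of the files stored inside a .docx archive."""
    with zipfile.ZipFile(docx_path) as archive:
        return archive.namelist()


def read_main_document_xml(docx_path):
    """Return the raw XML of the main document body as a string.

    Assumes the body is stored at word/document.xml.
    """
    with zipfile.ZipFile(docx_path) as archive:
        return archive.read("word/document.xml").decode("utf-8")
```

Running `list_docx_parts` on one of your own documents shows the individual XML parts that make up the file, and `read_main_document_xml` hands you the markup that an e-book converter would have to translate.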

3) Encryption That Makes Everyone Happy

As often happens with politicized topics, plenty of ignorance and lies are being spewed by all parties when it comes to the encryption vs. privacy vs. government access vs. security debate.  I'm not advocating mass eavesdropping; rather I'm talking about legitimate law enforcement needs to solve murders and kidnappings in cases where a warrant has been issued.  (Traditionally, warrants have been the mechanism to keep power-hungry government officials in check.)

If you've been following this subject at all in the media, you'll be hearing two major arguments:

1) "Strong encryption prevents the government from preventing terrorism, therefore manufacturers must install 'back doors' to the encryption that the government can use to eavesdrop".  (This has been proven to be propaganda, as there are no demonstrated cases where not having access to an encrypted channel would have prevented anything.)

2) "We want to help law enforcement, however if such back doors were to be installed, hackers would be able to use them too, allowing no shortage of evil to take place.  Plus, the NSA and other officials have demonstrated that they're not as concerned about due process when it comes to overstepping eavesdropping authority.  It would be a public policy disaster and U.S. tech companies would lose international business as confidence in their security drops."

The above set of arguments is what's called a false dichotomy; it implies that these are the only two options available.  Throughout this argument, nobody - not even encryption experts - has talked about existing encryption algorithms which can meet everyone's legitimate needs without necessitating a back door.  It's called (m,n)-threshold encryption, and it works like this: Instead of having one key (that can both encrypt and decrypt), or two keys (one to encrypt and another to decrypt), you issue n keys and require any m of them to decrypt.  That means you can have multiple keys floating around, and any 2 or 3 (or whatever threshold you choose) of those keys, used together, can decrypt the contents, while fewer than that cannot.  You can also configure it so that just one of the keys locks, but any two of the other keys are required to unlock.  It can be custom-tailored to meet specific use cases.
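
To make the "any m of n" idea concrete, here is a minimal Python sketch of Shamir's secret sharing, the textbook way to split one key into n shares so that any m of them can rebuild it.  It is an illustration only: the function names and the choice of prime are my own assumptions, it treats the key as a plain integer, and it is not production-grade cryptography.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this


def split_secret(secret, threshold, share_count):
    """Split an integer secret into share_count shares.

    Any `threshold` of the returned (x, y) shares can recover the secret.
    """
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= share_count:
        raise ValueError("need 1 <= threshold <= share_count")
    # Random polynomial of degree threshold-1 with the secret as constant term.
    coefficients = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, share_count + 1):
        y = 0
        for c in reversed(coefficients):  # Horner's rule, modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Rebuild the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        numerator, denominator = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                numerator = (numerator * -xj) % PRIME
                denominator = (denominator * (xi - xj)) % PRIME
        # Modular inverse via Fermat's little theorem (PRIME is prime).
        inverse = pow(denominator, PRIME - 2, PRIME)
        secret = (secret + yi * numerator * inverse) % PRIME
    return secret
```

With `shares = split_secret(key, 2, 3)`, any two of the three shares passed to `recover_secret` give back `key`, while a single share on its own does not determine it.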

How would this work in the case of a smartphone?

In this instance I would propose issuing three keys, one of which works as it does now, and two others being distributed to the mobile phone manufacturer (let's say Apple for the sake of example), and the FBI.  By themselves, neither Apple nor the FBI would be able to decrypt the phone.  However, when a warrant is issued the FBI approaches the mobile phone manufacturer with the warrant and their key, and upon verification of the warrant the manufacturer can combine the FBI's key with their own key to decrypt the information.  And since each person / phone / communication channel would get their own unique set of 3 keys, if one decrypting key combination were to be stolen or leaked, all the other phones would still be secure.
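
The three-key arrangement above (the owner's key works alone; the manufacturer and the agency must combine theirs) can be sketched with a simple XOR split.  This is a toy Python illustration of the arithmetic only, with made-up function names; it says nothing about how real phones store or protect keys, and it is an assumption that a random 32-byte value stands in for the device key.

```python
import secrets


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def issue_keys(key_length=32):
    """Create one device key plus two escrow shares that only work together.

    Each escrow share on its own is just random bytes; XORing the two
    together yields the device key.
    """
    device_key = secrets.token_bytes(key_length)          # stays with the owner
    manufacturer_share = secrets.token_bytes(key_length)  # random pad
    agency_share = xor_bytes(device_key, manufacturer_share)
    return device_key, manufacturer_share, agency_share


def combine_escrow_shares(manufacturer_share, agency_share):
    """Rebuild the device key; both shares are required."""
    return xor_bytes(manufacturer_share, agency_share)
```

Because `issue_keys` draws fresh random values every time it is called, the shares for one phone are useless against any other phone, which is the property the paragraph above relies on.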

Of course, I'm not familiar enough with how key management works in modern mobile phones.  So I emailed a foremost expert on the subject, Bruce Schneier, whose book "Applied Cryptography" I have been referencing for longer than I care to admit.

Bruce wrote back the next day:  "Many people have proposed secret sharing schemes for government access. What you're missing is that the problems are legal, and not mathematical."  (Gotta love a busy guy who answers his own email!)  Unfortunately he didn't go into more detail, but at least my premise has been validated: This problem has a technical solution which can be a win for all parties involved.

So, this is an opportunity to save the world from an epic political logjam, protect people's information from overzealous snoops and hackers alike, yet still give law enforcement a valuable tool to help track down that kidnap victim when there is a warrant.

(I haven't asked Edward Snowden his opinion yet.  I think he has bigger problems to deal with right now.)

4) Class Action Lawsuit Against Credit Card Companies

You know how your credit card companies make you feel warm and fuzzy, saying "Don't worry if you see some unauthorized activity on your bill - just let us know and we'll remove it"?  Do you ever wonder who absorbs that loss?  I'll give you a hint, it's not the credit card companies.  It's the merchants.  Merchants like me.

This can be a huge burden if you have an online shop that mails out physical goods, like I do with my other business: Maui Xaphoon Musical Instruments (www.Xaphoon.com).  A typical order goes for between $90 and $250 and we ship out physical goods right away.  I can't tell you how many times the credit card got approved at the time of sale, only to be reversed weeks later because the purchase was made with stolen credit card information.

Why am I the one who must pay for the weak security of the credit card infrastructure?  Why isn't the issuing bank absorbing this loss?  Why go to the trouble of getting an authorization for each transaction when the "Approved" response means nothing?  Perhaps if the parties involved were to be held liable for all the fraud their weak systems allow, they might be further incentivized to replace it with something that requires more than a 3-digit security code and a matching zip code.  (So far the only innovations they've implemented make data theft easier, not harder.)

So I propose all merchants get together and file a class action lawsuit against Visa, Mastercard, and American Express to make them absorb the fraudulent transactions that they are approving.  I have no idea if you can fund a class-action lawsuit with Kickstarter, but it seems to me this would be an ideal way to find out.

(Oh, if only I could just accept bitcoin for payments instead of credit cards, this whole problem could go away.  Right now the weakest aspect of bitcoin is their digital wallets, but that won't affect me if I convert them to dollars daily. :-) )  (Too bad most customers don't know how to pay with bitcoin...)


Until next time,
Yours Truly, Gary Friedman

31 comments:

  1. No class actions allowed. Only binding arbitration from a company they effectively own. Good luck with that. Plus, they have the clause they can change the terms of the agreement at their whim. Best to create your own card; stores used to do that all the time.

  2. Backup to any drives you want: Investigate the AeroFS sync and backup solution. I have used it for quite a while now, and it's saved my bacon more than once! There is still a need for a third backup location, such as an external drive to keep data from viruses etc., but for the rest it works a treat!

  3. RE: RAID systems. Because we used RAID extensively in my work environment for human resources, financial, and other critical systems, I was interested in trying it at home. I set up a Dell with RAID 1 (two disks) and was happy until I installed Adobe's suite (CS2 back then). Adobe could never understand the system, running for a while, then claiming there was a problem, with each application. Many discussions with Adobe tech support, many re-installs, including a patch. I finally gave up on RAID for home!

    Replies
    1. That is very strange since the storage mechanism is supposed to be completely invisible to the apps. CS2 should not have known that you were using RAID!

  4. Gary,

    You do know that some of the online backup services will send you a hard drive for recovery? See Backblaze as an example.

  5. Hi Gary.. Interesting to hear your experiences with Win10. I upgraded a while ago and haven't seen anything like the data transfer rate drop. I have a couple of quite old USB2 hard disks which work at the same rate as they did under Win7..quite slowly, but the same.

    Now about backups. I work as a database admin, so backups are near and dear to my heart. My mantra goes like this: Backup, then backup again, and then do a backup. Then do a restore, because you don't HAVE a backup unless you know you can restore it.

    I went the cloud route, though I don't yet have anything like your volume of data.
    When I connect my camera to my PC this is what happens.
    DropBox gets the files off the camera
    Google Drive gets the files off the camera.
    So that's two backups done and on their way to cloudy heaven.
    I then open Capture One Pro (I prefer it to Lightroom) and import from the camera to a catalog on my local hard disk.
    Once that's done, CrashPlan starts working. I absolutely recommend CrashPlan as a backup system. CrashPlan works in the background (it's not heavy and very polite about getting out of the way when I'm doing heavy lifting in Photoshop or playing a game) and sends any changed or new files to a USB drive locally and also to their cloud servers.
    Yes, the initial backup takes ages, but once it's done it's done. If you ever need a complete restore, they'll ship you a hard disk that you can just attach to your rebuilt system.

    So in total there's 4 copies of my RAW files before I even start working on them. Until I remove them from the card in the camera, there are 5 copies.
    I bought 1 TB on both DropBox and Google Drive and they're about half full at the moment. With CrashPlan I have a 4-year subscription with effectively unlimited space (I don't believe unlimited, but certainly many TB).

    All the best, BAzz

  6. Gary:
    Re: ebooks etc. Take a look at eXtyles and Bruce Rosenblum, in Cambridge Mass.

  7. Gary:
    Re ebooks etc: take a look at eXtyles and the work of Bruce Rosenblum in Cambridge Mass. Or Inera.com

    Replies
    1. Thanks for that reference; I had not heard of them. (One red flag, though, is that there is no price list on their website...)

  8. Hi Gary. I simply use a blu-ray drive with M-discs and put them in a safe away from home. Should be pretty safe (I hope). But I first had to have a HD failure to do that. :-( I recovered most if not all, but hard work ... :-)

  9. I'm a bit biased when it comes to the OS-based solutions or software-raid setups. :(

    I agree that data needs to be protected at multiple levels:

    * physical hardware failures
    * bitrot (signalling/software failures)
    * OS crashes (kills software raid solutions)
    * serious failures(malware/site damage)

    Bitrot protection is what ZFS and ReFS provide. (I think BTRFS supports a feature to detect/avoid bitrot as well.)

    Mac OS X doesn't have a good approach to deal with this issue yet. I _could_ install OpenZFS on MacOSX, but the risk is that with any given OS upgrade, it could break. And that was a deal breaker for me. :(

    To date, my solution has been to go with a RAID6 array: ARC-8050T2 w/6TB disks. (48TB raw, 36TB Raid6 storage usable)

    The ARC-8050T2 has a scrub feature which can help to combat bitrot. I have it setup to scrub weekly. Not as tight a safety net as ZFS, but it's also not OS dependent. So upgrades to OSX won't impact the functionality of the ARC-8050T2's internal functions.

    The ARC-8050T2 array addresses bitrot and physical drive failures thanks to the built-in scrub/re-silver functions and the RAID6 configuration I'm using. Can suffer loss of any 2 disks. Which is good, because it takes about 2 days to re-silver a disk that has been replaced.


    Time Machine backups from the desktop and laptops are sent to the array via the desktop(MacOSX Server). This provides a "backup" for the laptops and the desktop's OS/apps/user accounts. However, it does not provide a backup for the RAID array data itself.

    Original raw and hi res images are stored on the array, but the cache/catalog are stored on the system's SSD.

    Because it IS always-on storage and there is only one copy, it fails the backup test. But to do that, I would need another array to backup the data to. And that gets into the "*facepalm* I need to backup TB(s) of data..."

    The solution is to eventually create a second array of equal or greater size, which will be sync'd to nightly. Will probably use Bombich's Carbon Copy tool, which now has a scheduler function. So it can be setup to send the data to the second array periodically.

    I would probably setup the second array so that it is hosted off of a mac mini or something to keep it separate from the desktop.

    Replies
    1. After looking around, I may opt for a Synology standalone NAS unit. They come with multiple gigabit ethernet ports and for backups, that would be more than fine.

  10. Hi Gary, a message from Europe: your encryption solution seems to suggest that everyone in the world using Apple stuff should trust the American FBI as 'THE law enforcement'. Well, I can tell you that I don't feel comfortable if Apple and the FBI are the parties to decide whether or not to invade my privacy. Then on the security of credit card transactions: here we use a credit card company app to generate a code for extra validation for credit card transactions...

    Replies
    1. I don't trust them either... the basis of the suggestion is that a warrant would have to be issued first. No balanced system is perfect, but this approach seemed reasonable to me. Regarding the extra code for validating credit card transactions, that sounds like a good start.

    2. My son whose IT company is hack attacked hundreds of times a week responded to this idea saying that the hackers don't attempt to discover the key itself but use the technology that surrounds the key to break the system. He's opposed to any technology which allows any (including multiple) key access by third parties. As an example, SSL banking encryption was thought to be unbreakable.

    3. Anyone who proclaims ANY security scheme as being "unbreakable" is demonstrating their ignorance of the subject. SSL was designed to be strong and vastly preferable to unencrypted communication.

  11. I would SO have bought that guitar, Gary!

  12. I always considered myself fairly knowledgeable, at least for a non-professional, about computers. But I had to look up ReFS. So is there anything for us non-pros to use that you pros would consider safe and dependable to keep our meager files safe?

    Replies
    1. Check out the links to two former blog posts embedded in the first paragraph of this post. I talk about options there... Probably the easiest solution today would be to go for a NAS4FREE server and use that to hold all of your files.

  13. Thanks for responding so quickly. I do remember reading those posts. As a matter of fact I now use Teracopy. The only issue (there always is one) is I can't use Lightroom for direct importing. That means adding to a Lightroom catalog is now a 2 step process. The other thing is what about exporting edited or even unedited files from Lightroom or any other image management program? Programs like Lightroom don't have file integrity checkers, do they? And I have noticed from time to time having issues when I tried to move or otherwise access some older files in Lightroom. I will get a message that Lightroom couldn't open or move the file due to suspected corruption. I have a list of such files I haven't had time to investigate.

    Oh by the way, have you or anyone you know ever used a program named WinHex to fix corrupt JPEGs?

  14. I forgot to ask if you've tried using a program named GoSync? It is a computer synchronization and backup program which allows for data integrity checks. I've been using it for a little while and although there's a bit of learning curve for non-backup geeks like me, if a person needs just simple backup or syncing it's simple to use. So far I've only used it to do data synchronization between external hard drives. I haven't tried using it to setup true backups from a connected external hard drive to a NAS device.

    I don't understand why RAID systems like, say, Drobo have data checking. As you say, what is the point of backing up an already corrupted file? I was always under the assumption that once a file had been backed up, subsequent backups would check the same file for changes and verify the new file. I always had my backup program configured to only back up changed files.

    Replies
    1. So answering one point before another.

      Data checking or checksums: Boxes like Drobo or the awesome ARC-8050T2 do data checking periodically to ensure the data on the disks are valid.

      Invalid data can occur for any number of reasons:
      * bad sectors on a drive.
      * power loss or unclean restart/unplug/reboot

      The arrays are recalculating the checksums for blocks of data on the drives to ensure that the data is still correct.

      Note that RAID protection isn't backup. It's just data protection.

      And services that sync your data between your desktop/laptop and a remote device (USB drive or cloud service) need to compare files to make sure all changed files are backed up and, for the sake of efficiency, only transfer the changes.

      My ARC-8050T2 is configured to do a weekly re-checksum of all the disks' data to ensure nothing has gone wrong.

      Note, data can fade or become corrupted on drives for any number of reasons. The assumption with a RAID array like Drobo or ARC-8050T2 is that you want your data protected, and so it checks it periodically.

    2. Wing Tang, everything you say is true. My attraction to ReFS was that it, too, proactively checked for bit rot without the need for a fancy RAID box with a proprietary file system.

  15. Wing Tang, thanks for explaining the yet still confusing issues of backups vs syncing. I do somewhat get it but I must confess I still don't understand why creating a duplicate copy of a data set is not considered a backup? If I understand what you're saying an incremental backup or mirroring is NOT a backup even if it's data checked for errors?

    Why do I (we) have to be data processing or computer hardware and network specialist gurus to keep our data safe? Why can't I rely on the expertise of others so that if the box says backup it actually is backup? I think I might speak for a lot of people when I say I don't have the time or inclination to study data protection, reliability and longevity.

    Replies
    1. @GEGjr hmm... sorry, let me reclarify:

      Given your box/original data. The primary storage you have it on, if it is a single drive, like in most laptops/desktops, it is unprotected.

      If you store the data on an array and run your data off of that array, it is protected via the RAID. This protects you against single disk failures. The larger your disks and the larger your array, the more disk failures you will want to be able to survive, due to rebuild times. My 48TB raw/36TB usable RAID6 can suffer 2 disk failures. But the 6TB drives take 2-3 days to rebuild, from the time I replace the failed disk. If I were using RAID5, a second disk could fail in that timespan, causing complete data loss. So the bigger the disks and the larger the array, the more protection you will need.

      Low level errors can occur, which RAID may miss. So this can lead to bitrot. Which is why some advanced filesystems and some RAID servers will re-verify/refresh data on the array. This protects against bitrot. A write could have gone bad en route to the disk. A head could have skipped a track due to vibration/shock. Brown out/power outage. Etc. All of that can introduce byte sized, block sized, or even stripe sized chunks of errors, which if undetected and uncorrected, can result in corrupted data. Windows ReFS and Sun/Oracle's ZFS protect against this using checksums during each and every operation. RAID arrays protect against this by running checksum checks against the entire array periodically.

      So the above covers the different forms of primary storage.


      So let's say you have your laptop and you back up to a Time Machine. If that Time Machine is a physically separate set of storage from your laptop, then it is a backup. I.e., it isn't used as your primary storage.

      The same rules of data protection apply. Single disk = high risk. RAID5/6 = protection against disk failure. Checksums, either during operations or via periodic jobs = protection against bitrot.

      But let's say you have a desktop computer (or laptop) with an array attached to it all the time or most of the time, and the array is split so you have 2TB for active data and 4TB for Time Machine backups. Even though you are storing your Time Machine backups on that array, you are also actively using it as part of your primary storage. If you lose the array, you lose your active data as well as your local Time Machine backups. So generally speaking, one might not consider backups made to the array you actively use as a real backup.

      The issue being if a serious mistake is made, it can wipe out all data on the computer and all physically mounted storage.

      Same thing for backups of backups on the same array.

      If the array is connected ONLY for the purpose of making backups, and is disconnected after the backups are done, then that storage is considered a legitimate backup. If it has RAID and checksumming protections, then it is a backup protected against disk failure and bitrot.

      My personal setup at home actually violates this definition of backup for my desktop, since I use the array as part of my primary storage. It is, though, a legitimate backup for my laptops, which send their Time Machine backups to the desktop's array via Apple's Mac OS X Server.


      Storage arrays that offer "snapshots", i.e., point-in-time "backups" of your data, are basically keeping local copies of your data (technically, copy-on-write deltas of your data, since there is only 1 copy, with differences written to other parts of the array). So loss of the array or loss of the main copy risks loss of all copies on the array. It's more like an "undo" than a backup. Both ReFS and ZFS offer this feature as well. But neither considers this feature a "backup".

      The idea of backups is that you can go back to them in the event of a serious failure on your primary computer. If your backup is live storage on your primary computer, chances are, your backup will be impacted as well.
      (answer to your 2nd question in another post)

    2. @GEGjr here is the answer to your second question:

      But to your second question... quite honestly, building up a working understanding of what the terms mean and how things generally work will help you greatly in choosing a good service/product to make use of. It allows you to make an informed choice.

      But yes, you can rely on others. You can rely on hardware/software solutions. But even then, it's not just turn-on and use like a disk with USB connectivity. There is still some setup and there are still health indicators. The long and short of it is... even if you paid for professional consultants to design/build a multi-tiered storage array with local and remote backups for full disaster recovery, fault tolerance, and high availability... you would still need the expertise to use the system. The bigger and more complex the system is, the more maintenance it requires.

      Basically, at the end of the day... it's an investment. If your data is important to you, you would be well served to figure out the best way to protect that data. And doing that requires a working knowledge of the subject.

      It's the same with any other thing: car buying, insurance, babysitters, plumbers, etc.

      At the beginning, I felt that I didn't have time. That I would address the issue "later". The wakeup call was when my juggling act of several USB hard drives came tumbling down: it had been a couple of days since I rotated drives/copies, and one literally fell and hit the floor while it was running. Lost 2TB of data. It turns out my manual copying of data between drives had been lax. And it wasn't just 2 days of notes/emails/files/etc. that I lost, but some random assortment of photoshoots, projects, and personal archives spread out over a 2 month period. Swiss cheese. And when I went to offload all the data from the remaining drives, I realized that while they worked fine most of the time, the moment I tried to do a full pull off of them, there were signs of bitrot, whether from bad writes a long time ago that went undetected, a flaky enclosure, or a bad USB cable.

      At some point, the cost of losing data escalates to the point where it is worthwhile to make time to learn these things.

      I'm not saying everyone is going to lose data if they aren't fully read up on things and I'm not saying a single disk external drive is a bad idea. Just that all options have different levels of risk/cost/etc.

      There are certainly many more easy-to-use solutions out there today than there were some 8 years ago. You don't have to roll your own server. You can just go and pick up a beefy NAS unit with a good onboard controller, and you're set.

      It depends on your use case, your budget, and how much protection you want. But yeah, it really helps to invest the time to determine what you need for your situation.

      You don't need to be a specialist or a guru. Just need some baseline knowledge.

  16. Wong, I certainly appreciate that, like anything else, a basic understanding is helpful. It's what I told my 87-year-old mother about computers several years ago when I enrolled her in a basic computer class: it will be really helpful to her and me if she understands the difference between left and right clicking. Why didn't I teach her, you might ask? Well, it's like teaching someone close to you to drive: not always a good idea if you want to keep harmony.

    That said, and with all due respect for your obviously superior knowledge on the subject, I would hardly characterize understanding the complex world of an all-encompassing backup strategy as being like "... car buying, insurance, babysitters, plumbers, etc. ". I can research a car without understanding why it runs, I can research a plumber simply by checking their past jobs or even easier by checking with Angie's List, etc, etc.

    1. I agree, it's not truly comparable to shopping for most products/services.

      The thing is, proper backup is not a trivial thing. For companies, it represents being in compliance or not, at risk of losing actual monies or not. So enterprise grade backup solutions are in-depth types of solutions.

      My prior posts approach things from that angle, looking at solutions that mitigate most/all of the risk. Something that a small business on up would want to look into, but not necessarily DIY.

      It comes down to what level of protection you want, how much you want to spend, and how involved you want to be with the backup process.

      I agree the comparison to shopping for products/services isn't a perfect fit. But many of the things you would do for a car, plumber, or insurance are similar.

      It comes down to determining what you want to protect, what you want it protected against, how much of it is there, how long you need to keep it around for, and how much you want to spend.

      I think what Gary originally posted re: 3 x USB3 drives, is a very workable solution. It's more labor intensive, but if your data set all fits, then it is very workable and very portable.

      The only thing I would do to augment it is to suggest using external drives that internally mirror between two drives. That way, you would be able to survive loss of a single drive.

      Like I said, my own setup at home isn't complete yet. ^_^;; I've got backups for my laptops, but not for my desktop. And certainly not for my larger archive.

      To back it up, I would need to build a second storage array capable of serving as backup to the first array.

      Btw, thanks for engaging me in this discussion. It's reminded me, I need to get off my butt and save up for that second array.

      You'll notice, though, that my setup is not unlike the one detailed by Gary. The only real material difference is that I'm using 8-drive arrays vs. USB3 drives. :)

  17. Sir, you are most welcome. I would be remiss not to mention that I got the best of the deal. You have urged me on to become more serious about my backup plan. I have USB drives and even have the NAS drive. What I'm missing and don't know how to set up is doing the mirroring and the backup automatically. For example, I keep my photos on a 2TB USB 3 drive, and whenever I import from a card, I always do a simultaneous copy to my NAS, which is partitioned into 2 drives. What I feel I'm missing is a way to make an automatic third copy, which will be stored at an off-site location. Right now, I'm manually copying the first USB drive to Dropbox or OneDrive, but only when I have time, think to do it, and feel I've added enough new files to warrant the effort. I typically do the copy through TeraCopy, which isn't slow even with data checking turned on, but copying to OneDrive or Dropbox is not fast, and I'm not comfortable starting the process and leaving the premises, so I only do it when I know I'll be around a while.

    Fortunately, I am not shooting every day or even every week right now so there's no urgent need to do the backup. Although, I'd rather the copy be done to Dropbox or OneDrive every time I add even 1 image to my database.

  18. @GEGjr Cool. :) I've got Google Drive and Dropbox as well. (Don't have OneDrive and don't use TeraCopy since I'm on the Mac).

    On the Mac, there is the built-in Time Machine, which will send backups to a Time Machine capable device (Time Capsule, Mac OS X Server, USB drive, some NAS units). There is also Carbon Copy Cloner (paid software these days, with a free trial), which now has the means of scheduling automated copies from any SRC to any DST and even to multiple DST(s). Very very useful.

    On the Windows side of things(from googling), it looks like Windows has something called a Task Scheduler: http://windows.microsoft.com/en-US/windows/schedule-task#1TC=windows-7

    There looks to be a more customizable tool, ScheduledCopy, with multiple schedules and where you can specify SRC/DST individually: https://sourceforge.net/projects/scheduledcopy/


    So, it's possible to set it up so that you schedule automatic copies of files to the location where your OneDrive/Dropbox will pick them up.

    Another option is a watched folder. So as soon as a file lands in the folder, it's sent to where you want it to be sent.
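
    If none of the ready-made tools fit, the scheduled-copy idea can be sketched in a few lines of Python using only the standard library. This is a rough sketch under assumptions, not a replacement for a real backup tool: the function name is made up, and the destination is assumed to be a local folder that the OneDrive/Dropbox client already syncs.

```python
import filecmp
import shutil
from pathlib import Path


def sync_new_files(src, dst):
    """Copy files that are missing or different at dst; return the paths copied.

    Each copy is read back and compared byte-for-byte against the source,
    so a bad write is reported instead of silently kept.
    """
    src, dst = Path(src), Path(dst)
    copied = []
    for source in sorted(src.rglob("*")):
        if not source.is_file():
            continue
        rel = source.relative_to(src)
        target = dst / rel
        # Skip files that already exist at the destination with identical contents.
        if target.exists() and filecmp.cmp(source, target, shallow=False):
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if not filecmp.cmp(source, target, shallow=False):
            raise IOError(f"verification failed for {rel}")
        copied.append(rel.as_posix())
    return copied
```

    Calling this from a scheduled task (say, once an hour) gives the scheduled-copy behaviour, and calling it in a loop with a short sleep between passes approximates a watched folder. It never deletes anything at the destination, so removing a photo from the source does not remove it from the copy.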

  19. Hi Wing Tang Wong,

    I did some research on your suggestions. Using Task Scheduler in Windows 10 requires some Windows programming experience, and I'm not at all sure a task can be set up to send a file from an external drive to one or more networked drives or, more to the point, whether I can figure out how to do it.

    ScheduledCopy looks promising, but it is in beta and only has 1 user review.

    The other program, named Watch 4 Folder (http://leelusoft.blogspot.in/2011/10/watch-4-folder-23.html), looks promising, but again I couldn't find any reviews other than on its website.

