
Switched to a new file server and backup scheme

eric :)


    Hi everyone,

    Almost a year after the data center fiasco, I finally finished testing the new file server and switched to it a few minutes ago.  If you uploaded your workouts in the last couple of hours, the maps might not be available for a while: they still need to be copied over from the old server, which is happening right now and should take about an hour.

    With the old file server, backups were made once a day, which meant the maximum window of data loss was 24 hours.  I wanted to reduce this window.  I looked into various solutions; some distributed file systems can avoid data loss altogether, but because of their complexity and resource requirements, I opted for a simpler solution.

    The previous backup scheme took about 4 hours to complete because the server had to scan through all the files looking for new ones.  That obviously doesn't scale, which is another reason I had to make the switch.  With the new server, the backup time is proportional to the amount of new data, and it currently takes less than a minute.  Because the backup is so quick, the server can make one every 10 minutes.  There is still a short window in which data could be lost, but I feel it is a good balance between resource use and data security.
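To put rough numbers on that trade-off: with a fixed schedule, the worst case is losing about one interval's worth of uploads (ignoring the time the backup itself takes), so going from a nightly backup to one every 10 minutes shrinks that window by a factor of 144. A quick Python check of the arithmetic; the constant names are just for illustration:

```python
from datetime import timedelta

OLD_INTERVAL = timedelta(hours=24)    # nightly backup on the old server
NEW_INTERVAL = timedelta(minutes=10)  # backup schedule on the new server


def backups_per_day(interval: timedelta) -> int:
    """Number of backups a fixed schedule fits into one day."""
    # timedelta // timedelta yields an int.
    return timedelta(days=1) // interval


# The worst case is roughly one interval: if the server dies just before the
# next backup, everything written since the previous one is gone.
window_shrink = OLD_INTERVAL // NEW_INTERVAL

print(backups_per_day(NEW_INTERVAL))  # 144
print(window_shrink)                  # 144
```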

    eric :)


    A Saucy Wench

      Like

      I have become Death, the destroyer of electronic gadgets


      "When I got too tired to run anymore I just pretended I wasnt tired and kept running anyway" - dd, age 7

      BeeRunB


        schemer

            The brief backup time allows the server to make a backup once every 10 minutes.  While there is still a brief window of time for data loss, I feel that it is a good balance of resources and data security.


          That is something to be proud of.

          Buzzie


          Bacon Party!

            thank you!

            Liz

            pace sera, sera

            jpdeaux


              We should talk.


              running metalhead

                Nice!
                Just professional curiosity: what backup system are you using?
                I seem to recall that your server runs Win2K8R2?

                Kudos!

                - Egmond (14 January): 1:41:40 (21K)
                - Vondelparkloop (20 January): 0:58.1 (10K but did 13.44!!!)
                - Twiskemolenloop (4 March): 1:35:19 (3rd M45!)

                - Ekiden Zwolle (10K) (25 March)
                - Rotterdam Marathon (8 April)
                - Leiden Marathon, half (27 May)
                - Marathon Amersfoort (10 June)

                LedLincoln


                not bad for mile 25

                  That's a huge improvement!  Kudos to the RA staff!  Eric :)

                  sergiomm


                    Wow! You are awesome, Eric :)


                    Feeling the growl again

                      Perfectionist.  :)

                      Which reminds me, time to renew my subscription...on it now.

                      MTA:  Wow, it was a lot easier to give you money this time than last time.  :D

                      "If you want to be a bad a$s, then do what a bad a$s does.  There's your pep talk for today.  Go Run." -- Slo_Hand

                      I am spaniel - Crusher of Treadmills

                      eric :)


                        Nice!
                        Just professional curiosity: what backup system are you using?
                        I seem to recall that your server runs Win2K8R2?

                        Kudos!

                        The original file server ran on Linux, with a nightly cron job that ran rsync.  The backup was slow because rsync had to walk the entire directory tree looking for new files, which doesn't scale at all.  I considered logging file changes on the web servers, which would avoid the directory walk, but building a fault-tolerant version of that is not trivial.
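For anyone curious why a scan-based backup slows down as the server fills up, here is a minimal Python sketch of the directory walk. It is a simplified stand-in, not rsync's actual algorithm (rsync compares file size and modification time against the destination), and `find_changed_files` is a hypothetical helper. The point is that every file gets a stat() call even when only a few of them changed.

```python
from __future__ import annotations

import os


def find_changed_files(root: str, since: float) -> tuple[list[str], int]:
    """Walk every directory under `root` and collect files modified after `since`.

    Returns (changed_paths, files_examined). Every file gets a stat() call,
    so the cost grows with the total number of files on the server, not with
    the handful that actually changed since the last run.
    """
    changed: list[str] = []
    examined = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            examined += 1
            if os.path.getmtime(path) > since:  # one stat() per file
                changed.append(path)
    return changed, examined
```

Finding one new workout among a million stored map files still costs a million stat() calls; that per-file cost is the part that keeps growing with the archive, no matter how little changed.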

                        I considered distributed file systems.  For RA's needs, the cluster would consist of two servers, each holding a complete copy of all the data.  One of the requirements was that it had to be free or reasonably priced.  I researched Ceph, Hadoop, Lustre, etc., and I was quite impressed by Ceph.  Eventually, though, I decided to abandon the distributed approach in favor of a simpler solution.

                        One issue that I haven't figured out with the distributed file system approach is how to do backups.  A cluster environment avoids the loss of data from a single server failure, but it is not a replacement for backups.  The reason is that if an error is introduced in one of the nodes, it will be propagated to the other ones.  Backups are still needed, just in case.  I suppose I could create a parallel cluster for backup purposes, but that's another two servers.
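The point that replication is not a backup can be shown with a toy model. The `MirroredStore` class below is hypothetical and deliberately simplified (it is not how Ceph or any real cluster works): a bad write lands on both nodes, and only a point-in-time snapshot lets you recover.

```python
from __future__ import annotations


class MirroredStore:
    """Toy two-node mirror with point-in-time snapshots.

    Deliberately simplified: this is not how Ceph or any real distributed
    file system works internally; it only shows why replication alone is
    not a backup.
    """

    def __init__(self, nodes: int = 2) -> None:
        self.replicas: list[dict[str, bytes]] = [{} for _ in range(nodes)]
        self.snapshots: list[dict[str, bytes]] = []

    def write(self, path: str, data: bytes) -> None:
        # Replication copies every write to every node, good or bad.
        for replica in self.replicas:
            replica[path] = data

    def snapshot(self) -> int:
        # A frozen copy of the current state; later writes don't touch it.
        # (bytes are immutable, so a shallow dict copy is enough here.)
        self.snapshots.append(dict(self.replicas[0]))
        return len(self.snapshots) - 1

    def restore(self, snap_id: int) -> None:
        # Roll every node back to the saved state.
        self.replicas = [dict(self.snapshots[snap_id]) for _ in self.replicas]
```

After a corrupting write, both replicas hold the bad data; `restore` on an earlier snapshot is what brings the good copy back.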

                        The next set of systems I looked at were network-attached storage (NAS) systems.  Again, the top requirement was price, and you can't beat open source options such as Openfiler, FreeNAS and NAS4Free.  I ruled out most of them because some have been abandoned while others are not as mature.  I tried out FreeNAS.  It was OK: the UI is not great, but it worked as advertised.

                        I realized that all the functionality I needed from FreeNAS was provided by ZFS, so I started looking into that directly.  Since ZFS is just a file system, it is managed from the command line.  That is actually a better option because the server doesn't need to run a web server for a management UI.  Aside from its fantastic software RAID and write verification support, it also has great snapshot and remote backup features.
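Since ZFS is driven from the command line, a periodic snapshot-and-send job boils down to a few commands: `zfs snapshot`, then `zfs send -i <previous> <current>` piped over ssh into `zfs receive` on the backup box. The Python sketch below only assembles and runs those commands; the dataset `tank/maps`, the host `backup.example.com` and the snapshot naming scheme are made-up placeholders, and it is not RA's actual script. Depending on the setup, the receiving side may need `zfs receive -F` if the target dataset was modified after its last snapshot.

```python
from __future__ import annotations

import subprocess
from datetime import datetime, timezone

# Hypothetical names for illustration; substitute the real pool/dataset and host.
DATASET = "tank/maps"
BACKUP_HOST = "backup.example.com"


def snapshot_name(now: datetime, dataset: str = DATASET) -> str:
    # Zero-padded timestamps sort chronologically as plain strings.
    return f"{dataset}@auto-{now:%Y%m%d-%H%M}"


def snapshot_cmd(snap: str) -> list[str]:
    return ["zfs", "snapshot", snap]


def send_cmd(prev_snap: str | None, snap: str) -> list[str]:
    # The first send has no base and is a full stream; after that, -i sends
    # only the blocks that changed between prev_snap and snap.
    if prev_snap is None:
        return ["zfs", "send", snap]
    return ["zfs", "send", "-i", prev_snap, snap]


def receive_cmd(dataset: str = DATASET, host: str = BACKUP_HOST) -> list[str]:
    # The receiving side runs on the backup server over ssh.
    return ["ssh", host, "zfs", "receive", dataset]


def replicate(prev_snap: str | None) -> str:
    """Snapshot the dataset and pipe `zfs send` into a remote `zfs receive`."""
    snap = snapshot_name(datetime.now(timezone.utc))
    subprocess.run(snapshot_cmd(snap), check=True)
    sender = subprocess.Popen(send_cmd(prev_snap, snap), stdout=subprocess.PIPE)
    receiver = subprocess.Popen(receive_cmd(), stdin=sender.stdout)
    sender.stdout.close()  # so `zfs send` sees a broken pipe if the receiver dies
    receiver.wait()
    sender.wait()
    if sender.returncode != 0 or receiver.returncode != 0:
        raise RuntimeError(f"replication of {snap} failed")
    return snap
```

Scheduled every 10 minutes (via cron, say), each run would pass in the snapshot name returned by the previous run, so only the very first run sends a full stream.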

                        A snapshot can be taken on the fly, without interrupting reads and writes.  More impressively, these snapshots can be sent to a remote server.  Only the changed blocks are sent, which saves bandwidth and transmit time.  Rsync took hours to complete, and most of that time was spent searching for new or changed files.  ZFS can do the snapshot and remote backup in less than a minute.  The file server can keep hundreds or even thousands of snapshots, so should an error be introduced, I can always go back to a prior snapshot.
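Why can ZFS send a snapshot in under a minute when rsync needed hours? ZFS is copy-on-write and records, for every block, the transaction group in which it was written, so an incremental send only needs the blocks born after the older snapshot. The toy Python model below illustrates the idea; `ToyCowStore` is a hypothetical name and the model is far simpler than real ZFS. Snapshots are just a recorded transaction number, old versions stay readable, and changing 2 blocks out of 1,000 produces a 2-block delta.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyCowStore:
    """Greatly simplified copy-on-write store, loosely in the spirit of ZFS.

    Each write gets a new transaction number and is stored as a new version
    of the block instead of overwriting the old one. Taking a snapshot only
    records the current transaction number, so it is instant and copies
    nothing. This toy keeps every version forever and scans a flat dict;
    real ZFS frees unreferenced blocks and skips unchanged subtrees using
    the birth time stored in its block pointers.
    """

    txg: int = 0
    # block id -> [(birth txg, data), ...], oldest first
    versions: dict[int, list[tuple[int, bytes]]] = field(default_factory=dict)
    # snapshot name -> txg at the moment it was taken
    snapshots: dict[str, int] = field(default_factory=dict)

    def write(self, block_id: int, data: bytes) -> None:
        self.txg += 1  # one write per transaction, for simplicity
        self.versions.setdefault(block_id, []).append((self.txg, data))

    def snapshot(self, name: str) -> None:
        self.snapshots[name] = self.txg

    def read(self, block_id: int, at: str | None = None) -> bytes:
        # Newest version born no later than the requested snapshot.
        limit = self.txg if at is None else self.snapshots[at]
        for birth, data in reversed(self.versions[block_id]):
            if birth <= limit:
                return data
        raise KeyError(block_id)

    def incremental_send(self, from_snap: str, to_snap: str) -> dict[int, bytes]:
        # Only blocks born after from_snap (and by to_snap) go over the wire.
        lo, hi = self.snapshots[from_snap], self.snapshots[to_snap]
        delta: dict[int, bytes] = {}
        for block_id, history in self.versions.items():
            changed = [data for birth, data in history if lo < birth <= hi]
            if changed:
                delta[block_id] = changed[-1]
        return delta
```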

                        If you're looking for a complete package, it might not be the answer, but if you're willing to tinker with it, ZFS is a pretty good option.

                        eric :)

                        LedLincoln


                        not bad for mile 25

                          Perfectionist.  :)

                          Which reminds me, time to renew my subscription...on it now.

                          MTA:  Wow, it was a lot easier to give you money this time than last time.  :D

                          I renewed yesterday as well, and it was indeed easy, and cheap.

                          One thing...I had tried to renew some time ago, before my subscription expired, and it was counted as an additional donation, rather than extending the subscription.  I'm okay with that, actually, but if your intent is to just renew, you may have to wait until it expires and then click the Disable Ads link.  OTOH, I may have just done it wrong the first time.

                          davidramsay1122


                            nice and informative discussion