Sample vol and aggr details below.

We can't upgrade (we're not on support), and we have about 1PB in production across two clusters. I am seeing this effect on both clusters, but not on all filers, which is strange.

If I move vols between affected and unaffected filers/aggrs, allocated vs. used normalizes for the volumes while they sit on the unaffected node, then re-inflates when they are moved back to the original node/aggr:

                               Volume    Allocated      Used
vol created on problem aggr   v164402   11932348KB    1924KB
after move to unaffected aggr v164402       2872KB    2872KB
after move back to orig aggr  v164402   12005552KB   75284KB
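
Those moves were nothing fancy, just vol move back and forth, roughly like this (the aggr names here are placeholders for whichever affected/unaffected aggrs were involved):

bc-gx-4b::*> volume move start -vserver bc -volume v164402 -destination-aggregate <unaffected_aggr>
bc-gx-4b::*> volume move show -vserver bc -volume v164402
bc-gx-4b::*> volume move start -vserver bc -volume v164402 -destination-aggregate <problem_aggr>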

Resized to 1TB, moved to another aggr, and back to the orig aggr:

                              v164402    6003080KB   37972KB
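
The resize itself was just a volume modify, something like:

bc-gx-4b::*> volume modify -vserver bc -volume v164402 -size 1TB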


It appears to be related to the size of the volume. We thin-provision all volumes by creating them at a large size and setting the space guarantee to none.
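
Concretely, that means the volumes get created more or less like this (the volume name here is just an example):

bc-gx-4b::*> volume create -vserver bc -volume <volname> -aggregate gx4b_1 -size 2TB -space-guarantee none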

Vols sized at 2TB end up with about 12GB allocated when created; vols sized at 1TB end up with about 6GB allocated, even though they are completely empty.
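
Doing the math on the numbers above, the initial allocation is almost exactly the same fraction of the nominal volume size in both cases:

    11932348KB / 2147483648KB (2TB) ~= 0.56%
     6003080KB / 1073741824KB (1TB) ~= 0.56%

so it looks proportional to the configured size rather than a fixed per-volume overhead.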

What's strange is that this is happening on some filers/aggregates but not others.

These clusters originated running ONTAP GX, and were upgraded to ONTAP 8 cluster-mode some time ago.

The aggregates experiencing this allocation 'inflation' seem to be the ones that were created after the cluster was upgraded to ONTAP 8.

The aggregates that were originally created as GX aggrs have identical 'allocated' and 'used' values in 'aggr show_space'.

For example:

Recently built aggregate under ONTAP 8:

Aggregate                       Allocated            Used           Avail
Total space                 18526684036KB   13662472732KB    1233809368KB   <- massive difference!


Aggregate originated under ONTAP GX:

Aggregate                       Allocated            Used           Avail
Total space                 32035928920KB   32035928920KB    1404570796KB  <- identical, which is what I would expect
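
For anyone wanting to compare, those show_space values can be pulled from the nodeshell, something like:

bc-gx-4b::*> system node run -node bc-gx-4b -command "aggr show_space -k gx4b_1"

(and the same command against a node that owns one of the original GX-era aggrs for the second block)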


Maybe this allocation scheme changed at the aggregate level under ONTAP 8? Perhaps it's expected behavior?

Things do seem to normalize as the volumes begin to fill up, so I believe this space is not truly gone for good. But it certainly appears unavailable for use, since tons of space is allocated to volumes that have very little data in them, and we have a LOT of volumes in these clusters.

It definitely makes it look like we're missing a substantial percentage of our disk space when we try to reconcile the sum of per-volume used space against the aggregate's used/remaining sizes.
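
For what it's worth, the reconciliation is basically summing the per-volume used space on an aggregate and comparing it to what the aggregate itself reports, along the lines of:

bc-gx-4b::*> volume show -aggregate gx4b_1 -fields used
bc-gx-4b::*> aggr show -aggregate gx4b_1

and the gap between that sum and the aggregate's used size is roughly the 'allocated minus used' inflation from show_space.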

Example volume and affected aggregate details:

bc-gx-4b::*> vol show v164346 -instance
  (volume show)

                              Virtual Server Name: bc
                                      Volume Name: v164346
                                   Aggregate Name: gx4b_1
                                      Volume Size: 2TB
                                     Name Ordinal: base
                               Volume Data Set ID: 4041125
                        Volume Master Data Set ID: 2151509138
                                     Volume State: online
                                      Volume Type: RW
                                     Volume Style: flex
                                 Volume Ownership: cluster
                                    Export Policy: default
                                          User ID: jobsys
                                         Group ID: cgi
                                   Security Style: unix
                                 Unix Permissions: ---rwxr-x--x
                                    Junction Path: /bc/shows/ID2/DFO/0820
                             Junction Path Source: RW_volume
                                  Junction Active: true
                                    Parent Volume: v164232
                       Virtual Server Root Volume: false
                                          Comment:
                                   Available Size: 1.15TB
                                       Total Size: 2TB
                                        Used Size: 146.7MB
                                  Used Percentage: 42%
             Autosize Enabled (for flexvols only): false
             Maximum Autosize (for flexvols only): 2.40TB
           Autosize Increment (for flexvols only): 102.4GB
              Total Files (for user-visible data): 31876689
               Files Used (for user-visible data): 132
                           Maximum Directory Size: 100MB
                            Space Guarantee Style: none
                        Space Guarantee In Effect: true
                               Minimum Read Ahead: false
                       Access Time Update Enabled: true
                Snapshot Directory Access Enabled: true
          Percent of Space Reserved for Snapshots: 0%
                 Used Percent of Snapshot Reserve: 0%
                                  Snapshot Policy: daily
                                    Creation Time: Tue Dec 08 11:22:58 2015
                                         Language: C
                        Striped Data Volume Count: -
                 Striped Data Volume Stripe Width: 0.00B
                           Current Striping Epoch: -
             One data-volume per member aggregate: -
                                Concurrency Level: -
                              Optimization Policy: -
                                     Clone Volume: false
                      Anti-Virus On-Access Policy: default
                               UUID of the volume: 17fa4c6d-9de1-11e5-a888-123478563412
                            Striped Volume Format: -
                       Load Sharing Source Volume: -
                               Move Target Volume: false
                       Maximum Write Alloc Blocks: 0
                 Inconsistency in the file system: false

bc-gx-4b::*> aggr show -aggregate
    far4a_1  gx1a_1   gx1a_2   gx1b_1   gx2a_1   gx2b_1   gx3a_1   gx4b_1
    near1b_1 near3b_1 root_1a  root_1b  root_2a  root_2b  root_3a  root_3b
    root_4a  root_4b  slow2a_1 systems
bc-gx-4b::*> aggr show -aggregate gx4b_1

                                    Aggregate: gx4b_1
                                         UUID: c624f85e-96d3-11e3-a6ce-00a0980bb25a
                                         Size: 18.40TB
                                    Used Size: 17.25TB
                              Used Percentage: 94%
                               Available Size: 1.15TB
                                        State: online
                                        Nodes: bc-gx-4b
                              Number Of Disks: 63
                                        Disks: bc-gx-4b:0a.64, bc-gx-4b:0e.80,
                                               ...
                                               bc-gx-4b:0a.45
                            Number Of Volumes: 411
                                       Plexes: /gx4b_1/plex0(online)
                                  RAID Groups: /gx4b_1/plex0/rg0,
                                               /gx4b_1/plex0/rg1,
                                               /gx4b_1/plex0/rg2
                                    Raid Type: raid_dp
                                Max RAID Size: 21
                                  RAID Status: raid_dp
                             Checksum Enabled: true
                              Checksum Status: active
                               Checksum Style: block
                                 Inconsistent: false
                          Ignore Inconsistent: off
                    Block Checksum Protection: on
                    Zoned Checksum Protection: -
                  Automatic Snapshot Deletion: on
                        Enable Thorough Scrub: off
                                 Volume Style: flex
                                 Volume Types: flex
                             Has Mroot Volume: false
                Has Partner Node Mroot Volume: false
                                      Is root: false
                              Wafliron Status: -
                       Percent Blocks Scanned: -
                      Last Start Error Number: -
                        Last Start Error Info: -
                               Aggregate Type: aggr
                   Number of Quiesced Volumes: -
                 Number of Volumes not Online: -
      Number of LS Mirror Destination Volumes: -
      Number of DP Mirror Destination Volumes: -
    Number of Move Mirror Destination Volumes: -
Number of DP qtree Mirror Destination Volumes: -
                                    HA Policy: sfo
                                   Block Type: 64-bit



On Wed, Dec 9, 2015 at 12:50 PM, John Stoffel <john@stoffel.org> wrote:

> Can you post the details of one of these volumes?  And of the
> aggregate you have them in?  It smells like there's some sort of
> minimum volume size setting somewhere.
>
> Or maybe there's an aggregate level snapshot sitting around?
>
> Can you upgrade?  You're in cluster mode, so hopefully it shouldn't be
> too hard to move to 8.1, then 8.2 and onto 8.3, since there's lots of
> nice bug fixes.