Series Introduction | v |
Foreword | vii |
Preface | xi |
1 | Introduction | 1 |
1.1 | Multiprocessor DSP systems | 2 |
1.2 | Application-specific multiprocessors | 4 |
1.3 | Exploitation of parallelism | 5 |
1.4 | Dataflow modeling for DSP design | 6 |
1.5 | Utility of dataflow for DSP | 9 |
1.6 | Overview | 11 |
2 | Application-Specific Multiprocessors | 13 |
2.1 | Parallel architecture classifications | 13 |
2.2 | Exploiting instruction level parallelism | 15 |
2.2.1 | ILP in programmable DSP processors | 15 |
2.2.2 | Sub-word parallelism | 17 |
2.2.3 | VLIW processors | 18 |
2.3 | Dataflow DSP architectures | 19 |
2.4 | Systolic and wavefront arrays | 20 |
2.5 | Multiprocessor DSP architectures | 21 |
2.6 | Single chip multiprocessors | 23 |
2.7 | Reconfigurable computing | 25 |
2.8 | Architectures that exploit predictable IPC | 27 |
2.9 | Summary | 29 |
3 | Background Terminology and Notation | 31 |
3.1 | Graph data structures | 31 |
3.2 | Dataflow graphs | 32 |
3.3 | Computation graphs | 32 |
3.4 | Petri nets | 33 |
3.5 | Synchronous dataflow | 34 |
3.6 | Analytical properties of SDF graphs | 35 |
3.7 | Converting a general SDF graph into a homogeneous SDF graph | 36 |
3.8 | Acyclic precedence expansion graph | 38 |
3.9 | Application graph | 41 |
3.10 | Synchronous languages | 42 |
3.11 | HSDFG concepts and notations | 43 |
3.12 | Complexity of algorithms | 45 |
3.13 | Shortest and longest paths in graphs | 47 |
3.13.1 | Dijkstra's algorithm | 48 |
3.13.2 | The Bellman-Ford algorithm | 48 |
3.13.3 | The Floyd-Warshall algorithm | 49 |
3.14 | Solving difference constraints using shortest paths | 50 |
3.15 | Maximum cycle mean | 53 |
3.16 | Summary | 53 |
4 | Multiprocessor Scheduling Models | 55 |
4.1 | Task-level parallelism and data parallelism | 55 |
4.2 | Static versus dynamic scheduling strategies | 56 |
4.3 | Fully-static schedules | 57 |
4.4 | Self-timed schedules | 62 |
4.5 | Dynamic schedules | 64 |
4.6 | Quasi-static schedules | 65 |
4.7 | Schedule notation | 67 |
4.8 | Unfolding HSDF graphs | 69 |
4.9 | Execution time estimates and static schedules | 72 |
4.10 | Summary | 74 |
5 | IPC-Conscious Scheduling Algorithms | 75 |
5.1 | Problem description | 75 |
5.2 | Stone's assignment algorithm | 76 |
5.3 | List scheduling algorithms | 80 |
5.3.1 | Graham's bounds | 81 |
5.3.2 | The basic algorithms -- HLFET and ETF | 84 |
5.3.3 | The mapping heuristic | 84 |
5.3.4 | Dynamic level scheduling | 85 |
5.3.5 | Dynamic critical path scheduling | 86 |
5.4 | Clustering algorithms | 87 |
5.4.1 | Linear clustering | 88 |
5.4.2 | Internalization | 89 |
5.4.3 | Dominant sequence clustering | 89 |
5.4.4 | Declustering | 91 |
5.5 | Integrated scheduling algorithms | 92 |
5.6 | Pipelined scheduling | 94 |
5.7 | Summary | 100 |
6 | The Ordered-Transactions Strategy | 101 |
6.1 | The ordered-transactions strategy | 101 |
6.2 | Shared bus architecture | 104 |
6.3 | Interprocessor communication mechanisms | 104 |
6.4 | Using the ordered-transactions approach | 107 |
6.5 | Design of an ordered memory access multiprocessor | 108 |
6.5.1 | High level design description | 108 |
6.5.2 | A modified design | 109 |
6.6 | Design details of a prototype | 112 |
6.6.1 | Top level design | 112 |
6.6.2 | Transaction order controller | 114 |
6.6.3 | Host interface | 118 |
6.6.4 | Processing element | 121 |
6.6.5 | FPGA circuitry | 122 |
6.6.6 | Shared memory | 123 |
6.6.7 | Connecting multiple boards | 123 |
6.7 | Hardware and software implementation | 125 |
6.7.1 | Board design | 125 |
6.7.2 | Software interface | 125 |
6.8 | Ordered I/O and parameter control | 128 |
6.9 | Application examples | 129 |
6.9.1 | Music synthesis | 129 |
6.9.2 | QMF filter bank | 131 |
6.9.3 | 1024-point complex Fast Fourier Transform (FFT) | 132 |
6.10 | Summary | 134 |
7 | Analysis of the Ordered-Transactions Strategy | 135 |
7.1 | Interprocessor communication graph (G_ipc) | 138 |
7.2 | Execution time estimates | 143 |
7.3 | Ordering constraints viewed as edges added to G_ipc | 144 |
7.4 | Periodicity | 145 |
7.5 | Optimal order | 146 |
7.6 | Effects of changes in execution times | 149 |
7.6.1 | Deterministic case | 150 |
7.6.2 | Modeling run-time variations in execution times | 151 |
7.6.3 | Bounds on the average iteration period | 154 |
7.6.4 | Implications for the ordered transactions schedule | 155 |
7.7 | Summary | 157 |
8 | Extending the OMA Architecture | 159 |
8.1 | The Boolean dataflow model | 159 |
8.1.1 | Scheduling | 160 |
8.2 | Parallel implementation on shared memory machines | 163 |
8.2.1 | General strategy | 163 |
8.2.2 | Implementation on the OMA | 165 |
8.2.3 | Improved mechanism | 169 |
8.2.4 | Generating the annotated bus access list | 171 |
8.3 | Data-dependent iteration | 174 |
8.4 | Summary | 175 |
9 | Synchronization in Self-Timed Systems | 177 |
9.1 | The barrier MIMD technique | 178 |
9.2 | Redundant synchronization removal in non-iterative dataflow | 179 |
9.3 | Analysis of self-timed execution | 182 |
9.3.1 | Estimated throughput | 182 |
9.4 | Strongly connected components and buffer size bounds | 182 |
9.5 | Synchronization model | 185 |
9.5.1 | Synchronization protocols | 185 |
9.5.2 | The synchronization graph G_s | 187 |
9.6 | A synchronization cost metric | 190 |
9.7 | Removing redundant synchronizations | 191 |
9.7.1 | The independence of redundant synchronizations | 192 |
9.7.2 | Removing redundant synchronizations | 193 |
9.7.3 | Comparison with Shaffer's approach | 195 |
9.7.4 | An example | 195 |
9.8 | Making the synchronization graph strongly connected | 197 |
9.8.1 | Adding edges to the synchronization graph | 199 |
9.9 | Insertion of delays | 201 |
9.9.1 | Analysis of DetermineDelays | 205 |
9.9.2 | Delay insertion example | 207 |
9.9.3 | Extending the algorithm | 208 |
9.9.4 | Complexity | 210 |
9.9.5 | Related work | 210 |
9.10 | Summary | 211 |
10 | Resynchronization | 213 |
10.1 | Definition of resynchronization | 213 |
10.2 | Properties of resynchronization | 215 |
10.3 | Relationship to set covering | 218 |
10.4 | Intractability of resynchronization | 221 |
10.5 | Heuristic solutions | 224 |
10.5.1 | Applying set-covering techniques to pairs of SCCs | 224 |
10.5.2 | A more flexible approach | 225 |
10.5.3 | Unit-subsumption resynchronization edges | 231 |
10.5.4 | Example | 234 |
10.5.5 | Simulation approach | 236 |
10.6 | Chainable synchronization graphs | 236 |
10.6.1 | Chainable synchronization graph SCCs | 237 |
10.6.2 | Comparison to the Global-Resynchronize heuristic | 239 |
10.6.3 | A generalization of the chaining technique | 240 |
10.6.4 | Incorporating the chaining technique | 242 |
10.7 | Resynchronization of constraint graphs for relative scheduling | 242 |
10.8 | Summary | 243 |
11 | Latency-Constrained Resynchronization | 245 |
11.1 | Elimination of synchronization edges | 246 |
11.2 | Latency-constrained resynchronization | 248 |
11.3 | Intractability of LCR | 253 |
11.4 | Two-processor systems | 260 |
11.4.1 | Interval covering | 261 |
11.4.2 | Two-processor latency-constrained resynchronization | 262 |
11.4.3 | Taking delays into account | 266 |
11.5 | A heuristic for general synchronization graphs | 276 |
11.5.1 | Customization to transparent synchronization graphs | 278 |
11.5.2 | Complexity | 278 |
11.5.3 | Example | 280 |
11.6 | Summary | 286 |
12 | Integrated Synchronization Optimization | 291 |
12.1 | Computing buffer sizes | 291 |
12.2 | A framework for self-timed implementation | 292 |
12.3 | Summary | 294 |
13 | Future Research Directions | 297 |
Bibliography | 301 |
Index | 321 |
Embedded Multiprocessors: Scheduling and Synchronization. Volume 3 of the Signal Processing and Communications series. Application-specific, embedded multiprocessors are increasingly found today in high-performance communications …